
🏀 Women's college basketball with sportsdataverse-py

Welcome to the women's college hoops corner of the SportsDataverse! 🎉 In a handful of lines you're about to pull rosters, schedules, play-by-play, live scoreboards, AP rankings, ESPN's Basketball Power Index (BPI), in-game win-probability curves, and season-long parquet releases — all returned as tidy polars DataFrames that are ready to model. 🚀

sportsdataverse.wbb leads with ESPN's deep espn_wbb_* women's-college-basketball surface — over a hundred endpoints — plus blazing-fast load_wbb_* data loaders. If you know the R package wehoop, these names will feel like home. Let's go scout some hoopers! 🏀

🧰 The toolbox

Every accessor returns a tidy polars DataFrame by default — pass return_as_pandas=True for pandas. The ⭐ rows are the premium ESPN analytics surfaces we lead with. Click any name for the full reference:

| Function | What it gives you | Source |
|---|---|---|
| espn_wbb_teams | Every D-I program, one wide row each | ESPN |
| espn_wbb_team_roster | A team's roster, one row per player | ESPN |
| espn_wbb_schedule | Games for a date / date-range | ESPN |
| espn_wbb_team_schedule | One program's full season slate | ESPN |
| espn_wbb_scoreboard | ⭐ Live + final scoreboard, one row per game | ESPN |
| espn_wbb_pbp | Full play-by-play + boxscore for a game | ESPN |
| espn_wbb_player_gamelog | A player's game-by-game log | ESPN |
| espn_wbb_player_splits | A player's situational stat splits | ESPN |
| espn_wbb_team_stats | A team's season stat splits | ESPN |
| espn_wbb_standings | Conference standings + records | ESPN |
| espn_wbb_conferences | Conference groups + group ids | ESPN |
| espn_wbb_rankings | ⭐ AP / Coaches poll rankings | ESPN |
| espn_wbb_leaders | ⭐ League statistical leaders | ESPN |
| espn_wbb_injuries | ⭐ Active injury report | ESPN |
| espn_wbb_season_powerindex | BPI ratings, one row per team | ESPN |
| espn_wbb_season_powerindex_leaders | ⭐ BPI / SOS / SOR category leaders | ESPN |
| espn_wbb_game_predictor | ⭐ BPI matchup projection for a game | ESPN |
| espn_wbb_game_probabilities | ⭐ Play-by-play win-probability curve | ESPN |
| espn_wbb_calendar | Valid game dates for a season | ESPN |
| load_wbb_schedule | Season-long schedule (parquet release) | release |
| load_wbb_pbp | Season-long play-by-play (parquet, back to 2002) | release |
| load_wbb_team_boxscore | Season-long team boxscores (parquet) | release |
| load_wbb_player_boxscore | Season-long player boxscores (parquet) | release |

🔌 Setup

pip install sportsdataverse

No API key needed — the ESPN endpoints and the parquet releases are all public. 😊

import polars as pl
import sportsdataverse as sdv
import sportsdataverse.wbb as wbb

SEASON = 2025 # the 2024-25 season — UConn's title run
print('most recent wbb season:', wbb.most_recent_wbb_season())

ESPN's live endpoints are seasonal — polls, injuries, and live scoreboards go quiet in the offseason, and any network call can hiccup. So we use a tiny safe() helper: you get the frame when the feed is up, and a friendly one-liner when it isn't (never a scary traceback). 🛟 The load_wbb_* parquet loaders are rock-solid year-round, so we lean on those for anything historical.

def safe(label, thunk):
    """Run a live call defensively; return None (with a note) if it can't."""
    try:
        out = thunk()
        ok = out is not None and (not hasattr(out, 'height') or out.height > 0)
        print(f"{'✅' if ok else '🟡'} {label}" + ('' if ok else ' (no rows right now)'))
        return out
    except Exception as e:  # noqa: BLE001 -- demo resilience
        print(f"⏭️ {label}: unavailable right now ({type(e).__name__})")
        return None


def has_rows(df):
    return df is not None and hasattr(df, 'height') and df.height > 0
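
To see how the two helpers behave without touching the network, here is a small offline sketch. FakeFrame is a hypothetical stand-in that exposes only the .height attribute the helpers inspect, and the helpers are repeated (with ASCII status markers instead of emoji) so the block runs on its own.

```python
class FakeFrame:
    """Hypothetical stand-in for a polars DataFrame: only exposes .height."""
    def __init__(self, height):
        self.height = height


def safe(label, thunk):
    """Same logic as the helper above, with ASCII status markers."""
    try:
        out = thunk()
        ok = out is not None and (not hasattr(out, 'height') or out.height > 0)
        print(f"{'OK' if ok else 'EMPTY'} {label}")
        return out
    except Exception as e:
        print(f"SKIP {label}: unavailable right now ({type(e).__name__})")
        return None


def has_rows(df):
    return df is not None and hasattr(df, 'height') and df.height > 0


def broken_feed():
    # Simulates a network hiccup
    raise TimeoutError('simulated timeout')


full = safe('feed with rows', lambda: FakeFrame(3))
empty = safe('offseason feed', lambda: FakeFrame(0))
failed = safe('flaky feed', broken_feed)

print(has_rows(full), has_rows(empty), has_rows(failed))
```

Note that an empty feed still comes back as a frame (not None), which is why the examples on this page gate on has_rows() rather than on a None check alone.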

🏟️ Teams

espn_wbb_teams returns one wide row per Division-I program. The team_id here is the key you'll feed to roster, stats, and leader endpoints. (NCAA team frames carry no conference column — that comes from espn_wbb_standings() / espn_wbb_conferences() below.)

teams = safe('teams', wbb.espn_wbb_teams)
(teams.select(['team_id', 'team_location', 'team_name', 'team_abbreviation', 'team_display_name']).head(10)
if has_rows(teams) else 'teams unavailable')

👥 Team roster

espn_wbb_team_roster takes a team_id and season and returns one row per player. Here's the 2024-25 UConn Huskies (ESPN team_id=41), the eventual national champions, led by Paige Bueckers.

uconn = safe('UConn roster', lambda: wbb.espn_wbb_team_roster(team_id=41, season=SEASON))
(uconn.select(['athlete_id', 'full_name', 'jersey', 'position_abbreviation', 'display_height', 'display_weight']).head(12)
 if has_rows(uconn) else 'roster unavailable')

📅 Schedule & scoreboard

Two complementary views of a slate:

| Function | Best for |
|---|---|
| espn_wbb_schedule | a clean game list for a date or 'YYYYMMDD-YYYYMMDD' range |
| espn_wbb_scoreboard | ⭐ a richer live/final scoreboard (status, venue, scores) |
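
The range form in the table is two YYYYMMDD stamps joined by a hyphen. Below is a minimal sketch for building that string from datetime.date values. espn_date_range is a hypothetical helper name, not part of sportsdataverse, and passing the result as the dates argument is an assumption based on the table above.

```python
from datetime import date


def espn_date_range(start, end):
    """Format two dates as 'YYYYMMDD-YYYYMMDD' (hypothetical helper)."""
    if end < start:
        raise ValueError('end date must not precede start date')
    return f"{start:%Y%m%d}-{end:%Y%m%d}"


# A three-day window starting on April 4, 2025
window = espn_date_range(date(2025, 4, 4), date(2025, 4, 6))
print(window)
```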

April 4, 2025 was the date of the women's Final Four semifinals. Note: home_score / away_score from espn_wbb_schedule arrive as strings, so cast them to integers before doing any arithmetic.

final_four = safe('Final Four schedule', lambda: wbb.espn_wbb_schedule(dates=20250404))
(final_four.select(['id', 'date', 'away_display_name', 'away_score', 'home_display_name', 'home_score', 'status_type_completed'])
 if has_rows(final_four) else 'schedule unavailable')

# ⭐ The scoreboard view of the same date, with richer game-state columns
board = safe('Final Four scoreboard', lambda: wbb.espn_wbb_scoreboard(dates=20250404))
keep = ['game_id', 'short_name', 'status_type_completed', 'home_team_short_display_name',
        'home_team_score', 'away_team_short_display_name', 'away_team_score']
(board.select([c for c in keep if c in board.columns])
 if has_rows(board) else 'scoreboard unavailable')
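
Because the schedule scores arrive as strings, subtracting them directly fails, and comparing them compares text rather than numbers. The dependency-free sketch below shows the cast-then-compute pattern on made-up rows (team names and scores are illustrative, not real results). In polars the analogous step is a cast on the two score columns, in the style of the `cast(pl.Int64, strict=False)` calls used in Recipe 4.

```python
def to_int(value):
    """Cast a score string to int; return None for blanks or non-numeric text."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


# Made-up rows shaped like the schedule frame's score columns
games = [
    {'home_display_name': 'Team A', 'home_score': '85',
     'away_display_name': 'Team B', 'away_score': '51'},
    {'home_display_name': 'Team C', 'home_score': '',
     'away_display_name': 'Team D', 'away_score': ''},  # not played yet
]

for g in games:
    home, away = to_int(g['home_score']), to_int(g['away_score'])
    g['home_margin'] = None if home is None or away is None else home - away

print([g['home_margin'] for g in games])

# Why the cast matters: as text, '9' sorts above '85'
print('9' > '85', to_int('9') > to_int('85'))
```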

🎬 Play-by-play

espn_wbb_pbp returns a dict of game components (plays, boxscore, header, winprobability, …). The plays value is a list of dicts — build a frame with pl.DataFrame(pbp['plays'], infer_schema_length=None). Columns use ESPN dot-notation (period.number, clock.displayValue, type.text, scoringPlay).

Game 401746075 is the 2025 national championship: South Carolina vs. UConn.

pbp = safe('championship pbp', lambda: wbb.espn_wbb_pbp(game_id=401746075))
plays = None
if pbp is not None and isinstance(pbp, dict) and pbp.get('plays'):
    plays = pl.DataFrame(pbp['plays'], infer_schema_length=None)
    print('plays shape:', plays.shape, '| components:', list(pbp.keys())[:8])
(plays.select(['period.number', 'clock.displayValue', 'type.text', 'scoringPlay', 'text']).head()
 if plays is not None else 'pbp unavailable')

# Scoring plays only, with the running score
(plays.filter(pl.col('scoringPlay') == True)
 .select(['period.number', 'clock.displayValue', 'awayScore', 'homeScore', 'text']).head(8)
 if plays is not None else 'pbp unavailable')
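
Since the plays value is a plain list of dicts before it becomes a frame, quick sanity checks do not need polars at all. Here is a small sketch on made-up plays, assuming each dict carries the dot-notation keys listed above (the play-type strings are illustrative):

```python
from collections import Counter

# Made-up plays using the dot-notation keys described above
plays_raw = [
    {'period.number': 1, 'clock.displayValue': '9:41', 'type.text': 'Jump Shot', 'scoringPlay': True},
    {'period.number': 1, 'clock.displayValue': '9:20', 'type.text': 'Rebound', 'scoringPlay': False},
    {'period.number': 2, 'clock.displayValue': '4:05', 'type.text': 'Layup', 'scoringPlay': True},
    {'period.number': 2, 'clock.displayValue': '3:30', 'type.text': 'Jump Shot', 'scoringPlay': True},
]

# Count scoring plays per period; .get() tolerates dicts that lack a key
scoring_by_period = Counter(
    p.get('period.number') for p in plays_raw if p.get('scoringPlay')
)
print(dict(scoring_by_period))
```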

⭐ Premium ESPN analytics

This is where espn_wbb_* shines. Three live league-wide feeds, each one line:

| Function | Gives you |
|---|---|
| espn_wbb_rankings | the current AP / Coaches poll |
| espn_wbb_leaders | league statistical leaders (PPG, RPG, APG, …) |
| espn_wbb_injuries | the active injury report |

These are in-season feeds, so out of season they return empty — our safe() helper handles that gracefully.

rankings = safe('rankings (AP/Coaches poll)', wbb.espn_wbb_rankings)
(rankings.head(12) if has_rows(rankings)
else 'no poll published right now (offseason) — try during the season')
injuries = safe('injury report', wbb.espn_wbb_injuries)
(injuries.head(10) if has_rows(injuries)
else 'no active injuries posted right now (offseason)')

📊 Basketball Power Index (BPI)

ESPN's BPI is a forward-looking team-strength rating — expected point margin per 70 possessions against an average opponent on a neutral floor. espn_wbb_season_powerindex returns one row per ranked team, with a nested stats list (BPI, BPI rank, SOS, SOR, …). Let's unnest it into a clean BPI leaderboard for 2024-25.

import ast

spi = safe('season BPI', lambda: wbb.espn_wbb_season_powerindex(season=SEASON))


def pick(stats, name):
    # The nested `stats` value arrives as a Python-repr string, so parse it safely
    if isinstance(stats, str):
        try:
            stats = ast.literal_eval(stats)
        except (ValueError, SyntaxError):
            return None
    for s in (stats or []):
        if isinstance(s, dict) and s.get('name') == name:
            return s.get('value')
    return None


if has_rows(spi):
    rows = [
        {
            'bpi_rank': pick(r['stats'], 'bpirank'),
            'bpi': pick(r['stats'], 'bpi'),
            'conference_id': r.get('conference_id'),
            'team_ref': r.get('team_$ref'),
        }
        for r in spi.to_dicts()
    ]
    out = pl.DataFrame(rows).sort('bpi', descending=True, nulls_last=True).head(12)
else:
    out = 'BPI unavailable right now'
out
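
pick() re-parses the stats string on every call. When several stats are needed per team, one option is to parse once into a name-to-value dict. This is a sketch on a made-up stats string (the numbers are illustrative). stats_to_dict is a hypothetical helper, and the name / value keys are assumed to match what pick() reads.

```python
import ast


def stats_to_dict(stats):
    """Parse a stats list (or its Python-repr string) into {name: value}."""
    if isinstance(stats, str):
        try:
            stats = ast.literal_eval(stats)
        except (ValueError, SyntaxError):
            return {}
    if not isinstance(stats, (list, tuple)):
        return {}
    return {
        s['name']: s.get('value')
        for s in stats
        if isinstance(s, dict) and 'name' in s
    }


# Made-up input in the Python-repr form described above
raw = "[{'name': 'bpi', 'value': 31.2}, {'name': 'bpirank', 'value': 1.0}]"
parsed = stats_to_dict(raw)
print(parsed.get('bpi'), parsed.get('bpirank'), parsed.get('sor'))
```

Missing names simply come back as None from .get(), matching pick()'s behavior for an absent stat.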

And espn_wbb_season_powerindex_leaders lists the category leaders — who tops BPI, strength-of-schedule, strength-of-record, and more.

spi_leaders = safe('BPI category leaders', lambda: wbb.espn_wbb_season_powerindex_leaders(season=SEASON))
(spi_leaders.select(['name', 'display_name']).head(10)
if has_rows(spi_leaders) else 'BPI leaders unavailable')

🏆 Standings & conferences

espn_wbb_standings returns one wide row per team — records, win %, points for/against, and conference membership. espn_wbb_conferences lists the conference groups with their group_ids (handy for filtering).

standings = safe('2025 standings', lambda: wbb.espn_wbb_standings(season=SEASON))
(standings.select(['team_display_name', 'conference_abbreviation', 'wins', 'losses', 'win_percent', 'points_for', 'points_against'])
.sort('win_percent', descending=True, nulls_last=True).head(10)
if has_rows(standings) else 'standings unavailable')
conferences = safe('conferences', wbb.espn_wbb_conferences)
(conferences.select(['group_id', 'name', 'abbreviation', 'short_name']).head(12)
if has_rows(conferences) else 'conferences unavailable')

🍳 Cookbook: common WBB tasks

Now the fun part — real tasks you'll reach for constantly, each built on the premium functions above. The load_wbb_* loaders below read pre-built parquet releases from wehoop-wbb-data, so they're fast and reliable year-round. We base most season-wide recipes on 2024 because that release is fully published; swap the season once newer parquet drops.

First, pull the three season-long parquet releases we'll lean on across the Cookbook — player boxscores, team boxscores, and play-by-play for 2024. One load, many recipes.

player_box = wbb.load_wbb_player_boxscore(seasons=[2024])
team_box = wbb.load_wbb_team_boxscore(seasons=[2024])
season_pbp = wbb.load_wbb_pbp(seasons=[2024])
print('player_box:', player_box.shape, '| team_box:', team_box.shape, '| pbp:', season_pbp.shape)

Recipe 1 — Win-probability ride of a championship 📈

espn_wbb_game_probabilities returns ESPN's play-by-play win-probability snapshots for a game. Let's watch how UConn's win odds evolved through the 2025 title game (event 401746075).

wp = safe('win probability', lambda: wbb.espn_wbb_game_probabilities(event_id=401746075))
if has_rows(wp):
    ride = wp.select(['sequence_number', 'home_win_percentage', 'away_win_percentage', 'tie_percentage'])
    print('snapshots:', ride.height,
          '| opening home win%:', round(float(ride['home_win_percentage'][0]) * 100, 1),
          '| final home win%:', round(float(ride['home_win_percentage'][-1]) * 100, 1))
    out = ride.head(6)
else:
    out = 'win probability unavailable'
out
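
Beyond the opening and closing values, the curve itself shows where a game turned. Here is a dependency-free sketch on made-up home win probabilities (fractions between 0 and 1, as the multiplication by 100 above implies) that finds the low point and the largest single-snapshot swing:

```python
# Made-up home win probabilities, one per snapshot, in sequence order
home_wp = [0.55, 0.61, 0.48, 0.70, 0.88, 0.97]

# Index of the low point of the curve for the home side
low_idx = min(range(len(home_wp)), key=home_wp.__getitem__)

# Change between consecutive snapshots, and the biggest absolute move
deltas = [b - a for a, b in zip(home_wp, home_wp[1:])]
swing_idx = max(range(len(deltas)), key=lambda i: abs(deltas[i]))

print('lowest home win%:', round(home_wp[low_idx] * 100, 1), 'at snapshot', low_idx)
print('largest swing:', round(deltas[swing_idx] * 100, 1),
      'points, entering snapshot', swing_idx + 1)
```

On real data the same idea applies to the home_win_percentage column after sorting by sequence_number.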

Recipe 2 — BPI matchup preview for a game 🔮

espn_wbb_game_predictor gives ESPN's BPI-based projection for a single game — matchup quality, projected game score, and each side's predicted point total. Here's the championship preview.

pred = safe('game predictor (BPI)', lambda: wbb.espn_wbb_game_predictor(event_id=401746075))
if has_rows(pred):
    home_stats = pred['home_team_statistics'][0]
    if isinstance(home_stats, str):  # arrives as a Python-repr string
        home_stats = ast.literal_eval(home_stats)
    preview = pl.DataFrame([
        {'stat': s.get('displayName'), 'value': s.get('displayValue')}
        for s in home_stats if isinstance(s, dict)
    ])
    out = preview.head(10)
else:
    out = 'predictor unavailable'
out

Recipe 3 — Top scorers of a full season 🥇

Take the season-long player boxscore and aggregate with polars to find the highest per-game scorers (min. 20 games).

top_scorers = (
    player_box
    .group_by(['athlete_id', 'athlete_display_name', 'team_short_display_name'])
    .agg(
        games=pl.len(),
        total_points=pl.col('points').sum(),
        ppg=pl.col('points').mean().round(1),
    )
    .filter(pl.col('games') >= 20)
    .sort('ppg', descending=True)
    .head(10)
)
top_scorers
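
One subtlety in that aggregation: pl.len() counts every row in a group, while mean() skips nulls. If a player-game row carries a null points value (an assumption here, for example a listed player who did not appear), games and ppg end up using different denominators. The dependency-free sketch below reproduces the effect on made-up rows:

```python
# Made-up player-game rows; None marks a game with no recorded points
rows = [
    {'athlete_id': 1, 'points': 20},
    {'athlete_id': 1, 'points': None},
    {'athlete_id': 1, 'points': 10},
    {'athlete_id': 2, 'points': 12},
    {'athlete_id': 2, 'points': 18},
]

summary = {}
for r in rows:
    s = summary.setdefault(r['athlete_id'], {'rows': 0, 'played': 0, 'total': 0})
    s['rows'] += 1                  # what a plain row count sees
    if r['points'] is not None:
        s['played'] += 1            # games that actually have a value
        s['total'] += r['points']

for s in summary.values():
    # Average over games with a value, mirroring a null-skipping mean
    s['ppg'] = round(s['total'] / s['played'], 1) if s['played'] else None

print(summary)
```

If the games threshold should reflect appearances rather than rows, counting non-null values (pl.col('points').count() in polars) is one way to line the two up.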

Recipe 4 — Best scoring offenses, joined to records 🤝

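Aggregate the team boxscore to rank programs by points per game, then attach each team's W-L from the standings. The boxscores here cover 2024, so the standings are fetched for that same season rather than reusing the 2025 frame from earlier.

offense = (
    team_box
    .group_by(['team_id', 'team_display_name'])
    .agg(games=pl.len(), ppg=pl.col('team_score').mean().round(1))
    .filter(pl.col('games') >= 20)
    .sort('ppg', descending=True)
    .head(10)
)
# Match the standings season to the 2024 boxscores before joining
standings_2024 = safe('2024 standings', lambda: wbb.espn_wbb_standings(season=2024))
if has_rows(standings_2024):
    # Cast both join keys to Int64 so the id columns share a dtype
    recs = standings_2024.select(['team_id', 'wins', 'losses']).with_columns(pl.col('team_id').cast(pl.Int64, strict=False))
    offense = offense.with_columns(pl.col('team_id').cast(pl.Int64, strict=False)).join(recs, on='team_id', how='left')
offense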
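
The casts on team_id matter because a join needs both key columns to share a type: a string id never equals its integer counterpart. Here is a plain-Python sketch of the same left join on made-up rows, normalizing both sides to int first (the ids and records are illustrative):

```python
# Made-up rows: ids are ints on one side and strings on the other
offense_rows = [{'team_id': 101, 'ppg': 88.4}, {'team_id': 202, 'ppg': 85.1}]
standing_rows = [{'team_id': '101', 'wins': 30, 'losses': 4}]

# Without normalizing, no key matches
naive_matches = {r['team_id'] for r in offense_rows} & {r['team_id'] for r in standing_rows}

# Normalize keys to int, then left-join: unmatched teams keep None for the record
records = {int(r['team_id']): r for r in standing_rows}
joined = []
for o in offense_rows:
    rec = records.get(int(o['team_id']), {})
    joined.append({**o, 'wins': rec.get('wins'), 'losses': rec.get('losses')})

print(naive_matches)
print(joined)
```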

Recipe 5 — A program's full season slate 🗓️

espn_wbb_team_schedule returns one program's complete season: every game with its date, matchup short name, and season type. Here's UConn's 2024-25 road to the title (ESPN team_id=41).

tsched = safe('UConn team schedule', lambda: wbb.espn_wbb_team_schedule(team_id=41, season=SEASON))
if has_rows(tsched):
    keep = ['id', 'date', 'short_name', 'season_type_name', 'week_text']
    out = tsched.select([c for c in keep if c in tsched.columns]).head(12)
    print('games on the slate:', tsched.height)
else:
    out = 'team schedule unavailable (offseason); try during the season'
out

Recipe 6 — Deadliest three-point shooting teams 🎯

Roll the team boxscore up to season totals and compute each program's three-point percentage. Made ÷ attempted, sorted, min. 20 games.

three_pt = (
    team_box
    .group_by(['team_id', 'team_display_name'])
    .agg(
        games=pl.len(),
        tpm=pl.col('three_point_field_goals_made').sum(),
        tpa=pl.col('three_point_field_goals_attempted').sum(),
    )
    .filter((pl.col('games') >= 20) & (pl.col('tpa') > 0))
    .with_columns((pl.col('tpm') / pl.col('tpa') * 100).round(1).alias('three_pct'))
    .sort('three_pct', descending=True)
    .head(10)
)
three_pt
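
Summing makes and attempts before dividing is a deliberate choice: it gives the pooled season percentage, which is not the same number as the average of per-game percentages. A worked example on made-up game lines shows the gap:

```python
# Made-up per-game three-point lines for one team: (made, attempted)
games = [(1, 2), (9, 30), (5, 10)]

made = sum(m for m, _ in games)
attempted = sum(a for _, a in games)

# Pooled percentage: season makes over season attempts (the recipe's approach)
pooled_pct = round(made / attempted * 100, 1)

# Mean of per-game percentages: every game counts equally, however few the attempts
per_game_pct = round(sum(m / a for m, a in games) / len(games) * 100, 1)

print(pooled_pct, per_game_pct)
```

The low-volume 1-for-2 game pulls the per-game average up, while the pooled figure weights each attempt equally.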

Recipe 7 — Clutch shot-makers ⏱️

Slice the season-long play-by-play to scoring plays in the final two minutes of the 4th quarter (or overtime), total each player's clutch points, and name them via the player boxscore. Pure ice in the veins.

name_lookup = player_box.select(
    ['athlete_id', 'athlete_display_name', 'team_short_display_name']
).unique(subset=['athlete_id'])

clutch = (
    season_pbp
    .filter(
        (pl.col('period_number') >= 4)
        & (pl.col('scoring_play') == True)
        & (pl.col('start_game_seconds_remaining') <= 120)
        & pl.col('athlete_id_1').is_not_null()
    )
    .group_by('athlete_id_1')
    .agg(clutch_points=pl.col('score_value').sum(), clutch_plays=pl.len())
    .rename({'athlete_id_1': 'athlete_id'})
    .join(name_lookup, on='athlete_id', how='left')
    .sort('clutch_points', descending=True)
    .select(['athlete_display_name', 'team_short_display_name', 'clutch_plays', 'clutch_points'])
    .head(10)
)
clutch

Recipe 8 — Where the buckets come from (shot-zone mix) 🗺️

The play-by-play carries coordinate_x / coordinate_y for shots and a score_value (2 or 3). Bucket every made field goal into a zone and see how a season's points break down by shot location.

shot_zones = (
    season_pbp
    .filter(
        (pl.col('scoring_play') == True)
        & (pl.col('score_value') >= 2)
        & pl.col('coordinate_y').is_not_null()
    )
    .with_columns(
        pl.when(pl.col('score_value') == 3).then(pl.lit('3-pointer'))
        .when(pl.col('coordinate_y') <= 8).then(pl.lit('2pt - at the rim'))
        .otherwise(pl.lit('2pt - jumper')).alias('shot_zone')
    )
    .group_by('shot_zone')
    .agg(made_field_goals=pl.len(), points=pl.col('score_value').sum())
    .with_columns(
        (pl.col('made_field_goals') / pl.col('made_field_goals').sum() * 100).round(1).alias('share_pct')
    )
    .sort('made_field_goals', descending=True)
)
shot_zones
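
The when/then/otherwise chain is a first-match-wins rule: the three-point test runs before the coordinate test, so a three taken at a small coordinate_y is still a three. Here it is written as a plain function on made-up shots, with the 8-unit rim cutoff taken from the recipe above and simplified zone labels. A points share is added alongside the made-shot share, since share_pct above is a share of made field goals rather than of points.

```python
def shot_zone(score_value, coordinate_y):
    """First matching rule wins, mirroring the when/then/otherwise chain."""
    if score_value == 3:
        return '3-pointer'
    if coordinate_y <= 8:
        return '2pt at the rim'
    return '2pt jumper'


# Made-up made field goals: (score_value, coordinate_y)
shots = [(2, 3), (2, 5), (3, 24), (2, 15), (3, 6)]

tally = {}
for value, y in shots:
    zone = tally.setdefault(shot_zone(value, y), {'made': 0, 'points': 0})
    zone['made'] += 1
    zone['points'] += value

total_made = sum(z['made'] for z in tally.values())
total_points = sum(z['points'] for z in tally.values())
for z in tally.values():
    z['made_share_pct'] = round(z['made'] / total_made * 100, 1)
    z['points_share_pct'] = round(z['points'] / total_points * 100, 1)

print(tally)
```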

Recipe 9 — Double-double machines 🔄

Flag every player-game with at least two double-digit categories (points / rebounds / assists), then count who racked up the most double-doubles across the season.

dd = (
    player_box
    .with_columns(
        (
            (pl.col('points') >= 10).cast(pl.Int8)
            + (pl.col('rebounds') >= 10).cast(pl.Int8)
            + (pl.col('assists') >= 10).cast(pl.Int8)
        ).alias('double_digit_cats')
    )
    .filter(pl.col('double_digit_cats') >= 2)
    .group_by(['athlete_id', 'athlete_display_name', 'team_short_display_name'])
    .agg(double_doubles=pl.len())
    .sort('double_doubles', descending=True)
    .head(10)
)
dd
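
Summing boolean flags generalizes naturally. Here is a sketch of the same count as a plain function on made-up stat lines. Treating steals and blocks as eligible categories is an assumption that goes beyond the three columns used above, and the function name is hypothetical.

```python
def double_digit_categories(line, cats=('points', 'rebounds', 'assists', 'steals', 'blocks')):
    """Count stat categories with a value of 10 or more; missing or None counts as 0."""
    return sum((line.get(c) or 0) >= 10 for c in cats)


# Made-up player-game stat lines
lines = [
    {'points': 22, 'rebounds': 11, 'assists': 4},
    {'points': 14, 'rebounds': 10, 'assists': 12},
    {'points': 31, 'rebounds': 6, 'assists': None, 'blocks': 2},
]

counts = [double_digit_categories(line) for line in lines]
labels = ['triple-double' if n >= 3 else 'double-double' if n == 2 else '-' for n in counts]
print(list(zip(counts, labels)))
```

Note that the recipe's `>= 2` filter counts a triple-double game as a double-double too. The labels here separate the two.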

Recipe 10 — Find the best defenses (fewest points allowed) 🛡️

Every team boxscore row carries the opponent's score, so a single group-by yields points allowed per game. Lowest-scoring opponents = stingiest defenses.

defense = (
    team_box
    .group_by(['team_id', 'team_display_name'])
    .agg(
        games=pl.len(),
        opp_ppg=pl.col('opponent_team_score').mean().round(1),
        own_ppg=pl.col('team_score').mean().round(1),
    )
    .filter(pl.col('games') >= 20)
    .with_columns((pl.col('own_ppg') - pl.col('opp_ppg')).round(1).alias('net_ppg'))
    .sort('opp_ppg')
    .head(10)
)
defense
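
A small numerical caveat: net_ppg above subtracts two averages that were already rounded to one decimal, so it can differ by 0.1 from the margin rounded once at the end. A worked example on made-up scores:

```python
# Made-up scores for one team over three games
own = [70, 70, 71]
opp = [60, 61, 61]

own_mean = sum(own) / len(own)   # 70.333...
opp_mean = sum(opp) / len(opp)   # 60.666...

# Subtracting the rounded averages (what the recipe does)
net_from_rounded = round(round(own_mean, 1) - round(opp_mean, 1), 1)

# Rounding once, after subtracting the unrounded averages
net_rounded_last = round(own_mean - opp_mean, 1)

print(net_from_rounded, net_rounded_last)
```

For a top-10 table the difference is cosmetic, but when net_ppg feeds further calculations it is safer to keep the unrounded means and round only for display.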

Recipe 11 — Rolling form: a team's last 10 games 📊

Filter the team boxscore to one program, sort by date, and take the tail: a quick "how did they finish the year?" view with the scoring margin per game. Here's UConn (ESPN team_id=41).

last10 = (
    team_box
    .filter(pl.col('team_id') == 41)
    .with_columns((pl.col('team_score') - pl.col('opponent_team_score')).alias('margin'))
    .sort('game_date')
    .tail(10)
    .select(['game_date', 'opponent_team_short_display_name', 'team_score', 'opponent_team_score', 'margin'])
)
print('average margin over last 10:', round(last10['margin'].mean(), 1) if last10.height else 'n/a')
last10

Recipe 12 — Pandas interop: a season's play-type mix 🐼

Every loader and accessor takes return_as_pandas=True. Pull the play-by-play as pandas, tally the most common play types with a one-liner value_counts(), and you're back in familiar territory for downstream tooling.

pbp_pd = wbb.load_wbb_pbp(seasons=[2024], return_as_pandas=True)
play_mix = (
    pbp_pd['type_text']
    .value_counts()
    .head(10)
    .rename_axis('play_type')
    .reset_index(name='count')
)
play_mix

🎉 Where to go next

Now go find the next national champion! 🏀🏆