Men's college basketball with sportsdataverse-py
Welcome to Selection-Sunday-grade hoops data! In a handful of lines of Python you can pull NCAA Division I men's basketball (full schedules, play-by-play, standings, rosters, statistical leaders and multi-season parquet archives) and get it all back as tidy polars DataFrames ready to model.
sportsdataverse.mbb leads with two premium sources:
- ESPN (espn_mbb_*): the site + core APIs behind ESPN.com, covering live scoreboards, schedules, standings, rankings, box scores, win probability and play-by-play.
- FoxSports (fox_mbb_*): FoxSports' league-leader, standings, roster, boxscore and odds feeds.
Plus release loaders (load_mbb_*) that hand you whole seasons of
play-by-play, box scores, shots and schedules from the data repo in one call.
R user? The men's-basketball companion is hoopR (NBA + NCAA). Let's tip off!
The toolbox
Every accessor returns a tidy polars DataFrame by default; pass
return_as_pandas=True for pandas. The richest live surfaces are ESPN and Fox;
the load_* loaders read pre-built parquet from the data release (rock-solid,
no live API). Click any name for the full reference.
| Function | What it gives you | Source |
|---|---|---|
| espn_mbb_teams | Every D-I team (grab team_ids) | ESPN ⭐ |
| espn_mbb_schedule | Games + results for a date / window | ESPN ⭐ |
| espn_mbb_scoreboard | Rich scoreboard for a date (status, lines, odds) | ESPN ⭐ |
| espn_mbb_standings | Conference standings, one row per team | ESPN ⭐ |
| espn_mbb_rankings | AP / Coaches poll (in-season) | ESPN ⭐ |
| espn_mbb_summary | Full game summary: box, plays, win prob | ESPN ⭐ |
| espn_mbb_team_roster | A team's roster | ESPN ⭐ |
| espn_mbb_pbp | Event-level play-by-play for a game | ESPN ⭐ |
| espn_mbb_game_rosters | Who dressed + started for one game | ESPN ⭐ |
| espn_mbb_player_stats | A player's season stat line | ESPN ⭐ |
| fox_mbb_league_leaders | Stat leaders (scoring, rebounds, …) | Fox ⭐ |
| fox_mbb_standings | Fox conference standings for a team | Fox ⭐ |
| fox_mbb_team_roster | Fox roster for a team | Fox |
| espn_mbb_team_schedule | One team's full season schedule | ESPN ⭐ |
| espn_mbb_conferences | Conference / group catalog | ESPN ⭐ |
| load_mbb_schedule | Whole-season schedule parquet | loader |
| load_mbb_player_boxscore | Season player box scores | loader |
| load_mbb_team_boxscore | Season team box scores | loader |
| load_mbb_pbp | Season play-by-play parquet | loader |
| most_recent_mbb_season | Current season-year helper | helper |

⭐ = premium live source.
Setup
pip install sportsdataverse
No API key needed: ESPN, Fox and the parquet loaders are all open.
import polars as pl
import sportsdataverse as sdv
pl.Config.set_tbl_rows(10)
print("most recent MBB season:", sdv.mbb.most_recent_mbb_season())
ESPN's live endpoints (scoreboard, rankings, standings, a single game's
play-by-play) are seasonal: in the offseason a poll or scoreboard can come
back empty. So we use a tiny safe() helper: you get the frame when the feed
is up, and a friendly one-liner when it isn't, never a scary traceback.
The load_* parquet loaders are stable year-round, so we call those directly.
def safe(label, thunk):
    """Run a live call defensively; return its result or None with a note."""
    try:
        out = thunk()
        # Results without .height (e.g. dicts) count as OK; frames need at least one row.
        ok = out is not None and (not hasattr(out, "height") or out.height)
        print(f"{'✅' if ok else 'ℹ️'} {label}{'' if ok else ' - no rows right now'}")
        return out
    except Exception as e:  # noqa: BLE001 -- demo resilience
        print(f"⏭️ {label}: unavailable right now ({type(e).__name__})")
        return None
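To see the three outcomes without touching a live endpoint, here is a minimal offline sketch. FakeFrame, safe_demo and feed_down are hypothetical stand-ins invented for this illustration (they are not part of sportsdataverse); safe_demo mirrors the logic of safe() above with plain-text status tags.

```python
class FakeFrame:
    """Hypothetical stand-in for a polars DataFrame: only exposes .height."""

    def __init__(self, height):
        self.height = height


def safe_demo(label, thunk):
    """Same control flow as safe(), with plain-text status tags."""
    try:
        out = thunk()
        ok = out is not None and (not hasattr(out, "height") or bool(out.height))
        print(f"{'ok' if ok else 'empty'}: {label}")
        return out
    except Exception as e:
        print(f"skipped: {label} ({type(e).__name__})")
        return None


def feed_down():
    """Simulate an endpoint that raises instead of returning data."""
    raise ConnectionError("simulated outage")


full = safe_demo("feed with rows", lambda: FakeFrame(3))
empty = safe_demo("offseason feed", lambda: FakeFrame(0))
down = safe_demo("feed down", feed_down)
```

Note that an empty frame is still returned as-is (only an exception yields None), which is why the cells that follow check both `is not None` and `.height` before selecting columns.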
Every team in Division I
Start with espn_mbb_teams:
one row per program, with the team_id you'll pass into roster, schedule and
summary calls. This is a plain catalog fetch, so it's reliable year-round.
teams = sdv.mbb.espn_mbb_teams()
print("teams:", teams.shape)
teams.select(["team_id", "team_location", "team_name", "team_abbreviation", "team_is_active"]).head()
Schedule & scores for a date window
espn_mbb_schedule takes a
single dates=YYYYMMDD or a 'YYYYMMDD-YYYYMMDD' window and returns one row
per game with final scores. Here's championship day of the 2024 tournament.
sched = safe(
    "schedule 2024-04-08",
    lambda: sdv.mbb.espn_mbb_schedule(dates=20240408),
)
(sched.select(["id", "home_display_name", "away_display_name", "home_score", "away_score"]).head()
 if sched is not None and sched.height else "schedule unavailable")
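If you would rather build the 'YYYYMMDD-YYYYMMDD' window from date objects than type it by hand, a small formatter does the job. This is a sketch only: date_window is a hypothetical helper (not part of sportsdataverse), and the two dates are arbitrary illustrative values.

```python
from datetime import date


def date_window(start: date, end: date) -> str:
    """Format two dates as a 'YYYYMMDD-YYYYMMDD' window string."""
    if end < start:
        raise ValueError("end must not be before start")
    return f"{start:%Y%m%d}-{end:%Y%m%d}"


window = date_window(date(2024, 3, 19), date(2024, 4, 8))
print(window)  # 20240319-20240408
```

The resulting string has the window shape described above, so it can be passed as the dates argument.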
The rich scoreboard
espn_mbb_scoreboard is the
deluxe version: for a given date it returns status, broadcast, betting lines and
team line scores, 50 columns wide. Defaults to polars; we peek at a tidy slice.
sb = safe(
    "scoreboard 2024-04-08",
    lambda: sdv.mbb.espn_mbb_scoreboard(dates=20240408, return_as_pandas=False),
)
if sb is not None and getattr(sb, "height", 0):
    keep = ["game_id", "short_name", "status_type_description",
            "home_team_short_display_name", "away_team_short_display_name"]
    out = sb.select([c for c in keep if c in sb.columns]).head()
else:
    out = "scoreboard empty right now (offseason)"
out
Conference standings
espn_mbb_standings returns one
row per team for a season with wins, losses, win pct, point differential and
conference grouping. Great for a quick power look across the league.
standings = safe(
    "standings 2024",
    lambda: sdv.mbb.espn_mbb_standings(season=2024, return_as_pandas=False),
)
if standings is not None and getattr(standings, "height", 0):
    keep = ["team_display_name", "group_name", "wins", "losses",
            "win_percent", "point_differential"]
    out = (standings.select([c for c in keep if c in standings.columns])
           .sort("win_percent", descending=True).head(10))
else:
    out = "standings unavailable"
out
Cookbook: common MBB tasks
Now the fun part: real tasks you'll reach for constantly, each built on a premium ESPN or Fox wrapper or a parquet loader. Every live recipe is guarded so a transient or offseason hiccup prints a note instead of breaking the page.
Recipe 1: National scoring leaders (FoxSports)
fox_mbb_league_leaders
serves the leaderboard direct from FoxSports: pick a category (scoring,
rebounds, assists, …) and who (player or team). No IDs needed.
leaders = safe(
    "fox scoring leaders",
    lambda: sdv.mbb.fox_mbb_league_leaders(category="scoring", who="player"),
)
if leaders is not None and getattr(leaders, "height", 0):
    keep = ["players", "gp", "mpg", "ppg", "pts"]
    out = leaders.select([c for c in keep if c in leaders.columns]).head(10)
else:
    out = "Fox leaders unavailable right now"
out
Recipe 2: Look up a team's roster (ESPN)
Grab a team_id from espn_mbb_teams, then
espn_mbb_team_roster returns
the current roster. Here we resolve UConn (the 2024 champs) by abbreviation so
the recipe is self-contained.
row = teams.filter(pl.col("team_abbreviation") == "CONN")
tid = int(row["team_id"][0]) if row.height else 41  # 41 = UConn fallback
roster = safe(
    f"roster team_id={tid}",
    lambda: sdv.mbb.espn_mbb_team_roster(team_id=tid, return_as_pandas=False),
)
if roster is not None and getattr(roster, "height", 0):
    keep = ["full_name", "jersey", "display_height", "display_weight"]
    out = roster.select([c for c in keep if c in roster.columns]).head(12)
else:
    out = "roster unavailable right now"
out
Recipe 3: Season scoring leaderboard from parquet
The load_* loaders pull whole seasons from the data release, perfect for
analysis that shouldn't depend on a live endpoint.
load_mbb_player_boxscore
gives every player-game; we aggregate to a per-player points-per-game board.
pbox = sdv.mbb.load_mbb_player_boxscore(seasons=[2024])
print("player box rows:", pbox.shape)
(pbox
 .filter(pl.col("points").is_not_null())
 .group_by(["athlete_display_name", "team_short_display_name"])
 .agg(
     pl.len().alias("g"),
     pl.col("points").cast(pl.Float64, strict=False).mean().round(1).alias("ppg"),
 )
 .filter(pl.col("g") >= 20)  # drop small samples
 .sort("ppg", descending=True)
 .head(10))
Recipe 4: Play-by-play slice for one game (ESPN)
espn_mbb_pbp returns a dict;
its plays list is event-level. We frame it and pull just the scoring plays of
the 2024 national championship (UConn vs. Purdue, game_id=401638636).
pbp = safe("pbp 401638636", lambda: sdv.mbb.espn_mbb_pbp(game_id=401638636))
if isinstance(pbp, dict) and pbp.get("plays"):
    plays = pl.DataFrame(pbp["plays"], infer_schema_length=None)
    keep = ["period.number", "clock.displayValue", "text", "scoringPlay",
            "homeScore", "awayScore"]
    out = (plays.select([c for c in keep if c in plays.columns])
           .filter(pl.col("scoringPlay") == True)  # noqa: E712
           .head(10))
else:
    out = "play-by-play unavailable right now"
out
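The dotted names in keep (period.number, clock.displayValue) read like nested play records flattened with a "." separator. Whether the wrapper already returns them in that shape is an assumption here; the sketch below only illustrates the naming scheme on a hypothetical record, using a hypothetical flatten helper.

```python
def flatten(record, parent="", sep="."):
    """Flatten nested dicts into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Hypothetical play record, shaped like the columns selected above.
play = {
    "period": {"number": 2},
    "clock": {"displayValue": "0:45"},
    "text": "Made three-point jumper",
    "scoringPlay": True,
}
flat_play = flatten(play)
print(sorted(flat_play))
```

If a feed ever hands back nested dicts, flattening each record this way before building the frame would give column names matching the keep list.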
Recipe 5: Best net scoring margin (parquet)
load_mbb_team_boxscore gives one row per team-game with the opponent's score attached, so a single group-by ranks every program by average points scored minus average points allowed, a simple one-number power proxy. Pure parquet, no live endpoint.
tbox = sdv.mbb.load_mbb_team_boxscore(seasons=[2024])
print("team box rows:", tbox.shape)
(tbox
 .group_by("team_display_name")
 .agg(
     pl.len().alias("g"),
     pl.col("team_score").cast(pl.Float64, strict=False).mean().round(1).alias("ppg"),
     pl.col("opponent_team_score").cast(pl.Float64, strict=False).mean().round(1).alias("opp_ppg"),
 )
 .with_columns((pl.col("ppg") - pl.col("opp_ppg")).round(1).alias("net_margin"))
 .filter(pl.col("g") >= 25)  # drop teams with few games in the data
 .sort("net_margin", descending=True)
 .head(10))
Recipe 6: Best 3-point shooting teams (parquet)
Same team-box parquet, different question: sum makes and attempts across the season, then divide. A minimum-attempts filter keeps small-sample flukes off the board so the leaders are real volume shooters.
(tbox
 .group_by("team_display_name")
 .agg(
     pl.col("three_point_field_goals_made")
       .cast(pl.Float64, strict=False).sum().alias("tpm"),
     pl.col("three_point_field_goals_attempted")
       .cast(pl.Float64, strict=False).sum().alias("tpa"),
 )
 .with_columns((pl.col("tpm") / pl.col("tpa") * 100).round(1).alias("three_pct"))
 .filter(pl.col("tpa") >= 500)
 .sort("three_pct", descending=True)
 .select(["team_display_name", "tpm", "tpa", "three_pct"])
 .head(10))
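Why sum first and divide second? Averaging per-game percentages gives a two-attempt game the same weight as a 28-attempt game. A quick worked check in plain Python, on a made-up two-game sample (the numbers are illustrative, not real data):

```python
# Hypothetical two-game sample: (threes made, threes attempted).
games = [(1, 2), (8, 28)]

# Pooled rate: total makes over total attempts (what the recipe computes).
made = sum(m for m, _ in games)
attempted = sum(a for _, a in games)
pooled_pct = round(made / attempted * 100, 1)

# Naive alternative: average the per-game percentages.
mean_of_pcts = round(sum(m / a * 100 for m, a in games) / len(games), 1)

print(pooled_pct, mean_of_pcts)  # 30.0 39.3
```

The pooled figure reflects all 30 attempts equally, while the naive mean is pulled upward by the tiny 1-for-2 game.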
Recipe 7: Most efficient scorers (true shooting %)
Points-per-game rewards volume; true shooting % rewards efficiency. It folds twos, threes and free throws into one rate via TS% = PTS / (2 × (FGA + 0.44 × FTA)). We compute it straight from load_mbb_player_boxscore, keeping only high-volume scorers (at least 25 games and 400 points).
pbox = sdv.mbb.load_mbb_player_boxscore(seasons=[2024])
(pbox
 .filter(pl.col("points").is_not_null())
 .group_by(["athlete_display_name", "team_abbreviation"])
 .agg(
     pl.len().alias("g"),
     pl.col("points").cast(pl.Float64, strict=False).sum().alias("pts"),
     pl.col("field_goals_attempted").cast(pl.Float64, strict=False).sum().alias("fga"),
     pl.col("free_throws_attempted").cast(pl.Float64, strict=False).sum().alias("fta"),
 )
 .with_columns(
     (pl.col("pts") / (2 * (pl.col("fga") + 0.44 * pl.col("fta"))) * 100)
     .round(1).alias("ts_pct"))
 .filter((pl.col("g") >= 25) & (pl.col("pts") >= 400))
 .sort("ts_pct", descending=True)
 .select(["athlete_display_name", "team_abbreviation", "g", "pts", "ts_pct"])
 .head(10))
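To sanity-check the formula by hand, here is the same arithmetic in plain Python. The stat line is made up for illustration, and true_shooting_pct is a hypothetical helper, not a sportsdataverse function.

```python
def true_shooting_pct(pts, fga, fta):
    """TS% = PTS / (2 * (FGA + 0.44 * FTA)), returned as a percentage."""
    attempts = fga + 0.44 * fta
    if attempts == 0:
        return None  # no shooting attempts, so the rate is undefined
    return round(pts / (2 * attempts) * 100, 1)


# Made-up season line: 600 points on 400 field-goal and 150 free-throw attempts.
ts = true_shooting_pct(pts=600, fga=400, fta=150)
print(ts)  # 64.4
```

Here 0.44 × 150 = 66 free-throw "possessions", so the denominator is 2 × 466 = 932 and 600 / 932 ≈ 64.4%.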
Recipe 8: One conference's power board (ESPN, filter)
espn_mbb_conferences is the group catalog; espn_mbb_standings carries a group_name per team. Filter standings to a single league (here the Big 12) to get a clean intra-conference pecking order. If the filter matches nothing, the recipe falls back to the overall top 12.
confs = safe("conferences", lambda: sdv.mbb.espn_mbb_conferences())
if confs is not None and getattr(confs, "height", 0):
    print("some conferences:",
          confs.filter(pl.col("is_conference"))["name"].to_list()[:8])
st = safe("standings 2024", lambda: sdv.mbb.espn_mbb_standings(season=2024))
if st is not None and getattr(st, "height", 0) and "group_name" in st.columns:
    keep = ["team_display_name", "wins", "losses", "win_percent", "point_differential"]
    out = (st.filter(pl.col("group_name").str.contains("Big 12"))
           .select([c for c in keep if c in st.columns])
           .sort("win_percent", descending=True)
           .head(12))
    # Fallback: no Big 12 match, so show the overall top 12 instead.
    out = out if out.height else st.select(
        [c for c in keep if c in st.columns]).sort(
        "win_percent", descending=True).head(12)
else:
    out = "standings unavailable right now"
out
Recipe 9: A team's full season schedule (ESPN)
espn_mbb_team_schedule returns every game on one team's slate for a season (matchup name, week and season type), perfect for building an opponent list. We use UConn's 2024 championship run, reusing the row lookup from Recipe 2.
tid_sched = int(row["team_id"][0]) if row.height else 41  # UConn fallback
tsched = safe(
    f"team schedule {tid_sched}",
    lambda: sdv.mbb.espn_mbb_team_schedule(team_id=tid_sched, season=2024),
)
if tsched is not None and getattr(tsched, "height", 0):
    keep = ["id", "short_name", "season_type_name", "week_text"]
    out = tsched.select([c for c in keep if c in tsched.columns]).head(12)
else:
    out = "team schedule unavailable right now"
out
Recipe 10: Top rebounding teams (FoxSports)
fox_mbb_league_leaders isn't just a player board: flip who="team" and pick category="rebounds" to rank programs on the glass straight from FoxSports. No IDs needed.
team_reb = safe(
    "fox team rebounds",
    lambda: sdv.mbb.fox_mbb_league_leaders(category="rebounds", who="team"),
)
if team_reb is not None and getattr(team_reb, "height", 0):
    keep = ["teams", "gp", "w", "l", "ppg", "ppg_diff"]
    out = team_reb.select([c for c in keep if c in team_reb.columns]).head(10)
else:
    out = "Fox team leaders unavailable right now"
out
Recipe 11: Crunch-time buckets (parquet PBP)
load_mbb_pbp is the whole season's play-by-play in one parquet, no live game needed. We slice it to scoring plays in the final minute of the second half or of any overtime period (period_number >= 2): every late-game dagger across the year.
season_pbp = sdv.mbb.load_mbb_pbp(seasons=[2024])
print("season pbp rows:", season_pbp.shape)
(season_pbp
 .filter(
     (pl.col("scoring_play") == True)  # noqa: E712
     & (pl.col("period_number") >= 2)  # second half and overtime
     & (pl.col("end_period_seconds_remaining").cast(pl.Float64, strict=False) <= 60)
 )
 .select(["game_id", "period_display_value", "clock_display_value",
          "text", "home_score", "away_score"])
 .head(10))
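If you ever need seconds remaining from a clock string rather than a ready-made column, a small parser is enough. This is a sketch under stated assumptions: clock_to_seconds is a hypothetical helper, and the 'M:SS' and bare-seconds formats are assumed shapes for the clock display, not confirmed from the data release.

```python
def clock_to_seconds(clock: str) -> float:
    """Convert 'M:SS' (or bare seconds such as '45.3') to seconds remaining."""
    if ":" in clock:
        minutes, seconds = clock.split(":", 1)
        return int(minutes) * 60 + float(seconds)
    return float(clock)


for sample in ("0:45", "12:03", "45.3"):
    print(sample, "->", clock_to_seconds(sample))

# A play counts as final-minute when at most 60 seconds remain.
in_final_minute = clock_to_seconds("0:45") <= 60
```

The `<= 60` comparison mirrors the cutoff used in the recipe's filter.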
Recipe 12: Double-double leaders (pandas interop)
Prefer pandas? Pass return_as_pandas=True to any loader and stay in your comfort zone. Here we count games where a player hit double digits in at least two of points / rebounds / assists (the classic double-double) entirely in pandas.
import pandas as pd
pbox_pd = sdv.mbb.load_mbb_player_boxscore(seasons=[2024], return_as_pandas=True)
for col in ["points", "rebounds", "assists"]:
    pbox_pd[col] = pd.to_numeric(pbox_pd[col], errors="coerce")
pbox_pd["is_dd"] = (pbox_pd[["points", "rebounds", "assists"]] >= 10).sum(axis=1) >= 2
(pbox_pd[pbox_pd["is_dd"]]
 .groupby(["athlete_display_name", "team_abbreviation"])
 .size()
 .reset_index(name="double_doubles")
 .sort_values("double_doubles", ascending=False)
 .head(10)
 .reset_index(drop=True))
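The is_dd rule is easy to check on a few hand-made stat lines. The sketch below restates it in plain Python; is_double_double and the sample lines are hypothetical, for illustration only.

```python
def is_double_double(points, rebounds, assists):
    """True when at least two of the three categories reach 10."""
    return sum(stat >= 10 for stat in (points, rebounds, assists)) >= 2


# Made-up (points, rebounds, assists) lines.
lines = {
    "big night": (22, 11, 3),
    "near miss": (9, 10, 9),
    "triple-double": (10, 10, 10),
}
flags = {name: is_double_double(*stats) for name, stats in lines.items()}
print(flags)
```

As in the pandas expression, a triple-double also counts, since it satisfies "at least two" categories.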
One call, the whole game: espn_mbb_summary
espn_mbb_summary is the Swiss
army knife: a single event_id returns a dict with team & player box scores,
play-by-play, win probability, leaders, officials and more. Let's grab the team
box score from that 2024 title game.
summ = safe("summary 401638636", lambda: sdv.mbb.espn_mbb_summary(event_id=401638636))
if isinstance(summ, dict) and summ.get("boxscore_team") is not None:
    tb = summ["boxscore_team"]
    tb = tb if isinstance(tb, pl.DataFrame) else pl.DataFrame(tb)
    print("box score sections available:", list(summ)[:8])
    out = tb.head()
else:
    out = "summary unavailable right now"
out
Who suited up: game rosters
espn_mbb_game_rosters
returns one row per dressed player for a game, flagging starters. Handy for
joining onto play-by-play or box scores.
gr = safe("game rosters 401638636", lambda: sdv.mbb.espn_mbb_game_rosters(game_id=401638636))
if gr is not None and getattr(gr, "height", 0):
    keep = ["athlete_display_name", "team_abbreviation", "starter"]
    out = gr.select([c for c in keep if c in gr.columns]).head(10)
else:
    out = "game rosters unavailable right now"
out
A multi-season pipeline: highest-scoring tournament games
The schedule loader is stable, so here's a pure-polars analysis with no live dependency. We load the 2024 season schedule and rank every game by combined points. The seasons argument takes a list, so adding more years extends the same pipeline across seasons. No tournament filter is applied here, so regular-season shootouts appear alongside any March Madness games.
schedule_2024 = sdv.mbb.load_mbb_schedule(seasons=[2024])
print("season schedule rows:", schedule_2024.shape)
(schedule_2024
 .with_columns(
     (pl.col("home_score").cast(pl.Int64, strict=False)
      + pl.col("away_score").cast(pl.Int64, strict=False)).alias("total"))
 .filter(pl.col("total").is_not_null())
 .sort("total", descending=True)
 .select(["game_date", "home_display_name", "away_display_name",
          "home_score", "away_score", "total"])
 .head(10))
Where to next
- ESPN wrappers (espn_mbb_*) cover the live site + core APIs: scoreboards, standings, rankings, summaries, play-by-play and more. See the additional and site reference pages.
- FoxSports wrappers (fox_mbb_*): leaders, standings, rosters, boxscores and odds in additional.
- Loaders (load_mbb_*) read whole seasons of parquet; see loaders. Pass return_as_pandas=True anywhere for pandas instead of polars.
- R user? The same surface lives in hoopR (NBA + NCAA men's basketball).
- Women's hoops? Check out the WBB module and its companion wehoop.
Now go bracket something!