Men's college basketball with sportsdataverse-py
Welcome to Selection-Sunday-grade hoops data! In a handful of lines of Python you're about to pull NCAA Division I men's basketball data (full schedules, play-by-play, standings, rosters, statistical leaders and multi-season parquet archives) and get it all back as tidy polars DataFrames ready to model.
sportsdataverse.mbb leads with two premium sources:
- ESPN (espn_mbb_*): the site + core APIs behind ESPN.com, covering live scoreboards, schedules, standings, rankings, box scores, win probability and play-by-play.
- FoxSports (fox_mbb_*): FoxSports' league-leader, standings, roster, boxscore and odds feeds.
Plus release loaders (load_mbb_*) that hand you whole seasons of play-by-play, box scores, shots and schedules from the data repo in one call.
R user? The men's-basketball companion is hoopR (NBA + NCAA). Let's tip off!
The toolbox
Every accessor returns a tidy polars DataFrame by default; pass return_as_pandas=True for pandas. The richest live surfaces are ESPN and Fox, while the load_* loaders read pre-built parquet from the data release (rock-solid, no live API).
| Function | What it gives you | Source |
|---|---|---|
| espn_mbb_teams | Every D-I team (grab team_ids) | ESPN ⭐ |
| espn_mbb_schedule | Games + results for a date / window | ESPN ⭐ |
| espn_mbb_scoreboard | Rich scoreboard for a date (status, lines, odds) | ESPN ⭐ |
| espn_mbb_standings | Conference standings, one row per team | ESPN ⭐ |
| espn_mbb_rankings | AP / Coaches poll (in-season) | ESPN ⭐ |
| espn_mbb_summary | Full game summary: box, plays, win prob | ESPN ⭐ |
| espn_mbb_team_roster | A team's roster | ESPN ⭐ |
| espn_mbb_pbp | Event-level play-by-play for a game | ESPN ⭐ |
| espn_mbb_game_rosters | Who dressed + started for one game | ESPN ⭐ |
| espn_mbb_player_stats | A player's season stat line | ESPN ⭐ |
| fox_mbb_league_leaders | Stat leaders (scoring, rebounds, …) | Fox ⭐ |
| fox_mbb_standings | Fox conference standings for a team | Fox ⭐ |
| fox_mbb_team_roster | Fox roster for a team | Fox |
| espn_mbb_team_schedule | One team's full season schedule | ESPN ⭐ |
| espn_mbb_conferences | Conference / group catalog | ESPN ⭐ |
| load_mbb_schedule | Whole-season schedule parquet | loader |
| load_mbb_player_boxscore | Season player box scores | loader |
| load_mbb_team_boxscore | Season team box scores | loader |
| load_mbb_pbp | Season play-by-play parquet | loader |
| most_recent_mbb_season | Current season-year helper | helper |

⭐ = premium live source.
Setup
pip install sportsdataverse
No API key needed: ESPN, Fox and the parquet loaders are all open.
import polars as pl
import sportsdataverse as sdv
pl.Config.set_tbl_rows(10)
print("most recent MBB season:", sdv.mbb.most_recent_mbb_season())
most recent MBB season: 2026
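A season-year helper has to map a calendar date onto a season label. Judging from the outputs on this page, a season is labeled by the year in which it ends (season=2024 is the 2023-24 season). The sketch below is a hypothetical stand-in: season_for_date is not part of sportsdataverse, and the November start month is an assumption. The library's own cutoff may differ, so prefer most_recent_mbb_season() in real code.

```python
from datetime import date


def season_for_date(d: date, start_month: int = 11) -> int:
    """Hypothetical helper (not sportsdataverse): map a date to an MBB season label.

    Assumes a season is named for the calendar year in which it ends and that a
    new season begins in `start_month` (November by default, an assumption).
    """
    # Dates on or after the assumed start month belong to next year's label.
    return d.year + 1 if d.month >= start_month else d.year


print(season_for_date(date(2024, 4, 8)))   # → 2024 (2024 title game)
print(season_for_date(date(2025, 12, 1)))  # → 2026 (early 2025-26 season)
```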
ESPN's live endpoints (scoreboard, rankings, standings, a single game's play-by-play) are seasonal: in the offseason a poll or scoreboard can come back empty. So we use a tiny safe() helper that returns the frame when the feed is up and prints a friendly one-liner when it isn't, never a scary traceback. The load_* parquet loaders are stable year-round, so we call those directly.
def safe(label, thunk):
    """Run a live call defensively; return its result or None with a note."""
    try:
        out = thunk()
        # "ok" means we got something back and, if it is a polars frame, it has rows.
        ok = out is not None and (not hasattr(out, "height") or out.height)
        print(f"{'✅' if ok else 'ℹ️ '} {label}{'' if ok else ': no rows right now'}")
        return out
    except Exception as e:  # noqa: BLE001 -- demo resilience
        print(f"⚠️ {label}: unavailable right now ({type(e).__name__})")
        return None
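The ok test inside safe() decides whether you see the result or the "no rows" note. This sketch pulls that one expression into a standalone function and runs it on a few inputs. EmptyFrame and FullFrame are hypothetical stubs that only carry a height attribute, standing in for polars frames.

```python
def looks_ok(out) -> bool:
    """Same rule safe() uses: present, and non-empty if it has a .height."""
    return bool(out is not None and (not hasattr(out, "height") or out.height))


class EmptyFrame:
    """Hypothetical stand-in for a polars frame with zero rows."""
    height = 0


class FullFrame:
    """Hypothetical stand-in for a polars frame with rows."""
    height = 5


for label, result in [("None", None), ("empty frame", EmptyFrame()),
                      ("full frame", FullFrame()), ("plain string", "schedule unavailable")]:
    print(f"{label:>12}: {looks_ok(result)}")
```

Anything without a .height attribute passes the check even when it is empty. That includes a pandas DataFrame from return_as_pandas=True, so the "no rows" note only fires for polars results.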
Every team in Division I
Start with espn_mbb_teams: one row per program, with the team_id you'll pass into roster, schedule and summary calls. This is a plain catalog fetch, so it's reliable year-round.
teams = sdv.mbb.espn_mbb_teams()
print("teams:", teams.shape)
teams.select(["team_id", "team_location", "team_name", "team_abbreviation", "team_is_active"]).head()
teams: (362, 14)
shape: (5, 5)
┌─────────┬───────────────────┬──────────────┬───────────────────┬────────────────┐
│ team_id ┆ team_location     ┆ team_name    ┆ team_abbreviation ┆ team_is_active │
│ ---     ┆ ---               ┆ ---          ┆ ---               ┆ ---            │
│ str     ┆ str               ┆ str          ┆ str               ┆ bool           │
╞═════════╪═══════════════════╪══════════════╪═══════════════════╪════════════════╡
│ 2000    ┆ Abilene Christian ┆ Wildcats     ┆ ACU               ┆ true           │
│ 2005    ┆ Air Force         ┆ Falcons      ┆ AF                ┆ true           │
│ 2006    ┆ Akron             ┆ Zips         ┆ AKR               ┆ true           │
│ 2010    ┆ Alabama A&M       ┆ Bulldogs     ┆ AAMU              ┆ true           │
│ 333     ┆ Alabama           ┆ Crimson Tide ┆ ALA               ┆ true           │
└─────────┴───────────────────┴──────────────┴───────────────────┴────────────────┘
Schedule & scores for a date window
espn_mbb_schedule takes either a single dates=YYYYMMDD or a 'YYYYMMDD-YYYYMMDD' window and returns one row per game with final scores. Here's championship day of the 2024 tournament.
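If your dates start out as datetime.date objects, a small helper can produce either dates= form described above. espn_dates is a hypothetical name, not part of sportsdataverse. It only formats values into the YYYYMMDD and 'YYYYMMDD-YYYYMMDD' shapes.

```python
from datetime import date
from typing import Optional, Union


def espn_dates(start: date, end: Optional[date] = None) -> Union[int, str]:
    """Hypothetical helper: build a dates= value for espn_mbb_schedule.

    One day -> int YYYYMMDD; a window -> 'YYYYMMDD-YYYYMMDD' string.
    """
    if end is None:
        return int(start.strftime("%Y%m%d"))
    if end < start:
        raise ValueError("end must not be before start")
    # date supports strftime-style format specs inside f-strings.
    return f"{start:%Y%m%d}-{end:%Y%m%d}"


print(espn_dates(date(2024, 4, 8)))                    # → 20240408
print(espn_dates(date(2024, 4, 6), date(2024, 4, 8)))  # → 20240406-20240408
```

Either return value can be passed straight to espn_mbb_schedule as dates=.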
sched = safe(
"schedule 2024-04-08",
lambda: sdv.mbb.espn_mbb_schedule(dates=20240408),
)
(sched.select(["id", "home_display_name", "away_display_name", "home_score", "away_score"]).head()
if sched is not None and sched.height else "schedule unavailable")
✅ schedule 2024-04-08
shape: (1, 5)
┌───────────┬───────────────────┬─────────────────────┬────────────┬────────────┐
│ id        ┆ home_display_name ┆ away_display_name   ┆ home_score ┆ away_score │
│ ---       ┆ ---               ┆ ---                 ┆ ---        ┆ ---        │
│ str       ┆ str               ┆ str                 ┆ str        ┆ str        │
╞═══════════╪═══════════════════╪═════════════════════╪════════════╪════════════╡
│ 401638645 ┆ UConn Huskies     ┆ Purdue Boilermakers ┆ 75         ┆ 60         │
└───────────┴───────────────────┴─────────────────────┴────────────┴────────────┘
The rich scoreboard
espn_mbb_scoreboard is the deluxe version: for a given date it returns status, broadcast, betting lines and team line scores, 50 columns wide. It defaults to polars (we pass return_as_pandas=False anyway, to be explicit), and we peek at a tidy slice.
sb = safe(
"scoreboard 2024-04-08",
lambda: sdv.mbb.espn_mbb_scoreboard(dates=20240408, return_as_pandas=False),
)
if sb is not None and getattr(sb, "height", 0):
    keep = ["game_id", "short_name", "status_type_description",
            "home_team_short_display_name", "away_team_short_display_name"]
    # Only keep the requested columns this feed actually returned.
    out = sb.select([c for c in keep if c in sb.columns]).head()
else:
    out = "scoreboard empty right now (offseason)"
out
✅ scoreboard 2024-04-08
shape: (1, 3)
┌───────────┬─────────────┬─────────────────────────┐
│ game_id   ┆ short_name  ┆ status_type_description │
│ ---       ┆ ---         ┆ ---                     │
│ str       ┆ str         ┆ str                     │
╞═══════════╪═════════════╪═════════════════════════╡
│ 401638645 ┆ PUR VS CONN ┆ Final                   │
└───────────┴─────────────┴─────────────────────────┘
Conference standings
espn_mbb_standings returns one row per team for a season, with wins, losses, win percentage, point differential and conference grouping. Sorting by win_percent gives a quick power look across the whole league.
standings = safe(
"standings 2024",
lambda: sdv.mbb.espn_mbb_standings(season=2024, return_as_pandas=False),
)
if standings is not None and getattr(standings, "height", 0):
    keep = ["team_display_name", "group_name", "wins", "losses",
            "win_percent", "point_differential"]
    out = (standings.select([c for c in keep if c in standings.columns])
           .sort("win_percent", descending=True).head(10))
else:
    out = "standings unavailable"
out
✅ standings 2024
shape: (10, 6)
┌───────────────────────┬───────────────────────┬──────┬────────┬─────────────┬────────────────────┐
│ team_display_name     ┆ group_name            ┆ wins ┆ losses ┆ win_percent ┆ point_differential │
│ ---                   ┆ ---                   ┆ ---  ┆ ---    ┆ ---         ┆ ---                │
│ str                   ┆ str                   ┆ f64  ┆ f64    ┆ f64         ┆ f64                │
╞═══════════════════════╪═══════════════════════╪══════╪════════╪═════════════╪════════════════════╡
│ McNeese Cowboys       ┆ Southland Conference  ┆ 17.0 ┆ 1.0    ┆ 0.9444444   ┆ 308.0              │
│ Vermont Catamounts    ┆ America East          ┆ 15.0 ┆ 1.0    ┆ 0.9375      ┆ 179.0              │
│                       ┆ Conference            ┆      ┆        ┆             ┆                    │
│ Saint Mary's Gaels    ┆ West Coast Conference ┆ 15.0 ┆ 1.0    ┆ 0.9375      ┆ 312.0              │
│ UConn Huskies         ┆ Big East Conference   ┆ 18.0 ┆ 2.0    ┆ 0.9         ┆ 277.0              │
│ South Florida Bulls   ┆ American Conference   ┆ 16.0 ┆ 2.0    ┆ 0.8888889   ┆ 124.0              │
│ Colgate Raiders       ┆ Patriot League        ┆ 16.0 ┆ 2.0    ┆ 0.8888889   ┆ 217.0              │
│ App State             ┆ Sun Belt Conference   ┆ 16.0 ┆ 2.0    ┆ 0.8888889   ┆ 198.0              │
│ Mountaineers          ┆                       ┆      ┆        ┆             ┆                    │
│ Gonzaga Bulldogs      ┆ West Coast Conference ┆ 14.0 ┆ 2.0    ┆ 0.875       ┆ 305.0              │
│ Princeton Tigers      ┆ Ivy League            ┆ 12.0 ┆ 2.0    ┆ 0.857143    ┆ 136.0              │
│ North Carolina Tar    ┆ Atlantic Coast        ┆ 17.0 ┆ 3.0    ┆ 0.85        ┆ 210.0              │
│ Heels                 ┆ Conference            ┆      ┆        ┆             ┆                    │
└───────────────────────┴───────────────────────┴──────┴────────┴─────────────┴────────────────────┘
Cookbook: common MBB tasks
Now the fun part: real tasks you'll reach for constantly, each built on a premium ESPN or Fox wrapper. Every recipe is guarded so a transient or offseason hiccup prints a note instead of breaking the page.
Recipe 1: National scoring leaders (FoxSports)
fox_mbb_league_leaders serves the leaderboard direct from FoxSports: pick a category (scoring, rebounds, assists, …) and who (player or team). No IDs needed.
leaders = safe(
"fox scoring leaders",
lambda: sdv.mbb.fox_mbb_league_leaders(category="scoring", who="player"),
)
if leaders is not None and getattr(leaders, "height", 0):
    keep = ["players", "gp", "mpg", "ppg", "pts"]
    out = leaders.select([c for c in keep if c in leaders.columns]).head(10)
else:
    out = "Fox leaders unavailable right now"
out
✅ fox scoring leaders
shape: (10, 5)
┌─────────┬─────┬──────┬──────┬──────┐
│ players ┆ gp  ┆ mpg  ┆ ppg  ┆ pts  │
│ ---     ┆ --- ┆ ---  ┆ ---  ┆ ---  │
│ str     ┆ str ┆ str  ┆ str  ┆ str  │
╞═════════╪═════╪══════╪══════╪══════╡
│ 1       ┆ 37  ┆ null ┆ null ┆ null │
│ 2       ┆ 37  ┆ null ┆ null ┆ null │
│ 3       ┆ 37  ┆ null ┆ null ┆ null │
│ 4       ┆ 36  ┆ null ┆ null ┆ null │
│ 5       ┆ 36  ┆ null ┆ null ┆ null │
│ 6       ┆ 35  ┆ null ┆ null ┆ null │
│ 7       ┆ 35  ┆ null ┆ null ┆ null │
│ 8       ┆ 35  ┆ null ┆ null ┆ null │
│ 9       ┆ 35  ┆ null ┆ null ┆ null │
│ 10      ┆ 35  ┆ null ┆ null ┆ null │
└─────────┴─────┴──────┴──────┴──────┘
Recipe 2: Look up a team's roster (ESPN)
Grab a team_id from espn_mbb_teams, then espn_mbb_team_roster returns the current roster. Here we resolve UConn (the 2024 champs) by abbreviation so the recipe is self-contained.
row = teams.filter(pl.col("team_abbreviation") == "CONN")
tid = int(row["team_id"][0]) if row.height else 41 # 41 = UConn fallback
roster = safe(
f"roster team_id={tid}",
lambda: sdv.mbb.espn_mbb_team_roster(team_id=tid, return_as_pandas=False),
)
if roster is not None and getattr(roster, "height", 0):
    keep = ["full_name", "jersey", "display_height", "display_weight"]
    out = roster.select([c for c in keep if c in roster.columns]).head(12)
else:
    out = "roster unavailable right now"
out
✅ roster team_id=41
shape: (12, 4)
┌──────────────────┬────────┬────────────────┬────────────────┐
│ full_name        ┆ jersey ┆ display_height ┆ display_weight │
│ ---              ┆ ---    ┆ ---            ┆ ---            │
│ str              ┆ str    ┆ str            ┆ str            │
╞══════════════════╪════════╪════════════════╪════════════════╡
│ Solo Ball        ┆ 1      ┆ 6' 4"          ┆ 200 lbs        │
│ Silas Demary Jr. ┆ 2      ┆ 6' 4"          ┆ 195 lbs        │
│ Rrezon Elezaj    ┆ 10     ┆ 7' 1"          ┆ 225 lbs        │
│ Jacob Furphy     ┆ 7      ┆ 6' 6"          ┆ 205 lbs        │
│ Alex Karaban     ┆ 11     ┆ 6' 8"          ┆ 230 lbs        │
│ …                ┆ …      ┆ …              ┆ …              │
│ Braylon Mullins  ┆ 24     ┆ 6' 6"          ┆ 196 lbs        │
│ Uroš Paunovic    ┆ 77     ┆ 6' 3"          ┆ 190 lbs        │
│ Tarris Reed Jr.  ┆ 5      ┆ 6' 11"         ┆ 265 lbs        │
│ Eric Reibe       ┆ 12     ┆ 7' 1"          ┆ 260 lbs        │