Welcome to sportsdataverse-py: the cross-sport quickstart
One pip install, every major league. sportsdataverse is a single Python
package that speaks to the official, premium native data feeds across the
sporting world (the same endpoints the leagues use to power their own apps),
plus the ESPN mirror and pre-built parquet release loaders. Everything
comes back as a tidy polars DataFrame, ready to model.
This page is your map to the whole package. By the end you'll be able to:
- see every datasource available for every league, with links straight to its tutorial and its reference index;
- predict function names you've never seen: sportsdataverse uses one consistent naming contract, so knowing one function tells you the others;
- cook through ~20 cross-sport recipes that show the breadth in action.
If you've used the R sisters (hoopR, wehoop, cfbfastR, baseballr, fastRhockey, oddsapiR), the names here will feel like home. Let's take the tour!
1 · The master index: every datasource, every league
Here's the whole package on one page. Each row is a league (or the betting-odds module); each cell tells you which datasource families are wired up. A ✓ means the family is wired up and - means it isn't. ★ marks the premium native feeds (the leagues' own APIs / tracking systems / Statcast). Click a league's tutorial for the deep dive, or its reference for the full function index.
| League | Tutorial · Reference | ESPN (espn_<lg>_*) | Native premium API | Tracking / analytics | Release loaders (load_*) |
|---|---|---|---|---|---|
| NBA | tutorial · ref | ✓ | - | - | load_nba_pbp, load_nba_team_boxscore |
| WNBA | tutorial · ref | ✓ | - | - | load_wnba_pbp, load_wnba_player_boxscore |
| MBB (NCAA M) | tutorial · ref | ✓ | - | - | load_mbb_pbp, load_mbb_team_boxscore |
| WBB (NCAA W) | tutorial · ref | ✓ | - | - | load_wbb_pbp, load_wbb_team_boxscore |
| NFL | tutorial · ref | ✓ | ★ nfl_* (api.nfl.com) | ★ Next Gen Stats nfl_ngs_* | load_nfl_pbp, load_nfl_player_stats, load_injuries |
| CFB (College) | tutorial · ref | ✓ | yahoo_cfb_*, fox_cfb_* | - | load_cfb_pbp |
| MLB | tutorial · ref | ✓ | ★ mlb_api_* (MLB Stats API) | ★ Statcast statcast_* | load_mlb_pbp, load_mlb_team_boxscore |
| NHL | tutorial · ref | ✓ | ★ nhl_* (api-web) | ★ NHL EDGE nhl_edge_* | load_nhl_pbp, load_nhl_team_boxscore |
| PWHL (Women's pro) | tutorial · ref | - | ★ pwhl_* (HockeyTech) | corsi / shifts / TOI | load_pwhl_schedules |
| AHL (Minor pro) | tutorial · ref | - | ★ ahl_* (HockeyTech) | corsi / shifts / TOI | - |
| OHL (CHL junior) | tutorial · ref | - | ★ ohl_* (HockeyTech) | corsi / shifts / TOI | - |
| WHL (CHL junior) | tutorial · ref | - | ★ whl_* (HockeyTech) | corsi / shifts / TOI | - |
| QMJHL (CHL junior) | tutorial · ref | - | ★ qmjhl_* (HockeyTech) | corsi / shifts / TOI | - |
| Betting odds | tutorial · ref | - | ★ toa_* (The Odds API) | line history / props | - |
Tip: HockeyTech leagues (AHL/OHL/WHL/QMJHL/PWHL) ship public client keys, so no setup is needed. Only the betting-odds module wants a free
ODDS_API_KEY.
The five function styles
Across all those rows, only five families exist. Learn the shape of each once and you can read any function name in the package:
- Live ESPN wrappers: espn_<lg>_* (e.g. espn_nba_teams, espn_wbb_scoreboard). The same set exists for every ESPN league: teams, rosters, scoreboards, standings, schedules, play-by-play, box scores.
- Native premium API wrappers, the league's own feed: nfl_* (api.nfl.com), mlb_api_* (MLB Stats API), nhl_* (api-web), pwhl_* / ahl_* / ohl_* / whl_* / qmjhl_* (HockeyTech), toa_* (The Odds API).
- Tracking / analytics feeds, the really premium stuff: statcast_* (Baseball Savant), nhl_edge_* (player tracking), nfl_ngs_* (Next Gen Stats).
- Release / parquet loaders: load_<sport>_*() reads a pre-built parquet release (fast, reliable, whole-season-at-once): load_nba_pbp, load_mlb_team_boxscore, load_pwhl_schedules, …
- Parser layer: parse_* turns a raw native payload into a tidy frame (e.g. parse_mlb_api_standings). Most wrappers parse for you; the parsers are there when you fetch the raw Dict yourself.
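The prefixes are regular enough to sort names mechanically. Here's a minimal sketch that buckets a function name by prefix, using only the prefixes listed above. It is not part of sportsdataverse: classify and its family labels are hypothetical. The more specific tracking prefixes are checked before the broader league prefixes, so nhl_edge_* isn't mistaken for nhl_*.

```python
# Hypothetical helper, not a sportsdataverse API: bucket a name by its prefix.
# Order matters: specific prefixes (nhl_edge_, nfl_ngs_) come before broad ones (nhl_, nfl_).
FAMILIES = [
    ("parser", ("parse_",)),
    ("loader", ("load_",)),
    ("espn", ("espn_",)),
    ("tracking", ("statcast_", "nhl_edge_", "nfl_ngs_")),
    ("native", ("nfl_", "mlb_api_", "nhl_", "pwhl_", "ahl_", "ohl_",
                "whl_", "qmjhl_", "toa_")),
]

def classify(name):
    '''Return the family label for a function name, or "unknown".'''
    for family, prefixes in FAMILIES:
        if name.startswith(prefixes):  # str.startswith accepts a tuple
            return family
    return "unknown"

for example in ["espn_nba_teams", "load_pwhl_schedules", "parse_mlb_api_standings"]:
    print(f"{example:>26} -> {classify(example)}")
```

Names outside the five families, such as the discovery helpers, simply come back as "unknown".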
The return contract never changes. Every wrapper gives you polars by
default; pass return_as_pandas=True for a pandas frame, and on the native
APIs pass return_parsed=False for the raw JSON Dict. One contract, every
sport.
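To make the contract concrete, here's a small sketch of a helper that maps the output you want to the keyword arguments described above. contract_kwargs is a hypothetical name, not a package function; only return_as_pandas and return_parsed come from the contract itself.

```python
# Hypothetical convenience helper; only the two keyword names come from the text above.
def contract_kwargs(output="polars"):
    '''Map a desired output ("polars", "pandas", "raw") to wrapper keyword arguments.'''
    if output == "polars":
        return {}                            # polars is the default, nothing to pass
    if output == "pandas":
        return {"return_as_pandas": True}
    if output == "raw":
        return {"return_parsed": False}      # applies to the native API wrappers
    raise ValueError(f"unknown output: {output!r}")

print(contract_kwargs("pandas"))
```

You would splat the result into whichever wrapper you're calling, so the choice of output format lives in one place.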
Setup
pip install sportsdataverse
# or
uv add sportsdataverse
Every league is a submodule of the umbrella package, and the headline cross-league wrappers + discovery helpers are re-exported at the top level. Let's import it.
import os
import polars as pl
import sportsdataverse as sdv
import sportsdataverse.odds as odds
# Every league hangs off the top-level package:
[m for m in dir(sdv) if m in
 ("cfb", "nfl", "nba", "wnba", "mbb", "wbb", "nhl", "mlb", "pwhl",
  "ahl", "ohl", "whl", "qmjhl", "odds")]
Live endpoints are seasonal and occasionally rate-limited, and the
naming-convention loops below fan out many live calls at once, so a tiny
safe() helper runs every network call defensively. You get the frame when the
feed is up, and a friendly one-liner when it isn't, never a scary traceback.
That keeps this whole page runnable offline or in the off-season.
def safe(label, thunk):
    '''Run a live call; return its result, or print a one-liner and return None.'''
    try:
        out = thunk()
        print(f"✅ {label}")
        return out
    except Exception as e:  # noqa: BLE001 -- demo resilience
        print(f"⚠️ {label}: unavailable right now ({type(e).__name__})")
        return None
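If a feed is only briefly rate-limited, a retry or two is often enough. Here's a hedged sketch of a retrying variant. safe_retry, attempts and delay are hypothetical names, and the doubling pause between tries is an assumption, not something the package prescribes.

```python
import time

def safe_retry(label, thunk, attempts=3, delay=0.5):
    '''Like safe(), but retry a few times with a doubling pause before giving up.'''
    for attempt in range(1, attempts + 1):
        try:
            return thunk()
        except Exception as e:  # noqa: BLE001 -- demo resilience
            if attempt == attempts:
                print(f"{label}: unavailable after {attempts} tries ({type(e).__name__})")
                return None
            time.sleep(delay)
            delay *= 2  # wait twice as long before the next try

# Offline demo: a stand-in call that fails twice, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated outage")
    return "frame"

result = safe_retry("demo", flaky, delay=0.01)
print(result, "after", calls["n"], "calls")
```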
# Odds is the only module that wants a (free) key, so guard those cells:
HAS_KEY = bool(os.environ.get("ODDS_API_KEY"))
print("ODDS_API_KEY set:", HAS_KEY,
      "-> odds cells will" + ("" if HAS_KEY else " NOT") + " run live")
2 · The naming-convention superpower
Here's the centerpiece. sportsdataverse names things so predictably that knowing one function name tells you the others. The same style of data is exactly one rename away across every sport: swap the league slug and the call just works. Let's prove it.
The ESPN families are identical across every league
espn_<lg>_teams, espn_<lg>_team_roster, espn_<lg>_scoreboard,
espn_<lg>_standings exist for every ESPN league. A one-line helper +
getattr tours them all and returns the same shape each time.
def teams(league):
    '''Knowing one name (espn_<lg>_teams) gives you all of them.'''
    return getattr(sdv, f"espn_{league}_teams")()

rows = []
for lg in ["nba", "wnba", "nhl", "mlb"]:
    df = safe(f"espn_{lg}_teams", lambda lg=lg: teams(lg))
    rows.append({"league": lg.upper(),
                 "fn": f"espn_{lg}_teams()",
                 "n_teams": None if df is None else df.height,
                 "n_cols": None if df is None else df.width})
pl.DataFrame(rows)  # same columns, same shape: one contract, four leagues
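A bare getattr raises AttributeError when a league/family pair doesn't exist. Here's a minimal sketch of a lookup that checks first and fails with a clearer message. resolve is a hypothetical helper, and the SimpleNamespace below is only a stand-in for the sdv module so the example runs offline.

```python
from types import SimpleNamespace

def resolve(pkg, family, league):
    '''Return pkg.espn_<league>_<family>, or raise a readable LookupError.'''
    name = f"espn_{league}_{family}"
    fn = getattr(pkg, name, None)
    if not callable(fn):
        raise LookupError(f"{name} is not available on {getattr(pkg, '__name__', 'pkg')}")
    return fn

# Stand-in for the real package (an assumption, for offline use only).
fake_sdv = SimpleNamespace(espn_nba_teams=lambda: ["stub row"])

print(resolve(fake_sdv, "teams", "nba")())
try:
    resolve(fake_sdv, "teams", "xyz")
except LookupError as err:
    print(err)
```

With the real package you would pass sdv as the first argument; the check costs nothing and turns a typo in the slug into an obvious message.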
Same trick for the scoreboard and standings families: the call is identical, only the slug changes.
def call(family, league, **kw):
    '''Generic dispatcher: call("scoreboard", "nhl") -> espn_nhl_scoreboard().'''
    return getattr(sdv, f"espn_{league}_{family}")(**kw)

board = safe("espn_nfl_scoreboard", lambda: call("scoreboard", "nfl"))
stand = safe("espn_nba_standings", lambda: call("standings", "nba"))
# getattr(..., "height", None) keeps the print safe if a call returns a non-frame object
print("NFL scoreboard rows:", None if board is None else getattr(board, "height", None),
      "| NBA standings rows:", None if stand is None else getattr(stand, "height", None))
The loaders follow one pattern too
load_<sport>_pbp and load_<sport>_team_boxscore read pre-built parquet for
every sport: same signature (seasons=[...]), same return type. Knowing
load_nba_pbp means you already know load_nhl_pbp and load_mlb_pbp.
# A single getattr loop resolves the play-by-play loader for three different sports:
season = 2024
for sport in ["nba", "wnba", "nhl"]:
    fn = getattr(sdv, f"load_{sport}_pbp")  # raises AttributeError if the loader is missing
    print(f"load_{sport}_pbp(seasons=[{season}]) -> signature is identical for every sport")
# (we don't pull all of them here, since that's a lot of parquet; Recipe 3 runs one.)
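The "same signature" claim is easy to check with the standard library. Here's a sketch using inspect.signature. The stub loaders stand in for real load_<sport>_pbp functions (their bodies and the seasons-only parameter list are assumptions), so swap in the real ones to test the package itself.

```python
import inspect

def same_signature(*funcs):
    '''True when every function takes the same parameter names, in the same order.'''
    shapes = {tuple(inspect.signature(f).parameters) for f in funcs}
    return len(shapes) <= 1

# Stubs standing in for load_nba_pbp / load_nhl_pbp (assumed shape: seasons=[...]).
def load_nba_pbp_stub(seasons):
    return []

def load_nhl_pbp_stub(seasons):
    return []

# A deliberately different signature, to show the check failing.
def odd_one_out(year):
    return []

print(same_signature(load_nba_pbp_stub, load_nhl_pbp_stub))
print(same_signature(load_nba_pbp_stub, odd_one_out))
```

The comparison looks at parameter names and order only; defaults and annotations are ignored, which is usually enough for a quick consistency check.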
The HockeyTech leagues share one surface
AHL / OHL / WHL / QMJHL / PWHL all expose <lg>_schedule, <lg>_standings,
<lg>_teams, <lg>_team_roster, and most_recent_<lg>_season. Learn one, you
learned all five.
import sportsdataverse.ahl as ahl
import sportsdataverse.ohl as ohl
import sportsdataverse.whl as whl
import sportsdataverse.qmjhl as qmjhl
import sportsdataverse.pwhl as pwhl
HOCKEYTECH = {"ahl": ahl, "ohl": ohl, "whl": whl, "qmjhl": qmjhl, "pwhl": pwhl}
rows = []
for lg, mod in HOCKEYTECH.items():
    season = safe(f"most_recent_{lg}_season", getattr(mod, f"most_recent_{lg}_season"))
    rows.append({"league": lg.upper(),
                 "schedule_fn": f"{lg}_schedule()",
                 "standings_fn": f"{lg}_standings()",
                 "season": season})
pl.DataFrame(rows)
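Because the five leagues share one surface, the expected names can be generated from the slug and audited. Here's a short sketch; missing_names is a hypothetical helper, and fake_ahl is a deliberately incomplete stand-in module so the example runs offline.

```python
from types import SimpleNamespace

# The shared names listed above, with {lg} as the league slug.
SHARED = ["{lg}_schedule", "{lg}_standings", "{lg}_teams",
          "{lg}_team_roster", "most_recent_{lg}_season"]

def missing_names(module, lg):
    '''List the shared HockeyTech names that `module` does not expose.'''
    names = [pattern.format(lg=lg) for pattern in SHARED]
    return [name for name in names if not hasattr(module, name)]

# Stand-in module (an assumption for offline use): only two names defined.
fake_ahl = SimpleNamespace(ahl_schedule=lambda: [], ahl_teams=lambda: [])
print(missing_names(fake_ahl, "ahl"))
```

Run against the real submodules, an empty list for each league would confirm that the shared surface holds.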
Discovery helpers: when you don't know the name yet
Four top-level helpers let you search the surface instead of guessing:
- list_functions(league=None, search=..., parsers_only=..., wrappers_only=...): list/search every wrapper.
- function_count(league=None): how many functions each league exposes.
- find_team(name, league): fuzzy team lookup (returns the ESPN team dict + id).
- find_athlete(name, league): fuzzy player lookup.
# What does the package know about "scoreboard"? (grouped by league)
hits = sdv.list_functions(search="scoreboard")
for lg, fns in hits.items():
    print(f"{lg:>4}: {', '.join(fns)}")

# How big is each league's surface?
counts = sdv.function_count()
(pl.DataFrame({"league": list(counts.keys()), "n_functions": list(counts.values())})
   .sort("n_functions", descending=True))
# Fuzzy lookups: no IDs to memorize
team = sdv.find_team("Lakers", "nba")
ath = sdv.find_athlete("LeBron", "nba")
print("team ->", None if team is None else f"{team['displayName']} (id={team['id']})")
print("athlete ->", None if ath is None else f"{ath['displayName']} (id={ath['id']})")
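For intuition about what a fuzzy lookup does, here's a standalone sketch built on the standard library's difflib. This is not how find_team is implemented (the source doesn't say); fuzzy_find, min_score and the sample rows are all made up for illustration, and only the displayName / id keys mirror the dict shape used above.

```python
from difflib import SequenceMatcher

def fuzzy_find(query, teams, key="displayName", min_score=0.5):
    '''Return the best-matching team dict, or None if nothing scores well enough.'''
    q = query.casefold()
    best, best_score = None, 0.0
    for team in teams:
        name = team[key].casefold()
        # A substring hit counts as a perfect match; otherwise use a similarity ratio.
        score = 1.0 if q in name else SequenceMatcher(None, q, name).ratio()
        if score > best_score:
            best, best_score = team, score
    return best if best_score >= min_score else None

# Made-up sample rows, shaped like the dicts printed above.
sample = [
    {"id": "A1", "displayName": "Los Angeles Lakers"},
    {"id": "B2", "displayName": "Boston Celtics"},
]

hit = fuzzy_find("Lakers", sample)
print(None if hit is None else f"{hit['displayName']} (id={hit['id']})")
```

Trying the substring test first keeps short queries like a nickname from being penalized for the length of the full display name.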
3 · Twenty cross-sport recipes
Now the fun part: 20 runnable recipes that show the breadth and the overlap. Every recipe is defensively guarded, so a flaky network or off-season just prints a friendly note instead of erroring. Mix, match, and remix.
Recipe 1: Any league's teams
teams("<lg>") (our helper from above) hits espn_<lg>_teams for any ESPN
league. Here's the WBB team list.
wbb_teams = safe("espn_wbb_teams", lambda: teams("wbb"))
cols = ["team_id", "team_abbreviation", "team_display_name", "team_location"]
(wbb_teams.select([c for c in cols if c in wbb_teams.columns]).head()
if wbb_teams is not None and wbb_teams.height else "WBB teams unavailable right now")
Recipe 2: Any league's scoreboard
espn_<lg>_scoreboard() returns today's slate as a tidy frame. Same call for
MLB, NBA, NHL: just change the slug.
sb = safe("espn_mlb_scoreboard", lambda: sdv.espn_mlb_scoreboard())
(sb.head() if sb is not None and getattr(sb, "height", 0)
else "no MLB games on the board right now")