The SportsDataverse ecosystem & philosophy
sportsdataverse-py is the Python member of the SportsDataverse
— a family of free, open-source packages that put clean, tidy sports data in the
hands of analysts across R, Python, and Node.js. This page explains the design
philosophy the package shares with its sister projects, the function-naming
paradigm that makes the surface predictable, and how to move between the Python
and R packages (and the wider open-source sports ecosystem) without relearning
anything.
Philosophy
Four ideas run through every SportsDataverse package:
- Free and open. The data is public; the tooling that tidies it should be too. Everything here is MIT-licensed and community-maintained.
- Tidy by default. Raw sports APIs return deeply-nested JSON. The job of a SportsDataverse package is to flatten that into rectangular, analysis-ready tables — polars/pandas DataFrames here, tibbles in R — with stable column names you can build a model on.
- One mental model across sports and languages. Learn the pattern once and it transfers: the same verbs (scoreboard, pbp, team_roster, player_gamelog, load_*) mean the same thing in nba and wnba and cfb, and the name you call in Python is the name you'd call in the R sister package.
- Benchmarkable models. Beyond aggregation, the project exists to make open-source expected-points (EP) and win-probability (WP) work, especially for American football, reproducible and comparable.
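To make the "tidy by default" idea concrete, here is a minimal sketch of flattening nested JSON into rectangular rows. The payload and its field names are made up for illustration and are not ESPN's actual schema; the `flatten` helper is hypothetical and is not part of the package.

```python
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Collapse nested dicts into one flat dict with underscore-joined keys."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse so that {"home": {"score": 88}} becomes {"home_score": 88}.
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


# Hypothetical nested payload (illustrative field names only).
raw_games = [
    {"id": "401", "home": {"team": {"abbr": "NYL"}, "score": 88}},
    {"id": "402", "home": {"team": {"abbr": "LVA"}, "score": 91}},
]

# Every row ends up with the same flat column names, which is what makes
# the result safe to hand to a DataFrame constructor.
rows = [flatten(game) for game in raw_games]
columns = list(rows[0])
print(columns)  # ['id', 'home_team_abbr', 'home_score']
```

The real parsers do considerably more (type coercion, list handling, column renaming), so treat this only as the shape of the idea.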
Function-naming paradigm
Once you know the prefixes, you can usually guess the function name.
| Pattern | Meaning | Examples |
|---|---|---|
| espn_<league>_<entity>() | ESPN cross-league wrapper (same shape in all 8 leagues) | espn_nba_scoreboard, espn_wnba_team_roster, espn_cfb_player_gamelog |
| <league>_<entity>() / <api>_<entity>() | A league's native (non-ESPN) API | nhl_pbp, nhl_edge_skater_detail, mlb_api_schedule, mlb_statcast |
| load_<league>_<dataset>(seasons=...) | 404-safe loader of a pre-built parquet release | load_nba_pbp, load_wnba_shots, load_cfb_betting_lines |
| parse_<...>() / parser_for_<api>() | Raw Dict → tidy polars/pandas frame (+ registry lookup) | parse_mlb_api_person_stats, parser_for_nhl_api_web |
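Because the prefixes are regular, a function name can be composed from a league and an entity. The sketch below shows that idea; `espn_name`, `loader_name`, and `resolve` are hypothetical helpers written for this page, and `resolve` assumes each league's wrappers are importable from `sportsdataverse.<league>` (as the `sportsdataverse.wnba` import elsewhere on this page suggests).

```python
import importlib
from typing import Callable


def espn_name(league: str, entity: str) -> str:
    """Compose an ESPN cross-league wrapper name, e.g. espn_nba_scoreboard."""
    return f"espn_{league}_{entity}"


def loader_name(league: str, dataset: str) -> str:
    """Compose a release-loader name, e.g. load_cfb_betting_lines."""
    return f"load_{league}_{dataset}"


def resolve(league: str, entity: str) -> Callable:
    """Look the wrapper up by name (assumes sportsdataverse is installed)."""
    module = importlib.import_module(f"sportsdataverse.{league}")
    return getattr(module, espn_name(league, entity))


print(espn_name("nba", "scoreboard"))       # espn_nba_scoreboard
print(espn_name("wnba", "team_roster"))     # espn_wnba_team_roster
print(loader_name("cfb", "betting_lines"))  # load_cfb_betting_lines
```

With the package installed, `resolve("wnba", "scoreboard")` would be expected to return the same callable as `from sportsdataverse.wnba import espn_wnba_scoreboard`.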
Two conventions keep the ESPN surface aligned with the R packages:
- R-aligned vocabulary. ESPN's raw taxonomy is normalized to the cfbfastR/hoopR/wehoop wording: an athlete is a player, an event is a game, a competitor is a game team. So you call espn_nba_player_overview() (not athlete_overview) and espn_cfb_game_plays() (not event_plays), across every league.
- Collision resolution (one bare name). When two endpoints would resolve to the same name, one keeps the clean bare name and the other is version-qualified. Every league therefore has a bare espn_<league>_player_stats() (season stats) alongside the comprehensive espn_<league>_player_stats_v3().
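The vocabulary rule can be read as a small token substitution. The following sketch is illustrative only: `ESPN_TO_R_VOCAB` and `normalize_entity` are hypothetical names, and spelling "game team" as `game_team` in identifiers is an assumption rather than something this page confirms.

```python
# ESPN term -> R-aligned term, as described above.
# "game_team" as the snake_case spelling is an assumption.
ESPN_TO_R_VOCAB = {
    "athlete": "player",
    "event": "game",
    "competitor": "game_team",
}


def normalize_entity(raw_entity: str) -> str:
    """Swap ESPN taxonomy tokens for the R-aligned wording, token by token."""
    tokens = raw_entity.split("_")
    return "_".join(ESPN_TO_R_VOCAB.get(token, token) for token in tokens)


print(f"espn_nba_{normalize_entity('athlete_overview')}")  # espn_nba_player_overview
print(f"espn_cfb_{normalize_entity('event_plays')}")       # espn_cfb_game_plays
```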
Return types are predictable: parser-backed wrappers return a polars DataFrame
by default (0.0.54+) — pass return_parsed=False for the raw Dict; wrappers
without a parser return the Dict. Use return_as_pandas=True to get a pandas
DataFrame, or import from the sportsdataverse.parsed.<league> mirror for an
explicit parsed-by-default namespace. See Architecture
and Parsers for the full story.
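The return-type rules above can be summarized as a small decision function. This is a sketch of the documented behavior, not the package's implementation: `return_kind` is a hypothetical helper, and the precedence of `return_parsed=False` over `return_as_pandas=True` is an assumption that the text does not spell out.

```python
def return_kind(
    has_parser: bool,
    return_parsed: bool = True,
    return_as_pandas: bool = False,
) -> str:
    """Name the container a wrapper call would be expected to return."""
    # Wrappers without a parser, or calls that opt out of parsing, yield the raw Dict.
    # Assumption: opting out of parsing wins even if return_as_pandas=True.
    if not has_parser or not return_parsed:
        return "dict"
    return "pandas" if return_as_pandas else "polars"


print(return_kind(has_parser=True))                         # polars
print(return_kind(has_parser=True, return_parsed=False))    # dict
print(return_kind(has_parser=True, return_as_pandas=True))  # pandas
print(return_kind(has_parser=False))                        # dict
```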
Data releases & loaders
The load_<league>_*() functions skip live scraping entirely — they read
pre-built, season-partitioned parquet that the SportsDataverse data pipelines
publish on a schedule, and they are 404-safe (a season with no published
asset is skipped with a warning rather than raising). The data comes from a small
set of companion data repositories:
- sportsdataverse-data: the GitHub Releases host that most ESPN-derived datasets load from (NBA, WNBA, MBB, WBB, NHL, PWHL, …).
- cfbfastR-data: college football play-by-play, rosters, schedules, and team info.
- fastRhockey-data: NHL/PWHL play-by-play and box scores.
- nflverse-data: NFL data, read through the nflreadpy-style nfl module.
These mirror the R packages' own release repos (hoopR-data, wehoop-data, …): the same release-backed loader idea, and often the very same data.
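The "404-safe" behavior described above can be sketched as follows. This is not the package's loader code: `load_seasons` and `fake_fetch` are hypothetical, the URL is a placeholder, and plain lists of dicts stand in for parquet frames so the example runs without a network connection or extra dependencies.

```python
import warnings
from email.message import Message
from typing import Callable, Iterable
from urllib.error import HTTPError


def load_seasons(seasons: Iterable[int], fetch: Callable[[int], list]) -> list:
    """Concatenate per-season rows, skipping seasons whose asset returns HTTP 404."""
    rows: list = []
    for season in seasons:
        try:
            rows.extend(fetch(season))
        except HTTPError as err:
            if err.code != 404:
                raise  # only "not published" is tolerated; other errors propagate
            warnings.warn(f"season {season}: no published asset, skipping")
    return rows


def fake_fetch(season: int) -> list:
    """Stand-in for a real download: only 2023 is 'published' here."""
    if season != 2023:
        url = f"https://example.invalid/{season}.parquet"  # placeholder URL
        raise HTTPError(url, 404, "Not Found", Message(), None)
    return [{"season": season, "game_id": 1}]


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = load_seasons([2023, 2024], fake_fetch)

print(result)       # [{'season': 2023, 'game_id': 1}]
print(len(caught))  # 1
```

Passing the fetch function in as an argument keeps the skip-and-warn logic separate from the transport, which is also what makes it easy to exercise without hitting a release host.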
Automation status
For each league with generated loaders, the Loaders reference page carries an Automation status table that maps every dataset to its release tag and the pipeline that produces it, so you can see at a glance what's current and where it comes from.
(The NFL module loads from nflverse releases via nflreadpy, and MLB pairs the official Stats API with Baseball Savant, so those two don't use the generated release-loader pages.)
Python ↔ R: the sister packages
sportsdataverse-py (sdv-py for short) deliberately mirrors the R packages' names, so a call you know in R is the call you make in Python. The table lists each sport's R sister package:
| Sport(s) | sportsdataverse-py module | R sister package |
|---|---|---|
| NBA, NCAA men's basketball | nba, mbb | hoopR |
| WNBA, NCAA women's basketball | wnba, wbb | wehoop |
| College football | cfb | cfbfastR |
| NFL | nfl | nflverse (see below) |
| MLB | mlb | baseballr |
| NHL, PWHL | nhl, pwhl | fastRhockey |
For example, today's WNBA scoreboard is the same verb in both languages:
# R (wehoop)
wehoop::espn_wnba_scoreboard()
# Python (sportsdataverse-py)
from sportsdataverse.wnba import espn_wnba_scoreboard
espn_wnba_scoreboard(return_parsed=True)
A 1:1 function map
A representative slice of the surface — each sportsdataverse-py function links to
its reference page, and each R function links to its sister-package docs. The
pattern holds well beyond these rows: ESPN wrappers, native league APIs, and
load_* release loaders all line up.
Where they diverge: sdv-py exposes one function per ESPN surface
(espn_nba_teams_site vs espn_nba_season_teams) where the R packages often
collapse them into a single function with branching internals; and sdv-py returns
polars by default rather than a data.frame/tibble.
Beyond the sport packages, the SportsDataverse spans languages and utilities:
- R umbrella & utilities — sportsdataverse-R (the meta-package that loads them all), oddsapiR (betting odds), recruitR (recruiting), and sportyR (field/court/rink plots).
- Python siblings — sportypy (the Python port of sportyR), collegebaseball, and recruitR-py.
- Node.js — sportsdataverse.js.
nflverse and the wider Python ecosystem
SportsDataverse builds on and complements two neighboring communities:
- nflverse: the NFL-focused open ecosystem (nflfastR and nflreadr in R, nflreadpy in Python). The sportsdataverse.nfl module mirrors nflreadpy's load_* surface and reads the same nflverse parquet releases, so nflverse users can swap engines with minimal changes.
- PySport: the open-source sports-analytics community and its curated directory of Python libraries. sdv-py sits alongside league-specific tools you may already use (nba_api, pybaseball, and nhl-api-py) and is happy to be one tidy layer in a larger toolbox rather than the only one.
Where to go next
- New here? Start with the quickstart notebook, then the per-sport notebook for your league.
- Want the design details? ESPN cross-league architecture and the parser layer.
- Looking for a specific function? Each league's Reference section lists every wrapper with its endpoint, parameters, and return schema.