Version: 0.0.55

The SportsDataverse ecosystem & philosophy

sportsdataverse-py is the Python member of the SportsDataverse — a family of free, open-source packages that put clean, tidy sports data in the hands of analysts across R, Python, and Node.js. This page explains the design philosophy the package shares with its sister projects, the function-naming paradigm that makes the surface predictable, and how to move between the Python and R packages (and the wider open-source sports ecosystem) without relearning anything.

Philosophy

Four ideas run through every SportsDataverse package:

  1. Free and open. The data is public; the tooling that tidies it should be too. Everything here is MIT-licensed and community-maintained.
  2. Tidy by default. Raw sports APIs return deeply-nested JSON. The job of a SportsDataverse package is to flatten that into rectangular, analysis-ready tables — polars/pandas DataFrames here, tibbles in R — with stable column names you can build a model on.
  3. One mental model across sports and languages. Learn the pattern once and it transfers: the same verbs (scoreboard, pbp, team_roster, player_gamelog, load_*) mean the same thing in nba and wnba and cfb, and the name you call in Python is the name you'd call in the R sister package.
  4. Benchmarkable models. Beyond aggregation, the project exists to make open-source expected-points (EP) and win-probability (WP) work — especially for American football — reproducible and comparable.
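
To make the "tidy by default" idea concrete, here is a minimal sketch of flattening a nested payload into rectangular rows. The payload shape, the `flatten` helper, and the resulting column names are invented for illustration. They are not ESPN's actual schema or the package's parser code, and plain dicts stand in for a DataFrame to keep the example dependency-free.

```python
def flatten(record, parent_key="", sep="_"):
    """Collapse nested dicts into one flat dict whose keys are joined paths."""
    flat = {}
    for key, value in record.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the path as a prefix.
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value
    return flat


# Hypothetical nested payload (NOT a real API response).
payload = {
    "events": [
        {"id": "401", "status": {"type": {"completed": True}},
         "home": {"team": {"abbreviation": "NYL"}, "score": 88}},
        {"id": "402", "status": {"type": {"completed": False}},
         "home": {"team": {"abbreviation": "LVA"}, "score": 0}},
    ]
}

# One flat row per event, every row sharing the same column set.
rows = [flatten(event) for event in payload["events"]]
columns = sorted(rows[0])
```

Because every row ends up with the same flat keys, `rows` could be passed straight to `polars.DataFrame(rows)` or `pandas.DataFrame(rows)`.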

Function-naming paradigm

Once you know the prefixes, you can usually guess the function name.

| Pattern | Meaning | Examples |
| --- | --- | --- |
| espn_<league>_<entity>() | ESPN cross-league wrapper (same shape in all 8 leagues) | espn_nba_scoreboard, espn_wnba_team_roster, espn_cfb_player_gamelog |
| <league>_<entity>() / <api>_<entity>() | A league's native (non-ESPN) API | nhl_pbp, nhl_edge_skater_detail, mlb_api_schedule, mlb_statcast |
| load_<league>_<dataset>(seasons=...) | 404-safe loader of a pre-built parquet release | load_nba_pbp, load_wnba_shots, load_cfb_betting_lines |
| parse_<...>() / parser_for_<api>() | Raw Dict → tidy polars/pandas frame (+ registry lookup) | parse_mlb_api_person_stats, parser_for_nhl_api_web |
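
The prefix rules above can be sketched as a small classifier. The `classify` helper and its role labels are hypothetical and exist only to show how much the prefix alone tells you. The package itself is not assumed to ship anything like it.

```python
def classify(function_name):
    """Guess a function's role from its prefix (illustrative heuristic only)."""
    if function_name.startswith("espn_"):
        return "ESPN cross-league wrapper"
    if function_name.startswith("load_"):
        return "release loader"
    if function_name.startswith(("parse_", "parser_for_")):
        return "parser"
    # Anything else is treated as a league's native (non-ESPN) API wrapper.
    return "native league API"


# Function names taken from the Examples column of the table.
examples = [
    "espn_nba_scoreboard",
    "nhl_edge_skater_detail",
    "load_cfb_betting_lines",
    "parse_mlb_api_person_stats",
]
roles = {name: classify(name) for name in examples}
```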

Two conventions keep the ESPN surface aligned with the R packages:

  • R-aligned vocabulary. ESPN's raw taxonomy is normalized to the cfbfastR/hoopR/wehoop wording: an athlete is a player, an event is a game, a competitor is a game team. So you call espn_nba_player_overview() (not athlete_overview) and espn_cfb_game_plays() (not event_plays) — across every league.
  • Collision resolution (one bare name). When two endpoints would resolve to the same name, one keeps the clean bare name and the other is version-qualified. Every league therefore has a bare espn_<league>_player_stats() (season stats) alongside the comprehensive espn_<league>_player_stats_v3().
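
As a rough sketch of the R-aligned vocabulary rule, the snippet below rewrites an ESPN-style entity name into the wrapper name you would call. The `R_ALIGNED` table and `r_aligned_name` helper are hypothetical, and spelling "game team" as `game_team` is an assumption made for this example.

```python
# ESPN term -> R-aligned term. "game_team" is an assumed spelling.
R_ALIGNED = {"athlete": "player", "event": "game", "competitor": "game_team"}


def r_aligned_name(league, espn_entity):
    """Build an espn_<league>_<entity> name using the R-aligned vocabulary."""
    head, _, rest = espn_entity.partition("_")
    head = R_ALIGNED.get(head, head)  # leave unknown terms untouched
    entity = f"{head}_{rest}" if rest else head
    return f"espn_{league}_{entity}"


player_overview = r_aligned_name("nba", "athlete_overview")
game_plays = r_aligned_name("cfb", "event_plays")
```

Under these assumptions, `player_overview` is `espn_nba_player_overview` and `game_plays` is `espn_cfb_game_plays`, matching the two names quoted in the first bullet.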

Return types are predictable. Parser-backed wrappers return a polars DataFrame by default (0.0.54+); pass return_parsed=False to get the raw Dict instead. Wrappers without a parser return the Dict. Use return_as_pandas=True to get a pandas DataFrame, or import from the sportsdataverse.parsed.<league> mirror for an explicit parsed-by-default namespace. See Architecture and Parsers for the full story.
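
The return-type rule can be summarized in a few lines. This is a sketch of the documented behavior, not the package's implementation: `resolve_return` and `parse_teams` are hypothetical names, a list of row dicts stands in for a DataFrame, and the return_as_pandas switch is left out.

```python
def resolve_return(raw, parser=None, return_parsed=True):
    """Return parsed output only when a parser exists and parsing is enabled."""
    if parser is None or not return_parsed:
        return raw  # raw Dict: no parser registered, or caller opted out
    return parser(raw)


def parse_teams(raw):
    """Hypothetical parser: nested dict -> list of flat rows."""
    return [{"team_id": t["id"], "team_name": t["name"]} for t in raw["teams"]]


raw = {"teams": [{"id": 1, "name": "Aces"}]}

parsed = resolve_return(raw, parser=parse_teams)                          # tidy rows
unparsed = resolve_return(raw, parser=parse_teams, return_parsed=False)   # raw dict
no_parser = resolve_return(raw)                                           # raw dict
```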

Data releases & loaders

The load_<league>_*() functions skip live scraping entirely — they read pre-built, season-partitioned parquet that the SportsDataverse data pipelines publish on a schedule, and they are 404-safe (a season with no published asset is skipped with a warning rather than raising). The data comes from a small set of companion data repositories:

  • sportsdataverse-data — the GitHub Releases host that most ESPN-derived datasets load from (NBA, WNBA, MBB, WBB, NHL, PWHL, …).
  • cfbfastR-data — college football play-by-play, rosters, schedules, and team info.
  • fastRhockey-data — NHL/PWHL play-by-play and box scores.
  • nflverse-data — NFL data, read through the nflreadpy-style nfl module.

These mirror the R packages' own release repos (hoopR-data, wehoop-data, …): the same release-backed loader idea, and often the very same data.
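
The "404-safe" behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the package's loader code: `load_seasons` and `fake_fetch` are hypothetical, the URL is a placeholder, and lists of dicts stand in for parquet-backed frames.

```python
import warnings
from urllib.error import HTTPError


def load_seasons(seasons, fetch):
    """Collect rows per season, skipping seasons whose asset is not published.

    `fetch(season)` is a caller-supplied function that returns a list of rows
    or raises HTTPError. It stands in for reading one published parquet file.
    """
    rows = []
    for season in seasons:
        try:
            rows.extend(fetch(season))
        except HTTPError as err:
            if err.code != 404:
                raise  # only a missing asset is safe to skip
            warnings.warn(f"No published data for season {season}; skipping")
    return rows


def fake_fetch(season):
    """Hypothetical fetcher: only the 2023 season has been 'published'."""
    published = {2023: [{"season": 2023, "game_id": 1}]}
    if season not in published:
        url = f"https://example.invalid/{season}.parquet"  # placeholder URL
        raise HTTPError(url, 404, "Not Found", None, None)
    return published[season]


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = load_seasons([2023, 2024], fake_fetch)
```

Here the unpublished 2024 season produces a warning instead of an exception, and `result` contains only the 2023 rows. Errors other than 404 are re-raised so real failures are not hidden.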

Automation status

Each generated-loader league's Loaders reference page carries an Automation status table mapping every dataset to its release tag and the pipeline that produces it, so you can see at a glance what's current and where it comes from.

(The NFL module loads from nflverse releases via nflreadpy, and MLB pairs the official Stats API with Baseball Savant, so those two don't use the generated release-loader pages.)

Python ↔ R: the sister packages

sportsdataverse-py (sdv-py for short) deliberately mirrors the R packages' names, so a call you know in R is usually the call you make in Python. The table below pairs each sport with its R sister package:

| Sport(s) | sportsdataverse-py module | R sister package |
| --- | --- | --- |
| NBA, NCAA men's basketball | nba, mbb | hoopR |
| WNBA, NCAA women's basketball | wnba, wbb | wehoop |
| College football | cfb | cfbfastR |
| NFL | nfl | nflverse (see below) |
| MLB | mlb | baseballr |
| NHL, PWHL | nhl, pwhl | fastRhockey |

For example, today's WNBA scoreboard is the same verb in both languages:

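# R (wehoop)
wehoop::espn_wnba_scoreboard()

# Python (sportsdataverse-py)
from sportsdataverse.wnba import espn_wnba_scoreboard
# return_parsed=True is the documented default for parser-backed wrappers in 0.0.54+; shown here for clarity
espn_wnba_scoreboard(return_parsed=True)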

A 1:1 function map

A representative slice of the surface — each sportsdataverse-py function links to its reference page, and each R function links to its sister-package docs. The pattern holds well beyond these rows: ESPN wrappers, native league APIs, and load_* release loaders all line up.

| sportsdataverse-py | R sister | What it returns |
| --- | --- | --- |
| espn_nba_scoreboard | hoopR::espn_nba_scoreboard | NBA games + scores for a date |
| espn_wnba_scoreboard | wehoop::espn_wnba_scoreboard | WNBA games + scores for a date |
| espn_cfb_scoreboard | cfbfastR::espn_cfb_scoreboard | CFB games + scores for a week |
| espn_mlb_scoreboard | baseballr::espn_mlb_scoreboard | MLB games + scores for a date |
| espn_nba_standings | hoopR::espn_nba_standings | League standings table |
| espn_wnba_team_roster | wehoop::espn_wnba_team_roster | A team's roster |
| nhl_web_pbp | fastRhockey::nhl_game_pbp | NHL play-by-play for a game (api-web) |
| nhl_edge_skater_detail | fastRhockey::nhl_edge_skater_detail | Per-skater EDGE tracking (speed / distance / shots) |
| espn_nhl_teams | fastRhockey::espn_nhl_teams | All NHL teams (ESPN) |
| mlb_api_pbp | baseballr::mlb_pbp | MLB play-by-play for a game (Stats API) |
| mlb_api_draft | baseballr::mlb_draft | MLB amateur draft picks for a year |
| load_nba_pbp | hoopR::load_nba_pbp | Whole-season NBA pbp from releases |
| load_cfb_pbp | cfbfastR::load_cfb_pbp | Whole-season CFB pbp from releases |
| load_nhl_pbp | fastRhockey::load_nhl_pbp | Whole-season NHL pbp from releases |

Where they diverge: sdv-py exposes one function per ESPN surface (espn_nba_teams_site vs espn_nba_season_teams) where the R packages often collapse them into a single function with branching internals; and sdv-py returns polars by default rather than a data.frame/tibble.
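
The correspondence in the function map can be sketched as a lookup from a Python function name to its R counterpart. The helper below is hypothetical and covers only the rows shown above. The `RENAMED` entries are the three rows whose names differ between languages. NFL is deliberately left out because its sister is an ecosystem rather than a single package, so an NFL name would raise a KeyError here.

```python
# Module token -> R sister package, from the sister-package table.
R_SISTER = {
    "nba": "hoopR", "mbb": "hoopR",
    "wnba": "wehoop", "wbb": "wehoop",
    "cfb": "cfbfastR",
    "mlb": "baseballr",
    "nhl": "fastRhockey", "pwhl": "fastRhockey",
}

# Rows of the function map where the R name is not identical to the Python name.
RENAMED = {
    "nhl_web_pbp": "nhl_game_pbp",
    "mlb_api_pbp": "mlb_pbp",
    "mlb_api_draft": "mlb_draft",
}


def r_equivalent(py_name):
    """Return the 'package::function' string for a Python function name."""
    parts = py_name.split("_")
    # espn_<league>_... and load_<league>_... carry the league second;
    # native wrappers such as nhl_... or mlb_api_... carry it first.
    league = parts[1] if parts[0] in ("espn", "load") else parts[0]
    return f"{R_SISTER[league]}::{RENAMED.get(py_name, py_name)}"


same_name = r_equivalent("espn_wnba_scoreboard")
renamed = r_equivalent("nhl_web_pbp")
```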

Beyond the sport packages, the SportsDataverse also spans other languages and utilities.

nflverse and the wider Python ecosystem

SportsDataverse builds on and complements two neighboring communities:

  • nflverse — the NFL-focused open ecosystem (nflfastR and nflreadr in R, nflreadpy in Python). The sportsdataverse.nfl module mirrors nflreadpy's load_* surface and reads the same nflverse parquet releases, so nflverse users can swap engines with minimal changes.
  • PySport — the open-source sports-analytics community and its curated directory of Python libraries. sdv-py sits alongside league-specific tools you may already use — nba_api, pybaseball, and nhl-api-py — and is happy to be one tidy layer in a larger toolbox rather than the only one.

Where to go next

  • New here? Start with the quickstart notebook, then the per-sport notebook for your league.
  • Want the design details? See ESPN cross-league architecture and the parser layer.
  • Looking for a specific function? Each league's Reference section lists every wrapper with its endpoint, parameters, and return schema.