π The NFL with sportsdataverse-py
Welcome to gridiron data! π In a handful of lines you're about to pull standings, rosters, weekly injury reports, NextGen Stats tracking leaderboards, and full play-by-play β straight from the source.
sportsdataverse.nfl leads with the premium api.nfl.com
native endpoints (nfl_standings, nfl_rosters, nfl_injuries, β¦) and the
NextGen Stats tracking API (nfl_ngs_*),
backed by the battle-tested nflverse release
loaders (load_nfl_pbp, load_nfl_player_stats, β¦). ESPN
(espn_nfl_*) rides shotgun as a quick, no-auth secondary path.
Every accessor hands you a tidy polars DataFrame by default β pass
return_as_pandas=True for pandas. If you've used the R packages
nflfastR / nflreadr,
or the Python nflreadpy, you're already
home: the load_* names line up. Let's hike it! π
π§° The toolboxβ
Three data families, one module. The π’ premium rows lead with native
api.nfl.com / NextGen Stats endpoints; the π¦ rows read versioned nflverse
release parquets; ESPN is the π΅ quick secondary path. Click any name for the
full reference.
| Function | What it gives you | Source |
|---|---|---|
nfl_standings | Team standings for a season/week β one row per team | π’ premium (NFL.com) |
nfl_rosters | Season rosters, one row per team (players nested) | π’ premium (NFL.com) |
nfl_injuries | Weekly injury report, one row per player | π’ premium (NFL.com) |
nfl_weeks | The week calendar (bye weeks, date ranges) | π’ premium (NFL.com) |
nfl_weekly_game_details | Rich per-game details for a week (drive charts, standings) | π’ premium (NFL.com) |
nfl_game_summaries | Live game state, one row per game | π’ premium (NFL.com) |
nfl_team | Single-team detail by team_id | π’ premium (NFL.com) |
nfl_ngs_statboard | NextGen Stats season leaderboard (passing/rushing/receiving) | π’ premium (NextGen Stats) |
nfl_ngs_leaders | NextGen top-N highlight boards (speed, YAC over expected, β¦) | π’ premium (NextGen Stats) |
nfl_ngs_league_schedule | NextGen schedule β source of NGS gameIds | π’ premium (NextGen Stats) |
nfl_ngs_gamecenter_overview | Per-game NextGen player splits (passers/rushers/β¦) | π’ premium (NextGen Stats) |
load_nfl_pbp | Full nflfastR play-by-play (370+ columns) | π¦ nflverse release |
load_nfl_player_stats | Weekly player box-score stats | π¦ nflverse release |
load_nfl_nextgen_stats | NextGen Stats back to 2016 (release parquet) | π¦ nflverse release |
load_nfl_rosters | Season rosters with IDs & bios | π¦ nflverse release |
espn_nfl_schedule Β· espn_nfl_scoreboard | ESPN scoreboard/schedule (no auth) | π΅ ESPN (secondary) |
load_nfl_snap_counts | Weekly snap counts & snap-share % per player | π¦ nflverse release |
load_nfl_depth_charts | Weekly depth charts, one row per slotted player | π¦ nflverse release |
load_nfl_schedule | Game results + lines, one row per game | π¦ nflverse release |
load_nfl_draft_picks | Every draft pick + career value, one row per pick | π¦ nflverse release |
get_current_nfl_season Β· most_recent_nfl_season | Season helpers | π’ helper |
π Setupβ
pip install sportsdataverse
No API key needed β the native api.nfl.com wrappers mint a fresh
anonymous token for you, and the NextGen Stats client warms its own browser
cookies. The nflverse loaders just read public release parquets. π
import polars as pl
import sportsdataverse.nfl as nfl
pl.Config.set_tbl_cols(8) # keep wide frames readable in the notebook
Native NFL.com / NextGen endpoints and ESPN are live services β great
in-season, occasionally grumpy in the offseason or behind a flaky network. A
tiny safe() helper runs each live call defensively: you get the frame when
the feed is up, and a friendly one-liner when it isn't (never a scary
traceback). π The load_* release parquets are reliable, so we call those
directly.
def safe(label, thunk):
"""Run a live call; return its result, or print a one-liner and return None."""
try:
out = thunk()
print(f"β
{label}")
return out
except Exception as e: # noqa: BLE001 -- demo resilience over network blips
print(f"βοΈ {label}: unavailable right now ({type(e).__name__})")
return None
# 2024 is a complete season with full data everywhere β a safe default to demo.
SEASON = 2024
π’ Premium first: NFL.com native standingsβ
The headliner. nfl_standings
returns one row per team with conference/division records, streaks,
clinch flags, point differentials β the works. Pass season, season_type
("REG"/"POST"/"PRE" β strings, not ESPN's numeric codes) and week.
standings = safe(
"NFL.com standings",
lambda: nfl.nfl_standings(season=SEASON, season_type="REG", week=18),
)
standings.shape if standings is not None else "standings unavailable"
cols = [
"team_full_name", "conference_rank", "division_rank",
"overall_wins", "overall_losses", "overall_ties",
"division_wins", "division_losses",
]
(standings.select([c for c in cols if c in standings.columns])
.sort("conference_rank")
.head(10)
if standings is not None else "standings unavailable")
π₯ Rosters & the week calendarβ
nfl_rosters gives one row per
team for a season, with the player list nested under persons (great for a
team directory). nfl_weeks is the
season's week calendar β handy for finding bye weeks and date ranges before
you loop over a slate.
| Function | One row per | Key columns |
|---|---|---|
nfl_rosters | team | team_abbreviation, team_conference_abbr, persons |
nfl_weeks | week | week, week_type, bye_teams, date_begin |
rosters = safe("NFL.com rosters", lambda: nfl.nfl_rosters(season=SEASON))
cols = ["team_abbreviation", "team_full_name", "team_conference_abbr", "team_division_full_name"]
(rosters.select([c for c in cols if c in rosters.columns]).head(8)
if rosters is not None else "rosters unavailable")
weeks = safe("NFL.com weeks", lambda: nfl.nfl_weeks(season=SEASON, season_type="REG"))
cols = ["season", "week", "week_type", "date_begin", "date_end", "bye_teams"]
(weeks.select([c for c in cols if c in weeks.columns]).head(8)
if weeks is not None else "weeks unavailable")
π₯ The weekly injury reportβ
nfl_injuries is the official
weekly injury report β one row per listed player with their
injury_status (Out / Doubtful / Questionable), practice participation, and
team. This is the premium native feed, not a scrape.
inj = safe(
"NFL.com injuries",
lambda: nfl.nfl_injuries(season=SEASON, season_type="REG", week=1),
)
cols = [
"team_full_name", "person_display_name", "position",
"injuries", "injury_status", "practice_status",
]
(inj.select([c for c in cols if c in inj.columns]).head(10)
if inj is not None else "injuries unavailable")
π Per-game details for a weekβ
Need the full slate with drive charts, broadcast info and embedded standings?
nfl_weekly_game_details
returns one row per game for a week (toggle the heavy blocks with the
include_* flags). For live in-game state (clock, down & distance, red-zone
flags), reach for
nfl_game_summaries.
wgd = safe(
"NFL.com weekly game details",
lambda: nfl.nfl_weekly_game_details(season=SEASON, season_type="REG", week=1),
)
cols = ["week", "date", "game_type", "away_team_full_name", "home_team_full_name", "status"]
(wgd.select([c for c in cols if c in wgd.columns]).head(8)
if wgd is not None else "weekly game details unavailable")
β‘ NextGen Stats: the tracking layerβ
This is where it gets fun. The NFL's NextGen Stats API exposes player-tracking metrics you won't find in a box score β time to throw, completion percentage over expectation (CPOE), separation, ball-carrier top speed. All token-free.
nfl_ngs_statboard is the
season leaderboard. Ask for stat_type "passing", "rushing", or
"receiving".
qb = safe(
"NGS passing statboard",
lambda: nfl.nfl_ngs_statboard(stat_type="passing", season=SEASON, season_type="REG"),
)
cols = [
"playerName", "passerRating", "completionPercentageAboveExpectation",
"avgTimeToThrow", "aggressiveness", "passYards", "passTouchdowns",
]
(qb.select([c for c in cols if c in qb.columns])
.sort("passerRating", descending=True)
.head(10)
if qb is not None else "NGS statboard unavailable")
And nfl_ngs_leaders
serves the highlight-reel top-N boards β each row is the play that earned
the leader their spot. Categories include "speed" (fastest ball carriers),
"yac_season" (yards-after-catch over expected), "completion_season"
(most-improbable completions) and more.
fast = safe(
"NGS fastest ball carriers",
lambda: nfl.nfl_ngs_leaders(category="speed", season=SEASON, season_type="REG"),
)
cols = ["leader_playerName", "leader_teamAbbr", "leader_maxSpeed", "leader_yards", "play_playDescription"]
(fast.select([c for c in cols if c in fast.columns]).head(8)
if fast is not None else "NGS leaders unavailable")
π¦ nflverse loaders: the bulk-data workhorsesβ
For full-season modelling you want the nflverse release parquets β the exact same assets that power nflfastR / nflreadr / nflreadpy. These are versioned, cached releases (very reliable), so we call them directly.
| Function | Rows | Highlights |
|---|---|---|
load_nfl_pbp | ~49k/season | EPA, WP, air yards, 370+ columns |
load_nfl_player_stats | weekly | passing/rushing/receiving box lines |
load_nfl_nextgen_stats | weekly | NGS back to 2016 |
load_nfl_rosters | per player | IDs, bios, draft info |
pbp = nfl.load_nfl_pbp([SEASON])
pbp.shape
(pbp
.filter(pl.col("play_type").is_not_null())
.select(["game_id", "qtr", "down", "ydstogo", "posteam", "play_type", "yards_gained", "epa", "desc"])
.head(8))
ngs_release = nfl.load_nfl_nextgen_stats([SEASON], stat_type="passing")
(ngs_release
.filter(pl.col("week") == 0) # week 0 == season totals in this release
.select(["player_display_name", "team_abbr", "attempts", "pass_yards",
"completion_percentage_above_expectation", "passer_rating"])
.sort("passer_rating", descending=True)
.head(8))
π΅ Secondary path: ESPN (quick & no-auth)β
When you just want a fast scoreboard without minting a token, ESPN is right
there. espn_nfl_schedule
returns a tidy schedule frame; pass dates=YYYYMMDD for a single day. (There's
also a raw espn_nfl_scoreboard
if you want the unparsed JSON.)
espn_sched = safe("ESPN schedule", lambda: nfl.espn_nfl_schedule(dates=20240908))
cols = ["id", "away_display_name", "home_display_name", "away_score", "home_score", "status_type_description"]
(espn_sched.select([c for c in cols if c in espn_sched.columns]).head(8)
if espn_sched is not None else "ESPN schedule unavailable")
π³ Cookbook: common NFL tasksβ
A full dozen recipes you'll reach for constantly β leaderboards, splits, team-level efficiency, snap-share workhorses, play-by-play slices, a schedule scan, and a quick hop into pandas. Each one leans on the premium native/NextGen feeds or the rock-solid nflverse release parquets, and every live call is wrapped so a network blip never breaks your run. Recipes 1β4 use the live NFL.com / NextGen endpoints; 5β12 build on the cached release parquets, so they run anywhere, anytime. π
Recipe 1 β This week's "Out" list πβ
Filter the official injury report down to players ruled Out β exactly what you'd check before setting a lineup.
rep = safe(
"injury report",
lambda: nfl.nfl_injuries(season=SEASON, season_type="REG", week=1),
)
if rep is not None and rep.height and "injury_status" in rep.columns:
out = (
rep.filter(pl.col("injury_status").str.to_lowercase() == "out")
.select([c for c in ["team_full_name", "person_display_name", "position", "injuries"]
if c in rep.columns])
.head(15)
)
else:
out = "injury report unavailable"
out
Recipe 2 β CPOE leaderboard from NextGen Stats π―β
Who's beating expectation as a passer? Rank qualified QBs by completion percentage above expectation straight off the NextGen statboard.
board = safe(
"NGS passing board",
lambda: nfl.nfl_ngs_statboard(stat_type="passing", season=SEASON, season_type="REG"),
)
if board is not None and board.height and "completionPercentageAboveExpectation" in board.columns:
cpoe = (
board.filter(pl.col("attempts") >= 200)
.select(["playerName", "attempts", "completionPercentage",
"completionPercentageAboveExpectation", "passerRating"])
.sort("completionPercentageAboveExpectation", descending=True)
.head(10)
)
else:
cpoe = "NGS board unavailable"
cpoe
Recipe 3 β Standings β division winners πβ
Take the premium standings and pull the team that tops each division. One group-by and you've got your playoff-seeding cheat sheet.
st = safe(
"standings",
lambda: nfl.nfl_standings(season=SEASON, season_type="REG", week=18),
)
if st is not None and st.height and {"division_rank", "team_full_name"}.issubset(st.columns):
div_col = next((c for c in ["team_division_full_name", "division_full_name", "division"] if c in st.columns), None)
keep = [c for c in [div_col, "team_full_name", "overall_wins", "overall_losses"] if c]
winners = (
st.filter(pl.col("division_rank") == 1)
.select(keep)
.sort(div_col) if div_col else st.filter(pl.col("division_rank") == 1).select(keep)
)
else:
winners = "standings unavailable"
winners
Recipe 4 β A game's NextGen passer splits π¬β
Grab an NGS gameId from the schedule, then pull
nfl_ngs_gamecenter_overview
to see each side's primary passer with tracking-derived splits.
sched = safe(
"NGS schedule",
lambda: nfl.nfl_ngs_league_schedule(season=SEASON, season_type="REG", week=1),
)
if sched is not None and sched.height and "gameId" in sched.columns:
gid = sched["gameId"][0]
ov = safe(f"NGS gamecenter {gid}",
lambda: nfl.nfl_ngs_gamecenter_overview(game_id=gid, group="passers"))
if ov is not None and ov.height:
out = ov.select([c for c in ["side", "teamAbbr", "playerName", "position",
"completions", "attempts", "passYards", "touchdowns"]
if c in ov.columns])
else:
out = "gamecenter unavailable"
else:
out = "NGS schedule unavailable"
out
Recipe 5 β Season rushing leaders πβ
Roll the weekly box scores in load_nfl_player_stats up to season totals and crown the ground-game kings (β₯150 carries).
ps = nfl.load_nfl_player_stats()
rush_cols = {"season", "season_type", "carries", "rushing_yards"}
if rush_cols.issubset(ps.columns):
rush_lb = (
ps.filter((pl.col("season") == SEASON) & (pl.col("season_type") == "REG"))
.group_by(["player_display_name", "recent_team"])
.agg(
pl.col("carries").sum().alias("carries"),
pl.col("rushing_yards").sum().alias("rush_yds"),
pl.col("rushing_tds").sum().alias("rush_td"),
pl.col("rushing_epa").sum().round(1).alias("rush_epa"),
)
.filter(pl.col("carries") >= 150)
.sort("rush_yds", descending=True)
.head(10)
)
else:
rush_lb = "player_stats schema changed β rushing columns missing"
rush_lb
Recipe 6 β The most efficient offenses (EPA/play) πβ
Expected points added is the modeller's favourite efficiency yardstick. Average epa over every run/pass in load_nfl_pbp to rank offenses.
pbp = nfl.load_nfl_pbp([SEASON])
if {"epa", "posteam", "play_type"}.issubset(pbp.columns):
epa_off = (
pbp.filter(pl.col("play_type").is_in(["run", "pass"]))
.group_by("posteam")
.agg(
pl.col("epa").mean().round(3).alias("epa_per_play"),
pl.len().alias("plays"),
)
.filter(pl.col("posteam").is_not_null())
.sort("epa_per_play", descending=True)
.head(10)
)
else:
epa_off = "pbp schema changed β epa/posteam columns missing"
epa_off
Recipe 7 β Third-down conversion kings πβ
Move-the-chains efficiency: keep only 3rd-down run/pass snaps and divide conversions by attempts per offense β a classic play-by-play split.
if {"down", "third_down_converted", "posteam"}.issubset(pbp.columns):
third = (
pbp.filter((pl.col("down") == 3) & (pl.col("play_type").is_in(["run", "pass"])))
.group_by("posteam")
.agg(
pl.col("third_down_converted").sum().alias("conversions"),
pl.len().alias("attempts"),
)
.filter(pl.col("posteam").is_not_null())
.with_columns((pl.col("conversions") / pl.col("attempts") * 100).round(1).alias("conv_pct"))
.sort("conv_pct", descending=True)
.head(10)
)
else:
third = "pbp schema changed β third-down columns missing"
third
Recipe 8 β Red-zone touchdown efficiency π―β
Filter the play-by-play to snaps inside the opponent's 20 (yardline_100 β€ 20) and see which offenses actually punch it in instead of settling for three.
if {"yardline_100", "touchdown", "posteam"}.issubset(pbp.columns):
redzone = (
pbp.filter((pl.col("yardline_100") <= 20) & (pl.col("play_type").is_in(["run", "pass"])))
.group_by("posteam")
.agg(
pl.col("touchdown").sum().alias("rz_tds"),
pl.len().alias("rz_plays"),
)
.filter((pl.col("posteam").is_not_null()) & (pl.col("rz_plays") >= 80))
.with_columns((pl.col("rz_tds") / pl.col("rz_plays") * 100).round(1).alias("td_pct"))
.sort("td_pct", descending=True)
.head(10)
)
else:
redzone = "pbp schema changed β red-zone columns missing"
redzone
Recipe 9 β Snap-share workhorse running backs π΄β
load_nfl_snap_counts carries offense_pct per game β average it to find the backs their teams simply would not take off the field.
snaps = nfl.load_nfl_snap_counts([SEASON])
if {"position", "offense_pct", "player"}.issubset(snaps.columns):
workhorses = (
snaps.filter(pl.col("position") == "RB")
.group_by(["player", "team"])
.agg(
(pl.col("offense_pct").mean() * 100).round(1).alias("avg_snap_pct"),
pl.len().alias("games"),
)
.filter(pl.col("games") >= 10)
.sort("avg_snap_pct", descending=True)
.head(10)
)
else:
workhorses = "snap_counts schema changed β offense_pct/position missing"
workhorses
Recipe 10 β Receiving leaders, then a hop into pandas πΌβ
Aggregate the receiving box lines, .to_pandas(), and add derived columns (yards-per-catch, catch rate) with familiar pandas syntax β the polarsβpandas handoff is one method call.
rec_cols = {"receptions", "receiving_yards", "targets", "season", "season_type"}
if rec_cols.issubset(ps.columns):
rec = (
ps.filter((pl.col("season") == SEASON) & (pl.col("season_type") == "REG"))
.group_by(["player_display_name", "recent_team"])
.agg(
pl.col("receptions").sum().alias("rec"),
pl.col("receiving_yards").sum().alias("rec_yds"),
pl.col("targets").sum().alias("tgt"),
)
.filter(pl.col("rec") >= 70)
)
pdf = rec.to_pandas() # <-- polars -> pandas in one call
pdf["yards_per_rec"] = (pdf["rec_yds"] / pdf["rec"]).round(1)
pdf["catch_rate"] = (pdf["rec"] / pdf["tgt"] * 100).round(1)
rec_out = pdf.sort_values("rec_yds", ascending=False).head(10)[
["player_display_name", "recent_team", "rec", "rec_yds", "yards_per_rec", "catch_rate"]
].reset_index(drop=True)
else:
rec_out = "player_stats schema changed β receiving columns missing"
rec_out
Recipe 11 β Who gets open? NextGen separation π°οΈβ
The receiving statboard exposes a tracking-only metric box scores can't: average separation at the catch point. Rank qualified targets (β₯80) to find the route-runners defenders can't shadow.
sepboard = safe(
"NGS receiving statboard",
lambda: nfl.nfl_ngs_statboard(stat_type="receiving", season=SEASON, season_type="REG"),
)
if sepboard is not None and sepboard.height and "avgSeparation" in sepboard.columns:
sep = (
sepboard.filter(pl.col("targets") >= 80)
.select([c for c in [
"player_displayName", "player_position", "avgSeparation",
"avgYACAboveExpectation", "catchPercentage", "yards",
] if c in sepboard.columns])
.sort("avgSeparation", descending=True)
.head(10)
)
else:
sep = "NGS receiving statboard unavailable"
sep
Recipe 12 β The nail-biters: closest games of the season π¬β
load_nfl_schedule carries the final result (home margin). Take its absolute value and sort ascending to surface the one-score thrillers β built-in betting lines ride along too.
sched = nfl.load_nfl_schedule([SEASON])
if {"result", "game_type", "home_team", "away_team"}.issubset(sched.columns):
scored = (
sched.filter((pl.col("game_type") == "REG") & pl.col("result").is_not_null())
.with_columns(pl.col("result").abs().alias("margin"))
)
want = [
"week", "away_team", "away_score", "home_team", "home_score",
"margin", "spread_line", "total_line",
]
nailbiters = (
scored.select([c for c in want if c in scored.columns])
.sort("margin")
.head(10)
)
else:
nailbiters = "schedule schema changed β result/team columns missing"
nailbiters
ποΈ Season helpersβ
Handy when you want "the current/most-recent season" instead of hard-coding a
year. get_current_nfl_season() / get_current_nfl_week() track the live
calendar; most_recent_nfl_season() gives the latest season with data.
{
"current_season": nfl.get_current_nfl_season(),
"current_week": nfl.get_current_nfl_week(),
"most_recent_season": nfl.most_recent_nfl_season(),
}
π Where to nextβ
- Premium native API β the full
nfl_*endpoint set:docs/docs/nfl/reference/nfl_api.md - NextGen Stats & loaders β
nfl_ngs_*andload_nfl_*:docs/docs/nfl/reference/additional.mdanddocs/docs/nfl/reference/loaders.md - ESPN secondary path β every
espn_nfl_*wrapper:docs/docs/nfl/reference/site.md - Pass
return_as_pandas=Truefor a pandas frame, orreturn_parsed=False(native API) for raw JSON. - R user? The same data lives in nflfastR / nflreadr; Python parity is nflreadpy.
Now go build something great β may your EPA be ever positive! ππ