NHL hockey with sportsdataverse-py
Welcome to the show! sportsdataverse.nhl gives you the NHL's own modern
feed (the same api-web.nhle.com data that powers NHL.com), plus the shiny
NHL EDGE puck-and-player tracking layer, the api.nhle.com stats-REST and
records flat APIs, an ESPN fallback, and fast parquet loaders. All of it hands
you tidy polars DataFrames, ready to model.
We'll lead with the premium native wrappers (the nhl_* and nhl_edge_*
functions), which serve the league's first-party data with no key required,
and keep ESPN (espn_nhl_*) as a friendly secondary path.
R user? The companion package is fastRhockey (NHL + PWHL). Let's drop the puck!
The toolbox
Every native call returns a tidy polars DataFrame by default; pass
return_as_pandas=True for pandas, or return_parsed=False for the raw JSON.
Here's the kit we'll use (click any name for the full reference). The ⭐ rows
(every Source that starts with NHL) are the premium native feed; start there.
| Function | What it gives you | Source |
|---|---|---|
| nhl_web_schedule | A day's games + scores, native ids | ⭐ NHL api-web |
| nhl_web_pbp | Event-level play-by-play (one row per event) | ⭐ NHL api-web |
| nhl_boxscore | One row per player (skaters + goalies) | ⭐ NHL api-web |
| nhl_standings | Team standings with conference/division | ⭐ NHL api-web |
| nhl_roster | A club's roster for a season | ⭐ NHL api-web |
| nhl_club_schedule_season | A team's full-season schedule | ⭐ NHL api-web |
| nhl_player_game_log | A player's game-by-game line | ⭐ NHL api-web |
| nhl_player_landing | A player's bio + career snapshot | ⭐ NHL api-web |
| nhl_skater_leaders | Season skater leaderboard | ⭐ NHL api-web |
| nhl_goalie_leaders | Season goalie leaderboard | ⭐ NHL api-web |
| nhl_club_stats | A club's full skater + goalie stat lines | ⭐ NHL api-web |
| nhl_score | A day's final scores + series context | ⭐ NHL api-web |
| nhl_draft_picks | Draft board for a year/round | ⭐ NHL api-web |
| nhl_edge_skater_skating_speed_detail | A skater's tracked speed vs league avg + percentile | ⭐ NHL EDGE |
| nhl_edge_skater_landing | EDGE skater leaderboards (hardest shot, top speed…) | ⭐ NHL EDGE |
| nhl_edge_team_landing | EDGE team-level tracking leaders | ⭐ NHL EDGE |
| nhl_edge_goalie_landing | EDGE goalie tracking leaders | ⭐ NHL EDGE |
| nhl_stats_rest_leaders_skaters | Stats-REST top-10 skaters by attribute | ⭐ NHL stats-REST |
| nhl_stats_rest_leaders_goalies | Stats-REST top-10 goalies by attribute | ⭐ NHL stats-REST |
| nhl_records_franchises | Every franchise in NHL history (Records API) | ⭐ NHL records |
| nhl_records_franchise_team_totals | All-time W/L/points per franchise | ⭐ NHL records |
| load_nhl_schedule | Pre-built schedule parquet (offline-friendly) | 📦 loader |
| load_nhl_team_box | Pre-built team box parquet | 📦 loader |
| load_nhl_player_box | Pre-built player box parquet | 📦 loader |
| espn_nhl_teams | ESPN team directory | ESPN |
| espn_nhl_schedule | ESPN schedule for a date | ESPN |
| espn_nhl_pbp | ESPN play-by-play (a dict) | ESPN |
| espn_nhl_standings | ESPN standings | ESPN |
Setup
pip install sportsdataverse
No API key needed: the NHL's public feeds ship ready to go.
import polars as pl
import sportsdataverse as sdv
import sportsdataverse.nhl as nhl
The native feeds are live and seasonal (and occasionally throttle), so a tiny
safe() helper runs each network call defensively: you get the frame when the
feed is up, and a friendly one-liner when it isn't (never a scary traceback).
We'll reference the 2024 Stanley Cup Final Game 7 throughout: Florida
Panthers 2, Edmonton Oilers 1 (June 24, 2024). Note the native game id
2023030417 (season + game-type + sequence) is different from ESPN's
401675111 for the very same game.
def safe(label, thunk):
    """Run thunk(); return its result, or None after printing a one-line notice."""
    try:
        out = thunk()
        print(f'✅ {label}')
        return out
    except Exception as e:  # noqa: BLE001 -- demo resilience
        print(f'⚠️ {label}: unavailable right now ({type(e).__name__})')
        return None
# Game 7, 2024 Stanley Cup Final: two ids for the same game
NATIVE_GAME = 2023030417 # api-web.nhle.com
ESPN_GAME = 401675111 # ESPN
SEASON = 20232024 # NHL season strings are start+end years
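The id layout described above (season + game-type + sequence) can be unpacked with plain string slicing. This is a minimal sketch, not part of sportsdataverse: `parse_native_game_id`, `NativeGameId`, and the `GAME_TYPES` mapping are hypothetical names, and the game-type labels are the commonly documented codes rather than something this guide states.

```python
from dataclasses import dataclass

# Assumption: commonly documented game-type codes, not taken from this guide.
GAME_TYPES = {1: "preseason", 2: "regular season", 3: "playoffs"}


@dataclass(frozen=True)
class NativeGameId:
    season_start: int  # first four digits, e.g. 2023
    game_type: int     # next two digits, e.g. 3
    sequence: int      # last four digits, e.g. 417

    @property
    def season(self) -> int:
        """Eight-digit season code: start year followed by end year."""
        return self.season_start * 10_000 + self.season_start + 1


def parse_native_game_id(game_id: int) -> NativeGameId:
    """Split a ten-digit native id into season, game type, and sequence."""
    text = str(game_id)
    if len(text) != 10 or not text.isdigit():
        raise ValueError(f"expected a ten-digit id, got {game_id!r}")
    return NativeGameId(int(text[:4]), int(text[4:6]), int(text[6:]))


parts = parse_native_game_id(2023030417)
print(parts, parts.season, GAME_TYPES.get(parts.game_type, "unknown"))
```

The derived `season` property rebuilds the same eight-digit code used for `SEASON`, which is handy when you only have a game id and need to call a season-scoped function.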
The premium native feed (nhl_*)
These wrappers hit the league's own api-web.nhle.com. They're first-party,
richly detailed, and return polars directly. Let's tour the headline calls.
Schedule
nhl_web_schedule(date='YYYY-MM-DD') returns a day's games with
home_team_* / away_team_* columns and the native id.
sched = safe('native schedule', lambda: nhl.nhl_web_schedule(date='2024-06-24'))
cols = ['id', 'game_state', 'home_team_abbrev', 'home_team_score',
        'away_team_abbrev', 'away_team_score']
(sched.select([c for c in cols if c in sched.columns]).head()
 if sched is not None else 'schedule unavailable')
Play-by-play
nhl_web_pbp(game_id=...) returns
one row per event in clean snake_case: type_desc_key, time_in_period,
period_descriptor_number, plus shot coordinates details_x_coord /
details_y_coord. That coordinate pair is your gateway to shot maps.
pbp = safe('native pbp', lambda: nhl.nhl_web_pbp(game_id=NATIVE_GAME))
if pbp is not None:
    print('pbp shape:', pbp.shape)
    show = ['period_descriptor_number', 'time_in_period', 'type_desc_key',
            'details_event_owner_team_id', 'details_x_coord', 'details_y_coord']
    out = pbp.select([c for c in show if c in pbp.columns]).head()
else:
    out = 'pbp unavailable'
out
# Event-type mix for the game; native uses `type_desc_key`
(pbp.group_by('type_desc_key').agg(pl.len().alias('events'))
    .sort('events', descending=True).head(10)
 if pbp is not None else 'pbp unavailable')
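As a first step from the coordinate pair toward a shot map, here is a rough sketch that turns one (x, y) into a distance and angle to the net. It rests on assumptions this guide does not state: that coordinates are in feet from centre ice, that each goal line sits 89 ft from centre along the x axis, and that every shot is aimed at the nearer net. `shot_geometry` and `GOAL_LINE_X` are hypothetical names.

```python
import math

# Assumption (not stated in this guide): feet from centre ice, goal lines
# 89 ft from centre along the x axis.
GOAL_LINE_X = 89.0


def shot_geometry(x, y):
    """Return (distance_ft, angle_deg) to the nearer net, or None without coordinates."""
    if x is None or y is None:
        return None  # many events (stoppages, period starts) carry no location
    dx = GOAL_LINE_X - abs(x)  # mirror both ends onto one attacking half
    distance = math.hypot(dx, y)
    angle = math.degrees(math.atan2(abs(y), dx))  # 0 degrees = straight on
    return distance, angle


# Placeholder coordinates for illustration, not values from a real game.
events = [
    {"details_x_coord": 59, "details_y_coord": -40},
    {"details_x_coord": None, "details_y_coord": None},
]
geometry = [shot_geometry(e["details_x_coord"], e["details_y_coord"]) for e in events]
print(geometry)
```

With a polars frame, `pbp.to_dicts()` yields rows in this shape. The nearer-net shortcut misjudges long shots taken from a team's own half, so a fuller version would use the attacking direction instead.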
Boxscore
nhl_boxscore(game_id=...) gives
one row per player (skaters + goalies) with home_away, position, and the
per-player stat line. Let's pull the night's top scorers.
box = safe('native boxscore', lambda: nhl.nhl_boxscore(game_id=NATIVE_GAME))
if box is not None:
    out = (box.filter(pl.col('position') != 'G')
              .select(['name_default', 'home_away', 'position',
                       'goals', 'assists', 'points', 'sog', 'toi'])
              .sort('points', descending=True).head())
else:
    out = 'boxscore unavailable'
out
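The toi column selected above is easier to average or sum once it is numeric. This sketch assumes toi arrives as an "MM:SS" string, which this guide does not confirm. `toi_to_seconds` and `seconds_to_toi` are hypothetical helpers and the sample values are placeholders.

```python
def toi_to_seconds(toi: str) -> int:
    """Convert an 'MM:SS' time-on-ice string to whole seconds."""
    minutes, seconds = toi.strip().split(":")
    return int(minutes) * 60 + int(seconds)


def seconds_to_toi(total: int) -> str:
    """Format whole seconds back into 'MM:SS'."""
    minutes, seconds = divmod(total, 60)
    return f"{minutes:02d}:{seconds:02d}"


# Placeholder values for illustration, not real player lines.
sample = ["21:34", "09:58"]
secs = [toi_to_seconds(t) for t in sample]
average = seconds_to_toi(sum(secs) // len(secs))
print(secs, average)
```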
Standings
nhl_standings(date='YYYY-MM-DD')
returns one row per team with conference/division context and points. Pass any
date to get the table as of that day.
standings = safe('native standings', lambda: nhl.nhl_standings(date='2024-04-15'))
if standings is not None:
    out = (standings.select(['team_name_default', 'conference_name', 'division_name',
                             'games_played', 'wins', 'losses', 'points'])
                    .sort('points', descending=True).head())
else:
    out = 'standings unavailable'
out
NHL EDGE: player & puck tracking
EDGE is the league's tracking layer: skating speed, shot speed, zone time,
skating distance, all measured by sensors. The *_detail calls return a
player's tracked values alongside the league average and percentile, and
the *_landing calls return wide leaderboard frames.
| Function | Tracking metric |
|---|---|
| nhl_edge_skater_skating_speed_detail | top speed, speed bursts, vs league avg |
| nhl_edge_skater_landing | skater leaders (hardest shot, top speed…) |
| nhl_edge_team_landing | team-level tracking leaders |
Here's Connor McDavid's (8478402) skating-speed detail for 2023-24. How does
the fastest man in the league stack up?
edge = safe('EDGE skating speed',
            lambda: nhl.nhl_edge_skater_skating_speed_detail(player_id=8478402, season=SEASON))
if edge is not None:
    keep = [c for c in (
        'skating_speed_details_max_skating_speed_imperial',
        'skating_speed_details_max_skating_speed_league_avg_imperial',
        'skating_speed_details_max_skating_speed_percentile',
        'skating_speed_details_bursts_over22_value',
        'skating_speed_details_bursts_over22_percentile',
    ) if c in edge.columns]
    out = edge.select(keep) if keep else edge.head()
else:
    out = 'EDGE detail unavailable'
out
Stats-REST & Records flat APIs
Two more first-party surfaces round out the kit:
- Stats-REST (api.nhle.com/stats/rest): clean leaderboard frames. nhl_stats_rest_leaders_skaters(attribute=...) returns a tidy top-10 for any attribute (goals, points, assists, …); nhl_stats_rest_leaders_goalies is the goalie twin.
- Records (records.nhl.com): historical reference data, e.g. nhl_records_franchises.
leaders = safe('stats-rest goal leaders',
               lambda: nhl.nhl_stats_rest_leaders_skaters(attribute='goals'))
if leaders is not None:
    keep = ['player_full_name', 'player_position_code', 'team_tri_code', 'goals']
    out = leaders.select([c for c in keep if c in leaders.columns]).head(10)
else:
    out = 'leaders unavailable'
out
Cookbook: common NHL tasks
Now the fun part: a dozen recipes you'll reach for constantly, almost all
built on the premium native feed. Each one is a copy-paste starting
point: a game pull, a team view, a player line, leaderboards, splits,
joins, season-to-date aggregates, the draft board, franchise history, and
EDGE tracking, with every call wrapped in safe() so an offseason or a
throttle never costs you a traceback.
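Since throttling is often temporary, one possible extension of safe() is to retry a few times with a growing pause before giving up. This is a hedged sketch only: `safe_retry` is a hypothetical helper, not part of sportsdataverse, and the attempt count and delays are arbitrary defaults. The `sleep` parameter is injectable so the demo runs without actually waiting.

```python
import time


def safe_retry(label, thunk, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call thunk up to `attempts` times, doubling the pause after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            out = thunk()
            print(f"ok: {label} (attempt {attempt})")
            return out
        except Exception as e:  # noqa: BLE001 -- demo resilience, as in safe()
            print(f"{label}: attempt {attempt} failed ({type(e).__name__})")
            if attempt < attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    return None  # same contract as safe(): None when the feed never answered


# Demo with a stand-in thunk that fails twice, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("throttled")
    return "frame"


pauses = []
result = safe_retry("demo feed", flaky, sleep=pauses.append)
print(result, pauses)
```

Because it keeps the same return contract as safe(), any recipe can swap it in, for example `safe_retry('native pbp', lambda: nhl.nhl_web_pbp(game_id=NATIVE_GAME))`.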
Recipe 1: A game's boxscore + play-by-play
Grab a game_id from nhl_web_schedule, then pull the nhl_boxscore and
nhl_web_pbp together: the box for the player stat lines, the pbp for the
event stream.
if sched is not None and sched.height:
    gid = int(sched['id'][0])
    r_box = safe(f'boxscore {gid}', lambda: nhl.nhl_boxscore(game_id=gid))
    r_pbp = safe(f'pbp {gid}', lambda: nhl.nhl_web_pbp(game_id=gid))
    print('players in box:', None if r_box is None else r_box.height,
          '| pbp events:', None if r_pbp is None else r_pbp.height)
else:
    print('no schedule rows to pick a game_id from')
Recipe 2: A team, its schedule & its roster
Use the team tri-code (e.g. FLA) with nhl_club_schedule_season for the
full slate and nhl_roster for the player list.
TEAM = 'FLA'
club_sched = safe(f'{TEAM} schedule',
                  lambda: nhl.nhl_club_schedule_season(team=TEAM, season=SEASON))
roster = safe(f'{TEAM} roster', lambda: nhl.nhl_roster(team=TEAM, season=SEASON))
print('games:', None if club_sched is None else club_sched.height,
      '| roster size:', None if roster is None else roster.height)
if roster is not None and roster.height:
    cols = ['id', 'first_name_default', 'last_name_default',
            'sweater_number', 'position_code', 'shoots_catches']
    out = roster.select([c for c in cols if c in roster.columns]).head()
else:
    out = 'roster unavailable'
out
Recipe 3: A player's game log + the league leaderboard
Pair a single player's nhl_player_game_log (game-by-game) with the
season-wide nhl_skater_leaders board to see where they rank.
McDavid is 8478402.
gamelog = safe('McDavid game log',
               lambda: nhl.nhl_player_game_log(player_id=8478402, season=SEASON))
if gamelog is not None and gamelog.height:
    cols = ['game_date', 'opponent_abbrev', 'goals', 'assists', 'points', 'shots', 'toi']
    out = gamelog.select([c for c in cols if c in gamelog.columns]).head()
else:
    out = 'game log unavailable'
out
board = safe('skater leaders', lambda: nhl.nhl_skater_leaders(season=SEASON))
if board is not None and board.height:
    cols = ['category', 'first_name_default', 'last_name_default', 'team_abbrev', 'value']
    out = board.select([c for c in cols if c in board.columns]).head(10)
else:
    out = 'leaders unavailable'
out
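To turn the leaderboard into an actual rank for one player, a small helper over plain rows may be enough. This is a sketch under assumptions: it reuses the category, last_name_default, and value column names from the cell above, treats a higher value as better, and runs on made-up placeholder rows rather than real stats. `rank_in_category` is a hypothetical name.

```python
def rank_in_category(rows, category, last_name):
    """1-based competition rank (ties share a rank), or None if the player is absent."""
    in_cat = [r for r in rows if r["category"] == category]
    mine = [r["value"] for r in in_cat if r["last_name_default"] == last_name]
    if not mine:
        return None
    # Rank = one more than the number of players with a strictly higher value.
    return 1 + sum(r["value"] > mine[0] for r in in_cat)


# Placeholder rows for illustration only.
toy_board = [
    {"category": "points", "last_name_default": "Skater A", "value": 100},
    {"category": "points", "last_name_default": "Skater B", "value": 90},
    {"category": "points", "last_name_default": "Skater C", "value": 90},
    {"category": "points", "last_name_default": "Skater D", "value": 80},
    {"category": "goals", "last_name_default": "Skater D", "value": 50},
]
print(rank_in_category(toy_board, "points", "Skater C"))
```

With a polars frame, `board.to_dicts()` produces rows in this shape. Matching on last name alone is a simplification; a player id column, if the frame has one, would be the safer key.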
Recipe 4: An EDGE tracking leaderboard
nhl_edge_skater_landing
returns a wide single-row frame of EDGE leaders: hardest shot, fastest
skater, and more. Here we surface who owned the hardest shot in 2023-24.
el = safe('EDGE skater leaders', lambda: nhl.nhl_edge_skater_landing(season=SEASON))
if el is not None:
    keep = [c for c in el.columns if c.startswith('leaders_hardest_shot_player_')
            and ('first_name' in c or 'last_name' in c or 'team_abbrev' in c
                 or c.endswith('position'))]
    out = el.select(keep) if keep else el.head()
else:
    out = 'EDGE leaders unavailable'
out
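Because the landing frame is one wide row, a prefix filter like the one above generalizes into a small helper that pulls any leader's fields into a compact dict. This is a sketch on a stand-in row: only the `leaders_hardest_shot_player_` prefix comes from the cell above, while the exact column suffixes, the second prefix, and all values are placeholders, and `fields_with_prefix` is a hypothetical name.

```python
def fields_with_prefix(row: dict, prefix: str) -> dict:
    """Keep the keys that start with `prefix`, with the prefix stripped off."""
    return {k[len(prefix):]: v for k, v in row.items() if k.startswith(prefix)}


# Stand-in for one wide row; column suffixes and values are placeholders.
toy_row = {
    "leaders_hardest_shot_player_first_name_default": "Example",
    "leaders_hardest_shot_player_last_name_default": "Skater",
    "leaders_hardest_shot_player_team_abbrev": "XXX",
    "leaders_other_metric_player_team_abbrev": "YYY",  # hypothetical second leader
}
hardest = fields_with_prefix(toy_row, "leaders_hardest_shot_player_")
print(hardest)
```

With polars, `el.row(0, named=True)` returns the first row as a dict that can be passed straight in.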
Recipe 5: Who's hot? Standings by last-10 form
The native nhl_standings
frame carries rich split columns, l10_* (last ten games) and streak_*,
so you can rank teams by recent form instead of season-long points.
hot = safe('standings as-of date',
           lambda: nhl.nhl_standings(date='2024-04-15'))
if hot is not None and hot.height:
    cols = ['team_name_default', 'l10_wins', 'l10_losses', 'l10_ot_losses',
            'l10_points', 'streak_code', 'streak_count', 'points']
    out = (hot.select([c for c in cols if c in hot.columns])
              .sort(['l10_points', 'points'], descending=True).head(8))
else:
    out = 'standings unavailable'
out
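For intuition about what a last-10 points figure measures, the usual NHL arithmetic is two points per win and one per overtime or shootout loss. The sketch below applies that to made-up records and then ranks by recent form with season points as the tie-breaker, mirroring the two-key sort above. That l10_points is computed this way is an assumption, and the helper names and team rows are placeholders.

```python
def standings_points(wins: int, ot_losses: int) -> int:
    """Two points per win, one per overtime/shootout loss."""
    return 2 * wins + ot_losses


def points_pct(wins: int, losses: int, ot_losses: int) -> float:
    """Share of available points earned over the span."""
    games = wins + losses + ot_losses
    return standings_points(wins, ot_losses) / (2 * games) if games else 0.0


# Placeholder teams for illustration only.
teams = [
    {"team": "Team A", "l10_wins": 7, "l10_losses": 2, "l10_ot_losses": 1, "points": 98},
    {"team": "Team B", "l10_wins": 7, "l10_losses": 2, "l10_ot_losses": 1, "points": 104},
    {"team": "Team C", "l10_wins": 8, "l10_losses": 2, "l10_ot_losses": 0, "points": 90},
]
for t in teams:
    t["l10_points"] = standings_points(t["l10_wins"], t["l10_ot_losses"])

# Recent form first, season points as the tie-breaker.
ranked = sorted(teams, key=lambda t: (t["l10_points"], t["points"]), reverse=True)
print([t["team"] for t in ranked])
```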
Recipe 6: A whole team's stat lines in one call
nhl_club_stats returns a
dict with skaters and goalies frames: the entire roster's season
totals, no looping over players. Here are the Panthers' top point-getters.
cs = safe('FLA club stats',
          lambda: nhl.nhl_club_stats(team='FLA', season=SEASON))
if isinstance(cs, dict) and isinstance(cs.get('skaters'), pl.DataFrame) and cs['skaters'].height:
    sk = cs['skaters']
    cols = ['first_name_default', 'last_name_default', 'position_code',
            'games_played', 'goals', 'assists', 'points', 'shots',
            'avg_time_on_ice_per_game']
    out = (sk.select([c for c in cols if c in sk.columns])
             .sort('points', descending=True).head(8))
else:
    out = 'club stats unavailable'
out
Recipe 7: Goalie leaderboard + a netminder's bio
Pair the season-wide nhl_goalie_leaders board (it bundles wins, save %,
GAA and shutouts in one frame, tagged by category) with a single goalie's
nhl_player_landing bio card.
gboard = safe('goalie leaders', lambda: nhl.nhl_goalie_leaders(season=SEASON))
if gboard is not None and gboard.height:
    cols = ['category', 'first_name_default', 'last_name_default',
            'team_abbrev', 'value']
    out = (gboard.filter(pl.col('category') == 'wins')
                 .select([c for c in cols if c in gboard.columns])
                 .sort('value', descending=True).head(5)
           if 'category' in gboard.columns else gboard.head())
else:
    out = 'goalie leaders unavailable'
out
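Because the leaderboard above is long-format (one row per goalie per category), it can help to pivot it into one record per goalie. Here is a minimal sketch over plain rows. The category labels other than wins, the goalie names, and the values are placeholders, and `pivot_leaders` is a hypothetical helper.

```python
from collections import defaultdict


def pivot_leaders(rows, key="last_name_default"):
    """Collapse category-tagged rows into {player: {category: value}}."""
    wide = defaultdict(dict)
    for r in rows:
        wide[r[key]][r["category"]] = r["value"]
    return dict(wide)


# Placeholder rows for illustration only.
toy_goalies = [
    {"category": "wins", "last_name_default": "Goalie A", "value": 40},
    {"category": "shutouts", "last_name_default": "Goalie A", "value": 5},
    {"category": "wins", "last_name_default": "Goalie B", "value": 38},
]
wide = pivot_leaders(toy_goalies)
print(wide)
```

A goalie who appears in only some categories simply gets a shorter inner dict, so look values up with `.get()` when comparing across categories.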
# Bobrovsky's bio card (player_id 8475683): one wide row
bio = safe('goalie landing', lambda: nhl.nhl_player_landing(player_id=8475683))
if bio is not None and bio.height:
    cols = ['first_name_default', 'last_name_default', 'position',
            'current_team_abbrev', 'height_in_inches', 'weight_in_pounds',
            'birth_city_default', 'birth_country', 'draft_details_year',
            'draft_details_overall_pick']
    out = bio.select([c for c in cols if c in bio.columns])
else:
    out = 'player landing unavailable'
out