
🏒 The PWHL with sportsdataverse-py

Welcome to professional women's hockey! The Professional Women's Hockey League (PWHL) dropped its first puck in January 2024 with six clubs (Boston, Minnesota, Montréal, New York, Ottawa and Toronto), and it's been must-watch hockey ever since. 🎉

sportsdataverse.pwhl gives you the whole league two ways:

  1. 📦 load_pwhl_* release loaders: fast, reliable parquet snapshots (schedules, boxscores, play-by-play, scoring & penalty summaries, rosters). Perfect for season-long analysis, and they work great offline.
  2. 🛰️ pwhl_* live wrappers + analytics: straight off the HockeyTech stats feed (standings, leaders, rosters, stats, single-game PBP) plus derived on-ice metrics (Corsi, time-on-ice, shifts).

And the best part: no API key needed. The public HockeyTech client key ships with the package. R companion: fastRhockey. Let's drop the puck! 🥅

🧰 The toolbox​

Everything returns a tidy polars DataFrame by default β€” pass return_as_pandas=True for pandas. The πŸ“¦ loaders read pre-built release parquets (one season per call); the πŸ›°οΈ live wrappers hit the HockeyTech API in real time. Both are premium PWHL sources. Click any name for the full reference:

| Function | What it gives you | Source |
| --- | --- | --- |
| load_pwhl_schedule | Games + results, one row per game | 📦 loader |
| load_pwhl_rosters | One row per player per team (skaters + goalies) | 📦 loader |
| load_pwhl_skater_box | Skater boxscore, one row per player per game | 📦 loader |
| load_pwhl_goalie_box | Goalie boxscore (saves, shots against, GAA inputs) | 📦 loader |
| load_pwhl_team_box | Team boxscore (shots, PP, faceoffs) | 📦 loader |
| load_pwhl_pbp | Event-level play-by-play (wide, with coordinates) | 📦 loader |
| load_pwhl_scoring_summary | Tidy goal log (scorer + assists + situation flags) | 📦 loader |
| load_pwhl_penalty_summary | Tidy penalty log (infraction, minutes, who took it) | 📦 loader |
| load_pwhl_shots_by_period | Per-period shot & goal totals per game | 📦 loader |
| load_pwhl_three_stars | Post-game three-star selections | 📦 loader |
| pwhl_schedule | Live schedule, one row per game | 🛰️ live |
| pwhl_standings | Live standings, one row per team | 🛰️ live |
| pwhl_teams | Teams in a season (grab team_ids) | 🛰️ live |
| pwhl_team_roster | A team's roster | 🛰️ live |
| pwhl_leaders | Statistical leaders | 🛰️ live |
| pwhl_stats | Aggregate skater/goalie stats | 🛰️ live |
| pwhl_player_search | Find a player_id by name | 🛰️ live |
| pwhl_player_stats | A player's season-by-season stat lines | 🛰️ live |
| pwhl_pbp | Enriched single-game play-by-play | 🛰️ live |
| pwhl_game_corsi | On-ice Corsi / Fenwick per player | 🛰️ live |
| pwhl_player_toi | Time-on-ice per player | 🛰️ live |
| pwhl_game_shifts | Raw shift stints | 🛰️ live |
| most_recent_pwhl_season · pwhl_season_id | Season helpers | 🛰️ live |

🔌 Setup

pip install sportsdataverse

No key, no config: just import and go.

import polars as pl
import sportsdataverse.pwhl as pwhl

# The inaugural season is 2024; this helper tracks the latest known season.
print("most recent PWHL season:", pwhl.most_recent_pwhl_season())

The 🛰️ live HockeyTech feed is seasonal and occasionally rate-limited, so a tiny safe() helper runs those calls defensively: you get the frame when the feed is up, and a friendly one-liner when it isn't (never a scary traceback). The 📦 loaders read release parquets and are rock-solid, so they don't need the wrapper. 🛟

def safe(label, thunk):
    try:
        out = thunk()
        print(f"✅ {label}")
        return out
    except Exception as e:  # noqa: BLE001 -- demo resilience
        print(f"⏭️ {label}: unavailable right now ({type(e).__name__})")
        return None
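
Because the feed is occasionally rate-limited, a single attempt can fail where a second try a moment later would succeed. The sketch below is a hypothetical variant, safe_retry (not part of sportsdataverse), that retries a thunk a few times with exponentially growing pauses before giving up. The attempt count and delays are illustrative assumptions, and the demo uses a simulated failure rather than a real feed call.

```python
import time


def safe_retry(label, thunk, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Hypothetical helper (not part of sportsdataverse): retry a flaky call.

    Calls thunk() up to `attempts` times, pausing base_delay * 2**i seconds
    after the i-th failure. Returns the result, or None if every try fails.
    """
    for i in range(attempts):
        try:
            out = thunk()
            print(f"[ok] {label} (attempt {i + 1})")
            return out
        except Exception as e:  # noqa: BLE001 -- demo resilience
            if i < attempts - 1:
                sleep(base_delay * 2 ** i)  # back off: 1s, 2s, 4s, ...
            else:
                print(f"[skip] {label}: unavailable after {attempts} tries "
                      f"({type(e).__name__})")
    return None


# Toy demonstration: a thunk that fails twice, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated rate limit")
    return "frame"


result = safe_retry("demo", flaky, base_delay=0.0)
```

In a notebook you would pass a real call as the thunk, exactly as with safe(); keeping the delay configurable makes the helper easy to exercise without waiting.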

📅 The schedule (loader)

load_pwhl_schedule returns one row per game with the result and a set of flag/URL columns pointing at the per-game feeds. Pass seasons=[2024] (a list, so you can stack multiple seasons). ⚠️ Heads up: home_score/away_score come back as strings, so cast them before doing arithmetic.

schedule = pwhl.load_pwhl_schedule(seasons=[2024])
schedule.shape
schedule.select([
    'game_id', 'game_date', 'home_team', 'away_team',
    'home_score', 'away_score', 'winner', 'game_type',
]).head()

👥 Rosters (loader)

load_pwhl_rosters gives one row per player per team, split into skaters and goalies via the player_type column.

rosters = pwhl.load_pwhl_rosters(seasons=[2024])
rosters.select([
    'team', 'team_abbr', 'player_type', 'first_name', 'last_name',
    'jersey_number', 'position',
]).head()

📊 Boxscores (loader)

Boxscores come in three flavours (team_box, skater_box, and goalie_box), each with one row per team or player per game.

| Function | One row per… |
| --- | --- |
| load_pwhl_team_box | team per game |
| load_pwhl_skater_box | skater per game |
| load_pwhl_goalie_box | goalie per game |

skater_box = pwhl.load_pwhl_skater_box(seasons=[2024])
skater_box.select([
    'game_id', 'first_name', 'last_name', 'position',
    'goals', 'assists', 'points', 'shots', 'plus_minus', 'time_on_ice',
]).head()

goalie_box = pwhl.load_pwhl_goalie_box(seasons=[2024])
goalie_box.select([
    'game_id', 'first_name', 'last_name',
    'saves', 'shots_against', 'goals_against', 'time_on_ice',
]).head()

🎬 Play-by-play (loader)​

load_pwhl_pbp returns a wide event log. The event column tags each row as faceoff, shot, goal, or penalty β€” and there are several coordinate systems (x_coord/y_coord plus rink-normalized *_fixed / *_right variants) for drawing rink plots.

pbp = pwhl.load_pwhl_pbp(seasons=[2024])
pbp.shape
(pbp
.group_by('event')
.agg(pl.len().alias('events'))
.sort('events', descending=True))

🍳 Cookbook: common PWHL tasks​

Now the fun part β€” a baker's dozen of recipes you'll reach for constantly. Recipes 1–11 lean on the rock-solid πŸ“¦ loaders (great offline); recipes 12–13 tour the πŸ›°οΈ live wrappers, wrapped in safe() so an offseason or a flaky feed never breaks your run. Every recipe ends in a tidy, ready-to-read frame.

Recipe 1: Standings from the schedule 🏆

No extra call is needed for a quick standings table: the schedule's winner column makes a regular-season win count a one-liner.

(schedule
    .filter(pl.col('game_type') == 'regular')
    .group_by('winner')
    .agg(pl.len().alias('wins'))
    .sort('wins', descending=True))
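
A raw win count is not quite a standings table: the PWHL awards 3 points for a regulation win, 2 for an overtime or shootout win, 1 for an overtime or shootout loss, and 0 for a regulation loss. Here is a minimal plain-Python sketch of that tally on made-up games. The decided_in field is a hypothetical stand-in for however the schedule marks overtime results, so check the real columns before adapting it.

```python
from collections import defaultdict

# Made-up results for illustration only; scores are strings, as in the loader.
# `decided_in` is a hypothetical field: 'REG', 'OT', or 'SO'.
games = [
    {"home_team": "Boston", "away_team": "Toronto",
     "home_score": "3", "away_score": "2", "decided_in": "OT"},
    {"home_team": "Ottawa", "away_team": "Boston",
     "home_score": "1", "away_score": "4", "decided_in": "REG"},
    {"home_team": "Toronto", "away_team": "Ottawa",
     "home_score": "2", "away_score": "1", "decided_in": "SO"},
]


def pwhl_points(games):
    """Tally 3-2-1-0 points per team from a list of game dicts."""
    points = defaultdict(int)
    for g in games:
        home, away = int(g["home_score"]), int(g["away_score"])
        if home > away:
            winner, loser = g["home_team"], g["away_team"]
        else:
            winner, loser = g["away_team"], g["home_team"]
        regulation = g["decided_in"] == "REG"
        points[winner] += 3 if regulation else 2
        points[loser] += 0 if regulation else 1
    # Highest points first; ties broken alphabetically for a stable order.
    return sorted(points.items(), key=lambda kv: (-kv[1], kv[0]))


table = pwhl_points(games)
for team, pts in table:
    print(f"{team:<10} {pts}")
```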

Recipe 2: Season scoring leaders 🥇

Aggregate the skater boxscore across every game to build a points leaderboard: the inaugural-season top of the table.

(skater_box
    .group_by(['player_id', 'first_name', 'last_name'])
    .agg(
        pl.col('goals').sum().alias('goals'),
        pl.col('assists').sum().alias('assists'),
        pl.col('points').sum().alias('points'),
    )
    .sort('points', descending=True)
    .select(['first_name', 'last_name', 'goals', 'assists', 'points'])
    .head(10))

Recipe 3: Goalie save-percentage leaders 🧤

Sum saves and shots-against from the goalie boxscore, then compute a season save percentage. We require a minimum shot volume so a one-game cameo doesn't top the list.

(goalie_box
    .group_by(['player_id', 'first_name', 'last_name'])
    .agg(
        pl.col('saves').sum().alias('saves'),
        pl.col('shots_against').sum().alias('shots_against'),
        pl.col('goals_against').sum().alias('goals_against'),
    )
    .filter(pl.col('shots_against') >= 100)
    .with_columns(
        (pl.col('saves') / pl.col('shots_against')).round(3).alias('save_pct')
    )
    .sort('save_pct', descending=True)
    .select(['first_name', 'last_name', 'shots_against', 'goals_against', 'save_pct'])
    .head(10))
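
The goalie boxscore's goals_against and time_on_ice columns are also the inputs for goals-against average: GAA = goals against × 60 / minutes played. A small plain-Python sketch of that formula follows. It assumes time_on_ice arrives as an 'MM:SS' string (an assumption, so inspect the column first) and uses made-up numbers.

```python
def toi_to_seconds(toi):
    """Convert an 'MM:SS' string (assumed format) to total seconds."""
    minutes, seconds = toi.split(":")
    return int(minutes) * 60 + int(seconds)


def goals_against_average(goals_against, toi_strings):
    """GAA = goals against * 60 / minutes played, over a list of games."""
    total_seconds = sum(toi_to_seconds(t) for t in toi_strings)
    if total_seconds == 0:
        return None  # no ice time, so GAA is undefined
    return round(goals_against * 3600 / total_seconds, 2)


# Made-up example: 5 goals against across two full 60-minute games.
gaa = goals_against_average(5, ["60:00", "60:00"])
print(gaa)  # 2.5
```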

Recipe 4: Biggest blowouts of the season 💥

Cast the string scores to integers, compute the margin, and sort; the season's most lopsided games fall right out.

(schedule
    .with_columns(
        pl.col('home_score').cast(pl.Int32),
        pl.col('away_score').cast(pl.Int32),
    )
    .with_columns(
        (pl.col('home_score') - pl.col('away_score')).abs().alias('margin')
    )
    .sort('margin', descending=True)
    .select(['game_date', 'home_team', 'home_score',
             'away_score', 'away_team', 'winner', 'margin'])
    .head(10))
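
A strict integer cast raises if any score string is empty or non-numeric, which could happen for games that have not been played yet (an assumption about the data, not something confirmed here). The sketch below shows the defensive idea in plain Python on made-up rows; in polars the comparable switch is cast(pl.Int32, strict=False), which yields nulls instead of raising.

```python
def parse_score(raw):
    """Return the score as an int, or None if the string is not numeric."""
    text = (raw or "").strip()
    return int(text) if text.isdecimal() else None


# Made-up rows; the last one mimics a game with no final score yet.
rows = [
    {"home_team": "Minnesota", "home_score": "6", "away_score": "1"},
    {"home_team": "New York", "home_score": "2", "away_score": "3"},
    {"home_team": "Ottawa", "home_score": "", "away_score": ""},
]

margins = []
for row in rows:
    home = parse_score(row["home_score"])
    away = parse_score(row["away_score"])
    # Record None for games without two usable scores instead of crashing.
    margins.append(None if home is None or away is None else abs(home - away))

print(margins)  # [5, 1, None]
```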

Recipe 5: Team offense: shots & shooting % ⚡

Roll the skater boxscore up to the club level for a quick offensive profile: total goals, shot volume, and finishing rate. The team boxscore is used only as a team_id to team_abbr lookup.

# Map each team_id to its abbreviation (both Int32-keyed), then roll up the
# skater box to the club level for a quick offensive profile.
team_lookup = (pwhl.load_pwhl_team_box(seasons=[2024])
    .select(['team_id', 'team_abbr']).unique())

(skater_box
    .join(team_lookup, on='team_id', how='left')
    .group_by('team_abbr')
    .agg(
        pl.col('goals').sum().alias('goals'),
        pl.col('shots').sum().alias('shots'),
    )
    .with_columns(
        (pl.col('goals') / pl.col('shots') * 100).round(1).alias('shooting_pct')
    )
    .filter(pl.col('team_abbr').is_not_null())
    .sort('goals', descending=True))

Recipe 6: Power-play conversion leaders 🔌

The team boxscore carries pp_goals and pp_opportunities, so a season power-play percentage is a single division.

team_box = pwhl.load_pwhl_team_box(seasons=[2024])

(team_box
    .group_by('team_abbr')
    .agg(
        pl.col('pp_goals').sum().alias('pp_goals'),
        pl.col('pp_opportunities').sum().alias('pp_opportunities'),
    )
    .with_columns(
        (pl.col('pp_goals') / pl.col('pp_opportunities') * 100).round(1).alias('pp_pct')
    )
    .sort('pp_pct', descending=True))

Recipe 7: Faceoff specialists 🎯

The skater boxscore tracks faceoff wins and attempts. Aggregate, gate on a minimum-draw threshold, and the dot-dominators rise to the top.

(skater_box
    .group_by(['first_name', 'last_name'])
    .agg(
        pl.col('faceoff_wins').sum().alias('fo_wins'),
        pl.col('faceoff_attempts').sum().alias('fo_attempts'),
    )
    .filter(pl.col('fo_attempts') >= 200)
    .with_columns(
        (pl.col('fo_wins') / pl.col('fo_attempts') * 100).round(1).alias('fo_pct')
    )
    .sort('fo_pct', descending=True)
    .head(10))

Recipe 8: Two-way workhorses: hits + blocks 🧱

Not every contribution shows up on the scoresheet. Sum hits and blocked shots from the skater box to surface the players doing the dirty work; defenders usually own this list.

(skater_box
    .group_by(['first_name', 'last_name', 'position'])
    .agg(
        pl.col('hits').sum().alias('hits'),
        pl.col('blocked_shots').sum().alias('blocks'),
    )
    .with_columns(
        (pl.col('hits') + pl.col('blocks')).alias('hits_plus_blocks')
    )
    .sort('hits_plus_blocks', descending=True)
    .head(10))

Recipe 9: The penalty box 🚨

load_pwhl_penalty_summary is a tidy per-infraction log. Two quick cuts: the most common infractions league-wide, and the players spending the most time in the box.

penalties = pwhl.load_pwhl_penalty_summary(seasons=[2024])

# Most common infractions
top_infractions = (penalties
    .group_by('description')
    .agg(pl.len().alias('count'))
    .sort('count', descending=True)
    .head(8))
top_infractions

# PIM leaders (players who actually took the penalty)
(penalties
    .filter(pl.col('taken_by_last').is_not_null())
    .group_by(['taken_by_first', 'taken_by_last'])
    .agg(
        pl.col('minutes').sum().alias('pim'),
        pl.len().alias('penalties'),
    )
    .sort('pim', descending=True)
    .head(10))

Recipe 10: When do goals get scored? ⏱️

Slice the goal log out of the play-by-play and bucket it by period, and pull the league's top finishers straight from the event == 'goal' rows while you're there.

goal_events = pbp.filter(pl.col('event') == 'goal')

# Goals by period
goals_by_period = (goal_events
    .group_by('period_of_game')
    .agg(pl.len().alias('goals'))
    .sort('period_of_game'))
goals_by_period

# Top goal-scorers from the play-by-play feed
(goal_events
    .filter(pl.col('player_name_last').is_not_null())
    .group_by(['player_name_first', 'player_name_last'])
    .agg(pl.len().alias('goals'))
    .sort('goals', descending=True)
    .head(10))

Recipe 11: Three-stars honour roll ⭐ and a head-to-head series

Two compact queries. First, who collected the most first-star nods (load_pwhl_three_stars). Then a head-to-head series view from the schedule; swap in any two clubs.

three_stars = pwhl.load_pwhl_three_stars(seasons=[2024])

# First-star honour roll
(three_stars
    .filter(pl.col('star') == 1)
    .group_by(['first_name', 'last_name'])
    .agg(pl.len().alias('first_stars'))
    .sort('first_stars', descending=True)
    .head(10))

# Head-to-head: Boston vs. Montreal, every meeting in 2024
# (match the spellings to the home_team / away_team values in your data)
A, B = 'Boston', 'Montreal'
(schedule
    .filter(
        ((pl.col('home_team') == A) & (pl.col('away_team') == B)) |
        ((pl.col('home_team') == B) & (pl.col('away_team') == A))
    )
    .select(['game_date', 'home_team', 'home_score',
             'away_score', 'away_team', 'winner', 'game_status']))

Recipe 12: Find a player, then pull her career lines 🛰️🔎

A classic two-step lookup off the live feed: pwhl_player_search resolves a name to a player_id, then pwhl_player_stats returns her season-by-season stat lines. Both are safe()-wrapped for offseason resilience.

hit = safe('player search: Spooner', lambda: pwhl.pwhl_player_search('Spooner'))
if hit is not None and getattr(hit, 'height', 0):
    pid = int(hit['player_id'][0])
    career = safe(f'player stats {pid}', lambda: pwhl.pwhl_player_stats(player_id=pid))
    if career is not None and career.height:
        keep = [c for c in ['season_name', 'team_code', 'games_played',
                            'goals', 'assists', 'points', 'points_per_game']
                if c in career.columns]
        out = career.select(keep)
    else:
        out = 'player stats feed unavailable right now'
else:
    out = 'player search feed unavailable right now'
out

Recipe 13: A team, its roster, and a game's PBP + Corsi 🛰️📈

The full live tour. List teams with pwhl_teams, grab a team_id, pull the roster with pwhl_team_roster, take a game_id from the loader schedule, then fetch enriched events with pwhl_pbp and shot-attempt share with pwhl_game_corsi, all from the same feed. Everything is safe()-wrapped, so offline this prints a friendly note instead of raising.

teams = safe('PWHL teams', lambda: pwhl.pwhl_teams(season=2024))
if teams is not None and teams.height:
    tid = int(teams['team_id'][0])
    roster = safe(f'PWHL roster {tid}', lambda: pwhl.pwhl_team_roster(team_id=tid, season=2024))
    out = (roster.select([c for c in ['first_name', 'last_name', 'position', 'jersey_number']
                          if c in roster.columns]).head()
           if roster is not None else teams.head())
else:
    out = 'teams feed unavailable right now'
out

# A game_id from the loader schedule (offline-safe), then enrich it live.
gid = int(schedule['game_id'][0])
pbp_live = safe(f'PWHL pbp {gid}', lambda: pwhl.pwhl_pbp(game_id=gid))
corsi = safe(f'PWHL corsi {gid}', lambda: pwhl.pwhl_game_corsi(game_id=gid))
print('live pbp rows:', None if pbp_live is None else pbp_live.height,
      '| corsi rows:', None if corsi is None else corsi.height)

🛰️ Live standings & leaders

Straight off the HockeyTech feed: pwhl_standings for the live table and pwhl_leaders for the statistical leaderboard. Both take a season end-year. We keep them safe()-wrapped because live endpoints are seasonal.

standings = safe('PWHL standings', lambda: pwhl.pwhl_standings(season=2024))
if standings is not None and standings.height:
    keep = [c for c in ['team', 'team_code', 'games_played', 'wins', 'losses', 'points']
            if c in standings.columns]
    out = standings.select(keep).head(10)
else:
    out = 'standings feed unavailable right now'
out

leaders = safe('PWHL leaders', lambda: pwhl.pwhl_leaders(season=2024))
if leaders is not None and getattr(leaders, 'height', 0):
    keep = [c for c in ['rank', 'name', 'team_code', 'stat_formatted', 'type_formatted']
            if c in leaders.columns]
    out = leaders.select(keep).head(10)
else:
    out = 'leaders feed unavailable right now'
out

🥅 On-ice analytics

Beyond the box score, three analytics helpers derive advanced metrics from the same shift + play-by-play feed:

| Function | Metric |
| --- | --- |
| pwhl_game_corsi | Corsi / Fenwick shot-attempt share, with per-60 rates |
| pwhl_player_toi | Summed time-on-ice + shift counts per player |
| pwhl_game_shifts | Raw shift stints (who's on the ice, when) |

⚠️ Corsi note: the HockeyTech feed has no missed-shot event, so Corsi and Fenwick here are proxies counting shots + blocked shots + goals only (corsi_includes_missed = False).

toi = safe(f'PWHL TOI {gid}', lambda: pwhl.pwhl_player_toi(game_id=gid))
if toi is not None and toi.height:
    out = (toi.select([c for c in ['first_name', 'last_name', 'toi_seconds', 'num_shifts']
                       if c in toi.columns])
           .sort('toi_seconds', descending=True).head())
else:
    out = 'time-on-ice feed unavailable right now'
out

if corsi is not None and corsi.height:
    out = (corsi
           .with_columns((pl.col('corsi_for') - pl.col('corsi_against')).alias('corsi_net'))
           .select([c for c in ['player_id', 'corsi_for', 'corsi_against', 'corsi_net', 'corsi_for_per60']
                    if c in corsi.columns])
           .sort('corsi_for_per60', descending=True)
           .head())
else:
    out = 'corsi feed unavailable right now'
out
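
Raw toi_seconds values are easier to scan as clock time. A tiny formatting helper (hypothetical name, plain Python) that could be applied to that column:

```python
def format_toi(seconds):
    """Render a number of seconds as 'MM:SS' (minutes are not capped at 59)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


print(format_toi(1335))  # 22:15
```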

✨ Bonus: tidy goal log + pandas interop​

load_pwhl_scoring_summary is a clean per-goal log β€” scorer plus up to two assists, with situation flags like power play, short handed, and game-winning. And because every loader takes return_as_pandas=True, dropping into the pandas world is one keyword away.

scoring = pwhl.load_pwhl_scoring_summary(seasons=[2024])
scoring.select([
'game_id', 'period', 'time', 'team_abbr',
'scorer_first', 'scorer_last', 'is_power_play', 'is_game_winning',
]).head()
# Same skater box, but as a pandas DataFrame β€” group with the pandas API.
skater_pd = pwhl.load_pwhl_skater_box(seasons=[2024], return_as_pandas=True)
print('type:', type(skater_pd).__name__, '| shape:', skater_pd.shape)
(skater_pd
.groupby(['first_name', 'last_name'], as_index=False)['points'].sum()
.sort_values('points', ascending=False)
.head(10))

🎉 Where to next

  • 📦 Loaders are your offline-friendly workhorses: stack seasons with seasons=[2024, 2025] and pass return_as_pandas=True for pandas.
  • 🛰️ Live wrappers (pwhl_*) pull fresh data and add analytics (Corsi, TOI, shifts), with no key required.
  • Full reference: the PWHL → Loaders and Additional functions pages in the sidebar.
  • Junior & minor hockey? The same HockeyTech surface powers the AHL / OHL / WHL / QMJHL; see 11_junior_hockey_intro.ipynb.
  • The men's game and the modern NHL APIs live in 07_nhl_intro.ipynb.
  • R user? The same data lives in fastRhockey (NHL + PWHL).

Now go tell the story of the PWHL: the data's all here. 🏒💜