Cricket with sportsdataverse-py
ESPN carries live scorecards, standings, and full match summaries for the world's
most widely played bat-and-ball game. sportsdataverse-py wraps that surface through
the sportsdataverse.cricket module: one league= slug away from IPL, England's
county circuit, the ICC World Cup, and every other tournament ESPN indexes.
Cricket in 30 seconds
If you're new to cricket, the key numbers to know:
| Concept | What it means in the data |
|---|---|
| Innings | A team's turn to bat; T20 and ODI matches have one per team, Tests have up to two |
| Score string | "161/5 (18/20 ov, target 156)": runs / wickets (overs used / overs allowed, target) |
| Wickets | Dismissals; ten wickets = all out, innings ends |
| Overs | Sets of six legal deliveries; T20 = 20 overs, ODI = 50, Tests = no fixed limit |
| Partnership | Runs scored by two batters sharing the crease |
The score string (home_score / away_score) is returned verbatim from ESPN so
downstream analysis retains the full cricket context rather than stripping it to
a bare integer.
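Because the score stays a string, numeric work needs a small parsing step. The sketch below assumes the string follows the runs/wickets (overs/max ov, target N) shape shown in the table. parse_score is a hypothetical helper, not part of sportsdataverse, and other ESPN formats may need extra cases.

```python
import re
from typing import Optional

# Hypothetical helper (not part of sportsdataverse): split an ESPN-style
# score string such as "161/5 (18/20 ov, target 156)" into numeric parts.
SCORE_RE = re.compile(
    r"^\s*(?P<runs>\d+)"                      # runs scored
    r"(?:/(?P<wickets>\d+))?"                 # wickets lost, if present
    r"(?:\s*\((?P<overs>\d+(?:\.\d+)?)"       # overs used
    r"(?:/(?P<max_overs>\d+))?\s*ov"          # overs allowed
    r"(?:,\s*target\s+(?P<target>\d+))?\))?"  # chase target, if any
)


def parse_score(score: Optional[str]) -> Optional[dict]:
    """Return a dict of parsed fields, or None if the string does not match."""
    if not score:
        return None
    m = SCORE_RE.match(score)
    if m is None:
        return None
    g = m.groupdict()
    return {
        "runs": int(g["runs"]),
        "wickets": int(g["wickets"]) if g["wickets"] else None,
        "overs": g["overs"],  # kept as text: "18.3" means 18 overs and 3 balls
        "max_overs": int(g["max_overs"]) if g["max_overs"] else None,
        "target": int(g["target"]) if g["target"] else None,
    }


print(parse_score("161/5 (18/20 ov, target 156)"))
print(parse_score("155/8"))
```

Fields that are absent from the string come back as None, so a bare "155/8" still yields runs and wickets.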
What this notebook covers
- Setup and the safe() guard helper
- Scoreboard: live and recent matches for a league
- Standings: the children hierarchy, flattened
- Match summary: all 8 summary sections, with emphasis on the three heterogeneous batting / bowling / partnerships scorecard shapes
- Bonus endpoints: news, injuries, calendar
- Caveats and the full reference
The toolbox
Everything returns a tidy polars DataFrame by default. Pass
return_as_pandas=True for pandas, or return_parsed=False for the raw JSON dict.
The three cricket-specific parsers are:
| Parser | Input | Output |
|---|---|---|
| parse_cricket_scoreboard | espn_cricket_scoreboard payload | One row per match; score strings in cricket format |
| parse_cricket_standings | espn_cricket_standings payload | One row per team per group; flattened group column |
| parse_cricket_summary | espn_cricket_summary payload | Dict of 8 section DataFrames (or single section) |
All other espn_cricket_* wrappers reuse the universal parsers (parse_news,
parse_items, parse_single_entity, etc.) shared across all ESPN-backed sports.
Setup
pip install sportsdataverse
No API key required.
import polars as pl
import sportsdataverse.cricket as cricket
# IPL (Indian Premier League) league slug, used throughout this notebook.
# Other common slugs: 'eng.1' (England domestic), 'icc.worldcup' (ODI World Cup)
IPL = "8048"
print("polars", pl.__version__)
polars 1.40.1
The ESPN cricket feed is live and occasionally rate-limited, so a small safe()
helper runs every network call defensively. Any exception is caught, printed, and
None is returned; downstream cells check for None before proceeding.
def safe(label, thunk):
    """Run a zero-argument callable; print the outcome and return None on failure."""
    try:
        out = thunk()
        print(f"✅ {label}")
        return out
    except Exception as exc:
        print(f"⚠️ {label} - {exc}")
        return None
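Since the feed is occasionally rate-limited, a retrying variant can be handy. This is a sketch only: safe_retry, attempts, and base_delay are hypothetical names, and the exponential backoff schedule is an assumption, not something ESPN documents.

```python
import time


def safe_retry(label, thunk, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call thunk() up to `attempts` times, doubling the pause between tries.

    Returns the result of the first successful call, or None if every
    attempt raised. `sleep` is injectable so the helper can be exercised
    without actually waiting.
    """
    for attempt in range(1, attempts + 1):
        try:
            out = thunk()
            print(f"ok: {label} (attempt {attempt})")
            return out
        except Exception as exc:
            print(f"failed: {label} (attempt {attempt}) - {exc}")
            if attempt < attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    return None


# Usage with a stand-in callable that fails twice, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "payload"


result = safe_retry("demo", flaky, sleep=lambda seconds: None)
```

Like safe(), it returns None when every attempt fails, so the same downstream None checks still apply.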
Scoreboard: today's (and recent) matches
espn_cricket_scoreboard hits the Site v2 scoreboard endpoint for the given league= slug.
By default (return_parsed=True) it routes the payload through
parse_cricket_scoreboard and returns a tidy polars frame.
Pass return_parsed=False to get the raw ESPN JSON dict instead.
Each row is one match. The home_score / away_score columns carry the full
cricket score string: ESPN doesn't expose a clean integer run count separately,
and the wickets + overs context matters.
board = safe(
    "IPL scoreboard",
    lambda: cricket.espn_cricket_scoreboard(league=IPL),
)
board
✅ IPL scoreboard
shape: (1, 14)
┌──────────┬────────────┬────────────┬────────────┬───┬────────┬───────────┬──────────┬───────────┐
│ event_id ┆ date       ┆ name       ┆ short_name ┆ … ┆ status ┆ status_de ┆ venue    ┆ neutral_s │
│ ---      ┆ ---        ┆ ---        ┆ ---        ┆   ┆ ---    ┆ tail      ┆ ---      ┆ ite       │
│ str      ┆ str        ┆ str        ┆ str        ┆   ┆ null   ┆ ---       ┆ str      ┆ ---       │
│          ┆            ┆            ┆            ┆   ┆        ┆ str       ┆          ┆ bool      │
╞══════════╪════════════╪════════════╪════════════╪═══╪════════╪═══════════╪══════════╪═══════════╡
│ 1535465  ┆ 2026-05-31 ┆ Royal Chal ┆ RCB v GT   ┆ … ┆ null   ┆ Final     ┆ Narendra ┆ true      │
│          ┆ T14:00Z    ┆ lengers    ┆            ┆   ┆        ┆           ┆ Modi     ┆           │
│          ┆            ┆ Bengaluru  ┆            ┆   ┆        ┆           ┆ Stadium, ┆           │
│          ┆            ┆ v …        ┆            ┆   ┆        ┆           ┆ Motera,… ┆           │
└──────────┴────────────┴────────────┴────────────┴───┴────────┴───────────┴──────────┴───────────┘
What the columns mean
| Column | Description |
|---|---|
| event_id | ESPN event identifier; pass this to espn_cricket_summary |
| date | ISO-8601 match start time |
| name / short_name | Full and abbreviated match name |
| home_team / away_team | Display names |
| home_score / away_score | Cricket score string, e.g. "161/5 (18/20 ov, target 156)" |
| status | "Final", "In Progress", "Scheduled", etc.; null in the sample output above |
| status_detail | Human-readable detail, e.g. "Chennai Super Kings won by 5 wickets"; "Final" in the sample output above |
| venue | Ground name |
| neutral_site | Boolean; true for a neutral-ground match |
# If the scoreboard returned data, show each match's score and status columns.
if board is not None and board.height:
    keep = [c for c in ["name", "home_score", "away_score", "status", "status_detail"] if c in board.columns]
    print(board.select(keep))
else:
    print("scoreboard unavailable right now; try again outside an off-season window")
shape: (1, 5)
┌─────────────────────────────────┬─────────────────────────┬────────────┬────────┬───────────────┐
│ name                            ┆ home_score              ┆ away_score ┆ status ┆ status_detail │
│ ---                             ┆ ---                     ┆ ---        ┆ ---    ┆ ---           │
│ str                             ┆ str                     ┆ str        ┆ null   ┆ str           │
╞═════════════════════════════════╪═════════════════════════╪════════════╪════════╪═══════════════╡
│ Royal Challengers Bengaluru v … ┆ 161/5 (18/20 ov, target ┆ 155/8      ┆ null   ┆ Final         │
│                                 ┆ 156)                    ┆            ┆        ┆               │
└─────────────────────────────────┴─────────────────────────┴────────────┴────────┴───────────────┘
Raw payload mode
Pass return_parsed=False to skip the parser entirely and work with the raw
ESPN JSON. This is useful when you want to explore the full payload structure,
or when you need a field the parser doesn't yet surface.
raw_board = safe(
    "IPL scoreboard (raw)",
    lambda: cricket.espn_cricket_scoreboard(league=IPL, return_parsed=False),
)
if isinstance(raw_board, dict):
    print("Top-level keys:", list(raw_board.keys()))
    events = raw_board.get("events") or []
    print(f"Events in payload: {len(events)}")
✅ IPL scoreboard (raw)
Top-level keys: ['leagues', 'teams', 'standings', 'events', 'provider']
Events in payload: 1
Standings
espn_cricket_standings returns the league table for a given season. The ESPN cricket
standings payload uses a children hierarchy (groups/divisions) rather than the flat
groups shape used in most other ESPN sports.
parse_cricket_standings flattens that hierarchy: each row is one team in one
group, with a group column so you can split multi-group tournaments (e.g. ICC
World Cup group stages) with a single .filter() call.
Optional parameters: season=, group=, standings_type=.
standings = safe(
    "IPL standings",
    lambda: cricket.espn_cricket_standings(league=IPL),
)
standings
✅ IPL standings
shape: (10, 15)
┌───────┬────────────────────┬─────────┬───────────────────┬───┬────────┬────────┬─────────┬───────┐
│ group ┆ team               ┆ team_id ┆ team_abbreviation ┆ … ┆ netrr  ┆ for    ┆ against ┆ total │
│ ---   ┆ ---                ┆ ---     ┆ ---               ┆   ┆ ---    ┆ ---    ┆ ---     ┆ ---   │
│ str   ┆ str                ┆ str     ┆ str               ┆   ┆ f64    ┆ f64    ┆ f64     ┆ str   │
╞═══════╪════════════════════╪═════════╪═══════════════════╪═══╪════════╪════════╪═════════╪═══════╡
│       ┆ Royal Challengers  ┆ 335970  ┆ RCB               ┆ … ┆ 0.783  ┆ 10.393 ┆ 9.615   ┆       │
│       ┆ Bengaluru          ┆         ┆                   ┆   ┆        ┆        ┆         ┆       │
│       ┆ Gujarat Titans     ┆ 1298769 ┆ GT                ┆ … ┆ 0.695  ┆ 9.46   ┆ 8.755   ┆       │
│       ┆ Sunrisers          ┆ 628333  ┆ SRH               ┆ … ┆ 0.524  ┆ 10.337 ┆ 9.82    ┆       │
│       ┆ Hyderabad          ┆         ┆                   ┆   ┆        ┆        ┆         ┆       │
│       ┆ Rajasthan Royals   ┆ 335977  ┆ RR                ┆ … ┆ 0.189  ┆ 10.096 ┆ 9.907   ┆       │
│       ┆ Punjab Kings       ┆ 335973  ┆ PBKS              ┆ … ┆ 0.309  ┆ 10.844 ┆ 10.535  ┆       │
│       ┆ Delhi Capitals     ┆ 335975  ┆ DC                ┆ … ┆ -0.651 ┆ 9.394  ┆ 10.039  ┆       │
│       ┆ Kolkata Knight     ┆ 335971  ┆ KKR               ┆ … ┆ -0.147 ┆ 9.096  ┆ 9.24    ┆       │
│       ┆ Riders             ┆         ┆                   ┆   ┆        ┆        ┆         ┆       │
│       ┆ Chennai Super      ┆ 335974  ┆ CSK               ┆ … ┆ -0.345 ┆ 9.2    ┆ 9.548   ┆       │
│       ┆ Kings              ┆         ┆                   ┆   ┆        ┆        ┆         ┆       │
│       ┆ Mumbai Indians     ┆ 335978  ┆ MI                ┆ … ┆ -0.584 ┆ 9.512  ┆ 10.092  ┆       │
│       ┆ Lucknow Super      ┆ 1298768 ┆ LSG               ┆ … ┆ -0.74  ┆ 9.132  ┆ 9.868   ┆       │
│       ┆ Giants             ┆         ┆                   ┆   ┆        ┆        ┆         ┆       │
└───────┴────────────────────┴─────────┴───────────────────┴───┴────────┴────────┴─────────┴───────┘
if standings is not None and standings.height:
    print("Columns:", standings.columns)
    print("\nGroups present:", standings["group"].unique().to_list() if "group" in standings.columns else "(none)")
    # Sort by points (or wins) when either column is present; names vary by tournament.
    sort_col = next((c for c in ["points", "wins"] if c in standings.columns), None)
    if sort_col:
        print(f"\nTop 4 by {sort_col}:")
        print(
            standings
            .sort(sort_col, descending=True)
            .select([c for c in ["team", "group", "wins", "losses", "points", "net_run_rate"]
                     if c in standings.columns])
            .head(4)
        )
else:
    print("standings unavailable right now")
Columns: ['group', 'team', 'team_id', 'team_abbreviation', 'rank', 'matches_played', 'matches_won', 'matches_lost', 'noresult', 'match_points', 'qualified', 'netrr', 'for', 'against', 'total']
Groups present: ['']
Why the children hierarchy matters
Most ESPN standings payloads have a flat standings.groups[] block. Cricket uses
standings.children[] instead: each child is a group/division with its own
entries array. parse_cricket_standings walks that nesting and stitches a group
column onto every team row, so the resulting frame is directly filterable:
# Keep only Group A in a World Cup-style tournament
standings.filter(pl.col("group") == "Group A")
The numeric stat columns (wins, losses, points, net run rate, etc.) are snake-cased versions of whatever ESPN ships, so they vary by tournament format. In the IPL output above they appear as matches_won, matches_lost, match_points, and netrr.
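The flattening step can be illustrated on a toy payload. The dict below is hand-made and only mimics the children / entries nesting described above. The real ESPN payload carries more fields, and flatten_children is a hypothetical stand-in, not the library's parser.

```python
def flatten_children(standings: dict) -> list[dict]:
    """Walk children[] and emit one row per team with its group name attached."""
    rows = []
    for child in standings.get("children") or []:
        group = child.get("name", "")
        # Assumed layout: each child holds its teams under "entries".
        for entry in child.get("entries") or []:
            row = {"group": group, "team": entry.get("team")}
            # Copy any stats through unchanged; their names vary by tournament.
            row.update(entry.get("stats") or {})
            rows.append(row)
    return rows


# Placeholder teams and points, purely for illustration.
toy = {
    "children": [
        {"name": "Group A", "entries": [
            {"team": "Team 1", "stats": {"match_points": 6}},
            {"team": "Team 2", "stats": {"match_points": 4}},
        ]},
        {"name": "Group B", "entries": [
            {"team": "Team 3", "stats": {"match_points": 8}},
        ]},
    ]
}

rows = flatten_children(toy)
group_a = [r["team"] for r in rows if r["group"] == "Group A"]
print(group_a)
```

Once every row carries its group, splitting a multi-group tournament is a plain equality filter, which is what the group column in the parsed frame enables.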
Match summary: the full scorecard
espn_cricket_summary is the richest endpoint: a single call returns 8 frames
that collectively make up the entire matchcard.
| Section key | Content |
|---|---|
| header | Match metadata: teams, status, venue, toss result |
| matchcards_batting | Per-batter innings rows (runs, balls faced, fours, sixes, dismissal) |
| matchcards_bowling | Per-bowler innings rows (overs, maidens, runs conceded, wickets, economy rate) |
| matchcards_partnerships | Partnership pairs (runs, overs, each batter's contribution) |
| rosters | Full squad lists for both teams |
| game_info | Venue details, attendance, and match officials |
| leaders | Stat leaders for the match |
| standings | In-match standings snapshot |
The three matchcards_* frames have different schemas: batting rows carry
runs/balls_faced/dismissal; bowling rows carry overs/wickets/economy_rate;
partnership rows carry partnership_runs/partnership_overs plus per-batter run splits.
They are returned as separate frames so callers can work with each schema cleanly.
Event ID 1535465 is the IPL match from the scoreboard output above (Royal Challengers
Bengaluru vs. Gujarat Titans) and is used as the worked example throughout.
EVENT_ID = 1535465  # IPL: Royal Challengers Bengaluru vs. Gujarat Titans
summary_raw = safe(
    f"match summary {EVENT_ID}",
    lambda: cricket.espn_cricket_summary(league=IPL, event_id=EVENT_ID, return_parsed=False),
)
if isinstance(summary_raw, dict):
    print("Top-level keys:", list(summary_raw.keys()))
✅ match summary 1535465
Top-level keys: ['notes', 'gameInfo', 'debuts', 'rosters', 'matchcards', 'leaders', 'article', 'videos', 'news', 'header', 'wallclockAvailable', 'meta', 'standings']
Parsing all 8 sections at once
Call parse_cricket_summary with section=None (the default) to get a
dict[str, pl.DataFrame] keyed by section name.
from sportsdataverse.cricket.cricket_espn_parsers import parse_cricket_summary

if summary_raw is not None:
    frames = parse_cricket_summary(summary_raw)
    for name, df in frames.items():
        print(f"{name:30s} {df.shape[0]:>4d} rows × {df.shape[1]:>3d} cols")
else:
    frames = {}
    print("summary payload unavailable; frames dict is empty")
header                            1 rows ×  15 cols
matchcards_batting               11 rows ×  12 cols
matchcards_bowling                6 rows ×  10 cols
matchcards_partnerships           6 rows ×  10 cols
rosters                          24 rows ×   9 cols
game_info                         1 rows ×   7 cols
leaders                           0 rows ×   0 cols
standings                        10 rows ×  14 cols
Section: matchcards_batting
One row per batter per innings. Each innings is identified by innings_number
and team_name. The dismissal column carries the batter's dismissal description
(e.g. "c Rohit b Bumrah"). Batters who haven't faced a ball yet appear with
empty run / ball counts.
batting = frames.get("matchcards_batting", pl.DataFrame())
if batting.height:
    print("Batting columns:", batting.columns)
    show_cols = [c for c in ["innings_number", "team", "athlete_display_name",
                             "runs", "balls", "fours", "sixes", "strike_rate", "summary"]
                 if c in batting.columns]
    print(batting.select(show_cols).head(10))
else:
    print("batting scorecard unavailable for this match")
Batting columns: ['innings_number', 'team_name', 'total', 'runs_total', 'extras', 'player_id', 'player_name', 'dismissal', 'runs', 'balls_faced', 'fours', 'sixes']
shape: (10, 4)
┌────────────────┬──────┬───────┬───────┐
│ innings_number ┆ runs ┆ fours ┆ sixes │
│ ---            ┆ ---  ┆ ---   ┆ ---   │
│ str            ┆ str  ┆ str   ┆ str   │
╞════════════════╪══════╪═══════╪═══════╡
│ 2              ┆ 32   ┆ 4     ┆ 2     │
│ 2              ┆ 75   ┆ 9     ┆ 3     │
│ 2              ┆ 1    ┆ 0     ┆ 0     │
│ 2              ┆ 15   ┆ 1     ┆ 1     │
│ 2              ┆ 1    ┆ 0     ┆ 0     │
│ 2              ┆ 24   ┆ 3     ┆ 1     │
│ 2              ┆ 11   ┆ 1     ┆ 0     │
│ 2              ┆      ┆       ┆       │
│ 2              ┆      ┆       ┆       │
│ 2              ┆      ┆       ┆       │
└────────────────┴──────┴───────┴───────┘
Section: matchcards_bowling
One row per bowler per innings. Key columns: overs, maidens, conceded,
wickets, economy_rate. Economy rate is the average runs conceded per over;
lower is better.
bowling = frames.get("matchcards_bowling", pl.DataFrame())
if bowling.height:
    print("Bowling columns:", bowling.columns)
    show_cols = [c for c in ["innings_number", "team", "athlete_display_name",
                             "overs", "maidens", "runs_conceded", "wickets", "economy"]
                 if c in bowling.columns]
    print(bowling.select(show_cols).head(10))
else:
    print("bowling scorecard unavailable for this match")
Bowling columns: ['innings_number', 'team_name', 'player_id', 'player_name', 'overs', 'maidens', 'conceded', 'wickets', 'economy_rate', 'nbw']
shape: (6, 4)
┌────────────────┬───────┬─────────┬─────────┐
│ innings_number ┆ overs ┆ maidens ┆ wickets │
│ ---            ┆ ---   ┆ ---     ┆ ---     │
│ str            ┆ str   ┆ str     ┆ str     │
╞════════════════╪═══════╪═════════╪═════════╡
│ 2              ┆ 4.0   ┆ 0       ┆ 1       │
│ 2              ┆ 3.0   ┆ 0       ┆ 1       │
│ 2              ┆ 2.0   ┆ 0       ┆ 0       │
│ 2              ┆ 4.0   ┆ 0       ┆ 2       │
│ 2              ┆ 4.0   ┆ 0       ┆ 1       │
│ 2              ┆ 1.0   ┆ 0       ┆ 0       │
└────────────────┴───────┴─────────┴─────────┘
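Economy rate can be recomputed from the card when needed. One detail worth knowing: cricket writes overs as overs.balls, so "3.4" means 3 overs and 4 balls, not 3.4 overs. Here is a small sketch under that convention. Both functions are hypothetical helpers, and ESPN's own rounding may differ.

```python
def overs_to_balls(overs: str) -> int:
    """Convert cricket overs notation ("3.4" = 3 overs, 4 balls) to balls."""
    whole, _, part = overs.partition(".")
    return int(whole) * 6 + int(part or 0)


def economy_rate(runs_conceded: int, overs: str) -> float:
    """Runs conceded per six-ball over, rounded to two decimals."""
    balls = overs_to_balls(overs)
    if balls == 0:
        raise ValueError("no balls bowled")
    return round(runs_conceded * 6 / balls, 2)


print(economy_rate(28, "4.0"))  # 7.0
print(economy_rate(22, "3.4"))  # 6.0
```

Treating "3.4" as a decimal number would give 22 / 3.4, roughly 6.47, which overstates the rate.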
Section: matchcards_partnerships
One row per partnership (pair of batters sharing the crease) per innings. This frame has a different schema from batting and bowling: it carries the runs and overs for the partnership (partnership_runs, partnership_overs), plus per-batter run splits. Partnership data is cricket-specific and has no direct analogue in most other sports' box scores.
partnerships = frames.get("matchcards_partnerships", pl.DataFrame())
if partnerships.height:
    print("Partnerships columns:", partnerships.columns)
    show_cols = [c for c in ["innings_number", "team", "total_runs", "total_balls",
                             "batter1_display_name", "batter1_runs",
                             "batter2_display_name", "batter2_runs"]
                 if c in partnerships.columns]
    print(partnerships.select(show_cols).head(8))
else:
    print("partnerships scorecard unavailable for this match")
Partnerships columns: ['innings_number', 'team_name', 'partnership_runs', 'partnership_overs', 'wicket_name', 'fow_type', 'player1_name', 'player1_runs', 'player2_name', 'player2_runs']
shape: (6, 1)
┌────────────────┐
│ innings_number │
│ ---            │
│ str            │
╞════════════════╡
│ 2              │
│ 2              │
│ 2              │
│ 2              │
│ 2              │
│ 2              │
└────────────────┘
Section: header
Match-level metadata: teams, status, venue, and the competition context. This is typically the first frame you'd use to confirm match identity and outcome.
header = frames.get("header", pl.DataFrame())
if header.height:
    # Candidate names only; print header.columns to see what the parser returned.
    show_cols = [c for c in ["name", "status_type_name", "status_type_detail",
                             "home_team", "home_score",
                             "away_team", "away_score", "venue_full_name"]
                 if c in header.columns]
    print(header.select(show_cols).head())
else:
    print("header unavailable for this match")
shape: (0, 0)
┌┐
╞╡
└┘
Section: game_info
Match-level metadata that doesn't fit the header. For this match the frame carries venue details (ID, full and short names, city, country), attendance, and the match officials.
game_info = frames.get("game_info", pl.DataFrame())
if game_info.height:
    print(game_info.head())
else:
    print("game_info unavailable for this match")
shape: (1, 7)
┌──────────┬───────────────┬───────────────┬────────────┬──────────────┬────────────┬──────────────┐
│ venue_id ┆ venue_full_na ┆ venue_short_n ┆ venue_city ┆ venue_countr ┆ attendance ┆ officials    │
│ ---      ┆ me            ┆ ame           ┆ ---        ┆ y            ┆ ---        ┆ ---          │
│ str      ┆ ---           ┆ ---           ┆ str        ┆ ---          ┆ i64        ┆ str          │
│          ┆ str           ┆ str           ┆            ┆ str          ┆            ┆              │
╞══════════╪═══════════════╪═══════════════╪════════════╪══════════════╪════════════╪══════════════╡
│ 57851    ┆ Narendra Modi ┆ Narendra Modi ┆ Ahmedabad  ┆ India        ┆ 0          ┆ [{'displayNa │
│          ┆ Stadium,      ┆ Stadium,      ┆            ┆              ┆            ┆ me': 'KN     │
│          ┆ Motera,…      ┆ Motera,…      ┆            ┆              ┆            ┆ Ananthapa…   │
└──────────┴───────────────┴───────────────┴────────────┴──────────────┴────────────┴──────────────┘
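The officials cell prints like a Python list of dicts stored as a string. Assuming that is what the column holds, ast.literal_eval can turn it back into objects without executing arbitrary code. The sample below is hand-written to mimic the truncated output, with placeholder names, and parse_officials is a hypothetical helper.

```python
import ast


def parse_officials(cell):
    """Parse a stringified list of dicts; return [] for empty or malformed cells."""
    if not cell:
        return []
    try:
        value = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return []
    return value if isinstance(value, list) else []


# Hand-written sample in the same style as the printed cell.
sample = "[{'displayName': 'Umpire One'}, {'displayName': 'Umpire Two'}]"
names = [o.get("displayName") for o in parse_officials(sample)]
print(names)  # ['Umpire One', 'Umpire Two']
```

If the column turns out to hold JSON instead of a Python repr, json.loads would be the equivalent call.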
Requesting a single section
When you only need one frame, pass section= to parse_cricket_summary to
avoid building all 8 frames. The wrapper also accepts section= alongside
return_parsed=True if you want to skip the intermediate raw dict.
if summary_raw is not None:
    just_batting = parse_cricket_summary(summary_raw, section="matchcards_batting")
    print(type(just_batting), just_batting.shape)
    # pandas interop: same one-liner as every other sdv-py endpoint
    just_batting_pd = parse_cricket_summary(summary_raw, section="matchcards_batting",
                                            return_as_pandas=True)
    print(type(just_batting_pd))
else:
    print("no payload to parse")
<class 'polars.dataframe.frame.DataFrame'> (11, 12)
<class 'pandas.DataFrame'>
News and injuries
espn_cricket_news and espn_cricket_injuries both follow the universal wrapper
contract: they return a polars frame by default, using the shared parse_news and
parse_injuries parsers from _common_espn_parsers.
news = safe(
    "IPL news",
    lambda: cricket.espn_cricket_news(league=IPL, limit=5),
)
if news is not None and news.height:
    show_cols = [c for c in ["headline", "published", "type"] if c in news.columns]
    print(news.select(show_cols).head(5))
else:
    print("news unavailable right now")
✅ IPL news
shape: (5, 3)
┌─────────────────────────────────┬──────────────────────┬──────────────┐
│ headline                        ┆ published            ┆ type         │
│ ---                             ┆ ---                  ┆ ---          │
│ str                             ┆ str                  ┆ str          │
╞═════════════════════════════════╪══════════════════════╪══════════════╡
│ Phillips rides the Archer ligh… ┆ 2026-06-17T19:41:29Z ┆ Story        │
│ Glenn Phillips repels England … ┆ 2026-06-17T19:41:47Z ┆ Recap        │
│ Gill and Kishan hundreds carry… ┆ 2026-06-17T18:05:42Z ┆ Recap        │
│ Shafali's all-round show helps… ┆ 2026-06-17T17:54:49Z ┆ Recap        │
│ Shreyanka Patil stretchered of… ┆ 2026-06-17T17:54:27Z ┆ HeadlineNews │
└─────────────────────────────────┴──────────────────────┴──────────────┘
injuries = safe(
    "IPL injuries",
    lambda: cricket.espn_cricket_injuries(league=IPL),
)
if injuries is not None and injuries.height:
    print(injuries.head())
else:
    print("injuries feed unavailable or empty right now")
✅ IPL injuries
injuries feed unavailable or empty right now
Calendar
espn_cricket_calendar returns the competition calendar: matchdays, rounds, or phases
depending on the tournament format. It uses the universal parse_items parser.
cal = safe(
    "IPL calendar",
    lambda: cricket.espn_cricket_calendar(league=IPL),
)
if cal is not None and cal.height:
    print(cal.shape)
    print(cal.head())
else:
    print("calendar unavailable right now")
⚠️ IPL calendar - NoESPNDataError: No data found for https://site.api.espn.com/apis/site/v2/sports/cricket/8048/calendar
calendar unavailable right now
Cookbook: common cricket tasks
A handful of patterns you'll reach for constantly when working with the cricket surface.
Recipe 1: Top run-scorers from a batting scorecard
Filter to the highest individual scores from a batting matchcard. Useful for building a match-by-match batting leaderboard.
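if batting.height and "runs" in batting.columns:
    # runs arrives as a string column, so cast before sorting numerically;
    # strict=False turns blanks (batters who did not bat) into nulls.
    top_scores = (
        batting
        .with_columns(pl.col("runs").cast(pl.Int64, strict=False))
        .filter(pl.col("runs").is_not_null())
        .sort("runs", descending=True)
        .select([c for c in ["innings_number", "player_name", "runs",
                             "balls_faced", "fours", "sixes"]
                 if c in batting.columns])
        .head(5)
    )
    print(top_scores)
else:
    print("batting frame not available")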
Recipe 2: Economy leaders from the bowling card
Bowlers who took wickets and kept a tight economy rate: the T20 game-changers.
if bowling.height and "economy_rate" in bowling.columns:
    # Column names follow the printed bowling columns; cast because the
    # scorecard values arrive as strings.
    economy_leaders = (
        bowling
        .with_columns(
            pl.col("wickets").cast(pl.Float64, strict=False),
            pl.col("economy_rate").cast(pl.Float64, strict=False),
        )
        .filter(pl.col("wickets") > 0)
        .sort("economy_rate", descending=False)
        .select([c for c in ["innings_number", "player_name",
                             "overs", "wickets", "conceded", "economy_rate"]
                 if c in bowling.columns])
        .head(5)
    )
    print(economy_leaders)
else:
    print("bowling frame not available")
Recipe 3: Largest partnerships
Identify which batting pairs put on the biggest stands in a given innings. A large partnership is often the turning point in a T20 match.
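if partnerships.height and "partnership_runs" in partnerships.columns:
    # Column names follow the printed partnerships columns.
    biggest_stands = (
        partnerships
        # Cast defensively in case the runs arrive as strings.
        .with_columns(pl.col("partnership_runs").cast(pl.Int64, strict=False))
        .filter(pl.col("partnership_runs").is_not_null())
        .sort("partnership_runs", descending=True)
        .select([c for c in ["innings_number", "player1_name", "player2_name",
                             "partnership_runs", "partnership_overs"]
                 if c in partnerships.columns])
        .head(5)
    )
    print(biggest_stands)
else:
    print("partnerships frame not available")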
Recipe 4: Standings, current top-4 playoff picture
In the IPL, the top 4 teams after the league stage advance to the playoffs. Sort the standings frame by points (then net run rate as a tiebreaker) to see where each franchise stands.
if standings is not None and standings.height:
    # Column names as printed for the IPL table; other tournaments may differ.
    sort_cols = [c for c in ["match_points", "netrr"] if c in standings.columns]
    if sort_cols:
        playoff_picture = (
            standings
            # Cast defensively so the sort is numeric even if a column is a string.
            .with_columns(pl.col(sort_cols).cast(pl.Float64, strict=False))
            .sort(sort_cols, descending=[True] * len(sort_cols))
            .select([c for c in ["team", "matches_won", "matches_lost",
                                 "match_points", "netrr"]
                     if c in standings.columns])
            .head(4)
        )
        print(playoff_picture)
    else:
        print("expected sort columns not present:", standings.columns)
else:
    print("standings not available")
Recipe 5: pandas interop, grouping rosters by role
Every sdv-py endpoint accepts return_as_pandas=True, so dropping into the
pandas world is a single keyword. Here we pull the match rosters as a pandas
DataFrame and count players by position/type.
if summary_raw is not None:
    rosters_pd = parse_cricket_summary(summary_raw, section="rosters",
                                       return_as_pandas=True)
    if rosters_pd is not None and len(rosters_pd):
        print(type(rosters_pd))
        print(rosters_pd.columns.tolist())
        # Count players by position if the column exists
        pos_col = next((c for c in ["position_name", "position", "type"]
                        if c in rosters_pd.columns), None)
        if pos_col:
            print(rosters_pd.groupby(pos_col, dropna=False).size().sort_values(ascending=False))
    else:
        print("rosters section empty")
else:
    print("no payload")
<class 'pandas.DataFrame'>
['team_id', 'home_away', 'winner', 'athlete_id', 'athlete', 'jersey', 'starter', 'position', 'captain']
position
AR 8
BL 8
UKN 6
WK 2
dtype: int64
Caveats and known limitations
No teams endpoint for IPL.
espn_cricket_teams_site(league="8048") returns HTTP 404: ESPN does not expose
a teams listing for the IPL through the Site v2 API. Use the season_teams
endpoint if you need franchise metadata for a specific season:
teams_seasonal = safe(
    "IPL season teams",
    lambda: cricket.espn_cricket_season_teams(league=IPL),
)
event_id is required for espn_cricket_summary.
Unlike the scoreboard (which returns today's slate without an ID), the summary
endpoint needs a specific event identifier. Obtain event_id from the
espn_cricket_scoreboard output (event_id column).
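That lookup can be sketched as a small pipeline. The helpers below are hypothetical and work on plain dict rows (for a polars frame, board.to_dicts() produces them). They assume completed matches are marked "Final" in status_detail, as in the sample scoreboard output. The summary fetcher is passed in as an argument so the example runs without network access.

```python
def completed_event_ids(rows, done_marker="Final"):
    """Return event_id values for rows whose status_detail equals done_marker."""
    return [
        r["event_id"]
        for r in rows
        if r.get("status_detail") == done_marker and r.get("event_id")
    ]


def summaries_for(rows, fetch_summary):
    """Map each completed event_id to whatever fetch_summary(event_id) returns."""
    return {eid: fetch_summary(eid) for eid in completed_event_ids(rows)}


# Stand-in rows and fetcher; with the real library the fetcher could wrap
# cricket.espn_cricket_summary(league=IPL, event_id=eid).
rows = [
    {"event_id": "1535465", "status_detail": "Final"},
    {"event_id": "0000001", "status_detail": "Scheduled"},
]
out = summaries_for(rows, fetch_summary=lambda eid: {"event_id": eid})
print(out)
```

Keeping the fetcher injectable also makes it easy to wrap the real call in safe() and skip any event whose summary comes back as None.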
Off-season scoreboards may be empty.
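The IPL typically runs from late March or April through May; calling
espn_cricket_scoreboard(league="8048") in December returns an empty events list.
The parser returns a zero-row frame (it never raises on an empty payload), so
parsing code only needs to guard against .height == 0. Network and HTTP errors
from the wrapper itself are a separate concern, which is what safe() covers.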
League slugs vary.
ESPN doesn't publish a canonical slug list. The IPL slug used in this notebook is "8048";
England's county T20 Blast uses "eng.t20". Use
espn_cricket_league_root(league=slug, return_parsed=False) to verify that
a slug resolves before building a pipeline around it.
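One way to apply that check across several candidate slugs is sketched below. first_resolving_slug is a hypothetical helper. probe is any callable that raises when a slug does not resolve, for example a lambda wrapping espn_cricket_league_root. The stand-in probe here only mimics that behaviour.

```python
def first_resolving_slug(candidates, probe):
    """Return the first slug for which probe(slug) succeeds, else None."""
    for slug in candidates:
        try:
            probe(slug)
        except Exception:
            continue  # slug did not resolve; try the next one
        return slug
    return None


# Stand-in probe that only "knows" one slug.
def fake_probe(slug):
    if slug != "8048":
        raise LookupError(f"no data for {slug}")
    return {"slug": slug}


print(first_resolving_slug(["ipl", "8048"], fake_probe))  # 8048
```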
parse_cricket_summary section standings reflects in-tournament state.
The standings embedded inside a match summary are a snapshot at match time.
For the current full-tournament standings table, use espn_cricket_standings
directly.
# Demonstrating the safe empty-frame contract: no exception even for an empty payload.
from sportsdataverse.cricket.cricket_espn_parsers import parse_cricket_scoreboard

empty_df = parse_cricket_scoreboard({})
print("Empty payload → zero-row frame:", empty_df.shape)
empty_frames = parse_cricket_summary({})
print("Empty summary → dict of zero-row frames:")
for name, df in empty_frames.items():
    print(f"  {name}: {df.shape}")
Empty payload → zero-row frame: (0, 0)
Empty summary → dict of zero-row frames:
  header: (0, 0)
  matchcards_batting: (0, 0)
  matchcards_bowling: (0, 0)
  matchcards_partnerships: (0, 0)
  rosters: (0, 0)
  game_info: (0, 0)
  leaders: (0, 0)
  standings: (0, 0)
Where to next
- Full endpoint reference: every espn_cricket_* wrapper is documented on the Cricket reference pages, grouped by Site v2, Web v3, and Core v2 families.
- event_id lookup: the scoreboard frame's event_id column is the key that unlocks the full summary. Build a pipeline: scoreboard → filter completed → event_id → summary → batting/bowling.
- Other leagues: swap the league= slug to explore England county ("eng.1"), ICC Men's/Women's World Cup, PSL, BBL, and more. The parsers and workflow are identical across leagues.
- Pass return_as_pandas=True for pandas, or return_parsed=False on any espn_cricket_* wrapper for the raw ESPN JSON.
- Player depth: espn_cricket_player_info, espn_cricket_player_gamelog, and espn_cricket_player_stats accept league= + athlete_id= and follow the same return_parsed / return_as_pandas contract.
- The sister R package ecosystem is covered by cfbfastR (American football), hoopR (basketball), baseballr (baseball), and fastRhockey (hockey); cricket sits in the Python-only surface for now.