Version: 0.0.70

CFB — additional Python functions

Hand-written wrappers, loaders, and helpers in sportsdataverse.cfb not covered by the generated API-endpoint reference above.

Play-by-play, schedule & rosters

espn_cfb_game_rosters(game_id: int, raw=False, return_as_pandas=False, **kwargs) -> pl.DataFrame

Pull the rosters for a single game, identified by its ESPN game id.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| game_id | int | | Unique game_id, can be obtained from espn_cfb_schedule(). |
| raw | | False | |
| return_as_pandas | bool | False | If True, returns a pandas dataframe. If False, returns a polars dataframe. |

Returns

Polars dataframe of game roster data with columns: 'athlete_id', 'athlete_uid', 'athlete_guid', 'athlete_type', 'first_name', 'last_name', 'full_name', 'athlete_display_name', 'short_name', 'weight', 'display_weight', 'height', 'display_height', 'age', 'date_of_birth', 'slug', 'jersey', 'linked', 'active', 'alternate_ids_sdr', 'birth_place_city', 'birth_place_state', 'birth_place_country', 'headshot_href', 'headshot_alt', 'experience_years', 'experience_display_value', 'experience_abbreviation', 'status_id', 'status_name', 'status_type', 'status_abbreviation', 'hand_type', 'hand_abbreviation', 'hand_display_value', 'draft_display_text', 'draft_round', 'draft_year', 'draft_selection', 'player_id', 'starter', 'valid', 'did_not_play', 'display_name', 'ejected', 'athlete_href', 'position_href', 'statistics_href', 'team_id', 'team_guid', 'team_uid', 'team_slug', 'team_location', 'team_name', 'team_nickname', 'team_abbreviation', 'team_display_name', 'team_short_display_name', 'team_color', 'team_alternate_color', 'is_active', 'is_all_star', 'team_alternate_ids_sdr', 'logo_href', 'logo_dark_href', 'game_id'

| col_name | type | description |
| --- | --- | --- |
| athlete_id | integer | ESPN athlete id. |
| athlete_uid | character | ESPN athlete UID (universal identifier). |
| athlete_guid | character | ESPN athlete GUID. |
| athlete_type | character | Athlete type / class. |
| first_name | character | Athlete first name. |
| last_name | character | Athlete last name. |
| full_name | character | Athlete full name. |
| athlete_display_name | character | Athlete display name. |
| short_name | character | Athlete short name. |
| weight | double | Listed weight (lbs). |
| display_weight | character | Human-readable weight (e.g. 205 lbs). |
| height | double | Listed height (inches). |
| display_height | character | Human-readable height (e.g. 6' 1"). |
| slug | character | URL slug for the athlete. |
| jersey | character | Jersey number. |
| linked | logical | TRUE if the record is linked to a related entity. |
| active | logical | TRUE if the player was active for the game. |
| alternate_ids_sdr | character | ESPN SDR alternate identifier for the athlete. |
| birth_place_city | character | Birth place city. |
| birth_place_state | character | Birth place state. |
| birth_place_country | character | Birth place country. |
| birth_country_alternate_id | character | Alternate identifier for the athlete's birth country, used in ESPN's nationality lookup system. |
| birth_country_abbreviation | character | Birth country abbreviation. |
| headshot_href | character | URL of the athlete headshot image. |
| headshot_alt | character | Alternative-text label for the headshot. |
| flag_href | character | URL to the athlete's nationality flag image on ESPN's CDN. |
| flag_alt | character | Alt-text description for the athlete's country flag image, typically the country name. |
| flag_rel | character | Relationship descriptor for the flag image link (e.g., 'flag' or 'country'). |
| experience_years | integer | Years of experience. |
| experience_display_value | character | Experience display value. |
| experience_abbreviation | character | Experience abbreviation. |
| status_id | character | ESPN athlete status id. |
| status_name | character | Athlete status name. |
| status_type | character | Status type. |
| status_abbreviation | character | Status abbreviation. |
| hand_type | character | Hand type. |
| hand_abbreviation | character | Hand abbreviation. |
| hand_display_value | character | Hand display value. |
| age | integer | Player age (in years). |
| date_of_birth | character | Player date of birth (if published). |
| starter | logical | TRUE if the athlete started the game. |
| jersey_right | character | Secondary or alternate jersey number field, distinct from the primary jersey number. |
| valid | logical | TRUE if the roster entry is flagged valid by ESPN. |
| did_not_play | logical | TRUE if the athlete did not play. |
| display_name | character | Athlete display name for the roster entry. |
| athlete_href | character | ESPN API URL reference for the athlete's full profile resource. |
| position_href | character | ESPN API URL reference for the athlete's position resource. |
| statistics_href | character | ESPN API URL reference for the athlete's game or season statistics resource. |
| team_id | integer | ESPN team id. |
| order | integer | Team order within the competition (0 = first). |
| home_away | character | home or away. |
| winner | logical | TRUE if this team won the game. |
| team_guid | character | ESPN team GUID. |
| team_uid | character | ESPN universal team identifier (UID format 's:40~l:...~t:...'). |
| team_slug | character | URL slug for the team. |
| team_location | character | Team location / school name. |
| team_name | character | Team nickname. |
| team_nickname | character | Team nickname label. |
| team_abbreviation | character | Team abbreviation. |
| team_display_name | character | Full team display name. |
| team_short_display_name | character | Short team display name. |
| team_color | character | Primary team color. |
| team_alternate_color | character | Alternate team color. |
| is_active | logical | Whether the team is currently active. |
| is_all_star | logical | Whether the team is an all-star team. |
| team_alternate_ids_sdr | character | ESPN SDR (Sports Data Repository) alternate team identifier for the athlete's team. |
| logo_href | character | URL of the default team logo. |
| logo_dark_href | character | URL of the dark-variant team logo. |
| game_id | integer | ESPN game identifier. |

Example

from sportsdataverse.cfb import espn_cfb_game_rosters
rosters = espn_cfb_game_rosters(game_id=401628334)
print(rosters.shape)

# Pandas round-trip

rosters_pd = espn_cfb_game_rosters(game_id=401628334, return_as_pandas=True)
rosters_pd.head()

# Pipeline next step (filter to game starters)

import polars as pl
starters = espn_cfb_game_rosters(game_id=401628334).filter(
    pl.col("starter")  # boolean column, so no "== True" comparison is needed
)

espn_cfb_play_participants(game_id: int, *, raw: bool = False, return_as_pandas: bool = False, resolve_missing: bool = True, resolve_missing_max: int = 50, **kwargs: Any) -> pl.DataFrame | pd.DataFrame | dict[str, Any]

Pull ESPN per-play participants for a college-football game.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| game_id | int | | ESPN game / event identifier. |
| raw | bool | False | If True, returns the raw list of play-items dicts (after following pagination) before any flattening. |
| return_as_pandas | bool | False | If True, returns a pandas DataFrame; otherwise polars. |
| resolve_missing | bool | True | If True (default), athletes that the cdn.espn.com sidecar omits are fetched one-by-one from their canonical ESPN $ref URL so the resulting frame has populated *_player_name / *_player_names columns wherever an *_player_id is non-null. Setting this to False skips the extra HTTP fan-out and reproduces the pre-enhancement behavior: rows may then ship with *_player_id populated but *_player_name null for the handful of athletes the sidecar misses (most visible on split sacks, multi-lateral returns, and older games). |
| resolve_missing_max | int | 50 | Hard cap on the number of per-athlete $ref requests issued for a single game. Defaults to 50, which comfortably covers every probed game (typical max is ≤8 unique missing athletes). If the cap is exceeded, a warning is logged and the remaining missing athletes are left with null names. Ignored when resolve_missing=False. |
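
The interaction between resolve_missing and resolve_missing_max can be pictured with a small standalone sketch. This is not the library's implementation: resolve_missing_names and the fetch_name callback are hypothetical names introduced here, and the real function issues HTTP requests where this sketch calls fetch_name.

```python
import logging
from typing import Callable, Dict, Iterable, Optional

logger = logging.getLogger(__name__)


def resolve_missing_names(
    missing_ids: Iterable[str],
    fetch_name: Callable[[str], Optional[str]],  # hypothetical per-athlete lookup
    resolve_missing_max: int = 50,
) -> Dict[str, Optional[str]]:
    """Resolve names for missing athlete ids, never exceeding the request cap."""
    # De-duplicate while preserving first-seen order.
    unique_ids = list(dict.fromkeys(missing_ids))
    if len(unique_ids) > resolve_missing_max:
        logger.warning(
            "%d athletes are missing; resolving only the first %d",
            len(unique_ids),
            resolve_missing_max,
        )
    # Every id starts out unresolved (null name).
    resolved: Dict[str, Optional[str]] = {athlete_id: None for athlete_id in unique_ids}
    # Only ids under the cap trigger a lookup; the rest keep their null name.
    for athlete_id in unique_ids[:resolve_missing_max]:
        resolved[athlete_id] = fetch_name(athlete_id)
    return resolved
```

With a cap of 2 and three distinct missing ids, the third id is left as None and a single warning is logged.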

Returns

Polars (or pandas) DataFrame, one row per play. Columns include game_id, play_id, and two column families for every participant type ESPN ships for the game (typical types: passer, rusher, receiver, tackler, sacked_by, forced_by, pass_defender, kicker, punter, returner, recoverer, scorer, pat_scorer, penalized, assisted_by):

* Scalar: {type}_player_id / {type}_player_name hold the first occurrence of that participant type on the play. Backwards compatible with the legacy regex-extractor shape.
* List: {type}_player_ids / {type}_player_names are List(Utf8) columns containing every occurrence of that participant type on the play, in the order ESPN shipped them. Plays with no participant of a given type carry an empty list [] (not null), which simplifies downstream consumption. This family preserves multi-entry participant types (split sacks where ESPN ships two sackedBy entries, multi-tacklers, etc.) that the scalar family collapses to first-only.

If raw=True, returns the parsed JSON list of play dicts.
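
To make the relationship between the two column families concrete, here is a minimal sketch that flattens one play's participants into a single row using plain dicts. The function name flatten_participants and the input shape (dicts with type, id, and name keys, with type already in snake_case) are assumptions made for illustration, not the library's internal representation.

```python
from typing import Any, Dict, List, Sequence


def flatten_participants(
    participants: List[Dict[str, str]],
    types: Sequence[str],
) -> Dict[str, Any]:
    """Build list-family and scalar-family columns for one play."""
    row: Dict[str, Any] = {}
    for ptype in types:
        # List family: every occurrence of this type, in shipped order.
        ids = [p["id"] for p in participants if p["type"] == ptype]
        names = [p["name"] for p in participants if p["type"] == ptype]
        row[f"{ptype}_player_ids"] = ids      # empty list, not None, when absent
        row[f"{ptype}_player_names"] = names
        # Scalar family: first occurrence only, None when the type is absent.
        row[f"{ptype}_player_id"] = ids[0] if ids else None
        row[f"{ptype}_player_name"] = names[0] if names else None
    return row


# A split sack: two sacked_by entries, no passer entry.
split_sack = [
    {"type": "sacked_by", "id": "11", "name": "First Rusher"},
    {"type": "sacked_by", "id": "22", "name": "Second Rusher"},
]
row = flatten_participants(split_sack, types=("passer", "sacked_by"))
```

Here row["sacked_by_player_ids"] keeps both ids while row["sacked_by_player_id"] keeps only the first, which is why the list family is the safer choice for split sacks and multi-tackler plays.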

| col_name | type | description |
| --- | --- | --- |
| game_id | integer | ESPN game identifier. |
| play_id | integer | ESPN play id. |
| kicker_player_name | character | Name of the kicker on a field goal or kickoff. |
| passer_player_name | character | Name of the passer on a passing play. |
| receiver_player_name | character | Name of the receiver on a passing play. |
| rusher_player_name | character | Name of the rusher on a rushing play. |
| scorer_player_name | character | Display name of the primary player credited with a touchdown or field goal score on the play. |
| returner_player_name | character | Display name of the primary player who returned a kick, punt, or interception on the play. |
| pass_defender_player_name | character | Display name of the primary pass defender who contested the target or broke up the pass on the play. |
| penalized_player_name | character | Display name of the player assessed a penalty on the play. |
| sacked_by_player_name | character | Display name of the primary pass rusher credited with the sack on the play. |
| pat_scorer_player_name | character | Display name of the player who scored the point-after-touchdown conversion on the play. |
| punter_player_name | character | Name of the punter. |
| kicker_player_id | character | Unique identifier for the kicker on a field goal or kickoff. |
| passer_player_id | character | Unique identifier for the player that attempted the pass. |
| receiver_player_id | character | Unique identifier for the receiver that was targeted on the pass. |
| rusher_player_id | character | Unique identifier for the player that attempted the run. |
| scorer_player_id | character | ESPN athlete ID for the primary player who scored a touchdown or field goal on the play. |
| returner_player_id | character | ESPN athlete ID for the primary player who returned a kick, punt, or interception on the play. |
| pass_defender_player_id | character | ESPN athlete ID for the primary pass defender (cornerback or safety) who contested the target on the play. |
| penalized_player_id | character | ESPN athlete ID for the primary player who committed the penalty on the play. |
| sacked_by_player_id | character | ESPN athlete ID for the primary pass rusher who recorded the sack on the play. |
| pat_scorer_player_id | character | ESPN athlete ID for the primary player who scored a point-after-touchdown (PAT) conversion on the play. |
| punter_player_id | character | Unique identifier for the punter. |
| kicker_player_names | list of character | List of display names for all kickers credited on the play. |
| passer_player_names | list of character | List of display names for all passers credited on the play. |
| receiver_player_names | list of character | List of display names for all intended receivers credited on the play. |
| rusher_player_names | list of character | List of display names for all ball carriers credited on the play. |
| scorer_player_names | list of character | List of display names for all players credited with scoring on the play. |
| returner_player_names | list of character | List of display names for all returners credited on the play. |
| pass_defender_player_names | list of character | List of display names for all pass defenders credited on the play. |
| penalized_player_names | list of character | List of display names for all players penalized on the play. |
| sacked_by_player_names | list of character | List of display names for all pass rushers credited with the sack, including secondary participants on split sacks. |
| pat_scorer_player_names | list of character | List of display names for all players credited with a PAT conversion on the play. |
| punter_player_names | list of character | List of display names for all punters credited on the play. |
| kicker_player_ids | list of character | List of ESPN athlete IDs for all kickers credited on the play (e.g., kickoff, field goal, or PAT attempt). |
| passer_player_ids | list of character | List of ESPN athlete IDs for all passers credited on the play (supports multi-player lateral/trick plays). |
| receiver_player_ids | list of character | List of ESPN athlete IDs for all intended receivers on the play (supports lateral chains). |
| rusher_player_ids | list of character | List of ESPN athlete IDs for all ball carriers credited on the play (supports lateral handoffs). |
| scorer_player_ids | list of character | List of ESPN athlete IDs for all players credited with a scoring event on the play. |
| returner_player_ids | list of character | List of ESPN athlete IDs for all returners credited on the play (e.g., during a lateral after a return). |
| pass_defender_player_ids | list of character | List of ESPN athlete IDs for all pass defenders who contested the target or were credited with a pass breakup on the play. |
| penalized_player_ids | list of character | List of ESPN athlete IDs for all players penalized on the play. |
| sacked_by_player_ids | list of character | List of ESPN athlete IDs for all pass rushers credited with the sack on the play (includes split-sack participants). |
| pat_scorer_player_ids | list of character | List of ESPN athlete IDs for all players credited with a PAT conversion on the play. |
| punter_player_ids | list of character | List of ESPN athlete IDs for all punters credited on the play. |

Example

from sportsdataverse.cfb import espn_cfb_play_participants
participants = espn_cfb_play_participants(game_id=401628334)
print(participants.shape)

# Skip the per-athlete fan-out for speed

participants_fast = espn_cfb_play_participants(
    game_id=401628334,
    resolve_missing=False,
)

# Pipeline next step (join onto play-by-play frame)

from sportsdataverse.cfb import CFBPlayProcess
pbp = CFBPlayProcess(gameId=401628334).espn_cfb_pbp()
plays = pbp["plays"]
joined = plays.join(participants, how="left", left_on="id", right_on="play_id")

espn_cfb_player_stats(athlete_id: int, season: int, *, season_type: str = 'regular', total: bool = False, raw: bool = False, return_as_pandas: bool = False, **kwargs: Any) -> pl.DataFrame | pd.DataFrame | dict[str, Any]

Pull a college-football athlete's ESPN season stat line.

See sportsdataverse.wbb.espn_wbb_player_stats for full documentation of the wide return shape, the {category}_{stat} stat columns (for football: passing_*, rushing_*, receiving_*, scoring_*, ...), the athlete / team metadata blocks, and the season_type / total parameters. For the richer multi-category web-v3 payload use sportsdataverse.cfb.espn_cfb_player_stats_v3.
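
As a rough picture of how {category}_{stat} column names could arise, the sketch below flattens a nested list of stat categories into one wide row. The payload shape (categories carrying a name and a stats list of name/value pairs), the camelCase stat names, and the helper names to_snake and widen_stats are all assumptions for illustration; the library's actual parsing may differ.

```python
import re
from typing import Any, Dict, List


def to_snake(name: str) -> str:
    """Convert a camelCase name such as 'passingYards' to 'passing_yards'."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def widen_stats(categories: List[Dict[str, Any]]) -> Dict[str, float]:
    """Flatten nested categories into a single {category}_{stat} row."""
    row: Dict[str, float] = {}
    for category in categories:
        prefix = to_snake(category["name"])
        for stat in category["stats"]:
            row[f"{prefix}_{to_snake(stat['name'])}"] = stat["value"]
    return row


# Toy payload with made-up values, assumed shape only.
payload = [
    {"name": "passing", "stats": [
        {"name": "passingYards", "value": 3000.0},
        {"name": "completions", "value": 250.0},
    ]},
    {"name": "rushing", "stats": [
        {"name": "rushingAttempts", "value": 40.0},
    ]},
]
wide_row = widen_stats(payload)
```

Prefixing with the category explains the doubled names in the column table, such as passing_passing_yards: the stat itself is already called passing yards, and the category prefix keeps it distinct from same-named stats in other categories.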

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| athlete_id | int | | ESPN college-football athlete identifier. |
| season | int | | Season year, used in the core-v2 path. |
| season_type | str | 'regular' | "regular" (type 2) or "postseason" (type 3). |
| total | bool | False | Forward-compat totals passthrough. |
| raw | bool | False | If True, returns the raw core-v2 statistics JSON dict. |
| return_as_pandas | bool | False | If True, returns a pandas DataFrame; else polars. |
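
The season_type strings correspond to ESPN's numeric season types as listed in the table above. A minimal sketch of that mapping follows; season_type_code is a hypothetical helper written for this page, and whether the library raises on unknown values is not documented here, so the ValueError is an assumption.

```python
# Mapping taken from the parameter table: "regular" is type 2, "postseason" is type 3.
SEASON_TYPE_CODES = {"regular": 2, "postseason": 3}


def season_type_code(season_type: str = "regular") -> int:
    """Translate a season_type string into its ESPN numeric code."""
    try:
        return SEASON_TYPE_CODES[season_type]
    except KeyError:
        # Assumed behavior: reject anything outside the two documented values.
        valid = ", ".join(sorted(SEASON_TYPE_CODES))
        raise ValueError(f"season_type must be one of: {valid}") from None
```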

Returns

A single-row wide DataFrame (polars by default). When raw=True returns the raw statistics JSON dict.

| col_name | type | description |
| --- | --- | --- |
| season | integer | Season (4-digit year). |
| season_type | character | ESPN season type (2 = regular, 3 = postseason). |
| total | logical | Total. |
| athlete_id | integer | ESPN athlete id. |
| athlete_uid | character | ESPN athlete UID (universal identifier). |
| athlete_guid | character | ESPN athlete GUID. |
| athlete_type | character | Athlete type / class. |
| first_name | character | Athlete first name. |
| last_name | character | Athlete last name. |
| full_name | character | Athlete full name. |
| display_name | character | Athlete display name. |
| short_name | character | Athlete short name. |
| weight | double | Listed weight (lbs). |
| display_weight | character | Human-readable weight (e.g. 205 lbs). |
| height | double | Listed height (inches). |
| display_height | character | Human-readable height (e.g. 6' 1"). |
| age | integer | Player age (in years). |
| date_of_birth | character | Player date of birth (if published). |
| jersey | character | Jersey number. |
| slug | character | URL slug for the athlete. |
| active | logical | TRUE if the athlete is flagged active. |
| position_id | integer | ESPN position id. |
| position_name | character | Position name (e.g. Quarterback). |
| position_display_name | character | Human-readable position name. |
| position_abbreviation | character | Position abbreviation (e.g. QB). |
| college_name | character | College name. |
| status_id | integer | ESPN athlete status id. |
| status_name | character | Athlete status name. |
| general_fumbles | double | Total number of fumbles committed by the player across all offensive and special-teams plays. |
| general_fumbles_lost | double | Number of fumbles the player committed that were recovered by the opposing team. |
| general_fumbles_touchdowns | double | Total touchdowns scored by the player as a result of fumble recoveries, combining offensive and defensive occurrences. |
| general_games_played | double | Games played. |
| general_offensive_two_pt_returns | double | Number of two-point conversions the player scored by returning a blocked or intercepted two-point attempt on the offensive side. |
| general_offensive_fumbles_touchdowns | double | Number of touchdowns scored by the player on fumble recoveries credited to the offensive category. |
| general_defensive_fumbles_touchdowns | double | Number of touchdowns scored by the player on fumble recoveries attributed to the defensive category. |
| passing_avg_gain | double | Average yards gained per passing play attempt by the quarterback in the passing category. |
| passing_completion_pct | double | Percentage of pass attempts thrown by the quarterback that were completed, calculated as completions divided by attempts. |
| passing_completions | double | Total number of completed passes thrown by the quarterback. |
| passing_espnqb_rating | double | ESPN's proprietary quarterback rating for the player's passing performance, factoring in efficiency metrics beyond traditional passer rating. |
| passing_interception_pct | double | Percentage of pass attempts that resulted in an interception, calculated as interceptions divided by passing attempts. |
| passing_interceptions | double | Total number of passes thrown by the quarterback that were intercepted by the defense. |
| passing_long_passing | double | Longest single completed pass in yards recorded by the quarterback during the stat period. |
| passing_net_passing_yards | double | Net passing yards gained by the quarterback after subtracting yardage lost on sacks from gross passing yards. |
| passing_net_passing_yards_per_game | double | Net passing yards per game for the quarterback, computed as net passing yards divided by games played. |
| passing_net_total_yards | double | Combined net yardage from passing and rushing for a quarterback, accounting for sack yardage lost in the passing category. |
| passing_net_yards_per_game | double | Net total yards gained per game for the player as recorded in the passing category context. |
| passing_passing_attempts | double | Total number of pass attempts thrown by the quarterback, including completions, incompletions, and interceptions. |
| passing_passing_big_plays | double | Number of passing plays that gained 20 or more yards as recorded for the quarterback. |
| passing_passing_first_downs | double | Number of first downs gained by the team on passing plays thrown by the quarterback. |
| passing_passing_fumbles | double | Number of fumbles the quarterback committed during passing plays, including fumbled snaps and sack fumbles. |
| passing_passing_fumbles_lost | double | Number of fumbles the quarterback committed on passing plays that were recovered by the opposing team. |
| passing_passing_touchdown_pct | double | Percentage of pass attempts that resulted in a passing touchdown, calculated as touchdowns divided by attempts. |
| passing_passing_touchdowns | double | Total number of touchdown passes thrown by the quarterback. |
| passing_passing_yards | double | Gross passing yards gained by the quarterback on completed passes. |
| passing_passing_yards_after_catch | double | Total yards gained by receivers after the catch on passes thrown by the quarterback. |
| passing_passing_yards_at_catch | double | Total yards gained at the point of the catch (air yards) on passes thrown by the quarterback, before any yards after catch. |
| passing_passing_yards_per_game | double | Gross passing yards per game for the quarterback, computed as passing yards divided by games played. |
| passing_qb_rating | double | Traditional NCAA passer rating for the quarterback, calculated from completion percentage, yards per attempt, touchdown rate, and interception rate. |
| passing_sacks | double | Total number of times the quarterback was sacked (tackled behind the line of scrimmage on a passing play). |
| passing_sack_yards_lost | double | Total yards lost by the quarterback as a result of being sacked, subtracted when computing net passing yards. |
| passing_team_games_played | double | Number of team games played during the stat period, used as the denominator for per-game passing rate statistics. |
| passing_total_offensive_plays | double | Total number of offensive plays (pass attempts plus rushes) for the team during the stat period, recorded in the passing category context. |
| passing_total_points_per_game | double | Average total points scored per game by the player's team as recorded alongside passing statistics. |
| passing_total_touchdowns | double | Total touchdowns accounted for by the quarterback across passing and rushing in the passing category context. |
| passing_total_yards | double | Total offensive yardage (passing plus rushing) accumulated by the quarterback as reported in the passing category. |
| passing_total_yards_from_scrimmage | double | Total yards from scrimmage accumulated by the quarterback (passing plus rushing yards) in the passing category context. |
| passing_two_point_pass_convs | double | Number of successful two-point conversions the quarterback converted via a passing play. |
| passing_two_pt_pass | double | Indicator or count of two-point conversion passing attempts recorded for the quarterback. |
| passing_two_pt_pass_attempts | double | Total number of two-point conversion attempts the quarterback made via a passing play. |
| passing_yards_from_scrimmage_per_game | double | Average yards from scrimmage per game for the quarterback as reported in the passing category. |
| passing_yards_per_completion | double | Average yards gained per completed pass by the quarterback, calculated as passing yards divided by completions. |
| passing_yards_per_game | double | Average gross passing yards per game for the quarterback, equivalent to passing_passing_yards_per_game. |
| passing_yards_per_pass_attempt | double | Average yards gained per pass attempt by the quarterback, calculated as passing yards divided by attempts. |
| passing_net_yards_per_pass_attempt | double | Net passing yards per pass attempt, with sack yardage lost subtracted from passing yards. |
| passing_qbr | double | ESPN Total Quarterback Rating (QBR) for the player. |
| passing_adj_qbr | double | ESPN's adjusted Total Quarterback Rating (QBR) for the player's passing performance, controlling for opponent difficulty and game situation. |
| passing_quarterback_rating | double | Traditional passer rating for the quarterback, equivalent to passing_qb_rating, using the standard NCAA formula. |
| rushing_avg_gain | double | Average yards gained per rushing attempt for the player in the rushing category. |
| rushing_espnrb_rating | double | ESPN's proprietary running back rating for the player's rushing performance. |
| rushing_long_rushing | double | Longest single rushing carry in yards recorded by the player during the stat period. |
| rushing_net_total_yards | double | Net total yardage accumulated by the player from rushing and any receiving contributions as reported in the rushing category. |
| rushing_net_yards_per_game | double | Net total yards per game for the player as reported in the rushing category context. |
| rushing_rushing_attempts | double | Total number of rushing attempts (carries) credited to the player. |
| rushing_rushing_big_plays | double | Number of rushing plays that gained 10 or more yards for the player. |
| rushing_rushing_first_downs | double | Number of first downs gained by the player via rushing plays. |
| rushing_rushing_fumbles | double | Number of fumbles the player committed on rushing plays. |
| rushing_rushing_fumbles_lost | double | Number of fumbles the player committed on rushing plays that were recovered by the opposing team. |
| rushing_rushing_touchdowns | double | Total number of rushing touchdowns scored by the player. |
| rushing_rushing_yards | double | Total yards gained by the player on rushing attempts. |
| rushing_rushing_yards_per_game | double | Average rushing yards per game for the player, calculated as rushing yards divided by games played. |
| rushing_stuffs | double | Number of rushing attempts in which the player was stopped at or behind the line of scrimmage. |
| rushing_stuff_yards_lost | double | Total yards lost by the player on stuffed rushing plays (carries stopped at or behind the line of scrimmage). |
| rushing_team_games_played | double | Number of team games played during the stat period, used as the denominator for per-game rushing rate statistics. |
| rushing_total_offensive_plays | double | Total number of offensive plays for the team during the stat period, recorded in the rushing category context. |
| rushing_total_points_per_game | double | Average total points scored per game by the player's team as recorded alongside rushing statistics. |
| rushing_total_touchdowns | double | Total touchdowns scored by the player across all methods as reported in the rushing category context. |
| rushing_total_yards | double | Total offensive yardage accumulated by the player as reported in the rushing category. |
| rushing_total_yards_from_scrimmage | double | Total yards from scrimmage for the player (rushing plus receiving yards) as reported in the rushing category. |
| rushing_two_point_rush_convs | double | Number of successful two-point conversions the player converted via a rushing play. |
| rushing_two_pt_rush | double | Indicator or count of two-point conversion rushing attempts recorded for the player. |
| rushing_two_pt_rush_attempts | double | Total number of two-point conversion attempts the player made via a rushing play. |
| rushing_yards_from_scrimmage_per_game | double | Average yards from scrimmage per game for the player as reported in the rushing category. |
| rushing_yards_per_game | double | Average rushing yards per game for the player, equivalent to rushing_rushing_yards_per_game. |
| rushing_yards_per_rush_attempt | double | Average yards gained per rushing attempt for the player, calculated as rushing yards divided by attempts. |
| receiving_avg_gain | double | Average yards gained per reception for the player in the receiving category. |
| receiving_espnwr_rating | double | ESPN's proprietary wide receiver / pass-catcher rating for the player's receiving performance. |
| receiving_long_reception | double | Longest single reception in yards recorded by the player during the stat period. |
| receiving_net_total_yards | double | Net total yardage accumulated by the player from receiving and any rushing contributions as reported in the receiving category. |
| receiving_net_yards_per_game | double | Net total yards per game for the player as reported in the receiving category context. |
| receiving_receiving_big_plays | double | Number of receiving plays that gained 20 or more yards for the player. |
| receiving_receiving_first_downs | double | Number of first downs gained by the player via receptions. |
| receiving_receiving_fumbles | double | Number of fumbles the player committed after catching a pass. |
| receiving_receiving_fumbles_lost | double | Number of fumbles the player committed on receiving plays that were recovered by the opposing team. |
| receiving_receiving_targets | double | Total number of times the player was targeted as the intended receiver on a pass play. |
| receiving_receiving_touchdowns | double | Total number of touchdown receptions scored by the player. |
| receiving_receiving_yards | double | Total yards gained by the player on completed receptions. |
| receiving_receiving_yards_after_catch | double | Total yards gained by the player after the catch on receiving plays. |
| receiving_receiving_yards_at_catch | double | Total air yards gained at the point of the catch on receiving plays, before any yards after catch. |
| receiving_receiving_yards_per_game | double | Average receiving yards per game for the player, calculated as receiving yards divided by games played. |
| receiving_receptions | double | Total number of completed receptions (catches) recorded by the player. |
| receiving_team_games_played | double | Number of team games played during the stat period, used as the denominator for per-game receiving rate statistics. |
| receiving_total_offensive_plays | double | Total number of offensive plays for the team during the stat period, recorded in the receiving category context. |
| receiving_total_points_per_game | double | Average total points scored per game by the player's team as recorded alongside receiving statistics. |
| receiving_total_touchdowns | double | Total touchdowns scored by the player across all methods as reported in the receiving category context. |
| receiving_total_yards | double | Total offensive yardage accumulated by the player as reported in the receiving category. |
| receiving_total_yards_from_scrimmage | double | Total yards from scrimmage for the player (receiving plus rushing yards) as reported in the receiving category. |
| receiving_two_point_rec_convs | double | Number of successful two-point conversions the player converted via a reception. |
| receiving_two_pt_reception | double | Indicator or count of two-point conversion receptions recorded for the player. |
| receiving_two_pt_reception_attempts | double | Total number of two-point conversion attempts the player made via a receiving play. |
| receiving_yards_from_scrimmage_per_game | double | Average yards from scrimmage per game for the player as reported in the receiving category. |
| receiving_yards_per_game | double | Average receiving yards per game for the player, equivalent to receiving_receiving_yards_per_game. |
| receiving_yards_per_reception | double | Average yards gained per reception for the player, calculated as receiving yards divided by receptions. |
| scoring_defensive_points | double | Total points scored by the player through defensive plays such as defensive touchdowns, safeties, or fumble-return scores. |
| scoring_field_goals | double | Total number of field goals made by the player in the scoring category. |
| scoring_kick_extra_points | double | Total number of extra point attempts kicked by the player. |
| scoring_kick_extra_points_made | double | Total number of successful extra points (PATs) kicked by the player. |
| scoring_misc_points | double | Points scored by the player through miscellaneous means not captured by standard scoring categories. |
| scoring_passing_touchdowns | double | Total touchdown passes thrown by the player as counted in the scoring category. |
| scoring_receiving_touchdowns | double | Total touchdown receptions scored by the player as counted in the scoring category. |
| scoring_return_touchdowns | double | Total touchdowns scored by the player on kick or punt returns as counted in the scoring category. |
| scoring_rushing_touchdowns | double | Total rushing touchdowns scored by the player as counted in the scoring category. |
| scoring_total_points | double | Total points scored by the player across all scoring methods during the stat period. |
| scoring_total_points_per_game | double | Average total points scored by the player per game during the stat period. |
| scoring_total_touchdowns | double | Total touchdowns scored by the player across all methods (passing, rushing, receiving, and return) in the scoring category. |
| scoring_total_two_point_convs | double | Total number of successful two-point conversions scored by the player across passing, rushing, and receiving attempts. |
| scoring_two_point_pass_convs | double | Number of successful two-point conversions the player scored via a passing play, as counted in the scoring category. |
| scoring_two_point_rec_convs | double | Number of successful two-point conversions the player scored via a reception, as counted in the scoring category. |
| scoring_two_point_rush_convs | double | Number of successful two-point conversions the player scored via a rushing play, as counted in the scoring category. |
| scoring_one_pt_safeties_made | double | Number of one-point safeties scored by the player's team, credited in the scoring category. |
| team_id | integer | ESPN team id. |
| team_uid | character | ESPN universal team identifier (UID format 's:40~l:...~t:...'). |
| team_guid | character | ESPN team GUID. |
| team_slug | character | URL slug for the team. |
| team_location | character | Team location / school name. |
| team_name | character | Team nickname. |
| team_abbreviation | character | Team abbreviation. |
| team_display_name | character | Full team display name. |
| team_short_display_name | character | Short team display name. |
| team_color | character | Primary team color. |
| team_alternate_color | character | Alternate team color. |
| team_is_active | logical | TRUE if the team is currently active. |
| team_logo_href | character | Default team logo URL. |

Example

from sportsdataverse.cfb import espn_cfb_player_stats
df = espn_cfb_player_stats(athlete_id=4426338, season=2023)
df.select(["full_name", "team_display_name", "passing_passing_yards"])

espn_cfb_schedule(dates=None, week=None, season_type=None, groups=None, limit=500, return_as_pandas=False, **kwargs) -> 'pl.DataFrame'

espn_cfb_schedule - look up the college football schedule for a given season

Parameters

ParameterTypeDefaultDescription
datesintNoneUsed to define different seasons. 2002 is the earliest available season.
weekintNoneWeek of the schedule.
season_typeintNone2 for regular season, 3 for post-season, 4 for off-season.
groupsintNoneUsed to define different divisions. 80 is FBS, 81 is FCS.
limitint500number of records to return, default: 500.
return_as_pandasboolFalseIf True, returns a pandas dataframe. If False, returns a polars dataframe.

Returns

Polars dataframe containing schedule dates for the requested season. Returns None if no games are found.

col_nametypedescription
idcharacterESPN event id for the game.
uidcharacterESPN global unique identifier.
datecharacterScheduled date and time of the game.
attendanceintegerReported attendance at the game.
time_validlogicalWhether the start time is confirmed.
date_validlogicalBoolean flag indicating whether the game's scheduled date is confirmed and valid.
neutral_sitelogicalTRUE/FALSE flag for if the game took place at a neutral site.
conference_competitionlogicalConference competition.
play_by_play_availablelogicalWhether play-by-play data is available.
recentlogicalWhether the game is recent.
start_datecharacterSeason start timestamp (ISO 8601, UTC).
broadcastcharacterBroadcast network short name.
highlightscharacterGame highlight urls.
notes_typecharacterNotes type.
notes_headlinecharacterNotes headline.
broadcast_marketcharacterBroadcast market label (e.g. 'national', 'home').
broadcast_namecharacterBroadcast name.
type_idcharacterCompetition type id.
type_abbreviationcharacterCompetition type abbreviation.
venue_idcharacterReferencing venue id.
venue_full_namecharacterVenue full name.
venue_address_citycharacterVenue address city.
venue_address_countrycharacterCountry in which the game venue is located, as provided by ESPN's venue data.
venue_indoorlogicalWhether the game venue is indoors.
status_clockdoubleGame clock in seconds.
status_display_clockcharacterStatus display clock.
status_periodintegerCurrent period.
status_type_idcharacterUnique identifier for status type.
status_type_namecharacterStatus type name.
status_type_statecharacterStatus state (pre/in/post).
status_type_completedlogicalWhether the game is complete.
status_type_descriptioncharacterStatus type description.
status_type_detailcharacterStatus type detail.
status_type_short_detailcharacterStatus type short detail.
format_regulation_periodsintegerFormat regulation periods.
home_idcharacterHome team referencing id.
home_uidcharacterHome team's uid.
home_locationcharacterHome team's location.
home_namecharacterHome team name (nickname).
home_abbreviationcharacterHome team's abbreviation.
home_display_namecharacterHome team display name.
home_short_display_namecharacterHome short display name.
home_colorcharacterHome team primary color hex.
home_alternate_colorcharacterHome team alternate color hex.
home_is_activelogicalWhether the home team is active.
home_venue_idcharacterUnique identifier for home venue.
home_logocharacterHome team logo URL.
home_conference_idcharacterUnique identifier for home conference.
home_scorecharacterHome team score for the game.
home_current_rankintegerAP or Coaches Poll ranking of the home team at the time of the game (null if unranked).
home_linescoreslistPer-period point totals for the home team, stored as an array of quarter/overtime scores.
home_recordscharacterWin-loss record of the home team at the time of the game, as reported by ESPN (e.g., overall or conference record).
away_idcharacterAway team referencing id.
away_uidcharacterAway team's uid.
away_locationcharacterAway team's location.
away_namecharacterAway team name (nickname).
away_abbreviationcharacterAway team's abbreviation.
away_display_namecharacterAway team display name.
away_short_display_namecharacterAway short display name.
away_colorcharacterAway team primary color hex.
away_alternate_colorcharacterAway team alternate color hex.
away_is_activelogicalWhether the away team is active.
away_venue_idcharacterUnique identifier for away venue.
away_logocharacterAway team logo URL.
away_conference_idcharacterUnique identifier for away conference.
away_scorecharacterAway team score for the game.
away_current_rankintegerAP or Coaches Poll ranking of the away team at the time of the game (null if unranked).
away_linescoreslistPer-period point totals for the away team, stored as an array of quarter/overtime scores.
away_recordscharacterWin-loss record of the away team at the time of the game, as reported by ESPN (e.g., overall or conference record).
game_idintegerESPN game identifier.
seasonintegerSeason (4-digit year).
season_typeintegerESPN season type (2 = regular, 3 = postseason).
weekintegerGame week of the season.
venue_address_statecharacterVenue address state / region.
groups_idcharacterUnique identifier for groups.
groups_namecharacterGroups name.
groups_short_namecharacterGroups short name.
groups_is_conferencelogicalGroups is conference.

Example

from sportsdataverse.cfb import espn_cfb_schedule
slate = espn_cfb_schedule()
print(slate.shape if slate is not None else "no games")

# Pull a specific week of FBS games

week5 = espn_cfb_schedule(dates=2023, week=5, season_type=2, groups=80)

# Pipeline next step (extract finals only)

import polars as pl
finals = espn_cfb_schedule(dates=2023, week=5).filter(
    pl.col("status_type_completed")
)

Dataset loaders

load_cfb_betting_lines(return_as_pandas=False) -> 'pl.DataFrame'

Load college football betting lines information

Parameters

ParameterTypeDefaultDescription
return_as_pandasboolFalseIf True, returns a pandas dataframe. If False, returns a polars dataframe.

Returns

Polars dataframe of betting lines for all available seasons.

col_nametypedescription
iddoubleUnique identifier for the betting-line record.
game_idintegerESPN game identifier.
seasondoubleSeason (4-digit year).
game_desccharacterHuman-readable description of the game, typically including team names and context.
date_timecharacterDate and time of the game to which the betting line applies, as a string.
market_typecharacterBetting market for the row: spread, moneyline, or total.
abbrcharacterSelection/side this odds row applies to — a team abbreviation for spread and moneyline markets, or 'over'/'under' for total markets (the data is long-format, one row per book per selection per market_type).
linesdoubleNumeric line for this row's market — the per-side point spread for spread markets or the over/under total points for total markets; null for moneyline rows.
oddsintegerAmerican-odds price for this selection — the juice/vig on spread and total rows, or the moneyline price itself on moneyline rows.
opening_linesdoubleOpening numeric line for this row's market (per-side spread or over/under total points) before line movement; null for moneyline rows.
opening_oddsintegerOpening American-odds price for this selection before line movement (vig on spread/total rows, moneyline price on moneyline rows).
bookcharacterName of the sportsbook or oddsmaker that provided the betting line.
season_typecharacterESPN season type (2 = regular, 3 = postseason).
weekintegerGame week of the season.

Example

from sportsdataverse.cfb import load_cfb_betting_lines
lines = load_cfb_betting_lines()
print(lines.shape)

# Pandas round-trip

lines_pd = load_cfb_betting_lines(return_as_pandas=True)
lines_pd.head()

# Pipeline next step (filter to one sportsbook in 2023; "consensus" is an illustrative value)

import polars as pl
consensus_2023 = load_cfb_betting_lines().filter(
    (pl.col("season") == 2023) & (pl.col("book") == "consensus")
)
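
The odds and opening_odds columns hold American-odds prices. As a minimal sketch, the standard conversion from an American price to an implied probability (vig included) can be written as below. The helper name implied_probability is hypothetical and not part of sportsdataverse.

```python
def implied_probability(american_odds: int) -> float:
    """Convert an American-odds price to its implied probability (vig included)."""
    if american_odds == 0:
        raise ValueError("American odds cannot be zero")
    if american_odds < 0:
        # Favorite: risk |odds| to win 100.
        return -american_odds / (-american_odds + 100)
    # Underdog: risk 100 to win `odds`.
    return 100 / (american_odds + 100)


# A typical spread price and a typical underdog moneyline price.
print(round(implied_probability(-110), 4))
print(round(implied_probability(150), 4))
```

Because both sides of a market include the vig, the implied probabilities of the two selections usually sum to slightly more than 1.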

load_cfb_rosters_crosswalk(return_as_pandas: 'bool' = False) -> 'pl.DataFrame'

Load the current ESPN x Fox CFB rosters crosswalk (single snapshot).

Unlike the per-season load_cfb_teams_crosswalk / load_cfb_schedule_crosswalk loaders, this one is season-less: ESPN's and Fox's team-roster endpoints only expose the current roster, so the published artifact is a single snapshot rather than a historical per-season series. It is built by cfbfastR-cfb-data's scripts/build_cfb_crosswalk.py (which fans the per-team sportsdataverse.cfb.cfb_rosters_crosswalk builder out over the current season's ESPN<->Fox team-id pairs) and refreshed on that repo's cadence.

Parameters

ParameterTypeDefaultDescription
return_as_pandasboolFalseIf True, returns a pandas dataframe. If False, returns a polars dataframe.

Returns

Polars dataframe with one row per matched player, carrying espn_team_id / fox_team_id provenance plus each provider's athlete id, name, jersey, position, and the match_method / matched_sources flags.

Example

from sportsdataverse.cfb import load_cfb_rosters_crosswalk
xwalk = load_cfb_rosters_crosswalk()
print(xwalk.shape)

# Pandas round-trip

xwalk_pd = load_cfb_rosters_crosswalk(return_as_pandas=True)

# Pipeline next step (one team's ESPN<->Fox athlete map)

import polars as pl
osu = load_cfb_rosters_crosswalk().filter(pl.col("espn_team_id") == 194)

Utilities & helpers

CFBPlayProcess(gameId=0, raw=False, path_to_json='/', return_keys=None, odds_override=None, game_roster=None, participants=None, **kwargs)

Process ESPN college-football play-by-play feeds into a tidy game-level dictionary.

Wraps the ESPN playbyplay / summary endpoints (or a local JSON dump) and pipes the result through a chain of feature-engineering steps -- down/distance, play-type flags, EPA, WPA, QBR, drive aggregation, and an advanced box score. Use run_processing_pipeline() for the full feature set or run_cleaning_pipeline() for a lighter clean.

Parameters

ParameterTypeDefaultDescription
gameId0ESPN game id.
rawFalseif True, espn_cfb_pbp() returns the (allowlisted) summary verbatim.
path_to_json'/'directory for cfb_pbp_disk() offline loads.
return_keysNoneoptional subset of result keys to return.
odds_overrideNoneoptional dict {gameSpread, overUnder, homeFavorite, gameSpreadAvailable} that short-circuits odds resolution (sets odds_source="injected") so offline rebuilds never hit the live core-odds endpoint or fall back to defaults. Validated + coerced here.
game_rosterNoneoptional pre-fetched game roster (the list of athlete records from sportsdataverse.cfb.cfb_game_rosters.espn_cfb_game_rosters, or the {"data": [...]} wrapper). Used by attach_player_ids to resolve a roster-backed {type}_player_id for each extracted {type}_player_name on games that lack a structured participants[] array (pre-2014). Passing it makes offline rebuilds fetch-free; when omitted the live path fetches the roster on demand only if needed.
participantsNone

Example

from sportsdataverse.cfb import CFBPlayProcess
proc = CFBPlayProcess(gameId=401628334)
proc.espn_cfb_pbp()
result = proc.run_processing_pipeline()
len(result["plays"])

# Offline replay from a JSON dump

proc = CFBPlayProcess(gameId=401628334, path_to_json="./pbp_dump")
proc.cfb_pbp_disk()
result = proc.run_processing_pipeline()
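
The odds_override parameter expects a dict with the four keys listed in the parameter table. The sketch below builds such a dict; the helper make_odds_override, the value types, and the sign convention of the spread are assumptions for illustration, since the table only names the keys.

```python
from typing import Any, Dict


def make_odds_override(game_spread: float, over_under: float, home_favorite: bool) -> Dict[str, Any]:
    """Build a dict with the four odds_override keys (value types are assumed)."""
    return {
        "gameSpread": float(game_spread),
        "overUnder": float(over_under),
        "homeFavorite": bool(home_favorite),
        # Assumption: a spread supplied by hand counts as "available".
        "gameSpreadAvailable": True,
    }


override = make_odds_override(-6.5, 52, True)
print(override)
```

The result could then be passed as CFBPlayProcess(gameId=..., path_to_json=..., odds_override=override) so an offline rebuild does not need the live odds endpoint.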

Methods

CFBPlayProcess.add_2pt_probs()

Add the cfb4th two-point-conversion decision surface to the processed plays.

Runs run_processing_pipeline first if it hasn't already, then computes the extra-point vs go-for-2 win-probability options on every point-after / two-point conversion row via sportsdataverse.cfb.cfb_two_point.get_2pt_probs. A row is treated as a PAT / two-point attempt when pointAfterAttempt.text is present (or the derived extra_point_result / two_point_conv_result is non-null). The new columns -- two_pt_wp, xp_wp, prob_2pt, two_pt_recommendation ("go_for_2" / "kick_xp") and two_pt_wp_diff (two_pt_wp - xp_wp, positive => go for 2) -- are written back onto self.plays_json (and self.json's plays); every other row carries nulls.

Returns

self.plays_json as a frame with the decision columns appended (also persisted back onto the instance).

Example

import polars as pl
from sportsdataverse.cfb import CFBPlayProcess
game = CFBPlayProcess(gameId=401628334)
game.espn_cfb_pbp()
game.run_processing_pipeline()
out = game.add_2pt_probs()
print(out.filter(pl.col("two_pt_recommendation").is_not_null())
      .select(["two_pt_wp", "xp_wp", "two_pt_recommendation"])
      .head())
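
The relationship between two_pt_wp, xp_wp, two_pt_wp_diff and two_pt_recommendation described above can be sketched in plain Python. This is an illustration of the stated rule (two_pt_wp_diff = two_pt_wp - xp_wp, positive means go for 2), not the library's implementation; the function name and the choice to kick on an exact tie are assumptions.

```python
from typing import Dict, Optional


def two_pt_decision(two_pt_wp: Optional[float], xp_wp: Optional[float]) -> Dict[str, object]:
    """Derive the diff and recommendation from the two win-probability options."""
    if two_pt_wp is None or xp_wp is None:
        # Rows that are not PAT / two-point attempts carry nulls.
        return {"two_pt_wp_diff": None, "two_pt_recommendation": None}
    diff = two_pt_wp - xp_wp
    # Assumption: an exact tie falls back to kicking the extra point.
    recommendation = "go_for_2" if diff > 0 else "kick_xp"
    return {"two_pt_wp_diff": diff, "two_pt_recommendation": recommendation}


print(two_pt_decision(0.55, 0.52))
print(two_pt_decision(0.40, 0.45))
```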

CFBPlayProcess.add_fourth_down_probs()

Add the cfb4th 4th-down decision surface to the processed plays.

Runs run_processing_pipeline first if it hasn't already, then computes the go / punt / field-goal win-probability options plus the max-WP fourth_down_recommendation (and per-option *_wp_diff and go_boost) on every 4th-down row via sportsdataverse.cfb.cfb_fourth_down.get_4th_down_probs. The new columns are written back onto self.plays_json (and self.json's plays); non-4th-down rows carry nulls for the decision columns.

Field-goal columns (fg_make_prob / make_fg_wp / miss_fg_wp / fg_wp) are null when the cfb4th FG model isn't bundled (cfb_fourth_down.FG_MODEL_AVAILABLE is False) -- the go + punt surface and the recommendation over the available options are still computed.

Returns

self.plays_json as a frame with the decision columns appended (also persisted back onto the instance).

Example

import polars as pl
from sportsdataverse.cfb import CFBPlayProcess
game = CFBPlayProcess(gameId=401628334)
game.espn_cfb_pbp()
game.run_processing_pipeline()
fourth = game.add_fourth_down_probs()
print(fourth.filter(pl.col("start.down") == 4)
      .select(["go_wp", "punt_wp", "fg_wp", "fourth_down_recommendation"])
      .head())
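
The "max-WP over the available options" idea can be sketched as follows. When the field-goal model is not bundled, fg_wp is null and the pick is made over go and punt only. The function name and the option labels ("go", "punt", "field_goal") are hypothetical; the actual values written to fourth_down_recommendation may differ.

```python
from typing import Optional


def recommend_fourth_down(
    go_wp: Optional[float],
    punt_wp: Optional[float],
    fg_wp: Optional[float] = None,
) -> Optional[str]:
    """Pick the option with the highest win probability among non-null options."""
    options = {"go": go_wp, "punt": punt_wp, "field_goal": fg_wp}
    available = {name: wp for name, wp in options.items() if wp is not None}
    if not available:
        # Non-4th-down rows carry nulls for every decision column.
        return None
    return max(available, key=available.get)


print(recommend_fourth_down(0.48, 0.45, 0.50))
print(recommend_fourth_down(0.48, 0.45))  # FG model unavailable
```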

CFBPlayProcess.cfb_pbp_disk()

Load a previously cached ESPN summary JSON for this game from disk.

Reads {path_to_json}/{gameId}.json where path_to_json was passed to the CFBPlayProcess constructor.

Returns

Parsed JSON contents, also stored on self.json.

Example

from sportsdataverse.cfb import CFBPlayProcess
game = CFBPlayProcess(gameId=401628334, path_to_json="./cache")
pbp = game.cfb_pbp_disk()
print(list(pbp.keys()))
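
For reference, the {path_to_json}/{gameId}.json lookup amounts to a plain JSON read. The sketch below reproduces that layout with the standard library only; read_cached_summary is a hypothetical helper and the payload written to the temporary directory is a stand-in, not a real ESPN summary.

```python
import json
import tempfile
from pathlib import Path


def read_cached_summary(path_to_json: str, game_id: int) -> dict:
    """Read {path_to_json}/{game_id}.json and return the parsed payload."""
    cache_file = Path(path_to_json) / f"{game_id}.json"
    with cache_file.open(encoding="utf-8") as fh:
        return json.load(fh)


# Round-trip demo against a throwaway directory with a stand-in payload.
with tempfile.TemporaryDirectory() as tmp:
    stub = {"gameId": 401628334, "plays": []}
    (Path(tmp) / "401628334.json").write_text(json.dumps(stub), encoding="utf-8")
    payload = read_cached_summary(tmp, 401628334)

print(sorted(payload.keys()))
```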

CFBPlayProcess.cfb_pbp_json(**kwargs)

Return the JSON payload currently attached to this CFBPlayProcess instance.

Returns

The cached JSON payload (self.json).

Example

from sportsdataverse.cfb import CFBPlayProcess
game = CFBPlayProcess(gameId=401628334)
game.espn_cfb_pbp()
cached = game.cfb_pbp_json()

CFBPlayProcess.corrupt_pbp_check()

Heuristic check for corrupt or incomplete play-by-play.

Flags games with zero plays, fewer than 50 plays for a completed game, or more than 500 plays for a completed game -- all of which historically indicate ESPN delivered a malformed PBP payload that should not be processed downstream.

Returns

True if PBP looks corrupt and the processing pipeline should be skipped, False otherwise.

Example

from sportsdataverse.cfb import CFBPlayProcess
game = CFBPlayProcess(gameId=401628334)
game.espn_cfb_pbp()
if not game.corrupt_pbp_check():
    game.run_processing_pipeline()
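
The thresholds described above can be expressed as a small predicate. This is a sketch of the stated heuristic only (zero plays, or a completed game with fewer than 50 or more than 500 plays); looks_corrupt is a hypothetical name and the real method reads these values from the instance rather than taking arguments.

```python
def looks_corrupt(n_plays: int, completed: bool) -> bool:
    """Return True when the play count suggests a malformed PBP payload."""
    if n_plays == 0:
        return True
    if completed and (n_plays < 50 or n_plays > 500):
        return True
    return False


print(looks_corrupt(0, completed=False))
print(looks_corrupt(30, completed=True))
print(looks_corrupt(180, completed=True))
```

Note that a low play count only counts against a completed game, so an in-progress game with 30 plays is not flagged.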

CFBPlayProcess.create_box_score(play_df)

Build a per-team and per-player advanced box score from a processed plays frame.

Triggers run_processing_pipeline first if it hasn't already run, so the input play_df is expected to be the post-pipeline plays frame.

Parameters

ParameterTypeDefaultDescription
play_dfpl.DataFrameThe plays frame produced by run_processing_pipeline (with EPA, WPA and play-type flags already populated).

Returns

Box-score sections, each a list of records — "pass" / "rush" / "receiver" (per-player advanced + EPA lines), "team" and "situational" (per-team), "defensive" and "defensive_players" (team- and player-level havoc), "specialists" (kicking / punting / return players), "turnover", "drives", and the ESPN-sourced "espn_team" / "espn_players" totals.

Example

from sportsdataverse.cfb import CFBPlayProcess
game = CFBPlayProcess(gameId=401628334)
game.espn_cfb_pbp()
processed = game.run_processing_pipeline()
box = game.create_box_score(game.plays_json)
print(list(box.keys()))

CFBPlayProcess.espn_cfb_pbp(**kwargs)

espn_cfb_pbp() - Pull the game by id. Data from API endpoints: college-football/playbyplay, college-football/summary

Returns

Dictionary of game data with keys - "gameId", "plays", "boxscore", "header", "broadcasts", "videos", "playByPlaySource", "standings", "leaders", "timeouts", "homeTeamSpread", "overUnder", "pickcenter", "againstTheSpread", "odds", "predictor", "winprobability", "espnWP", "gameInfo", "season"

Example

from sportsdataverse.cfb import CFBPlayProcess
game = CFBPlayProcess(gameId=401628334)
pbp = game.espn_cfb_pbp()
print(list(pbp.keys()))

# Pull only the raw ESPN summary payload (skip cleaning)

raw_pbp = CFBPlayProcess(gameId=401628334, raw=True).espn_cfb_pbp()

# Pipeline next step (run the full processing pipeline for advanced features)

game = CFBPlayProcess(gameId=401628334)
game.espn_cfb_pbp()
processed = game.run_processing_pipeline() # adds EPA, WPA, box score

CFBPlayProcess.run_cleaning_pipeline()

Run the lighter cleaning pipeline (no EPA/WPA/QBR/box-score).

Same per-play feature engineering as run_processing_pipeline through add_spread_time, but stops short of the modeling steps. Use this when you only need cleaned plays and don't need expected points or win probability columns.

Returns

Cleaned game payload (no advBoxScore key).

Example

from sportsdataverse.cfb import CFBPlayProcess
game = CFBPlayProcess(gameId=401628334)
game.espn_cfb_pbp()
cleaned = game.run_cleaning_pipeline()
print(len(cleaned["plays"]))

CFBPlayProcess.run_processing_pipeline(fourth_down_probs: 'bool' = True, two_pt_probs: 'bool' = True)

Run the full play-by-play processing pipeline.

Applies every scoring/feature step in order: down detection, play type flags, rush/pass flags, team score variables, new play types, penalty setup, play category flags, yardage cols, player cols, after cols, spread time, EPA, WPA, drive data, and QBR. Also produces an advanced box score and stores it under advBoxScore on the returned dict.

Idempotent -- subsequent calls return the cached self.json.

Parameters

ParameterTypeDefaultDescription
fourth_down_probsboolTruewhen True (default), run the cfb4th decision surface (sportsdataverse.cfb.cfb_fourth_down.get_4th_down_probs) on the enriched frame and append the go/field-goal/punt WP columns plus the fourth_down_recommendation to 4th-down plays (null elsewhere). Pass False to skip it (e.g. to avoid loading the fourth-down model).
two_pt_probsboolTruewhen True (default), run the cfb4th two-point decision surface (sportsdataverse.cfb.cfb_two_point.get_2pt_probs) and append two_pt_wp / xp_wp / prob_2pt / two_pt_recommendation / two_pt_wp_diff to point-after / two-point rows (null elsewhere).

Returns

The fully-processed game payload. If the constructor was given return_keys, only those keys are returned.

Example

from sportsdataverse.cfb import CFBPlayProcess
game = CFBPlayProcess(gameId=401628334)
game.espn_cfb_pbp()
processed = game.run_processing_pipeline()
print(processed["advBoxScore"].keys())

# Pipeline next step (return only selected keys)

game = CFBPlayProcess(gameId=401628334, return_keys=["plays", "advBoxScore"])
game.espn_cfb_pbp()
trimmed = game.run_processing_pipeline()

most_recent_cfb_season()

Return the most recent college football season year based on today's date.

The college football season starts in mid-August. If today is on or after August 15 (or any day in September or later), this returns the current calendar year. Otherwise, it returns the previous calendar year.

Returns

The most recent CFB season year.

Example

from sportsdataverse.cfb import most_recent_cfb_season
year = most_recent_cfb_season()
print(year)

# Combine with the loaders for a "current season" pull

from sportsdataverse.cfb import load_cfb_schedule, most_recent_cfb_season
sched = load_cfb_schedule(seasons=[most_recent_cfb_season()])
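
The August 15 cutoff described above can be sketched as a pure function of a date, which makes the boundary easy to check. season_for_date is a hypothetical helper written for illustration; most_recent_cfb_season() itself takes no arguments and uses today's date.

```python
from datetime import date


def season_for_date(today: date) -> int:
    """Return the CFB season year for a given date, using an August 15 cutoff."""
    # Tuple comparison covers "on or after Aug 15" and every later month.
    if (today.month, today.day) >= (8, 15):
        return today.year
    return today.year - 1


print(season_for_date(date(2024, 8, 14)))
print(season_for_date(date(2024, 8, 15)))
print(season_for_date(date(2025, 1, 10)))
```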

Other

cfb_odds_events_crosswalk(season: 'Optional[int]' = None, week: 'Optional[int]' = None, *, sport: 'str' = 'americanfootball_ncaaf', api_key: 'Optional[str]' = None, season_type: 'int' = 2, return_as_pandas: 'bool' = False, **kwargs: 'Any') -> 'DataFrameT'

Match The Odds API CFB events to ESPN game ids.

Pulls the upcoming/live events for sport from The Odds API and the ESPN scoreboard for (season, week), then joins them on the order-independent team matchup so each odds event id maps to its ESPN event id. Because The Odds API only lists near-term events, this is most useful for the current/upcoming week.

Parameters

ParameterTypeDefaultDescription
seasonOptional[int]NoneESPN season year for the schedule side. Defaults to the most recent CFB season.
weekOptional[int]NoneESPN schedule week. When None, ESPN returns its default (current) slate.
sportstr'americanfootball_ncaaf'The Odds API sport key. Defaults to "americanfootball_ncaaf".
api_keyOptional[str]NoneThe Odds API key; falls back to the ODDS_API_KEY env var.
season_typeint2ESPN season type (2 regular, 3 post-season). Defaults to 2.
return_as_pandasboolFalseIf True return a pandas DataFrame; otherwise polars.

Returns

A polars DataFrame (pandas when return_as_pandas=True), one row per odds event, with columns matchup_key, odds_event_id, espn_game_id, home_team, away_team, commence_time, espn_date, matched_sources.

Example

import polars as pl
from sportsdataverse.cfb import cfb_odds_events_crosswalk
xwalk = cfb_odds_events_crosswalk(season=2024, week=5)
matched = xwalk.filter(pl.col("espn_game_id").is_not_null())
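
The api_key fallback to the ODDS_API_KEY environment variable follows a common pattern, sketched below. resolve_api_key is a hypothetical helper, and raising ValueError when no key is found is an assumption; the library's behavior in that case is not documented here.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Prefer an explicit key, then fall back to the ODDS_API_KEY env var."""
    key = api_key or os.environ.get("ODDS_API_KEY")
    if not key:
        # Assumption: fail loudly rather than send an unauthenticated request.
        raise ValueError("Pass api_key or set the ODDS_API_KEY environment variable")
    return key


print(resolve_api_key("demo-key"))
```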

cfb_rosters_crosswalk(espn_team_id: 'Union[int, str]', fox_team_id: 'Union[int, str]', *, season: 'Optional[int]' = None, providers: 'Optional[Sequence[str]]' = None, return_as_pandas: 'bool' = False, **kwargs: 'Any') -> 'DataFrameT'

Build the ESPN x Fox x Yahoo player-id crosswalk for one team.

Fetches the selected providers' players for the team, matches them on normalized name (with jersey as a confidence signal), and returns each player's ESPN, Fox, and Yahoo athlete ids side by side. Use cfb_teams_crosswalk first to translate an ESPN team id into the matching Fox team id.

ESPN and Fox provide full rosters, so the default is ("espn", "fox"). Yahoo is opt-in (pass providers=("espn", "fox", "yahoo")) because it has no roster endpoint — its only player feed is the season stat-leaderboard (sportsdataverse.cfb.yahoo_cfb_player_season_stats), which is the league's top ~200 players (roughly one per team) and frequently includes no player for a given team at all. When selected, the team is resolved by matching Yahoo's (abbreviated) team name against the ESPN team's name; if it can't be resolved, the Yahoo columns are simply null.

Parameters

ParameterTypeDefaultDescription
espn_team_idUnion[int, str]ESPN team id (e.g. 194 for Ohio State).
fox_team_idUnion[int, str]Fox Bifrost team id (e.g. 25 for Ohio State).
seasonOptional[int]NoneSeason year for the Yahoo player-stats leg. Defaults to the most recent CFB season. Unused when Yahoo isn't selected.
providersOptional[Sequence[str]]NoneWhich sources to include — any of "espn", "fox", "yahoo". None (default) uses ("espn", "fox"); add "yahoo" explicitly for its (sparse) leg, or pass a single source.
return_as_pandasboolFalseIf True return a pandas DataFrame; otherwise polars.

Returns

A polars DataFrame (pandas when return_as_pandas=True) with columns person_key, espn_athlete_id, fox_athlete_id, yahoo_athlete_id, name, espn_jersey, fox_jersey, espn_position, fox_position, yahoo_position, match_method, matched_sources. match_method reflects the ESPN/Fox jersey agreement: name_jersey (agree), name (name only), name_jersey_conflict (jerseys differ — review), or unmatched.

Example

import polars as pl
from sportsdataverse.cfb import cfb_rosters_crosswalk
xwalk = cfb_rosters_crosswalk(espn_team_id=194, fox_team_id=25, season=2024)
matched = xwalk.filter(pl.col("matched_sources") == "espn+fox")

# Explicit ESPN vs Fox (same as the default; Yahoo is opt-in)

espn_fox = cfb_rosters_crosswalk(194, 25, providers=("espn", "fox"))
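
The four match_method values can be read as a small decision rule over the normalized name and the two jerseys. The sketch below is one plausible reading, not the library's code: the normalization, the function names, and the choice to return "name" when either jersey is missing are assumptions, and the player names are placeholders.

```python
import re
from typing import Optional, Union

Jersey = Optional[Union[int, str]]


def normalize_name(name: str) -> str:
    """Lowercase and drop everything except letters (assumed normalization)."""
    return re.sub(r"[^a-z]+", "", name.lower())


def match_method(espn_name: Optional[str], fox_name: Optional[str],
                 espn_jersey: Jersey, fox_jersey: Jersey) -> str:
    """Classify an ESPN/Fox player pair by name and jersey agreement."""
    if espn_name is None or fox_name is None:
        return "unmatched"
    if normalize_name(espn_name) != normalize_name(fox_name):
        return "unmatched"
    if espn_jersey is None or fox_jersey is None:
        return "name"  # names agree, no jersey to confirm with
    if str(espn_jersey) == str(fox_jersey):
        return "name_jersey"
    return "name_jersey_conflict"  # same name, different jerseys: review


print(match_method("J.T. Example", "JT Example", 7, "7"))
print(match_method("Pat Sample", "Pat Sample", 12, 21))
```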

cfb_schedule_crosswalk(season: 'int', week: 'Optional[int]' = None, *, season_type: 'int' = 2, providers: 'Optional[Sequence[str]]' = None, return_as_pandas: 'bool' = False, **kwargs: 'Any') -> 'DataFrameT'

Build the ESPN x Fox x Yahoo CFB game-id crosswalk.

Each ESPN game is keyed by its order-independent team matchup, and the Fox and Yahoo games are mapped onto it, so each row pairs the ESPN event id with the Fox Bifrost event id and the Yahoo dotted game id. Where a provider has no game, its columns are None and matched_sources records who contributed — so regular season, conference championships, bowls, and the CFP all flow through the same call, degrading gracefully when a source lacks a game.

Two modes:

  • Full season (week omitted): pulls every ESPN game (regular weeks + bowls + CFP), Fox's full season, and Yahoo's full season, and matches on team + date (date disambiguates rematches — a regular-season game vs a conference-championship or CFP rematch of the same teams).
  • Single week (week given): just that week's slate, matched on team.

Each provider leg is best-effort: a Fox outage, a Yahoo per-week parser hiccup, or Fox's offseason-projected CFP matchups simply leave that provider's columns null rather than failing the call.

Parameters

ParameterTypeDefaultDescription
seasonintSeason year (e.g. 2024).
weekOptional[int]NoneSchedule week number for single-week mode; omit (None) for the whole season.
season_typeint2ESPN season type for single-week mode — 2 regular, 3 post-season (week=1 bowls, week=999 CFP). Ignored in full-season mode. Defaults to 2.
providersOptional[Sequence[str]]NoneWhich sources to include — any of "espn", "fox", "yahoo". None (default) uses all three; pass a subset for a pairwise crosswalk (e.g. ("espn", "fox")) or a single source. Unselected providers are not fetched and surface as null columns.
return_as_pandasboolFalseIf True return a pandas DataFrame; otherwise polars.

Returns

A polars DataFrame (pandas when return_as_pandas=True) with columns matchup_key, espn_game_id, fox_game_id, yahoo_game_id, yahoo_global_game_id, home_team, away_team, espn_date, fox_date, yahoo_date, matched_sources.

Example

import polars as pl
from sportsdataverse.cfb import cfb_schedule_crosswalk
full = cfb_schedule_crosswalk(2024)
all_three = full.filter(pl.col("matched_sources") == "espn+fox+yahoo")

# Or just one week

wk5 = cfb_schedule_crosswalk(2024, 5)
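
An order-independent matchup key, optionally extended with the game date to separate rematches, can be sketched like this. The key format, the normalization, and the helper names are assumptions for illustration; the actual matchup_key values in the returned frame may be built differently.

```python
import re
from datetime import date
from typing import Optional


def normalize_team(name: str) -> str:
    """Lowercase and strip non-alphanumerics so provider spellings line up."""
    return re.sub(r"[^a-z0-9]+", "", name.lower())


def matchup_key(team_a: str, team_b: str, game_date: Optional[date] = None) -> str:
    """Build a key that is identical regardless of home/away order."""
    teams = sorted(normalize_team(t) for t in (team_a, team_b))
    key = "|".join(teams)
    if game_date is not None:
        # Full-season mode: the date disambiguates rematches of the same teams.
        key = f"{key}|{game_date.isoformat()}"
    return key


print(matchup_key("Ohio State", "Michigan"))
print(matchup_key("Michigan", "Ohio State", date(2024, 11, 30)))
```

Sorting the two normalized names is what makes the key independent of which provider lists a team as home or away.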

cfb_teams_crosswalk(*, season: 'Optional[int]' = None, week: 'int' = 1, providers: 'Optional[Sequence[str]]' = None, return_as_pandas: 'bool' = False, **kwargs: 'Any') -> 'DataFrameT'

Build the ESPN x Fox x Yahoo CFB team-id crosswalk.

Fetches the selected provider team directories, normalizes each team name to a shared key, and full-outer-joins them so every row carries each provider's id, name, and abbreviation (None where a provider has no match). The matched_sources column records which providers contributed.

Parameters

ParameterTypeDefaultDescription
seasonOptional[int]NoneSeason year used only to fetch Yahoo's embedded team directory (Yahoo has no standalone teams endpoint). Defaults to the most recent CFB season.
weekint1Schedule week used for the Yahoo scoreboard fetch. Defaults to 1. The embedded directory is the full league list regardless.
providersOptional[Sequence[str]]NoneWhich sources to include — any of "espn", "fox", "yahoo". None (default) uses all three; pass a subset for a pairwise crosswalk (e.g. ("espn", "fox")) or a single source. Unselected providers are not fetched and surface as null columns.
return_as_pandasboolFalseIf True return a pandas DataFrame; otherwise polars.

Returns

A polars DataFrame (pandas when return_as_pandas=True) with columns norm_key, espn_team_id, espn_team, espn_abbreviation, fox_team_id, fox_team, fox_abbreviation, yahoo_team_id, yahoo_team, yahoo_abbreviation, matched_sources.

Example

import polars as pl
from sportsdataverse.cfb import cfb_teams_crosswalk
xwalk = cfb_teams_crosswalk(season=2024)
row = xwalk.filter(pl.col("espn_team_id") == 194) # Ohio State

# Pairwise — just ESPN vs Fox

espn_fox = cfb_teams_crosswalk(providers=("espn", "fox"))
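
The normalize-then-full-outer-join step can be illustrated with plain dictionaries. This is a sketch under stated assumptions: the normalization rule and helper names are hypothetical, the input directories are toy data (only the Ohio State ids 194 and 25 come from this page), and the real function works on provider responses rather than hand-built dicts.

```python
import re
from typing import Dict, List, Optional


def norm_key(name: str) -> str:
    """Shared join key: lowercase, alphanumerics only (assumed normalization)."""
    return re.sub(r"[^a-z0-9]+", "", name.lower())


def full_outer_join(directories: Dict[str, Dict[str, str]]) -> List[Dict[str, Optional[str]]]:
    """Join provider -> {team_name: team_id} maps on the normalized team name."""
    merged: Dict[str, Dict[str, str]] = {}
    for provider, teams in directories.items():
        for name, team_id in teams.items():
            row = merged.setdefault(norm_key(name), {})
            row[f"{provider}_team_id"] = team_id
            row[f"{provider}_team"] = name
    rows = []
    for key in sorted(merged):
        row = merged[key]
        record: Dict[str, Optional[str]] = {"norm_key": key}
        for provider in directories:
            # Providers with no match surface as None, as in the real output.
            record[f"{provider}_team_id"] = row.get(f"{provider}_team_id")
            record[f"{provider}_team"] = row.get(f"{provider}_team")
        present = [p for p in directories if f"{p}_team_id" in row]
        record["matched_sources"] = "+".join(present)
        rows.append(record)
    return rows


rows = full_outer_join({
    "espn": {"Ohio State": "194"},
    "fox": {"Ohio State": "25", "Hypothetical U": "999"},
})
for r in rows:
    print(r)
```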

espn_cfb_teams(groups=None, return_as_pandas=False, **kwargs) -> 'pl.DataFrame'

espn_cfb_teams - look up the college football teams

Parameters

ParameterTypeDefaultDescription
groupsintNoneUsed to define different divisions. 80 is FBS, 81 is FCS.
return_as_pandasboolFalseIf True, returns a pandas dataframe. If False, returns a polars dataframe.

Returns

Polars dataframe containing team information. This function caches by default; to refresh the data, call sportsdataverse.cfb.espn_cfb_teams.clear_cache().

col_nametypedescription
team_abbreviationcharacterTeam abbreviation; team_detail = TRUE only.
team_alternate_colorcharacterAlternate team color; team_detail = TRUE only.
team_colorcharacterPrimary team color; team_detail = TRUE only.
team_display_namecharacterFull team display name; team_detail = TRUE only.
team_idcharacterESPN team id.
team_is_activelogicalTRUE if the team is currently active.
team_is_all_starlogicalTRUE if the row represents an All-Star team.
team_locationcharacterTeam location / school name; team_detail = TRUE only.
team_logosintegerTeam logo metadata.
team_namecharacterTeam nickname; team_detail = TRUE only.
team_nicknamecharacterTeam nickname label; team_detail = TRUE only.
team_short_display_namecharacterShort team display name; team_detail = TRUE only.
team_slugcharacterTeam slug for the stat row.
team_uidcharacterESPN universal team identifier (UID format 's:40~l:...~t:...').

Example

from sportsdataverse.cfb import espn_cfb_teams
teams = espn_cfb_teams()
print(teams.shape)

# Pull FCS teams (group 81)

fcs = espn_cfb_teams(groups=81, return_as_pandas=True)
fcs.head()

# Pipeline next step (build an abbreviation lookup)

teams = espn_cfb_teams()
abbr_map = dict(zip(teams["team_id"], teams["team_abbreviation"]))

fox_cfb_boxscore(game_id: 'Union[int, str]', *, return_parsed: 'bool' = True, return_as_pandas: 'bool' = False, **kwargs: 'Any') -> "Union[pl.DataFrame, 'pd.DataFrame', Dict[str, Any]]"

Fox Sports CFB boxscore (long: one row per player-stat).

Endpoint: GET https://api.foxsports.com/bifrost/v1/cfb/event/{game_id}/data (the boxscore block).

Parameters

ParameterTypeDefaultDescription
game_idUnion[int, str]Fox Bifrost event id (e.g. "41616").
return_parsedboolTrueIf True (default) flatten the per-team stat tables to long form; if False return the raw JSON dict.
return_as_pandasboolFalseIf True return a pandas DataFrame; otherwise polars. Ignored when return_parsed=False.

Returns

A polars DataFrame (default), a pandas DataFrame when return_as_pandas=True, or the raw JSON dict when return_parsed=False.

Example

from sportsdataverse.cfb import fox_cfb_boxscore
df = fox_cfb_boxscore("41616")

fox_cfb_league_leaders(category: 'str' = 'passing', who: 'str' = 'player', page: 'int' = 0, group_id: 'Union[int, str]' = '2', *, return_parsed: 'bool' = True, return_as_pandas: 'bool' = False, **kwargs: 'Any') -> "Union[pl.DataFrame, 'pd.DataFrame', Dict[str, Any]]"

Fox Sports CFB statistical leaders (one row per player/team).

Endpoint: GET .../bifrost/v1/cfb/league/stats-con/{who}/{category}/{page}

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| category | str | 'passing' | Stat category -- passing, rushing, receiving, defense, kicking, returning, scoring, yardage (team adds downs, turnovers). Defaults to "passing". |
| who | str | 'player' | "player" or "team". Defaults to "player". |
| page | int | 0 | 0-based result page. Defaults to 0. |
| group_id | Union[int, str] | '2' | Conference/group filter. Defaults to "2". |
| return_parsed | bool | True | If True (default) flatten the leader tables to a DataFrame; if False return the raw JSON dict. |
| return_as_pandas | bool | False | If True return a pandas DataFrame; otherwise polars. Ignored when return_parsed=False. |

Returns

A polars DataFrame (default), a pandas DataFrame when return_as_pandas=True, or the raw JSON dict when return_parsed=False.

Example

from sportsdataverse.cfb import fox_cfb_league_leaders
df = fox_cfb_league_leaders("passing")
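
Because page is 0-based, collecting every leader means requesting successive pages. The following is a minimal sketch of that loop, assuming (this reference does not confirm it) that a page past the end comes back empty. collect_pages and max_pages are hypothetical names, and fetch_page stands in for a call such as lambda p: fox_cfb_league_leaders("passing", page=p).

```python
from typing import Any, Callable, List


def collect_pages(fetch_page: Callable[[int], Any], max_pages: int = 50) -> List[Any]:
    """Call fetch_page(0), fetch_page(1), ... and keep each non-empty result.

    Assumption: an empty result (len() == 0) marks the end of the data.
    max_pages is a safety cap so a misbehaving source cannot loop forever.
    """
    pages: List[Any] = []
    for page in range(max_pages):
        result = fetch_page(page)
        if len(result) == 0:
            break
        pages.append(result)
    return pages


# Offline stand-in for the real wrapper: two pages of rows, then nothing.
_fake_pages = {0: ["row_a", "row_b"], 1: ["row_c"]}
demo = collect_pages(lambda p: _fake_pages.get(p, []))
print(demo)
```

With the real wrapper, the polars frames collected this way could then be combined with pl.concat.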

fox_cfb_odds(game_id: 'Union[int, str]', *, return_parsed: 'bool' = True, return_as_pandas: 'bool' = False, **kwargs: 'Any') -> "Union[pl.DataFrame, 'pd.DataFrame', Dict[str, Any]]"

Fox Sports CFB game odds six-pack (spread / to win / total per team).

Endpoint: GET https://api.foxsports.com/bifrost/v1/cfb/event/{game_id}/odds

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| game_id | Union[int, str] | | Fox Bifrost event id (e.g. "41616"). |
| return_parsed | bool | True | If True (default) flatten the six-pack market to a DataFrame; if False return the raw JSON dict. |
| return_as_pandas | bool | False | If True return a pandas DataFrame; otherwise polars. Ignored when return_parsed=False. |

Returns

A polars DataFrame (default; empty when no market is posted), a pandas DataFrame when return_as_pandas=True, or the raw JSON dict when return_parsed=False.

Example

from sportsdataverse.cfb import fox_cfb_odds
df = fox_cfb_odds("41616")

fox_cfb_pbp(game_id: 'Union[int, str]', *, return_parsed: 'bool' = True, return_as_pandas: 'bool' = False, **kwargs: 'Any') -> "Union[pl.DataFrame, 'pd.DataFrame', Dict[str, Any]]"

Fox Sports CFB play-by-play (one row per play).

Endpoint: GET https://api.foxsports.com/bifrost/v1/cfb/event/{game_id}/data

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| game_id | Union[int, str] | | Fox Bifrost event id (e.g. "41616") -- not the ESPN id. |
| return_parsed | bool | True | If True (default) flatten the pbp layout to a DataFrame; if False return the raw JSON dict. |
| return_as_pandas | bool | False | If True return a pandas DataFrame; otherwise polars. Ignored when return_parsed=False. |

Returns

A polars DataFrame (default), a pandas DataFrame when return_as_pandas=True, or the raw JSON dict when return_parsed=False.

Example

from sportsdataverse.cfb import fox_cfb_pbp
df = fox_cfb_pbp("41616")

fox_cfb_play_process(event_id, odds_override: 'Optional[Dict[str, Any]]' = None, process: 'bool' = True, raw: 'bool' = False, **kwargs) -> 'Dict[str, Any]'

Build a processed CFB play-by-play game from FoxSports as a backup to ESPN.

Where sportsdataverse.cfb.cfb_fox_ext.fox_cfb_pbp returns the raw Fox play-by-play rows, this function runs Fox data through the full ESPN play processor: it fetches FoxSports Bifrost cfb/event/{event_id}/data, adapts it into the ESPN-summary shape via fox_to_espn_summary, and runs the same sportsdataverse.cfb.cfb_pbp.CFBPlayProcess pipeline ESPN games use -- producing EPA / WPA / advanced box score. The result carries source="fox" so downstream consumers know the provenance (and that text-derived columns are lower fidelity than on the ESPN path).

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| event_id | | | FoxSports CFB event id (e.g. 41616). |
| odds_override | Optional[Dict[str, Any]] | None | Optional {gameSpread, overUnder, homeFavorite, gameSpreadAvailable} dict. Fox does not expose a clean pre-game spread, so when omitted a neutral pick'em line is used (EPA is unaffected; only the WP model's spread term is neutralized). |
| process | bool | True | If True (default) run the full sportsdataverse.cfb.cfb_pbp.CFBPlayProcess.run_processing_pipeline (EPA/WPA/box). If False run the lighter sportsdataverse.cfb.cfb_pbp.CFBPlayProcess.run_cleaning_pipeline. |
| raw | bool | False | If True skip the processor entirely and return the adapted ESPN-summary dict (the input the processor would consume). |

Returns

The processed game payload (same keys as CFBPlayProcess.run_processing_pipeline) with an added source="fox" key. When raw=True, the adapted summary dict.

Example

from sportsdataverse.cfb import fox_cfb_play_process
game = fox_cfb_play_process(41616)
print(len(game["plays"]), game["source"])
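
When a pre-game line is known from another source, it can be passed through odds_override. A minimal sketch of building that dict from the four documented keys follows. The helper name make_odds_override is hypothetical, and the value conventions (a numeric spread, booleans for the flags) are assumptions this reference does not confirm.

```python
from typing import Any, Dict


def make_odds_override(spread: float, over_under: float, home_favorite: bool) -> Dict[str, Any]:
    """Bundle a known line into the four keys fox_cfb_play_process documents.

    Assumption: the processor reads these values as-is; check the sign
    convention of your spread source before relying on the WP output.
    """
    return {
        "gameSpread": spread,
        "overUnder": over_under,
        "homeFavorite": home_favorite,
        "gameSpreadAvailable": True,
    }


odds = make_odds_override(spread=6.5, over_under=52.5, home_favorite=True)
print(odds)
# Hypothetical call (requires network access):
# game = fox_cfb_play_process(41616, odds_override=odds)
```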

fox_cfb_schedule(season: 'Optional[int]' = None, *, segment_id: 'Optional[str]' = None, group_id: 'Union[int, str]' = '2', return_parsed: 'bool' = True, return_as_pandas: 'bool' = False, **kwargs: 'Any') -> "Union[pl.DataFrame, 'pd.DataFrame', Dict[str, Any]]"

Fox Sports CFB full-season schedule (one row per game).

Fox lists games behind a two-step selector -> segment flow: scoreboard/main enumerates the season's segments (its selectionGroupList), and league/scores-segment/{segmentId} returns the games for one segment. Pass a season to scrape the whole season -- every regular week plus conference championships, bowls, and every College Football Playoff round -- enumerated from the live selector and unioned, deduplicated by game_id.

Segment ids encode the phase, not an ESPN-style integer week: "{season}-{week}-1" for a regular-season week, "{season}-bowls-2" for the bowls, "{season}-cfp-2" for the CFP (conference championships fall in the final regular-season week). Pass segment_id to fetch just one of them.

The numeric game_id is the Fox Bifrost event id that fox_cfb_pbp / fox_cfb_odds accept; week_label is the section title.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| season | Optional[int] | None | Season year -> scrape the full season. Ignored when segment_id is given; if both are None the current segment is returned. |
| segment_id | Optional[str] | None | Explicit Fox segment id (e.g. "2025-5-1", "2025-cfp-2") -> fetch just that segment. |
| group_id | Union[int, str] | '2' | Conference/division group filter. Defaults to "2" (FBS). |
| return_parsed | bool | True | If True (default) flatten to a DataFrame; if False return the raw JSON (a single segment's dict, or a {segment_id: dict} map in full-season mode). |
| return_as_pandas | bool | False | If True return a pandas DataFrame; otherwise polars. Ignored when return_parsed=False. |

Returns

A polars DataFrame (default) with columns game_id, date, status, week_label, home_team, home_team_id, away_team, away_team_id, segment_id; a pandas DataFrame when return_as_pandas=True; or raw JSON when return_parsed=False.

Example

from sportsdataverse.cfb import fox_cfb_schedule
season = fox_cfb_schedule(2025)

# Fetch just one segment (a week, or the playoff)

wk5 = fox_cfb_schedule(segment_id="2025-5-1")
cfp = fox_cfb_schedule(segment_id="2025-cfp-2")
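
The segment-id patterns described above can be assembled with a small helper. This is a sketch only: fox_segment_id is a hypothetical name, and the live selector remains the authority on which segments exist for a season.

```python
def fox_segment_id(season: int, phase) -> str:
    """Build a Fox segment id from the patterns described for fox_cfb_schedule.

    phase is an integer week for the regular season, or "bowls" / "cfp".
    """
    if isinstance(phase, int):
        return f"{season}-{phase}-1"
    if phase in ("bowls", "cfp"):
        return f"{season}-{phase}-2"
    raise ValueError(f"unknown phase: {phase!r}")


print(fox_segment_id(2025, 5))
print(fox_segment_id(2025, "cfp"))
```

The resulting string can be passed as segment_id to fox_cfb_schedule.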

fox_cfb_standings(team_id: 'Union[int, str]', *, return_parsed: 'bool' = True, return_as_pandas: 'bool' = False, **kwargs: 'Any') -> "Union[pl.DataFrame, 'pd.DataFrame', Dict[str, Any]]"

Fox Sports CFB conference standings for a team's conference.

Endpoint: GET https://api.foxsports.com/bifrost/v1/cfb/team/{team_id}/standings (the league-wide league/standings endpoint returns header-only tables, so standings are keyed by team).

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| team_id | Union[int, str] | | Fox Bifrost team id (e.g. "11" = Miami (FL)). |
| return_parsed | bool | True | If True (default) flatten the standings tables to a DataFrame; if False return the raw JSON dict. |
| return_as_pandas | bool | False | If True return a pandas DataFrame; otherwise polars. Ignored when return_parsed=False. |

Returns

A polars DataFrame (default), a pandas DataFrame when return_as_pandas=True, or the raw JSON dict when return_parsed=False.

Example

from sportsdataverse.cfb import fox_cfb_standings
df = fox_cfb_standings("11")

fox_cfb_team_gamelog(team_id: 'Union[int, str]', *, return_parsed: 'bool' = True, return_as_pandas: 'bool' = False, **kwargs: 'Any') -> "Union[pl.DataFrame, 'pd.DataFrame', Dict[str, Any]]"

Fox Sports CFB team game log -- tidy long: one row per (game, stat).

Endpoint: GET https://api.foxsports.com/bifrost/v1/cfb/team/{team_id}/gamelog

The endpoint groups team per-game stats by category (passing, rushing, defense, ...) and season-type split; this wrapper flattens them to columns team_id, season_type, category, game_id, game_date, opponent, stat, value.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| team_id | Union[int, str] | | Fox Bifrost team id (e.g. "11" = Miami (FL)). |
| return_parsed | bool | True | If True (default) flatten to long form; if False return the raw JSON dict. |
| return_as_pandas | bool | False | If True return a pandas DataFrame; otherwise polars. Ignored when return_parsed=False. |

Returns

A polars DataFrame (default), a pandas DataFrame when return_as_pandas=True, or the raw JSON dict when return_parsed=False.

Example

from sportsdataverse.cfb import fox_cfb_team_gamelog
df = fox_cfb_team_gamelog("11")
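
The long layout is convenient for filtering, but per-game comparisons are often easier with one row per game. Below is a minimal sketch of that pivot in plain Python, using the documented column names. The sample rows, stat names, and values are made up for illustration, and pivot_gamelog is a hypothetical helper. Rows in this shape could come from df.to_dicts() on the polars frame.

```python
from typing import Any, Dict, List


def pivot_gamelog(rows: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Turn long (game, stat) rows into one dict per game_id.

    Stat names are prefixed with their category so that same-named stats
    from different categories (e.g. "yards") do not overwrite each other.
    """
    wide: Dict[str, Dict[str, Any]] = {}
    for row in rows:
        game = wide.setdefault(
            row["game_id"],
            {"game_date": row["game_date"], "opponent": row["opponent"]},
        )
        game[f'{row["category"]}_{row["stat"]}'] = row["value"]
    return wide


# Made-up sample rows in the documented long layout.
sample = [
    {"team_id": "11", "season_type": "regular", "category": "passing",
     "game_id": "1001", "game_date": "2025-09-06", "opponent": "Team A",
     "stat": "yards", "value": "250"},
    {"team_id": "11", "season_type": "regular", "category": "rushing",
     "game_id": "1001", "game_date": "2025-09-06", "opponent": "Team A",
     "stat": "yards", "value": "120"},
]
wide = pivot_gamelog(sample)
print(wide["1001"])
```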

fox_cfb_team_roster(team_id: 'Union[int, str]', *, return_parsed: 'bool' = True, return_as_pandas: 'bool' = False, **kwargs: 'Any') -> "Union[pl.DataFrame, 'pd.DataFrame', Dict[str, Any]]"

Fox Sports CFB team roster (one row per player).

Endpoint: GET https://api.foxsports.com/bifrost/v1/cfb/team/{team_id}/roster

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| team_id | Union[int, str] | | Fox Bifrost team id (e.g. "11" = Miami (FL)); discover via the league team directory (cfb/league/teamnav). |
| return_parsed | bool | True | If True (default) flatten the position-group tables to a DataFrame; if False return the raw JSON dict. |
| return_as_pandas | bool | False | If True return a pandas DataFrame; otherwise polars. Ignored when return_parsed=False. |

Returns

A polars DataFrame (default), a pandas DataFrame when return_as_pandas=True, or the raw JSON dict when return_parsed=False.

Example

from sportsdataverse.cfb import fox_cfb_team_roster
df = fox_cfb_team_roster("11")

fox_cfb_team_stats(team_id: 'Union[int, str]', *, return_parsed: 'bool' = True, return_as_pandas: 'bool' = False, **kwargs: 'Any') -> "Union[pl.DataFrame, 'pd.DataFrame', Dict[str, Any]]"

Fox Sports CFB team stat leaders (one row per category leader).

Endpoint: GET https://api.foxsports.com/bifrost/v1/cfb/team/{team_id}/stats

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| team_id | Union[int, str] | | Fox Bifrost team id (e.g. "11" = Miami (FL)). |
| return_parsed | bool | True | If True (default) flatten the leader sections to a DataFrame; if False return the raw JSON dict. |
| return_as_pandas | bool | False | If True return a pandas DataFrame; otherwise polars. Ignored when return_parsed=False. |

Returns

A polars DataFrame (default), a pandas DataFrame when return_as_pandas=True, or the raw JSON dict when return_parsed=False.

Example

from sportsdataverse.cfb import fox_cfb_team_stats
df = fox_cfb_team_stats("11")

fox_cfb_teams(*, return_parsed: 'bool' = True, return_as_pandas: 'bool' = False, **kwargs: 'Any') -> "Union[pl.DataFrame, 'pd.DataFrame', Dict[str, Any]]"

Fox Sports CFB team directory (one row per team).

Endpoint: GET https://api.foxsports.com/bifrost/v1/cfb/league/teamnav

The team-nav payload is the canonical Fox directory: it maps every team's Bifrost id to its abbreviation, full name, and web slug. This is the lookup you need to translate a human team name into the numeric team_id the other fox_cfb_* wrappers expect, and it is the Fox side of sportsdataverse.cfb.cfb_teams_crosswalk.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| return_parsed | bool | True | If True (default) flatten the nav items to a DataFrame; if False return the raw JSON dict. |
| return_as_pandas | bool | False | If True return a pandas DataFrame; otherwise polars. Ignored when return_parsed=False. |

Returns

A polars DataFrame (default) with columns fox_team_id, abbreviation, name, slug, color, logo_url; a pandas DataFrame when return_as_pandas=True; or the raw JSON dict when return_parsed=False.

Example

from sportsdataverse.cfb import fox_cfb_teams
teams = fox_cfb_teams()
fox_id = dict(zip(teams["abbreviation"], teams["fox_team_id"]))

fox_to_espn_summary(fox_data: 'Dict[str, Any]') -> 'Dict[str, Any]'

Adapt a Fox cfb/event/{id}/data payload into the ESPN-summary shape.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| fox_data | Dict[str, Any] | | Parsed JSON from api.foxsports.com/bifrost/v1/cfb/event/{id}/data. |

Returns

A dict shaped like ESPN's college-football/summary response (header + drives + stub pickcenter/boxscore/...), ready to assign onto CFBPlayProcess(...).json.

get_2pt_probs(pbp_df: 'Any') -> 'pd.DataFrame'

Two-point-conversion decision surface (cfb4th get_2pt_wp).

Treats each row as "the scoring team just made a touchdown; decide between the extra point and going for two". Enumerates the three point outcomes (0 / 1 / 2) of the try, scores the opponent's ensuing-drive WP for each from the scoring team's perspective, and combines them with the two-point conversion probability (bundled CFB model) and the empirical CFB extra-point make rate (_XP_MAKE_PROB).

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| pbp_df | Any | | Play-by-play frame (polars or pandas) carrying the start.* state columns in sportsdataverse.cfb.cfb_fourth_down._PBP_COLS. |

Returns

A pandas copy of pbp_df plus:

  • two_pt_wp -- prob_2pt * wp(pts=2) + (1 - prob_2pt) * wp(pts=0).
  • xp_wp -- prob_xp * wp(pts=1) + (1 - prob_xp) * wp(pts=0) with prob_xp = _XP_MAKE_PROB.
  • prob_2pt -- the bundled-model two-point conversion probability.
  • two_pt_recommendation -- "go_for_2" iff two_pt_wp > xp_wp else "kick_xp" (None where the inputs are NaN).
  • two_pt_wp_diff -- two_pt_wp - xp_wp (positive => go for 2).

When the two-point model isn't bundled (TWO_PT_MODEL_AVAILABLE is False) or the required state columns are missing, all decision columns are null -- probabilities are never fabricated.

Example

from sportsdataverse.cfb.cfb_two_point import get_2pt_probs
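# touchdown_rows: a play-by-play frame (polars or pandas) of post-touchdown rows, prepared beforehand
out = get_2pt_probs(touchdown_rows)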
print(out[["two_pt_wp", "xp_wp", "two_pt_recommendation"]].head())
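
To make the returned columns concrete, the arithmetic behind two_pt_wp, xp_wp, and the recommendation can be worked by hand. This is a sketch with illustrative inputs only: two_point_decision is a hypothetical helper, and none of the numbers come from the bundled model or from _XP_MAKE_PROB.

```python
def two_point_decision(prob_2pt: float, prob_xp: float, wp0: float, wp1: float, wp2: float):
    """Combine try-outcome win probabilities as described for get_2pt_probs.

    wp0 / wp1 / wp2 are the scoring team's WP after 0, 1, or 2 points.
    """
    two_pt_wp = prob_2pt * wp2 + (1 - prob_2pt) * wp0
    xp_wp = prob_xp * wp1 + (1 - prob_xp) * wp0
    recommendation = "go_for_2" if two_pt_wp > xp_wp else "kick_xp"
    return two_pt_wp, xp_wp, recommendation, two_pt_wp - xp_wp


# Illustrative inputs only; none of these numbers come from the bundled model.
two_pt_wp, xp_wp, rec, diff = two_point_decision(
    prob_2pt=0.40, prob_xp=0.95, wp0=0.50, wp1=0.55, wp2=0.60
)
print(round(two_pt_wp, 4), round(xp_wp, 4), rec, round(diff, 4))
```

With these inputs the extra point edges out the two-point try, so the difference is negative.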

get_4th_down_probs(pbp_df) -> 'pd.DataFrame'

Full 4th-down decision surface (cfb4th add_4th_probs) + recommendation.

Runs get_go_wp, get_fg_wp, get_punt_wp on the fourth-down rows and adds the combined option columns plus:

  • fourth_down_recommendation -- the max-WP choice among {go, punt, field_goal} (NaN options are excluded; when the FG model isn't bundled, field_goal is excluded from the comparison).
  • go_wp_diff / punt_wp_diff / fg_wp_diff -- each option's WP minus the recommended option's WP (the recommended option's diff is 0, the others <= 0). NaN where the option WP is NaN.
  • go_boost -- cfb4th's headline number: 100 * (go_wp - max(fg_wp, punt_wp)) in percentage points.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| pbp_df | | | Play-by-play frame (polars or pandas) of fourth-down situations carrying the start.* state columns in _PBP_COLS. |

Returns

A pandas copy of pbp_df with the decision columns added. Empty input returns the input plus empty decision columns.

Example

from sportsdataverse.cfb.cfb_fourth_down import get_4th_down_probs
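# fourth_down_rows: a play-by-play frame (polars or pandas) of fourth-down plays, prepared beforehand
out = get_4th_down_probs(fourth_down_rows)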
print(out[["go_wp", "punt_wp", "fg_wp", "fourth_down_recommendation"]].head())
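
The recommendation rule and the derived columns can be traced on a single play. The sketch below follows the description above. recommend_fourth_down is a hypothetical helper, the inputs are illustrative, and skipping NaN options inside go_boost is an assumption, since this reference does not say how the library treats a NaN there.

```python
import math


def recommend_fourth_down(go_wp: float, punt_wp: float, fg_wp: float):
    """Pick the max-WP option, skipping NaN options, and compute the diffs."""
    options = {"go": go_wp, "punt": punt_wp, "field_goal": fg_wp}
    valid = {name: wp for name, wp in options.items() if not math.isnan(wp)}
    if not valid:
        return None, {}, math.nan
    best = max(valid, key=valid.get)
    # Each option's WP minus the recommended option's WP (0 for the winner,
    # NaN where the option itself is NaN).
    diffs = {name: wp - valid[best] for name, wp in options.items()}
    # go_boost compares going for it against the better kicking option.
    kicks = [wp for wp in (fg_wp, punt_wp) if not math.isnan(wp)]
    go_boost = 100 * (go_wp - max(kicks)) if kicks and not math.isnan(go_wp) else math.nan
    return best, diffs, go_boost


# Illustrative numbers: punt_wp is NaN, as it can be deep in opponent territory.
best, diffs, go_boost = recommend_fourth_down(go_wp=0.61, punt_wp=math.nan, fg_wp=0.58)
print(best, diffs, round(go_boost, 2))
```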

get_cfb_teams(return_as_pandas=False) -> 'pl.DataFrame'

Load college football team ID information and logos.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| return_as_pandas | bool | False | If True, returns a pandas dataframe. If False, returns a polars dataframe. |

Returns

Polars dataframe of the available teams (a pandas dataframe when return_as_pandas=True).

| col_name | type | description |
| --- | --- | --- |
| team_id | integer | ESPN team id. |
| school | character | Team name. |
| mascot | character | Team mascot. |
| abbreviation | character | Team abbreviation. |
| alt_name1 | character | Team alternate name 1 (as it appears in play_text). |
| alt_name2 | character | Team alternate name 2 (as it appears in play_text). |
| alt_name3 | character | Team alternate name 3 (as it appears in play_text). |
| conference | character | Conference of the team. |
| division | character | Division in the conference for the team. |
| color | character | Primary team color (hex, no #). |
| alt_color | character | Team color (alternate). |
| logo | character | Team logo URL. |
| logo_dark | character | Dark-mode logo URL. |

Example

from sportsdataverse.cfb import get_cfb_teams
teams = get_cfb_teams()
print(teams.shape)

# Pandas round-trip

teams_pd = get_cfb_teams(return_as_pandas=True)
teams_pd.head()

# Pipeline next step (build a team_id to logo URL map)

teams = get_cfb_teams()
logo_map = dict(zip(teams["team_id"], teams["logo"]))

get_fg_wp(pbp_df) -> 'pd.DataFrame'

Expected win probability of attempting a field goal (cfb4th get_fg_wp).

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| pbp_df | | | Play-by-play frame (polars or pandas) of fourth-down situations. |

Returns

A pandas copy of pbp_df plus fg_make_prob, make_fg_wp, miss_fg_wp and fg_wp (= make_prob*make_wp + (1-make_prob)*miss_wp, from the kicking team's perspective). All four are NaN when the FG model is not bundled (FG_MODEL_AVAILABLE is False) -- probabilities are never fabricated.
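
As a quick check on the fg_wp formula above, here is a one-play sketch with illustrative inputs. field_goal_wp is a hypothetical helper, and the numbers are not model output.

```python
import math


def field_goal_wp(fg_make_prob: float, make_fg_wp: float, miss_fg_wp: float) -> float:
    """Expected WP of a field-goal attempt from the kicking team's perspective."""
    return fg_make_prob * make_fg_wp + (1 - fg_make_prob) * miss_fg_wp


# Illustrative inputs only, not model output.
fg_wp = field_goal_wp(fg_make_prob=0.80, make_fg_wp=0.60, miss_fg_wp=0.40)
print(round(fg_wp, 4))

# NaN inputs propagate, mirroring the all-NaN columns when no FG model is bundled.
print(math.isnan(field_goal_wp(math.nan, 0.60, 0.40)))
```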

get_go_wp(pbp_df) -> 'pd.DataFrame'

Expected win probability of going for it on 4th down (cfb4th get_go_wp).

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| pbp_df | | | Play-by-play frame (polars or pandas) of fourth-down situations carrying the start.* state columns in _PBP_COLS. |

Returns

A pandas copy of pbp_df plus go_wp (prob-weighted WP of going for it), first_down_prob (P(conversion)), wp_succeed (mean WP over conversion outcomes) and wp_fail (mean WP over failure outcomes). go_wp is always in [0, 1]; the conditional columns are in [0, 1] but can be NaN for degenerate goal-line plays where one outcome bucket is empty (matches the R reference pivot_wider NA behavior).

Example

from sportsdataverse.cfb.cfb_fourth_down import get_go_wp
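# fourth_down_rows: a play-by-play frame (polars or pandas) of fourth-down plays, prepared beforehand
out = get_go_wp(fourth_down_rows)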
print(out[["go_wp", "first_down_prob"]].head())

get_punt_wp(pbp_df) -> 'pd.DataFrame'

Expected win probability of punting on 4th down (cfb4th get_punt_wp).

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| pbp_df | | | Play-by-play frame (polars or pandas) of fourth-down situations. |

Returns

A pandas copy of pbp_df plus punt_wp (prob-weighted WP of punting, from the punting team's perspective). punt_wp is NaN where the punt end-yardline distribution has no support for the play's yards_to_goal (e.g. inside the 31, where punting is dominated and the cfb4th table is empty -- matching the R reference's left-join NA behavior).

scoreboard_event_parsing(event)

Internal helper that flattens an ESPN scoreboard event dict into a shape suitable for pd.json_normalize.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| event | dict | | A single scoreboard events[*] entry from the ESPN college-football scoreboard API. |

Returns

The same event dict, mutated in place with home/away copies of the competitors and trimmed of unused link/odds keys.

Example

from sportsdataverse.cfb import espn_cfb_schedule
sched = espn_cfb_schedule(dates=2023, week=5)
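
For intuition about the in-place mutation described above, the following is a minimal sketch of copying competitors under home/away keys. It is a hypothetical stand-in, not the library's implementation: add_home_away is an invented name, and the field names (competitions, competitors, homeAway) are assumptions about the ESPN payload that this reference does not spell out.

```python
from typing import Any, Dict


def add_home_away(event: Dict[str, Any]) -> Dict[str, Any]:
    """Copy each competitor under a 'home' or 'away' key on the competition.

    Hypothetical stand-in for the kind of flattening described above,
    not the library's implementation. Mutates and returns the same dict.
    """
    competition = event["competitions"][0]
    for competitor in competition["competitors"]:
        side = competitor.get("homeAway")
        if side in ("home", "away"):
            competition[side] = competitor
    return event


# Made-up event trimmed to the fields this sketch touches.
event = {
    "id": "1",
    "competitions": [{
        "competitors": [
            {"homeAway": "home", "team": {"abbreviation": "AAA"}},
            {"homeAway": "away", "team": {"abbreviation": "BBB"}},
        ]
    }],
}
flat = add_home_away(event)
print(flat["competitions"][0]["home"]["team"]["abbreviation"])
```

With fixed home and away keys, a flattener such as pd.json_normalize can emit stable column names rather than depending on list order.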

yahoo_cfb_boxscore(game_id: 'Union[int, str]', *, return_parsed: 'bool' = False, return_as_pandas: 'bool' = False, **kwargs: 'Any') -> 'Dict[str, Any]'

Yahoo CFB boxscore — raw JSON passthrough (parsing not yet implemented).

Wraps the editorial boxscore/{game_id} resource. The payload uses a normalized decoder-dictionary schema (player_stats[playerId][variation][stat_type]=value joined against the stat_types/stat_categories dictionaries). Flattening that into tidy frames is a follow-up; until then this returns the raw JSON dict and fails fast if a parsed frame is requested rather than silently ignoring return_parsed.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| game_id | Union[int, str] | | Dotted Yahoo game id (e.g. "ncaaf.g.202509200023"). |
| return_parsed | bool | False | Must be False (the default). Passing True raises NotImplementedError because parsing is not implemented. |
| return_as_pandas | bool | False | Accepted for signature parity with the sibling wrappers; has no effect while only raw output is supported. |

Returns

The raw editorial boxscore JSON as a dict (service.boxscore).

Example

from sportsdataverse.cfb import yahoo_cfb_boxscore
raw = yahoo_cfb_boxscore("ncaaf.g.202509200023")
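
Until parsing is implemented, the nested player_stats[playerId][variation][stat_type] shape described above can be flattened by hand. The sketch below uses a made-up payload fragment. flatten_player_stats is a hypothetical helper, and the ids, the variation label, and the assumption that stat_types maps an id straight to a display name are illustrative rather than taken from the real response.

```python
from typing import Any, Dict, List


def flatten_player_stats(
    player_stats: Dict[str, Dict[str, Dict[str, Any]]],
    stat_types: Dict[str, str],
) -> List[Dict[str, Any]]:
    """Emit one row per (player, variation, stat), decoding stat ids to names."""
    rows = []
    for player_id, variations in player_stats.items():
        for variation, stats in variations.items():
            for stat_type, value in stats.items():
                rows.append({
                    "player_id": player_id,
                    "variation": variation,
                    "stat_type": stat_type,
                    # Fall back to the raw id when the dictionary has no entry.
                    "stat_name": stat_types.get(stat_type, stat_type),
                    "value": value,
                })
    return rows


# Made-up payload fragment in the nested shape described above.
player_stats = {"ncaaf.p.1": {"total": {"s1": "21", "s2": "310"}}}
stat_types = {"s1": "completions", "s2": "passing_yards"}
rows = flatten_player_stats(player_stats, stat_types)
print(rows)
```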

yahoo_cfb_player_season_stats(season: 'int' = 2024, *, league_structure: 'str' = 'ncaaf.struct.div.1', count: 'int' = 200, qualified: 'bool' = False, return_parsed: 'bool' = True, return_as_pandas: 'bool' = False, **kwargs: 'Any') -> "Union[pl.DataFrame, 'pd.DataFrame', Dict[str, Any]]"

Yahoo CFB player season stats (modern; one wide row per player).

Wraps the shangrila leagueStatsIndividual query, which returns every stat group (passing/rushing/receiving/...) in one call, pivoted wide with one column per statId. NCAAF data is available 2013-present.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| season | int | 2024 | Season year (2013-present). Defaults to 2024. |
| league_structure | str | 'ncaaf.struct.div.1' | Yahoo league-structure id (division filter). Defaults to "ncaaf.struct.div.1" (FBS). |
| count | int | 200 | Maximum number of players to request. Defaults to 200. |
| qualified | bool | False | Restrict to qualified leaders only. Defaults to False. |
| return_parsed | bool | True | If True (default) flatten to a DataFrame; if False return the raw JSON dict. |
| return_as_pandas | bool | False | If True return a pandas DataFrame; otherwise polars. Ignored when return_parsed=False. |

Returns

A wide polars DataFrame (default), a pandas DataFrame when return_as_pandas=True, or the raw JSON dict when return_parsed=False. Includes a self-describing season column.

Example

from sportsdataverse.cfb import yahoo_cfb_player_season_stats
df = yahoo_cfb_player_season_stats(season=2024)

yahoo_cfb_player_season_stats_legacy(season: 'int' = 2024, category: 'str' = 'Passing', sort_stat: 'str' = 'PASSING_YARDS', *, league_structure: 'str' = 'ncaaf.struct.div.1', count: 'int' = 200, return_parsed: 'bool' = True, return_as_pandas: 'bool' = False, **kwargs: 'Any') -> "Union[pl.DataFrame, 'pd.DataFrame', Dict[str, Any]]"

Yahoo CFB legacy per-category player leaders (one wide row per player).

Wraps the legacy seasonStatsFootball{Category}Ncaaf query (one stat category per call), pivoted wide with one column per statId.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| season | int | 2024 | Season year (2013-present). Defaults to 2024. |
| category | str | 'Passing' | Stat category, one of {"Passing", "Rushing", "Receiving", "Defense", "Kicking", "Punting", "Returns"}. Defaults to "Passing". |
| sort_stat | str | 'PASSING_YARDS' | Required FootballStatId to sort by (see the catalog vocab). Defaults to "PASSING_YARDS". |
| league_structure | str | 'ncaaf.struct.div.1' | Yahoo league-structure id (division filter). Defaults to "ncaaf.struct.div.1" (FBS). |
| count | int | 200 | Maximum number of players to request. Defaults to 200. |
| return_parsed | bool | True | If True (default) flatten to a DataFrame; if False return the raw JSON dict. |
| return_as_pandas | bool | False | If True return a pandas DataFrame; otherwise polars. Ignored when return_parsed=False. |

Returns

A wide polars DataFrame (default), a pandas DataFrame when return_as_pandas=True, or the raw JSON dict when return_parsed=False. Includes self-describing season and category columns.

Example

from sportsdataverse.cfb import yahoo_cfb_player_season_stats_legacy
df = yahoo_cfb_player_season_stats_legacy(
    season=2024, category="Rushing", sort_stat="RUSHING_YARDS"
)

yahoo_cfb_scoreboard(season: 'int', week: 'int' = 1, *, count: 'int' = 500, return_parsed: 'bool' = True, return_as_pandas: 'bool' = False, **kwargs: 'Any') -> "Union[pl.DataFrame, 'pd.DataFrame', Dict[str, Any]]"

Yahoo CFB scoreboard (one row per game).

Wraps the editorial scoreboard resource and flattens the games map. season is required — there is no meaningful default for a weekly scoreboard and the API has no concept of "current season". The full raw payload also carries teams/leagues/odds maps (use return_parsed=False).

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| season | int | | Season year (required). |
| week | int | 1 | Schedule week number. Defaults to 1. |
| count | int | 500 | Maximum number of games to request. Defaults to 500. |
| return_parsed | bool | True | If True (default) flatten the games map to a DataFrame; if False return the raw JSON dict. |
| return_as_pandas | bool | False | If True return a pandas DataFrame; otherwise polars. Ignored when return_parsed=False. |

Returns

A polars DataFrame (default) with one row per game, a pandas DataFrame when return_as_pandas=True, or the raw JSON dict when return_parsed=False. Includes self-describing season and week columns.

Example

from sportsdataverse.cfb import yahoo_cfb_scoreboard
df = yahoo_cfb_scoreboard(season=2024, week=1)

yahoo_cfb_team_season_stats(season: 'int' = 2024, *, league_structure: 'str' = 'ncaaf.struct.div.1', count: 'int' = 200, return_parsed: 'bool' = True, return_as_pandas: 'bool' = False, **kwargs: 'Any') -> "Union[pl.DataFrame, 'pd.DataFrame', Dict[str, Any]]"

Yahoo CFB team season stats (modern; one wide row per team).

Wraps the shangrila leagueStatsByTeam query (all stat groups in one call, pivoted wide with one column per statId).

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| season | int | 2024 | Season year (2013-present). Defaults to 2024. |
| league_structure | str | 'ncaaf.struct.div.1' | Yahoo league-structure id (division filter). Defaults to "ncaaf.struct.div.1" (FBS). |
| count | int | 200 | Maximum number of teams to request. Defaults to 200. |
| return_parsed | bool | True | If True (default) flatten to a DataFrame; if False return the raw JSON dict. |
| return_as_pandas | bool | False | If True return a pandas DataFrame; otherwise polars. Ignored when return_parsed=False. |

Returns

A wide polars DataFrame (default), a pandas DataFrame when return_as_pandas=True, or the raw JSON dict when return_parsed=False. Includes a self-describing season column.

Example

from sportsdataverse.cfb import yahoo_cfb_team_season_stats
df = yahoo_cfb_team_season_stats(season=2024)

yahoo_cfb_team_season_stats_legacy(season: 'int' = 2024, category: 'str' = 'Passing', sort_stat: 'str' = 'PASSING_YARDS', *, league_structure: 'str' = 'ncaaf.struct.div.1', count: 'int' = 200, return_parsed: 'bool' = True, return_as_pandas: 'bool' = False, **kwargs: 'Any') -> "Union[pl.DataFrame, 'pd.DataFrame', Dict[str, Any]]"

Yahoo CFB legacy per-category team stats (one wide row per team).

Wraps the legacy seasonTeamStatsFootball{Category} query (one stat category per call), pivoted wide with one column per statId.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| season | int | 2024 | Season year (2013-present). Defaults to 2024. |
| category | str | 'Passing' | Stat category, one of {"Passing", "Rushing", "Receiving", "Defense", "Kicking", "Punting", "Returns", "Kickoffs", "Offense"}. Defaults to "Passing". |
| sort_stat | str | 'PASSING_YARDS' | Required FootballStatId to sort by. Defaults to "PASSING_YARDS". |
| league_structure | str | 'ncaaf.struct.div.1' | Yahoo league-structure id (division filter). Defaults to "ncaaf.struct.div.1" (FBS). |
| count | int | 200 | Maximum number of teams to request. Defaults to 200. |
| return_parsed | bool | True | If True (default) flatten to a DataFrame; if False return the raw JSON dict. |
| return_as_pandas | bool | False | If True return a pandas DataFrame; otherwise polars. Ignored when return_parsed=False. |

Returns

A wide polars DataFrame (default), a pandas DataFrame when return_as_pandas=True, or the raw JSON dict when return_parsed=False. Includes self-describing season and category columns.

Example

from sportsdataverse.cfb import yahoo_cfb_team_season_stats_legacy
df = yahoo_cfb_team_season_stats_legacy(
    season=2024, category="Rushing", sort_stat="RUSHING_YARDS"
)

yahoo_cfb_teams(season: 'int', week: 'int' = 1, *, return_parsed: 'bool' = True, return_as_pandas: 'bool' = False, **kwargs: 'Any') -> "Union[pl.DataFrame, 'pd.DataFrame', Dict[str, Any]]"

Yahoo CFB team directory (one row per team).

Yahoo has no standalone teams resource (the documented sports.league.teams resource 404s without auth). Instead the editorial scoreboard payload is "fat": one call embeds the full ~186-team directory under service.scoreboard.teams keyed by the dotted ncaaf.t.<id> team id. This wrapper pulls that map for the requested (season, week) and projects it to the directory columns -- it is the Yahoo side of sportsdataverse.cfb.cfb_teams_crosswalk.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| season | int | | Season year (required; the scoreboard is fetched to obtain the embedded teams map). |
| week | int | 1 | Schedule week used to fetch the scoreboard. Defaults to 1. The embedded directory is the full league list regardless of week. |
| return_parsed | bool | True | If True (default) flatten the teams map to a DataFrame; if False return the raw scoreboard JSON dict. |
| return_as_pandas | bool | False | If True return a pandas DataFrame; otherwise polars. Ignored when return_parsed=False. |

Returns

A polars DataFrame (default) with one row per team -- columns team_id, abbreviation, display_name, full_name, location, nickname, conference, conference_abbreviation, conference_id, division, division_id, seatgeek_id -- a pandas DataFrame when return_as_pandas=True, or the raw scoreboard JSON dict when return_parsed=False.

Example

from sportsdataverse.cfb import yahoo_cfb_teams
teams = yahoo_cfb_teams(season=2024)
abbr = dict(zip(teams["team_id"], teams["abbreviation"]))
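
Yahoo ids are dotted strings (ncaaf.t.<id> for teams, ncaaf.g.<id> for games), so joining against other sources may first require the trailing component. A minimal sketch follows. yahoo_numeric_id is a hypothetical helper, and "ncaaf.t.123" is a made-up team id.

```python
def yahoo_numeric_id(dotted_id: str) -> str:
    """Return the trailing component of a dotted Yahoo id such as 'ncaaf.t.123'."""
    return dotted_id.rsplit(".", 1)[-1]


print(yahoo_numeric_id("ncaaf.t.123"))
print(yahoo_numeric_id("ncaaf.g.202509200023"))
```

The result is kept as a string so that any leading zeros in an id survive.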