Scrapes and parses play-by-play (pbp) data for NCAA basketball games
ncaa_bb_pbp(game_id)
the game_id of the specified game on stats.ncaa.org
a dataframe of cleaned, parsed play-by-play data for the specified
game_id
determine which team has possession on each play. this is effectively
the instantaneous "who has possession" at the moment a given stat is recorded
e.g. if team A makes a shot, that shot is recorded as team A's possession,
not team B's, even though team B has possession immediately after the made shot
however, if team B forces a steal, team B is recorded as the possessing
team
fix an annoying edge case where a team wins the jump ball
and then immediately loses it on a turnover
when we can't infer who has possession, fill in the gaps
based on who had possession before and after the play
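
The gap-filling step above could be sketched as follows. This is a minimal illustration in Python, not the package's own code: the function name `fill_possession` is hypothetical, and it assumes one particular reading of "before and after", namely that the previous known team is carried forward and any gaps at the very start take the next known team.

```python
def fill_possession(possession):
    """Fill unknown (None) possession entries from neighbouring plays.

    Assumption: carry the last known team forward, then fill any
    leading gaps backward from the first known team.
    """
    filled = list(possession)

    # Pass 1: carry the most recent known team down into later gaps.
    last_known = None
    for i, team in enumerate(filled):
        if team is None:
            filled[i] = last_known
        else:
            last_known = team

    # Pass 2: fill gaps at the start from the next known team.
    next_known = None
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = next_known
        else:
            next_known = filled[i]

    return filled


example = fill_possession([None, "B", None, None, "A", None])
```

If every entry is unknown, the list is returned unchanged, since there is nothing to fill from.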
these are deliberately commented out for a few reasons:
As stringer data, these designators are somewhat noisy
As Seth Partnow pointed out in The Midrange Theory, these designators can be biased
These designators are only available in V2 of the PBP, not V1
group by period, because some lineups change between periods without being noted in the pbp
whenever a player is subbed in, they have a 1 in that row and a 0 in the row before
whenever a player is subbed out, they have a 0 in that row and a 1 in the row before
we then cascade the 1s and 0s up and down to create a map of who is in the game at any given time
now we map player names to the roster df. this could be noisy in theory, but I haven't seen any issues in practice
some character encoding cleanup; drop players with misspelled names in the pbp
for v1, these columns are not recorded, so they are set to NA so they don't register as false negatives
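
The substitution cascade described above (mark 1s and 0s around each substitution, then fill them up and down within each period) could look roughly like the sketch below. It is written in Python for illustration only and is not the package's implementation. The event layout (dicts with `period`, `type`, and `player` keys), the `sub_in`/`sub_out` labels, and the function name `on_court_flags` are all assumptions.

```python
def on_court_flags(events, player):
    """Return a 1/0/None on-court flag per event row for one player.

    Assumed event layout: {"period": int, "type": str, "player": str}.
    """
    flags = [None] * len(events)

    # Group row indices by period so fills never cross a period boundary.
    periods = {}
    for i, ev in enumerate(events):
        periods.setdefault(ev["period"], []).append(i)

    for rows in periods.values():
        # Step 1: mark each substitution row and the row just before it.
        for pos, i in enumerate(rows):
            ev = events[i]
            if ev.get("player") != player or ev["type"] not in ("sub_in", "sub_out"):
                continue
            entering = ev["type"] == "sub_in"
            flags[i] = 1 if entering else 0
            if pos > 0 and flags[rows[pos - 1]] is None:
                flags[rows[pos - 1]] = 0 if entering else 1

        # Step 2: cascade known values down the period.
        last = None
        for i in rows:
            if flags[i] is None:
                flags[i] = last
            else:
                last = flags[i]

        # Step 3: cascade up to cover rows before the first substitution.
        nxt = None
        for i in reversed(rows):
            if flags[i] is None:
                flags[i] = nxt
            else:
                nxt = flags[i]

    return flags


events = [
    {"period": 1, "type": "shot", "player": "Jones"},
    {"period": 1, "type": "sub_out", "player": "Smith"},
    {"period": 1, "type": "shot", "player": "Jones"},
    {"period": 1, "type": "sub_in", "player": "Smith"},
    {"period": 1, "type": "shot", "player": "Smith"},
    {"period": 2, "type": "shot", "player": "Jones"},
    {"period": 2, "type": "sub_in", "player": "Smith"},
    {"period": 2, "type": "shot", "player": "Smith"},
]
smith_flags = on_court_flags(events, "Smith")
```

In this sketch a player with no substitution in a period keeps `None` for every row of that period, because the substitution rows alone cannot tell whether they played the whole period or sat it out.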