On Jan. 20, Philadelphia Flyers forward Zac Rinaldo was ejected from a game after boarding Penguins defenseman Kris Letang. The Flyers came back to win. After the game, Rinaldo said he "changed the game" (for which he was suspended 8 games). Using Python for webscraping and data analysis, I explore data from 10 NHL seasons to investigate how hockey penalties affect the outcome of the game.
When is Good to be Bad? How do hockey penalties affect the outcome of the game?
This talk will focus more on the process of getting the penalty and goal data and less on data analysis. Although in the end, I will address the question of "When is it good to be bad?"
LOOKING AT THE SOURCE DATA TO UNDERSTAND HOW TO PARSE IT WITH BEAUTIFULSOUP Beginning with the 2002-2003 season, I explored 10 seasons worth of penalty data from NHL.com. During this time, NHL.com had at least 3 formats for their play-by-play game recaps, including some that were not valid HTML formats recognized by BeautifulSoup. With BeautifulSoup, I was able to build parser that could scrape the penalty information.
USING TRY/EXCEPT TO IDENTIFY AND ACCOMODATE OF EDGE CASES Many edge cases had to be accounted for in the parser. For example, how do I account for a shootout goal versus a regulation/overtime goal? And, how do I account for goals scored before the game starts?
USING NUMPY AND SCIPY FOR EXPLORATORY DATA ANALYSIS Once all the data was put together, I had to classify penalties. Some classifications I took from the NHL rule book (eg, Physical Fouls, Stick Infractions) and other classifications I came up with based on logic (eg, Physical Altercations, Bench, Double Minor). Penalties could belong to more than one classification. Not all seasons gave the same penalties in the pla-by-play, and rules changed between seasons. I then divided the penalties by the team score when the penalty occurred (tied, ahead, or behind), and the outcome after the penalty and calculated the odds ratio for different penalty types based on different starting points. I did a similar analysis but used "team to score the next goal" instead of outcome in case the effect of the penalty is fleeting. All analyses are preliminary, and I need to figure out how to take into account that you need at least one member of each team for a physical altercation penalty.