Villanova players celebrate after winning last year’s NCAA college basketball tournament.

AP file

Playing by the numbers

Student-run Harvard Sports Analysis Collective takes an empirical look at key questions in basketball (including March Madness), football, even curling

Hoping for an edge in this year’s March Madness office pool? Have a longstanding argument with your friends on which team’s fans are the most loyal? Always wondered how much of a difference it makes to be able to throw the last stone in the initial curling end? You can find your answers in the work of the Harvard Sports Analysis Collective (HSAC), a student-run organization dedicated to the quantitative analysis of sports strategy and management.

Since its founding in 2006 under the tutelage of “Moneyball”-cited statistician and Professor Emeritus Carl Morris, HSAC has answered a wide range of sports questions, often using sophisticated statistical models to settle longstanding debates or to put in context the eye-popping, head-scratching numbers that excite and baffle sports fanatics and pundits all over the world. (The collective just posted its analysis of this year’s March Madness college basketball tournament.)

HSAC member projects, which range from social media posts drawn from simple fact-finding exercises to senior theses engaging complex quantitative analysis, reflect what’s current and relevant in the sports world, and they often emerge from spirited conversation during Collective meetings, which take place Tuesday nights in Winthrop House. According to HSAC faculty adviser and senior preceptor in statistics Kevin Rader, popular methodologies compare two groups (teams, leagues, player pools) or look at how things have changed over time. “Or a really extreme event happens,” he explains, “something cool happens in the Super Bowl, and a decision needed to be made. Was it the right decision? Let’s investigate that from an empirical perspective.”

This past January, HSAC took to Twitter to answer a simple question many college football fans were likely pondering during Clemson’s surprising national championship drubbing of Alabama, 44–16, namely: When was the last time the Crimson Tide gave up more than 50 points in a regulation game? The answer, according to HSAC: When they lost to Sewanee 54–4 way back in 1907. The tweet received close to 250 retweets and nearly 500 likes.

HSAC Co-President Erik Johnsson ’20 (from left), President Emeritus Andrew Puopolo ’19, and Co-President Jack Schroeder ’21.

Jon Chase/Harvard Staff Photographer

When HSAC members want to delve deeper into a question and really engage their skills as statisticians, they write about their findings on the blog, which has drawn coverage from mainstream media outlets such as ESPN, NBC Sports, Bleacher Report, the Boston Globe, and The New Yorker, as well as attention from major league franchises and the leagues themselves, including the NBA’s Dallas Mavericks, Memphis Grizzlies, and Orlando Magic, the National Football League, and Major League Soccer. Some popular posts over the years: “A Way-Too-Early Prediction of the NFL Season,” “Conference Bias in College Football,” and “Which Sports League Has the Most Parity?”

Often, existing fan theories (“that referee hates my team” or “we never win in that stadium”) inspire HSAC members to put them to the test. Last February, HSAC President Emeritus Andrew Puopolo, a senior at the College and a self-professed soccer addict, took on the age-old question of referee bias. His entrée was the oft-maligned English soccer official Mike Dean, who is particularly reviled by supporters of the London-based Arsenal Football Club, and from there he built a statistical analysis of referee/team-specific bias throughout the English Premier League. In short, Puopolo looked at every pairing of a Premier League team and a referee who officiated at least 15 of that team’s matches between the 2005–2006 and 2016–2017 seasons, comparing actual results against pregame betting odds in his quest to find bias. In the end, he found “no alarming signs.” Not that an Arsenal supporter would ever be swayed by the data, even if it was culled from tens of thousands of combinations.

Which is fine by Puopolo, who is the first to admit when he finds flaws in his own methodologies, and who loves the opportunity to spark conversation — on sports, but especially on statistics — in a quest to help himself and his colleagues get better. Often, HSAC analyses encourage readers to make their own decisions about the data; there isn’t always a clear-cut answer to every question. This spirit of engagement in finding new ways to look at data is what HSAC is all about.
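For readers curious what comparing results against pregame odds can look like in practice, here is a minimal Python sketch. It is not Puopolo’s code or method, which the article does not spell out. It assumes decimal betting odds, the usual three points for a win and one for a draw, and hypothetical function names; the sample numbers are made up for illustration.

```python
from statistics import mean


def implied_probs(odds_win, odds_draw, odds_loss):
    """Turn decimal odds into probabilities, normalizing away the bookmaker margin."""
    raw = [1.0 / odds_win, 1.0 / odds_draw, 1.0 / odds_loss]
    total = sum(raw)
    return [p / total for p in raw]


def expected_points(odds_win, odds_draw, odds_loss):
    """League points the odds imply for a team (assumes 3 for a win, 1 for a draw)."""
    p_win, p_draw, _ = implied_probs(odds_win, odds_draw, odds_loss)
    return 3.0 * p_win + 1.0 * p_draw


def mean_residual(matches):
    """Average of actual minus expected points for one team under one referee.

    Each match is a tuple: (actual_points, odds_win, odds_draw, odds_loss).
    """
    return mean(
        pts - expected_points(o_win, o_draw, o_loss)
        for pts, o_win, o_draw, o_loss in matches
    )


# Made-up sample: three matches for one team under one referee.
sample = [(3, 2.0, 3.5, 4.0), (0, 1.8, 3.6, 4.5), (1, 2.5, 3.2, 3.0)]
print(mean_residual(sample))
```

A residual near zero means the team did about as well under that referee as the market expected. A real analysis would also need a significance test across all pairings before calling any gap evidence of bias.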

HSAC’s current leadership shares Puopolo’s commitment to moving the field forward. Current co-president Erik Johnsson, a junior concentrating in statistics and a member of the Crimson volleyball team, recently completed a project designed to improve upon the Elo model, a widely respected skill-level rating system often employed by statistics heavyweight fivethirtyeight.com. While perusing fivethirtyeight during an NBA game, Johnsson noticed that the site gave “huge percent chances” of making the playoffs to the Utah Jazz and the New Orleans Pelicans, both underperforming at the time, which he thought to be “a little odd.” So Johnsson read up on the site’s model, replicated it, and, to make the model more exact, added in some new variables (in short, accounting for off-season changes in team strength by making adjustments in ratings for games earlier in the current season). His findings: Over a 10-year period, his model’s predictions were “slightly better,” and the improvement was “statistically significant.”
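For context, the core of a basic Elo system fits in a few lines. The Python sketch below shows the standard Elo expected-score and update formulas, plus one simple way to handle the off-season by pulling each rating partway back toward the league average. It is not Johnsson’s model or fivethirtyeight’s code: the function names, the step size `k`, the 1500 average, and the 0.75 carry-over fraction are all assumed values chosen for illustration.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def update(rating_a, rating_b, a_won, k=20.0):
    """Return both ratings after one game; k is an assumed step size."""
    exp_a = expected_score(rating_a, rating_b)
    delta = k * ((1.0 if a_won else 0.0) - exp_a)
    # Whatever A gains, B loses, so the average rating stays fixed.
    return rating_a + delta, rating_b - delta


def carry_over(rating, league_mean=1500.0, keep=0.75):
    """Assumed off-season step: keep part of last season's gap to the mean."""
    return league_mean + keep * (rating - league_mean)


# Two evenly rated teams: the winner gains exactly what the loser drops.
print(update(1500.0, 1500.0, a_won=True))
# A strong team starts the next season somewhat closer to average.
print(carry_over(1600.0))
```

The off-season step is where a model like this has the least information, since rosters change but no games are played. That is the kind of gap the project described above set out to narrow.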

By working with the Elo model, Johnsson followed in the footsteps of HSAC faculty adviser and senior lecturer on statistics Mark Glickman, whose Glicko Rating System was also developed as an improvement to the Elo model. Johnsson was also able to implement ideas from a Harvard statistics course in his analysis. This spirit of learning and then teaching, especially among members of the Collective, has always been a big part of what HSAC does.

“We actively encourage members to ask us for help,” said the other current co-president, Jack Schroeder, a sophomore studying government and data science who is also on the curling team, “either with the methodology behind the project, the writing process, or even just getting the data, which is often the hardest part.”

Faculty adviser Rader added that he is able to maintain a largely hands-off approach in his own role thanks to mentoring from the older members in the group, who have a wealth of institutional knowledge and a stronger understanding of potential methodologies than some of their younger counterparts. He said he only steps in when he sees an opportunity to push the students further by recommending more sophisticated models that they may not be familiar with yet.

Johnsson, Schroeder, and Puopolo all foresee potential future careers in sports analytics, aspiring to follow in the footsteps of HSAC alumni such as Alec Halaby ’09, vice president of football operations and strategy for the Philadelphia Eagles; Daniel Adler ’10, HLS/HBS ’17, director of baseball operations for the Minnesota Twins; and recent grad Nathán Goldberg Crenier ’18, who is already assistant to the president of the U.S. Soccer Federation. And opportunities may arise in fields outside of traditional sports venues, said Puopolo. As more and more states seek to legalize sports gambling, there will be new opportunities for machine-learning- and data-science-minded graduates to pursue careers in that field as well.

The pipeline is real, and the connections to major professional sports are active. Last semester, Puopolo set up consulting projects, which are ongoing, with teams from the National Football League and Major League Baseball.

“[These projects] give everyone a chance to take these skills that we talk about during meetings, and stuff people are learning at school in an academic setting,” said Schroeder, “and really apply it in a professional, business setting.”

“We see this as a great way to increase membership in the club, too,” added Johnsson. “If we can convince freshmen and sophomores who like sports and statistics to come to the club, and who can then gain actual experience working for real teams, and say they have connections with [major professional sports teams], it’s a great way to get people involved and excited.”