Updating the Statistical Ranking of the NBA's Most Loyal (but not really) Fanbases

This post is a follow-up to a recent post on the Harvard Sports Analysis Collective, which was subsequently picked up on the /r/NBA subreddit. The main argument of the Harvard post is that the correlation between NBA teams' home stadium attendance and win % can be used as a measure of fan loyalty. Many more articles use a similar methodology, such as this Bleacher Report post from 2015.

http://3ricountymusik.blogspot.com/2012/06/heat-fans-fickle-or-cocky-loyal-or.html

It's Easy to Create Interesting Headlines from Misleading Data

While I appreciate the spirit behind the Bleacher Report post ("Which NBA Fanbase Has Been the Most Loyal During the Past 15 Years?"), the Harvard post ("Which NBA Team has the most Loyal Fans?"), and the Reddit post ("A Statistical Ranking of the NBA's most Loyal Fanbases"), their titles invite incorrect conclusions, because the correlation between home stadium attendance and win % is a terribly simplistic proxy for fan loyalty.

I know it looks like I'm calling out the authors of these posts, but that's not my intention. I'm only hoping to make readers aware of a trend in sports and esports: drawing conclusions from data in ways that would make statisticians and data scientists blow a gasket. It's a huge problem because many people on the Internet don't read past the headline.

Now for Some Data...

I had the chance to run a similar correlation between home stadium attendance and win % as part of a group project in my sports economics class. The main purpose of our project was to investigate competitive balance in the NBA during the last 20 years, but some of our data applies to what I've discussed above.

Our data covers the '01 through '16 - '17 seasons, while the original Harvard article used data from '91 - '92 to '13 - '14. It looks like we used different attendance data sources as well (our group used ESPN, which counts standing-room attendance, thus allowing some teams to exceed 100% attendance). Overall, I think both data sets check out fine. Also: the Harvard post was adapted from a project done by a college sophomore, which I don't think a lot of readers realized. All things considered, I think he (or possibly they) did a great job.

Below are my group's calculations (check out the Harvard post for the original data). Note: Unlike the Harvard post, we did not omit teams that moved stadiums during the tracked period.

I omitted the 0 in the thousandths place of the Hawks' correlation value to annoy anyone who is OCD about this kind of thing. Sorry. 
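For anyone curious how a per-team number like this gets produced, here is a minimal sketch of the calculation. It is not our group's actual spreadsheet: the two teams and all of their numbers are made up for illustration, and `pearson_r` is just a hand-rolled helper for the standard Pearson correlation coefficient.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical teams with made-up numbers (NOT real NBA data).
# Each entry: (season win %, home attendance as % of capacity).
seasons = {
    "Team A": [(0.30, 82.0), (0.45, 88.0), (0.60, 95.0), (0.70, 99.0)],
    "Team B": [(0.30, 99.0), (0.45, 97.0), (0.60, 96.0), (0.70, 94.0)],
}

correlations = {}
for team, rows in seasons.items():
    win_pct = [w for w, _ in rows]
    attendance = [a for _, a in rows]
    correlations[team] = pearson_r(win_pct, attendance)

# Print teams from most negative to most positive correlation.
for team, r in sorted(correlations.items(), key=lambda kv: kv[1]):
    print(f"{team}: r = {r:+.3f}")
```

In this toy data, Team A's attendance rises with winning (strongly positive r) and Team B's falls (strongly negative r). Note that the number says nothing about why attendance moved, and four seasons is far too few points to trust either way.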

Get More Views with the Following Clickbait Titles! 

A sexy but unsubstantiated article headline from the above data could be: "Statistical Proof that Knicks Fans are Masochists," because a negative correlation means that attendance is higher during seasons when the team loses more games. Other possible eye-catching headlines: "OKC Thunder Fans are Bandwagoners" or "Mavs Fans are Among the NBA's Most Loyal."

However, these conclusions are crap.

https://www.cheatsheet.com/wp-content/uploads/2015/08/Spike-Lee-Nick-Laham-Getty-Images-e1438693839791.jpg?x45964
A Spike Lee joint.

The Above Correlations are Close to Meaningless

Let's take a look at what's wrong with the aforementioned correlations, in keeping with the spirit of the title of this blog, Where Stats Lie (like that double entendre?). Using win rate and attendance correlation as a proxy for fan loyalty is way too simplistic. The Harvard post does mention a bunch of confounding variables, but not until the end, which most readers probably didn't get to. A list of issues, some brought up in the Harvard + Reddit posts and some not:
  • Teams track and report their own attendance data, which gives them the opportunity to fudge their numbers.
  • Attendance is likely affected by variables other than win rate, such as the presence of a superstar, ticket prices, location of the stadium, and metro area median income / population.
    • Teams that want to boast high attendance figures can aggressively discount or give away tickets. Case in point: the Mavericks. Here are a few quotes from a Dallas Business Journal article from earlier this year:
      • "...despite a 22-34 record landing them in the bottom of the Western Conference, the Mavericks [...] have the second-best attendance in the NBA this season."
      • "I don't try to maximize revenue," Owner Mark Cuban told the Dallas Business Journal in an email interview. "Rather than having seats unused, we partner with charities, youth groups and schools to bring groups to every game."
      • "On the flip side of the attendance success, local TV ratings have taken a huge hit. The Sports Business Journal released their TV ratings data through the All-Star break, and the ratings for the Mavs have dropped 53 percent this season. That's the largest drop in the NBA."
  • A more accurate measure of loyalty would also take into account other considerations, such as TV ratings and merchandise sales.
  • Are correlations even the best statistical tool to use? I don't know.
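To make the confounding-variable issue from the list above concrete, here is a rough sketch using a partial correlation, which is one standard way (not necessarily the best one) to ask "how do win % and attendance move together once ticket price is held fixed?" Everything here is an assumption for illustration: the data is constructed by hand so that attendance rises with winning and falls with ticket price, and the team is assumed to charge more in better seasons.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def partial_r(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy = pearson_r(xs, ys)
    r_xz = pearson_r(xs, zs)
    r_yz = pearson_r(ys, zs)
    denom = sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("control variable is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom


# Constructed, hypothetical data (NOT real NBA numbers).
win_pct = [0.3, 0.4, 0.5, 0.6, 0.7]
ticket_price = [40, 70, 60, 90, 100]  # prices tend to go up in good seasons

# Attendance is built to rise with winning and fall with price, with no noise.
attendance = [90 + 10 * w - 0.2 * p for w, p in zip(win_pct, ticket_price)]

raw = pearson_r(win_pct, attendance)
controlled = partial_r(win_pct, attendance, ticket_price)

print(f"raw correlation:          {raw:+.3f}")
print(f"controlling for price:    {controlled:+.3f}")
```

The raw correlation comes out negative, which a headline writer would read as "fans show up more when the team loses," even though attendance was built to go up with winning. Once ticket price is controlled for, the correlation flips to strongly positive. Real data is noisy and has many more confounders than one, so treat this as a demonstration of the problem, not a fix for it.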
How do we account for celebrity supporters in our loyalty measure? One celeb = 1,000 regular people sounds about right. Also, +10 people for each Diet Coke bottle.

So How Would You Accurately Measure Fan Loyalty?

Fan loyalty can never be perfectly measured because it's a concept that doesn't have a concrete definition. You might consider attendance a more important measure of fan loyalty than TV ratings, while I believe the opposite. The best we can do is take a basket of variables and create an "index." Conceptually, it would be similar to how the consumer price index (for inflation) or a consumer loyalty index (for brands) works. Some of the bullet points in the above section would probably make their way into this index.
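The "basket of variables" idea can be sketched roughly as follows. This is one simple way to build such an index (standardize each metric across teams, then take a weighted average), not an established formula. The team names, metrics, values, and weights are all hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Every team is identical on this metric, so it carries no signal.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def loyalty_index(teams, weights):
    """Weighted average of per-metric z-scores for each team."""
    names = list(teams)
    total_weight = sum(weights.values())
    scores = {name: 0.0 for name in names}
    for metric, weight in weights.items():
        column = [teams[name][metric] for name in names]
        for name, z in zip(names, z_scores(column)):
            scores[name] += (weight / total_weight) * z
    return scores


# Hypothetical teams and made-up numbers (NOT real data).
teams = {
    "Team X": {"attendance_pct": 99.0, "tv_rating": 2.0, "merch_sales": 50.0},
    "Team Y": {"attendance_pct": 85.0, "tv_rating": 6.0, "merch_sales": 55.0},
    "Team Z": {"attendance_pct": 92.0, "tv_rating": 4.0, "merch_sales": 40.0},
}

# Two equally defensible (and equally subjective) weightings.
attendance_heavy = loyalty_index(
    teams, {"attendance_pct": 0.6, "tv_rating": 0.2, "merch_sales": 0.2}
)
tv_heavy = loyalty_index(
    teams, {"attendance_pct": 0.2, "tv_rating": 0.6, "merch_sales": 0.2}
)

print("Most loyal (attendance-heavy):", max(attendance_heavy, key=attendance_heavy.get))
print("Most loyal (TV-heavy):", max(tv_heavy, key=tv_heavy.get))
```

With these made-up numbers, the "most loyal" team changes from Team X to Team Y depending only on the weights, which is exactly the definitional problem described above: the index is only as objective as the choice of what goes in it.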

Alternatively, you can check out the Brand Keys Sports Loyalty Index.

Some Bonus Graphs from our Class Project

I'll leave you with some graphs - originally made for our class project - that are relevant to the topic of this post. Draw conclusions at your own discretion.




The earlier Wade era wasn't graphed, but he was on the team from the '03-'04 to '15-'16 seasons.



