Tuesday, March 10, 2015

Nerdiest Baseball Card Post Ever

I've always had the suspicion that cards in a box of baseball cards are not random. It always seemed that there were more cards of the more popular teams in the box. Or maybe it depended on where you lived.

I'm an engineer and I know that data is more important than suppositions. So this year I decided to check out my supposition.

I finished cataloging the Topps Series 1 cards I bought. I had 284 base cards, counting duplicates. With 30 teams, that averages to about 9.5 cards per team. What did I actually get? I'm not going to show the whole table, but the count of cards per team ranged from 3 cards for the Cubs to 15 cards for the Mets. Sounds unevenly distributed, doesn't it?

What would a box of randomly distributed cards look like? It would not be a box with the same number of cards per team. Random doesn't work like that. What it should look like is roughly a normal distribution of cards per team around the average (really the mean, for you other nerds out there) of the cards per team.
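For the other nerds: here is a minimal sketch of what "random" would look like, assuming every card is equally likely to belong to any of the 30 teams (an assumption that may not match how the set is actually printed). The function names `simulate_box` and `typical_range` are made up for this illustration. It deals 284 cards at random many times and reports the average smallest and largest team count.

```python
import random


def simulate_box(rng, num_cards=284, num_teams=30):
    """Deal num_cards cards, each assigned to a uniformly random team.

    Returns a list of card counts, one entry per team.
    Assumption: every team is equally likely for every card.
    """
    counts = [0] * num_teams
    for _ in range(num_cards):
        counts[rng.randrange(num_teams)] += 1
    return counts


def typical_range(trials=1000, seed=2015, num_cards=284, num_teams=30):
    """Average the smallest and largest team counts over many random boxes."""
    rng = random.Random(seed)
    lows = []
    highs = []
    for _ in range(trials):
        counts = simulate_box(rng, num_cards, num_teams)
        lows.append(min(counts))
        highs.append(max(counts))
    return sum(lows) / trials, sum(highs) / trials


if __name__ == "__main__":
    avg_low, avg_high = typical_range()
    print(f"Average fewest cards for a team: {avg_low:.1f}")
    print(f"Average most cards for a team:   {avg_high:.1f}")
```

If the simulated low and high land near 3 and 15, a spread like mine is nothing unusual for a random box.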

Here's my crude drawing of what the distribution of cards per team looks like for my Topps cards.

There was one team (the Cubs) with 3 cards. There were 8 teams with 8 cards. There was only 1 team with 10 cards, etc. Without doing the math from the Wikipedia article, I think this is about as close to a normal distribution as could be expected. Considering that gap in the middle, the distribution might also be bimodal, but I can't think of why it would be. When I get the Series 2 cards into the count I'll revisit this.
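For anyone who does want to do the math, one common approach (swapped in here, not something I actually ran) is a chi-square goodness-of-fit test against an even split across teams. This is a rough sketch: `chi_square_stat` is a name made up for the example, the counts in the demo are placeholders rather than my real table, and the cutoff of roughly 42.56 is the usual 5% critical value for 29 degrees of freedom (30 teams minus 1). It also assumes every team has the same number of cards in the set, which may not hold.

```python
def chi_square_stat(counts):
    """Chi-square statistic for observed counts vs. an even split.

    Assumption: under the "random" hypothesis, each team is expected
    to receive total / number_of_teams cards.
    """
    total = sum(counts)
    expected = total / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


# 5% critical value for 29 degrees of freedom (30 teams - 1), approximate.
CRITICAL_VALUE_DF29 = 42.56

if __name__ == "__main__":
    # Placeholder counts for illustration only, NOT the real per-team table.
    placeholder_counts = [10] * 30
    stat = chi_square_stat(placeholder_counts)
    verdict = "not even" if stat > CRITICAL_VALUE_DF29 else "consistent with even"
    print(f"chi-square = {stat:.2f}: {verdict}")
```

Plugging in the real 30 per-team counts in place of the placeholder list would give the actual statistic; a value under the cutoff means the box is consistent with an even, random spread.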

Of course, I don't know whether the same number of cards is produced for each team. I expect that not to be the case. For example, there were several Postseason cards featuring Giants. There are also League Leader cards, which I put with the team of the first player on the card (which is how I catalog them).

6 comments:

Ryan G said...

If actual print runs per card are the same, then the only thing that would matter would be how many different cards each team has in the set. And for multi-team cards you'd have to either not include them, or count them for all teams (in both checklist tallying and distribution tallying).

If you go beyond base cards, then you have to also examine overall odds/print runs for each set.

The Junior Junkie said...

Long live the nerds

BASEBALL DAD said...

You and I are kind of on the same wavelength here!
http://baseballdad-mytribeblog.blogspot.com/2015/02/what-difference-inserts-and-parallels.html#comment-form

Josh D. said...

I enjoyed this post (and any attempts to engineer-ify the card world), but I still think I had the nerdiest post of all time. :-)
http://royalsandrandoms.blogspot.com/2011/09/pack-searching-with-science-2011-topps.html

capewood said...

Thanks for the comments.

Baseball Dad - great minds think alike

Josh D. - OK, I'll concede. I could have used Excel too.

Fuji said...

Great post. Nerds rule!