Random thought: does a parent’s name have a measurable effect on what they name their child? I wish I could tell you, but unfortunately I couldn’t find a large dataset associating real parents and children.
Instead, I stumbled upon this: records of all the marriages in NYC since the 1950s. Just the section from 1996-2010 has nearly 2 million real people’s names!
One place to start with this many names is looking at initials: does a person’s last initial correlate at all with their first initial? Fortunately, I can answer this one… and the answer is that it seems so, though I didn’t actually perform any statistical analysis to prove it. (Catchy!)
To start, I found the expected distribution of first and last initials, assuming the two were independent. That is, if 5% of the sample population has the last initial M and 5% has the first initial J, we’d expect 0.05 × 0.05 = 0.0025, or 0.25%, of the population to have the initials JM.
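Here is a rough sketch of that independence calculation in Python. It is not the code behind the heatmap: the `names` list is a made-up toy sample, `expected_share` is a hypothetical helper, and it assumes every name starts with a letter from A to Z.

```python
from collections import Counter
from string import ascii_uppercase


def expected_share(names):
    """Expected share of each (first initial, last initial) pair,
    assuming first and last initials are independent."""
    n = len(names)
    # Marginal counts of first and last initials
    first = Counter(f[0].upper() for f, _ in names)
    last = Counter(l[0].upper() for _, l in names)
    # Under independence, P(first=a, last=b) = P(first=a) * P(last=b)
    return {
        (a, b): (first[a] / n) * (last[b] / n)
        for a in ascii_uppercase
        for b in ascii_uppercase
    }


# Hypothetical toy sample, not real data
names = [("John", "Miller"), ("Jane", "Smith"), ("Mary", "Moore"), ("Sam", "Jones")]
expected = expected_share(names)
print(expected[("J", "M")])  # half the first initials are J, half the last are M
```

The result has one entry for each of the 26 × 26 possible pairs, and the expected shares add up to 1.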
I then found the actual distribution of initial pairings. Let’s say 0.27% of the population actually has the initials JM. Then there’s a surplus of 0.02 percentage points of JMs over what we expected. I performed that calculation for all 676 (26 × 26) initial pairings and created the heatmap you see.
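For the curious, the surplus calculation could look something like the sketch below. This is an illustration under assumptions, not the original analysis: `initial_surplus` is a hypothetical function, `sample` is an invented toy list, and initials outside A to Z are not handled.

```python
from collections import Counter
from string import ascii_uppercase


def initial_surplus(names):
    """Observed minus expected share for every initial pair.
    The expected share assumes first and last initials are independent."""
    n = len(names)
    first = Counter(f[0].upper() for f, _ in names)
    last = Counter(l[0].upper() for _, l in names)
    pairs = Counter((f[0].upper(), l[0].upper()) for f, l in names)
    surplus = {}
    for a in ascii_uppercase:
        for b in ascii_uppercase:
            observed = pairs[(a, b)] / n
            expected = (first[a] / n) * (last[b] / n)
            surplus[(a, b)] = observed - expected
    return surplus


# Hypothetical toy sample, not real data
sample = [("John", "Jones"), ("Jane", "Jackson"), ("Mary", "Moore"), ("Mark", "Smith")]
surplus = initial_surplus(sample)
print(surplus[("J", "J")])  # positive: JJ is more common than independence predicts
print(surplus[("M", "J")])  # negative: MJ is less common than independence predicts
```

Positive values are pairings that show up more often than independence would predict, negative values less often. Because the observed and expected shares each sum to 1, the surpluses across all 676 pairs sum to zero.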
The diagonal stripe of more-common-than-expected double initials is the main thing that stood out to me. Commenters on Reddit’s DataIsBeautiful pointed out that the concentrations of WC, XC, YC and WL, XL, YL are likely due to the large Chinese population in NYC.
Of course, this data comes with caveats. People getting married in NYC are probably not representative of the USA as a whole. Also, people can get married more than once, which may skew the distribution a little. On the whole, though, this was a fun exercise using the largest set of real people’s names I could find.