<!-- TITLE: Wiki Matched Obits -->
<!-- SUBTITLE: A quick summary of Wiki Matched Obits -->

# Matching Obituaries with Wikipedia Entries

## March 11 - Analysis My Takeaways

### Matched vs. Unmatched Obituaries

- Only a small sample of 783 people could be taken because of errors in the code (ids after 783 raised a NoneType error). Within that sample, the majority of obituaries did not match a Wikipedia entry.
- Of the 783 sampled obituaries, the proportion unmatched is about 0.725 and the proportion matched is about 0.274.

![Screen Shot 2020 04 12 At 12 36 08 Am](/uploads/screen-shot-2020-04-12-at-12-36-08-am.png "Screen Shot 2020 04 12 At 12 36 08 Am")

- After creating lists of the matched and unmatched obituaries from the sample, I calculated the gender distribution of each group and created bar plots to show the difference.
- The gender distributions of the matched and unmatched obituaries are very similar, even though there is a significantly higher proportion of unmatched obituaries.
- Of the unmatched obituaries, the proportion that is male is 0.8415, while the proportion that is female is 0.1585.
- Of the matched obituaries, the proportion that is male is 0.8837, while the proportion that is female is 0.1162.

#### Some ids are coming up as NoneType (does this mean that there are obituaries that have no ids?)

- This could be a bug that needs to be worked out.
- Id 783 and the ids after it cause an error.

## April 7th - Analysis My Takeaways

### Update:

- I was able to work out what the bug was and why the code was failing, so I could properly check the proportions of matched vs. unmatched Wikipedia articles.
- Before, I was using a for loop to iterate over the ids.
- The previous method of looping over the `id_list` of 60,000+ obituaries was not effective and would crash at around id 783 for some reason.
So I developed a different loop that checks the proportion of obituaries matched/unmatched to Wikipedia articles much faster.

![Screen Shot 2020 04 12 At 12 28 47 Am](/uploads/screen-shot-2020-04-12-at-12-28-47-am.png "Screen Shot 2020 04 12 At 12 28 47 Am")

### Of the 60,000+ obituaries, the majority are unmatched

- The proportion of obituarized people that had a matching Wikipedia article was about 0.3824, while the proportion that did not was about 0.6176.
- This is interesting, since one would expect that someone well known enough to be obituarized in the New York Times would also have a Wikipedia article written about them.
- This leads to the question: are these proportions correct? This is something that could be explored further.
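
The actual code appears only in the screenshots, so the following is a minimal sketch of how the matched/unmatched proportions and the per-group gender distributions could be computed while skipping records whose id is `None`, rather than crashing on them. The record layout (the `id`, `gender`, and `wiki_match` fields), the function names, and the toy `sample` data are all assumptions made for illustration. They are not the author's real data or method.

```python
from collections import Counter


def _valid(records):
    """Keep only records that exist and carry a non-None id (hypothetical layout)."""
    return [r for r in records if r is not None and r.get("id") is not None]


def match_proportions(records):
    """Return (matched, unmatched, skipped): two proportions and a skip count."""
    valid = _valid(records)
    skipped = len(records) - len(valid)
    if not valid:
        return 0.0, 0.0, skipped
    matched = sum(1 for r in valid if r.get("wiki_match"))
    p_matched = matched / len(valid)
    return p_matched, 1 - p_matched, skipped


def gender_distribution(records, matched):
    """Return {gender: proportion} within the matched or the unmatched group."""
    group = [r for r in _valid(records) if bool(r.get("wiki_match")) == matched]
    counts = Counter(r.get("gender", "unknown") for r in group)
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()} if total else {}


# Toy data, made up purely to exercise the functions above.
sample = [
    {"id": 1, "gender": "male", "wiki_match": True},
    {"id": 2, "gender": "female", "wiki_match": True},
    {"id": 3, "gender": "male", "wiki_match": False},
    {"id": 4, "gender": "male", "wiki_match": False},
    {"id": None, "gender": "male", "wiki_match": False},  # missing id: skipped
    None,  # missing record: skipped
]

p_matched, p_unmatched, skipped = match_proportions(sample)
print(p_matched, p_unmatched, skipped)
print(gender_distribution(sample, matched=True))
print(gender_distribution(sample, matched=False))
```

Reporting the number of skipped records alongside the proportions makes it easier to judge whether missing ids could be distorting the matched/unmatched split.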