Our project started as an attempt to compare RSS feeds of news sources from around the world. Originally we planned to sort the stories from each RSS feed into general categories (business, sports, etc.) and display the number of stories in each category and their importance to the country of origin. However, after finding a very similar finished product, Google News Maps, we decided to take the project in a different direction: directly measuring how similar the news sources are to one another, and relating that similarity to cultural difference as well as geographical distance.
We selected RSS feeds from a diverse range of places across the world, covering most continents (including a news source about Antarctica), as well as Detroit and CNN for a more local comparison with our own news feed in Ann Arbor. Romefeeder was used to download and parse the RSS feeds.
The machine learning part of the code uses the tf-idf functionality of the Wekaizing library. Tf-idf (term frequency-inverse document frequency) ranks the importance of each word to a document within a larger set of documents: a word's priority increases when it occurs frequently in that document and decreases when it occurs commonly across all of the documents in the set. Once we have computed the most important words for each news article, we compare the lists of top words for each document to determine which news articles are similar in content across news sources.
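To make the idea concrete, here is a minimal sketch of tf-idf scoring and top-word comparison in plain Python. It is not the library code the project used. The function names (tf_idf, top_words, overlap), the plain log(N / df) weighting, and the use of Jaccard overlap to compare top-word lists are all assumptions made for illustration, and the library's exact weighting may differ.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Score every word in every document.

    docs is a list of token lists. Returns one {word: score} dict per
    document. Assumed weighting: (count / doc length) * log(N / df).
    """
    n = len(docs)
    # Document frequency: in how many documents does each word appear?
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append(
            {w: (c / total) * math.log(n / df[w]) for w, c in counts.items()}
        )
    return scores


def top_words(score, k=3):
    """Return the k highest-scoring words (ties broken alphabetically)."""
    ranked = sorted(score, key=lambda w: (-score[w], w))
    return set(ranked[:k])


def overlap(a, b):
    """Jaccard overlap of two top-word sets (an assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Hypothetical, already-tokenized headlines.
    articles = [
        ["city", "council", "votes", "city"],
        ["city", "council", "budget"],
        ["penguin", "research", "ice"],
    ]
    scored = tf_idf(articles)
    tops = [top_words(s, k=2) for s in scored]
    print(overlap(tops[0], tops[1]))
    print(overlap(tops[0], tops[2]))
```

A word that appears in every document gets a score of zero under this weighting, which is the "decreasing priority" behavior described above.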
The final output of the project is the visualizer. In this image, the size of each circle represents the number of stories in that source's RSS feed. This means a location with a larger circle might have results more relevant to Ann Arbor simply because of the number of stories in its feed, rather than because of an actual similarity.
The real product here is the distance between each circle and the circle in the center, which represents Ann Arbor. The closer a news source is to the center, the more similar its news stories are to the news in Ann Arbor. Of note is Detroit, whose closeness makes sense given the large number of stories in its feed as well as its geographical proximity. More strangely, Antarctica and South Africa are close as well. Perhaps the science-related stories about Antarctica are similar to Ann Arbor’s research and environmental tendencies? Surprisingly, CNN has almost no relevance to Ann Arbor’s news, which is likely more local.
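One way a layout like this could be computed is sketched below in Python. This is an illustration, not the visualizer's actual code. The layout function, its parameters, the choice to place sources evenly around the center, the linear mapping from similarity to distance, and the assumption that circle area (rather than radius) tracks story count are all hypothetical.

```python
import math


def layout(sources, max_distance=300.0, radius_scale=2.0):
    """Place each source around a center point at (0, 0).

    sources is a list of (name, story_count, similarity) tuples, with
    similarity assumed to lie in [0, 1]. Higher similarity means a
    smaller distance from the center.
    """
    placed = []
    n = len(sources)
    for i, (name, story_count, similarity) in enumerate(sources):
        # Clamp so that bad input cannot produce a negative distance.
        similarity = min(1.0, max(0.0, similarity))
        distance = max_distance * (1.0 - similarity)
        # Spread the sources evenly around the center.
        angle = 2.0 * math.pi * i / n
        placed.append(
            {
                "name": name,
                "x": distance * math.cos(angle),
                "y": distance * math.sin(angle),
                # Area proportional to story count, so radius ~ sqrt(count).
                "radius": radius_scale * math.sqrt(story_count),
            }
        )
    return placed


if __name__ == "__main__":
    # Hypothetical story counts and similarity scores.
    for circle in layout([("Detroit", 100, 0.8), ("CNN", 25, 0.0)]):
        print(circle)
```

Because the radius depends only on story count and the distance depends only on similarity, the two quantities can be read separately in the image, although a large feed may still score as more similar simply because it has more stories to match.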