Geography 970

May 6, 2010

Animating Twitter Data

Filed under: Uncategorized. Posted by Tim Wallace @ 3:05 pm

Background on 970

This blog documents an exploration shared by the seminar attendees of Geography 970: The Geoweb, during the spring of 2010, at the Geography Department of the University of Wisconsin–Madison.

Instructor: Dr. Mark Harrower
Students (in alphabetical order): Fei Du, Jing Gao, Daniel Huffman, Kevin McGrath, Matt Moehr, Tim Wallace, Jeremy White

The seminar coalesced around a team project. While designing and developing the products, we discussed and debated ideas, methods, data, tools, and the challenges of contemporary web-based cartography, through both weekly meetings and blogging.

Project Goal

Twitter is a rich source of instant information on people’s locations; about one tenth of all tweets are geocoded, meaning they are marked with the location the person was at when they tweeted. This geocoding lets us examine whether there are spatial patterns in the use of Twitter. Where are most people tweeting from? What about their friends?

At the beginning of the semester, we reviewed existing online visualizations of Twitter, and found that although many exist, few touched on the underlying geographic phenomena. So, we set a goal to explore and discover effective ways to summarize this massive data set, and to make the unapparent emerge.

Twitter Hitter

Twitter provides free access to a subset of tweets (called “the garden hose”), about 10% of all tweets, on a live, streaming basis. To make accessing and processing this stream easier, Jeremy White wrote Twitter Hitter, an application that listened in on the Twitter stream and wrote the results out to an Excel spreadsheet. Twitter Hitter also allowed us to select which parts of the data stream to record and to ignore any tweets that didn’t match our criteria. For example, one of the projects listed below followed the geography of re-tweets, so it was necessary to find tweets where *both* the location of the original poster and the location of the re-tweeter were known. Without Twitter Hitter, this kind of sifting and sorting of the 1 million+ daily tweets would have been (nearly) impossible.
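To give a feel for the kind of sifting described above, here is a minimal Python sketch of a re-tweet filter that keeps only pairs where both locations are known. It is not Twitter Hitter's actual code: the record layout (the `coords` and `retweet_of` keys) and the function names are hypothetical, invented for illustration, and do not mirror Twitter's real stream format.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

Coord = Tuple[float, float]  # (latitude, longitude)


def get_coords(tweet: Dict[str, Any]) -> Optional[Coord]:
    """Return (lat, lon) for a tweet record, or None if it has no coordinates.

    Assumes a hypothetical record layout with an optional "coords" key.
    """
    coords = tweet.get("coords")
    if coords is None:
        return None
    lat, lon = coords
    return (float(lat), float(lon))


def retweets_with_both_locations(
    tweets: Iterable[Dict[str, Any]],
) -> List[Tuple[Coord, Coord]]:
    """Keep only re-tweets where the original poster AND the re-tweeter
    both have known coordinates. Returns (origin, retweeter) pairs."""
    kept = []
    for tweet in tweets:
        original = tweet.get("retweet_of")  # hypothetical key
        if original is None:
            continue  # not a re-tweet
        origin = get_coords(original)
        retweeter = get_coords(tweet)
        if origin is not None and retweeter is not None:
            kept.append((origin, retweeter))
    return kept
```

Each surviving pair is one line segment on a map, running from the original poster to the re-tweeter; everything else in the stream is dropped.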

GeoData in the Stream

“Location” has two meanings in the world of Twitter. It can mean (1) where someone was when they tweeted, provided they were using a GPS-enabled smartphone, or (2) where someone lives (users can specify their home location). The second is less helpful because, of course, people can tweet while away from home, sometimes from the other side of the planet. For example, in one of the animations below you can see researchers and scientists located at the South Pole tweeting to friends and colleagues back home in South Korea.

Geographic data in the form of lat/lon pairs is encoded in the Twitter data stream by third-party applications such as ÜberTwitter or by mobile devices such as iPhones. These geographic coordinates provided the platform for exploring the geography of Twitter. The stream also has optional user-added locations or addresses. Since approximately 90% of the stream was without coordinates, significant time was devoted to an attempt to transform the “user location” field (such as “New York City” or “Galveston, Texas”) into lat/lon pairs. Ultimately, however, the processing time associated with georeferencing tens of thousands of points proved prohibitive. Additionally, there were problems with getting accurate coordinates from a highly variable text field: if a user gives their location as “Madison,” for example, do they mean Madison, Wisconsin? Madison, Alabama? For more on why our project only used 10% of available Tweets, see this post.
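The “Madison” problem is easy to reproduce. The Python sketch below is not the georeferencing process we attempted; it uses a tiny, made-up gazetteer (the `GAZETTEER` table, its approximate coordinates, and the function names are all assumptions for illustration) to show why a free-text location only resolves when exactly one place matches.

```python
from typing import Dict, List, Optional, Tuple

Place = Tuple[str, float, float]  # (full name, latitude, longitude)

# Hypothetical mini-gazetteer; coordinates are approximate.
GAZETTEER: Dict[str, List[Place]] = {
    "madison": [
        ("Madison, Wisconsin", 43.07, -89.40),
        ("Madison, Alabama", 34.70, -86.75),
    ],
    "galveston": [
        ("Galveston, Texas", 29.30, -94.80),
    ],
}


def candidates(user_location: str) -> List[Place]:
    """Return every gazetteer entry the free-text location could refer to."""
    text = user_location.strip().lower()
    city = text.split(",")[0].strip()
    matches = GAZETTEER.get(city, [])
    if "," in text:
        # The user supplied a qualifier (e.g. a state), so require a full match.
        matches = [m for m in matches if m[0].lower() == text]
    return matches


def resolve(user_location: str) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) only when the text matches exactly one place."""
    matches = candidates(user_location)
    if len(matches) == 1:
        _, lat, lon = matches[0]
        return (lat, lon)
    return None  # unknown or ambiguous


# "Madison" alone matches two places, so it stays unresolved.
ambiguous = resolve("Madison")
resolved = resolve("Madison, Wisconsin")
```

A real user-location field is far messier than this (nicknames, misspellings, jokes), which is part of why the approach did not scale for us.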

ANIMATION #1: Mapnodes Twitter Animation

As a part of his PhD research, Jeremy White is authoring a new tool for the Cartography/GIS community. Mapnodes is a platform for connecting independent map-design tasks, such as line generalization, hill shading or, in this case, animation. For more info, check out Mapnodes.

ANIMATION #2: Processing/KML Project

Global maps show a lot of interesting trends, but some of the replies and Twitter activity can only be grasped at the city scale. We looked into multiple options for letting users browse the data at multiple scales. We found Google Earth/KML, Flash, and Blender to be choppy and just not pretty enough.

To show some of the interesting local stories, we used the same data and some 3D visualization techniques in the Processing language to create a tour. The final movie file is available for viewing, but we can also provide the .kml and .jar files that went into creating the final movie.

ANIMATION #3: Lava Map

From where do most tweets originate? The obvious answer would be large, wealthy cities such as Tokyo or New York. In order to get beyond the obvious, we decided to look at how many tweets were originating from an area, divided by that area’s population. Many people in New York tweet, to be sure, but is the number of tweets per person as high as it might be in some smaller cities? For more on this process, see this post.
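The normalization itself is simple arithmetic: tweets from an area divided by that area's population. Here is a minimal Python sketch, assuming tweet counts and populations are already tallied per area. The function names, the per-1,000-residents scaling, and the sample numbers are all invented for illustration and are not our project data.

```python
from typing import Dict, List, Tuple


def tweets_per_1000(
    tweet_counts: Dict[str, int], populations: Dict[str, int]
) -> Dict[str, float]:
    """Tweets per 1,000 residents for each area with a known, nonzero population."""
    rates = {}
    for area, count in tweet_counts.items():
        population = populations.get(area)
        if not population:
            continue  # skip areas with missing or zero population
        rates[area] = 1000.0 * count / population
    return rates


def rank_by_rate(rates: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort areas from highest to lowest tweets-per-1,000 rate."""
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Made-up numbers: the big city tweets more in total,
# but the small city tweets more per person.
counts = {"Big City": 8000, "Small City": 500}
people = {"Big City": 8_000_000, "Small City": 100_000}
ranking = rank_by_rate(tweets_per_1000(counts, people))
```

With these invented figures the small city comes out on top (5 tweets per 1,000 residents against 1), which is exactly the kind of reversal the raw counts hide.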
