A full Twitter index in your Thinkup

Posted: April 25th, 2012 | Author: | Filed under: Research, Talks | Tags: , , , , , | 7 Comments »

An interesting bit of news came to light at Privacy International a while back: “What does Twitter know about its users?”

It is possible for residents of the EU to request from Twitter all of the data it has stored about them in accordance with European data protection laws (just follow the steps). Some Twitter users have requested their data and filled in the necessary paperwork. After a while they have gotten all of their records including a file with all of their tweets in it.

I had seen Martin Weber’s post about this before but when I saw Anne Helmond post about her experiences as well, I was prompted to carry out the idea I’d had before: to import an entire Twitter archive into Thinkup to complement the partial archive it contains of my longtime Twitter use (since September 2006).


I use Thinkup myself enthusiastically to supplement existing archival, statistics and API functionality around the web and more importantly to have it under my own control. These services serve as my social memory and it is nice to have a copy of them that can’t disappear because of some M&A mishap. It has proven useful more than once to be able to search through either all of my tweets or all of my @replies. But as noted, Thinkup can only go back 3200 tweets from when first you install it because of Twitter API limits. For people like me (35k tweets) or Anne (50k tweets), that’s just not enough.

I installed a new Thinkup on a test domain and asked for (sample) files from Anne and Martin and went at it. Command-line being the easiest, I took the upgrade.php script, ripped out most of its innards and spent an afternoon scouring the Thinkup source code to see how it does a Twitter crawl itself and mirrored the functionality. PHP is not my language of choice (by a long shot), but I have dabbled in it occasionally and with a bit of a refresher it is pretty easy to get going.

I finally managed to insert everything into the right table using the Thinkup DAO but it still wasn’t showing anything. Gina Trapani —Thinkup’s creator— told me which tables I had to supplement for the website to show something and after that it worked! A fully searchable archive of all your tweets in Thinkup.

web_martin on Twitter | ThinkUp

The code is a gist on Github right now and not usable (!) without programming knowledge. It is hackish and needs to be cleaned up, but it works ((It should scan available instances and only import tweets if they match an instance in your install among many many other things.)). Ideally this would eventually become a plugin for Thinkup but that is still a bit off.

What’s the point of all this? There are a couple:

First it shows that data protection laws such as the ones we have in Europe do have an effect (see also for instance: Europe v. Facebook). Even on the internet laws have teeth and practical applications. Data protection laws can be useful if they are drafted on general principles and applied judiciously.

But the result you get: a massive text file in your inbox is not the most usable way to use or explore half a decade’s worth of social media history. That’s where Thinkup comes in. It’s brilliant functionality serves as a way to make this data live again and magnifies for each person the effect of their data request.

Secondly, for any active user of Thinkup, supplementing their archive with a full history is a definitive WANT feature. Twitter has been very lax in providing access to more than the last 3200 tweets. If a lot of users used their analog API to demand their tweets, Twitter may be forced to create a general solution sooner.

Lastly, Thinkup has applied for funds with the Knight Foundation to turn itself into a federated social network piggy-backed on top of the existing ones. Thinkup would draw in all of the data that is already out there into its private store and then build functionality on top of that (sort of an inverse Privatesquare). Having access to all of your data would be a first step for any plan that involves data ownership and federation.

I presented this hack yesterday at the Berlin Hack and Tell. Your ideas and comments and help are very welcome.