It’s been a bit quiet here, but to give an update: we are currently working on several data science projects for clients we cannot disclose. This week our principal also participated in the Moscow Urban Forum on the subject of open data.
Here is a recap of what has happened over the last year. While it has been awfully quiet around here, a lot has been going on behind the scenes and in other places.
We did most of our data work within Hack de Overheid / Open State, where we believe our efforts produced the largest possible impact. Over the past two years, my program direction, curation, advice and advocacy, along with the help of my extremely capable fellows, have enabled a series of events and a network to evolve in the Netherlands the likes of which the world has not yet seen.
A summary would already be too long to post here (just skim the weeknotes to get a glimpse), but to name some highlights: several app competitions, numerous hackathons (the next of which is in Rotterdam) at which hundreds of apps were developed, and a range of thematic networks around specific subjects. We have turned the Netherlands from an open data laggard into a country with one of the most vibrant open data ecosystems in Europe. Doing this together taught me one of the most important lessons in doing business: people working together can achieve more than they can alone.
What’s more, the foundation we built will be even stronger in 2013, which will reduce my personal involvement in day-to-day operations and let me focus on my own data business again.
On the services side, we have diversified from a pure visualization offering into a set of services that emphasize product development in a big data context, and we have taken on teaching to develop data capabilities within organizations.
We focus on Analysis, Creation and Teaching because we have found that a pure focus on visualization is not enough to solve significant problems for most organizations. The issues they deal with are far more complex, and capable visualizations are only one small part of the solution.
Added to that, we found that data literacy at all levels of society, from decision makers to journalists and designers to the general public, is severely lacking. This creates a challenging environment for those working in the field of data. Explaining their work to the general public is not only in a practitioner’s self-interest, it is also a moral responsibility.
We have also moved our base of operations to Berlin, the startup capital of Europe, while still maintaining offices in the Netherlands.
The presentation ‘Designing in the Face of Defeat’, written up below, was recorded by the kind people at the WdKA and can be viewed below:
Or you can view it on Blip.tv.
We’ve been pleased with the feedback on this blend of algorithms, new aesthetic and object-oriented ontology. More to follow.
I initially called this talk for the Willem de Kooning Academy’s CrossLab night ‘New Design for a New Aesthetic’, but I reconsidered that title. Not because of the person who took semantic issue with the idea of a ‘new aesthetic’; I couldn’t care less about that. The idea that there can be a new design that addresses the issues within the New Aesthetic is simply too ambitious. We cannot possibly succeed, which is why I’m calling the discipline we’re engaged in designing in the face of defeat (I blogged about this before), and it is what we will be doing for the foreseeable future.
I pre-rolled a screen capture of Aaron Straup Cope’s Wanderdrone to add some ominous foreboding to the design/advertising enthusiasm permeating the room. CrossLab billed the evening as a night about Dynamic Design, which I didn’t really get, so I revisited an old talk about algorithmic design, heavily updated to incorporate current thinking about algorithms, the New Aesthetic and object-oriented ontology.
Since we at Monster Swell do a lot of work with maps, we are affected by the fact that mapping is being turned on its head. And it’s not because there aren’t enough interesting maps; there are more now than ever before. Here are just a couple:
Eric Fischer’s Twitter Traffic Maps of various cities, here New York:
Google Hotel Finder with an isochrone projection in London:
Timemaps, a map of the Netherlands distorted by the amount of travel time required during various times of day:
But right now a projection inversion is going on: much of the time we no longer project the real world onto flat surfaces and call that a map, but instead overlay maps themselves back onto reality. And what we call maps no longer needs any relation to physical reality; we can map anything onto anything using any (non-)geometric form we choose.
This is mainly a consequence of us putting the internet into maps. But maps are not the only thing we put the internet into; by now we put it into pretty much everything.
So maps are creeping back into the real world, and we get odd clashes when we try to overlay a map back onto the territory or try to perfectly capture a capricious world, as you can see in these Google Maps and Street View examples: 1, 2, 3, 4. I don’t know how long they will stay online at the New Aesthetic Tumblr, since James Bridle has closed it.
We have QR codes to enable a machine-readable world. They hardly have any real-world use (just visit the WTF QR Codes Tumblr), but they function as cultural icons, precursors of a strange and inscrutable future.
Even more interestingly, they are being used, for instance, by the Chinese to calibrate spy satellites. So these are maps on the earth being used to create better maps of the earth.
The New Aesthetic is what happens when this kind of projection inversion occurs more widely, not just in the realm of maps but everywhere the internet touches. By now that is nearly everything. The examples collected at the New Aesthetic Tumblr showed how the arts were picking up on this trend.
All of these things have been created by algorithms, which are not as mysterious as many people make them out to be. Algorithms are how computers work and, increasingly, how the world works. They codify behaviour and, to quote Robert Fabricant, as designers ‘behaviour is our medium’. Being a designer should entail more than a passing knowledge of and proficiency with algorithms. We are moving into a world where creative work is becoming procedural. The most important media are prescriptive and set rules for the world more than they are descriptive and depict the world.
The real problem with algorithms is that they often involve us but they are completely alien to us (in the Bogostian sense). They are operationally closed. Operational Closure means that things may work in ways that are not at all obvious to us, neither at first nor after we poke into them because any kind of sense we make of it is either partial or does not translate into our frame of reference. Algorithms get inputs and perform outputs but the way they operate on these has nothing to do with how we as humans think about the world. Think is not even the right word, but we try to relate to them from our human cognition. The machines see us, but they do not ‘see’ us in any way we would recognize as seeing and we have no idea what it is that they see.
The Machine Pareidolia experiment over at Urban Honking is a good example.
The ability to see faces in things is a basic aspect of our visual pattern recognition. When we teach that same skill to computers, we get unexpected consequences. The same goes for stock market flash crashes, which happen in the blink of an eye without anybody really knowing what caused them. The rationales of the algorithms are opaque to us and their emergent behaviour is unpredictable.
As Kevin Slavin mentioned in an interview: the more autonomous the algorithms are and the more effects they have on our daily lives, the more we may be accommodating them without realizing it.
There was this story recently that scientists have created a robot fish that is so good at mimicking the behaviour of regular fish that it can become their leader. This is what worries me. Who says we are not all following robot fish most of the time?
So that is what I think is the biggest challenge for designers right now: to create systems that harness the open and generative power of the internet while remaining human and aligned with human interests. One way would be to make the internals of algorithms transparent, so people can enter into an informed relationship with them.
Unfortunately there are no magic bullets for this, whatever your local design visionary has been telling you. There never have been. Everything is made up of withdrawn objects that are mediated towards one another with unexpected consequences. To quote Graham Harman in Prince of Networks:
“the engineer must negotiate with the mountain at every stage of the project, testing to see where the rock resists and where it yields, and is quite often surprised by the behaviour of the rock.”
There are no ideas that will solve all problems, there are no products that will do everything. There is only the work through which we may gain more understanding and make better things. So with that, I hope we all can do good work.
An interesting bit of news came to light at Privacy International a while back: “What does Twitter know about its users?”
Residents of the EU can request from Twitter all of the data it has stored about them, in accordance with European data protection laws (just follow the steps). Some Twitter users have requested their data and filled in the necessary paperwork. After a while they received all of their records, including a file containing all of their tweets.
I had seen Martin Weber’s post about this before but when I saw Anne Helmond post about her experiences as well, I was prompted to carry out the idea I’d had before: to import an entire Twitter archive into Thinkup to complement the partial archive it contains of my longtime Twitter use (since September 2006).
I use Thinkup enthusiastically to supplement existing archival, statistics and API functionality around the web and, more importantly, to have it under my own control. These services serve as my social memory, and it is nice to have a copy that can’t disappear because of some M&A mishap. More than once it has proven useful to be able to search through all of my tweets or all of my @replies. But as noted, because of Twitter API limits, Thinkup can only go back 3200 tweets from when you first install it. For people like me (35k tweets) or Anne (50k tweets), that’s just not enough.
I installed a new Thinkup on a test domain and asked for (sample) files from Anne and Martin and went at it. Command-line being the easiest, I took the upgrade.php script, ripped out most of its innards and spent an afternoon scouring the Thinkup source code to see how it does a Twitter crawl itself and mirrored the functionality. PHP is not my language of choice (by a long shot), but I have dabbled in it occasionally and with a bit of a refresher it is pretty easy to get going.
I finally managed to insert everything into the right table using the Thinkup DAO, but it still wasn’t showing anything. Gina Trapani, Thinkup’s creator, told me which tables I had to populate for the website to show something, and after that it worked! A fully searchable archive of all your tweets in Thinkup.
The code is currently a gist on GitHub and is not usable (!) without programming knowledge. It is hackish and needs to be cleaned up (among many other things, it should scan the available instances and only import tweets that match an instance in your install), but it works. Ideally this would eventually become a Thinkup plugin, but that is still a way off.
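The heart of such an import is deduplication: tweets the regular crawl already fetched must not be inserted twice, and only tweets belonging to a tracked instance should go in. A minimal sketch of that logic, assuming SQLite in place of Thinkup's own MySQL database and a hypothetical `tweets` table with hypothetical column names; this is neither Thinkup's schema nor the code in the gist.

```python
import sqlite3


def import_archive(conn, instance_user, tweets):
    """Import archived tweets for one instance, skipping tweet IDs that are already stored.

    Hypothetical schema tweets(id, author, text, created_at); not Thinkup's real tables.
    Returns the number of newly inserted rows.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "id INTEGER PRIMARY KEY, author TEXT NOT NULL, "
        "text TEXT NOT NULL, created_at TEXT NOT NULL)"
    )
    # Only import tweets written by the user this instance tracks.
    rows = [
        (t["id"], t["author"], t["text"], t["created_at"])
        for t in tweets
        if t["author"] == instance_user
    ]
    before = conn.total_changes
    # INSERT OR IGNORE skips rows whose primary key (the tweet ID) already exists.
    conn.executemany(
        "INSERT OR IGNORE INTO tweets (id, author, text, created_at) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")
crawled = [{"id": 3, "author": "alice", "text": "latest", "created_at": "2012-01-03"}]
archive = [
    {"id": 1, "author": "alice", "text": "first", "created_at": "2006-09-01"},
    {"id": 3, "author": "alice", "text": "latest", "created_at": "2012-01-03"},
    {"id": 4, "author": "bob", "text": "not mine", "created_at": "2012-01-04"},
]
first = import_archive(conn, "alice", crawled)   # 1: what the regular crawl already had
second = import_archive(conn, "alice", archive)  # 1: id 3 is a duplicate, bob is filtered out
```

Leaning on the primary key keeps the import idempotent: running it twice over the same archive inserts nothing the second time.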
What’s the point of all this? There are a couple:
First it shows that data protection laws such as the ones we have in Europe do have an effect (see also for instance: Europe v. Facebook). Even on the internet laws have teeth and practical applications. Data protection laws can be useful if they are drafted on general principles and applied judiciously.
But the result you get, a massive text file in your inbox, is not the most usable way to explore half a decade’s worth of social media history. That’s where Thinkup comes in. Its brilliant functionality makes this data live again and magnifies the effect of each person’s data request.
Secondly, for any active Thinkup user, supplementing their archive with a full history is a definite WANT feature. Twitter has been very lax in providing access to more than the last 3200 tweets. If a lot of users used this analog API to demand their tweets, Twitter might be forced to create a general solution sooner.
Lastly, Thinkup has applied for funds with the Knight Foundation to turn itself into a federated social network piggy-backed on top of the existing ones. Thinkup would draw in all of the data that is already out there into its private store and then build functionality on top of that (sort of an inverse Privatesquare). Having access to all of your data would be a first step for any plan that involves data ownership and federation.
I presented this hack yesterday at the Berlin Hack and Tell. Your ideas and comments and help are very welcome.
The year has started nicely and we already have a good line-up of events. Thursday a week ago, the winners of the Apps voor Nederland competition were announced at the iBestuur Congress in the Netherlands. I’m pleased that we managed to shape the data and developer programme of this national event, and with how it turned out. See a write-up of the winners on the Hack de Overheid site. Future plans along the same track are already in the works.
There are two upcoming events at which I will be speaking that bear mentioning here.
There will be an evening in Pakhuis de Zwijger to celebrate Nederland van Boven, the television series the VPRO produced in the Netherlands (borrowing conceptually from Britain from Above, among others). I will be joining the esteemed panel there as a board member of Hack de Overheid to talk about issues of democracy, participation and truth in cartography.
The week after that is “Social Cities of Tomorrow”. In a brief timeslot, I will be speaking about Apps for Amsterdam: how you can create a data commons for your government or organization, and where to take it from there.
Last week Sargasso procured a dataset of interruptions by politicians in our House of Representatives. Using the counts of which politician had interrupted which in debates, they made some nice infographics and a couple of blog posts. I thought this was the ideal opportunity to put all of the data (aggregated by party) into the D3 example chord diagram.
This was featured on Sargasso the next day.
The graphic is not immediately clear, but the data is deep and interesting enough to afford some exploration, and it yields insight into the behaviour of various political parties during this cabinet’s term. And what seems to matter a lot to people: it looks quite pretty.
As for D3, I think I will use it more often. It works quite similarly to Protovis, which we have used before, but it feels much more current. According to a notice on its site, Protovis itself has been discontinued in favor of D3, and D3 seems a very worthy successor.
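D3's chord layout takes a square matrix in which row i, column j holds the flow from group i to group j. A sketch of the party aggregation mentioned above, turning per-politician interruption counts into such a party-by-party matrix; the input format, politicians, parties and numbers are all hypothetical, not Sargasso's dataset.

```python
def party_matrix(interruptions, party_of):
    """Aggregate {(interrupter, interrupted): count} into a square party-by-party matrix.

    Row i, column j counts interruptions by members of party i of members of party j;
    interruptions within one party end up on the diagonal.
    """
    parties = sorted(set(party_of.values()))
    index = {party: i for i, party in enumerate(parties)}
    matrix = [[0] * len(parties) for _ in parties]
    for (interrupter, interrupted), count in interruptions.items():
        matrix[index[party_of[interrupter]]][index[party_of[interrupted]]] += count
    return parties, matrix


# Hypothetical politicians and counts, purely for illustration.
party_of = {"p1": "Party A", "p2": "Party A", "p3": "Party B"}
interruptions = {("p1", "p3"): 3, ("p2", "p3"): 2, ("p3", "p1"): 4, ("p1", "p2"): 1}
parties, matrix = party_matrix(interruptions, party_of)
print(parties, matrix)   # ['Party A', 'Party B'] [[1, 5], [4, 0]]
```

The matrix can then be serialized to JSON and handed to the chord layout, with the party list supplying the labels in the same order.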
The backend of the TIMEMAPS project is based on the Zotonic web framework and Erlang. This article highlights the technical challenges and concessions that were considered while building the visualization.
The NS API and its limitations
The NS, the Dutch railway operator, provides an API that developers can build upon. While it is a nice effort and opens up a lot of possible applications, we found that for the TIMEMAPS project it was not an ideal API to work with.
But our requirements were pretty ambitious to begin with and, from a practical point of view, not what an API designer would call “typical”. In TIMEMAPS, given any point T in time, we need to know for every train station how long it takes, departing at moment T, to travel to every other train station in the Netherlands. Even for a small country like the Netherlands this becomes a pretty big matrix of travel possibilities, given that there are 379 train stations in the country.
Ideally, an API call would be made for every element of this matrix to get the actual planning.
Given that the NS API only allows an app up to 50,000 requests per day, and we did not want to hammer the already stressed API servers too much, we needed to come up with a solution that did not sacrifice the real-time aspect too much.
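A quick back-of-the-envelope calculation shows why. Counting ordered pairs (A → B and B → A are different trips) and one request per pair, a single snapshot of the full matrix needs N(N - 1) requests. The sketch below ignores the fact that every pair also needs re-planning as the day goes on, so the real cost is higher still:

```python
def snapshot_cost(stations, daily_limit):
    """Requests for one full snapshot of the ordered A -> B matrix, and days it takes at the limit."""
    pairs = stations * (stations - 1)   # every ordered pair, excluding A -> A
    return pairs, pairs / daily_limit


pairs, days = snapshot_cost(379, 50_000)
print(pairs, round(days, 2))   # 143262 2.87
```

Even one snapshot would consume almost three days of the entire request quota.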
An open source travel planner..?
Another API call the NS offers is “Actuele vertrektijden” (current departure times): given a station, it returns the first 10 trains departing from it. It also returns their train numbers: a “unique” number assigned to a train on a single trajectory for the day (it might be re-used over time, though). By linking the departure times from different stations through this train number, it should be possible to see when a train departing from A passes through B, if B is on the same trajectory.
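A sketch of that linking step, assuming the departure boards have been scraped into (station, departure time, train number) tuples from a single day; station codes, times and train numbers are made up. It uses the departure time at B as a stand-in for the arrival time, which is exactly the gap the second drawback below describes.

```python
from datetime import datetime


def link_by_train_number(departures, a, b):
    """Earliest train seen departing A and later departing B under the same train number.

    departures: iterable of (station, departure_time, train_number) from one day's boards.
    Returns (departure_at_a, departure_at_b, train_number) or None.
    """
    from_a = {}
    for station, time, train in departures:
        if station == a:
            from_a[train] = time
    best = None
    for station, time, train in departures:
        # Same train, seen at B after it left A: B lies further along that trajectory.
        if station == b and train in from_a and time > from_a[train]:
            if best is None or from_a[train] < best[0]:
                best = (from_a[train], time, train)
    return best


boards = [
    ("ut", datetime(2011, 10, 30, 10, 2), 3045),
    ("asd", datetime(2011, 10, 30, 10, 29), 3045),
    ("ut", datetime(2011, 10, 30, 10, 10), 800),
    ("asd", datetime(2011, 10, 30, 9, 50), 800),   # runs the other way: ignored for ut -> asd
]
dep_a, dep_b, train = link_by_train_number(boards, "ut", "asd")
print(train, dep_b - dep_a)   # 3045 0:27:00
```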
However, some drawbacks popped up while implementing this approach.
- For long trajectories (>1h) this approach did not work, since the arrival station did not yet list the departure of the train you left on: it was too far in the future.
- There was no API call for train arrival times at stations: this made it impossible to take the stopover time into account, and this planning mechanism could not be used for destinations at the very end of a trajectory (no departure is listed there for the arriving train).
- Doing a “naive” planning this way takes a considerable amount of database processing power, as each stopover adds 2 self-joins to the database query, so the complexity grows exponentially with the number of stopovers.
Scraping the departure times yields an increasingly complete graph of the railway system, and this graph, combined with the geographical locations of the stations, might be used in a search algorithm to build an offline planner. For me, however, this approach was too much of a long shot for an already pretty complex project, so I decided to put it in the fridge for now.
However, this effort has brought me in contact with the OpenOV guys, who are dedicated to liberating all public transportation data in the Netherlands. In the future, I hope I can contribute something to their wonderful initiative.
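For the record, one well-known way to search such a scraped timetable is the Connection Scan Algorithm: sort all train connections by departure time and sweep through them once, keeping the earliest known arrival per station. This is a swapped-in technique, not something TIMEMAPS implemented, and unlike the idea above it does not need station coordinates. A minimal earliest-arrival sketch, assuming connections as (from, to, departure, arrival) tuples with times in minutes since midnight, hypothetical station names and no minimum transfer time:

```python
import math


def earliest_arrival(connections, source, target, start):
    """Connection Scan: earliest arrival at target when standing at source from `start`.

    connections: (from_station, to_station, departure, arrival) tuples, times in minutes.
    Returns the arrival time, or None if target is unreachable.
    """
    arrival = {source: start}
    for frm, to, dep, arr in sorted(connections, key=lambda c: c[2]):
        # Usable only if we are already at its origin when it departs, and it improves on `to`.
        if arrival.get(frm, math.inf) <= dep and arr < arrival.get(to, math.inf):
            arrival[to] = arr
    return arrival.get(target)


connections = [
    ("a", "b", 600, 630),
    ("b", "c", 625, 650),   # leaves b before we get there
    ("a", "c", 605, 720),   # direct but slow
    ("b", "c", 640, 700),
]
print(earliest_arrival(connections, "a", "c", 590))   # 700
```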
Luckily, the TIMEMAPS project had one “business rule” with respect to its visualization: only stations that are near the border of the map are allowed to modify the map. That made the list of stations considerably smaller: after selection there were 60 stations left.
However, this limited the practical application of the map, in that some of the displayed travel times are not accurate: for the remaining, smaller non-border stations we chose to interpolate the travel times between the “main” stations. This is an inaccuracy, given that it often takes longer to travel from a minor station (e.g. Eindhoven Beukenlaan) to any other city. But for the sake of the clarity of the visualization, we agreed on this concession.
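How that interpolation was done is not spelled out here. As one hypothetical way to do it, the sketch below uses inverse-distance weighting: a minor station's travel time to a destination is the weighted average of the times from its k nearest main stations, with weights 1/distance. Coordinates, station names and times are invented, and like any interpolation between main stations it cannot capture the extra time needed to get from a minor station onto a main line, which is the inaccuracy described above.

```python
import math


def interpolate_travel_time(minor_xy, main_xy, main_times, k=2):
    """Inverse-distance-weighted travel time for a minor (non-border) station.

    minor_xy: (x, y) of the minor station; main_xy: {station: (x, y)};
    main_times: {station: seconds from that main station to one destination}.
    """
    nearest = sorted(main_xy, key=lambda s: math.dist(minor_xy, main_xy[s]))[:k]
    weighted = []
    for station in nearest:
        distance = math.dist(minor_xy, main_xy[station])
        if distance == 0:
            return main_times[station]  # the minor station coincides with a main one
        weighted.append((1 / distance, main_times[station]))
    total = sum(weight for weight, _ in weighted)
    return sum(weight * seconds for weight, seconds in weighted) / total


# Hypothetical coordinates in km and travel times in seconds to one destination.
main_xy = {"main_a": (0.0, 0.0), "main_b": (4.0, 0.0), "main_c": (50.0, 50.0)}
main_times = {"main_a": 3600, "main_b": 4800, "main_c": 9000}
print(round(interpolate_travel_time((1.0, 0.0), main_xy, main_times)))   # 3900
```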
Data model & worker processes
There are two worker processes running in the background.
One process constantly queries the NS API (at approximately 1 request per 1.5 seconds) for any A → B trip that has no planning in the future. This process favors distance: it first tries to find plannings for the longest A → B trajectories, since the NS API also returns timing information for every intermediate stop, which yields more than one planning per API request. This planning information is stored in the database and kept for at least a week.
Table "public.static_planning"
       Column       |            Type             |       Modifiers
--------------------+-----------------------------+------------------------
 id                 | integer                     | not null
 station_from       | character varying(32)       | not null
 station_to         | character varying(32)       | not null
 time               | timestamp without time zone | not null
 duration           | integer                     | not null
 ns                 | boolean                     | not null default true
 fetchtime          | timestamp without time zone |
 spoor              | character varying(32)       |
 aankomstvertraging | integer                     |
 vertrekvertraging  | integer                     |
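Why favoring long trajectories pays off: a single planned trip through n stations contains n(n - 1)/2 sub-trips (every earlier stop to every later one), each of which is a planning in its own right. A sketch of that expansion plus the "longest first" choice made by the first worker process, assuming a trip arrives as an ordered list of (station, time) stops with one time per stop (dwell times ignored); the intermediate time at Utrecht and the distances are invented for illustration.

```python
from datetime import datetime
from itertools import combinations


def plannings_from_trip(stops):
    """Expand one trip [(station, time), ...] into (from, to, departure, duration_seconds) rows."""
    return [
        (a, b, t_a, int((t_b - t_a).total_seconds()))
        for (a, t_a), (b, t_b) in combinations(stops, 2)  # keeps stop order: a before b
    ]


def next_pair_to_plan(missing_pairs, distance_km):
    """Pick the A -> B pair without a future planning that spans the longest distance."""
    return max(missing_pairs, key=lambda pair: distance_km[pair], default=None)


trip = [
    ("std", datetime(2011, 10, 30, 13, 13)),
    ("ut", datetime(2011, 10, 30, 14, 55)),   # hypothetical intermediate stop
    ("amf", datetime(2011, 10, 30, 15, 17)),
]
rows = plannings_from_trip(trip)
print(len(rows))   # 3
print(next_pair_to_plan([("ut", "amf"), ("std", "amf")], {("ut", "amf"): 20, ("std", "amf"): 170}))
# ('std', 'amf')
```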
Another process constantly queries the Actuele Vertrektijden API for every station (not only border stations). This information is used for the “fallback” scenario of step 3) below, in which no real planning is found for the station combination, so we fall back on a fixed travel time but do include the scraped departure time.
Table "public.vertrektijd"
     Column     |            Type             | Modifiers
----------------+-----------------------------+-----------
 station        | character varying(32)       | not null
 time           | timestamp without time zone | not null
 vertraging     | integer                     | not null
 ritnummer      | integer                     | not null
 eindbestemming | character varying(32)       |
 fetchtime      | timestamp without time zone |
Building the travel time matrix
The current map exposes the N^2 matrix for the current time through an API at the URL /api/reisplanner/actueel. It is a long JSON list in which each entry looks like this:
["std", "amf", "2011-10-30 13:13:00", 7440, "2b"]
This particular entry shows that the next train from Sittard (std) to Amersfoort (amf) leaves at 13:13 from track 2B and takes 7440 seconds (2 hours and 4 minutes). The list contains an entry for every station-to-station combination (among the “border stations”).
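A minimal sketch of reading one such entry, assuming the field order shown above (from, to, departure time, duration in seconds, track); the dictionary keys are names chosen for this example, not part of the API.

```python
import json
from datetime import datetime, timedelta


def parse_entry(entry):
    """Turn one [from, to, departure, seconds, track] entry into a dict with an arrival time."""
    origin, destination, departs, seconds, track = entry
    departure = datetime.strptime(departs, "%Y-%m-%d %H:%M:%S")
    return {
        "from": origin,
        "to": destination,
        "departure": departure,
        "arrival": departure + timedelta(seconds=seconds),
        "track": track,
    }


def format_duration(seconds):
    """Format a duration in seconds as e.g. '2h04m'."""
    hours, rest = divmod(seconds, 3600)
    return f"{hours}h{rest // 60:02d}m"


entry = parse_entry(json.loads('["std", "amf", "2011-10-30 13:13:00", 7440, "2b"]'))
print(entry["arrival"].strftime("%H:%M"), format_duration(7440))   # 15:17 2h04m
```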
A second URL, /api/reisplanner/history?date=2011-10-29T22:00:00Z, gives this list for a certain date in the past.
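When building that request programmatically, it seems sensible to convert the moment to UTC first and let a URL encoder escape the colons. A sketch that produces only the path and query (the host is deliberately left out), assuming the server decodes percent-escapes as a standard query parser would; the fixed UTC+2 offset in the example stands in for Dutch summer time.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def history_path(moment):
    """Path and query for /api/reisplanner/history at a timezone-aware moment."""
    utc = moment.astimezone(timezone.utc)
    return "/api/reisplanner/history?" + urlencode({"date": utc.strftime("%Y-%m-%dT%H:%M:%SZ")})


midnight_local = datetime(2011, 10, 30, 0, 0, tzinfo=timezone(timedelta(hours=2)))
print(history_path(midnight_local))   # /api/reisplanner/history?date=2011-10-29T22%3A00%3A00Z
```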
Because we were unable to query every planning in real time, these results are built up in three steps:
- For each pair of stations A, B, check whether a planning has been retrieved for A → B whose start time is in the future. Return the planning closest to the current time.
- Failing condition 1), check whether a planning was retrieved for A → B last week. Return the planning closest to the current time minus 7 days. We assume that the planning for any given day is the same as on the same weekday a week earlier. Note that this does not hold for holidays / festive days.
- Failing conditions 1) and 2), return a planning tuple in which we assume a constant, pre-fetched travel time (a static matrix of times between A and B without time information). We assume that the first train leaving from A is the right train for getting to B.
A combiner algorithm retrieves for every station-to-station combination the results from step 1, otherwise those from step 2 and as final fallback step 3 (which always has a result, although it might not be accurate).
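The three-step fallback can be sketched as a single function per station pair. This is an illustration in Python rather than the Erlang module itself; it assumes fetched plannings are kept as (departure, seconds) tuples per pair, that "closest to a moment" means the first departure at or after it, and that a static travel-time matrix and the scraped first departure per station are at hand. All names are hypothetical.

```python
from datetime import datetime, timedelta

WEEK = timedelta(days=7)


def first_at_or_after(plannings, moment):
    """The (departure, seconds) planning departing soonest at or after `moment`, else None."""
    upcoming = [p for p in plannings if p[0] >= moment]
    return min(upcoming, key=lambda p: p[0], default=None)


def combine(pair, now, fetched, static_seconds, first_departure):
    """Return (departure, seconds, step) for station pair (A, B) using the three-step fallback."""
    plannings = fetched.get(pair, [])
    # Step 1: a real planning that starts in the future.
    live = first_at_or_after(plannings, now)
    if live is not None:
        return live[0], live[1], 1
    # Step 2: last week's planning closest to now - 7 days, shifted forward by a week.
    # Step 1 failed, so anything found here departed between now - 7 days and now.
    old = first_at_or_after(plannings, now - WEEK)
    if old is not None:
        return old[0] + WEEK, old[1], 2
    # Step 3: static travel time plus the first scraped departure from A
    # (falls back to `now` if nothing was scraped, so it always answers).
    origin = pair[0]
    return first_departure.get(origin, now), static_seconds[pair], 3


now = datetime(2011, 10, 30, 13, 0)
pair = ("std", "amf")
static = {pair: 7200}
first_dep = {"std": datetime(2011, 10, 30, 13, 5)}
# Only last Sunday's 13:13 planning is known, so step 2 moves it to today.
print(combine(pair, now, {pair: [(datetime(2011, 10, 23, 13, 13), 7440)]}, static, first_dep))
```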
mod_reisplanner – the module making all this happen
The above processes have all been implemented in Erlang as a module for Zotonic. It will be open-sourced soon, so that it can hopefully serve as a basis and/or inspiration for other applications using Erlang and the NS API.
It is my pleasure to introduce here on Monster Swell a new collaboration and a spectacular piece of work. Arjan Scherpenisse of Miracle Things will be collaborating with us in the field of data visualization.
Arjan is that rare breed of artist né programmer, formally trained in both but picking neither side. He is active on the most innovative edge of software, as well as building physical interaction projects and schooling others in programming, be it in Erlang or some other language.
The TIMEMAPS project written up just before this post is the first of what we hope will be many forays into data visualization for Arjan, and we look forward to collaborating on many such projects in the future.