How Erlang and the dutch railways power a real-time data visualization

Posted: November 8th, 2011 | Author: | Filed under: TIMEMAPS | Tags: , , , , , , , , | No Comments »

The backend of the TIMEMAPS project is based on the Zotonic web framework and Erlang. This article highlights the technical challenges and concessions that were considered while building the visualization.

The NS API and its limitations

The NS, the dutch railway system provider, provides an API which allows a developer to build upon it. While it is a nice effort and opens up a lot of possible applications, we found out that for the TIMEMAPS project it was not an ideal API to work with.

But, our requirements were pretty ambitous to begin with, and, from a practical point of view, not what an API designer would call “typical”. In TIMEMAPS, given any point T in time, we need to know, for every train station, how long it takes to travel at moment T to any other train station in the netherlands. Even for a small country like the Netherlands, this becomes a pretty big matrix of travel possibilities, given that there are 379 train stations in the country.

Ideally, for every element in this matrix an API call has to be done to get the actual planning.

Given that the NS API only allows an app to do up to 50.000 requests per day and we did not want to hammer the already stressed API servers too much, we needed to come up with a solution, while not sacrificing the real time aspect too much.

An open source travel planner..?

Another API call that the NS offers are the “Actuele vertrektijden”: given a station, return the 10 first trains that depart from it. It returns also the train numbers: a “unique” number which is assigned to a train on a single trajectory for the day (it might be re-used though in time). By linking the departure times from different stations through this train number, it should possible to see when a train that departs from A passes through B, if it is on the same trajectory.

However, some drawbacks popped up while implementing this approach.

  • For long trajectories (>1h) this approach did not work since the arrival station did not yet list the departure of the train you departed on since it was too far in the future
  • There was no API call for arrival times for trains on stations: this made it impossible to take the stopover-time into account and it was not possible to use this planning mechanism for destinations on the very end of the trajectory (e.g., no departure listed for the arriving train)
  • Doing a “naive” planning this way takes a considerable amount of database processing power as each stopover adds 2 self-joins to the database query, thus increasing exponentially in complexity.

Scraping of the departure times gives a increasingly complete graph of the railway system, and this graph, combined with the geographical location of stations might be used in a search algorithm to make an offline planner. For me however this aproach was too far of a longshot for the already pretty complex project so I decided to put this approach in the fridge for now.

However, this effort has brought me in contact with the OpenOV guys who are dedicated to liberate all public transportation data in the Netherlands. In the future, I hope I can contribute something to their wonderful initiative.

Doing consessions

Luckily, the TIMEMAPS project had one “business rule” with respect to its visualization: only stations that are near the border of the map are allowed to modify the map. That made the list of stations considerably smaller: after selection there were 60 stations left.

However this limited the practical application of the map in that some of the displayed travel times are not accurate: for the remaining, smaller / non-border stations we chose to interpolate the travel times between the “main” stations: an inaccuracy, given the fact that it often takes longer to travel from a minor station (e.g. Eindhoven Beukenlaan) to any other city. But for the sake for the clarity of the visualization, we agreed on this concession.

Data model & worker processes

There are two worker processees running in the background.

One process constantly (approximately 1 request per 1.5 second) queries the NS API for any A → B trip that has no planning in the future. This process favors distance: it tries first to find plannings for longest A → B trajectories, since the NS API also returns every timing information for intermediate stops, allowing to get more than one planning per API request. This planning information is stored in the database and kept for at least a week.

Table "public.static_planning"

 Column             |            Type             | Modifiers
--------------------+-----------------------------+------------------------
 id                 | integer                     | not null
 station_from       | character varying(32)       | not null
 station_to         | character varying(32)       | not null
 time               | timestamp without time zone | not null
 duration           | integer                     | not null
 ns                 | boolean                     | not null default true
 fetchtime          | timestamp without time zone |
 spoor              | character varying(32)       |
 aankomstvertraging | integer                     |
 vertrekvertraging  | integer                     |

Another process constantly queries the Actuele Vertrektijden API for every station (not only border stations). This information is used for the “fallback” scenario of step 3), in which no real planning is found for the station combination and we fall back on a fixed travel time, but do include the scraped departure time.

Table "public.vertrektijd"
     Column     |            Type             | Modifiers
----------------+-----------------------------+-----------
 station        | character varying(32)       | not null
 time           | timestamp without time zone | not null
 vertraging     | integer                     | not null
 ritnummer      | integer                     | not null
 eindbestemming | character varying(32)       |
 fetchtime      | timestamp without time zone |

Building the travel time matrix

The current map exposes an API to the N^2 matrix of the current time at the URL /api/reisplanner/actueel. It is a JSON long list where each entry looks like this:

["std",
 "amf",
 "2011-10-30 13:13:00",
 7440,
 "2b"]

This particular entry shows that the next train from Sittard (std) to Amersfoort (amf) leaves on 13:13h, from track 2B and takes 7440 seconds (2 hours and 4 minutes). For every station to another station (for the “border stations”) there is an entry in this list.

A second URL, /api/reisplanner/history?date=2011-10-29T22:00:00Z, gives this list for a certain date in the past.

Given the fact that we were unable to query every planning in real time, these results are build up in a three-step phase:

  1. Given each station A, B, check if there has been a planning retrieved for A → B for which the start time is in the future. Return the planning that is closest to the current time.
  2. Failing condition 1), check if there has been a planning retrieved for A → B last week. Return the planning that is closest to the current time minus 7 days. We assume that for every day of the week, the planning is the same. Note that this does not hold for holidays / festive days.
  3. Failing condition 1) and 2), return the planning tuple in which we assume a constant, pre-fetched travel time (a static matrix for times between A and B without time information). We assume that the first train leaving for A is the right train for getting to B.

A combiner algorithm retrieves for every station-to-station combination the results from step 1, otherwise those from step 2 and as final fallback step 3 (which always has a result, although it might not be accurate).

mod_reisplanner – the module making all this happen

Above processes have all been implemented in Erlang as a module for Zotonic. It will be open-sourced soon, so that it hopefully can serve as a basis and/or inspiration for other applications using Erlang and the NS API.

This article is the second in a series about the TIMEMAPS project.  TIMEMAPS’ concept and design are by Vincent Meertens, the implementation is by Arjan Scherpenisse.


Glanceable news using the new NOS API

Posted: January 4th, 2011 | Author: | Filed under: Manifestations | Tags: , , , | 2 Comments »

I had been aware of the fact that our Dutch public broadcaster NOS was going to open up their news using an API. During the run up to this I gave them a couple of pointers concerning what we think are great APIs and must-haves for this kind of thing.

Then they launched open.nos.nl to a limited amount of people at first to play around with and see the initial beginnings. It is quite modest right now with three end-points and not a lot of functionality or content to play with, but it’s a solid beginning and it is clearly meant to grow.

Ons Nieuws — Het laatste nieuws van NOS.nl

Busy though we are right now, I got a bit of an itch on Monday and whipped together this simple example: Ons Nieuws which is a glanceable news display of the latest news on NOS.nl meant to replace the tt101 screens that run at pretty much every news/publicity department.

Ons Nieuws is heavily influenced among others by: DEXTR. Glanceability and ambiance seems to be getting back in vogue also with apps such as Trickle. This thing we whipped up is really a very simple proof of concept version mainly meant to exercise the API, but it may lead to more serious applications.


Opinion piece on public transportation data in Dutch newspaper NRC.next

Posted: October 28th, 2010 | Author: | Filed under: Publications | Tags: , , , , , , , , , , | No Comments »

Yesterday Dutch daily newspaper NRC.next published an opinion piece I’d written for them concerning Dutch public transportation information. Alexander Klöpping had written a piece before and mine was to be a more specific follow-up on a certain kind of information: transit.

The situation is that Dutch transit information is controlled by a single private entity who are not sharing any of their data. The Dutch government plans to write a tender for a public data warehouse (to be called: NDOV) to be built with all transit information in it. Now this is both a threat for the private institution currently exploiting the data and an opportunity for the Netherlands to make a —much needed— jump forward in the realm of transit information. Which makes this an interesting turning point.

After making the altogether clear and well-known argument for open transit data, I propose three hard requirements such a data warehouse should fulfill. Here in English:

  1. All public transit data: journey plans, live locations, departure times and (un)planned mutations in the NDOV have to be readable both for humans on a website and for computers via an API.
  2. All this data has to be freely accessible to everybody without limitations.
  3. There always has to be a high quality journey planner but others have to be allowed to build their own or build on top of it.

Doesn’t sound too unreasonable, does it?

Full text in Dutch on Alper’s personal weblog for those interested.


Speaking at the Club of Amsterdam

Posted: September 25th, 2010 | Author: | Filed under: Talks | Tags: , , , , , , | 5 Comments »

For an upcoming presentation at the Club of Amsterdam on the Future of Hacking, Alper will be presenting on an intersection of APIs, public information and game design titled: “Civic duty in a hyper-connected world”

Brief:

The Internet of Things is an abstract term for something that is quickly becoming real. The world is being filled with sensors and actuators, all of which are linked into systems and being fed back to us with real and virtual displays. This is changing the fabric of society and the definition of what society is. It is our responsibility as citizens of this hyper-connected world to hack our environments to work better however we can.

Hope to see you there and have a lively discussion on this topic.