How Erlang and the dutch railways power a real-time data visualization

Posted: November 8th, 2011 | Author: | Filed under: TIMEMAPS | Tags: , , , , , , , , | No Comments »

The backend of the TIMEMAPS project is based on the Zotonic web framework and Erlang. This article highlights the technical challenges and concessions that were considered while building the visualization.

The NS API and its limitations

The NS, the dutch railway system provider, provides an API which allows a developer to build upon it. While it is a nice effort and opens up a lot of possible applications, we found out that for the TIMEMAPS project it was not an ideal API to work with.

But, our requirements were pretty ambitous to begin with, and, from a practical point of view, not what an API designer would call “typical”. In TIMEMAPS, given any point T in time, we need to know, for every train station, how long it takes to travel at moment T to any other train station in the netherlands. Even for a small country like the Netherlands, this becomes a pretty big matrix of travel possibilities, given that there are 379 train stations in the country.

Ideally, for every element in this matrix an API call has to be done to get the actual planning.

Given that the NS API only allows an app to do up to 50.000 requests per day and we did not want to hammer the already stressed API servers too much, we needed to come up with a solution, while not sacrificing the real time aspect too much.

An open source travel planner..?

Another API call that the NS offers are the “Actuele vertrektijden”: given a station, return the 10 first trains that depart from it. It returns also the train numbers: a “unique” number which is assigned to a train on a single trajectory for the day (it might be re-used though in time). By linking the departure times from different stations through this train number, it should possible to see when a train that departs from A passes through B, if it is on the same trajectory.

However, some drawbacks popped up while implementing this approach.

  • For long trajectories (>1h) this approach did not work since the arrival station did not yet list the departure of the train you departed on since it was too far in the future
  • There was no API call for arrival times for trains on stations: this made it impossible to take the stopover-time into account and it was not possible to use this planning mechanism for destinations on the very end of the trajectory (e.g., no departure listed for the arriving train)
  • Doing a “naive” planning this way takes a considerable amount of database processing power as each stopover adds 2 self-joins to the database query, thus increasing exponentially in complexity.

Scraping of the departure times gives a increasingly complete graph of the railway system, and this graph, combined with the geographical location of stations might be used in a search algorithm to make an offline planner. For me however this aproach was too far of a longshot for the already pretty complex project so I decided to put this approach in the fridge for now.

However, this effort has brought me in contact with the OpenOV guys who are dedicated to liberate all public transportation data in the Netherlands. In the future, I hope I can contribute something to their wonderful initiative.

Doing consessions

Luckily, the TIMEMAPS project had one “business rule” with respect to its visualization: only stations that are near the border of the map are allowed to modify the map. That made the list of stations considerably smaller: after selection there were 60 stations left.

However this limited the practical application of the map in that some of the displayed travel times are not accurate: for the remaining, smaller / non-border stations we chose to interpolate the travel times between the “main” stations: an inaccuracy, given the fact that it often takes longer to travel from a minor station (e.g. Eindhoven Beukenlaan) to any other city. But for the sake for the clarity of the visualization, we agreed on this concession.

Data model & worker processes

There are two worker processees running in the background.

One process constantly (approximately 1 request per 1.5 second) queries the NS API for any A → B trip that has no planning in the future. This process favors distance: it tries first to find plannings for longest A → B trajectories, since the NS API also returns every timing information for intermediate stops, allowing to get more than one planning per API request. This planning information is stored in the database and kept for at least a week.

Table "public.static_planning"

 Column             |            Type             | Modifiers
 id                 | integer                     | not null
 station_from       | character varying(32)       | not null
 station_to         | character varying(32)       | not null
 time               | timestamp without time zone | not null
 duration           | integer                     | not null
 ns                 | boolean                     | not null default true
 fetchtime          | timestamp without time zone |
 spoor              | character varying(32)       |
 aankomstvertraging | integer                     |
 vertrekvertraging  | integer                     |

Another process constantly queries the Actuele Vertrektijden API for every station (not only border stations). This information is used for the “fallback” scenario of step 3), in which no real planning is found for the station combination and we fall back on a fixed travel time, but do include the scraped departure time.

Table "public.vertrektijd"
     Column     |            Type             | Modifiers
 station        | character varying(32)       | not null
 time           | timestamp without time zone | not null
 vertraging     | integer                     | not null
 ritnummer      | integer                     | not null
 eindbestemming | character varying(32)       |
 fetchtime      | timestamp without time zone |

Building the travel time matrix

The current map exposes an API to the N^2 matrix of the current time at the URL /api/reisplanner/actueel. It is a JSON long list where each entry looks like this:

 "2011-10-30 13:13:00",

This particular entry shows that the next train from Sittard (std) to Amersfoort (amf) leaves on 13:13h, from track 2B and takes 7440 seconds (2 hours and 4 minutes). For every station to another station (for the “border stations”) there is an entry in this list.

A second URL, /api/reisplanner/history?date=2011-10-29T22:00:00Z, gives this list for a certain date in the past.

Given the fact that we were unable to query every planning in real time, these results are build up in a three-step phase:

  1. Given each station A, B, check if there has been a planning retrieved for A → B for which the start time is in the future. Return the planning that is closest to the current time.
  2. Failing condition 1), check if there has been a planning retrieved for A → B last week. Return the planning that is closest to the current time minus 7 days. We assume that for every day of the week, the planning is the same. Note that this does not hold for holidays / festive days.
  3. Failing condition 1) and 2), return the planning tuple in which we assume a constant, pre-fetched travel time (a static matrix for times between A and B without time information). We assume that the first train leaving for A is the right train for getting to B.

A combiner algorithm retrieves for every station-to-station combination the results from step 1, otherwise those from step 2 and as final fallback step 3 (which always has a result, although it might not be accurate).

mod_reisplanner – the module making all this happen

Above processes have all been implemented in Erlang as a module for Zotonic. It will be open-sourced soon, so that it hopefully can serve as a basis and/or inspiration for other applications using Erlang and the NS API.

This article is the second in a series about the TIMEMAPS project.  TIMEMAPS’ concept and design are by Vincent Meertens, the implementation is by Arjan Scherpenisse.