How Erlang and the dutch railways power a real-time data visualization

Posted: November 8th, 2011 | Author: | Filed under: TIMEMAPS | Tags: , , , , , , , , | No Comments »

The backend of the TIMEMAPS project is based on the Zotonic web framework and Erlang. This article highlights the technical challenges and concessions that were considered while building the visualization.

The NS API and its limitations

The NS, the dutch railway system provider, provides an API which allows a developer to build upon it. While it is a nice effort and opens up a lot of possible applications, we found out that for the TIMEMAPS project it was not an ideal API to work with.

But, our requirements were pretty ambitous to begin with, and, from a practical point of view, not what an API designer would call “typical”. In TIMEMAPS, given any point T in time, we need to know, for every train station, how long it takes to travel at moment T to any other train station in the netherlands. Even for a small country like the Netherlands, this becomes a pretty big matrix of travel possibilities, given that there are 379 train stations in the country.

Ideally, for every element in this matrix an API call has to be done to get the actual planning.

Given that the NS API only allows an app to do up to 50.000 requests per day and we did not want to hammer the already stressed API servers too much, we needed to come up with a solution, while not sacrificing the real time aspect too much.

An open source travel planner..?

Another API call that the NS offers are the “Actuele vertrektijden”: given a station, return the 10 first trains that depart from it. It returns also the train numbers: a “unique” number which is assigned to a train on a single trajectory for the day (it might be re-used though in time). By linking the departure times from different stations through this train number, it should possible to see when a train that departs from A passes through B, if it is on the same trajectory.

However, some drawbacks popped up while implementing this approach.

  • For long trajectories (>1h) this approach did not work since the arrival station did not yet list the departure of the train you departed on since it was too far in the future
  • There was no API call for arrival times for trains on stations: this made it impossible to take the stopover-time into account and it was not possible to use this planning mechanism for destinations on the very end of the trajectory (e.g., no departure listed for the arriving train)
  • Doing a “naive” planning this way takes a considerable amount of database processing power as each stopover adds 2 self-joins to the database query, thus increasing exponentially in complexity.

Scraping of the departure times gives a increasingly complete graph of the railway system, and this graph, combined with the geographical location of stations might be used in a search algorithm to make an offline planner. For me however this aproach was too far of a longshot for the already pretty complex project so I decided to put this approach in the fridge for now.

However, this effort has brought me in contact with the OpenOV guys who are dedicated to liberate all public transportation data in the Netherlands. In the future, I hope I can contribute something to their wonderful initiative.

Doing consessions

Luckily, the TIMEMAPS project had one “business rule” with respect to its visualization: only stations that are near the border of the map are allowed to modify the map. That made the list of stations considerably smaller: after selection there were 60 stations left.

However this limited the practical application of the map in that some of the displayed travel times are not accurate: for the remaining, smaller / non-border stations we chose to interpolate the travel times between the “main” stations: an inaccuracy, given the fact that it often takes longer to travel from a minor station (e.g. Eindhoven Beukenlaan) to any other city. But for the sake for the clarity of the visualization, we agreed on this concession.

Data model & worker processes

There are two worker processees running in the background.

One process constantly (approximately 1 request per 1.5 second) queries the NS API for any A → B trip that has no planning in the future. This process favors distance: it tries first to find plannings for longest A → B trajectories, since the NS API also returns every timing information for intermediate stops, allowing to get more than one planning per API request. This planning information is stored in the database and kept for at least a week.

Table "public.static_planning"

 Column             |            Type             | Modifiers
--------------------+-----------------------------+------------------------
 id                 | integer                     | not null
 station_from       | character varying(32)       | not null
 station_to         | character varying(32)       | not null
 time               | timestamp without time zone | not null
 duration           | integer                     | not null
 ns                 | boolean                     | not null default true
 fetchtime          | timestamp without time zone |
 spoor              | character varying(32)       |
 aankomstvertraging | integer                     |
 vertrekvertraging  | integer                     |

Another process constantly queries the Actuele Vertrektijden API for every station (not only border stations). This information is used for the “fallback” scenario of step 3), in which no real planning is found for the station combination and we fall back on a fixed travel time, but do include the scraped departure time.

Table "public.vertrektijd"
     Column     |            Type             | Modifiers
----------------+-----------------------------+-----------
 station        | character varying(32)       | not null
 time           | timestamp without time zone | not null
 vertraging     | integer                     | not null
 ritnummer      | integer                     | not null
 eindbestemming | character varying(32)       |
 fetchtime      | timestamp without time zone |

Building the travel time matrix

The current map exposes an API to the N^2 matrix of the current time at the URL /api/reisplanner/actueel. It is a JSON long list where each entry looks like this:

["std",
 "amf",
 "2011-10-30 13:13:00",
 7440,
 "2b"]

This particular entry shows that the next train from Sittard (std) to Amersfoort (amf) leaves on 13:13h, from track 2B and takes 7440 seconds (2 hours and 4 minutes). For every station to another station (for the “border stations”) there is an entry in this list.

A second URL, /api/reisplanner/history?date=2011-10-29T22:00:00Z, gives this list for a certain date in the past.

Given the fact that we were unable to query every planning in real time, these results are build up in a three-step phase:

  1. Given each station A, B, check if there has been a planning retrieved for A → B for which the start time is in the future. Return the planning that is closest to the current time.
  2. Failing condition 1), check if there has been a planning retrieved for A → B last week. Return the planning that is closest to the current time minus 7 days. We assume that for every day of the week, the planning is the same. Note that this does not hold for holidays / festive days.
  3. Failing condition 1) and 2), return the planning tuple in which we assume a constant, pre-fetched travel time (a static matrix for times between A and B without time information). We assume that the first train leaving for A is the right train for getting to B.

A combiner algorithm retrieves for every station-to-station combination the results from step 1, otherwise those from step 2 and as final fallback step 3 (which always has a result, although it might not be accurate).

mod_reisplanner – the module making all this happen

Above processes have all been implemented in Erlang as a module for Zotonic. It will be open-sourced soon, so that it hopefully can serve as a basis and/or inspiration for other applications using Erlang and the NS API.

This article is the second in a series about the TIMEMAPS project.  TIMEMAPS’ concept and design are by Vincent Meertens, the implementation is by Arjan Scherpenisse.


Introducing Arjan Scherpenisse

Posted: November 7th, 2011 | Author: | Filed under: TIMEMAPS | Tags: , , , | No Comments »

It is my pleasure to introduce here on Monster Swell a new collaboration and a spectacular piece of work. Arjan Scherpenisse of Miracle Things will be collaborating with us in the field of data visualization.

Arjan is that rare breed of artist né programmer formally trained in both but picking neither side. He is active on the most innovative edge of software as well as building physical interaction projects and schooling others in programming be it in Erlang or some other language.

The TIMEMAPS project written up just before this post is the first of we hope many forays into data visualization for Arjan and we look forward to collaborate on many such projects in the future.


TIMEMAPS: a different perspective

Posted: November 7th, 2011 | Author: | Filed under: TIMEMAPS | Tags: , , , , | No Comments »

TIMEMAPS visualizes how the map of the Netherlands would look if it would be scaled proportionally to the travel times (by train) between cities. I was asked by the designer of the concept, Vincent Meertens of graphsic, to transform his manually crafted PDF files into a real-time, interactive visualization. TIMEMAPS has been exhibited at the Graduation Show event during the Dutch Design Week 2011.

The map is a real-time interactive map. Clicking a city allows one to set the perspective to a city of his choice. Hovering the map shows a pop-up which highlights the time it takes to travel to the city the mouse is currently over. Every coloured “ring” on the map denotes 30 minutes of travel time, at the current time.

Drawing the map with Canvas

The visualization is done using the HTML5 canvas. Why canvas, and not just SVG, one would ask? Good question: I wanted to learn more about the canvas and thus was a bit biased. I think the project could have been done with SVG as well.

The map consists of a set of polygons: the outline of the Netherlands and its various islands. All the cities are located on those shapes with all 379 train stations. Furthermore, there are several bridges between the islands, like the big “Afsluitduik”, which each connect 2 vertices of the polygons.

The initial, un-transformed shape of the country and the station positions is the same as that on the famous yellow overview map that the NS uses in the stations: it is a schematic view of the Netherlands, constrained in a grid of 0, 45 and 90-degree lines.

The drawing algorithm first draws all the polygons and bridges, and subsequently fills those areas with a pattern of colored concentric circles. This is done in canvas by blitting the previous shape with a pre-rendered image of the circles using the compositeOperation method. The distances between the circles are scaled to represent 30 minutes of travel time. Then, the cities are drawn as big/small dots (main stations are bigger) and connected to the current city by a thin white line.

The information hovers (a plain HTML div) are done by using the “mousemove” event on the canvas and calculating which city is the closest to the current mouse location. Clicking a city causes the current perspective to shift to the clicked city in an animated fashion, using a simple (cosine) transition.

Map deformation

The angles at which cities view each other are kept constant. So, for example, viewed from Rotterdam, Utrecht centraal is always at a 45-degree angle, regardless of the time it takes to travel from Rotterdam to Utrecht. The actual city location is scaled proportionally along these angles: if it takes less time to travel, the city is pulled closer; if it takes more time it is pushed further away. But the angle remains constant.

The polygons (that make the actual shape of the map) are “magnetic” and each vertex “sticks” to the cities it is initially closest to, in a weighted fashion. This algorithm is loosely based on the article “Feature point based mesh animation applied to MPEG-4 facial animation”.

For the islands, this mesh-stretching was mixed 60%/40% with a simple vertex displacement to prevent the islands from becoming unrecognizable: since there are no stations on islands, they are prone to more deformation since the feature points (cities) lie further away.

Problems in the visualization

The shape of the map sometimes is deformed beyond recognition because in certain cases cities which are normally close are being pushed away beyond cities that are normally far away: thus causing the polygon to turn “inside out” and cause cities to appear to be located in the sea.

Another issue is the 45-degree grid constraint: the mesh stretching algorithm does not take this into account because this constraint is applied in a later calculation stage: this sometimes causes cities to be located in the sea as well. A temporary solution for this was to add more vertices to the polygons so the map had more flexibility while stretching.

Application to other maps

The Netherlands is a pretty ideal country in the way the transportation system is organized: viewed from the center, “de Randstad”, or Utrecht or Amersfoort, it is indeed so that travel times do increase almost linearly with geographical distance. I do not think this holds for every country: especially with the advance of faster railways (the fast Fyra train was not taken into account in our implementation!), the map might deform in ways that are beyond recognition and beyond representation in the 2D domain.However it might be an interesting experiment to apply the same techniques to a different country.

This article is the second in a series about the TIMEMAPS project. TIMEMAPS’ concept and design are by Vincent Meertens, the implementation is by Arjan Scherpenisse.