Visualizing the tree of life



TLDR: I made a website about how different species are related: taxonomy.schauderbasis.de

I wanted my children to understand how species are related to each other:

Words can only get you so far with complicated topics, so I looked into visualizations of the tree of life. There are some awesome websites out there:

It was interesting, but I had a hard time finding something that fit the questions my children have. They want to know about specific animals, the ones they see at the zoo.

What I needed was an interactive website that would show a tree of life for those species that I am interested in. I couldn’t find one. But I did find Wikidata.

Wikidata is like Wikipedia, but as a database. A graph database with a lot of very interesting design decisions. And it does have tons of taxonomy data in it (Taxonomy is the science of categorizing things, organisms in this case). Each instance of taxon (think of a taxon as a species) has a parent taxon. And that one has a parent taxon of its own, and so forth… until you arrive at Biota, the root of the tree.
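Here is a minimal Python sketch of that idea. It assumes each taxon has exactly one parent, stored in a plain dictionary. On Wikidata these links are the "parent taxon" statements (property P171). The sketch walks up to the root and merges the lineages of a few chosen species into one small tree. The taxa and the heavily shortened hierarchy are made-up sample data, and this illustrates the idea rather than reproducing the code behind the website.

```python
# Hypothetical, heavily simplified sample data: child taxon -> parent taxon.
PARENT: dict[str, str] = {
    "lion": "Panthera",
    "tiger": "Panthera",
    "Panthera": "Felidae",
    "Felidae": "Carnivora",
    "brown bear": "Ursus",
    "Ursus": "Ursidae",
    "Ursidae": "Carnivora",
    "Carnivora": "Mammalia",
    "Mammalia": "Biota",
}


def lineage(taxon: str, parent: dict[str, str]) -> list[str]:
    """Follow parent links from `taxon` until reaching a taxon with no parent (the root).

    Assumes the links contain no cycles.
    """
    path = [taxon]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def subtree_edges(species: list[str], parent: dict[str, str]) -> set[tuple[str, str]]:
    """Union of (child, parent) edges on the lineages of the chosen species.

    Shared ancestors appear only once, so the result is a single tree.
    """
    edges: set[tuple[str, str]] = set()
    for s in species:
        path = lineage(s, parent)
        edges.update(zip(path, path[1:]))
    return edges


print(lineage("lion", PARENT))
# ['lion', 'Panthera', 'Felidae', 'Carnivora', 'Mammalia', 'Biota']
print(len(subtree_edges(["lion", "tiger", "brown bear"], PARENT)))  # 9
```

Using a set of edges means that lions and tigers share the path above Panthera, which is exactly what a tree for "the animals we saw today" needs.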

All the tree data I wanted was right there, under a free license and available through an open API.

So I made the interactive visualisation myself. You can find it at taxonomy.schauderbasis.de.

Project goals

Project strategy

I went for an agile approach (because that is the only way I know how to write software):

Of course, I am doing this in my spare time, so I need to choose carefully what I work on. Sooo…

At this point I am pretty much done with the project. I can’t think of anything I would like to add or remove. But when inspiration strikes, I will be happy to come back and work on it.

Technology

I used the most basic setup possible:

Of course I only used free software and the project itself is also under a free license (GPL3). I also made sure to track all the sources and licenses of dependencies and assets.

GitLab view of the project

Design decisions

I made a few design decisions that I am pretty proud of:

A button with a lion emoji on it

Accessibility

I tried to make everything as accessible as possible. This was difficult because:

Graph visualization libraries

There are not as many graph visualisation libraries as I had expected. And those I found are not as good at visualizing trees as I wanted.

In an earlier version I used vis-network as the visualisation library. It has many excellent examples and I liked how the tree was curvy and organic. I wasn’t able to make it planar, though: the branches overlapped all the time.

Wikidata

This project is definitely a love-letter to Wikidata. I didn’t know it before and now I am obsessed with it. What a great resource.

Mastering the SPARQL query language was very hard, though. You need to know a lot about the graph and the data it encodes. Also, I find it impossible to predict which queries will be fast and which ones will never terminate.

Querying Wikidata for lifeforms with an associated emoji
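A query of this kind can be sent from Python with nothing but the standard library. This is a minimal sketch, assuming the public Wikidata Query Service endpoint and three identifiers that are worth double-checking on wikidata.org: P31 ("instance of"), Q16521 ("taxon") and P487 ("Unicode character", where emoji are recorded). Restricting to taxa that actually have an emoji keeps the result small, which helps with the "will it ever terminate" problem.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed identifiers, worth double-checking on wikidata.org:
#   P31 = "instance of", Q16521 = "taxon", P487 = "Unicode character" (emoji)
QUERY = """
SELECT ?taxon ?taxonLabel ?emoji WHERE {
  ?taxon wdt:P31 wd:Q16521 ;
         wdt:P487 ?emoji .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

ENDPOINT = "https://query.wikidata.org/sparql"


def build_query_url(query: str) -> str:
    """GET URL for the Wikidata Query Service, asking for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


def run_query(query: str) -> list[dict]:
    """Send the query and return the result rows ("bindings") from the JSON response."""
    # Placeholder User-Agent; Wikimedia asks API clients to identify themselves.
    req = Request(build_query_url(query), headers={"User-Agent": "taxonomy-sketch/0.1"})
    with urlopen(req, timeout=60) as resp:
        data = json.load(resp)
    return data["results"]["bindings"]


# Example (needs network access):
# for row in run_query(QUERY):
#     print(row["emoji"]["value"], row["taxonLabel"]["value"])
```

Each row in the standard SPARQL JSON result format maps variable names to objects with a `value` field, which is why the example reads `row["emoji"]["value"]`.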

Taxonomy

It turns out that the world of taxonomy is not as clear as I had expected. The tree of life is well defined in its basic form, but there are all sorts of open questions to consider:

The problem of uncertainty was one I could solve algorithmically. I decided to remove all the connections that acted as shortcuts: when two branches could reach a taxon (one with many intermediate nodes and one with few), I always kept the one with more nodes and deleted the one with fewer. (The rationale here is that I wanted all the intermediate nodes but not all the complicated connections.) This made the trees a lot cleaner. A LOT!

Two trees with the same animals. The tree on the left has all the data points in it and looks like a swirly mess. The right tree is trimmed; it looks clean and straight.
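The trimming rule can be written as a small graph operation. This is a minimal Python sketch, assuming the taxa form a directed graph without cycles and that each taxon maps to the set of its parent taxa. A direct link from a taxon to a parent is dropped whenever that parent can also be reached through a longer chain. On an acyclic graph this is what graph theory calls a transitive reduction. The taxon names are hypothetical sample data, and the code illustrates the rule rather than reproducing the site's implementation.

```python
# A minimal sketch of the trimming rule. `parents` maps each taxon to the set
# of its parent taxa. Assumes the graph has no cycles; on such a graph,
# removing every shortcut edge yields the transitive reduction.


def has_longer_path(child: str, parent: str, parents: dict[str, set[str]]) -> bool:
    """True if `parent` is reachable from `child` without the direct child -> parent link."""
    # Start from every other parent of `child` and search upwards (depth-first).
    stack = [p for p in parents.get(child, set()) if p != parent]
    seen = set(stack)
    while stack:
        node = stack.pop()
        if node == parent:
            return True
        for p in parents.get(node, set()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return False


def trim_shortcuts(parents: dict[str, set[str]]) -> dict[str, set[str]]:
    """Copy of `parents` without links that skip over intermediate taxa."""
    return {
        child: {p for p in ps if not has_longer_path(child, p, parents)}
        for child, ps in parents.items()
    }


# Hypothetical data: "lion" also links straight to "Felidae" and "Mammalia",
# which are shortcuts past "Panthera" and the other intermediate taxa.
PARENTS = {
    "lion": {"Panthera", "Felidae", "Mammalia"},
    "Panthera": {"Felidae"},
    "Felidae": {"Carnivora"},
    "Carnivora": {"Mammalia"},
}

print(trim_shortcuts(PARENTS)["lion"])  # {'Panthera'}
```

Every edge is checked against the untrimmed graph. On an acyclic graph, that gives the same result as removing shortcuts one at a time, so the order of removal does not matter.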

Only a few cases remained where this algorithm couldn’t decide what to trim. That’s OK, though.

The ancestry of the influenza A virus is not a straight line: it splits and re-merges
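The leftover cases are easy to spot: after trimming, they are the taxa that still have more than one parent. Here is a small Python sketch, assuming the trimmed graph is a dictionary from each taxon to the set of its parents. The example graph is hypothetical. It has two branches of different length that only meet again at the root, so neither link is a shortcut and both survive the trimming.

```python
def ambiguous_taxa(parents: dict[str, set[str]]) -> list[str]:
    """Taxa that still have more than one parent, i.e. where the "tree" is not a tree."""
    return sorted(child for child, ps in parents.items() if len(ps) > 1)


# Hypothetical trimmed graph: "X" descends from both "A" and "B".
# The path X -> A -> C -> root does not pass through B, and X -> B -> root
# does not pass through A, so neither link counts as a shortcut.
TRIMMED = {
    "X": {"A", "B"},
    "A": {"C"},
    "C": {"root"},
    "B": {"root"},
}

print(ambiguous_taxa(TRIMMED))  # ['X']
```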

Results