Building Torrent Family Trees (Beta)
We are used to seeing family trees in biology. For example, this is the human family tree:
But what about software? I think we can look at family trees of digital media too. So here is what the family tree for BitTorrent software looks like:
So what does this image mean? The first key point to note here is that this is not a family tree of the idea of torrents (we’ll call that a ‘meme’) but of the actual source-code relationships between BitTorrent and the subsequent software clients based on it.
In biology, relationships between species can be determined using phylogenetic analysis, in which the evolutionary relatedness among groups of organisms is determined from molecular sequencing data and morphological data matrices. These relationships can then be plotted onto a phylogenetic tree. A branching structure is used because evolution is a branching process: alteration over time can result in speciation and thus in the branching of populations. As species hybridize or terminate (extinction), the results can be visualized in a phylogenetic tree. The approach used here is strongly similar: it plots the generations of a species of p2p software along with the point where each off-shoot branched from its common ancestor (the code-basis).
The key reason this is akin to the phylogenetic method is that the linkages are based on relationships in the source code (read: DNA) and not on the meme layer (read: idea). A similar exercise based on the meme layer would produce very different results.
So how was the image generated? I am interested in how software systems change over time. To get more of an overview of the world of p2p software, I thought it would be interesting to see those changes as a whole. With this in mind, I looked at the version releases of each and every p2p torrent client, using the following methodology:
- Create an entry for each torrent client. For this experiment, only standalone software designed to be installed on an operating system was used. This research did not include non-installed software, such as browser clients like BitLet.org, or mobile phone versions of software.
- For that client, search for the changelog; if none is available, look for the dates of the source code releases, and if these cannot be found, the executable releases or news and/or mailing list announcements. If multiple dates were given for the same release version, the date of the source code (.tar) files was used as the primary source.
- On a per-month basis, record the most current version released in that month. The record takes the form of the version number given by the developers, abbreviated to one decimal place (always rounded down). I am aware that this is self-reported data, and that neither within a project nor across projects is it consistent enough to be treated as an empirical measure; however, it does provide a numerical record of generational change. See the detailed notes at the end of this post*.
- Where it is noted in the documents read for this research, record what other source code was used in building the project. This gives us a sense of the linkages between projects (sub-species) and allows us to construct a family tree. This is generally recorded by the developers in their notes. The beta data set, including links to where the data came from, is available as a spreadsheet here.
- Enter this data into a graph plotting version number over time. This gives us a broad view of p2p software releases over time.
- Then plot the main linkages between the different releases by their basis source code, e.g. Tomato Torrent is based on v4.2 of BitTorrent – this is recorded as a family linkage. Over the graph, draw lines to connect the ‘off-shoot’ software to their ‘code-basis’.
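The per-month recording step above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tooling used for this research: the `Release` record, the `abbreviate` and `monthly_latest` helpers, and the client name `ExampleClient` with its dates are all hypothetical. It assumes plain numeric `major.minor[.patch]` version strings, keeps the highest abbreviated version per client per month, and ignores the special cases described in the notes at the end of this post.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal, ROUND_DOWN


@dataclass
class Release:
    client: str     # name of the torrent client
    released: date  # source (.tar) release date where known, else binary/news date
    version: str    # version string as given by the developers


def abbreviate(version: str) -> Decimal:
    """Abbreviate a dotted version to one decimal place, always rounding down.

    Assumes a plain numeric 'major.minor[.patch...]' string: '4.2.1' -> 4.2.
    """
    major, _, rest = version.partition(".")
    minor = rest.split(".")[0] or "0"
    return Decimal(f"{int(major)}.{minor}").quantize(Decimal("0.1"), rounding=ROUND_DOWN)


def monthly_latest(releases: list[Release]) -> dict[tuple[str, int, int], Decimal]:
    """Keep one record per client per month: the highest abbreviated version."""
    table: dict[tuple[str, int, int], Decimal] = {}
    for r in releases:
        key = (r.client, r.released.year, r.released.month)
        v = abbreviate(r.version)
        if key not in table or v > table[key]:
            table[key] = v
    return table


# Hypothetical releases: two in March 2005 collapse to a single record.
releases = [
    Release("ExampleClient", date(2005, 3, 2), "2.3.0"),
    Release("ExampleClient", date(2005, 3, 28), "2.4.1"),
    Release("ExampleClient", date(2005, 5, 9), "2.5"),
]
print(monthly_latest(releases))
```

Using `Decimal` rather than `float` avoids binary rounding surprises when truncating to one decimal place.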
This method produces the following results (the graph is large, so for more detail see the PDF):
So when the family linkages of the main progenitor software (which the data showed to be Azureus, BitTorrent and LibTorrent) are plotted onto this graph we can see:
Which can then be separated out from the graph to show each family tree in isolation…
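Separating out each family tree amounts to following the recorded off-shoot → code-basis linkages downward from a progenitor. The sketch below is a hypothetical illustration: only the Tomato Torrent → BitTorrent v4.2 linkage comes from this post, while `ClientA`, `ClientB`, `ClientC`, `LibX` and their basis versions are placeholders. It treats "root" clients (ones that others are based on but which are not themselves off-shoots) as a rough stand-in for the main progenitors; the progenitors named above were identified from the data, not by this rule.

```python
from collections import defaultdict, deque

# Hypothetical linkage table: off-shoot -> (code-basis client, basis version).
# Only the Tomato Torrent entry comes from the post; the rest are placeholders.
LINKS = {
    "Tomato Torrent": ("BitTorrent", "4.2"),
    "ClientA": ("BitTorrent", "3.4"),
    "ClientB": ("ClientA", "1.0"),
    "ClientC": ("LibX", "0.1"),
}


def build_children(links):
    """Invert off-shoot -> basis links into basis -> [off-shoots]."""
    children = defaultdict(list)
    for child, (parent, _version) in links.items():
        children[parent].append(child)
    return children


def roots(links):
    """Clients that others are based on but that are not themselves off-shoots."""
    parents = {parent for parent, _version in links.values()}
    return sorted(parents - set(links))


def family_tree(root, children):
    """Breadth-first list of root and every client descended from it."""
    order, queue, visited = [root], deque([root]), {root}
    while queue:
        node = queue.popleft()
        for child in sorted(children.get(node, [])):
            if child not in visited:
                visited.add(child)
                order.append(child)
                queue.append(child)
    return order


children = build_children(LINKS)
for r in roots(LINKS):
    print(r, "->", family_tree(r, children))
```

A breadth-first walk lists each generation before the next, which roughly mirrors reading a family tree from the ancestor outward.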
Azureus Family Tree:
BitTorrent Family Tree:
LibTorrent Family Tree:
Please feel free to send me comments, feedback and other notes on the ideas presented here. As titled, this is a beta – so more info is welcome!
* Detailed notes on version number recording: Where a later release counts up in two-digit steps, e.g. 0.3.9 to 0.3.10, rather than showing it as another 0.3 on the chart, the version change was indicated by compensating for the two-decimal-place count. BitStormLite used 0.2a, 0.2b, etc., so these were recorded in increments of 0.1, starting with 0.1 for 0.2a. The betas of the software Acquisition used the notation 124.3 whereas its releases used 2.0, so this research amended the notation to 1.2 for 125.4, etc., as the developer also refers to v2.0 as v209. Gnome used 0.01 notation, so 0.10 was taken as 0.1 for recording. LocalHost was tough: I emailed the developers to ask but received no reply, so I had to use their home page and work backwards from the last news entry about a new version (which gave version 0.4.1), dropping the version number by 0.1 each time the news noted a new release (as no other news items gave a version number). BitTornado had an experimental release called 5.6.x and then a main release of 0.1, so to reconcile these, the experimental release was recorded as 0.6 and the main release started at 1.1. BitTorrent was also referred to as ‘mainline’. Note that dates are given in months for ease of handling a long timeline. Where more than one version was released in a month (e.g. v2.3 and v2.4), the latest number for that month is taken (e.g. v2.4). If the developer went from 0.9 to 0.10 rather than incrementing to 1.0, we have recorded this as 0.9 to 1.0.
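Two of the general rules in these notes can be sketched in code: reading the minor counter as tenths, so that 0.9 → 0.10 charts as 0.9 → 1.0, and the LocalHost-style reconstruction that steps back 0.1 per noted release from the latest known version. This is a minimal sketch and the helper names `chart_value` and `backfill` are hypothetical. Applying the rollover rule uniformly to every two-digit minor number is an assumption; the per-client adjustments above (Gnome's 0.01 steps, BitStormLite's letters, Acquisition's betas, BitTornado's experimental series) were made by hand and are not covered.

```python
from decimal import Decimal


def chart_value(version: str) -> Decimal:
    """Chart a 'major.minor[.patch...]' version, reading the minor counter as tenths.

    A rollover such as 0.9 -> 0.10 therefore charts as 0.9 -> 1.0, and any
    patch component (e.g. 0.3.10) is dropped, i.e. rounded down to 0.3.
    """
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (Decimal(major * 10 + minor) / 10).quantize(Decimal("0.1"))


def backfill(latest: Decimal, releases: int) -> list[Decimal]:
    """Reconstruct versions newest-first by stepping back 0.1 per noted release,
    as was done for LocalHost when only the latest version number was known."""
    return [latest - Decimal("0.1") * i for i in range(releases)]


print([str(chart_value(v)) for v in ["0.8", "0.9", "0.10", "0.11"]])
print([str(v) for v in backfill(chart_value("0.4.1"), 3)])
```

`backfill` does not guard against stepping below zero; with a long run of news items it would need a floor or a manual check.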