New in p2p Technology
A few news links of interesting stuff going on in the p2p world…
- Tixati takes on uTorrent & Vuze for the title of best p2p client software.
- P2P Next project has started to stream BBC content in true 1080p HD via BitTorrent.
- Flash Player 10.1 beta offers video via P2P multicast and could be used to build large-scale P2P groupware solutions that work right within the browser. Nice.
Peer Metadata & Comments
I’ve had a read of ‘Robust vote sampling in a P2P media distribution system‘ – a very interesting research paper. The idea that grabs me most is the one about using gossip-based network systems to decentralise metadata. Let me explain: if you think of a book on Amazon, it comes with lots of additional information as well as the basic item itself – user reviews and ratings, items that might be similar and so on. This is all great information that can often help guide your choice as a user. But this data is centralised – it sits on Amazon’s servers and it is up to them what they do with it. If you are in the process of trying to decentralise a system – so it is not under the control of one central person or server – how can you collate comments and additional data?
The authors have come up with a very simple and elegant method. First off they use the gossip protocol as a means to an end (so named as it mimics how gossip spreads in a social situation):
In order to propagate and store metadata we selected a gossip (or epidemic) based replication approach. Each peer stores metadata in its own local database. By storing metadata locally we ensure that it has high availability. Periodically peers are paired randomly and exchange metadata updating their own local databases … We selected a gossip based design because it requires no central components and is robust to high churn rates. We could have stored metadata in a Distributed Hash Table but these require explicit leave and join operations which are costly in systems with high churn, such as file sharing networks. Additionally, search performance is considerably enhanced if metadata is stored locally because it is not necessary to perform multi-hop look-ups.
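The replication scheme described in the quote can be sketched in a few lines of Python. This is a minimal illustration, not the authors’ code – the `Peer` class, its fields and the pairing logic are all assumptions made for the sake of the example:

```python
import random

class Peer:
    """A node with its own local metadata database (hypothetical sketch)."""
    def __init__(self, peer_id):
        self.peer_id = peer_id
        self.metadata = {}  # item_id -> comment/metadata text

    def exchange(self, other):
        # Both peers merge the other's entries into their own local
        # database, so any entry gossiped once stays highly available.
        merged = {**self.metadata, **other.metadata}
        self.metadata.update(merged)
        other.metadata.update(merged)

def gossip_round(peers):
    """Periodically pair peers at random; each pair exchanges metadata."""
    shuffled = random.sample(peers, len(peers))
    for a, b in zip(shuffled[::2], shuffled[1::2]):
        a.exchange(b)
```

After a handful of rounds every peer holds a copy of every entry, with no central component and no join/leave bookkeeping – which is the robustness-to-churn argument the authors make against using a DHT.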
Then they add a second layer of interactive functionality to this: users are required to vote on whether or not they like the comments. This means that a user, in order to get the benefit of the cloud of comments, must act as a kind of screen for the data sources:
Moderations are disseminated in a gossip-like fashion to other peers by using the PSS [peer sampling service; the means by which nodes discover others and potentially exchange messages with them]. However, nodes only pass on metadata from those moderators they have approved. Approval involves the user explicitly selecting a thumbs-up icon displayed next to the metadata from the given moderator indicating a positive (+) vote for the moderator. Users may also disapprove of a moderator by selecting a thumbs-down indicating a negative (-) vote. Essentially then, the idea is that “good” moderators, as judged by the approval of others, will spread their metadata quickly but “bad” moderators, obtaining low numbers of approvals and / or disapprovals, will only be able to spread their metadata slowly.
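The filtering rule – pass on only the metadata of moderators you have approved – might look something like this sketch (the function name, the +1/-1 vote encoding and the data shapes are my assumptions, not the paper’s):

```python
def metadata_to_forward(local_entries, approvals):
    """Select which locally held entries to pass on during a gossip exchange.

    local_entries: list of (moderator_id, text) pairs held locally.
    approvals: moderator_id -> +1 (thumbs-up) or -1 (thumbs-down).
    Only entries from explicitly approved moderators are forwarded;
    everything else stays local, so it spreads slowly or not at all.
    """
    return [(mod, text) for mod, text in local_entries
            if approvals.get(mod) == 1]
```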
Voting approval on comments is nothing new. You can see it in action on a huge range of sites from The Guardian to Slashdot.org – it is a good way of regulating user comments. (Anyone who has ever run a busy public site knows how messy comment regulation can get!) What is interesting about this system is that the comments the user sees are filtered to their individual choice and taste, unlike a public system that just shows the popular ones. This means that over time, the comments you get will grow towards your taste. The problem with the current centralised public system (which this new idea avoids) is that when it comes to contentious issues comments can often be rated by polarisation – where users back commentators based not on the accuracy of their comments, but on their bias on a larger issue.
What is also clever about this design is that it still allows for the ‘wisdom of the crowds’, even though the cache of data is unique to each user. It does this using what the authors term the ‘local ballot box’:
Essentially then, each peer individually conducts its own poll by asking other randomly selected peers directly to supply their local vote list. Hence pairs of peers meet randomly and exchange votes, building, over time, a sample of the votes of the population in their local ballot boxes. Nodes do not forward or share the accumulated information in their local ballot box with other peers. This precludes certain kinds of malicious vote manipulation where a node could lie about the votes received from others. But this means that each peer can only accumulate a sample of the population votes, based on its direct experience, not a globally accurate total count.
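The polling scheme above can be sketched like this – a hypothetical illustration rather than the paper’s implementation; the function name, the +1/-1 vote encoding and the data shapes are all assumptions:

```python
import random

def poll_sample(my_votes, peer_vote_lists, sample_size, rng=random):
    """Accumulate a local ballot box from a random sample of peers.

    my_votes and each entry of peer_vote_lists: moderator_id -> +1 or -1.
    The box is built only from votes heard first-hand and is never
    forwarded, so a peer cannot lie about votes it claims to have
    received from others; the price is that the result is a sample,
    not a globally accurate total count.
    """
    ballot_box = dict(my_votes)  # start from direct experience
    for votes in rng.sample(peer_vote_lists, sample_size):
        for moderator, vote in votes.items():
            ballot_box[moderator] = ballot_box.get(moderator, 0) + vote
    return ballot_box
```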
There are lots of interesting ideas and designs in the paper and it is worth a read.
Pervasive Media Studio Talk on Software Palaeontology
I’m speaking at an event next week at the PM Studio in Bristol on Wed 16th at 4pm and the event is free!
Software Paleontology – Tomas Rawlings (FluffyLogic & DCRC PhD Student)
Tomas is on a GWR PhD scholarship applying evolutionary theory to peer to peer networks. As part of this research Tomas has developed a unique methodology of ‘software paleontology’ comparing the change logs of P2P software versions to the fossil records of biological evolution.
Nietzsche contra Caillois: Beyond Play and Games – Dan Dixon (BIT) Dan teaches in Digital Media and researches in social gaming. In this recent paper he argues that there is no continuum between the experiences of gaming and playing; these are two separate aesthetic qualities both present during the playing of games. Secondly that these aesthetic experiences map onto Nietzsche’s Apollonian and Dionysian principles, as set out in The Birth of Tragedy (1993). Following this separation, particular attention is paid to the terms playing and gaming as specific aesthetic terms and neither of which are privileged experiences of digital or non-digital games.
Stopping copyright violations on p2p: Can the technology ever work?
Getting back to the question of whether there is a technical solution that could ever stop copyright violations on p2p, here are a couple of interesting blog posts that might back my earlier hypothesis that it is simply not possible to stop. First off are the results of a study into the use of anonymizing services in Sweden.
As pressure from anti-piracy outfits on governments to implement strict anti-piracy laws increases, millions of file-sharers have decided to protect their privacy by going anonymous. In Sweden alone an estimated 500,000 Internet subscribers are hiding their identities. Many more say they will follow suit if the Government continues to toughen copyright law. These findings are the result of the Cyber Norms sociological research project carried out by a group of Swedish researchers. The researchers conducted a survey among Swedes aged between 15 and 25 and found that 10 percent of this group is currently taking measures against increasing online surveillance. Måns Svensson, PhD in Sociology of Law in Lund, estimates the percentage of all Swedes who are hidden on the Internet to be as high as 6 or 7 percent. If this figure is accurate, it means that there are more than half a million Swedes who already use a service to hide their identity.
Second is what happened when a number of major ISPs dropped Usenet services (which is a bit of a web free-for-all space). While initially there was a drop in use, the ongoing upward trend in use of the service soon resumed;
Links from eComm
Am at the eComm and have just given the planned talk. Seemed to go well – my notes are here. There is one blog post up about the talk already (but it is in Dutch) on the site dutchcowgirls.net which is cool. Also there is a Google Wave for the whole event, including one for my talk here. There are also videos and images:
More on Evolution and Software Family Trees
I had an interesting email discussion with Ernesto (the Editor-in-Chief of TorrentFreak) about my last post on the blog and thought I’d reproduce some of it here (with his permission of course!)…
Ernesto: Is the graph on your blog the complete tree, or will more clients be added?
Me: This is a beta of my data – I would look to add more as time goes on!
Ernesto: What type of feedback are you looking for at the moment?
Me: Well, on my blog I have published the spreadsheet I used to generate these family trees – I would love a story about the research and hope to get feedback on missing client software, corrected dates, new dates where I am missing data, any links between the source code of projects that I don’t yet have covered, and also general feedback on the method.
Another important point with this research is that if, as I suspect is the case, p2p works on the basis of evolutionary principles, then the efforts of those trying to stop p2p technology, far from stopping or slowing it, are actually helping the development of the technology. Yes, this would mean the end (extinction?) of some clients and protocols, but those that survive are stronger and harder to stop – and so the cycle grows.
Ernesto: How do you define evolutionary principles? Using code? Does there have to be progress, adaptation or added value? I’m not really convinced that the evolutionary principles are used to improve the clients, the originals are often better..
Good question – I am using the source code as akin to the DNA of a biological species, but in essence I think the process is almost the same as we see in nature: each generation of client is a modified version and thus is descent through modification, new clients based on the source code of other projects act as speciation, and the choices users make about what to use/support act as a form of selection – all the ingredients for evolution.

Where the source code is important is that it acts as a marker via which we can measure change. While lots of other researchers have suggested that evolution works on technology, source code gives us something more than just our impressions about a technology – it gives us its DNA!
Another important point is that evolution is blind; it does not make value judgements – it just is. As individuals we do make value judgements, but as a mass of users the decisions become a kind of crowd-sourced selection that is, for all intents and purposes, blind. So what comes next might be better or worse than what has gone before, but the point is that it is an adaptation to suit the time and place.
Getting some feedback on this data would be invaluable in making this idea a strong concept that can be taken forward!
Building Torrent Family Trees (Beta)
We are used to seeing family trees in biology. For example this is the human family tree:

But how about for software? I think it is possible to look at the family tree of digital media too. So here is what the family tree for BitTorrent software looks like:

So what does this image mean? The first key point to note here is that this is not a family tree of the idea of torrents (we’ll call that a ‘meme‘) but of the actual source-code relationships between BitTorrent and subsequent software clients based on it.
In biology, relationships between species can be determined using phylogenetic analysis – where the evolutionary relatedness among various groups of organisms is determined via molecular sequencing data and morphological data matrices. These linkages can then be plotted onto a phylogenetic tree. The branching structure is used because evolution is a branching process, whereby alteration over time can result in speciation and thus branching of populations. As species hybridise or terminate (extinction), the results can be visualised in a phylogenetic tree. There is a strong similarity to the methodological approach being used here: plotting the generations of a species of p2p software along with where each off-shoot branched from its common ancestor (the code-basis).
The key reason that this is akin to the phylogenetic method is that the linkages are based on relationships of the source code (read: DNA) and not upon the meme-layer (read: idea). A similar exercise around the meme-layer would produce very different results.
So how was the image generated? I am interested in the change over time of software systems. To get more of an overview of the world of p2p software, I thought it would be interesting to see the changes over time as a whole. So with this in mind I looked at the version releases of each and every p2p torrent client, using the following methodology:
- Create an entry for each type of torrent client software. For this experiment, only separate software systems designed to be installed on an operating system were used. This research did not include non-installed software such as browser clients like BitLet.org or mobile phone versions of software.
- For that client, search for the changelog; if that is not available, look for the dates of the source code releases, and if these cannot be found, then the executable releases or news and/or mailing list announcements. If multiple dates were given for the same release version, the date of the source code (.tar) files was used as the primary source.
- On a per-month basis, record the most current version released in that month. The record is in the form of the version number given by the developers, abbreviated to one decimal place (always rounded down). I am aware that this is a self-reported piece of data, and there is not enough consistency – either within a project or in comparison to other projects – to consider this number an empirical item of data; however, it does provide a numerical record of generational change. See detailed notes at the end of this post*.
- Where it is noted in the documents read for research, record what other source code was used in the construction of the project – this gives us a sense of the linkages between projects (sub-species) and allows us to construct a family tree. This is generally recorded by the developers in their notes. The beta data-set, including reference links to where the data came from, is available as a spreadsheet here.
- Enter this data into a graph plotting version number over time. This gives us a broad view of p2p software releases over time.
- Then plot the main linkages between the different releases by their basis source code, e.g. Tomato Torrent is based on v4.2 of BitTorrent – this is recorded as a family linkage. Over the graph, draw lines to connect the ‘off-shoot’ software to their ‘code-basis’.
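The linkage step can be sketched as a simple grouping of ‘based on’ records into a tree. The function name and record shape are my assumptions; the single data row is the post’s own example (Tomato Torrent branching from BitTorrent v4.2):

```python
# One linkage record per off-shoot:
# (off-shoot client, code-basis client, version branched from).
linkages = [
    ("Tomato Torrent", "BitTorrent", "4.2"),
]

def family_tree(records):
    """Group off-shoots under their code-basis (parent) project."""
    tree = {}
    for child, parent, version in records:
        tree.setdefault(parent, []).append((child, version))
    return tree
```

Each key of the resulting dictionary is a progenitor project; its value lists the off-shoots and the version each branched from, which is exactly what gets drawn as connecting lines over the graph.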
This method produces the following results (the graph is pretty big, so if you want to see it in more detail, look at the PDF)

So when the family linkages of the main progenitor software (which the data showed to be Azureus, BitTorrent and LibTorrent) are plotted onto this graph we can see:

Which can then be separated out from the graph to show each family tree in isolation…
Azureus Family Tree:

BitTorrent Family Tree:

LibTorrent Family Tree:

Please feel free to send me comments, feedback and other notes on the ideas presented here. As titled, this is a beta – so more info is welcome!
* Detailed notes on version number recording: Where a later release counts up in the second decimal place, e.g. 0.3.9 to 0.3.10, it is indicated as another 0.3 on the chart, compensating for the two-decimal-place counting. BitStormLite used 0.2a, 0.2b etc., so was incremented as 0.1 for 0.2a etc. The software Acquisition’s betas used the notation 124.3 whereas the releases used 2.0, so this research amended the notation to 1.2 for 125.4 etc., as the developer indicates by calling v2.0 also v209. Gnome used 0.01 notation, so 0.10 was taken as 0.1 for recording. LocalHost was tough; I did email them to ask but got no reply, so I had to use their home page and work backwards from the last news entry about a new version (which had a version number 0.4.1), dropping the version number by 0.1 each time the news noted a new release (as no other news items had version numbers). BitTornado had an experimental release called 5.6.x and then a main release of 0.1, so to reconcile them the experimental was called 0.6 and the main release started at 1.1 for recording. BitTorrent was also referred to as ‘mainline’. Note the date is given in months for ease of handling a long timeline. Where more than one version was released in a month (e.g. v2.3 & v2.4), the latest number for that month is taken (e.g. v2.4). If the developer has not incremented the decimal places, going from 0.9 to 0.10, then we have recorded this as 0.9 to 1.0.
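The core recording rule – abbreviate to one decimal place, always rounding down – can be sketched as follows. This is a hypothetical helper, not the actual tooling used for the research; the hand-handled special cases above (letter suffixes, renumbered schemes, LocalHost’s back-counting) are deliberately left out:

```python
def record_version(raw):
    """Abbreviate a developer version string to one decimal place,
    always rounding down: '0.3.9' -> '0.3', '4.27' -> '4.2',
    '0.2a' -> '0.2'. Returned as a string to avoid float artefacts."""
    parts = raw.split(".")
    major = "".join(ch for ch in parts[0] if ch.isdigit()) or "0"
    minor = "0"
    if len(parts) > 1:
        digits = "".join(ch for ch in parts[1] if ch.isdigit())
        if digits:
            minor = digits[0]  # keep only the first decimal digit
    return f"{int(major)}.{minor}"
```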
Is P2P Traffic Declining?
Wired recently published an article based on the analysis of traffic from 110 different ISPs over nearly 3,000 routers – a total of 264 exabytes of traffic – and the article concluded that p2p traffic globally was on the decline:
Rising from the ashes in the early 2000s of banned services like Napster, P2P soon became demonized as an imminent threat to software industry, Hollywood and the internet’s backbone, prompting high-profile piracy trials, federal government hearings on traffic management and hand-wringing from ISPs who said torrents of illicit traffic would overwhelm the net. But peer-to-peer file sharing is falling out of favor quickly, according a new report from Arbor Networks, a network-management firm used by more than 70 percent of the world’s top ISPs. Falling out of favor so fast that the report declares that P2P is dead to ISPs.
“Globally P2P is declining and it is declining quickly,” said Craig Labovitz, the chief scientist at Arbor Networks, in a preview of a paper of findings from data collected by Arbor Networks from its customers. … In fact, according to its sensors, peer-to-peer traffic still accounts for about 18 percent of all traffic. (That’s by looking at packets — by protocol, P2P fell to less than one percent of traffic, but file sharing applications mask themselves in order to evade technical blocks.) But compare that to 2007, when peer-to-peer peaked as high as 40 percent of net traffic, according to Labovitz.
But is this the case? First off, I think it is important to note that measuring p2p traffic accurately is very difficult. For example, Bolla et al (2008) reported that failing to distinguish between arrival times, durations, volumes and average packet sizes of P2P conversations in the statistical analysis can lead to misleading results. The tool used for this research is a proprietary system, and as such it is difficult to know whether such issues are at play.
Secondly, the article is also a little vague about the context in which this is set: a decline in percentage is not the same as a decline in usage if the overall numbers are also growing. Commentator Mark Goldberg notes this issue;
But, in reality, there was no drop in P2P traffic reported in either the study or the Wired article. The article spoke of a drop in the proportion of total internet traffic, with P2P file sharing dropping from 40% in 2007 to 18% today. To look at P2P traffic totals, you need to see what total internet traffic was doing in that 2 year period. According to a recent report on a Telegeography release, total internet traffic is up 188% (up 79% in 2009 and 61% in 2008). As a result, total P2P traffic appears to have actually increased 25% in the 2 year period – hardly a “big drop”.
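Goldberg’s point is easy to check with the figures in the quote: a falling share of a fast-growing total can still be a rising volume. The arithmetic below uses the quoted growth rates; it comes out at roughly a 30% rise, in the same ballpark as his 25%, with the difference down to rounding of the inputs:

```python
# Figures from the quote: traffic up 61% in 2008, then 79% in 2009,
# while the P2P share of the total fell from 40% to 18%.
total_growth = 1.61 * 1.79           # ~2.88, i.e. total traffic up ~188%
share_2007, share_2009 = 0.40, 0.18  # P2P share of all traffic

# Absolute P2P volume is share * total, compared across the two years.
p2p_growth = (share_2009 * total_growth) / share_2007  # ~1.30
```

So the absolute volume of p2p traffic grew even as its share of the total fell by more than half – hardly a “big drop”.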
That said, in some senses it would be no surprise to find that p2p traffic was declining, as some easier-to-use alternatives, especially for music (e.g. Spotify), are now appearing. However, whether or not this equates to a drop in peer-sharing of digital content is another thing. The huge rise in streaming (as well as its growing ease and ubiquity through services such as Jango), coupled with the rise in both the number of connected devices (PCs, netbooks, smart phones) and the number of methods (sharing via remote hard drives, encrypted p2p, darknets), plus the drop in the cost of storage, all means it would be very hard to suggest that any decline in p2p software was mirrored by a decline in peer-sharing.
(First published on the p2p foundation blog)
P2P Moves into Browsers
I suspect most people’s use of the torrent protocol comes in the form of a separate p2p software client (e.g. Azureus or Transmission); indeed, this is the method I have been looking at for my research. I was always aware, however, that more methods existed. I was very impressed with BitLet – a Java applet based in a web browser that allows you to torrent away. But more methods of browser use have since emerged…
– Machsend: a non-Java sharing client that uses Ruby and Yahoo’s BrowserPlus extension.
– LittleShoot: a powerful-sounding plugin for most browsers with integrated download and search.
– and now BitLet are looking to stream video using p2p too!
The Laws of Biology: Omnivorous Spiders
There is an interesting post on the blog Why Evolution Is True – interesting because it points to a useful idea for those of us looking to biology for tools in other realms (such as media in my case, aka media ecology). First, here is a quote from the post;
The “laws” of biology aren’t like the laws of physics, because they deal with stuff that’s alive, which doesn’t always obey the mathematical rigour of a “law”. And when we think we’ve found something that’s hard and fast, there’s generally an exception. So, for example, over 40,000 species of spider have been described, and they are all carnivorous (even if some occasionally sip nectar or eat pollen). “Spiders are carnivorous” would seem to be an appropriate generalization. But it isn’t a law. … Today’s issue of Current Biology shows why. It features an amazing discovery – a largely herbivorous jumping spider (Salticid) going by the charming name of Bagheera kiplingi … The spider eats the juicy orange tips (”Beltian bodies”) of the leaves of the acacia tree … Both behavioral observations and chemical analysis show that the spider eats the Beltian bodies. However, it is not strictly herbivorous – it will also nibble the odd ant larva.
Apart from being a very interesting idea – veggie spiders – the author also points to the messy nature of living systems. This means that in trying to classify things, there will always be aspects that simply don’t fit, exceptions to the rule and other noisy anomalies. It means that more often than not we are looking for ‘generalizations’ rather than ‘laws’, and our methods need to reflect this. In my current research, I need to find all the p2p torrent client software that has existed. That was not easy – and the boundaries were fuzzy, with mobile clients, web-based clients and also add-ons and plugins that enhanced torrent clients, so I had to draw some lines somewhere! I hope to post the first-pass results soon and get feedback from yourselves…