DELUSIONS OF ANONYMITY have clouded the issue of Bittorrent's reliance on relatively few servers to distribute files on the network.
A supposedly worrying report detailed how researchers in France had managed to track users participating in Bittorrent swarms, as if this were never-before-seen data. The truth is that not only have researchers been "sampling" real data from P2P networks for years, it's not even very hard to do.
P2P networks are in many ways the same as the client-server networks of the last century. The difference with a P2P network is the ability for any node to be a server, and in the case of Bittorrent, where files are split into "chunks", a server does not require the whole file to participate. Much like one would expect a traditional server to keep track of which node is accessing it, P2P nodes do the same.
For years researchers have tried to model the behaviour of P2P networks in order to gain an understanding of peer distributions, uptimes and how the network copes with the transient nature of its membership, known as churn. The way researchers have gone about this task is to create crawlers to walk the network, identifying nodes by their IP addresses. Depending on the area of research, the lifetime of a node and what files it was offering to other nodes were also recorded.
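The crawling the researchers describe is, at its core, just a graph walk: start from a few known peers, ask each for its neighbours, and record every IP address seen. A minimal sketch of the idea in Python, with a toy in-memory overlay standing in for live network queries (the addresses and the `neighbours` callback are illustrative, not any real crawler's API):

```python
from collections import deque

def crawl(bootstrap, neighbours):
    """Breadth-first walk of a P2P overlay, recording each node seen.

    `bootstrap` is an iterable of starting addresses; `neighbours` is a
    callable that, given an address, returns the peers that node reports.
    In a real crawler this callback would be a network request.
    """
    seen = set()
    queue = deque(bootstrap)
    while queue:
        addr = queue.popleft()
        if addr in seen:
            continue                     # already visited this node
        seen.add(addr)
        queue.extend(neighbours(addr))   # enqueue the peers it reports
    return seen

# Toy overlay standing in for live peer responses.
overlay = {
    "10.0.0.1": ["10.0.0.2", "10.0.0.3"],
    "10.0.0.2": ["10.0.0.4"],
    "10.0.0.3": [],
    "10.0.0.4": ["10.0.0.1"],
}

ips = crawl(["10.0.0.1"], lambda a: overlay.get(a, []))
print(sorted(ips))
```

Recording a timestamp alongside each address, and re-crawling periodically, is all it takes to turn this into the lifetime and churn measurements the studies report.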
Prior to Bittorrent, research had been carried out on the earlier networks that were home to file sharing. Saroiu, back in 2002, managed to gather not only the IP addresses of Napster peers but intimate knowledge of what files each peer was sharing. The authors claimed they managed to grab somewhere in the region of 40 to 60 per cent of all the peers on a particular Napster server.
Though Napster wasn't a P2P network, the same kind of data was harvested from the Gnutella network, a bona fide "unstructured" P2P network. Here the authors claimed to gather between 8,000 and 10,000 peers in just two minutes. This, they claimed, represented anything between 25 and 50 per cent of the nodes on the network at the time.
Another study, carried out in 2003, focused on the Overnet network, known to most through its sister network eDonkey. Bhagwan ran a crawler to harvest information and a "prober" to check whether nodes were still active on the P2P network. The research managed to capture 84,000 host IDs in a 24-hour period.
While those networks have seen their popularity recede or disappear completely, over the past five years research has focused on Bittorrent. The difference with Bittorrent is that one doesn't require a bespoke crawler to see other peers peddling chunks in the swarm. Bittorrent clients such as µTorrent will display information such as throughput, IP addresses, client version and even IP geo-location, all automatically.
Bittorrent is widely used as a low cost distribution method for legitimate software such as Linux distributions. Using the protocol to acquire the latest Ubuntu ISO, it wasn't long before we were able to see several hundred other nodes, with their IP addresses, taking part in the swarm. Using built-in IP geo-location, it was easy to find out, roughly, where other nodes were located.
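There is no wizardry in how a client learns those addresses: the tracker hands them over in its announce response, commonly in the "compact" format (BEP 23), where each peer is six bytes, a four-byte IPv4 address followed by a big-endian two-byte port. A short sketch of decoding such a blob, using fabricated addresses from documentation ranges:

```python
import struct

def parse_compact_peers(blob):
    """Parse a Bittorrent tracker's compact peer list (BEP 23):
    each peer is 6 bytes, a 4-byte IPv4 address followed by a
    big-endian 2-byte port."""
    peers = []
    for off in range(0, len(blob) - len(blob) % 6, 6):
        ip = ".".join(str(b) for b in blob[off:off + 4])
        (port,) = struct.unpack(">H", blob[off + 4:off + 6])
        peers.append((ip, port))
    return peers

# Two fabricated peers: 192.0.2.1:6881 and 198.51.100.7:51413.
sample = (bytes([192, 0, 2, 1]) + struct.pack(">H", 6881)
          + bytes([198, 51, 100, 7]) + struct.pack(">H", 51413))
print(parse_compact_peers(sample))
```

Every client in the swarm receives a list like this, which is why "monitoring" a swarm amounts to little more than joining it and writing the list down.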
Of course, none of this should be particularly surprising or shocking to anyone who has used the Internet for any length of time. Clients such as µTorrent make it relatively simple to view this information but programs such as Netstat or Wireshark can provide lower level data regardless of protocol.
It's a shame, then, that the real conclusion from Blond was misreported. The finding of the research was not the ability to monitor real-time node participation in P2P networks, something that was demonstrated a decade previously, but rather the reliance of a P2P protocol on relatively few nodes to inject data into the system.
The notion of "super nodes" has been around for many years. These are nodes that have server-like characteristics, high availability and bandwidth, and are often used in P2P deployments to bootstrap the swarm. In most cases super nodes aren't standard computers but rather servers hosted by an ISP within a datacentre.
Some of these servers are rented by users as "seedboxes" to maintain healthy upload and download ratios on private torrent tracker sites. Conversely, this raises the question of swarms relying on a select few hosts to deliver content, and the consequences for content distribution if those nodes suddenly go offline or, worse still, become compromised.
This fluctuating membership in P2P networks is the churn mentioned earlier. In "high churn" scenarios, a significant proportion of nodes can leave the network "ungracefully", with the remaining nodes unaware of the departures. Those remaining then waste time and bandwidth trying to contact nodes they believe are still online. The move by Bittorrent tracker sites to decentralised tracking mechanisms, such as distributed hash tables (DHT), will exacerbate the situation.
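The usual defence against ungraceful departures is bookkeeping: remember when each peer was last heard from and drop anyone silent for too long. A minimal sketch of that idea, with an illustrative timeout and toy addresses (real DHT nodes also ping suspect peers before evicting them, which this simplification omits):

```python
import time

STALE_AFTER = 1800  # seconds of silence before a peer is presumed gone (illustrative)

def prune_stale(peers, now=None):
    """Drop peers not heard from within STALE_AFTER seconds.

    `peers` maps address -> last-seen timestamp (seconds). Returns a
    new table containing only peers still considered alive.
    """
    now = time.time() if now is None else now
    return {addr: seen for addr, seen in peers.items()
            if now - seen <= STALE_AFTER}

table = {"10.0.0.1": 900.0, "10.0.0.2": 2500.0}
alive = prune_stale(table, now=2800.0)  # 10.0.0.1 has been silent for 1900s
print(alive)
```

The trade-off is visible even in this toy: a short timeout evicts slow but live peers, a long one wastes bandwidth on the departed, and under high churn no single setting wins.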
While The Pirate Bay might think DHTs will help the tracker evade the unbounded reach of US law enforcement, the development might instead lead to its downfall. DHTs were never designed for environments with volatile node membership. Plainly put, even with the current state of the art, such as that proposed by Terpstra, significant node departures can cripple a DHT system. A professor who co-wrote the paper that first proposed DHTs told The INQUIRER that DHTs simply cannot work in high churn environments.
His point stems from the strategies used to mitigate performance degradation in DHTs, which typically centre on replicating content. Deciding what content should be replicated, and how many times, is an NP-complete problem, and most approaches yield strategies that do not scale.
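Without wading into the complexity proof, the scaling trouble is easy to gesture at: even the search space for placing a fixed number of replicas of a single item grows combinatorially with network size. A toy illustration (the replica count of three is arbitrary, and real systems juggle many items at once):

```python
import math

def placements(n_nodes, n_replicas):
    """Number of ways to place n_replicas copies of one item on
    distinct nodes: the binomial coefficient C(n_nodes, n_replicas)."""
    return math.comb(n_nodes, n_replicas)

# The candidate placements for just one item, three replicas:
for n in (100, 1000, 10000):
    print(n, placements(n, 3))
```

Heuristics prune this space, of course, but under heavy churn the placement must be recomputed constantly, which is where the professor's scepticism bites.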
All this brings up a very worrying question. What happens if so-called "copyright holders" actively take part in swarms and then leave suddenly? P2P protocols, especially Bittorrent, have built-in resilience to such departures, so the show will go on, but there will be a pause. How significant that pause is depends entirely upon how reliant the swarm is on particular nodes.
From the significant dataset Blond and his co-authors collected, over 50 per cent of the top 20 "content providers" came from just two networks. It is that low level of relative decentralisation that is moving Bittorrent towards the client-server paradigm used by Napster and others, and which could be exploited to disrupt swarms.
The findings by Blond are given extra credence by the enormous number of IP addresses collected, 148 million in total. It suggests that P2P is far from a decentralised mechanism for distributing data; rather it depends on a "rich club" of nodes that post new content and do the majority of the work peddling it.
It isn't the lack of anonymity that users should lose sleep over, but rather the realisation that Bittorrent, currently the most popular P2P protocol on the Internet, is propped up by a few networks.
While Bittorrent might be a legitimate P2P protocol, the behaviour of those who want to give more than their fair share could push the service towards a client-server paradigm and help those who want it shut down. µ