The Inquirer-Home

Bittorrent's server reliance could be its downfall

Analysis: Not very decentralised at all
Mon May 10 2010, 12:48

DELUSIONS OF ANONYMITY have clouded the issue of Bittorrent's reliance on relatively few servers to distribute files on the network.

A supposedly worrying report detailed how researchers in France had managed to track users participating in Bittorrent swarms and had somehow revealed never before generated data. The truth is, not only have researchers been "sampling" real data from P2P for years, it's not even very hard to do.

P2P networks are in many ways the same as the client-server networks of the last century. The difference with a P2P network is the ability for any node to be a server, and in the case of Bittorrent, where files are split into "chunks", a server does not require the whole file to participate. Much like one would expect a traditional server to keep track of which node is accessing it, P2P nodes do the same.
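As a rough illustration of how a node can serve chunks without holding the whole file, the sketch below splits some bytes into pieces, records a SHA-1 digest for each piece (the per-piece hashing Bittorrent uses), and lets a peer holding only two pieces hand them out in a verifiable way. The tiny piece size, the `PartialPeer` class and the helper names are assumptions made for the example, not part of any real client.

```python
import hashlib

PIECE_SIZE = 4  # bytes; unrealistically small so the example is easy to follow


def split_into_pieces(data, piece_size=PIECE_SIZE):
    """Cut the file contents into fixed-size pieces (the last may be shorter)."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def piece_digests(pieces):
    """One SHA-1 digest per piece, as published in a torrent's metadata."""
    return [hashlib.sha1(p).digest() for p in pieces]


class PartialPeer:
    """Hypothetical peer that holds only some of the pieces."""

    def __init__(self, held):
        self.held = dict(held)  # piece index -> piece bytes

    def serve(self, index):
        # Returns None when this peer does not have the requested piece.
        return self.held.get(index)


def verify(piece, index, digests):
    """Accept a piece only if its digest matches the published one."""
    return piece is not None and hashlib.sha1(piece).digest() == digests[index]


data = b"ubuntu-iso-bytes"            # stand-in for a real file
pieces = split_into_pieces(data)
digests = piece_digests(pieces)

peer = PartialPeer({1: pieces[1], 3: pieces[3]})  # holds half the file
ok_piece = verify(peer.serve(1), 1, digests)      # genuine piece is accepted
missing = peer.serve(0)                           # piece this peer lacks
fake_ok = verify(b"fake", 1, digests)             # substituted piece is rejected
```

The peer never needs pieces 0 and 2 to be useful to the swarm, and a downloader can check each piece it receives independently of who supplied it.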

For years researchers have tried to model the behaviour of P2P networks in order to gain an understanding of peer distributions, uptimes and how the network copes with the transient nature of its membership, known as churn. The way researchers have gone about this task is to create crawlers to walk the network, identifying nodes by their IP addresses. Depending on the area of research, the lifetime of a node and what files it was offering to other nodes were also recorded.
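A minimal sketch of the bookkeeping such a crawler implies is shown below. It assumes the crawler produces periodic snapshots, each a timestamp plus the set of IP addresses seen, and derives per-node sessions and a simple churn figure from them. The function names, the snapshot format and the sample addresses (taken from the documentation range 192.0.2.0/24) are assumptions for illustration, not the method of any particular study.

```python
from collections import defaultdict


def observed_sessions(snapshots):
    """Group sightings into sessions.

    snapshots: list of (timestamp_seconds, set_of_ips), sorted by time.
    Returns {ip: [(first_seen, last_seen), ...]}, where a session is a run
    of consecutive snapshots in which the IP appeared.
    """
    sessions = defaultdict(list)
    open_sessions = {}  # ip -> (first_seen, last_seen)
    for ts, ips in snapshots:
        # Close the session of every node that has dropped out of view.
        for ip in list(open_sessions):
            if ip not in ips:
                sessions[ip].append(open_sessions.pop(ip))
        # Start or extend the session of every node seen this round.
        for ip in ips:
            first, _ = open_sessions.get(ip, (ts, ts))
            open_sessions[ip] = (first, ts)
    # Sessions still open at the end of the crawl are recorded as observed.
    for ip, span in open_sessions.items():
        sessions[ip].append(span)
    return dict(sessions)


def churn_between(prev_ips, curr_ips):
    """Fraction of nodes in the previous snapshot missing from the current one."""
    if not prev_ips:
        return 0.0
    return len(prev_ips - curr_ips) / len(prev_ips)


snapshots = [
    (0, {"192.0.2.1", "192.0.2.2"}),
    (60, {"192.0.2.1"}),
    (120, {"192.0.2.1", "192.0.2.2"}),
]
sessions = observed_sessions(snapshots)
churn = churn_between(snapshots[0][1], snapshots[1][1])
```

Note that a crawler only sees a node when it polls, so the measured lifetimes are lower bounds: a node that leaves and rejoins between two snapshots looks as though it never left.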

Prior to Bittorrent, research had been carried out on earlier file-sharing networks. Saroiu, back in 2002, managed to gather not only the IP addresses of Napster peers but also detailed knowledge of which files each peer was sharing. The authors claimed they managed to grab somewhere in the region of 40 to 60 per cent of all the peers on a particular Napster server.

Though Napster wasn't a P2P network, the same kind of data was harvested through the Gnutella network, a bona fide "unstructured" P2P network. Here the authors claimed to gather between 8,000 and 10,000 peers in just two minutes. This, they claimed, represented anything between 25 and 50 per cent of the nodes on the network at that time.

Another study, carried out in 2003, focused on the Overnet network, better known to most through its sister network, Edonkey. Bhagwan ran a crawler to harvest information and a "prober" to check whether nodes were still active on the P2P network. The research managed to capture 84,000 host IDs in a 24-hour period.

While those networks have seen their popularity recede or disappear completely, research has focused on Bittorrent in the past five years. The difference with Bittorrent is that one doesn't require a bespoke crawler to see other peers peddling chunks in the swarm. Bittorrent clients such as µTorrent will display information such as throughput, IP addresses, client version and even IP geo-location, all automatically.

Bittorrent is widely used as a low cost distribution method for legitimate software such as Linux distributions. Using the protocol to acquire the latest Ubuntu ISO, it wasn't long before we were able to see several hundred other nodes, with their IP addresses, taking part in the swarm. Using built-in IP geo-location, it was easy to find out, roughly, where other nodes were located.

Of course, none of this should be particularly surprising or shocking to anyone who has used the Internet for any length of time. Clients such as µTorrent make it relatively simple to view this information but programs such as Netstat or Wireshark can provide lower level data regardless of protocol.

It's a shame then that the real conclusion from Blond was mis-reported. The finding of the research was not the ability to monitor real-time node participation in P2P networks, something that was demonstrated a decade previously, but rather the reliance of a P2P protocol on relatively few nodes to inject data into the system.

The notion of "super nodes" has been around for many years. These are nodes that have server-like characteristics, high availability and bandwidth, and are often used in P2P deployments to bootstrap the swarm. In most cases super nodes aren't standard computers but rather servers hosted by an ISP within a datacentre.

Some of these servers are rented by users for use as "seedboxes" to maintain healthy upload and download ratios on private torrent tracker sites. However, this opens up the question of swarms relying on a select few hosts to deliver content, and of the consequences for the distribution of content if these nodes suddenly go offline or, worse still, become compromised.

The fluctuating membership in P2P networks is known as churn. In "high churn" scenarios, a significant majority of nodes can "disgracefully" leave the network, with remaining nodes unaware of the departures. Nodes remaining would then waste time and bandwidth trying to contact nodes that they believe are still online. The move to decentralised tracking mechanisms, such as distributed hash tables (DHT), by Bittorrent tracker sites will exasperate the situation.

While The Pirate Bay might think DHTs will help the tracker evade the unbounded reach of US law enforcement, for the tracker, the development might lead to its downfall. DHTs were never designed for environments where volatile node membership exists. Plainly put, even with the current state of the art, such as that proposed by Terpstra, significant node departures can cripple a DHT system. A professor who co-wrote the paper that first proposed DHTs told The INQUIRER that DHTs simply cannot work in high churn environments.

His point stems from the strategies used to mitigate performance degradation in DHTs, which typically centre around replicating content. Deciding what content should be replicated, and how many times, is an NP-complete problem, and the approaches that tackle it result in strategies that do not scale.

All this brings up a very worrying question. What happens if so-called "copyright holders" actively take part in swarms and leave suddenly? P2P protocols, especially Bittorrent, have built-in resilience to such departures so the show will go on, but there will be a pause. How significant that pause is depends entirely upon how reliant the swarm is on particular nodes.

From the significant dataset Blond and his co-authors collected, over 50 per cent of the top 20 "content providers" came from just two networks. It is that low level of relative decentralisation that is moving Bittorrent towards the client-server paradigm used by Napster and others, and which could be exploited to disrupt swarms.
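To make that kind of concentration figure concrete, the sketch below computes the share of a list of top providers that sits in the k most common networks. The network labels and counts are invented for the example and are not taken from Blond's dataset; the function name is likewise an assumption.

```python
from collections import Counter


def top_network_share(provider_networks, k=2):
    """Share of providers hosted in the k most common networks.

    provider_networks: list with one network label per provider.
    """
    if not provider_networks:
        return 0.0
    counts = Counter(provider_networks)
    in_top_k = sum(n for _, n in counts.most_common(k))
    return in_top_k / len(provider_networks)


# Made-up data: 20 providers, 7 in one network, 4 in another,
# and the remaining 9 each in a network of their own.
providers = ["net-A"] * 7 + ["net-B"] * 4 + [f"net-{i}" for i in range(9)]
share = top_network_share(providers, k=2)
```

With this invented sample, 11 of the 20 providers fall within two networks, a share of 0.55. The closer the figure gets to 1, the more a swarm's supply of new content depends on a handful of hosting networks staying online.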

The findings by Blond are given extra credence by the enormous number of IP addresses collected, 148 million in total. It tends to suggest that P2P is far from being a decentralised mechanism for distributing data; rather, it's at the behest of a 'rich club' of nodes that are posting new content and doing the majority of work peddling that content.

It isn't the lack of anonymity that users should lose sleep over, but rather the realisation that Bittorrent, currently the most popular P2P protocol on the Internet, is propped up by a few networks.

While Bittorrent might be a legitimate P2P protocol, the behaviour of those who want to give more than their fair share could push the service towards a client-server paradigm and help those who want it shut down. µ

 


Comments
Helpful hint it's hoped

@David Schwartz Go find some christian republican blog to comment please, thanks, this is not the place for you, this site isn't even american you know? It's a scary socialist-europe site, RUN FOR YOUR LIFE.

posted by : W.-, 17 May 2010
Any Disruption Would Be Temporary

With other networks, it was possible to fool downloaders with fake content. BitTorrent is a bit more resistant to that, with its integrated hash digests of every single chunk of every download, allowing fake substitutions to be spotted quickly and rejected.

All this “churn” business can achieve is to delay the download, not stop it completely. If the technique starts achieving any results at all, developers of client software will simply come up with techniques to detect such attempts at sabotage and switch to the more reliable peers.

Yes, it’s an arms race. But given the built-in robustness of the protocol to begin with, I think the advantage is to the network, not to those trying to block it.

posted by : Lawrence D'Oliveiro, 15 May 2010
This article doesn't seem to make any sense.

So the downfall of Bittorrent is that it might be hard to use it to break the law? That's like saying that the only problem with smart guns is that criminals can't steal them from cops and then use them to shoot innocent people.

posted by : David Schwartz, 14 May 2010
So what?

If I'm reading this right, what is the significance of the structure itself here? Super-seeders sound like a vulnerable point in the network only if you see the network as a purely technical apparatus. However, it isn't a purely technical phenomenon, it's remarkably social. I think a big problem with this research is that it was done on public networks, and those aren't typical of file-sharing right now. In the closed networks, super-seeders are figures of authority that give structure to a social network. They're often administrators, they set ratio rules, make standards, and set the tone for sharing practices. Though public BT networks are often larger, they're less organized and won't tell you a damn thing about how people use the tools; they'll just tell you how the tools can work.

Then there's also the issue of dated research here. A great deal of public file-sharing has left BT for hosting on sites like RapidShare or Megaupload. The bandwidth that they have is sufficient for moderately large files, but again, the organization of it is done largely in specific communities and isn't easily quantified.

posted by : flipmode, 13 May 2010
Peer Exchange

What about peer exchange, did they even consider that? This article did not go into depth about the problems with using DHT as the sole source of peers, so I cannot comment completely.

But with peer exchange, from my understanding you would only have to find one peer (A) in a swarm with DHT.
Then peer A can give you the IPs of B, C and D, and those three peers can give you the IPs of their peers, etc., until you have enough peers.
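A rough sketch of that idea, assuming each peer will hand over the list of peers it knows when asked: start from one peer and walk outwards breadth-first until enough peers are found. The `ask_for_peers` callback is a hypothetical stand-in for a real peer exchange message, and the small graph is made up for the example.

```python
from collections import deque


def discover_peers(start_peer, ask_for_peers, wanted):
    """Breadth-first peer discovery starting from a single known peer.

    ask_for_peers(peer) returns the peers that peer knows about
    (hypothetical callback standing in for a peer exchange request).
    """
    found = [start_peer]
    seen = {start_peer}
    queue = deque([start_peer])
    while queue and len(found) < wanted:
        peer = queue.popleft()
        for neighbour in ask_for_peers(peer):
            if neighbour not in seen:
                seen.add(neighbour)
                found.append(neighbour)
                queue.append(neighbour)
                if len(found) >= wanted:
                    break
    return found


# Made-up swarm: who knows whom.
graph = {
    "A": ["B", "C", "D"],
    "B": ["E"],
    "C": ["A", "F"],
    "D": [],
}
five = discover_peers("A", lambda p: graph.get(p, []), wanted=5)
everyone = discover_peers("A", lambda p: graph.get(p, []), wanted=10)
```

The catch is that every answer depends on the peer you ask still being online, so heavy churn hurts this approach too.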

I think the comment about the limited amount of sources at the top might be talking about release groups and their private servers? These servers are not part of the bittorrent network and so should not be considered.

In that case you would be considering the stability of the filtering of content down from these groups to the public.

posted by : Krz, 12 May 2010
Magnet Links

Read about it.

posted by : turtle, 11 May 2010
".torrent" file Vs "ed2k://" google cache

Before the hosting server jargon, this is the difference/demerit: a ".torrent" file vs an "ed2k://" URL, which is findable even in Google cache. One is a file while the other is a hyperlink.

posted by : Muhammad Imran/mi1400, 11 May 2010
Lessons

Does it really matter? According to all the new laws created by the big companies and implemented by their political 'employees' we'll soon all be guilty, regardless if you do anything like file sharing, such details do not matter, point is that you are a bad slave and must learn a lesson just for illegally thinking you are in a democratic society alone already.

posted by : W.-, 11 May 2010
But if they do that....

But if the "Copyright Holders" knowingly, willingly and significantly assist the distribution of their own "Intellectual Property" it can be seen by both the dark side and the light side (up to you to decide which is which). On their next lawsuit would it not be sufficient for the defendants to request copies of all communications with Hosting service X as part of discovery? If they manage to distance themselves so well that there is no paper trail to link them at least the defence could question why they went to the trouble of establishing the identity of the disabled grandmother who may have downloaded 3 songs when with a simple whois lookup they could identify the hosting service responsible for feeding potentially millions of songs.

I don't see how you can be found guilty of taking something that was being given away.

Oh and by the way I was going to say that you EXACERBATE the situation and EXASPERATE your readership but I see there is an obsolete usage of exasperate which means to make more grievous so I'll say nothing...

posted by : Grammar Nazi, 10 May 2010
Who are those few with fast servers?

"They discovered the the [sic] vast majority of the material on BitTorrent started with a relatively small number of individuals."

I've been wondering about "super-seeders" for a while. Obviously they have: a) access to large libraries of material, b) high bandwidth, c) plenty of time on their hands even if largely automated, d) some compelling reason.

Those suggest the following in various degrees:
1) True nuts obsessed with media (doesn't even have to be pornography!).
2) Sysadmins on big Linux / Unix systems with little to actually do...
3) Deliberate seeding by industry for purposes of providing substance to claims of piracy.

I lean toward #3 as most likely, though true nuts surely exist. But for industry there's direct financial return *now* through increased sales of otherwise obscure material, indirect but real, it's advertising; PLUS it's an investment to obtain future power in locked-down content and internet control, unquestionably a goal of media industries.

posted by : bigger_luddite, 10 May 2010