date 2018-10-03

There should be a lot less of this from now on

WIKIPEDIA HAS been made a little bit more accurate, thanks to a five-year mission by The Internet Archive.

The problem with a peer-curated encyclopedia is that most of the editing happens on currently active subjects. This means that as time takes its inevitable toll on dormant entries, more and more of the links stop working.

But thanks to the "Build A Better Web" initiative from the people behind the Wayback Machine, which has been crawling and storing snapshots of webpages since we were all younger and fitter, around 9m of these 404ified links have been updated and now take you to a snapshot of the site in Wayback.

A piece of software called Internet Archive Bot (IABot) has been trawling across pages in 22 languages to find broken links and update their URLs. It has updated 6m links, with the community stumping up the other 3m.

In a blog post, The Internet Archive explains: "When broken links are discovered, IABot searches for archives in the Wayback Machine and other web archives to replace them with. Restoring links ensures Wikipedia remains accurate and verifiable and thus meets one of Wikipedia's three core content policies: 'Verifiability'."
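For the curious, the lookup step described in that quote can be roughed out against the Wayback Machine's public availability endpoint. This is a minimal sketch and not IABot's actual code: the function names and the `cited_on` parameter are made up for illustration, and it assumes the endpoint answers with the usual `archived_snapshots` / `closest` JSON structure.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(dead_url, cited_on=None):
    """Build the lookup URL for a dead link.

    cited_on is a hypothetical name for an optional YYYYMMDD string,
    passed to the endpoint as its 'timestamp' parameter so that the
    snapshot nearest that date is preferred.
    """
    params = {"url": dead_url}
    if cited_on:
        params["timestamp"] = cited_on
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(payload):
    """Return the archived URL from a decoded response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archive(dead_url, cited_on=None):
    """Query the endpoint over the network and return a snapshot URL, if any."""
    with urlopen(build_query(dead_url, cited_on), timeout=10) as resp:
        return closest_snapshot(json.load(resp))
```

Something like `find_archive("example.com", "20061001")` would then return a snapshot URL or `None`. Passing the date a source was cited is one plausible way to ask for the copy nearest to what the editor originally saw.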

So far, in a 10-day testing period, a team from Stanford and EPFL measured clicks on external links from Wikipedia and found that Wayback, with 25,000 clicks, is now the most popular site for clickthroughs, with more than three times as many as the next most popular, Google Books, on the English version alone.

Part of the issue, though, remains that as well as "link rot", where links are broken, there is also "content drift", where the current page has altered so much from what was there when it was cited that it becomes irrelevant.

The trick now will be to make sure that the right snapshot for the right piece of information is used, as well as expanding the scope of the project to include "more web pages, digital books and academic papers". μ