stephen.news

hypertext, words and more

Over 9 Million Broken Links on Wikipedia Are Now Rescued

On this blog, I rant a lot about the health of the web. Today is no different.

The Internet Archive is doing the lord’s work (if you will). Wikipedia, a champion of the web, is expectedly cut from the same cloth. Both are non-profits that run a tight ship, powered by bootstrapped budgets, bleeding-edge technology and undoubtedly some of the brightest contributors, engineers and computer scientists of our lifetime. They owe us nothing, and we owe them everything. They work tirelessly to realize the true goal of the web: a boundless continuum of mankind’s knowledge, like the edge of forever.

Coming across a dreaded 404 page is almost always a sad sight, perhaps even more so when you’ve gone down a Wikipedia rabbit hole. But the Internet Archive and Wikipedia are teaming up to make the web a little bit nicer these days:

For more than 5 years, the Internet Archive has been archiving nearly every URL referenced in close to 300 Wikipedia sites as soon as those links are added or changed, at the rate of about 20 million URLs/week.


And for the past 3 years, we have been running a software robot called IABot on 22 Wikipedia language editions looking for broken links (URLs that return a ‘404’, or ‘Page Not Found’). When broken links are discovered, IABot searches for archives in the Wayback Machine and other web archives to replace them with. Restoring links ensures Wikipedia remains accurate and verifiable and thus meets one of Wikipedia’s three core content policies: ‘Verifiability’.


To date we have successfully used IABot to edit and “fix” the URLs of nearly 6 million external references that would have otherwise returned a 404. In addition, members of the Wikipedia community have fixed more than 3 million links individually. Now more than 9 million URLs, on 22 Wikipedia sites, point to archived resources from the Wayback Machine and other web archive providers.
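For the curious, the core loop described above (spot a 404, look for an archived copy, swap it in) can be sketched in a few lines of Python. This is a minimal sketch, not IABot’s actual code: it assumes the Wayback Machine’s public availability endpoint (`https://archive.org/wayback/available`) is a reasonable stand-in for IABot’s own lookup, and it treats only an HTTP 404 as “broken”. The helper names `is_broken`, `closest_snapshot`, `find_archive` and `rescue` are hypothetical.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, Optional

# Public Wayback Machine availability endpoint (assumed stand-in; IABot's own lookup may differ).
AVAILABILITY_API = "https://archive.org/wayback/available?url="


def is_broken(url: str, timeout: float = 10.0) -> bool:
    """Return True only if the server answers with HTTP 404 'Not Found'."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return False  # the link still resolves (redirects are followed)
    except urllib.error.HTTPError as err:
        # Other HTTP errors are not treated as dead here; network failures
        # (URLError, e.g. an unresolvable host) propagate to the caller.
        return err.code == 404


def closest_snapshot(api_response: dict) -> Optional[str]:
    """Pull the closest archived copy's URL out of an availability API response."""
    closest = api_response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archive(url: str, timeout: float = 10.0) -> Optional[str]:
    """Ask the Wayback Machine for the closest snapshot of `url`, if any."""
    query = AVAILABILITY_API + urllib.parse.quote(url, safe="")
    with urllib.request.urlopen(query, timeout=timeout) as response:
        return closest_snapshot(json.load(response))


def rescue(
    url: str,
    check: Callable[[str], bool] = is_broken,
    lookup: Callable[[str], Optional[str]] = find_archive,
) -> str:
    """Return an archived replacement for a dead link, or the original URL."""
    if not check(url):
        return url  # link is alive, leave it alone
    archived = lookup(url)
    return archived if archived else url  # no archive found: keep the original
```

Passing `check` and `lookup` in as parameters keeps the decision logic testable offline; a real run such as `rescue("https://example.com/some-dead-page")` makes two network calls, one to the dead link and one to the Wayback Machine.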

This is a massive achievement by any measure. Even setting aside the Internet Archive’s work with IABot, the Wikipedia community alone repaired more than 3 million 404s. That’s astounding. Here’s hoping the number of 404s referenced on Wikipedia diminishes even further as the years continue to unfurl before us.