stephen.news

hypertext, words and more

Internet Archive

  • Nina Strochlic at National Geographic writes:

    Between 1950 and 2010, 230 languages went extinct, according to the UNESCO Atlas of the World’s Languages in Danger. Today, a third of the world’s languages have fewer than 1,000 speakers left. Every two weeks a language dies with its last speaker, 50 to 90 percent of them are predicted to disappear by the next century.

    In rare cases, political will and a thorough written record can resurrect a lost language. Hebrew was extinct from the fourth century BC to the 1800s, and Catalan only bloomed during a government transition in the 1970s. In 2001, more than 40 years after the last native speaker died, the language of Oklahoma’s Miami tribe started being learned by students at Miami University in Ohio. The internet has connected rare language speakers with each other and with researchers. Even texting has helped formalize languages that don’t have a set writing system.

    Other languages have not been so lucky in a post-internet world. Many will never return from extinction. But it’s true that, being more connected, we have more opportunities to connect and preserve our ancestral dialects and languages. In National Geographic’s article, they share a video of two surviving speakers of Gottscheerish.

    For more information, check out WikiTongues, the seed bank of the world’s languages.

  • On this blog, I rant a lot about the health of the web. Today is no different.

    The Internet Archive is doing the lord’s work (if you will). Wikipedia, a champion of the web, is unsurprisingly cut from the same cloth. Both non-profits run a tight ship, powered by bootstrapping, bleeding-edge technology, and undoubtedly some of the brightest contributors, engineers and computer scientists of our lifetime. They owe us nothing, and we owe them everything. They work tirelessly to realize the true goal of the web: a boundless continuum of mankind’s knowledge, like the edge of forever.

    It’s almost always a sad sight coming across a dreaded 404 page. Perhaps more so when you’ve gone down the rabbit hole on Wikipedia. But the Internet Archive and Wikipedia are teaming up to make the web a little bit nicer these days:

    For more than 5 years, the Internet Archive has been archiving nearly every URL referenced in close to 300 wikipedia sites as soon as those links are added or changed at the rate of about 20 million URLs/week.


    And for the past 3 years, we have been running a software robot called IABot on 22 Wikipedia language editions looking for broken links (URLs that return a ‘404’, or ‘Page Not Found’). When broken links are discovered, IABot searches for archives in the Wayback Machine and other web archives to replace them with. Restoring links ensures Wikipedia remains accurate and verifiable and thus meets one of Wikipedia’s three core content policies: ‘Verifiability’.


    To date we have successfully used IABot to edit and “fix” the URLs of nearly 6 million external references that would have otherwise returned a 404. In addition, members of the Wikipedia community have fixed more than 3 million links individually. Now more than 9 million URLs, on 22 Wikipedia sites, point to archived resources from the Wayback Machine and other web archive providers.

    This is a massive achievement by any measure. Even setting aside the Internet Archive’s efforts with IABot, the Wikipedia community alone repaired more than 3 million 404s. That’s astounding. One can only hope to see the number of 404s referenced in Wikipedia diminish even further as the years unfold.
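
    For the curious, the loop described in the quote (spot a 404, look for an archived copy, swap it in) can be sketched in a few lines of Python. This is only a rough illustration, not how IABot actually works. It assumes the Wayback Machine's public availability endpoint at archive.org/wayback/available and the JSON shape it is documented to return; the function names are my own and purely hypothetical.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Assumption: the Wayback Machine's public availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def is_broken(url, timeout=10.0):
    """Return True only when the server answers with HTTP 404."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return False
    except urllib.error.HTTPError as err:
        return err.code == 404
    except urllib.error.URLError:
        # Unreachable is not the same as "Page Not Found"; leave it alone.
        return False


def availability_query(url):
    """Build the lookup URL for the availability endpoint."""
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode({"url": url})


def closest_snapshot(payload):
    """Pull the archived URL out of a response, or None if there is none."""
    # Assumed response shape: {"archived_snapshots": {"closest": {...}}}
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available") and closest.get("status") == "200":
        return closest.get("url")
    return None


def find_replacement(url, timeout=10.0):
    """Return an archived URL for a broken link, or None if not needed/found."""
    if not is_broken(url, timeout):
        return None
    with urllib.request.urlopen(availability_query(url), timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))
```

    Calling find_replacement on a dead link would return the closest archived snapshot, if one exists. The real bot surely handles far more (rate limits, soft 404s, other archives, the actual wiki edits), so treat this as a napkin sketch.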