(Cross-posted from the Google Translate Blog)
To make Wikipedia more useful to speakers of smaller languages, we’re working with volunteers, translators and Wikipedians across India, the Middle East and Africa to translate more than 16 million words for Wikipedia into Arabic, Gujarati, Hindi, Kannada, Swahili, Tamil and Telugu. We began these efforts in 2008 by translating Wikipedia articles into Hindi, a language spoken by tens of millions of Internet users. At that time, the Hindi Wikipedia had only 3.4 million words across 21,000 articles, while the English Wikipedia had 1.3 billion words across 2.5 million articles.
We selected the Wikipedia articles using several criteria. First, we used Google search data to identify the most popular English Wikipedia articles read in India. Next, using Google Trends, we kept the articles that were read consistently over time rather than those that were only temporarily popular. Finally, we focused on articles that either did not exist in the Hindi Wikipedia or existed there only as placeholder articles, or “stubs,” and translated them with Google Translator Toolkit. In three months, using a combination of human and machine translation tools, we translated 600,000 words from more than 100 English Wikipedia articles, growing the Hindi Wikipedia by almost 20 percent. We’ve since repeated this process for other languages, bringing our total to 16 million translated words.
We’re off to a good start, but as the graph below shows, we have a lot more work to do to bring the information in Wikipedia to people worldwide:
Number of non-stub Wikipedia articles per Internet user, normalized (English = 1)
We presented these results last Saturday, July 10, at Wikimania 2010 in Gdańsk, Poland. We look forward to continuing to support the creation of the world’s largest encyclopedia, and we can’t wait to work with Wikipedians and volunteers to create more content worldwide.