Great question!

The high-level answer is: many more than most people assume. The main challenge in calculating this number is missing interlanguage links. In 2010, for instance, we found that the English Wikipedia covered only about 41% of the concepts in the Japanese Wikipedia, with a missing interlanguage link rate of only 2% (as determined by two bilingual coders going through 150 sample articles). The equivalent figure for Italian was 65%, with an 8% missing interlanguage link rate.

We have more detailed and up-to-date results (that are also more complicated), but this gets across the general idea.

There is also the matter of article-level diversity (e.g., what gets covered about a given concept in different language editions), but this is an issue for another day.

On 4/1/2012 2:21 PM, emijrp wrote:
Hi Brent. How many articles exist in other Wikipedias and don't have an English translation at English Wikipedia? Any estimate?

2012/3/31 Brent Hecht <brent@u.northwestern.edu>
Hello Wikidata Folks,

My name is Brent Hecht, and I've done a great deal of research on the differences and similarities between the language editions. I was really excited to hear about the Wikidata project moving forward, and I think some of my research might be of assistance. I'd enjoy being able to help the community make this important transition.

In particular, my experience navigating interlanguage link conflicts might be able to help in Phase 1. Please let me know if there's anything I can do over the short term or long term!

Some of my relevant papers:

[1] Bao, P., Hecht, B., Carton, S., Quaderi, M., Horn, M. and Gergle, D. 2012. Omnipedia: Bridging the Wikipedia Language Gap. CHI ’12: 30th International Conference on Human Factors in Computing Systems (2012).
[2] Hecht, B. and Gergle, D. 2010. The Tower of Babel Meets Web 2.0: User-Generated Content and Its Applications in a Multilingual Context. CHI ’10: 28th International Conference on Human Factors in Computing Systems (Atlanta, GA, 2010), 291–300.
[3] Hecht, B. and Gergle, D. 2009. Measuring Self-Focus Bias in Community-Maintained Knowledge Repositories. Communities and Technologies 2009: 4th International Conference on Communities and Technologies (State College, PA, 2009), 11–19.

- Brent
Brent Hecht
Ph.D. Candidate in Computer Science
CollabLab: The Collaborative Technology Laboratory
Northwestern University
w: http://www.brenthecht.com
e: brent@u.northwestern.edu







_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l



