Hello Wikidata Folks,
My name is Brent Hecht, and I've done a great deal of research on the differences and similarities between the language editions of Wikipedia. I was really excited to hear about the Wikidata project moving forward, and I think some of my research might be of assistance. I'd enjoy being able to help the community make this important transition.
In particular, my experience navigating interlanguage link conflicts might be able to help in Phase 1. Please let me know if there's anything I can do over the short term or long term!
Some of my relevant papers:
[1] Bao, P., Hecht, B., Carton, S., Quaderi, M., Horn, M. and Gergle, D. 2012. Omnipedia: Bridging the Wikipedia Language Gap. CHI '12: 30th International Conference on Human Factors in Computing Systems (2012).
[2] Hecht, B. and Gergle, D. 2010. The Tower of Babel Meets Web 2.0: User-Generated Content and Its Applications in a Multilingual Context. CHI '10: 28th International Conference on Human Factors in Computing Systems (Atlanta, GA, 2010), 291–300.
[3] Hecht, B. and Gergle, D. 2009. Measuring Self-Focus Bias in Community-Maintained Knowledge Repositories. Communities and Technologies 2009: 4th International Conference on Communities and Technologies (State College, PA, 2009), 11–19.
- Brent
Brent Hecht
Ph.D. Candidate in Computer Science
CollabLab: The Collaborative Technology Laboratory
Northwestern University
w: http://www.brenthecht.com
e: brent@u.northwestern.edu
On Sat, Mar 31, 2012 at 7:15 PM, Brent Hecht brent@u.northwestern.edu wrote:
In particular, my experience navigating interlanguage link conflicts might be able to help in Phase 1. Please let me know if there's anything I can do over the short term or long term!
Thanks! This does indeed sound helpful for the very first phase of the project. I'd love to have a short chat with you about that. Maybe the week after next? You can use http://doodle.com/nightrose to find a good time if you want.
Cheers,
Lydia
Hi Brent. How many articles exist in other Wikipedias and don't have an English translation at English Wikipedia? Any estimate?
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Great question!
The high-level answer is: many more than most folks assume. The main challenge in calculating this number is missing interlanguage links, but in 2010 we found, for instance, that the English Wikipedia covered only about 41% of the concepts in the Japanese Wikipedia, with a missing interlanguage link rate of only 2% (as determined by two bilingual coders going through 150 sample articles). The equivalent number for Italian was 65%, with an 8% missing interlanguage link rate.
We have more detailed and up-to-date results (that are also more complicated), but this gets across the general idea.
There is also the matter of article-level diversity (e.g., what gets covered about a given concept in different language editions), but this is an issue for another day.
On 4/1/2012 2:21 PM, emijrp wrote:
Hi Brent. How many articles exist in other Wikipedias and don't have an English translation at English Wikipedia? Any estimate?
A little off topic, but do you have any numbers for propagation rates between languages?
On 1. apr. 2012 20.35, "Brent Hecht" brent@u.northwestern.edu wrote: