Basically I am with Sabine and support the idea.
Yet I want to warn against doing it now and doing it quickly so as
to avoid certain pitfalls.
I suggest instead to develop and train bots, data, and algorithms
with the test Wikipedia only, for the time being, where specific
situations can easily be (re)created without risk of havoc.
What follows are details and reasons; you can safely stop reading
here if you are not interested.
I've been mass-inserting data into the Ripuarian test Wikipedia in
a semi-automated way, data which I compiled from several small
database-like collections, such as:
- names and ISO codes of languages that have Wikipedias,
- dates and mottos of the carnival parades in the city of Cologne
over the last 185 years,
- redirects for dialectal and spelling variants.
So I have some (limited) experience.
Pitfalls to be avoided.
If we have already inserted data into a WP, and later a refined
version of that data becomes available, we want to pass that on
to the WP. This becomes complicated when an article already
exists for a record. Thus we may strategically choose to export
data as late as possible, in as complete a state as possible,
when general additions and amendments have become unlikely and
the data structure is stable.
We can safely replace articles when we can determine that they
have been unaltered since our own last update; that is, we need to
be able to look at the version history for those cases.
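The version-history check above could be sketched like this; the bot
account name and the shape of the revision list are assumptions for
illustration, not any particular bot framework's API:

```python
# Sketch (hypothetical helper): decide whether an article may be safely
# overwritten by checking who made its most recent revision.

BOT_USER = "DataImportBot"  # assumed bot account name

def safe_to_replace(revisions, bot_user=BOT_USER):
    """revisions: newest-first list of (username, timestamp) tuples.

    Safe only if the most recent edit was made by our own bot, i.e.
    nobody has touched the article since our last update.
    """
    if not revisions:
        return True  # page does not exist yet; creating it is always safe
    latest_user, _ = revisions[0]
    return latest_user == bot_user

# Usage: a history as it might come back from a revision query
history = [("DataImportBot", "2006-05-01"), ("SomeEditor", "2006-04-02")]
print(safe_to_replace(history))  # latest edit is ours -> True
```

A real bot would of course fetch the history from the wiki; the point
is only that the decision itself is a one-line comparison.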
When an article has been conventionally updated by an editor, that
may mean they altered data which we originally supplied, and that
we have to update our source before we may re-export data to
WPs again. It is possible that an update made in one WP should
influence others as well, though this is not necessarily so.
When we say we supply only some specific data to an article, e.g.
an infobox, then we can re-read the infobox, and if it has not
been altered, we can rewrite it for an update.
We can also use such infoboxes to import new data from WPs when
they have been altered, e.g. when someone died. We should, however,
have some protection against collecting errors, garbage, and vandal
drivel.
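The round trip described above could work roughly as follows. The
comment markers are a hypothetical convention of my own, not an
existing standard, and a real bot would need the vandalism checks
mentioned above on top of this:

```python
# Sketch: bot-maintained infoboxes wrapped in (assumed) comment markers,
#   <!-- BOT-DATA-START --> ... <!-- BOT-DATA-END -->
# so the bot can find its own block and tell whether an editor changed it.
import hashlib
import re

START, END = "<!-- BOT-DATA-START -->", "<!-- BOT-DATA-END -->"
MARKED = re.compile(re.escape(START) + r"(.*?)" + re.escape(END), re.S)

def extract_infobox(wikitext):
    """Return the bot-maintained block, or None if the markers are missing."""
    m = MARKED.search(wikitext)
    return m.group(1) if m else None

def fingerprint(block):
    """Stable hash of the block, stored by the bot at export time."""
    return hashlib.sha1(block.encode("utf-8")).hexdigest()

def classify(wikitext, stored_hash):
    """'update' -> untouched, safe to rewrite with fresh data;
    'import' -> an editor changed it, harvest the change back into
    our source (after review); 'manual' -> markers gone, hands off."""
    block = extract_infobox(wikitext)
    if block is None:
        return "manual"
    return "update" if fingerprint(block) == stored_hash else "import"
```

The markers double as the in-wikicode documentation I argue for below:
an editor who sees them knows the block is machine-maintained.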
Both such uses should imho be documented by comments in the
wikicode of the articles in question. Editors must know the
implications of their edits.
Summarizing all this, I'd suggest carefully planning, and test-
driving, any application that has even the least chance of being
more than a sheer create-articles-and-then-leave-them-alone-forever
project.
Another field needing attention is language.
A pretty huge number of names (of persons, places, languages, etc.)
are identical between languages, are transliterated somehow, or
undergo systematic transformations (e.g. of the kind that Estonian
versions of male names have 'as' appended to them, afaik), etc.
The rule of thumb is that for lesser-known distant things (places,
languages, persons, etc.) the existence of special or irregular
translations is very unlikely.
That may mean we can compile a set of transformation rules and an
exception-lookup mechanism (e.g. in WiktionaryZ), and pretty
safely assume, when no exception is found, that we can use the
regular transformation.
Naturally, when this assumption fails, we need a feedback
path from the respective language community that allows us to
"repair" errors. Since in most Wikipedias there are editors
reviewing all, or most, new articles, we can assume feedback will
be rather quick and reliable.
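The rule-plus-exception idea above could be sketched as follows. Both
the rules and the exception table here are illustrative assumptions
(the Estonian 'as' pattern is only my rough recollection from above,
not checked linguistic data):

```python
# Sketch: localize a name by consulting an exception table first,
# falling back to a regular per-language transformation rule.

EXCEPTIONS = {
    # (name, target language) -> known irregular translation
    ("Köln", "en"): "Cologne",
}

RULES = {
    # target language -> regular transformation (illustrative only)
    "et": lambda name: name + "as",  # assumed pattern for male names
    "en": lambda name: name,         # default: keep unchanged
}

def localize(name, lang):
    """Exception lookup first; otherwise apply the regular rule."""
    if (name, lang) in EXCEPTIONS:
        return EXCEPTIONS[(name, lang)]
    rule = RULES.get(lang, lambda n: n)
    return rule(name)
```

In practice the exception table would live somewhere shared, e.g. in
WiktionaryZ as suggested above, and be extended via the community
feedback path.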
Finding the right grammar, wording, etc. for automatically
generated non-tabular content is quite an interesting task, which
I'll not address any further here ;-)
Wikis not having alert proofreaders should imho not be filled
with much automated content, since this might be a remarkable
hindrance to community buildup.
The amount of newly inserted automated data should be determinable
by wiki admins, and generally it might be wise to make it somehow
related to the number of edits in any given time period, so as not
to overload the community.
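Tying the quota to community activity could be as simple as this;
the fraction and the floor are hypothetical admin-tunable knobs, not
figures I'm proposing:

```python
# Sketch: cap daily bot insertions at a fraction of the average daily
# number of human edits, with a small floor so tiny wikis still get data.

def daily_insert_quota(human_edits_last_week, fraction=0.10, minimum=5):
    """Return the number of bot insertions allowed per day."""
    avg_daily = human_edits_last_week / 7
    return max(minimum, int(avg_daily * fraction))

print(daily_insert_quota(700))  # 100 human edits/day -> quota of 10
```

The point is only that the knob is trivial to implement once the
admins have agreed on the figures.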
How wiki admins find the right figures should imho be left to
them; valid suggestions might come from public voting, or from
experience of how thoroughly the data can be verified.
Also, keeping data up to date needs imho to be negotiated with the
communities. I bet we'll receive several interesting ideas on how
this could be accomplished without interfering too much with
potential human editors.
Greetings to all
-- e-mail: <wikidata-l.mail.wikimedia.org(a)publi.purodha.net>