Martin Trautmann wrote:
Hi all,
I'm not yet sure about the real purpose of the wikidata project.
Currently I feel a (or the?) main part is the data entry.
I welcome a reasonable and simplified approach. I guess a form-based entry is the most obvious solution. But I'm not sure about the appropriate wiki syntax for the 'expert' mode, where everything is open to extended definitions. I feel the syntax should be closer to XML than to HTML commands, since XHTML2 is becoming more and more stable. I guess there is not much point in implementing a new syntax without keeping upcoming standards in mind.
With Wikidata, you are talking about relational data. This means that the data in a Wikidata project is presented in a fixed format, and there will be constraints on what can be entered where. Entering data using SQL will be possible for those who have database access. Typically, batch data will be entered using XML. For Ultimate Wiktionary, we are considering TBX. The import and export of batch data is separate from the base functionality.
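To illustrate what "constraints on what can be entered where" can mean in a relational store, here is a minimal sketch using Python's built-in sqlite3 module. The tables `language` and `expression`, their columns, and the helper `try_insert` are hypothetical and only serve as an example; they are not the actual Wikidata or Ultimate Wiktionary design.

```python
import sqlite3

# Hypothetical schema for illustration only; not the real Wikidata design.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.execute("CREATE TABLE language (code TEXT PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE expression (
        id INTEGER PRIMARY KEY,
        spelling TEXT NOT NULL CHECK (length(spelling) > 0),
        language_code TEXT NOT NULL REFERENCES language(code)
    )
    """
)
conn.execute("INSERT INTO language (code) VALUES ('nl')")


def try_insert(spelling, language_code):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO expression (spelling, language_code) VALUES (?, ?)",
            (spelling, language_code),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this schema, `try_insert("fiets", "nl")` is accepted, while an empty spelling or a language code that is not in `language` is rejected by the database itself rather than by the entry form.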
However, I expect a rather different result from wikidata: I'd like to access not only one fixed set of data, but I'd expect an overview of multiple records, presented as a table.
I want to filter many records and limit the search to a subset. I want to sort those records by one or multiple sort fields.
There are many more conclusions drawn from this expectation, such as
- search and replace operation on multiple records
- normalization of entries
- language conversion
- display variations (e.g. abbreviated, columns with markers instead of field contents, html-table or preformatted text, reordering of columns, subset of columns, transformation, ...)
- maybe one could even implement spreadsheet like field entry options
- custom sort orders
- search ranges (e.g. depending on field types)
- ...
As it is a relational database, including a translation for "Dutch" in a language implies that the user interface of that language will use that translation instead of "Dutch". Normalisation of entries is implicit in a relational database. Language conversion is possible if the data design allows for it. Several projects that make use of this have been described on Meta, e.g. http://meta.wikimedia.org/wiki/Using_Ultimate_Wiktionary_for_Commons Ultimate Wiktionary does this explicitly (see the ERD).
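The point about the translation of "Dutch" can be sketched as a simple lookup. This is only an illustration under assumptions: the table `language_name`, its columns, and the function `language_label` are made up for the example and do not come from the ERD.

```python
import sqlite3

# Hypothetical table: one row per (language being named, interface language).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE language_name (
        language_code TEXT NOT NULL,  -- the language being named
        ui_language   TEXT NOT NULL,  -- the interface language the name is written in
        name          TEXT NOT NULL,
        PRIMARY KEY (language_code, ui_language)
    );
    INSERT INTO language_name VALUES ('nl', 'en', 'Dutch');
    INSERT INTO language_name VALUES ('nl', 'de', 'Niederländisch');
    """
)


def language_label(language_code, ui_language, fallback_ui="en"):
    """Name of a language in the interface language, falling back to English,
    and finally to the bare language code if nothing is stored."""
    row = conn.execute(
        """
        SELECT name FROM language_name
        WHERE language_code = ? AND ui_language IN (?, ?)
        ORDER BY ui_language = ? DESC  -- prefer the requested interface language
        LIMIT 1
        """,
        (language_code, ui_language, fallback_ui, ui_language),
    ).fetchone()
    return row[0] if row else language_code
```

Until someone enters a French name, `language_label("nl", "fr")` returns the English "Dutch"; once a row for ('nl', 'fr') is added, the French interface picks it up without any further change.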
It would be nice to have all the things you describe, such as spreadsheet-like functionality, custom sort orders etc. However, we are talking about a server-side application. This kind of functionality requires huge amounts of processing power and disk I/O, and I do not think it will prove to be practical. We will have some fixed functionality, and we should be happy when we achieve that.
But the main request here is: wikidata should be able to merge info from multiple sources into one table overview.
As Wikidata will be relational, there will not be a single table overview. Merging data into one repository, however, is very much the goal.
I know about several projects which work this way on Wikipedia. One of them filters all Wikipedia geo coordinates and puts them into one extracted format. I don't know what happens afterwards: Could I see this extracted info in Wikipedia/Wikidata itself? Could I limit my view of the full set of data (several thousand records) to a subset (e.g. an area within a certain range)? Could I fix errors in this subset? And would those modifications be propagated back to the original sources?
When geodata becomes a Wikidata project, the data will be available in a database. This database should be language neutral so that the data is usable in all our projects. This will be its first application: sharing data and making data available to all projects. Providing other types of access to the database is a next step. Given the problems that we have with keeping our search engine online at all times, I think that would have priority over all kinds of nifty search functionality on Wikidata data.
Fixing data in Wikidata projects, that is adding, modifying and removing it, is very much the object of the exercise. As this is database information, changes should be reflected instantly.
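As a rough sketch of what limiting the view to an area and fixing an error in place could look like once the coordinates live in one shared table: the table `place`, the functions `places_in_box` and `fix_longitude`, and the sample rows (including the deliberately wrong longitude) are assumptions for illustration, not an existing Wikidata interface.

```python
import sqlite3

# Hypothetical shared geodata table; sample rows are for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE place ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, lat REAL NOT NULL, lon REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO place (name, lat, lon) VALUES (?, ?, ?)",
    [
        ("Amsterdam", 52.37, 4.89),
        ("Rotterdam", 51.92, 4.48),
        ("Berlin", 52.52, 31.40),  # longitude deliberately mistyped (should be 13.40)
    ],
)


def places_in_box(south, north, west, east):
    """Return (name, lat, lon) rows inside a latitude/longitude box, sorted by name."""
    return conn.execute(
        "SELECT name, lat, lon FROM place "
        "WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ? ORDER BY name",
        (south, north, west, east),
    ).fetchall()


def fix_longitude(name, lon):
    """Correct a longitude in the shared table; every later query sees the new value."""
    conn.execute("UPDATE place SET lon = ? WHERE name = ?", (lon, name))
    conn.commit()
```

Because there is only one copy of the data, a correction such as `fix_longitude("Berlin", 13.40)` shows up in the next call to `places_in_box(51, 53, 4, 14)`; there is no separate extract that has to be synchronised back.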
I guess, it COULD be done - in theory. I don't know yet whether this is one of the goals of wikidata.
Thanks, Martin