Martin Trautmann wrote:
Hi all,
I'm not yet sure about the real purpose of the wikidata project.
Currently I feel a (or the?) main part is the data entry.
I welcome a reasonable and simplified approach. I guess, a
form based entry is the most obvious solution. But I'm not sure about the
appropriate Wiki syntax for the 'expert' mode where everything is open to
extended definitions. I feel the syntax should be closer to XML than to
HTML commands, since XHTML2 is becoming more and more stable. I guess
there is little point in implementing a new syntax without keeping upcoming
standards in mind.
With Wikidata, you are talking about relational data. That means the
data is stored and presented in a fixed format. There will be
constraints on what can be entered where. Entering data using SQL will be
possible for those who have database access. Typically batch data will
be entered using XML. For Ultimate Wiktionary, we are considering TBX.
The import and export of batch data is separate from the base functionality.
However, I expect a rather different result from wikidata: I'd like to
access not only one fixed set of data, but I'd expect an overview of
multiple records, presented as a table.
I want to filter many records and limit the search to a subset.
I want to sort those records by one or multiple sort fields.
There are many more conclusions drawn from this expectation, such as
- search and replace operation on multiple records
- normalization of entries
- language conversion
- display variations (e.g. abbreviated, columns with markers
instead of field contents, html-table or preformatted text,
reordering of columns, subset of columns, transformation, ...)
- maybe one could even implement spreadsheet like field entry options
- custom sort orders
- search ranges (e.g. depending on field types)
- ...
As it is a relational database, adding a translation for "Dutch" in a
given language implies that the user interface for that language will
show that translation instead of "Dutch". Normalisation of entries is
implicit in a relational database. Language conversion is possible if
the data design allows for it. Several projects that make use of this
have been described on Meta, e.g.
http://meta.wikimedia.org/wiki/Using_Ultimate_Wiktionary_for_Commons
Ultimate Wiktionary does this explicitly (see the ERD).
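To make the "Dutch" example concrete, here is a minimal sketch of how such a
lookup could work. The table and column names (language, language_name,
ui_language) and the helper display_name are made up for illustration; they
are not taken from the Ultimate Wiktionary ERD. The sketch assumes the name
is stored once per language and falls back to the English default when no
translation exists.

```python
import sqlite3

# Hypothetical schema for illustration only; not the real Wikidata or
# Ultimate Wiktionary design.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE language (
        id INTEGER PRIMARY KEY,
        english_name TEXT NOT NULL
    );
    CREATE TABLE language_name (
        language_id INTEGER NOT NULL REFERENCES language(id),
        ui_language TEXT NOT NULL,
        name TEXT NOT NULL,
        PRIMARY KEY (language_id, ui_language)
    );
    INSERT INTO language VALUES (1, 'Dutch');
    INSERT INTO language_name VALUES (1, 'nl', 'Nederlands');
    INSERT INTO language_name VALUES (1, 'it', 'olandese');
""")


def display_name(conn, language_id, ui_language):
    """Return the translated name if one exists, else the English default."""
    row = conn.execute(
        """
        SELECT COALESCE(t.name, l.english_name)
        FROM language AS l
        LEFT JOIN language_name AS t
            ON t.language_id = l.id AND t.ui_language = ?
        WHERE l.id = ?
        """,
        (ui_language, language_id),
    ).fetchone()
    return row[0] if row else None


print(display_name(conn, 1, "nl"))  # translated name for a Dutch UI
print(display_name(conn, 1, "fr"))  # no French entry, so the default is used
```

Because every interface reads the same row, adding one translation changes
what all pages in that language display, which is the normalisation
described above.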
It would be nice to have all the things you describe, such as
spreadsheet-like functionality, custom sort orders etc. We are however
talking about a server-side application. This kind of functionality
requires huge amounts of processing power and disk I/O, so I do not think
it will prove to be practical. We will have some fixed functionality and
we should be happy when we achieve that.
But the main request here is: wikidata should be able to merge info from
multiple sources into one table overview.
As Wikidata will be relational, the result will not be a single table overview.
Merging data into one repository, however, is very much the goal.
I know about several projects which work this way on wikipedia. One of
them is to filter all wikipedia geo coordinates and put them into one
extracted format. I don't know about the operation after: Could I see this
extracted info in wikipedia/wikidata itself? Could I limit my view of the
full set of data (several thousand records) to a subset, e.g. to an area
within a certain range? Could I fix errors in this subset? And
would those modifications be updated back to the original sources?
When geodata becomes a Wikidata project, the data will be available in a
database. This database should be language neutral so that the data
is usable in all our projects. This will be its first
application: sharing data and making data available to all projects.
Providing other types of access to the database is a next step. Given
the problems that we have with keeping our search engine online at all
times, I think that would have priority over all kinds of nifty search
functionality on Wikidata data.
Fixing data in Wikidata projects (adding, modifying and removing) is very
much the object of the exercise. As this is database information, changes
should be reflected instantly.
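The geodata questions above (limit the view to an area, sort it, fix an
error and see the fix at once) can be sketched against a small database.
This is only an illustration under assumptions: the place table, its
columns, the sample rows and the helper places_in_box are invented here, and
a plain latitude/longitude box stands in for "an area within a certain
range".

```python
import sqlite3

# Hypothetical table and sample data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE place ("
    "id INTEGER PRIMARY KEY, name TEXT, country TEXT, lat REAL, lon REAL)"
)
conn.executemany(
    "INSERT INTO place (name, country, lat, lon) VALUES (?, ?, ?, ?)",
    [
        ("Amsterdam", "NL", 52.37, 4.90),
        ("Rotterdam", "NL", 51.92, 4.48),
        ("Berlin", "DE", 52.52, 13.40),
        ("Cologne", "DE", 50.94, 69.6),  # deliberate typo in the longitude
    ],
)


def places_in_box(conn, south, north, west, east):
    """Return names inside the box, sorted by country and then by name."""
    rows = conn.execute(
        """
        SELECT name FROM place
        WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ?
        ORDER BY country, name
        """,
        (south, north, west, east),
    ).fetchall()
    return [row[0] for row in rows]


# Cologne is missing from the view because of its wrong longitude.
before = places_in_box(conn, 50.0, 53.0, 4.0, 8.0)

# Fix the error in the shared table; the next query reflects it immediately.
conn.execute("UPDATE place SET lon = ? WHERE name = ?", (6.96, "Cologne"))
after = places_in_box(conn, 50.0, 53.0, 4.0, 8.0)

print(before)
print(after)
```

Since every project would read the same table, there is no separate
"original source" to update afterwards: the correction is the source.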
I guess, it COULD be done - in theory. I don't know yet whether this is
one of the goals of wikidata.
Thanks,
Martin