Hi all,
I'm not yet sure about the real purpose of the wikidata project.
Currently I feel that a (or the?) main part is data entry.
I welcome a reasonable and simplified approach. I guess a form-based entry is the most obvious solution. But I'm not sure about the appropriate wiki syntax for the 'expert' mode, where everything is open to extended definitions. I feel the syntax should be closer to XML than to HTML commands, since XHTML2 is becoming more and more stable. I doubt it helps much to implement a new syntax without upcoming standards in mind.
However, I expect a rather different result from Wikidata: I'd like to access not just one fixed set of data; I'd expect an overview of multiple records, presented as a table.
I want to filter many records and limit the search to a subset. I want to sort those records by one or more sort fields.
Many more requirements follow from this expectation, such as
- search and replace operations on multiple records
- normalization of entries
- language conversion
- display variations (e.g. abbreviated, columns with markers instead of field contents, HTML table or preformatted text, reordering of columns, subset of columns, transformation, ...)
- maybe even spreadsheet-like field entry options
- custom sort orders
- search ranges (e.g. depending on field types)
- ...
But the main request here is: wikidata should be able to merge info from multiple sources into one table overview.
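To make the request concrete, here is a minimal Python sketch of "merge several sources, then filter and sort the result as one table". It assumes every source shares a hypothetical `id` field to join on; `merge_sources`, `overview`, `render` and the sample data are illustrative only and say nothing about how Wikidata is actually built.

```python
from operator import itemgetter

def merge_sources(*sources, key="id"):
    """Combine rows from several sources into one record per key.

    Later sources add fields to, or overwrite fields of, earlier ones."""
    merged = {}
    for source in sources:
        for row in source:
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())

def overview(records, predicate, sort_fields):
    """Keep the records matching predicate, sorted by one or more fields."""
    subset = [r for r in records if predicate(r)]
    return sorted(subset, key=itemgetter(*sort_fields))

def render(rows, columns):
    """Format the selected columns of rows as a plain-text table."""
    widths = [max([len(c)] + [len(str(r.get(c, ""))) for r in rows]) for c in columns]
    lines = [" | ".join(c.ljust(w) for c, w in zip(columns, widths)),
             "-+-".join("-" * w for w in widths)]
    for r in rows:
        lines.append(" | ".join(str(r.get(c, "")).ljust(w) for c, w in zip(columns, widths)))
    return "\n".join(lines)

# Two hypothetical sources describing the same three items.
source_a = [{"id": 1, "name": "alpha", "country": "DE"},
            {"id": 2, "name": "beta", "country": "NL"},
            {"id": 3, "name": "gamma", "country": "DE"}]
source_b = [{"id": 1, "year": 2003}, {"id": 2, "year": 2001}, {"id": 3, "year": 2001}]

records = merge_sources(source_a, source_b)
german = overview(records, lambda r: r["country"] == "DE", ("year", "name"))
print(render(german, ["name", "year"]))
```

Sorting by a tuple of fields gives the "one or more sort fields" behaviour; a record missing a sort field would raise `KeyError`, so real data would need defaults or validation.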
I know about several projects on Wikipedia that work this way. One of them collects all Wikipedia geo coordinates into one extracted format. What I don't know is what happens afterwards: Could I see this extracted info in Wikipedia/Wikidata itself? Could I limit my view of the full set of data (several thousand records), e.g. to a limited range such as an area within a certain distance? Could I fix errors in this subset? And would those modifications be written back to the original sources?
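Limiting a few thousand extracted coordinates to "an area within a certain range" amounts to a distance filter. A minimal sketch, assuming each record carries hypothetical `lat`/`lon` fields in decimal degrees and using the haversine great-circle formula with a mean Earth radius:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; good enough for a rough range filter

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing the argument slightly above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def within_radius(records, center, radius_km):
    """Return the records whose (lat, lon) lies within radius_km of center."""
    lat0, lon0 = center
    return [r for r in records
            if haversine_km(lat0, lon0, r["lat"], r["lon"]) <= radius_km]

# Hypothetical extracted coordinates.
places = [{"id": "a", "lat": 52.0, "lon": 5.0},
          {"id": "b", "lat": 52.5, "lon": 5.0},
          {"id": "c", "lat": 48.0, "lon": 11.0}]
nearby = within_radius(places, center=(52.0, 5.0), radius_km=100)
print([p["id"] for p in nearby])
```

In a database, one would typically prefilter with a cheap latitude/longitude bounding box (which an index can serve) and apply the exact distance only to the survivors.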
I guess it COULD be done, in theory. I don't know yet whether this is one of the goals of Wikidata.
Thanks, Martin
Martin Trautmann wrote:
> Hi all,
> I'm not yet sure about the real purpose of the wikidata project.
> Currently I feel that a (or the?) main part is data entry.
> I welcome a reasonable and simplified approach. I guess a form-based entry is the most obvious solution. But I'm not sure about the appropriate wiki syntax for the 'expert' mode, where everything is open to extended definitions. I feel the syntax should be closer to XML than to HTML commands, since XHTML2 is becoming more and more stable. I doubt it helps much to implement a new syntax without upcoming standards in mind.
With Wikidata, you are talking about relational data. This means that Wikidata data is presented in a fixed format, and there will be constraints on what can be entered where. Entering data using SQL will be possible for those who have database access. Batch data will typically be entered using XML; for Ultimate Wiktionary, we are considering TBX (TermBase eXchange). The import and export of batch data is separate from the base functionality.
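What "constraints on what can be entered where" means in practice can be sketched with SQL `CHECK` and `NOT NULL` constraints. SQLite is used only because it ships with Python; the `place` table and the `make_db`/`add_place` helpers are hypothetical and not the Wikidata schema.

```python
import sqlite3

def make_db():
    """Create an in-memory database with a hypothetical, constrained 'place' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE place (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            lat  REAL NOT NULL CHECK (lat BETWEEN -90 AND 90),
            lon  REAL NOT NULL CHECK (lon BETWEEN -180 AND 180)
        )""")
    return conn

def add_place(conn, name, lat, lon):
    """Insert one row in a transaction; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO place (name, lat, lon) VALUES (?, ?, ?)",
                         (name, lat, lon))
        return True
    except sqlite3.IntegrityError:
        return False

conn = make_db()
print(add_place(conn, "example", 52.0, 5.0))  # accepted
print(add_place(conn, "bad", 95.0, 5.0))      # rejected: latitude out of range
```

The point of the design is that the database itself refuses malformed data, whether it arrives through a form, a direct SQL statement or a batch import.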
> However, I expect a rather different result from Wikidata: I'd like to access not just one fixed set of data; I'd expect an overview of multiple records, presented as a table.
> I want to filter many records and limit the search to a subset. I want to sort those records by one or more sort fields.
> Many more requirements follow from this expectation, such as
> - search and replace operations on multiple records
> - normalization of entries
> - language conversion
> - display variations (e.g. abbreviated, columns with markers instead of field contents, HTML table or preformatted text, reordering of columns, subset of columns, transformation, ...)
> - maybe even spreadsheet-like field entry options
> - custom sort orders
> - search ranges (e.g. depending on field types)
> - ...
As it is a relational database, adding a translation of "Dutch" for a given language means that the user interface in that language will use that translation instead of "Dutch". Normalisation of entries is implicit in a relational database. Language conversion is possible if the data design allows for it. Several projects that make use of this have been described on Meta, e.g. http://meta.wikimedia.org/wiki/Using_Ultimate_Wiktionary_for_Commons
Ultimate Wiktionary does this explicitly (see the ERD).
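The "Dutch" example can be sketched as a language-neutral concept with one label row per language, looked up with a fallback. This is a minimal illustration under assumptions: the `concept`/`label` tables, the English fallback and the `label_for` helper are hypothetical, not the Ultimate Wiktionary data design.

```python
import sqlite3

def make_label_db():
    """In-memory database: language-neutral concepts plus one label per language."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE concept (id INTEGER PRIMARY KEY);
        CREATE TABLE label (
            concept_id INTEGER NOT NULL REFERENCES concept(id),
            lang       TEXT NOT NULL,
            label_text TEXT NOT NULL,
            PRIMARY KEY (concept_id, lang)
        );
    """)
    return conn

def label_for(conn, concept_id, lang, fallback="en"):
    """Label of a concept in the user's language, else in the fallback language."""
    row = conn.execute(
        """SELECT COALESCE(
               (SELECT label_text FROM label WHERE concept_id = ? AND lang = ?),
               (SELECT label_text FROM label WHERE concept_id = ? AND lang = ?))""",
        (concept_id, lang, concept_id, fallback),
    ).fetchone()
    return row[0]

conn = make_label_db()
with conn:
    conn.execute("INSERT INTO concept (id) VALUES (1)")  # the concept "Dutch language"
    conn.executemany("INSERT INTO label VALUES (?, ?, ?)",
                     [(1, "en", "Dutch"), (1, "nl", "Nederlands")])
print(label_for(conn, 1, "de"))  # no German label yet, so it falls back to English
with conn:
    conn.execute("INSERT INTO label VALUES (1, 'de', 'Niederländisch')")
print(label_for(conn, 1, "de"))  # every German UI now picks up the new row
```

Adding one row is all it takes for every interface in that language to show the translation; nothing that refers to the concept has to change.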
It would be nice to have all the things you describe, such as spreadsheet-like functionality, custom sort orders etc. However, we are talking about a server-side application. This kind of functionality requires huge amounts of processing power and disk I/O; I do not think it will prove practical. We will have some fixed functionality, and we should be happy when we achieve that.
> But the main request here is: wikidata should be able to merge info from multiple sources into one table overview.
As Wikidata will be relational, it will not be a single-table overview. Merging data into one repository, however, is very much the goal.
> I know about several projects on Wikipedia that work this way. One of them collects all Wikipedia geo coordinates into one extracted format. What I don't know is what happens afterwards: Could I see this extracted info in Wikipedia/Wikidata itself? Could I limit my view of the full set of data (several thousand records), e.g. to a limited range such as an area within a certain distance? Could I fix errors in this subset? And would those modifications be written back to the original sources?
When geodata becomes a Wikidata project, the data will be available in a database. This database should be language-neutral so that the data is usable in all our projects. This will be its first application: sharing data and making it available to all projects. Providing other types of access to the database is a next step. Given the problems we have keeping our search engine online, I think that would take priority over all kinds of nifty search functionality on Wikidata data.
Fixing data in Wikidata projects (adding, modifying and removing it) is very much the object of the exercise. As this is database information, changes should be reflected instantly.
> I guess it COULD be done, in theory. I don't know yet whether this is one of the goals of Wikidata.
> Thanks, Martin
On 2005-07-20 12:26, Gerard Meijssen wrote:
> Entering data using SQL will be possible for those who have database access.
Ah, thanks, I did not know yet that this could be done from outside without downloading the full database and doing it locally.
> As it is a relational database, adding a translation of "Dutch" for a given language means that the user interface in that language will use that translation instead of "Dutch".
Translation can be twofold: some data is native to one language, which would require real translation mechanisms.
However, a lot of data is numeric or in some normalized form. There it's mainly a matter of defining native column headers or field type formats (e.g. date conversion from mm.dd.yyyy to yyyy-mm-dd).
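For normalized fields, "translation" is really a format conversion at the edges: parse the source format once, store one canonical form, and format it per language on output. A minimal sketch using the mm.dd.yyyy and yyyy-mm-dd formats from the message; the `normalize_date`/`display_date` helpers and the `%d.%m.%Y` display format are assumptions for illustration.

```python
from datetime import datetime

def normalize_date(value, source_format="%m.%d.%Y"):
    """Parse a date in a source-specific format and store it as ISO 8601 (yyyy-mm-dd)."""
    return datetime.strptime(value, source_format).strftime("%Y-%m-%d")

def display_date(iso_value, display_format):
    """Render a stored ISO date in whatever format a given language prefers."""
    return datetime.strptime(iso_value, "%Y-%m-%d").strftime(display_format)

stored = normalize_date("07.20.2005")
print(stored)                            # 2005-07-20
print(display_date(stored, "%d.%m.%Y"))  # 20.07.2005
```

An impossible input such as "13.01.2005" raises `ValueError` instead of being stored silently, which is exactly the check one wants at entry time.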
> Several projects that make use of this have been described on Meta, e.g. http://meta.wikimedia.org/wiki/Using_Ultimate_Wiktionary_for_Commons
> Ultimate Wiktionary does this explicitly (see the ERD).
I never looked at those Wiktionary details. Thanks, I'll check later on. Maybe I should check whether my *ictionaries could be included, such as
Phictionary: Photographic terms http://home.arcor.de/objektive/Phictionary.html
Bictionary: Bicycle terms http://www.fa-technik.adfc.de/Ratgeber/Bictionary/index.html
A dictionary is one of those examples where you want search and table options, although sorting is less important.
> It would be nice to have all the things you describe, such as spreadsheet-like functionality, custom sort orders etc. However, we are talking about a server-side application. This kind of functionality requires huge amounts of processing power and disk I/O; I do not think it will prove practical.
I agree that search, sort etc. are server-side, and those are huge tasks. However, Google and other search engines show that you can list many hits on one page. They're just not in a table view yet.
I don't know about spreadsheet-like operation. I guess there could be different client-side editors (perhaps based on Java*?) that permit spreadsheet-like operations, while the SQL interface might be the easiest choice. I guess it's more a question of providing or recommending the proper SQL tools and setup for this kind of work.
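Listing many hits on one page, as a table, maps onto a plain SQL query with `ORDER BY` on several columns plus `LIMIT`/`OFFSET`. A minimal sketch, assuming a hypothetical `entry` table for a dictionary such as Phictionary; SQLite stands in for whatever database would actually serve the data.

```python
import sqlite3

def make_dictionary_db(entries):
    """Build an in-memory table of (headword, lang) pairs; the schema is hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entry (headword TEXT NOT NULL, lang TEXT NOT NULL)")
    with conn:
        conn.executemany("INSERT INTO entry (headword, lang) VALUES (?, ?)", entries)
    return conn

def page_of_hits(conn, term, page, page_size=20):
    """Return one page (1-based) of matches, sorted by headword, then language."""
    offset = (page - 1) * page_size
    return conn.execute(
        "SELECT headword, lang FROM entry WHERE headword LIKE ? "
        "ORDER BY headword, lang LIMIT ? OFFSET ?",
        (f"%{term}%", page_size, offset),
    ).fetchall()

conn = make_dictionary_db([("aperture", "en"), ("Blende", "de"), ("shutter", "en"),
                           ("shutter speed", "en"), ("Verschluss", "de")])
for headword, lang in page_of_hits(conn, "shutter", page=1):
    print(f"{headword:<15} {lang}")
```

This also illustrates the server-side cost raised above: a `LIKE '%term%'` pattern cannot use an ordinary index, and large offsets get slower, so a real deployment would need full-text indexing or keyset pagination. `%` and `_` inside the search term are not escaped in this sketch.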
> Merging data into one repository, however, is very much the goal.
Glad to hear.
I hadn't gathered that from http://meta.wikimedia.org/wiki/Wikidata
Thanks, Martin
Speaking of wikidata, is anyone actually working on software? If so, who and where? I'd like to have a look, after I wrote that mockup some time ago and then declared it unfeasible...
Magnus
Magnus Manske wrote:
> Speaking of wikidata, is anyone actually working on software? If so, who and where? I'd like to have a look, after I wrote that mockup some time ago and then declared it unfeasible...
> Magnus
Hi,
I am surprised that you are not aware that Erik Moeller is working on it. Erik has promised to deliver the first bits of software in the near future. He has also published some documentation on Meta, including some descriptions of how it is to work.
Thanks, GerardM
wikitech-l@lists.wikimedia.org