Martynas,
could you please let me know where RDF or any of the W3C standards covers topics like units, uncertainty, and their conversion? I would be very much interested in that.
Cheers, Denny
2012/12/19 Martynas Jusevičius martynas@graphity.org
Hey wikidatians,
occasionally checking threads in this list like the current one, I get a mixed feeling: on one hand, it is sad to see the efforts and resources wasted as Wikidata tries to reinvent RDF, and now also triplestore design as well as XSD datatypes. What's next, WikiQL instead of SPARQL?
On the other hand, it feels reassuring as I was right to predict this: http://www.mail-archive.com/wikidata-l@lists.wikimedia.org/msg00056.html http://www.mail-archive.com/wikidata-l@lists.wikimedia.org/msg00750.html
Best,
Martynas graphity.org
On Wed, Dec 19, 2012 at 4:11 PM, Daniel Kinzler daniel.kinzler@wikimedia.de wrote:
On 19.12.2012 14:34, Friedrich Röhrs wrote:
Hi,
Sorry for my ignorance, if this is common knowledge: What is the use case for sorting millions of different measures from different objects?
Finding all cities with more than 100000 inhabitants requires the database to look through all values for the property "population" (or even all properties with countable values, depending on implementation and query planning), compare each value with "100000" and return those with a greater value. To speed this up, an index sorted by this value would be needed.
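The difference between scanning every value and using a sorted index can be sketched as follows. This is only an illustration in Python, not how any particular database implements it; the city names, values, and the `SortedIndex` class are invented for the example.

```python
import bisect

# Hypothetical sample data: (item, population) pairs with invented names and values.
ROWS = [
    ("Alphaville", 3_400_000),
    ("Betatown", 100_000),
    ("Gammaford", 99_999),
    ("Deltaburg", 250_000),
]


def full_scan(rows, threshold):
    """Without an index: compare every stored value with the threshold."""
    return [name for name, value in rows if value > threshold]


class SortedIndex:
    """A toy stand-in for a database index ordered by value."""

    def __init__(self, rows):
        # Sorting happens once, when the index is built.
        ordered = sorted(rows, key=lambda row: row[1])
        self.keys = [value for _, value in ordered]
        self.names = [name for name, _ in ordered]

    def greater_than(self, threshold):
        # Binary search finds the first key strictly greater than the threshold;
        # every entry from that position onward matches.
        start = bisect.bisect_right(self.keys, threshold)
        return self.names[start:]


index = SortedIndex(ROWS)
print(sorted(index.greater_than(100_000)))
```

Both functions return the same items; the indexed version locates the starting position in logarithmic time instead of comparing every value.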
For cars there could be entries by the manufacturer, by some car-testing magazine, etc. I don't see how this could be adequately represented/sorted by a database-only query.
If this cannot be done adequately on the database level, then it cannot be done efficiently, which means we will not allow it. So our task is to come up with an architecture that does allow this.
(One way to allow "scripted" queries like this to run efficiently is to do this in a massively parallel way, using a map/reduce framework. But that's also not trivial, and would require a whole new server infrastructure).
If however this is necessary, I still don't understand why it must affect the datavalue structure. If an index is necessary, it could be done over a serialized representation of the value.
"Serialized" can mean a lot of things, but an index on some data blob is only useful for exact matches; it cannot be used for greater/lesser queries. We need to map our values to scalar data types the database can understand directly and use for indexing.
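A small sketch of why an index over serialized blobs does not help with greater/lesser queries. It assumes JSON text as the serialized form, which is only one possible reading of "serialized"; the `amount` field name and the values are invented.

```python
import json

# Invented numeric values, and a hypothetical serialized form of each one.
values = [9, 100000, 25, 1000]
blobs = [json.dumps({"amount": v}) for v in values]

# Exact-match lookup on the blob works: the same value serializes to the same text.
blob_index = dict(zip(blobs, values))
found = blob_index[json.dumps({"amount": 25})]

# Ordering the blobs compares characters, not numbers, so the resulting
# order is not the numeric order a range query needs.
by_blob = [blob_index[b] for b in sorted(blobs)]
by_number = sorted(values)

print(found)
print(by_blob)
print(by_number)
```

Mapping each value to a scalar column (here, the plain integers in `by_number`) is what makes the sort order meaningful for comparisons.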
This needs to be done anyway, since the values are saved in a specific unit (which is just a Wikidata item). To compare them on a database level they must all be saved in the same unit, or some sort of procedure must be used to compare them (or am I missing something again?).
If they measure the same dimension, they should be saved using the same unit (probably the SI base unit for that dimension). Saving values using different units would make it impossible to run efficient queries against these values, thereby defeating one of the major reasons for Wikidata's existence. I don't see a way around this.
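The normalization described above might look roughly like this for lengths. It is a sketch under stated assumptions: the `TO_METRES` table, the `to_base` helper, and the example items are all hypothetical, and exact fractions are used only to keep the arithmetic free of rounding.

```python
from fractions import Fraction

# Hypothetical conversion table: factor from each unit to the SI base unit (metre).
TO_METRES = {
    "m": Fraction(1),
    "km": Fraction(1000),
    "cm": Fraction(1, 100),
    "mi": Fraction(1609344, 1000),  # international mile, 1609.344 m
}


def to_base(amount, unit):
    """Convert an amount in the given unit to metres."""
    return Fraction(amount) * TO_METRES[unit]


# Invented example values as entered: (item, amount, unit).
ENTERED = [
    ("bridge", 2, "km"),
    ("track", 400, "m"),
    ("road", 1, "mi"),
    ("table", 150, "cm"),
]

# Store every value in the base unit so that one comparison works for all rows.
STORED = [(name, to_base(amount, unit)) for name, amount, unit in ENTERED]

longer_than_1km = sorted(name for name, metres in STORED if metres > 1000)
print(longer_than_1km)
```

Once every row holds metres, a single sorted index over the stored value can answer the comparison without per-row conversion at query time.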
-- daniel
-- Daniel Kinzler, Softwarearchitekt Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l