Re: [Wikidata-l] Data values

19 Dec 2012

Martynas,

could you please let me know where RDF or any of the W3C standards covers
topics like units, uncertainty, and their conversion. I would be very much
interested in that.

Cheers,
Denny

2012/12/19 Martynas Jusevičius <martynas(a)graphity.org>

...
  Hey wikidatians,

 occasionally checking threads in this list like the current one, I get
 a mixed feeling: on one hand, it is sad to see the efforts and
 resources wasted as Wikidata tries to reinvent RDF, and now also
 triplestore design as well as XSD datatypes. What's next, WikiQL
 instead of SPARQL?

 On the other hand, it feels reassuring as I was right to predict this:
 http://www.mail-archive.com/wikidata-l@lists.wikimedia.org/msg00056.html
 http://www.mail-archive.com/wikidata-l@lists.wikimedia.org/msg00750.html

 Best,

 Martynas
 graphity.org

 On Wed, Dec 19, 2012 at 4:11 PM, Daniel Kinzler
 <daniel.kinzler(a)wikimedia.de> wrote:
  On 19.12.2012 14:34, Friedrich Röhrs wrote:
> Hi,
>
> Sorry for my ignorance, if this is common knowledge: What is the use case
> for sorting millions of different measures from different objects?

 Finding all cities with more than 100000 inhabitants requires the database
 to look through all values for the property "population" (or even all
 properties with countable values, depending on implementation and query
 planning), compare each value with "100000" and return those with a greater
 value. To speed this up, an index sorted by this value would be needed.
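
A minimal sketch of the difference between scanning every value and using a sorted index, assuming a small in-memory list stands in for the database. The city names, the population figures, and the helper names `scan` and `indexed` are hypothetical and only serve the illustration.

```python
import bisect

# Hypothetical sample data: (population, city) pairs, figures are illustrative.
rows = [
    (3_500_000, "Berlin"),
    (80_000, "Smalltown"),
    (1_800_000, "Hamburg"),
    (100_000, "Edgecase"),
    (45_000, "Village"),
]

def scan(threshold):
    """Full scan: compare every stored value against the threshold."""
    return sorted(city for population, city in rows if population > threshold)

# A sorted "index" on the population value, built once up front.
index = sorted(rows)
keys = [population for population, _ in index]

def indexed(threshold):
    """Binary-search the sorted keys, then take everything to the right."""
    start = bisect.bisect_right(keys, threshold)
    return sorted(city for _, city in index[start:])

print(indexed(100_000))  # same result as scan(100_000)
```

The scan touches every row on each query, while the indexed lookup needs only a binary search to find where the matching range begins.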

> For cars there could be entries by the manufacturer, by some car-testing
> magazine, etc. I don't see how this could be adequately represented/sorted
> by a database-only query.

 If this cannot be done adequately on the database level, then it cannot be
 done efficiently, which means we will not allow it. So our task is to come
 up with an architecture that does allow this.

 (One way to allow "scripted" queries like this to run efficiently is to do
 this in a massively parallel way, using a map/reduce framework. But that's
 also not trivial, and would require a whole new server infrastructure.)

> If however this is necessary, I still don't understand why it must affect
> the datavalue structure. If an index is necessary it could be done over a
> serialized representation of the value.

 "Serialized" can mean a lot of things, but an index on some data blob is
 only useful for exact matches; it cannot be used for greater/lesser
 queries. We need to map our values to scalar data types the database can
 understand directly, and use for indexing.
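
To illustrate why ordering by a serialized blob does not help range queries, here is a small sketch. It assumes the values are serialized as JSON objects with a hypothetical `amount` key; the actual serialization is not specified in this thread.

```python
import json

values = [9, 100000, 25000]

# Hypothetical serialization: each value is stored as an opaque JSON blob.
blobs = [json.dumps({"amount": v}) for v in values]

# Sorting the blobs is lexicographic, so it does not follow numeric order.
by_blob = [json.loads(b)["amount"] for b in sorted(blobs)]

# Sorting the scalar values gives the order a range query needs.
by_scalar = sorted(values)

# Exact-match lookup on the blob still works.
exact_match = json.dumps({"amount": 25000}) in blobs

print(by_blob)    # lexicographic order of the blobs
print(by_scalar)  # numeric order
```

In this sketch the blob order places 100000 before 9, so a "greater than" query could not be answered from an index built over the blobs.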

> This needs to be done anyway, since the values are saved at a specific
> unit (which is just a wikidata item). To compare them on a database level
> they must all be saved at the same unit, or some sort of procedure must be
> used to compare them (or am I missing something again?).

 If they measure the same dimension, they should be saved using the same
 unit (probably the SI base unit for that dimension). Saving values using
 different units would make it impossible to run efficient queries against
 these values, thereby defeating one of the major reasons for Wikidata's
 existence. I don't see a way around this.
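
A minimal sketch of normalizing to one unit before storing, assuming lengths are converted to metres. The table `TO_METRES`, the helper `to_si`, and the sample items are hypothetical and not part of any Wikidata design discussed here.

```python
# Conversion factors to the SI base unit for length (metre).
TO_METRES = {"m": 1.0, "km": 1000.0, "mi": 1609.344, "ft": 0.3048}

def to_si(amount, unit):
    """Convert a length in the given unit to metres."""
    return amount * TO_METRES[unit]

# Hypothetical values as entered, each in its own unit.
entered = [("A", 2.0, "km"), ("B", 1.0, "mi"), ("C", 5000.0, "ft")]

# Ordering by the raw amounts ignores the unit and is misleading.
raw_order = [name for name, amount, _ in sorted(entered, key=lambda r: r[1])]

# Ordering by the normalized amounts compares like with like.
si_order = [
    name
    for name, amount, unit in sorted(entered, key=lambda r: to_si(r[1], r[2]))
]

print(raw_order)
print(si_order)
```

Once every value is stored in the same unit, a plain sorted index over the stored number is enough for range queries.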

 -- daniel

 --
 Daniel Kinzler, Software Architect
 Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

 _______________________________________________
 Wikidata-l mailing list
 Wikidata-l(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l 

-- 
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as charitable by the
Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
