My philosophy is this: We should do whatever works best for Wikidata and Wikidata's needs. If people want to reuse our content, and the choices we've made make existing tools unworkable, they can build new tools themselves. We should not be clinging to "what's been done already" if it gets in the way of "what will make Wikidata better". Everything that we make and do is open, including the software we're going to run the database on. Every WMF project has done things differently from the standards of the time, and people have developed tools to use our content before. Wikidata will be no different in that regard.

Sven

On Wed, Dec 19, 2012 at 12:27 PM, Martynas Jusevičius <martynas@graphity.org> wrote:

Denny,

you're sidestepping the main issue here -- every sensible architecture
should build on existing standards as much as possible, and build its
own custom solution only if there is a *very* compelling reason to do
so, rather than finding a compromise between the requirements and the
standard. Wikidata seems to be constantly doing the opposite --
building a custom solution for whatever reason, or even without one.
This drives compatibility and reuse towards zero.

This thread originally discussed datatypes for values such as numbers,
dates and their intervals -- semantics for all of those are defined in
XML Schema Datatypes: http://www.w3.org/TR/xmlschema-2/
All the XML and RDF tools are compatible with XSD, yet I don't think
it has been mentioned even once in this thread. What makes
Wikidata so special that its datatypes cannot build on XSD? And this
is only one of the issues; I've pointed out others earlier.
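To make the XSD reference concrete: in RDF, XSD datatypes are attached to literal values as datatype IRIs. The sketch below renders a few Python value types as N-Triples typed literals. The function name `typed_literal` and the particular Python-to-XSD mapping are assumptions for illustration, not anything Wikidata has adopted.

```python
from datetime import date, datetime
from decimal import Decimal

# Namespace IRI of the XML Schema datatypes.
XSD = "http://www.w3.org/2001/XMLSchema#"

def typed_literal(value):
    """Render a value as an N-Triples literal with an XSD datatype IRI.

    Hypothetical helper; the type mapping is an illustrative assumption.
    """
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return f'"{str(value).lower()}"^^<{XSD}boolean>'
    if isinstance(value, int):
        return f'"{value}"^^<{XSD}integer>'
    if isinstance(value, Decimal):
        # The "f" format avoids exponent notation, which xsd:decimal disallows.
        return f'"{value:f}"^^<{XSD}decimal>'
    # datetime must be checked before date, because datetime subclasses date.
    if isinstance(value, datetime):
        return f'"{value.isoformat()}"^^<{XSD}dateTime>'
    if isinstance(value, date):
        return f'"{value.isoformat()}"^^<{XSD}date>'
    raise TypeError(f"no XSD mapping for {type(value).__name__}")

print(typed_literal(100000))
print(typed_literal(Decimal("1E+3")))
print(typed_literal(date(2012, 12, 19)))
```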

Martynas
graphity.org

On Wed, Dec 19, 2012 at 5:58 PM, Denny Vrandečić
<denny.vrandecic@wikimedia.de> wrote:
> Martynas,
>
> could you please let me know where RDF or any of the W3C standards covers
> topics like units, uncertainty, and their conversion. I would be very much
> interested in that.
>
> Cheers,
> Denny
>
>
>
>
> 2012/12/19 Martynas Jusevičius <martynas@graphity.org>
>>
>> Hey wikidatians,
>>
>> occasionally checking threads on this list like the current one, I get
>> a mixed feeling: on one hand, it is sad to see the effort and
>> resources wasted as Wikidata tries to reinvent RDF, and now also
>> triplestore design as well as XSD datatypes. What's next, WikiQL
>> instead of SPARQL?
>>
>> On the other hand, it feels reassuring as I was right to predict this:
>> http://www.mail-archive.com/wikidata-l@lists.wikimedia.org/msg00056.html
>> http://www.mail-archive.com/wikidata-l@lists.wikimedia.org/msg00750.html
>>
>> Best,
>>
>> Martynas
>> graphity.org
>>
>> On Wed, Dec 19, 2012 at 4:11 PM, Daniel Kinzler
>> <daniel.kinzler@wikimedia.de> wrote:
>> > On 19.12.2012 14:34, Friedrich Röhrs wrote:
>> >> Hi,
>> >>
>> >> Sorry for my ignorance if this is common knowledge: what is the use
>> >> case for sorting millions of different measures from different
>> >> objects?
>> >
>> > Finding all cities with more than 100000 inhabitants requires the
>> > database to look through all values for the property "population"
>> > (or even all properties with countable values, depending on
>> > implementation and query planning), compare each value with "100000"
>> > and return those with a greater value. To speed this up, an index
>> > sorted by this value would be needed.
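A minimal sketch of why a sorted index helps, assuming an in-memory sorted list stands in for the database's index (real engines typically use B-trees). The item labels and population figures are made up. Binary search finds the first qualifying value in O(log n), and every entry after it qualifies, so the query does not have to compare each stored value with the threshold.

```python
import bisect

# Hypothetical (population, item) pairs; not real Wikidata data.
populations = [
    (3_500_000, "city_a"),
    (95_000, "city_b"),
    (1_800_000, "city_c"),
    (250_000, "city_d"),
]

# The "index": the same pairs kept sorted by value.
index = sorted(populations)
keys = [value for value, _ in index]

def items_greater_than(threshold):
    """Return items whose value is strictly greater than threshold."""
    # bisect_right gives the first position whose key is > threshold,
    # so everything from there to the end of the index qualifies.
    start = bisect.bisect_right(keys, threshold)
    return [item for _, item in index[start:]]

print(items_greater_than(100_000))
```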
>> >
>> >> For cars there could be entries by the manufacturer, by some
>> >> car-testing magazine, etc. I don't see how this could be adequately
>> >> represented/sorted by a database-only query.
>> >
>> > If this cannot be done adequately on the database level, then it
>> > cannot be done efficiently, which means we will not allow it. So our
>> > task is to come up with an architecture that does allow this.
>> >
>> > (One way to allow "scripted" queries like this to run efficiently is
>> > to run them in a massively parallel way, using a map/reduce
>> > framework. But that's also not trivial, and would require a whole new
>> > server infrastructure.)
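The shape of such a map/reduce query can be illustrated with a toy, single-process Python sketch. The shard layout, item names, and helper functions are hypothetical; a real framework would run the map step on many machines in parallel, which is where the new server infrastructure comes in.

```python
from functools import reduce

# Hypothetical shards of (item, value) records; in a real deployment
# each shard would live on a different worker machine.
shards = [
    [("item_1", 120_000), ("item_2", 80_000)],
    [("item_3", 450_000)],
    [("item_4", 99_999), ("item_5", 100_001)],
]

def map_shard(shard, threshold):
    # Map step: each worker scans only its own shard and applies the
    # (possibly arbitrary, "scripted") filter to every record.
    return [item for item, value in shard if value > threshold]

def reduce_results(acc, partial):
    # Reduce step: merge the partial result lists from all workers.
    return acc + partial

def parallel_filter(shards, threshold):
    # Here the map step runs sequentially; a framework would distribute it.
    partials = [map_shard(shard, threshold) for shard in shards]
    return reduce(reduce_results, partials, [])

print(parallel_filter(shards, 100_000))
```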
>> >
>> >> If, however, this is necessary, I still don't understand why it must
>> >> affect the datavalue structure. If an index is necessary, it could be
>> >> built over a serialized representation of the value.
>> >
>> > "Serialized" can mean a lot of things, but an index on some data blob
>> > is only useful for exact matches; it cannot be used for greater/lesser
>> > queries. We need to map our values to scalar data types the database
>> > can understand directly and use for indexing.
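The difference between an index on serialized blobs and one on scalar values is easy to demonstrate. The sketch assumes a JSON serialization purely for illustration (the thread does not say which format would be used), and `item_X` is a made-up unit identifier: blobs compare as strings, so equality works but ordering does not.

```python
import json

# Two hypothetical values serialized as JSON blobs.
blob_small = json.dumps({"amount": 900, "unit": "item_X"})
blob_large = json.dumps({"amount": 100000, "unit": "item_X"})

# A blob index orders entries as strings: '"amount": 9...' sorts after
# '"amount": 1...', so 900 lands above 100000 and range scans break.
print(blob_small > blob_large)

# Exact matches still work, because identical values serialize identically.
print(blob_small == json.dumps({"amount": 900, "unit": "item_X"}))

# Extracting the amount as a plain number restores the numeric order.
print(json.loads(blob_small)["amount"] < json.loads(blob_large)["amount"])
```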
>> >
>> >> This needs to be done anyway, since the values are saved in a
>> >> specific unit (which is just a Wikidata item). To compare them at the
>> >> database level, they must all be saved in the same unit, or some
>> >> sort of procedure must be used to compare them (or am I missing
>> >> something again?).
>> >
>> > If they measure the same dimension, they should be saved using the
>> > same unit (probably the SI base unit for that dimension). Saving
>> > values using different units would make it impossible to run
>> > efficient queries against these values, thereby defeating one of the
>> > major reasons for Wikidata's existence. I don't see a way around this.
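A minimal sketch of normalizing values to the SI base unit before storing them, assuming a hand-written conversion table. The unit names stand in for what would really be Wikidata items, and the table covers only a few length units. Once every value is in metres, a single numeric index can answer range queries across values that were originally entered in different units.

```python
# Hypothetical conversion factors to the SI base unit for length (metre).
TO_METRES = {
    "metre": 1.0,
    "kilometre": 1000.0,
    "foot": 0.3048,
    "mile": 1609.344,
}

def normalise(amount, unit):
    """Convert (amount, unit) to metres so all stored values share one unit."""
    # Raises KeyError for units missing from the (incomplete) table.
    return amount * TO_METRES[unit]

# Values as entered by editors, in mixed units.
entered = [(5.0, "kilometre"), (3.0, "mile"), (12000.0, "foot")]

# After normalization they can be compared and sorted directly.
stored = sorted(normalise(amount, unit) for amount, unit in entered)
print(stored)
```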
>> >
>> > -- daniel
>> >
>> > --
>> > Daniel Kinzler, Softwarearchitekt
>> > Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>> >
>> >
>> > _______________________________________________
>> > Wikidata-l mailing list
>> > Wikidata-l@lists.wikimedia.org
>> > https://lists.wikimedia.org/mailman/listinfo/wikidata-l
>>
>
> --
> Project director Wikidata
> Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
> Tel. +49-30-219 158 26-0 | http://wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
> der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für
> Körperschaften I Berlin, Steuernummer 27/681/51985.
>