Hi all,
I think one source of confusion here is the overlapping names of property datatypes and datavalue types. Basically, the mapping is as follows right now:
[Format: property type => datavalue type occurring in current dumps]
'wikibase-item' => 'wikibase-entityid'
'string' => 'string'
'time' => 'time'
'globe-coordinate' => 'globecoordinate'
'commonsMedia' => 'string'
The point is that "string" on the left is not the same as "string" on the right. (Also note the lack of a consistent naming scheme for these ids :-/ ...) In most cases you can infer the property type from the datavalue type, but not in all: a "string" datavalue can belong to a property of type string or of type commonsMedia. Unfortunately, a dump does not generally give you a property's type before you encounter the first use of that property.
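To make the ambiguity concrete, here is a minimal Python sketch that inverts the mapping listed above. The function names are hypothetical and chosen only for illustration; the table itself is copied from this message and reflects the dumps at the time of writing.

```python
# Property datatype -> datavalue type, copied from the mapping in this message.
PROPERTY_TO_VALUE_TYPE = {
    "wikibase-item": "wikibase-entityid",
    "string": "string",
    "time": "time",
    "globe-coordinate": "globecoordinate",
    "commonsMedia": "string",
}


def candidate_property_types(value_type):
    """Return every property datatype that can produce this datavalue type."""
    return sorted(
        prop for prop, val in PROPERTY_TO_VALUE_TYPE.items() if val == value_type
    )


def infer_property_type(value_type):
    """Return the property datatype if it is unambiguous, otherwise None."""
    candidates = candidate_property_types(value_type)
    return candidates[0] if len(candidates) == 1 else None
```

With this table, only the "string" datavalue type has more than one candidate, so it is the only case where inference from the value alone fails.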
The wda script's RDF export has code for dealing with this. It remembers all types that it finds (from P entities in the dump), it infers types from values where possible, and it uses the API to find out the type of a property if all else fails (typically, if you find a string value but don't know yet if the property is of type string or commonsMedia). In addition, the script has a hardcoded list of known types that can be extended (there are not so many properties and their types never change, hence one can do this quite easily). You can find all the code at [1].
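The lookup order described in the previous paragraph could be sketched roughly as follows. This is not the code from wda; the class name, method names and the injected `fetch_type` callable are assumptions made for illustration, and the callable stands in for whatever API request is used as the last resort.

```python
# Datavalue types that identify the property datatype unambiguously,
# derived from the mapping in this message ("string" is deliberately absent).
UNAMBIGUOUS = {
    "wikibase-entityid": "wikibase-item",
    "time": "time",
    "globecoordinate": "globe-coordinate",
}


class PropertyTypeResolver:
    """Hypothetical resolver: known types, then inference, then a remote lookup."""

    def __init__(self, fetch_type, known_types=None):
        # fetch_type: assumed callable(property_id) -> datatype, e.g. an API call.
        self.fetch_type = fetch_type
        # Start from a hardcoded list of known types, if one is supplied.
        self.types = dict(known_types or {})

    def remember(self, property_id, datatype):
        """Record a type seen on a P entity in the dump."""
        self.types[property_id] = datatype

    def resolve(self, property_id, value_type):
        # 1. Already known, either hardcoded or remembered from the dump.
        if property_id in self.types:
            return self.types[property_id]
        # 2. Infer from the datavalue type where that is unambiguous.
        datatype = UNAMBIGUOUS.get(value_type)
        # 3. Otherwise ask the external source (the ambiguous "string" case).
        if datatype is None:
            datatype = self.fetch_type(property_id)
        # Property types never change, so the result can be cached.
        self.types[property_id] = datatype
        return datatype
```

Caching every answer means the fallback lookup runs at most once per property, which matters when a dump contains many statements for the same property.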
Cheers,
Markus
[1] https://github.com/mkroetzsch/wda/blob/master/includes/epTurtleFileWriter.py (esp. see __getPropertyType() and __fetchPropertyType())
On 21/08/13 21:00, Byrial Jensen wrote:
On 21-08-2013 21:09, Hady elsahar wrote:
Hello Jeroen,
Can I take from your words that this page:
http://www.wikidata.org/wiki/Special:ListDatatypes
is not up to date? If so, how can I get all the datatypes in Wikidata?
Pages in the virtual Special namespace are generated by MediaWiki on
demand, and are therefore always (in principle - there can be caching in
some cases) up to date.
A string could be anything (so a time could be a string), but there is a
defined lower-level representation of Commons media files. So is it
wrong to represent it as a string?
Time cannot be a string, as there are several components in a time value
(time, timezone, precision, calendar model, before and after precisions).
I see nothing wrong in storing commonsMedia values as string values. You
will know from the property's datatype that the string is a CommonsMedia
string.
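As a rough illustration of the difference, a time value can be modelled as a record with one field per component listed above, while a commonsMedia value is just a string. The field names and the sample values below are assumptions for illustration, not taken from this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeValue:
    """Illustrative container for the components of a time value."""
    time: str           # the point in time itself
    timezone: int       # timezone offset
    before: int         # uncertainty before the given time
    after: int          # uncertainty after the given time
    precision: int      # unit in which the time is precise (day, year, ...)
    calendarmodel: str  # identifier of the calendar model


# Sample values, assumed for illustration only.
example = TimeValue(
    time="+00000002013-08-21T00:00:00Z",
    timezone=0,
    before=0,
    after=0,
    precision=11,
    calendarmodel="http://www.wikidata.org/entity/Q1985727",
)

# A commonsMedia value carries no extra structure: it is only a file name,
# and the property's datatype tells you how to interpret it.
commons_media_value = "Example.jpg"
```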
Regards,
- Byrial
_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l