Hello Markus,

Thanks for pointing to the wda code; it's very useful.
Judging by the Wikidata glossary, I guess property datatypes and data value types are the same thing:
http://www.wikidata.org/wiki/Wikidata:Glossary#Datatypes

This may be a little shallow, but what I saw is the following (correct me if I'm mistaken):
- The datatype you find when you look up a property and the value type you find on an item that uses that property do not have the same names.

Another problem is that they decided to represent commonsMedia values as strings, for a purpose I don't know. That is why I didn't get it and thought it was some sort of consistency issue.


In most cases, however, you can infer the property type from the datavalue type, but not in all. Unfortunately, you do not generally find the property type in a dump before you find its first use.

Could you point out why relying on such a mapping does not always work? Is it only because of the Commons media files?

'wikibase-item' => 'wikibase-entityid'
'string' => 'string'
'time' => 'time'
'globe-coordinate' => 'globecoordinate'
'commonsMedia' => 'string'


thanks
Regards




On Thu, Aug 22, 2013 at 11:33 AM, Markus Krötzsch <markus@semantic-mediawiki.org> wrote:
Hi all,

I think one source of confusion here is the overlapping names of property datatypes and datavalue types. Basically, the mapping is as follows right now:

[Format: property type => datavalue type occurring in current dumps]

'wikibase-item' => 'wikibase-entityid'
'string' => 'string'
'time' => 'time'
'globe-coordinate' => 'globecoordinate'
'commonsMedia' => 'string'

The point is that "string" on the left is not the same as "string" on the right. (Also note the lack of a consistent naming scheme for these ids :-/ ...) In most cases, however, you can infer the property type from the datavalue type, but not in all. Unfortunately, you do not generally find the property type in a dump before you find its first use.
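To make the ambiguity concrete, here is a small Python sketch that inverts the mapping listed above. It is not taken from wda; the names PROPERTY_TO_VALUE_TYPE, invert and candidate_property_types are made up for illustration.

```python
from collections import defaultdict

# Mapping as listed above: property type => datavalue type in current dumps.
PROPERTY_TO_VALUE_TYPE = {
    "wikibase-item": "wikibase-entityid",
    "string": "string",
    "time": "time",
    "globe-coordinate": "globecoordinate",
    "commonsMedia": "string",
}


def invert(mapping):
    """Group property types by the datavalue type they are stored as."""
    inverse = defaultdict(list)
    for prop_type, value_type in mapping.items():
        inverse[value_type].append(prop_type)
    return dict(inverse)


VALUE_TO_PROPERTY_TYPES = invert(PROPERTY_TO_VALUE_TYPE)


def candidate_property_types(value_type):
    """Return every property type that could have produced this datavalue type."""
    return VALUE_TO_PROPERTY_TYPES.get(value_type, [])


# A "string" datavalue has two candidates, so it cannot be resolved from the value alone.
print(sorted(candidate_property_types("string")))
# A "time" datavalue has exactly one candidate.
print(candidate_property_types("time"))
```

Only the "string" datavalue type maps back to more than one property type, which is the one case where inference from the value fails.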

The wda script's RDF export has code for dealing with this. It remembers all types that it finds (from P entities in the dump), it infers types from values where possible, and it uses the API to find out the type of a property if all else fails (typically, if you find a string value but don't know yet if the property is of type string or commonsMedia). In addition, the script has a hardcoded list of known types that can be extended (there are not so many properties and their types never change, hence one can do this quite easily). You can find all the code at [1].

Cheers,

Markus

[1] https://github.com/mkroetzsch/wda/blob/master/includes/epTurtleFileWriter.py (esp. see __getPropertyType() and __fetchPropertyType())
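The lookup order described above (types already seen or hardcoded first, then inference from the value, then the API) could be sketched roughly as follows. This is not the wda code: the class PropertyTypeResolver, the function fetch_property_type and the User-Agent string are hypothetical names chosen for illustration, and the API call assumes that wbgetentities with props=datatype returns the datatype under entities/<id>/datatype.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.wikidata.org/w/api.php"

# Datavalue types that identify the property type unambiguously.
# "string" is deliberately missing: it could be string or commonsMedia.
UNAMBIGUOUS = {
    "wikibase-entityid": "wikibase-item",
    "time": "time",
    "globecoordinate": "globe-coordinate",
}


def fetch_property_type(property_id):
    """Ask the API for a property's datatype (needs network access)."""
    query = urllib.parse.urlencode({
        "action": "wbgetentities",
        "ids": property_id,
        "props": "datatype",
        "format": "json",
    })
    request = urllib.request.Request(
        f"{API_URL}?{query}",
        headers={"User-Agent": "property-type-sketch/0.1"},  # illustrative value
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return data["entities"][property_id]["datatype"]


class PropertyTypeResolver:
    """Resolve property types: known table, then inference, then API."""

    def __init__(self, known_types=None, fetch=fetch_property_type):
        self.known = dict(known_types or {})  # hardcoded or previously seen types
        self.fetch = fetch                    # injectable, so tests can avoid the network

    def record(self, property_id, property_type):
        """Remember a type found on a P entity in the dump."""
        self.known[property_id] = property_type

    def resolve(self, property_id, value_type):
        if property_id in self.known:
            return self.known[property_id]
        property_type = UNAMBIGUOUS.get(value_type)
        if property_type is None:
            # Ambiguous value (typically "string"): fall back to the API.
            property_type = self.fetch(property_id)
        self.known[property_id] = property_type  # types never change, so cache them
        return property_type
```

Because property types never change, caching every resolved type means the API is contacted at most once per property.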


On 21/08/13 21:00, Byrial Jensen wrote:
On 21-08-2013 21:09, Hady elsahar wrote:
Hello Jeroen,

Do I understand correctly from what you say that this page:
http://www.wikidata.org/wiki/Special:ListDatatypes
is not up to date? If so, how can I get all the datatypes in Wikidata?

Pages in the virtual Special namespace are generated by MediaWiki on
demand, and are therefore always (in principle - there can be caching in
some cases) up to date.

A string could be anything (so a time could be a string), but there is a
defined lower-level representation for Commons media files. So is it
wrong to represent them as strings?

Time cannot be a string, as there are several components in a time value
(time, timezone, precision, calendar model, before and after precisions).
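As an illustration of why a single string is not enough, a time value can be modelled as a record with one field per component. This is only a sketch: the class name TimeValue is hypothetical, the field names assume the JSON keys used in the dumps (time, timezone, before, after, precision, calendarmodel), and the example values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeValue:
    time: str           # timestamp string
    timezone: int       # offset from UTC in minutes
    before: int         # uncertainty before the given time, in units of the precision
    after: int          # uncertainty after the given time, in units of the precision
    precision: int      # granularity code (assumed here: 11 means "day")
    calendarmodel: str  # URI of the calendar model item


# Illustrative value only, not copied from a dump.
example = TimeValue(
    time="+00000002013-08-22T00:00:00Z",
    timezone=0,
    before=0,
    after=0,
    precision=11,
    calendarmodel="http://www.wikidata.org/entity/Q1985727",
)
print(example)
```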

I see nothing wrong in storing commonsMedia values as string values. You
will know from the property's datatype that the string is a CommonsMedia
string.

Regards,
- Byrial


_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l





--
-------------------------------------------------
Hady El-Sahar
Research Assistant 
Center of Informatics Sciences | Nile University