Hello Markus,
thanks for pointing me to the wda code, it's very useful.
Judging from the Wikidata glossary, I guessed that property datatypes and
datavalue types are the same thing:
http://www.wikidata.org/wiki/Wikidata:Glossary#Datatypes
This may be a bit superficial, but what I saw is this (correct me if
I'm mistaken):
- They don't use the same names when you look up the datatype of the
property item and the value type of an item that uses this property.
Another problem is that they decided to represent commonsMedia as
strings for some purpose I don't know. That's why I didn't get it and
thought it was some sort of inconsistency.
In most cases, however, you can infer the property type from the
datavalue type, but not in all. Unfortunately, you do not generally
find the property type in a dump before you find its first use.
Could you explain why relying on such a mapping doesn't always work?
Is it only because of the Commons media files?
'wikibase-item' => 'wikibase-entityid'
'string' => 'string'
'time' => 'time'
'globe-coordinate' => 'globecoordinate'
'commonsMedia' => 'string'
The key is to understand that property types and value types are *not*
the same. They match in many cases, but not in all. In the future, there
might be more property types that use the same value type. Property
types are what the user sees; they define every detail of user
interaction and UI. Value types are part of the underlying data model;
they define what the content of the data is. For most data processing,
you should not need to know the property type.
The situation with commonsMedia is a bit bad because it should be a URL
rather than a string. What I do in wda is effectively a type conversion
from string to URI in this particular case. Maybe we can fix this
somehow in the future when URIs are supported as a value datatype.
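
A minimal sketch of what such a string-to-URI conversion could look like. This is not the wda code: the function name commons_media_to_uri and the base URL of Commons file pages are assumptions made for illustration only.

```python
from urllib.parse import quote

# Assumed base URL for Commons file description pages (not taken from wda).
COMMONS_FILE_BASE = "http://commons.wikimedia.org/wiki/File:"


def commons_media_to_uri(file_name):
    """Turn a commonsMedia string value into a URI (illustrative only)."""
    # MediaWiki page titles use underscores in place of spaces.
    title = file_name.strip().replace(" ", "_")
    # Percent-encode characters that are not safe in a URL path.
    return COMMONS_FILE_BASE + quote(title)
```

The datavalue itself stays a plain string; only the exported form changes, which is why the conversion needs to know that the property type is commonsMedia.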
Markus
On Thu, Aug 22, 2013 at 11:33 AM, Markus Krötzsch
<markus(a)semantic-mediawiki.org> wrote:
Hi all,
I think one source of confusion here is the overlapping names of
property datatypes and datavalue types. Basically, the mapping is as
follows right now:
[Format: property type => datavalue type occurring in current dumps]
'wikibase-item' => 'wikibase-entityid'
'string' => 'string'
'time' => 'time'
'globe-coordinate' => 'globecoordinate'
'commonsMedia' => 'string'
The point is that "string" on the left is not the same as "string"
on the right. (Also note the lack of a consistent naming scheme for
these ids :-/ ...) In most cases, however, you can infer the
property type from the datavalue type, but not in all.
Unfortunately, you do not generally find the property type in a dump
before you find its first use.
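
To make the ambiguity concrete, here is a small sketch that writes the mapping above as a Python dict and inverts it. The names PROPERTY_TO_VALUE_TYPE and candidate_property_types are made up for this example and do not come from wda.

```python
from collections import defaultdict

# Mapping quoted above: property type => datavalue type in current dumps.
PROPERTY_TO_VALUE_TYPE = {
    "wikibase-item": "wikibase-entityid",
    "string": "string",
    "time": "time",
    "globe-coordinate": "globecoordinate",
    "commonsMedia": "string",
}

# Invert the mapping: datavalue type => all property types that use it.
VALUE_TO_PROPERTY_TYPES = defaultdict(list)
for prop_type, value_type in PROPERTY_TO_VALUE_TYPE.items():
    VALUE_TO_PROPERTY_TYPES[value_type].append(prop_type)


def candidate_property_types(value_type):
    """Return the property types that could have produced this datavalue type."""
    return sorted(VALUE_TO_PROPERTY_TYPES.get(value_type, []))
```

A datavalue type with exactly one candidate determines the property type; "string" has two candidates, so a string value alone does not.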
The wda script's RDF export has code for dealing with this. It
remembers all types that it finds (from P entities in the dump), it
infers types from values where possible, and it uses the API to find
out the type of a property if all else fails (typically, if you find
a string value but don't know yet if the property is of type string
or commonsMedia). In addition, the script has a hardcoded list of
known types that can be extended (there are not so many properties
and their types never change, hence one can do this quite easily).
You can find all the code at [1].
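
The lookup order described above could be sketched roughly as follows. This is a hypothetical reimplementation and not the code from [1]: the class PropertyTypeResolver, its method names, and the fetch_type callback (standing in for the API call) are all assumptions.

```python
# Datavalue types that identify exactly one property type (from the mapping).
UNAMBIGUOUS = {
    "wikibase-entityid": "wikibase-item",
    "time": "time",
    "globecoordinate": "globe-coordinate",
}


class PropertyTypeResolver:
    """Hypothetical resolver following the steps described above."""

    def __init__(self, known_types=None, fetch_type=None):
        # known_types: hardcoded {property id: property type} entries.
        self.known = dict(known_types or {})
        # fetch_type: callable(property_id) -> property type, e.g. an API lookup.
        self.fetch_type = fetch_type

    def remember(self, property_id, property_type):
        """Record a type seen on a P entity in the dump."""
        self.known[property_id] = property_type

    def resolve(self, property_id, value_type):
        # 1. Types already seen in the dump or listed by hand.
        if property_id in self.known:
            return self.known[property_id]
        # 2. Infer from the datavalue type where that is unambiguous.
        prop_type = UNAMBIGUOUS.get(value_type)
        # 3. Otherwise (e.g. a "string" value) ask the fallback lookup.
        if prop_type is None and self.fetch_type is not None:
            prop_type = self.fetch_type(property_id)
        # Cache the answer, since property types never change.
        if prop_type is not None:
            self.known[property_id] = prop_type
        return prop_type
```

Passing the API lookup in as a callback keeps the sketch runnable offline and means the fallback is only invoked for the ambiguous "string" case.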
Cheers,
Markus
[1]
https://github.com/mkroetzsch/wda/blob/master/includes/epTurtleFileWriter.py
(esp. see __getPropertyType() and __fetchPropertyType())
On 21/08/13 21:00, Byrial Jensen wrote:
On 21-08-2013 21:09, Hady elsahar wrote:
Hello Jeroen,
can I take from your words that this page:
http://www.wikidata.org/wiki/Special:ListDatatypes
is not up to date? If so, how can I get all the datatypes in
Wikidata?
Pages in the virtual Special namespace are generated by MediaWiki on
demand, and are therefore always (in principle - there can be caching
in some cases) up to date.
A string could be anything (so time could be a string), but there is a
defined lower-level representation of Commons media files. So is it
wrong to represent them as strings?
Time cannot be a string, as there are several components in a time
value (time, timezone, precision, calendar model, before and after
precisions).
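
As a rough illustration of why a single string is not enough, the components listed above can be written as a small record type. The class name TimeValue, the field names, the field types, and the default values are assumptions for this sketch and are not taken from the Wikidata data model documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeValue:
    """Components of a time value as listed above; names and types are assumed."""

    time: str            # the timestamp itself, kept as text
    timezone: int        # offset from UTC (unit assumed to be minutes)
    precision: int       # code for the unit of precision (e.g. day or year)
    calendar_model: str  # identifier of the calendar model
    before: int = 0      # "before" precision, in units of `precision`
    after: int = 0       # "after" precision, in units of `precision`
```

Two values with the same timestamp text but different precision or calendar model are different values, which a plain string comparison would miss.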
I see nothing wrong in storing commonsMedia values as string values. You
will know from the property's datatype that the string is a CommonsMedia
string.
Regards,
- Byrial
_________________________________________________
Wikidata-l mailing list
Wikidata-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
--
-------------------------------------------------
Hady El-Sahar
Research Assistant
Center of Informatics Sciences | Nile University
<http://nileuniversity.edu.eg/>