On 13/08/13 22:26, Claus Stadler wrote:
Hi Markus,
Thank you for your information. Below some thoughts and comments from
our side.
> This is because the Wikidata datatype for numbers is not implemented
> yet.
Ok, is there a timeline when this is planned to be ready?
(Lydia answered this)
> To edit Wikidata, you should create an account.
But one can use an existing Wikidata account for performing edits via
the Wikidata API's "login" action, right?
Yes.
> If you intend to do mass edits, ...
I don't think we are. The new DBpedia interface should just support
users by transferring selected DBpedia data items to WD *via their own
WD account*.
Ok, fair enough. One can also do anonymous edits, of course (but people
should be warned then that their IP will be recorded publicly).
> I could imagine that most of the data gets bulk imported from
> Wikipedia infoboxes by the community anyway.
By "bulk imported from Wikipedia infoboxes by the community", are you
referring to an automatic or a manual process?
For anything manual, the idea is to use the DBpedia-Viewer as a support
tool, as it already contains the data from the infoboxes.
If it's automatic, can you explain how the data is extracted from
Wikipedia?
I guess "semi-automatic" best describes it. What usually happens is that
some user proposes an import for some specific property (e.g., import
sex information from Italian Wikipedia categories Man/Woman), and then
this is done. I cannot explain in all cases how users get the
information; it's quite amazing what they do ;-) However, there is no
manual control of all imported facts, and errors have been known to
happen. There are quality control mechanisms on Wikidata to find
problems with the current data (whether imported or not).
> Note that there are some differences beyond the vocabulary. Most
> Wikidata statements have source information attached
The source for an item from DBpedia is the Wikipedia for the
corresponding language.
Yes, that seems to be a good idea.
> , and there are also qualifiers (not used heavily yet, since the
> selection/filtering mechanisms of Wikidata are quite weak so far, but
> this will change). In some domains, such as roles of actors in films,
> qualifiers are getting widely used now; so this is not really triple
> data any more. But there will be enough triple data left, I guess.
As for qualifiers, they don't really exist on DBpedia, so if someone
wanted to provide them via the DBpedia-Viewer, they would have to
provide them manually anyway. I am currently not sure whether they are
a priority for us.
There will be enough cases where they are not needed. With respect to
the viewer, the bigger challenge is probably to avoid
duplicates/redundant information (entering a property without any
qualifier is fine if there is nothing at all yet; but if there is
already one with a more specific qualifier, then nothing else should be
added).
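A minimal sketch of how the viewer could apply this rule before offering a transfer. The statement shape (a dict with "value" and "qualifiers") and the three outcomes are assumptions made for illustration, not the Wikidata data model; in particular, treating "same value already present" as the duplicate case is my reading of the rule, and anything else is handed to the user.

```python
def decide_plain_statement(existing, value):
    """Decide whether a qualifier-free statement should be transferred.

    `existing` holds the statements Wikidata already has for one property
    of one item, in a simplified, hypothetical shape:
        {"value": ..., "qualifiers": {...}}
    """
    if not existing:
        # Nothing at all yet: a plain statement is fine.
        return "add"
    if any(stmt["value"] == value for stmt in existing):
        # The value is already there, possibly with a more specific
        # qualifier, so adding it again would only be redundant.
        return "skip"
    # A different value exists: let the user look at it first.
    return "review"
```

For example, an empty list yields "add", while an existing statement with the same value and a qualifier yields "skip".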
So for the DBpedia-Viewer's "transfer DBpedia triple to Wikidata"
feature we see three options, of which only (c) seems feasible:
(a) The viewer just provides a link to Wikidata, and the user has to
fill out the forms there manually. But we would like a better
interaction between the RDF and WD.
(b) Wikidata offers a way to open an item with a pre-filled form.
However, on WD it currently seems that only a single item can be in
edit mode. So this won't work yet, and I am not sure whether it is ever
planned to work.
(c) The DBpedia-Viewer aids the user by providing a pre-filled
edit form, mapping a triple's property and object to the corresponding
WD values.
For validation, the user could be presented existing WD-values for that
property. Also, upon edit, a popover or tab with the corresponding WD
page could open up.
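Option (c) could look roughly like the sketch below. Everything in it is an assumption for illustration: the function name, the form fields, and especially the property mapping, which uses the placeholder id P9001 from the API example rather than a real Wikidata property.

```python
def prefill_form(triple, property_map, item_id):
    """Build the initial values of an edit form from one DBpedia triple.

    `property_map` maps DBpedia property URIs to Wikidata property ids.
    Returns None when the property has no known Wikidata counterpart.
    """
    subject, predicate, obj = triple
    wd_property = property_map.get(predicate)
    if wd_property is None:
        return None
    return {
        "entity": item_id,        # Wikidata item the user is editing
        "property": wd_property,  # mapped Wikidata property id
        "value": obj,             # object of the triple, to be confirmed by the user
        "source": subject,        # DBpedia resource the value came from
    }


# Hypothetical mapping; P9001 is a placeholder, not the real population property.
EXAMPLE_MAP = {"http://dbpedia.org/ontology/populationTotal": "P9001"}

form = prefill_form(
    ("http://dbpedia.org/resource/Berndorf,_Lower_Austria",
     "http://dbpedia.org/ontology/populationTotal",
     8728),
    EXAMPLE_MAP,
    "Q666615",
)
```

The user would still confirm or correct the values before anything is sent to Wikidata.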
Yes, I agree that (c) is most convenient. Since all of the WD interface
is coded in Javascript, using the Web API for data exchange, one can
create custom UIs with similar functionality quite well. In fact, there
are user-contributed Javascript modules that can be activated on
wikidata.org to get alternative/additional UIs for editing. So this can
be integrated into the web site quite easily if the code is there.
So this kind of edit seems to be possible with the API, yet we would
need a mapping between WD RDF and the WD IDs.
All URIs in WD RDF contain the relevant IDs already as substrings.
Moreover, the WD RDF URIs are resolvable and support content negotiation
(though most formats are quite limited, e.g., the RDF is not the
complete RDF that you have in the dumps yet). Is there any further
mapping you need?
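Since the IDs are substrings of the URIs, pulling them out could be sketched as follows. The pattern assumes the ID appears in the last path segment as a P or Q followed by digits and is not glued to other letters or digits; URIs that decorate the ID differently would need a different pattern.

```python
import re

# A P or Q followed by digits, not embedded in a longer alphanumeric token.
_ID_PATTERN = re.compile(r"(?<![A-Za-z0-9])[PQ][1-9][0-9]*(?![A-Za-z0-9])")


def extract_wikidata_id(uri):
    """Return the Wikidata ID contained in the last path segment, or None."""
    last_segment = uri.rstrip("/").rsplit("/", 1)[-1]
    match = _ID_PATTERN.search(last_segment)
    return match.group(0) if match else None
```

With the URLs from this thread, http://www.wikidata.org/wiki/Q666615 and the Special:EntityData/Q666615.nt URL both give Q666615, while the DBpedia resource URI gives None.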
Markus
Cheers,
Claus
p.s:
> Note that all property ids start with a P. If it's of the form Q...,
> then it is not a property.
Oops ;)
On 08/13/2013 09:32 PM, Markus Krötzsch wrote:
Hi Claus,
a brief partial reply:
On 13/08/13 16:20, Claus Stadler wrote:
...
For example, I notice that the Wikidata page for my home town "Berndorf
in Lower Austria" does not contain the population:
http://www.wikidata.org/wiki/Q666615
This is because the Wikidata datatype for numbers is not implemented yet.
Looking at the corresponding DBpedia entry, this information actually
exists there:
http://dbpedia.org/resource/Berndorf,_Lower_Austria
The new DBpedia interface should offer a button next to the "population
8728" triple which enables transfer of this information to Wikidata.
To edit Wikidata, you should create an account. If you intend to do
mass edits, this account should be granted bot status first to keep
it from being blocked if it sends a lot of requests. This is mostly a
community process: you should discuss the intended edit activities
with the community to find out if they are happy with this (this list
is only about the technical aspects). It is good to have additional
inputs, but I could imagine that most of the data gets bulk imported
from Wikipedia infoboxes by the community anyway, which is what
happens with a lot of data right now.
In another GSoC project, Hady Elsahar is working on mappings between the
Wikidata RDF vocabulary and the DBpedia vocabulary.
This means we can, in principle, map DBpedia RDF data to Wikidata RDF.
Note that there are some differences beyond the vocabulary. Most
Wikidata statements have source information attached, and there are
also qualifiers (not used heavily yet, since the selection/filtering
mechanisms of Wikidata are quite weak so far, but this will change).
In some domains, such as roles of actors in films, qualifiers are
getting widely used now; so this is not really triple data any more.
But there will be enough triple data left, I guess.
However, looking at the Wikidata API [2] there is
action=wbcreateclaim *
with the example:
api.php?action=wbcreateclaim&entity=q42&property=p9001&snaktype=novalue&token=foobar&baserevid=7201010
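As a sketch, the same query string can be assembled from its parts. The function name is made up for illustration; the parameter names are the ones in the example above, the values are its placeholders, and nothing is sent anywhere. As far as I know, a real edit would need a valid token from a logged-in session and would be submitted as a POST request.

```python
from urllib.parse import urlencode


def build_create_claim_query(entity, prop, token, baserevid, snaktype="novalue"):
    """Assemble the wbcreateclaim query string shown in the API example."""
    params = {
        "action": "wbcreateclaim",
        "entity": entity,
        "property": prop,
        "snaktype": snaktype,
        "token": token,
        "baserevid": baserevid,
    }
    # urlencode keeps the insertion order and escapes special characters,
    # which matters for real tokens.
    return "api.php?" + urlencode(params)


query = build_create_claim_query("q42", "p9001", "foobar", 7201010)
```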
So the core question is, how can we map e.g. properties such as
wikidata:population (if that existed) to their respective Wikidata
property identifier (Q12345)?
This goes for any property that may occur in an RDF dump, such as:
http://www.wikidata.org/wiki/Special:EntityData/Q666615.nt
Note that all property ids start with a P. If it's of the form Q...,
then it is not a property.
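A small check along these lines could guard the viewer against mixing the two up. The function name and return values are assumptions for illustration; lowercase prefixes are accepted because the API example above writes q42 and p9001.

```python
def entity_kind(entity_id):
    """Classify an ID as "property" (P...), "item" (Q...), or "unknown"."""
    prefix, number = entity_id[:1].upper(), entity_id[1:]
    if not (number.isascii() and number.isdigit()):
        return "unknown"
    if prefix == "P":
        return "property"
    if prefix == "Q":
        return "item"
    return "unknown"
```

So the Q12345 from the question above would come back as "item", not "property".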
Cheers,
Markus