On 04.06.2015 at 14:00, Markus Krötzsch <markus(a)semantic-mediawiki.org> wrote:
On 04.06.2015 12:17, Dimitris Kontokostas wrote:
...
Another question: can DBpedia extract references from Wikipedia
articles too? If this were possible, it might be feasible to
guess and suggest a reference (or a list of references). Especially
with things like date of death, one would expect that references
have a publication date very close to (but strictly after) the
event, which could narrow down the choices very much.
We don't extract them for now, although I think we could relatively
easily. The problem in this case would be that we cannot associate
references with facts. The DBpedia Information Extraction Framework is
quite modular and can be easily extended with new extractors, but it is
hard to make these extractors "talk to each other".
So we could easily get something like the following
dbp:A dbo:birthDate "..."
dbp:A dbo:deathDate "..."
dbp:A dbo:reference dbp:r1 # and maybe " dbp:r1 ....something else"
depending on the modeling
dbp:A dbo:reference dbp:r2
but not sure if this solves your problem
Yes, I understand that you can hardly get the association between extracted facts and
references. My suggestion was to extract both independently and then to query for
references that have a publication date close to a person's death so as to suggest
them to users as a possible reference for the death-date fact. This would still require a
manual check, since we cannot know if the guessed reference belongs to the date of death,
but if it has a high precision it would be a worthwhile way of spending volunteer time to
obtain confirmed references.
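As a rough sketch of that suggestion (not anything DBpedia provides today), the filtering step could look like the following in Python. The `Reference` type, the `suggest_references` function, the 14-day window, and the sample dates are all assumptions made up for illustration; only the idea of keeping references published shortly, but strictly, after the date of death comes from the discussion above.

```python
from datetime import date, timedelta
from typing import List, NamedTuple


class Reference(NamedTuple):
    """Hypothetical record for an independently extracted reference."""
    uri: str
    published: date


def suggest_references(death_date: date,
                       references: List[Reference],
                       max_days: int = 14) -> List[Reference]:
    """Return references published strictly after death_date and at most
    max_days later, earliest first. The window size is an arbitrary assumption."""
    window_end = death_date + timedelta(days=max_days)
    candidates = [r for r in references
                  if death_date < r.published <= window_end]
    return sorted(candidates, key=lambda r: r.published)


# Made-up sample data: only dbp:r1 falls inside the window.
refs = [
    Reference("dbp:r1", date(2015, 6, 5)),   # one day after the death
    Reference("dbp:r2", date(2010, 1, 1)),   # years before, cannot be a source
    Reference("dbp:r3", date(2015, 6, 4)),   # same day, excluded by "strictly after"
]
suggested = suggest_references(date(2015, 6, 4), refs)
```

The output is only a list of candidates for a volunteer to check, since nothing here establishes that a suggested reference actually supports the death-date fact.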
The DBpedia Events Dataset [http://events.dbpedia.org/] contains people who died recently.
Well, this is extracted from DBpedia Live, which is again extracted from Wikipedia
articles. But it usually picks up people's deaths by the end of the day, which is often before
it is in the (German) news:
http://events.dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fevents.dbpe…
At the same time, it might be one of the fastest ways
to get sourced dates of death into Wikidata, since news articles will usually appear before
the major authority files are updated (so even if we get donations from them, some lag
would remain). With such an extraction framework, one could establish a pipeline from
Wikipedia to Wikidata.
In the long run, references from authority files will become more valuable than news
articles, because they are more long-lived.
Best wishes,
Markus
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
--
Magnus Knuth
Hasso-Plattner-Institut für Softwaresystemtechnik GmbH
Prof.-Dr.-Helmert-Str. 2-3
14482 Potsdam
Amtsgericht Potsdam, HRB 12184
Geschäftsführung: Prof. Dr. Christoph Meinel
tel: +49 331 5509 547
email: magnus.knuth(a)hpi.de
web: http://www.hpi.de/
webID: http://magnus.13mm.de/