Coming back to an old thread: we now extract references from Wikipedia, and they are available in the 2015-10 beta release:
citation_data_en.ttl.bz2 http://downloads.dbpedia.org/2015-10/core-i18n/en/citation_data_en.ttl.bz2
citation_links_en.ttl.bz2 http://downloads.dbpedia.org/2015-10/core-i18n/en/citation_links_en.ttl.bz2
Any feedback is more than welcome.
Best,
Dimitris
On Thu, Jun 4, 2015 at 3:00 PM, Markus Krötzsch <markus@semantic-mediawiki.org> wrote:
On 04.06.2015 12:17, Dimitris Kontokostas wrote: ...
Another question: can DBpedia extract references from Wikipedia articles too? If this would be possible, it might be feasible to guess and suggest a reference (or a list of references). Especially with things like date of death, one would expect that references have a publication date very close to (but strictly after) the event, which could narrow down the choices very much.
We don't extract them for now, although I think we could relatively easily. The problem in this case would be that we cannot associate references with facts. The DBpedia Information Extraction Framework is quite modular and can easily be extended with new extractors, but it is hard to make these extractors "talk to each other". So we could easily get something like the following:

    dbp:A dbo:birthDate "..."
    dbp:A dbo:deathDate "..."
    dbp:A dbo:reference dbp:r1   # and maybe "dbp:r1 ....something else" depending on the modeling
    dbp:A dbo:reference dbp:r2

but I am not sure if this solves your problem.
Yes, I understand that you can hardly get the association between extracted facts and references. My suggestion was to extract both independently and then to query for references that have a publication date close to a person's death so as to suggest them to users as a possible reference for the death-date fact. This would still require a manual check, since we cannot know if the guessed reference belongs to the date of death, but if it has a high precision it would be a worthwhile way of spending volunteer time to obtain confirmed references.
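A minimal sketch of the matching step described above, assuming the death date and the references' publication dates have already been extracted separately. The names `Reference`, `suggest_death_references` and `window_days`, the 30-day window, and the `ex:` identifiers are all hypothetical choices for illustration, not part of DBpedia or Wikidata. The output is only a list of candidates for a human to check.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Reference:
    uri: str                   # identifier of the extracted reference
    published: Optional[date]  # publication date, if one could be extracted


def suggest_death_references(
    death_date: date,
    references: List[Reference],
    window_days: int = 30,     # assumed window; would need tuning on real data
) -> List[Reference]:
    """Return references published strictly after the death date and no
    later than window_days afterwards, earliest first."""
    latest = death_date + timedelta(days=window_days)
    candidates = [
        r for r in references
        if r.published is not None and death_date < r.published <= latest
    ]
    return sorted(candidates, key=lambda r: r.published)


# Made-up example data: only ex:r2 and ex:r5 fall inside the window.
example_refs = [
    Reference("ex:r1", date(2015, 1, 10)),  # before the death: excluded
    Reference("ex:r2", date(2015, 3, 5)),
    Reference("ex:r3", date(2015, 3, 1)),   # same day, not strictly after: excluded
    Reference("ex:r4", None),               # no publication date: excluded
    Reference("ex:r5", date(2015, 3, 2)),
    Reference("ex:r6", date(2015, 6, 1)),   # too long after: excluded
]

for ref in suggest_death_references(date(2015, 3, 1), example_refs):
    print(ref.uri, ref.published)
```

Sorting by publication date puts the earliest reports first, on the assumption that these are the most likely to be about the death itself; a volunteer would still have to confirm each one.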
At the same time, it might be one of the fastest ways to get sourced date of death into Wikidata, since news articles will usually appear before the major authority files are updated (so even if we get donations from them, some lag would remain). With such an extraction framework, one could establish a pipeline from Wikipedia to Wikidata.
In the long run, references from authority files will become more valuable than news articles, because they are more long-lived.
Best wishes,
Markus
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata