Hoi,
Would you say that annotating wikilinks as well as red links has value not only for the quality of Wikipedia and Wikidata, but that this use case is an additional reason to do it?

Would it also make sense for words in the text that do not have a link to have an item, with statements referencing the item for the article?
Thanks,
      GerardM

On 5 February 2017 at 21:40, Markus Kroetzsch <markus.kroetzsch@tu-dresden.de> wrote:
On 05.02.2017 15:47, Samuel Printz wrote:
Hello everyone,

I am looking for a text corpus that is annotated with Wikidata entities.
I need this for the evaluation of an entity linking tool based on
Wikidata, which is part of my bachelor thesis.

Does such a corpus exist?

Ideally, the corpus would be annotated in the NIF format [1], as I want
to use GERBIL [2] for the evaluation, but that is not a requirement.

I don't know of any such corpus, but Wikidata is linked with Wikipedia in all languages. You can therefore take any Wikipedia article and find, with very little effort, the Wikidata entity for each link in the text.
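
As a rough illustration of the link-to-entity step, here is a minimal Python sketch. It assumes you already have the raw wikitext of an article; the helper names (extract_links, pageprops_url, qids_from_response) are made up for this example. The lookup goes through the MediaWiki API's pageprops query, which exposes the Wikidata ID as "wikibase_item". The regex is deliberately simple and does not filter File: or Category: links or handle templates.

```python
import re
from urllib.parse import urlencode

# Matches [[Target]], [[Target|surface text]] and [[Target#Section|text]].
# Simplification: File:/Category: links and templates are not filtered out.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|([^\[\]]*))?\]\]")


def extract_links(wikitext):
    """Return (target_title, surface_text) pairs for each wikilink."""
    links = []
    for m in WIKILINK.finditer(wikitext):
        target = m.group(1).strip().replace("_", " ")
        surface = m.group(2) if m.group(2) is not None else m.group(1)
        links.append((target, surface.strip()))
    return links


def pageprops_url(titles, lang="en"):
    """Build a MediaWiki API URL asking for the Wikidata item of each title."""
    params = {
        "action": "query",
        "prop": "pageprops",
        "ppprop": "wikibase_item",
        "redirects": 1,  # follow redirects to the canonical page
        "titles": "|".join(titles),
        "format": "json",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?" + urlencode(params)


def qids_from_response(data):
    """Map page title -> Wikidata ID from a decoded API response.

    Titles are the ones the API reports, which may differ from the
    requested ones after normalisation or redirect resolution.
    """
    result = {}
    for page in data.get("query", {}).get("pages", {}).values():
        qid = page.get("pageprops", {}).get("wikibase_item")
        if qid:
            result[page["title"]] = qid
    return result
```

To use it, fetch the URL from pageprops_url() with any HTTP client, decode the JSON, and pass it to qids_from_response(). Pages without a Wikidata item (including red links) simply do not appear in the result. The API limits how many titles one request may carry, so longer link lists need to be sent in batches.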

The downside of this is that Wikipedia pages do not link all occurrences of all linkable entities. You can get higher coverage by taking only the first paragraph of each page, but many things will still not be linked.

However, you could also take any existing Wikipedia-page annotated corpus and translate the links to Wikidata in the same way.

Finally, DBpedia is also linked to Wikipedia (in fact, the local names of entities are Wikipedia article names). So if you find any DBpedia-annotated corpus, you can also translate it to Wikidata easily.
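
A minimal sketch of the DBpedia translation, assuming resource URIs of the form http://dbpedia.org/resource/<Article_name> (English DBpedia); the function name is hypothetical. It only recovers the Wikipedia article title, which can then be resolved to a Wikidata ID in the same way as a link target from an article.

```python
from urllib.parse import unquote, urlparse


def dbpedia_to_wikipedia_title(uri):
    """Turn a DBpedia resource URI into the Wikipedia article title.

    Assumes the URI path is /resource/<Article_name>, with underscores
    standing in for spaces and percent-encoding for special characters.
    """
    path = urlparse(uri).path
    prefix = "/resource/"
    if not path.startswith(prefix):
        raise ValueError(f"not a DBpedia resource URI: {uri}")
    return unquote(path[len(prefix):]).replace("_", " ")
```

DBpedia names reflect the Wikipedia titles at the time of the DBpedia extraction, so some pages may have been renamed since; following redirects when resolving the title should catch most of these cases.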

Good luck,

Markus

P.S. If you build such a corpus from another resource, it would be nice if you could publish it for others to save some effort :-)



Thanks for any hints!
Samuel

[1] https://site.nlp2rdf.org/
[2] http://aksw.org/Projects/GERBIL.html


_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata

