Hi Sam,

The NLP task you are referring to is often called "wikification," and searching for that term turns up several datasets. Here's the first one I found: https://cogcomp.cs.illinois.edu/page/resource_view/4

I also have a full EN corpus marked up by a simple Wikification algorithm. It's not very good, but you are welcome to it!

-Shilad

On Mon, Feb 6, 2017 at 3:28 AM, Samuel Printz <samuel.printz@outlook.de> wrote:
Hello Markus,

Taking a Wikipedia-annotated corpus and replacing the Wikipedia URIs
with the respective Wikidata URIs is a great idea. I think I'll try that out.
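
A rough sketch of how that replacement could look in Python, assuming the corpus stores its annotations as English Wikipedia URLs or DBpedia resource URIs. The function names are made up for illustration. The only real API involved is Wikidata's wbgetentities action, and the sketch only builds the request URL and parses a response, so the HTTP call itself is left out.

```python
import json
from urllib.parse import unquote, urlencode, urlsplit

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def title_from_uri(uri):
    """Return the article title encoded in a Wikipedia URL or DBpedia resource URI."""
    path = urlsplit(uri).path
    for prefix in ("/wiki/", "/resource/"):
        if path.startswith(prefix):
            # Both schemes use the article name with underscores for spaces.
            return unquote(path[len(prefix):]).replace("_", " ")
    raise ValueError(f"not a Wikipedia or DBpedia URI: {uri}")


def lookup_url(titles, site="enwiki"):
    """Build a wbgetentities request URL for a batch of titles on one wiki."""
    params = {
        "action": "wbgetentities",
        "sites": site,
        "titles": "|".join(titles),
        "props": "sitelinks",
        "sitefilter": site,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def title_to_entity_uri(response_text, site="enwiki"):
    """Map each title found in a wbgetentities JSON response to its entity URI."""
    mapping = {}
    entities = json.loads(response_text).get("entities", {})
    for entity_id, entity in entities.items():
        if "missing" in entity:
            # Titles without a Wikidata item are reported under negative keys.
            continue
        title = entity["sitelinks"][site]["title"]
        mapping[title] = "http://www.wikidata.org/entity/" + entity_id
    return mapping
```

The annotations would be run through title_from_uri, the titles sent in batches via lookup_url, and each annotation rewritten using the dictionary from title_to_entity_uri. Titles that are redirects may come back as missing, so those would need separate handling.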

Thank you!

Samuel


On 05.02.2017 at 21:40, Markus Kroetzsch wrote:
> On 05.02.2017 15:47, Samuel Printz wrote:
>> Hello everyone,
>>
>> I am looking for a text corpus that is annotated with Wikidata entities.
>> I need this for the evaluation of an entity linking tool based on
>> Wikidata, which is part of my bachelor thesis.
>>
>> Does such a corpus exist?
>>
>> Ideally, the corpus would be annotated in the NIF format [1], as I want
>> to use GERBIL [2] for the evaluation, but this is not a requirement.
>
> I don't know of any such corpus, but Wikidata is linked with Wikipedia
> in all languages. You can therefore take any Wikipedia article and
> find, with very little effort, the Wikidata entity for each link in
> the text.
>
> The downside of this is that Wikipedia pages do not link all
> occurrences of all linkable entities. You can get higher coverage
> by taking only the first paragraph of each page, but many things
> will still not be linked.
>
> However, you could also take any existing Wikipedia-page annotated
> corpus and translate the links to Wikidata in the same way.
>
> Finally, DBpedia is also linked to Wikipedia (in fact, the local names
> of entities are Wikipedia article names). So if you find any
> DBpedia-annotated corpus, you can also translate it to Wikidata easily.
>
> Good luck,
>
> Markus
>
> P.S. If you build such a corpus from another resource, it would be
> nice if you could publish it for others to save some effort :-)
>
>>
>> Thanks for any hints!
>> Samuel
>>
>> [1] https://site.nlp2rdf.org/
>> [2] http://aksw.org/Projects/GERBIL.html
>>
>>
>> _______________________________________________
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>



--
Shilad W. Sen

Associate Professor
Mathematics, Statistics, and Computer Science Dept.
Macalester College

Senior Research Fellow, Target Corporation