Hi Sam,

The NLP task you are referring to is often called "wikification," and
if you Google using that term you'll find some hits for datasets.
Here's the first one I found:

https://cogcomp.cs.illinois.edu/page/resource_view/4

I also have a full EN corpus marked up by a simple Wikification
algorithm. It's not very good, but you are welcome to it!

-Shilad

On Mon, Feb 6, 2017 at 3:28 AM, Samuel Printz <samuel.printz@outlook.de> wrote:

Hello Markus,
Taking a Wikipedia-annotated corpus and replacing the Wikipedia URIs
with the respective Wikidata URIs is a great idea. I think I'll try that out.
Thank you!
Samuel
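
The URI replacement discussed in this thread can be sketched roughly as
follows. This is only a minimal illustration, not anyone's actual
pipeline: it assumes the English Wikipedia, the MediaWiki query API with
prop=pageprops and ppprop=wikibase_item, and formatversion=2 responses.
The function names and the sample response are made up for the example.
DBpedia URIs are handled the same way, since their local names are
Wikipedia article names.

```python
import json
from urllib.parse import unquote, urlencode, urlparse

# Assumption: English Wikipedia; swap the host for other language editions.
WIKIPEDIA_API = "https://en.wikipedia.org/w/api.php"
WIKIDATA_ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def title_from_uri(uri):
    """Extract the article title from a Wikipedia or DBpedia URI."""
    path = urlparse(uri).path
    for prefix in ("/wiki/", "/resource/"):
        if path.startswith(prefix):
            # Undo percent-encoding and turn underscores back into spaces.
            return unquote(path[len(prefix):]).replace("_", " ")
    raise ValueError(f"not a Wikipedia or DBpedia URI: {uri}")


def build_query_url(titles, api=WIKIPEDIA_API):
    """Build a query URL asking for the Wikidata item of each title."""
    params = {
        "action": "query",
        "prop": "pageprops",
        "ppprop": "wikibase_item",
        "redirects": 1,
        "titles": "|".join(titles),
        "format": "json",
        "formatversion": 2,
    }
    return f"{api}?{urlencode(params)}"


def parse_response(data):
    """Map each returned page title to its Wikidata entity URI.

    Pages without a linked Wikidata item (e.g. missing pages) are skipped.
    """
    mapping = {}
    for page in data.get("query", {}).get("pages", []):
        qid = page.get("pageprops", {}).get("wikibase_item")
        if qid:
            mapping[page["title"]] = WIKIDATA_ENTITY_PREFIX + qid
    return mapping


# Hand-written sample in the assumed response shape, so the sketch can be
# tried without network access.
SAMPLE_RESPONSE = json.loads("""
{"query": {"pages": [
  {"pageid": 1, "ns": 0, "title": "Berlin",
   "pageprops": {"wikibase_item": "Q64"}},
  {"ns": 0, "title": "No such page xyz", "missing": true}
]}}
""")

if __name__ == "__main__":
    uris = [
        "https://en.wikipedia.org/wiki/Albert_Einstein",
        "http://dbpedia.org/resource/Berlin",
    ]
    titles = [title_from_uri(u) for u in uris]
    print(build_query_url(titles))
    print(parse_response(SAMPLE_RESPONSE))
```

To run this against the live API, the URL from build_query_url() could
be fetched with urllib.request.urlopen() and the body passed through
json.load() into parse_response(). Requests would likely need to be
batched, since the API caps the number of titles per query. With
redirects enabled, returned titles may differ from the requested ones,
so a real conversion would also have to consult the "normalized" and
"redirects" sections of the response.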
On 05.02.2017 at 21:40, Markus Kroetzsch wrote:
> On 05.02.2017 15:47, Samuel Printz wrote:
>> Hello everyone,
>>
>> I am looking for a text corpus that is annotated with Wikidata entities.
>> I need this for the evaluation of an entity linking tool based on
>> Wikidata, which is part of my bachelor thesis.
>>
>> Does such a corpus exist?
>>
>> Ideally, the corpus would be annotated in the NIF format [1], as I want
>> to use GERBIL [2] for the evaluation, but that is not a requirement.
>
> I don't know of any such corpus, but Wikidata is linked with Wikipedia
> in all languages. You can therefore take any Wikipedia article and
> find, with very little effort, the Wikidata entity for each link in
> the text.
>
> The downside of this is that Wikipedia pages do not link all
> occurrences of all linkable entities. You can get a higher coverage
> when taking only the first paragraph of each page, but many things
> will still not be linked.
>
> However, you could also take any existing Wikipedia-page annotated
> corpus and translate the links to Wikidata in the same way.
>
> Finally, DBpedia also is linked to Wikipedia (in fact, the local names
> of entities are Wikipedia article names). So if you find any
> DBpedia-annotated corpus, you can also translate it to Wikidata easily.
>
> Good luck,
>
> Markus
>
> P.S. If you build such a corpus from another resource, it would be
> nice if you could publish it for others to save some effort :-)
>
>>
>> Thanks for any hints!
>> Samuel
>>
>> [1] https://site.nlp2rdf.org/
>> [2] http://aksw.org/Projects/GERBIL.html
>>
>>
>> _______________________________________________
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>
Shilad W. Sen
Associate Professor
Mathematics, Statistics, and Computer Science Dept.
Macalester College
Senior Research Fellow, Target Corporation