Whoops! Apologies for shortening your name to
"Sam." Looks like the coffee
has not yet kicked in this morning...
On Mon, Feb 6, 2017 at 8:02 AM, Shilad Sen <ssen(a)macalester.edu> wrote:
Hi Sam,
The NLP task you are referring to is often called "wikification," and if
you Google using that term you'll find some hits for datasets. Here's the
first one I found:
https://cogcomp.cs.illinois.edu/page/resource_view/4
I also have a full EN corpus marked up by a simple Wikification
algorithm. It's not very good, but you are welcome to it!
-Shilad
On Mon, Feb 6, 2017 at 3:28 AM, Samuel Printz <samuel.printz(a)outlook.de>
wrote:
Hello Markus,
taking a Wikipedia-annotated corpus and replacing the Wikipedia URIs
with the respective Wikidata URIs is a great idea; I think I'll try that
out.
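A minimal sketch of how that title-to-QID translation might look, assuming the standard MediaWiki Action API (`action=query&prop=pageprops&ppprop=wikibase_item`); the sample response below is abbreviated and only illustrative:

```python
# Sketch: map Wikipedia article titles to Wikidata QIDs via the
# MediaWiki Action API's pageprops/wikibase_item lookup.
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"

def qid_request_url(titles):
    """Build the API URL that returns the Wikidata item for each title."""
    params = {
        "action": "query",
        "prop": "pageprops",
        "ppprop": "wikibase_item",
        "titles": "|".join(titles),
        "format": "json",
    }
    return API + "?" + urlencode(params)

def titles_to_qids(api_response):
    """Extract a {title: QID} mapping from the API's JSON response."""
    mapping = {}
    for page in api_response["query"]["pages"].values():
        qid = page.get("pageprops", {}).get("wikibase_item")
        if qid:
            mapping[page["title"]] = qid
    return mapping

# Abbreviated sample of the response shape, for illustration only:
sample = {
    "query": {"pages": {
        "5043734": {"title": "Berlin",
                    "pageprops": {"wikibase_item": "Q64"}},
    }}
}
print(titles_to_qids(sample))  # {'Berlin': 'Q64'}
```

With the QIDs in hand, each Wikipedia URI in the corpus annotations can be rewritten to the corresponding `http://www.wikidata.org/entity/Q...` URI.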
Thank you!
Samuel
On 05.02.2017 at 21:40, Markus Kroetzsch wrote:
> On 05.02.2017 15:47, Samuel Printz wrote:
>> Hello everyone,
>>
>> I am looking for a text corpus that is annotated with Wikidata entities.
>> I need this for the evaluation of an entity linking tool based on
>> Wikidata, which is part of my bachelor thesis.
>>
>> Does such a corpus exist?
>>
>> A corpus annotated in the NIF format [1] would be ideal, as I want to
>> use GERBIL [2] for the evaluation, but that is not strictly necessary.
>
> I don't know of any such corpus, but Wikidata is linked with Wikipedia
> in all languages. You can therefore take any Wikipedia article and
> find, with very little effort, the Wikidata entity for each link in
> the text.
>
> The downside of this is that Wikipedia pages do not link all
> occurrences of all linkable entities. You can get higher coverage
> by taking only the first paragraph of each page, but many things
> will still not be linked.
>
> However, you could also take any existing Wikipedia-page annotated
> corpus and translate the links to Wikidata in the same way.
>
> Finally, DBpedia is also linked to Wikipedia (in fact, the local names
> of entities are Wikipedia article names). So if you find any
> DBpedia-annotated corpus, you can also translate it to Wikidata
> easily.
>
> Good luck,
>
> Markus
>
> P.S. If you build such a corpus from another resource, it would be
> nice if you could publish it for others to save some effort :-)
>
>>
>> Thanks for any hints!
>> Samuel
>>
>> [1] https://site.nlp2rdf.org/
>> [2] http://aksw.org/Projects/GERBIL.html
>>
>>
>> _______________________________________________
>> Wikidata mailing list
>> Wikidata(a)lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
--
Shilad W. Sen
Associate Professor
Mathematics, Statistics, and Computer Science Dept.
Macalester College
Senior Research Fellow, Target Corporation
ssen(a)macalester.edu
http://www.shilad.com
https://www.linkedin.com/in/shilad
651-696-6273