Whoops! Apologies for shortening your name to "Sam." Looks like the coffee
has not yet kicked in this morning...
On Mon, Feb 6, 2017 at 8:02 AM, Shilad Sen <ssen(a)macalester.edu> wrote:
Hi Sam,
The NLP task you are referring to is often called "wikification," and if
you Google using that term you'll find some hits for datasets. Here's the
first one I found:
https://cogcomp.cs.illinois.edu/page/resource_view/4
I also have a full EN corpus marked up by a simple Wikification
algorithm. It's not very good, but you are welcome to it!
-Shilad
On Mon, Feb 6, 2017 at 3:28 AM, Samuel Printz <samuel.printz(a)outlook.de>
wrote:
Hello Markus,
Taking a Wikipedia-annotated corpus and replacing the Wikipedia URIs
with the respective Wikidata URIs is a great idea; I think I'll try that
out.
Thank you!
Samuel
On 05.02.2017 at 21:40, Markus Kroetzsch wrote:
On 05.02.2017 15:47, Samuel Printz wrote:
> Hello everyone,
>
> I am looking for a text corpus that is annotated with Wikidata entities.
> I need this for the evaluation of an entity linking tool based on
> Wikidata, which is part of my bachelor thesis.
>
> Does such a corpus exist?
>
> Ideally, the corpus would be annotated in the NIF format [1], as I want
> to use GERBIL [2] for the evaluation, but that is not strictly necessary.
I don't know of any such corpus, but Wikidata is linked with Wikipedia
in all languages. You can therefore take any Wikipedia article and
find, with very little effort, the Wikidata entity for each link in
the text.
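A minimal sketch of that link-to-entity lookup, assuming the standard shape of a `wbgetentities` response from the Wikidata API (the helper function name and the inlined sample response are illustrative; in practice you would fetch the response over HTTP):

```python
# Sketch: map a Wikipedia article title to its Wikidata QID using the
# sitelinks in a wbgetentities API response. The response shape below
# matches what the API returns for a query like:
#   action=wbgetentities&sites=enwiki&titles=Douglas_Adams&props=sitelinks

def qid_for_title(api_response: dict, site: str, title: str):
    """Return the QID whose sitelink for `site` matches `title`, else None."""
    for qid, entity in api_response.get("entities", {}).items():
        sitelink = entity.get("sitelinks", {}).get(site)
        if sitelink and sitelink.get("title") == title:
            return qid
    return None

# Inlined sample response (Douglas Adams really is Q42 on Wikidata).
sample = {
    "entities": {
        "Q42": {
            "sitelinks": {"enwiki": {"site": "enwiki", "title": "Douglas Adams"}}
        }
    }
}

print(qid_for_title(sample, "enwiki", "Douglas Adams"))  # Q42
```

With this in place, translating a Wikipedia-annotated corpus means resolving each link target's title once and substituting the returned QID.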
The downside of this is that Wikipedia pages do not link all
occurrences of all linkable entities. You can get a higher coverage
when taking only the first paragraph of each page, but many things
will still not be linked.
However, you could also take any existing Wikipedia-page annotated
corpus and translate the links to Wikidata in the same way.
Finally, DBpedia is also linked to Wikipedia (in fact, the local names
of its entities are Wikipedia article names). So if you find any
DBpedia-annotated corpus, you can also translate it to Wikidata easily.
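Because DBpedia local names are Wikipedia article names, that translation step is mostly string handling; a small sketch (the function name is mine):

```python
from urllib.parse import unquote

def dbpedia_to_title(uri: str) -> str:
    """Turn a DBpedia resource URI into the Wikipedia article title.

    DBpedia local names are percent-encoded Wikipedia titles with spaces
    replaced by underscores, so we undo both transformations.
    """
    local_name = uri.rsplit("/", 1)[-1]
    return unquote(local_name).replace("_", " ")

print(dbpedia_to_title("http://dbpedia.org/resource/Douglas_Adams"))
# Douglas Adams
```

The recovered title can then be resolved to a QID via the Wikipedia sitelinks, as described above.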
Good luck,
Markus
P.S. If you build such a corpus from another resource, it would be
nice if you could publish it for others to save some effort :-)
>
> Thanks for any hints!
> Samuel
>
> [1] https://site.nlp2rdf.org/
> [2] http://aksw.org/Projects/GERBIL.html
>
>
> _______________________________________________
> Wikidata mailing list
> Wikidata(a)lists.wikimedia.org
>
https://lists.wikimedia.org/mailman/listinfo/wikidata
>
--
Shilad W. Sen
Associate Professor
Mathematics, Statistics, and Computer Science Dept.
Macalester College
Senior Research Fellow, Target Corporation
ssen(a)macalester.edu
http://www.shilad.com
https://www.linkedin.com/in/shilad
651-696-6273