Hi Marco,
Where might we find some statistics on the current accuracy of the
automated claim and reference extractors? I assume that information must
be in there somewhere, but I had trouble finding it.
This is a very ambitious project covering a large technical territory
(which I applaud). It would be great if your results could be synthesized
a bit more clearly, so we can see where the weak and strong points are
and where we might be able to help improve, or make use of, what you have
done in other domains.
-Ben
On Wed, Jun 15, 2016 at 9:06 AM, Marco Fossati <fossati(a)spaziodati.eu>
wrote:
[Feel free to blame me if you read this more than once]
To whom it may concern,
I am delighted to announce the first beta release of *StrepHit*:
https://github.com/Wikidata/StrepHit
TL;DR: StrepHit is an intelligent reading agent that understands text and
translates it into *referenced* Wikidata statements.
It is an IEG project funded by the Wikimedia Foundation.
Key features:
-web spiders to harvest a collection of documents (corpus) from reliable sources
-automatic corpus analysis to identify the most meaningful verbs
-extraction of sentences and semi-structured data
-a machine learning classifier trained via crowdsourcing
-*supervised and rule-based fact extraction from text*
-Natural Language Processing utilities
-parallel processing
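To give a flavor of the rule-based side of the pipeline, here is a minimal sketch of what a single extraction rule might look like. This is an illustrative toy, not StrepHit's actual code: the pattern, function name, and the use of the "member of sports team" property label are all my assumptions.

```python
import re

# Hypothetical rule: match "<subject> played for <object>" sentences and
# emit a candidate (subject, property, object) triple. A real system would
# pair each triple with a reference to the source document.
PATTERN = re.compile(
    r"^(?P<subject>[A-Z][\w .]+?) played for (?P<object>[A-Z][\w .]+?)\.?$"
)

def extract_fact(sentence):
    """Return a candidate triple, or None if the rule does not fire."""
    match = PATTERN.match(sentence.strip())
    if not match:
        return None
    return (match.group("subject"), "member of sports team",
            match.group("object"))

print(extract_fact("George Best played for Manchester United."))
# → ('George Best', 'member of sports team', 'Manchester United')
```

A supervised classifier would replace the hand-written pattern with learned frame labels, but the output shape, a property linking two entities plus a reference, stays the same.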
You can find all the details here:
https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Va…
If you like it, star it on GitHub!
Best,
Marco
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata