Hi Marco,
Where might we find some statistics on the current accuracy of the automated claim and reference extractors? I assume that information must be in there somewhere, but I had trouble finding it.
This is a very ambitious project covering a very large technical territory (which I applaud). It would be great if your results could be synthesized a bit more clearly so we can understand where the weak/strong points are and where we might be able to help improve or make use of what you have done in other domains.
-Ben
On Wed, Jun 15, 2016 at 9:06 AM, Marco Fossati fossati@spaziodati.eu wrote:
[Feel free to blame me if you read this more than once]
To whom it may interest,
Full of delight, I would like to announce the first beta release of *StrepHit*:
https://github.com/Wikidata/StrepHit
TL;DR: StrepHit is an intelligent reading agent that understands text and translates it into *referenced* Wikidata statements. It is an IEG project funded by the Wikimedia Foundation.
Key features:
- Web spiders to harvest a collection of documents (corpus) from reliable sources
- automatic corpus analysis to identify the most meaningful verbs
- extraction of sentences and semi-structured data
- a machine learning classifier trained via crowdsourcing
- *supervised and rule-based fact extraction from text*
- Natural Language Processing utilities
- parallel processing
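For readers curious about the "corpus analysis" step, here is a minimal sketch of the general idea: rank verbs by frequency across a corpus to surface the ones most worth modeling. This is an illustrative toy, not StrepHit's actual code; the hard-coded verb set stands in for real part-of-speech tagging.

```python
from collections import Counter
import re

# Toy stand-in for POS tagging: a fixed set of biography-style verbs.
# (Hypothetical; StrepHit uses real NLP tooling for this step.)
VERBS = {"born", "died", "married", "worked", "studied"}

def rank_verbs(sentences):
    """Count verb occurrences across the corpus, most frequent first."""
    counts = Counter()
    for sentence in sentences:
        for token in re.findall(r"[a-z]+", sentence.lower()):
            if token in VERBS:
                counts[token] += 1
    return counts.most_common()

corpus = [
    "She was born in Trento and studied in Bologna.",
    "He worked in Rome and died in 1950.",
    "They married in 1920; she studied law.",
]
print(rank_verbs(corpus))  # "studied" ranks first with 2 occurrences
```

The highest-ranked verbs would then be the candidates for frame annotation and fact extraction downstream.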
You can find all the details here:
https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Val...
If you like it, star it on GitHub!
Best,
Marco
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata