Hello,
Yesterday I published a new version of Kian. I ran it to add statements to claimless items from Japanese Wikipedia and German Wikipedia, and
it is working. I'm planning to add French and English Wikipedia. You can install it and run it too.
How does the new version of Kian work? I introduced the concept of a "model" in Kian. A model consists of four properties: wiki (such as "enwikivoyage" or "fawiki"), name (an arbitrary name), property (like "P31" or "P27") and the value of that property (like "Q5" for "P31" or "Q31" for "P27"). Kian then trains that model, and once the model is ready, you can use it to add statements to any kind of list of articles (more technically, page generators of pywikibot). For example, you can add this statement to new articles by running something like this:
python scripts/parser.py -lang:ja -newpages:100 -n jaHuman
where jaHuman is the name of that model. Kian caches all data related to that model in data/jaHuman/
Or find possible mistakes in that wiki:
python scripts/possible_mistakes.py -n jaHuman
etc.
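
To make the four parts of a model a bit more concrete, here is a minimal sketch in Python. This is not Kian's actual code: the class name, the field names and the cache_dir helper are hypothetical, and the wiki, property and value shown for jaHuman are only my guess at what such a model would contain. The data/<name>/ layout is the one mentioned above.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Model:
    """Hypothetical container for the four parts of a Kian model."""
    wiki: str         # e.g. "jawiki" or "enwikivoyage"
    name: str         # arbitrary name, e.g. "jaHuman"
    property_id: str  # e.g. "P31"
    value_id: str     # e.g. "Q5"

    def cache_dir(self) -> PurePosixPath:
        # Data related to the model is cached under data/<name>/
        return PurePosixPath("data") / self.name


# Assumed contents of the jaHuman model (illustrative only).
ja_human = Model(wiki="jawiki", name="jaHuman", property_id="P31", value_id="Q5")
print(ja_human.cache_dir())
```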
Other things worth mentioning:
* The scripts of Kian and the library (the part that actually does the work) are separated, so you can easily write your own scripts for Kian.
* Since it uses autolists to train and find possible mistakes, results are live.
* Kian now caches the results of SQL queries in a folder separate from the model's folder, so the first model you build for Spanish Wikipedia may take a while to complete, but the second model for Spanish Wikipedia will take much less time.
* I doubled the number of features in a way that made the accuracy of Kian really high [1] (e.g. P31:Q5 for German Wikipedia has an AUC of 99.75%, and precision and recall are 99.11% and 98.31% at a threshold of 63%).
* Thresholds are chosen automatically based on F-beta scores to get optimum accuracy and high recall.
* It can give results in different classes of certainty, and we can send these results to semi-automated tools. If anyone is willing to help, please do tell.
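
As an illustration of the automatic threshold choice mentioned in the list above, here is a rough sketch of picking a threshold by maximising an F-beta score. It is not taken from Kian: the function names, the candidate grid (1% to 99%) and the default beta of 2 are my assumptions. A beta above 1 weights recall more heavily than precision.

```python
def f_beta(precision, recall, beta):
    """F-beta score; beta > 1 favours recall, beta < 1 favours precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def choose_threshold(scores, labels, beta=2.0):
    """Return (threshold, f_beta) for the best threshold on a 1%..99% grid.

    scores: model outputs in [0, 1]; labels: True if the statement is correct.
    The grid and the default beta are assumptions made for this sketch.
    """
    best_t, best_f = None, -1.0
    for t in (i / 100 for i in range(1, 100)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f = f_beta(precision, recall, beta)
        if f > best_f:  # keep the lowest threshold among ties
            best_t, best_f = t, f
    return best_t, best_f


# Toy data, invented for the example.
threshold, score = choose_threshold([0.9, 0.8, 0.4, 0.2],
                                    [True, True, False, False])
print(threshold, score)
```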
A crazy idea: what do you think if I make a web service for Kian, so you can go to a page on Labs, register a model and after a while get results, or use OAuth to add statements?
Last thing: suggest models to me and I will work on them :)
[1]: The old Kian worked this way: it labeled all categories based on the percentage of members that already have that statement, then labeled articles based on the number of categories in each class that the article has. The new Kian does this, but it also labels categories based on the percentage of members that have that property but not that value (e.g. "Category:Fictional characters" would have a high percentage in the model of P31:Q5), and also labels articles based on the number of categories in each of those classes.
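
The two kinds of category labels in [1] could be sketched roughly like this. This is only my simplified illustration, not the real implementation: the function names, the input dictionaries and the choice of four classes are assumptions, and "Q_FICTIONAL" is a placeholder rather than a real item ID.

```python
def category_ratios(members, item_values, target_value):
    """Share of members with the target value, and share with another value.

    members: article titles in the category.
    item_values: article title -> current value of the property
                 (a missing title means no statement yet).
    """
    total = len(members)
    if total == 0:
        return 0.0, 0.0
    same = sum(1 for m in members if item_values.get(m) == target_value)
    other = sum(1 for m in members
                if item_values.get(m) not in (None, target_value))
    return same / total, other / total


def bucket(ratio, n_classes=4):
    """Map a ratio in [0, 1] to a class index 0..n_classes-1 (assumed binning)."""
    return min(int(ratio * n_classes), n_classes - 1)


def article_features(article_categories, category_members, item_values,
                     target_value, n_classes=4):
    """Count the article's categories per class, once for each kind of label."""
    same_counts = [0] * n_classes   # old features: property with that value
    other_counts = [0] * n_classes  # new features: property, different value
    for cat in article_categories:
        same, other = category_ratios(category_members.get(cat, []),
                                      item_values, target_value)
        same_counts[bucket(same, n_classes)] += 1
        other_counts[bucket(other, n_classes)] += 1
    return same_counts + other_counts  # twice as many features as before


# Toy data, invented for the example.
category_members = {"Living people": ["A", "B", "C", "D"],
                    "Fictional characters": ["X", "Y"]}
item_values = {"A": "Q5", "B": "Q5", "C": "Q5",
               "X": "Q_FICTIONAL", "Y": "Q_FICTIONAL"}
print(article_features(["Living people"], category_members, item_values, "Q5"))
```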
Best