Try downloading it, or change your browser's character encoding to UTF-8 (Unicode).
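
A minimal Python sketch of what is probably happening, assuming the files are stored as UTF-8 and the browser guesses Windows-1252 instead. The codec guess and the function names are assumptions for illustration, not something confirmed in this thread:

```python
def misread_as_cp1252(text: str) -> str:
    """Simulate a viewer that decodes UTF-8 bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair_mojibake(garbled: str) -> str:
    """Undo the misreading by reversing the two steps above."""
    return garbled.encode("cp1252").decode("utf-8")


def read_downloaded_bytes(raw: bytes) -> str:
    """Decode the downloaded file contents explicitly as UTF-8."""
    return raw.decode("utf-8")


if __name__ == "__main__":
    name = "Échécrate"  # the first name mentioned in the quoted reply
    garbled = misread_as_cp1252(name)
    print(garbled)                   # accented letters turn into two-character junk
    print(repair_mojibake(garbled))  # round-trips back to the original name
```

Downloading the file and opening it with an explicit `encoding="utf-8"` avoids the guess altogether.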

And yes it's based on dumps. :)

On Fri, Mar 20, 2015 at 3:51 AM Ricordisamoa <ricordisamoa@openmailbox.org> wrote:
On 20/03/2015 01:11, Amir Ladsgroup wrote:

OK, I have some news:
1- Today I rewrote some parts of Kian, and it now chooses the regularization parameter (lambda) automatically, so predictions are more accurate. I wanted to push the changes to GitHub, but it seems my SSH setup has issues. They'll be there soon.
2- (Important) I wrote code that can find possible mistakes in Wikidata based on Kian. The code will be on GitHub soon. Check out this link. It's the result of comparing French Wikipedia against Wikidata, e.g. this line:
Q2994923: 1 (d), 0.257480420229 (w) [0, 0, 1, 2, 0]
1 (d) means Wikidata thinks it's a human

0.25... (w) means French Wikipedia thinks it's not a human (with 74.3% certainty)

And if you check the link you can see it's a mistake in Wikidata. Please check other results and fix them.

Tell me if you want this test to be run for another language too.

3- I used Kian to import unconnected pages from French Wikipedia and created about 1900 items. The result is here; please check whether anything in this list is not a human and tell me, so I can run some error analysis.
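
The sample line in item 2 can also be read programmatically. This is only a sketch: the regular expression, the function names and the 0.5 decision threshold are assumptions for illustration, and the trailing bracketed list is kept as-is because its meaning is not explained here.

```python
import re

# Assumed layout, taken from the single sample line in the message:
#   <item id>: <0 or 1> (d), <score> (w) [<comma-separated integers>]
LINE_RE = re.compile(r"^(Q\d+): ([01]) \(d\), ([0-9.]+) \(w\) \[(.*)\]$")


def parse_line(line):
    """Split one report line into its fields."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognised report line: %r" % line)
    item, d_flag, score, extra = match.groups()
    return {
        "item": item,
        "wikidata_human": d_flag == "1",   # the (d) value
        "wikipedia_score": float(score),   # the (w) value
        "extra": [int(x) for x in extra.split(",") if x.strip()],
    }


def wikipedia_verdict(record, threshold=0.5):
    """Return (is_human, certainty) for the (w) score.

    The threshold is a hypothetical cut-off, not a value from the message.
    """
    score = record["wikipedia_score"]
    is_human = score >= threshold
    certainty = score if is_human else 1.0 - score
    return is_human, certainty


if __name__ == "__main__":
    rec = parse_line("Q2994923: 1 (d), 0.257480420229 (w) [0, 0, 1, 2, 0]")
    is_human, certainty = wikipedia_verdict(rec)
    mismatch = rec["wikidata_human"] != is_human
    print(rec["item"], mismatch, round(certainty * 100, 1))
```

For the sample line, the two sources disagree, and 1 - 0.257480420229 gives the 74.3% certainty quoted above.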


Best
Great!
Unfortunately, some files seem to have been published with the wrong character encoding.
E.g. the first name shows up as "Échécrate" in my browsers.
_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l