Amir,
What is the false positive rate of your algorithm when dealing with
fictitious humans and (non-fictitious) non-human organisms? That is, how
often does your program classify such non-humans as humans?
Regarding the latter, note that items about individual dogs, elephants,
chimpanzees and even trees can use properties that are otherwise extremely
skewed towards humans. For example, Prometheus (Q590010) [1], an extremely
old tree, has claims for *date of birth* (P569), *date of death* (P570),
even *killed by* (P157). Non-human animals can also have kinship claims
(e.g. *mother*, *brother*, *child*), among other properties typically used on
humans.
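(For concreteness, the false positive rate asked about here would be FP / (FP + TN) over a labelled evaluation set: of all items that are truly not humans, the fraction the classifier tags as human. A toy sketch with made-up counts:)

```python
# False positive rate on a labelled evaluation set: of all items that
# are truly NOT humans, how many did the classifier label as human?
def false_positive_rate(false_positives, true_negatives):
    return false_positives / (false_positives + true_negatives)

# Hypothetical counts: 12 non-human items misclassified as human,
# 988 non-human items correctly left alone.
print(false_positive_rate(12, 988))  # 0.012
```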
Best,
Eric
1. Prometheus.
On Sat, Mar 7, 2015 at 1:44 PM, Amir Ladsgroup <ladsgroup(a)gmail.com> wrote:
Hey Markus,
Thanks for your insight :)
On Sat, Mar 7, 2015 at 9:52 PM, Markus Krötzsch <markus(a)semantic-mediawiki.org> wrote:
Hi Amir,
In spite of all due enthusiasm, please evaluate your results (with
humans!) before making automated edits. In fact, I would contradict Magnus
here and say that such an approach would best be suited to provide
meaningful (pre-filtered) *input* to people who play a Wikidata game,
rather than bypassing the game (and humans) altogether. The expected error
rates are quite high for such an approach, but it can still save a lot of
work for humans.
There is a "certainty factor", and by using it Kian can save a lot of work
without making such errors.
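A minimal sketch of what such a certainty-factor filter could look like (hypothetical names and threshold, not Kian's actual code): treat the network's sigmoid output as a confidence score and only act automatically on very confident predictions, deferring the rest to humans.

```python
# Hypothetical "certainty factor" filter: the network's sigmoid output
# in [0, 1] is treated as a confidence score; only very confident
# predictions trigger automated edits, the rest go to human review.

def triage(score, threshold=0.9):
    """Map a model output in [0, 1] to an action."""
    if score >= threshold:
        return "human"          # confidently a human
    if score <= 1.0 - threshold:
        return "not-human"      # confidently not a human
    return "defer"              # uncertain: send to a human reviewer

# Hypothetical item scores.
predictions = {"Q42": 0.98, "Q590010": 0.55, "Q146": 0.03}
for item, score in predictions.items():
    print(item, triage(score))
```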
As for the next steps, I would suggest that you have a look at the work
that others have done already. Try Google Scholar:
https://scholar.google.com/scholar?q=machine+learning+wikipedia
As you can see, there are countless works on using machine learning
techniques on Wikipedia, both for information extraction (e.g.,
understanding link semantics) and for things like vandalism detection. I am
sure that one could get a lot of inspiration from there, both on potential
applications and on technical hints on how to improve result quality.
Yes, definitely I would use them, thanks.
You will find that people are using many
different approaches in these
works. The good old ANN is still a relevant algorithm in practice, but
there are many other techniques, such as SVMs, Markov models, or random
forests, which have been found to work better than ANNs in many cases. Not
saying that a three-layer feed-forward ANN cannot do some jobs as well, but
I would not restrict to one ML approach if you have a whole arsenal of
algorithms available, most of them pre-implemented in libraries (the first
Google hit has a lot of relevant projects listed:
http://daoudclarke.github.io/machine%20learning%20in%20practice/2013/10/08/machine-learning-libraries/). I would certainly
recommend that you don't implement any of the standard ML algorithms from
scratch.
I use the backpropagation algorithm, and I use Octave for my personal ML
work, but for Wikipedia I use Python, for two main reasons: integration
with other Wikipedia-related tools like pywikibot, and the bad performance
of Octave and MATLAB on big data sets. I had to write those parts from
scratch since I couldn't find any related library in Python. Even
algorithms like BFGS aren't there (I could find one in SciPy, but I wasn't
sure it works correctly, and there was no documentation).
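For what it's worth, SciPy's BFGS can be sanity-checked in a few lines by minimizing a convex quadratic with a known minimum (a standalone sketch, not part of the bot):

```python
import numpy as np
from scipy.optimize import minimize

# Sanity check: minimize f(x) = (x0 - 3)^2 + (x1 + 1)^2,
# whose unique minimum is at (3, -1).
def f(x):
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2

def grad(x):
    return np.array([2.0 * (x[0] - 3.0), 2.0 * (x[1] + 1.0)])

result = minimize(f, x0=np.zeros(2), jac=grad, method="BFGS")
print(result.x)  # approximately [3, -1]
```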
In practice, the most challenging task for
successful ML is often feature
engineering: the question which features you use as an input to your
learning algorithm. This is far more important than the choice of
algorithm. Wikipedia in particular offers you so many relevant pieces of
information with each article that are not just mere keywords (links,
categories, in-links, ...) and it is not easy to decide which of these to
feed into your learner. This will be different for each task you solve
(subject classification is fundamentally different from vandalism
detection, and even different types of vandalism would require very
different techniques). You should pick hard or very large tasks to make
sure that the tweaking you need in each case takes less time than you would
need as a human to solve the task manually ;-)
Yes, feature engineering is the most important thing and it can be tricky,
but feature engineering in Wikidata is a lot easier (easier than in
Wikipedia, and Wikipedia itself is easier than other places). Anti-vandalism
bots are a lot easier in Wikidata than in Wikipedia: edits in Wikidata are
limited to certain kinds of actions (like removing a sitelink, etc.), which
is not the case in Wikipedia.
Anyway, it's an interesting field, and we
could certainly use some effort
to exploit the countless works in this field for Wikidata. But you should
be aware that this is no small challenge and that there is no universal
solution that will work well even for all the tasks that you have mentioned
in your email.
Of course, I spent lots of time studying this and I would be happy if
anyone who
knows about neural networks or AI can contribute too.
Best wishes,
Markus
On 07.03.2015 18:21, Magnus Manske wrote:
Congratulations for this bold step towards the
Singularity :-)
As for tasks, basically everything us mere humans do in the Wikidata
game:
https://tools.wmflabs.org/wikidata-game/
Some may require text parsing. Not sure how to get that working; haven't
spent much time with (artificial) neural nets in a while.
On Sat, Mar 7, 2015 at 12:36 PM Amir Ladsgroup <ladsgroup(a)gmail.com> wrote:
Some useful tasks that I'm looking for a way to do are:
* Anti-vandal bot (or how we can quantify an edit).
* Auto-labelling for humans (that's the next task).
* Add more :)
On Sat, Mar 7, 2015 at 3:54 PM, Amir Ladsgroup <ladsgroup(a)gmail.com> wrote:
Hey,
I spent the last few weeks working on this with the lights off [1] and now
it's ready to work!
Kian is a three-layer neural network with a flexible number of inputs and
outputs. So if we can parametrize a job, we can teach him easily and get
the job done.
For example, as the first job, we want to add P31:Q5 (human) to Wikidata
items based on the categories of articles in Wikipedia. The only thing we
need to do is get a list of items with P31:Q5 and a list of items that are
not humans (P31 exists but without Q5), then get a list of category links
in any wiki we want [2], and finally feed these files to Kian and let him
learn. Afterwards, if we give Kian other articles and their categories, he
classifies them as human, not human, or failed to determine.
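Not Kian's actual code, but a minimal sketch of the setup described above, assuming binary category-membership features and a hypothetical category vocabulary: a three-layer feed-forward network whose sigmoid output is thresholded into human / not human / failed to determine.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ThreeLayerNet:
    """Input layer -> one hidden layer -> sigmoid output in [0, 1]."""
    def __init__(self, n_inputs, n_hidden):
        self.W1 = rng.normal(scale=0.1, size=(n_hidden, n_inputs))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=0.1, size=n_hidden)
        self.b2 = 0.0

    def predict(self, x):
        h = sigmoid(self.W1 @ x + self.b1)
        return sigmoid(self.W2 @ h + self.b2)

# One binary input per category the article belongs to (hypothetical
# vocabulary; the real feature set comes from the category-link dump).
categories = ["1952 births", "Living people", "Rivers of Germany"]
net = ThreeLayerNet(n_inputs=len(categories), n_hidden=4)

def classify(x, threshold=0.9):
    p = net.predict(np.asarray(x, dtype=float))
    if p >= threshold:
        return "human"
    if p <= 1.0 - threshold:
        return "not human"
    return "failed to determine"

# Untrained small weights give p near 0.5, so the net abstains.
print(classify([1, 1, 0]))  # "failed to determine"
```

In a real run the weights would of course come from backpropagation over the P31:Q5 / not-Q5 training sets rather than random initialisation.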
As a test I gave him the categories of ckb wiki (a small wiki) and it
worked pretty well; now I'm creating the training set from the German
Wikipedia, and the next step will be the English Wikipedia. The number of
P31:Q5 claims will drastically increase this week.
I would love comments or ideas for tasks that Kian can do.
[1]: Because I love surprises
[2]: "select pp_value, cl_to from page_props join categorylinks
on pp_page = cl_from where pp_propname = 'wikibase_item';"
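A sketch (hypothetical rows, not Kian's code) of how the (item, category) pairs returned by a query like the one in [2] could be turned into fixed-length binary feature vectors for the network:

```python
from collections import defaultdict

# Hypothetical (pp_value, cl_to) rows as returned by the
# page_props/categorylinks query.
rows = [
    ("Q42", "1952_births"), ("Q42", "English_writers"),
    ("Q590010", "Individual_trees"),
]

item_cats = defaultdict(set)
for item, cat in rows:
    item_cats[item].add(cat)

vocab = sorted({cat for _, cat in rows})  # category vocabulary

def vectorize(item):
    """One binary slot per known category."""
    return [1 if c in item_cats[item] else 0 for c in vocab]

print(vectorize("Q42"))      # [1, 1, 0]
print(vectorize("Q590010"))  # [0, 0, 1]
```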
Best
--
Amir
_______________________________________________
Wikidata-l mailing list
Wikidata-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
--
Amir