Hey, I spent the last few weeks working on this with the lights off [1], and now it's ready to work!
Kian is a three-layer neural network with a flexible number of inputs and outputs. So if we can parametrize a job, we can teach him easily and get the job done.
For example, as the first job we want to add P31:5 (human) to Wikidata items based on the categories of the corresponding Wikipedia articles. The only thing we need to do is get a list of items with P31:5 and a list of items that are not humans (P31 exists but 5 is not in it), then get the list of category links in any wiki we want [2], and finally feed these files to Kian and let him learn. Afterwards, if we give Kian other articles and their categories, he classifies them as human, not human, or failed to determine. As a test I gave him the categories of ckb wiki (a small wiki) and it worked pretty well; now I'm creating the training set from German Wikipedia, and the next step will be English Wikipedia. The number of P31:5 statements will increase drastically this week.
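To make the setup concrete, here is a rough sketch of that data-preparation step in Python (the function name and input shapes here are illustrative assumptions, not Kian's actual code):

```python
def build_training_set(humans, not_humans, item_cats):
    """Turn category memberships into binary feature vectors.

    humans / not_humans: sets of Wikidata item IDs (e.g. from the P31:5
    list and the P31-but-not-5 list); item_cats: item ID -> set of
    category names (e.g. from the pp_value/cl_to query in [2]).
    Returns (feature_names, X, y) where y is 1 for human, 0 otherwise.
    """
    feature_names = sorted({c for cats in item_cats.values() for c in cats})
    index = {c: i for i, c in enumerate(feature_names)}
    X, y = [], []
    for item, label in [(i, 1) for i in humans] + [(i, 0) for i in not_humans]:
        row = [0] * len(feature_names)
        for c in item_cats.get(item, ()):
            row[index[c]] = 1
        X.append(row)
        y.append(label)
    return feature_names, X, y
```

From here, X and y can be fed to the network, and the same feature_names order is what you would use to vectorize new articles at prediction time.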
I would love comments or ideas for tasks that Kian can do.
[1]: Because I love surprises
[2]: "select pp_value, cl_to from page_props join categorylinks on pp_page = cl_from where pp_propname = 'wikibase_item';"

Best
Some useful tasks that I'm looking for a way to do are:
* Anti-vandal bot (or how we can quantify an edit).
* Auto-labeling for humans (that's the next task).
* Add more :)
Congratulations on this bold step towards the Singularity :-)
As for tasks, basically everything us mere humans do in the Wikidata game: https://tools.wmflabs.org/wikidata-game/
Some may require text parsing. Not sure how to get that working; haven't spent much time with (artificial) neural nets in a while.
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Hey,
Yay, neural nets are definitely fun! Am I right in understanding that this is software you created for the specific purpose of doing tasks in Wikidata?
Congratulations on this bold step towards the Singularity :-)
Don't worry, it'll be some time before AI can actually ingest Wikidata, see https://dl.dropboxusercontent.com/u/7313450/entropy/aitraining.png
Cheers
-- Jeroen De Dauw - http://www.bn2vs.com
Software craftsmanship advocate
Evil software architect at Wikimedia Germany
~=[,,_,,]:3
On 07.03.2015 at 18:49, Jeroen De Dauw wrote:
Don't worry, it'll be some time before AI can actually ingest Wikidata, see https://dl.dropboxusercontent.com/u/7313450/entropy/aitraining.png
ERR 0xBAADF00D
On Sat, Mar 7, 2015 at 9:19 PM, Jeroen De Dauw jeroendedauw@gmail.com wrote:
Hey,
Yay, neural nets are definitely fun! Am I right in understanding that this is software you created for the specific purpose of doing tasks in Wikidata?
Yes, in Wikidata and Wikipedia.
This is the result for German Wikipedia: I ran the bot for German and wanted to add P31:5, but it seems more than 90% of Wikidata items already have a P31 statement (how?) and there was nothing I could do, so I got the list of articles in German Wikipedia that don't have an item in Wikidata. There were 16K articles, and the bot's output for each of them is here: https://tools.wmflabs.org/dexbot/kian_res2.txt. If you plot it, you get this: https://tools.wmflabs.org/dexbot/kian2.png. When the score is below 0.50, it is obvious that they are not human. Between 0.50 and 0.61 there are 78 articles for which the bot can't determine whether they are human [1], and articles scoring above 0.61 are definitely human. I used 0.62 just to be sure and created 3600 items with P31:5 in them.
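In code, these cut-offs amount to something like the following sketch (the function and argument names are mine, not the bot's):

```python
def classify(score, not_human_max=0.50, human_min=0.62):
    """Bucket a Kian score using the cut-offs from the German run:
    below 0.50 -> not human, 0.62 and above -> human ("0.62 just to
    be sure"), anything in between -> failed to determine."""
    if score < not_human_max:
        return "not human"
    if score >= human_min:
        return "human"
    return "failed to determine"
```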
Imagine if I do something like that for English Wikipedia.
[1]: They are probably articles about a cat or a tree with human categories on them.
Best
On Sun, Mar 8, 2015 at 7:34 AM, Amir Ladsgroup ladsgroup@gmail.com wrote:
This is the result for German Wikipedia: ... so I got the list of articles in German Wikipedia that don't have an item in Wikidata. There were 16K articles ... When the score is below 0.50, it is obvious that they are not human. Between 0.50 and 0.61 there are 78 articles for which the bot can't determine whether they are human [1], and articles scoring above 0.61 are definitely human. I used 0.62 just to be sure and created 3600 items with P31:5 in them.
"Definitely human" in this context means that you did 100% verification of the 3600 items and one (or more?) human(s) agreed with the bot's judgement in these cases? Or that you validated a statistically significant sample of the 3600? Or something else?
Tom
Hey,
On Mon, Mar 9, 2015 at 5:50 AM, Tom Morris tfmorris@gmail.com wrote:
"Definitely human" in this context means that you did 100% verification of the 3600 items and one (or more?) human(s) agreed with the bot's judgement in these cases? Or that you validated a statistically significant sample of the 3600? Or something else?
I meant Kian classified them as "definitely human". I can't check all of them, but their names (https://tools.wmflabs.org/dexbot/kian_res_de.txt) seem to be OK. Please take a look and examine the list any way you want.
Best
Result for English Wikipedia (6366 articles classified as human) https://tools.wmflabs.org/dexbot/kian_res_en.txt
Hi Amir,
On 9-3-2015 at 22:40, Amir Ladsgroup wrote:
Result for English Wikipedia (6366 articles classified as human) https://tools.wmflabs.org/dexbot/kian_res_en.txt
Sounds like fun! Can you run it on the Dutch Wikipedia too? At https://tools.wmflabs.org/multichill/queries/wikidata/noclaims_nlwiki.txt I have a list of items without claims (linking them to other items).
Maarten
I'm ready for it! All existing humans on nlwiki have a gender now, so it's easy to review this batch. Bring it on.
Sure, tonight it will be done.
Best
Sorry for the late answer, I got busy in the real world. This is the result for the unconnected pages of Dutch Wikipedia: http://tools.wmflabs.org/dexbot/kian_res_nl.txt. Please check it and tell me if any of them are not human. I'm producing results for the empty items related to Dutch Wikipedia.
I've corrected two lists (Lijst van voorzitters van de SER and Lijst van voorzitters van de WRR) and a music group (Viper (Belgische danceact)). I'll play the gender game over the next few days to check them.
Greetings,
Sjoerd de Bruin sjoerddebruin@me.com
Hm, the Wikidata Game is really slow. Magnus, if you read this: do you know what's going on? I'm playing the gender game with only nlwiki articles, but it never loads. It was working yesterday with just 50 items, so it should work now, imo.
Greetings,
Sjoerd de Bruin sjoerddebruin@me.com
I'm writing a parser so I can feed gender classification to Kian. It'll be done soon and you can use it :)
Now that the gender game is working again, I noticed there were a lot of issues with the following category: https://nl.wikipedia.org/wiki/Categorie:Danceact
As you can see, it's about musical groups, but they were all marked as human.
Greetings,
Sjoerd de Bruin sjoerddebruin@me.com
Thanks, Sjoerddebruin.
I'm working on this so I can build a system that finds and reports possible mistakes in Wikidata, whether made by Dexbot or others. It gets more precise as time goes by.
Best
OK, I have some news:
1. Today I rewrote some parts of Kian and now it automatically chooses the regularization parameter (lambda), so predictions are more accurate. I wanted to push the changes to GitHub, but it seems my SSH has issues; they'll be there soon.
2. (Important) I wrote code that can find possible mistakes in Wikidata based on Kian. The code will be on GitHub soon. Check out this link: http://tools.wmflabs.org/dexbot/possible_mistakes_fr.txt. It's the result of comparing French Wikipedia against Wikidata, e.g. this line:
Q2994923: 1 (d), 0.257480420229 (w) [0, 0, 1, 2, 0]
1 (d) means Wikidata thinks it's a human
0.25... (w) means French Wikipedia thinks it's not a human (with 74.3% certainty)
And if you check the link, you can see it's a mistake in Wikidata. Please check the other results and fix them.
Tell me if you want this test to be run for another language too.
3. I used Kian to import unconnected pages from French Wikipedia and created about 1900 items. The result is here: http://tools.wmflabs.org/dexbot/kian_res_fr.txt. Please check whether anything in this list is not human and tell me, and I'll run some error analysis.
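For what it's worth, a line in that file can be checked mechanically. Here is a minimal sketch of a parser (the format is inferred from the example in point 2; this is illustrative, not the actual script):

```python
import re

# Expected shape: "Q2994923: 1 (d), 0.257480420229 (w) [0, 0, 1, 2, 0]"
LINE = re.compile(r"^(Q\d+): ([01]) \(d\), ([\d.]+) \(w\)")

def possible_mistake(line, threshold=0.5):
    """Return (item, True) when the Wikidata label (d) and the
    wiki-based score (w) land on opposite sides of the threshold,
    or None if the line doesn't match the expected format."""
    m = LINE.match(line)
    if m is None:
        return None
    item, d, w = m.group(1), int(m.group(2)), float(m.group(3))
    return item, (d == 1) != (w >= threshold)
```

On the example line above, d = 1 (Wikidata says human) while w is about 0.26 (the wiki says not human), so it gets flagged.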
Best
On Mon, Mar 16, 2015 at 9:50 PM, Amir Ladsgroup ladsgroup@gmail.com wrote:
Thanks Sjoerddebruin,
I'm working on this so I can write a system to find possible mistakes and it will find and report mistakes made by Dexbot or others. It works more precise as the time goes by.
Best
On Sun, Mar 15, 2015 at 8:51 PM Sjoerd de Bruin sjoerddebruin@me.com wrote:
Now the gender game is working again, I encountered there were a lot of issues with the following category: https://nl. wikipedia.org/wiki/Categorie:Danceact
As you can see, it's about musical groups but they all were marked as human.
Greetings,
Sjoerd de Bruin sjoerddebruin@me.com
Op 14 mrt. 2015, om 14:18 heeft Amir Ladsgroup ladsgroup@gmail.com het volgende geschreven:
I'm writing a parser so I can feed gender classification to Kian, It'll be done soon and you can use it :)
On Sat, Mar 14, 2015 at 12:53 PM Sjoerd de Bruin sjoerddebruin@me.com wrote:
Hm, the Wikidata Game is really slow. Magnus, if you read this: do you know what's going on? I play the gender game with only nlwiki articles, but it never loads. It was working yesterday with just 50 items, so it should work now imo.
Greetings,
Sjoerd de Bruin sjoerddebruin@me.com
Op 14 mrt. 2015, om 09:39 heeft Sjoerd de Bruin sjoerddebruin@me.com het volgende geschreven:
I've corrected two lists (Lijst van voorzitters van de SER and Lijst van voorzitters van de WRR) and a music group (Viper (Belgische danceact)). Will play the gender game the next few days to check them.
Greetings,
Sjoerd de Bruin sjoerddebruin@me.com
On 14 Mar 2015, at 00:51, Amir Ladsgroup ladsgroup@gmail.com wrote:
Sorry for the late answer; I got busy in the real world. This is the result for unconnected pages of Dutch Wikipedia: http://tools.wmflabs.org/dexbot/kian_res_nl.txt. Please check and tell me if any of them are not human. I'm producing results for empty items related to Dutch Wikipedia.
On Thu, Mar 12, 2015 at 2:58 PM Amir Ladsgroup ladsgroup@gmail.com wrote:
Sure, tonight it will be done.
Best
On Thu, Mar 12, 2015 at 2:08 AM, Sjoerd de Bruin sjoerddebruin@me.com wrote:
I'm ready for it! All existing humans on nlwiki have a gender now, so it's easy to review this batch. Bring it on.
On 11 Mar 2015, at 22:14, Maarten Dammers maarten@mdammers.nl wrote:
Hi Amir,
Amir Ladsgroup wrote on 9-3-2015 at 22:40:
Result for English Wikipedia (6366 articles classified as human) https://tools.wmflabs.org/dexbot/kian_res_en.txt
Sounds like fun! Can you run it on the Dutch Wikipedia too? On https://tools.wmflabs.org/multichill/queries/wikidata/noclaims_nlwiki.txt I have a list of items without claims (linking them to other items).
Maarten
_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
-- Amir
Great! Unfortunately, some files seem to have been published with the wrong character encoding. E.g. the first name shows up as "Ã‰chÃ©crate" in my browsers.
Try to download it, or change the character encoding to utf-8 or unicode.
And yes it's based on dumps. :)
One mistake I just found via the report: https://www.wikidata.org/wiki/Q2963097. The article in French Wikipedia is about a French type of cheese, but it is connected to an article in Russian Wikipedia about a French playwright.
Best
Maybe his plays are cheesy?
Probably :P
The data are based on dumps, aren't they? Wikidata hasn't considered Q73823 https://www.wikidata.org/wiki/Q73823 a human since 21 Feb.
On 07.03.2015 18:21, Magnus Manske wrote:
Congratulations for this bold step towards the Singularity :-)
Lol. The word "neural" in the name of the algorithm is infinitely more attractive and inspiring than something abstract like "Support Vector Machine", isn't it? -- although we know that both approaches are much more similar to each other than to any biological neural system. ;-) However, since this is a general mailing list, it may be fair to clarify that this is just a gradient-descent based optimization procedure that we are dealing with here, and that it has nothing to do with a "thinking" general AI. I know that you know this, but not all of our readers may ...
Cheers,
Markus
In technical terms, a machine that uses forward and backward propagation to make an approximate prediction [1] is called a neural network, whether I agree or not.
BTW: I use BFGS, not gradient descent.
[1]: https://en.wikipedia.org/wiki/Artificial_neural_network
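For what it's worth, SciPy does expose BFGS through `scipy.optimize.minimize`; whether it fit Kian's needs at the time is another question. A minimal sketch on a toy least-squares problem (not Kian's actual cost function):

```python
import numpy as np
from scipy.optimize import minimize

# Toy problem: fit w in y = X @ w by least squares.
# The data are generated from w_true = [2, -3], so BFGS should recover it.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
w_true = np.array([2.0, -3.0])
y = X @ w_true

def cost(w):
    # 0.5 * sum of squared residuals
    r = X @ w - y
    return 0.5 * r @ r

def grad(w):
    # analytic gradient of the cost above
    return X.T @ (X @ w - y)

res = minimize(cost, x0=np.zeros(2), jac=grad, method="BFGS")
print(res.x)  # should be close to [2, -3]
```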
Hi Amir,
In spite of all due enthusiasm, please evaluate your results (with humans!) before making automated edits. In fact, I would contradict Magnus here and say that such an approach would be best suited to provide meaningful (pre-filtered) *input* to people who play a Wikidata game, rather than bypassing the game (and humans) altogether. The expected error rates are quite high for such an approach, but it can still save a lot of work for humans.
As for the next steps, I would suggest that you have a look at the works that others have done already. Try Google Scholar:
https://scholar.google.com/scholar?q=machine+learning+wikipedia
As you can see, there are countless works on using machine learning techniques on Wikipedia, both for information extraction (e.g., understanding link semantics) and for things like vandalism detection. I am sure that one could get a lot of inspiration from there, both on potential applications and on technical hints on how to improve result quality.
You will find that people are using many different approaches in these works. The good old ANN is still a relevant algorithm in practice, but there are many other techniques, such as SVMs, Markov models, or random forests, which have been found to work better than ANNs in many cases. Not saying that a three-layer feed-forward ANN cannot do some jobs as well, but I would not restrict myself to one ML approach if you have a whole arsenal of algorithms available, most of them pre-implemented in libraries (the first Google hit has a lot of relevant projects listed: http://daoudclarke.github.io/machine%20learning%20in%20practice/2013/10/08/m...). I would certainly recommend that you don't implement any of the standard ML algorithms from scratch.
In practice, the most challenging task for successful ML is often feature engineering: the question of which features you use as input to your learning algorithm. This is far more important than the choice of algorithm. Wikipedia in particular offers you so many relevant pieces of information with each article that are not just mere keywords (links, categories, in-links, ...) and it is not easy to decide which of these to feed into your learner. This will be different for each task you solve (subject classification is fundamentally different from vandalism detection, and even different types of vandalism would require very different techniques). You should pick hard or very large tasks to make sure that the tweaking you need in each case takes less time than you would need as a human to solve the task manually ;-)
Anyway, it's an interesting field, and we could certainly use some effort to exploit the countless works in this field for Wikidata. But you should be aware that this is no small challenge and that there is no universal solution that will work well even for all the tasks that you have mentioned in your email.
Best wishes,
Markus
On 07.03.2015 18:21, Magnus Manske wrote:
Congratulations for this bold step towards the Singularity :-)
As for tasks, basically everything us mere humans do in the Wikidata game: https://tools.wmflabs.org/wikidata-game/
Some may require text parsing. Not sure how to get that working; haven't spent much time with (artificial) neural nets in a while.
Hey Markus, Thanks for your insight :)
On Sat, Mar 7, 2015 at 9:52 PM, Markus Krötzsch < markus@semantic-mediawiki.org> wrote:
Hi Amir,
In spite of all due enthusiasm, please evaluate your results (with humans!) before making automated edits. In fact, I would contradict Magnus here and say that such an approach would be best suited to provide meaningful (pre-filtered) *input* to people who play a Wikidata game, rather than bypassing the game (and humans) altogether. The expected error rates are quite high for such an approach, but it can still save a lot of work for humans.
There is a "certainty factor", and by using it Kian can save a lot of work without making such errors.
As for the next steps, I would suggest that you have a look at the works that others have done already. Try Google Scholar:
https://scholar.google.com/scholar?q=machine+learning+wikipedia
As you can see, there are countless works on using machine learning techniques on Wikipedia, both for information extraction (e.g., understanding link semantics) and for things like vandalism detection. I am sure that one could get a lot of inspiration from there, both on potential applications and on technical hints on how to improve result quality.
Yes, definitely I would use them, thanks.
You will find that people are using many different approaches in these works. The good old ANN is still a relevant algorithm in practice, but there are many other techniques, such as SVMs, Markov models, or random forests, which have been found to work better than ANNs in many cases. Not saying that a three-layer feed-forward ANN cannot do some jobs as well, but I would not restrict myself to one ML approach if you have a whole arsenal of algorithms available, most of them pre-implemented in libraries (the first Google hit has a lot of relevant projects listed: http://daoudclarke.github.io/machine%20learning%20in%20practice/2013/10/08/machine-learning-libraries/). I would certainly recommend that you don't implement any of the standard ML algorithms from scratch.
I use the backpropagation algorithm, and I use Octave for my personal ML work, but for Wikipedia I use Python, for two main reasons: integration with other Wikipedia-related tools like pywikibot, and the poor performance of Octave and MATLAB on big data sets. I had to write those parts from scratch, since I couldn't find any suitable library in Python. Even algorithms like BFGS aren't there (I could find it in SciPy, but I wasn't sure it works correctly, and there was no documentation).
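As an illustration of what writing that part from scratch involves, here is a minimal three-layer feed-forward network (input, hidden, output) trained with backpropagation in NumPy, on XOR. This is a sketch, not Kian's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: a tiny problem a network with one hidden layer can solve.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Three layers: input (2) -> hidden (4) -> output (1).
W1 = rng.normal(size=(2, 4))
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1))
b2 = np.zeros(1)
lr = 1.0  # learning rate

def forward(X):
    h = sigmoid(X @ W1 + b1)     # hidden activations
    out = sigmoid(h @ W2 + b2)   # output "probability"
    return h, out

_, out0 = forward(X)
loss0 = float(np.mean((out0 - y) ** 2))

for _ in range(5000):
    h, out = forward(X)
    # Backward propagation of the squared-error loss through both layers.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

_, out = forward(X)
loss = float(np.mean((out - y) ** 2))
print(loss0, "->", loss)
```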
In practice, the most challenging task for successful ML is often feature engineering: the question of which features you use as input to your learning algorithm. This is far more important than the choice of algorithm. Wikipedia in particular offers you so many relevant pieces of information with each article that are not just mere keywords (links, categories, in-links, ...) and it is not easy to decide which of these to feed into your learner. This will be different for each task you solve (subject classification is fundamentally different from vandalism detection, and even different types of vandalism would require very different techniques). You should pick hard or very large tasks to make sure that the tweaking you need in each case takes less time than you would need as a human to solve the task manually ;-)
Yes, feature engineering is the most important thing and it can be tricky, but feature engineering for Wikidata is a lot easier (it's easier than for Wikipedia, and Wikipedia itself is easier than other places). Anti-vandalism bots are a lot easier in Wikidata than in Wikipedia: edits in Wikidata are limited to certain kinds (like removing a sitelink, etc.), but that's not the case in Wikipedia.
Anyway, it's an interesting field, and we could certainly use some effort to exploit the countless works in this field for Wikidata. But you should be aware that this is no small challenge and that there is no universal solution that will work well even for all the tasks that you have mentioned in your email.
Of course. I spent lots of time studying this, and I would be happy if anyone who knows about neural networks or AI contributed too.
Best wishes,
Markus
Amir,
What is the false positive rate of your algorithm when dealing with fictitious humans and (non-fictitious) non-human organisms? That is, how often does your program classify such non-humans as humans?
Regarding the latter, note that items about individual dogs, elephants, chimpanzees and even trees can use properties that are otherwise extremely skewed towards humans. For example, Prometheus (Q590010) [1], an extremely old tree, has claims for *date of birth* (P569), *date of death* (P570), even *killed by* (P157). Non-human animals can also have kinship claims (e.g. *mother*, *brother, child*), among other properties typically used on humans.
Best, Eric
https://www.wikidata.org/wiki/User:Emw
1. Prometheus. https://www.wikidata.org/wiki/Q590010
On Sat, Mar 7, 2015 at 10:25 PM, Emw emw.wiki@gmail.com wrote:
Amir,
What is the false positive rate of your algorithm when dealing with fictitious humans and (non-fictitious) non-human organisms? That is, how often does your program classify such non-humans as humans?
I'll give you an exact number for German Wikipedia in several hours.
Regarding the latter, note that items about individual dogs, elephants, chimpanzees and even trees can use properties that are otherwise extremely skewed towards humans. For example, Prometheus (Q590010) [1], an extremely old tree, has claims for *date of birth* (P569), *date of death* (P570), even *killed by* (P157). Non-human animals can also have kinship claims (e.g. *mother*, *brother, child*), among other properties typically used on humans.
The trick to avoid such errors is to give a big negative score for having a group E or group D category.
Feature engineering for this task is a little complicated. First I group the categories of a wiki by the fraction of their members that are human articles: if more than 80% of a category's members are known to be humans, it's a group A category, and so on (group D = 0%). An article can then be parameterized by the number of categories it has in each group, e.g. an article about a human usually looks like 5,3,2,0,0 and an article about a tree can look like 1,0,0,6,7. Having one or several group A categories alongside several group D categories prevents the bot from making such false statements. How is this possible, and how can a bot do it? Because of the huge training set we already have and neural network algorithms.
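The grouping described above can be sketched as follows. Only the 80% (group A) and 0% cut-offs come from the description; the middle thresholds, the helper names, and the toy data are assumptions for illustration:

```python
# Illustrative sketch of the category-grouping idea (not Kian's actual code).
# Thresholds for B, C, D are assumptions; A (>= 80% humans) and the
# all-non-human bottom group come from the description above.
def category_group(members, humans, nonhumans):
    """Classify a category A-E by the fraction of its known members that are human."""
    known = [m for m in members if m in humans or m in nonhumans]
    if not known:
        return None  # no training signal for this category
    share = sum(m in humans for m in known) / len(known)
    if share >= 0.8:
        return "A"
    if share >= 0.5:
        return "B"
    if share > 0.2:
        return "C"
    if share > 0.0:
        return "D"
    return "E"

def featurize(article_categories, groups):
    """Count the article's categories in each group -> the five-number vector."""
    counts = {g: 0 for g in "ABCDE"}
    for cat in article_categories:
        g = groups.get(cat)
        if g:
            counts[g] += 1
    return [counts[g] for g in "ABCDE"]

# Hypothetical toy data:
humans = {"Q1", "Q2", "Q3"}
nonhumans = {"Q4", "Q5"}
cats = {
    "1920 births": ["Q1", "Q2", "Q3"],  # all human
    "Trees": ["Q4", "Q5"],              # no humans
    "France": ["Q1", "Q4"],             # mixed
}
groups = {c: category_group(m, humans, nonhumans) for c, m in cats.items()}
print(featurize(["1920 births", "France"], groups))
```

These per-group counts are exactly the kind of vector (e.g. 5,3,2,0,0) that gets fed into the network as input features.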
Best
Best, Eric
https://www.wikidata.org/wiki/User:Emw
[1] Prometheus. https://www.wikidata.org/wiki/Q590010
On Sat, Mar 7, 2015 at 1:44 PM, Amir Ladsgroup ladsgroup@gmail.com wrote:
Hey Markus, Thanks for your insight :)
On Sat, Mar 7, 2015 at 9:52 PM, Markus Krötzsch markus@semantic-mediawiki.org wrote:
Hi Amir,
In spite of all due enthusiasm, please evaluate your results (with humans!) before making automated edits. In fact, I would contradict Magnus here and say that such an approach would best be suited to provide meaningful (pre-filtered) *input* to people who play a Wikidata game, rather than bypassing the game (and humans) altogether. The expected error rates are quite high for such an approach, but it can still save a lot of work for humans.
There is a "certainty factor", and by using it Kian can save a lot of work without making such errors.
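A minimal sketch of how such a certainty factor could gate the output, matching the three-way human / not human / failed-to-determine classification described earlier (the thresholds here are illustrative assumptions, not Kian's actual values):

```python
# Illustrative sketch: gate the network's output with a certainty factor,
# so the bot only acts on confident scores. Thresholds are assumptions.

def classify(score, high=0.9, low=0.1):
    """Map a network output in [0, 1] to a three-way decision."""
    if score >= high:
        return 'human'
    if score <= low:
        return 'not human'
    return 'failed to determine'  # uncertain: leave for humans to review
```

Raising high and lowering low trades coverage for precision, which is how uncertain cases could be routed to a Wikidata game instead of being edited automatically.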
As for the next steps, I would suggest that you have a look at the works that others have done already. Try Google Scholar:
https://scholar.google.com/scholar?q=machine+learning+wikipedia
As you can see, there are countless works on using machine learning techniques on Wikipedia, both for information extraction (e.g., understanding link semantics) and for things like vandalism detection. I am sure that one could get a lot of inspiration from there, both on potential applications and on technical hints on how to improve result quality.
Yes, I will definitely use them, thanks.
You will find that people are using many different approaches in these works. The good old ANN is still a relevant algorithm in practice, but there are many other techniques, such as SVMs, Markov models, or random forests, which have been found to work better than ANNs in many cases. Not saying that a three-layer feed-forward ANN cannot do some jobs as well, but I would not restrict myself to one ML approach when there is a whole arsenal of algorithms available, most of them pre-implemented in libraries (the first Google hit has a lot of relevant projects listed: http://daoudclarke.github.io/machine%20learning%20in%20practice/2013/10/08/machine-learning-libraries/). I would certainly recommend that you don't implement any of the standard ML algorithms from scratch.
I use the backpropagation algorithm, and for my personal ML work I use Octave, but for Wikipedia I use Python (for two main reasons: integration with other Wikipedia-related tools like Pywikibot, and the bad performance of Octave and MATLAB on big data sets), and I had to write those parts from scratch since I couldn't find a suitable library in Python. Even algorithms like BFGS aren't there (I could find one in SciPy, but I wasn't sure it works correctly, since there was no documentation).
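For reference, SciPy does expose BFGS through scipy.optimize.minimize; a minimal smoke test on a toy convex cost function (the cost function itself is just an illustration, not part of Kian) could look like:

```python
# Quick check of SciPy's BFGS: minimize a simple convex quadratic whose
# minimum is known to be at (1, 2), and compare the result against it.
import numpy as np
from scipy.optimize import minimize

def cost(w):
    # toy quadratic with its minimum at w = (1.0, 2.0)
    return (w[0] - 1.0) ** 2 + (w[1] - 2.0) ** 2

result = minimize(cost, x0=np.zeros(2), method='BFGS')
print(result.x)  # should be close to [1.0, 2.0]
```

For a real network one would pass the flattened weight vector as w and the regularized training loss (with its gradient, via the jac argument) as the cost.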
In practice, the most challenging task for successful ML is often feature engineering: the question which features you use as an input to your learning algorithm. This is far more important than the choice of algorithm. Wikipedia in particular offers you so many relevant pieces of information with each article that are not just mere keywords (links, categories, in-links, ...) and it is not easy to decide which of these to feed into your learner. This will be different for each task you solve (subject classification is fundamentally different from vandalism detection, and even different types of vandalism would require very different techniques). You should pick hard or very large tasks to make sure that the tweaking you need in each case takes less time than you would need as a human to solve the task manually ;-)
Yes, feature engineering is the most important thing, and it can be tricky, but feature engineering for Wikidata is a lot easier (easier than for Wikipedia, and Wikipedia itself is easier than other places). Anti-vandalism bots are a lot easier for Wikidata than for Wikipedia: edits in Wikidata are limited to certain kinds (like removing a sitelink, etc.), which is not the case in Wikipedia.
Anyway, it's an interesting field, and we could certainly use some effort to exploit the countless works in this field for Wikidata. But you should be aware that this is no small challenge and that there is no universal solution that will work well even for all the tasks that you have mentioned in your email.
Of course. I've spent lots of time studying this, and I would be happy if anyone who knows about neural networks or AI contributed too.
Best wishes,
Markus
:)
On Sat, Mar 7, 2015 at 8:51 PM, Magnus Manske magnusmanske@googlemail.com wrote:
Congratulations for this bold step towards the Singularity :-)
As for tasks, basically everything us mere humans do in the Wikidata game: https://tools.wmflabs.org/wikidata-game/
Some may require text parsing. Not sure how to get that working; haven't spent much time with (artificial) neural nets in a while.
Gender is easy; occupation, country of citizenship, DoB and DoD are also possible but a little bit tricky. I should think more about the other games.
A question: How do you recognize that two items are candidates for merging? Maybe it would be easy to start merging them.
Best
On 07.03.2015 20:27, Amir Ladsgroup wrote:
A question: How do you recognize that two items are candidates for merging? Maybe it would be easy to start merging them.
I would think that merge candidates should 1) be similar and 2) not conflict (or have limited or easily resolved conflicts). The exact measure for similarity you'd want to use for this approach is probably the crucial part.
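One way to sketch such a similarity measure (the item structure, the choice of Jaccard overlap, and the equal weighting below are assumptions for illustration, not an existing tool):

```python
# Sketch of a similarity score for merge candidates: Jaccard overlap of
# labels and of (property, value) claim pairs, combined with equal weight.

def jaccard(a, b):
    """|A intersect B| / |A union B| for two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def merge_similarity(item1, item2):
    """Combine label overlap and claim overlap into one score in [0, 1]."""
    label_sim = jaccard(set(item1['labels']), set(item2['labels']))
    claim_sim = jaccard(set(item1['claims']), set(item2['claims']))
    return 0.5 * label_sim + 0.5 * claim_sim  # equal weights: an assumption
```

Pairs scoring above some threshold, and without conflicting claims (point 2 above), could then be queued for a human to confirm rather than merged automatically.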
Hoi, For me this is a dream come true. I very much do NOT want to make another million edits by hand. That is not to say that I would lose interest; I just do not want to do this manually. Thanks, GerardM
Sounds promising! It'd be good to have the code publicly viewable.
2015-03-08 12:56 GMT+01:00 Ricordisamoa ricordisamoa@openmailbox.org:
Sounds promising! It'd be good to have the code publicly viewable.
+1
Cristian
I'm cleaning up and PEP8-ifying the code to publish it.
I just published the code https://github.com/Ladsgroup/Kian
I'd really appreciate any comments on, or changes to, the code.