I've been practicing using the Wikidata and MediaWiki APIs today. Even though the Wikidata API is still being developed, using it was more pleasant than parsing the templates. The good news is I'll probably be able to reuse a lot of that code for other infoboxes that still need to be imported. The script is looking through 76,000 articles for new statements it can add using Wikidata's currently supported film properties, excluding the IMDb ID because that isn't included in the infobox. So far it has added 691 new statements to 68 movies. I forgot to add a counter for when it finds a movie that already had all of the information entered. It will definitely have some errors, but I scanned the results for the first 100 movies before I started importing them, and I think the value-add will be much greater than the number of errors.
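For readers who want to picture the bookkeeping described above, here is a minimal sketch. It assumes the infobox has already been parsed into a dictionary and the item's existing claims have already been fetched. The field-to-property mapping, the `plan_new_statements` and `summarize` helpers, and the sample data are all hypothetical and are not taken from the actual script. The sketch also keeps the counter for items that need nothing new.

```python
from collections import Counter

# Hypothetical mapping from infobox field names to property labels.
# The real script's field list is not shown in this thread.
FIELD_TO_PROPERTY = {
    "director": "director",
    "producer": "producer",
    "starring": "cast member",
}


def plan_new_statements(infobox, existing_claims):
    """Return (property, value) pairs present in the infobox but missing on the item."""
    planned = []
    for field, prop in FIELD_TO_PROPERTY.items():
        already_there = existing_claims.get(prop, set())
        for value in infobox.get(field, []):
            if value not in already_there:
                planned.append((prop, value))
    return planned


def summarize(items):
    """Count planned statements, items to change, and items already complete."""
    stats = Counter()
    for infobox, existing_claims in items:
        planned = plan_new_statements(infobox, existing_claims)
        if planned:
            stats["items_changed"] += 1
            stats["statements_added"] += len(planned)
        else:
            stats["items_already_complete"] += 1
    return stats


# Made-up sample data: one film missing two statements, one film already complete.
sample_items = [
    ({"director": ["A"], "starring": ["B", "C"]}, {"cast member": {"B"}}),
    ({"director": ["D"]}, {"director": {"D"}}),
]
stats = summarize(sample_items)
```

Keeping the planning step separate from the editing step also makes it easy to review the planned statements for a batch of films before anything is written.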
Here's the contributions from my IP address if you want to monitor it: http://www.wikidata.org/wiki/Special:Contributions/108.235.225.145
From: hale.michael.jr@live.com To: wikidata-l@lists.wikimedia.org Date: Tue, 2 Apr 2013 00:58:05 -0400 Subject: [Wikidata-l] Running "Infobox film" import script
_______________________________________________ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Hi Michael, I had blocked your IP before I saw this email. We have a bot policy (https://www.wikidata.org/wiki/Wikidata:Bot) that requires coders to get approval before they can run their scripts (see https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot).
Your script did look pretty cool though :)
-- Legoktm http://enwp.org/d:User:Legoktm
On Tue, Apr 2, 2013 at 12:01 AM, Michael Hale hale.michael.jr@live.com wrote:
Here are the contributions from my IP address if you want to monitor it: http://www.wikidata.org/wiki/Special:Contributions/108.235.225.145
Oh, I thought bot approval was just for tasks that have an indefinite running time like cleaning vandalism, etc. I added a user-agent string to identify my IP address, but I hadn't run into problems on Wikipedia without one before. Am I supposed to re-apply each time I change a script?
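As a rough illustration of the user-agent point, the sketch below builds (but does not send) an API request that carries an identifying `User-Agent` header. The user-agent text, the contact address, and the `build_request` helper are made up for the example, and nothing here reflects how the actual script sets its headers.

```python
import urllib.parse
import urllib.request

API_URL = "https://www.wikidata.org/w/api.php"

# Hypothetical identifying string; a real one would name the tool and its operator.
USER_AGENT = "ExampleInfoboxImporter/0.1 (contact: operator@example.org)"


def build_request(params):
    """Build a GET request for the API with an identifying User-Agent header."""
    query = urllib.parse.urlencode(params)
    return urllib.request.Request(
        f"{API_URL}?{query}",
        headers={"User-Agent": USER_AGENT},
    )


# The request is only constructed here, not sent.
request = build_request({"action": "wbgetentities", "ids": "Q42", "format": "json"})
```

An identifying header helps site operators contact the person running a script, but as the replies in this thread make clear, it is not a substitute for a bot account and approval.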
From: legoktm.wikipedia@gmail.com Date: Tue, 2 Apr 2013 00:21:44 -0500 To: wikidata-l@lists.wikimedia.org Subject: Re: [Wikidata-l] Running "Infobox film" import script
Michael Hale, 02/04/2013 07:31:
Oh, I thought bot approval was just for tasks that have an indefinite running time like cleaning vandalism, etc. I added a user-agent string to identify my IP address, but I hadn't run into problems on Wikipedia without one before. Am I supposed to re-apply each time I change a script?
No, you're not supposed to use an IP at all: bots should run on bot accounts. https://meta.wikimedia.org/wiki/Bot
Nemo
Bot approval is needed for any task that's going to be editing at high speeds or is (semi-)automated, which yours was. You can just request approval for "adding claims based off of Wikipedia infoboxes" and say you're going to start with films, and that will be good enough, so you won't need to request approval again.
-- Legoktm http://enwp.org/d:User:Legoktm
On Tue, Apr 2, 2013 at 12:31 AM, Michael Hale hale.michael.jr@live.com wrote:
Oh, I thought bot approval was just for tasks that have an indefinite running time like cleaning vandalism, etc. I added a user-agent string to identify my IP address, but I hadn't run into problems on Wikipedia without one before. Am I supposed to re-apply each time I change a script?
Ok, I created a request.
From: legoktm.wikipedia@gmail.com Date: Tue, 2 Apr 2013 00:57:36 -0500 To: wikidata-l@lists.wikimedia.org Subject: Re: [Wikidata-l] Running "Infobox film" import script
Thanks for the guidance. The bot is now working properly at the preferred rate of no more than 5 edits per minute.
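One simple way to stay at or below 5 edits per minute is to leave at least 12 seconds between consecutive edits. The sketch below is only an illustration of that idea. The `EditThrottle` class and its parameters are hypothetical and are not how the bot in this thread is actually implemented.

```python
import time


class EditThrottle:
    """Keep consecutive edits at least 60 / max_per_minute seconds apart."""

    def __init__(self, max_per_minute=5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock  # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last_edit = None

    def wait(self):
        """Block until the minimum interval since the previous edit has passed."""
        now = self._clock()
        if self._last_edit is not None:
            remaining = self.min_interval - (now - self._last_edit)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_edit = now
```

A script would create one `EditThrottle()` and call its `wait()` method immediately before each edit request. Even spacing is stricter than a sliding-window limit, which makes it a conservative choice when the preferred rate is a ceiling.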
From: hale.michael.jr@live.com To: wikidata-l@lists.wikimedia.org Date: Tue, 2 Apr 2013 02:01:47 -0400 Subject: Re: [Wikidata-l] Running "Infobox film" import script
Hi Michael,
Is the code for your bot publicly available somewhere?
//Ed
http://www.wikidata.org/wiki/User:Wakebrdkid%27s_bot/code
Date: Tue, 2 Apr 2013 05:10:22 -0400 From: ehs@pobox.com To: wikidata-l@lists.wikimedia.org Subject: Re: [Wikidata-l] Running "Infobox film" import script
It's a bit sloppy in some areas, but I'll improve it as I need it to do more things.
From: hale.michael.jr@live.com To: wikidata-l@lists.wikimedia.org Date: Tue, 2 Apr 2013 05:25:14 -0400 Subject: Re: [Wikidata-l] Running "Infobox film" import script
On Tue, Apr 2, 2013 at 5:25 AM, Michael Hale hale.michael.jr@live.com wrote:
I got a 400 when fetching this (in Chrome).
//Ed
I can't reproduce the error here (using Chrome too). Anyone else?
Date: Tue, 2 Apr 2013 14:44:24 -0400 From: ehs@pobox.com To: wikidata-l@lists.wikimedia.org Subject: Re: [Wikidata-l] Running "Infobox film" import script
The original URL is a 400 (looks like it's double URL encoded), but the version you quoted works fine. If you end up with %2527 in your browser bar, correct it to just %27 (apostrophes in the URL aren't such a great idea to start with though).
Tom
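The double encoding Tom describes can be reproduced with Python's standard `urllib.parse` module. This is only a demonstration of the effect; the choice of `safe` characters is an assumption made for the example, and the mail client's actual behavior is not known beyond what is reported in this thread.

```python
from urllib.parse import quote, unquote

# Page title from the link discussed in this thread.
title = "User:Wakebrdkid's_bot/code"

# Encoding once turns the apostrophe into %27.
encoded_once = quote(title, safe="/:_")

# Encoding the already-encoded text again turns the "%" itself into %25,
# so %27 becomes %2527, which the server then rejects.
encoded_twice = quote(encoded_once, safe="/:_")

# Decoding a single level recovers the working form of the link.
repaired = unquote(encoded_twice)
```

Here `encoded_once` ends in `Wakebrdkid%27s_bot/code` and `encoded_twice` ends in `Wakebrdkid%2527s_bot/code`, matching the two forms mentioned above.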
Good eye. Outlook webmail seems to double-encode a link if I paste it while using Chrome, but not while using IE. I spent too much time trying to fix things like that, only to realize that it would be bad for their brand identity if all such issues were ever fixed; that is why I had to get out of the corporations before it was too late for my health and sanity. I wouldn't be able to sleep if I continued to spend my time putting wind in a sail that accomplishes so little per dollar compared to Wikimedia.
Date: Tue, 2 Apr 2013 17:02:02 -0400 From: tfmorris@gmail.com To: wikidata-l@lists.wikimedia.org Subject: Re: [Wikidata-l] Running "Infobox film" import script
On Tue, Apr 2, 2013 at 12:58 AM, Michael Hale hale.michael.jr@live.com wrote:
It will definitely have some errors, but I scanned the results for the first 100 movies before I started importing them, and I think the value-add will be much greater than the number of errors.
Does Wikidata have a quality goal or error rate threshold? For example, Freebase has a nominal quality goal of 99% accuracy and this is the metric that new data loads are judged against (they also want to be in the 95% confidence interval, which determines how big a sample you need when doing evaluations).
I haven't looked at this bot, but a develop/test/deploy cycle measured in hours seems, on the surface, to be very aggressive.
Tom
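To give a feel for how a confidence level translates into a sample size, here is a rough sketch using the textbook normal-approximation formula n = z^2 * p * (1 - p) / e^2. The message above does not say how Freebase actually computes its samples, so the formula choice, the `required_sample_size` helper, and the one-percentage-point margin of error are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def required_sample_size(expected_accuracy, margin_of_error, confidence=0.95):
    """Sample size needed to estimate a proportion within +/- margin_of_error.

    Uses the normal approximation, which is only a rough guide when the
    expected accuracy is very close to 0 or 1.
    """
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = expected_accuracy
    n = (z ** 2) * p * (1 - p) / (margin_of_error ** 2)
    return math.ceil(n)


# Assumed scenario: 99% expected accuracy, estimated to within one percentage point.
sample_size = required_sample_size(0.99, 0.01)
```

Under these assumptions the result is a few hundred statements, so a check of the first 100 movies gives a useful first impression but a fairly wide interval.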
I think that's a fine threshold, but it will probably vary some per the type of data. Our goal is ultimately that every claim will have a reference, and then we can sort of pass the burden of accuracy on to the references. Wikipedia has grown well with that principle. Adding references will be much easier when we batch import data from sources dedicated to a particular type of data as opposed to parsing information out of Wikipedia (the IMDb has more consistency checks than Infobox film).
Date: Tue, 2 Apr 2013 10:39:22 -0400 From: tfmorris@gmail.com To: wikidata-l@lists.wikimedia.org Subject: Re: [Wikidata-l] Running "Infobox film" import script
That sounds good in principle, but people might get upset ("why did we put this into Wikipedia then?")
A compromise could be to import from (in this example) both Wikipedia and IMDb, add both as a reference to the same claim if they agree, and separately if not. We can then deal with the contradictions manually.
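A minimal sketch of that compromise follows, assuming each source has already been reduced to a mapping from (item, property) to a value. The `merge_sources` and `contradictions` helpers and the sample films, properties, and values are hypothetical and serve only to show the grouping logic.

```python
from collections import defaultdict


def merge_sources(claims_by_source):
    """Group values per (item, property) and record which sources support each value.

    claims_by_source maps a source name to {(item, property): value}.
    The result maps (item, property) to {value: [source names]}.
    """
    merged = defaultdict(lambda: defaultdict(list))
    for source, claims in claims_by_source.items():
        for key, value in claims.items():
            merged[key][value].append(source)
    return {key: dict(values) for key, values in merged.items()}


def contradictions(merged):
    """Return only the entries where the sources disagree on the value."""
    return {key: values for key, values in merged.items() if len(values) > 1}


# Made-up data: the sources agree on the director but not on the duration.
sample = {
    "Wikipedia": {("Film X", "director"): "Person A", ("Film X", "duration"): "120 min"},
    "IMDb": {("Film X", "director"): "Person A", ("Film X", "duration"): "118 min"},
}
merged = merge_sources(sample)
conflicts = contradictions(merged)
```

When the sources agree, the single value ends up with both sources listed as references. When they disagree, each value keeps its own reference and the entry shows up in `conflicts` for manual review.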
On Tue, Apr 2, 2013 at 5:21 PM, Michael Hale hale.michael.jr@live.com wrote:
I think that's a fine threshold, but it will probably vary some per the type of data. Our goal is ultimately that every claim will have a reference, and then we can sort of pass the burden of accuracy on to the references. Wikipedia has grown well with that principle. Adding references will be much easier when we batch import data from sources dedicated to a particular type of data as opposed to parsing information out of Wikipedia (the IMDb has more consistency checks than Infobox film).