Sharing some good news, both about the progress of ORES and (my primary inspiration for sharing this email) significant improvements in article quality thanks to WikiProject Women scientists. The latter has been designated as the Keilana Effect.
Pine
---------- Forwarded message ----------
From: Aaron Halfaker aaron.halfaker@gmail.com
Date: Thu, Mar 16, 2017 at 2:14 PM
Subject: Re: [Wikitech-l] The Revision Scoring weekly update
To: Application of Artificial Intelligence and other advanced computing strategies to Wikimedia Projects ai@lists.wikimedia.org
Cc: wikitech-l wikitech-l@lists.wikimedia.org
Hey folks!
I should really stop calling this a weekly update because it's getting a bit silly at this point. :) But if it were a weekly update, it would cover the weeks of 42 - 46.
*Highlights:*
- 3 new models: Finnish Wikipedia (reverted) and Estonian Wikipedia (damaging & goodfaith)
- We estimated and agreed on funding for ORES servers in the next year with Operations
- We published a paper about vandalism detection in Wikidata and a blog post about the massive effect of some initiatives on coverage of Women Scientists in Wikipedia.
*New development:*
- We added recall-based threshold metrics to the new draftquality model, which should help tool devs know which new page creations to highlight for review[1]
- We added optional notices for ORES pages which will help us visually distinguish our experimental install in WMFlabs from the Prod install (ores.wikimedia.org)[2]
- We added basic language support for Finnish (Thanks 4shadoww)[3] and deployed a 'reverted' model[4]
- We led a discussion in Wikidata about "item quality" that resulted in a Wikipedia 1.0-like scale for Wikidata quality[5,6] and designed a Wikilabels form to capture the gist of it[7]
- We enabled the ORES Review Tool on Czech Wikipedia[8]
- We configured ChangeProp to use our new minified JSON output to save bandwidth[9]
- We extended the Estonian language assets (Thanks Cumbril)[10] and deployed the 'damaging' and 'goodfaith' models[11,12]
- We enabled a testing model for 'goodfaith' on the Beta Cluster to make it easier for the Collaboration team to run tests with their new filter interface[13]
- We created a new "precache" endpoint that will allow us to de-duplicate configuration with ChangeProp and handle all routing in ORES locally[14]
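
To illustrate the recall-based thresholds mentioned for the draftquality model[1], here is a minimal sketch of the general idea. It is not the ORES implementation: the function name, the scores, and the labels are hypothetical, and it only assumes that a model emits a probability per page and that labeled examples are available.

```python
import math


def threshold_at_recall(scores, labels, target_recall):
    """Return a score threshold at which recall is at least target_recall.

    scores: predicted probabilities for the positive class (hypothetical data)
    labels: True where the example really is positive
    Flagging every item with score >= the returned threshold catches at
    least target_recall of the known positives.
    """
    # Scores of the true positives, highest first.
    positives = sorted((s for s, y in zip(scores, labels) if y), reverse=True)
    if not positives:
        raise ValueError("need at least one positive example")
    # Number of positives that must be caught to reach the target recall.
    needed = max(1, math.ceil(target_recall * len(positives)))
    # The score of the needed-th highest positive is the cut-off.
    return positives[needed - 1]


def recall_at(scores, labels, threshold):
    """Fraction of positives whose score is at or above the threshold."""
    positive_scores = [s for s, y in zip(scores, labels) if y]
    caught = sum(1 for s in positive_scores if s >= threshold)
    return caught / len(positive_scores)


# Hypothetical scores and labels, for illustration only.
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [True, False, True, True, False]
cutoff = threshold_at_recall(scores, labels, 0.66)
print(cutoff, recall_at(scores, labels, cutoff))
```

A tool could then highlight only the pages scoring at or above the cut-off, trading a known share of missed cases for a shorter review queue.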
*Resourcing:*
- We completed a 2-year estimate of ORES resource needs and discussed funding (capital expenditure) for ORES in the coming fiscal year[15]. This will allow us to continue to grow ORES both in number of models and in scoring capacity.
*Communications:*
- Amir improved the KDD paper based on review feedback[16] and got it published[17]
- We published a blog post about our measurements of WikiProject Women Scientists[18,19] -- "The Keilana Effect"
- Thanks to Cumbril's work, the Estonian labeling campaign was finished[20]
*Deployments:*
- In early February, we deployed a new set of translations to Wikilabels (specifically targeting Romanian Wikipedia)[21]
- In mid-February, we deployed some fixes to ORES documentation and response formatting[22]
- In mid-March, we deployed 3 new scoring models and ORES notices[23]
*Maintenance and robustness:*
- We fixed a serious issue in the "mwoauth" library that Wikilabels depends on[24]
- We reduced the number of revisions per request that we could receive via api.php[25]
- We investigated a scap issue that broke ORES deployment[26]
- We fixed a minor issue with JSON minification behavior[27] and hard-coding of the location of ORES in the documentation[28]
- We improved performance of ORES filters on MediaWiki[29]
- We improved the language describing ORES behavior on Special:Contributions[30]
- We added a notice to the Wikipages that Dexbot maintains about its behavior[31]
- We added notices to ores.wmflabs.org about its experimental nature[32]
- We fixed some issues with testing Finnish language assets[33]
- We fixed some styling issues that resulted from an upgrade of OOJS UI[34]
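
For anyone affected by the smaller per-request revision limit[25], a client-side sketch of splitting a long list of revision IDs into batches follows. The batch size of 50 and the function name are hypothetical; the actual limit is recorded in the task, not in this email.

```python
def batches(rev_ids, batch_size):
    """Yield successive lists of at most batch_size revision IDs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(rev_ids), batch_size):
        yield rev_ids[start:start + batch_size]


# Hypothetical revision IDs and batch size, for illustration only.
rev_ids = list(range(1000, 1120))
for batch in batches(rev_ids, 50):
    # Each batch would be sent as one scoring request.
    print(len(batch), batch[0], batch[-1])
```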
1. https://phabricator.wikimedia.org/T157454 -- Add recall based thresholds to draftquality model
2. https://phabricator.wikimedia.org/T150962 -- Add an optional notice to ORES main and ui pages
3. https://phabricator.wikimedia.org/T158587 -- Add language support for Finnish
4. https://phabricator.wikimedia.org/T160228 -- Train/test reverted model for fiwiki
5. https://phabricator.wikimedia.org/T157489 -- [Discuss] item quality in Wikidata
6. https://www.wikidata.org/wiki/Wikidata:Item_quality
7. https://phabricator.wikimedia.org/T155828 -- Design item_quality form for Wikidata
8. https://phabricator.wikimedia.org/T151611 -- Enable ORES Review Tool on Czech Wikipedia
9. https://phabricator.wikimedia.org/T157693 -- Use minified JSON format in ChangeProp
10. https://phabricator.wikimedia.org/T160193 -- Extend estonian language assets from Wiki page
11. https://phabricator.wikimedia.org/T159608 -- Train/test damaging/goodfaith models for etwiki
12. https://phabricator.wikimedia.org/T130280 -- Deploy edit quality models for etwiki
13. https://phabricator.wikimedia.org/T160467 -- Enable 'goodfaith' on testwiki on Beta Cluster
14. https://phabricator.wikimedia.org/T148714 -- Create generalized "precache" endpoint for ORES
15. https://phabricator.wikimedia.org/T157222 -- Estimate ORES capex for FY2017-18
16. https://phabricator.wikimedia.org/T148443 -- Improve the KDD paper based on the review
17. https://arxiv.org/abs/1703.03861
18. https://phabricator.wikimedia.org/T160078 -- Blog post about wp10 measurements of Women Scientists
19. https://blog.wikimedia.org/2017/03/07/the-keilana-effect/
20. https://phabricator.wikimedia.org/T129702 -- Complete etwiki edit quality campaign
21. https://phabricator.wikimedia.org/T157580 -- Deploy Romanian translations for Wiki labels
22. https://phabricator.wikimedia.org/T157842 -- Prod deployment of ORES
23. https://phabricator.wikimedia.org/T160279 -- Deploy ores in prod (Mid-March)
24. https://phabricator.wikimedia.org/T157858 -- mwoauth is broken
25. https://phabricator.wikimedia.org/T157983 -- Reduce the number of revisions that can be requested in one batch
26. https://phabricator.wikimedia.org/T157623 -- Investigate failed ORES deployment
27. https://phabricator.wikimedia.org/T157721 -- Investigate default JSON minification behavior in production
28. https://phabricator.wikimedia.org/T157723 -- ORES swagger is hard-coded for wmflabs
29. https://phabricator.wikimedia.org/T152585 -- rcshow=oresreview is slow
30. https://phabricator.wikimedia.org/T158862 -- Fix message in Special:Contributions
31. https://phabricator.wikimedia.org/T158899 -- Add notice about Dexbot overwriting manual changes to our tracking table.
32. https://phabricator.wikimedia.org/T159055 -- Add a notice to ores-wmflabs-deploy about "experimental" nature
33. https://phabricator.wikimedia.org/T160192 -- Fix testing issues in finnish language assets
34. https://phabricator.wikimedia.org/T160258 -- Fix minor styling issues with OOJS-UI in wikilabels
Sincerely,
Aaron from the Scoring Platform team
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Hoi, I noticed the notion about "quality in Wikidata". The approach is very much in line with what is the norm in Wikipedia. This is not the right approach for Wikidata. Many of the items in Wikidata can be of high "quality", i.e. the statements have a source and there are enough labels, but the true value of these items is in their use as statements in other items (for instance, a university indicates that someone studied there). Another quality point is that, for authors, a VIAF statement allows Wikipedias to link to external sources. This is of high importance: it makes Wikidata useful, and if that is not a quality consideration, what is?
One other aspect of Wikidata is that it is still highly immature. Just consider the statistics for labels and statements [1]. This is only the first month where less than 10% of our items have no statement. We talk about quality, but quality should have a practical meaning. Just saying this or that item is so good makes for stamp collecting. The point of a stamp is not to collect it; it is to send mail. Quality means that we know how many articles have been written in one or more editathons. It is in finding, for a collection of items, that it is better known what awards and schooling have been achieved by the people who were written about. It is in using Wikidata to indicate what categories could be in what Wikipedia article.
Quality needs to be actionable. What is the use of static quality?
Thanks,
GerardM
[1] https://tools.wmflabs.org/wikidata-todo/stats.php?reverse
Having guidance on quality helps people learning about Wikidata understand what they should be aiming for.
The paper on vandalism detection in Wikidata sounds interesting. Where can I find it?
Richard
Hey, it's on arXiv: https://arxiv.org/abs/1703.03861
Any feedback is welcome :)
Best
On Fri, Mar 17, 2017 at 3:37 PM Richard Nevell < richard.nevell@wikimedia.org.uk> wrote:
Having guidance on quality helps people learning about Wikidata understand what they should be aiming for.
The paper on vandalism detection in Wikidata sounds interesting, where can I find it?
Richard
On 17 March 2017 at 09:09, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, I noticed the notion about "quality in Wikidata". The approach is very
much
in line with what is the norm in Wikipedia. This is inot the right
approach
for Wikidata. Many of the items in Wikidata can be of high "quality"; ie the statements have a source and there are enough labels but the true
value
of these items are in the use of these items as statements in other
items..
(for instance a university indicates that someone studied there).
Another
quality point is that for authors a VIAF statements allows for the
linking
in Wikipedias in external sources. This is of a high importance, it makes Wikidata useful and, if that is not of a quality consideration what is?
One other aspect of Wikidata is that it is still highly immature. Just consider the statistics for labels and statements [1] . This is only the first month where less than 10% of our items have no statement.. We talk about quality but quality should have a practical meaning. Just saying
this
or that item is so good, it makes for stamp collecting. The point of a stamp is not to collect them it is to send mail. Quality means that we
know
how many articles have been written in one or more editathons. It is in finding for a collection of items that it is better known what award, schooling has been achieved by the people that was written for. It is in using Wikidata to indicate what categories could be in what Wikipedia article.
Quality needs to be actionable. What is the use of static quality? Thanks, GerardM
[1] https://tools.wmflabs.org/wikidata-todo/stats.php?reverse
On 17 March 2017 at 02:19, Pine W wiki.pine@gmail.com wrote:
Sharing some good news, both about the progress of ORES and (my primary inspiration for sharing this email) significant improvements in article quality thanks to WikiProject Women scientists. The latter has been designated as the Keilana Effect.
Pine
---------- Forwarded message ---------- From: Aaron Halfaker aaron.halfaker@gmail.com Date: Thu, Mar 16, 2017 at 2:14 PM Subject: Re: [Wikitech-l] The Revision Scoring weekly update To: Application of Artificial Intelligence and other advanced computing strategies to Wikimedia Projects ai@lists.wikimedia.org Cc: wikitech-l wikitech-l@lists.wikimedia.org
Hey folks!
I should really stop calling this a weekly update because it's getting
a
bit silly at this point. :) But if it were a weekly update, it would cover the weeks of 42 - 46.
*Highlights:*
- 3 new models: Finnish Wikipedia (reverted) and Estonian Wikipedia
(damaging & goodfaith)
- We estimated and agreed on funding for ORES servers in the next
year
with Operations
- We published a paper about vandalism detection in Wikidata and a
blog
post about the massive effect of some initiatives on coverage of
Women
Scientists in Wikipedia.
*New development:*
- We added recall-based threshold metrics to the new draftquality
model
which should help tool devs know what which new page creations to highlight for review[1]
- We added optional notices for ORES pages which will help us
visually
distinguish our experimental install in WMFlabs from the Prod
install
(
ores.wikimedia.org)[2]
- We added basic language support for Finish (Thanks 4shadoww)[3]
and
deployed a 'reverted' model[4]
- We lead a discussion in Wikidata about "item quality" that
resulted
in
a Wikipedia 1.0 like scale for Wikidata quality[5,6] and designed a Wikilabels form to capture the gist of it[7]
We enabled the ORES Review Tool on Czech Wikipedia[8]
We configured ChangeProp to use our new minified JSON output to
save
bandwidth[9]
- We extended the Estonian language assets (Thanks Cumbril)[10] and
deployed the 'damaging' and 'goodfaith' models[11,12]
- We enabled a testing model for 'goodfaith' on the Beta Cluster to
make
it easier for the Collaboration team to run tests with their new
filter
interface[13]
- We created a new "precache" endpoint that will allow us to
de-duplicate configuration with ChangeProp and handle all routing in ORES locally[14]
*Resourcing:*
- We completed a 2 year estimate of ORES resource needs and
discussed
funding (capital expendature) for ORES in the coming fiscal
year[15].
This will allow us to continue to grow ORES both in number of models and
in
scoring capacity.
*Communications:*
- Amir improved the KDD paper based on review feedback[16] and got
it
published[17]
- We published a blob post about our measurements of WikiProject
Women
Scientists[18,19] -- "The Keilana Effect"
- Thanks to Cumbril's work, the Estonian labeling campaing was
finished[20]
*Deployments:*
- In early February, we deployed a new set of translations to
Wikilabels
(specifcally targeting Romanian Wikipedia)[21]
- In mid-February, we deployed some fixes to ORES documentation and
response formatting[22]
- In mid-March, we deployed 3 new scoring models and ORES
notices[23]
*Maintenance and robustness:*
- We fixed a serious issue in the "mwoauth" library that Wikilabels depends on[24]
- We reduced the number of revisions per request that we could receive via api.php[25]
- We investigated a scap issue that broke ORES deployment[26]
- We fixed a minor issue with JSON minification behavior[27] and hard-coding of the location of ORES in the documentation[28]
- We improved performance of ORES filters on MediaWiki[29]
- We improved the language describing ORES behavior on Special:Contributions[30]
- We added a notice to the wiki pages that Dexbot maintains about its behavior[31]
- We added notices to ores.wmflabs.org about its experimental nature[32]
- We fixed some issues with testing Finnish language assets[33]
- We fixed some styling issues that resulted from an upgrade of OOJS UI[34]
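For tool developers affected by the lower per-request revision limit, a client can split a long list of revision IDs into smaller batches before sending them. The following is a minimal sketch of that idea; the helper name `batches` and the limit of 50 used in the usage line are assumptions for illustration, since the update does not state the actual limit.

```python
def batches(rev_ids, max_per_request):
    """Yield consecutive slices of rev_ids, each at most max_per_request long."""
    if max_per_request < 1:
        raise ValueError("max_per_request must be at least 1")
    for start in range(0, len(rev_ids), max_per_request):
        yield rev_ids[start:start + max_per_request]


# Hypothetical usage: 120 revision IDs with an assumed limit of 50 per request.
rev_ids = list(range(1000, 1120))
for batch in batches(rev_ids, 50):
    print(len(batch), batch[0], batch[-1])
```

Each batch would then be sent as its own request, keeping every request under whatever limit the server enforces.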
1. https://phabricator.wikimedia.org/T157454 -- Add recall based thresholds to draftquality model
2. https://phabricator.wikimedia.org/T150962 -- Add an optional notice to ORES main and ui pages
3. https://phabricator.wikimedia.org/T158587 -- Add language support for Finnish
4. https://phabricator.wikimedia.org/T160228 -- Train/test reverted model for fiwiki
5. https://phabricator.wikimedia.org/T157489 -- [Discuss] item quality in Wikidata
6. https://www.wikidata.org/wiki/Wikidata:Item_quality
7. https://phabricator.wikimedia.org/T155828 -- Design item_quality form for Wikidata
8. https://phabricator.wikimedia.org/T151611 -- Enable ORES Review Tool on Czech Wikipedia
9. https://phabricator.wikimedia.org/T157693 -- Use minified JSON format in ChangeProp
10. https://phabricator.wikimedia.org/T160193 -- Extend estonian language assets from Wiki page
11. https://phabricator.wikimedia.org/T159608 -- Train/test damaging/goodfaith models for etwiki
12. https://phabricator.wikimedia.org/T130280 -- Deploy edit quality models for etwiki
13. https://phabricator.wikimedia.org/T160467 -- Enable 'goodfaith' on testwiki on Beta Cluster
14. https://phabricator.wikimedia.org/T148714 -- Create generalized "precache" endpoint for ORES
15. https://phabricator.wikimedia.org/T157222 -- Estimate ORES capex for FY2017-18
16. https://phabricator.wikimedia.org/T148443 -- Improve the KDD paper based on the review
17. https://arxiv.org/abs/1703.03861
18. https://phabricator.wikimedia.org/T160078 -- Blog post about wp10 measurements of Women Scientists
19. https://blog.wikimedia.org/2017/03/07/the-keilana-effect/
20. https://phabricator.wikimedia.org/T129702 -- Complete etwiki edit quality campaign
21. https://phabricator.wikimedia.org/T157580 -- Deploy Romanian translations for Wiki labels
22. https://phabricator.wikimedia.org/T157842 -- Prod deployment of ORES
23. https://phabricator.wikimedia.org/T160279 -- Deploy ores in prod (Mid-March)
24. https://phabricator.wikimedia.org/T157858 -- mwoauth is broken
25. https://phabricator.wikimedia.org/T157983 -- Reduce the number of revisions that can be requested in one batch
26. https://phabricator.wikimedia.org/T157623 -- Investigate failed ORES deployment
27. https://phabricator.wikimedia.org/T157721 -- Investigate default JSON minification behavior in production
28. https://phabricator.wikimedia.org/T157723 -- ORES swagger is hard-coded for wmflabs
29. https://phabricator.wikimedia.org/T152585 -- rcshow=oresreview is slow
30. https://phabricator.wikimedia.org/T158862 -- Fix message in Special:Contributions
31. https://phabricator.wikimedia.org/T158899 -- Add notice about Dexbot overwriting manual changes to our tracking table.
32. https://phabricator.wikimedia.org/T159055 -- Add a notice to ores-wmflabs-deploy about "experimental" nature
33. https://phabricator.wikimedia.org/T160192 -- Fix testing issues in finnish language assets
34. https://phabricator.wikimedia.org/T160258 -- Fix minor styling issues with OOJS-UI in wikilabels
Sincerely,
Aaron from the Scoring Platform team
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
_______________________________________________
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l
New messages to: Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe
-- Richard Nevell Project Coordinator Wikimedia UK - sign up to our newsletter http://eepurl.com/cnYOw5 +44 (0) 20 7065 0921
Wikimedia UK is a Company Limited by Guarantee registered in England and Wales, Registered No. 6741827. Registered Charity No.1144513. Registered Office 4th Floor, Development House, 56-64 Leonard Street, London EC2A 4LT. United Kingdom. Wikimedia UK is the UK chapter of a global Wikimedia movement. The Wikimedia projects are run by the Wikimedia Foundation (who operate Wikipedia, amongst other projects).
*Wikimedia UK is an independent non-profit charity with no legal control over Wikipedia nor responsibility for its contents.*
Hoi, I so agree, but when it is quality that is to be achieved, let it be guidance that helps us to achieve quality.
Wikidata should bring things together. I do not aim to achieve the quality as described because it fails to achieve things that are actionable and have a measurable effect on the quality of Wikidata as being complementary to Wikipedia.
Arguably the quality that Wikidata brings is not realised because Wikidata items are considered in the same way as articles. They are not. Thanks, GerardM
On 17 March 2017 at 13:06, Richard Nevell richard.nevell@wikimedia.org.uk wrote:
Having guidance on quality helps people learning about Wikidata understand what they should be aiming for.
The paper on vandalism detection in Wikidata sounds interesting, where can I find it?
Richard
On 17 March 2017 at 09:09, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, I noticed the notion about "quality in Wikidata". The approach is very much in line with what is the norm in Wikipedia. This is not the right approach for Wikidata. Many of the items in Wikidata can be of high "quality", i.e. the statements have a source and there are enough labels, but the true value of these items is in their use as statements in other items (for instance, a university indicates that someone studied there). Another quality point is that for authors a VIAF statement allows for linking to external sources in Wikipedias. This is of high importance, it makes Wikidata useful and, if that is not a quality consideration, what is?
One other aspect of Wikidata is that it is still highly immature. Just consider the statistics for labels and statements [1]. This is only the first month where less than 10% of our items have no statement. We talk about quality but quality should have a practical meaning. Just saying this or that item is so good makes for stamp collecting. The point of a stamp is not to collect it; it is to send mail. Quality means that we know how many articles have been written in one or more editathons. It is in finding, for a collection of items, that it is better known what award or schooling has been achieved by the people who were written about. It is in using Wikidata to indicate what categories could be in what Wikipedia article.
Quality needs to be actionable. What is the use of static quality? Thanks, GerardM
[1] https://tools.wmflabs.org/wikidata-todo/stats.php?reverse