Hello all,
This summer I am working on a project to evaluate and improve the available MediaWiki web API client libraries. As pywikibot met the initial criteria of quality, features, and development status, I chose to evaluate it in more depth. There is now a "gold standard"[1] that will be used to find and list particularly well-designed and easy-to-use MediaWiki web API client libraries. I have now evaluated several Python libraries against this standard and suggested additions and changes that would help them meet it.
First, thank you all for contributing to pywikibot and its community of users!
My evaluation of pywikibot is posted here.[2] Pywikibot is impressively full-featured (including Wikidata API coverage), and it makes it possible for bot runners and wiki maintainers to get started automating wiki management tasks quickly. Areas that could be improved include expanding and centralizing the documentation, using API calls more efficiently, and making the setup process lighter-weight and easier to use.
I will follow up by posting specific suggestions to Bugzilla[3] later this week. If you have comments or questions, please feel free to post on the evaluation talk page, respond to the bugs filed, or make corrections on the evaluation page if I've missed something.
-Frances Hocutt
MediaWiki intern
[1] https://www.mediawiki.org/wiki/API:Client_code/Gold_standard
[2] https://www.mediawiki.org/wiki/API:Client_code/Evaluations/Pywikibot
[3] https://bugzilla.wikimedia.org/buglist.cgi?query_format=specific&product...
Thank you, this is helpful. I want to work on some of these:
- Use gzip compression by default
- Make it easy to add a user-agent header, and give examples of a good one in the documentation for it (see https://meta.wikimedia.org/wiki/User-agent_policy)
- Add Python 3 compatibility (this is in progress for the core branch)
- Package pywikibot for installation from PyPI via pip install
- Make the initial installation process lighter-weight:
  - Design pwb.py with user experience in mind, particularly valuing feedback from new or one-time users during the redesign process
  - Make it possible to install into a virtualenv without putting a config file in the home directory
  - Make it possible to run "import pywikibot" without having to log in
- Iterating over a list and calling the API for each item is an inefficient use of API calls. Efficiency in API usage is an important feature of a gold standard library. If you are interested in gold standard status, consider making this more efficient by combining API calls as much as possible (e.g. using generators and combining results with titles=title1|title2|...; see the sketch after this list). One option may be a constructor method that collects Page requests and enables larger, less frequent API calls. It may be possible to take advantage of the database-like structure of the MediaWiki API and help users save bandwidth.
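To illustrate the batching point in the last item, here is a minimal sketch of combining many titles into a single action=query request. It deliberately uses only the Python standard library rather than pywikibot's own request machinery, and the endpoint, the user-agent string, and the batch size of 50 (the usual per-query limit for accounts without apihighlimits) are assumptions made for the example:

# Hypothetical sketch: fetch page info for many titles with a few large
# API calls instead of one call per title. Standard library only; this is
# not pywikibot's internal implementation.
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"  # example endpoint

def chunks(titles, size=50):
    """Yield batches of titles; 50 per query is the usual non-bot limit."""
    for i in range(0, len(titles), size):
        yield titles[i:i + size]

def batched_page_info(titles):
    """Return the combined 'pages' results for all titles, batch by batch."""
    pages = {}
    for batch in chunks(list(titles)):
        params = urllib.parse.urlencode({
            "action": "query",
            "prop": "info",
            "titles": "|".join(batch),  # one request covers the whole batch
            "format": "json",
        })
        request = urllib.request.Request(
            API_URL + "?" + params,
            headers={"User-Agent": "BatchExample/0.1 (replace with real contact info)"},
        )
        with urllib.request.urlopen(request) as response:
            data = json.loads(response.read().decode("utf-8"))
        pages.update(data.get("query", {}).get("pages", {}))
    return pages

A constructor that merely records requested pages and flushes them through something like batched_page_info() would give the "larger, less frequent API calls" behaviour suggested above.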
Process-related
- Foster a hospitable attitude on pywikipedia-l, especially toward new and/or inexperienced users. Consider agreeing on community standards for interaction; the Hacker School social rules (https://www.hackerschool.com/manual#sec-environment) may be a useful starting point.
- Create more centralized and updated documentation, including:
  - Easy-to-find, complete, and intuitive installation instructions, including installing via pip and into virtual environments
  - Code samples for common tasks, including queries and edits (one possible shape is sketched after this list)
  - Documentation for people who aren't running bots with existing scripts (particularly researchers and beginning/intermediate bot writers)
  - Links in method documentation to the corresponding API subpages (https://www.mediawiki.org/wiki/API)
- Streamline or add more resources to the patch review process to reduce the backlog of unreviewed patches
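As a concrete idea for the "code samples for common tasks" item, the documentation could open with something as short as the snippet below. It is written from memory of the pywikibot core interface, so treat the exact names and arguments as assumptions to be checked against the current code:

# A possible "first query and edit" sample for the documentation (names from
# memory of pywikibot core; verify against the current code before publishing).
import pywikibot

site = pywikibot.Site("en", "wikipedia")          # which wiki to work on
page = pywikibot.Page(site, "Wikipedia:Sandbox")  # a page object; loaded lazily
text = page.get()                                 # query: fetch current wikitext
page.text = text + "\n<!-- pywikibot test edit -->"
page.save("Bot: demonstrating a simple edit")     # edit: one save, one API call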
If someone is willing to help out, let's work!
Likewise, thank you, Frances, for this evaluation. It is very helpful.
Are we sure that gzip isn't occurring by default? I started to investigate this a few weeks ago and confirmed that httplib2 defaults to gzip, but I didn't verify that pywikibot core isn't meddling with that default.
This is quite important for the performance of Wikidata, as its JSON output contains a lot of repetition, and that repetition increases as the item grows. For example, the new labels and sitelinks of articles about species are usually the same as the labels/sitelinks in other languages.
http://lists.wikimedia.org/pipermail/pywikipedia-l/2014-June/008886.html
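One library-agnostic way to answer the "is gzip actually occurring" question is to ask the API directly and inspect the Content-Encoding header, independent of whatever httplib2 or pywikibot do by default. A rough sketch; the Wikidata endpoint and the user-agent string are placeholders:

# Rough check of whether an API response is gzip-compressed on the wire.
import gzip
import io
import json
import urllib.request

url = "https://www.wikidata.org/w/api.php?action=query&meta=siteinfo&format=json"
request = urllib.request.Request(url, headers={
    "Accept-Encoding": "gzip",
    "User-Agent": "CompressionCheck/0.1 (replace with real contact info)",
})
with urllib.request.urlopen(request) as response:
    encoding = response.headers.get("Content-Encoding")
    raw = response.read()

print("Content-Encoding:", encoding)  # 'gzip' if compression happened
body = gzip.GzipFile(fileobj=io.BytesIO(raw)).read() if encoding == "gzip" else raw
print("bytes on the wire:", len(raw), "decompressed:", len(body))
json.loads(body)  # the payload should still be valid JSON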
-- John Vandenberg
I think this blog post would help us a lot (it suggests using zlib instead of the gzip module for streaming decompression): http://rationalpie.wordpress.com/2010/06/02/python-streaming-gzip-decompress...
What do you think?
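For reference, here is a minimal sketch of the technique the blog post describes: zlib.decompressobj with wbits set to 16 + MAX_WBITS understands the gzip framing, so a gzip-encoded HTTP body can be decompressed chunk by chunk as it arrives instead of being buffered whole, which is the streaming limitation the post works around on older Pythons. The URL handling below is generic and not tied to httplib2 or pywikibot:

# Sketch: streaming gzip decompression with zlib (generic, illustrative only).
import urllib.request
import zlib

def iter_gunzipped(url, chunk_size=64 * 1024):
    """Yield decompressed chunks of a (possibly gzip-encoded) HTTP response."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request) as response:
        if response.headers.get("Content-Encoding") != "gzip":
            yield response.read()  # server sent it uncompressed
            return
        decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)  # accept gzip framing
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            yield decompressor.decompress(chunk)
        yield decompressor.flush()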
It appears that bug is fixed in Python 3.2, so I think we should just use the compression provided by httplib2, and incorporate zlib support into httplib2 if required to work around the bug in Python 2.x.
-- John Vandenberg
I made some comments on the talk page.
Best
To conclude the gzip question: it is definitely occurring. https://gerrit.wikimedia.org/r/#/c/144850/
A discussion about user-agents is on the talk page: https://www.mediawiki.org/wiki/API_talk:Client_code/Evaluations/Pywikibot
-- John Vandenberg
I posted a comment there about a possible error in the test script patch.