I think this blog post would help us a lot (it suggests that for streaming gzip decompression we use zlib instead of the gzip module):
http://rationalpie.wordpress.com/2010/06/02/python-streaming-gzip-decompression/
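
As far as I understand the idea, it could look roughly like the sketch below. This is only an illustration using Python's standard zlib module; the names iter_gunzip and demo and the 512-byte chunk size are my own and do not come from the post or from pywikibot.

```python
import gzip
import zlib


def iter_gunzip(chunks):
    """Yield decompressed bytes from an iterable of gzip-compressed chunks."""
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in chunks:
        data = decompressor.decompress(chunk)
        if data:
            yield data
    # Emit whatever is still buffered once the input is exhausted.
    tail = decompressor.flush()
    if tail:
        yield tail


def demo():
    """Round-trip a made-up payload through gzip and the streaming decoder."""
    payload = b'{"labels": {"en": "Example"}}' * 1000
    compressed = gzip.compress(payload)
    # Feed the compressed stream in small pieces, as a network read would.
    pieces = (compressed[i:i + 512] for i in range(0, len(compressed), 512))
    return b"".join(iter_gunzip(pieces)) == payload
```

The point is that the response never has to be held in memory in full before decompression starts.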

What do you think?


On Sat, Jul 5, 2014 at 6:16 PM, John Mark Vandenberg <jayvdb@gmail.com> wrote:
Likewise, thank you Frances for this evaluation. It is very helpful.

Are we sure that gzip isn't occurring by default? I started to investigate this a few weeks ago and confirmed that httplib2 defaults to gzip, but I didn't verify that pywiki core isn't meddling with that default.

This is quite important for the performance of Wikidata, as its JSON output contains a lot of repetition, and that repetition increases as the item grows. For example, new labels and sitelinks of articles about species are usually the same as the label / sitelink in other languages.
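
To get a rough feel for how much that repetition matters, here is a small sketch using only the standard library. The item below is made up for illustration (it is not real Wikidata output, and the language codes are placeholders), and gzip_ratio is a hypothetical helper name.

```python
import gzip
import json


def gzip_ratio(obj):
    """Return compressed size divided by raw size for obj serialised as JSON."""
    raw = json.dumps(obj).encode("utf-8")
    return len(gzip.compress(raw)) / len(raw)


# Made-up item: the same label and sitelink title repeated for every language.
languages = ["lang%03d" % i for i in range(200)]
item = {
    "labels": {
        code: {"language": code, "value": "Panthera leo"} for code in languages
    },
    "sitelinks": {
        code + "wiki": {"site": code + "wiki", "title": "Panthera leo"}
        for code in languages
    },
}

print("compressed/raw size: %.2f" % gzip_ratio(item))
```

The more languages share the same value, the smaller the ratio should get, which is why it matters whether gzip is really being negotiated.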

http://lists.wikimedia.org/pipermail/pywikipedia-l/2014-June/008886.html


On Sat, Jul 5, 2014 at 1:15 AM, Amir Ladsgroup <ladsgroup@gmail.com> wrote:
> Thank you, this is helpful, I want to work on some of them:
>
> Use gzip compression by default
> Make it easy to add a user-agent header and give examples of a good one in
> the documentation for it (see
> https://meta.wikimedia.org/wiki/User-agent_policy)
> Add Python 3 compatibility (this is in progress for the core branch)
> Package pywikibot for installation from PyPI via pip install
> Make the initial installation process lighter-weight:
>
> Design pwb.py with user experience in mind, particularly valuing feedback
> from new or one-time users during the redesign process
> Make it possible to install into a virtualenv without putting a config file
> in the home directory
> Make it possible to run import pywikibot without having to log in
>
> Iterating over a list and calling the API for each item is an inefficient
> use of API calls. Efficiency in API usage is an important feature of a gold
> standard library. If you are interested in gold standard status, consider
> making this more efficient by combining API calls as much as possible (e.g.
> using generators and combining results: title=title1|title2|...). One option
> may be a constructor method that collects Page requests and enables larger,
> less frequent API calls. It may be possible to take advantage of the
> database-like structure of the MediaWiki API and help users save bandwidth.
>
> Process-related
>
> Foster a hospitable attitude on pywikipedia-l, especially to new and/or
> inexperienced users. Consider agreeing on community standards for
> interaction; the Hacker School social rules may be a useful starting point.
> Create more centralized and updated documentation, including:
>
> Easy-to-find, complete, and intuitive installation instructions, including
> installing via pip and into virtual environments
> Code samples for common tasks, including queries and edits
> Documentation for people who aren't running bots with existing scripts
> (particularly researchers and beginning/intermediate bot writers)
> Links in method documentation to the corresponding API subpages
>
> Streamline or add more resources to the patch review process to reduce the
> backlog of unreviewed patches
>
>
> If someone is willing to help out, let's work!
>
>
>
>
> On Fri, Jul 4, 2014 at 2:36 AM, Frances Hocutt <frances.hocutt@gmail.com>
> wrote:
>>
>> Hello all,
>>
>> This summer I am working on a project to evaluate and improve the
>> available MediaWiki web API client libraries. As pywikibot met the
>> initial criteria of quality, features, and development status I chose
>> to evaluate it in more depth. There is now a "gold standard"[1] that
>> will be used to find and enable the listing of particularly
>> well-designed and easy-to-use MediaWiki web API client libraries--I've
>> now evaluated several Python libraries against this standard and
>> suggested additions and changes that would help them meet the
>> standard.
>>
>> First, thank you all for contributing to pywikibot and its community of
>> users!
>>
>> My evaluation for pywikibot is posted here.[2] Pywikibot is
>> impressively full-featured (including Wikidata API coverage), and it
>> makes it possible for bot runners and wiki maintainers to quickly get
>> started automating wiki management tasks.  Some areas that could be
>> improved include expanded and centralized documentation, efficiency in
>> use of API calls, and making the setup process lighter-weight and
>> easier to use.
>>
>> I will follow up by posting specific suggestions to Bugzilla[3] later
>> this week. If you have comments or questions, please feel free to post
>> on the evaluation talk page, respond to the bugs filed, or make
>> corrections on the evaluation page if I've missed something.
>>
>> -Frances Hocutt
>> MediaWiki intern
>>
>> [1] https://www.mediawiki.org/wiki/API:Client_code/Gold_standard
>> [2] https://www.mediawiki.org/wiki/API:Client_code/Evaluations/Pywikibot
>> [3]
>> https://bugzilla.wikimedia.org/buglist.cgi?query_format=specific&product=Pywikibot&list_id=235557
>>
>> _______________________________________________
>> Pywikipedia-l mailing list
>> Pywikipedia-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
>
>
>
>
> --
> Amir
>
>
>



--
John Vandenberg







--
Amir