On Sun, Dec 16, 2012 at 2:42 AM, Yuri Astrakhan <yuriastrakhan(a)gmail.com> wrote:
After a very productive talk with legoktm on IRC, it seems some of these
changes have already been made as part of the rewrite at
http://svn.wikimedia.org/viewvc/pywikipedia/branches/rewrite/ , documented at
http://botwiki.sno.cc/wiki/Rewrite , and only a few things remain:
* Git migration
MediaWiki will not support SVN forever. I wouldn't want to copy and
break away completely - history is very important for maintenance (yes, I
saw the previous Git posts here).
I don't like Gerrit; however, I think we should move sooner, when we're
ready, rather than being forced to move later. I'm not a big fan of having
to submit patches/issues on SourceForge either...
* Python3 migration
Guido started it for a very good reason. Much better Unicode, cleaner
libraries.
Yes, yes, and yes. This would fix nearly every single Unicode problem
possible.
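To illustrate the Unicode point, here is a minimal Python 3 sketch. The page title is only an example value, not anything taken from pywikipedia:

```python
# Python 3 keeps text (str) and raw data (bytes) strictly apart.
title = "Википедия"               # example page title, for illustration only
encoded = title.encode("utf-8")   # explicit conversion to bytes for the wire

char_count = len(title)           # counts characters
byte_count = len(encoded)         # counts UTF-8 bytes (two per Cyrillic letter)

try:
    title + encoded               # Python 2 would attempt an implicit ASCII decode here
except TypeError:
    mixed_ok = False              # Python 3 refuses to mix str and bytes
else:
    mixed_ok = True

print(char_count, byte_count, mixed_ok)
```

Because the mix-up fails immediately with a TypeError, encoding mistakes tend to surface where they are made rather than later on some page with a non-ASCII title.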
* Directory structure
I propose two separate root dirs:
pywikibot\ - current code
pywikibot3\ - Python3 rewrite
The directories should be at the same level, and not a branch, because
we will never be able to merge them back together - anyone who wants to
continue using Python 2.6+ and older features will keep using pywikibot, and
all the new code plus migrated scripts will move to pywikibot3.
Well, I think we should keep the current wikipedia.py trunk version alive,
and somewhat supported (as long as someone is willing to write patches?).
The current "rewrite" branch should be used as the basis of the pywikibot3
version.
* API only
No need to maintain complex and cumbersome Special:Export and page
scraping methods. If a MediaWiki installation doesn't support the API, they
don't want bot access. If their installation is over 5 years old, they can
use an older version of the pywiki framework.
There is still XML dump support in pywiki, but it seems many functions
will not work properly, so maybe we should remove that as well.
I think it's important for pywiki to still be able to read from an XML dump.
There are certain tasks that are much better done from a dump than by
fetching live wikitext.
There are also a few things that still don't have API modules, and we should
still make it possible to screen-scrape those; however, anything with an API
module should be using the API.
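For the dump case, a rough sketch of streaming pages out of an export file with only the standard library. The helper names (local, iter_pages) are made up for illustration and are not existing pywiki functions, and the export version in the sample namespace is just an example:

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) pairs from a MediaWiki XML export stream.

    If a page holds several revisions, the text of the last one is used.
    """
    title, text = None, None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # drop the finished page's children to limit memory use


# Tiny in-memory sample standing in for a real dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page>
    <title>Sandbox</title>
    <revision><text>Hello [[world]]</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
print(pages)
```

Streaming with iterparse avoids loading the whole dump at once, which is the main reason a dump beats live fetching for bulk tasks.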
* Higher-level HTTP library
Not as sure about this yet, but it seems Python Requests is a better
library for the API calls than httplib2 - it is less complex and cleaner.
See
http://docs.python-requests.org/
Requests is the library I recommended to Yurik on IRC. It supports just
about everything you can think of, so I don't know why we're trying to
reinvent the wheel by using our own library. That said, I can see an
argument for reducing the number of dependencies pywiki uses, but I feel
the benefits that would come from using a library like Requests outweigh the
negatives.
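As a rough sketch of what an API call could look like on top of Requests. The helper names (build_query, fetch_wikitext) are hypothetical and not part of pywikipedia; the session argument is assumed to be a requests.Session, and the "*" key assumes the API's classic JSON output format:

```python
def build_query(titles):
    """Parameters for fetching current wikitext through action=query."""
    return {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": "|".join(titles),  # the API takes multiple titles separated by '|'
        "format": "json",
    }


def fetch_wikitext(session, api_url, titles):
    """Return {title: wikitext} for the pages that exist.

    `session` is assumed to be a requests.Session (or anything with the
    same get() interface); Requests handles URL encoding and cookies.
    """
    response = session.get(api_url, params=build_query(titles), timeout=30)
    response.raise_for_status()  # turn HTTP errors into exceptions
    pages = response.json()["query"]["pages"]
    return {
        page["title"]: page["revisions"][0]["*"]
        for page in pages.values()
        if "revisions" in page  # missing pages carry no revisions
    }


# Usage, assuming Requests is installed and the wiki exposes api.php:
#   import requests
#   fetch_wikitext(requests.Session(), "https://en.wikipedia.org/w/api.php", ["Sandbox"])
```

Passing the session in keeps login cookies in one place and makes the function easy to exercise with a stub instead of a live wiki.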
When can we start? :)
As soon as possible! I'm going to be busy for the next week or so due to IRL
commitments, but I'm all for this as soon as I'm done.
Thanks for getting this started Yurik!
-- Legoktm
http://enwp.org/User:Legoktm