On Sun, Dec 16, 2012 at 2:42 AM, Yuri Astrakhan yuriastrakhan@gmail.com wrote:
After a very productive talk with legoktm on IRC, it seems some of these changes have already been made as part of the rewrite at http://svn.wikimedia.org/viewvc/pywikipedia/branches/rewrite/ documented at http://botwiki.sno.cc/wiki/Rewrite, and only a few things remain:
- Git migration MediaWiki will not support SVN forever. I wouldn't want to copy and
break away completely - history is very important for maintenance (Yes, I saw the previous Git posts here)
I don't like Gerrit; however, I think we should move sooner, when we're
ready, rather than being forced to move. I'm not a big fan of having to submit patches/issues on SourceForge either...
- Python3 migration Guido started it for a very good reason. Much better Unicode, cleaner
libraries.
Yes, yes, and yes. This would fix nearly every Unicode problem we have.
- Directory structure I propose two separate root dirs:
    pywikibot\ - current code
    pywikibot3\ - Python3 rewrite
The directories should be at the same level, and not as a branch, because
we will never be able to merge them back together - anyone who wants to continue using Python 2.6+ and older features will keep using pywikibot, and all the new code plus migrated scripts will move to pywikibot3
Well, I think we should keep the current wikipedia.py trunk version alive and somewhat supported (as long as someone is willing to write patches). The current "rewrite" branch should be used as the basis of the pywikibot3 version.
- API only No need to maintain complex and cumbersome Special:Export and page
scraping methods. If a MediaWiki installation doesn't support the API, they don't want bot access. If their installation is over 5 years old, they can use an older pywiki framework version. There is still XML dump support in pywiki, but it seems many functions will not work properly with it, so maybe we should remove that as well.
I think it's important for pywiki to still be able to read from an XML dump. There are certain tasks that are much better done from a dump than by fetching live wikitext. There are also a few things that still don't have API modules, and we should still make it possible to screen-scrape those; however, anything with an API module should be using the API.
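To make the dump case concrete, here is a rough sketch of streaming pages out of an XML dump with only the standard library. It is not pywiki's actual dump reader; the helper names (`local_name`, `iter_pages`) and the tiny sample document are made up for illustration, and the export namespace version in the sample is just an example.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, text) for each <page> in a MediaWiki XML dump.

    `source` is a filename or a binary file object. The dump is parsed
    incrementally, so the whole file never has to fit in memory.
    If a page holds several revisions, the text of the last one wins.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title = None
        text = ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # drop the children of the page we just handled


# Hypothetical two-page sample, standing in for a real dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page>
    <title>Foo</title>
    <revision><text>Hello [[Bar]]</text></revision>
  </page>
  <page>
    <title>Bar</title>
    <revision><text>World</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
```

On a real dump you would pass the (decompressed) file instead of the in-memory sample and loop over the generator rather than building a list.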
- Higher-level HTTP library Not as sure about this yet, but it seems Python Requests is a better
library for the API calls than httplib2 - for complexity and cleanliness reasons. See http://docs.python-requests.org/
Requests is the library I recommended to Yurik on IRC. It supports just about everything you can think of, so I don't know why we're trying to reinvent the wheel by using our own library. That said, I can see an argument for reducing the number of dependencies pywiki uses, but I feel the benefits of using a library like Requests outweigh the negatives.
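As an illustration of how little code an API call takes with Requests, here is a minimal sketch. The function names (`build_query_params`, `fetch_page_info`) are hypothetical and not part of pywiki; the parameters shown (`action=query`, `prop=info`, `titles`, `format=json`) are standard MediaWiki API ones, and the JSON layout assumed is the classic one where pages are keyed by page ID.

```python
def build_query_params(titles):
    """Build the query string parameters for a basic page-info request."""
    return {
        "action": "query",
        "prop": "info",
        "titles": "|".join(titles),  # the API takes multiple titles joined by '|'
        "format": "json",
    }


def fetch_page_info(session, api_url, titles):
    """Return the 'pages' mapping from a MediaWiki action=query response.

    `session` is expected to be a requests.Session (anything with a
    compatible .get() works, which also makes this easy to test offline).
    """
    response = session.get(api_url, params=build_query_params(titles), timeout=30)
    response.raise_for_status()  # turn HTTP 4xx/5xx into an exception
    return response.json()["query"]["pages"]


# Usage (needs the third-party `requests` package and network access):
#   import requests
#   with requests.Session() as s:
#       info = fetch_page_info(s, "https://en.wikipedia.org/w/api.php", ["Main Page"])
```

Reusing one session keeps cookies and connections alive across calls, which is what a logged-in bot would want anyway.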
When can we start? :)
As soon as possible! I'm going to be busy for the next week or so due to IRL commitments, but I'm all for this as soon as I'm done. Thanks for getting this started, Yurik!
-- Legoktm http://enwp.org/User:Legoktm