Hi, I just made a proposal on how to change the API to better support simple "continue" scenarios (see http://lists.wikimedia.org/pipermail/mediawiki-api/2012-December/002768.html ) and would like to get some feedback from the pywiki community. Would this simplify internal API use inside pywiki? What are the biggest issues with scripts when using the API?
On the pywiki side, I am thinking of reworking the query module in this direction, and helping with migrating all API requests through it. There could always be two levels for script authors: low level, where individual API parameters are known to the script writer and the result is returned as a dict(), and high level, where the most common features offered by the API are wrapped in methods, yet the speed is almost the same as low level (multiple data items returned per web request).
    pg1 = Page(u'Python')
    pg2 = Page(u'Cobra')
    pg3 = Page(u'Viper')
    pages = [pg1, pg2, pg3]
    params = {'prop': 'links', 'pllimit': 'max', 'titles': pages}

    # QueryBlocks -- run the query until there is no more "continue",
    # returning the dictionary as-is from each web call.
    for block in pywiki.QueryBlocks(params):
        pass  # process block

    # QueryPages -- takes any query that returns a list of pages and yields
    # one page at a time. The individual page data will be merged across
    # multiple API calls in case it exceeds the limit. This method could
    # also return pre-populated Page objects.
    for page in pywiki.QueryPages(params):
        # process one page at a time;
        # the Page object will have its links() property populated
        pass

    # List* methods work with the list= API to request all available items
    # based on the parameters. ('from' is a reserved word in Python, so the
    # keyword is spelled 'start' here.)
    for page in pywiki.ListAllPages(start=u'T', getContent=True, getLinks=True):
        # each Page object will be prepopulated with links and page content
        pass
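To make the two proposed levels concrete, here is a minimal sketch of how they could be layered. It is not pywiki code: `query_blocks`, `query_pages` and the `fetch` callable are hypothetical names, and it assumes the API response carries a top-level "continue" dictionary whose entries are simply merged into the next request.

```python
def query_blocks(fetch, params):
    """Yield each raw API response until no "continue" is returned.

    `fetch` is a hypothetical callable: it takes a dict of request
    parameters and returns the decoded JSON response as a dict.
    """
    base = dict(params, action="query", format="json")
    cont = {"continue": ""}  # assumed way to opt in to simple continuation
    while True:
        # Always start from the original parameters plus the latest
        # continuation values, so stale continuation keys never linger.
        block = fetch({**base, **cont})
        yield block
        cont = block.get("continue")
        if not cont:
            break


def query_pages(fetch, params):
    """Yield one merged page dict per page across all continuation blocks.

    This simple version buffers every page and yields only at the end;
    a real implementation could yield a page as soon as it is complete.
    """
    merged = {}
    for block in query_blocks(fetch, params):
        pages = block.get("query", {}).get("pages", {})
        for pageid, data in pages.items():
            page = merged.setdefault(pageid, {})
            for key, value in data.items():
                if isinstance(value, list):
                    # List-valued properties (e.g. links) arrive in pieces.
                    page.setdefault(key, []).extend(value)
                else:
                    page[key] = value
    for page in merged.values():
        yield page
```

Because `fetch` is injected, the same generators work with any HTTP layer and can be exercised offline with canned responses.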
Thanks! Any feedback is helpful =) --Yuri
After a very productive talk with legoktm @IRC, it seems some of these changes have already been made as part of the rewrite at http://svn.wikimedia.org/viewvc/pywikipedia/branches/rewrite/ documented at http://botwiki.sno.cc/wiki/Rewrite , and only a few things remain:
* Git migration
  MediaWiki will not support SVN forever. I wouldn't want to copy and break away completely - history is very important for maintenance. (Yes, I saw the previous Git posts here.)
* Python3 migration
  Guido started it for a very good reason: much better Unicode, cleaner libraries.
* Directory structure
  I propose two separate root dirs:
    pywikibot\  - current code
    pywikibot3\ - Python3 rewrite
  The directories should be at the same level, and not a branch, because we will never be able to merge them back together - anyone who wants to continue using Python 2.6+ and older features will keep using pywikibot, and all the new code plus migrated scripts will move to pywikibot3.
* API only
  No need to maintain complex and cumbersome Special:Export and page scraping methods. If a MediaWiki installation doesn't support the API, they don't want bot access. If their installation is over 5 years old, they can use an older pywiki framework version. There is still XML dump support in pywiki, but it seems many functions will not work properly, so maybe we should remove that as well.
* Higher-level HTTP library
  Not as sure about this yet, but it seems python Requests is a better library for the API calls than httplib2, for complexity and cleanliness reasons. See http://docs.python-requests.org/
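For the HTTP library point, a rough sketch of what a single API call could look like on top of a Requests-style session. The helper name `api_get` is hypothetical, and the session is passed in so the function itself has no hard dependency; it only relies on `session.get(url, params=..., timeout=...)`, `raise_for_status()` and `json()`, which Requests provides.

```python
def api_get(session, api_url, **params):
    """Send one GET request to an api.php endpoint and decode the JSON reply.

    `session` is expected to behave like requests.Session; `api_url` and
    the keyword parameters are supplied by the caller.
    """
    params.setdefault("format", "json")
    response = session.get(api_url, params=params, timeout=30)
    response.raise_for_status()  # turn HTTP 4xx/5xx into an exception
    return response.json()


# Possible usage, assuming Requests is installed (URL is only an example):
#
#   import requests
#   info = api_get(requests.Session(), "https://en.wikipedia.org/w/api.php",
#                  action="query", meta="siteinfo")
```

Reusing one session across calls keeps cookies (and therefore the login) between requests, which is most of what a bot needs from the HTTP layer.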
When can we start? :)
On Sun, Dec 16, 2012 at 2:42 AM, Yuri Astrakhan yuriastrakhan@gmail.com wrote:
> After a very productive talk with legoktm @IRC, it seems some of these changes have already been made as part of the rewrite at http://svn.wikimedia.org/viewvc/pywikipedia/branches/rewrite/ documented at http://botwiki.sno.cc/wiki/Rewrite , and only a few things remain:
>
> - GIT migration
>   Mediawiki will not support SVN forever. I wouldn't want to copy and break away completely - history is very important for maintenance (Yes, i saw the previous GIT posts here)
I don't like Gerrit; however, I think we should move sooner, when we're ready, rather than being forced to move. I'm not a big fan of having to submit patches/issues on SourceForge either...
> - Python3 migration
>   Guido started it for a very good reason. Much better Unicode, cleaner libraries.
Yes, yes, and yes. This would fix nearly every single unicode problem possible.
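As a small illustration of the Unicode point (a sketch with a made-up title, not code from pywiki): in Python 3, text (`str`) and encoded data (`bytes`) are separate types that cannot be mixed by accident, whereas Python 2 attempts an implicit ASCII conversion when the two meet, which is where many encoding errors come from.

```python
# Illustrative title; any non-ASCII page name would do.
title = "K\u00f6ln"              # str: a sequence of Unicode code points
encoded = title.encode("utf-8")  # bytes: what actually goes over the wire


def mixes_silently(text, data):
    """Return True if str and bytes can be concatenated without an error."""
    try:
        text + data
    except TypeError:
        return False
    return True
```

Under Python 3, `mixes_silently(title, encoded)` is false: the mistake surfaces immediately as a `TypeError` instead of depending on which characters happen to be in the title.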
> - Directory structure
>   I propose two separate root dirs:
>     pywikibot\ - current code
>     pywikibot3\ - Python3 rewrite
>   The directories should be at the same level, and not as branch, because we will never be able to merge them back together - anyone who wants to continue using Python26+ and older features will keep using pywikibot, and all the new code plus migrated scripts will move to pywikibot3
Well, I think we should keep the current wikipedia.py trunk version alive and somewhat supported (as long as someone is willing to write patches?). The current "rewrite" branch should be used as the basis of the pywikibot3 version.
> - API only
>   No need to maintain complex and cumbersome Special:Export and page scraping methods. If mediawiki installation doesn't support API - they don't want bot access. If their installation is over 5 years old, they can use older pywiki framework version. There is still xml dump support in pywiki, but many functions it seems will not work properly, so maybe we should remove that as well.
I think it's important for pywiki to still be able to read from an XML dump. There are certain tasks that are much better done from a dump than by fetching live wikitext. There are also a few things that still don't have API modules, and we should still make it possible to screenscrape those; however, anything with an API module should be using the API.
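A minimal sketch of the kind of dump reading meant here, using only the standard library. It is not the existing pywiki dump reader: the function name `iter_dump_pages` is made up, and it assumes the usual export layout of `<page>` elements containing a `<title>` and one or more `<revision>`/`<text>` elements.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_dump_pages(source):
    """Yield (title, wikitext) pairs from an export-format XML dump.

    `source` is a filename or a binary file object. The dump is parsed
    incrementally, so the whole file never has to fit in memory. If a
    page holds several revisions, the text of the last one is returned.
    """
    title = text = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title = text = None
            elem.clear()  # drop the finished page's children to save memory
```

Working from a dump this way needs no web requests at all, which is why bulk tasks such as searching every page for a pattern are better done offline.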
> - Higher-level HTTP library
>   Not as sure about this yet, but it seems python Requests is a better library for the API calls than the httplib2 - for complexity and cleanliness reasons. See http://docs.python-requests.org/
Requests is the library I recommended to Yurik on IRC. It supports just about everything you can think of, so I don't know why we're trying to reinvent the wheel by using our own library. That said, I can see an argument for reducing the number of dependencies pywiki uses, but I feel the benefits that would come from using a library like Requests outweigh the negatives.
> When can we start? :)
As soon as possible! I'm going to be busy for the next week or so due to IRL commitments, but I'm all for this as soon as I'm done. Thanks for getting this started, Yurik!
-- Legoktm http://enwp.org/User:Legoktm
pywikipedia-l@lists.wikimedia.org