[Pywikipedia-l] import wikipedia

Marcin Cieslak saper at saper.info
Thu Mar 17 00:34:07 UTC 2011


>> Merlijn van Deen <valhallasw at arctus.nl> wrote:
>
> However. The current framework is so old, it has become quite hard to fix
> these kinds of issues. I'm not 100% sure what the rewrite does for these
> things - but fixing it there is much easier than fixing it in the current
> framework, where all kinds of issues are scattered around.

> Last note - You're the first one I met who actually uses pydoc to inspect
> python objects. All other developers I know use the python prompt and
> object.__doc__ - then your problems are not an issue, which is probably why
> no-one has looked into this before.

pydoc is pretty useful, it has "pydoc -k". It's not much different
from running help(object) in the Python interpreter, which in turn
is not much different from reading object.__doc__. But to get access
to an object in the first place, you need to "import wikipedia" - and
then the configuration dialog pops up.
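
All three roads lead to the same docstring, and on a fresh checkout
every one of them triggers the dialog, because the import itself does
(a minimal sketch, assuming trunk's wikipedia.py is on PYTHONPATH):

  import wikipedia                 # this alone pops up the configuration dialog

  help(wikipedia.Page)             # what "pydoc wikipedia.Page" would render
  print wikipedia.Page.__doc__     # the raw docstring behind both of the above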

Unfortunately, pydoc -k breaks on pywikipedia (if it is included in PYTHONPATH):

$ pydoc -k Page
add_text - This is a Bot written by Filnik to add a text at the end of the page but above
archivebot - archivebot.py - discussion page archiving bot.
blockpageschecker - This is a script originally written by Wikihermit and then rewritten by Filnik,
casechecker - Script to enumerate all pages on the wiki and find all titles
catall - Add or change categories on a number of pages. Usage:
category_redirect - This bot will move pages out of redirected categories
catlib - Library to work with category pages on Wikipedia
cfd - This script processes the Categories for discussion working page.  It parses
cosmetic_changes - This module can do slight modifications to a wiki page source code such that
delete - This script can be used to delete and undelete pages en masse.
disambredir - Goes through the disambiguation pages, checks their links, and asks for
djvutext - This bot uploads text from djvu files onto pages in the "Page"
extract_wikilinks - Script to extract all wiki page names a certain HTML file points to in
get - Very simple script which gets a page and writes its contents to
inline_images - This bot goes over multiple pages of the home wiki, and looks for
interwiki - Script to check language links for general pages. This works by downloading the
isbn - This script goes over multiple pages of the home wiki, and reports invalid
lonelypages - This is a script written to add the template "orphan" to the pages that aren't linked by other pages.
movepages - This script can move pages.
noreferences - This script goes over multiple pages, searches for pages where <references />
pagefromfile - This bot takes its input from a file that contains a number of
pagegenerators - This module offers a wide variety of page generators. A page generator is an
pageimport - This is a script to import pages from a certain wiki to another.
protect - This script can be used to protect and unprotect pages en masse.
Traceback (most recent call last):
  File "/usr/local/bin/pydoc", line 5, in <module>
    pydoc.cli()
  File "/usr/local/lib/python2.5/pydoc.py", line 2194, in cli
    apropos(val)
  File "/usr/local/lib/python2.5/pydoc.py", line 1889, in apropos
    ModuleScanner().run(callback, key)
  File "/usr/local/lib/python2.5/pydoc.py", line 1854, in run
    for importer, modname, ispkg in pkgutil.walk_packages():
  File "/usr/local/lib/python2.5/pkgutil.py", line 110, in walk_packages
    __import__(name)
  File "/home/admini/saper/wikipedia/pywikipedia/pywikibot/__init__.py", line 16, in <module>
    from textlib import *
  File "/home/admini/saper/wikipedia/pywikipedia/pywikibot/textlib.py", line 17, in <module>
    import wikipedia as pywikibot
  File "/home/admini/saper/wikipedia/pywikipedia/wikipedia.py", line 7959, in <module>
    get_throttle = Throttle()
NameError: name 'Throttle' is not defined

Probably a script that walks the packages isn't too smart about figuring out
complex imports. It looks like a circular import: pywikibot is still only
half-initialised when textlib re-imports wikipedia.py, so Throttle is not yet
where wikipedia.py expects to find it.
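
For what it's worth, "pydoc -k" really has to *import* modules to read their
docstrings, so a single module with import-time side effects aborts the whole
scan. Roughly what it does under the hood, rewritten as a more forgiving scan
(a simplified sketch, not pydoc's actual code, and only top-level modules -
pydoc recurses into packages as well):

  import sys
  import pkgutil

  for importer, modname, ispkg in pkgutil.iter_modules():
      try:
          __import__(modname)
      except Exception, err:                  # Python 2.5 syntax, as in the traceback
          print "skipping %s: %s" % (modname, err)
          continue
      synopsis = (sys.modules[modname].__doc__ or '').strip().splitlines()
      if synopsis and 'page' in synopsis[0].lower():
          print "%s - %s" % (modname, synopsis[0])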

Regarding issues and the rewrite branch - I see that, owing mainly to xqt,
wikipedia.py and friends have become much cleaner.  Of course I agree that in
places the code is ugly and complex, but that unfortunately reflects the
reality of MediaWiki. From time to time someone pops up with an ancient
version of MediaWiki, and I am pretty confident I can use my old scripts
there; the amount of hacking needed to adapt pywikipedia to some quirks will
not be overwhelming. The rewrite branch is nice and clean, but I probably
wouldn't use it against anything other than a fairly recent MediaWiki
(say, one with the API and such), and for that purpose it's fine.

There are a few things I don't like about the rewrite - but this is only
my opinion and I think it is not relevant to others. Those are:
- the use of isinstance() checks, which precludes passing object instances
  that behave like the expected class but are not derived from it. I have
  used file-like or string-like custom objects in my programs and I find
  this technique useful (see the sketch right after this list).
- And, unfortunately, the trunk is somewhat better adjusted to the
  past and present of MediaWiki - not only the old-style ugly
  screen scraping, but it also gets updated quicker.
  For example, Page.fullVersionHistory from trunk gives
  longer tuples than its rewrite equivalent. Sure, someone
  updated this without paying much attention to the API breakage.
  MediaWiki updates break pywikipedia from time to time,
  and so (but actually much less often) pywikipedia
  may break scripts using it. That's sometimes how life goes,
  and we cannot (and maybe sometimes we shouldn't) provide
  full backwards compatibility while staying in full agreement
  with the current MediaWiki state.
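
To illustrate the isinstance() point - a made-up sketch, not code from either
branch, and all names are invented:

  class Page(object):
      """Stands in for the real Page class here."""
      def put(self, text):
          print "saving:", text

  class FakePage(object):
      """A page-like stand-in I might use for tests or dry runs.
      It is not derived from Page, it just quacks like one."""
      def __init__(self):
          self.text = ''
      def put(self, text):
          self.text = text

  def strict_put(page, text):
      # the kind of check I mean: only real Page instances get through
      if not isinstance(page, Page):
          raise TypeError("expected a Page")
      page.put(text)

  def tolerant_put(page, text):
      # duck typing: anything with a working put() is good enough
      page.put(text)

  tolerant_put(FakePage(), "Hello")     # fine
  # strict_put(FakePage(), "Hello")     # TypeError, although it would have worked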

But this is just the POV of somebody who does not mind adding
one or two lines of code to his scripts when pywikipedia
breaks on him for some reason.
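
Those one or two lines typically look something like this - a sketch only,
since I am quoting the tuple layout from memory and it differs between
versions and branches:

  import wikipedia

  site = wikipedia.getSite()
  page = wikipedia.Page(site, 'Example')

  for entry in page.fullVersionHistory():
      # Trunk started returning longer tuples than the rewrite does, so
      # instead of unpacking a fixed number of fields I take only the
      # leading ones I rely on and ignore whatever a newer version appends.
      # (Field names are illustrative - check the docstring of your copy.)
      oldid, timestamp, username = entry[:3]
      print oldid, timestamp, username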

It may happen - *may* - that we will follow the story
of Zope 2 vs. Zope 3, which is an interesting lesson to examine:
why Zope 3 never really took off despite being nicely
designed and cleaned up.

But I actually hoped for a more technical discussion of whether
we can move the actual framework startup outside of __main__ :)
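
To make that concrete, something along these lines is what I have in mind -
just a sketch of the lazy-initialisation idea with placeholder names
(ask_user_for_config, the stub Throttle), not a patch against wikipedia.py:

  # Nothing interactive happens at import time, so "import wikipedia",
  # pydoc and pkgutil walks stay side-effect free.

  class Throttle(object):
      def __init__(self):
          self.delay = 10

  def ask_user_for_config():
      # placeholder for the interactive configuration dialog
      return {'family': raw_input('family: '), 'lang': raw_input('lang: ')}

  _config = None
  get_throttle = None

  def startup():
      """Create the configuration and the throttle on first use, not on import."""
      global _config, get_throttle
      if _config is None:
          _config = ask_user_for_config()
          get_throttle = Throttle()
      return _config

  if __name__ == '__main__':
      # running the module as a program keeps the old behaviour
      startup()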

//Marcin
