http://www.mediawiki.org/wiki/Special:Code/pywikipedia/10528
Revision: 10528
Author: xqt
Date: 2012-09-16 13:48:36 +0000 (Sun, 16 Sep 2012)
Log Message:
-----------
old python 2.3 scripts
Added Paths:
-----------
archive/old python 2.3 scripts/
archive/old python 2.3 scripts/interwiki.py
archive/old python 2.3 scripts/wikipedia.py
Copied: archive/old python 2.3 scripts/interwiki.py (from rev 10463, trunk/pywikipedia/interwiki.py)
===================================================================
--- archive/old python 2.3 scripts/interwiki.py (rev 0)
+++ archive/old python 2.3 scripts/interwiki.py 2012-09-16 13:48:36 UTC (rev 10528)
@@ -0,0 +1,2585 @@
+#!/usr/bin/python
+# -*- coding: utf-8 -*-
+"""
+Script to check language links for general pages. This works by downloading the
+page, and using existing translations plus hints from the command line to
+download the equivalent pages from other languages. All such pages are
+downloaded as well and checked for interwiki links recursively until no new
+links are encountered. A rationalization process then selects the
+right interwiki links, and if this is unambiguous, the interwiki links in the
+original page will be automatically updated and the modified page uploaded.
+
+These command-line arguments can be used to specify which pages to work on:
+
+&pagegenerators_help;
+
+ -days: Like -years, but runs through all date pages. Stops at
+ Dec 31. If the argument is given in the form -days:X,
+ it will start at month no. X through Dec 31. If the
+ argument is simply given as -days, it will run from
+ Jan 1 through Dec 31. E.g. for -days:9 it will run
+ from Sep 1 through Dec 31.
+
+ -years: run on all year pages in numerical order. Stop at year 2050.
+ If the argument is given in the form -years:XYZ, it
+ will run from [[XYZ]] through [[2050]]. If XYZ is a
+ negative value, it is interpreted as a year BC. If the
+ argument is simply given as -years, it will run from 1
+ through 2050.
+
+ This implies -noredirect.
+
+ -new: Work on the 100 newest pages. If given as -new:x, will work
+ on the x newest pages.
+ When multiple -namespace parameters are given, x pages are
+ inspected, and only the ones in the selected name spaces are
+ processed. Use -namespace:all for all namespaces. Without
+ -namespace, only article pages are processed.
+
+ This implies -noredirect.
+
+ -restore: restore a set of "dumped" pages the robot was working on
+ when it terminated. The dump file will be subsequently
+ removed.
+
+   -restore:all restore the "dumped" pages of all dump files for a given
+                family remaining in the "interwiki-dumps" directory. All
+                these dump files will be subsequently removed. If the restore
+                process is interrupted again, all unprocessed pages are saved
+                in one new dump file for the given site.
+
+ -continue: like restore, but after having gone through the dumped pages,
+ continue alphabetically starting at the last of the dumped
+ pages. The dump file will be subsequently removed.
+
+ -warnfile: used as -warnfile:filename, reads all warnings from the
+ given file that apply to the home wiki language,
+                 and reads the rest of each warning as a hint. It then
+                 treats all the mentioned pages. A quicker way to
+ implement warnfile suggestions without verifying them
+ against the live wiki is using the warnfile.py
+ script.
+
+Additionally, these arguments can be used to restrict the bot to certain pages:
+
+ -namespace:n Number or name of namespace to process. The parameter can be
+ used multiple times. It works in combination with all other
+ parameters, except for the -start parameter. If you e.g.
+ want to iterate over all categories starting at M, use
+ -start:Category:M.
+
+ -number: used as -number:#, specifies that the robot should process
+ that amount of pages and then stop. This is only useful in
+ combination with -start. The default is not to stop.
+
+ -until: used as -until:title, specifies that the robot should
+ process pages in wiki default sort order up to, and
+ including, "title" and then stop. This is only useful in
+ combination with -start. The default is not to stop.
+ Note: do not specify a namespace, even if -start has one.
+
+ -bracket only work on pages that have (in the home language)
+ parenthesis in their title. All other pages are skipped.
+ (note: without ending colon)
+
+ -skipfile: used as -skipfile:filename, skip all links mentioned in
+ the given file. This does not work with -number!
+
+ -skipauto use to skip all pages that can be translated automatically,
+ like dates, centuries, months, etc.
+ (note: without ending colon)
+
+ -lack: used as -lack:xx with xx a language code: only work on pages
+ without links to language xx. You can also add a number nn
+ like -lack:xx:nn, so that the bot only works on pages with
+ at least nn interwiki links (the default value for nn is 1).
+
+These arguments control miscellaneous bot behaviour:
+
+ -quiet Use this option to get less output
+ (note: without ending colon)
+
+    -async         Put the page on a queue to be saved to the wiki
+                   asynchronously. This enables loading pages while the save
+                   throttle is waiting and gives better performance.
+                   NOTE: For post-processing it always assumes that saving
+                   the pages was successful.
+                   (note: without ending colon)
+
+ -summary: Set an additional action summary message for the edit. This
+                could be used to further explain the bot's action.
+ This will only be used in non-autonomous mode.
+
+ -hintsonly The bot does not ask for a page to work on, even if none of
+ the above page sources was specified. This will make the
+                first existing page of -hint or -hintfile slip in as the start
+ page, determining properties like namespace, disambiguation
+ state, and so on. When no existing page is found in the
+ hints, the bot does nothing.
+ Hitting return without input on the "Which page to check:"
+ prompt has the same effect as using -hintsonly.
+ Options like -back, -same or -wiktionary are in effect only
+ after a page has been found to work on.
+ (note: without ending colon)
+
+These arguments are useful to provide hints to the bot:
+
+ -hint: used as -hint:de:Anweisung to give the robot a hint
+ where to start looking for translations. If no text
+ is given after the second ':', the name of the page
+ itself is used as the title for the hint, unless the
+ -hintnobracket command line option (see there) is also
+ selected.
+
+ There are some special hints, trying a number of languages
+ at once:
+ * all: All languages with at least ca. 100 articles.
+ * 10: The 10 largest languages (sites with most
+ articles). Analogous for any other natural
+ number.
+ * arab: All languages using the Arabic alphabet.
+ * cyril: All languages that use the Cyrillic alphabet.
+ * chinese: All Chinese dialects.
+ * latin: All languages using the Latin script.
+ * scand: All Scandinavian languages.
+
+ Names of families that forward their interlanguage links
+ to the wiki family being worked upon can be used (with
+                   -family=wikipedia only); they are:
+                   * commons: Interlanguage links of Wikimedia Commons.
+                   * incubator: Links in pages on the Wikimedia Incubator.
+                   * meta: Interlanguage links of named pages on Meta.
+                   * species: Interlanguage links of the Wikispecies wiki.
+                   * strategy: Links in pages on Wikimedia's strategy wiki.
+ * test: Take interwiki links from Test Wikipedia
+
+ Languages, groups and families having the same page title
+ can be combined, as -hint:5,scand,sr,pt,commons:New_York
+
+ -hintfile: similar to -hint, except that hints are taken from the given
+                file, each enclosed in [[]], instead of the command line.
+
+    -askhints: for each page, the bot asks for one or more hints. See -hint: above
+ for the format, one can for example give "en:something" or
+ "20:" as hint.
+
+ -same looks over all 'serious' languages for the same title.
+ -same is equivalent to -hint:all:
+ (note: without ending colon)
+
+ -wiktionary: similar to -same, but will ONLY accept names that are
+ identical to the original. Also, if the title is not
+ capitalized, it will only go through other wikis without
+ automatic capitalization.
+
+ -untranslated: works normally on pages with at least one interlanguage
+ link; asks for hints for pages that have none.
+
+ -untranslatedonly: same as -untranslated, but pages which already have a
+ translation are skipped. Hint: do NOT use this in
+ combination with -start without a -number limit, because
+ you will go through the whole alphabet before any queries
+ are performed!
+
+ -showpage when asking for hints, show the first bit of the text
+ of the page always, rather than doing so only when being
+ asked for (by typing '?'). Only useful in combination
+ with a hint-asking option like -untranslated, -askhints
+ or -untranslatedonly.
+ (note: without ending colon)
+
+ -noauto Do not use the automatic translation feature for years and
+ dates, only use found links and hints.
+ (note: without ending colon)
+
+ -hintnobracket used to make the robot strip everything in brackets,
+ and surrounding spaces from the page name, before it is
+ used in a -hint:xy: where the page name has been left out,
+ or -hint:all:, -hint:10:, etc. without a name, or
+ an -askhint reply, where only a language is given.
+
+These arguments define how much user confirmation is required:
+
+ -autonomous run automatically, do not ask any questions. If a question
+ -auto to an operator is needed, write the name of the page
+ to autonomous_problems.dat and continue on the next page.
+ (note: without ending colon)
+
+ -confirm ask for confirmation before any page is changed on the
+ live wiki. Without this argument, additions and
+ unambiguous modifications are made without confirmation.
+ (note: without ending colon)
+
+ -force do not ask permission to make "controversial" changes,
+ like removing a language because none of the found
+ alternatives actually exists.
+ (note: without ending colon)
+
+ -cleanup like -force but only removes interwiki links to non-existent
+ or empty pages.
+
+ -select ask for each link whether it should be included before
+ changing any page. This is useful if you want to remove
+ invalid interwiki links and if you do multiple hints of
+ which some might be correct and others incorrect. Combining
+ -select and -confirm is possible, but seems like overkill.
+ (note: without ending colon)
+
+These arguments specify in which way the bot should follow interwiki links:
+
+ -noredirect do not follow redirects nor category redirects.
+ (note: without ending colon)
+
+ -initialredirect work on its target if a redirect or category redirect is
+ entered on the command line or by a generator (note: without
+ ending colon). It is recommended to use this option with the
+ -movelog pagegenerator.
+
+ -neverlink: used as -neverlink:xx where xx is a language code:
+ Disregard any links found to language xx. You can also
+ specify a list of languages to disregard, separated by
+ commas.
+
+ -ignore: used as -ignore:xx:aaa where xx is a language code, and
+ aaa is a page title to be ignored.
+
+ -ignorefile: similar to -ignore, except that the pages are taken from
+ the given file instead of the command line.
+
+    -localright do not follow interwiki links from pages other than the
+ starting page. (Warning! Should be used very sparingly,
+ only when you are sure you have first gotten the interwiki
+ links on the starting page exactly right).
+ (note: without ending colon)
+
+ -hintsareright do not follow interwiki links to sites for which hints
+                   on existing pages are given. Note that hints given
+                   interactively via the -askhint command line option
+                   are only effective once they have been entered; thus
+                   interwiki links on the starting page are followed
+                   regardless of hints given when prompted.
+ (Warning! Should be used with caution!)
+ (note: without ending colon)
+
+ -back only work on pages that have no backlink from any other
+ language; if a backlink is found, all work on the page
+ will be halted. (note: without ending colon)
+
+The following arguments are only important for users who have accounts for
+multiple languages, and specify on which sites the bot should modify pages:
+
+ -localonly only work on the local wiki, not on other wikis in the
+ family I have a login at. (note: without ending colon)
+
+ -limittwo only update two pages - one in the local wiki (if logged-in)
+ and one in the top available one.
+ For example, if the local page has links to de and fr,
+ this option will make sure that only the local site and
+ the de: (larger) sites are updated. This option is useful
+                   to quickly set two-way links without updating all of the
+                   wiki family's sites.
+ (note: without ending colon)
+
+ -whenneeded works like limittwo, but other languages are changed in the
+ following cases:
+ * If there are no interwiki links at all on the page
+ * If an interwiki link must be removed
+ * If an interwiki link must be changed and there has been
+ a conflict for this page
+ Optionally, -whenneeded can be given an additional number
+ (for example -whenneeded:3), in which case other languages
+ will be changed if there are that number or more links to
+ change or add. (note: without ending colon)
+
+The following arguments influence how many pages the bot works on at once:
+
+    -array: The number of pages the bot tries to work on at once.
+ If the number of pages loaded is lower than this number,
+ a new set of pages is loaded from the starting wiki. The
+ default is 100, but can be changed in the config variable
+ interwiki_min_subjects
+
+ -query: The maximum number of pages that the bot will load at once.
+ Default value is 60.
+
+Some configuration options can be used to change the working of this robot:
+
+interwiki_min_subjects: the minimum number of subjects that should be processed
+ at the same time.
+
+interwiki_backlink: if set to True, all problems in foreign wikis will
+ be reported
+
+interwiki_shownew: should interwiki.py display every new link it discovers?
+
+interwiki_graph: output a graph PNG file on conflicts? You need pydot for
+                 this: http://dkbza.org/pydot.html
+
+interwiki_graph_format: the file format for interwiki graphs
+
+without_interwiki: save file with local articles without interwikis
+
+All these options can be changed through the user-config.py configuration file.
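+
+For example, a user-config.py might contain entries like the following (the
+values shown here are only illustrative, not recommendations):
+
+    interwiki_min_subjects = 100
+    interwiki_backlink = True
+    interwiki_shownew = True
+    interwiki_graph = False
+    without_interwiki = False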
+
+If interwiki.py is terminated before it is finished, it will write a dump file
+to the interwiki-dumps subdirectory. The program will read it if invoked with
+the "-restore" or "-continue" option, and finish all the subjects in
that list.
+After finishing, the dump file will be deleted. To run the interwiki-bot on all
+pages on a language, run it with option "-start:!", and if it takes so long
+that you have to break it off, use "-continue" next time.
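+
+For example, an illustrative invocation (adjust the interpreter and any extra
+options to your own setup) might be:
+
+    python interwiki.py -start:! -autonomous
+
+and, after breaking it off:
+
+    python interwiki.py -continue -autonomous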
+
+"""
+#
+# (C) Rob W.W. Hooft, 2003
+# (C) Daniel Herding, 2004
+# (C) Yuri Astrakhan, 2005-2006
+# (C) xqt, 2009-2012
+# (C) Pywikipedia bot team, 2007-2012
+#
+# Distributed under the terms of the MIT license.
+#
+__version__ = '$Id$'
+#
+
+import sys, copy, re, os
+import time
+import codecs
+import socket
+
+try:
+ set # introduced in Python 2.4: faster and future
+except NameError:
+ from sets import Set as set
+
+try: sorted ## Introduced in 2.4
+except NameError:
+ def sorted(seq, cmp=None, key=None, reverse=False):
+ """Copy seq and sort and return it.
+ >>> sorted([3, 1, 2])
+ [1, 2, 3]
+ """
+ seq2 = copy.copy(seq)
+ if key:
+ if cmp is None:
+ cmp = __builtins__.cmp
+ seq2.sort(lambda x,y: cmp(key(x), key(y)))
+ else:
+ if cmp is None:
+ seq2.sort()
+ else:
+ seq2.sort(cmp)
+ if reverse:
+ seq2.reverse()
+ return seq2
+
+import wikipedia as pywikibot
+import config
+import catlib
+import pagegenerators
+from pywikibot import i18n
+import titletranslate, interwiki_graph
+import webbrowser
+
+docuReplacements = {
+ '&pagegenerators_help;': pagegenerators.parameterHelp
+}
+
+class SaveError(pywikibot.Error):
+ """
+ An attempt to save a page with changed interwiki has failed.
+ """
+
+class LinkMustBeRemoved(SaveError):
+ """
+ An interwiki link has to be removed, but this can't be done because of user
+ preferences or because the user chose not to change the page.
+ """
+
+class GiveUpOnPage(pywikibot.Error):
+ """
+ The user chose not to work on this page and its linked pages any more.
+ """
+
+# Subpage templates. Must be in lower case,
+# whereas subpage itself must be case sensitive
+moved_links = {
+ 'bn' : (u'documentation', u'/doc'),
+ 'ca' : (u'ús de la plantilla', u'/ús'),
+ 'cs' : (u'dokumentace', u'/doc'),
+ 'da' : (u'dokumentation', u'/doc'),
+ 'de' : (u'dokumentation', u'/Meta'),
+    'dsb': ([u'dokumentacija', u'doc'], u'/Dokumentacija'),
+ 'en' : ([u'documentation',
+ u'template documentation',
+ u'template doc',
+ u'doc',
+ u'documentation, template'], u'/doc'),
+    'es' : ([u'documentación', u'documentación de plantilla'], u'/doc'),
+ 'eu' : (u'txantiloi dokumentazioa', u'/dok'),
+ 'fa' : ([u'documentation',
+ u'template documentation',
+ u'template doc',
+ u'doc',
+ u'توضیحات',
+ u'زیرصفحه توضیحات'], u'/doc'),
+ # fi: no idea how to handle this type of subpage at :Metasivu:
+ 'fi' : (u'mallineohje', None),
+    'fr' : ([u'/documentation', u'documentation', u'doc_modèle',
+ u'documentation modèle', u'documentation modèle compliqué',
+ u'documentation modèle en sous-page',
+ u'documentation modèle compliqué en sous-page',
+ u'documentation modèle utilisant les parserfunctions en sous-page',
+ ],
+ u'/Documentation'),
+    'hsb': ([u'dokumentacija', u'doc'], u'/Dokumentacija'),
+ 'hu' : (u'sablondokumentáció', u'/doc'),
+ 'id' : (u'template doc', u'/doc'),
+ 'ja' : (u'documentation', u'/doc'),
+ 'ka' : (u'თარგის ინფო', u'/ინფო'),
+ 'ko' : (u'documentation', u'/설명문서'),
+ 'ms' : (u'documentation', u'/doc'),
+ 'no' : (u'dokumentasjon', u'/dok'),
+ 'nn' : (u'dokumentasjon', u'/dok'),
+ 'pl' : (u'dokumentacja', u'/opis'),
+ 'pt' : ([u'documentação', u'/doc'], u'/doc'),
+ 'ro' : (u'documentaţie', u'/doc'),
+ 'ru' : (u'doc', u'/doc'),
+ 'sv' : (u'dokumentation', u'/dok'),
+ 'uk' : ([u'документація',
+ u'doc',
+ u'documentation'], u'/Документація'),
+ 'vi' : (u'documentation', u'/doc'),
+ 'zh' : ([u'documentation', u'doc'], u'/doc'),
+}
+
+# A list of template names in different languages.
+# Pages which contain these shouldn't be changed.
+ignoreTemplates = {
+ '_default': [u'delete'],
+ 'ar' : [u'قيد الاستخدام'],
+ 'cs' : [u'Pracuje_se'],
+    'de' : [u'inuse', 'in use', u'in bearbeitung', u'inbearbeitung',
+ u'löschen', u'sla',
+ u'löschantrag', u'löschantragstext',
+ u'falschschreibung',
+ u'obsolete schreibung', 'veraltete schreibweise'],
+ 'en' : [u'inuse', u'softredirect'],
+ 'fa' : [u'در دست ویرایش ۲', u'حذف سریع'],
+ 'pdc': [u'lösche'],
+}
+
+class Global(object):
+ """
+ Container class for global settings.
+ Use of globals outside of this is to be avoided.
+ """
+ autonomous = False
+ confirm = False
+ always = False
+ select = False
+ followredirect = True
+ initialredirect = False
+ force = False
+ cleanup = False
+ remove = []
+ maxquerysize = 60
+ same = False
+ skip = set()
+ skipauto = False
+ untranslated = False
+ untranslatedonly = False
+ auto = True
+ neverlink = []
+ showtextlink = 0
+ showtextlinkadd = 300
+ localonly = False
+ limittwo = False
+ strictlimittwo = False
+ needlimit = 0
+ ignore = []
+ parenthesesonly = False
+ rememberno = False
+ followinterwiki = True
+ minsubjects = config.interwiki_min_subjects
+ nobackonly = False
+ askhints = False
+ hintnobracket = False
+ hints = []
+ hintsareright = False
+ contentsondisk = config.interwiki_contents_on_disk
+ lacklanguage = None
+ minlinks = 0
+ quiet = False
+ restoreAll = False
+ async = False
+ summary = u''
+
+ def readOptions(self, arg):
+        """ Read all commandline parameters for the global container """
+ if arg == '-noauto':
+ self.auto = False
+ elif arg.startswith('-hint:'):
+ self.hints.append(arg[6:])
+ elif arg.startswith('-hintfile'):
+ hintfilename = arg[10:]
+ if (hintfilename is None) or (hintfilename == ''):
+                hintfilename = pywikibot.input(u'Please enter the hint filename:')
+ f = codecs.open(hintfilename, 'r', config.textfile_encoding)
+            R = re.compile(ur'\[\[(.+?)(?:\]\]|\|)') # hint or title ends either before | or before ]]
+ for pageTitle in R.findall(f.read()):
+ self.hints.append(pageTitle)
+ f.close()
+ elif arg == '-force':
+ self.force = True
+ elif arg == '-cleanup':
+ self.cleanup = True
+ elif arg == '-same':
+ self.same = True
+ elif arg == '-wiktionary':
+ self.same = 'wiktionary'
+ elif arg == '-untranslated':
+ self.untranslated = True
+ elif arg == '-untranslatedonly':
+ self.untranslated = True
+ self.untranslatedonly = True
+ elif arg == '-askhints':
+ self.untranslated = True
+ self.untranslatedonly = False
+ self.askhints = True
+ elif arg == '-hintnobracket':
+ self.hintnobracket = True
+ elif arg == '-confirm':
+ self.confirm = True
+ elif arg == '-select':
+ self.select = True
+ elif arg == '-autonomous' or arg == '-auto':
+ self.autonomous = True
+ elif arg == '-noredirect':
+ self.followredirect = False
+ elif arg == '-initialredirect':
+ self.initialredirect = True
+ elif arg == '-localonly':
+ self.localonly = True
+ elif arg == '-limittwo':
+ self.limittwo = True
+ self.strictlimittwo = True
+ elif arg.startswith('-whenneeded'):
+ self.limittwo = True
+ self.strictlimittwo = False
+ try:
+ self.needlimit = int(arg[12:])
+ except KeyError:
+ pass
+ except ValueError:
+ pass
+ elif arg.startswith('-skipfile:'):
+ skipfile = arg[10:]
+ skipPageGen = pagegenerators.TextfilePageGenerator(skipfile)
+ for page in skipPageGen:
+ self.skip.add(page)
+ del skipPageGen
+ elif arg == '-skipauto':
+ self.skipauto = True
+ elif arg.startswith('-neverlink:'):
+ self.neverlink += arg[11:].split(",")
+ elif arg.startswith('-ignore:'):
+            self.ignore += [pywikibot.Page(None,p) for p in arg[8:].split(",")]
+ elif arg.startswith('-ignorefile:'):
+ ignorefile = arg[12:]
+ ignorePageGen = pagegenerators.TextfilePageGenerator(ignorefile)
+ for page in ignorePageGen:
+ self.ignore.append(page)
+ del ignorePageGen
+ elif arg == '-showpage':
+ self.showtextlink += self.showtextlinkadd
+ elif arg == '-graph':
+ # override configuration
+ config.interwiki_graph = True
+ elif arg == '-bracket':
+ self.parenthesesonly = True
+ elif arg == '-localright':
+ self.followinterwiki = False
+ elif arg == '-hintsareright':
+ self.hintsareright = True
+ elif arg.startswith('-array:'):
+ self.minsubjects = int(arg[7:])
+ elif arg.startswith('-query:'):
+ self.maxquerysize = int(arg[7:])
+ elif arg == '-back':
+ self.nobackonly = True
+ elif arg == '-quiet':
+ self.quiet = True
+ elif arg == '-async':
+ self.async = True
+ elif arg.startswith('-summary'):
+ if len(arg) == 8:
+                self.summary = pywikibot.input(u'What summary do you want to use?')
+ else:
+ self.summary = arg[9:]
+ elif arg.startswith('-lack:'):
+ remainder = arg[6:].split(':')
+ self.lacklanguage = remainder[0]
+ if len(remainder) > 1:
+ self.minlinks = int(remainder[1])
+ else:
+ self.minlinks = 1
+ else:
+ return False
+ return True
+
+class StoredPage(pywikibot.Page):
+ """
+    Store the Page contents on disk to avoid using too much
+    memory when a large number of Page objects is loaded
+    at the same time.
+ """
+
+ # Please prefix the class members names by SP
+ # to avoid possible name clashes with pywikibot.Page
+
+ # path to the shelve
+ SPpath = None
+ # shelve
+ SPstore = None
+
+ # attributes created by pywikibot.Page.__init__
+ SPcopy = [ '_editrestriction',
+ '_site',
+ '_namespace',
+ '_section',
+ '_title',
+ 'editRestriction',
+ 'moveRestriction',
+ '_permalink',
+ '_userName',
+ '_ipedit',
+ '_editTime',
+ '_startTime',
+ '_revisionId',
+ '_deletedRevs' ]
+
+ def SPdeleteStore():
+ if StoredPage.SPpath:
+ del StoredPage.SPstore
+ os.unlink(StoredPage.SPpath)
+ SPdeleteStore = staticmethod(SPdeleteStore)
+
+ def __init__(self, page):
+ for attr in StoredPage.SPcopy:
+ setattr(self, attr, getattr(page, attr))
+
+ if not StoredPage.SPpath:
+ import shelve
+ index = 1
+ while True:
+                path = config.datafilepath('cache', 'pagestore' + str(index))
+ if not os.path.exists(path): break
+ index += 1
+ StoredPage.SPpath = path
+ StoredPage.SPstore = shelve.open(path)
+
+ self.SPkey = str(self)
+ self.SPcontentSet = False
+
+ def SPgetContents(self):
+ return StoredPage.SPstore[self.SPkey]
+
+ def SPsetContents(self, contents):
+ self.SPcontentSet = True
+ StoredPage.SPstore[self.SPkey] = contents
+
+ def SPdelContents(self):
+ if self.SPcontentSet:
+ del StoredPage.SPstore[self.SPkey]
+
+ _contents = property(SPgetContents, SPsetContents, SPdelContents)
+
+class PageTree(object):
+ """
+ Structure to manipulate a set of pages.
+ Allows filtering efficiently by Site.
+ """
+ def __init__(self):
+ # self.tree :
+ # Dictionary:
+ # keys: Site
+ # values: list of pages
+ # All pages found within Site are kept in
+ # self.tree[site]
+
+ # While using dict values would be faster for
+ # the remove() operation,
+ # keeping list values is important, because
+ # the order in which the pages were found matters:
+ # the earlier a page is found, the closer it is to the
+ # Subject.originPage. Chances are that pages found within
+ # 2 interwiki distance from the originPage are more related
+ # to the original topic than pages found later on, after
+ # 3, 4, 5 or more interwiki hops.
+
+ # Keeping this order is hence important to display an ordered
+ # list of pages to the user when he'll be asked to resolve
+ # conflicts.
+ self.tree = {}
+ self.size = 0
+
+ def filter(self, site):
+ """
+ Iterates over pages that are in Site site
+ """
+ try:
+ for page in self.tree[site]:
+ yield page
+ except KeyError:
+ pass
+
+ def __len__(self):
+ return self.size
+
+ def add(self, page):
+ site = page.site
+ if not site in self.tree:
+ self.tree[site] = []
+ self.tree[site].append(page)
+ self.size += 1
+
+ def remove(self, page):
+ try:
+ self.tree[page.site].remove(page)
+ self.size -= 1
+ except ValueError:
+ pass
+
+ def removeSite(self, site):
+ """
+ Removes all pages from Site site
+ """
+ try:
+ self.size -= len(self.tree[site])
+ del self.tree[site]
+ except KeyError:
+ pass
+
+ def siteCounts(self):
+ """
+ Yields (Site, number of pages in site) pairs
+ """
+ for site, d in self.tree.iteritems():
+ yield site, len(d)
+
+ def __iter__(self):
+ for site, plist in self.tree.iteritems():
+ for page in plist:
+ yield page
+
+class Subject(object):
+ """
+ Class to follow the progress of a single 'subject' (i.e. a page with
+ all its translations)
+
+
+ Subject is a transitive closure of the binary relation on Page:
+ "has_a_langlink_pointing_to".
+
+ A formal way to compute that closure would be:
+
+ With P a set of pages, NL ('NextLevel') a function on sets defined as:
+ NL(P) = { target | ∃ source ∈ P, target ∈ source.langlinks() }
+ pseudocode:
+ todo <- [originPage]
+ done <- []
+ while todo != []:
+ pending <- todo
+ todo <-NL(pending) / done
+ done <- NL(pending) U done
+ return done
+
+
+ There is, however, one limitation that is induced by implementation:
+ to compute efficiently NL(P), one has to load the page contents of
+ pages in P.
+ (Not only the langlinks have to be parsed from each Page, but we also want
+ to know if the Page is a redirect, a disambiguation, etc...)
+
+ Because of this, the pages in pending have to be preloaded.
+ However, because the pages in pending are likely to be in several sites
+ we cannot "just" preload them as a batch.
+
+    Instead of doing "pending <- todo" at each iteration, we have to elect a
+ Site, and we put in pending all the pages from todo that belong to that
+ Site:
+
+ Code becomes:
+ todo <- {originPage.site:[originPage]}
+ done <- []
+ while todo != {}:
+ site <- electSite()
+ pending <- todo[site]
+
+ preloadpages(site, pending)
+
+ todo[site] <- NL(pending) / done
+ done <- NL(pending) U done
+ return done
+
+
+ Subject objects only operate on pages that should have been preloaded before.
+ In fact, at any time:
+ * todo contains new Pages that have not been loaded yet
+ * done contains Pages that have been loaded, and that have been treated.
+ * If batch preloadings are successful, Page._get() is never called from
+ this Object.
+ """
+
+ def __init__(self, originPage=None, hints=None):
+ """Constructor. Takes as arguments the Page on the home wiki
+ plus optionally a list of hints for translation"""
+
+ if globalvar.contentsondisk:
+ if originPage:
+ originPage = StoredPage(originPage)
+
+ # Remember the "origin page"
+ self.originPage = originPage
+ # todo is a list of all pages that still need to be analyzed.
+ # Mark the origin page as todo.
+ self.todo = PageTree()
+ if originPage:
+ self.todo.add(originPage)
+
+ # done is a list of all pages that have been analyzed and that
+ # are known to belong to this subject.
+ self.done = PageTree()
+ # foundIn is a dictionary where pages are keys and lists of
+ # pages are values. It stores where we found each page.
+ # As we haven't yet found a page that links to the origin page, we
+ # start with an empty list for it.
+ if originPage:
+ self.foundIn = {self.originPage:[]}
+ else:
+ self.foundIn = {}
+ # This is a list of all pages that are currently scheduled for
+ # download.
+ self.pending = PageTree()
+ if globalvar.hintsareright:
+ # This is a set of sites that we got hints to
+ self.hintedsites = set()
+ self.translate(hints, globalvar.hintsareright)
+ self.confirm = globalvar.confirm
+ self.problemfound = False
+ self.untranslated = None
+ self.hintsAsked = False
+ self.forcedStop = False
+ self.workonme = True
+
+ def getFoundDisambig(self, site):
+ """
+ If we found a disambiguation on the given site while working on the
+ subject, this method returns it. If several ones have been found, the
+ first one will be returned.
+ Otherwise, None will be returned.
+ """
+ for tree in [self.done, self.pending]:
+ for page in tree.filter(site):
+ if page.exists() and page.isDisambig():
+ return page
+ return None
+
+ def getFoundNonDisambig(self, site):
+ """
+ If we found a non-disambiguation on the given site while working on the
+ subject, this method returns it. If several ones have been found, the
+ first one will be returned.
+ Otherwise, None will be returned.
+ """
+ for tree in [self.done, self.pending]:
+ for page in tree.filter(site):
+ if page.exists() and not page.isDisambig() \
+ and not page.isRedirectPage() and not page.isCategoryRedirect():
+ return page
+ return None
+
+ def getFoundInCorrectNamespace(self, site):
+ """
+ If we found a page that has the expected namespace on the given site
+ while working on the subject, this method returns it. If several ones
+ have been found, the first one will be returned.
+ Otherwise, None will be returned.
+ """
+ for tree in [self.done, self.pending, self.todo]:
+ for page in tree.filter(site):
+ # -hintsonly: before we have an origin page, any namespace will do.
+ if self.originPage and page.namespace() == self.originPage.namespace():
+ if page.exists() and not page.isRedirectPage() and not
page.isCategoryRedirect():
+ return page
+ return None
+
+ def translate(self, hints = None, keephintedsites = False):
+        """Add the given translation hints to the todo list"""
+ if globalvar.same and self.originPage:
+ if hints:
+                pages = titletranslate.translate(self.originPage, hints = hints + ['all:'],
+                                                 auto = globalvar.auto, removebrackets = globalvar.hintnobracket)
+ else:
+                pages = titletranslate.translate(self.originPage, hints = ['all:'],
+                                                 auto = globalvar.auto, removebrackets = globalvar.hintnobracket)
+ else:
+ pages = titletranslate.translate(self.originPage, hints=hints,
+ auto=globalvar.auto, removebrackets=globalvar.hintnobracket,
+ site=pywikibot.getSite())
+ for page in pages:
+ if globalvar.contentsondisk:
+ page = StoredPage(page)
+ self.todo.add(page)
+ self.foundIn[page] = [None]
+ if keephintedsites:
+ self.hintedsites.add(page.site)
+
+ def openSites(self):
+ """
+ Iterator. Yields (site, count) pairs:
+ * site is a site where we still have work to do on
+ * count is the number of items in that Site that need work on
+ """
+ return self.todo.siteCounts()
+
+ def whatsNextPageBatch(self, site):
+ """
+ By calling this method, you 'promise' this instance that you will
+ preload all the 'site' Pages that are in the todo list.
+
+ This routine will return a list of pages that can be treated.
+ """
+ # Bug-check: Isn't there any work still in progress? We can't work on
+ # different sites at a time!
+ if len(self.pending) > 0:
+            raise 'BUG: Can\'t start to work on %s; still working on %s' % (site, self.pending)
+ # Prepare a list of suitable pages
+ result = []
+ for page in self.todo.filter(site):
+ self.pending.add(page)
+ result.append(page)
+
+ self.todo.removeSite(site)
+
+ # If there are any, return them. Otherwise, nothing is in progress.
+ return result
+
+ def makeForcedStop(self,counter):
+ """
+ Ends work on the page before the normal end.
+ """
+ for site, count in self.todo.siteCounts():
+ counter.minus(site, count)
+ self.todo = PageTree()
+ self.forcedStop = True
+
+ def addIfNew(self, page, counter, linkingPage):
+ """
+ Adds the pagelink given to the todo list, but only if we didn't know
+ it before. If it is added, update the counter accordingly.
+
+ Also remembers where we found the page, regardless of whether it had
+ already been found before or not.
+
+ Returns True if the page is new.
+ """
+ if self.forcedStop:
+ return False
+ # cannot check backlink before we have an origin page
+ if globalvar.nobackonly and self.originPage:
+ if page == self.originPage:
+ try:
+ pywikibot.output(u"%s has a backlink from %s."
+ % (page, linkingPage))
+ except UnicodeDecodeError:
+ pywikibot.output(u"Found a backlink for a page.")
+ self.makeForcedStop(counter)
+ return False
+
+ if page in self.foundIn:
+ # not new
+ self.foundIn[page].append(linkingPage)
+ return False
+ else:
+ if globalvar.contentsondisk:
+ page = StoredPage(page)
+ self.foundIn[page] = [linkingPage]
+ self.todo.add(page)
+ counter.plus(page.site)
+ return True
+
+ def skipPage(self, page, target, counter):
+ return self.isIgnored(target) or \
+ self.namespaceMismatch(page, target, counter) or \
+ self.wiktionaryMismatch(target)
+
+ def namespaceMismatch(self, linkingPage, linkedPage, counter):
+ """
+        Checks whether or not the given page is in another namespace
+ than the origin page.
+
+ Returns True if the namespaces are different and the user
+ has selected not to follow the linked page.
+ """
+ if linkedPage in self.foundIn:
+ # We have seen this page before, don't ask again.
+ return False
+ elif self.originPage and self.originPage.namespace() != linkedPage.namespace():
+ # Allow for a mapping between different namespaces
+            crossFrom = self.originPage.site.family.crossnamespace.get(self.originPage.namespace(), {})
+            crossTo = crossFrom.get(self.originPage.site.language(), crossFrom.get('_default', {}))
+            nsmatch = crossTo.get(linkedPage.site.language(), crossTo.get('_default', []))
+ if linkedPage.namespace() in nsmatch:
+ return False
+ if globalvar.autonomous:
+                pywikibot.output(u"NOTE: Ignoring link from page %s in namespace %i to page %s in namespace %i."
+ % (linkingPage, linkingPage.namespace(),
+ linkedPage, linkedPage.namespace()))
+ # Fill up foundIn, so that we will not write this notice
+ self.foundIn[linkedPage] = [linkingPage]
+ return True
+ else:
+ preferredPage = self.getFoundInCorrectNamespace(linkedPage.site)
+ if preferredPage:
+                    pywikibot.output(u"NOTE: Ignoring link from page %s in namespace %i to page %s in namespace %i because page %s in the correct namespace has already been found."
+                                     % (linkingPage, linkingPage.namespace(), linkedPage,
+ linkedPage.namespace(), preferredPage))
+ return True
+ else:
+ choice = pywikibot.inputChoice(
+u'WARNING: %s is in namespace %i, but %s is in namespace %i. Follow it anyway?'
+ % (self.originPage, self.originPage.namespace(),
+ linkedPage, linkedPage.namespace()),
+                        ['Yes', 'No', 'Add an alternative', 'give up'],
+ ['y', 'n', 'a', 'g'])
+ if choice != 'y':
+ # Fill up foundIn, so that we will not ask again
+ self.foundIn[linkedPage] = [linkingPage]
+ if choice == 'g':
+ self.makeForcedStop(counter)
+ elif choice == 'a':
+                        newHint = pywikibot.input(u'Give the alternative for language %s, not using a language code:'
+ % linkedPage.site.language())
+ if newHint:
+                            alternativePage = pywikibot.Page(linkedPage.site, newHint)
+ if alternativePage:
+ # add the page that was entered by the user
+ self.addIfNew(alternativePage, counter, None)
+ else:
+ pywikibot.output(
+ u"NOTE: ignoring %s and its interwiki links"
+ % linkedPage)
+ return True
+ else:
+ # same namespaces, no problem
+ # or no origin page yet, also no problem
+ return False
+
+ def wiktionaryMismatch(self, page):
+ if self.originPage and globalvar.same=='wiktionary':
+ if page.title().lower() != self.originPage.title().lower():
+                pywikibot.output(u"NOTE: Ignoring %s for %s in wiktionary mode" % (page, self.originPage))
+ return True
+            elif page.title() != self.originPage.title() and self.originPage.site.nocapitalize and page.site.nocapitalize:
+                pywikibot.output(u"NOTE: Ignoring %s for %s in wiktionary mode because both languages are uncapitalized."
+ % (page, self.originPage))
+ return True
+ return False
+
+ def disambigMismatch(self, page, counter):
+ """
+        Checks whether or not the given page has a different disambiguation
+ status than the origin page.
+
+ Returns a tuple (skip, alternativePage).
+
+ skip is True if the pages have mismatching statuses and the bot
+ is either in autonomous mode, or the user chose not to use the
+ given page.
+
+ alternativePage is either None, or a page that the user has
+ chosen to use instead of the given page.
+ """
+ if not self.originPage:
+ return (False, None) # any page matches until we have an origin page
+ if globalvar.autonomous:
+ if self.originPage.isDisambig() and not page.isDisambig():
+                pywikibot.output(u"NOTE: Ignoring link from disambiguation page %s to non-disambiguation %s"
+ % (self.originPage, page))
+ return (True, None)
+ elif not self.originPage.isDisambig() and page.isDisambig():
+                pywikibot.output(u"NOTE: Ignoring link from non-disambiguation page %s to disambiguation %s"
+ % (self.originPage, page))
+ return (True, None)
+ else:
+ choice = 'y'
+ if self.originPage.isDisambig() and not page.isDisambig():
+ disambig = self.getFoundDisambig(page.site)
+ if disambig:
+ pywikibot.output(
+                        u"NOTE: Ignoring non-disambiguation page %s for %s because disambiguation page %s has already been found."
+ % (page, self.originPage, disambig))
+ return (True, None)
+ else:
+ choice = pywikibot.inputChoice(
+                        u'WARNING: %s is a disambiguation page, but %s doesn\'t seem to be one. Follow it anyway?'
+ % (self.originPage, page),
+                        ['Yes', 'No', 'Add an alternative', 'Give up'],
+ ['y', 'n', 'a', 'g'])
+ elif not self.originPage.isDisambig() and page.isDisambig():
+ nondisambig = self.getFoundNonDisambig(page.site)
+ if nondisambig:
+                    pywikibot.output(u"NOTE: Ignoring disambiguation page %s for %s because non-disambiguation page %s has already been found."
+ % (page, self.originPage, nondisambig))
+ return (True, None)
+ else:
+ choice = pywikibot.inputChoice(
+                        u'WARNING: %s doesn\'t seem to be a disambiguation page, but %s is one. Follow it anyway?'
+ % (self.originPage, page),
+                        ['Yes', 'No', 'Add an alternative', 'Give up'],
+ ['y', 'n', 'a', 'g'])
+ if choice == 'n':
+ return (True, None)
+ elif choice == 'a':
+            newHint = pywikibot.input(u'Give the alternative for language %s, not using a language code:'
+ % page.site.language())
+ alternativePage = pywikibot.Page(page.site, newHint)
+ return (True, alternativePage)
+ elif choice == 'g':
+ self.makeForcedStop(counter)
+ return (True, None)
+ # We can follow the page.
+ return (False, None)
+
+ def isIgnored(self, page):
+ if page.site.language() in globalvar.neverlink:
+            pywikibot.output(u"Skipping link %s to an ignored language" % page)
+ return True
+ if page in globalvar.ignore:
+ pywikibot.output(u"Skipping link %s to an ignored page" % page)
+ return True
+ return False
+
+ def reportInterwikilessPage(self, page):
+ if not globalvar.quiet or pywikibot.verbose:
+ pywikibot.output(u"NOTE: %s does not have any interwiki links"
+ % self.originPage)
+ if config.without_interwiki:
+ f = codecs.open(
+ pywikibot.config.datafilepath('without_interwiki.txt'),
+ 'a', 'utf-8')
+ f.write(u"# %s \n" % page)
+ f.close()
+
+ def askForHints(self, counter):
+ if not self.workonme:
+ # Do not ask hints for pages that we don't work on anyway
+ return
+ if (self.untranslated or globalvar.askhints) and not self.hintsAsked \
+ and self.originPage and self.originPage.exists() \
+ and not self.originPage.isRedirectPage() and not
self.originPage.isCategoryRedirect():
+ # Only once!
+ self.hintsAsked = True
+ if globalvar.untranslated:
+ newhint = None
+ t = globalvar.showtextlink
+ if t:
+ pywikibot.output(self.originPage.get()[:t])
+ # loop
+ while True:
+                    newhint = pywikibot.input(u'Give a hint (? to see pagetext):')
+ if newhint == '?':
+ t += globalvar.showtextlinkadd
+ pywikibot.output(self.originPage.get()[:t])
+ elif newhint and not ':' in newhint:
+                        pywikibot.output(u'Please enter a hint in the format language:pagename or type nothing if you do not have a hint.')
+ elif not newhint:
+ break
+ else:
+                        pages = titletranslate.translate(self.originPage, hints=[newhint],
+                                                         auto = globalvar.auto, removebrackets=globalvar.hintnobracket)
+ for page in pages:
+ self.addIfNew(page, counter, None)
+ if globalvar.hintsareright:
+ self.hintedsites.add(page.site)
+
+ def batchLoaded(self, counter):
+ """
+ This is called by a worker to tell us that the promised batch of
+ pages was loaded.
+ In other words, all the pages in self.pending have already
+ been preloaded.
+
+ The only argument is an instance
+ of a counter class, that has methods minus() and plus() to keep
+ counts of the total work todo.
+ """
+ # Loop over all the pages that should have been taken care of
+ for page in self.pending:
+ # Mark the page as done
+ self.done.add(page)
+
+ # make sure that none of the linked items is an auto item
+ if globalvar.skipauto:
+ dictName, year = page.autoFormat()
+ if dictName is not None:
+ if self.originPage:
+                        pywikibot.output(u'WARNING: %s:%s relates to %s:%s, which is an auto entry %s(%s)'
+                                         % (self.originPage.site.language(), self.originPage,
+ page.site.language(), page, dictName, year))
+
+ # Abort processing if the bot is running in autonomous mode.
+ if globalvar.autonomous:
+ self.makeForcedStop(counter)
+
+ # Register this fact at the todo-counter.
+ counter.minus(page.site)
+
+ # Now check whether any interwiki links should be added to the
+ # todo list.
+
+ if not page.exists():
+ globalvar.remove.append(unicode(page))
+ if not globalvar.quiet or pywikibot.verbose:
+ pywikibot.output(u"NOTE: %s does not exist. Skipping."
+ % page)
+ if page == self.originPage:
+ # The page we are working on is the page that does not exist.
+ # No use in doing any work on it in that case.
+ for site, count in self.todo.siteCounts():
+ counter.minus(site, count)
+ self.todo = PageTree()
+                    # In some rare cases it might be we already did check some 'automatic' links
+ self.done = PageTree()
+ continue
+
+ elif page.isRedirectPage() or page.isCategoryRedirect():
+ if page.isRedirectPage():
+ redir = u''
+ else:
+ redir = u'category '
+ try:
+ if page.isRedirectPage():
+ redirectTargetPage = page.getRedirectTarget()
+ else:
+ redirectTargetPage = page.getCategoryRedirectTarget()
+ except pywikibot.InvalidTitle:
+ # MW considers #redirect [[en:#foo]] as a redirect page,
+ # but we can't do anything useful with such pages
+ if not globalvar.quiet or pywikibot.verbose:
+ pywikibot.output(
+ u"NOTE: %s redirects to an invalid title" % page)
+ continue
+ if not globalvar.quiet or pywikibot.verbose:
+ pywikibot.output(u"NOTE: %s is %sredirect to %s"
+ % (page, redir, redirectTargetPage))
+ if self.originPage is None or page == self.originPage:
+                # the 1st existing page becomes the origin page, if none was supplied
+ if globalvar.initialredirect:
+ if globalvar.contentsondisk:
+ redirectTargetPage = StoredPage(redirectTargetPage)
+ # don't follow another redirect; it might be a self loop
+ if not redirectTargetPage.isRedirectPage() \
+ and not redirectTargetPage.isCategoryRedirect():
+ self.originPage = redirectTargetPage
+ self.todo.add(redirectTargetPage)
+ counter.plus(redirectTargetPage.site)
+ else:
+ # This is a redirect page to the origin. We don't need to
+ # follow the redirection.
+ # In this case we can also stop all hints!
+ for site, count in self.todo.siteCounts():
+ counter.minus(site, count)
+ self.todo = PageTree()
+ elif not globalvar.followredirect:
+ if not globalvar.quiet or pywikibot.verbose:
+ pywikibot.output(u"NOTE: not following %sredirects."
+ % redir)
+ elif page.isStaticRedirect():
+ if not globalvar.quiet or pywikibot.verbose:
+ pywikibot.output(
+                            u"NOTE: not following static %sredirects." % redir)
+ elif page.site.family == redirectTargetPage.site.family \
+ and not self.skipPage(page, redirectTargetPage, counter):
+ if self.addIfNew(redirectTargetPage, counter, page):
+ if config.interwiki_shownew or pywikibot.verbose:
+ pywikibot.output(u"%s: %s gives new %sredirect %s"
+ % (self.originPage, page, redir,
+ redirectTargetPage))
+ continue
+
+ # must be behind the page.isRedirectPage() part
+ # otherwise a redirect error would be raised
+ elif page.isEmpty() and not page.isCategory():
+ globalvar.remove.append(unicode(page))
+ if not globalvar.quiet or pywikibot.verbose:
+ pywikibot.output(u"NOTE: %s is empty. Skipping." % page)
+ if page == self.originPage:
+ for site, count in self.todo.siteCounts():
+ counter.minus(site, count)
+ self.todo = PageTree()
+ self.done = PageTree()
+ self.originPage = None
+ continue
+
+ elif page.section():
+ if not globalvar.quiet or pywikibot.verbose:
+ pywikibot.output(u"NOTE: %s is a page section. Skipping."
+ % page)
+ continue
+
+            # Page exists, isn't a redirect, and is a plain link (no section)
+ if self.originPage is None:
+                # the 1st existing page becomes the origin page, if none was supplied
+ self.originPage = page
+ try:
+ iw = page.interwiki()
+ except pywikibot.NoSuchSite:
+ if not globalvar.quiet or pywikibot.verbose:
+                    pywikibot.output(u"NOTE: site %s does not exist" % page.site())
+ continue
+
+ (skip, alternativePage) = self.disambigMismatch(page, counter)
+ if skip:
+ pywikibot.output(u"NOTE: ignoring %s and its interwiki links"
+ % page)
+ self.done.remove(page)
+ iw = ()
+ if alternativePage:
+ # add the page that was entered by the user
+ self.addIfNew(alternativePage, counter, None)
+
+ duplicate = None
+ for p in self.done.filter(page.site):
+                if p != page and p.exists() and not p.isRedirectPage() and not p.isCategoryRedirect():
+ duplicate = p
+ break
+
+ if self.originPage == page:
+ self.untranslated = (len(iw) == 0)
+ if globalvar.untranslatedonly:
+ # Ignore the interwiki links.
+ iw = ()
+ if globalvar.lacklanguage:
+ if globalvar.lacklanguage in [link.site.language() for link in iw]:
+ iw = ()
+ self.workonme = False
+ if len(iw) < globalvar.minlinks:
+ iw = ()
+ self.workonme = False
+
+ elif globalvar.autonomous and duplicate and not skip:
+                pywikibot.output(u"Stopping work on %s because duplicate pages"\
+                                 " %s and %s are found" % (self.originPage, duplicate, page))
+ self.makeForcedStop(counter)
+ try:
+ f = codecs.open(
+                        pywikibot.config.datafilepath('autonomous_problems.dat'),
+ 'a', 'utf-8')
+ f.write(u"* %s {Found more than one link for %s}"
+ % (self.originPage, page.site))
+ if config.interwiki_graph and config.interwiki_graph_url:
+                        filename = interwiki_graph.getFilename(self.originPage, extension = config.interwiki_graph_formats[0])
+                        f.write(u" [%s%s graph]" % (config.interwiki_graph_url, filename))
+ f.write("\n")
+ f.close()
+ # FIXME: What errors are we catching here?
+ # except: should be avoided!!
+ except:
+ #raise
+                    pywikibot.output(u'File autonomous_problems.dat open or corrupted! Try again with -restore.')
+ sys.exit()
+ iw = ()
+ elif page.isEmpty() and not page.isCategory():
+ globalvar.remove.append(unicode(page))
+ if not globalvar.quiet or pywikibot.verbose:
+                    pywikibot.output(u"NOTE: %s is empty; ignoring it and its interwiki links"
+ % page)
+ # Ignore the interwiki links
+ self.done.remove(page)
+ iw = ()
+
+ for linkedPage in iw:
+ if globalvar.hintsareright:
+ if linkedPage.site in self.hintedsites:
+                        pywikibot.output(u"NOTE: %s: %s extra interwiki on hinted site ignored %s"
+ % (self.originPage, page, linkedPage))
+ break
+ if not self.skipPage(page, linkedPage, counter):
+ if globalvar.followinterwiki or page == self.originPage:
+ if self.addIfNew(linkedPage, counter, page):
+ # It is new. Also verify whether it is the second on the
+ # same site
+ lpsite=linkedPage.site
+ for prevPage in self.foundIn:
+ if prevPage != linkedPage and prevPage.site == lpsite:
+                                    # Still, this could be "no problem" as either may be a
+ # redirect to the other. No way to find out quickly!
+                                    pywikibot.output(u"NOTE: %s: %s gives duplicate interwiki on same site %s"
+ % (self.originPage, page,
+ linkedPage))
+ break
+ else:
+ if config.interwiki_shownew or pywikibot.verbose:
+                                    pywikibot.output(u"%s: %s gives new interwiki %s"
+ % (self.originPage,
+ page, linkedPage))
+ if self.forcedStop:
+ break
+ # These pages are no longer 'in progress'
+ self.pending = PageTree()
+ # Check whether we need hints and the user offered to give them
+ if self.untranslated and not self.hintsAsked:
+ self.reportInterwikilessPage(page)
+ self.askForHints(counter)
+
+ def isDone(self):
+        """Return True if all the work for this subject has completed."""
+ return len(self.todo) == 0
+
+ def problem(self, txt, createneed = True):
+        """Report a problem with the resolution of this subject."""
+ pywikibot.output(u"ERROR: %s" % txt)
+ self.confirm = True
+ if createneed:
+ self.problemfound = True
+
+ def whereReport(self, page, indent=4):
+ for page2 in sorted(self.foundIn[page]):
+ if page2 is None:
+ pywikibot.output(u" "*indent + "Given as a hint.")
+ else:
+ pywikibot.output(u" "*indent + unicode(page2))
+
+
+ def assemble(self):
+ # No errors have been seen so far, except....
+ errorCount = self.problemfound
+ mysite = pywikibot.getSite()
+ # Build up a dictionary of all pages found, with the site as key.
+ # Each value will be a list of pages.
+ new = {}
+ for page in self.done:
+            if page.exists() and not page.isRedirectPage() and not page.isCategoryRedirect():
+ site = page.site
+ if site.family.interwiki_forward:
+ #TODO: allow these cases to be propagated!
+ continue # inhibit the forwarding families pages to be updated.
+ if site == self.originPage.site:
+ if page != self.originPage:
+ self.problem(u"Found link to %s" % page)
+ self.whereReport(page)
+ errorCount += 1
+ else:
+ if site in new:
+ new[site].append(page)
+ else:
+ new[site] = [page]
+ # See if new{} contains any problematic values
+ result = {}
+ for site, pages in new.iteritems():
+ if len(pages) > 1:
+ errorCount += 1
+ self.problem(u"Found more than one link for %s" % site)
+
+ if not errorCount and not globalvar.select:
+ # no errors, so all lists have only one item
+ for site, pages in new.iteritems():
+ result[site] = pages[0]
+ return result
+
+        # There are errors.
+ if config.interwiki_graph:
+ graphDrawer = interwiki_graph.GraphDrawer(self)
+ graphDrawer.createGraph()
+
+ # We don't need to continue with the rest if we're in autonomous
+ # mode.
+ if globalvar.autonomous:
+ return None
+
+ # First loop over the ones that have more solutions
+ for site, pages in new.iteritems():
+ if len(pages) > 1:
+ pywikibot.output(u"=" * 30)
+ pywikibot.output(u"Links to %s" % site)
+ i = 0
+ for page2 in pages:
+ i += 1
+ pywikibot.output(u" (%d) Found link to %s in:"
+ % (i, page2))
+ self.whereReport(page2, indent = 8)
+ while True:
+ #TODO: allow answer to repeat previous or go back after a mistake
+                    answer = pywikibot.input(u"Which variant should be used? (<number>, [n]one, [g]ive up) ").lower()
+ if answer:
+ if answer == 'g':
+ return None
+ elif answer == 'n':
+ # None acceptable
+ break
+ elif answer.isdigit():
+ answer = int(answer)
+ try:
+ result[site] = pages[answer - 1]
+ except IndexError:
+ # user input is out of range
+ pass
+ else:
+ break
+ # Loop over the ones that have one solution, so are in principle
+ # not a problem.
+ acceptall = False
+ for site, pages in new.iteritems():
+ if len(pages) == 1:
+ if not acceptall:
+ pywikibot.output(u"=" * 30)
+ page2 = pages[0]
+ pywikibot.output(u"Found link to %s in:" % page2)
+ self.whereReport(page2, indent = 4)
+ while True:
+ if acceptall:
+ answer = 'a'
+ else:
+                        #TODO: allow answer to repeat previous or go back after a mistake
+                        answer = pywikibot.inputChoice(u'What should be done?', ['accept', 'reject', 'give up', 'accept all'], ['a', 'r', 'g', 'l'], 'a')
+ if answer == 'l': # accept all
+ acceptall = True
+ answer = 'a'
+ if answer == 'a': # accept this one
+ result[site] = pages[0]
+ break
+ elif answer == 'g': # give up
+ return None
+ elif answer == 'r': # reject
+ # None acceptable
+ break
+ return result
+
+ def finish(self, bot = None):
+        """Round up the subject, making any necessary changes. This method
+ should be called exactly once after the todo list has gone empty.
+
+ This contains a shortcut: if a subject list is given in the argument
+ bot, just before submitting a page change to the live wiki it is
+ checked whether we will have to wait. If that is the case, the bot will
+ be told to make another get request first."""
+
+ #from clean_sandbox
+ def minutesDiff(time1, time2):
+ if type(time1) is long:
+ time1 = str(time1)
+ if type(time2) is long:
+ time2 = str(time2)
+ t1 = (((int(time1[0:4]) * 12 + int(time1[4:6])) * 30 +
+ int(time1[6:8])) * 24 + int(time1[8:10])) * 60 + \
+ int(time1[10:12])
+ t2 = (((int(time2[0:4]) * 12 + int(time2[4:6])) * 30 +
+ int(time2[6:8])) * 24 + int(time2[8:10])) * 60 + \
+ int(time2[10:12])
+ return abs(t2-t1)
+
+ if not self.isDone():
+ raise "Bugcheck: finish called before done"
+ if not self.workonme:
+ return
+ if self.originPage:
+ if self.originPage.isRedirectPage():
+ return
+ if self.originPage.isCategoryRedirect():
+ return
+ else:
+ return
+ if not self.untranslated and globalvar.untranslatedonly:
+ return
+ if self.forcedStop: # autonomous with problem
+ pywikibot.output(u"======Aborted processing %s======"
+ % self.originPage)
+ return
+ # The following check is not always correct and thus disabled.
+ # self.done might contain no interwiki links because of the -neverlink
+ # argument or because of disambiguation conflicts.
+# if len(self.done) == 1:
+# # No interwiki at all
+# return
+ pywikibot.output(u"======Post-processing %s======" % self.originPage)
+ # Assemble list of accepted interwiki links
+ new = self.assemble()
+ if new is None: # User said give up
+ pywikibot.output(u"======Aborted processing %s======"
+ % self.originPage)
+ return
+
+ # Make sure new contains every page link, including the page we are processing
+        # TODO: should be moved to assemble()
+ # replaceLinks will skip the site it's working on.
+ if self.originPage.site not in new:
+ #TODO: make this possible as well.
+ if not self.originPage.site.family.interwiki_forward:
+ new[self.originPage.site] = self.originPage
+
+ #self.replaceLinks(self.originPage, new, True, bot)
+
+ updatedSites = []
+ notUpdatedSites = []
+ # Process all languages here
+ globalvar.always = False
+ if globalvar.limittwo:
+ lclSite = self.originPage.site
+ lclSiteDone = False
+ frgnSiteDone = False
+
+ for siteCode in lclSite.family.languages_by_size:
+ site = pywikibot.getSite(code = siteCode)
+ if (not lclSiteDone and site == lclSite) or \
+ (not frgnSiteDone and site != lclSite and site in new):
+ if site == lclSite:
+ lclSiteDone = True # even if we fail the update
+                    if site.family.name in config.usernames and site.lang in config.usernames[site.family.name]:
+ try:
+ if self.replaceLinks(new[site], new, bot):
+ updatedSites.append(site)
+ if site != lclSite:
+ frgnSiteDone = True
+ except SaveError:
+ notUpdatedSites.append(site)
+ except GiveUpOnPage:
+ break
+ elif not globalvar.strictlimittwo and site in new \
+ and site != lclSite:
+ old={}
+ try:
+ for page in new[site].interwiki():
+ old[page.site] = page
+ except pywikibot.NoPage:
+                            pywikibot.output(u"BUG>>> %s no longer exists?"
+ % new[site])
+ continue
+ mods, mcomment, adding, removing, modifying \
+ = compareLanguages(old, new, insite = lclSite)
+ if (len(removing) > 0 and not globalvar.autonomous) or \
+ (len(modifying) > 0 and self.problemfound) or \
+ len(old) == 0 or \
+ (globalvar.needlimit and \
+ len(adding) + len(modifying) >= globalvar.needlimit +1):
+ try:
+ if self.replaceLinks(new[site], new, bot):
+ updatedSites.append(site)
+ except SaveError:
+ notUpdatedSites.append(site)
+ except pywikibot.NoUsername:
+ pass
+ except GiveUpOnPage:
+ break
+ else:
+ for (site, page) in new.iteritems():
+ # edit restriction on is-wiki
+ # http://is.wikipedia.org/wiki/Wikipediaspjall:V%C3%A9lmenni
+ # allow edits for the same conditions as -whenneeded
+ # or the last edit wasn't a bot
+ # or the last edit was 1 month ago
+ smallWikiAllowed = True
+ if globalvar.autonomous and page.site.sitename() == 'wikipedia:is':
+ old={}
+ try:
+ for mypage in new[page.site].interwiki():
+ old[mypage.site] = mypage
+ except pywikibot.NoPage:
+ pywikibot.output(u"BUG>>> %s no longer
exists?"
+ % new[site])
+ continue
+ mods, mcomment, adding, removing, modifying \
+ = compareLanguages(old, new, insite=site)
+ #cannot create userlib.User with IP
+ smallWikiAllowed = page.isIpEdit() or \
+ len(removing) > 0 or len(old) == 0 or \
+ len(adding) + len(modifying) > 2 or \
+ len(removing) + len(modifying) == 0 and \
+ adding == [page.site]
+ if not smallWikiAllowed:
+ import userlib
+ user = userlib.User(page.site, page.userName())
+ if not 'bot' in user.groups() \
+ and not 'bot' in page.userName().lower(): # for now, also skip user names containing "bot"
+ smallWikiAllowed = True
+ else:
+ diff = minutesDiff(page.editTime(),
+ time.strftime("%Y%m%d%H%M%S",
+ time.gmtime()))
+ if diff > 30*24*60:
+ smallWikiAllowed = True
+ else:
+ pywikibot.output(
+u'NOTE: number of edits are restricted at %s'
+ % page.site.sitename())
+
+ # if we have an account for this site
+ if site.family.name in config.usernames \
+ and site.lang in config.usernames[site.family.name] \
+ and smallWikiAllowed:
+ # Try to do the changes
+ try:
+ if self.replaceLinks(page, new, bot):
+ # Page was changed
+ updatedSites.append(site)
+ except SaveError:
+ notUpdatedSites.append(site)
+ except GiveUpOnPage:
+ break
+
+ # disabled graph drawing for minor problems: it just takes too long
+ #if notUpdatedSites != [] and config.interwiki_graph:
+ # # at least one site was not updated, save a conflict graph
+ # self.createGraph()
+
+ # don't report backlinks for pages we already changed
+ if config.interwiki_backlink:
+ self.reportBacklinks(new, updatedSites)
+
+ def clean(self):
+ """
+ Delete the contents that are stored on disk for this Subject.
+
+ We cannot afford to define this in a StoredPage destructor because
+ StoredPage instances can get referenced cyclically: that would stop the
+ garbage collector from destroying some of those objects.
+
+ It is also not necessary to make this a Subject destructor: deleting
+ all stored content entry by entry when bailing out after a
+ KeyboardInterrupt, for example, would be redundant, because the whole
+ storage file will eventually be removed.
+ """
+ if globalvar.contentsondisk:
+ for page in self.foundIn:
+ # foundIn can contain either Page or StoredPage objects
+ # calling the destructor on _contents will delete the
+ # disk records if necessary
+ if hasattr(page, '_contents'):
+ del page._contents
+
+ def replaceLinks(self, page, newPages, bot):
+ """
+ Returns True if saving was successful.
+ """
+ if globalvar.localonly:
+ # In this case only continue on the Page we started with
+ if page != self.originPage:
+ raise SaveError(u'-localonly and page != originPage')
+ if page.section():
+ # This is not a page, but a subpage. Do not edit it.
+ pywikibot.output(u"Not editing %s: not doing interwiki on
subpages"
+ % page)
+ raise SaveError(u'Link has a #section')
+ try:
+ pagetext = page.get()
+ except pywikibot.NoPage:
+ pywikibot.output(u"Not editing %s: page does not exist" % page)
+ raise SaveError(u'Page doesn\'t exist')
+ if page.isEmpty() and not page.isCategory():
+ pywikibot.output(u"Not editing %s: page is empty" % page)
+ raise SaveError
+
+ # clone original newPages dictionary, so that we can modify it to the
+ # local page's needs
+ new = dict(newPages)
+ interwikis = page.interwiki()
+
+ # remove interwiki links to ignore
+ for iw in re.finditer('<!-- *\[\[(.*?:.*?)\]\] *-->', pagetext):
+ try:
+ ignorepage = pywikibot.Page(page.site, iw.groups()[0])
+ except (pywikibot.NoSuchSite, pywikibot.InvalidTitle):
+ continue
+ try:
+ if (new[ignorepage.site] == ignorepage) and \
+ (ignorepage.site != page.site):
+ if (ignorepage not in interwikis):
+ pywikibot.output(
+ u"Ignoring link to %(to)s for %(from)s"
+ % {'to': ignorepage,
+ 'from': page})
+ new.pop(ignorepage.site)
+ else:
+ pywikibot.output(
+ u"NOTE: Not removing interwiki from %(from)s to %(to)s
(exists both commented and non-commented)"
+ % {'to': ignorepage,
+ 'from': page})
+ except KeyError:
+ pass
+
+ # sanity check - the page we are fixing must be the only one for that
+ # site.
+ pltmp = new[page.site]
+ if pltmp != page:
+ s = u"None"
+ if pltmp is not None: s = pltmp
+ pywikibot.output(
+ u"BUG>>> %s is not in the list of new links! Found
%s."
+ % (page, s))
+ raise SaveError(u'BUG: sanity check failed')
+
+ # Avoid adding an iw link back to itself
+ del new[page.site]
+ # Do not add interwiki links to foreign families that page.site() does not forward to
+ for stmp in new.keys():
+ if stmp.family != page.site.family:
+ if stmp.family.name != page.site.family.interwiki_forward:
+ del new[stmp]
+
+ # Put interwiki links into a map
+ old={}
+ for page2 in interwikis:
+ old[page2.site] = page2
+
+ # Check what needs to get done
+ mods, mcomment, adding, removing, modifying = compareLanguages(old,
+ new,
+ insite=page.site)
+
+ # When running in autonomous mode without -force switch, make sure we
+ # don't remove any items, but allow addition of the new ones
+ if globalvar.autonomous and (not globalvar.force or
+ pywikibot.unicode_error
+ ) and len(removing) > 0:
+ for rmsite in removing:
+ # Sometimes sites have an erroneous link to itself as an
+ # interwiki
+ if rmsite == page.site:
+ continue
+ rmPage = old[rmsite]
+ #put it to new means don't delete it
+ if not globalvar.cleanup and not globalvar.force or \
+ globalvar.cleanup and \
+ unicode(rmPage) not in globalvar.remove or \
+ rmPage.site.lang in ['hak', 'hi', 'cdo'] and \
+ pywikibot.unicode_error: # work-around for bug #3081100 (do not remove affected pages)
+ new[rmsite] = rmPage
+ pywikibot.output(
+ u"WARNING: %s is either deleted or has a mismatching
disambiguation state."
+ % rmPage)
+ # Re-Check what needs to get done
+ mods, mcomment, adding, removing, modifying = compareLanguages(old,
+ new,
+ insite=page.site)
+ if not mods:
+ if not globalvar.quiet or pywikibot.verbose:
+ pywikibot.output(u'No changes needed on page %s' % page)
+ return False
+
+ # Show a message in purple.
+ pywikibot.output(
+ u"\03{lightpurple}Updating links on page %s.\03{default}" % page)
+ pywikibot.output(u"Changes to be made: %s" % mods)
+ oldtext = page.get()
+ template = (page.namespace() == 10)
+ newtext = pywikibot.replaceLanguageLinks(oldtext, new,
+ site=page.site,
+ template=template)
+ # This is for now. Later there should be different functions for each
+ # kind
+ if not botMayEdit(page):
+ if template:
+ pywikibot.output(
+ u'SKIPPING: %s should have interwiki links on subpage.'
+ % page)
+ else:
+ pywikibot.output(
+ u'SKIPPING: %s is under construction or to be deleted.'
+ % page)
+ return False
+ if newtext == oldtext:
+ return False
+ pywikibot.showDiff(oldtext, newtext)
+
+ # pywikibot.output(u"NOTE: Replace %s" % page)
+ # Determine whether we need permission to submit
+ ask = False
+
+ # Allow for special case of a self-pointing interwiki link
+ if removing and removing != [page.site]:
+ self.problem(u'Found incorrect link to %s in %s'
+ % (", ".join([x.lang for x in removing]), page),
+ createneed=False)
+ if pywikibot.unicode_error:
+ for x in removing:
+ if x.lang in ['hi', 'cdo']:
+ pywikibot.output(
+u'\03{lightred}WARNING: This may be false positive due to unicode bug #3081100\03{default}')
+ break
+ ask = True
+ if globalvar.force or globalvar.cleanup:
+ ask = False
+ if globalvar.confirm and not globalvar.always:
+ ask = True
+ # If we need to ask, do so
+ if ask:
+ if globalvar.autonomous:
+ # If we cannot ask, deny permission
+ answer = 'n'
+ else:
+ answer = pywikibot.inputChoice(u'Submit?',
+ ['Yes', 'No', 'open in Browser',
+ 'Give up', 'Always'],
+ ['y', 'n', 'b', 'g', 'a'])
+ if answer == 'b':
+ webbrowser.open("http://%s%s" % (
+ page.site.hostname(),
+ page.site.nice_get_address(page.title())
+ ))
+ pywikibot.input(u"Press Enter when finished in browser.")
+ return True
+ elif answer == 'a':
+ # don't ask for the rest of this subject
+ globalvar.always = True
+ answer = 'y'
+ else:
+ # If we do not need to ask, allow
+ answer = 'y'
+ # If we got permission to submit, do so
+ if answer == 'y':
+ # Check whether we will have to wait for pywikibot. If so, make
+ # another get-query first.
+ if bot:
+ while pywikibot.get_throttle.waittime() + 2.0 < pywikibot.put_throttle.waittime():
+ if not globalvar.quiet or pywikibot.verbose:
+ pywikibot.output(
+ u"NOTE: Performing a recursive query first to save
time....")
+ qdone = bot.oneQuery()
+ if not qdone:
+ # Nothing more to do
+ break
+ if not globalvar.quiet or pywikibot.verbose:
+ pywikibot.output(u"NOTE: Updating live wiki...")
+ timeout=60
+ while True:
+ try:
+ if globalvar.async:
+ page.put_async(newtext, comment=mcomment)
+ status = 302
+ else:
+ status, reason, data = page.put(newtext, comment=mcomment)
+ except pywikibot.LockedPage:
+ pywikibot.output(u'Page %s is locked. Skipping.' % page)
+ raise SaveError(u'Locked')
+ except pywikibot.EditConflict:
+ pywikibot.output(
+ u'ERROR putting page: An edit conflict occurred. Giving up.')
+ raise SaveError(u'Edit conflict')
+ except (pywikibot.SpamfilterError), error:
+ pywikibot.output(
+ u'ERROR putting page: %s blacklisted by spamfilter. Giving up.'
+ % (error.url,))
+ raise SaveError(u'Spam filter')
+ except (pywikibot.PageNotSaved), error:
+ pywikibot.output(u'ERROR putting page: %s' % (error.args,))
+ raise SaveError(u'PageNotSaved')
+ except (socket.error, IOError), error:
+ if timeout>3600:
+ raise
+ pywikibot.output(u'ERROR putting page: %s' % (error.args,))
+ pywikibot.output(u'Sleeping %i seconds before trying again.'
+ % (timeout,))
+ timeout *= 2
+ time.sleep(timeout)
+ except pywikibot.ServerError:
+ if timeout > 3600:
+ raise
+ pywikibot.output(u'ERROR putting page: ServerError.')
+ pywikibot.output(u'Sleeping %i seconds before trying again.'
+ % (timeout,))
+ timeout *= 2
+ time.sleep(timeout)
+ else:
+ break
+ if str(status) == '302':
+ return True
+ else:
+ pywikibot.output(u'%s %s' % (status, reason))
+ return False
+ elif answer == 'g':
+ raise GiveUpOnPage(u'User asked us to give up')
+ else:
+ raise LinkMustBeRemoved(u'Found incorrect link to %s in %s'
+ % (", ".join([x.lang for x in removing]),
+ page))
+
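The throttle check near the top of the submit branch above (comparing get_throttle and put_throttle wait times) is the shortcut described in finish()'s docstring: while the bot would otherwise sit idle waiting for write permission, it spends that time on one more read query. A minimal sketch of the idea, assuming hypothetical throttle objects with a waittime() method and a one_query() callback in the spirit of InterwikiBot.oneQuery():

    def use_idle_time(get_throttle, put_throttle, one_query):
        # While a read would finish well before the next write slot opens,
        # run another preloading query instead of sleeping.
        while get_throttle.waittime() + 2.0 < put_throttle.waittime():
            if not one_query():
                break   # nothing left to preload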
+ def reportBacklinks(self, new, updatedSites):
+ """
+ Report missing back links. This will be called from finish() if needed.
+
+ updatedSites is a list that contains all sites we changed, to avoid
+ reporting of missing backlinks for pages we already fixed
+
+ """
+ # use sets because searching an element is faster than in lists
+ expectedPages = set(new.itervalues())
+ expectedSites = set(new)
+ try:
+ for site in expectedSites - set(updatedSites):
+ page = new[site]
+ if not page.section():
+ try:
+ linkedPages = set(page.interwiki())
+ except pywikibot.NoPage:
+ pywikibot.output(u"WARNING: Page %s does no longer
exist?!" % page)
+ break
+ # To speed things up, create a dictionary which maps sites to pages.
+ # This assumes that there is only one interwiki link per language.
+ linkedPagesDict = {}
+ for linkedPage in linkedPages:
+ linkedPagesDict[linkedPage.site] = linkedPage
+ for expectedPage in expectedPages - linkedPages:
+ if expectedPage != page:
+ try:
+ linkedPage = linkedPagesDict[expectedPage.site]
+ pywikibot.output(
+ u"WARNING: %s: %s does not link to %s but to
%s"
+ % (page.site.family.name,
+ page, expectedPage, linkedPage))
+ except KeyError:
+ pywikibot.output(
+ u"WARNING: %s: %s does not link to %s"
+ % (page.site.family.name,
+ page, expectedPage))
+ # Check for superfluous links
+ for linkedPage in linkedPages:
+ if linkedPage not in expectedPages:
+ # Check whether there is an alternative page on that language.
+ # In this case, it was already reported above.
+ if linkedPage.site not in expectedSites:
+ pywikibot.output(
+ u"WARNING: %s: %s links to incorrect %s"
+ % (page.site.family.name,
+ page, linkedPage))
+ except (socket.error, IOError):
+ pywikibot.output(u'ERROR: could not report backlinks')
+
+class InterwikiBot(object):
+ """A class keeping track of a list of subjects, controlling which
pages
+ are queried from which languages when."""
+
+ def __init__(self):
+ """Constructor. We always start with empty
lists."""
+ self.subjects = []
+ # We count how many pages still need to be loaded per site.
+ # This allows us to find out from which site to retrieve pages next
+ # in a way that saves bandwidth.
+ # sites are keys, integers are values.
+ # Modify this only via plus() and minus()!
+ self.counts = {}
+ self.pageGenerator = None
+ self.generated = 0
+
+ def add(self, page, hints = None):
+ """Add a single subject to the list"""
+ subj = Subject(page, hints = hints)
+ self.subjects.append(subj)
+ for site, count in subj.openSites():
+ # Keep correct counters
+ self.plus(site, count)
+
+ def setPageGenerator(self, pageGenerator, number = None, until = None):
+ """Add a generator of subjects. Once the list of subjects gets
+ too small, this generator is called to produce more Pages"""
+ self.pageGenerator = pageGenerator
+ self.generateNumber = number
+ self.generateUntil = until
+
+ def dump(self, append = True):
+ site = pywikibot.getSite()
+ dumpfn = pywikibot.config.datafilepath(
+ 'interwiki-dumps',
+ 'interwikidump-%s-%s.txt' % (site.family.name, site.lang))
+ if append: mode = 'appended'
+ else: mode = 'written'
+ f = codecs.open(dumpfn, mode[0], 'utf-8')
+ for subj in self.subjects:
+ if subj.originPage:
+ f.write(subj.originPage.title(asLink=True)+'\n')
+ f.close()
+ pywikibot.output(u'Dump %s (%s) %s.' % (site.lang, site.family.name, mode))
+ return dumpfn
+
+ def generateMore(self, number):
+ """Generate more subjects. This is called internally when the
+ list of subjects becomes too small, but only if there is a
+ PageGenerator"""
+ fs = self.firstSubject()
+ if fs and (not globalvar.quiet or pywikibot.verbose):
+ pywikibot.output(u"NOTE: The first unfinished subject is %s"
+ % fs.originPage)
+ pywikibot.output(u"NOTE: Number of pages queued is %d, trying to add %d
more."
+ % (len(self.subjects), number))
+ for i in xrange(number):
+ try:
+ while True:
+ try:
+ page = self.pageGenerator.next()
+ except IOError:
+ pywikibot.output(u'IOError occurred; skipping')
+ continue
+ if page in globalvar.skip:
+ pywikibot.output(u'Skipping: %s is in the skip list' % page)
+ continue
+ if globalvar.skipauto:
+ dictName, year = page.autoFormat()
+ if dictName is not None:
+ pywikibot.output(u'Skipping: %s is an auto entry %s(%s)' % (page, dictName, year))
+ continue
+ if globalvar.parenthesesonly:
+ # Only yield pages that have ( ) in titles
+ if "(" not in page.title():
+ continue
+ if page.isTalkPage():
+ pywikibot.output(u'Skipping: %s is a talk page' % page)
+ continue
+ #doesn't work: page must be preloaded for this test
+ #if page.isEmpty():
+ #    pywikibot.output(u'Skipping: %s is a empty page' % page.title())
+ # continue
+ if page.namespace() == 10:
+ loc = None
+ try:
+ tmpl, loc = moved_links[page.site.lang]
+ del tmpl
+ except KeyError:
+ pass
+ if loc is not None and loc in page.title():
+ pywikibot.output(u'Skipping: %s is a templates subpage' % page.title())
+ continue
+ break
+
+ if self.generateUntil:
+ until = self.generateUntil
+ if page.site.lang not in page.site.family.nocapitalize:
+ until = until[0].upper()+until[1:]
+ if page.title(withNamespace=False) > until:
+ raise StopIteration
+ self.add(page, hints = globalvar.hints)
+ self.generated += 1
+ if self.generateNumber:
+ if self.generated >= self.generateNumber:
+ raise StopIteration
+ except StopIteration:
+ self.pageGenerator = None
+ break
+
+ def firstSubject(self):
+ """Return the first subject that is still being worked
on"""
+ if self.subjects:
+ return self.subjects[0]
+
+ def maxOpenSite(self):
+ """Return the site that has the most
+ open queries plus the number. If there is nothing left, return
+ None. Only languages that are TODO for the first Subject
+ are returned."""
+ max = 0
+ maxlang = None
+ if not self.firstSubject():
+ return None
+ oc = dict(self.firstSubject().openSites())
+ if not oc:
+ # The first subject is done. This might be a recursive call made because we
+ # have to wait before submitting another modification to go live. Select
+ # any language from counts.
+ oc = self.counts
+ if pywikibot.getSite() in oc:
+ return pywikibot.getSite()
+ for lang in oc:
+ count = self.counts[lang]
+ if count > max:
+ max = count
+ maxlang = lang
+ return maxlang
+
+ def selectQuerySite(self):
+ """Select the site the next query should go out
for."""
+ # How many home-language queries do we still have?
+ mycount = self.counts.get(pywikibot.getSite(), 0)
+ # Do we still have enough subjects to work on for which the
+ # home language has been retrieved? This is rough, because
+ # some subjects may need to retrieve a second home-language page!
+ if len(self.subjects) - mycount < globalvar.minsubjects:
+ # Can we make more home-language queries by adding subjects?
+ if self.pageGenerator and mycount < globalvar.maxquerysize:
+ timeout = 60
+ while timeout<3600:
+ try:
+ self.generateMore(globalvar.maxquerysize - mycount)
+ except pywikibot.ServerError:
+ # Could not extract allpages special page?
+ pywikibot.output(u'ERROR: could not retrieve more pages. Will try again in %d seconds' % timeout)
+ time.sleep(timeout)
+ timeout *= 2
+ else:
+ break
+ # If we have a few, getting the home language is a good thing.
+ if not globalvar.restoreAll:
+ try:
+ if self.counts[pywikibot.getSite()] > 4:
+ return pywikibot.getSite()
+ except KeyError:
+ pass
+ # If getting the home language doesn't make sense, see how many
+ # foreign page queries we can find.
+ return self.maxOpenSite()
+
+ def oneQuery(self):
+ """
+ Perform one step in the solution process.
+
+ Returns True if pages could be preloaded, or false
+ otherwise.
+ """
+ # First find the best language to work on
+ site = self.selectQuerySite()
+ if site is None:
+ pywikibot.output(u"NOTE: Nothing left to do")
+ return False
+ # Now assemble a reasonable list of pages to get
+ subjectGroup = []
+ pageGroup = []
+ for subject in self.subjects:
+ # Promise the subject that we will work on the site.
+ # We will get a list of pages we can do.
+ pages = subject.whatsNextPageBatch(site)
+ if pages:
+ pageGroup.extend(pages)
+ subjectGroup.append(subject)
+ if len(pageGroup) >= globalvar.maxquerysize:
+ # We have found enough pages to fill the bandwidth.
+ break
+ if len(pageGroup) == 0:
+ pywikibot.output(u"NOTE: Nothing left to do 2")
+ return False
+ # Get the content of the assembled list in one blow
+ gen = pagegenerators.PreloadingGenerator(iter(pageGroup))
+ for page in gen:
+ # we don't want to do anything with them now. The
+ # page contents will be read via the Subject class.
+ pass
+ # Tell all of the subjects that the promised work is done
+ for subject in subjectGroup:
+ subject.batchLoaded(self)
+ return True
+
+ def queryStep(self):
+ self.oneQuery()
+ # Delete the ones that are done now.
+ for i in xrange(len(self.subjects)-1, -1, -1):
+ subj = self.subjects[i]
+ if subj.isDone():
+ subj.finish(self)
+ subj.clean()
+ del self.subjects[i]
+
+ def isDone(self):
+ """Check whether there is still more work to do"""
+ return len(self) == 0 and self.pageGenerator is None
+
+ def plus(self, site, count=1):
+ """This is a routine that the Subject class expects in a
counter"""
+ try:
+ self.counts[site] += count
+ except KeyError:
+ self.counts[site] = count
+
+ def minus(self, site, count=1):
+ """This is a routine that the Subject class expects in a
counter"""
+ self.counts[site] -= count
+
+ def run(self):
+ """Start the process until finished"""
+ while not self.isDone():
+ self.queryStep()
+
+ def __len__(self):
+ return len(self.subjects)
+
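The scheduling in InterwikiBot boils down to a per-site counter of pages still to be fetched (self.counts, maintained only through plus() and minus()), from which maxOpenSite() picks the busiest site. A stripped-down sketch of that bookkeeping, with language codes standing in for Site objects:

    counts = {}

    def plus(site, count=1):
        counts[site] = counts.get(site, 0) + count

    def minus(site, count=1):
        counts[site] -= count

    plus('en', 5); plus('de', 2); minus('en', 1)
    # pick the site with the most pending pages ('en' with 4 here)
    busiest, best = None, 0
    for site, count in counts.items():
        if count > best:
            busiest, best = site, count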
+def compareLanguages(old, new, insite):
+
+ oldiw = set(old)
+ newiw = set(new)
+
+ # sort by language code
+ adding = sorted(newiw - oldiw)
+ removing = sorted(oldiw - newiw)
+ modifying = sorted(site for site in oldiw & newiw if old[site] != new[site])
+
+ if not globalvar.summary and \
+ len(adding) + len(removing) + len(modifying) <= 3:
+ # Use an extended format for the string linking to all added pages.
+ fmt = lambda d, site: unicode(d[site])
+ else:
+ # Use short format, just the language code
+ fmt = lambda d, site: site.lang
+
+ mods = mcomment = u''
+
+ commentname = 'interwiki'
+ if adding:
+ commentname += '-adding'
+ if removing:
+ commentname += '-removing'
+ if modifying:
+ commentname += '-modifying'
+
+ if adding or removing or modifying:
+ #Version info marks bots without unicode error
+ #This also prevents abuse filter blocking on de-wiki
+ if not pywikibot.unicode_error:
+ mcomment += u'r%s) (' % sys.version.split()[0]
+
+ mcomment += globalvar.summary
+
+ changes = {'adding': ', '.join([fmt(new, x) for x in adding]),
+ 'removing': ', '.join([fmt(old, x) for x in removing]),
+ 'modifying': ', '.join([fmt(new, x) for x in modifying])}
+
+ mcomment += i18n.twtranslate(insite.lang, commentname) % changes
+ mods = i18n.twtranslate('en', commentname) % changes
+
+ return mods, mcomment, adding, removing, modifying
+
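compareLanguages() classifies the interwiki sites into added, removed and modified sets by comparing the old and new mappings, and builds the edit summary from those sets. A small illustration of the set arithmetic using plain dicts keyed by language code (the real code keys on Site objects and maps to Page objects):

    old = {'de': 'Foo', 'fr': 'Foo'}
    new = {'de': 'Foo', 'nl': 'Foo', 'fr': 'Foo (Bar)'}
    adding    = sorted(set(new) - set(old))                  # ['nl']
    removing  = sorted(set(old) - set(new))                  # []
    modifying = sorted(k for k in set(old) & set(new)
                       if old[k] != new[k])                  # ['fr']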
+def botMayEdit (page):
+ tmpl = []
+ try:
+ tmpl, loc = moved_links[page.site.lang]
+ except KeyError:
+ pass
+ if type(tmpl) != list:
+ tmpl = [tmpl]
+ try:
+ tmpl += ignoreTemplates[page.site.lang]
+ except KeyError:
+ pass
+ tmpl += ignoreTemplates['_default']
+ if tmpl != []:
+ templates = page.templatesWithParams(get_redirect=True)
+ for template in templates:
+ if template[0].lower() in tmpl:
+ return False
+ return True
+
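botMayEdit() refuses to touch a page that transcludes one of the per-language templates collected from moved_links and ignoreTemplates. A reduced sketch of the same check, with plain lower-cased name lists standing in for those lookups:

    def bot_may_edit(page_template_names, ignore_template_names):
        # page_template_names: lower-cased template names found on the page
        # ignore_template_names: lower-cased names that forbid bot edits
        for name in page_template_names:
            if name in ignore_template_names:
                return False
        return True

    # bot_may_edit(['infobox person', 'inuse'], ['inuse', 'delete']) -> False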
+def readWarnfile(filename, bot):
+ import warnfile
+ reader = warnfile.WarnfileReader(filename)
+ # we won't use removeHints
+ (hints, removeHints) = reader.getHints()
+ for page, pagelist in hints.iteritems():
+ # The WarnfileReader gives us a list of pagelinks, but titletranslate.py expects a list of strings, so we convert it back.
+ # TODO: This is quite an ugly hack; in the future we should maybe make titletranslate expect a list of pagelinks.
+ hintStrings = ['%s:%s' % (hintedPage.site.language(), hintedPage.title()) for hintedPage in pagelist]
+ bot.add(page, hints = hintStrings)
+
+def main():
+ singlePageTitle = []
+ opthintsonly = False
+ start = None
+ # Which namespaces should be processed?
+ # default to [] which means all namespaces will be processed
+ namespaces = []
+ number = None
+ until = None
+ warnfile = None
+ # a normal PageGenerator (which doesn't give hints, only Pages)
+ hintlessPageGen = None
+ optContinue = False
+ optRestore = False
+ restoredFiles = []
+ File2Restore = []
+ dumpFileName = ''
+ append = True
+ newPages = None
+ # This factory is responsible for processing command line arguments
+ # that are also used by other scripts and that determine which pages
+ # to work on.
+ genFactory = pagegenerators.GeneratorFactory()
+
+ for arg in pywikibot.handleArgs():
+ if globalvar.readOptions(arg):
+ continue
+ elif arg.startswith('-warnfile:'):
+ warnfile = arg[10:]
+ elif arg.startswith('-years'):
+ # Look if user gave a specific year at which to start
+ # Must be a natural number or negative integer.
+ if len(arg) > 7 and (arg[7:].isdigit() or (arg[7] == "-" and
arg[8:].isdigit())):
+ startyear = int(arg[7:])
+ else:
+ startyear = 1
+ # avoid problems where year pages link to centuries etc.
+ globalvar.followredirect = False
+ hintlessPageGen = pagegenerators.YearPageGenerator(startyear)
+ elif arg.startswith('-days'):
+ if len(arg) > 6 and arg[5] == ':' and arg[6:].isdigit():
+ # Looks as if the user gave a specific month at which to start
+ # Must be a natural number.
+ startMonth = int(arg[6:])
+ else:
+ startMonth = 1
+ hintlessPageGen = pagegenerators.DayPageGenerator(startMonth)
+ elif arg.startswith('-new'):
+ if len(arg) > 5 and arg[4] == ':' and arg[5:].isdigit():
+ # Looks as if the user gave a specific number of pages
+ newPages = int(arg[5:])
+ else:
+ newPages = 100
+ elif arg.startswith('-restore'):
+ globalvar.restoreAll = arg[9:].lower() == 'all'
+ optRestore = not globalvar.restoreAll
+ elif arg == '-continue':
+ optContinue = True
+ elif arg == '-hintsonly':
+ opthintsonly = True
+ elif arg.startswith('-namespace:'):
+ try:
+ namespaces.append(int(arg[11:]))
+ except ValueError:
+ namespaces.append(arg[11:])
+ # deprecated for consistency with other scripts
+ elif arg.startswith('-number:'):
+ number = int(arg[8:])
+ elif arg.startswith('-until:'):
+ until = arg[7:]
+ else:
+ if not genFactory.handleArg(arg):
+ singlePageTitle.append(arg)
+
+ # Do not use additional summary with autonomous mode
+ if globalvar.autonomous:
+ globalvar.summary = u''
+ elif globalvar.summary:
+ globalvar.summary += u'; '
+
+ # ensure that we don't try to change main page
+ try:
+ site = pywikibot.getSite()
+ try:
+ mainpagename = site.siteinfo()['mainpage']
+ except TypeError: #pywikibot module handle
+ mainpagename = site.siteinfo['mainpage']
+ globalvar.skip.add(pywikibot.Page(site, mainpagename))
+ except pywikibot.Error:
+ pywikibot.output(u'Missing main page name')
+
+ if newPages is not None:
+ if len(namespaces) == 0:
+ ns = 0
+ elif len(namespaces) == 1:
+ ns = namespaces[0]
+ if ns != 'all':
+ if isinstance(ns, unicode) or isinstance(ns, str):
+ index = site.getNamespaceIndex(ns)
+ if index is None:
+ raise ValueError(u'Unknown namespace: %s' % ns)
+ ns = index
+ namespaces = []
+ else:
+ ns = 'all'
+ hintlessPageGen = pagegenerators.NewpagesPageGenerator(newPages, namespace=ns)
+
+ elif optRestore or optContinue or globalvar.restoreAll:
+ site = pywikibot.getSite()
+ if globalvar.restoreAll:
+ import glob
+ for FileName in glob.iglob('interwiki-dumps/interwikidump-*.txt'):
+ s = FileName.split('\\')[1].split('.')[0].split('-')
+ sitename = s[1]
+ for i in xrange(0,2):
+ s.remove(s[0])
+ sitelang = '-'.join(s)
+ if site.family.name == sitename:
+ File2Restore.append([sitename, sitelang])
+ else:
+ File2Restore.append([site.family.name, site.lang])
+ for sitename, sitelang in File2Restore:
+ dumpfn = pywikibot.config.datafilepath(
+ 'interwiki-dumps',
+ u'interwikidump-%s-%s.txt'
+ % (sitename, sitelang))
+ pywikibot.output(u'Reading interwikidump-%s-%s.txt' % (sitename, sitelang))
+ site = pywikibot.getSite(sitelang, sitename)
+ if not hintlessPageGen:
+ hintlessPageGen = pagegenerators.TextfilePageGenerator(dumpfn, site)
+ else:
+ hintlessPageGen = pagegenerators.CombinedPageGenerator([hintlessPageGen, pagegenerators.TextfilePageGenerator(dumpfn, site)])
+ restoredFiles.append(dumpfn)
+ if hintlessPageGen:
+ hintlessPageGen = pagegenerators.DuplicateFilterPageGenerator(hintlessPageGen)
+ if optContinue:
+ # We waste this generator to find out the last page's title
+ # This is an ugly workaround.
+ nextPage = "!"
+ namespace = 0
+ searchGen = pagegenerators.TextfilePageGenerator(dumpfn, site)
+ for page in searchGen:
+ lastPage = page.title(withNamespace=False)
+ if lastPage > nextPage:
+ nextPage = lastPage
+ namespace = page.namespace()
+ if nextPage == "!":
+ pywikibot.output(u"Dump file is empty?! Starting at the
beginning.")
+ else:
+ nextPage += '!'
+ hintlessPageGen = pagegenerators.CombinedPageGenerator([hintlessPageGen, pagegenerators.AllpagesPageGenerator(nextPage, namespace, includeredirects = False)])
+ if not hintlessPageGen:
+ pywikibot.output(u'No Dumpfiles found.')
+ return
+
+ bot = InterwikiBot()
+
+ if not hintlessPageGen:
+ hintlessPageGen = genFactory.getCombinedGenerator()
+ if hintlessPageGen:
+ if len(namespaces) > 0:
+ hintlessPageGen = pagegenerators.NamespaceFilterPageGenerator(hintlessPageGen, namespaces)
+ # we'll use iter() to make a next() function available.
+ bot.setPageGenerator(iter(hintlessPageGen), number = number, until=until)
+ elif warnfile:
+ # TODO: filter namespaces if -namespace parameter was used
+ readWarnfile(warnfile, bot)
+ else:
+ singlePageTitle = ' '.join(singlePageTitle)
+ if not singlePageTitle and not opthintsonly:
+ singlePageTitle = pywikibot.input(u'Which page to check:')
+ if singlePageTitle:
+ singlePage = pywikibot.Page(pywikibot.getSite(), singlePageTitle)
+ else:
+ singlePage = None
+ bot.add(singlePage, hints = globalvar.hints)
+
+ try:
+ try:
+ append = not (optRestore or optContinue or globalvar.restoreAll)
+ bot.run()
+ except KeyboardInterrupt:
+ dumpFileName = bot.dump(append)
+ except:
+ dumpFileName = bot.dump(append)
+ raise
+ finally:
+ if globalvar.contentsondisk:
+ StoredPage.SPdeleteStore()
+ if dumpFileName:
+ try:
+ restoredFiles.remove(dumpFileName)
+ except ValueError:
+ pass
+ for dumpFileName in restoredFiles:
+ try:
+ os.remove(dumpFileName)
+ pywikibot.output(u'Dumpfile %s deleted' % dumpFileName.split('\\')[-1])
+ except WindowsError:
+ pass
+
+#===========
+globalvar=Global()
+
+if __name__ == "__main__":
+ try:
+ main()
+ finally:
+ pywikibot.stopme()
Copied: archive/old python 2.3 scripts/wikipedia.py (from rev 10463, trunk/pywikipedia/wikipedia.py)
===================================================================
--- archive/old python 2.3 scripts/wikipedia.py (rev 0)
+++ archive/old python 2.3 scripts/wikipedia.py 2012-09-16 13:48:36 UTC (rev 10528)
@@ -0,0 +1,8639 @@
+# -*- coding: utf-8 -*-
+"""
+Library to get and put pages on a MediaWiki.
+
+Contents of the library (objects and functions to be used outside)
+
+Classes:
+ Page(site, title): A page on a MediaWiki site
+ ImagePage(site, title): An image descriptor Page
+ Site(lang, fam): A MediaWiki site
+
+Factory functions:
+ Family(name): Import the named family
+ getSite(lang, fam): Return a Site instance
+
+Exceptions:
+ Error: Base class for all exceptions in this module
+ NoUsername: Username is not in user-config.py
+ NoPage: Page does not exist on the wiki
+ NoSuchSite: Site does not exist
+ IsRedirectPage: Page is a redirect page
+ IsNotRedirectPage: Page is not a redirect page
+ LockedPage: Page is locked
+ SectionError: The section specified in the Page title does not exist
+ PageNotSaved: Saving the page has failed
+ EditConflict: PageNotSaved due to edit conflict while uploading
+ SpamfilterError: PageNotSaved due to MediaWiki spam filter
+ LongPageError: PageNotSaved due to length limit
+ ServerError: Got unexpected response from wiki server
+ BadTitle: Server responded with BadTitle
+ UserBlocked: Client's username or IP has been blocked
+ PageNotFound: Page not found in list
+
+Objects:
+ get_throttle: Call to limit rate of read-access to wiki
+ put_throttle: Call to limit rate of write-access to wiki
+
+Other functions:
+ getall(): Load a group of pages
+ handleArgs(): Process all standard command line arguments (such as
+ -family, -lang, -log and others)
+ translate(xx, dict): dict is a dictionary, giving text depending on
+ language, xx is a language. Returns the text in the most applicable
+ language for the xx: wiki
+ setAction(text): Use 'text' instead of "Wikipedia python library" in
+ edit summaries
+ setUserAgent(text): Sets the string being passed to the HTTP server as
+ the User-agent: header. Defaults to 'Pywikipediabot/1.0'.
+
+ output(text): Prints the text 'text' in the encoding of the user's
+ console. **Use this instead of "print" statements**
+ input(text): Asks input from the user, printing the text 'text' first.
+ inputChoice: Shows user a list of choices and returns user's selection.
+
+ showDiff(oldtext, newtext): Prints the differences between oldtext and
+ newtext on the screen
+
+Wikitext manipulation functions: each of these takes a unicode string
+containing wiki text as its first argument, and returns a modified version
+of the text unless otherwise noted --
+
+ replaceExcept: replace all instances of 'old' by 'new', skipping any
+ instances of 'old' within comments and other special text blocks
+ removeDisabledParts: remove text portions exempt from wiki markup
+ isDisabled(text,index): return boolean indicating whether text[index] is
+ within a non-wiki-markup section of text
+ decodeEsperantoX: decode Esperanto text using the x convention.
+ encodeEsperantoX: convert wikitext to the Esperanto x-encoding.
+ findmarker(text, startwith, append): return a string which is not part
+ of text
+ expandmarker(text, marker, separator): return marker string expanded
+ backwards to include separator occurrences plus whitespace
+
+Wikitext manipulation functions for interlanguage links:
+
+ getLanguageLinks(text,xx): extract interlanguage links from text and
+ return in a dict
+ removeLanguageLinks(text): remove all interlanguage links from text
+ removeLanguageLinksAndSeparator(text, site, marker, separator = ''):
+ remove language links, whitespace, preceding separators from text
+ replaceLanguageLinks(oldtext, new): remove the language links and
+ replace them with links from a dict like the one returned by
+ getLanguageLinks
+ interwikiFormat(links): convert a dict of interlanguage links to text
+ (using same dict format as getLanguageLinks)
+ interwikiSort(sites, inSite): sorts a list of sites according to interwiki
+ sort preference of inSite.
+ url2link: Convert urlname of a wiki page into interwiki link format.
+
+Wikitext manipulation functions for category links:
+
+ getCategoryLinks(text): return list of Category objects corresponding
+ to links in text
+ removeCategoryLinks(text): remove all category links from text
+ replaceCategoryLinksAndSeparator(text, site, marker, separator = ''):
+ remove language links, whitespace, preceding separators from text
+ replaceCategoryLinks(oldtext,new): replace the category links in oldtext by
+ those in a list of Category objects
+ replaceCategoryInPlace(text,oldcat,newtitle): replace a single link to
+ oldcat with a link to category given by newtitle
+ categoryFormat(links): return a string containing links to all
+ Categories in a list.
+
+Unicode utility functions:
+ UnicodeToAsciiHtml: Convert unicode to a bytestring using HTML entities.
+ url2unicode: Convert url-encoded text to unicode using a site's encoding.
+ unicode2html: Ensure unicode string is encodable; if not, convert it to
+ ASCII for HTML.
+ html2unicode: Replace HTML entities in text with unicode characters.
+
+stopme(): Put this on a bot when it is no longer communicating with the Wiki.
+ It will remove the bot from the list of running processes, and thus
+ not slow down other bot threads anymore.
+
+"""
+from __future__ import generators
+#
+# (C) Pywikipedia bot team, 2003-2012
+#
+# Distributed under the terms of the MIT license.
+#
+__version__ = '$Id$'
+
+import os, sys
+import httplib, socket, urllib, urllib2, cookielib
+import traceback
+import time, threading, Queue
+import math
+import re, codecs, difflib, locale
+try:
+ from hashlib import md5
+except ImportError: # Python 2.4 compatibility
+ from md5 import new as md5
+import xml.sax, xml.sax.handler
+import htmlentitydefs
+import warnings
+import unicodedata
+import xmlreader
+from BeautifulSoup import BeautifulSoup, BeautifulStoneSoup, SoupStrainer
+import weakref
+# Splitting the bot into library parts
+from pywikibot import *
+
+# Set the locale to system default. This will ensure correct string
+# handling for non-latin characters on Python 2.3.x. For Python 2.4.x it's no
+# longer needed.
+locale.setlocale(locale.LC_ALL, '')
+
+import config, login, query, version
+
+try:
+ set # introduced in Python2.4: faster and future
+except NameError:
+ from sets import Set as set
+
+# Check Unicode support (is this a wide or narrow python build?)
+# See http://www.python.org/doc/peps/pep-0261/
+try:
+ unichr(66365) # a character in th: alphabet, uses 32 bit encoding
+ WIDEBUILD = True
+except ValueError:
+ WIDEBUILD = False
+
+
+SaxError = xml.sax._exceptions.SAXParseException
+
+# Pre-compile re expressions
+reNamespace = re.compile("^(.+?) *: *(.*)$")
+Rwatch = re.compile(
+ r"<input type='hidden' value=\"(.*?)\"
name=\"wpEditToken\"")
+Rwatchlist = re.compile(r"<input tabindex='[\d]+' type='checkbox'
"
+ r"name='wpWatchthis'
checked='checked'")
+Rlink = re.compile(r'\[\[(?P<title>[^\]\|\[]*)(\|[^\]]*)?\]\]')
+
+
+# Page objects (defined here) represent the page itself, including its contents.
+class Page(object):
+ """Page: A MediaWiki page
+
+ Constructor has two required parameters:
+ 1) The wiki Site on which the page resides [note that, if the
+ title is in the form of an interwiki link, the Page object may
+ have a different Site than this]
+ 2) The title of the page as a unicode string
+
+ Optional parameters:
+ insite - the wiki Site where this link was found (to help decode
+ interwiki links)
+ defaultNamespace - A namespace to use if the link does not contain one
+
+ Methods available:
+
+ title : The name of the page, including namespace and
+ section if any
+ urlname : Title, in a form suitable for a URL
+ namespace : The namespace in which the page is found
+ section : The section of the page (the part of the title
+ after '#', if any)
+ sectionFreeTitle : Title, without the section part
+ site : The wiki this page is in
+ encoding : The encoding of the page
+ isAutoTitle : Title can be translated using the autoFormat method
+ autoFormat : Auto-format certain dates and other standard
+ format page titles
+ isCategory : True if the page is a category
+ isDisambig (*) : True if the page is a disambiguation page
+ isImage : True if the page is an image
+ isRedirectPage (*) : True if the page is a redirect, false otherwise
+ getRedirectTarget (*) : The page the page redirects to
+ isTalkPage : True if the page is in any "talk" namespace
+ toggleTalkPage : Return the talk page (if this is one, return the
+ non-talk page)
+ get (*) : The text of the page
+ getSections (*) : Retrieve page section heading and assign them to
+ the byte offset
+ latestRevision (*) : The page's current revision id
+ userName : Last user to edit page
+ userNameHuman : Last human (non-bot) user to edit page
+ isIpEdit : True if last editor was unregistered
+ editTime : Timestamp of the last revision to the page
+ previousRevision (*) : The revision id of the previous version
+ permalink (*) : The url of the permalink of the current version
+ getOldVersion(id) (*) : The text of a previous version of the page
+ getRestrictions : Returns a protection dictionary
+ getVersionHistory : Load the version history information from wiki
+ getVersionHistoryTable: Create a wiki table from the history data
+ fullVersionHistory : Return all past versions including wikitext
+ contributingUsers : Return set of users who have edited page
+ getCreator : Function to get the first editor of a page
+ getLatestEditors : Function to get the last editors of a page
+ exists (*) : True if the page actually exists, false otherwise
+ isEmpty (*) : True if the page has 4 characters or less content,
+ not counting interwiki and category links
+ interwiki (*) : The interwiki links from the page (list of Pages)
+ categories (*) : The categories the page is in (list of Pages)
+ linkedPages (*) : The normal pages linked from the page (list of
+ Pages)
+ imagelinks (*) : The pictures on the page (list of ImagePages)
+ templates (*) : All templates referenced on the page (list of
+ Pages)
+ templatesWithParams(*): All templates on the page, with list of parameters
+ getReferences : List of pages linking to the page
+ canBeEdited (*) : True if page is unprotected or user has edit
+ privileges
+ protection(*) : This page protection level
+ botMayEdit (*) : True if bot is allowed to edit page
+ put(newtext) : Saves the page
+ put_async(newtext) : Queues the page to be saved asynchronously
+ append(newtext) : Append to page section
+ watch : Add the page to the watchlist
+ unwatch : Remove the page from the watchlist
+ move : Move the page to another title
+ delete : Deletes the page (requires being logged in)
+ protect : Protect or unprotect a page (requires sysop status)
+ removeImage : Remove all instances of an image from this page
+ replaceImage : Replace all instances of an image with another
+ loadDeletedRevisions : Load all deleted versions of this page
+ getDeletedRevision : Return a particular deleted revision
+ markDeletedRevision : Mark a version to be undeleted, or not
+ undelete : Undelete past version(s) of the page
+ purgeCache : Purge page from server cache
+
+ (*) : This loads the page if it has not been loaded before; permalink might
+ even reload it if it has been loaded before
+
+ """
+ def __init__(self, site, title, insite=None, defaultNamespace=0):
+ """Instantiate a Page object.
+
+ """
+ try:
+ # if _editrestriction is True, it means that the page has been found
+ # to have an edit restriction, but we do not know yet whether the
+ # restriction affects us or not
+ self._editrestriction = False
+
+ if site is None or isinstance(site, basestring):
+ site = getSite(site)
+ self._site = site
+
+ if not insite:
+ insite = site
+
+ # Clean up the name, it can come from anywhere.
+ # Convert HTML entities to unicode
+ t = html2unicode(title)
+
+ # Convert URL-encoded characters to unicode
+ # Sometimes users copy the link to a site from one to another.
+ # Try both the source site and the destination site to decode.
+ try:
+ t = url2unicode(t, site=insite, site2=site)
+ except UnicodeDecodeError:
+ raise InvalidTitle(u'Bad page title : %s' % t)
+
+ # Normalize unicode string to a NFC (composed) format to allow
+ # proper string comparisons. According to
+ # http://svn.wikimedia.org/viewvc/mediawiki/branches/REL1_6/phase3/includes/n…
+ # the mediawiki code normalizes everything to NFC, not NFKC
+ # (which might result in information loss).
+ t = unicodedata.normalize('NFC', t)
+
+ if u'\ufffd' in t:
+ raise InvalidTitle("Title contains illegal char (\\uFFFD)")
+
+ # Replace underscores by spaces
+ t = t.replace(u"_", u" ")
+ # replace multiple spaces with a single space
+ while u"  " in t: t = t.replace(u"  ", u" ")
+ # Strip spaces at both ends
+ t = t.strip()
+ # Remove left-to-right and right-to-left markers.
+ t = t.replace(u'\u200e', '').replace(u'\u200f', '')
+ # leading colon implies main namespace instead of the default
+ if t.startswith(':'):
+ t = t[1:]
+ self._namespace = 0
+ else:
+ self._namespace = defaultNamespace
+
+ if not t:
+ raise InvalidTitle(u"Invalid title '%s'" % title )
+
+ self._namespace = defaultNamespace
+ #
+ # This code was adapted from Title.php : secureAndSplit()
+ #
+ # Namespace or interwiki prefix
+ while True:
+ m = reNamespace.match(t)
+ if not m:
+ break
+ p = m.group(1)
+ lowerNs = p.lower()
+ ns = self._site.getNamespaceIndex(lowerNs)
+ if ns:
+ t = m.group(2)
+ self._namespace = ns
+ break
+
+ if lowerNs in self._site.family.langs.keys():
+ # Interwiki link
+ t = m.group(2)
+
+ # Redundant interwiki prefix to the local wiki
+ if lowerNs == self._site.lang:
+ if t == '':
+ raise Error("Can't have an empty self-link")
+ else:
+ self._site = getSite(lowerNs, self._site.family.name)
+ if t == '':
+ t = self._site.mediawiki_message('Mainpage')
+
+ # If there's an initial colon after the interwiki, that also
+ # resets the default namespace
+ if t != '' and t[0] == ':':
+ self._namespace = 0
+ t = t[1:]
+ elif lowerNs in self._site.family.get_known_families(site = self._site):
+ if self._site.family.get_known_families(site = self._site)[lowerNs] == self._site.family.name:
+ t = m.group(2)
+ else:
+ # This page is from a different family
+ if verbose:
+ output(u"Target link '%s' has different family
'%s'" % (title, lowerNs))
+ if self._site.family.name in ['commons', 'meta']:
+ #When the source wiki is commons or meta,
+ #w:page redirects you to w:en:page
+ otherlang = 'en'
+ else:
+ otherlang = self._site.lang
+ familyName = self._site.family.get_known_families(site = self._site)[lowerNs]
+ if familyName in ['commons', 'meta']:
+ otherlang = familyName
+ try:
+ self._site = getSite(otherlang, familyName)
+ except ValueError:
+ raise NoPage("""\
+%s is not a local page on %s, and the %s family is
+not supported by PyWikipediaBot!"""
+ % (title, self._site, familyName))
+ t = m.group(2)
+ else:
+ # If there's no recognized interwiki or namespace,
+ # then let the colon expression be part of the title.
+ break
+
+ sectionStart = t.find(u'#')
+ # But maybe there are magic words like {{#time|}}
+ # TODO: recognize magic word and templates inside links
+ # see http://la.wikipedia.org/w/index.php?title=997_Priska&diff=prev&oldi…
+ if sectionStart > 0:
+ # Categories do not have sections.
+ if self._namespace == 14:
+ raise InvalidTitle(u"Invalid section in category
'%s'" % t)
+ else:
+ t, sec = t.split(u'#', 1)
+ self._section = sec.lstrip() or None
+ t = t.rstrip()
+ elif sectionStart == 0:
+ raise InvalidTitle(u"Invalid title starting with a #:
'%s'" % t)
+ else:
+ self._section = None
+
+ if t:
+ if not self._site.nocapitalize:
+ t = t[:1].upper() + t[1:]
+
+ # reassemble the title from its parts
+ if self._namespace != 0:
+ t = u'%s:%s' % (self._site.namespace(self._namespace), t)
+ if self._section:
+ t += u'#' + self._section
+
+ self._title = t
+ self.editRestriction = None
+ self.moveRestriction = None
+ self._permalink = None
+ self._userName = None
+ self._ipedit = None
+ self._editTime = None
+ self._startTime = '0'
+ # For the Flagged Revisions MediaWiki extension
+ self._revisionId = None
+ self._deletedRevs = None
+ except NoSuchSite:
+ raise
+ except:
+ if verbose:
+ output(u"Exception in Page constructor")
+ output(
+ u"site=%s, title=%s, insite=%s, defaultNamespace=%i"
+ % (site, title, insite, defaultNamespace)
+ )
+ raise
+
+ @property
+ def site(self):
+ """Return the Site object for the wiki on which this Page
resides."""
+ return self._site
+
+ def namespace(self):
+ """Return the number of the namespace of the page.
+
+ Only recognizes those namespaces defined in family.py.
+ If not defined, it will return 0 (the main namespace).
+
+ """
+ return self._namespace
+
+ def encoding(self):
+ """Return the character encoding used on this Page's wiki
Site."""
+ return self._site.encoding()
+
+ @deprecate_arg("decode", None)
+ def title(self, underscore=False, savetitle=False, withNamespace=True,
+ withSection=True, asUrl=False, asLink=False,
+ allowInterwiki=True, forceInterwiki=False, textlink=False,
+ as_filename=False):
+ """Return the title of this Page, as a Unicode string.
+
+ @param underscore: if true, replace all ' ' characters with '_'
+ @param withNamespace: if false, omit the namespace prefix
+ @param withSection: if false, omit the section
+ @param asUrl: - not implemented yet -
+ @param asLink: if true, return the title in the form of a wikilink
+ @param allowInterwiki: (only used if asLink is true) if true, format
+ the link as an interwiki link if necessary
+ @param forceInterwiki: (only used if asLink is true) if true, always
+ format the link as an interwiki link
+ @param textlink: (only used if asLink is true) if true, place a ':'
+ before Category: and Image: links
+ @param as_filename: - not implemented yet -
+ @param savetitle: if True, encode any wiki syntax in the title.
+
+ """
+ title = self._title
+ if not withNamespace and self.namespace() != 0:
+ title = title.split(':', 1)[1]
+ if asLink:
+ iw_target_site = getSite()
+ iw_target_family = getSite().family
+ if iw_target_family.interwiki_forward:
+ iw_target_family = pywikibot.Family(iw_target_family.interwiki_forward)
+
+ if allowInterwiki and (forceInterwiki or self._site != iw_target_site):
+ colon = ""
+ if textlink:
+ colon = ":"
+ if self._site.family != iw_target_family \
+ and self._site.family.name != self._site.lang:
+ title = u'[[%s%s:%s:%s]]' % (colon, self._site.family.name,
+ self._site.lang, title)
+ else:
+ title = u'[[%s%s:%s]]' % (colon, self._site.lang, title)
+ elif textlink and (self.isImage() or self.isCategory()):
+ title = u'[[:%s]]' % title
+ else:
+ title = u'[[%s]]' % title
+ if savetitle or asLink:
+ # Ensure there's no wiki syntax in the title
+ title = title.replace(u"''", u'%27%27')
+ if underscore:
+ title = title.replace(' ', '_')
+ if not withSection:
+ sectionName = self.section(underscore=underscore)
+ if sectionName:
+ title = title[:-len(sectionName)-1]
+ return title
+
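The flag combinations in title() map onto the different link forms documented in its parameter list. Assuming the working (default) site is en.wikipedia and the page lives on de.wikipedia, the expected results would be roughly:

    # p = Page(getSite('de', 'wikipedia'), u'Beispiel')
    # p.title()                           -> u'Beispiel'
    # p.title(underscore=True)            -> u'Beispiel'   (spaces, if any, become '_')
    # p.title(asLink=True)                -> u'[[de:Beispiel]]'
    # p.title(asLink=True, textlink=True) -> u'[[:de:Beispiel]]'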
+ #(a)deprecated("Page.title(withNamespace=False)")
+ def titleWithoutNamespace(self, underscore=False):
+ """Return title of Page without namespace and without
section."""
+ return self.title(underscore=underscore, withNamespace=False,
+ withSection=False)
+
+ def titleForFilename(self):
+ """
+ Return the title of the page in a form suitable for a filename on
+ the user's file system.
+ """
+ result = self.title()
+ # Replace characters that are not possible in file names on some
+ # systems.
+ # Spaces are possible on most systems, but are bad for URLs.
+ for forbiddenChar in ':*?/\\ ':
+ result = result.replace(forbiddenChar, '_')
+ return result
+
+ @deprecate_arg("decode", None)
+ def section(self, underscore = False):
+ """Return the name of the section this Page refers to.
+
+ The section is the part of the title following a '#' character, if
+ any. If no section is present, return None.
+
+ """
+ section = self._section
+ if section and underscore:
+ section = section.replace(' ', '_')
+ return section
+
+ def sectionFreeTitle(self, underscore=False):
+ """Return the title of this Page, without the section (if
any)."""
+ sectionName = self.section(underscore=underscore)
+ title = self.title(underscore=underscore)
+ if sectionName:
+ return title[:-len(sectionName)-1]
+ else:
+ return title
+
+ def urlname(self, withNamespace=True):
+ """Return the Page title encoded for use in an
URL."""
+ title = self.title(withNamespace=withNamespace, underscore=True)
+ encodedTitle = title.encode(self.site().encoding())
+ return urllib.quote(encodedTitle)
+
+ def __str__(self):
+ """Return a console representation of the
pagelink."""
+ return self.title(asLink=True, forceInterwiki=True
+ ).encode(config.console_encoding,
+ "xmlcharrefreplace")
+
+ def __unicode__(self):
+ return self.title(asLink=True, forceInterwiki=True)
+
+ def __repr__(self):
+ """Return a more complete string
representation."""
+ return "%s{%s}" % (self.__class__.__name__,
+ self.title(asLink=True).encode(config.console_encoding))
+
+ def __cmp__(self, other):
+ """Test for equality and inequality of Page objects.
+
+ Page objects are "equal" if and only if they are on the same site
+ and have the same normalized title, including section if any.
+
+ Page objects are sortable by namespace first, then by title.
+
+ """
+ if not isinstance(other, Page):
+ # especially, return -1 if other is None
+ return -1
+ if self._site == other._site:
+ return cmp(self._title, other._title)
+ else:
+ return cmp(self._site, other._site)
+
+ def __hash__(self):
+ # Pseudo method that makes it possible to store Page objects as keys
+ # in hash-tables. This relies on the fact that the string
+ # representation of an instance can not change after the construction.
+ return hash(unicode(self))
+
+ @deprecated("Page.title(asLink=True)")
+ def aslink(self, forceInterwiki=False, textlink=False, noInterwiki=False):
+ """Return a string representation in the form of a wikilink.
+
+ If forceInterwiki is True, return an interwiki link even if it
+ points to the home wiki. If False, return an interwiki link only if
+ needed.
+
+ If textlink is True, always return a link in text form (that is,
+ interwiki links and internal links to the Category: and Image:
+ namespaces will be preceded by a : character).
+
+ DEPRECATED to merge to rewrite branch:
+ use self.title(asLink=True) instead.
+ """
+ return self.title(asLink=True, forceInterwiki=forceInterwiki,
+ allowInterwiki=not noInterwiki, textlink=textlink)
+
+ def autoFormat(self):
+ """Return (dictName, value) if title is in date.autoFormat
dictionary.
+
+ Value can be a year, date, etc., and dictName is 'YearBC',
+ 'Year_December', or another dictionary name. Please note that two
+ entries may have exactly the same autoFormat, but be in two
+ different namespaces, as some sites have categories with the
+ same names. Regular titles return (None, None).
+
+ """
+ if not hasattr(self, '_autoFormat'):
+ import date
+ self._autoFormat = date.getAutoFormat(self.site().language(),
+ self.title(withNamespace=False))
+ return self._autoFormat
+
+ def isAutoTitle(self):
+ """Return True if title of this Page is in the autoFormat
dictionary."""
+ return self.autoFormat()[0] is not None
+
+ def get(self, force=False, get_redirect=False, throttle=True,
+ sysop=False, change_edit_time=True, expandtemplates=False):
+ """Return the wiki-text of the page.
+
+ This will retrieve the page from the server if it has not been
+ retrieved yet, or if force is True. This can raise the following
+ exceptions that should be caught by the calling code:
+
+ @exception NoPage The page does not exist
+ @exception IsRedirectPage The page is a redirect. The argument of the
+ exception is the title of the page it
+ redirects to.
+ @exception SectionError The section does not exist on a page with
+ a # link
+
+ @param force reload all page attributes, including errors.
+ @param get_redirect return the redirect text, do not follow the
+ redirect, do not raise an exception.
+ @param sysop if the user has a sysop account, use it to
+ retrieve this page
+ @param change_edit_time if False, do not check this version for
+ changes before saving. This should be used only
+ if the page has been loaded previously.
+ @param expandtemplates all templates in the page content are fully
+ resolved too (if API is used).
+
+ """
+ # NOTE: The following few NoPage exceptions could already be thrown at
+ # the Page() constructor. They are raised here instead for convenience,
+ # because all scripts are prepared for NoPage exceptions raised by
+ # get(), but not for such raised by the constructor.
+ # \ufffd represents a badly encoded character, the other characters are
+ # disallowed by MediaWiki.
+ for illegalChar in u'#<>[]|{}\n\ufffd':
+ if illegalChar in self.sectionFreeTitle():
+ if verbose:
+ output(u'Illegal character in %s!'
+ % self.title(asLink=True))
+ raise NoPage('Illegal character in %s!'
+ % self.title(asLink=True))
+ if self.namespace() == -1:
+ raise NoPage('%s is in the Special namespace!'
+ % self.title(asLink=True))
+ if self.site().isInterwikiLink(self.title()):
+ raise NoPage('%s is not a local page on %s!'
+ % (self.title(asLink=True), self.site()))
+ if force:
+ # When forcing, we retry the page no matter what:
+ # * Old exceptions and contents do not apply any more
+ # * Deleting _contents and _expandcontents to force reload
+ for attr in ['_redirarg', '_getexception',
+ '_contents', '_expandcontents',
+ '_sections']:
+ if hasattr(self, attr):
+ delattr(self, attr)
+ else:
+ # Make sure we re-raise an exception we got on an earlier attempt
+ if hasattr(self, '_redirarg') and not get_redirect:
+ raise IsRedirectPage, self._redirarg
+ elif hasattr(self, '_getexception'):
+ if self._getexception == IsRedirectPage and get_redirect:
+ pass
+ else:
+ raise self._getexception
+ # Make sure we did try to get the contents once
+ if expandtemplates:
+ attr = '_expandcontents'
+ else:
+ attr = '_contents'
+ if not hasattr(self, attr):
+ try:
+ contents = self._getEditPage(get_redirect=get_redirect, throttle=throttle, sysop=sysop,
+ expandtemplates = expandtemplates)
+ if expandtemplates:
+ self._expandcontents = contents
+ else:
+ self._contents = contents
+ hn = self.section()
+ if hn:
+ m = re.search("=+[ ']*%s[ ']*=+" % re.escape(hn),
+ self._contents)
+ if verbose and not m:
+ output(u"WARNING: Section does not exist: %s" % self)
+ # Store any exceptions for later reference
+ except NoPage:
+ self._getexception = NoPage
+ raise
+ except IsRedirectPage, arg:
+ self._getexception = IsRedirectPage
+ self._redirarg = arg
+ if not get_redirect:
+ raise
+ except SectionError:
+ self._getexception = SectionError
+ raise
+ except UserBlocked:
+ if self.site().loggedInAs(sysop=sysop):
+ raise UserBlocked(self.site(), unicode(self))
+ else:
+ if verbose:
+ output("The IP address is blocked, retry by login.")
+ self.site().forceLogin(sysop=sysop)
+ return self.get(force, get_redirect, throttle, sysop, change_edit_time)
+ if expandtemplates:
+ return self._expandcontents
+ return self._contents
+
+ def _getEditPage(self, get_redirect=False, throttle=True, sysop=False,
+ oldid=None, change_edit_time=True, expandtemplates=False):
+ """Get the contents of the Page via API query
+
+ Do not use this directly, use get() instead.
+
+ Arguments:
+ oldid - Retrieve an old revision (by id), not the current one
+ get_redirect - Get the contents, even if it is a redirect page
+ expandtemplates - Fully resolve templates within page content
+ (if API is used)
+
+ This method returns the raw wiki text as a unicode string.
+ """
+ if not self.site().has_api() or self.site().versionnumber() < 12:
+ return self._getEditPageOld(get_redirect, throttle, sysop, oldid,
change_edit_time)
+ params = {
+ 'action': 'query',
+ 'titles': self.title(),
+ 'prop': ['revisions', 'info'],
+            'rvprop': ['content', 'ids', 'flags', 'timestamp', 'user', 'comment', 'size'],
+ 'rvlimit': 1,
+ #'talkid' valid for release > 1.12
+ #'url', 'readable' valid for release > 1.14
+ 'inprop': ['protection', 'subjectid'],
+ #'intoken': 'edit',
+ }
+ if oldid:
+ params['rvstartid'] = oldid
+ if expandtemplates:
+ params[u'rvexpandtemplates'] = u''
+
+ if throttle:
+ get_throttle()
+ textareaFound = False
+ # retrying loop is done by query.GetData
+ data = query.GetData(params, self.site(), sysop=sysop)
+ if 'error' in data:
+ raise RuntimeError("API query error: %s" % data)
+ if not 'pages' in data['query']:
+ raise RuntimeError("API query error, no pages found: %s" % data)
+ pageInfo = data['query']['pages'].values()[0]
+ if data['query']['pages'].keys()[0] == "-1":
+ if 'missing' in pageInfo:
+ raise NoPage(self.site(), unicode(self),
+"Page does not exist. In rare cases, if you are certain the page does exist, look
into overriding family.RversionTab")
+ elif 'invalid' in pageInfo:
+ raise BadTitle('BadTitle: %s' % self)
+ elif 'revisions' in pageInfo: #valid Title
+ lastRev = pageInfo['revisions'][0]
+ if isinstance(lastRev['*'], basestring):
+ textareaFound = True
+        # I got page data with 'revisions' in pageInfo but
+        # lastRev['*'] = False instead of the content. The page itself was
+        # deleted, but 'missing' was not in pageInfo as expected.
+        # For now a ServerError() is raised, but maybe it should be NoPage().
+ if not textareaFound:
+ if verbose:
+ print pageInfo
+ raise ServerError('ServerError: No textarea found in %s' % self)
+
+ self.editRestriction = ''
+ self.moveRestriction = ''
+
+ # Note: user may be hidden and mw returns 'userhidden' flag
+ if 'userhidden' in lastRev:
+ self._userName = None
+ else:
+ self._userName = lastRev['user']
+ self._ipedit = 'anon' in lastRev
+ for restr in pageInfo['protection']:
+ if restr['type'] == 'edit':
+ self.editRestriction = restr['level']
+ elif restr['type'] == 'move':
+ self.moveRestriction = restr['level']
+
+ self._revisionId = lastRev['revid']
+
+ if change_edit_time:
+ self._editTime = parsetime2stamp(lastRev['timestamp'])
+ if "starttimestamp" in pageInfo:
+ self._startTime = parsetime2stamp(pageInfo["starttimestamp"])
+
+ self._isWatched = False #cannot handle in API in my research for now.
+
+ pagetext = lastRev['*']
+ pagetext = pagetext.rstrip()
+ # pagetext must not decodeEsperantoX() if loaded via API
+ m = self.site().redirectRegex().match(pagetext)
+ if m:
+ # page text matches the redirect pattern
+ if self.section() and not "#" in m.group(1):
+ redirtarget = "%s#%s" % (m.group(1), self.section())
+ else:
+ redirtarget = m.group(1)
+ if get_redirect:
+ self._redirarg = redirtarget
+ else:
+ raise IsRedirectPage(redirtarget)
+
+ if self.section() and \
+ not textlib.does_text_contain_section(pagetext, self.section()):
+ try:
+ self._getexception
+ except AttributeError:
+ raise SectionError # Page has no section by this name
+ return pagetext
+
+ def _getEditPageOld(self, get_redirect=False, throttle=True, sysop=False,
+ oldid=None, change_edit_time=True):
+        """Get the contents of the Page via the edit page."""
+
+ if verbose:
+ output(u'Getting page %s' % self.title(asLink=True))
+ path = self.site().edit_address(self.urlname())
+ if oldid:
+ path += "&oldid="+oldid
+ # Make sure Brion doesn't get angry by waiting if the last time a page
+ # was retrieved was not long enough ago.
+ if throttle:
+ get_throttle()
+ textareaFound = False
+ retry_idle_time = 1
+ while not textareaFound:
+ text = self.site().getUrl(path, sysop = sysop)
+
+ if "<title>Wiki does not exist</title>" in text:
+ raise NoSuchSite(u'Wiki %s does not exist yet' % self.site())
+
+ # Extract the actual text from the textarea
+ m1 = re.search('<textarea([^>]*)>', text)
+ m2 = re.search('</textarea>', text)
+ if m1 and m2:
+ i1 = m1.end()
+ i2 = m2.start()
+ textareaFound = True
+ else:
+                # search for messages with no "view source" (aren't used in new versions)
+                if self.site().mediawiki_message('whitelistedittitle') in text:
+                    raise NoPage(u'Page editing is forbidden for anonymous users.')
+                elif self.site().has_mediawiki_message('nocreatetitle') and self.site().mediawiki_message('nocreatetitle') in text:
+ raise NoPage(self.site(), unicode(self))
+ # Bad title
+ elif 'var wgPageName = "Special:Badtitle";' in text \
+ or self.site().mediawiki_message('badtitle') in text:
+ raise BadTitle('BadTitle: %s' % self)
+ # find out if the username or IP has been blocked
+ elif self.site().isBlocked():
+ raise UserBlocked(self.site(), unicode(self))
+ # If there is no text area and the heading is 'View Source'
+ # but user is not blocked, the page does not exist, and is
+ # locked
+ elif self.site().mediawiki_message('viewsource') in text:
+ raise NoPage(self.site(), unicode(self))
+                # Some of the newest versions don't have a "view source" tag for
+                # non-existent pages.
+                # Also check the div class, because if the language is not English
+                # the bot cannot see that the page is blocked.
+ elif self.site().mediawiki_message('badaccess') in text or \
+ "<div class=\"permissions-errors\">" in text:
+ raise NoPage(self.site(), unicode(self))
+ elif config.retry_on_fail:
+ if "<title>Wikimedia Error</title>" in text:
+ output( u"Wikimedia has technical problems; will retry in %i
minutes." % retry_idle_time)
+ else:
+ output( unicode(text) )
+                    # We assume that the server is down. Wait some time, then try again.
+                    output( u"WARNING: No text area found on %s%s. Maybe the server is down. Retrying in %i minutes..." % (self.site().hostname(), path, retry_idle_time) )
+ time.sleep(retry_idle_time * 60)
+ # Next time wait longer, but not longer than half an hour
+ retry_idle_time *= 2
+ if retry_idle_time > 30:
+ retry_idle_time = 30
+ else:
+ output( u"Failed to access wiki")
+ sys.exit(1)
+ # Check for restrictions
+ m = re.search('var wgRestrictionEdit = \\["(\w+)"\\]', text)
+ if m:
+ if verbose:
+ output(u"DBG> page is locked for group %s" % m.group(1))
+ self.editRestriction = m.group(1);
+ else:
+ self.editRestriction = ''
+ m = re.search('var wgRestrictionMove = \\["(\w+)"\\]', text)
+ if m:
+ self.moveRestriction = m.group(1);
+ else:
+ self.moveRestriction = ''
+        m = re.search('name=["\']baseRevId["\'] type=["\']hidden["\'] value="(\d+)"', text)
+ if m:
+ self._revisionId = m.group(1)
+ if change_edit_time:
+ # Get timestamps
+ m = re.search('value="(\d+)"
name=["\']wpEdittime["\']', text)
+ if m:
+ self._editTime = m.group(1)
+ else:
+ self._editTime = "0"
+ m = re.search('value="(\d+)"
name=["\']wpStarttime["\']', text)
+ if m:
+ self._startTime = m.group(1)
+ else:
+ self._startTime = "0"
+ # Find out if page actually exists. Only existing pages have a
+ # version history tab.
+ if self.site().family.RversionTab(self.site().language()):
+ # In case a family does not have version history tabs, or in
+ # another form
+ RversionTab =
re.compile(self.site().family.RversionTab(self.site().language()))
+ else:
+            RversionTab = re.compile(r'<li id="ca-history"><a href=".*?title=.*?&action=history".*?>.*?</a></li>', re.DOTALL)
+ matchVersionTab = RversionTab.search(text)
+ if not matchVersionTab and not self.site().family.name == 'wikitravel':
+ raise NoPage(self.site(), unicode(self),
+"Page does not exist. In rare cases, if you are certain the page does exist, look
into overriding family.RversionTab" )
+ # Look if the page is on our watchlist
+ matchWatching = Rwatchlist.search(text)
+ if matchWatching:
+ self._isWatched = True
+ else:
+ self._isWatched = False
+ # Now process the contents of the textarea
+ # Unescape HTML characters, strip whitespace
+ pagetext = text[i1:i2]
+ pagetext = unescape(pagetext)
+ pagetext = pagetext.rstrip()
+ if self.site().lang == 'eo':
+ pagetext = decodeEsperantoX(pagetext)
+ m = self.site().redirectRegex().match(pagetext)
+ if m:
+ # page text matches the redirect pattern
+ if self.section() and not "#" in m.group(1):
+ redirtarget = "%s#%s" % (m.group(1), self.section())
+ else:
+ redirtarget = m.group(1)
+ if get_redirect:
+ self._redirarg = redirtarget
+ else:
+ raise IsRedirectPage(redirtarget)
+
+ if self.section() and \
+ not textlib.does_text_contain_section(text, self.section()):
+ try:
+ self._getexception
+ except AttributeError:
+ raise SectionError # Page has no section by this name
+
+ return pagetext
+
+ def getOldVersion(self, oldid, force=False, get_redirect=False,
+ throttle=True, sysop=False, change_edit_time=True):
+        """Return text of an old revision of this page; same options as get().
+
+ @param oldid: The revid of the revision desired.
+
+ """
+ # TODO: should probably check for bad pagename, NoPage, and other
+ # exceptions that would prevent retrieving text, as get() does
+
+ # TODO: should this default to change_edit_time = False? If we're not
+ # getting the current version, why change the timestamps?
+ return self._getEditPage(
+ get_redirect=get_redirect, throttle=throttle,
+ sysop=sysop, oldid=oldid,
+ change_edit_time=change_edit_time
+ )
+
+ ## @since r10309
+ # @remarks needed by various bots
+ def getSections(self, minLevel=2, sectionsonly=False, force=False):
+ """Parses the page with API and return section information.
+
+ @param minLevel: The minimal level of heading for section to be reported.
+ @type minLevel: int
+ @param sectionsonly: Report only the result from API call, do not assign
+ the headings to wiki text (for compression e.g.).
+ @type sectionsonly: bool
+ @param force: Use API for full section list resolution, works always but
+ is extremely slow, since each single section has to be
retrieved.
+ @type force: bool
+
+        Returns a list with entries: (byteoffset, level, wikiline, line, anchor)
+        This list may be empty, and if sections are embedded by a template, the
+        corresponding byteoffset and wikiline entries are None. The wikiline is
+        the wiki text, line is the parsed text and anchor is the (unique) link label.
+        """
+        # ALWAYS replace 'byteoffset' by a self-calculated value, since the parsed
+        # text does not match the wiki text
+ # bug fix; JIRA: DRTRIGON-82
+
+ # was there already a call? already some info available?
+ if hasattr(self, '_sections'):
+ return self._sections
+
+ # Old exceptions and contents do not apply any more.
+ for attr in ['_sections']:
+ if hasattr(self, attr):
+ delattr(self,attr)
+
+ # call the wiki to get info
+ params = {
+ u'action' : u'parse',
+ u'page' : self.title(),
+ u'prop' : u'sections',
+ }
+
+ pywikibot.get_throttle()
+        pywikibot.output(u"Reading section info from %s via API..." % self.title(asLink=True))
+
+ result = query.GetData(params, self.site())
+ # JIRA: DRTRIGON-90; catch and convert error (convert it such that the whole page
gets processed later)
+ try:
+ r = result[u'parse'][u'sections']
+        except KeyError: # sequence of sometimes occurring "KeyError: u'parse'"
+            pywikibot.output(u'WARNING: Query result (gS): %r' % result)
+            raise pywikibot.Error('Problem occurred during data retrieval for sections in %s!' % self.title(asLink=True))
+ #debug_data = str(r) + '\n'
+ debug_data = str(result) + '\n'
+
+ if not sectionsonly:
+ # assign sections with wiki text and section byteoffset
+            #pywikibot.output(u"  Reading wiki page text (if not already done).")
+
+            debug_data += str(len(self.__dict__.get('_contents',u''))) + '\n'
+ self.get()
+ debug_data += str(len(self._contents)) + '\n'
+ debug_data += self._contents + '\n'
+
+ # code debugging
+ if verbose:
+ debugDump( 'Page.getSections', self.site,
'Page.getSections', debug_data.encode(config.textfile_encoding) )
+
+        for setting in [(0.05,0.95), (0.4,0.8), (0.05,0.8), (0.0,0.8)]: # 0.6 is default upper border
+ try:
+ pos = 0
+ for i, item in enumerate(r):
+ item[u'level'] = int(item[u'level'])
+ # byteoffset may be 0; 'None' means template
+                    #if (item[u'byteoffset'] != None) and item[u'line']:
+                    # (empty index means also template - workaround for bug:
+                    #  https://bugzilla.wikimedia.org/show_bug.cgi?id=32753)
+                    if (item[u'byteoffset'] != None) and item[u'line'] and item[u'index']:
+                        # section on this page and index in format u"%i"
+                        self._getSectionByteOffset(item, pos, force, cutoff=setting) # raises 'Error' if not successful!
+ pos = item[u'wikiline_bo'] +
len(item[u'wikiline'])
+ item[u'byteoffset'] = item[u'wikiline_bo']
+ else:
+                        # section embedded from a template (index in format u"T-%i") or the
+                        # parser was not able to recognize the section correctly (e.g. html) at all
+ # (the byteoffset, index, ... may be correct or not)
+ item[u'wikiline'] = None
+ r[i] = item
+ break
+ except pywikibot.Error:
+ pos = None
+ if (pos == None):
+ raise # re-raise
+
+ # check min. level
+ data = []
+ for item in r:
+ if (item[u'level'] < minLevel): continue
+ data.append( item )
+ r = data
+
+ # prepare resulting data
+        self._sections = [ (item[u'byteoffset'], item[u'level'], item[u'wikiline'], item[u'line'], item[u'anchor']) for item in r ]
+
+ return self._sections
+
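+    # Illustrative sketch of the tuple layout documented above (not part of the
+    # original script; the page title is hypothetical):
+    #
+    #   page = wikipedia.Page(wikipedia.getSite(), u'Example page')
+    #   for byteoffset, level, wikiline, line, anchor in page.getSections(minLevel=2):
+    #       # byteoffset and wikiline are None for sections embedded by a template
+    #       wikipedia.output(u'%s (level %i, anchor %s)' % (line, level, anchor))
+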
+ ## @since r10309
+ # @remarks needed by Page.getSections()
+ def _getSectionByteOffset(self, section, pos, force=False, cutoff=(0.05, 0.95)):
+        """Determine the byteoffset of the given section (can be slow due to another API call).
+ """
+ wikitextlines = self._contents[pos:].splitlines()
+ possible_headers = []
+ #print section[u'line']
+
+ if not force:
+ # how the heading should look like (re)
+ l = section[u'level']
+            headers = [ u'^(\s*)%(spacer)s(.*?)%(spacer)s(\s*)((<!--(.*?)-->)?)(\s*)$' % {'line': section[u'line'], 'spacer': u'=' * l},
+                        u'^(\s*)<h%(level)i>(.*?)</h%(level)i>(.*?)$' % {'line': section[u'line'], 'level': l}, ]
+
+ # try to give exact match for heading (remove HTML comments)
+ for h in headers:
+ #ph = re.search(h, pywikibot.removeDisabledParts(self._contents[pos:]),
re.M)
+ ph = re.search(h, self._contents[pos:], re.M)
+ if ph:
+ ph = ph.group(0).strip()
+ possible_headers += [ (ph, section[u'line']) ]
+
+ # how the heading could look like (difflib)
+            headers = [ u'%(spacer)s %(line)s %(spacer)s' % {'line': section[u'line'], 'spacer': u'=' * l},
+                        u'<h%(level)i>%(line)s</h%(level)i>' % {'line': section[u'line'], 'level': l}, ]
+
+ # give possible match for heading
+            # http://stackoverflow.com/questions/2923420/fuzzy-string-matching-algorithm-…
+            # http://docs.python.org/library/difflib.html
+            # ( http://mwh.geek.nz/2009/04/26/python-damerau-levenshtein-distance/)
+ for h in headers:
+ ph = difflib.get_close_matches(h, wikitextlines, cutoff=cutoff[1]) #
cutoff=0.6 (default)
+ possible_headers += [ (p, section[u'line']) for p in ph ]
+ #print h, possible_headers
+
+        if not possible_headers and section[u'index']: # nothing found, try 'prop=revisions (rv)'
+ # call the wiki to get info
+ params = {
+ u'action' : u'query',
+ u'titles' : self.title(),
+ u'prop' : u'revisions',
+ u'rvprop' : u'content',
+ u'rvsection' : section[u'index'],
+ }
+
+ pywikibot.get_throttle()
+ pywikibot.output(u" Reading section %s from %s via API..." %
(section[u'index'], self.title(asLink=True)))
+
+ result = query.GetData(params, self.site())
+ # JIRA: DRTRIGON-90; catch and convert error (convert it such that the whole
page gets processed later)
+ try:
+ r = result[u'query'][u'pages'].values()[0]
+ pl = r[u'revisions'][0][u'*'].splitlines()
+            except KeyError: # sequence of sometimes occurring "KeyError: u'parse'"
+                pywikibot.output(u'WARNING: Query result (gSBO): %r' % result)
+                raise pywikibot.Error('Problem occurred during data retrieval for sections in %s!' % self.title(asLink=True))
+
+ if pl:
+ possible_headers = [ (pl[0], pl[0]) ]
+
+ # find the most probable match for heading
+ #print possible_headers
+ best_match = (0.0, None)
+ for i, (ph, header) in enumerate(possible_headers):
+ #print u' ', i, difflib.SequenceMatcher(None, header, ph).ratio(),
header, ph
+ mr = difflib.SequenceMatcher(None, header, ph).ratio()
+ if mr >= best_match[0]: best_match = (mr, ph)
+ if (i in [0, 1]) and (mr >= cutoff[0]): break # use first (exact; re)
match directly (if good enough)
+ #print u' ', best_match
+
+ # prepare resulting data
+ section[u'wikiline'] = best_match[1]
+ section[u'wikiline_mq'] = best_match[0] # match quality
+ section[u'wikiline_bo'] = -1 # byteoffset
+ if section[u'wikiline']:
+            section[u'wikiline_bo'] = self._contents.find(section[u'wikiline'], pos)
+        if section[u'wikiline_bo'] < 0: # nothing found, report/raise error!
+            #page._getexception = ...
+            raise pywikibot.Error('Problem occurred during attempt to retrieve and resolve sections in %s!' % self.title(asLink=True))
+ #pywikibot.output(...)
+ # (or create a own error, e.g. look into interwiki.py)
+
+ def permalink(self):
+        """Return the permalink URL for current revision of this page."""
+ return "%s://%s%s&oldid=%i" % (self.site().protocol(),
+ self.site().hostname(),
+ self.site().get_address(self.title()),
+ self.latestRevision())
+
+ def latestRevision(self):
+ """Return the current revision id for this
page."""
+ if not self._permalink:
+ # When we get the page with getall, the permalink is received
+ # automatically
+ getall(self.site(),[self],force=True)
+ # Check for exceptions
+ if hasattr(self, '_getexception'):
+ raise self._getexception
+ return int(self._permalink)
+
+ def userName(self):
+ """Return name or IP address of last user to edit page.
+
+ Returns None unless page was retrieved with getAll().
+
+ """
+ return self._userName
+
+ ## @since r10310
+ # @remarks needed by various bots
+ def userNameHuman(self):
+        """Return name or IP address of the last human (non-bot) user to edit the page.
+
+        Returns the most recent human editor out of the last revisions
+        (optimally used with getAll()). If no human user could be
+        retrieved, returns None.
+ """
+
+ # was there already a call? already some info available?
+ if hasattr(self, '_userNameHuman'):
+ return self._userNameHuman
+
+ # get history (use preloaded if available)
+ (revid, timestmp, username, comment) = self.getVersionHistory(revCount=1)[0][:4]
+
+ # is the last/actual editor already a human?
+ import botlist # like watchlist
+ if not botlist.isBot(username):
+ self._userNameHuman = username
+ return username
+
+ # search the last human
+ self._userNameHuman = None
+ for vh in self.getVersionHistory()[1:]:
+ (revid, timestmp, username, comment) = vh[:4]
+
+ if username and (not botlist.isBot(username)):
+ # user is a human (not a bot)
+ self._userNameHuman = username
+ break
+
+ # store and return info
+ return self._userNameHuman
+
+ def isIpEdit(self):
+ """Return True if last editor was unregistered.
+
+ Returns None unless page was retrieved with getAll() or _getEditPage().
+
+ """
+ return self._ipedit
+
+ def editTime(self, datetime=False):
+ """Return timestamp (in MediaWiki format) of last revision to
page.
+
+ Returns None unless page was retrieved with getAll() or _getEditPage().
+
+ """
+ if self._editTime and datetime:
+ import datetime
+ return datetime.datetime.strptime(str(self._editTime),
'%Y%m%d%H%M%S')
+
+ return self._editTime
+
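+    # Illustrative sketch (not part of the original script): editTime() returns the
+    # MediaWiki timestamp stored by get()/getAll(); datetime=True converts it.
+    #
+    #   page.get()
+    #   stamp = page.editTime()              # e.g. 20120916134836 (format %Y%m%d%H%M%S)
+    #   when = page.editTime(datetime=True)  # datetime.datetime object
+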
+ def previousRevision(self):
+ """Return the revision id for the previous revision of this
Page."""
+ vh = self.getVersionHistory(revCount=2)
+ return vh[1][0]
+
+ def exists(self):
+        """Return True if page exists on the wiki, even if it's a redirect.
+
+ If the title includes a section, return False if this section isn't
+ found.
+
+ """
+ try:
+ self.get()
+ except NoPage:
+ return False
+ except IsRedirectPage:
+ return True
+ except SectionError:
+ return False
+ return True
+
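+    # Illustrative sketch (not part of the original script): exists() swallows the
+    # exceptions documented above, so it is safe to call before get().
+    #
+    #   page = wikipedia.Page(wikipedia.getSite(), u'Example page')   # hypothetical title
+    #   if page.exists() and not page.isRedirectPage():
+    #       text = page.get()
+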
+ def pageAPInfo(self):
+ """Return the last revid if page exists on the wiki,
+ Raise IsRedirectPage if it's a redirect
+ Raise NoPage if the page doesn't exist
+
+        Using the API should be a lot faster.
+        This function exists to improve the scripts' performance.
+
+ """
+ params = {
+ 'action' :'query',
+ 'prop' :'info',
+ 'titles' :self.title(),
+ }
+        data = query.GetData(params, self.site(), encodeTitle = False)['query']['pages'].values()[0]
+ if 'redirect' in data:
+ raise IsRedirectPage
+ elif 'missing' in data:
+ raise NoPage
+ elif 'lastrevid' in data:
+ return data['lastrevid'] # if ok, return the last revid
+ else:
+            # should not exist, OR we have problems.
+            # better to double-check in these situations
+ x = self.get()
+ return True # if we reach this point, we had no problems.
+
+ def getTemplates(self, tllimit = 5000):
+ #action=query&prop=templates&titles=Main Page
+ """
+ Returns the templates that are used in the page given by API.
+
+        If no templates are found, returns an empty list.
+
+ """
+ params = {
+ 'action': 'query',
+ 'prop': 'templates',
+ 'titles': self.title(),
+ 'tllimit': tllimit,
+ }
+ if tllimit > config.special_page_limit:
+ params['tllimit'] = config.special_page_limit
+        if tllimit > 5000 and self.site().isAllowed('apihighlimits'):
+ params['tllimit'] = 5000
+
+ tmpsFound = []
+ count = 0
+ while True:
+ data = query.GetData(params, self.site(), encodeTitle =
False)['query']['pages'].values()[0]
+ if "templates" not in data:
+ return []
+
+ for tmp in data['templates']:
+ count += 1
+ tmpsFound.append(Page(self.site(), tmp['title'],
defaultNamespace=tmp['ns']) )
+ if count >= tllimit:
+ break
+
+ if 'query-continue' in data and count < tllimit:
+ params["tlcontinue"] =
data["query-continue"]["templates"]["tlcontinue"]
+ else:
+ break
+
+ return tmpsFound
+
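+    # Illustrative sketch (not part of the original script): getTemplates() returns
+    # Page objects in the Template namespace.
+    #
+    #   for tmpl in page.getTemplates():
+    #       wikipedia.output(tmpl.title())
+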
+ def isRedirectPage(self):
+ """Return True if this is a redirect, False if not or not
existing."""
+ try:
+ self.get()
+ except NoPage:
+ return False
+ except IsRedirectPage:
+ return True
+ except SectionError:
+ return False
+ return False
+
+ def isStaticRedirect(self, force=False):
+ """Return True if this is a redirect containing the magic word
+ __STATICREDIRECT__, False if not or not existing.
+
+ """
+ found = False
+ if self.isRedirectPage() and self.site().versionnumber() > 13:
+ staticKeys = self.site().getmagicwords('staticredirect')
+ text = self.get(get_redirect=True, force=force)
+ if staticKeys:
+ for key in staticKeys:
+ if key in text:
+ found = True
+ break
+ return found
+
+ def isCategoryRedirect(self, text=None):
+ """Return True if this is a category redirect page, False
otherwise."""
+
+ if not self.isCategory():
+ return False
+ if not hasattr(self, "_catredirect"):
+ if not text:
+ try:
+ text = self.get(get_redirect=True)
+ except NoPage:
+ return False
+ catredirs = self.site().category_redirects()
+ for (t, args) in self.templatesWithParams(thistxt=text):
+ template = Page(self.site(), t, defaultNamespace=10
+ ).title(withNamespace=False) # normalize title
+ if template in catredirs:
+ # Get target (first template argument)
+ if not args:
+ pywikibot.output(u'Warning: redirect target for %s is
missing'
+ % self.title(asLink=True))
+ self._catredirect = False
+ else:
+ self._catredirect = self.site().namespace(14) + ":" +
args[0]
+ break
+ else:
+ self._catredirect = False
+ return bool(self._catredirect)
+
+ def getCategoryRedirectTarget(self):
+ """If this is a category redirect, return the target category
title."""
+ if self.isCategoryRedirect():
+ import catlib
+ return catlib.Category(self.site(), self._catredirect)
+ raise IsNotRedirectPage
+
+ def isEmpty(self):
+ """Return True if the page text has less than 4 characters.
+
+ Character count ignores language links and category links.
+ Can raise the same exceptions as get().
+
+ """
+ txt = self.get()
+ txt = removeLanguageLinks(txt, site = self.site())
+ txt = removeCategoryLinks(txt, site = self.site())
+ if len(txt) < 4:
+ return True
+ else:
+ return False
+
+ def isTalkPage(self):
+ """Return True if this page is in any talk
namespace."""
+ ns = self.namespace()
+ return ns >= 0 and ns % 2 == 1
+
+ def toggleTalkPage(self):
+ """Return other member of the article-talk page pair for this
Page.
+
+ If self is a talk page, returns the associated content page;
+ otherwise, returns the associated talk page.
+ Returns None if self is a special page.
+
+ """
+ ns = self.namespace()
+ if ns < 0: # Special page
+ return None
+ if self.isTalkPage():
+ ns -= 1
+ else:
+ ns += 1
+
+ if ns == 6:
+ return ImagePage(self.site(), self.title(withNamespace=False))
+
+ return Page(self.site(), self.title(withNamespace=False),
+ defaultNamespace=ns)
+
+ def isCategory(self):
+ """Return True if the page is a Category, False
otherwise."""
+ return self.namespace() == 14
+
+ def isImage(self):
+ """Return True if this is an image description page, False
otherwise."""
+ return self.namespace() == 6
+
+ def isDisambig(self, get_Index=True):
+ """Return True if this is a disambiguation page, False otherwise.
+
+ Relies on the presence of specific templates, identified in
+ the Family file or on a wiki page, to identify disambiguation
+ pages.
+
+ By default, loads a list of template names from the Family file;
+        if the value in the Family file is None (no entry was made), looks for
+        the list on [[MediaWiki:Disambiguationspage]]. If this page does not
+        exist, the MediaWiki message is used instead.
+
+ If get_Index is True then also load the templates for index articles
+ which are given on en-wiki
+
+ Template:Disambig is always assumed to be default, and will be
+ appended regardless of its existence.
+
+ """
+ if not hasattr(self, "_isDisambig"):
+ if not hasattr(self._site, "_disambigtemplates"):
+ try:
+ default = set(self._site.family.disambig('_default'))
+ except KeyError:
+ default = set([u'Disambig'])
+ try:
+ distl = self._site.family.disambig(self._site.lang,
+ fallback=False)
+ except KeyError:
+ distl = None
+ if distl is None:
+ try:
+ disambigpages = Page(self._site,
+ "MediaWiki:Disambiguationspage")
+ disambigs = set(link.title(withNamespace=False)
+ for link in disambigpages.linkedPages()
+ if link.namespace() == 10)
+ # add index article templates
+ if get_Index and \
+ self._site.sitename() == 'wikipedia:en':
+ regex = re.compile('\(\((.+?)\)\)')
+ content = disambigpages.get()
+ for index in regex.findall(content):
+ disambigs.add(index[:1].upper() + index[1:])
+ except NoPage:
+ disambigs = set([self._site.mediawiki_message(
+ 'Disambiguationspage').split(':', 1)[1]])
+ # add the default template(s)
+ self._site._disambigtemplates = disambigs | default
+ else:
+ # Normalize template capitalization
+ self._site._disambigtemplates = set(
+ t[:1].upper() + t[1:] for t in distl
+ )
+ disambigInPage = self._site._disambigtemplates.intersection(
+ self.templates())
+ self._isDisambig = self.namespace() != 10 and \
+ len(disambigInPage) > 0
+ return self._isDisambig
+
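+    # Illustrative sketch (not part of the original script): skip disambiguation
+    # pages while iterating over a page generator ('gen' is hypothetical).
+    #
+    #   for page in gen:
+    #       if page.isDisambig():
+    #           continue
+    #       # ... work on the non-disambiguation page ...
+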
+ def canBeEdited(self):
+ """Return bool indicating whether this page can be edited.
+
+ This returns True if and only if:
+ - page is unprotected, and bot has an account for this site, or
+ - page is protected, and bot has a sysop account for this site.
+
+ """
+ try:
+ self.get()
+ except:
+ pass
+ if self.editRestriction == 'sysop':
+ userdict = config.sysopnames
+ else:
+ userdict = config.usernames
+ try:
+ userdict[self.site().family.name][self.site().lang]
+ return True
+ except:
+ # We don't have a user account for that wiki, or the
+ # page is locked and we don't have a sysop account.
+ return False
+
+ def botMayEdit(self, username):
+ """Return True if this page allows bots to edit it.
+
+ This will be True if the page doesn't contain {{bots}} or
+ {{nobots}}, or it contains them and the active bot is allowed to
+ edit this page. (This method is only useful on those sites that
+ recognize the bot-exclusion protocol; on other sites, it will always
+ return True.)
+
+ The framework enforces this restriction by default. It is possible
+ to override this by setting ignore_bot_templates=True in
+ user-config.py, or using page.put(force=True).
+
+ """
+
+ if self.site().family.name == 'wikitravel': # Wikitravel's bot
control.
+ self.site().family.bot_control(self.site())
+
+ if config.ignore_bot_templates: #Check the "master ignore switch"
+ return True
+
+ try:
+ templates = self.templatesWithParams(get_redirect=True);
+ except (NoPage, IsRedirectPage, SectionError):
+ return True
+
+ for template in templates:
+ if template[0].lower() == 'nobots':
+ return False
+ elif template[0].lower() == 'bots':
+ if len(template[1]) == 0:
+ return True
+ else:
+ (ttype, bots) = template[1][0].split('=', 1)
+ bots = bots.split(',')
+ if ttype == 'allow':
+ if 'all' in bots or username in bots:
+ return True
+ else:
+ return False
+ if ttype == 'deny':
+ if 'all' in bots or username in bots:
+ return False
+ else:
+ return True
+ # no restricting template found
+ return True
+
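+    # Illustrative sketch (not part of the original script): respect {{bots}}/{{nobots}}
+    # explicitly before saving (put() already enforces this unless force=True);
+    # 'newtext' is a hypothetical unicode string.
+    #
+    #   username = page.site().loggedInAs()
+    #   if page.botMayEdit(username):
+    #       page.put(newtext, comment=u'Bot: updating page')
+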
+ def getReferences(self, follow_redirects=True, withTemplateInclusion=True,
+ onlyTemplateInclusion=False, redirectsOnly=False, internal = False):
+ """Yield all pages that link to the page by API
+
+ If you need a full list of referring pages, use this:
+ pages = [page for page in s.getReferences()]
+ Parameters:
+ * follow_redirects - if True, also returns pages that link to a
+ redirect pointing to the page.
+ * withTemplateInclusion - if True, also returns pages where self is
+ used as a template.
+ * onlyTemplateInclusion - if True, only returns pages where self is
+ used as a template.
+ * redirectsOnly - if True, only returns redirects to self.
+
+ """
+ if not self.site().has_api():
+ for s in self.getReferencesOld(follow_redirects, withTemplateInclusion,
onlyTemplateInclusion, redirectsOnly):
+ yield s
+ return
+
+ params = {
+ 'action': 'query',
+ 'list': [],
+ }
+ if not onlyTemplateInclusion:
+ params['list'].append('backlinks')
+ params['bltitle'] = self.title()
+ params['bllimit'] = config.special_page_limit
+ params['blfilterredir'] = 'all'
+ if follow_redirects:
+ params['blredirect'] = 1
+ if redirectsOnly:
+ params['blfilterredir'] = 'redirects'
+ if not self.site().isAllowed('apihighlimits') and
config.special_page_limit > 500:
+ params['bllimit'] = 500
+
+ if withTemplateInclusion or onlyTemplateInclusion:
+ params['list'].append('embeddedin')
+ params['eititle'] = self.title()
+ params['eilimit'] = config.special_page_limit
+ params['eifilterredir'] = 'all'
+ if follow_redirects:
+ params['eiredirect'] = 1
+ if redirectsOnly:
+ params['eifilterredir'] = 'redirects'
+ if not self.site().isAllowed('apihighlimits') and
config.special_page_limit > 500:
+ params['eilimit'] = 500
+
+ allDone = False
+
+ while not allDone:
+ if not internal:
+ output(u'Getting references to %s via API...'
+ % self.title(asLink=True))
+
+ datas = query.GetData(params, self.site())
+ data = datas['query'].values()
+ if len(data) == 2:
+ data = data[0] + data[1]
+ else:
+ data = data[0]
+
+ refPages = set()
+ for blp in data:
+ pg = Page(self.site(), blp['title'], defaultNamespace =
blp['ns'])
+ if pg in refPages:
+ continue
+
+ yield pg
+ refPages.add(pg)
+                if follow_redirects and 'redirect' in blp and 'redirlinks' in blp:
+                    for p in blp['redirlinks']:
+                        plk = Page(self.site(), p['title'], defaultNamespace = p['ns'])
+ if plk in refPages:
+ continue
+
+ yield plk
+ refPages.add(plk)
+ if follow_redirects and 'redirect' in p and plk != self:
+                                for zms in plk.getReferences(follow_redirects, withTemplateInclusion,
+                                                             onlyTemplateInclusion, redirectsOnly, internal=True):
+ yield zms
+ else:
+ continue
+ else:
+ continue
+
+ if 'query-continue' in datas:
+ if 'backlinks' in datas['query-continue']:
+ params['blcontinue'] =
datas['query-continue']['backlinks']['blcontinue']
+
+ if 'embeddedin' in datas['query-continue']:
+ params['eicontinue'] =
datas['query-continue']['embeddedin']['eicontinue']
+ else:
+ allDone = True
+
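+    # Illustrative sketch (not part of the original script): getReferences() is a
+    # generator, so it can be consumed lazily or collected into a list.
+    #
+    #   transclusions = [p for p in page.getReferences(onlyTemplateInclusion=True)]
+    #   redirects = [p for p in page.getReferences(redirectsOnly=True)]
+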
+
+ def getReferencesOld(self,
+ follow_redirects=True, withTemplateInclusion=True,
+ onlyTemplateInclusion=False, redirectsOnly=False):
+ """Yield all pages that link to the page.
+ """
+ # Temporary bug-fix while researching more robust solution:
+ if config.special_page_limit > 999:
+ config.special_page_limit = 999
+ site = self.site()
+ path = self.site().references_address(self.urlname())
+ if withTemplateInclusion:
+ path+=u'&hidetrans=0'
+ if onlyTemplateInclusion:
+            path+=u'&hidetrans=0&hidelinks=1&hideredirs=1&hideimages=1'
+        if redirectsOnly:
+            path+=u'&hideredirs=0&hidetrans=1&hidelinks=1&hideimages=1'
+ content = SoupStrainer("div", id=self.site().family.content_id)
+ try:
+ next_msg = self.site().mediawiki_message('whatlinkshere-next')
+ except KeyError:
+ next_msg = "next %i" % config.special_page_limit
+ plural = (config.special_page_limit == 1) and "\\1" or "\\2"
+ next_msg = re.sub(r"{{PLURAL:\$1\|(.*?)\|(.*?)}}", plural, next_msg)
+ nextpattern = re.compile("^%s$" % next_msg.replace("$1",
"[0-9]+"))
+ delay = 1
+ if self.site().has_mediawiki_message("Isredirect"):
+            self._isredirectmessage = self.site().mediawiki_message("Isredirect")
+        if self.site().has_mediawiki_message("Istemplate"):
+            self._istemplatemessage = self.site().mediawiki_message("Istemplate")
+ # to avoid duplicates:
+ refPages = set()
+ while path:
+ output(u'Getting references to %s' % self.title(asLink=True))
+ get_throttle()
+ txt = self.site().getUrl(path)
+ body = BeautifulSoup(txt,
+ convertEntities=BeautifulSoup.HTML_ENTITIES,
+ parseOnlyThese=content)
+ next_text = body.find(text=nextpattern)
+ if next_text is not None and next_text.parent.has_key('href'):
+                path = next_text.parent['href'].replace("&amp;", "&")
+ else:
+ path = ""
+ reflist = body.find("ul")
+ if reflist is None:
+ return
+ for page in self._parse_reflist(reflist,
+ follow_redirects, withTemplateInclusion,
+ onlyTemplateInclusion, redirectsOnly):
+ if page not in refPages:
+ yield page
+ refPages.add(page)
+
+ def _parse_reflist(self, reflist,
+ follow_redirects=True, withTemplateInclusion=True,
+ onlyTemplateInclusion=False, redirectsOnly=False):
+ """For internal use only
+
+ Parse a "Special:Whatlinkshere" list of references and yield Page
+ objects that meet the criteria (used by getReferences)
+ """
+ for link in reflist("li", recursive=False):
+ title = link.a.string
+ if title is None:
+ output(u"DBG> invalid <li> item in Whatlinkshere: %s"
% link)
+ try:
+ p = Page(self.site(), title)
+ except InvalidTitle:
+ output(u"DBG> Whatlinkshere:%s contains invalid link to %s"
+ % (self.title(), title))
+ continue
+ isredirect, istemplate = False, False
+ textafter = link.a.findNextSibling(text=True)
+ if textafter is not None:
+ if self.site().has_mediawiki_message("Isredirect") \
+ and self._isredirectmessage in textafter:
+ # make sure this is really a redirect to this page
+ # (MediaWiki will mark as a redirect any link that follows
+ # a #REDIRECT marker, not just the first one).
+ if p.getRedirectTarget().sectionFreeTitle() ==
self.sectionFreeTitle():
+ isredirect = True
+ if self.site().has_mediawiki_message("Istemplate") \
+ and self._istemplatemessage in textafter:
+ istemplate = True
+ if (withTemplateInclusion or onlyTemplateInclusion or not istemplate
+ ) and (not redirectsOnly or isredirect
+ ) and (not onlyTemplateInclusion or istemplate
+ ):
+ yield p
+ continue
+
+ if isredirect and follow_redirects:
+ sublist = link.find("ul")
+ if sublist is not None:
+ for p in self._parse_reflist(sublist,
+ follow_redirects, withTemplateInclusion,
+ onlyTemplateInclusion, redirectsOnly):
+ yield p
+
+ def _getActionUser(self, action, restriction = '', sysop = False):
+ """
+ Get the user to do an action: sysop or not sysop, or raise an exception
+ if the user cannot do that.
+
+ Parameters:
+ * action - the action to be done, which is the name of the right
+ * restriction - the restriction level or an empty string for no restriction
+ * sysop - initially use sysop user?
+ """
+ # Login
+ self.site().forceLogin(sysop = sysop)
+
+ # Check permissions
+ if not self.site().isAllowed(action, sysop):
+ if sysop:
+                raise LockedPage(u'The sysop user is not allowed to %s in site %s' % (action, self.site()))
+            else:
+                try:
+                    user = self._getActionUser(action, restriction, sysop = True)
+                    output(u'The user is not allowed to %s on site %s. Using sysop account.' % (action, self.site()))
+                    return user
+                except NoUsername:
+                    raise LockedPage(u'The user is not allowed to %s on site %s, and no sysop account is defined.' % (action, self.site()))
+ except LockedPage:
+ raise
+
+ # Check restrictions
+ if not self.site().isAllowed(restriction, sysop):
+ if sysop:
+                raise LockedPage(u'Page on %s is locked in a way that sysop user cannot %s it' % (self.site(), action))
+            else:
+                try:
+                    user = self._getActionUser(action, restriction, sysop = True)
+                    output(u'Page is locked on %s - cannot %s, using sysop account.' % (self.site(), action))
+                    return user
+                except NoUsername:
+                    raise LockedPage(u'Page is locked on %s - cannot %s, and no sysop account is defined.' % (self.site(), action))
+ except LockedPage:
+ raise
+
+ return sysop
+
+ def getRestrictions(self):
+ """
+ Get the protections on the page.
+ * Returns a restrictions dictionary. Keys are 'edit' and 'move',
+ Values are None (no restriction for that action) or [level, expiry] :
+ * level is the level of auth needed to perform that action
+ ('autoconfirmed' or 'sysop')
+ * expiry is the expiration time of the restriction
+ """
+ #, titles = None
+ #if titles:
+ # restrictions = {}
+ #else:
+ restrictions = { 'edit': None, 'move': None }
+ try:
+ api_url = self.site().api_address()
+ except NotImplementedError:
+ return restrictions
+
+ predata = {
+ 'action': 'query',
+ 'prop': 'info',
+ 'inprop': 'protection',
+ 'titles': self.title(),
+ }
+ #if titles:
+ # predata['titles'] = titles
+
+ text = query.GetData(predata, self.site())['query']['pages']
+
+ for pageid in text:
+ if 'missing' in text[pageid]:
+ self._getexception = NoPage
+ raise NoPage('Page %s does not exist' % self.title(asLink=True))
+ elif not 'pageid' in text[pageid]:
+ # Don't know what may happen here.
+ # We may want to have better error handling
+ raise Error("BUG> API problem.")
+ if text[pageid]['protection'] != []:
+ #if titles:
+                #    restrictions = dict([ detail['type'], [ detail['level'], detail['expiry'] ] ]
+                #                        for detail in text[pageid]['protection'])
+                #else:
+                restrictions = dict([ detail['type'], [ detail['level'], detail['expiry'] ] ]
+                                    for detail in text[pageid]['protection'])
+
+ return restrictions
+
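+    # Illustrative sketch (not part of the original script): the returned dict maps
+    # 'edit'/'move' to None or [level, expiry] as described above.
+    #
+    #   restrictions = page.getRestrictions()
+    #   if restrictions['edit'] and restrictions['edit'][0] == 'sysop':
+    #       wikipedia.output(u'%s is fully protected' % page.title())
+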
+ def put_async(self, newtext,
+ comment=None, watchArticle=None, minorEdit=True, force=False,
+ callback=None):
+ """Put page on queue to be saved to wiki asynchronously.
+
+ Asynchronous version of put (takes the same arguments), which places
+ pages on a queue to be saved by a daemon thread. All arguments are
+ the same as for .put(), except --
+
+ callback: a callable object that will be called after the page put
+ operation; this object must take two arguments:
+ (1) a Page object, and (2) an exception instance, which
+ will be None if the page was saved successfully.
+
+ The callback is intended to be used by bots that need to keep track
+ of which saves were successful.
+
+ """
+ try:
+ page_put_queue.mutex.acquire()
+ try:
+ _putthread.start()
+ except (AssertionError, RuntimeError):
+ pass
+ finally:
+ page_put_queue.mutex.release()
+ page_put_queue.put((self, newtext, comment, watchArticle, minorEdit,
+ force, callback))
+
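+    # Illustrative sketch (not part of the original script): track failed saves via
+    # the callback described above ('newtext' is a hypothetical unicode string).
+    #
+    #   def report(page, error):
+    #       if error is not None:
+    #           wikipedia.output(u'Saving %s failed: %s' % (page.title(), error))
+    #
+    #   page.put_async(newtext, comment=u'Bot: cleanup', callback=report)
+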
+ def put(self, newtext, comment=None, watchArticle=None, minorEdit=True,
+ force=False, sysop=False, botflag=True, maxTries=-1):
+        """Save the page with the contents of the first argument as the text.
+
+ Optional parameters:
+ comment: a unicode string that is to be used as the summary for
+ the modification.
+ watchArticle: a bool, add or remove this Page to/from bot user's
+ watchlist (if None, leave watchlist status unchanged)
+ minorEdit: mark this edit as minor if True
+ force: ignore botMayEdit() setting.
+ maxTries: the maximum amount of save attempts. -1 for infinite.
+ """
+ # Login
+ try:
+ self.get()
+ except:
+ pass
+        sysop = self._getActionUser(action = 'edit', restriction = self.editRestriction, sysop = sysop)
+ username = self.site().loggedInAs()
+
+ # Check blocks
+ self.site().checkBlocks(sysop = sysop)
+
+ # Determine if we are allowed to edit
+ if not force:
+ if not self.botMayEdit(username):
+ raise LockedPage(
+ u'Not allowed to edit %s because of a restricting template'
+ % self.title(asLink=True))
+ elif self.site().has_api() and self.namespace() in [2,3] \
+ and (self.title().endswith('.css') or \
+ self.title().endswith('.js')):
+ titleparts = self.title().split("/")
+ userpageowner = titleparts[0].split(":")[1]
+ if userpageowner != username:
+ # API enable: if title ends with .css or .js in ns2,3
+ # it needs permission to edit user pages
+ if self.title().endswith('css'):
+ permission = 'editusercss'
+ else:
+ permission = 'edituserjs'
+ sysop = self._getActionUser(action=permission,
+ restriction=self.editRestriction,
+ sysop=True)
+
+ # If there is an unchecked edit restriction, we need to load the page
+ if self._editrestriction:
+ output(
+u'Page %s is semi-protected. Getting edit page to find out if we are allowed to edit.'
+ % self.title(asLink=True))
+ oldtime = self.editTime()
+ # Note: change_edit_time=True is always True since
+ # self.get() calls self._getEditPage without this parameter
+ self.get(force=True, change_edit_time=True)
+ newtime = self.editTime()
+ ### TODO: we have different timestamp formats
+            if re.sub('\D', '', str(oldtime)) != re.sub('\D', '', str(newtime)): # page was changed
+ raise EditConflict(u'Page has been changed after first read.')
+ self._editrestriction = False
+ # If no comment is given for the change, use the default
+ comment = comment or action
+ if config.cosmetic_changes and not self.isTalkPage() and \
+ not calledModuleName() in ('cosmetic_changes', 'touch'):
+ if config.cosmetic_changes_mylang_only:
+                cc = (self.site().family.name == config.family and self.site().lang == config.mylang) or \
+                     self.site().family.name in config.cosmetic_changes_enable.keys() and \
+                     self.site().lang in config.cosmetic_changes_enable[self.site().family.name]
+            else:
+                cc = True
+            cc = cc and not \
+                 (self.site().family.name in config.cosmetic_changes_disable.keys() and \
+                  self.site().lang in config.cosmetic_changes_disable[self.site().family.name])
+ if cc:
+ old = newtext
+ if verbose:
+                    output(u'Cosmetic Changes for %s-%s enabled.' % (self.site().family.name, self.site().lang))
+                import cosmetic_changes
+                from pywikibot import i18n
+                ccToolkit = cosmetic_changes.CosmeticChangesToolkit(self.site(), redirect=self.isRedirectPage(), namespace = self.namespace(), pageTitle=self.title())
+                newtext = ccToolkit.change(newtext)
+                if comment and old.strip().replace('\r\n', '\n') != newtext.strip().replace('\r\n', '\n'):
+                    comment += i18n.twtranslate(self.site(), 'cosmetic_changes-append')
+
+ if watchArticle is None:
+ # if the page was loaded via get(), we know its status
+ if hasattr(self, '_isWatched'):
+ watchArticle = self._isWatched
+ else:
+ import watchlist
+ watchArticle = watchlist.isWatched(self.title(), site = self.site())
+ newPage = not self.exists()
+ # if posting to an Esperanto wiki, we must e.g. write Bordeauxx instead
+ # of Bordeaux
+ if self.site().lang == 'eo' and not self.site().has_api():
+ newtext = encodeEsperantoX(newtext)
+ comment = encodeEsperantoX(comment)
+
+ return self._putPage(newtext, comment, watchArticle, minorEdit,
+ newPage, self.site().getToken(sysop = sysop), sysop = sysop,
botflag=botflag, maxTries=maxTries)
+
+ def _encodeArg(self, arg, msgForError):
+ """Encode an ascii string/Unicode string to the site's
encoding"""
+ try:
+ return arg.encode(self.site().encoding())
+ except UnicodeDecodeError, e:
+ # happens when arg is a non-ascii bytestring :
+ # when reencoding bytestrings, python decodes first to ascii
+ e.reason += ' (cannot convert input %s string to unicode)' %
msgForError
+ raise e
+ except UnicodeEncodeError, e:
+ # happens when arg is unicode
+ e.reason += ' (cannot convert %s to wiki encoding %s)' %
(msgForError, self.site().encoding())
+ raise e
+
+ def _putPage(self, text, comment=None, watchArticle=False, minorEdit=True,
+ newPage=False, token=None, newToken=False, sysop=False,
+ captcha=None, botflag=True, maxTries=-1):
+ """Upload 'text' as new content of Page by API
+
+ Don't use this directly, use put() instead.
+
+ """
+ if not self.site().has_api() or self.site().versionnumber() < 13:
+ # api not enabled or version not supported
+ return self._putPageOld(text, comment, watchArticle, minorEdit,
+ newPage, token, newToken, sysop, captcha, botflag, maxTries)
+
+ retry_attempt = 0
+ retry_delay = 1
+ dblagged = False
+ params = {
+ 'action': 'edit',
+ 'title': self.title(),
+ 'text': self._encodeArg(text, 'text'),
+ 'summary': self._encodeArg(comment, 'summary'),
+ }
+
+ if token:
+ params['token'] = token
+ else:
+ params['token'] = self.site().getToken(sysop = sysop)
+
+ # Add server lag parameter (see config.py for details)
+ if config.maxlag:
+ params['maxlag'] = str(config.maxlag)
+
+ if self._editTime:
+ params['basetimestamp'] = self._editTime
+ else:
+ params['basetimestamp'] = time.strftime('%Y%m%d%H%M%S',
time.gmtime())
+
+ if self._startTime:
+ params['starttimestamp'] = self._startTime
+ else:
+ params['starttimestamp'] = time.strftime('%Y%m%d%H%M%S',
time.gmtime())
+
+ if botflag:
+ params['bot'] = 1
+
+ if minorEdit:
+ params['minor'] = 1
+ else:
+ params['notminor'] = 1
+
+ if watchArticle:
+ params['watch'] = 1
+ #else:
+ # params['unwatch'] = 1
+
+ if captcha:
+ params['captchaid'] = captcha['id']
+ params['captchaword'] = captcha['answer']
+
+ while True:
+ if (maxTries == 0):
+ raise MaxTriesExceededError()
+ maxTries -= 1
+ # Check whether we are not too quickly after the previous
+ # putPage, and wait a bit until the interval is acceptable
+ if not dblagged:
+ put_throttle()
+ # Which web-site host are we submitting to?
+ if newPage:
+ output(u'Creating page %s via API' % self.title(asLink=True))
+ params['createonly'] = 1
+ else:
+ output(u'Updating page %s via API' % self.title(asLink=True))
+ params['nocreate'] = 1
+ # Submit the prepared information
+ try:
+ response, data = query.GetData(params, self.site(), sysop=sysop,
back_response = True)
+ if isinstance(data,basestring):
+ raise KeyError
+ except httplib.BadStatusLine, line:
+ raise PageNotSaved('Bad status line: %s' % line.line)
+ except ServerError:
+ output(u''.join(traceback.format_exception(*sys.exc_info())))
+ retry_attempt += 1
+ if retry_attempt > config.maxretries:
+ raise
+                output(u'Got a server error when putting %s; will retry in %i minute%s.' % (self.title(asLink=True), retry_delay, retry_delay != 1 and "s" or ""))
+ time.sleep(60 * retry_delay)
+ retry_delay *= 2
+ if retry_delay > 30:
+ retry_delay = 30
+ continue
+ except ValueError: # API result cannot decode
+ output(u"Server error encountered; will retry in %i minute%s."
+ % (retry_delay, retry_delay != 1 and "s" or
""))
+ time.sleep(60 * retry_delay)
+ retry_delay *= 2
+ if retry_delay > 30:
+ retry_delay = 30
+ continue
+ # If it has gotten this far then we should reset dblagged
+ dblagged = False
+ # Check blocks
+ self.site().checkBlocks(sysop = sysop)
+            # A second text area means that an edit conflict has occurred.
+ if response.code == 500:
+ output(u"Server error encountered; will retry in %i minute%s."
+ % (retry_delay, retry_delay != 1 and "s" or
""))
+ time.sleep(60 * retry_delay)
+ retry_delay *= 2
+ if retry_delay > 30:
+ retry_delay = 30
+ continue
+ if 'error' in data:
+                #All available error keys in edit mode: (from ApiBase.php)
+                #    'noimageredirect-anon':"Anonymous users can't create image redirects",
+                #    'noimageredirect':"You don't have permission to create image redirects",
+                #    'filtered':"The filter callback function refused your edit",
+                #    'noedit-anon':"Anonymous users can't edit pages",
+                #    'noedit':"You don't have permission to edit pages",
+                #    'emptypage':"Creating new, empty pages is not allowed",
+                #    'badmd5':"The supplied MD5 hash was incorrect",
+                #    'notext':"One of the text, appendtext, prependtext and undo parameters must be set",
+                #    'emptynewsection':'Creating empty new sections is not possible.',
+                #    'revwrongpage':"r\$1 is not a revision of ``\$2''",
+                #    'undofailure':'Undo failed due to conflicting intermediate edits',
+
+ #for debug only
+ #------------------------
+ if verbose:
+                    output("error occurred, code:%s\ninfo:%s\nstatus:%s\nresponse:%s" % (
+                        data['error']['code'], data['error']['info'], response.code, response.msg))
+ faked = params
+ if 'text' in faked:
+ del faked['text']
+ output("OriginalData:%s" % faked)
+ del faked
+ #------------------------
+ errorCode = data['error']['code']
+ #cannot handle longpageerror and PageNoSave yet
+ if errorCode == 'maxlag' or response.code == 503:
+ # server lag; wait for the lag time and retry
+                    lagpattern = re.compile(r"Waiting for [\d.]+: (?P<lag>\d+) seconds? lagged")
+                    lag = lagpattern.search(data['error']['info'])
+                    timelag = int(lag.group("lag"))
+                    output(u"Pausing %d seconds due to database server lag." % min(timelag,300))
+ dblagged = True
+ time.sleep(min(timelag,300))
+ continue
+ elif errorCode == 'editconflict':
+ # 'editconflict':"Edit conflict detected",
+                    raise EditConflict(u'An edit conflict has occurred.')
+ elif errorCode == 'spamdetected':
+ # 'spamdetected':"Your edit was refused because it
contained a spam fragment: ``\$1''",
+ raise SpamfilterError(data['error']['info'][62:-2])
+ elif errorCode == 'pagedeleted':
+ # 'pagedeleted':"The page has been deleted since you
fetched its timestamp",
+ # Make sure your system clock is correct if this error occurs
+ # without any reason!
+ # raise EditConflict(u'Someone deleted the page.')
+ # No raise, simply define these variables and retry:
+ params['recreate'] = 1
+ if self._editTime:
+ params['basetimestamp'] = self._editTime
+ else:
+ params['basetimestamp'] =
time.strftime('%Y%m%d%H%M%S', time.gmtime())
+
+ if self._startTime:
+ params['starttimestamp'] = self._startTime
+ else:
+ params['starttimestamp'] =
time.strftime('%Y%m%d%H%M%S', time.gmtime())
+ continue
+ elif errorCode == 'readonly':
+                    # 'readonly':"The wiki is currently in read-only mode"
+                    output(u"The database is currently locked for write access; will retry in %i minute%s."
+                           % (retry_delay, retry_delay != 1 and "s" or ""))
+ time.sleep(60 * retry_delay)
+ retry_delay *= 2
+ if retry_delay > 30:
+ retry_delay = 30
+ continue
+ elif errorCode == 'contenttoobig':
+                    # 'contenttoobig':"The content you supplied exceeds the article size limit of \$1 kilobytes",
+                    raise LongPageError(len(params['text']), int(data['error']['info'][59:-10]))
+                elif errorCode in ['protectedpage', 'customcssjsprotected', 'cascadeprotected', 'protectednamespace', 'protectednamespace-interface']:
+                    # 'protectedpage':"The ``\$1'' right is required to edit this page"
+                    # 'cascadeprotected':"The page you're trying to edit is protected because it's included in a cascade-protected page"
+                    # 'customcssjsprotected': "You're not allowed to edit custom CSS and JavaScript pages"
+                    # 'protectednamespace': "You're not allowed to edit pages in the ``\$1'' namespace"
+                    # 'protectednamespace-interface':"You're not allowed to edit interface messages"
+ #
+ # The page is locked. This should have already been
+ # detected when getting the page, but there are some
+ # reasons why this didn't work, e.g. the page might be
+ # locked via a cascade lock.
+ try:
+                        # Page is locked - try using the sysop account, unless we're using one already
+                        if sysop: # Unknown permissions error
+                            raise LockedPage()
+                        else:
+                            self.site().forceLogin(sysop = True)
+                            output(u'Page is locked, retrying using sysop account.')
+                            return self._putPage(text, comment, watchArticle, minorEdit, newPage, token=self.site().getToken(sysop = True), sysop = True)
+ except NoUsername:
+ raise LockedPage()
+ elif errorCode == 'badtoken':
+ if newToken:
+ output(u"Edit token has failed. Giving up.")
+ else:
+ # We might have been using an outdated token
+ output(u"Edit token has failed. Retrying.")
+                        return self._putPage(text, comment, watchArticle, minorEdit, newPage, token=self.site().getToken(sysop = sysop, getagain = True), newToken = True, sysop = sysop)
+                # I think the error message title was changed from "Wikimedia Error"
+                # to "Wikipedia has a problem", but I'm not sure. Maybe we could
+                # just check for HTTP Status 500 (Internal Server Error)?
+                else:
+                    output("Unknown Error. API Error code:%s" % data['error']['code'] )
+                    output("Information:%s" % data['error']['info'])
+ else:
+ if data['edit']['result'] == u"Success":
+ #
+                    # The status code for a completed page update in ordinary mode is 302 - Found,
+                    # but the API always returns 200 - OK because it only sends "success" back as a string.
+                    # If the page update succeeded, we need to return code 302 to satisfy scripts
+                    # that check the status code.
+                    #
+ return 302, response.msg, data['edit']
+
+ solve = self.site().solveCaptcha(data)
+ if solve:
+ return self._putPage(text, comment, watchArticle, minorEdit, newPage,
token, newToken, sysop, captcha=solve)
+
+ return response.code, response.msg, data
+
+
+ def _putPageOld(self, text, comment=None, watchArticle=False, minorEdit=True,
+ newPage=False, token=None, newToken=False, sysop=False,
+ captcha=None, botflag=True, maxTries=-1):
+ """Upload 'text' as new content of Page by filling out the
edit form.
+
+ Don't use this directly, use put() instead.
+
+ """
+ host = self.site().hostname()
+ # Get the address of the page on that host.
+ address = self.site().put_address(self.urlname())
+ predata = {
+ 'wpSave': '1',
+ 'wpSummary': self._encodeArg(comment, 'edit summary'),
+ 'wpTextbox1': self._encodeArg(text, 'wikitext'),
+ # As of October 2008, MW HEAD requires wpSection to be set.
+ # We will need to fill this more smartly if we ever decide to edit by
section
+ 'wpSection': '',
+ }
+ if not botflag:
+ predata['bot']='0'
+ if captcha:
+ predata["wpCaptchaId"] = captcha['id']
+ predata["wpCaptchaWord"] = captcha['answer']
+ # Add server lag parameter (see config.py for details)
+ if config.maxlag:
+ predata['maxlag'] = str(config.maxlag)
+ # <s>Except if the page is new, we need to supply the time of the
+ # previous version to the wiki to prevent edit collisions</s>
+ # As of Oct 2008, these must be filled also for new pages
+ if self._editTime:
+ predata['wpEdittime'] = self._editTime
+ else:
+ predata['wpEdittime'] = time.strftime('%Y%m%d%H%M%S',
time.gmtime())
+ if self._startTime:
+ predata['wpStarttime'] = self._startTime
+ else:
+ predata['wpStarttime'] = time.strftime('%Y%m%d%H%M%S',
time.gmtime())
+ if self._revisionId:
+ predata['baseRevId'] = self._revisionId
+ # Pass the minorEdit and watchArticle arguments to the Wiki.
+ if minorEdit:
+ predata['wpMinoredit'] = '1'
+ if watchArticle:
+ predata['wpWatchthis'] = '1'
+ # Give the token, but only if one is supplied.
+ if token:
+ predata['wpEditToken'] = token
+
+ # Sorry, single-site exception...
+ if self.site().fam().name == 'loveto' and self.site().language() ==
'recipes':
+ predata['masteredit'] = '1'
+
+ retry_delay = 1
+ retry_attempt = 0
+ dblagged = False
+ wait = 5
+ while True:
+ if (maxTries == 0):
+ raise MaxTriesExceededError()
+ maxTries -= 1
+ # Check whether we are not too quickly after the previous
+ # putPage, and wait a bit until the interval is acceptable
+ if not dblagged:
+ put_throttle()
+ # Which web-site host are we submitting to?
+ if newPage:
+ output(u'Creating page %s' % self.title(asLink=True))
+ else:
+ output(u'Changing page %s' % self.title(asLink=True))
+ # Submit the prepared information
+ try:
+ response, data = self.site().postForm(address, predata, sysop)
+ if response.code == 503:
+ if 'x-database-lag' in response.msg.keys():
+ # server lag; Mediawiki recommends waiting 5 seconds
+ # and retrying
+ if verbose:
+ output(data, newline=False)
+ output(u"Pausing %d seconds due to database server
lag." % wait)
+ dblagged = True
+ time.sleep(wait)
+ wait = min(wait*2, 300)
+ continue
+ # Squid error 503
+ raise ServerError(response.code)
+ except httplib.BadStatusLine, line:
+ raise PageNotSaved('Bad status line: %s' % line.line)
+ except ServerError:
+ output(u''.join(traceback.format_exception(*sys.exc_info())))
+ retry_attempt += 1
+ if retry_attempt > config.maxretries:
+ raise
+ output(
+ u'Got a server error when putting %s; will retry in %i minute%s.'
+ % (self.title(asLink=True), retry_delay, retry_delay != 1 and
"s" or ""))
+ time.sleep(60 * retry_delay)
+ retry_delay *= 2
+ if retry_delay > 30:
+ retry_delay = 30
+ continue
+ # If it has gotten this far then we should reset dblagged
+ dblagged = False
+ # Check blocks
+ self.site().checkBlocks(sysop = sysop)
+            # A second text area means that an edit conflict has occurred.
+            editconflict1 = re.compile('id=["\']wpTextbox2[\'"] name="wpTextbox2"')
+            editconflict2 = re.compile('name="wpTextbox2" id="wpTextbox2"')
+            if editconflict1.search(data) or editconflict2.search(data):
+                raise EditConflict(u'An edit conflict has occurred.')
+
+ # remove the wpAntispam keyword before checking for Spamfilter
+ data = re.sub(u'(?s)<label for="wpAntispam">.*?</label>', '', data)
+ if self.site().has_mediawiki_message("spamprotectiontitle")\
+ and self.site().mediawiki_message('spamprotectiontitle') in data:
+ try:
+ reasonR = re.compile(re.escape(self.site().mediawiki_message('spamprotectionmatch')).replace('\$1', '(?P<url>[^<]*)'))
+ url = reasonR.search(data).group('url')
+ except:
+ # Some wikis have modified the spamprotectionmatch
+ # template in a way that the above regex doesn't work,
+ # e.g. on he.wikipedia the template includes a
+ # wikilink, and on fr.wikipedia there is bold text.
+ # This is a workaround for this: it takes the region
+ # which should contain the spamfilter report and the
+ # URL. It then searches for a plaintext URL.
+ relevant = data[data.find('<!-- start content -->')+22:data.find('<!-- end content -->')].strip()
+ # Throw away all the other links etc.
+ relevant = re.sub('<.*?>', '', relevant)
+ relevant = relevant.replace('&#58;', ':')
+ # MediaWiki only spam-checks HTTP links, and only the
+ # domain name part of the URL.
+ m = re.search('http://[\w\-\.]+', relevant)
+ if m:
+ url = m.group()
+ else:
+ # Can't extract the exact URL. Let the user search.
+ url = relevant
+ raise SpamfilterError(url)
+ if '<label for=\'wpRecreate\'' in data:
+ # Make sure your system clock is correct if this error occurs
+ # without any reason!
+ # raise EditConflict(u'Someone deleted the page.')
+ # No raise, simply define these variables and retry:
+ if self._editTime:
+ predata['wpEdittime'] = self._editTime
+ else:
+ predata['wpEdittime'] = time.strftime('%Y%m%d%H%M%S', time.gmtime())
+ if self._startTime:
+ predata['wpStarttime'] = self._startTime
+ else:
+ predata['wpStarttime'] = time.strftime('%Y%m%d%H%M%S', time.gmtime())
+ continue
+ if self.site().has_mediawiki_message("viewsource")\
+ and self.site().mediawiki_message('viewsource') in data:
+ # The page is locked. This should have already been
+ # detected when getting the page, but there are some
+ # reasons why this didn't work, e.g. the page might be
+ # locked via a cascade lock.
+ try:
+ # Page is locked - try using the sysop account, unless we're using one already
+ if sysop:
+ # Unknown permissions error
+ raise LockedPage()
+ else:
+ self.site().forceLogin(sysop = True)
+ output(u'Page is locked, retrying using sysop account.')
+ return self._putPageOld(text, comment, watchArticle, minorEdit, newPage, token=self.site().getToken(sysop = True), sysop = True)
+ except NoUsername:
+ raise LockedPage()
+ if not newToken and "<textarea" in data:
+ ##if "<textarea" in data: # for debug use only, if badtoken still happen
+ # We might have been using an outdated token
+ output(u"Changing page has failed. Retrying.")
+ return self._putPageOld(text, comment, watchArticle, minorEdit, newPage, token=self.site().getToken(sysop = sysop, getagain = True), newToken = True, sysop = sysop)
+ # I think the error message title was changed from "Wikimedia Error"
+ # to "Wikipedia has a problem", but I'm not sure. Maybe we could
+ # just check for HTTP Status 500 (Internal Server Error)?
+ if ("<title>Wikimedia Error</title>" in data or "has a problem</title>" in data) \
+ or response.code == 500:
+ output(u"Server error encountered; will retry in %i minute%s."
+ % (retry_delay, retry_delay != 1 and "s" or ""))
+ time.sleep(60 * retry_delay)
+ retry_delay *= 2
+ if retry_delay > 30:
+ retry_delay = 30
+ continue
+ if ("1213: Deadlock found when trying to get lock" in data):
+ output(u"Deadlock error encountered; will retry in %i minute%s."
+ % (retry_delay, retry_delay != 1 and "s" or ""))
+ time.sleep(60 * retry_delay)
+ retry_delay *= 2
+ if retry_delay > 30:
+ retry_delay = 30
+ continue
+ if self.site().mediawiki_message('readonly') in data or self.site().mediawiki_message('readonly_lag') in data:
+ output(u"The database is currently locked for write access; will retry in %i minute%s."
+ % (retry_delay, retry_delay != 1 and "s" or ""))
+ time.sleep(60 * retry_delay)
+ retry_delay *= 2
+ if retry_delay > 30:
+ retry_delay = 30
+ continue
+ if self.site().has_mediawiki_message('longpageerror'):
+ # FIXME: Long page error detection isn't working in Vietnamese Wikipedia.
+ long_page_errorR = re.compile(
+ # Some wikis (e.g. Lithuanian and Slovak Wikipedia) use {{plural}} in
+ # [[MediaWiki:longpageerror]]
+ re.sub(r'\\{\\{plural\\:.*?\\}\\}', '.*?',
+ re.escape(
+ html2unicode(
+ self.site().mediawiki_message('longpageerror')
+ )
+ )
+ ).replace("\$1", "(?P<length>[\d,.\s]+)", 1).replace("\$2", "(?P<limit>[\d,.\s]+)", 1),
+ re.UNICODE)
+
+ match = long_page_errorR.search(data)
+ if match:
+ # Some wikis (e.g. Lithuanian Wikipedia) don't use $2 parameter in
+ # [[MediaWiki:longpageerror]]
+ longpage_length = 0 ; longpage_limit = 0
+ if 'length' in match.groups():
+ longpage_length = match.group('length')
+ if 'limit' in match.groups():
+ longpage_limit = match.group('limit')
+ raise LongPageError(longpage_length, longpage_limit)
+
+ # We might have been prompted for a captcha if the
+ # account is not autoconfirmed, checking....
+ ## output('%s' % data) # WHY?
+ solve = self.site().solveCaptcha(data)
+ if solve:
+ return self._putPageOld(text, comment, watchArticle, minorEdit, newPage, token, newToken, sysop, captcha=solve)
+
+ # We are expecting a 302 to the action=view page. I'm not sure why this was removed in r5019
+ if response.code != 302 and data.strip() != u"":
+ # Something went wrong, and we don't know what. Show the
+ # HTML code that hopefully includes some error message.
+ output(u"ERROR: Unexpected response from wiki server.")
+ output(u" %s (%s) " % (response.code, response.msg))
+ output(data)
+ # Unexpected responses should raise an error and not pass,
+ # be it silently or loudly. This should raise an error
+
+ if 'name="wpTextbox1"' in data and 'var wgAction = "submit"' in data:
+ # We are on the preview page, so the page was not saved
+ raise PageNotSaved
+
+ return response.code, response.msg, data
+
+ ## @since r10311
+ # @remarks to support appending to single sections
+ def append(self, newtext, comment=None, minorEdit=True, section=0):
+ """Append the wiki-text to the page.
+
+ Returns the result of text append to page section number 'section'.
+ 0 for the top section, 'new' for a new section (end of page).
+ """
+
+ # If no comment is given for the change, use the default
+ comment = comment or pywikibot.action
+
+ # send data by POST request
+ params = {
+ 'action' : 'edit',
+ 'title' : self.title(),
+ 'section' : '%s' % section,
+ 'appendtext' : self._encodeArg(newtext, 'text'),
+ 'token' : self.site().getToken(),
+ 'summary' : self._encodeArg(comment, 'summary'),
+ 'bot' : 1,
+ }
+
+ if minorEdit:
+ params['minor'] = 1
+ else:
+ params['notminor'] = 1
+
+ response, data = query.GetData(params, self.site(), back_response = True)
+
+ if not (data['edit']['result'] == u"Success"):
+ raise PageNotSaved('Bad result returned: %s' % data['edit']['result'])
+
+ return response.code, response.msg, data
+
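A minimal usage sketch for the append() method above, assuming a configured pywikipedia installation where this module is imported as wikipedia; the page title and summary are illustrative:

    import wikipedia
    site = wikipedia.getSite()
    page = wikipedia.Page(site, u'Project:Sandbox')
    # section='new' appends the text as a new section at the end of the page
    code, msg, data = page.append(u'\n== Bot test ==\nHello, world.',
                                  comment=u'Bot: appending a test section',
                                  section='new')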
+ def protection(self):
+ """Return list of dicts of this page protection level. like:
+ [{u'expiry': u'2010-05-26T14:41:51Z', u'type':
u'edit', u'level': u'autoconfirmed'}, {u'expiry':
u'2010-05-26T14:41:51Z', u'type': u'move', u'level':
u'sysop'}]
+
+ if the page non protection, return []
+ """
+
+ params = {
+ 'action': 'query',
+ 'prop' : 'info',
+ 'inprop': 'protection',
+ 'titles' : self.title(),
+ }
+
+ datas = query.GetData(params, self.site())
+ data=datas['query']['pages'].values()[0]['protection']
+ return data
+
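A short, hedged example of reading the structure returned by protection(); the page title and expiry value are illustrative:

    import wikipedia
    page = wikipedia.Page(wikipedia.getSite(), u'Main Page')
    for entry in page.protection():
        # each entry is a dict such as {u'type': u'edit', u'level': u'sysop', u'expiry': u'infinity'}
        if entry['type'] == 'edit' and entry['level'] == 'sysop':
            wikipedia.output(u'%s is fully protected against editing' % page.title())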
+ def interwiki(self):
+ """Return a list of interwiki links in the page text.
+
+ This will retrieve the page to do its work, so it can raise
+ the same exceptions that are raised by the get() method.
+
+ The return value is a list of Page objects for each of the
+ interwiki links in the page text.
+
+ """
+ if hasattr(self, "_interwikis"):
+ return self._interwikis
+
+ text = self.get()
+
+ # Replace {{PAGENAME}} by its value
+ for pagenametext in self.site().pagenamecodes(
+ self.site().language()):
+ text = text.replace(u"{{%s}}" % pagenametext, self.title())
+
+ ll = getLanguageLinks(text, insite=self.site(), pageLink=self.title(asLink=True))
+
+ result = ll.values()
+
+ self._interwikis = result
+ return result
+
+
+
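A sketch of iterating the interwiki() result; it returns Page objects on the other-language sites (the example title is assumed):

    import wikipedia
    page = wikipedia.Page(wikipedia.getSite(), u'Python (programming language)')
    for linked in page.interwiki():
        wikipedia.output(u'%s: %s' % (linked.site().language(), linked.title()))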
+ def categories(self, get_redirect=False, api=False):
+ """Return a list of Category objects that the article is in.
+ Please be aware: the api call returns also categies which are included
+ by templates. This differs to the old non-api code. If you need only
+ these categories which are in the page text please use getCategoryLinks
+ (or set api=False but this could be deprecated in future).
+ """
+ if not (self.site().has_api() and api):
+ try:
+ category_links_to_return = getCategoryLinks(self.get(get_redirect=get_redirect), self.site())
+ except NoPage:
+ category_links_to_return = []
+ return category_links_to_return
+
+ else:
+ import catlib
+ params = {
+ 'action': 'query',
+ 'prop' : 'categories',
+ 'titles' : self.title(),
+ }
+ if not self.site().isAllowed('apihighlimits') and config.special_page_limit > 500:
+ params['cllimit'] = 500
+
+ output(u'Getting categories in %s via API...' % self.title(asLink=True))
+ allDone = False
+ cats=[]
+ while not allDone:
+ datas = query.GetData(params, self.site())
+ data=datas['query']['pages'].values()[0]
+ if "categories" in data:
+ for c in data['categories']:
+ if c['ns'] is 14:
+ cat = catlib.Category(self.site(), c['title'])
+ cats.append(cat)
+
+ if 'query-continue' in datas:
+ if 'categories' in datas['query-continue']:
+ params['clcontinue'] = datas['query-continue']['categories']['clcontinue']
+ else:
+ allDone = True
+ return cats
+
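An illustrative call to categories(); with api=True the list also contains categories added through templates, as noted in the docstring above (the title is assumed):

    import wikipedia
    page = wikipedia.Page(wikipedia.getSite(), u'Example article')
    for cat in page.categories(api=True):
        wikipedia.output(cat.title())    # includes template-added categories
    for cat in page.categories(api=False):
        wikipedia.output(cat.title())    # only categories written in the page text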
+ def linkedPages(self, withImageLinks = False):
+ """Return a list of Pages that this Page links to.
+
+ Excludes interwiki and category links, and also image links by default.
+ """
+ result = []
+ try:
+ thistxt = removeLanguageLinks(self.get(get_redirect=True),
+ self.site())
+ except NoPage:
+ raise
+ except IsRedirectPage:
+ raise
+ except SectionError:
+ return []
+ thistxt = removeCategoryLinks(thistxt, self.site())
+
+ # remove HTML comments, pre, nowiki, and includeonly sections
+ # from text before processing
+ thistxt = removeDisabledParts(thistxt)
+
+ # resolve {{ns:-1}} or {{ns:Help}}
+ thistxt = self.site().resolvemagicwords(thistxt)
+
+ for match in Rlink.finditer(thistxt):
+ title = match.group('title')
+ title = title.replace("_", " ").strip(" ")
+ if title.startswith("#"):
+ # this is an internal section link
+ continue
+ if not self.site().isInterwikiLink(title):
+ try:
+ page = Page(self.site(), title)
+ try:
+ hash(str(page))
+ except Exception:
+ raise Error(u"Page %s contains invalid link to [[%s]]."
+ % (self.title(), title))
+ except Error:
+ if verbose:
+ output(u"Page %s contains invalid link to [[%s]]."
+ % (self.title(), title))
+ continue
+ if not withImageLinks and page.isImage():
+ continue
+ if page.sectionFreeTitle() and page not in result:
+ result.append(page)
+ return result
+
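A hedged usage sketch for linkedPages(); the title is illustrative:

    import wikipedia
    page = wikipedia.Page(wikipedia.getSite(), u'Example article')
    targets = page.linkedPages()
    wikipedia.output(u'%d internal links found' % len(targets))
    for target in targets[:10]:
        wikipedia.output(target.title())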
+ def imagelinks(self, followRedirects=False, loose=False):
+ """Return a list of ImagePage objects for images displayed on this
Page.
+
+ Includes images in galleries.
+ If loose is True, this will find anything that looks like it
+ could be an image. This is useful for finding, say, images that are
+ passed as parameters to templates.
+
+ """
+ results = []
+ # Find normal images
+ for page in self.linkedPages(withImageLinks = True):
+ if page.isImage():
+ # convert Page object to ImagePage object
+ results.append( ImagePage(page.site(), page.title()) )
+ # Find images in galleries
+ pageText = self.get(get_redirect=followRedirects)
+ galleryR = re.compile('<gallery>.*?</gallery>', re.DOTALL)
+ galleryEntryR = re.compile('(?P<title>(%s|%s):.+?)(\|.+)?\n' % (self.site().image_namespace(), self.site().family.image_namespace(code = '_default')))
+ for gallery in galleryR.findall(pageText):
+ for match in galleryEntryR.finditer(gallery):
+ results.append( ImagePage(self.site(), match.group('title')) )
+ if loose:
+ ns = getSite().image_namespace()
+ imageR = re.compile('\w\w\w+\.(?:gif|png|jpg|jpeg|svg|JPG|xcf|pdf|mid|ogg|djvu)', re.IGNORECASE)
+ for imageName in imageR.findall(pageText):
+ results.append( ImagePage(self.site(), imageName) )
+ return list(set(results))
+
+ def templates(self, get_redirect=False):
+ """Return a list of titles (unicode) of templates used on this
Page.
+
+ Template parameters are ignored.
+ """
+ if not hasattr(self, "_templates"):
+ self._templates = list(set([template
+ for (template, param)
+ in self.templatesWithParams(
+ get_redirect=get_redirect)]))
+ return self._templates
+
+ def templatesWithParams(self, thistxt=None, get_redirect=False):
+ """Return a list of templates used on this Page.
+
+ Return value is a list of tuples. There is one tuple for each use of
+ a template in the page, with the template title as the first entry
+ and a list of parameters as the second entry.
+
+ If thistxt is set, it is used instead of current page content.
+ """
+ if not thistxt:
+ try:
+ thistxt = self.get(get_redirect=get_redirect)
+ except (IsRedirectPage, NoPage):
+ return []
+
+ # remove commented-out stuff etc.
+ thistxt = removeDisabledParts(thistxt)
+
+ # marker for inside templates or parameters
+ marker = findmarker(thistxt, u'@@', u'@')
+
+ # marker for links
+ marker2 = findmarker(thistxt, u'##', u'#')
+
+ # marker for math
+ marker3 = findmarker(thistxt, u'%%', u'%')
+
+ result = []
+ inside = {}
+ count = 0
+ Rtemplate = re.compile(
+ ur'{{(msg:)?(?P<name>[^{\|]+?)(\|(?P<params>[^{]*?))?}}')
+ Rlink = re.compile(ur'\[\[[^\]]+\]\]')
+ Rmath = re.compile(ur'<math>[^<]+</math>')
+ Rmarker = re.compile(ur'%s(\d+)%s' % (marker, marker))
+ Rmarker2 = re.compile(ur'%s(\d+)%s' % (marker2, marker2))
+ Rmarker3 = re.compile(ur'%s(\d+)%s' % (marker3, marker3))
+
+ # Replace math with markers
+ maths = {}
+ count = 0
+ for m in Rmath.finditer(thistxt):
+ count += 1
+ text = m.group()
+ thistxt = thistxt.replace(text, '%s%d%s' % (marker3, count, marker3))
+ maths[count] = text
+
+ while Rtemplate.search(thistxt) is not None:
+ for m in Rtemplate.finditer(thistxt):
+ # Make sure it is not detected again
+ count += 1
+ text = m.group()
+ thistxt = thistxt.replace(text,
+ '%s%d%s' % (marker, count, marker))
+ # Make sure stored templates don't contain markers
+ for m2 in Rmarker.finditer(text):
+ text = text.replace(m2.group(), inside[int(m2.group(1))])
+ for m2 in Rmarker3.finditer(text):
+ text = text.replace(m2.group(), maths[int(m2.group(1))])
+ inside[count] = text
+
+ # Name
+ name = m.group('name').strip()
+ m2 = Rmarker.search(name) or Rmath.search(name)
+ if m2 is not None:
+ # Doesn't detect templates whose name changes,
+ # or templates whose name contains math tags
+ continue
+ if self.site().isInterwikiLink(name):
+ continue
+
+ # {{#if: }}
+ if name.startswith('#'):
+ continue
+ # {{DEFAULTSORT:...}}
+ defaultKeys = self.site().versionnumber() > 13 and \
+ self.site().getmagicwords('defaultsort')
+ # It seems some wikis do not have this magic key
+ if defaultKeys:
+ found = False
+ for key in defaultKeys:
+ if name.startswith(key):
+ found = True
+ break
+ if found: continue
+
+ try:
+ name = Page(self.site(), name).title()
+ except InvalidTitle:
+ if name:
+ output(
+ u"Page %s contains invalid template name {{%s}}."
+ % (self.title(), name.strip()))
+ continue
+ # Parameters
+ paramString = m.group('params')
+ params = []
+ if paramString:
+ # Replace links to markers
+ links = {}
+ count2 = 0
+ for m2 in Rlink.finditer(paramString):
+ count2 += 1
+ text = m2.group()
+ paramString = paramString.replace(text,
+ '%s%d%s' % (marker2, count2, marker2))
+ links[count2] = text
+ # Parse string
+ markedParams = paramString.split('|')
+ # Replace markers
+ for param in markedParams:
+ for m2 in Rmarker.finditer(param):
+ param = param.replace(m2.group(),
+ inside[int(m2.group(1))])
+ for m2 in Rmarker2.finditer(param):
+ param = param.replace(m2.group(),
+ links[int(m2.group(1))])
+ for m2 in Rmarker3.finditer(param):
+ param = param.replace(m2.group(),
+ maths[int(m2.group(1))])
+ params.append(param)
+
+ # Add it to the result
+ result.append((name, params))
+ return result
+
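A small example of consuming templatesWithParams(); each entry is a (template name, parameter list) tuple, and the title is illustrative:

    import wikipedia
    page = wikipedia.Page(wikipedia.getSite(), u'Example article')
    for name, params in page.templatesWithParams():
        wikipedia.output(u'{{%s}} used with %d parameter(s)' % (name, len(params)))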
+ def getRedirectTarget(self):
+ """Return a Page object for the target this Page redirects to.
+
+ If this page is not a redirect page, will raise an IsNotRedirectPage
+ exception. This method also can raise a NoPage exception.
+
+ """
+ try:
+ self.get()
+ except NoPage:
+ raise
+ except IsRedirectPage, err:
+ # otherwise it will return error pages with " inside.
+ target = err[0].replace('&quot;', '"')
+
+ if '|' in target:
+ warnings.warn("'%s' has a | character, this makes no sense"
+ % target, Warning)
+ return Page(self.site(), target)
+ else:
+ raise IsNotRedirectPage(self)
+
+ def getVersionHistory(self, forceReload=False, reverseOrder=False,
+ getAll=False, revCount=500):
+ """Load the version history page and return history information.
+
+ Return value is a list of tuples, where each tuple represents one
+ edit and is built of revision id, edit date/time, user name,
+ edit summary, size and tags. Starts with the most current revision,
+ unless reverseOrder is True.
+ Defaults to getting the first revCount edits, unless getAll is True.
+
+ @param revCount: iterate no more than this number of revisions in total
+ """
+
+ # regular expression matching one edit in the version history.
+ # results will have 4 groups: oldid, edit date/time, user name, and edit
+ # summary.
+ thisHistoryDone = False
+ skip = False # Used in determining whether we need to skip the first page
+ dataQuery = []
+ hasData = False
+
+
+ # Are we getting by Earliest first?
+ if reverseOrder:
+ # Check if _versionhistoryearliest exists
+ if not hasattr(self, '_versionhistoryearliest') or forceReload:
+ self._versionhistoryearliest = []
+ elif getAll and len(self._versionhistoryearliest) == revCount:
+ # Cause a reload, or at least make the loop run
+ thisHistoryDone = False
+ skip = True
+ dataQuery = self._versionhistoryearliest
+ else:
+ thisHistoryDone = True
+ elif not hasattr(self, '_versionhistory') or forceReload or \
+ len(self._versionhistory) < revCount:
+ self._versionhistory = []
+ # ?? does not load if len(self._versionhistory) > revCount
+ # shouldn't it
+ elif getAll and len(self._versionhistory) == revCount:
+ # Cause a reload, or at least make the loop run
+ thisHistoryDone = False
+ skip = True
+ dataQuery = self._versionhistory
+ else:
+ thisHistoryDone = True
+
+ if not thisHistoryDone:
+ dataQuery.extend(self._getVersionHistory(getAll, skip, reverseOrder, revCount))
+
+ if reverseOrder:
+ # Return only revCount edits, even if the version history is extensive
+ if dataQuery != []:
+ self._versionhistoryearliest = dataQuery
+ del dataQuery
+ if len(self._versionhistoryearliest) > revCount and not getAll:
+ return self._versionhistoryearliest[:revCount]
+ return self._versionhistoryearliest
+
+ if dataQuery != []:
+ self._versionhistory = dataQuery
+ del dataQuery
+ # Return only revCount edits, even if the version history is extensive
+ if len(self._versionhistory) > revCount and not getAll:
+ return self._versionhistory[:revCount]
+ return self._versionhistory
+
+ def _getVersionHistory(self, getAll=False, skipFirst=False, reverseOrder=False,
+ revCount=500):
+ """Load history informations by API query.
+ Internal use for self.getVersionHistory(), don't use this function
directly.
+ """
+ if not self.site().has_api() or self.site().versionnumber() < 8:
+ return self._getVersionHistoryOld(getAll, skipFirst, reverseOrder, revCount)
+ dataQ = []
+ thisHistoryDone = False
+ params = {
+ 'action': 'query',
+ 'prop': 'revisions',
+ 'titles': self.title(),
+ 'rvprop': 'ids|timestamp|flags|comment|user|size|tags',
+ 'rvlimit': revCount,
+ }
+ while not thisHistoryDone:
+ if reverseOrder:
+ params['rvdir'] = 'newer'
+
+ result = query.GetData(params, self.site())
+ if 'error' in result:
+ raise RuntimeError("%s" % result['error'])
+ pageInfo = result['query']['pages'].values()[0]
+ if result['query']['pages'].keys()[0] == "-1":
+ if 'missing' in pageInfo:
+ raise NoPage(self.site(), unicode(self),
+ "Page does not exist.")
+ elif 'invalid' in pageInfo:
+ raise BadTitle('BadTitle: %s' % self)
+
+ if 'query-continue' in result and getAll:
+ params['rvstartid'] = result['query-continue']['revisions']['rvstartid']
+ else:
+ thisHistoryDone = True
+
+ if skipFirst:
+ skipFirst = False
+ else:
+ for r in pageInfo['revisions']:
+ c = ''
+ if 'comment' in r:
+ c = r['comment']
+ #revision id, edit date/time, user name, edit summary
+ (revidStrr, timestampStrr, userStrr) = (None, None, None)
+ if 'revid' in r:
+ revidStrr = r['revid']
+ if 'timestamp' in r:
+ timestampStrr = r['timestamp']
+ if 'user' in r:
+ userStrr = r['user']
+ s=-1 #Will return -1 if not found
+ if 'size' in r:
+ s = r['size']
+ tags=[]
+ if 'tags' in r:
+ tags = r['tags']
+ dataQ.append((revidStrr, timestampStrr, userStrr, c, s, tags))
+ if len(result['query']['pages'].values()[0]['revisions']) < revCount:
+ thisHistoryDone = True
+ return dataQ
+
+ def _getVersionHistoryOld(self, getAll = False, skipFirst = False,
+ reverseOrder = False, revCount=500):
+ """Load the version history page and return history information.
+ Internal use for self.getVersionHistory(), don't use this function
directly.
+ """
+ dataQ = []
+ thisHistoryDone = False
+ startFromPage = None
+ if self.site().versionnumber() < 4:
+ editR = re.compile('<li>\(.*?\)\s+\(.*\).*?<a href=".*?oldid=([0-9]*)" title=".*?">([^<]*)</a> <span class=\'user\'><a href=".*?" title=".*?">([^<]*?)</a></span>.*?(?:<span class=\'comment\'>(.*?)</span>)?</li>')
+ elif self.site().versionnumber() < 15:
+ editR = re.compile('<li>\(.*?\)\s+\(.*\).*?<a href=".*?oldid=([0-9]*)" title=".*?">([^<]*)</a> (?:<span class=\'history-user\'>|)<a href=".*?" title=".*?">([^<]*?)</a>.*?(?:</span>|).*?(?:<span class=[\'"]comment[\'"]>(.*?)</span>)?</li>')
+ elif self.site().versionnumber() < 16:
+ editR = re.compile(r'<li class=".*?">\((?:\w*|<a[^<]*</a>)\)\s\((?:\w*|<a[^<]*</a>)\).*?<a href=".*?([0-9]*)" title=".*?">([^<]*)</a> <span class=\'history-user\'><a [^>]*?>([^<]*?)</a>.*?</span></span>(?: <span class="minor">.*?</span>|)(?: <span class="history-size">.*?</span>|)(?: <span class=[\'"]comment[\'"]>\((?:<span class="autocomment">|)(.*?)(?:</span>|)\)</span>)?(?: \(<span class="mw-history-undo">.*?</span>\)|)\s*</li>', re.UNICODE)
+ else:
+ editR = re.compile(r'<li(?: class="mw-tag[^>]+)?>\((?:\w+|<a[^<]*</a>)\)\s\((?:\w+|<a[^<]*</a>)\).*?<a href=".*?([0-9]*)" title=".*?">([^<]*)</a> <span class=\'history-user\'><a [^>]*?>([^<]*?)</a>.*?</span></span>(?: <abbr class="minor"[^>]*?>.*?</abbr>|)(?: <span class="history-size">.*?</span>|)(?: <span class="comment">\((?:<span class="autocomment">|)(.*?)(?:</span>|)\)</span>)?(?: \(<span class="mw-history-undo">.*?</span>\))?(?: <span class="mw-tag-markers">.*?</span>\)</span>)?\s*</li>', re.UNICODE)
+
+ RLinkToNextPage = re.compile('&offset=(.*?)&')
+
+ while not thisHistoryDone:
+ path = self.site().family.version_history_address(self.site().language(), self.urlname(), config.special_page_limit)
+
+ if reverseOrder:
+ path += '&dir=prev'
+
+ if startFromPage:
+ path += '&offset=' + startFromPage
+
+ # this loop will run until the page could be retrieved
+ # Try to retrieve the page until it was successfully loaded (just in case
+ # the server is down or overloaded)
+ # wait for retry_idle_time minutes (growing!) between retries.
+ retry_idle_time = 1
+
+ if verbose:
+ if startFromPage:
+ output(u'Continuing to get version history of %s' % self)
+ else:
+ output(u'Getting version history of %s' % self)
+
+ txt = self.site().getUrl(path)
+
+ # save a copy of the text
+ self_txt = txt
+
+ #Find the nextPage link, if not exist, the page is last history page
+ matchObj = RLinkToNextPage.search(self_txt)
+ if getAll and matchObj:
+ startFromPage = matchObj.group(1)
+ else:
+ thisHistoryDone = True
+
+ if not skipFirst:
+ edits = editR.findall(self_txt)
+
+ if skipFirst:
+ # Skip the first page only,
+ skipFirst = False
+ else:
+ if reverseOrder:
+ edits.reverse()
+ #for edit in edits:
+ dataQ.extend(edits)
+ if len(edits) < revCount:
+ thisHistoryDone = True
+ return dataQ
+
+ def getVersionHistoryTable(self, forceReload=False, reverseOrder=False,
+ getAll=False, revCount=500):
+ """Return the version history as a wiki table."""
+
+ result = '{| class="wikitable"\n'
+ result += '! oldid || date/time || size || username || edit summary\n'
+ for oldid, time, username, summary, size, tags \
+ in self.getVersionHistory(forceReload=forceReload,
+ reverseOrder=reverseOrder,
+ getAll=getAll, revCount=revCount):
+ result += '|----\n'
+ result += '| %s || %s || %d || %s || <nowiki>%s</nowiki>\n' \
+ % (oldid, time, size, username, summary)
+ result += '|}\n'
+ return result
+
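A sketch of the version history accessors defined above; on API-capable sites the tuple layout is (revid, timestamp, user, comment, size, tags), and the title is illustrative:

    import wikipedia
    page = wikipedia.Page(wikipedia.getSite(), u'Example article')
    for revid, timestamp, user, comment, size, tags in page.getVersionHistory(revCount=5):
        wikipedia.output(u'%s %s %s' % (timestamp, user, comment))
    wikitable = page.getVersionHistoryTable(revCount=5)   # same data rendered as wikitext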
+ def fullVersionHistory(self, getAll=False, skipFirst=False, reverseOrder=False,
+ revCount=500):
+ """Iterate previous versions including wikitext.
+
+ Gives a list of tuples consisting of revision ID, edit date/time, user name and
+ content
+
+ """
+ if not self.site().has_api() or self.site().versionnumber() < 8:
+ address = self.site().export_address()
+ predata = {
+ 'action': 'submit',
+ 'pages': self.title()
+ }
+ get_throttle(requestsize = 10)
+ now = time.time()
+ response, data = self.site().postForm(address, predata)
+ data = data.encode(self.site().encoding())
+# get_throttle.setDelay(time.time() - now)
+ output = []
+ # TODO: parse XML using an actual XML parser instead of regex!
+ r = re.compile("\<revision\>.*?\<id\>(?P<id>.*?)\<\/id\>.*?\<timestamp\>(?P<timestamp>.*?)\<\/timestamp\>.*?\<(?:ip|username)\>(?P<user>.*?)\</(?:ip|username)\>.*?\<text.*?\>(?P<content>.*?)\<\/text\>",re.DOTALL)
+ #r = re.compile("\<revision\>.*?\<timestamp\>(.*?)\<\/timestamp\>.*?\<(?:ip|username)\>(.*?)\<",re.DOTALL)
+ return [ (match.group('id'),
+ match.group('timestamp'),
+ unescape(match.group('user')),
+ unescape(match.group('content')))
+ for match in r.finditer(data) ]
+
+ # Load history informations by API query.
+
+ dataQ = []
+ thisHistoryDone = False
+ params = {
+ 'action': 'query',
+ 'prop': 'revisions',
+ 'titles': self.title(),
+ 'rvprop': 'ids|timestamp|user|content',
+ 'rvlimit': revCount,
+ }
+ while not thisHistoryDone:
+ if reverseOrder:
+ params['rvdir'] = 'newer'
+
+ result = query.GetData(params, self.site())
+ if 'error' in result:
+ raise RuntimeError("%s" % result['error'])
+ pageInfo = result['query']['pages'].values()[0]
+ if result['query']['pages'].keys()[0] == "-1":
+ if 'missing' in pageInfo:
+ raise NoPage(self.site(), unicode(self),
+ "Page does not exist.")
+ elif 'invalid' in pageInfo:
+ raise BadTitle('BadTitle: %s' % self)
+
+ if 'query-continue' in result and getAll:
+ params['rvstartid'] = result['query-continue']['revisions']['rvstartid']
+ else:
+ thisHistoryDone = True
+
+ if skipFirst:
+ skipFirst = False
+ else:
+ for r in pageInfo['revisions']:
+ c = ''
+ if 'comment' in r:
+ c = r['comment']
+ #revision id, edit date/time, user name, edit summary
+ (revidStrr, timestampStrr, userStrr) = (None, None, None)
+ if 'revid' in r:
+ revidStrr = r['revid']
+ if 'timestamp' in r:
+ timestampStrr = r['timestamp']
+ if 'user' in r:
+ userStrr = r['user']
+ s='' #Will return -1 if not found
+ if '*' in r:
+ s = r['*']
+ dataQ.append((revidStrr, timestampStrr, userStrr, s))
+ if len(result['query']['pages'].values()[0]['revisions']) < revCount:
+ thisHistoryDone = True
+ return dataQ
+
+ def contributingUsers(self, step=None, total=None):
+ """Return a set of usernames (or IPs) of users who edited this
page.
+
+ @param step: limit each API call to this number of revisions
+ - not used yet, only in rewrite branch -
+ @param total: iterate no more than this number of revisions in total
+
+ """
+ if total is None:
+ total = 500 #set to default of getVersionHistory
+ edits = self.getVersionHistory(revCount=total)
+ users = set([edit[2] for edit in edits])
+ return users
+
+ def getCreator(self):
+ """ Function to get the first editor and time stamp of a page
"""
+ inf = self.getVersionHistory(reverseOrder=True, revCount=1)[0]
+ return inf[2], inf[1]
+
+ def getLatestEditors(self, limit=1):
+ """ Function to get the last editors of a page """
+
#action=query&prop=revisions&titles=API&rvprop=timestamp|user|comment
+ if hasattr(self, '_versionhistory'):
+ data = self.getVersionHistory(getAll=True, revCount=limit)
+ else:
+ data = self.getVersionHistory(revCount = limit)
+
+ result = []
+ for i in data:
+ result.append({'user':i[2], 'timestamp':i[1]})
+ return result
+
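An illustrative use of the editor-related helpers above (the title is assumed):

    import wikipedia
    page = wikipedia.Page(wikipedia.getSite(), u'Example article')
    wikipedia.output(u'%d distinct contributors' % len(page.contributingUsers()))
    creator, created = page.getCreator()
    wikipedia.output(u'created by %s at %s' % (creator, created))
    last = page.getLatestEditors(limit=3)   # list of {'user': ..., 'timestamp': ...} dicts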
+ def watch(self, unwatch=False):
+ """Add this page to the watchlist"""
+ if self.site().has_api:
+ params = {
+ 'action': 'watch',
+ 'title': self.title()
+ }
+ # watchtoken is needed for mw 1.18
+ # TODO: Find a better implementation for other actions too
+ # who needs a token
+ if self.site().versionnumber() >= 18:
+ api = {
+ 'action': 'query',
+ 'prop': 'info',
+ 'titles' : self.title(),
+ 'intoken' : 'watch',
+ }
+ data = query.GetData(api, self.site())
+ params['token'] = data['query']['pages'].values()[0]['watchtoken']
+ if unwatch:
+ params['unwatch'] = ''
+
+ data = query.GetData(params, self.site())
+ if 'error' in data:
+ raise RuntimeError("API query error: %s" % data['error'])
+ else:
+ urlname = self.urlname()
+ if not unwatch:
+ address = self.site().watch_address(urlname)
+ else:
+ address = self.site().unwatch_address(urlname)
+ response = self.site().getUrl(address)
+ return response
+
+ def unwatch(self):
+ self.watch(unwatch=True)
+
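A hedged sketch of watch()/unwatch(); both act on the watchlist of the logged-in bot account, and the title is illustrative:

    import wikipedia
    page = wikipedia.Page(wikipedia.getSite(), u'Example article')
    page.watch()      # add the page to the watchlist
    page.unwatch()    # and remove it again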
+ def move(self, newtitle, reason=None, movetalkpage=True, movesubpages=False,
+ sysop=False, throttle=True, deleteAndMove=False, safe=True,
+ fixredirects=True, leaveRedirect=True):
+ """Move this page to new title.
+
+ * fixredirects has no effect in MW < 1.13
+
+ @param newtitle: The new page title.
+ @param reason: The edit summary for the move.
+ @param movetalkpage: If true, move this page's talk page (if it exists)
+ @param sysop: Try to move using sysop account, if available
+ @param deleteAndMove: if move succeeds, delete the old page
+ (usually requires sysop privileges, depending on wiki settings)
+ @param safe: If false, attempt to delete existing page at newtitle
+ (if there is one) and then move this page to that title
+
+ """
+ if not self.site().has_api() or self.site().versionnumber() < 12:
+ return self._moveOld(newtitle, reason, movetalkpage, sysop,
+ throttle, deleteAndMove, safe, fixredirects, leaveRedirect)
+ # Login
+ try:
+ self.get()
+ except:
+ pass
+ sysop = self._getActionUser(action = 'move', restriction = self.moveRestriction, sysop = False)
+ if deleteAndMove:
+ sysop = self._getActionUser(action = 'delete', restriction = '', sysop = True)
+ Page(self.site(), newtitle).delete(self.site().mediawiki_message('delete_and_move_reason'), False, False)
+
+ # Check blocks
+ self.site().checkBlocks(sysop = sysop)
+
+ if throttle:
+ put_throttle()
+ if reason is None:
+ pywikibot.output(u'Moving %s to [[%s]].'
+ % (self.title(asLink=True), newtitle))
+ reason = input(u'Please enter a reason for the move:')
+ if self.isTalkPage():
+ movetalkpage = False
+
+ params = {
+ 'action': 'move',
+ 'from': self.title(),
+ 'to': newtitle,
+ 'token': self.site().getToken(sysop=sysop),
+ 'reason': reason,
+ }
+ if movesubpages:
+ params['movesubpages'] = 1
+
+ if movetalkpage:
+ params['movetalk'] = 1
+
+ if not leaveRedirect:
+ params['noredirect'] = 1
+
+ result = query.GetData(params, self.site(), sysop=sysop)
+ if 'error' in result:
+ err = result['error']['code']
+ if err == 'articleexists':
+ if safe:
+ output(u'Page move failed: Target page [[%s]] already exists.' % newtitle)
+ else:
+ try:
+ # Try to delete and move
+ return self.move(newtitle, reason, movetalkpage, movesubpages, throttle = throttle, deleteAndMove = True)
+ except NoUsername:
+ # We don't have the user rights to delete
+ output(u'Page move failed: Target page [[%s]] already exists.' % newtitle)
+ #elif err == 'protectedpage':
+ #
+ else:
+ output("Unknown Error: %s" % result)
+ return False
+ elif 'move' in result:
+ if deleteAndMove:
+ output(u'Page %s moved to %s, deleting the existing page' % (self.title(), newtitle))
+ else:
+ output(u'Page %s moved to %s' % (self.title(), newtitle))
+
+ if hasattr(self, '_contents'):
+ #self.__init__(self.site(), newtitle, defaultNamespace = self._namespace)
+ try:
+ self.get(force=True, get_redirect=True, throttle=False)
+ except NoPage:
+ output(u'Page %s was moved and no longer exists.' % self.title() )
+ #delattr(self, '_contents')
+ return True
+
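A minimal sketch of move(); the titles and summary are illustrative, and movetalkpage/leaveRedirect simply make the default behaviour explicit:

    import wikipedia
    page = wikipedia.Page(wikipedia.getSite(), u'Old title')
    page.move(u'New title', reason=u'Bot: renaming per naming convention',
              movetalkpage=True, leaveRedirect=True)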
+ def _moveOld(self, newtitle, reason=None, movetalkpage=True, movesubpages=False, sysop=False,
+ throttle=True, deleteAndMove=False, safe=True, fixredirects=True, leaveRedirect=True):
+
+ # Login
+ try:
+ self.get()
+ except:
+ pass
+ sysop = self._getActionUser(action = 'move', restriction = self.moveRestriction, sysop = False)
+ if deleteAndMove:
+ sysop = self._getActionUser(action = 'delete', restriction = '', sysop = True)
+
+ # Check blocks
+ self.site().checkBlocks(sysop = sysop)
+
+ if throttle:
+ put_throttle()
+ if reason is None:
+ reason = input(u'Please enter a reason for the move:')
+ if self.isTalkPage():
+ movetalkpage = False
+
+ host = self.site().hostname()
+ address = self.site().move_address()
+ token = self.site().getToken(sysop = sysop)
+ predata = {
+ 'wpOldTitle': self.title().encode(self.site().encoding()),
+ 'wpNewTitle': newtitle.encode(self.site().encoding()),
+ 'wpReason': reason.encode(self.site().encoding()),
+ }
+ if deleteAndMove:
+ predata['wpDeleteAndMove'] = self.site().mediawiki_message('delete_and_move_confirm')
+ predata['wpConfirm'] = '1'
+
+ if movetalkpage:
+ predata['wpMovetalk'] = '1'
+ else:
+ predata['wpMovetalk'] = '0'
+
+ if self.site().versionnumber() >= 13:
+ if fixredirects:
+ predata['wpFixRedirects'] = '1'
+ else:
+ predata['wpFixRedirects'] = '0'
+
+ if leaveRedirect:
+ predata['wpLeaveRedirect'] = '1'
+ else:
+ predata['wpLeaveRedirect'] = '0'
+
+ if movesubpages:
+ predata['wpMovesubpages'] = '1'
+ else:
+ predata['wpMovesubpages'] = '0'
+
+ if token:
+ predata['wpEditToken'] = token
+
+ response, data = self.site().postForm(address, predata, sysop = sysop)
+
+ if data == u'' or self.site().mediawiki_message('pagemovedsub') in data:
+ #Move Success
+ if deleteAndMove:
+ output(u'Page %s moved to %s, deleting the existing page' % (self.title(), newtitle))
+ else:
+ output(u'Page %s moved to %s' % (self.title(), newtitle))
+
+ if hasattr(self, '_contents'):
+ #self.__init__(self.site(), newtitle, defaultNamespace = self._namespace)
+ try:
+ self.get(force=True, get_redirect=True, throttle=False)
+ except NoPage:
+ output(u'Page %s was moved and no longer exists.' % self.title() )
+ #delattr(self, '_contents')
+
+ return True
+ else:
+ #Move Failure
+ self.site().checkBlocks(sysop = sysop)
+ if self.site().mediawiki_message('articleexists') in data or self.site().mediawiki_message('delete_and_move') in data:
+ if safe:
+ output(u'Page move failed: Target page [[%s]] already exists.' % newtitle)
+ return False
+ else:
+ try:
+ # Try to delete and move
+ return self._moveOld(newtitle, reason, movetalkpage, movesubpages, throttle = throttle, deleteAndMove = True)
+ except NoUsername:
+ # We don't have the user rights to delete
+ output(u'Page move failed: Target page [[%s]] already exists.' % newtitle)
+ return False
+ elif not self.exists():
+ raise NoPage(u'Page move failed: Source page [[%s]] does not exist.' % newtitle)
+ elif Page(self.site(),newtitle).exists():
+ # XXX : This might be buggy : if the move was successful, the target page *has* been created
+ raise PageNotSaved(u'Page move failed: Target page [[%s]] already exists.' % newtitle)
+ else:
+ output(u'Page move failed for unknown reason.')
+ try:
+ ibegin = data.index('<!-- start content -->') + 22
+ iend = data.index('<!-- end content -->')
+ except ValueError:
+ # if begin/end markers weren't found, show entire HTML file
+ output(data)
+ else:
+ # otherwise, remove the irrelevant sections
+ data = data[ibegin:iend]
+ output(data)
+ return False
+
+ def delete(self, reason=None, prompt=True, throttle=True, mark=False):
+ """Deletes the page from the wiki. Requires administrator status.
+
+ @param reason: The edit summary for the deletion. If None, ask for it.
+ @param prompt: If true, prompt user for confirmation before deleting.
+ @param mark: if true, and user does not have sysop rights, place a
+ speedy-deletion request on the page instead.
+
+ """
+ # Login
+ try:
+ self._getActionUser(action = 'delete', sysop = True)
+ except NoUsername:
+ if mark and self.exists():
+ text = self.get(get_redirect = True)
+ output(u'Cannot delete page %s - marking the page for deletion instead:' % self.title(asLink=True))
+ # Note: Parameters to {{delete}}, and their meanings, vary from one Wikipedia to another.
+ # If you want or need to use them, you must be careful not to break others. Else don't.
+ self.put(u'{{delete|bot=yes}}\n%s --~~~~\n----\n\n%s' % (reason, text), comment = reason)
+ return
+ else:
+ raise
+
+ # Check blocks
+ self.site().checkBlocks(sysop = True)
+
+ if throttle:
+ put_throttle()
+ if reason is None:
+ output(u'Deleting %s.' % (self.title(asLink=True)))
+ reason = input(u'Please enter a reason for the deletion:')
+ answer = u'y'
+ if prompt and not hasattr(self.site(), '_noDeletePrompt'):
+ answer = inputChoice(u'Do you want to delete %s?' % self,
+ ['yes', 'no', 'all'], ['y', 'N', 'a'], 'N')
+ if answer == 'a':
+ answer = 'y'
+ self.site()._noDeletePrompt = True
+ if answer == 'y':
+
+ token = self.site().getToken(self, sysop = True)
+ reason = reason.encode(self.site().encoding())
+
+ if self.site().has_api() and self.site().versionnumber() >= 12:
+ #API Mode
+ params = {
+ 'action': 'delete',
+ 'title': self.title(),
+ 'token': token,
+ 'reason': reason,
+ }
+ datas = query.GetData(params, self.site(), sysop = True)
+ if 'delete' in datas:
+ output(u'Page %s deleted' % self)
+ return True
+ else:
+ if datas['error']['code'] == 'missingtitle':
+ output(u'Page %s could not be deleted - it doesn\'t exist'
+ % self)
+ else:
+ output(u'Deletion of %s failed for an unknown reason. The response text is:'
+ % self)
+ output('%s' % datas)
+
+ return False
+ else:
+ #Ordinary mode from webpage.
+ host = self.site().hostname()
+ address = self.site().delete_address(self.urlname())
+
+ predata = {
+ 'wpDeleteReasonList': 'other',
+ 'wpReason': reason,
+ #'wpComment': reason, <- which version?
+ 'wpConfirm': '1',
+ 'wpConfirmB': '1',
+ 'wpEditToken': token,
+ }
+ response, data = self.site().postForm(address, predata, sysop = True)
+ if data:
+ self.site().checkBlocks(sysop = True)
+ if self.site().mediawiki_message('actioncomplete') in data:
+ output(u'Page %s deleted' % self)
+ return True
+ elif self.site().mediawiki_message('cannotdelete') in data:
+ output(u'Page %s could not be deleted - it doesn\'t exist'
+ % self)
+ return False
+ else:
+ output(u'Deletion of %s failed for an unknown reason. The response text is:'
+ % self)
+ try:
+ ibegin = data.index('<!-- start content -->') + 22
+ iend = data.index('<!-- end content -->')
+ except ValueError:
+ # if begin/end markers weren't found, show entire HTML file
+ output(data)
+ else:
+ # otherwise, remove the irrelevant sections
+ data = data[ibegin:iend]
+ output(data)
+ return False
+
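A usage sketch for delete(); without sysop rights, mark=True falls back to tagging the page with {{delete}} as described above (title and summary are illustrative):

    import wikipedia
    page = wikipedia.Page(wikipedia.getSite(), u'Some obsolete subpage')
    page.delete(reason=u'Bot: housekeeping', prompt=False, mark=True)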
+ def loadDeletedRevisions(self, step=None, total=None):
+ """Retrieve all deleted revisions for this Page from
Special/Undelete.
+
+ Stores all revisions' timestamps, dates, editors and comments in
+ self._deletedRevs attribute.
+
+ @return: list of timestamps (which can be used to retrieve
+ revisions later on).
+
+ """
+ # Login
+ self._getActionUser(action = 'deletedhistory', sysop = True)
+
+ #TODO: Handle image file revisions too.
+ output(u'Loading list of deleted revisions for [[%s]]...' % self.title())
+
+ self._deletedRevs = {}
+
+ if self.site().has_api() and self.site().versionnumber() >= 12:
+ params = {
+ 'action': 'query',
+ 'list': 'deletedrevs',
+ 'drfrom': self.title(withNamespace=False),
+ 'drnamespace': self.namespace(),
+ 'drprop': ['revid','user','comment','content'],#','minor','len','token'],
+ 'drlimit': 100,
+ 'drdir': 'older',
+ #'': '',
+ }
+ count = 0
+ while True:
+ data = query.GetData(params, self.site(), sysop=True)
+ for x in data['query']['deletedrevs']:
+ if x['title'] != self.title():
+ continue
+
+ for y in x['revisions']:
+ count += 1
+ self._deletedRevs[parsetime2stamp(y['timestamp'])] = [y['timestamp'], y['user'], y['comment'] , y['*'], False]
+
+ if 'query-continue' in data:
+ # get the continue key for backward compatibility
+ # with pre 1.20wmf8
+ contKey = data['query-continue']['deletedrevs'].keys()[0]
+ if data['query-continue']['deletedrevs'][contKey].split('|')[1] == self.title(withNamespace=False):
+ params[contKey] = data['query-continue']['deletedrevs'][contKey]
+ else: break
+ else:
+ break
+ self._deletedRevsModified = False
+
+ else:
+ address = self.site().undelete_view_address(self.urlname())
+ text = self.site().getUrl(address, sysop = True)
+ #TODO: Handle non-existent pages etc
+
+ rxRevs = re.compile(r'<input name="(?P<ts>(?:ts|fileid)\d+)".*?title=".*?">(?P<date>.*?)</a>.*?title=".*?">(?P<editor>.*?)</a>.*?<span class="comment">\((?P<comment>.*?)\)</span>',re.DOTALL)
+ for rev in rxRevs.finditer(text):
+ self._deletedRevs[rev.group('ts')] = [
+ rev.group('date'),
+ rev.group('editor'),
+ rev.group('comment'),
+ None, #Revision text
+ False, #Restoration marker
+ ]
+
+ self._deletedRevsModified = False
+
+ return self._deletedRevs.keys()
+
+ def getDeletedRevision(self, timestamp, retrieveText=False):
+ """Return a particular deleted revision by timestamp.
+
+ @return: a list of [date, editor, comment, text, restoration
+ marker]. text will be None, unless retrieveText is True (or has
+ been retrieved earlier). If timestamp is not found, returns
+ None.
+
+ """
+ if self._deletedRevs is None:
+ self.loadDeletedRevisions()
+ if timestamp not in self._deletedRevs:
+ #TODO: Throw an exception instead?
+ return None
+
+ if retrieveText and not self._deletedRevs[timestamp][3] and timestamp[:2]=='ts':
+ # Login
+ self._getActionUser(action = 'delete', sysop = True)
+
+ output(u'Retrieving text of deleted revision...')
+ address = self.site().undelete_view_address(self.urlname(),timestamp)
+ text = self.site().getUrl(address, sysop = True)
+ und = re.search('<textarea readonly="1" cols="80" rows="25">(.*?)</textarea><div><form method="post"',text,re.DOTALL)
+ if und:
+ self._deletedRevs[timestamp][3] = und.group(1)
+
+ return self._deletedRevs[timestamp]
+
+ def markDeletedRevision(self, timestamp, undelete=True):
+ """Mark the revision identified by timestamp for undeletion.
+
+ @param undelete: if False, mark the revision to remain deleted.
+
+ """
+ if self._deletedRevs is None:
+ self.loadDeletedRevisions()
+ if timestamp not in self._deletedRevs:
+ #TODO: Throw an exception?
+ return None
+ self._deletedRevs[timestamp][4] = undelete
+ self._deletedRevsModified = True
+
+ def undelete(self, comment=None, throttle=True):
+ """Undelete page based on the undeletion markers set by previous
calls.
+
+ If no calls have been made since loadDeletedRevisions(), everything
+ will be restored.
+
+ Simplest case:
+ Page(...).undelete('This will restore all revisions')
+
+ More complex:
+ pg = Page(...)
+ revs = pg.loadDeletedRevsions()
+ for rev in revs:
+ if ... #decide whether to undelete a revision
+ pg.markDeletedRevision(rev) #mark for undeletion
+ pg.undelete('This will restore only selected revisions.')
+
+ @param comment: The undeletion edit summary.
+
+ """
+ # Login
+ self._getActionUser(action = 'undelete', sysop = True)
+
+ # Check blocks
+ self.site().checkBlocks(sysop = True)
+
+ token = self.site().getToken(self, sysop=True)
+ if comment is None:
+ output(u'Preparing to undelete %s.'
+ % (self.title(asLink=True)))
+ comment = input(u'Please enter a reason for the undeletion:')
+
+ if throttle:
+ put_throttle()
+
+ if self.site().has_api() and self.site().versionnumber() >= 12:
+ params = {
+ 'action': 'undelete',
+ 'title': self.title(),
+ 'reason': comment,
+ 'token': token,
+ }
+ if self._deletedRevs and self._deletedRevsModified:
+ selected = []
+
+ for ts in self._deletedRevs:
+ if self._deletedRevs[ts][4]:
+ selected.append(ts)
+ params['timestamps'] = ts,
+
+ result = query.GetData(params, self.site(), sysop=True)
+ if 'error' in result:
+ raise RuntimeError("%s" % result['error'])
+ elif 'undelete' in result:
+ output(u'Page %s undeleted' % self.title(asLink=True))
+
+ return result
+
+ else:
+ address = self.site().undelete_address()
+
+ formdata = {
+ 'target': self.title(),
+ 'wpComment': comment,
+ 'wpEditToken': token,
+ 'restore': self.site().mediawiki_message('undeletebtn')
+ }
+
+ if self._deletedRevs and self._deletedRevsModified:
+ for ts in self._deletedRevs:
+ if self._deletedRevs[ts][4]:
+ formdata['ts'+ts] = '1'
+
+ self._deletedRevs = None
+ #TODO: Check for errors below (have we succeeded? etc):
+ result = self.site().postForm(address,formdata,sysop=True)
+ output(u'Page %s undeleted' % self.title(asLink=True))
+
+ return result
+
+ def protect(self, editcreate='sysop', move='sysop', unprotect=False,
+ reason=None, editcreate_duration='infinite',
+ move_duration = 'infinite', cascading = False, prompt = True, throttle = True):
+ """(Un)protect a wiki page. Requires administrator status.
+
+ If the page does not exist, only the ec (aka edit/create) protection is available.
+ If reason is None, asks for a reason. If prompt is True, asks the
+ user if he wants to protect the page. Valid values for editcreate and move
+ are:
+ * '' (equivalent to 'none')
+ * 'autoconfirmed'
+ * 'sysop'
+
+ """
+ # Login
+ self._getActionUser(action = 'protect', sysop = True)
+
+ # Check blocks
+ self.site().checkBlocks(sysop = True)
+
+ #if self.exists() and editcreate != move: # check protect level if edit/move not same
+ # if editcreate == 'sysop' and move != 'sysop':
+ # raise Error("The level configuration is not safe")
+
+ if unprotect:
+ editcreate = move = ''
+ else:
+ editcreate, move = editcreate.lower(), move.lower()
+ if throttle:
+ put_throttle()
+ if reason is None:
+ reason = input(
+ u'Please enter a reason for the change of the protection level:')
+ reason = reason.encode(self.site().encoding())
+ answer = 'y'
+ if prompt and not hasattr(self.site(), '_noProtectPrompt'):
+ answer = inputChoice(
+ u'Do you want to change the protection level of %s?' % self,
+ ['Yes', 'No', 'All'], ['Y', 'N', 'A'], 'N')
+ if answer == 'a':
+ answer = 'y'
+ self.site()._noProtectPrompt = True
+ if answer == 'y':
+ if not self.site().has_api() or self.site().versionnumber() < 12:
+ return self._oldProtect(editcreate, move, unprotect, reason,
+ editcreate_duration, move_duration,
+ cascading, prompt, throttle)
+
+ token = self.site().getToken(self, sysop = True)
+
+ # Translate 'none' to ''
+ protections = []
+ expiry = []
+ if editcreate == 'none':
+ editcreate = 'all'
+ if move == 'none':
+ move = 'all'
+
+ if editcreate_duration == 'none' or not editcreate_duration:
+ editcreate_duration = 'infinite'
+ if move_duration == 'none' or not move_duration:
+ move_duration = 'infinite'
+
+ if self.exists():
+ protections.append("edit=%s" % editcreate)
+
+ protections.append("move=%s" % move)
+ expiry.append(move_duration)
+ else:
+ protections.append("create=%s" % editcreate)
+
+ expiry.append(editcreate_duration)
+
+ params = {
+ 'action': 'protect',
+ 'title': self.title(),
+ 'token': token,
+ 'protections': protections,
+ 'expiry': expiry,
+ #'': '',
+ }
+ if reason:
+ params['reason'] = reason
+
+ if cascading:
+ if editcreate != 'sysop' or move != 'sysop' or not self.exists():
+ # You can't protect a page as autoconfirmed and cascading; prevent the error.
+ # Cascading is only available for existing pages, not for create protection.
+ output(u"NOTE: The page can't be protected with cascading unless it exists and is sysop-only. Setting cascading \"off\"")
+ else:
+ params['cascade'] = 1
+
+ result = query.GetData(params, self.site(), sysop=True)
+
+ if 'error' in result: #error occured
+ err = result['error']['code']
+ output('%s' % result)
+ #if err == '':
+ #
+ #elif err == '':
+ #
+ else:
+ if result['protect']:
+ output(u'Changed protection level of page %s.' % self.title(asLink=True))
+ return True
+
+ return False
+
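An illustrative protect() call (a sysop account is required); unprotect=True would lift the restrictions again, and the title and reason are assumptions:

    import wikipedia
    page = wikipedia.Page(wikipedia.getSite(), u'Example article')
    page.protect(editcreate='autoconfirmed', move='sysop',
                 reason=u'Bot: persistent vandalism', prompt=False)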
+ def _oldProtect(self, editcreate = 'sysop', move = 'sysop', unprotect = False, reason = None, editcreate_duration = 'infinite',
+ move_duration = 'infinite', cascading = False, prompt = True, throttle = True):
+ """Internal use: (un)protect a page via the ordinary web form."""
+ host = self.site().hostname()
+ token = self.site().getToken(sysop = True)
+
+ # Translate 'none' to ''
+ if editcreate == 'none': editcreate = ''
+ if move == 'none': move = ''
+
+ # Translate no duration to infinite
+ if editcreate_duration == 'none' or not editcreate_duration: editcreate_duration = 'infinite'
+ if move_duration == 'none' or not move_duration: move_duration = 'infinite'
+
+ # Get cascading
+ if cascading == False:
+ cascading = '0'
+ else:
+ if editcreate != 'sysop' or move != 'sysop' or not self.exists():
+ # You can't protect a page as autoconfirmed and cascading; prevent the error.
+ # Cascading is only available for existing pages, not for create protection.
+ cascading = '0'
+ output(u"NOTE: The page can't be protected with cascading unless it exists and is sysop-only. Setting cascading \"off\"")
+ else:
+ cascading = '1'
+
+ if unprotect:
+ address = self.site().unprotect_address(self.urlname())
+ else:
+ address = self.site().protect_address(self.urlname())
+
+ predata = {}
+ if self.site().versionnumber >= 10:
+ predata['mwProtect-cascade'] = cascading
+
+ predata['mwProtect-reason'] = reason
+
+ if not self.exists(): #and self.site().versionnumber() >= :
+ #create protect
+ predata['mwProtect-level-create'] = editcreate
+ predata['wpProtectExpirySelection-create'] = editcreate_duration
+ else:
+ #edit/move Protect
+ predata['mwProtect-level-edit'] = editcreate
+ predata['mwProtect-level-move'] = move
+
+ if self.site().versionnumber() >= 14:
+ predata['wpProtectExpirySelection-edit'] = editcreate_duration
+ predata['wpProtectExpirySelection-move'] = move_duration
+ else:
+ predata['mwProtect-expiry'] = editcreate_duration
+
+ if token:
+ predata['wpEditToken'] = token
+
+ response, data = self.site().postForm(address, predata, sysop=True)
+
+ if response.code == 302 and not data:
+ output(u'Changed protection level of page %s.' % self.title(asLink=True))
+ return True
+ else:
+ #Normally, we expect a 302 with no data, so this means an error
+ self.site().checkBlocks(sysop = True)
+ output(u'Failed to change protection level of page %s:'
+ % self.title(asLink=True))
+ output(u"HTTP response code %s" % response.code)
+ output(data)
+ return False
+
+ def removeImage(self, image, put=False, summary=None, safe=True):
+ """Remove all occurrences of an image from this
Page."""
+ # TODO: this should be grouped with other functions that operate on
+ # wiki-text rather than the Page object
+ return self.replaceImage(image, None, put, summary, safe)
+
+ def replaceImage(self, image, replacement=None, put=False, summary=None,
+ safe=True):
+ """Replace all occurences of an image by another image.
+
+ Giving None as argument for replacement will delink instead of
+ replace.
+
+ The argument image must be without namespace and all spaces replaced
+ by underscores.
+
+ If put is False, the new text will be returned. If put is True, the
+ edits will be saved to the wiki and True will be returned on succes,
+ and otherwise False. Edit errors propagate.
+
+ """
+ # TODO: this should be grouped with other functions that operate on
+ # wiki-text rather than the Page object
+
+ # Copyright (c) Orgullomoore, Bryan
+
+ # TODO: document and simplify the code
+ site = self.site()
+
+ text = self.get()
+ new_text = text
+
+ def capitalizationPattern(s):
+ """
+ Given a string, create a pattern that matches the string, with
+ the first letter case-insensitive if capitalization is switched
+ on for the site you're working on.
+ """
+ if self.site().nocapitalize:
+ return re.escape(s)
+ else:
+ return ur'(?:[%s%s]%s)' % (re.escape(s[0].upper()), re.escape(s[0].lower()), re.escape(s[1:]))
+
+ namespaces = set(site.namespace(6, all = True) + site.namespace(-2, all = True))
+ # note that the colon is already included here
+ namespacePattern = ur'\s*(?:%s)\s*\:\s*' % u'|'.join(namespaces)
+
+ imagePattern = u'(%s)' % capitalizationPattern(image).replace(r'\_', '[ _]')
+
+ def filename_replacer(match):
+ if replacement is None:
+ return u''
+ else:
+ old = match.group()
+ return old[:match.start('filename')] + replacement + old[match.end('filename'):]
+
+ # The group params contains parameters such as thumb and 200px, as well
+ # as the image caption. The caption can contain wiki links, but each
+ # link has to be closed properly.
+ paramPattern = r'(?:\|(?:(?!\[\[).|\[\[.*?\]\])*?)'
+ rImage = re.compile(ur'\[\[(?P<namespace>%s)(?P<filename>%s)(?P<params>%s*?)\]\]' % (namespacePattern, imagePattern, paramPattern))
+ if replacement is None:
+ new_text = rImage.sub('', new_text)
+ else:
+ new_text = rImage.sub('[[\g<namespace>%s\g<params>]]' % replacement, new_text)
+
+ # Remove the image from galleries
+ galleryR = re.compile(r'(?is)<gallery>(?P<items>.*?)</gallery>')
+ galleryItemR = re.compile(r'(?m)^%s?(?P<filename>%s)\s*(?P<label>\|.*?)?\s*$' % (namespacePattern, imagePattern))
+
+ def gallery_replacer(match):
+ return ur'<gallery>%s</gallery>' % galleryItemR.sub(filename_replacer, match.group('items'))
+
+ new_text = galleryR.sub(gallery_replacer, new_text)
+
+ if (text == new_text) or (not safe):
+ # All previous steps did not work, so the image is
+ # likely embedded in a complicated template.
+ # Note: this regular expression can't handle nested templates.
+ templateR = re.compile(ur'(?s)\{\{(?P<contents>.*?)\}\}')
+ fileReferenceR = re.compile(u'%s(?P<filename>(?:%s)?)' % (namespacePattern, imagePattern))
+
+ def template_replacer(match):
+ return fileReferenceR.sub(filename_replacer, match.group(0))
+
+ new_text = templateR.sub(template_replacer, new_text)
+
+ if put:
+ if text != new_text:
+ # Save to the wiki
+ self.put(new_text, summary)
+ return True
+ return False
+ else:
+ return new_text
+
+ ## @since 10310
+ # @remarks needed by various bots
+ def purgeCache(self):
+ """Purges the page cache with API.
+ ( non-api purge can be done with Page.purge_address() )
+ """
+
+ # Make sure we re-raise an exception we got on an earlier attempt
+ if hasattr(self, '_getexception'):
+ return self._getexception
+
+ # call the wiki to execute the request
+ params = {
+ u'action' : u'purge',
+ u'titles' : self.title(),
+ }
+
+ pywikibot.get_throttle()
+ pywikibot.output(u"Purging page cache for %s." % self.title(asLink=True))
+
+ result = query.GetData(params, self.site())
+ r = result[u'purge'][0]
+
+ # store and return info
+ if (u'missing' in r):
+ self._getexception = pywikibot.NoPage
+ raise pywikibot.NoPage(self.site(), self.title(asLink=True),"Page does not exist. Was not able to purge cache!" )
+
+ return (u'purged' in r)
+
+
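A short sketch of purgeCache(); it returns True when the server reports the page as purged (the title is illustrative):

    import wikipedia
    page = wikipedia.Page(wikipedia.getSite(), u'Example article')
    if page.purgeCache():
        wikipedia.output(u'Cache purged for %s' % page.title())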
+class ImagePage(Page):
+ """A subclass of Page representing an image descriptor wiki page.
+
+ Supports the same interface as Page, with the following added methods:
+
+ getImagePageHtml : Download image page and return raw HTML text.
+ fileURL : Return the URL for the image described on this
+ page.
+ fileIsOnCommons : Return True if image stored on Wikimedia
+ Commons.
+ fileIsShared : Return True if image stored on Wikitravel
+ shared repository.
+ getFileMd5Sum : Return image file's MD5 checksum.
+ getFileVersionHistory : Return the image file's version history.
+ getFileVersionHistoryTable: Return the version history in the form of a
+ wiki table.
+ usingPages : Yield Pages on which the image is displayed.
+ globalUsage : Yield Pages on which the image is used globally
+
+ """
+ def __init__(self, site, title, insite = None):
+ Page.__init__(self, site, title, insite, defaultNamespace=6)
+ if self.namespace() != 6:
+ raise ValueError(u'BUG: %s is not in the image namespace!' % title)
+ self._imagePageHtml = None
+ self._local = None
+ self._latestInfo = {}
+ self._infoLoaded = False
+
+ def getImagePageHtml(self):
+ """
+ Download the image page, and return the HTML, as a unicode string.
+
+ Caches the HTML code, so that if you run this method twice on the
+ same ImagePage object, the page will only be downloaded once.
+ """
+ if not self._imagePageHtml:
+ path = self.site().get_address(self.urlname())
+ self._imagePageHtml = self.site().getUrl(path)
+ return self._imagePageHtml
+
+ def _loadInfo(self, limit=1):
+ params = {
+ 'action': 'query',
+ 'prop': 'imageinfo',
+ 'titles': self.title(),
+ 'iiprop': ['timestamp', 'user', 'comment', 'url', 'size',
+ 'dimensions', 'sha1', 'mime', 'metadata', 'archivename',
+ 'bitdepth'],
+ 'iilimit': limit,
+ }
+ try:
+ data = query.GetData(params, self.site())
+ except NotImplementedError:
+ output("API did not work; loading page HTML instead.")
+ self.getImagePageHtml()
+ return
+
+ if 'error' in data:
+ raise RuntimeError("%s" %data['error'])
+ count = 0
+ pageInfo = data['query']['pages'].values()[0]
+ self._local = pageInfo["imagerepository"] != "shared"
+ if data['query']['pages'].keys()[0] == "-1":
+ if 'missing' in pageInfo and self._local:
+ raise NoPage(self.site(), unicode(self),
+ "Page does not exist.")
+ elif 'invalid' in pageInfo:
+ raise BadTitle('BadTitle: %s' % self)
+ infos = []
+
+ try:
+ while True:
+ for info in pageInfo['imageinfo']:
+ count += 1
+ if count == 1 and 'iistart' not in params:
+ # count 1 and no iicontinue mean first image revision is latest.
+ self._latestInfo = info
+ infos.append(info)
+ if limit == 1:
+ break
+
+ if 'query-continue' in data and limit != 1:
+ params['iistart'] = data['query-continue']['imageinfo']['iistart']
+ else:
+ break
+ except KeyError:
+ output("No image found on the image page.")
+ self._infoLoaded = True
+ if limit > 1:
+ return infos
+
+ def fileUrl(self):
+ """Return the URL for the image described on this page."""
+ # There are three types of image pages:
+ # * normal, small images with links like: filename.png (10KB, MIME type: image/png)
+ # * normal, large images with links like: Download high resolution version (1024x768, 200 KB)
+ # * SVG images with links like: filename.svg (1KB, MIME type: image/svg)
+ # This regular expression seems to work with all of them.
+ # The part after the | is required for copying .ogg files from en:, as they do not
+ # have a "full image link" div. This might change in the future; on commons, there
+ # is a full image link for .ogg and .mid files.
+ #***********************
+ # change to API query: action=query&titles=File:wiki.jpg&prop=imageinfo&iiprop=url
+ if not self._infoLoaded:
+ self._loadInfo()
+
+ if self._infoLoaded:
+ return self._latestInfo['url']
+
+ urlR = re.compile(r'<div class="fullImageLink" id="file">.*?<a href="(?P<url>[^ ]+?)"(?! class="image")|<span class="dangerousLink"><a href="(?P<url2>.+?)"', re.DOTALL)
+ m = urlR.search(self.getImagePageHtml())
+
+ url = m.group('url') or m.group('url2')
+ return url
+
+ def fileIsOnCommons(self):
+ """Return True if the image is stored on Wikimedia Commons."""
+ if not self._infoLoaded:
+ self._loadInfo()
+
+ if self._infoLoaded:
+ return not self._local
+
+ return
self.fileUrl().startswith(u'http://upload.wikimedia.org/wikipedia/commo…
+
+ def fileIsShared(self):
+ """Return True if image is stored on Wikitravel shared repository."""
+ if 'wikitravel_shared' in self.site().shared_image_repository():
+ return self.fileUrl().startswith(u'http://wikitravel.org/upload/shared/')
+ return self.fileIsOnCommons()
+
+ # FIXME: the MD5 might be computed over an incomplete file due to a server disconnection
+ # (see bug #1795683).
+ def getFileMd5Sum(self):
+ """Return image file's MD5 checksum."""
+ f = MyURLopener.open(self.fileUrl())
+ return md5(f.read()).hexdigest()
+
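A short sketch of how the accessors above might be used together; the file title is hypothetical and a configured user-config.py is assumed:

    import wikipedia

    site = wikipedia.getSite()
    image = wikipedia.ImagePage(site, u'File:Example.jpg')   # hypothetical title
    print image.fileUrl()           # direct URL of the file
    print image.fileIsOnCommons()   # True when the file lives on Wikimedia Commons
    print image.getFileMd5Sum()     # MD5 of the downloaded file content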
+ def getFileVersionHistory(self):
+ """Return the image file's version history.
+
+ Return value is a list of tuples containing (timestamp, username,
+ resolution, filesize, comment).
+
+ """
+ result = []
+ infos = self._loadInfo(500)
+ #API query
+ if infos:
+ for i in infos:
+ result.append((i['timestamp'], i['user'], u"%s×%s" % (i['width'], i['height']), i['size'], i['comment']))
+
+ return result
+
+ #from ImagePage HTML
+ history = re.search('(?s)<table class="wikitable filehistory">.+?</table>', self.getImagePageHtml())
+ if history:
+ lineR = re.compile(r'<tr>(?:<td>.*?</td>){1,2}<td.*?><a href=".+?">(?P<datetime>.+?)</a></td><td>.*?(?P<resolution>\d+\xd7\d+) <span.*?>\((?P<filesize>.+?)\)</span></td><td><a href=".+?"(?: class="new"|) title=".+?">(?P<username>.+?)</a>.*?</td><td>(?:.*?<span class="comment">\((?P<comment>.*?)\)</span>)?</td></tr>')
+ if not lineR.search(history.group()):
+ # b/c code
+ lineR = re.compile(r'<tr>(?:<td>.*?</td>){1,2}<td><a href=".+?">(?P<datetime>.+?)</a></td><td><a href=".+?"(?: class="new"|) title=".+?">(?P<username>.+?)</a>.*?</td><td>(?P<resolution>.*?)</td><td class=".+?">(?P<filesize>.+?)</td><td>(?P<comment>.*?)</td></tr>')
+ else:
+ # backward compatible code
+ history = re.search('(?s)<ul class="special">.+?</ul>', self.getImagePageHtml())
+ if history:
+ lineR = re.compile('<li> \(.+?\) \(.+?\) <a href=".+?" title=".+?">(?P<datetime>.+?)</a> . . <a href=".+?" title=".+?">(?P<username>.+?)</a> \(.+?\) . . (?P<resolution>\d+.+?\d+) \((?P<filesize>[\d,\.]+) .+?\)( <span class="comment">(?P<comment>.*?)</span>)?</li>')
+
+ if history:
+ for match in lineR.finditer(history.group()):
+ datetime = match.group('datetime')
+ username = match.group('username')
+ resolution = match.group('resolution')
+ size = match.group('filesize')
+ comment = match.group('comment') or ''
+ result.append((datetime, username, resolution, size, comment))
+ return result
+
+ def getFirstUploader(self):
+ """ Function that uses the APIs to detect the first uploader of the image """
+ inf = self.getFileVersionHistory()[-1]
+ return [inf[1], inf[0]]
+
+ def getLatestUploader(self):
+ """ Function that uses the APIs to detect the latest uploader of the image """
+ if not self._infoLoaded:
+ self._loadInfo()
+ if self._infoLoaded:
+ return [self._latestInfo['user'], self._latestInfo['timestamp']]
+
+ inf = self.getFileVersionHistory()[0]
+ return [inf[1], inf[0]]
+
+ def getHash(self):
+ """ Function that returns the hash of a file, in order to check whether two
+ files are the same or not.
+ """
+ if self.exists():
+ if not self._infoLoaded:
+ self._loadInfo()
+ try:
+ return self._latestInfo['sha1']
+ except (KeyError, IndexError, TypeError):
+ try:
+ self.get()
+ except NoPage:
+ output(u'%s has been deleted before getting the Hash. Skipping...' % self.title())
+ return None
+ except IsRedirectPage:
+ output("Skipping %s because it's a redirect." % self.title())
+ return None
+ else:
+ raise NoHash('No Hash found in the APIs! Maybe the regex to catch it is wrong or someone has changed the APIs structure.')
+ else:
+ output(u'File deleted before getting the Hash. Skipping...')
+ return None
+
+ def getFileVersionHistoryTable(self):
+ """Return the version history in the form of a wiki table."""
+ lines = []
+ for (datetime, username, resolution, size, comment) in self.getFileVersionHistory():
+ lines.append(u'| %s || %s || %s || %s || <nowiki>%s</nowiki>' % (datetime, username, resolution, size, comment))
+ return u'{| border="1"\n! date/time || username || resolution || size || edit summary\n|----\n' + u'\n|----\n'.join(lines) + '\n|}'
+
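An illustrative sketch of the two history accessors above; the file title is hypothetical:

    import wikipedia

    site = wikipedia.getSite()
    image = wikipedia.ImagePage(site, u'File:Example.jpg')   # hypothetical title
    for timestamp, username, resolution, size, comment in image.getFileVersionHistory():
        print timestamp, username, resolution, size
    print image.getFileVersionHistoryTable()    # the same data rendered as a wiki table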
+ def usingPages(self):
+ if not self.site().has_api() or self.site().versionnumber() < 11:
+ for a in self._usingPagesOld():
+ yield a
+ return
+
+ params = {
+ 'action': 'query',
+ 'list': 'imageusage',
+ 'iutitle': self.title(),
+ 'iulimit': config.special_page_limit,
+ #'': '',
+ }
+
+ while True:
+ data = query.GetData(params, self.site())
+ if 'error' in data:
+ raise RuntimeError("%s" % data['error'])
+
+ for iu in data['query']["imageusage"]:
+ yield Page(self.site(), iu['title'], defaultNamespace=iu['ns'])
+
+ if 'query-continue' in data:
+ params['iucontinue'] = data['query-continue']['imageusage']['iucontinue']
+ else:
+ break
+
+ def _usingPagesOld(self):
+ """Yield Pages on which the image is displayed."""
+ titleList = re.search('(?s)<h2 id="filelinks">.+?<!-- end content -->',
+ self.getImagePageHtml()).group()
+ lineR = re.compile(
+ '<li><a href="[^\"]+" title=".+?">(?P<title>.+?)</a></li>')
+
+ for match in lineR.finditer(titleList):
+ try:
+ yield Page(self.site(), match.group('title'))
+ except InvalidTitle:
+ output(
+ u"Image description page %s contains invalid reference to [[%s]]."
+ % (self.title(), match.group('title')))
+
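A small sketch of iterating over the pages that embed a file, again with a hypothetical title and a configured user-config.py assumed:

    import wikipedia

    site = wikipedia.getSite()
    image = wikipedia.ImagePage(site, u'File:Example.jpg')   # hypothetical title
    for page in image.usingPages():          # falls back to HTML scraping on old wikis
        wikipedia.output(page.title())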
+ def globalUsage(self):
+ '''
+ Yield Pages on which the image is used globally.
+ Currently this probably only works on Wikimedia Commons.
+ '''
+
+ if not self.site().has_api() or self.site().versionnumber() < 11:
+ # Not supported, just return none
+ return
+
+ params = {
+ 'action': 'query',
+ 'prop': 'globalusage',
+ 'titles': self.title(),
+ 'gulimit': config.special_page_limit,
+ #'': '',
+ }
+
+ while True:
+ data = query.GetData(params, self.site())
+ if 'error' in data:
+ raise RuntimeError("%s" % data['error'])
+
+ for (page, globalusage) in data['query']['pages'].items():
+ for gu in globalusage['globalusage']:
+ #FIXME: Should have a cleaner way to get the wiki where the image is used
+ siteparts = gu['wiki'].split('.')
+ if len(siteparts)==3:
+ if siteparts[0] in self.site().fam().alphabetic and siteparts[1] in ['wikipedia', 'wiktionary', 'wikibooks', 'wikiquote','wikisource']:
+ code = siteparts[0]
+ fam = siteparts[1]
+ elif siteparts[0] in ['meta', 'incubator'] and siteparts[1]==u'wikimedia':
+ code = siteparts[0]
+ fam = siteparts[0]
+ else:
+ code = None
+ fam = None
+ if code and fam:
+ site = getSite(code=code, fam=fam)
+ yield Page(site, gu['title'])
+
+ if 'query-continue' in data:
+ params['gucontinue'] = data['query-continue']['globalusage']['gucontinue']
+ else:
+ break
+
+
+class _GetAll(object):
+ """For internal use only - supports getall() function"""
+ def __init__(self, site, pages, throttle, force):
+ self.site = site
+ self.pages = []
+ self.throttle = throttle
+ self.force = force
+ self.sleeptime = 15
+
+ for page in pages:
+ if (not hasattr(page, '_contents') and not hasattr(page, '_getexception')) or force:
+ self.pages.append(page)
+ elif verbose:
+ output(u"BUGWARNING: %s already done!" % page.title(asLink=True))
+
+ def sleep(self):
+ time.sleep(self.sleeptime)
+ if self.sleeptime <= 60:
+ self.sleeptime += 15
+ elif self.sleeptime < 360:
+ self.sleeptime += 60
+
+ def run(self):
+ if self.pages:
+ # Sometimes the query does not contain revisions
+ if self.site.has_api() and debug:
+ while True:
+ try:
+ data = self.getDataApi()
+ except (socket.error, httplib.BadStatusLine, ServerError):
+ # Print the traceback of the caught exception
+ s = ''.join(traceback.format_exception(*sys.exc_info()))
+ if not isinstance(s, unicode):
+ s = s.decode('utf-8')
+ output(u'%s\nDBG> got network error in _GetAll.run. ' \
+ 'Sleeping for %d seconds...' % (s, self.sleeptime))
+ self.sleep()
+ else:
+ if 'error' in data:
+ raise RuntimeError(data['error'])
+ else:
+ break
+
+ self.headerDoneApi(data['query'])
+ if 'normalized' in data['query']:
+ self._norm = dict([(x['from'],x['to']) for x in data['query']['normalized']])
+ for vals in data['query']['pages'].values():
+ self.oneDoneApi(vals)
+ else: #read pages via Special:Export
+ while True:
+ try:
+ data = self.getData()
+ except (socket.error, httplib.BadStatusLine, ServerError):
+ # Print the traceback of the caught exception
+ s = ''.join(traceback.format_exception(*sys.exc_info()))
+ if not isinstance(s, unicode):
+ s = s.decode('utf-8')
+ output(u'%s\nDBG> got network error in _GetAll.run. ' \
+ 'Sleeping for %d seconds...' % (s, self.sleeptime))
+ self.sleep()
+ else:
+ if "<title>Wiki does not exist</title>" in data:
+ raise NoSuchSite(u'Wiki %s does not exist yet' % self.site)
+ elif "</mediawiki>" not in data[-20:]:
+ # HTML error Page got thrown because of an internal
+ # error when fetching a revision.
+ output(u'Received incomplete XML data. ' \
+ 'Sleeping for %d seconds...' % self.sleeptime)
+ self.sleep()
+ elif "<siteinfo>" not in data: # This probably means we got a 'temporarily unavailable'
+ output(u'Got incorrect export page. ' \
+ 'Sleeping for %d seconds...' % self.sleeptime)
+ self.sleep()
+ else:
+ break
+ R = re.compile(r"\s*<\?xml([^>]*)\?>(.*)",re.DOTALL)
+ m = R.match(data)
+ if m:
+ data = m.group(2)
+ handler = xmlreader.MediaWikiXmlHandler()
+ handler.setCallback(self.oneDone)
+ handler.setHeaderCallback(self.headerDone)
+ #f = open("backup.txt", "w")
+ #f.write(data)
+ #f.close()
+ try:
+ xml.sax.parseString(data, handler)
+ except (xml.sax._exceptions.SAXParseException, ValueError), err:
+ debugDump( 'SaxParseBug', self.site, err, data )
+ raise
+ except PageNotFound:
+ return
+ # All of the ones that have not been found apparently do not exist
+
+ for pl in self.pages:
+ if not hasattr(pl,'_contents') and not hasattr(pl,'_getexception'):
+ pl._getexception = NoPage
+
+ def oneDone(self, entry):
+ title = entry.title
+ username = entry.username
+ ipedit = entry.ipedit
+ timestamp = entry.timestamp
+ text = entry.text
+ editRestriction = entry.editRestriction
+ moveRestriction = entry.moveRestriction
+ revisionId = entry.revisionid
+
+ page = Page(self.site, title)
+ successful = False
+ for page2 in self.pages:
+ if page2.sectionFreeTitle() == page.sectionFreeTitle():
+ if not (hasattr(page2,'_contents') or \
+ hasattr(page2, '_getexception')) or self.force:
+ page2.editRestriction = entry.editRestriction
+ page2.moveRestriction = entry.moveRestriction
+ if editRestriction == 'autoconfirmed':
+ page2._editrestriction = True
+ page2._permalink = entry.revisionid
+ page2._userName = username
+ page2._ipedit = ipedit
+ page2._revisionId = revisionId
+ page2._editTime = timestamp
+ page2._versionhistory = [
+ (revisionId,
+ time.strftime("%Y-%m-%dT%H:%M:%SZ",
+ time.strptime(str(timestamp),
+ "%Y%m%d%H%M%S")),
+ username, entry.comment)]
+ section = page2.section()
+ # Store the content
+ page2._contents = text
+ m = self.site.redirectRegex().match(text)
+ if m:
+ ## output(u"%s is a redirect" % page2.title(asLink=True))
+ redirectto = m.group(1)
+ if section and not "#" in redirectto:
+ redirectto += "#" + section
+ page2._getexception = IsRedirectPage
+ page2._redirarg = redirectto
+
+ # This is used for checking deletion conflict.
+ # Use the data loading time.
+ page2._startTime = time.strftime('%Y%m%d%H%M%S',
+ time.gmtime())
+ if section:
+ m = re.search("=+[ ']*%s[ ']*=+" % re.escape(section), text)
+ if not m:
+ try:
+ page2._getexception
+ output(u"WARNING: Section not found: %s" % page2)
+ except AttributeError:
+ # There is no exception yet
+ page2._getexception = SectionError
+ successful = True
+ # Note that there is no break here. The reason is that there
+ # might be duplicates in the pages list.
+ if not successful:
+ output(u"BUG>> title %s (%s) not found in list" % (title, page))
+ output(u'Expected one of: %s'
+ % u','.join([unicode(page2) for page2 in self.pages]))
+ raise PageNotFound
+
+ def headerDone(self, header):
+ # Verify version
+ version = header.generator
+ p = re.compile('^MediaWiki (.+)$')
+ m = p.match(version)
+ if m:
+ version = m.group(1)
+ # only warn operator when versionnumber has been changed
+ versionnumber = self.site.family.versionnumber
+ if version != self.site.version() and \
+ versionnumber(self.site.lang,
+ version=version) != versionnumber(self.site.lang):
+ output(u'WARNING: Family file %s contains version number %s, but it should be %s'
+ % (self.site.family.name, self.site.version(), version))
+
+ # Verify case
+ if self.site.nocapitalize:
+ case = 'case-sensitive'
+ else:
+ case = 'first-letter'
+ if case != header.case.strip():
+ output(u'WARNING: Family file %s contains case %s, but it should be %s' % (self.site.family.name, case, header.case.strip()))
+
+ # Verify namespaces
+ lang = self.site.lang
+ ids = header.namespaces.keys()
+ ids.sort()
+ for id in ids:
+ nshdr = header.namespaces[id]
+ if self.site.family.isDefinedNSLanguage(id, lang):
+ ns = self.site.namespace(id) or u''
+ if ns != nshdr:
+ try:
+ dflt = self.site.family.namespace('_default', id)
+ except KeyError:
+ dflt = u''
+ if not ns and not dflt:
+ flag = u"is not set, but should be '%s'" % nshdr
+ elif dflt == ns:
+ flag = u"is set to default ('%s'), but should be '%s'" % (ns, nshdr)
+ elif dflt == nshdr:
+ flag = u"is '%s', but should be removed (default value '%s')" % (ns, nshdr)
+ else:
+ flag = u"is '%s', but should be '%s'" % (ns, nshdr)
+ output(u"WARNING: Outdated family file %s: namespace['%s'][%i] %s" % (self.site.family.name, lang, id, flag))
+ #self.site.family.namespaces[id][lang] = nshdr
+ else:
+ output(u"WARNING: Missing namespace in family file %s: namespace['%s'][%i] (it is set to '%s')" % (self.site.family.name, lang, id, nshdr))
+ for id in self.site.family.namespaces:
+ if self.site.family.isDefinedNSLanguage(id, lang) and id not in header.namespaces:
+ output(u"WARNING: Family file %s includes namespace['%s'][%i], but it should be removed (namespace doesn't exist in the site)" % (self.site.family.name, lang, id))
+
+ def getData(self):
+ address = self.site.export_address()
+ pagenames = [page.sectionFreeTitle() for page in self.pages]
+ # We need to use X convention for requested page titles.
+ if self.site.lang == 'eo':
+ pagenames = [encodeEsperantoX(pagetitle) for pagetitle in pagenames]
+ pagenames = u'\r\n'.join(pagenames)
+ if type(pagenames) is not unicode:
+ output(u'Warning: xmlreader.WikipediaXMLHandler.getData() got non-unicode page names. Please report this.')
+ print pagenames
+ # convert Unicode string to the encoding used on that wiki
+ pagenames = pagenames.encode(self.site.encoding())
+ predata = {
+ 'action': 'submit',
+ 'pages': pagenames,
+ 'curonly': 'True',
+ }
+ # Slow ourselves down
+ get_throttle(requestsize = len(self.pages))
+ # Now make the actual request to the server
+ now = time.time()
+ response, data = self.site.postForm(address, predata)
+ # The XML parser doesn't expect a Unicode string, but an encoded one,
+ # so we'll encode it back.
+ data = data.encode(self.site.encoding())
+ #get_throttle.setDelay(time.time() - now)
+ return data
+
+ def oneDoneApi(self, data):
+ title = data['title']
+ if not ('missing' in data or 'invalid' in data):
+ revisionId = data['lastrevid']
+ rev = None
+ try:
+ rev = data['revisions']
+ except KeyError:
+ raise KeyError(
+ u'NOTE: Last revision of [[%s]] not found' % title)
+ else:
+ username = rev[0]['user']
+ ipedit = 'anon' in rev[0]
+ timestamp = rev[0]['timestamp']
+ text = rev[0]['*']
+ editRestriction = ''
+ moveRestriction = ''
+ for revs in data['protection']:
+ if revs['type'] == 'edit':
+ editRestriction = revs['level']
+ elif revs['type'] == 'move':
+ moveRestriction = revs['level']
+
+ page = Page(self.site, title)
+ successful = False
+ for page2 in self.pages:
+ if hasattr(self, '_norm') and page2.sectionFreeTitle() in self._norm:
+ page2._title = self._norm[page2.sectionFreeTitle()]
+
+ if page2.sectionFreeTitle() == page.sectionFreeTitle():
+ if 'missing' in data:
+ page2._getexception = NoPage
+ successful = True
+ break
+
+ if 'invalid' in data:
+ page2._getexception = BadTitle
+ successful = True
+ break
+
+ if not (hasattr(page2,'_contents') or hasattr(page2,'_getexception')) or self.force:
+ page2.editRestriction = editRestriction
+ page2.moveRestriction = moveRestriction
+ if editRestriction == 'autoconfirmed':
+ page2._editrestriction = True
+ page2._permalink = revisionId
+ if rev:
+ page2._userName = username
+ page2._ipedit = ipedit
+ page2._editTime = timestamp
+ page2._contents = text
+ else:
+ raise KeyError(
+ u'BUG?>>: Last revision of [[%s]] not found'
+ % title)
+ page2._revisionId = revisionId
+ section = page2.section()
+ if 'redirect' in data:
+ ## output(u"%s is a redirect" % page2.title(asLink=True))
+ m = self.site.redirectRegex().match(text)
+ redirectto = m.group(1)
+ if section and not "#" in redirectto:
+ redirectto += "#" + section
+ page2._getexception = IsRedirectPage
+ page2._redirarg = redirectto
+
+ # This is used for checking deletion conflict.
+ # Use the data loading time.
+ page2._startTime = time.strftime('%Y%m%d%H%M%S', time.gmtime())
+ if section:
+ m = re.search("=+[ ']*%s[ ']*=+" % re.escape(section), text)
+ if not m:
+ try:
+ page2._getexception
+ output(u"WARNING: Section not found: %s"
+ % page2)
+ except AttributeError:
+ # There is no exception yet
+ page2._getexception = SectionError
+ successful = True
+ # Note that there is no break here. The reason is that there
+ # might be duplicates in the pages list.
+ if not successful:
+ output(u"BUG>> title %s (%s) not found in list" % (title, page))
+ output(u'Expected one of: %s'
+ % u','.join([unicode(page2) for page2 in self.pages]))
+ raise PageNotFound
+
+ def headerDoneApi(self, header):
+ p = re.compile('^MediaWiki (.+)$')
+ m = p.match(header['general']['generator'])
+ if m:
+ version = m.group(1)
+ # only warn operator when versionnumber has been changed
+ versionnumber = self.site.family.versionnumber
+ if version != self.site.version() and \
+ versionnumber(self.site.lang,
+ version=version) != versionnumber(self.site.lang):
+ output(u'WARNING: Family file %s contains version number %s, but it should be %s'
+ % (self.site.family.name, self.site.version(), version))
+
+ # Verify case
+ if self.site.nocapitalize:
+ case = 'case-sensitive'
+ else:
+ case = 'first-letter'
+ if case != header['general']['case'].strip():
+ output(u'WARNING: Family file %s contains case %s, but it should be %s' % (self.site.family.name, case, header['general']['case'].strip()))
+
+ # Verify namespaces
+ lang = self.site.lang
+ ids = header['namespaces'].keys()
+ ids.sort()
+ for id in ids:
+ nshdr = header['namespaces'][id]['*']
+ id = header['namespaces'][id]['id']
+ if self.site.family.isDefinedNSLanguage(id, lang):
+ ns = self.site.namespace(id) or u''
+ if ns != nshdr:
+ try:
+ dflt = self.site.family.namespace('_default', id)
+ except KeyError:
+ dflt = u''
+ if not ns and not dflt:
+ flag = u"is not set, but should be '%s'" % nshdr
+ elif dflt == ns:
+ flag = u"is set to default ('%s'), but should be '%s'" % (ns, nshdr)
+ elif dflt == nshdr:
+ flag = u"is '%s', but should be removed (default value '%s')" % (ns, nshdr)
+ else:
+ flag = u"is '%s', but should be '%s'" % (ns, nshdr)
+ output(u"WARNING: Outdated family file %s: namespace['%s'][%i] %s" % (self.site.family.name, lang, id, flag))
+ #self.site.family.namespaces[id][lang] = nshdr
+ else:
+ output(u"WARNING: Missing namespace in family file %s: namespace['%s'][%i] (it is set to '%s')" % (self.site.family.name, lang, id, nshdr))
+ for id in self.site.family.namespaces:
+ if self.site.family.isDefinedNSLanguage(id, lang) and u'%i' % id not in header['namespaces']:
+ output(u"WARNING: Family file %s includes namespace['%s'][%i], but it should be removed (namespace doesn't exist in the site)" % (self.site.family.name, lang, id))
+
+ def getDataApi(self):
+ pagenames = [page.sectionFreeTitle() for page in self.pages]
+ params = {
+ 'action': 'query',
+ 'meta':'siteinfo',
+ 'prop': ['info', 'revisions'],
+ 'titles': pagenames,
+ 'siprop': ['general', 'namespaces'],
+ 'rvprop': ['content', 'timestamp', 'user', 'comment', 'size'], #'ids',
+ 'inprop': ['protection', 'subjectid'], #, 'talkid', 'url', 'readable'
+ }
+
+ # Slow ourselves down
+ get_throttle(requestsize = len(self.pages))
+ # Now make the actual request to the server
+ now = time.time()
+
+ #get_throttle.setDelay(time.time() - now)
+ return query.GetData(params, self.site)
+
+def getall(site, pages, throttle=True, force=False):
+ """Bulk-retrieve a group of pages from site
+
+ Arguments: site = Site object
+ pages = iterable that yields Page objects
+
+ """
+ # TODO: why isn't this a Site method?
+ pages = list(pages) # if pages is an iterator, we need to make it a list
+ output(u'Getting %d page%s %sfrom %s...'
+ % (len(pages), (u'', u's')[len(pages) != 1],
+ (u'', u'via API ')[site.has_api() and debug], site))
+ limit = config.special_page_limit / 4 # default is 500/4; a smaller batch seems to be easier on the server.
+ if len(pages) > limit:
+ # separate export pages for bulk-retrieve
+
+ for pagg in range(0, len(pages), limit):
+ if pagg == range(0, len(pages), limit)[-1]: #latest retrieve
+ k = pages[pagg:]
+ output(u'Getting pages %d - %d of %d...' % (pagg + 1, len(pages), len(pages)))
+ _GetAll(site, k, throttle, force).run()
+ pages[pagg:] = k
+ else:
+ k = pages[pagg:pagg + limit]
+ output(u'Getting pages %d - %d of %d...' % (pagg + 1, pagg + limit, len(pages)))
+ _GetAll(site, k, throttle, force).run()
+ pages[pagg:pagg + limit] = k
+ get_throttle(requestsize = len(pages) / 10) # one retrieval takes about 7.7 sec.
+ else:
+ _GetAll(site, pages, throttle, force).run()
+
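A usage sketch for getall(); the page titles are hypothetical and a configured user-config.py is assumed. The fetched contents are cached on the Page objects, so later get() calls do not hit the wiki again:

    import wikipedia

    site = wikipedia.getSite()
    pages = [wikipedia.Page(site, t) for t in (u'Sandbox', u'Main Page')]  # hypothetical titles
    wikipedia.getall(site, pages)      # bulk-fetch in one request batch
    for page in pages:
        try:
            print page.title(), len(page.get())
        except wikipedia.NoPage:
            print page.title(), 'does not exist'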
+
+# Library functions
+
+def setAction(s):
+ """Set a summary to use for changed page submissions"""
+ global action
+ action = s
+
+# Default action
+setAction('Wikipedia python library')
+
+def setUserAgent(s):
+ """Set a User-agent: header passed to the HTTP server"""
+ global useragent
+ useragent = s
+
+# Default User-agent
+setUserAgent('PythonWikipediaBot/1.0')
+
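A sketch of overriding the two module-level defaults set above; the summary and agent strings are hypothetical:

    import wikipedia

    wikipedia.setAction(u'Robot: interwiki maintenance')     # hypothetical edit summary
    wikipedia.setUserAgent('MyBot/0.1 (run by Example)')     # hypothetical User-agent header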
+def url2link(percentname, insite, site):
+ """Convert urlname of a wiki page into interwiki link format.
+
+ 'percentname' is the page title as given by Page.urlname();
+ 'insite' specifies the target Site;
+ 'site' is the Site on which the page is found.
+
+ """
+ # Note: this is only needed if linking between wikis that use different
+ # encodings, so it is now largely obsolete. [CONFIRM]
+ percentname = percentname.replace('_', ' ')
+ x = url2unicode(percentname, site = site)
+ return unicode2html(x, insite.encoding())
+
+def decodeEsperantoX(text):
+ """
+ Decode Esperanto text encoded using the x convention.
+
+ E.g., Cxefpagxo and CXefpagXo will both be converted to Ĉefpaĝo.
+ Note that to encode non-Esperanto words like Bordeaux, one uses a
+ double x, i.e. Bordeauxx or BordeauxX.
+
+ """
+ chars = {
+ u'c': u'ĉ',
+ u'C': u'Ĉ',
+ u'g': u'ĝ',
+ u'G': u'Ĝ',
+ u'h': u'ĥ',
+ u'H': u'Ĥ',
+ u'j': u'ĵ',
+ u'J': u'Ĵ',
+ u's': u'ŝ',
+ u'S': u'Ŝ',
+ u'u': u'ŭ',
+ u'U': u'Ŭ',
+ }
+ for latin, esperanto in chars.iteritems():
+ # A regular expression that matches a letter combination which IS
+ # encoded using x-convention.
+ xConvR = re.compile(latin + '[xX]+')
+ pos = 0
+ result = ''
+ # Each matching substring will be regarded exactly once.
+ while True:
+ match = xConvR.search(text[pos:])
+ if match:
+ old = match.group()
+ if len(old) % 2 == 0:
+ # The first two chars represent an Esperanto letter.
+ # Following x's are doubled.
+ new = esperanto + ''.join([old[2 * i]
+ for i in xrange(1, len(old)/2)])
+ else:
+ # The first character stays latin; only the x's are doubled.
+ new = latin + ''.join([old[2 * i + 1]
+ for i in xrange(0, len(old)/2)])
+ result += text[pos : match.start() + pos] + new
+ pos += match.start() + len(old)
+ else:
+ result += text[pos:]
+ text = result
+ break
+ return text
+
+def encodeEsperantoX(text):
+ """
+ Convert standard wikitext to the Esperanto x-encoding.
+
+ Double X-es where necessary so that we can submit a page to an Esperanto
+ wiki. Again, we have to keep stupid stuff like cXxXxxX in mind. Maybe
+ someone wants to write about the Sony Cyber-shot DSC-Uxx camera series on
+ eo: ;)
+ """
+ # A regular expression that matches a letter combination which is NOT
+ # encoded in x-convention.
+ notXConvR = re.compile('[cghjsuCGHJSU][xX]+')
+ pos = 0
+ result = ''
+ while True:
+ match = notXConvR.search(text[pos:])
+ if match:
+ old = match.group()
+ # the first letter stays; add an x after each X or x.
+ new = old[0] + ''.join([old[i] + 'x' for i in xrange(1, len(old))])
+ result += text[pos : match.start() + pos] + new
+ pos += match.start() + len(old)
+ else:
+ result += text[pos:]
+ text = result
+ break
+ return text
+
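A self-contained check of the two helpers above; the expected values follow from the docstrings, assuming the functions are imported from this module:

    # -*- coding: utf-8 -*-
    from wikipedia import decodeEsperantoX, encodeEsperantoX

    print decodeEsperantoX(u'Cxefpagxo')   # u'Ĉefpaĝo'
    print decodeEsperantoX(u'Bordeauxx')   # u'Bordeaux' (a doubled x escapes a literal x)
    print encodeEsperantoX(u'Bordeaux')    # u'Bordeauxx'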
+######## Unicode library functions ########
+
+def UnicodeToAsciiHtml(s):
+ """Convert unicode to a bytestring using HTML entities."""
+ html = []
+ for c in s:
+ cord = ord(c)
+ if 31 < cord < 128:
+ html.append(c)
+ else:
+ html.append('&#%d;'%cord)
+ return ''.join(html)
+
+def url2unicode(title, site, site2 = None):
+ """Convert url-encoded text to unicode using site's encoding.
+
+ If site2 is provided, try its encodings as well. Uses the first encoding
+ that doesn't cause an error.
+
+ """
+ # create a list of all possible encodings for both hint sites
+ encList = [site.encoding()] + list(site.encodings())
+ if site2 and site2 <> site:
+ encList.append(site2.encoding())
+ encList += list(site2.encodings())
+ firstException = None
+ # try to handle all encodings (will probably retry utf-8)
+ for enc in encList:
+ try:
+ t = title.encode(enc)
+ t = urllib.unquote(t)
+ return unicode(t, enc)
+ except UnicodeError, ex:
+ if not firstException:
+ firstException = ex
+ pass
+ # Couldn't convert, raise the original exception
+ raise firstException
+
+def unicode2html(x, encoding):
+ """
+ Ensure unicode string is encodable, or else convert to ASCII for HTML.
+
+ Arguments are a unicode string and an encoding. Attempt to encode the
+ string into the desired format; if that doesn't work, encode the unicode
+ into html &#; entities. If it does work, return it unchanged.
+
+ """
+ try:
+ x.encode(encoding)
+ except UnicodeError:
+ x = UnicodeToAsciiHtml(x)
+ return x
+
+def html2unicode(text, ignore = []):
+ """Return text, replacing HTML entities by equivalent unicode characters."""
+ # This regular expression will match any decimal and hexadecimal entity and
+ # also entities that might be named entities.
+ entityR = re.compile(
+ r'&(?:amp;)?(#(?P<decimal>\d+)|#x(?P<hex>[0-9a-fA-F]+)|(?P<name>[A-Za-z]+));')
+ # These characters are Html-illegal, but sadly you *can* find some of
+ # these and converting them to unichr(decimal) is unsuitable
+ convertIllegalHtmlEntities = {
+ 128 : 8364, # €
+ 130 : 8218, # ‚
+ 131 : 402, # ƒ
+ 132 : 8222, # „
+ 133 : 8230, # …
+ 134 : 8224, # †
+ 135 : 8225, # ‡
+ 136 : 710, # ˆ
+ 137 : 8240, # ‰
+ 138 : 352, # Š
+ 139 : 8249, # ‹
+ 140 : 338, # Œ
+ 142 : 381, # Ž
+ 145 : 8216, # ‘
+ 146 : 8217, # ’
+ 147 : 8220, # “
+ 148 : 8221, # ”
+ 149 : 8226, # •
+ 150 : 8211, # –
+ 151 : 8212, # —
+ 152 : 732, # ˜
+ 153 : 8482, # ™
+ 154 : 353, # š
+ 155 : 8250, # ›
+ 156 : 339, # œ
+ 158 : 382, # ž
+ 159 : 376 # Ÿ
+ }
+ #ensuring that illegal &#129;, &#141; and &#157;, which have no known values,
+ #don't get converted to unichr(129), unichr(141) or unichr(157)
+ ignore = set(ignore) | set([129, 141, 157])
+ result = u''
+ i = 0
+ found = True
+ while found:
+ text = text[i:]
+ match = entityR.search(text)
+ if match:
+ unicodeCodepoint = None
+ if match.group('decimal'):
+ unicodeCodepoint = int(match.group('decimal'))
+ elif match.group('hex'):
+ unicodeCodepoint = int(match.group('hex'), 16)
+ elif match.group('name'):
+ name = match.group('name')
+ if name in htmlentitydefs.name2codepoint:
+ # We found a known HTML entity.
+ unicodeCodepoint = htmlentitydefs.name2codepoint[name]
+ result += text[:match.start()]
+ try:
+ unicodeCodepoint = convertIllegalHtmlEntities[unicodeCodepoint]
+ except KeyError:
+ pass
+ if unicodeCodepoint and unicodeCodepoint not in ignore and (WIDEBUILD or unicodeCodepoint < 65534):
+ result += unichr(unicodeCodepoint)
+ else:
+ # Leave the entity unchanged
+ result += text[match.start():match.end()]
+ i = match.end()
+ else:
+ result += text
+ found = False
+ return result
+
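A self-contained check of the entity helpers above; the expected values follow directly from the code:

    # -*- coding: utf-8 -*-
    from wikipedia import html2unicode, UnicodeToAsciiHtml

    print html2unicode(u'&eacute;&#233;&#xE9;')   # u'ééé' - named, decimal and hex forms
    print html2unicode(u'&#128;')                 # u'€' - illegal cp1252 entity remapped to U+20AC
    print UnicodeToAsciiHtml(u'café')             # 'caf&#233;'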
+# Warning! _familyCache does not necessarily have to be consistent between
+# two statements. Always ensure that a local reference is created when
+# accessing Family objects
+_familyCache = weakref.WeakValueDictionary()
+def Family(fam=None, fatal=True, force=False):
+ """Import the named family.
+
+ @param fam: family name (if omitted, uses the configured default)
+ @type fam: str
+ @param fatal: if True, the bot will stop running if the given family is
+ unknown. If False, it will only raise a ValueError exception.
+ @type fatal: bool
+ @return: a Family instance configured for the named family.
+
+ """
+ if fam is None:
+ fam = config.family
+
+ family = _familyCache.get(fam)
+ if family and not force:
+ return family
+
+ try:
+ # search for family module in the 'families' subdirectory
+ sys.path.append(config.datafilepath('families'))
+ myfamily = __import__('%s_family' % fam)
+ except ImportError:
+ if fatal:
+ output(u"""\
+Error importing the %s family. This probably means the family
+does not exist. Also check your configuration file."""
+ % fam)
+ import traceback
+ traceback.print_stack()
+ sys.exit(1)
+ else:
+ raise ValueError("Family %s does not exist" % repr(fam))
+
+ family = myfamily.Family()
+ _familyCache[fam] = family
+ return family
+
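A usage sketch for Family(), assuming the standard wikipedia family file ships in the families directory:

    import wikipedia

    fam = wikipedia.Family('wikipedia')     # imports families/wikipedia_family.py
    print fam.name                          # 'wikipedia'
    fam2 = wikipedia.Family('wikipedia')    # second call is served from the weak-reference cache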
+
+class Site(object):
+ """A MediaWiki site. Do not instantiate directly; use the getSite() function.
+
+ Constructor takes three arguments; only code is mandatory:
+ see __init__() param
+
+ Methods:
+
+ language: This Site's language code.
+ family: This Site's Family object.
+ sitename: A string representing this Site.
+ languages: A list of all languages contained in this site's Family.
+ validLanguageLinks: A list of language codes that can be used in interwiki
+ links.
+
+ loggedInAs: return current username, or None if not logged in.
+ forceLogin: require the user to log in to the site
+ messages: return True if there are new messages on the site
+ cookies: return user's cookies as a string
+
+ getUrl: retrieve an URL from the site
+ urlEncode: Encode a query to be sent using an http POST request.
+ postForm: Post form data to an address at this site.
+ postData: Post encoded form data to an http address at this site.
+
+ namespace(num): Return local name of namespace 'num'.
+ normalizeNamespace(value): Return preferred name for namespace 'value' in
+ this Site's language.
+ namespaces: Return list of canonical namespace names for this Site.
+ getNamespaceIndex(name): Return the int index of namespace 'name', or None
+ if invalid.
+
+ redirect: Return the localized redirect tag for the site.
+ redirectRegex: Return compiled regular expression matching on redirect
+ pages.
+ mediawiki_message: Retrieve the text of a specified MediaWiki message
+ has_mediawiki_message: True if this site defines specified MediaWiki
+ message
+ has_api: True if this site's family provides api interface
+
+ shared_image_repository: Return tuple of image repositories used by this
+ site.
+ category_on_one_line: Return True if this site wants all category links
+ on one line.
+ interwiki_putfirst: Return list of language codes for ordering of
+ interwiki links.
+ linkto(title): Return string in the form of a wikilink to 'title'
+ isInterwikiLink(s): Return True if 's' is in the form of an interwiki
+ link.
+ getSite(lang): Return Site object for wiki in same family, language
+ 'lang'.
+ version: Return MediaWiki version string from Family file.
+ versionnumber: Return int identifying the MediaWiki version.
+ live_version: Return version number read from Special:Version.
+ checkCharset(charset): Warn if charset doesn't match family file.
+ server_time: returns server time (currently userclock depending)
+
+ getParsedString: Parses the string with API and returns html content.
+ getExpandedString: Expands the string with API and returns wiki content.
+
+ linktrail: Return regex for trailing chars displayed as part of a link.
+ disambcategory: Category in which disambiguation pages are listed.
+
+ Methods that yield Page objects derived from a wiki's Special: pages
+ (note, some methods yield other information in a tuple along with the
+ Pages; see method docs for details) --
+
+ search(query): query results from Special:Search
+ allpages(): Special:Allpages
+ prefixindex(): Special:Prefixindex
+ protectedpages(): Special:ProtectedPages
+ newpages(): Special:Newpages
+ newimages(): Special:Log&type=upload
+ longpages(): Special:Longpages
+ shortpages(): Special:Shortpages
+ categories(): Special:Categories (yields Category objects)
+ deadendpages(): Special:Deadendpages
+ ancientpages(): Special:Ancientpages
+ lonelypages(): Special:Lonelypages
+ recentchanges(): Special:Recentchanges
+ unwatchedpages(): Special:Unwatchedpages (sysop accounts only)
+ uncategorizedcategories(): Special:Uncategorizedcategories (yields
+ Category objects)
+ uncategorizedpages(): Special:Uncategorizedpages
+ uncategorizedimages(): Special:Uncategorizedimages (yields
+ ImagePage objects)
+ uncategorizedtemplates(): Special:UncategorizedTemplates
+ unusedcategories(): Special:Unusedcategories (yields Category)
+ unusedfiles(): Special:Unusedimages (yields ImagePage)
+ randompage: Special:Random
+ randomredirectpage: Special:RandomRedirect
+ withoutinterwiki: Special:Withoutinterwiki
+ linksearch: Special:Linksearch
+
+ Convenience methods that provide access to properties of the wiki Family
+ object; all of these are read-only and return a unicode string unless
+ noted --
+
+ encoding: The current encoding for this site.
+ encodings: List of all historical encodings for this site.
+ category_namespace: Canonical name of the Category namespace on this
+ site.
+ category_namespaces: List of all valid names for the Category
+ namespace.
+ image_namespace: Canonical name of the Image namespace on this site.
+ template_namespace: Canonical name of the Template namespace on this
+ site.
+ protocol: Protocol ('http' or 'https') for access to this site.
+ hostname: Host portion of site URL.
+ path: URL path for index.php on this Site.
+ dbName: MySQL database name.
+
+ Methods that return addresses to pages on this site (usually in
+ Special: namespace); these methods only return URL paths, they do not
+ interact with the wiki --
+
+ export_address: Special:Export.
+ query_address: URL path + '?' for query.php
+ api_address: URL path + '?' for api.php
+ apipath: URL path for api.php
+ move_address: Special:Movepage.
+ delete_address(s): Delete title 's'.
+ undelete_view_address(s): Special:Undelete for title 's'
+ undelete_address: Special:Undelete.
+ protect_address(s): Protect title 's'.
+ unprotect_address(s): Unprotect title 's'.
+ put_address(s): Submit revision to page titled 's'.
+ get_address(s): Retrieve page titled 's'.
+ nice_get_address(s): Short URL path to retrieve page titled 's'.
+ edit_address(s): Edit form for page titled 's'.
+ purge_address(s): Purge cache and retrieve page 's'.
+ block_address: Block an IP address.
+ unblock_address: Unblock an IP address.
+ blocksearch_address(s): Search for blocks on IP address 's'.
+ linksearch_address(s): Special:Linksearch for target 's'.
+ search_address(q): Special:Search for query 'q'.
+ allpages_address(s): Special:Allpages.
+ newpages_address: Special:Newpages.
+ longpages_address: Special:Longpages.
+ shortpages_address: Special:Shortpages.
+ unusedfiles_address: Special:Unusedimages.
+ categories_address: Special:Categories.
+ deadendpages_address: Special:Deadendpages.
+ ancientpages_address: Special:Ancientpages.
+ lonelypages_address: Special:Lonelypages.
+ protectedpages_address: Special:ProtectedPages
+ unwatchedpages_address: Special:Unwatchedpages.
+ uncategorizedcategories_address: Special:Uncategorizedcategories.
+ uncategorizedimages_address: Special:Uncategorizedimages.
+ uncategorizedpages_address: Special:Uncategorizedpages.
+ uncategorizedtemplates_address: Special:UncategorizedTemplates.
+ unusedcategories_address: Special:Unusedcategories.
+ withoutinterwiki_address: Special:Withoutinterwiki.
+ references_address(s): Special:Whatlinkshere for page 's'.
+ allmessages_address: Special:Allmessages.
+ upload_address: Special:Upload.
+ double_redirects_address: Special:Doubleredirects.
+ broken_redirects_address: Special:Brokenredirects.
+ random_address: Special:Random.
+ randomredirect_address: Special:Random.
+ login_address: Special:Userlogin.
+ captcha_image_address(id): Special:Captcha for image 'id'.
+ watchlist_address: Special:Watchlist editor.
+ contribs_address(target): Special:Contributions for user 'target'.
+
+ """
+
+ @deprecate_arg("persistent_http", None)
+ def __init__(self, code, fam=None, user=None):
+ """
+ @param code: the site's language code
+ @type code: str
+ @param fam: wiki family name (optional)
+ @type fam: str or Family
+ @param user: bot user name (optional)
+ @type user: str
+
+ """
+ self.__code = code.lower()
+ if isinstance(fam, basestring) or fam is None:
+ self.__family = Family(fam, fatal = False)
+ else:
+ self.__family = fam
+
+ # if we got an outdated language code, use the new one instead.
+ if self.__code in self.__family.obsolete:
+ if self.__family.obsolete[self.__code] is not None:
+ self.__code = self.__family.obsolete[self.__code]
+ else:
+ # no such language anymore
+ raise NoSuchSite("Language %s in family %s is obsolete"
+ % (self.__code, self.__family.name))
+ if self.__code not in self.languages():
+ if self.__code == 'zh-classic' \
+ and 'zh-classical' in self.languages():
+ self.__code = 'zh-classical'
+ # database hack (database is varchar[10], so zh-classical
+ # is cut to zh-classic)
+ elif self.__family.name in self.__family.langs.keys() \
+ or len(self.__family.langs) == 1:
+ self.__code = self.__family.name
+ else:
+ raise NoSuchSite("Language %s does not exist in family %s"
+ % (self.__code, self.__family.name))
+
+ self._mediawiki_messages = {}
+ self._info = {}
+ self._userName = [None, None]
+ self.nocapitalize = self.code in self.family.nocapitalize
+ self.user = user
+ self._userData = [False, False]
+ self._isLoggedIn = [None, None]
+ self._isBlocked = [None, None]
+ self._messages = [None, None]
+ self._rights = [None, None]
+ self._token = [None, None]
+ self._patrolToken = [None, None]
+ self._cookies = [None, None]
+ # Calculating valid languages took quite long, so we calculate it once
+ # in initialization instead of each time it is used.
+ self._validlanguages = []
+ for language in self.languages():
+ if not language[0].upper() + language[1:] in self.namespaces():
+ self._validlanguages.append(language)
+
+ def __call__(self):
+ """Since the Page.site() method has a property decorator, return the
+ site object for backwards-compatibility if Page.site() call is still
+ used instead of Page.site as recommended.
+
+ """
+## # DEPRECATED warning. Should be uncommented if scripts are actualized
+## pywikibot.output('Page.site() method is DEPRECATED, '
+## 'use Page.site instead.')
+ return self
+
+ @property
+ def family(self):
+ """The Family object for this Site's wiki family."""
+
+ return self.__family
+
+ @property
+ def code(self):
+ """The identifying code for this Site.
+
+ By convention, this is usually an ISO language code, but it does
+ not have to be.
+
+ """
+ return self.__code
+
+ @property
+ def lang(self):
+ """The ISO language code for this Site.
+
+ Presumed to be equal to the wiki prefix, but this can be overridden.
+
+ """
+ return self.__code
+
+ def __cmp__(self, other):
+ """Perform equality and inequality tests on Site objects."""
+
+ if not isinstance(other, Site):
+ return 1
+ if self.family.name == other.family.name:
+ return cmp(self.code ,other.code)
+ return cmp(self.family.name, other.family.name)
+
+ def _userIndex(self, sysop = False):
+ """Returns the internal index of the user."""
+ if sysop:
+ return 1
+ else:
+ return 0
+
+ def username(self, sysop = False):
+ return self._userName[self._userIndex(sysop = sysop)]
+
+ def sitename(self):
+ """Return string representing this Site's name and code."""
+
+ return self.family.name+':'+self.code
+
+ def __repr__(self):
+ return '%s:%s' % (self.family.name, self.code)
+
+ def __hash__(self):
+ return hash(repr(self))
+
+ def linktrail(self):
+ """Return regex for trailing chars displayed as part of a link.
+
+ Returns a string, not a compiled regular expression object.
+
+ This reads from the family file, and ''not'' from
+ [[MediaWiki:Linktrail]], because the MW software currently uses a
+ built-in linktrail from its message files and ignores the wiki
+ value.
+
+ """
+ return self.family.linktrail(self.code)
+
+ def languages(self):
+ """Return list of all valid language codes for this site's Family."""
+
+ return self.family.iwkeys
+
+ def validLanguageLinks(self):
+ """Return list of language codes that can be used in interwiki links."""
+ return self._validlanguages
+
+ def namespaces(self):
+ """Return list of canonical namespace names for this Site."""
+
+ # n.b.: this does not return namespace numbers; to determine which
+ # numeric namespaces the framework recognizes for this Site (which
+ # may or may not actually exist on the wiki), use
+ # self.family.namespaces.keys()
+
+ if self in _namespaceCache:
+ return _namespaceCache[self]
+ else:
+ nslist = []
+ for n in self.family.namespaces:
+ try:
+ ns = self.family.namespace(self.lang, n)
+ except KeyError:
+ # No default namespace defined
+ continue
+ if ns is not None:
+ nslist.append(self.family.namespace(self.lang, n))
+ _namespaceCache[self] = nslist
+ return nslist
+
+ def redirect(self, default=False):
+ """Return the localized redirect tag for the site.
+
+ """
+ # return the magic word without the preceding '#' character
+ if default or self.versionnumber() <= 13:
+ return u'REDIRECT'
+ else:
+ return self.getmagicwords('redirect')[0].lstrip("#")
+
+ def loggedInAs(self, sysop = False):
+ """Return the current username if logged in, otherwise return None.
+
+ Checks if we're logged in by loading a page and looking for the login
+ link. We assume that we're not being logged out during a bot run, so
+ loading the test page is only required once.
+
+ """
+ index = self._userIndex(sysop)
+ if self._isLoggedIn[index] is None:
+ # Load the details only if you don't know the login status.
+ # Don't load them just because the other details aren't known.
+ self._load(sysop = sysop)
+ if self._isLoggedIn[index]:
+ return self._userName[index]
+ else:
+ return None
+
+ def forceLogin(self, sysop = False):
+ """Log the user in if not already logged in."""
+ if not self.loggedInAs(sysop = sysop):
+ loginMan = login.LoginManager(site = self, sysop = sysop)
+ #loginMan.logout()
+ if loginMan.login(retry = True):
+ index = self._userIndex(sysop)
+ self._isLoggedIn[index] = True
+ self._userName[index] = loginMan.username
+ # We know nothing about the new user (but its name)
+ # Old info is about the anonymous user
+ self._userData[index] = False
+
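A sketch of the login helpers above, assuming credentials are configured in user-config.py:

    import wikipedia

    site = wikipedia.getSite()
    if not site.loggedInAs():
        site.forceLogin()               # logs in with the username from user-config.py
    wikipedia.output(u'Logged in as %s' % site.loggedInAs())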
+ def checkBlocks(self, sysop = False):
+ """Check if the user is blocked, and raise an exception if so."""
+ self._load(sysop = sysop)
+ index = self._userIndex(sysop)
+ if self._isBlocked[index]:
+ # User blocked
+ raise UserBlocked('User is blocked in site %s' % self)
+
+ def isBlocked(self, sysop = False):
+ """Check if the user is blocked."""
+ self._load(sysop = sysop)
+ index = self._userIndex(sysop)
+ if self._isBlocked[index]:
+ # User blocked
+ return True
+ else:
+ return False
+
+ def _getBlock(self, sysop = False):
+ """Get user block data from the API."""
+ try:
+ params = {
+ 'action': 'query',
+ 'meta': 'userinfo',
+ 'uiprop': 'blockinfo',
+ }
+ data = query.GetData(params, self)
+ if not data or 'error' in data:
+ return False
+ if self.versionnumber() == 11: # fix for version 1.11 API.
+ data = data['userinfo']
+ else:
+ data = data['query']['userinfo']
+ return 'blockedby' in data
+ except NotImplementedError:
+ return False
+
+ def isAllowed(self, right, sysop = False):
+ """Check if the user has a specific right.
+ Among possible rights:
+ * Actions: edit, move, delete, protect, upload
+ * User levels: autoconfirmed, sysop, bot, empty string (always true)
+ """
+ if right == '' or right is None:
+ return True
+ else:
+ self._load(sysop = sysop)
+ index = self._userIndex(sysop)
+ # Handle obsolete editusercssjs permission
+ if right in ['editusercss', 'edituserjs'] \
+ and right not in self._rights[index]:
+ return 'editusercssjs' in self._rights[index]
+ return right in self._rights[index]
+
+ def server_time(self):
+ """Return a datetime object representing the server time."""
+ # It currently depends on the user's clock.
+ return self.family.server_time()
+
+ def messages(self, sysop = False):
+ """Returns true if the user has new messages, and false otherwise."""
+ self._load(sysop = sysop)
+ index = self._userIndex(sysop)
+ return self._messages[index]
+
+ def cookies(self, sysop = False):
+ """Return a string containing the user's current cookies."""
+ self._loadCookies(sysop = sysop)
+ index = self._userIndex(sysop)
+ if self._cookies[index]:
+ #convert cookies dictionary data to string.
+ outputDatas = ""
+ for k, v in self._cookies[index].iteritems():
+ if v:
+ outputDatas += "%s=%s; " % (k,v)
+ else:
+ # protection for value ''
+ outputDatas += "%s=none; " % k
+ return outputDatas
+ else:
+ return None
+
+ def _loadCookies(self, sysop = False):
+ """
+ Retrieve session cookies for login.
+ If the family data define cross projects, this function will also look for
+ a central login file written by this site or an available cross-project
+ site, and will read that cookie data if one of them exists.
+ """
+ index = self._userIndex(sysop)
+ if self._cookies[index] is not None:
+ return
+ try:
+ if sysop:
+ try:
+ username = config.sysopnames[self.family.name][self.lang]
+ except KeyError:
+ raise NoUsername("""\
+You tried to perform an action that requires admin privileges, but you haven't
+entered your sysop name in your user-config.py. Please add
+sysopnames['%s']['%s']='name' to your user-config.py"""
+ % (self.family.name, self.lang))
+ else:
+ username = config.usernames[self.family.name][self.lang]
+ except KeyError:
+ self._cookies[index] = None
+ self._isLoggedIn[index] = False
+ else:
+ # check central login data if cross_projects is available.
+ localFn = '%s-%s-%s-login.data' % (self.family.name, self.lang, username)
+ localPa = config.datafilepath('login-data', localFn)
+ if self.family.cross_projects:
+ for proj in [self.family.name] + self.family.cross_projects:
+ #find all central data in all cross_projects
+ centralFn = '%s-%s-central-login.data' % (proj, username)
+ centralPa = config.datafilepath('login-data', centralFn)
+ if os.path.exists(centralPa):
+ self._cookies[index] = self._readCookies(centralFn)
+ break
+
+ if os.path.exists(localPa):
+ #read and dump local logindata into self._cookies[index]
+ # if self._cookies[index] is not available, read the local data and set the dictionary.
+ if type(self._cookies[index]) == dict:
+ for k, v in self._readCookies(localFn).iteritems():
+ if k not in self._cookies[index]:
+ self._cookies[index][k] = v
+ else:
+ self._cookies[index] = dict([(k,v) for k,v in self._readCookies(localFn).iteritems()])
+ #self._cookies[index] = query.CombineParams(self._cookies[index], self._readCookies(localFn))
+ elif not os.path.exists(localPa) and not self.family.cross_projects:
+ #keep anonymous mode if not login and centralauth not enable
+ self._cookies[index] = None
+ self._isLoggedIn[index] = False
+
+ def _readCookies(self, filename):
+ """read login cookie file and return a dictionary."""
+ try:
+ f = open( config.datafilepath('login-data', filename), 'r')
+ ck = re.compile("(.*?)=(.*?)\r?\n")
+ data = dict([(x[0],x[1]) for x in ck.findall(f.read())])
+ #data = dict(ck.findall(f.read()))
+ f.close()
+ return data
+ except IOError:
+ return None
+
+ def _setupCookies(self, datas, sysop = False):
+ """Save the cookie dictionary to files.
+ If cross_project is enabled, the data are saved to two separate files: central data and local data.
+ """
+ index = self._userIndex(sysop)
+ if not self._cookies[index]:
+ self._cookies[index] = datas
+ cache = {0:"",1:""} #0 is central auth, 1 is local.
+ if not self.username(sysop):
+ if not self._cookies[index]:
+ return
+ elif self.family.cross_projects_cookie_username in self._cookies[index]:
+ # When centralauth provides cross-project login data, forceLogin is not
+ # necessary, but Site() does not know about it.
+ # So we need to add the centralauth username data to the site attributes.
+ self._userName[index] = self._cookies[index][self.family.cross_projects_cookie_username]
+
+
+ for k, v in datas.iteritems():
+ #put key and values into save cache
+ if self.family.cross_projects and k in self.family.cross_projects_cookies:
+ cache[0] += "%s=%s\n" % (k,v)
+ else:
+ cache[1] += "%s=%s\n" % (k,v)
+
+ # write the data.
+ if self.family.cross_projects and cache[0]:
+ filename = '%s-%s-central-login.data' % (self.family.name, self.username(sysop))
+ f = open(config.datafilepath('login-data', filename), 'w')
+ f.write(cache[0])
+ f.close()
+
+ filename = '%s-%s-%s-login.data' % (self.family.name, self.lang, self.username(sysop))
+ f = open(config.datafilepath('login-data', filename), 'w')
+ f.write(cache[1])
+ f.close()
+
+ def _removeCookies(self, name):
+ # remove cookies.
+ # ToDo: remove all local datas if cross_projects enable.
+ #
+ if self.family.cross_projects:
+ file = config.datafilepath('login-data', '%s-%s-central-login.data' % (self.family.name, name))
+ if os.path.exists(file):
+ os.remove( file )
+ file = config.datafilepath('login-data', '%s-%s-%s-login.data' % (self.family.name, self.lang, name))
+ if os.path.exists(file):
+ os.remove(file)
+
+ def updateCookies(self, datas, sysop = False):
+ """Check and update the current cookie data and save it back to the files."""
+ index = self._userIndex(sysop)
+ if not self._cookies[index]:
+ self._setupCookies(datas, sysop)
+
+ for k, v in datas.iteritems():
+ if k in self._cookies[index]:
+ if v != self._cookies[index][k]:
+ self._cookies[index][k] = v
+ else:
+ self._cookies[index][k] = v
+
+ self._setupCookies(self._cookies[index], sysop)
+
+ def urlEncode(self, query):
+ """Encode a query so that it can be sent using an http POST request."""
+ if not query:
+ return None
+ if hasattr(query, 'iteritems'):
+ iterator = query.iteritems()
+ else:
+ iterator = iter(query)
+ l = []
+ wpEditToken = None
+ for key, value in iterator:
+ if isinstance(key, unicode):
+ key = key.encode('utf-8')
+ if isinstance(value, unicode):
+ value = value.encode('utf-8')
+ key = urllib.quote(key)
+ value = urllib.quote(value)
+ if key == 'wpEditToken':
+ wpEditToken = value
+ continue
+ l.append(key + '=' + value)
+
+ # wpEditToken is explicitly added as last value.
+ # If a premature connection abort occurs while putting, the server will
+ # not have received an edit token and thus refuse saving the page
+ if wpEditToken is not None:
+ l.append('wpEditToken=' + wpEditToken)
+ return '&'.join(l)
+
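A sketch of urlEncode(); the form fields and token are hypothetical, and the output order of the other fields is not guaranteed, but wpEditToken is always emitted last:

    import wikipedia

    site = wikipedia.getSite()
    predata = {
        'wpEditToken': 'abc123+\\',     # hypothetical token, deliberately listed first
        'wpTextbox1': u'Some text',
        'action': 'submit',
    }
    print site.urlEncode(predata)
    # e.g. wpTextbox1=Some%20text&action=submit&wpEditToken=abc123%2B%5C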
+ def solveCaptcha(self, data):
+ if type(data) == dict: # API Mode result
+ if 'edit' in data and data['edit']['result'] != u"Success":
+ data = data['edit']
+ if "captcha" in data:
+ data = data['captcha']
+ captype = data['type']
+ id = data['id']
+ if captype in ['simple', 'math', 'question']:
+ answer = input('What is the answer to the captcha "%s" ?' % data['question'])
+ elif captype == 'image':
+ url = self.protocol() + '://' + self.hostname() + self.captcha_image_address(id)
+ answer = ui.askForCaptcha(url)
+ else: #no captcha id result, maybe ReCaptcha.
+ raise CaptchaError('We have been prompted for a ReCaptcha, but pywikipedia does not yet support ReCaptchas')
+ return {'id':id, 'answer':answer}
+ return None
+ else:
+ captchaW = re.compile('<label for="wpCaptchaWord">(?P<question>[^<]*)</label>')
+ captchaR = re.compile('<input type="hidden" name="wpCaptchaId" id="wpCaptchaId" value="(?P<id>\d+)" />')
+ match = captchaR.search(data)
+ if match:
+ id = match.group('id')
+ match = captchaW.search(data)
+ if match:
+ answer = input('What is the answer to the captcha "%s" ?' % match.group('question'))
+ else:
+ if not config.solve_captcha:
+ raise CaptchaError(id)
+ url = self.protocol() + '://' + self.hostname() + self.captcha_image_address(id)
+ answer = ui.askForCaptcha(url)
+ return {'id':id, 'answer':answer}
+ Recaptcha = re.compile('<script type="text/javascript"
src="http://api\.recaptcha\.net/[^"]*"></script>…)
+ if Recaptcha.search(data):
+ raise CaptchaError('We have been prompted for a ReCaptcha, but pywikipedia does not yet support ReCaptchas')
+ return None
+
+ def postForm(self, address, predata, sysop = False, cookies = None):
+ """Post http form data to the given address at this site.
+
+ address - the absolute path without hostname.
+ predata - a dict or any iterable that can be converted to a dict,
+ containing keys and values for the http form.
+ cookies - the cookies to send with the form. If None, send self.cookies
+
+ Return a (response, data) tuple, where response is the HTTP
+ response object and data is a Unicode string containing the
+ body of the response.
+
+ """
+ if ('action' in predata) and pywikibot.simulate and \
+ (predata['action'] in pywikibot.config.actions_to_block) and \
+ (address not in [self.export_address()]):
+ pywikibot.output(u'\03{lightyellow}SIMULATION: %s action blocked.\03{default}' % \
+ predata['action'])
+ import StringIO
+ f_dummy = StringIO.StringIO()
+ f_dummy.__dict__.update({u'code': 0, u'msg': u''})
+ return f_dummy, u''
+
+ data = self.urlEncode(predata)
+ try:
+ if cookies:
+ return self.postData(address, data, sysop=sysop,
+ cookies=cookies)
+ else:
+ return self.postData(address, data, sysop=sysop,
+ cookies=self.cookies(sysop = sysop))
+ except socket.error, e:
+ raise ServerError(e)
+
+ def postData(self, address, data,
+ contentType = 'application/x-www-form-urlencoded',
+ sysop = False, compress = True, cookies = None):
+ """Post encoded data to the given http address at this site.
+
+ address is the absolute path without hostname.
+ data is an ASCII string that has been URL-encoded.
+
+ Returns a (response, data) tuple where response is the HTTP
+ response object and data is a Unicode string containing the
+ body of the response.
+ """
+
+ if address[-1] == "?":
+ address = address[:-1]
+
+ headers = {
+ 'User-agent': useragent,
+ 'Content-Length': str(len(data)),
+ 'Content-type':contentType,
+ }
+ if cookies:
+ headers['Cookie'] = cookies
+
+ if compress:
+ headers['Accept-encoding'] = 'gzip'
+ #print '%s' % headers
+
+ url = '%s://%s%s' % (self.protocol(), self.hostname(), address)
+ # Try to retrieve the page until it was successfully loaded (just in
+ # case the server is down or overloaded).
+ # Wait for retry_idle_time minutes (growing!) between retries.
+ retry_idle_time = 1
+ retry_attempt = 0
+ while True:
+ try:
+ request = urllib2.Request(url, data, headers)
+ f = MyURLopener.open(request)
+
+ # read & info can raise socket.error
+ text = f.read()
+ headers = f.info()
+ break
+ except KeyboardInterrupt:
+ raise
+ except urllib2.HTTPError, e:
+ if e.code in [401, 404]:
+                    raise PageNotFound(u'Page %s could not be retrieved. Check your family file ?' % url)
+ # just check for HTTP Status 500 (Internal Server Error)?
+ elif e.code in [500, 502, 504]:
+ output(u'HTTPError: %s %s' % (e.code, e.msg))
+ if config.retry_on_fail:
+ retry_attempt += 1
+ if retry_attempt > config.maxretries:
+ raise MaxTriesExceededError()
+                        output(u"WARNING: Could not open '%s'.\nMaybe the server is down. Retrying in %i minutes..."
+ % (url, retry_idle_time))
+ time.sleep(retry_idle_time * 60)
+ # Next time wait longer, but not longer than half an hour
+ retry_idle_time *= 2
+ if retry_idle_time > 30:
+ retry_idle_time = 30
+ continue
+ raise
+ else:
+ output(u"Result: %s %s" % (e.code, e.msg))
+ raise
+ except Exception, e:
+ output(u'%s' %e)
+ if pywikibot.verbose:
+ import traceback
+ traceback.print_exc()
+
+ if config.retry_on_fail:
+ retry_attempt += 1
+ if retry_attempt > config.maxretries:
+ raise MaxTriesExceededError()
+                    output(u"WARNING: Could not open '%s'. Maybe the server or\n your connection is down. Retrying in %i minutes..."
+ % (url, retry_idle_time))
+ time.sleep(retry_idle_time * 60)
+ retry_idle_time *= 2
+ if retry_idle_time > 30:
+ retry_idle_time = 30
+ continue
+ raise
+
+ # check cookies return or not, if return, send its to update.
+ if hasattr(f, 'sheaders'):
+ ck = f.sheaders
+ else:
+ ck = f.info().getallmatchingheaders('set-cookie')
+ if ck:
+ Reat=re.compile(': (.*?)=(.*?);')
+ tmpc = {}
+ for d in ck:
+ m = Reat.search(d)
+ if m: tmpc[m.group(1)] = m.group(2)
+ if self.cookies(sysop):
+ self.updateCookies(tmpc, sysop)
+
+ resContentType = headers.get('content-type', '')
+ contentEncoding = headers.get('content-encoding', '')
+
+ # Ensure that all sent data is received
+ # In rare cases we found a douple Content-Length in the header.
+ # We need to split it to get a value
+        content_length = int(headers.get('content-length', '0').split(',')[0])
+ if content_length != len(text) and 'content-length' in headers:
+ output(
+ u'Warning! len(text) does not match content-length: %s != %s'
+ % (len(text), content_length))
+ return self.postData(address, data, contentType, sysop, compress,
+ cookies)
+
+ if compress and contentEncoding == 'gzip':
+ text = decompress_gzip(text)
+
+ R = re.compile('charset=([^\'\";]+)')
+ m = R.search(resContentType)
+ if m:
+ charset = m.group(1)
+ else:
+ if verbose:
+ output(u"WARNING: No character set found.")
+ # UTF-8 as default
+ charset = 'utf-8'
+ # Check if this is the charset we expected
+ self.checkCharset(charset)
+ # Convert HTML to Unicode
+ try:
+ text = unicode(text, charset, errors = 'strict')
+ except UnicodeDecodeError, e:
+ print e
+ output(u'ERROR: Invalid characters found on %s://%s%s, replaced by
\\ufffd.'
+ % (self.protocol(), self.hostname(), address))
+ # We use error='replace' in case of bad encoding.
+ text = unicode(text, charset, errors = 'replace')
+
+ # If a wiki page, get user data
+ self._getUserDataOld(text, sysop = sysop)
+
+ return f, text
+
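A small usage sketch (editorial, assuming an existing Site object `site` and that urlEncode accepts a plain dict of form fields): urlEncode builds the application/x-www-form-urlencoded body that postData expects, and postData returns the response object plus the decoded body:

    body = site.urlEncode({'title': 'Sandbox', 'action': 'edit'})
    response, text = site.postData(site.path(), body)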
+    #@deprecated("pywikibot.comms.http.request") # in 'trunk' not yet...
+ def getUrl(self, path, retry = None, sysop = False, data = None, compress = True,
+ no_hostname = False, cookie_only=False, refer=None, back_response=False):
+ """
+ Low-level routine to get a URL from the wiki. Tries to login if it is
+ another wiki.
+
+ Parameters:
+ path - The absolute path, without the hostname.
+ retry - If True, retries loading the page when a network error
+ occurs.
+ sysop - If True, the sysop account's cookie will be used.
+ data - An optional dict providing extra post request parameters.
+ cookie_only - Only return the cookie the server sent us back
+
+ Returns the HTML text of the page converted to unicode.
+ """
+ from pywikibot.comms import http
+
+ f, text = http.request(self, path, retry, sysop, data, compress,
+ no_hostname, cookie_only, refer, back_response = True)
+
+ # If a wiki page, get user data
+ self._getUserDataOld(text, sysop = sysop)
+
+ if back_response:
+ return f, text
+
+ return text
+
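For illustration (not part of the file), two ways getUrl is typically called, assuming a Site object `site`:

    # plain fetch: returns the page HTML as unicode
    html = site.getUrl(site.edit_address('Sandbox'))
    # with back_response=True the HTTP response object is returned as well
    response, html = site.getUrl(site.edit_address('Sandbox'), back_response=True)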
+ def _getUserData(self, text, sysop = False, force = True):
+ """
+ Get the user data from an API query dict.
+
+ Parameters:
+ * text - the page text
+ * sysop - is the user a sysop?
+ """
+
+ index = self._userIndex(sysop)
+ # Check for blocks
+
+ if 'blockedby' in text and not self._isBlocked[index]:
+ # Write a warning if not shown earlier
+ if sysop:
+ account = 'Your sysop account'
+ else:
+ account = 'Your account'
+            output(u'\nWARNING: %s on %s is blocked by %s.\nReason: %s\nEditing using this account will stop the run.\n'
+                   % (account, self, text['blockedby'], text['blockreason']))
+ self._isBlocked[index] = 'blockedby' in text
+
+ # Check for new messages, the data must had key 'messages' in dict.
+ if 'messages' in text:
+ if not self._messages[index]:
+ # User has *new* messages
+ if sysop:
+                    output(u'NOTE: You have new messages in your sysop account on %s' % self)
+ else:
+ output(u'NOTE: You have new messages on %s' % self)
+ self._messages[index] = True
+ else:
+ self._messages[index] = False
+
+ # Don't perform other checks if the data was already loaded
+ if self._userData[index] and not force:
+ return
+
+ # Get username.
+ # The data in anonymous mode had key 'anon'
+ # if 'anon' exist, username is IP address, not to collect it right now
+ if not 'anon' in text:
+ self._isLoggedIn[index] = True
+ self._userName[index] = text['name']
+ else:
+ self._isLoggedIn[index] = False
+ self._userName[index] = None
+
+ # Get user groups and rights
+ if 'groups' in text:
+ self._rights[index] = []
+ for group in text['groups']:
+ # Convert dictionaries to list items (bug 3311663)
+ if isinstance(group, dict):
+ self._rights[index].extend(group.keys())
+ else:
+ self._rights[index].append(group)
+ self._rights[index].extend(text['rights'])
+ # Warnings
+ # Don't show warnings for not logged in users, they will just fail to
+ # do any action
+ if self._isLoggedIn[index]:
+                if 'bot' not in self._rights[index] and config.notify_unflagged_bot:
+                    # Sysop + bot flag = Sysop flag in MediaWiki < 1.7.1?
+                    if sysop:
+                        output(u'Note: Your sysop account on %s does not have a bot flag. Its edits will be visible in the recent changes.' % self)
+                    else:
+                        output(u'WARNING: Your account on %s does not have a bot flag. Its edits will be visible in the recent changes and it may get blocked.' % self)
+                if sysop and 'sysop' not in self._rights[index]:
+                    output(u'WARNING: Your sysop account on %s does not seem to have sysop rights. You may not be able to perform any sysop-restricted actions using it.' % self)
+ else:
+ # 'groups' is not exists, set default rights
+ self._rights[index] = []
+ if self._isLoggedIn[index]:
+ # Logged in user
+ self._rights[index].append('user')
+ # Assume bot, and thus autoconfirmed
+ self._rights[index].extend(['bot', 'autoconfirmed'])
+ if sysop:
+ # Assume user reported as a sysop indeed has the sysop rights
+ self._rights[index].append('sysop')
+ # Assume the user has the default rights if API not query back
+                self._rights[index].extend(['read', 'createaccount', 'edit', 'upload', 'createpage', 'createtalk', 'move', 'upload'])
+ #remove Duplicate rights
+ self._rights[index] = list(set(self._rights[index]))
+
+ # Get token
+ if 'preferencestoken' in text:
+ self._token[index] = text['preferencestoken']
+ if self._rights[index] is not None:
+ # Token and rights are loaded - user data is now loaded
+ self._userData[index] = True
+ elif self.versionnumber() < 14:
+            # uiprop 'preferencestoken' is start from 1.14, if 1.8~13, we need to use other way to get token
+ params = {
+ 'action': 'query',
+ 'prop': 'info',
+ 'titles':'Non-existing page',
+ 'intoken': 'edit',
+ }
+            data = query.GetData(params, self, sysop=sysop)['query']['pages'].values()[0]
+ if 'edittoken' in data:
+ self._token[index] = data['edittoken']
+ self._userData[index] = True
+ else:
+                output(u'WARNING: Token not found on %s. You will not be able to edit any page.' % self)
+        else:
+            if not self._isBlocked[index]:
+                output(u'WARNING: Token not found on %s. You will not be able to edit any page.' % self)
+
+ def _getUserDataOld(self, text, sysop = False, force = True):
+ """
+ Get the user data from a wiki page data.
+
+ Parameters:
+ * text - the page text
+ * sysop - is the user a sysop?
+ """
+
+ index = self._userIndex(sysop)
+
+ if '<div id="globalWrapper">' not in text:
+ # Not a wiki page
+ return
+ # Check for blocks - but only if version is 1.11 (userinfo is available)
+ # and the user data was not yet loaded
+ if self.versionnumber() >= 11 and (not self._userData[index] or force):
+ blocked = self._getBlock(sysop = sysop)
+ if blocked and not self._isBlocked[index]:
+ # Write a warning if not shown earlier
+ if sysop:
+ account = 'Your sysop account'
+ else:
+ account = 'Your account'
+                output(u'WARNING: %s on %s is blocked. Editing using this account will stop the run.' % (account, self))
+ self._isBlocked[index] = blocked
+
+ # Check for new messages
+ if '<div class="usermessage">' in text:
+ if not self._messages[index]:
+ # User has *new* messages
+ if sysop:
+                    output(u'NOTE: You have new messages in your sysop account on %s' % self)
+ else:
+ output(u'NOTE: You have new messages on %s' % self)
+ self._messages[index] = True
+ else:
+ self._messages[index] = False
+ # Don't perform other checks if the data was already loaded
+ if self._userData[index] and not force:
+ return
+
+ # Search for the the user page link at the top.
+ # Note that the link of anonymous users (which doesn't exist at all
+ # in Wikimedia sites) has the ID pt-anonuserpage, and thus won't be
+ # found here.
+        userpageR = re.compile('<li id="pt-userpage".*?><a href=".+?".*?>(?P<username>.+?)</a></li>')
+ m = userpageR.search(text)
+ if m:
+ self._isLoggedIn[index] = True
+ self._userName[index] = m.group('username')
+ else:
+ self._isLoggedIn[index] = False
+ # No idea what is the user name, and it isn't important
+ self._userName[index] = None
+
+        if self.family.name == 'wikitravel': # fix for Wikitravel's user page link.
+ self = self.family.user_page_link(self,index)
+
+ # Check user groups, if possible (introduced in 1.10)
+ groupsR = re.compile(r'var wgUserGroups = \[\"(.+)\"\];')
+ m = groupsR.search(text)
+ checkLocal = True
+        if default_code in self.family.cross_allowed: # if current languages in cross allowed list, check global bot flag.
+            globalgroupsR = re.compile(r'var wgGlobalGroups = \[\"(.+)\"\];')
+            mg = globalgroupsR.search(text)
+            if mg: # the account had global permission
+                globalRights = mg.group(1)
+                globalRights = globalRights.split('","')
+                self._rights[index] = globalRights
+                if self._isLoggedIn[index]:
+                    if 'Global_bot' in globalRights: # This account has the global bot flag, no need to check local flags.
+                        checkLocal = False
+                    else:
+                        output(u'Your bot account does not have global the bot flag, checking local flag.')
+ else:
+                if verbose: output(u'Note: this language does not allow global bots.')
+ if m and checkLocal:
+ rights = m.group(1)
+ rights = rights.split('", "')
+ if '*' in rights:
+ rights.remove('*')
+ self._rights[index] = rights
+ # Warnings
+ # Don't show warnings for not logged in users, they will just fail to
+ # do any action
+ if self._isLoggedIn[index]:
+                if 'bot' not in self._rights[index] and config.notify_unflagged_bot:
+                    # Sysop + bot flag = Sysop flag in MediaWiki < 1.7.1?
+                    if sysop:
+                        output(u'Note: Your sysop account on %s does not have a bot flag. Its edits will be visible in the recent changes.' % self)
+                    else:
+                        output(u'WARNING: Your account on %s does not have a bot flag. Its edits will be visible in the recent changes and it may get blocked.' % self)
+                if sysop and 'sysop' not in self._rights[index]:
+                    output(u'WARNING: Your sysop account on %s does not seem to have sysop rights. You may not be able to perform any sysop-restricted actions using it.' % self)
+ else:
+ # We don't have wgUserGroups, and can't check the rights
+ self._rights[index] = []
+ if self._isLoggedIn[index]:
+ # Logged in user
+ self._rights[index].append('user')
+ # Assume bot, and thus autoconfirmed
+ self._rights[index].extend(['bot', 'autoconfirmed'])
+ if sysop:
+ # Assume user reported as a sysop indeed has the sysop rights
+ self._rights[index].append('sysop')
+ # Assume the user has the default rights
+                self._rights[index].extend(['read', 'createaccount', 'edit', 'upload', 'createpage', 'createtalk', 'move', 'upload'])
+            if 'bot' in self._rights[index] or 'sysop' in self._rights[index]:
+                self._rights[index].append('apihighlimits')
+            if 'sysop' in self._rights[index]:
+                self._rights[index].extend(['delete', 'undelete', 'block', 'protect', 'import', 'deletedhistory', 'unwatchedpages'])
+
+ # Search for a token
+        tokenR = re.compile(r"\<input type='hidden' value=\"(.*?)\" name=\"wpEditToken\"")
+ tokenloc = tokenR.search(text)
+ if tokenloc:
+ self._token[index] = tokenloc.group(1)
+ if self._rights[index] is not None:
+ # In this case, token and rights are loaded - user data is now loaded
+ self._userData[index] = True
+ else:
+ # Token not found
+ # Possible reason for this is the user is blocked, don't show a
+ # warning in this case, otherwise do show a warning
+ # Another possible reason is that the page cannot be edited - ensure
+ # there is a textarea and the tab "view source" is not shown
+            if u'<textarea' in text and u'<li id="ca-viewsource"' not in text and not self._isBlocked[index]:
+                # Token not found
+                output(u'WARNING: Token not found on %s. You will not be able to edit any page.' % self)
+
+ def siteinfo(self, key = 'general', force = False, dump = False):
+ """Get Mediawiki Site informations by API
+ dump - return all siteinfo datas
+
+        some siprop params is huge data for MediaWiki, they take long times to read by testment.
+ these params could get, but only one by one.
+
+ """
+ # protection for key in other datatype
+ if type(key) not in [str, unicode]:
+ key = 'general'
+
+ if self._info and key in self._info and not force:
+ if dump:
+ return self._info
+ else:
+ return self._info[key]
+
+ params = {
+ 'action':'query',
+ 'meta':'siteinfo',
+ 'siprop':['general', 'namespaces', ],
+ }
+ #ver 1.10 handle
+ if self.versionnumber() > 10:
+ params['siprop'].extend(['statistics', ])
+        if key in ['specialpagealiases', 'interwikimap', 'namespacealiases', 'usergroups', ]:
+ if verbose: print 'getting huge siprop %s...' % key
+ params['siprop'] = [key]
+
+ #ver 1.13 handle
+ if self.versionnumber() > 13:
+            if key not in ['specialpagealiases', 'interwikimap', 'namespacealiases', 'usergroups', ]:
+                params['siprop'].extend(['fileextensions', 'rightsinfo', ])
+ if key in ['magicwords', 'extensions', ]:
+ if verbose: print 'getting huge siprop %s...' % key
+ params['siprop'] = [key]
+ try:
+ data = query.GetData(params, self)['query']
+ except NotImplementedError:
+ return None
+
+ if not hasattr(self, '_info'):
+ self._info = data
+ else:
+ if key == 'magicwords':
+ if self.versionnumber() <= 13:
+ return None #Not implemented
+ self._info[key]={}
+ for entry in data[key]:
+ self._info[key][entry['name']] = entry['aliases']
+ else:
+ for k, v in data.iteritems():
+ self._info[k] = v
+ #data pre-process
+ if dump:
+ return self._info
+ else:
+ return self._info.get(key)
+
+ def mediawiki_message(self, key, forceReload = False):
+        """Return the MediaWiki message text for key "key" """
+ # Allmessages is retrieved once for all per created Site object
+ if (not self._mediawiki_messages) or forceReload:
+ api = self.has_api()
+ if verbose:
+                output(u"Retrieving mediawiki messages from Special:Allmessages")
+            # Only MediaWiki r27393/1.12 and higher support XML output for Special:Allmessages
+ if self.versionnumber() < 12:
+ usePHP = True
+ else:
+ usePHP = False
+ elementtree = True
+ try:
+ try:
+ from xml.etree.cElementTree import XML # 2.5
+ except ImportError:
+ try:
+ from cElementTree import XML
+ except ImportError:
+ from elementtree.ElementTree import XML
+ except ImportError:
+ if verbose:
+                    output(u'Elementtree was not found, using BeautifulSoup instead')
+ elementtree = False
+
+ if config.use_diskcache and not api:
+ import diskcache
+                _dict = lambda x : diskcache.CachedReadOnlyDictI(x, prefix = "msg-%s-%s-" % (self.family.name, self.lang))
+ else:
+ _dict = dict
+
+ retry_idle_time = 1
+ retry_attempt = 0
+ while True:
+                if api and self.versionnumber() >= 12 or self.versionnumber() >= 16:
+ params = {
+ 'action': 'query',
+ 'meta': 'allmessages',
+ 'ammessages': key,
+ }
+                    datas = query.GetData(params, self)['query']['allmessages'][0]
+                    if "missing" in datas:
+                        raise KeyError("message is not exist.")
+                    elif datas['name'] not in self._mediawiki_messages:
+                        self._mediawiki_messages[datas['name']] = datas['*']
+                    #self._mediawiki_messages = _dict([(tag['name'].lower(), tag['*'])
+                    #        for tag in datas if not 'missing' in tag])
+                elif usePHP:
+                    phppage = self.getUrl(self.get_address("Special:Allmessages") + "&ot=php")
+                    Rphpvals = re.compile(r"(?ms)'([^']*)' => '(.*?[^\\])',")
+                    # Previous regexp don't match empty messages. Fast workaround...
+                    phppage = re.sub("(?m)^('.*?' =>) '',", r"\1 ' ',", phppage)
+                    self._mediawiki_messages = _dict([(name.strip().lower(),
+                                                       html2unicode(message.replace("\\'", "'")))
+                                                      for (name, message) in Rphpvals.findall(phppage)])
+                else:
+                    xml = self.getUrl(self.get_address("Special:Allmessages") + "&ot=xml")
+ # xml structure is :
+ # <messages lang="fr">
+ # <message name="about">À propos</message>
+ # ...
+ # </messages>
+ if elementtree:
+ decode = xml.encode(self.encoding())
+
+ # Skip extraneous data such as PHP warning or extra
+ # whitespaces added from some MediaWiki extensions
+ xml_dcl_pos = decode.find('<?xml')
+ if xml_dcl_pos > 0:
+ decode = decode[xml_dcl_pos:]
+
+ tree = XML(decode)
+                        self._mediawiki_messages = _dict([(tag.get('name').lower(), tag.text)
+                               for tag in tree.getiterator('message')])
+                    else:
+                        tree = BeautifulStoneSoup(xml)
+                        self._mediawiki_messages = _dict([(tag.get('name').lower(), html2unicode(tag.string))
+                               for tag in tree.findAll('message') if tag.string])
+
+ if not self._mediawiki_messages:
+ # No messages could be added.
+ # We assume that the server is down.
+ # Wait some time, then try again.
+                    output(u'WARNING: No messages found in Special:Allmessages. Maybe the server is down. Retrying in %i minutes...' % retry_idle_time)
+ time.sleep(retry_idle_time * 60)
+ # Next time wait longer, but not longer than half an hour
+ retry_attempt += 1
+ if retry_attempt > config.maxretries:
+ raise ServerError()
+ retry_idle_time *= 2
+ if retry_idle_time > 30:
+ retry_idle_time = 30
+ continue
+ break
+
+        if self.family.name == 'wikitravel': # fix for Wikitravel's mediawiki message setting
+ self = self.family.mediawiki_message(self)
+
+ key = key.lower()
+ try:
+ return self._mediawiki_messages[key]
+ except KeyError:
+ if not forceReload:
+ return self.mediawiki_message(key, True)
+ else:
+                raise KeyError("MediaWiki key '%s' does not exist on %s" % (key, self))
+
+ def has_mediawiki_message(self, key):
+        """Return True if this site defines a MediaWiki message for 'key'."""
+ #return key in self._mediawiki_messages
+ try:
+ v = self.mediawiki_message(key)
+ return True
+ except KeyError:
+ return False
+
+ def has_api(self):
+        """Return True if this sites family has api interface."""
+ try:
+ if config.use_api:
+ x = self.apipath()
+ del x
+ return True
+ except NotImplementedError:
+ pass
+ return False
+
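A short sketch (editorial) of the guard pattern this class uses around has_api(), where API code paths are only taken when api.php is configured and the MediaWiki version is recent enough; `site` is an assumed Site object:

    if site.has_api() and site.versionnumber() >= 11:
        pass  # query api.php
    else:
        pass  # fall back to screen-scraping the special pages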
+ def _load(self, sysop = False, force = False):
+ """
+ Loads user data.
+ This is only done if we didn't do get any page yet and the information
+ is requested, otherwise we should already have this data.
+
+ Parameters:
+ * sysop - Get sysop user data?
+ """
+ index = self._userIndex(sysop)
+ if self._userData[index] and not force:
+ return
+ if verbose:
+ output(u'Getting information for site %s' % self)
+
+ # Get data
+ # API Userinfo is available from version 1.11
+ # preferencetoken available from 1.14
+ if self.has_api() and self.versionnumber() >= 11:
+ #Query userinfo
+ params = {
+ 'action': 'query',
+ 'meta': 'userinfo',
+                'uiprop': ['blockinfo','groups','rights','hasmsg'],
+ }
+ if self.versionnumber() >= 12:
+ params['uiprop'].append('ratelimits')
+ if self.versionnumber() >= 14:
+ params['uiprop'].append('preferencestoken')
+
+ data = query.GetData(params, self, sysop=sysop)
+
+ # Show the API error code instead making an index error
+ if 'error' in data:
+ raise RuntimeError('%s' % data['error'])
+
+ if self.versionnumber() == 11:
+ text = data['userinfo']
+ else:
+ text = data['query']['userinfo']
+
+ self._getUserData(text, sysop = sysop, force = force)
+ else:
+ url = self.edit_address('Non-existing_page')
+ text = self.getUrl(url, sysop = sysop)
+
+ self._getUserDataOld(text, sysop = sysop, force = force)
+
+ def search(self, key, number=10, namespaces=None):
+ """
+ Yield search results for query.
+ Use API when enabled use_api and version >= 1.11,
+ or use Special:Search.
+ """
+ if self.has_api() and self.versionnumber() >= 11:
+ #Yield search results (using api) for query.
+ params = {
+ 'action': 'query',
+ 'list': 'search',
+ 'srsearch': key,
+ }
+ if number:
+ params['srlimit'] = number
+ if namespaces:
+ params['srnamespace'] = namespaces
+
+ offset = 0
+ while offset < number or not number:
+ params['sroffset'] = offset
+ data = query.GetData(params, self)
+ if 'error'in data:
+                    raise NotImplementedError('%s' % data['error']['info'])
+ data = data['query']
+ if 'error' in data:
+ raise RuntimeError('%s' % data['error'])
+ if not data['search']:
+ break
+ for s in data['search']:
+ offset += 1
+ page = Page(self, s['title'])
+ if self.versionnumber() >= 16:
+                        yield page, s['snippet'], '', s['size'], s['wordcount'], s['timestamp']
+                    else:
+                        yield page, '', '', '', '', ''
+ else:
+ #Yield search results (using Special:Search page) for query.
+ throttle = True
+ path = self.search_address(urllib.quote_plus(key.encode('utf-8')),
+ n=number, ns=namespaces)
+ get_throttle()
+ html = self.getUrl(path)
+            entryR = re.compile(ur'<li><a href=".+?" title="(?P<title>.+?)">.+?</a>',
+ re.DOTALL)
+ for m in entryR.finditer(html):
+ page = Page(self, m.group('title'))
+ yield page, '', '', '', '', ''
+
+ # TODO: avoid code duplication for the following methods
+
+    def logpages(self, number = 50, mode = '', title = None, user = None, repeat = False,
+                 namespace = [], start = None, end = None, tag = None, newer = False, dump = False):
+
+ if not self.has_api() or self.versionnumber() < 11 or \
+           mode not in ('block', 'protect', 'rights', 'delete', 'upload',
+                        'move', 'import', 'patrol', 'merge', 'suppress',
+                        'review', 'stable', 'gblblock', 'renameuser',
+                        'globalauth', 'gblrights', 'abusefilter', 'newusers'):
+ raise NotImplementedError, mode
+ params = {
+ 'action' : 'query',
+ 'list' : 'logevents',
+ 'letype' : mode,
+ 'lelimit' : int(number),
+ 'ledir' : 'older',
+            'leprop'  : ['ids', 'title', 'type', 'user', 'timestamp', 'comment', 'details',],
+ }
+
+ if number > config.special_page_limit:
+ params['lelimit'] = config.special_page_limit
+ if number > 5000 and self.isAllowed('apihighlimits'):
+ params['lelimit'] = 5000
+ if newer:
+ params['ledir'] = 'newer'
+ if user:
+ params['leuser'] = user
+ if title:
+ params['letitle'] = title
+ if start:
+ params['lestart'] = start
+ if end:
+ params['leend'] = end
+ if tag and self.versionnumber() >= 16: # tag support from mw:r58399
+ params['letag'] = tag
+
+ nbresults = 0
+ while True:
+ result = query.GetData(params, self)
+ if 'error' in result or 'warnings' in result:
+ output('%s' % result)
+ raise Error
+ for c in result['query']['logevents']:
+ if (not namespace or c['ns'] in namespace) and \
+ not 'actionhidden' in c.keys():
+ if dump:
+ # dump result only.
+ yield c
+ else:
+ if c['ns'] == 6:
+ p_ret = ImagePage(self, c['title'])
+ else:
+                        p_ret = Page(self, c['title'], defaultNamespace=c['ns'])
+
+ yield (p_ret, c['user'],
+ parsetime2stamp(c['timestamp']),
+ c['comment'], )
+
+ nbresults += 1
+ if nbresults >= number:
+ break
+ if 'query-continue' in result and nbresults < number:
+                params['lestart'] = result['query-continue']['logevents']['lestart']
+ elif repeat:
+ nbresults = 0
+ try:
+ params.pop('lestart')
+ except KeyError:
+ pass
+ else:
+ break
+ return
+
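For illustration (not part of the file), consuming the upload log through logpages, assuming a Site object `site`; without dump=True each item is a (Page, user, timestamp, comment) tuple:

    for page, user, timestamp, comment in site.logpages(number=20, mode='upload'):
        print page.title(), user, timestamp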
+    def newpages(self, number = 10, get_redirect = False, repeat = False, namespace = 0,
+                 rcshow = ['!bot','!redirect'], user = None, returndict = False):
+ """Yield new articles (as Page objects) from Special:Newpages.
+
+ Starts with the newest article and fetches the number of articles
+ specified in the first argument. If repeat is True, it fetches
+ Newpages again. If there is no new page, it blocks until there is
+ one, sleeping between subsequent fetches of Newpages.
+
+ The objects yielded are dependent on parmater returndict.
+ When true, it yields a tuple composed of a Page object and a dict of attributes.
+ When false, it yields a tuple composed of the Page object,
+ timestamp (unicode), length (int), an empty unicode string, username
+ or IP address (str), comment (unicode).
+
+ """
+ # TODO: in recent MW versions Special:Newpages takes a namespace parameter,
+ # and defaults to 0 if not specified.
+ # TODO: Detection of unregistered users is broken
+ # TODO: Repeat mechanism doesn't make much sense as implemented;
+ # should use both offset and limit parameters, and have an
+ # option to fetch older rather than newer pages
+ seen = set()
+ while True:
+ if self.has_api() and self.versionnumber() >= 10:
+ params = {
+ 'action': 'query',
+ 'list': 'recentchanges',
+ 'rctype': 'new',
+ 'rcnamespace': namespace,
+ 'rclimit': int(number),
+                    'rcprop': ['ids','title','timestamp','sizes','user','comment'],
+ 'rcshow': rcshow,
+ }
+ if user: params['rcuser'] = user
+                data = query.GetData(params, self)['query']['recentchanges']
+
+ for np in data:
+ if np['pageid'] not in seen:
+ seen.add(np['pageid'])
+                        page = Page(self, np['title'], defaultNamespace=np['ns'])
+ if returndict:
+ yield page, np
+ else:
+                            yield page, np['timestamp'], np['newlen'], u'', np['user'], np['comment']
+ else:
+ path = self.newpages_address(n=number, namespace=namespace)
+ # The throttling is important here, so always enabled.
+ get_throttle()
+ html = self.getUrl(path)
+
+                entryR = re.compile('<li[^>]*>(?P<date>.+?) \S*?<a href=".+?"'
+                                    ' title="(?P<title>.+?)">.+?</a>.+?[\(\[](?P<length>[\d,.]+)[^\)\]]*[\)\]]'
+                                    ' .?<a href=".+?" title=".+?:(?P<username>.+?)">')
+ for m in entryR.finditer(html):
+ date = m.group('date')
+ title = m.group('title')
+                    title = title.replace('&quot;', '"')
+                    length = int(re.sub("[,.]", "", m.group('length')))
+ loggedIn = u''
+ username = m.group('username')
+ comment = u''
+
+ if title not in seen:
+ seen.add(title)
+ page = Page(self, title)
+ yield page, date, length, loggedIn, username, comment
+ if not repeat:
+ break
+
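A minimal consumption sketch (editorial, `site` assumed); with the default returndict=False each yielded tuple carries the Page plus timestamp, length, an empty string, the username and the edit comment:

    for page, timestamp, length, _, username, comment in site.newpages(number=20):
        print page.title(), username, length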
+ def longpages(self, number = 10, repeat = False):
+ """Yield Pages from Special:Longpages.
+
+ Return values are a tuple of Page object, length(int).
+
+ """
+ #TODO: should use offset and limit parameters; 'repeat' as now
+ # implemented is fairly useless
+ # this comment applies to all the XXXXpages methods following, as well
+ seen = set()
+ path = self.longpages_address(n=number)
+        entryR = re.compile(ur'<li>\(<a href=".+?" title=".+?">.+?</a>\) .<a href=".+?" title="(?P<title>.+?)">.+?</a> .\[(?P<length>[\d.,]+).*?\]</li>', re.UNICODE)
+
+ while True:
+ get_throttle()
+ html = self.getUrl(path)
+ for m in entryR.finditer(html):
+ title = m.group('title')
+                length = int(re.sub('[.,]', '', m.group('length')))
+ if title not in seen:
+ seen.add(title)
+ page = Page(self, title)
+ yield page, length
+ if not repeat:
+ break
+
+ def shortpages(self, number = 10, repeat = False):
+        """Yield Pages and lengths from Special:Shortpages."""
+        throttle = True
+        seen = set()
+        path = self.shortpages_address(n = number)
+        entryR = re.compile(ur'<li>\(<a href=".+?" title=".+?">.+?</a>\) .<a href=".+?" title="(?P<title>.+?)">.+?</a> .\[(?P<length>[\d.,]+).*?\]</li>', re.UNICODE)
+
+ while True:
+ get_throttle()
+ html = self.getUrl(path)
+
+ for m in entryR.finditer(html):
+ title = m.group('title')
+                length = int(re.sub('[., ]', '', m.group('length')))
+
+ if title not in seen:
+ seen.add(title)
+ page = Page(self, title)
+ yield page, length
+ if not repeat:
+ break
+
+ def categories(self, number=10, repeat=False):
+        """Yield Category objects from Special:Categories"""
+ import catlib
+ seen = set()
+ while True:
+ path = self.categories_address(n=number)
+ get_throttle()
+ html = self.getUrl(path)
+ entryR = re.compile(
+                '<li><a href=".+?" title="(?P<title>.+?)">.+?</a>.*?</li>')
+ for m in entryR.finditer(html):
+ title = m.group('title')
+ if title not in seen:
+ seen.add(title)
+ page = catlib.Category(self, title)
+ yield page
+ if not repeat:
+ break
+
+ def deadendpages(self, number = 10, repeat = False):
+        """Yield Page objects retrieved from Special:Deadendpages."""
+ seen = set()
+ while True:
+ path = self.deadendpages_address(n=number)
+ get_throttle()
+ html = self.getUrl(path)
+ entryR = re.compile(
+                '<li><a href=".+?" title="(?P<title>.+?)">.+?</a></li>')
+ for m in entryR.finditer(html):
+ title = m.group('title')
+
+ if title not in seen:
+ seen.add(title)
+ page = Page(self, title)
+ yield page
+ if not repeat:
+ break
+
+ def ancientpages(self, number = 10, repeat = False):
+        """Yield Pages, datestamps from Special:Ancientpages."""
+ seen = set()
+ while True:
+ path = self.ancientpages_address(n=number)
+ get_throttle()
+ html = self.getUrl(path)
+            entryR = re.compile('<li><a href=".+?" title="(?P<title>.+?)">.+?</a> (?P<date>.+?)</li>')
+ for m in entryR.finditer(html):
+ title = m.group('title')
+ date = m.group('date')
+ if title not in seen:
+ seen.add(title)
+ page = Page(self, title)
+ yield page, date
+ if not repeat:
+ break
+
+ def lonelypages(self, number = 10, repeat = False):
+        """Yield Pages retrieved from Special:Lonelypages."""
+ throttle = True
+ seen = set()
+ while True:
+ path = self.lonelypages_address(n=number)
+ get_throttle()
+ html = self.getUrl(path)
+ entryR = re.compile(
+                '<li><a href=".+?" title="(?P<title>.+?)">.+?</a></li>')
+ for m in entryR.finditer(html):
+ title = m.group('title')
+
+ if title not in seen:
+ seen.add(title)
+ page = Page(self, title)
+ yield page
+ if not repeat:
+ break
+
+ def unwatchedpages(self, number = 10, repeat = False):
+        """Yield Pages from Special:Unwatchedpages (requires Admin privileges)."""
+ seen = set()
+ while True:
+ path = self.unwatchedpages_address(n=number)
+ get_throttle()
+ html = self.getUrl(path, sysop = True)
+ entryR = re.compile(
+                '<li><a href=".+?" title="(?P<title>.+?)">.+?</a>.+?</li>')
+ for m in entryR.finditer(html):
+ title = m.group('title')
+ if title not in seen:
+ seen.add(title)
+ page = Page(self, title)
+ yield page
+ if not repeat:
+ break
+
+ def uncategorizedcategories(self, number = 10, repeat = False):
+        """Yield Categories from Special:Uncategorizedcategories."""
+ import catlib
+ seen = set()
+ while True:
+ path = self.uncategorizedcategories_address(n=number)
+ get_throttle()
+ html = self.getUrl(path)
+ entryR = re.compile(
+                '<li><a href=".+?" title="(?P<title>.+?)">.+?</a></li>')
+ for m in entryR.finditer(html):
+ title = m.group('title')
+ if title not in seen:
+ seen.add(title)
+ page = catlib.Category(self, title)
+ yield page
+ if not repeat:
+ break
+
+    def newimages(self, number = 100, lestart = None, leend = None, leuser = None, letitle = None, repeat = False):
+        """
+        Yield ImagePages from APIs, call: action=query&list=logevents&letype=upload&lelimit=500
+
+ Options directly from APIs:
+ ---
+ Parameters:
+ Default: ids|title|type|user|timestamp|comment|details
+ lestart - The timestamp to start enumerating from.
+ leend - The timestamp to end enumerating.
+ ledir - In which direction to enumerate.
+ One value: newer, older
+ Default: older
+ leuser - Filter entries to those made by the given user.
+ letitle - Filter entries to those related to a page.
+ lelimit - How many total event entries to return.
+ No more than 500 (5000 for bots) allowed.
+ Default: 10
+ """
+
+        for o, u, t, c in self.logpages(number = number, mode = 'upload', title = letitle, user = leuser,
+ repeat = repeat, start = lestart, end = leend):
+ yield o, t, u, c
+ return
+
+ def recentchanges(self, number=100, rcstart=None, rcend=None, rcshow=None,
+ rcdir='older', rctype='edit|new', namespace=None,
+ includeredirects=True, repeat=False, user=None,
+ returndict=False):
+ """
+ Yield recent changes as Page objects
+        uses API call: action=query&list=recentchanges&rctype=edit|new&rclimit=500
+
+ Starts with the newest change and fetches the number of changes
+ specified in the first argument. If repeat is True, it fetches
+ again.
+
+ Options directly from APIs:
+ ---
+ Parameters:
+ rcstart - The timestamp to start enumerating from.
+ rcend - The timestamp to end enumerating.
+ rcdir - In which direction to enumerate.
+ One value: newer, older
+ Default: older
+ rcnamespace - Filter log entries to only this namespace(s)
+ Values (separate with '|'):
+ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
+ rcprop - Include additional pieces of information
+ Values (separate with '|'):
+ user, comment, flags, timestamp, title, ids, sizes,
+ redirect, patrolled, loginfo
+ Default: title|timestamp|ids
+ rcshow - Show only items that meet this criteria.
+ For example, to see only minor edits done by
+ logged-in users, set show=minor|!anon
+ Values (separate with '|'):
+ minor, !minor, bot, !bot, anon, !anon,
+ redirect, !redirect, patrolled, !patrolled
+ rclimit - How many total changes to return.
+ No more than 500 (5000 for bots) allowed.
+ Default: 10
+ rctype - Which types of changes to show.
+ Values (separate with '|'): edit, new, log
+
+ The objects yielded are dependent on parmater returndict.
+ When true, it yields a tuple composed of a Page object and a dict of attributes.
+ When false, it yields a tuple composed of the Page object,
+ timestamp (unicode), length (int), an empty unicode string, username
+ or IP address (str), comment (unicode).
+
+ # TODO: Detection of unregistered users is broken
+ """
+ if rctype is None:
+ rctype = 'edit|new'
+ params = {
+ 'action' : 'query',
+ 'list' : 'recentchanges',
+ 'rcdir' : rcdir,
+ 'rctype' : rctype,
+            'rcprop'      : ['user', 'comment', 'timestamp', 'title', 'ids',
+                             'loginfo', 'sizes'], #', 'flags', 'redirect', 'patrolled'],
+ 'rcnamespace' : namespace,
+ 'rclimit' : int(number),
+ }
+ if user: params['rcuser'] = user
+ if rcstart: params['rcstart'] = rcstart
+ if rcend: params['rcend'] = rcend
+ if rcshow: params['rcshow'] = rcshow
+ if rctype: params['rctype'] = rctype
+
+ while True:
+ data = query.GetData(params, self, encodeTitle = False)
+ if 'error' in data:
+ raise RuntimeError('%s' % data['error'])
+ try:
+ rcData = data['query']['recentchanges']
+ except KeyError:
+                raise ServerError("The APIs don't return data, the site may be down")
+
+ for i in rcData:
+ page = Page(self, i['title'], defaultNamespace=i['ns'])
+ if returndict:
+ yield page, i
+ else:
+ comment = ''
+ if 'comment' in i:
+ comment = i['comment']
+                    yield page, i['timestamp'], i['newlen'], True, i['user'], comment
+ if not repeat:
+ break
+
+ def patrol(self, rcid, token = None):
+ if not self.has_api() or self.versionnumber() < 12:
+ raise Exception('patrol: no API: not implemented')
+
+ if not token:
+ token = self.getPatrolToken()
+
+ params = {
+ 'action': 'patrol',
+ 'rcid': rcid,
+ 'token': token,
+ }
+
+ result = query.GetData(params, self)
+ if 'error' in result:
+ raise RuntimeError("%s" % result['error'])
+
+ return True
+
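For illustration (not part of the file), how patrol can be combined with recentchanges(returndict=True), assuming a Site object `site` on a wiki recent enough for the API patrol module; 'rcid' is present in each change dict because 'ids' is requested in rcprop:

    for page, change in site.recentchanges(number=20, returndict=True):
        site.patrol(change['rcid'])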
+ def uncategorizedimages(self, number = 10, repeat = False):
+        """Yield ImagePages from Special:Uncategorizedimages."""
+ seen = set()
+ ns = self.image_namespace()
+ entryR = re.compile(
+            '<a href=".+?" title="(?P<title>%s:.+?)">.+?</a>' % ns)
+ while True:
+ path = self.uncategorizedimages_address(n=number)
+ get_throttle()
+ html = self.getUrl(path)
+ for m in entryR.finditer(html):
+ title = m.group('title')
+ if title not in seen:
+ seen.add(title)
+ page = ImagePage(self, title)
+ yield page
+ if not repeat:
+ break
+
+ def uncategorizedpages(self, number = 10, repeat = False):
+ """Yield Pages from Special:Uncategorizedpages."""
+ seen = set()
+ while True:
+ path = self.uncategorizedpages_address(n=number)
+ get_throttle()
+ html = self.getUrl(path)
+ entryR = re.compile(
+                '<li><a href=".+?" title="(?P<title>.+?)">.+?</a></li>')
+ for m in entryR.finditer(html):
+ title = m.group('title')
+
+ if title not in seen:
+ seen.add(title)
+ page = Page(self, title)
+ yield page
+ if not repeat:
+ break
+
+ def uncategorizedtemplates(self, number = 10, repeat = False):
+        """Yield Pages from Special:UncategorizedTemplates."""
+ seen = set()
+ while True:
+ path = self.uncategorizedtemplates_address(n=number)
+ get_throttle()
+ html = self.getUrl(path)
+ entryR = re.compile(
+                '<li><a href=".+?" title="(?P<title>.+?)">.+?</a></li>')
+ for m in entryR.finditer(html):
+ title = m.group('title')
+
+ if title not in seen:
+ seen.add(title)
+ page = Page(self, title)
+ yield page
+ if not repeat:
+ break
+
+ def unusedcategories(self, number = 10, repeat = False):
+        """Yield Category objects from Special:Unusedcategories."""
+ import catlib
+ seen = set()
+ while True:
+ path = self.unusedcategories_address(n=number)
+ get_throttle()
+ html = self.getUrl(path)
+            entryR = re.compile('<li><a href=".+?" title="(?P<title>.+?)">.+?</a></li>')
+ for m in entryR.finditer(html):
+ title = m.group('title')
+
+ if title not in seen:
+ seen.add(title)
+ page = catlib.Category(self, title)
+ yield page
+ if not repeat:
+ break
+
+ def wantedcategories(self, number=10, repeat=False):
+        """Yield Category objects from Special:wantedcategories."""
+ import catlib
+ seen = set()
+ while True:
+ path = self.wantedcategories_address(n=number)
+ get_throttle()
+ html = self.getUrl(path)
+ entryR = re.compile(
+                '<li><a href=".+?" class="new" title="(?P<title>.+?) \(page does not exist\)">.+?</a> .+?\)</li>')
+ for m in entryR.finditer(html):
+ title = m.group('title')
+
+ if title not in seen:
+ seen.add(title)
+ page = catlib.Category(self, title)
+ yield page
+ if not repeat:
+ break
+
+ def unusedfiles(self, number = 10, repeat = False, extension = None):
+        """Yield ImagePage objects from Special:Unusedimages."""
+ seen = set()
+ ns = self.image_namespace()
+ entryR = re.compile(
+            '<a href=".+?" title="(?P<title>%s:.+?)">.+?</a>' % ns)
+ while True:
+ path = self.unusedfiles_address(n=number)
+ get_throttle()
+ html = self.getUrl(path)
+ for m in entryR.finditer(html):
+ fileext = None
+ title = m.group('title')
+ if extension:
+ fileext = title[len(title)-3:]
+ if title not in seen and fileext == extension:
+ ## Check whether the media is used in a Proofread page
+ # code disabled because it slows this method down, and
+ # because it is unclear what it's supposed to do.
+ #basename = title[6:]
+ #page = Page(self, 'Page:' + basename)
+
+ #if not page.exists():
+ seen.add(title)
+ image = ImagePage(self, title)
+ yield image
+ if not repeat:
+ break
+
+ def withoutinterwiki(self, number=10, repeat=False):
+        """Yield Pages without language links from Special:Withoutinterwiki."""
+ seen = set()
+ while True:
+ path = self.withoutinterwiki_address(n=number)
+ get_throttle()
+ html = self.getUrl(path)
+            entryR = re.compile('<li><a href=".+?" title="(?P<title>.+?)">.+?</a></li>')
+ for m in entryR.finditer(html):
+ title = m.group('title')
+ if title not in seen:
+ seen.add(title)
+ page = Page(self, title)
+ yield page
+ if not repeat:
+ break
+
+ def randompage(self, redirect = False):
+ if self.has_api() and self.versionnumber() >= 12:
+ params = {
+ 'action': 'query',
+ 'list': 'random',
+ #'rnnamespace': '0',
+ 'rnlimit': '1',
+ #'': '',
+ }
+ if redirect:
+ params['rnredirect'] = 1
+
+ data = query.GetData(params, self)
+            return Page(self, data['query']['random'][0]['title'])
+ else:
+ if redirect:
+                """Yield random redirect page via Special:RandomRedirect."""
+ html = self.getUrl(self.randomredirect_address())
+ else:
+ """Yield random page via Special:Random"""
+ html = self.getUrl(self.random_address())
+            m = re.search('var wgPageName = "(?P<title>.+?)";', html)
+ if m is not None:
+ return Page(self, m.group('title'))
+
+ def randomredirectpage(self):
+ return self.randompage(redirect = True)
+
+ def allpages(self, start='!', namespace=None, includeredirects=True,
+ throttle=True):
+ """
+ Yield all Pages in alphabetical order.
+
+ Parameters:
+ start Start at this page. By default, it starts at '!', and yields
+ all pages.
+ namespace Yield all pages in this namespace; defaults to 0.
+ MediaWiki software will only return pages in one namespace
+ at a time.
+
+ If includeredirects is False, redirects will not be found.
+
+ It is advised not to use this directly, but to use the
+ AllpagesPageGenerator from pagegenerators.py instead.
+
+ """
+ if namespace is None:
+ page = Page(self, start)
+ namespace = page.namespace()
+ start = page.title(withNamespace=False)
+
+ if not self.has_api():
+ for page in self._allpagesOld(start, namespace, includeredirects, throttle):
+ yield page
+ return
+
+ params = {
+ 'action' : 'query',
+ 'list' : 'allpages',
+ 'aplimit' : config.special_page_limit,
+ 'apnamespace': namespace,
+ 'apfrom' : start
+ }
+
+ if not includeredirects:
+ params['apfilterredir'] = 'nonredirects'
+ elif includeredirects == 'only':
+ params['apfilterredir'] = 'redirects'
+
+ while True:
+ if throttle:
+ get_throttle()
+ data = query.GetData(params, self)
+ if verbose:
+ print 'DEBUG allpages>>> data.keys()', data.keys()
+ if 'warnings' in data:
+ warning = data['warnings']['allpages']['*']
+ raise RuntimeError("API query warning: %s" % warning)
+ if 'error' in data:
+ raise RuntimeError("API query error: %s" % data)
+ if not 'allpages' in data['query']:
+                raise RuntimeError("API query error, no pages found: %s" % data)
+ count = 0
+ for p in data['query']['allpages']:
+ count += 1
+ yield Page(self, p['title'])
+ if count >= config.special_page_limit:
+ break
+            if 'query-continue' in data and count < params['aplimit']:
+                # get the continue key for backward compatibility with pre 1.20wmf8
+                contKey = data['query-continue']['allpages'].keys()[0]
+                params[contKey] = data['query-continue']['allpages'][contKey]
+ else:
+ break
+
+ def _allpagesOld(self, start='!', namespace=0, includeredirects=True,
+ throttle=True):
+ """
+ Yield all Pages from Special:Allpages.
+
+ This method doesn't work with MediaWiki 1.14 because of a change to
+ Special:Allpages. It is only left here for compatibility with older
+ MediaWiki versions, which don't support the API.
+
+ Parameters:
+ start Start at this page. By default, it starts at '!', and yields
+ all pages.
+ namespace Yield all pages in this namespace; defaults to 0.
+ MediaWiki software will only return pages in one namespace
+ at a time.
+
+ If includeredirects is False, redirects will not be found.
+ If includeredirects equals the string 'only', only redirects
+ will be found. Note that this has not been tested on older
+ versions of the MediaWiki code.
+
+ It is advised not to use this directly, but to use the
+ AllpagesPageGenerator from pagegenerators.py instead.
+
+ """
+ monobook_error = True
+ if start == '':
+ start='!'
+
+ while True:
+ # encode Non-ASCII characters in hexadecimal format (e.g. %F6)
+ start = start.encode(self.encoding())
+ start = urllib.quote(start)
+ # load a list which contains a series of article names (always 480)
+ path = self.allpages_address(start, namespace)
+            output(u'Retrieving Allpages special page for %s from %s, namespace %i' % (repr(self), start, namespace))
+ returned_html = self.getUrl(path)
+ # Try to find begin and end markers
+ try:
+ # In 1.4, another table was added above the navigational links
+ if self.versionnumber() >= 4:
+ begin_s = '</table><hr /><table'
+ end_s = '</table'
+ else:
+ begin_s = '<table'
+ end_s = '</table'
+ ibegin = returned_html.index(begin_s)
+ iend = returned_html.index(end_s,ibegin + 3)
+ except ValueError:
+ if monobook_error:
+                    raise ServerError("Couldn't extract allpages special page. Make sure you're using MonoBook skin.")
+ else:
+ # No list of wikilinks
+ break
+ monobook_error = False
+ # remove the irrelevant sections
+ returned_html = returned_html[ibegin:iend]
+ if self.versionnumber()==2:
+                R = re.compile('/wiki/(.*?)\" *class=[\'\"]printable')
+            elif self.versionnumber()<5:
+                # Apparently the special code for redirects was added in 1.5
+                R = re.compile('title ?=\"(.*?)\"')
+            elif not includeredirects:
+                R = re.compile('\<td(?: width="33%")?\>\<a href=\"\S*\" +title ?="(.*?)"')
+            elif includeredirects == 'only':
+                R = re.compile('\<td(?: width="33%")?>\<[^\<\>]*allpagesredirect\"\>\<a href=\"\S*\" +title ?="(.*?)"')
+ else:
+ R = re.compile('title ?=\"(.*?)\"')
+ # Count the number of useful links on this page
+ n = 0
+ for hit in R.findall(returned_html):
+ # count how many articles we found on the current page
+ n = n + 1
+ if self.versionnumber()==2:
+ yield Page(self, url2link(hit, site = self, insite = self))
+ else:
+ yield Page(self, hit)
+ # save the last hit, so that we know where to continue when we
+                # finished all articles on the current page. Append a '!' so that
+ # we don't yield a page twice.
+ start = Page(self, hit).title(withNamespace=False) + '!'
+ # A small shortcut: if there are less than 100 pages listed on this
+ # page, there is certainly no next. Probably 480 would do as well,
+ # but better be safe than sorry.
+ if n < 100:
+ if (not includeredirects) or includeredirects == 'only':
+                    # Maybe there were only so few because the rest is or is not a redirect
+ R = re.compile('title ?=\"(.*?)\"')
+ allLinks = R.findall(returned_html)
+ if len(allLinks) < 100:
+ break
+ elif n == 0:
+ # In this special case, no pages of the requested type
+                        # were found, and "start" will remain and be double-encoded.
+ # Use the last page as the start of the next page.
+ start = Page(self,
+ allLinks[-1]).title(
+ withNamespace=False) + '!'
+ else:
+ break
+ #else:
+            #    # Don't send a new request if "Next page (pagename)" isn't present
+            #    Rnonext = re.compile(r'title="(Special|%s):.+?">%s</a></td></tr></table>' % (
+            #        self.mediawiki_message('nstab-special'),
+            #        re.escape(self.mediawiki_message('nextpage')).replace('\$1', '.*?')))
+ # if not Rnonext.search(full_returned_html):
+ # break
+
+ def prefixindex(self, prefix, namespace=0, includeredirects=True):
+ """Yield all pages with a given prefix.
+
+ Parameters:
+ prefix The prefix of the pages.
+ namespace Namespace number; defaults to 0.
+ MediaWiki software will only return pages in one namespace
+ at a time.
+
+ If includeredirects is False, redirects will not be found.
+ If includeredirects equals the string 'only', only redirects
+ will be found. Note that this has not been tested on older
+ versions of the MediaWiki code.
+
+ It is advised not to use this directly, but to use the
+ PrefixingPageGenerator from pagegenerators.py instead.
+ """
+        for page in self.allpages(start = prefix, namespace = namespace, includeredirects = includeredirects):
+ if page.title(withNamespace=False).startswith(prefix):
+ yield page
+ else:
+ break
+
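A usage sketch (editorial, `site` assumed): prefixindex simply filters the allpages stream and stops at the first title that no longer starts with the prefix:

    for page in site.prefixindex(u'Foo', namespace=0):
        print page.title()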
+ def protectedpages(self, namespace = None, type = 'edit', lvl = 0):
+ """ Yield all the protected pages, using Special:ProtectedPages
+ * namespace is a namespace number
+ * type can be 'edit' or 'move
+            * lvl : protection level, can be 0, 'autoconfirmed', or 'sysop'
+ """
+ # Avoid problems of encoding and stuff like that, let it divided please
+ url = self.protectedpages_address()
+ url += '&type=%s&level=%s' % (type, lvl)
+        if namespace is not None: # /!\ if namespace seems simpler, but returns false when ns=0
+
+ url += '&namespace=%s' % namespace
+ parser_text = self.getUrl(url)
+ while 1:
+            #<li><a href="/wiki/Pagina_principale" title="Pagina principale">Pagina principale</a> <small>(6.522 byte)</small> (protetta)</li>
+            m = re.findall(r'<li><a href=".*?" title=".*?">(.*?)</a>.*?<small>\((.*?)\)</small>.*?\((.*?)\)</li>', parser_text)
+            for data in m:
+                title = data[0]
+                size = data[1]
+                status = data[2]
+                yield Page(self, title)
+            nextpage = re.findall(r'<.ul>\(.*?\).*?\(.*?\).*?\(<a href="(.*?)".*?</a>\) +?\(<a href=', parser_text)
+            if nextpage != []:
+                parser_text = self.getUrl(nextpage[0].replace('&amp;', '&'))
+ continue
+ else:
+ break
+
+ def linksearch(self, siteurl, limit=500):
+        """Yield Pages from results of Special:Linksearch for 'siteurl'."""
+        cache = []
+        R = re.compile('title ?=\"([^<>]*?)\">[^<>]*</a></li>')
+ urlsToRetrieve = [siteurl]
+ if not siteurl.startswith('*.'):
+ urlsToRetrieve.append('*.' + siteurl)
+
+ if self.has_api() and self.versionnumber() >= 11:
+ output(u'Querying API exturlusage...')
+ for url in urlsToRetrieve:
+ params = {
+ 'action': 'query',
+ 'list' : 'exturlusage',
+ 'eulimit': limit,
+ 'euquery': url,
+ }
+ count = 0
+ while True:
+ data = query.GetData(params, self)
+ if data['query']['exturlusage'] == []:
+ break
+ for pages in data['query']['exturlusage']:
+ count += 1
+ if not siteurl in pages['title']:
+ # the links themselves have similar form
+ if pages['pageid'] not in cache:
+ cache.append(pages['pageid'])
+                            yield Page(self, pages['title'], defaultNamespace=pages['ns'])
+ if count >= limit:
+ break
+
+ if 'query-continue' in data and count < limit:
+                    params['euoffset'] = data[u'query-continue'][u'exturlusage'][u'euoffset']
+ else:
+ break
+ else:
+ output(u'Querying [[Special:Linksearch]]...')
+ for url in urlsToRetrieve:
+ offset = 0
+ while True:
+ path = self.linksearch_address(url, limit=limit, offset=offset)
+ get_throttle()
+ html = self.getUrl(path)
+ #restricting the HTML source :
+ #when in the source, this div marks the beginning of the input
+                    loc = html.find('<div class="mw-spcontent">')
+ if loc > -1:
+ html = html[loc:]
+ #when in the source, marks the end of the linklist
+ loc = html.find('<div class="printfooter">')
+ if loc > -1:
+ html = html[:loc]
+
+ #our regex fetches internal page links and the link they contain
+ links = R.findall(html)
+ if not links:
+ #no more page to be fetched for that link
+ break
+ for title in links:
+ if not siteurl in title:
+ # the links themselves have similar form
+ if title in cache:
+ continue
+ else:
+ cache.append(title)
+ yield Page(self, title)
+ offset += limit
+
+ def linkto(self, title, othersite = None):
+        """Return unicode string in the form of a wikilink to 'title'
+
+ Use optional Site argument 'othersite' to generate an interwiki link
+ from the other site to the current site.
+
+ """
+ if othersite and othersite.lang != self.lang:
+ return u'[[%s:%s]]' % (self.lang, title)
+ else:
+ return u'[[%s]]' % title
+
+ def isInterwikiLink(self, s):
+ """Return True if s is in the form of an interwiki link.
+
+        Interwiki links have the form "foo:bar" or ":foo:bar" where foo is a
+ known language code or family. Called recursively if the first part
+ of the link refers to this site's own family and/or language.
+
+ """
+        s = s.replace("_", " ").strip(" ").lstrip(":")
+ if not ':' in s:
+ return False
+ first, rest = s.split(':',1)
+ # interwiki codes are case-insensitive
+ first = first.lower().strip(" ")
+ # commons: forwards interlanguage links to wikipedia:, etc.
+ if self.family.interwiki_forward:
+ interlangTargetFamily = Family(self.family.interwiki_forward)
+ else:
+ interlangTargetFamily = self.family
+ if self.getNamespaceIndex(first):
+ return False
+ if first in interlangTargetFamily.langs:
+ if first == self.lang:
+ return self.isInterwikiLink(rest)
+ else:
+ return True
+ if first in self.family.get_known_families(site = self):
+ if first == self.family.name:
+ return self.isInterwikiLink(rest)
+ else:
+ return True
+ return False
+
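For illustration (not part of the file), how link targets are classified, assuming `site` is a Site object for a Wikipedia language edition:

    site.isInterwikiLink(u'de:Berlin')      # True  - 'de' is a known language code
    site.isInterwikiLink(u'Category:Maps')  # False - a namespace prefix, not an interwiki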
+ def getmagicwords(self, word):
+        """Return list of localized "word" magic words for the site."""
+ if self.versionnumber() <= 13:
+ raise NotImplementedError
+ return self.siteinfo('magicwords').get(word)
+
+ def redirectRegex(self):
+        """Return a compiled regular expression matching on redirect pages.
+
+ Group 1 in the regex match object will be the target title.
+
+ """
+ #NOTE: this is needed, since the API can give false positives!
+ default = 'REDIRECT'
+        keywords = self.versionnumber() > 13 and self.getmagicwords('redirect')
+ if keywords:
+ pattern = r'(?:' + '|'.join(keywords) + ')'
+ else:
+ # no localized keyword for redirects
+ pattern = r'#%s' % default
+ if self.versionnumber() > 12:
+ # in MW 1.13 (at least) a redirect directive can follow whitespace
+ prefix = r'\s*'
+ else:
+ prefix = r'[\r\n]*'
+ # A redirect starts with hash (#), followed by a keyword, then
+ # arbitrary stuff, then a wikilink. The wikilink may contain
+ # a label, although this is not useful.
+ return re.compile(prefix + pattern
+ + '\s*:?\s*\[\[(.+?)(?:\|.*?)?\]\]',
+ re.IGNORECASE | re.UNICODE | re.DOTALL)
+
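A minimal sketch (editorial, `site` assumed) of pulling the redirect target out of wikitext with the regex built above:

    m = site.redirectRegex().match(u'#REDIRECT [[Main Page]]')
    if m:
        target = m.group(1)  # u'Main Page'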
+ def pagenamecodes(self, default=True):
+        """Return list of localized PAGENAME tags for the site."""
+        return self.versionnumber() > 13 and self.getmagicwords('pagename') \
+               or u'PAGENAME'
+
+    def pagename2codes(self, default=True):
+        """Return list of localized PAGENAMEE tags for the site."""
+        return self.versionnumber() > 13 and self.getmagicwords('pagenamee') \
+ or u'PAGENAMEE'
+
+ def resolvemagicwords(self, wikitext):
+        """Replace the {{ns:xx}} marks in a wikitext with the namespace names"""
+
+ defaults = []
+ for namespace in self.family.namespaces.itervalues():
+ value = namespace.get('_default', None)
+ if value:
+ if isinstance(value, list):
+ defaults.append(value[0])
+ else:
+ defaults.append(value)
+
+        named = re.compile(u'{{ns:(' + '|'.join(defaults) + ')}}', re.I)
+
+ def replacenamed(match):
+ return self.normalizeNamespace(match.group(1))
+
+ wikitext = named.sub(replacenamed, wikitext)
+
+ numbered = re.compile('{{ns:(-?\d{1,2})}}', re.I)
+
+ def replacenumbered(match):
+ return self.namespace(int(match.group(1)))
+
+ return numbered.sub(replacenumbered, wikitext)
+
+ # The following methods are for convenience, so that you can access
+ # methods of the Family class easier.
+ def encoding(self):
+ """Return the current encoding for this site."""
+ return self.family.code2encoding(self.lang)
+
+ def encodings(self):
+        """Return a list of all historical encodings for this site."""
+ return self.family.code2encodings(self.lang)
+
+ def category_namespace(self):
+        """Return the canonical name of the Category namespace on this site."""
+ # equivalent to self.namespace(14)?
+ return self.family.category_namespace(self.lang)
+
+ def category_namespaces(self):
+        """Return a list of all valid names for the Category namespace."""
+ return self.family.category_namespaces(self.lang)
+
+ def category_redirects(self):
+ return self.family.category_redirects(self.lang)
+
+ def image_namespace(self, fallback = '_default'):
+        """Return the canonical name of the Image namespace on this site."""
+ # equivalent to self.namespace(6)?
+ return self.family.image_namespace(self.lang, fallback)
+
+ def template_namespace(self, fallback = '_default'):
+        """Return the canonical name of the Template namespace on this site."""
+ # equivalent to self.namespace(10)?
+ return self.family.template_namespace(self.lang, fallback)
+
+ def export_address(self):
+ """Return URL path for Special:Export."""
+ return self.family.export_address(self.lang)
+
+ def query_address(self):
+        """Return URL path + '?' for query.php (if enabled on this Site)."""
+ return self.family.query_address(self.lang)
+
+ def api_address(self):
+        """Return URL path + '?' for api.php (if enabled on this Site)."""
+ return self.family.api_address(self.lang)
+
+ def apipath(self):
+        """Return URL path for api.php (if enabled on this Site)."""
+ return self.family.apipath(self.lang)
+
+ def scriptpath(self):
+        """Return URL prefix for scripts on this site ({{SCRIPTPATH}} value)"""
+ return self.family.scriptpath(self.lang)
+
+ def protocol(self):
+        """Return protocol ('http' or 'https') for access to this site."""
+ return self.family.protocol(self.lang)
+
+ def hostname(self):
+ """Return host portion of site URL."""
+ return self.family.hostname(self.lang)
+
+ def path(self):
+ """Return URL path for index.php on this Site."""
+ return self.family.path(self.lang)
+
+ def dbName(self):
+ """Return MySQL database name."""
+ return self.family.dbName(self.lang)
+
+ def move_address(self):
+ """Return URL path for Special:Movepage."""
+ return self.family.move_address(self.lang)
+
+ def delete_address(self, s):
+ """Return URL path to delete title 's'."""
+ return self.family.delete_address(self.lang, s)
+
+ def undelete_view_address(self, s, ts=''):
+ """Return URL path to view Special:Undelete for title 's'
+
+ Optional argument 'ts' returns path to view specific deleted version.
+
+ """
+ return self.family.undelete_view_address(self.lang, s, ts)
+
+ def undelete_address(self):
+ """Return URL path to Special:Undelete."""
+ return self.family.undelete_address(self.lang)
+
+ def protect_address(self, s):
+        """Return URL path to protect title 's'."""
+ return self.family.protect_address(self.lang, s)
+
+ def unprotect_address(self, s):
+        """Return URL path to unprotect title 's'."""
+ return self.family.unprotect_address(self.lang, s)
+
+ def put_address(self, s):
+        """Return URL path to submit revision to page titled 's'."""
+ return self.family.put_address(self.lang, s)
+
+ def get_address(self, s):
+        """Return URL path to retrieve page titled 's'."""
+ title = s.replace(' ', '_')
+ return self.family.get_address(self.lang, title)
+
+ def nice_get_address(self, s):
+        """Return shorter URL path to retrieve page titled 's'."""
+ return self.family.nice_get_address(self.lang, s)
+
+ def edit_address(self, s):
+        """Return URL path for edit form for page titled 's'."""
+ return self.family.edit_address(self.lang, s)
+
+ def watch_address(self, s):
+        """Return URL path for watching the page titled 's'."""
+ return self.family.watch_address(self.lang, s)
+
+ def unwatch_address(self, s):
+        """Return URL path for unwatching the page titled 's'."""
+ return self.family.unwatch_address(self.lang, s)
+
+ def purge_address(self, s):
+        """Return URL path to purge cache and retrieve page 's'."""
+ return self.family.purge_address(self.lang, s)
+
+ def block_address(self):
+ """Return path to block an IP address."""
+ return self.family.block_address(self.lang)
+
+ def unblock_address(self):
+ """Return path to unblock an IP address."""
+ return self.family.unblock_address(self.lang)
+
+ def blocksearch_address(self, s):
+        """Return path to search for blocks on IP address 's'."""
+ return self.family.blocksearch_address(self.lang, s)
+
+ def linksearch_address(self, s, limit=500, offset=0):
+        """Return path to Special:Linksearch for target 's'."""
+ return self.family.linksearch_address(self.lang, s, limit=limit, offset=offset)
+
+ def search_address(self, q, n=50, ns=0):
+        """Return path to Special:Search for query 'q'."""
+ return self.family.search_address(self.lang, q, n, ns)
+
+ def allpages_address(self, s, ns = 0):
+ """Return path to Special:Allpages."""
+ return self.family.allpages_address(self.lang, start=s, namespace = ns)
+
+ def log_address(self, n=50, mode = '', user = ''):
+ """Return path to Special:Log."""
+ return self.family.log_address(self.lang, n, mode, user)
+
+ def newpages_address(self, n=50, namespace=0):
+ """Return path to Special:Newpages."""
+ return self.family.newpages_address(self.lang, n, namespace)
+
+ def longpages_address(self, n=500):
+ """Return path to Special:Longpages."""
+ return self.family.longpages_address(self.lang, n)
+
+ def shortpages_address(self, n=500):
+ """Return path to Special:Shortpages."""
+ return self.family.shortpages_address(self.lang, n)
+
+ def unusedfiles_address(self, n=500):
+ """Return path to Special:Unusedimages."""
+ return self.family.unusedfiles_address(self.lang, n)
+
+ def categories_address(self, n=500):
+ """Return path to Special:Categories."""
+ return self.family.categories_address(self.lang, n)
+
+ def deadendpages_address(self, n=500):
+ """Return path to Special:Deadendpages."""
+ return self.family.deadendpages_address(self.lang, n)
+
+ def ancientpages_address(self, n=500):
+ """Return path to Special:Ancientpages."""
+ return self.family.ancientpages_address(self.lang, n)
+
+ def lonelypages_address(self, n=500):
+ """Return path to Special:Lonelypages."""
+ return self.family.lonelypages_address(self.lang, n)
+
+ def protectedpages_address(self, n=500):
+ """Return path to Special:ProtectedPages"""
+ return self.family.protectedpages_address(self.lang, n)
+
+ def unwatchedpages_address(self, n=500):
+ """Return path to Special:Unwatchedpages."""
+ return self.family.unwatchedpages_address(self.lang, n)
+
+ def uncategorizedcategories_address(self, n=500):
+        """Return path to Special:Uncategorizedcategories."""
+ return self.family.uncategorizedcategories_address(self.lang, n)
+
+ def uncategorizedimages_address(self, n=500):
+ """Return path to Special:Uncategorizedimages."""
+ return self.family.uncategorizedimages_address(self.lang, n)
+
+ def uncategorizedpages_address(self, n=500):
+ """Return path to Special:Uncategorizedpages."""
+ return self.family.uncategorizedpages_address(self.lang, n)
+
+ def uncategorizedtemplates_address(self, n=500):
+        """Return path to Special:Uncategorizedtemplates."""
+ return self.family.uncategorizedtemplates_address(self.lang, n)
+
+ def unusedcategories_address(self, n=500):
+ """Return path to Special:Unusedcategories."""
+ return self.family.unusedcategories_address(self.lang, n)
+
+ def wantedcategories_address(self, n=500):
+ """Return path to Special:Wantedcategories."""
+ return self.family.wantedcategories_address(self.lang, n)
+
+ def withoutinterwiki_address(self, n=500):
+ """Return path to Special:Withoutinterwiki."""
+ return self.family.withoutinterwiki_address(self.lang, n)
+
+ def references_address(self, s):
+        """Return path to Special:Whatlinkshere for page 's'."""
+ return self.family.references_address(self.lang, s)
+
+ def allmessages_address(self):
+ """Return path to Special:Allmessages."""
+ return self.family.allmessages_address(self.lang)
+
+ def upload_address(self):
+ """Return path to Special:Upload."""
+ return self.family.upload_address(self.lang)
+
+ def double_redirects_address(self, default_limit = True):
+ """Return path to Special:Doubleredirects."""
+ return self.family.double_redirects_address(self.lang, default_limit)
+
+ def broken_redirects_address(self, default_limit = True):
+ """Return path to Special:Brokenredirects."""
+ return self.family.broken_redirects_address(self.lang, default_limit)
+
+ def random_address(self):
+ """Return path to Special:Random."""
+ return self.family.random_address(self.lang)
+
+ def randomredirect_address(self):
+ """Return path to Special:RandomRedirect."""
+ return self.family.randomredirect_address(self.lang)
+
+ def login_address(self):
+ """Return path to Special:Userlogin."""
+ return self.family.login_address(self.lang)
+
+ def captcha_image_address(self, id):
+        """Return path to Special:Captcha for image 'id'."""
+ return self.family.captcha_image_address(self.lang, id)
+
+ def watchlist_address(self):
+ """Return path to Special:Watchlist editor."""
+ return self.family.watchlist_address(self.lang)
+
+ def contribs_address(self, target, limit=500, offset=''):
+        """Return path to Special:Contributions for user 'target'."""
+ return self.family.contribs_address(self.lang,target,limit,offset)
+
+    def globalusers_address(self, target='', limit=500, offset='', group=''):
+        """Return path to Special:GlobalUsers for user 'target' and/or group 'group'."""
+ return self.family.globalusers_address(self.lang, target, limit, offset, group)
+
+ def version(self):
+ """Return MediaWiki version number as a string."""
+ return self.family.version(self.lang)
+
+ def versionnumber(self):
+ """Return an int identifying MediaWiki version.
+
+ Currently this is implemented as returning the minor version
+ number; i.e., 'X' in version '1.X.Y'
+
+ """
+ return self.family.versionnumber(self.lang)
+
+ def live_version(self):
+        """Return the 'real' version number found on [[Special:Version]]
+
+ Return value is a tuple (int, int, str) of the major and minor
+ version numbers and any other text contained in the version.
+
+ """
+ global htmldata
+ if not hasattr(self, "_mw_version"):
+ PATTERN = r"^(?:: )?([0-9]+)\.([0-9]+)(.*)$"
+ versionpage = self.getUrl(self.get_address("Special:Version"))
+ htmldata = BeautifulSoup(versionpage, convertEntities="html")
+ # try to find the live version
+ versionlist = []
+ # 1st try is for mw < 1.17wmf1
+ versionlist.append(lambda: htmldata.findAll(
+                text="MediaWiki")[1].parent.nextSibling )
+ # 2nd try is for mw >=1.17wmf1
+ versionlist.append(lambda: htmldata.body.table.findAll(
+ 'td')[1].contents[0] )
+ # 3rd uses family file which is not live
+ versionlist.append(lambda: self.family.version(self.lang) )
+ for versionfunc in versionlist:
+ try:
+ versionstring = versionfunc()
+ except:
+ continue
+ m = re.match(PATTERN, str(versionstring).strip())
+ if m:
+ break
+ else:
+ raise Error(u'Cannot find any live version!')
+ self._mw_version = (int(m.group(1)), int(m.group(2)), m.group(3))
+ return self._mw_version
+
+ def checkCharset(self, charset):
+        """Warn if charset returned by wiki doesn't match family file."""
+ fromFamily = self.encoding()
+ assert fromFamily.lower() == charset.lower(), \
+ "charset for %s changed from %s to %s" \
+ % (repr(self), fromFamily, charset)
+ if fromFamily.lower() != charset.lower():
+ raise ValueError(
+"code2encodings has wrong charset for %s. It should be %s, but is %s"
+ % (repr(self), charset, self.encoding()))
+
+ def shared_image_repository(self):
+        """Return a tuple of image repositories used by this site."""
+ return self.family.shared_image_repository(self.lang)
+
+ def category_on_one_line(self):
+        """Return True if this site wants all category links on one line."""
+ return self.lang in self.family.category_on_one_line
+
+ def interwiki_putfirst(self):
+        """Return list of language codes for ordering of interwiki links."""
+ return self.family.interwiki_putfirst.get(self.lang, None)
+
+ def interwiki_putfirst_doubled(self, list_of_links):
+ # TODO: is this even needed? No family in the framework has this
+ # dictionary defined!
+ if self.lang in self.family.interwiki_putfirst_doubled:
+            if len(list_of_links) >= self.family.interwiki_putfirst_doubled[self.lang][0]:
+ list_of_links2 = []
+ for lang in list_of_links:
+ list_of_links2.append(lang.language())
+ list = []
+ for lang in self.family.interwiki_putfirst_doubled[self.lang][1]:
+ try:
+ list.append(list_of_links[list_of_links2.index(lang)])
+ except ValueError:
+ pass
+ return list
+ else:
+ return False
+ else:
+ return False
+
+ def getSite(self, code):
+        """Return Site object for language 'code' in this Family."""
+ return getSite(code = code, fam = self.family, user=self.user)
+
+ def namespace(self, num, all = False):
+        """Return string containing local name of namespace 'num'.
+
+ If optional argument 'all' is true, return a tuple of all recognized
+ values for this namespace.
+
+ """
+ return self.family.namespace(self.lang, num, all = all)
+
+ def normalizeNamespace(self, value):
+        """Return canonical name for namespace 'value' in this Site's language.
+
+ 'Value' should be a string or unicode.
+ If no match, return 'value' unmodified.
+
+ """
+ if not self.nocapitalize:
+ # make sure first letter gets normalized; there is at least
+ # one case ("İ") in which s.lower().upper() != s
+ value = value[0].lower().upper() + value[1:]
+ return self.family.normalizeNamespace(self.lang, value)
+
+ def getNamespaceIndex(self, namespace):
+        """Given a namespace name, return its int index, or None if invalid."""
+ return self.family.getNamespaceIndex(self.lang, namespace)
+
+ def language(self):
+ """Return Site's language code."""
+ return self.lang
+
+ def fam(self):
+ """Return Family object for this Site."""
+ return self.family
+
+ def disambcategory(self):
+        """Return Category in which disambig pages are listed."""
+ import catlib
+ try:
+ return catlib.Category(self,
+ self.namespace(14)+':'+self.family.disambcatname[self.lang])
+ except KeyError:
+ raise NoPage
+
+ def getToken(self, getalways = True, getagain = False, sysop = False):
+ index = self._userIndex(sysop)
+ if getagain or (getalways and self._token[index] is None):
+ output(u'Getting a token.')
+ self._load(sysop = sysop, force = True)
+ if self._token[index] is not None:
+ return self._token[index]
+ else:
+ return False
+
+ def getPatrolToken(self, sysop = False):
+ index = self._userIndex(sysop)
+
+ if self._patrolToken[index] is None:
+ output(u'Getting a patrol token.')
+ params = {
+ 'action' : 'query',
+ 'list' : 'recentchanges',
+ 'rcshow' : '!patrolled',
+ 'rctoken' : 'patrol',
+ 'rclimit' : 1,
+ }
+ data = query.GetData(params, self, encodeTitle = False)
+ if 'error' in data:
+ raise RuntimeError('%s' % data['error'])
+ try:
+ rcData = data['query']['recentchanges']
+ except KeyError:
+                raise ServerError("The APIs don't return data, the site may be down")
+
+ self._patrolToken[index] = rcData[0]['patroltoken']
+
+ return self._patrolToken[index]
+
+ def getFilesFromAnHash(self, hash_found = None):
+        """ Function that uses APIs to give the images that have the same hash. Useful
+ to find duplicates or nowcommons.
+
+        NOTE: it also returns the image itself; if you don't want it, just
+ filter the list returned.
+
+ NOTE 2: it returns the image WITHOUT the image namespace.
+ """
+ if self.versionnumber() < 12:
+ return None
+
+        if hash_found is None: # if the hash is None, return None instead of continuing
+ return None
+ # Now get all the images with the same hash
+ #action=query&format=xml&list=allimages&aisha1=%s
+ image_namespace = "%s:" % self.image_namespace() # Image:
+ params = {
+ 'action' :'query',
+ 'list' :'allimages',
+ 'aisha1' :hash_found,
+ }
+        allimages = query.GetData(params, self, encodeTitle = False)['query']['allimages']
+ files = list()
+ for imagedata in allimages:
+ image = imagedata[u'name']
+ files.append(image)
+ return files
+
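A hedged usage sketch for getFilesFromAnHash(); `site` stands for an already constructed Site object and the SHA-1 value is made up:

    # find files sharing the same SHA-1, e.g. to detect duplicates or NowCommons candidates
    duplicates = site.getFilesFromAnHash(u'5a1b2c3d...')  # shortened, fictitious hash
    if duplicates and len(duplicates) > 1:
        output(u'Possible duplicates: %s' % u', '.join(duplicates))
    # remember: the titles come back without the image namespace prefix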
+ def getParsedString(self, string, keeptags = [u'*']):
+ """Parses the string with API and returns html content.
+
+ @param string: String that should be parsed.
+ @type string: string
+ @param keeptags: Defines which tags (wiki, HTML) should NOT be removed.
+ @type keeptags: list
+
+ Returns the string given, parsed through the wiki parser.
+ """
+
+ if not self.has_api():
+ raise Exception('parse: no API: not implemented')
+
+ # call the wiki to get info
+ params = {
+ u'action' : u'parse',
+ u'text' : string,
+ }
+
+ pywikibot.get_throttle()
+ pywikibot.output(u"Parsing string through the wiki parser via API.")
+
+ result = query.GetData(params, self)
+ r = result[u'parse'][u'text'][u'*']
+
+ # disable/remove comments
+ r = pywikibot.removeDisabledParts(r, tags = ['comments']).strip()
+
+ # disable/remove ALL tags
+ if not (keeptags == [u'*']):
+ r = removeHTMLParts(r, keeptags = keeptags).strip()
+
+ return r
+
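A minimal usage sketch for getParsedString(), assuming an API-enabled Site object named `site`; the exact HTML returned varies by MediaWiki version:

    html = site.getParsedString(u"'''bold''' and [[Main Page|a link]]")
    # roughly something like u'<p><b>bold</b> and <a ...>a link</a></p>'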
+ def getExpandedString(self, string):
+ """Expands the string with API and returns wiki content.
+
+ @param string: String that should be expanded.
+ @type string: string
+
+ Returns the string given, expanded through the wiki parser.
+ """
+
+ if not self.has_api():
+ raise Exception('expandtemplates: no API: not implemented')
+
+ # call the wiki to get info
+ params = {
+ u'action' : u'expandtemplates',
+ u'text' : string,
+ }
+
+ pywikibot.get_throttle()
+ pywikibot.output(u"Expanding string through the wiki parser via API.")
+
+ result = query.GetData(params, self)
+ r = result[u'expandtemplates'][u'*']
+
+ return r
+
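A similar hedged sketch for getExpandedString(), which returns wikitext rather than HTML:

    text = site.getExpandedString(u'{{CURRENTYEAR}}')
    # e.g. u'2012' at the time of this revision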
+# Caches to provide faster access
+_sites = {}
+_namespaceCache = {}
+
+@deprecate_arg("persistent_http", None)
+def getSite(code=None, fam=None, user=None, noLogin=False):
+ if code is None:
+ code = default_code
+ if fam is None:
+ fam = default_family
+ key = '%s:%s:%s' % (fam, code, user)
+ if key not in _sites:
+ _sites[key] = Site(code=code, fam=fam, user=user)
+ ret = _sites[key]
+ if not ret.family.isPublic(code) and not noLogin:
+ ret.forceLogin()
+ return ret
+
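A short sketch of how the getSite() cache behaves; the language and family values are only examples:

    s1 = getSite(code='en', fam='wikipedia')
    s2 = getSite(code='en', fam='wikipedia')
    assert s1 is s2   # the same object is served from the _sites cache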
+def setSite(site):
+ global default_code, default_family
+ default_code = site.language()
+ default_family = site.family
+
+# Command line parsing and help
+
+def calledModuleName():
+ """Return the name of the module calling this function.
+
+ This is required because the -help option loads the module's docstring
+ and because the module name will be used for the filename of the log.
+
+ """
+ # get commandline arguments
+ called = sys.argv[0].strip()
+ if ".py" in called: # could end with .pyc, .pyw, etc. on some platforms
+ # clip off the '.py?' filename extension
+ called = called[:called.rindex('.py')]
+ return os.path.basename(called)
+
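For example, a hypothetical invocation:

    # python interwiki.py -auto
    # calledModuleName() -> 'interwiki', later used for the default 'interwiki.log' logfile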
+def _decodeArg(arg):
+ # We may pass a Unicode string to a script upon importing and calling
+ # main() from another script.
+ if isinstance(arg,unicode):
+ return arg
+ if sys.platform == 'win32':
+ if config.console_encoding in ('cp437', 'cp850'):
+ # Western Windows versions give parameters encoded as windows-1252
+ # even though the console encoding is cp850 or cp437.
+ return unicode(arg, 'windows-1252')
+ elif config.console_encoding == 'cp852':
+ # Central/Eastern European Windows versions give parameters encoded
+ # as windows-1250 even though the console encoding is cp852.
+ return unicode(arg, 'windows-1250')
+ else:
+ return unicode(arg, config.console_encoding)
+ else:
+ # Linux uses the same encoding for both.
+ # I don't know how non-Western Windows versions behave.
+ return unicode(arg, config.console_encoding)
+
+def handleArgs(*args):
+ """Handle standard command line arguments, return the rest as a list.
+
+ Takes the commandline arguments, converts them to Unicode, processes all
+ global parameters such as -lang or -log. Returns a list of all arguments
+ that are not global. This makes sure that global arguments are applied
+ first, regardless of the order in which the arguments were given.
+
+ args may be passed as an argument, thereby overriding sys.argv
+
+ """
+ global default_code, default_family, verbose, debug, simulate
+ # get commandline arguments if necessary
+ if not args:
+ args = sys.argv[1:]
+ # get the name of the module calling this function. This is
+ # required because the -help option loads the module's docstring and because
+ # the module name will be used for the filename of the log.
+ moduleName = calledModuleName()
+ nonGlobalArgs = []
+ username = None
+ do_help = False
+ for arg in args:
+ arg = _decodeArg(arg)
+ if arg == '-help':
+ do_help = True
+ elif arg.startswith('-dir:'):
+            pass # config_dir = arg[5:] // currently handled in wikipediatools.py - possibly before this routine is called.
+ elif arg.startswith('-family:'):
+ default_family = arg[8:]
+ elif arg.startswith('-lang:'):
+ default_code = arg[6:]
+ elif arg.startswith("-user:"):
+ username = arg[len("-user:") : ]
+ elif arg.startswith('-putthrottle:'):
+ config.put_throttle = int(arg[len("-putthrottle:") : ])
+ put_throttle.setDelay()
+ elif arg.startswith('-pt:'):
+ config.put_throttle = int(arg[len("-pt:") : ])
+ put_throttle.setDelay()
+ elif arg.startswith("-maxlag:"):
+ config.maxlag = int(arg[len("-maxlag:") : ])
+ elif arg == '-log':
+ setLogfileStatus(True)
+ elif arg.startswith('-log:'):
+ setLogfileStatus(True, arg[5:])
+ elif arg == '-nolog':
+ setLogfileStatus(False)
+ elif arg in ['-verbose', '-v']:
+ verbose += 1
+ elif arg == '-daemonize':
+ import daemonize
+ daemonize.daemonize()
+ elif arg.startswith('-daemonize:'):
+ import daemonize
+ daemonize.daemonize(redirect_std = arg[11:])
+ elif arg in ['-cosmeticchanges', '-cc']:
+ config.cosmetic_changes = not config.cosmetic_changes
+            output(u'NOTE: option cosmetic_changes is %s\n' % config.cosmetic_changes)
+ elif arg == '-simulate':
+ simulate = True
+ elif arg == '-dry':
+ output(u"Usage of -dry is deprecated; use -simulate instead.")
+ simulate = True
+ # global debug option for development purposes. Normally does nothing.
+ elif arg == '-debug':
+ debug = True
+ config.special_page_limit = 500
+ else:
+ # the argument is not global. Let the specific bot script care
+ # about it.
+ nonGlobalArgs.append(arg)
+
+ if username:
+ config.usernames[default_family][default_code] = username
+
+ # TEST for bug #3081100
+ if unicode_error:
+ output("""
+
+================================================================================
+\03{lightyellow}WARNING:\03{lightred} your python version might trigger issue #3081100\03{default}
+More information: See https://sourceforge.net/support/tracker.php?aid=3081100
+\03{lightyellow}Please update python to 2.7.2+ if you are running on wikimedia sites!\03{default}
+================================================================================
+
+""")
+ if verbose:
+ output(u'Pywikipediabot %s' % (version.getversion()))
+ output(u'Python %s' % sys.version)
+
+ if do_help:
+ showHelp()
+ sys.exit(0)
+ return nonGlobalArgs
+
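A hedged sketch of the calling pattern a bot script would typically use; the bot-specific option shown is illustrative:

    # global options (-lang:, -family:, -log, ...) are consumed by handleArgs();
    # whatever is left over is returned for the bot to interpret itself
    local_args = handleArgs('-lang:de', '-family:wikipedia', '-start:!')
    # local_args == [u'-start:!'], and default_code/default_family now point at de.wikipedia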
+def showHelp(moduleName=None):
+ # the parameter moduleName is deprecated and should be left out.
+ moduleName = moduleName or calledModuleName()
+ try:
+ moduleName = moduleName[moduleName.rindex("\\")+1:]
+ except ValueError: # There was no \ in the module name, so presumably no problem
+ pass
+
+ globalHelp = u'''
+Global arguments available for all bots:
+
+-dir:PATH Read the bot's configuration data from directory given by
+ PATH, instead of from the default directory.
+
+-lang:xx Set the language of the wiki you want to work on, overriding
+ the configuration in user-config.py. xx should be the
+ language code.
+
+-family:xyz Set the family of the wiki you want to work on, e.g.
+ wikipedia, wiktionary, wikitravel, ...
+ This will override the configuration in user-config.py.
+
+-user:xyz Log in as user 'xyz' instead of the default username.
+
+-daemonize:xyz Immediately return control to the terminal and redirect
+ stdout and stderr to xyz (only use for bots that require
+ no input from stdin).
+
+-help Show this help text.
+
+-log Enable the logfile, using the default filename
+ "%s.log"
+ Logs will be stored in the logs subdirectory.
+
+-log:xyz Enable the logfile, using 'xyz' as the filename.
+
+-maxlag Sets a new maxlag parameter to a number of seconds. Defer bot
+ edits during periods of database server lag. Default is set by
+ config.py
+
+-nolog Disable the logfile (if it is enabled by default).
+
+-putthrottle:n Set the minimum time (in seconds) the bot will wait between
+-pt:n saving pages.
+
+-verbose Have the bot provide additional output that may be
+-v useful in debugging.
+
+-cosmeticchanges Toggles the cosmetic_changes setting made in config.py or
+-cc user_config.py to its inverse and overrules it. All other
+ settings and restrictions are untouched.
+
+-simulate Disables writing to the server. Useful for testing and
+(-dry) debugging of new code (if given, doesn't do any real
+ changes, but only shows what would have been changed).
+ DEPRECATED: please use -simulate instead of -dry
+''' % moduleName
+ output(globalHelp, toStdout=True)
+ try:
+ exec('import %s as module' % moduleName)
+ helpText = module.__doc__.decode('utf-8')
+ if hasattr(module, 'docuReplacements'):
+ for key, value in module.docuReplacements.iteritems():
+ helpText = helpText.replace(key, value.strip('\n\r'))
+ output(helpText, toStdout=True)
+ except:
+ output(u'Sorry, no help available for %s' % moduleName)
+
+#########################
+# Interpret configuration
+#########################
+
+# search for user interface module in the 'userinterfaces' subdirectory
+sys.path.append(config.datafilepath('userinterfaces'))
+exec "import %s_interface as uiModule" % config.userinterface
+ui = uiModule.UI()
+verbose = 0
+debug = False
+simulate = False
+
+# TEST for bug #3081100
+unicode_error = __import__('unicodedata').normalize(
+ 'NFC',
+    u'\u092e\u093e\u0930\u094d\u0915 \u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917'
+    ) != u'\u092e\u093e\u0930\u094d\u0915 \u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917'
+if unicode_error:
+ print u'unicode test: triggers problem #3081100'
+
+default_family = config.family
+default_code = config.mylang
+logfile = None
+# Check
+
+# if the default family+wiki is a non-public one,
+# getSite will try login in. We don't want that, the module
+# is not yet loaded.
+getSite(noLogin=True)
+
+# Set socket timeout
+socket.setdefaulttimeout(config.socket_timeout)
+
+def writeToCommandLogFile():
+ """
+ Save the name of the called module along with all parameters to
+ logs/commands.log so that the user can look it up later to track errors
+ or report bugs.
+ """
+ modname = os.path.basename(sys.argv[0])
+ # put quotation marks around all parameters
+    args = [_decodeArg(modname)] + [_decodeArg('"%s"' % s) for s in sys.argv[1:]]
+ commandLogFilename = config.datafilepath('logs', 'commands.log')
+ try:
+ commandLogFile = codecs.open(commandLogFilename, 'a', 'utf-8')
+ except IOError:
+ commandLogFile = codecs.open(commandLogFilename, 'w', 'utf-8')
+ # add a timestamp in ISO 8601 formulation
+ isoDate = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime())
+ commandLogFile.write("%s r%s Python %s "
+ % (isoDate, version.getversiondict()['rev'],
+ sys.version.split()[0]))
+ s = u' '.join(args)
+ commandLogFile.write(s + os.linesep)
+ commandLogFile.close()
+
+def setLogfileStatus(enabled, logname = None):
+ global logfile
+ if enabled:
+ if not logname:
+ logname = '%s.log' % calledModuleName()
+ logfn = config.datafilepath('logs', logname)
+ try:
+ logfile = codecs.open(logfn, 'a', 'utf-8')
+ except IOError:
+ logfile = codecs.open(logfn, 'w', 'utf-8')
+ else:
+ # disable the log file
+ logfile = None
+
+if '*' in config.log or calledModuleName() in config.log:
+ setLogfileStatus(True)
+
+writeToCommandLogFile()
+
+colorTagR = re.compile('\03{.*?}', re.UNICODE)
+
+def log(text):
+ """Write the given text to the logfile."""
+ if logfile:
+ # remove all color markup
+ plaintext = colorTagR.sub('', text)
+ # save the text in a logfile (will be written in utf-8)
+ logfile.write(plaintext)
+ logfile.flush()
+
+output_lock = threading.Lock()
+input_lock = threading.Lock()
+output_cache = []
+
+def output(text, decoder=None, newline=True, toStdout=False, **kwargs):
+ """Output a message to the user via the userinterface.
+
+ Works like print, but uses the encoding used by the user's console
+ (console_encoding in the configuration file) instead of ASCII.
+ If decoder is None, text should be a unicode string. Otherwise it
+ should be encoded in the given encoding.
+
+ If newline is True, a linebreak will be added after printing the text.
+
+ If toStdout is True, the text will be sent to standard output,
+ so that it can be piped to another process. All other text will
+    be sent to stderr. See: http://en.wikipedia.org/wiki/Pipeline_%28Unix%29
+
+ text can contain special sequences to create colored output. These
+ consist of the escape character \03 and the color name in curly braces,
+ e. g. \03{lightpurple}. \03{default} resets the color.
+
+ """
+ output_lock.acquire()
+ try:
+ if decoder:
+ text = unicode(text, decoder)
+ elif type(text) is not unicode:
+ if verbose and sys.platform != 'win32':
+                print "DBG> BUG: Non-unicode (%s) passed to wikipedia.output without decoder!" % type(text)
+                print traceback.print_stack()
+                print "DBG> Attempting to recover, but please report this problem"
+ try:
+ text = unicode(text, 'utf-8')
+ except UnicodeDecodeError:
+ text = unicode(text, 'iso8859-1')
+ if newline:
+ text += u'\n'
+ log(text)
+ if input_lock.locked():
+ cache_output(text, toStdout = toStdout)
+ else:
+ ui.output(text, toStdout = toStdout)
+ finally:
+ output_lock.release()
+
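Two small usage sketches of output(), illustrating the color markup and the toStdout switch described in the docstring:

    output(u'\03{lightpurple}Hello\03{default} world')   # colored on terminals that support it
    output(u'machine-readable data', toStdout=True)      # sent to stdout so it can be piped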
+def cache_output(*args, **kwargs):
+ output_cache.append((args, kwargs))
+
+def flush_output_cache():
+ while(output_cache):
+ (args, kwargs) = output_cache.pop(0)
+ ui.output(*args, **kwargs)
+
+# User input functions
+
+def input(question, password = False):
+ """Ask the user a question, return the user's answer.
+
+ Parameters:
+ * question - a unicode string that will be shown to the user. Don't add a
+ space after the question mark/colon, this method will do this
+ for you.
+ * password - if True, hides the user's input (for password entry).
+
+ Returns a unicode string.
+
+ """
+ input_lock.acquire()
+ try:
+ data = ui.input(question, password)
+ finally:
+ flush_output_cache()
+ input_lock.release()
+
+ return data
+
+def inputChoice(question, answers, hotkeys, default = None):
+    """Ask the user a question with several options, return the user's choice.
+
+ The user's input will be case-insensitive, so the hotkeys should be
+ distinctive case-insensitively.
+
+ Parameters:
+ * question - a unicode string that will be shown to the user. Don't add a
+ space after the question mark, this method will do this
+ for you.
+ * answers - a list of strings that represent the options.
+ * hotkeys - a list of one-letter strings, one for each answer.
+ * default - an element of hotkeys, or None. The default choice that will
+ be returned when the user just presses Enter.
+
+ Returns a one-letter string in lowercase.
+
+ """
+ input_lock.acquire()
+ try:
+ data = ui.inputChoice(question, answers, hotkeys, default).lower()
+ finally:
+ flush_output_cache()
+ input_lock.release()
+
+ return data
+
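A hedged example of a typical prompt; the question and hotkeys are illustrative only:

    choice = inputChoice(u'Do you want to accept these changes?',
                         ['Yes', 'No', 'Quit'], ['y', 'n', 'q'], 'n')
    # returns 'y', 'n' or 'q' in lowercase; pressing Enter yields the default 'n'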
+
+page_put_queue = Queue.Queue(config.max_queue_size)
+def async_put():
+    """Daemon; take pages from the queue and try to save them on the wiki."""
+ while True:
+ (page, newtext, comment, watchArticle,
+ minorEdit, force, callback) = page_put_queue.get()
+ if page is None:
+ # an explicit end-of-Queue marker is needed for compatibility
+ # with Python 2.4; in 2.5, we could use the Queue's task_done()
+ # and join() methods
+ return
+ try:
+ page.put(newtext, comment, watchArticle, minorEdit, force)
+ error = None
+ except Exception, error:
+ pass
+ if callback is not None:
+ callback(page, error)
+ # if callback is provided, it is responsible for exception handling
+ continue
+ if isinstance(error, SpamfilterError):
+ output(u"Saving page %s prevented by spam filter: %s"
+ % (page, error.url))
+ elif isinstance(error, PageNotSaved):
+ output(u"Saving page %s failed: %s" % (page, error))
+ elif isinstance(error, LockedPage):
+ output(u"Page %s is locked; not saved." % page)
+ elif isinstance(error, NoUsername):
+ output(u"Page %s not saved; sysop privileges required." % page)
+ elif error is not None:
+ tb = traceback.format_exception(*sys.exc_info())
+            output(u"Saving page %s failed:\n%s" % (page, "".join(tb)))
+
+_putthread = threading.Thread(target=async_put)
+# identification for debugging purposes
+_putthread.setName('Put-Thread')
+_putthread.setDaemon(True)
+## Don't start the queue if it is not necessary.
+#_putthread.start()
+
+def stopme():
+ """This should be run when a bot does not interact with the Wiki, or
+ when it has stopped doing so. After a bot has run stopme() it will
+ not slow down other bots any more.
+ """
+ get_throttle.drop()
+
+def _flush():
+ """Wait for the page-putter to flush its queue.
+
+ Called automatically upon exiting from Python.
+
+ """
+ def remaining():
+ import datetime
+ remainingPages = page_put_queue.qsize() - 1
+ # -1 because we added a None element to stop the queue
+ remainingSeconds = datetime.timedelta(
+ seconds=(remainingPages * put_throttle.getDelay(True)))
+ return (remainingPages, remainingSeconds)
+
+ page_put_queue.put((None, None, None, None, None, None, None))
+
+ if page_put_queue.qsize() > 1:
+ output(u'Waiting for %i pages to be put. Estimated time remaining: %s'
+ % remaining())
+
+ while(_putthread.isAlive()):
+ try:
+ _putthread.join(1)
+ except KeyboardInterrupt:
+ answer = inputChoice(u"""\
+There are %i pages remaining in the queue. Estimated time remaining: %s
+Really exit?"""
+ % remaining(),
+                                     ['yes', 'no'], ['y', 'N'], 'N')
+ if answer == 'y':
+ return
+ try:
+ get_throttle.drop()
+ except NameError:
+ pass
+ if config.use_diskcache and not config.use_api:
+ for site in _sites.itervalues():
+ if site._mediawiki_messages:
+ try:
+ site._mediawiki_messages.delete()
+ except OSError:
+ pass
+
+import atexit
+atexit.register(_flush)
+
+def debugDump(name, site, error, data):
+ import time
+ name = unicode(name)
+ error = unicode(error)
+ site = unicode(repr(site).replace(u':',u'_'))
+ filename = '%s_%s__%s.dump' % (name, site, time.asctime())
+    filename = filename.replace(' ','_').replace(':','-')
+ f = file(filename, 'wb') #trying to write it in binary
+ #f = codecs.open(filename, 'w', 'utf-8')
+ f.write(u'Error reported: %s\n\n' % error)
+ try:
+ f.write(data.encode("utf8"))
+ except UnicodeDecodeError:
+ f.write(data)
+ f.close()
+    output( u'ERROR: %s caused error %s. Dump %s created.' % (name,error,filename) )
+
+get_throttle = Throttle()
+put_throttle = Throttle(write=True)
+
+def decompress_gzip(data):
+ # Use cStringIO if available
+ # TODO: rewrite gzip.py such that it supports unseekable fileobjects.
+ if data:
+ try:
+ from cStringIO import StringIO
+ except ImportError:
+ from StringIO import StringIO
+ import gzip
+ try:
+ data = gzip.GzipFile(fileobj = StringIO(data)).read()
+ except IOError:
+ raise
+ return data
+
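A hedged sketch of how decompress_gzip() is meant to be used; `raw_body` is a placeholder for the body of an HTTP response served with Content-Encoding: gzip:

    page_html = decompress_gzip(raw_body)   # gunzipped bytes
    # empty or None input is returned unchanged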
+def parsetime2stamp(tz):
+ s = time.strptime(tz, "%Y-%m-%dT%H:%M:%SZ")
+ return int(time.strftime("%Y%m%d%H%M%S", s))
+
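For example:

    parsetime2stamp('2012-09-16T13:48:36Z')   # -> 20120916134836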
+
+#Redirect Handler for urllib2
+class U2RedirectHandler(urllib2.HTTPRedirectHandler):
+
+ def redirect_request(self, req, fp, code, msg, headers, newurl):
+        newreq = urllib2.HTTPRedirectHandler.redirect_request(self, req, fp, code, msg, headers, newurl)
+ if (newreq.get_method() == "GET"):
+            for cl in "Content-Length", "Content-length", "content-length", "CONTENT-LENGTH":
+ if newreq.has_header(cl):
+ del newreq.headers[cl]
+ return newreq
+
+ def http_error_301(self, req, fp, code, msg, headers):
+ result = urllib2.HTTPRedirectHandler.http_error_301(
+ self, req, fp, code, msg, headers)
+ result.code = code
+        result.sheaders = [v for v in headers.__str__().split('\n') if v.startswith('Set-Cookie:')]
+ return result
+
+ def http_error_302(self, req, fp, code, msg, headers):
+ result = urllib2.HTTPRedirectHandler.http_error_302(
+ self, req, fp, code, msg, headers)
+ result.code = code
+        result.sheaders = [v for v in headers.__str__().split('\n') if v.startswith('Set-Cookie:')]
+ return result
+
+# Site Cookies handler
+COOKIEFILE = config.datafilepath('login-data', 'cookies.lwp')
+cj = cookielib.LWPCookieJar()
+if os.path.isfile(COOKIEFILE):
+ cj.load(COOKIEFILE)
+
+cookieProcessor = urllib2.HTTPCookieProcessor(cj)
+
+
+MyURLopener = urllib2.build_opener(U2RedirectHandler)
+
+if config.proxy['host']:
+    proxyHandler = urllib2.ProxyHandler({'http':'http://%s/' % config.proxy['host'] })
+
+ MyURLopener.add_handler(proxyHandler)
+ if config.proxy['auth']:
+ proxyAuth = urllib2.HTTPPasswordMgrWithDefaultRealm()
+        proxyAuth.add_password(None, config.proxy['host'], config.proxy['auth'][0], config.proxy['auth'][1])
+ proxyAuthHandler = urllib2.ProxyBasicAuthHandler(proxyAuth)
+
+ MyURLopener.add_handler(proxyAuthHandler)
+
+if config.authenticate:
+ passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
+ for site in config.authenticate:
+        passman.add_password(None, site, config.authenticate[site][0], config.authenticate[site][1])
+ authhandler = urllib2.HTTPBasicAuthHandler(passman)
+
+ MyURLopener.add_handler(authhandler)
+
+MyURLopener.addheaders = [('User-agent', useragent)]
+
+# This is a temporary part for the 2012 version survey
+# http://thread.gmane.org/gmane.comp.python.pywikipediabot.general/12473
+# Upon removing the connected lines from config.py should be removed, too.
+if not config.suppresssurvey:
+ output(
+"""
+\03{lightyellow}Dear Pywikipedia user!\03{default}
+Pywikibot has detected that you use this outdated version of Python:
+%s.
+We would like to hear your voice before ceasing support of this version.
+Please update to \03{lightyellow}Python 2.7.2\03{default} or higher if possible or visit
+http://www.mediawiki.org/wiki/Pywikipediabot/Survey2012 to tell us why we
+should support your version and to learn how to hide this message.
+After collecting opinions for a time we will decide and announce the deadline
+of deprecating use of old Python versions for Pywikipedia.
+""" % sys.version)
+
+if __name__ == '__main__':
+ import doctest
+ print 'Pywikipediabot %s' % version.getversion()
+ print 'Python %s' % sys.version
+ doctest.testmod()
+