Revision: 8100
Author: xqt
Date: 2010-04-16 08:47:13 +0000 (Fri, 16 Apr 2010)
Log Message:
-----------
update from trunk
Modified Paths:
--------------
branches/rewrite/scripts/interwiki.py
Modified: branches/rewrite/scripts/interwiki.py
===================================================================
--- branches/rewrite/scripts/interwiki.py 2010-04-16 06:54:47 UTC (rev 8099)
+++ branches/rewrite/scripts/interwiki.py 2010-04-16 08:47:13 UTC (rev 8100)
@@ -39,11 +39,18 @@
This implies -noredirect.
-restore: restore a set of "dumped" pages the robot was working on
- when it terminated.
+ when it terminated. The dump file will be subsequently
+ removed.
+ -restore:all restore a set of "dumped" pages from all dump files of a given
+ family remaining in the "interwiki-dumps" directory. All
+ these dump files will be subsequently removed. If the restoring
+ process is interrupted again, it saves all unprocessed pages in
+ one new dump file of the given site.
+
-continue: like restore, but after having gone through the dumped pages,
continue alphabetically starting at the last of the dumped
- pages.
+ pages. The dump file will be subsequently removed.
-warnfile: used as -warnfile:filename, reads all warnings from the
given file that apply to the home wiki language,
@@ -53,6 +60,8 @@
against the live wiki is using the warnfile.py
script.
+ -quiet: Use this option to get less output
+
Additionally, these arguments can be used to restrict the bot to certain pages:
-namespace:n Number or name of namespace to process. The parameter can be
@@ -65,22 +74,28 @@
that amount of pages and then stop. This is only useful in
combination with -start. The default is not to stop.
- -until: used as -until:title, specifies that the robot should process
- pages in wiki default sort order up to, and including, "title"
- and then stop. This is only useful in combination with -start.
- The default is not to stop.
+ -until: used as -until:title, specifies that the robot should
+ process pages in wiki default sort order up to, and
+ including, "title" and then stop. This is only useful in
+ combination with -start. The default is not to stop.
Note: do not specify a namespace, even if -start has one.
- -bracket only work on pages that have (in the home language) parenthesis
- in their title. All other pages are skipped.
+ -bracket only work on pages that have (in the home language)
+ parenthesis in their title. All other pages are skipped.
(note: without ending colon)
-skipfile: used as -skipfile:filename, skip all links mentioned in
the given file. This does not work with -number!
-skipauto use to skip all pages that can be translated automatically,
- like dates, centuries, months, etc. (note: without ending colon)
+ like dates, centuries, months, etc.
+ (note: without ending colon)
+ -lack: used as -lack:xx with xx a language code: only work on pages
+ without links to language xx. You can also add a number nn
+ like -lack:xx:nn, so that the bot only works on pages with
+ at least nn interwiki links (the default value for nn is 1).
+
These arguments are useful to provide hints to the bot:
-hint: used as -hint:de:Anweisung to give the robot a hint
@@ -88,7 +103,8 @@
useful if you specify a single page to work on. If no
text is given after the second ':', the name of the page
itself is used as the title for the hint, unless the
- -hintnobracket command line option (see there) is also selected.
+ -hintnobracket command line option (see there) is also
+ selected.
There are some special hints, trying a number of languages
at once:
@@ -170,10 +186,13 @@
These arguments specify in which way the bot should follow interwiki links:
- -noredirect do not follow redirects. (note: without ending colon)
+ -noredirect do not follow redirects nor category redirects.
+ (note: without ending colon)
- -initialredirect work on its target if a redirect is entered on the
- command line. (note: without ending colon)
+ -initialredirect work on its target if a redirect or category redirect is
+ entered on the command line or by a generator (note: without
+ ending colon). It is recommended to use this option with
+ -movelog pagegenerator.
-neverlink: used as -neverlink:xx where xx is a language code:
Disregard any links found to language xx. You can also
@@ -208,15 +227,15 @@
The following arguments are only important for users who have accounts for
multiple languages, and specify on which sites the bot should modify pages:
- -localonly only work on the local wiki, not on other wikis in the family
- I have a login at. (note: without ending colon)
+ -localonly only work on the local wiki, not on other wikis in the
+ family I have a login at. (note: without ending colon)
-limittwo only update two pages - one in the local wiki (if logged-in)
and one in the top available one.
For example, if the local page has links to de and fr,
this option will make sure that only local and de: (larger)
- site is updated. This option is useful to quickly set two way
- links without updating all of wiki's sites.
+ site is updated. This option is useful to quickly set two
+ way links without updating all of wiki's sites.
(note: without ending colon)
-whenneeded works like limittwo, but other languages are changed in the
@@ -262,15 +281,16 @@
If interwiki.py is terminated before it is finished, it will write a dump file
to the interwiki-dumps subdirectory. The program will read it if invoked with
the "-restore" or "-continue" option, and finish all the subjects in that list.
-To run the interwiki-bot on all pages on a language, run it with option
-"-start:!", and if it takes so long you have to break it off, use "-continue"
-next time.
+After finishing, the dump file will be deleted. To run the interwiki-bot on all
+pages on a language, run it with option "-start:!", and if it takes so long you
+have to break it off, use "-continue" next time.
+
"""
#
# (C) Rob W.W. Hooft, 2003
# (C) Daniel Herding, 2004
# (C) Yuri Astrakhan, 2005-2006
-# (C) Pywikipedia bot team, 2007-2009
+# (C) Pywikipedia bot team, 2007-2010
#
# Distributed under the terms of the MIT license.
#
@@ -279,7 +299,8 @@
import sys, copy, re, os
import time
-import codecs, pickle
+import codecs
+import pickle
import socket
try:
@@ -362,8 +383,12 @@
msg = {
'af': (u'robot ', u'Bygevoeg', u'Verwyder', u'Verander'),
'als': (u'Bötli: ', u'Ygfüegt', u'Ussergnoh', u'Gändret'),
+ 'am': (u'ሎሌ ', u'መጨመር', u'ማስወገድ', u'ማስተካከል'),
+ 'ang': (u'Robot ', u'ēcung', u'fornimung', u'onhweorfung'),
'ar': (u'روبوت ', u'إضافة', u'إزالة', u'تعديل'),
+ 'arc': (u'ܪܘܒܘܛ ', u'ܬܘܣܦܬܐ', u'ܠܚܝܐ', u'ܚܘܠܦܐ'),
'az': (u'Bot redaktəsi ', u'əlavə edilir', u'çıxardılır', u'dəyişdirilir'),
+ 'ba': (u'робот ', u'өҫтәне', u'юйҙы', u'үҙгәртте'),
'bar': (u'Boterl: ', u'Aini', u'Aussi', u'Obàsst'),
'bat-smg': (u'robots ', u'Pridedama', u'Trėnama', u'Keitama'),
'bcl': (u'robot ', u'minadugang', u'minahali', u'minamodifikar'),
@@ -371,9 +396,11 @@
'be-x-old': (u'робат ', u'дадаў', u'выдаліў', u'зьмяніў'),
'bg': (u'Робот ', u'Добавяне', u'Изтриване', u'Промяна'),
'bn': (u'রোবট ', u'যোগ করছে', u'মুছে ফেলছে', u'পরিবর্তন সাধন করছে'),
+ 'bo': (u'འཕྲུལ་ཆས་ཀྱི་མི། ', u'ཁ་སྣོན་རྒྱག་པ།', u'བསུབ་པ།', u'བསྐྱར་བཅོས་བྱེད་པ།'),
'bpy': (u'রোবট ', u'তিলকরের', u'থেইকরের', u'বদালার'),
'br': (u'Robot ', u'ouzhpennet', u'tennet', u'kemmet'),
'ca': (u'Robot ', u'afegeix', u'esborra', u'modifica'),
+ 'ce': (u'робот ', u'тIетоьхна', u'дIаяьккхина', u'хийцина'),
'ceb': (u'robot ', u'Gidugang', u'Gitangtang', u'Gimodipikar'),
'crh': (u'robot ', u'ekley', u'çetleştire', u'deñiştire'),
'cs': (u'robot ', u'přidal', u'odebral', u'změnil'),
@@ -397,13 +424,17 @@
'frp': (u'robot ', u'Apond', u'Retire', u'Modifie'),
'fur': (u'Robot: ', u'o zonti', u'o cambii', u'o gjavi'),
'fy': (u'Bot ', u'- derby', u'- fuort', u'- oars'),
+ 'ga': (u'róbat ', u'ag suimiú', u'ag baint', u'ag mionathrú'),
'gl': (u'bot ', u'Engadido', u'Eliminado', u'Modificado'),
'gn': (u'bot ', u'ojoapy', u'oñembogue', u'oñemoambue'),
+ 'gu': (u'રોબોટ ', u'ઉમેરણ', u'હટાવ્યું', u'ફેરફાર'),
+ 'gv': (u'bot ', u'currit stiagh ec', u'scryssit magh ec', u'caghlaait ec'),
'he': (u'בוט ', u'מוסיף', u'מסיר', u'משנה'),
'hr': (u'robot ', u'Dodaje', u'Uklanja', u'Mijenja'),
'hsb': (u'bot ', u'přidał', u'wotstronił', u'změnił'),
'ht': (u'wobo ', u'Ajoute', u'Anlve', u'Modifye'),
'hu': (u'Bot: ', u'következő hozzáadása', u'következő eltávolítása', u'következő módosítása'),
+ 'hy': (u'Ռոբոտը ', u'ավելացնում է․', u'հեռացնում է․', u'փոփոխում է․'),
'ia': (u'Robot: ', u'Addition de', u'Elimination de', u'Modification de'),
'id': (u'bot ', u'Menambah', u'Membuang', u'Mengubah'),
'ie': (u'Bot: ', u'Adjuntet', u'Removet', u'Modificat'),
@@ -412,31 +443,40 @@
'it': (u'Bot: ', u'Aggiungo', u'Tolgo', u'Modifico'),
'ja': (u'ロボットによる ', u'追加', u'除去', u'変更'),
'ka': (u'ბოტის ', u'დამატება', u'წაშლა', u'შეცვლა'),
+ 'kab': (u'a rubut ', u'ti merniwt', u'a ḍegger', u'a senfel'),
'ko': (u'로봇이 ', u'더함', u'지움', u'바꿈'),
'kk': (u'Боттың ', u'үстегені', u'аластағаны', u'түзеткені'),
+ 'kl': (u'Robot ', u'Ilassut', u'Peersineq', u'Inisseeqqinneq'),
+ 'km': (u'រ៉ូបូ ', u'បន្ថែម', u'ដកចេញ', u'កែសំរួល'),
'ksh': (u'Bot: ', u'dobëijedonn', u'erußjenumme', u'ußjewääßelt'),
'ku': (u'robot ', u'serzêde kirin', u'jêbirin', u'guhêrandin'),
+ 'kw': (u'robot ', u'ow keworra', u'ow dilea', u'ow chanjya'),
'la': (u'bot ', u'addit', u'abdit', u'mutat'),
'lb': (u'Bot ', u'Derbäi setzen', u'Ewech huelen', u'Änneren'),
'lmo': (u'Robot ', u'jontant', u'trant via', u'modifiant'),
'ln': (u'bot ', u'ebakisí', u'elongólí', u'ebongolí'),
+ 'lo': (u'ໂຣບົດ ', u'ພວມເພີ່ມ', u'ພວມລຶບ', u'ພວມແປງ'),
'lt': (u'robotas ', u'Pridedama', u'Šalinama', u'Keičiama'),
'mi': (u'he karetao ', u'e tāpiri ana', u'e tango ana', u'e whakarerekē ana'),
'lv': (u'robots ', u'pievieno', u'izņem', u'izmaina'),
+ 'mdf': (u'бот ', u'поладозе', u'нардазе', u'полафтозе'),
+ 'mg': (u'Rôbô ', u'Nanampy', u'Nanala', u'Nanova'),
'mk': (u'Бот ', u'Додава', u'Брише', u'Менува'),
'ml': (u'യന്ത്രം ', u'ചേര്ക്കുന്നു', u'നീക്കുന്നു', u'പുതുക്കുന്നു'),
'mn': (u'робот ', u'Нэмж байна', u'Арилгаж байна', u'Өөрчилж байна'),
'mr': (u'सांगकाम्याने ', u'वाढविले', u'काढले', u'बदलले'),
'ms': (u'bot ', u'menambah', u'membuang', u'mengubah'),
+ 'myv': (u'роботось ', u'путызеть', u'нардызеть', u'полавтызеть'),
'mzn': (u'Rebot ', u'Biyeshten', u'Bayten', u'Hekărden'),
'nah': (u'Tepozcuayollotl', u'Tlamahxiltilli', u'Tlaquixtilli', u'Tlapatlalli'),
'nds': (u'IW-Bot: ', u'dorto', u'rut', u'ännert'),
- 'nds-nl': (u'bot', u'derbie', u'derof', u'aanders'),
+ 'nds-nl': (u'bot ', u'derbie', u'derof', u'aanders'),
'nl': (u'robot ', u'Erbij', u'Eraf', u'Anders'),
'nn': (u'robot ', u'la til', u'fjerna', u'endra'),
'no': (u'robot ', u'legger til', u'fjerner', u'endrer'),
'nov': (u'robote ', u'Adid', u'Ekartad', u'Modifikad'),
'nrm': (u'robot ', u'ajouôte', u'hale', u'amende'),
+ 'nv': (u'botígíí díí naaltsoos tʼáá bíniʼ łahgo áyiilaa ', u'(+)', u'(-)', u'(+/-)'),
'os': (u'Робот ', u'баххæст кодта', u'Баивта', u'Аиуварс'),
'pdc': (u'Bot: ', u'dezu geduh', u'raus gnumme', u'gennert'),
'pl': (u'robot ', u'dodaje', u'usuwa', u'poprawia'),
@@ -445,6 +485,7 @@
'qu': (u'Rurana antacha ', u'Yapasqa', u'Qullusqa', u'Hukchasqa'),
'ro': (u'Robot interwiki: ', u'Adăugat', u'Înlăturat',u'Modificat'),
'ru': (u'робот ', u'добавил', u'удалил', u'изменил'),
+ 'sah': (u'робот ', u'эптэ', u'сотто', u'уларытта'),
'sk': (u'robot ', u'Pridal', u'Odobral',u'Zmenil' ),
'sl': (u'robot ', u'Dodajanje', u'Odstranjevanje', u'Spreminjanje'),
'sq': (u'roboti ', u'shtoj', u'largoj', u'ndryshoj'),
@@ -461,27 +502,83 @@
'tl': (u'robot ', u'dinagdag', u'tinanggal', u'binago'),
'to': (u'mīsini', u'ʻoku tānaki', u'ʻoku toʻo', u'ʻoku liliu'),
'tr': (u'Bot değişikliği ', u'Ekleniyor', u'Kaldırılıyor', u'Değiştiriliyor'),
+ 'tt': (u'робот ', u'кушты', u'бетерде', u'үзгәртте'),
'th': (u'โรบอต ', u'เพิ่ม', u'ลบ', u'แก้ไข'),
+ 'udm': (u'робот ', u'ватсаз', u'ӵушиз', u'воштӥз'),
'uk': (u'робот ', u'додав', u'видалив', u'змінив'),
+ 'ur': (u'روبالہ ', u'جمع', u'محو', u'ترمیم'),
'uz': (u'Bot ', u'Qoʻshdi', u'Tuzatdi', u'Oʻchirdi'),
'vec': (u'Bot: ', u'Zonto', u'Cavo', u'Canbio'),
'vi': (u'robot ', u'Thêm', u'Dời', u'Thay'),
'vo': (u'bot ', u'läükon', u'moükon', u'votükon'),
'war':(u'robot ', u'Gindugngan', u'Gintanggal', u'Ginliwat'),
+ 'xal': (u'көдлвр ', u'немв', u'һарһв', u'сольв'),
'yi': (u'באט ', u'צוגעלייגט', u'אראפגענומען', u'געענדערט'),
+ 'yo': (u'Bot ', u'Fífikún', u'Yíyọkúrò', u'Títúnṣe'),
'yue': (u'機械人 ', u'加', u'減', u'改'),
'zh': (u'機器人 ', u'新增', u'移除', u'修改'),
'zh-classical': (u'僕 ', u'增', u'削', u'修'),
+ 'zh-min-nan': (u'bot ', u'ka-thiam', u'thiah-tû', u'siu-kái'),
'zh-yue': (u'機械人 ', u'加', u'減', u'改'),
}
+# Subpage templates. Must be in lower case,
+# whereas subpage itself must be case sensitive
+moved_links = {
+ 'bn' : (u'documentation', u'/doc'),
+ 'ca' : (u'ús de la plantilla', u'/ús'),
+ 'cs' : (u'dokumentace', u'/doc'),
+ 'de' : (u'dokumentation', u'/Meta'),
+ 'en' : ([u'documentation',
+ u'template documentation',
+ u'template doc',
+ u'doc',
+ u'documentation, template'], u'/doc'),
+ 'es' : ([u'documentación', u'documentación de plantilla'], u'/doc'),
+ 'eu' : (u'txantiloi dokumentazioa', u'/dok'),
+ # fi: no idea how to handle this type of subpage at :Metasivu:
+ 'fi' : (u'mallineohje', None),
+ 'fr' : ([u'/documentation', u'documentation', u'doc_modèle',
+ u'documentation modèle', u'documentation modèle compliqué',
+ u'documentation modèle en sous-page',
+ u'documentation modèle compliqué en sous-page',
+ u'documentation modèle utilisant les parserfunctions en sous-page',
+ ],
+ u'/Documentation'),
+ 'hu' : (u'sablondokumentáció', u'/doc'),
+ 'id' : (u'template doc', u'/doc'),
+ 'ja' : (u'documentation', u'/doc'),
+ 'ka' : (u'თარგის ინფო', u'/ინფო'),
+ 'ko' : (u'documentation', u'/설명문서'),
+ 'ms' : (u'documentation', u'/doc'),
+ 'pl' : (u'dokumentacja', u'/opis'),
+ 'pt' : ([u'documentação', u'/doc'], u'/doc'),
+ 'ro' : (u'documentaţie', u'/doc'),
+ 'ru' : (u'doc', u'/doc'),
+ 'sv' : (u'dokumentation', u'/dok'),
+ 'vi' : (u'documentation', u'/doc'),
+ 'zh' : ([u'documentation', u'doc'], u'/doc'),
+}
+
+# A list of template names in different languages.
+# Pages which contain these shouldn't be changed.
+ignoreTemplates = {
+ '_default': [u'delete'],
+ 'cs' : [u'Pracuje_se'],
+ 'de' : [u'inuse', u'löschen', u'sla', u'löschantrag', u'löschantragstext'],
+ 'en' : [u'inuse', u'softredirect'],
+ 'pdc': [u'lösche'],
+}
+
class Global(object):
- """Container class for global settings.
- Use of globals outside of this is to be avoided."""
+ """
+ Container class for global settings.
+ Use of globals outside of this is to be avoided.
+ """
autonomous = False
confirm = False
+ always = False
select = False
- debug = True
followredirect = True
initialredirect = False
force = False
@@ -491,9 +588,7 @@
skipauto = False
untranslated = False
untranslatedonly = False
- askhints = False
auto = True
- hintnobracket = False
neverlink = []
showtextlink = 0
showtextlinkadd = 300
@@ -507,9 +602,120 @@
followinterwiki = True
minsubjects = config.interwiki_min_subjects
nobackonly = False
+ askhints = False
+ hintnobracket = False
+ hints = []
hintsareright = False
contentsondisk = config.interwiki_contents_on_disk
+ lacklanguage = None
+ minlinks = 0
+ quiet = False
+ restoreAll = False
+ def readOptions(self, arg):
+ """ Read all commandline parameters for the global container """
+ if arg == '-noauto':
+ self.auto = False
+ elif arg.startswith('-hint:'):
+ self.hints.append(arg[6:])
+ elif arg.startswith('-hintfile'):
+ hintfilename = arg[10:]
+ if (hintfilename is None) or (hintfilename == ''):
+ hintfilename = pywikibot.input(u'Please enter the hint filename:')
+ f = codecs.open(hintfilename, 'r', config.textfile_encoding)
+ R = re.compile(ur'\[\[(.+?)(?:\]\]|\|)') # hint or title ends either before | or before ]]
+ for pageTitle in R.findall(f.read()):
+ self.hints.append(pageTitle)
+ f.close()
+ elif arg == '-force':
+ self.force = True
+ elif arg == '-same':
+ self.same = True
+ elif arg == '-wiktionary':
+ self.same = 'wiktionary'
+ elif arg == '-untranslated':
+ self.untranslated = True
+ elif arg == '-untranslatedonly':
+ self.untranslated = True
+ self.untranslatedonly = True
+ elif arg == '-askhints':
+ self.untranslated = True
+ self.untranslatedonly = False
+ self.askhints = True
+ elif arg == '-hintnobracket':
+ self.hintnobracket = True
+ elif arg == '-confirm':
+ self.confirm = True
+ elif arg == '-select':
+ self.select = True
+ elif arg == '-autonomous' or arg == '-auto':
+ self.autonomous = True
+ elif arg == '-noredirect':
+ self.followredirect = False
+ elif arg == '-initialredirect':
+ self.initialredirect = True
+ elif arg == '-localonly':
+ self.localonly = True
+ elif arg == '-limittwo':
+ self.limittwo = True
+ self.strictlimittwo = True
+ elif arg.startswith('-whenneeded'):
+ self.limittwo = True
+ self.strictlimittwo = False
+ try:
+ self.needlimit = int(arg[12:])
+ except KeyError:
+ pass
+ except ValueError:
+ pass
+ elif arg.startswith('-skipfile:'):
+ skipfile = arg[10:]
+ skipPageGen = pagegenerators.TextfilePageGenerator(skipfile)
+ for page in skipPageGen:
+ self.skip.add(page)
+ del skipPageGen
+ elif arg == '-skipauto':
+ self.skipauto = True
+ elif arg.startswith('-neverlink:'):
+ self.neverlink += arg[11:].split(",")
+ elif arg.startswith('-ignore:'):
+ self.ignore += [pywikibot.Page(None,p) for p in arg[8:].split(",")]
+ elif arg.startswith('-ignorefile:'):
+ ignorefile = arg[12:]
+ ignorePageGen = pagegenerators.TextfilePageGenerator(ignorefile)
+ for page in ignorePageGen:
+ self.ignore.append(page)
+ del ignorePageGen
+ elif arg == '-showpage':
+ self.showtextlink += self.showtextlinkadd
+ elif arg == '-graph':
+ # override configuration
+ config.interwiki_graph = True
+ elif arg == '-bracket':
+ self.parenthesesonly = True
+ elif arg == '-localright':
+ self.followinterwiki = False
+ elif arg == '-hintsareright':
+ self.hintsareright = True
+ elif arg.startswith('-array:'):
+ self.minsubjects = int(arg[7:])
+ elif arg.startswith('-query:'):
+ self.maxquerysize = int(arg[7:])
+ elif arg == '-back':
+ self.nobackonly = True
+ elif arg == '-quiet':
+ self.quiet = True
+ elif arg.startswith('-lack:'):
+ remainder = arg[6:].split(':')
+ self.lacklanguage = remainder[0]
+ if len(remainder) > 1:
+ self.minlinks = int(remainder[1])
+ else:
+ self.minlinks = 1
+ else:
+ return False
+ return True
+
class StoredPage(pywikibot.Page):
"""
Store the Page contents on disk to avoid sucking too much
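A minimal sketch of two of the option handlers added in the hunk above, written for Python 3 rather than the Python 2 used by the script. The function names `extract_hints` and `parse_lack` are hypothetical and do not exist in interwiki.py; only the regular expression and the slicing logic are taken from the diff.

```python
import re

# Same pattern as in readOptions: a hint or title ends either
# before "|" or before "]]".
HINT_LINK = re.compile(r'\[\[(.+?)(?:\]\]|\|)')


def extract_hints(text):
    """Return every [[...]] link target found in a hint file's text."""
    return HINT_LINK.findall(text)


def parse_lack(arg):
    """Split '-lack:xx' or '-lack:xx:nn' into (language, minlinks).

    As in the diff, minlinks defaults to 1 when no number is given.
    A non-numeric nn raises ValueError, as int() does in the original.
    """
    remainder = arg[len('-lack:'):].split(':')
    language = remainder[0]
    minlinks = int(remainder[1]) if len(remainder) > 1 else 1
    return language, minlinks
```

For example, `parse_lack('-lack:de:3')` yields the language code `de` and a minimum of 3 links, while `extract_hints` keeps only the target part of a piped link.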
@@ -748,6 +954,7 @@
self.untranslated = None
self.hintsAsked = False
self.forcedStop = False
+ self.workonme = True
def getFoundDisambig(self, site):
"""
@@ -771,7 +978,8 @@
"""
for tree in [self.done, self.pending]:
for page in tree.filter(site):
- if page.exists() and not page.isDisambig() and not page.isRedirectPage():
+ if page.exists() and not page.isDisambig() \
+ and not page.isRedirectPage() and not page.isCategoryRedirect():
return page
return None
@@ -785,7 +993,7 @@
for tree in [self.done, self.pending, self.todo]:
for page in tree.filter(site):
if page.namespace() == self.originPage.namespace():
- if page.exists() and not page.isRedirectPage():
+ if page.exists() and not page.isRedirectPage() and not page.isCategoryRedirect():
return page
return None
@@ -1002,7 +1210,8 @@
return False
def reportInterwikilessPage(self, page):
- pywikibot.output(u"NOTE: %s does not have any interwiki links" % self.originPage)
+ if not globalvar.quiet:
+ pywikibot.output(u"NOTE: %s does not have any interwiki links" % self.originPage)
if config.without_interwiki:
f = codecs.open(
pywikibot.config.datafilepath('without_interwiki.txt'), 'a', 'utf-8')
@@ -1010,7 +1219,11 @@
f.close()
def askForHints(self, counter):
- if (self.untranslated or globalvar.askhints) and not self.hintsAsked and not self.originPage.isRedirectPage():
+ if not self.workonme:
+ # Do not ask hints for pages that we don't work on anyway
+ return
+ if (self.untranslated or globalvar.askhints) and not self.hintsAsked \
+ and not self.originPage.isRedirectPage() and not self.originPage.isCategoryRedirect():
# Only once!
self.hintsAsked = True
if globalvar.untranslated:
@@ -1058,14 +1271,19 @@
if dictName is not None:
pywikibot.output(u'WARNING: %s:%s relates to %s:%s, which is an auto entry %s(%s)' % (self.originPage.site().language(), self.originPage, page.site().language(),page,dictName,year))
+ # Abort processing if the bot is running in autonomous mode.
+ if globalvar.autonomous:
+ self.makeForcedStop(counter)
+
# Register this fact at the todo-counter.
counter.minus(page.site())
+
# Now check whether any interwiki links should be added to the
# todo list.
-
if not page.exists():
- pywikibot.output(u"NOTE: %s does not exist" % page)
+ if not globalvar.quiet:
+ pywikibot.output(u"NOTE: %s does not exist" % page)
if page == self.originPage:
# The page we are working on is the page that does not exist.
# No use in doing any work on it in that case.
@@ -1076,22 +1294,36 @@
self.done = PageTree()
continue
- elif page.isRedirectPage():
+ elif page.isRedirectPage() or page.isCategoryRedirect():
+ if page.isRedirectPage():
+ redir = u''
+ else:
+ redir = u'category '
try:
- redirectTargetPage = page.getRedirectTarget()
+ if page.isRedirectPage():
+ redirectTargetPage = page.getRedirectTarget()
+ else:
+ redirectTargetPage = page.getCategoryRedirectTarget()
except pywikibot.InvalidTitle:
# MW considers #redirect [[en:#foo]] as a redirect page,
# but we can't do anything useful with such pages
- pywikibot.output(u"NOTE: %s redirects to an invalid title" % page)
+ if not globalvar.quiet:
+ pywikibot.output(u"NOTE: %s redirects to an invalid title"
+ % page)
continue
- pywikibot.output(u"NOTE: %s is redirect to %s" % (page, redirectTargetPage))
+ if not globalvar.quiet:
+ pywikibot.output(u"NOTE: %s is %sredirect to %s"
+ % (page, redir, redirectTargetPage))
if page == self.originPage:
if globalvar.initialredirect:
if globalvar.contentsondisk:
redirectTargetPage = StoredPage(redirectTargetPage)
- self.originPage = redirectTargetPage
- self.todo.add(redirectTargetPage)
- counter.plus(redirectTargetPage.site)
+ #don't follow double redirects; it might be a self loop
+ if not redirectTargetPage.isRedirectPage() \
+ and not redirectTargetPage.isCategoryRedirect():
+ self.originPage = redirectTargetPage
+ self.todo.add(redirectTargetPage)
+ counter.plus(redirectTargetPage.site)
else:
# This is a redirect page to the origin. We don't need to
# follow the redirection.
@@ -1100,25 +1332,39 @@
counter.minus(site, count)
self.todo = PageTree()
elif not globalvar.followredirect:
- pywikibot.output(u"NOTE: not following redirects.")
+ if not globalvar.quiet:
+ pywikibot.output(u"NOTE: not following %sredirects." % redir)
elif page.site().family == redirectTargetPage.site().family \
and not self.skipPage(page, redirectTargetPage, counter):
if self.addIfNew(redirectTargetPage, counter, page):
if config.interwiki_shownew:
- pywikibot.output(u"%s: %s gives new redirect %s" % (self.originPage, page, redirectTargetPage))
+ pywikibot.output(u"%s: %s gives new %sredirect %s"
+ % (self.originPage, page, redir, redirectTargetPage))
+ continue
+ # must be behind the page.isRedirectPage() part
+ # otherwise a redirect error would be raised
+ if page.isEmpty() and not page.isCategory():
+ if not globalvar.quiet:
+ pywikibot.output(u"NOTE: %s is empty. Skipping." % page)
+ if page == self.originPage:
+ for site, count in self.todo.siteCounts():
+ counter.minus(site, count)
+ self.todo = PageTree()
+ self.done = PageTree()
continue
elif page.section():
+ if not globalvar.quiet:
+ pywikibot.output(u"NOTE: %s is a page section. Skipping." % page)
continue
-
# Page exists, isnt a redirect, and is a plain link (no section)
-
try:
iw = page.langlinks()
except pywikibot.NoSuchSite:
- pywikibot.output(u"NOTE: site %s does not exist" % page.site())
+ if not globalvar.quiet:
+ pywikibot.output(u"NOTE: site %s does not exist" % page.site())
continue
(skip, alternativePage) = self.disambigMismatch(page, counter)
@@ -1132,7 +1378,7 @@
duplicate = None
for p in self.done.filter(page.site()):
- if p != page and p.exists() and not p.isRedirectPage():
+ if p != page and p.exists() and not p.isRedirectPage() and not p.isCategoryRedirect():
duplicate = p
break
@@ -1141,9 +1387,15 @@
if globalvar.untranslatedonly:
# Ignore the interwiki links.
iw = ()
+ if globalvar.lacklanguage:
+ if globalvar.lacklanguage in [link.site().language() for link in iw]:
+ iw = ()
+ self.workonme = False
+ if len(iw) < globalvar.minlinks:
+ iw = ()
+ self.workonme = False
- elif globalvar.autonomous and duplicate:
-
+ elif globalvar.autonomous and duplicate and not skip:
pywikibot.output(u"Stopping work on %s because duplicate pages"\
" %s and %s are found" % (self.originPage,
duplicate,
@@ -1167,7 +1419,8 @@
sys.exit()
iw = ()
elif page.isEmpty() and not page.isCategory():
- pywikibot.output(u"NOTE: %s is empty; ignoring it and its interwiki links" % page)
+ if not globalvar.quiet:
+ pywikibot.output(u"NOTE: %s is empty; ignoring it and its interwiki links" % page)
# Ignore the interwiki links
self.done.remove(page)
iw = ()
@@ -1228,7 +1481,7 @@
# Each value will be a list of pages.
new = {}
for page in self.done:
- if page.exists() and not page.isRedirectPage():
+ if page.exists() and not page.isRedirectPage() and not page.isCategoryRedirect():
site = page.site()
if site == self.originPage.site():
if page != self.originPage:
@@ -1274,7 +1527,7 @@
pywikibot.output(u" (%d) Found link to %s in:" % (i, page2))
self.whereReport(page2, indent = 8)
while True:
- answer = pywikibot.input(u"Which variant should be used [number, (n)one, (g)ive up] :")
+ answer = pywikibot.input(u"Which variant should be used? (<number>, [n]one, [g]ive up) ").lower()
if answer:
if answer == 'g':
return None
@@ -1328,11 +1581,15 @@
be told to make another get request first."""
if not self.isDone():
raise "Bugcheck: finish called before done"
- if self.forcedStop:
+ if not self.workonme:
+ return
+ if self.forcedStop: # autonomous with problem
pywikibot.output(u"======Aborted processing %s======" % self.originPage)
return
if self.originPage.isRedirectPage():
return
+ if self.originPage.isCategoryRedirect():
+ return
if not self.untranslated and globalvar.untranslatedonly:
return
# The following check is not always correct and thus disabled.
@@ -1344,7 +1601,7 @@
pywikibot.output(u"======Post-processing %s======" % self.originPage)
# Assemble list of accepted interwiki links
new = self.assemble()
- if new is None: # User said give up or autonomous with problem
+ if new is None: # User said give up
pywikibot.output(u"======Aborted processing %s======" % self.originPage)
return
@@ -1358,6 +1615,7 @@
updatedSites = []
notUpdatedSites = []
# Process all languages here
+ globalvar.always = False
if globalvar.limittwo:
lclSite = self.originPage.site()
lclSiteDone = False
@@ -1504,25 +1762,17 @@
old[page2.site()] = page2
# Check what needs to get done
- mods, adding, removing, modifying = compareLanguages(old, new, insite = page.site())
+ mods, mcomment, adding, removing, modifying = compareLanguages(old, new, insite = page.site())
# When running in autonomous mode without -force switch, make sure we don't remove any items, but allow addition of the new ones
if globalvar.autonomous and not globalvar.force and len(removing) > 0:
for rmsite in removing:
if rmsite != page.site(): # Sometimes sites have an erroneous link to itself as an interwiki
rmPage = old[rmsite]
- ##########
- # temporary hard-coded special case to get rid of thousands of broken links to the Lombard Wikipedia,
- # where useless bot-created articles were mass-deleted. See for example:
- # http://meta.wikimedia.org/wiki/Proposals_for_closing_projects/Closure_of_Lo…
- if rmsite == pywikibot.getSite('lmo', 'wikipedia'):
- pywikibot.output(u'Found bad link to %s. As many lmo pages were deleted, it is assumed that it can be safely removed.' % rmPage)
- else:
- ##########
- new[rmsite] = old[rmsite]
- pywikibot.output(u"WARNING: %s is either deleted or has a mismatching disambiguation state." % rmPage)
+ new[rmsite] = old[rmsite] # keeping it in new means it will not be removed
+ pywikibot.output(u"WARNING: %s is either deleted or has a mismatching disambiguation state." % rmPage)
# Re-Check what needs to get done
- mods, adding, removing, modifying = compareLanguages(old, new, insite = page.site())
+ mods, mcomment, adding, removing, modifying = compareLanguages(old, new, insite = page.site())
if not mods:
pywikibot.output(u'No changes needed' )
@@ -1530,13 +1780,20 @@
pywikibot.output(u"Changes to be made: %s" % mods)
oldtext = page.get()
+ template = (page.namespace() == 10)
newtext = pywikibot.replaceLanguageLinks(oldtext, new,
site = page.site(),
- template = (page.namespace() == 10))
+ template = template)
+ # This is for now. Later there should be different functions for each kind
+ if not botMayEdit(page):
+ if template:
+ pywikibot.output(u'SKIPPING: %s should have interwiki links on subpage.' % page.aslink(True))
+ else:
+ pywikibot.output(u'SKIPPING: %s is under construction or to be deleted.' % page.aslink(True))
+ return False
if newtext == oldtext:
return False
- if globalvar.debug:
- pywikibot.showDiff(oldtext, newtext)
+ pywikibot.showDiff(oldtext, newtext)
# pywikibot.output(u"NOTE: Replace %s" % page)
# Determine whether we need permission to submit
@@ -1546,7 +1803,7 @@
ask = True
if globalvar.force:
ask = False
- if globalvar.confirm:
+ if globalvar.confirm and not globalvar.always:
ask = True
# If we need to ask, do so
if ask:
@@ -1555,8 +1812,8 @@
answer = 'n'
else:
answer = pywikibot.inputChoice(u'Submit?',
- ['Yes', 'No', 'open in Browser', 'Give up'],
- ['y', 'n', 'b', 'g'])
+ ['Yes', 'No', 'open in Browser', 'Give up', 'Always'],
+ ['y', 'n', 'b', 'g', 'a'])
if answer == 'b':
webbrowser.open("http://%s%s" % (
page.site().hostname(),
@@ -1564,6 +1821,10 @@
))
pywikibot.input(u"Press Enter when finished in browser.")
return True
+ elif answer == 'a':
+ # don't ask for the rest of this subject
+ globalvar.always = True
+ answer = 'y'
else:
# If we do not need to ask, allow
answer = 'y'
@@ -1573,12 +1834,14 @@
# another get-query first.
if bot:
while pywikibot.get_throttle.waittime() + 2.0 < pywikibot.put_throttle.waittime():
- pywikibot.output(u"NOTE: Performing a recursive query first to save time....")
+ if not globalvar.quiet:
+ pywikibot.output(u"NOTE: Performing a recursive query first to save time....")
qdone = bot.oneQuery()
if not qdone:
# Nothing more to do
break
- pywikibot.output(u"NOTE: Updating live wiki...")
+ if not globalvar.quiet:
+ pywikibot.output(u"NOTE: Updating live wiki...")
timeout=60
while 1:
try:
@@ -1603,7 +1866,7 @@
timeout *= 2
time.sleep(timeout)
except pywikibot.ServerError:
- if timeout>3600:
+ if timeout > 3600:
raise
pywikibot.output(u'ERROR putting page: ServerError.')
pywikibot.output(u'Sleeping %i seconds before trying again.' % (timeout,))
@@ -1694,30 +1957,37 @@
self.generateNumber = number
self.generateUntil = until
- def dump(self):
+ def dump(self, append = True):
site = pywikibot.getSite()
dumpfn = pywikibot.config.datafilepath(
'data',
'interwiki-dumps',
'%s-%s.pickle' % (site.family.name, site.lang))
- f = open(dumpfn, 'w')
+ if append: mode = 'appended'
+ else: mode = 'written'
+ f = open(dumpfn, mode[0])
titles = [s.originPage.title() for s in self.subjects]
pickle.dump(titles, f)
f.close()
- pywikibot.output(u'Dump %s (%s) saved' % (site.lang, site.family.name))
+ pywikibot.output(u'Dump %s (%s) %s.' % (site.lang, site.family.name, mode))
+ return dumpfn
def generateMore(self, number):
"""Generate more subjects. This is called internally when the
list of subjects becomes too small, but only if there is a
PageGenerator"""
fs = self.firstSubject()
- if fs:
+ if fs and (not globalvar.quiet):
pywikibot.output(u"NOTE: The first unfinished subject is %s" % fs.originPage)
pywikibot.output(u"NOTE: Number of pages queued is %d, trying to add %d more."%(len(self.subjects), number))
for i in range(number):
try:
while True:
- page = self.pageGenerator.next()
+ try:
+ page = self.pageGenerator.next()
+ except IOError:
+ pywikibot.output(u'IOError occurred; skipping')
+ continue
if page in globalvar.skip:
pywikibot.output(u'Skipping: %s is in the skip list' % page)
continue
@@ -1733,12 +2003,29 @@
if page.isTalkPage():
pywikibot.output(u'Skipping: %s is a talk page' % page)
continue
+ #doesn't work: page must be preloaded for this test
+ #if page.isEmpty():
+ # pywikibot.output(u'Skipping: %s is an empty page' % page.title())
+ # continue
+ if page.namespace() == 10:
+ loc = None
+ try:
+ tmpl, loc = moved_links[page.site().lang]
+ del tmpl
+ except KeyError:
+ pass
+ if loc is not None and loc in page.title():
+ pywikibot.output(u'Skipping: %s is a template subpage' % page.title())
+ continue
break
if self.generateUntil:
- if page.titleWithoutNamespace() > self.generateUntil:
+ until = self.generateUntil
+ if page.site().lang not in page.site().family.nocapitalize:
+ until = until[0].upper()+until[1:]
+ if page.titleWithoutNamespace() > until:
raise StopIteration
- self.add(page, hints = hints)
+ self.add(page, hints = globalvar.hints)
self.generated += 1
if self.generateNumber:
if self.generated >= self.generateNumber:
@@ -1779,8 +2066,7 @@
def selectQuerySite(self):
"""Select the site the next query should go out for."""
# How many home-language queries we still have?
- ### it seems this counts a negative value
- mycount = max(0, self.counts.get(pywikibot.getSite(), 0))
+ mycount = self.counts.get(pywikibot.getSite(), 0)
# Do we still have enough subjects to work on for which the
# home language has been retrieved? This is rough, because
# some subjects may need to retrieve a second home-language page!
@@ -1799,11 +2085,12 @@
else:
break
# If we have a few, getting the home language is a good thing.
- try:
- if self.counts[pywikibot.getSite()] > 4:
- return pywikibot.getSite()
- except KeyError:
- pass
+ if not globalvar.restoreAll:
+ try:
+ if self.counts[pywikibot.getSite()] > 4:
+ return pywikibot.getSite()
+ except KeyError:
+ pass
# If getting the home language doesn't make sense, see how many
# foreign page queries we can find.
return self.maxOpenSite()
@@ -1890,7 +2177,7 @@
removing = sorted(oldiw - newiw)
modifying = sorted(site for site in oldiw & newiw if old[site] != new[site])
- mods = u""
+ mcomment = mods = u''
if len(adding) + len(removing) + len(modifying) <= 3:
# Use an extended format for the string linking to all added pages.
@@ -1899,16 +2186,44 @@
# Use short format, just the language code
fmt = lambda d, site: site.lang
- _, add, rem, mod = pywikibot.translate(insite.lang, msg)
+ head, add, rem, mod = pywikibot.translate(insite.lang, msg)
+ colon = u': '
+ comma = u', '
+ sep = u''
+
if adding:
- mods += u" %s: %s" % (add, ", ".join([fmt(new, x) for x in adding]))
+ mods += (add + colon + comma.join([fmt(new, x) for x in adding]))
+ sep = u' '
if removing:
- mods += u" %s: %s" % (rem, ", ".join([fmt(old, x) for x in removing]))
+ mods += (sep + rem + colon + comma.join([fmt(old, x) for x in removing]))
+ sep = u' '
if modifying:
- mods += u" %s: %s" % (mod, ", ".join([fmt(new, x) for x in modifying]))
- return mods, adding, removing, modifying
+ mods += (sep + mod + colon + comma.join([fmt(new, x) for x in modifying]))
+ if mods:
+ mcomment = head + mods
+ return mods, mcomment, adding, removing, modifying
+def botMayEdit (page):
+ tmpl = []
+ try:
+ tmpl, loc = moved_links[page.site().lang]
+ except KeyError:
+ pass
+ if type(tmpl) != list:
+ tmpl = [tmpl]
+ try:
+ tmpl += ignoreTemplates[page.site().lang]
+ except KeyError:
+ pass
+ tmpl += ignoreTemplates['_default']
+ if tmpl != []:
+ templates = page.templatesWithParams(get_redirect=True)
+ for template in templates:
+ if template[0].lower() in tmpl:
+ return False
+ return True
+
def readWarnfile(filename, bot):
import warnfile
reader = warnfile.WarnfileReader(filename)
@@ -1926,6 +2241,7 @@
if __name__ == "__main__":
try:
+ site = pywikibot.getSite()
singlePageTitle = []
hints = []
start = None
@@ -2087,10 +2403,8 @@
if not genFactory.handleArg(arg):
singlePageTitle.append(arg)
-
# ensure that we don't try to change main page
try:
- site = pywikibot.getSite()
mainpagename = site.mediawiki_message('mainpage')
globalvar.skip.add(pywikibot.Page(site, mainpagename))
except pywikibot.Error:
@@ -2112,7 +2426,7 @@
ns = 'all'
hintlessPageGen = pagegenerators.NewpagesPageGenerator(newPages, namespace=ns)
- if optRestore or optContinue:
+ elif optRestore or optContinue:
site = pywikibot.getSite()
dumpFileName = pywikibot.config.datafilepath(
'data',
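
The summary handling changed in compareLanguages above can be sketched in isolation. This is a minimal Python 3 sketch, not the interwiki.py code: the function name build_summary and its parameters are hypothetical, and the labels that the script obtains from pywikibot.translate() are passed in as plain strings here. It shows the fragments being joined with a single space and the head being prepended only when there is something to report.

```python
def build_summary(head, labels, adding, removing, modifying):
    """Return (mods, mcomment), mirroring the revised return values.

    head and labels are assumed to be already-translated strings;
    adding, removing and modifying are lists of language codes.
    """
    add, rem, mod = labels
    parts = []
    if adding:
        parts.append(add + ": " + ", ".join(adding))
    if removing:
        parts.append(rem + ": " + ", ".join(removing))
    if modifying:
        parts.append(mod + ": " + ", ".join(modifying))
    # Joining the parts gives no leading separator, like the sep variable
    # in the diff, which stays empty until a fragment has been emitted.
    mods = " ".join(parts)
    mcomment = head + mods if mods else ""
    return mods, mcomment
```

With no changes at all, both returned strings are empty, so a caller can test mcomment directly to decide whether an edit is needed.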
Revision: 8099
Author: xqt
Date: 2010-04-16 06:54:47 +0000 (Fri, 16 Apr 2010)
Log Message:
-----------
crossupdate from trunk/rewrite
Modified Paths:
--------------
branches/rewrite/scripts/editarticle.py
trunk/pywikipedia/editarticle.py
Modified: branches/rewrite/scripts/editarticle.py
===================================================================
--- branches/rewrite/scripts/editarticle.py 2010-04-16 06:29:58 UTC (rev 8098)
+++ branches/rewrite/scripts/editarticle.py 2010-04-16 06:54:47 UTC (rev 8099)
@@ -50,17 +50,18 @@
def command(self, tempFilename, text, jumpIndex = None):
command = config.editor
if jumpIndex:
- # Some editors make it possible to mark occurences of substrings, or
- # to jump to the line of the first occurence.
+ # Some editors make it possible to mark occurrences of substrings,
+ # or to jump to the line of the first occurrence.
# TODO: Find a better solution than hardcoding these, e.g. a config
# option.
line = text[:jumpIndex].count('\n')
column = jumpIndex - (text[:jumpIndex].rfind('\n') + 1)
else:
line = column = 0
- # Linux editors. We use startswith() because some users might use parameters.
+ # Linux editors. We use startswith() because some users might use
+ # parameters.
if config.editor.startswith('kate'):
- command += " -l %i -c %i" % (line, column)
+ command += " -l %i -c %i" % (line + 1, column + 1)
elif config.editor.startswith('gedit'):
command += " +%i" % (line + 1) # seems not to support columns
elif config.editor.startswith('emacs'):
@@ -172,7 +173,9 @@
fp = open(fn, 'w')
fp.write(new)
fp.close()
- pywikibot.output(u"An edit conflict has arisen. Your edit has been saved to %s. Please try again." % fn)
+ pywikibot.output(
+ u"An edit conflict has arisen. Your edit has been saved to %s. Please try again."
+ % fn)
def run(self):
try:
@@ -186,7 +189,8 @@
changes = pywikibot.input(u"What did you change?")
comment = pywikibot.translate(pywikibot.getSite(), msg) % changes
try:
- self.page.put(new, comment = comment, minorEdit = False, watchArticle=self.options.watch)
+ self.page.put(new, comment=comment, minorEdit=False,
+ watchArticle=self.options.watch)
except pywikibot.EditConflict:
self.handle_edit_conflict(new)
else:
Modified: trunk/pywikipedia/editarticle.py
===================================================================
--- trunk/pywikipedia/editarticle.py 2010-04-16 06:29:58 UTC (rev 8098)
+++ trunk/pywikipedia/editarticle.py 2010-04-16 06:54:47 UTC (rev 8099)
@@ -6,7 +6,7 @@
#
# (C) Gerrit Holl 2004
-# (C) Pywikipedia team, 2004-2009
+# (C) Pywikipedia team, 2004-2010
#
__version__ = "$Id$"
#
@@ -50,15 +50,16 @@
def command(self, tempFilename, text, jumpIndex = None):
command = config.editor
if jumpIndex:
- # Some editors make it possible to mark occurences of substrings, or
- # to jump to the line of the first occurence.
+ # Some editors make it possible to mark occurrences of substrings,
+ # or to jump to the line of the first occurrence.
# TODO: Find a better solution than hardcoding these, e.g. a config
# option.
line = text[:jumpIndex].count('\n')
column = jumpIndex - (text[:jumpIndex].rfind('\n') + 1)
else:
line = column = 0
- # Linux editors. We use startswith() because some users might use parameters.
+ # Linux editors. We use startswith() because some users might use
+ # parameters.
if config.editor.startswith('kate'):
command += " -l %i -c %i" % (line + 1, column + 1)
elif config.editor.startswith('gedit'):
@@ -126,27 +127,32 @@
os.unlink(tempFilename)
return self.restoreLinebreaks(newcontent)
else:
- return self.restoreLinebreaks(pywikibot.ui.editText(text, jumpIndex = jumpIndex, highlight = highlight))
+ return self.restoreLinebreaks(
+ pywikibot.ui.editText(text, jumpIndex=jumpIndex,
+ highlight=highlight))
class ArticleEditor:
# join lines if line starts with this ones
joinchars = string.letters + '[]' + string.digits
- def __init__(self):
- self.set_options()
+ def __init__(self, *args):
+ self.set_options(*args)
self.setpage()
self.site = pywikibot.getSite()
- def set_options(self):
+ def set_options(self, *args):
"""Parse commandline and set options attribute"""
my_args = []
- for arg in pywikibot.handleArgs():
+ for arg in pywikibot.handleArgs(*args):
my_args.append(arg)
parser = optparse.OptionParser()
- parser.add_option("-r", "--edit_redirect", action="store_true", default=False, help="Ignore/edit redirects")
+ parser.add_option("-r", "--edit_redirect", action="store_true",
+ default=False, help="Ignore/edit redirects")
parser.add_option("-p", "--page", help="Page to edit")
- parser.add_option("-w", "--watch", action="store_true", default=False, help="Watch article after edit")
- #parser.add_option("-n", "--new_data", default="", help="Automatically generated content")
+ parser.add_option("-w", "--watch", action="store_true", default=False,
+ help="Watch article after edit")
+ #parser.add_option("-n", "--new_data", default="",
+ # help="Automatically generated content")
(self.options, args) = parser.parse_args(args=my_args)
# for convenience, if we have an arg, stuff it into the opt, so we
@@ -167,7 +173,9 @@
fp = open(fn, 'w')
fp.write(new)
fp.close()
- pywikibot.output(u"An edit conflict has arisen. Your edit has been saved to %s. Please try again." % fn)
+ pywikibot.output(
+ u"An edit conflict has arisen. Your edit has been saved to %s. Please try again."
+ % fn)
def run(self):
try:
@@ -181,14 +189,15 @@
changes = pywikibot.input(u"What did you change?")
comment = pywikibot.translate(pywikibot.getSite(), msg) % changes
try:
- self.page.put(new, comment = comment, minorEdit = False, watchArticle=self.options.watch)
+ self.page.put(new, comment=comment, minorEdit=False,
+ watchArticle=self.options.watch)
except pywikibot.EditConflict:
self.handle_edit_conflict(new)
else:
pywikibot.output(u"Nothing changed")
-def main():
- app = ArticleEditor()
+def main(*args):
+ app = ArticleEditor(*args)
app.run()
if __name__ == "__main__":
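
The line and column arithmetic in command() can be checked with a small standalone sketch. The function name offset_to_position is hypothetical; the arithmetic is the one shown in the diff. The change to "line + 1, column + 1" for kate suggests that this editor is assumed to count from 1, whereas the values computed from jumpIndex count from 0.

```python
def offset_to_position(text, jump_index):
    """Convert a 0-based character offset into a 1-based (line, column)."""
    before = text[:jump_index]
    # Number of newlines before the offset is the 0-based line number.
    line = before.count("\n")
    # rfind() returns -1 when there is no newline, so the first line
    # starts at index 0 and the column is simply the offset itself.
    column = jump_index - (before.rfind("\n") + 1)
    return line + 1, column + 1
```

For the text "abc\ndef\nghi", offset 5 points at the "e" and maps to line 2, column 2.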
Revision: 8095
Author: russblau
Date: 2010-04-15 18:16:10 +0000 (Thu, 15 Apr 2010)
Log Message:
-----------
Revert to a single background thread for asynchronous saves, instead of a thread per request; this should mean less overhead and better performance.
Modified Paths:
--------------
branches/rewrite/pywikibot/__init__.py
branches/rewrite/pywikibot/config2.py
branches/rewrite/pywikibot/page.py
Modified: branches/rewrite/pywikibot/__init__.py
===================================================================
--- branches/rewrite/pywikibot/__init__.py 2010-04-15 17:04:13 UTC (rev 8094)
+++ branches/rewrite/pywikibot/__init__.py 2010-04-15 18:16:10 UTC (rev 8095)
@@ -14,6 +14,8 @@
import logging
import re
import sys
+import threading
+from Queue import Queue
import config2 as config
from bot import *
@@ -248,7 +250,6 @@
# Throttle and thread handling
-threadpool = [] # add page-putting threads to this list as they are created
stopped = False
def stopme():
@@ -263,20 +264,70 @@
if not stopped:
pywikibot.debug(u"stopme() called", _logger)
- count = sum(1 for thd in threadpool if thd.isAlive())
- if count:
- pywikibot.output(u"Waiting for about %(count)s pages to be saved."
- % locals())
- for thd in threadpool:
- if thd.isAlive():
- thd.join()
+ def remaining():
+ import datetime
+ remainingPages = page_put_queue.qsize() - 1
+ # -1 because we added a None element to stop the queue
+ remainingSeconds = datetime.timedelta(
+ seconds=(remainingPages * config.put_throttle))
+ return (remainingPages, remainingSeconds)
+
+ page_put_queue.put((None, [], {}))
stopped = True
+
+ if page_put_queue.qsize() > 1:
+ output(u'Waiting for %i pages to be put. Estimated time remaining: %s'
+ % remaining())
+
+ while(_putthread.isAlive()):
+ try:
+ _putthread.join(1)
+ except KeyboardInterrupt:
+ answer = inputChoice(u"""\
+There are %i pages remaining in the queue. Estimated time remaining: %s
+Really exit?"""
+ % remaining(),
+ ['yes', 'no'], ['y', 'N'], 'N')
+ if answer == 'y':
+ return
+
# only need one drop() call because all throttles use the same global pid
try:
- _sites[_sites.keys()[0]].throttle.drop()
+ _sites.values()[0].throttle.drop()
pywikibot.log(u"Dropped throttle(s).")
except IndexError:
pass
import atexit
atexit.register(stopme)
+
+# Create a separate thread for asynchronous page saves (and other requests)
+
+def async_manager():
+ """Daemon; take requests from the queue and execute them in background."""
+ while True:
+ (request, args, kwargs) = page_put_queue.get()
+ if request is None:
+ break
+ request(*args, **kwargs)
+
+def async_request(request, *args, **kwargs):
+ """Put a request on the queue, and start the daemon if necessary."""
+ if not _putthread.isAlive():
+ try:
+ page_put_queue.mutex.acquire()
+ try:
+ _putthread.start()
+ except (AssertionError, RuntimeError):
+ pass
+ finally:
+ page_put_queue.mutex.release()
+ page_put_queue.put((request, args, kwargs))
+
+# queue to hold pending requests
+page_put_queue = Queue(config.max_queue_size)
+# set up the background thread
+_putthread = threading.Thread(target=async_manager)
+# identification for debugging purposes
+_putthread.setName('Put-Thread')
+_putthread.setDaemon(True)
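
The single-worker pattern introduced here can be sketched on its own. This is a minimal Python 3 sketch, not the pywikibot code: the class AsyncSaver and its methods submit and stop are hypothetical names, and the standard queue module stands in for the Python 2 Queue module imported in the diff. It shows one daemon thread draining a bounded queue until it meets a sentinel whose request is None.

```python
import queue
import threading


class AsyncSaver:
    """Hypothetical helper: run queued callables on one background thread."""

    def __init__(self, max_queue_size=64):
        # A bounded queue makes submit() block once the backlog is full.
        self._queue = queue.Queue(max_queue_size)
        self._thread = threading.Thread(
            target=self._worker, name="Put-Thread", daemon=True)
        self._lock = threading.Lock()
        self._started = False

    def _ensure_started(self):
        # Start the worker lazily and only once, even with several callers.
        with self._lock:
            if not self._started:
                self._thread.start()
                self._started = True

    def _worker(self):
        while True:
            request, args, kwargs = self._queue.get()
            if request is None:  # sentinel: stop the worker
                break
            request(*args, **kwargs)

    def submit(self, request, *args, **kwargs):
        self._ensure_started()
        self._queue.put((request, args, kwargs))

    def stop(self):
        # Start first so a full queue cannot block the sentinel forever.
        self._ensure_started()
        self._queue.put((None, (), {}))
        self._thread.join()
```

Because there is only one worker and the queue is first-in, first-out, requests run in the order they were submitted, which a thread-per-request design does not guarantee. The sketch does not catch exceptions raised by a request; an uncaught one would end the worker thread.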
Modified: branches/rewrite/pywikibot/config2.py
===================================================================
--- branches/rewrite/pywikibot/config2.py 2010-04-15 17:04:13 UTC (rev 8094)
+++ branches/rewrite/pywikibot/config2.py 2010-04-15 18:16:10 UTC (rev 8095)
@@ -490,6 +490,12 @@
# Configuration variable 'socks' is defined but unknown. Misspelled?proxy = None
proxy = None
+# How many pages should be put to a queue in asynchronous mode.
+# If max_queue_size is <= 0, the queue size is infinite.
+# Increasing this value will increase memory usage but could speed up
+# processing. The higher this value, the smaller the additional gain.
+max_queue_size = 64
+
# End of configuration section
# ============================
# System-level and User-level changes.
Modified: branches/rewrite/pywikibot/page.py
===================================================================
--- branches/rewrite/pywikibot/page.py 2010-04-15 17:04:13 UTC (rev 8094)
+++ branches/rewrite/pywikibot/page.py 2010-04-15 18:16:10 UTC (rev 8095)
@@ -718,16 +718,12 @@
"Page %s not saved; editing restricted by {{bots}} template"
% self.title(asLink=True))
if async:
- thd = threading.Thread(
- target=self._save,
- args=(comment, minor, watch, unwatch, callback)
- )
- pywikibot.threadpool.append(thd)
- thd.start()
+ pywikibot.async_request(self._save, comment, minor, watch, unwatch,
+ async, callback)
else:
- self._save(comment, minor, watch, unwatch, callback)
+ self._save(comment, minor, watch, unwatch, async, callback)
- def _save(self, comment, minor, watch, unwatch, callback):
+ def _save(self, comment, minor, watch, unwatch, async, callback):
err = None
link = self.title(asLink=True)
try:
@@ -741,13 +737,14 @@
except pywikibot.LockedPage, err:
# re-raise the LockedPage exception so that calling program
# can re-try if appropriate
- if not callback:
+ if not callback and not async:
raise
# TODO: other "expected" error types to catch?
except pywikibot.Error, err:
- pywikibot.log(u"Error saving page %s\n" % link, exc_info=True)
- if not callback:
- raise pywikibot.PageNotSaved(link)
+ pywikibot.log(u"Error saving page %s (%s)\n" % (link, err),
+ exc_info=True)
+ if not callback and not async:
+ raise pywikibot.PageNotSaved("%s: %s" %(link, err))
if callback:
callback(self, err)
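
The error-handling rule added to _save() can be illustrated with a standalone sketch. The names save_page, do_put and run_async are hypothetical, and PageNotSaved is redefined locally so the example runs without pywikibot. The parameter is called run_async because async is a reserved word from Python 3.7 on. The point is that an exception is raised only for a synchronous call without a callback; otherwise the error is handed to the callback, or returned, since a background thread has no caller to catch it.

```python
class PageNotSaved(Exception):
    """Local stand-in for pywikibot.PageNotSaved."""


def save_page(do_put, run_async=False, callback=None):
    """Call do_put() and report failure by exception or by callback."""
    err = None
    try:
        do_put()
    except Exception as exc:
        err = exc
        # Raise only when somebody is waiting for the result and has not
        # asked to be notified through a callback instead.
        if not callback and not run_async:
            raise PageNotSaved(str(exc)) from exc
    if callback:
        callback(err)
    return err
```

A caller that saves asynchronously and wants to know about failures therefore has to pass a callback; without one, the sketch (like the diff) only reports the error through its return value or the log.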
Revision: 8094
Author: xqt
Date: 2010-04-15 17:04:13 +0000 (Thu, 15 Apr 2010)
Log Message:
-----------
update basic.py, blockpageschecker.py from trunk/rewrite
Modified Paths:
--------------
branches/rewrite/scripts/basic.py
branches/rewrite/scripts/blockpageschecker.py
trunk/pywikipedia/blockpageschecker.py
Modified: branches/rewrite/scripts/basic.py
===================================================================
--- branches/rewrite/scripts/basic.py 2010-04-15 14:38:41 UTC (rev 8093)
+++ branches/rewrite/scripts/basic.py 2010-04-15 17:04:13 UTC (rev 8094)
@@ -38,6 +38,7 @@
'ksh': u'Bot: Ännern ...',
'nds': u'Bot: Änderung ...',
'nl': u'Bot: wijziging ...',
+ 'pl': u'Bot: zmienia ...',
'pt': u'Bot: alterando...',
'sv': u'Bot: Ändrar ...',
'zh': u'機器人:編輯.....',
@@ -55,6 +56,7 @@
"""
self.generator = generator
self.dry = dry
+ # Set the edit summary message
self.summary = pywikibot.translate(pywikibot.getSite(), self.msg)
def run(self):
Modified: branches/rewrite/scripts/blockpageschecker.py
===================================================================
--- branches/rewrite/scripts/blockpageschecker.py 2010-04-15 14:38:41 UTC (rev 8093)
+++ branches/rewrite/scripts/blockpageschecker.py 2010-04-15 17:04:13 UTC (rev 8094)
@@ -1,10 +1,10 @@
# -*- coding: utf-8 -*-
"""
This is a script originally written by Wikihermit and then rewritten by Filnik,
-to delete the templates used to warn in the pages that a page is blocked,
-when the page isn't blocked at all. Indeed, very often sysops block the pages
-for a setted time but then the forget to delete the warning! This script is useful
-if you want to delete those useless warning left in these pages.
+to delete the templates used to warn in the pages that a page is blocked, when
+the page isn't blocked at all. Indeed, very often sysops block the pages for a
+set time but then they forget to delete the warning! This script is useful if
+you want to delete those useless warnings left in these pages.
Parameters:
@@ -20,21 +20,25 @@
Argument can also be given as "-page:pagetitle". You can
give this parameter multiple times to edit multiple pages.
--protectedpages: Check all the blocked pages (useful when you have not categories
- or when you have problems with them. (add the namespace after ":" where
- you want to check - default checks all protected pages)
+-protectedpages: Check all the blocked pages; useful when you have no
+ categories or when you have problems with them. (add the
+ namespace after ":" where you want to check - default checks
+ all protected pages.)
-moveprotected: Same as -protectedpages, for moveprotected pages
Furthermore, the following command line parameters are supported:
--always Doesn't ask every time if the bot should make the change or not, do it always.
+-always Doesn't ask every time if the bot should make the change or not,
+ do it always.
--debug When the bot can't delete the template from the page (wrong regex or something like that)
- it will ask you if it should open the page on your browser.
- (attention: pages included may give false positives..)
+-show When the bot can't delete the template from the page (wrong
+ regex or something like that) it will ask you if it should show
+ the page in your browser.
+ (attention: pages included may give false positives!)
--move The bot will check if the page is blocked also for the move option, not only for edit
+-move The bot will also check whether the page is move-protected, not
+ only edit-protected
--- Warning! ---
You have to edit this script in order to add your preferences
@@ -49,13 +53,14 @@
python blockpageschecker.py -cat:Geography -always
-python blockpageschecker.py -debug -protectedpages:4
+python blockpageschecker.py -show -protectedpages:4
"""
#
# (C) Monobi a.k.a. Wikihermit, 2007
-# (C) Filnik, 2007-2008-2009
-# (C) NicDumZ, 2008
+# (C) Filnik, 2007-2009
+# (C) NicDumZ, 2008-2009
+# (C) Pywikipedia bot team, 2007-2010
#
# Distributed under the terms of the MIT license.
#
@@ -83,7 +88,7 @@
r'\{\{(?:[Tt]emplate:|)[Aa]bp(?:|[ _]scad\|(?:.*?))\}\}'],
'fr': [ur'\{\{(?:[Tt]emplate:|[Mm]odèle:|)[Ss]emi[- ]?protection(|[^\}]*)\}\}'],
'ja':[ur'(?<!\<nowiki\>)\{\{(?:[Tt]emplate:|)半保護(?:[Ss]|)(?:\|.+|)\}\}(?!\<\/nowiki\>)\s*(?:\r\n|)*'],
- 'zh':[ur'\{\{(?:[Tt]emplate:|)Protected|(?:[Ss]|[Ss]emi|半)(?:\|.+|)\}\}(\n+?|)',ur'\{\{(?:[Tt]emplate:|)Mini-protected|(?:[Ss]|[Ss]emi|半)(?:\|.+|)\}\}(\n+?|)',ur'\{\{(?:[Tt]emplate:|)Protected-logo|(?:[Ss]|[Ss]emi|半)(?:\|.+|)\}\}(\n+?|)'],
+ #'zh':[ur'\{\{(?:[Tt]emplate:|)Protected|(?:[Ss]|[Ss]emi|半)(?:\|.+|)\}\}(\n+?|)',ur'\{\{(?:[Tt]emplate:|)Mini-protected|(?:[Ss]|[Ss]emi|半)(?:\|.+|)\}\}(\n+?|)',ur'\{\{(?:[Tt]emplate:|)Protected-logo|(?:[Ss]|[Ss]emi|半)(?:\|.+|)\}\}(\n+?|)'],
}
# Regex to get the total-protection template
templateTotalProtection = {
@@ -93,21 +98,21 @@
'fr':[ur'\{\{(?:[Tt]emplate:|[Mm]odèle:|)[Pp]rotection(|[^\}]*)\}\}',
ur'\{\{(?:[Tt]emplate:|[Mm]odèle:|)(?:[Pp]age|[Aa]rchive|[Mm]odèle) protégée?(|[^\}]*)\}\}'],
'ja':[ur'(?<!\<nowiki\>)\{\{(?:[Tt]emplate:|)保護(?:性急|)(?:[Ss]|)(?:\|.+|)\}\}(?!\<\/nowiki\>)\s*(?:\r\n|)*'],
- 'zh':[r'\{\{(?:[Tt]emplate:|)Protected|(?:[Nn]|[Nn]ormal)(?:\|.+|)\}\}(\n+?|)',r'\{\{(?:[Tt]emplate:|)Mini-protected|(?:[Nn]|[Nn]ormal)(?:\|.+|)\}\}(\n+?|)',r'\{\{(?:[Tt]emplate:|)Protected-logo|(?:[Nn]|[Nn]ormal)(?:\|.+|)\}\}(\n+?|)'],
+ #'zh':[r'\{\{(?:[Tt]emplate:|)Protected|(?:[Nn]|[Nn]ormal)(?:\|.+|)\}\}(\n+?|)',r'\{\{(?:[Tt]emplate:|)Mini-protected|(?:[Nn]|[Nn]ormal)(?:\|.+|)\}\}(\n+?|)',r'\{\{(?:[Tt]emplate:|)Protected-logo|(?:[Nn]|[Nn]ormal)(?:\|.+|)\}\}(\n+?|)'],
}
# Regex to get the semi-protection move template
templateSemiMoveProtection = {
'en': None,
'it':[r'\{\{(?:[Tt]emplate:|)[Aa]vvisobloccospostamento(?:|[ _]scad\|.*?|\|.*?)\}\}'],
'ja':[ur'(?<!\<nowiki\>)\{\{(?:[Tt]emplate:|)移動半保護(?:[Ss]|)(?:\|.+|)\}\}(?!\<\/nowiki\>)\s*(?:\r\n|)*'],
- 'zh':[r'\{\{(?:[Tt]emplate:|)Protected|(?:MS|ms)(?:\|.+|)\}\}(\n+?|)',r'\{\{(?:[Tt]emplate:|)Mini-protected|(?:MS|ms)(?:\|.+|)\}\}(\n+?|)',r'\{\{(?:[Tt]emplate:|)Protected-logo|(?:MS|ms)(?:\|.+|)\}\}(\n+?|)'],
+ #'zh':[r'\{\{(?:[Tt]emplate:|)Protected|(?:MS|ms)(?:\|.+|)\}\}(\n+?|)',r'\{\{(?:[Tt]emplate:|)Mini-protected|(?:MS|ms)(?:\|.+|)\}\}(\n+?|)',r'\{\{(?:[Tt]emplate:|)Protected-logo|(?:MS|ms)(?:\|.+|)\}\}(\n+?|)'],
}
# Regex to get the total-protection move template
templateTotalMoveProtection = {
'en': None,
'it':[r'\{\{(?:[Tt]emplate:|)[Aa]vvisobloccospostamento(?:|[ _]scad\|.*?|\|.*?)\}\}'],
'ja':[ur'(?<!\<nowiki\>)\{\{(?:[Tt]emplate:|)移動保護(?:[Ss]|)(?:\|.+|)\}\}(?!\<\/nowiki\>)\s*(?:\r\n|)*'],
- 'zh':[ur'\{\{(?:[Tt]emplate:|)Protected|(?:[Mm]|[Mm]ove|移[動动])(?:\|.+|)\}\}(\n+?|)',ur'\{\{(?:[Tt]emplate:|)Mini-protected|(?:[Mm]|[Mm]ove|移[動动])(?:\|.+|)\}\}(\n+?|)',ur'\{\{(?:[Tt]emplate:|)Protected-logo|(?:[Mm]|[Mm]ove|移[動动])(?:\|.+|)\}\}(\n+?|)'],
+ #'zh':[ur'\{\{(?:[Tt]emplate:|)Protected|(?:[Mm]|[Mm]ove|移[動动])(?:\|.+|)\}\}(\n+?|)',ur'\{\{(?:[Tt]emplate:|)Mini-protected|(?:[Mm]|[Mm]ove|移[動动])(?:\|.+|)\}\}(\n+?|)',ur'\{\{(?:[Tt]emplate:|)Protected-logo|(?:[Mm]|[Mm]ove|移[動动])(?:\|.+|)\}\}(\n+?|)'],
}
# If you use only one template for all the type of protection, put it here.
@@ -123,7 +128,7 @@
'it':['{{Avvisobloccoparziale}}', '{{Avvisoblocco}}', None, None, '{{Protetta}}'],
'fr':['{{Semi-protection}}', '{{Protection}}', None, None, None],
'ja':[u'{{半保護}}', u'{{保護}}', u'{{移動半保護}}', u'{{移動保護}}', None],
- 'zh':[u'{{Protected/semi}}',u'{{Protected}}',u'{{Protected/ms}}',u'{{Protected/move}}', None],
+ #'zh':[u'{{Protected/semi}}',u'{{Protected}}',u'{{Protected/ms}}',u'{{Protected/move}}', None],
}
# Category where the bot will check
@@ -185,47 +190,36 @@
return ('autoconfirmed-move', catchRegex)
return ('editable', r'\A\n') # If editable means that we have no regex, won't change anything with this regex
-def debugQuest(site, page):
- quest = pywikibot.input(u'Do you want to open the page on your [b]rowser, [g]ui or [n]othing?')
+def showQuest(site, page):
+ quest = pywikibot.inputChoice(u'Do you want to open the page?',
+ ['with browser', 'with gui', 'no'],
+ ['b','g','n'], 'n')
pathWiki = site.family.nicepath(site.lang)
url = 'http://%s%s%s?&redirect=no' % (pywikibot.getSite().hostname(), pathWiki, page.urlname())
- while 1:
- if quest.lower() in ['b', 'B']:
- webbrowser.open(url)
- break
- elif quest.lower() in ['g', 'G']:
- import editarticle
- editor = editarticle.TextEditor()
- text = editor.edit(page.get())
- break
- elif quest.lower() in ['n', 'N']:
- break
- else:
- pywikibot.output(u'wrong entry, type "b", "g" or "n"')
- continue
+ if quest == 'b':
+ webbrowser.open(url)
+ elif quest == 'g':
+ import editarticle
+ editor = editarticle.TextEditor()
+ text = editor.edit(page.get())
def main():
""" Main Function """
# Loading the comments
- global categoryToCheck; global comment; global project_inserted
- if config.mylang not in project_inserted:
- pywikibot.output(u"Your project is not supported by this script. You have to edit the script and add it!")
- return
+ global categoryToCheck, comment, project_inserted
# always, define a generator to understand if the user sets one, defining what's genFactory
- always = False; generator = False; debug = False
+ always = False; generator = False; show = False
moveBlockCheck = False; genFactory = pagegenerators.GeneratorFactory()
# To prevent Infinite loops
errorCount = 0
- # Load the right site
- site = pywikibot.getSite()
# Loading the default options.
for arg in pywikibot.handleArgs():
if arg == '-always':
always = True
elif arg == '-move':
moveBlockCheck = True
- elif arg == '-debug':
- debug = True
+ elif arg == '-show':
+ show = True
elif arg.startswith('-protectedpages'):
if len(arg) == 15:
generator = site.protectedpages(namespace = 0)
@@ -245,6 +239,13 @@
else:
genFactory.handleArg(arg)
+ if config.mylang not in project_inserted:
+ pywikibot.output(u"Your project is not supported by this script.\nYou have to edit the script and add it!")
+ return
+
+ # Load the right site
+ site = pywikibot.getSite()
+
# Take the right templates to use, the category and the comment
TSP = pywikibot.translate(site, templateSemiProtection)
TTP = pywikibot.translate(site, templateTotalProtection)
@@ -281,8 +282,8 @@
continue
except pywikibot.IsRedirectPage:
pywikibot.output("%s is a redirect! Skipping..." % pagename)
- if debug:
- debugQuest(site, page)
+ if show:
+ showQuest(site, page)
continue
"""
# This check does not work :
Modified: trunk/pywikipedia/blockpageschecker.py
===================================================================
--- trunk/pywikipedia/blockpageschecker.py 2010-04-15 14:38:41 UTC (rev 8093)
+++ trunk/pywikipedia/blockpageschecker.py 2010-04-15 17:04:13 UTC (rev 8094)
@@ -190,7 +190,9 @@
return ('editable', r'\A\n') # If editable means that we have no regex, won't change anything with this regex
def showQuest(site, page):
- quest = pywikibot.inputChoice(u'Do you want to open the page?',['with browser', 'with gui', 'no'], ['b','g','n'], 'n')
+ quest = pywikibot.inputChoice(u'Do you want to open the page?',
+ ['with browser', 'with gui', 'no'],
+ ['b','g','n'], 'n')
pathWiki = site.family.nicepath(site.lang)
url = 'http://%s%s%s?&redirect=no' % (pywikibot.getSite().hostname(), pathWiki, page.urlname())
if quest == 'b':
@@ -290,16 +292,16 @@
pywikibot.output("%s is sysop-protected : this account can't edit it! Skipping..." % pagename)
continue
"""
- try:
+ if restrictions.has_key('edit'):
editRestr = restrictions['edit']
- if editRestr and editRestr[0] == 'sysop':
- try:
- config.sysopnames[site.family.name][site.lang]
- except:
- pywikibot.output("%s is sysop-protected : this account can't edit it! Skipping..." % pagename)
- continue
- except KeyError:
- continue
+ else:
+ editRestr = None
+ if editRestr and editRestr[0] == 'sysop':
+ try:
+ config.sysopnames[site.family.name][site.lang]
+ except:
+ pywikibot.output("%s is sysop-protected : this account can't edit it! Skipping..." % pagename)
+ continue
# Understand, according to the template in the page, what should be the protection
# and compare it with what there really is.
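
The restriction lookup rewritten in this hunk can also be expressed with dict.get(), which avoids has_key(), a method that no longer exists in Python 3. This is a minimal sketch: the function name is_sysop_protected is hypothetical, and the shape of the restrictions mapping (an 'edit' key holding a sequence whose first item is the protection level) is assumed from the code above.

```python
def is_sysop_protected(restrictions):
    """Return True if the 'edit' restriction is set to the sysop level."""
    # get() returns None for a missing key, matching the else branch
    # that sets editRestr = None in the diff.
    edit_restr = restrictions.get("edit")
    return bool(edit_restr) and edit_restr[0] == "sysop"
```

A page without any edit restriction is then handled by the same code path as an unprotected one, instead of being skipped by the old "except KeyError: continue".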
Revision: 8091
Author: malafaya
Date: 2010-04-14 14:14:12 +0000 (Wed, 14 Apr 2010)
Log Message:
-----------
* Started transliteration for Korean (better than nothing :))
Modified Paths:
--------------
trunk/pywikipedia/userinterfaces/transliteration.py
Modified: trunk/pywikipedia/userinterfaces/transliteration.py
===================================================================
--- trunk/pywikipedia/userinterfaces/transliteration.py 2010-04-14 13:18:54 UTC (rev 8090)
+++ trunk/pywikipedia/userinterfaces/transliteration.py 2010-04-14 14:14:12 UTC (rev 8091)
@@ -1343,8 +1343,70 @@
self.trans[char] = u"."
for char in u"ๆ":
self.trans[char] = u"(2)"
+
+ # Korean (Revised Romanization system as far as possible, incomplete)
+ for char in u"국":
+ self.trans[char] = u"guk"
+ for char in u"명":
+ self.trans[char] = u"myeong"
+ for char in u"검":
+ self.trans[char] = u"geom"
+ for char in u"타":
+ self.trans[char] = u"ta"
+ for char in u"분":
+ self.trans[char] = u"bun"
+ for char in u"사":
+ self.trans[char] = u"sa"
+ for char in u"류":
+ self.trans[char] = u"ryu"
+ for char in u"포":
+ self.trans[char] = u"po"
+ for char in u"르":
+ self.trans[char] = u"reu"
+ for char in u"투":
+ self.trans[char] = u"tu"
+ for char in u"갈":
+ self.trans[char] = u"gal"
+ for char in u"어":
+ self.trans[char] = u"eo"
+ for char in u"노":
+ self.trans[char] = u"no"
+ for char in u"웨":
+ self.trans[char] = u"we"
+ for char in u"이":
+ self.trans[char] = u"i"
+ for char in u"라":
+ self.trans[char] = u"ra"
+ for char in u"틴":
+ self.trans[char] = u"tin"
+ for char in u"루":
+ self.trans[char] = u"ru"
+ for char in u"마":
+ self.trans[char] = u"ma"
+ for char in u"니":
+ self.trans[char] = u"ni"
+ for char in u"아":
+ self.trans[char] = u"a"
+ for char in u"독":
+ self.trans[char] = u"dok"
+ for char in u"일":
+ self.trans[char] = u"il"
+ for char in u"모":
+ self.trans[char] = u"mo"
+ for char in u"크":
+ self.trans[char] = u"keu"
+ for char in u"샤":
+ self.trans[char] = u"sya"
+ for char in u"영":
+ self.trans[char] = u"yeong"
+ for char in u"불":
+ self.trans[char] = u"bul"
+ for char in u"가":
+ self.trans[char] = u"ga"
+ for char in u"리":
+ self.trans[char] = u"ri"
+
-
def transliterate(self, char, default="?", prev="-", next="-"):
if char in self.trans:
return self.trans[char]
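
The per-character loops added above amount to filling a lookup table. A minimal Python 3 sketch of the same idea, using a dict literal and only four of the syllables from the diff, could look like this; the names KOREAN_SYLLABLES and transliterate_text are hypothetical and are not part of transliteration.py.

```python
# A few syllables copied from the diff; the real table is larger.
KOREAN_SYLLABLES = {
    "국": "guk",
    "명": "myeong",
    "가": "ga",
    "리": "ri",
}


def transliterate_text(text, table, default="?"):
    """Replace every character found in table; use default otherwise."""
    return "".join(table.get(char, default) for char in text)
```

Characters missing from the table fall back to the default, in the same way as the default="?" argument of transliterate() above.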