Bugs item #1771889, was opened at 2007-08-10 17:20
Message generated for change (Comment added) made by falk_steinhauer
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1771889&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Falk Steinhauer (falk_steinhauer)
Assigned to: Nobody/Anonymous (nobody)
Summary: Problems with namespaces in wikipedia.py
Initial Comment:
I am using snapshot 2007-06-19:
In our wiki we use title prefixes for articles that are not in German: Fr: (French) and En: (English).
One of our French articles marks the end of a subpage of [[Special:All Pages]] (see here: http://www.wiki-aventurica.de/index.php?title=Spezial:Alle_Seiten)
If I use the command-line option -start:!, the script runs into an endless loop: after Fr:xxxx is yielded, the script tries to continue with article xxxx, which in my case sorts alphabetically before Fr:xxxx, so the generator keeps revisiting the same range. Conversely, if xxxx sorts after Fr:xxxx, some articles may be skipped.
I found the responsible line of code:
wikipedia.py line 3504
# save the last hit, so that we know where to continue when we
# finished all articles on the current page. Append a '!' so that
# we don't yield a page twice.
start = Page(self,hit).titleWithoutNamespace() + '!'
Maybe this can also be fixed in titleWithoutNamespace()
Is it necessary to cut off the namespace?
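The effect can be reproduced with a minimal sketch (invented titles and a pretend server call, not the real wikipedia.py generator): stripping the namespace-like prefix from the last yielded title moves the continuation point alphabetically backwards, so the same pages are fetched again and again.

```python
# Minimal sketch of the continuation bug (invented titles, not the real
# wikipedia.py code). The pretend server returns pages in alphabetical
# order, starting at `start`.
def fetch_batch(all_titles, start, batch_size=2):
    """Pretend server call: titles >= start, in sorted order."""
    return sorted(t for t in all_titles if t >= start)[:batch_size]

def crawl(all_titles, strip_namespace):
    seen, start = [], '!'
    for _ in range(6):  # bound the iterations; the real bug loops forever
        batch = fetch_batch(all_titles, start)
        if not batch:
            break
        seen.extend(batch)
        last = batch[-1]
        if strip_namespace:
            # mimics titleWithoutNamespace(): 'Fr:Article' -> 'Article',
            # which sorts *before* 'Fr:Article', so the crawl goes backwards
            last = last.split(':', 1)[-1]
        start = last + '!'
    return seen

titles = ['Article', 'En:Article', 'Fr:Article', 'Zebra']
# Keeping the full title visits every page exactly once; stripping the
# prefix revisits 'En:Article' and 'Fr:Article' over and over.
```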
----------------------------------------------------------------------
>Comment By: Falk Steinhauer (falk_steinhauer)
Date: 2007-08-13 22:54
Message:
Logged In: YES
user_id=1810075
Originator: YES
I don't have these problems with the current release; that's why I went back to it.
We worked around the initial problem within our wiki.
----------------------------------------------------------------------
Comment By: Daniel Herding (wikipedian)
Date: 2007-08-13 10:20
Message:
Logged In: YES
user_id=880694
Originator: NO
The timeouts are a way to reduce database server load during peak times.
See: http://www.mediawiki.org/wiki/Manual:Maxlag_parameter
Maybe your server is generally a bit slow, so try increasing the maxlag
parameter in your user-config.py, for example:
maxlag = 10
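For context, maxlag works by the client sending a maxlag=N parameter with its requests; when the database replication lag exceeds N seconds, the server answers with an error and the client is expected to wait and retry. A minimal sketch of that retry loop (the `fetch` callable and its 'maxlag' return value are hypothetical stand-ins, not the pywikipedia implementation):

```python
import time

def fetch_with_maxlag(fetch, max_retries=5, wait=5):
    """Retry `fetch` while the server reports a maxlag error.

    `fetch` is a hypothetical callable returning either the page data or
    the string 'maxlag' when the server is too lagged (the real MediaWiki
    API signals this via an error response instead).
    """
    for attempt in range(max_retries):
        result = fetch()
        if result != 'maxlag':
            return result
        time.sleep(wait)  # back off before asking the server again
    raise RuntimeError('server still lagged after %d retries' % max_retries)
```

With a larger maxlag value, the bot tolerates more replication lag before backing off, which helps on servers that are generally slow rather than overloaded.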
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2007-08-11 11:48
Message:
Logged In: NO
I went back to snapshot 2007-06-19 because of several problems with
nightly build 2007-08-10 08:39:28.
With that version, my scripts could not change pages with
wikipedia.Page.put(): a server timeout was reported frequently, even though
the server was not down.
----------------------------------------------------------------------
Comment By: Falk Steinhauer (falk_steinhauer)
Date: 2007-08-10 23:34
Message:
Logged In: YES
user_id=1810075
Originator: YES
Something is still wrong: our language prefixes are still cut off,
so such pages cannot be found in namespace 0.
----------------------------------------------------------------------
Comment By: Falk Steinhauer (falk_steinhauer)
Date: 2007-08-10 23:26
Message:
Logged In: YES
user_id=1810075
Originator: YES
Thanks, now it works.
One strange thing: no redirects are yielded, even though the
includeredirects parameter of AllpagesPageGenerator() defaults to True.
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2007-08-10 18:55
Message:
Logged In: YES
user_id=687283
Originator: NO
Strange, as these prefixes should not be interpreted as namespaces. For
now, please update to SVN or the latest nightly (
http://tools.wikimedia.de/~valhallasw/pywiki/ ), and test if the issue
still exists.
----------------------------------------------------------------------
Revision: 4038
Author: wikipedian
Date: 2007-08-13 19:50:49 +0000 (Mon, 13 Aug 2007)
Log Message:
-----------
docu
Modified Paths:
--------------
trunk/pywikipedia/interwiki.py
Modified: trunk/pywikipedia/interwiki.py
===================================================================
--- trunk/pywikipedia/interwiki.py 2007-08-13 19:47:50 UTC (rev 4037)
+++ trunk/pywikipedia/interwiki.py 2007-08-13 19:50:49 UTC (rev 4038)
@@ -1076,6 +1076,7 @@
reporting of missing backlinks for pages we already fixed
"""
+ # use sets because searching an element is faster than in lists
expectedPages = set(new.values())
expectedSites = set([page.site() for page in expectedPages])
try:
@@ -1086,6 +1087,8 @@
except wikipedia.NoPage:
wikipedia.output(u"WARNING: Page %s does no longer exist?!" % page.title())
break
+ # To speed things up, create a dictionary which maps sites to pages.
+ # This assumes that there is only one interwiki link per language.
linkedPagesDict = {}
for linkedPage in linkedPages:
linkedPagesDict[linkedPage.site()] = linkedPage
Revision: 4037
Author: wikipedian
Date: 2007-08-13 19:47:50 +0000 (Mon, 13 Aug 2007)
Log Message:
-----------
Sped up backlinks report generation.
By making use of dictionaries and sets, decreased complexity from O(n^3)
to O(n^2).
For example, the backlinks report for
python interwiki.py -lang:de Indien -localonly
is now generated in 26 seconds, instead of the 190 seconds that were
needed before.
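The speedup in this commit comes from replacing linear scans of lists with constant-time set and dict lookups. A minimal illustration of the pattern, using made-up page names rather than the actual interwiki.py data structures:

```python
# Made-up interwiki page names; the point is the data-structure choice.
linked = ['de:Indien', 'fr:Inde', 'en:India']
expected = ['de:Indien', 'en:India', 'nl:India']

# Before: each membership test scans the list, O(len(linked)) per test,
# so the nested loops over pages multiply up to cubic complexity.
missing_slow = [p for p in expected if p not in linked]

# After: build a set once; each membership test is then O(1) on average.
linked_set = set(linked)
missing_fast = [p for p in expected if p not in linked_set]

# The commit also builds a dict mapping each site to its page, so that
# "which page is linked on this site?" is a single lookup, not a scan.
linked_by_site = {p.split(':', 1)[0]: p for p in linked}
```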
Modified Paths:
--------------
trunk/pywikipedia/interwiki.py
Modified: trunk/pywikipedia/interwiki.py
===================================================================
--- trunk/pywikipedia/interwiki.py 2007-08-13 19:41:35 UTC (rev 4036)
+++ trunk/pywikipedia/interwiki.py 2007-08-13 19:47:50 UTC (rev 4037)
@@ -1076,34 +1076,33 @@
reporting of missing backlinks for pages we already fixed
"""
+ expectedPages = set(new.values())
+ expectedSites = set([page.site() for page in expectedPages])
try:
for site, page in new.iteritems():
if site not in updatedSites and not page.section():
- shouldlink = new.values()
try:
- linked = page.interwiki()
+ linkedPages = set(page.interwiki())
except wikipedia.NoPage:
wikipedia.output(u"WARNING: Page %s does no longer exist?!" % page.title())
break
- for xpage in shouldlink:
- if xpage != page and not xpage in linked:
- for l in linked:
- if l.site() == xpage.site():
- wikipedia.output(u"WARNING: %s: %s does not link to %s but to %s" % (page.site().family.name, page.aslink(True), xpage.aslink(True), l.aslink(True)))
- break
- else:
- wikipedia.output(u"WARNING: %s: %s does not link to %s" % (page.site().family.name, page.aslink(True), xpage.aslink(True)))
+ linkedPagesDict = {}
+ for linkedPage in linkedPages:
+ linkedPagesDict[linkedPage.site()] = linkedPage
+ for expectedPage in expectedPages:
+ if expectedPage != page and expectedPage not in linkedPages:
+ try:
+ linkedPage = linkedPagesDict[expectedPage.site()]
+ wikipedia.output(u"WARNING: %s: %s does not link to %s but to %s" % (page.site().family.name, page.aslink(True), expectedPage.aslink(True), linkedPage.aslink(True)))
+ except KeyError:
+ wikipedia.output(u"WARNING: %s: %s does not link to %s" % (page.site().family.name, page.aslink(True), expectedPage.aslink(True)))
# Check for superfluous links
- for xpage in linked:
- if not xpage in shouldlink:
+ for linkedPage in linkedPages:
+ if linkedPage not in expectedPages:
# Check whether there is an alternative page on that language.
- for l in shouldlink:
- if l.site() == xpage.site():
- # Already reported above.
- break
- else:
- # New warning
- wikipedia.output(u"WARNING: %s: %s links to incorrect %s" % (page.site().family.name, page.aslink(True), xpage.aslink(True)))
+ # In this case, it was already reported above.
+ if linkedPage.site() not in expectedSites:
+ wikipedia.output(u"WARNING: %s: %s links to incorrect %s" % (page.site().family.name, page.aslink(True), linkedPage.aslink(True)))
except (socket.error, IOError):
wikipedia.output(u'ERROR: could not report backlinks')
Revision: 4035
Author: btongminh
Date: 2007-08-13 19:10:17 +0000 (Mon, 13 Aug 2007)
Log Message:
-----------
SQL table layout.
Modified Paths:
--------------
trunk/pywikipedia/delinker.txt
Modified: trunk/pywikipedia/delinker.txt
===================================================================
--- trunk/pywikipedia/delinker.txt 2007-08-13 15:05:15 UTC (rev 4034)
+++ trunk/pywikipedia/delinker.txt 2007-08-13 19:10:17 UTC (rev 4035)
@@ -64,7 +64,7 @@
First setup the dictionary ''CommonsDelinker'', by adding to the config:
CommonsDelinker = {}
-==== General settings ====
+=== General settings ===
* ''timeout = 60'': A general timeout, used for fetching the log and other
timeouts. Set to 60 for medium sized wikis, such as English Wikipedia,
and 60-120 for smaller wikis such as German Wikipedia. Note that during
@@ -88,7 +88,7 @@
GLOBALLY WITHOUT CONSULTING BRYAN AND SIEBRAND. Thank you.
* ''no_sysop = True'': Disable delinking as sysop.
-==== Delinker settings ====
+=== Delinker settings ===
Those variables only need to be set if the delinker is enabled.
* ''delink_wait = 600'': The time to wait after deletion before the image is
delinked.
@@ -96,7 +96,7 @@
summary, the file is not delinked.
* ''summary_cache = 3600'': Time before on-wiki settings are updated.
-==== Replacer settings ====
+=== Replacer settings ===
Those variables only need to be set if the replacer is enabled.
* ''replace_template = "replace image"'': The template for to command
replacement.
@@ -107,7 +107,7 @@
* ''disallowed_replacements = [(r'\.png$', r'\.svg$')]'': List of regular expressions
of refused replacements.
-==== SQL settings ====
+=== SQL settings ===
* ''sql_engine = "mysql"'': Database engine to use. Currently supported:
MySQL. Support for sqlite3 is planned. The Global delinker requires MySQL.
* ''sql_config = {\
@@ -121,6 +121,31 @@
* ''replacer_table = "database.replacer"'': The database.table for the
replacer. Only required if the replacer is activated.
+==== SQL table layout ====
+<code lang="sql">
+CREATE TABLE delinker (
+ timestamp CHAR(14),
+ img VARBINARY(255),
+ wiki VARBINARY(255),
+ page_title VARBINARY(255),
+ namespace INT,
+ status ENUM('ok', 'skipped', 'failed'),
+ newimg VARBINARY(255)
+);
+CREATE TABLE replacer (
+ id INT NOT NULL AUTO_INCREMENT,
+ timestamp VARBINARY(14),
+ old_image VARBINARY(255),
+ new_image VARBINARY(255),
+ status ENUM('pending', 'ok', 'refused', 'done'),
+ user VARBINARY(255),
+ comment VARBINARY(255),
+
+ PRIMARY KEY(id),
+ INDEX(status)
+);
+</code>
+
==== Edit and debugging settings ====
* ''save_diff = False'': Save all changes to a diff. Create a directory diff/
before running.
Hi,
What do you think about adding a suggestion to use a dump file / -xml when a
pywikipedia user runs something like:
> python replace.py -start:! a b
(where 'a' is, of course, some very rare text) or:
> python replace.py -start:Image:! c d
Would that be helpful?
Regards,
Francesco Cosoleto
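One way such a hint could work, sketched below with an entirely hypothetical helper (replace.py has no such function; the name and message are invented for illustration): scan the command-line arguments for a -start generator and print a note suggesting an XML dump instead of walking the live wiki.

```python
def suggest_xml_dump(args):
    """Hypothetical helper: return a hint string when replace.py is run
    with a -start generator, which fetches every page from the live wiki;
    an offline XML dump (-xml) is much cheaper for whole-wiki replacements.
    Returns None when no -start argument is present."""
    for arg in args:
        if arg.startswith('-start:'):
            return ('Note: scanning all pages from %r via the live wiki is '
                    'slow; consider running against an XML dump with -xml.'
                    % arg[len('-start:'):])
    return None
```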
Revision: 4034
Author: wikipedian
Date: 2007-08-13 15:05:15 +0000 (Mon, 13 Aug 2007)
Log Message:
-----------
Applied changes by Filnik (new -savedata parameter). It still runs, but
I haven't tested if it still runs properly; if it doesn't, blame Filnik.
In particular, there was broken indentation near the end of the file (date
formatting code): tabs instead of spaces. I hope I got it right.
Modified Paths:
--------------
trunk/pywikipedia/welcome.py
Modified: trunk/pywikipedia/welcome.py
===================================================================
--- trunk/pywikipedia/welcome.py 2007-08-13 10:19:19 UTC (rev 4033)
+++ trunk/pywikipedia/welcome.py 2007-08-13 15:05:15 UTC (rev 4034)
@@ -1,4 +1,4 @@
-#!/usr/bin/python
+#!/usr/bin/python
# -*- coding: utf-8 -*-
"""
Script to welcome new users. This script works out of the box for Wikis that
@@ -72,6 +72,9 @@
-random Use a random signature, taking the signatures from a wiki
page (for istruction, see below).
+ -savedata This feature saves the random signature index to allow to
+ continue to welcome with the last signature used.
+
********************************* GUIDE ***********************************
Report, Bad and white list guide:
@@ -152,23 +155,25 @@
__version__ = '$Id: welcome.py,v 1.4 2007/04/14 18:05:42 siebrand Exp$'
#
-import wikipedia, string
-import time, re, config
-import urllib
-import locale
+import wikipedia, config, string, locale
+import time, re, cPickle, os, urllib
+
locale.setlocale(locale.LC_ALL,'')
-number = 1 # number of edits that an user required to be welcomed
-numberlog = 15 # number of users that are required to add the log :)
-limit = 50 # number of users that the bot load to check
-offset_variable = 0 # number of newest users to skip each run
-recursive = True # define if the Bot is recursive or not
-time_variable = 3600 # how much time (sec.) the bot sleeps before restart
-log_variable = True # create the welcome log or not
-ask = False # should bot ask to add username to bad-username list
-filter_wp = False # check if the username is ok or not
-sign = ' ~~~~' # default signature
-random = False # should signature be random or not
+number = 1 # number of edits that an user required to be welcomed
+numberlog = 15 # number of users that are required to add the log :)
+limit = 50 # number of users that the bot load to check
+offset_variable = 0 # number of newest users to skip each run
+recursive = True # define if the Bot is recursive or not
+time_variable = 3600 # how much time (sec.) the bot sleeps before restart
+log_variable = True # create the welcome log or not
+ask = False # should bot ask to add username to bad-username list
+filter_wp = False # check if the username is ok or not
+sign = ' --~~~~' # default signature
+random = False # should signature be random or not
+savedata = False # should save the signature index or not
+filename = 'welcome.data' # file where is stored the random signature index
+directory = str(os.getcwd())
# Script users the class wikipedia.translate() to find the right
# page/user/summary/etc so the need to specify language and project have
@@ -510,6 +515,8 @@
ask = True
elif arg == '-filter':
filter_wp = True
+ elif arg == '-savedata':
+ savedata = True
elif arg == '-random':
random = True
elif arg.startswith('-limit'):
@@ -559,13 +566,17 @@
welcomer = u'{{subst:Utente:Filnik/Benve|nome={{subst:PAGENAME}}}} %s'
welcomed_users = list()
- number_user = 0
+ if savedata == True and os.path.exists(directory + '/' + filename):
+ f = file(filename)
+ number_user = cPickle.load(f)
+ else:
+ number_user = 0
# Use try and finally, to put the wikipedia.stopme() always at the end of the code.
try:
# Here there is the main loop.
while True:
if filter_wp == True:
- # A standard list of bad username components (you can change/delate it in your project...) [ i divide the list into two to make it smaller...]
+ # A standard list of bad username components (you can change/delate it in your project...) [ i divide the list into three to make it smaller...]
elencoaf = [' ano', ' anus', 'anal ', 'babies', 'baldracca', 'balle', 'bastardo',
'bestiali', 'bestiale', 'bastarda', 'b.i.t.c.h.', 'bitch', 'boobie',
'bordello', 'breast', 'cacata', 'cacca', 'cachapera', 'cagata',
@@ -642,7 +653,7 @@
if random == True:
try:
wikipedia.output(u'Loading random signatures...')
- signList = defineSign(wsite,signPageTitle)
+ signList = defineSign(wsite, signPageTitle)
except wikipedia.NoPage:
wikipedia.output(u'The list with signatures is not available... Using default signature...')
random = False
@@ -766,10 +777,10 @@
# If recursive, don't exit, repeat after one hour.
if recursive == True:
waitstr = unicode(time_variable)
- if locale.getlocale()[1]:
- strfstr = unicode(time.strftime(u"%d %b %Y %H:%M:%S (UTC)", time.gmtime()), locale.getlocale()[1])
- else:
- strfstr = unicode(time.strftime(u"%d %b %Y %H:%M:%S (UTC)", time.gmtime()))
+ if locale.getlocale()[1]:
+ strfstr = unicode(time.strftime(u"%d %b %Y %H:%M:%S (UTC)", time.gmtime()), locale.getlocale()[1])
+ else:
+ strfstr = unicode(time.strftime(u"%d %b %Y %H:%M:%S (UTC)", time.gmtime()))
wikipedia.output(u'Sleeping %s seconds before rerun. %s' % (waitstr, strfstr))
time.sleep(time_variable)
# If not recursive, break.
@@ -777,4 +788,9 @@
wikipedia.output(u'Stop!')
break
finally:
- wikipedia.stopme()
+ if random == True:
+ if savedata == True:
+ f = file(filename, 'w')
+ cPickle.dump(number_user, f)
+ f.close()
+ wikipedia.stopme()
\ No newline at end of file
Could someone publish the attached file?
I don't know why, but I am not able to upload to SVN (though I can download
the newer versions). Tips are welcome.
Summary: Fix wrong Dutch translation
Thanks,
Ward