> Patches item #2791305, was opened at 2009-05-13 19:07
> Message generated for change (Tracker Item Submitted) made by mcknol
> You can respond by visiting:
> https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2791305&group_…
...
>
> Initial Comment:
> Please add {{surname}}, {{hndis}} and {{given name}} to the list of
> possible disambiguation templates on en.wikipedia for interwiki.py.
According to en.wikipedia standards, none of these is a disambiguation
template.
Russ
I'm looking for a definitive guide to regular expressions as used by
Pywikipedia bot. Is that the same as saying "a definitive guide to regular
expressions as used by Python"?
My hunt started when I looked at an old command log and found a regex search
term:
"(?s)greywater(.*$)"
I then realized I'd foolishly not written down what the regex meant, and
searching the web and my mail for "(?s)" just doesn't work.
Thanks.
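As far as I can tell, Pywikipedia passes its patterns to Python's standard `re` module, so the Python `re` documentation should apply. There, `(?s)` is the inline spelling of the `re.DOTALL` flag, which lets `.` match newlines as well. A minimal sketch of what the logged pattern does, using a made-up sample text (only the pattern comes from the message above):

```python
import re

# Made-up sample text; only the pattern comes from the command log.
text = "Intro line\nUse greywater for irrigation.\nSecond line.\n"

# (?s) is the inline form of re.DOTALL: "." also matches newlines, so
# "(.*$)" captures everything from "greywater" to the end of the text.
with_flag = re.search(r"(?s)greywater(.*$)", text)
print(repr(with_flag.group(1)))

# Without the flag, "." stops at the first newline and "$" cannot match
# in the middle of the text, so this particular search finds nothing.
without_flag = re.search(r"greywater(.*$)", text)
print(without_flag)
```

In a search-and-replace run, the pattern would therefore select the word and the whole rest of the page text.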
--
Chris Watkins
Appropedia.org - Sharing knowledge to build rich, sustainable lives.
identi.ca/appropedia / twitter.com/appropedia
blogs.appropedia.org
I like this: five.sentenc.es
Hello!
It apparently happens often that some people install Python 3 and try
to run pywikipedia.
pywikipedia then fails ungracefully, with unclear Syntax Errors,
because of syntax changes:
1) Calling login.py
$ python3 login.py
File "login.py", line 61
'en': u'Wikipedia:Registered bots',
^
SyntaxError: invalid syntax
2) Importing wikipedia
$ python3
Python 3.0.1+ (r301:69556, Apr 15 2009, 15:59:22)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import wikipedia
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "wikipedia.py", line 354
raise InvalidTitle(u'Bad page title : %s' % t)
^
SyntaxError: invalid syntax
Beginners then conclude that pywikipedia is broken :)
I tried to fix this issue today, to at least be able to output some
meaningful info when Python3 dies.
Problem:
Because Python 3 fails with a SyntaxError while parsing, no imports are
processed and no code is ever executed: the parser parses the whole file
first, raising any SyntaxError, and only then starts interpreting it. This
is what makes the issue troublesome.
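The parse-before-execute behavior can be illustrated with `compile()`, which parses source text the same way Python does for a script or an imported module. This is only a sketch: the two-line source string is made up, and its second line is a deliberate syntax error.

```python
# Two-line "script": line 1 is valid, line 2 is a deliberate syntax error.
source = (
    "executed.append('line 1 ran')\n"
    "x = = 1\n"
)

executed = []
error_line = None
try:
    # compile() parses the whole source before anything runs.
    code = compile(source, "<demo>", "exec")
    exec(code, {"executed": executed})
except SyntaxError as err:
    error_line = err.lineno

print(executed)    # line 1 never ran
print(error_line)  # the error is reported for line 2
```

This is why a version check placed above the offending line in the same file never gets a chance to run.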
I came up with this patch:
Index: login.py
===================================================================
--- login.py (revision 6857)
+++ login.py (working copy)
@@ -50,7 +50,11 @@
#
__version__='$Id$'
-import re
+import re, sys
+
+if sys.version >= '3':
+    print 'Python 3.x is _not_ supported. Use Python 2.x'
+
import urllib2
import wikipedia, config
Index: wikipedia.py
===================================================================
--- wikipedia.py (revision 6858)
+++ wikipedia.py (working copy)
@@ -123,6 +123,10 @@
__version__ = '$Id$'
import os, sys
+
+if sys.version >= '3':
+    print 'Python 3.x is _not_ supported. Use Python 2.x'
+
import httplib, socket, urllib
import traceback
import time, threading, Queue
Behavior after:
1) Calling login.py
$ python3 login.py
File "login.py", line 56
print 'Python 3.0 is _not_ supported. Use Python 2.x'
^
SyntaxError: invalid syntax
2) Importing wikipedia:
$ python3
Python 3.0.1+ (r301:69556, Apr 15 2009, 15:59:22)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import wikipedia
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "wikipedia.py", line 128
print 'Python 3.0 is _not_ supported. Use Python 2.x'
^
SyntaxError: invalid syntax
As you see, Python3 will still fail on a syntax error.
The hackish, bad-looking "trick" is only to make it fail on a line
that shows, in a text message, what's wrong.
Do you have any better suggestions on how to do this?
I need to document the "hack" in the code of course, I just included a
minimal patch for the mailing list.
I thought of adding this check to wikipedia.py and login.py, since they
seem to be the first entry points. Should I add it somewhere else?
test.py maybe?
Thanks!
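One possible alternative, sketched here under assumptions: keep the check in a small launcher file whose own syntax is valid in both Python 2 and Python 3, and only load the real module after the check has passed. The launcher, the names `is_unsupported` and `launch`, and the use of `runpy` are illustrative and not part of the patch above.

```python
import runpy
import sys

MESSAGE = "Python 3.x is _not_ supported. Use Python 2.x"


def is_unsupported(version_info=None):
    """Return True when running under Python 3 or later."""
    if version_info is None:
        version_info = sys.version_info
    # Comparing the major version number avoids string comparisons
    # such as sys.version >= '3'.
    return version_info[0] >= 3


def launch(module_name):
    """Hypothetical launcher: check the version, then run the module."""
    if is_unsupported():
        sys.stderr.write(MESSAGE + "\n")
        return 1
    # The target module is parsed only here, after the check has passed.
    runpy.run_module(module_name, run_name="__main__")
    return 0
```

A wrapper script would end with something like `sys.exit(launch("login"))`. Because the Python 2-only file is never parsed under Python 3, the user sees the message instead of a SyntaxError. It does not help users who run `python3 login.py` directly.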
--
Nicolas Dumazet — NicDumZ [ nɪk.d̪ymz ]
siebrand(a)svn.wikimedia.org wrote:
> Revision: 6844
> Author: siebrand
> Date: 2009-05-07 09:27:39 +0000 (Thu, 07 May 2009)
>
> Log Message:
> -----------
> [ 2762911 ] Update for French localisation
It would be preferable to add the patch author to the commit message (here
David Crochet), instead of only a tracker number for reference.
--
Francesco Cosoleto
"The human mind gets used to any aberration" (Roberto Benigni)
nicdumz(a)svn.wikimedia.org wrote:
> Revision: 6756
> Author: nicdumz
> Date: 2009-04-30 01:47:36 +0000 (Thu, 30 Apr 2009)
>
> Log Message:
> -----------
> Adding an experimental contents_on_disk feature:
> save the Page contents on disk, in a python shelf, and load them
> only when needed, instead of loading the contents in RAM.
>
> Activating this option might slow down a bit the whole interwiki
> process: fetching an entry on disk is slower than simply fetching in
> RAM the attribute. This should however greatly reduce the memory consumption.
[...]
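For readers unfamiliar with shelves: Python's standard `shelve` module provides a persistent, dictionary-like object backed by a file. The sketch below only illustrates the general idea from the log message; the temporary directory, the key, and the stored text are made up, and the commit's actual code may differ.

```python
import os
import shelve
import shutil
import tempfile

# Hypothetical location; a temporary directory is used for the demo.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "pagestore")

# Write: contents go to disk, keyed by page title.
store = shelve.open(path)
store["Greywater"] = "Long wikitext of the page..."
store.close()

# Read back later: only the requested entry is loaded into memory.
store = shelve.open(path)
contents = store["Greywater"]
store.close()

shutil.rmtree(workdir)
print(contents)
```

Each lookup goes through the on-disk database and unpickles the value, which is the speed-for-memory trade-off the log message describes.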
> Modified: trunk/pywikipedia/interwiki.py
[...]
> # (C) Rob W.W. Hooft, 2003
> # (C) Daniel Herding, 2004
> # (C) Yuri Astrakhan, 2005-2006
> +# (C) Pywikipedia bot team, 2007-2009
I think you should put your name instead of a generic "Pywikipedia bot
team" copyright statement. A comment from original authors would be
preferable though.
> + index = 1
> + while True:
> + path = config.datafilepath('cache', 'pagestore' + str(index))
> + if not os.path.exists(path): break
> + index += 1
This approach would suit the diskcache module too, so we could easily get
rid of the imported random module and the ugly '*-abfdexjwi'-style filenames.
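The quoted loop can be packaged as a small helper. This is a sketch under assumptions: `next_free_path` is a hypothetical name, and `os.path.join` on a temporary directory stands in for `config.datafilepath`.

```python
import os
import shutil
import tempfile


def next_free_path(directory, prefix="pagestore"):
    """Return the first '<prefix><N>' path in directory that does not exist."""
    index = 1
    while True:
        path = os.path.join(directory, prefix + str(index))
        if not os.path.exists(path):
            return path
        index += 1


# Demo: with pagestore1 and pagestore2 taken, the next free name is pagestore3.
workdir = tempfile.mkdtemp()
for name in ("pagestore1", "pagestore2"):
    open(os.path.join(workdir, name), "w").close()
free_name = os.path.basename(next_free_path(workdir))
shutil.rmtree(workdir)
print(free_name)
```

Sequential names are readable, but two bots started at the same moment could pick the same index, since the check and the file creation are separate steps.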
> +
> + It's also not necessary to set theses line as a Subject destructor:
these
--
Francesco Cosoleto
"But the other long-haired Achaeans will stay
until we bring down Troy; and even if they too...
well then, flee in your ships to your homeland!
We two, Sthenelus and I, will fight on until we reach
the fated end of Ilium, for we came with a god's favor." (Homer)
nicdumz(a)svn.wikimedia.org wrote:
> Revision: 6808
> Author: nicdumz
> Date: 2009-05-03 17:38:53 +0000 (Sun, 03 May 2009)
>
> Log Message:
> -----------
> Follow-up to r6797 & r6802: importing os in diskcache.delete is _necessary_
> Documenting the reasons, so it doesnt get deleted a second time ;)
Thank you for taking care of my errors.
What is the reason for naming this function "delete"? "__del__"
looks better, as I can see there are cases where delete() isn't called.
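The difference between the two names can be sketched as follows. `DiskCacheSketch` is a hypothetical stand-in, not the diskcache class from the repository, and the stated reason for the local import is an assumption rather than something the log message spells out.

```python
import gc
import os
import tempfile


class DiskCacheSketch(object):
    """Hypothetical stand-in for a disk-backed cache (not pywikipedia's class)."""

    def __init__(self):
        handle, self.path = tempfile.mkstemp(prefix="cache-sketch-")
        os.close(handle)

    def delete(self):
        # Local import, mirroring what the log message describes. A likely
        # reason (an assumption here): module-level names such as `os` can
        # already be cleared while the interpreter is shutting down.
        import os
        if os.path.exists(self.path):
            os.remove(self.path)

    def __del__(self):
        # Runs when the object is garbage-collected, even if nobody
        # called delete() explicitly.
        self.delete()


explicit = DiskCacheSketch()
explicit_path = explicit.path
explicit.delete()          # cleanup requested by the caller

forgotten = DiskCacheSketch()
forgotten_path = forgotten.path
del forgotten              # no delete() call; __del__ does the cleanup
gc.collect()

print(os.path.exists(explicit_path))
print(os.path.exists(forgotten_path))
```

Keeping `delete()` and having `__del__` call it covers both cases, although Python does not guarantee that `__del__` runs for every object at interpreter exit.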
--
Francesco Cosoleto
"It is not so much our friends' help that we need as the assurance
of their help." (Epicurus)
nicdumz(a)svn.wikimedia.org wrote:
> Revision: 6819
> Author: nicdumz
> Date: 2009-05-04 15:54:19 +0000 (Mon, 04 May 2009)
>
> Log Message:
> -----------
> ReplaceLanguageLinks : 'match'.split(str) should be str.split('match')
Your patch was hidden in the SF bug tracker. Patches posted to the mailing
list have a better chance of getting reviewed.
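The bug named in the log message is easy to reproduce in isolation. In this sketch the sample text and the separator are made up; only the swapped receiver and argument come from the commit.

```python
# Made-up sample: two interwiki links separated by a newline.
text = "[[en:Foo]]\n[[fr:Bar]]"
separator = "\n"

# Wrong way round: this splits the separator on the text. The text does
# not occur inside the separator, so the result is the separator itself.
wrong = separator.split(text)

# Intended: split the text on the separator.
right = text.split(separator)

print(wrong)
print(right)
```

The swapped call raises no error, which is probably why it went unnoticed.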
--
Francesco Cosoleto
"[Italian industrialists are] more speculators than entrepreneurs" (Massimo
Paci, formerly of INPS)
nicdumz(a)svn.wikimedia.org wrote:
> Revision: 6767
> Author: nicdumz
> Date: 2009-04-30 09:00:50 +0000 (Thu, 30 Apr 2009)
>
> Log Message:
> -----------
> [ 2771272 ] 44 Error Dump Files :
> print a message on site error, sleep and retry.
>
> Modified Paths:
> --------------
> trunk/pywikipedia/wikipedia.py
[...]
> + output(u'Remote site has a problem, it probably ' \
> + 'exited our query with an internal Error. ' \
> + 'Sleeping for %d seconds...' % self.sleeptime)
"error" (lowercase). I think it would be preferable to print a more generic
message about invalid or unexpected data received from the server.
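A minimal sketch of the "print a message, sleep and retry" pattern, using the more generic wording suggested above. `ServerError`, `fetch_with_retry`, and the simulated server are hypothetical; they are not taken from wikipedia.py.

```python
import sys
import time


class ServerError(Exception):
    """Hypothetical exception standing in for a failed server response."""


def output(text):
    sys.stdout.write(text + "\n")


def fetch_with_retry(fetch, sleeptime=1, max_tries=3, sleep=time.sleep):
    """Call fetch(); on ServerError print a message, sleep, and retry."""
    for attempt in range(1, max_tries + 1):
        try:
            return fetch()
        except ServerError:
            if attempt == max_tries:
                raise
            # Adjacent string literals are joined before % is applied.
            output("Unexpected data received from the server. "
                   "Sleeping for %d seconds..." % sleeptime)
            sleep(sleeptime)


# Simulated server that fails twice before answering.
responses = [ServerError(), ServerError(), "page text"]
slept = []


def flaky_fetch():
    item = responses.pop(0)
    if isinstance(item, Exception):
        raise item
    return item


# slept.append replaces time.sleep so the demo does not actually wait.
result = fetch_with_retry(flaky_fetch, sleeptime=5, sleep=slept.append)
print(result)
```

Capping the number of attempts keeps a permanently broken server from stalling the bot forever.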
--
Francesco Cosoleto
"To think that one might lose is like betraying one's own principles"
(Oscar Luigi Scalfaro)
Hello.
It looks just wrong: text should normally be printed to stdout, and
standard error should be used only occasionally, in case of error.
This was changed in r3324. output(... toStdout = True) occurs only 11
times, out of a total of 1270 calls to the output function.
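The behavior being criticized appears to be the following, sketched here under assumptions: only the `toStdout` keyword comes from the message above, while the function body and the `stdout` and `stderr` parameters are made up so the demo can capture what is written.

```python
import io
import sys


def output(text, toStdout=False, stdout=None, stderr=None):
    """Write text to stderr by default, or to stdout when toStdout is True."""
    if toStdout:
        stream = stdout if stdout is not None else sys.stdout
    else:
        stream = stderr if stderr is not None else sys.stderr
    stream.write(text + u"\n")


# Capture both streams to see where each call ends up.
out, err = io.StringIO(), io.StringIO()
output(u"progress message", stdout=out, stderr=err)
output(u"actual result", toStdout=True, stdout=out, stderr=err)

print(repr(out.getvalue()))
print(repr(err.getvalue()))
```

With such a default, almost all bot messages land on standard error unless the caller opts in to stdout, which is the point the call counts above make.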
--
Francesco Cosoleto
Anyone can make mistakes, but only an idiot persists in his error. (Cicero)