nicdumz(a)svn.wikimedia.org wrote:
> Revision: 6756
> Author: nicdumz
> Date: 2009-04-30 01:47:36 +0000 (Thu, 30 Apr 2009)
>
> Log Message:
> -----------
> Adding an experimental contents_on_disk feature:
> save the Page contents on disk, in a python shelf, and load them
> only when needed, instead of loading the contents in RAM.
>
> Activating this option might slow down a bit the whole interwiki
> process: fetching an entry on disk is slower than simply fetching in
> RAM the attribute. This should however greatly reduce the memory consumption.
[...]
> Modified: trunk/pywikipedia/interwiki.py
[...]
> # (C) Rob W.W. Hooft, 2003
> # (C) Daniel Herding, 2004
> # (C) Yuri Astrakhan, 2005-2006
> +# (C) Pywikipedia bot team, 2007-2009
I think you should put your name instead of a generic "Pywikipedia bot
team" copyright statement. A comment from original authors would be
preferable though.
> + index = 1
> + while True:
> + path = config.datafilepath('cache', 'pagestore' + str(index))
> + if not os.path.exists(path): break
> + index += 1
This approach would look nice for the diskcache module too, so we could easily get
rid of the imported random module and the ugly '*-abfdexjwi'-style filenames.
> +
> + It's also not necessary to set theses line as a Subject destructor:
Typo: "theses" should be "these".
--
Francesco Cosoleto
"But the rest of the long-haired Achaeans will stay
until we bring down Troy: and if they too...
why then, flee on your ships to your native land!
We two, Sthenelus and I, will fight on, until we find the fated
end of Ilium, for with favoring god we came." (Homer)
nicdumz(a)svn.wikimedia.org wrote:
> Revision: 6767
> Author: nicdumz
> Date: 2009-04-30 09:00:50 +0000 (Thu, 30 Apr 2009)
>
> Log Message:
> -----------
> [ 2771272 ] 44 Error Dump Files :
> print a message on site error, sleep and retry.
>
> Modified Paths:
> --------------
> trunk/pywikipedia/wikipedia.py
[...]
> + output(u'Remote site has a problem, it probably ' \
> + 'exited our query with an internal Error. ' \
> + 'Sleeping for %d seconds...' % self.sleeptime)
"error". I think it would be preferable to print a more generic message
about invalid or unexpected data received from the server.
--
Francesco Cosoleto
"To think that you might lose is like betraying your own principles" (Oscar Luigi
Scalfaro)
2009/4/27 <shizhao(a)svn.wikimedia.org>:
> Revision: 6736
> Author: shizhao
> Date: 2009-04-27 13:06:57 +0000 (Mon, 27 Apr 2009)
>
> Log Message:
> -----------
> fix bug 2780178: repeat = True, can't agian load new
> - if lestart is not None: params['lestart'] = lestart
> - if leend is not None: params['leend'] = leend
> - if leend is not None: params['leuser'] = leuser
> - if leend is not None: params['letitle'] = letitle
> + if lestart != None: params['lestart'] = lestart
> + if leend != None: params['leend'] = leend
> + if leend != None: params['leuser'] = leuser
> + if leend != None: params['letitle'] = letitle
Please don't change "is not None" to "!= None"; the first expression
is cleaner, and faster (see rev 6686 by Francesco:
https://fisheye.toolserver.org/changelog/pywikipedia/?cs=6686 )
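A quick illustration of the difference: `!=` dispatches to the object's `__eq__`/`__ne__`, which a class can override arbitrarily, while `is` compares object identity and cannot be fooled (it also skips the method lookup, hence the speed difference):

```python
class Weird(object):
    # Pathological but legal: equality methods that claim to match
    # everything, including None.
    def __eq__(self, other):
        return True
    def __ne__(self, other):
        return False

w = Weird()
print(w != None)      # False -- __ne__ lies, so the test misbehaves
print(w is not None)  # True  -- identity check is reliable
```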
> [rest of diff]
It looks like the diff is not fixing the issue. With repeat=True, the
same query will be sent over and over, with the same parameters,
returning the same values, and.... never exiting.
And... actually, it seems that a lot of functions implement this
rather useless behavior:
newpages, longpages, shortpages, categories, deadendpages,
ancientpages, lonelypages, unwatchedpages,
(uncategorized|unused)(categories|images), withoutinterwiki,
randompages, and randomredirectpages.
Ah, wait. I see a TODO by Russell:
#TODO: should use offset and limit parameters; 'repeat' as now
# implemented is fairly useless
# this comment applies to all the XXXXpages methods following, as well
Ah, it's been sitting here since October 2007 :)
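What the TODO suggests can be sketched like this: pass an increasing offset so that each repetition fetches the *next* batch instead of re-sending the identical query. Here `fetch_batch` is a hypothetical stand-in for the real API call:

```python
def paged_results(fetch_batch, limit=50):
    """Yield results batch by batch, advancing the offset each time.

    fetch_batch(offset, limit) is a hypothetical callable returning a
    list of at most `limit` items starting at `offset`; an empty list
    means nothing is left, so the generator terminates instead of
    looping forever on the same query."""
    offset = 0
    while True:
        batch = fetch_batch(offset, limit)
        if not batch:
            return
        for item in batch:
            yield item
        offset += len(batch)

# Demonstration with a fake data source standing in for the server:
data = list(range(120))
fake = lambda offset, limit: data[offset:offset + limit]
assert list(paged_results(fake, limit=50)) == data  # terminates, no repeats
```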
--
Nicolas Dumazet — NicDumZ [ nɪk.d̪ymz ]
2009/4/24 <russblau(a)svn.wikimedia.org>:
> When server is under heavy load, it may time out (http code 504) on API queries with high limits; this revision lets QueryGenerator instances catch these errors, lower the query limit, and retry.
>
That's an interesting change, because those 504s happen quite often =)
> #TODO: do some error correcting stuff
> + if request.data[0].status == 504:
> + raise Server504Error("Server %s timed out" % site.hostname())
This, however, doesn't seem to work as expected. From time to time, I
get tracebacks like:
ERROR: Traceback (most recent call last):
File "/home/nicdumz/pywikipedia/pywikibot/data/api.py", line 189, in submit
body=params)
File "/home/nicdumz/pywikipedia/pywikibot/comms/http.py", line 102, in request
if request.data[0].status == 504:
AttributeError: 'int' object has no attribute 'status'
WARNING: Waiting 5 seconds before retrying.
But it's nothing critical, because the error is caught, and the query
is sent again.
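The AttributeError suggests that `request.data[0]` does not always hold a response object; apparently it is sometimes a bare integer. A defensive accessor along these lines would avoid the secondary traceback (this is a guess at the shapes involved, not the actual pywikibot fix):

```python
def http_status(data):
    # Accept either a (response_object, content) pair or a bare int
    # status code; purely illustrative of the defensive check.
    first = data[0] if isinstance(data, (tuple, list)) else data
    if isinstance(first, int):
        return first
    return first.status

class FakeResponse(object):
    status = 504  # mimics a response object carrying the HTTP status

print(http_status((FakeResponse(), 'body')))  # 504
print(http_status(504))                       # 504, no AttributeError
```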
--
Nicolas Dumazet — NicDumZ [ nɪk.d̪ymz ]
shizhao(a)svn.wikimedia.org wrote:
> Revision: 6674
> Author: shizhao
> Date: 2009-04-22 18:31:25 +0000 (Wed, 22 Apr 2009)
>
> Log Message:
> -----------
> add new parameter: "-gorandom".
> Specifies that the robot should starting at the random pages returned by [[Special:Random]].
>
[...]
> +
> +-gorandom Specifies that the robot should starting at the random pages
> + returned by [[Special:Random]].
> """
>
>
> @@ -1009,6 +1012,14 @@
> transclusionPageTitle))
> gen = ReferringPageGenerator(transclusionPage,
> onlyTemplateInclusion=True)
> + elif arg.startswith('-gorandom'):
> + for firstPage in RandomPageGenerator(number = 1):
> + firstPageTitle = firstPage.title()
> + namespace = wikipedia.Page(site, firstPageTitle).namespace()
> + firstPageTitle = wikipedia.Page(site,
> + firstPageTitle).titleWithoutNamespace()
> + gen = AllpagesPageGenerator(firstPageTitle, namespace,
> + includeredirects=False)
> elif arg.startswith('-start'):
> if arg.startswith('-startxml'):
> wikipedia.output(u'-startxml : wrong parameter')
>
I don't agree with implementing this option. It isn't necessary to request a
page from [[Special:Random]] just to get a hint about what to download with
[[Special:Allpages]]. And a user option to do that, or anything similar,
doesn't look useful to me.
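To illustrate that no server round-trip is needed: [[Special:Allpages]] only needs a lexicographic starting point, which can be generated locally. A purely illustrative sketch, not a proposal for inclusion:

```python
import random
import string

def random_start_title(length=2):
    # Any short string is a valid place to start Special:Allpages from;
    # the start title does not have to be an existing page.
    return ''.join(random.choice(string.ascii_uppercase)
                   for _ in range(length))

print(random_start_title())  # e.g. 'QK' -- no request to Special:Random
```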
--
Francesco Cosoleto
"I met him only once. Fini is not a person you can judge after seeing
him a single time. And I have no intention of seeing him again."
(Roberto Benigni)
In config.py we can read:
# Use the experimental disk cache to prevent huge memory usage
use_diskcache = False
Is it still experimental? What about enabling it by default?
--
Francesco Cosoleto
«Nobody, except the theorist himself, believes in his theories;
everybody believes in laboratory results, except the
experimenter.» (Albert Einstein)
2009/4/22 <nicdumz(a)svn.wikimedia.org>:
> @@ -4145,6 +4145,14 @@
> # search for family module in the 'families' subdirectory
> sys.path.append(config.datafilepath('families'))
> exec "import %s_family as myfamily" % fam
> + except SyntaxError:
> + if '-' in fam:
> + # A python module cannot include an hyphen
> + output(u"""\
> +A family name cannot include an hyphen (-). Please consider renaming your
> +'%s' family to '%s' or '%s' instead."""
> + % (fam, fam.replace('-', ''), fam.replace('-', '_')))
> + sys.exit(1)
> except ImportError:
> if fatal:
> output(u"""\
On the other hand, using the builtin __import__ would allow us to load
families with hyphens:
try:
# search for family module in the 'families' subdirectory
sys.path.append(config.datafilepath('families'))
- exec "import %s_family as myfamily" % fam
+ myfamily = __import__('%s_family' % fam)
except SyntaxError:
if '-' in fam:
# A python module cannot include an hyphen
output(u"""\
__import__ is also cleaner than exec, obviously.
Should we allow family names with hyphens?
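To confirm the point: `__import__` takes the module name as a plain string, so it can load a module whose filename contains a hyphen, something the `import` statement (and therefore `exec "import ..."`) can never parse. A self-contained demonstration, using a throwaway module written to a temp directory:

```python
import os
import sys
import tempfile

# Create a module file whose name contains a hyphen.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, 'my-wiki_family.py'), 'w') as f:
    f.write('NAME = "my-wiki"\n')
sys.path.append(tmpdir)

# 'import my-wiki_family' is a SyntaxError, but __import__ works:
myfamily = __import__('my-wiki_family')
print(myfamily.NAME)  # my-wiki
```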
--
Nicolas Dumazet — NicDumZ [ nɪk.d̪ymz ]
Hi, I run an interwiki bot and live in Iran. Iran has blocked es.wp, so my
bot can't read that site, but they can't block the secure
servers (https://secure.wikimedia.org/). I want to run the bot through the
secure servers, which requires changes to wikipedia.py, but I tried and
couldn't make the changes correctly.
Could you modify a copy of wikipedia.py and send it to me?
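For background, the secure gateway exposed every project under one hostname, mapping `http://LANG.PROJECT.org/PATH` to `https://secure.wikimedia.org/PROJECT/LANG/PATH`. A tiny helper capturing that mapping (a sketch of the idea only; the real change to wikipedia.py would also have to adjust hostnames, paths and cookies throughout):

```python
def to_secure_url(lang, project, path):
    # Map a per-language project URL onto the secure.wikimedia.org
    # gateway layout, e.g. es.wikipedia's /w/api.php becomes
    # https://secure.wikimedia.org/wikipedia/es/w/api.php
    return 'https://secure.wikimedia.org/%s/%s%s' % (project, lang, path)

print(to_secure_url('es', 'wikipedia', '/w/api.php'))
# https://secure.wikimedia.org/wikipedia/es/w/api.php
```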
--
امیر سرآبادانی