I'm using replace.py to create wikilinks. Usually I want to select only the
first occurrence of the search string, and my command works fine for this.
But sometimes, the first hit is not suitable (e.g. it's part of a book or
course title, so I don't want to add the wikilink). If I choose n for no,
the bot goes to the next page.
Is there a way I can skip to the next occurrence in the same page? I'm
guessing it will need a modified version of replace.py, so that it gives an
extra option besides ([y]es, [N]o, [e]dit, open in [b]rowser, [a]ll,
[q]uit)
The actual command I'm using is:
python replace.py -regex "(?si)\b((?:FOO1|FOO2))\b(.*$)
" "[[\\1]]\\2" -exceptinsidetag:link -exceptinsidetag:hyperlink
-exceptinsidetag:header -exceptinsidetag:nowiki -exceptinsidetag:ref
-excepttext:"(?si)\[\[((?:FOO1|FOO2)[\|\]])" -namespace:0 -namespace:102
-namespace:4 -summary:"[[Appropedia:Wikilink bot]] adding double square
brackets to: FOO1|FOO2." -log -xml:currentdump.xml
Many thanks!
--
Chris Watkins
Appropedia.org - Sharing knowledge to build rich, sustainable lives.
blogs.appropedia.org
identi.ca/appropedia
twitter.com/appropedia
I want to generate a list of matches for a search, but not do anything to
the page.
E.g. I want to list all pages that contain "redirect[[:Category", but I
don't want to modify the pages.
I guess that it's possible to modify redirect.py (I don't speak python, but
it shouldn't be hard) and run it with -log. But maybe there's a simpler way?
Thanks in advance.
--
Chris Watkins
Appropedia.org - Sharing knowledge to build rich, sustainable lives.
blogs.appropedia.orgcommunity.livejournal.com/appropedia
identi.ca/appropedia
twitter.com/appropedia
Hi!
Do you have any idea why, using replace.py on some large dumps, I get
this error message:
C:\pywikipedia>replace.py -xml:enwiki-20091128-pages-articles.xml
Please enter the text that should be replaced: impossibletofindword
Please enter the new text: found
Please enter another text that should be replaced, or press Enter to start:
The summary message will default to: Robot: Automated text
replacement (-impossibletofindword +found
)
Press Enter to use this default message, or enter a description of the
changes your bot will make: test
Reading XML dump...
Traceback (most recent call last):
File "C:\pywikipedia\pagegenerators.py", line 847, in __iter__
for page in self.wrapped_gen:
File "C:\pywikipedia\pagegenerators.py", line 779, in
DuplicateFilterPageGenerator
for page in generator:
File "C:\pywikipedia\replace.py", line 218, in __iter__
for entry in self.parser:
File "C:\pywikipedia\xmlreader.py", line 295, in new_parse
for rev in self._parse(event, elem):
File "C:\pywikipedia\xmlreader.py", line 304, in _parse_only_latest
yield self._create_revision(revision)
File "C:\pywikipedia\xmlreader.py", line 341, in _create_revision
redirect=self.isredirect
File "C:\pywikipedia\xmlreader.py", line 64, in __init__
self.username = username.strip()
AttributeError: 'NoneType' object has no attribute 'strip'
'NoneType' object has no attribute 'strip'
I updated pywikipedia to the last revision with no results.
As you can see it does not seem to be user-fixes.py or regex-related.
Thanks in advance!
Davide Bolsi
Hi Russel,
the main reason not to join to the rewrite branch is, I did not got it running yet. I get an importError for simplejson. And I have no idea seting PYTHONPATH playing with idle. Whereas the trunk is easy to use: install python, download the bot and expand it, run it. This is the usability I would expect.
Most of the scripts are out of date since they are modified in trunk but not actualized at rewrite. I guess both forks have to be developed in parallel for a while until all (main) scripts are merged. I could supporting the rewrite development but since I could not test that stuff I wouldn't.
However, I have reservations about the effect that the development for older mw versions are cut.
Regards
----- Original Nachricht ----
Von: Russell Blau <russblau(a)imapmail.org>
An: Pywikipedia discussion list <pywikipedia-l(a)lists.wikimedia.org>
Datum: 30.03.2010 16:18
Betreff: [Pywikipedia-l] Request for feedback on rewrite branch
> I am at a point where it would be helpful to have some feedback from other
> Pywikipedia users about the future of the rewrite branch. As those who
> watch the SVN commits know, I have not had as much time to work on this
> lately, and have to prioritize what time I do spend on it.
>
> For those who have used the rewrite branch, what (if anything) needs to be
> done to it to get you to use it exclusively and retire the old wikipedia.py
>
> system? What is missing? What is broken? What is present but could be
> improved?
>
> For those who have chosen not to use the rewrite branch, why not? What
> might lead you to take another look?
>
> And then, I'm sure there are many whose reaction to this post has been,
> "What's the rewrite branch?" I don't know what to ask you, so feel free to
>
> move on to the next message.
>
> Most critically, is there any reason to continue development of the trunk
> once the rewrite branch is at a point where most users are ready to switch
> to it?
>
> -- Russ
>
>
> _______________________________________________
> Pywikipedia-l mailing list
> Pywikipedia-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
>
Traumziele - von Beschreibung bis Buchung jetzt kompakt auf den Reise-Seiten von Arcor.de! http://www.arcor.de/rd/footer.reise
Hello all,
Today I have moved the pywikipedia nightlies page to a seperate toolserver
project called 'pywikipedia'. This change has been made to uncouple the
nightlies from my (private) toolserver account. If my toolserver account
would expire, this will no longer have any effect on availability of
nightlies.
*The nightlies are now available via **
http://toolserver.org/~pywikipedia/nightly/*
Additionally, the source code for the nightly-generating code is now
available via
http://toolserver.org/~pywikipedia/nightly/pywikipedia-nightly.git
All old (http://toolserver.org/~valhallasw/pywiki-based) URLs still work,
and are redirected to the new location.
*Any toolserver user interested in becoming a co-maintainer is very welcome
to do so*. Please drop me an e-mail or open a JIRA ticket yourself.
Best regards,
Merlijn 'valhallasw' van Deen
I have found a bug in wikipedia.py and filed a bug ticket (https://
sourceforge.net/tracker/?
func=detail&aid=3020887&group_id=93107&atid=603138). There are several
ways to fix this bug. However, being unfamiliar with the code, I am
unsure which of these best fits into the current architecture.
Is there someone listening that could give me advice on this?
Regards.
--
-- Dan Nessett
[I am resending this, since I wasn't subscribed when I first sent it and it hasn't yet appeared in the archives after over an hour since sending. Sorry if it is a duplicate.]
Hello,
I am trying to login using login.py. The login is hanging and when I control-c out of it, it appears the program is in an infinite recursive loop.
Here is information on the version of pywikipediabot I am using:
$ python version.py
Pywikipedia [http] trunk/pywikipedia (r8305, 2010/06/16, 17:55:23)
Python 2.5.2 (r252:60911, Jan 20 2010, 21:48:48)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)]
config-settings:
use_api = True
use_api_login = True
Here is what is displayed on the terminal when I control-c:
$ python login.py
Password for user WikiadminBot on localhost_CZ_Refactor:en:
Logging in to localhost_CZ_Refactor:en as WikiadminBot via API.
Traceback (most recent call last):
File "login.py", line 436, in <module>
main()
File "login.py", line 432, in main
loginMan.login()
File "login.py", line 319, in login
cookiedata = self.getCookie(api)
File "login.py", line 181, in getCookie
response, data = query.GetData(predata, self.site, sysop=self.sysop, back_response = True)
File "/usr/local/src/python/pywikipedia/query.py", line 122, in GetData
res, jsontext = site.postForm(path, params, sysop, site.cookies(sysop = sysop) )
File "/usr/local/src/python/pywikipedia/wikipedia.py", line 4951, in postForm
cookies=cookies)
File "/usr/local/src/python/pywikipedia/wikipedia.py", line 5087, in postData
self._getUserDataOld(text, sysop = sysop)
File "/usr/local/src/python/pywikipedia/wikipedia.py", line 5366, in _getUserDataOld
blocked = self._getBlock(sysop = sysop)
File "/usr/local/src/python/pywikipedia/wikipedia.py", line 4688, in _getBlock
data = query.GetData(params, self)
File "/usr/local/src/python/pywikipedia/query.py", line 127, in GetData
jsontext = site.getUrl( path, retry=True, sysop=sysop, data=data)
File "/usr/local/src/python/pywikipedia/wikipedia.py", line 5239, in getUrl
self._getUserDataOld(text, sysop = sysop)
File "/usr/local/src/python/pywikipedia/wikipedia.py", line 5366, in _getUserDataOld
<MANY MORE OF THESE>
File "/usr/local/src/python/pywikipedia/wikipedia.py", line 5366, in _getUserDataOld
blocked = self._getBlock(sysop = sysop)
File "/usr/local/src/python/pywikipedia/wikipedia.py", line 4688, in _getBlock
data = query.GetData(params, self)
File "/usr/local/src/python/pywikipedia/query.py", line 127, in GetData
jsontext = site.getUrl( path, retry=True, sysop=sysop, data=data)
File "/usr/local/src/python/pywikipedia/wikipedia.py", line 5141, in getUrl
f = MyURLopener.open(request)
File "/usr/lib/python2.5/urllib2.py", line 381, in open
response = self._open(req, data)
File "/usr/lib/python2.5/urllib2.py", line 399, in _open
'_open', req)
File "/usr/lib/python2.5/urllib2.py", line 360, in _call_chain
result = func(*args)
File "/usr/lib/python2.5/urllib2.py", line 1107, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib/python2.5/urllib2.py", line 1080, in do_open
r = h.getresponse()
File "/usr/lib/python2.5/httplib.py", line 928, in getresponse
response.begin()
File "/usr/lib/python2.5/httplib.py", line 385, in begin
version, status, reason = self._read_status()
File "/usr/lib/python2.5/httplib.py", line 343, in _read_status
line = self.fp.readline()
File "/usr/lib/python2.5/socket.py", line 331, in readline
data = recv(1)
KeyboardInterrupt
Regards,
Dan Nessett
xqt(a)svn.wikimedia.org ha scritto:
> Revision: 8292
[...]
> Log Message:
> -----------
> bugfix for r8051 (bug #3015645)
[...]
> def skip_section(text):
> - l = list()
> - for s in sections_to_skip.itervalues():
> - l.extend(s)
> - sect_titles = '|'.join(l)
> -
> + sect_titles = '|'.join(sections_to_skip[wikipedia.getSite().lang])
> sectC = re.compile('(?mi)^==\s*(' + sect_titles + ')\s*==')
> - newtext = ''
> -
Preliminary remarks: I don't work anymore in this project because
lacking of coordination and my advises for cooperation got ignored and
even bad interpreted as impolite requests. Of course it isn't a serious
loss for the team.
I don't know what you wanted fix here (maybe the problem was recently
added 'ar' row in dictionary only), but this commit is wrong.
This function should skip all the sections listed in the dictionary,
without language based distintion.
Yes, I know, this source is badly written and need a rewrite, as well
as a bad example for commit messages quality and so on, but sometime
something are made in a way for a reason.
Thanks for taking care of my script though.
Regards,
--
Francesco Cosoleto
La maggior parte dei lussi, e molte delle cosiddette comodità, non sono
indispensabili, ma sono anzi veri e propri ostacoli all'elevazione
morale dell'Uomo. (Thoreau, "Walden")