I'm using replace.py to create wikilinks. Usually I want to select only the
first occurrence of the search string, and my command works fine for this.
But sometimes, the first hit is not suitable (e.g. it's part of a book or
course title, so I don't want to add the wikilink). If I choose n for no,
the bot goes to the next page.
Is there a way I can skip to the next occurrence in the same page? I'm
guessing it will need a modified version of replace.py that offers an
extra option besides ([y]es, [N]o, [e]dit, open in [b]rowser, [a]ll,
[q]uit).
The actual command I'm using is:
python replace.py -regex "(?si)\b((?:FOO1|FOO2))\b(.*$)" "[[\\1]]\\2" -exceptinsidetag:link -exceptinsidetag:hyperlink -exceptinsidetag:header -exceptinsidetag:nowiki -exceptinsidetag:ref -excepttext:"(?si)\[\[((?:FOO1|FOO2)[\|\]])" -namespace:0 -namespace:102 -namespace:4 -summary:"[[Appropedia:Wikilink bot]] adding double square brackets to: FOO1|FOO2." -log -xml:currentdump.xml
Many thanks!
--
Chris Watkins
Appropedia.org - Sharing knowledge to build rich, sustainable lives.
blogs.appropedia.org
identi.ca/appropedia
twitter.com/appropedia
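The idea of skipping to a later occurrence can be sketched outside replace.py. This is a minimal Python 3 illustration, not a patch to replace.py: the function name link_nth_occurrence is made up for the example, and it ignores the -exceptinsidetag and -excepttext handling that the real bot performs.

```python
import re

def link_nth_occurrence(text, pattern, n):
    """Wrap only the n-th match (1-based) of pattern in [[...]]; leave the rest untouched."""
    count = 0

    def wrap(match):
        nonlocal count
        count += 1
        if count == n:
            return "[[" + match.group(0) + "]]"
        return match.group(0)  # every other match is returned unchanged

    return re.sub(pattern, wrap, text)

page = "The FOO1 Handbook is a book. FOO1 is also a tool."
# Skip the first hit (part of a title) and link the second one.
print(link_nth_occurrence(page, r"(?i)\b(?:FOO1|FOO2)\b", 2))
# prints: The FOO1 Handbook is a book. [[FOO1]] is also a tool.
```

An interactive "next occurrence" choice would amount to incrementing n each time the user rejects the proposed match.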
I want to generate a list of matches for a search, but not do anything to
the page.
E.g. I want to list all pages that contain "redirect[[:Category", but I
don't want to modify the pages.
I guess that it's possible to modify redirect.py (I don't speak python, but
it shouldn't be hard) and run it with -log. But maybe there's a simpler way?
Thanks in advance.
--
Chris Watkins
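One read-only way to get such a list is to scan an XML dump directly instead of running a bot over the live pages. The following is a rough Python 3 sketch that uses only the standard library. It assumes a MediaWiki export containing current revisions only, the function name titles_containing is invented for the example, and the match is a plain case-sensitive substring test.

```python
import io
import xml.etree.ElementTree as ET

def titles_containing(dump, needle):
    """Yield titles of pages whose wikitext contains needle; nothing is edited."""
    title = None
    for _event, elem in ET.iterparse(dump, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace prefix, if any
        if tag == "title":
            title = elem.text
        elif tag == "text":
            if needle in (elem.text or ""):
                yield title
        elif tag == "page":
            elem.clear()  # release the page's children to save memory on large dumps

# Tiny in-memory stand-in for a real dump file.
sample = io.BytesIO(b"""<mediawiki>
  <page><title>A</title><revision><text>#redirect[[:Category:X]]</text></revision></page>
  <page><title>B</title><revision><text>plain text</text></revision></page>
</mediawiki>""")
print(list(titles_containing(sample, "redirect[[:Category")))
# prints: ['A']
```

For a real dump, pass an opened file (for example open("currentdump.xml", "rb")) in place of the sample.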
Hi!
Do you have any idea why, using replace.py on some large dumps, I get
this error message:
C:\pywikipedia>replace.py -xml:enwiki-20091128-pages-articles.xml
Please enter the text that should be replaced: impossibletofindword
Please enter the new text: found
Please enter another text that should be replaced, or press Enter to start:
The summary message will default to: Robot: Automated text replacement (-impossibletofindword +found)
Press Enter to use this default message, or enter a description of the changes your bot will make: test
Reading XML dump...
Traceback (most recent call last):
  File "C:\pywikipedia\pagegenerators.py", line 847, in __iter__
    for page in self.wrapped_gen:
  File "C:\pywikipedia\pagegenerators.py", line 779, in DuplicateFilterPageGenerator
    for page in generator:
  File "C:\pywikipedia\replace.py", line 218, in __iter__
    for entry in self.parser:
  File "C:\pywikipedia\xmlreader.py", line 295, in new_parse
    for rev in self._parse(event, elem):
  File "C:\pywikipedia\xmlreader.py", line 304, in _parse_only_latest
    yield self._create_revision(revision)
  File "C:\pywikipedia\xmlreader.py", line 341, in _create_revision
    redirect=self.isredirect
  File "C:\pywikipedia\xmlreader.py", line 64, in __init__
    self.username = username.strip()
AttributeError: 'NoneType' object has no attribute 'strip'
'NoneType' object has no attribute 'strip'
I updated pywikipedia to the latest revision, but the error persists.
As you can see, it does not seem to be related to user-fixes.py or to regexes.
Thanks in advance!
Davide Bolsi
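The traceback shows username arriving as None in the entry constructor. One plausible cause, stated here as an assumption, is a revision in the dump without a username element (for example a hidden contributor). A defensive guard could look like the sketch below. The class is a simplified stand-in written for this example, not the actual class in xmlreader.py.

```python
class XmlEntry(object):
    """Simplified, hypothetical stand-in for the dump entry class in xmlreader.py."""

    def __init__(self, title, username):
        self.title = title
        # Treat a missing username as an empty string instead of calling
        # .strip() on None, which raises the AttributeError seen above.
        self.username = (username or "").strip()

entry = XmlEntry("Some page", None)
print(repr(entry.username))
# prints: ''
```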
Hi Russell,
the main reason I have not joined the rewrite branch is that I have not got it running yet. I get an ImportError for simplejson, and I have no idea how to set PYTHONPATH when working with IDLE. The trunk, by contrast, is easy to use: install Python, download the bot, unpack it, and run it. This is the usability I would expect.
Most of the scripts are out of date, since they have been modified in trunk but not updated in the rewrite. I guess both forks have to be developed in parallel for a while until all (main) scripts are merged. I could support the rewrite development, but since I cannot test that stuff I would rather not.
However, I have reservations about cutting development for older MediaWiki versions.
Regards
----- Original Message ----
From: Russell Blau <russblau(a)imapmail.org>
To: Pywikipedia discussion list <pywikipedia-l(a)lists.wikimedia.org>
Date: 30.03.2010 16:18
Subject: [Pywikipedia-l] Request for feedback on rewrite branch
> I am at a point where it would be helpful to have some feedback from other
> Pywikipedia users about the future of the rewrite branch. As those who
> watch the SVN commits know, I have not had as much time to work on this
> lately, and have to prioritize what time I do spend on it.
>
> For those who have used the rewrite branch, what (if anything) needs to be
> done to it to get you to use it exclusively and retire the old wikipedia.py
> system? What is missing? What is broken? What is present but could be
> improved?
>
> For those who have chosen not to use the rewrite branch, why not? What
> might lead you to take another look?
>
> And then, I'm sure there are many whose reaction to this post has been,
> "What's the rewrite branch?" I don't know what to ask you, so feel free to
> move on to the next message.
>
> Most critically, is there any reason to continue development of the trunk
> once the rewrite branch is at a point where most users are ready to switch
> to it?
>
> -- Russ
>
>
> _______________________________________________
> Pywikipedia-l mailing list
> Pywikipedia-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
>
Do we have anyone who can help me set up the rewrite branch to work with
easy_install.py and setuptools? I suppose I could figure it out myself, but
if we have someone who is more familiar with using these tools, they could
probably get it done a lot quicker. Ideally, the script should check for
dependencies (httplib2, and simplejson for Python 2.5) and also call the new
generate_user_files.py after installing the package in site-packages.
Once this is done, the whole installation and setup process should be much
simpler.
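As a starting point, the dependency check could be expressed through setuptools' install_requires. The sketch below is only an illustration under assumptions: the package name "pywikibot" and the version string are placeholders, and the post-install call to generate_user_files.py is not covered.

```python
import sys

def required_packages(version_info=sys.version_info):
    """Return the third-party packages needed for the given Python version."""
    packages = ["httplib2"]
    # The json module is in the standard library from Python 2.6 onward,
    # so simplejson is only needed on older interpreters.
    if tuple(version_info[:2]) < (2, 6):
        packages.append("simplejson")
    return packages

def main():
    # Imported here so the helper above can be used without setuptools installed.
    from setuptools import setup
    setup(
        name="pywikibot",        # hypothetical package name
        version="0.0",           # placeholder version
        packages=["pywikibot"],  # hypothetical package directory
        install_requires=required_packages(),
    )

print(required_packages((2, 5)))
# prints: ['httplib2', 'simplejson']
```

In an actual setup.py, main() would be called at module level so that easy_install or "python setup.py install" can resolve the dependencies.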
Hi Merlijn,
thanks for your reply and your hints on this matter. simplejson is marked as an svn external and is downloaded automatically when updating from svn. This also applies to the trunk. Both have identical files, but importing in the rewrite still gives me that error. I know about the library included in Python 2.6 (and I will upgrade some day), but since 2.5 is not deprecated for the rewrite branch, it should work there as well, using the externals. Maybe I need a little time to investigate this malfunction on my workstation.
xqt
----- Original Message ----
From: Merlijn van Deen <valhallasw(a)arctus.nl>
To: Pywikipedia discussion list <pywikipedia-l(a)lists.wikimedia.org>
Date: 15.04.2010 20:33
Subject: Re: [Pywikipedia-l] Import problem using rewrite branch (was: Request for feedback on rewrite branch)
> On 15 April 2010 18:13, <info(a)gno.de> wrote:
>
> > I get an importError for simplejson.
>
>
> Try updating to python 2.6 (which includes the 'json' library, which is used
> by default), or installing simplejson, which is available
> here <http://pypi.python.org/pypi/simplejson/>.
>
> Merlijn
On 15 April 2010 18:13, <info(a)gno.de> wrote:
> I get an importError for simplejson.
Try updating to python 2.6 (which includes the 'json' library, which is used
by default), or installing simplejson, which is available
here <http://pypi.python.org/pypi/simplejson/>.
Merlijn
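The fallback described above is commonly written as a guarded import. This is a generic sketch of that idiom and not necessarily how the rewrite branch does it.

```python
try:
    import json  # part of the standard library from Python 2.6 onward
except ImportError:
    import simplejson as json  # external package, needed on Python 2.5

print(json.dumps({"title": "Main Page"}))
# prints: {"title": "Main Page"}
```

Code that only uses json.dumps and json.loads can then run unchanged on either interpreter version.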
Updated my copy and r8071 works for me. Thanks for the fix.
And while I'm at it, big thanks to you and everyone else who've
contributed to Pywikipedia! So far I'm only using it for some simple
small things, but all along the library has provided everything I've
needed and been quite easy to figure out how to work with. Great
experience!
Cheers,
Morten
On Fri, Apr 9, 2010 at 11:59 AM, <xqt_wp(a)arcor.de> wrote:
> Right, the cookie was missing! I tried to fix it in r8071. It worked for me, but sorry for this hack.
> I have no time left right now to make it cleaner.