I want to read a special page with Page.get(). The error message is:
File "C:\Program Files\Pywikipedia\wikipedia.py", line 601, in get
raise NoPage('%s is in the Special namespace!' % self.aslink())
pywikibot.exceptions.NoPage
What is the solution?
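As the raise shows, this is by design: pages in the Special namespace (namespace -1) have no wikitext, so get() always fails for them. A minimal plain-Python sketch of checking before calling get() (no pywikibot dependency; the prefix list and the helper names are simplifications of my own, real wikis also have localized aliases):

```python
# Special: pages carry no wikitext, so Page.get() raises NoPage by design.
# Test the namespace before fetching instead of catching the exception.

SPECIAL_PREFIXES = ('special:',)  # add localized aliases as needed


def is_special_title(title):
    """Return True if the title lives in the Special: namespace."""
    return title.strip().lower().startswith(SPECIAL_PREFIXES)


def safe_get(page_title, fetch):
    """Call fetch(title) only for titles that can have wikitext."""
    if is_special_title(page_title):
        return None  # no wikitext exists; query the API instead
    return fetch(page_title)
```

In pywikipedia itself the equivalent check is page.namespace() == -1 before calling get().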
--
Bináris
I'm using replace.py to create wikilinks. Usually I want to select only the
first occurrence of the search string, and my command works fine for this.
But sometimes, the first hit is not suitable (e.g. it's part of a book or
course title, so I don't want to add the wikilink). If I choose n for no,
the bot goes to the next page.
Is there a way to skip to the next occurrence on the same page? I'm
guessing it will need a modified version of replace.py that offers an
extra option besides [y]es, [N]o, [e]dit, open in [b]rowser, [a]ll and
[q]uit.
The actual command I'm using is:
python replace.py -regex "(?si)\b((?:FOO1|FOO2))\b(.*$)
" "[[\\1]]\\2" -exceptinsidetag:link -exceptinsidetag:hyperlink
-exceptinsidetag:header -exceptinsidetag:nowiki -exceptinsidetag:ref
-excepttext:"(?si)\[\[((?:FOO1|FOO2)[\|\]])" -namespace:0 -namespace:102
-namespace:4 -summary:"[[Appropedia:Wikilink bot]] adding double square
brackets to: FOO1|FOO2." -log -xml:currentdump.xml
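As far as I know the stock replace.py prompt has no per-occurrence skip, so a patch would indeed be needed. The core mechanic such an option could use is a replacement callback that counts matches; a standalone sketch (not part of replace.py itself):

```python
import re


def replace_nth(text, pattern, repl, n):
    """Replace only the n-th (1-based) regex match, leaving the rest alone.

    Illustrates the behaviour an extra "skip this occurrence" prompt
    option would need; backreferences in repl are honoured via expand().
    """
    seen = 0

    def _sub(match):
        nonlocal seen
        seen += 1
        return match.expand(repl) if seen == n else match.group(0)

    return re.sub(pattern, _sub, text)
```

Pressing "skip" at the first hit would then amount to calling this with n=2 instead of n=1.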
Many thanks!
--
Chris Watkins
Appropedia.org - Sharing knowledge to build rich, sustainable lives.
blogs.appropedia.org
identi.ca/appropedia
twitter.com/appropedia
I want to generate a list of matches for a search, but not do anything to
the page.
E.g. I want to list all pages that contain "redirect[[:Category", but I
don't want to modify the pages.
I guess it's possible to modify redirect.py (I don't speak Python, but it
shouldn't be hard) and run it with -log. But maybe there's a simpler way?
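A simpler route than patching redirect.py may be to scan the dump yourself and only collect titles. A standalone sketch; the (title, text) pairs are an assumption, e.g. built from pywikipedia's xmlreader.XmlDump entries:

```python
import re


def titles_matching(entries, pattern):
    """Yield titles of pages whose text matches pattern, touching nothing.

    'entries' is any iterable of (title, text) pairs -- for example
    (e.title, e.text) for e in xmlreader.XmlDump(filename).parse().
    """
    rx = re.compile(pattern)
    for title, text in entries:
        if rx.search(text):
            yield title


# Example: list pages containing "redirect[[:Category"
pages = [('A', '#redirect[[:Category:Foo]]'), ('B', 'plain text')]
hits = list(titles_matching(pages, re.escape('redirect[[:Category')))
```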
Thanks in advance.
--
Chris Watkins
Appropedia.org - Sharing knowledge to build rich, sustainable lives.
blogs.appropedia.org
community.livejournal.com/appropedia
identi.ca/appropedia
twitter.com/appropedia
Hi all;
I think there is an error in xmlreader.py. When parsing a full-revision
XML dump (in this case [1]) with this code [2] (look at the try/except; it
writes a line whenever parsing fails), I correctly get the username,
timestamp and revision id, but sometimes the page title and page id are
None or an empty string.
The first error is:
['', None, 'QuartierLatin1968', '2004-10-10T04:24:14Z', '4267']
But if we do:
7za e -bd -so kwwiki-20100926-pages-meta-history.xml.7z 2>/dev/null | egrep
-i '2004-10-10T04:24:14Z' -C20
We get this [3], which looks correct: the page title and page id are
available in the XML, but they are not parsed correctly. And this is not
the only page title and page id that fail.
Perhaps I have missed something, since I'm still learning to parse XML;
sorry if that is the case.
Regards,
emijrp
[1]
http://download.wikimedia.org/kwwiki/20100926/kwwiki-20100926-pages-meta-hi…
[2] http://pastebin.ca/1951930
[3] http://pastebin.ca/1951937
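For what it's worth, here is a minimal, self-contained iterparse sketch (plain ElementTree over an inline mini-dump, not xmlreader.py itself). It reads the title and page id only at the </page> end event, once the whole page element is complete, and clears the element afterwards; reading them earlier, or clearing <page> too soon, is a common way to end up with None or '' titles. The tags follow the export schema, with the namespace declarations stripped for brevity:

```python
import xml.etree.ElementTree as ET
from io import StringIO

DUMP = """<mediawiki>
  <page>
    <title>Example</title>
    <id>42</id>
    <revision><id>4267</id>
      <contributor><username>QuartierLatin1968</username></contributor>
      <timestamp>2004-10-10T04:24:14Z</timestamp>
    </revision>
  </page>
</mediawiki>"""


def iter_pages(stream):
    """Yield (title, page id) for each complete <page> element."""
    for event, elem in ET.iterparse(stream, events=('end',)):
        if elem.tag == 'page':
            # findtext() only looks at direct children, so the page <id>
            # is never confused with the nested revision <id>.
            yield elem.findtext('title'), elem.findtext('id')
            elem.clear()  # free memory only after the page is fully read


pages = list(iter_pages(StringIO(DUMP)))
```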
For some reason I need to install Python 3.1 on my Windows XP machine, and I
want to keep 2.5 for pywikibot. Now I am looking for an easy and comfortable way
to separate them. I mean, if I type "something.py" at the command prompt in
my Pywikipedia directory, then the script should run under 2.5, and if I
type this in another directory, then it should run under 3.1. Is it
possible?
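The py launcher only shipped later (with Python 3.3), so with 2.5 and 3.1 a common workaround on Windows is a tiny wrapper batch file per directory instead of relying on the global .py file association. A config-style sketch; the interpreter paths are examples, adjust them to your installation:

```bat
@echo off
rem run25.bat -- drop this into the Pywikipedia directory and call
rem "run25 replace.py ..." there; a sibling run31.bat pointing at
rem C:\Python31\python.exe serves the other directory.
C:\Python25\python.exe %*
```

The global .py association can then stay pointed at whichever version you use more often.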
--
Bináris
A user in huwiki regularly runs this script to archive a lot of talk pages
and community pages:
http://hu.wikipedia.org/wiki/Szerkeszt%C5%91:Cherybot/archivebot_hu.py
This is some modified version of archivebot.py.
We have a community page:
http://hu.wikipedia.org/wiki/Wikip%C3%A9dia:B%C3%BCrokrat%C3%A1k_%C3%BCzen%…
It has five first-level headers (=title=), which is unusual.
When the bot archives a section above a =title= line, the =title= line goes
into the archive, too.
Now I have been asked to help correct this behaviour. I am not familiar with
the whole thing; I have never run archivebot.py.
The question is: has there been a problem like this on another wiki, and is
there a fix for it in the current version, or is it only our problem?
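Without seeing the modified script it is hard to say, but a common cause is a section-splitting regex that only recognizes ==second-level== headers, so a =title= line is treated as body text of the section above it. A standalone sketch (an illustration of the needed behaviour, not archivebot.py's actual code) that treats both header levels as boundaries:

```python
import re

# Match both =Title= and ==Section== lines; the backreference \1 forces
# the closing marker to have the same level as the opening one.
HEADER = re.compile(r'^(={1,2})[^=\n].*?\1\s*$', re.MULTILINE)


def split_sections(text):
    """Cut wikitext at every first- or second-level header line, so a
    =title= can never be dragged into the archive with the section
    above it."""
    starts = [m.start() for m in HEADER.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    starts.append(len(text))
    return [text[a:b] for a, b in zip(starts, starts[1:]) if text[a:b]]
```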
--
Bináris
Hello all
I'd like to suggest the code change given by the attached patch.
The idea is to change RegexFilterPageGenerator a little. The first
change turns the 'regex' parameter into a list of regexes instead of a
single one; the whole list is checked for a positive match. The second
change adds a new parameter 'invert' which, if set to True, changes the
generator from returning pages on ANY positive match to returning pages
on NO positive match at all. This way both a positive (additive) and a
negative (subtractive) filter behaviour can be achieved.
This would also be very helpful for my bot... ;)
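For readers of the list, the proposed semantics can be sketched standalone (plain title strings instead of Page objects; an illustration, not the actual pagegenerators.py code):

```python
import re


def regex_filter_generator(generator, regexes, invert=False):
    """Yield items whose title matches ANY of the given patterns.

    With invert=True the logic flips: an item passes only when NO
    pattern matches, giving a subtractive filter.
    """
    compiled = [re.compile(r) for r in regexes]
    for page in generator:
        matched = any(rx.search(page) for rx in compiled)
        if matched != invert:
            yield page
```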
Thanks a lot and greetings
DrTrigon
Hi!
Do you have any idea why, using replace.py on some large dumps, I get
this error message:
C:\pywikipedia>replace.py -xml:enwiki-20091128-pages-articles.xml
Please enter the text that should be replaced: impossibletofindword
Please enter the new text: found
Please enter another text that should be replaced, or press Enter to start:
The summary message will default to: Robot: Automated text
replacement (-impossibletofindword +found
)
Press Enter to use this default message, or enter a description of the
changes your bot will make: test
Reading XML dump...
Traceback (most recent call last):
File "C:\pywikipedia\pagegenerators.py", line 847, in __iter__
for page in self.wrapped_gen:
File "C:\pywikipedia\pagegenerators.py", line 779, in
DuplicateFilterPageGenerator
for page in generator:
File "C:\pywikipedia\replace.py", line 218, in __iter__
for entry in self.parser:
File "C:\pywikipedia\xmlreader.py", line 295, in new_parse
for rev in self._parse(event, elem):
File "C:\pywikipedia\xmlreader.py", line 304, in _parse_only_latest
yield self._create_revision(revision)
File "C:\pywikipedia\xmlreader.py", line 341, in _create_revision
redirect=self.isredirect
File "C:\pywikipedia\xmlreader.py", line 64, in __init__
self.username = username.strip()
AttributeError: 'NoneType' object has no attribute 'strip'
'NoneType' object has no attribute 'strip'
I updated pywikipedia to the latest revision, with no change.
As you can see, it does not seem to be related to user-fixes.py or to the regexes.
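The traceback suggests revisions whose <contributor> carries an <ip> instead of a <username>, or a deleted contributor, so username arrives as None at the failing line (self.username = username.strip() in xmlreader.py). One way to make that line tolerant, as a standalone sketch:

```python
def normalize_username(username):
    """Return a stripped username, or '' when the dump carries none
    (anonymous <ip> contributors, or a deleted contributor element)."""
    if username is None:
        return ''
    return username.strip()
```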
Thanks in advance!
Davide Bolsi
The first line of the pywiki modules is:
#!/usr/bin/python
Elsewhere I see #!/usr/bin/env python, and (as far as I remember) that form
is also used at docs.python.org. What is the difference? Forgive me the
question, I am a Windows user. Is it just a matter of habit and of where
you put your Python? Is either form more canonical?
The second line is usually:
# -*- coding: utf-8 -*-
Elsewhere I see
#coding: utf-8
Is there any difference?
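For the coding line there is no difference in effect: PEP 263 only requires that a comment on the first or second line of the file match the declared pattern; the emacs-style -*- decoration around it is pure convention. A quick check with a simplified version of the PEP's pattern:

```python
import re

# Simplified form of the PEP 263 pattern the interpreter looks for in a
# comment on the first or second line; everything else is decoration.
CODING = re.compile(r'coding[:=]\s*([-\w.]+)')

for line in ('# -*- coding: utf-8 -*-', '# coding: utf-8', '#coding=utf-8'):
    m = CODING.search(line)
    print(line, '->', m.group(1))
```

All three spellings declare the same encoding. (The shebang difference is separate: /usr/bin/python hardcodes one interpreter path, while /usr/bin/env python picks up whichever python is first on the PATH; on Windows neither line has any effect.)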
--
Bináris
Hi,
I noticed that there's a bug in the getOldVersion() method in page.py.
The error is on line 309: when it calls site.loadrevisions(), the third
parameter is 'revids=oldid'. loadrevisions() attempts to iterate over
that, and fails. The doc for getOldVersion() says 'oldid' should be "the
revid of the revision desired", so I patched line 309 to use
'revids=[oldid]', and that seems to work just fine.
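A standalone illustration of the failure mode (not the actual loadrevisions() code): a function that iterates over its revids parameter works for a list but raises TypeError for a bare integer, which is why wrapping the value as [oldid] fixes the call.

```python
def load_revisions(revids):
    """Mimics loadrevisions() iterating over its revids parameter."""
    return [int(r) for r in revids]


oldid = 12345
# load_revisions(oldid)       # TypeError: 'int' object is not iterable
revs = load_revisions([oldid])  # the patched call form works
```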
Cheers,
Morten