I'm using replace.py to create wikilinks. Usually I want to select only the
first occurrence of the search string, and my command works fine for this.
But sometimes, the first hit is not suitable (e.g. it's part of a book or
course title, so I don't want to add the wikilink). If I choose n for no,
the bot goes to the next page.
Is there a way I can skip to the next occurrence in the same page? I'm
guessing it will need a modified version of replace.py, so that it gives an
extra option besides ([y]es, [N]o, [e]dit, open in [b]rowser, [a]ll,
[q]uit).
The actual command I'm using is:
python replace.py -regex "(?si)\b((?:FOO1|FOO2))\b(.*$)" "[[\\1]]\\2"
  -exceptinsidetag:link -exceptinsidetag:hyperlink
  -exceptinsidetag:header -exceptinsidetag:nowiki -exceptinsidetag:ref
  -excepttext:"(?si)\[\[((?:FOO1|FOO2)[\|\]])" -namespace:0 -namespace:102
  -namespace:4
  -summary:"[[Appropedia:Wikilink bot]] adding double square brackets to: FOO1|FOO2."
  -log -xml:currentdump.xml
Many thanks!
--
Chris Watkins
Appropedia.org - Sharing knowledge to build rich, sustainable lives.
blogs.appropedia.org
identi.ca/appropedia
twitter.com/appropedia
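
The behaviour asked for above can be pictured independently of replace.py: walk the matches with `re.finditer` and ask about each occurrence in turn, stopping at the first one that is accepted. The following is a standalone sketch, not a patch to replace.py. `link_first_accepted` and the `accept` callback are hypothetical names, and the pattern is simplified from the command above.

```python
import re


def link_first_accepted(text, pattern, accept):
    """Wikilink the first occurrence of `pattern` that `accept` approves.

    `accept` is a hypothetical callback standing in for the interactive
    prompt: it receives the match object and returns True to link this
    occurrence, or False to move on to the next one on the same page.
    """
    for match in re.finditer(pattern, text):
        if accept(match):
            start, end = match.span()
            return text[:start] + "[[" + match.group(0) + "]]" + text[end:]
    # No occurrence was accepted: leave the page text unchanged.
    return text


if __name__ == "__main__":
    page = "Read the FOO1 Handbook first. Later, FOO1 is explained."

    def not_in_title(match):
        # Skip an occurrence that is directly followed by " Handbook".
        return not page[match.end():].startswith(" Handbook")

    print(link_first_accepted(page, r"(?i)\b(?:FOO1|FOO2)\b", not_in_title))
```

In an interactive bot, `accept` would show the surrounding context and read a yes/no answer; an extra "next occurrence" choice in the prompt would correspond to returning False here.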
I want to generate a list of matches for a search, but not do anything to
the page.
E.g. I want to list all pages that contain "redirect[[:Category", but I
don't want to modify the pages.
I guess that it's possible to modify redirect.py (I don't speak Python, but
it shouldn't be hard) and run it with -log. But maybe there's a simpler way?
Thanks in advance.
--
Chris Watkins
Appropedia.org - Sharing knowledge to build rich, sustainable lives.
blogs.appropedia.org
community.livejournal.com/appropedia
identi.ca/appropedia
twitter.com/appropedia
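
For the read-only listing asked about above, one option that does not touch the wiki at all is to scan an XML dump directly. The sketch below uses only the Python standard library, not pywikipedia. It assumes a dump with one revision per page (as in a "current" dump), and `titles_containing` is a hypothetical helper name.

```python
import io
import xml.etree.ElementTree as ET


def titles_containing(dump, needle):
    """Yield titles of pages in a MediaWiki XML dump whose text has `needle`.

    `dump` is a filename or a binary file object. The dump is only read;
    nothing is written to any wiki.
    """
    title = None
    for _event, elem in ET.iterparse(dump, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace, if any
        if tag == "title":
            title = elem.text
        elif tag == "text":
            # An empty <text/> element has elem.text == None.
            if needle in (elem.text or ""):
                yield title
        elif tag == "page":
            elem.clear()  # release the page's content once it is processed


if __name__ == "__main__":
    sample = (
        b"<mediawiki>"
        b"<page><title>A</title><revision>"
        b"<text>#redirect[[:Category:X]]</text></revision></page>"
        b"<page><title>B</title><revision>"
        b"<text>plain text</text></revision></page>"
        b"</mediawiki>"
    )
    for found in titles_containing(io.BytesIO(sample), "redirect[[:Category"):
        print(found)
```

Passing a filename such as the dump used with -xml instead of the in-memory sample would list matches from a real dump.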
Hi!
Do you have any idea why, using replace.py on some large dumps, I get
this error message:
C:\pywikipedia>replace.py -xml:enwiki-20091128-pages-articles.xml
Please enter the text that should be replaced: impossibletofindword
Please enter the new text: found
Please enter another text that should be replaced, or press Enter to start:
The summary message will default to: Robot: Automated text replacement (-impossibletofindword +found)
Press Enter to use this default message, or enter a description of the
changes your bot will make: test
Reading XML dump...
Traceback (most recent call last):
  File "C:\pywikipedia\pagegenerators.py", line 847, in __iter__
    for page in self.wrapped_gen:
  File "C:\pywikipedia\pagegenerators.py", line 779, in DuplicateFilterPageGenerator
    for page in generator:
  File "C:\pywikipedia\replace.py", line 218, in __iter__
    for entry in self.parser:
  File "C:\pywikipedia\xmlreader.py", line 295, in new_parse
    for rev in self._parse(event, elem):
  File "C:\pywikipedia\xmlreader.py", line 304, in _parse_only_latest
    yield self._create_revision(revision)
  File "C:\pywikipedia\xmlreader.py", line 341, in _create_revision
    redirect=self.isredirect
  File "C:\pywikipedia\xmlreader.py", line 64, in __init__
    self.username = username.strip()
AttributeError: 'NoneType' object has no attribute 'strip'
'NoneType' object has no attribute 'strip'
I updated pywikipedia to the latest revision, but the error persists.
As you can see it does not seem to be user-fixes.py or regex-related.
Thanks in advance!
Davide Bolsi
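
The traceback above ends with `username.strip()` being called on `None`, which suggests that some revision in the dump carries no username (an assumption; the dump itself was not inspected). A minimal sketch of a defensive guard follows. `XmlEntrySketch` is a hypothetical stand-in, not the real class from xmlreader.py, and only the one relevant assignment is modelled.

```python
class XmlEntrySketch:
    """Hypothetical stand-in for the entry object built in xmlreader.py."""

    def __init__(self, title, username, text):
        self.title = title
        # Treat a missing username (None) as an empty string instead of
        # calling .strip() on None, which raises AttributeError.
        self.username = (username or "").strip()
        self.text = text


if __name__ == "__main__":
    entry = XmlEntrySketch("Some page", None, "page text")
    print(repr(entry.username))
```

Whether an empty string is the right fallback depends on how the rest of the framework uses `username`; that choice is an assumption of this sketch.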
Hi Russell,
The main reason I have not switched to the rewrite branch is that I have not got it running yet. I get an ImportError for simplejson, and I have no idea how to set PYTHONPATH when working with IDLE. The trunk, by contrast, is easy to use: install Python, download the bot, unpack it, and run it. This is the usability I would expect.
Most of the scripts are out of date, since they have been modified in trunk but not updated in the rewrite branch. I guess both forks will have to be developed in parallel for a while, until all (main) scripts are merged. I could support the rewrite development, but since I cannot test that code, I won't.
I also have reservations about development for older MediaWiki versions being cut off.
Regards
----- Original Message -----
From: Russell Blau <russblau(a)imapmail.org>
To: Pywikipedia discussion list <pywikipedia-l(a)lists.wikimedia.org>
Date: 30.03.2010 16:18
Subject: [Pywikipedia-l] Request for feedback on rewrite branch
> I am at a point where it would be helpful to have some feedback from other
> Pywikipedia users about the future of the rewrite branch. As those who
> watch the SVN commits know, I have not had as much time to work on this
> lately, and have to prioritize what time I do spend on it.
>
> For those who have used the rewrite branch, what (if anything) needs to be
> done to it to get you to use it exclusively and retire the old wikipedia.py
> system? What is missing? What is broken? What is present but could be
> improved?
>
> For those who have chosen not to use the rewrite branch, why not? What
> might lead you to take another look?
>
> And then, I'm sure there are many whose reaction to this post has been,
> "What's the rewrite branch?" I don't know what to ask you, so feel free to
> move on to the next message.
>
> Most critically, is there any reason to continue development of the trunk
> once the rewrite branch is at a point where most users are ready to switch
> to it?
>
> -- Russ
>
>
> _______________________________________________
> Pywikipedia-l mailing list
> Pywikipedia-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hello all,
There is a discussion on enwiki [1] which concerns bots editing while
logged out. The proposal is to block the toolserver IP range to prevent
this from happening -- however, this shouldn't be necessary since bots
should never edit while logged out.
Bot frameworks should help bot operators and authors by using an edit
assertion [2] - is anyone aware of a framework which /doesn't/ do that?
So, this is a reminder to please take care when writing scripts: check
if your framework handles this for you, and if not, be sure to write
your bot such that it edits only while logged in.
If the toolserver IP range does end up being blocked, is anyone aware of
read-only tools which would be adversely affected by the IP being
blocked? Such tools are poorly designed, and their authors should fix
them sooner rather than later, even if the block isn't placed in the end.
- -Mike
[1]
http://en.wikipedia.org/?oldid=364741979#Proposal_to_softblock_Toolserver_I…
[2] http://www.mediawiki.org/wiki/Extension:Assert_Edit
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iEYEARECAAYFAkwDPWUACgkQst0AR/DaKHtTlACfW8XWbH/Pi1Yh1ExkaHVhabLa
+y0AnRBSsHt7QYcWbUcPu4rANySCkFO2
=989j
-----END PGP SIGNATURE-----
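
As an illustration of the edit assertion mentioned in the message above: with the Assert Edit extension linked as [2], an API edit request can carry an `assert` parameter (`user` or `bot`), so that the wiki refuses the edit if the session is not logged in accordingly. The sketch below only builds the request body and performs no network access. `build_edit_request` is a hypothetical helper, and the title, text, and token values are placeholders.

```python
from urllib.parse import urlencode


def build_edit_request(title, text, summary, token, assertion="user"):
    """Return a URL-encoded POST body for an API edit with an assertion.

    With assertion="user" the wiki is asked to reject the edit unless the
    request comes from a logged-in session; "bot" additionally requires
    the bot flag.
    """
    if assertion not in ("user", "bot"):
        raise ValueError("assertion must be 'user' or 'bot'")
    params = {
        "action": "edit",
        "format": "json",
        "title": title,
        "text": text,
        "summary": summary,
        "assert": assertion,
        "token": token,
    }
    return urlencode(params)


if __name__ == "__main__":
    print(build_edit_request("Sandbox", "Hello", "test edit", "PLACEHOLDER"))
```

A framework that always adds this parameter cannot save an edit anonymously by accident, which is the property the message asks bot authors to check for.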
Hello! I would like to use Pywikipedia to build a bot for the French wiki
called Vikidia (an encyclopedia for children, naturally based on
MediaWiki). Because the file vikidia_family.py did not exist, I wrote it
(following the model at
meta.wikimedia.org/wiki/Pywikipediabot/Use_on_non-WMF_wikis).
However, when I now run replace.py (for example), everything seems fine
at first, but then I get a "missingtitle" error, which I do not
understand and which prevents me from using the script. Once, when
running another pywikipedia script (which was supposed to create the
bot's user subpage called "test" and write "test" in it), I actually
created a page called "Api.php" in the main namespace! :)
That is why I think my hand-made vikidia_family.py is wrong, but I do
not know where. Could anybody help me with some information or a piece
of advice? Thank you very much!
thilp
I've made a fix to the interwiki bot that works as described in my proposal on my talk page at de-wiki.
With this modification, interwiki edits are allowed in the following cases:
- if there are no interwiki links at all on the page
- if an interwiki link must be removed
- if there is more than one interwiki link to change or add
- if the last edit was made by a human
- if the last edit was made 1 month ago
This is a bit similar to the -whenneeded option.
I could commit this patch to SVN, but I am reluctant to do so without any concessions from the Icelandic bureaucrats.
Any comments?
Best regards
xqt
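
The five cases listed in the message above can be read as a single predicate. The sketch below is an interpretation, not the actual patch: the function and parameter names are hypothetical, and "1 month ago" is assumed to mean "at least 30 days ago".

```python
from datetime import datetime, timedelta


def interwiki_edit_allowed(existing_links, links_to_remove,
                           links_to_add_or_change, last_edit_by_human,
                           last_edit_time, now):
    """Return True if any of the five listed conditions holds."""
    if not existing_links:
        return True  # no interwiki links at all on the page
    if links_to_remove:
        return True  # an interwiki link must be removed
    if len(links_to_add_or_change) > 1:
        return True  # more than one link to change or add
    if last_edit_by_human:
        return True  # the last edit was made by a human
    # Assumption: "1 month ago" is read as "at least 30 days ago".
    return now - last_edit_time >= timedelta(days=30)


if __name__ == "__main__":
    now = datetime(2010, 6, 1)
    recent_bot_edit = datetime(2010, 5, 25)
    # A single added link on a page recently edited by a bot.
    print(interwiki_edit_allowed(["de"], [], ["en"], False,
                                 recent_bot_edit, now))
```

Under this reading, the only edits suppressed are single-link additions or changes on pages whose last edit was a recent bot edit.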
----- Original Message -----
From: Alex Brollo <alex.brollo(a)gmail.com>
To: Pywikipedia discussion list <pywikipedia-l(a)lists.wikimedia.org>
Date: 28.05.2010 11:54
Subject: Re: [Pywikipedia-l] Bot run on the Icelandic Wikipedia
> 2010/5/28 Arkaitz Zubiaga <arkaitz.zubiaga(a)gmail.com>
>
> >
> > 2010/5/27 <info(a)gno.de>
> >
> >> Is anybody invited to reduce or disable bot edits due to icelandic bot
> >> policy? I made a proposal on my talk page:
> >>
> >> http://de.wikipedia.org/wiki/Benutzer_Diskussion:Xqt#Please_don.27t_run_on_the_Icelandic_Wikipedia
> >> If anybody wants to join to this discussion, you are welcome.
> >
> >
> I posted a possible solution into the talk page you linked.
> Alex
>
>
Hello everybody. Today, while working with featured.py, I got the following errors when trying to get pages from categories on als@wiki and am@wiki (I don't know whether this happens with other wikis):
"ValueError: BUG: Wikipedia:Bsunders glungener Artikel is not in the category namespace!"
"ValueError: BUG: Wikipedia:Featured article is not in the category namespace!"
What's happening?
--Manuelt15
On Wed, May 26, 2010 at 10:07:32AM +0000, nicdumz(a)svn.wikimedia.org wrote:
> Modified: branches/rewrite/pywikibot/page.py
> ===================================================================
> --- branches/rewrite/pywikibot/page.py 2010-05-26 09:44:23 UTC (rev 8217)
> +++ branches/rewrite/pywikibot/page.py 2010-05-26 10:07:32 UTC (rev 8218)
> @@ -102,7 +102,7 @@
> self.__dict__ = source.__dict__
> if title:
> # overwrite title
> - self._link = Link(title, source=source, defaultNamespace=ns)
> + self._link = Link(title, source=source.site, defaultNamespace=ns)
For the time being, this should probably be "source=source.site(), ...", no?
(considering that source is a Page object, which is the case here).
stan.
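
The point raised above can be shown in isolation: if `site` is a method on the page object (as the comment assumes for this revision), then `source.site` without parentheses is a bound method rather than a site object. The classes below are hypothetical stand-ins, not the real pywikibot classes.

```python
class SiteSketch:
    """Hypothetical stand-in for a Site object."""

    def __init__(self, code):
        self.code = code


class PageSketch:
    """Hypothetical stand-in for a Page whose site is reached via a method."""

    def __init__(self, site):
        self._site = site

    def site(self):
        return self._site


if __name__ == "__main__":
    source = PageSketch(SiteSketch("en"))
    # Without parentheses this is the bound method, not a SiteSketch.
    print(callable(source.site))
    # With parentheses the method is called and the site is returned.
    print(isinstance(source.site(), SiteSketch))
```

Code that receives the bound method and later tries to use it as a site would fail only at that later point, which makes this kind of slip easy to miss in review.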