Hi!
My old problem is that replace.py can't write the pages to work on into a
file on my disk. For years I have used a modified version that makes no
changes but writes the titles of the involved pages to a subpage on Wikipedia
in automated mode; I can then make the replacements from that page much
more quickly than directly from a dump or the live Wikipedia. But this is
slow and generates plenty of dummy edits.
In other words, replace.py has a tool to get the titles from a file (-file)
or from a wiki page (-links), but no tool to generate this file.
Now I am ready to rewrite it. This way we can start the bot, and it will find
all the possible articles to work on and save their titles without editing
Wikipedia (and without artificial delay); meanwhile we can have lunch, run a
marathon, or sleep. Then we make the replacements from this file with -file.
My idea is that replace.py should have two new parameters:
-save writes the results into a new file instead of editing articles. It
overwrites an existing file without notice.
-saveappend writes into a new file or appends to an existing one.
OR:
-save writes and appends (primary mode)
-savenew writes and overwrites
The help is here:
http://docs.python.org/howto/unicode.html#reading-and-writing-unicode-data
So we have to import codecs.
My script is:
import codecs

articles = codecs.open('cikkek.txt', 'a', encoding='utf-8')
...
tutuzuzu = u'# %s\n' % page.aslink()  # <-- needs rewrite to the new syntax
articles.write(unicode(tutuzuzu))  # <-- needs further testing: unicode() may not be needed, codecs encodes on write
articles.flush()
It works fine, except that '\n' is a Unix-style newline that has to be
converted by lfcr.py to make the file readable with notepad.exe.
The filename is currently a constant; that should be developed so it can be
given on the command line.
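A minimal sketch of how the two remaining pieces (filename from the command
line, Windows-friendly newlines) could fit together. The -save:filename
syntax and both helper names are assumptions for illustration, not existing
replace.py options:

```python
import codecs

def open_title_log(argv):
    # Hypothetical parsing of a "-save:filename" argument; the real
    # option names for replace.py are still to be decided.
    filename = 'cikkek.txt'
    for arg in argv:
        if arg.startswith('-save:'):
            filename = arg[len('-save:'):]
    # codecs.open() keeps the file in binary mode, so writing '\r\n'
    # explicitly keeps the result readable in notepad.exe without a
    # separate conversion step such as lfcr.py.
    return codecs.open(filename, 'a', encoding='utf-8')

def log_title(logfile, title):
    # One "# [[Title]]" line per page, flushed immediately so the file
    # is usable even while the scan is still running.
    logfile.write(u'# [[%s]]\r\n' % title)
    logfile.flush()
```

With this, the resulting titles file can later be fed back to replace.py with -file.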
Your opinions before I begin?
--
Bináris
I want to read a special page with Page.get(). The message is:
  File "C:\Program Files\Pywikipedia\wikipedia.py", line 601, in get
    raise NoPage('%s is in the Special namespace!' % self.aslink())
pywikibot.exceptions.NoPage
What is the solution?
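Since Special: pages are virtual and have no wikitext, get() raises NoPage
for them by design. The usual workaround is to guard on the namespace before
fetching; a rough sketch of the idea, not the framework's own API:

```python
# Special: pages live in namespace -1 and have no wikitext, so calling
# get() on them raises NoPage. Guard on the namespace first and handle
# those pages separately (e.g. skip them, or fetch their HTML instead).
def safe_get(page):
    if page.namespace() == -1:   # the Special: namespace
        return None              # nothing to fetch for these pages
    return page.get()
```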
--
Bináris
Hello all,
This is especially relevant for all interwiki bots on the toolserver.
Do *not* use python 2.7 for those bots.
There is a bug [1] in the unicode normalization that causes page
titles to become mangled [2]. This, in turn, results in bot wars [3].
As such, interwiki bots on Wikipedia should use a python version that
does not have this bug, which means using a version before 2.6.5.
You will get a warning message when running a python version that
exhibits this bug, but the bot will still work, so you may very well
cause bot wars if you start using py2.7.
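For bots that cannot control which interpreter they run under, a startup
self-test of the kind below could at least make the risk visible. This only
illustrates the shape of such a check; it does not reproduce the concrete
titles mangled by the bug:

```python
import unicodedata

# NFC normalization should be idempotent: normalizing twice must give
# the same result as normalizing once. An interpreter affected by
# CPython issue 10254 can violate this for certain inputs; this helper
# only demonstrates the kind of self-test a bot could run at startup.
def nfc_is_stable(text):
    once = unicodedata.normalize('NFC', text)
    return unicodedata.normalize('NFC', once) == once
```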
Best regards,
Merlijn van Deen
[1] http://bugs.python.org/issue10254
[2] http://sourceforge.net/tracker/?func=detail&atid=603138&aid=3081100&group_i…
[3] http://de.wikipedia.org/w/index.php?title=GNU-Lizenz_f%C3%BCr_freie_Dokumen…
On 22 November 2010 11:22, River Tarnell <river.tarnell(a)wikimedia.de> wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi,
>
> During the general maintenance on Dec 6th, we will change the default Python
> version (/usr/bin/python) on the Solaris user servers from 2.6 to 2.7. You may
> wish to test your tools with /usr/bin/python2.7 before then.
>
> - river.
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.16 (FreeBSD)
>
> iEYEARECAAYFAkzqREgACgkQIXd7fCuc5vIhFQCgpX20z0B9xHikuwl+yiEUDzFH
> WjYAn1wqm21wZjP1uQhsEO7RkxlTyE/N
> =CqUE
> -----END PGP SIGNATURE-----
>
> _______________________________________________
> Toolserver-l mailing list (Toolserver-l(a)lists.wikimedia.org)
> https://lists.wikimedia.org/mailman/listinfo/toolserver-l
> Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette
Hello guys.
2010/10/10 <xqt(a)svn.wikimedia.org>:
> Revision: 8630
> Author: xqt
> Date: 2010-10-09 19:32:57 +0000 (Sat, 09 Oct 2010)
>
> Log Message:
> -----------
> import wikipedia as pywikibot for merging to rewrite
>
[...]
>
> Modified: trunk/pywikipedia/reflinks.py
> ===================================================================
> --- trunk/pywikipedia/reflinks.py 2010-10-09 16:11:46 UTC (rev 8629)
> +++ trunk/pywikipedia/reflinks.py 2010-10-09 19:32:57 UTC (rev 8630)
> @@ -33,15 +33,19 @@
> Basic pagegenerators commands, -page, etc...
> """
> # (C) 2008 - Nicolas Dumazet ( en:User:NicDumZ )
> +# (C) Pywikipedia bot team, 2008-2010
> #
> -# Distributed under the terms of the GPL
> -
> +# Distributed under the terms of the MIT license.
A few things are wrong with this commit:
1) The changes do not match the commit message; a license change is not
related to the merge work.
2) You cannot change the license of a script without asking ALL
contributors to the file for their permission.
No one asked me whether it was OK for me to switch from one license to another.
Note that I am personally fine with changing the license if it's required,
but doing so without asking the original authors can seriously harm
the project....
Regards,
--
Nicolas Dumazet — NicDumZ
Hi,
Got a KeyError in page.py at line 2067; lines 2066-67 read:

    yield Page(self.site, contrib['title'], contrib['ns']), \
        contrib['revid'], ts, contrib['comment']
When an edit comment has been deleted, the dictionary returned for that
edit by site.usercontribs() doesn't contain the 'comment' key, resulting
in a KeyError. I've patched my local copy to handle the problem, but I
don't know whether it's more appropriate to handle this in
User.contributions(), Site.usercontribs(), or somewhere else.
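One way to handle it is to fall back to None when the key is missing. A
self-contained sketch of the idea (make_page stands in for the Page
constructor; this is not the actual patch):

```python
# Tolerant version of the tuple yielded by User.contributions(): a
# revision-deleted comment becomes None instead of raising KeyError.
# `contrib` is the per-edit dict returned by site.usercontribs().
def contribution_tuple(site, contrib, ts, make_page):
    page = make_page(site, contrib['title'], contrib['ns'])
    return page, contrib['revid'], ts, contrib.get('comment', None)
```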
Cheers,
Morten
I'm using replace.py to create wikilinks. Usually I want to select only the
first occurrence of the search string, and my command works fine for this.
But sometimes, the first hit is not suitable (e.g. it's part of a book or
course title, so I don't want to add the wikilink). If I choose n for no,
the bot goes to the next page.
Is there a way to skip to the next occurrence on the same page? I'm
guessing it will need a modified version of replace.py that offers an
extra option besides [y]es, [N]o, [e]dit, open in [b]rowser, [a]ll, and
[q]uit.
The actual command I'm using is:
python replace.py -regex "(?si)\b((?:FOO1|FOO2))\b(.*$)
" "[[\\1]]\\2" -exceptinsidetag:link -exceptinsidetag:hyperlink
-exceptinsidetag:header -exceptinsidetag:nowiki -exceptinsidetag:ref
-excepttext:"(?si)\[\[((?:FOO1|FOO2)[\|\]])" -namespace:0 -namespace:102
-namespace:4 -summary:"[[Appropedia:Wikilink bot]] adding double square
brackets to: FOO1|FOO2." -log -xml:currentdump.xml
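A per-occurrence prompt would mean asking once per match and rebuilding the
page text, rather than one answer per page. A rough sketch of that logic,
not the real replace.py code:

```python
import re

# Ask a callback for every individual match; True means replace this
# occurrence, False means keep it and move on to the next one on the
# same page. replace.py itself does not work this way today.
def interactive_replace(text, pattern, replacement, ask):
    out = []
    last = 0
    for match in re.finditer(pattern, text):
        out.append(text[last:match.start()])
        if ask(match):
            out.append(match.expand(replacement))
        else:
            out.append(match.group(0))
        last = match.end()
    out.append(text[last:])
    return ''.join(out)
```

Hooking such a loop into replace.py's prompt would allow an extra choice
like [s]kip this occurrence alongside the existing ones.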
Many thanks!
--
Chris Watkins
Appropedia.org - Sharing knowledge to build rich, sustainable lives.
blogs.appropedia.org
identi.ca/appropedia
twitter.com/appropedia
Hello all
Just looked at the last SVN changes and thought this patch could be good
to apply. In fact it only changes some comments, but I think it's
needed; maybe you will too. ;)
I did not open a ticket, since I don't think it is THAT relevant, because
it's not a bug.
Greetings
Hello xqt and valhallasw!
Just wanted to give a final comment to this topic. ;)
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2010-11-06 04:17
Message:
KeyError fixed in r8701 as recommended.
Maybe there are follow-ups of this bug in other scripts, depending on
whether the comment is None.
----------------------------------------------------------------------
THANKS A LOT FOR IMPLEMENTING THIS!
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2010-11-04 14:44
Message:
I disagree with the statement "an unicode string is expected and thus a
None (..) is not a good idea". The comment is hidden, which is different
from an empty comment. Using None is much more sensible.
In addition, the code can be simplified by using contrib.get('comment',
None) instead of the current if/then.
----------------------------------------------------------------------
YEA, I WAS NOT SURE WHICH ONE TO CHOOSE. 'None' SEEMS THE MORE SENSIBLE
SOLUTION, BUT I DID NOT WANT TO BREAK OTHER CODE... ;)
INDEED, THE SIMPLIFIED CODE IS A LOT BETTER - I WAS FOCUSED ON
TRIGGERING ON 'commenthidden', WHICH IS NOT REALLY NEEDED.
AFTER ALL, LOOKING AT THE CODE, THE 'None' GETS FED INTO A PAGE OBJECT,
SO THAT IS THE ONLY PART THAT COULD BREAK, AND IT SHOULD NOT AFFECT ANY
OTHER CODE... SEEMS LIKE THE PERFECT SOLUTION.
Greetings and thanks
Dr. Trigon
Hi,
I have an enhancement that I wrote for myself.
Now, with replace.py r8700, using -xml and -save, we can collect the
articles to work on in automatic mode. But sometimes this takes a lot of
time, and we would like to know whether it will finish in the near future or
needs more time. For example, if I have to leave home, I want to know
whether the task will end in 15 minutes or I have to quit it.
The XML dump contains articles mainly in the order of their creation, so
knowing the date of the first edit of the article currently on the screen is
useful. It may cause a very slight decrease in speed. My replace.py writes
this date to the screen after every 20th title when in automatic mode (but
perhaps it could do so for each title).
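The idea can be sketched as a thin wrapper around the dump iteration; the
(title, first_edit_timestamp) pair shape is an assumption about the dump
reader, not its real interface:

```python
# Print a progress line with the page's creation date every 20th title,
# so the operator can estimate how far into the (roughly chronological)
# dump the scan has progressed.
def with_progress(entries, interval=20):
    for count, (title, created) in enumerate(entries, start=1):
        if count % interval == 0:
            print('%6d pages scanned, now at "%s" (created %s)'
                  % (count, title, created))
        yield title, created
```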
The question is: is it worth building into the framework for public use, or
is it only of interest to me? Shall I put it on SF or forget it?
--
Bináris
I want to generate a list of matches for a search, but not do anything to
the pages.
E.g. I want to list all pages that contain "redirect[[:Category", but I
don't want to modify them.
I guess it's possible to modify redirect.py (I don't speak Python, but
it shouldn't be hard) and run it with -log. But maybe there's a simpler way?
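The whole task boils down to collecting titles instead of editing. A
minimal sketch, assuming any iterable of (title, text) pairs such as an XML
dump reader provides (this is not an existing pywikipedia entry point):

```python
# List the titles of pages whose text contains the search string,
# without touching any page.
def titles_containing(pages, needle):
    return [title for title, text in pages if needle in text]
```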
Thanks in advance.
--
Chris Watkins
Appropedia.org - Sharing knowledge to build rich, sustainable lives.
blogs.appropedia.org
community.livejournal.com/appropedia
identi.ca/appropedia
twitter.com/appropedia