Hi!
First, I am not an expert here; second, my thoughts:
- sounds good! :)
- the two cases (new, append) are not really needed if you
just use append and delete the list yourself in the file
browser (but this is a philosophical issue)
- for your save/append code, have a look at [1] and maybe
[2] too. [1] contains code quite similar to your proposal,
and that code is already in use and works. As visible from
[2], I also need a '.decode('latin-1')'; this may differ
for you, since Unicode is quite mysterious... ;))
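To illustrate the '.decode('latin-1')' point: a codecs writer expects text (unicode), so a byte string has to be decoded before it is written. A minimal Python 3 sketch, assuming the incoming title arrives as Latin-1 bytes (the right encoding depends on your setup, as said above); 'to_text' is a hypothetical helper name.

```python
def to_text(raw, encoding='latin-1'):
    """Return raw as text, decoding it first if it is a byte string.

    The Latin-1 default is an assumption; use whatever encoding the
    bytes were actually produced with.
    """
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw

# b'Bin\xe1ris' is "Bináris" encoded as Latin-1 (0xE1 is "á").
print(to_text(b'Bin\xe1ris'))  # Bináris
print(to_text('Bináris'))      # already text, returned unchanged
```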
Hope this helps a bit!
Greetings
[1]
https://fisheye.toolserver.org/browse/drtrigon/pywikipedia/dtbext/dtbext_ba…
[2]
https://fisheye.toolserver.org/browse/drtrigon/pywikipedia/sum_disc.py?r=HE…
On 22.10.2010 11:16, Bináris wrote:
Hi!
My old problem is that replace.py can't write the list of pages to work on
into a file on my disk. For years I have used a modified version that makes
no changes but, in automated mode, writes the titles of the affected pages
to a subpage on Wikipedia; I can then make the replacements from that page
much more quickly than directly from a dump or the live Wikipedia.
But writing to a wiki page is slow and generates plenty of dummy edits.
In other words, replace.py can read the titles from a file (-file) or from
a wiki page (-links), but has no way to generate such a file.
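For the round trip, the saved file only has to contain [[Title]] links that -file can pick up again. The sketch below is an assumption about how such a file could be parsed back (a regular expression for [[...]] links that stops at "|" or "]]"); it is not the actual pywikipedia code behind -file, which may differ in details. 'read_titles' is a hypothetical name.

```python
import re

# Match the target of a [[...]] link, stopping at "|" (piped text) or "]]".
LINK_RE = re.compile(r'\[\[(.+?)(?:\]\]|\|)')

def read_titles(lines):
    """Yield page titles found as [[...]] links in the given lines."""
    for line in lines:
        for match in LINK_RE.finditer(line):
            yield match.group(1)

sample = ['# [[Budapest]]\n', '# [[Bináris|user]]\n', 'no link here\n']
print(list(read_titles(sample)))  # ['Budapest', 'Bináris']
```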
Now I am ready to rewrite it. This way we can start the bot, and it will
find all the possible articles to work on and save their titles without
editing Wikipedia (and without artificial delays), while we have lunch,
run a marathon or sleep. Then we make the replacements from this file
with -file.
My idea is that replace.py should have two new parameters:
-save writes the results into a new file instead of editing articles. It
overwrites an existing file without notice.
-saveappend writes into a new file or appends to an existing one.
OR:
-save writes and appends (primary mode)
-savenew writes and overwrites
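The two proposed behaviours come down to the file mode: 'w' truncates an existing file, 'a' appends to it, and both create the file if it is missing. A small Python 3 sketch mapping the option names of the first variant (-save / -saveappend) to a mode; the mapping and the 'write_titles' name are hypothetical, not existing replace.py code.

```python
import os
import tempfile

# Hypothetical mapping for the first variant of the proposal.
MODES = {'-save': 'w', '-saveappend': 'a'}

def write_titles(path, titles, option):
    """Write one '# [[Title]]' line per title, overwriting or appending."""
    with open(path, MODES[option], encoding='utf-8') as f:
        for title in titles:
            f.write('# [[%s]]\n' % title)

path = os.path.join(tempfile.mkdtemp(), 'cikkek.txt')
write_titles(path, ['A'], '-save')         # new file: A
write_titles(path, ['B'], '-saveappend')   # appended: A and B
write_titles(path, ['C'], '-save')         # overwritten: only C
with open(path, encoding='utf-8') as f:
    content = f.read()
print(repr(content))  # '# [[C]]\n'
```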
The relevant documentation is here:
http://docs.python.org/howto/unicode.html#reading-and-writing-unicode-data
So we have to import codecs.
My script is:
import codecs
articles = codecs.open('cikkek.txt', 'a', encoding='utf-8')
...
# needs rewrite to the new syntax
tutuzuzu = u'# %s\n' % page.aslink()
# unicode() should be a no-op here, since tutuzuzu is already a unicode
# string; needs further testing
articles.write(unicode(tutuzuzu))
articles.flush()
It works fine, except that '\n' is a Unix-style newline, which has to be
converted with lfcr.py to make the file readable in notepad.exe.
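On the newline issue: codecs.open always opens the underlying file in binary mode, so no '\n' translation happens and the output keeps Unix line endings. In Python 3 the built-in open() can do the conversion itself through its newline parameter, which would make the lfcr.py step unnecessary. A sketch, assuming the file should always get Windows (CRLF) line endings; 'save_titles' is a hypothetical name.

```python
import os
import tempfile

def save_titles(path, titles, mode='a'):
    """Append (or, with mode='w', overwrite) titles as '# [[Title]]' lines."""
    # newline='\r\n' makes open() translate every '\n' written into '\r\n',
    # so the file shows proper line breaks in notepad.exe.
    with open(path, mode, encoding='utf-8', newline='\r\n') as f:
        for title in titles:
            f.write('# [[%s]]\n' % title)
            f.flush()  # keep partial results if a long run is interrupted

path = os.path.join(tempfile.mkdtemp(), 'cikkek.txt')
save_titles(path, ['Budapest', 'Bináris'])
with open(path, 'rb') as f:
    data = f.read()
print(data)  # b'# [[Budapest]]\r\n# [[Bin\xc3\xa1ris]]\r\n'
```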
This uses a hard-coded filename; it should be taken from the command line
instead.
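For taking the filename from the command line, a minimal sketch in the -option:value style of replace.py's other parameters (such as -file). It follows the second variant of the proposal (-save appends, -savenew overwrites); the 'parse_save_option' name and the default filename are hypothetical.

```python
def parse_save_option(args, default='cikkek.txt'):
    """Return (filename, mode) from '-save[:file]' or '-savenew[:file]'.

    -save appends ('a') and -savenew overwrites ('w'). A bare option or an
    empty value falls back to the default filename. Returns None if
    neither option is present.
    """
    for arg in args:
        for option, mode in (('-savenew', 'w'), ('-save', 'a')):
            if arg == option or arg.startswith(option + ':'):
                filename = arg[len(option) + 1:] or default
                return filename, mode
    return None

print(parse_save_option(['-save:titles.txt']))  # ('titles.txt', 'a')
print(parse_save_option(['-savenew']))          # ('cikkek.txt', 'w')
print(parse_save_option(['-file:x.txt']))       # None
```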
Your opinions before I begin?
--
Bináris
_______________________________________________
Pywikipedia-l mailing list
Pywikipedia-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l