Hi!
I am no expert here, but here are my thoughts:
- Sounds good! :)
- The two cases (new, append) are not really needed if you always append and delete the list yourself in the file browser (but this is a philosophical issue).
- For your save/append code, have a look at [1] and maybe also [2]. [1] contains code quite similar to your proposal; it is already in use and works. As you can see in [2], I also needed a '.decode('latin-1')'; this may differ for you, since unicode is quite mysterious... ;))
Hope this helps a bit! Greetings
[1] https://fisheye.toolserver.org/browse/drtrigon/pywikipedia/dtbext/dtbext_bas...
[2] https://fisheye.toolserver.org/browse/drtrigon/pywikipedia/sum_disc.py?r=HEA...
On 22.10.2010 11:16, Bináris wrote:
Hi!
My old problem is that replace.py can't write the titles of the pages to work on into a file on my disk. For years I have used a modified version that makes no changes but, in automated mode, writes the titles of the affected pages to a subpage on Wikipedia; I can then make the replacements from that page much more quickly than directly from a dump or the live Wikipedia. This is slow and generates plenty of dummy edits.
In other words, replace.py has a tool to get the titles from a file (-file) or from a wikipage (-links), but has no tool to generate this file.
Now I am ready to rewrite it. This way we can start it and the bot will find all the possible articles to work on and save the titles without editing Wikipedia (and without artificial delay); meanwhile we can have lunch, run a marathon or sleep. Then we make the replacements from this file with -file.
My idea is that replace.py should have two new parameters:
  -save        writes the results into a new file instead of editing articles. It overwrites an existing file without notice.
  -saveappend  writes into a file or appends to the existing one.
OR:
  -save        writes and appends (primary mode)
  -savenew     writes and overwrites
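To make the first variant concrete, here is a rough sketch of how the two parameters could be recognised. It is plain Python and does not use the framework's own argument handling; the function name parse_save_args and the '-save:filename' form (filename after a colon) are only assumptions for illustration, not existing replace.py options.

```python
def parse_save_args(args):
    """Return (filename, append) for the last -save/-saveappend argument.

    Hypothetical helper: assumes the filename follows the option after a
    colon, e.g. '-save:titles.txt'. Returns (None, False) if neither
    option is present.
    """
    filename, append = None, False
    for arg in args:
        if arg.startswith('-saveappend:'):
            # Append to the file, creating it if it does not exist.
            filename, append = arg[len('-saveappend:'):], True
        elif arg.startswith('-save:'):
            # Start a new file, overwriting an existing one.
            filename, append = arg[len('-save:'):], False
    return filename, append
```

The second variant (-save appends, -savenew overwrites) would only swap which option sets append to True.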
The help is here: http://docs.python.org/howto/unicode.html#reading-and-writing-unicode-data
So we have to import codecs. My script is:

    articles = codecs.open('cikkek.txt', 'a', encoding='utf-8')
    ...
    tutuzuzu = u'# %s\n' % page.aslink()   # <-- needs rewrite to the new syntax
    articles.write(unicode(tutuzuzu))      # <-- needs further testing, if unicode() is really needed
    articles.flush()
It works fine, except that '\n' is a Unix-style newline that has to be converted by lfcr.py to make the file readable with notepad.exe. For now the filename is constant; it should be developed further so that it is taken from the command line.
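One possible way around the newline conversion, as a sketch only: io.open (available from Python 2.6) accepts a newline argument, so every '\n' written can be turned into '\r\n' while the file is written. The function name save_titles and its parameters are made up for this example, and wrapping each title in [[...]] merely imitates what page.aslink() is assumed to produce.

```python
import io


def save_titles(titles, filename, append=True):
    """Write one '# [[Title]]' line per title to a UTF-8 text file.

    Hypothetical helper: 'titles' is an iterable of unicode strings.
    With append=True the lines are added to an existing file,
    otherwise the file is overwritten.
    """
    mode = 'a' if append else 'w'
    # newline='\r\n' translates each '\n' written into '\r\n',
    # so notepad.exe shows the line breaks without a conversion step.
    with io.open(filename, mode, encoding='utf-8', newline='\r\n') as articles:
        for title in titles:
            articles.write(u'# [[%s]]\n' % title)
```

Called as save_titles(titles, 'cikkek.txt'), this would append to the same file as the script above; the file is flushed and closed when the with block ends.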
Your opinions before I begin?
Bináris
Pywikipedia-l mailing list
Pywikipedia-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l