Hi!
My long-standing problem is that replace.py can't write the list of pages to
work on into a file on my disk. For years I have used a modified version that
makes no changes but writes the titles of the involved pages to a subpage on
Wikipedia in automated mode, and then I can make the replacements from that
page much more quickly than directly from the dump or the live Wikipedia. This
is slow and generates plenty of dummy edits.
In other words, replace.py has a tool to get the titles from a file (-file)
or from a wikipage (-links), but has no tool to generate this file.
Now I am ready to rewrite it. This way we can start it and the bot will find
all the possible articles to work on and save the titles without editing
Wikipedia (and without artificial delay), while we have lunch, run a marathon
or sleep. Then we make the replacements from this list with -file.
My idea is that replace.py should have two new parameters:
-save writes the results into a new file instead of editing articles. It
overwrites an existing file without notice.
-saveappend writes into a file or appends to the existing one.
OR:
-save writes and appends (primary mode)
-savenew writes and overwrites
The help is here:
http://docs.python.org/howto/unicode.html#reading-and-writing-unicode-data
So we have to import codecs.
My script is:
import codecs
articles = codecs.open('cikkek.txt', 'a', encoding='utf-8')
...
tutuzuzu = u'# %s\n' % page.aslink()  # needs rewrite to the new syntax
articles.write(unicode(tutuzuzu))  # needs further testing, if unicode() is really needed
articles.flush()
It works fine, except that '\n' is a Unix-style newline that has to be
converted by lfcr.py to make the file readable with notepad.exe.
This version uses a constant filename; it should be developed further so that
the filename is taken from the command line.
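A minimal sketch of the writing step, assuming the titles have already been
collected as unicode strings. The function name save_titles and its append
flag are hypothetical and not part of replace.py. It uses io.open instead of
codecs.open because io.open accepts newline='\r\n', which writes Windows line
endings directly, so a separate conversion pass should not be needed for
notepad.exe.

```python
import io


def save_titles(titles, filename, append=True):
    """Write one '# [[Title]]' line per title as UTF-8 with CRLF line ends.

    Hypothetical helper: append=True mirrors the proposed appending mode,
    append=False the overwriting mode (no warning is given).
    """
    mode = 'a' if append else 'w'
    # newline='\r\n' translates every '\n' written into '\r\n'.
    with io.open(filename, mode, encoding='utf-8', newline='\r\n') as articles:
        for title in titles:
            articles.write(u'# [[%s]]\n' % title)
```

Calling save_titles([u'Budapest'], 'cikkek.txt') would append one line to
cikkek.txt; the resulting file has the list format that -file reads.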
Your opinions before I begin?
--
Bináris
I want to read a special page with Page.get(). The message is:
File "C:\Program Files\Pywikipedia\wikipedia.py", line 601, in get
raise NoPage('%s is in the Special namespace!' % self.aslink())
pywikibot.exceptions.NoPage
What is the solution?
--
Bináris
On Mon, Apr 25, 2011 at 7:49 AM, Merlijn van Deen <valhallasw(a)arctus.nl> wrote:
> Whoo! Great work :-) Tests always are good contributions :-)
Thanks ;-)
I agree.
> On a sidenote - is there a reason you're implementing these in 'trunk' and
> not in 'rewrite'? Of course, these contributions are very welcome in the
> trunk, but I still think it would be good to push the rewrite branch.
I'm working off trunk because it is trunk.
I'd assumed that the rewrite branch was a single-purpose branch to
rewrite something, and that it would be merged back when it is stable.
Is it stable?
Is there any documentation on what the plans are for the rewrite branch?
Is there a roadmap to finish it?
I see now that the rewrite branch has more unit tests, but more are needed.
Is there a need to create a backwards compatibility layer?
Or, is everyone except me using the rewrite branch? ;-)
--
John Vandenberg
Hello all,
As several people have mentioned they had trouble starting with the rewrite
branch, I decided to do a step-by-step log of installing the rewrite in a
way that is good for developing -- this means you are able to edit the
framework files, while not inflicting any changes on other users (or other
bots you run!) of the system. By using setup.py develop, edits you make to
the framework will immediately be used (no need to setup.py install them),
but only within the virtualenv.
This is the Windows version of my earlier email.
I do not run python on windows, so this is a tutorial that starts with
installing python. It's a bit rougher than the unix one, as I did not want
to spend too much time on it.
1. Install python 2.7
http://python.org/ftp/python/2.7.1/python-2.7.1.msi <http://python.org/download/>
(do *not* use the 64-bit version, due to http://bugs.python.org/issue6792 )
2. Install Setuptools
http://pypi.python.org/pypi/setuptools#files
3. Install Virtualenv
start/run: cmd
c:\Python27\Scripts\easy_install.exe virtualenv
4. create a virtualenv for pwb
C:\Users\valhallasw>c:\Python27\Scripts\virtualenv.exe pywikibot
New python executable in pywikibot\Scripts\python.exe
Installing setuptools.....................done.
5. Go to C:\Users\valhallasw\pywikibot and use tortoisesvn to get the
rewrite
6. create a shortcut to cmd /k
c:\users\valhallasw\pywikibot\scripts\activate.bat
with working path C:\Users\valhallasw\pywikibot\rewrite
7. Use the shortcut. You now have a new cmd.exe window
8. python setup.py develop
Your default user directory is
"C:\Users\valhallasw\AppData\Roaming\pywikibot"
How to proceed? ([K]eep [c]hange)
change, to c:\users\valhallasw\pywikibot\conf\
Answer 'y' to the warning prompt (not 'yes')
Do you want to copy files: y
[note: I copied my unix user-config.py to c:\users\valhallasw\pywikibot]
Path to existing wikipedia.py? C:\Users\valhallasw\pywikibot
NOTE: user-config.py already exists in the directory
Create user-fixes.py file? ([y]es, [N]o) n
(pywikibot) C:\Users\valhallasw\pywikibot\rewrite>echo SET PYWIKIBOT2_DIR=c:\users\valhallasw\pywikibot\conf>> ..\Scripts\activate.bat
(DON'T put a space between f and >>!)
Close the window, and
9. Use the shortcut from (7) again
You should now have a cmd.exe with a working pywikibot setup!
(pywikibot) C:\Users\valhallasw\pywikibot\rewrite\scripts>python touch.py
Gebruiker:Valhallasw
Retrieving 1 pages from wikipedia:nl.
Page [[Gebruiker:Valhallasw]] saved
NOTE: you *must* use 'python' in front of the script name, or python will
not find the pywikibot directory.
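If the setup does not work after step 9, a small check like the following may
help narrow it down. It is only a sketch: the function name
check_pywikibot_dir is hypothetical, and it assumes that user-config.py ended
up in the directory PYWIKIBOT2_DIR points to. It also flags the trailing-space
problem that the warning in step 8 is about.

```python
import os


def check_pywikibot_dir(environ=os.environ):
    """Return a list of problems found with PYWIKIBOT2_DIR (empty if none)."""
    problems = []
    conf_dir = environ.get('PYWIKIBOT2_DIR')
    if not conf_dir:
        problems.append('PYWIKIBOT2_DIR is not set')
    elif conf_dir != conf_dir.strip():
        # Happens when there is a space before >> in the echo command.
        problems.append('PYWIKIBOT2_DIR has stray whitespace: %r' % conf_dir)
    elif not os.path.isdir(conf_dir):
        problems.append('PYWIKIBOT2_DIR is not a directory: %r' % conf_dir)
    elif not os.path.isfile(os.path.join(conf_dir, 'user-config.py')):
        # Assumption: user-config.py lives in the configuration directory.
        problems.append('no user-config.py in %r' % conf_dir)
    return problems
```

Run inside the activated virtualenv, an empty list from check_pywikibot_dir()
means the variable is set and points to a directory containing user-config.py.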
Good luck!
Merlijn
Hi,
Noticed one of my scripts failing because I tried to instantiate a category
named "Wikipedia:Globalt perspektiv-samtliga" (it's in Swedish Wikipedia,
here's the actual category:
http://sv.wikipedia.org/wiki/Kategori:Wikipedia:Globalt_perspektiv-samtliga)
This fails:
myCat = pywikibot.Category(mySite, "Wikipedia:Globalt perspektiv-samtliga");
The error is:
ValueError: 'Wikipedia:Globalt perspektiv-samtliga' is not in the category namespace!
Creating it as a page with namespace set to 14 results in:
>>> myPage = pywikibot.Page(mySite, 'Wikipedia:Globalt perspektiv-samtliga', ns=14);
>>> print myPage.title();
Wikipedia:Globalt perspektiv-samtliga
>>> print myPage.namespace();
4
The title's namespace overrides the given option, which isn't what I would
expect. However, if I add the category namespace prefix to the title
(localised to Swedish it's "Kategori:Wikipedia:Globalt
perspektiv-samtliga"), I get the right namespace and such.
Not sure if this is a bug or a feature, so I figured I'd post a note here
and see if anyone had any views on it, rather than just file the bug in SF.
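Until that is settled, the workaround described above (adding the category
prefix to the title before constructing the object) can be sketched as plain
string handling. The helper name with_category_prefix and the hard-coded
prefix list are hypothetical; the sketch does not call pywikibot and does not
look up the site's real namespace names.

```python
def with_category_prefix(title, prefixes=(u'Kategori', u'Category')):
    """Prepend prefixes[0] unless the title already starts with a known prefix.

    Only the text before the first colon is inspected, so a title such as
    'Wikipedia:Globalt perspektiv-samtliga' gets the category prefix added.
    """
    first, sep, _rest = title.partition(u':')
    known = [p.lower() for p in prefixes]
    if sep and first.strip().lower() in known:
        return title
    return u'%s:%s' % (prefixes[0], title)
```

The returned title would then be the one passed to the Category constructor
instead of the bare name.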
Cheers,
Morten
Hi! Please help me.
Hungarian dates are in the form yyyy. mm. dd., or yyyy. <monthname> dd.,
without leading zeros.
In a text environment we use the month names, so I replace numbered months
with named months, and I remove leading zeros from day numbers.
The line in fixes.py is, for January:
(ur'(\d{1,4}(?:\]\])?)\. ?01\. ?(\d\d?)', ur'\1. január \2'),
This is OK, no problem up to this point.
The rule is that the day number has to be followed by a dot, unless it is
followed by a hyphen and a suffix.
The first level of enhancement is to write a dot if necessary:
- if there is already a dot, don't remove it in any case (a hyphen is often
used erroneously, and I don't want to make the problem bigger)
- if there is no dot, but the day is followed by a hyphen, don't add a dot
- if there is anything but a dot or a hyphen after the day number, put a
dot after the number
I made some experiments with the (?(id/name)yes-pattern|no-pattern) syntax (
http://docs.python.org/py3k/library/re.html), but with no usable result.
Can you help me? There will be further levels if this task is solved because
users are very creative in making errors.
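One possible way to express the three cases without the conditional syntax is
to use two rules with lookahead assertions: one for "a dot or hyphen follows"
and one for "something else follows". This is only a sketch for January. The
names HEAD, JANUARY_RULES and fix_january are hypothetical, the optional 0?
for stripping a leading zero is an added assumption, and the code is written
for Python 3, where r'' strings are already Unicode (under Python 2 the ur''
prefix from the original line would be needed).

```python
import re

# Year, month "01" and day, as in the original fix; 0? drops a leading zero.
HEAD = r'(\d{1,4}(?:\]\])?)\. ?01\. ?0?(\d\d?)'

JANUARY_RULES = [
    # A dot or a hyphen already follows the day: leave what follows untouched.
    (re.compile(HEAD + r'(?=[.-])'), r'\1. január \2'),
    # Anything else follows (a digit is excluded so that the day cannot be
    # cut in half by backtracking): supply the missing dot.
    (re.compile(HEAD + r'(?![\d.-])'), r'\1. január \2.'),
]


def fix_january(text):
    """Apply both rules in order and return the corrected text."""
    for pattern, replacement in JANUARY_RULES:
        text = pattern.sub(replacement, text)
    return text
```

With these rules '2011. 01. 15 után' becomes '2011. január 15. után', while
'2011. 01. 5-én' becomes '2011. január 5-én' and an existing dot is kept. The
two (pattern, replacement) pairs have the same shape as the line in fixes.py,
so they might be usable there as two separate entries.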
Further problems are:
- A hyphen is often used instead of an ndash when describing an interval of
two dates. In this case a dot and a space are required between the day number
and the ndash. I don't want to correct this in this session or this fix if
it is too difficult, but the dot should not be removed in this case (whether
or not they write a space).
- Sometimes hyphen with no dot is correct, but there is an extra space
that should be removed. This can be recognized by means of writing a limited
set of suffixes after the hyphen in the regex.
- Sometimes there is a word after the day but space is omitted and should
be supplied.
--
Bináris
You are welcome. :-) I do use it regularly myself and like it very much.
2011/6/27 Chris Watkins <chriswaterguy(a)appropedia.org>
> Hi Bináris,
>
> I want to thank you (a year later) - I was confused by the r8700 at the
> time, but I just got back into bot work and looked at this problem again and
> found that the -save option works perfectly - so thanks!
>
>
--
Bináris