I'm using replace.py to create wikilinks. Usually I want to select only the
first occurrence of the search string, and my command works fine for this.
But sometimes, the first hit is not suitable (e.g. it's part of a book or
course title, so I don't want to add the wikilink). If I choose n for no,
the bot goes to the next page.
Is there a way I can skip to the next occurrence on the same page? I'm
guessing it will need a modified version of replace.py, so that it gives an
extra option besides ([y]es, [N]o, [e]dit, open in [b]rowser, [a]ll,
[q]uit).
The actual command I'm using is:
python replace.py -regex "(?si)\b((?:FOO1|FOO2))\b(.*$)" "[[\\1]]\\2" -exceptinsidetag:link -exceptinsidetag:hyperlink -exceptinsidetag:header -exceptinsidetag:nowiki -exceptinsidetag:ref -excepttext:"(?si)\[\[((?:FOO1|FOO2)[\|\]])" -namespace:0 -namespace:102 -namespace:4 -summary:"[[Appropedia:Wikilink bot]] adding double square brackets to: FOO1|FOO2." -log -xml:currentdump.xml
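To illustrate, the per-occurrence behaviour I'm after might look like this in plain Python (a standalone sketch, not replace.py code; the should_replace callback stands in for the interactive prompt, and FOO1 and the wikilink replacement are just examples):

```python
import re

def replace_selected(text, pattern, repl, should_replace):
    """Replace only the occurrences for which should_replace(match) is True.

    Instead of one re.sub over the whole page, iterate the matches and
    decide about each one individually."""
    out = []
    last = 0
    for m in re.finditer(pattern, text):
        out.append(text[last:m.start()])
        out.append(m.expand(repl) if should_replace(m) else m.group(0))
        last = m.end()
    out.append(text[last:])
    return ''.join(out)

# Skip the first occurrence (e.g. inside a book title), link the rest:
seen = []
def skip_first(match):
    seen.append(match)
    return len(seen) > 1

text = "The FOO1 Handbook explains FOO1 in detail."
print(replace_selected(text, r'\bFOO1\b', r'[[\g<0>]]', skip_first))
# -> The FOO1 Handbook explains [[FOO1]] in detail.
```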
Many thanks!
--
Chris Watkins
Appropedia.org - Sharing knowledge to build rich, sustainable lives.
blogs.appropedia.org
identi.ca/appropedia
twitter.com/appropedia
Hi!
Do you have any idea why, using replace.py on some large dumps, I get
this error message:
C:\pywikipedia>replace.py -xml:enwiki-20091128-pages-articles.xml
Please enter the text that should be replaced: impossibletofindword
Please enter the new text: found
Please enter another text that should be replaced, or press Enter to start:
The summary message will default to: Robot: Automated text
replacement (-impossibletofindword +found
)
Press Enter to use this default message, or enter a description of the
changes your bot will make: test
Reading XML dump...
Traceback (most recent call last):
File "C:\pywikipedia\pagegenerators.py", line 847, in __iter__
for page in self.wrapped_gen:
File "C:\pywikipedia\pagegenerators.py", line 779, in
DuplicateFilterPageGenerator
for page in generator:
File "C:\pywikipedia\replace.py", line 218, in __iter__
for entry in self.parser:
File "C:\pywikipedia\xmlreader.py", line 295, in new_parse
for rev in self._parse(event, elem):
File "C:\pywikipedia\xmlreader.py", line 304, in _parse_only_latest
yield self._create_revision(revision)
File "C:\pywikipedia\xmlreader.py", line 341, in _create_revision
redirect=self.isredirect
File "C:\pywikipedia\xmlreader.py", line 64, in __init__
self.username = username.strip()
AttributeError: 'NoneType' object has no attribute 'strip'
'NoneType' object has no attribute 'strip'
I updated pywikipedia to the latest revision, but it made no difference.
As you can see, the problem does not seem to be user-fixes.py- or regex-related.
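My guess at the cause: the dump contains at least one revision whose contributor is deleted or suppressed, so the <username> element is absent and the parser passes None down to the line shown in the traceback. A minimal guard for that call (a sketch of a local workaround, not an official fix):

```python
def safe_strip(username):
    # xmlreader.py line 64 does `username.strip()`; dumps can contain
    # revisions whose contributor was deleted, leaving username as None.
    # Treating a missing <username> as an empty string avoids the crash.
    return username.strip() if username is not None else u''
```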
Thanks in advance!
Davide Bolsi
I am at a point where it would be helpful to have some feedback from other
Pywikipedia users about the future of the rewrite branch. As those who
watch the SVN commits know, I have not had as much time to work on this
lately, and have to prioritize what time I do spend on it.
For those who have used the rewrite branch, what (if anything) needs to be
done to it to get you to use it exclusively and retire the old wikipedia.py
system? What is missing? What is broken? What is present but could be
improved?
For those who have chosen not to use the rewrite branch, why not? What
might lead you to take another look?
And then, I'm sure there are many whose reaction to this post has been,
"What's the rewrite branch?" I don't know what to ask you, so feel free to
move on to the next message.
Most critically, is there any reason to continue development of the trunk
once the rewrite branch is at a point where most users are ready to switch
to it?
-- Russ
Dear all,
In response to Nicolas' e-mail:
> The original idea was to abandon trunk/ to use the rewrite, but we
> lack manpower and (at least for me) time to actually do the conversion
> work of all existing scripts. But I know for a fact that code is
> working, and cleaner. Please give it a try :)
I decided to clean up the nightlies page: I removed all the clutter
(spelling, threadedhttp, pywikiparser) and added the rewrite (in other
words: only the 'pywikipedia' and 'rewrite' packages remain).
Nightlies page: http://toolserver.org/~valhallasw/pywiki/
Best regards,
Merlijn van Deen / valhallasw
Hello! I've recently noticed that noreferences does not work with
articles from pl.wiki :/
I used the command "python noreferences.pyc -file:logs/refdx
-always", and the script ignores all pages which have <ref> tags
but no <references/>. Even when a change was necessary, it only
printed this message: "No changes necessary: references template
found."
An example of ignored pages: http://pl.wikipedia.org/wiki/Budgie_%28album%29
Regards,
patrol
I want to copy (not move) a few hundred pages from one namespace to another.
Any ideas how?
I think I can do it in two stages - if I can create a page. I'd create each
page with only the page name, then convert that to a transclusion from the
mainspace article:
python replace.py -regex ".*" "{{subst:PAGENAME}}" -file:pagelist
python replace.py -regex ".*" "{{subst::\\1}}" -file:pagelist
But I get the error "Page [[blah blah]] not found" when I run the first
command. I also tried with "" as the search string in the first line, but
it's the same. And ideally I'd like it to exclude cases where the page
already exists.
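Alternatively, a tiny custom script may be simpler than bending replace.py to this. The decision logic could be sketched like this, with the exists callback standing in for pywikipedia's Page.exists(), and the real script calling Page.put() with the generated wikitext (namespace name and page list are just examples):

```python
def transclusion_text(title):
    # Build the wikitext that transcludes the mainspace article
    # into the copy, e.g. "{{:Foo}}" for page "Foo".
    return '{{:%s}}' % title

def pages_to_create(titles, target_namespace, exists):
    # Yield (new_title, wikitext) for each page-list entry whose copy
    # does not exist yet; `exists` stands in for Page.exists().
    for title in titles:
        new_title = '%s:%s' % (target_namespace, title)
        if not exists(new_title):
            yield new_title, transclusion_text(title)
```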
Any solution? Thanks!
--
Chris Watkins
Appropedia.org - Sharing knowledge to build rich, sustainable lives.
blogs.appropedia.org
community.livejournal.com/appropedia
identi.ca/appropedia
twitter.com/appropedia
Hi
I use Python frequently, and today I started working with pywikipediabot,
which is a very good library, by the way.
But I think the workflow is very out-of-the-Python-way. Let me explain my
point:
To make a script that uses this environment, you need to put the code in the
main directory of pywikipediabot, or create some links to that directory. But
usually, when you use a third-party module in Python, you should have the
chance to "install" the module and load it with a simple
import pywikipediabot
or
from pywikipediabot import wikipedia
and do this from any directory on your system, without any extra
configuration or extra files. I think this would be a nice feature, because
it respects the Python way, and makes it much easier to distribute the
module using 'distutils'[1] or even Debian packages.
[1] http://docs.python.org/distutils/index.html
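For illustration, the distutils route could be as small as this (a hypothetical setup.py; the package name, version, and layout are assumptions, since the current code base is not structured as a package yet):

```python
# setup.py -- hypothetical; assumes the code were reorganized into a
# 'pywikipediabot' package directory with an __init__.py.
from distutils.core import setup

setup(
    name='pywikipediabot',
    version='0.1',
    description='MediaWiki bot framework',
    packages=['pywikipediabot'],
)
```

Then "python setup.py install" would make the imports above work from any directory.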
Regards,
Pablo Recio
Hi there,
I'm currently using the rewrite branch for a project. This project is
not a bot, but a tool for vandalism analysis.
Here I'll explain how I used it and what changes I made, so it may be
useful for the new design of the rewrite. Also, I'd like to get
recommendations about my approaches so I can make them suitable for
integration with pywikipedia.
First of all, my main unit of information is Edit. An Edit is an
object composed of a Page and two consecutive revision IDs of that
page. Edit supports operations such as getting the edit comment,
user, timestamp, and the old and new text.
I had to implement a method similar to BaseSite.loadrevisions():
Given a list of edits, which have associated their revision IDs but
NOT their Page, fetch them and associate them with their Page object.
This method retrieves all the revisions, creates Page objects for them
and Revision objects which are assigned to the corresponding
Page._revisions dict.
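In case it clarifies the design, here is roughly what the Edit class looks like (simplified; Page and Revision stand for pywikipedia's objects, and _revisions is the dict that the loadrevisions()-style method fills):

```python
class Edit(object):
    """One edit: a Page plus two consecutive revision IDs of that page.

    All accessors read from the page's _revisions cache, so the
    revisions must have been fetched beforehand."""
    def __init__(self, page, old_revid, new_revid):
        self.page = page
        self.old_revid = old_revid
        self.new_revid = new_revid

    def _rev(self, revid):
        return self.page._revisions[revid]

    def comment(self):
        return self._rev(self.new_revid).comment

    def user(self):
        return self._rev(self.new_revid).user

    def old_text(self):
        return self._rev(self.old_revid).text

    def new_text(self):
        return self._rev(self.new_revid).text
```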
Then, I have to store all this info on disk for later use. So I wrote
a function for exporting my list of edits to XML, using MediaWiki's
Export 0.4 format. To ease this process, I added a to_element() method
to the Page and Revision objects. to_element() returns an Element object
(from the ElementTree API) representing the object. So, exporting is
as easy as iterating over all Pages, calling their to_element()
method and appending the result to a common root. What do you think about
this? Should it be included in pywikipedia? Do you prefer a different
approach for exporting to XML?
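To make the idea concrete, here is a simplified version of the export step, using ElementTree on plain data rather than as methods on Page/Revision, and with the element order of the real Export 0.4 schema simplified:

```python
import xml.etree.ElementTree as ET

def revision_to_element(revid, timestamp, user, comment, text):
    # Sketch of Revision.to_element(): one <revision> element.
    rev = ET.Element('revision')
    for tag, value in (('id', str(revid)), ('timestamp', timestamp),
                       ('comment', comment), ('text', text)):
        ET.SubElement(rev, tag).text = value
    contributor = ET.SubElement(rev, 'contributor')
    ET.SubElement(contributor, 'username').text = user
    return rev

def page_to_element(title, revisions):
    # Sketch of Page.to_element(): a <page> wrapping its revisions.
    page = ET.Element('page')
    ET.SubElement(page, 'title').text = title
    for rev in revisions:
        page.append(revision_to_element(*rev))
    return page

root = ET.Element('mediawiki')  # the "common root" mentioned above
root.append(page_to_element('Example', [
    (1, '2009-01-01T00:00:00Z', 'Alice', 'start', 'Hello'),
]))
```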
For importing again from XML, I adapted the old XmlDump. My version
yields Page objects instead of revisions. Of course this might be a
performance nightmare when working with XML dumps with full history,
so it can be modified to yield Revision objects.
I think the Revision class should include a page attribute, containing
the Page object that the Revision belongs to. That would be of use,
for example, when writing an XmlDump yielding Revisions and, in
general, for more applications that are Revision oriented.
And last but not least, currently it's easy to end up with multiple
Page objects representing the same page, but with different object
state. Do you think that BaseSite should implement a Page factory or
some way to "create a Page object for this title if it doesn't exist
or give me the one that already exists"?
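The factory could be as simple as an identity map on the site object; a sketch (PageStub and SiteStub stand in for the real Page and BaseSite classes):

```python
class PageStub(object):
    # Stands in for pywikipedia's Page; only the title matters here.
    def __init__(self, title):
        self.title = title
        self.text = None  # state lives on the one canonical object

class SiteStub(object):
    """Sketch of a Page factory on the site: one canonical Page object
    per title, so every lookup of the same title shares state."""
    def __init__(self):
        self._pages = {}

    def page(self, title):
        if title not in self._pages:
            self._pages[title] = PageStub(title)
        return self._pages[title]
```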
Well, that's all at the moment.
Best regards,
--
Santiago M. Mola
Jabber ID: cooldwind(a)gmail.com
Hi,
Currently, I have to work with old revisions of articles. I found
that most methods in Page are focused on working with the latest
revision. So it'd be very convenient to add a new Revision class where
methods like section(), isDisambig(), userName(), editTime(), etc.,
live. Then Page could properly get (and cache) any revision. These
methods in Page could be shortcuts to the actual methods in Revision
of the latest revision.
What do you think?
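A sketch of the split I have in mind (method names copied from the current Page API; the real classes would of course fetch and cache revisions through the site rather than hold plain dicts):

```python
class Revision(object):
    """Revision-level methods move here."""
    def __init__(self, revid, user, timestamp, text):
        self.revid = revid
        self._user = user
        self._timestamp = timestamp
        self.text = text

    def userName(self):
        return self._user

    def editTime(self):
        return self._timestamp

class Page(object):
    """Page caches revisions and delegates to the latest one."""
    def __init__(self, title):
        self.title = title
        self._revisions = {}  # revid -> Revision

    def latestRevision(self):
        return self._revisions[max(self._revisions)]

    def getRevision(self, revid):
        return self._revisions[revid]

    # Shortcuts to the latest revision:
    def userName(self):
        return self.latestRevision().userName()

    def editTime(self):
        return self.latestRevision().editTime()
```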
Also, am I missing something, and is there actually a good way of
working with arbitrary revisions that I've overlooked?
Thanks,
--
Santiago M. Mola
Jabber ID: cooldwind(a)gmail.com