On Mon, Apr 12, 2010 at 18:05, Merlijn van Deen valhallasw@arctus.nl wrote:
Searching by using a text dump sounds more reasonable to me.
How would you do this?
E.g. I want to create a list of all pages with tables - i.e. with the string "{|". The MediaWiki search won't do this, but I assume it's possible with a site dump. But I don't know the command to use.
Thanks
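One way to search a dump for a literal string such as "{|" is to stream the XML export and test each page's text. The following is only a minimal sketch in Python 3, using the standard library rather than any pywikipedia code. The function names and the tiny sample dump are made up for illustration, and it assumes the dump uses the usual page/title/revision/text layout of a MediaWiki XML export.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit('}', 1)[-1]


def titles_containing(dump, needle):
    """Yield the title of every page whose text contains `needle`.

    `dump` is a file name or a binary file object holding a MediaWiki
    XML export. Pages are streamed one at a time, so a large dump is
    never loaded into memory as a whole.
    """
    title = None
    found = False
    for _event, elem in ET.iterparse(dump, events=('end',)):
        name = local_name(elem.tag)
        if name == 'title':
            title = elem.text
        elif name == 'text':
            # A page may hold several revisions; any match counts.
            if needle in (elem.text or ''):
                found = True
        elif name == 'page':
            if found:
                yield title
            title, found = None, False
            elem.clear()  # release the finished page's elements


# Tiny in-memory stand-in for a real dump file (hypothetical content).
SAMPLE_DUMP = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page><title>Solar cooker</title>
    <revision><text>{| class="wikitable"
|-
| cell
|}</text></revision></page>
  <page><title>Rainwater</title>
    <revision><text>No table here.</text></revision></page>
</mediawiki>"""

if __name__ == '__main__':
    for page_title in titles_containing(io.BytesIO(SAMPLE_DUMP), '{|'):
        print(page_title)
```

For a real dump, pass the file name of the uncompressed XML file instead of the in-memory sample. Nothing is written back to the wiki, because the script never contacts it.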
If you insist on changing replace.py, make sure you are removing all
occurrences of both put and put_async.
Best regards, Merlijn 'valhallasw' van Deen
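The advice above (neutralise both put and put_async, everywhere) could also be followed without deleting lines by hand: replace the two methods on the page class with a stub that only records the title. This is a rough Python 3 sketch of that idea. The Page class here is a hypothetical stand-in, not the real pywikipedia class, and the stub's name and signature are assumptions.

```python
class Page:
    """Hypothetical stand-in for the framework's page class."""

    def __init__(self, title):
        self._title = title

    def title(self):
        return self._title

    def put(self, newtext, comment=None):
        raise RuntimeError('would have saved %s' % self._title)

    def put_async(self, newtext, comment=None):
        raise RuntimeError('would have queued %s' % self._title)


# Titles of pages the script tried to save, in first-seen order.
matched_titles = []


def record_instead_of_saving(self, newtext, comment=None, *args, **kwargs):
    """Replacement for both save methods: remember the title, edit nothing."""
    if self.title() not in matched_titles:
        matched_titles.append(self.title())


# Patch the class itself rather than one call site, so that every code
# path that saves a page ends up in the stub.
Page.put = record_instead_of_saving
Page.put_async = record_instead_of_saving

if __name__ == '__main__':
    Page('Solar cooker').put('new text', 'summary')
    Page('Rainwater').put_async('new text', 'summary')
    print(matched_titles)
```

Patching the class is what makes this cover "all occurrences": a call site that was overlooked still reaches the stub instead of the real save.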
On 12 April 2010 09:54, Chris Watkins chriswaterguy@appropedia.org wrote:
So I haven't found a way to make a list of matches without replacing. I suspect there's a very simple way, or it would take very simple changes to replace.py.
I tried editing replace.py myself, to make it do everything except replace the files. Then I could hack the log files to get the list I want. But I had no success - I'm not a coder, so it was guesswork.
I copied replace.py to a new file intended to do everything except put files, and called it *replacenoput.py* (i.e. "replace," but no "put")
My first attempt was to remove this section (commented it out first, but then removed to be sure):
    if self.acceptall and new_text != original_text:
        try:
            page.put(new_text, self.editSummary)
        except wikipedia.EditConflict:
            wikipedia.output(u'Skipping %s because of edit conflict'
                             % (page.title(),))
        except wikipedia.SpamfilterError, e:
            wikipedia.output(
                u'Cannot change %s because of blacklist entry %s'
                % (page.title(), e.url))
        except wikipedia.PageNotSaved, error:
            wikipedia.output(u'Error putting page: %s' % (error.args,))
        except wikipedia.LockedPage:
            wikipedia.output(u'Skipping %s (locked page)' % (page.title(),))
Fail - it made the changes all the same.
Then I figured out that wikipedia.py was being used to put the files. So I copied that to a new file *wikipedianoput.py* and changed every wikipedia reference in *replacenoput.py* to wikipedianoput.
Then I scanned through wikipedianoput.py looking for what I need to block... but I couldn't tell.
Can anyone help? Or even better, is there a more elegant way?
Thanks Chris
On Fri, Apr 2, 2010 at 00:12, Daniel Mietchen <daniel.mietchen@googlemail.com> wrote:
Hi Chris,
On Thu, Apr 1, 2010 at 2:26 PM, Chris Watkins chriswaterguy@appropedia.org wrote:
Thanks Daniel... I'm confused though.
On Thu, Apr 1, 2010 at 20:25, Daniel Mietchen daniel.mietchen@googlemail.com wrote:
Perhaps http://meta.wikimedia.org/wiki/Pywikipediabot/copyright.py will do the trick,
I can't see how to use it for matching a specific string.
Nor do I - sorry. What I had in mind was to apply it to a page that contains your search string, and to restrict the search for "copyright violations" to your site. But this may indeed be a dead end.
or simply http://meta.wikimedia.org/wiki/Pywikipediabot/replace.py in -debug mode?
Where can I find information on -debug mode? I see there is -verbose mode which "may be helpful when debugging", but I don't see how that helps.
I thought that most PWB scripts had it, but apparently replace.py does not.
But if a script's constructor line contains "debug", as in this example taken from
http://svn.wikimedia.org/viewvc/pywikipedia/trunk/pywikipedia/pagefromfile.p... :
    def __init__(self, reader, force, append, summary, minor, autosummary, debug):
then -debug is an option with which the script can be run such that it performs all its actions except editing the pages.
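To illustrate what such a debug argument typically does, here is a small Python 3 sketch of the pattern: the robot does all its matching, but skips the one step that saves. The class, its parameters and the dict that stands in for a wiki are all hypothetical. This is not the code of pagefromfile.py or replace.py.

```python
class ListingRobot:
    """Hypothetical robot showing a debug (dry-run) constructor flag."""

    def __init__(self, pages, old, new, debug=False):
        self.pages = pages  # dict of title -> wikitext, standing in for a wiki
        self.old = old
        self.new = new
        self.debug = debug

    def run(self):
        """Return the titles whose text would change; save only if not debug."""
        matches = []
        for title in sorted(self.pages):
            text = self.pages[title]
            new_text = text.replace(self.old, self.new)
            if new_text == text:
                continue  # search string not present on this page
            matches.append(title)
            if not self.debug:
                self.pages[title] = new_text  # the only step that "edits"
        return matches


if __name__ == '__main__':
    wiki = {'Solar cooker': '{| table |}', 'Rainwater': 'plain text'}
    robot = ListingRobot(wiki, '{|', '{| class="wikitable"', debug=True)
    for matched in robot.run():
        print(matched)
```

With debug=True the stand-in wiki is left untouched and the return value is just the list of matching titles, which is the behaviour asked for in this thread.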
I am not very experienced with Python or PWB either, but since nobody had replied so far, I wrote out my ideas as they came to mind. Sorry for the confusion,
Daniel
I may be missing something obvious &-)
Me too.
Chris
Daniel
On Thu, Apr 1, 2010 at 6:05 AM, Chris Watkins chriswaterguy@appropedia.org wrote:
I want to generate a list of matches for a search, but not do anything to the page.
E.g. I want to list all pages that contain "redirect[[:Category", but I don't want to modify the pages.
I guess that it's possible to modify redirect.py (I don't speak python, but it shouldn't be hard) and run it with -log. But maybe there's a simpler way?
Thanks in advance.
-- Chris Watkins
Appropedia.org - Sharing knowledge to build rich, sustainable lives.
blogs.appropedia.org community.livejournal.com/appropedia identi.ca/appropedia twitter.com/appropedia
Pywikipedia-l mailing list Pywikipedia-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
-- http://www.google.com/profiles/daniel.mietchen