On Mon, Apr 12, 2010 at 18:05, Merlijn van Deen <valhallasw@arctus.nl> wrote:
Searching by using a text dump sounds more reasonable to me.

How would you do this?

E.g. I want to create a list of all pages with tables - i.e. containing the string "{|". The MediaWiki search won't do this, but I assume it's possible with a site dump. I just don't know what command or tool to use.
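The kind of thing I imagine is a little script that walks the dump and prints the matching titles - something like the sketch below, though this is guesswork on my part (the filename is made up and I haven't tested it, since I'm not a coder):

    # sketch: scan a MediaWiki pages-articles XML dump and print the titles
    # of pages whose wikitext contains the string "{|" (i.e. a table).
    import xml.etree.cElementTree as ElementTree

    SEARCH = u'{|'
    DUMP = 'appropedia-pages-articles.xml'   # made-up example filename

    title = None
    for event, elem in ElementTree.iterparse(DUMP):
        tag = elem.tag.rsplit('}', 1)[-1]    # strip the export-schema namespace
        if tag == 'title':
            title = elem.text
        elif tag == 'text':
            if elem.text and SEARCH in elem.text:
                print title.encode('utf-8')
        elif tag == 'page':
            elem.clear()                     # free memory as we go

Is that roughly the right approach, or is there a ready-made tool for this?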

Thanks


If you insist on changing replace.py, make sure you are removing all occurrences of both put and put_async.
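An untested sketch of what I mean - rather than deleting the calls outright, you could replace each of them with an output line, so you still get a list of the pages that would have been changed:

    # in replace.py, wherever the page is saved (there is a page.put(...) and,
    # from memory, also a page.put_async(...) call) -- disable the call and
    # log the title instead:
    if self.acceptall and new_text != original_text:
        # page.put(new_text, self.editSummary)        # original save, disabled
        wikipedia.output(u'Would change: %s' % page.title())

The exact lines move around between revisions, so search for "put" rather than trusting my memory.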

Best regards,
Merlijn 'valhallasw' van Deen


On 12 April 2010 09:54, Chris Watkins <chriswaterguy@appropedia.org> wrote:
So I haven't found a way to make a list of matches without actually replacing anything. I suspect there's a very simple way, or that it would take only very simple changes to replace.py.


I tried editing replace.py myself, to make it do everything except replace the files. Then I could hack the log files to get the list I want. But I had no success - I'm not a coder, so it was guesswork.

I copied replace.py to a new file intended to do everything except put files, and called it replacenoput.py (i.e. "replace," but no "put").

My first attempt was to remove this section (I commented it out first, then removed it to be sure):

            if self.acceptall and new_text != original_text:
                try:
                    page.put(new_text, self.editSummary)
                except wikipedia.EditConflict:
                    wikipedia.output(u'Skipping %s because of edit conflict'
                                     % (page.title(),))
                except wikipedia.SpamfilterError, e:
                    wikipedia.output(
                        u'Cannot change %s because of blacklist entry %s'
                        % (page.title(), e.url))
                except wikipedia.PageNotSaved, error:
                    wikipedia.output(u'Error putting page: %s'
                                     % (error.args,))
                except wikipedia.LockedPage:
                    wikipedia.output(u'Skipping %s (locked page)'
                                     % (page.title(),))


Fail - it made the changes all the same.

Then I figured out that wikipedia.py was being used to put the files. So I copied that to a new file wikipedianoput.py and changed every wikipedia reference in replacenoput.py to wikipedianoput.

Then I scanned through wikipedianoput.py looking for what I need to block... but I couldn't tell.

Can anyone help? Or even better, is there a more elegant way?
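In case it helps to see what I'm after, something as simple as the sketch below would do. This is pure guesswork from reading the other scripts, though, so the module calls may well be wrong:

    # listmatches.py -- guesswork sketch: print the titles of pages whose
    # wikitext contains a given string, without ever calling page.put().
    import wikipedia, pagegenerators

    SEARCH = u'{|'                 # the string I want to find

    site = wikipedia.getSite()
    gen = pagegenerators.AllpagesPageGenerator(site=site)
    for page in pagegenerators.PreloadingGenerator(gen):
        try:
            text = page.get()
        except (wikipedia.NoPage, wikipedia.IsRedirectPage):
            continue
        if SEARCH in text:
            wikipedia.output(page.title())
    wikipedia.stopme()

If there is already a script or a standard parameter that does this, even better.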

Thanks
Chris


On Fri, Apr 2, 2010 at 00:12, Daniel Mietchen <daniel.mietchen@googlemail.com> wrote:
Hi Chris,

On Thu, Apr 1, 2010 at 2:26 PM, Chris Watkins
<chriswaterguy@appropedia.org> wrote:
> Thanks Daniel... I'm confused though.
>
> On Thu, Apr 1, 2010 at 20:25, Daniel Mietchen
> <daniel.mietchen@googlemail.com> wrote:
>>
>> Perhaps
>> http://meta.wikimedia.org/wiki/Pywikipediabot/copyright.py
>> will do the trick,
>
> I can't see how to use it for matching a specific string.
Nor can I - sorry. What I had in mind was to apply it to a page that
contains your search string, and to restrict the search for "copyright
violations" to your site.
But this may indeed be a dead end.

>> or simply
>> http://meta.wikimedia.org/wiki/Pywikipediabot/replace.py
>> in -debug mode?
>
> Where can I find information on -debug mode? I see there is -verbose mode
> which "may be helpful when debugging", but I don't see how that helps.
I thought that most PWB scripts had it, but apparently replace.py does not.

But if a script's bot constructor contains a "debug" parameter, e.g.

 def __init__(self, reader, force, append, summary, minor, autosummary, debug):

(taken from
http://svn.wikimedia.org/viewvc/pywikipedia/trunk/pywikipedia/pagefromfile.py?view=markup),
then the script can be run with -debug, and it will perform all of its
actions except actually editing the pages.
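From a quick look, the pattern seems to be roughly the following (a paraphrase of the idea, not the actual pagefromfile.py code), so presumably the same check could be added around the put calls in replace.py:

    # paraphrase only -- the bot keeps the flag from the command line
    # and checks it just before saving:
    import wikipedia   # the pywikipedia 'compat' framework

    class SomeBot:
        def __init__(self, debug):
            self.debug = debug

        def save(self, page, new_text, summary):
            if self.debug:
                # report what would be done, but do not touch the wiki
                wikipedia.output(u'Debug: not saving %s' % page.title())
                return
            page.put(new_text, summary)

But again, this is just how I imagine it works - check the actual source at the URL above.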

I am not very experienced with Python or PWB either, but since nobody
had replied so far, I wrote out my ideas as they came to mind.
Sorry for the confusion,

Daniel

> I may be missing something obvious &-)
Me too.

> Chris
>
>
>>
>> Daniel
>>
>> On Thu, Apr 1, 2010 at 6:05 AM, Chris Watkins
>> <chriswaterguy@appropedia.org> wrote:
>> > I want to generate a list of matches for a search, but not do anything
>> > to
>> > the page.
>> >
>> > E.g. I want to list all pages that contain "redirect[[:Category", but I
>> > don't want to modify the pages.
>> >
>> > I guess that it's possible to modify redirect.py (I don't speak python,
>> > but
>> > it shouldn't be hard) and run it with -log. But maybe there's a simpler
>> > way?
>> >
>> > Thanks in advance.



--
http://www.google.com/profiles/daniel.mietchen











--
Chris Watkins

Appropedia.org - Sharing knowledge to build rich, sustainable lives.

blogs.appropedia.org
community.livejournal.com/appropedia
identi.ca/appropedia
twitter.com/appropedia