Brad Jorsch, 09/11/2012 17:30:
> On Fri, Nov 9, 2012 at 7:59 AM, Hydriz Wikipedia <admin(a)alphacorp.tk> wrote:
>>
>> You mentioned "a while back" for "apcontinue"; how recent was it? This
>> dump generator is attempting to archive all sorts of versions of
>> MediaWiki, so we may need to write a backward-compatibility handler in
>> the script itself.
>
> July 2012: http://lists.wikimedia.org/pipermail/mediawiki-api-announce/2012-July/00003…
>
> Any wiki running version 1.19, or a 1.20 snapshot from before
> mid-July, would be returning the old parameter. If you do it right,
> though, there's little you have to do. Just use whichever keys are
> given to you inside the <query-continue> node. Even with your regular
> expression mess, just capture which key is given as well as the value
> and use it as the key for your params dict.
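Brad's suggestion, capturing which continuation key the wiki returned and echoing it back, could be sketched like this (a hedged illustration, not dumpgenerator.py's actual code; the function name and structure are mine):

```python
import re

def update_continue(xml, params):
    # Capture both the continuation key (apfrom on 1.19, apcontinue on
    # newer releases) and its value from the <query-continue> node,
    # then use that key for the next request's params dict.
    m = re.search(r'<allpages (apfrom|apcontinue)="([^"]+)"', xml)
    if m is None:
        return None  # no continuation element: this was the last batch
    params.pop('apfrom', None)      # drop whichever key was sent before
    params.pop('apcontinue', None)
    params[m.group(1)] = m.group(2)
    return params
```

This way the script never has to know which MediaWiki release it is talking to; it simply mirrors back whatever the server sent.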
Thank you again for your useful suggestions!
However, as already noted,
https://www.mediawiki.org/wiki/API:Query#Continuing_queries doesn't give
any info about supported releases.
Nemo
P.s.: Small, unreliable, "temporary" things in MediaWiki, like the
"powered by MediaWiki" sentence we grep for, are usually the most
permanent ones, much as I dislike that.
Hydriz Wikipedia, 09/11/2012 16:59:
> You mentioned "a while back" for "apcontinue"; how recent was it? This
> dump generator is attempting to archive all sorts of versions of
> MediaWiki, so we may need to write a backward compatibility handler in
> the script itself.
+1
https://www.mediawiki.org/wiki/API:Allpages ,
https://www.mediawiki.org/wiki/API:Lists and
https://www.mediawiki.org/wiki/API:Query#Continuing_queries don't really
shed any light.
> ...and I agree, the code is in a total mess. We need to get someone to
> rewrite the whole thing, soon.
Well, that would be so in an ideal world. In this one, the best option
would probably be suggestions for simple libraries we could use to solve
such small problems. (Which can become very big if one doesn't follow
API evolution very closely or know its history from the beginning of time.)
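For what it's worth, the version-agnostic paging logic is small enough that no extra library is strictly needed. Here is a sketch; the `fetch` callable is a hypothetical stand-in for the HTTP layer, so the loop can be read (and tested) without a network:

```python
def all_pages(fetch, **extra):
    # Generic list=allpages pager: echo back whatever continuation keys
    # the wiki returns (apfrom on 1.19, apcontinue on 1.20+), so the
    # same loop works on old and new MediaWiki releases alike.
    params = {'action': 'query', 'list': 'allpages', 'format': 'json'}
    params.update(extra)
    while True:
        data = fetch(dict(params))
        for page in data['query']['allpages']:
            yield page['title']
        cont = data.get('query-continue', {}).get('allpages')
        if not cont:
            break            # no continuation block: all pages seen
        params.update(cont)  # whichever key came back, send it back
```

Plugging in a real `fetch` is then just a urllib or requests call that decodes the JSON response.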
Nemo
> On Fri, Nov 9, 2012 at 11:50 PM, Brad Jorsch wrote:
>
> You're searching for the continue parameter as "apfrom", but this was
> changed to "apcontinue" a while back. Changing line 162 to something
> like this should probably do it:
>
> m = re.findall(r'<allpages (?:apfrom|apcontinue)="([^>]+)" />',
> xml)
>
> Note that for full correctness, you probably should omit both apfrom
> and apcontinue entirely from params the first time around, and send
> back whichever of the two is found by the above line in subsequent
> queries.
>
> Also, why in the world aren't you using an XML parser (or a JSON
> parser with format=json) to process the API response instead of trying
> to parse the XML using regular expressions?!
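On Brad's point about using a real parser: a minimal version with only the standard library might look like this (a sketch; the function name is mine):

```python
import xml.etree.ElementTree as ET

def continue_attrs(xml_text):
    # Parse the API response properly instead of regex-matching the raw
    # XML; whatever attributes appear on <query-continue>/<allpages>
    # are exactly the parameters to send back on the next request.
    root = ET.fromstring(xml_text)
    node = root.find('./query-continue/allpages')
    return dict(node.attrib) if node is not None else None
```

Besides robustness, this also handles XML-escaped attribute values (titles containing quotes or ampersands) that a regular expression would mangle.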
>
> On Fri, Nov 9, 2012 at 2:27 AM, Federico Leva (Nemo)
> <nemowiki(a)gmail.com> wrote:
> > It's completely broken:
> > https://code.google.com/p/wikiteam/issues/detail?id=56
> > It will download only a fraction of the wiki, 500 pages at most per
> > namespace.
>
> _______________________________________________
> Mediawiki-api mailing list
> Mediawiki-api(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
>
> --
> Regards,
> Hydriz
>
> We've created the greatest collection of shared knowledge in history.
> Help protect Wikipedia. Donate now: http://donate.wikimedia.org
>
>
It's completely broken:
https://code.google.com/p/wikiteam/issues/detail?id=56
It will download only a fraction of the wiki, 500 pages at most per
namespace.
Let me reiterate that
https://code.google.com/p/wikiteam/issues/detail?id=44 is a very urgent
bug and we've seen no work on it in many months. We need an actual
programmer with some knowledge of Python to fix it and make the script
work properly; I know there are several on this list (and elsewhere),
please please help. The last time I, as a non-coder, tried to fix a bug,
I made things worse
(https://code.google.com/p/wikiteam/issues/detail?id=26).
Only after the API is implemented/fixed will I be able to re-archive the
4-5 thousand wikis we've recently archived on archive.org
(https://archive.org/details/wikiteam) and possibly many more. Many of
those dumps contain errors and/or are just partial because of the
script's unreliability, and wikis die on a daily basis. (So, quoting
emijrp, there IS a deadline.)
Nemo
P.s.: Cc'ing some lists out of desperation; sorry for cross-posting.
Hello,
For manual running it may work, but will it work for automated runs such as MiszaBot's? At Meta, the archiving system is automated and I'm not the owner of MiszaBot :-)
Regards, M.
----- Original message -----
From: info(a)gno.de
Sent: 28-10-12 11:44
To: pywikipedia-l(a)lists.wikimedia.org
Subject: Re: [Pywikipedia-l] archivebot.py issues
--
- Marco Aurelio
Update the bot to release r10622 or higher and try again with the option
--page="m::Wikimedia Forum"
A double colon or leading colon now implies the main namespace, as expected; this was intended for page titles a long time ago.
Regards
xqt
----- Original message -----
From: legoktm <legoktm.wikipedia(a)gmail.com>
To: Pywikipedia discussion list <pywikipedia-l(a)lists.wikimedia.org>
Date: 28.10.2012 03:10
Subject: Re: [Pywikipedia-l] archivebot.py issues
> If you modify line 293 and remove the "defaultNamespace=3" it should work.
> Looks like that was introduced in
> pyrev:10149<https://www.mediawiki.org/wiki/Special:Code/pywikipedia/10149>.
> Not sure if it was intentional or not.
> -- Legoktm
>
>
>
> On Sat, Oct 27, 2012 at 3:19 PM, Marco Aurelio <maurelio(a)gmx.es> wrote:
>
> > Hi,
> >
> > At Meta-Wiki we've detected
> > <https://meta.wikimedia.org/w/index.php?title=Meta:Babel&oldid=4341573#MiszaBot_has_stopped_archiving_main_ns_pages>
> > that the bot that used to archive pages (MiszaBot) has suddenly stopped
> > doing so on main namespace pages. I tried with my bot using the
> > archivebot.py script on a backlogged page, [[m:Wikimedia Forum]], and
> > the result is:
> >
> > *Processing [[meta:Wikimedia Forum]]
> > Looking for: {{User:MiszaBot/config}} in [[meta:User talk:Wikimedia
> Forum]]
> > Error occured while processing page [[meta:Wikimedia Forum]]
> > Traceback (most recent call last):
> > File "C:\Python\pywikibot\archivebot.py", line 601, in main
> > Archiver = PageArchiver(pg, a, salt, force)
> > File "C:\Python\pywikibot\archivebot.py", line 376, in __init__
> > self.loadConfig()
> > File "C:\Python\pywikibot\archivebot.py", line 418, in loadConfig
> > raise MissingConfigError(u'Missing or malformed template')
> > MissingConfigError: Missing or malformed template
> >
> > C:\Python\pywikibot>*
> >
> > Something seems to have changed in the script's code, that now just looks
> > for Talk and User talk: namespaces.
> >
> > Can the script please be fixed so it works on all namespaces again? - A
> > fix would be deeply appreciated.
> >
> > Best regards, M.
> >
> > --
> > - Marco Aurelio
> > _______________________________________________
> > Pywikipedia-l mailing list
> > Pywikipedia-l(a)lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
> >
> >
>
>
> --------------------------------
>
>
Hello,
Yes, removing "defaultNamespace=3" from the script works. I tested the latest version of the script with that line removed, and the bot works OK.
Regards, M.
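A toy model of why the hard-coded defaultNamespace=3 broke main-namespace pages (this is not pywikipedia's actual resolution code, just an illustration of the effect):

```python
NAMESPACES = {0: '', 3: 'User talk'}

def resolve(title, default_ns=0):
    # A bare title with no namespace prefix falls into the default
    # namespace. With defaultNamespace=3 hard-coded, "Wikimedia Forum"
    # became "User talk:Wikimedia Forum", which is where the bot then
    # looked for {{User:MiszaBot/config}} and found nothing, hence the
    # MissingConfigError in the traceback above.
    if ':' not in title and default_ns != 0:
        return NAMESPACES[default_ns] + ':' + title
    return title
```

With the argument removed, bare titles stay in the main namespace and the config template is found where it actually lives.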
----- Original message -----
From: legoktm
Sent: 28-10-12 03:10
To: Pywikipedia discussion list
Subject: Re: [Pywikipedia-l] archivebot.py issues
If you modify line 293 and remove the "defaultNamespace=3" it should work. Looks like that was introduced in pyrev:10149 <https://www.mediawiki.org/wiki/Special:Code/pywikipedia/10149>. Not sure if it was intentional or not.
-- Legoktm
On Sat, Oct 27, 2012 at 3:19 PM, Marco Aurelio < maurelio(a)gmx.es > wrote:
Hi,
At Meta-Wiki we've detected https://meta.wikimedia.org/w/index.php?title=Meta:Babel&oldid=4341573#Misza… that the bot that used to archive pages (MiszaBot) has suddenly stopped doing so on main namespace pages. I tried with my bot using the archivebot.py script on a backlogged page, [[m:Wikimedia Forum]], and the result is:
Processing [[meta:Wikimedia Forum]]
Looking for: {{User:MiszaBot/config}} in [[meta:User talk:Wikimedia Forum]]
Error occured while processing page [[meta:Wikimedia Forum]]
Traceback (most recent call last):
File "C:\Python\pywikibot\archivebot.py", line 601, in main
Archiver = PageArchiver(pg, a, salt, force)
File "C:\Python\pywikibot\archivebot.py", line 376, in __init__
self.loadConfig()
File "C:\Python\pywikibot\archivebot.py", line 418, in loadConfig
raise MissingConfigError(u'Missing or malformed template')
MissingConfigError: Missing or malformed template
C:\Python\pywikibot>
Something seems to have changed in the script's code, that now just looks for Talk and User talk: namespaces.
Can the script please be fixed so it works on all namespaces again? - A fix would be deeply appreciated.
Best regards, M.
--
- Marco Aurelio
--
- Marco Aurelio
Hi,
At Meta-Wiki we've detected https://meta.wikimedia.org/w/index.php?title=Meta:Babel&oldid=4341573#Misza… that the bot that used to archive pages (MiszaBot) has suddenly stopped doing so on main namespace pages. I tried with my bot using the archivebot.py script on a backlogged page, [[m:Wikimedia Forum]], and the result is:
Processing [[meta:Wikimedia Forum]]
Looking for: {{User:MiszaBot/config}} in [[meta:User talk:Wikimedia Forum]]
Error occured while processing page [[meta:Wikimedia Forum]]
Traceback (most recent call last):
File "C:\Python\pywikibot\archivebot.py", line 601, in main
Archiver = PageArchiver(pg, a, salt, force)
File "C:\Python\pywikibot\archivebot.py", line 376, in __init__
self.loadConfig()
File "C:\Python\pywikibot\archivebot.py", line 418, in loadConfig
raise MissingConfigError(u'Missing or malformed template')
MissingConfigError: Missing or malformed template
C:\Python\pywikibot>
Something seems to have changed in the script's code, that now just looks for Talk and User talk: namespaces.
Can the script please be fixed so it works on all namespaces again? - A fix would be deeply appreciated.
Best regards, M.
--
- Marco Aurelio
Hi,
Thank you very much.
Regards, M.
----- Original message -----
From: info(a)gno.de
Sent: 21-10-12 15:58
To: pywikipedia-l(a)lists.wikimedia.org
Subject: Re: [Pywikipedia-l] Please update family file for es.wikibooks
Done in r10596.
Greetings
xqt
----- Original Nachricht ----
Von: Marco Aurelio <maurelio(a)gmx.es>
An: pywikipedia-l(a)lists.wikimedia.org
Datum: 21.10.2012 00:04
Betreff: [Pywikipedia-l] Please update family file for es.wikibooks
> Hello,
>
> We've removed[1] some unused namespaces in the ES wikibooks project:
>
> Getting 1 page from wikibooks:es...
> WARNING: Family file wikibooks includes namespace['es'][102], but it should
> be removed (namespace doesn't exist in the site)
> WARNING: Family file wikibooks includes namespace['es'][103], but it should
> be removed (namespace doesn't exist in the site)
>
> Can you please make the appropriate changes to the files?
>
> Thank you very much in advance.
>
> Best regards.
>
> [1]: <https://bugzilla.wikimedia.org/show_bug.cgi?id=40838>
>
> --
> - Marco Aurelio
>
>
--
- Marco Aurelio
Hello,
We've removed[1] some unused namespaces in the ES wikibooks project:
Getting 1 page from wikibooks:es...
WARNING: Family file wikibooks includes namespace['es'][102], but it should be removed (namespace doesn't exist in the site)
WARNING: Family file wikibooks includes namespace['es'][103], but it should be removed (namespace doesn't exist in the site)
Can you please make the appropriate changes to the files?
Thank you very much in advance.
Best regards.
[1]: <https://bugzilla.wikimedia.org/show_bug.cgi?id=40838>
--
- Marco Aurelio