Brad Jorsch, 09/11/2012 17:30:
On Fri, Nov 9, 2012 at 7:59 AM, Hydriz Wikipedia admin@alphacorp.tk wrote:
You mentioned "a while back" for "apcontinue"; how recent was it? This dump generator attempts to archive wikis running all sorts of MediaWiki versions, so it will break on some of them unless we write a backward compatibility handler in the script itself.
July 2012: http://lists.wikimedia.org/pipermail/mediawiki-api-announce/2012-July/000030...
Any wiki running version 1.19, or a 1.20 snapshot from before mid-July, would be returning the old parameter. If you do it right, though, there's little you have to do. Just use whichever keys are given you inside the <query-continue> node. Even with your regular expression mess, just capture which key is given as well as the value and use it as the key for your params dict.
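A minimal sketch of the suggestion above, assuming the legacy continuation layout in which the response carries a "query-continue" node holding one key/value pair per module (apfrom on older wikis, apcontinue on newer ones). The helper names next_params and continuation_from_xml are hypothetical and are not taken from dumpgenerator.py; the point is only that the script never hard-codes the parameter name.

```python
import html
import re


def next_params(params, response):
    """Return a copy of params updated for the next request.

    `response` is assumed to be the decoded JSON of a legacy-style reply,
    e.g. {"query-continue": {"allpages": {"apcontinue": "Foo"}}}.
    Returns None when there is nothing left to continue.
    """
    cont = response.get("query-continue")
    if not cont:
        return None
    updated = dict(params)  # leave the caller's dict untouched
    for module_values in cont.values():
        # Reuse whichever key the wiki handed back (apfrom, apcontinue, ...).
        updated.update(module_values)
    return updated


# Regex variant for scripts that scrape the XML output instead of parsing it.
# It captures both the attribute name and its value.
CONTINUE_RE = re.compile(r'<query-continue>\s*<\w+\s+(\w+)="([^"]*)"')


def continuation_from_xml(xml_text):
    """Return (key, value) from a <query-continue> node, or None if absent."""
    match = CONTINUE_RE.search(xml_text)
    if match is None:
        return None
    # Attribute values are XML-escaped in the raw text, so unescape them.
    return match.group(1), html.unescape(match.group(2))
```

With either helper the calling loop stays the same for 1.19 and for later releases: send the request, merge the returned key into the parameters, and repeat until no continuation is returned.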
Thank you again for your useful suggestions! However, as already noted, https://www.mediawiki.org/wiki/API:Query#Continuing_queries doesn't give any info about supported releases.
Nemo
P.S.: Small, unreliable "temporary" things in MediaWiki, like the "powered by MediaWiki" sentence we grep for, are usually the most permanent ones, although I don't like it.
Question: why don't you use the Pywikipedia framework? I can see about 90% of your code becoming obsolete if you just use the existing framework, and it handles the differences between MediaWiki versions automatically (it can even fall back to screen scraping on sites that have either an ancient API or none at all).
If you can write up a doc describing how dumpgenerator.py should work (ignoring how it currently works and focusing only on the ideal process) and what you want the outcome to be, writing a replacement will be easy. I just need specifics on exactly what you want the dump creator to do and how.
On Fri, Nov 9, 2012 at 11:52 AM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
[...]
Mediawiki-api mailing list Mediawiki-api@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
Thank you very much John/Jack Phoenix and Betacommand! I'm not sure you'll actually be able to dump that 90% of the code, because it will probably still be needed as a fallback/backwards-compatibility method for super-ancient MediaWiki releases and naughty sites hiding their APIs and special pages, but it would be wonderful to rely less on it. I'll write some specifications very soon.
Nemo
Pywikipedia already has those fallbacks, as it predates the useful API.
On Fri, Nov 9, 2012 at 12:14 PM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
[...]
Pywikipedia-l mailing list Pywikipedia-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
Is there a freenode IRC chat room where we can discuss this in real time?
There is #pywikipediabot on freenode for pywikipedia-related matters. -- Legoktm
On Fri, Nov 9, 2012 at 11:48 AM, John phoenixoverride@gmail.com wrote:
Is there a freenode IRC chat room where we can discuss this in real time?
I know that; I was referring to this dump generator project. I would like to convert it to PWB, and email discussions take forever when trying to work out details.
On Fri, Nov 9, 2012 at 1:12 PM, legoktm legoktm.wikipedia@gmail.com wrote:
There is #pywikipediabot on freenode for pywikipedia-related matters. -- Legoktm
The specifications for the rewrite should be complete now: https://meta.wikimedia.org/wiki/WikiTeam/Dumpgenerator_rewrite Let me know how and where you prefer to proceed and discuss; I'll follow you. :-) As for IRC, I'm Nemo_bis everywhere; we have a #wikiteam channel on EFNet (not super-useful), and I'm on #wikimedia and related channels on FreeNode etc.
Are you sure that PWB has all possible fallbacks? I thought some of them were removed at some point. Moreover, having downloaded a few thousand different wikis in the wild, I can tell you there are some very tricky ones...
Nemo