Thank you for your advice! I wrote a simple method that returns the set of titles of pages changed since "limit": http://paste.pocoo.org/show/547240/. It does not return the exact set (for that I think I would have to check the timestamps in the last iteration), so in the worst case it returns 100 unwanted titles, but that is not a problem for my purposes.
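The pasted method is not reproduced in this thread, so the following is only a rough sketch of the batch-wise behaviour described above. It assumes changes arrive newest first in batches, as (title, timestamp) pairs. The function name, the `exact` flag and the sample data are made up for illustration.

```python
from datetime import datetime


def titles_changed_since(batches, limit, exact=False):
    """Collect titles from newest-first batches of (title, timestamp) pairs.

    Iteration stops after the first batch that contains an entry older
    than `limit`. With exact=False the whole last batch is kept, so some
    unwanted titles may slip in; with exact=True the timestamps in that
    batch are checked entry by entry.
    """
    titles = set()
    for batch in batches:
        reached_limit = False
        for title, timestamp in batch:
            if timestamp < limit:
                reached_limit = True
                if exact:
                    continue  # skip entries older than the limit
            titles.add(title)
        if reached_limit:
            break  # later batches are older still
    return titles


# Hypothetical sample data: three small batches, newest change first.
sample_batches = [
    [("A", datetime(2012, 2, 6)), ("B", datetime(2012, 2, 4))],
    [("C", datetime(2012, 1, 31)), ("D", datetime(2012, 1, 20))],
    [("E", datetime(2012, 1, 10))],
]
limit = datetime(2012, 1, 30)

coarse = titles_changed_since(sample_batches, limit)
exact = titles_changed_since(sample_batches, limit, exact=True)
```

Here `coarse` still contains "D" from the last batch fetched, while `exact` drops it. Neither call touches the third batch.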
Cheers alkamid
On 6 February 2012 02:27, Morten Wang nettrom@gmail.com wrote:
To me the implementation depends on what alkamid actually wants to do. To keep some of SuggestBot's data sources up to date, I use the site object's recentchanges() generator to grab data. Although one can only get a limited amount at each step, I've never had trouble exhausting the generator, and it's easy to check the edit timestamp and stop iterating when necessary. I then store the page titles in a set(), which can be fed to a PagesFromTitlesGenerator, and I chain that generator with a PreloadingGenerator to get the latest revisions.
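The timestamp check and the set() of titles might look roughly like the sketch below. It does not call pywikipedia: the shape of the entries yielded by recentchanges() is not shown in this thread, so plain (title, timestamp) pairs, the helper names and the sample feed are all assumptions.

```python
from datetime import datetime, timedelta
from itertools import takewhile


def changed_titles(changes, cutoff):
    """Return the set of titles edited at or after `cutoff`.

    `changes` must yield (title, timestamp) pairs newest first, so
    iteration can stop at the first entry older than the cutoff.
    """
    recent = takewhile(lambda change: change[1] >= cutoff, changes)
    # A set removes duplicates when a page was edited more than once.
    return {title for title, _ in recent}


def fake_recent_changes(now):
    """Hypothetical stand-in for a recent-changes feed (newest first)."""
    yield ("Foo", now - timedelta(days=1))
    yield ("Bar", now - timedelta(days=3))
    yield ("Foo", now - timedelta(days=5))   # same page edited twice
    yield ("Baz", now - timedelta(days=9))   # older than one week
    yield ("Old", now - timedelta(days=12))  # never reached


now = datetime(2012, 2, 6)
titles = changed_titles(fake_recent_changes(now), now - timedelta(days=7))
```

In a real bot the resulting set would then be handed to PagesFromTitlesGenerator and PreloadingGenerator as described above.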
In my experience only a minority of a Wikipedia edition's articles are updated on a weekly basis, so using allpages() results in a lot of unnecessary data.
Cheers, Morten
On 5 February 2012 17:28, Dr. Trigon dr.trigon@surfeu.ch wrote:
past week? I thought of using the AllPagesPageGenerator and executing editTime() on each page, but this method gives me only zeros if the page was not read before (e.g. I have to call page.get() first in order for editTime() to work properly). Is there any edit-time-related piece of information I can get from a generated list of pages? Or maybe there is another page generator suitable for me?
Everything that uses 'getall' from 'wikipedia.py' (imported as 'pywikibot') gives you the first history entry WITHOUT having to trigger page.get(). The 'PreloadingGenerator' is one example, and since generators can be chained, you can first set up your generator as 'gen1' and then pass 'gen1' to a 'PreloadingGenerator' (maybe inside a 'ThreadedGenerator'...) to get the first history entry of every page. There is an example of this in 'sum_disc.py' in the DrTrigonBot repo.
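The chaining idea can be illustrated without the framework. The sketch below is not pywikipedia's PreloadingGenerator, just a plain-Python imitation of the pattern: a first generator yields titles, and a second one consumes it in batches and fetches each batch in bulk. `fetch_many`, `fake_fetch_many` and the batch size are hypothetical.

```python
def title_generator(titles):
    """First stage ('gen1'): yields page titles one by one."""
    for title in titles:
        yield title


def preloading_generator(gen, fetch_many, batch_size=2):
    """Second stage: consumes `gen` in batches and bulk-fetches each batch.

    `fetch_many` is a hypothetical callable that maps a list of titles
    to a dict {title: data} in a single request.
    """
    batch = []
    for title in gen:
        batch.append(title)
        if len(batch) == batch_size:
            loaded = fetch_many(batch)
            for t in batch:
                yield t, loaded[t]
            batch = []
    if batch:  # flush the final, possibly shorter batch
        loaded = fetch_many(batch)
        for t in batch:
            yield t, loaded[t]


calls = []  # records each bulk request made by the fake fetcher


def fake_fetch_many(titles):
    """Hypothetical bulk fetch; a real bot would query the wiki here."""
    calls.append(list(titles))
    return {t: "text of " + t for t in titles}


gen1 = title_generator(["A", "B", "C"])
pages = list(preloading_generator(gen1, fake_fetch_many))
```

With three titles and a batch size of two, the fake fetcher is called twice instead of once per page, which is the point of preloading.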
Greetings
Pywikipedia-l mailing list Pywikipedia-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l