Oh, I see. I may have used a revision ID by mistake then. I'm interested in getting the latest HTML version of an article by page_id, not in accessing a particular revision. I thought that querying a revision must be expensive: could you please tell me whether the following query is OK for my purpose, or whether it could be made less costly?
I am currently using these parameters:
action: 'query',
prop: 'revisions', rvprop: 'content', rvparse: 1, redirects: 'true'
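Put together as a single request, I believe it would look roughly like this in Python (a sketch only; I'm assuming the standard en.wikipedia.org api.php endpoint and the Mars page_id 14640471 from your earlier message):

    import requests

    # Sketch: fetch the latest revision of a page by page_id and ask the API
    # to return it parsed as HTML. Endpoint and page_id are assumed for illustration.
    params = {
        'action': 'query',
        'prop': 'revisions',
        'rvprop': 'content',
        'rvparse': 1,            # return parsed HTML rather than raw wikitext
        'redirects': 'true',
        'pageids': 14640471,     # Mars; querying by page_id returns the current revision
        'format': 'json',
    }
    resp = requests.get('https://en.wikipedia.org/w/api.php', params=params)
    page = resp.json()['query']['pages']['14640471']
    html = page['revisions'][0]['*']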
On Tue, Jan 12, 2016 at 7:06 AM, gnosygnu gnosygnu@gmail.com wrote:
Basically, the XML dumps have two IDs: page_id and revision_id.
The page_id points to the article. In this case, 14640471 is the page_id for Mars (https://en.wikipedia.org/wiki/Mars)
The revision_id points to the latest revision for the article. For Mars, the latest revision_id is 699008434 which was generated on 2016-01-09 ( https://en.wikipedia.org/w/index.php?title=Mars&oldid=699008434). Note that a revision_id is generated every time a page is edited.
So, to answer your question, the IDs never change. 14640471 will always point to Mars, while 699008434 points to the 2016-01-09 revision for Mars.
That said, different dumps will have different revision_ids, because an article may be updated. If Mars gets updated tomorrow, and the English Wikipedia dump is generated afterwards, then that dump will list Mars with a new revision_id (something higher than 699008434). However, that dump will still show Mars with a page_id of 14640471. You're probably better off using the page_id.
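If it helps, here is roughly how the two IDs appear in the dump (a trimmed sketch of a <page> element from the pages-articles XML, using the Mars values above; the real element carries more fields, such as the contributor and text):

    <page>
      <title>Mars</title>
      <ns>0</ns>
      <id>14640471</id>               <!-- page_id: stable across dumps -->
      <revision>
        <id>699008434</id>            <!-- revision_id: changes with every edit -->
        <timestamp>2016-01-09T...</timestamp>
        ...
      </revision>
    </page>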
Finally, you can also reference the Wikimedia API to get a view similar to the dump. For example: https://en.wikipedia.org/w/api.php?action=query&prop=revisions&title...
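For instance (example URLs I put together from the IDs above): the first, by page_id, always returns whatever the current revision happens to be; the second, by revision_id, is pinned to the 2016-01-09 version:

https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=ids|timestamp&pageids=14640471
https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=ids|timestamp&revids=699008434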
Hope this helps.
On Mon, Jan 11, 2016 at 5:09 AM, Luigi Assom luigi.assom@gmail.com wrote:
yep, same here!
Also, another question about the consistency of _IDs over time. I was working with an old version of the Wikipedia dump, testing some data models I built on the dump using a few topics as pivots. I might have corrupted data on my side, but just to be sure: are article _IDs *persistent* over time, or are they subject to change?
Could it happen that, due to some rollback or merge in an article's history, the ID would change? E.g. the test article "Mars" would first point to a version _ID = "4285430" and then change to "14640471"?
I need to ensure the _IDs will persist. Thank you!
*P.S. Sorry for cross-posting - I replied from the wrong email address - could you please delete the other message and keep only this one? Thank you!*
On Mon, Jan 11, 2016 at 6:22 AM, Tilman Bayer tbayer@wikimedia.org wrote:
On Sun, Jan 10, 2016 at 4:05 PM, Bernardo Sulzbach < mafagafogigante@gmail.com> wrote:
On Sun, Jan 10, 2016 at 9:55 PM, Neil Harris neil@tonal.clara.co.uk wrote:
Hello! I've noticed that no enwiki dump seems to have been generated so far this month. Is this by design, or has there been some sort of dump failure? Does anyone know when the next enwiki dump might happen?
I would also be interested.
-- Bernardo Sulzbach
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
CCing the Xmldatadumps mailing list https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l, where someone has already posted https://lists.wikimedia.org/pipermail/xmldatadumps-l/2016-January/001214.html about what might be the same issue.
-- Tilman Bayer Senior Analyst Wikimedia Foundation IRC (Freenode): HaeB
Xmldatadumps-l mailing list Xmldatadumps-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
-- *Luigi Assom*
T +39 349 3033334 | +1 415 707 9684 Skype oggigigi