Oh I see, I may have used a revision ID by mistake then.
I'm interested in using the latest version of an article's HTML by page_id, not in
accessing a particular version.
I thought that querying a revision must be expensive.
Could you please tell me whether the following query is OK for my purpose, or
whether it can be made less costly?
I am currently using these parameters:
action:'query',
prop: 'revisions',
rvprop: 'content',
rvparse: 1,
redirects: 'true',
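For reference, here is a minimal sketch of how the parameters above might be combined with a page_id into a single request URL. The `pageids` and `format` entries are assumptions that are not part of my current parameter list, and `build_latest_html_query` is only an illustrative helper name. As far as I understand, `prop=revisions` without an explicit revision ID returns only the current revision of the page, which is what I am after.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_latest_html_query(page_id: int) -> str:
    """Build an API URL asking for the parsed latest revision of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvparse": 1,
        "redirects": "true",
        "pageids": page_id,  # assumption: select the article by its page_id
        "format": "json",    # assumption: ask for a JSON response
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


# 14640471 is the page_id for "Mars" mentioned in this thread.
url = build_latest_html_query(14640471)
print(url)
```

The sketch only builds the URL and does not send the request, so it says nothing about how costly the query is on the server side.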
On Tue, Jan 12, 2016 at 7:06 AM, gnosygnu <gnosygnu(a)gmail.com> wrote:
Basically, the xml dumps have 2 IDs: page_id and
revision_id.
The page_id points to the article. In this case, 14640471 is the page_id
for Mars (
https://en.wikipedia.org/wiki/Mars)
The revision_id points to the latest revision for the article. For Mars,
the latest revision_id is 699008434 which was generated on 2016-01-09 (
https://en.wikipedia.org/w/index.php?title=Mars&oldid=699008434). Note
that a revision_id is generated every time a page is edited.
So, to answer your question, the IDs never change. 14640471 will always
point to Mars, while 699008434 points to the 2016-01-09 revision for Mars.
That said, different dumps will have different revision_ids, because an
article may be updated. If Mars gets updated tomorrow, and the English
Wikipedia dump is generated afterwards, then that dump will list Mars with
a new revision_id (something higher than 699008434). However, that dump
will still show Mars with a page_id of 14640471. You're probably better off
using the page_id.
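
To make the distinction concrete, here is a rough sketch that reads both IDs out of a dump-style `<page>` element. The fragment is a simplified, hand-written stand-in that uses the Mars IDs above; real dump files wrap pages in a `<mediawiki>` root with an XML namespace and carry many more fields, so treat the element layout and the `read_ids` helper as illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

# Simplified, hand-written fragment; real dumps include an XML namespace
# and many more child elements per page and revision.
SAMPLE_PAGE = """
<page>
  <title>Mars</title>
  <id>14640471</id>
  <revision>
    <id>699008434</id>
  </revision>
</page>
"""


def read_ids(page_xml: str) -> Tuple[int, int]:
    """Return (page_id, revision_id) for a single <page> element."""
    page = ET.fromstring(page_xml.strip())
    page_id = int(page.findtext("id"))               # direct child: the page_id
    revision_id = int(page.findtext("revision/id"))  # nested: the revision_id
    return page_id, revision_id


page_id, revision_id = read_ids(SAMPLE_PAGE)

# Key your own data on page_id; store revision_id only as "version seen".
latest_seen = {page_id: revision_id}
print(latest_seen)  # {14640471: 699008434}
```

With data keyed this way, a newer dump would only change the stored value for a page, while the key stays the same.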
Finally, you can also use the Wikimedia API to get a similar
view to the dump. For example:
https://en.wikipedia.org/w/api.php?action=query&prop=revisions&titl…
Hope this helps.
On Mon, Jan 11, 2016 at 5:09 AM, Luigi Assom <luigi.assom(a)gmail.com>
wrote:
yep, same here!
Also another question about consistency of _IDs in time.
I was working with an old version of the Wikipedia dump, and testing some
data models I built on the dump, using a few topics as pivots.
I might have data corrupted on my side, but just to be sure:
are the _IDs of articles *persistent* over time, or are they subject to
change?
Could it happen that, due to a rollback or merge in an article's history, the ID
would change?
E.g. my test article "Mars" first pointed to _ID "4285430"
and then changed to "14640471".
I need to ensure _IDs will persist.
thank you!
*P.S. sorry for cross posting - I've replied from wrong email - could you
please delete the other message and keep only this email address? thank
you! *
On Mon, Jan 11, 2016 at 6:22 AM, Tilman Bayer <tbayer(a)wikimedia.org>
wrote:
On Sun, Jan 10, 2016 at 4:05 PM, Bernardo Sulzbach <mafagafogigante(a)gmail.com> wrote:
> On Sun, Jan 10, 2016 at 9:55 PM, Neil Harris <neil(a)tonal.clara.co.uk> wrote:
> > Hello! I've noticed that no enwiki dump seems to have been generated so far
> > this month. Is this by design, or has there been some sort of dump failure?
> > Does anyone know when the next enwiki dump might happen?
> >
>
> I would also be interested.
>
> --
> Bernardo Sulzbach
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
CCing the Xmldatadumps mailing list
<https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l>, where
someone has already posted
<https://lists.wikimedia.org/pipermail/xmldatadumps-l/2016-January/001214.html>
about what might be the same issue.
--
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB
_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
--
*Luigi Assom*
Founder & CEO @ XDiscovery - Crazy on Human Knowledge
*Corporate*
www.xdiscovery.com
*Mobile App for knowledge Discovery*
APP STORE <http://tiny.cc/LearnDiscoveryApp> | PR
<http://tiny.cc/app_Mindmap_Wikipedia> | WEB
<http://www.learndiscovery.com/>
T +39 349 3033334 | +1 415 707 9684
--
*Luigi Assom*
T +39 349 3033334 | +1 415 707 9684
Skype oggigigi