Eric,
We don't produce dumps of the revision table in SQL format because some of
those revisions may be hidden from public view, and even metadata about
them should not be released. We do, however, publish so-called Adds/Changes
dumps once a day for each wiki, providing stub and content files in XML
covering just the new pages and revisions since the previous such dump.
They lag about 12 hours behind so that wiki admins can filter out vandalism
and the like, but hopefully that's good enough for your needs. You can find
them here:
https://dumps.wikimedia.org/other/incr/
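For what it's worth, a small sketch of how one might locate a given day's stub file under that tree. The directory and filename pattern used here is an assumption based on the layout of /other/incr/ at the time of writing, not something guaranteed by the dumps infrastructure, so check the index page before relying on it:

```python
from datetime import date

BASE = "https://dumps.wikimedia.org/other/incr"

def adds_changes_stub_url(wiki, day):
    """Build the URL of one day's Adds/Changes stub file.

    ASSUMPTION: the pattern BASE/<wiki>/<YYYYMMDD>/
    <wiki>-<YYYYMMDD>-stubs-meta-hist-incr.xml.gz matches the
    current layout of /other/incr/; verify against the index page.
    """
    d = day.strftime("%Y%m%d")
    return f"{BASE}/{wiki}/{d}/{wiki}-{d}-stubs-meta-hist-incr.xml.gz"

print(adds_changes_stub_url("enwiki", date(2023, 1, 16)))
```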
Ariel Glenn
ariel(a)wikimedia.org
On Tue, Jan 17, 2023 at 6:22 AM Eric Andrew Lewis <
eric.andrew.lewis(a)gmail.com> wrote:
Hi,
I am interested in performing analysis on recently created pages on
English Wikipedia.
One way to find recently created pages is to download a meta-history file
for English Wikipedia and filter through the XML, looking for pages whose
oldest revision falls within the desired timespan.
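For concreteness, the filtering step I have in mind could be sketched roughly like this, streaming the export XML rather than loading it all at once. This assumes the standard MediaWiki export layout (`<page>` elements containing a `<title>` and one or more `<revision><timestamp>` children); the function and variable names are just illustrative:

```python
import io
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

def recent_pages(xml_stream, since):
    """Return (title, first_revision_time) for pages whose earliest
    revision in the stream is at or after `since` (an aware datetime).

    ASSUMPTION: standard MediaWiki export layout; in a real
    meta-history dump the oldest revision is the page creation.
    """
    results = []
    for _, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag.rsplit("}", 1)[-1] != "page":  # drop any namespace
            continue
        title = None
        timestamps = []
        for child in elem.iter():
            ctag = child.tag.rsplit("}", 1)[-1]
            if ctag == "title":
                title = child.text
            elif ctag == "timestamp":
                timestamps.append(
                    datetime.fromisoformat(child.text.replace("Z", "+00:00"))
                )
        if timestamps and min(timestamps) >= since:
            results.append((title, min(timestamps)))
        elem.clear()  # free memory as we stream
    return results

# Toy example with an inline, namespace-free export fragment:
sample = """<mediawiki>
  <page>
    <title>Old page</title>
    <revision><timestamp>2020-01-01T00:00:00Z</timestamp></revision>
    <revision><timestamp>2023-01-16T00:00:00Z</timestamp></revision>
  </page>
  <page>
    <title>New page</title>
    <revision><timestamp>2023-01-15T12:00:00Z</timestamp></revision>
  </page>
</mediawiki>"""
since = datetime(2023, 1, 1, tzinfo=timezone.utc)
print(recent_pages(io.StringIO(sample), since))  # only "New page" qualifies
```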
Since this requires a library to parse XML string data, I imagine it is
much slower than a database query. Is page revision data available in one
of the SQL dumps that I could query for this use case?
Looking at the exported tables list
<https://meta.wikimedia.org/wiki/Data_dumps/What%27s_available_for_download#Database_tables>,
it does not look like it is. Maybe this is intentional?
Thanks,
Eric Andrew Lewis
ericandrewlewis.com
+1 610 715 8560
_______________________________________________
Xmldatadumps-l mailing list -- xmldatadumps-l(a)lists.wikimedia.org
To unsubscribe send an email to xmldatadumps-l-leave(a)lists.wikimedia.org