Hi!
In case it is useful to anyone else: I have added support for processing SQL dumps directly to my Go library [1] for processing MediaWiki dumps, so there is no need to load them into a database first. One can process them directly to extract data, just like dumps in the other formats.
[1] https://gitlab.com/tozd/go/mediawiki
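To give an idea of what processing a SQL dump directly involves, here is a minimal, self-contained sketch (not the library's actual API) that splits the VALUES clause of a dump's INSERT statement into individual row tuples. It assumes string values contain no escaped quotes, which real dumps can have, so treat it as illustrative only:

```go
package main

import (
	"fmt"
	"strings"
)

// extractTuples splits the VALUES part of a MySQL dump INSERT statement
// into individual row tuples. Simplified sketch: it assumes values contain
// no escaped quotes, which real dumps can have.
func extractTuples(insert string) []string {
	idx := strings.Index(insert, "VALUES ")
	if idx < 0 {
		return nil
	}
	body := strings.TrimSuffix(strings.TrimSpace(insert[idx+len("VALUES "):]), ";")
	var tuples []string
	depth := 0
	start := 0
	inString := false
	for i, r := range body {
		switch r {
		case '\'':
			inString = !inString
		case '(':
			if !inString {
				if depth == 0 {
					start = i + 1
				}
				depth++
			}
		case ')':
			if !inString {
				depth--
				if depth == 0 {
					tuples = append(tuples, body[start:i])
				}
			}
		}
	}
	return tuples
}

func main() {
	stmt := "INSERT INTO `image` VALUES ('A.png',100,200),('B.jpg',640,480);"
	for _, t := range extractTuples(stmt) {
		fmt.Println(t) // prints each row tuple on its own line
	}
}
```

Each tuple can then be split on commas (respecting quoting) and mapped to the table's columns.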
Mitar
On Thu, Feb 3, 2022 at 9:13 AM Mitar mmitar@gmail.com wrote:
Hi!
I see. Thanks.
Mitar
On Thu, Feb 3, 2022 at 7:17 AM Ariel Glenn WMF ariel@wikimedia.org wrote:
The media/file descriptions contained in the dump are the wikitext of the revisions of pages with the File: prefix, plus the metadata about those pages and revisions (user that made the edit, timestamp of edit, edit comment, and so on).
Width and height of the image, the media type, the SHA1 of the image, and a few other details can be obtained from the image.sql.gz file available for download with the dumps for each wiki. Have a look at https://www.mediawiki.org/wiki/Manual:Image_table for more info.
Hope that helps!
Ariel Glenn
On Wed, Feb 2, 2022 at 10:45 PM Mitar mmitar@gmail.com wrote:
Hi!
I am trying to find a dump of all imageinfo data [1] for all files on Commons. I thought that "Articles, templates, media/file descriptions, and primary meta-pages" XML dump would contain that, given the "media/file descriptions" part, but it seems this is not the case. Is there a dump which contains that information? And what is "media/file descriptions" then? Wiki pages of files?
[1] https://www.mediawiki.org/wiki/API:Imageinfo
Mitar
--
http://mitar.tnode.com/
https://twitter.com/mitar_m
_______________________________________________
Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
To unsubscribe send an email to xmldatadumps-l-leave@lists.wikimedia.org