Hi!
I made this ticket [1] to track regaining access to metadata as a dump.
[1]
Mitar
On Tue, Feb 8, 2022 at 2:32 AM Platonides <platonides(a)gmail.com> wrote:
The metadata used to be included in the image table, but 6 months ago it was moved out
to External Storage. See
https://phabricator.wikimedia.org/T275268#7178983
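The reference that replaced the inline metadata is a small JSON object whose "blobs" values are blob addresses such as tt:609531648 (see the image table example quoted further down). The Go sketch below pulls those IDs out. It assumes, from that single example, that every address is a "tt:" prefix followed by a numeric text table ID. The type and function names are hypothetical and not part of any MediaWiki or dump tooling. The IDs also still cannot be resolved without the text table / External Storage content, which this thread reports is not available as a dump.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// imageMetadata mirrors the JSON shape of the img_metadata value seen in
// this thread. It is based on one example; other rows may look different.
type imageMetadata struct {
	Data  json.RawMessage   `json:"data"`
	Blobs map[string]string `json:"blobs"`
}

// parseTextTableID turns an address like "tt:609531648" into its numeric ID.
// Hypothetical helper: only the "tt:" prefix from the example is handled.
func parseTextTableID(addr string) (int64, error) {
	if !strings.HasPrefix(addr, "tt:") {
		return 0, fmt.Errorf("unsupported blob address %q", addr)
	}
	return strconv.ParseInt(strings.TrimPrefix(addr, "tt:"), 10, 64)
}

// textTableIDs decodes one img_metadata value and maps blob name -> text ID.
func textTableIDs(raw string) (map[string]int64, error) {
	var m imageMetadata
	if err := json.Unmarshal([]byte(raw), &m); err != nil {
		return nil, err
	}
	ids := make(map[string]int64, len(m.Blobs))
	for name, addr := range m.Blobs {
		id, err := parseTextTableID(addr)
		if err != nil {
			return nil, err
		}
		ids[name] = id
	}
	return ids, nil
}

func main() {
	raw := `{"data":[],"blobs":{"data":"tt:609531648","text":"tt:609531649"}}`
	ids, err := textTableIDs(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(ids["data"], ids["text"]) // 609531648 609531649
}
```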
On Fri, 4 Feb 2022 at 20:44, Mitar <mmitar(a)gmail.com> wrote:
>
> Hi!
>
> Will do. Thanks.
>
> After going through the image table dump, it seems not all data is in
> there. For example, the page count for DjVu files is missing. Instead of
> the metadata, the image table dump provides a reference to the text
> table [1]:
>
> {"data":[],"blobs":{"data":"tt:609531648","text":"tt:609531649"}}
>
> But that table itself does not seem to be available as a dump. Or am I
> missing or misunderstanding something?
>
> [1] https://www.mediawiki.org/wiki/Manual:Text_table
>
>
> Mitar
>
> On Fri, Feb 4, 2022 at 6:54 AM Ariel Glenn WMF <ariel(a)wikimedia.org> wrote:
> >
> > This looks great! If you like, you might add the link and a brief description
> > to this page: https://meta.wikimedia.org/wiki/Data_dumps/Other_tools so that
> > more people can find and use the library :-)
> >
> > (Does anyone else have tools they wrote and use that aren't on this list? Please
> > add them!)
> >
> > Ariel
> >
> > On Fri, Feb 4, 2022 at 2:31 AM Mitar <mmitar(a)gmail.com> wrote:
> >>
> >> Hi!
> >>
> >> In case it is useful to anyone else: I have added support for processing
> >> SQL dumps directly, without having to load them into a database, to my Go
> >> library for processing dumps [1]. So one can extract data from SQL dumps
> >> directly, just like from dumps in other formats.
> >>
> >> [1] https://gitlab.com/tozd/go/mediawiki
> >>
> >>
> >> Mitar
> >>
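Processing an SQL dump without a database mostly comes down to splitting each extended INSERT statement into rows. The code below is not the API of the library linked above. It is an independent, minimal Go sketch of that idea. It assumes mysqldump-style input: one "INSERT INTO `t` VALUES (...),(...);" statement per line, single-quoted strings with backslash escapes, and bare numbers or NULL. The function name parseInsertValues is hypothetical, and only a few escapes are decoded.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseInsertValues splits the tuple list of one mysqldump-style
// "INSERT INTO `t` VALUES (...),(...);" statement into rows of raw fields.
// Simplified sketch: quoted strings may use backslash escapes (\n, \t, \0,
// and \x for any other x); unquoted tokens such as numbers and NULL are kept
// as their literal text.
func parseInsertValues(stmt string) ([][]string, error) {
	i := strings.Index(stmt, " VALUES ")
	if i < 0 {
		return nil, errors.New("no VALUES clause")
	}
	s := stmt[i+len(" VALUES "):]
	var rows [][]string
	var row []string
	var field strings.Builder
	inRow, inStr := false, false
	for p := 0; p < len(s); p++ {
		c := s[p]
		// Characters between tuples (",", ";", whitespace) match no case.
		switch {
		case inStr && c == '\\' && p+1 < len(s):
			p++
			switch s[p] {
			case 'n':
				field.WriteByte('\n')
			case 't':
				field.WriteByte('\t')
			case '0':
				field.WriteByte(0)
			default: // \\, \', \" and anything else map to the char itself
				field.WriteByte(s[p])
			}
		case inStr && c == '\'':
			inStr = false
		case inStr:
			field.WriteByte(c)
		case c == '\'':
			inStr = true
		case c == '(' && !inRow:
			inRow, row = true, nil
		case c == ',' && inRow:
			row = append(row, field.String())
			field.Reset()
		case c == ')' && inRow:
			row = append(row, field.String())
			field.Reset()
			rows = append(rows, row)
			inRow = false
		case inRow:
			field.WriteByte(c)
		}
	}
	if inRow || inStr {
		return nil, errors.New("unterminated tuple or string")
	}
	return rows, nil
}

func main() {
	// Made-up sample statement; the columns do not belong to a real table.
	stmt := "INSERT INTO `example` VALUES ('Foo.djvu',1234,'it\\'s'),('Bar.png',5,NULL);"
	rows, err := parseInsertValues(stmt)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(rows), rows[0][2], rows[1][2]) // 2 it's NULL
}
```

A real consumer would still map each field to a column using the CREATE TABLE statement at the top of the dump, and would need to decide how to tell the literal text NULL apart from a quoted 'NULL'.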
> >> On Thu, Feb 3, 2022 at 9:13 AM Mitar <mmitar(a)gmail.com> wrote:
> >> >
> >> > Hi!
> >> >
> >> > I see. Thanks.
> >> >
> >> >
> >> > Mitar
> >> >
> >> > On Thu, Feb 3, 2022 at 7:17 AM Ariel Glenn WMF <ariel(a)wikimedia.org> wrote:
> >> > >
> >> > > The media/file descriptions contained in the dump are the wikitext of the
> >> > > revisions of pages with the File: prefix, plus the metadata about those
> >> > > pages and revisions (user that made the edit, timestamp of edit, edit
> >> > > comment, and so on).
> >> > >
> >> > > Width and height of the image, the media type, the sha1 of the image and
> >> > > a few other details can be obtained from the image.sql.gz file available
> >> > > for download with the dumps for each wiki. Have a look at
> >> > > https://www.mediawiki.org/wiki/Manual:Image_table for more info.
> >> > >
> >> > > Hope that helps!
> >> > >
> >> > > Ariel Glenn
> >> > >
> >> > >
> >> > >
> >> > > On Wed, Feb 2, 2022 at 10:45 PM Mitar <mmitar(a)gmail.com> wrote:
> >> > >>
> >> > >> Hi!
> >> > >>
> >> > >> I am trying to find a dump of all imageinfo data [1] for all files on
> >> > >> Commons. I thought that the "Articles, templates, media/file
> >> > >> descriptions, and primary meta-pages" XML dump would contain that,
> >> > >> given the "media/file descriptions" part, but it seems this is not the
> >> > >> case. Is there a dump which contains that information? And what are
> >> > >> "media/file descriptions" then? Wiki pages of files?
> >> > >>
> >> > >> [1] https://www.mediawiki.org/wiki/API:Imageinfo
> >> > >>
> >> > >>
> >> > >> Mitar
> >> > >>
> >> > >> --
> >> > >> http://mitar.tnode.com/
> >> > >> https://twitter.com/mitar_m
> >> > >> _______________________________________________
> >> > >> Xmldatadumps-l mailing list -- xmldatadumps-l(a)lists.wikimedia.org
> >> > >> To unsubscribe send an email to xmldatadumps-l-leave(a)lists.wikimedia.org