> I'm doing some analysis on the Wikipedia image metadata and seeing some
> missing image rows in the SQL dumps.
>
> I downloaded
> enwiki-latest-image.sql, enwiki-latest-imagelinks.sql,
> and enwiki-latest-oldimage.sql from
>
> http://dumps.wikimedia.org/enwiki/latest/
>
> I picked a page, 25041,
>
> http://en.wikipedia.org/wiki/Special:Export/Lockheed_P-38_Lightning
>
> I get 39 links from
> "select il_to from imagelinks where il_from = 25041"
>
> When I query the image table for these, only 8 of the 39 appear.
> Some of the missing files are 050218-F-1234P-076.jpg, 020930-O-9999G-017.jpg
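The check described above (which il_to values for a page have no row in the image table) can be sketched as an anti-join. This is only a minimal illustration: it uses Python's sqlite3 with a tiny in-memory stand-in rather than the real MySQL dumps, it assumes the image table is keyed by img_name, and "Local_example.jpg" is a made-up file name used purely for the toy data.

```python
import sqlite3

def missing_images(conn, page_id):
    """Return il_to values for page_id that have no matching image.img_name row."""
    rows = conn.execute(
        """
        SELECT il.il_to
        FROM imagelinks AS il
        LEFT JOIN image AS img ON img.img_name = il.il_to
        WHERE il.il_from = ? AND img.img_name IS NULL
        ORDER BY il.il_to
        """,
        (page_id,),
    ).fetchall()
    return [row[0] for row in rows]

# Toy in-memory stand-in for the two dump tables (not real dump rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE imagelinks (il_from INTEGER, il_to TEXT)")
conn.execute("CREATE TABLE image (img_name TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO imagelinks VALUES (?, ?)",
    [
        (25041, "050218-F-1234P-076.jpg"),
        (25041, "020930-O-9999G-017.jpg"),
        (25041, "Local_example.jpg"),  # hypothetical name, present in image
    ],
)
conn.execute("INSERT INTO image VALUES ('Local_example.jpg')")

# Lists the linked files that have no row in the image table.
print(missing_images(conn, 25041))
```

Against the real data the same LEFT JOIN ... IS NULL query should run more or less unchanged in MySQL, giving the full list of linked files without an image row in one pass.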
>
> I grepped the original MySQL dump file for these and got nothing.
>
> I can see the original file here though:
>
> http://en.wikipedia.org/wiki/File:050218-F-1234P-076.jpg
>
> I did a select count and got a total of 849,801 rows, which seems low for
> the total number of Wikipedia images.
>
> Any ideas why I'm getting missing data?
>
> --
> @tommychheng
>
> http://tommy.chheng.com
>