I did something similar recently, on a smaller scale (11K images). I used OpenRefine (here is a workshop tutorial: https://commons.wikimedia.org/wiki/File:Wikidata_Lab_XXXIV_-_OpenRefine_and_Structured_Data_on_Commons.webm) to obtain the wikitext and extract the QIDs from the template we use to track the monuments depicted in WLM Brazil.
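
If you would rather script the extraction and comparison step than do it in OpenRefine, a minimal Python sketch could look like the one below. It assumes the wikitext of each file page has already been fetched (for example through the MediaWiki API), and that the template carries the monument ID as its first positional parameter, as in {{Cultural Heritage Armenia|ID}}; I have not checked the template's actual parameters, so adjust the pattern if they differ. The function names and the sample IDs are made up for illustration.

```python
import re

# Assumption: the monument ID is the first positional parameter of the
# template, e.g. {{Cultural Heritage Armenia|1.2.3}}. A named parameter
# (id=...) would need a different pattern.
TEMPLATE_RE = re.compile(
    r"\{\{\s*Cultural[ _]Heritage[ _]Armenia\s*\|\s*([^|}]+?)\s*(?:\||\}\})",
    re.IGNORECASE,
)


def extract_monument_ids(wikitext):
    """Return the set of monument IDs found in one page's wikitext."""
    return {match.group(1) for match in TEMPLATE_RE.finditer(wikitext)}


def monuments_without_images(all_ids, pages_wikitext):
    """Return the IDs from the full list that no file page refers to."""
    seen = set()
    for text in pages_wikitext:
        seen |= extract_monument_ids(text)
    return set(all_ids) - seen


if __name__ == "__main__":
    # Hypothetical sample data, not real monument IDs.
    pages = [
        "== Summary ==\n{{Cultural Heritage Armenia|1.1.1}}",
        "{{cultural_Heritage_Armenia | 2.2.2 |extra}}",
        "A page that does not use the template.",
    ]
    full_list = ["1.1.1", "2.2.2", "3.3.3"]
    print(sorted(monuments_without_images(full_list, pages)))
```

Keeping the parsing separate from the fetching means the same two functions work whether the wikitext comes from OpenRefine exports, a dump, or API calls.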

From: Aleksey Chalabyan <xelgen.am@gmail.com>
Sent: Tuesday, October 4, 2022 14:31
To: wikilovesmonuments@lists.wikimedia.org <wikilovesmonuments@lists.wikimedia.org>
Subject: [Wiki Loves Monuments] Getting list of monuments with no images on Commons?

Hi all,

We're holding WLM in Armenia and we wanted to get a list of monuments with no photos uploaded to Commons.

We have the lists on wikis as tables, and I thought of using images linking to [[Template:Cultural Heritage Armenia]] as an indicator that a monument has photos. The issue is that I'll need to look into the wikitext of 140,000 image pages uploaded in the past, parse out the monument IDs, and then compare them to the full list of monuments.

Before spraying some WD40 over my rusted MediaWiki API kungfu, I wanted to check if anyone has done something like that or knows of tools for this, so I don't reinvent the wheel.

Would appreciate your thoughts on this.

Best,

Aleksey