I did something similar recently, on a smaller scale (11K images). I used OpenRefine (here is a workshop tutorial:
https://commons.wikimedia.org/wiki/File:Wikidata_Lab_XXXIV_-_OpenRefine_and_Structured_Data_on_Commons.webm)
to obtain the wikitext and extract the QIDs from the template we use to track the monuments depicted in WLM Brazil.
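
For anyone who would rather script the extraction step once the wikitext is in hand, here is a minimal Python sketch. It is not what I ran in OpenRefine, just an illustration of the same idea. The template name below is a placeholder (an assumption), and the function name extract_qids is made up for this example. It also assumes the QID is passed as a plain template parameter with no nested templates inside the call.

```python
import re

# Placeholder template name (an assumption); substitute the tracking
# template your project actually uses.
TEMPLATE_NAME = "Monument tracking template"


def extract_qids(wikitext, template_name=TEMPLATE_NAME):
    """Return the QIDs found in the parameters of each call to template_name."""
    # Match {{Template name|...}}; calls containing nested templates are skipped.
    pattern = re.compile(
        r"\{\{\s*" + re.escape(template_name) + r"\s*\|([^{}]*)\}\}",
        re.IGNORECASE,
    )
    qids = []
    for params in pattern.findall(wikitext):
        # A QID is the letter Q followed by a positive integer.
        qids.extend(re.findall(r"\bQ[1-9]\d*\b", params))
    return qids
```

Calling extract_qids on the wikitext of each file page gives a list of QIDs, which can then be matched against Wikidata or added as structured data.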