On Thu, Dec 11, 2014 at 5:41 PM, Ricordisamoa <ricordisamoa@openmailbox.org> wrote:
As far as I understand, the information Guillaume is talking about is exactly what is scraped by CommonsMetadata. See https://tools.wmflabs.org/mrmetadata/how_it_works.html: «The script needs to go through all file description pages of a wiki, and check for machine-readable metadata by querying the CommonsMetadata extension.»
That's correct: the whole purpose of the cleanup drive is to make sure that there's something scrapeable to begin with, i.e., to eliminate the cases where you get nothing useful back from the CommonsMetadata extension. This sets the stage for potential further work along the lines of https://commons.wikimedia.org/wiki/Commons:Structured_data -- which is pretty meaty and complex work in its own right.
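For anyone who wants to poke at this themselves: CommonsMetadata exposes its output through the regular MediaWiki API as prop=imageinfo with iiprop=extmetadata. A minimal sketch of building such a query (the file title here is just an illustrative example, not one from the cleanup drive):

```python
# Sketch: the kind of API query the mrmetadata script issues to fetch
# CommonsMetadata output for a file description page.
from urllib.parse import urlencode

def extmetadata_url(title, api="https://commons.wikimedia.org/w/api.php"):
    """Return the API URL that asks CommonsMetadata for a file's metadata."""
    params = {
        "action": "query",
        "prop": "imageinfo",
        "iiprop": "extmetadata",  # this property is served by CommonsMetadata
        "titles": title,
        "format": "json",
    }
    return api + "?" + urlencode(params)

# Example (illustrative file title):
print(extmetadata_url("File:Example.jpg"))
```

A file where the extmetadata map comes back empty (or missing fields like Artist and LicenseShortName) is exactly the kind of case the cleanup drive targets.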
Erik