Hello all,
As you may know, in May 2020 we released new data on automatically found references, along with a game that you can use to associate references with statements. The game contains 4,200 potential references (see statistics). Since then, we have parsed many more websites and collected 529K new potential references.
These new references will not be added to the game, because there are too many of them to check their relevance by hand. As some of you requested after the previous announcement, we have published the full list of references as a dump, available here: https://analytics.wikimedia.org/published/datasets/periodic/wikidata-potential-references/ .
Subsets of this dump can be reused by bots and tools. However, we advise you to be careful when using it and not to mass-import the references into Wikidata without careful review: the data is quite raw, and some references may be wrong or irrelevant. To help you analyze these references and identify the most useful ones, we are also providing a dashboard, https://wmdeanalytics.wmflabs.org/WD_GameReferenceHunt , with an overview of the judgements made in the game, so you can see which parts of the data are more likely to be of higher or lower quality.
We’re happy to release the dump and the dashboard just in time for Wikidata's birthday https://www.wikidata.org/wiki/Wikidata:Eighth_Birthday/Presents :)
If you have any questions or encounter issues with the dump or the dashboard, please let us know on the talk page https://www.wikidata.org/wiki/Wikidata_talk:Automated_finding_references_input .
Cheers,