Would it be a good approach to make a list of items that are clearly potential main topics (e.g. all diseases, all materials...) and create a bot that adds these items as main topics if they appear in an article's title?
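A minimal sketch of the title-matching step such a bot might use, in Python. It assumes a hypothetical dictionary mapping topic item IDs to labels (the IDs are placeholders, not real Q-ids). It only proposes candidates; writing the P921 statements to Wikidata is left out.

```python
import re

# Hypothetical topic list: item ID -> English label. A real bot would build
# this from Wikidata (e.g. all disease items); these IDs are placeholders.
TOPICS = {
    "Q_MALARIA": "malaria",
    "Q_DENGUE": "dengue fever",
    "Q_TB": "tuberculosis",
}


def suggest_main_subjects(title, topics=TOPICS, existing=()):
    """Return IDs of topics whose label occurs as a whole phrase in title.

    Matching is case-insensitive and anchored on word boundaries, so
    "malaria" matches "Malaria in children" but not "Antimalarial drugs".
    IDs already in `existing` (current main subjects) are skipped.
    """
    found = []
    for qid, label in topics.items():
        if qid in existing:
            continue
        pattern = r"\b" + re.escape(label) + r"\b"
        if re.search(pattern, title, flags=re.IGNORECASE):
            found.append(qid)
    return found


print(suggest_main_subjects("Dengue fever and malaria co-infection in Brazil"))
# ['Q_MALARIA', 'Q_DENGUE']
```

Word-boundary matching avoids hits inside longer words. Still, a label appearing in a title does not guarantee it is the article's main subject, so the suggestions would probably need review before being added.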


On Thu, 16 Aug 2018 at 12:29, Charles Matthews <charles.r.matthews@ntlworld.com> wrote:

It is now possible to participate in the ScienceSource project by adding "main subject" statements.

The ScienceSource focus list has been up for a little while now at


which has the shortcut WD:SSFL. Wikidata items about biomedical articles can be added to the list using P5008, as explained on the page, which also links to other expository material. The original grant page at 


gives an overview of the project's aims.

The Listeria-generated page


linked from the focus list page shows which items on the list (currently around 3K; see the SPARQL query on the talk page WT:SSFL) lack a main subject (P921) statement.
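The check that page performs amounts to filtering the focus-list items for a missing P921 statement. A minimal offline sketch in Python, assuming the items have already been fetched in Wikidata's JSON entity format, where statements sit under "claims" keyed by property ID. The item IDs are placeholders, and this is not Listeria's own code.

```python
def lacking_main_subject(entities):
    """Return IDs of entities with no main subject (P921) statement."""
    return [e["id"] for e in entities if not e.get("claims", {}).get("P921")]


# Placeholder items, trimmed down to the parts the filter looks at.
focus_list = [
    {"id": "Q_ARTICLE_1", "claims": {"P921": [{"mainsnak": {"property": "P921"}}]}},
    {"id": "Q_ARTICLE_2", "claims": {"P1476": [{"mainsnak": {"property": "P1476"}}]}},
    {"id": "Q_ARTICLE_3", "claims": {}},
]

print(lacking_main_subject(focus_list))  # ['Q_ARTICLE_2', 'Q_ARTICLE_3']
```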

At Wikimania I clarified that the focus list is supposed to be better "balanced" than the selection of articles represented on Wikidata as a whole. This is a big issue, but I don't think anyone seriously disputes that the existing literature pays more attention to the diseases of prosperous people and prosperous countries. On a straight utilitarian argument about "the greatest good of the greatest number", that is a problem.

Therefore, the composition of the focus list should not be a proportionate reflection, by topic, of the 17.5M articles represented in Wikidata. As a first step we aim to put about 0.2% of those articles on the list, bringing it up to about 40K. The Listeria page is a sortable table, and if you sort by "published in" you'll see plenty from PLOS Neglected Tropical Diseases; thanks to Daniel Mietchen for adding a collection of well-cited papers from there.

Later on there should be other lists by topic area, so that we can get an idea of balance. Once main subject statements build up (type of disease being the most important area), SPARQL aggregate queries can reveal their distribution. This bubble chart query


gives a baseline, showing that currently the list's subjects are dominated by infectious diseases.
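The aggregation behind such a chart is a count of articles per subject category. A minimal offline sketch of the same GROUP BY / COUNT idea in Python, assuming the (article, category) pairs have already been retrieved. The pairs below are made-up toy input, not the actual focus-list figures.

```python
from collections import Counter

# Toy input: (article item, disease category of its main subject).
# IDs and categories are illustrative placeholders only.
pairs = [
    ("Q_ARTICLE_1", "infectious disease"),
    ("Q_ARTICLE_2", "infectious disease"),
    ("Q_ARTICLE_3", "cardiovascular disease"),
    ("Q_ARTICLE_4", "infectious disease"),
]


def subject_distribution(pairs):
    """Count pairs per category and return (category, count, share) rows,
    largest first, like GROUP BY ... COUNT(...) ORDER BY DESC in SPARQL."""
    counts = Counter(category for _, category in pairs)
    total = sum(counts.values())
    return [(cat, n, n / total) for cat, n in counts.most_common()]


for cat, n, share in subject_distribution(pairs):
    print(f"{cat}: {n} ({share:.0%})")
# infectious disease: 3 (75%)
# cardiovascular disease: 1 (25%)
```

An article with several main subjects in the same category would be counted once per pair here; counting distinct articles instead would need deduplication first.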

Where next? In the coming weeks, the ScienceSource wiki at http://sciencesource.wmflabs.org/ will be developed. Text-mining and annotation there will be the next phase. Downloading papers to the wiki will depend on accumulating metadata on their Wikidata items.


Wikidata mailing list