Hi

Would it be a good approach to make a list of items that are clearly potential main topics (e.g. all diseases, all materials...) and create a bot that adds these items as main subjects when they appear in the title?
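
To make the idea concrete, here is a minimal sketch of the matching step only, assuming the candidate topics are available as a label-to-QID mapping. The function name, the mapping and the placeholder QIDs are hypothetical; a real bot would also need aliases, disambiguation and human review before writing any P921 statement.

```python
import re


def match_main_subjects(title, candidates):
    """Return the QIDs of candidate topics whose label occurs in the title.

    `candidates` maps a label (e.g. "malaria") to a QID string. Labels are
    matched case-insensitively and only as whole words or phrases, so that
    "malaria" does not match inside "antimalarial".
    """
    matches = []
    for label, qid in candidates.items():
        # Lookarounds reject matches embedded in a longer word.
        pattern = r"(?<!\w)" + re.escape(label) + r"(?!\w)"
        if re.search(pattern, title, flags=re.IGNORECASE):
            matches.append(qid)
    return matches


# Placeholder QIDs for illustration only; replace with real Wikidata items.
candidates = {"malaria": "Q_MALARIA", "tuberculosis": "Q_TUBERCULOSIS"}
print(match_main_subjects("Malaria control in rural Kenya", candidates))
```

Whole-word matching is a deliberate choice here: a plain substring test would produce many false positives on short labels.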

Regards



On Thu, 16 Aug 2018 at 12:29, Charles Matthews <charles.r.matthews@ntlworld.com> wrote:

It is now possible to participate in the ScienceSource project, by adding "main subject" statements.

The ScienceSource focus list has been up for a little while now at

https://www.wikidata.org/wiki/Wikidata:ScienceSource_focus_list

which has the shortcut WD:SSFL. Wikidata items about biomedical articles can be added to the list as explained on the page, using P5008. That page has other links to expository material. The original grant page at 

https://meta.wikimedia.org/wiki/Grants:Project/ScienceSource

gives an overview of the project's aims.

The Listeria-generated page

https://www.wikidata.org/wiki/Wikidata:ScienceSource_focus_list/Main_subject_needed

linked from the focus list page shows which of the items on the list (which is around 3K now, see the SPARQL query on the talk page WT:SSFL) lack a main subject (P921) statement.
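
As an illustration of the kind of query involved, here is a rough sketch that builds a SPARQL string selecting focus-list items (P5008) with no main subject (P921). It is not the query used on the Listeria page or on WT:SSFL; the function name is hypothetical, and the QID of the ScienceSource project item is left as a parameter because it is not given here.

```python
def build_missing_subject_query(project_qid, limit=100):
    """Build a SPARQL query for focus-list items lacking a P921 statement.

    `project_qid` is the QID of the project item used as the value of P5008;
    it is an assumption supplied by the caller, not a value from this thread.
    """
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        f"  ?item wdt:P5008 wd:{project_qid} .\n"
        # Keep only items that have no main subject statement at all.
        "  FILTER NOT EXISTS { ?item wdt:P921 ?subject . }\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


# "Q_PROJECT" is a placeholder; substitute the real project item before use.
print(build_missing_subject_query("Q_PROJECT", limit=10))
```

The resulting string can be pasted into the Wikidata Query Service once the placeholder is replaced.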

At Wikimania I made the clarification that the focus list is supposed to be better "balanced" than the selection of articles represented on Wikidata as a whole. This is a big issue, but I don't think it is really disputed that the existing literature is more interested in the diseases of prosperous people and prosperous countries. On a straight utilitarian argument about the "greatest good of the greatest number", there is a problem.

Therefore, the composition of the focus list should not be a proportionate reflection of the 17.5M articles represented in Wikidata, by topic. We are looking first to include about 0.2% of articles on the list, bringing it up to about 40K. The Listeria page is a sortable table, and if you sort by "published in" you'll see plenty from PLOS Neglected Tropical Diseases - thanks to Daniel Mietchen for adding a collection of well-cited papers from there.

Later on there should be other lists by topic area, so we can get an idea of balance. Once main subjects (where type of disease is the most important area) build up, SPARQL aggregates can reveal distribution. This bubble chart query

https://tinyurl.com/y89s6nlc

gives a baseline, showing that currently the list's subjects are dominated by infectious diseases.
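
For readers who want to try an aggregate themselves, here is a minimal sketch of a query that counts focus-list items per main subject. It is not the bubble chart query linked above, whose contents are not reproduced here; the function name and the project QID parameter are assumptions.

```python
def build_subject_distribution_query(project_qid):
    """Build a SPARQL query counting focus-list items per main subject.

    `project_qid` is the (assumed) QID of the project item used with P5008.
    """
    return (
        "SELECT ?subject ?subjectLabel (COUNT(?item) AS ?count) WHERE {\n"
        f"  ?item wdt:P5008 wd:{project_qid} ;\n"
        "        wdt:P921 ?subject .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}\n"
        # One row per subject, most frequent subjects first.
        "GROUP BY ?subject ?subjectLabel\n"
        "ORDER BY DESC(?count)"
    )


# "Q_PROJECT" is a placeholder; substitute the real project item before use.
print(build_subject_distribution_query("Q_PROJECT"))
```

An item with several main subjects is counted once under each of them, so the counts can sum to more than the number of items on the list.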

Where next? In the coming weeks, the ScienceSource wiki at http://sciencesource.wmflabs.org/ will be developed. Text-mining and annotation there will be the next phase. Downloading of papers to the wiki will depend on accumulating metadata on their Wikidata items.

Charles


_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
--
Thibaut DEVERAUX
+33 (0)6 75 51 20 80