I round up from DOI/PubMed ID counts on https://tools.wmflabs.org/scholia/
Egon
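Editorial sketch of the arithmetic implied above: once a rounded-up total is known, the number of batches is the ceiling of total divided by batch size. The function name and the figures below are illustrative assumptions, not values taken from Scholia or from Egon's script.

```python
import math


def batch_offsets(total_articles, batch_size):
    """Return the OFFSET value of each batch needed to cover total_articles."""
    n_batches = math.ceil(total_articles / batch_size)
    return [i * batch_size for i in range(n_batches)]


# Hypothetical numbers for illustration only; a real total would be
# rounded up from the counts shown on Scholia.
offsets = batch_offsets(total_articles=1_050_000, batch_size=500_000)
print(offsets)  # [0, 500000, 1000000]
```

Overestimating the total only adds a few empty batches at the end, which is why rounding up is the safe direction.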
On Sat, Dec 15, 2018 at 3:03 PM Fabrizio Carrai fabrizio.carrai@gmail.com wrote:
Excellent. I ran some tests, and after a few cycles I had already identified and classified several articles. I will look at your script in the next few days, but I already have a question: the number of iterations depends on the total number of articles, so how do you know that number?
Fabrizio
On Sat, Dec 15, 2018 at 10:18 AM Egon Willighagen <egon.willighagen@gmail.com> wrote:
The approach I use is the following, see this (Bioclipse/Groovy) script: https://gist.github.com/egonw/ca4c348b9a2d1116efcdb55fa85dd158
It takes advantage of a combination of a Blazegraph SPARQL trick and breaking things up into batches of a certain size:

SELECT ?art ?artLabel
WITH {
  SELECT ?art WHERE { ?art wdt:P31 wd:Q13442814 }
  LIMIT $batchSize OFFSET $offset
} AS %RESULTS
{
  INCLUDE %RESULTS
  ?art wdt:P1476 ?artLabel .
  MINUS { ?art wdt:P921 wd:$conceptQ }
  FILTER (contains(lcase(str(?artLabel)), "$concept"))
}

where "$concept" is my search word in the title, and $batchSize and $offset take care of the batching by the script. This script creates QuickStatements.
Mind you, I manually check the created statements, because in my domain (biochem) a simple search results in false positives, hence the "blacklist" in the script :)
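Editorial sketch of the batching idea described above, in Python rather than the Bioclipse/Groovy of the linked gist. It only builds the query text for one batch and filters result rows against a blacklist; it does not contact the query service. The function names, the sample rows, and the blacklist terms are assumptions made for illustration and are not taken from Egon's script.

```python
from string import Template

# Same query as in the message; string.Template happens to use the
# same $name placeholder syntax.
QUERY = Template("""SELECT ?art ?artLabel
WITH {
  SELECT ?art WHERE { ?art wdt:P31 wd:Q13442814 }
  LIMIT $batchSize OFFSET $offset
} AS %RESULTS
{
  INCLUDE %RESULTS
  ?art wdt:P1476 ?artLabel .
  MINUS { ?art wdt:P921 wd:$conceptQ }
  FILTER (contains(lcase(str(?artLabel)), "$concept"))
}""")


def build_query(concept, concept_q, batch_size, offset):
    """Fill in the placeholders for one batch."""
    return QUERY.substitute(
        concept=concept.lower(),  # the FILTER compares against lcase(title)
        conceptQ=concept_q,
        batchSize=batch_size,
        offset=offset,
    )


def drop_false_positives(rows, blacklist):
    """Keep (qid, title) rows whose title contains no blacklisted term."""
    lowered = [term.lower() for term in blacklist]
    return [
        (qid, title)
        for qid, title in rows
        if not any(term in title.lower() for term in lowered)
    ]


# Hypothetical rows and blacklist, for illustration only.
rows = [("Q_A", "Microgravity effects on bone"),
        ("Q_B", "Simulated microgravity in silico")]
kept = drop_false_positives(rows, blacklist=["simulated"])
print(build_query("microgravity", "Q48655", 1000, 2000))
print(kept)
```

The blacklist only reduces the manual checking; it does not replace it.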
Egon
On Sat, Dec 15, 2018 at 10:13 AM Fabrizio Carrai < fabrizio.carrai@gmail.com> wrote:
Thanks Matthias, that's a pity. Your suggestion relies on the items being well characterized, which, at the time of writing, is rarely the case for my topic of interest. Would it be an idea to download all the "scholarly articles", select locally for the keyword of interest (e.g. "microgravity"), and set property P921 on all of them? QuickStatements may help with the last step; any suggestions for other tools?
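Editorial sketch of the idea proposed above: select titles locally by keyword and emit one QuickStatements command per match. It assumes the articles are already available as (QID, title) pairs and uses the pipe-separated V1 command form `item|property|value`. The article QIDs and titles are placeholders; Q48655 is the main-subject value that appears in Matthias's query in this thread.

```python
def quickstatements_for(articles, keyword, subject_qid):
    """Return one 'QID|P921|subject' line per article whose title contains keyword."""
    needle = keyword.lower()
    return [
        f"{qid}|P921|{subject_qid}"
        for qid, title in articles
        if needle in title.lower()
    ]


# Placeholder items and titles, for illustration only.
articles = [
    ("Q_A", "Plant growth under Microgravity conditions"),
    ("Q_B", "A survey of soil bacteria"),
]
commands = quickstatements_for(articles, "microgravity", "Q48655")
print("\n".join(commands))  # Q_A|P921|Q48655
```

A plain substring match will also catch titles that merely mention the keyword, so the generated commands should still be reviewed before they are submitted.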
Thanks Fabrizio
On Fri, Dec 14, 2018 at 10:16 PM Matthias Erfurth <erfurth@gmx.de> wrote:
Hi Fabrizio, unfortunately you can't full-text search all the scholarly articles (https://www.wikidata.org/wiki/Q13442814). It is better to work with indexed properties, so that you can query for other articles with microgravity as main subject ... With the ajax based wikidata search
SELECT ?item WHERE { ?item wdt:P31 wd:Q13442814; wdt:P921 wd:Q48655. }
Best regards,
ciao matthias
*Sent:* Friday, December 14, 2018 at 18:55 *From:* "Fabrizio Carrai" fabrizio.carrai@gmail.com *To:* "Discussion list for the Wikidata project" <wikidata@lists.wikimedia.org> *Subject:* Re: [Wikidata] Query on scholarly article fails
Thanks again to Ettore, but I immediately ran into another timeout when I added a FILTER to find all the articles with the word "biokis" in the title:

SELECT ?istanza_di ?instanza_diLabel WHERE {
  ?istanza_di wdt:P31 wd:Q13442814.
  ?istanza_di rdfs:label ?instanza_diLabel.
  FILTER((LANG(?instanza_diLabel)) = "en").
  FILTER(CONTAINS(LCASE(?instanza_diLabel), "biokis"))
}
LIMIT 100

At least one article should be returned: https://www.wikidata.org/wiki/Q57202937 but I got a timeout.
Thanks to anybody that can help
Fabrizio
On Fri, Dec 14, 2018 at 10:12 AM Ettore RIZZA <ettorerizza@gmail.com> wrote:
Hello Fabrizio,
It seems that the problem comes from SERVICE wikibase:label. As said in another discussion, the query executes in less than one second if you rewrite it in this way https://query.wikidata.org/#SELECT%20%3Fistanza_di%20%3Finstanza_diLabel%20WHERE%20%7B%0A%20%20%3Fistanza_di%20wdt%3AP31%20wd%3AQ13442814.%0A%20%20%3Fistanza_di%20rdfs%3Alabel%20%3Finstanza_diLabel.%0A%20%20FILTER%28%28LANG%28%3Finstanza_diLabel%29%29%20%3D%20%22en%22%29%0A%7D%0ALIMIT%2010 .
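Editorial sketch: the rewritten query is percent-encoded in the fragment of the link above. Decoding it with Python's standard library shows the query text without opening the link. The variable names are illustrative; the URL is copied from the message.

```python
from urllib.parse import unquote

# Link from the message, split across lines for readability.
URL = (
    "https://query.wikidata.org/#SELECT%20%3Fistanza_di%20%3Finstanza_diLabel"
    "%20WHERE%20%7B%0A%20%20%3Fistanza_di%20wdt%3AP31%20wd%3AQ13442814.%0A"
    "%20%20%3Fistanza_di%20rdfs%3Alabel%20%3Finstanza_diLabel.%0A"
    "%20%20FILTER%28%28LANG%28%3Finstanza_diLabel%29%29%20%3D%20%22en%22%29"
    "%0A%7D%0ALIMIT%2010"
)

# Everything after '#' is the percent-encoded SPARQL query.
query = unquote(URL.split("#", 1)[1])
print(query)
```

The decoded query asks for rdfs:label directly and keeps only English labels with a FILTER, instead of calling SERVICE wikibase:label.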
Cheers,
Ettore Rizza
On Fri, Dec 14, 2018 at 9:59 AM Fabrizio Carrai <fabrizio.carrai@gmail.com> wrote:
Hello all, the following query ends with a timeout:
SELECT ?istanza_di ?istanza_diLabel WHERE { SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". } ?istanza_di wdt:P31 wd:Q13442814. } LIMIT 10
Can anybody explain why ? Thanks in advance
-- *Fabrizio* _______________________________________________ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
-- Hi, do you like citation networks? Already 51% of all citations are available (https://i4oc.org/) for innovative new uses (https://twitter.com/hashtag/acs2ioc). Join me in asking the American Chemical Society to join the Initiative for Open Citations too: https://www.change.org/p/asking-the-american-chemical-society-to-join-the-initiative-for-open-citations. SpringerNature, the RSC and many others already did (https://i4oc.org/#publishers).
E.L. Willighagen Department of Bioinformatics - BiGCaT Maastricht University (http://www.bigcat.unimaas.nl/) Homepage: http://egonw.github.com/ Blog: http://chem-bla-ics.blogspot.com/ PubList: https://www.zotero.org/egonw ORCID: 0000-0001-7542-0286 http://orcid.org/0000-0001-7542-0286 ImpactStory: https://impactstory.org/u/egonwillighagen