Excellent. I ran some tests and, after a few cycles, I have already
identified and classified several articles.
I will have a look at your script in the next few days, but I already have a
question: the number of iterations is based on the total number of
articles, so how do you know that number?
---
Fabrizio
On Sat, 15 Dec 2018 at 10:18, Egon Willighagen <
egon.willighagen(a)gmail.com> wrote:
The approach I use is the following, see this (Bioclipse/Groovy) script:
https://gist.github.com/egonw/ca4c348b9a2d1116efcdb55fa85dd158
It combines a Blazegraph SPARQL trick (a named subquery) with breaking
things up into batches of a certain size:
    SELECT ?art ?artLabel
    WITH {
      SELECT ?art WHERE {
        ?art wdt:P31 wd:Q13442814
      } LIMIT $batchSize OFFSET $offset
    } AS %RESULTS {
      INCLUDE %RESULTS
      ?art wdt:P1476 ?artLabel .
      MINUS { ?art wdt:P921 wd:$conceptQ }
      FILTER (contains(lcase(str(?artLabel)), "$concept"))
    }
where "$concept" is my search word in the title, and $batchSize and
$offset are set by the script to take care of the batching. The script
creates QuickStatements.
Mind you, I manually check the created statements, because in my domain
(biochem) a simple search results in false positives, hence the "blacklist"
in the script :)
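
For readers who want to try the batching without Bioclipse, here is a
minimal Python sketch of the idea described above. It is not the Groovy
script from the gist: it only fills the $batchSize, $offset, $conceptQ and
$concept placeholders of the query for each batch. The function name
batch_queries, its parameters and the article total are assumptions made for
illustration. The total is taken as already known (for example from a
separate count), and Q48655 is the item used for microgravity elsewhere in
this thread.

```python
import math
from string import Template

# The query from the message above; the $-placeholders are filled per batch.
QUERY = Template("""\
SELECT ?art ?artLabel
WITH {
  SELECT ?art WHERE {
    ?art wdt:P31 wd:Q13442814
  } LIMIT $batchSize OFFSET $offset
} AS %RESULTS {
  INCLUDE %RESULTS
  ?art wdt:P1476 ?artLabel .
  MINUS { ?art wdt:P921 wd:$conceptQ }
  FILTER (contains(lcase(str(?artLabel)), "$concept"))
}
""")


def batch_queries(total_articles, batch_size, concept, concept_q):
    """Yield one SPARQL query string per batch (hypothetical helper)."""
    # Enough rounds to cover every article once, rounding up.
    rounds = math.ceil(total_articles / batch_size)
    for i in range(rounds):
        yield QUERY.substitute(
            batchSize=batch_size,
            offset=i * batch_size,
            concept=concept.lower(),  # the query lower-cases the title too
            conceptQ=concept_q,
        )


# Made-up total and batch size, only to show how the offsets advance.
queries = list(batch_queries(250_000, 100_000, "microgravity", "Q48655"))
print(len(queries))
```

Each string can then be sent to the query service in turn. A batch that
returns no rows does not mean the end has been reached, since the title
filter may simply match nothing in that slice of articles; that is why the
sketch fixes the number of rounds up front.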
Egon
On Sat, Dec 15, 2018 at 10:13 AM Fabrizio Carrai <
fabrizio.carrai(a)gmail.com> wrote:
Thanks Matthias,
that's a pity. Your suggestion relies on the items already being well
characterized, which, at the time of writing, is rarely the case in my area
of interest.
Would it be an idea to download all the "scholarly articles", select
locally by the keyword of interest (e.g. "microgravity") and set the
property P921 for all of them? QuickStatements may be helpful for the last
step; any suggestions for other tools?
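
The last two steps of this idea could look roughly like the following
Python sketch, assuming the article IDs and titles have already been
downloaded into a list. The function name quickstatements_for, the sample
IDs and the sample titles are made up for illustration. The output uses the
tab-separated QuickStatements format (item, property, value), and Q48655 is
the item used for microgravity elsewhere in this thread.

```python
def quickstatements_for(articles, keyword, subject_q):
    """Return one 'item<TAB>P921<TAB>subject' line per matching title.

    articles is a list of (item_id, title) pairs; hypothetical helper.
    """
    keyword = keyword.lower()
    lines = []
    for item_id, title in articles:
        # Case-insensitive substring match on the title.
        if keyword in title.lower():
            lines.append(f"{item_id}\tP921\t{subject_q}")
    return lines


# Made-up placeholder IDs and titles, not real Wikidata content.
sample = [
    ("Q900000001", "Plant growth under Microgravity conditions"),
    ("Q900000002", "A survey of graph databases"),
]
statements = quickstatements_for(sample, "microgravity", "Q48655")
print("\n".join(statements))
```

A plain substring match like this can produce false positives, so the
generated lines are best reviewed by hand before they are submitted.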
Thanks
Fabrizio
On Fri, 14 Dec 2018 at 22:16, Matthias Erfurth <erfurth(a)gmx.de>
wrote:
Hi Fabrizio,
unfortunately you can't full-text search all the scholarly articles
<https://www.wikidata.org/wiki/Q13442814>; you are better off working with
indexed properties, so
you can query for other articles with microgravity as main subject ...
With the ajax based wikidata search
    SELECT ?item
    WHERE {
      ?item wdt:P31 wd:Q13442814;
            wdt:P921 wd:Q48655.
    }
Best regards,
ciao matthias
*Sent:* Friday, 14 December 2018 at 18:55
*From:* "Fabrizio Carrai" <fabrizio.carrai(a)gmail.com>
*To:* "Discussion list for the Wikidata project" <
wikidata(a)lists.wikimedia.org>
*Subject:* Re: [Wikidata] Query on scholarly article fails
Thanks again to Ettore, but I immediately ran into another timeout
when I added a FILTER to find all the articles with the word "biokis"
in the title:
    SELECT ?istanza_di ?instanza_diLabel WHERE {
      ?istanza_di wdt:P31 wd:Q13442814.
      ?istanza_di rdfs:label ?instanza_diLabel.
      FILTER((LANG(?instanza_diLabel)) = "en").
      FILTER(CONTAINS(LCASE(?instanza_diLabel), "biokis"))
    }
    LIMIT 100
At least one article should be returned:
https://www.wikidata.org/wiki/Q57202937
but I got a timeout.
Thanks to anybody who can help
Fabrizio
On Fri, 14 Dec 2018 at 10:12, Ettore RIZZA <
ettorerizza(a)gmail.com> wrote:
Hello Fabrizio,
It seems that the problem comes from SERVICE wikibase:label. As mentioned in
another discussion, the query executes in less than one second if you
rewrite it this way:
<https://query.wikidata.org/#SELECT%20%3Fistanza_di%20%3Finstanza_diLabel%20WHERE%20%7B%0A%20%20%3Fistanza_di%20wdt%3AP31%20wd%3AQ13442814.%0A%20%20%3Fistanza_di%20rdfs%3Alabel%20%3Finstanza_diLabel.%0A%20%20FILTER%28%28LANG%28%3Finstanza_diLabel%29%29%20%3D%20%22en%22%29%0A%7D%0ALIMIT%2010>
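
The rewritten query is hidden inside the percent-encoded link above. As a
small convenience, this Python sketch decodes the part after the "#" with
the standard library so the query can be read directly; nothing is sent to
the query service, and the variable names are only illustrative.

```python
from urllib.parse import unquote

# The link from the message above, split over several lines for readability.
link = (
    "https://query.wikidata.org/#SELECT%20%3Fistanza_di%20%3Finstanza_diLabel"
    "%20WHERE%20%7B%0A"
    "%20%20%3Fistanza_di%20wdt%3AP31%20wd%3AQ13442814.%0A"
    "%20%20%3Fistanza_di%20rdfs%3Alabel%20%3Finstanza_diLabel.%0A"
    "%20%20FILTER%28%28LANG%28%3Finstanza_diLabel%29%29%20%3D%20%22en%22%29%0A"
    "%7D%0ALIMIT%2010"
)

# Everything after '#' is the percent-encoded SPARQL text.
query = unquote(link.split("#", 1)[1])
print(query)
```

The decoded text reads the label through rdfs:label with a language filter
on "en" and no longer contains the SERVICE wikibase:label block.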
Cheers,
Ettore Rizza
On Fri, 14 Dec 2018 at 09:59, Fabrizio Carrai <
fabrizio.carrai(a)gmail.com> wrote:
> Hello all,
> the following query ends with a timeout:
>
> SELECT ?istanza_di ?istanza_diLabel WHERE {
>   SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
>   ?istanza_di wdt:P31 wd:Q13442814.
> }
> LIMIT 10
>
> Can anybody explain why?
> Thanks in advance
>
> --
> *Fabrizio*
> _______________________________________________
> Wikidata mailing list
> Wikidata(a)lists.wikimedia.org
>
https://lists.wikimedia.org/mailman/listinfo/wikidata
--
Hi, do you like citation networks? Already 51% of all citations are
available <https://i4oc.org/> for innovative new uses
<https://twitter.com/hashtag/acs2ioc>. Join me in asking the American
Chemical Society to join the Initiative for Open Citations too
<https://www.change.org/p/asking-the-american-chemical-society-to-join-the-initiative-for-open-citations>.
SpringerNature,
the RSC and many others already did <https://i4oc.org/#publishers>.
-----
E.L. Willighagen
Department of Bioinformatics - BiGCaT
Maastricht University (
http://www.bigcat.unimaas.nl/)
Homepage:
http://egonw.github.com/
Blog:
http://chem-bla-ics.blogspot.com/
PubList:
https://www.zotero.org/egonw
ORCID: 0000-0001-7542-0286 <http://orcid.org/0000-0001-7542-0286>
ImpactStory:
https://impactstory.org/u/egonwillighagen