Thank you. This will give me the bios; however, I still want the associated Wikipedia links. Previously, someone gave me a query that included the English Wikipedia link along with another property. You can see it below:

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>

SELECT ?item ?twitter ?article WHERE {
?item wdt:P2002 ?twitter
OPTIONAL {?item rdfs:label ?item_label filter (lang(?item_label) = "en") .}
?article schema:about ?item .
?article schema:inLanguage "en" .
FILTER (SUBSTR(str(?article), 1, 25) = "https://en.wikipedia.org/")
}
ORDER BY ASC (?article)

I tried taking the PREFIX header and the following portion and appending them to some of your queries:

?article schema:about ?item .
?article schema:inLanguage "en" .
FILTER (SUBSTR(str(?article), 1, 25) = "https://en.wikipedia.org/")

As a test, I tried the first one, which seems to cover only one record, but it gave me an error:

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>

SELECT *
WHERE
{
<http://www.wikidata.org/entity/Q1652291> <http://schema.org/description> ?o .
filter(lang(?o)='en').
?article schema:about ?item .
?article schema:inLanguage "en" .
FILTER (SUBSTR(str(?article), 1, 25) = "https://en.wikipedia.org/")
}

So I assume the other queries like this one would not work either (they would time out on query.wikidata.org, so I can't test them):

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>

SELECT *
WHERE
{
?s <http://schema.org/description> ?o .
filter(lang(?o)='en').
?article schema:about ?item .
?article schema:inLanguage "en" .
FILTER (SUBSTR(str(?article), 1, 25) = "https://en.wikipedia.org/")
}

So am I doing something wrong in the syntax of these combined queries?
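
One possible cause, offered as a guess rather than a confirmed diagnosis: in the combined queries above, ?item is never tied to the subject of the description triple, so the article pattern can match every English Wikipedia sitelink instead of only the one for the item in question. A minimal sketch for the single test item, assuming the same prefixes and sitelink patterns as the earlier query and using the subject in both places (the variable name ?description is only illustrative):

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX schema: <http://schema.org/>

SELECT ?description ?article
WHERE
{
  # Description ("bio") of the single test item, English only.
  wd:Q1652291 schema:description ?description .
  FILTER(LANG(?description) = "en")

  # Sitelink tied to the same item, so unrelated articles cannot match.
  ?article schema:about wd:Q1652291 .
  ?article schema:inLanguage "en" .
  FILTER(SUBSTR(STR(?article), 1, 25) = "https://en.wikipedia.org/")
}
```

For the all-items variant, the same idea would be to use one shared variable (for example ?item) as the subject of the description triple and as the object of schema:about. That version may still time out on query.wikidata.org because of the amount of data involved.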

Thanks again in advance, and for the help thus far!

On Mon, Feb 1, 2016 at 1:19 AM, Edgard Marx <marx@informatik.uni-leipzig.de> wrote:

Yep,

Please note that RDFSlice will take the subset,
that is, the triples that contain the property that you are looking for.
Here are three examples of SPARQL queries:

PS: you can try them at https://query.wikidata.org

* For your example,

SELECT *
WHERE
{
<http://www.wikidata.org/entity/Q1652291> <http://schema.org/description> ?o .
filter(lang(?o)='en').
}

* For all English bios:

SELECT *
WHERE
{
?s <http://schema.org/description> ?o .
filter(lang(?o)='en').
}

* For the bios of your example in all languages:

SELECT *
WHERE
{
<http://www.wikidata.org/entity/Q1652291> <http://schema.org/description> ?o .
}

best,
Edgard

On Mon, Feb 1, 2016 at 4:34 AM, Hampton Snowball <hamptonsnowball@gmail.com> wrote:
Thanks. I see it requires constructing a query to only extract the data you want. E.g. the graph pattern:

<graphPatterns> - desired query, e.g. "SELECT * WHERE {?s ?p ?o}" or graph pattern e.g. "{?s ?p ?o}"

Since I don't know how to construct queries, would you be able to tell me the proper query to extract the short bio, the English Wikipedia link, and maybe the other Wikipedia links from all the pages?

For example, from https://www.wikidata.org/wiki/Q1652291:

"Turkish female given name"
https://en.wikipedia.org/wiki/H%C3%BClya
and optionally https://de.wikipedia.org/wiki/H%C3%BClya

Thanks in advance!

On Sun, Jan 31, 2016 at 3:53 PM, Edgard Marx <marx@informatik.uni-leipzig.de> wrote:
Hey,
you can simply use RDFSlice (https://bitbucket.org/emarx/rdfslice/overview) directly on the dump file (https://dumps.wikimedia.org/wikidatawiki/entities/20160125/)

best,
Edgard

On Sun, Jan 31, 2016 at 7:43 PM, Hampton Snowball <hamptonsnowball@gmail.com> wrote:
Hello,

I am interested in a subset of Wikidata, and I am trying to find the best way to get it without getting a larger dataset than necessary.

Is there a way to get just the "bios" that appear on the Wikidata pages below the name of the person/organization, as well as the link to the English Wikipedia page, or to all Wikipedia pages?

For example, from https://www.wikidata.org/wiki/Q1652291:

"Turkish female given name"
https://en.wikipedia.org/wiki/H%C3%BClya
and optionally https://de.wikipedia.org/wiki/H%C3%BClya

I know there is SPARQL, and this list previously helped me construct a query, but some requests seem to time out when looking at a large amount of data, so I am not sure this would work.

I know the dumps are the full dataset, but I am not sure whether any subset dumps are available or whether there is a better way of grabbing this data.

Thanks in advance,
HS

_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
