Yep,

Please note that RDFSlice will extract the subset, i.e. the triples that contain the property you are looking for.
Here are three examples of SPARQL queries:

P.S.: you can try them at https://query.wikidata.org.

* For your example,

SELECT *
WHERE
{
   <http://www.wikidata.org/entity/Q1652291>  <http://schema.org/description> ?o .   
    filter(lang(?o)='en').
}


* For all English bios:

SELECT *
WHERE
{
   ?s <http://schema.org/description> ?o .   
   filter(lang(?o)='en').
}

* For your example's bios in all languages:

SELECT *
WHERE
{
   <http://www.wikidata.org/entity/Q1652291>  <http://schema.org/description> ?o .     
}
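
* To also get the links to the Wikipedia articles (sitelinks), a query along these lines should work on the same endpoint. Note this is a sketch based on how the Wikidata RDF export models sitelinks (articles point to the entity via schema:about, with schema:inLanguage for the language); drop the inLanguage triple to get the articles in all languages:

SELECT ?article
WHERE
{
   ?article <http://schema.org/about> <http://www.wikidata.org/entity/Q1652291> .
   ?article <http://schema.org/inLanguage> "en" .
}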


best,
Edgard



On Mon, Feb 1, 2016 at 4:34 AM, Hampton Snowball <hamptonsnowball@gmail.com> wrote:
Thanks. I see it requires constructing a query to extract only the data you want, e.g. the graph pattern:

<graphPatterns> - desired query, e.g. "SELECT * WHERE {?s ?p ?o}" or graph pattern e.g. "{?s ?p ?o}"

Since I don't know about constructing queries, would you be able to tell me the proper query to extract, from all the pages, the short bio, the English Wikipedia link, and maybe the other Wikipedias?


Thanks in advance!


On Sun, Jan 31, 2016 at 3:53 PM, Edgard Marx <marx@informatik.uni-leipzig.de> wrote:
Hey,
you can simply use RDFSlice (https://bitbucket.org/emarx/rdfslice/overview) directly on the dump file (https://dumps.wikimedia.org/wikidatawiki/entities/20160125/).

best,
Edgard

On Sun, Jan 31, 2016 at 7:43 PM, Hampton Snowball <hamptonsnowball@gmail.com> wrote:
Hello,

I am interested in a subset of Wikidata, and I am trying to find the best way to get it without downloading a larger dataset than necessary.

Is there a way to just get the "bios" that appear on the Wikidata pages below the name of the person/organization, as well as the link to the English Wikipedia page, or to all Wikipedia pages?

For example, from https://www.wikidata.org/wiki/Q1652291:

"Turkish female given name"
and optionally https://de.wikipedia.org/wiki/H%C3%BClya

I know there is SPARQL, and this list previously helped me construct a query, but some requests seem to time out when looking at a large amount of data, so I am not sure this would work.

The dumps I know of are the full dataset, but I am not sure if there are any subset dumps available, or a better way of grabbing this data.

Thanks in advance,
HS


_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


