Hello,
I am interested in a subset of Wikidata and I am trying to find the best way to get it without downloading a larger dataset than necessary.
Is there a way to get just the "bios" that appear on Wikidata pages below the name of the person/organization, as well as the link to the English Wikipedia page, or to all Wikipedia pages?
For example, from https://www.wikidata.org/wiki/Q1652291:
"Turkish female given name", https://en.wikipedia.org/wiki/H%C3%BClya and optionally https://de.wikipedia.org/wiki/H%C3%BClya
I know there is SPARQL, which this list previously helped me construct a query with, but some requests seem to time out when looking at a large amount of data, so I am not sure this would work.
The dumps I know of are the full dataset, but I am not sure whether there are any subset dumps available, or a better way of grabbing this data.
Thanks in advance, HS
Hoi, Magnus created automated descriptions; they are a start. Your only problem is that they are not available through SPARQL. Thanks, GerardM
On 31 January 2016 at 19:43, Hampton Snowball hamptonsnowball@gmail.com wrote:
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Hey, you can simply use RDFSlice (https://bitbucket.org/emarx/rdfslice/overview) directly on the dump file (https://dumps.wikimedia.org/wikidatawiki/entities/20160125/).
best, Edgard
Thanks. I see it requires constructing a query to extract only the data you want, e.g. the graph pattern:
<graphPatterns> - desired query, e.g. "SELECT * WHERE {?s ?p ?o}" or graph pattern e.g. "{?s ?p ?o}"
Since I don't know much about constructing queries, would you be able to tell me the proper query to extract, from all pages, the short bio, the English Wikipedia link, and maybe the other Wikipedias?
For example, from https://www.wikidata.org/wiki/Q1652291:
"Turkish female given name", https://en.wikipedia.org/wiki/H%C3%BClya and optionally https://de.wikipedia.org/wiki/H%C3%BClya
Thanks in advance!
On Sun, Jan 31, 2016 at 3:53 PM, Edgard Marx <marx@informatik.uni-leipzig.de> wrote:
Yep,
Please note that RDFSlice will extract the subset, that is, the triples that contain the property you are looking for. Here are three examples of SPARQL queries:
PS: you can try them at https://query.wikidata.org.
* For your example:
SELECT * WHERE { <http://www.wikidata.org/entity/Q1652291> <http://schema.org/description> ?o . FILTER(lang(?o)='en'). }
* For all English bios:
SELECT * WHERE { ?s <http://schema.org/description> ?o . FILTER(lang(?o)='en'). }
* For all-language bios:
SELECT * WHERE { <http://www.wikidata.org/entity/Q1652291> <http://schema.org/description> ?o . }
best, Edgard
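Since RDFSlice essentially filters triples out of the dump, the same idea can be sketched in a few lines of Python that stream the dump and keep only English descriptions. This is only an illustrative sketch, not RDFSlice itself; the predicate IRI matches the queries above, but the dump file name and the N-Triples layout are assumptions.

```python
import gzip

DESCRIPTION = "<http://schema.org/description>"

def english_descriptions(lines):
    """Yield (subject, literal) pairs for English schema:description
    triples from an iterable of N-Triples lines."""
    for line in lines:
        parts = line.strip().split(" ", 2)  # subject, predicate, rest
        if len(parts) == 3 and parts[1] == DESCRIPTION:
            obj = parts[2].rstrip(" .")     # drop the trailing " ."
            if obj.endswith("@en"):
                yield parts[0], obj

# Usage against the (assumed) dump file name, streamed without
# loading the whole file into memory:
# with gzip.open("wikidata-20160125-all.nt.gz", "rt", encoding="utf-8") as f:
#     for subj, desc in english_descriptions(f):
#         print(subj, desc)
```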
On Mon, Feb 1, 2016 at 4:34 AM, Hampton Snowball hamptonsnowball@gmail.com wrote:
Hi!
* For all English bios:
SELECT * WHERE { ?s <http://schema.org/description> ?o . FILTER(lang(?o)='en'). }
Please don't run this on query.wikidata.org, though, without adding a LIMIT. Otherwise you'd be trying to download several million data items, which would probably time out anyway. Add something like "LIMIT 10" to it.
Thanks,
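As a precaution along these lines, a tiny helper can refuse to send an unbounded SELECT by appending a default LIMIT first. The guard itself is plain string handling; the endpoint URL and the `requests` call in the comment are assumptions and are not executed here:

```python
import re

def with_limit(query, default_limit=10):
    """Return the query unchanged if it already ends in a LIMIT clause,
    otherwise append a conservative default LIMIT."""
    if re.search(r"\bLIMIT\s+\d+\s*$", query, re.IGNORECASE):
        return query
    return query.rstrip() + f"\nLIMIT {default_limit}"

# Sending it to the endpoint could then look like (not executed here):
# import requests
# r = requests.get("https://query.wikidata.org/sparql",
#                  params={"query": with_limit(q), "format": "json"})
```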
Yep,
One more reason to use RDFSlice ;-),
thanks
On Mon, Feb 1, 2016 at 7:25 AM, Stas Malyshev smalyshev@wikimedia.org wrote:
Thank you. This will give me the bios; however, I still want the associated Wikipedia links. Previously someone gave me a query that included the English Wikipedia link along with another property. You can see it below:
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>

SELECT ?item ?twitter ?article WHERE {
  ?item wdt:P2002 ?twitter
  OPTIONAL { ?item rdfs:label ?item_label FILTER(lang(?item_label) = "en") . }
  ?article schema:about ?item .
  ?article schema:inLanguage "en" .
  FILTER (SUBSTR(str(?article), 1, 25) = "https://en.wikipedia.org/")
} ORDER BY ASC(?article)
*I tried to take the PREFIX header and this portion to append to some of your queries:*
?article schema:about ?item .
?article schema:inLanguage "en" .
FILTER (SUBSTR(str(?article), 1, 25) = "https://en.wikipedia.org/")
*The first one, which seems to be for only one record, just as a test, gave me an ERROR though:*
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>

SELECT * WHERE {
  <http://www.wikidata.org/entity/Q1652291> <http://schema.org/description> ?o . FILTER(lang(?o)='en').
  ?article schema:about ?item .
  ?article schema:inLanguage "en" .
  FILTER (SUBSTR(str(?article), 1, 25) = "https://en.wikipedia.org/")
}
*So I assume the other queries like this would not work (they would time out on query.wikidata.org, so I can't test):*
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>

SELECT * WHERE {
  ?s <http://schema.org/description> ?o . FILTER(lang(?o)='en').
  ?article schema:about ?item .
  ?article schema:inLanguage "en" .
  FILTER (SUBSTR(str(?article), 1, 25) = "https://en.wikipedia.org/")
}
So am I doing something wrong with the syntax of these combined queries?
Thanks in advance again, and for the help thus far!
On Mon, Feb 1, 2016 at 1:19 AM, Edgard Marx marx@informatik.uni-leipzig.de wrote:
Wikidata seems to be heavily queried by now; it is a public endpoint.
However, the query below might work in RDFSlice:
PS: notice that the subject variable (?article) contains the Wikipedia link, and it will be extracted.
SELECT * WHERE { ?article <http://schema.org/description> ?o . ?article <http://schema.org/about> ?o1 . ?article <http://www.w3.org/2000/01/rdf-schema#label> ?o2 . }
best, Edgard
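For readers slicing the dump offline, the description/about/label idea above can also be approximated with a small two-pass join over the dump's triples. This is a hypothetical sketch, assuming (as on query.wikidata.org) that schema:description hangs on the item while schema:about points from the article to the item, so the join here uses two subjects; the parsing is the same naive N-Triples splitting as before.

```python
DESC = "<http://schema.org/description>"
ABOUT = "<http://schema.org/about>"

def _triples(lines):
    # Split each N-Triples line into (subject, predicate, object) and
    # drop the trailing " ." from the object.
    for line in lines:
        parts = line.strip().split(" ", 2)
        if len(parts) == 3:
            yield parts[0], parts[1], parts[2].rstrip(" .")

def join_bios_with_articles(lines):
    """Two-pass join: collect English item descriptions, then pair them
    with the articles whose schema:about points at those items."""
    lines = list(lines)
    descs = {s: o for s, p, o in _triples(lines)
             if p == DESC and o.endswith("@en")}
    return [(o, descs[o], s)  # (item, bio, article URL)
            for s, p, o in _triples(lines)
            if p == ABOUT and o in descs]
```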
On Mon, Feb 1, 2016 at 5:12 PM, Hampton Snowball hamptonsnowball@gmail.com wrote:
Hi!
*The first one, which seems to be for only one record, just as a test, gave me an ERROR though:*
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>

SELECT * WHERE {
  <http://www.wikidata.org/entity/Q1652291> <http://schema.org/description> ?o . FILTER(lang(?o)='en').
  ?article schema:about ?item .
  ?article schema:inLanguage "en" .
  FILTER (SUBSTR(str(?article), 1, 25) = "https://en.wikipedia.org/")
}
This one is not correct: ?item should be replaced by <http://www.wikidata.org/entity/Q1652291>. Or you can use the BIND or VALUES syntax in SPARQL to bind ?item to one or more specific items. But if you just leave it as ?item, it matches any value, which means you just made it scan through all 15M items :) That will time out.
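A sketch of what that advice looks like in practice: the helper below just builds a query string that binds ?item with VALUES to one or more specific QIDs. The function name and overall shape are illustrative (following the thread's earlier examples), and nothing is sent to the endpoint.

```python
def bio_and_sitelink_query(qids, lang="en"):
    """Build a SPARQL query that binds ?item to specific QIDs via VALUES,
    then fetches the description and the matching Wikipedia article."""
    values = " ".join(f"wd:{q}" for q in qids)
    return f"""PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX schema: <http://schema.org/>
SELECT ?item ?desc ?article WHERE {{
  VALUES ?item {{ {values} }}
  ?item schema:description ?desc . FILTER(lang(?desc) = '{lang}')
  ?article schema:about ?item .
  ?article schema:inLanguage '{lang}' .
}}"""

# Print the query for the thread's example item:
print(bio_and_sitelink_query(["Q1652291"]))
```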
Thanks.
I only plan on using a query that extracts from all English Wikidata "articles", or all articles, anyway; hopefully the other queries will work.
On Mon, Feb 1, 2016 at 4:33 PM, Stas Malyshev smalyshev@wikimedia.org wrote:
Sorry if this is a dump question (I'm not a developer). To run the command the RDFSlice page mentions ("java -jar rdfslice.jar -source <fileList>|<path> -patterns <graphPatterns> -out <fileDest> -order <order> -debug <debugGraphSize>"), can this be done with the Windows command prompt? Or do I need some special developer version of Java/console?
Thanks for the tool.
Of course I meant sorry if this is a dumb question :)
On Mon, Feb 1, 2016 at 7:13 PM, Hampton Snowball hamptonsnowball@gmail.com wrote:
I was able to semi-successfully use RDFSlice with the dump using the Windows command prompt. Only, maybe because it's a 5 GB dump file, I am getting Java errors line after line as it goes through the file (java.lang.StringIndexOutOfBoundsException: String index out of range: -1; sometimes the last number changes).
I thought it might be a memory issue, but increasing the memory with the -Xmx2G flag (or 3G, 4G) hasn't helped. Any tips would be appreciated.
Thanks
On Mon, Feb 1, 2016 at 7:28 PM, Hampton Snowball hamptonsnowball@gmail.com wrote:
Hey,
I recommend that you not post questions about third-party systems or software unrelated to Wikidata or Wikimedia here. In the case of RDFSlice there is an issues page (https://bitbucket.org/emarx/rdfslice/issues), where you can open an issue and someone will answer you.
I also advise you to post your command line and the error, so the developers can better understand it and quickly fix it (if there is a problem).
best regards, Edgard
On Tue, Feb 2, 2016 at 7:18 AM, Hampton Snowball hamptonsnowball@gmail.com wrote:
Okay, thanks!
On Tue, Feb 2, 2016 at 8:55 AM, Edgard Marx marx@informatik.uni-leipzig.de wrote: