I was able to semi-successfully use RDFSlice on the dump from the Windows command prompt. However, perhaps because it is a 5 GB dump file, I am getting Java errors line after line as it works through the file (java.lang.StringIndexOutOfBoundsException: String index out of range: -1. Sometimes the last number changes).
I thought it might be a memory issue, but I haven't had any luck increasing the memory with -Xmx2G (or 3G, 4G). Any tips would be appreciated.
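One thing I noticed while double-checking: the heap flag only takes effect if it comes before -jar, otherwise Java passes it to RDFSlice as a program argument. This is roughly the shape of command I have been running (the file name and the -order/-debug values here are just placeholders):

  REM -Xmx4G must precede -jar; file name, pattern, and numeric values are examples only
  java -Xmx4G -jar rdfslice.jar -source wikidata-20160125-all.ttl -patterns "?s ?p ?o" -out slice.nt -order 1 -debug 1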
Thanks
On Mon, Feb 1, 2016 at 7:28 PM, Hampton Snowball hamptonsnowball@gmail.com wrote:
Of course I meant sorry if this is a dumb question :)
On Mon, Feb 1, 2016 at 7:13 PM, Hampton Snowball hamptonsnowball@gmail.com wrote:
Sorry if this is a dump question (I'm not a developer). To run the command the rdfslice page mentions (java -jar rdfslice.jar -source <fileList>|<path> -patterns <graphPatterns> -out <fileDest> -order <order> -debug <debugGraphSize>), can this be done with the Windows command prompt, or do I need some special developer version of Java/console?
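From what I can tell, a plain standard JRE should be enough to run a jar, as long as java is on the PATH; I assume this check in the command prompt would confirm it:

  REM no developer build needed; this just verifies Java is installed and on PATH
  java -version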
Thanks for the tool.
On Sun, Jan 31, 2016 at 3:53 PM, Edgard Marx marx@informatik.uni-leipzig.de wrote:
Hey, you can simply use RDFSlice (https://bitbucket.org/emarx/rdfslice/overview) directly on the dump file (https://dumps.wikimedia.org/wikidatawiki/entities/20160125/).
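For example, to slice out just the descriptions, something along these lines might work (untested sketch; the pattern syntax is my guess from the usage string, and the file name is a placeholder):

  REM pattern syntax and file name are assumptions; schema:description holds the "bio" text in the Wikidata RDF export
  java -jar rdfslice.jar -source wikidata-20160125-all.ttl -patterns "?s <http://schema.org/description> ?o" -out descriptions.nt -order 1 -debug 1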
best, Edgard
On Sun, Jan 31, 2016 at 7:43 PM, Hampton Snowball hamptonsnowball@gmail.com wrote:
Hello,
I am interested in a subset of Wikidata, and I am trying to find the best way to get it without downloading a larger dataset than necessary.
Is there a way to get just the "bios" that appear on Wikidata pages below the name of the person/organization, as well as the link to the English Wikipedia page, or to all Wikipedia pages?
For example, from https://www.wikidata.org/wiki/Q1652291:
"Turkish female given name", https://en.wikipedia.org/wiki/H%C3%BClya, and optionally https://de.wikipedia.org/wiki/H%C3%BClya
I know there is SPARQL, and this list previously helped me construct a query, but some requests seem to time out when looking at a large amount of data, so I am not sure this would work.
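For a single item, though, the query stays small; something like this is what I had in mind (untested sketch; I am assuming the public endpoint at query.wikidata.org and its built-in wd:/schema: prefixes):

  REM fetches the English description and English Wikipedia sitelink for one item; ^ is the cmd line continuation
  curl -G "https://query.wikidata.org/sparql" --data-urlencode "format=json" ^
    --data-urlencode "query=SELECT ?desc ?article WHERE { wd:Q1652291 schema:description ?desc . FILTER(LANG(?desc) = 'en') OPTIONAL { ?article schema:about wd:Q1652291 ; schema:isPartOf <https://en.wikipedia.org/> } }"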
I know the dumps are the full dataset, but I am not sure whether there are any subset dumps available, or a better way of grabbing this data.
Thanks in advance, HS
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata