Hi,
I am currently working on a NER project at school and would like to know if there is a way to generate RDF dumps that only contain "instance of" or "subclass of" relations. I have found these dumps: "RDF Exports from Wikidata": https://tools.wmflabs.org/wikidata-exports/rdf/exports/20160801/dump_download.html Here, under "simplified and derived dumps", the taxonomy and instances dumps are very useful for me, but unfortunately very old. It would be great if I could generate up-to-date dumps.
Thank You, Alkım Ece Toprak Bogazici University
You can use CONSTRUCT queries for this. In this example [1] you get the subgraph for all items that also have a MusicBrainz ID property.

[1] https://w.wiki/PBi
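For concreteness, here is one way to run that kind of query from a script. This is a sketch of the pattern rather than the query behind [1]: P434 (MusicBrainz artist ID), the LIMIT, and the User-Agent string are illustrative choices.

    import requests

    # Sketch only: P434 (MusicBrainz artist ID) stands in for the MusicBrainz
    # property used in [1]; the LIMIT keeps the example fast.
    QUERY = """
    CONSTRUCT {
      ?item wdt:P31 ?class .
      ?class wdt:P279 ?superclass .
    }
    WHERE {
      ?item wdt:P434 ?mbid ;
            wdt:P31 ?class .
      OPTIONAL { ?class wdt:P279 ?superclass . }
    }
    LIMIT 1000
    """

    resp = requests.post(
        "https://query.wikidata.org/sparql",
        data={"query": QUERY},
        # The Accept header picks the serialization (content negotiation).
        headers={"Accept": "text/turtle",
                 "User-Agent": "taxonomy-export-example/0.1 (illustrative)"},
    )
    resp.raise_for_status()
    print(resp.text)  # instance-of / subclass-of triples as RDF-Turtle

Dropping the wdt:P434 restriction would in principle yield the complete taxonomy, but a CONSTRUCT over all of Wikidata will hit the public endpoint's timeout, which is why the dump-based approaches further down the thread are the practical route for a full export.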
On 4/27/20 5:48 AM, Andra Waagmeester wrote:
You can use CONSTRUCT queries for this. In this example [1] you get the subgraph for all items that also have a MusicBrainz ID property.
Hi Andra,
Do these CONSTRUCT queries return any of the following document content-types?
RDF-Turtle, RDF-XML, JSON-LD?
Kingsley
On 27/04/2020 18:02, Kingsley Idehen wrote:
Do these CONSTRUCT queries return any of the following document content-types?
RDF-Turtle, RDF-XML, JSON-LD?
You can use content negotiation on the SPARQL endpoint:

    query="CONSTRUCT { ... }"
    curl -H "Accept: application/rdf+xml" https://query.wikidata.org/sparql --data-urlencode "query=$query"
    curl -H "Accept: text/turtle" -G https://query.wikidata.org/sparql --data-urlencode "query=$query"

The first call POSTs the query; the second (-G) sends it as a GET parameter. The Accept header selects the result serialization.
-- raffaele@docuver.se
If the challenge is downloading large files, you can also get local access to all of the dumps (Wikidata, Wikipedia, and more) through PAWS https://wikitech.wikimedia.org/wiki/PAWS (Wikimedia-hosted Jupyter notebooks) and Toolforge https://wikitech.wikimedia.org/wiki/Help:Toolforge (a more general-purpose Wikimedia hosting environment). From Toolforge, you could run the Wikidata Toolkit (Java) that Denny mentions. I'm personally more familiar with Python, so my suggestion is to use Python code to filter down the dumps to what you desire. Below is an example Python notebook that will do this on PAWS, though the PAWS environment is not set up for these longer-running jobs and will probably die before the process is complete, so I'd highly recommend converting it into a script that can run on Toolforge (see https://wikitech.wikimedia.org/wiki/Help:Toolforge/Dumps).
PAWS example: https://paws-public.wmflabs.org/paws-public/User:Isaac_(WMF)/Simplified_Wiki...
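For the core filtering step itself, here is a minimal sketch, assuming a locally downloaded latest-all.json.bz2 (the file name is illustrative) and relying on the dump's layout of one JSON entity per line inside a top-level array:

    import bz2
    import json

    # Keep only "instance of" (P31) and "subclass of" (P279) links
    # between items. File name is illustrative.
    with bz2.open("latest-all.json.bz2", "rt") as f:
        for line in f:
            line = line.rstrip().rstrip(",")
            if not line or line in ("[", "]"):
                continue
            entity = json.loads(line)
            if entity.get("type") != "item":
                continue
            for prop in ("P31", "P279"):
                for claim in entity.get("claims", {}).get(prop, []):
                    dv = claim["mainsnak"].get("datavalue")
                    if dv and dv["type"] == "wikibase-entityid":
                        print(entity["id"], prop, dv["value"]["id"])

On the full dump this runs for many hours, which is exactly why a Toolforge job is a better home for it than a PAWS notebook.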
Best, Isaac
On 5/1/20 11:53 AM, Isaac Johnson wrote:
That isn't my challenge.
I wanted to know why the WDQS UI doesn't provide an option for CONSTRUCT and DESCRIBE query solutions in a variety of document types.
See https://wikidata.demo.openlinksw.com/sparql for what I mean. Ditto any DBpedia endpoint.
Kingsley
Kingsley,
thanks for suggesting that feature. Since you already have that feature, could you let us know how often the UI option for these output formats is used? That could help with prioritising.
My uninformed hunch would be that there isn't much demand for selecting the format via the UI, and that it is more relevant for automated calls to be able to do that format selection, which the endpoint already provides.
Thanks, Denny
On 4/30/20 2:32 AM, raffaele messuti wrote:
you can use content negotiation on the sparql endpoint
Okay.
Any reason why that isn't a UI option also re WDQS?
We have an instance at https://wikidata.demo.openlinksw.com/sparql that does support DESCRIBE and CONSTRUCT queries returning solutions in a variety of document types (RDF-Turtle, RDF-XML, JSON-LD, etc.), hence my question.
CONSTRUCT would be best, but I am not sure that there's any system that allows you to do that.
What I would do is get the truthy dump in N-Triples and filter it down to the lines with the respective properties. The Wikidata Toolkit allows you to do that and more.
https://www.mediawiki.org/wiki/Wikidata_Toolkit
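If you only need those two properties, a plain line filter over the truthy N-Triples dump also works, since each statement is one line. A minimal sketch, assuming a locally downloaded latest-truthy.nt.gz (file name illustrative):

    import gzip

    # "instance of" (P31) and "subclass of" (P279) as they appear as
    # direct-claim (wdt:) predicates in the truthy dump.
    KEEP = (
        b"<http://www.wikidata.org/prop/direct/P31>",
        b"<http://www.wikidata.org/prop/direct/P279>",
    )

    with gzip.open("latest-truthy.nt.gz", "rb") as src, \
         open("taxonomy.nt", "wb") as dst:
        for line in src:
            if any(pred in line for pred in KEEP):
                dst.write(line)

The output is itself valid N-Triples, so it loads directly into any triple store.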
Lots of ways to do this, but just wanted to throw another option out there. I've put a subset of Wikidata and Wikipedia from December 2019 into a Kaggle dataset, and you can use the free Kaggle kernels to explore and try out algorithms. There is one file called "statements.csv" with integer triples for "qpq" statements (i.e., truthy statements where the source and the target are a Wikidata item).
* blog post explaining the dataset: https://blog.kensho.com/announcing-the-kensho-derived-wikimedia-dataset-5d11...
* Kaggle page for the dataset: https://www.kaggle.com/kenshoresearch/kensho-derived-wikimedia-data
* example kernel that uses a simple subclass path model to label items as person/location/organization: https://www.kaggle.com/gabrielaltay/kdwd-subclass-path-ner
Also, if you are interested in working with the raw Wikidata JSON dumps using a Python library, you can check out this package:
* https://qwikidata.readthedocs.io/en/stable/json_dump.html
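A minimal sketch of that route, using qwikidata only to stream entities and reading the claims from the raw dicts; the file name and CSV column names are illustrative (they are not the Kensho schema):

    import csv
    from qwikidata.json_dump import WikidataJsonDump

    # Point this at your own downloaded dump; the name is illustrative.
    wjd = WikidataJsonDump("latest-all.json.bz2")

    with open("instance_subclass_statements.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["source_item", "property", "target_item"])
        for entity_dict in wjd:  # yields one parsed entity dict at a time
            if entity_dict.get("type") != "item":
                continue
            for prop in ("P31", "P279"):
                for claim in entity_dict.get("claims", {}).get(prop, []):
                    dv = claim["mainsnak"].get("datavalue")
                    if dv and dv["type"] == "wikibase-entityid":
                        writer.writerow(
                            [entity_dict["id"], prop, dv["value"]["id"]])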
best, -Gabriel