Hi,
I am currently working on a NER project at school and would like to know if there is a way to generate RDF dumps that only contain "instance of" or "subclass of" relations. I have found these dumps: "RDF Exports from Wikidata": https://tools.wmflabs.org/wikidata-exports/rdf/exports/20160801/dump_download.html Here, under "simplified and derived dumps", the taxonomy and instances dumps are very useful for me, but unfortunately very old. It would be great if I could generate up-to-date dumps.
Thank You, Alkım Ece Toprak Bogazici University
You can use CONSTRUCT queries for this. In this example [1] you get the subgraph for all items that also have a MusicBrainz ID property.

[1] https://w.wiki/PBi
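For concreteness, here is one way to run that kind of query from a script. This is a sketch of the pattern rather than the query behind [1]: P434 (MusicBrainz artist ID), the LIMIT, and the User-Agent string are illustrative choices.

    import requests

    # Sketch only: P434 (MusicBrainz artist ID) stands in for the MusicBrainz
    # property used in [1]; the LIMIT keeps the example fast.
    QUERY = """
    CONSTRUCT {
      ?item wdt:P31 ?class .
      ?class wdt:P279 ?superclass .
    }
    WHERE {
      ?item wdt:P434 ?mbid ;
            wdt:P31 ?class .
      OPTIONAL { ?class wdt:P279 ?superclass . }
    }
    LIMIT 1000
    """

    resp = requests.post(
        "https://query.wikidata.org/sparql",
        data={"query": QUERY},
        # The Accept header picks the serialization (content negotiation).
        headers={"Accept": "text/turtle",
                 "User-Agent": "taxonomy-export-example/0.1 (illustrative)"},
    )
    resp.raise_for_status()
    print(resp.text)  # instance-of / subclass-of triples as RDF-Turtle

Dropping the wdt:P434 restriction would in principle yield the complete taxonomy, but a CONSTRUCT over all of Wikidata will hit the public endpoint's timeout, which is why the dump-based approaches further down the thread are the practical route for a full export.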
On 4/27/20 5:48 AM, Andra Waagmeester wrote:
You can use CONSTRUCT queries for this. In this example [1] you get the subgraph for all items that also have a MusicBrainz ID property.
Hi Andra,
Do these CONSTRUCT queries return any of the following document content-types?
RDF-Turtle, RDF-XML, JSON-LD?
Kingsley
On 27/04/2020 18:02, Kingsley Idehen wrote:
Do these CONSTRUCT queries return any of the following document content-types?
RDF-Turtle, RDF-XML, JSON-LD?
You can use content negotiation on the SPARQL endpoint:

    query="CONSTRUCT { ... }"
    curl -H "Accept: application/rdf+xml" https://query.wikidata.org/sparql --data-urlencode "query=$query"
    curl -H "Accept: text/turtle" -G https://query.wikidata.org/sparql --data-urlencode "query=$query"

The first call POSTs the query; the second (-G) sends it as a GET parameter. The Accept header selects the result serialization.
-- raffaele@docuver.se
If the challenge is downloading large files, you can also get local access to all of the dumps (Wikidata, Wikipedia, and more) through PAWS https://wikitech.wikimedia.org/wiki/PAWS (Wikimedia-hosted Jupyter notebooks) and Toolforge https://wikitech.wikimedia.org/wiki/Help:Toolforge (a more general-purpose Wikimedia hosting environment). From Toolforge, you could run the Wikidata Toolkit (Java) that Denny mentions. I'm personally more familiar with Python, so my suggestion is to use Python code to filter down the dumps to what you desire. Below is an example Python notebook that will do this on PAWS, though the PAWS environment is not set up for these longer-running jobs and will probably die before the process is complete, so I'd highly recommend converting it into a script that can run on Toolforge (see https://wikitech.wikimedia.org/wiki/Help:Toolforge/Dumps).
PAWS example: https://paws-public.wmflabs.org/paws-public/User:Isaac_(WMF)/Simplified_Wiki...
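For the core filtering step itself, here is a minimal sketch, assuming a locally downloaded latest-all.json.bz2 (the file name is illustrative) and relying on the dump's layout of one JSON entity per line inside a top-level array:

    import bz2
    import json

    # Keep only "instance of" (P31) and "subclass of" (P279) links
    # between items. File name is illustrative.
    with bz2.open("latest-all.json.bz2", "rt") as f:
        for line in f:
            line = line.rstrip().rstrip(",")
            if not line or line in ("[", "]"):
                continue
            entity = json.loads(line)
            if entity.get("type") != "item":
                continue
            for prop in ("P31", "P279"):
                for claim in entity.get("claims", {}).get(prop, []):
                    dv = claim["mainsnak"].get("datavalue")
                    if dv and dv["type"] == "wikibase-entityid":
                        print(entity["id"], prop, dv["value"]["id"])

On the full dump this runs for many hours, which is exactly why a Toolforge job is a better home for it than a PAWS notebook.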
Best, Isaac
On 5/1/20 11:53 AM, Isaac Johnson wrote:
That isn't my challenge.
I wanted to know why the WDQS UI doesn't provide an option for CONSTRUCT and DESCRIBE query solutions in a variety of document types.
See https://wikidata.demo.openlinksw.com/sparql for what I mean. Ditto any DBpedia endpoint.
Kingsley
Kingsley,
thanks for suggesting that feature. Since you already have that feature, could you let us know how often the UI option for these output formats is used? That could help with prioritising.
My uninformed hunch would be that there isn't much demand for selecting the format via the UI, and that it is more relevant for automated calls to be able to do that format selection, which the endpoint already provides.
Thanks, Denny
On 4/30/20 2:32 AM, raffaele messuti wrote:
you can use content negotiation on the sparql endpoint
Okay.
Any reason why that isn't a UI option also re WDQS?
We have an instance at https://wikidata.demo.openlinksw.com/sparql that does support DESCRIBE and CONSTRUCT queries returning solutions in a variety of document types (RDF-Turtle, RDF-XML, JSON-LD, etc.), hence my question.
CONSTRUCT would be best, but I am not sure that there's any system that allows you to do that.
What I would do is get the truthy dump in N-Triples and filter it down to the lines with the respective properties. The Wikidata Toolkit allows you to do that and more.
https://www.mediawiki.org/wiki/Wikidata_Toolkit
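If you only need those two properties, a plain line filter over the truthy N-Triples dump also works, since each statement is one line. A minimal sketch, assuming a locally downloaded latest-truthy.nt.gz (file name illustrative):

    import gzip

    # "instance of" (P31) and "subclass of" (P279) as they appear as
    # direct-claim (wdt:) predicates in the truthy dump.
    KEEP = (
        b"<http://www.wikidata.org/prop/direct/P31>",
        b"<http://www.wikidata.org/prop/direct/P279>",
    )

    with gzip.open("latest-truthy.nt.gz", "rb") as src, \
         open("taxonomy.nt", "wb") as dst:
        for line in src:
            if any(pred in line for pred in KEEP):
                dst.write(line)

The output is itself valid N-Triples, so it loads directly into any triple store.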
Lots of ways to do this, but just wanted to throw another option out there. I've put a subset of Wikidata and Wikipedia from December 2019 into a Kaggle dataset, and you can use the free Kaggle kernels to explore and try out algorithms. There is one file called "statements.csv" with integer triples for "qpq" statements (i.e., truthy statements where the source and the target are a Wikidata item).
* blog post explaining the dataset: https://blog.kensho.com/announcing-the-kensho-derived-wikimedia-dataset-5d11...
* Kaggle page for the dataset: https://www.kaggle.com/kenshoresearch/kensho-derived-wikimedia-data
* example kernel that uses a simple subclass path model to label items as person/location/organization: https://www.kaggle.com/gabrielaltay/kdwd-subclass-path-ner
Also, if you are interested in working with the raw Wikidata JSON dumps using a Python library, you can check out this package:
* https://qwikidata.readthedocs.io/en/stable/json_dump.html
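A minimal sketch of that route, using qwikidata only to stream entities and reading the claims from the raw dicts; the file name and CSV column names are illustrative (they are not the Kensho schema):

    import csv
    from qwikidata.json_dump import WikidataJsonDump

    # Point this at your own downloaded dump; the name is illustrative.
    wjd = WikidataJsonDump("latest-all.json.bz2")

    with open("instance_subclass_statements.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["source_item", "property", "target_item"])
        for entity_dict in wjd:  # yields one parsed entity dict at a time
            if entity_dict.get("type") != "item":
                continue
            for prop in ("P31", "P279"):
                for claim in entity_dict.get("claims", {}).get(prop, []):
                    dv = claim["mainsnak"].get("datavalue")
                    if dv and dv["type"] == "wikibase-entityid":
                        writer.writerow(
                            [entity_dict["id"], prop, dv["value"]["id"]])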
best, -Gabriel