Hi all,
I hope this is the right place for this discussion :)
First of all, as a developer of software for RDF Linked Data consumption, I am naturally delighted that Wikidata serves Linked Data and supports content negotiation (not many services get it right).
However, IMO, the number of meta-triples not relevant to the requested entity, and the sheer size of the RDF data that results, make Wikidata's RDF responses pretty much unusable.
Let's take a single entity as an example:
curl -L -H "Accept: text/turtle" 'https://www.wikidata.org/entity/Q1748'
The size of the Turtle response is 1.6MB!
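For anyone who wants to reproduce the measurement from a script, here is a minimal Python sketch of the same content-negotiated request. It uses only the standard library; the helper names (`build_request`, `response_size`) and the User-Agent string are placeholders of mine, and the size you get will depend on when you run it.

```python
import urllib.request

ENTITY_BASE = "https://www.wikidata.org/entity/"


def build_request(qid: str, media_type: str = "text/turtle") -> urllib.request.Request:
    """Build a GET request for an entity URI with an explicit Accept header."""
    return urllib.request.Request(
        ENTITY_BASE + qid,
        headers={
            "Accept": media_type,
            # Placeholder value; replace with something that identifies your client.
            "User-Agent": "rdf-size-check/0.1 (example script)",
        },
    )


def response_size(qid: str, media_type: str = "text/turtle") -> int:
    """Fetch the entity (urlopen follows redirects, like curl -L) and return the body size in bytes."""
    with urllib.request.urlopen(build_request(qid, media_type), timeout=30) as resp:
        return len(resp.read())
```

Calling `response_size("Q1748")` performs the actual network request; passing a different `media_type` (for example `application/ld+json`) lets you compare serializations.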
All of the schema metadata such as property and class descriptions are not needed as they can be discovered by dereferencing the respective term URIs:
wdno:P2960 a owl:Class ; owl:complementOf _:e8842935d39a233def3d267ae3737d8c .
_:e8842935d39a233def3d267ae3737d8c a owl:Restriction ; owl:onProperty wdt:P2960 ;
owl:someValuesFrom owl:Thing .
p:P518 a owl:ObjectProperty .
psv:P518 a owl:ObjectProperty .
pqv:P518 a owl:ObjectProperty .
prv:P518 a owl:ObjectProperty .
wdt:P518 a owl:ObjectProperty .
ps:P518 a owl:ObjectProperty .
pq:P518 a owl:ObjectProperty .
pr:P518 a owl:ObjectProperty .
wd:Q1775415 a wikibase:Item ;
    rdfs:label "feminine"@en ;
    skos:prefLabel "feminine"@en ;
    schema:name "feminine"@en ;
    schema:description "grammatical gender"@en .
and so on and so forth.
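To put a number on how much of a response is not about the requested entity, one could bucket the triples by subject. The sketch below assumes the response was saved as N-Triples (one triple per line), which keeps the parsing trivial. The bucket names and function names are mine, not anything Wikidata defines, and the buckets are deliberately coarse.

```python
from collections import Counter
from typing import Iterable

WD = "http://www.wikidata.org/entity/"
WDS = "http://www.wikidata.org/entity/statement/"


def subject_of(nt_line: str) -> str:
    """Return the subject term of one N-Triples line (an <IRI> or a _:label)."""
    return nt_line.split(None, 1)[0]


def bucket(subject: str, qid: str) -> str:
    """Assign a subject to a coarse bucket relative to the requested entity."""
    if subject.startswith("_:"):
        return "blank node"
    iri = subject.strip("<>")
    if iri == WD + qid:
        return "requested entity"
    # Statement IRIs share the entity prefix, so test them before other entities.
    if iri.startswith(WDS):
        return "statement node"
    if iri.startswith(WD):
        return "other entity"
    return "other"


def count_buckets(nt_lines: Iterable[str], qid: str) -> Counter:
    """Count triples per bucket, skipping blank lines and comments."""
    counts: Counter = Counter()
    for line in nt_lines:
        line = line.strip()
        if line and not line.startswith("#"):
            counts[bucket(subject_of(line), qid)] += 1
    return counts
```

With a saved file this would be used as `count_buckets(open("Q1748.nt", encoding="utf-8"), "Q1748")`. Property declarations such as `p:P518 a owl:ObjectProperty` land in the "other" bucket here, since their IRIs live outside the entity namespace.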
Then I would argue that the provenance statements such as http://www.wikidata.org/entity/statement/Q1748-cfb94fd5-464b-1b83-a513-dd751882b7ce are also *not* necessary for the majority of use cases of the majority of users.
I suppose they are included to provide a complete and "truthy" response, but by doing so the usability of the data is diminished. I think the provenance statements should be removed from the default responses and relegated to some "complete" or "truthy" profile with a distinct URI, linked to from the default response.
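A sketch of what requesting such a profile looks like from the client side. As far as I can tell from the Wikidata data access documentation, `Special:EntityData` accepts a `flavor` query parameter with the values `simple` (truthy statements only), `dump` (no descriptions of the other entities referred to) and `full` (the default). The function name below is my own, and whether any of these flavors is slim enough for a given client is something to measure rather than assume.

```python
from urllib.parse import urlencode

ENTITY_DATA = "https://www.wikidata.org/wiki/Special:EntityData/"

# Flavor names as I understand them from the data access documentation.
FLAVORS = ("simple", "dump", "full")


def entity_data_url(qid: str, fmt: str = "ttl", flavor: str = "full") -> str:
    """Build a Special:EntityData URL for one entity, serialization and flavor."""
    if flavor not in FLAVORS:
        raise ValueError(f"unknown flavor: {flavor!r}")
    query = urlencode({"flavor": flavor})
    return f"{ENTITY_DATA}{qid}.{fmt}?{query}"
```

For example, `entity_data_url("Q1748", flavor="simple")` gives a URL for the truthy-only Turtle. This addresses a document URL rather than the entity URI itself, so it does not change what the default content-negotiated response contains.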
What do you think?
Martynas atomgraph.com
Dear Martynas,
I strongly disagree that the provenance statements should be removed from the default responses, since it is exactly the provenance that makes Wikidata so valuable. Wikidata comes with a lot of noise, since references are often not provided. Personally, I mostly consider a Wikidata statement without a reference to be without any value and best ignored. So if we remove the provenance, Wikidata becomes just a bag of noise.
Having said this, I do acknowledge that Wikidata comes with a lot of baggage or weight, but there are some decent tools out there to subset Wikidata into more manageable portions.
We did a paper on that a few years back: https://www.semantic-web-journal.net/system/files/swj3491.pdf
Cheers,
Andra
On Tue, 6 Jan 2026 at 11:15, Martynas Jusevičius <martynas@atomgraph.com> wrote:
[...]
_______________________________________________
Wikidata mailing list -- wikidata@lists.wikimedia.org
Public archives at https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/mes...
To unsubscribe send an email to wikidata-leave@lists.wikimedia.org
Dear all,
I fully agree with Andra's response in terms of content. Provenance and governance are crucial for responsible use and propagation of information.
I myself work as a healthcare professional in the field of medical guidelines and biomedical research and am confronted daily with the question of whether a term definition is reliable or not.
It makes a difference whether a term is defined by an authoritative body, such as a WHO expert working group, or by an undefined institution.
Also, from my involvement with LIFES (https://www.lifes.institute/), I see that term definitions, and the sources of those definitions, are an overlooked aspect in the data community, which greatly complicates machine interpretability and reuse of data, especially when Wikidata/Wikibase is used as a controlled vocabulary in a KAG.
When no source is listed for a term (definition), lossless propagation of information is not guaranteed, and the term is therefore essentially useless for further use.
The problem outlined can be explained by two aspects.
First, it is intrinsic to the design. Systems such as Wikidata are not designed to go beyond a lexicological concept. The world is much more complex than that and needs to be described with a far more expressive encyclopedic model.
In practice, knowledge graph-like systems get stuck on more complex knowledge models.
Second, it is a result of uncontrolled growth of source silos, which gives term mapping a disproportionate role in the (poorly defined) propagation of information.
It would be better to address these through an extensive federative policy.
Sincerely,
Frans van der Horst
From: Andra Waagmeester <andra@micel.io>
Sent: Tuesday, 6 January 2026 12:53
To: Discussion list for the Wikidata project <wikidata@lists.wikimedia.org>
Subject: [Wikidata] Re: RDF Linked Data responses of Wikidata URIs
[...]
Hi again,
I am not saying that provenance statements are not necessary -- I am arguing that they are not necessary by default in Linked Data responses. The current situation is like displaying Wikipedia's page editing history at the bottom of each page. What percentage of users would find that useful? Try opening DBpedia and Wikidata URIs in a Linked Data browser and tell me which one gets rendered in a more user-friendly way.
The provenance statements are still in the RDF triplestore, meaning they can still be queried with SPARQL? Or they could be accessed via a secondary resource, with a query param added to the URL or something like that.
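To make the SPARQL route concrete, here is a sketch that builds a query for the references attached to one entity's statements and turns it into a request URL for the public query service. The query shape follows the Wikibase RDF model as I understand it (statement nodes linked to reference nodes via `prov:wasDerivedFrom`); the function names are hypothetical and the query is not tuned in any way.

```python
from urllib.parse import urlencode

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def references_query(qid: str, limit: int = 100) -> str:
    """Return a SPARQL query listing statement nodes of an entity and their references."""
    return f"""\
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX prov: <http://www.w3.org/ns/prov#>

SELECT ?statement ?reference WHERE {{
  wd:{qid} ?claim ?statement .
  ?statement prov:wasDerivedFrom ?reference .
}}
LIMIT {limit}
"""


def query_url(query: str) -> str:
    """Encode the query as a GET URL asking for JSON results."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
```

`query_url(references_query("Q1748"))` yields a URL that can be fetched with any HTTP client; statements without references simply do not show up in the result.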
If the above is controversial, then I would hope removing the schema terms from the responses would not be?
Best,
Martynas
On Tue, Jan 6, 2026 at 3:57 PM frans@semantoya.nl wrote:
[...]
As for subsetting Wikidata, what sort of users have the resources to do that? Also, that would mean new entity URIs (due to a different hostname) which are not widely known (including by LLMs), so it is not a practical solution IMO.
On Tue, Jan 6, 2026 at 12:54 PM Andra Waagmeester andra@micel.io wrote:
[...]
I should think there could be various applications that would benefit from a more configurable output. For instance: returning just the labels for specified languages, or returning a graph around a list of QIDs rather than just a single QID. Further options: without the ontology, only the truthy statements, provenance or not, qualifiers or not, literal values, without Wikimedia links, redundancy. This sounds like an API.
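Part of this already exists in the MediaWiki Action API: `wbgetentities` takes several IDs at once and can restrict the output to given properties and languages. It returns JSON rather than RDF, so it is not a Linked Data answer. A sketch of building such a request URL; the function name is made up.

```python
from typing import Sequence
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"


def labels_url(qids: Sequence[str], languages: Sequence[str]) -> str:
    """Build a wbgetentities URL that asks only for labels in the given languages."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(qids),              # several entities in one request
        "props": "labels",                  # leave out claims, sitelinks, etc.
        "languages": "|".join(languages),
        "format": "json",
    }
    return API + "?" + urlencode(params)
```

For example, `labels_url(["Q1748", "Q1775415"], ["en", "da"])` asks for the English and Danish labels of both items in a single call.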
(I do not think that subsetting Wikidata would generate new entity URIs.)
(I do not get a Turtle file of 1.6 MB for Q1748!? Here: 192K Q1748.json, 1000K Q1748.jsonld, 476K Q1748.ttl)
Finn Årup Nielsen
https://people.compute.dtu.dk/faan/
________________________________________
From: Martynas Jusevičius <martynas@atomgraph.com>
Sent: 6 January 2026 16:07
To: Discussion list for the Wikidata project
Subject: [Wikidata] Re: RDF Linked Data responses of Wikidata URIs
[...]