Wikibase as a decentralized perspective for Wikidata


Baptiste de Coulon (le lieu imaginaire)

27 Nov 2018

10:59 p.m.

Hello,

At yesterday's SWIB18 pre-conference [1], Stacy Allison-Cassin and Dan Scott led a great workshop on "Wikibase: configure, customize, and collaborate".

Among other things, the panel discussion showed strong interest in a decentralized way of using Wikidata through a network of Wikibase instances.

To implement this, we identified the following needs:

* The Wikibase instance on Docker has to be updated to the current version of the software.
* A user community has to be built and has to stay in close contact with the development team.
* Well-performing import and export scripts between Wikidata and Wikibase have to be written.
* Connecting properties have to be developed so that the instances can interoperate.

Does the Wikidata community agree with this proposal? Does the development team?

Is it necessary to open a second mailing list dedicated to Wikibase?

Where is the best place to discuss all of this?

Best Regards

Baptiste

[1] http://swib.org/swib18/index.html

On behalf of le lieu imaginaire

Baptiste de Coulon, information management consultant

le lieu imaginaire, rue des Oeillets 14, 2502 Bienne, Switzerland

+41 78 636 32 17, bdc@lelieuimaginaire.ch, https://lelieuimaginaire.ch


Yuri Astrakhan

28 Nov

10:02 p.m.

I would add another very important aspect, query prefixes, to build some cohesion within the Wikibase community.

Currently, WDQS hardcodes prefixes like "wd:" and "wdt:" to be based on the "conceptUri" parameter. This means that any Wikibase installation that has its own data still uses the well-recognized wd* style prefixes, but they do not mean the same thing as they do for Wikidata, which causes confusion. This is especially important because in most cases people will want to use federated queries to join data from their own Wikibase instance with Wikidata.

My project - sophox.org (OpenStreetMap data and metadata) - has set up an additional set of prefixes that mirror the wd* ones -- osmd, osmdt, ..., but users still have to override the default wd: meaning to point back to Wikidata, otherwise they cannot meaningfully use Wikidata federation.
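
To make the override concrete, here is a minimal Python sketch that builds a federated query in which every prefix is declared explicitly, so that wd: and wdt: mean Wikidata regardless of the endpoint's defaults. The osmd/osmdt names follow the ones mentioned above; the local base URI, the property P12 and the helper name are assumptions made up for illustration, not taken from any real configuration.

```python
# Wikidata's published entity and direct-property namespaces.
WIKIDATA_PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",
    "wdt": "http://www.wikidata.org/prop/direct/",
}

# Hypothetical namespaces of a local Wikibase (illustrative base URI).
LOCAL_PREFIXES = {
    "osmd": "https://wiki.example.org/entity/",
    "osmdt": "https://wiki.example.org/prop/direct/",
}

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def build_federated_query(local_pattern: str, wikidata_pattern: str) -> str:
    """Return a SPARQL query joining local triples with Wikidata triples.

    All prefixes are spelled out in the query header, so the meaning of
    wd: does not depend on which endpoint runs the query.
    """
    prefixes = {**LOCAL_PREFIXES, **WIKIDATA_PREFIXES}
    header = "\n".join(f"PREFIX {name}: <{uri}>" for name, uri in prefixes.items())
    return (
        f"{header}\n"
        "SELECT * WHERE {\n"
        f"  {local_pattern}\n"
        f"  SERVICE <{WIKIDATA_ENDPOINT}> {{\n"
        f"    {wikidata_pattern}\n"
        "  }\n"
        "}"
    )


# P12 is a made-up local property; P31/Q5 are only placeholders here.
query = build_federated_query("?tag osmdt:P12 ?item .", "?item wdt:P31 wd:Q5 .")
print(query)
```

The point of the sketch is only that the user has to carry the Wikidata prefixes in every such query as long as the endpoint's built-in wd: points somewhere else.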


James Heald

11:45 p.m.

It should also be made possible for the local wikibase to use local prefixes other than 'P' and 'Q' for its own local properties and items, otherwise it makes things needlessly confusing -- but currently I think this is not possible.

-- James


Yuri Astrakhan

29 Nov

midnight

James, this would be possible the moment the Wikibase team accepts this as a requirement. This is not a technical issue, it's a philosophical one. I have written a patch that allows wikis to customize it very easily, but alas, there has been no progress. Feel free to chime in.

https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/455480


Daniel Kinzler

6:32 a.m.

On 28.11.18 at 10:15, James Heald wrote:

It should also be made possible for the local wikibase to use local prefixes other than 'P' and 'Q' for its own local properties and items, otherwise it makes things needlessly confusing -- but currently I think this is not possible.

I think the opposite is the case: ending up with a zoo of prefixes, with items being called A73834 and F0924095 and Q98985 and W094509, would be very confusing. The current approach is the same one that RDF and XML use: add a kind of namespace identifier in front of "foreign" identifiers. So you would have Q437643 for "local" items, xy:Q8743 for items from xy, foo:Q873287 for items from foo, etc. This is how foreign IDs are currently implemented in Wikibase.
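
As a rough illustration of that scheme (not Wikibase's actual code), the following Python sketch splits an identifier into an optional repository prefix and a local ID, then expands it to a URI. The repository table, the base URIs and the function names are all hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical registry of foreign repositories (illustrative URIs only).
FOREIGN_REPOS: Dict[str, str] = {
    "xy": "https://xy.example.org/entity/",
    "foo": "https://foo.example.org/entity/",
}
# Hypothetical base URI of the local repository.
LOCAL_BASE = "https://local.example.org/entity/"


def split_entity_ref(ref: str) -> Tuple[Optional[str], str]:
    """Split "xy:Q8743" into ("xy", "Q8743"); a bare ID belongs to the local repo."""
    repo, sep, local_id = ref.rpartition(":")
    if not sep:
        return None, ref
    if not repo or not local_id:
        raise ValueError(f"malformed entity reference: {ref!r}")
    return repo, local_id


def to_uri(ref: str) -> str:
    """Expand a possibly prefixed ID to a full URI using the tables above."""
    repo, local_id = split_entity_ref(ref)
    base = LOCAL_BASE if repo is None else FOREIGN_REPOS[repo]
    return base + local_id


print(to_uri("Q437643"))
print(to_uri("xy:Q8743"))
```

In this reading the letter after the prefix is untouched, so Q437643 and xy:Q8743 are both items and only the part before the colon says where they live.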

-- Daniel Kinzler Principal Software Engineer, Core Platform Wikimedia Foundation

Yuri Astrakhan

7:44 a.m.

Daniel, it is not so clear-cut. Most users will not be exposed to a "zoo". Case in point: OpenStreetMap. In OSM, the entire user base of tens of thousands of people knows the meaning of Q123. The "Q" prefix has a strong identity in itself. Anyone will instantly say - yes, it's a Wikidata identifier attached to the majority of important OSM objects. So whenever someone sees an object with the tag "wikidata=Q123" or "brand:wikidata=Q123" or even "species:wikidata=Q123", they know that there is a WD item describing this object, or the brand of this object (e.g. a McDonald's store), or the tree species.

As Lydia said, Wikidata is a huge tree in a forest, overshadowing all other trees. It is totally OK for both OSM and some genetics storage to use the same prefix, because there will be no confusion between the users of the two. Yet both of them are likely to reference Wikidata itself. Keeping "Q" as primarily a Wikidata identifier will help the users. That's why I call this a philosophical debate: on one hand, there is a very real usability problem. On the other, there is a philosophical dilemma about the best approach in a hypothetical world.

Now that we also have Wikibase on the OSM wiki, all of the metadata about those tags is also stored under Q numbers. So the "wikidata" key itself is Q827 [1]. Now let's say at some point we decide to store an item's "class" in the OSM wiki, e.g. "item_class=Q123". How often do you think users will confuse this Q123 between Wikidata's ID and the OSM wiki ID? This is almost certain to cause confusion, especially among novice users, without actually benefiting anyone except the philosophical "everything must be a prefix". Note that unlike MediaWiki, there are hundreds of different tools in OSM, and they do not share anything except key-value pairs. So it would not be possible to make the same "smart" interface for each of them. People will have to use Q123 as a string.

Lastly, up until this morning, the Query Service hardcoded wd:, wdt:, and other prefixes to always mean "current wiki" (conceptUri), which obviously was very confusing: wd:Q123 had a different meaning depending on where you ran it, and if you used a federated query with Wikidata itself, you had to hardcode a new prefix into your query to revert the meaning of wd: back to Wikidata's. Luckily, it wasn't too hard to fix, and I hope the patch will be merged soon [2].

[1] https://wiki.openstreetmap.org/wiki/Item:Q827
[2] https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/476398


Federico Leva (Nemo)

11:21 a.m.

Yuri Astrakhan, 29/11/18 04:14:

...

The "Q" prefix has a strong identity in itself. Anyone will instantly say - yes, it's a Wikidata identifier

But that's because most people only know one Wikibase installation, not the other way around.

Federico

Yuri Astrakhan

11:47 a.m.

On Thu, Nov 29, 2018 at 12:51 AM Federico Leva (Nemo) nemowiki@gmail.com wrote:

...

Yuri Astrakhan, 29/11/18 04:14:

...
The "Q" prefix has a strong identity in itself. Anyone will instantly say - yes, it's a Wikidata identifier

But that's because most people only know one Wikibase installation, not the other way around.

Of course! More specifically, at OSM those are the only Q-numbers people are aware of. All other ID systems do not have nearly the same level of recognition. It would be silly to wait for government agencies to switch to Q-numbers too, right? Or to wait for 5-10 years until (and IF!) Q numbers become more common at other projects that are large enough to become well known, and to use that potential future as a justification for not using a much more convenient system for the next 10 years. The cost of those 10 years of "wait and see" is significant user confusion.

Imre Samu

9:51 p.m.

...

More specifically, at OSM that's the only Q-numbers people are aware of.

I would like to share my use case (sorry if it is sometimes off-topic).

I am:
- a member of Wikimédia Magyarország Egyesület (Wikimedia Hungary)
- an OSM meetup organizer
- in my mind: 'Q' == Wikidata; 'Q' == Quality (but this is a false association)
- experienced in working with data warehousing / relational databases

For me, the Q/P prefix is like Hungarian notation (https://en.wikipedia.org/wiki/Hungarian_notation):

*"Hungarian notation aims to remedy this by providing the programmer with explicit knowledge of each variable's data type."*

But now I am not sure: what is the real meaning of the Q/P prefix, Wikidata or Wikibase?

I am involved in some open geodata projects:
#1. adding Wikidata ID concordances to Natural Earth (this is my work): https://www.naturalearthdata.com/blog/miscellaneous/natural-earth-v4-1-0-rel...
#2. adding Wikidata ID concordances to https://whosonfirst.org/ (Who's On First is a gazetteer of places)
#3. OSM

At first I tried SPARQL and the Wikidata Query Service. My experience:
- more and more data (for example Q486972, human settlement) means more timeouts in my complex geo queries (a lot of farms were imported in the Netherlands area, so I have to limit the search radius, ...)
- the data changes all the time, so it is hard to write and validate complex program code.

After a few months, I learned that for heavy data users the Wikidata Query Service is sometimes not ideal (but it is good for light queries!).

So now I load the "Wikidata JSON dump" into a Postgres/PostGIS database and write my code in SQL. My code is very complex (Jaro-Winkler distance, geo distance, detecting Cebuano imports, ranking multiple candidates for matching). Finally, I can control the performance of the system (no timeouts) and I get reproducible results.
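
For readers who have not worked with the dump, here is a small Python sketch of the first step of such a pipeline: reading entities one line at a time. It relies on the dump being a single JSON array with one entity per line; the sample lines are heavily shortened stand-ins, and the actual loading into Postgres is not shown.

```python
import io
import json
from typing import IO, Iterator


def iter_entities(lines: IO[str]) -> Iterator[dict]:
    """Yield one entity dict per line of a Wikidata-style JSON dump.

    The surrounding "[" and "]" lines are skipped and the trailing comma
    after each entity is stripped before parsing.
    """
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("[", "]", ""):
            continue
        yield json.loads(line)


# Shortened stand-in for a real dump; real entities carry labels, claims, etc.
sample = io.StringIO(
    '[\n'
    '{"id": "Q626", "type": "item"},\n'
    '{"id": "P625", "type": "property"}\n'
    ']\n'
)
rows = [(entity["id"], entity["type"]) for entity in iter_entities(sample)]
print(rows)
```

Streaming line by line keeps memory use flat, which matters because the real dump is far too large to parse as one JSON document.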

For example, here is a simple piece of my SQL code. You can see a lot of P/Q codes inside, and as you can expect, I now know a lot of Q/P codes by heart!

    select
        wd_id
       ,wd_label
       ,get_wdcqv_globecoordinate(data,'P625','P518','Q1233637') as river_mouth
       ,get_wdcqv_globecoordinate(data,'P625','P518','Q7376362') as river_source
    from wd.wdx
    where wd_id='Q626';

And now the "Natural Earth" tables look like this (relational database):

+-------------+------------+-----------+
| name        | wikidataid | iata_code |
+-------------+------------+-----------+
| Birsa Munda | Q598231    | IXR       |
| Barnaul     | Q1858312   | BAX       |
| Bareilly    | Q2788745   |           |
+-------------+------------+-----------+

this is my current workflow.

But my real nightmare will start if other databases start using the Q/P prefix. For example, if other airport-related databases start using Wikibase with Q codes:
- http://ourairports.com/
- https://www.flightradar24.com/data/airports
- https://www.airnav.com/airports/

Then every airport will have at least 4 different Q codes! And in the future, I will have to check for errors in a spreadsheet like this (and sometimes I don't see the header):

+-------------+------------+-----------+-------------+-----------+-----------+
| name        | wikidataid | iata_code | ourairports | flightR24 | AirNav    |
+-------------+------------+-----------+-------------+-----------+-----------+
| Birsa Munda | Q598231    | IXR       | Q325324     | Q973      | Q1        |
| Barnaul     | Q1858312   | BAX       | Q42         | Q1        | Q8312     |
| Bareilly    | Q2788745   |           | Q1          | Q31       | Q45       |
+-------------+------------+-----------+-------------+-----------+-----------+

Q1 - everywhere - with different meanings

And what if some users want to add the new airport IDs back to Wikidata (linking databases)? Why not? Then in the future, if I check https://www.wikidata.org/wiki/Q598231, I will see a lot of different Q codes:
- Ourairports: Q325324
- FlightR24: Q973
- AirNav: Q1

And sometimes it is very hard to communicate to new contributors that Q1 (AirNav) =/= Q1 (Wikidata).
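
The ambiguity can be shown in a few lines of Python. This is only a sketch that reuses the made-up IDs from the spreadsheet above; the source names and helper functions are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (source database, local ID)

# Invented example IDs copied from the spreadsheet above.
airport_by_key: Dict[Key, str] = {
    ("wikidata", "Q598231"): "Birsa Munda",
    ("ourairports", "Q325324"): "Birsa Munda",
    ("flightr24", "Q973"): "Birsa Munda",
    ("airnav", "Q1"): "Birsa Munda",
    ("flightr24", "Q1"): "Barnaul",
    ("ourairports", "Q1"): "Bareilly",
}


def sources_using(local_id: str) -> List[str]:
    """List every source in which the bare ID occurs."""
    return sorted(source for source, other_id in airport_by_key if other_id == local_id)


def qualified(source: str, local_id: str) -> str:
    """Write the ID with its source in front, e.g. "airnav:Q1"."""
    return f"{source}:{local_id}"


print(sources_using("Q1"))
print(qualified("airnav", "Q1"), "->", airport_by_key[("airnav", "Q1")])
```

A bare "Q1" matches three different airports here, while the pair of source and ID stays unique, which is the whole argument for carrying the source along with the number.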

If I see any database or spreadsheet and I see a Q code, my current expectation is that this is a Wikidata code. :) Just check: https://github.com/search?q=Q28+hungary&type=Code

So my current opinion: please don't use Q/P prefixes in any new or other databases!

For me, unlearning a lot of Q/P values is hard, so the more experience I gain with the Wikidata data model, the less I want to use any other Wikibase system with similar Q/P prefixes.

My other pain point is the "Wikidata JSON dump". A little more information would be a big help for me.

For detecting the data quality of items:
- last modification DateTime
- last modification user type (anonym_user, new_user, experienced_user, bot)
- edit counts by user type, for example: { anonym_user=2, new_user=0, experienced_user=0, bot=15 }

Information about the Wikidata life cycle:
- Wikidata redirections / deletions (now: only in the .ttl files)

I know I am not a typical user, and my problems are not a priority yet.

IMHO:

Integrating Wikidata IDs into other databases has already started (OSM, Natural Earth, Who's On First, ...), and some guidelines or support are needed for these cases before it is too late. Probably the current practice (OSM, Natural Earth, Who's On First, ...) is not optimal. A few months ago I learned an extremely painful lesson: https://phabricator.wikimedia.org/T202676#4533486

quote>>>

*- "Q" does not mean "wikidata.org". It means "item" and is used by all Wikibase installations so far.*

*- Retroactively "reserving" the letter "Q" to be exclusively used by wikidata.org can't work. It was never meant to be like this, and there is no mechanism for this.*

*- "Q" only means "wikidata.org" to users who know about wikidata.org. These users should not have a problem understanding that the moment an OSM Wikibase installation exists, "osm:Q1" refers to this installation.*

<<<<quote

So now I am totally confused.

Probably my current practice is a "bad practice"? :( And should the "Natural Earth" Wikidata integration add a "wd:" prefix everywhere? Maybe it is too late to change.

+-------------+---------------+-----------+
| name        | wikidataid    | iata_code |
+-------------+---------------+-----------+
| Birsa Munda | wd:Q598231    | IXR       |
| Barnaul     | wd:Q1858312   | BAX       |
| Bareilly    | wd:Q2788745   |           |
+-------------+---------------+-----------+

this is my retrospective, thank you for reading.

best, Imre


Daniel Kinzler

11:11 p.m.

On 29.11.18 at 08:21, Imre Samu wrote:

What is the real meaning of Q/P prefix -> Wikidata or Wikibase?

The intention was:

P and Q indicate the *type* of the entity: "P" = Property, "Q" = Item (for arcane reasons), "L" = Lexeme, "F" = Form, "S" = Sense, "M" = MediaInfo. As you can tell, we'd quickly run out of letters and cause confusion if this became configurable.

Using prefixes to indicate where the entity comes from is indeed useful and is already part of the model. The prefix for Wikidata is "wd:", so "wd:Q12345" is an item from Wikidata. The prefix can be omitted for local entities, so Q12345 is an item on the local repo (or the default repo of a wikibase client).
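
Read that way, an identifier has two independent parts: where it comes from and what kind of entity it is. The Python sketch below decodes both. It is an illustration only: the regular expression assumes the simple "letter plus digits" shape used in the examples here, and real Wikibase IDs for some entity types may look different.

```python
import re
from typing import NamedTuple, Optional

# Type letters as listed in the message above.
ENTITY_TYPES = {
    "P": "Property",
    "Q": "Item",
    "L": "Lexeme",
    "F": "Form",
    "S": "Sense",
    "M": "MediaInfo",
}

# Optional "repo:" prefix, one type letter, then digits (a simplification).
ID_PATTERN = re.compile(
    r"^(?:(?P<repo>[A-Za-z][A-Za-z0-9_-]*):)?(?P<letter>[A-Z])(?P<number>[0-9]+)$"
)


class EntityId(NamedTuple):
    repo: Optional[str]  # None means the local repository
    entity_type: str
    number: int


def parse_entity_id(text: str) -> EntityId:
    """Decode e.g. "wd:Q12345" into repo "wd", type "Item", number 12345."""
    match = ID_PATTERN.match(text)
    if match is None or match["letter"] not in ENTITY_TYPES:
        raise ValueError(f"not a recognized entity ID: {text!r}")
    return EntityId(match["repo"], ENTITY_TYPES[match["letter"]], int(match["number"]))


print(parse_entity_id("wd:Q12345"))
print(parse_entity_id("Q12345"))
```

Both calls report an Item with number 12345; only the repo field differs, which mirrors the distinction between the letter (type) and the prefix (origin).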

-- Daniel Kinzler Principal Software Engineer, Core Platform Wikimedia Foundation

Yuri Astrakhan

30 Nov

12:10 a.m.

Daniel,

...

P and Q indicate the *type* of the entity: "P" = Property, "Q" = Item (for arcane reasons), "L" = Lexeme, "F" = Form, "S" = Sense, "M" = MediaInfo. As you can tell, we'd quickly run out of letters and cause confusion if this became configurable.

I don't think this would cause confusion, because the lexicographical project is really a separate project that just happens to reside on the same Wikidata domain. Essentially you did internally what we are asking for other sites: you mixed two projects, and kept them distinct by using different prefixes. If at some point you decide to add some new area of data, e.g. biological, you could add new prefixes for that too, but that would also be a "separate" project.

Most other sites that link to Wikidata only care about one of those projects. E.g. OSM would have very little interest in lexical data, so it would be OK if the "L" prefix were used in both OSM and WD, because that would not confuse users as much as reusing the Q.

...

The prefix can be omitted for local entities, so Q12345 is an item on the local repo (or the default repo of a wikibase client).

I think that was a big mistake -- the "(or the default repo of a wikibase client)" -- because wd implies Wikidata, not Wikibase, so it dilutes the meaning of "wd:". See my other email on how I fixed it.

Daniel Kinzler

3:06 a.m.

On 29.11.18 at 10:40, Yuri Astrakhan wrote:

If at some point you decide to add some new area of data, e.g. biological, you could add new prefixes for that too, but that would also be a "separate" project.

The Q, P, L, M, etc. are used to identify the *type* of entity. They are not for keeping projects separate. That was never their purpose. Wikibase uses prefixes for that, but they are placed *before* the letter that indicates the type.

...

The prefix can be omitted for local entities, so Q12345 is an item on the local repo (or the default repo of a wikibase client).

I think that was a big mistake -- the "(or the default repo of a wikibase client)" -- because wd implies Wikidata, not Wikibase, so it dilutes the meaning of "wd:". See my other email on how I fixed it.

I'm confused. Yes, "wd:" should ALWAYS imply Wikidata. Your Wikibase instance would have its own prefix (that can be omitted for local use), e.g. "osm:".

For the record, I'm just voicing my opinion here, and telling you what the original intention was. I'm no longer working on Wikidata or Wikibase, and I can't make any decisions on any of this.

-- Daniel Kinzler Principal Software Engineer, Core Platform Wikimedia Foundation

Stas Malyshev

3:31 a.m.

Hi!

...

I don't think this would cause a confusion, because the lexicographical project is really a separate project that just happens to reside on the same Wikidata domain. Essentially you did internally what we are asking

No, the difference here is that L items are not the same as Q items - e.g. L items do not have sitelinks, and do have lemmas and senses. Data structure is different. If you use different data structure than Q items - i.e., no labels, descriptions, sitelinks, etc. - then you should use a different letter. But if it's the same structure, but for different domain - then it should be Q.

...

Most other sites that link to Wikidata only care about just one of those projects. E.g. OSM would have very little interest in lexical data, so it is OK if "L" prefix would be used in OSM and in WD because it won't be as confusing to the users as reusing the Q.

No, that would be confusing. If OSM wants its own data type because the Q item does not fit (e.g. OSM doesn't want descriptions and sitelinks), then it should use a separate letter, like MediaInfo uses M. But using L would not be smart, since this data would then not integrate well with lexicographical data.

-- Stas Malyshev smalyshev@wikimedia.org

Olaf Simons

29 Nov

1:23 p.m.

What is more problematic than the p/q business:

If I run a SPARQL search at our endpoint - such as this one:

https://database.factgrid.de/query/#SELECT%20%3FIlluminatenorden%20%3FIllumi...

I will receive answers in the form of

wd:q25

but they do not link to wd (Wikidata); they link into our database: https://database.factgrid.de/entity/Q25.

The same problem exists in the other direction: if our users have never seen a SPARQL search in their lives (and that's 100% of them) and they now click on the sample queries, they will get Wikidata sample queries, which do not work on our database, just as our P and Q numbers do not match.
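
The first problem can be reproduced with a toy URI shortener in Python. This is a sketch of the general mechanism, not the query service's real code: the entity base URI is taken from the link above, while the "fg" prefix and the function name are hypothetical.

```python
from typing import Dict

FACTGRID_ENTITY = "https://database.factgrid.de/entity/"
WIKIDATA_ENTITY = "http://www.wikidata.org/entity/"


def abbreviate(uri: str, prefixes: Dict[str, str]) -> str:
    """Shorten a URI with the longest matching namespace, or return it unchanged."""
    matches = [(name, base) for name, base in prefixes.items() if uri.startswith(base)]
    if not matches:
        return uri
    name, base = max(matches, key=lambda pair: len(pair[1]))
    return f"{name}:{uri[len(base):]}"


uri = FACTGRID_ENTITY + "Q25"

# Prefix table where wd: follows the local concept URI, as described above.
hardcoded = {"wd": FACTGRID_ENTITY}
# Alternative table with a separate, hypothetical local prefix "fg".
separated = {"wd": WIKIDATA_ENTITY, "fg": FACTGRID_ENTITY}

print(abbreviate(uri, hardcoded))   # wd:Q25
print(abbreviate(uri, separated))   # fg:Q25
```

With the first table a local entity is displayed under wd:, which is exactly what makes the result look like a Wikidata item; with a separate local prefix the same URI is labeled unambiguously.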

Olaf


Dr. Olaf Simons, Forschungszentrum Gotha der Universität Erfurt, Schloss Friedenstein, Pagenhaus, 99867 Gotha

Office: +49-361-737-1722, Mobile: +49-179-5196880

Private: Hauptmarkt 17b, 99867 Gotha

Andra Waagmeester

2:14 p.m.

I fully agree. I would rather see the scarce development resources focused on fixing this than on the p/q business, as you nicely call it. Tbh, I really don't see an issue with multiple P's and Q's across different Wikibases. That is what prefixes are for: to distinguish between different resources. Examples of identical identifier (literal) schemes across multiple resources are abundant (e.g. PubMed and NCBI Gene). It really is a matter of getting used to it, or am I missing something?

On Thu, Nov 29, 2018 at 8:54 AM Olaf Simons olaf.simons@pierre-marteau.com wrote:

...

What is more problematic than the p/q business:

If I run a SPARQL search at our endpoint - such as this one:

https://database.factgrid.de/query/#SELECT%20%3FIlluminatenorden%20%3FIllumi...

I will receive answers in the form of

wd:q25

but they do not lenk to wd, wikidata, but into our database https://database.factgrid.de/entity/Q25.

The same problem in the other direction: If our users have never seen a SPARQL search in their lives (and that's 100%) and if they now click at sample queries - they will qet Wikidata sample queries which do not work on our database - just as our P and Q numbers do not match.

Olaf

...
Daniel Kinzler dkinzler@wikimedia.org hat am 29. November 2018 um

02:02 geschrieben:

...
Am 28.11.18 um 10:15 schrieb James Heald:

...
It should also be made possible for the local wikibase to use local

prefixes

...
...
other than 'P' and 'Q' for its own local properties and items,

otherwise it

...
...
makes things needlessly confusing -- but currently I think this is not

possible.

...
I think the opposite is the case: ending up with a zoo of prefixes, with

items

...
being called A73834 and F0924095 and Q98985 and W094509, would be very confusing. The current approach is to to use the same approach that RDF

and XML

...
use: add a kind of namespace identifier in front of "foreign"

identifiers. So

...
you would have Q437643 for "local" items, xy:Q8743 for items from xy, foo:Q873287 for items from foo, etc. This is how foreign IDs are

currently

...
implemented in Wikibase.

-- Daniel Kinzler Principal Software Engineer, Core Platform Wikimedia Foundation

Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata

Dr. Olaf Simons Forschungszentrum Gotha der Universität Erfurt Schloss Friedenstein, Pagenhaus 99867 Gotha

Büro: +49-361-737-1722 Mobil: +49-179-5196880

Privat: Hauptmarkt 17b/ 99867 Gotha


Lydia Pintscher

2:30 p.m.

On Thu, Nov 29, 2018 at 9:46 AM Andra Waagmeester andra@micel.io wrote:

...

I fully agree. I would rather see the scarce development resources focused on fixing this than on the p/q business, as you nicely call it. Tbh, I really don't see an issue with multiple P's and Q's across different Wikibases. That is what prefixes are for: to distinguish between different resources. Examples of identical identifier (literal) schemes shared between multiple resources are abundant (e.g. PubMed and NCBI Gene). It really is a matter of getting used to it, or am I missing something?

Are we talking about https://phabricator.wikimedia.org/T194180? I'm happy to push that into one of the next sprints if so.

Cheers Lydia

-- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.

Yuri Astrakhan

11:23 p.m.

Olaf, Andra, Lydia,

On Thu, Nov 29, 2018 at 4:01 AM Lydia Pintscher < Lydia.Pintscher@wikimedia.de> wrote:

...

Are we talking about https://phabricator.wikimedia.org/T194180? I'm happy to push that into one of the next sprints if so.

I think the patch I submitted yesterday fixes this issue on the server side, without touching the frontend. All you need to do is set the prefixes.conf file to point "wd:" to the original Wikidata prefixes, set conceptUri to your own schema, and add your own prefixes. Here's an example of an OSM prefixes configuration. Instead of "wd:" I used "osmd". Similarly, I replaced the "w" in all other prefixes with "osm", and added "osm" where there was no "w":

OSM prefixes.conf: https://github.com/Sophox/wikidata-query-rdf/blob/master/dist/src/script/pre...
Patch: https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/476398

Daniel Kinzler

11:33 p.m.

On 29.11.18 at 01:00, Lydia Pintscher wrote:

...

On Thu, Nov 29, 2018 at 9:46 AM Andra Waagmeester andra@micel.io wrote:

...
I fully agree. I would rather see the scarce development resources focused on fixing this than on the p/q business, as you nicely call it. Tbh, I really don't see an issue with multiple P's and Q's across different Wikibases. That is what prefixes are for: to distinguish between different resources. Examples of identical identifier (literal) schemes shared between multiple resources are abundant (e.g. PubMed and NCBI Gene). It really is a matter of getting used to it, or am I missing something?

Are we talking about https://phabricator.wikimedia.org/T194180? I'm happy to push that into one of the next sprints if so.

This doesn't fix the hard-coded prefix in the RDF output generated by Wikibase.

-- Daniel Kinzler Principal Software Engineer, Core Platform Wikimedia Foundation

Yuri Astrakhan

11:54 p.m.

On Thu, Nov 29, 2018 at 1:03 PM Daniel Kinzler dkinzler@wikimedia.org wrote:

...

This doesn't fix the hard-coded prefix in the RDF output generated by Wikibase.

See my previous email: my patch fixes that too. Here's an example query in Sophox: http://tinyurl.com/yav76uof -- it calls out to Wikidata to get a list of large cities (using the wd: and wdt: prefixes), then matches them with OSM objects (using data from the custom OSM importer), and also adds the metadata item stored in the OSM Wiki (osmd prefix). All result links are clickable.

And yes, I had to add OSM prefixes to the GUI too so that it wouldn't show them as long URIs.
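For readers who want to see the general shape of such a federated query, here is a minimal Python sketch that assembles one as a string. It is not the Sophox query linked above. The `exd:` and `exdt:` prefixes, the `wikibase.example.org` URIs, and the local property `P1` are assumptions made up for illustration; the `wd:` and `wdt:` URIs, the Wikidata endpoint, and the IDs P31 (instance of), Q515 (city) and P1082 (population) are the real Wikidata ones.

```python
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

PREFIXES = {
    # Wikidata keeps its well-known prefixes.
    "wd": "http://www.wikidata.org/entity/",
    "wdt": "http://www.wikidata.org/prop/direct/",
    # Hypothetical prefixes for a local Wikibase (assumed for this sketch).
    "exd": "https://wikibase.example.org/entity/",
    "exdt": "https://wikibase.example.org/prop/direct/",
}


def prefix_block(prefixes: dict) -> str:
    """Render one SPARQL PREFIX declaration per entry."""
    return "\n".join(f"PREFIX {name}: <{uri}>" for name, uri in prefixes.items())


def build_query(min_population: int) -> str:
    """Fetch large cities from Wikidata, then join them to local items.

    exdt:P1 stands for a hypothetical local property whose value is the
    matching Wikidata item.
    """
    body = f"""SELECT ?city ?localItem WHERE {{
  SERVICE <{WIKIDATA_ENDPOINT}> {{
    ?city wdt:P31 wd:Q515 ;
          wdt:P1082 ?population .
    FILTER(?population > {min_population})
  }}
  ?localItem exdt:P1 ?city .
}}"""
    return prefix_block(PREFIXES) + "\n" + body


if __name__ == "__main__":
    print(build_query(1_000_000))
```

Because the local prefixes differ from wd: and wdt:, a reader of the query can tell at a glance which triples are resolved on Wikidata and which on the local instance.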

Daniel Kinzler

9:17 p.m.

On 28.11.18 at 23:53, Olaf Simons wrote:

...

I will receive answers in the form of

wd:q25

but they do not link to wd (Wikidata); they link into our database, https://database.factgrid.de/entity/Q25.

Right, that prefix should not be "wd" for your own query service. I'm afraid that's currently hard coded in the RdfVocabulary class. That should indeed be fixed.
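The confusion can be shown with a small Python sketch of prefix expansion. The two base URIs come from this thread and from Wikidata itself; the `fg` prefix and the helper name `expand` are assumptions used only for illustration and are not part of Wikibase or WDQS.

```python
from typing import Mapping


def expand(prefixed_name: str, prefixes: Mapping[str, str]) -> str:
    """Turn a prefixed name such as 'wd:Q25' into a full URI."""
    prefix, _, local = prefixed_name.partition(":")
    return prefixes[prefix] + local


# On Wikidata's own query service, wd: points at Wikidata.
WIKIDATA = {"wd": "http://www.wikidata.org/entity/"}

# On a local instance where wd: follows the local concept URI,
# the same prefixed name silently means something else.
LOCAL_REUSING_WD = {"wd": "https://database.factgrid.de/entity/"}

# With a distinct (hypothetical) local prefix, both can coexist.
LOCAL_DISTINCT = {
    "wd": "http://www.wikidata.org/entity/",
    "fg": "https://database.factgrid.de/entity/",
}

if __name__ == "__main__":
    print(expand("wd:Q25", WIKIDATA))
    print(expand("wd:Q25", LOCAL_REUSING_WD))
    print(expand("fg:Q25", LOCAL_DISTINCT))
```

The first two calls take the same input and return different URIs, which is exactly the ambiguity described above; the third mapping removes it.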

-- Daniel Kinzler Principal Software Engineer, Core Platform Wikimedia Foundation

Erik Paulson

2 Dec 2 Dec

6:58 a.m.

How do these external identifiers work, and how do I get something into one of these namespaces? (I apologize if I have missed them in the documentation)

If I stand up my own Wikibase with the Docker containers and create an item for the Mayor of Madison, WI, let's say that creates Q2 in my local Wikibase, and it will be accessible via http://localhost:8181/wiki/Item:Q2

Is there some way I could create an item in my local Wikibase with a URL of http://localhost:8181/wiki/Item:wd:Q16107138 that represents Paul Soglin back in Wikidata, but is stored in my local Wikibase so that I can reference it from other properties there? If I create my own entry Q3 for Madison, it would be nice to point 'head of government' to wd:Q16107138 and use it in my local Wikibase and MediaWiki instance, so that if I have a page for Madison as well as a Wikibase entry in the local install, I can dereference it in the wiki markup.

Or are those namespace identifiers like wd: (and xy: or foo: or whatever namespace) only in the WDQS for making calls out to SERVICE bits in SPARQL, plus whatever the WDQS exporter generates for local RDF?

(Also my apologies if wikibase doesn't work like this at all and I've so badly interpreted what Daniel is saying that I'm about to throw the whole conversation in a dead-end direction)

Thanks!

On Wed, Nov 28, 2018 at 7:03 PM Daniel Kinzler dkinzler@wikimedia.org wrote:

...

On 28.11.18 at 10:15, James Heald wrote:

...
It should also be made possible for the local wikibase to use local prefixes other than 'P' and 'Q' for its own local properties and items, otherwise it makes things needlessly confusing -- but currently I think this is not possible.

I think the opposite is the case: ending up with a zoo of prefixes, with items being called A73834 and F0924095 and Q98985 and W094509, would be very confusing. The current approach is to use the same approach that RDF and XML use: add a kind of namespace identifier in front of "foreign" identifiers. So you would have Q437643 for "local" items, xy:Q8743 for items from xy, foo:Q873287 for items from foo, etc. This is how foreign IDs are currently implemented in Wikibase.

-- Daniel Kinzler Principal Software Engineer, Core Platform Wikimedia Foundation


Daniel Kinzler

6 Dec 6 Dec

2:19 p.m.

On 02.12.18 at 02:28, Erik Paulson wrote:

...

How do these external identifiers work, and how do I get something into one of these namespaces? (I apologize if I have missed them in the documentation)

Hi Erik!

You got the right idea. Sadly, this feature is not implemented yet. I don't know if there is any public documentation for this by now, but here is a very rough list of the stepping stones towards allowing what you want:

1) Enable Items and Properties that exist on Wikidata to be referenced from other Wikibase instances (repo or client) that can access Wikidata's internal database directly, and do not themselves define Items or Properties (but may define other kinds of entities). This is implemented, but not deployed yet. It is scheduled to be deployed soon on Wikimedia Commons, as part of the "Structured Data on Commons" project (aka Wikibase MediaInfo).

2) Enable Items and Properties that exist on Wikidata to be referenced from other Wikibase instances (repo or client) that call Wikidata's web API, and do not themselves define Items or Properties (but may define other kinds of entities). This is relatively simple, but details about the caching mechanisms need to be ironed out. Ask Adam and Lydia about the timeline for this.

3) Enable Items and Properties that exist on Wikidata to be referenced from other Wikibase instances (repo or client) that call Wikidata's web API, and *do* themselves also define Items or Properties which are *distinct* from the ones that Wikidata defines. The spec for this is clear, but some old code needs to be updated to enable this, and some details about the user interface need to be worked out. Ask Adam and Lydia about the timeline for this.

4) Enable Items and Properties that exist on Wikidata to be referenced from other Wikibase instances (repo or client) that call Wikidata's web API, and may "augment" or "override" the descriptions of Items and Properties defined on Wikidata. There seems to be a lot of demand for this, but the details of the semantics are unclear, especially with respect to SPARQL queries. More discussion is needed.
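The four steps differ along a few axes: how Wikidata is accessed, whether the instance defines its own Items and Properties, and whether it may override Wikidata's descriptions. A minimal Python sketch of that comparison follows. The class and field names are assumptions made for this summary and do not correspond to any Wikibase setting; `None` marks a detail the list above does not state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Access(Enum):
    DIRECT_DB = "direct access to Wikidata's internal database"
    WEB_API = "Wikidata's web API"


@dataclass(frozen=True)
class FederationStep:
    step: int
    access: Access
    defines_own_items_and_properties: Optional[bool]  # None: not stated
    may_override_wikidata: bool
    status: str


STEPS = [
    FederationStep(1, Access.DIRECT_DB, False, False,
                   "implemented, not deployed yet"),
    FederationStep(2, Access.WEB_API, False, False,
                   "relatively simple, caching details open"),
    FederationStep(3, Access.WEB_API, True, False,
                   "spec clear, old code and UI details pending"),
    FederationStep(4, Access.WEB_API, None, True,
                   "semantics unclear, more discussion needed"),
]


def steps_via(access: Access) -> list:
    """Return the step numbers that rely on the given access path."""
    return [s.step for s in STEPS if s.access is access]


if __name__ == "__main__":
    print("Web API steps:", steps_via(Access.WEB_API))
    print("Direct DB steps:", steps_via(Access.DIRECT_DB))
```

Laid out this way, the setup described in the question (a standalone Wikibase with its own Q2 and Q3 that also references Wikidata items) corresponds to step 3.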

-- Daniel Kinzler Principal Software Engineer, Core Platform Wikimedia Foundation

Daniel Kinzler

2:21 p.m.

On 06.12.18 at 09:49, Daniel Kinzler wrote:

...

On 02.12.18 at 02:28, Erik Paulson wrote:

...
How do these external identifiers work, and how do I get something into one of these namespaces? (I apologize if I have missed them in the documentation)

Hi Erik!

Oh, I forgot an important disclaimer: I used to be on the Wikidata team and I was involved in discussing and specifying the different levels of federation for Wikibase repos. I am no longer part of the Wikidata team though, and may not be up to date with the latest progress. I cannot in any way speak for the Wikidata team or make any promises.

-- Daniel Kinzler Principal Software Engineer, Core Platform Wikimedia Foundation

James Heald

28 Dec 28 Dec

10:53 p.m.

Coming back to the question of P's and Q's (sorry, it's been a busy few weeks)

I read people saying "Don't worry because prefixes", but with respect I don't agree.

IMO "Don't worry because prefixes" may make sense as a response if one interacts with Wikidata primarily via RDF dumps, or SPARQL, or perhaps writing system code -- environments where those prefixes may be generally present and used.

But for anyone actually working first-hand with the data, whose work involves any substantial checking and/or manual editing of data through the wikibase user interface, I think it fails to ring true. Extensive hand-editing in this way tends to be an unavoidable aspect when curating a dataset in wikibase -- eg investigating anomalies revealed by query reports, perhaps after a large data upload or data matching procedure, and then identifying and making appropriate edits to resolve them.

For people making a lot of hand-edits like that, a process which as I have said I think is inevitable when actively curating datasets, certain property identifiers become so often encountered and so often used and repeated that they become so deeply ingrained and internalised as to become essentially second nature -- eg P18 for image, P373 for commonscat, P131 for located in administrative territorial entity, etc etc, the precise properties depending on the kind of data and items one is working with. Similarly also for a lot of certain item identifiers, eg Q5 human etc.

If one's doing a lot of editing and looking-up through the interface, these identifications become very very familiar - as internalised and unconscious and automatic as breathing.

So I do think that reusing the same identifiers for quite different meanings in a different wikibase (but with essentially exactly the same editing interface) is to create a cognitive dissonance which (IMO) is significant, unnecessary, unfortunate, and (I believe) ought to be avoidable.

A second issue is Daniel's scenarios 2 to 4, where external repos want to be using and referencing some or all of Wikidata's items and properties, with the same identifiers as Wikidata, plus some additional further properties and items of their own defined locally.

That's not straightforward, if they all have to be placed in the same shared numerical sequences following the same restricted set of initial letters.

I do take the point that it is useful to be able to use the initial letter to distinguish different kinds of Wikibase object -- ie Properties (P), Items (Q), Lexemes (L), MediaInfo items (M)

One solution might be to allow Wikibase instances to use additional characters in the identifier for the local properties, items etc specific to that Wikibase -- so that the Wikibase could have property identifiers like Px50 or Pz50 or Pm50 to distinguish them from Wikidata's P50, or identifiers like Qx5000 or Qz5000 or Qosm5000 to distinguish them from Wikidata's Q5000.

This would straightforwardly allow Wikidata and local items and properties to exist side by side, and avoid confusion and dissonance with internalised learnt identifier codes from the items and properties on Wikidata itself.
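As a rough illustration of this proposal, here is a Python sketch that parses such identifiers into an entity-type letter, an optional local tag, and a number. The scheme is only the suggestion made above, not an existing Wikibase feature; the pattern, the restriction of the tag to lowercase letters, and the names `ParsedId` and `parse_id` are assumptions for this example.

```python
import re
from typing import NamedTuple

# Type letter (P, Q, L or M), optional lowercase local tag, positive number.
ID_PATTERN = re.compile(r"([PQLM])([a-z]*)([1-9][0-9]*)")


class ParsedId(NamedTuple):
    kind: str       # "P", "Q", "L" or "M"
    local_tag: str  # "" for a plain Wikidata-style identifier
    number: int


def parse_id(identifier: str) -> ParsedId:
    """Parse 'P50', 'Px50' or 'Qosm5000' into its three components."""
    match = ID_PATTERN.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a recognised identifier: {identifier!r}")
    kind, tag, number = match.groups()
    return ParsedId(kind, tag, int(number))


def is_local(identifier: str) -> bool:
    """True if the identifier carries a local tag, as in 'Px50'."""
    return parse_id(identifier).local_tag != ""


if __name__ == "__main__":
    for text in ("P50", "Px50", "Q5000", "Qosm5000"):
        print(text, parse_id(text), "local" if is_local(text) else "Wikidata")
```

The first letter still identifies the kind of entity, so the distinction between Properties, Items, Lexemes and MediaInfo items is preserved.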

Best regards,

James.


Lydia Pintscher

29 Nov 29 Nov

3:06 a.m.

Hi Baptiste,

On Wed, Nov 28, 2018 at 2:25 PM Baptiste de Coulon (le lieu imaginaire) bdc@lelieuimaginaire.ch wrote:

...

Hello,

In the pre-conference of SWIB18 [1], Stacy Allison-Cassin and Dan Scott led a great workshop yesterday on "Wikibase: configure, customize, and collaborate".

Among other things, the panel discussion showed strong interest in a decentralized way of using Wikidata through a network of Wikibase instances.

To implement it, we have identified the following needs:

* The Wikibase instance on Docker has to be updated to the current version of the software.
* A user community has to be built and remain in close interaction with the development team.
* Efficient import and export scripts between Wikidata and Wikibase have to be developed.
* Connecting properties have to be developed so that the instances can interoperate.

Does the Wikidata community agree with this proposal? The development team also?

The dev team is committed to a strategy that is about building an ecosystem around Wikidata. This means making Wikibase more usable and useful outside Wikimedia. We have put words and work into this and we will continue to do so. It matters to me that Wikidata is not a single oasis in a big desert but a big tree in a flourishing jungle. We want much more data to be open, machine-readable and accessible, but it doesn't all have to be, and shouldn't all have to be, on Wikidata.

...

Is it necessary to open a second mailing list dedicated to Wikibase?

There is one already for the Wikibase user group at https://lists.wikimedia.org/mailman/listinfo/wikibaseug that we are using.

...

Where is the best place to discuss all these things?

On that mailing list is fine :)

Cheers Lydia
