[Wikidata-l] Commons Wikibase


Derric Atzrott

16 Aug 2014

2:15 a.m.

Hey,

So I heard on another mailing list that Commons is getting its own installation of Wikibase along with using Wikidata? Is this true, and if so, where might I find more information about it?

Thank you, Derric Atzrott Computer Specialist Alizee Pathology


Lydia Pintscher

16 Aug

2:22 a.m.

Hi Derric,


Yes, that's correct. We're in the planning phase, and implementation will take us a while. https://commons.wikimedia.org/wiki/Commons:Wikidata_for_media_info and https://www.mediawiki.org/wiki/Multimedia/Structured_Data have some more info. We'll be publishing more over the next few weeks. I also intend to hold office hours together with the multimedia team at the Foundation.

Cheers Lydia

-- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.

James Heald

17 Aug

3:56 a.m.

Out of interest, are there any thoughts as to how templates are likely to access Commons Wikibase?

I could imagine Commons Wikibase sharing the same Properties (Pxxx) but having its own items -- perhaps starting with C (Cnnnn).

Such a structure would make it easy to prototype templates using items on Wikidata while Commons Wikibase was being developed, just changing the relevant Q-numbers to C-numbers to port from accessing an item defined on Wikidata to one defined on Commons Wikibase.

Properties on C-numbers could point to Q-numbers; but (with very rare exceptions) properties on Q-numbers would not point to C-numbers.

Is that likely to be how things will be set up?

-- James.
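
To make the proposed ID scheme concrete, here is a rough Python sketch. It is an illustration only: the `C` prefix is just the suggestion made above, and `parse_entity_id`, `may_point_to` and `port_to_commons` are hypothetical helper names invented for this example, not part of any Wikibase API.

```python
import re

# Hypothetical ID pattern: one prefix letter (P, Q, or the proposed C) plus digits.
ID_PATTERN = re.compile(r"^([PQC])([1-9][0-9]*)$")


def parse_entity_id(entity_id: str) -> tuple[str, int]:
    """Split an ID such as 'Q42' into its prefix and its number."""
    match = ID_PATTERN.match(entity_id)
    if match is None:
        raise ValueError(f"not a recognised entity ID: {entity_id!r}")
    return match.group(1), int(match.group(2))


def may_point_to(subject_id: str, value_id: str) -> bool:
    """Proposed rule: C-items may reference Q-items, but not the reverse."""
    subject_prefix, _ = parse_entity_id(subject_id)
    value_prefix, _ = parse_entity_id(value_id)
    if subject_prefix == "Q" and value_prefix == "C":
        return False  # the "very rare exceptions" are ignored in this sketch
    return True


def port_to_commons(template_ids: list[str], mapping: dict[str, str]) -> list[str]:
    """Swap prototype Q-numbers for C-numbers; shared properties stay untouched."""
    return [mapping.get(entity_id, entity_id) for entity_id in template_ids]
```

Under these assumptions, porting a prototype template would amount to a lookup from Q-numbers to C-numbers, while the Pxxx properties it uses stay the same.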

Scott MacLeod

6:12 a.m.

James and Wikidatans,

Good question, and as a follow-on question: how are templates (in MediaWiki, I presume) in conjunction with qLabel (https://googleknowledge.github.io/qlabel/ and http://google-opensource.blogspot.com/2014/04/qlabel-multilingual-content-wi...) likely to access Commons Wikibase?

Scott


Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l

-- - Scott MacLeod - http://worlduniversityandschool.org

Lydia Pintscher

18 Aug

9:30 p.m.

Hey James :)


I'm not sure we're talking about the exact same thing so let me write down how I envision it:

* For a file on Commons there will be a second page on Commons that holds the structured data about that file. So if the file is HamsterBerta.jpg then we have something like Info:HamsterBerta.jpg. (Info isn't decided yet!) This is what we currently call MediaInfo and is comparable to an item on Wikidata.
* On Info:HamsterBerta.jpg we have statements like "topic:hamster" and "license:CC-BY-SA". topic, hamster, license and CC-BY-SA would be linked to Wikidata properties and items respectively.
* This data can then be accessed on File:HamsterBerta to make it look roughly like today if people want that. One option would be to take the Information template and rewrite it so that it takes data from the MediaInfo page where available.

Does that make it a bit more clear? I should make a graphic from this...

Cheers Lydia
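
The three points above can be sketched as a small data model. This is only an illustration in Python under stated assumptions: the `Info:` prefix is the undecided placeholder from the message, and `Statement`, `MediaInfo` and `render_information` are hypothetical names for this example rather than anything from the planned implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    property_label: str  # e.g. "topic"; would be linked to a Wikidata property
    value_label: str     # e.g. "hamster"; would be linked to a Wikidata item


@dataclass
class MediaInfo:
    file_name: str
    statements: list[Statement] = field(default_factory=list)

    @property
    def page_title(self) -> str:
        # "Info:" is a placeholder namespace; the name is explicitly not decided.
        return f"Info:{self.file_name}"

    def values(self, property_label: str) -> list[str]:
        """Return all values stated for one property, in statement order."""
        return [s.value_label for s in self.statements
                if s.property_label == property_label]


def render_information(info: MediaInfo, fallback: dict[str, str]) -> dict[str, str]:
    """Use structured data where available, otherwise the hand-written fields."""
    rendered = dict(fallback)
    for label in {s.property_label for s in info.statements}:
        rendered[label] = ", ".join(info.values(label))
    return rendered
```

Here `render_information` stands in for a rewritten Information template: fields present on the MediaInfo page override the fallback values, and everything else is left as it is today.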

James Heald

10:22 p.m.

Thanks Lydia!

Something that occurs to me is that one may well want to include Commons categories in such a database, not just files, which presumably might be stored on a page like

Info:Category:Insert random Commons category intersection here

so that one could then ask whether a file belongs to such a category or not, and the data would all be in the database.

Such categories (or sets) may well not be Wikidata notable, for example:

Category:Pictures I took on my cellphone one midsummer morning

so we cannot assume they have Q-numbers.

But it would be nice if we could describe such properties using the existing Wikidata syntax, ie via a property Pxyz = "belongs to set", and then an item number for the set it belonged to.

Since the items wouldn't be on Wikidata, it would be useful if they had a different namespace, eg C nnnnnn

Of course some of the categories would be on Wikidata, so for such categories one would want to create a tie between the item on Commons Wikibase and the item on Wikidata,

C nnnnn <--> Q mmmmm

Sorry if I'm being premature and getting ahead of things, but this is the sort of thing I had in the back of my mind.

On the other hand I can quite see if, to start with, you want only to have files as items on Commons Wikibase (CWB?). But even then, it's quite nice to have a Wikidata-style identifier syntax for talking about them, e.g. C nnnnn again.

(I'm not particularly hung up about the "C" -- it could be anything. But "F" for file is perhaps potentially too restrictive for future development).

Just typing off the top of my head here,

Best,

James.


Magnus Manske

19 Aug

1:53 a.m.

If I may chime in: most, if not all, of the (overly specific) categories on Commons can be expressed by statements. So, storing the date/time from EXIF or otherwise would allow for a "midsummer morning" query. Adding the EXIF camera model to the file data item would allow querying for cellphones (it would probably reference the cellphone model item on Wikidata, which in turn is an instance of cell phone).

This can be done with live queries a la WDQ, or stored procedures a la "complex queries" which are planned for Wikidata, as a click-on category replacement, if such a thing is desired.

Cheers, Magnus
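
As a rough illustration of a category expressed as a query over statements, here is a small Python sketch. Everything in it is an assumption made for the example: the `FileData` fields, the `INSTANCE_OF` lookup standing in for "instance of" data on Wikidata, the camera model names, and the choice of 21 June before noon as "midsummer morning". It does not use WDQ or any real query service.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FileData:
    name: str
    taken_at: datetime  # assumed to be copied from the EXIF date/time
    camera_model: str   # assumed to reference a camera model item


# Hypothetical stand-in for "this camera model is an instance of ...".
INSTANCE_OF = {
    "ExamplePhone 1": "cell phone",
    "ExampleSLR 2": "digital camera",
}


def is_midsummer_morning(moment: datetime) -> bool:
    # Assumption for this sketch: 21 June, before noon.
    return (moment.month, moment.day) == (6, 21) and moment.hour < 12


def cellphone_midsummer_morning(files: list[FileData]) -> list[str]:
    """Select files by their statements instead of by an assigned category."""
    return [
        f.name
        for f in files
        if INSTANCE_OF.get(f.camera_model) == "cell phone"
        and is_midsummer_morning(f.taken_at)
    ]
```

Nothing is stored per category here; membership is recomputed from the file data each time the query runs.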


James Heald

5:48 a.m.

Whilst that may be so, please nobody suggest dismantling any categories on Commons, unless and until Commons specifically asks for it.

As I learnt today, some on Commons are touchy enough just about the *idea* of Commons Wikibase, never mind anything being stored on it.

https://commons.wikimedia.org/wiki/Commons:Administrators%27_noticeboard#d:W...

Probably I was being over-facetious with my selection of the words "Midsummer Morning" in my example. (But then it wasn't me that assigned Q42).

The serious point is that I do believe there will be valuable-to-identify sets and subsets that may not be captured just by properties (unless the property is "belongs to this particular set").

So it wasn't just pictures taken from *any* cellphone *any* midsummer morning I was looking to identify, but pictures from one particular cellphone one particular midsummer morning, that happened to make a set.

Similarly I think it is possible to imagine other useful sets that wouldn't necessarily be identified by a property that pointed to something with a particular Q-number; nor even a combination of such properties.

-- James


Gerard Meijssen

2 p.m.

Hoi, I know the categories in Commons exist. I also know that you do not have to add categories when an image is uploaded. Many people do not consider the categories because they are just there and are not easy nor obvious without a long study.

They are there and they evolve. When the "community" finds that they are no longer useful, there will be others who still want to work on it. They can, it is a harmless occupation. Why would we consider removing category structures as long as someone cares about them ?? Thanks, GerardM


James Heald

4:14 p.m.

Also there might be queries one might want to run on the categories, which would be another reason to include them in Commons Wikibase.

-- J.


Gerard Meijssen

4:33 p.m.

Hoi, As it is, all the queries that exist are external to Wikidata. Speculating at this time about queries and Commons is futile. Thanks, GerardM


Thomas Douillard

5:43 p.m.

Note that in Wikidata we are developing methods and tools to classify items into « classes », which in short are sets of real-world things or events. This follows languages like OWL 2, the W3C ontology language standard (https://en.wikipedia.org/wiki/OWL2). In such a language you can assign an element (a media file, in the Commons case) to a class (which can be seen as a better-defined category) in two ways: either by creating a statement « this media file belongs to that category » (in Wikidata this is done with the « instance of » property, https://www.wikidata.org/wiki/Property:P31), or by associating a so-called « class expression » with the class (an analogue of a query, but more powerful). In OWL, any item that satisfies the criteria of the query or class expression associated with a class then belongs to that class without this being stated explicitly. In short, assigning an arbitrary class to an item when a query is not enough will also be possible with just a metadata repository, and in the future we may even be able to mix these two ways of classifying media.


Gerard Meijssen

6:20 p.m.

Hoi, I cannot parse this .. Thanks, GerardM


Thomas Douillard

6:27 p.m.

Can you be more specific ?


Markus Krötzsch

6:30 p.m.

On 19.08.2014 12:20, Gerard Meijssen wrote:

...

Hoi, I cannot parse this ..

What Thomas is saying is that classification (putting things into "categories") and querying (finding things based on certain properties) can be combined in a natural way. In ontology languages like OWL, you can make statements that say, roughly speaking, that "all results of query A belong to class B". This allows you to build a classification partly automatically, and to ensure that your classification is always consistent with your data.

From this perspective, you can view Wikidata's stored queries as similar to OWL's class expressions in that they allow you to define classes based on the data given for each item, without having to go through all the items to add classes manually. What Thomas is referring to is of course slightly more advanced still, but maybe this clarifies part of the idea.

Cheers,

Markus
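
The idea that "all results of query A belong to class B" can be sketched in a few lines of Python. This is a loose analogy, not OWL semantics and not how Wikidata's stored queries work; `DerivedClass` and its methods are hypothetical names, and items are modelled as plain dictionaries purely for the example.

```python
from typing import Callable, Optional


class DerivedClass:
    """A class whose members are asserted one by one, matched by a query, or both."""

    def __init__(self, name: str,
                 query: Optional[Callable[[dict], bool]] = None):
        self.name = name
        self.query = query  # plays the role of a class expression
        self.asserted: set[str] = set()

    def assert_member(self, item_id: str) -> None:
        """Explicit membership, comparable to an 'instance of' statement."""
        self.asserted.add(item_id)

    def members(self, items: dict[str, dict]) -> set[str]:
        """Union of asserted members and items whose data satisfies the query."""
        matched: set[str] = set()
        if self.query is not None:
            matched = {item_id for item_id, data in items.items()
                       if self.query(data)}
        return self.asserted | matched
```

A class defined only by a query stays consistent with the data automatically, while a class with no query covers arbitrary sets that no combination of properties captures; the sketch allows mixing both.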

...

On 19 August 2014 11:43, Thomas Douillard <thomas.douillard@gmail.com mailto:thomas.douillard@gmail.com> wrote:

Note that in Wikidata we are developping methods and tools to class
items in « classes », which in short are sets of real world things
or events. In languages like the w3c language and standards OWL2
<https://en.wikipedia.org/wiki/OWL2>. In this language you can
assign a class to an element (a media in the common case) to a class
(that can be seen as a better defined category) either by creating a
statement « this media belongs to that category » (in Wikidata this
is done by using the « instance of » property
<https://www.wikidata.org/wiki/Property:P31>) or by associating a so
called «class expression» in OWL (an analog of a query but more
powerful) Then in OWL any item who satisfy the criteria of the query
or class expression associated to a class belongs to that class
without stating it explicitely.  In short, the possibility to assign
an arbitrary class to an item when a query is not enough will also
be possible with just a metadata repository, we may in the future
even be able to mix these two ways to class medias.


2014-08-19 10:14 GMT+02:00 James Heald <j.heald@ucl.ac.uk
<mailto:j.heald@ucl.ac.uk>>:

    Also there might be queries one might want to run on the
    categories, which would be another reason to include them in
    Commons Wikibase.

       -- J.



    On 19/08/2014 07:00, Gerard Meijssen wrote:

        Hoi,
        I know the categories in Commons exist. I also know that you
        do not have to
        add categories when an image is uploaded. Many people do not
        consider the
        categories because they are just there and are not easy nor
        obvious without
        a long study.

        They are there and they evolve. When the "community" finds
        that they are no
        longer useful, there will be others who still want to work
        on it. They can,
        it is a harmless occupation. Why would we consider removing
        category
        structures as long as someone cares about them ??
        Thanks,
                 GerardM




_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org <mailto:Wikidata-l@lists.wikimedia.org>
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


David Cuenca

7:34 p.m.

Markus, is this related to the idea of creating a database of automatically inferred statements that you presented some time ago?

Cheers, Micru

On Tue, Aug 19, 2014 at 12:30 PM, Markus Krötzsch < markus@semantic-mediawiki.org> wrote:

...

On 19.08.2014 12:20, Gerard Meijssen wrote:

...
Hoi, I cannot parse this ..

What Thomas is saying is that classification (putting things into "categories") and querying (finding things based on certain properties) can be combined in a natural way. In ontology languages like OWL, you can make statements that say, roughly speaking, that "all results of query A belong to class B". This allows you to build a classification partly automatically, and to ensure that your classification is always consistent with your data.

From this perspective, you can view Wikidata's stored queries as similar to OWL's class expressions in that they allow you to define classes based on the data given for each item, without having to go through all the items to add classes manually. What Thomas is referring to is of course slightly more advanced still, but maybe this clarifies part of the idea.

Cheers,

Markus
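The statement "all results of query A belong to class B" can be illustrated with a small sketch. This is plain Python, not OWL and not any Wikidata or Wikibase API; the items, the property names and the `classes_of` helper are all invented for illustration. It only shows how class membership can be derived from a query-like predicate and combined with classes that were stated by hand.

```python
from typing import Any, Callable, Dict, Set

Item = Dict[str, Any]

# Hypothetical items; the property names are illustrative, not real data.
items: Dict[str, Item] = {
    "photo_1": {"depicts": "bridge", "year": 1890},
    "photo_2": {"depicts": "bridge", "year": 2010},
    "photo_3": {"depicts": "cat", "year": 1895},
}

# A "class expression" here is just a predicate that plays the role of query A.
class_expressions: Dict[str, Callable[[Item], bool]] = {
    "19th-century bridge photos": lambda it: (
        it.get("depicts") == "bridge" and 1801 <= it.get("year", 0) <= 1900
    ),
}

# Classes stated explicitly on an item (the "instance of" style).
asserted_classes: Dict[str, Set[str]] = {
    "photo_3": {"Featured pictures"},
}


def classes_of(item_id: str) -> Set[str]:
    """Return the stated classes plus every class whose expression matches."""
    stated = asserted_classes.get(item_id, set())
    inferred = {
        name for name, expr in class_expressions.items() if expr(items[item_id])
    }
    return stated | inferred


for item_id in items:
    print(item_id, sorted(classes_of(item_id)))
```

Because the inferred part is recomputed from the data each time, the classification cannot drift out of sync with the item's statements, which is the consistency point made above.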


-- Etiamsi omnes, ego non

Markus Krötzsch

11:59 p.m.

On 19.08.2014 13:34, David Cuenca wrote:

...

Markus, is this related to the idea of creating a database of automatically inferred statements that you presented some time ago?

I guess it is related, yes. Although currently I am first focussing on efficient query answering -- inference will come after that :-) Whether one stores the results in a database or not is an implementation detail, btw., that is maybe not essential for a user.

Oh, and of course all of this seems to be rather off-topic for the current subject "Commons Wikibase" :-) All that is relevant there has already been said I guess ("many categories could be expressed by queries to improve results; a gentle, community-led transition will be possible and preferred; categories won't be switched off just because Wikidata is switched on").

Cheers,

Markus


David Cuenca

20 Aug 20 Aug

4:23 a.m.

On Tue, Aug 19, 2014 at 5:59 PM, Markus Krötzsch < markus@semantic-mediawiki.org> wrote:

...

I guess it is related, yes. Although currently I am first focussing on efficient query answering -- inference will come after that :-) Whether one stores the results in a database or not is an implementation detail, btw., that is maybe not essential for a user.

Ok, ok, no hurries :-)

Oh, and of course all of this seems to be rather off-topic for the current subject "Commons Wikibase" :-) All that is relevant there has already been said I guess ("many categories could be expressed by queries to improve results; a gentle, community-led transition will be possible and preferred; categories won't be switched off just because Wikidata is switched on").

Actually I have one last question :) At the moment Gerard is using "is a list of:<value>" on category item pages which has the effect of being the inverse of "instance of". And then he adds further conditions as qualifiers, see: https://www.wikidata.org/wiki/Q6562

While this method works for simple categories, more complex ones, like https://www.wikidata.org/wiki/Q8380098, would be hard to model with it.

I was thinking of modelling it like:
<Category:Discoverers of extrasolar planets> is a list of <human>
<Category:Discoverers of extrasolar planets> has items used as value of <discoverer>

Of course it would require a link between the item "discoverer" and the property "discoverer", but would that make sense?

Thanks, Micru

Markus Krötzsch

3:36 p.m.

On 19.08.2014 22:23, David Cuenca wrote: ...

...

Actually I have one last question :) At the moment Gerard is using "is a list of:<value>" on category item pages which has the effect of being the inverse of "instance of". And then he adds further conditions as qualifiers, see: https://www.wikidata.org/wiki/Q6562

While this method works for simple categories, more complex ones, like https://www.wikidata.org/wiki/Q8380098, would be hard to model with it.

I was thinking of modelling it like:
<Category:Discoverers of extrasolar planets> is a list of <human>
<Category:Discoverers of extrasolar planets> has items used as value of <discoverer>

Of course it would require a link between the item "discoverer" and the property "discoverer", but would that make sense?

Well, it depends on what the intended use of "is a list of" is. First note that it is not the inverse of "instance of" (the inverse of a relation R holds between all pairs where R holds, just in the opposite direction; this is not what happens here). Rather, "is a list of" describes some class that all of the elements of a list are instances of.
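The distinction between an inverse relation and "is a list of" can be made concrete with a few lines of Python. The pairs below are hand-written illustrations (the two people are only used as example members), and `inverse` is a hypothetical helper, not part of any Wikidata tool.

```python
from typing import Set, Tuple

Relation = Set[Tuple[str, str]]

# (item, class) pairs for "instance of"; illustrative data only.
instance_of: Relation = {
    ("Michel Mayor", "human"),
    ("Didier Queloz", "human"),
}


def inverse(relation: Relation) -> Relation:
    """The inverse holds between the same pairs, in the opposite direction."""
    return {(b, a) for (a, b) in relation}


# "is a list of" relates the list (or category) itself to a class.
category = "Category:Discoverers of extrasolar planets"
is_a_list_of: Relation = {(category, "human")}

# Assumed membership of the category, for the sake of the example.
members = {"Michel Mayor", "Didier Queloz"}

# The inverse of "instance of" links the class to each individual item ...
print(inverse(instance_of))

# ... whereas "is a list of" only promises that every member is an
# instance of the named class.
print(all((m, "human") in instance_of for m in members))
```

Note that the subject of "is a list of" is the category, which never appears in the "instance of" pairs at all, so the two relations cannot be inverses of each other.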

I don't think that one should try to capture *exactly* what the items on the list are. Many lists are based on complicated criteria and it would be very hard to express them in a good way using statements. What you suggest above would be an ad hoc solution (a.k.a. hack) for a few cases; many other cases would need different features. Even if one had a way to capture some lists exactly, one would need to document this very carefully in order for the information to be useful to others. In essence, one would specify a query language there. Since we are already working on queries for Wikidata, the better way to solve this in the future would be to refer to actual queries (as soon as they are expressive enough).

Anyway, as I understand it, Gerard is adding these statements to help with the organisation of lists (and to give some more relevant statements to list items, e.g., to assist the auto description). Since we want to support automated list generation in the future (using query results on Wikipedia pages), it might be handy to have some overview of the lists (how many, about which topics, etc.). But I am just guessing here -- maybe Gerard has other reasons too.

Cheers,

Markus

Gerard Meijssen

4:46 p.m.

Hoi, When I add statements with "is a list of", the item I refer to works as a base. It and all subsequent statements are required to be the result of the result that is generated by WDQ in the background. The results are shown automatically from within Reasonator.

The hack is in having Reasonator interpret the limited expressions available. Then again, calling Reasonator a hack is a disservice to the real application it provides.

When I associate "is a list of" with categories in Wikidata, I express reasonable expectations about what such a category should be about. Presidents of the USA for instance are human and they hold or held the office of President of the USA. This excludes Lex Luthor who is shown as one in Reasonator because it does not make the human restriction.
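The effect of the base class restriction can be sketched as a simple filter. This is not how WDQ or Reasonator are implemented; the data is hand-written, labels stand in for item and property IDs, and `matches` and `list_members` are hypothetical helpers.

```python
from typing import Dict, List, Tuple

Statements = Dict[str, List[str]]
Condition = Tuple[str, str]  # (property label, required value label)

# Hand-written example data, for illustration only.
items: Dict[str, Statements] = {
    "Abraham Lincoln": {
        "instance of": ["human"],
        "position held": ["President of the United States"],
    },
    "Lex Luthor": {
        "instance of": ["fictional human"],
        "position held": ["President of the United States"],
    },
    "Angela Merkel": {
        "instance of": ["human"],
        "position held": ["Chancellor of Germany"],
    },
}


def matches(statements: Statements, conditions: List[Condition]) -> bool:
    """True if the item carries every required (property, value) pair."""
    return all(value in statements.get(prop, []) for prop, value in conditions)


def list_members(conditions: List[Condition]) -> List[str]:
    return sorted(name for name, st in items.items() if matches(st, conditions))


office_only = [("position held", "President of the United States")]
with_base = [("instance of", "human")] + office_only

print(list_members(office_only))  # includes the fictional office holder
print(list_members(with_base))    # the "human" base condition filters it out
```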

The results of the queries express several things: obviously the results themselves, but implicitly also the "local" articles that are not categorised, and items that may or may not have an article elsewhere. Yes, I use it when I add statements to items. It does show up in Reasonator, in WDQ results, in automated descriptions and, just as interestingly, it will end up in the tool by Markus.

The application of "is a list of" in categories is powerful. It gives clues about a subset of data. When people have an application for it, they concentrate on it. For instance, there was a project on "members of the Lok Sabha" and another on "members of the European parliament". The results of the work done prevented a lot of duplicate items. (Who would expect the Romanian Wikipedia to be among the best in knowing about members of the European parliament?)

When I talk about query, I talk about WDQ and its results. There is no alternative at this time. Consider for instance the tool of Markus. It may already have a limited application for some, but as long as it does not update itself, it is not as illustrative as the WDQ by Magnus and it cannot be used in the same way to improve Wikidata as is possible with many of the tools by Magnus.

The official query happens when it does. When it does, it will be severely stunted. This is because the "simple" queries will not have the power to make them as illustrative as the queries used by the "is a list of". Obviously I cannot wait until this situation is reversed. Having to convert the existing queries is a pain but it is a nice pain.

As I explained at Wikimania, it is all in the application. When there is one, it makes sense to have it. Without an application it is at best a nice effort we can talk about. But hey, I have a limited amount of time so I prefer to concentrate on application of functionality and data. Thanks, GerardM


Markus Krötzsch

7:52 p.m.

On 20.08.2014 10:46, Gerard Meijssen wrote:

...

Hoi, When I add statements with "is a list of", the item I refer to works as a base. It and all subsequent statements are required to be the result of the result that is generated by WDQ in the background. The results are shown automatically from within Reasonator.

The hack is in having Reasonator interpret the limited expressions available. Then again, calling Reasonator a hack is a disservice to the real application it provides.

Not sure what you refer to, but there might be a misunderstanding here. I was using the word "hack" in my email to refer to the proposal of using additional qualifiers to express queries in Wikidata. That was a new proposal in the email I replied to and had nothing to do with Reasonator or your annotations.

Markus

Paul Houle

11:51 p.m.

I'd be particularly wary of inferring anything from the EXIF data, especially the time.

I have a cheap digital camera which is pretty good except that the clock periodically resets to a default time. I've got a somewhat more expensive digital camera which has the same problem. I have an Android tablet that I assume gets the time from the net and/or GPS, but when I took it out of my gym bag the other day I noticed the time display had been switched to 24hrs and the time zone was switched to central.

When I am in the photography habit, I keep the clock set on my cameras. Sometimes I fall out of the habit but something interesting happens and you'd better believe I am not going to waste time setting the clock if I get a chance to photograph a burning car!

Similarly when travelling I might be bothered to set the timezone or not, more likely not if I have a layover in some place like Frankfurt or Schiphol airport.

If somebody decided just to set the clock to Zulu I wouldn't blame them.
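To make the time zone problem concrete: as far as I know, the classic EXIF date/time fields are plain strings of the form `YYYY:MM:DD HH:MM:SS` with no time zone attached. The sketch below uses only the Python standard library; the timestamp and the two offsets are made-up examples, and `parse_exif_time` and `as_utc` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone

EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"  # assumed layout of the EXIF date/time string


def parse_exif_time(raw: str) -> datetime:
    """Parse the string as written by the camera; the result has no time zone."""
    return datetime.strptime(raw, EXIF_FORMAT)


def as_utc(naive: datetime, utc_offset_hours: float) -> datetime:
    """Interpret a naive camera time under an assumed UTC offset."""
    assumed_zone = timezone(timedelta(hours=utc_offset_hours))
    return naive.replace(tzinfo=assumed_zone).astimezone(timezone.utc)


raw = "2014:08:20 16:51:00"  # made-up example value
naive = parse_exif_time(raw)

camera_on_utc = as_utc(naive, 0)    # photographer keeps the clock on UTC
camera_on_local = as_utc(naive, 2)  # clock set to a UTC+2 local time

print(camera_on_utc.isoformat())
print(camera_on_local.isoformat())
print(camera_on_utc - camera_on_local)
```

The same string yields two different instants depending on which convention the photographer followed, and nothing in the string itself says which one applies.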

Also, efforts to infer stuff from the EXIF data such as "did the flash go off?" rarely produce interesting results. For instance, it's a good habit to use the flash when you take photos of people outdoors on a bright day because it softens the shadows. Some people do it all the time and the auto mode on some cameras does it by default too. Thus, the flash is not an indicator that a photo was taken at night, indoors, in the dark, etc.

If you filter on things like that, or the ISO level, or the exposure, or aperture, you're unlikely to get categories that are useful.


-- Paul Houle Expert on Freebase, DBpedia, Hadoop and RDF (607) 539 6254 paul.houle on Skype ontology2@gmail.com

Magnus Manske

21 Aug 21 Aug

3:34 a.m.

On Wed, Aug 20, 2014 at 4:51 PM, Paul Houle ontology2@gmail.com wrote:

...

I'd be particularly wary of inferring anything from the EXIF data, especially the time.

We could (should!) store the date/time anyway, and slap a "source:EXIF" (or the like) qualifier on it.

If there is a "manual" time (e.g. written in the template), that could become the "preferred" time statement.

One would thus, by default, get the manual time, or EXIF time if no manual available. Or, one could ask specifically for either. Even if just to see how reliable EXIF time is in practice...
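A minimal sketch of that lookup, assuming each time statement carries a source marker and a rank. The `TimeStatement` class, the `best_time` helper and the sample values are invented for illustration; this is not the Wikibase data model or API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TimeStatement:
    value: str            # ISO-style timestamp, kept as text for simplicity
    source: str           # "EXIF" or "manual" (hypothetical qualifier values)
    rank: str = "normal"  # "preferred" or "normal"


def best_time(
    statements: List[TimeStatement], source: Optional[str] = None
) -> Optional[TimeStatement]:
    """Return the preferred statement if there is one, else any candidate.

    Passing `source` asks specifically for one kind of time.
    """
    candidates = [s for s in statements if source is None or s.source == source]
    preferred = [s for s in candidates if s.rank == "preferred"]
    pool = preferred or candidates
    return pool[0] if pool else None


exif = TimeStatement("2014-08-20T16:51:00", source="EXIF")
manual = TimeStatement("2014-08-20T18:40:00", source="manual", rank="preferred")

print(best_time([exif, manual]))                 # manual time wins by default
print(best_time([exif]))                         # falls back to EXIF
print(best_time([exif, manual], source="EXIF"))  # explicit request for EXIF
```

Keeping both statements also makes it possible to compare them later, for example to measure how far EXIF times typically deviate from manually entered ones.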

Cheers, Magnus

Andy Mabbett

6:51 a.m.

On 20 August 2014 16:51, Paul Houle ontology2@gmail.com wrote:

...

I have a cheap digital camera which is pretty good except that the clock periodically resets to a default time.

You probably need to replace the internal battery.

I keep the clock in my camera set to UTC, wherever I am in the world, because I was always forgetting to change it/ change it back when I changed timezones.

My phone photos, though, have "correct" local times, because my phone updates its clock automatically.

Meh.

-- Andy Mabbett @pigsonthewing http://pigsonthewing.org.uk

Lydia Pintscher

19 Aug 19 Aug

10:31 p.m.

On Mon, Aug 18, 2014 at 11:48 PM, James Heald j.heald@ucl.ac.uk wrote:

...

Whilst that may be so, please nobody suggest dismantling any categories on Commons, unless and until Commons specifically asks for it.

Yeah. My hope is that we can build better tools together and then the Commons community will slowly go and migrate to those better tools because they want the benefits they bring.

Cheers Lydia

Andy Mabbett

6:41 a.m.

On 18 August 2014 15:22, James Heald j.heald@ucl.ac.uk wrote:

...

(I'm not particularly hung up about the "C" -- it could be anything. But "F" for file is perhaps potentially too restrictive for future development).

Annnn for audio
Dnnnn for documents
Innnn for images
Vnnnn for videos

-- Andy Mabbett @pigsonthewing http://pigsonthewing.org.uk

Lydia Pintscher

10:27 p.m.

Hey :)

On Mon, Aug 18, 2014 at 4:22 PM, James Heald j.heald@ucl.ac.uk wrote:

...

Thanks Lydia!

Something that occurs to me is that one may well want to include Commons categories in such a database, not just files, which presumably might be stored on a page like

Info:Category:Insert random Commons category intersection here

so that one could then ask whether a file belongs to such a category or not, and the data would all be in the database.

So what you want is to be able to make the category one possible search criterion when searching for images? We don't need an entity type for that I think. We "just" have to build the search interface in a way that it can take those into account as well from where they are already now. Or is that missing something important you had in mind?

...

Such categories (or sets) may well not be Wikidata notable, for example:

Category:Pictures I took on my cellphone one midsummer morning

so we cannot assume they have Q-numbers.

My assumption so far was that we can assume every topic we use to tag images to be in Wikidata. Are there some examples currently in use on Commons that you think would not be covered? Because Wikidata will be used to tag much more than just Commons images in the future. So we should have a really huge vocabulary.

...

But it would be nice if we could describe such properties using the existing Wikidata syntax, ie via a property Pxyz = "belongs to set", and then an item number for the set it belonged to.

What set is this for example? Like "everything taken as part of Wiki Loves Monuments 2012"? Or some other kind of set?

...

Since the items wouldn't be on Wikidata, it would be useful if they had a different namespace, eg C nnnnnn

Imho they should be on Wikidata. I fear if we introduce another layer it'll be considerably harder to use and maintain.

Cheers Lydia

James Heald

1 Sep 1 Sep

6:42 a.m.

New subject: [Wikidata-l] Commons Categories again (was Re: Commons Wikibase)

Hi everybody,

Sorry to open up an old thread again after ten days, but there were some things in Lydia's reply below that I wanted to come back to.

So, first, a couple of examples of the kind of Commons Categories I had in mind:

https://commons.wikimedia.org/wiki/Category:Images_released_by_British_Libra...

https://commons.wikimedia.org/wiki/Category:Metropolitan_Improvements_%28182...

Despite their names, both these cats effectively identify images from particular photosets on Flickr. The first category relates to a particular set of images released by a particular institution on a particular date. The second relates to a particular set of scans from a particular edition of a particular book. Both (IMO) would (and, moreover *should*) currently fail Wikidata:Notability.

The book, and even the edition, might be notable. But a particular set of scans surely would not. Similarly, the first category is really just a photoset from Flickr, again something that wouldn't currently get a Wikidata Q-number.

Now in the email below, Lydia effectively said: no problem, just give each Commons Category a Wikidata Q-number anyway. ("Imho they should be on Wikidata. I fear if we introduce another layer it'll be considerably harder to use and maintain.")

GerardM, in sessions at Wikimania, also argued strongly simply for putting everything in Wikidata.

But I think this would be a mistake, because IMO Wikidata:Notability is a positive virtue, which should be defended. It is *useful* to people that they can download a dump of Wikidata for their own purposes, and get real-world relevant items, rather than the dump being bloated with wiki junk.

So in my opinion, Commons categories should generally *not* get Q-numbers on Wikidata (unless they pass WD:N), but should instead get items on the Commons Wikibase which is being created expressly for the purpose of holding structured data on things which really only have a commonswiki significance, and are not real-world notable.

A second point relates to Magnus's issue about how much of this could be replaced by queries.

Yes, if one were progressively building up a topic search on images from books in the 1-million image BL Mechanical Curator release, one might ask for books about London, then books published in a particular date range. But within that, the natural query to specify scans from this particular copy of 'Metropolitan Improvements' is the image's membership of this particular set -- membership of the set in itself is something that should be queryable, and such a query is the kind of query that, at the right stage, should be offerable to the user trying to refine their search.

In fact, most current Commons categories will not be WD-notable. But even for the most egregious of Commons intersection categories, IMO it will still be worth the Commons Wikibase tracking category membership for an image, not least for the ability that will give to easily present the category's files in different ways -- eg perhaps sorted by filename; or by original creation date; or by upload date; or by uploader; or by geographical proximity... etc. Holding the category membership in the wikibase then allows people to write gadgets to sort or filter or re-present the category in multiple ways. So it's useful to have the category as an entity that can be a target for a property.

But there are also reasons for a category to have an item in its own right -- because there is structured data that one may wish to associate with the category: one example would be access stats to members of the category (eg which categories in the Mechanical Curator collection have had the most file views?) -- the kind of thing of great interest to GLAMs.

Many categories also contain information defining them -- for example, for the book scans category, one would want a property that this category contained scans of the particular book (pointed to by its Q-number), probably a particular edition (probably a qualifier). One might also want to associate linked data -- pointers to entries for the book in (possibly multiple) catalogues of its original host institution.

So for all these reasons it may well be useful, as a matter of course, to have a container for structured information associated with each commonscat.

This is why I think each and every category on Commons should have its own Commons Wikibase item, with an associated C-number.

Queries are important, but I'd suggest they are best seen as an *addition* to the present category system, rather than a *replacement* for it.

A particular way forward, it seems to me, might be to allow categories to be *augmented* with specific queries -- i.e. to allow rules to be specified for particular categories, so that files whose structured-data topic information matched the rules would automatically be added to the categories, alongside the files already there.

Categories, including intersection categories, would therefore effectively auto-update, without human intervention, to include new files if they had appropriate topic information.

Existing legacy categorisation information would survive, allowing the new augmentation approach to slowly come into play if topic information were initially weak. And categories should still be specifiable by hand (or automatically through templates, e.g. as source categories are often specified through source templates) -- because this can still be the most efficient way to specify naturally closed sets.

This would effectively allow a transition pathway towards categorisation / sets-of-interest becoming more determined by the structured data.

One thing in particular it could allow would be a gadget to highlight images that were in a category directly, *not* by virtue of any rule on any metadata, which could then allow such images to be investigated and/or have their topic metadata improved.
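The augmentation idea in the last few paragraphs can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a description of any existing or planned implementation: the `MediaFile` and `Category` classes are hypothetical, a "rule" is simplified to "the file carries all of these topic IDs", and the topic IDs used with it would be placeholders rather than real Q-numbers.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, Set


@dataclass(frozen=True)
class MediaFile:
    name: str
    topics: FrozenSet[str]  # structured-data topic IDs attached to the file


@dataclass
class Category:
    title: str
    # Files placed in the category by hand or via a template (legacy membership).
    direct: Set[str] = field(default_factory=set)
    # Simplified rule: a file matches if it carries ALL of these topic IDs.
    rule: FrozenSet[str] = frozenset()

    def matches_rule(self, f: MediaFile) -> bool:
        # An empty rule matches nothing, so hand-curated closed sets stay closed.
        return bool(self.rule) and self.rule <= f.topics

    def members(self, files: Iterable[MediaFile]) -> Set[str]:
        """Legacy members plus every file whose topics satisfy the rule."""
        return {f.name for f in files if f.name in self.direct or self.matches_rule(f)}

    def direct_only(self, files: Iterable[MediaFile]) -> Set[str]:
        """Files present only by direct categorisation: candidates for better topic metadata."""
        return {f.name for f in files if f.name in self.direct and not self.matches_rule(f)}
```

Here `members` gives the auto-updating view of the category, while `direct_only` is the list a highlighting gadget would show.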

It's easy to mock the sometimes extraordinary depths of intersection categories on Commons; such intersection categories are a pain to determine for categorisation and not a very good fit for retrieval, nor do they match well how the rest of the world does things, which makes metadata import harder and less effective than it should be.

But there are virtues in the category system too. There is a wealth of hard-won information encoded in it. And some categories do match natural groupings of images. The hand-curated category sets and hierarchies, reflecting context knowledge, will often do better than even the best AI-driven suggestions will ever be able to match for search refinement.

An approach such as the one I've suggested above would combine categories and topics in an evolutionary rather than revolutionary way. Categories would not all go away -- ever -- but would continue to exist side-by-side with topics in a symbiotic way. IMO that would make the transition smoother and more likely to engage and involve the existing community, and would lead to an end-point that, it seems to me, would have additional strengths over a pure query system.

I'm interested to know what other people think.

-- James. (User:Jheald)

On 19/08/2014 15:27, Lydia Pintscher wrote:

...

Hey :)

On Mon, Aug 18, 2014 at 4:22 PM, James Heald j.heald@ucl.ac.uk wrote:

...
Thanks Lydia!

Something that occurs to me is that one may well want to include Commons categories in such a database, not just files, which presumably might be stored on a page like

Info:Category:Insert random Commons category intersection here

so that one could then ask whether a file belongs to such a category or not, and the data would all be in the database.

So what you want is to be able to make the category one possible search criterion when searching for images? We don't need an entity type for that I think. We "just" have to build the search interface in a way that it can take those into account as well from where they are already now. Or is that missing something important you had in mind?

...
Such categories (or sets) may well not be Wikidata notable, for example:

Category:Pictures I took on my cellphone one midsummer morning

so we cannot assume they have Q-numbers.

My assumption so far was that we can assume every topic we use to tag images to be in Wikidata. Are there some examples currently in use on Commons that you think would not be covered? Because Wikidata will be used to tag much more than just Commons images in the future. So we should have a really huge vocabulary.

...
But it would be nice if we could describe such properties using the existing Wikidata syntax, ie via a property Pxyz = "belongs to set", and then an item number for the set it belonged to.

What set is this for example? Like "everything taken as part of Wiki Loves Monuments 2012"? Or some other kind of set?

...
Since the items wouldn't be on Wikidata, it would be useful if they had a different namespace, eg C nnnnnn

Imho they should be on Wikidata. I fear if we introduce another layer it'll be considerably harder to use and maintain.

Cheers Lydia

Gerard Meijssen

2:07 p.m.

New subject: [Wikidata-l] Commons Categories again (was Re: Commons Wikibase)

Hoi, Wikidata is very much a "working database". Its relevance is exactly because of this. Without the connection to the interwiki links, it would not be the same, it would not have the coverage and it would not have the same sized community.

Considerations about secondary use are secondary. Yes, people may use it for their own purposes and when it fits their needs, well and good. When it does not, that is fine too. As it is, we do have all kinds of Wiki "junk" in there. We have disambiguation pages, list articles, templates, categories. The challenge is to find a use for them.

When I add statements based on categories, I "document" many categories [1]. As a result, over 900 items for categories will show the result of a query in the Reasonator. The result is what I think a category could contain, given the subject of the category. For Wikipedians these are articles not yet categorised, red links and blue links.

There are several reasons why this is not (yet) a perfect fit. The most obvious one is the inclusion of articles that are not part of the selection, e.g. a list in a category full of humans. Currently not everything can be expressed in a way that allows Reasonator to pick things up in a query; dates come to mind. Then there are the categories that have an "arbitrary" set of entries.

I am not going to speculate on what kind of qualifiers Commons will come up with. In essence when you can sort it / select it Wikidata will do a better job for you. The "only" thing we have to do is identify the items that fit the mold. This is something that you can often find the basis for in existing categories. Thanks, GerardM

[1] http://ultimategerardm.blogspot.nl/2014/08/wikidata-my-workflow-enriching-wi...

http://tools.wmflabs.org/wikidata-todo/autolist.html?q=CLAIM%5B31%3A4167836%...

On 1 September 2014 00:42, James Heald j.heald@ucl.ac.uk wrote:

...


Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l

James Heald

3 Sep 3 Sep

6:05 p.m.

New subject: [Wikidata-l] Commons Categories again (was Re: Commons Wikibase)

Gerard,

I agree with you that I would like the kind of tools currently available with WikiData also to be available on CommonsData.

Queries that combine the two in an integrated way ought to be made simple and straightforward.

What I don't understand is your objection to placing items that really only have a Commons notability, not a world notability, into a specific namespace, or (notionally) the separate database CommonsData. That would make it possible to run queries that relate only to Commons information solely on CommonsData, and queries that relate only to world information solely on WikiData.

Does that not make more sense than requiring the full bulk of the combined database to always be addressed in order to run any query?
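As a rough illustration of the separation argued for here, the sketch below decides which store a query would have to touch purely from the prefixes of the entity IDs it mentions. It rests on assumptions taken from this thread rather than on anything that exists: "C" as a prefix for Commons Wikibase items is only a proposal, properties ("P") are assumed to be shared between the two databases, and the store names and the `stores_needed` function are hypothetical.

```python
from typing import Iterable, Set

# Assumed prefixes: "Q" items live on Wikidata, "C" items on the Commons
# Wikibase (the C-number scheme is only a proposal from this thread).
STORE_BY_PREFIX = {"Q": "wikidata", "C": "commonsdata"}


def stores_needed(entity_ids: Iterable[str]) -> Set[str]:
    """Return the set of databases a query mentioning these IDs must address."""
    stores: Set[str] = set()
    for eid in entity_ids:
        prefix = eid[:1].upper()
        if prefix == "P":
            # Properties are assumed to be shared, so they never force a second store.
            continue
        if prefix not in STORE_BY_PREFIX:
            raise ValueError(f"unrecognised entity ID: {eid!r}")
        stores.add(STORE_BY_PREFIX[prefix])
    return stores
```

Under these assumptions a query that mentions only C-numbers and properties never has to address Wikidata at all, and only a mixed query needs both.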

-- James.

On 01/09/2014 07:07, Gerard Meijssen wrote:

...


Gerard Meijssen

6:48 p.m.

New subject: [Wikidata-l] Commons Categories again (was Re: Commons Wikibase)

Hoi, I am firmly opposed to the idea that the Wikidatification of Commons is about Commons. That is imho a disaster.

It is about mediafiles and they exist in many Wikis.

The categories of Commons are in and of themselves useful only to a very limited extent. Associating the images they refer to with existing items in Wikidata is one way in which they may be useful. As it is, because of naming conventions and the use of English only, the categories are pretty lame. They do not help me at all when I am looking for an image in Commons.

Really, my point is: forget about Commons notability and start thinking in terms of "what does it take to help people find images". Yes, those people will be 8 years old and they may speak Mandarin or Japanese. Thanks, GerardM

On 3 September 2014 12:05, James Heald j.heald@ucl.ac.uk wrote:

...

Gerard,

I agree with you that I would like the kind of tools currently available with WikiData also to be available on CommonsData.

Queries that combine the two in an integrated way ought to be made simple and straightforward.

What I don't understand is your objection to placing items that really only have a Commons notability, not a world notability, into a specific namespace, or (notionally) the separate database CommonsData, so that it is possible to run those queries that only relate to Commons information solely on CommonsData, and those queries that only relate to world information solely on WikiData.

Does that not make more sense, than requiring the full bulk of the combined database to always be addressed in order to run any query?

-- James.

On 01/09/2014 07:07, Gerard Meijssen wrote:

...
Hoi, Wikidata is very much a "working database". Its relevance is exactly because of this. Without the connection to the interwiki links, it would not be the same, it would not have the coverage and it would not have the same sized community.

Considerations about secondary use are secondary. Yes, people may use it for their own purposes and when it fits their needs, well and good. When it does not, that is fine too. As it is, we do have all kind of Wiki "junk" in there. We have disambiguation pages, list articles, templates, categories. The challenge is to find a use for them.

When I add statements based on categories, I "document" many categories [1]. As a result over 900 items for categories will show the result of a query in the Reasonator. The results is what I think a category could contain given the subject of a category. For Wikipedians they are articles not categorised, red links and blue links.

There are several reasons why this is not (yet) a perfect fit. The most obvious one is including articles that are not part of the selection eg a list in a category full of humans. Currently not everything can be expressed in a way that allows Reasonator to pick things up in a query.. dates come to mind. Then there are the categories that have an "arbitrary" set of entries.

I am not going to speculate on what kind of qualifiers Commons will come up with. In essence when you can sort it / select it Wikidata will do a better job for you. The "only" thing we have to do is identify the items that fit the mold. This is something that you can often find the basis for in existing categories. Thanks, GerardM

[1] http://ultimategerardm.blogspot.nl/2014/08/wikidata-my-workflow-enriching-wikidata.html

http://tools.wmflabs.org/wikidata-todo/autolist.html?q=CLAIM%5B31%3A4167836%5D%20AND%20CLAIM%5B360%3A5%5D%20


Derric Atzrott

8:02 p.m.

New subject: [Wikidata-l] Commons Categories again (was Re: Commons Wikibase)

...

The categories of Commons are in and of themselves useful only to a very limited extent. Associating the images they refer to with existing items in Wikidata is one way in which they may be useful. As it is, because of naming conventions and the use of English only, the categories are pretty lame. They do not help me at all when I am looking for an image in Commons.

Couldn't this still be done from CommonsData? I thought the items in that database would be able to reference the ones in the Wikidata database and vice versa.

...

Really, my point is: forget about Commons notability and start thinking in terms of "what does it take to help people find images". Yes, those people will be 8 years old and they may speak Mandarin or Japanese.

I'm confused, wouldn't having the data in CommonsData still help with this?

...

Considerations about secondary use are secondary. Yes, people may use it for their own purposes and when it fits their needs, well and good. When it does not, that is fine too. As it is, we do have all kinds of Wiki "junk" in there. We have disambiguation pages, list articles, templates, categories. The challenge is to find a use for them.

I'd disagree that considerations about secondary use are secondary. Wikidata really has a huge potential for secondary use and we shouldn't forget that.

------------------------

I'm somewhat confused about this thread. Did I miss something? My understanding is that Commons will be getting its own Wikibase install in order to keep track of image metadata. We are currently having a debate over whether the 3.3 million Commons categories should be kept in Wikidata or CommonsData.

The CommonsData argument is that it keeps stuff only really useful to Commons out of the namespace that has thus far been mostly used for items relating to the real world.

The Wikidata argument is that there is already a ton of "wiki-junk" in Wikidata, that we shouldn't worry about reuse of Wikidata because it is primarily a tool for Wikimedia editors, and that having the data on Wikidata itself would allow editors to find useful images more easily.

Am I understanding that correctly?

Thank you, Derric Atzrott

P. Blissenbach

9:28 p.m.

New subject: [Wikidata-l] Commons Categories again (was Re: Commons Wikibase)

I strongly support this view: Wikidata should support and ease finding Commons images. This is not only about proper categorising and tagging in a truly multilingual way, but also about determining and assigning various properties - both automatically and manually.

Think, for example, of an art director creating an image flyer (be it about Wikimania, a national open source movement, or a company) who is looking for photographs that are "predominantly blue", depicting "8 humans or more" of "various ages" in a "neutral or indeterminate environment", and so on, to get the idea.

Purodha

"Gerard Meijssen" gerard.meijssen@gmail.com writes:

Hoi, I am firmly opposed to the idea that the Wikidatification of Commons is about Commons. That is imho a disaster.

It is about mediafiles and they exist in many Wikis.

The categories of Commons are in and of themselves useful only to a very limited extent. Associating the images they refer to with existing items in Wikidata is one way in which they may be useful. As it is, because of naming conventions and the use of English only, the categories are pretty lame. They do not help me at all when I am looking for an image in Commons.

Really, my point is: forget about Commons notability and start thinking in terms of "what does it take to help people find images". Yes, those people will be 8 years old and they may speak Mandarin or Japanese. Thanks, GerardM

On 3 September 2014 12:05, James Heald j.heald@ucl.ac.uk wrote:

Gerard,

I agree with you that I would like the kind of tools currently available with WikiData also to be available on CommonsData.

Queries that combine the two in an integrated way ought to be made simple and straightforward.

Does that not make more sense, than requiring the full bulk of the combined database to always be addressed in order to run any query?

-- James.

On 01/09/2014 07:07, Gerard Meijssen wrote:

Hoi, Wikidata is very much a "working database". Its relevance is exactly because of this. Without the connection to the interwiki links, it would not be the same, it would not have the coverage and it would not have the same sized community.


James Heald

9:33 p.m.

New subject: [Wikidata-l] Commons Categories again (was Re: Commons Wikibase)

Not really relevant.

The way that this will be achieved will be a "topics" list attached to each file, each topic being a pointer to a Wikidata item.

Sure, Wikidata may be used as one of the sources to help build the topics list; but the topics list will not be on Wikidata, but attached to each file, probably on the CommonsData wikibase.
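To illustrate how a per-file topics list could still support search, here is a minimal sketch that turns such lists into an inverted index from topic to files. It is only one possible way to do it and an assumption on my part: the function names are hypothetical, the topic IDs used with it would be placeholders, and nothing here reflects how a Commons Wikibase would actually store or index the data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_topic_index(file_topics: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Invert {filename: [topic IDs]} into {topic ID: {filenames}}."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for filename, topics in file_topics.items():
        for qid in topics:
            index[qid].add(filename)
    return dict(index)


def files_with_all(index: Dict[str, Set[str]], wanted: Iterable[str]) -> Set[str]:
    """Return the files that carry every one of the wanted topic IDs."""
    wanted = list(wanted)
    if not wanted:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    result = set(index.get(wanted[0], set()))
    for qid in wanted[1:]:
        result &= index.get(qid, set())
    return result
```

The topics stay attached to each file; the index is derived data that can be rebuilt whenever the lists change.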

-- James.

On 03/09/2014 14:28, P. Blissenbach wrote:

...

I strongly support this view: Wikidata should support and ease finding Commons-images. This is not only about proper categorising and tagging in a true multilingual way, but also about determining and assigning various properties - both automatically and manually.

Think for example like an art director creating an image flyer (be it about Wikimania, a national open source movement, or a company) looking for photograhps "predominantly blue" depicing "8 humans or more" of "various ages" in a "neutral or indeterminate environent" and so on, so as to get the hang of it.

Purodha

"Gerard Meijssen" gerard.meijssen@gmail.com writes:

Hoi, I am firmly opposed to the idea that the Wikidatification of Commons is about Commons. That is imho a disaster.

It is about mediafiles and they exist in many Wikis.

The categories of Commons are in and off themselves useful to a very limited extend. Associating the images they refer to with existing items in Wikidata is one way in which they may be useful. As it is, because of naming conventions and the use of English only, the categories are pretty lame. They do not help me when I am looking for an image in Commons at all.

Really my point is forget about Commons notability start thinking in terms of "what does it take to help people find images". Yes, those people will be 8 years old and they may speak Mandarin or Japanese. Thanks, GerardM

On 3 September 2014 12:05, James Heald j.heald@ucl.ac.uk wrote:Gerard,

I agree with you that I would like the kind of tools currently available with WikiData also to be available on CommonsData.

Queries that combine the two in an integrated way ought to be made simple and straightforward.

What I don't understand is your objection to placing items that really only have a Commons notability, not a world notability, into a specific namespace, or (notionally) the separate database CommonsData, so that it is possible to run those queries that only relate to Commons information solely on CommonsData, and those queries that only relate to world information solely on WikiData.

Does that not make more sense, than requiring the full bulk of the combined database to always be addressed in order to run any query?

-- James.

On 01/09/2014 07:07, Gerard Meijssen wrote:Hoi, Wikidata is very much a "working database". Its relevance is exactly because of this. Without the connection to the interwiki links, it would not be the same, it would not have the coverage and it would not have the same sized community.

Considerations about secondary use are secondary. Yes, people may use it for their own purposes and when it fits their needs, well and good. When it does not, that is fine too. As it is, we do have all kind of Wiki "junk" in there. We have disambiguation pages, list articles, templates, categories. The challenge is to find a use for them.

When I add statements based on categories, I "document" many categories [1]. As a result over 900 items for categories will show the result of a query in the Reasonator. The results is what I think a category could contain given the subject of a category. For Wikipedians they are articles not categorised, red links and blue links.

There are several reasons why this is not (yet) a perfect fit. The most obvious one is including articles that are not part of the selection eg a list in a category full of humans. Currently not everything can be expressed in a way that allows Reasonator to pick things up in a query.. dates come to mind. Then there are the categories that have an "arbitrary" set of entries.

I am not going to speculate on what kind of qualifiers Commons will come up with. In essence when you can sort it / select it Wikidata will do a better job for you. The "only" thing we have to do is identify the items that fit the mold. This is something that you can often find the basis for in existing categories. Thanks, GerardM

[1] http://ultimategerardm.blogspot.nl/2014/08/wikidata-my-workflow-enriching-wi...]

http://tools.wmflabs.org/wikidata-todo/autolist.html?q=CLAIM%5B31%3A4167836%...

Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l

Gerard Meijssen

5 Sep 5 Sep

4:40 a.m.

New subject: [Wikidata-l] Commons Categories again (was Re: Commons Wikibase)

Hoi, I am really interested in how you envision searching when all those topics are isolated and attached to each file.

I am also really interested to know, when you have all those files isolated on Commons, how you will include media files that are NOT on Commons. This is a normal use case. Thanks, GerardM

On 3 September 2014 15:33, James Heald j.heald@ucl.ac.uk wrote:

...

Not really relevant.

The way that this will be achieved will be a "topics" list attached to each file, each topic being a pointer to a Wikidata item.

Sure, Wikidata may be used as one of the sources to help build the topics list; but the topics list will not be on Wikidata, but attached to each file, probably on the CommonsData wikibase.

-- James.

On 03/09/2014 14:28, P. Blissenbach wrote:

...
I strongly support this view: Wikidata should support and ease finding Commons-images. This is not only about proper categorising and tagging in a true multilingual way, but also about determining and assigning various properties - both automatically and manually.

Think, for example, of an art director creating an image flyer (be it about Wikimania, a national open source movement, or a company) looking for photographs that are "predominantly blue", depicting "8 humans or more" of "various ages" in a "neutral or indeterminate environment", and so on, to get the hang of it.

Purodha

"Gerard Meijssen" gerard.meijssen@gmail.com writes:

Hoi, I am firmly opposed to the idea that the Wikidatification of Commons is about Commons. That is imho a disaster.

It is about mediafiles and they exist in many Wikis.

The categories of Commons are in and of themselves useful only to a very limited extent. Associating the images they refer to with existing items in Wikidata is one way in which they may be useful. As it is, because of naming conventions and the use of English only, the categories are pretty lame. They do not help me at all when I am looking for an image in Commons.

Really, my point is: forget about Commons notability and start thinking in terms of "what does it take to help people find images". Yes, those people will be 8 years old and they may speak Mandarin or Japanese. Thanks, GerardM

On 3 September 2014 12:05, James Heald j.heald@ucl.ac.uk wrote:

Gerard,

I agree with you that I would like the kind of tools currently available with WikiData also to be available on CommonsData.

Queries that combine the two in an integrated way ought to be made simple and straightforward.

What I don't understand is your objection to placing items that really only have a Commons notability, not a world notability, into a specific namespace, or (notionally) the separate database CommonsData, so that it is possible to run those queries that only relate to Commons information solely on CommonsData, and those queries that only relate to world information solely on WikiData.

Does that not make more sense, than requiring the full bulk of the combined database to always be addressed in order to run any query?

-- James.

On 01/09/2014 07:07, Gerard Meijssen wrote:

Hoi, Wikidata is very much a "working database". Its relevance is exactly because of this. Without the connection to the interwiki links, it would not be the same, it would not have the coverage and it would not have the same sized community.

Considerations about secondary use are secondary. Yes, people may use it for their own purposes and when it fits their needs, well and good. When it does not, that is fine too. As it is, we do have all kinds of Wiki "junk" in there. We have disambiguation pages, list articles, templates, categories. The challenge is to find a use for them.

When I add statements based on categories, I "document" many categories [1]. As a result, over 900 items for categories will show the result of a query in the Reasonator. The result is what I think a category could contain given the subject of a category. For Wikipedians they are articles not categorised, red links and blue links.

There are several reasons why this is not (yet) a perfect fit. The most obvious one is including articles that are not part of the selection, e.g. a list in a category full of humans. Currently not everything can be expressed in a way that allows Reasonator to pick things up in a query; dates come to mind. Then there are the categories that have an "arbitrary" set of entries.

I am not going to speculate on what kind of qualifiers Commons will come up with. In essence when you can sort it / select it Wikidata will do a better job for you. The "only" thing we have to do is identify the items that fit the mold. This is something that you can often find the basis for in existing categories. Thanks, GerardM

[1] http://ultimategerardm.blogspot.nl/2014/08/wikidata-my-workflow-enriching-wikidata.html

http://tools.wmflabs.org/wikidata-todo/autolist.html?q=CLAIM%5B31%3A4167836%5D%20AND%20CLAIM%5B360%3A5%5D%20

P. Blissenbach

6 Sep 6 Sep

4:48 p.m.

New subject: [Wikidata-l] Commons Categories again (was Re: Commons Wikibase)

(1) If we want to include media files not on Commons, then we shall have to "include" data from foreign sources such as Flickr or other types of repositories. We must do so without stealing or damaging the authority of these others. If we connect items to media linking them, or if we assign tags, labels, attributes, etc. to foreign media, or make statements involving them, we can of course do so collaboratively, but we cannot assume that other communities will cooperate. Often they will; occasionally they will not, and the latter should not be a hindrance.

(2) Assuming we are incorporating labels, tags and statements (claims) made in other repositories in addition to simple and obvious technical information, we shall first have to decide about incorporating the thesauri, tagging systems, ontologies, or whatever they use.

(3) Much less complicated imho is the initial step to make files on Commons and on other WMF wikis available for searches via WikiData. The goal has to be, imho, that everything we "know" already about them is converted into statements and made available to search queries. Since that involves reading descriptions and turning them into statements about media, we get a finer grained categorizing or tagging system than we have today. It will automatically become more multilingual as data grows. I currently believe that conversion from existing data has to be done at least partially semi-automatically, likely with suggester bots that ask questions such as "Is this cat: o Black, o Brown, o White, o Tigered, ... o Not a cat at all" or "In this sample, you hear the voice of a: o Female, o Male, o Child, o Cannot tell, o Several voices, o No voice at all, ...". That would allow considerable volumes of missing data to be added in little time, starting from categories existing in the wikis.

(4) Searching should most of the time be a matter of making statements about what you want to find. Basic logical operations need to be available so as to limit unwieldy result sets, plus additional stepwise refinements. Semantic Mediawiki or Wolfram Alpha or Library Catalog Search Engines already have many of those ;-)

Purodha

"Gerard Meijssen" gerard.meijssen@gmail.com writes:

Hoi, I am really interested in how you envision searching when all those topics are isolated and attached to each file.

I am also really interested to know, when you have all those files isolated on Commons, how you will include media files that are NOT on Commons. This is a normal use case. Thanks, GerardM

On 3 September 2014 15:33, James Heald j.heald@ucl.ac.uk wrote:

Not really relevant.

The way that this will be achieved will be a "topics" list attached to each file, each topic being a pointer to a Wikidata item.

Sure, Wikidata may be used as one of the sources to help build the topics list; but the topics list will not be on Wikidata, but attached to each file, probably on the CommonsData wikibase.

-- James.

On 03/09/2014 14:28, P. Blissenbach wrote:

I strongly support this view: Wikidata should support and ease finding Commons-images. This is not only about proper categorising and tagging in a true multilingual way, but also about determining and assigning various properties - both automatically and manually.

Purodha

"Gerard Meijssen" gerard.meijssen@gmail.com writes:

Hoi, I am firmly opposed to the idea that the Wikidatification of Commons is about Commons. That is imho a disaster.

It is about mediafiles and they exist in many Wikis.

On 3 September 2014 12:05, James Heald j.heald@ucl.ac.uk wrote:

Gerard,

I agree with you that I would like the kind of tools currently available with WikiData also to be available on CommonsData.

Queries that combine the two in an integrated way ought to be made simple and straightforward.

Does that not make more sense, than requiring the full bulk of the combined database to always be addressed in order to run any query?

-- James.

[1] http://ultimategerardm.blogspot.nl/2014/08/wikidata-my-workflow-enriching-wi...]

http://tools.wmflabs.org/wikidata-todo/autolist.html?q=CLAIM%5B31%3A4167836%...

Gerard Meijssen

7 Sep 7 Sep

2:39 a.m.

New subject: [Wikidata-l] Commons Categories again (was Re: Commons Wikibase)

In my opinion it is silly to associate information about media files with the media file itself. The objective is to search for an image of a "horse" and every image of a "horse" should be included NEVER MIND where the file is "located". When the result is to be restricted to freely licensed images, all images should be included NEVER MIND where the file is "located".

NB I would love to understand why I am wrong in this.

Thanks, GerardM

On 6 September 2014 10:48, P. Blissenbach publi@web.de wrote:

...

Hi

(1) If we want to include media files not on Commons, then we shall have to "include" data from foreign sources such as Flickr or other types of repositories. We must do so without stealing or damaging the authority of these others. If we connect items to media linking them, or if we assign tags, labels, attributes, etc. to foreign media, or make statements involving them, we can of course do so collaboratively, but we cannot assume that other communities will cooperate. Often they will; occasionally they will not, and the latter should not be a hindrance.

(2) Assuming we are incorporating labels, tags and statements (claims) made in other repositories in addition to simple and obvious technical information, we shall first have to decide about incorporating the thesauri, tagging systems, ontologies, or whatever they use.

(3) Much less complicated imho is the initial step to make files on Commons and on other WMF wikis available for searches via WikiData. The goal has to be, imho, that everything we "know" already about them is converted into statements and made available to search queries. Since that involves reading descriptions and turning them into statements about media, we get a finer grained categorizing or tagging system than we have today. It will automatically become more multilingual as data grows. I currently believe that conversion from existing data has to be done at least partially semi-automatically, likely with suggester bots that ask questions such as "Is this cat: o Black, o Brown, o White, o Tigered, ... o Not a cat at all" or "In this sample, you hear the voice of a: o Female, o Male, o Child, o Cannot tell, o Several voices, o No voice at all, ...". That would allow considerable volumes of missing data to be added in little time, starting from categories existing in the wikis.

(4) Searching should most of the time be a matter of making statements about what you want to find. Basic logical operations need to be available so as to limit unwieldy result sets, plus additional stepwise refinements. Semantic Mediawiki or Wolfram Alpha or Library Catalog Search Engines already have many of those ;-)

Purodha

"Gerard Meijssen" gerard.meijssen@gmail.com writes:

Hoi, I am really interested in how you envision searching when all those topics are isolated and attached to each file.

I am also really interested to know, when you have all those files isolated on Commons, how you will include media files that are NOT on Commons. This is a normal use case. Thanks, GerardM

On 3 September 2014 15:33, James Heald j.heald@ucl.ac.uk wrote:

Not really relevant.

The way that this will be achieved will be a "topics" list attached to each file, each topic being a pointer to a Wikidata item.

Sure, Wikidata may be used as one of the sources to help build the topics list; but the topics list will not be on Wikidata, but attached to each file, probably on the CommonsData wikibase.

-- James.

On 03/09/2014 14:28, P. Blissenbach wrote:

I strongly support this view: Wikidata should support and ease finding Commons-images. This is not only about proper categorising and tagging in a true multilingual way, but also about determining and assigning various properties - both automatically and manually.

Think, for example, of an art director creating an image flyer (be it about Wikimania, a national open source movement, or a company) looking for photographs that are "predominantly blue", depicting "8 humans or more" of "various ages" in a "neutral or indeterminate environment", and so on, to get the hang of it.

Purodha

"Gerard Meijssen" gerard.meijssen@gmail.com writes:

Hoi, I am firmly opposed to the idea that the Wikidatification of Commons is about Commons. That is imho a disaster.

It is about mediafiles and they exist in many Wikis.

The categories of Commons are in and of themselves useful only to a very limited extent. Associating the images they refer to with existing items in Wikidata is one way in which they may be useful. As it is, because of naming conventions and the use of English only, the categories are pretty lame. They do not help me at all when I am looking for an image in Commons.

Really, my point is: forget about Commons notability and start thinking in terms of "what does it take to help people find images". Yes, those people will be 8 years old and they may speak Mandarin or Japanese. Thanks, GerardM

On 3 September 2014 12:05, James Heald j.heald@ucl.ac.uk wrote:

Gerard,

I agree with you that I would like the kind of tools currently available with WikiData also to be available on CommonsData.

Queries that combine the two in an integrated way ought to be made simple and straightforward.

What I don't understand is your objection to placing items that really only have a Commons notability, not a world notability, into a specific namespace, or (notionally) the separate database CommonsData, so that it is possible to run those queries that only relate to Commons information solely on CommonsData, and those queries that only relate to world information solely on WikiData.

Does that not make more sense, than requiring the full bulk of the combined database to always be addressed in order to run any query?

-- James.

On 01/09/2014 07:07, Gerard Meijssen wrote:

Hoi, Wikidata is very much a "working database". Its relevance is exactly because of this. Without the connection to the interwiki links, it would not be the same, it would not have the coverage and it would not have the same sized community.

Considerations about secondary use are secondary. Yes, people may use it for their own purposes and when it fits their needs, well and good. When it does not, that is fine too. As it is, we do have all kinds of Wiki "junk" in there. We have disambiguation pages, list articles, templates, categories. The challenge is to find a use for them.

When I add statements based on categories, I "document" many categories [1]. As a result, over 900 items for categories will show the result of a query in the Reasonator. The result is what I think a category could contain given the subject of a category. For Wikipedians they are articles not categorised, red links and blue links.

There are several reasons why this is not (yet) a perfect fit. The most obvious one is including articles that are not part of the selection, e.g. a list in a category full of humans. Currently not everything can be expressed in a way that allows Reasonator to pick things up in a query; dates come to mind. Then there are the categories that have an "arbitrary" set of entries.

I am not going to speculate on what kind of qualifiers Commons will come up with. In essence when you can sort it / select it Wikidata will do a better job for you. The "only" thing we have to do is identify the items that fit the mold. This is something that you can often find the basis for in existing categories. Thanks, GerardM

[1]

http://ultimategerardm.blogspot.nl/2014/08/wikidata-my-workflow-enriching-wi...]

http://tools.wmflabs.org/wikidata-todo/autolist.html?q=CLAIM%5B31%3A4167836%...

P. Blissenbach

3:32 a.m.

New subject: [Wikidata-l] Commons Categories again (was Re: Commons Wikibase)

I have no idea how we can find media without having statements on them of the kind "depicts a (some-item)" or "is an instance of (photograph)", "taken at [Date]", etc., where (items) are represented by Q-something and [values] as usual.

Of course, from "depicts Q112015(=town musicians of Bremen)", we might infer each of "depicts a (donkey)" and "depicts a (dog)" and "depicts a (cat)" and "depicts a (cock)" and likely much more.
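A minimal sketch of that kind of inference, assuming a hypothetical `HAS_PART` lookup that maps an item to the things it is composed of. Only Q112015 is taken from the message above; the plain-text labels stand in for real item identifiers, and the lookup itself is an illustration rather than an existing Wikidata API.

```python
# Hypothetical "has part" lookup; the labels are placeholders for Q-items.
HAS_PART = {
    "Q112015": ["donkey", "dog", "cat", "cock"],  # town musicians of Bremen
}

def inferred_depicts(depicts, has_part):
    """Return the stated 'depicts' values plus everything they imply."""
    result = set(depicts)
    for item in depicts:
        # Add each component of the depicted item, if any are known.
        result.update(has_part.get(item, ()))
    return result

statements = inferred_depicts({"Q112015"}, HAS_PART)
```

A search for "depicts a (cat)" could then match the file through the inferred set, without the statement being entered by hand.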

Having the bulk of statements on the items depicted, recorded, etc. is imho okay. Yet there may be precision applying only to specific media, such as "a _male_ voice recording of (some-literary-work)". In the long run, I believe, we should have these, too, so as to allow precise queries.

Btw., I agree that the actual location of media files should be of little concern. It is represented by a URL; that is it.

Purodha

"Gerard Meijssen" gerard.meijssen@gmail.com wrote:

Hoi, the use case I was thinking of was to include the images that exist, for instance, on English Wikipedia. Flickr and other repositories outside the WMF are very much out of scope as far as I am concerned.

In my opinion it is silly to associate information about media files with the media file itself. The objective is to search for an image of a "horse" and every image of a "horse" should be included NEVER MIND where the file is "located". When the result is to be restricted to freely licensed images, all images should be included NEVER MIND where the file is "located".

NB I would love to understand why I am wrong in this.

Thanks, GerardM

On 6 September 2014 10:48, P. Blissenbach publi@web.de wrote:

Hi

Purodha

"Gerard Meijssen" gerard.meijssen@gmail.com writes:

The way that this will be achieved will be a "topics" list attached to each file, each topic being a pointer to a Wikidata item.

Sure, Wikidata may be used as one of the sources to help build the topics list; but the topics list will not be on Wikidata, but attached to each file, probably on the CommonsData wikibase.

-- James.

Purodha

"Gerard Meijssen" gerard.meijssen@gmail.com writes:

Hoi, I am firmly opposed to the idea that the Wikidatification of Commons is about Commons. That is imho a disaster.

It is about mediafiles and they exist in many Wikis.

On 3 September 2014 12:05, James Heald j.heald@ucl.ac.uk wrote:

Gerard,

I agree with you that I would like the kind of tools currently available with WikiData also to be available on CommonsData.

Queries that combine the two in an integrated way ought to be made simple and straightforward.

Does that not make more sense, than requiring the full bulk of the combined database to always be addressed in order to run any query?

-- James.

[1]http://ultimategerardm.blogspot.nl/2014/08/wikidata-my-workflow-enriching-wi...]]

http://tools.wmflabs.org/wikidata-todo/autolist.html?q=CLAIM%5B31%3A4167836%...

James Heald

14 Sep 14 Sep

12:14 a.m.

New subject: [Wikidata-l] Commons file-topic searching and storage (was Re: Commons Categories again)

Hey, all!

This new post is to respond to various points from GerardM and P. Blissenbach, previously responding to me on wikidata-l. I'm also cross-posting it to multimedia-l, who no doubt will be able to put me straight about lots of things.

In particular, where will "topics" to be associated with image files be stored, and how will they be searched ?

* Where will topics be stored ? *

On the question of where the list of topics will be stored, the initial thoughts of the Structured Data team would seem to be clear: they are to be stored on the new CommonsData wikibase.

See eg:
https://commons.wikimedia.org/w/index.php?title=File%3AStructured_Data_-_Sli... ("topic links")
https://docs.google.com/document/d/1tzwGtXRyK3o2ZEfc85RJ978znRdrf9EkqdJ0zVjm... (API design and class diagram)

* How would topics be searched ? *

Gerard wrote:

...

I am really interested how you envision searching when all those topics are isolated and attached to each file..

The trite answer is: in the same way you would search any other database -- by setting an index.

It should be very very simple to pull the identities of all files on CommonsData related to topic Qnnnnn.
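A minimal sketch of such an index, assuming the per-file topic lists are available as a simple mapping. The file names and topic labels are placeholders standing in for real files and Q-numbers, and nothing here reflects how CommonsData would actually store them.

```python
from collections import defaultdict

# Hypothetical per-file topic lists; labels stand in for Q-numbers.
file_topics = {
    "File:Example_A.jpg": {"pig", "country house"},
    "File:Example_B.jpg": {"pig"},
    "File:Example_C.jpg": {"mona lisa"},
}

def build_topic_index(file_topics):
    """Invert file -> topics into topic -> set of files."""
    index = defaultdict(set)
    for file_name, topics in file_topics.items():
        for topic in topics:
            index[topic].add(file_name)
    return index

def files_for_topic(index, topic):
    """Look up all files for one topic without scanning every file."""
    return sorted(index.get(topic, set()))

topic_index = build_topic_index(file_topics)
```

With the index built once, pulling all files for a topic is a single lookup, for example `files_for_topic(topic_index, "pig")`.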

* Why not store information with the Q-items on WikiData, regarding what files are related ? *

One could do this. Essentially what we have here is a many-many join. Each file can have many topics. Each topic can have many files. So the classic relational approach would be a separate join table.

Moving the information out of main Wikidata makes Wikidata smaller and leaner to query, particularly for queries that simply aren't interested in images.

As to whether you really do have a join table, or whether you just consider it all part of CommonsData, that's really up to the developers.
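For illustration only, the classic relational version might look like the sketch below, using an in-memory SQLite database. The table name `file_topic`, its columns, and the sample rows are all assumptions made for this example, not part of any published design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per (file, topic) pair: the many-to-many join table.
    CREATE TABLE file_topic (
        file_name TEXT NOT NULL,
        topic_id  TEXT NOT NULL,
        PRIMARY KEY (file_name, topic_id)
    );
    -- Index so that lookups by topic do not scan the whole table.
    CREATE INDEX idx_file_topic_topic ON file_topic (topic_id);
""")
conn.executemany(
    "INSERT INTO file_topic (file_name, topic_id) VALUES (?, ?)",
    [
        ("File:Spot.jpg", "pig"),
        ("File:Spot.jpg", "country house"),
        ("File:Mona.jpg", "mona lisa"),
    ],
)

def files_with_topic(conn, topic_id):
    """All files attached to one topic."""
    rows = conn.execute(
        "SELECT file_name FROM file_topic WHERE topic_id = ? ORDER BY file_name",
        (topic_id,),
    )
    return [row[0] for row in rows]

def topics_of_file(conn, file_name):
    """All topics attached to one file."""
    rows = conn.execute(
        "SELECT topic_id FROM file_topic WHERE file_name = ? ORDER BY topic_id",
        (file_name,),
    )
    return [row[0] for row in rows]
```

The same table answers both directions of the join: files for a topic, and topics for a file.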

* What about the natural hierarchical structure ? *

Leonardo da Vinci --> Mona Lisa --> --> Files depicting the Mona Lisa

Shouldn't the fact that it was Leonardo that painted the Mona Lisa only be stored in one place, on Mona Lisa, (or perhaps on Leonardo); but *not* multiple times, separately on every single depiction file?

*A*: Probably not, for several reasons.

Trying to find things (and also, to accurately represent things) in a hierarchical structure is the bane of Commons at the moment; it also makes searching Wikidata significantly non-trivial.

So the most significant reason is retrieval.

Suppose we have an image with topic "Gloucestershire Old Spot" (a breed of pig). We also want to be able to retrieve the image rapidly if somebody keys in "Pig".

Similarly, if we have an image of the "Mona Lisa", we also want it to be in the set of images returned when somebody keys in "Leonardo".

For simple searches, one could imagine walking down the wikidata tree from "Pig" or from "Leonardo", compiling a list of derived search terms, and then building a union set of hits. Slightly more cumbersome than just pulling everything tagged "Pig" from a relational database, but not so different from what WDQ manages.
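As a rough sketch of that tree walk, assume a hypothetical `children` mapping (topic to more specific topics) and a topic-to-files index. Both structures and all labels are invented for illustration; this is not how WDQ is implemented.

```python
# Hypothetical hierarchy: each topic maps to its more specific topics.
children = {
    "pig": ["gloucestershire old spot"],
    "leonardo": ["mona lisa"],
}

# Hypothetical index of files by the topic they are directly tagged with.
topic_index = {
    "pig": {"File:Pig.jpg"},
    "gloucestershire old spot": {"File:Spot.jpg"},
    "mona lisa": {"File:Mona.jpg"},
}

def descendants(children, root):
    """Return root plus every topic reachable below it."""
    seen = set()
    stack = [root]
    while stack:
        topic = stack.pop()
        if topic in seen:
            continue  # guards against revisiting shared subtrees
        seen.add(topic)
        stack.extend(children.get(topic, ()))
    return seen

def search_by_walking(children, topic_index, root):
    """Union the hits of the root topic and all derived search terms."""
    hits = set()
    for topic in descendants(children, root):
        hits |= topic_index.get(topic, set())
    return hits
```

The cost of each search grows with the size of the subtree below the search term, which is the overhead discussed in the following paragraphs.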

However, suppose one is combining "pig" and "country house", does one then have to go down the tree to first identify every single country house, and unify the hits for each one of those searches, before computing the intersection with "pig" ? Or does one instead simply go through the hitset for "pig" and see if it is also tagged "country house" ?
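The second option can be sketched as follows, assuming each file carries a denormalised tag set that already includes its broader topics. The data and the helper name `refine` are hypothetical.

```python
# Hypothetical denormalised tags: each file lists specific and broader topics.
file_tags = {
    "File:Spot_at_manor.jpg": {"gloucestershire old spot", "pig", "country house"},
    "File:Pig.jpg": {"pig"},
    "File:Manor.jpg": {"country house"},
}

def hits_for(file_tags, topic):
    """Initial hit set: every file tagged with the topic."""
    return {f for f, tags in file_tags.items() if topic in tags}

def refine(hits, file_tags, extra_topic):
    """Keep only the hits that also carry the extra topic."""
    return {f for f in hits if extra_topic in file_tags[f]}

pig_hits = hits_for(file_tags, "pig")
pig_and_house = refine(pig_hits, file_tags, "country house")
```

The refinement only touches the files already in the hit set, so no separate enumeration of every country house is needed.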

Now it's not a bad idea to identify "lead topics" and "implied topics" associated with an image. Each time a new topic was added to an image, one would want a lookup to be made on Wikidata and a list of implied topics also to be added. Similarly if a topic identified as a "lead topic" was changed (eg perhaps a country house had been mis-identified), one would also want the list of implied topics to be updated (eg what county it was in, which family it was associated with, etc).

Also the system would need to be looking out for relevant changes on Wikidata -- eg if, as a result of a new claim being added ("Gloucestershire Old Spot is a type of Pig"), what was previously an independent lead topic "Pig" became an implied topic.

Similarly, if something in the chain of implications was changed, the consequences of that change would need to be reflected (eg if a parish that the country house was in had been assigned to the wrong county; or a work that the work was derivative of had been assigned to the wrong painter).

Having to monitor such things is the price of denormalisation.
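A small sketch of that bookkeeping, assuming a hypothetical `parents` mapping from a topic to its broader topics and an acyclic hierarchy. It recomputes which of a file's tagged topics are lead and which are implied, so it can be re-run whenever a relevant claim changes.

```python
def ancestors(parents, topic):
    """All broader topics reachable from the given topic."""
    found = set()
    stack = [topic]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

def split_topics(parents, tagged):
    """Split a file's tagged topics into (lead, implied) sets."""
    implied = set()
    for topic in tagged:
        implied |= ancestors(parents, topic)
    lead = set(tagged) - implied
    return lead, implied

tagged = {"gloucestershire old spot", "pig"}

# Before the new claim exists, both topics are independent lead topics.
before = split_topics({}, tagged)

# After "Gloucestershire Old Spot is a type of Pig" is added,
# "pig" is implied by the more specific lead topic.
after = split_topics({"gloucestershire old spot": ["pig"]}, tagged)
```

Re-running `split_topics` for the affected files is the kind of update that could be done lazily in the background.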

The question one has to ask is what is more troublesome: having to propagate changes like this to multiple places in a denormalised structure where multiple copies of the same information need to be present (which can be done in quite a lazy background way); or, alternatively, having to navigate the normalised structure every time a user wants to build a results set, an overhead which directly affects the speed at which the user can be returned those results ?

* How will searching by users likely be done in practice ? *

A classic approach in combinatorial searching is to give the user an initial set of hits, and then encourage them to refine that set.

This implies, on the basis of the current query and hit-set, trying to identify the best refinement options to offer them.

There may be classic properties like location and time-period. Or there may be tags that can be identified as particularly rich in the return set. Or properties which those tags are the values of that are particularly rich in the return set.

But a really classic approach in image searching is simpler than that.

It simply shows a random selection of images from the current hit set, lets the user reveal the tags that are associated with any one of them, and then lets the user add one of those tags to the user's query.

This is how, in the first instance, I would expect an image search on topics to be first implemented -- because it's such a well-known technique, often works so well, and is so (comparatively) straightforward to implement.

So that's why (IMO) the ability to refine searches by adding another topic needs to be so fast and responsive. In terms of design, this is the optimisation that will affect user experience.
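The refinement loop described above could be sketched roughly as below. The data, function names, and fixed random seed are assumptions for illustration only.

```python
import random

# Hypothetical denormalised tags per file.
file_tags = {
    "File:Spot_at_manor.jpg": {"pig", "country house"},
    "File:Pig_in_field.jpg": {"pig", "field"},
    "File:Manor.jpg": {"country house"},
}

def run_query(file_tags, query):
    """Files whose tags include every topic in the query."""
    return {f for f, tags in file_tags.items() if query <= tags}

def sample_hits(hits, k, rng):
    """A random selection of at most k files from the hit set."""
    return rng.sample(sorted(hits), min(k, len(hits)))

def tags_to_offer(file_name, file_tags, query):
    """Tags of one shown file that are not yet part of the query."""
    return sorted(file_tags[file_name] - query)

def add_topic(query, topic):
    """Return a new, narrower query including the chosen topic."""
    return query | {topic}

rng = random.Random(0)  # fixed seed only to keep the sketch repeatable
query = frozenset({"pig"})
hits = run_query(file_tags, query)
shown = sample_hits(hits, 2, rng)
narrower = add_topic(query, "country house")
```

Each refinement step is one more membership test per hit, which is why the denormalised tag sets keep this interaction fast.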

* What about images stored on local language wikis? *

Gerard wrote:

...

I also am really interested to know when you have all those files isolated on Commons, how you will include media files that are NOT on Commons.. This is a normal use case.

The project is called Structured Data for Commons, and the wikibase being built for it is quite often being called CommonsData.

But it seems to me there is no particular reason why it should not be straightforward to roll out essentially the same structure to local language wikis as well.

I would have thought it would be fairly easy to then implement a federated search, that finds all files matching these criteria on *either* Commons *or* en-wiki (say).
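A sketch of what such a federated search might look like in middleware, assuming each wiki exposes the same file-to-tags structure. The wiki keys and file names are placeholders, and no real API is being called.

```python
# Hypothetical per-wiki tag stores sharing one structure.
wiki_tags = {
    "commons": {
        "File:Horse_free.jpg": {"horse"},
        "File:Pig.jpg": {"pig"},
    },
    "enwiki": {
        "File:Horse_fair_use.jpg": {"horse"},
    },
}

def federated_search(wiki_tags, query):
    """Run the same query on every wiki and merge the results."""
    results = []
    for wiki, file_tags in wiki_tags.items():
        for file_name, tags in file_tags.items():
            if query <= tags:
                # Keep the wiki name so the caller knows where the file lives.
                results.append((wiki, file_name))
    return sorted(results)

horse_files = federated_search(wiki_tags, {"horse"})
```

Because each result records its source wiki, a caller could still filter by location or licence afterwards.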

Would one actually implement that all in one wikibase (ImagesData, say, rather than CommonsData) ? That's a call I'd leave to the experts.

On the one hand, it probably would make it easy to search for all files matching the criteria on *any* wiki.

What I suspect is more likely, and probably makes more sense, is to converge the images themselves to all live in one place. So if the same fair-use image was used on multiple fair-use wikis, it would only be stored once (though each fair-use wiki would retain its own File page for it). Such a structure should also make transfers to Commons much easier -- compared to the copy-and-paste by bot at the moment, which loses all the file-page history and most of the upload history.

But there are blockers in the way of that at the moment -- in particular, blockers that need to be addressed for image patrollers and fair-use enforcement specialists still to be able to do their job in such a set-up. To start with there are lots of tools they use, that at the moment only run on one wiki but would need to effectively run on two (or perhaps, the fact that it was two wikis would need to be hidden). They would need equivalent admin and deletion rights on both xx-wiki and the xx partition of Images wiki. Ideally they would be able to see changes to the two on the same watchlist. etc etc.

So it may be some time before running the same image search across all wikis can be supported by the system itself. But it will surely be supported through middleware sooner than that.

So that's some thoughts (or maybe some mis-thoughts) about file-topic searching and storage.

Now, tell me what I've got wrong. :-)

All best,

James.

Jan Ainali

1:15 a.m.

New subject: [Wikidata-l] [Multimedia] Commons file-topic searching and storage (was Re: Commons Categories again)

Just answering a few bits now.

2014-09-13 18:14 GMT+02:00 James Heald j.heald@ucl.ac.uk:

...

* Where will topics be stored ? *

On the question of where the list of topics will be stored, the initial thoughts of the Structured Data team would seem to be clear: they are to be stored on the new CommonsData wikibase.

See eg:
https://commons.wikimedia.org/w/index.php?title=File%3AStructured_Data_-_Slides.pdf&page=17 ("topic links")
https://docs.google.com/document/d/1tzwGtXRyK3o2ZEfc85RJ978znRdrf9EkqdJ0zVjmQqs (API design and class diagram)

I read that as topics will be stored on Wikidata. That is, on Commons, you say that file DouglasAdams.jpg is about topic Q42, which is referring to an object on Wikidata. Everything about Q42 is stored on Wikidata.

* What about the natural hierarchical structure ? *

...

eg

Leonardo da Vinci --> Mona Lisa --> --> Files depicting the Mona Lisa

Shouldn't the fact that it was Leonardo that painted the Mona Lisa only be stored in one place, on Mona Lisa, (or perhaps on Leonardo); but *not* multiple times, separately on every single depiction file?

*A*: Probably not, for several reasons.

If the topics are on Wikidata, you will have this "for free", meaning that the hierarchy is already there, ready to be exploited.

/Jan Ainali

James Heald

2:15 a.m.

New subject: [Wikidata-l] [Multimedia] Commons file-topic searching and storage (was Re: Commons Categories again)

On 13/09/2014 18:15, Jan Ainali wrote:

...

2014-09-13 18:14 GMT+02:00 James Heald j.heald@ucl.ac.uk:

...

...

* Where will topics be stored ? *

On the question of where the list of topics will be stored, the initial thoughts of the Structured Data team would seem to be clear: they are to be stored on the new CommonsData wikibase.

See eg: https://commons.wikimedia.org/w/index.php?title=File%3AStructured_Data_-_Slides.pdf&page=17 ("topic links")

I read that as topics will be stored on Wikidata. That is, on Commons, you say that file DouglasAdams.jpg is about topic Q42, which is referring to an object on Wikidata. Everything about Q42 is stored on Wikidata.

Yes, I imagine you would store say

Q42182 (pointing to Buckingham Palace), probably with P180 ("depicts", as opposed to "signature of" or "chemical structure for")

But I suspect you would also store eg

Q16560 (palace) etc; even though this is implied by Buckingham Palace
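To make the storage idea concrete, here is a minimal sketch of what a file's record might hold if both the specific topic and the implied, more general topic were stored. The class names, the `inferred` flag and the file title are assumptions made for illustration only; the Q-numbers and P180 are the ones mentioned above, and storing the implied topic under the same property is itself an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class TopicStatement:
    prop: str               # property ID, e.g. "P180" ("depicts")
    topic: str              # Q-number of the item on Wikidata
    inferred: bool = False  # hypothetical flag: implied by another statement


@dataclass
class FileRecord:
    title: str
    statements: List[TopicStatement] = field(default_factory=list)

    def topics(self, include_inferred: bool = True) -> Set[str]:
        """Q-numbers a search could match this file on."""
        return {
            s.topic
            for s in self.statements
            if include_inferred or not s.inferred
        }


record = FileRecord(
    title="Buckingham_Palace.jpg",  # hypothetical file name
    statements=[
        TopicStatement("P180", "Q42182"),                 # Buckingham Palace
        TopicStatement("P180", "Q16560", inferred=True),  # palace (implied)
    ],
)
```

Marking the implied statement as inferred keeps it distinguishable from what an editor entered by hand, so it could be regenerated or dropped without touching the explicit tag.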

...

* What about the natural hierarchical structure ? *

...
eg

Leonardo da Vinci --> Mona Lisa --> --> Files depicting the Mona Lisa

Shouldn't the fact that it was Leonardo that painted the Mona Lisa only be stored in one place, on Mona Lisa, (or perhaps on Leonardo); but *not* multiple times, separately on every single depiction file?

*A*: Probably not, for several reasons.

If the topics are on Wikidata, you will have this "for free", meaning that the hierarchy is already there, ready to be exploited.

Yes the hierarchy is there, ready to be exploited.

But exploiting it costs time.

The point I'm making in my post is that, especially when the user request is a combination search on two quite general topics, you don't want to be hanging around *waiting* while the system works out how to exploit it.

Instead you want the answer then and there -- and that means denormalisation.

-- James.

Jan Ainali

2:44 a.m.

New subject: [Wikidata-l] [Multimedia] Commons file-topic searching and storage (was Re: Commons Categories again)

2014-09-13 20:15 GMT+02:00 James Heald j.heald@ucl.ac.uk:

...

On 13/09/2014 18:15, Jan Ainali wrote:

...
2014-09-13 18:14 GMT+02:00 James Heald j.heald@ucl.ac.uk:

* Where will topics be stored ? *

...
...
On the question of where the list of topics will be stored, the initial thoughts of the Structured Data team would seem to be clear: they are to be stored on the new CommonsData wikibase.

See eg: https://commons.wikimedia.org/w/index.php?title=File%3AStructured_Data_-_Slides.pdf&page=17 ("topic links")

I read that as topics will be stored on Wikidata. That is, on Commons, you say that file DouglasAdams.jpg is about topic Q42, which is referring to an object on Wikidata. Everything about Q42 is stored on Wikidata.

Yes, I imagine you would store say

Q42182 (pointing to Buckingham Palace), probably with P180 ("depicts", as opposed to "signature of" or "chemical structure for")

But I suspect you would also store eg

Q16560 (palace) etc; even though this is implied by Buckingham Palace

...

* What about the natural hierarchical structure ? *

...
eg

Leonardo da Vinci --> Mona Lisa --> --> Files depicting the Mona Lisa

Shouldn't the fact that it was Leonardo that painted the Mona Lisa only be stored in one place, on Mona Lisa, (or perhaps on Leonardo); but *not* multiple times, separately on every single depiction file?

*A*: Probably not, for several reasons.

If the topics are on Wikidata, you will have this "for free", meaning that the hierarchy is already there, ready to be exploited.

Yes the hierarchy is there, ready to be exploited.

But exploiting it costs time.

The point I'm making in my post is that, especially when the user request is a combination search on two quite general topics, you don't want to be hanging around *waiting* while the system works out how to exploit it.

Instead you want the answer then and there -- and that means denormalisation.

Let the ops worry about time; I have not heard them complain about a search dystopia yet. Even the Wiki Data Query has a reasonable response time compared to the power it offers in its queries. And that is on wmflabs, not a production server. You're saying that even when we make the effort to get structured linked data, we should not exploit the single most important advantage it offers. That does not make sense. It is almost like just repeating the category system again but with other software (albeit software that offers multilinguality).

/Jan

James Heald

3:51 a.m.

New subject: [Wikidata-l] [Multimedia] Commons file-topic searching and storage (was Re: Commons Categories again)

"Let the ops worry about time" is not an answer.

We're talking about something we're hoping to turn into a world-class mass-use image bank, and its front-line public-facing search capability.

That's on an altogether different scale to WDQ running a few hundred searches a day.

Moreover, we're talking about a public-facing search capability, where your user clicks a tag and they want an updated results set *instantly* -- their sitting around while the server makes a cup of tea, or declares the query is too complex and goes into a sulk, is not an option.

If the user wants a search on "palace" and "soldier", there simply is not time for the server to first recursively build a list of every palace it knows about, then every image related to each of those palaces, then every soldier it knows about, every image related to each of those soldiers, then intersect the two (very big) lists before it can start delivering any image hits at all. That is not acceptable. A random internet user wants those hits straight away.

The only way to routinely be able to deliver that is denormalisation.
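A toy sketch of what denormalisation would mean here: the hierarchy is walked once, when a file is tagged, and every ancestor topic is written into an inverted index, so that a combined search is a single set intersection at query time. The hierarchy, the file names and the plain-word topics (standing in for Q-numbers) are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set

# Hypothetical toy hierarchy: topic -> its direct parent topics.
PARENTS: Dict[str, Set[str]] = {
    "Buckingham Palace": {"palace"},
    "Palace of Versailles": {"palace"},
    "guardsman": {"soldier"},
    "infantryman": {"soldier"},
    "palace": {"building"},
}

# Hypothetical files, each tagged only with its most specific topics.
FILE_TAGS: Dict[str, Set[str]] = {
    "Changing_the_Guard.jpg": {"Buckingham Palace", "guardsman"},
    "Versailles.jpg": {"Palace of Versailles"},
    "Trench_1916.jpg": {"infantryman"},
}


def ancestors(topic: str) -> Set[str]:
    """Every topic reachable by following parent links upwards."""
    seen: Set[str] = set()
    stack = [topic]
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def build_index(file_tags: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Write-time step: index each file under its tags and all their ancestors."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, tags in file_tags.items():
        for tag in tags:
            for topic in {tag} | ancestors(tag):
                index[topic].add(name)
    return index


def search(index: Dict[str, Set[str]], *topics: str) -> Set[str]:
    """Query-time step: no hierarchy walk, just intersect precomputed sets."""
    if not topics:
        return set()
    return set.intersection(*(index.get(topic, set()) for topic in topics))


INDEX = build_index(FILE_TAGS)
```

With this layout, `search(INDEX, "palace", "soldier")` never touches `PARENTS`; the cost of walking the hierarchy has been moved to the moment a file is tagged or the hierarchy changes.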

It's not a question of just buying some more blades and filling up some more racks. That doesn't get you a big enough factor of speedup.

What we have is a design challenge, which needs a design solution.

-- James.

...

Let the ops worry about time; I have not heard them complain about a search dystopia yet. Even the Wiki Data Query has a reasonable response time compared to the power it offers in its queries. And that is on wmflabs, not a production server. You're saying that even when we make the effort to get structured linked data, we should not exploit the single most important advantage it offers. That does not make sense. It is almost like just repeating the category system again but with other software (albeit software that offers multilinguality).

/Jan

Multimedia mailing list Multimedia@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/multimedia

Thomas Douillard

3:56 a.m.

New subject: [Wikidata-l] [Multimedia] Commons file-topic searching and storage (was Re: Commons Categories again)

Hi James, I don't understand (I must admit I did not read the whole thread). Are we talking about a specific query engine? The one the development team will implement in Wikibase, or something else?

If we do not know that, it seems difficult to have this conversation at this point.

2014-09-13 21:51 GMT+02:00 James Heald j.heald@ucl.ac.uk:

...

"Let the ops worry about time" is not an answer.

We're talking about something we're hoping to turn into a world-class mass-use image bank, and its front-line public-facing search capability.

That's on an altogether different scale to WDQ running a few hundred searches a day.

Moreover, we're talking about a public-facing search capability, where your user clicks a tag and they want an updated results set *instantly* -- their sitting around while the server makes a cup of tea, or declares the query is too complex and goes into a sulk, is not an option.

If the user wants a search on "palace" and "soldier", there simply is not time for the server to first recursively build a list of every palace it knows about, then every image related to each of those palaces, then every soldier it knows about, every image related to each of those soldiers, then intersect the two (very big) lists before it can start delivering any image hits at all. That is not acceptable. A random internet user wants those hits straight away.

The only way to routinely be able to deliver that is denormalisation.

It's not a question of just buying some more blades and filling up some more racks. That doesn't get you a big enough factor of speedup.

What we have is a design challenge, which needs a design solution.

-- James.

Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l

James Heald

6:14 a.m.

New subject: [Wikidata-l] [Multimedia] Commons file-topic searching and storage (was Re: Commons Categories again)

Hi Thomas,

I'm not really talking about the specific query *engine* that will work on the file topic data. (Well, maybe a little, in general terms about some of the functionality we might want in such a search).

What I'm more talking about is the kind of data that will likely need to be stored on the CommonsData wikibase to make any such query engine *possible* with reasonable speed -- in particular not just the most specific Q-numbers that apply to a file, but (IMO) *any* Q-number for which the file should be returned if the topic corresponding to that Q-number was searched for.

I'm saying that such a Q-number needs to be included on the item on CommonsData for the file -- it's not enough that, if one used Wikidata to look up the more specific Q-number, the less specific Q-number would be returned: I'm saying that lookup already needs to have been done (and maintained), so the less specific Q-number is already sitting on CommonsData when someone comes to search for it.

This doesn't need to be a manual process (though the presence of a Q-number on a CommonsData item perhaps needs to be subject to manual override, in case the inference chain has gone wrong and it really isn't relevant); but what I'm saying is that you can't wait to do the inference when the search request comes in -- instead the relevant Q-numbers for each file need to be pre-computed and stored on the CommonsData item, so that when the search request comes in, they are already there to be searched on. That denormalisation of information really needs to be in place whatever the fine coding of the engine -- it's data design, rather than engine coding.
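The pre-computation with a manual override might look roughly like the sketch below. It assumes a simple parent map and a per-file set of suppressed Q-numbers; both structures, the function name and the `Q-BUILDING` placeholder are hypothetical. Only Q42182 and Q16560 come from earlier in the thread.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Hypothetical parent map: Q-number -> more general Q-numbers it implies.
# "Q-BUILDING" is a placeholder, not a real Wikidata ID.
PARENTS: Dict[str, Set[str]] = {
    "Q42182": {"Q16560"},      # Buckingham Palace -> palace
    "Q16560": {"Q-BUILDING"},  # palace -> (placeholder for a broader class)
}


def expand(
    explicit: Iterable[str],
    parents: Dict[str, Set[str]],
    suppressed: FrozenSet[str] = frozenset(),
) -> Set[str]:
    """Return the explicit Q-numbers plus every inferred ancestor.

    A suppressed Q-number is neither stored nor followed further, so a
    manual override cuts off the rest of that inference chain as well.
    """
    stored = set(explicit)
    stack = list(stored)
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in stored and parent not in suppressed:
                stored.add(parent)
                stack.append(parent)
    return stored
```

The result would be recomputed and written back to the file's item whenever its explicit tags, its overrides, or the relevant part of the hierarchy changed, which is the "maintained" part of the proposal.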

-- James.

On 13/09/2014 20:56, Thomas Douillard wrote:

...

Hi James, I don't understand (I must admit I did not read the whole thread). Are we talking about a specific query engine? The one the development team will implement in Wikibase, or something else?

If we do not know that, it seems difficult to have this conversation at this point.

P. Blissenbach

6:36 a.m.

New subject: [Wikidata-l] [Multimedia] Commons file-topic searching and storage

A very simplified version of the post:

Do we need a query result cache at the tag or item level? That is a design question, and the answer is: yes.

If we have it, we can and should pre-fill it with server-generated data for all kinds of queries that have not yet been received and executed.

Whether we assign idle resources or an entire high-performance server cluster to it is an ops question.
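A minimal sketch of such a pre-filled cache, assuming a topic-to-files index already exists: results for every single topic and every pair of topics are computed ahead of time and stored under an order-independent key. The index contents, the function names and the two-term limit are assumptions for illustration; the number of combinations grows quickly, so a real system would have to choose which queries are worth pre-filling.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set

# Hypothetical topic -> files index; plain words stand in for Q-numbers.
INDEX: Dict[str, Set[str]] = {
    "palace": {"a.jpg", "b.jpg"},
    "soldier": {"b.jpg", "c.jpg"},
    "horse": {"c.jpg"},
}


def prefill_cache(
    index: Dict[str, Set[str]], max_terms: int = 2
) -> Dict[FrozenSet[str], Set[str]]:
    """Precompute results for every combination of up to max_terms topics."""
    cache: Dict[FrozenSet[str], Set[str]] = {}
    topics = sorted(index)
    for size in range(1, max_terms + 1):
        for combo in combinations(topics, size):
            hits = set.intersection(*(index[topic] for topic in combo))
            if hits:  # empty results are not stored
                cache[frozenset(combo)] = hits
    return cache


def cached_search(cache: Dict[FrozenSet[str], Set[str]], *topics: str) -> Set[str]:
    """Look the query up; a missing key means no files matched at pre-fill time."""
    return cache.get(frozenset(topics), set())


CACHE = prefill_cache(INDEX)
```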

Purodha

"James Heald" j.heald@ucl.ac.uk wrote:

...

Hi Thomas,

I'm not really talking about the specific query *engine* that will work on the file topic data. (Well, maybe a little, in general terms about some of the functionality we might want in such a search).

What I'm more talking about is the kind of data that will likely need to be stored on the CommonsData wikibase to make any such query engine *possible* with reasonable speed -- in particular not just the most specific Q-numbers that apply to a file, but (IMO) *any* Q-number for which the file should be returned if the topic corresponding to that Q-number was searched for.

I'm saying that such a Q-number needs to be included on the item on CommonsData for the file -- it's not enough that, if one used Wikidata to look up the more specific Q-number, the less specific Q-number would be returned: I'm saying that lookup already needs to have been done (and maintained), so the less specific Q-number is already sitting on CommonsData when someone comes to search for it.

This doesn't need to be a manual process (though the presence of a Q-number on a CommonsData item perhaps needs to be subject to manual override, in case the inference chain has gone wrong and it really isn't relevant); but what I'm saying is that you can't wait to do the inference when the search request comes in -- instead the relevant Q-numbers for each file need to be pre-computed and stored on the CommonsData item, so that when the search request comes in, they are already there to be searched on. That denormalisation of information really needs to be in place whatever the fine coding of the engine -- it's data design, rather than engine coding.

-- James.

Thomas Douillard

5:43 p.m.

New subject: [Wikidata-l] [Multimedia] Commons file-topic searching and storage (was Re: Commons Categories again)

You should look at the dev team's plans for the query engine. Queries will be associated with a query item, and the results of the query will be cached and maintained by the Wikibase software as the data is modified, if I understand correctly.

So this discussion will make sense once we know how powerful the query engine will be.

Otherwise we are talking into the void. Which "norm" are we using, what should we denormalize? According to which rules? To optimize exactly what?

If it's just the parent classes, Reasonator already does that, as do templates like {{Item documentation}} or {{classification}} on Wikidata, without any "denormalization". For an example see https://tools.wmflabs.org/reasonator/?&q=1638134, or the heading of https://www.wikidata.org/wiki/Talk:Q5 for an example of item documentation.

2014-09-14 0:14 GMT+02:00 James Heald j.heald@ucl.ac.uk:

...

Hi Thomas,

I'm not really talking about the specific query *engine* that will work on the file topic data. (Well, maybe a little, in general terms about some of the functionality we might want in such a search).

What I'm more talking about is the kind of data that will likely need to be stored on the CommonsData wikibase to make any such query engine *possible* with reasonable speed -- in particular not just the most specific Q-numbers that apply to a file, but (IMO) *any* Q-number for which the file should be returned if the topic corresponding to that Q-number was searched for.

I'm saying that such a Q-number needs to be included on the item on CommonsData for the file -- it's not enough that, if one used Wikidata to look up the more specific Q-number, the less specific Q-number would be returned: I'm saying that lookup already needs to have been done (and maintained), so the less specific Q-number is already sitting on CommonsData when someone comes to search for it.

This doesn't need to be a manual process (though the presence of a Q-number on a CommonsData item perhaps needs to be subject to manual override, in case the inference chain has gone wrong and it really isn't relevant); but what I'm saying is that you can't wait to do the inference when the search request comes in -- instead the relevant Q-numbers for each file need to be pre-computed and stored on the CommonsData item, so that when the search request comes in, they are already there to be searched on. That denormalisation of information really needs to be in place whatever the fine coding of the engine -- it's data design, rather than engine coding.

-- James.

P. Blissenbach

15 Sep 15 Sep

4:01 a.m.

New subject: [Wikidata-l] [Multimedia] Commons file-topic searching and storage (was Re: Commons Categories again)

"Thomas Douillard" thomas.douillard@gmail.com writes:

...

Which "norm" are we using, what should we denormalize ? According to which rules ?

We're talking about normalized databases vs. ones which aren't. See: http://en.wikipedia.org/wiki/Database_normalization and http://en.wikipedia.org/wiki/Denormalization

Purodha

Thomas Douillard

5:57 p.m.

New subject: [Wikidata-l] [Multimedia] Commons file-topic searching and storage (was Re: Commons Categories again)

Thanks for the link but I wanted some more specific answers :)

There have already been discussions about data redundancies on this mailing list. The ones that I'm confident we can regard as relevant in Wikidata are:

* Class membership, as "subclass of" is a transitive property and "instance of" is defined using "subclass of"
* Inverse relations

Is that what we are talking about?
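For concreteness, both kinds of redundancy can be derived mechanically rather than stored, as in the sketch below. It assumes "instance of" and "subclass of" statements are available as simple maps; the map contents, the function names and the `Q-BUILDING` placeholder are hypothetical, and treating Q42182 as an instance of Q16560 follows the Buckingham Palace example earlier in the thread.

```python
from typing import Dict, Set, Tuple

# Hypothetical statement maps; "Q-BUILDING" is a placeholder ID.
INSTANCE_OF: Dict[str, Set[str]] = {"Q42182": {"Q16560"}}      # item -> classes
SUBCLASS_OF: Dict[str, Set[str]] = {"Q16560": {"Q-BUILDING"}}  # class -> superclasses


def classes_of(item: str) -> Set[str]:
    """All classes an item belongs to: direct classes plus their superclasses."""
    found: Set[str] = set()
    stack = list(INSTANCE_OF.get(item, ()))
    while stack:
        cls = stack.pop()
        if cls not in found:
            found.add(cls)
            stack.extend(SUBCLASS_OF.get(cls, ()))
    return found


def invert(pairs: Set[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Derive the inverse relation, e.g. "has part" from "part of"."""
    return {(obj, subj) for subj, obj in pairs}
```

Storing the output of either function alongside the source statements is exactly the kind of redundancy under discussion; computing it on demand is the normalised alternative.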

2014-09-14 22:01 GMT+02:00 P. Blissenbach publi@web.de:

...

"Thomas Douillard" thomas.douillard@gmail.com writes:

...
Which "norm" are we using, what should we denormalize ? According to which rules ?

We're talking about normalized databases vs. ones which aren't. See: http://en.wikipedia.org/wiki/Database_normalization and http://en.wikipedia.org/wiki/Denormalization

Purodha

Lydia Pintscher

9:38 p.m.

New subject: [Wikidata-l] [Multimedia] Commons file-topic searching and storage (was Re: Commons Categories again)

On Sun, Sep 14, 2014 at 11:43 AM, Thomas Douillard thomas.douillard@gmail.com wrote:

...

You should look at the dev team's plans for the query engine. Queries will be associated with a query item, and the results of the query will be cached and maintained by the Wikibase software as the data is modified, if I understand correctly.

Yes. That is what we call "complex queries". The drawback of them (at least as they are designed right now) is that they are not instant. So we cannot use that for searching images on Commons as you will always want an instant result for a pretty much arbitrary search. This is an issue I've been thinking about for a while now and that we will definitely need to find an answer to. Discussions to have in the next weeks...

Cheers Lydia

Jan Ainali

14 Sep 14 Sep

4:02 a.m.

New subject: [Wikidata-l] [Multimedia] Commons file-topic searching and storage (was Re: Commons Categories again)

2014-09-13 21:51 GMT+02:00 James Heald j.heald@ucl.ac.uk:

...

What we have is a design challenge, which needs a design solution.

And I am just saying that you are jumping to conclusions, because there is no evidence saying that the servers that would be set up to handle this would not be able to handle the load.

Before we deny ourselves functionality that we would really, really like to have, and to avoid replicating the manual labour that we hate and want to get away from, could we please let the ops chip in?

/Jan

P. Blissenbach

5:40 a.m.

New subject: [Wikidata-l] Commons file-topic searching and storage

Just a word of caution about collecting all images in Commons. A while ago, at least, some local wikis had images with license terms incompatible with Commons and vice versa. I recall very simple logos of companies, and several types of "fair use" derivatives.

If that is still so, we have an obstacle that may prevent us from both moving images, and even linking to them under some local laws.

Technically, I agree with the idea quoted below.

Purodha

"James Heald" j.heald@ucl.ac.uk wrote:

...

What I suspect is more likely, and probably makes more sense, is to converge the images themselves to all live in one place. So if the same fair-use image was used on multiple fair-use wikis, it would only be stored once (though each fair-use wiki would retain its own File page for it). Such a structure should also make transfers to Commons much easier -- compared to the copy-and-paste by bot at the moment, which loses all the file-page history and most of the upload history.

James Heald

5:53 a.m.

New subject: [Wikidata-l] Commons file-topic searching and storage

Yes.

Just to be clear, if we did "converge all the images to live in one place", I am not suggesting they would all be free, and I'm not suggesting they would all belong to Commons.

Just that they would all physically live in the same integrated structure; but one that would still appear to the external browser to have different 'partitions', corresponding to the different language wikis, each with a different base url.

(But inside the server all part of one integrated system, making it easy to move a file from a national partition to the Commons partition, or vice-versa -- *if* that was legally appropriate).

-- James.

On 13/09/2014 22:40, P. Blissenbach wrote:

...

Just a word of caution about collecting all images in Commons. A while ago, at least, some local wikis had images with license terms incompatible with Commons and vice versa. I recall very simple logos of companies, and several types of "fair use" derivatives.

If that is still so, we have an obstacle that may prevent us from both moving images, and even linking to them under some local laws.

Technically, I agree with the idea quoted below.

Purodha

Gerard Meijssen

11:25 p.m.

New subject: [Wikidata-l] Commons file-topic searching and storage

Hoi, Incompatible how? The fact that some wikis allow licenses that Commons does not allow does NOT make them incompatible. It means that they use licenses in addition to those of Commons. Technically that is no big deal at all. Thanks, GerardM

Jan Dudík

11:55 p.m.

New subject: [Wikidata-l] Commons file-topic searching and storage

The problem is that when somebody translates an article from en.wiki and copies all the images, they will display even if they have an "incompatible licence", and who will check that? There would also be many problems with people who do not agree with deleting these images from articles. A solution would be a table of wikis that do not allow such images, so that the servers do not display these images on those wikis.
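A minimal sketch of the table idea above, in Python. Everything in it is an assumption made for illustration: the wiki names, the status labels, and the function name are hypothetical and are not part of MediaWiki.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedFile:
    title: str
    status: str  # assumed labels: "free" or "fair-use"


# Hypothetical "table of wikis": wiki id -> file statuses it does not allow.
# The wiki ids are placeholders, not statements about real projects.
DISALLOWED = {
    "wiki-a": {"fair-use"},
    "wiki-b": set(),
}


def may_display(wiki: str, file: SharedFile) -> bool:
    """Return True if the server should render the file on the given wiki."""
    # A wiki missing from the table gets the strictest rule: free files only.
    blocked = DISALLOWED.get(wiki, {"fair-use"})
    return file.status not in blocked


logo = SharedFile("Company_logo.png", "fair-use")
print(may_display("wiki-a", logo))  # wiki-a blocks fair-use files
print(may_display("wiki-b", logo))  # wiki-b blocks nothing
```

With a check like this done at render time, a copied article would simply not show the image on a wiki that disallows it, so nobody has to review each translation by hand.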

JAnD --- Ing. Jan Dudík projekce dopravních staveb tel. 777082195

Gerard Meijssen

15 Sep

12:18 a.m.

New subject: [Wikidata-l] Commons file-topic searching and storage

Hoi, Why would it? A wiki would have a list of permissible licenses. That has nothing to do with Commons and everything to do with standardising licenses so that there is only one for each license. Thanks, GerardM
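As a rough illustration of that point, here is a sketch in Python of a per-wiki list of permissible licenses combined with one canonical identifier per license. The alias table, the identifiers, and the wiki names are all assumptions made up for this example.

```python
# Assumed alias table: several spellings map to one canonical identifier.
CANONICAL = {
    "cc-by-sa-3.0": "CC-BY-SA-3.0",
    "cc by-sa 3.0": "CC-BY-SA-3.0",
    "public domain": "PD",
    "pd": "PD",
    # Fair use is a rationale rather than a license; it is listed here only
    # as a status code so that a wiki can permit or refuse it.
    "fair use": "FAIR-USE",
}

# Hypothetical per-wiki lists of permissible (canonical) identifiers.
PERMITTED = {
    "commons": {"CC-BY-SA-3.0", "PD"},
    "wiki-a": {"CC-BY-SA-3.0", "PD", "FAIR-USE"},
}


def normalise(label: str) -> str:
    """Map a free-text license label to its single canonical identifier."""
    key = " ".join(label.lower().split())
    if key not in CANONICAL:
        raise ValueError(f"unknown license label: {label!r}")
    return CANONICAL[key]


def is_permitted(wiki: str, label: str) -> bool:
    """Check a label against the wiki's list; unknown wikis permit nothing."""
    return normalise(label) in PERMITTED.get(wiki, set())


print(is_permitted("commons", "CC BY-SA 3.0"))
print(is_permitted("commons", "Fair use"))
print(is_permitted("wiki-a", "Fair use"))
```

The normalisation step is what makes the per-wiki lists comparable: without one identifier per license, two wikis could not tell whether they permit the same thing.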

Joe Filceolaire

2:21 a.m.

New subject: [Wikidata-l] Commons file-topic searching and storage

Except that the problem isn't incompatible licenses; it's a lack of licenses.

Most pix uploaded to Wikipedias have no license. They are there under fair use rationales which are specific to each use and to the laws which apply in the countries using that language. These pix are not free to reuse. Each reuse needs a new fair use rationale to justify it.

That is why I think we should limit CommonsData to files on Commons, at least for now.

Joe

Gerard Meijssen

2:28 a.m.

New subject: [Wikidata-l] Commons file-topic searching and storage

Hoi, The consequence would be that we cannot deal with these files. We cannot even know what they are about. We cannot target them for replacement by freely licensed files.

Having access to them, knowing about them is different from using them.

Files with a "fair use" rationale are categorised by their being available for "fair use" reasons. Marking them as such is not hard and it is not controversial. Making them unavailable for analysis is. Thanks, GerardM

Lydia Pintscher

9:29 p.m.

New subject: [Wikidata-l] Commons file-topic searching and storage

Hey :)

Just an update from my side: We will keep non-Commons images in mind when designing the system. The goal is to provide them with structured data support as well. However, initially we will concentrate on Commons to get it to work there, as that is where we can have the highest impact.

Cheers Lydia

Scott MacLeod

16 Sep

12:57 a.m.

New subject: [Wikidata-l] Commons file-topic searching and storage

Hi Lydia and Wikidatans,

In what ways are Wikidata developers planning for Creative Commons' databases for images etc. (http://search.creativecommons.org/), as well as for interoperability beyond Wikimedia Commons' file-topic searching and storage, especially if these CC databases already have structured data support (and perhaps vis-a-vis Maxime's Google TOS question as well)? Thanks.

Cheers, Scott

-- - Scott MacLeod - Founder & President - 415 480 4577 - http://worlduniversityandschool.org - World University and School - like Wikipedia with MIT OpenCourseWare (not endorsed by MIT OCW) - incorporated as a nonprofit university and school in California, and is a U.S. 501 (c) (3) tax-exempt educational organization, both effective April 2010. World University and School is sending you this because of your interest in free, online, higher education. If you don't want to receive these, please reply with 'unsubscribe' in the subject line. Thank you.

Lydia Pintscher

17 Sep

8:01 p.m.

New subject: [Wikidata-l] Commons file-topic searching and storage

On Mon, Sep 15, 2014 at 6:57 PM, Scott MacLeod worlduniversityandschool@gmail.com wrote:

...

Hi Lydia and Wikidatans,

In what ways are Wikidata developers planning for Creative Commons' databases for images etc. (http://search.creativecommons.org/), as well as for interoperability beyond Wikimedia Commons' file-topic searching and storage, especially if these CC databases already have structured data support (and perhaps vis-a-vis Maxime's Google TOS question as well)? Thanks.

Hey Scott :)

What kind of interaction/integration do you have in mind?

Cheers Lydia

Scott MacLeod

18 Sep

7:40 a.m.

New subject: [Wikidata-l] Commons file-topic searching and storage

Hi Lydia and Wikidatans,

My questions about Creative Commons / Wikidata integration and interaction have to do with Creative Commons' entities and resources, as examples of external sister projects. Are there any examples so far of Creative Commons-licensed databases that interact, integrate or interoperate with Wikidata, and which might parallel or build on the ways in which Wikidata is exploring accessing Wikimedia Commons, especially interlingually? And what are other examples of Wikidata being used by external sister projects, and in each language? Where (at what URLs) will such lists emerge?

In thinking about the Wikidata / Wikimedia Commons / Creative Commons roadmap ahead, I'm curious what the possible scenarios are, especially for using SemanticWiki / Wikidata in the Creative Commons sphere, and in an expansive way, even for the creation, for example, of Creative Commons-related coding jobs in many or all 7,106+ languages, and interlingually (and vis-a-vis WUaS).

What are Wikidata's nascent plans to support innovations in Creative Commons' structured data projects in the future, for example?

In what ways too might metrics, such as Google Analytics, Watson Analytics or MediaWiki Analytics, for example, be anticipated innovatively roadmap-wise, and interlingually, and especially vis a vis SemanticWiki/Wikidata and CC databases?

Thanks, Scott

Gerard Meijssen

14 Sep

11:17 p.m.

New subject: [Wikidata-l] Commons file-topic searching and storage (was Re: Commons Categories again)

Hoi, James, you assume a database but in reality it is not "setting an index" at all. In essence this is NOT a relational database. Therefore it is NOT that simple.

Also the Wikidatification is a failure when it does not work for all languages equally well. That is not to say that the results will be the same; technically it should work equally well. That already is a big challenge.

Denormalisation again has everything to do with relational databases. So in essence, you miss the point. Thanks, GerardM

Joe Filceolaire

2 Sep

12:43 a.m.

New subject: [Wikidata-l] Commons Categories again (was Re: Commons Wikibase)

James I think the problem is not as difficult as you have described.

If we look at http://www.wikidata.org/wiki/Wikidata:Notability then you will see that each Wikimedia Commons page can have a corresponding item. The comment that "a sitelink to a category page in Wikimedia Commons is *not* allowed on main article items" means that Commons Category pages should link to Category items and not to items linked to Wikipedia articles. It does not mean items linked to Commons Categories are not allowed. In fact I believe nearly every Commons Category has a corresponding Wikidata category item.

Notability Criterion 3 reads "(An item is acceptable if) It fulfills *some structural need*, for example: it is needed to make statements made in other items more useful.". I believe that this allows the creation of items for institutions, photographers, books etc. as required to describe Commons files. Considering the two examples you identified:

Category:Images released by British Library Images Online

Each of these images can have the statement 'Source:British Library Images Online'. This statement requires a CommonsData Property "Source" and a Wikidata item "British Library Images Online". As this Wikidata item is needed to complete this statement, it meets Wikidata notability criterion 3.

Category:Metropolitan Improvements (1828) Thomas Hosmer Shepherd

Again Wikidata items can be created for the book "Metropolitan Improvements", for the author "Thomas Hosmer Shepherd" and for the book's publisher (if known). All of these are clearly considered notable under https://www.wikidata.org/wiki/Help:Sources. These Wikidata items can then be linked to from statements in CommonsData describing each of the images.

Note that this all works without needing to link to the Category Qitems.

In practice this means that if a Commons file is in a certain category then we can know that certain statements will apply to that file. Later, eventually, we can find those files by searching for files to which those statements apply and ignore the categorisation, since all the information inherent in membership of that Category has been included in the form of statements. We do not need a "container for structured information associated with each commonscat". This structured information can just be included in CommonsData, without any separate 'container'.

Eventually, when the information inherent in the categorisation system has been translated into structured data, and the query system is a lot more useful than today, and the Categories based on idiosyncratic selection criteria have been transitioned into Galleries, where they should have been all along, then Categories may no longer be needed.
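To make the category-to-statements idea concrete, here is a small Python sketch. It is only an illustration under assumptions: the property names ("source", "published in", "creator") and the use of plain labels instead of real P-numbers and Q-numbers are invented for the example, and CommonsData has no such API.

```python
from dataclasses import dataclass, field


@dataclass
class MediaFile:
    title: str
    # Each statement is a (property, value) pair; labels stand in for P/Q ids.
    statements: set[tuple[str, str]] = field(default_factory=set)


# Assumed rule table: membership of a category implies these statements.
CATEGORY_STATEMENTS = {
    "Category:Images released by British Library Images Online": {
        ("source", "British Library Images Online"),
    },
    "Category:Metropolitan Improvements (1828) Thomas Hosmer Shepherd": {
        ("published in", "Metropolitan Improvements"),
        ("creator", "Thomas Hosmer Shepherd"),
    },
}


def apply_category(file: MediaFile, category: str) -> None:
    """Copy the statements implied by a category onto the file itself."""
    file.statements |= CATEGORY_STATEMENTS.get(category, set())


def find_files(files: list[MediaFile],
               required: set[tuple[str, str]]) -> list[str]:
    """Return titles of files carrying every required statement."""
    return [f.title for f in files if required <= f.statements]


a = MediaFile("Regent_Street.jpg")
b = MediaFile("Map_of_London.jpg")
apply_category(a, "Category:Metropolitan Improvements (1828) Thomas Hosmer Shepherd")
apply_category(b, "Category:Images released by British Library Images Online")
print(find_files([a, b], {("creator", "Thomas Hosmer Shepherd")}))
```

Once the statements sit on the files, the search no longer consults the category at all, which is the sense in which no separate 'container' is needed.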

But perhaps we will keep them anyway.

Joe

James Heald

3 Sep

5:57 p.m.

New subject: [Wikidata-l] Commons Categories again (was Re: Commons Wikibase)

@Joe Filceolaire

Fair enough. I had misread the rules. I thought it was the Commons Cat that needed to have a sitelink to some other page on any Wikimedia Project, rather than the requirement just being that a Wikidata item needed to have a sitelink to e.g. a Commons Cat.

So per the current rules, these Commons Cats could all have Wikidata items (though I still think that would be a mistake).

...

In fact I believe nearly every Commons Category has a corresponding Wikidata category item.

That is not correct.

There are currently 3,338,000 categories on Commons (excluding redirects)

About 250,000 category-like items on Wikidata have links to Commons (the number is similar either counting sitelinks, or property P373.)

About 688,000 article-like items on Wikidata have links to Commons categories using property P373.

So between 2,400,000 and 2,650,000 categories on Commons are currently pointed to by neither a category-like item nor an article-like item.
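The range follows from the three counts above. A quick Python check of the arithmetic, assuming that each item points to a distinct category within its own group, and that the only uncertainty is how far the two groups overlap:

```python
total_categories = 3_338_000  # Commons categories, excluding redirects
category_like = 250_000       # category-like Wikidata items linking to Commons
article_like = 688_000        # article-like Wikidata items using P373

# Lower bound: the two groups point at entirely different categories.
lower = total_categories - (category_like + article_like)

# Upper bound: every category reached by a category-like item is also
# reached by an article-like item, so only the larger group counts.
upper = total_categories - max(category_like, article_like)

print(lower, upper)  # 2400000 2650000
```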

In my view that should continue to be the case.

We're setting up a separate database or namespace for Commons files anyway; so doesn't it make more sense for entities like Commons categories that really only relate to Commons to have items held in that database or namespace, rather than in main Wikidata?

What are the advantages of adding two and a half million items of wiki-junk to Wikidata?

Yes, like other items on CommonsData, the properties of such C-items would normally point to Q-items on main Wikidata.

Looking at the modelling of the two categories in more detail:

First, Category:Images released by British Library Images Online

* It's not clear that BL Images Online would actually have its own Q-item. The British Library certainly does. Images Online is one of many parts of the BL.

But even if we create Images Online as a useful thing to link to, that's not really the point. This category (despite its title) is really for a specific release of images from BL Images Online. If there were another release, that would have a new different (sub-)category.

Yes, we could perhaps capture the set with a query specifying the source and the date. But as a distinctive set, it's useful to have a (C-)item that can represent it, (i) acting as a container for the query, and any other information about the set that might be relevant; and (ii) acting as a target for searches, so the set can be retrieved directly with a simple search, rather than requiring a complex search combining multiple properties.

Secondly, Category:Metropolitan Improvements (1828) Thomas Hosmer Shepherd

Again, the important thing is that (despite its title) what this category really represents is a particular set of *scans*.

There are already titles where we have multiple sets of scans for a single book, from different sources, often with different image characteristics.

In the jargon, these scan-sets are called "manifestations" of the work. On main Wikidata, current guidance is to have Q-items for works, and Q-items for editions, but not Q-items for manifestations of editions. So on current sourcing guidance, again, this category should not have a Q-item.

But it does make sense for it to have an item for operational reasons on Commons, so (IMO) it makes sense for it to have a C-item on CommonsData.

The C-item would reference the Q-item on Wikidata about the edition; but would also contain information specific to the C-item -- for example, that the source for these scans was a particular copy of the book scanned and released as part of the Mechanical Curator collection.

Scans of other copies of the same edition of the same book might have separately been released as part of the Mechanical Curator collection, part of the Wellcome collection, part of a release by the NYPL, or part of the Internet Archive Book Images collection (which in itself can contain multiple releases of the same book, from different libraries).

This source information can be quite detailed, along with credit-line information, and specific link-back information. So (IMO) it makes sense to be able to hold it as a single item for the set, rather than only be able to extract it as a query from the individual images.

Furthermore, this is information that one wants to be able to display on the Commons category page. It doesn't make sense to have to run a query over the images (which images? all of them?) in the category, just to be able to display header information on the category page.
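A sketch of what such a set-level C-item could hold, in Python. The field names, the "C1" and "Q0" identifiers, and the credit line are placeholders invented for illustration; no such schema has been agreed.

```python
from dataclasses import dataclass, field


@dataclass
class CItem:
    c_id: str          # placeholder id on CommonsData, e.g. "C1"
    label: str         # normally the Commons category title
    edition_qid: str   # Q-item on main Wikidata describing the edition
    source: str        # collection or release the scans came from
    credit_line: str
    files: list[str] = field(default_factory=list)


def header_text(item: CItem) -> str:
    """Build category-page header text from the C-item alone, with no
    query over the individual images."""
    return (f"{item.label} ({item.c_id}): {len(item.files)} scans of edition "
            f"{item.edition_qid}; source: {item.source}; "
            f"credit: {item.credit_line}")


scans = CItem(
    c_id="C1",
    label="Metropolitan Improvements (1828) Thomas Hosmer Shepherd",
    edition_qid="Q0",  # placeholder, not a real Wikidata id
    source="Mechanical Curator collection",
    credit_line="(hypothetical credit line)",
    files=["Plate_001.jpg", "Plate_002.jpg"],
)
print(header_text(scans))
```

A second scan-set of the same edition, say from another library, would be a second C-item pointing at the same edition Q-item but with its own source and credit line.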

-- James.

Andy Mabbett

19 Aug

6:37 a.m.

On 18 August 2014 14:30, Lydia Pintscher lydia.pintscher@wikimedia.de wrote:

...

I'm not sure we're talking about the exact same thing so let me write down how I envision it:

For a file on Commons there will be a second page on Commons that holds the structured data about that file. So if the file is HamsterBerta.jpg then we have something like Info:HamsterBerta.jpg. (Info isn't decided yet!) This is what we currently call MediaInfo and is comparable to an item on Wikidata.

I'd envisaged a single page, with image and prose at the top, and Wikidata-style properties below.
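For reference, the two-page layout Lydia describes amounts to a simple title mapping. A Python sketch, assuming the undecided "Info" namespace name from her example; the function name is hypothetical:

```python
def media_info_title(file_title: str, namespace: str = "Info") -> str:
    """Map a Commons file title to the title of its structured-data page.

    The "Info" namespace is only the working name used in the thread.
    """
    # Accept titles with or without the "File:" prefix (Python 3.9+).
    name = file_title.removeprefix("File:")
    return f"{namespace}:{name}"


print(media_info_title("HamsterBerta.jpg"))       # Info:HamsterBerta.jpg
print(media_info_title("File:HamsterBerta.jpg"))  # Info:HamsterBerta.jpg
```

The single-page alternative would drop this mapping and store the properties on the file page itself.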

-- Andy Mabbett @pigsonthewing http://pigsonthewing.org.uk

Lydia Pintscher

10:32 p.m.

On Tue, Aug 19, 2014 at 12:37 AM, Andy Mabbett andy@pigsonthewing.org.uk wrote:

...

I'd envisaged a single page, with image and prose at the top, and Wikidata-style properties below.

We can think about that too, yeah. But I think that'd be at least one step later.

Cheers Lydia

Luca Martinelli

16 Aug

2:22 a.m.

More info on http://m.mediawiki.org/wiki/Multimedia/Structured_Data and http://www.wikidata.org/wiki/Wikidata:Wikimedia_Commons/Development

Cheers,

L.

On 15 Aug 2014 20:19, "Derric Atzrott" datzrott@alizeepathology.com wrote:

...

Hey,

So I heard on another mailing list that Commons is getting its own installation of Wikibase along with using Wikidata? Is this true, and if so, where might I find more information about it?

Thank you, Derric Atzrott Computer Specialist Alizee Pathology

wikidata@lists.wikimedia.org

67 comments

16 participants

participants (16)

Andy Mabbett
David Cuenca
Derric Atzrott
Gerard Meijssen
James Heald
Jan Ainali
Jan Dudík
Joe Filceolaire
Luca Martinelli
Lydia Pintscher
Magnus Manske
Markus Krötzsch
P. Blissenbach
Paul Houle
Scott MacLeod
Thomas Douillard