restarting the thread with a correct title now :)
2012/5/22 Platonides platonides@gmail.com
On 22/05/12 01:11, Kilian Kluge wrote:
So what do I suggest you do instead?
As I said, we're facing a similar (if not worse) situation in Germany. We have literally more than one thousand institutions and authorities that issue monument lists for areas ranging from single municipalities to whole states (in total, there are about 1 million monuments). Many of them do assign numbers, but they all start with 1, so we have the issue that the IDs are not unique.
What we're going to do, now that we have enough lists on Wikipedia: we will use an existing numbering scheme called the Gemeindekennziffer, which assigns a unique code to each municipality, as a prefix to the official IDs. (The actual system is a little more complicated: the Gemeindekennziffer is structured into different parts; for example, the first two digits tell you which state the municipality is in, and so on. Therefore we'll just use the first two digits of the code as the prefix for states that already have unique IDs.) This way, the IDs on Wikipedia and on Commons stay the official ones; only inside the database do we add a prefix, which is not OR but based on an official numbering scheme. I'm sure that you can find a similar numbering system for Italy!
What identifier do people use when uploading the images? If it is the unprefixed one, how do you automatically find out the municipality?
Wiki Loves Monuments mailing list WikiLovesMonuments@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikilovesmonuments http://www.wikilovesmonuments.eu
Good evening,
2012/5/22 Platonides platonides@gmail.com
What identifier do people use when uploading the images? If it is the unprefixed one, how do you automatically find out the municipality?
We will most likely ask for the original identifier. The template structure on Commons is already based on states, so e.g. {{Kulturdenkmal Hessen|12345}} already works; the database ID for that image will then be 06-12345, with 06 being the code for the state of Hesse (this is, as I said, an official code scheme: all municipalities in Hesse have a code starting with 06).
As far as I know today, only two states have numbered lists on lower levels. We're not even sure yet how to handle them on Commons, but it will be something like either {{Baudenkmal Nordrhein-Westfalen|Köln|12345}} (which I prefer) or {{Baudenkmal Nordrhein-Westfalen|05 3 15 000|12345}}, with 05 3 15 000 being the code for Cologne. Since all municipality codes are known, we can ask uploaders to provide the municipality and the monument ID; a simple matching table will do the rest. The same goes for areas where the counties keep the lists and assign the IDs, just with a shorter code as prefix (in the code for Cologne, 05 is the state, 3 the district and 15 the county).
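A minimal Python sketch of the matching-table idea described above. Only the codes for Hesse (06) and Cologne (05 3 15 000) and the `06-12345` form come from this message; the table layout, the function names and the way a municipality prefix is written in the database are assumptions for illustration.

```python
# Hypothetical matching tables; only these two codes appear in the thread.
STATE_CODES = {"Hessen": "06"}
MUNICIPALITY_CODES = {"Köln": "05 3 15 000"}


def database_id(official_id, state=None, municipality=None):
    """Prefix an official monument ID with an official area code."""
    if municipality is not None:
        # Assumption: spaces in the municipality code are dropped in the database.
        prefix = MUNICIPALITY_CODES[municipality].replace(" ", "")
    elif state is not None:
        prefix = STATE_CODES[state]
    else:
        raise ValueError("either a state or a municipality is required")
    return f"{prefix}-{official_id}"


def strip_prefix(db_id):
    """Recover the official ID that is shown on Wikipedia and Commons.

    The prefix never contains a hyphen, so splitting on the first hyphen
    is safe even if the official ID itself contains one.
    """
    return db_id.split("-", 1)[1]


hesse_id = database_id("12345", state="Hessen")
cologne_id = database_id("12345", municipality="Köln")
```

The official ID is never altered, only prefixed, so the database form can always be turned back into what the uploader typed.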
When uploading directly from the lists, it's obviously much easier since we can provide all necessary information automatically.
You can see that this method has a clear advantage over generating new IDs from scratch: it keeps the official IDs and is based on an official code scheme, so it's not an OR issue to actually display the prefixed IDs in some tools. On Wikipedia, only the official IDs are shown and included in the lists.
Kilian
If I understood it, there are two options:
1. Set up an upload campaign for every local registry with conflicting IDs, e.g. a Köln campaign with ID=12345.
2. Use a wider upload campaign with a prefix identifying the local registry, e.g. ID=05315000/12345.
Last year we used solution 2 for the local monuments of Barcelona, and there was no doubt about OR, as it is an official prefix for the municipality and a solution also used by other sources [1]. An example list with both the national ID and the local ID: https://ca.wikipedia.org/wiki/Llista_de_monuments_de_la_Barceloneta
[1] http://patrimonicultural.diba.cat/?fitxa=118000127 See "Número d'element: 08118/127"
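A small Python sketch of how such a prefixed identifier could be split back into its parts. The function name `split_prefixed_id` is hypothetical; the sample value is the one shown in [1].

```python
def split_prefixed_id(prefixed_id, separator="/"):
    """Split 'prefix/local_id' into its two parts.

    Both parts stay strings so that leading zeros in the prefix survive.
    """
    prefix, sep, local_id = prefixed_id.partition(separator)
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a prefixed identifier: {prefixed_id!r}")
    return prefix, local_id


municipality_prefix, local_id = split_prefixed_id("08118/127")
```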
Vicenç
Date: Tue, 22 May 2012 22:44:02 +0200 From: kilian@k-kluge.de To: wikilovesmonuments@lists.wikimedia.org Subject: Re: [Wiki Loves Monuments] about identifiers
On Tue, May 22, 2012 at 7:32 PM, Lodewijk lodewijk@effeietsanders.org wrote:
restarting the thread with a correct title now :)
This thread is really interesting because it is important to define some key points:
a) The structure of the data is *not* original research, just as the creation of templates is not considered original. The identifier is mainly a problem of data organization.
b) The unique identifier is becoming an important question, mainly if there are tools that help with uploading and with identifying the monuments.
c) The local identifiers must not be lost, in order to keep the links with other "official lists".
This preamble shows that defining a unique identifier for all monuments of all countries is important, and that the list of monuments is basically a "database".
The structure of the database is not original research; it is part of the "infrastructure". The local identifiers cannot be the primary key because of the problem of redundancy (especially if we have a single repository for all countries), so to make progress we have to define our own *primary key* and connect it with the local identifiers in a way that ensures continuity and a long life for the new structure of the monument lists.
This "new system of identifiers" may help projects like that of the monuments of the Portuguese Empire, which is basically a *view* of a database.
The problem faced by Germany, Poland and Italy with their regional lists of monuments is only one part of a bigger problem: having a single repository.
A municipality identifier is not a good candidate for a primary key, because in some countries municipalities may be merged or split from year to year. An identifier connected with the geographical coordinates may be a better candidate... but the real point is that the identifier is worth discussing, and the question may become urgent in the near future.
Hi all,
I'm glad we're having this discussion.
First of all, I think it is important that identifiers are *always* only unique within a certain context. Almost every country will have a monument number 1. That means that whatever identifier you have in your country, it will not be absolutely unique, but only in the context of your country. So how do we make it unique on a global scale? Well, by combining it with the country information.
Nothing is stopping us from doing the same on a subnational level. And nobody said that we *must* have exactly the same structure in every country. It would be great to have, but it is unrealistic. So if Germany splits up its database into 16 databases, one for every Land, that can make sense. There are then three possible solutions: 1) use one database, but combine the local identifier with a region code; 2) make separate databases; or 3) add a region field. Either way, the end result is unique enough; it just needs some technical working out.
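To illustrate solutions 1 and 3, here is a rough Python sketch in which an identifier is only unique together with its context. The class name `MonumentKey`, the field names and the sample records are made up for the example; the region codes are placeholders in the style of the German state codes mentioned earlier in the thread.

```python
from typing import NamedTuple, Optional


class MonumentKey(NamedTuple):
    country: str            # e.g. a two-letter country code
    region: Optional[str]   # None where a country has one national list
    local_id: str           # the official identifier, unchanged


registry = {}


def register(key, name):
    """Store a monument, refusing a second entry with the same full key."""
    if key in registry:
        raise KeyError(f"duplicate key: {key}")
    registry[key] = name


# Monument number 1 can exist many times as long as the context differs.
register(MonumentKey("nl", None, "1"), "example monument A")
register(MonumentKey("de", "06", "1"), "example monument B")
register(MonumentKey("de", "05", "1"), "example monument C")
```

Whether the region is a separate field or is glued onto the identifier as a prefix is then only a matter of how the key is written out.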
If the municipality key works in one country, it doesn't have to work in another. I suggest we use whatever system works best for your country, and implement that. Merging municipalities is also a big pain, but I'm sure there's a way to work around that too (for example by setting up a renaming/redirect table).
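The renaming/redirect table could look roughly like the following Python sketch. The codes in `REDIRECTS` are invented placeholders, not real municipality codes, and `current_code` is a hypothetical helper.

```python
# Maps a retired municipality code to the code that replaced it.
# Placeholder values for illustration only.
REDIRECTS = {
    "A01": "A03",   # A01 merged into A03
    "A02": "A03",   # A02 merged into A03
    "A03": "B10",   # A03 was later renamed to B10
}


def current_code(code):
    """Follow redirects until a code that is still in use is reached."""
    seen = set()
    while code in REDIRECTS:
        if code in seen:
            raise ValueError(f"redirect loop at {code!r}")
        seen.add(code)
        code = REDIRECTS[code]
    return code
```

A one-to-one table like this covers merges and renames; a municipality that is split would probably need extra information (for example the monument's location) to decide which successor applies.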
Lodewijk
2012/5/23 Ilario Valdelli valdelli@gmail.com
-- Ilario Valdelli Wikimedia CH Verein zur Förderung Freien Wissens Association pour l’avancement des connaissances libre Associazione per il sostegno alla conoscenza libera Switzerland - 8008 Zürich Tel: +41764821371 http://www.wikimedia.ch
Last year Wikimedia Spain only listed monuments declared "Bien de Interés Cultural" that are registered and listed on the Ministry's site, with the code given by the Ministry. But this site is not very accurate. Many regional governments have their own site with their own IDs, different from the Ministry's. These lists are usually more accurate, and they include monuments that are not listed on the Ministry's site.
Why do I say this? Because we find monuments with two different IDs (one given by the Ministry and another by the regional government). Both are official. Must we show both? In my opinion yes, because both IDs are useful, encyclopedic information. How does this affect the WLM database? Examples of this case are Catalonia, Aragon, Murcia, Valencia and Andalusia.
But usually, as in Castile and León or the Basque Country, there is no ID available on their sites. In these two cases we can find a unique number identifying each monument in the URL. Could we use this number as the ID?
And then there is a third case: we have monuments, but there is no ID or anything like it available anywhere. What can we do in these cases? Could we create an ID for the contest? How can we show Wikipedia readers that this code is not official? Examples: Navarre or Cantabria.
I'm very busy in real life at the moment, but in a month I'll be active in this project again.
Thank you for your opinions, suggestions and advice.
Regards!!
Santiago Navarro Wikimedia España user:Millars
Santiago,
In such a scenario, I would advise you to create compound IDs. And then again, forget the Wikipedia lists for a moment. By referencing the source you can always recover from whatever damage you might do.
So, for instance, suppose:
* Monument A, from list A, has id 3
* Monument B, from list B, has no id
* Monument A is also on list C, with id 5
If you create a compound ID (primary key) like (source, id auto_increment) with an attribute (field) alt_source_id, then when you insert the values you'll do (pseudocode):
    INSERT (a, 3, NULL);
    INSERT (b, NULL, NULL);
    INSERT OR UPDATE (a, 3, 5);
And you'll get:
    a,3,5 => maps to ID A3, or A-3, whatever you want
    b,1,NULL => maps to ID B1, or B-1...
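The pseudocode above can be tried out with Python's built-in sqlite3 module, as a sketch only. SQLite has no auto-increment that restarts per source, so the next free id is computed by hand here; the table name `monuments`, the helper `upsert` and the `A-3` display format are assumptions, while the column names follow the message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE monuments (
        source        TEXT    NOT NULL,
        id            INTEGER NOT NULL,
        alt_source_id INTEGER,
        PRIMARY KEY (source, id)
    )
    """
)


def upsert(source, local_id=None, alt_source_id=None):
    """Insert or overwrite a row and return its display ID, e.g. 'A-3'."""
    if local_id is None:
        # No official id: take the next free number within this source.
        row = conn.execute(
            "SELECT COALESCE(MAX(id), 0) + 1 FROM monuments WHERE source = ?",
            (source,),
        ).fetchone()
        local_id = row[0]
    conn.execute(
        "INSERT OR REPLACE INTO monuments (source, id, alt_source_id) "
        "VALUES (?, ?, ?)",
        (source, local_id, alt_source_id),
    )
    return f"{source.upper()}-{local_id}"


upsert("a", 3)          # monument A, from list A, has id 3
upsert("b")             # monument B, from list B, has no id
upsert("a", 3, 5)       # monument A is also on list C, with id 5

rows = conn.execute(
    "SELECT source, id, alt_source_id FROM monuments ORDER BY source, id"
).fetchall()
```

The generated number for source "b" exists only in the database, so a list generator can decide later whether to show it, hide it or swap it for an official ID.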
See my mail from a couple of minutes ago for an idea using ParserFunctions to decide what to do when publishing to the Wikipedia lists. For instance, you could entirely hide "source=b" IDs... or later on, you could decide to switch the IDs (3 for 5 in row "a", etc.) and regenerate the lists.
Storing "extra" attributes was one of the reasons I found the WLM monuments database insufficient. It's nobody's fault, though; our solution is far more time-consuming to implement and would require a lot of code refactoring (bots and the like)... but we can regenerate the Wikipedia lists, which are far older than WLM and which contain wikilinks inside the descriptions (we actually did it), thus allowing us to know which Wikipedia articles a monument entry refers to...
-NT
On 23-05-2012 14:24, Santiago Navarro Sanz wrote:
Hi Lodewijk, Hi all,
On Wed, May 23, 2012 at 1:14 PM, Lodewijk lodewijk@effeietsanders.org wrote:
If the municipality key works in one country, it doesn't have to work in another. I suggest we use whatever system works best for your country - and implement that.
I hope that it didn't sound like I was suggesting anything but that. My goal was just to point out that identifiers made up from scratch, which neither include the original IDs nor reflect the structure in which those IDs are assigned, will cause more trouble than good in the long run.
Also merging municipalities is a big pain, but I'm sure there's a way to work around that too (for example setting up a renaming/redirect table).
In the case of Germany, this wouldn't be that big of an issue, since the changes in the Gemeindekennziffer scheme are clearly communicated. (Also, the prefixed IDs don't show up on Wikipedia or Commons.)
Kilian
Hi Lodewijk, hi Kilian, hi all, I support Kilian's statement for practical reasons. It is often difficult to find certain monuments, even though they have a unique number, if you can't compare them unambiguously with other lists containing additional information.
In Austria some districts have been merged this year, we are working on that (it mainly influences the display of monuments on maps). Kind regards Beppo
-------- Original Message --------
Date: Wed, 23 May 2012 16:07:55 +0200 From: Kilian Kluge kilian@k-kluge.de To: Wiki Loves Monuments Photograph Competition wikilovesmonuments@lists.wikimedia.org Subject: Re: [Wiki Loves Monuments] about identifiers
Hi Ilario,
I agree with you, thanks for structuring the discussion :)
On Wed, May 23, 2012 at 12:59 PM, Ilario Valdelli valdelli@gmail.com wrote:
The structure of the database is not original research; it is part of the "infrastructure". The local identifiers cannot be the primary key because of the problem of redundancy (especially if we have a single repository for all countries), so to make progress we have to define our own *primary key* and connect it with the local identifiers in a way that ensures continuity and a long life for the new structure of the monument lists.
I would add one additional requirement to that: the IDs created by and for the database system should not show up on Wikipedia or Commons. A list on Wikipedia should always contain and display the actual local identifier, as is the case in the countries with just one numbering system. Showing database-made IDs is what I consider the "OR issue" that we need to avoid and which I tried to explain (poorly).
A municipality identifier is not a good candidate for a primary key, because in some countries municipalities may be merged or split from year to year. An identifier connected with the geographical coordinates may be a better candidate... but the real point is that the identifier is worth discussing, and the question may become urgent in the near future.
The connection with the municipality key has the advantage that it reflects the structure of the lists: if every municipality/county/district/state/nation assigns its own IDs, we should just add prefixes to make them unique. (In fact, for most German areas it's not the municipality but rather the district or state level, so changes will be very rare.)
It also avoids creation of unnecessarily long and complicated IDs: If a state keeps unique numbers, you just add the state identifier and don't worry about location or municipality.
But that's just my point of view :-)
Kilian
On 23/05/12 16:22, Kilian Kluge wrote:
I would add one additional requirement to that: the IDs created by and for the database system should not show up on Wikipedia or Commons. A list on Wikipedia should always contain and display the actual local identifier, as is the case in the countries with just one numbering system. Showing database-made IDs is what I consider the "OR issue" that we need to avoid and which I tried to explain (poorly).
The problem is: if your lists don't include the "full ID", how do you expect users to enter the appropriate one when uploading? Even when you provide a very clear ID column some people will get it wrong, but if they have to guess a prefix...
On the other hand, this is an argument for not prefixing with something like the region's postal code, but using the region name or abbreviation instead. That way it's visually meaningful, both for someone expecting the local ID ("Region-85, OK, the ID is 85") and for more casual readers ("monument #85 of the region").
In Spain we made up a convention last year for WLM on how to write the identifiers, since the same db contained the ids formatted in several ways (with/without dots, spaces, brackets...)
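As an illustration of such a writing convention, the Python sketch below brings differently formatted identifiers into one canonical form before they are compared. The actual convention used in Spain is not described in this thread; the rule applied here (drop dots, spaces and brackets, then upper-case) and the sample codes are assumptions.

```python
import re

# Characters treated as pure formatting noise (an assumed rule, see above).
_NOISE = re.compile(r"[.\s()\[\]]")


def normalize_id(raw):
    """Return a canonical spelling of an identifier typed by an uploader."""
    return _NOISE.sub("", raw).upper()


# Made-up sample codes written in three different styles.
variants = ["RI-51-0001234", "(R.I.)-51-0001234", "r.i. - 51 - 0001234"]
canonical = {normalize_id(v) for v in variants}
```

With a rule like this, all three spellings collapse to a single value, so the lookup against the database no longer depends on how the uploader typed the code.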
I got kinda lost in this discussion, so I'm sorry if I repeat someone else's approach.
Here is what we did in Portugal (regions: continental Portugal, the Madeira Islands, the Azores Islands, and a second, broader partner list).
We established ranges of numbers, something like this:
- 0-899999 were IGESPAR
- 900000-9099999 were Azores
- 910000-9100000 were Madeira
- 990000-9999999 were SIPA (all Portugal, non-protected buildings)
(ranges may be wrong).
Yes, it is not perfect, but it allowed us to strip off the prefix using ParserFunctions ( {{#if id>900000 and id<9100000|show IGESPAR icon; perform subtraction id-9000000; show real ID}} ).
Keeping the work to numerics (or controlled, parsable prefixes) will let you recover the real IDs. That is indeed important, because you will have:
- the list of Continental Portugal, where ID 1 exists, but its reference (footnote) is "IGESPAR official lists";
- the list of Madeira, where ID 1 exists, but its reference (footnote) is "Madeira official lists";
And so on...
In the end, the WLM contestant will provide an ID of 99001523, which is perfectly fine for WLM, and the Wikipedia reader will keep looking at ID 1523 with a footnote for SIPA.
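The same stripping can be sketched outside the wiki, here in Python. Because the ranges quoted above may be wrong, the offsets below are assumptions chosen only so that the example ID 99001523 decodes to 1523 with source SIPA; `OFFSETS` and `decode_contest_id` are hypothetical names.

```python
# (offset, source) pairs, highest offset first. Assumed values, not the
# real ranges used for the Portuguese lists.
OFFSETS = [
    (99_000_000, "SIPA"),
    (91_000_000, "Madeira"),
    (90_000_000, "Azores"),
    (0, "IGESPAR"),
]


def decode_contest_id(contest_id):
    """Return (source, real_id) for a numeric contest identifier."""
    if contest_id < 0:
        raise ValueError("contest IDs are non-negative")
    for offset, source in OFFSETS:
        if contest_id >= offset:
            return source, contest_id - offset
    raise ValueError(f"no range matches {contest_id}")


source, real_id = decode_contest_id(99001523)
```

The contest side keeps one flat numeric ID, while the list generator can show the real ID together with the matching footnote for its source.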
And yes, we never pointed the user to the Wikipedia lists.
-NT
On 23-05-2012 17:39, Platonides wrote:
This is a key point: the information needs of Wikipedia versus the need to organize a contest with unique IDs. An ideal solution would satisfy both.
Vicenç
Date: Thu, 24 May 2012 00:47:09 +0100 From: nuno.tavares@wikimedia.pt To: wikilovesmonuments@lists.wikimedia.org Subject: Re: [Wiki Loves Monuments] about identifiers
And yes, we never pointed the user to the Wikipedia lists.
-NT