Hi everyone,
I have some bad news. We screwed up. I’m really sorry about this. I’d really appreciate everyone’s help with fixing it.
TLDR: We have a bad mixup of calendar models for the dates in Wikidata and we need to fix them.
==== What happened? ====
Wikidata dates have a calendar model. This can be Julian or Gregorian, and the plan is to support more in the future. There are two ways to interpret this calendar model:
# the given date is in this calendar model
# the given date is Gregorian, and this calendar model says whether the date should be displayed in Gregorian or Julian in the user interface
Unfortunately, there was confusion among both developers and bot operators about which of the two interpretations should be used. This led to inconsistencies in the backend and frontend code, as well as to different bot authors treating the calendar model differently. In addition, the user interface had problematic defaults. We now have a number of dates with a potentially wrong calendar model. The biggest issue started in mid-2014, when we moved code from the frontend to the backend to improve performance. Before the move, the user interface converted dates from one model to the other. After the move, the conversion was no longer done anywhere, but the calendar model was still displayed. We made one part better but in the process broke another part badly :(
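To make the two interpretations concrete, here is a minimal Python sketch. It is an illustration only, not Wikibase code: the function names and the plain "julian"/"gregorian" labels are hypothetical, and day counts use the standard Julian Day Number (JDN) integer formulas for the proleptic calendars. It reads the same stored value, 1600-01-01 with calendar model Julian, both ways.

```python
def to_jdn(year: int, month: int, day: int, calendar: str) -> int:
    """Julian Day Number of a proleptic Julian or Gregorian date (years > -4800)."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4
    if calendar == "gregorian":
        return jdn - y // 100 + y // 400 - 32045
    if calendar == "julian":
        return jdn - 32083
    raise ValueError(f"unsupported calendar: {calendar}")


def from_jdn(jdn: int, calendar: str) -> tuple[int, int, int]:
    """Inverse of to_jdn: the (year, month, day) of a JDN in the given calendar."""
    if calendar == "gregorian":
        a = jdn + 32044
        b = (4 * a + 3) // 146097
        c = a - 146097 * b // 4
    elif calendar == "julian":
        b = 0
        c = jdn + 32082
    else:
        raise ValueError(f"unsupported calendar: {calendar}")
    d = (4 * c + 3) // 1461
    e = c - 1461 * d // 4
    m = (5 * e + 2) // 153
    day = e - (153 * m + 2) // 5 + 1
    month = m + 3 - 12 * (m // 10)
    year = 100 * b + d - 4800 + m // 10
    return year, month, day


def actual_day_interpretation_1(stored: tuple[int, int, int], model: str) -> int:
    """(1) The stored date is written in the calendar `model`."""
    return to_jdn(*stored, model)


def actual_day_interpretation_2(stored: tuple[int, int, int], model: str) -> int:
    """(2) The stored date is Gregorian; `model` only picks the display calendar."""
    return to_jdn(*stored, "gregorian")


stored, model = (1600, 1, 1), "julian"
day1 = actual_day_interpretation_1(stored, model)
day2 = actual_day_interpretation_2(stored, model)
print(day1 - day2)                  # 10
print(from_jdn(day1, "gregorian"))  # (1600, 1, 11)
print(from_jdn(day2, "julian"))     # (1599, 12, 22)
```

Under reading (1) the value denotes Gregorian 1600-01-11. Under reading (2) it denotes Gregorian 1600-01-01, which a Julian display would show as 1599-12-22. Code that stores a date under one reading and interprets it under the other is off by this gap, ten days in 1600.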
==== What now? ====
* Going forward, the date data value will be given both in the normalized proleptic Gregorian calendar and in the explicitly given calendar model (which, as said, currently supports proleptic Gregorian and proleptic Julian).
* The user interface will again indicate which calendar model the date is given in. We will improve the documentation around this to make sure there is no confusion from now on.
* To help with the cleanup, we made a flowchart for deciding what the correct calendar model for a date should be.
* We are improving the user interface to make it easier to understand what is going on and to do the right thing by default.
* We are providing a list of dates that need to be checked and potentially fixed.
* How are we making sure it doesn’t happen again?
** We are improving documentation around dates and will look for other potentially ambiguous concepts we have.
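Here is a minimal sketch of a date value that carries both forms described in the first bullet. The class and field names are hypothetical and are not the Wikibase data model. The sketch relies on Python's `datetime.date`, whose ordinals count days in the proleptic Gregorian calendar, so it only covers years 1 to 9999. It also relies on the fact that a date's ordinal equals its Julian Day Number minus 1,721,425.

```python
from dataclasses import dataclass
from datetime import date

JDN_OF_ORDINAL_ZERO = 1721425  # date.toordinal() == JDN - 1721425


def julian_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a date in the proleptic Julian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


@dataclass(frozen=True)
class DateValue:
    """Hypothetical date value: the date as given plus its calendar model."""
    year: int
    month: int
    day: int
    calendar_model: str  # "gregorian" or "julian"

    @property
    def normalized_gregorian(self) -> date:
        """The same day expressed in the proleptic Gregorian calendar."""
        if self.calendar_model == "gregorian":
            return date(self.year, self.month, self.day)
        if self.calendar_model == "julian":
            jdn = julian_jdn(self.year, self.month, self.day)
            return date.fromordinal(jdn - JDN_OF_ORDINAL_ZERO)
        raise ValueError(f"unsupported calendar model: {self.calendar_model}")


value = DateValue(1582, 10, 4, "julian")
print(value.normalized_gregorian)  # 1582-10-14
```

The value keeps Julian 4 October 1582 exactly as given, while its normalized form is proleptic Gregorian 14 October 1582.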
==== How can we fix it? ====
We have created a list of all dates that potentially need checking. We can either provide it as a list on a wiki page or run a bot to add “instance of: date needing calendar model check” or something similar as a qualifier to the respective dates. What do you prefer? The list probably contains dates we can batch-change or approve, but we’d need your help figuring out which ones those are. We also created a flowchart that should help with deciding which calendar model to pick for a given date: https://commons.wikimedia.org/wiki/File:Wikidata_Calendar_Model_Decision_Tre...
Thank you to everyone who helped us investigate and get to the bottom of the issue. Sorry again that this has happened and is causing work. I feel miserable about this, and if there is anything more we can do to help with the cleanup, please do let me know.
Let's please keep further discussion about this in one place on-wiki at https://www.wikidata.org/wiki/Wikidata:Project_chat#calendar_model_screwup
Cheers Lydia
Can I just ask all of you who want to demand an enquiry into how this happened to hold off until the problem has been fixed?
Please.
No post mortem while the patient is still alive.
Joe
On Tue, 30 Jun 2015 18:39 Lydia Pintscher lydia.pintscher@wikimedia.de wrote:
-- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata
Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Berlin-Charlottenburg district court under number 23855 Nz. Recognized as a non-profit organization by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
I may have said this before: it is very easy to get things screwed up when a value must reference a datum (a calendar is a datum for time). I think this is perhaps one of the most common errors on Wikipedia: we just assume there is a single global datum. Usually there is not, and once the numbers are precise enough, some datum error bites you in the behind!
Big thanks to Lydia and the team for providing an explanation; it makes this much easier to fix. Don't waste too much time on how it happened; it is far more important to figure out how to fix it.
Keep up the good work, and don't forget to report failures. That's how we all learn.
John Erling Blad /jeblad
(Okay, I do laugh a little, but not very much!)
In reply to Joe Filceolaire (filceolaire@gmail.com), Tue, Jun 30, 2015 at 9:21 PM.
It's worrying to hear this. The Italian Wikisource community strove to get calendar models right.
But I'm sure Magnus will come up with a tool to fix them ;-)
In reply to Lydia Pintscher, 30/06/2015 19:38.
Hi everyone,
Thanks to Lydia and the team for containing this issue and providing the necessary documentation for fixing it. For all of you who wonder what the scale of the issue is (a.k.a. "How bad is it?"), here are some numbers.
The most important years to keep in mind:
* 1582: the Gregorian calendar is introduced; some countries switch quickly
* 1753: most countries have made the switch (Sweden and the UK quite late); only Greece and Russia continue to use Julian dates
* 1923: Greece finally switches to Gregorian; all countries have switched
Only dates with day-level precision are affected (the shift between the calendars was never more than half a month). With this in mind, the following numbers (as of 22 June 2015) should make some sense:
* Dates in Wikidata overall: 12,549,662 (100%)
* Dates that are precise at least to the day: 11,216,635 (89%):
** Since 1923: 10,406,630 (83%)
** 1753-1922: 716,711 (5.7%)
** 1582-1752: 64,325 (0.5%)
** Until 1581: 28,969 (0.2%)
In other words, disregarding Greece and Russia, at most about 0.7% of Wikidata dates are affected. This is still a notable number, over 93,000 dates, though the 64,325 from 1582-1752 are an overestimate (many countries had already switched by then).
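The share follows from the two oldest day-precise ranges in the list:

```latex
\frac{64\,325 + 28\,969}{12\,549\,662} = \frac{93\,294}{12\,549\,662} \approx 0.74\,\%
```

This rounds to the roughly 0.7% stated above.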
It is not easy to say how many potentially Julian dates are among the 5.7% that happened before Greece introduced Gregorian, but it seems likely that a good majority did not occur in Russia or Greece.
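The breakdown by year range can be reproduced with a simple classifier. This is a minimal Python sketch under stated assumptions. It takes each date as a (year, precision) pair and assumes that Wikibase precision code 11 means "day", with higher codes being finer still. The bucket labels simply mirror the list of numbers.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

DAY_PRECISION = 11  # assumed Wikibase precision code for "day"


def bucket(year: int, precision: int) -> Optional[str]:
    """Place a date in the year ranges used above, or None if coarser than a day."""
    if precision < DAY_PRECISION:
        return None  # month- or year-level dates are unaffected by the calendar shift
    if year >= 1923:
        return "since 1923"
    if year >= 1753:
        return "1753-1922"
    if year >= 1582:
        return "1582-1752"
    return "until 1581"


def tally(dates: Iterable[Tuple[int, int]]) -> Counter:
    """Count day-precise dates per year range."""
    counts: Counter = Counter()
    for year, precision in dates:
        label = bucket(year, precision)
        if label is not None:
            counts[label] += 1
    return counts


sample = [(2015, 11), (1850, 11), (1700, 11), (1500, 11), (1500, 9)]
print(tally(sample))
```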
These numbers were what I expected. What surprised me more is how rarely "Julian calendar" is used as a calendar model in the data:
* Dates in Wikidata overall: 12,549,662 (100%)
** Gregorian dates: 12,529,635 (99.8%)
** Julian dates: 20,027 (0.16%)
This means that not even the 0.2% of dates from before the introduction of the Gregorian calendar are all tagged as Julian: with only 20,027 Julian dates in total, at least 8,942 of the 28,969 day-precise dates before 1582 must be tagged Gregorian. Some dates before 1582 may correctly use the Gregorian calendar (e.g., no calendar model makes sense for the beginning of the universe, so we might as well leave Gregorian there). However, I would expect basically all day-precise dates before 1582 to come from historic records, so they should use Julian. So it seems that there is some work to do there.
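For bot operators looking for candidates, here is a minimal Python sketch of a check for the case just described: a day-precise date before 1582 that is tagged Gregorian. It assumes time values shaped like the Wikibase JSON time value. That means a signed `time` string such as "+1500-05-20T00:00:00Z", a numeric `precision` where 11 means day, and a `calendarmodel` entity URI. The URIs below are the ones Wikidata uses for the proleptic Gregorian (Q1985727) and proleptic Julian (Q1985786) calendars, but please verify them before relying on the code. The check only flags values for review; it does not decide which model is right.

```python
GREGORIAN = "http://www.wikidata.org/entity/Q1985727"
JULIAN = "http://www.wikidata.org/entity/Q1985786"
DAY_PRECISION = 11  # assumed Wikibase precision code for "day"


def year_of(time: str) -> int:
    """Signed year of a Wikibase-style time string, e.g. "-0044-03-15T00:00:00Z" -> -44."""
    return int(time[: time.index("-", 1)])


def needs_check(value: dict) -> bool:
    """Flag day-precise dates before 1582 that are tagged as Gregorian."""
    return (
        value["precision"] >= DAY_PRECISION
        and value["calendarmodel"] == GREGORIAN
        and year_of(value["time"]) < 1582
    )


values = [
    {"time": "+1500-05-20T00:00:00Z", "precision": 11, "calendarmodel": GREGORIAN},
    {"time": "+1500-05-20T00:00:00Z", "precision": 11, "calendarmodel": JULIAN},
    {"time": "+1500-00-00T00:00:00Z", "precision": 9, "calendarmodel": GREGORIAN},
    {"time": "+1900-01-01T00:00:00Z", "precision": 11, "calendarmodel": GREGORIAN},
]
print([needs_check(v) for v in values])  # [True, False, False, False]
```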
It might be good to find out where these many historic dates came from. When you enter dates in the UI, it suggests Julian for these times, so it seems unlikely that users entered most of them. If they came through a bot, it would be good to find out what the bot author was doing. If you upload historic dates (for example, birth dates), they should come in Julian too. As Lydia explained, a bot author might have thought one of two things:
(1) "I should use the calendar model setting to tell Wikidata which calendar model my dates are in"
(2) "I should upload Gregorian dates and use the calendar model setting to tell Wikidata which calendar model my dates should be displayed in"
In either case, the natural choice would be Julian. Why would a bot author specify "Gregorian" for a date before 1582? A bot author who believed option (2) to be correct might convert Julian dates to Gregorian (since that option requires all dates to be sent in Gregorian). But even in this case, the bot would still set "Julian" as the calendar model to use for display.
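Both options can be made concrete with a minimal Python sketch of what a bot might upload for a date that its source gives in Julian. The payload is a simplified, assumed version of a Wikibase time value, with only `time`, `precision` and `calendarmodel`. The function names are hypothetical. The Julian entity URI is the one Wikidata uses for the proleptic Julian calendar, so please verify it before use. Because the sketch uses `datetime.date`, it only covers years 1 to 9999. Under option (1) the date is sent unchanged. Under option (2) it is first converted to Gregorian. Either way, the calendar model stays Julian.

```python
from datetime import date

JULIAN = "http://www.wikidata.org/entity/Q1985786"
DAY_PRECISION = 11  # assumed Wikibase precision code for "day"
JDN_OF_ORDINAL_ZERO = 1721425  # date.toordinal() == JDN - 1721425


def julian_to_gregorian(year: int, month: int, day: int) -> date:
    """Convert a proleptic Julian date to a proleptic Gregorian datetime.date."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083
    return date.fromordinal(jdn - JDN_OF_ORDINAL_ZERO)


def upload_payload(julian_date: tuple, option: int) -> dict:
    """Time value a bot would send for a date its source records in Julian."""
    if option == 1:    # the stored date is in the given calendar model
        year, month, day = julian_date
    elif option == 2:  # the stored date must be Gregorian; the model only picks the display
        g = julian_to_gregorian(*julian_date)
        year, month, day = g.year, g.month, g.day
    else:
        raise ValueError("option must be 1 or 2")
    return {
        "time": f"+{year:04d}-{month:02d}-{day:02d}T00:00:00Z",
        "precision": DAY_PRECISION,
        "calendarmodel": JULIAN,  # Julian under either option
    }


print(upload_payload((1500, 5, 20), 1)["time"])  # +1500-05-20T00:00:00Z
print(upload_payload((1500, 5, 20), 2)["time"])  # +1500-05-30T00:00:00Z
```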
Whichever way I look at it, it seems likely that our historic dates need some validation anyway. Maybe the calendar model confusion Lydia explained is not the only issue here. And adding more references would also be very useful in its own right.
Best regards,
Markus
In reply to Lydia Pintscher, 30.06.2015 19:38.
Please also keep in mind that not all calendars start the day at the same time. This is not a problem if you only have Julian and Gregorian, but it certainly is if you introduce other calendars.
Two events may happen on the same day in one calendar, and on two different days in another calendar.
Also, there surely exist ancient calendars whose exact delta from Gregorian is not known with certainty.
All dates must have their associated calendar, as converting between calendars may be extremely difficult, or impossible.
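The point about day boundaries can be illustrated with a minimal Python sketch. It assumes a hypothetical calendar whose day begins at a fixed 18:00. This is a crude stand-in for sunset, since real evening-start calendars depend on date and place. Each such day is labeled by the civil date it mostly overlaps. Two events on the same Gregorian civil day then fall on different days in that calendar.

```python
from datetime import date, datetime, timedelta

DAY_START_HOUR = 18  # hypothetical: this calendar's day starts at 18:00


def evening_start_day(moment: datetime) -> date:
    """Civil date labeling the evening-start day that contains `moment`.

    A day that begins at 18:00 on 29 June runs until 18:00 on 30 June and is
    labeled 30 June, so moments from 18:00 onwards belong to the next label.
    """
    return (moment + timedelta(hours=24 - DAY_START_HOUR)).date()


morning = datetime(2015, 6, 30, 10, 0)
evening = datetime(2015, 6, 30, 20, 0)
print(morning.date() == evening.date())                          # True
print(evening_start_day(morning) == evening_start_day(evening))  # False
```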
P.
In reply to Markus Krötzsch (markus@semantic-mediawiki.org), Tue, Jun 30, 2015 at 11:32 PM.
Dear Pierpaolo,
This thread was only about Julian and Gregorian calendar dates. Whether and how other calendar models should be supported in the future is another (potentially big) discussion. As you said, there are many issues there. Let's first make sure that we handle the "easy" 99.9% of cases correctly before discussing any more complicated options.
Best regards,
Markus
In reply to Pierpaolo Bernardi, 01.07.2015 01:11.
Please also keep in mind that not all calendars set the start of day at the same time. This is not a problem if you only have Julian and Gregorian, but it certainly is if you introduce other calendars.
Two events may happen in the same day in one calendar, and on two different days in another calendar.
Also, there surely exist ancient calendars whose exact delta from Gregorian is not known with certainty.
All dates must have their associated calendar, as converting between calendars may be extremely difficult, or impossible.
P.
On Tue, Jun 30, 2015 at 11:32 PM, Markus Krötzsch markus@semantic-mediawiki.org wrote:
Hi everyone,
Thanks to Lydia and the team for containing this issue and providing the necessary documentation for fixing it. For all of you who wonder what the scale of the issue is (a.k.a. "How bad is it?"), here are some numbers.
The most important years for better understanding:
1582: Gregorian Calendar is introduced; some countries switch quickly 1753: most countries have made the switch (Sweden and UK quite late), only Greece and Russia continue to use Julian dates 1923: Greece finally switches to Gregorian, all countries switched
Only dates with day-level precision are affected (the shift between the calendars was never more than half a month). With this in mind, the following numbers (as of 22 June 2015) should make some sense:
- Dates in Wikidata overall: 12,549,662 (100%)
- Dates that are precise at least to the day: 11,216,635 (89%):
** Since 1923: 10,406,630 (83%)
** 1753-1922: 716,711 (5.7%)
** 1582-1752: 64,325 (0.5%)
** Until 1581: 28,969 (0.2%)
In other words, disregarding Greece and Russia, at most about 0.7% of Wikidata dates (the day-precise dates before 1753) are affected. This is still a notable number, over 93,000 dates, though the 64,325 from 1582-1752 are an overestimate (many countries had already switched).
It is not easy to say how many potentially Julian dates are among the 5.7% that happened before Greece introduced Gregorian, but it seems likely that a good majority did not occur in Russia or Greece.
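As a quick arithmetic check, the shares above can be recomputed from the raw counts. The Java sketch below is illustrative only: the class and constant names are made up for this example, and the counts are the ones listed above. It checks that the four day-precision buckets add up to the day-precision total, and that the dates before 1753 come to 93,294, roughly 0.74% of all dates.

```java
import java.util.Locale;

// Illustrative arithmetic check of the counts listed above (JSON dump of 22 June 2015).
class DateShares {
    static final long TOTAL = 12_549_662L;
    static final long DAY_PRECISION = 11_216_635L;
    static final long SINCE_1923 = 10_406_630L;
    static final long FROM_1753_TO_1922 = 716_711L;
    static final long FROM_1582_TO_1752 = 64_325L;
    static final long UNTIL_1581 = 28_969L;

    /** Share of all dates, in percent. */
    static double percentOfTotal(long count) {
        return 100.0 * count / TOTAL;
    }

    /** Day-precise dates before 1753: the upper bound when Greece and Russia are disregarded. */
    static long affectedUpperBound() {
        return FROM_1582_TO_1752 + UNTIL_1581;
    }

    public static void main(String[] args) {
        long bucketSum = SINCE_1923 + FROM_1753_TO_1922 + FROM_1582_TO_1752 + UNTIL_1581;
        System.out.println("buckets add up: " + (bucketSum == DAY_PRECISION));
        // Locale.ROOT keeps the decimal point independent of the machine's locale.
        System.out.printf(Locale.ROOT, "day precision: %.1f%%%n", percentOfTotal(DAY_PRECISION));
        System.out.printf(Locale.ROOT, "affected (upper bound): %d = %.2f%%%n",
                affectedUpperBound(), percentOfTotal(affectedUpperBound()));
    }
}
```

Running `main` should print that the buckets add up, about 89.4% for day precision, and 93294 = 0.74% for the upper bound.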
These numbers were as expected for me. What I was more surprised by is the rare use of "Julian calendar" as a calendar model in the data:
- Dates in Wikidata overall: 12,549,662 (100%)
** Gregorian dates: 12,529,635 (99.8%)
** Julian dates: 20,027 (0.16%)
This means that not even the 0.2% of dates that happened before the introduction of the Gregorian calendar are tagged as Julian. Some dates before 1582 may correctly use the Gregorian calendar (e.g., no calendar model makes sense for the beginning of the universe, so we might as well leave it as Gregorian). However, I would expect that basically all dates with day precision before 1582 come from historic records and should therefore use Julian. So it seems that there is some work to do there.
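The expectation above can be turned into a simple review filter. The following is a hypothetical Java sketch: the `DateValue` type and the method names are made up. It assumes precision code 11 means "day" (finer precisions use larger codes), and it uses the entity IRIs Wikidata uses for the proleptic Gregorian and proleptic Julian calendar models. The filter only selects day-precise dates before 1582 that are tagged Gregorian so that a person can check them. It does not decide which model is correct.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical review heuristic following the reasoning above; not part of Wikidata or WDTK.
class CalendarCheck {
    // Calendar model IRIs as used in Wikidata time values.
    static final String GREGORIAN = "http://www.wikidata.org/entity/Q1985727";
    static final String JULIAN = "http://www.wikidata.org/entity/Q1985786";
    // Precision code for "day"; hour, minute and second use larger codes.
    static final int PRECISION_DAY = 11;

    /** Minimal stand-in for a date value (hypothetical type). */
    static final class DateValue {
        final String label;
        final long year;
        final int precision;
        final String calendarModel;

        DateValue(String label, long year, int precision, String calendarModel) {
            this.label = label;
            this.year = year;
            this.precision = precision;
            this.calendarModel = calendarModel;
        }
    }

    /** True for a day-precise (or finer) date before 1582 that is tagged Gregorian. */
    static boolean needsReview(DateValue d) {
        return d.precision >= PRECISION_DAY
                && d.year < 1582
                && GREGORIAN.equals(d.calendarModel);
    }

    /** Collects the values a human should look at; nothing is changed automatically. */
    static List<DateValue> flag(List<DateValue> values) {
        List<DateValue> flagged = new ArrayList<>();
        for (DateValue d : values) {
            if (needsReview(d)) {
                flagged.add(d);
            }
        }
        return flagged;
    }
}
```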
It might be good to find out where so many historic dates came from. When you enter dates in the UI, it suggests Julian for such dates, so it seems unlikely that users have entered most of them. If they came through a bot, it would be good to find out what the bot author was doing. If you upload historic dates (maybe birth dates), they should use Julian too. As Lydia explained, there are two things that the bot author might have thought:
(1) "I should use the calendar model setting to tell Wikidata which calendar model my dates are in"
(2) "I should upload Gregorian dates and use the calendar model setting to tell Wikidata which calendar model my dates should be displayed in"
In either case, the natural choice would be to use Julian. Why would a bot author have specified "Gregorian" for a date before 1582? A bot author might convert Julian dates to Gregorian if they expected option (2) to be correct (since this would require all dates to be sent in Gregorian). But in that case, the bot would still set "Julian" as the calendar model to use for display.
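The two readings can refer to different days for the same written date. Under reading (1), "1582-10-04, Julian" is the day that was 1582-10-14 in the proleptic Gregorian calendar. Under reading (2), the same numbers would be taken as Gregorian 1582-10-04, which is ten days earlier.

The Java sketch below converts a Julian-calendar date to proleptic Gregorian via the Julian Day Number (JDN), using the standard integer formulas. It assumes astronomical year numbering and years after -4800. It is an illustration, not the Wikibase implementation. `java.time.LocalDate` is used because its ISO chronology is proleptic Gregorian.

```java
import java.time.LocalDate;

// Illustrative conversion: Julian-calendar date -> Julian Day Number -> proleptic Gregorian date.
class JulianToGregorian {
    // JDN of 1970-01-01 (proleptic Gregorian), i.e. epoch day 0 in java.time.
    private static final long JDN_UNIX_EPOCH = 2_440_588L;

    /** JDN of a date given in the (proleptic) Julian calendar; astronomical years > -4800. */
    static long julianCalendarToJdn(int year, int month, int day) {
        long a = (14 - month) / 12;   // 1 for January and February, otherwise 0
        long y = year + 4800L - a;    // years counted so that the year starts in March
        long m = month + 12 * a - 3;  // March = 0, ..., February = 11
        return day + (153 * m + 2) / 5 + 365 * y + y / 4 - 32083;
    }

    /** The same day expressed in the proleptic Gregorian calendar. */
    static LocalDate julianToGregorian(int year, int month, int day) {
        return LocalDate.ofEpochDay(julianCalendarToJdn(year, month, day) - JDN_UNIX_EPOCH);
    }
}
```

For example, `julianToGregorian(1917, 10, 25)` gives 1917-11-07. A bot following reading (2) would store this converted value. A bot following reading (1) would store 1917-10-25 as given, tagged Julian.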
Whatever way I look at it, it seems likely that our historic dates need some validation anyway. Maybe the calendar model confusion Lydia explained is not the only issue here. And adding more references would also be very useful in its own right.
Best regards,
Markus
Lydia Pintscher in the starting email explained that there's a model for calendars, and unfortunately this model could be (and has been) interpreted in two ways (AFAIU).
My intention was to point out that one of the two interpretations is not sound. This leaves the other one as the only viable one.
Cheers P.
To clarify: the problem that Lydia discussed occurred on another (more technical) level. It is not about whether there are further calendar models that are incompatible with Julian and Gregorian, but about the two calendar models that are captured by what Wikidata calls the "date" type. This type only supports calendar models whose dates can be converted into one another. This is the usual trade-off when building a data-based system: you have to restrict the possible formats to ensure that the resulting data is still usable. For example, we could capture many more complex things and nuances of reality in free text, but then we would not have Wikidata but Wikipedia ;-)
What is colloquially called a calendar date can be anything from a clearly defined time point to a rough suggestion of a relative time frame. Wikidata already makes a lot of commitments towards a less strict notion of "date", many of which are not yet fully supported or correctly used (timezones, "before" and "after"; even the meaning of "precision" is far from clear). Many of these features were implemented in response to user requests for making date entry even more general, to cover even more corner cases. For data consumers, this makes the data much harder to use. It creates a cost for everyone. So far, there is only the cost and not the benefit (is anybody using "before" and "after"? Yet I have to deal with them when reading data!). Let's first make use of what we have (this includes proper UI support for timezone annotation and precision windows) before discussing even more complex notions of calendar and time.
But don't worry: there will surely be more calendar models that can be supported properly, in a specified and clear way. However, it is definitely not planned that all possible calendar models will be implemented at some point. A basic design goal of the "date" type in Wikidata is that dates remain compatible at the day level. Calendars that are too far away from this should use their own properties (maybe of type string, maybe of another special date type). One can then additionally give approximate Gregorian/Julian dates using the standard date properties of Wikidata (these approximate dates would not capture the exact moment, but the best possible approximation). In this way, one can get the best of both worlds: exact date information in native calendar models and maximal compatibility with major time-based applications (such as Histropedia) and query services (all time-related query functions in SPARQL databases are based on Gregorian dates).
Regards,
Markus
I would find this discussion easier to follow if the Wikidata identifiers for the various classes and properties were mentioned, and there were pointers to relevant documentation.
The only Wikidata class or property that I could easily find is Q205892. Its discussion page, https://www.wikidata.org/wiki/Talk:Q205892, mentions a bit about conversion, but nothing about this issue.
The page segment that is supposed to be being used for discussion, https://www.wikidata.org/wiki/Wikidata:Project_chat#calendar_model_screwup, does not have any pointers to any classes, properties, or documentation.
Even the very nice email from Markus that gives numbers does not provide any information on where the numbers come from.
Please
On 01.07.2015 18:03, Peter F. Patel-Schneider wrote: ...
Even the very nice email from Markus that gives numbers does not provide any information on where the numbers come from.
I just ran a simple Java program based on Wikidata Toolkit to count the date values. The features I used for counting are all part of the data (concretely I accessed: year number, precision, and calendar model). I used the JSON dump of 22 June 2015. The program counted all dates that occur in any place (main values of statements, qualifiers, and references). No other special processing was done.
Below is the main code snippet that did the counting, in case my description was too vague. If you want to get your own numbers, it does not require much (I just modified one of the example programs in Wikidata Toolkit that gathers general statistics). Running the code took about 25 min on my laptop (the initial dump download took longer, though). The SPARQL endpoint at https://wdqs-beta.wmflabs.org/ should also return useful counts if it does not time out on the very large numbers. It uses live data.
Best regards,
Markus
// after determining that snak is of appropriate type:
TimeValue timeValue = (TimeValue) ((ValueSnak) snak).getValue();

// count calendar models over all dates, whatever their precision
String cm = timeValue.getPreferredCalendarModel();
if (TimeValue.CM_GREGORIAN_PRO.equals(cm)) {
    this.countGregDates++;
} else if (TimeValue.CM_JULIAN_PRO.equals(cm)) {
    this.countJulDates++;
} else {
    System.err.println("Weird calendar model: " + timeValue);
}

// only dates with day precision (or finer) are counted by year
if (timeValue.getPrecision() <= TimeValue.PREC_MONTH) {
    return;
}

long year = timeValue.getYear();
if (year >= 1923) {
    this.countModernDates++;
} else if (year >= 1753) {
    this.countAlmostModernDates++;
} else if (year >= 1582) {
    this.countTransitionDates++;
} else {
    this.countOldenDates++;
}
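For readers without Wikidata Toolkit at hand, the same counting rules can be tried out in a self-contained form. This Java sketch replaces `TimeValue` with a small hypothetical class that holds only the year, precision and calendar model. It assumes precision code 10 means "month" and 11 means "day", and that the calendar models are given as Wikidata entity IRIs. The bucketing mirrors the snippet above.

```java
// Self-contained version of the counting rules above; Time is a hypothetical stand-in for TimeValue.
class DateCounter {
    static final String GREGORIAN = "http://www.wikidata.org/entity/Q1985727";
    static final String JULIAN = "http://www.wikidata.org/entity/Q1985786";
    static final int PREC_MONTH = 10; // precision code for "month"; "day" is 11

    static final class Time {
        final long year;
        final int precision;
        final String calendarModel;

        Time(long year, int precision, String calendarModel) {
            this.year = year;
            this.precision = precision;
            this.calendarModel = calendarModel;
        }
    }

    long gregorian, julian, otherModel;           // all dates, by calendar model
    long modern, almostModern, transition, olden; // day precision only: >=1923, 1753-1922, 1582-1752, <1582

    void count(Time t) {
        if (GREGORIAN.equals(t.calendarModel)) {
            gregorian++;
        } else if (JULIAN.equals(t.calendarModel)) {
            julian++;
        } else {
            otherModel++;
        }
        if (t.precision <= PREC_MONTH) {
            return; // month precision or coarser: not counted by year
        }
        if (t.year >= 1923) {
            modern++;
        } else if (t.year >= 1753) {
            almostModern++;
        } else if (t.year >= 1582) {
            transition++;
        } else {
            olden++;
        }
    }
}
```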
Thanks.
This helps in finding out how to reproduce the numbers.
However, I'm still confused as to how these bits of data are part of the Wikidata data/knowledge model. Where is the description of getPreferredCalendarModel, for example?
http://javadox.com/org.wikidata.wdtk/wdtk-datamodel/0.1.0/org/wikidata/wdtk/... is a *partial* description of what is going on. Changes to this document would be somewhat useful. However, what I'm really looking for is a description of how time works in Wikidata.
peter
PS: I note that there are lots of aspects of TimeValue that are only suitable for the Gregorian and Julian calendars.
Peter,
you might be looking for this:
https://www.mediawiki.org/wiki/Wikibase/DataModel#Dates_and_times
Cheers, Denny
Thanks. That helps a lot. Is that the way that things are going to be done in the future, i.e., dates will be stored using the specified calendar model instead of being converted?
peter
Open a new thread for discussion of calendar models in general.
It appears (from the email only; there are no pointers to enduring documentation on the solution attached to the relevant classes or properties) that the chosen method is to store dates in both the source calendar and the proleptic Gregorian calendar (https://www.wikidata.org/wiki/Wikidata:Project_chat#calendar_model_screwup). As you point out, this is not a viable solution for calendars whose days do not start at the same time as days in the proleptic Gregorian calendar (unless, of course, time and location information is also available).
peter
The Wikidata date implementation is intentionally restricted to dates that are compatible with the Gregorian calendar. Although the system refers to Wikidata item IDs of calendar models to denote "Proleptic Gregorian" and "Proleptic Julian", it does not allow users or bots to enter arbitrary items as calendar model.
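The restriction can be pictured as a whitelist on the calendar model IRI. The following is a hypothetical Java sketch, not the (PHP) Wikibase code, and the class and method names are made up. The two IRIs are the entity URIs Wikidata uses for proleptic Gregorian (Q1985727) and proleptic Julian (Q1985786).

```java
import java.util.Map;

// Hypothetical validator: only the two supported calendar model items are accepted.
class CalendarModels {
    static final Map<String, String> SUPPORTED = Map.of(
            "http://www.wikidata.org/entity/Q1985727", "proleptic Gregorian",
            "http://www.wikidata.org/entity/Q1985786", "proleptic Julian");

    /** Returns the model's name, or rejects any other item given as calendar model. */
    static String requireSupported(String calendarModelIri) {
        String name = SUPPORTED.get(calendarModelIri); // Map.of rejects null keys with an NPE
        if (name == null) {
            throw new IllegalArgumentException("Unsupported calendar model: " + calendarModelIri);
        }
        return name;
    }
}
```

Rejecting unknown items at input time keeps every stored date convertible to Gregorian. Supporting a new calendar would then mean adding a model together with its conversion, not just a new item.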
My understanding (and the implementation in WDTK) is that all dates are provided in the Gregorian calendar, with a calendar model that specifies how they should be displayed (if possible). The date in the source calendar is for convenience and maybe for technical reasons on the side of the PHP implementation. At no time should it be impossible to convert the source calendar date to Gregorian. We have had extensive discussions about this point: Gregorian must remain the main format at all times.
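Under this reading, showing a stored value "in Julian" is purely a display step on top of the Gregorian value. A minimal Java sketch follows. It assumes the stored value is a proleptic Gregorian `java.time.LocalDate` whose Julian Day Number is non-negative, and the method names are hypothetical. It uses a standard integer algorithm for turning a Julian Day Number into a Julian-calendar date.

```java
import java.time.LocalDate;
import java.util.Locale;

// Hypothetical display helper: the stored value is proleptic Gregorian; Julian is only a rendering.
class GregorianToJulianDisplay {
    // Julian Day Number of 1970-01-01, i.e. java.time epoch day 0.
    private static final long JDN_UNIX_EPOCH = 2_440_588L;

    /** Julian-calendar {year, month, day} for a proleptic Gregorian date with JDN >= 0. */
    static int[] toJulianCalendar(LocalDate gregorian) {
        long jdn = gregorian.toEpochDay() + JDN_UNIX_EPOCH;
        long f = jdn + 1401;               // Julian calendar: no Gregorian century correction
        long e = 4 * f + 3;
        long h = 5 * ((e % 1461) / 4) + 2; // day within the 4-year cycle, scaled for month lookup
        int day = (int) ((h % 153) / 5 + 1);
        int month = (int) ((h / 153 + 2) % 12 + 1);
        int year = (int) (e / 1461 - 4716 + (14 - month) / 12);
        return new int[] {year, month, day};
    }

    /** Renders a stored Gregorian date in the Julian calendar. */
    static String displayAsJulian(LocalDate storedGregorian) {
        int[] ymd = toJulianCalendar(storedGregorian);
        return String.format(Locale.ROOT, "%04d-%02d-%02d (Julian)", ymd[0], ymd[1], ymd[2]);
    }
}
```

For example, the stored Gregorian value 1582-10-14 would be displayed as 1582-10-04 (Julian), while queries and comparisons keep using the Gregorian value.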
This does not mean that we cannot have more models in the future. There is (currently unused) timezone information, which can be used to store offsets. Once fully implemented, this might allow exact conversion from calendar models that have another start for their days. So maybe this is not a case of real incompatibility. However, the timezone support for current dates needs to be finished before discussing the next steps into more exotic calendars.
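Pierpaolo's point, that two events can fall on the same day in one reckoning and on different days in another, can be illustrated with such an offset. This is a Java sketch under stated assumptions. It assumes the offset is stored as minutes east of UTC, and the class and method are hypothetical. A calendar whose day starts at sunset would need an offset that depends on date and place, which this sketch does not attempt.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

// Hypothetical helper: pins a local date-time recorded with a UTC offset (in minutes) to a UTC day.
class DayBoundaries {
    static LocalDate utcDay(LocalDateTime local, int offsetMinutes) {
        ZoneOffset offset = ZoneOffset.ofTotalSeconds(offsetMinutes * 60); // must be within +/-18 hours
        return OffsetDateTime.of(local, offset)
                .withOffsetSameInstant(ZoneOffset.UTC)
                .toLocalDate();
    }

    public static void main(String[] args) {
        // Two events on the same local day, recorded at UTC+2 (offset 120 minutes).
        LocalDate early = utcDay(LocalDateTime.of(1900, 6, 1, 0, 30), 120);
        LocalDate late = utcDay(LocalDateTime.of(1900, 6, 1, 23, 30), 120);
        System.out.println(early + " " + late); // the two UTC days differ
    }
}
```

Both events fall on 1 June locally. In UTC the first one lands on 31 May, which is why day-level comparisons across reckonings need the offset to be recorded.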
Best regards,
Markus
Wouldn't it be better to use ISO 8601 as the internal format?
That should be "default calendar model". My screw-up... ;/
On Wed, 1 Jul 2015 at 20:08, John Erling Blad <jeblad@gmail.com> wrote:
Wouldn't it be better to use iso8601 as internal format?
On 01.07.2015 20:08, John Erling Blad wrote:
Wouldn't it be better to use iso8601 as internal format?
Yes, that was essentially our original proposal. ISO 8601 is a syntax for proleptic Gregorian dates, so this would be the internal calendar model. ISO has no such detailed way to specify precision, so this was an add-on we conceived for Wikidata (as an optional annotation to the exact date). The idea was that the type "date" just means "ISO date + precision", and that Julian calendar support is provided by offering transparent conversion functions to the user (so you could always see and write Julian dates if you wanted to, without this changing the internal ISO form). The additions of "before" and "after" came later. The key idea of using ISO format internally while still supporting Julian dates with perfect round-tripping is implemented in Semantic MediaWiki, and the plan was to do this in Wikidata as well.
That was the theory. In practice, there was the confusion that Lydia described. Maybe the main problem was that bot authors were writing the internal data directly (for a while this was almost unfiltered). So when they used the API the way a user would use the UI ("set 'Julian' if you want to input your date in Julian"), the result was wrong, since they bypassed the Julian conversion that the UI provided. This might have been the seed for the confusion that arose. This is only of historic interest now; we don't need to discuss where exactly the errors happened first. Better to focus on fixing the dates now.
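A minimal Python sketch of that original plan, assuming a hypothetical `DateValue` type (not the actual Wikibase or Semantic MediaWiki class): the stored date is always proleptic Gregorian, and the Julian calendar only affects input and display. The conversions go through the Julian Day Number using the standard integer formulas, with astronomical year numbering (year 0 = 1 BC).

```python
from dataclasses import dataclass
from typing import Tuple

YMD = Tuple[int, int, int]  # (year, month, day), astronomical year numbering


def gregorian_to_jdn(y: int, m: int, d: int) -> int:
    """Proleptic Gregorian date -> Julian Day Number."""
    a = (14 - m) // 12
    yy, mm = y + 4800 - a, m + 12 * a - 3
    return d + (153 * mm + 2) // 5 + 365 * yy + yy // 4 - yy // 100 + yy // 400 - 32045


def julian_to_jdn(y: int, m: int, d: int) -> int:
    """Proleptic Julian date -> Julian Day Number."""
    a = (14 - m) // 12
    yy, mm = y + 4800 - a, m + 12 * a - 3
    return d + (153 * mm + 2) // 5 + 365 * yy + yy // 4 - 32083


def jdn_to_gregorian(j: int) -> YMD:
    a = j + 32044
    b = (4 * a + 3) // 146097
    c = a - 146097 * b // 4
    d = (4 * c + 3) // 1461
    e = c - 1461 * d // 4
    m = (5 * e + 2) // 153
    return 100 * b + d - 4800 + m // 10, m + 3 - 12 * (m // 10), e - (153 * m + 2) // 5 + 1


def jdn_to_julian(j: int) -> YMD:
    c = j + 32082
    d = (4 * c + 3) // 1461
    e = c - 1461 * d // 4
    m = (5 * e + 2) // 153
    return d - 4800 + m // 10, m + 3 - 12 * (m // 10), e - (153 * m + 2) // 5 + 1


@dataclass(frozen=True)
class DateValue:
    """Hypothetical value: ISO (proleptic Gregorian) date + precision + display model."""
    gregorian: YMD       # always stored in proleptic Gregorian
    precision: int       # e.g. 11 = day
    display_model: str   # "gregorian" or "julian": affects input/output only

    @classmethod
    def from_input(cls, ymd: YMD, model: str, precision: int = 11) -> "DateValue":
        # Transparent conversion on input: Julian input is converted before storage.
        if model not in ("gregorian", "julian"):
            raise ValueError(f"unsupported calendar model {model!r}")
        stored = jdn_to_gregorian(julian_to_jdn(*ymd)) if model == "julian" else ymd
        return cls(stored, precision, model)

    def shown(self) -> YMD:
        # Transparent conversion on output: the stored Gregorian date never changes.
        if self.display_model == "julian":
            return jdn_to_julian(gregorian_to_jdn(*self.gregorian))
        return self.gregorian
```

For example, `DateValue.from_input((1582, 10, 5), "julian")` stores `(1582, 10, 15)` and still shows `(1582, 10, 5)`. A bot that skipped `from_input` and wrote the Julian numbers straight into the Gregorian field with `display_model="julian"` would get a value shown as Julian 1582-09-25, ten days early: the bypassed-conversion error described above.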
Best regards,
Markus
On 1 July 2015 at 21:12, Markus Krötzsch markus@semantic-mediawiki.org wrote:
ISO has no such detailed way to specify precision,
No, but there is an extension, EDTF, for this:
http://www.loc.gov/standards/datetime/pre-submission.html
with an active, low-traffic mailing list:
http://www.loc.gov/standards/datetime/listserv.html
See that list's archives for details of recent (May 2015) discussions with ISO about formal recognition of the extension.
Disclosure: I'm a contributor to EDTF.
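For the three finest Wikidata precision codes (9 = year, 10 = month, 11 = day), a date can be written as an EDTF Level 0 string, which is plain ISO 8601 with reduced precision. A minimal sketch with a hypothetical `to_edtf` helper; coarser precisions (decade, century, ...), EDTF's Level 1 qualifiers for uncertain or approximate dates, and years outside 0000-9999 are deliberately left out.

```python
PRECISION_YEAR, PRECISION_MONTH, PRECISION_DAY = 9, 10, 11  # Wikidata precision codes


def to_edtf(year: int, month: int, day: int, precision: int) -> str:
    """Hypothetical helper: (year, month, day) + precision -> EDTF Level 0 string."""
    if not 0 <= year <= 9999:
        raise ValueError("this sketch only handles four-digit years 0000-9999")
    if precision == PRECISION_YEAR:
        return f"{year:04d}"                      # e.g. "1883"
    if precision == PRECISION_MONTH:
        return f"{year:04d}-{month:02d}"          # e.g. "1883-07"
    if precision == PRECISION_DAY:
        return f"{year:04d}-{month:02d}-{day:02d}"
    raise ValueError(f"precision {precision} not covered by this sketch")
```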
It is quite common to treat Gregorian dates as equal to ISO 8601 dates, and this is correct as long as you only go forward from 1582. If you want to go backwards, you must do so only after negotiation with the communicating peer, that is, "do as we say if you want our data!" When you hit 1 BC, ISO 8601 goes to year 0000, but "hey, do you want our data or what?"
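The year-zero shift is easy to get wrong in bots. ISO 8601 uses astronomical year numbering, so 0000 is 1 BC and -0001 is 2 BC, while historical AD/BC reckoning has no year zero; every BC year is therefore off by one. A minimal sketch with hypothetical helper names:

```python
def historical_to_iso_year(year: int, era: str) -> int:
    """Historical year (no year zero) -> ISO 8601 / astronomical year number."""
    if year < 1:
        raise ValueError("historical years start at 1")
    if era == "AD":
        return year
    if era == "BC":
        return 1 - year  # 1 BC -> 0, 2 BC -> -1, 44 BC -> -43
    raise ValueError(f"unknown era {era!r}")


def iso_year_to_historical(iso_year: int) -> tuple:
    """ISO 8601 / astronomical year number -> (historical year, era)."""
    return (iso_year, "AD") if iso_year >= 1 else (1 - iso_year, "BC")


def iso_year_string(iso_year: int) -> str:
    """Four digits for 0000-9999; otherwise a signed expanded form (needs prior agreement)."""
    if 0 <= iso_year <= 9999:
        return f"{iso_year:04d}"
    return f"{iso_year:+05d}"
```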
If we want to be really picky about dates, or any value, then there are systems, models and datums. The model is how we describe something, and the datum is a specific set of arguments for the model. Both are often assumed within a system, like the SI system, but often there are a bunch of updates and modifications. If the best available precision is coarse, a lot of the models and datums collapse into a single one, as with a date that has only year precision.
/me digs in on another project
John
On Wed, Jul 1, 2015 at 10:12 PM, Markus Krötzsch markus@semantic-mediawiki.org wrote:
On 01.07.2015 at 20:08, John Erling Blad wrote:
Wouldn't it be better to use iso8601 as internal format?
In a relational database schema or a triple store, yes. In the primary JSON blobs, no: there we generally want to store data as entered by the user. If the user entered a length in feet, we store feet, and if they entered a date in Julian, we store Julian. Attempting to handle dates differently in this regard was one source of confusion that led to the current mess (think diffs, for example).
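Daniel's principle, sketched with his own feet example: the stored value keeps exactly what the user entered, and the normalized form is derived only for indexing and querying. The `Quantity` class and unit names are hypothetical, not the Wikibase data model; the foot is taken as exactly 0.3048 m (the international foot).

```python
from dataclasses import dataclass
from decimal import Decimal

METRES_PER_FOOT = Decimal("0.3048")  # international foot, exact by definition


@dataclass(frozen=True)
class Quantity:
    """Hypothetical stored value: kept exactly as the user entered it."""
    amount: Decimal
    unit: str  # "foot" or "metre"

    def normalized_metres(self) -> Decimal:
        """Derived form for a query index; never written back to the stored value."""
        if self.unit == "metre":
            return self.amount
        if self.unit == "foot":
            return self.amount * METRES_PER_FOOT
        raise ValueError(f"unknown unit {self.unit!r}")
```

Because the stored value is untouched, a diff between two revisions shows exactly what the editor typed, while an index can still treat 6 ft and 1.8288 m as the same length. The same reasoning applies to a date entered in Julian.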
I thought I lost that discussion....? =D https://www.youtube.com/watch?v=jxhNWYTUiQQ
John
On Thu, Jul 2, 2015 at 11:42 AM, Daniel Kinzler daniel.kinzler@wikimedia.de wrote:
-- Daniel Kinzler Senior Software Developer
Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
On 01/07/15 15:00, Pierpaolo Bernardi wrote:
On Wed, Jul 1, 2015 at 8:17 AM, Markus Krötzsch markus@semantic-mediawiki.org wrote:
Dear Pierpaolo,
This thread was only about Julian and Gregorian calendar dates. If and how other calendar models should be supported in some future is another (potentially big) discussion. As you said, there are many issues there. Let's first make sure that we handle the "easy" 99.9% of cases correctly before discussing any more complicated options.
Just for future reference, if anyone's interested, THE book on this topic is "Calendrical Calculations".
Alas, their code is closed-source, but the book is still the best reference I know of.
http://emr.cs.iit.edu/home/reingold/calendar-book/third-edition/
-- Neil
The issues unravel like a ball of string when you look at time.
There is all the cultural stuff, then there is the astronomy, geodesy and physics that frustrate you if you want to get it right. Leap seconds, Congress changing daylight saving time, relativity, etc.
The Allen algebra is, I think, the most useful theory of time. Here you look at "times" as a union of intervals, which gives a clear way to say "Thursday" or "Easter" or "Ramadan" or the times the Burger King down the road is open.
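The core of Allen's interval algebra is a set of 13 basic relations, exactly one of which holds between any two proper intervals. A minimal sketch that classifies two `(start, end)` pairs; endpoints can be any comparable values, such as day numbers. A recurring time like "Thursday" would be a union (list) of such intervals, which this sketch does not model.

```python
def allen_relation(a: tuple, b: tuple) -> str:
    """Return the Allen relation of interval a relative to interval b (start < end)."""
    (a1, a2), (b1, b2) = a, b
    if not (a1 < a2 and b1 < b2):
        raise ValueError("intervals must have start < end")
    if a2 < b1:
        return "before"
    if b2 < a1:
        return "after"
    if a2 == b1:
        return "meets"
    if b2 == a1:
        return "met-by"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # Remaining case: partial overlap with distinct endpoints.
    return "overlaps" if a1 < b1 else "overlapped-by"
```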
You can use intervals to specify the accuracy of a date (e.g. when we say that Franz Kafka was born on July 3, 1883, this is a truncation of a birth "event" which could be timed to a minute or so). On the other hand, if we refer to a public festival such as "Tennō Tanjōbi" (the emperor's birthday) or just to the actual day of "July 3, 1883", the interval algebra handles that too. There is some conflation, but in practice it is not so bad, and with the property graph model Wikidata uses you can add a statement that qualifies ~your~ point of view of how to think about it.
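The "truncation" idea can be sketched by mapping a recorded date and its precision to the half-open interval `[start, end)` it covers; any more exact time for the event must fall inside it. This assumes Wikidata's precision codes 9 (year), 10 (month) and 11 (day); the helper names are hypothetical, and Python's `datetime.date` limits the sketch to years 1 to 9998.

```python
from datetime import date, timedelta
from typing import Tuple

YEAR, MONTH, DAY = 9, 10, 11  # Wikidata precision codes used here


def precision_interval(year: int, month: int, day: int, precision: int) -> Tuple[date, date]:
    """Half-open interval [start, end) covered by a date at the given precision."""
    if precision == YEAR:
        return date(year, 1, 1), date(year + 1, 1, 1)
    if precision == MONTH:
        start = date(year, month, 1)
        end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        return start, end
    if precision == DAY:
        start = date(year, month, day)
        return start, start + timedelta(days=1)
    raise ValueError(f"precision {precision} not covered by this sketch")


def could_be(value: Tuple[int, int, int], precision: int, moment: date) -> bool:
    """True if a more exact day is consistent with the recorded (truncated) date."""
    start, end = precision_interval(*value, precision)
    return start <= moment < end
```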
The good news too is that "unusual" use cases of time are remarkably unusual. For instance, some W3C standards suggest you can write an ISO date like
17413-04-07
adding one or more digits to the year. Practically nobody uses this because precise dates aren't known for earlier civilizations, other than for astronomical events; at ±10,000 years the defects of common calendars start to show, and if you go out to ±100,000 years the errors in all of the models do. It is not uncommon for science fiction writers to give specific years, months and dates in the 2000-2999 range, but it is incredibly unusual after +10,000. People who build nuclear waste dumps need to think about times in the 10,000 to 100,000 year range, but after the facility closes, it doesn't matter if a day is a Sunday or a Monday.
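Such expanded years are awkward in practice: Python's `datetime.date`, for one, stops at year 9999 (`datetime.MAXYEAR`). A minimal, lenient parser for `YYYY...-MM-DD` strings that accepts an optional sign and four or more year digits, validating month and day against proleptic Gregorian rules. The leniency is an assumption of this sketch: ISO 8601's expanded representation formally requires a sign and an agreed number of digits.

```python
import re
from calendar import isleap
from typing import Tuple

_EXTENDED_DATE = re.compile(r"([+-]?)([0-9]{4,})-([0-9]{2})-([0-9]{2})")
_DAYS_IN_MONTH = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)


def parse_extended_date(text: str) -> Tuple[int, int, int]:
    """Parse [+-]YYYY...-MM-DD into (year, month, day) with proleptic Gregorian checks."""
    m = _EXTENDED_DATE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a YYYY-MM-DD style date: {text!r}")
    sign, digits, mm, dd = m.groups()
    year = -int(digits) if sign == "-" else int(digits)
    month, day = int(mm), int(dd)
    if not 1 <= month <= 12:
        raise ValueError(f"bad month in {text!r}")
    # isleap() is plain arithmetic, so it also works for years beyond 9999.
    last = _DAYS_IN_MONTH[month - 1] + (1 if month == 2 and isleap(year) else 0)
    if not 1 <= day <= last:
        raise ValueError(f"bad day in {text!r}")
    return year, month, day
```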
The same is true for calendars. For instance, if a document is signed in Saudi Arabia, it may have an Islamic date stamped on it. As a westerner who wants to know when the document was signed, you are better served seeing a western date, at least at first glance.
In a western cultural zone you can squash dates from other cultural zones to your own date system. I think it would be awesome if the Arabic slice of Wikidata/Wikipedia had Islamic dates for dates and western ones for our cultural zone and if you could flip a switch and see the one you want (or look up which year of the emperor it is.)
The design of the JDK 8 java.time framework is sound and ought to give some inspiration as to how to think about calendars.
https://docs.oracle.com/javase/8/docs/api/java/time/package-summary.html
I'd say make a core type that handles Gregorian dates and supports the Allen Algebra, then you can define more data types and extensions to properly handle other calendar systems.
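A minimal sketch of that layering, under stated assumptions: the core value is a proleptic Gregorian day count (Python's `date.toordinal()`, where 0001-01-01 is day 1), and an Islamic display is an extension computed from it on demand, as in the "flip a switch" idea. The conversion uses the arithmetical (tabular) Islamic calendar with rules of the kind given in "Calendrical Calculations"; it can differ by a day or two from the observation-based or Umm al-Qura dates actually used in Saudi Arabia, so treat it as an approximation. All function names are hypothetical, and only dates from the Islamic epoch onward are handled.

```python
from datetime import date
from typing import Tuple

# Core representation: proleptic Gregorian day count, as in date.toordinal().
# 1 Muharram AH 1 (arithmetical calendar) = Julian 622-07-16 = Gregorian 622-07-19.
ISLAMIC_EPOCH = date(622, 7, 19).toordinal()


def fixed_from_islamic(year: int, month: int, day: int) -> int:
    """Arithmetical (tabular) Islamic date -> day count."""
    return (day + 29 * (month - 1) + (6 * month - 1) // 11
            + (year - 1) * 354 + (3 + 11 * year) // 30 + ISLAMIC_EPOCH - 1)


def islamic_from_fixed(fixed: int) -> Tuple[int, int, int]:
    """Day count -> arithmetical (tabular) Islamic date (year, month, day)."""
    year = (30 * (fixed - ISLAMIC_EPOCH) + 10646) // 10631
    prior_days = fixed - fixed_from_islamic(year, 1, 1)
    month = (11 * prior_days + 330) // 325
    day = fixed - fixed_from_islamic(year, month, 1) + 1
    return year, month, day


def show(d: date, calendar: str) -> str:
    """The stored value never changes; only the rendering follows the user's switch."""
    if calendar == "gregorian":
        return d.isoformat()
    if calendar == "islamic":
        y, m, dd = islamic_from_fixed(d.toordinal())
        return f"AH {y}-{m:02d}-{dd:02d}"
    raise ValueError(f"no display extension for {calendar!r}")
```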
On Thu, Jul 2, 2015 at 10:16 AM, Neil Harris neil@tonal.clara.co.uk wrote:
On Thu, Jul 2, 2015 at 4:16 PM, Neil Harris neil@tonal.clara.co.uk wrote:
On 01/07/15 15:00, Pierpaolo Bernardi wrote:
Just for future reference, if anyone's interested, THE book on this topic is "Calendrical Calculations".
Alas, their code is closed-source, but the book is still the best reference I know of.
True. HOWEVER, the authors of the book are the same people who wrote the Emacs calendar, which is substantially the same code as the one in the book, and the code in Emacs is GPL, of course. One can look at the book for explanations, and at Emacs calendar for code, if one needs free code.
Cheers P.
There also seems to be an issue where a year could be recorded as either time="+0000000YYYY-00-00T00:00:00Z" or time="+0000000YYYY-01-01T00:00:00Z" (with precision=9). At least I spotted that my earlier bot runs were doing this.
These display the same, but if you compare the claims they show up as different. I also guess only the latter is correct. Is there a way of checking how many of the former there are, and possibly converting these to the latter?
André Costa | GLAM-tekniker, Wikimedia Sverige | Andre.Costa@wikimedia.se | +46 (0)733-964574
Support free knowledge, become a member of Wikimedia Sverige. Read more at blimedlem.wikimedia.se
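A minimal sketch of both parts of that question, assuming the time strings have already been collected (e.g. from a dump); the helper names are hypothetical. Comparing only the fields that the precision makes significant lets the two year-precision encodings compare equal, and a counter tallies the month/day padding. The thread does not settle which padding is canonical, so the rewrite target is a parameter rather than a fixed choice.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Matches e.g. "+00000001883-00-00T00:00:00Z": sign, padded year, month, day, time.
_TIME = re.compile(r"([+-])([0-9]+)-([0-9]{2})-([0-9]{2})T([0-9]{2}):([0-9]{2}):([0-9]{2})Z")


def _fields(time: str) -> re.Match:
    m = _TIME.fullmatch(time)
    if m is None:
        raise ValueError(f"unexpected time string: {time!r}")
    return m


def significant_part(time: str, precision: int) -> Tuple[int, ...]:
    """Keep only the fields that are meaningful at precision 9, 10 or 11."""
    m = _fields(time)
    year = int(m.group(2)) * (-1 if m.group(1) == "-" else 1)
    month, day = int(m.group(3)), int(m.group(4))
    parts = {9: (year,), 10: (year, month), 11: (year, month, day)}
    if precision not in parts:
        raise ValueError(f"precision {precision} not covered by this sketch")
    return parts[precision]


def same_at_precision(a: str, b: str, precision: int) -> bool:
    return significant_part(a, precision) == significant_part(b, precision)


def count_month_day(times: Iterable[str]) -> Counter:
    """Tally the month-day padding, e.g. Counter({'00-00': 3, '01-01': 5})."""
    return Counter("-".join(_fields(t).group(3, 4)) for t in times)


def with_month_day(time: str, month_day: str) -> str:
    """Rewrite a year-precision time string to the chosen month-day padding."""
    m = _fields(time)
    return f"{m.group(1)}{m.group(2)}-{month_day}T00:00:00Z"
```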
On 2 July 2015 at 20:48, Pierpaolo Bernardi olopierpa@gmail.com wrote: