Hi everyone,
Thanks to Lydia and the team for containing this issue and providing the necessary documentation for fixing it. For all of you who wonder what the scale of the issue is (a.k.a. "How bad is it?"), here are some numbers.
The years that matter most for understanding the scope:
1582: the Gregorian calendar is introduced; some countries switch quickly
1753: most countries have switched (Sweden and the UK quite late); only Greece and Russia still use Julian dates
1923: Greece finally switches to Gregorian; all countries have now switched
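To put these years into days, here is a minimal sketch that computes how far apart the two calendars are for a given date label. It assumes the standard integer Julian Day Number (JDN) formulas. The function names `jdn` and `calendar_shift` are illustrative only and are not part of any Wikidata or Wikibase API.

```python
# Sketch: how far apart the Julian and Gregorian calendars are for a given
# date label, via Julian Day Numbers (JDN). Function names are illustrative;
# the integer JDN formulas are the standard textbook ones and are only meant
# for the historical range discussed here.


def jdn(year: int, month: int, day: int, gregorian: bool) -> int:
    """Julian Day Number of year-month-day in the chosen calendar."""
    a = (14 - month) // 12          # 1 for January/February, else 0
    y = year + 4800 - a             # year count that starts in March
    m = month + 12 * a - 3          # March = 0, ..., February = 11
    n = day + (153 * m + 2) // 5 + 365 * y + y // 4
    if gregorian:
        # Gregorian rule: century years are leap years only if divisible by 400.
        return n - y // 100 + y // 400 - 32045
    return n - 32083                # Julian rule: every 4th year is a leap year


def calendar_shift(year: int, month: int, day: int) -> int:
    """Days by which a Julian-calendar label falls after the same Gregorian label."""
    return jdn(year, month, day, gregorian=False) - jdn(year, month, day, gregorian=True)


for year in (1582, 1700, 1800, 1900, 1923):
    print(year, calendar_shift(year, 6, 1))
```

This prints a shift of 10 days for 1582, growing by one day after each of 1700, 1800 and 1900 (century years that are leap years only in the Julian calendar) to 13 days by 1923. That is why the shift stays under half a month for the whole period in question.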
Only dates with day-level precision are affected (the shift between the calendars was never more than half a month). With this in mind, the following numbers (as of 22 June 2015) should make sense:
* Dates in Wikidata overall: 12,549,662 (100%)
* Dates that are precise at least to the day: 11,216,635 (89%):
** Since 1923: 10,406,630 (83%)
** 1753-1922: 716,711 (5.7%)
** 1582-1752: 64,325 (0.5%)
** Until 1581: 28,969 (0.2%)
In other words, disregarding Greece and Russia, at most 0.7% of Wikidata dates are affected. This is still a notable number (over 93,000 dates), though the 64,325 from 1582-1752 are an overestimate, since many countries had already switched by then.
It is not easy to say how many potentially Julian dates are among the 5.7% that happened before Greece introduced the Gregorian calendar, but it seems likely that a clear majority of them did not occur in Russia or Greece.
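The figure of over 93,000 dates can be re-derived from the table with a few lines of Python. This is only a consistency check on the counts quoted above and introduces no new data: the 0.5% and 0.2% shares together come to roughly 0.74%, which the text rounds to 0.7%.

```python
# Re-adds the counts of day-precise dates from the table above
# (state 22 June 2015); no new data is introduced.
total = 12_549_662
day_precise = {
    "since 1923": 10_406_630,
    "1753-1922": 716_711,
    "1582-1752": 64_325,
    "until 1581": 28_969,
}

# The four periods together make up all day-precise dates.
assert sum(day_precise.values()) == 11_216_635

# Dates before 1753, i.e. before most countries outside Greece and Russia had switched.
affected = day_precise["1582-1752"] + day_precise["until 1581"]
print(f"{affected:,} dates = {affected / total:.2%} of all dates")
# prints: 93,294 dates = 0.74% of all dates
```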
These numbers are what I expected. What surprised me more is how rarely "Julian calendar" is used as a calendar model in the data:
* Dates in Wikidata overall: 12,549,662 (100%)
** Gregorian dates: 12,529,635 (99.8%)
** Julian dates: 20,027 (0.16%)
This means that not even the 0.2% of dates from before the introduction of the Gregorian calendar are all tagged as Julian: even if every one of the 20,027 Julian dates were a day-precise date from before 1582, at least 8,942 of those 28,969 dates would still be tagged as Gregorian. Admittedly, some dates before 1582 may correctly use the Gregorian calendar (e.g., no calendar model makes sense for the beginning of the universe, so we might as well leave Gregorian there). However, I would expect virtually all day-precise dates before 1582 to come from historic records and therefore to use Julian. So it seems that there is some work to do there.
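A minimal sketch of the heuristic above: flag day-precise dates before 1582 that are tagged Gregorian. It assumes values shaped like Wikibase time values, with a signed `time` string, a numeric `precision` (11 meaning "day") and a `calendarmodel` URI pointing to the Wikidata items for the proleptic Gregorian (Q1985727) or proleptic Julian (Q1985786) calendar. The helpers `year_of` and `needs_check` and the sample values are hypothetical. This is not the official flowchart, and a flagged date is a candidate for review, not an automatic fix.

```python
# Sketch: flag day-precise dates before 1582 that are tagged Gregorian.
# Assumes Wikibase-style time values; helper names are hypothetical.

GREGORIAN = "http://www.wikidata.org/entity/Q1985727"  # proleptic Gregorian calendar
JULIAN = "http://www.wikidata.org/entity/Q1985786"     # proleptic Julian calendar
DAY_PRECISION = 11                                     # precision code for "day"


def year_of(time: str) -> int:
    """Signed year from a time string such as '+1450-03-07T00:00:00Z'."""
    sign = -1 if time.startswith("-") else 1
    return sign * int(time[1:].split("-", 1)[0])


def needs_check(value: dict, cutoff_year: int = 1582) -> bool:
    """True for dates precise at least to the day, before the cutoff, tagged Gregorian."""
    return (
        value["precision"] >= DAY_PRECISION
        and value["calendarmodel"] == GREGORIAN
        and year_of(value["time"]) < cutoff_year
    )


values = [
    {"time": "+1450-03-07T00:00:00Z", "precision": 11, "calendarmodel": GREGORIAN},
    {"time": "+1450-03-07T00:00:00Z", "precision": 11, "calendarmodel": JULIAN},
    {"time": "+1450-00-00T00:00:00Z", "precision": 9, "calendarmodel": GREGORIAN},
    {"time": "+1900-05-01T00:00:00Z", "precision": 11, "calendarmodel": GREGORIAN},
]
print([needs_check(v) for v in values])  # [True, False, False, False]
```

Only the first sample is flagged: the second is already Julian, the third is only year-precise, and the fourth is after 1582.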
It would be good to find out where these many historic dates came from. When you enter such early dates in the UI, it suggests Julian, so it seems unlikely that users entered most of them by hand. If they came from a bot, it would be good to find out what the bot author was doing. If you upload historic dates (e.g., birth dates), they should be tagged as Julian too. As Lydia explained, there are two things the bot author might have thought:
(1) "I should use the calendar model setting to tell Wikidata which calendar model my dates are in"
(2) "I should upload Gregorian dates and use the calendar model setting to tell Wikidata which calendar model my dates should be displayed in"
In either case, the natural choice would be Julian. Why would a bot author specify "Gregorian" for a date before 1582? A bot author might convert Julian dates to Gregorian if (s)he expected option (2) to be correct (since that option requires all dates to be sent in Gregorian). But in that case, the bot would still set "Julian" as the calendar model to use for display.
Whichever way I look at it, it seems likely that our historic dates need some validation anyway. Maybe the calendar model confusion Lydia explained is not the only issue here. And adding more references would also be very useful in its own right.
Best regards,
Markus
On 30.06.2015 19:38, Lydia Pintscher wrote:
Hi everyone,
I have some bad news. We screwed up. I’m really sorry about this. I’d really appreciate everyone’s help with fixing it.
TLDR: We have a bad mixup of calendar models for the dates in Wikidata and we need to fix them.
==== What happened? ====
Wikidata dates have a calendar model. This can be Julian or Gregorian, and the plan is to support more in the future. There are two ways to interpret this calendar model:
# the given date is in this calendar model
# the given date is Gregorian, and this calendar model says whether the date should be displayed in Gregorian or Julian in the user interface
Unfortunately, there was confusion among both the developers and the bot operators about which of these interpretations is to be used. This led to inconsistencies in the backend and frontend code, as well as to different bot authors treating the calendar model differently. In addition, the user interface had problematic defaults. As a result, we now have a number of dates with a potentially wrong calendar model. The biggest issue started in mid-2014, when we moved code from the frontend to the backend to improve performance. Before the move, the user interface converted dates from one model to the other. After the move, the conversion was no longer done anywhere, but the calendar model was still displayed. We made one part better, but in the process broke another part badly :(
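The two interpretations listed under "What happened?" can be contrasted with a minimal sketch that reads one stored value both ways. It assumes dates as plain (year, month, day) tuples and uses the Wikidata item URIs for the proleptic Gregorian and Julian calendars; `to_jdn`, `from_jdn`, `actual_day` and `displayed_label` are hypothetical helpers, not Wikibase code. The conversion goes through the standard Julian Day Number (JDN) formulas.

```python
# Sketch: one stored date read under interpretations (1) and (2).
# Helper names are hypothetical; JDN formulas are the standard integer ones.
from __future__ import annotations

GREGORIAN = "http://www.wikidata.org/entity/Q1985727"
JULIAN = "http://www.wikidata.org/entity/Q1985786"


def to_jdn(ymd: tuple[int, int, int], gregorian: bool) -> int:
    """Julian Day Number of (year, month, day) in the chosen calendar."""
    year, month, day = ymd
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    n = day + (153 * m + 2) // 5 + 365 * y + y // 4
    return n - y // 100 + y // 400 - 32045 if gregorian else n - 32083


def from_jdn(jdn: int, gregorian: bool) -> tuple[int, int, int]:
    """(year, month, day) of a Julian Day Number in the chosen calendar."""
    if gregorian:
        a = jdn + 32044
        b = (4 * a + 3) // 146097          # completed 400-year cycles
        c = a - 146097 * b // 4
    else:
        b, c = 0, jdn + 32082
    d = (4 * c + 3) // 1461                # completed 4-year cycles
    e = c - 1461 * d // 4                  # days since 1 March of the current year
    m = (5 * e + 2) // 153                 # month index, March = 0
    return (100 * b + d - 4800 + m // 10,
            m + 3 - 12 * (m // 10),
            e - (153 * m + 2) // 5 + 1)


def actual_day(ymd, calendarmodel, interpretation):
    """Proleptic Gregorian day the stored date refers to."""
    if interpretation == 1 and calendarmodel == JULIAN:
        # (1): the stored date is written in the Julian calendar, so convert it.
        return from_jdn(to_jdn(ymd, gregorian=False), gregorian=True)
    # (2): the stored date is already Gregorian (as is any Gregorian-tagged date under (1)).
    return ymd


def displayed_label(ymd, calendarmodel, interpretation):
    """Date a user interface following that interpretation would show."""
    if interpretation == 2 and calendarmodel == JULIAN:
        # (2): the model only selects the display calendar, so convert for display.
        return from_jdn(to_jdn(ymd, gregorian=True), gregorian=False)
    return ymd


stored = (1582, 10, 5)
for interpretation in (1, 2):
    print(interpretation, actual_day(stored, JULIAN, interpretation),
          displayed_label(stored, JULIAN, interpretation))
```

For a value stored as 1582-10-05 with calendar model Julian, interpretation 1 places the event on Gregorian 15 October 1582 and displays 5 October 1582 (Julian), while interpretation 2 places it on Gregorian 5 October 1582 and displays 25 September 1582 (Julian). The same stored data thus lands 10 days apart depending on which reading the code or a bot assumed.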
==== What now? ====
- Going forward, the date data value will be given both in the normalized proleptic Gregorian calendar and in the explicitly given calendar model (which, as said, currently supports proleptic Gregorian and proleptic Julian).
- The user interface will again indicate which calendar model the date is given in. We will improve documentation around this to make sure there is no confusion from now on.
- We made a flowchart to help decide what the correct calendar model for a date should be, to support the cleanup.
- We are improving the user interface to make it easier to understand what is going on and to do the right thing by default.
- We are providing a list of dates that need to be checked and potentially fixed.
==== How are we making sure it doesn’t happen again? ====
- We are improving documentation around dates and will look for other potentially ambiguous concepts we have.
==== How can we fix it? ====
We have created a list of all dates that potentially need checking. We can either provide this as a list on a wiki page or run a bot to add “instance of: date needing calendar model check” (or something similar) as a qualifier to the respective dates. What do you prefer? The list probably contains dates we can batch-change or approve, but we’d need your help figuring out which ones those are. We also created a flowchart that should help with deciding which calendar model to pick for a given date: https://commons.wikimedia.org/wiki/File:Wikidata_Calendar_Model_Decision_Tre...
Thank you to everyone who helped us investigate and get to the bottom of the issue. Sorry again that this happened and is causing work. I feel miserable about this, and if there is anything more we can do to help with the cleanup, please do let me know.
Let's please keep further discussion about this in one place on-wiki at https://www.wikidata.org/wiki/Wikidata:Project_chat#calendar_model_screwup
Cheers Lydia