Hi!
I have been processing a recent Wikidata JSON dump. I have noticed that some claims have +0000-00-00T00:00:00Z as the time value. My understanding is that those are invalid values for time, at least according to [1]. I think they can be safely removed, yes?
[1] https://doc.wikimedia.org/Wikibase/master/php/md_docs_topics_json.html
Mitar
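For anyone who wants to reproduce the check described above, here is a minimal sketch of scanning dump entities for that timestamp. It assumes the documented one-entity-per-line dump layout and the usual claims / mainsnak / qualifiers / references nesting; the helper names (iter_snaks, find_zero_times, entities_from_dump_lines) are made up for illustration and are not part of any Wikidata tooling.

```python
import json
from typing import Dict, Iterable, Iterator, List, Tuple

# The exact value reported in this thread.
ZERO_TIME = "+0000-00-00T00:00:00Z"


def iter_snaks(statement: Dict) -> Iterator[Dict]:
    """Yield the main snak, qualifier snaks, and reference snaks of a statement."""
    if "mainsnak" in statement:
        yield statement["mainsnak"]
    for snaks in statement.get("qualifiers", {}).values():
        yield from snaks
    for reference in statement.get("references", []):
        for snaks in reference.get("snaks", {}).values():
            yield from snaks


def find_zero_times(entity: Dict) -> List[Tuple[str, str]]:
    """Return (statement id, property id) for every snak holding ZERO_TIME."""
    hits = []
    for statements in entity.get("claims", {}).values():
        for statement in statements:
            for snak in iter_snaks(statement):
                # "somevalue"/"novalue" snaks carry no datavalue at all.
                datavalue = snak.get("datavalue", {})
                if datavalue.get("type") != "time":
                    continue
                if datavalue.get("value", {}).get("time") == ZERO_TIME:
                    hits.append((statement.get("id", ""), snak.get("property", "")))
    return hits


def entities_from_dump_lines(lines: Iterable[str]) -> Iterator[Dict]:
    """Parse dump lines, skipping the enclosing '[' and ']' and trailing commas."""
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("[", "]", ""):
            continue
        yield json.loads(line)
```

Feeding an open (decompressed) dump file to entities_from_dump_lines and calling find_zero_times on each entity gives a list of statements to review by hand; the sketch only reports matches and does not modify anything.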
Hey Mitar,
Here, too, a few examples would help us better understand what's going on.
Cheers Lydia
On Sun, Jan 9, 2022 at 9:52 AM Mitar mmitar@gmail.com wrote:
Hi!
I have been processing a recent Wikidata JSON dump. I have noticed that some claims have +0000-00-00T00:00:00Z as the time value. My understanding is that those are invalid values for time, at least according to [1]. I think they can be safely removed, yes?
[1] https://doc.wikimedia.org/Wikibase/master/php/md_docs_topics_json.html
Mitar
-- http://mitar.tnode.com/ https://twitter.com/mitar_m _______________________________________________ Wikidata mailing list -- wikidata@lists.wikimedia.org To unsubscribe send an email to wikidata-leave@lists.wikimedia.org
Hi!
I took some time to go over all the cases I found, and all of them were simply bad data. I suspect that most of them were added by some automated process that passed in this timestamp when there was no data. So I cleaned them up or fixed them. In a few cases the right value was "unknown" with a range, and in some cases it was 1 BCE, but in most cases I just removed the claim, because it is not only false but simply invalid: it is not even a valid timestamp.
You can see examples in my recent changes [1].
At this point I would ask how this got in (why is it not rejected at insertion time?) and, even more interesting, why the web UI does not show any warning about those values. For many other cases you get various warnings about possibly invalid data, but not here. So it would be great to show a warning next to any value that has such a timestamp. Of course, even better would be to prevent insertion, because in 99% of cases it means somebody is blindly inserting a default zero value.
[1] https://www.wikidata.org/w/index.php?title=Special:Contributions/Mitar&o...
Mitar
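To make the suggested warning concrete, here is a rough sketch of the kind of check that could flag such values. It is not how Wikibase validates anything; the regular expression, the function name time_value_warnings, and the rule that year zero is suspicious are all assumptions taken from this thread, and a zero month or day on its own is deliberately not flagged, since that may be legitimate at coarser precisions.

```python
import re
from typing import List

# Assumed layout: sign, 4 to 16 year digits, then -MM-DDTHH:MM:SSZ.
TIME_RE = re.compile(r"^([+-])(\d{4,16})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})Z$")


def time_value_warnings(time_string: str) -> List[str]:
    """Return human-readable warnings for a suspicious time string."""
    match = TIME_RE.match(time_string)
    if match is None:
        return ["does not match the expected timestamp layout"]

    warnings = []
    year, month, day = (int(part) for part in match.group(2, 3, 4))
    if year == 0:
        warnings.append("year is zero")
        if month == 0 and day == 0:
            # The pattern this thread is about: every date field left at zero.
            warnings.append("all date fields are zero; looks like an unset default")
    return warnings
```

The same function could back either a warning next to the value or a hard rejection at insertion time, depending on how strict one wants to be.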
On Mon, Jan 10, 2022 at 4:50 PM Lydia Pintscher Lydia.Pintscher@wikimedia.de wrote:
Hey Mitar,
Here, too, a few examples would help us better understand what's going on.
Cheers Lydia
-- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata
Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 Nz. Recognized as a non-profit organization by the Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.