Hi everybody!
As the guy who has the honor of shortly receiving some funding from Wikimedia Germany for handling spatial open government data [0], I would like to make some remarks on the current geo definitions in the Wikidata model:
1. Spatial Reference System Identifier (SRID [1]) definition is missing
Every GeoCoordinatesValue field should either have a corresponding SRID field that defines the used spatial reference system (SRS [2]) or mandate the use of a single SRS like WGS84 [3] which is currently the standard used by GPS, OpenStreetMap and Wikipedia.
2. Geographic shapes should be defined in either Well-known text (WKT [4]) or GeoJSON [5]
WKT is the de facto standard for storing spatial data in a relational database, and GeoJSON is the de facto standard for accessing geo data via the web. Both formats can be easily transformed into each other, so which one you choose pretty much depends on your preferred choice of SQL vs. NoSQL database.
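The claim that both formats transform easily into each other can be illustrated with a minimal sketch. It assumes plain 2D geometries and covers only POINT, LINESTRING and POLYGON; the function names `wkt_to_geojson` and `geojson_to_wkt` are made up for this illustration and do not come from any library.

```python
import re


def _coords(text):
    """Parse 'x y, x y, ...' into a list of [x, y] float pairs."""
    return [[float(n) for n in pair.split()] for pair in text.split(",")]


def wkt_to_geojson(wkt):
    """Convert a 2D WKT POINT, LINESTRING or POLYGON to a GeoJSON geometry dict."""
    match = re.fullmatch(r"\s*(\w+)\s*\((.*)\)\s*", wkt, flags=re.DOTALL)
    if match is None:
        raise ValueError(f"unsupported WKT: {wkt!r}")
    kind, body = match.group(1).upper(), match.group(2)
    if kind == "POINT":
        return {"type": "Point", "coordinates": _coords(body)[0]}
    if kind == "LINESTRING":
        return {"type": "LineString", "coordinates": _coords(body)}
    if kind == "POLYGON":
        # Each ring is wrapped in its own pair of parentheses.
        rings = re.findall(r"\(([^()]*)\)", body)
        return {"type": "Polygon", "coordinates": [_coords(r) for r in rings]}
    raise ValueError(f"unsupported geometry type: {kind}")


def geojson_to_wkt(geom):
    """Convert a GeoJSON Point, LineString or Polygon dict to WKT."""
    def fmt(points):
        return ", ".join(f"{x!r} {y!r}" for x, y in points)

    kind, coords = geom["type"], geom["coordinates"]
    if kind == "Point":
        return f"POINT ({fmt([coords])})"
    if kind == "LineString":
        return f"LINESTRING ({fmt(coords)})"
    if kind == "Polygon":
        return "POLYGON (" + ", ".join(f"({fmt(ring)})" for ring in coords) + ")"
    raise ValueError(f"unsupported geometry type: {kind}")


point = wkt_to_geojson("POINT (16.37 48.21)")
print(point)
print(geojson_to_wkt(point))
```

Neither format carries the spatial reference system in the geometry itself in this sketch, which is why the SRID question from point 1 has to be settled separately.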
So in summary I would propose the following data model for spatial data:
Geographic locations
  Datatype IRI: http://wikidata.org/vocabulary/datatype_geocoords
  Value: GeoCoordinatesValue
  Mandatory spatial reference system: EPSG 4326 (WGS 84/GPS)
  Type: Decimal

Geographic objects
  Datatype IRI: http://wikidata.org/vocabulary/datatype_geoobjects
  Value: GeoObjectsValue
  Type: GeoJSON [5]

Geographic objects SRID
  Datatype IRI: http://wikidata.org/vocabulary/datatype_geoobjects_srid
  Value: GeoObjectsSridValue
  Type: EPSG Spatial Reference System Identifier (SRID [1])
That model would allow a structure where every spatial object can have a complex geometry stored in its original geodetic system and still have an easily manageable location in GPS format.
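One way to picture the proposed structure is the following rough sketch. The names `GeoCoordinatesValue` and `GeoObjectsValue` are taken from the proposal; the `SpatialObject` wrapper, the field names, and the sample geometry numbers are assumptions made only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional

WGS84_SRID = 4326  # EPSG code mandated for point locations in the proposal


@dataclass(frozen=True)
class GeoCoordinatesValue:
    """A point location in decimal degrees, always in EPSG 4326 (WGS 84)."""
    latitude: float
    longitude: float

    def __post_init__(self):
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError("latitude must be within [-90, 90]")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError("longitude must be within [-180, 180]")

    @property
    def srid(self):
        return WGS84_SRID


@dataclass(frozen=True)
class GeoObjectsValue:
    """A complex geometry kept in its original reference system."""
    geometry: dict  # a GeoJSON geometry object
    srid: int       # plays the role of GeoObjectsSridValue (an EPSG code)


@dataclass(frozen=True)
class SpatialObject:
    """Hypothetical wrapper: a simple GPS-style location plus an optional shape."""
    label: str
    location: GeoCoordinatesValue
    shape: Optional[GeoObjectsValue] = None


# Illustration only: the ring coordinates are invented, and EPSG 31467
# (DHDN / 3-degree Gauss-Krueger zone 3) is picked merely as an example
# of a national reference system.
parcel = SpatialObject(
    label="example parcel",
    location=GeoCoordinatesValue(latitude=48.78, longitude=9.18),
    shape=GeoObjectsValue(
        geometry={"type": "Polygon", "coordinates": [[
            [3513000.0, 5404000.0], [3513100.0, 5404000.0],
            [3513100.0, 5404100.0], [3513000.0, 5404000.0],
        ]]},
        srid=31467,
    ),
)
print(parcel.location.srid, parcel.shape.srid)
```

The point of the split is that consumers who only need a map pin read `location` and never have to deal with the SRID of `shape`.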
cu andreas
[0] http://de.wikipedia.org/wiki/Wikipedia:Community-Projektbudget#2._kartenwerk...
[1] https://en.wikipedia.org/wiki/Spatial_reference_system_identifier
[2] https://en.wikipedia.org/wiki/Spatial_reference_system
[3] https://en.wikipedia.org/wiki/WGS84
[4] https://en.wikipedia.org/wiki/Well-known_text
[5] https://en.wikipedia.org/wiki/GeoJSON
Hi Andreas,
thanks for the input. I have drafted the current text about geo-related datatypes, but I am far from being an expert in this area. Our mapping expert in Wikidata is Katie (Aude), who has also been working with OpenStreetMap, but further expert input on this topic would be quite valuable.
As in all areas, we need to find a balance between generality and usability, so I am slightly in favour of committing to one SR for now (as I understand, the data can be converted easily between SRs but -- as opposed to other cases where people measure something -- most of the world seems to be happy with one of them).
I have now included a link to this thread into an editorial remark in the data model, so we do not forget about this discussion when working out the details.
Markus
On 04/04/12 14:16, Andreas Trawoeger wrote:
[...]
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
I believe this is very important, not only for spatial objects but also for time and for spatiotemporal objects, and even for a lot of other types of objects. One option could be to add some kind of subtypes, or other variations of types, one of which could be the default in a locale or given as a user preference. A subtype could then refer to an SRID, which in turn refers to a specific datum, geoid, coordinate system, projection, and whatever. As Markus said, usability is an issue here, but I think most of the complexity can be hidden. Or I hope so. ;)
One reason for making them subclasses of a type is that alternate subtypes could replace each other while still being valid for a specific property. For example, a coordinate could be given in NAD27 and then converted to WGS84 with some error. It could be difficult or impossible to convert coordinates in general (old coordinates referring to a flat map could be an example), but perhaps some types are important enough.
Note that even identifying which ones are important enough would be difficult, not to mention converting between them. It is not an error in general to be unable to convert between subtypes, even if the two subtypes belong to the same supertype.
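The idea of subtypes that are only sometimes convertible could be sketched roughly as below. Everything here is an assumption for illustration: the `Coordinate` class, the converter registry, and the choice of Web Mercator (EPSG 3857) as the one registered conversion, picked because its formula is simple and well known. NAD27 (EPSG 4267) is deliberately left without a converter to show that a missing conversion is an ordinary outcome rather than a failure.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Coordinate:
    x: float               # longitude or easting, depending on the SRID
    y: float               # latitude or northing, depending on the SRID
    srid: int              # the "subtype": which reference system the numbers use
    error_m: float = 0.0   # estimated positional uncertainty in metres


Converter = Callable[[Coordinate], Coordinate]
_converters: Dict[Tuple[int, int], Converter] = {}


def register(src_srid: int, dst_srid: int, fn: Converter) -> None:
    _converters[(src_srid, dst_srid)] = fn


def convert(value: Coordinate, target_srid: int) -> Optional[Coordinate]:
    """Return the converted coordinate, or None when no conversion is known."""
    if value.srid == target_srid:
        return value
    fn = _converters.get((value.srid, target_srid))
    return fn(value) if fn is not None else None


EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used by Web Mercator


def wgs84_to_web_mercator(c: Coordinate) -> Coordinate:
    x = EARTH_RADIUS_M * math.radians(c.x)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(c.y) / 2))
    # The error estimate is carried over unchanged, which is a simplification.
    return Coordinate(x, y, 3857, c.error_m)


register(4326, 3857, wgs84_to_web_mercator)

print(convert(Coordinate(180.0, 0.0, 4326), 3857))   # a known conversion
print(convert(Coordinate(-100.0, 40.0, 4267), 4326)) # None: no NAD27 converter registered
```

Returning `None` instead of raising keeps both values valid for the same property while making the gap in convertibility explicit to the caller.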
Note also that this is an inherent problem in all kinds of measurement where a value refers to some form of official or unofficial measuring method. Other examples are length measurements in feet in the UK and Germany (all are valid lengths, but conversion is difficult and in some cases unknown), and old time standards that could differ between individual European cities (not sure if the differences are well known at all). An even better example could be currency, where USD 1 does not have a fixed value compared to EUR 1. Again, both USD and EUR are valid currencies, but the conversion rate is unknown for the future and known only with some error for the past.
John
On Fri, Apr 6, 2012 at 1:46 PM, Markus Krötzsch markus@semantic-mediawiki.org wrote:
[...]
And one last thing: SRIDs, datums and other such things should be solved, but leave that for later. It's like relationships on Facebook: "it's complicated"! ;)
John
On Fri, Apr 6, 2012 at 2:41 PM, John Erling Blad jeblad@gmail.com wrote:
[...]
On 06/04/12 13:41, John Erling Blad wrote:
[...]
Many complicated issues are mentioned here. I think the only way in which we can decide for or against additional complexity is to collect more use cases. It would be useful to collect (and briefly explain) examples of challenging infoboxes at
http://meta.wikimedia.org/wiki/Wikidata/Infoboxes
as suggested by Lydia recently. We can then consider how each case can be supported appropriately. So far, I do not know of any example where Wikipedia uses unusual SRs or old-time units in infoboxes.
Markus
P.S. Let's try to keep this thread on geo; numeric unit conversion issues or time-related discussions deserve separate threads.
Andreas Trawoeger wrote on 04.04.2012 15:16:
- Spatial Reference System Identifier (SRID [1]) definition is missing
Introducing an SRID field would also imply implementing coordinate transformation services behind Wikidata and would most likely complicate data usability.
IMO it is sufficient to constrain geodata to WGS84. For most purposes (at least in Wikipedia) it should be accurate enough (± 5 m). We don't do land surveying here, right?
That means we'll basically ignore influences from plate tectonics. Local or continent-related CRS like NAD83 (for North America) or ETRS89 (for Europe) have the advantage of being more stable with respect to a particular plate. That's why we use them in Europe and North America. In a global CRS like WGS84, we would theoretically have to update geo-coordinates more often due to crustal motion. But we are speaking of <20 m within 100 years (maybe more in some areas like Hawaii). Relevant to Wikidata? Probably not.
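The drift argument is plain arithmetic and can be checked with a small sketch. The plate rates used below (2.5 and 10 cm per year) are illustrative round numbers chosen for this example, not measurements from this thread; the 5 m tolerance is the accuracy figure mentioned above.

```python
def drift_m(rate_cm_per_year, years):
    """Accumulated displacement in metres for a constant plate velocity."""
    return rate_cm_per_year * years / 100.0


def years_until_exceeded(tolerance_m, rate_cm_per_year):
    """Years until the accumulated drift reaches the given tolerance."""
    return tolerance_m * 100.0 / rate_cm_per_year


for rate in (2.5, 10.0):  # assumed rates in cm/year, for illustration only
    print(
        f"{rate} cm/yr: {drift_m(rate, 100)} m per century, "
        f"5 m tolerance reached after {years_until_exceeded(5.0, rate)} years"
    )
```

Under these assumed rates a stored WGS84 coordinate stays within the 5 m tolerance for decades, which supports treating crustal motion as negligible for this purpose.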
On the other hand there are some historic CRS (like NAD27, Gauss-Krüger). But - as John said - these geo-coordinates can be transformed into WGS84 (usually with an accuracy of <10 m).
In Wikipedia we also have some coordinates for other globes (Moon, Mars) and also celestial coordinates given as lat/lon. But I have no clue what definitions those coordinates are based on. Presumably it would be better to separate these things from geodata.
Regards Alex
Is there an objection to the concept of, or cooperation with, Wikidata-compatible "datawiki" projects? I would define a "datawiki" (by analogy with databases) as a JSON-oriented NoSQL DBMS using an enhanced wiki as its human user I/O interface. This would permit BigData, specialized data, and graph sources to feed Wikidata according to their own data philosophy and collection/update policy. I suppose that the main point would be an inter-datawiki interchange protocol (RFC?) matching the requirements of the datawiki authoritative operators (the first of them being Wikidata). I would permit projects at different stages of R&D or with different main purposes in order to cooperate with Wikidata.
jfc