Hi,
On Fri, Dec 21, 2012 at 6:14 PM, Denny Vrandečić
<denny.vrandecic(a)wikimedia.de> wrote:
Friedrich, the term "query answering" simply means the ability to answer
queries against the database in Phase 3, e.g. the list of cities located in Ghana with a
population over 25,000 ordered by population.
A query system that deals well with intervals -- I would need a pointer for that. For now
I was always assuming to use a single value internally to answer such queries. If the
value is 90+-20 then the query >100? would not contain that result. Sucks, but I
don't know of any better system.
So taking the current data model and our Eiffel Tower example:
We have the entity "Eiffel Tower".
We want to represent "The Eiffel Tower is 324 meters high", so we
would create the statement
(
Assumptions:
Wikipedia pages serve as ids,
I understood the Wikidata object notation correctly,
Number(upperlimit lowerlimit unit) is the object notation for numbers.
I omitted the quantity field because I think it's redundant, at least in
this example, and the confidence could be added as an additional
PropertySnak(?))
)
Statement('
http://en.wikipedia.org/wiki/Eiffel_Tower
PropertyValueSnak('
http://en.wikipedia.org/wiki/Height Number(324 324
http://en.wikipedia.org/wiki/Metre) ')
{reference and rank omitted}
')
(Note: I thought I read somewhere that it was decided that all statements on
Wikidata should have at least one reference, but in the object
notation definition the {references} seems to imply this is an optional
argument. Also, the visual has a 0..* relation; did I miss something?)
Assuming a system of tables by unit and multiples is used (i.e. meter is m0),
the table could look like this:
Table m0:
propertyid | property | max | min | other information
1234       | Height   | 324 | 324 | ... (or 325 | 323)
An index could be put over min, max and property, and the query for
all buildings higher than 300 meters could start with:
SELECT propertyid FROM m0 WHERE property = 'Height' AND (min > 300 OR max > 300);
This would allow a query for things with a "Height" greater than 300
and it would even include things defined as 290+-20, since the max
value would be over 300. The much harder thing imho is the "located in
Ghana" part of the query, but I think there are such things as spatial
queries for the big databases (e.g.
http://dev.mysql.com/doc/refman/5.0/en/spatial-extensions.html ).
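To make the interval table idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The database engine, the column names min_value/max_value, the extra "entity" column and the sample rows other than the Eiffel Tower are my assumptions for illustration only, not part of the Wikidata data model. Since max is never below min, the condition "min > 300 OR max > 300" reduces to "max > 300" (the value is possibly over the threshold), while "min > 300" alone answers whether it is certainly over it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema modelled on the m0 (meters) table sketched above.
conn.execute(
    "CREATE TABLE m0 ("
    " propertyid INTEGER PRIMARY KEY,"
    " entity TEXT,"        # assumed column, only for readability
    " property TEXT,"
    " min_value REAL,"
    " max_value REAL)"
)
# Index over property and both bounds, as suggested in the text.
conn.execute("CREATE INDEX m0_bounds ON m0 (property, min_value, max_value)")

conn.executemany(
    "INSERT INTO m0 VALUES (?, ?, ?, ?, ?)",
    [
        (1234, "Eiffel Tower", "Height", 324, 324),
        (1235, "Tower A", "Height", 270, 310),   # made-up entry: 290+-20
        (1236, "Tower B", "Height", 70, 110),    # made-up entry: 90+-20
        (1237, "Bridge C", "Length", 400, 400),  # made-up, different property
    ],
)


def possibly_greater(conn, prop, threshold):
    """Ids whose interval reaches above the threshold (max > threshold)."""
    cur = conn.execute(
        "SELECT propertyid FROM m0"
        " WHERE property = ? AND max_value > ?"
        " ORDER BY propertyid",
        (prop, threshold),
    )
    return [row[0] for row in cur]


def certainly_greater(conn, prop, threshold):
    """Ids whose whole interval lies above the threshold (min > threshold)."""
    cur = conn.execute(
        "SELECT propertyid FROM m0"
        " WHERE property = ? AND min_value > ?"
        " ORDER BY propertyid",
        (prop, threshold),
    )
    return [row[0] for row in cur]


print(possibly_greater(conn, "Height", 300))
print(certainly_greater(conn, "Height", 300))
```

With this split, the 90+-20 case no longer silently drops out of a ">100?" query: it shows up in the "possibly" result but not in the "certainly" one.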
This would also work for queries on dates with tables that have
mindate, maxdate. (Apropos dates, here is a short but interesting
discussion on how they might be saved in databases if arbitrary dates
are needed:
http://stackoverflow.com/a/2487792 )
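A rough sketch of the same approach for dates, again with sqlite3. The table name d0, the property name and the rows are hypothetical. I store dates as ISO 'YYYY-MM-DD' strings here, which only compare correctly for four-digit years; truly arbitrary dates would need a different encoding, as the linked discussion points out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical date table with a lower and an upper bound per value.
conn.execute(
    "CREATE TABLE d0 ("
    " propertyid INTEGER PRIMARY KEY,"
    " property TEXT,"
    " mindate TEXT,"   # ISO 'YYYY-MM-DD', assumed four-digit years only
    " maxdate TEXT)"
)
conn.executemany(
    "INSERT INTO d0 VALUES (?, ?, ?, ?)",
    [
        (1, "Inception", "1889-03-31", "1889-03-31"),  # exact day
        (2, "Inception", "1850-01-01", "1859-12-31"),  # "some time in the 1850s"
        (3, "Inception", "1900-01-01", "1900-12-31"),  # "some time in 1900"
    ],
)


def may_overlap(conn, prop, start, end):
    """Ids whose [mindate, maxdate] interval intersects [start, end]."""
    cur = conn.execute(
        "SELECT propertyid FROM d0"
        " WHERE property = ? AND mindate <= ? AND maxdate >= ?"
        " ORDER BY propertyid",
        (prop, end, start),
    )
    return [row[0] for row in cur]


print(may_overlap(conn, "Inception", "1855-01-01", "1890-01-01"))
```

Two intervals intersect exactly when each one starts no later than the other ends, which is what the two comparisons in the WHERE clause check.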
Representation of "temporal knowledge" (?) seems to be a huge research
topic anyway (e.g.
http://www.math.unipd.it/~kvenable/RT/corso2009/Allen.pdf or
http://www.cs.ox.ac.uk/boris.motik/pubs/nm03fuzzy.pdf, ...), where the
problems of time intervals vs. time points, uncertainty and
"vagueness" and their representation are discussed.
Hope this helps,
Friedrich