Denny said:
"I think the assumption everything has exactly one type is oversimplifying."
The assumption that everything is of multiple types is over-complicating.
Usually you can tell from the first sentence in the Wikipedia page.
"Tuesday is a day of the week"
"Love is an emotion"
"(Roman) Catholicism is a faith"
"Gollum is a fictional character"
"HAL-9000 is a character"
"Noah is a Patriarch"
"Enos was the first chimpanzee"
So consensus certainly is being achieved among thousands of authors about
the fundamental type of thing each of these pages represents. Disambiguation
pages very commonly reference these types of things as in "Enos
(chimpanzee)".
Let's take Gollum. I can imagine a topic map has these subjects:
1. Character
1A. Fictional character
1A1. Fictional person
1A2. Fictional animal
1A3. Fictional ghost
1A4. Fictional god
Another equally valid assertion is that Gollum is a Character typed as both
a Fictional thing and a Human thing (both of these adjectives being
instances of owl:Class), so that a comprehensive system sometime in the
future would reinterpret Gollum as actually being a Fictional person.
As you say yourself, it's not useful to create a "perfect" system to handle
every imaginable edge case **to the extent that they exist**. Personally I
don't believe such edge cases can be found - I challenge anyone to provide
me such an example.
But more to the point of Wikidata. I don't believe for a second that WP will
be reorganized into thousands of namespaces. Rather, I believe first,
SUBOBJECT names must include the idea of 'namespace' for the efficiencies
gained, and second, WP pages should be associated with the same set of nouns
(noun-phrases) available for subobject names. IOW, it's an implementation
issue whether a wiki's pages are named using these namespaces, so that the
wiki as a whole can gain the same inherent efficiencies I've sketched for
subobjects.
Best - john
Hi Denny -
Correct - URI opaqueness is required at the level of exchange but there's no
such requirement internal to an application. During exchange, sure you can
add a triple to assert that Density is the type of the object
France#Density:2012_pop_estimate_Bilan_2010. Outside of exchange, type can
be reported by an API or template which parses the namespace from the
object's name. Obviously this would save one triple per object and per page,
in the triples database - a big win in large datasets such as WP.
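A minimal Python sketch of the "parse the namespace from the object's name" idea. It assumes subobject names have the shape Page#Namespace:LocalName, as in the examples in this thread; the function and class names are hypothetical and are not part of SMW or Wikidata.

```python
from typing import NamedTuple, Optional


class SubobjectName(NamedTuple):
    page: str
    namespace: Optional[str]  # None when the fragment carries no namespace
    local: str


def parse_subobject_name(name: str) -> SubobjectName:
    """Split 'Page#Namespace:Local' into its three parts (assumed naming scheme)."""
    page, _, fragment = name.partition("#")
    namespace, sep, local = fragment.partition(":")
    if not sep:
        # No ':' in the fragment, so there is no namespace to report as a type.
        return SubobjectName(page, None, fragment)
    return SubobjectName(page, namespace, local)


parsed = parse_subobject_name("France#Density:2012_pop_estimate_Bilan_2010")
print(parsed.namespace)  # Density
```

Under this assumption the type is read off the name on demand, so no separate type triple has to be stored for each object.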
A similar situation is found today in SMW applications. People create
namespaces and then add a category of the *same name* to the pages and
objects in that namespace; it's highly inefficient and makes me yelp !!
Denny asks:
How do I know the relationship between
France#Density:2012_pop_estimate_Bilan_2010 and
Germany#Density:2009_pop_estimate_CIA_2010?
If you're asking how you know they're both Density objects, that's easy -
they're subobjects in the same 'namespace'.
So in SMW speak, I'd like to use a predicate 'like' operator.
To list all Density subobjects is just [[~#Density:+]]
To list all for a page is just [[~France#Density:+]]
To list all for both is [[~France#Density:+||~Germany#Density:+]]
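The three queries above can be approximated in plain Python as a filter over subobject names. This is only a sketch of the intended matching, not SMW's query engine; the function name and the last two sample names are hypothetical.

```python
def list_subobjects(names, namespace, pages=None):
    """Return subobject names in `namespace`, optionally limited to `pages`."""
    matches = []
    for name in names:
        page, sep, fragment = name.partition("#")
        if not sep:
            continue  # a plain page name, not a subobject
        ns, colon, _ = fragment.partition(":")
        if not colon or ns != namespace:
            continue  # no namespace, or a different one
        if pages is not None and page not in pages:
            continue
        matches.append(name)
    return matches


names = [
    "France#Density:2012_pop_estimate_Bilan_2010",
    "Germany#Density:2009_pop_estimate_CIA_2010",
    "France#Area:2010_total",     # hypothetical sample
    "Italy#Density:2011_census",  # hypothetical sample
]

all_density = list_subobjects(names, "Density")
france_only = list_subobjects(names, "Density", pages={"France"})
both = list_subobjects(names, "Density", pages={"France", "Germany"})
```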
With regard to rdf:property - I think you may have meant rdf:predicate as
there's no rdf:property I could find. Its domain is rdf:Statement, not
rdf:Resource. Is a Snak a subtype of rdf:Statement? Why not name it as such,
eg "WMFStatement", to clearly assert its lineage? In any event, I'm
concerned about using rdf:predicate in this manner. In fact I suspect
there's some confusion about reification, because I used dc:Subject /
dc:subject to associate the 'reifier' with the object in what I think is a
quite traditional, uncontroversial manner. I'm worried that using rdf
mechanisms for reification in this context could be particularly problematic.
john
PS yes I was giving triples!!!!
Denny said:
"if I understand topic maps correctly it should be trivial to write a
transformer that takes the export that Wikidata will offer and translates it
into topic maps, if you are so inclined. This way the topic maps community
can be served through that transformer easily, be it a web service or a
parser-front-end"
Sure, at a technical level, but let's focus on concepts & requirements,
because your semantics may force mapping SNAKs to other protocols, which is
bad karma for lossless exchange. And allow me to note that the "topic map"
community is one & the same here as the "semantic" community (if not the
greatest part of that community btw) so please, let's not create a we-they
paradigm okay? I guess you know that Drupal is incorporating ISO Topic Maps
and that ISO Topic Maps is a superset of the W3C RDF. It's definitely
heartening to see your own evolution on this matter, creating a conceptual
design so duplicative of this international standard (you can reach farther
when standing on the shoulders of giants...).
Last point, on the "information sources must be free" dictum, which you
stated in your reply.
ISO has a simple business model that is fair, honest, etc., to support its
own non-profit operations. You can use their info for free. That said, can
you provide a link to your stated policy, stated as a WMF policy, or is this
your own policy? I'd like to know more about the thinking behind it.
Thanks - john
Denny said:
you forgot to add something like
France#Density:2012_pop_estimate_Bilan_2010 property Density .
No I did not forget anything, given the Density 'namespace' in the subobject
name.
IOW your triple merely restates what is discernible from the subobject name.
Maybe you should tell me what a "property" property is supposed to represent.
At most I made a misstatement, that "Estimated is an adjective treated as a
subclass of owl:Class".
It should say "as an instance of owl:Class".
Denny said:
But if you find a simpler, and more RDFish way to express the (below)
statement, please feel free to enlighten me. I would be indeed very
interested.
"The population density of France, as of an 2012 estimate, is 116 per square
kilometer, according to the "Bilan demographique 2010"."
A wiki namespace-based approach is:
France has_subobject France#Density:2012_pop_estimate_Bilan_2010
France#Density:2012_pop_estimate_Bilan_2010 ^source "Bilan ...2010"
France#Density:2012_pop_estimate_Bilan_2010 ^npkm2 116
France#Density:2012_pop_estimate_Bilan_2010 Type Estimated
France#Density:2012_pop_estimate_Bilan_2010 ^date 2012
Key:
France#Density:2012_pop_estimate_Bilan_2010 is a named subobject
This subobject is so named to prevent subobject name collisions
Subobject names follow pagename naming conventions
has_subobject is a reserved property name
Density is an instance in a Type (or, Noun) namespace
All properties prefixed by ^ are text properties
^source and ^date are Dublin Core properties (both text properties)
Type is a Dublin Core property (an object property)
Estimated is an adjective treated as a subclass of owl:Class
Estimated is an abbreviation of "Estimated Things"
^npkm2 is an SI unit (amount per sq km)
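A minimal Python sketch of how the five triples above might be held and queried, and how an explicit type triple could be added only at exchange time. The predicate label "rdf:type" and both function names are assumptions made for illustration; the source string "Bilan ...2010" is kept exactly as abbreviated above.

```python
SUBOBJECT = "France#Density:2012_pop_estimate_Bilan_2010"

# The triples as listed above; ^-prefixed properties hold text values.
triples = [
    ("France", "has_subobject", SUBOBJECT),
    (SUBOBJECT, "^source", "Bilan ...2010"),
    (SUBOBJECT, "^npkm2", "116"),
    (SUBOBJECT, "Type", "Estimated"),
    (SUBOBJECT, "^date", "2012"),
]


def describe(subject, triples):
    """Collect every property value stored for one subject."""
    props = {}
    for s, p, o in triples:
        if s == subject:
            props.setdefault(p, []).append(o)
    return props


def export_triples(triples):
    """For exchange only: append the type triple implied by each subobject name."""
    out = list(triples)
    subjects = {s for s, _, _ in triples if "#" in s and ":" in s.split("#", 1)[1]}
    for s in sorted(subjects):
        namespace = s.split("#", 1)[1].split(":", 1)[0]
        out.append((s, "rdf:type", namespace))  # assumed predicate label
    return out


density = describe(SUBOBJECT, triples)
exported = export_triples(triples)
```

Internally five triples are stored; the sixth, stating that the subobject is a Density, appears only in the exported list.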
Apologies if this has already been discussed.
http://meta.wikimedia.org/wiki/Wikidata/Notes/Data_model#The_Metamodel
defines: Wikipedialink = (Title, LanguageId, Badge?)
In my experience the scope or extent of entities in different
Wikipedias sometimes differs. One Wikipedia considers an entity a
valid lemma, whereas another Wikipedia subsumes it in a larger lemma.
Would changing the model to:
Wikipedialink = (Title, LanguageId, Relation, Badge?)
where relation can be
broader match
close match
exact match
narrower match
etc. help in expressing these situations? I believe this could prevent
problems later on.
The default (and initial import of interlanguage links) could easily
be "close match" - to be refined only where required.
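As a rough Python illustration of the proposed tuple, assuming the four relation values listed above and "close match" as the import default. The class and function names are hypothetical and do not come from the Wikidata data model document.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Relation(Enum):
    BROADER_MATCH = "broader match"
    CLOSE_MATCH = "close match"
    EXACT_MATCH = "exact match"
    NARROWER_MATCH = "narrower match"


@dataclass(frozen=True)
class WikipediaLink:
    title: str
    language_id: str
    relation: Relation = Relation.CLOSE_MATCH  # default for imported links
    badge: Optional[str] = None


def import_interlanguage_link(title: str, language_id: str) -> WikipediaLink:
    """Initial import: every existing interlanguage link starts as 'close match'."""
    return WikipediaLink(title, language_id)


imported = import_interlanguage_link("Berlin", "de")
refined = WikipediaLink("Berlin", "de", relation=Relation.EXACT_MATCH)
```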
Gregor
Wiki namespaces are currently so underused people may not realize their
importance: they provide crucial semantic information. For instance,
consider the example given in Wikidata's data model article[1]
"Obama was US Senator from Illinois from January 3, 2005 to November 16,
2008"
which yielded these observations:
a.. mainSnak of type PropertyValueSnak with subject "Obama", property "US
Senator from", and value "Illinois"
b.. auxiliary Snak of type PropertyIntervalSnak with property "in office"
and interval "January 3, 2005 to November 16, 2008" (the subject of the
auxiliary Snak is always the statement itself).
An alternative lexical model might restate this as
"The US Senator for the place Illinois is/was the person Obama from date
January 3, 2005 until date November 16, 2008".
a.. the prime resource being described is a US Senator page not so much
the Obama page
b.. Person:Obama is the subject complement of this US Senator via the
linking verb-property 'was' or "is"
c.. "for" is a property of this US Senator whose value is "Place:Illinois"
d.. "from" is a property of this US Senator with the value "Date:January
3, 2005"
e.. "until" is a property of this US Senator with the value "Date:November
16, 2008"
A significant point is that this US Senator page is named Senator:Barack H
Obama (or, Legislator:Barack H Obama or Public Employee:Barack H Obama,
etc); it is of type US Senator, and it has these three properties: for,
from, and until. In other words, if the content from this page is to be
shown on the Person:Barack H Obama page, then that content should be
transcluded from the Senator page; its semantic markup need not be, because
software can interpret transcluded material as being a "subject" of, and/or
organic to, the Person page.
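A small Python sketch of this alternative model, assuming property values carry a namespace prefix such as Person:, Place: or Date:. The dictionary layout and the function names are hypothetical; only the page name, type, properties and values come from the text above.

```python
senator_page = {
    "name": "Senator:Barack H Obama",
    "type": "US Senator",
    "properties": {
        "is/was": "Person:Barack H Obama",
        "for": "Place:Illinois",
        "from": "Date:January 3, 2005",
        "until": "Date:November 16, 2008",
    },
}


def split_typed_value(value):
    """Split 'Namespace:Value' at the first colon; namespace is None if absent."""
    namespace, sep, rest = value.partition(":")
    return (namespace, rest) if sep else (None, value)


def typed_properties(page):
    """Map each property to a (namespace, value) pair read from its value name."""
    return {prop: split_typed_value(v) for prop, v in page["properties"].items()}


typed = typed_properties(senator_page)
```

Here the namespace prefix alone tells software that "for" points at a place and "from" at a date, without any extra type statements.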
Lastly I really don't know how developers will cognitively absorb made-up
words like Snak. The need for the term does mystify me somewhat. I do think
everyone seems to "get" namespaces, appreciating the clarity they provide. I
hope concepts like "namespace" can be as prominent at this stage as
Snaks in the Wikidata model. Regards, --Hypergrove (talk) 03:03, 5 April
2012 (UTC)
[1] http://meta.wikimedia.org/w/index.php?title=Talk:Wikidata/Data_model
Hi Denny -
Thanks for your reply and I am relieved. The design seems in the process of
walking towards looking quite a lot like ISO Topic Maps, I must say, because
it designates no wall of separation between classes and topics. Today that
wall exists in SMW in the dichotomy of Category vs all-other-namespaces,
with the problems I've outlined. I'm reading into your document that there
will be no wall - that the topics describing classes surely will exist in
the same 'namespace' as the topics purported to be instances of these
classes.
Is this correct? If so, then there's less difference between ISO Topic Maps
and your design than what I had originally thought. If indeed the direction
of the project (as I detect on this email list) is to associate pages with
classification schemes such as LCSH or many others, then we're talking about
even more an ISO Topic Map orientation. Which brings me back to the many
benefits of a *brutally honest* adoption of the ISO Topic Map technology.
Extend & refine it for sure, but imho ISO Topic Map technology is an
excellent fit with wiki implementations. It seems to be what you're
incidentally doing anyway.
cheers - john