From: Nadja Kutz <nadja(a)daytar.de>
Date: 14 June 2012 11:21:28 CEST
To: "Discussion list for the Wikidata project." <wikidata-l(a)lists.wikimedia.org>
Subject: Re: Re: [[meta:wikitopics]] updated
Hello Evan Sherwin
We just had a discussion in this thread (see http://article.gmane.org/gmane.org.wikimedia.wikidata/618)
about companies who might be interested in a standardized way to represent meta-information.
In fact, for some time I have been trying to make people think about the importance of a mathematically sound structuring and presentation of RDF standards for science and engineering applications, especially with respect to environmental issues; please see the article:
http://www.azimuthproject.org/azimuth/show/Examples+of+semantic+web+applica…
It might be interesting for you to hear that so far the math and physics community has not reacted much to that -
possibly because it is rather tedious, unrewarding work
to implement such standards? But in general there is globally not so much support, especially for mathematics, so
it may just be that this is right now a bit too much of a burden for the math community, on top of the normal math tasks.
And as you can see from our discussion in the thread, the ISO seems to be equally hesitant.
The above article is currently only located at the Azimuth project www.azimuthproject.org, which is a circle of mostly scientists
who try to openly gather and structure environmental data.
In the article there is also a description of a student project in which visualization tools for RDF data were created.
I had hoped that the students would put their software on SourceForge or similar (and that I could announce this in the
article about the semantic web applications), but they seem to be very busy these days -
possibly they are even now in internships at rather well-known giant Californian software companies :)
So far the promotion of the article as a publication on the Azimuth project has not been very big. This may be because Azimuth's public outreach program may not yet be overly advanced :). So my current plan is to put the article on arxiv.org in the hope that some more mathematicians/scientists may get interested in these kinds of tasks.
1. As mentioned several times, a standard for us to be considered must be free. Free as in "Everyone can get it without having to pay or register for it. I can give it to anyone legally without any restrictions." Free of patents. Free as in W3C.
2. I have taken another look at your page, and after starting to read it you simply lose me. You use so many terms without defining them. To give just a few examples:
* "The NIF ontology is incorporated into the ontology for Wikitopics which shapes API designs." I do not know what the Wikitopics ontology is. The section beneath just lists a few keywords, but does not really explain it. I do not know what it means for ontologies to incorporate one another. I do not know what it means for an ontology to shape API designs.
* "Wikipage naming conventions are used to name subobjects in an equally meaningful manner". Equally meaningful? To what? What does this even mean? You completely lost me here.
* For the key wikipage transclusions, you do not explain what a "formatted topic presentation" is, a "formatted topic index", or a "formatted infobox". I think I understand the latter, but not the previous two. What are they? And if I indeed understand it right, are you saying that infoboxes have to be completely formatted in Wikidata, as Gregor has asked?
Hello Denny,
1. There are likely several ways to accommodate your process requirements. And btw, I asked last month but received no response to my request for a citation to relevant MWF policy on this issue, to determine whether your statement reflects the team's ELECTIVE policy or an MWF policy. Where's the benefit of imposing expenses magnitudes greater on everyone, to design, develop & socialize solutions already known? And please mention how the wikidata community can be assured that the wikidata team's designs themselves don't infringe someone else's patent or copyright, a reassurance that would directly follow from MWF's purchase of rights to use an ISO standard.
2a. Surely you appreciate that Wikidata involves
fielding ''some'' ontology, at least as suggested by your intention to
include the (SMW) Property namespace. I don't know when you plan to
publish wikidata's ontology, but certainly it must be done so overtly
and soon, agile or not. I agree the ontology I proposed needs much
fleshing out, but chief goals of the proposed ontology are pretty clear
-- to provide a wiki-topic index, to support NIF tools directly, to
capture provenance data, to reuse existing SMW tools and key
international standards, and to establish various best-practices for the
wider community.
2b. An ontology that 'shapes/controls API interfaces'
means that the APIs' information model must align with the information
model represented by the ontology. If the ontology includes an
expiration-date as a required property, for instance, then the API needs
to include an expiration-date as a required parameter in some fashion.
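To make the point in 2b concrete, here is a minimal sketch, with entirely hypothetical class and property names, of how a required-property constraint in an ontology could be enforced at the API layer so the two information models stay aligned:

```python
# Hypothetical ontology fragment: "Document" is an illustrative class,
# and its property specs mark which properties are required.
ONTOLOGY = {
    "Document": {
        "expiration-date": {"required": True},
        "author": {"required": False},
    }
}

def validate_api_params(class_name, params):
    """Reject an API call that omits a parameter the ontology marks required."""
    missing = [prop for prop, spec in ONTOLOGY[class_name].items()
               if spec["required"] and prop not in params]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    return True

# A call supplying the required expiration-date passes the check.
validate_api_params("Document", {"expiration-date": "2013-01-01"})
```

The design point is only that the API's parameter rules are derived from the ontology rather than maintained separately, so a change to the ontology propagates to the interface.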
2c. One ontology incorporating another is perhaps a clumsy way to describe the process of associating a class or property defined in one ontology with one in a different ontology, either through a subclass/subproperty relation or a documented or implemented transform.
2d. "Equally meaningful" as the wiki-page naming conventions are; eg interwiki:lang:ns:pgnm is quite meaningful. I am proposing SMW subobjects be named similarly: scope:lang:type:name is the proposed structure for SMW subobject names.
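A quick sketch of what consuming that convention might look like (the example name and field values are hypothetical; only the scope:lang:type:name shape comes from the proposal above):

```python
def parse_subobject_name(name):
    """Split a subobject name of the proposed form scope:lang:type:name
    into its four components; the name part may itself contain colons,
    so we split at most three times."""
    scope, lang, type_, local = name.split(":", 3)
    return {"scope": scope, "lang": lang, "type": type_, "name": local}

# Hypothetical example name following the proposed convention.
parsed = parse_subobject_name("wikidata:en:person:Thomas Jefferson")
```

Like the interwiki:lang:ns:pgnm convention for pages, the name itself then carries enough structure that tools can route or group subobjects without a separate lookup.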
2e. A 'formatted topic presentation' is the content displayed on a page for a topic. Wikidata
will have a page called (Main:)Thomas Jefferson that displays a
formatted topic presentation, showing information harvested from other
wikis plus any information developed by the wikidata community itself.
Using transclusion, anyone can embed (Main:)Thomas Jefferson into their
wiki. A 'formatted topic index' (which certainly can be one part of a
topic's formatted presentation) is a snippet that corresponds to the
"Thomas Jefferson" heading in a subject index under which are many
subtopics eg
Jefferson, Thomas [1] [2] [3]
-- Early years [4] [5] [6]
-- Birth [7] [8]
-- Formative influences [9] [10]
-- etc
2f. Perhaps you missed my immediate reply [1] to Gregor. Yes, all infoboxes
(among other non/formatted artifacts) are '''transcluded''' from
wikidata, without the nonsense of cross-wiki API calls for individual
data-items, as I understand the wikidata team is now gearing to provide.
Best regards - jmc
[1] http://lists.wikimedia.org/pipermail/wikidata-l/2012-May/000588.html , for instance
John McClure wrote:
"Of course, thanks for the pointer. Yes, I'd agree that 19788's ontology be closely reviewed for inclusion. 19788:2 standardizes the Dublin Core properties, the same I recommend for [[wikidata]] provenance data, the same slated for the [[wikidata]] ontology. But more to your point is that the entire ISO corpus would fit really well if it were viewed as a topic map whose topics and sub-topics can be referenced from [[wikidata]] artifacts such as property definitions."
Hello John
Frankly speaking I don't see why one would want to use topic maps.
That is, RDF triples become, after an identification (canonically: elements with the same URI are identified),
a labeled graph, here to be called "the" RDF graph. (I know that some people call the triples themselves
"the" RDF graph, but why use a second word (namely graph) for triples?
Triples on their own are a very trivial, highly disconnected graph.)
If I want to connect certain nodes of that graph to a topic,
I only need to supply these nodes with an extra triple which says
"this node belongs to this topic", i.e. something like (node, belongsTo, thisTopic), or modify
the canonical identification map, and the RDF graph will be a "topic map". Or one has the case that the triples are
already laid out in "topics"; that is, for example,
if I have a set of triples with the same resource URI, then upon canonical identification these form
a kind of "topic map" (with all "legs" pointing in one direction). Or am I missing something crucial?
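The point about attaching nodes to a topic with one extra triple per node can be sketched with plain tuples (URIs shortened to bare names for readability; the predicate belongsTo and the example data are illustrative, not from any real vocabulary):

```python
# Triples as (subject, predicate, object) tuples; nodes with the same
# name are already "identified" because equal strings compare equal.
triples = {
    ("Jefferson", "bornIn", "Shadwell"),
    ("Jefferson", "authorOf", "Declaration"),
    ("Shadwell", "locatedIn", "Virginia"),
}

# Connect chosen nodes to a topic by adding one extra triple per node.
topic = "EarlyYears"
for node in ("Jefferson", "Shadwell"):
    triples.add((node, "belongsTo", topic))

# Reading the graph back out "by topic" is then just a filter over triples.
members = {s for (s, p, o) in triples if p == "belongsTo" and o == topic}
```

The "topic map" view is thus recoverable from the flat triple set; no separate topic data structure is needed.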
However, if you start with topics, you have no canonical information about the "internal structure" of
a topic, and in some cases you would need to artificially impose this in retrospect onto
the data structure.
For example, if your topic is "members of a society", you have all the members, and if you need some internal structure like a hierarchy,
then you would need to supply each member with a hierarchy classification
(i.e. with extra data, which is usually different for each member). In the RDF case the person who gave you
the triples could have made a choice of order which could be given upon canonical identification. I.e. in principle the
internal structure depends on your identification map, but there is a canonical one.
You can of course mimic an RDF triple with a topic map by choosing the topic to be the resource,
one "leg" of your topic as a property, and the topic connected with this "leg" as the object, but the
choice of a leg is not canonical if there is more than one leg. Only if you made all "legs" of a topic map into triples would you have something like a canonical assignment. I find these differences important. But maybe I have overlooked or misunderstood something
about topic maps (I read what I found about this issue scattered around the internet, so this is not so improbable).
I had this kind of discussion with people from deepa mehta http://www.deepamehta.de/
because they used topic maps, but so far nobody there could convince me of the distinct advantages of topic maps.
But the discussion has so far been rather brief.
The discussion arose because we discussed to what extent it would be possible to merge a student project we
had at HTW Berlin (a collaboration platform for visualizing RDF data called Mimirix
http://www.daytar.de/art/MIMIRIX/) with deepa mehta; for example
one could use at least the backend, which already has a design for access control
(the deepa mehta people told me that they haven't yet really attacked the issue
of access control), or one could use at least the carefully designed client.
Maybe you have other arguments for topic maps; as said, I might have missed something.
I understand that there are other issues like the speed of addressability or direct-access issues,
but I find these are then rather an issue of the serialization.
So I didn't understand why, for example, the pregiven JSON structure of a JSONArray
http://www.json.org/javadoc/org/json/JSONArray.html
is not used in JSON-LD
http://json-ld.org/spec/latest/json-ld-syntax/#sets-and-lists
but that's another topic.
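For context on the linked JSON-LD section: JSON-LD does accept plain JSON arrays, but it treats them as unordered sets of values by default; preserving the array's order requires wrapping it in the explicit @list keyword, which may be the design choice the question is about. A minimal sketch (the term "ex:member" and the names are hypothetical):

```python
# Default JSON-LD reading: a plain JSON array is an unordered set,
# so the order Alice/Bob/Carol carries no meaning.
unordered = {"ex:member": ["Alice", "Bob", "Carol"]}

# Order-preserving JSON-LD: the array must be wrapped in an @list container.
ordered = {"ex:member": {"@list": ["Alice", "Bob", "Carol"]}}
```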
In the context of applications of ISO metadata you may want to read:
http://www.azimuthproject.org/azimuth/show/Examples+of+semantic+web+applica…
>> * The feedback-dialogue behaves strangely when clicking on "What is this?" (the width of the dialogue changes)
> Ok. To be honest I am not sure if we'll fix this, but we'll see. It's noted.
Actually, this is a Moodbar bug. I have filed a bug for this at https://bugzilla.wikimedia.org/show_bug.cgi?id=37624 .
Can't we just install the SpamBlacklist extension and add links to the blacklist - MediaWiki:Spam-blacklist http://is.wikipedia.org/wiki/Melding:Spam-blacklist ?
I think the best way to combat spam is to identify the unwanted behaviour and then block it. There is no need to limit editing on the whole wiki, as that would also limit the edits of users who have not done anything wrong.
John McClure wrote: "http://en.wikipedia.org/wiki/Topic_Maps gives links to iso/iec 13250"
???
There seems to be a misunderstanding. I rather had the meta descriptions of general ISO items in mind, not the description of a topic map per se. (And apart from that, I thought wikidata wanted to use RDF?)
The ISO site is rather cryptic and most of it is not accessible (see e.g. http://isotc.iso.org/livelink/livelink/open/jtc1sc36); however, I understood sentences (see http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnu…) like:
"ISO/IEC 19788-1:2011 provides principles, rules and structures for the specification of the description of a learning resource; it identifies and specifies the attributes of a data element as well as the rules governing their use. The key principles stated in ISO/IEC 19788-1:2011 are informed by a user requirements-driven context with the aim of supporting multilingual and cultural adaptability requirements from a global perspective.
ISO/IEC 19788-1:2011 is information-technology-neutral and defines a set of common approaches, i.e. methodologies and constructs, which apply to the development of the subsequent parts of ISO/IEC 19788."
as meaning that the ISO is in the process of turning parts of their database into a machine-readable standard format. So I assumed that
the "identification of a data element" for learning resources could be sort of planned to be extended to all of their standards (or already covers them?), which range from screw threads
http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnu…
over mathematical symbols
http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnu…
to copper alloys
http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_tc_browse.htm?c…
If that were the case then companies etc. could link and conform to standards (here for example an unlinked reference
to a DIN standard for
http://de.wikipedia.org/wiki/Wärmeleitzahl in a product for insulation: http://isofloc.de/index.php?technische-daten).
That is, companies especially could be interested in promoting parts of their technical data in an ISO-standardized format (which makes the comparison of technical data of products easier),
so that for example crawlers could collect products which meet certain technical specifications. Organizations could
more easily link to companies which conform e.g. to social standards etc.
So when I wrote that Wikidata could possibly base their data on the ISO standards, I meant that it
would make sense to have a structural correspondence
between ISO standards and definitions (or e.g. standards from DIN http://en.wikipedia.org/wiki/Deutsches_Institut_f%C3%BCr_Normung or other similar organizations) and the wikidata ontology, because wikidata would anyway have an entry for the corresponding standards for materials etc.
Friedrich Roehrs wrote:"1c. You're arguing over CHF 200 -- which extraordinarily-cheaply and
fundamentally PROTECTS the MWF from copyright infringement suits? Can
the SNAK architecture provide that reassurance to the MWF community?"
I don't know what you mean by that. As you can see in the ISO links:
http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnu…
each item alone costs something in that range.
Hey guys,
I think for a long time semantic projects have focused on
getting data out but haven't incorporated the 'voice of the consumer';
yet, if you think of data quality as 'suitable for customer
requirements' instead of 'process conforms to specification,' this is
the first step.
I'm looking forward to getting an early look at the data you
publish, and otherwise advocate for the point of view of people who want
to use Wikidata data.
I'd also like to advocate for "documentation driven
development" here, because my experience with
http://code.google.com/p/basekb-tools/
is that it helps a lot to
(1) document the behavior of the system by providing specific
query examples,
(2) construct test sets for everything in the documentation, and
(3) if you're not 100% proud of the story told in your
documentation, feed this back into the product.
I think a little attention towards producing "readerly" output
in conjunction with a "writerly" interface for Wikipedians could be key
to make Wikidata one of the foundations of the Linked Data Galaxy.
Hello John,
thanks for digging this out.
I see in the brochure that there is not only a postbox, but they even have an office where one could meet the ISO:
1, ch. de la Voie-Creuse
It would be interesting to hear what Wikimedia's opinion on this is.
Nadja