At 11:55 09/04/2012, Soslan Khubulov wrote:
>The best thing would be to create new engine specially for
>structured data. It would be also better for Wiktionary.
>Just remember what Mediawiki was created for: storing marked up text
>pages. Mediawiki is good for an encyclopedia but not for Wiktionary
>and Wikidata purposes.
Soslan,
you are most probably correct. However, I feel that every distinct
need discussed for Wikidata may lead to this same conclusion, each
time with different requirements. This is why I suggest decoupling
the storage architecture (there might be several) from the project
and making its interchange protocol central. In doing so, I suggest
taking a typical NoSQL storage system as the reference, since it is
by nature the most complex context: it can be format independent.
Such a protocol is more complex than plain JSON use. It should
support concepts such as structure characteristics, confidence
levels, IP protection, plagiarism filtering, authority authentication,
encryption, langtags, mandatory information, locale files, time,
embargoes, acknowledgments, etc. We may also want to specify
datawiki agents (DWA) for the capture of the data; some of them could
be automated processes (e.g. weather observations, scientific
experiment reporting, stock exchanges, etc.).
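To make the idea of "more than plain JSON" concrete, here is a minimal sketch of an interchange envelope that wraps a payload with a few of the concepts listed above (langtag, confidence level, embargo, acknowledgments, capturing agent). Every name in it (Envelope, source_agent, embargo_until, encode, decode) is a hypothetical choice made for illustration, not part of any Wikidata or datawiki specification.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any, List, Optional


@dataclass
class Envelope:
    """Hypothetical interchange envelope; all field names are assumptions."""
    payload: Any                          # the data itself, any JSON-compatible value
    source_agent: str                     # the datawiki agent (DWA) that captured it
    langtag: str = "en"                   # language tag of the textual content
    confidence: float = 1.0               # 0.0 (unverified) to 1.0 (authoritative)
    embargo_until: Optional[str] = None   # ISO 8601 date (YYYY-MM-DD), or None
    acknowledgments: List[str] = field(default_factory=list)

    def is_publishable(self, today: str) -> bool:
        # ISO 8601 dates in YYYY-MM-DD form compare correctly as strings.
        return self.embargo_until is None or today >= self.embargo_until


def encode(envelope: Envelope) -> str:
    """Serialize the envelope to JSON text for exchange between datawikis."""
    return json.dumps(asdict(envelope), sort_keys=True)


def decode(text: str) -> Envelope:
    """Rebuild an envelope from JSON text produced by encode()."""
    return Envelope(**json.loads(text))


if __name__ == "__main__":
    env = Envelope(
        payload={"station": "demo", "temp_c": 4.5},
        source_agent="dwa:weather-demo",      # hypothetical agent identifier
        embargo_until="2012-05-01",
    )
    print(encode(env))
    print(env.is_publishable("2012-04-09"))
```

The payload stays opaque to the envelope, which is what would let each datawiki keep its own storage format while still agreeing on the metadata around it.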
jfc
At 09:20 09/04/2012, Gerard Meijssen wrote:
>Hoi,
>First things first ... that is getting Wikidata to work for its
>initial purposes. Automated updates from elsewhere are nice but
>introduce a complete new set of issues including reliability.
I agree: first things first. It seems that in a network-centric world
the holistic aspects should come first. New projects necessarily arise
in a context they depend on and are networked with. They have to
be in osmosis with that context and its possible futures, and
therefore be designed for it. Foreign batch updates (which do not
necessarily have to be automated) are part of their environment, as
are users' individual updates. The interest of a networked datawikis
approach is that requirements can be distributed, and therefore the
Wikidata specifications can be simpler, as long as they are supported
by a common generic basis and an interchange protocol. The Dublin
Core results from OCLC networking in the late 70s. The W3C did not
start by thinking of semantic registries but of a semantic web.
IRIs are universal. JSON is open and universal. Denny does not even
understand what my own project basics mean, however we can easily meet
on a JSON based protocol. Why, for example, enter geographic
coordinates or linguistic tables manually?
For example, I look in vain for a single table giving me the name,
value, characteristics, and 32x32 bit graphic of every ISO 10646
code point. If someone makes it, it should result in an easy batch
transfer, supported (both ways) by an authoritative decision, not in
millions of error-prone manual entries. So far I have not seen any
discussion of the status of the huge number of new entries that have
not been validated yet. If I enter that ice melts at 5°C, will that
be immediately disseminated, or will it be held in stand-by somewhere
until approved?
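As an illustration of the batch table asked for above, here is a minimal sketch that uses Python's standard unicodedata module to emit one row per named code point. It is only an approximation of the request: it covers the name, the value and one characteristic (the general category), it relies on whatever Unicode version ships with the Python interpreter, and it does not provide the 32x32 graphic. The function names and the row layout are assumptions made for the example.

```python
import json
import unicodedata


def code_point_rows(start, end):
    """Yield one dict per named code point in the inclusive range start..end."""
    for cp in range(start, end + 1):
        ch = chr(cp)
        name = unicodedata.name(ch, None)   # None for unnamed/unassigned points
        if name is None:
            continue
        yield {
            "code_point": f"U+{cp:04X}",
            "name": name,
            "category": unicodedata.category(ch),
        }


def to_json_lines(rows):
    """Serialize rows as one JSON object per line, ready for a batch transfer."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)


if __name__ == "__main__":
    print(to_json_lines(code_point_rows(0x41, 0x43)))
```

A batch produced this way could be transferred and approved as a whole instead of being typed in entry by entry.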
jfc
>Thanks,
>   Gerard
>
>On 9 April 2012 03:25, JFC Morfin
><jefsey(a)jefsey.com> wrote:
>Is there an objection to the concept of, or cooperation with,
>"datawiki" Wikidata compatible projects? I would define a "datawiki"
>(as there are databases) as a JSON oriented NoSQL DBMS using an
>enhanced wiki as a human user I/O interface. This would permit
>BigData, specialized data, and graph sources to feed Wikidata along
>their own data philosophy and collection/update policy. I suppose
>that the main point would be an inter-datawiki interchange protocol
>(RFC?) matching the datawiki authoritative operators' (the first of
>them being Wikidata) requirements. I would permit projects at
>different stages of R&D or with different main purposes in order to
>cooperate with Wikidata.
>jfc
>
>
>
>
>_______________________________________________
>Wikidata-l mailing list
>Wikidata-l(a)lists.wikimedia.org
>https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Hoi,
First things first ... that is getting Wikidata to work for its initial
purposes. Automated updates from elsewhere are nice but introduce a
complete new set of issues including reliability.
Thanks,
Gerard
On 9 April 2012 03:25, JFC Morfin <jefsey(a)jefsey.com> wrote:
> Is there an objection to the concept of, or cooperation with, "datawiki"
> Wikidata compatible projects? I would define a "datawiki" (as there are
> databases) as a JSON oriented NoSQL DBMS using an enhanced wiki as a human
> user I/O interface. This would permit BigData, specialized data, and graph
> sources to feed Wikidata along their own data philosophy and
> collection/update policy. I suppose that the main point would be an
> inter-datawiki interchange protocol (RFC?) matching the datawiki
> authoritative operators' (the first of them being Wikidata) requirements. I
> would permit projects at different stages of R&D or with different main
> purposes in order to cooperate with Wikidata.
> jfc
Hi everybody!
As the guy who has the honor of shortly receiving some funding from
Wikimedia Germany for handling spatial open government data [0], I
would like to make some remarks on the current geo definitions in the
Wikidata model:
1. Spatial Reference System Identifier (SRID [1]) definition is missing
Every GeoCoordinatesValue field should either have a corresponding
SRID field that defines the used spatial reference system (SRS [2]) or
mandate the use of a single SRS like WGS84 [3] which is currently the
standard used by GPS, OpenStreetMap and Wikipedia.
2. Geographic shapes should be defined in either Well-known text (WKT
[4]) or GeoJSON [5]
WKT is the de facto standard for storing spatial data in a relational
database, and GeoJSON is the de facto standard for accessing geo data
via the web. Both formats can easily be transformed into each other, so
which one you choose pretty much depends on your preference for a SQL
or a NoSQL database.
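To illustrate how closely the two formats correspond, here is a minimal sketch of a GeoJSON-to-WKT conversion for three geometry types, written in plain Python without any GIS library. The function name geojson_to_wkt is hypothetical, and the sketch deliberately ignores the remaining geometry types and any CRS information.

```python
def _position(coords):
    """Format one coordinate tuple as WKT: values separated by spaces."""
    return " ".join(str(value) for value in coords)


def _ring(positions):
    """Format a list of positions as a comma-separated WKT sequence."""
    return ", ".join(_position(p) for p in positions)


def geojson_to_wkt(geometry):
    """Convert a GeoJSON geometry dict (Point, LineString, Polygon) to WKT."""
    kind = geometry["type"]
    coords = geometry["coordinates"]
    if kind == "Point":
        return "POINT (" + _position(coords) + ")"
    if kind == "LineString":
        return "LINESTRING (" + _ring(coords) + ")"
    if kind == "Polygon":
        rings = ", ".join("(" + _ring(ring) + ")" for ring in coords)
        return "POLYGON (" + rings + ")"
    raise ValueError("unsupported geometry type: " + kind)


if __name__ == "__main__":
    print(geojson_to_wkt({"type": "Point", "coordinates": [13.4, 52.5]}))
    print(geojson_to_wkt({
        "type": "Polygon",
        "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]],
    }))
```

Both formats list coordinates in x y order, so the conversion is purely a change of syntax.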
So in summary I would propose the following data model for spatial data:
Geographic locations
Datatype IRI: http://wikidata.org/vocabulary/datatype_geocoords
Value: GeoCoordinatesValue
Mandatory spatial reference system: EPSG 4326 (WGS 84/GPS)
Type: Decimal
Geographic objects
Datatype IRI: http://wikidata.org/vocabulary/datatype_geoobjects
Value: GeoObjectsValue
Type: GeoJSON [5]
Geographic objects SRID
Datatype IRI: http://wikidata.org/vocabulary/datatype_geoobjects_srid
Value: GeoObjectsSridValue
Type: EPSG Spatial Reference System Identifier (SRID [1])
That model would allow a structure where every spatial object can have
a complex geometry stored in its original geodetic system and still
have an easily manageable location in GPS format.
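The proposed model can be sketched as a small record type: a mandatory WGS 84 location, plus an optional complex geometry that carries its own SRID. The class name SpatialItem, its field names and the validation rules are assumptions made for illustration; only the three-part structure comes from the proposal above.

```python
from dataclasses import dataclass
from typing import Optional

WGS84_SRID = 4326  # EPSG code for WGS 84, the mandatory system for locations


@dataclass
class SpatialItem:
    """Hypothetical record following the three proposed datatypes."""
    latitude: float                          # GeoCoordinatesValue, decimal degrees
    longitude: float                         # GeoCoordinatesValue, decimal degrees
    geo_object: Optional[dict] = None        # GeoObjectsValue, a GeoJSON geometry
    geo_object_srid: Optional[int] = None    # GeoObjectsSridValue, an EPSG code

    def __post_init__(self):
        if not -90 <= self.latitude <= 90:
            raise ValueError("latitude must be within -90..90 degrees")
        if not -180 <= self.longitude <= 180:
            raise ValueError("longitude must be within -180..180 degrees")
        # A geometry without its reference system cannot be interpreted.
        if self.geo_object is not None and self.geo_object_srid is None:
            raise ValueError("a geographic object needs an SRID")


if __name__ == "__main__":
    item = SpatialItem(
        latitude=52.5,
        longitude=13.4,
        # Geometry kept in its original projected system (ETRS89 / UTM zone 32N);
        # the coordinate values are made up for the example.
        geo_object={"type": "Point", "coordinates": [798000.0, 5828000.0]},
        geo_object_srid=25832,
    )
    print(item)
```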
cu andreas
[0] http://de.wikipedia.org/wiki/Wikipedia:Community-Projektbudget#2._kartenwer…
[1] https://en.wikipedia.org/wiki/Spatial_reference_system_identifier
[2] https://en.wikipedia.org/wiki/Spatial_reference_system
[3] https://en.wikipedia.org/wiki/WGS84
[4] https://en.wikipedia.org/wiki/Well-known_text
[5] https://en.wikipedia.org/wiki/GeoJSON
Call for Presentations is open, closes May 11:
https://thestrangeloop.com/sessions-page/call-for-presentations
This is one of the four big conferences I'm pushing to get Wikimedia
developers to speak at this year, because there are great developers
there to inform and recruit. Talks should be technical enough that they
include code. You can do a 50-minute or 20-minute proposal.
They specifically ask for Semantic Web talks, so I think a Wikidata
presentation would have a good chance of being accepted.
Strange Loop is September 23-25 in St. Louis, Missouri, USA. (I
understand if the Wikidata team thinks this is too much of a distraction
from their work and declines to submit a talk.)
--
Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation
I ran into this visualization today:
http://scimaps.org/maps/map/design_vs_emergence__127/
To be honest, I am not sure I fully understand the details of the chart, though. Maybe somebody who knows UDC and the WP categories better can make more sense of it
(you can click on the image and then you can zoom in)
Ivan
----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
FOAF: http://www.ivan-herman.net/foaf.rdf
Hi All,
I just wanted to make sure folks had seen this paper by Eytan Adar, Michael Skinner, and Daniel Weld (big names in AI). It's precisely about aligning infoboxes and propagating information across multiple language editions, so it might help guide the process when we get to that stage (and even earlier):
http://www.cond.org/paper_202.pdf
- Brent
Brent Hecht
Ph.D. Candidate in Computer Science
CollabLab: The Collaborative Technology Laboratory
Northwestern University
w: http://www.brenthecht.com
e: brent(a)u.northwestern.edu
Bináris,
then you should coin a crystal clear definition of snak (could it be
made an acronym?) that everyone can memorize and understand. It also
sounds like snap and snag. If you find a pun, it would help it get accepted.
jfc
At 10:39 06/04/2012, Bináris wrote:
>2012/4/5 Gregor Hagedorn
><g.m.hagedorn(a)gmail.com>
>
>I still feel uneasy about the hard-to-remember-neonym.
>
>It was strange to me and had to read after it. You may remember as
>bit-->byte-->snack, growing pieces of food.
>
>
>I cannot prove
>it, but believe the term snak will have to be learned by anyone who
>interacts with the system through the API, any form of import
>mechanism, etc.
>
>Well, and what's then? They will learn. Once I thought namespaces to
>be a rather programming word and concept, but then I became a
>Wikipedian and understood they were a basic concept of editing.
>Every Wikipedian must know the difference between article and user
>and project namespace and they are not afraid of the word even if
>they have no real knowledge about namespaces in programming. People
>must understand concepts and ideas, and for the majority of
>non-English, non-programmer people it will be quite the same
>whatever name the new concept has. More, a sna(c)k fits better to
>everyday concepts of an average person than an assertion, doesn't it?
>
>
>--
>Bináris
At 12:45 05/04/2012, Denny Vrandečić wrote:
>In short, we have for Wikidata two pragmatic goals:
>* Wikidata's first aim is to support the Wikipedias with their language links
>* Wikidata's second aim is to support the Wikipedias with the infoboxes
>
>Out of the support for these tasks, other interesting use cases
>might and are expected to arise.
>
>Until I manage to understand how your comments relate to one of
>these goals, I will personally take the liberty to ignore your comments.
Fair enough :-)
Your assessments are correct. As I first documented it, our
(iucg(a)ietf.org) target in this area is the Internet+ (smart fringe to
fringe Internet) MDRS (metadata registry multilinguistic distributed
referential system). The MDRS is to the Internet+ and to the Semiotic
Internet (Intersem) that we explore, what the IANA is to the legacy
Internet, and what Wikidata might be to Wikimedia.
Our "use case" is the Internet+ distributed operations (I documented
the IETF Drafts references). The MDRS will most probably be a
datawiki and/or a DDDS (the DNS is a DDDS) of some sort. Today's IANA
and wikis are fed and read by humans; datawikis will be more and more
fed and read by intelligent processes. This intelligence leads to
additional opportunities and constraints.
Our targets are the same; however, you have to have conceptual limits,
while by nature I must have none. This is why I tried to probe our
possible areas of common interest. Your two confirmed documents now
give us your current limits (the more people understand what the
"revolution" (as per Wikimedia) of datawikis is going to be, the more
they may expect from them).
My own target is to internally review these documents, assess their
possible evolution, strive to stay interoperable, and permit users
and applications to take better advantage of your project
(wikidata.iucg.org). We will alert you if our work and tests lead us
to fear possible architectural conflicts. This seems to be
in line with what Lydia responded today.
Best
jfc
It's more accurate to say that your belief is an artifact of present tools.
RDF has just one way to associate a Class with an object, the rdf:type
attribute. Specifically because RDF makes no distinction between classes
that represent a type-of-thing (eg a Character) and classes that represent a
facet-of-thing (eg Fictional), present tools require multiple classes to be
able to be associated with any resource. Obviously a given resource can have
multiple facets. In my work I store facet-classes in the Dublin Core
Coverage and Format properties and I store a single existential-class in the
Dublin Core Type property for the page; the page's template restates both
kinds of classes as Categories for the page (hence my piqued email to at
least define existential classes in a separate namespace from category).
So if no distinction is made, then multiple "types" are indeed necessary. If
a distinction between nouns and adjectives is made, then one type + multiple
facets is necessary.
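The distinction can be sketched in a few lines: a resource carries exactly one existential type (the noun) and any number of facets (the adjectives), and the undifferentiated rdf:type view is obtained by flattening the two. The class and attribute names below are hypothetical and only illustrate the argument; they are not taken from RDF, Dublin Core or any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical model: one noun-like type plus adjective-like facets."""
    name: str
    existential_type: str             # type-of-thing, e.g. "Character"
    facets: frozenset = frozenset()   # facets-of-thing, e.g. {"Fictional"}

    def rdf_types(self):
        """Flatten to the single undifferentiated set that rdf:type would hold."""
        return {self.existential_type} | set(self.facets)


if __name__ == "__main__":
    gollum = Resource("Gollum", "Character", frozenset({"Fictional"}))
    print(gollum.existential_type)      # the one type
    print(sorted(gollum.rdf_types()))   # the multiple "types" seen without the distinction
```

Going from the structured form to the flat set is trivial; going back is not possible without knowing which classes are nouns and which are adjectives, which is the information the flat model loses.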
-----Original Message-----
From: John McClure [mailto:jmcclure@hypergrove.com]
Sent: Thursday, April 05, 2012 7:08 PM
To: Wikidata (E-mail)
Subject: [Wikidata-l] Namespace-based model
Denny said:
I think the assumption everything has exactly one type is oversimplifying
The assumption that everything is of multiple types is over-complicating.
Usually you can tell from the first sentence in the Wikipedia page.
"Tuesday is a day of the week"
"Love is an emotion"
"(Roman) Catholicism is a faith"
"Gollum is a fictional character"
"HAL-9000 is a character"
"Noah is a Patriarch"
"Enos was the first chimpanzee"
So consensus certainly is being achieved among thousands of authors about
the fundamental type of thing each of these pages represents. Disambiguation
pages very commonly reference these types of things as in "Enos
(chimpanzee)".
Let's take Gollum. I can imagine a topic map has these subjects:
1. Character
1A. Fictional character
1A1. Fictional person
1A2. Fictional animal
1A3. Fictional ghost
1A4. Fictional god
Another equally valid assertion is that Gollum is a Character that is
typed as a Fictional and Human thing (both of these adjectives being instances
of owl:Class) -- so that a comprehensive system sometime in the future would
reinterpret Gollum as actually being a Fictional person.
As you say yourself, it's not useful to create a "perfect" system to
handle every imaginable edge case **to the extent that they exist**.
Personally I don't believe such edge cases can be found - I challenge anyone
to provide me such an example.
But more to the point of Wikidata. I don't believe for a second that WP
will be reorganized into thousands of namespaces. Rather, I believe first,
SUBOBJECT names must include the idea of 'namespace' for the efficiencies
gained, and second, WP pages should be associated with the same set of nouns
(noun-phrases) available for subobject names. IOW, it's an implementation
issue whether a wiki's pages are named using these namespaces, so that the
wiki as a whole can gain the same inherent efficiencies I've sketched for
subobjects.
Best - john