Wikidata April 2012

wikidata@lists.wikimedia.org

73 participants
111 discussions

Re: [Wikidata-l] Engine of Wikidata
by JFC Morfin 09 Apr '12


At 11:55 09/04/2012, Soslan Khubulov wrote:
>The best thing would be to create new engine specially for
>structured data. It would be also better for Wiktionary.
>Just remember what was MediaWiki created for. Storing marked up text
>pages. MediaWiki is good for encyclopedia but not for Wiktionary
>and Wikidata purposes.

Soslan,

you are most probably correct. However, I feel that every different need discussed for Wikidata may lead to this conclusion, each time with different requirements. This is why I suggest decoupling the storage architecture (there might be several) from the project and making its interchange protocol central. In doing so, I suggest taking a typical NoSQL storage system as the reference, since it is in essence the most complex context: it can be format independent.

Such a protocol is more complex than a simple use of JSON. It should support concepts such as structure characteristics, confidence levels, IP protection, plagiarism filtering, authority authentication, encryption, langtags, mandatory information, locale files, time, embargoes, acknowledgments, etc. We may also want to specify datawiki agents (DWA) for the capture of the data; some of them could be automated processes (e.g. weather observations, scientific experiment reporting, stock exchanges, etc.).

jfc
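To make the idea of a protocol that is "more complex than a simple use of JSON" concrete, here is a minimal sketch in Python. It wraps a payload in an envelope carrying a few of the concepts listed above (langtag, confidence level, capturing agent, embargo). The field names, the 0 to 1 confidence scale, and the functions `make_envelope` and `is_released` are assumptions made for illustration only; nothing like this was specified on the list.

```python
import json
from datetime import datetime, timezone


def make_envelope(payload, *, langtag, confidence, agent, embargo_until=None):
    """Wrap a JSON-serialisable payload with hypothetical interchange metadata."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return {
        "payload": payload,
        "langtag": langtag,        # language tag of the payload, e.g. "en"
        "confidence": confidence,  # assumed scale: 0.0 (unverified) to 1.0
        "agent": agent,            # which datawiki agent (DWA) captured the data
        "embargo_until": embargo_until.isoformat() if embargo_until else None,
    }


def is_released(envelope, now):
    """Return True if the envelope has no embargo or the embargo has passed."""
    raw = envelope["embargo_until"]
    return raw is None or datetime.fromisoformat(raw) <= now


# Example: an automated weather observation under embargo until 10 April 2012.
envelope = make_envelope(
    {"station": "example-station", "temperature_c": 12.5},
    langtag="en",
    confidence=0.9,
    agent="weather-bot",
    embargo_until=datetime(2012, 4, 10, tzinfo=timezone.utc),
)
wire = json.dumps(envelope, sort_keys=True)  # what would travel between datawikis
```

The storage engine behind each datawiki stays out of the picture here: only the envelope is shared, which is the decoupling the message argues for.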

1 0

Re: [Wikidata-l] datawikis?
by JFC Morfin 09 Apr '12


At 09:20 09/04/2012, Gerard Meijssen wrote:
>Hoi,
>First things first ... that is getting Wikidata to work for its
>initial purposes. Automated updates from elsewhere are nice but
>introduce a complete new set of issues including reliability.

I agree: first things first. It seems that in a network-centric world the holistic aspects should come first. New projects necessarily come in a context they depend on and are networked with. They have to be in osmosis with that context and its possible futures, and therefore be designed for it. Foreign batch updates (which do not necessarily have to be automated) are part of their environment, as are users' individual updates.

The interest of a networked datawikis approach is that requirements can be distributed, so the Wikidata specifications can be simpler, as long as they are supported by a common generic basis and an interchange protocol. The Dublin Core results from OCLC networking in the late 70s. The W3C did not start by thinking of semantic registries but of a semantic web. IRIs are universal. JSON is open and universal. Denny does not even understand what the basics of my own project mean; however, we can easily meet on a JSON-based protocol.

Why, for example, enter geographic coordinates or linguistic tables manually? I look in vain for a single table giving me the name, value, characteristics, and 32x32 bit graphic of every ISO 10646 code point. If someone makes it, it should result in an easy batch transfer, supported (both ways) by an authoritative decision, not in millions of error-prone manual entries.

So far I have not seen any discussion of the status of the huge number of new entries that have not yet been validated. If I enter that ice melts at 5°C, will that be disseminated immediately, or will it be held in stand-by somewhere until approved?

jfc

>Thanks,
>   Gerard
>
>On 9 April 2012 03:25, JFC Morfin <jefsey(a)jefsey.com> wrote:
>Is there an objection to the concept of, or cooperation with,
>"datawiki" Wikidata compatible projects? I would define a "datawiki"
>(as there are databases) as a JSON oriented NoSQL DBMS using an
>enhanced wiki as a human user I/O interface. This would permit
>BigData, specialized data, and graph sources to feed Wikidata along
>their own data philosophy and collection/update policy. I suppose
>that the main point would be an inter-datawiki interchange protocol
>(RFC?) matching the datawiki authoritative operators' (the first of
>them being Wikidata) requirements. I would permit projects at
>different stages of R&D or with different main purposes in order to
>cooperate with Wikidata.
>jfc

1 0

Re: [Wikidata-l] datawikis?
by Gerard Meijssen 09 Apr '12


Hoi,

First things first ... that is getting Wikidata to work for its initial purposes. Automated updates from elsewhere are nice but introduce a complete new set of issues including reliability.

Thanks,
    Gerard

On 9 April 2012 03:25, JFC Morfin <jefsey(a)jefsey.com> wrote:
> Is there an objection to the concept of, or cooperation with, "datawiki"
> Wikidata compatible projects? I would define a "datawiki" (as there are
> databases) as a JSON oriented NoSQL DBMS using an enhanced wiki as a human
> user I/O interface. This would permit BigData, specialized data, and graph
> sources to feed Wikidata along their own data philosophy and
> collection/update policy. I suppose that the main point would be an
> inter-datawiki interchange protocol (RFC?) matching the datawiki
> authoritative operators' (the first of them being Wikidata) requirements. I
> would permit projects at different stages of R&D or with different main
> purposes in order to cooperate with Wikidata.
> jfc

1 0

[Wikidata-l] Spatial data definition
by Andreas Trawoeger 09 Apr '12


Hi everybody!

As the guy who has the honor of shortly receiving some funding from Wikimedia Germany for handling spatial open government data [0], I would like to make some remarks on the current geo definitions in the Wikidata model:

1. Spatial Reference System Identifier (SRID [1]) definition is missing

Every GeoCoordinatesValue field should either have a corresponding SRID field that defines the spatial reference system (SRS [2]) used, or mandate the use of a single SRS like WGS84 [3], which is currently the standard used by GPS, OpenStreetMap and Wikipedia.

2. Geographic shapes should be defined in either Well-known text (WKT [4]) or GeoJSON [5]

WKT is the de facto standard for storing spatial data in a relational database, and GeoJSON is the de facto standard for accessing geo data via the web. Both formats can easily be transformed into each other, so which one you choose pretty much depends on whether you prefer an SQL or a NoSQL database.

So in summary I would propose the following data model for spatial data:

Geographic locations
  Datatype IRI: http://wikidata.org/vocabulary/datatype_geocoords
  Value: GeoCoordinatesValue
  Mandatory spatial reference system: EPSG 4326 (WGS 84/GPS)
  Type: Decimal

Geographic objects
  Datatype IRI: http://wikidata.org/vocabulary/datatype_geoobjects
  Value: GeoObjectsValue
  Type: GeoJSON [5]

Geographic objects SRID
  Datatype IRI: http://wikidata.org/vocabulary/datatype_geoobjects_srid
  Value: GeoObjectsSridValue
  Type: EPSG Spatial Reference System Identifier (SRID [1])

That model would allow a structure where every spatial object can have a complex geometry stored in its original geodetic system and still have an easily manageable location in GPS format.

cu andreas

[0] http://de.wikipedia.org/wiki/Wikipedia:Community-Projektbudget#2._kartenwer…
[1] https://en.wikipedia.org/wiki/Spatial_reference_system_identifier
[2] https://en.wikipedia.org/wiki/Spatial_reference_system
[3] https://en.wikipedia.org/wiki/WGS84
[4] https://en.wikipedia.org/wiki/Well-known_text
[5] https://en.wikipedia.org/wiki/GeoJSON
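The claim that WKT and GeoJSON can easily be transformed into each other can be illustrated with a small Python sketch. It converts a GeoJSON geometry (as a plain dict) into WKT using only the standard library. The function name `geojson_to_wkt` is an assumption for this example, and the sketch deliberately covers only 2D Point, LineString and Polygon geometries; a real system would more likely rely on an established GIS library.

```python
def _positions(coords):
    """Format a list of [x, y] positions as a parenthesised WKT coordinate list."""
    return "(" + ", ".join(f"{x} {y}" for x, y in coords) + ")"


def geojson_to_wkt(geom):
    """Convert a 2D GeoJSON Point, LineString or Polygon dict to a WKT string."""
    kind = geom["type"]
    coords = geom["coordinates"]
    if kind == "Point":
        x, y = coords
        return f"POINT ({x} {y})"
    if kind == "LineString":
        return "LINESTRING " + _positions(coords)
    if kind == "Polygon":
        # A polygon is a list of rings: the outer boundary first, then any holes.
        return "POLYGON (" + ", ".join(_positions(ring) for ring in coords) + ")"
    raise ValueError(f"unsupported geometry type: {kind}")


# GeoJSON lists longitude first, then latitude.
point = {"type": "Point", "coordinates": [16.37, 48.21]}
triangle = {
    "type": "Polygon",
    "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]],
}
print(geojson_to_wkt(point))
print(geojson_to_wkt(triangle))
```

Neither format, as used in this sketch, records which spatial reference system the numbers belong to, which is why the proposed model stores the SRID as a separate value next to the geometry.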

5 6

[Wikidata-l] Submit Wikidata talk to Strange Loop?
by Sumana Harihareswara 08 Apr '12


Call for Presentations is open, closes May 11:
https://thestrangeloop.com/sessions-page/call-for-presentations

This is one of the four big conferences I'm pushing to get Wikimedia developers to speak at this year, because there are great developers there to inform and recruit. Talks should be technical enough that they include code. You can do a 50-minute or 20-minute proposal. They specifically ask for Semantic Web talks, so I think a Wikidata presentation would have a good chance of being accepted.

Strange Loop is September 23-25 in St. Louis, Missouri, USA.

(I understand if the Wikidata team thinks this is too much of a distraction from their work and declines to submit a talk.)

--
Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation

2 1

[Wikidata-l] Wikipedia Categories vs. UDC
by Ivan Herman 08 Apr '12


I ran into this visualization today:

http://scimaps.org/maps/map/design_vs_emergence__127/

To be honest, I am not sure I fully understand what I see in detail on the chart, though. Maybe somebody who knows UDC and the WP categories better can make more sense of it (you can click on the image and then zoom in).

Ivan

----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
FOAF: http://www.ivan-herman.net/foaf.rdf

1 0

[Wikidata-l] relevant work on infobox alignment
by Brent Hecht 07 Apr '12


Hi All,

I just wanted to make sure folks had seen this paper by Eytan Adar, Michael Skinner, and Daniel Weld (big names in AI). It's precisely about aligning infoboxes and propagating information across multiple language editions, so it might help guide the process when we get to that stage (and even earlier):

http://www.cond.org/paper_202.pdf

- Brent

Brent Hecht
Ph.D. Candidate in Computer Science
CollabLab: The Collaborative Technology Laboratory
Northwestern University
w: http://www.brenthecht.com
e: brent(a)u.northwestern.edu

1 0

Re: [Wikidata-l] SNAK -> assertion?
by JFC Morfin 07 Apr '12


Bináris,

then you should coin a crystal-clear definition of snak (could it be made an acronym?) that everyone can memorize and understand. It also sounds like snap and snag. If you find a pun, it would help it get accepted.

jfc

At 10:39 06/04/2012, Bináris wrote:
>2012/4/5 Gregor Hagedorn <g.m.hagedorn(a)gmail.com>
>
>>I still feel uneasy about the hard-to-remember-neonym.
>
>It was strange to me and had to read after it. You may remember as
>bit-->byte-->snack, growing pieces of food.
>
>>I cannot prove
>>it, but believe the term snak will have to be learned by anyone who
>>interacts with the system through the API, any form of import
>>mechanism, etc.
>
>Well, and what's then? They will learn. Once I thought namespaces to
>be a rather programming word and concept, but then I became a
>Wikipedian and understood they were a basic concept of editing.
>Every Wikipedian must know the difference between article and user
>and project namespace and they are not afraid of the word even if
>they have no real knowledge about namespaces in programming. People
>must understand concepts and ideas, and for the majority of
>non-English, non-programmer people it will be quite the same
>whatever name the new concept has. More, a sna(c)k fits better to
>every day concepts of an average person than an assertion, doesn't it?
>
>--
>Bináris

1 0

Re: [Wikidata-l] Industry, JTC1/ISO, W3C, IUse, Wikimedia - where are we ?
by JFC Morfin 06 Apr '12


At 12:45 05/04/2012, Denny Vrandečić wrote:
>In short, we have for Wikidata two pragmatic goals:
>* Wikidata's first aim is to support the Wikipedias with their language links
>* Wikidata's second aim is to support the Wikipedias with the infoboxes
>
>Out of the support for these tasks, other interesting use cases
>might and are expected to arise.
>
>Until I manage to understand how your comments relate to one of
>these goals, I will personally take the liberty to ignore your comments.

Fair enough :-) Your assessments are correct.

As I first documented it, our (iucg(a)ietf.org) target in this area is the Internet+ (smart fringe-to-fringe Internet) MDRS (metadata registry multilinguistic distributed referential system). The MDRS is to the Internet+ and to the Semiotic Internet (Intersem) that we explore what the IANA is to the legacy Internet, and what Wikidata might be to Wikimedia. Our "use case" is the Internet+ distributed operations (I documented the IETF Drafts references).

The MDRS will most probably be a datawiki and/or a DDDS (the DNS is a DDDS) of some sort. Today's IANA and wikis are fed and read by humans; datawikis will more and more be fed and read by intelligent processes. This intelligence leads to additional opportunities and constraints.

Our targets are the same; however, you have to have conceptual limits, while by essence I must have none. This is why I tried to probe our possible areas of common interest. Your two confirmed documents now give us your current limits (the more people understand what a "revolution" (as per Wikimedia) datawikis are going to be, the more they may expect from them).

My own target is to internally review these documents, assess their possible evolution, strive to stay interoperable, and permit users and applications to take better advantage of your project (wikidata.iucg.org). We will alert you if we fear possible architectural conflicts through our work and tests.

This seems to be in line with what Lydia responded today.

Best
jfc

1 0

Re: [Wikidata-l] Namespace-based model
by John McClure 06 Apr '12


It's more accurate to say that your belief is an artifact of present tools. RDF has just one way to associate a Class with an object, the rdf:type attribute. Specifically because RDF makes no distinction between classes that represent a type-of-thing (eg a Character) and classes that represent a facet-of-thing (eg Fictional), present tools require that multiple classes can be associated with any resource. Obviously a given resource can have multiple facets.

In my work I store facet-classes in the Dublin Core Coverage and Format properties and I store a single existential-class in the Dublin Core Type property for the page; the page's template restates both kinds of classes as Categories for the page (hence my piqued email to at least define existential classes in a separate namespace from category).

So if no distinction is made, then multiple "types" are indeed necessary. If a distinction between nouns and adjectives is made, then one type + multiple facets is necessary.

-----Original Message-----
From: John McClure [mailto:jmcclure@hypergrove.com]
Sent: Thursday, April 05, 2012 7:08 PM
To: Wikidata (E-mail)
Subject: [Wikidata-l] Namespace-based model

Denny said: I think the assumption everything has exactly one type is oversimplifying

The assumption that everything is of multiple types is over-complicating. Usually you can tell from the first sentence in the Wikipedia page.

"Tuesday is a day of the week"
"Love is an emotion"
"(Roman) Catholicism is a faith"
"Gollum is a fictional character"
"HAL-9000 is a character"
"Noah is a Patriarch"
"Enos was the first chimpanzee"

So consensus certainly is being achieved among thousands of authors about the fundamental type of thing each of these pages represents. Disambiguation pages very commonly reference these types of things, as in "Enos (chimpanzee)".

Let's take Gollum. I can imagine a topic map has these subjects:

1. Character
1A. Fictional character
1A1. Fictional person
1A2. Fictional animal
1A3. Fictional ghost
1A4. Fictional god

Another equally valid assertion is that Gollum is a Character that is typed as a Fictional and Human thing (both of these adjectives being instances of owl:Class) -- so that a comprehensive system sometime in the future would reinterpret that Gollum is actually a Fictional person.

As you say yourself, it's not useful to create a "perfect" system to handle every imaginable edge case **to the extent that they exist**. Personally I don't believe such edge cases can be found - I challenge anyone to provide me such an example.

But more to the point of Wikidata. I don't believe for a second that WP will be reorganized into thousands of namespaces. Rather, I believe first, SUBOBJECT names must include the idea of 'namespace' for the efficiencies gained, and second, WP pages should be associated with the same set of nouns (noun-phrases) available for subobject names. IOW, it's an implementation issue whether a wiki's pages are named using these namespaces, so that the wiki as a whole can gain the same inherent efficiencies I've sketched for subobjects.

Best - john
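The "one type + multiple facets" model described above can be sketched in a few lines of Python. The class name `Resource`, its fields, and both methods are hypothetical names chosen for this illustration; they are not taken from any Wikidata or RDF tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    type: str                       # the single existential class (a noun)
    facets: frozenset = frozenset() # zero or more facet classes (adjectives)

    def flat_types(self):
        """Collapse type and facets into one undifferentiated list of classes,
        as happens when everything is attached through rdf:type."""
        return sorted({self.type, *self.facets})

    def refined_type(self):
        """Combine facets and type into a noun phrase such as 'Fictional character'."""
        return " ".join([*sorted(self.facets), self.type.lower()])


gollum = Resource("Gollum", type="Character", facets=frozenset({"Fictional"}))
tuesday = Resource("Tuesday", type="Day of the week")

print(gollum.flat_types())
print(gollum.refined_type())
print(tuesday.refined_type())
```

Keeping `type` as a single field and `facets` as a set preserves the noun versus adjective distinction, while `flat_types` shows that the multi-type view can still be derived from it when a tool needs it.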

1 0
