Hi Martynas,
On Mon, Feb 13, 2012 at 02:08, Martynas Jusevicius
<martynas(a)graphity.org> wrote:
> Hey list,
>
> I just came across the Wikidata project and was excited to see that we
> share pretty much the same goals.
Excellent!
> After seeing the Wikimania video and reading the technical proposal, I
> think choosing any other data model than RDF would be a mistake.
> DBpedia has done a lot in this area (as I'm sure you must know) and
> failing to build on this work would be a bad start for Wikidata. There
> is no need to invent new models and APIs, RDF and SPARQL are standard
> and proven tools.
> Also, implementation-wise I would suggest scrapping any wiki-based
> software like (Semantic) MediaWiki and building a fresh generic RDF-based
> platform.
>
> Hereby I want to present to you Graphity which is an open-source
> generic Linked Data management platform, found at http://graphity.org.
> We just released v1.0.0 for PHP which supports JAX-RS annotations for
> RESTful APIs and a lightweight RDF object API very similar to Jena's.
> I hope you can find some of our code useful or even contribute to it.
> If you find it interesting, we can provide more documentation and/or
> support.
>
> A Java version is on the way, as well as the most important component - a
> generic Linked Data browser. It will implement most of the technical
> requirements specified in the Wikidata proposal. A prototype for O3.6
> extension can be seen here:
> http://semanticreports.com
>
> The platform was presented at the W3C Linked Enterprise Data workshop last year:
> http://www.w3.org/2011/09/LinkedData/
> Here is the position paper and the presentation slides:
> http://www.w3.org/2011/09/LinkedData/ledp2011_submission_1.pdf
> http://www.slideshare.net/seporaitis/graphity-generic-linked-data-platform
>
> The platform is production-ready: it is running one of the biggest
> Scandinavian entertainment websites, http://heltnormalt.dk (the Danish
> successor of http://wulffmorgenthaler.de).
>
> http://code.google.com/p/linked-data-api/
Thanks for the heads-up. I'll make sure this is looked at and considered.
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Community Communications for Wikidata
Wikimedia Deutschland e.V.
Eisenacher Straße 2
10777 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
On Tue, Feb 14, 2012 at 20:54, Bináris <wikiposta(a)gmail.com> wrote:
> Hi folks,
>
> I am sorry to be late with subscribing. As far as I see there are already
> two mails in the archive before my arrival. :-)
Hehe, don't worry. You're right on time.
> I work with Pywikipedia and I am interested in applying Pywikibot to
> Wikidata from the first steps.
Cool! Do you have concrete ideas already what you'd want to do?
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Community Communications for Wikidata
Wikimedia Deutschland e.V.
Eisenacher Straße 2
10777 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
On Sat, Feb 25, 2012 at 08:31, Mike Cariaso <cariaso(a)gmail.com> wrote:
> Greetings from Hacker Space Korea.
>
> In anticipation of some of my eventual work for wikidata I've released
> https://github.com/cariaso/Semantic-MediaWiki-Bot
>
> and have been pushing relevant 3rd parties to support GET instead of POST
>
> Ruby
> https://github.com/jpatokal/mediawiki-gateway/issues/24
>
> and Perl libraries.
> https://rt.cpan.org/Public/Bug/Display.html?id=75296
>
> Perhaps this will smooth the wikidata work when it more formally begins.
Thanks, Mike! We'll definitely have a look.
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Community Communications for Wikidata
Wikimedia Deutschland e.V.
Eisenacher Straße 2
10777 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
I am forwarding the 3 previous mails from the old wikidata mailing list in
order to preserve them.
Denny
---------- Forwarded message ----------
From: Bináris <wikiposta(a)gmail.com>
Date: 2012/2/14
Subject: [Wikidata] Introduction
To: wikidata(a)lists.wikimedia.org
Hi folks,
I am sorry to be late with subscribing. As far as I see there are already
two mails in the archive before my arrival. :-)
I work with Pywikipedia and I am interested in applying Pywikibot to
Wikidata from the first steps.
--
Bináris
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
I am forwarding the 3 previous mails from the old wikidata mailing list in
order to preserve them.
Denny
---------- Forwarded message ----------
From: Martynas Jusevicius <martynas(a)graphity.org>
Date: 2012/2/13
Subject: [Wikidata] Wikidata + Graphity
To: wikidata(a)lists.wikimedia.org
Hey list,
I just came across the Wikidata project and was excited to see that we
share pretty much the same goals.
After seeing the Wikimania video and reading the technical proposal, I
think choosing any other data model than RDF would be a mistake.
DBpedia has done a lot in this area (as I'm sure you must know) and
failing to build on this work would be a bad start for Wikidata. There
is no need to invent new models and APIs, RDF and SPARQL are standard
and proven tools.
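As an illustration of how standard this stack already is, here is a minimal
Python sketch querying the public DBpedia SPARQL endpoint (the endpoint URL
and the dbo:/dbr: names follow the usual DBpedia conventions; this is only
an example, not Graphity code):
<pre>
# Ask DBpedia for people born in Vilnius and their birth dates.
import requests

ENDPOINT = "http://dbpedia.org/sparql"
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?person ?birth WHERE {
  ?person dbo:birthPlace dbr:Vilnius ;
          dbo:birthDate ?birth .
} LIMIT 10
"""

resp = requests.get(ENDPOINT,
                    params={"query": QUERY},
                    headers={"Accept": "application/sparql-results+json"})
for row in resp.json()["results"]["bindings"]:
    print(row["person"]["value"], row["birth"]["value"])
</pre>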
Also, implementation-wise I would suggest scrapping any wiki-based
software like (Semantic) MediaWiki and building a fresh generic RDF-based
platform.
Hereby I want to present to you Graphity which is an open-source
generic Linked Data management platform, found at http://graphity.org.
We just released v1.0.0 for PHP which supports JAX-RS annotations for
RESTful APIs and a lightweight RDF object API very similar to Jena's.
I hope you can find some of our code useful or even contribute to it.
If you find it interesting, we can provide more documentation and/or
support.
A Java version is on the way, as well as the most important component - a
generic Linked Data browser. It will implement most of the technical
requirements specified in the Wikidata proposal. A prototype for O3.6
extension can be seen here:
http://semanticreports.com
The platform was presented at the W3C Linked Enterprise Data workshop last year:
http://www.w3.org/2011/09/LinkedData/
Here is the position paper and the presentation slides:
http://www.w3.org/2011/09/LinkedData/ledp2011_submission_1.pdf
http://www.slideshare.net/seporaitis/graphity-generic-linked-data-platform
The platform is production-ready: it is running one of the biggest
Scandinavian entertainment websites, http://heltnormalt.dk (the Danish
successor of http://wulffmorgenthaler.de).
http://code.google.com/p/linked-data-api/
Martynas Jusevicius
http://twitter.com/pumba_lt
http://graphity.org
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
I am forwarding to you the first (not complete) version of the page
http://meta.wikimedia.org/wiki/User:Millosh/Dictionaries .
At the end of this month I'll have some software for such generation
of dictionaries. So, it would be good to hear what you think about
that and whether someone is interested in joining this project. Maybe
Gerard could think about how to implement such a thing in OmegaWiki, too :)
The page is not complete yet (stages 2 and 3 are not described), but I
think that you can follow my idea anyway. I'll complete the page in
the next few weeks and let you know.
* * *
At the moment I am working on a Serbian dictionary of synonyms.
During that work I got some ideas about the work on Wiktionaries:
Let's say that one word with its synonyms/translations is enough for one
entry in Wiktionary. (Maybe I should read some Wiktionary
documentation, but I suppose that this is the minimum.)
In short, this may be done for dozens of languages on dozens of
Wiktionaries.
==Stage 1, one language dictionary==
*Take some dictionary between English (or whatever language) and your
language. Of course, take it in a machine-readable format (not
encrypted).
*Take the first word in (let's say) English.
*Take its first translation in your language. Connect this word in
your language with the other translations of the English word.
*Find which words in English have the same translation. Connect the
word with the other translations of those words.
*You will get a list of connected words. There will be a lot of
noise, but you will be able to devise some simple methods for cleaning
up most of it. The rest will be cleaned up by humans,
because this is a wiki :)
*Of course, you may do that with a lot of different dictionaries...
Imagine that we analyzed two words from language A in the dictionary
"language B -> language A" and that we got the next results (of
course, this is simplified table):
<pre>
A58 - B65 - A58, A43, A21, A63
- B69 - A58, A28, A21, A38
- B71 - A58, A43, A21, A88
- B89 - A58, A43, A21, A63
A21 - B31 - A21, A43, A76, A20
- B44 - A21, A43, A39, A22
- B65 - A58, A43, A21, A63
- B69 - A58, A28, A21, A38
- B71 - A58, A43, A21, A88
- B89 - A58, A43, A21, A63
</pre>
We may say that whenever another word from language A shares a translation
in language B with the word A58, that connection gets one point.
So, we will have the following scores for the words A58 and
A21:
<pre>
A58(A21) = 4
A58(A43) = 3
A58(A63) = 2
A58(A28) = 1
A58(A38) = 1
A58(A88) = 1
A21(A43) = 5
A21(A58) = 4
A21(A63) = 2
A21(A28) = 1
A21(A38) = 1
A21(A88) = 1
A21(A76) = 1
A21(A39) = 1
A21(A20) = 1
A21(A22) = 1
</pre>
For the beginning, this may mean:
*The closest synonym to the word A58 is the word A21.
*The closest synonym to the word A21 is the word A43.
*Words A21, A58, A43 and A63 are synonyms (which we may call "G(As)1").
*It seems that the words A28, A38, A88, A76, A39, A20 and A22 are not
related to the group G(As)1. However, we will keep those connections in
memory, but we will not write them into the dictionary. Imagine that
the word ''blood'' literally means "red bird" in some language. Of
course, there are some ''red birds'' in the area where that language
is spoken. So, in this sense, blood will be connected with the word
"bird" and, almost for sure, with some species of bird. However, this
will be the only connection to the birds. The other connections will be
inside the descriptions for erythrocyte, lymphocyte, heart and so
on. Of course, mistakes are possible, but we may analyze the results :)
*This may be very useful for smaller languages which have some
bilingual dictionaries (where language B is English). We may be
able to generate monolingual Wiktionaries for all such languages.
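To make stage 1 concrete, here is a minimal Python sketch of the counting
described above; the toy "language B -> language A" dictionary below is
invented and only mirrors the A58/A21 tables:
<pre>
# Count how often two A-words share a B-word as their translation.
from collections import Counter
from itertools import combinations

# language B word -> its translations in language A
b_to_a = {
    "B65": ["A58", "A43", "A21", "A63"],
    "B69": ["A58", "A28", "A21", "A38"],
    "B71": ["A58", "A43", "A21", "A88"],
    "B89": ["A58", "A43", "A21", "A63"],
    "B31": ["A21", "A43", "A76", "A20"],
    "B44": ["A21", "A43", "A39", "A22"],
}

# score[(x, y)] = number of B-words listing both x and y as translations
score = Counter()
for translations in b_to_a.values():
    for x, y in combinations(sorted(set(translations)), 2):
        score[(x, y)] += 1
        score[(y, x)] += 1

# Print the candidate synonyms of A58, best first, e.g. "A58(A21) = 4".
for (x, y), points in sorted(score.items(), key=lambda kv: -kv[1]):
    if x == "A58":
        print("%s(%s) = %d" % (x, y, points))
</pre>
The scores match the table above, and a simple cut-off on these points (e.g.
keeping only scores above 1) reproduces the grouping into G(As)1.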
==Stage 2, two languages dictionary==
(To be continued.)
==Stage 3, cross language dictionaries==
(To be continued.)
http://de.wikipedia.org/wiki/Liste_der_byzantinischen_Kaiser
this is a list that can be worked on with translations and basic
data - where should we place links to lists that we can work on?
Ciao, Sabine
Basically I am with Sabine and support the idea.
Yet I want to warn against doing it now and doing it quickly, so as
to avoid certain pitfalls.
I suggest, rather, developing & training bots, data, and algorithms
on the test Wikipedia only, for the time being, where specific
situations can easily be (re)created without risk of havoc.
What follows are details and reasons; you can safely stop reading
here if not interested.
I've been mass-inserting data in the Ripuarian test Wikipedia at
a semi-automated level, which I created from several small database-like
collections, such as:
- names and ISO codes of languages having Wikipedias,
- dates and mottos of carnival parades in the city of Cologne over
the last 185 years,
- redirects for dialectal and spelling variants,
- etc.
So I've (limited) experience.
Pitfalls to be avoided.
----------------------
If we have inserted data in a WP already, and later a refined
version of that data becomes available, we want to pass that
to the WP. This becomes complicated when an article already
exists for a record. Thus we may strategically choose to export
data as late as possible, in as complete a state as possible,
when general additions and amendments have become unlikely, and
the data structure is stable.
We can safely replace articles when we can determine that they
have been unaltered since our own last update - i.e. we need to be
able to look at the version history for those cases.
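As a minimal sketch of that check (using the standard MediaWiki API; the
bot account name "DataBot" and the test-wiki URL are just placeholders):
<pre>
# Only overwrite an article if the last revision was made by the bot itself.
import requests

API = "https://test.wikipedia.org/w/api.php"   # placeholder wiki

def last_editor(title):
    """Return the username of the most recent revision, or None if the page is new."""
    params = {"action": "query", "prop": "revisions", "rvprop": "user|timestamp",
              "rvlimit": 1, "titles": title, "format": "json"}
    data = requests.get(API, params=params).json()
    page = next(iter(data["query"]["pages"].values()))
    revs = page.get("revisions")
    return revs[0]["user"] if revs else None

def may_overwrite(title, bot_name="DataBot"):
    editor = last_editor(title)
    return editor is None or editor == bot_name
</pre>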
When an article has been conventionally updated by an editor, that
may mean he altered data which we originally supplied, and that
we have to update our source before we may re-export data to
WPs again. It is possible that an update made in one WP will
influence others as well, though this is not necessarily so.
When we say we supply only some specific data to an article, e.g.
an infobox, then we can reread the infobox, and if it has not
been altered, we can rewrite it for an update.
We can also use such infoboxes to import new data from WPs when
they have been altered, e.g. someone died. We should have, however,
some protection against collecting errors, garbage, and vandal drivel.
Both such uses should imho be documented by comments in the
wikicode of the articles in question. Editors must know of the
implications of their edits.
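A minimal sketch of the reread-and-rewrite idea; the marker comments double
as the in-wikicode documentation mentioned above, and the regex approach is
only an illustration (a real bot would want a proper wikitext parser):
<pre>
# Rewrite the bot-maintained infobox only if nobody changed it since our last export.
import re

START = "<!-- BEGIN bot-maintained infobox - edits here may be overwritten -->"
END = "<!-- END bot-maintained infobox -->"
BLOCK_RE = re.compile(re.escape(START) + r"(.*?)" + re.escape(END), re.DOTALL)

def read_infobox(article_text):
    """Return the bot-maintained block, or None if it is missing."""
    m = BLOCK_RE.search(article_text)
    return m.group(1).strip() if m else None

def update_infobox(article_text, last_written, new_infobox):
    """Return (new_text, replaced?); leave the article alone if an editor touched the block."""
    current = read_infobox(article_text)
    if current is None or current != last_written.strip():
        # Altered or removed by an editor: import/flag for review instead of overwriting.
        return article_text, False
    replacement = START + "\n" + new_infobox.strip() + "\n" + END
    return BLOCK_RE.sub(lambda _m: replacement, article_text, count=1), True
</pre>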
Summarizing all this, I'd suggest carefully planning, and test
driving, all applications having even the slightest chance of being more than
sheer article-creation-and-then-leave-it-alone-forever projects.
Language.
--------
Another field needing attention is language.
A pretty huge number of names (of persons, places, languages, etc.)
are identical between languages, or are transliterated somehow, or
undergo systematic transformations (e.g. of the kind that Estonian
versions of male names have 'as' appended to them, afaik), etc.
The rule of thumb is that for lesser-known distant things (places,
languages, persons, etc.) the existence of special or irregular
translations is very unlikely.
That may mean we can compile a set of transformation rules and
an exception lookup mechanism (e.g. in WiktionaryZ) and pretty
well assume, when no exception is found, that we can use the
rules.
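A minimal sketch of such a rules-plus-exceptions lookup; the Estonian rule is
the one mentioned above, and the single exception entry is only a stand-in
for a real lookup (e.g. in WiktionaryZ):
<pre>
# Localize a name: consult the exception table first, then fall back to a rule.
EXCEPTIONS = {
    ("et", "Charles"): "Karl",   # made-up placeholder for an irregular translation
}

def et_male_name(name):
    # Systematic rule: Estonian versions of male names get "as" appended (afaik).
    return name if name.endswith("as") else name + "as"

RULES = {"et": et_male_name}

def localize_name(name, lang):
    if (lang, name) in EXCEPTIONS:
        return EXCEPTIONS[(lang, name)]
    rule = RULES.get(lang)
    return rule(name) if rule else name   # no rule: keep the name unchanged
</pre>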
Naturally, when this assumption fails, we need to have a feedback
path from the respective language community that allows us to
"repair" errors. Since in most Wikipedias there are editors
reviewing all, or most, new articles, we can assume feedback to be
rather quick and reliable.
Finding the right grammar, wording, etc. for automatically
generated non-tabular content is quite an interesting task which
I'll not address here any further ;-)
Community aspects.
-----------------
Wikis not having alert proofreaders should imho not be filled
with much automated content, since this might be a considerable
hindrance to community buildup.
The amount of newly inserted automated data should be determinable
by wiki admins, and generally it might be wise to make it somehow
related to the number of edits in any given time period, so as not
to overload the community.
How wiki admins find the right figures should imho be left to
them; valid suggestions might be public voting, or figures taken from
experience of how thoroughly data can be verified.
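One possible way of deriving such a figure, as a minimal sketch with an
arbitrary ratio and window (both to be chosen by the wiki admins):
<pre>
# Cap daily bot insertions at a fraction of last week's human edits.
def daily_bot_quota(human_edits_last_week, ratio=0.10):
    return int(human_edits_last_week * ratio / 7)

# e.g. 1400 human edits last week -> at most 20 bot-created pages per day
</pre>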
Also, keeping data up to date needs imho to be negotiated with the
communities. I bet we'll receive several interesting ideas of how
this could be accomplished without interfering with potential
human editors too much.
Greetings to all
Purodha
-- e-mail: <wikidata-l.mail.wikimedia.org(a)publi.purodha.net>