Benoit Lelong, 11/12/2012 16:11:
> I am currently planning to process the last French dump. I would like to
> ask if somebody has already found or used a good OpenNLP French sentence
> detection model. If yes please let me know where to find one.
What have you found? Probably wiktionary-l is a better place to ask.
In Wiktionary, it's very convenient that some words
have sound illustrations, e.g.
These audio bites are simple 2-3 second OGG files, e.g.
but they are limited in number. It would be very
easy to record more of them, but before you get
started it takes some time to learn the details,
and then you need to upload to Commons and specify
a license, and provide a description, ... It's not
very likely that the person who does all that is
also a good voice in each desired language.
Here's a better plan:
Provide a tool on the toolserver, or any other
server, having a simple link syntax that specifies
the language code and the text, e.g.
The tool uses a cookie, that remembers that this
user has agreed to submit contributions using cc0.
At the first visit, this question is asked as a
The user is now prompted with the text (from the URL)
and recording starts when pressing a button. The
user says the word, and presses the button again.
The tool saves the OGG sound, uploads it to Commons
with the filename fr-gouter-XYZ789.ogg and
the cc0 declaration and all metadata, placing it
in a category of recorded but unverified words.
Another user can record the same word, and it will
be given another random letter-digit code.
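The naming scheme described here could be sketched roughly as follows. This is only an illustration, not an existing tool: the function name recording_filename, the six-character code length and the alphabet (uppercase letters plus digits) are assumptions read off the single example fr-gouter-XYZ789.ogg.

```python
import secrets
import string

# Assumed alphabet for the random letter-digit code (not specified in the proposal).
CODE_ALPHABET = string.ascii_uppercase + string.digits


def recording_filename(lang: str, word: str, code_length: int = 6) -> str:
    """Build a name such as fr-gouter-XYZ789.ogg with a fresh random code.

    Each call draws a new code, so two users recording the same word
    get different filenames with high probability.
    """
    code = "".join(secrets.choice(CODE_ALPHABET) for _ in range(code_length))
    return f"{lang}-{word}-{code}.ogg"


# Two recordings of the same word by different users.
print(recording_filename("fr", "gouter"))
print(recording_filename("fr", "gouter"))
```

A real tool would still have to check on Commons that the generated name is not already taken before uploading.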
As a separate part of the tool, other volunteers are
asked to verify or rate (1 to 5 stars) the recordings
available in a given language. The rating is stored
as categories on commons.
Now, a separate procedure (manual or a bot job) can
pick words that need new or improved recordings,
and list them (with links to the tool) on a normal
I know HTML supports uploading of a file, but I don't
know how to solve the recording of sound directly to
a web service. Perhaps this could be a Skype application?
I have no idea. Please just be creative. It should be
solvable, because this is 2013 and not 2003.
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
There is no point at all in maintaining the software currently used by
OmegaWiki. That would be foolish. Nobody who knows OmegaWiki will ask for
What we are asking for is that we ensure that the structures that exist in
OmegaWiki are replicated in Wikidata for reasons that are clear and
obvious. Technically there are a few things that make sense to have.
For instance, in the Dutch language we have a noun, a verb, an adjective
... we do not have a "country" in this class. A noun can be masculine,
feminine or neuter ... we do not have a "stupid". We have singular and
plural and we do not have a dual as in Arabic.
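One way to picture this kind of structure is a per-language table of allowed values. The sketch below is purely illustrative and is neither OmegaWiki's nor Wikidata's actual data model: the names ALLOWED and is_valid are hypothetical, and the value sets contain only the examples mentioned above, so they are incomplete.

```python
# Hypothetical per-language table of allowed grammatical values.
# Only the examples from the message are listed; real languages need more.
ALLOWED = {
    "nl": {
        "part_of_speech": {"noun", "verb", "adjective"},
        "gender": {"masculine", "feminine", "neuter"},
        "number": {"singular", "plural"},
    },
    "ar": {
        "number": {"singular", "dual", "plural"},
    },
}


def is_valid(lang: str, feature: str, value: str) -> bool:
    """Return True if the value is allowed for this feature in this language."""
    return value in ALLOWED.get(lang, {}).get(feature, set())


print(is_valid("nl", "number", "dual"))  # False: Dutch has no dual
print(is_valid("ar", "number", "dual"))  # True: Arabic does
```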
When there is a concept, we have synonyms and translations that are used as
such but do not cover the original concept well. We want to be able to
Really Denny, all we need is to keep the structure, the data. We do not
even want to be dogmatic about this (too much). What we want are things
that fulfil a need, that have a purpose.
On 11 March 2013 15:51, Denny Vrandečić <denny.vrandecic(a)wikimedia.de> wrote:
> Sorry about the wrong link, I meant this IEG proposal:
> but as far as I can tell, this one didn't make it into round 1 (pity,
> something like that would have made sense, but I understand that the
> proposal was obviously not detailed enough. Whatever.)
> I fully agree with Andrea and Nemo that some use cases would be very easy
> to implement, especially linking between the projects. Commons and
> Wiktionary though are very different and require more thought:
> * easy goals: link to appropriate items for some of the pages in Commons,
> use data from Wikidata in the creator namespace and similar
> * more engaging: add metadata to the media files in Commons itself and link
> them to each other and to Wikidata
> * easy goals: none. The conceptualization of Wiktionary simply is not a
> direct fit to the conceptualization in Wikipedia and Wikidata.
> We need to figure out how they work together. Maybe this page is a good
> start, and maybe we should collect the ideas there.
> I mean, OmegaWiki has been around for a while, and they learned many,
> extremely valuable lessons. A lot of work has gone into it, and it would be
> a shame not to build on its experiences and lessons. But I would like to
> ask the question whether it is the right software or not, even though it is
> a painful question. But please be reminded that I have spent many years in
> the development of Semantic MediaWiki, with the one goal to have it
> switched on the Wikipedias -- and then to come to the conclusion to *not*
> use the software as is, and start from scratch.
> We need a discussion on Wiktionary, and how it can evolve, and if it even
> should. And I do not think that a cross-mailing list discussion like the
> current one is the right place, and I do not even know where the right
> place is.
> So, first question: where should this discussion take place?
> 2013/3/11 Federico Leva (Nemo) <nemowiki(a)gmail.com>
> > Denny Vrandečić, 11/03/2013 14:52:
> >> There is currently a number of things going on re the future of
> >> Wiktionary.
> >> There is, for example, the suggestion to adopt OmegaWiki, which could
> >> potentially complicate a Wikibase-Solution in the future (but then
> >> structured data is often rather easy to transform):
> >> There is this grant proposal for elaborating the future of Wiktionary,
> >> which I consider a potentially smarter first step:
> >> <http://meta.wikimedia.org/wiki/Grants:IEG/Elaborate_Wikisource_strategic_vision>
> > That's Wikisource. :)
> >> There's this discussion on Wikidata itself:
> >> <https://www.wikidata.org/wiki/Wikidata:Wiktionary>
> >> And I know that Daniel K. is very interested in working into this
> >> direction.
> >> Personally, I regard Wiktionary as the third priority, following Wikipedia
> >> and Commons. A lot of the other projects -- like Wikivoyage or Wikisource
> >> -- can be served with only small changes to Wikidata as it is, but both
> >> Commons and Wiktionary would require a bit of thought (and here again,
> >> Commons much less than Wiktionary).
> > Actually Wikiquote and Wikivoyage use interwikis exactly like Wikipedia;
> > Commons in the same way except it's interproject; Wiktionary in the same
> > way except it's case-sensitive and not about concepts (or about a
> > stricter definition of concept); Wikisource in a completely different way;
> > Wikibooks, Wikinews and Wikiversity I'm not sure.
> > As for phase II, it's another story. Wikisource and Commons would benefit
> > a lot from it; for Wiktionary it could be a revolution; for Wikispecies
> > idem but with less effort (?); Wikiquote would become
> >> I would appreciate a discussion with
> >> the Wiktionary-Communities, and also to make them more aware of the
> >> OmegaWiki proposal, the potential of Wikidata for Wiktionary, etc. Just to
> >> give a comparison: it took a few months to write the original Wikidata
> >> proposal, and it was up for discussion for several months before it was
> >> decided and acted upon. I would strongly advise to again choose slow and
> >> careful planning over hastened decisions.
> > It's impossible to plan or discuss anything without knowing what matters.
> > Nemo
> Project director Wikidata
> Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
> Tel. +49-30-219 158 26-0 | http://wikimedia.de
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
> Registered in the register of associations of the Amtsgericht
> Berlin-Charlottenburg under number 23855 B. Recognized as charitable by
> the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
Denny Vrandečić, 11/03/2013 14:52:
> There is currently a number of things going on re the future of Wiktionary.
> There is, for example, the suggestion to adopt OmegaWiki, which could
> potentially complicate a Wikibase-Solution in the future (but then again,
> structured data is often rather easy to transform):
> There is this grant proposal for elaborating the future of Wiktionary,
> which I consider a potentially smarter first step:
That's Wikisource. :)
> There's this discussion on Wikidata itself:
> And I know that Daniel K. is very interested in working into this direction.
> Personally, I regard Wiktionary as the third priority, following Wikipedia
> and Commons. A lot of the other projects -- like Wikivoyage or Wikisource
> -- can be served with only small changes to Wikidata as it is, but both
> Commons and Wiktionary would require a bit of thought (and here again,
> Commons much less than Wiktionary).
Actually Wikiquote and Wikivoyage use interwikis exactly like Wikipedia;
Commons in the same way except it's interproject; Wiktionary in the same
way except it's case-sensitive and not about concepts (or about a
stricter definition of concept); Wikisource in a completely different
way; Wikibooks, Wikinews and Wikiversity I'm not sure.
As for phase II, it's another story. Wikisource and Commons would
benefit a lot from it; for Wiktionary it could be a revolution; for
Wikispecies idem but with less effort (?); Wikiquote would become
> I would appreciate a discussion with
> the Wiktionary-Communities, and also to make them more aware of the
> OmegaWiki proposal, the potential of Wikidata for Wiktionary, etc. Just to
> give a comparison: it took a few months to write the original Wikidata
> proposal, and it was up for discussion for several months before it was
> decided and acted upon. I would strongly advise to again choose slow and
> careful planning over hastened decisions.
It's impossible to plan or discuss anything without knowing what matters.
Request for help by Wikimedia Deutschland also posted on
of relevance for all Wikimedia projects users.
-------- Original Message --------
Subject: Information on Tool Labs/ Your help needed
Date: Thu, 7 Mar 2013 10:26:51 +0000 (UTC)
From: Silke Meyer
You are getting this e-mail because you have an active or expired
account on the toolserver.
As you might know, Wikimedia Foundation is building a cloud-based
infrastructure (Labs/Tool Labs) that - in the long run - will be a
replacement for the toolserver. Don't worry! The toolserver will not
just be switched off. WMF and WMDE would like to support you as well as
possible when it comes to migrating tools. There will be enough time for
this process. We will offer different forms of support which you will
hear of soon.
Right now, Tool Labs is not ready. WMF staff, mainly Marc Pelletier, are
building it. You might have seen the general list of needed and wanted
features. For many of you, the crucial features are database replication
and user databases. Both are upcoming features in the near future.
Your tools and their dependencies are taken into account to build the
new infrastructure. This is why I am asking you for help:
Please provide information about your tools! I started an incomplete
list at http://www.mediawiki.org/wiki/Toolserver/List_of_Tools
(based on the wiki and jira).
Personally I was really impressed by its length and diversity - how
cool! Several people said there was so much missing in this list - so
please help me to complete it!
Here is what I'm asking you to do:
* Please check if your tools are on the list (correctly).
* Please fill in your software dependencies, data dependencies, usage
patterns (running continuously? web services? batch runs? etc.) and
the license.
* If you have not given your software an explicit license, please note
that only free software can migrate to Labs. Consider putting your stuff
under a free license.
* If you have already migrated your bot or tool to Labs and it is in the
list, please say so in the last column "status".
* If your toolserver account has expired and/or your tool is not active
on the toolserver right now, please add it to the second table at the
bottom of the page. In the last column write that it is not running
currently and whether you want to revive it. If you are considering
reviving it, please fill in all the details, too, so that your needs will be
taken into account.
* Please follow the toolserver-announce list for more information to come.
Thanks for your help! If you have any questions, please don't hesitate
to ask me, to ask on toolserver-l and/or on labs-l.
System administrator and project assistant, Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. (030) 219 158 260