Wiktionary-l August 2013

wiktionary-l@lists.wikimedia.org

6 participants
6 discussions

Re: [Wiktionary-l] [Wikitech-l] Listing missing words of wiktionnaries
by Lars Aronsson 09 Dec '13

On 07/23/2013 11:23 AM, Mathieu Stumpf wrote:
> Here is what I would like to do: generate reports which give, for a
> given language, a list of words which are used on the web, with a
> number evaluating their occurrences, but which are not in a given
> Wiktionary.
>
> How would you recommend implementing that within the Wikimedia
> infrastructure?

Some years back, I undertook to add entries for Swedish words in the English Wiktionary. You can follow my diary at http://en.wiktionary.org/wiki/User:LA2

Among the things I did was to extract a list of all Swedish words that already had entries. The best way was to use CatScan to list entries in categories for Swedish words. Even if there is a page called "men", this doesn't mean the Swedish word "men" has an entry, because it could be the English word "men" that is in that page.

Then I extracted all words from some known texts, e.g. novels, the Bible, government reports, and the Swedish Wikipedia, counting the number of occurrences of each word.

Case significance is a bit tricky. There should not be an entry for lower-case stockholm, so you can't just convert everything to lower case. But if a word is capitalized only because it begins a sentence, that word should not have a capitalized entry. Another tricky issue is abbreviations, which should keep the period, for example "i.e." rather than "i" and "e". But the period that ends a sentence should be removed. When splitting a text into words, I decided to keep all periods and initial capital letters, even if this leads to some false words.

When you have word frequency statistics for a text, and a list of existing entries from Wiktionary, you can compute the coverage, and I wrote a little script for this. I found that English Wiktionary already had Swedish entries covering 72% of the words in the Bible, and when I started to add entries for the most common of the missing words, I was able to increase this to 87% in just a single month (September 2010).

Many of the common words that were missing when I started were adverbs such as "thereof", "herein", which occur frequently in any text but are not very exciting to write entries about. This statistics-based approach gave me a reason to add those entries.

It is interesting to contrast a given text to a given dictionary in this way. The Swedish entries in the English Wiktionary are a different dictionary from the Swedish entries in the German or Danish Wiktionary. The kinds of words found in the Bible are different from those found in Wikipedia or in legal texts. There is not a single, universal text corpus that we can aim to cover. Google has released its ngram dataset. I'm not sure if it covers Swedish, but even if it does, it must differ from the corpus frequencies published by the Swedish Academy.

It is relatively easy to extract a list of existing entries from Wiktionary. Preparing a given text corpus for frequency and coverage analysis takes more work.

--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
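[Editor's sketch] The coverage computation described in the message above could look roughly like the following. This is not the script Lars mentions, which is not included in the thread; the token pattern, the function names, and the choice to weight coverage by occurrence count rather than by distinct word are all assumptions made for illustration. The tokenizer follows the stated decision to keep periods and initial capitals, so "i.e." stays one token and a sentence-final word keeps its period.

```python
import re
from collections import Counter

# Assumed token pattern: a run of letters, optionally joined by internal
# periods, with an optional trailing period (so "i.e." is one token and
# a sentence-final "man." keeps its period, as described in the message).
TOKEN = re.compile(r"[^\W\d_]+(?:\.[^\W\d_]+)*\.?")


def word_frequencies(text):
    """Count tokens exactly as written: no lower-casing, periods kept."""
    return Counter(TOKEN.findall(text))


def coverage(freqs, entries):
    """Fraction of running words whose exact form is in `entries`.

    Assumption: coverage is weighted by number of occurrences.
    """
    total = sum(freqs.values())
    covered = sum(n for word, n in freqs.items() if word in entries)
    return covered / total if total else 0.0


def most_common_missing(freqs, entries, limit=10):
    """Most frequent tokens with no entry: candidates for new entries."""
    missing = [(w, n) for w, n in freqs.most_common() if w not in entries]
    return missing[:limit]
```

Here `entries` would be the set of words that already have an entry for the language in question (for example, a list exported from the relevant categories), and `word_frequencies` would be run over the chosen corpus. Calling `most_common_missing` then gives a work list ordered by how much each new entry would raise the coverage figure.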

FYI: Java-based Wiktionary Library (JWKTL) 1.0.0 released as open source software
by Christian Meyer 20 Aug '13

[Apologies for X-posting]

We are pleased to announce the release of the Java-based Wiktionary Library (JWKTL) 1.0.0 - an application programming interface for Wiktionary.

Project homepage: http://code.google.com/p/jwktl/

== Overview ==

JWKTL (Java-based Wiktionary Library) is an application programming interface for the free multilingual online dictionary Wiktionary (http://www.wiktionary.org). JWKTL enables efficient and structured access to the information encoded in the English, the German, and the Russian Wiktionary language editions, including sense definitions, part of speech tags, etymology, example sentences, translations, semantic relations, and many other lexical information types. The Russian JWKTL parser is based on Wikokit (http://code.google.com/p/wikokit/).

Prior to being available as open source software, JWKTL has been a research project at the Ubiquitous Knowledge Processing (UKP) Lab of the Technische Universität Darmstadt, Germany. The following people have mainly contributed to this project: Yevgen Chebotar, Iryna Gurevych, Christian M. Meyer, Christof Müller, Lizhen Qu, Torsten Zesch.

== Publications ==

A detailed description of Wiktionary and JWKTL is available in our scientific articles:

* Christian M. Meyer and Iryna Gurevych: Wiktionary: A new rival for expert-built lexicons? Exploring the possibilities of collaborative lexicography, Chapter 13 in S. Granger & M. Paquot (Eds.): Electronic Lexicography, pp. 259-291, Oxford: Oxford University Press, November 2012.
  (http://www.ukp.tu-darmstadt.de/publications/details/?no_cache=1&tx_bibtex_p…)

* Christian M. Meyer and Iryna Gurevych: OntoWiktionary - Constructing an Ontology from the Collaborative Online Dictionary Wiktionary, chapter 6 in M. T. Pazienza and A. Stellato (Eds.): Semi-Automatic Ontology Development: Processes and Resources, pp. 131-161, Hershey, PA: IGI Global, February 2012.
  (http://www.ukp.tu-darmstadt.de/publications/details/?no_cache=1&tx_bibtex_p…)

* Torsten Zesch, Christof Müller, and Iryna Gurevych: Extracting Lexical Semantic Knowledge from Wikipedia and Wiktionary, in: Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC), pp. 1646-1652, May 2008. Marrakech, Morocco.
  (http://www.ukp.tu-darmstadt.de/publications/details/?no_cache=1&tx_bibtex_p…)

== License and Availability ==

The latest version of JWKTL is available via Maven Central. If you use Maven as your build tool, then you can add JWKTL as a dependency in your pom.xml file:

  <dependency>
    <groupId>de.tudarmstadt.ukp.jwktl</groupId>
    <artifactId>jwktl</artifactId>
    <version>1.0.0</version>
  </dependency>

JWKTL is available as open source software under the Apache License 2.0 (ASL). The software thus comes "as is" without any warranty (see license text for more details). JWKTL makes use of Berkeley DB Java Edition 5.0.73 (Sleepycat License), Apache Ant 1.7.1 (ASL), Xerces 2.9.1 (ASL), JUnit 4.10 (CPL). Some classes have been taken from the Wikokit project (available under multiple licenses, redistributed under the ASL license). See NOTICE.txt for further details.

== Contact ==

Please direct any questions or suggestions to https://groups.google.com/forum/#!forum/jwktl-users
Group E-Mail: jwktl-users(a)googlegroups.com

Best wishes,
Christian M. Meyer

--
Christian M. Meyer, M.Sc.
Doctoral Researcher
Ubiquitous Knowledge Processing (UKP Lab)
FB 20 Computer Science Department
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany
Phone [+49] (0)6151 16-5386, fax -5455, room S2/02/B113
meyer(a)ukp.informatik.tu-darmstadt.de
www.ukp.tu-darmstadt.de
Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de

wikt2dict - a tool for extracting translations from Wiktionaries
by Judit, Ács 14 Aug '13

Hi All,

I created a tool to extract translations from different editions of Wiktionary. Right now it supports 39 different Wiktionaries. It only extracts translations and ignores the rest.

Supported Wiktionaries: Azerbaijani, Bulgarian, Catalan, Czech, Danish, Greek, English, Esperanto, Spanish, Estonian, Basque, Finnish, French, Galician, Hebrew, Croatian, Hungarian, Indonesian, Italian, Georgian, Latin, Lithuanian, Malagasy, Dutch, Norwegian, Occitan, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Serbian, Swedish, Swahili, Turkish, Ukrainian, Vietnamese and Chinese.

Adding a new Wiktionary is done via a configuration file.

Right now the beta version is available for download at: https://github.com/juditacs/wikt2dict

Documentation is in progress; until then, the README should be enough to get started.

Please test it and send me your feedback and bug reports.

Thanks,
Judit Ács

Re: [Wiktionary-l] [Wikimedia-l] [Wikidata-l] Meeting about the support of Wiktionary in Wikidata
by David Cuenca 10 Aug '13

To add a couple of comments to what Denny said: from my experience with Wikisource, reaching out to international, loosely connected communities is already a big challenge on its own.

I would like to invite Wiktionary contributors to take a look at this Individual Engagement Grant project that Aubrey and I are doing for Wikisource, because maybe it would make sense for a group of involved Wiktionarians to start a similar initiative for Wiktionary.

The original application can be found here:
http://meta.wikimedia.org/wiki/Grants:IEG/Elaborate_Wikisource_strategic_vi…

And the midterm report:
http://meta.wikimedia.org/wiki/Grants:IEG/Elaborate_Wikisource_strategic_vi…

If anyone from the Wiktionary community wants to step forward, I would be more than happy to share experiences and provide advice.

Cheers,
Micru

On Sat, Aug 10, 2013 at 3:30 AM, Denny Vrandečić <denny.vrandecic(a)wikimedia.de> wrote:
> [Sorry for cross-posting]
>
> Yes, I agree that the OmegaWiki community should be involved in the
> discussions, and I pointed GerardM to our proposals whenever and
> discussions, using him as a liaison. We also looked and keep looking
> at the OmegaWiki data model to see what we are missing.
>
> Our latest proposal is different from OmegaWiki in two major points:
>
> * our primary goal is to provide support for structured data in the
> Wiktionaries. We do not plan to be the main resource ourselves, where
> readers come to in order to look up something; we merely provide
> structured data that a Wiktionary may or may not use. This parallels
> the role Wikidata has with regard to Wikipedia. This also highlights
> the difference between Wikidata and OmegaWiki, since OmegaWiki's goal
> is "to create a dictionary of all words of all languages, including
> lexical, terminological and ontological information."
>
> * a smaller difference is the data model. Wikidata's latest proposal
> to support Wiktionary is centered around lexemes, and we do not assume
> that there is such a thing as a language-independent defined meaning.
> But no matter what model we end up with, it is important to ensure
> that the bulk of the data could freely flow between the projects, and
> even though we might disagree on this issue in the modeling, it is
> ensured that the exchange of data is widely possible.
>
> We tried to keep notes on the discussion we had today:
> <http://epl.wikimedia.org/p/WiktionaryAndWikidata>
>
> My major take-home message is that:
> * the proposal needs more visual elements, especially a mock-up or
> sketch of how it would look and how it could be used on the
> Wiktionaries
> * there is no generally accepted place for a discussion that involves
> all Wiktionary projects. Still, my initial decision to have the
> discussion on the Wikidata wiki was not a good one, and it should and
> will be moved to Meta.
>
> Having said that, the current proposal for the data model of how to
> support Wiktionary with Wikidata seems to have garnered a lot of
> support so far. So this is what I will continue building upon. Further
> comments are extremely welcome. You can find it here:
>
> <http://www.wikidata.org/wiki/Wikidata:Wiktionary>
>
> As said, it will be moved to Meta as soon as the requested mockups and
> extensions are done.
>
> Cheers,
> Denny
>
> 2013/8/10 Samuel Klein <meta.sj(a)gmail.com>
>
> > Hello,
> >
> > > On Fri, Aug 9, 2013 at 6:13 PM, JP Béland <lebo.beland(a)gmail.com> wrote:
> > >> I agree. We also need to include the Omegawiki community.
> >
> > Agreed.
> >
> > On Fri, Aug 9, 2013 at 12:22 PM, Laura Hale <laura(a)fanhistory.com> wrote:
> > > Why? The question of moving them into the WMF fold was pretty much no,
> > > because the project has an overlapping purpose with Wiktionary,
> >
> > This is not actually the case.
> > There was overwhelming community support for adopting Omegawiki - at
> > least simply providing hosting. It stalled because the code needed a
> > security and style review, and Kip (the lead developer) was going to
> > put some time into that. The OW editors and dev were very interested
> > in finding a way forward that involved Wikidata and led to a combined
> > project with a single repository of terms, meanings, definitions and
> > translations.
> >
> > Recap: The page describing the OmegaWiki project satisfies all of the
> > criteria for requesting WMF adoption.
> > * It is well-defined on Meta http://meta.wikimedia.org/wiki/Omegawiki
> > * It describes an interesting idea clearly aligned with expanding the
> > scope of free knowledge
> > * It is not a 'competing' project to Wiktionaries; it is an idea that
> > grew out of the Wiktionary community, has been developed for years
> > alongside it, and shares many active contributors and linguaphiles.
> > * It started an RfC which garnered 85% support for adoption.
> > http://meta.wikimedia.org/wiki/Requests_for_comment/Adopt_OmegaWiki
> >
> > Even if the current OW code is not used at all for a future Wiktionary
> > update -- and this idea was proposed and taken seriously by the OW
> > devs -- their community of contributors should be part of discussions
> > about how to solve the Wiktionary problem that they were the first to
> > dedicate themselves to.
> >
> > Regards,
> > Sam.
> >
> > _______________________________________________
> > Wikimedia-l mailing list
> > Wikimedia-l(a)lists.wikimedia.org
> > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> > <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>
>
> --
> Project director Wikidata
> Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
> Tel. +49-30-219 158 26-0 | http://wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> unter der Nummer 23855 B. Als gemeinnützig anerkannt durch das
> Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
> _______________________________________________
> Wikimedia-l mailing list
> Wikimedia-l(a)lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>

--
Etiamsi omnes, ego non

Meeting about the support of Wiktionary in Wikidata
by David Cuenca 09 Aug '13

Hi,

If there is someone at Wikimania interested in participating in the talks about the future support of Wiktionary in Wikidata, we will be having a discussion about the several proposals.

http://wikimania2013.wikimedia.org/wiki/Support_of_Wiktionary_in_Wikidata

Date : Saturday, 10 Aug, 11:30 am - 1:00 pm
Place: Y520 (block Y, 5th floor)

See you there,
Micru

Re: [Wiktionary-l] [Wikidata-l] Meeting about the support of Wiktionary in Wikidata
by Mathieu Stumpf 09 Aug '13

On 2013-08-09 13:04, Romaine Wiki wrote:
> Are there many users from Wiktionary in Hong Kong? I do not think any
> of the Dutch users is; I can't say for others.
>
> I think it would be essential that this subject is discussed inside
> the wider Wiktionary community. To me the group of users participating
> is too narrow. Also, a mailing list is not handy, as most of the users
> from Wiktionary do not read it. I think a Wikt-community wide
> discussion is needed.

I agree, and I think meta would be the most obvious channel for such a discussion. As said in the previous email, there's already [[Wiktionary future]] which is waiting for contributions and discussion on meta.

Anyway, whatever the channel, it would be really important to make as many contributors as possible aware of this initiative, so they can provide relevant feedback specific to their needs.

>
> Romaine
>
> --------------------------------------------
> On Fri, 8/9/13, David Cuenca <dacuetu(a)gmail.com> wrote:
>
> Subject: [Wikidata-l] Meeting about the support of Wiktionary in Wikidata
> To: wiktionary-l(a)lists.wikimedia.org, "Wikimania general list (open
> subscription)" <wikimania-l(a)lists.wikimedia.org>, "Discussion list for
> the Wikidata project." <wikidata-l(a)lists.wikimedia.org>, "Wikimedia
> Mailing List" <wikimedia-l(a)lists.wikimedia.org>
> Date: Friday, August 9, 2013, 4:43 AM
>
> Hi,
>
> If there is someone in Wikimania interested in participating
> in the talks about the future support of Wiktionary in
> Wikidata, we will having a discussion about the several
> proposals.
>
> http://wikimania2013.wikimedia.org/wiki/Support_of_Wiktionary_in_Wikidata
>
> Date : Saturday, 10 Aug, 11:30 am - 1:00 pm
> Place: Y520 (block Y, 5th floor)
>
> See you there,
> Micru
>
> -----Inline Attachment Follows-----
>
> _______________________________________________
> Wikidata-l mailing list
> Wikidata-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata-l

--
Association Culture-Libre
http://www.culture-libre.org/
