Re: [Wiktionary-l] Wiktionary parsing ; multiple languages

5 Apr 2013

Hi Moutupsi,
there are some problems that are better solved by a community than by
software alone. It took considerable effort and three years, but we are
now very close to really getting started.
For two days now, we have had a working minimal example for the
Wiktionary2RDF subproject of DBpedia, so the community can really pick
it up now.
The main documentation is here: http://dbpedia.org/Wiktionary
Now that the software and the Linked Data and SPARQL hosting are
working, we will try to find maintainers for each language. DBpedia
already has a large network for this:
http://wiki.dbpedia.org/Internationalization
I think there will be configs and data for the following languages quite
soon: ko, sr, el and es, with many more to follow. You are welcome to join
in, try to produce the data you need, and give your results back to the
community.
There are two views on the software. One is for people who just want to
use it and create configs: https://github.com/dbpedia/dbpedia-wiktionary
The other is for Scala/Java developers:
https://github.com/dbpedia/extraction-framework/tree/master/wiktionary
Data can be found here: http://downloads.dbpedia.org/wiktionary/dumps/
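As a rough illustration of consuming such a dump, here is a minimal Python sketch that counts which predicates occur in it. It assumes (not stated above) that the dump is in N-Triples format, one "<subject> <predicate> object ." statement per line. The helper names parse_line and count_predicates and the file name wiktionary_en.nt are hypothetical, and the regex covers only the simple case of IRI subjects and predicates, not the full N-Triples grammar.

```python
import re
from collections import Counter

# Assumption: one N-Triples statement per line, "<s> <p> object ."
# This simplified regex only handles IRI subjects and predicates.
TRIPLE = re.compile(r'^\s*<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')


def parse_line(line):
    """Return (subject, predicate, object), or None for blank, comment or unparsed lines."""
    line = line.strip()
    if not line or line.startswith('#'):
        return None
    m = TRIPLE.match(line)
    if m is None:
        return None
    return m.group(1), m.group(2), m.group(3)


def count_predicates(lines):
    """Count how often each predicate IRI occurs, to see which fields a dump contains."""
    counts = Counter()
    for line in lines:
        triple = parse_line(line)
        if triple is not None:
            counts[triple[1]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical local file name for a downloaded and unpacked dump.
    with open("wiktionary_en.nt", encoding="utf-8") as f:
        for predicate, n in count_predicates(f).most_common(10):
            print(n, predicate)
```

Counting predicates first is a cheap way to check whether a given language's dump already carries the fields you need (senses, POS, examples, translations) before writing a real RDF pipeline; for anything beyond a quick look, a proper RDF library is the safer choice.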
I will write a blog post announcing this soon.
All the best,
Sebastian
On 04.04.2013 03:21, Moutupsi Paul wrote:
...
Hi all,
Greetings,
I am a CS grad student from the Data Science Lab at Stony Brook (https://sites.google.com/site/datascienceslab/), and I am writing to request information about parsing multilingual Wiktionary data. Our lab has been using Wikipedia data for quite a while now, but we are really interested in taking advantage of the massive Wiktionary content, which we feel can become a rich multilingual corpus after proper parsing.
But the big hurdle is a parsing tool. We have tried a few Wiktionary parsing tools:

  https://github.com/clbecker/perl-wiktionary-parser/

  https://code.google.com/p/wikokit/wiki/GettingStartedWiktionaryParser

  https://github.com/benreynwar/wiktionary-parser/tree/master/wiktionary_parser

  http://www.ukp.tu-darmstadt.de/software/jwktl/

but none of them is ready to use or easy to extend to multiple languages. (I am currently trying to work with wikokit, the second parser above.)
I would appreciate any advice, suggestions or pointers to the best available Wiktionary parser. We are mainly looking to extract meanings, POS, examples, translations, etc. (more can never hurt).
Any help is appreciated. Kindly let me know if further information is needed.
Regards,
Moutupsi

Wiktionary-l mailing list
Wiktionary-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiktionary-l
-- 
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Projects: http://nlp2rdf.org , http://linguistics.okfn.org , 
http://dbpedia.org/Wiktionary , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org
