Re: [Wikitech-l] GSOC 2014 idea

28 Feb 2014


2014-02-28 11:09 GMT+02:00 Roman Zaynetdinov <romanznet@gmail.com>:
...
> From which source to gather the data?
> Wiktionary is the best candidate: it is open source and it has a large
> database. It is also well suited to growing the project by adding
> different languages.

It's not obvious why you have reached this conclusion.
1) There are many Wiktionaries, and they do not all work the same way or
have the same content.
2) The Wiktionary data is relatively free-form text, so it is hard to
parse to find the relevant bits.
3) Dozens of people have mined Wiktionary already. It would make sense
to see if they have made the resulting databases available.
4) There are many other sources of data, some of them also open, which
may have better coverage, or coverage of speciality areas where
Wiktionaries are lacking.
5) I expect that the best results will be achieved by using multiple
data sources.
...
> Growth opportunities
> I am living in Finland right now and I don't know Finnish as well as I
> should to understand locals, therefore this project can be expanded by
> adding support for more languages, helping people like me read, learn
> and understand texts in foreign languages.

I hope you have enjoyed your stay here. I do not know how much Finnish
you have learned, but after a while it should be obvious that just
searching for the exact string the user clicked or selected will not
work, because of the agglutinative nature of the language. I advocate
features which work in all languages (or at least in many :). If you
implement this for English only first, it is likely that you will have
to rewrite it to support other languages.
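
To make the exact-string problem concrete, here is a minimal sketch in
Python. Everything in it is an assumption made up for illustration: the
toy DICTIONARY, the short SUFFIXES list and the function names are
hypothetical, and stripping one ending is a deliberately naive stand-in
for real morphological analysis.

```python
# Toy dictionary keyed by base form (lemma); the entries are illustrative only.
DICTIONARY = {"talo": "house", "auto": "car"}

# A handful of Finnish case endings. This list is far from complete.
SUFFIXES = ["ssa", "sta", "lla", "lta", "lle", "n"]


def exact_lookup(word):
    """Look up the clicked string exactly as it appears in the text."""
    return DICTIONARY.get(word.lower())


def naive_lookup(word):
    """Try the word itself, then the word with one known ending removed."""
    word = word.lower()
    if word in DICTIONARY:
        return DICTIONARY[word]
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if stem in DICTIONARY:
                return DICTIONARY[stem]
    return None
```

With this sketch, exact_lookup("talossa") finds nothing, while
naive_lookup("talossa") reaches the entry for "talo". Even so, the naive
version breaks as soon as the stem itself changes, as in "katu" becoming
"kadulla", so a proper morphological analyser would probably be needed
for Finnish.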
-Niklas
