Dear all,
Recently I met a few of the Assamese Wikipedians in Dibrugarh. We discussed
many of the plans of the Assamese Wikimedia community. I'm summarizing our
discussions in this mail.
Minutes of discussion
Meeting with Assamese Wikipedians, Faculty members of Dibrugarh University
Venue: Dibrugarh University
Time: 6:30 - 8:30 pm, 11.02.2013
Participants:
1. Gitartha Bordoloi, M.D. Student, Assam Medical College, Dibrugarh
2. Abhijit Kalita, Manager, NEEPCO Ltd, Duliajan
3. Gunadeep Chetia, Asst. Professor, Centre for Computer Studies, Dibrugarh
   University, Dibrugarh
4. Dr. Bhaskarjyoti Sarma, Asst. Professor, Department of Assamese,
   Dibrugarh University, Dibrugarh
5. Dr. Mridul Bordoloi, Asst. Professor, Department of English, Dibrugarh
   University, Dibrugarh
6. Subhashish Panigrahi, Programme Officer, A2K, Centre for Internet and
   Society, New Delhi
Agenda:
1. Creating resources for Assamese Wikimedia communities
2. Solving technological challenges
3. Facilitating Wikipedians for outreach
Discussion outline:
1. Subhashish opened the meeting by explaining the role of the A2K team in
   catalyzing the Indian language communities across the nation.
2. Gitartha briefly explained how the Assamese community has flourished in
   the recent past and how it has been trying to grow organically.
3. Gunadeep Chetia briefly discussed the technical roadblocks facing the
   Assamese communities and the initiatives taken so far. Gunadeep,
   Bhaskarjyoti, Abhijit, Mridul and a few other Assamese language
   enthusiasts have formed a non-profit organization, “Society for Language
   Technology Development, Assam” (SLTD, Assam), and have released a few
   tools for typing in Assamese. The society has 3 teams working
   collaboratively: A) Technology team, B) Language team and C) Design
   team. Faculty members from various departments are involved in the
   society. Some students are also involved as volunteers in the content
   generation project.
Challenges:
1. Technology gap: Most of those who have experience and expertise are not
   comfortable with computers, and this is becoming a roadblock for them to
   contribute to Wikipedia.
2. Standardization of Assamese language
3. Lack of volunteers: Most of the volunteers drop out after some time. If
   some kind of remuneration could be allocated, more people could get
   involved and a huge repository could be built in a short span of time.
Accomplishments & Plans of the Society:
Accomplishments:
1. Rodali: SLTD independently developed the first Assamese phonetic typing
   tool (offline and online),
   “Rodali<http://www.sltdassam.com/rodali.html>”.
   It is distributed under a free license and is constantly being upgraded.
Plans:
1. Assamese spell checker: A tool that could be useful for correcting
   typos.
2. Assamese word library: Some funding is needed to involve people to type
   and create a library of Assamese words, which could be used for adding
   the spell check feature.
3. Pronunciation library: Samples of various Assamese dialects were
   collected and analyzed using voice analysis software. The average value
   of the samples was taken as the standard pronunciation of a particular
   word. Most Indian languages lack such libraries, and once built it could
   be used in multiple ways:
   A) Voice commands for computers
   B) Text to speech
Primary needs of the community:
1. Digitization: The Assamese community needs many of the out-of-copyright
   books to be digitized in text format so they can be used for Wikisource.
   The community feels it will be useful to get the books typed by local DTP
   operators and distribute them for Wikisource and other portals.
   Wikisource has very few active contributors and it has been difficult
   for the community to gather more people. A content creation drive could
   help.
4) Font related issues
I. Ambiguity of characters:
The Assamese and Bengali scripts are broadly the same, but a few characters
make the two scripts distinct. The Unicode Consortium names the characters
used for Assamese as Bengali characters. This issue has been taken to the
Unicode Consortium, which was requested to change the name to
Assamese/Bengali, but the request was never addressed. This mistake gave
rise to a few more issues:
A) Assamese Wikipedia uses a Bengali phonetic keyboard layout called Avro.
As Avro uses the Bengali characters and conjuncts, a few of the frequently
used conjuncts display incorrectly in Assamese. The issue has taken the
community a long time to correct. There were cases of the same article
being created twice with two different spellings.
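The naming issue can be seen directly in the Unicode character database. The
following is a minimal sketch in Python, not part of the meeting discussion.
It assumes the characters in question include the Assamese letters ৰ and ৱ
(the minutes do not list them), and the sample word is only an illustration
of how two keyboards can produce two different spellings of one title.

```python
import unicodedata

# Assumption for illustration: two letters used in Assamese, plus the
# Bengali "ra" that a Bengali keyboard layout would produce instead.
ASSAMESE_RA = "\u09f0"  # ৰ
ASSAMESE_WA = "\u09f1"  # ৱ
BENGALI_RA = "\u09b0"   # র

# Print the official Unicode name of each character. All three names
# start with "BENGALI", including the two letters used in Assamese.
for ch in (ASSAMESE_RA, ASSAMESE_WA, BENGALI_RA):
    print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}")

# Hypothetical article title typed once with the Assamese letter and
# once with the Bengali one. The words look alike, but the strings differ,
# so a wiki would treat them as two separate titles.
title_assamese = ASSAMESE_RA + "\u09be\u09ae"  # ৰাম
title_bengali = BENGALI_RA + "\u09be\u09ae"    # রাম
print(title_assamese == title_bengali)
```

Because the two titles differ at the code point level, only a deliberate
mapping between the look-alike letters (or a single shared typing tool) would
prevent such duplicates.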
II. Non-availability of good quality Assamese fonts
Assamese Wikipedia and other language projects often rely on Bengali fonts.
A good quality Assamese font would be of great use.
5) TTS: Text to Speech software development work:
The TTS project needs two major libraries: a pronunciation library and a
word library.
6) Typing tool:
Rodali: Rodali was developed locally as a phonetic and InScript typing tool
for Assamese. It is available offline for Linux and Windows, and online. The
Windows application is built with C++ (core code) and .NET (interface), and
the Linux application is written using iBus and Lisp. The online version
uses JavaScript and is compatible with jQuery.IME (used for typing on
Wikimedia projects). Rodali development included feedback from linguists
and common users, and many test versions were released for testing.
Suggestions were made to replace the Avro Bengali typing tool with Rodali
so that the same tool is used across platforms.
7) OCR (Optical character recognition) software
Optical character recognition is used for converting scanned images of
books/documents into text. There are a handful of OCR tools for a few of
the Indian languages, and they are more or less inaccurate. The OCR for
Bengali has multiple bugs, and it was assumed that it would work for
Assamese as well. However, because of a few of the Assamese characters, the
OCR engine doesn’t work properly for Assamese. There is a need for an
Assamese OCR. The Society is currently planning to invest some time in OCR
as well.
--
Best!
Subhashish Panigrahi
Programme Officer, Access To Knowledge
Centre for Internet and Society
@psubhashish