Has anyone tried the Microsoft Office Gujarati spell checker? It is available with Office 2010.

Sent from the old new iPad!

On Sep 18, 2013, at 11:43 AM, Bakul Shah <bakul@bitblocks.com> wrote:

Googling "hindi spell checker algorithm" turned up a number of papers. The basic idea is to compare how "similar" the word being checked is to a word known to be correct, where similarity is computed by some algorithm. You don't store all the ways people can misspell a word. In addition, rules are used to derive related words from a root word, depending on number, gender, tense, etc. These rules are more complex in Indic languages than in Western ones. And I think we may need to look at "clusters" instead of individual Unicode code points. But all this must have been worked out years ago, maybe not for Gujarati but for Hindi, Marathi, and Bengali. You should check with the usual suspects (Google, Microsoft, SIL, language researchers, etc.).
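
A minimal sketch of the similarity idea, assuming plain Levenshtein edit distance counted over approximate "clusters" (a base letter plus its vowel signs, with virama conjuncts kept together) rather than individual code points. The function names, the tiny lexicon, and the distance threshold are illustrative assumptions, not taken from any of the papers, and the cluster rule is a simplification of proper Unicode grapheme segmentation.

```python
import unicodedata

VIRAMA = "\u0acd"  # Gujarati sign virama (halant), joins consonants into conjuncts

def clusters(word):
    """Split a word into approximate clusters (simplified, not full UAX #29)."""
    out = []
    for ch in word:
        is_mark = unicodedata.category(ch) in ("Mn", "Mc")  # vowel signs, anusvara, virama
        joins_conjunct = bool(out) and out[-1].endswith(VIRAMA)
        if out and (is_mark or joins_conjunct):
            out[-1] += ch  # attach to the previous cluster
        else:
            out.append(ch)  # start a new cluster
    return out

def edit_distance(a, b):
    """Levenshtein distance where one unit is one cluster."""
    a, b = clusters(a), clusters(b)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete a cluster
                           cur[j - 1] + 1,             # insert a cluster
                           prev[j - 1] + (ca != cb)))  # substitute a cluster
        prev = cur
    return prev[-1]

def suggest(word, lexicon, max_distance=1):
    """Return known words within max_distance of word, closest first."""
    scored = sorted((edit_distance(word, w), w) for w in lexicon)
    return [w for d, w in scored if d <= max_distance]
```

With a lexicon of known-correct words, `suggest(misspelled, lexicon)` returns the nearby candidates. A dropped vowel sign then counts as one changed cluster, not as an unrelated code-point edit.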

For OCR you may need something slightly different from spell checkers that deal with human errors. Here the more common problems will be mistaking similar-looking letters, and joining or splitting words because of too little or too much white space.
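
A rough sketch of those two OCR repairs, assuming a set of known-correct words is available. The look-alike table holds a single illustrative pair (ઘ and ધ) and is a hypothetical placeholder; a real table would have to come from measured OCR errors. The function names are likewise made up for illustration.

```python
# Illustrative look-alike pair only: GHA (U+0A98) <-> DHA (U+0AA7).
CONFUSABLE = {"\u0a98": "\u0aa7", "\u0aa7": "\u0a98"}

def swap_lookalikes(token, lexicon):
    """Return lexicon words reachable by replacing one look-alike letter."""
    hits = []
    for i, ch in enumerate(token):
        for alt in CONFUSABLE.get(ch, ""):
            candidate = token[:i] + alt + token[i + 1:]
            if candidate in lexicon and candidate not in hits:
                hits.append(candidate)
    return hits

def repair_spacing(tokens, lexicon):
    """Join wrongly split tokens and split wrongly joined ones."""
    out, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in lexicon:
            out.append(tok)
            i += 1
            continue
        # Too much white space: does this token plus the next form a known word?
        if i + 1 < len(tokens) and tok + tokens[i + 1] in lexicon:
            out.append(tok + tokens[i + 1])
            i += 2
            continue
        # Too little white space: can the token be cut into two known words?
        for k in range(1, len(tok)):
            if tok[:k] in lexicon and tok[k:] in lexicon:
                out.extend([tok[:k], tok[k:]])
                break
        else:
            out.append(tok)  # leave unknown tokens untouched
        i += 1
    return out
```

Unknown tokens are left as they are, so the output can still be passed to an ordinary similarity-based checker afterwards.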

Ultimately there should be support for language variations too (Surati, Kathiawadi, Amdavadi, etc.)!

On Sep 18, 2013, at 4:12 AM, Rajesh Mashruwala <mashru@gmail.com> wrote:

Dhavalbhai,

As we get text that is generated using OCR, I see a need for a good Gujarati dictionary. I tried to use the GL dictionary. It was not effective because it is just a corpus of words: it cannot recognize any variation on a word. In that model, the corpus would possibly need to be over ten times the size of the GL dictionary's to be useful. Otherwise, it flags too many correct words as errors.

The same dictionary could be used for Gujarati proof readers.

One way is to generate a larger corpus by scraping words from Gujarati Internet pages (those in Unicode); a better way is to think about building better dictionary logic. I may be able to interest exceptionally good volunteer developers if we can think of a smarter way of creating a dictionary. For example, we could codify grammar rules to form derivative words.
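
A minimal sketch of what codified rules could look like, assuming a small list of root words plus a handful of common noun suffixes (માં, થી, ને, નો, ની, નું, ના). The suffix list is illustrative and far from complete, and the names are hypothetical. Real rules would also need to handle stem changes and restrict suffixes by part of speech.

```python
# Illustrative, incomplete list of noun suffixes/postpositions.
SUFFIXES = ["માં", "થી", "ને", "નો", "ની", "નું", "ના"]

def expand(roots, suffixes=SUFFIXES):
    """Generate every root + suffix form (the 'bigger corpus' approach)."""
    forms = set(roots)
    for root in roots:
        for suffix in suffixes:
            forms.add(root + suffix)
    return forms

def is_known(word, roots, suffixes=SUFFIXES):
    """Accept a word if it is a root, or a root followed by one known suffix."""
    if word in roots:
        return True
    return any(word.endswith(s) and word[:-len(s)] in roots for s in suffixes)
```

`is_known` checks a word against the rules directly, so the dictionary only has to store roots; `expand` shows how quickly the stored list would grow if every derived form were listed instead.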

Should we pursue this course?



Sent from the old new iPad!

On Sep 18, 2013, at 2:48 AM, "Dhaval S. Vyas" <dsvyas@gmail.com> wrote:

Dear Roopalben,

I second your concern regarding correct language. I often say that newspapers are the only LITERATURE most of us have access to and end up reading. The language and words (increasingly, Hindi ones) used in them shape the language of present-day society, and hence it is great that you are introducing this course.

Unfortunately, on wiki we don't have a spelling correction tool or dictionary lookup facility. But Vishal Monpara has been developing one. Gujarati Lexicon has also recently developed a pop-up dictionary, which could be adapted for this purpose.

On gu.wikipedia, there is a lot of content translated from either English or Hindi, and most of it lacks the feel of original Gujarati. When read, these translations look so artificial. For the course, it could be a good idea to show such examples and have the attendees correct them, maybe offline if they are not computer savvy or are hesitant to use Wikipedia.

Please let me and the community here know if you have any suggestions on how we can help with the task you are carrying out.

Kind Regards,
Dhaval

On 18 Sep 2013 06:39, "Roopal Mehta" <roopal.mehta@gmail.com> wrote:
Basically, there are not many good proofreaders available in the publishing industry, and the demand is high. That was the main reason for starting this course.

Wikipedia is an important source of information. However, the concern here is about correct use of language too. Today we see a great many errors in Gujarati newspapers, publishing, media, and almost everywhere. That is a serious concern for us.

If Wiki is going to be an important tool for the next generation, we have to make sure that it conveys correct language to society.

I would like to know whether any auto-correction of spelling etc. is available while editing an article in Wiki?

Thank you.


Roopal


On Tue, Sep 17, 2013 at 4:38 PM, Kartik Mistry <kartik.mistry@gmail.com> wrote:
On Tue, Sep 17, 2013 at 3:42 PM, Roopal Mehta <roopal.mehta@gmail.com> wrote:
> At Gujarati Sahitya Parishad, we are running a proofreading course, and we are including a session on modern methods of proofreading, which includes editing (Guj) Wiki articles.
>
> Please send suggestions if you have any. This is the first batch of students, from various fields.

A few suggestions (some may be off-topic, sorry for that!):
1. Please follow Wikipedia's guidelines for articles.
2. Make sure each person is logged in before making changes.
3. Please do not change anything other than spelling, grammar, etc.
4. If you're at it already, donating pictures of 'સાહિત્યકાર' from GSP for use in various articles is a good idea, isn't it? :)

Thanks for the good work!

--
Kartik Mistry | IRC: kart_
{0x1f1f, kartikm}.wordpress.com

_______________________________________________
Wikipedia-gu mailing list
Wikipedia-gu@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikipedia-gu

