[Wikitech-l] GSOC 2012 - Text Processing and Data Mining

31 Mar 2012


Hello,
I am Karthik from India, currently in the 3rd year of a Bachelor's degree in
Computer Science and Engineering at PESIT, Bangalore.
I am interested in some of the projects proposed for Google Summer of Code
2012 and would love to work on them and contribute to the open-source world.
I am strongly drawn to Text Processing and Data Mining, and I have taken a
course in Natural Language Processing. I am currently working on a project,
"Automatic Essay Grader": a system that automatically grades English essays
using Spelling, Grammar and Structure, Coherence, Frequent phrases and
Vocabulary as weighted parameters. It is realized by implementing a
self-designed algorithm that studies the 'relation graph' of the words of
the essay.
I have also worked on "Sentiment Analysis on Web": extracting reviews of a
gadget from tech-review forums, analysing the sentiment of those reviews to
predict the overall sentiment/opinion associated with that gadget, and then
generating an appropriate rating on a scale of 10.
The following projects listed on the MediaWiki ideas page caught my eye:
1) Wikipedia Corpus Tools
2) Lucene Lemma Analyzers based on Morphology Extraction from Wikipedia Text
3) Lucene Automatic Query Expansion from Wikipedia Text
4) Translation spellchecking
Apart from the above projects, I also have the following ideas, which I feel
would be of great help if implemented:
a) A wikiSummarizer widget that gives a summary of the page being read by
the user.
b) An automatic coherence analyser that makes it easy to find out whether
the article on a given page stays on the same topic.
c) A details aggregator for a page.
I would be grateful if you could let me know the specific requirements of
the projects and your thoughts on my ideas, so that I can write a suitable
proposal.
Eagerly waiting for your response.
Thanking you.
Best Regards,
Karthik.
