Hello, I am Karthik from India, currently in the 3rd year of a Bachelor's degree in Computer Science and Engineering at PESIT, Bangalore.
I am interested in some of the projects proposed for Google Summer of Code 2012 and would love to work on them and contribute the results to the open-source world.
I am very interested in Text Processing and Data Mining, and I have taken a course in Natural Language Processing. I am currently working on a project, "Automatic Essay Grader": a system that automatically grades English essays using Spelling, Grammar and Structure, Coherence, Frequent Phrases and Vocabulary as weighted parameters. It is realized by a self-designed algorithm that studies the ‘relation graph’ of the words in the essay.
I have also worked on "Sentiment Analysis on Web": it extracts reviews of a gadget from tech-review forums, analyzes the sentiment of those reviews to predict the overall opinion associated with that gadget, and then generates an appropriate rating on a scale of 10.
The following projects on MediaWiki's ideas page caught my eye: 1) Wikipedia Corpus Tools 2) Lucene Lemma Analyzers based on Morphology Extraction from Wikipedia Text 3) Lucene Automatic Query Expansion from Wikipedia Text 4) Translation spellchecking
Apart from the above projects, I also have the following ideas, which I feel would be of great help if implemented: a) A wikiSummarizer widget that gives a summary of the page being read by the user. b) An automatic coherence analyser that makes it easy to find out whether the article on a given page stays on the same topic. c) A details aggregator for a page.
I would be grateful if you could let me know the specific requirements of these projects and your thoughts on my ideas, so that I can write a suitable proposal.
Eagerly waiting for your response.
Thanking you.
Best Regards, Karthik.