Hello, I am a GSoC aspirant and have compiled a proposal for one of the project ideas - Wikipedia Corpus Tools. [Mentor : Oren Bochman] I would sincerely appreciate it if you could go through it and suggest corrections/additions so that I can settle on a coherent proposal.
Link to my proposal : https://www.mediawiki.org/wiki/User:Karthikprasad/gsoc2012proposal
Thanking you.
Best Regards, Karthik.
2012/4/3 karthik prasad karthikprasad008@gmail.com:
Hello, I am a GSoC aspirant and have compiled a proposal for one of the project ideas - Wikipedia Corpus Tools. [Mentor : Oren Bochman] I would sincerely appreciate if you could kindly go through it and suggest corrections/additions so that I can settle with a coherent proposal.
Link to my proposal : https://www.mediawiki.org/wiki/User:Karthikprasad/gsoc2012proposal
Nice, but why only English?
If I understand the proposal correctly, this project is supposed to be able to work with almost any language with very little effort.
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
Amir,
Thank you for your GSOC proposal! :)
Between now and Google's submission deadline on April 6th, you are invited to further modify your proposal. The GSOC page on MW.org (https://www.mediawiki.org/wiki/GSOC) and our IRC rooms (https://www.mediawiki.org/wiki/MediaWiki_on_IRC) are good places to look for help.
Looking over your proposal - I think you've got good background information on yourself. However, I think you should flesh out more details on the proposed project. Without more familiarity with corpus tools (and with no links to find that info) - it's hard for everyone to weigh in equally or to make sure your project gets the full consideration you'd like.
-greg aka varnent
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Whoops - I meant that email to be directed to Karthik - although Amir you're welcome to read it as well. :)
-greg
You do understand correctly!
The main idea about NLP components is with POS tagger as an example:
1. a fallback system that does unsupervised POS tagging.
2. the ability to plug in an existing POS tagger as these become available for specific languages.
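To illustrate the two points above, here is a minimal sketch in Python of how a per-language plug-in mechanism with a fallback might look. All names here (TaggerRegistry, fallback_tagger, toy_english_tagger) are hypothetical and not taken from the proposal, and the fallback is only a placeholder that labels every token "UNK"; a real unsupervised tagger would induce word classes from the corpus itself.

```python
from typing import Callable, Dict, List, Tuple

Tagged = List[Tuple[str, str]]
Tagger = Callable[[List[str]], Tagged]


def fallback_tagger(tokens: List[str]) -> Tagged:
    """Placeholder for the unsupervised fallback: tags every token 'UNK'."""
    return [(tok, "UNK") for tok in tokens]


def toy_english_tagger(tokens: List[str]) -> Tagged:
    """Toy stand-in for an existing language-specific tagger (assumption)."""
    determiners = {"the", "a", "an"}
    return [(tok, "DET" if tok.lower() in determiners else "X") for tok in tokens]


class TaggerRegistry:
    """Maps language codes to taggers; unregistered languages use the fallback."""

    def __init__(self, fallback: Tagger) -> None:
        self._fallback = fallback
        self._taggers: Dict[str, Tagger] = {}

    def register(self, lang: str, tagger: Tagger) -> None:
        # Plug in a tagger once one becomes available for this language.
        self._taggers[lang] = tagger

    def tag(self, lang: str, tokens: List[str]) -> Tagged:
        # Use the registered tagger if there is one, else the fallback.
        return self._taggers.get(lang, self._fallback)(tokens)


registry = TaggerRegistry(fallback_tagger)
registry.register("en", toy_english_tagger)
```

With this layout, calling registry.tag("en", ...) uses the plugged-in tagger, while a language with no registered tagger (for example "he" before a Hebrew tagger is added) silently goes through the fallback.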
As supervisor, I would recommend working with three languages: English, Hebrew, and the GSoC student's native language.
If we could get QA from other native speakers we would incorporate them into the workflow.
I think that by using a deletion/reversion-based heuristic we may also be able to build a spam corpus to boost the accuracy of the corpora.
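As a rough illustration of the reversion-based heuristic, the Python sketch below collects lines that were introduced by revisions which were later reverted. The Revision type and its "reverted" flag are assumptions made for this example; in practice that flag would have to be derived from the page history. Not every revert is spam (vandalism and good-faith mistakes are reverted too), so the output is only a list of candidates.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Revision:
    text: str       # full page text after this revision
    reverted: bool  # hypothetical flag: True if a later edit undid this revision


def spam_candidates(history: List[Revision]) -> List[str]:
    """Return lines first introduced by revisions that were later reverted.

    `history` is assumed to be ordered from oldest to newest revision.
    """
    candidates: List[str] = []
    previous_lines: Set[str] = set()
    for rev in history:
        lines = [ln for ln in rev.text.splitlines() if ln.strip()]
        if rev.reverted:
            # Keep only lines that were not in the preceding revision.
            candidates.extend(ln for ln in lines if ln not in previous_lines)
        previous_lines = set(lines)
    return candidates


history = [
    Revision("Cats are mammals.", reverted=False),
    Revision("Cats are mammals.\nBUY CHEAP PILLS", reverted=True),
    Revision("Cats are mammals.", reverted=False),
]
candidates = spam_candidates(history)
```

Text collected this way could then be used to filter similar material out of the main corpora.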
Operation Manager E-mail: oren@romai-horizon.com Mobil: +36 30 866 6706
Római Horizon Kft. H-1039 Budapest Királyok útja 291. D. ép. fszt. 2. Tel: +36 1 492 1492 Fax: +36 1 266 5529
2012/4/4 Oren Bochman orenbochman@gmail.com:
You do understand correctly!
The main idea about NLP components is with POS tagger as an example:
Just to make sure, POS = part of speech, isn't it?
It's one of the most confusing TLAs in computing :)
If we could get QA from other native speakers we would incorporate them into the workflow.
Good. As long as there is a way to plug in other languages and a way for speakers of other languages to contribute QA, I'm very happy.
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
wikitech-l@lists.wikimedia.org