Dear Sirs,
I am grateful for your valuable feedback and suggestions.
I have updated my proposal based on the inputs given by you. The split-up
of the deliverables on the ideas page indeed helped me understand the
requirements more clearly.
The link to my updated proposal is
https://www.mediawiki.org/wiki/User:Karthikprasad/gsoc2012proposal
I request you and everyone to kindly skim through my proposal once again
and suggest changes/additions.
I am very excited about this project and working with you; and truth be
told, 23rd April seems like ages ahead.
Thanking you,
Yours sincerely,
Karthik
Date: Wed, 4 Apr 2012 11:49:41 +0200
From: "Oren Bochman" <orenbochman(a)gmail.com>
To: "'Wikimedia developers'" <wikitech-l(a)lists.wikimedia.org>
Subject: Re: [Wikitech-l] GSoC 2012: Proposal-Wikipedia Corpus Tools
Message-ID: <007f01cd1248$42ee6f40$c8cb4dc0$@com>
Content-Type: text/plain; charset="utf-8"
You do understand correctly!
The main idea about NLP components is with POS tagger as an example:
1. a fall back system that does unsupervised POS tagging.
2. the ability to plug in an existing POS tagger as these become
available for specific languages.
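One way to picture the two points above is a small registry that prefers a language-specific tagger when one has been plugged in and otherwise uses the fallback. This is only a minimal sketch: `TaggerRegistry`, `fallback_tagger`, and `toy_english_tagger` are hypothetical names invented for illustration, and the fallback here is a trivial placeholder, not a real unsupervised tagger.

```python
from typing import Callable, Dict, List, Tuple

Tagged = List[Tuple[str, str]]
Tagger = Callable[[List[str]], Tagged]


def fallback_tagger(tokens: List[str]) -> Tagged:
    """Placeholder for the unsupervised fallback: tags every token 'UNK'."""
    return [(tok, "UNK") for tok in tokens]


class TaggerRegistry:
    """Maps language codes to taggers, with a fallback for unknown languages."""

    def __init__(self, fallback: Tagger) -> None:
        self._fallback = fallback
        self._by_lang: Dict[str, Tagger] = {}

    def register(self, lang: str, tagger: Tagger) -> None:
        # Plug in an existing tagger once one is available for `lang`.
        self._by_lang[lang] = tagger

    def tag(self, lang: str, tokens: List[str]) -> Tagged:
        # Use the language-specific tagger if registered, else the fallback.
        return self._by_lang.get(lang, self._fallback)(tokens)


def toy_english_tagger(tokens: List[str]) -> Tagged:
    """Toy stand-in for a real English tagger (assumption, for the demo only)."""
    determiners = {"the", "a", "an"}
    return [(t, "DET" if t.lower() in determiners else "X") for t in tokens]


registry = TaggerRegistry(fallback_tagger)
registry.register("en", toy_english_tagger)

print(registry.tag("en", ["the", "cat"]))  # handled by the plugged-in tagger
print(registry.tag("he", ["shalom"]))      # no tagger registered: fallback
```

With this shape, supporting a new language only means calling `register` with a tagger for it; nothing else in the pipeline has to change.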
As supervisor, I would recommend working with 3 languages:
English, Hebrew, and the GSoC student's native language.
If we could get QA from other native speakers we would incorporate them
into the workflow.
I think that by using a deletion/reversion based heuristic we may also be
able to make a spam corpus to boost the accuracy of the corpuses.
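As a rough illustration of the reversion side of such a heuristic, the sketch below flags revisions that were undone when a later revision restored an earlier version of the page. Everything here is an assumption made for the example: the `Revision` record, the use of a per-revision text hash, and the `window` parameter are hypothetical, and flagged revisions would only be candidates for a spam corpus, not confirmed spam.

```python
from typing import List, NamedTuple


class Revision(NamedTuple):
    rev_id: int
    sha1: str  # hash of the full page text after this revision (assumed available)


def reverted_revisions(history: List[Revision], window: int = 5) -> List[int]:
    """Return ids of revisions undone by a later revert to identical text.

    `history` is ordered oldest to newest. A revision j is treated as a
    revert if its hash equals that of an earlier revision k (at most
    `window` steps back, and not its direct parent); the revisions
    between k and j whose text differs are flagged as reverted.
    """
    reverted = set()
    for j, rev in enumerate(history):
        start = max(0, j - window)
        # Look back from the nearest candidate to the farthest.
        for k in range(j - 2, start - 1, -1):
            if history[k].sha1 == rev.sha1:
                reverted.update(
                    r.rev_id for r in history[k + 1:j] if r.sha1 != rev.sha1
                )
                break
    return sorted(reverted)


history = [
    Revision(1, "aaa"),  # good version
    Revision(2, "bbb"),  # suspicious edit
    Revision(3, "aaa"),  # revert back to revision 1
    Revision(4, "ccc"),  # ordinary later edit
]
print(reverted_revisions(history))
```

The text added by the flagged revisions could then be collected and reviewed before being used as spam examples.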
Operation Manager
E-mail: oren(a)romai-horizon.com
Mobil: +36 30 866 6706
Római Horizon Kft.
H-1039 Budapest
Királyok útja 291. D. ép. fszt. 2.
Tel: +36 1 492 1492
Fax: +36 1 266 5529
-----Original Message-----
From: wikitech-l-bounces(a)lists.wikimedia.org [mailto:
wikitech-l-bounces(a)lists.wikimedia.org] On Behalf Of Amir E. Aharoni
Sent: Tuesday, April 03, 2012 10:19 PM
To: Wikimedia developers
Subject: Re: [Wikitech-l] GSoC 2012: Proposal-Wikipedia Corpus Tools
2012/4/3 karthik prasad <karthikprasad008(a)gmail.com>:
> Hello,
> I am a GSoC aspirant and have compiled a proposal for one of the
> project ideas - Wikipedia Corpus Tools. [Mentor : Oren Bochman] I
> would sincerely appreciate if you could kindly go through it and
> suggest corrections/additions so that I can settle with a coherent
> proposal.
Nice, but why only English?
If I understand the proposal correctly, this project is supposed to be
able to work with almost any language with very little effort.
--
Amir Elisha Aharoni
http://aharoni.wordpress.com
"We're living in pieces, I want to live in peace." - T. Moore
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
------------------------------
Date: Wed, 4 Apr 2012 12:58:11 +0300
From: "Amir E. Aharoni" <amir.aharoni(a)mail.huji.ac.il>
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Subject: Re: [Wikitech-l] GSoC 2012: Proposal-Wikipedia Corpus Tools
Message-ID:
<CACtNa8tS-PifzJS1JsF02k3qW_-7=UK-wDQnVSfLGLufhxnmNw(a)mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
2012/4/4 Oren Bochman <orenbochman(a)gmail.com>:
> You do understand correctly!
> The main idea about NLP components is with POS tagger as an example:
Just to make sure, POS = part of speech, isn't it?
It's one of the most confusing TLAs in computing :)
> If we could get QA from other native speakers we would incorporate them
> into the workflow.
Good. As long as there is a way to plug in other languages and a way for
speakers of other languages to contribute QA, I'm very happy.
--
Amir Elisha Aharoni
http://aharoni.wordpress.com
"We're living in pieces,
I want to live in peace." - T. Moore
Date: Wed, 4 Apr 2012 00:28:29 -0400
From: Gregory Varnum <gregory.varnum(a)gmail.com>
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Subject: Re: [Wikitech-l] GSoC 2012: Proposal-Wikipedia Corpus Tools
Message-ID: <AC4C429F-A839-4911-BE9B-C8928AA2DD8C(a)gmail.com>
Content-Type: text/plain; charset=utf-8
Whoops - I meant that email to be directed to Karthik - although Amir
you're welcome to read it as well. :)
-greg
On Apr 3, 2012, at 11:24 PM, Gregory Varnum <gregory.varnum(a)gmail.com>
wrote:
Amir,
Thank you for your GSOC proposal! :)
Between now and Google's submission deadline on April 6th - you are
invited to further modify your proposals. The GSOC page on MW.org -
https://www.mediawiki.org/wiki/GSOC - and our IRC rooms -
https://www.mediawiki.org/wiki/MediaWiki_on_IRC
Looking over your proposal - I think you've got good background
information on yourself. However, I think you should flesh out more
details on the proposed project. Without more familiarity with corpora (and
with no links to find that info) - it's hard for everyone to weigh in
equally or to make sure your project gets the full consideration you'd like.
-greg aka varnent
On Apr 3, 2012, at 4:18 PM, Amir E. Aharoni <amir.aharoni(a)mail.huji.ac.il>
wrote:
> 2012/4/3 karthik prasad <karthikprasad008(a)gmail.com>:
>> Hello,
>> I am a GSoC aspirant and have compiled a proposal for one of the project
>> ideas - Wikipedia Corpus Tools. [Mentor : Oren Bochman]
>> I would sincerely appreciate if you could kindly go through it and
>> suggest corrections/additions so that I can settle with a coherent
>> proposal.
>> Link to my proposal :
>> https://www.mediawiki.org/wiki/User:Karthikprasad/gsoc2012proposal
> Nice, but why only English?
> If I understand the proposal correctly, this project is supposed to be
> able to work with almost any language with very little effort.
--
> Amir Elisha Aharoni
> http://aharoni.wordpress.com
> "We're living in pieces,
> I want to live in peace." - T. Moore