Re: [Wikitech-l] GSoC 2012: Proposal-Wikipedia Corpus Tools (Oren Bochman) (Amir E. Aharoni)(Gregory Varnum)

4 Apr 2012

Dear Sirs,
I am grateful for your valuable feedback and suggestions.

I have updated my proposal based on your input. The breakdown of the
deliverables on the ideas page indeed helped me understand the
requirements more clearly.

The link to my updated proposal is
https://www.mediawiki.org/wiki/User:Karthikprasad/gsoc2012proposal

I request that you and everyone else kindly look through my proposal once
again and suggest changes or additions.
I am very excited about this project and working with you; and truth be
told, 23rd April seems like ages ahead.

Thanking you,
Yours sincerely,
Karthik

...
  Date: Wed, 4 Apr 2012 11:49:41 +0200
 From: "Oren Bochman" <orenbochman(a)gmail.com>
 To: "'Wikimedia developers'" <wikitech-l(a)lists.wikimedia.org>
 Subject: Re: [Wikitech-l] GSoC 2012: Proposal-Wikipedia Corpus Tools
 Message-ID: <007f01cd1248$42ee6f40$c8cb4dc0$@com>
 Content-Type: text/plain;       charset="utf-8"

 You do understand correctly!

 The main idea about NLP components is with POS tagger as an example:

 1. a fallback system that does unsupervised POS tagging.
 2. the ability to plug in an existing POS tagger as these become
  available for specific languages.
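
 The two points above can be sketched as a small tagger registry. This is only a minimal illustration under stated assumptions: the names `register_tagger`, `fallback_tagger` and `tag` are hypothetical and do not come from the proposal, and the fallback shown is a crude placeholder (it groups words by suffix), not a real unsupervised tagger.

```python
from typing import Callable, Dict, List, Tuple

Tagged = List[Tuple[str, str]]
Tagger = Callable[[List[str]], Tagged]

# Hypothetical registry: language code -> supervised tagger plugged in for it.
_TAGGERS: Dict[str, Tagger] = {}


def register_tagger(lang: str, tagger: Tagger) -> None:
    """Plug in an existing POS tagger for one language."""
    _TAGGERS[lang] = tagger


def fallback_tagger(tokens: List[str]) -> Tagged:
    """Placeholder for the unsupervised fallback (assumption): labels each
    token by its last two characters, a crude stand-in for induced classes."""
    return [(tok, "CLS_" + tok[-2:].lower()) for tok in tokens]


def tag(lang: str, tokens: List[str]) -> Tagged:
    """Use the plugged-in tagger for `lang` if one exists, else fall back."""
    return _TAGGERS.get(lang, fallback_tagger)(tokens)
```

 With this layout, a language starts on the fallback and moves to a dedicated tagger as soon as someone calls `register_tagger` for it, without changes to the calling code.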

 As supervisor, I would recommend working with 3 languages:
 English, Hebrew, and the GSoC student's native language.

 If we could get QA from other native speakers we would incorporate them
 into the workflow.

 I think that by using a deletion/reversion based heuristic we may also be
 able to make a spam corpus to boost the accuracy of the corpora.
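
 One possible reading of a reversion-based heuristic, sketched under assumptions: the message does not specify the heuristic, so the function below is hypothetical. It assumes a page history is available as a chronological list of content checksums, and it treats any revision that a later revision undid (by restoring an earlier page state) as a candidate for the spam corpus. Reverted edits are only candidates; they may be vandalism or good-faith mistakes rather than spam.

```python
from typing import Dict, List, Set


def reverted_revisions(checksums: List[str]) -> Set[int]:
    """Return indexes of revisions undone by a later revision whose content
    checksum equals that of an earlier revision (an identity revert)."""
    last_seen: Dict[str, int] = {}  # checksum -> latest index with that content
    reverted: Set[int] = set()
    for i, digest in enumerate(checksums):
        if digest in last_seen:
            # Revision i restores the state at last_seen[digest];
            # every revision strictly in between was undone.
            reverted.update(range(last_seen[digest] + 1, i))
        last_seen[digest] = i
    return reverted
```

 For a history whose checksums are `["a", "b", "a", "c"]`, the third revision restores the first, so only index 1 is flagged.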

 Operation Manager
 E-mail: oren(a)romai-horizon.com
 Mobil: +36 30 866 6706

 Római Horizon Kft.
 H-1039 Budapest
 Királyok útja 291. D. ép. fszt. 2.
 Tel:   +36 1 492 1492
 Fax:  +36 1 266 5529

 -----Original Message-----
 From: wikitech-l-bounces(a)lists.wikimedia.org [mailto:
 wikitech-l-bounces(a)lists.wikimedia.org] On Behalf Of Amir E. Aharoni
 Sent: Tuesday, April 03, 2012 10:19 PM
 To: Wikimedia developers
 Subject: Re: [Wikitech-l] GSoC 2012: Proposal-Wikipedia Corpus Tools

 2012/4/3 karthik prasad <karthikprasad008(a)gmail.com>:
 > Hello,
 > I am a GSoC aspirant and have compiled a proposal for one of the
 > project ideas - Wikipedia Corpus Tools. [Mentor : Oren Bochman] I
 > would sincerely appreciate if you could kindly go through it and
 > suggest corrections/additions so that I can settle with a coherent proposal.
 >
 > Link to my proposal :
 > https://www.mediawiki.org/wiki/User:Karthikprasad/gsoc2012proposal

 Nice, but why only English?

 If i understand the proposal correctly, this project is supposed to be
 able to work with almost any language with very little effort.

 --
 Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
 http://aharoni.wordpress.com
 “We're living in pieces,
 I want to live in peace.” – T. Moore

 _______________________________________________
 Wikitech-l mailing list
 Wikitech-l(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

 ------------------------------

 Date: Wed, 4 Apr 2012 12:58:11 +0300
 From: "Amir E. Aharoni" <amir.aharoni(a)mail.huji.ac.il>
 To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
 Subject: Re: [Wikitech-l] GSoC 2012: Proposal-Wikipedia Corpus Tools
 Message-ID: <CACtNa8tS-PifzJS1JsF02k3qW_-7=UK-wDQnVSfLGLufhxnmNw(a)mail.gmail.com>
 Content-Type: text/plain; charset=UTF-8

 2012/4/4 Oren Bochman <orenbochman(a)gmail.com>:
 > You do understand correctly!
 >
 > The main idea about NLP components is with POS tagger as an example:

 Just to make sure, POS = part of speech, isn't it?

 It's one of the most confusing TLAs in computing :)

 > If we could get QA from other native speakers we would incorporate them
 > into the workflow.

 Good. As long as there is a way to plug other languages and a way for
 speakers of other languages to contribute QA, i'm very happy.

 --
 Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
 http://aharoni.wordpress.com
 “We're living in pieces,
 I want to live in peace.” – T. Moore

Date: Wed, 4 Apr 2012 00:28:29 -0400
From: Gregory Varnum <gregory.varnum(a)gmail.com>
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Subject: Re: [Wikitech-l] GSoC 2012: Proposal-Wikipedia Corpus Tools
Message-ID: <AC4C429F-A839-4911-BE9B-C8928AA2DD8C(a)gmail.com>
Content-Type: text/plain; charset=utf-8

Whoops - I meant that email to be directed to Karthik - although Amir
you're welcome to read it as well.  :)

-greg

On Apr 3, 2012, at 11:24 PM, Gregory Varnum <gregory.varnum(a)gmail.com>
wrote:

...
  Amir,

 Thank you for your GSOC proposal!  :)

 Between now and Google's submission deadline on April 6th - you are invited to
further modify your proposals.  The GSOC page on MW.org -
https://www.mediawiki.org/wiki/GSOC - and our IRC rooms -
https://www.mediawiki.org/wiki/MediaWiki_on_IRC
...

 Looking over your proposal - I think you've got good background information on
yourself.  However, I think you should flesh out more
details on the proposed project.  Without more familiarity with corpora (and
with no links to find that info) - it's hard for everyone to weigh in
equally or to make sure your project gets the full consideration you'd like.
...

 -greg aka varnent

 On Apr 3, 2012, at 4:18 PM, Amir E. Aharoni <amir.aharoni(a)mail.huji.ac.il>
wrote:
...

> 2012/4/3 karthik prasad <karthikprasad008(a)gmail.com>:
>> Hello,
>> I am a GSoC aspirant and have compiled a proposal for one of the project
>> ideas - Wikipedia Corpus Tools. [Mentor : Oren Bochman]
>> I would sincerely appreciate if you could kindly go through it and suggest
...
>> corrections/additions so that I can settle with a coherent proposal.
>>
>> Link to my proposal :
>> https://www.mediawiki.org/wiki/User:Karthikprasad/gsoc2012proposal
>
> Nice, but why only English?
>
> If i understand the proposal correctly, this project is supposed to be
> able to work with almost any language with very little effort.
>
> --
> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
> http://aharoni.wordpress.com
> “We're living in pieces,
> I want to live in peace.” – T. Moore

