Hi,
I am Sudeep, a final-year student in the computer science department at the
Indian Institute of Technology, Kharagpur.
I am interested in applying to the following projects for GSoC 2012:
1. Lucene automatic query expansion from Wikipedia text
2. Backwards compatibility extension
3. Semantic form rules
4. Index transcluded text in search
I have a strong background in information retrieval and machine learning. I
have worked previously with Yahoo! Research Labs in the area of information
retrieval, where we extracted association rules and attribute-value pairs
from webpages using an unsupervised approach.
I have also worked on another project with Yahoo!, which involved emotion
detection for YouTube videos based on user comments. We used various machine
learning, statistics, and IR techniques to achieve our goal.
Last year I successfully completed GSoC 2011 with OSGeo, and I have good
experience in open source development.
Kindly let me know how I should proceed with my application.
Thanks and regards,
Sudeep
Hey everyone,
I've had an awful lot of interest in the Who's Been Awesome / get merchandise
to reward the community extension I proposed. We can only really take one
student in the end, so I wanted to make sure that everyone knew the score and
that mentors still looking for help could chime in and let us know.
Eight or nine people have asked about the project, and we have one
almost-full proposal so far. Part of that has been me being slow to respond,
but if you're interested I encourage you to either submit a proposal soon
or look at other options (or both!) so that we can get in as many people as
possible! If you are still waiting for answers from me or you have other
questions, feel free to shoot me an email; I'll be setting time aside
tomorrow (bed soon) to go through them all.
Other mentors: if you're still looking for help, please let us know so that
we can place as many of these great candidates as possible!
James
--
James Alexander
Manager, Merchandise
Wikimedia Foundation
(415) 839-6885 x6716 @jamesofur
Hi everyone,
This email is going to briefly describe the old SVN workflow, and then
use that as a baseline to describe what we should do for Git. I
haven't had a chance to coordinate this mail with Chad (or anyone
else), so I'll reserve the right for him to completely contradict me
here. This is meant to provoke a discussion about how we're really
going to use Git, and to establish a plan for taking advantage of the
new workflow to move to much more frequent deployments.
In the old world, we had this:
trunk
├── REL1_17
│   └── 1.17wmf1 (branched from REL1_17)
├── REL1_18
│   └── 1.18wmf1 (branched from REL1_18)
└── REL1_19
    └── 1.19wmf1 (branched from REL1_19)
Tarball releases would come out of the respective REL1_xx branches,
and deployments would come out of the 1.xxwmf1 branches. REL1_xx
branches have all extensions, and 1.xxwmf1 branches have only
Wikimedia production code. Each would be a relatively long-lived
branch (6-18 months) into which critical fixes and priority features
would be merged from trunk.
Looking ahead to deployments, there are a couple of different ways to go
about this:
One plan would be to have a "wmf" branch that does not trail far
behind the master. The extensions we deploy to the cluster can be
included as submodules for that given branch. The process for
deployment at that point will be "merge from master" or "update
submodule reference" on the wmf branch. Then on fenari, you will git
pull and git submodule update before scapping like you're currently
used to. The downside of this approach is that there's no obvious
way to have multiple production branches in play (heterogeneous
deploy). It seems solvable (e.g. wmf1, wmf2, etc.), but that also seems
messy.
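The first option can be sketched with plain git commands. This is a minimal
local toy (temporary repo, made-up file and commit names), not the actual
Wikimedia configuration; the real setup would additionally pin the deployed
extensions as submodules on the wmf branch.

```shell
# Toy repo standing in for mediawiki/core; all names are illustrative.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name to "master"
git config user.email dev@example.org
git config user.name dev

echo 'initial' > core.php
git add core.php
git commit -qm 'initial commit on master'

git branch wmf                            # the long-lived deploy branch

echo 'new feature' >> core.php
git commit -qam 'development continues on master'

# Deploy step: bring the wmf branch up to (or near) master.
git checkout -q wmf
git merge -q master
git log --oneline -1
```

On the deploy host, the corresponding step would just be a `git pull` and
`git submodule update` on the checked-out wmf branch before scapping.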
Another possible plan would be to have something *somewhat* closer to
what we have today, with new branches off of trunk for each
deployment, and deployments happening as frequently as weekly.
master
├── 1.20wmf01
├── 1.20wmf02
├── 1.20wmf03
...
├── 1.20wmf11
├── 1.20wmf12
├── REL1_20
├── 1.21wmf01
├── 1.21wmf02
├── 1.21wmf03
...
This is how I was envisioning the process working; I just didn't get
a chance to sync up with Chad to find out what the issues with this
approach would be.
Since we don't have an imminent deployment coming from Git, we have a
little time to figure this situation out.
Regardless of the branching strategy, the goal would be to start as
early as April with much more frequent deployments to production. The
deployment plan would look something like this:
* Deploy 1.20wmf01 to test2 real soon now (say, no later than April 16).
* Deploy 1.20wmf01 to mediawiki.org a couple deploy days after that
("deploy day" meaning Monday through Thursday)
* Let simmer for some short-ish amount of time (TBD)
* Roll out 1.20wmf01 to more wikis, eventually making it to all of them
Given the way APC caching and other caching works, I suspect we can't
get away with having more than two simultaneous versions out on the
production cluster, but we could conceivably have a situation where,
for example, a deploy day or two after rolling 1.20wmf01 out to
the last of the wikis, we then roll 1.20wmf02 out to test2.
This topic is partially covered here:
https://www.mediawiki.org/wiki/Git/Workflow#Who_can_review.3F_Gerrit_projec…
...but I imagine we'll probably need to revise that based on this
conversation and perhaps break this out into a separate page.
A few of us plan to meet in a couple of weeks to formalize something
here, but perhaps we can get this all hammered out on-list before then.
Thoughts on this process?
Rob
Hello,
I'm Gautham Shankar from India, pursuing the 4th year of my bachelor's in
computer science and engineering. I find the project proposal "Lucene
Automatic Query Expansion from Wikipedia Text" in GSoC 2012 very interesting
and would love to work on it.
I have created a proposal for the idea:
https://www.mediawiki.org/wiki/User:Gautham_shankar/Gsoc
I have experience in data mining and have built a recommendation framework
using the heat diffusion principle. It has been tested on the AOL search
dataset to recommend better queries for a given input query, and it is
implemented in Java. Since it is a framework, it can be used to recommend
different types of data; for example, the same framework can recommend
movies as well as music. I'm currently working on an extension of this
project that adds social network graphs, so as to recommend people who have
the same interests in movies, music, etc. when a query is typed.
I have also built a web-based product, "hive", which is a networking
platform for members of the power generation industry. Users can share their
experiences, and it is an open forum where members interact with one another
to effectively run their machines and solve common problems. The product is
implemented using PHP, MySQL, and JavaScript (including Ajax); Lucene is the
search engine, and phpBB is used for the forums.
It would be very helpful if anyone could give feedback and guide me in
improving the proposal.
Eagerly awaiting a response.
Regards,
Gautham Shankar
Dear Sirs,
I am grateful for your valuable feedback and suggestions.
I have updated my proposal based on your inputs. The breakdown of the
deliverables on the ideas page indeed helped me understand the
requirements more clearly.
The link to my updated proposal is
https://www.mediawiki.org/wiki/User:Karthikprasad/gsoc2012proposal
I request you and everyone else to kindly look through my proposal once
again and suggest changes/additions.
I am very excited about this project and about working with you; and truth
be told, 23rd April seems ages away.
Thanking you,
Yours sincerely,
Karthik
> Date: Wed, 4 Apr 2012 11:49:41 +0200
> From: "Oren Bochman" <orenbochman(a)gmail.com>
> To: "'Wikimedia developers'" <wikitech-l(a)lists.wikimedia.org>
> Subject: Re: [Wikitech-l] GSoC 2012: Proposal-Wikipedia Corpus Tools
> Message-ID: <007f01cd1248$42ee6f40$c8cb4dc0$@com>
> Content-Type: text/plain; charset="utf-8"
>
> You do understand correctly!
>
> The main idea for the NLP components, with a POS tagger as an example:
>
> 1. a fall back system that does unsupervised POS tagging.
> 2. the ability to plug in an existing POS tagger as these become
> available for specific languages.
>
> As supervisor, I would recommend working with 3 languages:
> English, Hebrew, and the GSoC student's native language.
>
> If we could get QA from other native speakers we would incorporate them
> into the workflow.
>
> I think that by using a deletion/reversion-based heuristic we may also be
> able to build a spam corpus to boost the accuracy of the corpora.
>
>
> Operations Manager
> E-mail: oren(a)romai-horizon.com
> Mobile: +36 30 866 6706
>
> Római Horizon Kft.
> H-1039 Budapest
> Királyok útja 291. D. ép. fszt. 2.
> Tel: +36 1 492 1492
> Fax: +36 1 266 5529
>
> -----Original Message-----
> From: wikitech-l-bounces(a)lists.wikimedia.org [mailto:
> wikitech-l-bounces(a)lists.wikimedia.org] On Behalf Of Amir E. Aharoni
> Sent: Tuesday, April 03, 2012 10:19 PM
> To: Wikimedia developers
> Subject: Re: [Wikitech-l] GSoC 2012: Proposal-Wikipedia Corpus Tools
>
> 2012/4/3 karthik prasad <karthikprasad008(a)gmail.com>:
> > Hello,
> > I am a GSoC aspirant and have compiled a proposal for one of the
> > project ideas - Wikipedia Corpus Tools. [Mentor : Oren Bochman] I
> > would sincerely appreciate if you could kindly go through it and
> > suggest corrections/additions so that I can settle with a coherent
> proposal.
> >
> > Link to my proposal :
> > https://www.mediawiki.org/wiki/User:Karthikprasad/gsoc2012proposal
>
> Nice, but why only English?
>
> If i understand the proposal correctly, this project is supposed to be
> able to work with almost any language with very little effort.
>
> --
> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
> http://aharoni.wordpress.com "We're living in pieces, I want to live in
> peace." – T. Moore
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
>
>
>
> ------------------------------
>
>
> Date: Wed, 4 Apr 2012 12:58:11 +0300
> From: "Amir E. Aharoni" <amir.aharoni(a)mail.huji.ac.il>
> To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
> Subject: Re: [Wikitech-l] GSoC 2012: Proposal-Wikipedia Corpus Tools
> Message-ID:
> <CACtNa8tS-PifzJS1JsF02k3qW_-7=UK-wDQnVSfLGLufhxnmNw(a)mail.gmail.com
> >
> Content-Type: text/plain; charset=UTF-8
>
> 2012/4/4 Oren Bochman <orenbochman(a)gmail.com>:
> > You do understand correctly!
> >
> > The main idea for the NLP components, with a POS tagger as an example:
>
> Just to make sure, POS = part of speech, isn't it?
>
> It's one of the most confusing TLAs in computing :)
>
> > If we could get QA from other native speakers we would incorporate them
> into the workflow.
>
> Good. As long as there is a way to plug other languages and a way for
> speakers of other languages to contribute QA, i'm very happy.
>
> --
> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
> http://aharoni.wordpress.com
> "We're living in pieces,
> I want to live in peace." – T. Moore
>
Date: Wed, 4 Apr 2012 00:28:29 -0400
From: Gregory Varnum <gregory.varnum(a)gmail.com>
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Subject: Re: [Wikitech-l] GSoC 2012: Proposal-Wikipedia Corpus Tools
Message-ID: <AC4C429F-A839-4911-BE9B-C8928AA2DD8C(a)gmail.com>
Content-Type: text/plain; charset=utf-8
Whoops - I meant that email to be directed to Karthik - although Amir
you're welcome to read it as well. :)
-greg
On Apr 3, 2012, at 11:24 PM, Gregory Varnum <gregory.varnum(a)gmail.com>
wrote:
> Amir,
>
> Thank you for your GSOC proposal! :)
>
> Between now and Google's submission deadline on April 6th, you are
invited to further modify your proposal. Good places for feedback are the
GSoC page on MW.org - https://www.mediawiki.org/wiki/GSOC - and our IRC
rooms - https://www.mediawiki.org/wiki/MediaWiki_on_IRC
>
> Looking over your proposal, I think you've got good background
information on yourself. However, I think you should flesh out more
details on the proposed project. Without more familiarity with corpus tools
(and with no links to find that info) it's hard for everyone to weigh in
equally or to make sure your project gets the full consideration you'd like.
>
> -greg aka varnent
>
>
> On Apr 3, 2012, at 4:18 PM, Amir E. Aharoni <amir.aharoni(a)mail.huji.ac.il>
wrote:
>
>> 2012/4/3 karthik prasad <karthikprasad008(a)gmail.com>:
>>> Hello,
>>> I am a GSoC aspirant and have compiled a proposal for one of the project
>>> ideas - Wikipedia Corpus Tools. [Mentor : Oren Bochman]
>>> I would sincerely appreciate if you could kindly go through it and
suggest
>>> corrections/additions so that I can settle with a coherent proposal.
>>>
>>> Link to my proposal :
>>> https://www.mediawiki.org/wiki/User:Karthikprasad/gsoc2012proposal
>>
>> Nice, but why only English?
>>
>> If i understand the proposal correctly, this project is supposed to be
>> able to work with almost any language with very little effort.
>>
>> --
>> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
>> http://aharoni.wordpress.com
>> "We're living in pieces,
>> I want to live in peace." – T. Moore
>>
>
Hi everyone,
I would like to work on building a Convention extension as part of the
Google Summer of Code project. I have set up a proposal for it; here is the
link: http://www.mediawiki.org/wiki/User:Chughakshay16/GSOCProposal(2012).
I haven't found a mentor to work with me on this project yet, so if anyone
feels the need for this extension just the way I do, please feel free to
add feedback to the proposal page, or reply here.
More information regarding this extension can be found here :-
http://www.mediawiki.org/wiki/User:Chughakshay16/ConventionExtension
Thanks,
Akshay Chugh
(irc - chughakshay16)