Hi everyone,
I am an engineering student at BITS Pilani (India), currently in my 5th
and final year. I am applying for GSoC this year under MediaWiki. From
the list of ideas given on the MediaWiki GSoC page, I wish to work on
building a Convention extension, which would help turn any wiki into a
conference-style website such as Wikimania. After discussing the
features this extension should have on the IRC channels, and taking in
feedback from other developers, I have written a proposal for it. I
would really appreciate any feedback in the short time remaining, as it
would help me set the right deliverables for this project.
The proposal page -
http://www.mediawiki.org/wiki/User:Chughakshay16/GSOCProposal%282012%29
The other details about this extension can be found on the following pages:
1. implementation details (+UI mockups) -
http://www.mediawiki.org/wiki/User:Chughakshay16/ConventionExtension
2. database details -
http://www.mediawiki.org/wiki/User:Chughakshay16/databasedetails
The talk pages for the above can also be used for feedback.
Thanks,
Akshay Chugh
(irc - chughakshay16)
On Tue, Apr 3, 2012 at 4:46 AM, <wikitech-l-request(a)lists.wikimedia.org> wrote:
> Send Wikitech-l mailing list submissions to
> wikitech-l(a)lists.wikimedia.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> or, via email, send a message with subject or body 'help' to
> wikitech-l-request(a)lists.wikimedia.org
>
> You can reach the person managing the list at
> wikitech-l-owner(a)lists.wikimedia.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Wikitech-l digest..."
>
>
> Today's Topics:
>
> 1. Re: Time to redirect to https by default? (Ryan Lane)
> 2. Re: Time to redirect to https by default? (Ryan Lane)
> 3. Re: Time to redirect to https by default? (Ryan Lane)
> 4. Re: Time to redirect to https by default? (MZMcBride)
> 5. Re: Committing followups: please no --amend (Chad)
> 6. Re: Time to redirect to https by default? (Platonides)
> 7. Re: Time to redirect to https by default? (Ryan Lane)
> 8. rsync on scap/sync reporting 'no space left on device' for a
> lot of hosts (Arthur Richards)
> 9. Re: Time to redirect to https by default? (Antoine Musso)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 3 Apr 2012 03:34:13 +0900
> From: Ryan Lane <rlane32(a)gmail.com>
> To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
> Subject: Re: [Wikitech-l] Time to redirect to https by default?
> Message-ID:
> <CALKgCA0RLExmwpdJ18ATtPJ_b=h7JBc7AJ=Pj+2zgUYvnRPJ4w(a)mail.gmail.com
> >
> Content-Type: text/plain; charset=ISO-8859-1
>
> On Mon, Apr 2, 2012 at 12:33 PM, Tim Starling <tstarling(a)wikimedia.org>
> wrote:
> > On 02/04/12 06:14, Ryan Lane wrote:
> >> TL;DR: we have no plans for anonymous HTTPS by default, but will
> >> eventually default to HTTPS for logged-in users.
> >>
> >> 1. It would require an ssl terminator on every frontend cache. The ssl
> >> terminators eat memory, which is also what the frontend caches do.
> >
> > Once we enable it by default for logged-in users, we will care a lot
> > more if someone tries to take it down with a DoS attack. Unless the
> > redirection can be disabled without actually logging in, a DoS attack
> > on the HTTPS frontend would prevent any authenticated activity.
> >
> > It suggests a need for a robust, overprovisioned service, with tools
> > and procedures in place for identifying and blocking or throttling
> > malicious traffic.
> >
>
> Indeed. We're already pretty over provisioned. We have 4 servers per
> datacenter, each of which is very bored. All they are doing is acting
> as a transparent proxy, after ssl termination. We're using RC4 by
> default (due to BEAST), and AES is also available (the processors we
> are using have AES support).
>
> Ideally we'll be using STS for logged in users. This will mean it's
> impossible to turn off the redirection for users that have already
> logged in for whatever period of time we have STS headers set. We need
> to consider blocking a DoS from the SSL proxies, the LVS servers, or
> the routers.
>
> >> 3. Some countries may completely block HTTPS, but allow HTTP to our
> >> sites so that they can track users. Is it better for us to provide
> >> them content, or protect their privacy?
> >> 4. It's still possible for governments to see that people are going to
> >> wikimedia sites when using HTTPS, so it's still possible to oppress
> >> people for trying to visit sites that are disallowed.
> >
> > It's also possible for governments to snoop on HTTPS communications,
> > by using a private key from a trusted CA to perform a
> > man-in-the-middle attack. Apparently the government of Iran has done
> this.
> >
>
> We really should publish our certificate fingerprints. An attack like
> this can be detected. An end-user being attacked can see if the
> certificate they are being handed is different from the one we
> advertise. We could also provide a convergence notary service (or one
> of the other things like convergence).
>
> > If we really want to protect the privacy of our users then we should
> > shut down the regular website and serve our content only via a Tor
> > hidden service ;)
> >
>
> I agree that it's impossible to provide total protection of a user's
> privacy. We could provide a number of services that would help users,
> though. That said, I don't feel this should be on the top of our
> priority list.
>
> - Ryan
>
>
>
> ------------------------------
>
> Message: 2
> Date: Tue, 3 Apr 2012 03:58:41 +0900
> From: Ryan Lane <rlane32(a)gmail.com>
> To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
> Subject: Re: [Wikitech-l] Time to redirect to https by default?
> Message-ID:
> <CALKgCA1ZXK8TBedG4cgdLeg-LsW7aN=ujf9O2V0QKL33DUYDVA(a)mail.gmail.com
> >
> Content-Type: text/plain; charset=ISO-8859-1
>
> On Mon, Apr 2, 2012 at 4:20 PM, Petr Bena <benapetr(a)gmail.com> wrote:
> > That's not what I wanted to say, I wanted to say "https may cause
> > troubles with caching", In fact some caching servers have problems
> > with https since the header is encrypted as well, so they usually just
> > forward the encrypted traffic to server. I don't say it's impossible
> > to cache this, but it's very complicated
> >
>
> Using SSL by default means all transparent proxies inbetween aren't
> hit at all, since they'd be a MITM. I don't necessarily see this as a
> bad thing, as transparent proxies often break things.
>
> Browsers cache things differently from HTTPS sites, but otherwise
> everything should work as normal. The SSL termination proxies
> transparently proxy to our frontend caches after termination. Links
> are sent as protocol-relative so that we don't split our cache, as
> well.
>
> - Ryan
>
>
>
> ------------------------------
>
> Message: 3
> Date: Tue, 3 Apr 2012 04:00:32 +0900
> From: Ryan Lane <rlane32(a)gmail.com>
> To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
> Subject: Re: [Wikitech-l] Time to redirect to https by default?
> Message-ID:
> <CALKgCA18yzdDpiqS1RAViaz8O7nw68K2Bz0qZaCp1i0g1TxTNg(a)mail.gmail.com
> >
> Content-Type: text/plain; charset=ISO-8859-1
>
> On Mon, Apr 2, 2012 at 6:34 PM, Tei <oscar.vives(a)gmail.com> wrote:
> > Perhaps have a black list of countries that are know to break the
> > privacy of communications, then make https default for logued users in
> > these countries.
> >
> > This may help because:
> >
> > - It only affect a subgroup of users (the ones from these countries)
> > - It only affect a subgroup of that subgroup, the logued users (not
> all)
> > - It create a blacklist of "bad countries" where citizens are under
> > surveillance by the governement
> >
> > This perhaps is not feasible, if theres not easy way to detect the
> > country based on the ip.
> >
>
> I'd definitely not support doing something like this. This would
> incredibly complicate things.
>
> - Ryan
>
>
>
> ------------------------------
>
> Message: 4
> Date: Mon, 02 Apr 2012 16:26:57 -0400
> From: MZMcBride <z(a)mzmcbride.com>
> To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
> Subject: Re: [Wikitech-l] Time to redirect to https by default?
> Message-ID: <CB9F83D1.18E9B%z(a)mzmcbride.com>
> Content-Type: text/plain; charset="ISO-8859-1"
>
> Ryan Lane wrote:
> > On Mon, Apr 2, 2012 at 6:34 PM, Tei <oscar.vives(a)gmail.com> wrote:
> >> Perhaps have a black list of countries that are know to break the
> >> privacy of communications, then make https default for logued users in
> >> these countries.
> >>
> >> This may help because:
> >>
> >> - It only affect a subgroup of users (the ones from these countries)
> >> - It only affect a subgroup of that subgroup, the logued users (not
> all)
> >> - It create a blacklist of "bad countries" where citizens are under
> >> surveillance by the governement
> >>
> >> This perhaps is not feasible, if theres not easy way to detect the
> >> country based on the ip.
> >
> > I'd definitely not support doing something like this. This would
> > incredibly complicate things.
>
> Someone came into #wikimedia-tech a few days ago and asked about something
> similar to this. The idea was to use site-wide JavaScript to auto-redirect
> users to https on one of the Chinese Wikipedias. I believe this was in
> combination with geolocation functionality, but I'm not sure.
>
> Do you have any thoughts on individual wikis doing this, assuming there's
> local community consensus?
>
> MZMcBride
>
>
>
>
>
> ------------------------------
>
> Message: 5
> Date: Mon, 2 Apr 2012 17:05:03 -0400
> From: Chad <innocentkiller(a)gmail.com>
> To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
> Subject: Re: [Wikitech-l] Committing followups: please no --amend
> Message-ID:
> <CADn73rPYs_H5tR0fQ0bvwAkjrJTnhA3Tk52FmLj1RbsekMp38A(a)mail.gmail.com
> >
> Content-Type: text/plain; charset=UTF-8
>
> On Tue, Mar 27, 2012 at 6:06 AM, Tim Starling <tstarling(a)wikimedia.org>
> wrote:
> > On 27/03/12 19:49, Roan Kattouw wrote:
> >> On Mon, Mar 26, 2012 at 9:50 PM, Tim Starling <tstarling(a)wikimedia.org>
> wrote:
> >>> For commits with lots of files, Gerrit's diff interface is too broken
> >>> to be useful. It does not provide a compact overview of the change
> >>> which is essential for effective review.
> >>>
> >>> Luckily, there are alternatives, specifically local git clients and
> >>> gitweb. However, these don't work when git's change model is broken by
> >>> the use of git commit --amend.
> >>>
> >> They do; it just wasn't obvious to you how to do it, but that doesn't
> >> mean it can't be done.
> >>
> >> $ git fetch https://gerrit.wikimedia.org/r/p/analytics/udp-filters
> >> refs/changes/22/3222/3 && git branch foo FETCH_HEAD
> >> $ git fetch https://gerrit.wikimedia.org/r/p/analytics/udp-filters
> >> refs/changes/22/3222/4 && git branch bar FETCH_HEAD
> >> $ git diff foo..bar
> >>
> >> The two 'git fetch' commands (or at least the part before the &&) can
> >> be taken from the change page in Gerrit.
> >
> > It doesn't work, I'm afraid. Because of the implicit rebase on push,
> > usually subsequent changesets have a different parent. So when you
> > diff between the two branches, you get all of the intervening commits
> > which were merged to the master.
> >
> > Examples from today:
> >
> > https://gerrit.wikimedia.org/r/#change,3367
> > Patchsets 1 and 2 have different parents.
> >
> > https://gerrit.wikimedia.org/r/#change,3363
> > Patchsets 1, 2 and 3 have different parents.
> >
> > It's possible to get a diff between them, and I did, but it's tedious.
> > I figure we should pick a workflow that doesn't waste the reviewer's
> > time quite so much.
> >
>
> The problem here is the implicit rebase. As long as the review
> backlog isn't long and/or people aren't submitting conflicting
> changes, rebasing amended changes against master creates
> more harm than good.
>
> For amending commits, you should use `git review -R` so you
> don't rebase the change (again) against master (see for example
> [0], difference between patch 2 and 3). I've updated the docs[1],
> but they are, briefly:
>
> git review -d 123
> (make changes)
> git commit -a --amend
> git review -R
>
> If you're not using git-review and have been using the alias, your
> amended patchsets have not been creating this problem.
>
> -Chad
>
> [0] https://gerrit.wikimedia.org/r/#change,4020
> [1] https://www.mediawiki.org/wiki/Git/Workflow#Amend_your_change
>
>
>
> ------------------------------
>
> Message: 6
> Date: Mon, 02 Apr 2012 23:31:28 +0200
> From: Platonides <Platonides(a)gmail.com>
> To: wikitech-l(a)lists.wikimedia.org
> Subject: Re: [Wikitech-l] Time to redirect to https by default?
> Message-ID: <jld5hm$suh$1(a)dough.gmane.org>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On 02/04/12 20:34, Ryan Lane wrote:
> >> It's also possible for governments to snoop on HTTPS communications,
> >> by using a private key from a trusted CA to perform a
> >> man-in-the-middle attack. Apparently the government of Iran has done
> this.
> >>
> >
> > We really should publish our certificate fingerprints. An attack like
> > this can be detected. An end-user being attacked can see if the
> > certificate they are being handed is different from the one we
> > advertise. We could also provide a convergence notary service (or one
> > of the other things like convergence).
>
> Indeed. Detecting a potential MITM is useless if you can't determine if
> it's real or not. For instance the switch from RapidSSL to DigiCert
> certificate was quite suspicious.
>
> I don't know how to best publicise it, though. I suppose we would list
> them somewhere like https://secure.wikimedia.org/servers.html but if
> nobody knows it's there...
>
>
>
>
>
> ------------------------------
>
> Message: 7
> Date: Tue, 3 Apr 2012 06:35:50 +0900
> From: Ryan Lane <rlane32(a)gmail.com>
> To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
> Subject: Re: [Wikitech-l] Time to redirect to https by default?
> Message-ID:
> <CALKgCA0xW+wJu5LxzxQrJTaUfAf1ELPFjfucVeHaBF8CbaJjAw(a)mail.gmail.com
> >
> Content-Type: text/plain; charset=ISO-8859-1
>
> > Indeed. Detecting a potential MITM is useless if you can't determine if
> > it's real or not. For instance the switch from RapidSSL to DigiCert
> > certificate was quite suspicious.
> >
> > I don't know how to best publicise it, though. I suppose we would list
> > them somewhere like https://secure.wikimedia.org/servers.html but if
> > nobody knows it's there...
> >
>
> What's https://secure.wikimedia.org?
>
> - Ryan
>
>
>
> ------------------------------
>
> Message: 8
> Date: Mon, 2 Apr 2012 16:19:20 -0700
> From: Arthur Richards <arichards(a)wikimedia.org>
> To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
> Subject: [Wikitech-l] rsync on scap/sync reporting 'no space left on
> device' for a lot of hosts
> Message-ID:
> <CAG5Yvh+pr-=KLLzj2sVucWX5a8PJG1mb=sd+0wc+FK+0KZpybw(a)mail.gmail.com
> >
> Content-Type: text/plain; charset=ISO-8859-1
>
> I just ran scap and saw the following for a lot of hosts:
>
> srv285: rsync: write failed on
> "/usr/local/apache/common-local/php-1.19/cache/l10n/l10n_cache-ab.cdb": No
> space left on device (28)
>
> srv285: rsync error: error in file IO (code 11) at receiver.c(302)
> [receiver=3.0.7]
>
> srv285: rsync: connection unexpectedly closed (2051 bytes received so far)
> [generator]
>
> srv285: rsync error: error in rsync protocol data stream (code 12) at
> io.c(601) [generator=3.0.7]
>
>
> Also, on configchange:
>
> mw21: rsync: write failed on
> "/apache/common-local/wmf-config/CommonSettings.php": No space left on
> device (28)
>
> mw21: rsync error: error in file IO (code 11) at receiver.c(302)
> [receiver=3.0.7]
>
> mw21: rsync: connection unexpectedly closed (37 bytes received so far)
> [generator]
>
> mw21: rsync error: error in rsync protocol data stream (code 12) at
> io.c(601) [generator=3.0.7]
>
>
> Not sure if this is a problem and/or if others are aware/working on it, but
> thought I'd mention it.
>
> --
> Arthur Richards
> Software Engineer, Mobile
> [[User:Awjrichards]]
> IRC: awjr
> +1-415-839-6885 x6687
>
>
> ------------------------------
>
> Message: 9
> Date: Tue, 03 Apr 2012 01:22:10 +0200
> From: Antoine Musso <hashar+wmf(a)free.fr>
> To: wikitech-l(a)lists.wikimedia.org
> Subject: Re: [Wikitech-l] Time to redirect to https by default?
> Message-ID: <jldcat$bto$1(a)dough.gmane.org>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On April 2nd, 2012 at 23:35, Ryan Lane wrote:
> > What's https://secure.wikimedia.org?
>
> Some old experiment. Nothing to see here :-)
>
> --
> Antoine "hashar" Musso
>
>
>
>
> ------------------------------
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
>
> End of Wikitech-l Digest, Vol 105, Issue 10
> *******************************************
>
Hey all,
Now that I have more of the details sorted out, I'd like to invite
feedback on my Google Summer of Code project proposal, entitled
*TranslateSvg: Bringing the translation revolution to Wikimedia
Commons*.[1] Obviously the deadline for submissions is rapidly
approaching, but comments would still be very welcome, either before or
after that deadline.
To quote from my proposal:
> TranslateSvg has the potential to revolutionise the ability of Wikimedia's
> diverse groups of image maintainers to work together creating and improving
> the same communal set of SVG (vector) images. At the moment, providing
> alternative translations of SVG files typically requires "forking" the
> image. This drastically increases the image's maintenance burden and
> thereby discourages image improvement. Where such improvement does take
> place, it is seldom shared between different language versions.
> TranslateSvg would completely change this suboptimal workflow by removing
> the need for the image to be forked; instead, translations (provided using
> a streamlined special page) would be saved inside the image itself, in
> accordance with the SVG 1.1 specification. The file, complete with these
> embedded translations, could then be displayed in either the language of a
> wiki, the user's preferred interface language, or any given arbitrary
> language. If the SVG file were to be served directly, it would helpfully
> display in the user's system language where such a translation was
> available, aiding reuse possibilities. When I originally raised this idea
> it received the support of several Wikimedia Commons users as well as WMF
> developers.
>
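For anyone unfamiliar with the mechanism the proposal relies on, SVG 1.1's <switch> element with systemLanguage attributes lets a single file carry several translations, with the renderer picking the one matching the viewer's language. A minimal hand-written example (my own illustration, not taken from the proposal):

```xml
<!-- The renderer evaluates the <switch> children in order and draws
     the first <text> whose systemLanguage matches the user's locale;
     the last, attribute-free <text> is the fallback. -->
<svg xmlns="http://www.w3.org/2000/svg" width="120" height="40">
  <switch>
    <text systemLanguage="de" x="10" y="25">Hallo</text>
    <text systemLanguage="fr" x="10" y="25">Bonjour</text>
    <text x="10" y="25">Hello</text>
  </switch>
</svg>
```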
Thanks!
Harry (User:Jarry1250)
[1] https://www.mediawiki.org/wiki/User:Jarry1250/GSoC_2012_application
Hello,
This European afternoon, I made a series of mistakes that caused
Jenkins to build a lot of pending Gerrit changes and post failure
messages. You can safely disregard those messages for now.
For some time we had a faulty test in master that caused any patch
based on it to fail the MediaWiki-Test-API job. I have marked that test
as broken with change 4159:
https://gerrit.wikimedia.org/r/4159
Meanwhile, any change based on a commit before 633c454 will be marked
by Jenkins as failing. If you want a worthwhile result, you will want
to rebase your change on top of the current master.
New changes are not subject to the broken test above, so any failing
build should mean that your patch has an issue :-]
The Jenkins jobs currently only run the PHPUnit tests against an SQLite
backend; more will be added soon (tm).
--
Antoine "hashar" Musso
Hey,
I am very much interested in the idea of a Taxobox. I have a method of
generating it using a basic Python script: the script would gather all
the data, store it in the taxonomy templates, and display it through
the corresponding display templates.
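The generation step could be sketched like this (a toy example of my own; the field names and function are illustrative, not part of any existing tool):

```python
# Toy sketch: render a wikitext taxobox from a dict of taxonomy data.
# Field names and the function itself are illustrative assumptions.
def render_taxobox(taxon):
    lines = ["{{Taxobox"]
    for field, value in taxon.items():
        lines.append("| %s = %s" % (field, value))
    lines.append("}}")
    return "\n".join(lines)

# e.g. render_taxobox({"name": "Panthera leo", "regnum": "Animalia"})
```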
I will look into the merits and demerits of automatic Taxobox
generation, and you will receive my proposal in the following week. I
just need to know whether I am on the right track here.
Regards,
Ashwin
--
Ashwin.S.Ravichandran
For commits with lots of files, Gerrit's diff interface is too broken
to be useful. It does not provide a compact overview of the change
which is essential for effective review.
Luckily, there are alternatives, specifically local git clients and
gitweb. However, these don't work when git's change model is broken by
the use of git commit --amend.
For commits with a small number of files, such changes are reviewable
using the "patch history" table in the diff views. But when there are a
large number of files, it becomes difficult to find the files which
have changed and to produce a compact combined diff.
So if there are no objections, I'm going to change [[Git/Workflow]] to
restrict the recommended applications of "git commit --amend", and to
recommend plain "git commit" as an alternative. A plain commit seems
to work just fine. It gives you a separate commit to analyse with
Gerrit, gitweb and client-side tools, and it provides a link to the
original change in the "dependencies" section of the change page.
-- Tim Starling
Dear Oren Bochman,
I am very pleased to hear from you.
My familiarity with the requirements, *on a scale of 5*, is as follows:
    1. Java and other programming languages :: *4.5*... I have taken
courses on Java, C, and C++, and I have used Python extensively in my
projects. I am very comfortable with the syntax and semantics, and
understanding different libraries won't be difficult.
    2. PHP :: *3.5*... I have used PHP in a project and am
taking a course on it at my university.
    3. Apache Lucene :: *2*... I was not very familiar with this
library until recently. However, I am very willing to learn it as soon
as possible and to be comfortable with it before the coding period
starts.
    4. Natural Language Processing :: *4*... Language processing and
data are my major interests, and I have done all my projects on NLP. I
have taken the NLP course offered at coursera.org, and NLP is what I
discuss with my professors at my university too.
    5. Computational Linguistics and WordNet :: *4*... I am using the
principles of computational linguistics and WordNet in my current
project, an automatic essay grader. I have also chosen Data Mining as
an elective and am comfortable with the field.
I was looking for some clarifications regarding the proposed ideas:
    1. Regarding the first project :: "a framework for handling
different languages"... How exactly should we be looking at 'handling'
languages? What kind of framework is expected?
    2. Regarding the second project :: "Make a Lucene filter which uses
such a wordnet to expand search terms"... Does this project aim at
building everything from scratch, or at revamping the existing code?
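To make sure I understand the term-expansion idea in project 2, here is how I picture it (a toy stand-in: the synonym table below replaces a real WordNet lookup, and none of these names come from the actual Lucene code):

```python
# Toy sketch of wordnet-style query expansion: each query term is
# expanded with its known synonyms. A real filter would consult
# WordNet inside Lucene's analysis chain; this table is a stand-in.
SYNONYMS = {
    "car": ["automobile", "auto"],
    "quick": ["fast", "rapid"],
}

def expand_terms(terms):
    """Return the original terms plus any synonyms, preserving order."""
    expanded = []
    for term in terms:
        for candidate in [term] + SYNONYMS.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded
```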
My understanding of proposed idea 1 is: "To extract the corpus from
Wikipedia and to apply the deliverables to it." Please correct me if I
am missing something.
Also, I was wondering whether you had a specific approach in mind, or
whether it would be OK if I came up with an approach and proposed it in
my proposal.
Some more details regarding my essay grader project: the grader does
take care of essay coherence. Spelling and grammar are, as you pointed
out, important, but not very informative when it comes to the
"relatedness" of the essay. The essays are also graded on their
structure; we analysed statistics of the essay to come up with a
measure for grading its structure.
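As a crude illustration of the kind of surface statistics such structural grading can start from (entirely a toy example, not the grader's actual feature set):

```python
# Toy structural features for an essay: sentence count, word count,
# and average sentence length. A real grader would add coherence,
# style, and relatedness measures on top of these.
def essay_features(text):
    sentences = [s for s in text.split(".") if s.strip()]
    words = text.split()
    return {
        "n_sentences": len(sentences),
        "n_words": len(words),
        "avg_sentence_length": len(words) / max(len(sentences), 1),
    }
```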
I am very excited about this and am eagerly looking forward to hearing
from you.
Thank you.
Best Regards,
Karthik
> Date: Mon, 2 Apr 2012 11:46:21 +0200
> From: "Oren Bochman" <orenbochman(a)gmail.com>
> To: <wikitech-l(a)lists.wikimedia.org>
> Subject: Re: [Wikitech-l] GSOC 2012 - Text Processing and Data Mining
> Message-ID: <017401cd10b5$769f9fb0$63dedf10$@com>
> Content-Type: text/plain; charset="us-ascii"
>
> Dear, Karthik Prasad & Other GSOC candidates.
>
>
>
> I was not getting this list but I am now.
>
>
>
> The GSOC proposal should be specified by the student.
>
>
>
> I'll can expand the details on these projects.
>
> I can answer specific questions you have about expectation.
>
>
>
> To optimally match you with a suitable high impact project - to what
> extent
> are you familiar with :
>
> *Java and other programming languages?
>
> *PHP?
>
> *Apache Lucene?
>
> *Natural Language Processing?
>
> *Corpus Linguistics?
>
> *Word Net?
>
>
>
> The listed projects would be either wrapped as services, or consumed by
> downstream projects or both.
>
>
>
> The corpus is the simplest but requires lots of attention to detail. When
> successful, it would be picked up by lots of
>
> researchers and companies who do not have the resources for doing such CPU
> intensive tasks.
>
> For WMF it would provide us with a standardized body for future NLP work. A
> Part Of Speech tagged corpus would
> be immediately useful for an 80% accurate word sense disambiguation in the
> search engine.
>
>
>
> Automatic Summaries are not a strategic priority AFAIK -
>
> 1. most articles provide a kind of abstract in their intro and
>
> 2. there are something like this already provided in the dumps for
> yahoo.
>
> 3. I have been using a great pop up preview widget in Wiktionary for
> a
> year or so.
>
>
>
> I do think it would be a great project to learn how to become a MediaWiki
> developer but is small for a GSOC.
> However I cannot speak for Jebald and other mentors in cellular and other
> teams who might be interested in this.
>
>
>
> If your easy grader is working it could be the basis of another very
> exciting GSOC project aimed at article quality.
>
> A NLP savvy "smart" article quality assessment service could improve/expand
> the current bots grading articles.
> Grammar and spelling are two good indicators, features. However a full
> assessment of Wikipedia articles would
> require more details - both stylistic and information based. Once you have
> covered sufficient features
> building discriminators based on samples of graded articles would require
> some data mining ability.
>
>
>
> However since there is an Existing bot, undergoing upgrades we would have
> to check with its small dev team what it currently doing
>
> And it would be subject to community oversight.
>
>
>
> Yours Sincerely,
>
>
>
> Oren Bochman
>
>
>
> MediaWiki Search Developer
>
>
Hi,
I am a Pywikipedia committer, but until now I have committed only to
the trunk branch. This is the first time I have wanted to upload a new
script to the rewrite branch, because i18n keyword files belong to
rewrite and appear as externals in trunk.
The error message is:
Access to
'/svnroot/pywikipedia/!svn/act/d1b70ddb-06c1-f64f-9b60-46de4e2db7ca'
forbidden
What is the problem? How can I tell whether I have the correct rights
to that branch?
(Pywikipedia is still on SVN.)
--
Bináris