On Tue, Mar 23, 2010 at 2:00 PM, Damon Wang damonwang@uchicago.edu wrote:
I've been writing projects for university and for a computer lab I work at, but it's mostly small, one-off sysadmin things and usually the emphasis is more on "xyz server has to be back up before we open tomorrow" than writing good, clean code. So, yes, I'd welcome other suggestions.
Cool! So, I'm assuming you're looking forward to an opportunity to write good, clean code as a summer project. :)
There are ways to make [Python-based extensions] run faster if performance is a concern. For example, mod_python or mod_wsgi, or explicitly pulling the Python out into a standalone daemon that listens for requests from the webserver.
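A minimal sketch of the standalone-daemon option, assuming a line-based TCP protocol on localhost. The names `render`, `RenderHandler`, and `make_server`, the port number, and the one-line-in, one-line-out protocol are all hypothetical; nothing like this is specified in the thread. The point is only that the interpreter starts once and stays resident, so the webserver pays for a socket round trip instead of a Python startup on every request.

```python
import socketserver


def render(source: str) -> str:
    """Hypothetical stand-in for the real work (e.g. converting <math> input)."""
    return source.strip().upper()


class RenderHandler(socketserver.StreamRequestHandler):
    def handle(self) -> None:
        # One request per connection: a single UTF-8 line in, a single line out.
        line = self.rfile.readline().decode("utf-8")
        self.wfile.write((render(line) + "\n").encode("utf-8"))


def make_server(host: str = "127.0.0.1", port: int = 0) -> socketserver.ThreadingTCPServer:
    # Port 0 asks the OS for any free port; a real deployment would pin one.
    return socketserver.ThreadingTCPServer((host, port), RenderHandler)


def main() -> None:
    # Assumed port; the webserver side would connect here for each request.
    with make_server(port=9099) as server:
        server.serve_forever()
```

Calling `main()` would start the daemon; the webserver side would then open a connection to the chosen port, write one line, and read one line back.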
Personally, I'd avoid trying to make that pitch for a GSoC project. While you're right that Python is a pretty defensible choice when embarking on a large project, trading one dependency for another for this size/scale of project won't be as compelling as eliminating a dependency altogether.
Of course, as I say that, I see Platonides disagrees with me here. Choosing Python is not a huge disadvantage in this context, but it's not going to have the same unanimous(-ish) approval that using PHP would.
Another possibility would be writing it in C to avoid all interpreter overhead, and using a foreign function interface. Unfortunately, I'm not familiar with PHP's FFI. Google takes me to http://wiki.php.net/rfc/php_native_interface which seems to think that as of a year ago there weren't any good ones, but this doesn't look too painful: http://theserverpages.com/php/manual/en/zend.creating.php
I think straight PHP would be fine for this particular project. The downside of a C implementation is that, while it's almost certainly going to have the best performance characteristics, it's also more likely to fall into disrepair and be a possible source of buffer overruns and other security issues.
The nice thing about a PHP port (if done correctly) is that it would be a trivial install for small wikis and Wikipedia alike. That translates into more usage, which in turn translates into higher likelihood that it stays maintained.
That said, there have got to be a ton of projects that could benefit from PHP->native C bindings. I'm going to leave it to some other folks to suggest projects in this area.
I'm most familiar with Python and C, for whatever that's worth coming from an undergrad who didn't know Python existed five years ago. I learned PHP to maintain the web interfaces of an in-house print system at work, but I haven't used it for anything as involved as what we're discussing here. So, in terms of productivity, yes, if I have to work in PHP my mentor will probably get asked a few more newbie questions.
In terms of happiness, though, it'd be a great opportunity to dig into PHP and finally learn to use it as more than really smart CSS with a database connection. Although I prefer Python or even C because I think I'd be more useful, I wouldn't be very upset at all if it turned out you guys were willing to let me learn PHP on your time.
There are a few Python-based things that might be interesting, but I think you'll get a lot more love for doing something in PHP or C. Since this is a student internship, you shouldn't be bashful about using this as a learning opportunity.
I'd only caution against convincing yourself (and us) that you'll be more interested in learning something like PHP than you truly are. It might help you land a spot, but it will work against you in having a successful project, and this has such high visibility that you'll really want to be successful. So, if you find yourself thinking about doing this in PHP and having your inner voice say "meh", then I'd recommend sticking to your guns and proposing to do this or something else in Python and/or C.
- Are you zeroing in on <math> parsing and parsing in general because that's an area that you're already developing expertise in and/or are deeply interested in getting into, or is that just something that looked kinda interesting to learn about relative to other opportunities you considered?
I like the <math> parsing project because it seems well-suited for a third-year undergrad who knows LaTeX and reads a few other functional languages and has studied lex/yacc before in his coursework. The goals are clear, and I know how to break them down into smaller problems and how to tackle each one. It's a little isolated from the rest of MediaWiki, so I don't need to grok the entire code base.
Basically, this looks like a way to make a concrete contribution despite being a newcomer to the project. That doesn't mean I'm not happy to entertain alternatives, just that they have a pretty high bar to clear.
This is a really smart way of thinking about this, so it's great that you're approaching the project scope the right way. I agree with you that finding something reasonably well-contained is going to be the best strategy for success.
- Are you coming at this as someone who is already deep into Wikipedia/MediaWiki usage who is looking to resolve particular things (like <math> parsing) that are painful as an end user, or are you more casually involved and more interested in applying in this project because it looks like we've got a lot of interesting programming problems to solve?
The second. I just want to tackle a problem that's near but not quite beyond my limits, and if I can help out a site I use daily, so much the better.
Wonderful! Great reason to get involved!
Rob