On Mon, Mar 1, 2010 at 10:10, Domas Mituzas midom.lists@gmail.com wrote:
Howdy,
Most of the code in MediaWiki works just fine with it (since most of it is mundane) but things like dynamically including certain files, declaring classes, eval() and so on are all out.
There are two types of includes in MediaWiki, ones I fixed for AutoLoader and ones I didn't - HPHP has all classes loaded, so AutoLoader is redundant. Generally, every include that just defines classes/functions is fine with HPHP; it is just some of MediaWiki's startup logic (Setup/WebStart) that depends on files being included in a certain order, so we have to make sure HipHop understands those includes. There was some different behavior with file including - in Zend you can say require("File.php"), and it will try the current script's directory, but if you do require("../File.php") - it will
We don't have any eval() at the moment, and actually there's a mode in which eval() works; people are just too scared of it. We had some double class definitions (depending on whether certain components are available), as well as double function definitions (ProfilerStub vs Profiler).
One of the major problems is simply the still-incomplete function set. We'd need:
- session - though we could surely work around it by setting up our own Session abstraction, the team at Facebook is already busy implementing full support
- xdiff, mhash - the only two calls to them are from DiffHistoryBlob - so getting the feature to work is mandatory for production, not needed for testing :)
- tidy - have to call the binary now
function_exists() is somewhat crippled, as far as I understand, so I had to work around certain issues there. There're some other crippled functions, which we hit through the testing...
It is quite fun to hit all the various edge cases in the PHP language (e.g. interfaces may have constants) which are broken in hiphop. The good thing is having developers carefully reading/looking at those. Some things are still broken, some can be worked around in MediaWiki.
Some of the crashes I hit are quite difficult to reproduce - it is easier to bypass that code for now, and come up with good reproduction cases later.
Even if it wasn't, hotspots like the parser could still be compiled with hiphop and turned into a PECL extension.
hiphop provides a major boost for actual MediaWiki initialization too - while Zend has to reinitialize objects and data all the time, having all that in the core process image is quite efficient.
One other nice thing about hiphop is that the compiler output is relatively readable compared to most compilers.
That especially helps with debugging :)
Meaning that if you need to optimize some particular function it's easy to take the generated .cpp output and replace the generated code with something more native to C++ that doesn't lose speed because it needs to manipulate everything as a php object.
Well, that is not entirely true - if it manipulated everything as a PHP object (zval), it would be as slow and inefficient as PHP. The major benefit here is that it does strict type inference, and falls back to Variant only when it cannot come up with a decent type. And yes, one can find the offending code that causes the expensive paths. I don't see manual C++ code optimizations as the way to go though - because they'd be overwritten by the next code build.
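To make the typed-vs-Variant difference concrete, here is a rough C++ sketch. It is not HipHop's generated code: `Value` is a hypothetical stand-in built on std::variant, not HipHop's actual Variant class, and both function names are made up for illustration. It only shows why an inferred integer type is cheaper than a value whose type has to be checked at run time.

```cpp
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for a dynamically typed PHP value.
// This is NOT HipHop's real Variant class, just an illustration.
using Value = std::variant<std::int64_t, double, std::string>;

// What a compiler can emit once it has inferred that every element is an
// integer: a plain loop over machine integers, no type checks.
std::int64_t sum_typed(const std::vector<std::int64_t>& xs) {
    std::int64_t total = 0;
    for (std::int64_t x : xs) {
        total += x;
    }
    return total;
}

// Fallback path when no single type could be inferred: the type of each
// element has to be inspected at run time before it can be used.
double sum_dynamic(const std::vector<Value>& xs) {
    double total = 0.0;
    for (const Value& v : xs) {
        if (const auto* i = std::get_if<std::int64_t>(&v)) {
            total += static_cast<double>(*i);
        } else if (const auto* d = std::get_if<double>(&v)) {
            total += *d;
        }
        // Strings are skipped in this sketch; PHP would attempt a numeric
        // conversion here.
    }
    return total;
}
```

The second function does strictly more work per element, which is the "expensive path" one would go looking for.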
The case I had in mind is when you have say a function in the parser that takes a $string and munges it. If that turns out to be a bottleneck you could just get a char* out of that $string and munge it at the C level instead of calling the PHP wrappers for things like explode() and other php string/array munging.
That's some future project once it's working and those bottlenecks are found though, I was just pleasantly surprised that hphp makes this relatively easy.
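As a rough idea of what munging at the C level could look like, here is a minimal sketch. `split_fields` is a hypothetical helper, not part of HipHop or MediaWiki, and how you would actually get a char* and length out of a HipHop string is not shown, since that API is not described here. It splits on a single-character delimiter without copying, giving the same field layout as explode() with a one-character delimiter and no limit.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: split a raw byte buffer on a single-character
// delimiter. The returned views point into `data`, so nothing is copied
// and `data` must outlive the result.
std::vector<std::string_view> split_fields(const char* data,
                                           std::size_t len,
                                           char delim) {
    std::vector<std::string_view> fields;
    std::size_t start = 0;
    for (std::size_t i = 0; i < len; ++i) {
        if (data[i] == delim) {
            fields.emplace_back(data + start, i - start);
            start = i + 1;
        }
    }
    // Trailing field; empty if the input ends with the delimiter.
    fields.emplace_back(data + start, len - start);
    return fields;
}
```

Because the views alias the input buffer, this only pays off if the caller keeps the original string alive while working on the fields.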
One large practical upshot of this, though, is that hacky things like the parser, which are the way they are because that's how you optimize this sort of thing in PHP, could be written in some babytalk version of PHP that produces a real parse tree. It would be slower in pure PHP, but maybe hphp's speed could make up for it.
Then you could take that component & compile it to C++ (maybe with some manual munging) and make libmediawiki-parse++, which would be quite awesome :)