On Mon, Jan 3, 2011 at 11:59 AM, Brion Vibber <brion@pobox.com> wrote:
phpQuery itself builds on the DOM module already in PHP, so be aware that using it for this purpose is equivalent to using the DOM and XPath functions already available.
For one thing, this means that HTML will have to be run through the libxml2 HTML parser (which I have found is very sketchy with perfectly legal implied close tags and such). In addition to the memory and performance concerns of parsing the whole document into a DOM tree and reserializing it, you might not get back the structure you put in... hopefully no surprises, but keep an eye out.
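To make the round-trip concern concrete, here is a minimal sketch of parsing a fragment with PHP's DOM extension and serializing it again. The input string is a made-up example that relies on implied close tags, and nothing here is taken from phpQuery or MediaWiki code. It only uses `DOMDocument::loadHTML()` and `DOMDocument::saveHTML()`; what comes back will depend on the installed libxml2 version.

```php
<?php
// Sketch: round-trip an HTML fragment through PHP's DOM extension.
if (!extension_loaded('dom')) {
    exit("DOM extension is not available in this PHP build\n");
}

// Hypothetical input that relies on implied close tags (legal HTML).
$input = '<ul><li>one<li>two</ul><p>first<p>second';

// Collect libxml parse warnings instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($input);
$warnings = libxml_get_errors();
libxml_clear_errors();

// Serialize the parsed tree back to a string.
$output = $doc->saveHTML();

echo "Input:\n", $input, "\n\n";
echo "Round-tripped:\n", $output, "\n";
echo "Parser warnings: ", count($warnings), "\n";
echo "Byte-identical: ", ($input === trim($output) ? 'yes' : 'no'), "\n";
```

Comparing the two strings on real page output is probably the quickest way to see how much the parser rewrites, since the serializer may also add wrapper elements that were not in the input.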
In theory, this problem should go away in a few years when everyone converges on HTML5 parsing. I think you can get a PHP HTML5 parser, which is compatible with browser parsing, but the performance probably isn't so good, and I don't know how well-maintained it is. ("Compatible with browser parsing" means "identical to Firefox 4 and WebKit nightly parsing, and compatible enough with how they used to parse things that no appreciable number of sites have broken in the new browser versions".)
That said, we do generally output well-formed XML or something quite close to it, so the cases where PHP's DOM library will do something unexpected should be reasonably limited.
I thought we had compatibility problems with users who didn't have the DOM module installed, including the default RHEL5 configuration, IIRC? Or was that something else?