On Mon, Jan 3, 2011 at 11:59 AM, Brion Vibber <brion(a)pobox.com> wrote:
phpQuery itself builds on the DOM module already in PHP, so be aware that
using it for this purpose is equivalent to using the DOM & XPath functions
already available.
For one thing this means that HTML will have to be run through the libxml2
HTML parser (which I have found is very sketchy with perfectly legal implied
close tags and such). In addition to memory and performance concerns of
parsing the whole document into a DOM tree and reserializing it, you might
not get back the structure you put in... hopefully no surprises but keep an
eye out.
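A rough way to see the round-trip concern in practice is to parse a fragment with PHP's DOM extension and serialize it again, then compare. This is only a sketch: the helper name roundTripHtml and the sample fragment are made up for illustration, it assumes PHP 7 or later with the DOM extension loaded, and it sticks to ASCII input because loadHTML() makes its own guesses about encoding.

```php
<?php
// Hypothetical helper: parse an HTML fragment with the libxml2-backed DOM
// extension, then serialize the children of <body> back to a string.
function roundTripHtml(string $html): string
{
    // Collect parser warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html); // the parser adds <html>/<body> wrappers itself

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Serialize each top-level node of the fragment in document order.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Legal HTML that relies on implied </li> close tags.
$input  = '<ul><li>one<li>two</ul>';
$output = roundTripHtml($input);

echo "in:  $input\n";
echo "out: $output\n";
echo ($input === $output) ? "unchanged\n" : "markup changed by the round trip\n";
```

Running something like this over representative output should show how much the serialized markup drifts from what went in, which is the thing to keep an eye on.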
In theory, this problem should go away in a few years when everyone
converges on HTML5 parsing. I think you can get a PHP HTML5 parser,
which is compatible with browser parsing, but the performance probably
isn't so good, and I don't know how well-maintained it is.
("Compatible with browser parsing" means "identical to Firefox 4 and
WebKit nightly parsing, and compatible enough with how they used to
parse things that no appreciable number of sites have broken in the
new browser versions".)
That said, we do generally output well-formed XML or something quite
close to it, so the cases where PHP's DOM library will do something
unexpected should be reasonably limited.
I thought we had compatibility problems with users who didn't have the
DOM module installed, including default RHEL5 configuration IIRC? Or
was that something else?
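For what it's worth, code that depends on the DOM module could probe for it up front instead of failing later. A minimal sketch follows; the function name domExtensionAvailable is made up for illustration, and it avoids type declarations so that it should also parse on older PHP 5 installs.

```php
<?php
// Hypothetical check: true only if the DOM extension is loaded and its
// main class is usable.
function domExtensionAvailable()
{
    return extension_loaded('dom') && class_exists('DOMDocument');
}

if (domExtensionAvailable()) {
    echo "DOM extension available\n";
} else {
    // A caller could fall back to a non-DOM code path here.
    echo "DOM extension missing\n";
}
```

Whether a fallback is worth carrying depends on how common such installs actually are, which is the open question above.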