zetawoof wrote:
> You'll do much better with a purpose-built lexical analyzer. If
> nothing else, this'll work when titles and tags span across buffers -
> something which the previous program will choke on.

If they span buffers, the XML parsing libraries choke as well, just for
your information. I know; I've seen them do it.

I have a version that does not choke on buffer spanning: it buffers
underneath. I just posted it as an example.

Jeff
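To make the buffer-spanning point concrete, here is a minimal C sketch of a scanner that keeps its partial-match state between calls, so a `<title>` or `</title>` split across two chunks is still recognized. It is not Jeff's or zetawoof's code; the names `title_scanner`, `scanner_feed` and `emit`, and the fixed 256-byte output buffer, are hypothetical. It relies on `<` appearing only at the start of each tag, so after a mismatch the only way to begin a new match is on a `<`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical streaming title extractor (a sketch, not the posters' code). */
static const char TAG_OPEN[] = "<title>";
static const char TAG_CLOSE[] = "</title>";

struct title_scanner {
    int in_title;   /* 1 between <title> and </title> */
    size_t matched; /* chars of the current tag matched so far; kept across chunks */
    char out[256];  /* collected title text, NUL-terminated (fixed size for brevity) */
    size_t out_len;
};

/* Append one character, silently dropping output that does not fit. */
static void emit(struct title_scanner *s, char c) {
    if (s->out_len + 1 < sizeof s->out) {
        s->out[s->out_len++] = c;
        s->out[s->out_len] = '\0';
    }
}

/* Feed one chunk; a tag may be split across any number of chunks.
   A partial tag still held back at end of input is not flushed. */
void scanner_feed(struct title_scanner *s, const char *buf, size_t len) {
    assert(s != NULL && (buf != NULL || len == 0));
    for (size_t i = 0; i < len; i++) {
        char c = buf[i];
        const char *tag = s->in_title ? TAG_CLOSE : TAG_OPEN;
        size_t tag_len = strlen(tag);

        if (c == tag[s->matched]) {
            if (++s->matched == tag_len) { /* whole tag seen */
                s->matched = 0;
                if (s->in_title)
                    emit(s, '\n');         /* end of a title, like the flex rule */
                s->in_title = !s->in_title;
            }
            continue;
        }
        /* Mismatch: inside a title, the held-back partial tag was just text. */
        if (s->in_title)
            for (size_t k = 0; k < s->matched; k++)
                emit(s, tag[k]);
        /* '<' only starts a tag, so it is the only char that can begin a new match. */
        if (c == '<') {
            s->matched = 1;
        } else {
            s->matched = 0;
            if (s->in_title)
                emit(s, c);
        }
    }
}
```

Feeding a zero-initialized `struct title_scanner` the chunks `"<html><ti"`, `"tle>Hello</ti"` and `"tle>"` leaves `"Hello\n"` in `out`, even though both tags were cut in half.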
> Released into the public domain as a trivial example program. Save it as
> 'titleparse.l' and run 'make titleparse'.
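%option noyywrap
%{
/* titleparse.l: print the text of each <title>...</title> element,
 * one per line, decoding the &lt; &gt; and &quot; entities. */
#include <stdio.h>
%}
%x TITLE
%%
<INITIAL>"<title>"   { BEGIN TITLE; }
<TITLE>"</title>"    { BEGIN INITIAL; putchar('\n'); }
<TITLE>"&lt;"        { putchar('<'); }
<TITLE>"&gt;"        { putchar('>'); }
<TITLE>"&quot;"      { putchar('"'); }
<TITLE>.|\n          { ECHO; /* copy all other title text through */ }
<INITIAL>.|\n        { /* discard everything outside a title */ }
%%
int main(void) {
    yylex();
    return 0;
}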
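The three entity rules in the `TITLE` state amount to a tiny decoding table. Outside flex, the same mapping might be sketched in plain C as below; `decode_entities` is a hypothetical helper, and like the flex rules it leaves any other `&...;` sequence untouched. Because every replacement is shorter than the entity it replaces, an output buffer of `strlen(in) + 1` bytes is always enough.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: decode &lt; &gt; &quot; in a NUL-terminated string.
   `out` must hold at least strlen(in) + 1 bytes. Returns the output length. */
size_t decode_entities(const char *in, char *out) {
    static const struct { const char *name; char ch; } table[] = {
        {"&lt;", '<'}, {"&gt;", '>'}, {"&quot;", '"'},
    };
    assert(in != NULL && out != NULL);
    size_t n = 0;
    while (*in) {
        int replaced = 0;
        for (size_t k = 0; k < sizeof table / sizeof table[0]; k++) {
            size_t len = strlen(table[k].name);
            if (strncmp(in, table[k].name, len) == 0) {
                out[n++] = table[k].ch; /* known entity: emit its character */
                in += len;
                replaced = 1;
                break;
            }
        }
        if (!replaced)
            out[n++] = *in++;           /* anything else is copied unchanged */
    }
    out[n] = '\0';
    return n;
}
```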