zetawoof wrote:
You'll do much better with a purpose-built lexical analyzer. If nothing else, this'll work when titles and tags span across buffers -
If they span buffers, the XML parsing libs choke as well, just for your information. I know; I've seen them do it. I have a version that does not choke on buffer spanning because it buffers underneath. I just posted that as an example.
Jeff
something which the previous program will choke on.
Released into the public domain as a trivial example program. Save as 'titleparse.l' and 'make titleparse'.
%option noyywrap
%{
#include <stdio.h>
%}
%x TITLE
%%
<INITIAL>"<title>" { BEGIN TITLE; }
<TITLE>"</title>" { BEGIN INITIAL; putchar('\n'); }
<TITLE>"&lt;" { putchar('<'); }
<TITLE>"&gt;" { putchar('>'); }
<TITLE>"&quot;" { putchar('"'); }
<INITIAL>.|\n { /* ignored */ }
%%
int main(int argc, char *argv[]) { yylex(); return 0; }
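
For anyone curious what handling tags that span buffers looks like without flex, here is a rough sketch in plain C. It is not Jeff's version and not what flex generates; the names title_scanner and title_feed are made up for illustration. The idea is to keep the match state in a struct so a tag split across two reads is still recognized. It skips the entity decoding that the flex rules above do.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical scanner state; it must survive between calls to title_feed(). */
struct title_scanner {
    int in_title;     /* nonzero while between <title> and </title> */
    size_t matched;   /* characters of the expected tag matched so far */
};

/*
 * Feed one chunk of input. Title text is written to `out`, one title per
 * line. A tag may be split across chunks at any position, because the
 * partial match is remembered in `s` rather than in a local variable.
 */
void title_feed(struct title_scanner *s, const char *buf, size_t len, FILE *out)
{
    assert(s != NULL && out != NULL);

    for (size_t i = 0; i < len; i++) {
        const char *tag = s->in_title ? "</title>" : "<title>";
        char c = buf[i];

        if (c == tag[s->matched]) {
            s->matched++;
            if (tag[s->matched] == '\0') {   /* the whole tag has been seen */
                if (s->in_title)
                    fputc('\n', out);
                s->in_title = !s->in_title;
                s->matched = 0;
            }
            continue;
        }

        /* Mismatch: the held-back prefix was ordinary text after all. */
        if (s->in_title)
            fwrite(tag, 1, s->matched, out);
        s->matched = 0;

        /* '<' occurs only at the start of either tag, so restarting is simple. */
        if (c == tag[0])
            s->matched = 1;
        else if (s->in_title)
            fputc(c, out);
    }
}
```

To use it, zero-initialize a struct title_scanner once, then call title_feed() on every chunk you read (from fread() or whatever), passing stdout as the output stream.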