I have a program which reads something like this:
  string text_in, text_out;
  tex_tree tree;
  int hash;

  text_in = read_whole (stdin);
  tree = parse (text_in);
  if (tree == null) {
      print ("failure " + text_in);
      return;
  }
  text_out = regenerate (tree);
  hash = md5 (text_out);
  if (!file_exists (filesystem_path + string(hash) + ".png")) {
      if (fork () == 0) {
          // validation ensures that latex won't fail too often
          call_latex_to_generate_that_file (text_out, filesystem_path + string(hash) + ".png");
          _exit (0);
      }
  }
  print ("<img alt=\"" + text_in + "\" src=\"" + http_path + string(hash) + ".png\">");
  return;
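To make the flow above concrete, here is a minimal runnable sketch in Python. parse() and regenerate() are stand-ins for the real TeX parser and re-serializer, filesystem_path and http_path are assumed values, and a placeholder file stands in for the fork + LaTeX call:

```python
import hashlib
import os
import tempfile

filesystem_path = tempfile.mkdtemp() + "/"  # assumed cache directory
http_path = "/math/"                        # assumed URL prefix

def parse(text):
    # Stand-in for the real TeX parser: reject empty input.
    return text.strip() or None

def regenerate(tree):
    # Stand-in for re-serializing the parse tree in canonical form
    # (here: just strip insignificant whitespace).
    return "".join(tree.split())

def render(text_in):
    tree = parse(text_in)
    if tree is None:
        return "failure " + text_in
    text_out = regenerate(tree)
    digest = hashlib.md5(text_out.encode()).hexdigest()
    png = filesystem_path + digest + ".png"
    if not os.path.exists(png):
        # The real program forks and runs LaTeX here; an empty
        # placeholder file stands in for the rendered PNG.
        open(png, "wb").close()
    return '<img alt="%s" src="%s%s.png">' % (text_in, http_path, digest)
```

Note that "x+y" and "x + y" regenerate to the same canonical TeX, so they map to the same cached PNG even though their alt texts differ.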
Now, the problem is: it can be an external program, but one fork/exec is required per TeX equation. Parsing and checking whether the image is already generated is very fast, so all these fork/execs are unnecessary overhead. I see a few solutions:

* Pass all equations at once, so only one fork/exec per article would be necessary.
* Keep a table of already-generated TeX - "input TeX, hash(output TeX)" - and let the PHP code check whether the image already exists without having to parse the TeX. The problem is that the database is quite slow, while these computations are rather simple.
* Keep a table of "hash(input TeX), hash(output TeX)". I don't know whether this would be significantly faster.
* Provide symlinks from (path + hash(input TeX) + ".png") to (path + hash(output TeX) + ".png"). This way PHP can check whether the image is already generated. If the image is already generated but from a different input TeX (for example, "x+y" is already generated and we ask for "x + y"), then the program just adds a symlink.
* Turn the program into a shared library. That was my initial idea, but it seems to be quite hard with PHP.
* Rewrite the program in PHP (it will be slow, unless PHP features a LALR parser generator).
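The symlink option can be sketched as follows (in Python rather than PHP, as an illustration only; normalize() is a hypothetical stand-in for parse() + regenerate()). The front end then only needs file_exists(md5(input) + ".png"), with no TeX parsing at all:

```python
import hashlib
import os
import tempfile

def normalize(tex):
    # Hypothetical stand-in for parse() + regenerate():
    # strip insignificant whitespace to get canonical TeX.
    return "".join(tex.split())

def ensure_image(tex, path):
    """Render (if needed) and return a path keyed by the *input* hash."""
    in_hash = hashlib.md5(tex.encode()).hexdigest()
    out_hash = hashlib.md5(normalize(tex).encode()).hexdigest()
    target = os.path.join(path, out_hash + ".png")
    link = os.path.join(path, in_hash + ".png")
    if not os.path.exists(target):
        # Placeholder for the actual LaTeX rendering step.
        open(target, "wb").close()
    if in_hash != out_hash and not os.path.lexists(link):
        # Input differs from canonical form: add a symlink so future
        # lookups by input hash succeed without parsing.
        os.symlink(target, link)
    return link

d = tempfile.mkdtemp()
a = ensure_image("x+y", d)
b = ensure_image("x + y", d)
assert os.path.realpath(a) == os.path.realpath(b)  # same rendered file
```

The second call creates only a symlink, since "x + y" normalizes to the already-rendered "x+y".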
wikitech-l@lists.wikimedia.org