This mail is quite long: I have 'included' some PHP code and a small 'article' at the end. I did it this way rather than attaching files...
-----Original Message----- From: Toby Bartels [SMTP:toby+wikipedia@math.ucr.edu] Date: Monday, 31 March 2003 07:02 To: wikitech-l@wikipedia.org Subject: Re: [Wikitech-l] Using TeX
Michel Mouly wrote:
However, when I looked at the details, I found (to my taste) too many limitations in the approach followed. So I wrote an extension to outputpage.php that replaces the call to texvc with calls to pdflatex and then ImageMagick convert. This has reached the stage of minimal functionality (translation and caching). It gives me full maths, without having to remember what is supported or not, or what is modified compared to LaTeX. In addition, we intend to use it for small music scores (using MusixTeX or possibly PMX), and I have some other uses of LaTeX in mind.
Is your version safe against DoS attacks with long scripts?
I confess my ignorance on the topic, but I'm ready to learn. Maybe it is relevant to mention that, contrary to texvc, the text to compile is not passed in the DOS command lines: the script writes it to files, and the DOS command lines themselves are pretty standard.
Is it safe against running TeX commands that access files?
Safety is a real problem, I agree. I have not looked at the question in any detail for LaTeX. The small application I'm trying to set up with some friends is (or will be; I still have this problem of a blank page returned after submit), I hope, sufficiently safe for reasons independent of the PHP scripts. Maybe that is naive...
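As a starting point for both questions (long scripts and file access), a pre-check on the submitted text could look like the sketch below. It is not part of the code in this mail: the function name, the length limit and the list of commands are all assumptions chosen for illustration, and the list is certainly not complete.

```php
<?php
// Hypothetical pre-check run before the text reaches pdflatex.
// The length limit and the command list are illustrative assumptions,
// not a complete security policy.
function looksSafeTeX($tex, $maxLength = 2000)
{
    // Crude guard against very long scripts.
    if (strlen($tex) > $maxLength) {
        return false;
    }
    // Commands that read or write files, or that help to hide such commands.
    $forbidden = array('input', 'include', 'openin', 'openout',
                       'read', 'write', 'csname', 'catcode');
    foreach ($forbidden as $name) {
        // Match \name only when no further letter follows,
        // so that a longer command such as \inputs is not rejected.
        if (preg_match('/\\\\' . $name . '(?![A-Za-z])/', $tex)) {
            return false;
        }
    }
    return true;
}
```

A call such as looksSafeTeX($tex) would come before the text is written to the .tex file. A list of this kind only lowers the risk; running pdflatex with restricted rights remains the safer protection.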
OTOH, does it allow inclusion of additional TeX packages (like Xypic) with a simple modification to the code opening up the package?
Well, this can already be done just by modifying the 'header.tex' file (included). That is what I will do for music. My idea (see the article) is to use different markups to choose between header files.
BTW, using drawing packages like Xypic is also on my agenda, see article.
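The idea of letting the markup choose the header file could be sketched as follows. This is only an illustration: the function and the markup names are hypothetical, and 'header.tex' is the only header file that actually exists in the code included in this mail.

```php
<?php
// Hypothetical table mapping a markup name to a header file.
// Only 'header.tex' exists in the code included in this mail; the other
// file names are made up for the example.
function headerFileFor($markup, $mathDirectory)
{
    $headers = array(
        'math'  => 'header.tex',
        'music' => 'header-music.tex',
        'xypic' => 'header-xypic.tex',
    );
    // Unknown markups fall back to the basic header.
    $file = isset($headers[$markup]) ? $headers[$markup] : 'header.tex';
    return "$mathDirectory/$file";
}
```

Adding a package would then mean adding one header file and one line in the table, without touching the compilation code.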
If so, then some of us (me and AxelBoldt, I guess) might well prefer your code to the current texvc -- at least when producing PNG output instead of HTML.
Please understand that I'm not trying to (re)open a debate. I've read the "math markup" page on meta.wikipedia (and music markup as well), and I'm aware of (some of) the drawbacks of the approach I followed. I mentioned what I did just to conform with the GPL: if there is any interest in this small piece of code (which I doubt, it's rather trivial!), just say so.
I'd like to see the diff to see just what you took away from texvc.
I include the relevant part of outputpage.php. It is basically scratch code, to check if the idea is viable. At least error handling requires further work. As you will see, I just 'mimicked' texvc (same input format, same output format) and kept the rest of the code.
I include header.tex (the very basic and trivial one), for completeness.
I also include a text I prepared with the possibility in mind of putting it somewhere in wikipedia or meta wikipedia; I'm too new to the business to decide whether it is valuable, or where exactly to put it. Consider it background information. It deals with the 'source' for images or sounds: one of my problems in my small project is music, and allowing others to modify scores is important. LaTeX provides those tools as well.
An important point, hinted at in the text, is that compiling the 'source' on the wikipedia site is not really necessary (though definitely useful). Security aspects should then be less of a problem, as should computing load (going through pdflatex takes quite long on my machine).
-- Toby
This is the beginning of outputpage.php; the rest is exactly as in the normal script. The modifications are:
1) first step, encapsulating the call to texvc;
2) second step, function 'fullTeX', with the same input format and the same output format as the function encapsulating texvc. The functionality is very slightly different: the '$' are included in the call to fullTeX, so that it can also be used for text outside math mode.
First comes the 4-line 'header.tex'.
<header.tex>
\documentclass[12pt]{article}
\pagestyle{empty}
\begin{document}
\LARGE
</header.tex>
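To make the role of the header explicit: the file sent to pdflatex is the header, followed by the user text, followed by the closing line. A minimal sketch of that assembly is given below; buildTeXDocument is a hypothetical helper, not a function of outputpage.php, and the sample formula is only an example.

```php
<?php
// Sketch of how a complete LaTeX document is assembled from the header,
// the user text and the closing line. buildTeXDocument is a hypothetical helper.
function buildTeXDocument($header, $tex)
{
    // Single quotes keep the backslash of \end literal.
    return rtrim($header) . "\n" . $tex . "\n" . '\end{document}' . "\n";
}

// The four lines of header.tex, written here as a PHP string.
$header = "\\documentclass[12pt]{article}\n\\pagestyle{empty}\n\\begin{document}\n\\LARGE\n";
echo buildTeXDocument($header, '$e^{i\pi}+1=0$');
```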
<code>
# See design.doc
function linkToMathImage ( $tex, $outputhash )
{
    global $wgMathPath;
    # the double quotes inside the string must be escaped
    return "<img src=\"".$wgMathPath."/".$outputhash.".png\" alt=\"".wfEscapeHTML($tex)."\">";
}

function texvc($tex)
{
    global $wgMathDirectory, $wgTmpDirectory, $wgInputEncoding;
    $cmd = "./math/texvc ".escapeshellarg($wgTmpDirectory)." ".
           escapeshellarg($wgMathDirectory)." ".escapeshellarg($tex)." ".escapeshellarg($wgInputEncoding);
    return(`$cmd`);
}

function fullTeX($tex)
# same output syntax as texvc; generation done via pdflatex and ImageMagick convert
{
    global $wgMathDirectory, $wgTmpDirectory, $wgInputEncoding;
    global $wgPdflatex, $wgConvert;

    # wgInputEncoding not taken into account, assumed to be compatible with pdflatex

    #if (!isset($wgPdflatex)) $wgPdflatex = "pdflatex -quiet -halt-on-error -interaction batchmode -output-directory $wgTmpDirectory";
    # chdir() ok, while -output_directory leads to pb (pdflatex can't find its own .aux!!)
    if (!isset($wgPdflatex)) $wgPdflatex = "pdflatex -quiet -halt-on-error -interaction batchmode";
    if (!isset($wgConvert)) $wgConvert = 'C:\Programs\ImageMagick-5.5.6-Q16\convert';

    $headerfilename = "$wgMathDirectory/header.tex";
    #$headerfilename = 'header.tex';
    $header = fopen($headerfilename, 'r');
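
    $md5 = md5($tex);
    $filename = "$wgTmpDirectory/$md5";

    # write header + user text + closing line to the temporary .tex file
    $fp = fopen($filename.".tex", 'w+');
    fwrite($fp, fread($header, filesize($headerfilename)));
    fclose($header);
    fwrite($fp, "$tex\n");
    fwrite($fp, '\end{document}');  # single quotes keep the backslash literal
    fclose($fp);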

    $backupcwd = getcwd();
    chdir($wgTmpDirectory);
    $cmd = "$wgPdflatex $filename.tex";
    $res = `$cmd`;
    # todo: test if error; OK if empty (thanks to option -quiet)

    #$cmd = "$wgConvert $filename.pdf -trim -bordercolor white -border 5 x 5 $wgMathDirectory/$md5.png";
    $cmd = "$wgConvert $filename.pdf -trim $wgMathDirectory/$md5.png";
    $res = `$cmd`;
    # todo: test if error; OK if empty

    # todo: delete temporary files (should be kept for debug and error)
    chdir($backupcwd);  # don't know if needed, certainly cleaner
    return ("+$md5");
}

function renderMath( $tex )
{
    global $wgUser, $wgMathDirectory, $wgTmpDirectory, $wgInputEncoding;
    $mf = wfMsg( "math_failure" );
    $munk = wfMsg( "math_unknown_error" );

    $fname = "renderMath";

    $math = $wgUser->getOption("math");
    if ($math == 3)
        return ('$ '.wfEscapeHTML($tex).' $');

    $md5 = md5($tex);
    $md5_sql = mysql_escape_string(pack("H32", $md5));
    if ($math == 0)
        $sql = "SELECT math_outputhash FROM math WHERE math_inputhash = '".$md5_sql."'";
    else
        $sql = "SELECT math_outputhash,math_html_conservativeness,math_html FROM math WHERE math_inputhash = '".$md5_sql."'";

    $res = wfQuery( $sql, $fname );
    if ( wfNumRows( $res ) == 0 )
    {
        # $cmd = "./math/texvc ".escapeshellarg($wgTmpDirectory)." ".
        #        escapeshellarg($wgMathDirectory)." ".escapeshellarg($tex)." ".escapeshellarg($wgInputEncoding);
        # $contents = `$cmd`;

        ### $contents = texvc($tex);
        $contents = fullTeX('$'.$tex.'$');

        if (strlen($contents) == 0)
            return "<b>".$mf." (".$munk."): ".wfEscapeHTML($tex)."</b>";
        $retval = substr ($contents, 0, 1);
        if (($retval == "C") || ($retval == "M") || ($retval == "L"))
        {
            if ($retval == "C")
<\code>
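The 'todo' items on error testing, and the caching mentioned earlier, could be handled along the following lines. This is a sketch and not part of the code above: runStep and isCached are hypothetical helpers, and the sketch assumes that pdflatex and convert signal a failure through a non-zero exit status, where the code above relies on empty output.

```php
<?php
// Hypothetical helper covering the "test if error" todo items.
// exec() fills $output with the lines printed and $status with the exit code.
function runStep($cmd)
{
    $output = array();
    $status = 0;
    // 2>&1 merges the error messages into the captured output.
    exec($cmd . ' 2>&1', $output, $status);
    return array('ok' => ($status === 0), 'log' => implode("\n", $output));
}

// Returns true if the PNG is already in the math directory, so that
// the pdflatex and convert steps can be skipped.
function isCached($mathDirectory, $md5)
{
    return file_exists("$mathDirectory/$md5.png");
}
```

In fullTeX, each backtick call would become a call to runStep, and the 'log' entry could be kept for debugging when 'ok' is false.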
<article Non text elements in Wikipedia>
This discusses how to handle non-text elements in wikipedia pages, such as images, sounds, or math formulae. More precisely, it advocates making the 'source code' of such elements available, so that they can be modified as easily (almost!) as the text.
Similar ideas have been discussed in the past (math markup, SVG support, chess talk page, ...). I have not looked everywhere (by far!), so the ideas put forward here are likely not original! If they are, the key aspect is that the proposed scheme is general, not specific to one domain, whether it be math formulae, chessboards or vectorised images.
The present state
Documents can already include different types of material, namely text, images and sounds.
For texts, a 'source file' written in a special syntax is uploaded, and is 'compiled' (i.e., translated into HTML) by the wikipedia site.
Images and sounds are simply uploaded. They are either included in the text (images) or available for links (images and sounds).
There is an intermediate case, that of mathematical formulae. They are included in the visible page as images, but the 'source' is uploaded and compiled by the site. Another peculiarity is that the 'source' is embedded in the text 'source'. And still another one is that a special syntax has to be used (derived from TeX, but not TeX).
That images and sounds are uploaded 'as is' is, IMHO, in contradiction with the general goal of wikipedia, in particular ease of modification.
In many cases, sounds and images have been, or could be, generated from a 'source'. Making this 'source' available would have many advantages:
* it would allow for free modification, in conformance with the general spirit;
* it would allow a more or less automatic later conversion to another format (e.g., extensions of HTML);
* it would provide ready-to-use examples for other images/sounds/maths.
Let us take an example: chess positions. At present these are done with PNG images. They are quite nice, I agree, but how can they be modified? How can new ones be added in the same style as the existing ones? Simply because the images are difficult to reproduce, a set of pages becomes difficult to extend. Either a different drawing style is used, and the result is not professional, or somebody becomes an unavoidable intermediary! The talk page of the chess article shows such concerns.
Imagine now a simple source code to draw chess positions (this exists in LaTeX). The recipe for creating new drawings is obvious and the style is consistent. No blocking.
(To complete the example, source code for a chess position with LaTeX could be (taken from LaTeX graphics companion):
\usepackage{chess}
\board{B* * * KR}
{*r* * *R*}
{* b p p}
{ *P*k*P*}
{*p* P *p}
{ P *P* P}
{* *N*N* }
OK, this looks a bit esoteric, but it is a simple matrix, with uppercase for white and lowercase for black, p for pawn, k for king, n for knight, and so on. The result is a very nice and professional-looking drawing. Don't tell me the source is more esoteric or more difficult to use than, say, HTML.)
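To show how little there is to the notation, the letter convention just described can be decoded in a few lines. The sketch below is only an illustration: describePiece is a hypothetical function, and only p, k and n are spelled out in the text, so q, r and b are assumed to follow the usual chess initials.

```php
<?php
// Decode one piece letter from a board row. Only p, k and n are spelled out
// in the text; q, r and b are assumed to follow the usual chess initials.
function describePiece($letter)
{
    $names = array('p' => 'pawn', 'k' => 'king', 'n' => 'knight',
                   'q' => 'queen', 'r' => 'rook', 'b' => 'bishop');
    $key = strtolower($letter);
    if (!isset($names[$key])) {
        return null; // empty square marker or unknown character
    }
    // Uppercase means white, lowercase means black.
    $colour = ctype_upper($letter) ? 'white' : 'black';
    return "$colour {$names[$key]}";
}
```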
How to upload the source?
The case of math formulae provides one approach: to embed the source in the page text, with a special markup.
This then raises the issue of generating the 'compiled' version. In the case of math, this is done by the site. This offers users the advantage that they don't have to install anything. On the other hand, it requires that the generation software be installed on the site, thus limiting freedom, and it consumes some site resources (who considers that the response time is short enough??), in particular in the case of successive corrections, e.g., to correct syntax errors.
The other possibility is to ask the user to upload both the source and the result. This is more complex for the user, mainly because it requires the software, but it allows for checking prior to upload (less load on the site and possibly, all things considered, fewer operations for the user).
In practice (for the user), this consists of extending the upload page to include:
* the result;
* (optional) the source;
* when not obvious, a description of the 'compiling' method (e.g., texvc, pdflatex with such or such header then imagemagick convert, povray 3.5).
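On the site side, the extended upload could be checked as sketched below. Nothing of this exists in the current scripts: checkUpload and the field names 'result', 'source' and 'method' are hypothetical, chosen only to mirror the three items of the list.

```php
<?php
// Hypothetical check of an extended upload: the result is mandatory,
// the source and the compiling method are optional. Field names are
// made up for the example.
function checkUpload($upload)
{
    $errors = array();
    if (empty($upload['result'])) {
        $errors[] = 'the result (image or sound file) is missing';
    }
    // A compiling method only makes sense together with a source.
    if (!empty($upload['method']) && empty($upload['source'])) {
        $errors[] = 'a compiling method was given without a source';
    }
    return $errors;
}
```

An empty list of errors means the upload can be accepted as it is.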
Conversely, clicking on a drawing (for instance) opens a page more or less like the present one, extended with the source and the compiling indications, plus the possibility to edit the source code (exactly as for a text page).
Embedding in the page text can still be a possibility (better for math than for images for instance), but either has to be limited to what the site can compile, or has to be coupled with the upload of the result.
Which formats are acceptable?
Ideally, the source format should be such that:
* it is in plain text;
* it is public, free of copyrights or other constraints;
* it is already in use;
* at least one free version of a 'compiler' is easily available, easy to install, and easy to use for as many platforms as possible;
* it must be as secure as possible (to prevent carrying nasty code).
IMHO, texvc does not respect all the conditions.
Examples (in my limited knowledge) that do respect them include:
* music (scores): lilypond, musixtex;
* music (sound): midi;
* math: LaTeX;
* images: povray (security??), drawing packages in LaTeX.
Browsing through the LaTeX drawing packages, one can see the potential richness of such a scheme. One could mention, in no particular order, board games, card games, graphs, Feynman diagrams, chemical diagrams, electrical diagrams, ...
Should the list of formats be explicitly prescribed?
IMO, no. Wikipedia is assumed to be self-regulating. If a format is considered wrong, somebody can transcribe it into something more appropriate.
<\article>