Nick,
Your idea assumes that the "lag" problem is due to overloading a single machine, which plays two roles: database backend and web server. So, if we divide the work among two or more machines, you expect faster throughput. Right?
(I'm just repeating the obvious to make sure that what's obvious to me is what you really meant!)
I guess if we all pitch in $50 each we can buy another machine. Where should I send my money?
Ed Poor
As per Mav's request, the "View article" link is now context-dependent
(code in CVS). "View Wikipedia page" for Wikipedia: pages was too long,
so I use "View meta page" instead. I know we already have meta.wiki, but
nobody suggested anything better (even though I asked), and technically,
these are meta pages (pages about the encyclopedia).
I have included a German translation, but this *will* break translations
in other Wikipedias (i.e., English text will be shown instead). To fix this,
text for articlepage, imagepage, wikipediapage and userpage needs to be added
to Language??.php in place of the text for subjectpage.
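A minimal sketch, in Python rather than the PHP of the real code, of how a namespace-dependent label with an English fallback could behave. The dictionaries, the namespace-to-key mapping and `view_link_text` are hypothetical; only the message keys and the "View article" / "View meta page" strings come from the mail. It illustrates why wikis whose language file still only defines subjectpage would show English text.

```python
# Subject page namespace -> message key for the "view" link (assumed mapping).
NAMESPACE_KEYS = {
    "": "articlepage",
    "Image": "imagepage",
    "Wikipedia": "wikipediapage",
    "User": "userpage",
}

# English defaults. "View article" and "View meta page" come from the mail;
# the other two strings are placeholders.
ENGLISH = {
    "articlepage": "View article",
    "imagepage": "View image page",
    "wikipediapage": "View meta page",
    "userpage": "View user page",
}


def view_link_text(namespace: str, language_messages: dict) -> str:
    """Label for the link back to the subject page.

    Unknown namespaces are treated like articles. If the language file
    does not define the new key yet, the English text is used instead.
    """
    key = NAMESPACE_KEYS.get(namespace, "articlepage")
    return language_messages.get(key, ENGLISH[key])
```

For example, a language file that only contains the old subjectpage text yields "View meta page" for a Wikipedia: page.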
Regards,
Erik
--
FOKUS - Fraunhofer Institute for Open Communication Systems
Project BerliOS - http://www.berlios.de
I made texvc render nicely formatted HTML for sums, products and integrals,
using tables.
Such HTML code would be insane to write by hand.
Does anybody have an idea how to render fractions in HTML?
Tables with <hr> work, but they aren't really nice.
Maybe they would be if <hr> could be made to produce a simpler line...
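One possible alternative to <hr> is a two-row table whose numerator cell carries a bottom border that acts as the fraction bar. A minimal sketch in Python, assuming the browser honours the inline CSS (support in older browsers varied); `html_fraction` is a hypothetical helper, not part of texvc.

```python
import html


def html_fraction(numerator: str, denominator: str) -> str:
    """Render numerator over denominator as a one-column, two-row table.

    The fraction bar is the 1px bottom border of the numerator cell,
    which is thinner and tighter than an <hr>.
    """
    num = html.escape(numerator)
    den = html.escape(denominator)
    return (
        '<table style="display:inline-table; border-collapse:collapse">'
        '<tr><td style="border-bottom:1px solid; text-align:center">'
        f"{num}</td></tr>"
        f'<tr><td style="text-align:center">{den}</td></tr>'
        "</table>"
    )
```

Both cells are escaped, so markup such as "a<b" in the formula cannot leak into the page.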
Have fun.
Here is the first version of the TeX rendering extension for Wikipedia.
It's not production code yet.
Please comment.
= How it works =
A new preference is introduced which says whether to:
* always render images as PNGs
* render them as HTML if they are simple enough, or as PNGs otherwise.
* leave them as pseudo-TeX (mainly for text browsers where neither PNG
nor HTML rendering would be visible)
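The three preference values above can be sketched as a small dispatch. This is an illustration under assumptions: `MathMode`, `choose_output` and the PNG URL parameter are hypothetical names, and "simple enough" is modelled as "an HTML rendering exists", matching the convention that math_html is "" when the formula is too difficult for HTML.

```python
import html
from enum import Enum


class MathMode(Enum):
    PNG_ONLY = "png"         # always a PNG (the default, see ISSUE 1)
    HTML_IF_SIMPLE = "html"  # HTML when simple enough, PNG otherwise
    SOURCE = "source"        # leave as pseudo-TeX for text browsers


def choose_output(mode: MathMode, tex: str, html_rendering: str,
                  png_url: str) -> str:
    """Pick what to emit for one <math> element.

    `html_rendering` is "" when the formula is too complex for HTML.
    """
    if mode is MathMode.SOURCE:
        return html.escape(tex)
    if mode is MathMode.HTML_IF_SIMPLE and html_rendering:
        return html_rendering
    return f'<img src="{png_url}" alt="{html.escape(tex)}">'
```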
ISSUE 1: While HTML reduces bandwidth, it is much uglier, so the default is PNG-only.
ISSUE 2: PNGs are rendered with a font that is "a bit too big". That's on purpose.
A bit too big works well at high and medium resolutions, and is still readable
at low resolution. But "a bit too small" at high resolution would be very
hard to read.
A new table is also introduced:
CREATE TABLE math (
math_inputhash char(32) NOT NULL,
math_outputhash char(32) NOT NULL,
math_html text NOT NULL,
UNIQUE KEY math_inputhash (math_inputhash)
);
math_inputhash is the MD5 of the input markup, math_outputhash
is the MD5 of the output markup, and math_html is the HTML rendering, or ""
if the formula is too difficult for HTML.
ISSUE 3: MD5 should be stored in binary in the final version.
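ISSUE 3 would halve the key size: an MD5 digest is 16 bytes, which the current char(32) columns store as 32 hex characters. A short Python illustration; `md5_forms` is a hypothetical helper, and encoding the markup as UTF-8 before hashing is an assumption.

```python
import hashlib
from typing import Tuple


def md5_forms(markup: str) -> Tuple[str, bytes]:
    """Return the MD5 of `markup` as 32 hex characters (the current
    char(32) columns) and as 16 raw bytes (the proposed binary form)."""
    digest = hashlib.md5(markup.encode("utf-8"))
    return digest.hexdigest(), digest.digest()
```

The conversion is lossless in both directions: `bytes.fromhex(hex_form)` gives back the raw digest, so switching the storage format does not change which rows match.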
OutputPage.php calls renderMath() for every occurrence of <math></math>
in the page text. If the user has chosen pseudo-TeX, that's the end.
Otherwise it checks the database to see whether the formula has already been
rendered. If it has, it either uses the stored HTML or generates a link to the image.
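The lookup flow just described can be sketched with a dictionary standing in for the math table. Everything here is hypothetical: `render_math` in Python is not the PHP renderMath(), `run_texvc` is an assumed wrapper returning (output hash, HTML) or None on failure, and "/math/" and the error text are placeholders.

```python
import hashlib
import html
from typing import Callable, Dict, Optional, Tuple

# Stand-in for the `math` table: math_inputhash -> (math_outputhash, math_html).
MathTable = Dict[str, Tuple[str, str]]

# Hypothetical texvc wrapper: returns (output_hash, html) or None on failure.
TexvcRunner = Callable[[str], Optional[Tuple[str, str]]]


def render_math(tex: str, table: MathTable, run_texvc: TexvcRunner,
                prefer_html: bool) -> str:
    """Return markup for one <math> element, calling texvc only on a miss.

    The pseudo-TeX preference is assumed to be handled by the caller.
    """
    input_hash = hashlib.md5(tex.encode("utf-8")).hexdigest()
    row = table.get(input_hash)
    if row is None:
        row = run_texvc(tex)
        if row is None:
            # Failures are not cached (ISSUE 9).
            return "<strong>Failed to parse: " + html.escape(tex) + "</strong>"
        table[input_hash] = row
    output_hash, html_rendering = row
    if prefer_html and html_rendering:
        return html_rendering
    # "/math/" is a placeholder; ISSUE 4 says the directory should be configurable.
    return f'<img src="/math/{output_hash}.png" alt="{html.escape(tex)}">'
```

A second call with the same markup is answered from the table without running texvc again.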
ISSUE 4: The directory for math images should be configurable, and it should
also be known to texvc (command-line option? compile-time option?).
It should not be the upload directory.
ISSUE 5: Maybe it should use a/ab/ab*.png like other images. Or maybe
the Wikipedia server should move to ReiserFS.
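Because the output file name is already an MD5, the a/ab/ layout could use its first one and two hex digits directly, giving 16 first-level and 256 leaf directories. A minimal sketch; `math_image_path` is hypothetical and the root directory is an assumed parameter.

```python
import hashlib
import os


def math_image_path(root: str, output_hash: str) -> str:
    """Place <hash>.png under root/<h[0]>/<h[0:2]>/ so that each leaf
    directory holds roughly 1/256 of all math images."""
    return os.path.join(root, output_hash[0], output_hash[:2],
                        output_hash + ".png")
```

For instance, a hash starting with "ab" ends up in math/a/ab/.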
ISSUE 6: Images should have an ALT= attribute.
If the image/HTML hasn't been generated yet, texvc is called. If it fails, an
error message is generated.
ISSUE 7: This message should be localized.
ISSUE 8: texvc shouldn't be in cgi-bin, or else care should be taken that it
can't be called with any evil options.
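Since texvc receives the TeX as a command-line argument, the formula has to reach it as exactly one shell word. A sketch of that quoting in Python using the standard `shlex.quote`; `texvc_command_line` and the "./texvc" path are hypothetical.

```python
import shlex


def texvc_command_line(texvc_path: str, tex: str) -> str:
    """Build a /bin/sh command line that passes `tex` as one argument.

    shlex.quote wraps anything containing shell metacharacters in single
    quotes, so ; | $ ` and spaces in the formula are not interpreted by
    the shell.
    """
    return f"{shlex.quote(texvc_path)} {shlex.quote(tex)}"
```

Quoting only protects against the shell. Whether a formula starting with "-" could be read as an option depends on how texvc parses its arguments. Passing an argument list to `subprocess.run` without `shell=True` avoids the shell altogether.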
Depending on texvc's return value, the results are generated and put into the
table for caching.
ISSUE 9: Failures are not cached. In the final version they should be cached,
but cleared on every upgrade of texvc (which may support more
TeX than the previous version).
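ISSUE 9's policy (cache failures, but forget them when texvc is upgraded) can be sketched as a set of failed input hashes tied to a texvc version string. The class, its methods and the idea of a version string are assumptions for illustration.

```python
from typing import Set


class FailureCache:
    """Remember which inputs texvc rejected, until texvc changes."""

    def __init__(self, texvc_version: str) -> None:
        self.texvc_version = texvc_version
        self._failed: Set[str] = set()  # input hashes that failed

    def record_failure(self, input_hash: str) -> None:
        self._failed.add(input_hash)

    def is_known_failure(self, input_hash: str) -> bool:
        return input_hash in self._failed

    def on_texvc_upgrade(self, new_version: str) -> None:
        """A new texvc may support more TeX, so drop all cached failures."""
        if new_version != self.texvc_version:
            self._failed.clear()
            self.texvc_version = new_version
```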
Currently texvc takes its input as the first argument.
ISSUE 10: I'd rather use stdin, but proc_open (popen2 for Perl hackers) appears
only in PHP 4.3, and PHP 4.2 is still the standard.
Then it LALR-parses the input. What it parses is not real TeX: if HTML has an
entity &foo; and TeX has no \foo, this pseudo-TeX will accept \foo anyway. This
makes it very easy to use.
Then the input is standardized, and the MD5 of the standardized version is computed.
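Standardizing before hashing is what lets "x + y" and "x+y" share one PNG. texvc rebuilds the formula from its parse tree; the sketch below swaps in a much cruder token-level normalization purely to show the idea. The regex, `standardize` and `output_hash` are hypothetical, and this version ignores cases such as spaces inside \mbox{...} that real standardization must preserve.

```python
import hashlib
import re

# Control words (\alpha), control symbols (\{), or any single non-space char.
TOKEN = re.compile(r"\\[A-Za-z]+|\\.|\S")
CONTROL_WORD = re.compile(r"\\[A-Za-z]+")


def standardize(tex: str) -> str:
    """Drop whitespace between tokens, keeping one space only where a
    control word is followed by a letter (so "\\alpha x" stays valid)."""
    out = []
    for tok in TOKEN.findall(tex):
        if out and CONTROL_WORD.fullmatch(out[-1]) and tok[0].isalpha():
            out.append(" ")
        out.append(tok)
    return "".join(out)


def output_hash(tex: str) -> str:
    """MD5 of the standardized form, used to name the output PNG."""
    return hashlib.md5(standardize(tex).encode("utf-8")).hexdigest()
```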
ISSUE 11: The race condition between 2 runs of texvc trying to generate the same PNG
will have to be investigated.
ISSUE 12: texvc should check here whether the output PNG already exists (HTML is fast
to generate, so it doesn't hurt to regenerate it). This may happen not only
in case of a race condition, but also if the PNG was generated from different
input markup (say from "x + y", while we are now doing it from "x+y").
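ISSUES 11 and 12 together suggest "skip if the PNG exists, otherwise write it atomically". A sketch in Python, assuming the renderer is deterministic so two racing runs produce identical files; `write_png_once` and the `render` callback are hypothetical. `os.replace` is atomic on POSIX filesystems, so a reader never sees a half-written PNG.

```python
import os
import tempfile
from typing import Callable


def write_png_once(path: str, render: Callable[[], bytes]) -> bool:
    """Create `path` unless it already exists; return True if rendered here.

    The image is written to a temporary file in the same directory and then
    moved into place. If two processes both pass the exists() check, both
    render, and the later rename replaces the file with identical content.
    """
    if os.path.exists(path):
        return False
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(render())
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # do not leave a stray temporary file behind
        raise
    return True
```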
Then it prints the MD5 and the HTML (if any) on stdout.
ISSUE 13: PHP should not have to wait for texvc to finish from this point on. texvc should
probably fork() here.
Now latex, dvips and convert (which in turn uses ghostscript) are called.
ISSUE 14: LaTeX creates some temporary files. They should be created in some
tmp/ directory, not in the current directory.
Version 2:
* it doesn't generate PNG if it already exists
* it supports more TeX
* TeX used as a shell argument is escaped properly now
ISSUE 18:
Escaping from further passes is very important.
The following things must be protected from interpretation:
* TeX markup, if the user chooses "leave it as TeX" mode
* contents of alt= attributes
* messages saying that some TeX couldn't be generated.
Should I just generate unique character strings like <nowiki> does?
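The unique-string approach could look roughly like this: each protected fragment is swapped for a random marker before the later wiki passes run, and swapped back at the very end. This is a sketch under assumptions. The class name, the marker format and the \x07 delimiters are hypothetical, and the existing <nowiki> code may work differently.

```python
import secrets
from typing import Dict


class ProtectedFragments:
    """Hide fragments from later wiki passes behind unique markers."""

    def __init__(self) -> None:
        # A random prefix means page text cannot forge a marker.
        self._prefix = "\x07UNIQ" + secrets.token_hex(8)
        self._saved: Dict[str, str] = {}

    def hide(self, fragment: str) -> str:
        # The trailing \x07 keeps marker -1 from matching inside marker -10.
        marker = f"{self._prefix}-{len(self._saved)}\x07"
        self._saved[marker] = fragment
        return marker

    def restore(self, text: str) -> str:
        for marker, fragment in self._saved.items():
            text = text.replace(marker, fragment)
        return text
```

A later pass that turns '' into italics never sees the hidden fragment, because at that point only the marker is in the text.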
ISSUE 19:
I'm not sure if \mbox is the right thing to have.
Isn't it meant for things other than math?
I'm no TeX expert, so please comment.
ISSUE 20:
How ugly an HTML rendering will we allow?
Sums and integrals in HTML will be readable but extremely ugly.
Well, we could implement more sophisticated HTML rendering algorithms,
for example using tables for that.
But I'm not sure if it's worth the effort.
ISSUE 21:
I'm using:
http://www.fi.uib.no/Fysisk/Teori/KURS/WRK/TeX/symALL.html
http://pl.wikipedia.org/wiki/Encje_HTML
to get a list of things we should support.
Is there a better list?
From Village pump on en.wiki:
Getting DB errors on my watchlist as follows:
A database query syntax error has occurred. This could be because of an
illegal search query (see Searching Wikipedia), or it may indicate a bug
in the software. The last attempted database query was:
SELECT DISTINCT wl_page,talk.cur_id AS id,talk.cur_namespace AS
namespace,talk.cur_title AS title, talk.cur_user AS
user,talk.cur_comment AS comment,talk.cur_user_text AS user_text,
talk.cur_timestamp AS timestamp,talk.cur_minor_edit AS
minor_edit,talk.cur_is_new AS is_new FROM cur as page, cur as talk,
watchlist WHERE wl_user=5862 AND wl_page=page.cur_id AND
page.cur_title=talk.cur_title AND talk.cur_namespace |
1=page.cur_namespace | 1 ORDER BY talk.cur_timestamp DESC LIMIT 50
from within function "wfSpecialWatchlist". MySQL returned error "1030:
Got error 127 from table handler".
Help! -Martin
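The join condition in the quoted query, `talk.cur_namespace | 1 = page.cur_namespace | 1`, relies on the namespace numbering: subject namespaces are even (0 for articles, 2 for User, 4 for Wikipedia) and each talk namespace is the next odd number, so setting the low bit maps a page and its talk page to the same value. In MySQL, `|` binds more tightly than `=`. A tiny Python illustration; the function names are hypothetical.

```python
def talk_namespace(ns: int) -> int:
    """Setting the low bit turns an even subject namespace into its talk
    namespace and leaves an odd talk namespace unchanged."""
    return ns | 1


def same_subject(ns_a: int, ns_b: int) -> bool:
    """Python equivalent of the SQL condition `a | 1 = b | 1`."""
    return talk_namespace(ns_a) == talk_namespace(ns_b)
```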
Brion,
Thanks for explaining the page lifecycle. I have clipped this diagram for reference.
> Here's a diagram of what exists in what tables over the lifetime of such
> an event:
>
> Page creation:
> rev A -> cur
>
> Later edit:
> rev B -> cur
> rev A -> old
>
> Deletion
> rev B -> archive (hidden)
> rev A -> archive (hidden)
>
> New creation with same title:
> rev C -> cur
> rev B -- archive (hidden)
> rev A -- archive (hidden)
>
> Later edit:
> rev D -> cur
> rev C -> old
> rev B -- archive (hidden)
> rev A -- archive (hidden)
>
> Restoration of deleted revisions:
> rev D -- cur
> rev C -- old
> rev B -> old
> rev A -> old
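The diagram can be replayed as a toy model in which lists stand in for the cur, old and archive tables, kept in chronological order (oldest first). The class and method names are hypothetical. Restoring when no current revision exists is not covered by the diagram; promoting the newest archived revision to cur in that case is an assumption.

```python
from typing import List, Optional


class PageTables:
    """Toy model of which table holds each revision of one title."""

    def __init__(self) -> None:
        self.cur: Optional[str] = None
        self.old: List[str] = []
        self.archive: List[str] = []

    def edit(self, rev: str) -> None:
        """Creation or edit: the previous current revision moves to old."""
        if self.cur is not None:
            self.old.append(self.cur)
        self.cur = rev

    def delete(self) -> None:
        """Deletion hides the whole history, current revision last."""
        self.archive.extend(self.old)
        if self.cur is not None:
            self.archive.append(self.cur)
        self.cur, self.old = None, []

    def restore(self) -> None:
        """Archived revisions predate any new ones, so they go first in old."""
        if self.cur is None and self.archive:
            self.cur = self.archive.pop()  # assumption: newest becomes cur
        self.old = self.archive + self.old
        self.archive = []
```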
Gratefully,
Ed Poor
What is the desired behavior? If someone creates an article that was
previously deleted, should the article's previous history be restored?
--
Geek House Productions, Ltd.
Providing Unix & Internet Contracting and Consulting,
QA Testing, Technical Documentation, Systems Design & Implementation,
General Programming, E-commerce, Web & Mail Services since 1998
Phone: 604-435-1205
Email: djw(a)reactor-core.org
Webpage: http://reactor-core.org
Address: 2459 E 41st Ave, Vancouver, BC V5R2W2
Namespaces appear to exist so that actual encyclopedia articles can be
distinguished from everything else. Instead of namespaces, could
Wikipedia live with a flag that said "This is an article" for each
article?
Jonathan