This version fixes a few minor problems and adds support for even more TeX.
Now it compiles more than 99% of the legal equations from the PlanetMath corpus.
= TeX =
Of 6302 tests:
* 5574 (88.4%) passed
  * in 5136 (92.1%) of those cases it was able to produce HTML
  * in 438 (7.9%) it couldn't
* 1 (0.0%) failed due to lexer
* 11 (0.2%) failed due to parser
* 716 (11.3%) failed due to 188 unknown \-codes
  * 179 \-codes in 684 (10.8%) equations were illegal (couldn't be
    compiled by LaTeX with the ams* packages)
  * 9 \-codes in 35 (0.6%) equations were legal
* So 47 (0.7%) failed for reasons other than genuinely illegal \-codes
* So of the 5618 equations not containing genuinely illegal \-codes,
  texvc could compile 5571, which is 99.16%, more than the promised 99%.
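Spelled out as arithmetic:

    \frac{5571}{5618} = \frac{(6302 - 684) - 47}{6302 - 684} \approx 0.9916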
= Performance =
It got some 20% slower since version 12. Yet it's still fast as hell.
Should you ever want to use texvc to process a few megabytes of TeX code
per second, there are many obvious ways performance could be improved
further.
real 0m0.244s
user 0m0.210s
sys 0m0.000s

real 0m0.240s
user 0m0.240s
sys 0m0.000s

real 0m0.236s
user 0m0.240s
sys 0m0.000s
Main changes:
* matrices support
* more characters allowed inside \mbox
* more AMS symbols supported (example below)
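For example (my own test case, not one from the corpus), something like
this now goes through, combining a matrix environment with an AMS symbol:

    \begin{matrix} a & b \\ c & d \end{matrix}, \qquad x \lessapprox y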
I think texvc is ready for Wikipedia now, but I'd prefer if it
received more testing before that happens.
The only thing that still has to be done is to integrate math.sql
with the rest of the SQL code for creating/upgrading the Wikipedia database.
Wikipedia.org suddenly became inaccessible a few minutes ago. This was shortly after I locked a page, hoping to quell an NPOV dispute.
I'm not assuming that NPOV vandals are attacking the server in response to my action, but I'm not ruling it out either :-(
Anyway, Brion, could you please restart the server or something?
Ed Poor <-- writing at 11:24 A.M. Tuesday, U.S. East Coast timezone
Well, how about supplying a complete printout of all the queries we
currently use? You could start with the queries needed to view a typical
page.
Some pages have no links to other pages. I would *guess* these load
faster on average than pages with links.
Ed Poor
-----Original Message-----
From: Jonathan Walther [mailto:krooger@debian.org]
Sent: Tuesday, December 10, 2002 8:44 AM
To: wikitech-l@wikipedia.org
Subject: Re: [Wikitech-l] Languages & namespaces
On Mon, Dec 09, 2002 at 01:53:19PM -0800, Brion Vibber wrote:
>Surely it would take two:
>*fetch current page's contents, view count, restrictions, and last
>edited date
>*check existence, size/type of all linked pages
Add a third one: we need to know if the user has read his Talk page
since it was last modified.
For the second one, don't we just need to check the existence and type
of the linked pages? Why bother about the size?
We do also need to get the size of the primary page.
What do the "restrictions" on a page consist of? Whether it is
readable or editable?
Still, that is a 2/3 savings.
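For concreteness, here is a rough sketch of those two batched queries
against the Phase III cur table; the column names are my guesses from
memory, not a printout of the real code, and it assumes an open MySQL
connection:

    <?php
    // 1. Everything about the current page in one query
    //    (text, view count, restrictions, last edit):
    $page = mysql_fetch_object( mysql_query(
        "SELECT cur_text, cur_counter, cur_restrictions, cur_timestamp
           FROM cur
          WHERE cur_namespace = 0 AND cur_title = 'Sandbox'" ) );

    // 2. Existence/type of every linked page in one batched query,
    //    instead of one query per link:
    $links = mysql_query(
        "SELECT cur_namespace, cur_title, cur_is_redirect
           FROM cur
          WHERE cur_namespace = 0
            AND cur_title IN ('Foo', 'Bar', 'Baz')" );
    ?>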
Jonathan
--
Geek House Productions, Ltd.
Providing Unix & Internet Contracting and Consulting,
QA Testing, Technical Documentation, Systems Design & Implementation,
General Programming, E-commerce, Web & Mail Services since 1998
Phone: 604-435-1205
Email: djw@reactor-core.org
Webpage: http://reactor-core.org
Address: 2459 E 41st Ave, Vancouver, BC V5R2W2
I have been asked by Ben-Zin on the de wikipedia for a special page that
lists articles in one language that link to another, but are not linked
back.
This reminds me of an issue I raised some time ago here: AFAIK, the
interlanguage links are *not* stored anywhere except in the article
body. While this is sufficient for display of the links on viewing,
every database search for these links needs to be based on non-indexed
fulltext search, which strikes me as somewhat inefficient.
To solve that issue, I see two proposals:
1. Store the interlanguage links in a table of the "source" language.
The normal links table might be sufficient.
2. Store the interlanguage links in a new database, shared by all languages.
#2 has IMHO several advantages (see the sketch after this list):
* every language's wiki software has to know only that one location,
instead of cross-referencing *all* other language databases
* faster access for multiple languages (e.g., show all pages in de,fr,eo
that link to en but are not linked back)
* can store other meta information (what's the name of the user
namespace in German etc.)
* no need to change the existing databases; every language needs to run
a script once to fill in the existing namespaces
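A rough sketch of #2 as one shared MySQL table; the table and column
names are my invention, not an agreed schema:

    <?php
    // Shared across all languages; each row is one interlanguage link.
    mysql_query( "CREATE TABLE interlang (
        il_from_lang  VARCHAR(10)  NOT NULL,
        il_from_title VARCHAR(255) NOT NULL,
        il_to_lang    VARCHAR(10)  NOT NULL,
        il_to_title   VARCHAR(255) NOT NULL,
        INDEX (il_from_lang, il_from_title),
        INDEX (il_to_lang, il_to_title) )" );

    // Ben-Zin's request: de articles that link to en but are not
    // linked back (the LEFT JOIN finds the missing reverse row).
    $res = mysql_query(
        "SELECT a.il_from_title
           FROM interlang a
      LEFT JOIN interlang b
             ON b.il_from_lang  = a.il_to_lang
            AND b.il_from_title = a.il_to_title
            AND b.il_to_lang    = a.il_from_lang
            AND b.il_to_title   = a.il_from_title
          WHERE a.il_from_lang = 'de'
            AND a.il_to_lang   = 'en'
            AND b.il_from_title IS NULL" );
    ?>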
I think this is becoming high priority as more wikipedias are converted
to Phase III.
Magnus
Dear WikiTech,
I have downloaded and installed the Wikipedia script from CVS, and have
set it up for use in a private wiki I am running. I picked the
wikipedia script because it is so well and actively maintained, and
because it has so many useful features that I'm used to as an active
wikipedian.
Overall, I found customizing the script quite easy. I don't know if
generalization of the script to other wiki applications is one of your
goals, but if it is, then the following would be helpful:
1) Wikipedia-specific strings in $wgAllMessagesEn:
Many of the messages in $wgAllMessagesEn are wikipedia-specific. I
overrode these by creating a $wgAllMessagesEn in LanguageEn.php, which I
figured would change less often than Language.php. I was able to resolve
this without much trouble, but it would be nice if this array could be
split into smaller chunks, to make it easier to override one variable but
not all of them. It would be really nice if one of the smaller arrays
could be a $wgLocalStringsEn with its defaults commented out in
LocalSettings.php (see the sketch below).
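Something like this is what I have in mind; $wgLocalStringsEn does not
exist today, it's my proposed name:

    <?php
    # LocalSettings.php: only the strings this particular site overrides.
    $wgLocalStringsEn = array(
        'sitetitle'    => 'My Private Wiki',
        'sitesubtitle' => ''
    );

    # LanguageEn.php: local strings win over the defaults on key clash
    # (PHP's + operator keeps the left-hand value for duplicate keys).
    $wgAllMessagesEn = $wgLocalStringsEn + $wgAllMessagesEn;
    ?>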
2) Namespace, Interwiki
It would be nice if it were obvious how to override the namespace names
and the interwiki array in LocalSettings.php, so you don't have to hunt
them down.
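For illustration, this is the kind of two-liner I mean; I am guessing at
the array names here, which is exactly the hunting I'd like to avoid:

    <?php
    # LocalSettings.php, after the defaults are loaded:
    $wgNamespaceNamesEn[4] = 'MyWiki';   # instead of 'Wikipedia'
    $wgValidInterwikis['meta'] = 'http://meta.wikipedia.org/wiki/$1';
    ?>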
Also, unrelated: I noticed after changing the text for "Recent Changes"
that it actually resolves each link in the text (which I assume costs
database lookups). This seems colossally wasteful to me. People are
not going to be in the business of linking to non-existent articles from
Recent Changes for very long, and Recent Changes is one of the most
frequently hit pages, no?
Thanks for your work,
~~~
PS. please cc: me on answers.
Changes:
* Charsets other than UTF-8 are supported; only Latin1 and Latin2 for now
(Wikipedia uses only Latin1 and UTF-8), but adding another encoding requires
editing exactly 3 lines: one to add a new variant to encoding_t, another to
say what its name is, and a third to say what the LaTeX header for it should
be (see the example after this list). If no encoding, or an unknown one, is
passed, texvc assumes UTF-8; that is at least right for ASCII-only strings.
* texvc_cgi.phtml now generates correct HTTP Content-Type headers, including the encoding
* both OutputPage.php and texvc_cgi.phtml pass encoding to texvc
* (not really related to texvc, but also included) <pre> changed back to old behaviour
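For example, assuming the header is built with inputenc (my assumption
about the implementation), the Latin2 case boils down to one line:

    \usepackage[latin2]{inputenc}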
Now the only important unresolved things are CJK support and PNG transparency.
But I think we can live without them for now.
Oh, and texvc should allow more characters in \mbox{}, not only letters
and characters with the 8th bit set.
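For example (my example), an \mbox with digits and punctuation like this
one should eventually be accepted:

    v_{\mbox{max}} = 3, \qquad \mbox{i.e., the 2nd case}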
Please check this patch and comment.
> > A long term
> > redesign of the software could definitely be beneficial.
>
> We have done it twice, without much benefit.
> I'm all for evolutionary way.
Actually, we'd never have made it this far without the "revolutionary"
redesign done by Lee earlier this year. Our old PHP code was breaking
under far less load than we have all the time now.
We do need to improve our DB performance, but without better stats we
can't say for certain that there aren't any PHP bottlenecks.
And PHP bottlenecks could eventually arise if we improve DB performance
significantly.
I'm all for the evolutionary way, but I think there's room for someone
who wants to work on a revolutionary way while others improve our
current system.
Yours
Mark Christensen