I recently set up a MediaWiki (http://server.bluewatersys.com/w90n740/)
and I need to extract the content from it and convert it into LaTeX
syntax for printed documentation. I have googled for a suitable OSS
solution but nothing was apparent.
I would prefer a script written in Python, but any recommendations
would be very welcome.
Do you know of anything suitable?
Over the past couple of months, Roan Kattouw and I (Trevor Parscal) have
been working on ResourceLoader. We're really excited about this technology, and hope
others will be too.
This system has been proving itself to be able to seriously improve
front-end performance. Just for starters, we're talking about taking the
Vector skin from 35 requests @ 30kB gzipped to 1 request @ 9.4kB gzipped
and small images in MediaWiki and on Wikimedia projects, and we're
seeking your comments and help.
== Background ==
The goals of the project were to improve front-end performance, reduce
CSS code in MediaWiki.
What's wrong with things as they are now?
image resources are being loaded individually, which causes poor
performance on the cluster and users experience the site as being slow.
resources with large amounts of unneeded whitespace and comments.
* We are purging our caches too much. Many user interface changes
require purging page caches to take effect and many assets are
unnecessarily being purged from client machines due to the use of a
single style version for all assets
being sent to clients whose browsers will either crash when it arrives
(BlackBerry comes to mind), just not use it at all (older versions of
many browsers) while parsing it unnecessarily (this is slow on older
browsers, especially IE 6) or isn't even being completely utilized
(UsabilityInitiative's plugins.combined.min.js for instance)
many different ways -- most of which are not ideal -- to get their
translated messages to the client.
* Right-to-left support in CSS is awkward. Stylesheets for right-to-left
must be either hand-coded in a separate stylesheet, generated each
time a change is made by running CSSJanus, or supplied as an extra
stylesheet which contains a series of overrides.
* There's more! These and other issues were captured in our requirements
gathering process (see
What does ResourceLoader do to solve this?
* Combines resources together. Multiple scripts, styles, and messages can
be delivered in a single request, either at initial page load or
dynamically; in both cases dependencies are resolved automatically.
* Dramatically reduces the number of requests for small images. Small
images linked to from CSS code can be in-lined as data URLs (when the
developer marks them with a special comment). This is done
automatically as the file is served, without requiring the developer to
do such steps manually.
* Allows deploying changes to all pages for all users within minutes,
without purging any HTML. ResourceLoader provides a short-expiry
or not, and if so has a complete manifest of all scripts and styles on
the server and their most recent versions. Also, this startup script
will be able to be inlined using ESI (see
http://en.wikipedia.org/wiki/Edge_Side_Includes ) when using Squid or
Varnish, reducing requests and improving performance even further.
* Provides a standard way to deliver translated messages to the client,
bundling them together with the code that uses them.
* Performs automatic left-to-right/right-to-left flipping for CSS files.
In most cases the developer won't have to do anything before deploying.
* Does all kinds of other cool tricks, which should soon make everyone's
What do you want from me?
* Help by porting existing code! While ResourceLoader and traditional
methods of adding scripts to MediaWiki output can co-exist, the
performance gains of ResourceLoader are directly related to the amount
of software utilizing it. There's some more stuff in core that needs to
be tweaked to utilize the ResourceLoader system, such as user scripts
and site CSS. We also need extensions to start using it, especially
those we are deploying on Wikimedia sites or thinking about deploying
soon. Only basic documentation exists on how to port extensions, but
much more will be written very shortly and we (Roan and I) will be leading by
example by porting the UsabilityInitiative extensions ourselves. If you
need help, we're usually on IRC. (See
* Help writing new code! While wikibits.js is now also known as the
"mediawiki.legacy.wikibits" module, the functionality that it and
deprecated, in favor of new modules which take advantage of jQuery and
can be written using a lot less code while eliminating the current
dependence on a large number of globally accessible variables and
* Some patience and understanding... Please... While we are integrating
into trunk, things might break unexpectedly. We're diligently tracking
down issues and resolving them as fast as we can, but help in this
regard is much needed and really appreciated. But most of all, we're
sorry if something gets screwed up, and we're trying our best to make
this integration smooth.
Documentation is coming online as fast as we can write it. There's a
very detailed design specification document at
more information in general at
http://www.mediawiki.org/wiki/ResourceLoader , where we will be adding
more and more documentation as time goes on. If you can help with
documentation, please feel free to edit boldly - just try not to modify
the design specification unless you are also modifying the software :)
While this project has been bootstrapped by Roan and myself in a branch,
we're really excited about bringing it to trunk and hope the community
can start taking advantage of the new features right away.
Tracking bug for tracking things that ResourceLoader will fix:
- Trevor (and Roan, who's committing the merge to SVN right now)
There seems to be some confusion about how ResourceLoader works, which
has been leading people to make commits like r73196 and report bugs like
#25362. I would like to offer some clarification.
ResourceLoader, if you aren't already aware, is a new system in
MediaWiki 1.17 which allows developers to bundle collections of
*modules*. Modules may represent any number of scripts, styles and
messages, which are read from the file system, the database, or
generated by software.
When a request is made for one or more modules, each resource is
packaged together and sent back to the client as a response. The way in
which these requests and responses are performed depends on whether
debug is on or off.
When debug mode is off:
* Modules are requested in batches
* Resources are combined into modules
* Modules are combined into a response
* The response is minified
When debug mode is on:
* Modules are requested individually
* Resources are combined into modules
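To make the difference between the two modes concrete, here is a minimal sketch in Python. It is not ResourceLoader's actual code (which is PHP); the module names, the in-memory MODULES table and the toy minifier are all assumptions made up for illustration.

```python
import re

# Hypothetical stand-in for module definitions: each module is a list of
# resources (here, script fragments). Names and contents are invented.
MODULES = {
    "example.foo": ["// foo part 1\nvar a = 1;\n", "var b = 2;  // trailing\n"],
    "example.bar": ["var c = 3;\n"],
}


def build_module(name):
    """Combine a module's resources into one string (done in both modes)."""
    return "".join(MODULES[name])


def naive_minify(source):
    """Toy minifier: drops // comments and blank lines. Not a real JS minifier."""
    lines = (re.sub(r"//.*$", "", line).strip() for line in source.splitlines())
    return "\n".join(line for line in lines if line)


def respond(names, debug):
    """Return the list of response bodies the client would receive."""
    if debug:
        # Debug on: one request and one unminified response per module.
        return [build_module(name) for name in names]
    # Debug off: one batched request, modules combined, response minified.
    combined = "".join(build_module(name) for name in names)
    return [naive_minify(combined)]


if __name__ == "__main__":
    wanted = ["example.foo", "example.bar"]
    print(len(respond(wanted, debug=True)), "responses with debug on")
    print(len(respond(wanted, debug=False)), "response with debug off")
```

In this sketch, debug mode trades more round trips for output that still resembles the source files, while non-debug mode trades processing for a single small response.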
I think it's debatable whether debug=true mode goes far enough, since it
still combines resources into modules, and I am open to contributions
that can make debug=true mode even more debugging friendly by delivering
the resources to the client as unchanged as possible. I also think it's
debatable if debug=false mode goes far enough, since things like Google
Closure Compiler have been proven to even further reduce the size of
debug=false even more production friendly by improving front-end
The commits and bugs that I'm contending here are ones which are aiming
to dilute the optimized nature of debug=false mode, when debug=true mode
is really what they should be using or improving. These kinds of changes
and suggestions result in software that is neither optimized for
debugging nor for production, making the front-end performance of the
site in production slower without making it any easier to debug than it
would have been by using debug=true.
If you are a developer, working on your localhost, you probably want to
set
$wgResourceLoaderDebug = true;
... and then test that things work in debug=false mode before committing
your code. This will result in more requests but less processing, which
will be much faster when developing on localhost.
I hope this helps clarify this situation.
I have been tasked to evaluate whether we can use the parserTests db code
for the selenium framework. I just looked it over and have serious
reservations. I would appreciate any comments on the following analysis.
The environment for selenium tests is different than that for
parserTests. It is envisioned that multiple concurrent tests could run
using the same MW code base. Consequently, each test run must:
+ Use a db that if written to will not destroy other test wiki
+ Switch in a new images and math directory so any writes do not
interfere with other tests.
+ Maintain the integrity of the cache.
Note that tests would *never* run on a production wiki (it may be
possible to do so if they do no writes, but safety considerations suggest
they should always run on test data, not production data). In fact,
production wikis should always retain the setting $wgEnableSelenium =
false, to ensure selenium tests are disabled.
Given this background, consider the following (and feel free to comment
parserTests temporary table code:
A fixed set of tables is specified in the code. parserTests creates
temporary tables with the same name, but using a different static prefix.
These tables are used for the parserTests run.
Problems using this approach for selenium tests:
+ Selenium tests on extensions may require use of extension specific
tables, the names of which cannot be elaborated in the code.
+ Concurrent test runs of parserTests are not supported, since the
temporary tables have fixed names and therefore concurrent writes to them
by parallel test runs would cause interference.
+ Clean up from aborted runs requires dropping fossil tables. But, if a
previous run tested an extension with extension-specific tables, there is
no way for a test of some other functionality to figure out which tables
For these reasons, I don't think we can reuse the parserTests code.
However, I am open to arguments to the contrary.
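The first two problems can be illustrated with a small Python sketch. This is not the parserTests code; the table list, the prefix value and the helper names are hypothetical and exist only to show why fixed temporary table names interfere across runs and miss extension tables.

```python
# Hypothetical fixed table list and static prefix, standing in for the
# values hard-coded in the test harness.
CORE_TABLES = ["page", "revision", "text"]
STATIC_PREFIX = "parsertest_"


def temp_table_names(tables, prefix=STATIC_PREFIX):
    """Map each real table name to the temporary table a run would use."""
    return {table: prefix + table for table in tables}


def shared_tables(run_a, run_b):
    """Temporary tables that two runs would both write to."""
    return sorted(set(run_a.values()) & set(run_b.values()))


def uncovered(needed, fixed=CORE_TABLES):
    """Tables a test needs that the fixed list does not know about."""
    return [table for table in needed if table not in fixed]


if __name__ == "__main__":
    run_a = temp_table_names(CORE_TABLES)
    run_b = temp_table_names(CORE_TABLES)
    # Same static prefix: every temporary table is shared between the runs.
    print(shared_tables(run_a, run_b))
    # An extension table (name invented here) is absent from the fixed list.
    print(uncovered(["page", "myext_data"]))
```

With a static prefix every concurrent run maps to the same temporary tables, and any table outside the fixed list is never redirected at all.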
-- Dan Nessett
Back in June the Selenium Framework had a local configuration file called
LocalSeleniumSettings.php. This was eliminated by Tim Starling in a 6/24
commit with the comment that it was an insecure concept. In that commit,
new globals were added that controlled test runs.
Last Friday, mah ripped out the globals and put the configuration
information into the execute method of RunSeleniumTests.php with the
comment "@todo Add an alternative where settings are read from an INI
file." So, it seems we have dueling developers with contrary ideas about
what is the best way to configure selenium framework tests. Should
configuration data be exposed as globals or hidden in a local
Either approach works. But going back and forth makes development
of functionality for the Framework difficult. I am working on code, not
yet submitted as a patch, that now requires reworking because the way to
reference configuration data has changed. We need a decision on
which of the two approaches to use.
-- Dan Nessett
(sending to main tech lists, crossposted to Tech Blog, feel free to forward
anywhere else you'd like)
Greetings MediaWiki hackers!
I am pleased to announce the upcoming MediaWiki Hack-A-Ton in Washington, DC.
As you are all aware, every year in April our good friends at
host the annual "MediaWiki Developers Meetup" in Berlin. At that event,
the program is focused on demonstrations, workshops and small group
discussions. To
complement this, we're planning the DC meetup to be focused solely on hacking,
bugfixing and getting down and dirty with the code.
We're scheduling this for October 22nd-24th in Washington, DC. Some of
haven't been ironed out yet, but will be announced over the coming
days as it is.
So clear your calendars, and keep your eyes on MediaWiki.org and the mailing
lists for more information.
Some travel assistance may be available for those coming a long way. I've also
been told there will be swag of some sort for attendees :)
Early on in the requirements stage of ResourceLoader development we
decided to use ISO8601 as the format for representing timestamps in
URLs. This was chosen for its legibility, conformance to a standard and
ease of generation. However, this was somewhat of an oversight since the
timestamp "1970-01-01T00:00:00Z" gets URL encoded to be
"1970-01-01T00%3A00%3A00Z" which leaves something to be desired. Also,
(minified and compressed).
So, before we seal the deal on using 8601, I would like to collect some
ideas about alternatives which would ideally...
* Be legible in a URL
* Conform to a well-defined/well-known standard
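The encoding problem is easy to reproduce. The Python sketch below percent-encodes the timestamp from the message and, purely as an illustration, two candidate formats that contain no reserved characters: the ISO 8601 basic format (no separators) and Unix epoch seconds. These candidates are assumptions for the sake of comparison, not proposals made in this thread.

```python
from datetime import datetime, timezone
from urllib.parse import quote

epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)

# ISO 8601 extended format, as currently used in the URLs.
extended = epoch.strftime("%Y-%m-%dT%H:%M:%SZ")
# ISO 8601 basic format: same standard, no "-" or ":" separators.
basic = epoch.strftime("%Y%m%dT%H%M%SZ")
# Seconds since the Unix epoch.
unix = str(int(epoch.timestamp()))


def url_encoded(value):
    """Percent-encode a value as a URL query component would be."""
    return quote(value, safe="")


if __name__ == "__main__":
    for label, value in [("extended", extended), ("basic", basic), ("unix", unix)]:
        print(label, value, "->", url_encoded(value))
```

Only the colons in the extended format need escaping; the hyphens pass through unchanged, which is why the encoded string grows by four characters.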
You all probably have noticed more people getting involved in code
review (and of course saw Brion's mail). This is partly in
anticipation of Tim being afk, and partly because we're long overdue
for distributing the load.
Here's who we have available for code review, and what they'll be focused on:
* Brion - general review, see his mail from earlier this week
* Chad - general review
* Roan - ResourceLoader, API, CentralNotice, UploadWizard
* Trevor - general review, mostly front-end
* Tim - general review
* Mark - general review as available
I'll let each of them elaborate on their areas of focus.
I imagine we will want to give the code review pages on mediawiki.org
some love in the coming days and weeks, starting here:
We have a number of related pages that potentially need to be merged,
reorganized, or deleted. I'll plug away at this, and I'll appreciate
any help on this (be bold; we'll revert if we don't like).
As you probably know, we're trying to get into the habit of providing
a monthly overview of all WMF-sponsored engineering activity. The
September update was posted to the techblog here:
For October, we'd like to draft this in public so as to get the
information out a little sooner, and to give you all the opportunity
to help out. Here's where we're drafting this:
Here's a very simple way you can help. If you see something on the
list that you're interested in, but don't see the status for yet, ping
one of us, then be bold and add what you learn to the appropriate wiki
page. If you do know the status, by all means add it.
Another useful thing to do: you'll notice that many of the project
pages that the status post links to are pretty sparse. Same rules
apply there. We'd love to get help keeping this up to date.