I recently set up a MediaWiki (http://server.bluewatersys.com/w90n740/)
and I need to extract the content from it and convert it into LaTeX
syntax for printed documentation. I have googled for a suitable OSS
solution but nothing was apparent.
I would prefer a script written in Python, but any recommendations
would be very welcome.
Do you know of anything suitable?
For some types of resources, it's desirable to upload source files
(whether it's Blender, COLLADA, Scribus, EDL, or some other format),
so that others can more easily remix and process them. Currently, as
far as I know, there's no way to upload these resources to Commons.
What would be the arguments against allowing administrators to upload
arbitrary ZIP files on Wikimedia Commons, allowing the Commons
community to develop policy and process around when such archived
resources are appropriate? An alternative, of course, would be to
whitelist every possible source format for admins, but it seems to me
that it would be a good general policy to not enable additional
support for formats that aren't officially supported (reduces
confusion among users about what's permitted -- there's only one file
format they can't use).
Deputy Director, Wikimedia Foundation
Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate
Over the past couple of months, Roan Kattouw and I (Trevor Parscal) have
been working on ResourceLoader. We're really excited about this technology, and hope
others will be too.
This system has proven able to seriously improve front-end
performance. Just for starters, we're talking about taking the
Vector skin from 35 requests @ 30kB gzipped to 1 request @ 9.4kB gzipped
and small images in MediaWiki and on Wikimedia projects, and we're
seeking your comments and help.
== Background ==
The goals of the project were to improve front-end performance, reduce
CSS code in MediaWiki.
What's wrong with things as they are now?
image resources are being loaded individually, which causes poor
performance on the cluster and users experience the site as being slow.
resources with large amounts of unneeded whitespace and comments.
* We are purging our caches too much. Many user interface changes
require purging page caches to take effect and many assets are
unnecessarily being purged from client machines due to the use of a
single style version for all assets.
being sent to clients whose browsers will either crash when it arrives
(BlackBerry comes to mind), just not use it at all (older versions of
many browsers) while parsing it unnecessarily (this is slow on older
browsers, especially IE 6) or isn't even being completely utilized
(UsabilityInitiative's plugins.combined.min.js for instance)
many different ways -- most of which are not ideal -- to get their
translated messages to the client.
* Right-to-left support in CSS is awkward. Stylesheets for right-to-left
must be either hand-coded in a separate stylesheet, generated each
time a change is made by running CSSJanus, or supplied as an extra
stylesheet which contains a series of overrides.
* There's more! These and other issues were captured in our requirements
gathering process (see
What does ResourceLoader do to solve this?
* Combines resources together. Multiple scripts, styles, and messages can
be delivered in a single request, either at initial page load or
dynamically; in both cases dependencies are resolved automatically.
* Dramatically reduces the number of requests for small images. Small
images linked to from CSS code can be inlined as data URLs (when the
developer marks them with a special comment). This happens
automatically as the file is served, so the developer does not have to
do any such steps manually.
* Allows deploying changes to all pages for all users within minutes,
without purging any HTML. ResourceLoader provides a short-expiry
or not, and if so has a complete manifest of all scripts and styles on
the server and their most recent versions. Also, it will be possible to
inline this startup script using ESI (see
http://en.wikipedia.org/wiki/Edge_Side_Includes ) when using Squid or
Varnish, reducing requests and improving performance even further.
* Provides a standard way to deliver translated messages to the client,
bundling them together with the code that uses them.
* Performs automatic left-to-right/right-to-left flipping for CSS files.
In most cases the developer won't have to do anything before deploying.
* Does all kinds of other cool tricks, which should soon make everyone's
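The data URL inlining mentioned in the list above can be sketched roughly as follows. This is only an illustration in Python, not ResourceLoader's actual code: the `@embed` marker text, the function name `inline_marked_images`, and the `read_bytes` callback are all assumptions made for the example (the announcement only says "a special comment").

```python
import base64
import mimetypes
import re

# Assumed marker syntax: a "/* @embed */" comment directly before a
# "property: url(...)" declaration. The real marker may differ.
EMBED_RULE = re.compile(
    r"/\*\s*@embed\s*\*/\s*"
    r"(?P<prop>[\w-]+)\s*:\s*"
    r"url\(\s*['\"]?(?P<path>[^'\")\s]+)['\"]?\s*\)"
)


def inline_marked_images(css, read_bytes):
    """Replace marked url(...) references with base64 data URLs.

    `read_bytes` is a hypothetical callback that maps an image path to
    its raw bytes, so the sketch does not depend on the filesystem.
    Unmarked url(...) references are left untouched.
    """
    def to_data_url(match):
        path = match.group("path")
        # Guess the MIME type from the file extension, with a generic fallback.
        mime = mimetypes.guess_type(path)[0] or "application/octet-stream"
        payload = base64.b64encode(read_bytes(path)).decode("ascii")
        return f"{match.group('prop')}: url(data:{mime};base64,{payload})"

    return EMBED_RULE.sub(to_data_url, css)
```

With a dictionary of image bytes named `files`, calling `inline_marked_images(css, files.__getitem__)` rewrites only the declarations that carry the marker, so large images still load as separate requests.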
What do you want from me?
* Help by porting existing code! While ResourceLoader and traditional
methods of adding scripts to MediaWiki output can co-exist, the
performance gains of ResourceLoader are directly related to the amount
of software utilizing it. There's some more stuff in core that needs to
be tweaked to utilize the ResourceLoader system, such as user scripts
and site CSS. We also need extensions to start using it, especially
those we are deploying on Wikimedia sites or thinking about deploying
soon. Only basic documentation exists on how to port extensions, but
much more will be written very shortly and we (Roan and I) will be
leading by example by porting the UsabilityInitiative extensions
ourselves. If you
need help, we're usually on IRC. (See
* Help writing new code! While wikibits.js is now also known as the
"mediawiki.legacy.wikibits" module, the functionality that it and
deprecated, in favor of new modules which take advantage of jQuery and
can be written using a lot less code while eliminating the current
dependence on a large number of globally accessible variables and
* Some patience and understanding... Please... While we are integrating
into trunk, things might break unexpectedly. We're diligently tracking
down issues and resolving them as fast as we can, but help in this
regard is much needed and really appreciated. But most of all, we're
sorry if something gets screwed up, and we're trying our best to make
this integration smooth.
Documentation is coming online as fast as we can write it. There's a
very detailed design specification document at
more information in general at
http://www.mediawiki.org/wiki/ResourceLoader , where we will be adding
more and more documentation as time goes on. If you can help with
documentation, please feel free to edit boldly - just try not to modify
the design specification unless you are also modifying the software :)
While this project has been bootstrapped by Roan and myself in a branch,
we're really excited about bringing it to trunk and hope the community
can start taking advantage of the new features right away.
Tracking bug for tracking things that ResourceLoader will fix:
- Trevor (and Roan, who's committing the merge to SVN right now)
I know there's some discussion about "what's appropriate" for the
Wikipedia API, and I'd just like to share my recent experience.
I was trying to download the Wikipedia entries for people, of
which I found about 800,000. I had a scanner already written that
could do the download, so I got started.
After running for about 1 day, I estimated that it would take
about 20 days to bring all of the pages down through the API (running
single-threaded.) At that point I gave up, downloaded the data dump (3
hours) and wrote a script to extract the pages -- it then took about an
hour to do the extraction, gzip compressing the text and inserting it into a
Don't be intimidated by working with the data dumps. If you've got
an XML API that does streaming processing (I used .NET's XmlReader) and
use the old unix trick of piping the output of bunzip2 into your
program, it's really pretty easy.
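For anyone who wants to try the same approach, here is a minimal sketch in Python rather than .NET. It assumes the dump uses `<page>`, `<title>` and `<text>` elements (possibly namespaced); the function names `local_name` and `iter_pages` are made up for this example.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, text) for each <page> element read from `stream`.

    The stream is parsed incrementally, so the whole dump is never
    held in memory at once. If a page carries several revisions, the
    text of the last one encountered is returned.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        # Drop the page's children once the caller is done with it.
        elem.clear()
```

To combine this with the bunzip2 trick, loop over `iter_pages(sys.stdin.buffer)` in a script and run it as `bunzip2 -c dump.xml.bz2 | python extract_pages.py` (both file names are placeholders).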
Forwarding to wikitech-l, needs more audience.
I've no love for outdated software, so I'm firmly in
the +1 camp.
---------- Forwarded message ----------
From: Ashar Voultoiz <hashar+wmf(a)free.fr>
Date: Tue, Sep 28, 2010 at 3:39 PM
Subject: [Mediawiki-l] about requiring PHP 5.2
Looking at INSTALL, it seems we are still supporting PHP version 5.1,
which will be 5 years old in a couple of weeks. This is getting old and
prevents developers from using some new features.
Ideally we could raise it to 5.3 to get namespace support and closures, but
that might be too early since most web hosts probably still use 5.2.x.
Would it be possible to consider raising the requirement to at least
5.2.0? This would give us native JSON support and most probably the
filter extension enabled by default. The latter can be used to speed up
input validation.
MediaWiki-l mailing list
I fixed a bug today following a failed CruiseControl build. I noticed
the build time was roughly 14 minutes, most of it spent by phpdoc
building the documentation. The xml log file is 6MB and 13800 lines are
_ Is there any reason to use phpdoc instead of doxygen?
_ Can we build the API separately? This will let us run tests more often.
_ Is there any human-readable log file (HTML, text) besides XML? I could
process it with XSLT but I am getting old :-)