I recently set up a MediaWiki (http://server.bluewatersys.com/w90n740/)
and I need to extract the content from it and convert it into LaTeX
syntax for printed documentation. I have googled for a suitable OSS
solution, but nothing obvious turned up.
I would prefer a script written in Python, but any recommendations
would be very welcome.
Do you know of anything suitable?
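To make the requirement concrete, here is a rough sketch of the kind of
script I'm imagining (untested; it assumes pandoc and its mediawiki
reader are installed, and the page titles are placeholders):

# Minimal sketch, not a finished tool: fetch each page's raw wikitext
# via MediaWiki's action=raw and convert it to LaTeX with pandoc.
import subprocess
import urllib.parse
import urllib.request

WIKI = "http://server.bluewatersys.com/w90n740/index.php"
PAGES = ["Main_Page"]  # placeholder titles

for title in PAGES:
    # action=raw returns the wikitext without any surrounding HTML
    url = WIKI + "?" + urllib.parse.urlencode({"title": title, "action": "raw"})
    wikitext = urllib.request.urlopen(url).read().decode("utf-8")

    # pandoc's mediawiki reader handles basic markup; templates and
    # extension tags will still need manual cleanup in the .tex output
    latex = subprocess.run(
        ["pandoc", "-f", "mediawiki", "-t", "latex"],
        input=wikitext, capture_output=True, text=True, check=True,
    ).stdout
    with open(title + ".tex", "w", encoding="utf-8") as out:
        out.write(latex)

Anything that handles templates and extension tags more gracefully than
this would be very welcome.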
I've been informally mentoring André, Tiago, Diego, and César. They
are four students at Minho University who are currently working on a
project to improve DB2 database support in MediaWiki.
So far, they've:
- Fixed several outstanding issues with DB2 support involving
character encoding, Windows vs. Linux differences, etc.
- Added DB2 support to the new MediaWiki 1.17 Installer and Updater
- Put in the appropriate Updater SQL patches to reflect database
schema changes since 1.14
MediaWiki already had some DB2 support, but it has been broken since
1.15 and was never complete. As a result of their work, it's now possible
to successfully install MediaWiki on DB2 out of the box and to use the
core wiki features.
I'll shortly commit their first patch using my SVN account (leonsp).
I've taken some care to look over the code and make sure it abides by
the MediaWiki code guidelines.
Bug 24207 requests switching the math rendering preference default from its
current setting (which usually produces a nice PNG and occasionally produces
some kinda ugly HTML) to the "always render PNG" setting.
I'd actually propose dropping the rendering options entirely...
* "HTML if simple" and "if possible" produce *horrible* ugly output that
nobody likes, so people use hacks to force PNG rendering. Why not just
render to PNG?
* "MathML" mode is even *MORE* limited than "HTML if simple", making it
* nobody even knows what "Recommended for modern browsers" means, but it
seems to be somewhere in that "occasionally crappy HTML, usually PNG"
So we're left with only two sane choices:
* Always render PNG
* Leave it as TeX (for text browsers)
Text browsers will show the alt text on the images, which is... the TeX
code. So even this isn't actually needed for its stated purpose. (Hi
Jidanni! :) lynx should show the TeX source when using the PNG mode.)
It's conceivable that a few folks really honestly prefer to see the LaTeX
source in their graphical browsers (we should at least do a quick stat check
to see if anybody uses it on purpose), but I wouldn't mind removing that
option as well.
Fancier rendering like MathJax etc should be considered as a separate thing
(and implemented a bit differently to avoid parser cache fragmentation!), so
don't let future mode concerns worry y'all. Any thoughts on whether this
makes sense to do for 1.18 or 1.19?
I am starting this thread because Brion's revision r94289 reverted
the commit that added the hash column, stating "core schema change with no
discussion".
Bugs 21860 and 25312 advocate for the inclusion of a hash
column (either md5 or sha1) in the revision table. The primary use
case of this column will be to assist in detecting reverts. I don't think
that data integrity is the primary reason for adding this column. The
huge advantage of having such a column is that it will no longer be
necessary to analyze full dumps to detect reverts; instead, you can
look for reverts in the stub dump file by looking for the same hash
within a single page. The fact that there is a theoretical chance of a
collision is not very important IMHO; it would just mean that in very
rare cases in our research we would flag an edit as reverted while
it's not. The two bug reports contain quite long discussions, and this
feature has also been discussed internally quite extensively, but oddly
enough that discussion hasn't happened yet on the mailing list.
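To make the use case concrete, here is a rough sketch of how revert
detection would work once a hash is stored per revision; the function
and data shapes are illustrative, not an actual API:

# Illustrative sketch: with a hash per revision, detecting reverts within
# a page reduces to finding repeated hashes in chronological order.
import hashlib

def detect_reverts(revisions):
    """revisions: iterable of (rev_id, sha1_hex) pairs for ONE page,
    in chronological order. Returns (reverting_rev, restored_rev) pairs."""
    first_seen = {}  # content hash -> id of first revision with that content
    reverts = []
    for rev_id, digest in revisions:
        if digest in first_seen:
            # same content seen before: this edit restored an earlier state
            reverts.append((rev_id, first_seen[digest]))
        else:
            first_seen[digest] = rev_id
    return reverts

# The column itself would store something like:
# hashlib.sha1(revision_text.encode("utf-8")).hexdigest()

With the hashes available in the stub dumps, this scan never needs the
revision text at all.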
So let's have a discussion!
While MediaWiki has been and is developed primarily with Wikimedia
Foundation's interests in mind, there are some big third-party users of
MediaWiki out there; while Wikia and wikiHow are the biggest and most
well-known, they certainly aren't the only ones.
What's common to third-party users of MediaWiki is not just custom
extensions, but sadly core changes, or as they're better known, core hacks
-- unsupported changes to the core of the MediaWiki software. I think
everyone will agree with me when I say that we want to reduce the
amount of core hacking by third parties and instead increase collaboration
with us, the upstream developers of MediaWiki.
Reducing the amount of core hacks is generally a good idea for third
parties, because it allows them to upgrade easily to the latest stable
version of MediaWiki, and things like new hooks can be, and in many cases
are, useful to other users of MediaWiki. For example, the
MakeGlobalVariablesScript hook was
originally introduced by Wikia (under the name 'ExtendJSGlobalVars'); in
r38397 I added the hook into core under its current name, and right now there
are many extensions using the hook, including ones used by Wikimedia
Foundation sites.
This is a fine example of how a third-party core hack became a part of the
MediaWiki core and thus something useful to other users of MediaWiki,
including the Wikimedia Foundation.
Another factor to take into account is security. According to the Version
lifecycle page on MediaWiki.org (see
http://www.mediawiki.org/wiki/Version_lifecycle), "The release manager has
also issued a strong recommendation that versions not listed above as
current version or legacy version should not be used in a productive
environment. They may contain critical security vulnerabilities and other
major bugs, including the threat of possible data loss and/or corruption".
For example, wikiHow is running MediaWiki 1.12.0, which was released on 21
March 2008 -- over three years ago. While I'm sure that the wikiHow
developers have applied plenty of the more modern security patches, there's
still a possibility that they may have missed one patch -- and even if not,
MediaWiki 1.12.0 doesn't have all the cool new features that MediaWiki
1.17.0 has. :-)
Essentially I'd like to see all major third-party users contributing code to
the upstream version of MediaWiki and everyone keeping their copies of
MediaWiki on the official MediaWiki Subversion repository at
svn.wikimedia.org. Maybe we could have a branch for each third party under
/mediawiki/branches/, or if that's unacceptable, maybe even a whole new
repository (like how we currently have mediawiki, mysql, pywikipedia and
wikimedia -- see http://svn.wikimedia.org/viewvc), although I must admit
that it sounds a bit overkill to me.
I know from experience that many third parties have written some awesome
code and that there are many other people interested in third-party code,
but usually getting third-party code to run requires plenty of knowledge
about PHP and MediaWiki, as the extensions and core changes have usually been
designed to work with one site or one farm. I want to change that and make
more extensions available to the general public -- after all, there are many
people out there who run a MediaWiki wiki yet aren't very PHP-savvy.
The official MediaWiki Subversion repository is also well-known and it can
also act as a "backup" of some kind. I'm sure that most people and companies
have extensive backup systems in place, but everything is still possible.
For example, the social wiki/blog hybrid site ArmchairGM, where the
SocialProfile extension (
http://www.mediawiki.org/wiki/Extension:SocialProfile) and many other,
equally cool and interesting extensions were developed, had its own
codebase. While the main Wikia codebase has been open source for years, the
ArmchairGM codebase was only recently (1 August 2011) open-sourced with the
kind help of Sean Colombo -- and for a rather long while, it seemed that
ArmchairGM's unique skin and the unique extensions had been lost; now that
would've been a major loss for the open source community. Tens of thousands
of lines of code, dozens of unique features and some pretty skins were
nearly lost; I think that it's in everyone's best interest to prevent such
incidents from happening, and that is possible by keeping the code free and
open.
I've CC'd this message to Sean Colombo of Wikia, Jack Herrick and Reuben
Smith of wikiHow, Joachim Bode of Twoonix Software GmbH and Markus Glaser of
Hallo Welt! -- please let me know your thoughts about this idea and how your
company would be able to contribute.
Thanks and regards,
I've been asking around on IRC but thought it would be good to open up
to a larger audience.
Has anyone here used PhoneGap (http://www.phonegap.com/) for mobile
app development? I'm eager to get your thoughts and potentially
brainstorm some new ideas.
I'll have a longer mail about this later, but since it's taken too long
to draft it... I thought I'd just send this snippet now to start the
discussion.
Over the last few weeks, Yusuke Matsubara, Shawn Walker, Aaron Halfaker and
Fabian Kaelin (who are all Summer of Research fellows) have worked hard
on a customized stream-based InputFormatReader that allows parsing of both
bz2 compressed and uncompressed files of the full Wikipedia dump (dump file
with the complete edit histories) using Hadoop. Prior to WikiHadoop and the
accompanying InputFormatReader it was not possible to use Hadoop to analyze
the full Wikipedia dump files (see the detailed tutorial / background for an
explanation of why that was not possible).
1) We can now harness Hadoop's distributed computing capabilities in
analyzing the full dump files.
2) You can send either one or two revisions to a single mapper, so it's
possible to diff two revisions and see what content has been added /
removed (see the mapper sketch below).
3) You can exclude namespaces by supplying a regular expression.
4) We are using Hadoop's Streaming interface, which means people can use this
InputFormat Reader from different languages such as Java, Python, Ruby and
others.
The source code is available at: https://github.com/whym/wikihadoop
A more detailed tutorial and installation guide is available at:
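To give a flavour of point 4, here is a hypothetical Streaming mapper in
Python. I'm assuming each stdin record is a single line carrying the XML
of one or two revisions, with newlines escaped; check the tutorial for
the actual record format and element names:

#!/usr/bin/env python
# Hypothetical mapper: counts lines added/removed between the two
# revisions in each record. The record format assumed here (one record
# per stdin line, escaped newlines, <revision><text> elements) is
# illustrative; see the wikihadoop documentation for the real one.
import sys
import difflib
import xml.etree.ElementTree as ET

for line in sys.stdin:
    record = line.rstrip("\n").replace("\\n", "\n")  # assumed escaping
    root = ET.fromstring("<record>" + record + "</record>")
    texts = [rev.findtext("text") or "" for rev in root.iter("revision")]
    if len(texts) != 2:
        continue  # a page's first revision arrives without a predecessor
    added = removed = 0
    for d in difflib.unified_diff(texts[0].splitlines(),
                                  texts[1].splitlines(), lineterm=""):
        if d.startswith("+") and not d.startswith("+++"):
            added += 1
        elif d.startswith("-") and not d.startswith("---"):
            removed += 1
    print("added\t%d" % added)
    print("removed\t%d" % removed)

A reducer summing the counts per key would then give totals across the
whole dump.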
(Apologies for cross-posting to wikitech-l and wiki-research-l)
As a quick update, Alolita, Rob & I presented a tech talk at Google
Mountain View last Thursday. The slides are up at:
They're not terribly self-explanatory, so you'll want to see the video
to make sense of them -- should go up soon on
The intent of the talk was to give a quick all-round update across
Wikimedia's engineering projects, to help refresh the understanding of
Googlers and anyone else who might be watching.
It's good for us to do these once a year to keep folks up-to-date, and
get some of them excited about helping :-)
VP of Engineering and Product Development, Wikimedia Foundation
Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate
I am not sure who might be in a position to correct this, but this list
seems the most likely.
For some reason the sep11.wikipedia.org subdomain is forwarding to a spam
site -- this was pointed out on OTRS earlier.
I assume this was set up as a redirect to the 9/11 memories Wiki, and that
site has since been taken over.
Can someone fix this?