I recently set up a MediaWiki (http://server.bluewatersys.com/w90n740/)
and I need to extract the content from it and convert it into LaTeX
syntax for printed documentation. I have googled for a suitable OSS
solution but nothing was apparent.
I would prefer a script written in Python, but any recommendations
would be very welcome.
Do you know of anything suitable?
I'm replying to this wikipedia-l post in wikitech-l, it's more relevant
Brion Vibber wrote:
> I'd been waiting on Tim's in-progress code to compare. Apparently there's not
> really anything much of that left (his work mostly transmogrified into the
> templatelinks temple) so I'm poking at Magnus's code now.
Salvatore's moderation feature was implemented in a similar way to
Magnus's, in that it used an extra revision ID field in the page
table to point to the relevant version. Salvatore's used parameters
passed back to Revision to determine whether page_latest or
page_verified should be used, whereas Magnus's code operated mainly at
the UI level, redirecting to a page with an oldid parameter, IIRC.
Neither of them had the structure required for efficient caching, that
is, page/tag retrieval instead of page/revision retrieval. The basic
problem is that tugela, which we are now using instead of memcached, has
no efficient means for identifying and purging expired keys. In fact at
the moment, this garbage collection is not done at all. To limit the
growth of the cache under these circumstances, it's better to index the
parser cache by page and tag, rather than page and revision ID. I
thought that the best way to implement a tag concept, to merge Magnus's
and Salvatore's features while minimising MySQL index space, would be to
put the tag information in its own table.
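
To illustrate why indexing by page and tag limits cache growth when expired keys are never purged, here is a minimal Python sketch. A plain dict stands in for the key/value store; the key formats, the "latest" tag name and the function names are hypothetical, not MediaWiki's actual parser cache keys.

```python
# Hypothetical sketch: a plain dict stands in for a cache that never purges keys.

def key_by_revision(page_id, rev_id):
    # One key per (page, revision): every edit leaves a stale entry behind.
    return f"pcache:{page_id}:rev:{rev_id}"

def key_by_tag(page_id, tag):
    # One key per (page, tag): a new revision overwrites the previous entry.
    return f"pcache:{page_id}:tag:{tag}"

def simulate(edits, make_key):
    """Store rendered output for each edit and return the number of keys kept."""
    cache = {}
    for page_id, rev_id, tag in edits:
        cache[make_key(page_id, rev_id, tag)] = f"<html for revision {rev_id}>"
    return len(cache)

# Page 1 is edited 100 times; each new revision takes over the "latest" tag.
edits = [(1, rev, "latest") for rev in range(1, 101)]

by_rev = simulate(edits, lambda p, r, t: key_by_revision(p, r))
by_tag = simulate(edits, lambda p, r, t: key_by_tag(p, t))
print(by_rev, by_tag)
```

With revision-based keys the store keeps one entry per edit, while tag-based keys keep one entry per page and tag, so the lack of garbage collection matters much less.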
Then there's the problem of template and link colour changes. I posted
to wikitech-l about that before. Magnus's suggestion of storing the
wikitext with the templates expanded at save time is a quite reasonable
I stopped working on Salvatore's moderation feature when Brion
implemented the semi-protection proposal put forward by the English
Wikipedia. It was quite redundant -- Salvatore's feature was a form of
semi-protection, a more complicated one than the one that the
Wikipedians were supporting. I was even working on integrating it into
the protection UI when Brion rewrote that part of the code. At that
stage I still hadn't addressed the caching issue. So I salvaged what I
could of my code branch (mostly the templatelinks table), and abandoned
the feature. I wasn't interested enough in the stable version feature to
keep working on the backend.
Perhaps the simplest solution at the moment is to put Magnus's feature
live (after the necessary code cleanup), and put up with the lack of
caching for a while. We've still got a bit of spare hardware capacity,
haven't we? The request rate for stable versions should be lower than it
would have been for verified revisions. If I understand it correctly,
stable revisions are not displayed by default, verified revisions would
-- Tim Starling
Very soon (maybe this afternoon), I'd like to submit a patch to add
OpenID login support to MediaWiki. Dan Libby has already contributed
such a patch:
Our patch (JanRain, Inc.) is a patch against CVS HEAD, extends Dan
Libby's original modifications, and uses the PHP OpenID library that
we built and maintain at
Here are some notes from the openid.txt file included in the patch:
- OpenID support works in *addition* to normal wiki logins, including
  any external authentication plugin configured by the MediaWiki
  administrator. If a username looks like a URL, OpenID auth is
  tried; otherwise, the regular authentication rules apply.
- If OpenID support cannot be verified (either because the library is
  missing or because the store directory can't be initialized -- see
  step (3) in Installation), MediaWiki will function normally even if
  $wgUseOpenID is set to true.
- The account creation form cannot currently be used to create
  accounts with OpenID identity URLs. If you want to create an
  account with your OpenID, just log in. The account will be created
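
The dispatch rule in the notes above can be sketched roughly as follows. This is an illustration in Python, not the patch's PHP code: the function names are hypothetical, and treating "looks like a URL" as "has an http(s) scheme and a host" is an assumption.

```python
from urllib.parse import urlparse

def looks_like_url(username):
    # Assumption: "looks like a URL" means an http(s) scheme plus a host.
    parts = urlparse(username.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)

def choose_auth_method(username, openid_available=True):
    """Return which (hypothetical) login path would handle this username."""
    # If OpenID support could not be verified, fall back to regular logins.
    if openid_available and looks_like_url(username):
        return "openid"
    return "regular"

print(choose_auth_method("http://example.com/"))
print(choose_auth_method("WikiUser"))
print(choose_auth_method("http://example.com/", openid_available=False))
```

The `openid_available` flag models the second note: when the library or store directory is missing, every login takes the regular path.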
Any thoughts or concerns? I'm busy preparing some things and then
I'll attach a .diff to the bug ticket above.
Lastly, provided that the patch is accepted at some point, I'll be
happy to be active in supporting and maintaining OpenID support in
I've disabled the ability to use blank passwords on wiki accounts.
For a long time we treated accounts very laxly in this regard; there generally
wasn't _that_ much reason to secure a casual account unless you were one of the
tiny number of sysops.
In recent years though the number of sysops has exploded, and we've added
really annoying if someone gets into your account and messes with them. As a
small concession to security and accountability, it's time for blank passwords
to go.
While running some password security checks, I found that a handful of sysop
accounts had blank passwords. Probably some non-sysop accounts also had blanks.
Affected accounts can reset the password by the automated e-mail password gadget
on the login form, unless of course they didn't put in an e-mail.
-- brion vibber (brion @ pobox.com)
brion vibber (brion @ pobox.com) wrote:
> Tomasz Wegrzanowski wrote:
>> So, while dictionary-checking sysops' passwords make a lot of sense,
>> there's very little point in limiting passwords of the non-privileged accounts.
> At the moment we don't have a separate switch for sysops, nor any control which
> would prevent blank-password accounts from being made into sysops. I'd rather
> risk disabling a few accounts temporarily than keep the incredibly dangerous
> sysop accounts open (which could be used potentially to great destructive effect).
Could you elaborate on the "temporarily" part ?
as the subject says: happy birthday.
I'd like to announce a new update, which can be found on CPAN:
The online manual also got updated:
I try to summarize the changes and new features in brief below, a full
revision history can be found in the changelogs for the involved Perl
If you have any problem, do not hesitate to open a bug at
http://rt.cpan.org or just send me an email.
* The wikimedia integration can output graphviz code, which means you can
use dot/neato/fdp/whatever to render the graph as a PNG file
* align: left|right|center for all labels possible
* invisible edges
* enforcing a minlen for edges
* joints: edges can join other edges, or split up on the way to their
* support for image-based nodes (e.g. use an icon/png as a node)
* local (per-node, per-edge) and global flow that lets you define how the
graph should "flow", e.g. in which direction the edges are laid out.
This even works with relative flow (relative to the current node) *and*
absolute flow (relative to the orientation of the, well, global world)
* autosplit nodes are much improved - they had quite a few bugs,
especially when setting attributes on them
* improved layouter, can handle groups/autosplit nodes and joints much
* improved HTML output, all implemented features work now on major
browsers (IE, FF, Opera and Konqueror, except for very small nits)
* Links in SVG now work correctly on Firefox (workaround FF bug)
* many, many bugfixes, improved handling of syntax and documentation
0: The first version of Graph::Easy was released 27.12.2004, but it wasn't
actually usable until very late January, possibly later.
Signed on Tue Jan 31 18:45:58 2006 with key 0x93B84C15.
Visit my photo gallery at http://bloodgate.com/photos/
PGP key on http://bloodgate.com/tels.asc or per email.
"The flow chart is a most thoroughly oversold piece of program
documentation." -- Frederick Brooks, 'The Mythical Man Month'
You should subscribe to wikitech-l (the Wikipedia technical issues
mailing list) for this sort of issue - I've forwarded this there.
On 31/01/06, Vijay <vijaykillu(a)gmail.com> wrote:
> Hello there,
> I have just downloaded the wikipedia en dump and am trying to configure
> wikipedia on my local server. I have mysql and php on windows 2003 server. I
> installed mediawiki and have extracted the wikipedia en dump file which gave
> me an xml file (4 GB).
> When I tried importing the dump to mysql using importDump.php, the process
> started fine. But when the record count reached 18600, the process stopped
> with the following error. I wonder what could be the problem? Any help in
> this regard is highly appreciated.
> By the way, the wikipedia dump file is the current version and not the
> complete one. The zip file is around 900 MB in size.
> Here are the last few lines from the output
> 18400 (23.2640009615 pages/sec 23.2614722657 revs/sec)
> 18500 (23.2941692512 pages/sec 23.2916509627 revs/sec)
> 18600 (23.3273356919 pages/sec 23.3248273763 revs/sec)
> Content-type: text/html
> X-Powered-By: PHP/4.3.9
> XML import parse failure at line 1672638, col 154 (byte 145084928; ""): no
> nt found
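
One way to narrow down a failure like the one quoted above is to check the XML file independently of the importer. The Python sketch below stream-parses a file with expat and shows the raw bytes around a given offset. The function names and the tiny demo file are hypothetical, and the sketch only reports where parsing fails; it does not explain why.

```python
import os
import tempfile
import xml.parsers.expat

def first_xml_error(path):
    """Stream-parse the file; return (line, column, message) or None if well-formed."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        with open(path, "rb") as fh:
            parser.ParseFile(fh)
    except xml.parsers.expat.ExpatError as err:
        return err.lineno, err.offset, xml.parsers.expat.ErrorString(err.code)
    return None

def context_around(path, byte_offset, radius=200):
    """Return the raw bytes surrounding a byte offset, e.g. the one in the error."""
    start = max(0, byte_offset - radius)
    with open(path, "rb") as fh:
        fh.seek(start)
        return fh.read(2 * radius)

# Demo on a tiny, deliberately truncated stand-in file (not a real dump).
with tempfile.NamedTemporaryFile("wb", suffix=".xml", delete=False) as tmp:
    tmp.write(b"<mediawiki><page><title>Example</title>")
    demo_path = tmp.name

error = first_xml_error(demo_path)
snippet = context_around(demo_path, byte_offset=20, radius=10)
os.remove(demo_path)
print(error, snippet)
```

On a real dump, the byte offset from the error message (145084928 in the output above) would be passed to `context_around` to see whether the file ends or is damaged at that point.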
brion vibber (brion @ pobox.com) wrote:
> I've disabled the ability to use blank passwords on wiki accounts.
> For a long time we treated accounts very laxly in this regard; there generally
> wasn't _that_ much reason to secure a casual account unless you were one of the
> tiny number of sysops.
> In recent years though the number of sysops has exploded, and we've added
> really annoying if someone gets into your account and messes with them. As a
> small concession to security and accountability, it's time for blank passwords
> to go.
> While running some password security checks, I found that a handful of sysop
> accounts had blank passwords. Probably some non-sysop accounts also had blanks.
> Affected accounts can reset the password by the automated e-mail
> password gadget on the login form, unless of course they didn't put in an e-mail.
This is seriously wrong. It should be completely reversed.
A lot of people have just lost their account because of this,
and it wasn't even announced that it was coming.
This part of the problem could be reduced if the change was
announced in advance.
However, that's not the full problem.
Many people use blank or trivial passwords and don't give their emails.
This is completely reasonable, as it's very hard to remember yet
another password (and reusing passwords on different websites is about
as bad as having none),
and even if spamming wasn't a problem, why the heck would any website
need their email in the first place ?
So, while dictionary-checking sysops' passwords makes a lot of sense,
there's very little point in limiting passwords of the non-privileged accounts.
(and yeah, /me just lost 2 (rarely used) accounts on fr.wp and de.wp)
> I'm surprised that blank passwords were ever allowed since they are
> probably the worst security you can make,
Second only to letting anybody edit your web site. ;)
UseModWiki actually went so far as to allow you to create multiple user accounts
with the same user name...
> Maybe in the future a more strict password security protocol
> should be established and enforced, forcing password changes every x
> days would be unduly burdensome but complexity requirements might be a
> good idea especially since as you mentioned the adminship and the
> community pool has enlarged greatly.
I'm fiddling with some basic dictionary checks and such.
-- brion vibber (brion @ pobox.com)
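
As a rough illustration of the kind of check discussed in this thread (rejecting blank passwords and flagging dictionary words), here is a minimal Python sketch. It is not MediaWiki's code: the function name, the reason strings and the three-word list are hypothetical.

```python
def password_problem(password, dictionary_words, min_length=1):
    """Return a reason string if the password is rejected, else None."""
    if len(password) < min_length:
        return "blank or too short"
    # Case-insensitive match against a set of lowercase dictionary words.
    if password.lower() in dictionary_words:
        return "dictionary word"
    return None

# Hypothetical word list; a real check would load a much larger dictionary.
words = {"password", "wikipedia", "letmein"}

for candidate in ["", "Password", "x9!kQ2"]:
    print(repr(candidate), password_problem(candidate, words))
```

Keeping the blank check and the dictionary check separate would make it possible to apply the stricter rule only to privileged accounts, as suggested earlier in the thread.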