I recently set up a MediaWiki (http://server.bluewatersys.com/w90n740/)
and I need to extract the content from it and convert it into LaTeX
syntax for printed documentation. I have googled for a suitable OSS
solution but nothing was apparent.
I would prefer a script written in Python, but any recommendations
would be very welcome.
Do you know of anything suitable?
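To illustrate the kind of conversion I mean, here is a trivial sketch of my own (regex-based, covering only headings, bold, and italics; a real tool would need a proper wikitext parser):

```python
import re

# Hypothetical sketch: rewrite a few wikitext constructs to LaTeX.
# Rule order matters: "===" before "==", and "'''" before "''".
RULES = [
    (re.compile(r"^===\s*(.+?)\s*===\s*$", re.M), r"\\subsection{\1}"),
    (re.compile(r"^==\s*(.+?)\s*==\s*$", re.M), r"\\section{\1}"),
    (re.compile(r"'''(.+?)'''"), r"\\textbf{\1}"),
    (re.compile(r"''(.+?)''"), r"\\textit{\1}"),
]

def wikitext_to_latex(text: str) -> str:
    for pattern, repl in RULES:
        text = pattern.sub(repl, text)
    return text
```

This only scratches the surface (tables, templates, and links are the hard part), which is why I'm hoping an existing tool already does it properly.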
I've read on the techblog that the new UI goes live in April. I have
a few questions:
1) Which version? Acai, Babaco, Citron?
2) How/where could a wiki customize the special character insert menu,
and the inserted strings? And the embed file (picture) button inserts
this: "[[Example.jpg]]", without any "File:" or "Image:"!
3) The search and replace button is available in Firefox, but does not
appear at all in Opera. Why?
4) Currently the new navigable TOC does not work in Firefox or Opera at
all (I've tried both).
Not too early for live deployment?
Akos Szabo (Glanthor Reviol)
Sorry about bugging the list about it, but can anyone please explain
the reason for not enabling the Interlanguage extension?
See bug 15607 -
I believe that enabling it would be very beneficial for many projects,
and many people have expressed their support for it. I am not saying that
there are no reasons not to enable it; maybe there is a good reason,
but I don't understand it. I also understand that there are many other
unsolved bugs, but this one seems to have a ready and rather simple
solution. I am only sending this to raise the problem. If you know the
answer, you may comment at the bug page.
Thanks in advance.
Amir Elisha Aharoni
heb: http://haharoni.wordpress.com | eng: http://aharoni.wordpress.com
cat: http://aprenent.wordpress.com | rus: http://amire80.livejournal.com
"We're living in pieces,
I want to live in peace." - T. Moore
I am from Malayalam Wikipedia (ml.wikipedia - user:Praveenp), and my
language is Malayalam. Consider one big problem of ours.
After the release of Unicode 5.1.0, there are two kinds of encoding for
some characters of the Malayalam alphabet (because of backward
compatibility). This causes serious problems with linking, searching,
etc. in the MediaWiki software. Currently Windows 7 is the only operating
system which supports Unicode 5.1.0 (as far as I know), but lots of
third-party tools for writing and reading Malayalam support the new
version. And by now a large quantity of data in Wikimedia projects is in
the new version. It is not possible to link to or search for titles
encoded in pre-Unicode 5.1.0 from Unicode 5.1.0, or vice versa. One of
our namespaces, വർഗ്ഗം (Category), also has one such character, so it is
possible to write the same word in a second way which renders the same
as the first but differs in encoding. It causes problems in
categorization as well.
Is it possible to implement some Unicode equivalence
<http://en.wikipedia.org/wiki/Unicode_equivalence> in the MediaWiki
software? We need urgent help.
   Visual           Representation in 5.0 and prior   Preferred in 5.1
1  CHILLU_NN.png    0D23, 0D4D, 200D                  0D7A
2  CHILLU_N.png     0D28, 0D4D, 200D                  0D7B
3  CHILLU_RR.png    0D30, 0D4D, 200D                  0D7C
4  CHILLU_L.png     0D32, 0D4D, 200D                  0D7D
5  CHILLU_LL.png    0D33, 0D4D, 200D                  0D7E
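To make the equivalence concrete: what we need is for the software to fold the old sequences (base consonant + virama + ZWJ) into the atomic 5.1 chillu code points before comparing titles. A minimal sketch (the function name is mine; the five pairs are exactly the ones in the table above):

```python
# Sketch: fold pre-5.1 chillu sequences (base + virama + ZWJ) to the
# atomic Unicode 5.1 chillu code points, so titles compare equal.
CHILLU_MAP = {
    "\u0D23\u0D4D\u200D": "\u0D7A",  # chillu NN
    "\u0D28\u0D4D\u200D": "\u0D7B",  # chillu N
    "\u0D30\u0D4D\u200D": "\u0D7C",  # chillu RR
    "\u0D32\u0D4D\u200D": "\u0D7D",  # chillu L
    "\u0D33\u0D4D\u200D": "\u0D7E",  # chillu LL
}

def fold_chillus(text: str) -> str:
    """Rewrite old-style chillu sequences to their atomic forms."""
    for seq, atomic in CHILLU_MAP.items():
        text = text.replace(seq, atomic)
    return text
```

If MediaWiki applied something like this when normalizing titles and search terms, both spellings would link and search identically.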
I've made myself mailing list owner of mediawiki-l and wikitech-l
temporarily, since I don't think Brion is interested in moderating
either of them anymore.
But it's a relatively simple task, and it seems to me that it could be
done by someone with more spare time and less awesome hacking skills ;)
The tasks are:
* Manage the mailman configuration for these two lists.
* Put members on moderation if they post inappropriate content.
* Deal with people who send administrative requests to the list or to
You can also expect to get some forwarded spam. Anyone interested?
-- Tim Starling
New full history en wiki snapshot is hot off the presses!
It's currently being checksummed which will take a while for 280GB+ of
compressed data but for those brave souls willing to test please grab it
and give us feedback about its quality. This run took just over a month
and gained a huge speed-up after Tim's work on re-compressing ES. If we
see no hiccups with this data snapshot, I'll start mirroring it to other
locations (internet archive, amazon public data sets, etc).
For those not familiar, the last successful run that we've seen of this
data goes all the way back to 2008-10-03. That's over 1.5 years of
people waiting to get access to these data bits.
I'm excited to say that we seem to have it :)
I wanted to change the "cite" extension to have some extra functionality.
As citations have gotten more common, I've noticed an emerging use case
where people will copy and paste text from wikipedia to HTML-enabled
tools such as email clients or IM clients to share information.
Unfortunately, those citation links just link to anchors on the page and
don't provide anything useful when copied/pasted. Appending the full
page's URL to those links would take maybe 20 seconds to implement and
would make them functional, but it would add extra markup to every page.
Does anyone have any other good reasons why we shouldn't do this?
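For concreteness, the change amounts to something like this (a sketch in Python rather than the Cite extension's actual PHP; the page URL is invented):

```python
import re

def absolutize_cite_links(html: str, page_url: str) -> str:
    """Prefix bare #cite_... fragment links with the full page URL,
    so the links survive copy/paste into mail or IM clients.
    Sketch only; the real change would live in the Cite extension."""
    return re.sub(r'href="(#cite_[^"]+)"',
                  lambda m: 'href="%s%s"' % (page_url, m.group(1)),
                  html)
```

The cost is exactly the extra-markup concern above: every rendered page carries its own URL repeated once per citation.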
I'm interested in porting texvc to Python, and I was hoping this list
here might help me hash out the plan. Please let me know if I should
take my questions elsewhere.
Roughly, my plan of attack would be something like this:
1. Collect test cases and write a testing script
Thanks to avar from #wikimedia, I already have the <math>...</math> bits
from enwiki and dewiki. I would also construct some simpler ones by hand
to test each of the acceptable LaTeX commands.
Would there be any possibility of logging the input seen by texvc on a
production instance of Mediawiki, so I could get some invalid input
submitted by actual users?
This could also be useful to future maintainers for regression testing.
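The testing script in step 1 could be as simple as a table of inputs and expected verdicts, assuming a validate(tex) -> bool callable with the shape sketched here (the cases and the function name are mine, not texvc's):

```python
# Sketch of the step-1 test harness: each case pairs a <math> input
# with whether the (to-be-written) validator should accept it.
CASES = [
    (r"\frac{1}{2}", True),
    (r"x^2 + y^2", True),
    (r"\evil{payload}", False),   # command not on the whitelist
]

def run_cases(validate):
    """Run every case through a validate(tex) -> bool callable;
    return the list of inputs whose verdict was wrong."""
    return [tex for tex, expected in CASES if validate(tex) != expected]
```

The enwiki/dewiki dumps would populate CASES mechanically; the hand-written cases cover each accepted LaTeX command.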
2. Implement an AMS-TeX validator
I'll probably use PLY because it's rumored to have helpful debugging
features (it was designed for a first-year compilers class, apparently).
ANTLR is another popular option, but this guy
thinks it's complicated and hard to debug. I've never used either, so if
anyone on this list knows of a good Python parsing package, I'd welcome
the suggestion.
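Whichever parsing package wins, the lexer half is small either way; a hand-rolled sketch of the token shapes I have in mind (the grammar on top is the real work, and this token set is far from complete):

```python
import re

# Rough token shapes for AMS-TeX input; the validator's grammar
# would sit on top of a stream like this.
TOKEN_RE = re.compile(r"""
    (?P<COMMAND>\\[A-Za-z]+)      # \frac, \alpha, ...
  | (?P<LBRACE>\{) | (?P<RBRACE>\})
  | (?P<SYMBOL>[A-Za-z0-9^_+\-=()])
  | (?P<SPACE>\s+)
""", re.X)

def tokenize(tex: str):
    pos = 0
    while pos < len(tex):
        m = TOKEN_RE.match(tex, pos)
        if m is None:
            raise ValueError("invalid character at %d: %r" % (pos, tex[pos]))
        if m.lastgroup != "SPACE":
            yield (m.lastgroup, m.group())
        pos = m.end()
```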
3. Port over the existing tex->dvi->png rendering.
This is probably just a few calls into the subprocess module. Yeah, I
just jinxed it.
4. Add HTML rendering to texvc and test script
I don't even understand how the existing texvc decides whether HTML is
good enough. It looks like the original programmer just decreed that
certain LaTeX commands could be rendered to HTML, and defaults to PNG if
it sees anything not on that list. How important is this feature?
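If the feature stays, the decision logic seems to reduce to a whitelist check; schematically (the whitelist contents here are invented, not the real list from texvc):

```python
import re

# Schematic of the HTML-vs-PNG decision as I understand it: if every
# command in the input is on an HTML-renderable whitelist, emit HTML,
# otherwise fall back to PNG. The whitelist below is made up.
HTML_SAFE_COMMANDS = {r"\alpha", r"\beta", r"\pm", r"\times"}

def output_mode(tex: str) -> str:
    commands = set(re.findall(r"\\[A-Za-z]+", tex))
    return "html" if commands <= HTML_SAFE_COMMANDS else "png"
```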
5. Repackage the entire Math thing as an extension
I might do this if I have time left at the end. I'm sure the project
will change over the summer.
Python doesn't have parsing locked down the way C does with
flex/bison, but there are some good options. I have the most experience
with Python, and I think I'd be able to complete the port faster in it
than in either of the other languages. I was tempted at first to port to
PHP, to conform with the rest of Mediawiki, but there don't seem to be
any good parsing packages for PHP. (Please tell me if that's wrong.)
I'd appreciate any advice or criticism. Since my only previous
experience has been using Wikipedia and setting up a test Mediawiki
instance for my ACM chapter, I'm only just now learning my way around
the code base and it's not always evident why things were done as they
are. Does this look like a reasonable and worthwhile project?
P.S. Some of you may remember me on IRC a couple of days ago getting a
little panicky about not knowing OCaml, but I'm a bit more hopeful now
after looking around the source. I definitely have to keep the OCaml
manual open for reference, but I've written Scheme, Common Lisp, and
Haskell before, so I think I might be able to fake it. These are just
Famous Last Words waiting to happen, I know.
^demon, Happy-melon, ialex, ashley and I have been preparing a new
class. It is intended to be a replacement for the old wfMsg* functions
that many seem to dislike.
The most important reasons why we want to replace them are below.
There is some more at .
* There are too many functions with too many parameters
* It is easy to do the wrong thing
* It is difficult to do the right thing
The new class is in my opinion now ready enough for comments and
criticism. The full source code is at  and formatted documentation
It should be possible from the documentation to see how it is meant to be used.
A few examples are below. More examples and how they compare to old
wfMsg* functions can be found at .
# $button = Xml::submitButton( Message::key( 'submit' )->text() );
# Message::key( 'welcome-to' )->params( $wgSitename )->parse();
# Message::key( 'bad-message' )->rawParams( '<script>...</script>' )->escaped();
# Message::key( 'file-log' )->params( $user, $filename
It should be noted that it is not our intention to replace
OutputPage::addWikiMsg() and ::wrapWikiMsg().
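For those who haven't opened the source yet, the shape of the API is a fluent builder; in rough Python terms (parameter substitution and escaping are heavily simplified here, and the message store is stubbed out):

```python
import html

class Message:
    """Rough Python rendition of the fluent pattern the new PHP class
    uses; not the real implementation."""
    def __init__(self, key, texts):
        self._text = texts[key]   # message text by key, e.g. 'Welcome to $1!'
        self._params = []

    def params(self, *args):
        self._params.extend(args)
        return self              # returning self is what enables chaining

    def text(self):
        out = self._text
        for i, p in enumerate(self._params, 1):
            out = out.replace("$%d" % i, str(p))
        return out

    def escaped(self):
        return html.escape(self.text())
```

The point of the pattern is that every output decision (text vs. parsed vs. escaped) is an explicit terminal call, which is what addresses the "easy to do the wrong thing" complaint above.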
Things I'd like to have comments for:
(1) Is it easy to use this class? Have we solved the three main issues
listed above?
(2) The primary entry point is Message::key(). The syntax is a little
more verbose than wfMsg*'s, but much more readable imho. Do we want an
even shorter wrapper for the entry point? If yes, what should it be
called? For example _() (often used in Gettext projects) and
Msg::key() have been suggested.
(3) Anything else with regards to the documentation, the code or other issues.
In its current state the class should be able to cover almost all use
cases the wfMsg functions had, with some exceptions. I'd like to have
some tests for this class; can somebody help with that? Obviously this is
quite a small change in itself, but it will have a big impact when we
start converting code to use these new methods. For that reason I want
to get it right. I think we should proceed slowly during the 1.17 cycle,
for example using these only in new code, and iron out all problems.
As ^demon pointed out at bug 16026, we should perhaps review the
HTML-safety of any message we convert to use these new methods.