I recently set up a MediaWiki (http://server.bluewatersys.com/w90n740/)
and I need to extract the content from it and convert it into LaTeX
syntax for printed documentation. I have googled for a suitable OSS
solution, but nothing obvious turned up.
I would prefer a script written in Python, but any recommendations
would be very welcome.
Do you know of anything suitable?
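For concreteness, here is roughly the kind of script I had in mind: fetch
the raw wikitext of each page over HTTP and let pandoc do the
MediaWiki-to-LaTeX conversion. The index.php?action=raw URL form, the
placeholder page list and the reliance on pandoc's mediawiki reader are
all assumptions on my part, not a tested recipe:

# Rough sketch only: fetch raw wikitext and convert it with pandoc.
# Assumptions (mine): the wiki serves index.php?action=raw at this path,
# pandoc with its mediawiki reader is installed, and PAGES is a placeholder.
import subprocess
import urllib.parse
import urllib.request

WIKI_INDEX = "http://server.bluewatersys.com/w90n740/index.php"
PAGES = ["Main_Page"]  # placeholder list of page titles

def fetch_wikitext(title):
    """Fetch the raw wikitext of one page via action=raw."""
    url = WIKI_INDEX + "?" + urllib.parse.urlencode(
        {"title": title, "action": "raw"})
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")

def wikitext_to_latex(wikitext):
    """Convert MediaWiki markup to LaTeX with pandoc."""
    result = subprocess.run(
        ["pandoc", "-f", "mediawiki", "-t", "latex"],
        input=wikitext, capture_output=True, text=True, check=True)
    return result.stdout

if __name__ == "__main__":
    for title in PAGES:
        with open(title.replace("/", "_") + ".tex", "w") as out:
            out.write(wikitext_to_latex(fetch_wikitext(title)))

Templates, images and extension tags would obviously need extra handling,
which is why I am hoping a more complete tool already exists.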
Today we passed 10k HTTP requests per second (even with inter-squid
traffic eliminated). Special thanks to Mark and Tim, who have been
improving our caching, as well as doing lots of other work, and have
achieved incredible results (while I was slacking). Really, thanks!
I've put together an extension for rating articles if anyone is
interested. It's just a first version and hasn't been tested much, but
the details can be found here:
You can see an example here on our development server:
(username / password: wikihow / wikihow2006) - scroll down to the bottom
of the page for the checkmarks.
I'd appreciate feedback if anyone has any. If someone wants to add
this to extensions in svn, that'd be great.
My name is Reid Priedhorsky, and I'm a Ph.D. student at GroupLens
Research, which is the human-computer interaction group at the
University of Minnesota.
We are currently working on research investigating Wikipedia
contribution and vandalism. To this end, statistics on the
view rate of different articles would be extremely helpful to us --
something along the lines of Leon Weber's WikiCharts tool, but with a
larger limit (ideally all 1.7 million articles).
It seems to me that the easiest way to accomplish this would be to get
copies of your sampled Squid logs (as described on
and its links). We do not need the client IP or any other similarly
sensitive data, though if you gave it to us we would protect it
carefully as we protect the other sensitive research data we handle.
Would it be possible for us to have access to these log files?
If not, I would love to begin a discussion on what it would be possible
for us to access.
Your help would be greatly appreciated. Please let me know if you have
any questions.
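To make the request more concrete, the processing we have in mind is
roughly the following: count requests per article title and scale by the
sampling rate. The field layout and the 1-in-1000 factor in this sketch
are guesses on our part, not a description of your actual log format:

# Rough sketch of the aggregation we would run over sampled Squid logs.
# Assumptions (ours, not a description of the real format): lines are
# whitespace-delimited, the requested URL sits in field URL_FIELD, and
# the logs are a 1-in-1000 sample.
import collections
import sys
import urllib.parse

SAMPLE_RATE = 1000   # assumed sampling factor
URL_FIELD = 2        # assumed position of the URL in each line

def article_title(url):
    """Return the article title for a /wiki/Title URL, else None."""
    path = urllib.parse.urlparse(url).path
    if path.startswith("/wiki/"):
        return urllib.parse.unquote(path[len("/wiki/"):])
    return None

def count_views(lines):
    counts = collections.Counter()
    for line in lines:
        fields = line.split()
        if len(fields) <= URL_FIELD:
            continue
        title = article_title(fields[URL_FIELD])
        if title:
            counts[title] += SAMPLE_RATE  # scale the sample back up
    return counts

if __name__ == "__main__":
    for title, views in count_views(sys.stdin).most_common(20):
        print("%d\t%s" % (views, title))

As noted above, nothing in this requires the client IP field at all; the
only output we need is per-article counts.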
Whenever I save a page from Wikipedia, or any other MW site, using
Firefox, the resulting HTML loses its screen stylesheet. This seems to
be due to the way we are embedding the stylesheet, i.e.:
<style type="text/css" media="screen,projection">/*<![CDATA[*/ @import
This appears to cause Firefox not to treat the imported stylesheet as
part of the page it needs to retrieve when saving. Is there a way to
avoid this behavior, either on the client or on the server side?
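If it helps, one client-side workaround I have been considering is to
post-process the saved HTML and inline the @import-ed stylesheets. This is
only a sketch, under the assumption that a simple regular expression
catches the @import forms in the saved page; the file name and base URL
are placeholders:

# Sketch of a client-side workaround: after saving a page, inline any
# @import-ed stylesheets so the saved copy keeps its screen styles.
# SAVED_FILE and BASE_URL are placeholders; this is an illustration of
# the idea, not a tested tool.
import re
import urllib.parse
import urllib.request

SAVED_FILE = "saved_page.html"        # placeholder
BASE_URL = "http://en.wikipedia.org/" # placeholder

# Matches both  @import "style.css";  and  @import url(style.css);
IMPORT_RE = re.compile(
    r"@import\s+(?:url\(\s*)?[\"']?([^\"'()\s;]+)[\"']?\s*\)?\s*;")

def inline_imports(html, base_url):
    def fetch(match):
        css_url = urllib.parse.urljoin(base_url, match.group(1))
        with urllib.request.urlopen(css_url) as resp:
            return resp.read().decode("utf-8")
    return IMPORT_RE.sub(fetch, html)

if __name__ == "__main__":
    with open(SAVED_FILE, encoding="utf-8") as f:
        html = f.read()
    with open(SAVED_FILE, "w", encoding="utf-8") as f:
        f.write(inline_imports(html, BASE_URL))

A server-side fix, such as emitting <link rel="stylesheet"> elements
instead of @import, would obviously be cleaner if it is feasible.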
Peace & Love,
DISCLAIMER: This message does not represent an official position of
the Wikimedia Foundation or its Board of Trustees.
"An old, rigid civilization is reluctantly dying. Something new, open,
free and exciting is waking up." -- Ming the Mechanic
It would be really nice if Wikipedia's Special:Random function could take
arguments, making it possible to get a random article from user-specified
categories.
One way to do this is as follows. When a user clicks on the "random
article" link, the random article is loaded with a note at the top of the
page saying something like "You requested a random article. Click here to
view options", so that the user can mark which categories the article
should come from. The settings could either be stored in a cookie or be
sent in the URI.
Everyone would love this feature!
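Until something like this exists server-side, here is a rough client-side
approximation of the idea using the API: list the members of a chosen
category and pick one at random. The endpoint and category name are
placeholders, and the availability of list=categorymembers is an
assumption; this is only a sketch of the idea, not a proposed
implementation:

# Sketch of a client-side approximation: pick a random member of a
# category via the API. The endpoint and category are placeholders,
# and this assumes a MediaWiki new enough to offer list=categorymembers.
import json
import random
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"  # placeholder endpoint

def random_article_in_category(category):
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": "Category:" + category,
        "cmnamespace": "0",   # main namespace only
        "cmlimit": "500",
        "format": "json",
    }
    url = API + "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(
        url, headers={"User-Agent": "random-category-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    members = data["query"]["categorymembers"]
    return random.choice(members)["title"] if members else None

if __name__ == "__main__":
    print(random_article_in_category("Physics"))

This only looks at the first batch of category members, so a real
implementation would follow the query continuation and, as suggested
above, really belongs server-side in Special:Random itself.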
View this message in context: http://www.nabble.com/Feature-request%3A-Random-article-in-specified-catego…
Sent from the Wikipedia Developers mailing list archive at Nabble.com.
Hey all. Please excuse the fact that this is posted to multiple
mailing lists, but I need to ensure that it reaches all concerned.
I'm writing as an IRC Group Contact to call all operators of Wikimedia
IRC bots to state their activity and use of their bot to me, using a
private e-mail, as I'm doing a bit of reorganisation. I have a few
issues to address and would like to try and smooth out the rather
convoluted situation we have right now.
As many will know, I'm a freenode staffer and as such am in a position
to notice the lack of continuity present. In particular many of our
bots use flood-protection exemption due to the high volume of data
they send. Unfortunately, with the death of freenode's founder
Rob Levin, there is no longer a very organised record of which bots
are doing what and where, and I would really like to establish a
better one. By doing this I aim to make things a lot easier for bot
operators to get the information/permissions they need. Please note
that this has nothing to do with MediaWiki bot flags or community
permission, which is still very important for bots that edit as well
as speak on IRC.
I would like to make two major changes to what we do at the moment.
Firstly, I would like to cloak all active bots with wikimedia/bot/nick
(wikimedia can be replaced by wikipedia, wikisource etc.) and add them
to a list I'll keep on meta (not set up yet, will see how this goes
first!). At the moment, there are a good few usercloak/bot/botnick
(e.g. I have wikimedia/xyrael/bot/winesteward) and a few
wikimedia/bot/botnicks around. I would like to make the distinction
that usercloak/bot/botnick is a non-Wikimedia bot and that
project/bot/botnick is for those that are run specifically for one of
the Wikimedia projects.
Secondly, I would like to use one bot o:line for flood protection.
Currently, there are a good few floating about, and I'm not sure who
is actually using them actively - a cull of bots no longer in use
would be good from freenode's perspective. There are obvious trust
issues with this in that one password leak would be a lot of trouble
compared with bot operators guarding their own personal passwords, but
I think it's worth it because of the two-stage process of changing the
o:lines: me as group contact and then as staff, then actually getting
hold of someone to make the change.
I realise that I've rambled a bit here, and so I'll summarise my requests:
* That all operators of IRC bots contact me via e-mail telling me
their bot nickname, what it does and what it is cloaked with, as well
as a note if it uses an o:line. I can then recloak them with your
assistance. This also has the purpose of weeding out inactive bots (no
action being taken yet, though).
* That anyone who knows an operator who doesn't read any of the lists
forwards this to them and asks them to complete my request, perhaps
translating if necessary.
* That any ideas/complaints/ways-that-are-significantly-better-than-this
are expressed in this mailing list conversation!
Friendly IRC group contact
—Sean Whitton (Xyrael/xyr)
An automated run of parserTests.php showed the following failures:
This is MediaWiki version 1.10alpha (r20860).
Reading tests from "maintenance/parserTests.txt"...
Reading tests from "extensions/Cite/citeParserTests.txt"...
Reading tests from "extensions/Poem/poemParserTests.txt"...
17 still FAILING test(s) :(
* URL-encoding in URL functions (single parameter) [Has never passed]
* URL-encoding in URL functions (multiple parameters) [Has never passed]
* Table security: embedded pipes (http://mail.wikipedia.org/pipermail/wikitech-l/2006-April/034637.html) [Has never passed]
* Link containing double-single-quotes '' (bug 4598) [Has never passed]
* message transform: <noinclude> in transcluded template (bug 4926) [Has never passed]
* message transform: <onlyinclude> in transcluded template (bug 4926) [Has never passed]
* BUG 1887, part 2: A <math> with a thumbnail- math enabled [Has never passed]
* HTML bullet list, unclosed tags (bug 5497) [Has never passed]
* HTML ordered list, unclosed tags (bug 5497) [Has never passed]
* HTML nested bullet list, open tags (bug 5497) [Has never passed]
* HTML nested ordered list, open tags (bug 5497) [Has never passed]
* Inline HTML vs wiki block nesting [Has never passed]
* Mixing markup for italics and bold [Has never passed]
* dt/dd/dl test [Has never passed]
* Images with the "|" character in the comment [Has never passed]
* Parents of subpages, two levels up, without trailing slash or name. [Has never passed]
* Parents of subpages, two levels up, with lots of extra trailing slashes. [Has never passed]
Passed 494 of 511 tests (96.67%)... 17 tests failed!
After running importDump.php to completion, then running initStats.php
to update the statistics for the newly imported database,
I get the following output:
[root@gadugi /]# cd /wikidump/en
[root@gadugi en]# php maintenance/initStats.php
Refresh Site Statistics
Counting total edits...4798436
Counting number of articles...1980988
Counting total pages...4797798
Counting number of users...1
Counting number of admins...1
Counting number of images...1092505
Counting total page views...67977
Updating site statistics...done.
If I subsequently invoke rebuildall.php against this database, it
reports double the number of pages, runs up to about 277000 articles,
and then the php process goes to sleep and never wakes up.
rebuildall.php reports the following incorrect page count:
[root@gadugi en]# php maintenance/rebuildall.php
** Rebuilding fulltext search index (if you abort this will break
searching; run this script again to fix):
Rebuilding index fields for 9648325 pages...