Hi everyone,
I recently set up a MediaWiki (http://server.bluewatersys.com/w90n740/)
and I need to extract the content from it and convert it into LaTeX
syntax for printed documentation. I have googled for a suitable OSS
solution but nothing was apparent.
I would prefer a script written in Python, but any recommendations
would be very welcome.
Do you know of anything suitable?
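In case nothing ready-made turns up, here is the kind of rough starting point
I have in mind, in Python. This is only a sketch handling a few common
constructs (bold, italics, headings); a real converter would need to cover
links, lists, templates, and tables as well.
---------------------------------------------------------
```python
import re

def wiki_to_latex(text):
    """Convert a small subset of MediaWiki markup to LaTeX.
    Only bold, italics, and == headings == are handled;
    everything else passes through unchanged."""
    # '''bold''' -> \textbf{bold}  (must run before italics,
    # since ''' also matches the '' pattern)
    text = re.sub(r"'''(.+?)'''", r"\\textbf{\1}", text)
    # ''italic'' -> \emph{italic}
    text = re.sub(r"''(.+?)''", r"\\emph{\1}", text)
    # == Section == -> \section{Section}
    text = re.sub(r"^==\s*(.+?)\s*==\s*$", r"\\section{\1}",
                  text, flags=re.MULTILINE)
    return text
```
---------------------------------------------------------
The page text itself could be fetched with action=raw from index.php, which a
default MediaWiki install supports.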
Kind Regards,
Hugo Vincent,
Bluewater Systems.
Hello, everyone. I'm writing to this group because Wayne Saewyc tells me
that you might be interested in what I'm trying to present. My name is
Robert Rapplean, and I'm a software engineer and political analyst. You can
understand that I've spent an immense amount of time attempting to get ideas
across in the massively multiuser asynchronous world of the Internet. Over
the years I've developed a detailed understanding of the problems inherent
in trying to pursue a logical argument in this kind of environment, and I've
used that understanding to design a tool that addresses these problems.
I am a Wikipedia user, and make it a point to contribute to the articles
when I find I have more expertise than those who have already presented
information. After spending quite a bit of time unwinding the sometimes
barely comprehensible dialogs that have occurred on the discussion pages of
the articles, I've concluded that this particular environment would benefit
greatly from the implementation of exactly the kind of tool that I've
designed.
With that in mind, I'm going to attempt to describe the idea to you. The
remainder of this email is a short description of the design of the tool
and the reason it is structured the way it is.
In my examination of online debates, I've noted a small bestiary of bad
debating habits, almost all of which fall under the categories of "casual
debater" or "hostile debater". Casual debaters are those that don't take
the time to peruse the previous debate that has occurred on a topic. They
tend to re-submit points that have already been debated ad nauseam and
require re-iteration of important talking points. Everyone starts out in
this category, but the casual debater gets bored before they get beyond that
point. Because online debating tools are very poor at organizing previous
information, it quickly becomes a prodigious effort to get up to speed on a
debate. This means that any forum which has enough contributors to form a
decent consensus also has a steady stream of neophytes clogging the
communication streams with off-the-cuff comments and other distractions.
An unfortunate side effect of this is that many of the good debaters get to
the point where they're tired of re-arguing the same points over and over
again. When the debate follows those lines yet again, they tend to quit
contributing, and may leave the forum entirely.
Hostile debaters are those who aren't there to exchange ideas so much as to
spout them. In other words, they're all mouth and no ears. They don't want
to find the truth, they want everyone to accept their personal truth. Their
entire purpose on the forum is to get a personal thrill from defeating the
opposition through wit, strategy, and tactics. As a result, they pursue an
argument via the well-worn tactics of attacking where the enemy is weak and
retreating where the enemy is strong. If they can't win a particular point,
they'll shift the topic to something that they think the opponent might be
less strong on. They'll continue stringing their opponents on a line of
topics until they can find one that the opponent isn't as well versed on,
and then stand on it like a bastion of safety, insisting that it's the only
valid perspective from which to view the concept. If they can't find a weak
point, they'll circle back around to the original topic hoping for a second
try or resort to standard logic errors like ad hominem attacks or faulty
analogies.
Although the design of the tool addresses many other issues (like ballot box
stuffing and squeaky wheel effects), these should be adequate to understand
the reasoning behind the basic structure I'm about to explain. As I go
along, I'm going to compare my design to existing online collaborative
tools, like wikis and forums.
In order to deal with a lot of the tactics of the hostile debater, I started
by removing the linear nature of wikis and forums. You can't lead a person
in circles if you're glued to the spot. With this in mind, the base unit of
this tool is a conjecture, something like "alcoholism is a disease". Each
person may (not must) make one statement about the conjecture. They can
change the statement any time that they like, but that one statement must be
a summation of their entire opinion on that conjecture. Then everyone gets
to vote on the statement that best matches their personal opinion. If none
of them match closely enough, they can make their own statement.
Statements are ranked based on popularity. Additionally, the writer of the
statement indicates the bias of their statement. A bias states that the
conjecture is:
1. factual (based on repeatable phenomena)
2. true (not based on repeatable phenomena, but enough evidence exists)
3. unproven (enough evidence does not exist one way or the other)
4. unprovable (the conjecture requires evidence that is not obtainable)
5. unsupported (the evidence suggests that the conjecture is not true)
6. false (repeatable phenomena disproves the conjecture conclusively)
For the purposes of determining the validity of a conjecture, all statements
with 1 & 2 add their votes together, all with 3 & 4 go together, and all
with 5 & 6 go together. This creates a distinct identification of the
participant's current consensus on the matter.
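To make the vote-grouping rule concrete, here is a small Python sketch of
it. The data layout and names are my own invention for illustration, not
part of any existing tool: biases 1-2, 3-4, and 5-6 pool their votes into
three consensus buckets.
```python
# Bias codes 1..6, as defined above.
GROUPS = {
    1: "supported", 2: "supported",        # factual, true
    3: "undetermined", 4: "undetermined",  # unproven, unprovable
    5: "rejected", 6: "rejected",          # unsupported, false
}

def tally(statements):
    """statements: list of (bias_code, vote_count) pairs.
    Returns total votes per consensus bucket."""
    totals = {"supported": 0, "undetermined": 0, "rejected": 0}
    for bias, votes in statements:
        totals[GROUPS[bias]] += votes
    return totals
```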
Since people need a place to ask questions and discuss ideas, a standard
message list should be attached to the conjecture, but it is strongly
suggested that all messages on the list expire and vanish after 30
days or so, to encourage the participants to embody their ideas in their
statements, not in their messages.
There's a further aspect of this. Every conjecture debated tends to result
in child conjectures, for instance "a disease is anything which affects the
wellness of an individual". These become their own conjectures, with their
own statements and (importantly) their own message lists. Each is voted on
to determine its individual validity, and it gets linked to the parent
conjecture. Participants in the parent conjecture can then rate the
relevance of the child conjecture to the parent conjecture, and take into
account the most relevant child conjectures when voting on a statement.
Taking this a step further, conjectures can all be reused. For
instance, a conjecture like "the will of god is unknowable" could be used
again and again, being attached to a very wide range of parent conjectures
without having to re-create it and re-argue it every time.
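A minimal Python sketch of the data model I'm describing (class and field
names are purely illustrative, not from an existing implementation): each
conjecture carries its own statements, and the same conjecture object can be
linked under many parents, each link carrying a relevance score.
```python
class Conjecture:
    """One debatable proposition, reusable under many parents."""
    def __init__(self, text):
        self.text = text
        self.statements = {}   # author -> their one statement
        self.children = []     # list of (child Conjecture, relevance 0..1)

    def link_child(self, child, relevance):
        # The same child object may be linked under several parents,
        # so it never has to be re-created or re-argued.
        self.children.append((child, relevance))

    def most_relevant(self, n=3):
        # Child conjectures ordered by rated relevance, highest first.
        ranked = sorted(self.children, key=lambda cr: cr[1], reverse=True)
        return [child for child, _ in ranked[:n]]
```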
The final result would be that, for each conjecture, all of the reasoning
behind the current decision would be laid out in a readily examinable
format, ordered by relevance. This makes things much, much easier for the
casual arguer. The modular format also makes it extremely easy to slap a
"logic foul" conjecture on anyone who presents falacious arguments. The
non-linear format totally wrecks topic-shifting tactics, and the voting
system indicates not just how people feel about something, but how firmly
they feel about it.
I think that'll do it for an introduction. If this is interesting to you,
please let me know and I can provide you with more details.
Yours,
Robert Rapplean
Aim
===
To see if it's possible to execute arbitrary client-side JavaScript,
using MediaWiki as the delivery system.
Background
==========
To execute arbitrary JavaScript, we mostly need to find a way to get
MediaWiki to allow us to open a tag ("<script"), and to close a tag
(">"). In particular, we need some form of wiki input text that will
produce this as the rendered HTML output.
The MediaWiki parser seems to try to prevent both of these things, by
escaping "<" and ">" characters. For example, if you give it wiki
input like this:
---------------------------------------------------------
<"hello world">
---------------------------------------------------------
Then you get back this HTML:
---------------------------------------------------------
<"hello world">
---------------------------------------------------------
However, the parser is not perfect. There are some inputs that will
give unescaped ">" or "<" back.
The trick is probably to combine these omissions together, in such a
way so as to produce a working exploit. I don't have such a thing yet,
but I suspect it might be possible.
Unescaped Closing tags
======================
Getting the parser to give unescaped closing tags is much easier than
finding unescaped opening tags.
For example, a wiki input of just this:
---------------------------------------------------------
>>>>>>>
---------------------------------------------------------
Will give this HTML output:
---------------------------------------------------------
>>>>>>>
---------------------------------------------------------
(i.e. no escaping).
However, if we try to open one or more tags beforehand, then that
changes. So this wiki input:
---------------------------------------------------------
<
>>>>>>>
---------------------------------------------------------
Gives this HTML output:
---------------------------------------------------------
&lt;
&gt;&gt;&gt;&gt;&gt;&gt;&gt;
---------------------------------------------------------
So in other words, we can do this (close a tag and provide some
JavaScript to be executed), provided we don't use the "<" character:
---------------------------------------------------------
onmouseover="alert(document.cookie)">test
---------------------------------------------------------
... and we will get that literal text back in the HTML.
So to sum up: Any time after we use a "<" character, we lose this
privilege of having unescaped ">" characters.
To me, this feels like a mistake, because it gives an attacker an
opening that they probably don't need to be given (i.e. it's a free
kick).
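To restate the observed rule in executable form, here is a toy Python model
of the behaviour described above. This is my reconstruction of what the
parser appears to do, not MediaWiki's actual code: "<" is always escaped,
and ">" is escaped only once a "<" has appeared earlier in the input.
---------------------------------------------------------
```python
def model_escape(wiki_text):
    """Toy model of the observed MediaWiki escaping: '<' is always
    escaped; '>' is emitted raw until the first '<' is seen."""
    out = []
    seen_lt = False
    for ch in wiki_text:
        if ch == "<":
            out.append("&lt;")
            seen_lt = True
        elif ch == ">" and seen_lt:
            out.append("&gt;")
        else:
            out.append(ch)
    return "".join(out)
```
---------------------------------------------------------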
Unescaped Opening tags
======================
Almost all uses of "<" in the wiki input will result in "<" in the
HTML output.
However there are some uses that do not. In particular, I've found
that table properties are very weakly restricted, and we can get the
Parser to produce unescaped "<" characters with each of the following
3 inputs:
---------------------------------------------------------
{| WIDTH=[[image:ftp://~
{| ALIGN='''~~~</math>
{| BGCOLOR=<span style="font-weight: bold;">
---------------------------------------------------------
Which will give this HTML output:
---------------------------------------------------------
<table width="[[image:<a" class="external free" title="ftp://~">
<table align="<b><!--LINK">
<table bgcolor="<span">
---------------------------------------------------------
Some observations / problems here:
1) The unescaped "<" characters are in attribute strings. We need
somehow to avoid that, or break out of that, if the browser is to obey
them.
2) The type of the tags is limited ("<a>", "<!--" and "<span>" tags in
the above examples).
3) The final two examples that use "<" will mean that we cannot close
the tag (because, as described above, by using the "<" character we
lose the privilege of having close tags).
However we can avoid problem 3) with this, which never uses a "<" character:
---------------------------------------------------------
{| WIDTH=[[image:ftp://~
onmouseover="alert(document.cookie)">test
---------------------------------------------------------
Which will give this HTML output:
---------------------------------------------------------
<table width="[[image:<a" class="external free" title="ftp://~">
onmouseover="alert(document.cookie)">test
---------------------------------------------------------
... however this still has problems 1) and 2) described above.
Problem 2) may not necessarily be a showstopper (e.g. HTML like: '<a
href="#" onmouseover="alert(document.cookie);">Free Porn!</a>' is not
as powerful as something that successfully uses "onLoad", but it can
be expected to work a reasonable percentage of the time).
On the other hand, problem 1) is currently a huge restriction.
Conclusions
===========
* If anyone knows of a way of overcoming problems 1) and 2), or of an
alternate method, then please let me know. By combining that
information with the information above, it may well be possible to
create a working Proof-Of-Concept.
* Why does MediaWiki ever allow unescaped ">" characters? This
behaviour seems to increase the chances of a JavaScript security
problem.
All the best,
Nick.
Hello,
Does MediaWiki allow me to automatically place certain text in advance on a
non-existing page? What I am trying to do is this: when users click on a
picture, a (link to a) new page is generated with the name
"{{PAGENAME}}-more-detailed-name".
Is it possible to automatically put some text (e.g. "{{template}}")
on this new (non-existing) page?
greetings,
JAN
Hi, buddies. The Chinese Wikimedia users need some technical help from
the developers at the Foundation.
As everyone here believes, the government of P.R. China has blocked
access to all Wikimedia sites using its "Great Firewall"; Wikimedia
contributors and users in Mainland China have to use proxies to visit
Wikimedia sites. However, most of these proxies are unstable, so
CNBlog.org (a famous and prominent advocate of the blogosphere in China,
operated by the Social Brian Foundation, which hosted the First Chinese
Blogger Conference last year) has set up a stable proxy for Chinese
users.
With this service, Mainland Chinese users can access Chinese Wikipedia
using http://wikipedia.cnblog.org, and Chinese Wikinews using
http://wikinews.cnblog.org. There are over 16,000 visits and over
110,000 page requests per day this month. CNBlog.org has been very
helpful to Chinese users, and it is planning to expand its proxy
service to cover all Chinese Wikimedia projects, which is really great
news.
Unfortunately, Chinese Wikimedia administrators are troubled by an
issue that came up recently: lots of vandalism is done using the
CNBlog proxy. Currently, the IP of the CNBlog proxy is displayed and
logged in edit history on Wikimedia. If an administrator blocks this
IP (CNBlog proxy), all Mainland Chinese users who are using CNBlog
proxy, either logged-in or not, will be blocked as well and not be
able to contribute.
Some Chinese Wikipedians discussed this issue with CNBlog.org, and we
believe that a technical solution should be feasible. We also
discussed some technical details, and I will send another e-mail to
wikitech-l regarding the details. Basically, we would like to have
users' real IP (IP used to access CNBlog proxy) displayed and logged
at Wikimedia.
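To illustrate the kind of solution we have in mind, here is a small Python
sketch of standard trusted-proxy handling of the X-Forwarded-For header. The
address and names below are purely illustrative; the actual mechanism would
need to be agreed with CNBlog.org and implemented server-side.
```python
TRUSTED_PROXIES = {"203.0.113.7"}  # illustrative address for the CNBlog proxy

def effective_ip(remote_addr, x_forwarded_for):
    """Return the IP to display/log: if the TCP peer is a trusted
    proxy and it supplied an X-Forwarded-For header, trust the
    last address in that header; otherwise use the peer address."""
    if remote_addr in TRUSTED_PROXIES and x_forwarded_for:
        # XFF is a comma-separated chain; the last entry was added
        # by the proxy nearest to us, i.e. the one we trust.
        return x_forwarded_for.split(",")[-1].strip()
    return remote_addr
```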
We send this e-mail on behalf of many Chinese users. We hope
for the kind attention and help of the Foundation. Thank you very
much.
[[User:Shizhao]]
[[User:R.O.C]]
[[User:Yongxinge]]
[[User:Mountain]]
I'm looking into writing a script to (login and) edit MW page
content, via HTTP. Is there a sample script that anyone can
recommend? Alternatively, is there any documentation on how MW
handles login sessions (eg, what cookies it sets), etc?
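Not an existing sample, but a rough sketch of the form data involved:
MediaWiki's login form posts wpName/wpPassword to Special:Userlogin and
hands back session cookies that later edit posts must carry. The helpers
below only build the form payloads; the field names are assumed from the
default login and edit forms and should be checked against your install,
and the HTTP plumbing is left to whatever client library you prefer.
```python
def login_payload(username, password):
    """Form fields posted to
    index.php?title=Special:Userlogin&action=submitlogin
    (field names assumed; verify against your wiki's login form)."""
    return {"wpName": username,
            "wpPassword": password,
            "wpLoginattempt": "Log in"}

def edit_payload(text, edit_time, edit_token, summary=""):
    """Form fields posted to index.php?title=PAGE&action=submit.
    wpEdittime and wpEditToken must first be scraped from the edit
    form; MediaWiki rejects edits without a matching token."""
    return {"wpTextbox1": text,
            "wpEdittime": edit_time,
            "wpEditToken": edit_token,
            "wpSummary": summary,
            "wpSave": "Save page"}
```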
-r
--
http://www.cfcl.com/rdm Rich Morin
http://www.cfcl.com/rdm/resume rdm(a)cfcl.com
http://www.cfcl.com/rdm/weblog +1 650-873-7841
Technical editing and writing, programming, and web development
Thanks, Rich. I've looked at IBIS, and it also takes the divide-and-conquer
approach to problem solving, which I think is one of this tool's greatest
advantages. Being able to divide an issue into subissues allows people to
identify what they're really disagreeing about instead of arguing over
things that are really differences in definition.
Unfortunately, IBIS is a managed system, which is to say that there is no
automated method for identifying when a particular conjecture has enough
evidence for most of the participants to accept or disclaim its truth. That
process seems to be done via an occasional vote, kind of like in
parliamentary procedure.
Since it lacks "degree of relevance" measures, it fails the casual arguer in
identifying the most important sub-points behind a conjecture. Also, the
lack of a "one person, one statement" structure means that all discussions
occur in a linear nature, which falls prey to the hostile arguer.
In short, while IBIS has an excellent structure, it lacks the specific
implementation details that I've designed into this tool that tailor it for
use in the massively diverse, unmoderated, ad-hoc environment that Wikipedia
exists in.
On 3/31/06, Rich Morin <rdm(a)cfcl.com> wrote:
>
> IBIS (Issue-Based Information System) can be used to
> structure discussions and cut down on repetition, while
> allowing everyone to have their say.
>
> http://www3.iath.virginia.edu/elab/hfl0104.html
> http://www-iurd.ced.berkeley.edu/pub/WP-131.pdf
>
> -r
> --
> http://www.cfcl.com/rdm Rich Morin
> http://www.cfcl.com/rdm/resume rdm(a)cfcl.com
> http://www.cfcl.com/rdm/weblog +1 650-873-7841
>
> Technical editing and writing, programming, and web development
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l(a)wikimedia.org
> http://mail.wikipedia.org/mailman/listinfo/wikitech-l
>
Brion Vibber wrote:
> Andrew Gray wrote:
>
>> Is that "keep recording but ignore them", or disable in the sense of
>> turn off logging totally? Just curious...
>>
>
> After a few months of having logs that you're not reading fill up the servers'
> hard disks every few days, you turn them off. :)
>
> -- brion vibber (brion @ pobox.com)
>
>
How about a cron job that turns logging on, then off, intermittently?
For example, on each server, have a cron job that does this:
Every 5 mins:
Is logging on?
Then: turn it off
Else: generate a random number
If it's == 0 mod 1000:
Then: turn logging on
Else: do nothing
This way, you get representative short blocks of 5 minutes of traffic,
kicking in once every three days or so on each of the 100 or so servers
at random times of the day or night. This would also suffice for gross
statistical analysis, and wouldn't require any modification of the squid
code, just a short external shell script.
Log-rotation should handle the rest and prevent the disks filling up,
since the average sampling rate would then be low enough to cope with.
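A minimal sketch of the decision logic, written so the policy can be tested
in isolation. The "turn_on"/"turn_off" actions it emits are placeholders
for whatever commands actually toggle squid logging on your servers; only
the decide function itself is concrete.
```shell
#!/bin/sh
# Run from cron every 5 minutes. First argument is the current
# logging state ("on"/"off"); second is a random number (e.g. $RANDOM).
decide() {
    state="$1"
    rand="$2"
    if [ "$state" = "on" ]; then
        # Always stop after one 5-minute sample window.
        echo "turn_off"
    elif [ $((rand % 1000)) -eq 0 ]; then
        # ~1 interval in 1000: start a sample window.
        echo "turn_on"
    else
        echo "noop"
    fi
}
```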
-- Neil
On 3/24/06, Gabriel Wicke <gabrielwicke(a)users.sourceforge.net> wrote:
> Update of /cvsroot/wikipedia/phase3/includes
> In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv12319/includes
>
> Modified Files:
> Parser.php
> Log Message:
> Provide some cleanup if tidy is disabled:
>
> * fix invalid nesting of anchors and i/b
> * remove empty i/b tags
> * remove divs inside anchors
>
> Fixes several test cases
>
>
> Index: Parser.php
> ===================================================================
> RCS file: /cvsroot/wikipedia/phase3/includes/Parser.php,v
> retrieving revision 1.602
> retrieving revision 1.603
> diff -u -d -r1.602 -r1.603
> --- Parser.php 22 Mar 2006 04:57:14 -0000 1.602
> +++ Parser.php 24 Mar 2006 16:36:29 -0000 1.603
> @@ -250,6 +250,32 @@
>
> if (($wgUseTidy and $this->mOptions->mTidy) or $wgAlwaysUseTidy) {
> $text = Parser::tidy($text);
> + } else {
> + # attempt to sanitize at least some nesting problems
> + # (bug #2702 and quite a few others)
> + $tidyregs = array(
> + # ''Something [http://www.cool.com cool''] -->
> + # <i>Something</i><a href="http://www.cool.com"..><i>cool></i></a>
> + '/(<([bi])>)(<([bi])>)?([^<]*)(<\/?a[^<]*>)([^<]*)(<\/\\4>)?(<\/\\2>)/' =>
> + '\\1\\3\\5\\8\\9\\6\\1\\3\\7\\8\\9',
> + # fix up an anchor inside another anchor, only
> + # at least for a single single nested link (bug 3695)
> + '/(<a[^>]+>)([^<]*)(<a[^>]+>[^<]*)<\/a>(.*)<\/a>/' =>
> + '\\1\\2</a>\\3</a>\\1\\4</a>',
> + # fix div inside inline elements- doBlockLevels won't wrap a line which
> + # contains a div, so fix it up here; replace
> + # div with escaped text
> + '/(<([aib]) [^>]+>)([^<]*)(<div([^>]*)>)(.*)(<\/div>)([^<]*)(<\/\\2>)/' =>
> + '\\1\\3<div\\5>\\6</div>\\8\\9',
> + # remove empty italic or bold tag pairs, some
> + # introduced by rules above
> + '/<([bi])><\/\\1>/' => ''
> + );
> +
> + $text = preg_replace(
> + array_keys( $tidyregs ),
> + array_values( $tidyregs ),
> + $text );
> }
>
> wfRunHooks( 'ParserAfterTidy', array( &$this, &$text ) );
This fixes the "Bug 2702: Mismatched <i>, <b> and <a> tags are
invalid" test case, but it's not really an improvement. The test case
was supposed to demonstrate that we don't balance tags, and this
doesn't fix that; it merely hacks around very specific cases with
regular expressions, which will fail as soon as you insert more tags
than a parser that balanced tags properly would handle.
I'm all for fixing the parser, but making an existing parser test case
pass by writing a hack targeted at just that test, rather than fixing
the core issue, is not an improvement.
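For comparison, the general fix being argued for is stack-based tag
balancing rather than per-case regexes. Here is a toy Python sketch of that
idea (illustrative only, nothing like production parser code): it closes
tags left open at end of input and drops stray close tags, for a small
whitelist of tag names.
```python
import re

def balance_tags(html, balanced=("b", "i", "a")):
    """Toy stack-based balancer for the given tag names. A real
    parser must also handle nesting rules, attributes containing
    '>', comments, self-closing tags, etc."""
    out, stack = [], []
    for token in re.split(r"(</?\w+[^>]*>)", html):
        m = re.match(r"<(/?)(\w+)", token)
        if m and m.group(2) in balanced:
            if m.group(1):                    # a close tag
                if m.group(2) in stack:
                    # Close everything opened since, then this tag.
                    while stack[-1] != m.group(2):
                        out.append("</%s>" % stack.pop())
                    stack.pop()
                    out.append(token)
                # else: stray close tag with no open tag -- drop it.
            else:                             # an open tag
                stack.append(m.group(2))
                out.append(token)
        else:
            out.append(token)
    # Close anything still open at end of input.
    while stack:
        out.append("</%s>" % stack.pop())
    return "".join(out)
```
This handles whole classes of mismatches (including the bug 2702 shape)
instead of one regex per reported case.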