Hi everyone,
I recently set up a MediaWiki (http://server.bluewatersys.com/w90n740/)
and I need to extract the content from it and convert it into LaTeX
syntax for printed documentation. I have googled for a suitable OSS
solution but nothing was apparent.
I would prefer a script written in Python, but any recommendations
would be very welcome.
Do you know of anything suitable?
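In case nothing ready-made turns up, here is the kind of rough starting point
I have in mind, in Python. This is only a sketch handling a few common
constructs (bold, italics, headings); a real converter would need to cover
links, lists, templates, and tables as well.
---------------------------------------------------------
```python
import re

def wiki_to_latex(text):
    """Convert a small subset of MediaWiki markup to LaTeX.
    Only bold, italics, and == headings == are handled;
    everything else passes through unchanged."""
    # '''bold''' -> \textbf{bold}  (must run before italics,
    # since ''' also matches the '' pattern)
    text = re.sub(r"'''(.+?)'''", r"\\textbf{\1}", text)
    # ''italic'' -> \emph{italic}
    text = re.sub(r"''(.+?)''", r"\\emph{\1}", text)
    # == Section == -> \section{Section}
    text = re.sub(r"^==\s*(.+?)\s*==\s*$", r"\\section{\1}",
                  text, flags=re.MULTILINE)
    return text
```
---------------------------------------------------------
The page text itself could be fetched with action=raw from index.php, which a
default MediaWiki install supports.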
Kind Regards,
Hugo Vincent,
Bluewater Systems.
Hello, everyone. I'm writing to this group because Wayne Saewyc tells me
that you might be interested in what I'm trying to present. My name is
Robert Rapplean, and I'm a software engineer and political analyst. You can
understand that I've spent an immense amount of time attempting to get ideas
across in the massively multiuser asynchronous world of the Internet. Over
the years I've developed a detailed understanding of the problems inherent
in trying to pursue a logical argument in this kind of environment, and I've
used that understanding to design a tool that addresses these problems.
I am a Wikipedia user, and make it a point to contribute to the articles
when I find I have more expertise than those who have already presented
information. After spending quite a bit of time unwinding the sometimes
barely comprehensible dialogs that have occurred on the discussion pages of
the articles, I've concluded that this particular environment would benefit
greatly from the implementation of exactly the kind of tool that I've
designed.
With that in mind, I'm going to attempt to describe the idea to you. The
remainder of this email is a short description of the design of the tool
and the reason it is structured the way it is.
In my examination of online debates, I've noted a small bestiary of bad
debating habits, almost all of which fall under the categories of "casual
debater" or "hostile debater". Casual debaters are those that don't take
the time to peruse the previous debate that has occurred on a topic. They
tend to re-submit points that have already been debated ad nauseam and
require re-iteration of important talking points. Everyone starts out in
this category, but the casual debater gets bored before they get beyond that
point. Because online debating tools are very poor at organizing previous
information, it quickly becomes a prodigious effort to get up to speed on a
debate. This means that any forum which has enough contributors to form a
decent consensus also has a steady stream of neophytes clogging the
communication streams with off-the-cuff comments and other distractions.
An unfortunate side effect of this is that many of the good debaters get to
the point where they're tired of re-arguing the same points over and over
again. When the debate follows those lines yet again, they tend to quit
contributing, and may leave the forum entirely.
Hostile debaters are those who aren't there to exchange ideas so much as to
spout them. In other words, they're all mouth and no ears. They don't want
to find the truth, they want everyone to accept their personal truth. Their
entire purpose on the forum is to get a personal thrill from defeating the
opposition through wit, strategy, and tactics. As a result, they pursue an
argument via the well-worn tactics of attacking where the enemy is weak and
retreating where the enemy is strong. If they can't win a particular point,
they'll shift the topic to something that they think the opponent might be
less strong on. They'll continue stringing their opponents on a line of
topics until they can find one that the opponent isn't as well versed on,
and then stand on it like a bastion of safety, insisting that it's the only
valid perspective from which to view the concept. If they can't find a weak
point, they'll circle back around to the original topic hoping for a second
try or resort to standard logic errors like ad hominem attacks or faulty
analogies.
Although the design of the tool addresses many other issues (like ballot box
stuffing and squeaky wheel effects), these should be adequate to understand
the reasoning behind the basic structure I'm about to explain. As I go
along, I'm going to compare my design to existing online collaborative
tools, like wikis and forums.
In order to deal with a lot of the tactics of the hostile debater, I started
by removing the linear nature of wikis and forums. You can't lead a person
in circles if you're glued to the spot. With this in mind, the base unit of
this tool is a conjecture, something like "alcoholism is a disease". Each
person may (not must) make one statement about the conjecture. They can
change the statement any time that they like, but that one statement must be
a summation of their entire opinion on that conjecture. Then everyone gets
to vote on the statement that best matches their personal opinion. If none
of them match closely enough, they can make their own statement.
Statements are ranked based on popularity. Additionally, the writer of the
statement indicates the bias of their statement. A bias states that the
conjecture is:
1. factual (based on repeatable phenomena)
2. true (not based on repeatable phenomena, but enough evidence exists)
3. unproven (enough evidence does not exist one way or the other)
4. unprovable (the conjecture requires evidence that is not obtainable)
5. unsupported (the evidence suggests that the conjecture is not true)
6. false (repeatable phenomena disproves the conjecture conclusively)
For the purposes of determining the validity of a conjecture, all statements
with 1 & 2 add their votes together, all with 3 & 4 go together, and all
with 5 & 6 go together. This creates a distinct identification of the
participant's current consensus on the matter.
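To make the vote-grouping rule concrete, here is a small Python sketch of
it. The data layout and names are my own invention for illustration, not
part of any existing tool: biases 1-2, 3-4, and 5-6 pool their votes into
three consensus buckets.
```python
# Bias codes 1..6, as defined above.
GROUPS = {
    1: "supported", 2: "supported",        # factual, true
    3: "undetermined", 4: "undetermined",  # unproven, unprovable
    5: "rejected", 6: "rejected",          # unsupported, false
}

def tally(statements):
    """statements: list of (bias_code, vote_count) pairs.
    Returns total votes per consensus bucket."""
    totals = {"supported": 0, "undetermined": 0, "rejected": 0}
    for bias, votes in statements:
        totals[GROUPS[bias]] += votes
    return totals
```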
Since people need a place to ask questions and discuss ideas, a standard
message list should be attached to the conjecture, but it is strongly
suggested that all messages on the list expire and vanish after 30
days or so, to encourage the participants to embody their ideas in their
statements, not in their messages.
There's a further aspect of this. Every conjecture debated tends to result
in child conjectures, for instance "a disease is anything which affects the
wellness of an individual". These become their own conjectures, with their
own statements and (importantly) their own message lists. Each is voted on
to determine its individual validity, and it gets linked to the parent
conjecture. Participants in the parent conjecture can then rate the
relevance of the child conjecture to the parent conjecture, and take into
account the most relevant child conjectures when voting on a statement.
Taking this a step further, conjectures can all be reused. For
instance, a conjecture like "the will of god is unknowable" could be used
again and again, being attached to a very wide range of parent conjectures
without having to re-create it and re-argue it every time.
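A minimal Python sketch of the data model I'm describing (class and field
names are purely illustrative, not from an existing implementation): each
conjecture carries its own statements, and the same conjecture object can be
linked under many parents, each link carrying a relevance score.
```python
class Conjecture:
    """One debatable proposition, reusable under many parents."""
    def __init__(self, text):
        self.text = text
        self.statements = {}   # author -> their one statement
        self.children = []     # list of (child Conjecture, relevance 0..1)

    def link_child(self, child, relevance):
        # The same child object may be linked under several parents,
        # so it never has to be re-created or re-argued.
        self.children.append((child, relevance))

    def most_relevant(self, n=3):
        # Child conjectures ordered by rated relevance, highest first.
        ranked = sorted(self.children, key=lambda cr: cr[1], reverse=True)
        return [child for child, _ in ranked[:n]]
```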
The final result would be that, for each conjecture, all of the reasoning
behind the current decision would be laid out in a readily examinable
format, ordered by relevance. This makes things much, much easier for the
casual arguer. The modular format also makes it extremely easy to slap a
"logic foul" conjecture on anyone who presents falacious arguments. The
non-linear format totally wrecks topic-shifting tactics, and the voting
system indicates not just how people feel about something, but how firmly
they feel about it.
I think that'll do it for an introduction. If this is interesting to you,
please let me know and I can provide you with more details.
Yours,
Robert Rapplean
Aim
===
To see if it's possible to execute arbitrary client-side JavaScript,
using MediaWiki as the delivery system.
Background
==========
To execute arbitrary JavaScript, we mostly need to find a way to get
MediaWiki to allow us to open a tag ("<script"), and to close a tag
(">"). In particular, we need some form of wiki input text that will
produce this as the rendered HTML output.
The MediaWiki parser seems to try to prevent both of these things, by
escaping "<" and ">" characters. For example, if you give it wiki
input like this:
---------------------------------------------------------
<"hello world">
---------------------------------------------------------
Then you get back this HTML:
---------------------------------------------------------
<"hello world">
---------------------------------------------------------
However, the parser is not perfect. There are some inputs that will
give unescaped ">" or "<" back.
The trick is probably to combine these omissions together, in such a
way so as to produce a working exploit. I don't have such a thing yet,
but I suspect it might be possible.
Unescaped Closing tags
======================
Getting the parser to give unescaped closing tags is much easier than
finding unescaped opening tags.
For example, a wiki input of just this:
---------------------------------------------------------
>>>>>>>
---------------------------------------------------------
Will give this HTML output:
---------------------------------------------------------
>>>>>>>
---------------------------------------------------------
(i.e. no escaping).
However, if we try to open one or more tags beforehand, then that
changes. So this wiki input:
---------------------------------------------------------
<
>>>>>>>
---------------------------------------------------------
Gives this HTML output:
---------------------------------------------------------
&lt;
&gt;&gt;&gt;&gt;&gt;&gt;&gt;
---------------------------------------------------------
So in other words, we can do this (close a tag and provide some
JavaScript to be executed), provided we don't use the "<" character:
---------------------------------------------------------
onmouseover="alert(document.cookie)">test
---------------------------------------------------------
... and we will get that literal text back in the HTML.
So to sum up: Any time after we use a "<" character, we lose this
privilege of having unescaped ">" characters.
To me, this feels like a mistake, because it gives an attacker an
opening that they probably don't need to be given (i.e. it's a free
kick).
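To restate the observed rule in executable form, here is a toy Python model
of the behaviour described above. This is my reconstruction of what the
parser appears to do, not MediaWiki's actual code: "<" is always escaped,
and ">" is escaped only once a "<" has appeared earlier in the input.
---------------------------------------------------------
```python
def model_escape(wiki_text):
    """Toy model of the observed MediaWiki escaping: '<' is always
    escaped; '>' is emitted raw until the first '<' is seen."""
    out = []
    seen_lt = False
    for ch in wiki_text:
        if ch == "<":
            out.append("&lt;")
            seen_lt = True
        elif ch == ">" and seen_lt:
            out.append("&gt;")
        else:
            out.append(ch)
    return "".join(out)
```
---------------------------------------------------------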
Unescaped Opening tags
======================
Almost all uses of "<" in the wiki input will result in "<" in the
HTML output.
However there are some uses that do not. In particular, I've found
that table properties are very weakly restricted, and we can get the
Parser to produce unescaped "<" characters with each of the following
3 inputs:
---------------------------------------------------------
{| WIDTH=[[image:ftp://~
{| ALIGN='''~~~</math>
{| BGCOLOR=<span style="font-weight: bold;">
---------------------------------------------------------
Which will give this HTML output:
---------------------------------------------------------
<table width="[[image:<a" class="external free" title="ftp://~">
<table align="<b><!--LINK">
<table bgcolor="<span">
---------------------------------------------------------
Some observations / problems here:
1) The unescaped "<" characters are in attribute strings. We need
somehow to avoid that, or break out of that, if the browser is to obey
them.
2) The type of the tags is limited ("<a>", "<!--" and "<span>" tags in
the above examples).
3) The final two examples that use "<" will mean that we cannot close
the tag (because, as described above, by using the "<" character we
lose the privilege of having close tags).
However we can avoid problem 3) with this, which never uses a "<" character:
---------------------------------------------------------
{| WIDTH=[[image:ftp://~
onmouseover="alert(document.cookie)">test
---------------------------------------------------------
Which will give this HTML output:
---------------------------------------------------------
<table width="[[image:<a" class="external free" title="ftp://~">
onmouseover="alert(document.cookie)">test
---------------------------------------------------------
... however this still has problems 1) and 2) described above.
Problem 2) may not necessarily be a showstopper (e.g. HTML like: '<a
href="#" onmouseover="alert(document.cookie);">Free Porn!</a>' is not
as powerful as something that successfully uses "onLoad", but it can
be expected to work a reasonable percentage of the time).
On the other hand, problem 1) is currently a huge restriction.
Conclusions
===========
* If anyone knows of a way of overcoming problems 1) and 2), or of an
alternate method, then please let me know. By combining that
information with the information above, it may well be possible to
create a working Proof-Of-Concept.
* Why does MediaWiki ever allow unescaped ">" characters? This
behaviour seems to increase the chances of a JavaScript security
problem.
All the best,
Nick.
Hello,
Does MediaWiki allow me to automatically place certain text in advance on a
non-existing page? What I am trying to do is this: when users click on a
picture, a (link to a) new page is generated with the name
"{{PAGENAME}}-more-detailed-name".
Is it possible to automatically put some text (e.g. "{{template}}")
on this new (non-existing) page?
greetings,
JAN
Hi, buddies. The Chinese Wikimedia users need some technical help from
the developers at the Foundation.
As everyone here believes, the government of P.R. China has blocked
access to all Wikimedia sites using its "Great Firewall"; Wikimedia
contributors and users in Mainland China have to use proxies to visit
Wikimedia sites. However, most of these proxies are unstable, so
CNBlog.org (a famous and prominent advocate of the blogosphere in China,
operated by the Social Brian Foundation, which hosted the First Chinese
Blogger Conference last year) has set up a stable proxy for Chinese
users.
With this service, Mainland Chinese users can access Chinese Wikipedia
using http://wikipedia.cnblog.org, and Chinese Wikinews using
http://wikinews.cnblog.org. There are over 16,000 visits and over
110,000 page requests per day this month. CNBlog.org has been very
helpful to Chinese users, and it is planning to expand its proxy
service to cover all Chinese Wikimedia projects, which is really great
news.
Unfortunately, Chinese Wikimedia administrators are troubled by an
issue that came up recently: lots of vandalism is done using the
CNBlog proxy. Currently, the IP of the CNBlog proxy is displayed and
logged in edit history on Wikimedia. If an administrator blocks this
IP (CNBlog proxy), all Mainland Chinese users who are using CNBlog
proxy, either logged-in or not, will be blocked as well and not be
able to contribute.
Some Chinese Wikipedians discussed this issue with CNBlog.org, and we
believe that a technical solution should be feasible. We also
discussed some technical details, and I will send another e-mail to
wikitech-l regarding the details. Basically, we would like to have
users' real IP (IP used to access CNBlog proxy) displayed and logged
at Wikimedia.
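To illustrate the kind of solution we have in mind, here is a small Python
sketch of standard trusted-proxy handling of the X-Forwarded-For header. The
address and names below are purely illustrative; the actual mechanism would
need to be agreed with CNBlog.org and implemented server-side.
```python
TRUSTED_PROXIES = {"203.0.113.7"}  # illustrative address for the CNBlog proxy

def effective_ip(remote_addr, x_forwarded_for):
    """Return the IP to display/log: if the TCP peer is a trusted
    proxy and it supplied an X-Forwarded-For header, trust the
    last address in that header; otherwise use the peer address."""
    if remote_addr in TRUSTED_PROXIES and x_forwarded_for:
        # XFF is a comma-separated chain; the last entry was added
        # by the proxy nearest to us, i.e. the one we trust.
        return x_forwarded_for.split(",")[-1].strip()
    return remote_addr
```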
We send this e-mail on behalf of many Chinese users. We hope
for the kind attention and help of the Foundation. Thank you very
much.
[[User:Shizhao]]
[[User:R.O.C]]
[[User:Yongxinge]]
[[User:Mountain]]
I'm looking into writing a script to (login and) edit MW page
content, via HTTP. Is there a sample script that anyone can
recommend? Alternatively, is there any documentation on how MW
handles login sessions (eg, what cookies it sets), etc?
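Not an existing sample, but a rough sketch of the form data involved:
MediaWiki's login form posts wpName/wpPassword to Special:Userlogin and
hands back session cookies that later edit posts must carry. The helpers
below only build the form payloads; the field names are assumed from the
default login and edit forms and should be checked against your install,
and the HTTP plumbing is left to whatever client library you prefer.
```python
def login_payload(username, password):
    """Form fields posted to
    index.php?title=Special:Userlogin&action=submitlogin
    (field names assumed; verify against your wiki's login form)."""
    return {"wpName": username,
            "wpPassword": password,
            "wpLoginattempt": "Log in"}

def edit_payload(text, edit_time, edit_token, summary=""):
    """Form fields posted to index.php?title=PAGE&action=submit.
    wpEdittime and wpEditToken must first be scraped from the edit
    form; MediaWiki rejects edits without a matching token."""
    return {"wpTextbox1": text,
            "wpEdittime": edit_time,
            "wpEditToken": edit_token,
            "wpSummary": summary,
            "wpSave": "Save page"}
```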
-r
--
http://www.cfcl.com/rdm Rich Morin
http://www.cfcl.com/rdm/resume rdm(a)cfcl.com
http://www.cfcl.com/rdm/weblog +1 650-873-7841
Technical editing and writing, programming, and web development
Thanks, Rich. I've looked at IBIS, and it also takes the divide-and-conquer
approach to problem solving, which I think is one of this tool's greatest
advantages. Being able to divide an issue into subissues allows people to
identify what they're really disagreeing about instead of arguing over
things that are really differences in definition.
Unfortunately, IBIS is a managed system, which is to say that there is no
automated method for identifying when a particular conjecture has enough
evidence for most of the participants to accept or disclaim its truth. That
process seems to be done via an occasional vote, kind of like in
parliamentary procedure.
Since it lacks "degree of relevance" measures, it fails the casual arguer in
identifying the most important sub-points behind a conjecture. Also, the
lack of a "one person, one statement" structure means that all discussions
occur in a linear nature, which falls prey to the hostile arguer.
In short, while IBIS has an excellent structure, it lacks the specific
implementation details that I've designed into this tool that tailor it for
use in the massively diverse, unmoderated, ad-hoc environment that Wikipedia
exists in.
On 3/31/06, Rich Morin <rdm(a)cfcl.com> wrote:
>
> IBIS (Issue-Based Information System) can be used to
> structure discussions and cut down on repetition, while
> allowing everyone to have their say.
>
> http://www3.iath.virginia.edu/elab/hfl0104.html
> http://www-iurd.ced.berkeley.edu/pub/WP-131.pdf
>
> -r
> --
> http://www.cfcl.com/rdm Rich Morin
> http://www.cfcl.com/rdm/resume rdm(a)cfcl.com
> http://www.cfcl.com/rdm/weblog +1 650-873-7841
>
> Technical editing and writing, programming, and web development
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l(a)wikimedia.org
> http://mail.wikipedia.org/mailman/listinfo/wikitech-l
>
Brion Vibber wrote:
> Andrew Gray wrote:
>
>> Is that "keep recording but ignore them", or disable in the sense of
>> turn off logging totally? Just curious...
>>
>
> After a few months of having logs that you're not reading fill up the servers'
> hard disks every few days, you turn them off. :)
>
> -- brion vibber (brion @ pobox.com)
>
>
How about a cron job that turns logging on, then off, intermittently?
For example, on each server, have a cron job that does this:
Every 5 mins:
Is logging on?
Then: turn it off
Else: generate a random number
If it's == 0 mod 1000:
Then: turn logging on
Else: do nothing
This way, you get representative short blocks of 5 minutes of traffic,
kicking in once every three days or so on each of the 100 or so servers
at random times of the day or night. This would also suffice for gross
statistical analysis, and wouldn't require any modification of the squid
code, just a short external shell script.
Log-rotation should handle the rest and prevent the disks filling up,
since the average sampling rate would then be low enough to cope with.
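A minimal sketch of the decision logic, written so the policy can be tested
in isolation. The "turn_on"/"turn_off" actions it emits are placeholders
for whatever commands actually toggle squid logging on your servers; only
the decide function itself is concrete.
```shell
#!/bin/sh
# Run from cron every 5 minutes. First argument is the current
# logging state ("on"/"off"); second is a random number (e.g. $RANDOM).
decide() {
    state="$1"
    rand="$2"
    if [ "$state" = "on" ]; then
        # Always stop after one 5-minute sample window.
        echo "turn_off"
    elif [ $((rand % 1000)) -eq 0 ]; then
        # ~1 interval in 1000: start a sample window.
        echo "turn_on"
    else
        echo "noop"
    fi
}
```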
-- Neil
On 3/24/06, Gabriel Wicke <gabrielwicke(a)users.sourceforge.net> wrote:
> Update of /cvsroot/wikipedia/phase3/includes
> In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv12319/includes
>
> Modified Files:
> Parser.php
> Log Message:
> Provide some cleanup if tidy is disabled:
>
> * fix invalid nesting of anchors and i/b
> * remove empty i/b tags
> * remove divs inside anchors
>
> Fixes several test cases
>
>
> Index: Parser.php
> ===================================================================
> RCS file: /cvsroot/wikipedia/phase3/includes/Parser.php,v
> retrieving revision 1.602
> retrieving revision 1.603
> diff -u -d -r1.602 -r1.603
> --- Parser.php 22 Mar 2006 04:57:14 -0000 1.602
> +++ Parser.php 24 Mar 2006 16:36:29 -0000 1.603
> @@ -250,6 +250,32 @@
>
> if (($wgUseTidy and $this->mOptions->mTidy) or $wgAlwaysUseTidy) {
> $text = Parser::tidy($text);
> + } else {
> + # attempt to sanitize at least some nesting problems
> + # (bug #2702 and quite a few others)
> + $tidyregs = array(
> + # ''Something [http://www.cool.com cool''] -->
> + # <i>Something</i><a href="http://www.cool.com"..><i>cool></i></a>
> + '/(<([bi])>)(<([bi])>)?([^<]*)(<\/?a[^<]*>)([^<]*)(<\/\\4>)?(<\/\\2>)/' =>
> + '\\1\\3\\5\\8\\9\\6\\1\\3\\7\\8\\9',
> + # fix up an anchor inside another anchor, only
> + # at least for a single single nested link (bug 3695)
> + '/(<a[^>]+>)([^<]*)(<a[^>]+>[^<]*)<\/a>(.*)<\/a>/' =>
> + '\\1\\2</a>\\3</a>\\1\\4</a>',
> + # fix div inside inline elements- doBlockLevels won't wrap a line which
> + # contains a div, so fix it up here; replace
> + # div with escaped text
> + '/(<([aib]) [^>]+>)([^<]*)(<div([^>]*)>)(.*)(<\/div>)([^<]*)(<\/\\2>)/' =>
> + '\\1\\3<div\\5>\\6</div>\\8\\9',
> + # remove empty italic or bold tag pairs, some
> + # introduced by rules above
> + '/<([bi])><\/\\1>/' => ''
> + );
> +
> + $text = preg_replace(
> + array_keys( $tidyregs ),
> + array_values( $tidyregs ),
> + $text );
> }
>
> wfRunHooks( 'ParserAfterTidy', array( &$this, &$text ) );
This fixes the "Bug 2702: Mismatched <i>, <b> and <a> tags are
invalid" test case, but it's not really an improvement. The test case
was supposed to demonstrate that we don't balance tags, and this
doesn't fix that; it merely hacks around very specific cases with
regular expressions, which will fail as soon as you insert more tags
than a parser that balanced tags properly would handle.
I'm all for fixing the parser, but making an existing parser test case
pass by writing a hack targeted at just that test, rather than fixing
the core issue, is not an improvement.
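For comparison, the general fix being argued for is stack-based tag
balancing rather than per-case regexes. Here is a toy Python sketch of that
idea (illustrative only, nothing like production parser code): it closes
tags left open at end of input and drops stray close tags, for a small
whitelist of tag names.
```python
import re

def balance_tags(html, balanced=("b", "i", "a")):
    """Toy stack-based balancer for the given tag names. A real
    parser must also handle nesting rules, attributes containing
    '>', comments, self-closing tags, etc."""
    out, stack = [], []
    for token in re.split(r"(</?\w+[^>]*>)", html):
        m = re.match(r"<(/?)(\w+)", token)
        if m and m.group(2) in balanced:
            if m.group(1):                    # a close tag
                if m.group(2) in stack:
                    # Close everything opened since, then this tag.
                    while stack[-1] != m.group(2):
                        out.append("</%s>" % stack.pop())
                    stack.pop()
                    out.append(token)
                # else: stray close tag with no open tag -- drop it.
            else:                             # an open tag
                stack.append(m.group(2))
                out.append(token)
        else:
            out.append(token)
    # Close anything still open at end of input.
    while stack:
        out.append("</%s>" % stack.pop())
    return "".join(out)
```
This handles whole classes of mismatches (including the bug 2702 shape)
instead of one regex per reported case.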