Hi everyone,
I recently set up a MediaWiki (http://server.bluewatersys.com/w90n740/)
and I need to extract the content from it and convert it into LaTeX
syntax for printed documentation. I have googled for a suitable OSS
solution but nothing apparent turned up.
I would prefer a script written in Python, but any recommendations
would be very welcome.
Do you know of anything suitable?
Kind Regards,
Hugo Vincent,
Bluewater Systems.
Hi,
today we went over 10k HTTP requests per second (even with inter-squid
traffic eliminated). Special thanks to Mark and Tim, who've been
improving our caching, as well as doing lots of other work, and
achieved incredible results (while I was slacking). Really, thanks!
Domas
Hi All,
I just joined the OTRS team a short time ago, but I have noticed a chronic
problem with the Arabic OTRS system: most of the messages in Arabic reach us
encoded as Windows-1256 rather than UTF-8 (probably because MS Outlook, the
prevalent client in Arab-speaking countries, encodes them that way). To read
them we have to switch to plain view and then change the encoding in the
browser, which is only a minor inconvenience. The big problem is that when we
send a response back through the system, users receive it as gibberish (I am
guessing it goes out UTF-8 encoded). In the short time I have been on OTRS, I
have seen a lot of users complain that we are sending them unintelligible
responses. Is there a solution for this? How does OTRS handle encoding in
general? Do volunteers for other languages have the same problem?
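(The conversion itself is the easy part. As a one-line illustration in PHP,
assuming the charset is read from the incoming message's Content-Type header
(OTRS itself is Perl, so this is only to show what has to happen to each
body):

    $body = iconv( 'windows-1256', 'UTF-8', $rawBody );

The real question is whether OTRS can be configured to do that on incoming
mail, and to declare the right charset on outgoing mail.)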
--
Best Regards,
Muhammad Alsebaey
Hi,
I am trying to find a way for users who register or log in to MediaWiki to
also be registered and logged in to another application's user database, so
that they are spared a second registration. I have found many extensions
that do it the other way round (from another application automatically into
MW), but not for this case. Does anyone have a suggestion on how to go about
doing this?
The second application is a php based web app with its own, very
simple security model. It just needs username, password and email
address.
Some use cases:
#1
1. New user fills in registration page in MW
2. a) MW registers user in MW database
2. b) MW registers user in second, external (but local) database
3. User is logged into MW and logged into external application
#2
1. Existing user logs into MW
2. MW automatically logs user into other application
#3
1. User logs out of MW
2. MW logs out user from other application
#4
1. User changes password in MW
2. MW updates password in other database
(there could be a variation of this use case if users use 'forgot
password' and similar)
Thanks,
Andi
Hi,
I have a question that concerns access to my wikipedia user accounts. I have
user accounts at the English, Slovenian, German and Spanish wikipedias under
the username "Jalen" ("Jalen1" on the German wikipedia). My access to these
accounts has been blocked due to weak passwords. I had my e-mail address
provided on the Spanish wikipedia but not on the other three wikipedias, hence
I am unable to restore access to the accounts by myself.
My e-mail address is scythus at volja.net. I am also subscribed to the
wikitech-l list under the same username ("Jalen") and the same e-mail address
("scythus(a)volja.net").
Would it be legitimate to request that my access to the user accounts on en:,
de: and sl: wikis be restored?
A bureaucrat at my native (sl:) wiki can confirm my identity since he has
previously communicated with me through the above-mentioned e-mail address (he
was the first person I contacted for troubleshooting, but since bureaucrats can
not restore access to user accounts I was told to contact developers) and has
also seen my IP address which is 84.52.134.168.
To further prove my identity, I have saved the confirmation mail I received on
opening my account on the Spanish wikipedia, where the original IP address is
stated.
I would also be satisfied if only the user account on my native (sl:) wiki could
be restored since, as I have said, a bureaucrat there knows me and can confirm
that the username ("Jalen"), the e-mail address ("scythus(a)volja.net") and the IP
address all belong to one and the same person.
I would greatly appreciate any response or assistance.
Regards,
Jalen
I see that the latest dump of the English Wikipedia failed (I mean, the dump
of all the page histories).
As part of some other work I am doing, I have efficient code that can "take
apart" a dump into its single component pages, and out of that, it would be
possible to fashion code that "stitches together" various partial dumps.
This would make it possible to break up a single dump process into multiple,
shorter processes, each of which dumps only one month's worth, or one week's
worth, of revisions to the English wikipedia.
Breaking up the dump process would increase the probability that each of the
smaller dumps succeeds.
For instance, one could produce all the partial dumps, then launch the
stitching process, which would merge them into a single dump, removing
duplicate revisions.
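Conceptually, the stitching step is just a de-duplicating merge keyed on
revision id. A toy sketch in PHP (the $fragments structure, an array of
rev_id => revision text per partial dump, is of course a gross
simplification of reading the real XML):

function stitch_page_revisions( array $fragments ) {
    $merged = array();
    foreach ( $fragments as $fragment ) {
        foreach ( $fragment as $revId => $revText ) {
            if ( !isset( $merged[$revId] ) ) {
                // The same revision id appearing in two partial dumps
                // means a duplicate revision: keep only one copy.
                $merged[$revId] = $revText;
            }
        }
    }
    ksort( $merged ); // restore chronological (rev_id) order
    return $merged;
}

A real tool would stream the dump XML instead of holding all revisions in
memory, but the merge logic would be the same.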
At UCSC, where I work, there are various Master's students looking for
projects... and some may be interested in doing work that is concretely
useful to Wikipedia. Should I try to get them interested in writing a
proper dump stitching tool, and some code to do partial dumps?
Can Brion or Tim give us more detail on why the dumps are failing? Are they
already doing partial dumps? Is there already a dump stitching tool? Is
there anything that could be done to help the process? I could help by
looking for database students in search of a project and giving them my code
as a starting point...
Best,
Luca
An automated run of parserTests.php showed the following failures:
This is MediaWiki version 1.12alpha (r26246).
Reading tests from "maintenance/parserTests.txt"...
Reading tests from "extensions/Cite/citeParserTests.txt"...
Reading tests from "extensions/Poem/poemParserTests.txt"...
Reading tests from "extensions/LabeledSectionTransclusion/lstParserTests.txt"...
17 still FAILING test(s) :(
* URL-encoding in URL functions (single parameter) [Has never passed]
* URL-encoding in URL functions (multiple parameters) [Has never passed]
* Table security: embedded pipes (http://lists.wikimedia.org/mailman/htdig/wikitech-l/2006-April/022293.html) [Has never passed]
* Link containing double-single-quotes '' (bug 4598) [Has never passed]
* message transform: <noinclude> in transcluded template (bug 4926) [Has never passed]
* message transform: <onlyinclude> in transcluded template (bug 4926) [Has never passed]
* BUG 1887, part 2: A <math> with a thumbnail- math enabled [Has never passed]
* HTML bullet list, unclosed tags (bug 5497) [Has never passed]
* HTML ordered list, unclosed tags (bug 5497) [Has never passed]
* HTML nested bullet list, open tags (bug 5497) [Has never passed]
* HTML nested ordered list, open tags (bug 5497) [Has never passed]
* Inline HTML vs wiki block nesting [Has never passed]
* Mixing markup for italics and bold [Has never passed]
* dt/dd/dl test [Has never passed]
* Images with the "|" character in the comment [Has never passed]
* Parents of subpages, two levels up, without trailing slash or name. [Has never passed]
* Parents of subpages, two levels up, with lots of extra trailing slashes. [Has never passed]
Passed 527 of 544 tests (96.88%)... 17 tests failed!
I'm working on a (currently read-only) Subversion interface to
MediaWiki: http://www.mediawiki.org/wiki/WebDAV
It's implemented in PHP and lets me check out wiki pages using a
Subversion client, or as Subversion externals:
http://svnbook.red-bean.com/en/1.4/svn.advanced.externals.html
I hope I'll eventually succeed in using this interface to edit pages
offline, using Emacs version control mode, or Subclipse.
Today I'm stuck on an SQL issue. To implement the Subversion
update-report, I need a list of pages which changed since revision X,
and whether those pages have any revisions before X (whether those pages
are "new").
The first half of this query (the list of changed pages) was
straightforward. $entryCondition corresponds to revision.rev_id > X, but is
actually a conversion of the Subversion client's claims about its
current entry states into an SQL condition:
$where = array();
$where[] = 'page_id = revision.rev_page';
if ( !empty( $entryCondition ) ) {
    $where[] = $entryCondition;
}
$options = array();
$options['GROUP BY'] = 'page_id';
$results = $dbr->select(
    array( 'page', 'revision' ),
    array( 'page_title', 'MAX(revision.rev_id)' ),
    $where, null, $options );
The second half of this query (whether pages are "new") has me stuck.
1) I considered building an array of pages with revisions before X; if a
page id is in the array, it's not "new".
The interface is used to update from revision X, where X is often
close to the overall max rev_id (HEAD). Because in MediaWiki the list
of changed pages is always shorter than or equal to HEAD - X, and
because the list of pages with revisions before X may be huge, the
array may be huge relative to the number of pages the update-report
actually handles. So I rejected this approach.
2) I considered first getting the list of pages which changed since
revision X, then building an array of pages with revisions before X,
limited to the list of changed pages using a "page_id IN ( list of
changed pages )" SQL condition. This limits the array to only the
pages the update-report actually handles.
However, if this is an initial checkout, the list of changed pages
may be all wiki pages. In this case the "page_id IN ( list of changed
pages )" SQL condition will be huge. So I rejected this approach.
Finally, I think what I need is something like a LEFT JOIN from
revisions since X to revisions before X ON equal page ids. I can then
check for NULL rows in the second table, corresponding to "new" pages.
1) My first problem is performing this query with MediaWiki's database
layer. t1 is a row for each page changed since X, t2 is a row for
each page with revisions before X and NULL rows for pages without:
$where = array();
$where[] = 'page_id = t1.rev_page';
if ( !empty( $entryCondition ) ) {
    $where[] = $entryCondition;
}
$options = array();
$options['GROUP BY'] = 'page_id';
$results = $dbr->select(
    array( 'page', 'revision AS t1 LEFT JOIN revision AS t2 ON t1.rev_page = t2.rev_page AND t2.rev_id < t1.rev_id' ),
    array( 'page_title', 'MAX(t1.rev_id)', 't2.rev_id' ),
    $where, null, $options );
The expected SQL is something like:
SELECT page_title, t1.rev_id, t2.rev_id FROM page, revision AS t1
LEFT JOIN revision AS t2 ON t1.rev_page = t2.rev_page AND t2.rev_id
< t1.rev_id WHERE page_id = t1.rev_page AND t1.rev_id > 18 GROUP BY
page_id;
However I actually get:
SELECT page_title,MAX(t1.rev_id),t2.rev_id FROM `page`,`revision
AS t1 LEFT JOIN revision AS t2 ON t1.rev_page = t2.rev_page AND
t2.rev_id < t1.rev_id` WHERE (page_id = t1.rev_page) AND
(t1.rev_id > 18) GROUP BY page_id
I'm sure the backticks are a problem, but I am not yet fully
conversant with MediaWiki's database layer, so I don't know the "right"
way to fix them. Suggestions?
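One fallback I've been considering (just a sketch, probably not the "proper"
way to use the abstraction layer) is to build the SQL by hand with
tableName() and send it through query(), so the join expression is never
treated as a single table name:

// Sketch only: hand-built SQL so the table expression does not get
// back-tick quoted as one table name.
$page = $dbr->tableName( 'page' );
$revision = $dbr->tableName( 'revision' );
$sql = "SELECT page_title, MAX(t1.rev_id) AS new_rev, t2.rev_id AS old_rev" .
    " FROM $page, $revision AS t1" .
    " LEFT JOIN $revision AS t2" .
    " ON t1.rev_page = t2.rev_page AND t2.rev_id < t1.rev_id" .
    " WHERE page_id = t1.rev_page AND " . $entryCondition .
    " GROUP BY page_id";
$results = $dbr->query( $sql, __METHOD__ );

I'd prefer to stay within select() if there is a supported way to express
the join, though.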
2) My second problem is the SQL query itself. It appears to work;
however, I suspect there's a problem in the "ON" clause. Because I
GROUP BY page_id, t1.rev_id is _a_ revision id greater than X, but
not necessarily the _minimum_ revision id greater than X.
I tried putting "t2.rev_id < MIN(t1.rev_id)" in the "ON" clause, but
MySQL complained: Invalid use of group function
I haven't simply put "NOT $entryCondition" in the "ON" clause
because, though in these examples it corresponds to "NOT t2.rev_id >
18", it may actually be a far more complicated condition.
Can anyone suggest changes to or provide feedback on this SQL query?
Much thanks, Jack
I'm trying to use mwdumper to insert the English Wikipedia enwiki
database into MySQL (enwiki-20070908-pages-articles.xml), but the SSH
connection seems to timeout/disconnect after about 890K rows (out of
about 10 million I believe) have been uploaded. How can I keep SSH from
disconnecting?
Is there some mwdumper command line option I can use, or some client or
server side SSH setting? (server is Linux with OpenSSH server)
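(If there's no better option, I suppose I can run the import inside a screen
session so it survives the disconnect, something along these lines with the
real user, password and database name filled in:

screen
java -jar mwdumper.jar --format=sql:1.5 enwiki-20070908-pages-articles.xml | mysql -u USER -pPASSWORD enwiki

and then detach with Ctrl-A d and reattach later with screen -r. But I'd
still like to know if there's a cleaner way.)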
Thanks,
Saqib
As an avid writer on the wiki, I am always frustrated by the current
system for REFs (et al.). They have little expressive power, break
easily, and make editing _extremely_ difficult in certain
circumstances. I consider this to be one of the biggest problems with
the current MediaWiki software; it costs me perhaps as much as 15%
wasted time on every article -- and a check over my contributions list
should let you calculate what sort of real-world time that represents!
The good news is that I think all of these problems are fixable. For
argument's sake, here are some of the problems I'd like to
see fixed:
1) CITE tags are _extremely_ large. Since the REF system requires them
to be embedded in-line in the article body, they can make editing of
the articles very difficult. For instance, look at the article at:
http://en.wikipedia.org/wiki/Water_memory
Now click edit. Even trying to figure out what is part of the body, as
opposed to the REFs, can be very difficult. Of course one can mitigate
this problem, slightly, by removing the vertical white space, but that
doesn't _really_ help the issue as much as you would like, and has the
side-effect of making the CITEs themselves harder to edit.
2) REFs should be _represented_ as footnotes, but REFs are _not
footnotes_. Whoever built the REF system seems to have
forgotten this fact. Footnotes can be used for all sorts of different
purposes, but with the current REF system the two become synonymous. I
like to add notes about pronunciation and "trivial" links to other
articles using footnotes, but there's simply no way to do this with
the current system.
3) There's no way to reference different page numbers! This is a
_serious_ problem, because it means if you want to use different
portions of a single work, like a book, you have to put in a different
CITE for each one. In reality, people just don't bother.
4) I can't fold hand-edited refs into the REFLIST. For instance, let's
say I used a book for most of the body of an article, so I didn't
bother putting lots of REFs inline. I did, however, add a half dozen
different REFs inline to support specific facts. Now how do I make
the REFLIST look right? I can't! I end up with some numbered ones, and
some bulleted, and due to the default styles, they look different.
Uggg.
5) REF should not be picky about position. Right now if you want to
use the same REF in more than one place, you can use a named ref.
Generally the idea is good. However, this system demands that the body
of the named ref be placed in the very first place that ref is used.
This works great until you want to actually edit the article, at which
point it becomes terribly easy to break _all_ the references with
something as simple as a cut-n-paste.
Here are my suggested solutions:
1) named refs should work no matter where the body of the reference is
placed. That would immediately fix most of these problems. I could,
optionally, place only <ref name=x/> into the body, and move all of
the ref bodies to the ==References== section of the article. This
would even allow me to fold "non-inlines" into the references list,
using exactly the same mechanism.
2) there should be another, similar, tag for "real" footnotes. <note>
would be great. They would operate identically to (1).
3) named REFs could take another parameter, "page=". These would be
collected under a single entry in the references at the bottom, with each
cited page appearing as a lettered sub-reference.
How would I use this? Well, using the water memory article as an
example, I would do the following (a rough sketch of the resulting markup
is below):
1) move all the CITEs into the ==References== section, each wrapped in a
REF tag with a name.
2) place named ref placeholders in the body of the article; some of
these may optionally include page numbers.
3) surround some of the comments with <note> tags, and
optionally move them to a new ==Notes== section.
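Roughly, the markup might look like this (a pure mock-up of the proposal;
none of it works today, and the "Smith" book is just a placeholder
reference):

In the body:
  The effect could not be reproduced in a follow-up trial.<ref name=smith page=42 />
  Some further caveat.<note>A side note that is not a citation, e.g. a
  pronunciation hint or a "trivial" cross-link.</note>

At the bottom:
  ==Notes==
  <notes />

  ==References==
  <ref name=smith>Smith, J. ''An Example Book''. Example Press, 2001.</ref>
  <references />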
Is there anything technically impossible here?
Maury