Hi everyone,
I recently set up a MediaWiki (http://server.bluewatersys.com/w90n740/)
and I need to extract the content from it and convert it into LaTeX
syntax for printed documentation. I have googled for a suitable OSS
solution but nothing apparent turned up.
I would prefer a script written in Python, but any recommendations
would be very welcome.
Do you know of anything suitable?
Kind Regards,
Hugo Vincent,
Bluewater Systems.
Hi,
today we went over 10k HTTP requests per second (even with inter-squid
traffic eliminated). Special thanks to Mark and Tim, who've been
improving our caching, as well as doing lots of other work, and
achieved incredible results (while I was slacking). Really, thanks!
Domas
Hi All,
I just joined the OTRS team a short time ago, but I have noticed a chronic
problem with the Arabic OTRS system: most of the messages in Arabic reach us
encoded as Windows-1256 rather than UTF-8 (probably because MS Outlook, the
prevalent client in Arab-speaking countries, encodes them that way). To read
them we have to switch to plain view and then change the encoding in the
browser, which is only a minor inconvenience. The big problem is that when we
send a response back through the system, users receive it as gibberish (I am
guessing it goes out UTF-8 encoded). In the short time I have been on OTRS, I
have seen a lot of users complain that we are sending them unintelligible
responses. Is there a solution for this? How does OTRS handle encoding in
general? Do volunteers for other languages have the same problem?
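(The conversion itself is the easy part. As a one-line illustration in PHP,
assuming the charset is read from the incoming message's Content-Type header
(OTRS itself is Perl, so this is only to show what has to happen to each
body):

    $body = iconv( 'windows-1256', 'UTF-8', $rawBody );

The real question is whether OTRS can be configured to do that on incoming
mail, and to declare the right charset on outgoing mail.)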
--
Best Regards,
Muhammad Alsebaey
Hi,
I am trying to find a way for users who register or log in to MediaWiki to
also be registered and logged in to another application's user database, so
that they are spared a second registration. I have found many extensions
that do it the other way round (from another application automatically into
MW), but not for this case. Does anyone have a suggestion on how to go about
doing this?
The second application is a php based web app with its own, very
simple security model. It just needs username, password and email
address.
Some use cases:
#1
1. New user fills in registration page in MW
2. a) MW registers user in MW database
2. b) MW registers user in second, external (but local) database
3. User is logged into MW and logged into external application
#2
1. Existing user logs into MW
2. MW automatically logs user into other application
#3
1. User logs out of MW
2. MW logs out user from other application
#4
1. User changes password in MW
2. MW updates password in other database
(there could be a variation of this use case if users use 'forgot
password' and similar)
Thanks,
Andi
Hi,
I have a question that concerns access to my wikipedia user accounts. I have
user accounts at the English, Slovenian, German and Spanish wikipedias under
the username "Jalen" ("Jalen1" on the German wikipedia). My access to these
accounts has been blocked due to weak passwords. I had my e-mail address
provided on the Spanish wikipedia but not on the other three wikipedias, hence
I am unable to restore access to the accounts by myself.
My e-mail address is scythus at volja.net. I am also subscribed to the
wikitech-l list under the same username ("Jalen") and the same e-mail address
("scythus(a)volja.net").
Would it be legitimate to request that my access to the user accounts on en:,
de: and sl: wikis be restored?
A bureaucrat at my native (sl:) wiki can confirm my identity since he has
previously communicated with me through the above-mentioned e-mail address (he
was the first person I contacted for troubleshooting, but since bureaucrats can
not restore access to user accounts I was told to contact developers) and has
also seen my IP address which is 84.52.134.168.
To further prove my identity, I have saved the confirmation mail I received on
opening my account on the Spanish wikipedia, where the original IP address is
stated.
I would also be satisfied if only the user account on my native (sl:) wiki could
be restored since, as I have said, a bureaucrat there knows me and can confirm
that the username ("Jalen"), the e-mail address ("scythus(a)volja.net") and the IP
address all belong to one and the same person.
I would greatly appreciate any response or assistance.
Regards,
Jalen
I see that the latest dump of the English Wikipedia failed (I mean, the dump
of all the page histories).
As part of some other work I am doing, I have efficient code that can "take
apart" a dump into its single component pages, and out of that, it would be
possible to fashion code that "stitches together" various partial dumps.
This would make it possible to break up a single dump process into multiple,
shorter processes, each of which dumps only one month's worth, or one week's
worth, of revisions to the English wikipedia.
Breaking up the dump process would increase the probability that each of the
smaller dumps succeeds.
For instance, one could produce all the partial dumps, then launch the
stitching process, which would merge them into a single dump, removing
duplicate revisions.
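Conceptually, the stitching step is just a de-duplicating merge keyed on
revision id. A toy sketch in PHP (the $fragments structure, an array of
rev_id => revision text per partial dump, is of course a gross
simplification of reading the real XML):

function stitch_page_revisions( array $fragments ) {
    $merged = array();
    foreach ( $fragments as $fragment ) {
        foreach ( $fragment as $revId => $revText ) {
            if ( !isset( $merged[$revId] ) ) {
                // The same revision id appearing in two partial dumps
                // means a duplicate revision: keep only one copy.
                $merged[$revId] = $revText;
            }
        }
    }
    ksort( $merged ); // restore chronological (rev_id) order
    return $merged;
}

A real tool would stream the dump XML instead of holding all revisions in
memory, but the merge logic would be the same.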
At UCSC, where I work, there are various Master's students looking for
projects... and some may be interested in doing work that is concretely
useful to Wikipedia. Should I try to get them interested in writing a
proper dump stitching tool, and some code to do partial dumps?
Can Brion or Tim give us more detail on why the dumps are failing? Are they
already doing partial dumps? Is there already a dump stitching tool? Is
there anything that could be done to help the process? I could help by
looking for database students in search of a project and giving them my code
as a starting point...
Best,
Luca
An automated run of parserTests.php showed the following failures:
This is MediaWiki version 1.12alpha (r26246).
Reading tests from "maintenance/parserTests.txt"...
Reading tests from "extensions/Cite/citeParserTests.txt"...
Reading tests from "extensions/Poem/poemParserTests.txt"...
Reading tests from "extensions/LabeledSectionTransclusion/lstParserTests.txt"...
17 still FAILING test(s) :(
* URL-encoding in URL functions (single parameter) [Has never passed]
* URL-encoding in URL functions (multiple parameters) [Has never passed]
* Table security: embedded pipes (http://lists.wikimedia.org/mailman/htdig/wikitech-l/2006-April/022293.html) [Has never passed]
* Link containing double-single-quotes '' (bug 4598) [Has never passed]
* message transform: <noinclude> in transcluded template (bug 4926) [Has never passed]
* message transform: <onlyinclude> in transcluded template (bug 4926) [Has never passed]
* BUG 1887, part 2: A <math> with a thumbnail- math enabled [Has never passed]
* HTML bullet list, unclosed tags (bug 5497) [Has never passed]
* HTML ordered list, unclosed tags (bug 5497) [Has never passed]
* HTML nested bullet list, open tags (bug 5497) [Has never passed]
* HTML nested ordered list, open tags (bug 5497) [Has never passed]
* Inline HTML vs wiki block nesting [Has never passed]
* Mixing markup for italics and bold [Has never passed]
* dt/dd/dl test [Has never passed]
* Images with the "|" character in the comment [Has never passed]
* Parents of subpages, two levels up, without trailing slash or name. [Has never passed]
* Parents of subpages, two levels up, with lots of extra trailing slashes. [Has never passed]
Passed 527 of 544 tests (96.88%)... 17 tests failed!
I'm working on a (currently read-only) Subversion interface to
MediaWiki: http://www.mediawiki.org/wiki/WebDAV
It's implemented in PHP and lets me check out wiki pages using a
Subversion client, or as Subversion externals:
http://svnbook.red-bean.com/en/1.4/svn.advanced.externals.html
I hope I'll eventually succeed in using this interface to edit pages
offline, using Emacs version control mode, or Subclipse.
Today I'm stuck on an SQL issue. To implement the Subversion
update-report, I need a list of pages which changed since revision X,
and whether those pages have any revisions before X (whether those pages
are "new").
The first half of this query (the list of changed pages) was
straightforward. $entryCondition corresponds to revision.rev_id > X, but is
actually a conversion of the Subversion client's claims about its
current entry states into an SQL condition:
$where = array();
$where[] = 'page_id = revision.rev_page';
if ( !empty( $entryCondition ) ) {
    $where[] = $entryCondition;
}
$options = array();
$options['GROUP BY'] = 'page_id';
$results = $dbr->select(
    array( 'page', 'revision' ),
    array( 'page_title', 'MAX(revision.rev_id)' ),
    $where, null, $options );
The second half of this query (whether pages are "new") has me stuck.
1) I considered building an array of pages with revisions before X; if a
page id is in the array, it's not "new".
The interface is used to update from revision X, where X is often
close to the overall max rev_id (HEAD). Because in MediaWiki the list
of changed pages is always shorter than or equal to HEAD - X, and
because the list of pages with revisions before X may be huge, the
array may be huge relative to the number of pages the update-report
actually handles. So I rejected this approach.
2) I considered first getting the list of pages which changed since
revision X, then building an array of pages with revisions before X,
limited to the list of changed pages using a "page_id IN ( list of
changed pages )" SQL condition. This limits the array to only the
pages the update-report actually handles.
However, if this is an initial checkout, the list of changed pages
may be all wiki pages. In this case the "page_id IN ( list of changed
pages )" SQL condition will be huge. So I rejected this approach.
Finally, I think what I need is something like a LEFT JOIN from
revisions since X to revisions before X ON equal page ids. I can then
check for NULL rows in the second table, corresponding to "new" pages.
1) My first problem is performing this query with MediaWiki's database
layer. t1 is a row for each page changed since X, t2 is a row for
each page with revisions before X and NULL rows for pages without:
$where = array();
$where[] = 'page_id = t1.rev_page';
if ( !empty( $entryCondition ) ) {
    $where[] = $entryCondition;
}
$options = array();
$options['GROUP BY'] = 'page_id';
$results = $dbr->select(
    array( 'page', 'revision AS t1 LEFT JOIN revision AS t2 ON t1.rev_page = t2.rev_page AND t2.rev_id < t1.rev_id' ),
    array( 'page_title', 'MAX(t1.rev_id)', 't2.rev_id' ),
    $where, null, $options );
The expected SQL is something like:
SELECT page_title, t1.rev_id, t2.rev_id FROM page, revision AS t1
LEFT JOIN revision AS t2 ON t1.rev_page = t2.rev_page AND t2.rev_id
< t1.rev_id WHERE page_id = t1.rev_page AND t1.rev_id > 18 GROUP BY
page_id;
However I actually get:
SELECT page_title,MAX(t1.rev_id),t2.rev_id FROM `page`,`revision
AS t1 LEFT JOIN revision AS t2 ON t1.rev_page = t2.rev_page AND
t2.rev_id < t1.rev_id` WHERE (page_id = t1.rev_page) AND
(t1.rev_id > 18) GROUP BY page_id
I'm sure the backticks are a problem, but I am not yet fully
conversant with MediaWiki's database layer, so I don't know the "right"
way to fix them. Suggestions?
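One fallback I've been considering (just a sketch, probably not the "proper"
way to use the abstraction layer) is to build the SQL by hand with
tableName() and send it through query(), so the join expression is never
treated as a single table name:

// Sketch only: hand-built SQL so the table expression does not get
// back-tick quoted as one table name.
$page = $dbr->tableName( 'page' );
$revision = $dbr->tableName( 'revision' );
$sql = "SELECT page_title, MAX(t1.rev_id) AS new_rev, t2.rev_id AS old_rev" .
    " FROM $page, $revision AS t1" .
    " LEFT JOIN $revision AS t2" .
    " ON t1.rev_page = t2.rev_page AND t2.rev_id < t1.rev_id" .
    " WHERE page_id = t1.rev_page AND " . $entryCondition .
    " GROUP BY page_id";
$results = $dbr->query( $sql, __METHOD__ );

I'd prefer to stay within select() if there is a supported way to express
the join, though.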
2) My second problem is the SQL query itself. It appears to work;
however, I suspect there's a problem in the "ON" clause. Because I
GROUP BY page_id, t1.rev_id is _a_ revision id greater than X, but
not necessarily the _minimum_ revision id greater than X.
I tried putting "t2.rev_id < MIN(t1.rev_id)" in the "ON" clause, but
MySQL complained: Invalid use of group function
I haven't simply put "NOT $entryCondition" in the "ON" clause
because, though in these examples it corresponds to "NOT t2.rev_id >
18", it may actually be a far more complicated condition.
Can anyone suggest changes to or provide feedback on this SQL query?
Much thanks, Jack
I'm trying to use mwdumper to insert the English Wikipedia enwiki
database into MySQL (enwiki-20070908-pages-articles.xml), but the SSH
connection seems to timeout/disconnect after about 890K rows (out of
about 10 million I believe) have been uploaded. How can I keep SSH from
disconnecting?
Is there some mwdumper command line option I can use, or some client or
server side SSH setting? (server is Linux with OpenSSH server)
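(If there's no better option, I suppose I can run the import inside a screen
session so it survives the disconnect, something along these lines with the
real user, password and database name filled in:

screen
java -jar mwdumper.jar --format=sql:1.5 enwiki-20070908-pages-articles.xml | mysql -u USER -pPASSWORD enwiki

and then detach with Ctrl-A d and reattach later with screen -r. But I'd
still like to know if there's a cleaner way.)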
Thanks,
Saqib
As an avid writer on the wiki, I am always frustrated by the current
system for REFs (et al.). They have little expressive power, break
easily, and make editing _extremely_ difficult in certain
circumstances. I consider this to be one of the biggest problems with
the current MediaWiki software; it costs me perhaps as much as 15%
wasted time on every article -- and a check over my contributions list
should let you calculate what sort of real-world time that represents!
The good news is that I think all of these problems are fixable. For
argument's sake, here are some of the problems I'd like to
see fixed:
1) CITE tags are _extremely_ large. Since the REF system requires them
to be embedded in-line in the article body, they can make editing of
the articles very difficult. For instance, look at the article at:
http://en.wikipedia.org/wiki/Water_memory
Now click edit. Even trying to figure out what is part of the body, as
opposed to the REFs, can be very difficult. Of course one can mitigate
this problem, slightly, by removing the vertical white space, but that
doesn't _really_ help the issue as much as you would like, and has the
side-effect of making the CITEs themselves harder to edit.
2) REFs should be _represented_ as footnotes, but REFs are _not
footnotes_. Whoever built the REF system seems to have
forgotten this fact. Footnotes can be used for all sorts of different
purposes, but with the current REF system the two become synonymous. I
like to add notes about pronunciation and "trivial" links to other
articles using footnotes, but there's simply no way to do this with
the current system.
3) There's no way to reference different page numbers! This is a
_serious_ problem, because it means if you want to use different
portions of a single work, like a book, you have to put in a different
CITE for each one. In reality, people just don't bother.
4) I can't fold hand-edited refs into the REFLIST. For instance, let's
say I used a book for most of the body of an article, so I didn't
bother putting lots of REFs inline. I did, however, add a half dozen
different REFs inline to support specific facts. Now how do I make
the REFLIST look right? I can't! I end up with some numbered ones, and
some bulleted, and due to the default styles, they look different.
Uggg.
5) REF should not be picky about position. Right now if you want to
use the same REF in more than one place, you can use a named ref.
Generally the idea is good. However, this system demands that the body
of the named ref be placed in the very first place that ref is used.
This works great until you want to actually edit the article, at which
point it becomes terribly easy to break _all_ the references with
something as simple as a cut-n-paste.
Here are my suggested solutions:
1) named refs should work no matter where the body of the reference is
placed. That would immediately fix most of these problems. I could,
optionally, place only <ref name=x/> into the body, and move all of
the ref bodies to the ==References== section of the article. This
would even allow me to fold "non-inlines" into the references list,
using exactly the same mechanism.
2) there should be another, similar, tag for "real" footnotes. <note>
would be great. They would operate identically to (1).
3) named REFs could take another parameter, "page=". These would be
collected under a single entry in the references at the bottom, with each
cited page appearing as a lettered sub-reference.
How would I use this? Well, using the water memory article as an
example, I would do the following (a rough sketch of the resulting markup
is below):
1) move all the CITEs into the ==References== section, each wrapped in a
REF tag with a name.
2) place named ref placeholders in the body of the article; some of
these may optionally include page numbers.
3) surround some of the comments with <note> tags, and
optionally move them to a new ==Notes== section.
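Roughly, the markup might look like this (a pure mock-up of the proposal;
none of it works today, and the "Smith" book is just a placeholder
reference):

In the body:
  The effect could not be reproduced in a follow-up trial.<ref name=smith page=42 />
  Some further caveat.<note>A side note that is not a citation, e.g. a
  pronunciation hint or a "trivial" cross-link.</note>

At the bottom:
  ==Notes==
  <notes />

  ==References==
  <ref name=smith>Smith, J. ''An Example Book''. Example Press, 2001.</ref>
  <references />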
Is there anything technically impossible here?
Maury