Hi everyone,
I recently set up a MediaWiki (http://server.bluewatersys.com/w90n740/)
and I need to extract the content from it and convert it into LaTeX
syntax for printed documentation. I have googled for a suitable OSS
solution, but nothing turned up.
I would prefer a script written in Python, but any recommendations
would be very welcome.
Do you know of anything suitable?
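For what it's worth, a minimal Python sketch of the kind of converter being asked for might look like the following. This is only an illustration under stated assumptions: the regex rules cover just a handful of wikitext constructs, and the function name is invented here, not an existing tool; a real converter would need a proper parser for templates, tables, nesting, and LaTeX escaping.

```python
import re

# Minimal, illustrative wikitext-to-LaTeX rules. A real converter
# needs a full parser (templates, tables, nesting, escaping); these
# regexes are a sketch, not an existing tool. Order matters: bold
# before italics, subsections before sections.
RULES = [
    (re.compile(r"'''(.+?)'''"), r"\\textbf{\1}"),            # '''bold'''
    (re.compile(r"''(.+?)''"), r"\\emph{\1}"),                # ''italic''
    (re.compile(r"^=== (.+?) ===$", re.M), r"\\subsection{\1}"),
    (re.compile(r"^== (.+?) ==$", re.M), r"\\section{\1}"),
    (re.compile(r"\[\[(?:[^|\]]+\|)?([^\]]+)\]\]"), r"\1"),   # [[link|label]]
]

def wikitext_to_latex(text):
    for pattern, repl in RULES:
        text = pattern.sub(repl, text)
    return text
```

The raw wikitext itself can be fetched per page via index.php with action=raw, then fed through a converter like this.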
Kind Regards,
Hugo Vincent,
Bluewater Systems.
Playing the role of the average dumb user, it seems to me that
en.wikipedia.org is one of the slower websites of the many websites I
browse.
No matter which browser I use, it takes several seconds from the time I
click a link to the time the first bytes of the HTTP response start
flowing back to me.
Facebook seems zippier.
Maybe MediaWiki is not "optimized".
Hi all.
"Recent changes" shows bytes added/removed in green/red. But "View history"
only shows revision length in bytes, and "User contributions" shows no byte
counts at all.
I think it would be nice for both "View history"[1] and "User contributions" to
show bytes added/removed. This would make it easier to distinguish
small contributions from big ones: multiple-sentence additions from
small typo fixes.
What do you think?
All the best,
-Jason
^ [1]. You can already get bytes added/removed to history revisions using a
gadget. Just add the following line to your vector.js:
importScript('fr:MediaWiki:Gadget-HistoryNumDiff.js');
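For context, the green/red numbers are just the difference in recorded length between consecutive revisions, so this is cheap to compute. A sketch (the function name and input shape are illustrative, not MediaWiki's actual schema):

```python
def byte_deltas(revision_lengths):
    """Given revision lengths in bytes, oldest first, return the
    signed byte delta each revision introduced. The first revision
    is counted against an empty (zero-byte) page."""
    deltas = []
    previous = 0
    for length in revision_lengths:
        deltas.append(length - previous)
        previous = length
    return deltas
```

A positive delta would render green, a negative one red, exactly as "Recent changes" already does.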
Hi!
I've read on the techblog that the new UI goes live in April. I have
some questions:
1) What version? Acai, babaco, citron?
2) How/where could a wiki customize the special character insert menu,
and the inserted strings? Also, the embed file (picture) button inserts
this: "[[Example.jpg]]", without any "File:" or "Image:" prefix!
3) The search and replace button is available in firefox, but does not
appear at all in opera. Why?
4) Currently the new navigable TOC does not work on FF/Opera at all
(I've tried those).
Not too early for live deployment?
Regards,
Akos Szabo (Glanthor Reviol)
If you install:
http://www.mediawiki.org/wiki/Extension:VariablesExtension#Installation
Then edit the main page to contain the following (between the '---'):
---
{{#vardefine:pi|3.14159265418}}
{{#expr:{{#var:pi}}+1}}
---
When rendered, the main page should now show the number 4.14159265418
What I would like is something very similar called "CellsExtension"
which provides only the keyword "#cell" as in:
---
{{#expr:{{#cell:pi}}+1}}
---
However, it gets the value of "pi" from:
http://somedomain.org/mediawiki/index.php?title=Pi
Ideally, whenever a rendered MediaWiki page is cached, dependency
pointers would be created to all pages from which cells fetched values
during rendering of the page (implying the evaluation of #expr's).
That way, when the MediaWiki source for one of those pages is edited,
not only is its own cached rendering deleted, but so are all cached
renderings that depend on it directly or indirectly. This is so that
the next time those pages are accessed, they are rendered -- and
cached -- again, freshly evaluating the formulas in the #expr's
(which, of course, will contain #cell references such as {{#cell:pi}}).
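The invalidation scheme described above can be sketched as a toy in-memory model. This is not MediaWiki's actual parser cache, and all the names here are invented for illustration; it only shows the transitive-invalidation idea:

```python
class RenderCache:
    """Toy parser cache with transitive dependency invalidation.
    Not MediaWiki's real cache; an illustration of the idea only."""

    def __init__(self):
        self.rendered = {}    # page title -> cached rendering
        self.dependents = {}  # page title -> pages that read a #cell from it

    def store(self, title, rendering, reads_from=()):
        """Cache a rendering, recording which pages it fetched cells from."""
        self.rendered[title] = rendering
        for source in reads_from:
            self.dependents.setdefault(source, set()).add(title)

    def invalidate(self, title):
        """Called when a page's source is edited: drop its cached
        rendering and, recursively, every rendering that depends on it.
        Popping the dependent set before recursing keeps cycles finite."""
        self.rendered.pop(title, None)
        for dep in self.dependents.pop(title, set()):
            self.invalidate(dep)
```

Editing "Pi" would then purge the main page's cached rendering, and anything cached that transitively read from the main page.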
Hi,
I have been working on getting asynchronous upload from url to work
properly[1]. A problem that I encountered was that I need to store
data across requests. Normally I would use $_SESSION, but this data
should also be available to job runners, and $_SESSION isn't.
As I see it, there are basically two ways to get a data store. The
first is to store the objects in the DB using wfGetCache( CACHE_DB );
I'm not sure though whether it is meant to be used this way.
Alternatively, I could revive my staged-upload work. In this branch,
all so-called stashed uploads (uploads that require user intervention
before they can be completed) have their metadata stored in the
database instead of in the session. That would still be quite a lot of
work, though.
Or is there any other mechanism for sharing data between the job queue
and requests?
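As a sketch of the DB-backed option, here is the shape of a database-table key-value store, using SQLite as a stand-in. The table layout and function names are illustrative only, not MediaWiki's objectcache schema; the point is just that a table, unlike $_SESSION, is visible to both web requests and job runners:

```python
import pickle
import sqlite3

# Toy key-value store backed by a database table. With a real shared
# database (here ":memory:" is used only so the sketch is
# self-contained), separate processes -- web requests and job
# runners -- would all see the same rows, unlike $_SESSION data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS kvstore (k TEXT PRIMARY KEY, v BLOB)")

def put(key, value):
    conn.execute("REPLACE INTO kvstore (k, v) VALUES (?, ?)",
                 (key, pickle.dumps(value)))
    conn.commit()

def get(key):
    row = conn.execute("SELECT v FROM kvstore WHERE k = ?", (key,)).fetchone()
    return pickle.loads(row[0]) if row else None
```

An async upload-from-URL job could then stash its state under a key derived from the upload, and the next web request could read it back.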
Regards,
Bryan
[1] http://www.mediawiki.org/wiki/Special:Code/MediaWiki/author/btongminh?offse…
I'm going to begin working on the following bugs:
* "Support collation by a certain locale (sorting order of
characters)", https://bugzilla.wikimedia.org/show_bug.cgi?id=164 (only
parts related to category sorting)
* "Subcategory paging is not separate from article or image paging",
https://bugzilla.wikimedia.org/show_bug.cgi?id=1211
* "CategoryTree is inefficient",
https://bugzilla.wikimedia.org/show_bug.cgi?id=23682
As well as possibly:
* "Categories need to be structured by namespace",
https://bugzilla.wikimedia.org/show_bug.cgi?id=450
* "Natural number sorting in category listings",
https://bugzilla.wikimedia.org/show_bug.cgi?id=6948
There are essentially two problems here:
1) We currently sort articles on category pages by the Unicode code
point of their sort key. This is terrible for anything other than
English, and dodgy sometimes even for English. (This is bugs 164 and
6948.)
2) We have no way to efficiently get all items that are in a category
and also in a particular namespace. Particularly, we can't retrieve
all subcategories without scanning all items in the category, which is
inefficient when we have a few (or no) subcategories and tons of
items. (This is bugs 1211, 23682, and 450.)
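Problem (1) is easy to demonstrate: Python's default string sort, like our current sortkeys, compares Unicode code points, so an accented capital lands after the entire unaccented alphabet:

```python
# Code-point order puts "Ápple" after "zebra", because ord("Á") is 193,
# far above ord("z") at 122 -- exactly the behavior bug 164 complains
# about for category pages.
titles = ["zebra", "apple", "Ápple", "Apple"]
print(sorted(titles))  # ['Apple', 'apple', 'zebra', 'Ápple']
```

A locale-aware collation would instead group "Ápple" with "Apple" and "apple".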
One part of (2) needs to be clarified. The primary use-case is
obviously that we want to be able to count subcategories efficiently,
or display all of them when we only display some of the items in the
category: this is bugs 1211 and 23682. Secondarily, we have a request
at bug 450 to organize category pages by namespace, so main, Talk:,
User:, etc. are all paginated separately.
I think the goal for (2) should be to allow efficient separate
retrieval of subcategories, files, and other pages, but not to
distinguish between namespaces otherwise. The major motivation is
that to do this efficiently, we'll need to add namespace info to the
categorylinks table, and we want this to stay consistent with the info
in the page table. Categories, files, and other types of pages cannot
be moved to one another, as far as I know (it would hardly make
sense), so it automatically stays consistent this way. This is a big
plus, because there are inevitably bugs that cause denormalized data
to fall out of sync (look at cat_pages).
Furthermore, I don't think it's obvious that we want separate
namespaces to display separately at all on category pages. What's a
case where that would be desired? It would break up the display a
lot, with a bunch of separate headers for different namespaces, when
each namespace might only have a few items. Most categories whose
sort appearance you'd care about (i.e., excepting maintenance
categories) will have nearly everything in one namespace anyway. You
could always split the category into separate ones per namespace if
you want them separate.
So I propose that we keep the current category/normal page/file split,
and paginate those three parts of the page separately. So you'd have
up to 200 subcategories, then below that up to 200 normal pages, then
below that up to 200 files. (The numbers could be adjusted.
Currently they're hardcoded, which is stupid.) Paginating
subcategories separately is obviously needed. Paginating files
separately is not really needed, but it would be much more consistent.
The overall solution, then, would be:
1) Change the way category sortkeys are generated. Start them with a
letter depending on namespace, like 'C' for category, 'P' for regular
page, 'F' for file. After that first letter, append a sortkey
generated by ICU or whatever. I think Tim has opinions on what would
be a good choice to convert the article title into sort key -- if not,
I'll have to research it and hopefully not come up with a completely
incorrect answer.
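A rough sketch of step (1), with the caveat that casefold() below is only a stand-in for a real collation key (the actual implementation would use an ICU collator, e.g. PyICU's Collator.getSortKey, and the exact function shape here is my own invention):

```python
# Namespace-type prefix letters as proposed: 'C' for categories,
# 'P' for regular pages, 'F' for files.
NAMESPACE_PREFIX = {"category": "C", "file": "F"}  # everything else: "P"

def make_sortkey(title, namespace="page"):
    """Prefix a type letter, then append a collation key for the title.
    casefold() is a crude stand-in; a real implementation would append
    an ICU-generated binary sort key instead."""
    prefix = NAMESPACE_PREFIX.get(namespace, "P")
    return prefix + title.casefold()
```

With keys of this shape, one index range scan per prefix letter retrieves each of the three sections independently, which is what makes the three separate paginated queries in step (2) efficient.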
2) On category pages, maintain three offsets and do three queries (or
maybe UNION them together, doesn't matter), one for each of
categories/regular pages/files. Because of (1), this will be
efficient and will also sort less unreasonably for non-English
languages.
One problem that was pointed out somewhere in the massive useless
discussion on bug 164 is that we'd have to do something to display the
first letter for each section. Currently it's just the first letter
of the sortkey, but if that's some binary string, that becomes a
problem. I'm not seeing an obvious solution, since the
sortkey-generation algorithm will be opaque to us. If it sorts Á the
same as A, then how do we figure out that the "canonical" first letter
for the section should be "A" and not "Á"? How do we even figure out
where the sections begin or end? Would that even make sense in all
cases? At a first pass, I'd say we should just skip the first letter
and display all the items straight from beginning to end without
section divisions. I don't think that's a big problem.
These are just my initial thoughts. Feedback appreciated. If people
agree with the general approach, I can start coding this up tomorrow.
Cross-posted to
<http://techblog.wikimedia.org/2010/07/mediawiki-version-statistics/>
Some kind people at Qualys have surveyed versions of open source web
apps present on the web, including MediaWiki. Here is the relevant
page from their presentation:
http://wimg.co.uk/3jK.png
For the original see:
https://community.qualys.com/docs/DOC-1401
And the press release:
<http://www.qualys.com/company/newsroom/newsreleases/usa/view/2010-07-28/>
They make the point that 95% of MediaWiki installations have a
"serious vulnerability", whereas only 4% of WordPress installations
do. While WordPress's web-based upgrade utility certainly has a
positive impact on security, I feel I should point out that what
WordPress counts as a serious vulnerability does not align with
MediaWiki's definition of the same term.
For instance, if a web-based user could execute arbitrary PHP code on
the server, compromising all data and user accounts, we would count
that as the most serious sort of vulnerability, and we would do an
immediate release to fix it. We're proud of the fact that we haven't
had any such vulnerability in a stable release since 1.5.3 (December
2005).
However, in WordPress, they count this as a feature, and all
administrators can do it. Similarly, WordPress avoids the difficult
problem of sanitising HTML and CSS while preserving a rich feature set
by simply allowing all authors to post raw HTML.
If you are running MediaWiki in a CMS-like mode, with whitelist edit
and account creation restricted, then I think it's fair to say that in
terms of security, you're better off with MediaWiki 1.14.1 or later
than you are with the latest version of WordPress.
However, the statistics presented by Qualys show that an alarming
number of people are running versions of MediaWiki older than 1.14.1,
which was the most recent fix for an XSS vulnerability exploitable
without special privileges. There is certainly room for us to do better.
We have a new installer project in development, which we hope to
release in 1.17. It includes a feature which encourages users to sign
up for our release announcements mailing list. But maybe we need to do
more. Should we take a leaf from WordPress's book, and nag
administrators with a prominent notice when they are not using the
latest version? Such a feature would require MediaWiki to "dial home",
which is controversial in our developer community.
-- Tim Starling
Sorry about bugging the list about it, but can anyone please explain
the reason for not enabling the Interlanguage extension?
See bug 15607 -
https://bugzilla.wikimedia.org/show_bug.cgi?id=15607
I believe that enabling it will be very beneficial for many projects
and many people have expressed their support for it. I am not saying
that there are no reasons not to enable it; maybe there is a good
reason, but I don't understand it. I also understand that there are many other
unsolved bugs, but this one seems to have a ready and rather simple
solution.
I am only sending this to raise the issue. If you know the answer, you
may comment at the bug page.
Thanks in advance.
--
Amir Elisha Aharoni
heb: http://haharoni.wordpress.com | eng: http://aharoni.wordpress.com
cat: http://aprenent.wordpress.com | rus: http://amire80.livejournal.com
"We're living in pieces,
I want to live in peace." - T. Moore