I recently set up a MediaWiki (http://server.bluewatersys.com/w90n740/)
and I need to extract the content from it and convert it into LaTeX
syntax for printed documentation. I have googled for a suitable OSS
solution, but nothing obvious turned up.
I would prefer a script written in Python, but any recommendations
would be very welcome.
Do you know of anything suitable?
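In case it helps to narrow things down, here is the kind of starting point I
have in mind (a rough sketch only, assuming the default index.php path and
that action=raw is enabled; the conversion rules below cover only trivial
markup and would need real parsing for tables, templates, images, etc.):

import re
import urllib.parse
import urllib.request

WIKI = "http://server.bluewatersys.com/w90n740/index.php"

def fetch_wikitext(title):
    # action=raw returns the page source as plain wikitext
    url = WIKI + "?action=raw&title=" + urllib.parse.quote(title)
    with urllib.request.urlopen(url) as f:
        return f.read().decode("utf-8")

def wikitext_to_latex(text):
    # Only a few common constructs are handled here.
    text = re.sub(r"^=== *(.*?) *===$", r"\\subsection{\1}", text, flags=re.M)
    text = re.sub(r"^== *(.*?) *==$", r"\\section{\1}", text, flags=re.M)
    text = re.sub(r"'''(.*?)'''", r"\\textbf{\1}", text)
    text = re.sub(r"''(.*?)''", r"\\emph{\1}", text)
    return text

if __name__ == "__main__":
    print(wikitext_to_latex(fetch_wikitext("Main Page")))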
I've been putting placeholder images on a lot of articles on en:wp.
e.g. [[Image:Replace this image male.svg]], which goes to
[[Wikipedia:Fromowner]], which asks people to upload an image if they
own a suitable one. I know it's inspired people to add free content images to articles in
several cases. What I'm interested in is numbers. So what I'd need is
a list of edits where one of the SVGs that redirects to
[[Wikipedia:Fromowner]] is replaced with an image. (Checking which of
those are actually free images can come next.)
Is there a tolerably easy way to get this info from a dump? Any
Wikipedia statistics fans who think this'd be easy?
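For what it's worth, a pass over a pages-meta-history dump along these lines
might be enough as a first cut (just a sketch; the placeholder tuple holds a
single example name, and it only flags revisions where a placeholder string
disappears, so checking that a free image actually replaced it would be a
second pass):

import xml.etree.ElementTree as ET

PLACEHOLDERS = ("Replace this image male.svg",)   # extend with the other SVGs
DUMP = "enwiki-pages-meta-history.xml"

def tag(elem):
    # strip the export namespace, so this works with any dump schema version
    return elem.tag.rsplit("}", 1)[-1]

title = None
prev_had_placeholder = False
for _, elem in ET.iterparse(DUMP):
    name = tag(elem)
    if name == "title":
        title = elem.text
        prev_had_placeholder = False
    elif name == "revision":
        rev_id = next((c.text for c in elem if tag(c) == "id"), None)
        text_elem = next((c for c in elem if tag(c) == "text"), None)
        text = (text_elem.text or "") if text_elem is not None else ""
        has = any(p in text for p in PLACEHOLDERS)
        if prev_had_placeholder and not has:
            print("%s: placeholder removed in revision %s" % (title, rev_id))
        prev_had_placeholder = has
        elem.clear()   # keep memory use flat while streaming
    elif name == "page":
        elem.clear()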
(If the placeholders do work, then it'd also be useful for convincing some
wikiprojects to encourage the things. Not that there's ownership of
articles on en:wp, of *course* ...)
I have applied to Google Summer of Code with a project to enable
category moving without using bots. After some correspondence with
Catrope, the following text is my project idea. Any feedback would be
appreciated.
I will provide the capability to move categories, achieving an effect
for the end-user similar to that of moving other pages. Currently,
contributors must apply to use a bot that recreates the category page
and changes the category link on all relevant articles.
The task can be divided into three parts. First, the category page
is moved, along with its history, just as renaming of articles works.
A redirect is optionally placed on the old category page, and the
category discussion is moved as well.
Second, all articles in the relevant category must have their category
links changed. There are several obstacles involved in this task:
1. Finding all alternative ways of categorizing articles. It is simple
to match explicit category links and category lists (see the sketch after
this list), but more difficult to find e.g. categories included from a
template. Roan
Kattouw (Catrope) suggested category redirects for this, such that all
articles categorised as [[Category:A]] would also be listed at
[[Category:B]] if the former has been redirected to the latter.
2. Articles might be in the process of being edited as the movement is
done. This, however, can be solved in the same manner as edit
collisions are currently solved.
3. The algorithm would likely have high complexity and would thus not
scale well with very large categories.
This is likely to constitute a significant and challenging part of the project.
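To illustrate the simple case from point 1, rewriting explicit category links
in an article's wikitext could look roughly like the following (a sketch only;
localized namespace aliases and template-supplied categories are not handled):

import re

def move_category_link(wikitext, old, new):
    # Match [[Category:Old]] and [[Category:Old|sort key]]; matching is
    # case-insensitive and spaces/underscores are interchangeable.
    old_key = re.escape(old.replace(" ", "_")).replace("_", "[ _]")
    pattern = re.compile(
        r"\[\[\s*category\s*:\s*" + old_key + r"\s*(\|[^\]]*)?\]\]",
        re.IGNORECASE,
    )
    return pattern.sub(
        lambda m: "[[Category:%s%s]]" % (new, m.group(1) or ""), wikitext
    )

print(move_category_link("Text [[Category:A|Foo]] more", "A", "B"))
# -> Text [[Category:B|Foo]] more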
As the last step, the relevant entries in the categorylinks table
would need to be changed. This is accomplished by a simple SQL query.
This could be avoided if bug #13579  ("Category table should use
category ID rather than category name") is fixed, which it could be as
part of this project.
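For the record, with the current schema that query would look roughly like the
following (shown through Python here; the connection details are placeholders,
and cl_to holds the underscored category name):

import pymysql   # any MySQL driver would do; parameters below are placeholders

conn = pymysql.connect(host="localhost", user="wikiuser",
                       password="secret", database="wikidb")
with conn.cursor() as cur:
    # cl_to stores the category name in underscored form, without namespace
    cur.execute("UPDATE categorylinks SET cl_to = %s WHERE cl_to = %s",
                ("New_category_name", "Old_category_name"))
conn.commit()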
The project would preferably be written as a patch to the core.
Catrope suggested setting up a separate SVN branch for the project,
such that everyone can see my progress.
Benefits for MediaWiki
Developing a means of moving categories would decrease the dependency on
bots, saving administrative time. Additionally, the solution would
be faster than any bot-based solution could be because, among other
things, pages would no longer need to be loaded and edited individually.
Category moving would also increase the consistency of the interface across
different page types. The only real reason for a "move" tab not to
appear on category pages is that the feature is not yet implemented.
Publishing this document to the MediaWiki development community
(wikitech-l) and awaiting comments on the planned procedure would be
the first step.
After the community bonding period specified by the timeline, a week
should be enough to get comfortable with the relevant MediaWiki code
and implement the first section, moving the category page along with
its discussion and history. Much old code should be reusable here,
such as the Title::moveTo() method for moving pages.
By mid-July, most of the second part of the project should be
finished. Within a week after that, the last part would be completed, too.
A month is then reserved for bug-testing, tweaking and as a buffer for
unexpected obstacles. The MediaWiki community is very important in
this step for testing and feedback.
Sorry for my English :)
What I need is case-insensitive titles. My solution to the problem was to
change the collation in MySQL from <utf8_bin> to <utf8_general_ci> in table
<page>, for field <page_title>.
But a bigger problem with links persists. In my case, if there is an article
<Frank Dreben>, the link [[Frank Dreben]] is treated like a link to an existing
article (GoodLink), but the link [[frank dreben]] is treated like a link to a
non-existent article, so this link opens editing of the existing article <Frank
Dreben>. What can be fixed so that the link [[frank dreben]] is treated like a
link to the existing article?
I've spent some time in Parser.php, LinkCache.php, Title.php, Linker.php,
LinkBatch.php, but found nothing useful. The last thing I tried was to do
strtoupper on the title every time the link cache array is filled, in
LinkCache.php. I also tried to do strtoupper on the title every time data is
fetched from the array.
I've tried to make titles in the cache case-insensitive, but it didn't work
out, and I am not sure why - it seems like when links are constructed (parser,
title, linker, etc.) only LinkCache methods are used.
Could anybody point me in a direction to dig in? :)
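In case it clarifies what I am trying to do, here is the idea sketched in
Python rather than PHP (illustration only, all names made up): key the link
cache by a case-folded form of the title and keep the canonical title as the
value, so both spellings resolve to the same entry.

class CaseInsensitiveLinkCache:
    # Illustration only: lookups for "Frank_Dreben" and "frank_dreben"
    # hit the same entry; the canonical title is kept as the value.

    def __init__(self):
        self._good = {}   # folded DB key -> (page_id, canonical title)

    @staticmethod
    def _fold(title):
        return title.replace(" ", "_").casefold()

    def add_good_link(self, page_id, title):
        self._good[self._fold(title)] = (page_id, title)

    def get_good_link(self, title):
        return self._good.get(self._fold(title))   # None if unknown

cache = CaseInsensitiveLinkCache()
cache.add_good_link(42, "Frank Dreben")
print(cache.get_good_link("frank dreben"))   # -> (42, 'Frank Dreben')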
I need to create some user accounts and have mediawiki send the users
a welcome message. I found the Mediawiki:confirmemail_body page which
appears to feed such a welcome message. How do I do that? If I use
Special:Userlogin, create an account and use "by email", the user gets
a password reminder, but not a welcome message. I could change the
password reminder text, but that wouldn't be correct for all the other
users who use that page to genuinely retrieve their password.
S Page wrote:
> Aerik Sylvan wrote:
> > At the risk of asking a stupid question: what is the status of category
> > intersections?
> At the risk of making a stupid answer: you could install the Semantic
> MediaWiki extension and make queries like
> SMW will query for membership of subcategories (thus it'll match members
> of Child actors), to a configurable depth limit. The nifty thing is you can
> display other properties and categories of matching pages.
> See demo (temporarily) at,
That's cool... what's the backend for that? (I looked briefly at some docs
at http://semantic-mediawiki.org/wiki/Semantic_MediaWiki but didn't quickly
find any architecture stuff).
I know a lot of people would love wysiwyg editing functionality, I know a
lot of people on this list are working on such a thing, and I know that a
problem seems to be the "grammar" involved. I heard on the grapevine that,
on this basis, it was "at least a year away". I don't want to know the
specifics of what that grammar is, or what its problems are. :-) But what
I'd like to know is:
* Would this issue be solved quicker by having a few funded developers
working on it? (If so, how many wo/man hours would be needed - what size of
a funding bid would be needed?)
* What does the existing fckeditor integration *not* do yet?
<http://mediawiki.fckeditor.net> (If the answer is: "too much - it only does
a tiny handful of things compared to what we need" - well then, that answers
my question. :-))
If you haven't already surmised, this question is from a technical dunce -
answers at that level would be greatly appreciated. :-)
On Tue, 22 Apr 2008, Simetrical wrote:
> On Tue, Apr 22, 2008 at 10:59 AM, Roan Kattouw <roan.kattouw(a)home.nl> wrote:
> > I missed the explanation of the fulltext implementation. Something like
> > 'Foo With_spaces Bar' and then do a fulltext search for the cats you
> > need? That would be more powerful, and would probably be faster for
> > complex intersections. I'll write an alternative to
> > CategoryIntersections that uses the fulltext schema and run some
> > benchmarks. I expect to have some results by the end of the week.
> Aerik Sylvan has already done an implementation of the backend using
> CLucene. If a front-end could be done in core, with a pluggable
> backend, that might have the best chance of getting enabled on
> Wikimedia relatively quickly. MyISAM fulltext is not necessarily
> going to be fast enough due to the locking.
Yes, I did a fulltext search (which works quite well - I forget the response
times... I think it was around a third of a second even for intersections of
large groups, like "Living_People") and the way it handles booleans and
stuff is quite nice. I think I broke it when I moved servers, but I can put
it back up. I think it would probably be a great addition to core, and
would be perfectly adequate for small wikis, but too slow for larger ones
(performance of a few tenths of a second will really add up with tens or
hundreds of hits...). I think doing updates is also an issue on large wikis,
due to table locking of the MyISAM table. But I think it will be fine for
small wikis. MySQL doesn't break words on underscores, so using the category as
it appears in the URL seems to work great for fulltext search, and the built-in
fulltext search is *much* faster than doing lookups on the categorylinks
table, especially for large sets.
So, I'd propose in core we add a MyISAM table with a fulltext index of
categories - this will suit small wikis. For big wikis, make this an InnoDB
table and use it to build a Lucene index, which you'd search with whatever
flavor of Lucene you like. This is a fairly straight path that covers both
core and large wikis, should have good performance for either application,
and is flexible in that it does boolean searches. I don't have suggestions
for an interface, but why not just start with a SpecialPage and see what
happens? Once the functionality is there, suggestions for how to better use
it will come out of the woodwork.
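To make the proposal concrete, here is roughly what I have in mind (a sketch
only; the table and column names are made up, each row holds a page's
categories in their underscored URL form separated by spaces, and things like
ft_min_word_len and the stopword list would need tuning):

import pymysql   # placeholder connection details

conn = pymysql.connect(host="localhost", user="wikiuser",
                       password="secret", database="wikidb")
with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS category_fulltext (
            cf_page       INT UNSIGNED NOT NULL PRIMARY KEY,
            cf_categories TEXT NOT NULL,
            FULLTEXT KEY cf_cats (cf_categories)
        ) ENGINE=MyISAM
    """)
    # Boolean-mode intersection: pages that are in both categories.
    cur.execute(
        "SELECT cf_page FROM category_fulltext "
        "WHERE MATCH(cf_categories) AGAINST(%s IN BOOLEAN MODE)",
        ("+Living_people +Child_actors",))
    print(cur.fetchall())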
I'm working on a CLucene daemon (calling it clucened, which is on SF - with
slightly out-of-date source in subversion - and at clucened.com), which
could be used for this, or anything else. I'm planning to make it Solr
compatible, but not a direct port of Solr, and the implementation will have
some differences. So far I have only the daemon and the search function
(it takes a raw query, which can be boolean or have multiple fields, and passes
it through). I think this is really cool, but if we already have a GCJ
Lucene search for En, it may be easier just to extend that to read a
categories Lucene index than to use another architecture. Either way, I think
a search daemon will find an audience and will be a really cool thing :-)
[courtesy copy to foundation-l, though I suggest that discussion, if any, be
centralised on wikitech-l]
The search index for the mailing list archives was last rebuilt in January.
Now, after having made quite a few queries about this here and at other
places, I learnt (and obviously had to accept) that rebuilding the search
index is quite a resource-consuming process which has resulted in crashes.
To put it bluntly, I dare suggest from a non-technical POV that the "htdig"
(that's the name, isn't it?) experiment has failed. If we can only update
our search index every 6 months or so, it is pointless to have it.
Instead, I suggest that http://lists.wikimedia.org/robots.txt be modified so as
to allow Google (and other search engines) to crawl /pipermail/ again. I do
not really see the privacy issues with this; Nabble, Gmane, etc. are
Google-searchable as well, and I really don't see the point in barring Google
from our own archive.
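For concreteness, the change would amount to something like the following (I
have not checked the current contents of the file, so take the exact
directives as an example only):

User-agent: *
Disallow: /mailman/
# /pipermail/ is no longer disallowed, so the archives become crawlable again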
To be very honest, I do not even remember anymore why we decided to bar
Google from http://lists.wikimedia.org/pipermail.
Was it due to privacy concerns? If so, which ones, and why is
lists.wikimedia.org as an archive different from Nabble/Gmane?