I recently set up a MediaWiki (http://server.bluewatersys.com/w90n740/)
and I need to extract the content from it and convert it into LaTeX
syntax for printed documentation. I have googled for a suitable OSS
solution but nothing was apparent.
I would prefer a script written in Python, but any recommendations
would be very welcome.
Do you know of anything suitable?
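For reference, here is the kind of glue script I have been sketching myself, assuming the wiki's api.php is enabled and using pandoc's MediaWiki reader for the actual conversion (both of those are my assumptions, not something already set up):

```python
"""Sketch: pull wikitext from a MediaWiki install and hand it to pandoc
for LaTeX conversion. Not an existing tool, just the shape of a script."""
import json
import subprocess
import urllib.parse
import urllib.request

def build_revision_query(api, title):
    """Build the api.php URL that returns the latest wikitext of `title`."""
    params = urllib.parse.urlencode({
        "action": "query", "prop": "revisions", "rvprop": "content",
        "titles": title, "format": "json",
    })
    return f"{api}?{params}"

def fetch_wikitext(api, title):
    """Download and unwrap the wikitext of the page's latest revision."""
    with urllib.request.urlopen(build_revision_query(api, title)) as resp:
        data = json.load(resp)
    page = next(iter(data["query"]["pages"].values()))
    return page["revisions"][0]["*"]

def wikitext_to_latex(wikitext):
    """Delegate the syntax conversion itself to pandoc."""
    result = subprocess.run(
        ["pandoc", "-f", "mediawiki", "-t", "latex"],
        input=wikitext, capture_output=True, text=True, check=True,
    )
    return result.stdout
```

The LaTeX output would still need hand-tuning for a printed book, but it gets the bulk of the markup translated.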
I've been putting placeholder images on a lot of articles on en:wp.
e.g. [[Image:Replace this image male.svg]], which goes to
[[Wikipedia:Fromowner]], which asks people to upload an image if they
have one. I know it's inspired people to add free content images to articles in
several cases. What I'm interested in is numbers. So what I'd need is
a list of edits where one of the SVGs that redirects to
[[Wikipedia:Fromowner]] is replaced with an image. (Checking which of
those are actually free images can come next.)
Is there a tolerably easy way to get this info from a dump? Any
Wikipedia statistics fans who think this'd be easy?
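To make it concrete, here is roughly the scan I have in mind, in Python over a pages-meta-history dump; the regex is a stand-in for the real list of placeholder SVGs redirecting to [[Wikipedia:Fromowner]], and the schema version in the namespace may differ per dump:

```python
"""Sketch: walk a pages-meta-history XML dump and report revisions where a
placeholder image was removed (i.e. probably replaced by a real image)."""
import re
import xml.etree.ElementTree as ET

PLACEHOLDER = re.compile(r"\[\[Image:Replace this image[^]]*\]\]")  # assumption
NS = "{http://www.mediawiki.org/xml/export-0.3/}"  # schema version may differ

def placeholder_replacements(source):
    """Yield (page_title, revision_id) for each revision whose predecessor
    contained a placeholder image and which no longer does."""
    title, prev_had = None, False
    for _event, elem in ET.iterparse(source):
        if elem.tag == NS + "title":
            title, prev_had = elem.text, False
        elif elem.tag == NS + "revision":
            text = elem.findtext(NS + "text") or ""
            has = bool(PLACEHOLDER.search(text))
            if prev_had and not has:
                yield title, elem.findtext(NS + "id")
            prev_had = has
            elem.clear()  # keep memory bounded on a multi-GB dump
```

Checking whether the replacement was actually a free image would then be a second pass against the image description pages.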
(If the placeholders do work, then it'd also be useful for convincing some
wikiprojects to encourage them. Not that there's ownership of
articles on en:wp, of *course* ...)
We now have english wikipedia fully migrated to new servers / new search
backend. We cannot fully migrate other wikis until we resolve some hardware
issues. In the meantime, here is an overview of the new features now deployed:
1) Did you mean... - we now have search suggestions. Care has been taken to
provide suggestions that are context-sensitive, i.e. that work on phrases and proper nouns.
2) fuzzy and wildcard queries - a word can be made fuzzy by adding ~ to its
end, e.g. query sarah~ thompson~ will give all different spellings and
similar names to sarah thompson. Wildcards can now appear at the start or end of a word,
e.g. *stan will give various countries in Central Asia.
3) prefix: - using this magic prefix, queries can be limited to pages
beginning with a certain prefix. E.g.
mwsuggest prefix:Wikipedia:Village Pump
will search all village pumps and archives for mwsuggest. This should be
especially useful for archive searching in concert with inputbox or similar forms.
4) intitle: - using this magic prefix, queries can be limited to titles only
5) generally improved quality of search results via usage of related
articles (based on co-occurrence of links), anchor text, text abstracts,
proximity within articles, sections, redirects, improved stemming and such
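For anyone scripting against this: the new operators compose with ordinary full-text queries through the API. A small sketch, assuming the standard `list=search` query module (the parameter names are from that module as I understand it, not something new in this deployment):

```python
"""Sketch: exercising the new search operators via api.php."""
import urllib.parse

def search_url(api, query, limit=10):
    """Build an api.php URL for a full-text search query."""
    params = urllib.parse.urlencode({
        "action": "query", "list": "search",
        "srsearch": query, "srlimit": limit, "format": "json",
    })
    return f"{api}?{params}"

API = "https://en.wikipedia.org/w/api.php"

# The operators from the announcement are just part of the query text:
fuzzy    = search_url(API, "sarah~ thompson~")
wildcard = search_url(API, "*stan")
scoped   = search_url(API, "mwsuggest prefix:Wikipedia:Village Pump")
```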
I have written a new extension to embed music scores in MediaWiki pages.
Unlike the Lilypond extension, this uses a simple input language (ABC) that is
much easier to validate for security. ABC is mostly used to transcribe Irish
trad and other simple tunes, but it recently gained support for more advanced
features, e.g. multiple staves and lyrics. This is supported in the extension
using the 'abcm2ps' tool.
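For readers who haven't seen ABC before, a complete minimal tune is just a few header lines plus the notes. This example is purely illustrative, not from the extension:

```abc
X:1
T:Example Jig
M:6/8
L:1/8
K:D
|: DED F2A | d2f ecA | DED F2A | dAF E3 :|
```

The headers give the index number, title, metre, default note length and key; everything after that is the melody.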
Unlike the existing ABC extension (AbcMusic), mine doesn't support opening
arbitrary files as ABC input (which is a potential security issue), and has
several additional features:
- The original ABC can be downloaded easily
- The score can be downloaded as PDF, PostScript, MIDI or Ogg Vorbis
- A media player can be embedded in the page to play the media file
I believe the ABC format is suitable for transcribing the majority of scores
currently on Wikimedia projects. Although it can't handle all of them, it is
better than the current situation. Plus, as ABC is simple and existing ABC
scores are easily available, it's easier for novice users to contribute.
I would be interested to hear people's thoughts on enabling this extension on
Wikimedia wikis.
On 10.10.2008, at 21:22, Erik Moeller wrote:
> 2008/10/10 Derbeth <derbeth(a)wp.pl>:
>> I wonder about the legal aspects. In my opinion, when you create a
>> ready-to-print version,
>> you have to attach the text of GFDL license to it - directly, not
>> as a link. Like it is done in
As Erik wrote: This is already implemented (either a title of an
article or a URL to some license text can be set in
LocalSettings.php), but it's currently not configured.
>> Secondly, the current version of the tool commits plagiarism, because
>> it does not mention
>> image authors and does not provide any means (like making images
>> clickable) to check
>> these authors.
> Ouch, thanks for pointing that out. Tricky to do this automatically
> since it's all wiki-text with templates, but we'll investigate a
> solution here.
We'd highly appreciate input from the community regarding this topic!
The printed books from PediaPress contain a list of figures where the
license of each image is listed, together with the URL to the image
description page. As some kind of "hotfix" this solution could be
implemented in the PDF export of the Collection extension, too. But
this doesn't really solve the problem.
We think it's more of a technical/software thing, so I cross-posted
(and set Reply-To) to Wikitech-l.
In our opinion, license management/handling must be a core feature of
MediaWiki, because the software is explicitly developed for the
collaborative distribution of free content. Licenses of the contained
articles and images should not be represented via some agreed-upon
convention but via structured (and machine-readable) information,
available for each relevant object in the wiki.
Some information that would be desired:
- Full (official) name of the license(s).
- Whether the full text of the license has to be included or a
reference to it suffices.
- Reference to the full text of the license(s) (in some rigidly
defined format like wikitext).
- Whether attribution is required. If so: the list of required
attributions.
So, basically all the information that's required to check whether it's
possible to take some part of the wiki's content and use it somewhere else,
and all the information that has to be included in that other place.
This information could be made accessible via MediaWiki API, but
ideally it's contained in the wikitext and/or XHTML, too.
All this could be handled via microformats, even inside of templates,
but the main point is that any kind of new technique has to be
enforced, ideally via the MediaWiki software itself: in the Commons wikis
there are some conventions that can be used in software by people/
companies like us (although we have to work with hacks and
workarounds), but oftentimes, in wikis with smaller communities, this
information doesn't exist at all.
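To make the wish list concrete, the fields above could look something like this as a machine-readable record. The field names are illustrative only, not an existing MediaWiki schema:

```python
"""Sketch: the license fields from the list above as a structured record."""
from dataclasses import dataclass, field
from typing import List

@dataclass
class LicenseInfo:
    name: str                    # full official name of the license
    full_text_required: bool     # must the license text be reproduced?
    full_text_ref: str           # reference to the full text (e.g. a wiki page)
    attribution_required: bool
    attributors: List[str] = field(default_factory=list)

# What a per-image record served by the API might carry:
gfdl = LicenseInfo(
    name="GNU Free Documentation License 1.2",
    full_text_required=True,
    full_text_ref="Project:GFDL",
    attribution_required=True,
    attributors=["Example Uploader"],
)
```

Whether this lives in the API output, the wikitext, or XHTML microformats is then just a serialization question.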
-- Johannes Beigel
While search is being updated, a small plea for a slightly better interface
to namespace selection (and advanced options generally).
At the moment namespace is 18 check boxes at the bottom of the page. A lot
of clicking, each in its own box. Changing this to a list box (allowing
CTRL-CLICK or selection/deselection of multiple namespaces in one action)
would make this much easier to enter. Possibly also the ability in the
config file for a wiki sysadmin to specify some standard option sets, such as:
* Articles
* Articles and article discussion
* Articles, discussion, project space, and project talk
* Project and project talk spaces
* Images, categories, templates and their talk pages
* All namespaces
* Custom
or an option "Also search talk pages". But some way to make namespace
selection quicker would help a lot.
A number of the options available will be inaccessible to many users because
they need to be typed as parameterized search text. I'm going to guess
most users will not use the new features, even though they are extremely
helpful, because of a lack of confidence. As with Google advanced search, one solution
might be an "advanced search" interface where different fields can be
entered and the user is guided what to put in, for each. It may be a good
idea if we're beefing up internal search, to also ensure users discover the
new features and can easily access them :)
In conclusion, we should have a search interface containing three basic elements:
- a big search box -- Google has a simple front page with a centralised
search box for a reason
- some common options, like those proposed by FT2:
On 31.10.2008 13:13:13, FT2 wrote:
> * Articles
> * Articles and article discussion
> * Articles, discussion, project space, and project talk
> * Project and project talk spaces
> * Images, categories, templates and their talk pages
> * All namespaces
> * Custom
- and finally a "Go" button.
People are familiar with Google. A somewhat similar interface approach, with
an "Advanced search" button, would feel familiar.
I'd consider also using a second trick Google does. It tries to pick out a
few specific useful links and highlight them first. In our case, pages with
<text> in the title, or sequentially in the text as a string, may be more
likely to be high quality hits, than pages that "just had the words
scattered somewhere in the text". Maybe highlight a few of those? So the
output might be:
<Hit #1 + sample text>
<Hit #2 + sample text>
<Hit #3 + sample text>
---> Click here for more pages with X in the title
<Hit #4 + sample text>
<Hit #5 + sample text>
<Hit #6 + sample text>
<Hit #7 + sample text>
--> Click here for more pages containing the text "X"
<Regular hits from here on>
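A rough sketch of that layering logic, purely illustrative of the idea rather than of how the search backend would actually implement it:

```python
"""Sketch: title matches first, then exact-phrase body matches, then the
regular scattered-words hits, as in the mock output above."""

def layered_results(query, hits, head=3):
    """`hits` is an ordered list of (title, text) pairs from the engine.
    Returns (title_hits, phrase_hits, rest), each preserving engine order."""
    q = query.lower()
    title_hits  = [h for h in hits if q in h[0].lower()]
    phrase_hits = [h for h in hits
                   if q in h[1].lower() and h not in title_hits]
    rest = [h for h in hits if h not in title_hits and h not in phrase_hits]
    return title_hits[:head], phrase_hits[:head], rest
```

The real ranking would of course come from the engine's scoring; the point is only the presentation order.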
I'm not sure if this is a generic problem, but see:
The following image's text doesn't scale with the image:
The image link in the article is:
proportions.svg|right|200px|thumb|Proportions of uranium-238 (blue)
and uranium-235 (red) found naturally versus grades that are enriched
by separating the two isotopes atom-by-atom using various methods that
all require a massive investment in time and money.]]
I've checked on Firefox 3.0.3 and IE 7.0.
I believe that this must be a recent change - I check that article
semi-regularly (it's on my watchlist along with other nuclear topics)
and I don't recall seeing any problems with it...
-george william herbert
Before I decide to work on it sometime in the future, anyone else
interested in creating a LocalFileRepo for Amazon's API?
Unless someone corrects me, the best method of dealing with Amazon S3
for storing images would be to use S3's API, rather than
mounting buckets onto the filesystem. The former should be more reliable
(^_^ trying to use a mountpoint will probably drive someone up the wall
like NFS does for brion), and the API should also be better
for handling multiple buckets, since as I recall the Amazon docs say
that buckets can only hold up to 5 GB each.
Though considering all the work needed to deal with multiple buckets,
and the fact that the best approach will probably also need some URL
redirect handling to keep the standard URLs, it might be best done as
an extension rather than put into core.
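As a sketch of the key layout such a repo could use, mirroring MediaWiki's md5-based /x/xy/ hashed upload directories; the bucket-sharding scheme here is my own assumption, not something from the S3 docs or MediaWiki:

```python
"""Sketch: mapping MediaWiki file names to S3 bucket/key pairs."""
import hashlib

def s3_key(filename):
    """Return (bucket_name, object_key) for an uploaded file, using
    MediaWiki's hashed-directory convention for the key."""
    name = filename.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    key = f"{digest[0]}/{digest[:2]}/{name}"
    # Sharding by first hex digit spreads files over 16 buckets,
    # keeping each comfortably under any per-bucket size cap:
    return f"media-{digest[0]}", key
```

Keeping the key identical to the on-disk path would also make the URL-redirect handling mostly mechanical.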
If I want to restrict read access to a custom namespace named
"Restrict", and I implement the wiki site like this:
* publicwiki:
** db: wikidb
** permission: anyone can read pages
** no custom namespace
* restrictwiki:
** db: wikidb (shares the same tables with publicwiki)
** permission: only sysops can read pages
** custom namespace name: "Restrict"
(I may optionally modify the login process, so that
http://localhost/publicwiki --> sysop login --> forwarded to
http://localhost/restrictwiki)
My question is:
* Is there any problem if publicwiki and restrictwiki share the same
tables, but restrictwiki is configured to recognize an additional custom
namespace named "Restrict", while publicwiki is not?
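For context, the namespace half of restrictwiki's LocalSettings.php would presumably look like this. $wgExtraNamespaces is a core setting; the read restriction itself is not, since core MediaWiki does not reliably support per-namespace read permissions, so an extension such as Lockdown (or careful $wgWhitelistRead handling) would be needed on top:

```php
# Sketch for restrictwiki's LocalSettings.php (illustrative IDs).
define('NS_RESTRICT', 100);        # even number = subject namespace
define('NS_RESTRICT_TALK', 101);   # odd number  = its talk namespace
$wgExtraNamespaces[NS_RESTRICT] = 'Restrict';
$wgExtraNamespaces[NS_RESTRICT_TALK] = 'Restrict_talk';
```

Since publicwiki would not define namespace 100, pages stored with that namespace index would simply be unreachable there, which is part of what the question above needs confirmed.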