I recently set up a MediaWiki (http://server.bluewatersys.com/w90n740/)
and I need to extract the content from it and convert it into LaTeX
syntax for printed documentation. I have googled for a suitable OSS
solution but nothing was apparent.
I would prefer a script written in Python, but any recommendations
would be very welcome.
Do you know of anything suitable?
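To make the request concrete, here is a minimal sketch in Python of the kind of conversion meant. It is not an existing tool. The function name wikitext_to_latex is made up for illustration, it assumes the page text has already been fetched from the wiki, and it handles only headings, bold, italics and a few LaTeX special characters. Templates, tables, links and images are ignored.

```python
import re

# Characters that LaTeX treats specially and that a backslash can escape.
_LATEX_SPECIALS = re.compile(r"([#$%&_{}])")


def wikitext_to_latex(text):
    """Convert a very small subset of wiki markup to LaTeX (illustrative only)."""
    out = []
    for line in text.splitlines():
        line = _LATEX_SPECIALS.sub(r"\\\1", line)
        # == Heading == and === Subheading ===
        m = re.fullmatch(r"(={2,3})\s*(.+?)\s*\1", line)
        if m:
            cmd = "section" if len(m.group(1)) == 2 else "subsection"
            out.append("\\%s{%s}" % (cmd, m.group(2)))
            continue
        # Bold is handled before italics, since ''' contains ''.
        line = re.sub(r"'''(.+?)'''", r"\\textbf{\1}", line)
        line = re.sub(r"''(.+?)''", r"\\textit{\1}", line)
        out.append(line)
    return "\n".join(out)
```

A real converter would need a proper parser rather than regular expressions, because wiki markup nests and has many special cases.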
I've been putting placeholder images on a lot of articles on en:wp.
e.g. [[Image:Replace this image male.svg]], which goes to
[[Wikipedia:Fromowner]], which asks people to upload an image if they
I know it's inspired people to add free content images to articles in
several cases. What I'm interested in is numbers. So what I'd need is
a list of edits where one of the SVGs that redirects to
[[Wikipedia:Fromowner]] is replaced with an image. (Checking which of
those are actually free images can come next.)
Is there a tolerably easy way to get this info from a dump? Any
Wikipedia statistics fans who think this'd be easy?
(If the placeholders do work, then it'd also be useful convincing some
wikiprojects to encourage the things. Not that there's ownership of
articles on en:wp, of *course* ...)
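The check described above could be sketched roughly like this in Python, assuming the old and new wikitext of each revision pair has already been pulled out of the dump. The PLACEHOLDERS set and both function names are made up for illustration. The real set would be whichever files point to [[Wikipedia:Fromowner]]. The sketch does not normalise underscores or capitalisation in file names.

```python
import re

# Hypothetical list; in practice, every placeholder file that leads to
# [[Wikipedia:Fromowner]] would be listed here.
PLACEHOLDERS = {"Replace this image male.svg"}

_IMAGE_LINK = re.compile(r"\[\[Image:([^\]|]+)", re.IGNORECASE)


def images_in(wikitext):
    """Return the set of image names linked from a revision's wikitext."""
    return {name.strip() for name in _IMAGE_LINK.findall(wikitext)}


def replaced_placeholder(old_text, new_text, placeholders=PLACEHOLDERS):
    """True if a placeholder in old_text is gone from new_text and at
    least one new non-placeholder image appeared in the same edit."""
    old, new = images_in(old_text), images_in(new_text)
    removed = (old & placeholders) - new
    added = new - old - placeholders
    return bool(removed and added)
```

Counting the revision pairs for which replaced_placeholder returns True would give the raw number. Whether the new images are free would still need a separate check.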
> Message: 8
> Date: Fri, 12 Oct 2007 17:59:22 +0200
> From: GerardM <gerard.meijssen(a)gmail.com>
> Subject: Re: [Wikitech-l] Primary account for single user login
> This issue has been decided. Seniority is not fair either; there are
> hundreds if not thousands of users that have done no or only a few edits and
> I would not consider it fair when a person with say over 10.000 edits should
> have to defer to these typically inactive users.
1. Yes, it's not fair, but this is a reality of wikimedia projects that one
has to accept. Imagine if all wikimedia sites had had a single user login
since they were first established: whoever registered first would own that
username on all wikimedia sites.
2. A person with fewer edits is not necessarily less active than
one with more edits. And according to,
``Edit counts do not necessarily reflect the value of a user's contributions
to the Wikipedia project.''
What if some users have a lower edit count,
* since they carefully edit, preview, edit, and preview the articles,
over and over, before submitting the considered versions to wikimedia
* Some users edit, edit and edit the articles in their offline storage, over
and over, before submitting only the final versions to wikimedia sites.
While some users have a higher edit count,
* since they often submit many changes without previewing them first, and
have to correct the careless edits, over and over.
* Some users often submit many minor changes, over and over, rather than
accumulating the changes into fewer edits.
* Some users do many robot routines by hand, rather than letting
a real robot do those tasks.
* Some users often take part in many edit wars.
* Some users often take part in many arguments on many talk pages.
What if the users with lower edit counts try to increase their edit counts
to take back the status of primary account?
What if they decide to change their editing habits, to increase the
* by submitting many edits without careful preview,
* by splitting the accumulated changes into many minor edits, and submit
* by stopping their robots, and doing those robot routines by themselves,
* by joining edit wars.
3. Given point 2 above, I think a better measure of activeness is
the time between the first edit and the last edit of that username.
The formula would look like this:
activeness = last edit time - first edit time
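As a small illustration in Python, assuming the edit timestamps of one account are available as datetime objects (the function name activeness is made up to mirror the formula):

```python
from datetime import datetime


def activeness(edit_times):
    """Return last edit time minus first edit time as a timedelta."""
    if not edit_times:
        raise ValueError("account has no edits")
    return max(edit_times) - min(edit_times)


# Example with invented timestamps for one account.
edits = [datetime(2005, 3, 1), datetime(2007, 10, 12), datetime(2006, 1, 1)]
span = activeness(edits)
```

As written, the formula depends only on the two endpoints, so an account with a single edit scores zero.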
> A choice has been made and as always, there will be people that will find an
> un-justice. There were many discussions and a choice was made. It is not
> good to revisit things continuously, it is good to finish things so that
> there is no point to it any more.
> On 10/12/07, Anon Sricharoenchai <anon.hui(a)gmail.com> wrote:
> > According to the conflict resolution process, the account with the
> > most edits is selected as the primary account for that username. This
> > may sound reasonable for a username that is owned by the same person
> > on all wikimedia sites.
> > But a problem arises when the same username on those wikimedia
> > sites is owned by different people and the accounts are actively in use.
> > The active account that was registered first (seniority rule) should
> > rather be considered the primary account,
> > since I think the person who registered first should own that username
> > on the unified wikimedia sites.
> > Imagine if the wikimedia sites had been unified ever since they were
> > first established long ago (so that their accounts had never been
> > separated): the person who registered first would own that username on
> > all of the wikimedia sites.
> > The person who came later would be unable to use the registered
> > username, and would have to choose an alternate username.
> > This logic should also apply to the current wikimedia sites, after they
> > have been unified.
Please copy this to your local village pump or other relevant on-wiki forum.
Werdna's #ifexist limit feature is now live. In response to complaints of
template breakage, I have increased the limit on Wikimedia wikis
temporarily, from 100 to 2000. Barring a coup, it will stay at 2000 for
about a week, and then we'll lower it to 100.
Please use this one-week period to check pages and templates that use
#ifexist heavily. Look in the HTML source of the preview or page view.
There will be a "limit report" that looks like this:
Pre-expand include size: 617515/2048000 bytes
Post-expand include size: 360530/2048000 bytes
Template argument size: 51168/2048000 bytes
#ifexist count: 1887/2000
This is the limit report from
one of the pages that will break.
At the end of the week, any pages which have a #ifexist count of over 100
will cease to be rendered correctly (after the next edit or cache clear).
All #ifexist calls after the hundredth will be treated as if the target
does not exist.
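For checking many pages, the limit report can be read mechanically. The following Python sketch assumes the report appears in the HTML source as "name: used/limit" lines exactly like the sample above. The function names are made up for illustration.

```python
import re

# Matches lines such as "#ifexist count: 1887/2000".
_LIMIT_LINE = re.compile(
    r"^\s*(?P<name>[^:\n]+):\s*(?P<used>\d+)/(?P<limit>\d+)", re.MULTILINE
)


def parse_limit_report(html_source):
    """Map each limit name to a (used, limit) tuple of ints."""
    return {
        m.group("name").strip(): (int(m.group("used")), int(m.group("limit")))
        for m in _LIMIT_LINE.finditer(html_source)
    }


def will_break(html_source, future_limit=100):
    """True if the page's #ifexist count exceeds the future limit."""
    used, _ = parse_limit_report(html_source).get("#ifexist count", (0, 0))
    return used > future_limit
```

Run against the sample report above, will_break returns True, since 1887 is greater than 100.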
In some cases it may be possible to rewrite your templates so that they
still do the same thing, but with fewer #ifexist calls. In other cases, you
will need to remove template features. Removing features is always sad, as
a software developer I know that, but sometimes it is necessary for the
good of the project. This is one of those times.
-- Tim Starling
> Earlier: "... Whether or not you think it's
> a waste of time, there's no excuse for
> broadcasting every parser bug you find
> to three mailing lists. There's no
> shortage of parser bugs, and no need
> to act surprised when you find one ...
> If we want to talk about the parser
> grammar effort, we all know which list
> to subscribe to ...
Peter Blaise responds: Oh? Which one? I do not know, and you do not
mention it in your post, so, help me out here, please - which list? If
you're gonna type something, why not make it unambiguously accurate and
complete, anyway? Otherwise, what's the point?
Additionally, I personally find cross posting very important. Of course
anyone NOT interested can just scroll on or delete - there's no such
thing as too much information in my book (on topic - I'm not talking
about spam or off topic posts). Parser behavior = wiki tech in my book.
Very often I find spirited discussions ensue because of cross-posted
ideas - it tends to freshen otherwise stale meeting places.
More to the point here, wiki markup parser wise, my point is twofold:
One, I would prefer NOT to have my wiki end users get error messages
when they edit. I'd prefer that any editing just go in and be saved,
and later we'll deal with formatting surprises. I'm a firm believer in
separating the tasks of content creation and content presentation.
Someone adding content to a wiki should never be delayed by presentation
formatting error messages. Let the text land however it lands, clean it
Two, we tend to discover how things work in spite of erroneous,
presumptive, naive instructions. I already have begun to discard the
"rule" that bold happened between three apostrophes. Instead, I've
discovered a hierarchy of toggles. Three apostrophes toggle bold to the
other state. Two apostrophes toggle italics to the other state. The
parser makes its decisions on how to interpret duplicate punctuation at
the END of any code that matches its look-up-table, or at the first
"word barrier" transition. Or does it? Cut and paste this into any
sandbox page and explore:
'1text = apostrophe one text; no duplicate punctuation, no wiki markup.
''2text = italics two text; duplicate punctuation matched wiki markup,
and the parser toggles the state of the matching function, here,
'''3text = bold three text; duplicate punctuation matched wiki markup,
and the parser toggles the state of the matching function, here, bold.
''''4text = bold apostrophe four text; duplicate punctuation matched
wiki markup, and the parser toggles the state of the matching function,
here, bold (3 apostrophes) was the superior interpretable state before
the 4th apostrophe, so bold toggles (on or off), and the final
apostrophe is interpreted as mere punctuation or text. Alternatively,
four apostrophes could be considered as two toggles of the italics
function. But, since the first word barrier occurs only after the 4th
apostrophe, and there is no text between the apostrophes, that
interpretation would not have any visible effect on the display. It
probably makes sense to have the parser continue interpreting up to the
third apostrophe before making a decision, and consider it a call for a
bold-toggle, rather than consider the first two apostrophes as an
italics-toggle, and then start looking for a subsequent wiki markup
instruction. Otherwise, the parser would never find bold (3
apostrophes) if it always gave precedence to interpreting the first 2
apostrophes as italics. The parser seems to read left to right, and
interprets according to (what we hope are) discoverable hierarchies:
word transitions, or, matching duplicate punctuation to wiki markup
code, whichever it finds first, such as knowing that three apostrophes
'''''5text = italics bold five text; which toggled first? Who knows? I
presume bold toggles first, then italics toggled. Let's test:
'''bold '''''5text = italics five text (no bold); implying the five
apostrophes were interpreted bold as highest in the hierarchy, so, of
the five apostrophes, the first three were considered a bold-toggle, and
final two were considered an italics-toggle.
''italics '''''5text = bold five text (no italics); implying bold again
wins, and the subsequent italics-toggle turns off italics as expected,
the pattern is, so far, predictable. But let's revisit four
'''bold ''''4text = bold apostrophe normal four text, this makes no
sense. In the four apostrophe grouping, the first three should have
toggled bold, and the final should have been interpreted as text,
displaying '4text normal, but when actually displayed, the ' was bold
and the 4text was normal. Huh? THERE'S THE BUG!
''italics ''''4text = normal two apostrophe four text, again, in the
four apostrophe group, the first three should have been a bold toggle
and the subsequent apostrophe should have been text. Apparently the
parser holds the existing state of wiki markup toggle in its head and
raises that in the hierarchy. Who does this programming, anyway? Let's
test italics on and off first, not just on first:
''real italics'' ''''4text = normal apostrophe, bold four text. This
SHOULD be the same as above, but isn't. Apparently we need to add one
more item to our expected parser function hierarchy, THIS IS WHAT THE
PARSER SEEMS TO ASK:
1 - is there a bold or italics toggle ON outstanding? (This surprised
me, I thought toggles ON and OFF were hierarchically equivalent, but
apparently a toggle ON creates a pressing need to look for a toggle OFF
before interpreting anything else!)
2 - have we reached the superior matching wiki markup text? (In other
words, ''' is superior to '' in the look-up-table.)
3 - have we reached a text word barrier or paragraph barrier?
(Supposedly, paragraph markers reset all toggles to OFF, but apparently
some wiki markup survives paragraph markers, or is it only HTML-style
markup using <markup></markup>-style coding that ignores paragraph
Continuing the test:
''''''6text = italics bold apostrophe six text
'''''''7text = italics bold 2 apostrophe seven text
''''''''8text = italics bold 3 apostrophe eight text
... and so on.
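For comparison with sandbox results, here is a small Python sketch of a plain toggle model: two apostrophes toggle italics, three toggle bold, five toggle both, and any extra apostrophes are treated as literal text. It is an assumption-laden model, not the actual parser code. In particular, it assumes the extra apostrophes are the leftmost ones in a run, and it does no balancing across the line, so it will not reproduce every case listed above.

```python
import re


def render_quotes(line):
    """Return a list of (text, bold, italic) segments for one line."""
    segments = []
    bold = italic = False
    # Splitting on a captured group keeps the apostrophe runs in the
    # result: even indexes are text, odd indexes are runs of 2 or more.
    for i, part in enumerate(re.split(r"('{2,})", line)):
        if i % 2 == 0:
            if part:
                segments.append((part, bold, italic))
            continue
        n = len(part)
        if n == 4:
            literal, n = "'", 3          # one literal, then a bold toggle
        elif n > 5:
            literal, n = "'" * (n - 5), 5  # extras literal, then both toggles
        else:
            literal = ""
        if literal:
            segments.append((literal, bold, italic))
        if n in (3, 5):
            bold = not bold
        if n in (2, 5):
            italic = not italic
    return segments
```

With this model, render_quotes("'''bold ''''4text") gives the apostrophe in bold and 4text in normal weight, which matches what the sandbox displayed for that case.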
Today we passed 10k HTTP requests per second (even with inter-squid
traffic eliminated). Especially thanks to Mark and Tim, who've been
improving our caching, as well as doing lots of other work, and
achieved incredible results (while I was slacking). Really, thanks!
I've searched Google, mediawiki.org, the mailing list archives, and
looked through the listed extensions, but I have been unable to find
anything about keeping mediawiki accounts from being brute-forced. I'm
specifically looking for something that locks an account down after a
specified number of login attempts or which adds time between login
requests when the password is given incorrectly. Do measures like this
exist? Did I just use the wrong search terms?
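To be clear about the behaviour I mean, here is a rough Python sketch of the lockout logic. It is not a MediaWiki feature or extension. The class name, method names and defaults are all made up, and a real implementation would need persistent storage and hooks into the login code.

```python
import time


class LoginThrottle:
    """Track failed logins per account and lock out after too many."""

    def __init__(self, max_attempts=5, lockout_seconds=300, clock=time.time):
        self.max_attempts = max_attempts
        self.lockout_seconds = lockout_seconds
        self.clock = clock
        self._failures = {}  # username -> (failure count, time of last failure)

    def is_locked(self, username):
        count, last = self._failures.get(username, (0, 0.0))
        if count < self.max_attempts:
            return False
        if self.clock() - last >= self.lockout_seconds:
            # Lockout expired: forget the old failures.
            self._failures.pop(username, None)
            return False
        return True

    def record_failure(self, username):
        count, _ = self._failures.get(username, (0, 0.0))
        self._failures[username] = (count + 1, self.clock())

    def record_success(self, username):
        self._failures.pop(username, None)
```

The login code would call is_locked before checking the password, then record_failure or record_success depending on the result.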
On 11/29/07, werdna(a)svn.wikimedia.org <werdna(a)svn.wikimedia.org> wrote:
> Revision: 27946
> Author: werdna
> Date: 2007-11-29 10:42:48 +0000 (Thu, 29 Nov 2007)
> Log Message:
> Prevent the parser function #ifexist being used more than a certain amount on error pages. The maximum number of #ifexist queries can be set in $wgMaxIfExistCount, which is, by default, 100. Any more uses over here will default to the "else" text. Done to discourage templates like Template:highrfc-loop on enwiki, which willingly does something like 50 database queries for a template that is used for many users on one page. Clearly suboptimal from a performance perspective.
Is there any way, instead, to defer these so they can be done all at
once? Tim, do you have any thoughts on whether this is reasonable?
(I would suspect not, but it doesn't hurt to ask.)
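What I have in mind is roughly the following, sketched in Python rather than in the parser's own code. It is only an illustration of the batching idea: it handles simple, non-nested calls, and lookup_existing is a hypothetical callback that takes a set of titles and returns the subset that exist, ideally with a single query.

```python
import re

# Simple, non-nested {{#ifexist:title|then|else}} calls only.
_IFEXIST = re.compile(
    r"\{\{#ifexist:\s*([^|{}]+?)\s*\|([^|{}]*)\|([^|{}]*)\}\}"
)


def expand_ifexist(wikitext, lookup_existing):
    """Expand #ifexist calls using one batched existence lookup."""
    # First pass: collect every title that needs checking.
    titles = {m.group(1) for m in _IFEXIST.finditer(wikitext)}
    existing = lookup_existing(titles) if titles else set()

    # Second pass: substitute the "then" or "else" text.
    def substitute(m):
        return m.group(2) if m.group(1) in existing else m.group(3)

    return _IFEXIST.sub(substitute, wikitext)
```

The hard part in practice is that the "then" and "else" branches can themselves contain further calls, which this sketch ignores.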