Hi,
We have set up MediaWiki within our University and have a fair understanding
of it. However, we failed to test the block feature, and we discovered this
after setting it up and starting to use it :(
Blocking seems to stop the username from posting, but it does not block the
user from logging in. Has there been any work to either block logins for
blocked accounts or add an enabled flag in the user table?
Although wikis should be open communities, we use ours for internal
documentation, and when an employee leaves (or a student drops a course,
etc.), we would like to be able to delete the account or block it from
gaining read access. I am willing to work on this code if no one has done so
already, but I want the effort to go into the main source so I don't have to
redo it with every upgrade.
Has anyone done this/started this/talked about this?
Dion Rowney
U of S
I don't know when I first heard the plans for Wikidata, but one
year ago I proposed a more lightweight alternative approach on
the [[m:talk:Wikidata]] page. Then nothing happened, and today I
implemented it. It's 98 lines of Perl that processes an XML dump
and extracts template call parameter values. The source code can
be had from http://meta.wikimedia.org/wiki/User:LA2/Extraktor
The SQL dump of the templatelinks table already tells us which
pages call which templates. This script goes beyond that to get
information about each individual call parameter.
The output format is a very simple awk-friendly text file. For
example, the German Wikipedia page [[de:Anthony Hope]] contains
the two template calls
{{PND|11901842X}}
{{Personendaten|
NAME=Hope, Anthony
|ALTERNATIVNAMEN=Hawkins, Anthony Hope
|KURZBESCHREIBUNG=englischer [[Rechtsanwalt]] und [[Autor]]
|GEBURTSDATUM=[[9. Februar]] [[1863]]
|GEBURTSORT=[[London]]
|STERBEDATUM=[[8. Juli]] [[1933]]
|STERBEORT=
}}
For this page, the output contains:
PND|Anthony Hope|1|1|11901842X
Personendaten|Anthony Hope|2|NAME|Hope, Anthony
Personendaten|Anthony Hope|2|ALTERNATIVNAMEN|Hawkins, Anthony Hope
Personendaten|Anthony Hope|2|KURZBESCHREIBUNG|englischer [[Rechtsanwalt]] und [[Autor]]
Personendaten|Anthony Hope|2|GEBURTSDATUM|[[9. Februar]] [[1863]]
Personendaten|Anthony Hope|2|GEBURTSORT|[[London]]
Personendaten|Anthony Hope|2|STERBEDATUM|[[8. Juli]] [[1933]]
Personendaten|Anthony Hope|2|STERBEORT|
As you can see, the |-separated fields are:
1. Name of the template called
2. Name of the page that called the template
3. Sequence number of this call within the page
4. Name or position number of the parameter
5. Value of the parameter
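To give a feel for how such an extraction can work, here is a
much-simplified sketch (not the actual Extraktor script; get that from
the URL above). It ignores nested template calls, <nowiki> sections,
comments, and pipes inside [[links|...]], all of which a real script
has to handle:

#!/usr/bin/perl -w
# Simplified sketch, not the real Extraktor: print template call
# parameters from an XML dump on stdin, in the format described above.
use strict;
local $/ = '</page>';                      # read one <page> at a time
while (my $page = <STDIN>) {
    my ($title) = $page =~ m{<title>([^<]*)</title>} or next;
    my $seq = 0;
    while ($page =~ m{\{\{([^{}]*)\}\}}gs) {   # top-level {{...}} only
        my @parts = split /\|/, $1;
        (my $tmpl = shift @parts) =~ s/^\s+|\s+$//g;
        $seq++;
        my $pos = 0;
        for my $p (@parts) {
            $pos++;
            my ($name, $value) = $p =~ /=/
                ? split(/=/, $p, 2) : ($pos, $p);
            s/^\s+|\s+$//g for $name, $value;
            print join('|', $tmpl, $title, $seq, $name, $value), "\n";
        }
    }
}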
The output for the entire German Wikipedia dump is
bunzip2 <dewiki-20060803-pages-articles.xml.bz2 |
perl extraktor.pl >de.params
du -sm de.params
123 megabytes. With some simple awk, I get the following
statistics: There are
awk '-F|' '{print $2,$3}' de.params | sort -u | wc -l
790,985 template calls using a total of
wc -l de.params
2,076,178 parameters (on average 2.62 parameters per call) from
awk '-F|' '{print $2}' de.params | sort -u | wc -l
397,929 different pages to
awk '-F|' '{print $1}' de.params | sort -u | wc -l
13,295 different templates. The most commonly supplied parameter
names over all templates are
awk '-F|' '{print $4}' de.params | sort | uniq -c | sort -nr
NAME (113038 occurrences), ALTERNATIVNAMEN (101799),
KURZBESCHREIBUNG (101723), GEBURTSORT (101706), GEBURTSDATUM
(101704), STERBEDATUM (101663), STERBEORT (101649), ID (10061),
ZEIT (6255), VORGÄNGER (6210), NACHFOLGER (6210), AMT (6199),
EINWOHNER (5942), FLÄCHE (5868), WEBSITE (5680), STAND_EINWOHNER
(5619), Name (5307), PJ (5242), PL (5240), LEN (5224), DS (5219),
OS (5214), OT (5152), MUSIK (5137), DT (5117), TITEL (4750),
INHALT (4739), PRO (4658), REG (4639), DRB (4627), AF (4568),
KAMERA (4557), SCHNITT (4516), Bild (4038), BILD (3562), PLZ
(3344), HÖHE (3265), GEMEINDEART (3180), BREITENGRAD (3074),
LÄNGENGRAD (3061), KANTON (3013), and NAME_ORT (3005).
Yes, the bad taste of all-caps parameter names has been a disease of the
German Wikipedia since the early days of the Personendaten project.
Personendaten is also the template that is called from 100,000
different pages. Let's see which templates use the
parameter named GEMEINDEART (kind of municipality):
awk '-F|' '$4 == "GEMEINDEART" {print $1}' de.params |
sort | uniq -c | sort -nr
Ort_Schweiz (2738 calls), Ortschaft_Schweiz (196),
Infobox_Slowakische_Gemeinde-K (121), Infobox_Slowakische_Gemeinde
(111), Ort_Liechtenstein (11), Infobox_Schweizer_Gemeinden (2),
Infobox_Deutsche_Städte (1).
Let's see which kinds of municipalities there are in Slovakia:
awk '-F|' '$1 == "Infobox_Slowakische_Gemeinde" &&
$4 == "GEMEINDEART" {print $5}' de.templates |
sort | uniq -c | sort -nr
Stadt (74), Stadtteil (21), Gemeinde (16).
And in Switzerland:
awk '-F|' '$1 == "Ort_Schweiz" &&
$4 == "GEMEINDEART" {print $5}' de.templates |
sort | uniq -c | sort -nr
Gemeinde (2591), Stadt (126), Gemeinden (12).
Perhaps "Gemeinden" (a plural) is an error that should be fixed?
Let's see which twelve pages use this value for this parameter to
this template:
awk '-F|' '$1 == "Ort_Schweiz" &&
$4 == "GEMEINDEART" &&
$5 == "Gemeinden" {print $2}' de.templates
Benken ZH, Flaach, Adlikon bei Andelfingen, Andelfingen ZH,
Berg am Irchel, Buch am Irchel, Dachsen, Dorf ZH, Feuerthalen,
Humlikon, Flurlingen, Henggart.
Hmm... It turns out that GEMEINDEART is not used in this infobox
template. That's odd. I'll leave it there.
I hope you get the point. Of course you can use your favorite
SQL database instead of awk. If you want speed, be sure to create
indexes for every column.
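As a hedged sketch of that route (the database name, credentials, table,
and column names here are all made up for illustration), loading
de.params into MySQL with Perl's DBI might look like this:

#!/usr/bin/perl -w
# Sketch: load de.params into a MySQL table and index every column.
use strict;
use DBI;
my $dbh = DBI->connect('DBI:mysql:database=params', 'user', 'secret',
                       { RaiseError => 1 });
$dbh->do(q{CREATE TABLE tp (
    tmpl  VARCHAR(255),    -- template name
    page  VARCHAR(255),    -- calling page
    seq   INT,             -- sequence number of the call within the page
    param VARCHAR(255),    -- parameter name or position number
    val   TEXT             -- parameter value
)});
$dbh->do('CREATE INDEX tp_tmpl  ON tp (tmpl)');
$dbh->do('CREATE INDEX tp_page  ON tp (page)');
$dbh->do('CREATE INDEX tp_param ON tp (param)');
$dbh->do('CREATE INDEX tp_val   ON tp (val(40))');  # TEXT needs a prefix
my $ins = $dbh->prepare('INSERT INTO tp VALUES (?,?,?,?,?)');
open my $in, '<', 'de.params' or die $!;
while (<$in>) {
    chomp;
    # Every line has five fields; the limit of 5 keeps any pipes
    # inside the value together.
    $ins->execute(split /\|/, $_, 5);
}
$dbh->disconnect;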
Imagine if there were a templateparameter table like this supported by
MediaWiki; then we could do this in real time. Say I'm filling out an
infobox template in a wysiwyg editor: which parameter names should I
supply, and which values should I typically use?
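Continuing the made-up tp table from the sketch above, both of those
questions become a single GROUP BY each:

#!/usr/bin/perl -w
# Sketch, querying the hypothetical tp table from the previous example.
use strict;
use DBI;
my $dbh = DBI->connect('DBI:mysql:database=params', 'user', 'secret',
                       { RaiseError => 1 });
# Which parameter names does a given template usually take?
my $names = $dbh->prepare(q{
    SELECT param, COUNT(*) AS n FROM tp
    WHERE tmpl = ? GROUP BY param ORDER BY n DESC});
$names->execute('Ort_Schweiz');
while (my ($param, $n) = $names->fetchrow_array) { print "$param\t$n\n" }
# And which values are typical for one of its parameters?
my $vals = $dbh->prepare(q{
    SELECT val, COUNT(*) AS n FROM tp
    WHERE tmpl = ? AND param = ? GROUP BY val ORDER BY n DESC LIMIT 10});
$vals->execute('Ort_Schweiz', 'GEMEINDEART');
while (my ($val, $n) = $vals->fetchrow_array) { print "$val\t$n\n" }
$dbh->disconnect;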
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
On 30/08/06, brion(a)svn.leuksman.com <brion(a)svn.leuksman.com> wrote:
> Revision: 16282
> Author: brion
> Date: 2006-08-30 03:56:17 -0700 (Wed, 30 Aug 2006)
>
> Log Message:
> -----------
> * Fix bug in wfRunHooks which caused corruption of objects in the hook list
> References EVIL! Not needed anymore in PHP 5 anyway.
Oh, thank fuck. I thought for a moment that a recent commit of mine to
Parser.php had screwed up the software, but couldn't find the
problematic change - all I could determine was that, for me, it seemed
to trigger when Cite was active.
It's wonderful what PHP will chuck out as HTML output when it
encounters "interesting" bugs like this. :)
Rob Church
Can you please take my son off this mailing list? For some reason he is receiving numerous e-mails, probably due to his signing up mistakenly. So sorry for the trouble. Thank you. Jill Aldridge
An automated run of parserTests.php showed the following failures:
Running test TODO: Table security: embedded pipes (http://mail.wikipedia.org/pipermail/wikitech-l/2006-April/034637.html)... FAILED!
Running test TODO: Link containing double-single-quotes '' (bug 4598)... FAILED!
Running test TODO: Template with thumb image (with link in description)... FAILED!
Running test Template infinite loop... FAILED!
Running test TODO: message transform: <noinclude> in transcluded template (bug 4926)... FAILED!
Running test TODO: message transform: <onlyinclude> in transcluded template (bug 4926)... FAILED!
Running test BUG 1887, part 2: A <math> with a thumbnail- math enabled... FAILED!
Running test TODO: HTML bullet list, unclosed tags (bug 5497)... FAILED!
Running test TODO: HTML ordered list, unclosed tags (bug 5497)... FAILED!
Running test TODO: HTML nested bullet list, open tags (bug 5497)... FAILED!
Running test TODO: HTML nested ordered list, open tags (bug 5497)... FAILED!
Running test TODO: Parsing optional HTML elements (Bug 6171)... FAILED!
Running test TODO: Inline HTML vs wiki block nesting... FAILED!
Running test TODO: Mixing markup for italics and bold... FAILED!
Running test TODO: 5 quotes, code coverage +1 line... FAILED!
Running test TODO: HTML Hex character encoding.... FAILED!
Running test TODO: dt/dd/dl test... FAILED!
Passed 412 of 429 tests (96.04%) FAILED!
Would it be possible to follow the lead of many websites and replace
the two-step process ("The page ... has been removed from your
watchlist." / click "Return to ...") with a one-step process, whereby
the page is shown again, but with a banner at the top that reads "This
page has been removed from your watchlist"?
I imagine the mechanism would be something like what happens when you
click a redirect (as we were discussing recently): The target of the
redirect is shown, but with some added text explaining where you came
from.
This would apply to at least these actions:
* Watch
* Unwatch
* Move
Something similar could be done when saving an edit, adding text like
"Your edit has been saved."
Any takers? Should I bugzilla this as a formal feature request?
Steve
On 25/08/06, yurik(a)svn.leuksman.com <yurik(a)svn.leuksman.com> wrote:
> Revision: 16217
> Author: yurik
> Date: 2006-08-24 21:32:33 -0700 (Thu, 24 Aug 2006)
>
> Log Message:
> -----------
> * Now [[MediaWiki:Disambiguationspage]] may have either disambig template name, or a list of links to disambig templates.
For what purpose?
> + if( $set === false ) {
> + $set = 'FALSE'; # We must always return a valid SQL query, but this way the DB will always quickly return an empty result
Don't issue the SELECT at all in that case; it's a waste of a database
connection.
You didn't update the release notes. This is a change which will
affect the user's use of that message, plus the behaviour of the page;
the release notes MUST be updated.
Rob Church
Hello,
I read the thread "how bad is a category with ....", and I was wondering
how categories get filled. If I understand correctly, categories are filled
in by the editors of an article. This assumes that those editors know the
whole set of categories and that the categories will not change over time.
I was wondering whether there are projects to help *detect* categories and
to help editors by *suggesting* categories.
I am thinking about two different technologies to deal with these two
problems:
1) Text clustering to help find categories, but probably not using
classical approaches where the word space is used to describe a document
(applying part-of-speech tagging
<http://en.wikipedia.org/wiki/Part-of-speech_tagging>, stemming
<http://en.wikipedia.org/wiki/Stemmer>, ...). I am thinking about
clustering the link graph (similar to the clique problem
<http://en.wikipedia.org/wiki/Clique_problem> but with different
constraints), i.e. each document would be described not by its words (or
lemmas, LSA vector, ...) but by its links to other articles, using an
algorithm that does not need the number of clusters in advance but
does need a distance or similarity threshold. With this kind of
processing you get a set of clusters that are linked together,
but a cluster will probably not be a complete graph (this is the
difference from the clique problem). Once you have the clusters, you
need to try to label them with a category:
- give the user the role of identifying the category name
- use the word space to find the best words that describe this set
of articles
- ...
Then you could run this algorithm on a category to try to split it into
subcategories.
2) Machine learning or link-graph exploration to suggest categories
while an article is being edited.
The first idea is to try to learn existing categories with a machine
learning algorithm (using the word space) in order to guess the categories
of a new article (but this algorithm would have to deal with new categories
and with the fact that the number of documents without a category is
greater than the number of documents with one).
The second idea is much simpler and easier to implement: when you
edit an article, suggest the categories of the articles it links to (this
could be replaced by another graph-exploration algorithm); a sketch of
this follows below.
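As a hedged sketch of that simpler idea, here is a direct query against
the MediaWiki pagelinks, page, and categorylinks tables (schema as of
MediaWiki 1.5; the connection details and the Perl wrapper are
placeholders for illustration):

#!/usr/bin/perl -w
# Sketch: suggest categories for a page by counting the categories
# of the pages it links to.
use strict;
use DBI;
my $dbh = DBI->connect('DBI:mysql:database=wikidb', 'wikiuser', 'secret',
                       { RaiseError => 1 });
my $sth = $dbh->prepare(q{
    SELECT cl.cl_to, COUNT(*) AS n
    FROM pagelinks pl
    JOIN page p ON p.page_namespace = pl.pl_namespace
               AND p.page_title     = pl.pl_title
    JOIN categorylinks cl ON cl.cl_from = p.page_id
    WHERE pl.pl_from = ?    -- page_id of the article being edited
    GROUP BY cl.cl_to
    ORDER BY n DESC
    LIMIT 10});
$sth->execute($ARGV[0]);    # pass the page_id on the command line
while (my ($cat, $n) = $sth->fetchrow_array) {
    print "$cat\t$n\n";     # candidate category and how many links share it
}
$dbh->disconnect;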
Are there any functions like these in MediaWiki, and do you think
this kind of algorithm could help?
Finally, do you know of people working on these functionalities (maybe
people working on the semantic web)?
Best Regards.
Julien Lemoine
I know I've done this once before, but this one's worse:
The name Pluto was first suggested by [[Venetia Burney|Venetia Phair
(née Burney)]], at the time an eleven-year-old girl from [[Oxford,
England|Oxford]], [[England]].<ref>{{cite web
|url=http://news.bbc.co.uk/1/hi/sci/tech/4596246.stm
|title=The girl who named a planet
|first= Paul
|last= Rincon
|publisher=BBC News
|accessdate=2006-03-05}}</ref> Venetia, who was interested in
[[Classical mythology]] as well as astronomy, suggested the name, the
Roman equivalent of [[Hades]], in a conversation to her grandfather
[[Falconer Madan]], a former [[librarian]] of [[Oxford University]]'s
[[Bodleian Library]].<ref>{{cite web
|url=http://www.amblesideonline.org/PR/PR62p030PlanetPluto.shtml
|title=The Planet 'Pluto'
|first= K.M
|last= Claxton
|publisher=Parents' Union School Diamond Jubilee Magazine, 1891-1951
(Ambleside: PUS, 1951), p. 30-32
|accessdate=2006-08-24}}</ref> Madan passed the suggestion to
Professor [[Herbert Hall Turner]], Turner then cabled the suggestion
to colleagues in America. After favourable consideration which was
almost unanimous{{fact}}, the name Pluto was officially adopted and an
announcement made by Slipher on [[1930-05-01]].
---
Can you believe that in that chunk of text, there are actually three
separate pieces of text, with two references between them? It's
totally unmanageable - attempting to actually edit the text that's
buried in there as a cohesive whole is next to impossible. Solutions
desperately wanted.
Steve