Hi!
I have published a draft of how changes on the wikidata repository are going to
percolate to the client wikis:
https://meta.wikimedia.org/wiki/Wikidata/Notes/Percolation
Any feedback would be appreciated!
Of course, we are not starting this from scratch. We are currently implementing
a stripped-down, naive version of the draft. Basically, it works like this:
* Each change on the repository is recorded in the changes table.
* On each client wiki, a poll script periodically checks the changes table.
* The polling script maintains a local copy of the latest version of each data
entity on each cluster used by client wikis.
* If any page on the wiki is affected by the change, an entry representing that
change is injected into the client's recentchanges table.
* Wiki pages that are affected by a change are invalidated.
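The steps above can be sketched in Python; note this is an illustration of the flow only, and the database accessors (fetch_changes, pages_using, etc.) are hypothetical names, not the actual MediaWiki API:

```python
def poll_changes(repo_db, client_db, state):
    """One iteration of a client wiki's poll script (sketch).

    `repo_db` and `client_db` are stand-ins for the repository and
    client database layers; all method names are illustrative.
    """
    # 1. Fetch changes recorded in the repository's changes table
    #    since the last change we processed.
    for change in repo_db.fetch_changes(since_id=state["last_change_id"]):
        # 2. Keep the local copy of the entity's latest version current.
        client_db.store_entity(change["entity_id"], change["entity_data"])
        # 3. Find local pages affected by this entity.
        for page in client_db.pages_using(change["entity_id"]):
            # 4. Inject an entry into the client's recentchanges table.
            client_db.insert_recentchange(page, change)
            # 5. Invalidate the page so it gets re-rendered.
            client_db.invalidate_page(page)
        state["last_change_id"] = change["id"]
```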
I think this will work for now, that is, for a small number of client wikis. The
new draft is an attempt to make this architecture scale up to several hundred
client wikis on multiple database clusters.
-- daniel
WereSpielChequers, 15/10/2012 09:56:
> 60 edits a minute sounds high, and probably faster than most of these
> sessions run at, but not if it is as I suspect, calculated every few
> seconds.
It's not, as far as I can see. This is how it works:
<https://www.mediawiki.org/wiki/Manual:$wgRateLimits> (someone please
expand that page if it's inaccurate).
And these are all the existing limits:
<https://gerrit.wikimedia.org/r/gitweb?p=operations/mediawiki-config.git;a=b…>
Does Andrew's experience not fit with this?
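For illustration, here is a minimal Python sketch of how a [max, seconds] limit like those in $wgRateLimits can behave when implemented as a counter with an expiry (a fixed window). This is my reading of the behaviour, not MediaWiki's actual code:

```python
import time

class FixedWindowLimiter:
    """Sketch of a [max, seconds] rate limit: a counter that resets
    `seconds` after the first hit in the window. Illustrative only."""

    def __init__(self, max_actions, seconds, clock=time.monotonic):
        self.max = max_actions
        self.seconds = seconds
        self.clock = clock        # injectable for testing
        self.count = 0
        self.window_start = None

    def hit(self):
        """Record one action; return True if it is allowed."""
        now = self.clock()
        if self.window_start is None or now - self.window_start >= self.seconds:
            # Start a new window.
            self.window_start = now
            self.count = 0
        self.count += 1
        return self.count <= self.max
```

With an 8-per-60-seconds limit, ten simultaneous saves from one IP trip the throttle on the ninth hit, even though the average rate over the minute is far below the limit, which would match the workshop scenario.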
> So if the tutor says "all save now" and ten people hit enter
> simultaneously the attempted editing rate is briefly rather more than 1 per
> second - hence the throttle kicks in and the tutorial collapses in chaos
> with several students getting throttling errors at the same time. It would
> be nice to think that the WiFi we used was going through the same IP as the
> rest of the British library and that we merely lifted the normal editing
> rate above 60 edits a minute, but I suspect that the rate is calculated
> rather more frequently than every minute.
>
> Presumably established users of some sort are whitelisted through this? If
> so it could explain a longstanding Cat a Lot problem. I frequently use Cat
> a lot to categorise images on Commons and my personal editing rate there
> has gone far above 60 edits a minute, however I'm pretty sure I'd be on any
> commons whitelist. But other editors have complained that Cat a Lot doesn't
> work for them and mysteriously hangs or fails, Is it possible that this
> throttling feature could be the cause of that problem as well?
noratelimit circumvents all such limits, but on Commons only the
standard groups plus account creators have it, and you're just
autopatrolled.
The only group that has had serious throttling problems in the past was
rollbackers on en.wiki; it shouldn't be too hard for Commons to add
noratelimit to some group, if that's a problem.
> If so perhaps it would be a good idea to analyse some of the recent
> incidents where this feature has kicked in, see how often it disrupts
> goodfaith editing and how often it disrupts badfaith editing that wouldn't
> have triggered the edit filter. Maybe this was once a net benefit, but with
> the edit filter dealing with most badfaith editing, and increasing amounts
> of editing workshops and tools like Catalot, perhaps this feature has
> transitioned from net positive to net negative? Alternatively could we have
> a process where we can whitelist the IP Addresses of places where we are
> running training sessions, and put note on
> http://commons.wikimedia.org/wiki/MediaWiki_talk:Gadget-Cat-a-lot.js explain…
> how to spot if your editing has been throttled and how to get
> yourself Whitelisted
Rate limits have never been a problem with some minimal preparation:
<https://www.mediawiki.org/wiki/Help:Mass_account_creation> (in 6-7
years of WMIT workshops, I've never heard of big problems with this).
Nemo
Hi folks,
I implemented a new feature for wm-bot which is still being tested. It
allows the bot to parse RSS feeds and report them into a channel with a
custom format.
There is going to be a special RSS parser optimized for Wikimedia's
Bugzilla, so that you will be able to insert special "RSS items" such as
the user, ticket status, ticket dependencies, etc. as variables in a
message template.
That means anyone should be able to create a custom IRC feed for
Bugzilla and use it in any Wikimedia-related IRC channel (for example,
right now we have a Bugzilla feed in #wikimedia-labs that reports only
labs-related bugs). You can generate an RSS feed in Bugzilla just by
creating a new search and then clicking the "Feed" link at the bottom of
each search results page.
(Step-by-step example of creating a Bugzilla RSS feed for the channel #wikimedia-labs)
1. Get the bot into the channel (instructions at http://meta.wikimedia.org/wiki/Wm-bot)
2. Enable RSS feeds by typing
@rss-on
3. Create a custom search in Bugzilla for all bugs which were changed
in the last 4 hours and convert it to an RSS feed:
https://bugzilla.wikimedia.org/buglist.cgi?chfieldfrom=-4h&chfieldto=Now&li…
4. Add this RSS feed to the bot by typing
@rss+ bugzilla https://bugzilla.wikimedia.org/buglist.cgi?chfieldfrom=-4h&chfieldto=Now&li…
5. Change the default template for RSS items to something better:
@configure style-rss=[$name] ticket name: $title (ticket created by
$author) url: $link
Please note that there is a variable $description which contains HTML
code generated by Bugzilla; it is useless in IRC and needs to be
removed by overriding the default template.
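For clarity, here is a hypothetical Python re-implementation of the substitution the template performs (wm-bot's actual code may differ), plus the kind of tag stripping that would make $description usable in IRC:

```python
import re

def render_rss_item(template, item):
    """Substitute $variables in a message template with feed-item fields.
    Hypothetical re-implementation for illustration, not wm-bot's code."""
    text = template
    for name, value in item.items():
        text = text.replace("$" + name, value)
    return text

def strip_html(fragment):
    """Remove HTML tags, e.g. from the Bugzilla-generated $description."""
    return re.sub(r"<[^>]+>", "", fragment)
```

For example, rendering the template from step 5 with a fake item yields one clean IRC line per ticket.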
The special Bugzilla items ($bugzilla_id, $bugzilla_createddate,
$bugzilla_changeddate, $bugzilla_lastuser, $bugzilla_status) are not
available yet, but should be soon. There is also still a lot to
improve, but I hope that one day we will be able to put Bugzilla IRC
bots into every dev channel that needs one and let them report only
the bugs we are interested in.
--Petr
Hi everyone, here's some hot SMWCon news!
"Ask the speaker" is an experimental feature of our conference that we
want you to try, to make it more exciting and to adjust the contents of
the talks to your specific needs.
Each talk has an infobox with the link "Ask the speaker" that allows you
to ask the presenter the questions you're interested in. Do you want the
speaker to talk more about the technical details of the topic? Or maybe
you're interested in the business model? Or maybe you need more
scientific details? Don't hesitate to communicate with the speakers even
before the conference begins.
All the talks are listed in the Agenda:
http://semantic-mediawiki.org/wiki/SMWCon_Fall_2012/Agenda
See you in Cologne!
Yury Katkov
P.S. Yes, yes, nowadays 'Ask the speaker' just links to the discussion page
of the talk. :)
On Wed, Oct 10, 2012 at 8:44 PM, Markus Krötzsch <
markus(a)semantic-mediawiki.org> wrote:
> Dear all,
>
> the program for the upcoming SMWCon in Cologne is becoming more and more
> stable [1]. Most talks should be at their almost final location now.
> There are quite a few highlights that are worth mentioning:
>
> * We have two exciting keynote talks by Denny Vrandecic (Wikimedia
> Germany e.V.) and Peter Haase (fluidOps):
>
> Denny will introduce Wikidata, the next big thing for Wikipedia, and the
> underlying software Wikibase. The co-operation of SMW and Wikidata will
> be an important topic of this SMWCon.
>
> Peter will introduce the Information Workbench, a semantic knowledge
> management solution by fluidOps. For the first time, SMWCon will include
> a number of talks on related systems that are not SMW. Other highlights
> in this category include OntoWiki, BlueSpice, SlideWiki, and the
> Drupal-based Planetary System. I am sure that it will be insightful and
> inspiring to exchange experiences with these projects.
>
> * We'll have a number of practical experience talks. I am particularly
> looking forward to the insights of Wikia Inc., presented by Krzysztof
> Krzyżaniak (eloy).
>
> * Joel Natividad will join us live from NY to report about his
> award-winning sites and new smart city projects.
>
> * And of course there will be plenty of updates on SMW and its old and
> new extensions, including a number of presentations about using SMW in
> completely new ways.
>
> The tutorial day will leave more space for discussions and practical
> work, especially to discuss problems and ideas with the developers (as
> usual, we will have a very high concentration of those). As a special
> "non-semantic" tutorial, Yury Katkov will share his first-hand
> experience in fighting spam on semanticweb.org and
> semantic-mediawiki.org, a topic that concerns many public SMW sites.
>
> Cheers,
>
> Markus
>
> [1] http://semantic-mediawiki.org/wiki/SMWCon_Fall_2012/Agenda
>
We should probably update the documentation for $wgSecretKey, but I'm
not sure of the best way to write it.
https://www.mediawiki.org/wiki/Manual:%24wgSecretKey
Right now $wgSecretKey is 99% worthless. We aren't using it directly
anymore. We now generate all tokens from proper cryptographic random
sources, so we no longer base security on keeping a string secret.
((That 1% comes from the fact that if you have no access to urandom, and
on old PHP no mcrypt random and no OpenSSL random, we do use
$wgSecretKey as a very small source of entropy; but it's of barely any
value, since most of our entropy comes from clock drift in that case.))
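A rough Python analogue of that preference order (illustrative only; SECRET_KEY stands in for $wgSecretKey, and the fallback mix is a sketch of the idea, not MediaWiki's implementation):

```python
import hashlib
import os
import time

SECRET_KEY = "change-me"  # stand-in for $wgSecretKey

def generate_token(nbytes=16):
    """Prefer a real cryptographic source; only fall back to mixing
    weak inputs when no such source exists."""
    try:
        # The normal case: a proper cryptographic random source.
        return os.urandom(nbytes).hex()
    except NotImplementedError:
        # Degraded case: the secret contributes a little entropy, but
        # most of it comes from volatile values like the clock.
        seed = "%s|%r|%s" % (SECRET_KEY, time.time(), os.getpid())
        return hashlib.sha256(seed.encode()).hexdigest()[:nbytes * 2]
```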
At the same time, it's worth noting the warning about user_token. It
does not apply to any new user_token; but old user_tokens, for users who
have not updated their passwords (which would reset user_token) on wikis
that have not done a full reset, will still be somewhat vulnerable to
$wgSecretKey leaks.
--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
Summary: I'm looking for people to discuss the parsing of math.
Dear all,
I have come up with a proposal for a new version of the rendering of the
<math> tag. I propose to use LaTeXML to convert the LaTeX expressions
in the math tag to MathML. If the browser is not capable of displaying
MathML, MathJax is used to display the MathML output in the browser.
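The proposed fallback chain could be summarized in Python pseudologic (names are illustrative, and the LaTeXML call is stubbed out; the real conversion runs server-side):

```python
def latexml_convert(latex):
    """Stand-in for the server-side LaTeXML call (LaTeX -> MathML)."""
    return "<math><annotation>%s</annotation></math>" % latex

def render_math(latex, browser_supports_mathml):
    """Sketch of the proposed pipeline: MathML is always produced;
    the client renders it natively or hands it to MathJax."""
    mathml = latexml_convert(latex)
    if browser_supports_mathml:
        return ("native", mathml)
    return ("mathjax", mathml)  # MathJax displays the MathML client-side
```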
My implementation (the LaTeXML branch) has only a few very small
differences from the master branch. I have the feeling that the PHP code
of the math extension could be improved; for example, I'd suggest moving
all the texvc-related code into a separate class.
Furthermore, I was thinking about asynchronous rendering of the
formulas, which would speed up page loading time, especially for major
edits.
Attached you'll find the draft of a paper in which I describe in detail
what I changed and why it is an improvement. The paper will appear soon
in the post-conference proceedings of CICM 2012.
Now I want to figure out who is working on the development of the math
extension, and who wants to discuss these ideas with me.
I'm open to any kind of suggestions and questions.
Best regards
Moritz
--
Kind regards
Moritz Schubotz
Phone (office): +49 30 314 22784
Phone (private): +49 30 488 27330
E-Mail: schubotz(a)itp.physik.tu-berlin.de
Web: http://www.physikerwelt.de
Skype: Schubi87
ICQ: 200302764
Msn: Moritz(a)Schubotz.de
Hi,
I'm new on this list, but I found that the last thread about
ExternalAuth [1] dates back to 2010 [2]; still, I thought it was
acceptable to bring up the subject again :)
Stated simply: many AuthPlugin modules stick to using "external
sessions" for SSO purposes and only implement the "UserLoadFromSession"
hook. They don't bother implementing a "true" authentication plugin.
In such cases [3] this is often incompatible with the use of the MW XML API.
ExternalAuth provides a clean API for this which even appears to be
used by the MW code-base itself:
in SpecialUserlogin.php:
> function authenticateUserData() {
> [...]
> $this->mExtUser = ExternalUser::newFromName($this->mUsername);
> [...]
> $this->mExtUser->authenticate($this->mPassword);
The issue here is that a regular AuthPlugin (a class implementing
AuthPlugin) is still needed, at the very least because soon afterwards
there is an unconditional call to:
> $u->checkPassword()
[ and User::checkPassword() only uses $wgAuth ]
Questions:
1) If ExternalAuth->authenticate() succeeded, why do we need
User::checkPassword()? It seems like an unneeded duplicate check.
2) User::checkPassword() makes no allowance for ExternalAuth: it
always uses $wgAuth and only $wgAuth.
=> 2.1) Does it mean that an AuthPlugin *must* be associated with each
ExternalAuth extension?
=> 2.2) Or does it mean that User::checkPassword() should be fixed to
call authenticate() on the proper class (either AuthPlugin or
ExternalAuth)?
If the answer to 2.1 is "yes", then another question arises:
2.1.1) How can one access and make use of the ExternalAuth object
($mExtUser in LoginForm) from $wgAuth->authenticate(), so that it's not
necessary to duplicate code between both classes?
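A Python sketch of option 2.2, with the PHP classes modeled as duck-typed objects (names are illustrative, not MediaWiki's API):

```python
def check_password(user, password, ext_user=None, wg_auth=None):
    """Dispatch authentication to the matching backend (sketch of 2.2)."""
    if ext_user is not None:
        # An ExternalUser matched this name: let the external provider
        # decide, instead of unconditionally re-checking via $wgAuth.
        return ext_user.authenticate(password)
    if wg_auth is not None:
        # No external match: fall back to the regular AuthPlugin.
        return wg_auth.authenticate(user, password)
    return False  # no authentication backend configured
```

The point of the sketch is only that the duplicate check disappears: whichever backend matched the user is the single authority.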
I attached a pseudo-patch to work around what is problematic for me.
Thank you in advance for your answers.
footnotes:
[1] http://www.mediawiki.org/wiki/ExternalAuth
[2] http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/48044
http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/47710
[3] I personally keep in mind the case of AuthDrupal:
http://www.mediawiki.org/wiki/AuthDrupal
https://drupal.org/project/mediawikiauth
https://gitorious.org/drzraf/drupal-mediawiki/commits/custom
Hi all,
here is our weekly report on changesets to core relevant for Wikidata
development.
* Great news! The Wikidata branch got merged, possibly the biggest
single changeset to MediaWiki. Thanks to everyone for their input, I
am afraid to try to list them all because I would fail. Special thanks
to Tim for accompanying the development of the branch for the last
seven months (!), and congratulations to Duesentrieb for creating it.
We had cake. <http://instagram.com/p/QmtQaqhE1N> Thanks to Siebrand
for pushing the button.
The ContentHandler branch created a number of follow up items, many of
which are already merged. Here are a few open ones:
* Fix declaration of content_model and content_format fields:
<https://gerrit.wikimedia.org/r/#/c/27394/>
* Support plain text content:
<https://gerrit.wikimedia.org/r/#/c/27399/> (two +1s already)
* Silence warnings about deprecation by ContentHandler (since there
are too many right now, has already been discussed on this list)
<https://gerrit.wikimedia.org/r/#/c/27537/>
Besides the ContentHandler, the changeset adding ORMTable access to
foreign wikis also got merged. Yay to Chad!
<https://gerrit.wikimedia.org/r/#/c/25264/>
Also, the Sites management got merged, after a review by Asher for its
DB impact and by Chad for the code. You are awesome! Thanks to Jeroen
for working on this: <https://gerrit.wikimedia.org/r/#/c/23528/>
We have one more changeset open, and I'd love to see that one closed too:
* Sorting in jQuery tables:
<https://gerrit.wikimedia.org/r/#/c/22562/> That would be awesome to
merge. It got several +1s over its lifetime, has been around for more
than a month, and has been improved continually. If you have comments,
please write them into the code; Henning will take care of them.
Thank you so much for your awesome help everyone!
Cheers,
Denny
--
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de
Wikimedia Deutschland - Society for the Promotion of Free Knowledge e.V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as a charitable
organization by the Finanzamt für Körperschaften I Berlin, tax number
27/681/51985.