I somehow clicked reply instead of reply to all; my response is below...
On Mon, Jul 29, 2013 at 1:14 PM, Petr Bena <benapetr(a)gmail.com> wrote:
> On Mon, Jul 29, 2013 at 1:04 PM, Antoine Musso <hashar+wmf(a)free.fr> wrote:
>> Le 28/07/13 18:35, Petr Bena a écrit :
>>> I think you kind of misunderstood my proposal, hashar :) I know that
>>> the IRC feed is where the dispatcher is going to take data from. The
>>> difference is that the dispatcher is a special service for bot operators
>>> that allows them to subscribe to selected pages / authors (even using
>>> regular expressions); it would filter these for them from the RC feed
>>> (currently the IRC version) and fill them into a Redis queue they
>>> specify, in a format they prefer.
>>
>> Petan, MZMcBride, Ori and I had an IRC discussion on that topic this
>> morning. Here is a quick summary.
>>
>>
>>
>> What I dislike in your proposal is that you are still relying on the IRC
>> feed service, which is not the best way to publish metadata. It is really
>> meant to be consumed by IRC clients for human-friendly display.
>>
>
> As I said on IRC, the source code is very flexible, and indeed I am
> now relying on the /only/ feed service we have at the moment, which
> is the IRC feed. Whether we like it or not, it's the only service we
> have, and I MUST use it because there is nothing else. Once there is
> anything better, I can use that instead of IRC.
>
>> For context, the related code is in RecentChange::getIRCLine(), and as
>> an example here is the title formatting:
>>
>> "\00314[[\00307$title\00314]]";
>>
>> Not easily parseable. Moreover, the code has plenty of exceptions and
>> crafts a URL for the end user to click.
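Just as an illustration, the mIRC colour/control codes in that format can be stripped with a few lines of Python. This is a minimal sketch; the regex is my assumption covering only the common codes, not every quirk of the feed:

```python
import re

# \x03 introduces an mIRC colour code (optional fg and ,bg numbers);
# \x02/\x0f/\x16/\x1f are bold/reset/reverse/underline control characters.
IRC_FORMAT = re.compile(r'\x03\d{0,2}(?:,\d{1,2})?|[\x02\x0f\x16\x1f]')

def strip_irc_formatting(line: str) -> str:
    """Remove mIRC formatting, leaving the plain [[Title]] text."""
    return IRC_FORMAT.sub('', line)

# The feed wraps titles as "\x0314[[\x0307$title\x0314]]"
print(strip_irc_formatting('\x0314[[\x0307Main Page\x0314]]'))  # [[Main Page]]
```

That only handles the formatting, of course, not the many exceptions mentioned above.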
>>
>>
>> As I understood it, your bot would parse the horrible IRC syntax, craft
>> some JSON and write it in Redis for bots to consume. Thus bot authors
>> will no longer have to care about the IRC format. That is an improvement,
>> but we can do better.
>>
>
> That is sort of true. The dispatcher will convert the current IRC
> message into a serializable class item. That can be serialized to
> whatever format the bot developer (the target consumer) prefers. At
> the moment plain text (pipe-separated values), XML and JSON are
> available.
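For illustration, such a serializable event could look roughly like this in Python; the field names here are my guess, not the actual dispatcherd schema:

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical shape of a dispatcher event; field names are illustrative
# and not the actual dispatcherd schema.
@dataclass
class RCEvent:
    wiki: str
    title: str
    user: str
    comment: str

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    def to_pipe(self) -> str:
        # the "plain text" output: pipe-separated values
        return '|'.join([self.wiki, self.title, self.user, self.comment])

e = RCEvent('enwiki', 'Main Page', 'Example', 'typo fix')
print(e.to_pipe())  # enwiki|Main Page|Example|typo fix
```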
>
>> Instead, we could have MediaWiki send JSON directly. Victor Vasiliev
>> proposed a change to provide a JSON feed:
>>
>> https://gerrit.wikimedia.org/r/#/c/52922
>>
>> We could have that feed sent to the EventLogging ZeroMQ queue, and write
>> subscribers to it that would put the RC events in Redis.
>>
>
> That's indeed interesting. For the dispatcher this means only that the
> current parser of edits would be replaced with a JSON parser (instead of
> the IRC parser). However, the subscribers you talk about are exactly what
> dispatcherd is doing now (its existence kind of removes the need for
> bot developers to create their own, which may be a lot of work).
> People can subscribe to the RC feed using a simple 2-line (in future
> hopefully 1-line) command in a terminal, which automagically creates a
> Redis queue filled with edits; see
> https://wikitech.wikimedia.org/wiki/Bot_Dispatcher#Example_usage
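On the consumer side, a bot then only needs to pop entries from its queue and decode them. A minimal sketch, with the payload shape and field names assumed for illustration:

```python
import json

# Minimal consumer for dispatcher output, assuming one JSON-encoded edit
# per queue entry (the 'title' field name is hypothetical).
def handle_payload(payload: bytes) -> str:
    edit = json.loads(payload)
    return edit['title']

# With a real queue this sits in a loop around a Redis BLPOP, e.g.:
#   _, payload = r.blpop('dispatcher:mybot')
print(handle_payload(b'{"title": "Main Page", "user": "Example"}'))  # Main Page
```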
>
>> To achieve that:
>>
>> - we need Victor's patch to be polished up and deployed
>> - find out what needs to be written to Redis (one queue per bot? A
>> shared queue?)
>> - write a ZMQ subscriber to publish in Redis
>>
>> Eventually, provide a library for bot authors to easily query their
>> Redis queue.
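The subscriber step is small. Here is a rough sketch of the ZMQ-to-Redis bridge logic, with endpoints and queue names hypothetical and the Redis client faked so the forwarding logic stands alone:

```python
import json

# Sketch of the ZMQ-to-Redis bridge: a subscriber receives RC events as
# JSON and pushes them onto Redis list queues. Endpoint and queue names
# here are hypothetical.

def forward(event_json: str, queue, queue_name: str) -> None:
    """Validate an incoming event and push it onto a Redis-like list."""
    event = json.loads(event_json)          # raises ValueError on bad input
    queue.rpush(queue_name, json.dumps(event))

# In production the loop would be roughly:
#   sock = zmq.Context().socket(zmq.SUB)
#   sock.connect('tcp://eventlogging.example:8600')   # hypothetical endpoint
#   sock.setsockopt_string(zmq.SUBSCRIBE, '')
#   while True:
#       forward(sock.recv_string(), redis.Redis(), 'rc:enwiki')

class FakeQueue:
    """Stand-in for redis.Redis so the sketch runs without a server."""
    def __init__(self):
        self.items = []
    def rpush(self, name, value):
        self.items.append((name, value))

q = FakeQueue()
forward('{"type": "edit", "title": "Main Page"}', q, 'rc:enwiki')
print(q.items[0][0])  # rc:enwiki
```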
>>
>>
>> In the end you have:
>> - a very robust feed system on par with the other event
>> feeds we are already maintaining
>> - got rid of IRC formatting
>> - nice JSON out of the box :-]
>>
>>
>> --
>> Antoine "hashar" Musso
Hi,
I have an idea for a new service we could implement on the Tools project
that would greatly save system resources. I would like to get some
feedback.
Imagine a daemon similar to inetd.
It would watch the recent changes of ALL wikis we have @wm, and users
could subscribe (using a web browser or some terminal interface) to this
service, so that on certain events (page X was modified), this bot
dispatcher would do something (submit their bot on the grid / send some
signal / a TCP packet somewhere / insert data into Redis, etc.).
This way bot designers could very easily hook their bots to certain
events without having to write their own "wiki-watchers". This would
be extremely useful not just for bots that should be triggered on an
event (someone edits some page) but also for bots that run
periodically.
For example: the archiving bot currently runs in a way that it checks ALL
pages carrying the archiving template, no matter whether these talk
pages have been dead for years. Using such a dispatcher, every time
a talk page is modified some script or process could be launched
that adds it to a queue (Redis-like; the dispatcher could even handle
this as an event itself, so that no process would need to be launched),
and the archiving bot would only check the pages that are active,
instead of thousands of dead pages.
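The event-driven archiving idea can be sketched in a few lines; the event shape and the namespace check are assumptions:

```python
# Sketch of the event-driven archiving idea: instead of scanning every page
# carrying the archive template, remember only pages that actually changed.
# The event shape and the Talk: namespace check are assumptions.
active_talk_pages = set()

def on_edit(event: dict) -> None:
    title = event['title']
    if title.startswith('Talk:'):
        active_talk_pages.add(title)

on_edit({'title': 'Talk:Main Page'})
on_edit({'title': 'Main Page'})          # ignored: not a talk page
print(sorted(active_talk_pages))  # ['Talk:Main Page']
```

The archiving bot would then drain `active_talk_pages` (or a Redis queue playing the same role) on each run instead of scanning everything.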
This way we could schedule bots very efficiently and save a ton of
system resources (CPU / memory / IO / network / even production
server load). It would also make it far easier for bot operators to
create new tasks / bots, as they would not need to program
"wiki-watchers" themselves.
What do you think about it?
Hi,
sorry for another long email today.
Currently, when you change a Wikidata item, its associated Wikipedia
articles get told to update, too. So your change to the IMDB ID of a movie
in Wikidata will be pushed to all language versions of that article on
Wikipedia. Yay!
There are two use cases that currently are not possible:
* a Wikipedia article on a city might display the mayor. Now someone
changes the label of the mayor on Wikidata - the Wikipedia article will get
updated the next time the page is rendered, but there is no active update
of the page.
* a Wikipedia article might want to include data about an item other than
the associated item - most importantly for references, where I might be
interested in the author of a book, its year of publication, etc. This
feature is currently disabled (even though it would be trivial to switch it
on) because this information would only get updated when the page is
actively rerendered.
In order to enable these use cases we need to track on which pages (on
Wikipedia) an item (from Wikidata) is used. We are thinking of doing this
in two tables:
* EntityUsage: one table per client. It has two columns, one with the
pageId and one with the entityId, indexed on both columns (and one column
with a pk, I guess, for OSC).
* Subscriptions: one table on the repo. It has two columns, one with the
entityId and one with the siteId, indexed on both columns (and one column
with a pk, I guess, for OSC).
EntityUsage is a potentially big table (something like pagelinks-size).
On a change on Wikidata, Wikidata consults the Subscriptions table and,
based on that, dispatches the change to all clients listed there for it.
The client then receives the changes and, based on the EntityUsage table,
performs the necessary updates.
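To make the flow concrete, here is a toy model of the two tables and the dispatch step in Python; the shapes are inferred from the description above and are not actual Wikibase code:

```python
# Toy model of the two tables and the dispatch step; shapes are inferred
# from the description, not actual Wikibase code.

# Subscriptions: which client wikis care about which entity
subscriptions = {'Q42': {'enwiki', 'dewiki'}}   # entityId -> siteIds

# EntityUsage (one per client): which pages on that client use the entity
entity_usage = {
    'enwiki': {'Q42': {1234}},                  # entityId -> pageIds
    'dewiki': {'Q42': {99}},
}

def dispatch(entity_id):
    """Yield (siteId, pageId) pairs that need updating after a change."""
    for site in subscriptions.get(entity_id, ()):
        for page in entity_usage.get(site, {}).get(entity_id, ()):
            yield site, page

print(sorted(dispatch('Q42')))  # [('dewiki', 99), ('enwiki', 1234)]
```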
We wanted to ask for input on this approach, and whether you see problems
or improvements that we should incorporate.
Cheers,
Denny
--
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für
Körperschaften I Berlin, Steuernummer 27/681/51985.
A lot of people hate these discussions; I <3 them.
Can someone tell me some pros and cons of using Python over PHP? I
recently heard from several people that Python is even better than PHP
for website development, so I am wondering whether that is actually true.
Does anyone have experience with that?
Hi!
The question is a good basis for a holy war :-)
I want to say that PHP has some advantages Python will never have: it is very simple to deploy, there is no fuss with library versions, and nearly all needed features are already built in, including a good SAPI (!), so you don't need WSGI, PSGI, etc. You don't need any virtualenvs for deployment because nobody typically uses PEAR libraries :-)
PHP is faster (if you don't take PyPy etc. into account).
Also, I personally HATE block formatting using indentation. It's such a silly idea that no other language in the world has it. I also don't like Python's strict typing ideas (for example, it throws an exception if you concatenate a long and a str using +). PHP is simple and has no such problems.
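The concatenation behaviour referred to, for reference:

```python
# Python refuses implicit str/int concatenation; an explicit conversion
# is required.
try:
    s = 'id: ' + 42            # raises TypeError
except TypeError:
    s = 'id: ' + str(42)
print(s)  # id: 42
```

Whether that strictness is a bug or a feature is, of course, the holy war in question.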
And for webdev I don't like frameworks either - I don't like them at all, because I always feel they are trying to restrict me. So Django is not an argument for me, and maybe not an argument for you either. And you definitely can't say Django is just better than PHP.
What PHP misses is built-in metaprogramming, but in 99% of cases you are better off writing code instead of doing metaprogramming.
So for webdev my opinion is that PHP is MUCH better than Python.
It's not "bad" design. It's "bad" only theoretically, and just different
from strongly typed languages. I like its "inconsistent" function names -
for a lot of functions they're similar to C, and in most cases they're
very easy to remember, as opposed to some other languages, including
Python (!!).
Of course there are some nuances, but those exist in any language. And I
personally think "10" is semantically equal to 10 in most cases, so
comparison is not a problem either. You just need to be slightly more
careful while writing things.
And my main idea is that only a statically typed language should try to be
strict. Python very oddly tries to be strict in some places while being
dynamically typed. Look, it doesn't concatenate a string and a long -
even Java does that!
This isn't an appropriate list for this, but MaxSem and hashar told me to
post it here anyway, so here goes.
There's a patch[1] to remove 'visualeditor-enable' from $wgHiddenPrefs,
essentially allowing for disabling VE on a per-user basis again. It has
overwhelming community support, but the VisualEditor team is refusing to
acknowledge it, and ops say it's "none of their business".
Can something be done about it?
[1] https://gerrit.wikimedia.org/r/#/c/73565/
--
Matma Rex
Dario,
Do you intend to measure the total number of edits per day prior to
and after the visual editor roll-out?
It appears that you have not analyzed or presented any data associated
with those statistics.
For example, why are you not providing a daily version of the hourly
graph at http://ee-dashboard.wmflabs.org/graphs/enwiki_ve_hourly_by_ui
?
On Fri, Jul 26, 2013 at 7:28 PM, Dario Taraborelli <
dtaraborelli(a)wikimedia.org> wrote:
>...
> We do have a graph of total hourly edits on enwiki across mainspaces here:
> http://ee-dashboard.wmflabs.org/graphs/enwiki_edits_api - it's trivial to
> bin by day and filter to the main namespace only, I'll add this to my todo
> list.
Thank you! Here is a daily graph of edits by source and visual editor with
totals:
http://i.imgur.com/2f0tmEu.png
It would be great to know what the average total edits per day was in June.