Heya :)
On the 4th of February, Quim will be at the Wikimedia Germany office to
introduce MediaWiki groups. See https://www.mediawiki.org/wiki/Groups
for more info about the groups.
We'll meet at 18:30 in the office in Obentrautstr. 72, Berlin. Quim
will talk and answer questions for about 1 hour and then we'll move on
to Brauhaus Lemke for some food and drinks.
If you're going to attend, please let me know soon so I can plan
better. I'd also be delighted if you could forward this to other people
who might be interested. I hope to see many of you there.
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Community Communications for Wikidata
Wikimedia Deutschland e.V.
Obentrautstr. 72
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognized as a non-profit
by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
On 2/1/13 12:10 PM, Michael Dale wrote:
> The software patent situation for mp3 is sad, considering how long the mp3
> format has been around:
>
> http://www.tunequest.org/a-big-list-of-mp3-patents/20070226/
[...]
> But fundamentally Wikimedia is not "distributing" these encoders and there
> are no royalties for media distribution. Likewise we are not shipping
> decoders (the decoders are in the browser or the mobile OS).
The USPTO is holding a comment period and a couple of meetings on
software patents. The meetings are both 9am – noon local time. Both
require an RSVP by email by this Monday the 4th.
The Stanford ("Silicon Valley") meeting is on Tuesday 2013-02-12:
Stanford University
Paul Brest Hall
555 Salvatierra Walk, Stanford, CA
The New York City meeting is on Wednesday 2013-02-27:
New York University
Henry Kaufman Management Center
Faculty Lounge, Room 11-185
44 West 4th St.
New York, NY 10012
http://www.groklaw.net/articlebasic.php?story=20130104012214868
-Jeremy
Hello,
There is a lack of guidelines regarding the rate at which the MediaWiki API may be used.
Specifically for the enwiki and Commons APIs (if it doesn't matter, please say so):
For bots:
Are simultaneous uploads permitted? E.g. uploading three 20 MB files simultaneously to Commons on a 1 Gbit/s line. What is the maximum rate permitted?
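In the absence of published limits, one conservative option is for the bot to cap its own upload concurrency on the client side. This is only a sketch of that idea (the limiter and its parameters are my own, not an official Wikimedia recommendation):

```javascript
// Minimal promise-based concurrency limiter: at most `limit` tasks
// run at once; the rest wait in a FIFO queue. Wrapping each upload
// in `schedule` serializes them completely when limit === 1.
function createLimiter(limit) {
  let active = 0;
  const queue = [];
  function next() {
    if (active >= limit || queue.length === 0) return;
    active++;
    const { task, resolve, reject } = queue.shift();
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => { active--; next(); });
  }
  return function schedule(task) {
    return new Promise((resolve, reject) => {
      queue.push({ task, resolve, reject });
      next();
    });
  };
}
```

With `const schedule = createLimiter(1);`, each `schedule(() => uploadFile(...))` waits for the previous upload to finish, which sidesteps the simultaneous-upload question entirely until an official answer exists.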
I was recommended at https://bugzilla.wikimedia.org/show_bug.cgi?id=44584 to post to this mailing list.
-Small
On Fri, Feb 1, 2013 at 4:31 PM, Brad Jorsch <bjorsch(a)wikimedia.org> wrote:
> On Fri, Feb 1, 2013 at 11:16 AM, Max Semenik <maxsem.wiki(a)gmail.com> wrote:
>> Because with File::transform()'s worst-case performance, 500 is too
>> much.
>
> Perhaps we should patch ApiQueryImageInfo too then.
>
> Although https://en.wikipedia.org/w/api.php?format=jsonfm&action=query&generator=all…
> didn't seem bad. Must not be hitting the worst case.
https://gerrit.wikimedia.org/r/#/c/47189/
A month ago, the PageImages extension[1] was black-deployed, intended to
automatically associate images with articles. It populates its data
when LinksUpdate is run, i.e. when a page or a template it transcludes
is edited or purged. Since then, most pages have been re-parsed;
however, slightly fewer than a million English WP articles remain:
select count(*), avg(page_len) from page where page_namespace=0 and page_is_redirect=0 and page_touched < '20121229000000';
+----------+---------------+
| count(*) | avg(page_len) |
+----------+---------------+
| 977568 | 3172.0948 |
+----------+---------------+
1 row in set (5 min 59.55 sec)
Waiting for these pages to be updated naturally could take forever:
select min(page_touched) from page where page_namespace=0 and page_is_redirect=0;
+-------------------+
| min(page_touched) |
+-------------------+
| 20090714142954 |
+-------------------+
1 row in set (2 min 15.13 sec)
That was [2] before I purged it: obscure topic, no templates.
Thus, I would like to populate this data with a script[3]. To allay
any concerns, let me note that these pages have almost no templates and
are significantly smaller than average (3172 bytes vs. 5673), so they
should mostly be fast to parse.
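For readers who haven't looked at the script, the backfill boils down to a batched loop, roughly like this (a JavaScript sketch of the idea only, not the actual PHP maintenance script in [3]; the helper names are made up):

```javascript
// Sketch of the backfill idea: walk stale pages in primary-key order,
// in small batches, and re-run the links update for each, so the
// database never sees one huge query or a burst of parses.
async function backfill(fetchBatch, reparsePage, batchSize) {
  let lastId = 0;
  for (;;) {
    // fetchBatch: "pages with id > lastId touched before the cutoff",
    // at most batchSize rows (hypothetical helper)
    const pages = await fetchBatch(lastId, batchSize);
    if (pages.length === 0) break;
    for (const page of pages) {
      await reparsePage(page); // triggers LinksUpdate for the page
    }
    lastId = pages[pages.length - 1].id;
  }
}
```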
Is running it a good idea?
-----
[1] https://www.mediawiki.org/wiki/Extension:PageImages
[2] https://en.wikipedia.org/wiki/City_of_Melbourne_election,_2008
[3] https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/extensions/PageImages.git…
--
Best regards,
Max Semenik ([[User:MaxSem]])
Let me first say that the ResourceLoader [1] is a wonderful part of the software. Thanks go out to everyone who contributed to this project - it's made my life much better. That being said, I don't think that my team and I have figured out how to properly take advantage of its benefits.
At Vistaprint, we are currently using the ResourceLoader to load modules, some of which contain JavaScript. The dependencies are made explicit when registering modules with the ResourceLoader, and they execute in the proper order on the client side. In many of these JavaScript files we wrap our code in a jQuery .ready() callback [2]. Since these JavaScript files have dependencies on one another (as laid out in the RL), they need to be executed in the correct order to work properly. We're finding that when using jQuery's .ready() (or a similar) function, the callbacks seem to execute in a different (unexpected, browser-dependent) order. This causes errors.
Using the WikiEditor extension as a specific example:
Customizing the WikiEditor toolbar is one of the specific cases where we've encountered problems. First, the WikiEditor provides no good events to bind to once the toolbar is loaded. This is not a problem, because there is a documented work-around [3]. However, our JavaScript code needs to execute in the proper order, which it does not. We have about four JavaScript files that add custom toolbars, sections, and groups.
My questions:
It recently dawned on me that executing our code within a $(document).ready(); callback might not be necessary, as the JavaScript for each ResourceLoader module is executed in its own callback on the client side. This should provide the necessary scope to avoid clobbering global variables, along with getting executed at the proper time. Is this a correct assumption to make? Is it a good idea to avoid binding our code to jQuery's ready event?
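If my understanding is right, the difference can be modelled with a toy executor (a simplification I wrote for illustration, not ResourceLoader's actual implementation): module bodies run depth-first in dependency order, whereas ready() callbacks fire in whatever order they happen to be registered.

```javascript
// Toy model of dependency-ordered execution: a module's body runs
// only after all of its dependencies have run, so top-level code in
// each module sees its dependencies fully initialized.
function runModules(modules) {
  const started = new Set();
  function run(name) {
    if (started.has(name)) return;
    started.add(name); // assumes no dependency cycles
    for (const dep of modules[name].deps) run(dep);
    modules[name].body();
  }
  Object.keys(modules).forEach(run);
}
```

Under this model, moving the toolbar-customization code out of $(document).ready() and into the module body (with the WikiEditor module declared as a dependency) would give a deterministic order.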
--Daniel (User:The Scientist)
[1] http://www.mediawiki.org/wiki/ResourceLoader
[2] http://docs.jquery.com/Events/ready
[3] http://www.mediawiki.org/wiki/Extension:WikiEditor/Toolbar_customization#Mo…
Hi all!
Just an FYI here that this has been done, yay! Varnish, Nginx, and Squid frontends are now all logging with tab as the field delimiter.
For those who would notice, for the time being, we have started outputting logs to new filenames with .tab. in the name, so as to differentiate the format. We will most likely change the file names back to their original names in a month or so.
Thanks all!
-Andrew Otto
On Jan 28, 2013, at 11:33 AM, Matthew Flaschen <mflaschen(a)wikimedia.org> wrote:
> On 01/27/2013 08:07 AM, Erik Zachte wrote:
>> The code to change existing tabs into some less obnoxious character is dead
>> trivial, hardly any overhead. At worst one field will then be affected, not
>> the whole record, which makes it easier to spot and debug the anomaly when
>> it happens.
>>
> Scanning an input record for tabs and raising a counter is also very
> efficient. Sending one alert hourly based on this counter should make us
> aware soon enough when this issue needs follow-up, yet without causing
> bottlenecks.
>
> Doing both of those would be pretty robust. However, if that isn't
> workable, a simple option is just to strip tab characters before
> Varnish/Squid/etc. writes the line.
>
> That means downstream code doesn't have to do anything special, and it
> shouldn't affect many actual requests.
>
> Matt Flaschen
>
> _______________________________________________
> Analytics mailing list
> Analytics(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
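For what it's worth, both suggestions quoted above amount to a one-line transform on each field before the record is written; a sketch (my own illustration, not the actual Varnish/Squid/Nginx configuration):

```javascript
// Sanitize fields for a tab-delimited log line: replace any embedded
// tab with a space (Erik's "less obnoxious character"; use '' instead
// to strip, per Matt's variant), so a stray tab can only mangle one
// field rather than shift every field after it.
function toLogLine(fields) {
  return fields.map(f => String(f).replace(/\t/g, ' ')).join('\t');
}
```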
Hello,
we plan to fundamentally change the way AbuseFilter filter hits are
logged.
Feel free to skip to "Actual impact to end users" in case you're not
interested in or don't understand the technical background.
Some technical background:
We have an AbuseFilterVariableHolder object that contains all variables
usable by the filters. Some of these variables are stored as AFPData
objects and some as AFComputedVariable objects. The values of the
AFPData ones are already known, while the values of the
AFComputedVariable ones are computed when needed (lazy-load variables).
Right now AbuseFilter logs filter hits by saving a serialized
version of an AbuseFilterVariableHolder object without any of the
lazy-load variables computed. That object, as explained above, includes
several AFComputedVariable objects which hold information on how the
value for a lazy-load variable can be computed (e.g. parameters and a
method for AFComputedVariable::compute). That has several technical
downsides. It is not very forward compatible: we will never be able to
change the method names or the way the methods in
AFComputedVariable::compute work, as we always have to expect that an
old log entry calls the methods with the old parameters. That's an even
bigger problem with the hooks in that function, as those have to stay
backwards compatible as well. Furthermore, this means we're saving a
lot of unneeded data to the database.
What we're going to change now is that we will no longer log
AbuseFilterVariableHolder objects in serialized form to the database,
but rather a serialized array with only native data types (which is
much more robust). Lazy-load variables will be logged only if they have
been computed before the logging occurs. This furthermore implies that
we will no longer log any lazy-load accessor information to the
database.
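To illustrate the new format (a JavaScript sketch of the idea, not the actual PHP code in the Gerrit change; the field names are invented): only variables whose values have already been computed are kept, reduced to native types, and lazy-load accessors are dropped entirely.

```javascript
// Dump only already-computed variables as native data types. Variables
// that were never computed (pure lazy-load accessors) are dropped, so
// the stored blob contains no method names or parameters that future
// code would have to stay compatible with.
function dumpComputedVars(holder) {
  const out = {};
  for (const [name, v] of Object.entries(holder)) {
    if (v.computed) out[name] = v.value;
  }
  return JSON.stringify(out);
}
```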
Actual impact to end users:
The actual impact on end users will be very small, as the logging page
(Special:AbuseLog) will still hold all non-lazy-load variables (like
page title, page namespace, user name, ...) and the lazy-load variables
used by the filter(s) tested. Because of this, all the data relevant to
the current log action will still be there (while irrelevant data might
not be available). In some cases this might even make it easier to spot
information relevant to a filter hit, as data not involved in that
filter hit is no longer logged unconditionally.
This change will make it much simpler to make more data available for
filters without having to face the headaches of the current logging
format.
I hope you agree with me that this change makes sense, so that we can
finally move forward with the AbuseFilter extension!
Gerrit change: https://gerrit.wikimedia.org/r/42501
Note: This was cross-posted to wikitech-ambassadors
Cheers,
Marius Hoch (hoo)