Hi Antoine,
Looks like we'll soon be running cucumber tests on our own infrastructure,
right? Exciting!
I had a glance at the error, and I think this option might be preventing
our test from running:
--tags @firefox
Our basic E2E test doesn't have that tag, so the tag filter excludes it.
2014-04-23 13:58 GMT+02:00 jenkins-bot <nobody(a)integration.wikimedia.org>:
> * FAILURE:
> browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox
> Build #1
> <https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.w…>
> (Wed, 23 Apr 2014 11:58:39 +0000)*
Hey Pine,
Thanks for your kind words about Echo. Those of us who created it are really happy the tool is useful to you. :)
See my comments inline below.
Fabrice
On Apr 22, 2014, at 11:53 PM, ENWP Pine <deyntestiss(a)hotmail.com> wrote:
> Thanks again to those who created Echo. It is being used widely in my circles and I get lots of useful notices these days.
>
> * I would like to suggest making more of the notices send email by default, especially thanks notifications.
>
We enable email notifications by default for all new users.
Existing users have to opt in. When we launched the product, we didn’t want current users to be overwhelmed by too many email notifications they didn’t ask for. At this point, it would be strange to change that policy, given that individual users have sole control over their preferences.
> * I heard another user suggest sending a notification when a file that someone uploads to Commons is added to a Wikipedia article. I think that is a great idea. Can this be done?
>
Yes, our multimedia team is hoping to build this ‘your file was used’ notification in the coming months, as specified here:
https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/170
Stay tuned for more ...
> * I would like to ask about the thanks system on Meta. I started to use the Echo thanks system to thank users who are answering questions on the Annual Plan talk page, including some of you on this list, and I ran into some differences from English Wikipedia. On Meta, I am asked for the revision ID of the edit for the thanks notification, and after thanking a user, the "thank" link for an edit doesn't change to "thanked". Will Meta thanks notifications be upgraded in the future to function like ENWP's do?
>
>
I’m not sure why Thanks would behave differently on Meta. Does anyone know?
> Pine
_______________________________
Fabrice Florin
Product Manager
Wikimedia Foundation
http://en.wikipedia.org/wiki/User:Fabrice_Florin_(WMF)
Hi team,
Here are the projects we propose to track next with our Media Viewer metrics dashboards:
* de - German
* he - Hebrew
* ko - Korean
* pl - Polish
* pt - Portuguese
* English Wikivoyage
I have updated metrics ticket #476, which Gilles created, so we can track Media Viewer metrics on these sites:
https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/476
The purpose of this task is to monitor activity in key sites where Media Viewer is deployed (or coming soon), so we can evaluate its impact on both community and performance. My selection criteria for this second batch include:
* diverse mix of languages and sizes
* already on first pilots list (or coming soon)
* active relationship with community champions
Any other important sites we should track right away — or in the next batch? We are already tracking commons, en, fr, hu and mediawiki. For the next batch, I propose the following large Wikipedias: es, hi, it, ja, ru, zh — plus any other sites you guys think we should track.
Thanks,
Fabrice
>> On Mon, Apr 21, 2014 at 2:23 AM, Gilles Dubuc <gilles(a)wikimedia.org> wrote:
>> Gilles, how much work is entailed in creating dashboards for each major language? Is there any chance we could do more of them? What would be a reasonable amount on your end? It would also be great if we could accelerate this metrics update task, to give us and our community champions more visibility sooner into what's happening on these sites.
>>
>> I think one dashboard per site makes sense, plus a global one, instead of the current setup where each tab is a long list of graphs. It would also allow us to have per-site maps. Setting this up would be an initial 3 points, and 1 point every time we want to add (an) additional site(s). If new ones come later, it's better to do them as a batch, because what makes it time-consuming is updating all the different parts and having them reviewed, but the task itself isn't complicated.
>>
>> I've created a card for it: https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/476
>> If you want more sites than commons, en, fr, hu and mediawiki, please update the ticket.
>>
>
>
> Thanks, Gilles, much appreciated.
>
> I will update the ticket with recommended additional sites to track, for consideration at Wednesday’s sprint planning meeting.
_______________________________
Fabrice Florin
Product Manager
Wikimedia Foundation
http://en.wikipedia.org/wiki/User:Fabrice_Florin_(WMF)
Hi folks,
I wanted to share the first results of our Media Viewer surveys (1), which started last week in three languages: English, Hungarian and Catalan. (2) (3) (4)
* Hungarian Survey: 42% find the tool useful, 51% don’t find it useful, 6% are not sure (159 responses)
* English Survey: 62% find the tool useful, 10% don’t find it useful, 27% are not sure (37 responses)
* Catalan Survey: 50% find the tool useful, 25% don’t find it useful, 25% are not sure (8 responses)
I just compiled comments from the Hungarian Wikipedia (3), which suggest that these are the most frequent issues reported on that site:
* Too slow (17)
* Blurred images (6)
* More image sizes (4)
* Can't see images (4)
* Doesn't return to where I clicked in the article (scroll bug, now fixed) (5)
We are now using this feedback as a guideline for improvements to make as we approach wider deployment. Note that this survey is aimed at all users, not just editors, so we can get feedback from other viewpoints than those already expressed on our talk page. For example, about 61% of respondents on the Hungarian survey so far have never edited Wikipedia.
Next, we will start surveys in French, German and Portuguese — and perhaps a few more languages, as requested by community champions for large wikis. Hearing from readers around the world helps complement the feedback we get from experienced editors on our talk pages, to give us a more comprehensive perspective from all of our users.
As discussed earlier, we would now like to invite respondents to leave their email address at the end of the survey, if they are open to being contacted about their comments. For example, our team would like to follow up directly with users about image load issues, to find out why some of these images are taking so long to load.
Our legal team thinks it is reasonable for us to collect email addresses for WMF team follow-ups, as covered by our feedback policy, which specifically says: “If you submit an email address, we may use it to communicate with you and send you updates on your feedback.”
Community members on this thread, would it be all right with you if we retained these email addresses for two years? The emails would be kept private, in accordance with our Privacy Policy, and would only be used by the Wikimedia Foundation to follow up with respondents about this or related multimedia products. One thing we don’t do enough right now is to check back with users a year or so later, to see how they like the product.
Thanks for your constructive guidance, as always!
All the best,
Fabrice
_______________________________
(1) Media Viewer Survey:
https://www.mediawiki.org/wiki/Multimedia/Media_Viewer/Survey
(2) English Survey:
* Dashboard: https://www.surveymonkey.com/sr.aspx?sm=ZktzNzgF_2bGpdvIP3kl55mvJ2cVE_2bSrG…
* Comments: https://www.mediawiki.org/wiki/Multimedia/Media_Viewer/Survey#Comments
* Survey form: https://www.surveymonkey.com/s/media-viewer-1
(3) Hungarian Survey:
* Dashboard: https://www.surveymonkey.com/sr.aspx?sm=UlIq4sBVXvAeVkLPfcc_2bwG6dmVARHDBAJ…
* Comments: https://www.mediawiki.org/wiki/Multimedia/Media_Viewer/Survey/Results/Hunga…
* Survey form: https://www.surveymonkey.com/s/media-viewer-1-hu
(4) Catalan Survey:
* Dashboard: https://www.surveymonkey.com/sr.aspx?sm=nB2siDPf8Cjo_2f_2btq0HB5Hb7y2Ts6r6B…
* Survey form: https://www.surveymonkey.com/s/media-viewer-1-ca
_______________________________
Fabrice Florin
Product Manager, Multimedia
Wikimedia Foundation
http://en.wikipedia.org/wiki/User:Fabrice_Florin_(WMF)
Hi all,
we have gotten lots of reports that it can take a very long time for
MediaViewer to display (unblur) images (some people mentioned 20-30 second
waiting times); our tests and metrics [1], on the other hand, show 1-2
second loading times. We need to find out under what conditions the
application becomes so slow; if you experience something like that, or know
other people who have, please help us collect more information.
Ideally, we would like to know the following:
- how long did it take for the image to load? (time spent between clicking
on a thumbnail, and the blurry image becoming sharp)
- can you reproduce the slow loading time with other images? (Use images
from a different wiki page, so that they are not preloaded)
- can you reproduce the slow loading time with the same image, after
refreshing the page with a cleared cache (Ctrl-F5 on most browsers)?
- what OS/browser do you use?
- what kind of internet connection do you use, what bandwidth does it have?
If you are comfortable using the network tab in your browser's web console
(F12 on most browsers) and can tell us in detail which requests took up the
majority of the loading time, that would be even more helpful.
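If you would rather measure outside the browser, a rough Python sketch like
the one below times the raw download of an image file, which helps separate
network speed from MediaViewer itself (the URL is only a placeholder;
substitute the full-size image you are testing):

import time
import urllib.request

# Placeholder: substitute the full-size image URL you are testing.
IMAGE_URL = "https://upload.wikimedia.org/wikipedia/commons/example.jpg"

start = time.time()
with urllib.request.urlopen(IMAGE_URL) as response:
    data = response.read()
elapsed = time.time() - start

# Bytes received and effective throughput; compare this with the time
# MediaViewer takes to unblur the same image in the browser.
print("%d bytes in %.2f s (%.0f KB/s)"
      % (len(data), elapsed, len(data) / 1024.0 / elapsed))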
You can report the results here or on-wiki at [2].
Thanks on behalf of the multimedia team!
[1]
http://multimedia-metrics.wmflabs.org/dashboards/mmv#overall_network_perfor…
[2] https://www.mediawiki.org/wiki/Multimedia/Media_Viewer/Speed_reports
Mark deployed the change; the mean and standard deviation on the "Overall
network performance" and "Geographical network performance" tabs are now
geometric:
http://multimedia-metrics.wmflabs.org/dashboards/mmv
These charts and maps now make a lot more sense! Next I'll be working on
distribution histograms, so that we can see the outlier values that are now
excluded from those graphs.
Thanks again, Aaron; thanks to you, these visualizations have become truly
useful and meaningful, the way they were meant to be.
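For the record, the calculation is easy to sanity-check in Python; here is a
quick sketch with made-up numbers (the SQL equivalents are EXP(AVG(LOG(x)))
and EXP(STDDEV(LOG(x))), as discussed below):

import math

# Made-up timings with one huge outlier.
timings = [120, 150, 130, 140, 125, 60000]

# Arithmetic mean: dominated by the outlier (~10111 here).
arithmetic_mean = sum(timings) / len(timings)

# Geometric mean: exp of the average of the logs (~367 here,
# much closer to the typical values).
logs = [math.log(t) for t in timings]
mean_log = sum(logs) / len(logs)
geometric_mean = math.exp(mean_log)

# Geometric standard deviation: exp of the (population) stddev of
# the logs, matching MySQL's STDDEV.
variance = sum((x - mean_log) ** 2 for x in logs) / len(logs)
geometric_stddev = math.exp(math.sqrt(variance))

print(arithmetic_mean, geometric_mean, geometric_stddev)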
On Thu, Apr 17, 2014 at 6:13 PM, Aaron Halfaker <ahalfaker(a)wikimedia.org> wrote:
> Yikes! Good catch.
>
>
> On Thu, Apr 17, 2014 at 11:12 AM, Gilles Dubuc <gilles(a)wikimedia.org> wrote:
>
>> A solution to this problem is to generate a geometric mean[2] instead.
>>>
>>
>> Thanks a lot for the help, it literally instantly solved my problem!
>>
>> There was a small mistake in the order of functions in your example, for
>> the record it should be:
>>
>> EXP(AVG(LOG(event_total))) AS geometric_mean
>>
>> And conveniently the geometric standard deviation can be calculated the
>> same way:
>>
>> EXP(STDDEV(LOG(event_total))) AS geometric_stddev
>>
>> I put it to the test on a specific set of data where we had a huge
>> outlier, and for that data it seems equivalent to excluding the lower and
>> upper 10 percentiles, which is exactly what I was after.
>>
>>
>> On Wed, Apr 16, 2014 at 4:24 PM, Aaron Halfaker <ahalfaker(a)wikimedia.org> wrote:
>>
>>> Hi Gilles,
>>>
>>> I think I know just the thing you're looking for.
>>>
>>> It turns out that much of this performance data is log-normally
>>> distributed[1]. Log-normal distributions tend to have a hockey stick
>>> shape where most of the values are close to zero, but occasionally very
>>> large values appear[3]. The mean of a log-normal distribution tends to be
>>> sensitive to outliers like the ones you describe.
>>>
>>> A solution to this problem is to generate a geometric mean[2] instead.
>>> One convenient thing about log-normal data is that if you log() it, it
>>> becomes normal[4] -- and not sensitive to outliers in the usual way. Also
>>> convenient, geometric means are super easy to generate. All you need to do
>>> is this: (1) pass all of the data through log(), (2) pass the logged data
>>> through mean() (or avg() -- whatever), (3) pass the result through exp().
>>> The best thing about this is that you can do it in MySQL.
>>>
>>> For example:
>>>
>>> SELECT
>>> country,
>>> mean(timings) AS regular_mean,
>>> exp(log(mean(timings))) AS geometric_mean
>>> FROM log.WhateverSchemaYouveGot
>>> GROUP BY country
>>>
>>>
>>> 1. https://en.wikipedia.org/wiki/Log-normal_distribution
>>> 2. https://en.wikipedia.org/wiki/Geometric_mean
>>> 3. See attachment distribution.log_normal.svg (24K)
>>> 4. See attachment distribution.log_normal.logged.svg (33K)
>>>
>>> -Aaron
>>>
>>> On Wed, Apr 16, 2014 at 8:42 AM, Dan Andreescu <dandreescu(a)wikimedia.org> wrote:
>>>
>>>> So, my latest idea for a solution is to write a python script that
>>>>>> will import the section (last X days) of data from the EventLogging tables
>>>>>> that we're interested in into a temporary sqlite database, then proceed
>>>>>> with removing the upper and lower percentiles of the data, according to any
>>>>>> column grouping that might be necessary. And finally, once the data
>>>>>> preprocessing is done in sqlite, run similar queries as before to export
>>>>>> the mean, standard deviation, etc. for given metrics to tsvs. I think using
>>>>>> sqlite is cleaner than doing the preprocessing on db1047 anyway.
>>>>>>
>>>>>> It's quite an undertaking: it basically means rewriting all our
>>>>>> current SQL => TSV conversion. The ability to use more steps in the
>>>>>> conversion means that we'd be able to have simpler, more readable SQL
>>>>>> queries. It would also be a good opportunity to clean up the giant
>>>>>> performance query with a bazillion JOINS:
>>>>>> https://gitorious.org/analytics/multimedia/source/a949b1c8723c4c41700cedf6e…
>>>>>> It can actually be divided into several data sources all used in the
>>>>>> same graph.
>>>>>>
>>>>>> Does that sound like a good idea, or is there a simpler solution out
>>>>>> there that someone can think of?
>>>>>>
>>>>>
>>>> Well, I think this sounds like we need to seriously evaluate how people
>>>> are using EventLogging data and provide this sort of analysis as a feature.
>>>> We'd have to hear from more people but I bet it's the right thing to do
>>>> long term.
>>>>
>>>> Meanwhile, "simple" is highly subjective here. If it were me, I'd clean
>>>> up the indentation of that giant SQL query you have, then maybe figure out
>>>> some ways to make it faster, and be happy as a clam. So if SQLite is
>>>> the tool you feel happy as a clam with, then that sounds like a great
>>>> solution. Alternatives would be Python, PHP, etc. I forget whether pandas is
>>>> allowed where you're working, but it's a great Python library that would
>>>> make what you're talking about fairly easy.
>>>>
>>>> Another thing for us to seriously consider is PostgreSQL. This has
>>>> proper f-ing temporary tables and supports actual people doing actual work
>>>> with databases. We could dump data, especially really simple schemas like
>>>> EventLogging, into PostgreSQL for analysis.
Including the analytics team in case they have a magical solution to our
problem.
Currently, our graphs display the mean and standard deviation of metrics,
as provided in "mean" and "std" columns coming from our tsvs, generated
based on EventLogging data:
http://multimedia-metrics.wmflabs.org/dashboards/mmv
However, we already see that extreme outliers can make the standard
deviation and mean skyrocket and, as a result, make the graphs useless for
some metrics. See France, for example, where a single massive value was
able to skew the map into making the country look problematic:
http://multimedia-metrics.wmflabs.org/dashboards/mmv#geographical_network_p…
There's no performance issue with France, but the graph suggests there is
because of that one outlier.
Ideally, instead of using the mean for our graphs, we would be using what
is called the "trimmed mean", i.e. the mean of all values excluding the
upper and lower X percentiles. Unfortunately, MariaDB doesn't provide that
as a function and calculating it with SQL can be surprisingly complicated,
especially since we often have to group values for a given column. The best
alternative I could come up with so far for our geographical queries was to
exclude values that differ more than X times the standard deviation from
the mean. It kind of flattens the mean. It's not ideal, because I think
that in the context of our graphs it makes things look like they perform
better than they really do.
I think the main issue at the moment is that we're using a shell script to
pipe a SQL request directly from db1047 to a tsv file. That limits us to
one giant SQL query, and since we don't have the ability to create
temporary tables on the log database with the research_prod user, we can't
preprocess the data in multiple queries to filter out the upper and lower
percentiles. The trimmed mean would be kind of feasible as a single
complicated query if it weren't for the GROUP BY:
http://stackoverflow.com/a/8909568
So, my latest idea for a solution is to write a python script that will
import the section (last X days) of data from the EventLogging tables that
we're interested in into a temporary sqlite database, then proceed with
removing the upper and lower percentiles of the data, according to any
column grouping that might be necessary. And finally, once the data
preprocessing is done in sqlite, run similar queries as before to export
the mean, standard deviation, etc. for given metrics to tsvs. I think using
sqlite is cleaner than doing the preprocessing on db1047 anyway.
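Concretely, the trimming step would look something like this quick Python
sketch (made-up data and column names; pandas here just for illustration,
even if the real script ends up using sqlite):

import pandas as pd

# Made-up sample: per-country timing events, with one huge outlier.
df = pd.DataFrame({
    "country": ["FR", "FR", "FR", "FR", "US", "US", "US", "US"],
    "event_total": [120, 130, 140, 60000, 200, 210, 220, 230],
})

def trimmed_mean(values, lower=0.10, upper=0.90):
    # Keep only the values between the 10th and 90th percentiles,
    # then average what is left.
    lo, hi = values.quantile(lower), values.quantile(upper)
    return values[(values >= lo) & (values <= hi)].mean()

# Trim within each group, which is the part that is painful in pure SQL.
print(df.groupby("country")["event_total"].apply(trimmed_mean))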
It's quite an undertaking: it basically means rewriting all our current SQL
=> TSV conversion. The ability to use more steps in the conversion means
that we'd be able to have simpler, more readable SQL queries. It would also
be a good opportunity to clean up the giant performance query with a
bazillion JOINS:
https://gitorious.org/analytics/multimedia/source/a949b1c8723c4c41700cedf6e…
It can actually be divided into several data sources all used in the
same graph.
Does that sound like a good idea, or is there a simpler solution out there
that someone can think of?
Hi folks,
I’m happy to let you know that Media Viewer was deployed to our first pilot sites today!
1. First Pilots
We just released Media Viewer enabled by default on Catalan, Hungarian and Korean Wikipedias, as well as English Wikivoyage. Next Thursday, we plan to deploy to more pilot sites: Czech, Estonian, Finnish, Hebrew, Polish, Romanian, Thai, Slovak, and Vietnamese. Try it out for yourself on the Hungarian Wikipedia:
https://hu.wikipedia.org
2. First Metrics
On MediaWiki.org, we jumped from 100 image views per day to 1,000 per day, about a 10x increase. And on Commons it was much higher, due to the ‘View Expanded’ button: from 240 image views per day to 24,000 per day yesterday, a 100x increase! You can track the adoption of this tool on these first metrics dashboards:
http://multimedia-metrics.wmflabs.org/dashboards/mmv
3. Share your feedback
Please let us know what you think of Media Viewer — and join other beta users from around the world on this discussion page:
https://www.mediawiki.org/wiki/Talk:Multimedia/About_Media_Viewer
If you’re short on time, please take this quick survey to let us know how Media Viewer works for you:
https://www.surveymonkey.com/s/media-viewer-1?c=email
Many thanks to all the team and community members who made this launch possible!
Enjoy,
Fabrice — for the Multimedia Team
P.S.: If you haven’t tried Media Viewer yet, follow the test tips on this demo page on MediaWiki.org:
https://www.mediawiki.org/wiki/Lightbox_demo
_______________________________
Fabrice Florin
Product Manager, Multimedia
Wikimedia Foundation
https://www.mediawiki.org/wiki/User:Fabrice_Florin_(WMF)
Hi all,
we have just deployed a new URL format for MediaViewer [0]; I am submitting
it here for comments and for the benefit of people who have to do something
similar in other contexts.
MediaViewer stores the name of the image in the hash part of the URL so one
can share links to a page with a specific image open in the lightbox. (We
considered using the History API [1] to change the path or the query part,
but that degrades poorly.) I have looked at three options:
1. Just put the file name as-is (with spaces replaced by underscores) in
the URL fragment part.
Pro: readable file names in URLs, easy to generate.
Con: technically not a valid URI. [2] (It would be a valid IRI,
probably, but browser support for that is not so great, so non-ASCII bytes
might get encoded in unexpected ways.) Creates nasty usability and security
issues (injection vulnerabilities, RTL characters, characters which break
autolinking). Would make it very hard to introduce more complex URL formats
later, as file names can contain pretty much any character.
2. Use percent encoding (with underscores for spaces).
Pro: this is the standard way of encoding fragments. [2][3] Always
results in a valid URI. Readable file names in Firefox. Easy to generate
on-wiki (e.g. with {{urlencode}}).
Con: Non-Latin filenames look horrible in any browser that's not Firefox.
3. Use MediaWiki anchor encoding (like percent encoding, but with a dot
instead of a percent sign).
This would have the advantage that links can be generated in
wikitext very conveniently, using the [[#...]] syntax. Unfortunately the
way MediaWiki does anchor encoding is intrinsically broken (the dot itself
does not get encoded, but it does get decoded when followed by suitable
characters, so file names cannot get roundtripped safely), so this is not
an option.
We went with option 2, so URLs look like this:
https://www.mediawiki.org/wiki/Lightbox_demo#mediaviewer/File:Swallow_flyin…
https://www.mediawiki.org/wiki/Lightbox_demo#mediaviewer/File:%E0%AE%85%E0%…
One issue that we ran into is that window.location.hash behaves weirdly
with percent-encoded hashes in Firefox [4], but that's easy to avoid once
you know about it. Other than that, it seems to work reliably.
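For anyone who needs to generate such links outside MediaWiki, here is a
minimal Python sketch of the option-2 encoding (the helper name and the
file title are made up; MediaViewer's own implementation lives in the
extension's JavaScript):

from urllib.parse import quote

def mediaviewer_fragment(file_title):
    # Spaces become underscores, then standard percent-encoding.
    # ':' and '/' stay readable, so "File:" and the "mediaviewer/"
    # prefix survive intact; non-ASCII becomes UTF-8 percent escapes.
    name = file_title.replace(" ", "_")
    return "mediaviewer/" + quote(name, safe=":/")

print("https://www.mediawiki.org/wiki/Lightbox_demo#"
      + mediaviewer_fragment("File:Example photo.jpg"))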
[0] https://www.mediawiki.org/wiki/Multimedia/Media_Viewer
[1] http://diveintohtml5.info/history.html
[2] http://tools.ietf.org/html/rfc3986#section-3.5
[3] https://tools.ietf.org/html/rfc3987#section-3.1
[4] https://bugzilla.mozilla.org/show_bug.cgi?id=483304