On Fri, May 16, 2014 at 9:17 AM, Federico Leva (Nemo) <nemowiki(a)gmail.com> wrote:
> * From 40 to 260 events logged per second in a month: what's going on?
Eep, thanks for raising the alarm. MediaViewer accounts for 170 events/sec,
and MultimediaViewerDuration for 38 events/sec.
+CC Multimedia.
(I removed analytics-l from CC as this is probably not of general interest
any more)
The load wasn't too much of a problem. TokuDB is being used now and it's
supposed to be orders of magnitude better than InnoDB.
The reason we started sampling on Friday was that there didn't seem to be a
need for all the data, and it had contributed to basically doubling
EventLogging's total stream of events.
However, Gilles shows the opposite in his message below. So, we should
either stop sampling or change from 1/1000 to something like 1/10 for all the
"action" events (see Gilles' message, where he mentions that non-"action" events
were already sampled and are not affected).
On Tue, May 20, 2014 at 12:41 PM, Toby Negrin <tnegrin(a)wikimedia.org> wrote:
> If we skip the db and dump the data into hadoop it could probably handle
> the load. No idea if this is a good idea right now. Just a thought.
>
> ---------- Forwarded message ----------
> From: Gilles Dubuc <gilles(a)wikimedia.org>
> Date: Tue, May 20, 2014 at 5:21 AM
> Subject: Re: [Analytics] [Multimedia] Media Viewer Dashboards
> To: Wikimedia Foundation Multimedia Team <multimedia(a)lists.wikimedia.org>
> Cc: Analytics Team List <analytics(a)lists.wikimedia.org>
>
>
> Media Viewer's usage of EventLogging grew considerably because of all the
> tracking we're doing:
> http://lists.wikimedia.org/pipermail/analytics/2014-May/002053.html and
> Nuria asked us to reduce the rate.
>
> Due to the global size we're dealing with, instead of logging every action
> on every site, we'll now have to measure a sample and extrapolate an
> estimate. As a quick fix last Friday, Gergo introduced sampling of
> actions (one in every thousand actions is now recorded, instead of every
> action). As a result, all figures on the actions graph were divided by
> 1000 overnight, making the line appear to go to 0. If you hover
> over recent days and look at the left sidebar, you'll see that there are
> figures (they are fairly useless, though; more on that below).
>
> We're now working on improvements and fixing the graphs:
> https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/619
> The general gist is that the figures will be compensated according to
> the sampling, and that the sampling factor will be fine-tuned to apply only
> to the metrics that were responsible for the high traffic.
>
> Unfortunately, it looks like the 1:1000 sampling since last Friday was too
> extreme and destroys information, even for the most numerous actions.
> We knew that such a high sampling factor would destroy information for
> small wikis or metrics with low figures, but even the huge metrics in the
> millions have become unreliable: multiplying even the largest figures by
> 1000 still doesn't give an estimate close to what they were before the
> change. This means that the actions graph probably can't be fixed for the
> period from last Friday until my fixes make it through. Even when
> compensating for the sampling (by multiplying the figures by 1000), the
> line would jump up and down every day for each metric.
>
> Graphs other than actions are unaffected (they were already sampled). The
> duration log was also affected, but that one doesn't have graphs yet, as
> the task to create them has been given low priority in the cycle.
>
>
> On Mon, May 19, 2014 at 8:43 PM, Fabrice Florin <fflorin(a)wikimedia.org> wrote:
>
>> Hi guys,
>>
>> Does anyone know why the Media Viewer metrics dashboards seem to be stuck
>> with old data from Friday?
>>
>> http://multimedia-metrics.wmflabs.org/dashboards/mmv
>>
>> Is there anything we could fiddle with to get the new data to show up?
>>
>> Thanks for any insights :)
>>
>>
>> Fabrice
>>
>>
>> _______________________________
>>
>> Fabrice Florin
>> Product Manager
>> Wikimedia Foundation
>>
>> http://en.wikipedia.org/wiki/User:Fabrice_Florin_(WMF)
>>
>>
>>
>>
>> _______________________________________________
>> Multimedia mailing list
>> Multimedia(a)lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/multimedia
>>
>>
>
> _______________________________________________
> Analytics mailing list
> Analytics(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>
>
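To make the trade-off in Gilles' message concrete, here is the textbook model for this kind of sampling. It assumes (this is not stated in the thread) that each of the N true events is kept independently with probability 1/r, with r = 1000 for the current sampling, and that k events end up logged. The compensated estimate is unbiased, and its relative standard error shrinks only with the square root of N:

```latex
k \sim \mathrm{Binomial}\!\left(N, \tfrac{1}{r}\right), \qquad
\hat{N} = r\,k, \qquad
\mathbb{E}[\hat{N}] = N,

\operatorname{Var}(\hat{N}) = r^{2}\,N\,\tfrac{1}{r}\left(1 - \tfrac{1}{r}\right) = N\,(r - 1), \qquad
\frac{\sqrt{\operatorname{Var}(\hat{N})}}{N} = \sqrt{\frac{r - 1}{N}}.
```

For r = 1000, this gives a relative standard error of about 3% for a metric with N = 10^6 events per day, but about 32% for N = 10^4, which is why small wikis and low figures lose information first. With r = 10, the N = 10^4 case drops back to 3%.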
Hi,
I'm running into this error when the tests for my changeset (
https://gerrit.wikimedia.org/r/#/c/134064/ ) run on Jenkins:
https://integration.wikimedia.org/ci/job/mwext-MultimediaViewer-qunit/1881/…
That changeset's parent is the latest master commit, so it's probably not
an automerge issue. The error comes from ResourceLoader when it tries to
load the dependencies for the tests.
I can't reproduce the error locally, in either Firefox or Chrome. It's
hard to guess where the error is coming from, as the change is quite large.
Any idea what I could do to resolve the issue? Is there a simple way to
reproduce Jenkins' setup locally?
Hi all,
the Multimedia team is preparing to collect data to better understand
usability problems with UploadWizard. UW has a "checkout" structure (step
1: put files in basket, step 2: choose license, step 3: add description,
step 4: you are done), so a funnel analysis to identify which step causes
the most users to abort the upload process and why seems like a good
approach. I'm trying to understand how well the existing EventLogging
infrastructure supports this.
The problem is how to get information about the actions of users who fell
out of the funnel. I'll try to illustrate with an example: in one of the
steps, the user can choose between "I am uploading my own work" and "I am
uploading someone else's work" and the resulting interaction will be quite
different. We would like to know whether that choice has a big effect on
the likelihood of the user making it to the next step.
Using EventLogging, I can count the number of users who make it until that
step. I can count the number of users making it to the next step. I can
count the number of users choosing this or that author option. These
numbers do not tell us much on their own, though; the interesting
information would be how they are correlated.
Another thing I could do is create a schema which includes both the
choice of author option and the step number: when the user chooses "own
work", we log an ownwork event; when they click "next step", we log a
step(step=3, work=own) event. We could then calculate the number of users who
chose "own work" but did not make it to the next step as the
difference of the two. But this won't work: "own work" is a radio button,
and the user can select and deselect it any number of times before proceeding
to the next step (or leaving the page).
So what we are trying to log are not really events but application states
that describe users who are successful vs. unsuccessful in the given step.
I thought of two ways of dealing with this; any feedback on the
plausibility of these or possible alternatives would be highly appreciated.
One would be to have a "step X succeeded" and a "step X failed" event (the
schema for which could include all sorts of state, such as which authorship
option was selected). This would require the ability to log an event when
the user leaves the page. I see two ways to do that:
- send the event log as a synchronous request from an unload event handler.
This is not supported on ancient browsers; also, most browsers probably have
some mechanism to kill an unload event handler if it takes too long.
- store the event in cookies/localStorage, and log it on the next page load.
This works in all browsers, but it is less reliable (what if the user never
comes back?), logs the event for a different page load from the one where it
actually occurred (what if the user comes back after a month?), and
probably runs into all sorts of complications with multiple tabs.
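A minimal Python sketch of the "step X succeeded / step X failed" idea. The tracker overwrites the current authorship choice on every radio-button change, so toggling does not matter. It then emits exactly one outcome event carrying the final state, either when the user clicks "next" or when the page is left. The `log` callback, the event fields and the `on_leave` hook are hypothetical stand-ins for an EventLogging schema and for whichever delivery mechanism is chosen (unload handler or a stored event replayed on the next page load).

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class StepTracker:
    """Tracks application state for one upload step and logs its outcome once."""
    step: int
    log: Callable[[dict], None]  # hypothetical event sink
    author_choice: Optional[str] = None
    _done: bool = field(default=False, repr=False)

    def choose_author(self, choice: str) -> None:
        # Radio buttons can be toggled many times; only the latest value matters.
        self.author_choice = choice

    def _finish(self, outcome: str) -> None:
        if self._done:  # log at most one outcome per step
            return
        self._done = True
        self.log({"step": self.step, "outcome": outcome, "work": self.author_choice})

    def on_next(self) -> None:
        self._finish("succeeded")

    def on_leave(self) -> None:
        # Would be called from an unload handler, or replayed from storage later.
        self._finish("failed")
```

With `events = []` and `t = StepTracker(step=3, log=events.append)`, selecting "own", "other", then "own" again and leaving the page produces a single event: `{"step": 3, "outcome": "failed", "work": "own"}`.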
The other way could be to log event chains: set a random identifier (which
only lives until the page is unloaded), and add it to every event. Event
groups can then be merged into meta-events by SQL magic, although that
looks like it will be extremely painful to do. On the other hand, this is
much more generic than the previous method, and could be used to answer
more complex questions.
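The event-chain approach can be sketched in Python instead of SQL. Each page load gets a random token (here from `secrets.token_hex`), and every event carries it. Grouping by token then recovers one record per page load, including the last authorship choice made before the user either advanced or dropped out. The event shapes (`token`, `type`, `value`, `step`) are hypothetical, and events are assumed to arrive in chronological order.

```python
import secrets
from collections import defaultdict
from typing import Iterable


def new_page_token() -> str:
    """Random identifier that lives only for one page load."""
    return secrets.token_hex(8)


def summarize_chains(events: Iterable[dict], step: int) -> dict:
    """Group events by page token; record the final author choice and whether
    the page load reached the step after `step`."""
    chains = defaultdict(lambda: {"work": None, "advanced": False})
    for ev in events:  # assumed to be in chronological order
        chain = chains[ev["token"]]
        if ev["type"] == "author-choice":
            chain["work"] = ev["value"]  # later choices overwrite earlier ones
        elif ev["type"] == "step" and ev["step"] == step + 1:
            chain["advanced"] = True
    return dict(chains)


def drop_off_by_choice(chains: dict) -> dict:
    """Count page loads that did not advance, per final author choice."""
    counts = defaultdict(int)
    for chain in chains.values():
        if not chain["advanced"]:
            counts[chain["work"]] += 1
    return dict(counts)
```

For instance, if page load "t1" chooses "own", switches to "other" and reaches step 4, while "t2" chooses "own" and then disappears, `summarize_chains(events, step=3)` marks "t1" as advanced with work "other", and `drop_off_by_choice` reports `{"own": 1}`.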
What do you think? Which method is least likely to have me shooting myself in
the foot? Currently I am leaning towards using unload handlers.
Towards the end of developing features in Media Viewer, Pau asked us to
correct a lot of the layout that was provided by the OOUI defaults. In fact
I know there are still some elements that aren't styled the way Pau wants
them to be, that we couldn't improve due to lack of time.
I think I recall that this happened because OOUI doesn't implement Agora
styles yet, which I understand to be the standard look as defined by the
Design team. Is that correct? Or were the styling issues specific to Pau's
design for Media Viewer?
Since we're embarking on Upload Wizard work, whether we will use OOUI
for it will soon be debated, and I'd like to clarify whether we're likely
to have to do that sort of patching on top of existing CSS again in order
to comply with unified styling.
If there is a plan to make OOUI CSS comply with the unified styling, what's
the status of that? Is anyone actively working on it?
https://www.mediawiki.org/wiki/UX_standardization seems to list it as a
TODO, but there's no timeline or reference to who's taking on that
task.
Hi all,
We would appreciate your help to come up with a class name that community members can use to exclude an image from Media Viewer or related tools.
Too many small files (like icons, flags, etc.) appear in Media Viewer for some articles, even though they are unrelated to the topic of the article. Other image files also need to be excluded, because they are not suitable for Media Viewer (such as maps using weird CSS/JS tricks, or images which use a clipping template).
Many community members have reported this issue, which delivers an unpleasant browsing experience for users who only want to view images that are relevant for the article they are reading (and which are supported by Media Viewer).
We agree that this is an important issue. The most practical way to address it would require editors to add a .metadata class to the images they don’t want to show on a page, as proposed here:
https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/511
We just need to come up with a class name people are happy with for excluding an image from Media Viewer or related tools. We already exclude images which have a .metadata class, but there are images that aren't really metadata but still not appropriate.
Any ideas? What class name do you recommend we use to convey this important information?
Here are some possible ideas, to get this conversation started:
* hide
* exclude
* noshow
* ??
It would be best if we agreed on a name that is not tied to Media Viewer, so it can be used by other tools which may have the same needs, now or in the future.
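A minimal sketch of how a tool could honour such a class: walk the page HTML and skip any image that carries an excluded class, or sits inside an element that does. `metadata` is the class already excluded today; `exclude` is just one of the candidate names above, used as a placeholder. This uses Python's `html.parser` for illustration and is not how Media Viewer implements the check.

```python
from html.parser import HTMLParser

EXCLUDE_CLASSES = {"metadata", "exclude"}  # "exclude" is a placeholder candidate name
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ViewerImageCollector(HTMLParser):
    """Collects <img> src values that are not excluded by class."""

    def __init__(self) -> None:
        super().__init__()
        self.stack = []          # (tag, excluded) for open non-void elements
        self.excluded_depth = 0  # number of open excluded ancestors
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        excluded = bool(classes & EXCLUDE_CLASSES)
        if tag == "img":
            if not excluded and self.excluded_depth == 0:
                self.images.append(attrs.get("src"))
        elif tag not in VOID_TAGS:
            self.stack.append((tag, excluded))
            if excluded:
                self.excluded_depth += 1

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                for _, excluded in self.stack[i:]:
                    if excluded:
                        self.excluded_depth -= 1
                del self.stack[i:]
                break


def viewer_images(html: str) -> list:
    parser = ViewerImageCollector()
    parser.feed(html)
    parser.close()
    return parser.images
```

Keeping the class set in one place makes it easy for other tools with the same needs to reuse the same names.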
Once we settle on a class name, we can schedule that feature for development, so editors can filter out unsuitable images for everyone’s viewing pleasure :)
Thanks for your feedback!
Fabrice
_______________________________
Fabrice Florin
Product Manager, Multimedia
Wikimedia Foundation
http://en.wikipedia.org/wiki/User:Fabrice_Florin_(WMF)
For those of you who are not following wikimedia-l: you might be interested
in this discussion about searches with harmless keywords bringing up very
NSFW images (an old and highly controversial topic):
http://thread.gmane.org/gmane.org.wikimedia.foundation/71770/focus=71903
(I am linking the (currently) last few mails of the thread; the discussion
leading up to them is also interesting, but terribly long.)
Greetings!
Here’s our weekly update on what our multimedia team is working on. We hope you find this report helpful.
1. Media Viewer Releases
Today, we enabled Media Viewer (1) by default on Wikimedia Commons. Next week, we will release the tool on the English, German, Italian and Russian Wikipedias, as well as on Wikisource in all languages; if all goes well, we plan to roll it out to all wikis the following week, as outlined in our release plan (2). Overall response has been largely favorable, as reported before (3). Please share your feedback on the discussion page (4).
2. Last week’s sprint
Last week, we worked mostly on Media Viewer and Tech Debt (GWToolset and Image Scaler issues), as outlined in our meeting notepad (5). On the Media Viewer front, we fixed a number of bugs, upgraded our metrics with a global image view dashboard (6), investigated and then ruled out a simple zoom link, and estimated the server load of Media Viewer when deployed to all wikis (7). On the Tech Debt front, we fixed bugs in GWToolset and added a throttle to prevent it from processing too many images at once; we also investigated the root causes of the recent Image Scaler outage, identifying a few promising solutions to work on next.
3. This week’s sprint
This week, we’re planning to split our time evenly between Media Viewer, Tech Debt and Upload Wizard work, as shown in our current sprint board (8). For Media Viewer, we’ll make it easier to discover metadata and add more tooltips to address community requests, as well as run more tests in different browsers like IE9. For Tech Debt, we’ll start generating reference thumbnails to save CPU time on image scalers, as well as lower the frequency of GWToolset jobs. For Upload Wizard, we’re starting to collect usage data on key steps in the upload workflow, as part of a funnel analysis that will tell us where people drop off most often; we’re also analyzing the feedback and bugs backlog, to identify major pain points -- and generating first design ideas to address them; lastly, we’re getting UploadWizard ready for jQuery 1.9, so we can get more familiar with the current code base.
4. Next steps
Through the middle of June, we plan to gradually spend more time on Upload Wizard, once Media Viewer has been successfully deployed, as shown in our current cycle board (9). In coming weeks, we will host a number of community discussions to prioritize key issues and review possible solutions together. Based on this feedback, we aim to implement some of the most promising solutions through the end of the summer.
5. Thanks
We’re very grateful to all the community and team members who keep guiding our progress at each step of the way. We couldn’t do this work without you -- and consider ourselves lucky to have such great partners. Today, we would like to give special thanks to Aaron Arcos, a former Google employee who volunteered for 5 months with our team, and was a wonderful coach and collaborator: he helped us focus on quality, refine our agile development process and get serious about unit testing, as described in the exit video he put together, with interviews from our team (10).
We look forward to more great collaborations with you all. :)
Be well,
Fabrice - for the Multimedia Team
(1) About Media Viewer: https://www.mediawiki.org/wiki/Multimedia/About_Media_Viewer
(2) Large Wiki Releases: https://www.mediawiki.org/wiki/Multimedia/Media_Viewer/Release_Plan#Large_W…
(3) Survey Report: https://www.mediawiki.org/wiki/Multimedia/Media_Viewer/Survey/Results_-_05-…
(4) Media Viewer page: https://www.mediawiki.org/wiki/Talk:Multimedia/About_Media_Viewer
(5) Multimedia Sprint Notes: http://etherpad.wikimedia.org/p/multimedia-weekly-meeting-2014-05-14
(6) Global Image View Dashboard: http://multimedia-metrics.wmflabs.org/graphs/mmv_image_views_global
(7) Global Server Load Estimate: https://www.mediawiki.org/wiki/Multimedia/Metrics/Estimations
(8) Current Sprint: http://ur1.ca/gtyrp
(9) Current Cycle Board: http://ur1.ca/h7w5s
(10) Volunteering at Wikipedia: https://www.youtube.com/watch?v=RxW8TMMA05k
_______________________________
Fabrice Florin
Product Manager, Multimedia
Wikimedia Foundation
https://www.mediawiki.org/wiki/User:Fabrice_Florin_(WMF)
On Wed, May 14, 2014 at 12:09 PM, Greg Grossmeier <greg(a)wikimedia.org> wrote:
> https://bugzilla.wikimedia.org/show_bug.cgi?id=53770
>
> Error:
> You do not have permission to move this page, for the following
> reason:
> The target filename is invalid
>
>
> There are some examples in the bug comments. Bawolff said this:
> "Taking a brief look over the code. This error appears to be coming from
> LocalFileMoveBatch::doDBUpdates in the case it can't update the db.
> However in the case of that error, it's supposed to roll back the db
> transaction, and bail, all before touching any files on the file
> system..."
>
>
> Help?
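The ordering Bawolff describes (update the database in a transaction, and on failure roll back and bail before any file is touched) can be sketched generically. This uses Python's `sqlite3` and `shutil` with a hypothetical `files` table and a hypothetical `move_file` helper; it is not the code of `LocalFileMoveBatch::doDBUpdates`.

```python
import shutil
import sqlite3
from pathlib import Path


class MoveError(Exception):
    pass


def move_file(conn: sqlite3.Connection, root: Path, old: str, new: str) -> None:
    """Rename a file record, then move the file on disk only if the DB step succeeded."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            cur = conn.execute("UPDATE files SET name = ? WHERE name = ?", (new, old))
            if cur.rowcount != 1:
                raise MoveError(f"no row updated for {old!r}")
    except (sqlite3.DatabaseError, MoveError) as exc:
        # The transaction has already been rolled back; bail before touching files.
        raise MoveError(f"cannot move {old!r} to {new!r}") from exc
    # Note: if this move fails, the DB row is already committed; a real
    # implementation would need to handle that case as well.
    shutil.move(str(root / old), str(root / new))
```

In this sketch, a rename to a name that already exists (for example, a `UNIQUE` constraint violation) raises `MoveError`, and both the database and the files on disk are left unchanged.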
Hi folks,
Cross-posting to the Multimedia team list, since I think we should
collectively come up with a plan here. Aaron Schulz is a likely
candidate to help out from the MediaWiki Core side, but we are pretty
likely to need at least review help from Multimedia.
I believe Aaron already has a rather ambitious fix for this in Gerrit:
https://gerrit.wikimedia.org/r/#/c/127460/
However, the change is risky, and probably means breaking more things
than it fixes at first. Thoughts on next steps?
Rob