The analytics team recently produced some breakdowns of mobile usage
across devices. It would be great if we could regularly publish our top
ten. I would say we should support anything over 1%.
cc'ing analytics.
On Mon, Apr 7, 2014 at 10:56 AM, Tomasz Finc <tfinc(a)wikimedia.org> wrote:
> Wikimedia Mobile Web Team,
>
> Where is our latest list of top phones to test on? This seems out of date
>
> https://www.mediawiki.org/wiki/Compatibility#Mobile_browsers
>
> and i can't find our other matrix chart.
>
> --tomasz
>
> On Fri, Apr 4, 2014 at 7:54 AM, Megan Hernandez
> <mhernandez(a)wikimedia.org> wrote:
>> Hey Tfinc,
>>
>> We're getting into testing more mobile banners over here in fundraising. At
>> the moment, we're testing these out on our mix of personal phones, friends,
>> & online simulators. So we want to get our readers' most-used phones for
>> internal testing.
>>
>> Do you guys already have a list of the top devices we should get based on
>> our reader usage? Any best practices / tips you can share about internal
>> mobile testing before our new banners hit the site? At the moment, we are
>> testing the banners out ourselves on a variety of devices, putting them into
>> usertesting.com, watching readers in focus groups and of course on loyal
>> friends/guinea pigs. Would love to hear your tips.
>>
>> TY for the info & go mobile,
>>
>> Megan
>>
>>
>> --
>>
>> Megan Hernandez
>>
>> Director of Online Fundraising
>> Wikimedia Foundation
>
> _______________________________________________
> Mobile-l mailing list
> Mobile-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/mobile-l
--
Jon Robson
* http://jonrobson.me.uk
* https://www.facebook.com/jonrobson
* @rakugojon
Hi all,
We finished our sprint last Tuesday and made plans for the next one on
Thursday, and I wanted to share the updates.
We're currently in the middle of comparing project management tools so the
links will be pretty random. Sorry about that. The Foundation is performing
a review of many of the tools we use and we are participating in this
review. [1]
We finished the two projects that are part of the Editor Engagement Vital
Signs project that we had committed to for the sprint:
- WikiMetrics: User Makes Report Public (https://fab.wmflabs.org/T76)
- WikiMetrics: User Creates Scheduled Report (
https://fab.wmflabs.org/T77)
These features are currently available in our Staging environment
(https://metrics-staging.wmflabs.org/)
We were also able to finish the following tasks:
- Camus productized (Kafka import into Hadoop) and running
- Updated Report Card for Metrics Meeting
- Gerrit taken down by labs bot
- Bits traffic now being ingested into Hadoop via Kafka
- Helped Multimedia Team set up maps in Limn
We fixed the following defects:
- 63684 Stop analytics-internal(a)wikimedia.org (no "lists." in domain)
address from delivering only to part of analytics-internal list
- 63531 Geowiki not working on stat1003
- 63470 analytics1012 fails Hadoop applications and jobs
- 63505 Revoke permission to plain users to modify files underneath
/a/squid on stats1002
- 63371 Wikipedia Zero job for 2014-03-01 failed on Hadoop with
"java.io.IOException:
stored gzip size doesn't match decompressed size"
- 63773 Default project not available (arwiki)
- 62456 Getting wrong output from Result
- 60581 scripts/install did something that broke my pip installation
- 63879 Incomplete monthly aggregated page view files
- 63884 Daily aggregation of page views stats breaks on invalid input
(sort error)
- 63478 Tests for wikistats
For the new sprint, the team has committed to the following tasks:
- Spike on commit process (http://fab.wmflabs.org/T103)
- Deploy wikimetrics to production (http://fab.wmflabs.org/T109)
- 64026 Wrong name is being shown in CSV output. Part I. Refactor of
metrics
- 63836 encoding issue of validating cohort
Please let me know if you have any questions.
thanks,
-Toby
[1] https://www.mediawiki.org/wiki/Requests_for_comment/Phabricator
Hi,
in case you stumble upon Icinga having slave lag checks for the
analytics slaves disabled, please leave them disabled.
Springle told me that they are disabled on purpose, as they went off
too often (due to slow queries run by analytics :-) ).
A separate machine to run slow queries on is on the way. Once it is
there, and slow queries have been migrated over, we'll turn on Icinga
alerts for the normal analytics slaves again.
Bug 64088
https://bugzilla.wikimedia.org/show_bug.cgi?id=64088
keeps track of the disabled Icinga monitoring.
Have fun,
Christian
--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Gruendbergstrasze 65a Email: christian(a)quelltextlich.at
4040 Linz, Austria Phone: +43 732 / 26 95 63
Fax: +43 732 / 26 95 63
Homepage: http://quelltextlich.at/
---------------------------------------------------------------
The Wikimedia stats portal <http://stats.wikimedia.org/> now features more
tools and reports than ever (57 and growing). An often-heard complaint was
that the portal was a bit overwhelming and hard to navigate. Two changes
will hopefully help you find what you need with more ease.
First, all entries are now in one huge list, with no artificial breakdown
between internal and external tools. By itself this list may be even more
daunting in size, but the new search feature
<http://stats.wikimedia.org/cgi-bin/search_portal.pl> aims to address just
that.
Read more (and please comment there):
http://infodisiac.com/blog/2014/04/portal-stats-wikimedia-org-can-now-be-searched/
Erik Zachte
Sending it to the right Analytics email address....
On Wed, Apr 16, 2014 at 1:22 PM, Gilles Dubuc <gilles(a)wikimedia.org> wrote:
> Including the analytics team in case they have a magical solution to our
> problem.
>
> Currently, our graphs display the mean and standard deviation of metrics,
> as provided in "mean" and "std" columns coming from our tsvs, generated
> based on EventLogging data:
> http://multimedia-metrics.wmflabs.org/dashboards/mmv However we already
> see that extreme outliers can make the standard deviation and mean
> skyrocket and as a result make the graphs useless for some metrics. See
> France, for example, for which a single massive value was able to skew the
> map into making the country look problematic:
> http://multimedia-metrics.wmflabs.org/dashboards/mmv#geographical_network_p…
> There's no performance issue with France, but the graph suggests that is
> the case because of that one outlier.
>
> Ideally, instead of using the mean for our graphs, we would be using what
> is called the "trimmed mean", i.e. the mean of all values excluding the
> upper and lower X percentiles. Unfortunately, MariaDB doesn't provide that
> as a function and calculating it with SQL can be surprisingly complicated,
> especially since we often have to group values for a given column. The best
> alternative I could come up with so far for our geographical queries was to
> exclude values that differ more than X times the standard deviation from
> the mean. It kind of flattens the mean. It's not ideal, because I think
> that in the context of our graphs it makes things look like they perform
> better than they really do.
>
> I think the main issue at the moment is that we're using a shell script to
> pipe a SQL request directly from db1047 to a tsv file. That limits us to
> one giant SQL query, and since we don't have the ability to create
> temporary tables on the log database with the research_prod user, we can't
> preprocess the data in multiple queries to filter out the upper and lower
> percentiles. The trimmed mean would be kind of feasible as a single
> complicated query if it wasn't for the GROUP BY:
> http://stackoverflow.com/a/8909568
>
> So, my latest idea for a solution is to write a python script that will
> import the section (last X days) of data from the EventLogging tables that
> we're interested in into a temporary sqlite database, then proceed with
> removing the upper and lower percentiles of the data, according to any
> column grouping that might be necessary. And finally, once the data
> preprocessing is done in sqlite, run similar queries as before to export
> the mean, standard deviation, etc. for given metrics to tsvs. I think using
> sqlite is cleaner than doing the preprocessing on db1047 anyway.
>
> It's quite an undertaking, it basically means rewriting all our current
> SQL => TSV conversion. The ability to use more steps in the conversion
> means that we'd be able to have simpler, more readable SQL queries. It
> would also be a good opportunity to clean up the giant performance query
> with a bazillion JOINS:
> https://gitorious.org/analytics/multimedia/source/a949b1c8723c4c41700cedf6e…
> It can actually be divided into several data sources all used in the
> same graph.
>
> Does that sound like a good idea, or is there a simpler solution out there
> that someone can think of?
>
>
>
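The trimmed-mean preprocessing described above can be sketched in a few lines
of Python. This is a minimal illustration, not the actual pipeline code; the
function names and the 10% cutoff are assumptions for the example.

```python
# Sketch of a trimmed mean with grouping: drop the lowest and highest
# `trim` fraction of values within each group, then average what remains.
# This is what "exclude the upper and lower X percentiles" does to an
# outlier like the single massive France value.

def trimmed_mean(values, trim=0.10):
    """Mean of values after discarding the lowest and highest `trim` fraction."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)

def trimmed_mean_by_group(rows, trim=0.10):
    """rows: iterable of (group, value) pairs, e.g. (country, duration_ms)."""
    groups = {}
    for group, value in rows:
        groups.setdefault(group, []).append(value)
    return {g: trimmed_mean(vals, trim) for g, vals in groups.items()}
```

With nine measurements of 1 and one outlier of 1000, the plain mean is about
101 while the 10%-trimmed mean stays at 1, which is the effect wanted for the
geographical graphs.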
Just in case this is news to you: WMF is in the process of shutting down
our Tampa datacenter. The stat1 server that you know and love is in Tampa,
and will be shut down along with most of the rest of Tampa in a couple of
weeks. stat1003 is a new replacement server for stat1 in our Ashburn
datacenter.
stat1003.wikimedia.org is up and running now! Over the last week we did an
audit of user accounts on stat1. We wanted to trim down the list of users
that had access to ones that actually used that access. (The complete list
of migrated accounts is in this etherpad:
http://etherpad.wikimedia.org/p/stat1_accounts, under the 'Keep' heading.)
For the most part, everything will be the same on stat1003 as it was on
stat1. Home directories have been rsynced over (as of April 3), and /a has
been fully rsynced over as well (as of April 2nd). I will rsync /a one
last time before stat1 is decommissioned. Crontabs have also
been migrated, so any cronjobs you had on stat1 are now also running on
stat1003.
There are a very few differences:
- stat1003.wikimedia.org is the new hostname.
If there is a desire for a stat1 redirect/cname to stat1003, let me know.
I don't plan on setting one up otherwise.
- stat1003 does not allow direct ssh.
You must use bastion hosts (bast1001.wikimedia.org) to ssh in. Add the
following to your .ssh/config file to do this:
Host stat1003.wikimedia.org
  ProxyCommand ssh -e none bast1001.wikimedia.org exec nc -w 3600 %h %p
This will fail if you don't have an account on bast1001. You should have
one! If this doesn't work for you, let me know and we will fix that asap.
- /a has been renamed to /srv
We are trying to use /srv rather than /a on all new servers, in order to
keep more in line with Linux FHS: http://www.pathname.com/fhs/. I have set
up a symlink from /a -> /srv on stat1003, so if you have scripts that rely
on the /a absolute path, they should continue to work on stat1003
without modification.
- Firewall!
stat1003 still has a public IP, but it also has pretty restrictive firewall
rules in place. If you need access to a service on stat1003, please submit
an RT ticket to open a hole in this firewall. This will allow us to be
more careful about which services running on stat1003 are accessible to
the outside world.
Tampa will be shut down soon, and I need time to let you all migrate, and
also time enough to decommission stat1 before everything is turned off.
Please make sure stat1003 works for you and everything is as it should be
before Friday, April 11th. After that date I plan to shut down stat1.
Thanks! Don't hesitate to let me know if you need any help.
-Andrew Otto
---------- Forwarded message ----------
From: Andrew Otto <otto(a)wikimedia.org>
Date: Tue, Mar 25, 2014 at 12:19 PM
Subject: stat1 account audit
To: Analytics List <analytics(a)lists.wikimedia.org>, Development and
Operations Engineers <engineering(a)lists.wikimedia.org>, matanya <
matanya(a)foss.co.il>, Operations Engineers <ops(a)lists.wikimedia.org>
Hi all!
We will soon be migrating everything on stat1 over to a new server in
eqiad: stat1003. For the most part, data, accounts and cronjobs will be
copied over exactly as they are. However, stat1 has been around for a
while, and there are quite a few accounts on there, many of which are
probably not used. We're doing a little audit to see which accounts we
don't need to migrate to the new server.
I've pasted a list of names below that we are not sure about. None of
these users have logged in in the last few weeks at least.
If you see a name there and you know that it SHOULD DEFINITELY have an
account on the new stat1003 server, please let me know via a reply by
Tuesday April 1.
See also: https://rt.wikimedia.org/Ticket/Display.html?id=6789
Thanks!
-Andrew Otto
Add Analytics to cc, as I think they'll be interested as well :)
On Thu, Mar 27, 2014 at 3:20 AM, Yuvi Panda <yuvipanda(a)gmail.com> wrote:
> Hello!
>
> We are getting closer to a general release of the Wikipedia Android
> and iOS apps, and I think we should standardize on a User-Agent
> format. The old app just appended an identifier in front of the
> phone's default UA[1] but I think we can do better, to avoid privacy
> concerns[2].
>
> How about:
>
> WikipediaApp/<version> <OS>/<form-factor>/<version>
>
> This gives us all the info we need (App version, OS, Form Factor
> (Tablet / Phone) and OS version) without giving away too much. It is
> also fairly simple to construct and parse.
>
> For the latest alpha, my Nexus 4 would generate
>
> WikipediaApp/32 Android/Phone/4.4
>
> While an iOS device might generate
>
> WikipediaApp/2.0 iOS/Phone/7.1
>
> form-factor would just be Phone|Tablet for now, and can be expanded
> later if necessary.
>
> Thoughts?
>
> [1]: https://www.mediawiki.org/wiki/Mobile/User_agents#Apps
> [2]: https://www.mediawiki.org/wiki/EventLogging/UserAgentSanitization
> --
> Yuvi Panda T
> http://yuvi.in/blog
--
Yuvi Panda T
http://yuvi.in/blog
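The proposed "WikipediaApp/<version> <OS>/<form-factor>/<version>" format is
simple enough that a build/parse round trip fits in a few lines. This is just
a sketch of the proposal; the helper names are illustrative and not from the
apps' actual codebases.

```python
# Build and parse the proposed User-Agent format:
#   WikipediaApp/<version> <OS>/<form-factor>/<version>
# e.g. "WikipediaApp/32 Android/Phone/4.4"

def build_user_agent(app_version, os_name, form_factor, os_version):
    return "WikipediaApp/%s %s/%s/%s" % (app_version, os_name, form_factor, os_version)

def parse_user_agent(ua):
    # Split the app token from the platform token, then split each on "/".
    app_part, platform_part = ua.split(" ", 1)
    app_name, app_version = app_part.split("/", 1)
    os_name, form_factor, os_version = platform_part.split("/", 2)
    return {"app": app_name, "app_version": app_version, "os": os_name,
            "form_factor": form_factor, "os_version": os_version}
```

Parsing "WikipediaApp/32 Android/Phone/4.4" yields the app version, OS, form
factor, and OS version without exposing anything else about the device, which
is the privacy point of the proposal.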
The next Research & Data showcase will be live-streamed tomorrow Wed 4/16 at 11.30 PT.
The streaming link will be posted on the lists a few minutes before the showcase starts and you can join the conversation on IRC at #wikimedia-research.
We look forward to seeing you!
Dario
This month:
WikiProjects yesterday, today and tomorrow (Jonathan Morgan)
In this talk I’ll give an overview of some research [1] [2] on English Wikipedia Wikiprojects: what kind of work they do, how they do it, and how they have changed over time.
Visualizing Wikipedia Communities using Gephi (Haitham Shammaa)
I will introduce Gephi as a tool for generating visualized representations of Wikimedia project communities. Gephi is open-source network analysis and visualization software, used here to generate graphs that represent users and the interactions among them, based on how frequently they send messages to each other on their talk pages.
Hi,
just a quick heads up that the replication lag on analytics' s1 slave
(s1-analytics-slave.eqiad.wmnet, db1047.eqiad.wmnet) has risen to
>12 hours.
I filed RT ticket 7265:
https://rt.wikimedia.org/Ticket/Display.html?id=7265
Best regards,
Christian
Update on this:
The analytics data on db67 has been backed up and mysqld is now shut down
so the host can be shipped to the new DC. The disks will be wiped before
transit.
In case people didn't see the earlier warning to these lists, or you
discover something critical was talking to db67 after all, let me know and
I can load the backup somewhere for read-only access.
On Tue, Apr 1, 2014 at 11:54 AM, Sean Pringle <springle(a)wikimedia.org> wrote:
>
>>
>> Hi!
>>
>> As PMTPA is going away, db67 is on the list of hosts to be
>> decommissioned. You guys have personal databases on there and it still
>> replicates S1.
>>
>> How does db67 get used these days? Is it simply an S1 slave less likely
>> to be suffering replag while db1047 does the heavy lifting, or is it more
>> special?
>>
>> Note that db67 is actually one of a batch getting shipped to the new DC,
>> so you will probably get it back under a new name. But that will take time,
>> obviously.
>>
>
--
DBA @ WMF