Sure,
+analytics
On Mon, Jun 8, 2015 at 5:50 PM, Adam Baso <abaso(a)wikimedia.org> wrote:
> Okay to move this to the public list and remove the internal list?
>
>
> On Monday, June 8, 2015, Joseph Allemandou <jallemandou(a)wikimedia.org>
> wrote:
>
>> Hi,
>> In fact, instead of using the isAppPageview UDF, one should use
>> access_method = 'mobile app' :)
>>
>> On Mon, Jun 8, 2015 at 12:44 PM, Marcel Ruiz Forns <mforns(a)wikimedia.org>
>> wrote:
>>
>>> + analytics internal
>>>
>>> Hi Jon and Adam,
>>>
>>> Yes, this totally helps. It confirms the work that we are doing.
>>>
>>> In fact, 3 of the items you list are already working and available for
>>> querying:
>>>
>>> - app (yes/no), through the UDF in hive 'isAppPageview()'
>>> - OS (android, iOS, other), through the user_agent_map['os_family']
>>> field
>>> - OS version, through the user_agent_map['os_major'] field
>>>
>>> And, as discussed, we'll shortly add the possibility of querying for
>>> the 'app version' through the user_agent_map['app_version'] field.
>>>
>>> Thanks!
>>>
>>> Marcel
>>>
>>>
>>> On Fri, Jun 5, 2015 at 10:48 PM, Jon Katz <jkatz(a)wikimedia.org> wrote:
>>>
>>>> +Adam, who has been diving deep into this stuff lately.
>>>>
>>>> Hi Marcel,
>>>> I am a bit swamped right now, so I can't look at the tickets, but of the
>>>> strings you showed, the fields most important to me are:
>>>>
>>>> - *app (yes/no)*
>>>> - *OS (android, iOS, other)*
>>>> - *app version (numeric)*
>>>> - tablet/phone (iOS only, right?)
>>>> - OS version (is this possible?)
>>>>
>>>> Bolded are big deals :) Does this help?
>>>> Thanks!
>>>> -J
>>>>
>>>> On Fri, Jun 5, 2015 at 11:13 AM, Marcel Ruiz Forns <
>>>> mforns(a)wikimedia.org> wrote:
>>>>
>>>>> Hi Jon,
>>>>>
>>>>> This is Marcel from Analytics. How are you?!
>>>>>
>>>>> I am developing the analytics-refinery-source code that will parse the
>>>>> missing user-agent info for the mobile app requests. See the task:
>>>>> https://phabricator.wikimedia.org/T99932
>>>>>
>>>>> I think I understand what your team wants in that respect, so I have
>>>>> already implemented the functionality. You can check it here:
>>>>> https://gerrit.wikimedia.org/r/#/c/216060/
>>>>> But I still wanted to confirm with you :-)
>>>>>
>>>>> Considering these user agent strings:
>>>>>
>>>>> WikipediaApp/2.0-r-2015-04-23 (Android 5.0.1; Phone) Google Play
>>>>> WikipediaApp/4.1.2 (iPhone OS 8.3; Tablet)
>>>>>
>>>>> The program should take the part after "WikipediaApp/" and before the
>>>>> next " " (space), for example: "2.0-r-2015-04-23" or "4.1.2", and
>>>>> store it as part of the user agent map, in a field named, e.g.,
>>>>> "app_version", so that it is easily queryable like: SELECT
>>>>> user_agent_map["app_version"] FROM ...
>>>>>
>>>>> Is that right?
>>>>>
>>>>> That was the question. Thanks!
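The extraction rule described above can be sketched roughly as follows (a Python illustration of the rule only, not the actual refinery-source UDF; the function name is made up):

```python
import re

# Illustrative sketch of the rule described above: take the token after
# "WikipediaApp/" up to the next space. Not the actual refinery code.
APP_VERSION_RE = re.compile(r"WikipediaApp/(\S+)")

def extract_app_version(user_agent):
    """Return the app version from a WikipediaApp user agent, or None."""
    match = APP_VERSION_RE.search(user_agent)
    return match.group(1) if match else None

print(extract_app_version(
    "WikipediaApp/2.0-r-2015-04-23 (Android 5.0.1; Phone) Google Play"))
# 2.0-r-2015-04-23
print(extract_app_version("WikipediaApp/4.1.2 (iPhone OS 8.3; Tablet)"))
# 4.1.2
```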
>>>>>
>>>>> Marcel
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>> _______________________________________________
>>> Analytics-internal mailing list
>>> Analytics-internal(a)lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/analytics-internal
>>>
>>>
>>
>>
>> --
>> *Joseph Allemandou*
>> Data Engineer @ Wikimedia Foundation
>> IRC: joal
>>
>
>
>
+External
Hi,
I realized I wasn't getting any responses from internal, but Joseph sent me
something helpful this morning, so I saw all the responses up to that
point. I think.
Anyway, thanks for the help!! The strange thing is that the numbers I get
don't make much sense to me.
For beta (using the query below) I get:
Unique IPs num_pvs referrer
3638 5967 external
1972 5760 internal
I would have expected a much larger external-to-internal referrer ratio. In
other words, I would have expected that the vast majority of sessions, or
even IPs, only hit the site once in a given hour. Instead, I am seeing that
54% of IPs are clicking a link within that hour... I would probably expect
to see numbers no higher than 10%.
I am probably doing something wrong, right? I *know* that I am making
convenient assumptions here that do not apply to edge cases, so let's not
consider those unless you think they make a big difference. Perhaps by
using the referer field I am inherently leaving out all of the external
traffic for which we do not have data?
Thanks!
-J
SELECT
COUNT(DISTINCT ip) AS Unique_IPs,
x_analytics_map['mf-m'] AS mobile_site, count(*) AS num_pvs,
CASE WHEN referer LIKE "%en.m.wikipedia%" THEN 'internal' ELSE 'external'
END AS session_depth
FROM
wmf.webrequest
WHERE TRUE = TRUE
AND webrequest_source = 'mobile'
AND year = 2015
AND month = 5
AND day = 25
and hour = 1
AND agent_type = "user"
AND is_pageview = TRUE
AND x_analytics_map['mf-m'] IS NOT NULL
AND uri_host like "%en.m.wikipedia.org%"
GROUP BY
CASE WHEN referer LIKE "%en.m.wikipedia%" THEN 'internal' ELSE 'external'
END,
x_analytics_map['mf-m']
ORDER BY num_pvs DESC
LIMIT 50;
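For what it's worth, the 54% figure can be reproduced from the two distinct-IP counts above (assuming it compares internal-referred IPs against external-referred ones):

```python
# Reproducing the 54% figure from the counts above (assumption: it is
# the ratio of internal-referred IPs to external-referred IPs).
external_ips, internal_ips = 3638, 1972

print(round(internal_ips / external_ips * 100, 1))  # 54.2

# Note the two IP groups can overlap: the same IP is counted under
# "external" for its entry pageview and again under "internal" for any
# follow-up click, so the columns do not sum to distinct IPs overall.
```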
On Thu, May 28, 2015 at 2:30 PM, Jon Katz <jkatz(a)wikimedia.org> wrote:
> Hi,
> Trying to run a hive query to rough-count number of 1-page-only,
> 'sessions' on mobile-web Here is the error I get
>
>
> FAILED: ParseException line 15:22 missing KW_END at 'device_family' near
> 'device_family'
> line 15:35 missing EOF at ''] <> "Spider"\n AND is_pageview = TRUE\n AND
> x_analytics_map['' near 'device_family
>
> Here is the query:
>
> SELECT
> COUNT(DISTINCT ip) AS hits,
> x_analytics_map['mf-m'] AS mobile_site, count(*) AS num_pvs,
> CASE
> WHEN referer LIKE "%en.m.wikipedia%"
> THEN 'internal'
> ELSE 'Misc’
> END AS session_depth
> FROM
> wmf.webrequest
> WHERE
> YEAR = 2015
> AND MONTH = 5
> AND DAY = 25
> AND user_agent_map['device_family'] <> "Spider"
> AND is_pageview = TRUE
> AND x_analytics_map['mf-m'] IS NOT NULL
> AND uri_host like "%en.m.wikipedia.org%"
> GROUP BY session_depth, mobile_site
> ORDER BY hits DESC
> LIMIT 50;
>
>
> Any advice?
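One likely culprit, for the record: the ELSE branch in the query above closes its string with a curly quote ('Misc’ rather than 'Misc'), so the parser never sees the string end and chokes further down, near 'device_family'. (Hive of that era also rejected SELECT aliases such as session_depth in GROUP BY.) A quick, hypothetical pre-flight check for stray curly quotes:

```python
# Scan a query string for non-ASCII "curly" quotes, which Hive's parser
# treats as ordinary characters rather than string delimiters.
SMART_QUOTES = "\u2018\u2019\u201c\u201d"  # left/right single and double quotes

def find_smart_quotes(sql):
    """Return (index, character) pairs for any curly quotes in the text."""
    return [(i, ch) for i, ch in enumerate(sql) if ch in SMART_QUOTES]

print(find_smart_quotes("ELSE 'Misc\u2019 END AS session_depth"))
# [(10, '’')]
```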
>
> Thanks!
>
> Jon
>
>
>
>
Hi Pine,
I don't know, frankly. A lot of people believe Visual Editor would help, and
I can understand why. I'd love to see numbers from a pilot test. I thought
there was one at some point?
If I remember correctly, Visual Editor is now hidden under people's user
options? Could be interesting to see if male or female editors are using
it now if that data exists.
Best,
Jason
p.s. switching away from digests since I guess I can't respond in-thread
--
Jason Radford
Doctoral Student, Sociology, University of Chicago
Visiting Researcher, Lazer Lab, Northeastern University
*Connect*: LinkedIn <http://www.linkedin.com/in/jsradford>, Twitter
<http://www.twitter.com/jsradford>, University of Chicago
<http://home.uchicago.edu/%7Ejsradford/>
*Play Games for Science at Volunteer Science
<http://www.volunteerscience.com>*
Hi,
Are there any easy to see statistics about the survival rate of
newly-created pages in Wikipedias in different languages?
I need this for understanding the success of ContentTranslation, which is
primarily an article creation tool.
I couldn't find anything like this on stats.wikimedia.org. It does have
the number of created pages per day. For en.wikipedia, for example, it's
about 800. But how many are deleted the same day ("speedy")? Knowing that
alone would be very useful, and there are other possible questions, such
as: How many are deleted within a week or a month? What is the age
distribution of the articles that are deleted every day - how many of them
were created the same day, how many were created a year ago, and so on.
Using a simple (and possibly wrong - I don't do this often) query,[1] I
found that around 500 or 600 deletions happen each day in the English
Wikipedia. Does this sound sensible? Is there a better query that I could
run, or a dashboard where I could see such a thing conveniently? And of
course, I'd love to see it for all languages and not just English.
Thanks for any help!
[1] SELECT max(ar_id), ar_title, ar_timestamp FROM `archive` WHERE
ar_namespace = 0 AND ar_timestamp BETWEEN '20150521000000' AND
'20150521235959' GROUP BY ar_title ORDER BY NULL;
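A sketch of the age-distribution idea above, in Python rather than SQL (hypothetical helper names; it assumes you can pair each deleted title's creation and deletion timestamps, e.g. by joining archive rows on ar_title):

```python
from collections import Counter
from datetime import datetime

# Bucket deletions by article age at deletion time, given
# (creation, deletion) pairs in MediaWiki's 14-digit timestamp format.
def parse_mw_ts(ts):
    return datetime.strptime(ts, "%Y%m%d%H%M%S")

def age_bucket(created, deleted):
    days = (parse_mw_ts(deleted) - parse_mw_ts(created)).days
    if days < 1:
        return "same day"
    if days < 7:
        return "within a week"
    if days < 30:
        return "within a month"
    return "older"

pairs = [
    ("20150521083000", "20150521120000"),  # a "speedy", same-day deletion
    ("20150515120000", "20150521120000"),  # six days old
    ("20140521120000", "20150521120000"),  # a year old
]
counts = Counter(age_bucket(c, d) for c, d in pairs)
print(counts)  # same day: 1, within a week: 1, older: 1
```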
--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces,
I want to live in peace.” – T. Moore
Many people on these lists design and use tools that depend on action=query (beyond bots). If you do, please read the following:
> Begin forwarded message:
>
> From: "Brad Jorsch (Anomie)" <bjorsch(a)wikimedia.org>
> Subject: [Wikitech-l] API BREAKING CHANGE: Default continuation mode for action=query will change at the end of this month
> Date: June 2, 2015 at 10:42:47 PM GMT+2
> To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>, mediawiki-api-announce(a)lists.wikimedia.org
> Reply-To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
>
> As has been announced several times (most recently at
> https://lists.wikimedia.org/pipermail/wikitech-l/2015-April/081559.html),
> the default continuation mode for action=query requests to api.php will be
> changing to be easier for new coders to use correctly.
>
> *The date is now set:* we intend to merge the change to ride the deployment
> train at the end of June. That should be 1.26wmf12, to be deployed to test
> wikis on June 30, non-Wikipedias on July 1, and Wikipedias on July 2.
>
> If your bot or script is receiving the warning about this upcoming change
> (as seen here
> <https://www.mediawiki.org/w/api.php?action=query&list=allpages>, for
> example), it's time to fix your code!
>
> - The simple solution is to simply include the "rawcontinue" parameter
> with your request to continue receiving the raw continuation data (
> example
> <https://www.mediawiki.org/w/api.php?action=query&list=allpages&rawcontinue=1>).
> No other code changes should be necessary.
> - Or you could update your code to use the simplified continuation
> documented at https://www.mediawiki.org/wiki/API:Query#Continuing_queries
> (example
> <https://www.mediawiki.org/w/api.php?action=query&list=allpages&continue=>),
> which is much easier for clients to implement correctly.
>
> Either of the above solutions may be tested immediately, you'll know it
> works because you stop seeing the warning.
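For anyone converting, the simplified continuation loop from that documentation looks roughly like this (a sketch: the fetch function is injected so the loop is shown without HTTP plumbing; in practice it would GET api.php with these parameters):

```python
# Sketch of the simplified continuation loop documented at
# API:Query#Continuing_queries: send continue="", then merge each
# response's 'continue' object into the next request until it is absent.
def query_all(fetch, **params):
    """Yield each API response page, following 'continue' until done."""
    request = dict(params, action="query", format="json")
    request["continue"] = ""
    while True:
        result = fetch(request)
        yield result
        if "continue" not in result:
            break
        request = dict(request, **result["continue"])

# A stub standing in for api.php, returning two pages of allpages results.
def fake_fetch(params):
    if "apcontinue" not in params:
        return {"query": {"allpages": [{"title": "A"}]},
                "continue": {"apcontinue": "B", "continue": "-||"}}
    return {"query": {"allpages": [{"title": "B"}]}}

titles = [p["title"]
          for page in query_all(fake_fetch, list="allpages", aplimit="1")
          for p in page["query"]["allpages"]]
print(titles)  # ['A', 'B']
```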
>
> I've compiled a list of bots that have hit the deprecation warning more
> than 10000 times over the course of the week May 23–29. If you are
> responsible for any of these bots, please fix them. If you know who is,
> please make sure they've seen this notification. Thanks.
>
> AAlertBot
> AboHeidiBot
> AbshirBot
> Acebot
> Ameenbot
> ArnauBot
> Beau.bot
> Begemot-Bot
> BeneBot*
> BeriBot
> BOT-Superzerocool
> CalakBot
> CamelBot
> CandalBot
> CategorizationBot
> CatWatchBot
> ClueBot_III
> ClueBot_NG
> CobainBot
> CorenSearchBot
> Cyberbot_I
> Cyberbot_II
> DanmicholoBot
> DeltaQuadBot
> Dexbot
> Dibot
> EdinBot
> ElphiBot
> ErfgoedBot
> Faebot
> Fatemibot
> FawikiPatroller
> HAL
> HasteurBot
> HerculeBot
> Hexabot
> HRoestBot
> IluvatarBot
> Invadibot
> Irclogbot
> Irfan-bot
> Jimmy-abot
> JYBot
> Krdbot
> Legobot
> Lowercase_sigmabot_III
> MahdiBot
> MalarzBOT
> MastiBot
> Merge_bot
> NaggoBot
> NasirkhanBot
> NirvanaBot
> Obaid-bot
> PatruBOT
> PBot
> Phe-bot
> Rezabot
> RMCD_bot
> Shuaib-bot
> SineBot
> SteinsplitterBot
> SvickBOT
> TaxonBot
> Theo's_Little_Bot
> W2Bot
> WLE-SpainBot
> Xqbot
> YaCBot
> ZedlikBot
> ZkBot
>
>
> --
> Brad Jorsch (Anomie)
> Software Engineer
> Wikimedia Foundation
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Hi,
Since participating in the Inspire grant campaign, I got interested in the
question of exactly how many women would be needed on Wikipedia to close
the gender gap. I ran some simulations and came up with some fairly radical
numbers. For example, according to my calculations, there are so few
current and new female editors that, even if every current and new active,
female editor stayed active for ten years, we wouldn't close the gap.
I've posted the results
<https://civilsociology.wordpress.com/2015/05/31/closing-the-gender-gap-on-w…>
to my blog. It's password protected so I can share the results and get
feedback without making it public. You can access them by using the
password "wikipedia". I was hoping you folks on the analytics list would be
keen on catching any missing variables or unrealistic assumptions in the
work.
I appreciate any feedback,
Jason
--
Jason Radford
Doctoral Student, Sociology, University of Chicago
Visiting Researcher, Lazer Lab, Northeastern University
*Connect*: LinkedIn <http://www.linkedin.com/in/jsradford>, Twitter
<http://www.twitter.com/jsradford>, University of Chicago
<http://home.uchicago.edu/%7Ejsradford/>
*Play Games for Science at Volunteer Science
<http://www.volunteerscience.com>*
Hi all - some interesting analysis on the share-a-fact feature from the mobile team.
-Toby
Begin forwarded message:
> From: Adam Baso <abaso(a)wikimedia.org>
> Date: May 21, 2015 at 12:05:29 PDT
> To: mobile-l <mobile-l(a)lists.wikimedia.org>
> Subject: [WikimediaMobile] Share a Fact Initial Analysis
>
> Hello all,
>
> We’ve been looking at some initial results from the Share a Fact feature introduced on the Wikipedia apps for Android and iOS in its basic "minimum viable product" implementation. Here’s some analysis, using data from one day (20150512) with respect to the latest stable versions of the apps (2.0-r-2015-04-23 on Android and 4.1.2 on iOS) for that day.
>
> * On iOS, when a user initiates the first step of the default sharing workflow - tapping the up-arrow box share button (6,194 non-highlighting instances for the day under question) - about 11.7% of the time it yielded successful sharing.
>
> * On Android, it’s not possible to easily tell when the sharing workflow was carried through to successful share, but we anticipate the Android success rate is currently much higher, as general engagement percentage up to the point of picking an app for sharing is higher on Android than on iOS.
>
> * On Android, when presented with the share card preview, 28.0% of the time the ‘Share as image’ button was tapped and 55.5% of the time the 'Share as text' button was tapped, whereas on iOS it was 8.4% ‘Share as image’ and 16.8% ‘Share as text’.
>
> * The forthcoming 4.1.4 version of the iOS app will relax its default sharing snippet generation rules and be more like the Android version in that respect. We anticipate this will result in higher engagement with both the ‘Share as image’ and ‘Share as text’ buttons on iOS, and we should be able to verify this once the 4.1.4 iOS version is released and generally adopted (adoption usually takes 4-5 days after release; 4.1.4 isn’t out yet).
>
> * On the Android app the ‘Share’ option is located on the overflow menu, not as part of the main set of UI buttons. This potentially increases the likelihood of Android users being primed to step through the workflow. On the iOS app, the share button (up-arrow box) is plainly visible from the main UI and not an overflow menu, and this probably creates a different priming dynamic for the iOS demographic.
>
> * When users on iOS tapped on the ‘Share as image’ or ‘Share as text’ buttons, there is a pretty sharp drop off at the next stage - the system sharesheet. Once the sharesheet was presented to iOS users, 41.6% of the time it resulted in active abandonment. We believe this probably has something to do with the relatively small set of default apps listed on the sharesheet and the extra work involved with exposing additional social apps for sharing in that context. As with the Android app, the labels of ‘Share as image’ and ’Share as text’ may also pose something of a hurdle at least for first time users of the feature. To this end, there is an onboarding tutorial planned at least on Android.
>
> * For a one hour period (2015051201) there were about 100 pageviews in some sense attributable to Share a Fact using a provenance parameter available on the latest stable versions of the apps at that time; this may slightly overstate the number of pageviews attributable to the two specific apps reviewed in this analysis, but probably not too much (n.b., previously a different source parameter was used than the new wprov provenance parameter). Pageviews are not the sole motivation for the feature, but following the trendline over the long run should be interesting. Impact on social media and the destinations of shares is a little harder to capture directly, but https://twitter.com/search?f=realtime&q=%40wikipedia%20-%40itzwikipedia%20f… gives one a sense about image shares, at least.
>
> * A couple potential options for increasing sharing include:
>
> ** Trying to add support for sharing to the Photos app on iOS. People may be interested in using images from the Photos apps for various workflows, as Dan Garry has noted.
>
> ** Offering a more concise app picklist, in particular explicitly adding the native OS app components (namely, Twitter and Facebook, and as mentioned, Photos if possible), with an option to expose the sharesheet for additional options if necessary. This is probably also somewhat confined to iOS, although conceivably a similar approach could be possible on Android. On Android the full list of applications in its equivalent of the sharesheet is by default readily available to the user, though.
>
> ** On Android, exposing the diagonal arrow share button on the main interface akin to how the iOS version of the app shows the up-arrow share button. This may introduce more opportunities for sharing (and thus numbers of abandons would go up in tandem with numbers of shares), but would also partially clutter the interface and probably increase abandon. A controlled experiment may be useful for observing the impact of such an approach.
>
> * As a point of reference, for the app versions in scope for this analysis over a single day, there appeared to be approximately 3.78 million Wikipedia for Android pageviews and 1.19 million Wikipedia Mobile for iOS app pageviews. There were about 6.73 million app pageviews on the “modern” versions of these apps total for this particular day, meaning there were about 1.75 million pageviews on other modern versions of the app.
>
> * Examination of the categories of successful shares on iOS showed the following distributions:
>
> Images:
> 48.5% messaging
> 25.5% sharesheet copy
> 22.9% social
> 1.8% productivity
> 0.9% reading
>
>
> Text:
> 53.6% messaging
> 31.9% sharesheet copy
> 7.1% social
> 5.4% reading
> 2.0% productivity
>
>
> Here were some queries used in the analysis:
>
> == SHARE A FACT ATTRIBUTABLE PAGEVIEWS FOR ONE HOUR ==
>
> select wprov, uri_host, count(*) from (select x_analytics_map['wprov'] as wprov, uri_host
> from webrequest where year = 2015 and month = 5 and day = 12 and hour = 1 and is_pageview = true and uri_host like '%.wikipedia.org' and x_analytics_map['wprov'] is not null) t
> group by wprov, uri_host;
>
>
> == PAGE VIEWS FOR THE DAY FOR THE “MODERN” VERSIONS OF THE APPS ==
>
> SELECT
> user_agent, count(*)
> FROM
> wmf.webrequest
> tablesample(BUCKET 1 OUT OF 100 ON rand())
> WHERE
> YEAR = 2015
> AND MONTH = 5
> AND DAY = 12
> AND is_pageview = TRUE
> AND lower(uri_host) like '%.wikipedia.org'
> AND user_agent like 'WikipediaApp%'
> GROUP BY user_agent;
>
>
>
> == HIGHLIGHTING SESSION CASE FOR SPECIFIC VERSIONS OF THE APPS ==
> select CASE
>   WHEN t2.userAgent LIKE 'WikipediaApp/2.0-r-2015-04-23%' THEN 'Android'
>   WHEN t2.userAgent LIKE 'WikipediaApp/4.1.2%' THEN 'iOS'
> END AS 'ua', t1.event_action, t1.event_sharemode, t1.event_target, count(*)
> from MobileWikiAppShareAFact_11331974 t1
> inner join MobileWikiAppShareAFact_11331974 t2
>   on t1.event_shareSessionToken = t2.event_shareSessionToken
> where t1.timestamp > '20150512' and t1.timestamp < '20150513'
>   and t2.timestamp > '20150512' and t2.timestamp < '20150513'
>   and t1.event_action != 'highlight' and t2.event_action = 'highlight'
>   and (t2.userAgent like 'WikipediaApp/2.0-r-2015-04-23%'
>     or t2.userAgent like 'WikipediaApp/4.1.2%')
> group by ua, t1.event_action, t1.event_sharemode, t1.event_target;
>
>
> == NON-HIGHLIGHTING SESSION CASE FOR SPECIFIC VERSIONS OF THE APPS ==
> n.b., subtract the highlighting cases from the non-highlighting cases to arrive at the default sharing behavior. Technically, inner joins can be used to do more comprehensive session analysis, but the queries take a long time.
>
> select CASE
> WHEN userAgent LIKE 'WikipediaApp/2.0-r-2015-04-23%' THEN 'Android'
> WHEN userAgent LIKE 'WikipediaApp/4.1.2%' THEN 'iOS'
> END AS 'ua', event_action, event_sharemode, event_target,
> count(*) from MobileWikiAppShareAFact_11331974 where timestamp > '20150512' and timestamp < '20150513' and (userAgent like 'WikipediaApp/2.0-r-2015-04-23%' or userAgent like 'WikipediaApp/4.1.2%') group by ua, event_action, event_sharemode, event_target;
>
> -Adam
> _______________________________________________
> Mobile-l mailing list
> Mobile-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/mobile-l
Hello,
I'd like to set up an ETL process to get your pagecounts*.gz files and load
them into our system. I understand you provide hourly data. Is there a
specific schedule when new files are uploaded to the site (
http://dumps.wikimedia.org/other/pagecounts-raw/2015/) ? I'd like to get
new data as soon as possible.
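Not an official schedule, but for reference, the hourly files follow a predictable naming pattern, so a poller can simply look for the next expected file (a sketch; the layout below matches what the pagecounts-raw directory listing shows, but verify against the site):

```python
from datetime import datetime

# Build the URL of the hourly pagecounts file for a given UTC hour.
# Layout assumed from the public listing: year/year-month directories,
# files named pagecounts-YYYYMMDD-HH0000.gz.
BASE = "http://dumps.wikimedia.org/other/pagecounts-raw"

def pagecounts_url(hour):
    return "{0}/{1:%Y}/{1:%Y-%m}/pagecounts-{1:%Y%m%d}-{1:%H}0000.gz".format(
        BASE, hour)

print(pagecounts_url(datetime(2015, 5, 25, 1)))
# .../2015/2015-05/pagecounts-20150525-010000.gz
```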
Thanks,
Vadim
--
Vadim Y. Bichutskiy
@vybstat
Lead Data Scientist
Echelon Insights
vadim(a)echeloninsights.com
(408) 439-5932