Just a heads-up:
Analytics-store is seeing several hours of replag on s1, s4, and s6.
s4 is me doing a commonswiki schema change, which should be done
shortly. s1 and s6 are lagging due to load from queries like:
  create table staging.enwiki_intra
  select a.pl_from as page_id_from,
         a.pl_title as page_title_to,
         b.page_id as page_id_to
  from enwiki.pagelinks a
  left join enwiki.page b on a.pl_title = b.page_title
  where a.pl_namespace = 0 and a.pl_from_namespace = 0;
Some very large cross joins there. No doubt it's intended activity.
And if not, now at least you know what's happening ;-)
BR
Sean
Hey all,
(Sending this to the public list because it's more transparent and I'd
like people who think this data is useful to be able to shout out)
Erik has asked me to write an exploratory app for user-agent data. The
idea is to enable Product Managers and engineers to easily explore
which browsers and platforms users use, so they know what to support.
I've thrown up an example
screenshot at http://ironholds.org/agents_example_screen.png (I'd
host it on Commons, inb4 Dario, but I'm not sure of the copyright
status of the UI)
One side-effect of this is that we end up with files of common user
agents, split between {readers,editors} and {mobile, desktop}, parsed
and unparsed. I'd like to release these files. The reuse potential is
twofold: researchers and engineers can use the parsed files to see
what browser penetration looks like globally and which browsers make
the top 10 and should be supported, and software engineers can use the
unparsed files to improve detection rates.
The privacy implications /should/ be minimal, because of how this data
is gathered. The editor data is gathered from the checkuser table,
globally, and automatically excludes any user agent used by fewer than
50 distinct usernames. The reader data is gathered from a month of
1:1000 sampled log files, and excludes any agent responsible for fewer
than 500 sampled pageviews in a 24-hour period. Since the logs are
1:1000 sampled, that works out to roughly 500,000 actual pageviews.
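The sampling arithmetic above can be sketched as a simple filter (illustrative only; the function and variable names are assumptions, but the 500-per-day cutoff and 1:1000 sampling rate come from the description above):

```python
SAMPLING_RATE = 1000     # request logs are 1:1000 sampled
MIN_SAMPLED_VIEWS = 500  # per-agent cutoff per 24-hour window

def keep_agent(sampled_views_per_day: int) -> bool:
    """Keep a user agent only if it clears the privacy threshold."""
    return sampled_views_per_day >= MIN_SAMPLED_VIEWS

# 500 sampled views at 1:1000 sampling is ~500,000 actual pageviews
effective_threshold = MIN_SAMPLED_VIEWS * SAMPLING_RATE
print(effective_threshold)  # 500000
```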
What do people think about making this a data release? Would people
get value from the data, as well as the tool?
--
Oliver Keyes
Research Analyst
Wikimedia Foundation
Pinging Analytics to ask about editor longevity data (:
My understanding is that newbies (<= 10 edits) are more likely to disappear
early in their "careers" than they were 5 years ago, but that editors who
have been active for years are likely to remain active for years.
It would be interesting, as part of the strategic plan process, to work on
improving editor retention. I believe that this may be related to our
treatment and training of newcomers (onboarding, civility, NPP, Teahouse,
etc.) in addition to external changes in our environment (e.g. the rise of
Facebook).
Pine
*This is an Encyclopedia* <https://www.wikipedia.org/>
*One gateway to the wide garden of knowledge, where lies
The deep rock of our past, in which we must delve
The well of our future,
The clear water we must leave untainted for those who come after us,
The fertile earth, in which truth may grow in bright places, tended by many hands,
And the broad fall of sunshine, warming our first steps toward knowing how much we do not know.*
*—Catherine Munro*
On Wed, Mar 4, 2015 at 8:44 AM, Anders Wennersten <mail(a)anderswennersten.se>
wrote:
> On svwp there have, over the years, been 45 individuals who have each made
> more than 38,000 edits.
> Of these 45, 44 are still active; only one has left (in 2009), making 97.7%
> still around. For the users with fewer than 38,000 edits, only about 6 out
> of 10 are still active.
>
> Is this a globally valid number, that once you have made 38,000 edits you
> are fully addicted to Wikipedia ("until death do us part")?
>
>
> Anders
>
>
>
>
>
> _______________________________________________
> Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/
> wiki/Mailing_lists/Guidelines
> Wikimedia-l(a)lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>
This is just a reminder that for new cookies, it's generally best to set
HttpOnly (https://www.owasp.org/index.php/HttpOnly), unless the cookie
needs to be available to JavaScript.
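For instance, the flag can be set on a cookie in Python's standard library (a minimal sketch; the cookie name and value are placeholders):

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["session_id"] = "abc123"          # placeholder name/value
cookie["session_id"]["httponly"] = True  # hide the cookie from document.cookie

# Emits: Set-Cookie: session_id=abc123; HttpOnly
print(cookie.output())
```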
Matt Flaschen
Hey all,
I'm very pleased to announce that the new pageviews definition is (1)
complete and (2) implemented. Prominent features include:
1. A removal of the per-project double-counting due to banners;
2. The removal of meta over-over-OVER-counting due to EventLogging;
3. The inclusion of Mobile App traffic;
4. The inclusion of projects with non-standard URL schemes.
What this means in practice is that when the data begins coming out
through stats.wikimedia.org and elsewhere, you can expect to see a
substantial drop in reported traffic. This is not a real drop in
traffic; it is a correction for the massive inaccuracies in the
existing definition, which cause an artificial /rise/.
So, what's next? Well, the Analytics Engineering team has to implement
the functionality on a regularly running job to get the data released
on a consistent basis. We also need to split out per-article pageviews
and do some tagging to provide granular reports - see
https://meta.wikimedia.org/wiki/Research:Page_view#Future_work . But
the core definition is complete.
Huge thanks to Andrew Otto, Christian, Nuria, Aaron and Bob West for
their contributions to this project.
--
Oliver Keyes
Research Analyst
Wikimedia Foundation
May I suggest leaving a notice somewhere, like the Signpost, so people know.
Reguyla
Sent from my T-Mobile 4G LTE device
------ Original message------
From: Oliver Keyes
Date: Wed, Mar 4, 2015 1:20 PM
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.;
Subject:[Analytics] [Announce] new Pageviews definition complete and implemented
Hi guys,
I am doing some research and struggling a bit to obtain geolocated
articles in several languages. I've been told the best tool to obtain
the geolocation for each article is the GeoData API, but it appears I
need to supply each article name individually, and I don't know if
that is the best way.
I am thinking, for instance, that for big Wikipedias like French or
German I might need to make a million queries to get only the articles
with coordinates. I would also like to obtain the region according to
ISO 3166-2, which seems to be available there.
My objective is to obtain different lists of articles related to
countries and regions.
I don't know if using Wikidata with Python would be a better option,
but I see the region isn't there. Maybe I could combine Wikidata with
some other tool to get the region.
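For what it's worth, the GeoData extension exposes coordinates through the MediaWiki API's `prop=coordinates`, which can be combined with a generator to avoid one request per title. A minimal offline sketch of parsing such a response (the sample payload is fabricated for illustration, mirroring the documented response shape):

```python
def extract_coords(api_response: dict) -> list[tuple[str, float, float]]:
    """Pull (title, lat, lon) triples out of a prop=coordinates response."""
    results = []
    pages = api_response.get("query", {}).get("pages", {})
    for page in pages.values():
        for coord in page.get("coordinates", []):
            results.append((page["title"], coord["lat"], coord["lon"]))
    return results

# Fabricated sample mirroring the GeoData response shape
sample = {
    "query": {
        "pages": {
            "5843419": {
                "title": "Paris",
                "coordinates": [{"lat": 48.8567, "lon": 2.3508, "primary": ""}],
            }
        }
    }
}
print(extract_coords(sample))  # [('Paris', 48.8567, 2.3508)]
```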
Could anyone help me?
Thanks a lot.
Marc Miquel
Hi Seth -- we're currently working to provide geolocated page views at a
privacy-acceptable level of aggregation. We don't currently have an ETA.
I'm cc'ing the public analytics list for more information.
Best,
-Toby
On Mon, Mar 2, 2015 at 9:41 AM, Seth Stephens-Davidowitz <
seth.stephens(a)gmail.com> wrote:
> Dear Toby,
> Domas Mituzas suggested I contact you. I am looking for data on page
> views by location. I only am able to find total page views. But it is not
> broken down by location. Does this data exist anywhere?
>
> Thanks so much,
> Seth
>
Hi Analytics,
I've been digging through some of the wiki page count files and found some
strange results.
In several files, the Main_page visit count is vastly lower than expected:
mruttley$ cat pagecounts-20141101-170000 | grep "^en Main_page"
en Main_page 260 6202982
mruttley$ cat pagecounts-20150201-170000 | grep "^en Main_page"
en Main_page 200 4802139
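For reference, each line in the pagecounts files is `project page_title count_of_views total_bytes_transferred`, so a line can be split into its fields like this (a minimal sketch):

```python
def parse_pagecounts_line(line: str) -> tuple[str, str, int, int]:
    """Split one pagecounts line into (project, title, views, bytes)."""
    project, title, views, byte_count = line.split()
    return project, title, int(views), int(byte_count)

print(parse_pagecounts_line("en Main_page 260 6202982"))
# ('en', 'Main_page', 260, 6202982)
```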
Only 260 and 200 page views!
What do you reckon? Am I doing it wrong?
Best regards,
Matthew