(cc-ing analytics@, our public list, in case other contributors can chime in)
> For an evaluation of one of the options I need to know how many percent of
> the newly registered users in the german speaking area submit an email
> address - on average. Do you have any data on that that you could share
> with us? This would be very helpful!
We do not have data on this; our datasets are geared mostly towards
pageviews and edits, and this is neither. That said, if you have some
development resources available, a developer can get this data from the
MediaWiki database without too much effort.
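For illustration, here is a minimal sketch of the kind of query a developer
might run. It assumes the standard MediaWiki `user` table with a
`user_registration` column (a YYYYMMDDHHMMSS timestamp string) and a
`user_email` column that is empty when no address was given. The function
name and the sqlite3-style `?` placeholders are my own choices for the
example; a MySQL/MariaDB driver would typically use `%s` instead.

```python
import sqlite3

# Assumption: standard MediaWiki schema, where `user.user_email` is empty or
# NULL when the account has no address, and `user.user_registration` is a
# YYYYMMDDHHMMSS string that sorts chronologically.
QUERY = """
SELECT COUNT(*) AS registered,
       SUM(CASE WHEN user_email IS NOT NULL AND user_email <> ''
                THEN 1 ELSE 0 END) AS with_email
FROM user
WHERE user_registration >= ? AND user_registration < ?
"""


def email_share(conn, start, end):
    """Fraction of accounts registered in [start, end) that have an email set.

    Returns None when no account was registered in the window.
    """
    registered, with_email = conn.execute(QUERY, (start, end)).fetchone()
    if not registered:
        return None
    return with_email / registered
```

Calling `email_share(conn, "20160301000000", "20160401000000")` on a
connection to the German Wikipedia database (or a replica of it) would give
the share for March 2016; averaging over several months gives the "on
average" figure asked for.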
On Wed, Apr 13, 2016 at 7:27 AM, Erik Zachte <ezachte(a)wikimedia.org> wrote:
> Hi Katharina,
> Sorry I have no data for this.
> Maybe our Analytics Team can help?
> Referring you to Nuria Ruiz, who leads the team.
> -----Original Message-----
> From: Katharina Nocun [mailto:email@example.com]
> Sent: Wednesday, April 13, 2016 13:51
> To: erikzachte(a)infodisiac.com
> Subject: Data Request: How many % of new WP Users do submit an Email?
> Dear Erik,
> let me briefly introduce myself: I work for Wikimedia Germany as campaigns
> manager. Currently we are evaluating different options for a campaign that
> shall attract new contributors for Wikipedia in the german speaking area.
> For an evaluation of one of the options I need to know how many percent of
> the newly registered users in the german speaking area submit an email
> address - on average. Do you have any data on that that you could share
> with us? This would be very helpful!
> Best regards,
> Katharina Nocun
> Project Manager, (Online) Campaigns
> Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin Tel. +49
> (0)30-219 15 826-0 http://wikimedia.de
> Imagine a world in which every single person can freely share in the sum
> of all knowledge.
> Help us make it happen!
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
> Registered in the register of associations of the Amtsgericht
> Berlin-Charlottenburg under number 23855 B.
> Recognized as a charitable organization by the Finanzamt für
> Körperschaften I Berlin, tax number 27/029/42207.
I hope my email finds you well. My name is Nima Dashtban and I am a computer
science student at Ca' Foscari University of Venice, Italy.
I am investigating these access logs of Wikipedia pages:
In particular, I would like to build a DB of the time series of accesses
to (Italian) Wikipedia pages that have a GPS position, i.e. Wikipedia
pages that refer to geographical points of interest. I think that such data
could be useful as a predictive signal of the interest of potential visitors
to such geographical places.
Any help from you, even just telling me whether this is possible, would
mean a lot to me.
Sincerely and Regards,
It is hard to say, but from your request it seems that you are asking for
requests to a specific type of page, and I do not think that is data we have
on the analytics side.
Please file a Phabricator request with the info you need. Make sure to
include details about the data you are requesting and what you plan to use
it for, and we can correspond on the ticket. See an example of a data
request filed on Phabricator: https://phabricator.wikimedia.org/T128132
If you are looking to count pageviews of a set of pages you already have,
you can use the pageview API to get pageview numbers on a per article basis:
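As a rough sketch of what such a per-article request could look like, the
helper below builds a request URL. The base URL and path layout
(project/access/agent/article/granularity/start/end) are stated here as an
assumption about the public Wikimedia REST API and are not taken from this
thread; the function name and default values are hypothetical.

```python
from urllib.parse import quote

# Assumption: public Wikimedia REST endpoint for per-article pageviews.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def per_article_url(project, article, start, end,
                    access="all-access", agent="user", granularity="daily"):
    """Build the URL for one article's pageview time series.

    `start` and `end` are YYYYMMDD strings; spaces in the title become
    underscores and the remaining characters are percent-encoded.
    """
    title = quote(article.replace(" ", "_"), safe="")
    return f"{BASE}/{project}/{access}/{agent}/{title}/{granularity}/{start}/{end}"
```

For a list of Italian Wikipedia pages with coordinates, one would call
`per_article_url("it.wikipedia", title, "20160301", "20160331")` for each
title and fetch the resulting URLs with any HTTP client.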
On Fri, Apr 8, 2016 at 10:29 AM, Nima Dashtban <nima.dashtban(a)gmail.com> wrote:
> Hello Dr Nuria,
> I hope my email finds you well. My name is Nima Dashtban and I am an MSc
> student at Ca' Foscari University of Venice, Italy. My final thesis is:
> "Build a DB of the time series of accesses to (Italian) pages of Wikipedia
> that have a GPS position, i.e. Wikipedia pages that refer to geographical
> points of interest with specific coordinates. I think that such data could
> be useful as a predictive signal of the interest of potential visitors to
> such geographical places."
> To do this, I first need access to the log that stores the geographical
> coordinates of different locations. Would it be possible to help me get
> this access?
> Thank you in advance,
> Nima Dashtban
The best list to ask these kinds of questions is analytics@ (cc-ed).
> A minor question - could you also count the number of unique recurring
> user agents per month? I.e. the number of visits that return and have a
> still valid cookie (e.g. by marking the cookie after the count).
Hmm... I am not sure what you mean by "recurring", as you can have thousands
of people with the same user agent, right? Think "everyone in Seattle with an
iPhone and the latest OS using Safari". You can add other pieces of info
like IP, but on mobile, due to NAT-ing, that can also mean a group of
thousands of people. So it will always heavily under-report the number of
unique devices if you use "recurring user agents" as the base for your main
Now, I might be missing something as your question is brief; maybe you can
elaborate a bit more?
> I am worried that the current number, due to the freshness offset, might
> be overreporting
Since the offset calculation takes IP into account when looking for
freshness, and it only keeps devices having 1 event without cookies and 0
with cookies, the calculation is likely to under-report on mobile due to,
again, NAT-ing and user agents being shared among many devices. We see this
in our data as smaller offset numbers on mobile projects than on desktop
projects. Now, this methodology might over-report for a user who uses many
distinct IPs with the same browser, makes 1 request, and clears cookies
after every session, but that is a far less common scenario.
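To make the rule concrete, here is a small sketch of the counting described
above. It is only an illustration: the (ip, user_agent) fingerprint, the
event tuple layout and the function name are assumptions for the example,
not the production job.

```python
from collections import defaultdict


def offset_count(events):
    """Count fingerprints with exactly 1 cookieless event and 0 cookied events.

    `events` is an iterable of (ip, user_agent, has_cookie) tuples.
    Assumption: a device is approximated by its (ip, user_agent) pair.
    """
    no_cookie = defaultdict(int)
    with_cookie = defaultdict(int)
    for ip, user_agent, has_cookie in events:
        key = (ip, user_agent)
        if has_cookie:
            with_cookie[key] += 1
        else:
            no_cookie[key] += 1
    # Keep only fingerprints seen once without a cookie and never with one.
    return sum(1 for key, n in no_cookie.items()
               if n == 1 and with_cookie.get(key, 0) == 0)
```

Two phones behind the same NAT-ed IP with the same user agent, each making
one cookieless request, collapse into one fingerprint with 2 events and are
not counted at all (under-report). One user who clears cookies and shows up
from three different IPs is counted three times (over-report).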
Hopefully this makes sense.
> Again, congratulations on the work! I am really happy to see the WMF not
> being dependent on a commercial traffic numbers provider anymore!
Many thanks for reading!
On Fri, Apr 8, 2016 at 10:30 AM, Denny Vrandečić <vrandecic(a)gmail.com> wrote:
> Hi Nuria, Aaron,
> first congratulations on the Unique devices work! I am really impressed by
> the solution and the dataset. I am looking forward to the visualizations
> that will come out from this.
> A minor question - could you also count the number of unique recurring
> user agents per month? I.e. the number of visits that return and have a
> still valid cookie (e.g. by marking the cookie after the count).
> My reasoning is the following: knowing well that it would possibly further
> underreport the number of unique user agents, it would get rid of all user
> agents that clean their cookies out or that use some form of incognito
> mode. It would only count people who have been there, got a cookie,
> returned, and then we mark the cookie, and don't count them further until
> it expires.
> I am worried that the current number, due to the freshness offset,
> might be overreporting, and I do not agree fully with your reasoning in
> that page that this is OK. Counting only the recurring ones would clean
> that up, give a more reliable number, although it would potentially
> underreport the people who indeed only come once a month (a number I don't
> expect to be too large).
> It would be interesting to see these two numbers side by side.
> Again, congratulations on the work! I am really happy to see the WMF not
> being dependent on a commercial traffic numbers provider anymore!
Hello all,
I am preparing to apply for an Individual Engagement Grant (IEG) and have an
idea closely linked to the Accuracy Review Project raised by James Salsman.
Here is a brief summary of my proposal:
Out-of-date information and references are common in Wikipedia articles,
especially on Chinese Wikipedia. Therefore, I would like to evaluate some
existing solutions for identifying out-of-date content, and create a new bot
to identify such information based on the test results. More detailed tests
will be arranged after that, using selected Wikipedia articles and the cases
that we compile.
Here is the URL of the project proposal:
https://meta.wikimedia.org/wiki/Grants:IdeaLab/Searching_for_out-of-date_in…
Please comment on the proposal on its discussion page:
https://meta.wikimedia.org/wiki/Grants_talk:IdeaLab/Searching_for_out-of-da…
Li Linxuan
Cross-posting this announcement, since people on Analytics-l tend to use
dumps a lot.
Basically, http access is being redirected to https starting April 4th.
On Fri, Apr 1, 2016 at 7:03 AM, Ariel Glenn WMF <ariel(a)wikimedia.org> wrote:
> We plan to make this change on April 4 (this coming Monday), redirecting
> plain http access to https.
> A reminder that our dumps can also be found on our mirror sites, for those
> who may have restricted https access.
> Ariel Glenn