hello
I want to find out which browsers are most popular in which parts of the
world, and their relative shares.
However,
the browser names are quite confusing on this page:
https://stats.wikimedia.org/wikimedia/squids/SquidReportCountryBrowser.htm
Mozilla and Firefox are two separate entities? Is one mobile and the
other desktop?
iOS is also separate from iPad and iPhone? Why is that?
thanks
thomas
Thank you, Erik. I am not sure where you fixed the missing country names. Please take a look at the screenshot here:
http://tinypic.com/r/2vanmg0/8
It shows the browser stats for the month of November 2013. You will notice that the first column, Country, has no entries. This happens for each month following September 2013.
To see what I mean by stats for the "Apple" browser take a look at
http://tinypic.com/r/be6b2w/8
The heading for the fifth column from the right-hand side reads Apple. Given that there are already entries for Safari and iOS, it is not clear to me what Apple might mean.
I hope you will be able to help with this.
Atul
Dear Wikimedia,

I am using Wikimedia statistical data and have run into a small issue. For some reason, starting from October 2013 your SquidReportBrowserCountry tables do not indicate the names of the countries, which makes the data impossible to use. I had hoped that I would be able to "guess" the country names by looking at previous tables, but the order of the countries changes quite frequently. I would be most grateful if you could indicate the names of the countries here.

Also, would you mind telling me precisely what is meant by "Apple" in the list of browsers? You have simultaneous entries for iOS, iPad, Safari, etc., so it is not immediately obvious what Apple might mean.

Atul Vaidya
Hi,
The team is focused on reaching its quarterly goals (
https://www.mediawiki.org/wiki/Wikimedia_Engineering/2014-15_Goals#Analytics
) and part of the team is using Agile Scrum solely for the delivery of
Editor Engagement Vital Signs. Production issues and Refinery development
are handled by the other part of the team (see Adventures in Clusterland
https://lists.wikimedia.org/pipermail/analytics/2014-September/002485.html )
Here’s a summary of the next sprint:
Bug ID  Component    Summary                                                                    Points
67459   Wikimetrics  Story: WikimetricsUser runs 'Rolling New Active Editors' report            8
67460   Wikimetrics  Story: WikimetricsUser runs 'Rolling Surviving New Active Editors' report  13
68822   EEVS         Story: AnalyticsEng has static file with list of projects and metrics     8
68445   EEVS         Story: EEVSUser downloads report with correct Http Cache Headers          5
68142   EEVS         Story: EEVSUser adds/removes a metric/project                              21
That’s 55 points in 5 stories. You can see the sprint here:
http://sb.wmflabs.org/t/analytics-developers/2014-09-16/
cheers,
Kevin Leduc
Hi,
in the week from 2014-09-08 to 2014-09-14, Andrew and Jeff worked on the
following items around the Analytics Cluster and Analytics-related
Ops:
* Logstash logs from Analytics Cluster
* More investigation around analytics1021 partition leader drop-outs
* Feasibility check on upgrading stat1002 to trusty
(details below)
Have fun,
Christian
* Logstash logs from Analytics Cluster
Logging via gelf got enabled again and is now puppetized.
Also, the names of threads in log messages now get normalized, which
makes filtering much easier.
* More investigation around analytics1021 partition leader drop-outs
Logs from recent analytics1021 drop-outs have been analyzed, but no
clear culprit has been identified yet.
* Feasibility check on upgrading stat1002 to trusty
After the stat1003 upgrade to trusty a few weeks back, users asked to
upgrade stat1002 to trusty too. However, stat1002 runs Hadoop clients,
and Cloudera does not yet provide Hadoop packages for trusty, so
upgrading is not entirely straightforward. Currently, the best way
forward seems to be a dist-upgrade that leaves the Hadoop client
packages at precise. This approach worked on a labs test instance, but
it would put stat1002 in version limbo between precise and trusty. Once
another pair of Ops eyes has looked over the approach and agreed to it,
stat1002 can get upgraded.
--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3 Email: christian(a)quelltextlich.at
4293 Gutau, Austria Phone: +43 7946 / 20 5 81
Fax: +43 7946 / 20 5 81
Homepage: http://quelltextlich.at/
---------------------------------------------------------------
I don't think that we keep those logs historically. analytics-l (CC'd)
might have more insights.
Do we have anything more granular than the hourly view logs available here:
https://dumps.wikimedia.org/other/pagecounts-raw/
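For reference, each line in those hourly pagecounts-raw files is space-separated as `project page_title view_count bytes_transferred`. A minimal Python sketch for reading one (the file name in the comment is only an illustrative example of the naming scheme):

```python
import gzip

def top_pages(path, project="en", limit=3):
    """Return the `limit` most-viewed page titles for `project` from one
    hourly pagecounts-raw dump. Each line is space-separated:
    project, page title, view count, bytes transferred."""
    counts = []
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            parts = line.split(" ")
            if len(parts) != 4 or parts[0] != project:
                continue
            counts.append((int(parts[2]), parts[1]))
    # Sort by view count, descending, and keep only the titles.
    return [title for views, title in sorted(counts, reverse=True)[:limit]]

# Files on dumps.wikimedia.org are named like pagecounts-20140917-100000.gz
# (one file per hour):
# top_pages("pagecounts-20140917-100000.gz", project="en")
```

This also illustrates the limitation under discussion: the timestamp lives only in the file name, so one hour is the finest resolution available from this dataset.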
On Wed, Sep 17, 2014 at 10:39 AM, Valerio Schiavoni <
valerio.schiavoni(a)gmail.com> wrote:
> Hello Aaron,
> 1 hour is way too coarse.
> Let's say 1 second would be OK.
> Is that available?
>
> On Wed, Sep 17, 2014 at 5:23 PM, Aaron Halfaker <aaron.halfaker(a)gmail.com>
> wrote:
>
>> Hi Valerio,
>>
>> The page counts dataset has a time resolution of one hour. Is that too
>> coarse? How fine of resolution do you need?
>>
>> On Wed, Sep 17, 2014 at 9:44 AM, Valerio Schiavoni <
>> valerio.schiavoni(a)gmail.com> wrote:
>>
>>> Hello Giovanni,
>>> on second thought, I think the Click dataset won't do either.
>>> I've parsed the smaller sample [1], which is said to be extracted from
>>> the bigger one.
>>>
>>> In that dataset there are ~34k entries related to Wikipedia, but they
>>> look like the following:
>>>
>>> {"count": 1, "timestamp": 1257181201, "from": "en.wikipedia.org", "to":
>>> "ko.wikipedia.org"}
>>>
>>> That is, the log only reports the host/domain accessed, but not the
>>> specific URL being requested (to be clear, the one in the HTTP request
>>> issued by the client).
>>>
>>> This is what is of main interest to me.
>>>
>>> Thanks for your interest anyway!
>>> Valerio
>>>
>>>
>>> 1 - http://carl.cs.indiana.edu/data/#traffic-websci14
>>>
>>> On Wed, Sep 17, 2014 at 4:24 PM, Valerio Schiavoni <
>>> valerio.schiavoni(a)gmail.com> wrote:
>>>
>>>> Hello Giovanni,
>>>> thanks for the pointer to the Click datasets.
>>>> I'd have to take a look at the complete dataset to see how many of
>>>> those requests touch Wikipedia.
>>>>
>>>> Then, one of the requirements for accessing that data is:
>>>> "The Click Dataset is large (~2.5 TB compressed), which requires that
>>>> it be transferred on a physical hard drive. You will have to provide the
>>>> drive as well as pre-paid return shipment. "
>>>>
>>>> I have to check whether this is possible and how long it might take
>>>> to ship a hard drive from Switzerland and get it back.
>>>> I'll let you know!
>>>>
>>>> Best,
>>>> Valerio
>>>>
>>>> On Wed, Sep 17, 2014 at 4:09 PM, Giovanni Luca Ciampaglia <
>>>> gciampag(a)indiana.edu> wrote:
>>>>
>>>>> Valerio,
>>>>>
>>>>> I didn't know such data existed. As an alternative, perhaps you could
>>>>> have a look at our click datasets, which contain requests to the Web at
>>>>> large (i.e., not just Wikipedia) generated from within the campus of
>>>>> Indiana University over a period of several months. HTH
>>>>>
>>>>> http://carl.cs.indiana.edu/data/#click
>>>>>
>>>>> Cheers
>>>>>
>>>>> G
>>>>>
>>>>> Giovanni Luca Ciampaglia
>>>>>
>>>>> ✎ 919 E 10th ∙ Bloomington 47408 IN ∙ USA
>>>>> ☞ http://www.glciampaglia.com/
>>>>> ✆ +1 812 855-7261
>>>>> ✉ gciampag(a)indiana.edu
>>>>>
>>>>> 2014-09-17 9:53 GMT-04:00 Valerio Schiavoni <
>>>>> valerio.schiavoni(a)gmail.com>:
>>>>>
>>>>>> Hello,
>>>>>> just bumping my email from last week, since so far I have not
>>>>>> received any answer.
>>>>>>
>>>>>> Should I consider that dataset to be somehow lost?
>>>>>>
>>>>>> I've also contacted the researchers who partially released it, but
>>>>>> making it publicly available is tricky for them due to its size
>>>>>> (12 TB), though that volume is probably within the norm of what
>>>>>> Wikipedia's servers handle daily.
>>>>>>
>>>>>> Thanks again,
>>>>>> Valerio
>>>>>>
>>>>>>>
>>>>>>> On Wed, Sep 10, 2014 at 4:15 AM, Valerio Schiavoni <
>>>>>>> valerio.schiavoni(a)gmail.com> wrote:
>>>>>>>
>>>>>>>> Dear WikiMedia foundation,
>>>>>>>> in the context of an EU research project [1], we are interested in
>>>>>>>> accessing
>>>>>>>> wikipedia access traces.
>>>>>>>> In the past, such traces were given for research purposes to other
>>>>>>>> groups
>>>>>>>> [2].
>>>>>>>> Unfortunately, only a small percentage (10%) of that trace has
>>>>>>>> been made available.
>>>>>>>> We are interested in accessing the totality of that same trace (or
>>>>>>>> even
>>>>>>>> better, a more recent one, but the same one will do).
>>>>>>>>
>>>>>>>> If this is not the correct mailing list for such requests, could
>>>>>>>> anyone please redirect me to the correct one?
>>>>>>>>
>>>>>>>> Thanks again for your attention,
>>>>>>>>
>>>>>>>> Valerio Schiavoni
>>>>>>>> Post-Doc Researcher
>>>>>>>> University of Neuchatel, Switzerland
>>>>>>>>
>>>>>>>> 1 - http://www.leads-project.eu
>>>>>>>> 2 - http://www.wikibench.eu/?page_id=60
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Wiki-research-l mailing list
>>>>>> Wiki-research-l(a)lists.wikimedia.org
>>>>>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>
Hi,
in the week from 2014-09-01 to 2014-09-07, Andrew, Jeff, and I worked on
the following items around the Analytics Cluster and Analytics-related
Ops:
* Investigating ways to allow queries across MediaWiki and Hadoop databases
* Deployment of webstatscollector's ulsfo https fix
* Re-run reports due to slave lag
* X-Analytics tag for used PHP engine
* Digging deeper into analytics1021 issues
(details below)
Have fun,
Christian
* Investigating ways to allow queries across MediaWiki and Hadoop databases
Currently, data in Hadoop is fully separated from our wikis'
databases, which makes it hard to query across the two different kinds
of databases, and hence makes researchers' lives harder. Of the
available solutions to overcome this issue, Sqoop seems like a suitable
approach. Sqoop allows importing data from MediaWiki databases into
HDFS and querying it from within Hadoop. We looked at how Sqoop
imports work, and started discussions with researchers on which
imports would be useful and which would not.
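For a rough idea of what such an import looks like, here is a sketch of a single-table Sqoop import into HDFS. The host name, credentials file, and target directory are hypothetical placeholders, not our actual setup:

```shell
# Hypothetical example: pull the enwiki `revision` table into HDFS.
# db-host.example and both paths are placeholders.
sqoop import \
  --connect jdbc:mysql://db-host.example/enwiki \
  --username research \
  --password-file /user/hdfs/.db-password \
  --table revision \
  --target-dir /wmf/data/raw/mediawiki/enwiki/revision \
  --num-mappers 4
```

Once imported, such a directory can be exposed as an external Hive table and joined against data already in the cluster.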
* Deployment of webstatscollector's ulsfo https fix
The fix that stops webstatscollector from counting ulsfo https requests
twice got deployed.
* Re-run reports due to slave lag
The announced schema changes caused more slave lag than some reports
could cope with, so we had to re-run a few reports by hand to make up
for the slave lag.
* X-Analytics tag for used PHP engine
Ops added a “php” tag to the X-Analytics header. This tag makes it
possible to identify which PHP implementation was used to serve a request.
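Since X-Analytics is a semicolon-separated list of key=value pairs, pulling the new tag out of a logged header value can be sketched as follows (a minimal, illustrative parser, not our production code):

```python
def parse_x_analytics(header):
    """Split an X-Analytics header value ("key1=v1;key2=v2") into a dict.
    Fields without '=' are kept with an empty string as value."""
    tags = {}
    for field in header.split(";"):
        field = field.strip()
        if not field:
            continue
        key, _, value = field.partition("=")
        tags[key] = value
    return tags

# e.g. parse_x_analytics("php=hhvm;https=1")["php"] identifies the
# PHP engine that served the request.
```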
* Digging deeper into analytics1021 issues
Despite the recent buffer increases, analytics1021 still fails from
time to time to act as a proper partition leader. Since the failure is
not reproducible manually, debugging is tricky ... and time-consuming.
We added some more monitoring and waited for the issue to re-appear. It
seems that from time to time bursts of disk writes free up lots of
memory on analytics1021. During these write-out phases, the processes
on analytics1021 get starved. If the starvation lasts too long,
analytics1021 gets (correctly) kicked out of the partition leader
role. We now need to find the source of those write bursts, to see if
they are the real issue or just the symptom of a different one.