I saw that stats.grok.se has data for May 1-8 and May 10 but is missing May
9 data. Any idea if this is because stats.grok.se didn't run properly or
because the Wikimedia dumps for May 9 aren't available?
Vipul
Sounds good. Perhaps Kourosh Karimkhany could help with a conversation with
Facebook.
Pine
On May 11, 2015 7:47 AM, "Dan Andreescu" <dandreescu(a)wikimedia.org> wrote:
On Fri, May 8, 2015 at 10:00 PM, Jeremy Baron <jeremy(a)tuxmachine.com> wrote:
> On Sat, May 9, 2015 at 1:58 AM, Oliver Keyes <okeyes(a)wikimedia.org> wrote:
> > Facebook sanitises their users' referers. There's no research and
> > engagement work to perform there.
>
> well facebook surely has this data. (both views of Wikipedia content
> and probably also clicks of edit buttons) we could ask them about
> sharing it.
>
We could even create a prominent "who brought editors to Wikipedia"
dashboard. And if Facebook shares verifiable data with us, we can use it
as an incentive for Google to add an Edit button in their Knowledge Graph
(because then they'd get a spot on our spiffy dashboard). Of course, we'd
put a big "this is all self-reported data", etc. as a disclaimer at the
bottom.
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
Quite possibly. I'm only on Facebook occasionally. Still, I haven't heard
of WMF tracking FB content views or edits that come through FB. Perhaps
there are opportunities here.
Pine
On Fri, May 8, 2015 at 6:06 PM, Alex Monk <krenair(a)gmail.com> wrote:
> I think that link has been there for several years...
>
> Alex
>
> On 9 May 2015 at 02:05, Pine W <wiki.pine(a)gmail.com> wrote:
>
>> Hi EE and analytics,
>>
>> I searched for Seattle on Facebook and am surprised to see that Facebook
>> says below the page lede, "From Wikipedia, the free encyclopedia. Edit on
>> Wikipedia". I don't recall seeing an edit link there before. Does Analytics
>> have a way of tracking Wikipedia content views on FB and edits that were
>> from FB viewers? Are there any plans for FB readership, text editing or
>> image upload editor engagement work?
>>
>> Thanks,
>> Pine
>>
>> _______________________________________________
>> EE mailing list
>> EE(a)lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/ee
>>
>>
>
Hi EE and analytics,
I searched for Seattle on Facebook and am surprised to see that Facebook
says below the page lede, "From Wikipedia, the free encyclopedia. Edit on
Wikipedia". I don't recall seeing an edit link there before. Does Analytics
have a way of tracking Wikipedia content views on FB and edits that were
from FB viewers? Are there any plans for FB readership, text editing or
image upload editor engagement work?
Thanks,
Pine
Hi
analytics-store tmp space filled up today with many large temporary
tables (it was ~32G) from many slow research queries. Those had to be
killed, the database process restarted, and tmp space expanded.
It's back up now.
Sean
--
DBA @ WMF
We added a patch to include the build number in our user agent string. This
just allows us to get more specific about the version (important in
differentiating beta builds from production builds).
Practically, this means:
“4.1.3” would become “4.1.3.96”
Are there any issues we should be aware of when it comes to Analytics? Any
impacts to data collection?
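To illustrate the kind of impact in question, here is a minimal sketch of how a downstream parser could accept both forms. It assumes the version is extracted as a dotted numeric token; the names `AppVersion`, `parse_app_version` and `release_version` are hypothetical and not taken from any existing Analytics code.

```python
import re
from typing import NamedTuple, Optional


class AppVersion(NamedTuple):
    major: int
    minor: int
    patch: int
    build: Optional[int]  # None when the token carries no build number


# Accepts "4.1.3" as well as "4.1.3.96"; the fourth component is optional.
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:\.(\d+))?")


def parse_app_version(token: str) -> AppVersion:
    """Split a version token into its numeric components."""
    match = _VERSION_RE.fullmatch(token)
    if match is None:
        raise ValueError(f"unrecognised version token: {token!r}")
    major, minor, patch, build = match.groups()
    return AppVersion(
        int(major),
        int(minor),
        int(patch),
        int(build) if build is not None else None,
    )


def release_version(token: str) -> str:
    """Drop the build number so old and new tokens group together."""
    version = parse_app_version(token)
    return f"{version.major}.{version.minor}.{version.patch}"
```

Grouping on `release_version(...)` would keep “4.1.3” and “4.1.3.96” in the same bucket, while the `build` field stays available for telling beta builds from production builds.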
Thanks!
Gerrit Patch:
https://gerrit.wikimedia.org/r/#/c/209024/2
--
Corey Floyd
Software Engineer
Mobile Apps / iOS
Wikimedia Foundation
Cross-posting to research and analytics, too!
---------- Forwarded message ----------
From: Oliver Keyes <okeyes(a)wikimedia.org>
Date: 6 May 2015 at 13:11
Subject: Traffic to the portal from Zero providers
To: wikimedia-search(a)lists.wikimedia.org
Hey all,
(Throwing this to the public list, because transparency is Good)
I recently did a presentation on a traffic analysis to the Wikipedia
"home page" - www.wikipedia.org.[1]
One of the biggest visualisations, in impact terms, showed that a lot
of portal traffic - far more, proportionately, than traffic to
Wikipedia overall - is coming from India and Brazil.[2] One of the
hypotheses was that this could be Zero traffic.
I've done a basic analysis of the traffic, looking specifically at the
zero headers,[3] and this hypothesis turns out to be incorrect -
almost no zero traffic is hitting the portal. The traffic we're seeing
from Brazil and India is not zero-based.
This makes a lot of sense (the reason mobile traffic redirects to the
enwiki home page from the portal is the Zero extension, so presumably
this happens specifically to Zero traffic) but it does mean that our
null hypothesis - that this traffic is down to ISP-level or
device-level design choices and links - is more likely to be correct.
[1] http://ironholds.org/misc/homepage_presentation.html
[2] http://ironholds.org/misc/homepage_presentation.html#/11
[3] https://phabricator.wikimedia.org/T98076
--
Oliver Keyes
Research Analyst
Wikimedia Foundation
Phew, ok, things did go wrong! We ran into a couple of bugs recently introduced in Yarn and in Hive that took us a while to work around. Jobs are again flowing through the cluster. However, jobs have been lagging behind since they haven’t been able to run all day. They should eventually catch up. For now, the cluster is back open for business, but I’d appreciate it if no one ran any heavy jobs until tomorrow.
Also, it is still possible we may run into other issues we haven’t yet seen, so I can’t guarantee that I won’t have to restart things again.
Anyway, aside from those hiccups, CDH 5.4.0 is now installed, and Hive 1.1 and Spark 1.3.0 are now available, weeeeee!
-Ao
> On May 4, 2015, at 11:05, Andrew Otto <aotto(a)wikimedia.org> wrote:
>
> Hi all, as a reminder, I will be doing this upgrade today. Within the next hour I will turn off the Hadoop cluster. Please do not attempt to use it again until I notify you again.
>
> Thanks!
> -AO
>
>
>
>> On Apr 29, 2015, at 14:57, Robert West <west(a)cs.stanford.edu> wrote:
>>
>> All good!
>>
>> On Wed, Apr 29, 2015 at 11:35 AM, Aaron Halfaker
>> <ahalfaker(a)wikimedia.org> wrote:
>>> + the right research list (Andrew, remove wmfresearch@ from your contact
>>> list :P )
>>>
>>> All looks good to me. Thanks. :)
>>>
>>> On Wed, Apr 29, 2015 at 1:11 PM, Leila Zia <leila(a)wikimedia.org> wrote:
>>>>
>>>> FYI
>>>>
>>>> Ashwin, Bob, Ellery, I don't anticipate this having negative impact on our
>>>> workflow. If you see possible issues, please communicate with Andrew (cc-ing
>>>> me), or let me know and I will communicate. Thanks!
>>>>
>>>>
>>>> ---------- Forwarded message ----------
>>>> From: Andrew Otto <aotto(a)wikimedia.org>
>>>> Date: Wed, Apr 29, 2015 at 11:05 AM
>>>> Subject: [wmfresearch] Hadoop Cluster Downtime
>>>> To: Operations Engineers <ops(a)lists.wikimedia.org>, "A mailing list for
>>>> the Analytics Team at WMF and everybody who has an interest in Wikipedia and
>>>> analytics." <analytics(a)lists.wikimedia.org>,
>>>> "wmfresearch(a)lists.wikimedia.org Research" <wmfresearch(a)lists.wikimedia.org>
>>>>
>>>>
>>>> Hi all!
>>>>
>>>> CDH 5.4 is out[1] and we’d like to upgrade. We are doing this now, rather
>>>> than later, because there is an important Parquet/Hive related bug that has
>>>> been fixed in this version[2]. This upgrade will include Spark 1.3, which
>>>> should at least make one researcher happy.
>>>>
>>>> To do this upgrade, I need to schedule some downtime for Hadoop. I’d like
>>>> to do this on Monday May 4th. I expect the upgrade to take me no more than
>>>> an hour or two, but just to be safe I’d like to schedule the downtime for
>>>> the whole day.
>>>>
>>>> If anyone has critical things that they absolutely have to run on Monday,
>>>> let me know now and I will find another day.
>>>>
>>>> Thanks!
>>>> -Ao
>>>>
>>>> [1]
>>>> http://blog.cloudera.com/blog/2015/04/cloudera-enterprise-5-4-is-released/
>>>> [2] https://issues.apache.org/jira/browse/HIVE-9482
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> wmfresearch mailing list
>>>> wmfresearch(a)lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/wmfresearch
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Research-Internal mailing list
>>>> Research-Internal(a)lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/research-internal
>>>>
>>>
>>
>>
>>
>> --
>> Up for a little language game? -- http://www.unfun.me
>>
>> _______________________________________________
>> Ops mailing list
>> Ops(a)lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/ops
>
I'm planning to grab four cores of stat1002 overnight for a task for
Legal. Does anyone have any objections/anything they need?
--
Oliver Keyes
Research Analyst
Wikimedia Foundation
I am thrilled to announce our speaker lineup for this month’s research showcase <https://www.mediawiki.org/wiki/Analytics/Research_and_Data/Showcase#April_2…>.
Jeff Nickerson (Stevens Institute of Technology) will talk about remix and reuse in collaborative communities; Heather Ford (Oxford Internet Institute) will present an overview of the oral citations debate in the English Wikipedia.
The showcase will be recorded and publicly streamed at 11.30 PT on Thursday, April 30 (livestream link will follow). We’ll hold a discussion and take questions from remote attendees via the Wikimedia Research IRC channel (#wikimedia-research <http://webchat.freenode.net/?channels=wikimedia-research> on freenode) as usual.
Looking forward to seeing you there.
Dario
Creating, remixing, and planning in open online communities
Jeff Nickerson
Paradoxically, users in remixing communities don’t remix very much. But an analysis of one remix community, Thingiverse, shows that those who actively remix end up producing work that is in turn more likely to be remixed. What does this suggest about Wikipedia editing? Wikipedia allows more types of contribution, because creating and editing pages are done in a planning context: plans are discussed on particular loci, including project talk pages. Plans on project talk pages lead to both creation and editing; some editors specialize in making article changes and others, who tend to have more experience, focus on planning rather than acting. Contributions can happen at the level of the article and also at a series of meta levels. Some patterns of behavior – with respect to creating versus editing and acting versus planning – are likely to lead to more sustained engagement and to higher quality work. Experiments are proposed to test these conjectures.
Authority, power and culture on Wikipedia: The oral citations debate
Heather Ford
In 2011, Wikimedia Foundation Advisory Board member Achal Prabhala was funded by the WMF to run a project called 'People are knowledge' or the Oral citations project <https://meta.wikimedia.org/wiki/Research:Oral_Citations>. The goal of the project was to respond to the dearth of published material about topics of relevance to communities in the developing world. Although the majority of articles in languages other than English remain intact, the English editions of these articles have had their oral citations removed. I ask why this happened, what the policy implications are for oral citations generally, and what steps can be taken in the future to respond to the problem that this project (and more recent versions of it <https://meta.wikimedia.org/wiki/Research:Indigenous_Knowledge>) set out to solve. This talk comes out of an ethnographic project in which I have interviewed some of the actors involved in the original oral citations project, including the majority of editors of the surr <https://en.wikipedia.org/wiki/surr> article that I trace in a chapter of my PhD[1] <http://www.oii.ox.ac.uk/people/?id=286>.