The folks on Meta are considering whether to enable WikiLove, and they were
hoping to find some data about it. There is a research project on
Meta about WikiLove (https://meta.wikimedia.org/wiki/Research:WikiLove),
but it seems to have been "in progress" since 2011. Could someone in
Analytics update that page to indicate that it is no longer in progress (or
finish whatever piece was still ongoing)?
It would also be great if someone from Analytics could respond to the
questions and comments about the research data at
https://meta.wikimedia.org/wiki/WikiLove#Support_for_another_discussion_abo…
.
Thanks!
Hello,
I am an economist working on a research project analyzing contribution
behavior on Wikipedia, and am interested in computing the fraction of
individuals who use the site and make contributions. I have found the data
on daily contributions, and am now looking for data on the number of
individuals who use Wikipedia. I believe I have found the data I am looking
for here:
https://stats.wikimedia.org/EN/TablesUsageVisits.htm
Unfortunately, the data contained in that file only covers the period from
August 2002 through October 2004. Does a similar database exist for later
time periods? Any information you can provide would be greatly
appreciated. Thank you for your time, and I look forward to hearing from you.
Sincerely,
Nathan Marwell
Detailed technical report on an undergraduate student project at Virginia
Tech (work in progress) to import the entire English Wikipedia history dump
into the university's Hadoop cluster and index it using Apache Solr, to
"allow researchers and developers at Virginia Tech to benchmark
configurations and big data analytics software":
Steven Stulga, "English Wikipedia on Hadoop Cluster"
https://vtechworks.lib.vt.edu/handle/10919/70932 (CC BY 3.0)
IIRC this has rarely or never been attempted due to the large size of the
dataset - 10TB uncompressed. And it looks like the author here encountered
an out of memory error that he wasn't able to solve before the end of
term...
--
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB
--
Sent from Gmail Mobile
Hi all!
For years now, y’all have been accessing the Analytics Hadoop Cluster using
stat1002. This works just fine, but others use stat1002 for number
crunching outside of Hadoop as well. At times stat1002 can get pretty
overloaded, which can make accessing Hadoop via this one box a little
annoying.
But fret no longer! stat1004 is here! stat1004 can now be accessed by
anyone in the analytics-privatedata-users and analytics-users groups. If
you previously had access to stat1002 AND used it to talk to Hive and
Hadoop, you can now also do this from stat1004. If you already have a
Hadoop account, you don’t have to do anything new to get access.
stat1002 will remain usable as is. If you are looking for a more
dedicated place from which to interact with Hadoop services, use stat1004
instead.
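For reference, a typical session might look like the following (hostname and
group setup as described on wikitech; the exact FQDN and HDFS path here are
illustrative assumptions, not authoritative):

```shell
# Log in to the new Hadoop client host (via the bastion, per wikitech docs).
ssh stat1004.eqiad.wmnet

# Once there, the usual Hadoop/Hive tools work as they did on stat1002:
hive -e 'SHOW DATABASES;'     # query Hive
hdfs dfs -ls /wmf/data        # browse HDFS (path is illustrative)
```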
I’ve updated the wikitech documentation accordingly. Let us know if you
have any questions!
-Andrew
Hi all,
I hacked up a very quick count of the 2015 video viewing aggregate
figures, using the data that Bartosz put together last year - with the
caveat that the data only goes up to 10 December, but it's probably
indicative of whole-year trends. I haven't yet tried to merge in the
data for 11–31 December. Nothing very insightful, but I don't recall
seeing it done before, so it might be of interest!
http://www.generalist.org.uk/blog/2016/most-popular-videos-on-wikipedia/
The headline figure is that we had about three billion (!!)
video/audio plays during the year, and that some of the most popular
items are insanely popular - the most popular was viewed an average of
42,000 times a day, every day.
Pine: the video you asked about in the other thread was viewed 187,899
times from 31/10/15 to 10/12/15. So there's half your answer :-)
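For anyone wanting to reproduce this kind of tally, the aggregation itself is
simple once you have per-file play counts; a minimal sketch (the input format
here is a guess for illustration, not Bartosz's actual schema):

```python
from collections import defaultdict

def top_media(rows, n=3):
    # rows: (file_name, play_count) pairs, e.g. one row per file per day.
    # Returns the n most-played files with their aggregate totals.
    totals = defaultdict(int)
    for name, plays in rows:
        totals[name] += plays
    return sorted(totals.items(), key=lambda kv: -kv[1])[:n]

# Example: two days of plays for one file, one day for another.
rows = [("a.webm", 10), ("b.ogv", 5), ("a.webm", 7)]
print(top_media(rows, 2))  # [('a.webm', 17), ('b.ogv', 5)]
```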
--
- Andrew Gray
andrew.gray(a)dunelm.org.uk
Hi all!
We just noticed a problem with the (old) version of the kafka-python client
we are using to produce EventLogging events to Kafka: it doesn’t handle
creation of new topics now that we’ve upgraded the Kafka cluster to 0.9.
This means that until we fix it, events produced to new schemas will not be
saved. We will fix this ASAP (hopefully by tomorrow), but in the meantime,
don’t make new schemas! :) I will update again once we have this fixed.
Sorry for the trouble!
-Andrew
Thanks; Nielsen data can indeed be very useful, I asked about it earlier
because I'd love to have it again for Italy.
https://meta.wikimedia.org/w/index.php?title=Talk:ComScore/Announcement&old…
Nemo
Tilman Bayer, 11/05/2016 19:23:
> New study (US only) by the Knight Foundation:
> https://medium.com/mobile-first-news-how-people-use-smartphones-to ,
> summarized here:
> http://www.theatlantic.com/technology/archive/2016/05/people-love-wikipedia…
>
> "People spent more time on Wikipedia’s mobile site than any other news
> or information site in Knight’s analysis, about 13 minutes per month
> for the average visitor. CNN wasn’t too far behind, at 9 minutes 45
> seconds per month. BuzzFeed clocked in third at 9 minutes 21 seconds
> per month. (BuzzFeed, however, slays both CNN and Wikipedia in time
> spent with the sites’ apps, compared with mobile websites. BuzzFeed
> users devote more than 2 hours per month to its apps, compared with
> about 46 minutes among CNN app users and 31 minutes among Wikipedia
> app loyalists.)
>
> Another way to look at Wikipedia’s influence: Wikipedia reaches almost
> one-third of the total mobile population each month, according to
> Knight’s analysis, which used data from the audience-tracking firm
> Nielsen."
>
>
Forwarding because this may be of interest to Analytics subscribers as well.
Pine
---------- Forwarded message ----------
From: "Thomas Steiner" <tomac(a)google.com>
Date: May 2, 2016 01:18
Subject: [Wiki-research-l] [ANN] Wikipedia Tools for Google Spreadsheets
To: "Thomas Steiner" <tomac(a)google.com>
Cc: "public-lod(a)w3.org" <public-lod(a)w3.org>, "Semantic Web" <
semantic-web(a)w3.org>, "Discussion list for the Wikidata project." <
wikidata(a)lists.wikimedia.org>, "Research into Wikimedia content and
communities" <wiki-research-l(a)lists.wikimedia.org>
Esteemed Wikipedia, Wikidata, Linked Data, and Semantic Web communities[*],
===
tl;dr: Released a Google Spreadsheets add-on called Wikipedia Tools
[1] that makes working with data from Wikipedia and Wikidata a breeze.
===
I am happy to release a Google Spreadsheets add-on called Wikipedia
Tools [1]. This add-on allows you to work with data from Wikipedia and
Wikidata from within a spreadsheet context using custom formulas. Let
me motivate the tools with a short example:
You may have heard of Volkswagen's #DieselGate scandal. Is this still
a problem for Volkswagen—and if so, where? Google Trends to the
rescue? Maybe [2]. But what about global impact? How do people in
Korea, an important Volkswagen export market [citation needed😉],
refer to the scandal? Turns out they call it 폭스바겐 배기가스 조작 (among
probably other options).
With a custom function from Wikipedia Tools, we can safely "translate"
an English Wikipedia article (English being a language that, for the
sake of this example, we assume we command well enough) into many other
languages (that we do not necessarily command):
=WIKITRANSLATE("en:Volkswagen_emissions_scandal")
bg Афера на Фолксваген
cs Dieselgate
de VW-Abgasskandal
[…]
zh 福斯集團汽車舞弊事件
Then, using Wikipedia page views as one reasonable popularity indicator
(among others), for each of these language results, for example
for Korean, we can get =WIKIPAGEVIEWS("ko:폭스바겐 배기가스 조작") for the last
n days and plot the results [3] (in practice, you would probably
still normalize by the size and/or total views of the particular
Wikipedia[**]).
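Outside of Sheets, the interlanguage lookup that WIKITRANSLATE performs can
be approximated with the MediaWiki langlinks API; a minimal sketch (my own
approximation, not the add-on's actual implementation):

```python
import urllib.parse

API = "https://{lang}.wikipedia.org/w/api.php"

def langlinks_url(lang, title):
    # Build a MediaWiki API query for the interlanguage links of one article.
    params = {
        "action": "query",
        "format": "json",
        "prop": "langlinks",
        "titles": title,
        "lllimit": "max",
    }
    return API.format(lang=lang) + "?" + urllib.parse.urlencode(params)

def parse_langlinks(response):
    # Extract {language_code: title} from the API's JSON response.
    out = {}
    for page in response["query"]["pages"].values():
        for ll in page.get("langlinks", []):
            out[ll["lang"]] = ll["*"]
    return out
```

Fetching langlinks_url("en", "Volkswagen_emissions_scandal") and feeding the
JSON to parse_langlinks yields the same language→title mapping as above.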
There are a lot more custom functions implemented than I could cover
in this short example. I have put together a slide deck [4] and paper
[5] that go into more detail if you are interested, a demo with all
functions is available at [6]. The add-on also has a built-in manual
(in Google Sheets, click Add-ons→Wikipedia Tools→Show documentation)
and its underlying code is open-source [7].
Please let me know in case of any open question, feature request, or
bug. Thanks!
Cheers,
Tom
--
[1] http://bit.ly/wikipedia-tools-add-on
[2]
http://www.google.com/trends/explore?hl=en-US&q=volkswagen+emissions+scanda…
[3]
https://docs.google.com/spreadsheets/d/1PyFq59iEeLWpPQrWDUyU8mlmQrb4GDv2QEl…
[4] bit.ly/wikipedia-tools-slides
[5] bit.ly/wikipedia-tools-paper (PDF)
[6]
https://docs.google.com/spreadsheets/d/1sVduZul787O-bRzuy0UKpRl7bkouxwaIOsx…
[7] https://github.com/tomayac/wikipedia-tools-for-google-spreadsheets/
[*] Cross-posted on purpose
(http://ruben.verborgh.org/blog/2014/01/31/apologies-for-cross-posting/),
please choose your reply options accordingly.
[**] This is a simple example for illustrative purposes, I do _not_
claim it is an accurate popularity prediction, nor do I mean to bash
Volkswagen.
--
Dr. Thomas Steiner, Employee (http://blog.tomayac.com,
https://twitter.com/tomayac)
Google Germany GmbH, ABC-Str. 19, 20354 Hamburg, Germany
Managing Directors: Matthew Scott Sucherman, Paul Terence Manicle
Registration office and registration number: Hamburg, HRB 86891
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.29 (GNU/Linux)
iFy0uwAntT0bE3xtRa5AfeCheCkthAtTh3reSabiGbl0ck0fjumBl3DCharaCTersAttH3b0ttom
hTtPs://xKcd.cOm/1181/
-----END PGP SIGNATURE-----
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Hi!
I have been trying to gauge the speed/efficiency of a database I have set up.
In order to test it, I have filled it with a lot of Wikipedia articles from
a specific category (for example, history). The database handles multi-word
queries and returns the articles that best match the query. For example, if
I search for "history in Italy in the past 100 years", then the
best-matching articles should pop up.
I was wondering if anyone has advice on how to form sample test queries that
model realistic situations. I don't think it would be fair to use random
phrases (such as "banana the string"), so I wanted to model queries based on
my data to test both performance and correctness of the output. Does anyone
have any advice? Is this done at Wikipedia, and if so, how?
I have looked here (
http://blog.wikimedia.org/2012/09/19/what-are-readers-looking-for-wikipedia…)
but the data has been unavailable for a while.
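Concretely, the kind of thing I have in mind is sampling word n-grams from
the indexed corpus itself, so test queries follow the vocabulary and phrase
statistics of my data; a minimal sketch:

```python
import random
import re

def sample_queries(documents, n_queries=5, min_len=2, max_len=5, seed=0):
    # Draw random word n-grams from the corpus so queries use real
    # vocabulary and word order rather than nonsense phrases.
    rng = random.Random(seed)
    queries = []
    for _ in range(n_queries):
        doc = rng.choice(documents)
        words = re.findall(r"\w+", doc.lower())
        if len(words) < min_len:
            continue  # skip documents too short to sample from
        n = rng.randint(min_len, min(max_len, len(words)))
        start = rng.randint(0, len(words) - n)
        queries.append(" ".join(words[start:start + n]))
    return queries
```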
Cheers,
Hello everybody,
does anybody know a way to get the number of registered accounts for
specific dates, without any criterion on the number of edits, just the
registered accounts, preferably from publicly available data?
In case you are wondering why we want to know: We are currently gathering
statistics about the development of editor numbers and this is one of our
metrics.
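For context, once we have raw registration timestamps (e.g. the
user_registration field, which can be exported via a public Quarry query),
what we would do with them is simple bucketing by date; a minimal sketch
assuming MediaWiki's timestamp format:

```python
from collections import Counter

def accounts_per_date(timestamps):
    # timestamps: MediaWiki-format strings like "20160501123456"
    # (YYYYMMDDHHMMSS); returns a Counter keyed by "YYYY-MM-DD".
    counts = Counter()
    for ts in timestamps:
        counts[f"{ts[0:4]}-{ts[4:6]}-{ts[6:8]}"] += 1
    return counts
```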
Best
Verena
--
Verena Lindner
Project Manager, Know-how
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Tel. (030) 219 158 26-0
http://wikimedia.de
Imagine a world in which every single human being can freely share in the
sum of all knowledge. Help us make that happen!
http://spenden.wikimedia.de/
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
(Society for the Promotion of Free Knowledge). Registered in the register of
associations of the Amtsgericht Berlin-Charlottenburg under number 23855 B.
Recognized as charitable by the Finanzamt für Körperschaften I Berlin, tax
number 27/029/42207.