This is research I did for The Wikinewsie Group newsletter. We've been
looking at ways to increase community participation and content output.
The method has been suggested as a possible roadmap to success in these
areas, so we wanted to explore the implications with a known case of this
being done. Given the nature of news reporting, the implications here are
really only relevant to other Wikinews projects, not other WMF projects.
Original copy available at
Wikinews Content Import Analysis
*An analysis of the impact of content import on community development and
traffic on Serbian Wikinews*
One of the current goals of members of The Wikinewsie Group is to increase
participation on all projects. Three members of the group’s provisional
board are active on English Wikinews. Behind the scenes, the English
Wikinews community also have this as a goal. One of the perennial solutions
suggested by outsiders to the project is to increase community
participation and content by importing content from other similarly
licensed news projects to English Wikinews. This essay seeks to look at a
case study from another language Wikinews project to determine the impact
on overall content creation and community participation to determine if the
case study of external content import offers a path towards community
content creation and new contributor
In 2009, Serbian Wikinews as a project, led by
created a bot that imported content from similarly licensed Serbian
language news sources with the content being published on the
import bot, and subsequent surpassing of English Wikinews in terms of
content production, was a point of pride for the project with the news
documented in *Balkan
the same time, English Wikinews was also debating doing something similar,
but this was ultimately rejected by the community because the articles
would not meet the project’s
then, the two projects have diverged greatly in terms of policies.
What has the impact been on both communities in terms of content creation
rates before and after October 2009, when Serbian Wikinews started
importing content from other news sources? For perspective, Serbian
Wikinews was created in July 2005 while English Wikinews was created in
importing bot on Serbian Wikinews was active until April 3, 2013 when it
was blocked by an administrator citing the contributors as an undesirable
way to write news.<http://meta.wikimedia.org/wiki/User:LauraHale/Wikinews_Content_Import_Analy…>
comparisons between the two projects will be based on the start date for
Serbian Wikinews unless otherwise stated.
purpose of this analysis is to understand what happened on Serbian Wikinews
in the context of content import, to see how this impacted contributor
participation, overall content creation and article traffic. Once
understood, this can serve as a potential guide for implementation or
non-implementation of similar solutions for increasing content and
One of the first ways to measure the raw impact of the content import on
Serbian Wikinews is to examine the slope for article creation before and
after the import. If the import was successful, the slope of the line for
article creation as a factor of time would be steeper. Using the average
number of articles published per day in a month, the slope of the line for
the period between May 2005 and June 2009 was 1.05 for Serbian Wikinews and
-4.02 for English Wikinews. For the period between October 2009 and March
2013, the slope of the line is -0.29 for Serbian Wikinews and -4.57 for
English Wikinews. The growth on Serbian Wikinews increased steadily from
its founding until the period where they added the importing bot. Following
that period, the project saw a decline in growth. This contrasts with English
Wikinews, which saw a decline in growth over the whole period.
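As a rough illustration of how a slope like the ones above could be computed, here is a minimal sketch. It assumes an ordinary least-squares fit with the month index (0, 1, 2, ...) on the x axis; the analysis does not state which fitting method was used, and the sample values are hypothetical, not the real Wikinews figures.

```python
def least_squares_slope(values):
    """Slope of the ordinary least-squares line through (0, v0), (1, v1), ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two monthly values")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    # Covariance of month index and value, divided by variance of month index.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


# Hypothetical monthly averages of articles published per day (not real data).
before_import = [2.0, 3.0, 4.0, 5.0, 6.0]
after_import = [9.0, 8.5, 8.0, 7.5, 7.0]

slope_before = least_squares_slope(before_import)
slope_after = least_squares_slope(after_import)
print(f"before: {slope_before:+.2f} per month, after: {slope_after:+.2f} per month")
```

A positive slope means production was rising over the period and a negative slope means it was falling, which is how the before and after figures are being compared here.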
[image: Graph en v sr
This is visualized in the graph above which shows the average daily article
production for both projects. English Wikinews has been incrementally
declining in total daily articles produced. Serbian Wikinews in comparison
was slowly increasing its daily production prior to the external content
production. Since that time, community production has decreased
rapidly. As one data point, this appears to suggest that the content import
potentially is a net negative for community content creation.
Another potential way to determine the impact of the content importing is
to look at the slope of the line for editors who have ever contributed to
the project before and after the content import. This gives a perspective
regarding the ability to attract new contributors. Using the total number
of registered users who have ever contributed by
the slope of the line for the period between May 2005 and June 2009 was
3.1451 for Serbian Wikinews and 0.0460 for English Wikinews. For the period
between October 2009 and March 2013, the slope of the line is -2.0084 for
Serbian Wikinews and 0.1814 for English Wikinews. When the bot import
activity was halved, the period between October 2010 and March 2013 still
does not show pre-content import levels of attracting new
a slope of 2.211. The rate of historical contributor growth can be seen on
the graph below. Serbian Wikinews’s rate of attracting new contributors was
worse after the content importing. In the three-month period when the bot was
turned off, the slope was 1.5, which suggests that turning it off alone did
not assist in attracting new contributors.
[image: Graph en v sr history
Another way of looking at contributors is to look at the total number of
active editors contributing to the main space, where articles are
published, on a monthly basis. The slope can be calculated to see the
relative increase or decrease based on that. The slope of the line for the
period between May 2005 and June 2009 was 1.2970 for Serbian Wikinews and
-0.1246 for English Wikinews. For the period between October 2009 and March
2013, the slope of the line is -1.4319 for Serbian Wikinews and -0.1108 for
English Wikinews. Prior to the introduction of the news importer, Serbian
Wikinews had more participants as it had more articles. In the period prior
to the introduction of the imported content, Serbian Wikinews saw a small
increase in the number of active contributors on a monthly basis. Following
the introduction of imported content, they saw a decline of contributors at
the same rate.
Bot contributions could possibly be seen as facilitating the ease of human
participation on a project, with the content import on Serbian Wikinews
largely being done by a bot. The correlation for the total number of human
versus bot edits to the main space was calculated to confirm the
possibility that this might be
data is not available for all months. This lack of monthly data accounts
for using different time periods than other data. For the period between
July 2005 and April 2009, the correlation between total bot edits and human
edits to the main space was 0.249. The correlation for the period between
October 2009 and April 2013, the period when content import was active, was
0.467 which suggests a small correlation between increased bot edits and
increased human edits. The correlation between bot edits and human edits
from October 2010 to April 2013, when bot imports were halved, was -0.149,
which suggests little or no linear relationship between human edits
to the main space and bot edits to the main space. Overall, this suggests
there is no conclusive link showing that bot contributions affect
human contributions to a project.
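For readers who want to reproduce this kind of figure, the following is a minimal sketch of a correlation calculation. It assumes the Pearson correlation coefficient over paired monthly totals; the analysis does not name the coefficient used, and the edit counts below are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient for two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical monthly main space edit totals (not real data).
bot_edits = [10, 20, 30, 40, 50]
human_edits = [12, 18, 33, 41, 46]

r = pearson(bot_edits, human_edits)
print(f"correlation between bot and human edits: {r:.3f}")
```

Values near 1 or -1 indicate that the two series move together or in opposite directions; values near 0, such as -0.149, indicate little linear relationship. Months with missing data would need to be dropped from both series before calling the function.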
One possible argument is that a community is not needed and human
production of original content is not required. The goal of projects is to
freely share knowledge: if this is knowledge that has been previously
published on another website and has a compatible license with the Wikinews
project and it can get traffic, a community is not necessarily a
requirement for the project to function. Traffic and imported content can
sustain the project.
With this thinking in mind, Serbian Wikinews's model of content importing
could be argued as successful if it generated increased amounts of traffic
compared to periods when the import bot was not active and production rates
were lower. Using monthly traffic totals for Serbian
comparing that against daily production rates by month, the following graph
[image: Graph en v sr
Serbian Wikinews had a surge in traffic in the initial period before the
implementation of the content importer following all time traffic lows.
Following this, the graph suggests that this content import led to a rise
in traffic, but this was not sustained. In fact, a large spike in traffic
occurred after the content import was
in a number of periods actually appears higher than when the content import
was most active.<http://meta.wikimedia.org/wiki/User:LauraHale/Wikinews_Content_Import_Analy…>
data is, to a degree, also supported by correlation. The correlation
between the article traffic and total articles created by day from June
2008 to June 2009, the period before content import, was 0.72. This number
suggests a strong correlation between article production and traffic
totals. In the period between October 2009 and March 2013 when the import
bot was active, the correlation was 0.17. This suggests traffic relative to
article production was close to random. This is supported by the
correlation for the period between October 2009 and October 2010,
when the import bot was most active. In that period, the correlation was
-0.04, which suggests almost true randomness. In the period between October
2010 and March 2013, when the bot activity was halved, the correlation was
-0.47. This suggests that the greater the number of articles, the less
traffic Serbian Wikinews had. Serbian Wikinews did not benefit from
increased traffic during periods of increased article creation as a result
of content import.<http://meta.wikimedia.org/wiki/User:LauraHale/Wikinews_Content_Import_Analy…>
Serbian Wikinews has the largest archive of published news stories of any
news project. As of May 2013, their 75,000 articles account for 37% of all
content across all Wikinews projects. The next closest language project in
terms of content size is Polish Wikinews with about 25,000 articles and
English Wikinews with nearly 20,000 articles. The archived material could
be perceived as being useful by the wider community as a large archive of
historical news material. To determine this, the total monthly page views
were divided by the total number of articles on the project to determine the
relative access levels to these news stories as a historical archive. For
Serbian Wikinews, prior to the introduction of the content importer, the
project had an average of between 20 and 80 views per
the introduction of the content importing, the average monthly page views
per article drops to less than ten. As the graph below shows, this pattern
of per article drop off is consistent across English, Spanish and Polish
Wikinews, though for projects other than Serbian the percentage drop is
[image: Graph en v sr traffic es
The total average article views per month dropped for Serbian Wikinews and
it appears the project is not being used as a resource by Serbian speakers
to view historical news
On the whole, the data suggests that Serbian Wikinews did not benefit from
an increase in contributor written news stories, in creating a larger
editing community, or an increase in traffic as a result of new story
content import from other news reporting sites. This appears to be
something that the Serbian Wikinews community has recognised as problematic
when the importer was blocked from the community in April 2013. If other
projects are considering content import, they should consider the lessons from Serbian
Wikinews to see if the outcomes achieved by the import match with the
project’s own goals.
While editor retention would be ideal to study, editor retention is
much harder to address without looking at the individual history of
contributors. The number of Serbian Wikinews contributors is small enough
to make this feasible. Some information can be gleaned by looking at
http://stats.wikimedia.org/wikinews/EN/TablesWikipediaSR.htm and this is
an area where further analysis may be useful.
Details of this are documented on English Wikinews at
Supporters at the time included Juliancolton, ShakataGaNai, and
Tempodivalse. Contributors opposing included Blood Red Sandman, Pi zero,
BrianMcNeil, and Bawolff. A record of some of this conversation can be
found at Wikinews:Water_cooler/miscellaneous/archives/2009/October#Articles
copied from VOA<http://en.wikinews.org/wiki/en:_Wikinews:Water_cooler/miscellaneous/archive…>.
The Serbian Wikinews bot was imported and beta tested on the project
starting in October 2009, with the announcement made on
The request to test the bot is found
Stats used for the dates and all the data for this analysis can be
found on or linked from
a history of the bot’s editing.
Most of this analysis assumes there are no other major changes to
either project that would lead to “unnatural” changes in community output
and participation. This is not true for English Wikinews, which underwent
major changes in reviewing. This led to the creation of a fork in September
2011, with the date mentioned at
The project was closed and deleted in August 2012 according to English
Wikipedia's Signpost at Wikipedia:Wikipedia Signpost/2012-08-20/News and
There are also other independent variables present on English Wikinews that
may possibly account for downward editing trends. Some of these mirror
patterns on English Wikipedia, including moves towards making information
more neutral, verifiable and greater enforcement of copyright policy.
This number will always increase because it is not an average of who
has edited in a given month but historically how many people have ever
As a point of contrast, following the Open Globe fork from English
Wikinews, the slope for new editor growth on English Wikinews was higher
than both of the periods mentioned. It was 0.232.
This was done by adding the total number of editors to main space in
Data found at
which provides stats from June 2008 to May 2013.
The traffic averages in those periods suggest higher traffic, but this
is offset by the medians which suggest the opposite. The following tables
provide greater insight into traffic median and mode for these periods.
English, Spanish and Polish Wikinews traffic information is provided as a
basis for comparison.
Period | Math / Dates | Median - Serbian | Mean - Serbian | Median - English | Mean - English | Median - Spanish | Mean - Spanish | Median - Polish | Mean - Polish
Prior to content import | June 2008 - June 2009 | 186,000.00 | 183,384.62 | 5,700,000.00
2009 - March 2013 | 274,500.00 | 322,571.43 | 5,550,000.00 | 5,695,238.10 | 665,500.00 | 746,690.48 | 689,000.00 | 719,571.43
Post Open Globe fork | September 2012 - June
671,500.00 | 696,000.00
Content import halved | October 2010 - March 2013
667,500.00 | 698,833.33
Content turned off | March 2013 - June 2013 | 271,000.00
681,000.00
Content import most active | October 2009 - October 2010 | 256,000.00
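The gap between mean and median traffic in a period can be illustrated with a short sketch. The monthly totals below are hypothetical and only show how a single spike month pulls the mean well above the median; they are not the figures from the table.

```python
from statistics import mean, median

# Hypothetical monthly page view totals with one spike month (not real data).
monthly_views = [180_000, 190_000, 185_000, 200_000, 950_000]

avg = mean(monthly_views)
mid = median(monthly_views)

print(f"mean:   {avg:,.2f}")
print(f"median: {mid:,.2f}")
if avg > mid:
    print("The mean is above the median, so a few high-traffic months dominate the average.")
```

When the two measures disagree like this, the median is usually the better description of a typical month.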
As a point of reference, English Wikinews has a completely different
pattern than Serbian Wikinews.
English Wikinews articles created per day versus article views.
English Wikinews correlations generally suggest that, to a small degree, the
greater the content production, the more views, though the correlation for
the period between March 2013 and May 2013 suggests the opposite is true:
the less content produced, the greater the page views. Similar patterns
also hold relatively true for Spanish Wikinews. The period between
September 2012 and May 2013, which is the period after the closure of the
English Wikinews fork, has a correlation of 0.853. For Spanish Wikinews,
when compared directly to Serbian Wikinews, the pre-content import period
of June 2008 to June 2009, has the most randomness for the relationship
between daily content production and page views with a correlation of 0.23.
The relationship between per day article production and views on Spanish
Given the nature of SEO and the amount of traffic derived from Google,
it is possible that Google’s algorithm gave less value to Serbian Wikinews
articles that were copies from other sites. Serbian Wikinews also lacks a
visible Twitter and Facebook account. For English and Spanish Wikinews,
where Google may prefer the content because it is original and is more
likely to put results higher in searches, Google related traffic may be
more consistent overall. It is also possible that English and Spanish
Wikinews traffic may also be dependent on other variables such as type of
content, social media efforts, incoming links from sister projects, etc.
This number is based on dividing the total number of monthly page views by
the total number of articles. This number is likely not a true
reflection of actual views because page views include all pages on a
project and, according to the stats page, contain bot-generated traffic
totals which account for roughly 15% of all page views counted.
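The calculation described in this note can be sketched as follows. The function name and the optional `bot_share` adjustment are illustrative assumptions, not part of the original analysis; the 75,000 article count and the roughly 15% bot share come from the text, while the 300,000 monthly page views figure is hypothetical.

```python
def views_per_article(monthly_page_views, total_articles, bot_share=0.0):
    """Average monthly page views per article, optionally discounting bot traffic."""
    if total_articles <= 0:
        raise ValueError("total_articles must be positive")
    if not 0.0 <= bot_share < 1.0:
        raise ValueError("bot_share must be in the range [0, 1)")
    human_views = monthly_page_views * (1.0 - bot_share)
    return human_views / total_articles


# Hypothetical monthly total; 75,000 articles is the May 2013 figure from the text.
raw = views_per_article(300_000, 75_000)
adjusted = views_per_article(300_000, 75_000, bot_share=0.15)
print(f"raw: {raw:.2f} views per article, bot-adjusted: {adjusted:.2f}")
```

This still counts views of non-article pages, so even the adjusted figure is only an upper estimate of how often archived stories are read.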
Some of this can possibly be explained by total language speakers.
Serbian is spoken by approximately 9.2 million people compared to 40
million Polish speakers and 500 million Spanish speakers. This cannot
completely account for all the differences. The June 2008 starting point is
80 views for Serbian Wikinews compared to 119 for Polish Wikinews and 314
for Spanish Wikinews. If traffic was based on relative population of
speakers, Serbian should have started at a lower average or Polish should
have started at a higher point: The two are too close, despite one language
having about 5 times as many speakers.
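The comparison in this note can be checked with a few lines of arithmetic. This sketch uses only the speaker counts and June 2008 starting figures quoted above; the helper name is an illustrative assumption.

```python
# Figures quoted in the note above (speakers in millions, June 2008 views).
speakers_millions = {"Serbian": 9.2, "Polish": 40.0, "Spanish": 500.0}
starting_views_june_2008 = {"Serbian": 80, "Polish": 119, "Spanish": 314}


def ratio_to_baseline(figures, baseline="Serbian"):
    """Express every value as a multiple of the baseline language's value."""
    return {lang: value / figures[baseline] for lang, value in figures.items()}


speaker_ratio = ratio_to_baseline(speakers_millions)
views_ratio = ratio_to_baseline(starting_views_june_2008)

for lang in speakers_millions:
    print(f"{lang}: {speaker_ratio[lang]:.1f}x speakers, {views_ratio[lang]:.1f}x views")
```

If traffic scaled with the number of speakers, the two multiples would be similar for each language; for Polish and Spanish the views multiple is far smaller than the speaker multiple.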
This research is not applicable to other Wikimedia projects, because
news is news. Once published, news stories are generally not refactored.
Instead, new news stories are published with updated information. This
implicitly differs from Wikipedia, Wiktionary and Wikivoyage, where
imported content could easily be refactored, changed and updated by the
community. Other research would need to be done to determine the success of
content import on community and traffic on other sister projects.
Analysis conducted by LauraHale<http://meta.wikimedia.org/wiki/User:LauraHale>.
Raw data used for analysis available upon request.
At the recent gendergap strategy retreat the issue of racial
demographics was briefly brought up but no one had any numbers on it, so
we didn't know if it was an actual issue or not. Anecdotal evidence
suggests there is also a "racial gap" among editors, but it would be
nice to have some numbers on this to facilitate discussion. I did some
digging and the only statistics I could find were about the racial
demographics of American readers
It seems that none of our editor surveys have asked about race, although
we've asked almost every other demographic question imaginable.
Does anyone know of any research or statistics related to the racial
demographics of Wikipedia editors?
If not, should we consider doing a micro-survey as was done for gender
We've been working on tracking down the top 25 articles for each week, but as you can see
it requires determining which rankings are due to actual human views and which are due to bots, and recently, the bots have been having a field day. I've been asked by the creator of the list to ask you for help and/or advice on how to use analytics to separate human from non-human views. Please let me know if there's anything that can be done.
This sprint we delivered 43 points, for the coming sprint we have scheduled
Apologies for cross-posting; ideally you should receive this on the
Analytics Mailinglist so we can have one focal point for conversation. If
you are not on the Analytics list then please subscribe at
## Defects & Features completed (Ready for Showcase/Shipping/Done) during
Sprint ending 2013-07-10 ##
719 D - Cleanup duplicate loglines (Data quality for Wikipedia Zero) Amit
Kapoor - Wikipedia Zero
786 F - X-CS pageview counts for June 2013 Part II Amit Kapoor - Wikipedia
781 F - Monthly Reportcard May 2013 Erik Moeller - Executive Office
766 F - Zero Dashboard for Pageviews Metrics Amit Kapoor - Wikipedia Zero
705 F - Port over of Bytes_added Dario Taraborelli - Product (E3)
133 I - Puppetize JMXTrans Diederik van Liere - Analytics
348 F - Dump stats: update edit/revert stats monthly Howie Fung - Product
696 F - Porting over the request logic Jessie Wild - Learning & Evaluation
469 F - Comparison of different pageview definitions Tomasz Finc - Product
(Mobile Web & Apps)
738 D - Cronjob fails new mobile pageview report Diederik van Liere -
793 F - report grouping Wikimedia projects/languages along two axes:
growing/stagnant/declining, small/medium/large communities. Howie Fung -
## Current Sprint (ending 2013-07-24) ##
Stories in progress from last sprint:
#700 F - Port static cohort user interaction (3) requested by E3/Grantmaking
#716 F - Debianization of dClass-dev and dClass-dev-jni (5) requested by Ops
#745 I - Puppetize Hive (server side) requested by Ops
#793 F - Wiki growth requested by Erik Moeller
#578 F - Job Management requested by Grantmaking & Product
#753 I - Configuration and Deployment to Labs requested by Grantmaking &
#794 F - Do new projects bring in new editors requested by Erik Moeller
#798 F - Setup X-CS sampled Wikipedia Zero reporting requested by Amit
(Number in parentheses) = estimate of complexity
N/E = not estimated;
F = Feature
D = Defect
I = Infrastructure Task
S = Spike
Any mingle card can be accessed using the base url
https://mingle.corp.wikimedia.org/projects/analytics/cards/XYZ where XYZ is
the Mingle card id.
If you have any questions, comments or feedback: please let us know!
I've put European mobile traffic on the esams mobile servers during the last 2 hours.
As far as I can tell from the stats & logs, it's working just fine.
The Ganglia graphs are here:
cp3013 and cp3014 will be added to these two in the following weeks; they're currently unreachable due to a management interface misconfiguration.
Should there be any problems, the previous situation can easily be restored through normal GeoDNS configuration; a quick way to do that is to copy /etc/powerdns/scenarios/esams-down/m.wikimedia.org to /etc/powerdns/scenarios/normal/m.wikimedia.org and run 'authdns-update' on dobson.
Mark Bergsma <mark(a)wikimedia.org>
Lead Operations Architect
Hi Quim, hi analytics team,
Tomorrow afternoon I will start my holidays until 1st August. In Bitergia we have
tried to complete our first iteration with Wikimedia dashboard.
Right now we have:
* Wikimedia dashboard in your servers: http://korma.wmflabs.org/
* Updated each day by Automator with fresh data
* The dashboard has been checked by you and all issues detected in this
first iteration solved except the 365 days index page. I have some
proposal pending to talk with you about this index page.
* SCM and ITS data sources for Mediawiki are covered. For MLS data
source we are right now downloading the suggested lists to be covered. Basic
Gerrit support is also included, but this support with the IRC one is
for the next iteration.
In order to cover all things in the contract we need to finish IRC and
But as you know, we are pretty interested in working with Wikimedia
Foundation so we can define some areas of common interest to work in:
* Personal pages: we are already working on them but we need to make
them richer and with more data about the activity of the person in the
project data sources. This will be the first area in which we will work
* KPI: it is pretty important to work together detecting the Key
Performance Indicators for Wikimedia Foundation about the development of
its technologies. Time to fix issues? Time to merge revision? Time to
attention for an issue? Time to answer messages? New people arriving
each day/week/month to the different data sources?
I am not sure about the best way to discuss these KPIs. Some of them are
already in Community Metrics.
Maybe we can discuss about:
and try to find the best metric for each of these questions.
* Extending the analysis to all Wikimedia technologies: in the contract
we have covered MediaWiki and MediaWiki Extensions, but I think you want
to cover the rest of technologies in Wikimedia Foundation.
* MediaWiki Extension to use VizGrimoire inside MediaWiki: this is the
GRANT we are working on.
Any other areas you have in mind?
|\_____/| Alvaro del Castillo
[o] [o] acs(a)bitergia.com - CTO, Software Engineer
| V | http://www.bitergia.com
Just kidding! July 15th.
On Fri, Jul 12, 2013 at 12:40 PM, EventLogging Alerts <
> Hi folks,
> There will be a brief EventLogging outage on Monday, July 8, starting at
> 20:00 UTC, and lasting hopefully less than an hour. Faidon and I will be
> rolling out an update to the EventLogging Puppet module which changes the
> way processes are initialized and managed. I will follow up with another
> e-mail when we're done to log the precise downtime.
> Please let me know if this doesn't work for you, in which case I'll try to
> EventLogging-Alerts mailing list
just a quick heads up that (as discussed with Diederik) Kraken
received its gerrit repository at
We'll keep the old github Kraken repository around at
and will start to replicate changes from gerrit to github.
If you used to push to github, please upload to the new gerrit
repository directly instead.
Have fun and best regards,
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Gruendbergstrasze 65a Email: christian(a)quelltextlich.at
4040 Linz, Austria Phone: +43 732 / 26 95 63
Fax: +43 732 / 26 95 63
There will be a brief EventLogging outage on Monday, July 8, starting at
20:00 UTC, and lasting hopefully less than an hour. Faidon and I will be
rolling out an update to the EventLogging Puppet module which changes the
way processes are initialized and managed. I will follow up with another
e-mail when we're done to log the precise downtime.
Please let me know if this doesn't work for you, in which case I'll try to