[Wikimedia-l] Fwd: [Wmfcc-l] [press] Erik Zachte in Wired

Bishakha Datta bishakhadatta at gmail.com
Sat Jan 4 06:13:18 UTC 2014


About time.

The more recent Monthly Report Cards (http://reportcard.wmflabs.org/), which
organize stats by region, are extremely useful, and I often use your
animated history of Wikipedia to start presentations because it's so
beautiful.

Bravo!
Bishakha


On Sat, Jan 4, 2014 at 7:01 AM, Sue Gardner <sgardner at wikimedia.org> wrote:

> Just wanted to share this article, because it makes me so happy!
> Erik's one of our earliest contributors and *we've* all depended on
> his work for years, but it's mostly invisible to the world beyond
> Wikimedia. It makes me really happy to see him get some external
> recognition :-)
>
> Thanks,
> Sue
>
> ---------- Forwarded message ----------
> From: "Jay Walsh" <jwalsh at wikimedia.org>
> Date: 27 Dec 2013 12:20
> Subject: [Wmfcc-l] [press] Erik Z in Wired
> To: "Communications Committee" <wmfcc-l at lists.wikimedia.org>
> Cc:
>
> http://www.wired.com/wiredenterprise/2013/12/erik-zachte-wikistats/
>
> Meet the Stats Master Making Sense of Wikipedia’s Massive Data Trove
>
> BY ASHIK SIDDIQUE
> 12.27.13
> 9:30 AM
>
> Erik Zachte. Photo: Lane Hartwell/Wikimedia Foundation
>
> There are websites, and then there’s Wikipedia. The internet behemoth
> boasts 30 million articles written in more than 285 languages, tweaked
> by 70,000 active editors and viewed by 530 million visitors worldwide
> each month. As mountains of information go, it’s Everest. Teasing out
> trends from the open source encyclopedia’s archives is a task few
> would even attempt. Yet Erik Zachte did just that.
>
> Zachte used his statistical intuition to create “Wikistats,” an online
> statistics package that’s more than a trove of charts and graphs for
> data geeks. It’s the most direct measure yet of Wikipedia’s success in
> achieving its central objective: making the sum of all human knowledge
> available to everyone everywhere.
>
> “When I discovered Wikipedia I felt thrilled from the outset,” says
> Zachte, who was working as an IT guy at KLM Airlines in the early days
> of the Wiki revolution. Not content simply to edit articles, he joined
> the mailing lists in which a fervid network of volunteers debated how
> to increase the site’s functionality. As Wikipedia exploded in
> popularity, power users complained there was no consistent way to
> measure its growth in article count from the beginning.
>
> “In 2003 there was already an online page counter if I remember
> correctly, but not much else,” says Zachte. He realized it was
> possible to extract far more descriptive data from historical metadata
> in Wikipedia’s massive database dumps, copies of all raw content that
> are available to anyone in XML format.
>
> He started crunching numbers and quickly became famous among fellow
> Wikiholics for developing Wikistats. The site’s monthly reports filled
> a valuable niche for descriptive metrics in the Wiki community, with
> measures like article count, number of editors, and edits per article
> that serve as proxy indicators of Wiki quality. Impressed by Zachte’s
> stat-fu, the nonprofit Wikimedia Foundation that supports the
> Wikipedia infrastructure made him its data analyst in 2008.
>
> Since then, Zachte’s figures – all of which are open source and in the
> public domain – have revealed ongoing challenges to the organization’s
> growth, as well as noteworthy trends.
>
> Wikistats data made it clear that a core of Wikipedians does an
> outsize portion of the editing. As of October, 4.7 million people have
> contributed to the English language Wikipedia, but just over 26,000
> people have made more than 1,000 edits. In fact, that relatively small
> group of people has made 73 percent of all edits. While a small core
> of very active editors has remained stable, a larger pool of active
> editors (those making at least five edits monthly) in all Wikipedia
> language editions peaked at 90,000 in 2007 and has dropped since. As
> of October, the count stands at 70,000.
>
> That has some worried that a shrinking community indicates declining
> quality, and has prompted concerted efforts within the Wikimedia
> Foundation to boost editor engagement, which the organization
> considers one of the foremost indicators of Wikipedia’s success. In
> 2009, the organization
> launched an ambitious five-year strategic plan to drastically increase
> language and content diversity by encouraging internet users in the
> “Global South” – particularly the developing regions of Africa, Asia,
> the Middle East, and Latin America – to contribute. Wikistats metrics
> gauge its progress each month.
>
> “Many projects exist within WMF to influence editor influx and
> retention,” says Zachte, “but in the end Wikistats gives the final
> count: Are we on the right track?”
>
> The numbers show reason for measured optimism. While the largest and
> most densely populated language editions like English, German, French,
> and Japanese have seen the number of active editors level off or even
> decline since about 2007, newer editor networks in highly populous
> languages like Chinese, Arabic, and Persian continue to grow. In
> addition, the global share of page edits is slowly shifting to
> populous countries in the southern hemisphere, some of which, like
> India and the Philippines, use and edit Wikipedia overwhelmingly in
> English.
>
> Zachte’s reports also reveal idiosyncratic patterns of activity in
> different languages.
>
> For example, some volunteer coders program bots to create article
> stubs in massive bursts, hoping other users will expand the articles
> over time. While bots can supplement the work of active editor
> networks, Wikistats summaries show that some language editions are
> populated almost entirely by bot-created stubs – like the Cebuano and
> Waray-Waray Wikipedias, which rocketed to almost one million articles
> this year despite tiny editor networks that are unlikely to fill in
> those blanks anytime soon.
>
> Zachte’s animation of growth for all Wikipedia language sites
> measures four aspects of each site: bubbles representing each language
> slide across an x-axis indicating their age and up a y-axis measuring
> their article count, expanding as their editor networks grow and
> changing color as average article size grows. Image: Erik Zachte
>
> The data also provide raw material for striking visualizations, which
> Zachte sometimes creates and posts on his blog, Infodisiac, and
> compiles from other authors on Wikistats.
>
> For years, Zachte was the only staffer working on general metrics
> about Wikipedia, but today the Wikimedia Foundation has many
> analysts and engineers crunching data. The organization is preparing
> to absorb Zachte’s work into a much more powerful data infrastructure.
>
> “The plan is to take the existing functionality of Wikistats and
> modernize it across the board,” says Toby Negrin, Wikimedia’s director
> of analytics. “Erik’s work is amazing, but we need to make the data
> more accessible and update it faster.”
>
> One recent update is a streamlined Monthly Report Card that tracks
> user engagement by language and geographical region, with customizable
> graphs measuring factors like unique visitors, page views, and editing
> activity over time. Other extensions will capture and analyze all
> Wikimedia traffic, and provide metrics for editor engagement projects
> like Wikipedia Zero, which gives users in developing countries free
> Wikipedia access on their mobile devices.
>
> Zachte embraces the changes. “Most of what I built will be phased out
> over the coming years,” he says. “I’m fine with that. All software has
> a limited lifespan.”
>
> Until the new infrastructure can take over, Zachte maintains the
> scripts that populate Wikistats reports while working from home in
> Leiden, the Netherlands. Occasionally, he works on analytic pet
> projects. His next idea focuses on measuring content diversity across
> different Wikipedia language editions.
>
> “In early years Wikipedia was often characterized as mostly geek
> content: physics and sci-fi,” he says. “People don’t do that anymore,
> but is our content really balanced now? Do we have similar depth of
> content for ballet or folk culture or fashion?”
>
> Most articles in larger Wikipedias are assigned multiple categories –
> for example, the English-language entry for Barack Obama lists 45. But
> users can assign a single article many different categories, and each
> category can have an unlimited number of parent categories. That makes
> it difficult to easily compare the number of articles in each category
> as an indicator of content diversity.
>
> Zachte’s idea is that comparing word frequencies within articles to
> word frequencies for all named categories in a language (the English
> Wikipedia has over 1 million, according to a 2012 estimate) can more
> effectively categorize articles, and create profiles of which topics
> receive heavier coverage. He has written a proposal, but it’s still
> unclear how it fits into Wikimedia’s current budget. It might just be
> a hobby project – or, open source to the end, he concedes that someone
> else might as well scoop him.
>
> “Now I have given away the basic concept,” he says. “Someone can base
> her thesis on this, and beat me to it, which is fine. Science would
> progress faster if it did not thrive on secrecy.”
>
> Another Zachte animation visualizes all Wikipedia edits on a specific
> day in July 2011, on a world map in which 369,483 edits in multiple
> languages appear as geographically distributed bursts of color in a
> sped-up version of real time. Image: Erik Zachte
>
> Tags: Erik Zachte, Wikimedia Foundation, Wikipedia
>
>
> --
> Jay Walsh
> WikimediaFoundation.org
> blog.wikimedia.org
> @jansonw
>
> _______________________________________________
> Wmfcc-l mailing list
> Wmfcc-l at lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wmfcc-l
>
> _______________________________________________
> Wikimedia-l mailing list
> Wikimedia-l at lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> <mailto:wikimedia-l-request at lists.wikimedia.org?subject=unsubscribe>

