Hello!
Over the last few weeks, Yusuke Matsubara, Shawn Walker, Aaron Halfaker and
Fabian Kaelin (who are all Summer of Research fellows)[0] have worked hard
on a customized stream-based InputFormatReader that allows parsing of both
bz2 compressed and uncompressed files of the full Wikipedia dump (dump file
with the complete edit histories) using Hadoop. Prior to WikiHadoop and the
accompanying InputFormatReader, it was not possible to use Hadoop to analyze
the full Wikipedia dump files (see the detailed tutorial / background for an
explanation of why).
This means:
1) We can now harness Hadoop's distributed computing capabilities in
analyzing the full dump files.
2) You can send either one or two revisions to a single mapper, so it's
possible to diff two revisions and see what content has been added /
removed.
3) You can exclude namespaces by supplying a regular expression.
4) We are using Hadoop's Streaming interface, which means people can use this
InputFormatReader from different languages such as Java, Python, Ruby and
PHP.
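As a rough illustration of points 2) and 3), here is a minimal Python sketch of the per-record logic a Streaming mapper might run. It assumes each record is a <page> fragment (without an XML namespace declaration) holding one or two <revision> elements with <text> children, as in the MediaWiki export format. The exclusion pattern EXCLUDE_TITLES and all function names are hypothetical. How records are actually framed on the mapper's stdin, and how the real exclusion regex is passed to WikiHadoop, are covered in the tutorial and are not shown here.

```python
import difflib
import re
import xml.etree.ElementTree as ET

# Hypothetical exclusion pattern (not WikiHadoop's own option):
# skip pages whose titles fall in these namespaces.
EXCLUDE_TITLES = re.compile(r"^(User|User talk|Talk):")


def parse_record(record):
    """Return (title, [revision texts]) from a <page> fragment (assumed format)."""
    page = ET.fromstring(record)
    title = page.findtext("title", default="")
    texts = [rev.findtext("text", default="") for rev in page.iter("revision")]
    return title, texts


def diff_revisions(old, new):
    """Return (added_lines, removed_lines) going from old to new."""
    added, removed = [], []
    for line in difflib.ndiff(old.splitlines(), new.splitlines()):
        if line.startswith("+ "):
            added.append(line[2:])
        elif line.startswith("- "):
            removed.append(line[2:])
        # "  " (unchanged) and "? " (intraline hint) lines are ignored.
    return added, removed


def map_record(record):
    """Return None for excluded or empty pages, else (title, added, removed)."""
    title, texts = parse_record(record)
    if EXCLUDE_TITLES.match(title) or not texts:
        return None
    if len(texts) == 1:
        # Only one revision was sent: treat its whole text as added content.
        return title, texts[0].splitlines(), []
    added, removed = diff_revisions(texts[0], texts[1])
    return title, added, removed


if __name__ == "__main__":
    sample = (
        "<page><title>Example</title>"
        "<revision><text>alpha\nbeta</text></revision>"
        "<revision><text>alpha\ngamma</text></revision>"
        "</page>"
    )
    print(map_record(sample))
```

Using the standard library's difflib keeps the sketch dependency-free; a real Streaming job would more likely print tab-separated key/value lines (for example, a title and a count of added lines) rather than return tuples.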
The source code is available at: https://github.com/whym/wikihadoop
A more detailed tutorial and installation guide is available at:
https://github.com/whym/wikihadoop/wiki
(Apologies for cross-posting to wikitech-l and wiki-research-l)
[0] http://blog.wikimedia.org/2011/06/01/summerofresearchannouncement/
Best,
Diederik
Greetings everyone,
Now that the WMF summer research program in the Community Department has
come to a close, I wanted to point interested parties to the body of
findings we've produced.
We covered a lot of territory, so to save you the trouble if you just want to
browse, we've collected our most salient results on one wiki page.
- Relevant blog post here:
http://blog.wikimedia.org/2011/09/06/summer-research-findings/
- Summary of findings on Meta, with links to further documentation:
https://secure.wikimedia.org/wikipedia/meta/wiki/Research:Wikimedia_Summer_…
Next steps are twofold for this program:
1. We'll be working with the Global Development team and some volunteers
from the local community to extend these analyses to cover Portuguese
Wikipedia, specifically to support Global Dev's work in Brazil.
2. We're choosing and implementing a platform to release not just our
code, but the datasets we compiled over the summer. You'll hear more about
this soon, but we're taking our time in order to decide on a solution that
will work in the long term for sharing open data beyond the dumps.
Last but not least, if anyone would like to have a more in-depth discussion
about these findings and the research that produced them, I'm definitely
open to hosting an IRC office hours with some members of the team. Just let
me know if you're interested (on- or off-list) and I'll set something up soon.
--
Steven Walling
Fellow at Wikimedia Foundation
wikimediafoundation.org
I'd love to see some expert opinion on the recent Image Filter survey.
Researchers might be able to get their hands on the raw data to make
sense of it all.
http://meta.wikimedia.org/wiki/Image_filter_referendum/Results/en
---------- Forwarded message ----------
From: John Vandenberg <jayvdb(a)gmail.com>
Date: Mon, Sep 5, 2011 at 9:21 AM
Subject: Re: [Wikiquote-l] Personal Image Filter results announced
To: foundation-l(a)lists.wikimedia.org
On Sun, Sep 4, 2011 at 2:33 PM, Philippe Beaudette
<pbeaudette(a)wikimedia.org> wrote:
>
>
> Ladies and Gentlemen,
>
> The committee running the vote on the features for the Personal Image Filter
> have released their interim report and vote count. You may see the results
> at http://meta.wikimedia.org/wiki/Image_filter_referendum/Results/en.
> Please note that the results are not final: although the vote count is, and
> has been finalized, the analysis of comments is ongoing.
Was this survey approved by the Research Committee?
If so, can they give us an opinion on the survey instrument used,
whether the survey population obtained is suitable, etc?
--
John Vandenberg
We are glad to announce the inaugural issue of the Wikimedia Research Newsletter [1], a new monthly survey of recent scholarly research about Wikimedia projects.
This is a joint project of the Signpost [2] and the Wikimedia Research Committee [3] and follows the publication of two research updates in the Signpost; see also last month's announcement on this list [4].
The first issue (which is simultaneously posted as a section of the Signpost and as a stand-alone article in the Wikimedia Research Index) includes five in-depth reviews of papers published over the last few months and a number of shorter notes, for a total of 15 publications, covering both peer-reviewed research and results published in research blogs. It also includes a report from the Wikipedia research workshop at OKCon 2011 and highlights from the Wikimedia Summer of Research program.
The following is the TOC of issue #1:
• 1 Edit wars and conflict metrics
• 2 The anatomy of a Wikipedia talk page
• 3 Wikipedians as "Janitors of Knowledge"
• 4 Use of Wikipedia among law students: a survey
• 5 Miscellaneous
• 6 Wikipedia research at OKCon 2011
• 7 Wikimedia Summer of Research
• 7.1 How New English Wikipedians Ask for Help
• 7.2 Who Edits Trending Articles on the English Wikipedia
• 7.3 The Workload of New Page Patrollers & Vandalfighters
• 8 References
We are planning to make the newsletter easy to syndicate and subscribe to. If you would like your research to be featured or a CFP or event you organized to be highlighted, or if you would like to join the team of contributors, head over to this page to find out how: [5]. We hope to make this newsletter a favorite read for our research community, and we look forward to your feedback and contributions.
Dario Taraborelli, Tilman Bayer (HaeB)
on behalf of the WRN contributors
[1] http://meta.wikimedia.org/wiki/Research:Newsletter/2011-07-25
[2] http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost
[3] http://meta.wikimedia.org/wiki/Research:Committee
[4] http://lists.wikimedia.org/pipermail/wiki-research-l/2011-June/001552.html
[5] http://meta.wikimedia.org/wiki/Research:Newsletter
--
Dario Taraborelli, PhD
Senior Research Analyst
Wikimedia Foundation
http://wikimediafoundation.org
http://nitens.org/taraborelli
Dear All
The following is the Call for Participation for the Open Government Data Camp 2011
* http://ogdcamp.org/cfp/
* What: The world's biggest open data event to date.
* When: 20-21st October, with satellite events from 17-26th October
* Where: Warsaw, Poland
* Web: http://www.ogdcamp.org
* Hashtag: #ogdcamp
# Submit your Proposal
This event will bring together the international Open Government Data community, so please: be bold! We encourage people to submit talks, workshops and satellite events that are visionary, extraordinary and even mind-blowing! If you have something to say, propose or demonstrate that will ignite the imagination of the crowd, we want to hear it.
Please submit a proposal via the link below:
* http://ogdcamp.org/programme/submit/
There are four main kinds of submission:
* Lightning presentations - 5 minutes
* Talks - 10-15 minutes
* Sessions - 2-4 hours
* Satellite events in the days surrounding the camp - 1/2 day / full day
## About
The camp has 4 key objectives:
* Build consensus – around core open data principles and values
* Build community – expand and strengthen international open data community
* Share ideas – on the future of open data and how we can do things better
* Make things – from starting projects, to making plans, to writing code
## Who is behind it?
Open Government Data Camp is run as a collaborative partnership between key stakeholders in the open government data community around the world. Find out more at:
* http://ogdcamp.org/about/who/
# Registration
Registration is now open. Make sure you get one of the few early bird tickets at
* http://ogdcamp.org/register/
## Travel Bursaries
We have several travel bursaries available to support participants who could otherwise not afford to attend the camp, including:
* A European bursary for EU27 citizens and residents travelling from within Europe, provided by the European Commission.
* An international bursary.
* A US bursary, provided by the Sunlight Foundation.
Full details are available at:
* http://ogdcamp.org/bursaries/
## Stay in touch
If you'd like to talk with others interested in open government data around the world, you can introduce yourself on the open-government mailing list:
* http://lists.okfn.org/mailman/listinfo/open-government
You can follow developments on Twitter with the hashtag #ogdcamp.
If you have any questions for the organising team, please contact: info(a)ogdcamp.org
We look forward to meeting you in Warsaw!
Daniel Dietrich for the OGDCamp organiser team.
--
Daniel Dietrich
The Open Knowledge Foundation
Promoting Open Knowledge in a Digital Age
www.okfn.org - www.opendefinition.org
Mail: daniel.dietrich(a)okfn.org
Mobil: +49 171 780 870 3
Twitter: @ddie
I want to specifically invite interested researchers to the Wikimedia
and MediaWiki hackathon happening 14-16 October in New Orleans,
Louisiana, USA.
http://www.mediawiki.org/wiki/NOLA_Hackathon
We're getting together a wide variety of contributors -- including
template, script, tool, extension, and gadget writers -- to participate,
give feedback, test, and hack with us. If you write software that uses
the MediaWiki API, or runs on the Toolserver, we want to chat and
collaborate with you.
At the event, MediaWiki developers and Wikimedia operations engineers
will be working on Wikimedia's gadgets/extensions/tools support,
authorization/authentication strategy, dev-ops virtualization, and
general training and hacking. And we'll improve and discuss the
Wikimedia Labs projects infrastructure and other stuff that makes it
easier for anyone to supercharge Wikimedia with awesomeness.
The event is open to anyone who wants to come and contribute, and is an
opportunity to spend time with senior MediaWiki developers & ops
engineers, write beautiful code, and learn about the latest
developments. We'll write code together, discuss the software, and hold
little workshops.
If you can make it to New Orleans, Louisiana, USA, 14-16 October 2011,
we'd love to have you. Please add your name to the attendees list:
http://www.mediawiki.org/wiki/NOLA_Hackathon#Attendees
(And please spread the word!)
Thanks and best wishes.
--
Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation