Pursuant to prior discussions about the need for a research
policy on Wikipedia, WikiProject Research is drafting a
policy regarding the recruitment of Wikipedia users to
participate in studies.
At this time, we have a proposed policy, and an accompanying
group that would facilitate recruitment of subjects in much
the same way that the Bot Approvals Group approves bots.
The policy proposal can be found at:
http://en.wikipedia.org/wiki/Wikipedia:Research
The Subject Recruitment Approvals Group mentioned in the proposal
is being described at:
http://en.wikipedia.org/wiki/Wikipedia:Subject_Recruitment_Approvals_Group
Before we move forward with seeking approval from the Wikipedia
community, we would like additional input about the proposal,
and would welcome additional help improving it.
Also, please consider participating in WikiProject Research at:
http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Research
--
Bryan Song
GroupLens Research
University of Minnesota
I've been looking to experiment with node.js lately and created a
little toy webapp that displays updates from the major language
wikipedias in real time:
http://wikistream.inkdroid.org
Perhaps like you, I've often tried to convey to folks in the GLAM
sector (Galleries, Libraries, Archives and Museums) just how actively
Wikipedia is edited. GLAM institutions are increasingly interested in
"digital curation", and I've sometimes displayed the IRC activity at
workshops to demonstrate the sheer number of people (and bots) actively
engaged in improving the content there... in the hope of making the
Wikipedia platform part of their curation strategy.
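(For anyone who wants to play with the same firehose: below is a minimal
Python sketch of reading the recent-changes feed, assuming the public
Wikimedia IRC feed at irc.wikimedia.org and channel names like
"#en.wikipedia". wikistream itself is node.js, so this is only an
illustration, not its actual code; the nickname is a placeholder.)

  import socket

  HOST, PORT = "irc.wikimedia.org", 6667                # assumed feed location
  CHANNEL, NICK = "#en.wikipedia", "rc-sketch-12345"    # placeholders

  sock = socket.create_connection((HOST, PORT))
  sock.sendall(("NICK %s\r\nUSER %s 0 * :%s\r\nJOIN %s\r\n"
                % (NICK, NICK, NICK, CHANNEL)).encode())

  buf = b""
  while True:
      buf += sock.recv(4096)
      while b"\r\n" in buf:
          line, buf = buf.split(b"\r\n", 1)
          text = line.decode("utf-8", "replace")
          if text.startswith("PING"):
              # answer server keepalives or we get disconnected
              sock.sendall(("PONG" + text[4:] + "\r\n").encode())
          elif " PRIVMSG " in text:
              # every PRIVMSG on the channel is one edit event
              print(text.split(" PRIVMSG ", 1)[1])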
Anyhow, I'd be interested in any feedback you might have about wikistream.
//Ed
Hi. I'm forwarding this e-mail; I hope some people are interested in this map.
---------- Forwarded message ----------
From: emijrp <emijrp(a)gmail.com>
Date: 2011/6/11
Subject: Wikis around Europe!
To: wikiteam-discuss(a)googlegroups.com
Hi all;
A friend of mine has sent me this link about wikis (locapedias) around
Europe.[1] I'm very surprised by the huge number of wikis available.
Time to archive all of them.[2] I have been working on the Spanish ones. If
you want to help archive a country, please reply to this message so we can
coordinate. If not, I will try to archive all of Europe!
Regards,
emijrp
[1]
http://maps.google.com/maps/ms?ie=UTF8&t=h&msa=0&msid=115570622864617231547…
[2] http://code.google.com/p/wikiteam/
As a Commons admin I've thought a lot about the problem of
distributing Commons dumps. For distribution, I believe BitTorrent
is absolutely the way to go, but the torrent will require a small
network of dedicated permaseeds (servers that seed indefinitely).
These can easily be set up at low cost on Amazon EC2 "small" instances
- the disk storage for the archives is free, since small instances
include a large (~120 GB) ephemeral storage volume at no additional
cost, and the cost of bandwidth can be controlled by configuring the
BitTorrent client with either a bandwidth throttle or a transfer cap
(or both). In fact, I think all Wikimedia dumps should be available
through such a distribution solution, just as all Linux installation
media are today.
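For a rough sense of the cost, here is a back-of-the-envelope calculation in
Python; the throttle value and the per-GiB egress price are illustrative
assumptions, not quoted AWS figures:

  # Monthly egress of a permaseed at a fixed upload throttle (assumed values).
  throttle_kib_per_s = 500                 # sustained upload throttle, KiB/s (assumption)
  seconds_per_month = 30 * 24 * 3600
  gib_per_month = throttle_kib_per_s * 1024 * seconds_per_month / 1024**3
  assumed_price_per_gib = 0.10             # USD per GiB, placeholder; check current pricing
  print("~%.0f GiB/month uploaded, ~$%.0f/month in bandwidth"
        % (gib_per_month, gib_per_month * assumed_price_per_gib))

Turning the throttle down, or setting a hard transfer cap in the client,
scales that figure linearly.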
Additionally, it will be necessary to construct (and maintain) useful
subsets of Commons media, such as "all media used on the English
Wikipedia" or "thumbnails of all images on Wikimedia Commons", which are
of particular interest to certain content reusers, since the full set is
far too large for most reusers. It's on this latter
point that I want your feedback: what useful subsets of Wikimedia
Commons does the research community want? Thanks for your feedback.
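As one concrete starting point, a rough Python sketch that walks the full
Commons file list through the standard MediaWiki API (list=allimages) is
below; it only illustrates the pagination pattern a subset-builder would
use, and the User-Agent string is a placeholder:

  import json, urllib.parse, urllib.request

  API = "https://commons.wikimedia.org/w/api.php"
  params = {"action": "query", "list": "allimages", "aiprop": "url",
            "ailimit": "500", "format": "json"}

  while True:
      req = urllib.request.Request(API + "?" + urllib.parse.urlencode(params),
                                   headers={"User-Agent": "commons-subset-sketch/0.1"})
      with urllib.request.urlopen(req) as resp:
          data = json.load(resp)
      for image in data["query"]["allimages"]:
          print(image["url"])                # original file URL; filter or fetch as needed
      if "continue" not in data:             # no continuation token means we are done
          break
      params.update(data["continue"])        # follow the API's continuation parameters

A subset like "all media used on the English Wikipedia" would need the
image-usage data instead, but the walk-and-continue pattern is the same.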
--
Derrick Coetzee
User:Dcoetzee, English Wikipedia and Wikimedia Commons administrator
http://www.eecs.berkeley.edu/~dcoetzee/
On Mon, Jun 27, 2011 at 6:49 AM,
<wiki-research-l-request(a)lists.wikimedia.org> wrote:
> Date: Mon, 27 Jun 2011 06:18:31 -0400
> From: Samuel Klein <sjklein(a)hcs.harvard.edu>
> Subject: Re: [Wiki-research-l] Wikipedia dumps downloader
>
> Thank you, Emijrp!
>
> What about the dump of Commons images?  [for those with 10TB to spare]
>
> SJ
>
> On Sun, Jun 26, 2011 at 8:53 AM, emijrp <emijrp(a)gmail.com> wrote:
>> Hi all;
>>
>> Can you imagine a day when Wikipedia is added to this list?[1]
>>
>> WikiTeam has developed a script[2] to download all the Wikipedia dumps
>> (and its sister projects) from dumps.wikimedia.org. It sorts the files
>> into folders and checks md5sums. It only works on Linux (it uses wget).
>>
>> You will need about 100GB to download all the 7z files.
>>
>> Save our memory.
>>
>> Regards,
>> emijrp
>>
>> [1] http://en.wikipedia.org/wiki/Destruction_of_libraries
>> [2]
>> http://code.google.com/p/wikiteam/source/browse/trunk/wikipediadownloader.py
>>
>> _______________________________________________
>> Wiki-research-l mailing list
>> Wiki-research-l(a)lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>>
>>
>
>
>
> --
> Samuel Klein          identi.ca:sj          w:user:sj          +1 617 529 4266
>
>
>
> ------------------------------
>
> Message: 5
> Date: Mon, 27 Jun 2011 13:07:51 +0200
> From: emijrp <emijrp(a)gmail.com>
> Subject: Re: [Wiki-research-l] [Xmldatadumps-l] Wikipedia dumps downloader
> To: Richard Farmbrough <richard(a)farmbrough.co.uk>
> Cc: xmldatadumps-l(a)lists.wikimedia.org,
>     wikiteam-discuss(a)googlegroups.com,
>     Wikimedia Foundation Mailing List <foundation-l(a)lists.wikimedia.org>,
>     Research into Wikimedia content and communities
>     <wiki-research-l(a)lists.wikimedia.org>
> Message-ID: <BANLkTim9bTwCb75qOE4Cm935SK+3SSh35Q(a)mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi Richard;
>
> Yes, a distributed project would probably be the best solution, but it is
> not easy to develop unless you use a library like BitTorrent (or similar)
> and you have many peers. Most people don't seed the files for long, though,
> so sometimes it is better to depend on a few committed people than on a
> big but ephemeral crowd.
>
> Regards,
> emijrp
>
> 2011/6/26 Richard Farmbrough <richard(a)farmbrough.co.uk>
>
>> **
>> It would be useful to have an archive of archives. I have to delete my
>> old data dumps as time passes, for space reasons; however, a team could,
>> between them, maintain multiple copies of every data dump. This would make
>> a nice distributed project.
>>
>> On 26/06/2011 13:53, emijrp wrote:
>>
>> Hi all;
>>
>> Can you imagine a day when Wikipedia is added to this list?[1]
>>
>> WikiTeam has developed a script[2] to download all the Wikipedia dumps
>> (and its sister projects) from dumps.wikimedia.org. It sorts the files
>> into folders and checks md5sums. It only works on Linux (it uses wget).
>>
>> You will need about 100GB to download all the 7z files.
>>
>> Save our memory.
>>
>> Regards,
>> emijrp
>>
>> [1] http://en.wikipedia.org/wiki/Destruction_of_libraries
>> [2]
>> http://code.google.com/p/wikiteam/source/browse/trunk/wikipediadownloader.py
>>
>>
>> _______________________________________________
>> Xmldatadumps-l mailing list
>> Xmldatadumps-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
>>
>>
>>
>
2011/6/28 Platonides <platonides(a)gmail.com>
> emijrp wrote:
>
>> Hi;
>>
>> @Derrick: I don't trust Amazon.
>>
>
> I disagree. Note that we only need them to keep a redundant copy of a file.
> If they tried to tamper with the file, we could detect it with the hashes
> (which should be properly secured; that's no problem).
>
>
I didn't mean security problems. I meant files simply deleted under weird
terms of service. Commons hosts a lot of images that can be problematic, like
nudes or material that is copyrighted in some jurisdictions. They can delete
whatever they want and close any account they want, and we would lose the
backups. Period.
And we don't just need to keep one copy of every file. We need several copies
everywhere, not only in the Amazon coolcloud.
> I'd like to have hashes of the XML dumps' content instead of the compressed
> files, though, so they could easily be stored with better compression
> without weakening the integrity check.
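(For what it's worth, that check is cheap to do in a streaming way; a rough
Python sketch, with an illustrative file name:)

  import bz2, hashlib

  md5 = hashlib.md5()
  # hash the decompressed XML stream, so the archive can later be
  # recompressed (bz2, 7z, ...) without invalidating the published checksum
  with bz2.open("enwiki-pages-meta-history.xml.bz2", "rb") as dump:
      for chunk in iter(lambda: dump.read(1 << 20), b""):
          md5.update(chunk)
  print(md5.hexdigest())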
>
>
>> Really, I don't trust the Wikimedia Foundation either. They can't and/or
>> don't want to provide image dumps (which is worse?).
>>
>
> Wikimedia Foundation has provided image dumps several times in the past,
> and also rsync3 access to some individuals so that they could clone it.
>
Ah, OK, so that is enough (?). Then you are OK with old and broken XML dumps,
because people can slurp all the pages using an API scraper.
> It's like the enwiki history dump. An image dump is complex, and even less
> useful.
>
>
It is not complex, just resource-consuming. If they need to buy another 10
TB of space and more CPU, they can. $16M were donated last year. They just
need to put resources into the relevant stuff. WMF always says "we host the
5th website in the world"; I say they need to act like it.
Less useful? I hope they never again need such a "useless" dump to recover
images, as happened in the past.
>
>> Community donates images to Commons, community donates money every year,
>> and now the community needs to develop software to extract all the images
>> and pack them,
>>
>
> There's no *need* for that. In fact, such a script would be trivial to run
> from the Toolserver.
Ah, OK, so only people with a Toolserver account may have access to an image
dump. And you say it is trivial from the Toolserver yet very complex from the
main Wikimedia servers.
>> and, of course, host them in a permanent way. Crazy, right?
>>
>
> WMF also tries hard to not lose images.
I hope so, but we remember a case of lost images.
> We want to provide some redundancy on our own. That's perfectly fine, but
> it's not a requirement.
That _is_ a requirement. We can't trust the Wikimedia Foundation. They lost
images. They have problems generating the English Wikipedia dumps and image
dumps. They had a hardware failure some months ago in the RAID that hosts the
XML dumps, and they didn't offer those dumps for months while trying to fix
the crash.
> Consider that WMF could be automatically deleting page history older than a
> month, or images not used on any article. *That* would be a real problem.
>
>
You just don't understand how dangerous the current situation is (and it was
worse in the past).
>
>> @Milos: Instead of splitting the image dump by the first letter of the
>> filenames, I thought about splitting it by upload date (YYYY-MM-DD). So
>> the first chunks (2005-01-01) will be tiny, and the recent ones several
>> GB (a single day).
>>
>> Regards,
>> emijrp
>>
>
> I like that idea, since it means the dumps are static. They could be placed
> on tape inside a safe and would not need to be taken out unless data loss
> arises.
>
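(To illustrate the date-based split with a toy example: the file list below
is made up, but in practice it could come from the API's list=allimages with
aiprop=timestamp.)

  from collections import defaultdict

  # (filename, upload timestamp) pairs; illustrative data only
  files = [
      ("Example_A.jpg", "2005-01-01T12:34:56Z"),
      ("Example_B.png", "2011-06-28T08:00:00Z"),
      ("Example_C.svg", "2011-06-28T09:30:00Z"),
  ]

  chunks = defaultdict(list)
  for name, timestamp in files:
      chunks[timestamp[:10]].append(name)    # bucket by YYYY-MM-DD upload day

  for day in sorted(chunks):
      # old chunks never change once written; only the newest day keeps growing
      print(day, len(chunks[day]), "file(s)")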
Everyone,
I wanted to let everyone know that we just announced the first of two data
challenges this year. The Wikipedia Participation Challenge is a data
modeling competition where contestants are tasked with developing an
algorithm that predicts future editing activity of editors on the English
Wikipedia. More details may be found on the blog post:
http://blog.wikimedia.org/2011/06/28/data-competition-announcing-the-wikipe…
We’re very excited about this competition. It starts today and runs until
September 20, 2011. More information may be found here:
http://www.kaggle.com/c/wikichallenge
(Kaggle is a company that hosts online data competitions. They are helping
us manage this competition on a pro bono basis.)
Just like Wikipedia, it’s open to anyone who wants to participate, so please
spread the word!
Howie
Workshop: Wikipedia & Research: The innovative character of Wikipedia research and the new challenges (and opportunities) associated with it
Workshop at the Open Knowledge Conference: June 30th, at 14:00 in Workshop, Kalkscheune, Johannisstr. 2, 10117 Berlin, Germany
Further information: http://okcon.org/2011/programme/wikipedia-research-the-innovative-character…
Contact: mayo.fuster(at)eui.eu
In 2011, Wikipedia celebrated its tenth anniversary as one of the world’s ten most visited websites and one of the most active communities on the web. Particularly since 2005, there has been increasing interest within the scientific community in researching Wikipedia. A recent review of the Wikipedia literature found 2,100 peer-reviewed articles and 38 doctoral theses related to Wikipedia (http://en.wikipedia.org/wiki/Wikipedia:Academic_studies_of_Wikipedia). Quantitative analysis of large data sets, centred on the English version of Wikipedia, was the predominant approach in early empirical research on Wikipedia. The focus then expanded to other language versions, covering a larger variety of issues, such as socio-political questions, and also adopting qualitative methods. In the process, research on Wikipedia has come to constitute a substantial body of work in itself, one that allows researchers (and communities) to understand better and more critically how Wikimedia projects function, from a plurality of perspectives, and to advance our knowledge on issues that go beyond Wikipedia itself. Research, in a sense (and under certain conditions), is becoming a way of contributing to the Wikimedia movement. Furthermore, the community of (more or less committed) researchers on Wikipedia is growing, together with the willingness to collaborate, the synergy between research initiatives of various kinds, and the willingness to continue innovating (in what is already one of the leading nodes of methodological innovation); a Wikimedia research “informational common” is growing, alongside increasing promotion of research by the Wikimedia Foundation (such as the creation of the Research Committee) and the Wikimedia chapters (such as the surveys carried out by Amical Viquipedia or German Wikimedia's participation in the Render project).
But new problems have also emerged, such as information overload, a lack of coordination between the various research efforts, and tensions between community members and certain researchers’ needs (for example over subject recruitment, or over researchers’ publication policies and the need to maintain their positions in academia). In sum, Wikipedia research has increased substantially, and in the process has become an important area for experimentation and research innovation, but it also faces new challenges associated with this progression.
The workshop will focus on the state of Wikipedia research and, more generally, commons-based peer production (focused less on the content than on the methodologies and the research process itself), and on the innovations, problems and new insights regarding (action) research on commons-based peer production. The workshop is organized in collaboration between the Research Committee of the Wikimedia Foundation, German Wikimedia and Amical Viquipedia (Catalan Wikimedia). It will consist of a set of brief presentations (including Mayo Fuster Morell, member of the Research Committee of the Wikimedia Foundation and of Amical Viquipedia; Daniel Mietchen, member of the Research Committee of the Wikimedia Foundation; Mathias Schindler, from German Wikimedia and the Render project; and Benjamin Mako Hill, Wikimedia Foundation Advisory Board; among others) and “networking” discussions towards action.
Presenter bios:
Mayo Fuster Morell is currently a postdoctoral researcher at the Institute of Government and Public Policies (Autonomous University of Barcelona) and a visiting scholar at the Internet Interdisciplinary Institute (Open University of Catalonia). She has been appointed a Berkman Center for Internet & Society fellow for the academic year 2011-2012. She collaborates on research projects on Wikimedia/Wikipedia with Sciences Po and Barcelona Media. She is a member of the Research Committee of the Wikimedia Foundation and of the Association Amical Viquipedia (User: Lilaroja). She is a promoter of the international forum of collaborative communities for the building of digital commons. She was a co-founder of the International Forum on Free Culture and organized its first two editions (2009 & 2010). Additionally, she promoted the Networked Politics collaborative research and developed techno-political tools within the frame of the World Social Forum. She wrote her PhD thesis at the European University Institute on “The governance of online creation communities: Provision of infrastructure for the building of digital commons”. She co-wrote the books Rethinking Political Organisation in an Age of Movements and Networks (2007), Activist Research and Social Movements (in Spanish, 2005), and Guide for Social Transformation of Catalonia (in Catalan, 2003).
Daniel Mietchen (User:Mietchen) is a biophysicist by training and currently a postdoc in brain morphometry at the University of Jena, Germany. He has a general interest in integrating collaborative activities in wikis and similar environments with scholarly workflows in the framework of open science, particularly with original research, encyclopaedic knowledge, open access publishing, reputation systems and scientific networking as well as teaching and outreach. His home wikis are Citizendium and OpenWetWare, and he also contributes to a number of other wiki communities, including several Wikimedia wikis, Encyclopedia of Earth, Scholarpedia and WikiEducator.
Mathias Schindler co-founded Wikimedia Deutschland e.V. He is a member of the Communications Committee of the Wikimedia Foundation and a project manager in the German chapter. After studying in Frankfurt/Main, Germany, he worked at the German National Library in the office for authority files. He was a co-organizer of the Social Web and Knowledge Management Workshop (SWKM 2008) in Beijing, China, co-located with the WWW conference. He was on the organizing committee for the Wikimania conference in 2005, 2007 and 2009. His research interests include Wikipedia-style massive collaboration and bibliographic metadata.
Benjamin Mako Hill (born December 2, 1980) is a Debian hacker, intellectual property researcher, activist and author. He is a contributor and free software developer in the Debian and Ubuntu projects, as well as the author of two best-selling technical books on the subject, Debian GNU/Linux 3.1 Bible (ISBN 978-0-7645-7644-7) and The Official Ubuntu Book (ISBN 978-0-13-243594-9). He currently serves as a member of the Free Software Foundation board of directors. Hill has a master's degree from the MIT Media Lab and is currently a Senior Researcher at the MIT Sloan School of Management, where he studies free software communities and business models. He is also a Fellow at the MIT Center for Future Civic Media, where he coordinates the development of software for civic organizing, and works as an advisor and contractor for the One Laptop per Child project. He is a speaker for the GNU Project and serves on the board of Software Freedom International (the organization that organizes Software Freedom Day).
«·´`·.(*·.¸(`·.¸ ¸.·´)¸.·*).·´`·»
«·´¨*·¸¸« Mayo Fuster Morell ».¸.·*¨`·»
«·´`·.(¸.·´(¸.·* *·.¸)`·.¸).·´`·»
Research Digital Commons Governance: http://www.onlinecreation.info
Ph.D European University Institute
Postdoctoral Researcher. Institute of Government and Public Policies. Autonomous University of Barcelona.
Visiting scholar. Internet Interdisciplinary Institute. Open University of Catalonia (UOC).
Visiting researcher (2008). School of information. University of California, Berkeley.
Member Research Committee. Wikimedia Foundation
http://www.onlinecreation.info
E-mail: mayo.fuster(a)eui.eu
Skype: mayoneti
Phone Spanish State: 0034-648877748
Can you share your script with us?
2011/6/27 Platonides <platonides(a)gmail.com>
> emijrp wrote:
>
>> Hi SJ;
>>
>> You know that that is an old item in our TODO list ; )
>>
>> I heard that Platonides developed a script for that task a long time ago.
>>
>> Platonides, are you there?
>>
>> Regards,
>> emijrp
>>
>
> Yes, I am. :)
>
>