Pursuant to prior discussions about the need for a research
policy on Wikipedia, WikiProject Research is drafting a
policy regarding the recruitment of Wikipedia users to
participate in studies.
At this time, we have a proposed policy, and an accompanying
group that would facilitate recruitment of subjects in much
the same way that the Bot Approvals Group approves bots.
The policy proposal can be found at:
http://en.wikipedia.org/wiki/Wikipedia:Research
The Subject Recruitment Approvals Group mentioned in the proposal
is being described at:
http://en.wikipedia.org/wiki/Wikipedia:Subject_Recruitment_Approvals_Group
Before we move forward with seeking approval from the Wikipedia
community, we would like additional input about the proposal,
and would welcome additional help improving it.
Also, please consider participating in WikiProject Research at:
http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Research
--
Bryan Song
GroupLens Research
University of Minnesota
I've been looking to experiment with node.js lately and created a
little toy webapp that displays updates from the major language
wikipedias in real time:
http://wikistream.inkdroid.org
Perhaps like you, I've often tried to convey to folks in the GLAM
sector (Galleries, Libraries, Archives and Museums) just how actively
Wikipedia is edited. GLAM institutions are increasingly interested in
"digital curation", and I've sometimes displayed the IRC activity at
workshops to demonstrate the sheer number of people (and bots) actively
engaged in improving the content there... in the hope of making the
Wikipedia platform part of their curation strategy.
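(For anyone who wants to play with the same firehose: below is a minimal
Python sketch of reading the recent-changes feed, assuming the public
Wikimedia IRC feed at irc.wikimedia.org and channel names like
"#en.wikipedia". wikistream itself is node.js, so this is only an
illustration, not its actual code; the nickname is a placeholder.)

  import socket

  HOST, PORT = "irc.wikimedia.org", 6667                # assumed feed location
  CHANNEL, NICK = "#en.wikipedia", "rc-sketch-12345"    # placeholders

  sock = socket.create_connection((HOST, PORT))
  sock.sendall(("NICK %s\r\nUSER %s 0 * :%s\r\nJOIN %s\r\n"
                % (NICK, NICK, NICK, CHANNEL)).encode())

  buf = b""
  while True:
      buf += sock.recv(4096)
      while b"\r\n" in buf:
          line, buf = buf.split(b"\r\n", 1)
          text = line.decode("utf-8", "replace")
          if text.startswith("PING"):
              # answer server keepalives or we get disconnected
              sock.sendall(("PONG" + text[4:] + "\r\n").encode())
          elif " PRIVMSG " in text:
              # every PRIVMSG on the channel is one edit event
              print(text.split(" PRIVMSG ", 1)[1])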
Anyhow, I'd be interested in any feedback you might have about wikistream.
//Ed
Hi. I'm forwarding this e-mail; I hope some people are interested in this map.
---------- Forwarded message ----------
From: emijrp <emijrp(a)gmail.com>
Date: 2011/6/11
Subject: Wikis around Europe!
To: wikiteam-discuss(a)googlegroups.com
Hi all;
A friend of mine has sent me this link about wikis (locapedias) around
Europe.[1] I'm very surprised by the huge number of wikis available.
Time to archive all of them.[2] I have been working on the Spanish ones. If
you want to help archive a country, please reply to this message so we can
coordinate. If not, I will try to archive all of Europe!
Regards,
emijrp
[1]
http://maps.google.com/maps/ms?ie=UTF8&t=h&msa=0&msid=115570622864617231547…
[2] http://code.google.com/p/wikiteam/
As a Commons admin I've thought a lot about the problem of
distributing Commons dumps. For distribution, I believe BitTorrent
is absolutely the way to go, but the torrent will require a small
network of dedicated permaseeds (servers that seed indefinitely).
These can easily be set up at low cost on Amazon EC2 "small" instances
- the disk storage for the archives is free, since small instances
include a large (~120 GB) ephemeral storage volume at no additional
cost, and the cost of bandwidth can be controlled by configuring the
BitTorrent client with either a bandwidth throttle or a transfer cap
(or both). In fact, I think all Wikimedia dumps should be available
through such a distribution solution, just as all Linux installation
media are today.
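For a rough sense of the cost, here is a back-of-the-envelope calculation in
Python; the throttle value and the per-GiB egress price are illustrative
assumptions, not quoted AWS figures:

  # Monthly egress of a permaseed at a fixed upload throttle (assumed values).
  throttle_kib_per_s = 500                 # sustained upload throttle, KiB/s (assumption)
  seconds_per_month = 30 * 24 * 3600
  gib_per_month = throttle_kib_per_s * 1024 * seconds_per_month / 1024**3
  assumed_price_per_gib = 0.10             # USD per GiB, placeholder; check current pricing
  print("~%.0f GiB/month uploaded, ~$%.0f/month in bandwidth"
        % (gib_per_month, gib_per_month * assumed_price_per_gib))

Turning the throttle down, or setting a hard transfer cap in the client,
scales that figure linearly.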
Additionally, it will be necessary to construct (and maintain) useful
subsets of Commons media, such as "all media used on the English
Wikipedia" or "thumbnails of all images on Wikimedia Commons", which are
of particular interest to certain content reusers, since the full set is
far too large for most reusers. It's on this latter
point that I want your feedback: what useful subsets of Wikimedia
Commons does the research community want? Thanks for your feedback.
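As one concrete starting point, a rough Python sketch that walks the full
Commons file list through the standard MediaWiki API (list=allimages) is
below; it only illustrates the pagination pattern a subset-builder would
use, and the User-Agent string is a placeholder:

  import json, urllib.parse, urllib.request

  API = "https://commons.wikimedia.org/w/api.php"
  params = {"action": "query", "list": "allimages", "aiprop": "url",
            "ailimit": "500", "format": "json"}

  while True:
      req = urllib.request.Request(API + "?" + urllib.parse.urlencode(params),
                                   headers={"User-Agent": "commons-subset-sketch/0.1"})
      with urllib.request.urlopen(req) as resp:
          data = json.load(resp)
      for image in data["query"]["allimages"]:
          print(image["url"])                # original file URL; filter or fetch as needed
      if "continue" not in data:             # no continuation token means we are done
          break
      params.update(data["continue"])        # follow the API's continuation parameters

A subset like "all media used on the English Wikipedia" would need the
image-usage data instead, but the walk-and-continue pattern is the same.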
--
Derrick Coetzee
User:Dcoetzee, English Wikipedia and Wikimedia Commons administrator
http://www.eecs.berkeley.edu/~dcoetzee/
On Mon, Jun 27, 2011 at 6:49 AM,
<wiki-research-l-request(a)lists.wikimedia.org> wrote:
> Date: Mon, 27 Jun 2011 06:18:31 -0400
> From: Samuel Klein <sjklein(a)hcs.harvard.edu>
> Subject: Re: [Wiki-research-l] Wikipedia dumps downloader
>
> Thank you, Emijrp!
>
> What about the dump of Commons images?  [for those with 10TB to spare]
>
> SJ
>
> On Sun, Jun 26, 2011 at 8:53 AM, emijrp <emijrp(a)gmail.com> wrote:
>> Hi all;
>>
>> Can you imagine a day when Wikipedia is added to this list?[1]
>>
>> WikiTeam has developed a script[2] to download all the Wikipedia dumps
>> (and its sister projects) from dumps.wikimedia.org. It sorts the files
>> into folders and checks md5sums. It only works on Linux (it uses wget).
>>
>> You will need about 100GB to download all the 7z files.
>>
>> Save our memory.
>>
>> Regards,
>> emijrp
>>
>> [1] http://en.wikipedia.org/wiki/Destruction_of_libraries
>> [2]
>> http://code.google.com/p/wikiteam/source/browse/trunk/wikipediadownloader.py
>>
>> _______________________________________________
>> Wiki-research-l mailing list
>> Wiki-research-l(a)lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>>
>>
>
>
>
> --
> Samuel Klein          identi.ca:sj          w:user:sj          +1 617 529 4266
>
>
>
> ------------------------------
>
> Message: 5
> Date: Mon, 27 Jun 2011 13:07:51 +0200
> From: emijrp <emijrp(a)gmail.com>
> Subject: Re: [Wiki-research-l] [Xmldatadumps-l] Wikipedia dumps downloader
> To: Richard Farmbrough <richard(a)farmbrough.co.uk>
> Cc: xmldatadumps-l(a)lists.wikimedia.org,
>     wikiteam-discuss(a)googlegroups.com,
>     Wikimedia Foundation Mailing List <foundation-l(a)lists.wikimedia.org>,
>     Research into Wikimedia content and communities
>     <wiki-research-l(a)lists.wikimedia.org>
> Message-ID: <BANLkTim9bTwCb75qOE4Cm935SK+3SSh35Q(a)mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi Richard;
>
> Yes, a distributed project would probably be the best solution, but it is
> not easy to develop unless you use a library like BitTorrent (or similar)
> and you have many peers. Most people don't seed the files for long, though,
> so sometimes it is better to depend on a few committed people than on a
> big but ephemeral crowd.
>
> Regards,
> emijrp
>
> 2011/6/26 Richard Farmbrough <richard(a)farmbrough.co.uk>
>
>> **
>> It would be useful to have an archive of archives. I have to delete my
>> old data dumps as time passes, for space reasons; however, a team could,
>> between them, maintain multiple copies of every data dump. This would make
>> a nice distributed project.
>>
>> On 26/06/2011 13:53, emijrp wrote:
>>
>> Hi all;
>>
>> Can you imagine a day when Wikipedia is added to this list?[1]
>>
>> WikiTeam has developed a script[2] to download all the Wikipedia dumps
>> (and its sister projects) from dumps.wikimedia.org. It sorts the files
>> into folders and checks md5sums. It only works on Linux (it uses wget).
>>
>> You will need about 100GB to download all the 7z files.
>>
>> Save our memory.
>>
>> Regards,
>> emijrp
>>
>> [1] http://en.wikipedia.org/wiki/Destruction_of_libraries
>> [2]
>> http://code.google.com/p/wikiteam/source/browse/trunk/wikipediadownloader.py
>>
>>
>> _______________________________________________
>> Xmldatadumps-l mailing list
>> Xmldatadumps-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
>>
>>
>>
>
2011/6/28 Platonides <platonides(a)gmail.com>
> emijrp wrote:
>
>> Hi;
>>
>> @Derrick: I don't trust Amazon.
>>
>
> I disagree. Note that we only need them to keep a redundant copy of a file.
> If they tried to tamper with the file, we could detect it with the hashes
> (which should be properly secured; that's no problem).
>
>
I didn't mean security problems. I meant files simply deleted under weird
terms of service. Commons hosts a lot of images that can be problematic, like
nudes or material that is copyrighted in some jurisdictions. They can delete
whatever they want and close any account they want, and we would lose the
backups. Period.
And we don't just need to keep one copy of every file. We need several copies
everywhere, not only in the Amazon coolcloud.
> I'd like to have hashes of the XML dumps' content instead of the compressed
> files, though, so they could easily be stored with better compression
> without weakening the integrity check.
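(For what it's worth, that check is cheap to do in a streaming way; a rough
Python sketch, with an illustrative file name:)

  import bz2, hashlib

  md5 = hashlib.md5()
  # hash the decompressed XML stream, so the archive can later be
  # recompressed (bz2, 7z, ...) without invalidating the published checksum
  with bz2.open("enwiki-pages-meta-history.xml.bz2", "rb") as dump:
      for chunk in iter(lambda: dump.read(1 << 20), b""):
          md5.update(chunk)
  print(md5.hexdigest())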
>
>
>> Really, I don't trust the Wikimedia Foundation either. They can't and/or
>> don't want to provide image dumps (which is worse?).
>>
>
> Wikimedia Foundation has provided image dumps several times in the past,
> and also rsync3 access to some individuals so that they could clone it.
>
Ah, OK, so that is enough (?). Then you are OK with old and broken XML dumps,
because people can slurp all the pages using an API scraper.
> It's like the enwiki history dump. An image dump is complex, and even less
> useful.
>
>
It is not complex, just resource-consuming. If they need to buy another 10
TB of space and more CPU, they can. $16M were donated last year. They just
need to put resources into the relevant stuff. WMF always says "we host the
5th website in the world"; I say they need to act like it.
Less useful? I hope they never again need such a "useless" dump to recover
images, as happened in the past.
>
>> Community donates images to Commons, community donates money every year,
>> and now the community needs to develop software to extract all the images
>> and pack them,
>>
>
> There's no *need* for that. In fact, such a script would be trivial to run
> from the Toolserver.
Ah, OK, so only people with a Toolserver account may have access to an image
dump. And you say it is trivial from the Toolserver yet very complex from the
main Wikimedia servers.
>> and, of course, host them in a permanent way. Crazy, right?
>>
>
> WMF also tries hard to not lose images.
I hope so, but we remember a case of lost images.
> We want to provide some redundancy on our own. That's perfectly fine, but
> it's not a requirement.
That _is_ a requirement. We can't trust the Wikimedia Foundation. They lost
images. They have problems generating the English Wikipedia dumps and image
dumps. They had a hardware failure some months ago in the RAID that hosts the
XML dumps, and they didn't offer those dumps for months while trying to fix
the crash.
> Consider that WMF could be automatically deleting page history older than a
> month, or images not used on any article. *That* would be a real problem.
>
>
You just don't understand how dangerous the current situation is (and it was
worse in the past).
>
>> @Milos: Instead of splitting the image dump by the first letter of the
>> filenames, I thought about splitting it by upload date (YYYY-MM-DD). So
>> the first chunks (2005-01-01) will be tiny, and the recent ones several
>> GB (a single day).
>>
>> Regards,
>> emijrp
>>
>
> I like that idea, since it means the dumps are static. They could be placed
> on tape inside a safe and would not need to be taken out unless data loss
> arises.
>
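(To illustrate the date-based split with a toy example: the file list below
is made up, but in practice it could come from the API's list=allimages with
aiprop=timestamp.)

  from collections import defaultdict

  # (filename, upload timestamp) pairs; illustrative data only
  files = [
      ("Example_A.jpg", "2005-01-01T12:34:56Z"),
      ("Example_B.png", "2011-06-28T08:00:00Z"),
      ("Example_C.svg", "2011-06-28T09:30:00Z"),
  ]

  chunks = defaultdict(list)
  for name, timestamp in files:
      chunks[timestamp[:10]].append(name)    # bucket by YYYY-MM-DD upload day

  for day in sorted(chunks):
      # old chunks never change once written; only the newest day keeps growing
      print(day, len(chunks[day]), "file(s)")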
Everyone,
I wanted to let everyone know that we just announced the first of two data
challenges this year. The Wikipedia Participation Challenge is a data
modeling competition where contestants are tasked with developing an
algorithm that predicts future editing activity of editors on the English
Wikipedia. More details may be found on the blog post:
http://blog.wikimedia.org/2011/06/28/data-competition-announcing-the-wikipe…
We’re very excited about this competition. It starts today and runs until
September 20, 2011. More information may be found here:
http://www.kaggle.com/c/wikichallenge
(Kaggle is a company that hosts online data competitions. They are helping
us manage this competition on a pro bono basis.)
Just like Wikipedia, it’s open to anyone who wants to participate, so please
spread the word!
Howie
Workshop: Wikipedia & Research: The innovative character of Wikipedia research and the new challenges (and opportunities) associated with it
Workshop at the Open Knowledge Conference: June 30th, at 14:00 in Workshop, Kalkscheune, Johannisstr. 2, 10117 Berlin, Germany
Further information: http://okcon.org/2011/programme/wikipedia-research-the-innovative-character…
Contact: mayo.fuster(at)eui.eu
In 2011, Wikipedia celebrated its tenth anniversary as one of the world’s ten most visited websites and one of the most active communities on the web. Particularly since 2005, there has been increasing interest within the scientific community in researching Wikipedia. A recent review of the Wikipedia literature found 2,100 peer-reviewed articles and 38 doctoral theses related to Wikipedia (http://en.wikipedia.org/wiki/Wikipedia:Academic_studies_of_Wikipedia). Quantitative analysis of large data sets, centred on the English version of Wikipedia, was the predominant approach in early empirical research on Wikipedia. The focus then expanded to other language versions, covering a larger variety of issues, such as socio-political questions, and also adopting qualitative methods. In the process, research on Wikipedia has come to constitute a substantial body of work in itself, one that allows researchers (and communities) to understand better and more critically how Wikimedia projects function, from a plurality of perspectives, and to advance our knowledge on issues that go beyond Wikipedia itself. Research, in a sense (and under certain conditions), is becoming a way of contributing to the Wikimedia movement. Furthermore, the community of (more or less committed) researchers on Wikipedia is growing, together with the willingness to collaborate, the synergy between research initiatives of various kinds, and the willingness to continue innovating (in what is already one of the leading nodes of methodological innovation); a Wikimedia research “informational common” is growing, alongside increasing promotion of research by the Wikimedia Foundation (such as the creation of the Research Committee) and the Wikimedia chapters (such as the surveys carried out by Amical Viquipedia or German Wikimedia's participation in the Render project).
But new problems have also emerged, such as information overload, a lack of coordination between the various research efforts, and tensions between community members and certain researchers’ needs (for example over subject recruitment, or over researchers’ publication policies and the need to maintain their positions in academia). In sum, Wikipedia research has increased substantially, and in the process has become an important area for experimentation and research innovation, but it also faces new challenges associated with this progression.
The workshop will focus on the state of Wikipedia research and, more generally, commons-based peer production (focused less on the content than on the methodologies and the research process itself), and on the innovations, problems and new insights regarding (action) research on commons-based peer production. The workshop is organized in collaboration between the Research Committee of the Wikimedia Foundation, German Wikimedia and Amical Viquipedia (Catalan Wikimedia). It will consist of a set of brief presentations (including Mayo Fuster Morell, member of the Research Committee of the Wikimedia Foundation and of Amical Viquipedia; Daniel Mietchen, member of the Research Committee of the Wikimedia Foundation; Mathias Schindler, from German Wikimedia and the Render project; and Benjamin Mako Hill, Wikimedia Foundation Advisory Board; among others) and “networking” discussions towards action.
Presenter bios:
Mayo Fuster Morell is currently a postdoctoral researcher at the Institute of Government and Public Policies (Autonomous University of Barcelona) and a visiting scholar at the Internet Interdisciplinary Institute (Open University of Catalonia). She has been appointed a Berkman Center for Internet & Society fellow for the academic year 2011-2012. She collaborates on research projects on Wikimedia/Wikipedia with Sciences Po and Barcelona Media. She is a member of the Research Committee of the Wikimedia Foundation and of the Association Amical Viquipedia (User: Lilaroja). She is a promoter of the international forum of collaborative communities for the building of digital commons. She was a co-founder of the International Forum on Free Culture and organized its first two editions (2009 & 2010). Additionally, she promoted the Networked Politics collaborative research and developed techno-political tools within the frame of the World Social Forum. She wrote her PhD thesis at the European University Institute on “The governance of online creation communities: Provision of infrastructure for the building of digital commons”. She co-wrote the books Rethinking Political Organisation in an Age of Movements and Networks (2007), Activist Research and Social Movements (in Spanish, 2005), and Guide for Social Transformation of Catalonia (in Catalan, 2003).
Daniel Mietchen (User:Mietchen) is a biophysicist by training and currently a postdoc in brain morphometry at the University of Jena, Germany. He has a general interest in integrating collaborative activities in wikis and similar environments with scholarly workflows in the framework of open science, particularly with original research, encyclopaedic knowledge, open access publishing, reputation systems and scientific networking as well as teaching and outreach. His home wikis are Citizendium and OpenWetWare, and he also contributes to a number of other wiki communities, including several Wikimedia wikis, Encyclopedia of Earth, Scholarpedia and WikiEducator.
Mathias Schindler co-founded Wikimedia Deutschland e.V. He is a member of the Communications Committee of the Wikimedia Foundation and a project manager in the German chapter. After studying in Frankfurt/Main, Germany, he worked at the German National Library in the office for authority files. He was a co-organizer of the Social Web and Knowledge Management Workshop (SWKM 2008) in Beijing, China, co-located with the WWW conference. He was on the organizing committee for the Wikimania conference in 2005, 2007 and 2009. His research interests include Wikipedia-style massive collaboration and bibliographic metadata.
Benjamin Mako Hill (born December 2, 1980) is a Debian hacker, intellectual property researcher, activist and author. He is a contributor and free software developer in the Debian and Ubuntu projects, as well as the author of two best-selling technical books on the subject, Debian GNU/Linux 3.1 Bible (ISBN 978-0-7645-7644-7) and The Official Ubuntu Book (ISBN 978-0-13-243594-9). He currently serves as a member of the Free Software Foundation board of directors. Hill has a master's degree from the MIT Media Lab and is currently a Senior Researcher at the MIT Sloan School of Management, where he studies free software communities and business models. He is also a Fellow at the MIT Center for Future Civic Media, where he coordinates the development of software for civic organizing, and works as an advisor and contractor for the One Laptop per Child project. He is a speaker for the GNU Project and serves on the board of Software Freedom International (the organization that organizes Software Freedom Day).
«·´`·.(*·.¸(`·.¸ ¸.·´)¸.·*).·´`·»
«·´¨*·¸¸« Mayo Fuster Morell ».¸.·*¨`·»
«·´`·.(¸.·´(¸.·* *·.¸)`·.¸).·´`·»
Research Digital Commons Governance: http://www.onlinecreation.info
Ph.D European University Institute
Postdoctoral Researcher. Institute of Government and Public Policies. Autonomous University of Barcelona.
Visiting scholar. Internet Interdisciplinary Institute. Open University of Catalonia (UOC).
Visiting researcher (2008). School of information. University of California, Berkeley.
Member Research Committee. Wikimedia Foundation
http://www.onlinecreation.info
E-mail: mayo.fuster(a)eui.eu
Skype: mayoneti
Phone Spanish State: 0034-648877748
Can you share your script with us?
2011/6/27 Platonides <platonides(a)gmail.com>
> emijrp wrote:
>
>> Hi SJ;
>>
>> You know that that is an old item in our TODO list ; )
>>
>> I heard that Platonides developed a script for that task a long time ago.
>>
>> Platonides, are you there?
>>
>> Regards,
>> emijrp
>>
>
> Yes, I am. :)
>
>