Please join the Summer of Research fellows (http://meta.wikimedia.org/wiki/Research:WSOR11) as we present the results of our research on the causes, effects, characteristics, tools, and visualizations of the editor decline.
Test your knowledge of Wikipedia against some of the Summer of Research results by taking our quiz (https://docs.google.com/spreadsheet/viewform?formkey=dFdscUFfY0dhdUN5eGkwUT…). Submit the quiz before the brown bag on Thursday, August 25th, from 2:00-3:30pm PST. We'll share our research and the answers to the quiz. Prizes will be given for the highest grade, but you must attend (in SF or remotely) to receive your prize.
**REMOTE call-in instructions**
Topic: Summer of Research Wrap-up
Date: Thursday, August 25, 2011
Time: 2:00 pm, Pacific Daylight Time (San Francisco, GMT-07:00)
Meeting Number/Access Code: 801 468 325
Meeting Password: (This meeting does not require a password.)
-------------------------------------------------------
To join the online meeting (Now from mobile devices!)
-------------------------------------------------------
1. Go to https://wikimedia.webex.com/wikimedia/j.php?ED=159146047&UID=1207190257&RT=…
2. If requested, enter your name and email address.
3. If a password is required, enter the meeting password: (This meeting does not require a password.)
4. Click "Join".
To view in other time zones or languages, please click the link:
https://wikimedia.webex.com/wikimedia/j.php?ED=159146047&UID=1207190257&ORT…
-------------------------------------------------------
To join the audio conference only
-------------------------------------------------------
To receive a call back, provide your phone number when you join the meeting, or call the number below and enter the access code.
Call-in toll-free number (US/Canada): 1-877-669-3239
Call-in toll number (US/Canada): +1-408-600-3600
Global call-in numbers: https://wikimedia.webex.com/wikimedia/globalcallin.php?serviceType=MC&ED=15…
Toll-free dialing restrictions: http://www.webex.com/pdf/tollfree_restrictions.pdf
I'm doing some analysis on the Wikipedia image metadata and seeing some
missing image rows in the SQL dumps.
I downloaded
enwiki-latest-image.sql, enwiki-latest-imagelinks.sql,
and enwiki-latest-oldimage.sql from
http://dumps.wikimedia.org/enwiki/latest/
I picked a page, 25041,
http://en.wikipedia.org/wiki/Special:Export/Lockheed_P-38_Lightning
I get 39 links from
"SELECT il_to FROM imagelinks WHERE il_from = 25041"
When I query the image table for these, only 8 of the 39 appear.
Some of the missing files are 050218-F-1234P-076.jpg, 020930-O-9999G-017.jpg
I grepped the original mysql file for these and get nothing.
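The comparison itself boils down to a set difference. A minimal sketch, assuming the linked names and the image-table names have already been extracted from MySQL into Python lists (the helper function and the sample names below are hypothetical, not part of any dump tooling):

```python
def missing_images(linked_names, image_table_names):
    """Return imagelinks targets with no matching row in the local image table."""
    local = set(image_table_names)
    return sorted(name for name in linked_names if name not in local)

# Toy illustration (a made-up subset of names):
linked = ["P-38_Lightning.jpg", "050218-F-1234P-076.jpg", "020930-O-9999G-017.jpg"]
local_rows = ["P-38_Lightning.jpg"]
print(missing_images(linked, local_rows))
# ['020930-O-9999G-017.jpg', '050218-F-1234P-076.jpg']
```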
I can see the original file here though:
http://en.wikipedia.org/wiki/File:050218-F-1234P-076.jpg
I did a SELECT COUNT(*) and got a total of 849,801 rows, which seems low for
the total number of Wikipedia images.
Any ideas why I'm getting missing data?
--
@tommychheng
http://tommy.chheng.com
I've updated my dump processing python project to include code for quickly
detecting identity reverts from XML dumps. See
https://bitbucket.org/halfak/wikimedia-utilities for the project and the
process() function at the bottom of
https://bitbucket.org/halfak/wikimedia-utilities/src/f1c8fe7224f3/wmf/dump/…
for the algorithm. The actual function with the revert detection logic is about
50 lines long.
The resulting dump.map run using this revert process() function will emit
"revert" revisions and "reverted" revisions with the following fields:
Revert revision:
- "revert" - denotes that this row is a reverting edit
- revision_id - the rev_id of the reverting edit
- reverted_to_id - the rev_id of the reverted-to edit
- for_vandalism - flagged by matching the D_LOOSE/D_STRICT regular
expressions against the reverting edit's comment (see Priedhorsky et al.,
"Creating, Destroying, and Restoring Value in Wikipedia", GROUP 2007)
- reverted_revs - the number of revisions that were reverted (i.e., the
number of revisions between the reverting edit and the reverted-to edit)
Reverted revision:
- "reverted" - denotes that this row is a reverted edit
- revision_id - the rev_id of the reverted edit
- reverting_id - the rev_id of the reverting edit
- reverted_to_id - the rev_id of the reverted-to edit
- for_vandalism - flagged by matching the D_LOOSE/D_STRICT regular
expressions against the reverting edit's comment (see Priedhorsky et al.,
"Creating, Destroying, and Restoring Value in Wikipedia", GROUP 2007)
- reverted_revs - the number of revisions that were reverted (i.e., the
number of revisions between the reverting edit and the reverted-to edit)
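The core idea is small enough to sketch. This is not the actual wikimedia-utilities code, just a minimal illustration of identity-revert detection by checksum; the function name is made up, and the ~15-revision window is taken from the discussion quoted below:

```python
import hashlib

RADIUS = 15  # max revisions a single revert may undo; ~15 worked best per this thread

def detect_identity_reverts(revisions):
    """Given (rev_id, text) pairs in chronological order, yield
    (reverting_id, reverted_to_id, [reverted_ids]) for each identity revert."""
    history = []    # (checksum, rev_id) in chronological order
    last_seen = {}  # checksum -> index of the most recent revision with that text
    for i, (rev_id, text) in enumerate(revisions):
        checksum = hashlib.md5(text.encode("utf-8")).hexdigest()
        if checksum in last_seen:
            j = last_seen[checksum]
            # require at least one intervening (reverted) revision, within the window
            if 0 < i - j - 1 <= RADIUS:
                reverted = [history[k][1] for k in range(j + 1, i)]
                yield rev_id, history[j][1], reverted
        history.append((checksum, rev_id))
        last_seen[checksum] = i

# The "foo"/"bar" toy sequence from the quoted discussion below:
revs = [(1, "foo"), (2, "bar"), (3, "foobar"), (4, "bar"), (5, "barbar")]
print(list(detect_identity_reverts(revs)))  # [(4, 2, [3])]
```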
I hope this is helpful.
-Aaron
On Fri, Aug 19, 2011 at 3:08 PM, Aaron Halfaker <aaron.halfaker(a)gmail.com> wrote:
> An identity revert is one which changes the article to an absolutely
> identical previous state. This is a common operation in the English
> Wikipedia.
>
> There is a Kittur & Kraut (and others) paper, which I can't recall, that
> found the vast majority of reverts of any sort were identity reverts. Some
> other types they define are:
>
> - "Partial reverts": Part of an edit is discarded
> - "Effective reverts": Looks to be an identity revert, but not
> *exactly* the same as a previous revision. Often a few white-space
> characters were out of place.
>
> See http://www.grouplens.org/node/427 for a discussion of the difficulty
> of detecting reverts in better ways.
>
> My code detects identity reverts. For example suppose the following is the
> content of a sequence of revisions.
>
>
> 1. "foo"
> 2. "bar"
> 3. "foobar"
> 4. "bar"
> 5. "barbar"
>
> Revision #4 reverts back to revision #2 and revision #3 is reverted. When
> looking for identity reverts, I have found that limiting the number of
> revisions that can be reverted to ~15 produces the highest quality of
> results. This is discussed in http://www.grouplens.org/node/416 (see
> http://www-users.cs.umn.edu/~halfak/summaries/A_Jury_of_Your_Peers.html for
> a quick-and-dirty summary of the work).
>
> This subject deserves a long conversation, but I think the bit you might be
> interested in is that the identity revert (described and illustrated above)
> seems to be the accepted approach to identifying reverts for most types of
> analyses.
>
> -Aaron
>
> On Fri, Aug 19, 2011 at 4:39 PM, Flöck, Fabian <fabian.floeck(a)kit.edu> wrote:
>
>> Hi Aaron,
>>
>> thanks, that would be awesome :) we built something ourselves, but I'm not
>> quite content with it.
>>
>> Could you also tell me how you defined a revert (and maybe how you
>> determine who is the reverter)? Because this is a crucial issue for me.
>> Is it the complete deletion of all the characters entered by an editor in
>> an edit? What about editors that revert others or delete content? do you
>> treat their edits as being reverted if the deleted content gets
>> reintroduced? Did you take into account location of the words in the text or
>> did you use a bag-of-words model?
>> I read many papers and tool documentations that use "reverts", and some
>> mention their method (while many don't), but it seems almost no one
>> describes their definition of what a "revert" actually is.
>>
>> But maybe I will get the answers to this from your code as well :)
>>
>> Anyway, thanks for the help!
>>
>> Best,
>> Fabian
>>
>>
>> On 19 Aug 2011, at 18:31, Aaron Halfaker wrote:
>>
>> Fabian,
>>
>> I actually have some software for quickly producing reverts from a
>> database dump. The framework for doing it is available here:
>> https://bitbucket.org/halfak/wikimedia-utilities. I still have to
>> package up the code that actually generates the reverts though. It's just a
>> matter of finding time to sit down with it and figure out the dependencies!
>> I expect that I can have it ready by Monday. I hope to actually package up
>> the revert detecting code into the above python project as an example.
>>
>> I just wanted to let you know that I have a response for you on the way.
>>
>> -Aaron
>>
>> On Thu, Aug 18, 2011 at 4:40 AM, Flöck, Fabian <fabian.floeck(a)kit.edu> wrote:
>>
>>> Hi,
>>>
>>> I'm trying to detect reverts in Wikipedia for my research, right now with
>>> a self-built script using MD5 hashes and diffs between revisions. I always
>>> read about people taking reverts into account in their data, but it's
>>> seldom described HOW exactly a revert is determined or what tool they use
>>> to do that. Can you point me to any research or tools, or tell me maybe what
>>> you used in your own research to identify which edits were reverted and/or
>>> who reverted them?
>>>
>>> Best,
>>>
>>> Fabian
>>>
>>>
>>>
>>>
>>> --
>>> Karlsruhe Institute of Technology (KIT)
>>> Institute of Applied Informatics and Formal Description Methods
>>>
>>> Dipl.-Medwiss. Fabian Flöck
>>> Research Associate
>>>
>>> Building 11.40, Room 222
>>> KIT-Campus South
>>> D-76128 Karlsruhe
>>>
>>> Phone: +49 721 608 4 6584
>>> Skype: f.floeck_work
>>> E-Mail: fabian.floeck(a)kit.edu
>>> WWW: http://www.aifb.kit.edu/web/Fabian_Flöck
>>>
>>> KIT – University of the State of Baden-Wuerttemberg and
>>> National Research Center of the Helmholtz Association
>>>
>>>
>>>
>>
>>
>>
>>
>>
>>
>>
>>
>
It's worth pointing out that in our research at PARC, we also discussed
the possibility of using a containment-based measure, as described in:
"On the Resemblance and Containment of Documents", A. Z. Broder.
In the end, we realized that the real issue is that there is no
universal agreement on what is a 'revert'.
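For anyone curious, Broder's containment measure is easy to sketch. This is just an illustrative toy over word shingles (the shingle width of 3 is an arbitrary choice, not anything from our work at PARC):

```python
def shingles(text, w=3):
    """Set of w-word shingles of a text (Broder's S(D, w))."""
    tokens = text.split()
    if len(tokens) < w:
        return {tuple(tokens)}
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}

def containment(a, b, w=3):
    """c(A, B) = |S(A) & S(B)| / |S(A)|: how much of A is contained in B."""
    sa, sb = shingles(a, w), shingles(b, w)
    return len(sa & sb) / len(sa)

# Every shingle of the first text also appears in the second:
print(containment("the quick brown fox jumps", "the quick brown fox jumps over it"))  # 1.0
```

Under this measure, a revision whose shingles are fully contained in a later revision was (in some sense) restored by it, which is one way to relax the strict identity-revert definition.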
--Ed
On Sun, Aug 21, 2011 at 3:15 PM,
<wiki-research-l-request(a)lists.wikimedia.org> wrote:
> There have been a few publications on the subject:
> 1. "Us vs. Them: Understanding social dynamics in Wikipedia with revert
> graph visualizations", B. Suh, E. H. Chi, B. A. Pendleton.
> 2. "He Says, She Says: Conflict and coordination in Wikipedia", A. Kittur,
> B. Suh, B. A. Pendleton.
>
We received several requests to extend the submission deadline for WikiViz 2011 – a competition organized by WikiSym and the Wikimedia Foundation to visualize Wikipedia's impact with open data.
The WikiSym committee is glad to announce that the deadline has been extended to August 28, 2011.
http://www.wikisym.org/ws2011/wikiviz:presentation
http://twitter.com/WikiViz/status/104349201680437248
The 3 finalists will have their travel costs covered for the award ceremony at WikiSym 2011 in Mountain View, CA (3-5 October 2011), and their work will be showcased at the conference, featured in our partners' dataviz outlets (FlowingData, Information Aesthetics, Periscopic, Visualizing.org), and published by El Mundo – the largest digital newspaper by readership in Spanish.
Please circulate the call to anyone who might be interested.
Best,
Dario
--
Dario Taraborelli, PhD
Senior Research Analyst
Wikimedia Foundation
http://wikimediafoundation.org
http://nitens.org/taraborelli
Hi,
I'm trying to detect reverts in Wikipedia for my research, right now with a self-built script using MD5 hashes and diffs between revisions. I always read about people taking reverts into account in their data, but it's seldom described HOW exactly a revert is determined or what tool they use to do that. Can you point me to any research or tools, or tell me maybe what you used in your own research to identify which edits were reverted and/or who reverted them?
Best,
Fabian
--
Karlsruhe Institute of Technology (KIT)
Institute of Applied Informatics and Formal Description Methods
Dipl.-Medwiss. Fabian Flöck
Research Associate
Building 11.40, Room 222
KIT-Campus South
D-76128 Karlsruhe
Phone: +49 721 608 4 6584
Skype: f.floeck_work
E-Mail: fabian.floeck(a)kit.edu
WWW: http://www.aifb.kit.edu/web/Fabian_Flöck
KIT – University of the State of Baden-Wuerttemberg and
National Research Center of the Helmholtz Association
Hello,
we have written a tool that simulates the hardware meter. It should run on
Windows, Linux, Android, and MacOS.
You can find it on our project page (http://l3q.de/pediameter) and in the
Android Market once it has left beta status.
Greets,
Lukas Benedix and Jens Hantke.
Hello listeners,
in a project at our university (FU Berlin), we visualize the recent
changes on the major Wikipedias (grouped by language). It's called Pediameter.
If you're interested, you can have a look at our project at
http://l3q.de/pediameter/ .
It's supported on Windows, Linux, and MacOS X.
Greets,
Lukas Benedix and Jens Hantke.