Thanks Ward -- very useful! It would be interesting to run it again on a recent dump and see whether certain categories are getting better video treatment, though the set
Fascinating that in about 2.5 years, the number of videos in that category has not changed much.
By coincidence, I was looking at a 2009 blog post I wrote about Encarta and Wikipedia's lack of video/multimedia.
"There is a loss to the world with the absence of Encarta’s historic images [and video]. Because Wikipedia has a strict “free” edict on content, especially images and multimedia, it will always be at a disadvantage in having visuals that are unique and under copyright protection. For that, the community will have to wait until copyright runs out on those materials. Technology may be fast, but that’s one area that will be slow."
-Andrew
On Mon, Jan 21, 2013 at 6:04 PM, Ward Cunningham <ward@c2.com> wrote:
Andrew -- Good question. I have an answer, though it's a few years old. If you like my method, I can bring the data up to date.
I used my exploratory parsing mechanism to look for [[File: ... ]] links to media files. I first ignored files with familiar suffixes like jpg, png, gif and pdf. That left lots of ogg and ogv files, which I separated out as videos. What remained were a couple of oga files and some strange suffixes I didn't recognize, like djvu, shivg and ext. I ignored them.
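The suffix-based filtering described above could be sketched in Python roughly as follows. This is not Ward's exploratory parser: the regular expression, the function name `classify_file_links`, and the choice to match only the `File:` prefix (not `Image:` or `Media:`) are assumptions for illustration. Note that `.ogg` is a container that can also hold audio-only files, so treating every `.ogg` as video may overcount.

```python
import re
from collections import Counter

# Hypothetical pattern: captures the file name in [[File:Name.ext|...]] up to the
# first "|" or "]". Assumption: "Image:" and "Media:" prefixes are not handled.
FILE_LINK = re.compile(r"\[\[\s*File\s*:\s*([^|\]]+)", re.IGNORECASE)

IGNORED = {"jpg", "jpeg", "png", "gif", "pdf"}  # familiar suffixes, skipped
VIDEO = {"ogg", "ogv"}                          # counted as video, per the message


def classify_file_links(wikitext):
    """Return (video_file_names, Counter of leftover suffixes) for one page."""
    videos = []
    leftovers = Counter()
    for match in FILE_LINK.finditer(wikitext):
        name = match.group(1).strip()
        suffix = name.rsplit(".", 1)[-1].lower() if "." in name else ""
        if suffix in IGNORED:
            continue
        if suffix in VIDEO:
            videos.append(name)
        else:
            leftovers[suffix] += 1  # e.g. oga, djvu: set aside, not counted
    return videos, leftovers


text = "[[File:Cat.jpg|thumb]] [[File:Launch.ogv|thumb|Liftoff]] [[file:Song.oga]]"
print(classify_file_links(text))
```

Keeping the leftover suffixes in a `Counter` makes it easy to spot unfamiliar extensions (like the oga and djvu files mentioned above) before deciding whether to ignore them.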
In total, I found 878 video files on 707 pages, 227 of which were flagged as "Articles containing video clips".
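One way to produce tallies of this kind is shown below. It assumes each page has already been reduced to its title, wikitext and list of video file links. The tuple shape and the helper names `has_clips_flag` and `tally` are hypothetical. The flag check only sees a direct `[[Category:Articles containing video clips]]` link, so pages that receive the category through a template would be missed. The sketch counts file links per page, not distinct files.

```python
import re

# Assumption: a page is "flagged" when its wikitext links the category directly.
# Categories added indirectly by templates are not detected by this check.
CLIPS_CATEGORY = re.compile(
    r"\[\[\s*Category\s*:\s*Articles[ _]containing[ _]video[ _]clips\s*[|\]]",
    re.IGNORECASE,
)


def has_clips_flag(wikitext):
    """True if the page links the 'Articles containing video clips' category."""
    return CLIPS_CATEGORY.search(wikitext) is not None


def tally(pages):
    """pages: iterable of (title, wikitext, video_files) tuples (hypothetical shape).

    Returns (video_file_links, pages_with_video, flagged_pages_with_video).
    """
    files = pages_with_video = flagged = 0
    for _title, wikitext, video_files in pages:
        if not video_files:
            continue  # only pages with at least one video are counted
        files += len(video_files)
        pages_with_video += 1
        if has_clips_flag(wikitext):
            flagged += 1
    return files, pages_with_video, flagged
```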
I also looked for {{cite video ... }} templates and found 9,716 of them.
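Counting `{{cite video ... }}` templates can be approximated with a single regular expression. This is a sketch that assumes a plain pattern match is good enough. It does not follow template redirects or handle nested templates, and `re.IGNORECASE` is a simplification, since MediaWiki by default treats only the first letter of a template name case-insensitively.

```python
import re

# Matches the opening "{{cite video" followed by a parameter bar or closing braces,
# so names like "{{cite videogame" are not counted. IGNORECASE is a simplification.
CITE_VIDEO = re.compile(r"\{\{\s*cite[ _]+video\s*[|}]", re.IGNORECASE)


def count_cite_video(wikitext):
    """Count {{cite video ...}} template uses in one page's wikitext."""
    return len(CITE_VIDEO.findall(wikitext))


sample = "{{cite video |people=X |title=Y}} text {{Cite video|title=Z}} {{cite web|url=u}}"
print(count_cite_video(sample))
```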
I'm scraping this information from an enwiki.xml dump file downloaded Sep 22, 2010. It was 12,162,183,168 bytes uncompressed and contained 2,598,517 pages.
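At roughly 12 GB uncompressed, a dump like this is too large to load in one piece, so a streaming parser is the natural fit. The sketch below uses Python's `xml.etree.ElementTree.iterparse` to yield one `(title, wikitext)` pair per `<page>` and discards finished pages as it goes. The function names are hypothetical, and the sketch assumes a current-revisions dump with one `<revision>` per page. For a compressed `.xml.bz2` file, `bz2.open(path, "rb")` could be passed as `source`.

```python
import xml.etree.ElementTree as ET


def local(tag):
    # Dump elements are namespaced, e.g. "{http://www.mediawiki.org/xml/export-0.4/}page";
    # strip the namespace so the code does not depend on the export schema version.
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) for each <page> without loading the whole dump.

    `source` is a filename or a binary file object for an uncompressed dump.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the <mediawiki> root element
    for event, elem in context:
        if event != "end" or local(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""  # an empty <text/> has .text == None
        yield title, text
        root.clear()  # drop processed pages so memory use stays flat
```

Counting pages is then just `sum(1 for _ in iter_pages("enwiki.xml"))`, and each page's wikitext can be handed to whatever link or template matching is being measured.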
I'm attaching a text file with one line for each page on which I found (at least) one video. The tab-separated columns are: page-title, media-file, clips-flag.
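A file in the layout described above (tab-separated page title, media file and clips flag) is easy to write and read back, which helps when comparing results from an older dump against a newer one. In this sketch the `yes`/`no` flag encoding and the helper names `format_report` and `parse_report` are assumptions, since the message doesn't say how the flag was written. It also relies on page titles and file names not containing tab characters.

```python
def format_report(rows):
    """rows: iterable of (page_title, media_file, flagged) tuples -> TSV text."""
    lines = [
        "\t".join([title, media_file, "yes" if flagged else "no"])
        for title, media_file, flagged in rows
    ]
    return ("\n".join(lines) + "\n") if lines else ""


def parse_report(text):
    """Inverse of format_report: TSV text -> list of (title, file, flagged)."""
    rows = []
    for line in text.splitlines():
        if not line:
            continue  # tolerate blank lines
        title, media_file, flag = line.split("\t")
        rows.append((title, media_file, flag == "yes"))
    return rows


rows = [("Apollo 11", "Launch.ogv", True), ("Cat", "Purr.ogg", False)]
print(format_report(rows), end="")
```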
I'd be happy to adjust my methods if there are other ways to mark up a video. I hope this is useful.
Best regards. -- Ward
On Jan 21, 2013, at 3:17 PM, Andrew Lih wrote:
Hi all,
I'm wondering if anyone has done any research into identifying which articles in Wikipedia have associated video?
There is this category, which only has 280 or so articles: http://en.wikipedia.org/wiki/Category:Articles_containing_video_clips
It seems far from complete. Appreciate any advice or previous work in this area.
The background: I'm working with some grad students on staging a Wiki Makes Video contest in April, and we'd like to do some measurement of the current state of video in Wikipedia.
Thanks, and email me if you'd like to know more about the video project for April.
-Andrew
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l