Hi all,
I'm wondering if anyone has done any research into identifying which articles in Wikipedia have associated video?
There is this category, which only has 280 or so articles: http://en.wikipedia.org/wiki/Category:Articles_containing_video_clips
It seems far from complete. Appreciate any advice or previous work in this area.
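One rough way to check the size of that category is the MediaWiki API's categorymembers list. The sketch below is only an illustration, not prior work from this thread: it builds the request URL and parses a response without doing any network I/O, the helper names are hypothetical, and the continue/cmcontinue paging shape assumes the current API response format.

```python
import json
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def category_members_url(category, cmcontinue=None):
    """Build a categorymembers query URL for article-namespace pages."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmnamespace": 0,   # articles only
        "cmlimit": 500,     # usual per-request maximum for ordinary clients
        "format": "json",
    }
    if cmcontinue:
        params["cmcontinue"] = cmcontinue
    return API + "?" + urlencode(params)


def parse_members(response_text):
    """Return (titles, continuation token or None) from one JSON response."""
    data = json.loads(response_text)
    titles = [m["title"] for m in data["query"]["categorymembers"]]
    next_token = data.get("continue", {}).get("cmcontinue")
    return titles, next_token
```

Fetching each URL (for example with urllib.request.urlopen) and looping until the continuation token is None would give the full member list to count.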
The background: I'm working with some grad students on staging a Wiki Makes Video contest in April, and we'd like to do some measurement of the current state of video in Wikipedia.
Thanks, and email me if you'd like to know more about the video project for April.
-Andrew
Andrew -- Good question. I have an answer. It's a few years old, but if you like my method, I'll bring the data up to date.
I used my exploratory parsing mechanism to look for [[File: ... ]] links to media files. I first ignored files with familiar suffixes like jpg, png, gif and pdf. This left lots of ogg and ogv files which I separated out as videos. This left a couple of oga files and some strange suffixes I didn't recognize like djvu, shivg and ext. I ignored them.
All told, I found 878 video files on 707 pages, 227 of which were flagged as "Articles containing video clips".
I also looked for {{cite video ... }} templates and found 9,716 of them.
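The suffix filtering and template count described above can be sketched with two regular expressions over a page's raw wikitext. This is a minimal illustration under stated assumptions, not the exploratory parser itself: the names scan_page, IGNORED and VIDEO are hypothetical, only [[File: ... ]] links are matched, and ogg is counted as video even though it can also hold audio.

```python
import re
from collections import Counter

# [[File:Name.ext|...]] links; the capture stops at the first "|" or "]".
FILE_LINK = re.compile(r"\[\[\s*File\s*:\s*([^|\]]+)", re.IGNORECASE)
# Opening of a {{cite video ...}} template.
CITE_VIDEO = re.compile(r"\{\{\s*cite video\b", re.IGNORECASE)

IGNORED = {"jpg", "png", "gif", "pdf"}  # the familiar suffixes named above
VIDEO = {"ogg", "ogv"}                  # treated as video in this sketch


def media_suffix(name):
    """Lower-cased text after the last dot, or '' if there is no dot."""
    return name.rsplit(".", 1)[-1].lower() if "." in name else ""


def scan_page(wikitext):
    """Return (video file names, Counter of other suffixes, cite-video count)."""
    videos, other = [], Counter()
    for match in FILE_LINK.finditer(wikitext):
        name = match.group(1).strip()
        suffix = media_suffix(name)
        if suffix in IGNORED:
            continue
        if suffix in VIDEO:
            videos.append(name)
        else:
            other[suffix] += 1  # e.g. oga, djvu: left for manual inspection
    return videos, other, len(CITE_VIDEO.findall(wikitext))
```

Keeping the unrecognized suffixes in a counter, rather than dropping them silently, makes it easy to spot formats that deserve a second look.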
I'm scraping this information from an enwiki.xml dump file downloaded Sep 22, 2010. It was 12,162,183,168 bytes uncompressed and contained 2,598,517 pages.
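A file of that size is easiest to read as a stream, one page at a time. The following is a minimal sketch using Python's standard xml.etree.ElementTree.iterparse; it is an assumption about how such a scan could be done rather than a description of the parser used here, and the iter_pages name is hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' prefix that dump files put on every tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, wikitext) for each <page> element in a dump stream."""
    title, text = None, ""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            elem.clear()  # release the page's children once it has been yielded
```

Passing an open file (or a decompressing stream such as bz2.open(path, "rb")) avoids loading the whole dump into memory.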
I'm attaching a text file with one line for each page on which I found (at least) one video. The tab-separated columns are: page-title, media-file, clips-flag.
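For anyone who wants to re-derive the page counts from a file in this three-column layout, here is a small sketch. The encoding of the clips-flag column is not specified above, so the flagged_values parameter is purely an assumption, and the summarize name is hypothetical.

```python
import csv
import io


def summarize(tsv_file, flagged_values=("1", "true", "yes")):
    """Return (pages with video, pages flagged, sorted unflagged titles).

    Expects tab-separated rows: page-title, media-file, clips-flag.
    flagged_values is an assumed encoding of the clips-flag column.
    """
    pages, flagged = set(), set()
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed lines
        title, _media_file, clips_flag = row[:3]
        pages.add(title)
        if clips_flag.strip().lower() in flagged_values:
            flagged.add(title)
    return len(pages), len(flagged), sorted(pages - flagged)
```

The third return value is the list of pages that carry a video but not the category flag, which is the set someone would need to review by hand.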
I'd be happy to adjust my methods if there are other ways to mark up a video. I hope this is useful.
Best regards. -- Ward
On Jan 21, 2013, at 3:17 PM, Andrew Lih wrote:
Hi all,
I'm wondering if anyone has done any research into identifying which articles in Wikipedia have associated video?
There is this category, which only has 280 or so articles: http://en.wikipedia.org/wiki/Category:Articles_containing_video_clips
It seems far from complete. Appreciate any advice or previous work in this area.
The background: I'm working with some grad students on staging a Wiki Makes Video contest in April, and we'd like to do some measurement of the current state of video in Wikipedia.
Thanks, and email me if you'd like to know more about the video project for April.
-Andrew
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Dear all,
I don't have an answer but am pondering related questions: The Open Access Media Importer has uploaded over 10k files so far (mostly videos, plus a few audio files), but I have no idea what percentage that is of the video or audio files that are on Commons.
http://commons.wikimedia.org/wiki/Category:Videos and http://commons.wikimedia.org/wiki/Category:Sound are incomplete in much the same way as http://en.wikipedia.org/wiki/Category:Articles_containing_video_clips .
However, for category-based stats, we have GLAMorous, which also yields the number of uses per file across projects, as well as the total number of files per project, e.g.
* http://toolserver.org/~magnus/glamorous.php?doit=1&category=Videos&u...
* http://toolserver.org/~magnus/glamorous.php?doit=1&category=Sound&us...
* http://toolserver.org/~magnus/glamorous.php?doit=1&category=Uploaded+wit...
Daniel
Thanks Ward -- very useful! It would be interesting to run it again on a recent dump and to find whether certain categories are getting better video treatment, though the set
Fascinating that in about 2.5 years, the number of videos in that category has not changed much.
By coincidence, I was looking at a 2009 blog post I had about Encarta and Wikipedia's lack of video/multimedia.
"There is a loss to the world with the absence of Encarta’s historic images [and video]. Because Wikipedia has a strict “free” edict on content, especially images and multimedia, it will always be at a disadvantage in having visuals that are unique and under copyright protection. For that, the community will have to wait until copyright runs out on those materials. Technology may be fast, but that’s one area that will be slow."
-Andrew
On Jan 22, 2013, at 8:00 AM, Andrew Lih wrote:
Thanks Ward -- very useful! It would be interesting to run it again on a recent dump and to find whether certain categories are getting better video treatment, though the set
Will do.
I wonder if we could get some students to check my results and add the "containing video clips" category to pages that deserve it.
Best regards. -- Ward
I don't know that I add that category when I do add videos to articles myself. One solution would be to narrow a scope to a select list of articles around a topic/wikiproject like Roads, Sports, medicine, and then conduct a search for .ogv, .ogg. http://en.wikipedia.org/wiki/Netball , http://topsy.com/en.wikipedia.org/wiki/Roller_derby , http://en.wikipedia.org/wiki/Lauren_Jackson , http://en.wikipedia.org/wiki/Jenna_O%27Hea , http://en.wikipedia.org/wiki/Canberra_Capitals , http://en.wikipedia.org/wiki/Carrie_Graf , http://en.wikipedia.org/wiki/Marianna_Tolo all have video in there.
I think there are some limitations with video that are also worth discussing: Is it better to have a quality picture of Marianna Tolo in the article? Or is it better to have lesser quality video in the article? I believe the maximum upload size on Commons is 100 MB. There is no streaming. This creates a major wall to sharing this type of content in an article and making a video illustrative of a point. (The video in the roller derby article, for example, illustrates safety in roller derby. The Tolo video illustrates her playing in a game. If you're a French fan of hers, you might want to see what she played like in Australia, especially since Australian games were not televised in France.)
Laura -- If there are extra fields I should capture from the article's markup, I'm happy to add them to my results.
I'm 2.6 million articles through the parse of the Jan 2 dump. I've found close to 3000 videos so far.
My parse is running at 6 MB per second. I have to run to the train station to pick up a colleague. I'm going to see if I can finish on battery. (I should have done this in the cloud but thought it would be handy to have a current copy of Wikipedia on my laptop.)
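For a sense of scale, a parse rate translates into wall-clock time as follows. This is a back-of-the-envelope sketch that assumes the rate means 6 million bytes per second, held steady; since the size of the Jan 2 dump is not given here, it uses the 12,162,183,168-byte figure from the Sep 22, 2010 dump purely as an illustration.

```python
def parse_seconds(dump_bytes, mb_per_second):
    """Seconds to read dump_bytes at a steady rate given in 10**6 bytes/second."""
    return dump_bytes / (mb_per_second * 1_000_000)


# Illustration only: the 2010 dump size at the rate quoted above.
seconds = parse_seconds(12_162_183_168, 6)
print(f"{seconds / 60:.1f} minutes")  # prints "33.8 minutes"
```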
More soon. -- Ward
On Tue, Jan 22, 2013 at 6:32 PM, Ward Cunningham ward@c2.com wrote:
Laura -- If there are extra fields I should capture from the article's markup, I'm happy to add them to my results.
My understanding is that the only real video file format that works on Commons is .ogg (and .ogv), but you run into the issue of picking up sound files. http://en.wikipedia.org/wiki/Wikipedia:Spoken_articles is the biggest example of where I think you'd run into false positives with the sound file issue using that as a search phrase. Music articles are probably another... because I'm pretty sure audio files are still limited to .ogg and .oga given licensing limitations because of non-CC compliant codecs being used for formats like .mp3.
I'm 2.6 million articles through the parse of the Jan 2 dump. I've found close to 3000 videos so far.
I tend to create project by project lists so I can do comparisons as I'm less interested in the actual total volume, and more interested in seeing how things differ from one group to another. There might be a good reason why you don't have much data, or why numbers have gone up, if you look at things based on, say, WikiProject or category inclusion. (Maybe videos from older movies suddenly came into the public domain and, with the support of a GLAM donation, a major effort was made to include these in articles. Maybe the Wiki Loves Monuments people opened up video and people made a big push to include their monument videos in articles.) Approaching the research question from a "what is going on and how can this be explained" angle seems a better way to contextualize data and understand why things are moving.
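A group-by-group comparison of this kind could be sketched as follows. It is only an illustration: the function name is hypothetical, and it assumes you already have a mapping from each project (or category) to its article titles plus a set of titles known to embed a video.

```python
def video_share_by_project(project_pages, pages_with_video):
    """Return {project: fraction of its pages that embed at least one video}."""
    with_video = set(pages_with_video)
    shares = {}
    for project, pages in project_pages.items():
        pages = set(pages)
        if pages:  # skip empty projects to avoid dividing by zero
            shares[project] = len(pages & with_video) / len(pages)
    return shares
```

Comparing fractions rather than raw counts keeps large projects from dominating simply because they contain more articles.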
My parse is running at 6 MB per second. I have to run to the train station to pick up a colleague. I'm going to see if I can finish on battery. (I should have done this in the cloud but thought it would be handy to have a current copy of Wikipedia on my laptop.)
Not a worry. Just a side point.
On Jan 22, 2013, at 9:54 AM, Laura Hale wrote:
I tend to create project by project lists so I can do comparisons as I'm less interested in the actual total volume, and more interested in seeing how things differ from one group to another.
Me too. Curiosity.
On this list a few months ago I suggested that we should use wiki to study wiki. I'm developing such a wiki, where one can create and share results mined from recent dumps. My method excels where curiosity goes beyond what has been already parsed.
Best regards. -- Ward
On Tue, Jan 22, 2013 at 7:07 PM, Ward Cunningham ward@c2.com wrote:
Me too. Curiosity.
On this list a few months ago I suggested that we should use wiki to study wiki. I'm developing such a wiki, where one can create and share results mined from recent dumps. My method excels where curiosity goes beyond what has been already parsed.
On a sport level, I'd hazard a guess that Australia has more videos in use on their sporting related pages (or more videos of Australian sport are used on general pages) because unlike the United States, you cannot copyright an event. Hence, you can make recordings at professional sporting matches without as much fear.
Size limitations are a PITA though, which is why I personally haven't uploaded more. If you have high quality video at 70 to 100 meg, and then you have metered internet with only 20 gigs a month, how much high quality video do you want to be uploading? Hence yeah, the importance of metadata.
I tend to use much smaller data sets. http://commons.wikimedia.org/wiki/File:HOPAU_at_London_Paralympics.pdf and http://commons.wikimedia.org/wiki/File:IPC_NorAmCup.pdf are examples of contextualizing and understanding where content work around events works in order to give GLAMs an understanding of the impact of such undertakings, especially if they can contextualize it against their own internal data. It often isn't the single data collection that matters but contextualizing it against others. (How does Wikipedia traffic compare around an event compared to say a news site? Which one has further audience reach? How does the total editor contributions compare to the total comments?)
Sincerely, Laura Hale
Laura, thanks for your insight into this. I also worried about the generic "ogg" container and not knowing exactly whether it was audio or video, without digging deeper into the metadata.
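One way to dig into that metadata is to read the first pages of the Ogg container and look at the codec signature that opens each stream. Below is a minimal sketch, assuming the file's leading bytes are already in memory; the function name and the "audio"/"video" labels are hypothetical, only a handful of codecs are recognized, and page checksums are not verified.

```python
# Signatures that open the first packet of each logical stream.
VIDEO_MAGIC = (b"\x80theora",)
AUDIO_MAGIC = (b"\x01vorbis", b"OpusHead", b"Speex   ", b"\x7fFLAC")


def ogg_stream_kinds(data):
    """Return a set drawn from {'audio', 'video'} for the streams that start the file."""
    kinds = set()
    pos = 0
    # Each Ogg page has a 27-byte header followed by a segment table.
    while len(data) >= pos + 27 and data[pos:pos + 4] == b"OggS":
        header_type = data[pos + 5]
        n_segments = data[pos + 26]
        seg_table = data[pos + 27:pos + 27 + n_segments]
        payload_start = pos + 27 + n_segments
        payload_len = sum(seg_table)
        if not header_type & 0x02:
            break  # beginning-of-stream pages all come first; stop after them
        payload = data[payload_start:payload_start + payload_len]
        if payload.startswith(VIDEO_MAGIC):
            kinds.add("video")
        elif payload.startswith(AUDIO_MAGIC):
            kinds.add("audio")
        pos = payload_start + payload_len
    return kinds
```

A file whose result contains "video" can be counted as a video clip, while a result of only {"audio"} would mark a spoken article or music sample that happens to use the ogg suffix.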
Since there seems to be interest, here's a pointer to the video project planning page, and please do feel free to add/markup/edit. The plan is to execute a video gathering/production project in March/April.
http://en.wikipedia.org/wiki/User:Fuzheado/Video_project
-Andrew
On Tue, Jan 22, 2013 at 8:54 PM, Andrew Lih andrew.lih@gmail.com wrote:
Laura, thanks for your insight into this. I also worried about the generic "ogg" container and not knowing exactly whether it was audio or video, without digging deeper into the metadata.
Since there seems to be interest, here's a pointer to the video project planning page, and please do feel free to add/markup/edit. The plan is to execute a video gathering/production project in March/April.
It would be awesome to see more video, but the technical end really needs to improve on some level. Ditto with the audio. There was a discussion at one point about putting a video on the front page of English Wikipedia as a DYK picture. (HJ Mitchell nominated it, I think. The video was from a cartoon and was public domain.) It ran because the image was in the public domain. It had an outstanding number of views. The problem is yeah... local download required. I believe even if you resize the file to something like "200px", the video itself isn't "resized", so taking a 1000px wide video that is 100 megs and putting it into an article at 200px still requires a 100 meg download. Beyond that, there are no size warnings, or warnings anywhere, that you're essentially doing a local download, not streaming.
I've talked to a number of sport organisations about video, and videos can be used to great effect when used properly. I have a few myself of skiers. Seeing a picture of a blind skier and their guide kind of looks cool, but hearing the guide call out instructions to the blind skier and watching them go down the mountain together could make the point about http://enwp.org/Para-alpine_skiing much more effectively in terms of understanding that part of the sport. They see the value but the push isn't there. I probably need to get a few really successful examples. The picture part is easy enough.
The use of video has also been something we've actively discussed on English Wikinews. As one of the primary users of ogg sound files, and as people who do on-the-ground original reporting, we would find it fantastic to have an editor that would allow us to put together a video more easily. There are just a lot of technical challenges, including the file format. (That said, I absolutely love https://itunes.apple.com/us/app/fire-2-field-recorder/id436241643?mt=8 this application for the iPhone and iPad. It basically allows for recording in ogg. We've got it set up on Wikinews so that, when we do mobile reporting, we can then upload these files locally to speed up writing time through the use of Dropbox. We're working on a tool that will then build in the licenses for uploading to further ease this problem. It's been incredibly effective for reviewers who sometimes face problems downloading from Commons.)
On Wed, Jan 23, 2013 at 3:47 AM, Laura Hale laura@fanhistory.com wrote:
It would be awesome to see more video, but the technical end really needs to improve on some level. Ditto with the audio. There was a discussion at one point about putting a video on the front page of English Wikipedia as a DYK picture. (HJ Mitchell nominated it, I think. The video was from a cartoon and was public domain.) It ran because the image was in the public domain. It had an outstanding number of views. The problem is yeah... local download required. I believe even if you resize the file to something like "200px", the video itself isn't "resized", so taking a 1000px wide video that is 100 megs and putting it into an article at 200px still requires a 100 meg download.
That is no longer the case. Part of the TimedMediaHandler rollout last year was the added functionality to generate derivatives in multiple resolutions. This gives you a YouTube-style selector when viewing the video which allows you to pick the resolution you'd like to play it in. Moreover, when embedding a video in a tiny pixel size, the player will "pop out" when clicking the video to enable you to actually watch it.
https://blog.wikimedia.org/2012/11/08/introducing-wikipedias-new-html5-video...
On Thu, Mar 21, 2013 at 7:57 PM, Erik Moeller erik@wikimedia.org wrote:
That is no longer the case. Part of the TimedMediaHandler rollout last year was the added functionality to generate derivatives in multiple resolutions. This gives you a YouTube-style selector when viewing the video which allows you to pick the resolution you'd like to play it in. Moreover, when embedding a video in a tiny pixel size, the player will "pop out" when clicking the video to enable you to actually watch it.
https://blog.wikimedia.org/2012/11/08/introducing-wikipedias-new-html5-video...
Very awesome and good to know. :) Now if I could just find an encoder from my end, things would be pretty good. :D
Sincerely, Laura Hale
Andrew Lih, 22/01/2013 20:54:
Laura, thanks for your insight into this. I also worried about the generic "ogg" container and not knowing exactly whether it was audio or video, without digging deeper into the metadata.
Now that we have WebM, how much do things change? An interesting thing IMHO is that pages with videos are much more prominent in Google results: it would be very nice if someone measured the impact on pageviews of adding a video to Wikipedia articles. As for measuring, the easiest way is probably to ask for a list of user videos at https://jira.toolserver.org/browse/DBQ and add them to a tracking category. GLAMorous etc. may then be used to track stats.
Since there seems to be interest, here's a pointer to the video project planning page, and please do feel free to add/markup/edit. The plan is to execute a video gathering/production project in March/April.
Good, I see it's going on. "WebM not used by video pros/tools" really? Does nobody upload to YouTube? Are they all on Vimeo or what?
Nemo
Federico, thanks for the note and good idea on reaching out to the Toolserver folks to help us track this.
As for the WebM and pros, from my understanding YouTube is transcoding all videos that are uploaded into WebM format. However, I'd be very surprised if even 2% of YouTube uploaders are using WebM as the format. Most likely they are MP4, AVCHD, MOV, or even raw MTS files.
So perhaps it's more accurate to say WebM not used by video pros in their workflow. I'll clarify in the document now.
In the meantime, check out some of the videos produced from our Feb/March pilot with Alverno College.
http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Wiki_Makes_Video#Sample_V...
-Andrew