On Tue, Jan 22, 2013 at 6:32 PM, Ward Cunningham <ward(a)c2.com> wrote:
Laura -- If there are extra fields I should capture from the article's markup, I'm happy to add them to my results.
My understanding is that the only video format that really works on Commons is .ogg (and .ogv), but searching for .ogg runs into the issue of also picking up sound files.
Spoken articles (http://en.wikipedia.org/wiki/Wikipedia:Spoken_articles) are the biggest example of where I think you'd get false positives from sound files if you used that extension as a search phrase. Music articles are probably another, because I'm pretty sure audio files are still limited to .ogg and .oga, given the licensing restrictions on the codecs used by formats like .mp3.
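To make the false-positive point concrete, here is a minimal sketch of splitting file links by extension instead of counting every .ogg as a video. It assumes media are referenced as [[File:...]] or [[Image:...]] links in raw wikitext (real markup such as templates and galleries is messier), and the regex and function names are hypothetical. Following the extensions mentioned above, .ogv counts as video, .oga as audio, and .ogg is flagged as ambiguous for a later check.

```python
import re
from collections import Counter

# Hypothetical simplification: matches [[File:Name.ext| or [[Image:Name.ext]]
# in raw wikitext; templates, galleries and infobox parameters are not handled.
FILE_LINK = re.compile(r"\[\[(?:File|Image):([^|\]]+?)\.(\w+)\s*[|\]]", re.IGNORECASE)

VIDEO_EXTS = {"ogv"}      # video-only Ogg extension
AUDIO_EXTS = {"oga"}      # audio-only Ogg extension
AMBIGUOUS_EXTS = {"ogg"}  # could be either video or sound


def classify_media(wikitext):
    """Count file links by kind: 'video', 'audio', or 'ambiguous' (.ogg)."""
    counts = Counter()
    for _name, ext in FILE_LINK.findall(wikitext):
        ext = ext.lower()
        if ext in VIDEO_EXTS:
            counts["video"] += 1
        elif ext in AUDIO_EXTS:
            counts["audio"] += 1
        elif ext in AMBIGUOUS_EXTS:
            counts["ambiguous"] += 1
        # other extensions (images etc.) are ignored
    return counts


sample = "[[File:Cat.ogv|thumb]] [[File:Song.ogg]] [[Image:Speech.oga|x]] [[File:Map.png|y]]"
print(classify_media(sample))
```

The "ambiguous" bucket is where spoken-article and music files would land, so those links would still need a second check (for example, looking up each file's media type on Commons) before being counted as videos.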
I'm 2.6 million articles into parsing the Jan 2 dump. I've found close to 3000 videos so far.
I tend to create project-by-project lists so I can do comparisons, as I'm less interested in the actual total volume and more interested in seeing how things differ from one group to another. If you look at things based on, say, WikiProject or category inclusion, there might be a good reason why you don't have much data or why numbers have gone up. (Maybe videos of older movies suddenly came into the public domain and, with the support of a GLAM donation, a major effort was made to include them in articles. Maybe the Wiki Loves Monuments people opened up video, and people made a big push to include their monument videos in articles.) Approaching the research question as "what is going on, and how can this be explained?" seems a better way to contextualize the data and understand why things are moving.
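A project-by-project comparison like this could be sketched as below, assuming you already have a per-article video count from the dump parse and a hypothetical mapping from each article to its WikiProjects (for example, built from the project banners on talk pages). The function names and input shapes are illustrative assumptions, not part of either parse described here.

```python
from collections import defaultdict


def videos_per_project(video_counts, article_projects):
    """Sum per-article video counts into per-project totals.

    video_counts: {article_title: number_of_videos}  (hypothetical input)
    article_projects: {article_title: [project names]}  (hypothetical input)
    Articles with no known project are grouped under "(unassigned)".
    """
    totals = defaultdict(int)
    for title, n in video_counts.items():
        for project in article_projects.get(title, ["(unassigned)"]):
            totals[project] += n
    return dict(totals)


def change_by_project(old_totals, new_totals):
    """Per-project change between two dumps (new minus old)."""
    projects = set(old_totals) | set(new_totals)
    return {p: new_totals.get(p, 0) - old_totals.get(p, 0) for p in projects}


counts = {"A": 2, "B": 1, "C": 3}
projects = {"A": ["Film"], "B": ["Film", "Architecture"]}
print(videos_per_project(counts, projects))
```

Because an article can belong to several projects, the per-project totals deliberately do not add up to the overall video count; the point is to compare groups, and to see which group's numbers jumped between two dumps.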
My parse is running at 6 MB per second. I have to run to the train station to pick up a colleague. I'm going to see if I can finish on battery. (I should have done this in the cloud, but I thought it would be handy to have a current copy of Wikipedia on my laptop.)
Not a worry. Just a side point.
--
mobile: 0412183663
twitter: purplepopple
blog: ozziesport.com