Brian, thanks very much for running this. I'll spend some time over the next day running some metrics to see how it compares with our Jan 2013 results.

In general, this is what I'm looking for, and I'll post some interesting stats once I've processed it.

-Andrew


-Andrew Lih
Associate professor of journalism, American University
Email: andrew@andrewlih.com
WEB: http://www.andrewlih.com
BOOK: The Wikipedia Revolution: http://www.wikipediarevolution.com
PROJECT: Wiki Makes Video http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Wiki_Makes_Video

On Sun, Dec 7, 2014 at 12:22 AM, Brian Wolff <bawolff@gmail.com> wrote:
On 12/5/14, Andrew Lih <andrew@andrewlih.com> wrote:
> Brian, thanks yes that would be what I'd be looking for.
>
> In fact, a monthly report on a regular basis would be really interesting to
> see.
>

Alright, here is my first attempt:

http://tools.wmflabs.org/bawolff/usedVideos.htm (Data formatted as tsv
if anyone wants to do further processing:
http://tools.wmflabs.org/bawolff/usedVideos.txt )

It gives a mostly alphabetical list of articles with videos on them. A
video is defined as follows:
*A webm file
*An ogg file registered as video in the database. This roughly means
that it has the string "theora" somewhere in the first 256 bytes of
the file, not counting the string "ffmpeg2theora" (although some older
files might still have been counted on the basis of "ffmpeg2theora").
There's no guarantee that an ogg theora file has a theora data packet
in the first 256 bytes, and it's also very possible for non-theora
files to have that string in the header. Consider this a "rough"
metric. In practise I think it works most of the time, but do your own
checking before using it for anything serious.
*An animated gif file that is at least 10 seconds long. I figured this
very roughly separates video-ish gifs from non-video-ish gifs.
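
For anyone who wants to spot-check individual ogg files, the "theora"
rule described above can be sketched roughly as follows in Python. This
is only an illustration of the rule as stated here, not the code
MediaWiki actually runs; the function names are made up, and the sketch
inherits all of the caveats above.

```python
def header_mentions_theora(data, probe_bytes=256):
    """Return True if b"theora" appears in the first probe_bytes of data,
    ignoring occurrences that are part of b"ffmpeg2theora"."""
    head = data[:probe_bytes]
    # Strip the encoder name first so it does not count as a match.
    head = head.replace(b"ffmpeg2theora", b"")
    return b"theora" in head


def file_looks_like_theora(path):
    """Apply the same rough check to a file on disk (hypothetical helper)."""
    with open(path, "rb") as f:
        return header_mentions_theora(f.read(256))
```

A file whose only mention of "theora" is the encoder name
"ffmpeg2theora" is rejected, and so is a file whose theora packet starts
after the first 256 bytes, which is exactly the false-negative case
mentioned above.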

Based on that metric, there are currently 8464 articles on enwikipedia
that have videos on them (6442 if you leave out the GIF files that are
10 seconds or longer).
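
If you want to recompute those two counts from the tsv yourself,
something along these lines might work. It is a sketch under
assumptions: that the tsv has the same five columns, in the same order,
as the query in the p.s. below, and that empty columns show up either as
an empty string or as the literal text NULL. The function and column
names are made up for the example.

```python
import csv
import io

# Assumed column order, taken from the select list of the query in the p.s.
COLUMNS = ["page_title", "commons_videos", "enwiki_videos",
           "commons_long_gifs", "enwiki_long_gifs"]


def _has_value(cell):
    # Assumption: a missing value is either empty or the literal "NULL".
    return cell not in ("", "NULL")


def count_articles(tsv_text):
    """Return (articles listed, articles with a non-gif video)."""
    total = with_video = 0
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0] == "page_title":  # skip blank and header lines
            continue
        row = row + [""] * (len(COLUMNS) - len(row))  # pad short rows
        rec = dict(zip(COLUMNS, row))
        total += 1
        if _has_value(rec["commons_videos"]) or _has_value(rec["enwiki_videos"]):
            with_video += 1
    return total, with_video
```

Usage would be to read usedVideos.txt into a string and pass it to
count_articles(); the first number corresponds to the count with long
gifs included, the second to the count without them.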

Before I set this up to update itself, is this the sort of thing you
are looking for? Would it be more useful with different definitions of
a "video"? Or, instead of an alphabetical list of articles, should it
be oriented around which video is used in the most places? Or would
some other ordering be best?

I guess I'm asking, what questions about videos are you actually
looking to answer, and how could this type of report be modified to
better answer them?

--bawolff

p.s. For those interested in this sort of thing, the sql query I used was:

select page_title,
       GROUP_CONCAT( i2.img_name separator ', ' ) as "commons videos",
       GROUP_CONCAT( i1.img_name separator ', ' ) as "enwiki videos",
       GROUP_CONCAT( i3.img_name separator ', ' ) as "commons long gifs",
       GROUP_CONCAT( i4.img_name separator ', ' ) as "enwiki long gifs"
from page
inner join imagelinks on il_from = page_id
-- i1/i2: files registered as video locally (enwiki) and on Commons
left join image i1
  on il_to = i1.img_name and i1.img_media_type = 'VIDEO'
left join commonswiki_p.image i2
  on il_to = i2.img_name and i2.img_media_type = 'VIDEO'
-- i3/i4: gifs whose metadata records a duration of 10 seconds or more
left join commonswiki_p.image i3
  on il_to = i3.img_name and i3.img_media_type = 'BITMAP'
  and i3.img_major_mime = 'image' and i3.img_minor_mime = 'gif'
  and i3.img_metadata regexp '"duration";d:\\d{2,}'
left join image i4
  on il_to = i4.img_name and i4.img_media_type = 'BITMAP'
  and i4.img_major_mime = 'image' and i4.img_minor_mime = 'gif'
  and i4.img_metadata regexp '"duration";d:\\d{2,}'
-- main namespace only, and at least one of the four joins matched
where page_namespace = 0
  and (i1.img_name is not null or i2.img_name is not null
       or i3.img_name is not null or i4.img_name is not null)
group by page_title;

_______________________________________________
Wikivideo-l mailing list
Wikivideo-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikivideo-l