Re: [Commons-l] Files per decade

20 May 2010


      Hi Lars,
Op 19-5-2010 13:45, Lars Aronsson schreef:
...
Wikipedia was created in 2001 and the image bank Wikimedia Commons
a few years later. It now contains 6 million files, mostly images.
Most of them use the template:Information which has a Date= field
to indicate when the content was created. The ideal format is the
ISO date format YYYY-MM-DD, but this is not always followed. When
I tried to parse the year, I was successful for 3.5 million files.
(Maybe I didn't try very hard.)
I guess you used a regex. Which one exactly? Or did you publish your 
code somewhere?
...
So, when were our files created? Of course, most were created
after Wikipedia was founded, in the most recent decade.
Even for old buildings, new photos were taken and uploaded.
For older decades, we should expect more information for more
recent ones, since more cameras were in used and more books
published with each new decade. Exactly how big has that
growth rate been?
It turns out, we have roughly 2% more files for each new year.
A graph plotting each year is very bumpy, but if sum up each
decade, it becomes quite smooth. This does not mean that content
production increased with 2% annually, but the content that
survived and was copied to Wikimedia Commons has grown this fast.
But this is only true for the years between 1750 and 1900.
For years before 1750, before enlightenment, the growth rate
is only 0.5 percent annually. Also quite reasonable.
The real surprise is that after 1900, there is no growth.
We have roughly 30,000 files from each decade in the
20th century. These are the numbers I found:
1850s  8652 files
1860s 12144
1870s 16561
1880s 19382
1890s 25985
1900s 37936
1910s 34882
1920s 23715
1930s 24507
1940s 30720
1950s 29364
1960s 24164
1970s 23991
1980s 31185
1990s 45423
2000s 2,951,138 files
And the graph is found on
http://commons.wikimedia.org/wiki/File:Wikimedia_Commons_files_per_decade.pn...
My guess is that this is an effect of copyright laws,
which locks down the use of 20th century content.
Nice stats! I wonder how the distribution is of years with the images of 
the batch uploads and if this influences the overall statistics.
Do you have a list of the 2.5 M files you couldn't parse? We might be 
able to add dates to some of these images or convert the dates to the 
ISO format.
Maarten

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

Re: [Commons-l] Files per decade