[Commons-l] Files per decade

Lars Aronsson lars at aronsson.se
Wed May 19 11:45:25 UTC 2010


Wikipedia was created in 2001 and the image bank Wikimedia Commons
a few years later. It now contains 6 million files, mostly images.
Most of them use Template:Information, which has a Date= field
to indicate when the content was created. The ideal format is the
ISO date format YYYY-MM-DD, but this is not always followed. When
I tried to parse the year, I was successful for 3.5 million files.
(Maybe I didn't try very hard.)
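
A minimal sketch of how a year could be pulled out of a free-form
Date= value. This is an illustration only, not necessarily how the
numbers in this post were produced. The name parse_year and the
accepted range of years (1000 to 2099) are assumptions.

```python
import re

# Assumption: a "year" is any standalone four-digit number from 1000 to 2099.
# The lookarounds reject digits that are part of a longer number.
YEAR_RE = re.compile(r"(?<!\d)(1[0-9]{3}|20[0-9]{2})(?!\d)")

def parse_year(date_field):
    """Return the first plausible year found in the text, or None."""
    match = YEAR_RE.search(date_field)
    return int(match.group(1)) if match else None
```

A value such as "2009-05-19" or "circa 1885" yields a year, while
"unknown" or "19th century" yields None and would be left out of
the count.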

So, when were our files created? Of course, most were created
after Wikipedia was founded, in the most recent decade.
Even for old buildings, new photos were taken and uploaded.

Among the older decades, we should expect more material from the
more recent ones, since more cameras were in use and more books
were published with each new decade. Exactly how big has that
growth rate been?

It turns out that we have roughly 2% more files for each new year.
A graph plotting each year is very bumpy, but if you sum up each
decade, it becomes quite smooth. This does not mean that content
production increased by 2% annually, only that the content that
survived and was copied to Wikimedia Commons has grown this fast.
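
Summing the per-year counts into decades can be sketched as
follows. The function name files_per_decade is hypothetical, and
the input is assumed to be one parsed year per file.

```python
from collections import Counter

def files_per_decade(years):
    """Count files per decade; each key is the first year of its decade."""
    return Counter((year // 10) * 10 for year in years)
```

For example, the years 1851, 1859 and 1860 give two files for the
1850s (key 1850) and one for the 1860s (key 1860).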

But this is only true for the years between 1750 and 1900.

For years before 1750, before the Enlightenment, the growth rate
is only 0.5 percent annually. Also quite reasonable.

The real surprise is that after 1900, there is no growth.
We have roughly 30,000 files from each decade in the
20th century. These are the numbers I found:

Decade      Files
1850s       8,652
1860s      12,144
1870s      16,561
1880s      19,382
1890s      25,985
1900s      37,936
1910s      34,882
1920s      23,715
1930s      24,507
1940s      30,720
1950s      29,364
1960s      24,164
1970s      23,991
1980s      31,185
1990s      45,423
2000s   2,951,138
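
The numbers above can be checked with a short script. This is a
sketch under assumptions: the growth rate is estimated as the
least-squares slope of log(count) against the decade's first year,
which may not be the method used for the figures in this post, and
the names annual_growth and mean_per_decade are hypothetical.

```python
import math

# Counts copied from the table above; keys are the first year of each decade.
FILES_PER_DECADE = {
    1850: 8652, 1860: 12144, 1870: 16561, 1880: 19382, 1890: 25985,
    1900: 37936, 1910: 34882, 1920: 23715, 1930: 24507, 1940: 30720,
    1950: 29364, 1960: 24164, 1970: 23991, 1980: 31185, 1990: 45423,
}

def annual_growth(counts, first, last):
    """Annual growth rate from a least-squares fit of log(count) on year."""
    xs = [d for d in sorted(counts) if first <= d <= last]
    ys = [math.log(counts[d]) for d in xs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return math.exp(slope) - 1

def mean_per_decade(counts, first, last):
    """Average number of files per decade between first and last, inclusive."""
    values = [n for d, n in counts.items() if first <= d <= last]
    return sum(values) / len(values)

if __name__ == "__main__":
    print("1850s-1900s growth:", annual_growth(FILES_PER_DECADE, 1850, 1900))
    print("1900s-1990s growth:", annual_growth(FILES_PER_DECADE, 1900, 1990))
    print("1900s-1990s mean:  ", mean_per_decade(FILES_PER_DECADE, 1900, 1990))
```

The table only starts in the 1850s, so the first figure covers a
shorter span than the 1750 to 1900 range discussed above and need
not match the 2% quoted there. The fit for the 1900s through the
1990s comes out close to zero, and the mean is close to 30,000
files per decade.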

The graph can be found at
http://commons.wikimedia.org/wiki/File:Wikimedia_Commons_files_per_decade.png

My guess is that this is an effect of copyright laws,
which lock down the use of 20th-century content.


-- 
  Lars Aronsson (lars at aronsson.se)
  Aronsson Datateknik - http://aronsson.se




