Optim wrote on wikipedia-l:
> I think it would be a good idea to have separate
> databases (or downloadable files) for userpages,
> talkpages and articles.
> ...
> Having separate databases (or downloadable files)
> will help the people who mirror our content to
> copy just what they really want (the articles)
> and not userpages and talkpages.
Optim gave a non-technical rationale. There is a technical one as well:
people who want to download Wikipedia for local browsing might appreciate
smaller dumps. So I checked how the data is distributed over the
namespaces, both in bytes and in number of records.
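The tally itself is simple. Here is a minimal Python sketch. It assumes the rows of the cur or old table have already been extracted as hypothetical (namespace, text_bytes) pairs; parsing the SQL dump is not shown. It sums the byte size and record count per namespace and prints them in roughly the same form as the figures below:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record shape: (namespace number, raw page text as bytes).
# Extracting these pairs from the cur/old SQL dump is assumed, not shown.
Record = Tuple[int, bytes]


def namespace_stats(records: Iterable[Record]) -> Dict[int, Tuple[int, int]]:
    """Return {namespace: (total_bytes, record_count)}, ordered by namespace."""
    totals = defaultdict(lambda: [0, 0])
    for ns, text in records:
        totals[ns][0] += len(text)  # size in bytes, not characters
        totals[ns][1] += 1
    return {ns: (size, count) for ns, (size, count) in sorted(totals.items())}


def print_report(stats: Dict[int, Tuple[int, int]]) -> None:
    """Print one line per namespace: bytes, share of all bytes, record count."""
    grand_total = sum(size for size, _ in stats.values())
    for ns, (size, count) in stats.items():
        share = 100.0 * size / grand_total if grand_total else 0.0
        print(f"{ns}: {size} bytes = {share:.1f}% - {count} records")


if __name__ == "__main__":
    sample = [(0, b"Paris est la capitale de la France."), (0, b"Lyon"), (2, b"Test")]
    print_report(namespace_stats(sample))
```

Note that this sketch rounds the share to one decimal, so its output can differ slightly from a table that truncates.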
Here are the figures for the most recent fr: dumps (the largest dumps I can
download without errors).
Keep in mind that the real dumps are smaller because they are compressed, and
that some namespaces may compress better than others because successive
revisions of the same page tend to be similar.
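The compression effect can be illustrated with Python's zlib module, which implements DEFLATE (the algorithm behind gzip). Here it stands in for whatever compressor the real dumps use. The page text is hypothetical pseudo-random filler. The sketch compares a history of small edits to one page against the same number of unrelated pages of similar size:

```python
import random
import zlib


def compression_ratio(chunks):
    """Compressed size / raw size for the concatenated chunks (lower is better)."""
    raw = b"".join(chunks)
    return len(zlib.compress(raw, 9)) / len(raw)


def random_page(rng, n_words=600):
    """A hypothetical page made of pseudo-random lowercase 'words'."""
    letters = b"abcdefghijklmnopqrstuvwxyz"
    words = (bytes(rng.choice(letters) for _ in range(rng.randint(2, 9)))
             for _ in range(n_words))
    return b" ".join(words)


rng = random.Random(42)
page = random_page(rng)
# History with small edits: each revision is the same page plus a short change.
similar = [page + b" edit %d" % k for k in range(20)]
# History of unrelated texts of comparable size.
distinct = [random_page(rng) for _ in range(20)]

print(f"similar revisions:  {compression_ratio(similar):.2f}")
print(f"distinct revisions: {compression_ratio(distinct):.2f}")
```

DEFLATE only looks back 32 KB, so this gain depends on the revisions of a page sitting close together in the compressed stream.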
I am not sure what to conclude from this, but here are the figures anyway.
CUR table (current revision of each page)
ns  description                   bytes  % of total   records
 0  Articles                 58,346,580       71.7%    32,549
 1  Article discussions       7,487,166        9.2%     3,714
 2  User pages                1,811,361        2.2%     1,254
 3  User discussions          3,737,941        4.5%     1,379
 4  Wikipedia                 6,788,373        8.3%       641
 5  Wikipedia discussions     1,680,691        2.0%       282
 6  Image pages               1,020,852        1.2%     3,653
 7  Image discussions            48,294        0.0%        62
 8  Messages                    389,328        0.4%       600
 9  Message discussions          10,702        0.0%         8
OLD table (earlier revisions)
ns  description                   bytes  % of total   records
 0  Articles                783,459,699       51.4%   195,416
 1  Article discussions     100,863,970        6.6%    12,668
 2  User pages               19,020,815        1.2%     5,765
 3  User discussions        133,820,662        8.7%    10,385
 4  Wikipedia               455,423,770       29.8%    22,493
 5  Wikipedia discussions    30,388,381        1.9%     2,408
 6  Image pages                 558,602        0.0%     2,456
 7  Image discussions           109,211        0.0%        90
 8  Messages                     91,808        0.0%       109
 9  Message discussions          21,505        0.0%        16
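To check the percentage column, this short Python sketch recomputes each namespace's share from the byte counts in the two tables. The shares appear to be truncated rather than rounded to one decimal. For example, user discussions in the OLD table are 8.78% of the total but are listed as 8.7%, so the sketch truncates as well. It also prints the fraction of raw bytes an articles-only download would keep, which bears on Optim's suggestion. These are uncompressed sizes, so the savings after compression may differ.

```python
# Byte counts per namespace, copied from the CUR and OLD tables above.
CUR_BYTES = {
    0: 58346580, 1: 7487166, 2: 1811361, 3: 3737941, 4: 6788373,
    5: 1680691, 6: 1020852, 7: 48294, 8: 389328, 9: 10702,
}
OLD_BYTES = {
    0: 783459699, 1: 100863970, 2: 19020815, 3: 133820662, 4: 455423770,
    5: 30388381, 6: 558602, 7: 109211, 8: 91808, 9: 21505,
}


def truncated_shares(byte_counts):
    """Percent of the total per namespace, truncated (not rounded) to 0.1%."""
    total = sum(byte_counts.values())
    # Integer floor division keeps the truncation exact; /10 gives one decimal.
    return {ns: (1000 * size) // total / 10 for ns, size in byte_counts.items()}


for name, counts in (("CUR", CUR_BYTES), ("OLD", OLD_BYTES)):
    total = sum(counts.values())
    shares = truncated_shares(counts)
    print(f"{name}: {total} bytes in total")
    for ns, share in shares.items():
        print(f"  {ns}: {share:.1f}%")
    # Namespace 0 (articles) is what an articles-only download would contain.
    print(f"  articles-only download: {shares[0]:.1f}% of the raw {name} bytes")
```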
Erik Zachte