Hi, it looks like the user-store is completely full:
$ df
hemlock:/aux0/user-store 3904398336 3904398336 0 100% /mnt/user-store
It requires cleaning and sorting (especially the dumps that are all over the place), but I especially wonder what is taking up so much space? I guess I'll have to use my home quota for now.
Darkdadaah
On Wed, Dec 22, 2010 at 8:31 AM, Darkdadaah darkdadaah@yahoo.fr wrote:
Hi, it looks like the user-store is completely full:
$ df
hemlock:/aux0/user-store 3904398336 3904398336 0 100% /mnt/user-store
It requires cleaning and sorting (especially the dumps that are all over the place), but I especially wonder what is taking up so much space?
I guess I'll have to use my home quota for now.
I'm looking at this, but I have no brighter idea than just running du on it, which is taking a very long time. So if any other roots have a better idea to figure out what's going on and/or fix it, feel free to kill my du process (running as root) and delete /tmp/userstore-du on hemlock. (I'm also not quite sure what I'd do if I did figure out the culprit, since I don't want to delete users' data without their permission unless it's clearly useless.)
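Roughly, the scan looks like this (the flags here are an approximation, not necessarily the exact command that is running):

    # walk the whole volume and record the size of every directory (in KB)
    # for later inspection
    du -k /mnt/user-store > /tmp/userstore-du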
Hey,
On Wed, Dec 22, 2010 at 3:29 PM, Aryeh Gregor <Simetrical+wikilist@gmail.com> wrote:
I'm looking at this, but I have no brighter idea than just running du on it, which is taking a very long time. So if any other roots have a better idea to figure out what's going on and/or fix it, feel free to kill my du process (running as root) and delete /tmp/userstore-du on hemlock. (I'm also not quite sure what I'd do if I did figure out the culprit, since I don't want to delete users' data without their permission unless it's clearly useless.)
It looks like a huge volume, so it makes sense that it would take some time to complete.
Are quotas enabled on the volume? That might give you a quick snapshot of who the biggest user is.
You could try running the du in a subdirectory, which might give you some more ideas about big directories. (You can try: du -k | sort -rn).
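For example, on the full volume that might look like this (assuming GNU du; the depth limit and the number of lines shown are arbitrary choices):

    # summarise only the first directory level and list the largest entries first
    cd /mnt/user-store
    du -k --max-depth=1 . | sort -rn | head -20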
I'm not a root, so I can't help in the problem analysis except by writing here. :)
Gerald
On Wed, Dec 22, 2010 at 3:36 PM, Gerald A geraldablists@gmail.com wrote:
It looks like a huge volume, so it makes sense that it would take some time to complete.
Yes. It's spent almost the whole time so far in /aux0/user-store/osm_hillshading, which looks to be millions of tiny files split up over tens of thousands of directories.
Are quotas enabled on the volume? That might give you a quick snapshot of who the biggest user is.
Not as far as I can tell. quota -v on users who have lots of files there doesn't return any results.
You could try running the du in a subdirectory, which might give you some more ideas about big directories. (You can try: du -k | sort -rn).
I could, but it doesn't seem like it would be much faster than just waiting for a du on the whole thing to complete.
2010/12/22 Aryeh Gregor Simetrical+wikilist@gmail.com:
Yes. It's spent almost the whole time so far in /aux0/user-store/osm_hillshading, which looks to be millions of tiny files split up over tens of thousands of directories.
This has been like that for months and has never been a problem so far. It's also unlikely that this directory has increased in size lately, unless someone apart from me has added stuff there.
Cheers Colin
On 12/22/2010 5:13 PM, Colin Marquardt wrote:
2010/12/22 Aryeh Gregor Simetrical+wikilist@gmail.com:
Yes. It's spent almost the whole time so far in /aux0/user-store/osm_hillshading, which looks to be millions of tiny files split up over tens of thousands of directories.
Well, there's 986 Gigs in /mnt/user-store/stats
FYI, we have ordered a new array with 24 TB of space for stats, user store, etc. We hope to get it installed in January. Things will get better soon.
-- daniel
On 23.12.2010 02:33, Q wrote:
On 12/22/2010 5:13 PM, Colin Marquardt wrote:
2010/12/22 Aryeh Gregor Simetrical+wikilist@gmail.com:
Yes. It's spent almost the whole time so far in /aux0/user-store/osm_hillshading, which looks to be millions of tiny files split up over tens of thousands of directories.
Well, there's 986 Gigs in /mnt/user-store/stats
Hello,
At Friday 24 December 2010 13:43:34, Daniel Kinzler wrote:
FYI, we have ordered a new array with 24 TB of space for stats, user store, etc. We hope to get it installed in January. Things will get better soon.
but this should not stop people from looking into the user-store and removing old data which they no longer need ;-) (like the 7th dump of enwp of the same age).
Sincerely, DaB.
WOW, that is great!
BTW I have deleted some temp files of my own in /mnt/user-store (about 27 GB).
2010/12/24 Daniel Kinzler daniel@brightbyte.de
FYI, we have ordered a new array with 24 TB of space for stats, user store, etc. We hope to get it installed in January. Things will get better soon.
-- daniel
On 23.12.2010 02:33, Q wrote:
On 12/22/2010 5:13 PM, Colin Marquardt wrote:
2010/12/22 Aryeh Gregor <Simetrical+wikilist@gmail.com>:
Yes. It's spent almost the whole time so far in /aux0/user-store/osm_hillshading, which looks to be millions of tiny files split up over tens of thousands of directories.
Well, there's 986 Gigs in /mnt/user-store/stats
Please do not delete those files; they are important for stats.
2010/12/23 Q overlordq@gmail.com
On 12/22/2010 5:13 PM, Colin Marquardt wrote:
2010/12/22 Aryeh Gregor <Simetrical+wikilist@gmail.com>:
Yes. It's spent almost the whole time so far in /aux0/user-store/osm_hillshading, which looks to be millions of tiny files split up over tens of thousands of directories.
Well, there's 986 Gigs in /mnt/user-store/stats
On 12/28/2010 03:52 PM, emijrp wrote:
Well, there's 986 Gigs in /mnt/user-store/stats
Please do not delete those files; they are important for stats.
Following a suggestion made by River, I am currently recompressing all files using the newly-installed "xz" program. Based on an arbitrary sample (all files from 1 January 2011), we can expect a size reduction of about 25% -- or about 250 GB freed in total. The compression time, however, is quite long, and I haven't measured how the decompression time compares with gzip.
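The recompression itself is nothing fancy; it boils down to a loop along these lines (a simplified sketch; the exact paths and xz options I use are illustrative):

    # recompress each gzipped stats file to xz; remove the .gz only if the
    # whole pipeline succeeded (pipefail is a bash feature)
    set -o pipefail
    cd /mnt/user-store/stats
    for f in *.gz; do
        gzip -dc "$f" | xz -6 > "${f%.gz}.xz" && rm "$f"
    done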
Let me know if the xz format causes any problems; I have started with the older files, which are not used as often, but if this is OK with all users, I may in the future recompress new files as soon as they are downloaded.
Hopefully, with the new storage space, we'll be able to store these files in a more convenient way (not only raw files as they are now, but also the same information in a better format).
Frédéric
Hi Frederic, thanks for your work. Have you tested 7z?
We can compress to xz while the new disks arrive. I read that it is about 24 TB, so we can revert to gzip in the future.
2011/1/2 Frédéric Schütz schutz@mathgen.ch
On 12/28/2010 03:52 PM, emijrp wrote:
Well, there's 986 Gigs in /mnt/user-store/stats
Please do not delete those files; they are important for stats.
Following a suggestion made by River, I am currently recompressing all files using the newly-installed "xz" program. Based on an arbitrary sample (all files from 1 January 2011), we can expect a size reduction of about 25% -- or about 250 GB freed in total. The compression time, however, is quite long, and I haven't measured how the decompression time compares with gzip.
Let me know if the xz format causes any problems; I have started with the older files, which are not used as often, but if this is OK with all users, I may in the future recompress new files as soon as they are downloaded.
Hopefully, with the new storage space, we'll be able to store these files in a more convenient way (not only raw files as they are now, but also the same information in a better format).
Frédéric
emijrp:
We can compress to xz while the new disks arrive. I read that it is about 24 TB, so, we can revert to gzip in the future.
I suggested xz pretty much as an emergency fix; we don't want to delete the files, but they do take up a lot of space.
I don't mind switching back to gzip when we have more space; in fact, I already suggested this to schutz in private. However, if xz works okay (and it should, since the interface is identical to gzip) we may as well stay with it.
If it turns out to be too slow, we could consider 7z or something else (rzip, bzip2, ..., or maybe even just gzip -9).
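For anyone who wants to compare candidates quickly, something like this works on a sample file (sketch only; the file name is a placeholder, and only compressors with a gzip-like interface fit this loop, so 7z would need its own syntax):

    # time each compressor on the same sample file and compare output sizes
    f=sample-stats-file            # placeholder: pick a real file
    for cmd in "gzip -9" "bzip2 -9" "xz -6"; do
        echo "== $cmd =="
        time $cmd -c "$f" > "$f.test"
        ls -l "$f.test"
    done
    rm -f "$f.test"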
- river.
On 03.01.2011 01:29, River Tarnell wrote:
If it turns out to be too slow, we could consider 7z or something else (rzip, bzip2, ..., or maybe even just gzip -9).
i found pbzip2 to be nice. bzip2, just faster :)
-- daniel
emijrp:
We can compress to xz while the new disks arrive. I read that it is about 24 TB, so, we can revert to gzip in the future.
PS: Yes, we are installing 24TB of disks (12x 2TB disks), but this is raw space. To begin with, a 2TB disk is only 1,862 GB of real space. Assuming we configure the disks as RAID 50 with two 6-disk legs, that gives (1862*(6-1))*2 = 18,620 GB (18.2TB) usable space. We will reserve some of this for internal use (such as backups), so the total amount available to users will be less than that.
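Spelled out as shell arithmetic, for anyone who wants to plug in different disk sizes (same figures as above):

    # RAID 50 usable space: two RAID-5 legs of 6 disks, ~1862 GB usable per disk
    disks_per_leg=6; legs=2; gb_per_disk=1862
    echo $(( gb_per_disk * (disks_per_leg - 1) * legs ))    # prints 18620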
When budgeting for this upgrade, we assumed 5TB would be used for user-store. In reality, there should be a lot more than this (we originally planned to use 1TB disks); but it won't be a full 24TB.
- river.
On 03.01.2011 01:35, River Tarnell wrote:
When budgeting for this upgrade, we assumed 5TB would be used for user-store. In reality, there should be a lot more than this (we originally planned to use 1TB disks); but it won't be a full 24TB.
indeed. listen to river. sorry for throwing around numbers :P
-- daniel
emijrp wrote:
Hi Frederic, thanks for your work. Have you tested 7z?
It makes no difference to me. River suggested (and installed) xz, so I used it, but 7z would have worked too.
A quick test using my biased data for one day (but it should be representative enough):
$ du -s *
1027260  7z    1004 M, 25.27% saved
1374804  gz    1.4 G,   0% saved
1020692  xz    997 M,  25.75% saved
The difference between xz and 7z is negligible (<1%). I haven't benchmarked anything formally, but 7z was much faster on my system. It looks like this is mainly because the software can use several cores simultaneously.
We can compress to xz while the new disks arrive. I read that it is about 24 TB, so we can revert to gzip in the future.
Is there any particular reason to use gzip? When I use these files, I mostly uncompress them on the fly from Perl, and there is a module to do this with xz too (I haven't tested it, though). I am sure Python and other languages can do the same.
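From the shell, the on-the-fly equivalent is just a pipe (the file and script names below are placeholders):

    # decompress on the fly and feed the text to whatever processes it;
    # the same pattern works for .gz files with "gzip -dc"
    xz -dc /mnt/user-store/stats/some-stats-file.xz | ./process-stats.pl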
Even if we have plenty of space, it makes sense to use xz (or another format that offers good compression) and to benefit from the size reduction, for example if/when these files are backed up or moved around. Also, I'd like to be able to provide the files for download for those people who want local copies [several academic groups have already requested them], and the 25% size reduction is a big bonus here too.
But as I wrote earlier, these files are mostly archived on the toolserver, and I assume that most users don't often dig through the older ones, so the stronger compression should not be a problem.
A better file format (e.g. one file per day, with separate data for each of the 24 hours, and another file with data aggregated per day) is probably what is most needed for "real" uses -- as far as I know, this is how Erik Zachte handles this data. A database would be best, of course, but that requires much more work...
As always, comments are very welcome.
Frédéric
Frederic Schutz wrote:
emijrp wrote:
Hi Frederic, thanks for your work. Have you tested 7z?
It makes no difference to me. River suggested (and installed) xz, so I used it, but 7z would have worked too.
A quick test using my biased data for one day (but it should be representative enough):
$ du -s *
1027260  7z    1004 M, 25.27% saved
1374804  gz    1.4 G,   0% saved
1020692  xz    997 M,  25.75% saved
The difference between xz and 7z is negligible (<1%).
xz has a much saner syntax.
Aryeh Gregor:
I'm looking at this, but I have no brighter idea than just running du on it, which is taking a very long time.
I wrote a tool called summdisk for this, which produces reports on per-user disk usage:
http://lists.wikimedia.org/pipermail/toolserver-announce/2010-September/000343.html
Unfortunately it still takes a very long time to run. According to df, there are 136,110,408 inodes used on the volume[0]; I wonder if people who currently create large numbers of small files could save some accounting space by aggregating them into larger blocks, like OSM's meta-tiles.
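As a sketch of what I mean (illustrative only; the path comes from earlier in this thread, and nothing should be bundled or removed without the owner's agreement):

    # bundle each tile subdirectory into a single tar archive to reclaim inodes;
    # remove the originals only after the archive has been written successfully
    cd /mnt/user-store/osm_hillshading
    for d in */; do
        tar -cf "${d%/}.tar" "$d" && rm -r "$d"
    done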
- river.
[0] of which ~127m or 93% are used by a single user