Hello,
today we ran a little short of disc space on the /home partition. I searched for files larger than 500MB, hoping to find some old log files that I could delete before telling you to clean up. What I found was upsetting: the biggest log file was 74GB(!) and several others were dozens of GB each. I deleted them all (a list can be found at [1]).
Guys, what is so hard about checking from time to time how big a logfile is and truncating it? Do I really have to speed up the re-installation of the quota system so that you all have 256MB by default and angry mails are sent if you use more?
So please: Use the weekend to log into your toolserver account, check how much disc space you use (use "du -hs your(sub)directoryhere" for that) and see if you can do some clean-up. If everything is ok and you still use 5GB of disc space: no problem, if you need it, take it.
I will contact the top10-disc-users on Monday by email.
Sincerely, DaB.
* [1] https://jira.toolserver.org/browse/MNT-1252
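For anyone who wants to run the same kind of search on their own home directory, here is a minimal sketch. It assumes a POSIX shell and find; the function name list_large_files and its arguments are made up for illustration and are not a toolserver-provided command. It lists regular files above a size threshold together with their disk usage in KiB, sorted by usage with the biggest on top.

```shell
# Hypothetical helper (not part of the toolserver tooling):
# list regular files under DIR that are larger than MIN_MIB mebibytes.
list_large_files() {
    dir=$1
    min_mib=$2
    # find's "c" suffix counts bytes, so convert MiB to bytes first.
    min_bytes=$((min_mib * 1024 * 1024))
    # du -k prints usage in KiB; sort -rn puts the biggest users on top.
    find "$dir" -type f -size +"${min_bytes}c" -exec du -k {} + | sort -rn
}
```

Something like list_large_files "$HOME" 500 would roughly correspond to the 500MB search described above, while du -hs on a directory gives its total.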
On Thu, Jun 21, 2012 at 10:02:21PM +0200, DaB. wrote:
> Hello,
> today we ran a little short of disc space on the /home partition. I searched for files larger than 500MB, hoping to find some old log files that I could delete before telling you to clean up. What I found was upsetting: the biggest log file was 74GB(!) and several others were dozens of GB each. I deleted them all (a list can be found at [1]).
> Guys, what is so hard about checking from time to time how big a logfile is and truncating it? Do I really have to speed up the re-installation of the quota system so that you all have 256MB by default and angry mails are sent if you use more?
Better: just set up log rotation. If you rotate each day and keep a week of logs (or maybe more, depending on what it is), they won't grow out of hand.
> So please: Use the weekend to log into your toolserver account, check how much disc space you use (use "du -hs your(sub)directoryhere" for that) and see if you can do some clean-up. If everything is ok and you still use 5GB of disc space: no problem, if you need it, take it.
> I will contact the top10-disc-users on Monday by email.
I had a quick look over the MMPs I have access to and found some core files; that is perhaps something you can scan for as well. I would say it is safe to remove all core files older than a month.
Regards,
Andre
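The "rotate each day and keep a week" idea can be sketched in plain shell as follows. This is only an illustration under stated assumptions: rotate_log is a hypothetical function, not an existing tool, and a real setup would more likely use a dedicated log rotation program. It copies the log and then truncates it in place, which assumes the writing process opened the file in append mode; lines written between the copy and the truncation can be lost, and the copy itself needs free space.

```shell
# Hypothetical helper: keep at most KEEP old copies of LOG (LOG.1 is the newest),
# then empty LOG itself. Meant to be run once a day, for example from cron.
rotate_log() {
    log=$1
    keep=$2
    [ -f "$log" ] || return 0
    # Drop the oldest copy, then shift LOG.(n-1) to LOG.n, oldest first.
    rm -f "$log.$keep"
    i=$keep
    while [ "$i" -gt 1 ]; do
        prev=$((i - 1))
        if [ -f "$log.$prev" ]; then
            mv "$log.$prev" "$log.$i"
        fi
        i=$prev
    done
    # Copy, then truncate in place so a process holding LOG open keeps writing to it.
    cp "$log" "$log.1" && : > "$log"
}
```

Calling rotate_log "$HOME/mybot.log" 7 once a day (the path is just an example) would keep roughly a week of history.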
On 21/06/12 23:12, Andre Koopal wrote:
> I had a quick look over the MMPs I have access to and found some core files; that is perhaps something you can scan for as well. I would say it is safe to remove all core files older than a month.
> Regards,
> Andre
Why are they generated by default?
*Removes two php-cgi cores 30M each, plus 8M python core*
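A scan for old core files along the lines Andre suggests could look like the sketch below. The function name list_old_cores is hypothetical, and the assumption that core dumps are named "core" or "core.<something>" may not hold everywhere; the second pattern also matches unrelated files such as core.py, so the list should be reviewed before anything is deleted.

```shell
# Hypothetical helper: print files named "core" or "core.*" under DIR that were
# last modified more than DAYS days ago. It only lists; nothing is deleted.
list_old_cores() {
    dir=$1
    days=$2
    find "$dir" -type f \( -name core -o -name 'core.*' \) -mtime +"$days"
}
```

For the "older than a month" rule, list_old_cores "$HOME" 30 would be a reasonable starting point.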
On 22/06/12 00:16, Platonides wrote:
> On 21/06/12 23:12, Andre Koopal wrote:
>> I had a quick look over the MMPs I have access to and found some core files; that is perhaps something you can scan for as well. I would say it is safe to remove all core files older than a month.
>> Regards,
>> Andre
> Why are they generated by default?
> *Removes two php-cgi cores 30M each, plus 8M python core*
/me removes two php-cgi cores, 12M and 160M, and is now within quota limits.
Andre Koopal wrote:
> On Thu, Jun 21, 2012 at 10:02:21PM +0200, DaB. wrote:
>> today we ran a little short of disc space on the /home partition. I searched for files larger than 500MB, hoping to find some old log files that I could delete before telling you to clean up. What I found was upsetting: the biggest log file was 74GB(!) and several others were dozens of GB each. I deleted them all (a list can be found at [1]).
>> Guys, what is so hard about checking from time to time how big a logfile is and truncating it? Do I really have to speed up the re-installation of the quota system so that you all have 256MB by default and angry mails are sent if you use more?
> Better: just set up log rotation. If you rotate each day and keep a week of logs (or maybe more, depending on what it is), they won't grow out of hand.
Did you read https://jira.toolserver.org/browse/MNT-1252? Your reply makes it seem as though you did not. Log rotation isn't needed here; a ban on the use of interwiki.py is what's needed here. Oy vey.
MZMcBride
On Fri, Jun 22, 2012 at 11:25 AM, MZMcBride z@mzmcbride.com wrote:
> Did you read https://jira.toolserver.org/browse/MNT-1252? Your reply makes it seem as though you did not. Log rotation isn't needed here; a ban on the use of interwiki.py is what's needed here. Oy vey.
Wasn't that done, apart from the multi-maintainer projects?
Hello,
sorry, I am afraid I was a bit fast with a countermeasure, but I have now set a quota for everyone. If a user is already over it, that is still no problem, but they won't be able to create new files. Please let me know if this is a show stopper for your account and I will handle it asap if your reason is sensible.
Cheers nosy
On Jun 22, 2012, at 1:06 PM, Marlen Caemmerer wrote:
> Hello,
> sorry, I am afraid I was a bit fast with a countermeasure, but I have now set a quota for everyone. If a user is already over it, that is still no problem, but they won't be able to create new files. Please let me know if this is a show stopper for your account and I will handle it asap if your reason is sensible.
> Cheers nosy
This is insane in my opinion. There surely has to be a better reason to cause service disruption for lots of hardworking volunteers, in a way that leaves almost no way to find out what's going on.
Toolserver users generally don't work on their tools every day. I just got home and after getting stuff running I hear reports that people can't log into my PHP tools. The fact that I have time to look into this right away is probably an exception compared to the average ts-user. Two of the MMPs I maintain are broken, and several IRC bots are down.
I see:
* No automatic e-mails or anything
* Nothing in the PHP errors
* No MySQL errors (since the error report I got mentioned that users couldn't log into the tool)
* No obvious thread on toolserver-l (lots of noise there anyway; maybe we need a separate toolserver-announce-l for stuff that actually matters and likely needs users to do something?)
Now, in the meantime I think I know what caused it. But just so you know, here is a short summary of how I've spent the last 3 hours trying to figure out what the hell is going on. Hopefully it will encourage the ts-admins to act more carefully, or at least to communicate better.
One of the MMPs is CVN. The IRC bots timed out earlier today, and those I had granted access to start them from a web control panel couldn't log in. It turns out that the PHP sessions were the issue. For some reason, whenever the session was modified, it was emptied and the user was treated as if there were no active session. Whenever a new session is created, it appears to work fine, until you look up the data in the next request and find it is gone.
After having checked status.toolserver.org and looking up mysql errors, php errors and then ssh-ing into my account and trying to access the database directly, it turns out everything looks fine.
I opened TS-1422[1], and worked on a test case to reproduce it in a plain .php file. Tried to upload it to /home/krinkle/public_html/tmp and everything seems to have gone fine. No errors or anything out of the ordinary.
Then when I try to access that file from the web, I get a blank 200 OK response. Looking it up over SSH shows me it is chmod 000 and 0 bytes in size. So I opened TS-1423[2].
Then I'm reading up on toolserver-l and see that the quotas are finally going to be enforced. I welcome that. DaB tells us we have the weekend to make sure we are either under the quota or have requested a bigger quota. This sounds reasonable to me.
I did not connect the problems I was hearing about from all over with the quota that was going to come after the weekend. The reason being that I did not get any emails regarding limits being reached on /home/krinkle (or the home of the MMPs) or any errors when trying to write to a file.
e.g.
$ echo "Hello World" > test.txt
.. works fine without errors. But looking it up shows it is 0 bytes in size. If this is indeed being done by the quota system, then I'd recommend getting a better quota system or configuring it differently. Allowing empty files to be created is one thing. Silently ignoring non-empty write attempts and turning them into empty files without any form of response is quite another. Obviously I'd rather have no file at all than a broken file without any indication that it is broken.
Also, $ quota -v; gives me this rather useless response:
cvn@willow:~$ quota -v cvn;
Disk quotas for cvn (uid 8153):
Filesystem usage quota limit timeleft files quota limit timeleft
Looks like something is missing there?
Connect that to DaB's mail, and I'd say this means the quota will come but is not yet activated. So I spent another hour trying to find the "real" cause (which, obviously, I didn't find, since it is indeed caused by the quota). I also tried to temporarily disable a few things, only to find out that the files I modified are gone:
For example:
* /home/krinkle/public_html/wikimedia-svn-search/header.php - 0 bytes
* /home/krinkle/public_html/tmp/session-test.php - 0 bytes
And then I see your message that (although it did not appear so) the quota has indeed been enabled for everyone now. Why? Now I can't even try to clean up, because I can't even edit a big file and replace it with "Temporarily disabled". I can't remove 100MB to add a small .ini file. I can't comment out things that are breaking stuff. My account is completely locked and anything I try to touch is immediately wiped. Errors and warnings are absent.
On IRC it was pointed out that logging in would tell me if the limit is reached. Looking again, it does say "Block limit reached on /home". But since it makes no mention of "quota" and no mention of "/home/krinkle", I didn't notice it. It also wasn't placed in a particularly attention-grabbing way, just at the bottom of the welcome screen.
And then there is the fact that that is only for personal accounts. For MMPs there is no welcome screen. So for MMPs this information is not expressed in any place I know of.
So, afraid of touching anything else, I'll log out, and wait for things to be fixed on your end. So I can then fix things on my end.
Thanks, -- Krinkle
[1] https://jira.toolserver.org/browse/TS-1422
[2] https://jira.toolserver.org/browse/TS-1423
On Jun 21, 2012, at 10:02 PM, DaB. wrote:
> So please: Use the weekend to log into your toolserver-account, check how much disc-space your use (use "du -hs your(sub)directoryhere" for that) and look if you can do some clean-up. If everything is ok and you still use 5GB of disc- space: no problem, if you need it, take it.
As for the quota -v query, I tested it shortly after my Toolserver account was created, and it has behaved the same way ever since: exactly as Krinkle described. So this is not a new problem, though it does need to be fixed.
________________________________
From: Krinkle krinklemail@gmail.com
To: Toolserver-l toolserver-l@lists.wikimedia.org
Sent: Friday, June 22, 2012 8:04 PM
Subject: Re: [Toolserver-l] Dude, where is my logfile?
_______________________________________________ Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/toolserver-l Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette
On 23 June 2012 at 03:41, Hazard-SJ hazard_sj@yahoo.com wrote:
> As for the quota -v query, I tested it shortly after my Toolserver account was created, and it has behaved the same way ever since: exactly as Krinkle described. So this is not a new problem, though it does need to be fixed.
Right, but actually quota -v seems to work fine for users (at least it works fine for me) but not for MMPs (I tested it on interwikibot). So I thought quotas were only activated for *users*, but judging by Krinkle's problems that does not seem to be the case...
Regards, Toto Azéro
On Jun 23, 2012, at 8:41 AM, Toto Azéro wrote:
> On 23 June 2012 at 03:41, Hazard-SJ hazard_sj@yahoo.com wrote:
>> As for the quota -v query, I tested it shortly after my Toolserver account was created, and it has behaved the same way ever since: exactly as Krinkle described. So this is not a new problem, though it does need to be fixed.
> Right, but actually quota -v seems to work fine for users (at least it works fine for me) but not for MMPs (I tested it on interwikibot). So I thought quotas were only activated for *users*, but judging by Krinkle's problems that does not seem to be the case...
> Regards, Toto Azéro
For my personal account the quota command is working as expected.
In the meantime I've discovered that /mnt/user-store is a lot easier to use than I thought. Yesterday I moved almost all of the 3.2 GB over there and fixed paths to restore immediate service. Since then I've cleaned everything up and I'm down to ~400M in the /home directories of my account and the MMPs, and about 1-2 GB in /mnt/user-store.
Thanks :)
-- Krinkle
Hi Krinkle,
On 24-6-2012 2:17, Krinkle wrote:
> In the meantime I've discovered that /mnt/user-store is a lot easier to use than I thought.
Don't forget that there are no backups of /mnt/user-store so if something gets deleted or the array dies, you won't be able to restore anything.
Maarten
So anyone can create folders in /mnt/user-store, right?
Hazard-SJ
________________________________
From: Maarten Dammers maarten@mdammers.nl
To: toolserver-l@lists.wikimedia.org
Sent: Sunday, June 24, 2012 7:06 AM
Subject: Re: [Toolserver-l] Dude, where is my logfile?
Platonides wrote:
> On 25/06/12 20:44, Hazard-SJ wrote:
>> So anyone can create folders in /mnt/user-store right?
>> Hazard-SJ
> Yes.
More info: https://wiki.toolserver.org/view/user-store. The page could use some love.
MZMcBride
P.S. Why is toolserver-l now bizarrely named "A list for the Toolserver run by WM-DE"?
Thanks, and good question, MZMcBride.
Hazard-SJ
________________________________
From: MZMcBride z@mzmcbride.com
To: A list for the Toolserver run by WM-DE toolserver-l@lists.wikimedia.org
Sent: Monday, June 25, 2012 6:42 PM
Subject: Re: [Toolserver-l] Dude, where is my logfile?
Because recently folks combed through all the WMF-hosted mailing lists and added descriptions to those that did not have one. The local list admin should be able to set a better description if this one sucks.
Ariel
On Mon, 25-06-2012, at 16:49 -0700, Hazard-SJ wrote:
> Thanks, and good question, MZMcBride.
> Hazard-SJ
On Jun 26, 2012, at 12:34 AM, Ariel T. Glenn wrote:
> Because recently folks combed through all the WMF-hosted mailing lists and added descriptions to those that did not have one. The local list admin should be able to set a better description if this one sucks.
How about "Discussion among users of the Wikimedia Toolserver"?
Or shorter: "Wikimedia Toolserver discussion"?
-earwig
Hello,
At Tuesday 26 June 2012 17:55:17 DaB. wrote:
> Because recently folks combed through all the WMF-hosted mailing lists and added descriptions to those that did not have one.
And I was wondering why my auto-mail-sorting was no longer working… :-(.
> The local list admin should be able to set a better description if this one sucks.
I will request ownership of this ML too and then change the name.
Sincerely, DaB.
Apologies for top posting, on a mobile device.
The list description additions and tweaks are being tracked at http://meta.wikimedia.org/wiki/User:Thehelpfulone/Mailing_lists_cleanup.
The tweak for this and a few other lists was boldly done by Daniel. I plan to email list owners to let them know of changes beforehand so they don't start wondering who changed their settings.
No further changes should be done before I've sent those emails, which I won't be doing just yet, so feel free to comment on or simply improve my suggested new list descriptions on Meta.
Thehelpfulone
On Tuesday, June 26, 2012, DaB. wrote:
> Hello,
> At Tuesday 26 June 2012 17:55:17 DaB. wrote:
>> Because recently folks combed through all the WMF-hosted mailing lists and added descriptions to those that did not have one.
> And I was wondering why my auto-mail-sorting was no longer working… :-(.
>> The local list admin should be able to set a better description if this one sucks.
> I will request ownership of this ML too and then change the name.
> Sincerely, DaB.
-- Userpage: [[:w:de:User:DaB.]] — PGP: 2B255885
> And then I see your message that (although it did not appear so) the quota has indeed been enabled for everyone now. Why? Now I can't even try to clean up, because I can't even edit a big file and replace it with "Temporarily disabled". I can't remove 100MB to add a small .ini file. I can't comment out things that are breaking stuff. My account is completely locked and anything I try to touch is immediately wiped. Errors and warnings are absent.
I have also been bitten by this. On the one hand, it forced me to go through some archived logs that had gone wild. On the other hand, I wanted to make a three-line fix to a tool, and that was a blocker. Not even overwriting a bigger file works. Delete a file to free space? Nope, it doesn't allow you to write anything. cp over an existing file? No. dd over an existing file? No, that still uses O_TRUNC. Edit the file with nano and paste the new contents? No. dd conv=notrunc *does* work, but it's not trivial to come up with.
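For reference, the dd conv=notrunc approach mentioned above can be wrapped as in the following sketch. The function name overwrite_in_place is hypothetical. The idea, as described in this message, is that the existing file is not truncated first, so its already-allocated blocks are reused; whether that gets past a given quota setup is an assumption based on the report here, not something this sketch can guarantee.

```shell
# Hypothetical helper: write the contents of SRC over the start of the existing
# file DST without truncating it first (so DST is not opened with O_TRUNC).
overwrite_in_place() {
    src=$1
    dst=$2
    # Refuse to create new files: the point is to reuse DST's existing blocks.
    [ -f "$dst" ] || return 1
    dd if="$src" of="$dst" conv=notrunc 2>/dev/null
}
```

Note that if SRC is shorter than DST, the old tail of DST stays in place, so for scripts or config files the new content may need padding or a trailing comment marker to stay valid.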
Also, if you got the report:
NFS server ha-nfs.esi not responding still trying
NFS server ha-nfs.esi ok
Does it mean everything was processed correctly on disk?
I have a process running for 14 hours checking a tar file which took 20 minutes to create. Seems a bit odd :/
On 23/06/12 03:04, Krinkle wrote:
> I did not connect the problems I was hearing about from all over with the quota that was going to come after the weekend. The reason being that I did not get any emails regarding limits being reached on /home/krinkle (or the home of the MMPs) or any errors when trying to write to a file.
> e.g. $ echo "Hello World" > test.txt
> .. works fine without errors. But looking it up shows it is size 0 bytes.
Interesting. I first suspected echo(1) wasn't reporting the problem, but no error is reported to the application:
$ truss /usr/bin/echo "Hello World" > test.txt
(...)
write(1, " H e l l o W o r l d\n", 12) = 12
_exit(0)
OTOH, cat reports the error:
$ cat /tmp/test.txt > test.txt
cat: write error: Disc quota exceeded
The system call is failing there:
write(1, " H e l l o W o r l d\n", 12) Err#49 EDQUOT
Why such a discrepancy?
Edit: actually, it seems that the write sometimes succeeds and sometimes fails, with echo exiting with a non-zero status in the latter case (/usr/bin/echo doesn't print a message to stderr, but the bash builtin does).
write(1, " H e l l o W o r l d\n", 12) Err#49 EDQUOT
_exit(1)
I think NFS is partly at fault here for the random behavior. The logic seems to be that if the file doesn't exist, you get no error on write(), and the later close() returns Err#49 EDQUOT. But if the file already exists, the error is returned directly by write().
Unlike echo (both the binary and the builtin), cat performs a close(), which allows it to detect that condition. echo doesn't need to do such a close(), but as bash isn't reporting the error on close, it silently fails.
Workaround: replace your
echo "Hello World" > test.txt
calls with
echo "Hello World" | cat > test.txt
and have fun when someone mocks you for writing silly code :)
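The workaround above can be combined with an explicit check, as in this sketch. The name write_checked is hypothetical; the sketch relies on the behaviour described in this message, namely that cat notices the failed write or close and exits with a non-zero status, and it adds a size check for the "file silently left at 0 bytes" symptom.

```shell
# Hypothetical helper: write one line of text to DEST and fail loudly
# if the write did not go through.
write_checked() {
    dest=$1
    shift
    # The pipeline's exit status is that of cat, which reports write errors.
    if ! printf '%s\n' "$*" | cat > "$dest"; then
        echo "write_checked: writing to $dest failed" >&2
        return 1
    fi
    # Extra check for the symptom described in this thread: a file left at 0 bytes.
    if [ ! -s "$dest" ]; then
        echo "write_checked: $dest is empty after writing" >&2
        return 1
    fi
}
```

A call such as write_checked test.txt "Hello World" || exit 1 then stops a script instead of carrying on with an empty file.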
> * No obvious thread on toolserver-l (lots of noise there anyway, maybe we need a separate toolserver-announce-l for stuff that actually matters that likely need users to do something?)
You mean like https://lists.wikimedia.org/mailman/listinfo/toolserver-announce? The one where no such notification was posted, even though the original thread-starting “Dude, where is my logfile” e-mail by DaB was?
-- [[cs:User:Mormegil | Petr Kadlec]]