Hello,
Today we have disabled BigBrother in Toolforge. BigBrother was a tool that watched continuous jobs and restarted them when Grid Engine failed to. This happened in corner cases that Grid Engine did not handle well enough on its own (e.g. a job running out of memory). BigBrother continuously monitored those jobs, duplicating Grid Engine's restart functionality in a layer above it.
Although very few tools used BigBrother (0.65%, to be precise), it constantly taxed our NFS file server, so keeping it around didn't make much sense. Additionally, its functionality can easily be implemented with a shell script run from cron.
So we've converted every tool that had a .bigbrotherrc file to use a bigbrother.sh script, which cron runs every 5 minutes to restart jobs. If your tool used BigBrother, please check your crontab (`crontab -l`); you should see one or more entries like this:
```
# Ensure continuous jobs are running
*/5 * * * * jlocal /data/project/tool_name/bigbrother.sh job_name job_script
```
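If you're curious what a cron-driven watchdog like this does, here is a minimal sketch of the idea. It is NOT the bigbrother.sh installed for your tool, whose contents may differ. The file name watchdog.sh, the function ensure_running and the QSTAT/JSTART overrides are hypothetical. The sketch assumes that Grid Engine's `qstat -j NAME` exits non-zero when no job by that name exists, and that Toolforge's `jstart -N NAME SCRIPT` starts a continuous job.

```shell
#!/bin/bash
# Hypothetical watchdog sketch; NOT the actual bigbrother.sh.
# Usage: watchdog.sh JOB_NAME JOB_SCRIPT

# Restart JOB_NAME with JOB_SCRIPT if Grid Engine has no job by that name.
ensure_running() {
    local job_name="$1" job_script="$2"
    # Assumption: the commands can be overridden via QSTAT/JSTART (e.g. for testing).
    local qstat_cmd="${QSTAT:-qstat}" jstart_cmd="${JSTART:-jstart}"

    # `qstat -j NAME` exits non-zero when no job with that name exists.
    if "$qstat_cmd" -j "$job_name" >/dev/null 2>&1; then
        echo "$job_name: running"
    else
        echo "$job_name: not running, restarting"
        "$jstart_cmd" -N "$job_name" "$job_script"
    fi
}

# Only act when given exactly two arguments, so the file can also be sourced.
if [[ $# -eq 2 ]]; then
    ensure_running "$1" "$2"
fi
```

Because each run checks before starting, running it every 5 minutes is harmless: a job that is already up is left alone. A crontab entry for this sketch would look like `*/5 * * * * jlocal /data/project/tool_name/watchdog.sh job_name job_script`.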
Documentation has also been updated to reflect this change: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Grid#Bigbrother_(Deprecat...)
Everything worked fine in our tests, but please let us know if your tool is affected by this change.
Regards,
cloud-announce@lists.wikimedia.org