Time to time, we receive a strange warning message in fenari:/home/wikipedia/log/syslog/apache.log
Oct 3 01:01:03 10.0.11.59 apache2[20535]: PHP Warning: * require() [<a href='function.require'>function.require</a>]: GC cache entry '/usr/local/apache/common-local/wmf-config/ExtensionMessages-1.20wmf12.php' (dev=2049 ino=10248005) was on gc-list for 601 seconds in /usr/local/apache/common-local/php-1.20wmf12/includes/AutoLoader.php* on line 1150
Definitely this issue comes from *APC*, source code from package apc-3.1.6-r1. When item is inserted into user cache or file cache, this function is called.
static void process_pending_removals(apc_cache_t* cache TSRMLS_DC) { slot_t** slot; time_t now;
/* This function scans the list of removed cache entries and deletes any * entry whose reference count is zero (indicating that it is no longer * being executed) or that has been on the pending list for more than * cache->gc_ttl seconds (we issue a warning in the latter case). */
if (!cache->header->deleted_list) return;
slot = &cache->header->deleted_list; now = time(0);
while (*slot != NULL) { int gc_sec = cache->gc_ttl ? (now - (*slot)->deletion_time) : 0;
if ((*slot)->value->ref_count <= 0 || gc_sec > cache->gc_ttl) { slot_t* dead = *slot;
if (dead->value->ref_count > 0) { switch(dead->value->type) { case APC_CACHE_ENTRY_FILE: apc_warning("GC cache entry '%s' (dev=%d ino=%d) was on gc-list for %d seconds" TSRMLS_CC, dead->value->data.file.filename, dead->key.data.file.device, dead->key.data.file.inode, gc_sec); break; case APC_CACHE_ENTRY_USER: apc_warning("GC cache entry '%s'was on gc-list for %d seconds" TSRMLS_CC, dead->value->data.user.info, gc_sec); break; } } *slot = dead->next; free_slot(dead TSRMLS_CC); } else { slot = &(*slot)->next; } } }
From APC configuration (
http://us.php.net/manual/en/apc.configuration.php#ini.apc.gc-ttl )
*apc.gc_ttl integer*
The number of seconds that a cache entry may remain on the garbage-collection list. This value provides a fail-safe in the event that a server process dies while executing a cached source file; if that source file is modified, the memory allocated for the old version will not be reclaimed until this TTL reached. Set to zero to disable this feature.
We get messages "GC cache entry '%s' (dev=%d ino=%d) was on gc-list for %d seconds" or "GC cache entry '%s'was on gc-list for %d seconds" in this condition:
(gc_sec > cache->gc_ttl) && (dead->value->ref_count > 0)
First condition means, item was deleted later then apc.gc_ttl seconds ago and its still in garbage collector list. Seconds condition means, item is still referenced.
e.g., when a process unexpectedly died, reference is not decreased. First apc.ttl seconds is active in APC cache, then is deleted (there isn't next hit on this item). Now item is on garbage collector list (GC) and apc.gc_ttl timeout is running. When apc.gc_ttl is less then (now - item_deletion_time), warning is written and item is finally completely flushed.
So what should we do?
I think the latter case is a likely candidate, as we see a couple hundred apache worker segfaults daily, in both php5 and libxml2 space. The first case is likely a bug in php core and it's worth checking whether we'd see the same behavior running a current php release. Core analysis may help us determine if a reproduceable state always leads to the crash. Similarly with libxml2. I suppose we'd have to patch apc with additional logging to really know for sure that this is the cause.
From your understanding of the apc source, when would such items ever
actually be freed? Only on apache restart?
On Tuesday, October 2, 2012, Patrick Reilly wrote:
Time to time, we receive a strange warning message in fenari:/home/wikipedia/log/syslog/apache.log
Oct 3 01:01:03 10.0.11.59 apache2[20535]: PHP Warning: * require() [<a href='function.require'>function.require</a>]: GC cache entry '/usr/local/apache/common-local/wmf-config/ExtensionMessages-1.20wmf12.php' (dev=2049 ino=10248005) was on gc-list for 601 seconds in /usr/local/apache/common-local/php-1.20wmf12/includes/AutoLoader.php* on line 1150
Definitely this issue comes from *APC*, source code from package apc-3.1.6-r1. When item is inserted into user cache or file cache, this function is called.
static void process_pending_removals(apc_cache_t* cache TSRMLS_DC) { slot_t** slot; time_t now;
/* This function scans the list of removed cache entries and deletes any
- entry whose reference count is zero (indicating that it is no longer
- being executed) or that has been on the pending list for more than
- cache->gc_ttl seconds (we issue a warning in the latter case).
*/
if (!cache->header->deleted_list) return;
slot = &cache->header->deleted_list; now = time(0);
while (*slot != NULL) { int gc_sec = cache->gc_ttl ? (now - (*slot)->deletion_time) : 0;
if ((*slot)->value->ref_count <= 0 || gc_sec > cache->gc_ttl) { slot_t* dead = *slot; if (dead->value->ref_count > 0) { switch(dead->value->type) { case APC_CACHE_ENTRY_FILE: apc_warning("GC cache entry '%s' (dev=%d ino=%d)
was on gc-list for %d seconds" TSRMLS_CC, dead->value->data.file.filename, dead->key.data.file.device, dead->key.data.file.inode, gc_sec); break; case APC_CACHE_ENTRY_USER: apc_warning("GC cache entry '%s'was on gc-list for %d seconds" TSRMLS_CC, dead->value->data.user.info, gc_sec); break; } } *slot = dead->next; free_slot(dead TSRMLS_CC); } else { slot = &(*slot)->next; } } }
From APC configuration ( http://us.php.net/manual/en/apc.configuration.php#ini.apc.gc-ttl )
*apc.gc_ttl integer*
The number of seconds that a cache entry may remain on the garbage-collection list. This value provides a fail-safe in the event that a server process dies while executing a cached source file; if that source file is modified, the memory allocated for the old version will not be reclaimed until this TTL reached. Set to zero to disable this feature.
We get messages "GC cache entry '%s' (dev=%d ino=%d) was on gc-list for %d seconds" or "GC cache entry '%s'was on gc-list for %d seconds" in this condition:
(gc_sec > cache->gc_ttl) && (dead->value->ref_count > 0)
First condition means, item was deleted later then apc.gc_ttl seconds ago and its still in garbage collector list. Seconds condition means, item is still referenced.
e.g., when a process unexpectedly died, reference is not decreased. First apc.ttl seconds is active in APC cache, then is deleted (there isn't next hit on this item). Now item is on garbage collector list (GC) and apc.gc_ttl timeout is running. When apc.gc_ttl is less then (now - item_deletion_time), warning is written and item is finally completely flushed.
So what should we do? _______________________________________________ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org javascript:; https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On 03/10/12 11:25, Patrick Reilly wrote:
e.g., when a process unexpectedly died, reference is not decreased. First apc.ttl seconds is active in APC cache, then is deleted (there isn't next hit on this item). Now item is on garbage collector list (GC) and apc.gc_ttl timeout is running. When apc.gc_ttl is less then (now - item_deletion_time), warning is written and item is finally completely flushed.
So what should we do?
That's the point of the apc.gc_ttl feature. The shared memory pool was being exhausted because of reference leaks, so we set apc.gc_ttl to release the leaked objects. We set it to a value much higher than the request timeout, so it's unlikely to release an object which is in use.
I don't think the warning is an issue which needs to be fixed. Maybe the segfaults need to be fixed, if segfaults are causing the leaks.
-- Tim Starling
wikitech-l@lists.wikimedia.org