I can already see some of this by greping wfReadOnly() on /includes. This function simply reads a file on the disk, so it is not cached, nor per-transaction. Therefore it can change in the short time one operation of an a larger set of operations was called, and the next. This can cause broken "transactions" on occasion.
I suspect this is what caused the logging problems. It only occurred during heavy traffic or when someone was rapidly doing something (I got a report from someone using a commons mass deletion script) and somewhat randomly (only a portion of the time). This is consistent with the wfReadOnly() check that used to be in LogPage.php.
Voice of All wrote:
I can already see some of this by greping wfReadOnly() on /includes. This function simply reads a file on the disk, so it is not cached, nor per-transaction. Therefore it can change in the short time one operation of an a larger set of operations was called, and the next. This can cause broken "transactions" on occasion.
I suspect this is what caused the logging problems. It only occurred during heavy traffic or when someone was rapidly doing something (I got a report from someone using a commons mass deletion script) and somewhat randomly (only a portion of the time). This is consistent with the wfReadOnly() check that used to be in LogPage.php.
The contents of the read-only file is cached in $wgReadOnly. If the file is blank or missing, $wgReadOnly is set to false, suppressing further reads. So it doesn't change.
However, it can change in the load balancer when the reader connection is opened, and with the new LBFactory system, it's possible that that may happen more than once. So that could indeed cause such a problem, but only if a foreign database is accessed between the main operation and the logging operation.
Note that any segfault, deadlock or PHP fatal error which occurs after the main transaction is committed may cause an unlogged operation. Any of these modes of termination lead to all database connections being immediately closed, and thus all open transactions being rolled back. So I don't think it's necessary to look to such exotic causes. For instance, we're logging a lot of memory exhausted errors at the moment, it's possible that some of them are causing half-executed write operations.
-- Tim Starling
I see. OK, so now this is driving me crazy again. What is it about logging that makes it have all these problems, and nothing else? Does anyone have any ideas? Deadlocks were the first think I was thinking about, but then I wondered why no one reported getting DB error messages.
When I had the affectedRows() > 0 check in LogPage.php, an admin reported getting the error I had it say whenever it was not met (this was for blocking) a few times. This is odd because it seems like the main query function would just return false and throw a DB error if that was the case. I with I had it logging the error code for that.
-Aaron Schulz
To: wikitech-l@lists.wikimedia.org From: tstarling@wikimedia.org Date: Mon, 28 Apr 2008 10:55:32 +1000 Subject: Re: [Wikitech-l] Use of wfReadOnly() in non-entry point functions
Voice of All wrote:
I can already see some of this by greping wfReadOnly() on /includes. This function simply reads a file on the disk, so it is not cached, nor per-transaction. Therefore it can change in the short time one operation of an a larger set of operations was called, and the next. This can cause broken "transactions" on occasion.
I suspect this is what caused the logging problems. It only occurred during heavy traffic or when someone was rapidly doing something (I got a report from someone using a commons mass deletion script) and somewhat randomly (only a portion of the time). This is consistent with the wfReadOnly() check that used to be in LogPage.php.
The contents of the read-only file is cached in $wgReadOnly. If the file is blank or missing, $wgReadOnly is set to false, suppressing further reads. So it doesn't change.
However, it can change in the load balancer when the reader connection is opened, and with the new LBFactory system, it's possible that that may happen more than once. So that could indeed cause such a problem, but only if a foreign database is accessed between the main operation and the logging operation.
Note that any segfault, deadlock or PHP fatal error which occurs after the main transaction is committed may cause an unlogged operation. Any of these modes of termination lead to all database connections being immediately closed, and thus all open transactions being rolled back. So I don't think it's necessary to look to such exotic causes. For instance, we're logging a lot of memory exhausted errors at the moment, it's possible that some of them are causing half-executed write operations.
-- Tim Starling
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
_________________________________________________________________ In a rush? Get real-time answers with Windows Live Messenger. http://www.windowslive.com/messenger/overview.html?ocid=TXT_TAGLM_WL_Refresh...
Aaron Schulz wrote:
I see. OK, so now this is driving me crazy again. What is it about logging that makes it have all these problems, and nothing else? Does anyone have any ideas? Deadlocks were the first think I was thinking about, but then I wondered why no one reported getting DB error messages.
Well, I was thinking of a deadlock in the PHP process, we've observed them from time to time. The process typically persists for days, until apache is manually restarted. The database connections time out and close eventually. The user will also see a timeout. This can happen when malloc() is called from a signal handler, a problem which the designers of apache were apparently unaware of.
When I had the affectedRows() > 0 check in LogPage.php, an admin reported getting the error I had it say whenever it was not met (this was for blocking) a few times. This is odd because it seems like the main query function would just return false and throw a DB error if that was the case. I with I had it logging the error code for that.
Yes, that's odd. And more consistent with what the admins are telling us, about the lack of timeouts or other error messages when logging fails.
-- Tim Starling
On Sun, Apr 27, 2008 at 8:55 PM, Tim Starling tstarling@wikimedia.org wrote:
Note that any segfault, deadlock or PHP fatal error which occurs after the main transaction is committed may cause an unlogged operation.
When I looked, logging actually occurred in the same transaction as the activity itself, IIRC. If this isn't the case, shouldn't it be?
wikitech-l@lists.wikimedia.org