Hi all!
Since https://gerrit.wikimedia.org/r/#/c/21584/ got merged, people have been complaining that they get tons of warnings. A great number of them seem to be caused by the fact that MediaWiki will, if the DBO_TRX flag is set, automatically start a transaction on the first call to Database::query().
See e.g. https://bugzilla.wikimedia.org/show_bug.cgi?id=40378
The DBO_TRX flag appears to be set by default in SAPI (mod_php) mode. According to the (very limited) documentation, it's intended to wrap the entire web request in a single database transaction.
However, since we do not have support for nested transactions, this doesn't work: the "wrapping" transaction gets implicitly committed when begin() is called to start a "proper" transaction, which is often the case when saving new revisions, etc.
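Schematically (table names and rows just for illustration):

    // DBO_TRX is set: the first query implicitly opens a transaction
    $dbw->select( 'page', '*', array( 'page_id' => $id ), __METHOD__ );
    $dbw->insert( 'logging', $logRow, __METHOD__ );  // inside the wrapping transaction
    $dbw->begin( __METHOD__ );    // implicitly COMMITs everything above!
    $dbw->insert( 'revision', $revRow, __METHOD__ );
    $dbw->commit( __METHOD__ );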
So, DBO_TRX seems to be misguided, or at least broken, to me. Can someone please explain why it was introduced? It seems the current situation is this:
* every view-only request is wrapped in a transaction, for no good reason I can see.
* any write operation that uses an explicit transaction, like page editing, watching pages, etc., will break the wrapping transaction (and cause a warning in the process). As far as I understand, this really defeats the purpose of the automatic wrapping transaction.
So, how do we solve this? We could:
* suppress warnings if the DBO_TRX flag is set. That would prevent the logs from being swamped by transaction warnings, but it would not fix the current broken (?!) behavior.
* get rid of DBO_TRX (or at least not use it per default). This seems to be the Right Thing to me, but I suppose there is some point to the automatic transactions that I am missing.
* Implement support for nested transactions, either using a counter (this would at least make DBO_TRX work as I guess it was intended) or using savepoints (that would give us support for actual nested transactions). That would be the Real Solution, IMHO.
So, can someone shed light on what DBO_TRX is intended to do, and how it is supposed to work?
-- daniel
On 25/09/12 19:33, Daniel Kinzler wrote:
So, can someone shed light on what DBO_TRX is intended to do, and how it is supposed to work?
Maybe you should have asked that before you broke it with I8c0426e1.
DBO_TRX provides the following benefits:
* It provides improved consistency of write operations for code which is not transaction-aware, for example rollback-on-error.
* It provides a snapshot for consistent reads, which improves application correctness when concurrent writes are occurring.
DBO_TRX was introduced when we switched over to InnoDB, along with the introduction of Database::begin() and Database::commit().
begin() and commit() were never meant to be "matched", so it's not surprising that you would get a lot of warnings if you started trying to enforce that.
Initially, I set up a scheme where transactions were "nested", in the sense that begin() incremented the transaction level and commit() decremented it. When it was decremented to zero, an actual COMMIT was issued. So you would have a call sequence like:
* begin() -- sends BEGIN
* begin() -- does nothing
* commit() -- does nothing
* commit() -- sends COMMIT
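In code, roughly (simplified, not the exact historical source):

    function begin() {
        if ( $this->mTrxLevel == 0 ) {
            $this->query( 'BEGIN' );
        }
        $this->mTrxLevel++;
    }

    function commit() {
        $this->mTrxLevel--;
        if ( $this->mTrxLevel == 0 ) {
            $this->query( 'COMMIT' );
        }
    }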
This scheme soon proved to be inappropriate, since it turned out that the most important thing for performance and correctness is for an application to be able to commit the current transaction after some particular query has completed. Database::immediateCommit() was introduced to support this use case -- its function was to immediately reduce the transaction level to zero and commit the underlying transaction.
When it became obvious that every Database::commit() call should really be Database::immediateCommit(), I changed the semantics, effectively renaming Database::immediateCommit() to Database::commit(). I removed the idea of nested transactions in favour of a model of cooperative transaction length management:
* Database::begin() became effectively a no-op for web requests and was sometimes omitted for brevity.
* Database::commit() should be called after completion of a sequence of write operations where atomicity is desired, or at the earliest opportunity when contended locks are held.
In cases where transactions end up being too short due to the need for a called function to commit a transaction when the caller already has a transaction open, it is the responsibility of the callers to introduce some suitable abstraction for serializing the transactions.
When transactions are too long, you hit performance problems due to lock contention. When transactions are too short, you hit consistency problems when requests fail. The scheme I introduced favours performance over consistency. It resolves conflicts between callers and callees by using the shortest transaction time. I think it was an appropriate choice for Wikipedia, both then and now, and I think it is probably appropriate for many other medium to high traffic wikis.
Savepoints were not available at the time the scheme was introduced. But they are a refinement of the abandoned transaction nesting scheme, not a refinement of the current scheme which is optimised for reducing lock contention.
In terms of performance, perhaps it would be feasible to use short transactions with an explicit begin() with savepoints for nesting. But then you would lose the consistency benefits of DBO_TRX that I mentioned at the start of this post.
-- Tim Starling
On 25/09/12 13:38, Tim Starling wrote:
On 25/09/12 19:33, Daniel Kinzler wrote:
So, can someone shed light on what DBO_TRX is intended to do, and how it is supposed to work?
Maybe you should have asked that before you broke it with I8c0426e1.
He did ask about the whole database transaction issue on Aug 23 (the "Nested database transactions" thread), and explicitly asked for objections the next day. I8c0426e1 is from Aug 27.
Nobody said that «begin() and commit() were never meant to be "matched"» on that thread, the previous one (2010), or even in mails from the last few years, AFAIK. I guess that's because you were the only one who knew how they were meant to be used. :)
(...)
Initially, I set up a scheme where transactions were "nested", in the sense that begin() incremented the transaction level and commit() decremented it. When it was decremented to zero, an actual COMMIT was issued. So you would have a call sequence like:
- begin() -- sends BEGIN
- begin() -- does nothing
- commit() -- does nothing
- commit() -- sends COMMIT
This scheme soon proved to be inappropriate, since it turned out that the most important thing for performance and correctness is for an application to be able to commit the current transaction after some particular query has completed.
Except when you break your caller, whose transaction you wrecked, with bad consequences such as losing the descriptions of uploaded files (bug 32551).
As for the original issue, I think that it could be solved by leaving the "transaction counter" at 0 for the implicit DBO_TRX transaction, and making begin() commit it and create a new one when called explicitly (we would need another flag... name that transaction level 0.5? :).
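Roughly (a sketch; mTrxAutomatic would be that new flag):

    function begin( $fname ) {
        if ( $this->mTrxLevel && $this->mTrxAutomatic ) {
            // only the implicit DBO_TRX transaction is open:
            // commit it and replace it with an explicit one
            $this->query( 'COMMIT', $fname );
        }
        $this->query( 'BEGIN', $fname );
        $this->mTrxAutomatic = false;
        $this->mTrxLevel = 1;
    }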
Hi Tim
Thanks for bringing some light into the DBO_TRX stuff. Seems like few knew it existed, and hardly anyone understood what it means or how it should be used.
I'll give my thoughts inline and propose a solution at the bottom.
On 25.09.2012 13:38, Tim Starling wrote:
On 25/09/12 19:33, Daniel Kinzler wrote:
So, can someone shed light on what DBO_TRX is intended to do, and how it is supposed to work?
Maybe you should have asked that before you broke it with I8c0426e1.
Well, I did ask about nested transactions on the list. Nobody mentioned the scheme you describe. Is it documented somewhere?
Anyway, I just added warnings; the behavior didn't change.
DBO_TRX provides the following benefits:
- It provides improved consistency of write operations for code which is not transaction-aware, for example rollback-on-error.
But it *breaks* write consistency for code that *is* transaction aware. Calling begin() will prematurely commit the already open transaction.
- It provides a snapshot for consistent reads, which improves application correctness when concurrent writes are occurring.
Ok.
Initially, I set up a scheme where transactions were "nested", in the sense that begin() incremented the transaction level and commit() decremented it. When it was decremented to zero, an actual COMMIT was issued. So you would have a call sequence like:
- begin() -- sends BEGIN
- begin() -- does nothing
- commit() -- does nothing
- commit() -- sends COMMIT
This scheme soon proved to be inappropriate, since it turned out that the most important thing for performance and correctness is for an application to be able to commit the current transaction after some particular query has completed. Database::immediateCommit() was introduced to support this use case -- its function was to immediately reduce the transaction level to zero and commit the underlying transaction.
Ok.
When it became obvious that every Database::commit() call should really be Database::immediateCommit(), I changed the semantics, effectively renaming Database::immediateCommit() to Database::commit(). I removed the idea of nested transactions in favour of a model of cooperative transaction length management:
- Database::begin() became effectively a no-op for web requests and was sometimes omitted for brevity.
But it isn't! It's not a no-op if there is an active transaction! It *breaks* the active transaction! I think that is the crucial point here.
- Database::commit() should be called after completion of a sequence of write operations where atomicity is desired, or at the earliest opportunity when contended locks are held.
Ok, so it's basically a savepoint.
Using that scheme, a new transaction should be started immediately after the commit(). I guess when DBO_TRX is set, query() will do that.
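Presumably query() does something like this internally (my guess at the logic, simplified):

    // in Database::query(), before the SQL is sent:
    if ( ( $this->mFlags & DBO_TRX ) && !$this->mTrxLevel ) {
        $this->begin( __METHOD__ );  // lazily (re)open the wrapping transaction
    }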
In cases where transactions end up being too short due to the need for a called function to commit a transaction when the caller already has a transaction open, it is the responsibility of the callers to introduce some suitable abstraction for serializing the transactions.
In the presence of hooks implemented by extensions, this frankly seems impossible. Also, it would require functions to document exactly if and when they are using transactions, and hooks have to document whether implementors can use transactions.
Currently, the only safe way for an extension to use transactions seems to be to check the trxLevel explicitly, and only start a transaction if there is none already in progress. Which effectively brings us back to the level-counting scheme.
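I.e. something like this pattern (trxLevel() does exist; the rest is just the pattern spelled out):

    $startedHere = false;
    if ( !$dbw->trxLevel() ) {
        $dbw->begin( __METHOD__ );
        $startedHere = true;
    }
    // ... perform the writes ...
    if ( $startedHere ) {
        $dbw->commit( __METHOD__ );
    }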
When transactions are too long, you hit performance problems due to lock contention.
Yes... which makes me wonder why it's a good idea to start a transaction upon the first select, even for requests that do not write to the database at all.
When transactions are too short, you hit consistency problems when requests fail. The scheme I introduced favours performance over consistency. It resolves conflicts between callers and callees by using the shortest transaction time. I think it was an appropriate choice for Wikipedia, both then and now, and I think it is probably appropriate for many other medium to high traffic wikis.
In terms of performance, perhaps it would be feasible to use short transactions with an explicit begin() with savepoints for nesting. But then you would lose the consistency benefits of DBO_TRX that I mentioned at the start of this post.
I'm trying to think of a way to implement this scheme without breaking transactions and causing creeping inconsistencies.
For the semantics you propose, begin() and commit() seem to be misleading names - flush() would be more descriptive of what is going on, and implies no nesting.
So, how about this:
* normally, flush() will commit any currently open DBO_TRX transaction (and the next query will start a new one). In CLI (auto-commit) mode, it would do nothing.
* begin() and end() (!) use counted levels. In auto-commit mode, the outermost level of them starts and commits a transaction.
* Between begin and end, flush does nothing.
This would allow us to use the "earliest commit" semantics, but also control blocks of DB operations that should be consistent and atomic. And it would make it explicit to programmers what they are doing.
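A rough sketch of the semantics I have in mind (mLevel would be a new counter; simplified, and details obviously up for debate):

    function begin() {
        // counted nesting; in auto-commit (CLI) mode the outermost call
        // actually starts a transaction
        if ( $this->mLevel++ == 0 && !( $this->mFlags & DBO_TRX ) ) {
            $this->query( 'BEGIN' );
        }
    }

    function end() {
        if ( --$this->mLevel == 0 && !( $this->mFlags & DBO_TRX ) ) {
            $this->query( 'COMMIT' );
        }
    }

    function flush() {
        // a no-op between begin() and end(), and in auto-commit mode
        if ( $this->mLevel == 0 && $this->mTrxLevel ) {
            $this->query( 'COMMIT' );  // the next query starts a fresh transaction
        }
    }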
-- daniel
PS: for the time being, I suggest just disabling the transaction warnings if the DBO_TRX flag is set. That lets us keep warnings about the use of nested transactions in unit tests without flooding the logs.
On 26/09/12 05:26, Daniel Kinzler wrote:
- Database::commit() should be called after completion of a sequence of write operations where atomicity is desired, or at the earliest opportunity when contended locks are held.
Ok, so it's basically a savepoint.
Savepoints don't release locks.
In cases where transactions end up being too short due to the need for a called function to commit a transaction when the caller already has a transaction open, it is the responsibility of the callers to introduce some suitable abstraction for serializing the transactions.
In the presence of hooks implemented by extensions, this frankly seems impossible. Also, it would require functions to document exactly if and when they are using transactions, and hooks have to document whether implementors can use transactions.
Presumably there is some particular case where you have encountered this problem. What is it?
When transactions are too long, you hit performance problems due to lock contention.
Yes... which makes me wonder why it's a good idea to start a transaction upon the first select, even for requests that do not write to the database at all.
Ordinary select queries do not acquire locks. They just open a snapshot.
For the semantics you propose, begin() and commit() seem to be misleading names - flush() would be more descriptive of what is going on, and implies no nesting.
begin() and commit() are named after the queries that they unconditionally issue. A BEGIN query causes an implicit COMMIT of the preceding transaction, if there is any. The Database class is redundantly performing the same operation, not implementing some novel concept.
So, how about this:
- normally, flush() will commit any currently open DBO_TRX transaction (and the next query will start a new one). In CLI (auto-commit) mode, it would do nothing.
- begin() and end() (!) use counted levels. In auto-commit mode, the outermost level of them starts and commits a transaction.
- Between begin and end, flush does nothing.
This would allow us to use the "earliest commit" semantics, but also control blocks of DB operations that should be consistent and atomic. And it would make it explicit to programmers what they are doing.
I think that to avoid confusion, begin() and commit() should continue to issue the queries they are named for.
You shouldn't use the terms "CLI" and "autocommit" interchangeably. Autocommit is enabled when DBO_TRX is off and $db->mTrxLevel is zero. There are several callers that set up this situation during a web request, and there are many CLI scripts that start transactions explicitly.
Your scheme does not appear to provide a method for hooks to release highly contended locks that they may acquire. Lock contention is usually the most important reason for calling commit(). Holding a contended lock for an excessive amount of time has often brought the site down. Imagine if someone wrapped a hook that writes to site_stats with begin/end. The code would work just fine in testing, and then instantly bring the site down when it was deployed.
-- Tim Starling
I agree that begin()/commit() should do what they say (which they do now). I'd like to have another construct that behaves like those two used to (back when there were immediate* functions). Callers would then have code like:

    $db->enterTransaction();
    // ... atomic stuff ...
    $db->exitTransaction();

This would use counters for nested begins (or perhaps SAVEPOINTs to deal with rollback better... though that can cause RTT spam easily). If using counters, it could be like begin()/finish() in https://gerrit.wikimedia.org/r/#/c/16696/. The main advantage of doing this would be that in CLI mode (which defaults to autocommit), all the code would still start transactions when needed. It would be nice to have the consistency/robustness.
In any case, echoing what Tim said, most code that has begin()/commit() does so for performance reasons. In some cases, they can be changed to use DeferredUpdates or $db->onTransactionIdle(). I had a few patches in gerrit to this effect. Some things may not actually need begin/commit explicitly (I got rid of this in some preferences code ages ago). Things like WikiPage/LocalFile are examples of classes that would have a hard time not using begin()/commit() as they do. Perhaps some code could be restructured in some cases so that the calls at least match, meaning the splitting of transactions would at least be deliberate rather than accidental.
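For instance, a non-critical write that is wrapped in begin()/commit() just to defer it can often use onTransactionIdle() instead (illustrative sketch; the site_stats update is just an example):

    // run the update once the current transaction (if any) has committed
    $dbw->onTransactionIdle( function() use ( $dbw, $update ) {
        $dbw->update( 'site_stats', $update, array( 'ss_row_id' => 1 ), __METHOD__ );
    } );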
On 26.09.2012 02:49, Aaron Schulz wrote:
Perhaps some code could be restructured in some cases so that the calls at least match, meaning the splitting of transactions would at least be deliberate rather than accidental.
I think this is the most important bit here: it should be obvious to the programmer what is going on, and it should be easy to debug when stuff gets committed and why. The behavior of the respective functions should be documented in detail for all the possible situations.
My impression is that I'm not the only developer who had the wrong mental model of what begin/commit does, or how it is intended to be used. Fixing that is the most important thing. Coming up with ways to actually allow for unbroken nested transactions would be good I think, but that comes second.
For the "being obvious" bit, better documentation is the first step. Another possibility would be to provide additional functions (even if they just act as aliases) that allow the programmer to make explicit the desired behavior.
The "debug" bit is mostly about when debug messages and warnings should be issued. I see three factors that could influence that: * the DBO_TRX flag * whether write queries have been executed on the connection (since the last commit). * whether wgDebugTransactions is enabled.
What do you think, which of these factors should be considered, and how?
-- daniel
On 26.09.2012 02:08, Tim Starling wrote:
I think that to avoid confusion, begin() and commit() should continue to issue the queries they are named for.
True. So we'd need start() and finish() or some such.
Your scheme does not appear to provide a method for hooks to release highly contended locks that they may acquire. Lock contention is usually the most important reason for calling commit(). Holding a contended lock for an excessive amount of time has often brought the site down. Imagine if someone wrapped a hook that writes to site_stats with begin/end. The code would work just fine in testing, and then instantly bring the site down when it was deployed.
I see your point. But if we have the choice between lock contention and silent data loss, which is better? While I agree that inconsistencies in secondary data like link tables are acceptable, I think we do need a way to protect critical transactions to prevent data loss.
Hm... if we use flush() for immediate commits, and start() and finish() for blocks that disable immediate commits (i.e. critical transactions), then a "silenced" flush() inside a start()/finish() block should issue a warning. That would make tests fail. Would you be OK with that?
So, my current proposal is a more expressive high-level api for transaction control consisting of start/finish/flush (and perhaps abort) on top of the low level interface consisting of begin/commit/rollback. Documentation needs to be very clear on how they behave and how they relate to each other.
For the short term, I suggest suppressing warnings about nested transactions under some conditions; see my previous response to Aaron.
-- daniel
On Wednesday, September 26, 2012, Daniel Kinzler wrote:
I see your point. But if we have the choice between lock contention and silent data loss, which is better?
This isn't really a choice - by default, when a statement in mysql hits a lock timeout, it is rolled back but the transaction it's in is not. That can also lead to data loss via partial writes in real world cases if not properly accounted for by the application.
Avoiding holding locks longer than needed really should be paramount. Developers need to adapt to cases where transaction semantics alone can't guarantee consistency across multiple write statements. We're planning on sharding some tables this year, and there will be cases where writes will have to go to multiple database servers, likely without the benefit of two-phase commit. That doesn't mean that we should give up on consistency or that we shouldn't try to do better, but not in exchange for more lock contention.
On 26.09.2012 12:06, Asher Feldman wrote:
On Wednesday, September 26, 2012, Daniel Kinzler wrote:
I see your point. But if we have the choice between lock contention and silent data loss, which is better?
This isn't really a choice - by default, when a statement in mysql hits a lock timeout, it is rolled back but the transaction it's in is not.
Uh. That sounds evil and breaks the A in ACID, no? Why isn't the entire transaction rolled back in such a case?
That can also lead to data loss via partial writes in real world cases if not properly accounted for by the application.
How could we detect such a case?
That doesn't mean that we should give up on consistency or that we shouldn't try to do better, but not in exchange for more lock contention.
Well, improving consistency and avoiding data loss is going to be hard without the use of locks... how do you propose to do that?
-- daniel
On Wed, Sep 26, 2012 at 4:07 AM, Daniel Kinzler <daniel@brightbyte.de> wrote:
On 26.09.2012 12:06, Asher Feldman wrote:
On Wednesday, September 26, 2012, Daniel Kinzler wrote:
I see your point. But if we have the choice between lock contention and silent data loss, which is better?
This isn't really a choice - by default, when a statement in mysql hits a lock timeout, it is rolled back but the transaction it's in is not.
Uh. That sounds evil and breaks the A in ACID, no? Why isn't the entire transaction rolled back in such a case?
There's a distinction (possibly misguided) between cases where a statement can be retried with an expectation of success, and cases that aren't, which trigger an implicit rollback. Deadlocks are considered the latter by mysql; they result in a transaction rollback. Oracle behaves the same way as mysql with regards to lock timeouts - it's up to developers to either retry the timed-out statement, or roll back. The results can definitely be evil if not handled correctly, but it's debatable whether it's a violation of atomicity.
If lock timeout throws an exception that closes the connection to mysql, at least that will result in a rollback. If the connection is pooled and reused, it can likely result in a commit.
MySQL does offer an "innodb_rollback_on_timeout" option that changes the default behavior. We can enable it at WMF, but since that may not be an option for many installs, it's better to work around it.
That can also lead to data loss via partial writes in real world cases if not properly accounted for by the application.
How could we detect such a case?
I can't think of a way that's actually good. Better to account for the behavior.
That doesn't mean that we should give up on consistency or that we shouldn't try to do better, but not in exchange for more lock contention.
Well, improving consistency and avoiding data loss is going to be hard without the use of locks... how do you propose to do that?
We could try to identify cases where consistency is extremely important, vs. where it isn't. In the cases where a very important lock holding transaction will be entered, can we defer calling hooks or doing anything unrelated until that transaction is closed at its intended endpoint? If so, perhaps everything else can be subject to current behavior, where unrelated code can call commit.
----- Original Message -----
From: "Asher Feldman" afeldman@wikimedia.org
If lock timeout throws an exception that closes the connection to mysql, at least that will result in a rollback. If the connection is pooled and reused, it can likely result in a commit.
I would assert that if that's true, then connection pooling is unacceptably Broken As Designed.
Cheers, -- jra
On 27/09/12 01:16, Tim Starling wrote:
On 26/09/12 19:13, Daniel Kinzler wrote:
I see your point. But if we have the choice between lock contention and silent data loss, which is better?
It's better if you don't have to make that choice. So don't call any hooks while a critical transaction is open.
-- Tim Starling
Can the "critical transactions where world will end if holded a little more" be listed? Not being able to touch things out of fear that it might explode in production (while unable to test it before) is bad.
Daniel Kinzler daniel@brightbyte.de wrote:
So, my current proposal is a more expressive high-level api for transaction control consisting of start/finish/flush (and perhaps abort) on top of the low level interface consisting of begin/commit/rollback. Documentation needs to be very clear on how they behave and how they relate to each other.
I worked on an application (in Zope) where correctness was more important than locks, and it was running on PostgreSQL, so we never had this problem. Zope collects all transactions from different sources (multiple database connections, for example) and handles them transparently (like automatic rollback on error/exception in the code). In the MediaWiki context, that would be equivalent to keeping transactions controlled at the WebRequest level. I know too little about InnoDB transactions to comment, as I understand MySQL is very different.
For the short term, I suggest to suppress warnings about nested transactions under some conditions, see my previous response to aaron.
In includes/db/DatabasePostgres.php there is a PostgresTransactionState monitor, which is very nice for debugging all problems with implicit/explicit transactions. It can easily be abstracted (to Database.php or somewhere else), and maybe there are functions to monitor InnoDB transaction status as well.
From the PostgreSQL side I see one problem with nesting - we are already using savepoints to emulate MySQL's "INSERT IGNORE" and friends. It might be difficult to abuse that feature for something more than this. There is a class "SavepointPostgres" which is used for that.
//Marcin
On Thu, Sep 27, 2012 at 08:40:13PM +0000, Marcin Cieslak wrote:
From the PostgreSQL side I see one problem with nesting - we are already using savepoints to emulate MySQL's "INSERT IGNORE" and friends. It might be difficult to abuse that feature for something more than this. There is a class "SavepointPostgres" which is used for that.
As long as the savepoints are properly nested and multiple levels of nesting don't try to reuse the same name, things should be fine. And since this use is just "SAVEPOINT", "INSERT", "RELEASE SAVEPOINT", there's no opportunity for things to not be properly nested, and avoiding name collision would not be hard.
BTW, it looks to me like the use of SavepointPostgres in "INSERT IGNORE" emulation will build up unreleased savepoints whenever an insert is ignored, because $savepoint->rollback() does not actually release the savepoint. But this shouldn't cause trouble beyond wasting resources to store all those savepoints.
Brad Jorsch b-jorsch@alum.northwestern.edu wrote:
On Thu, Sep 27, 2012 at 08:40:13PM +0000, Marcin Cieslak wrote:
From the PostgreSQL side I see one problem with nesting - we are already using savepoints to emulate MySQL's "INSERT IGNORE" and friends. It might be difficult to abuse that feature for something more than this. There is a class "SavepointPostgres" which is used for that.
As long as the savepoints are properly nested and multiple levels of nesting don't try to reuse the same name, things should be fine. And since this use is just "SAVEPOINT", "INSERT", "RELEASE SAVEPOINT", there's no opportunity for things to not be properly nested, and avoiding name collision would not be hard.
All is fine as long as something like SqlBagOStuff or the LocalisationCache or something else working in parallel does not do something to your so-called "transaction" (https://bugzilla.wikimedia.org/show_bug.cgi?id=35357 or https://bugzilla.wikimedia.org/show_bug.cgi?id=27283).
//Marcin
I have submitted two changes for review that hopefully remedy the current problems:
* I1e746322 implements better documentation, more consistent behavior, and easier tracking of implicit commits in Database::begin()
* I6ecb8faa restores the "flushing" commits that I removed a while ago under the assumption that a commit without a begin would be a no-op.
I hope this addresses any pressing issues.
I still think that we need a way to protect critical sections. But an RFC seems to be in order for that.
-- daniel
It looks like Daniel's change to log implicit commits went live on the wmf cluster with the release of 1.21wmf9.
Unfortunately, it doesn't appear to be as useful as hoped for tracking down nested callers of Database::begin, the majority of log entries just look like:
Wed Feb 13 22:07:21 UTC 2013 mw1146 dewiki DatabaseBase::begin: Transaction already in progress (from DatabaseBase::begin), performing implicit commit!
It seems like we'd need a backtrace at this point. So I think we should revisit this issue and either:
- expand the logging to make it more useful
- disable it to prevent filling the dberror log with non-actionable messages and nothing else
- revisit the ideas of either dropping the implicit commit by use of a transaction counter, or of emulating real nested transactions via savepoints.
The negative impact on concurrency due to longer lived transactions and longer held locks may negate the viability of the third option, even though it feels the most correct.
-Asher
On Wed, Sep 26, 2012 at 4:30 AM, Daniel Kinzler <daniel@brightbyte.de> wrote:
I have submitted two changes for review that hopefully remedy the current problems:
- I1e746322 implements better documentation, more consistent behavior, and easier tracking of implicit commits in Database::begin()
- I6ecb8faa restores the "flushing" commits that I removed a while ago under the assumption that a commit without a begin would be a no-op.
I hope this addresses any pressing issues.
I still think that we need a way to protect critical sections. But an RFC seems to be in order for that.
-- daniel
On 2013-02-13 6:24 PM, "Asher Feldman" afeldman@wikimedia.org wrote:
Unfortunately, it doesn't appear to be as useful as hoped for tracking down nested callers of Database::begin; the majority of log entries just look like:
Wed Feb 13 22:07:21 UTC 2013 mw1146 dewiki DatabaseBase::begin: Transaction already in progress (from DatabaseBase::begin), performing implicit commit!
For starters, why isn't wfWarn being called with a second argument of 2?
Why isn't $fname defaulting to wfGetCaller() if unspecified?
Sure, a full backtrace or other more complicated logging may be useful/needed, but as a start we know that saying "called from Database::begin" is pretty useless.
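E.g. something like this (a sketch of the suggestion, not actual core code):

    // in DatabaseBase::begin():
    function begin( $fname = false ) {
        if ( $fname === false ) {
            $fname = wfGetCaller();  // log the real caller, not begin() itself
        }
        if ( $this->mTrxLevel ) {
            // second argument: report the warning against our caller
            wfWarn( "$fname: transaction already in progress, performing implicit commit!", 2 );
        }
        // ...
    }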
Before worrying too much about logging on production, why not deal with the ones that occur regularly on testing wikis? They are extremely easy to reproduce locally and happen quite often (for example, on every purge action).
-bawolff