Nested database transactions

List overview All Threads
Download

newer

older

Re: [Wikitech-l] Flagged Reviews...

Using ORM patterns in core

Daniel Kinzler

23 Aug 2012 23 Aug '12

4:37 p.m.

Hi all

I think it would be extremely useful to allow nested database transactions - or simulate them using a counter that would only to the actual commit after commit() has been called as many times as begin() was called before.

This actually used to be the case, according to the comment on Database::trxLevel:

* Historically, transactions were allowed to be "nested". This is no * longer supported, so this function really only returns a boolean.

This means that currently, if you call begin() while a transaction is already in progress, the previous transaction is inadvertently committed, possibly causing inconsistencies (at least on MySQL).

Why was this feature removed? Not counting transaction levels is causing a world of pain for us on the Wikidata project, and I'm sure the same problem arises elsewhere. Here's the problem:

* Before saving a change using WikiPage::doEdit(), i want to perform some checks on the database, enforcing some global consistency constraints. * The check should be in the same transaction, so the DB can't change after the check but before the save. * I can't open a transaction before my check, because WikiPage::doEdit()already opens a transaction which would in turn abort mine, causing the save to be performed in a separate transaction after all. * I could try to inject my check into WikiPage::doEdit() using some hook. That may work but it cumbersome and annoying, and I have to hope that the hook is never moved outside the transaction (hooks generally don't guarantee whther they are called in a transaction or not).

Basically, any code that needs transactional logic needs to first check whether a transaction is already ongoing, open a transaction if not, remember whether it owns the transaction, and commit the transaction in the end only if it's owned locally. This essentially implements the is-in-transaction-counter based on the call stack.

Looking at the code, this kind of check is hardly ever done. So the code just *assumes* that it is, or is not, called within a transaction. This is bad.

So... why was the nice and simple trxLevel counting removed? What would break if we put it back? Is there some other magic method to do this safely and nicely?

Thanks, Daniel

PS: btw, if the non-counting behavior is to be kept, Database::begin() should really fail if a transaction is already in progress. Silently committing the previous transaction is very likely to cause creeping, low volume and hard to track down database corruption.

Show replies by date

Brion Vibber

23 Aug 23 Aug

4:49 p.m.

On Thu, Aug 23, 2012 at 1:37 PM, Daniel Kinzler daniel@brightbyte.dewrote:

...

I think it would be extremely useful to allow nested database transactions

or

simulate them using a counter that would only to the actual commit after commit() has been called as many times as begin() was called before.

This actually used to be the case, according to the comment on Database::trxLevel:
     * Historically, transactions were allowed to be "nested". This is
no * longer supported, so this function really only returns a boolean.

This means that currently, if you call begin() while a transaction is already in progress, the previous transaction is inadvertently committed, possibly causing inconsistencies (at least on MySQL).

Why was this feature removed? Not counting transaction levels is causing a world of pain for us on the Wikidata project, and I'm sure the same problem arises elsewhere.

Well, the main reason is probably that MySQL doesn't support nested transactions... trying to simulate them with a counter sounds fragile, as a single rollback would roll back the entire transaction "tree", not just the last nested one you started (or else do nothing if you just decrement the counter, also possibly dangerous if you expected the rollback to work).

-- brion

Evan Priestley

5:02 p.m.

We solve this in Phabricator by using BEGIN (depth 0) or SAVEPOINT (depth 1+) when incrementing the counter, ROLLBACK TO SAVEPOINT (depth 1+) or ROLLBACK (depth 0) when decrementing it after a failure, and nothing (depth 1) or COMMIT (depth 0) when decrementing it after a success. Our experience with transaction stacks has generally been good (no real surprises, doesn't feel magical, significantly reduces the complexity of transactional code), although we don't support anything but MySQL.

On Aug 23, 2012, at 1:49 PM, Brion Vibber wrote:

...

On Thu, Aug 23, 2012 at 1:37 PM, Daniel Kinzler daniel@brightbyte.dewrote:

...
I think it would be extremely useful to allow nested database transactions

or

simulate them using a counter that would only to the actual commit after commit() has been called as many times as begin() was called before.

This actually used to be the case, according to the comment on Database::trxLevel:
    * Historically, transactions were allowed to be "nested". This is
no * longer supported, so this function really only returns a boolean.

This means that currently, if you call begin() while a transaction is already in progress, the previous transaction is inadvertently committed, possibly causing inconsistencies (at least on MySQL).

Why was this feature removed? Not counting transaction levels is causing a world of pain for us on the Wikidata project, and I'm sure the same problem arises elsewhere.
Well, the main reason is probably that MySQL doesn't support nested transactions... trying to simulate them with a counter sounds fragile, as a single rollback would roll back the entire transaction "tree", not just the last nested one you started (or else do nothing if you just decrement the counter, also possibly dangerous if you expected the rollback to work).

-- brion _______________________________________________ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Brion Vibber

5:21 p.m.

On Thu, Aug 23, 2012 at 2:02 PM, Evan Priestley epriestley@phacility.comwrote:

...

We solve this in Phabricator by using BEGIN (depth 0) or SAVEPOINT (depth 1+) when incrementing the counter, ROLLBACK TO SAVEPOINT (depth 1+) or ROLLBACK (depth 0) when decrementing it after a failure, and nothing (depth

or COMMIT (depth 0) when decrementing it after a success. Our experience

with transaction stacks has generally been good (no real surprises, doesn't feel magical, significantly reduces the complexity of transactional code), although we don't support anything but MySQL.

Oooh, nice! Hadn't come across SAVEPOINT before.

http://dev.mysql.com/doc/refman/5.0/en/savepoint.html

-- brion

Tyler Romeo

5:24 p.m.

Also, as a matter of record, I just checked and the SAVEPOINT command (or an equivalent) is supported on SQLite, Postgresql, and mssql.

*--* *Tyler Romeo* Stevens Institute of Technology, Class of 2015 Major in Computer Science www.whizkidztech.com | tylerromeo@gmail.com

On Thu, Aug 23, 2012 at 5:21 PM, Brion Vibber brion@pobox.com wrote:

...

On Thu, Aug 23, 2012 at 2:02 PM, Evan Priestley <epriestley@phacility.com

...
wrote:

...
We solve this in Phabricator by using BEGIN (depth 0) or SAVEPOINT (depth 1+) when incrementing the counter, ROLLBACK TO SAVEPOINT (depth 1+) or ROLLBACK (depth 0) when decrementing it after a failure, and nothing

(depth

...

or COMMIT (depth 0) when decrementing it after a success. Our

experience

...
with transaction stacks has generally been good (no real surprises,

doesn't

...
feel magical, significantly reduces the complexity of transactional

code),

...
although we don't support anything but MySQL.

Oooh, nice! Hadn't come across SAVEPOINT before.

http://dev.mysql.com/doc/refman/5.0/en/savepoint.html

-- brion _______________________________________________ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Platonides

5:30 p.m.

On 23/08/12 23:21, Brion Vibber wrote:

...

On Thu, Aug 23, 2012 at 2:02 PM, Evan Priestley epriestley@phacility.comwrote:

...
We solve this in Phabricator by using BEGIN (depth 0) or SAVEPOINT (depth 1+) when incrementing the counter, ROLLBACK TO SAVEPOINT (depth 1+) or ROLLBACK (depth 0) when decrementing it after a failure, and nothing (depth

or COMMIT (depth 0) when decrementing it after a success. Our experience

with transaction stacks has generally been good (no real surprises, doesn't feel magical, significantly reduces the complexity of transactional code), although we don't support anything but MySQL.

Oooh, nice! Hadn't come across SAVEPOINT before.

http://dev.mysql.com/doc/refman/5.0/en/savepoint.html

-- brion

I proposed this same thing on the mailing list two years ago: http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/49500

Yes, I completely support changing the transactions to be nested using savepoints. They even are supported (or were) on all our db backends.

However, I was told that it "might make us hold a lot of locks for much too long". So with fear to cause magical db overload, nothing was changed. :(

Jeroen De Dauw

5:54 p.m.

Hey,

One concern I have with big transcations that have lots of stuff in them is that some code might get called in which needs to run in a transaction with a higher isolation level then the current one. For MySQLs InnoDB the default is repeatable read, so if you have code that requires serializability and gets called during that transaction, you're basically fucked. Or am I missing something?

Cheers

-- Jeroen De Dauw http://www.bn2vs.com Don't panic. Don't be evil. --

Platonides

6:24 p.m.

On 23/08/12 23:54, Jeroen De Dauw wrote:

...

Hey,

One concern I have with big transcations that have lots of stuff in them is that some code might get called in which needs to run in a transaction with a higher isolation level then the current one. For MySQLs InnoDB the default is repeatable read, so if you have code that requires serializability and gets called during that transaction, you're basically fucked. Or am I missing something?

Cheers

I don't think we have any code requiring a different transaction isolation.

Jeroen De Dauw

7:13 p.m.

I'm not saying we have any such code, I'm asking what to do when one wants to introduce such code. It's entirely feasible some new functionality requires serializable transactions, so we might want to keep that into consideration.

Sent from my Android phone. On 24 Aug 2012 00:30, "Platonides" Platonides@gmail.com wrote:

...

On 23/08/12 23:54, Jeroen De Dauw wrote:

...
Hey,

One concern I have with big transcations that have lots of stuff in them

is

...
that some code might get called in which needs to run in a transaction

with

...
a higher isolation level then the current one. For MySQLs InnoDB the default is repeatable read, so if you have code that requires serializability and gets called during that transaction, you're basically fucked. Or am I missing something?

Cheers

I don't think we have any code requiring a different transaction isolation.

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Freako F. Freakolowsky

24 Aug 24 Aug

2:51 a.m.

I was also thinking about introducing a similar functionality, not exactly nested transaction but just a way to prevent $db->commit and $db->rollback from doing enything. Akshay(my GSoC) wanted to have his extension code transaction safe, but there was just no way of doing it without tinkering with the core. So it's true in core there is (as it should be) no actual need for transaction nesting, but there are extensions that could use some tx safety.

The simplest way i can think of how this could be done is changing $db->mTrxLevel from a flag to a counter and start using savepoints (doable in Mysql, PG and Oracle, don't know for other DBs). So only the first mTrxLevel would do actual commit or blank rollback (maybe add an optional parameter to calls to override), on higher levels you would just skip on commit and rollback to savepoint on rollback call.

LP, Jure

On 08/24/2012 01:13 AM, Jeroen De Dauw wrote:

...

I'm not saying we have any such code, I'm asking what to do when one wants to introduce such code. It's entirely feasible some new functionality requires serializable transactions, so we might want to keep that into consideration.

Sent from my Android phone. On 24 Aug 2012 00:30, "Platonides" Platonides@gmail.com wrote:

...
On 23/08/12 23:54, Jeroen De Dauw wrote:

...
Hey,

One concern I have with big transcations that have lots of stuff in them

is

...
that some code might get called in which needs to run in a transaction

with

...
a higher isolation level then the current one. For MySQLs InnoDB the default is repeatable read, so if you have code that requires serializability and gets called during that transaction, you're basically fucked. Or am I missing something?

Cheers

I don't think we have any code requiring a different transaction isolation.

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Brad Jorsch

23 Aug 23 Aug

6:41 p.m.

On Thu, Aug 23, 2012 at 02:02:41PM -0700, Evan Priestley wrote:

...

We solve this in Phabricator by using BEGIN (depth 0) or SAVEPOINT (depth 1+) when incrementing the counter, ROLLBACK TO SAVEPOINT (depth 1+) or ROLLBACK (depth 0) when decrementing it after a failure, and nothing (depth 1) or COMMIT (depth 0) when decrementing it after a success. Our experience with transaction stacks has generally been good (no real surprises, doesn't feel magical, significantly reduces the complexity of transactional code), although we don't support anything but MySQL.

We do the same thing in our PostgreSQL-based app at my day job, although for commit at depth > 0 we use RELEASE SAVEPOINT rather than doing nothing. I don't think it makes much difference, though, beyond allowing for the release of resources related to the savepoint itself.

FWIW, our savepoints are simply named along the lines of "savepoint$depth". It's been working for us without issue for years.

On Thu, Aug 23, 2012 at 05:24:25PM -0400, Tyler Romeo wrote:

...

Also, as a matter of record, I just checked and the SAVEPOINT command (or an equivalent) is supported on SQLite, Postgresql, and mssql.

According to the PostgreSQL documentation (which is usually pretty good about this sort of thing), it's standard SQL. So any sufficiently-new (and sufficiently-good) SQL database should support it.

On Thu, Aug 23, 2012 at 11:30:20PM +0200, Platonides wrote:

...

However, I was told that it "might make us hold a lot of locks for much too long". So with fear to cause magical db overload, nothing was changed. :(

Although it seems to me that avoiding that problem by making people have to know whether the function they're calling is "safe" to call within a transaction or not isn't the best idea.

Aaron Schulz

9:14 p.m.

The counter idea kind of reminds of what I have in https://gerrit.wikimedia.org/r/#/c/16696/ .

I think the whole implicit commit issue is definitely pretty annoying and I wish there was a reasonable way to address it without reasonable backwards compatibility. rollback() is the hard case to deal with (I ended up not even having it in that gerrit patch).

In general callers should avoid using rollback() for detecting problems or race conditions. They should be checked up front. I put some comments about this in the tiny IDBAccessObject interface a while ago. This avoids complexity with "what if someone rollback". It also avoid mysql undo segment usage (though rollback is faster in PG).

SAVEPOINTs are useful if we really need to support people rollback transactions *and* we need nested transaction support. I think they could be made to work, but I'm not sold on their necessity for any use cases we have.

-- View this message in context: http://wikimedia.7.n6.nabble.com/Nested-database-transactions-tp4983700p4983... Sent from the Wikipedia Developers mailing list archive at Nabble.com.

Daniel Kinzler

24 Aug 24 Aug

12:36 p.m.

On 24.08.2012 03:14, Aaron Schulz wrote:

...

SAVEPOINTs are useful if we really need to support people rollback transactions *and* we need nested transaction support. I think they could be made to work, but I'm not sold on their necessity for any use cases we have.

So, how would you solve the use case I described? What I need to do is to perform some checks before calling WikiPage::doEdit, and make sure the result of the check is still valid when the actual save occurs.

I can't see a clean way to do this without supporting nested transactions in *some* way.

-- daniel

Tyler Romeo

12:55 p.m.

...

So, how would you solve the use case I described? What I need to do is to perform some checks before calling WikiPage::doEdit, and make sure the

result of

...

the check is still valid when the actual save occurs.

SAVEPOINTs are basically nested transactions. Can you describe the use case in more detail?

*--* *Tyler Romeo* Stevens Institute of Technology, Class of 2015 Major in Computer Science www.whizkidztech.com | tylerromeo@gmail.com

On Fri, Aug 24, 2012 at 12:36 PM, Daniel Kinzler daniel@brightbyte.dewrote:

...

On 24.08.2012 03:14, Aaron Schulz wrote:

...
SAVEPOINTs are useful if we really need to support people rollback transactions *and* we need nested transaction support. I think they

could be

...
made to work, but I'm not sold on their necessity for any use cases we

have.

So, how would you solve the use case I described? What I need to do is to perform some checks before calling WikiPage::doEdit, and make sure the result of the check is still valid when the actual save occurs.

I can't see a clean way to do this without supporting nested transactions in *some* way.

-- daniel

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Daniel Kinzler

1:08 p.m.

On 24.08.2012 18:55, Tyler Romeo wrote:

...

...
So, how would you solve the use case I described? What I need to do is to perform some checks before calling WikiPage::doEdit, and make sure the

result of

...
the check is still valid when the actual save occurs.

SAVEPOINTs are basically nested transactions.

Yes. I'd like to use them.

MediaWiki's current behaviour is calling begin() when a transaction is open silently commits the old transaction and starts a new one.

This SUCKS.

...

Can you describe the use case in more detail?

So, in wikidata, we have global constraints, e.g. the requirement that only one data item can have a sitelink to a given wikipedage (there's a 1:1 relationship between wikipedia pages and data items). Before saving the item page, this constraint needs to be checked, just before calling WikiPage::doEdit(). And we also want to check for edit conflicts (comparing the base revision - note that we are not using EditPage).

Anyway, wee need to do some checks before we call WikiPage::doEdit. And make sure the database doesn't change before the actual save is done. So our checks should be in the same transaction as the actual save.

But WikiPage::doEdit already opens a transaction. So we can no open a surrounding transactiopn bracket - because nested transactions are not supported.

This could be solved be the "counting" or the "safepoint" solution, the latter being somewhat nicer. But we need to st least *one* of them, as far as I can tell.

The current situation is that code always has to know whether it is safe to call some function X from inside a transaction, and conversely, any function needs to decide on whether it expects to be called from within an existing transaction, or if it should open its own.

These things can often not really be known in advance. This has caused trouble in the past (caused by transactions being committed prematurely, because another transaction started). I'm sure it will cause more pain in the future.

So I'm proposing to implement support for nested transactions, either by just counting (and, on rollback, roll back all open transactions). Or by using savepoints.

-- daniel

Aaron Schulz

28 Aug 28 Aug

12:16 a.m.

I'd have to see what you are doing to see if rollback is really needed.

-- View this message in context: http://wikimedia.7.n6.nabble.com/Nested-database-transactions-tp4983700p4984... Sent from the Wikipedia Developers mailing list archive at Nabble.com.

Daniel Kinzler

3:16 a.m.

On 28.08.2012 06:16, Aaron Schulz wrote:

...

I'd have to see what you are doing to see if rollback is really needed.

As far as I see, nested transactions are needed, but no rollback. at least not from our side.

So, for that case, a simple counter would be sufficient. I still wonder why that feature got removed from the code.

-- daniel

Freako F. Freakolowsky

9:09 a.m.

May i barge into this discussion a bit.... and please, feel free to shut me down if i'm missing a point here.

I'm failing to see why all this debate. We already have transactions, we have begin, commit and rollback in the abstraction (Database.php lines 2830+). And this works. All that needs to be done is extending this functonality without breaking the existing MO which maintains a single transaction level.

Daniel, you have the same issue as Akshay has in his extension, which is the fact that doEdit closes a transation before you complete the whole editing process. So you don't need nested transactions as such ... just a way to prevent core from commiting if your extension spans over multiple core transactions and taking over the commiting/rollingback by yourself ... basically what you need is this (not sure about PG compatibility here):

public function begin( $fname = 'DatabaseBase::begin', $maskTransaction ) { if ( $this->mTrxLevel < 2 ) { $this->query( 'BEGIN', $fname ); $this->mTrxLevel = 1; // this is here just to make sure that begin works the same way as now when on txLevel = 1 if ( $maskTransaction ) { $this->mTrxLevel++; } } else { $this->mTrxLevel++; $this->query( 'SAVEPOINT SP'.$this->mTrxLevel, $fname ); }

}

public function commit( $fname = 'DatabaseBase::commit', $doGlobally = false ) { if ( $this->mTrxLevel == 1 || $this->mTrxLevel == 2 || $doGlobally ) { $this->query( 'COMMIT', $fname ); $this->mTrxLevel = 0; } else { $this->mTrxLevel--; } }

public function rollback( $fname = 'DatabaseBase::rollback', $doGlobally = false ) { if ( $this->mTrxLevel == 1|| $this->mTrxLevel == 2 || $doGlobally ) { $this->query( 'ROLLBACK', $fname, true ); $this->mTrxLevel = 0; } elseif ( $this->txLevel > 0 ) { $this->query( 'ROLLBACK TO SAVEPOINT SP'.$this->mTrxLevel, $fname, true ); $this->mTrxLevel--; } }

This should not change anything in core, and an extension gets a way to block core's commits and restarts of an already open transaction. I've added the savepoints-foo, just to have a complete solution.

So what's so drastic about such change? Why all this debate?

On 08/28/2012 09:16 AM, Daniel Kinzler wrote:

...

On 28.08.2012 06:16, Aaron Schulz wrote:

...
I'd have to see what you are doing to see if rollback is really needed.

As far as I see, nested transactions are needed, but no rollback. at least not from our side.

So, for that case, a simple counter would be sufficient. I still wonder why that feature got removed from the code.

-- daniel

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

4503

Age (days ago)

4508

Last active (days ago)

wikitech-l@lists.wikimedia.org

17 comments

9 participants

tags (0)

participants (9)

Aaron Schulz
Brad Jorsch
Brion Vibber
Daniel Kinzler
Evan Priestley
Freako F. Freakolowsky
Jeroen De Dauw
Platonides
Tyler Romeo