transaction isolation level set to READ UNCOMMITTED

List overview All Threads
Download

newer

older

PHP weirdness

Killing long running queries

Daniel Kinzler

18 Jul 2006 18 Jul '06

5:11 p.m.

Hi all

Today, we had another replag problem, caused by Aka's user stats script. Replag kept rising even after the script was stopped (not killed). Apparently, the reason lies with the transaction mode used: a long running query may lock (parts of) the tables it uses, denying any updates the replication daemon wants to make.

As far as I know, this should not happen if the long running query uses READ UNCOMMITTED (dirty read). Using it should not cause serious problems in our context, save a bit of resources and allow replication to continue while reading from tables. Details on transaction isolation levels can be found here: http://dev.mysql.com/doc/refman/5.0/en/innodb-transaction-isolation.html

DaB. has set the global default isolation level to READ UNCOMMITTED for now. If you are experiencing problems or need data that is guarantied to be consistent under all circumstances, *and* your query does not take long (no more than, say, a minute or so), you can change the isolation level like this:

SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;

REPEATABLE READ is MySQL's "normal" default, i.e. what you have been using until now if you did not set the isolation level explicitly. There are other options, look at the documentation link above.

I hope this actually does solve the problem, and does not cause you too much trouble.

Regards -- Daniel

-- Homepage: http://brightbyte.de

Show replies by date

Andre Karwath

18 Jul 18 Jul

6:25 p.m.

Hi,

...

Today, we had another replag problem, caused by Aka's user stats script. Replag kept rising even after the script was stopped (not killed). Apparently, the reason lies with the transaction mode used: a long running query may lock (parts of) the tables it uses, denying any updates the replication daemon wants to make.

the script already starts with

"set session transaction isolation level read uncommitted"

as told by River on this list a few months ago. Beside this, the last time it run it did not affect the replication lag. So, I'm not sure that this script caused these problems.

-- Aka

Andre Karwath

6:59 p.m.

Hi again,

...

the script already starts with

"set session transaction isolation level read uncommitted"

as told by River on this list a few months ago. Beside this, the last time it run it did not affect the replication lag. So, I'm not sure that this script caused these problems.

.. and the script pauses, if the replication lag raises above a few seconds, just to make sure it gets not killed again ... During these pauses it sleeps and there are no pending database queries.

How can it affect the replication lag then? Please find out the real source of these problems and do not kill random processes.

-- Aka

Daniel Kinzler

7:21 p.m.

New subject: transaction isolation level set to READ UNCOMMITTED

...

How can it affect the replication lag then? Please find out the real source of these problems and do not kill random processes.

It was definitely the cause. it was the only thing that was started when replag started to rise, and when it was killed, replag dropped instantly (it did not drop while the script was paused). it *was* your script. I'm not sure why, but it was.

I asked DaB to save a copy of SHOW FULL PROCESSLIST next time something like this happens, so we can investigate more closely. Right now I don't have enough information to tell you what precisely happened.

-- Daniel

-- Homepage: http://brightbyte.de

Andre Karwath

7:29 p.m.

Hi,

...

...
How can it affect the replication lag then? Please find out the real source of these problems and do not kill random processes.

It was definitely the cause. it was the only thing that was started when replag started to rise, and when it was killed, replag dropped instantly (it did not drop while the script was paused). it *was* your script. I'm not sure why, but it was.

I asked DaB to save a copy of SHOW FULL PROCESSLIST next time something like this happens, so we can investigate more closely. Right now I don't have enough information to tell you what precisely happened.

I have no idea how this can happen and how a script can affect the replication lag while it does nothing than sleeping. But I will not start it again - then there are just no statistic updates anymore.

BTW, the lag raises very often even if this scripts does'nt run ..

-- Aka

Daniel Kinzler

8:58 p.m.

New subject: transaction isolation level set to READ UNCOMMITTED

...

I have no idea how this can happen and how a script can affect the replication lag while it does nothing than sleeping.

By holding a lock. I don't know if this is the case here, but it would be an explanation. Are you using explicit transactions?

...

But I will not start it again - then there are just no statistic updates anymore.

Please tell me what queries exactly the script performs. I'm pretty sure this could be done more efficiently somehow - for example, by using at the recentchanges table instead of revisions.

...

BTW, the lag raises very often even if this scripts does'nt run ..

Every time the replag rises like this, we find a script that's causing it. It's not always yours, sure, but it has been before.

Don't think we just want to blame you - your efforts are appreciated. The toolserver is just at its limits right now (we *still* don't have the fast disk online - this is really annoying). We have to cope somehow.

Regards Daniel

-- Homepage: http://brightbyte.de

Essjay

9:21 p.m.

New subject: transaction isolation level set to READ UNCOMMITTED

Daniel Kinzler wrote:

...

...
I have no idea how this can happen and how a script can affect the replication lag while it does nothing than sleeping.

By holding a lock. I don't know if this is the case here, but it would be an explanation. Are you using explicit transactions?

...
But I will not start it again - then there are just no statistic updates anymore.

Please tell me what queries exactly the script performs. I'm pretty sure this could be done more efficiently somehow - for example, by using at the recentchanges table instead of revisions.

...
BTW, the lag raises very often even if this scripts does'nt run ..

Every time the replag rises like this, we find a script that's causing it. It's not always yours, sure, but it has been before.

Don't think we just want to blame you - your efforts are appreciated. The toolserver is just at its limits right now (we *still* don't have the fast disk online - this is really annoying). We have to cope somehow.

Regards Daniel

Sorry to interrupt here, but I'm only getting one side of these emails (those from Daniel); I've checked that it's not a filtering problem on my end, so I'm a bit confused. Wondering if anyone else is having this problem?

Essjay

-- http://en.wikipedia.org/wiki/User:Essjay Wikipedia:The Free Encyclopedia http://www.wikipedia.org/

DaB.

10:09 p.m.

New subject: transaction isolation level set to READ UNCOMMITTED

Hello, Am Dienstag, den 18.07.2006, 15:21 -0400 schrieb Essjay:

...

Sorry to interrupt here, but I'm only getting one side of these emails (those from Daniel); I've checked that it's not a filtering problem on my end, so I'm a bit confused. Wondering if anyone else is having this problem?

No, no problem here.

...

Essjay

Sincerly, DaB.

-- Blog: http://www.wp-blog.de | PGP: 2D3EE2D42B255885

mdd4696＠gmail.com

10:33 p.m.

New subject: transaction isolation level set to READ UNCOMMITTED

...

Sorry to interrupt here, but I'm only getting one side of these emails (those from Daniel); I've checked that it's not a filtering problem on my end, so I'm a bit confused. Wondering if anyone else is having this problem?

Essjay

I am also only receiving one side of the conversation -- only the messages from Daniel.

~mdd4696

Platonides

10:59 p.m.

New subject: transaction isolation level set to READUNCOMMITTED

...

...
Sorry to interrupt here, but I'm only getting one side of these emails (those from Daniel); I've checked that it's not a filtering problem on my end, so I'm a bit confused. Wondering if anyone else is having this problem?

Essjay

I am also only receiving one side of the conversation -- only the messages from Daniel.

~mdd4696

Me too, only from Daniel.

Marco Schuster

19 Jul 19 Jul

3:28 p.m.

New subject: transaction isolation level set to READUNCOMMITTED

Platonides schrieb:

...

...
...
Sorry to interrupt here, but I'm only getting one side of these emails (those from Daniel); I've checked that it's not a filtering problem on my end, so I'm a bit confused. Wondering if anyone else is having this problem?

Essjay

I am also only receiving one side of the conversation -- only the messages from Daniel.

Recieving all emails. Sure you checked your spam directory (maybe a special char in topic is reason for alarming your junk filter)

Regards, Marco

Platonides

6:35 p.m.

New subject: transaction isolation level setto READUNCOMMITTED

Was it. Gmail caught as spam, 4 Aka messages (it'll show as only an item). Now i've recovered them :) Essjay also uses gmail so i guess it was his problem too.

I don't see any 'special char' or anything wrong in the emails. And they're simple plain text. :s Strange.

...

Recieving all emails. Sure you checked your spam directory (maybe a special char in topic is reason for alarming your junk filter)

Marco Schuster

7:46 p.m.

New subject: transaction isolation level setto READUNCOMMITTED

Platonides schrieb:

...

Was it. Gmail caught as spam, 4 Aka messages (it'll show as only an item). Now i've recovered them :) Essjay also uses gmail so i guess it was his problem too.

I don't see any 'special char' or anything wrong in the emails. And they're simple plain text. :s Strange.

Hm, I got (in mail topic) a black square between setto and READUNCOMMITED. It is a tabulator, maybe email clients filter it out.

Greets, Marco

Platonides

8:11 p.m.

New subject: transaction isolation level setto READUNCOMMITTED

I don't see it. But it was gmail, not my mail client.

...

Hm, I got (in mail topic) a black square between setto and READUNCOMMITED. It is a tabulator, maybe email clients filter it out.

Greets, Marco

Andre Karwath

18 Jul 18 Jul

10:07 p.m.

Hi,

...

By holding a lock. I don't know if this is the case here, but it would be an explanation. Are you using explicit transactions?

No.

...

...
But I will not start it again - then there are just no statistic updates anymore.

Please tell me what queries exactly the script performs. I'm pretty sure this could be done more efficiently somehow - for example, by using at the recentchanges table instead of revisions.

No, the recentchances does not help. For the statements, please see "~aka/Cronjobs/UpdateDEUserStat.pl". The long running function is called "GetUserCounts". Its pretty straight forward.

I already played with caches and so on, but this introduces other problems. So, I have no idea how I can still further improve the script without loosing functionality.

If you have an idea, please drop me a (private) mail, and you can write in German then ;)

Regards, Aka

Gregory Maxwell

19 Jul 19 Jul

6:50 p.m.

New subject: transaction isolation level set to READ UNCOMMITTED

On 7/18/06, Daniel Kinzler daniel@brightbyte.de wrote:

...

As far as I know, this should not happen if the long running query uses READ UNCOMMITTED (dirty read). Using it should not cause serious problems in our context, save a bit of resources and allow replication to continue while reading from tables. Details on transaction isolation levels can be found here: http://dev.mysql.com/doc/refman/5.0/en/innodb-transaction-isolation.html

DaB. has set the global default isolation level to READ UNCOMMITTED for now. If you are experiencing problems or need data that is guarantied to be consistent under all circumstances, *and* your query does not take long (no more than, say, a minute or so), you can change the isolation level like this:

I already run my queries that can run safely with dirty reads, with dirty reads. Yet DaB has still killed one of my queries, and even accused my query of impacting replag when I wasn't even running one.

The replag paranoia is out of hand...

There is no reason that a select should be blocking other reads no matter what the isolation level is.... Mysql had that sort of behavior with ISAM tables, but it should not exist with innodb..

If this is the case, can we *please* ask mysql to fix their software?

Daniel Kinzler

9:14 p.m.

New subject: transaction isolation level set to READ UNCOMMITTED

...

The replag paranoia is out of hand...

It would help to have more than one person looking at it.

...

There is no reason that a select should be blocking other reads no matter what the isolation level is.... Mysql had that sort of behavior with ISAM tables, but it should not exist with innodb..

If this is the case, can we *please* ask mysql to fix their software?

If we find the exact problem, sure. At the time, DaB's replication tools where getting transaction timeouts while trying to perform updates. So *something* was locking the tables. It would be great if anyone could tell use what except long running selects could cause this (considering no one else can write to the db).

If setting the default isolation level to dirty *really* does not help, or if it causes *serous* problems to anyone, we should undo it. OTOH, if you rely on a specific transactional behavior, you should probably request it explicitly anyway.

Regards, Daniel

PS: the subject line is so long that it got wrapped into a new header line, and a tab was inserted at the start of the wrapped part. This is compliant with the mail RFC, afaik. But apparently, some clients or MTA's choke on it.

-- Homepage: http://brightbyte.de

Gregory Maxwell

9:50 p.m.

New subject: transaction isolation level set to READ UNCOMMITTED

On 7/19/06, Daniel Kinzler daniel@brightbyte.de wrote:

...

...
The replag paranoia is out of hand...

It would help to have more than one person looking at it.

...
There is no reason that a select should be blocking other reads no matter what the isolation level is.... Mysql had that sort of behavior with ISAM tables, but it should not exist with innodb..

If this is the case, can we *please* ask mysql to fix their software?

If we find the exact problem, sure.

[snip]

We know whats happening.. the system is IO starved and has been for a long time. The system sits and thrashes on the disk while only doing <6mbytes/sec of work. There might be additional complications, but it is hard to troubleshoot something that is burried under another problem.

Weither this is a pure hardware issue (lack of IO capacity) or something that can be partially adressed through tunables or checking for FS fragmentation (have updates to page and *links fragmented the disk? I'm not sure. Does mysql support tablespaces? How does inno store updates? Does solaris behave smartly with interleaved appends to multiple files?) is not something I'm sure of..

It is clear that replication playback even with no users on the system is just able to catch up.. in other words, it'sunacceptably slow... so why all the cycles wasted on user oriented solutions?

As far as read uncommited goes, any time you join one of the updated tables (page or *links) to another table on a live, you'll risk getting bogus results because of the referenced rows on the other side may not yet exist.

In any case, MySQL AB claims that with innodb tables *readers never block writers*. If thats not true it is a pretty serious problem.

DaB.

20 Jul 20 Jul

12:17 a.m.

New subject: transaction isolation level set to ....

Hello, Am Mittwoch, den 19.07.2006, 21:14 +0200 schrieb Daniel Kinzler:

...

...
The replag paranoia is out of hand...

It would help to have more than one person looking at it.

I must agree with daniel in this point. There are 4 people (if I calc right) with root-rights on zedler at the moment, but I'm the only one, who is realy in the chat at the moment. And I guess, that I'm the person with the lowest experience. So please come and help us/me :)

@river: If hemlock would be avaible, the huge amount of replag reasons (scripts which torture the discs) could be eliminate. I had said it and say it again: I will help you in any way, you must just say, how (I would even speak with brion, if you see a access-problem).

...

If we find the exact problem, sure. At the time, DaB's replication tools where getting transaction timeouts while trying to perform updates.

It was not my sync-programm, it was the mysql-one. You should see, how critical that is, because this tool is not programm by an amateur like me, but from the profis by mysql.

I'm very unhappy if I annoy somebody, because I kill his/her mysql-query or his/her programm. I'm sorry about this.

Sincerly, DaB.

...

-- Blog: http://www.wp-blog.de | PGP: 2D3EE2D42B255885

River Tarnell

19 Jul 19 Jul

6:52 p.m.

New subject: transaction isolation level set to READ UNCOMMITTED

Daniel Kinzler:

...

DaB. has set the global default isolation level to READ UNCOMMITTED

i disagree with this change. Aka has said that he already runs his queries--which were the justification for this change--with READ UNCOMMITTED isolation level, so this change will not help.

by default, we should always provide correct results to users, unless they request otherwise.

- river.

Kelly Martin

7:23 p.m.

New subject: transaction isolation level set to READ UNCOMMITTED

I do too. Fix the goddamned hardware instead of finding ways to make the box even more unusable for its intended purpose.

Kelly

On 7/19/06, River Tarnell river@attenuate.org wrote:

...

Daniel Kinzler:

...
DaB. has set the global default isolation level to READ UNCOMMITTED

i disagree with this change. Aka has said that he already runs his queries--which were the justification for this change--with READ UNCOMMITTED isolation level, so this change will not help.

by default, we should always provide correct results to users, unless they request otherwise.
    - river.
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFEvmNprCkrk04PPEoRAj1XAJ9ql8uov0hVZTDMG6U20zXtBNelxgCcCZrE FmdaJXNpOQRfdoT7IG32GM4= =3ZV2 -----END PGP SIGNATURE-----

Toolserver-l mailing list Toolserver-l@Wikipedia.org http://mail.wikipedia.org/mailman/listinfo/toolserver-l

6731

Age (days ago)

6732

Last active (days ago)

toolserver-l@lists.wikimedia.org

20 comments

10 participants

tags (0)

participants (10)

Andre Karwath
DaB.
Daniel Kinzler
Essjay
Gregory Maxwell
Kelly Martin
Marco Schuster
mdd4696＠gmail.com
Platonides
River Tarnell