I have recently encountered this text in which the author claims very high MySQL speedups for simple queries (7.5 times faster than regular MySQL queries, and twice as fast as memcached) by reading the data directly from InnoDB where possible (MySQL is still used for writing and for complex queries). Knowing that a faster DB is always good, I thought this would be an interesting thing to consider :)
http://yoshinorimatsunobu.blogspot.com/2010/10/using-mysql-as-nosql-story-fo...
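Roughly, the pattern the post describes is: serve primary-key lookups straight from the storage engine, and fall back to normal SQL for everything else. A minimal sketch of that read path (the hs_client helper, table and column names are invented for illustration):

def get_user(user_id, hs_client, sql_conn):
    # Fast path: a primary-key lookup that bypasses SQL parsing entirely.
    # hs_client.get() is a hypothetical helper, not a real library call.
    row = hs_client.get("user", "PRIMARY", user_id)
    if row is not None:
        return row
    # Slow path: anything the fast path can't answer still goes through SQL.
    cur = sql_conn.cursor()
    cur.execute("SELECT user_name, user_email FROM user WHERE user_id = %s",
                (user_id,))
    return cur.fetchone()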
Nikola Smolenski wrote:
I have recently encountered this text in which the author claims very high MySQL speedups for simple queries by reading the data directly from InnoDB where possible. [...]
It looks interesting. There are some places where MediaWiki could take that shortcut if it were available. I wonder if we have such a CPU bottleneck, though.
On Thu, Dec 23, 2010 at 5:19 PM, Platonides <Platonides@gmail.com> wrote:
It looks interesting. There are some places where MediaWiki could take that shortcut if it were available. I wonder if we have such a CPU bottleneck, though.
I was afraid it would be a local-DB-only hack, but on reading it, they implemented a full network service (text-based, with telnet usable as a minimal interface, which is good for sysadmins/engineers and for debugging) inside MySQL's main daemon, which was pretty darn cool of them.
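To give a feel for it, here is roughly what a client exchange looks like over that text protocol, based on my reading of the post (port number, database/table/column names are made up, and the exact field layout is from memory, so take it as an approximation):

import socket

def hs_read(host, user_id):
    # Reader port as described in the post; fields are tab-separated,
    # requests and responses are newline-terminated lines.
    s = socket.create_connection((host, 9998))
    try:
        # Open an index handle: P <handle> <db> <table> <index> <columns>
        s.sendall(b"P\t0\ttest\tuser\tPRIMARY\tuser_name,user_email\n")
        print(s.recv(4096))   # "0\t1\n" would mean success
        # Look up one row by primary key: <handle> <op> <numkeys> <key> <limit> <offset>
        s.sendall(("0\t=\t1\t%d\t1\t0\n" % user_id).encode())
        print(s.recv(4096))   # e.g. "0\t2\t<name>\t<email>\n" if the row exists
    finally:
        s.close()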
Even if this isn't ultimately relevant for MediaWiki or the Wikimedia projects, thanks for posting it to the list, Nikola. I do a lot of systems architecture work involving DBs, and I could well end up recommending something like this for a customer application.
On Thu, Dec 23, 2010 at 9:34 AM, Nikola Smolenski <smolensk@eunet.rs> wrote:
I have recently encountered this text in which the author claims very high MySQL speedups for simple queries by reading the data directly from InnoDB where possible. [...]
It's easy to get fast results if you don't care about your reads being atomic (*), and I find it hard to believe they've managed to get atomic reads without going through MySQL.
(*) Among other possibilities, just use MyISAM.
Hi!
It's easy to get fast results if you don't care about your reads being atomic (*), and I find it hard to believe they've managed to get atomic reads without going through MySQL.
MySQL's upper layers know very little about transactions; it is all engine-specific - BEGIN and COMMIT processing is deferred to the table handlers. It would be incredibly easy for them to implement repeatable-read snapshots :) (if that's what you mean by an atomic read)
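(For illustration, this is the kind of snapshot read meant here, expressed through plain SQL; the table and connection details below are invented:)

import mysql.connector

def connect():
    return mysql.connector.connect(host="127.0.0.1", user="app",
                                   password="secret", database="test")

a, b = connect(), connect()
ca, cb = a.cursor(), b.cursor()

# Under REPEATABLE READ, the first SELECT establishes a snapshot...
ca.execute("SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ")
ca.execute("START TRANSACTION")
ca.execute("SELECT user_email FROM user WHERE user_id = 42")
before = ca.fetchone()

# ...so a committed change from another connection...
cb.execute("UPDATE user SET user_email = 'new@example.org' WHERE user_id = 42")
b.commit()

# ...is not visible inside the still-open transaction.
ca.execute("SELECT user_email FROM user WHERE user_id = 42")
after = ca.fetchone()   # equals `before`
a.rollback()
a.close()
b.close()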
(*) Among other possibilities, just use MyISAM.
How is that applicable to any discussion?
Domas
Hi,
It seems from my tinkering that MySQL query cache handling is circumvented via HandlerSocket. So if you update/insert/delete via HandlerSocket and then query via SQL, you're not guaranteed to see the changes unless you use SQL_NO_CACHE.
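Something like this is what I mean (connection and schema details are invented; the stale-read behaviour is just what I observed while tinkering):

import mysql.connector

conn = mysql.connector.connect(host="127.0.0.1", user="app",
                               password="secret", database="test")
cur = conn.cursor()

# After a write made through HandlerSocket, a plain SELECT may still be
# answered from the query cache and show the old row:
cur.execute("SELECT user_email FROM user WHERE user_id = %s", (42,))
print(cur.fetchone())

# SQL_NO_CACHE makes the server bypass the query cache and read the table:
cur.execute("SELECT SQL_NO_CACHE user_email FROM user WHERE user_id = %s", (42,))
print(cur.fetchone())

conn.close()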
Jared
Hi!
It seems from my tinkering that MySQL query cache handling is circumvented via HandlerSocket.
On busy systems (I assume we are talking about busy systems, as the discussion is about HS) the query cache is usually eliminated anyway - either by compiling it out, or by patching the code not to take the qcache mutexes unless it really, really is enabled. In the worst case, it is simply disabled. :)
So if you update/insert/delete via HandlerSocket and then query via SQL, you're not guaranteed to see the changes unless you use SQL_NO_CACHE.
You are probably right. Again, nobody cares about qcache at those performance boundaries.
Domas
Ah, interesting. The only reason I looked at it was that you don't have to faff with encoding/escaping values* the way you have to with SQL - SQL injection vulnerabilities just don't exist.
* And the protocol handles binary values, which you normally have to faff about with to get in and out of MySQL with the various PHP APIs.
It does seem a bit specialised, but it could back a persistent cache, maybe as a session handler.
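To illustrate the escaping point: the lookup key travels as its own field in the request, so there is no SQL string to inject into. The only encoding needed is for a few control bytes (the rule below is my reading of the protocol docs and may not be exact):

def hs_encode(value: bytes) -> bytes:
    # Bytes 0x00-0x0f (which include the \t and \n field/line separators)
    # get shifted behind a 0x01 marker; everything else passes through as-is.
    out = bytearray()
    for b in value:
        if b <= 0x0f:
            out += bytes([0x01, b | 0x40])
        else:
            out.append(b)
    return bytes(out)

user_input = b"alice'; DROP TABLE user;--"
request = b"0\t=\t1\t" + hs_encode(user_input) + b"\t1\t0\n"
# The request is only ever parsed as "find rows whose key equals this literal
# value", so the classic injection payload above is inert.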
Jared
Maybe a session handler even.
Jared
On Fri, Dec 24, 2010 at 4:08 AM, Domas Mituzas <midom.lists@gmail.com> wrote:
MySQL's upper layers know very little about transactions; it is all engine-specific - BEGIN and COMMIT processing is deferred to the table handlers. It would be incredibly easy for them to implement repeatable-read snapshots :) (if that's what you mean by an atomic read)
I suppose it's possible in theory, but in any case, it's not what they're doing. They *are* going through MySQL, via the HandlerSocket plugin.
I wonder if they'd get much different performance by just using prepared statements and read committed isolation, with the transactions spanning multiple requests. The tables would only get locked once per transaction, right?
Or do I just have no idea what I'm talking about?
(*) Among other possibilities, just use MyISAM.
How is that applicable to any discussion?
It was an example of a way to get fast results if you don't care about your reads being atomic.
Hi!
I suppose it's possible in theory, but in any case, it's not what they're doing. They *are* going through MySQL, via the HandlerSocket plugin.
After reading the code I can correct myself - they are calling into some MySQL internals (e.g. for opening tables), but everything else goes through the table handler interfaces :-) Though I guess what I mean by "going through MySQL" and what you have in mind are entirely different things :)
I wonder if they'd get much different performance by just using prepared statements and read committed isolation, with the transactions spanning multiple requests. The tables would only get locked once per transaction, right?
They wouldn't gain much performance with PS; transaction establishment is quite cheap in MySQL, especially compared to other vendors. Do note, prepared statements don't prepare the query plan for you - it is re-established at every execution - nor do they keep table handlers open, IIRC; generally you just don't have to reparse the text.
It was an example of a way to get fast results if you don't care about your reads being atomic.
InnoDB is faster than MyISAM under high-performance workloads.
Domas
Hi!
I have recently encountered this text in which the author claims very high MySQL speedups for simple queries
It is not that he speeds up simple queries (you'd maybe notice that if you used InfiniBand, and even then it wouldn't matter much :) He just avoided hitting some expensive critical sections that make scaling on multicore systems problematic.
It looks interesting. There are some places where MediaWiki could take that shortcut if it were available.
It wouldn't be a shortcut if you had to establish another database connection besides the existing one.
I wonder if we have such a CPU bottleneck, though.
No, not really. Our average DB response time (do note, this isn't the median, and it is affected more by heavy queries) is 1.3 ms, measured on the client.
Domas
On 12/24/2010 10:01 AM, Domas Mituzas wrote:
I wonder if we have such a CPU bottleneck, though.
No, not really. Our average DB response time (do note, this isn't the median, and it is affected more by heavy queries) is 1.3 ms, measured on the client.
This could also reduce memory usage by using memcached less often, which, I understand, is a bigger problem.
Hi!
This could also reduce memory usage by using memcached less often, which, I understand, is a bigger problem.
No it is not.
First of all, our memcached and database access times are not that far apart - 0.7 vs 1.3 ms (again, the memcached figure is a static response time, whereas the database average is impacted by calculations). On the other hand, we don't store in memcached what is stored in the database, and we don't store in the database what is stored in memcached.
Think of these as two separate systems, not as complementing each other too much. We use memcached to offload the application cluster, not the database cluster.
And the database cluster already has over a terabyte of RAM (replicas and whatnot), whereas our memcached lives in a puny 158 GB arena.
I described some of the fundamental differences in how we use memcached in http://dom.as/uc/workbook2007.pdf, pages 11-13. Not much has changed since then.
Domas