You are viewing kostja_osipov

Fish Magic - RDBMS software is difficult

> Recent Entries
> Archive
> Friends
> Profile
> My photos at flickr

April 28th, 2010


Previous Entry Add to Memories Share Next Entry
02:06 am - RDBMS software is difficult
I spent the day today reviewing Dmitry's patch for Bug#46947. When looking at the quick fix, we discovered that MySQL <-> storage engine locking protocol is very weakly defined when it comes to more advanced features, such as subqueries, stored functions, views.
It's badly defined, it's not documented, it's not tested. As a result, some bits of server behaviour flipped back and forth between 4.1, 5.1 and 5.5.

The problem is that when a statement accesses the table via a subquery, view or a function, the lock that the server needs to take on that table is not a function of the SQL in the subquery/view itself, but depends on the context where this view or function is used.
If we use the view in an UPDATE statement, and don't take sufficiently strong locks on its tables or rows, replication may break, or, in some engines, repeatable read consistency get violated.

This problem does not depend on how effectively we optimize subqueries. It does not matter how fast the storage engine underneath is. It's a bug in the infrastructure that MySQL server provides to its engines, and one that is not easy to get right.

This reminded me of my conversation with David, one of MySQL founders, on Drizzle, at the recent O'Reilly MySQL Conference.
Obviously, there is no such bug in Drizzle, which has no subqueries, functions or views. And when they add any of those, the bug is very likely to resurface.
That's why back then I said that it's impossible to rethink everything. At least, it takes ages to do. I mentioned that no one was able to "rethink" InnoDB, or throw away and replace the obscure code of MySQL optimiser. I mentioned that there is a lot more to it, than just killing badly designed and non-standard MySQL behaviour, or rewriting all the bits to use STL and boost. I said that when it comes to good stuff, there is no reason why MySQL can't or won't do it. The main reason it is harder to do changes with MySQL is a larger legacy, including political and managerial, but you get into exact same situation in any project after your first release. I said that all things considered, the current MySQL trunk is perhaps as good starting point for rethinking as the current Drizzle.

Recent years there's been a serious fragmentation of technical thought in MySQL ecosystem. Drizzle, MariaDB, Percona are excellent for community, but are not at all good for our ability to make MySQL a universal database platform. I mean, ability to make MySQL a database platform comparable to what Linux/Unix is nowadays to operating systems. Truth be said, I am not at all sure that my current employer, Oracle, is a good host to seek this holy grail either. Perhaps we'll never get there, not with this project.

I would not want to actually diminish importance of Drizzle (initially, I was fond of it and rather wanted to join; the reason I didn't, I've just spelled out). I'd love to be proven wrong, but I don't see it becoming such a universal piece of software that I personally would like to be contributing to. And I've never blogged about it before since, I thought, the more forks, the better.
The reason for this blog post is the recent interview on Drizzle with our beloved community leader. Watching the interview, I thought that some properties of Drizzle or all forks, for that matter, are not clearly understood.

(4 comments | Leave a comment)

Comments:


From:macplusg3
Date:April 28th, 2010 01:38 am (UTC)

We have subqueries :)

(Link)
drizzle> select * from t1 where a in (select a from t1 where a>2);
+---+
| a |
+---+
| 3 |
+---+
1 row in set (0 sec)

drizzle> explain select * from t1 where a in (select a from t1 where a>2);
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
| 1 | PRIMARY | t1 | index | NULL | PRIMARY | 4 | NULL | 2 | Using where; Using index |
| 2 | SUBQUERY | t1 | index | PRIMARY | PRIMARY | 4 | NULL | 2 | Using where; Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
2 rows in set (0 sec)
[User Picture]
From:kostja_osipov
Date:April 28th, 2010 08:07 am (UTC)

Re: We have subqueries :)

(Link)
Good :) Then you need to watch out for MySQL bugs in subqueries, because you're likely to have some of them.
From:kristiannielsen
Date:April 28th, 2010 06:19 am (UTC)

Agree

(Link)
Excellent points, Kostja!

And I do agree that it is unfortunate with the fragmentation due to the many forks. I think they are necessary at the moment (I don't see how I would be able to work on the MySQL code now without MariaDB), but I am still hoping against hope that MySQL and MariaDB will be able to join up again eventually.
[User Picture]
From:krow
Date:April 28th, 2010 03:32 pm (UTC)
(Link)
We have, subqueries, functions, and some would argue views (though certainly not "CREATE VIEW" type views).

Universal database platform? FreeBSD continues to exist.

Fragmentation is just fine, too much of the problems in the past were based on a single group thought.


> Go to Top
LiveJournal.com