Log in

No account? Create an account
MySQL: multiple user level locks per connection - Fish Magic — LiveJournal

> Recent Entries
> Archive
> Friends
> Profile
> My photos at flickr

December 4th, 2012

Previous Entry Share Next Entry
11:44 pm - MySQL: multiple user level locks per connection

People say that to have a good vacation, you need to do something else, something you don't do every day at work.
So, instead of hacking on Tarantool, I did some good old MySQL hacking. Thanks to Alexey Rybak from Badoo I had a nice opportunity for it -- a task to improve MySQL user level locks.

GET_LOCK() function in MySQL allows a connection to hold at most one user level lock. Taking a new lock automatically releases the old lock, if any.
The limit of one lock per session existed since early versions of MySQL didn't have a deadlock detector for SQL locks. MDL patches in MySQL 5.5 added a deadlock detector, so starting from 5.5 it became possible to take multiple locks in any order -- a deadlock, should it occur, would be detected and an error returned to the client that closed the wait chain.
So, thanks to MDL, implementing user level locks seemed to be an easy task, and in line with MySQL general strategy of moving all hand-crafted lock implementations to a single system. A code cleanup, too.

The implementation indeed turned out to be rather straightforward, but as it always happens with MySQL, not without issues.
By now I've finished working on the patch and published the tree, it's available here:

I intend to contribute the patch to all MySQL forks - MySQL at Oracle, Percona, MariaDB. I'm publishing the patch under BSD licence, so any other fork (Twitter, Facebook, Google) is welcome to pick it up too.

Now let me list some less obvious moments in the new user level locks:

  • it has become possible not only to take distinct locks in the same connection, it's also possible to take the same lock twice. In this case, the lock is granted and each instance of the same lock needs to be released afterwards. In other words, the new user level locks are recursive.
  • the documented (and preserved) behaviour of GET_LOCK() is to return 0 in case of lock wait timeout and NULL in case of error. This doesn't look right to me, since when a lock is not granted I'd personally prefer getting an error, not a 0 or NULL. This starts to matter when a user lock is taken inside a stored function or trigger - if an error is returned, the statement is usually aborted, but 0 or NULL from GET_LOCK will keep it going. So as long as currently GET_LOCK() timeout doesn't return an error, it's possible that a trigger is invoked for each row, and the lock times out for some rows, and doesn't time out for others. But oh well, this is the current MySQL behaviour, so a matter of separate consideration.
  • if a connection which is waiting on a user level lock is killed by KILL CONNECTION/KILL QUERY, it's wait is aborted. This is alright, and works with MDL too. GET_LOCK() returns NULL in this case, and I preserved this behaviour. But if a connection is simply gone (the client has disappeared, closed a socket, crashed, etc, all this while waiting on a user lock), the old user lock wait implementation would eventually detect an abandoned socket, and abort the wait.

    MDL, however, didn't look at session sockets while waiting on a lock. I thought that this matter is important enough and fixed MDL to look at session socket state during long waits on any lock. Indeed, the whole checking for the disconnected mutex was done in scope of a fix for Bug#10374 by Davi Arnaut. (Hello, Oracle, if not for an open bugs database, I would never be able to find or understand this!). At some point this was considered important enough, so why break it.

  • the last issue is with variable @@lock_wait_timeout. In theory, @@lock_wait_timeout should affect all locks in SQL. I could make it work for user locks as well. But I decided not to do it yet, since there is always an explicit timeout, and honouring @@lock_wait_timeout would mean checking which one is larger -- the explicitly provided one, or session global, and honoring the smaller timeout. This perhaps needs to be done.

Fixes in tests

It was a surprise to see that actually no test is relying on (or testing) the fact there could be only one lock per session. There is not even a test which would test all return values of GET_LOCK() or RELEASE_LOCK(). For example, if a lock is not owned by this session, RELEASE_LOCK() returns either NULL or 0, depending on whether the lock exists at all or not. And I haven't found tests for IS_USED_LOCK()/IS_FREE_LOCK() either.
The main test suite actually passed after the first draft, and most surprises came from the replication tests.
For example, rpl_err_ignoredtable.test in 5.5 apparenty works according to the intent of the author, but despite some of its obscure details.
In particular, this test takes a user lock in an UPDATE, to make sure that UPDATE blocks at some point, and be able to abort it while it's blocked. But to detect that the UPDATE has blocked, an impossible condition is used, so the detection code actually oversleeps the lock wait timeout.
This test started to fail when lock implementation changed, so I had to provide a correct wait condition.
rpl_stm_000001 (why would you use 5 leading zeros in a test name, especially considering there is only rpl_stm_000002?-)) has a hard-coded sleep, instead of a synchronous wait, so I fixed it too.
Another replication test -- rpl.rpl_rewrt_db -- failed since it relied on the order of subsystem destruction in server session cleanup (THD::~THD()).
Before my patch, user level locks of a session were destroyed last, in particular, after closing temporary tables. So, this replication test would do the following trick to synchronously wait until a temporary table is closed:

  • take a lock in a session
  • kill it
  • take a lock in a concurrent session, and, as soon as this lock is granted, assume that the other session is destroyed, and, in particular, temporrary tables are closed (the side effect which was ultimately desired).

How clever! Except that at first I put user level lock subsystem destruction slighly higher in THD::~THD(), closer to its new home - MDL subsystem. Well, I had to put everything back, plus move MDL susbystem destruction to the end of THD::~THD(), to make this test work.


I doubt I would have been able to make my way through the test suite if I haven't had previous experience on the MySQL team.
Writing the patch was moderately fun (I'm not going to bash MySQL Item class hierarchy another time), but groveling through a huge test suite and fixing stupid errors which were only barely related to my patch was extremely tedious.


(4 comments | Leave a comment)


[User Picture]
From:Sinisa Milivojevic
Date:December 5th, 2012 04:44 pm (UTC)
First and first of all, I am one of the authors of the original GET_LOCK();

Just a couple of comments. First, I doubt the usefulness of this sub-feature. I imagine that this would be needed in some cases, but those are VERY, VERY few cases. I do not recall a single FR requesting multiple user-level locks in the single connection.

Next, now this change requires some other clarifications. When a single user, having locked the same string several times, calls RELEASE_LOCK(), will it release them all or just first or just last ???

Next, IS_FREE_LOCK() and IS_USED_LOCK() should return additional data, related to lock being taken by the same connection already. Actually, what is missing is a function returning number of connections in a queue and number of locks taken by the same connection.

This complicates things beyond usability.

Last, but not least, this functionality is not so much used and having several locks can be worked around, in some cases, by smart composition of the lock string.
[User Picture]
Date:December 5th, 2012 05:45 pm (UTC)
Hello Sinisa!

I did it for a real-world user.

RELEASE_LOCK() is recursive: if the same lock is taken 3 times, you need to call RELEASE_LOCK() 3 times to release all instances of the same lock.

IS_FREE_LOCK() and IS_USED_LOCK() work just as before. I added test coverage to ensure this doesn't break in the future.

Lock inspection is a great idea, and now, when locks are part of MDL, lock inspection can be done for entire MDL, and user locks will get covered by that.

Edited at 2012-12-05 05:46 pm (UTC)
[User Picture]
From:Sinisa Milivojevic
Date:December 5th, 2012 06:01 pm (UTC)
Kostja, my dear friend and my brother !!!!

I did not say that there are no uses for your sub-feature. I just think that usage would not be too frequent. I inspected the code and it is very well done. Very nice, which is usual from you.

May be RELEASE_LOCK() could get an optional argument for releasing them all, for the same connection ???

If they work as before, then IS_FREE_LOCK() and IS_USED_LOCK() will return the same result if a lock is held by the same or some other connection, which is 50 % as good as it should be. Hence, a new function would be required, IMHO.

There could be a P_S table with those locks, which would make unnecessary both of the above existing functions as well as a new function that I proposed. With this P_S table, you could also see a graph of connections holding / waiting for these locks !!!! What a good idea from me, hehehe .....
Date:March 5th, 2013 07:26 am (UTC)
"First, I doubt the usefulness of this sub-feature. I imagine that this would be needed in some cases, but those are VERY, VERY few cases. I do not recall a single FR requesting multiple user-level locks in the single connection."


The MySQL bug page for this issue is full of appalled people who want this. Most of us probably don't even need anything recursive, but being able to have two locks in one connection seems basic. It caused me great frustration when I found out I could have only one named lock at a time. Why? I don't even need it to be recursive.

Multiple concurrent locks needs to be added to MySQL ASAP!

> Go to Top