You are viewing kostja_osipov

Fish Magic

> Recent Entries
> Archive
> Friends
> Profile
> My photos at flickr
> previous 10 entries

December 3rd, 2013

10:07 pm - Urgent and important vs. anything else
Making people do what you want them to do is impossible. They seem to always do what *they* want to do.
Do we need syntax highlighting in the command line client? Probably. Do we need it now? Definitely not.
But do we need colors in the *test runner*? Yeah, in 2025, perhaps! But we do have it now. And there is nothing I can do about it. Except this little revenge.

(1 comment | Leave a comment)

October 21st, 2013

07:35 pm - The video from NoSQL matters about Tarantool

(Leave a comment)

October 2nd, 2013

11:34 am - Performance of stdarg.h
Most discussions I was able to find online about functions with variable number of arguments in C and C++ focus on syntax and type safety. Perhaps it has to do with C++11 support of such functions. But how much are they actually slower?

I wrote a small test to find out:

kostja@olah ~/snippets % gcc -std=c99 -O3 stdarg.c; time ./a.out
./a.out 0.18s user 0.00s system 99% cpu 0.181 total
kostja@olah ~/snippets % vim stdarg.c
kostja@olah ~/snippets % gcc -std=c99 -O3 stdarg.c; time ./a.out
./a.out 0.31s user 0.00s system 98% cpu 0.320 total

64-bit ABI allows passing some function arguments in C via registers. Apparently this is not the case for functions with variable number of arguments. I don't know for sure how many registers can be used, but the speed difference between standard and variadic function call increases when increasing the number of arguments.

(Leave a comment)

September 28th, 2013

02:27 am - Launchpad bug tracker
The issue tracker on our source code host, GitHub, has matured enough for the team to make a decision to move.
It's probably not the best idea to criticize a free home for an open source project, after all, Launchpad wasn't making any money from hosting us, but, truth be said, and perhaps lack of business model is the reason, it has fallen behind in features and usability.

Just for the record, the most important problems with bugs at Launchpad for us were:
- 7-digit bug ids. Tarantool is a small project and we perhaps will never go out of 4 digits, and you often need to have a quick and easy "handle" for a bug during conversation or in a email
- too many attributes of a bug. The milestone and series system was again designed for a large project, and only complicated matters for us
- bug states were quite nice, but then again we only used a few of them. At the same time there was no "legal" way to mark a bug as a duplicate - perhaps something related to the internal policies at Canonical.
- no way to cross-link a bug and a commit, unless (I guess) you're using Bazaar
- no bulk operations on bugs.

GitHub issues solve a lot of the above, plus, and this is actually the main reason, the issue tracker and the code both benefit from being close to each other.

(Leave a comment)

01:55 am - New algorithm for taking snapshot in Tarantool
Just merged in a patch which I think gives Tarantool one more small but important edge over any other free in-memory database on the market.
The patch changes the algorithm of snapshotting (consistent online backup in Tarantool) from fork() + copy-on-write to use of delayed garbage collection. The overhead per tuple (Tarantool name for a record) is only 4 bytes, to store the added MVCC version. And, since delayed garbage collection is way more fine-grained compared to page-splits after a fork(), as it works on record level, not on page level, the extra memory "headroom", required for a snapshot, is now within 10% of all memory dedicated to an instance.

This feature goes into 1.5, which is, technically speaking, frozen :), but the patch has quite good locality and has been tested in production for a few months already, so I couldn't stand the temptation of making it available ASAP.

Speaking of our master, 1.6, it has already got online add/drop space/index, space and index names, and now is getting ready to switch to msgpack as the primary data format. But since we withstood from making incompatible changes for almost 3 years, there is still a lot of wants and wishes for 1.6. So the current best bet is to get 1.6 out of alpha by the end of the year.

(2 comments | Leave a comment)

01:47 am - open_memstream()
Have you heard about open_memstream()? This is a nice addition of POSIX 2008.
Good little step towards bringing down the number of different string classes in an average C/C++ program.

(Leave a comment)

September 16th, 2013

03:32 pm - Relevance of regression test failures on exotic platforms
Back in my days at MySQL we had a lot of issues with test failures. We had lots of platforms, and would try to run and maintain our regression test suite on all of them. I remember spending days investigating issues on some obscure OS (Mac OS, mainly, Windows was taken care of) or hardware (little-endian, mainly) .
With Tarantool, we never got to do that. We do run buidls on lots of platforms, and someone always screams when they break, since we only run builds on platforms which are in actual use. And they do break, so it's a lot of hassle. But we haven't had time to maintain the regression tests on some of these platforms. Ugly? Yes. Yet we know which systems people use in production, and do take care of these. This set is much more narrow than the set of systems which people play with.
And also, we don't pay attention to test failures caused by, essentially, bad tests. If a test fails once in a while on a busy box, well, this is kind of bad, but tolerable. One day we'll rewrite the test.
It turns out that these tests failures have very little relevance to what people experience in production. In the course of these 3 years I've never seen a test failure on an exotic platform being relevant to any production bug we've had.
Perhaps this is all possible since Tarantool team is so much smaller than MySQL. But it spares us all from lots and lots of boring and unneeded work.

(Leave a comment)

August 8th, 2013

11:25 am - Notes from a test plan
It's been a month or so since I've begun looking at the new data dictionary implementation for Tarantool. Roman created a first version 3 months ago, but myself, blame my perfectionism, thought that some flexibility in the way system spaces can be manipulated with won't harm. The idea is that all space metadata is stored in spaces, not in a configuration file. Kind of "kill the .frms in your mind" feature. A user simply updates the "system" spaces and that effects in creation or destruction of spaces, indexes, data types.
When this whole thing becomes really twisted is that a system space definition also resides in system spaces. There are 3 dedicated spaces - _space - defines space names and ids, _index - defines indexes on spaces, and _format - defines tuple format. And these spaces, in the beginning, contain their own definitions.

Now, here's what I wrote in the test plan for this feature yesterday:

Check that when replacing a tuple in _index system space, thus redefining the primary key in this system space, the new tuple ends up (can be later found) in the new primary key it defines.

(Leave a comment)

June 2nd, 2013

02:43 pm - Evaluating a MySQL database connector
Since Tarantool stored procedure API was extended with socket I/O, a whole universe of applications for data-enriched networking (routing, proxying, PUSH-notifications, and so on) has become possible.

But there is one case which doesn't lend itself so easily: anything MySQL. The first scenario I'd love to support is when Tarantool works as a smart write-back cache for MySQL, providing a higher update RPS, but automatically maintaining a slightly dated copy of all the data in a relational database.

One dramatic shortcoming of MySQL universe, which, IMHO, if addressed properly, could spark a whole new set of uses and third-party solutions, is the clumsiness of the client-server protocol.

The MySQL client-server protocol is unnecessarily hard to implement: it is built on top of a layered design, with built-in compression and transport-level tricks to be able to communicate over unreliable protocol such as UDP.

A separate issue still not done right is that replication has never been considered part of the protocol or the client library (there is even an open source project solving exactly this problem).

In Tarantool, a user of the connector can read the replication stream in just the same way as he/she would read result set of an ordinary query, and this adds a whole new set of ways in which a database can be used.

Finally, the MySQL library itself lacks necessary modularity: socket I/O, SSL encryption, character set support, prepared statement support, zlib compression, even reading client passwords from the command line, and, since recently, plug-in support are all intermixed with the binary protocol, which is at the core of what the library does, and are all part of one thick bundle.

What if I want something tiny to just be able to connect to the server and send simple queries?
What if I want to use it inside an event loop or in coroutine environment?
What if I want to write a protocol mapper between, say, HTTP and MySQL?
What if I don't want to add a dependency on OpenSSL?
In the best case, there is only one answer to a question like this, in the worst case one has to re-implement the protocol from ground up.

In Tarantool, we learned that the protocol must stand alone after we re-designed our own library 3 times, since people would simply ignore the "official" library and just go ahead and write their own.

Back to MySQL, the situation has begun to change in the last three years.

First, Drizzle project implemented libdrizzle - a wholly new client library to talk to Drizzle server. The unobvious part is that Drizzle binary protocol is fully compatible with MySQL, so libdrizzle can talk to MySQL as well.
Good things about libdrizzle is that it is built around an event loop, so the entire code base is callback based. Makes it easier to embed into callback-based environment such as node.js. Little advantage for Tarantool, which, while using an event loop under the hood, hides it completely by providing lightweight green threads, and thus is able to execute sequential networked code.

Another good thing about libdrizzle is that the code base is small and is easy to read, even if you're new to MySQL world.

The shortcomings of libdrizzle is that it doesn't support prepared statements - indeed, why would you add support for prepared statements when Drizzle itself doesn't have it :), and is completely character set-unaware - in Drizzle, everything is utf-8.

The second library created recently is MariaDB's native-client.
I've taken a quick look at the code, and it seems that it pretty much is the same as old-good MySQL - full support of the API in one think bundle.

Which library should we choose? Whatever it is, we'll need to patch it, since even libdrizzle is not modular enough so that it can be integrated into Tarantool core without changing any of the upstream code. The advantage of MariaDB library is that it has prepared statement support. On the other hand, prepared statements still don't work properly with connection pools, and hence are still not used widely. Indeed, most of the shops I know simply wrap their statements into stored procedures, which also gives extra security, but use the old direct API to invoke them.

Next week we'll be looking at the two libraries more closely.

(2 comments | Leave a comment)

April 30th, 2013

01:16 am - Importance of intra-query parallelism.
Oracle Database has a feature which allows it to query millions of rows in parallel while executing a join which has a big fanout.
How important is it that a database server has a lot of intra-query concurrency? Does it still make a lot of sense to run an analytical query in parallel threads, on a single machine?

While at Percona Live, there was a lot of talk about the future of MySQL, and some even mentioned this as being part of the future.

The reason for intra-query parallelism has always been to fill up the pipeline to disk with lots of parallel queries. Indeed, this pipe is thick and long - and if used, it'd better produce a lot of data at once. Efficiency of CPU utilization is sacrificed to achieve efficiency of a rotating disk drive.

Yet in DaaS world this all fails to make sense to me. In a cloud, one execution unit is not one CPU, but one instance, and one database instance equals to a cluster of virtual machines. Map/Reduce was only the first sign of the change - it is stupid, indeed, but network is faster than disk, and if a query needs to inspect a million of rows, they'd better be on thousands of disks, not on a single one.

It's funny how MySQL technology is steadily pulled up-market. I haven't seen a single project use MySQL Stored Procedures, which were created for SAP R/3 integration, in applications they were created for. Perhaps, when parallel query in MySQL is ready, it also will be used for something completely different.

Meanwhile, I think the task of coming up with an efficient join algorithm to run across sharded data is more in line with the way hardware is going to look like in the future. Sharding is done best when not done at all. But so is concurrency.

(4 comments | Leave a comment)

> previous 10 entries
> Go to Top