You are viewing kostja_osipov's journal

Fish Magic


April 26th, 2012


10:32 am - Videos of recent talks
Live streaming technology is advancing by leaps and bounds, and showing up at a conference unprepared is becoming more and more problematic.

Videos of the two most recent appearances (in Russian):

http://live.digicast.ru/embed/1056?language=ru#time1334390220
http://techforum.mail.ru/video/ (Hall 2, i.e. "Зал 2"; the video starts at 5:27)


10:21 am - An old way of sandboxing MySQL

While preparing for yesterday's MySQL/MariaDB 5.5 talk at the Mail.Ru technical forum, I downloaded the source code of three 5.5 forks: Oracle, Percona and MariaDB. I was happy to find that my good old way of sandboxing an instance without installing it still works.

I don't know whether MySQL Sandbox uses this approach under the hood, but here it goes:

1. Make sure /etc/mysql/my.cnf is not present or commented out.
2. Create a ~/.my.cnf with a few important lines:
[client]
port        = 3307
host        = 127.0.0.1
socket      = /opt/local/var/mysql/mysql.sock

[mysqld]
gdb # this one is necessary just to sandbox
max_allowed_packet=16M
port=3307
socket=/opt/local/var/mysql/mysql.sock
language=/home/kostja/work/mariadb/5.5/sql/share/english
character-sets-dir=/home/kostja/work/mariadb/5.5/share/charsets
basedir=/home/kostja/work/mariadb/5.5
datadir=/opt/local/var/mysql
server_id=1
These two steps are not strictly necessary, but they let you avoid the "mysql will choose the most appropriate cnf file" guesswork. Of course, you need to make sure that all paths in the configuration file point to the correct locations in the source tree, and that the data dir exists and is writable.

3. Now we need to populate the data directory. Here's how:

Fire up
shell> mysqld --bootstrap
type
create database mysql;
and press Ctrl-D.

You could do the same in step 4; it's just fun that, when nothing else works, you can send queries to mysqld through its standard input.

4. Now let's restart mysqld with --skip-grant-tables, create all the necessary system tables and fill them with data:

shell> mysqld --skip-grant-tables
...then, in another terminal, fire up the mysql command-line client:
shell> mysql
use mysql  -- the scripts below don't select a default database
source /home/kostja/work/mariadb/5.5/scripts/mysql_system_tables.sql
source /home/kostja/work/mariadb/5.5/scripts/mysql_system_tables_data.sql
Once this is all done, we can restart mysqld with no extra switches, issue the necessary grants, and get it going.


April 19th, 2012


08:18 pm - Debian repository with Tarantool packages
As part of our regular build procedure, we now maintain a .deb archive with the most recent stable builds:
echo "deb http://tarantool.org/dist/debian/ unstable main" | sudo tee -a /etc/apt/sources.list
sudo apt-get update
sudo apt-get install tarantool


April 17th, 2012


02:04 pm - Search engine: blekko
I blogged about DuckDuckGo previously, and here's another search engine I've just recently learned about: blekko.

I met their CTO at Percona Live, where he was running the CentOS (!) booth, and learned that they have their own 25-petabyte index and their own NoSQL database to run it.
Their database (should I say petabase?) is written in C and Perl.


April 16th, 2012


04:51 pm - Speaking at NoSQL matters 2012 in Cologne, 29-30 May 2012
Provided I manage to get a visa, I should speak about Tarantool at the nosql-matters.org conference in Cologne this May.
Judging by the invited speakers, this is going to be more of a technology event (lots of authors of NoSQL databases) than a community event. Interesting nonetheless!


April 10th, 2012


08:50 pm - Tarantool 1.4.5: what's in it
1.4.5 comes loaded with a bunch of fixes:
  • a new WAL I/O algorithm, which increases write-bound performance by at least 70%. The patch both lowers request processing latency, which determines the lower performance bound in single-connection benchmarks, and increases write throughput, giving a 200%+ increase in multi-threaded write workloads,
  • use hardware CRC32 to calculate checksums when writing to the write ahead log,
  • it's now possible to push, pop, insert and delete fields in the middle of a tuple with UPDATE. Since one can pack multiple UPDATE commands into a single request, and these commands are then executed atomically, this is the most primitive form of multi-statement transaction support (other transaction features are in the pipeline),
  • more compact and efficient TREE indexes,
  • reverse iterators: you can now browse an index in reverse order,
  • positioned iteration over a multi-part index: it's possible to position an iterator in Lua using only a prefix of a multipart key, and retrieve all tuples which match the prefix,
  • full support of 64-bit integers in Lua (in addition to 32-bit integers, which we have supported since 1.3),
  • a couple of dozen bugfixes.
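On x86, "hardware CRC32" means the SSE4.2 crc32 instruction, which computes CRC-32C (the Castagnoli polynomial), not the zlib CRC-32. Assuming that is the checksum used for the WAL, here is a minimal sketch, not Tarantool's actual code: a portable bitwise version (crc32c_sw, a hypothetical name) that any hardware path must agree with, plus an optional crc32c_hw that is compiled only when the compiler targets SSE4.2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
 * Hypothetical helper; a hardware path must return exactly the same values. */
static uint32_t crc32c_sw(uint32_t crc, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    assert(buf != NULL || len == 0);
    crc = ~crc;                               /* standard pre-inversion */
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)           /* one polynomial step per bit */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;                              /* standard post-inversion */
}

#if defined(__SSE4_2__)
#include <nmmintrin.h>
/* Same checksum via the SSE4.2 crc32 instruction, one byte per step for
 * clarity; the instruction itself does no inversion, so we add it here. */
static uint32_t crc32c_hw(uint32_t crc, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    crc = ~crc;
    while (len--)
        crc = _mm_crc32_u8(crc, *p++);
    return ~crc;
}
#endif
```

Both functions accept a running crc, so a checksum can be computed incrementally, e.g. over a record header and then its body. A production version would add a table-driven fallback and feed the instruction 8 bytes at a time with _mm_crc32_u64.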
To anyone using 1.4.4, I warmly recommend an upgrade: the performance boost is worth it. The release is fully backward compatible, and the team is on standby to fix anything our test coverage didn't catch.
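In an ordered index, reverse iterators and positioned iteration over a multi-part index come down to the same idea: binary-search the boundaries of a key prefix, then walk that range in either direction. A minimal sketch, assuming a sorted array of two-part keys stands in for a TREE index; struct key2 and the functions below are hypothetical names, not Tarantool's API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-part key, ordered lexicographically by (a, b). */
struct key2 { uint32_t a, b; };

/* First position whose first key part is >= a. */
static size_t prefix_lower(const struct key2 *k, size_t n, uint32_t a)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (k[mid].a < a)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* First position whose first key part is > a. */
static size_t prefix_upper(const struct key2 *k, size_t n, uint32_t a)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (k[mid].a <= a)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Copy the second parts of all keys matching prefix a into out[],
 * in forward or reverse index order; returns how many were copied. */
static size_t scan_prefix(const struct key2 *k, size_t n, uint32_t a,
                          int reverse, uint32_t *out)
{
    size_t lo = prefix_lower(k, n, a), hi = prefix_upper(k, n, a);
    size_t m = 0;
    assert(lo <= hi);
    for (size_t i = 0; i < hi - lo; i++)
        out[m++] = k[reverse ? hi - 1 - i : lo + i].b;
    return m;
}
```

For example, with the keys {1,10}, {2,5}, {2,7}, {2,9}, {3,1}, scanning prefix 2 yields 5, 7, 9 forward and 9, 7, 5 in reverse.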


09:23 am - Cost of a syscall
The problem with modern hardware is that it's impossible to know *how* expensive things are.
A simple thing such as a memory access can mean an L1 cache hit, an L2 cache hit, a cache miss, or a page fault, and the cost difference between the best and the worst case can be as much as 1000000 times.

This leads to a programming style in which an engineer doesn't know, and doesn't want to know, what machine instructions his/her code will produce and how much they will cost.
This situation, which was normal in the 90s when CPU speeds doubled every 2-3 years, is nowadays an obscure and crippling effect of Moore's law.

A system call that blocks, even momentarily, is bound to cost far more than one that doesn't: the context switch not only has to do more work, it can also thrash the L1/L2 caches, so its consequences reach well beyond the call itself.
To find out how much it may cost, my colleague avdicius set up a small benchmark, which ping-pongs a single byte between two processes using a pipe.

The result is 200000 writes to a pipe per second. Since every round trip takes two blocking writes, that means a peak of 100000 requests per second at 100% CPU utilization whenever handling a request involves waiting on some sort of device.
For comparison, a pipe write that doesn't block costs less than 1/12 of that.
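The benchmark code isn't in this post, so here is a minimal sketch of such a ping-pong, assuming two pipes (one per direction) between a parent and a forked child; ping_pong is a hypothetical name. Each round trip is two blocking writes, so writes per second equals 2 * rounds / seconds.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Ping-pong one byte between parent and child `rounds` times.
 * Stores the elapsed wall time in *seconds and returns the number of
 * completed round trips, or -1 if the pipes or the child can't be set up. */
static long ping_pong(long rounds, double *seconds)
{
    int to_child[2], to_parent[2];
    char byte = 'x';
    struct timespec t0, t1;

    assert(rounds >= 0);
    if (pipe(to_child) < 0)
        return -1;
    if (pipe(to_parent) < 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }
    pid_t pid = fork();
    if (pid < 0) {
        close(to_child[0]);
        close(to_child[1]);
        close(to_parent[0]);
        close(to_parent[1]);
        return -1;
    }
    if (pid == 0) {                 /* child: echo every byte back */
        close(to_child[1]);
        close(to_parent[0]);
        while (read(to_child[0], &byte, 1) == 1)
            if (write(to_parent[1], &byte, 1) != 1)
                break;
        _exit(0);
    }
    close(to_child[0]);
    close(to_parent[1]);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    long done = 0;
    for (; done < rounds; done++) {  /* each iteration blocks on read() */
        if (write(to_child[1], &byte, 1) != 1 ||
            read(to_parent[0], &byte, 1) != 1)
            break;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(to_child[1]);              /* EOF makes the child exit */
    close(to_parent[0]);
    waitpid(pid, NULL, 0);
    *seconds = (double)(t1.tv_sec - t0.tv_sec)
             + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return done;
}
```

Calling ping_pong(1000000, &secs) and printing 2e6 / secs gives a figure comparable to the one above, though it varies a lot with the CPU, the kernel and the scheduler.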

Now, the above is not just a funny example; it's a puzzling example of a program that runs *faster* when the system has more work to do. Run the above test in multiple instances, and *each* instance gets a performance boost.
So far I've been unable to rationally explain this part.


April 4th, 2012


11:38 am - Increasingly dissatisfied with Google Search
I'm using other engines, mainly Bing, more and more often. Over the past months I noticed some background noise about personalized search results, but haven't had a chance to turn it off. And now I clearly see that Google is hiding results from me.

The most recent example was when I was trying to find more data on Funky Fieber, or Funky Beans: a game that is getting increasingly popular in Germany and elsewhere in Europe, and unless you hold the beans and play with them, it's really hard to understand why it's cool. At first, Google was stupid enough to assume that a) I don't understand German and b) I'm more interested in search results from Russia.
Funky Beans is a difficult search, I know: the game is (apparently) from the Netherlands and is super-popular in Germany, but the name is English, and it hasn't risen high enough yet to dominate over your old grandma's Cajun bean recipes.

Another piece of Google fun was that Google knows that I'm a vegetarian, and at first tried to feed me a huge number of vegetarian links. I guess I will need to have different avatars going forward, to fool (or teach?) Google AI: Kostja The Vegetarian, Kostja The Construction Worker, Kostja the Child and Kostja the Software Engineer.


11:11 am - munmap() performance on Linux
munmap() is slow on Linux, and its cost is linear in the chunk size.
I have known about it for a while, but haven't had a chance to investigate why.

Does anyone know the reason? The problem is so serious that, when working with large memory volumes, Tarantool has to fork off a separate thread just to do munmap(), to avoid serious latency issues in the connection thread.
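The post doesn't show how Tarantool does this. A minimal sketch of the idea with POSIX threads hands the mapping to a background thread, so the calling thread returns immediately while the kernel tears down the pages; struct unmap_job and munmap_in_background are hypothetical names.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical job descriptor: one mapping to release. */
struct unmap_job {
    void *addr;
    size_t len;
};

/* Runs in the background thread; returns munmap()'s result (0 or -1). */
static void *unmap_worker(void *arg)
{
    struct unmap_job job = *(struct unmap_job *)arg;
    free(arg);
    return (void *)(intptr_t)munmap(job.addr, job.len);
}

/* Hand [addr, addr + len) to a new thread and return at once.
 * Returns 0 on success; on failure the caller still owns the mapping
 * and may call munmap() synchronously. Join *tid later to reap the thread. */
static int munmap_in_background(void *addr, size_t len, pthread_t *tid)
{
    assert(addr != NULL && len > 0);
    struct unmap_job *job = malloc(sizeof *job);
    if (job == NULL)
        return -1;
    job->addr = addr;
    job->len = len;
    int rc = pthread_create(tid, NULL, unmap_worker, job);
    if (rc != 0)
        free(job);
    return rc;
}
```

Spawning a thread per call keeps the sketch short; a single long-lived worker fed through a queue would avoid paying for pthread_create() on every release. With older glibc versions the program needs to be linked with -pthread.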

This morning I got a chance to actually write a test program and measure the impact.
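An interesting result is that, per byte, it gets relatively faster with larger chunks. In the output below, t1 is the mmap() time and t2 the munmap() time, both in seconds; note that the column labelled t1/t2 actually holds t2/t1, and the t2/size column is munmap seconds per byte multiplied by 10^8.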
kostja@atlas:~$ ./a.out 
      size   mmap time   munmap time         t1/t2    t2/size
   1048576    0.000006      0.000281     46.833333   0.026798
   2097152    0.000008      0.000388     48.500000   0.018501
   4194304    0.000006      0.001230    205.000000   0.029325
   8388608    0.000008      0.000680     85.000000   0.008106
  16777216    0.000002      0.001550    775.000000   0.009239
  33554432    0.000002      0.002711   1355.500000   0.008079
  67108864    0.000006      0.005897    982.833333   0.008787
 134217728    0.000007      0.010866   1552.285714   0.008096
 268435456    0.000004      0.020137   5034.250000   0.007502
 536870912    0.000005      0.040046   8009.200000   0.007459
1073741824    0.000011      0.067541   6140.090909   0.006290
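The test program itself isn't included in the post. Below is a minimal sketch of a comparable measurement for one chunk size, assuming the pages are touched before unmapping (untouched anonymous pages aren't populated until first accessed, so their munmap() would be much cheaper); time_munmap is a hypothetical name.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Monotonic wall clock in seconds. */
static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Map `size` bytes of anonymous memory, touch every page so the kernel
 * actually allocates it, then time munmap(). Returns the munmap time in
 * seconds and stores the mmap time in *mmap_secs; returns -1.0 on failure. */
static double time_munmap(size_t size, double *mmap_secs)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && size > 0);
    double t0 = now();
    char *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    double t1 = now();
    if (p == MAP_FAILED)
        return -1.0;
    for (size_t off = 0; off < size; off += (size_t)page)
        p[off] = 1;                       /* fault in each page */
    double t2 = now();
    if (munmap(p, size) != 0)
        return -1.0;
    double t3 = now();
    *mmap_secs = t1 - t0;
    return t3 - t2;
}
```

Calling it for sizes from 1 MB to 1 GB and printing the size, both times, their ratio and the munmap time per byte gives the same columns as the output above, which makes it easy to compare Linux with Solaris, Darwin or FreeBSD.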


PS If you have Solaris, Darwin or FreeBSD at hand, I would love it if you ran the same test and pasted your numbers.


01:31 am - Need help from MySQL experts (no kidding)
Today at a conference I was approached with a task typical of a modern web app.

We have a chat system, and need to store and show all messages in the system.
There is no limit on how long messages are stored or on how far back a user can look.

There are two types of queries:
- get all incoming messages for a given user, in chronological order, with pagination.
- show a dialogue of two users, in chronological order, with pagination.

A user is identified by a 32-bit uid.
A message can be uniquely identified by sender_uid and created_time (32-bit), or by uid (destination user) and created_time.

If you store the whole thing in a single table, <uid, created_time, sender_uid, message>, you're stuck with random reads when you need to show a user's inbox in chronological order.

If you store the same message in two places, you get the same mess, but at write time.

How do you best approach this? Is there a canonical solution? Column store?

Thanks,

--
kostja

LiveJournal.com