A better MySQL CHECKSUM TABLE, fixing Bug#39474 - Fish Magic — LiveJournal


September 28th, 2010


01:42 pm - A better MySQL CHECKSUM TABLE, fixing Bug#39474
The current MySQL table checksum is very simple: it's basically the same as CRC32(CONCAT(all data in the table)).
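To make that description concrete, here is a minimal Python sketch of a "CRC32 over everything concatenated" checksum. It only illustrates the formula as stated above and is not MySQL's actual code; the function name `concat_checksum` and the representation of rows as tuples of strings are assumptions made for the example.

```python
import zlib

def concat_checksum(rows):
    """CRC32 of all column values of all rows, fed in order.

    Passing the running value back into zlib.crc32 gives the same result
    as computing CRC32 over one big concatenated byte string.
    Hypothetical helper for illustration, not MySQL's implementation.
    """
    crc = 0
    for row in rows:
        for value in row:
            crc = zlib.crc32(str(value).encode("utf-8"), crc)
    return crc

# Field boundaries vanish under plain concatenation:
# both tables below reduce to the bytes b"abc".
table_a = [("ab", "c")]
table_b = [("a", "bc")]
same = concat_checksum(table_a) == concat_checksum(table_b)

# Row order matters, because the bytes are consumed sequentially.
ordered = concat_checksum([("a",), ("b",)])
swapped = concat_checksum([("b",), ("a",)])
```

Two properties of this naive formula show up immediately: tables with different field boundaries can collide, and the result depends on the order in which rows are read.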
Customers have complained about the algorithm for a long time, but one doesn't change such things every day.
Now the time has come to make the change.
The only question is how much the checksum formula should change. Is it sufficient to just fix Bug#39474, or should we take the opportunity to do more?
  • Is crc32 a good enough function for a checksum? Should we start using some other hash function?
  • Should the checksum change when table metadata changes, e.g. when you change the underlying data type? What about changing the table comment, or the order of columns in the table?
  • Any other issues we should address along the way?
Your input is much appreciated!


Comments:


From:kostja_osipov
Date:September 28th, 2010 11:01 am (UTC)

MyISAM checksum

Mental note: make sure the algorithm is consistent with CHECKSUM TABLE QUICK, which is currently only available for MyISAM.
Is it OK if different CHECKSUM variants produce different results? Perhaps it is.
From:kostja_osipov
Date:September 28th, 2010 11:24 am (UTC)
Another mental note: check whether we need to take pieces of Monty's implementation, as described in Bug#37007.
From:kostja_osipov
Date:September 29th, 2010 11:33 am (UTC)

mn#3

pz suggests using a 64-bit checksum.
From:kostja_osipov
Date:September 29th, 2010 11:33 am (UTC)
mats says there are no known immediate issues with replication (rpl).
From:ext_270508
Date:September 29th, 2010 12:17 pm (UTC)

Replication issues and future

Changing the checksum algorithm will not per se cause any replication problems, but it would be good if we could distinguish between a checksum for the definition and one for the data. Changing the name of a field, for example, might be important for some applications but not for others.

Also, it would be good if a checksum algorithm could be maintained incrementally: that would allow quick checking of consistency between tables on master and slave, even if they are huge. This basically requires a linear checksum function.
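
One possible reading of "linear" here is a checksum built by adding per-row hashes modulo 2^32, so that an insert adds a term and a delete subtracts it. The Python sketch below is only an illustration under that assumption; `row_crc`, `IncrementalChecksum`, and `full_checksum` are hypothetical names, and nothing in it reflects an actual MySQL design.

```python
import zlib

MASK = 0xFFFFFFFF  # keep all arithmetic modulo 2**32

def row_crc(row):
    """CRC32 of one row; each field is length-prefixed so that
    field boundaries contribute to the hash."""
    crc = 0
    for value in row:
        data = str(value).encode("utf-8")
        crc = zlib.crc32(len(data).to_bytes(4, "big"), crc)
        crc = zlib.crc32(data, crc)
    return crc

class IncrementalChecksum:
    """Table checksum maintained row by row, without rescanning."""

    def __init__(self):
        self.value = 0

    def insert(self, row):
        self.value = (self.value + row_crc(row)) & MASK

    def delete(self, row):
        self.value = (self.value - row_crc(row)) & MASK

    def update(self, old_row, new_row):
        self.delete(old_row)
        self.insert(new_row)

def full_checksum(rows):
    """Reference value: recompute from scratch over all rows."""
    total = 0
    for row in rows:
        total = (total + row_crc(row)) & MASK
    return total

# Apply a few changes incrementally, then compare with a full scan.
cs = IncrementalChecksum()
cs.insert((1, "alice"))
cs.insert((2, "bob"))
cs.insert((3, "carol"))
cs.delete((2, "bob"))
cs.update((3, "carol"), (3, "caroline"))
remaining = [(3, "caroline"), (1, "alice")]  # scan order does not matter
```

Because addition is commutative, the result does not depend on the order in which rows are scanned, which is what makes comparing two large tables cheap once each side keeps its own running value. Addition is used rather than XOR so that two identical rows do not cancel each other out.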
