A CRC-32 reversal implementation

August 5th, 2008

(This post is mainly intended for fellow geeks. Non-programming readers of this blog may safely ignore this, I promise you won't be missing out on anything useful.)

The CRC-32 checksum algorithm is everywhere. It is used to detect random data corruption in ethernet packets, PNG graphics files and—as it turns out—zip files. The basic idea is that a 4 byte checksum of a sequence of bytes of arbitrary length is calculated using a specific algorithm. The sender of a message constructs a checksum of the message and send it along with the actual data, and then the receiver can compare the checksum with it's own checksum calculated from the payload data and use that information to detect random data corruption.

The algorithm is designed to be fast and easy to implement in hardware. You can view it as a machine that you feed bytes of data. At any time you can query the machine for a checksum value of all the data fed into the machine. Each byte fed alters it's internal state so that a different 4 byte checksum value is returned. Even a single bit change in the incoming data will alter the checksum to a completely different value.

The algorithm is explained in a very detailed way with impressive clarity in A painless Guide to CRC error detection algorithms so if you are interested, please read that document.

My conundrum with regards to CRC-32 was that I would like to make a short-cut, calculating the resulting checksum of a short dynamic sequence of bytes (an ID3 tag) combined with a longer, static sequence of bytes (an MP3 file) without needing to re-calculate the checksum of the whole every time the dynamic part changed.

So, what to do. Calculating a backwards CRC-32 value for the latter part of the stream to be combined with the checksum of the dynamic part seems to be too hard a problem to solve. However, I got a tip from a friend that with a 4 byte sequence of "compensation data" the checksum can be reset to whatever value one chooses. That way the dynamic tag plus four well chosen bytes can be reset to a known good state for the CRC-32 machine to chug along from and arrive at a known destination.

I got a link to a paper describing an algorithm to calculate the magic compensation bytes, but I think that the description of the algorithm was so complicated that I decided to figure out a way to do it on my own. So, I did. After a bit of experimenting looking at the table based algorithm used in the GNU Classpath CRC32.java I found out a way to calculate the magic reversal bytes as a function of some current checksum as well as a desired checksum.

I have released the resulting CRC32Compensator.java under the MIT Free Software license, in the hope that someone else will find it useful or instructive.


Trackback URI | Comments RSS

Leave a Reply

Name (required)

Email (required)

Website

Speak your mind