$\begingroup$

I would like to maintain an index of unique data blocks (each up to 1 MiB in size), using the SHA-256 hash of each block as the key. There is obviously a chance of hash collisions, so what is the best way to reduce that risk? If I also calculate a second hash of the block (e.g. MD5) and use the pair (SHA-256, MD5) as the key, is the chance of a collision about the same as for a single 384-bit hash function, or is it slightly better because I'm using two different hash functions?

Thanks for the info!

Edit: My blocks come from ordinary user data on hard drives, but there will be many petabytes of it in total.
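
For a sense of scale, here is a rough back-of-the-envelope sketch using the standard birthday approximation. It assumes the hash behaves like an ideal random function, and it assumes a hypothetical total of 2^40 blocks (about 1 EiB if every block were a full 1 MiB). The function name and the block count are placeholders of mine, not measured figures.

```python
import math


def log2_collision_probability(num_blocks: int, hash_bits: int) -> float:
    """Return log2 of the birthday-bound estimate for any collision.

    The estimate is (number of block pairs) / 2**hash_bits. It assumes an
    ideal hash and is only meaningful while the result is well below 0.
    """
    pairs = num_blocks * (num_blocks - 1) // 2  # exact: n*(n-1) is always even
    return math.log2(pairs) - hash_bits


# Hypothetical workload: 2**40 blocks (an assumption, not a measurement).
NUM_BLOCKS = 2**40

for bits in (256, 384):
    exponent = log2_collision_probability(NUM_BLOCKS, bits)
    print(f"{bits}-bit key: collision probability is about 2^{exponent:.1f}")
```

Under those assumptions the estimate comes out near 2^-177 for a 256-bit key and near 2^-305 for a 384-bit key. This says nothing about deliberately constructed collisions, only about accidental ones.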

Edit2: As a follow-up (just tell me if this should be moved to a separate question): since the blocks vary in size, up to some preconfigured limit (e.g. 1 MiB), how is collision resistance affected if I make the (64-bit) size of the block part of the key? That way, collisions would only be possible between blocks of the same size.
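
To make the keying schemes I am asking about concrete, here is a minimal sketch using Python's standard `hashlib`. The names `block_key` and `add_block`, and the choice of a plain `dict` as the index, are only illustrative assumptions, not what the real system has to look like.

```python
import hashlib


def block_key(block: bytes, with_md5: bool = False, with_size: bool = False) -> bytes:
    """Build the index key for a block (hypothetical helper).

    Base key:  SHA-256 digest (32 bytes).
    with_md5:  append the MD5 digest (16 bytes), giving a 384-bit hash pair.
    with_size: append the block length as a 64-bit big-endian integer.
    """
    key = hashlib.sha256(block).digest()
    if with_md5:
        key += hashlib.md5(block).digest()
    if with_size:
        key += len(block).to_bytes(8, "big")
    return key


def add_block(index: dict, block: bytes, **key_options) -> bytes:
    """Store the block under its key unless that key is already present."""
    key = block_key(block, **key_options)
    index.setdefault(key, block)
    return key


index = {}
k1 = add_block(index, b"some user data", with_md5=True, with_size=True)
k2 = add_block(index, b"some user data", with_md5=True, with_size=True)
print(len(k1), k1 == k2, len(index))
```

Note that `setdefault` silently treats two different blocks with the same key as duplicates, which is exactly the collision risk the question is about.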