Hashing Explained: MD5, SHA, and When to Use Each
Hash functions turn data into fingerprints. Here's how they work and why MD5 isn't dead yet.
You download a file. The website shows an MD5 checksum. You're supposed to verify it somehow. But why? And what's the difference between MD5 and SHA-256?
Hash functions are everywhere in software. Understanding them helps you use them correctly.
What Hashing Does
A hash function takes input of any size and produces a fixed-size output. The same input always gives the same output. Change one bit of the input, and the output changes completely.
"hello" → 5d41402abc4b2a76b9719d911017c592 (MD5)
"Hello" → 8b1a9953c4611296a827abf8c47804d7 (MD5)
One capital letter changes the entire hash. That's the point.
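The same comparison can be reproduced with Python's standard hashlib module. This is a minimal sketch; the helper name md5_hex is only for illustration.

```python
import hashlib

def md5_hex(text: str) -> str:
    """Return the MD5 digest of a UTF-8 string as 32 hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

lower = md5_hex("hello")
upper = md5_hex("Hello")

print(lower)  # 5d41402abc4b2a76b9719d911017c592
print(upper)  # 8b1a9953c4611296a827abf8c47804d7

# The output length is fixed no matter how long the input is.
print(len(md5_hex("a" * 1_000_000)))  # 32
```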
Hash vs Encryption
Hashing is one-way. You can't reverse a hash to get the original data.
Encryption is two-way. With the right key, you can decrypt and recover the original.
If someone says they can "decrypt" an MD5 hash, they're either cracking it (trying inputs until one matches) or they're confused about how hashing works.
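To make "cracking" concrete, here is a rough sketch of a dictionary attack. Nothing gets reversed: the code hashes guesses until one matches. The function name crack_md5 and the tiny word list are made up for this example.

```python
import hashlib
from typing import Iterable, Optional

def crack_md5(target_hex: str, candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate whose MD5 matches target_hex, else None."""
    for guess in candidates:
        if hashlib.md5(guess.encode("utf-8")).hexdigest() == target_hex:
            return guess
    return None

# Hypothetical word list; real attacks use lists with millions of entries.
wordlist = ["password", "letmein", "hello", "qwerty"]

print(crack_md5("5d41402abc4b2a76b9719d911017c592", wordlist))  # hello
```

If the original input isn't in the list of guesses, the function returns None and the hash stays uncracked.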
The Common Algorithms
MD5 - 128-bit output. Fast but cryptographically broken. Fine for checksums, not for security.
SHA-1 - 160-bit. Also broken for cryptographic use. Legacy systems still use it.
SHA-256 - 256-bit. Current standard. Use this when security matters.
SHA-512 - 512-bit. Larger security margin than SHA-256, and sometimes faster on 64-bit systems because it works on 64-bit words.
BLAKE2/BLAKE3 - Modern alternatives. Faster than SHA-256 with similar security.
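The output sizes listed above can be checked with hashlib. This sketch uses BLAKE2b as the BLAKE2 variant (its default digest is 512 bits). BLAKE3 is not part of Python's standard library, so it is left out here.

```python
import hashlib

message = b"hello"

# Map each algorithm name to its digest size in bits.
sizes = {
    name: hashlib.new(name, message).digest_size * 8
    for name in ("md5", "sha1", "sha256", "sha512", "blake2b")
}

for name, bits in sizes.items():
    print(f"{name:8} {bits:4} bits")
```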
When MD5 Is Fine
MD5 gets a bad reputation because it's "broken." But broken for cryptography doesn't mean useless.
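File integrity checks. Verifying that a download hasn't been corrupted. MD5 reliably catches accidental corruption. To swap in a malicious file, an attacker would need one that matches the existing hash, and no practical attack does that for MD5 yet. When deliberate tampering is a real concern, use SHA-256 anyway.
Cache keys. Hashing content to generate unique identifiers. No security implications, as long as nobody gains anything by forcing two entries to collide.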
Deduplication. Finding duplicate files by comparing hashes.
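As an illustration of the deduplication case, the sketch below groups files under a directory by their MD5 digest. The function names file_md5 and find_duplicates are invented for this example, and files are read in chunks so large files don't have to fit in memory.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks and return the hex digest."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(root: Path) -> list:
    """Return groups of paths under root whose contents hash identically."""
    by_hash = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_hash[file_md5(path)].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

If the files come from untrusted sources, swapping hashlib.md5 for hashlib.sha256 avoids the risk of two deliberately crafted files being treated as duplicates.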
When to Use SHA-256 or Better
Password storage. Actually, not even SHA-256: use bcrypt or Argon2 instead. They're designed for passwords and deliberately slow, which makes guessing expensive.
Digital signatures. Document integrity where adversaries might try forgery.
Anything security-critical. When an attacker has incentive to find collisions.
The Collision Problem
A collision is when two different inputs produce the same hash. Birthday math says collisions become likely sooner than you'd expect: for an n-bit hash, after roughly 2^(n/2) random inputs rather than 2^n.
For MD5, researchers have found ways to create intentional collisions. Two different files, same hash. That's why it's "broken."
For SHA-256, no practical collision attacks exist. Yet.
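The birthday math can be estimated with the standard approximation p ≈ 1 - exp(-k(k-1) / 2N), where k is the number of random inputs and N is the number of possible hash values. The sketch below applies it. It assumes an ideal hash with uniformly random outputs, so it says nothing about deliberate attacks like the ones on MD5.

```python
import math

def collision_probability(num_inputs: int, hash_bits: int) -> float:
    """Approximate chance that any two of num_inputs random inputs collide."""
    space = 2 ** hash_bits
    exponent = -num_inputs * (num_inputs - 1) / (2 * space)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)

# 2^64 random inputs against a 128-bit hash: a collision is already plausible.
print(collision_probability(2 ** 64, 128))

# The same number of inputs against a 256-bit hash: effectively zero.
print(collision_probability(2 ** 64, 256))
```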
Practical Examples
Verifying downloads:
sha256sum ubuntu.iso
# Compare output to the hash on the download page
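The same check can be scripted. This is a sketch using hashlib; the function names are hypothetical, and the expected hash is whatever the download page publishes.

```python
import hashlib

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a multi-gigabyte ISO doesn't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA-256 to a published hex digest, ignoring case."""
    return sha256_file(path) == expected_hex.strip().lower()
```

A call would look like matches_published_hash("ubuntu.iso", "<hash from the download page>").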
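Storing API keys:
Store the hash, not the actual key. When a user submits a key, hash it and compare the result to the stored hash. A fast hash like SHA-256 is enough here because API keys are long and random, unlike passwords.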
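One possible shape for this, sketched with the standard library only. The function names are invented for the example, and a real service would keep the stored hash in its database rather than in a variable.

```python
import hashlib
import hmac
import secrets

def hash_key(key: str) -> str:
    """Return the SHA-256 hex digest of an API key."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

def new_api_key() -> tuple:
    """Generate a random key; return (key to show the user once, hash to store)."""
    key = secrets.token_urlsafe(32)
    return key, hash_key(key)

def is_valid(submitted: str, stored_hash: str) -> bool:
    """Hash the submitted key and compare it to the stored hash."""
    # compare_digest runs in constant time, so timing doesn't leak a near-match.
    return hmac.compare_digest(hash_key(submitted), stored_hash)
```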
Git commits:
Git uses SHA-1 by default to identify commits and every other object it stores. That's why commit hashes look like a1b2c3d4e5f6...
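Git computes an object ID by hashing a short header (the object type and size) followed by the content. Commits are hashed the same way with a "commit" header. The sketch below does it for the simplest case, a file blob, and the function name git_blob_id is made up for the example.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 object ID Git assigns to a file's contents."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

# The empty file has a well-known ID in every SHA-1 Git repository.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The result should match what git hash-object prints for the same bytes.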
Don't Roll Your Own
Never invent your own hashing scheme. Homemade constructions like "MD5 twice" or "SHA-256 with salt appended" are still fast to brute-force, so they're worse than established, purpose-built algorithms.
For passwords: bcrypt, scrypt, or Argon2. For integrity: SHA-256. For speed: BLAKE3.
The experts have already figured this out.
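For the password case, Python's standard library ships scrypt through hashlib.scrypt (on builds linked against a recent OpenSSL). The sketch below shows the usual pattern of a random salt per password plus a slow hash. The cost parameters are illustrative assumptions, not a recommendation, and the function names are invented for the example.

```python
import hashlib
import hmac
import os

# Assumed cost settings for illustration; tune them for your own hardware.
SCRYPT_PARAMS = {"n": 2 ** 14, "r": 8, "p": 1}

def hash_password(password: str) -> tuple:
    """Return (salt, digest); store both alongside the user record."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the digest with the stored salt and compare in constant time."""
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(digest, expected)
```

Because each password gets its own salt, two users with the same password end up with different stored digests.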
Hash functions are simple in concept but nuanced in application. Pick the right algorithm for the job, and don't use cryptographic hashing where a simple checksum would do.