
The 946-Millisecond Tax: Migrating API Key Auth from Bcrypt to HMAC-SHA256

[Image: A shipping port at dusk with container trucks queued at a single access checkpoint, cranes and vessels in the background]

I never profiled our authentication middleware. Why would I? It's a key check. The request comes in, you verify the key, you move on. It's plumbing. Then one afternoon I stuck a timer on it and watched it print 946 milliseconds. I re-ran it. Same. Every authenticated request to our API was spending nearly a full second deciding whether the caller was allowed in, before it did a single useful thing.

We were hashing API keys with bcrypt. It felt like the right thing to do. It wasn't.

The 100-Millisecond-Per-Key Tax

When VesselAPI's authentication was first built, someone — by someone I mean me, it was me — did the reasonable thing. User creates an API key, we hash it with bcrypt at cost factor 10 and store the hash. On each incoming request, extract the Bearer token, load the stored hashes from PostgreSQL, and run bcrypt.CompareHashAndPassword against each one until a match is found or the list runs out.

Bcrypt is a password hashing function. It was engineered, on purpose, to be slow. That deliberate slowness is its entire value proposition for passwords: if an attacker steals your database full of password hashes, the CPU cost of bcrypt makes brute-force recovery so expensive it's impractical. At cost factor 10, each comparison takes roughly 65–100 milliseconds depending on hardware. For a user logging into a web app once a day, that's invisible. For every single API call, it's a wall.

Here's what compounded it: bcrypt hashes aren't indexable. The function incorporates a random salt, so the same input produces a different hash each time. You can't compute the hash of an incoming key and look it up in the database — you have to load all the stored hashes and compare them one by one. With 11 active keys in production at the time of our migration, the worst case was 11 bcrypt comparisons. Up to 1.1 seconds of pure CPU work before we even said hello to the request.

11 active keys × ~100ms bcrypt = up to 1.1 seconds of authentication.

The health endpoint with no auth ran in 581ms. Authenticated endpoints ran in 1,466ms. We were paying nearly a second for permission to serve the response.

At the traffic volumes we had, this wasn't causing visible degradation. Our heaviest consumer, a load-test user, made 94,315 requests in a month, spread thinly enough across time that nothing fell over. The problem wasn't that the system was collapsing; it was that it was architecturally incapable of scaling. With 11 keys to check, bcrypt at cost factor 10 caps you at under one authenticated request per second per core. That's not a "tune this later" ceiling. It's a physics ceiling.

Why bcrypt Is the Wrong Tool for API Keys

We grabbed bcrypt because that's what you use for credentials. We never stopped to ask whether API keys are credentials in the same sense passwords are.

Bcrypt's whole thing is being slow on purpose. If someone steals your password database, the cost of each hash attempt is what stands between them and everyone's plaintext passwords. That makes sense for passwords — people pick terrible ones, reuse them, and there are massive precomputed tables for cracking common choices.

API keys are not that. Ours are 256 bits of cryptographically random noise. You'd need something like 2^255 guesses to brute-force one. At a billion attempts per second, that's — let me check — longer than the Sun will exist. So what was bcrypt's cost factor actually buying us? Nothing. The keys were already uncrackable. We were paying 100 milliseconds per request for protection against a threat that doesn't apply.

The real risk with API keys is leakage: someone commits one to a public repo, or it shows up in a log file, or a compromised client exfiltrates it. Bcrypt doesn't help with any of those. What helps is a fast, deterministic hash — something you can index — plus a server-side secret protecting the stored hashes.

So we use a pepper — a server-side secret. We compute HMAC-SHA256 over the API key using a 64-character random secret stored in AWS Secrets Manager. The stored value is the HMAC output, not the key. An attacker with the database still needs the pepper from Secrets Manager to do anything with those hashes — a completely separate security boundary. And HMAC-SHA256 takes about a microsecond to compute, not a hundred milliseconds.

// Before: bcrypt load-and-compare-all (O(n) * 100ms per key)
for _, stored := range allKeyHashes {
    if err := bcrypt.CompareHashAndPassword([]byte(stored), []byte(apiKey)); err == nil {
        return stored, nil
    }
}

// After: HMAC + indexed lookup (O(1), ~1μs + one DB round trip)
mac := hmac.New(sha256.New, []byte(pepper))
mac.Write([]byte(apiKey))
keyHMAC := hex.EncodeToString(mac.Sum(nil))
return db.VerifyApiKeyHMAC(keyHMAC) // indexed lookup, <1ms

Replacing bcrypt with HMAC alone cut auth overhead from ~946ms to ~811ms. Better, but not the kind of better you write a blog post about. Most of that remaining time was network round trips to PostgreSQL — the hash was fast, the wire wasn't. So we kept going.

[Image: Split view of two server racks — left side cluttered with tangled cables and amber warning lights, right side clean and minimal with green status LEDs]

Zero Database Queries on the Hot Path

I'll spare you the detour where I briefly considered Redis before realizing I was optimizing a 17-key lookup and needed to calm down. The cache design we landed on is simple. Two sync.Map instances live in the auth service: keyCache maps HMAC hashes to key metadata, and userCache maps user IDs to subscription and quota information. When both are populated, an authenticated request never touches the database.

Before: Every Request

  • Extract Bearer token
  • DB: load all key hashes
  • bcrypt.CompareHashAndPassword (each key, ~100ms)
  • DB: check per-key quota
  • DB: increment per-key usage counter
  • 3–4 round trips, up to 1.1s of CPU

After: Steady State (cache warm)

  • Extract Bearer token
  • HMAC-SHA256 with pepper (~1µs)
  • sync.Map lookup: keyCache (tens to hundreds of ns)
  • atomic.AddInt64: local counter (~25ns)
  • Memory check: usage < limit? (nanoseconds)
  • Zero database queries

The usage tracking is worth a closer look, because it's where the real database write elimination happens. In the old system, every authenticated request incremented a per-key counter in PostgreSQL. That's a write on the hot path, with all the associated row contention and WAL traffic. Now, we increment a local int64 via atomic.AddInt64 and a background goroutine wakes up every 30 seconds, atomically swaps all the counters to zero, and flushes the totals to the database in a single batch call.

// Simplified — the real flush iterates userCache, where the counter
// lives alongside quota metadata in each cached entry.
func (r *cachedAuthRepository) flush() {
    r.userCache.Range(func(k, v any) bool {
        entry := v.(*userEntry)
        count := atomic.SwapInt64(&entry.localUsageCount, 0)
        if count > 0 {
            // accumulate for batch DB write
        }
        return true
    })
    // single batch call to update all user usage counts
}

Usage numbers can be up to 30 seconds stale, which is fine for monthly quota enforcement but worth understanding. We added one safeguard: when the in-memory counter says a user has exceeded their quota, we do a fresh database lookup before returning 429. This handles plan upgrades — if someone bumps from Starter to Growth mid-session, the cache still thinks their limit is 2,500 until the next DB refresh, so they'd incorrectly hit a wall. The fallback catches it. It adds one DB round trip at the moment that matters most, which is the right trade-off.
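The safeguard is only a few lines. This is an illustrative sketch with made-up type names, not our actual implementation, but it shows the shape of the trade: the DB round trip happens only on the would-be 429:

```go
package main

import "fmt"

// Names here are illustrative, not the real service's types.
type quotaEntry struct {
	used  int64 // in-memory usage (db count + local counter)
	limit int64 // cached monthly limit, refreshed periodically
}

type plans struct{ limits map[string]int64 } // stand-in for a DB lookup

func (p *plans) freshLimit(userID string) int64 { return p.limits[userID] }

// allow pays a DB round trip only when the cached limit says "over",
// which is exactly when a mid-session plan upgrade could make the
// cache wrong.
func allow(e *quotaEntry, userID string, p *plans) bool {
	if e.used < e.limit {
		return true // common case: pure memory check, no DB touch
	}
	if fresh := p.freshLimit(userID); fresh > e.limit {
		e.limit = fresh // pick up the upgrade before returning 429
	}
	return e.used < e.limit
}

func main() {
	p := &plans{limits: map[string]int64{"u1": 10000}} // upgraded to Growth
	e := &quotaEntry{used: 2500, limit: 2500}          // cache still thinks Starter
	fmt.Println(allow(e, "u1", p))                     // true: the fresh lookup saves the request
}
```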

We also fixed something embarrassing while we were in there. The old quota model tracked usage per-key, not per-user. Each key had its own counter and its own limit. Which meant that a user who created three keys got three times the quota. Nobody exploited this, as far as I know, but only because nobody thought to try. We were one curious developer away from an unlimited plan at the price of a free tier and some creativity. Seventeen keys across seven users — early days — so nobody noticed. Per-user counters now. Should have been from day one.

The Lazy Migration

We had 17 API keys in production on migration day. A one-off script to backfill HMAC hashes for all of them would have taken about ten minutes to write and thirty seconds to run. We did the lazy migration instead, and I think it was worth the extra complexity.

Why not just run a backfill script? We could have. It would have taken thirty seconds. But then the bcrypt fallback path — which we wrote, and which we'd need if anything went wrong — would never get exercised in production until the one time it actually matters. With lazy migration, every old key that gets used proves the fallback works. It's testing in prod, but the honest kind.

The flow goes like this. Compute the HMAC of the incoming key. Check the cache — miss. Check the database for a matching key_hmac value — miss, because old keys don't have one yet. Fall through to bcrypt, verify against the old hash, succeed. Now kick off an async HMAC backfill, populate the cache, serve the request. Next time that key shows up, it hits the HMAC path directly and never touches bcrypt again.

// Simplified — real signatures differ slightly
hmacHex := computeHMAC(pepper, apiKey)

// 1. Cache (warm path)
if cached, ok := keyCache.Load(hmacHex); ok {
    return cached.(*keyEntry), nil
}

// 2. HMAC indexed lookup (migrated keys)
if key, err := db.VerifyApiKeyHMAC(hmacHex); err == nil {
    keyCache.Store(hmacHex, key)
    return key, nil
}

// 3. Bcrypt fallback (un-migrated keys, once per key)
keyID, err := db.VerifyApiKey(apiKey)
if err != nil {
    return nil, ErrUnauthorized
}

entry := loadKeyEntry(keyID)
go func() {
    if err := db.BackfillHMAC(keyID, hmacHex); err != nil {
        log.Error("hmac backfill failed", "key_id", keyID, "error", err)
    }
}()
keyCache.Store(hmacHex, entry)
return entry, nil

New keys created through the API key manager get their HMAC computed at creation time and never touch the bcrypt path at all. The lazy path is there for the legacy keys, and once they've each been used once, it's never exercised again.


The Rollout

It took three deployment attempts on February 7th. Wrong Secrets Manager config format, then a fix baked into an AMI that hadn't propagated yet, then a missing IAM permission in a different repo. Standard infrastructure bingo. On the third try the server started, we hit /v1/health with a Bearer token, got 200 OK, and briefly celebrated — before realizing the health endpoint doesn't use auth. We'd validated our new authentication system against the one endpoint that skips it. Hit /v1/vessel/{id} instead, watched the bcrypt fallback fire, watched the HMAC backfill, hit it again — sub-10ms. That one counted.

What the Numbers Actually Look Like

Production measurements from February 7th, network overhead stripped out:

Scenario                       DB Queries   Auth Overhead
Bcrypt fallback, cold cache    4            ~946ms
HMAC lookup, cold cache        2            ~811ms
Cache hit                      0            <10ms

Under 10 milliseconds with a warm cache. Zero database queries. The auth overhead went from being the dominant cost of every request to being lost in the measurement noise.

Throughput ceiling with bcrypt and 11 keys: under 1 authenticated request per second per core. With HMAC and a cache, authentication is no longer the bottleneck for anything.

The Trade-offs Worth Naming

What changed beyond "it got faster":

The security model is different. With bcrypt, even if you have the database, you can't reverse the hashes. With HMAC, if you get both the database and the pepper, you can verify candidate keys. Our keys have enough entropy that this doesn't matter in practice, but it's a weaker property and I'd rather be upfront about it.

Usage numbers can be stale by up to 30 seconds. A user could overshoot their quota slightly before the flush catches it. At monthly limits measured in thousands, I genuinely don't care. If we ever need per-second rate limiting, we'd use Redis, not this.

There's also a concurrency wrinkle I should own up to: the usage counter itself is atomic, but the dbUsageCount and monthlyLimit fields on the same struct are plain int values read and written without synchronization. Multiple goroutines can read stale values, and worse, the quota-refresh path writes those fields directly on the shared struct without a lock — two concurrent refreshes could clobber each other. I think this is fine at our scale. I'm not entirely sure it's fine. It probably needs a mutex, and I'll probably add one the next time I'm in that file and feeling responsible.

The cache entries do have a TTL — both key and user entries expire after a configured duration, and the lookup re-fetches from the database when they go stale. But there's no active invalidation push. If you revoke a key, it stays valid in memory until that TTL window elapses. At current scale — 17 keys, 7 active users — this is the trade-off that actually kept me up at night, more than the others. We'll add a revocation broadcast before we onboard any enterprise customers. For now, the TTL window is short enough that I can live with it.

Two weeks after migration day, all 17 pre-existing keys have been lazily migrated. New keys get HMAC on creation. The bcrypt fallback path exists in the code and will presumably never run again, but I'm not ready to remove it yet. It's not hurting anyone, and the fallback existing is a different kind of insurance than the fallback being needed.

[Image: An old brass padlock sitting open on a weathered wooden desk, cobwebs on the shackle, a laptop with terminal code blurred in the background]

The performance improvement worked like the math said it would. What I didn't expect was how many other things the migration shook loose — the per-key quota bug, the stale schema assumptions in other services, the concurrency shortcuts we'd been getting away with. You open up one table and suddenly you're looking at every decision you made in the first six months and wondering which ones are still load-bearing. Some of them weren't. Some of them still are.
