Debugging at 2 AM: What Actually Helps
When production is down and your brain is foggy, these are the tools and habits that save you.
It's 2 AM. Your phone just buzzed with a PagerDuty alert. Something is broken in production. Your brain is running at maybe 40% capacity.
This is not the time for clever debugging. This is the time for systematic, boring, reliable techniques that work even when you're half asleep.
Here's what I've learned from too many late-night incidents.
Step 1: What Changed?
Nine times out of ten, a production bug traces back to something that changed recently. A deployment. A config update. A database migration. An expired certificate.
Before you start reading code, figure out what's different from when it last worked.
Compare the current config to the last known good one. A text diff tool makes this a 10-second operation instead of squinting at two files side by side.
Seriously. I once spent 45 minutes debugging a "mysterious" API failure that turned out to be a misplaced comma in a JSON config. A diff would have caught it in seconds.
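You don't even need a website for this; Python's standard library can diff two configs in a few lines. A minimal sketch, with made-up config contents where the only change is a typo in a key name:

```python
import difflib

# Hypothetical configs: "last known good" vs. current.
# The only change is a misspelled key -- exactly the kind of thing
# tired eyes skip over at 2 AM.
good = """\
host = api.internal
port = 5432
timeout = 30
"""
current = """\
host = api.internal
port = 5432
timout = 30
"""

# Unified diff: changed lines get -/+ markers, so the typo jumps out.
diff = "\n".join(difflib.unified_diff(
    good.splitlines(), current.splitlines(),
    fromfile="last-good.conf", tofile="current.conf", lineterm=""))
print(diff)
```

The diff shows exactly one `-`/`+` pair: `timeout` became `timout`. Ten seconds, no squinting.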
Step 2: Read the Actual Error
This sounds obvious. It isn't.
When you're tired, you see an error message and immediately start hypothesizing. "Oh, it's probably the database connection." You spend 20 minutes checking the database. Meanwhile, the error message says TypeError: Cannot read property 'name' of undefined. Nothing to do with the database.
Read the error. Read it again. Read the stack trace. Follow it to the exact line of code.
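Here's a Python analogue of that JavaScript error (the `load_user` function and its payload are hypothetical). Notice that the traceback's last frame names the exact function and source line, which is where you should start reading, not the database:

```python
import traceback

def load_user(payload):
    # The bug: payload can be None, e.g. when an upstream lookup misses.
    return payload["user"]["name"]

try:
    load_user(None)
except TypeError:
    # Capture the full traceback as text, like you'd see it in a log.
    tb = traceback.format_exc()

# The final frame points at the failing line inside load_user --
# the stack trace has already done the detective work for you.
print(tb)
```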
Step 3: Make the Error Readable
Production logs are messy. Everything is on one line, JSON objects are unformatted, timestamps are in Unix epoch format.
Copy that blob of JSON, paste it into a JSON formatter, and suddenly you can actually read the error response.
I keep a formatter tab permanently open. During incidents, I use it every few minutes to make sense of API responses, log entries, and config files.
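There's no magic behind that formatter tab: it's one round-trip through a JSON parser. A sketch in Python, with a made-up error blob standing in for a real log line:

```python
import json

# A one-line log blob, the way it usually arrives (hypothetical response).
blob = ('{"error":{"code":502,"message":"upstream timeout",'
        '"retryable":true},"request_id":"abc123"}')

# Parse, then re-serialize with indentation -- that's the whole trick.
pretty = json.dumps(json.loads(blob), indent=2, sort_keys=True)
print(pretty)
```

Handy when you're on a locked-down bastion host with no browser: `python3 -m json.tool` does the same thing from the command line.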
Step 4: Isolate, Don't Guess
Tired brains love to guess. "Maybe it's this. Let me try changing that."
Resist this. Each random change you make adds uncertainty. Now you don't know if the original problem is fixed or if your fix introduced a new problem.
Instead: reproduce the error with the smallest possible input. Strip away everything that isn't necessary. Find the one variable that matters.
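One systematic way to do this when a batch job chokes on one bad input: binary-search the data instead of eyeballing it. A sketch in Python, where `process` and the records are hypothetical and I'm assuming exactly one record fails on its own (not in combination with others):

```python
def process(records):
    # Hypothetical job that blows up on one malformed record.
    for r in records:
        if r is None:
            raise ValueError("bad record")

def shrink(records):
    """Binary-search for the index of the single failing record."""
    lo, hi = 0, len(records)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        try:
            process(records[lo:mid])
        except ValueError:
            hi = mid   # the culprit is in the first half
        else:
            lo = mid   # first half is clean; culprit is in the second
    return lo

records = [1, 2, 3, None, 5, 6, 7, 8]
print(shrink(records))  # → 3, the index of the bad record
```

Three runs of `process` instead of eight guesses, and every run tells you something definite.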
Step 5: Check the Boring Stuff
Before you dive into code, check:
- DNS: Is it resolving correctly?
- Certificates: Did one expire?
- Disk space: Is the server full?
- Memory: Is something leaking?
- Dependencies: Is an external API down?
I keep a sticky note on my monitor with these five items. During an incident, I check them first. They've been the root cause more often than actual code bugs.
The 2 AM Debugging Toolkit
These are the browser tabs I open when an alert wakes me up:
- Text diff — for comparing configs, environment variables, deployment files
- JSON formatter — for reading API responses and log entries
- Regex tester — for searching through logs with patterns
- A plain text editor — for drafting notes about what I've tried
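As an example of the regex-tester workflow, here's the kind of pattern I end up writing: pull the timestamp and job id out of every ERROR line. The log excerpt and its format are made up for illustration:

```python
import re

# Hypothetical production log lines.
log = """\
2024-03-07T02:13:05Z INFO  worker-3 job started id=9912
2024-03-07T02:13:07Z ERROR worker-3 upstream timeout id=9912
2024-03-07T02:13:09Z ERROR worker-1 upstream timeout id=9914
2024-03-07T02:13:11Z INFO  worker-3 retrying id=9912
"""

# MULTILINE makes ^ and $ match at each line, not just the whole string.
pattern = re.compile(r"^(\S+) ERROR \S+ .* id=(\d+)$", re.MULTILINE)
errors = pattern.findall(log)
for ts, job_id in errors:
    print(ts, job_id)
```

The point of the tester tab is iterating on that pattern against real log lines until it matches what you mean, before you unleash it on a gigabyte of logs.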
That last one matters more than you think. At 2 AM, your short-term memory is garbage. Write down what you tried and what happened. Future-you will thank past-you.
After the Incident
Once the fire is out, don't just go back to sleep. Spend five minutes writing down:
- What broke
- Why it broke
- What fixed it
- How to prevent it next time
You will not remember this tomorrow. Write it down now.
The best 2 AM debugging isn't about being smart. It's about being systematic. Diff the config. Read the error. Format the logs. Check the boring stuff. Write it down. That's the whole playbook.