Member-only story
Microservice Forensics 101
Recently we were brought in by a customer to investigate a problem with their microservices architecture where specific transactions weren’t yielding the expected results. Since some of their microservices heavily rely on one another, it wasn’t immediately clear which one of them was the one misbehaving.
After a few days of combing through their code, we finally found the culprit. These are some of the lessons we learned while debugging.
1. Run it locally
With a microservices infrastructure where at least a dozen docker containers are required to debug a specific part of a transaction, you might be inclined to delay setting up a local development environment as long as possible. Sure, it’s a hassle, but being able to make changes directly in the code and SSH’ing directly into boxes to debug and see logs makes it so much easier to actually reproduce a transaction in a way that lets you find the root cause of the problem. In the end, it’s much faster than only poking around in production logs and going through code.
2. One service at a time
Don’t try to understand what the full lifecycle of a transaction is through the myriad of API calls that connect microservices to each other. Instead, go for a one-by-one approach. Understand what a certain service does with your input data, look at the logs for that one service when you replay the transaction, verify the output contains what you would expect it to contain, and move on to the next…