A few months ago, we ran into an issue where some AWS lambda logs weren’t appearing in Kibana. It wasn’t happening too frequently, but it was frequent enough where it worried SRE and the senior software engineers. Strangely, when we would search in Kibana for these relevant logs, they weren’t appearing. With this discover in mind, I got to work.
For this particular architecture, a lambda (called the message bus) collected application logs from the other relevant lambdas via an SQS queue and then POSTed them to the elasticsearch box in our ELK stack. The log payload itself was just JSON data for sign ups.
First, I confirmed that the lambda had access to POST to elasticsearch. I sent a few example POST requests to elastic from my machine to confirm elasticsearch was working. I also used the lambda test event to send some test messages and confirmed that those messages were appearing as well. This was clear confirmation that the lambda was able to make direct request to the elasticsearch cluster and that elasticsearch was properly receiving the data
Note: I found the data in the elasticsearch cluster using the search API
Second, I checked Kibana. While going through the Kibana console, I confirmed that the majority of the test messages that I sent weren’t appearing. I started by recreating the index and confirming that Kibana was properly connected to elasticsearch.
Finally, I figured out the solution! It turned out to be two things: First, the timestamp in the logs wasn’t in a format that Kibana supported natively. Second, some of the log messages weren’t properly updating the timestamp. The JSON had two places for timestamps, one for when the log was written, and one for when the actual data was written. For the test data in dev (along with some data that got POSTed later because of backed up SQS queues), it had the date set very far in the past (as far back as January 2020 which is before the service was even made). So when we looked up the data in Kibana, it didn’t appear because we were only looking back the past few days.
We fixed this issue by updating the index in Kibana to use the field for when the log was written and by ensuring that the timestamp is always set.