We upgraded our EKS clusters to 1.14.7 about two weeks ago. Since then, pod communication, both between pods and to the internet, has been failing intermittently with “Unable to resolve hostname”, which more or less indicates that DNS queries are timing out.
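To put a number on "intermittently", a small probe run from inside an affected pod can repeat a lookup and count how many attempts fail or stall. The following is a minimal sketch, not the tooling we actually used: the function name `probe_dns`, the 1-second `slow_threshold`, and the injectable `resolve` parameter are all assumptions made for illustration.

```python
import socket
import time


def probe_dns(hostname, attempts=100, slow_threshold=1.0,
              resolve=socket.getaddrinfo):
    """Resolve `hostname` repeatedly and count failed and slow lookups.

    `slow_threshold` (seconds) is an arbitrary illustrative cut-off.
    `resolve` is injectable so the probe can be exercised without a network.
    """
    failures = 0
    slow = 0
    for _ in range(attempts):
        start = time.monotonic()
        try:
            resolve(hostname, None)
        except socket.gaierror:
            # The resolver gave up: the failure mode behind
            # "Unable to resolve hostname".
            failures += 1
            continue
        if time.monotonic() - start >= slow_threshold:
            # The lookup succeeded, but only after a long wait.
            slow += 1
    return {"attempts": attempts, "failures": failures, "slow": slow}
```

Calling `probe_dns("kubernetes.default.svc.cluster.local")` in a pod returns a small dictionary of counts. On a healthy node both `failures` and `slow` should stay at or near zero; a steady trickle of either suggests the node is affected.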

Digging into the problem, it turns out we are not the only ones:

There are already some good write-ups about this issue:

Long story short, we need these patches in the netfilter module:

The patches above are merged and available in kernel version 4.19. The EKS AMI uses kernel 4.14 (the EKS team has already acknowledged the issue, tracking ticket here).
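A quick way to see which side of that line a node falls on is to compare its kernel release string against 4.19. The sketch below is only a rough indicator, since a distribution kernel may carry backported patches that the version number does not reveal. The helper name `kernel_at_least` and the sample release string in the comment are hypothetical.

```python
import platform
import re


def kernel_at_least(release, minimum=(4, 19)):
    """Return True if a kernel release string is at or above `minimum`.

    Only the leading "major.minor" part is compared, so for a string
    such as "4.14.146-119.123.amzn2.x86_64" (a hypothetical example)
    everything after "4.14" is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    version = (int(match.group(1)), int(match.group(2)))
    return version >= minimum


def current_kernel_has_fix():
    """Check the kernel of the machine this runs on."""
    return kernel_at_least(platform.release())
```

Inside a pod, `platform.release()` reports the kernel of the node the pod is scheduled on, so `current_kernel_has_fix()` can be run from any container with Python available.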