24 hours of outage, probably due to Docker


All my sites on a particular server went down at 00:13 last night. There was nothing in any of the logs. The Docker containers running the websites themselves responded to curl requests made from the server itself. So that left the https-portal container, which reverse-proxies all the sites, as the culprit. Not that I think this is https-portal's fault, actually -- this seems like Docker being more generally unstable (I only came across that article while diagnosing this problem -- holy hell). Some weird errors when I tried to kill and restart the container in question corroborated this.
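The process of elimination above (backends answer locally, so the proxy must be the broken layer) can be sketched as a small check. This is only an illustration, not what I actually ran: the function names `is_up` and `diagnose` and the addresses in the usage note are made up, and it assumes each site's container is reachable on a local port.

```python
import urllib.error
import urllib.request


def is_up(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if `url` produces any HTTP response at all."""
    try:
        with opener(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status (e.g. 404 or 500),
        # so something is listening and answering.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS or TLS failure, and so on.
        return False


def diagnose(backend_up, proxy_up):
    """Map the two probe results to the layer that is most likely broken."""
    if backend_up and proxy_up:
        return "ok"
    if backend_up:
        # The site answers directly but not through the reverse proxy.
        return "proxy"
    return "backend"
```

Run from the server, something like `diagnose(is_up("http://127.0.0.1:8080/"), is_up("https://example.com/"))` would return `"proxy"` in the situation described here; both addresses are placeholders for a container's local port and the public site.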

I really love https-portal. I don't want to give it up. But it certainly is a critical point of failure.

(The name of this article is a little misleading -- I fixed the problem in 10 minutes, it just took me 24 hours to get around to doing so. Nothing big was impacted, just a couple of personal projects, including this blog.)