您的位置:首页 > 运维架构 > Docker

Docker(swarm mode)在一段时间不用后无法启动

2017-01-08 09:43 791 查看
docker1.12版本刚出的时候,自己建了个虚拟机安装实验了下内置的swarm模式的新特性,后来这个虚拟机就一直没用。今天在打开这个虚拟机时,发现docker服务无法启动了,具体现象如下:

[root@node1 /]# service docker start
Redirecting to /bin/systemctl start  docker.service
Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.

查看详细的信息

[root@node1 /]# systemctl status docker.service -l
* docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since 六 2017-01-07 20:19:22 CST; 56s ago
Docs: https://docs.docker.com Process: 2707 ExecStart=/usr/bin/dockerd (code=exited, status=1/FAILURE)
Main PID: 2707 (code=exited, status=1/FAILURE)

1月 07 20:19:21 node1 dockerd[2707]: time="2017-01-07T20:19:21.941128813+08:00" level=warning msg="mountpoint for pids not found"
1月 07 20:19:21 node1 dockerd[2707]: time="2017-01-07T20:19:21.941923814+08:00" level=info msg="Loading containers: start."
1月 07 20:19:21 node1 dockerd[2707]: ...time="2017-01-07T20:19:21.966308550+08:00" level=info msg="Firewalld running: false"
1月 07 20:19:22 node1 dockerd[2707]: time="2017-01-07T20:19:22.458578104+08:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
1月 07 20:19:22 node1 dockerd[2707]: time="2017-01-07T20:19:22.572281786+08:00" level=info msg="Loading containers: done."
1月 07 20:19:22 node1 dockerd[2707]: time="2017-01-07T20:19:22.635556518+08:00" level=fatal msg="Error creating cluster component: error while loading TLS Certificate in /var/lib/docker/swarm/certificates/swarm-node.crt: x509: certificate has expired or is not yet valid"
1月 07 20:19:22 node1 systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
1月 07 20:19:22 node1 systemd[1]: Failed to start Docker Application Container Engine.
1月 07 20:19:22 node1 systemd[1]: Unit docker.service entered failed state.
1月 07 20:19:22 node1 systemd[1]: docker.service failed.

其中有一条错误信息,大致意思是swarm-mode.crt证书已经过期或无效。

error while loading TLS Certificate in /var/lib/docker/swarm/certificates/swarm-node.crt: x509: certificate has expired or is not yet valid

查询docker的issue里,是有一条24132号关于这个问题的讨论的:

Swarm certificates automatically renew and have 90 day expiry period by default. Still, if you don't start the daemon during that time the certificates will expire and starting daemon will fail with
time="2016-06-29T17:18:06.165656736Z" level=fatal msg="Error creating cluster component: error while loading TLS Certificate in /var/lib/docker/swarm/certificates/swarm-node.crt: x509: certificate has expired or is not yet valid"


I think refusing to start and not ignoring this error is correct. We could provide
--reset-swarm
option to leave swarm so the user doesn't need to remove the state dir manually. Problem is that user must remember to remove this option as otherwise, it would clear the state on every next restart as well.

Maybe a good enough solution would be to add instructions for removing the state directory in the error message.

swarm的证书默认是有90天的有效期,如果在有效期内,可以通过自动续期的机制更新证书,但是如果长时间没有启动服务器,超过了有效期,那docker将无法启动。

针对这个问题,我们可以先将/var/lib/docker/swarm目录删除或更名,docker就可以正常启动了。
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签:  Docker Swarm service TLS