
Swarm Scheduling in Practice


Swarm Scheduling in Practice: the Scheduler

When a container is created, the scheduler module is responsible for picking the best node for it. Selecting that node happens in two stages.

The first stage is filtering: nodes that meet the requirements are selected according to a set of conditions. There are the following five filters:

(1) Constraints, the constraint filter. It can filter on conditions such as the node's operating system type, kernel version, and storage type. You can also define custom constraints by passing labels to the daemon at startup to declare the characteristics of that host (see the sketch after this list).

(2) Affinity, the affinity filter. It supports container affinity and image affinity. For a web application, for example, if you want the db container and the web container to be placed together, this filter makes that possible.

(3) Dependency, the dependency filter. If a container is created with --volumes-from, --link, or --net pointing at another container, the new container is placed on the same node as the container it depends on.

(4) Health filter, which filters on node status and excludes failed nodes.

(5) Ports filter, which filters nodes based on port usage.
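
As a rough sketch of how the constraint and dependency filters are used at container-creation time (the container names dbdata and dbbackup are made up for illustration; the storage=ssd label and the Swarm manager endpoint come from the cluster configured later in this post):

 # constraint filter: only nodes whose daemon was started with --label storage=ssd qualify
 docker -H 172.16.10.216:2376 run -itd -v /data -e constraint:storage==ssd --name dbdata ubuntu:14.04
 # dependency filter: dbbackup uses dbdata's volumes, so it is placed on the same node
 docker -H 172.16.10.216:2376 run -itd --volumes-from dbdata --name dbbackup ubuntu:14.04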

The second stage of scheduling selects the best node according to a strategy. There are the following three strategies:

(1) Binpack: all else being equal, choose the node with the most resources already in use. This strategy packs containers together.

(2) Spread: all else being equal, choose the node with the least resources in use. This strategy spreads containers evenly across the nodes.

(3) Random: pick a node at random.
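
The strategy is chosen when the Swarm manager is started, not per container. A minimal sketch with the standalone swarm binary (the etcd addresses are the ones used for this cluster; the default strategy is spread, which matches the docker info output further down):

 swarm manage --strategy binpack -H tcp://0.0.0.0:2376 etcd://172.16.40.7:2379,172.16.10.216:2379,172.16.10.219:2379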

 

 

(1) 172.16.40.7
[root@07Node ~]# vim /lib/systemd/system/docker.service
[Unit]
 Description=Docker Application Container Engine
 Documentation=https://docs.docker.com
 After=network.target docker.socket
 Requires=docker.socket
[Service]
 Type=notify
 # the default is not to use systemd for cgroups because the delegate issues still
 # exists and systemd currently does not support the cgroup feature set required
 # for containers run by docker
 ExecStart=/usr/bin/docker daemon --label storage=ssd --label nodenum=007 -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock --cluster-advertise=172.16.40.7:2375 --cluster-store=etcd://172.16.40.7:2379,172.16.10.216:2379,172.16.10.219:2379 --storage-driver=devicemapper
 MountFlags=slave
 LimitNOFILE=1048576
 LimitNPROC=1048576
 LimitCORE=infinity
 TimeoutStartSec=0
 # set delegate yes so that systemd does not reset the cgroups of docker containers
 Delegate=yes
[Install]
 WantedBy=multi-user.target
(2)172.16.10.216
[root@216Node ~]# vim /lib/systemd/system/docker.service
[Unit]
 Description=Docker Application Container Engine
 Documentation=https://docs.docker.com
 After=network.target docker.socket
 Requires=docker.socket
[Service]
 Type=notify
 # the default is not to use systemd for cgroups because the delegate issues still
 # exists and systemd currently does not support the cgroup feature set required
 # for containers run by docker
 ExecStart=/usr/bin/docker daemon --label storage=disk --label nodenum=216 -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock --cluster-advertise=172.16.10.216:2375 --cluster-store=etcd://172.16.40.7:2379,172.16.10.216:2379,172.16.10.219:2379 --storage-driver=devicemapper
 MountFlags=slave
 LimitNOFILE=1048576
 LimitNPROC=1048576
 LimitCORE=infinity
 TimeoutStartSec=0
 # set delegate yes so that systemd does not reset the cgroups of docker containers
 Delegate=yes
[Install]
 WantedBy=multi-user.target
(3)172.16.10.219
 [root@219Node ~]# vim /lib/systemd/system/docker.service
[Unit]
 Description=Docker Application Container Engine
 Documentation=https://docs.docker.com
 After=network.target docker.socket
 Requires=docker.socket
[Service]
 Type=notify
 # the default is not to use systemd for cgroups because the delegate issues still
 # exists and systemd currently does not support the cgroup feature set required
 # for containers run by docker
 ExecStart=/usr/bin/docker daemon --label nodenum=219 -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock --cluster-advertise=172.16.10.219:2375 --cluster-store=etcd://172.16.40.7:2379,172.16.10.216:2379,172.16.10.219:2379/dockerEtcd --storage-driver=devicemapper
 MountFlags=slave
 LimitNOFILE=1048576
 LimitNPROC=1048576
 LimitCORE=infinity
 TimeoutStartSec=0
 # set delegate yes so that systemd does not reset the cgroups of docker containers
 Delegate=yes
[Install]
 WantedBy=multi-user.target

--------------------------- Restart the docker daemon -----------------
 systemctl daemon-reload
 systemctl restart docker
(4) Check the cluster info through Swarm and confirm that the labels have taken effect:
 [root@219Node ~]# docker -H 172.16.10.216:2376 info
 Containers: 12
 Running: 6
 Paused: 0
 Stopped: 6
 Images: 20
 Server Version: swarm/1.2.0
 Role: primary
 Strategy: spread
 Filters: health, port, dependency, affinity, constraint
 Nodes: 3
 07Node: 172.16.40.7:2375
 └ Status: Healthy
 └ Containers: 4
 └ Reserved CPUs: 0 / 1
 └ Reserved Memory: 0 B / 1.019 GiB
 └ Labels: executiondriver=, kernelversion=3.10.0-327.13.1.el7.x86_64, nodenum=007, operatingsystem=CentOS Linux 7 (Core), storage=ssd, storagedriver=devicemapper
 └ Error: (none)
 └ UpdatedAt: 2016-05-25T02:09:05Z
 └ ServerVersion: 1.11.1
 216Node: 172.16.10.216:2375
 └ Status: Healthy
 └ Containers: 4
 └ Reserved CPUs: 0 / 1
 └ Reserved Memory: 0 B / 1.019 GiB
 └ Labels: executiondriver=, kernelversion=3.10.0-327.13.1.el7.x86_64, nodenum=216, operatingsystem=CentOS Linux 7 (Core), storage=disk, storagedriver=devicemapper
 └ Error: (none)
 └ UpdatedAt: 2016-05-25T02:09:13Z
 └ ServerVersion: 1.11.1
 219Node: 172.16.10.219:2375
 └ Status: Healthy
 └ Containers: 4
 └ Reserved CPUs: 0 / 1
 └ Reserved Memory: 0 B / 1.019 GiB
 └ Labels: executiondriver=, kernelversion=3.10.0-327.13.1.el7.x86_64, nodenum=219, operatingsystem=CentOS Linux 7 (Core), storagedriver=devicemapper
 └ Error: (none)
 └ UpdatedAt: 2016-05-25T02:09:02Z
 └ ServerVersion: 1.11.1
 Plugins:
 Volume:
 Network:
 Kernel Version: 3.10.0-327.13.1.el7.x86_64
 Operating System: linux
 Architecture: amd64
 CPUs: 3
 Total Memory: 3.056 GiB
 Name: af2034bde8b6
 Docker Root Dir:
 Debug mode (client): false
 Debug mode (server): false
 WARNING: No kernel memory limit support
 ---------------------------------------------------------
 (5) Remove all containers in Exited state:
 [root@219Node ~]# docker -H 172.16.40.7:2376 rm `docker -H 172.16.40.7:2376 ps -a | grep Exited | awk '{print $1}'`
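A tidier equivalent (assuming the docker ps status filter available in this Docker version) lets the CLI do the filtering instead of grep/awk:
 docker -H 172.16.40.7:2376 rm $(docker -H 172.16.40.7:2376 ps -aq --filter status=exited)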
Remove the cluster networks:
 [root@219Node ~]# docker -H 172.16.10.216:2376 network ls
 NETWORK ID NAME DRIVER
 11670f2de987 07Node/bridge bridge
 6ff9e687037b 07Node/docker_gwbridge bridge
 f9780d3d9142 07Node/host host
 402545cc54b5 07Node/none null
 4cb88ef7be76 216Node/bridge bridge
 14a0baa95b49 216Node/docker_gwbridge bridge
 ed741aad52e8 216Node/host host
 b2a3353efccc 216Node/none null
 abf480d37539 219Node/bridge bridge
 3442074de477 219Node/docker_gwbridge bridge
 31ffb5981e46 219Node/host host
 1663bf477daa 219Node/none null
 169d5c9b0ed9 overlaynetCluster overlay
 6f2ceadac04f overlaynetCluster2 overlay
 c06ce0837fd3 overlaynetCluster2 overlay
 [root@219Node ~]# docker -H 172.16.10.216:2376 network rm -h
Usage: docker network rm [OPTIONS] NETWORK [NETWORK...]
Deletes one or more networks
--help Print usage
 flag: help requested
 [root@219Node ~]# docker -H 172.16.10.216:2376 network rm 169d5c9b0ed9
 [root@219Node ~]# docker -H 172.16.10.216:2376 network rm 6f2ceadac04f
 [root@219Node ~]# docker -H 172.16.10.216:2376 network rm c06ce0837fd3
 Error response from daemon: Error response from daemon: network overlaynetCluster2 has active endpoints
 [root@219Node ~]#
If a network still has active endpoints (containers attached to it), deleting it fails, as seen above.
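To remove such a network, the attached containers have to be disconnected (or removed) first. A minimal sketch, assuming a container named web1 is still attached to overlaynetCluster2 (the name is hypothetical; docker network inspect lists the real endpoints):
 docker -H 172.16.10.216:2376 network inspect overlaynetCluster2
 docker -H 172.16.10.216:2376 network disconnect overlaynetCluster2 web1
 docker -H 172.16.10.216:2376 network rm c06ce0837fd3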
 Create a new overlay network olnet01: docker -H 172.16.10.216:2376 network create -d overlay olnet01
The network is created successfully on the 40.7 and 10.216 machines, but creation fails on 10.219. The cause is an incorrect etcd service-discovery setting in /lib/systemd/system/docker.service on that node: the trailing path /dockerEtcd must not be there.
 10.219 : ExecStart=/usr/bin/docker daemon --label nodenum=219 -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock --cluster-advertise=172.16.10.219:2375 --cluster-store=etcd://172.16.40.7:2379,172.16.10.216:2379,172.16.10.219:2379/dockerEtcd --storage-driver=devicemapper
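With the trailing /dockerEtcd removed, the line matches the other two nodes (after editing, run systemctl daemon-reload and systemctl restart docker again):
 ExecStart=/usr/bin/docker daemon --label nodenum=219 -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock --cluster-advertise=172.16.10.219:2375 --cluster-store=etcd://172.16.40.7:2379,172.16.10.216:2379,172.16.10.219:2379 --storage-driver=devicemapper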
 ------------------------------------------------------
Run new containers:
 Run a container on a node that carries the storage=ssd label:
 docker -H 172.16.10.219:2376 run -itd -e constraint:storage==ssd --name ubantupc001 --net olnet01 ubuntu:14.04
Run a new container on the same node as the container ubantupc001:
 docker -H 172.16.10.219:2376 run -itd -e affinity:container==ubantupc001 --name ubantupc001.1 --net olnet01 ubuntu:14.04
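To confirm the affinity took effect (just a verification sketch; when talking to the Swarm manager, docker ps prefixes each container name with its node, e.g. 07Node/ubantupc001):
 docker -H 172.16.10.219:2376 ps | grep ubantupc001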

The -m flag sets the container's memory to 1g: docker -H 172.16.10.219:2376 run -itd -m 1g -e constraint:storage==ssd --name ubantupc001 --net olnet01 ubuntu:14.04
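Once a container with -m is scheduled, the reservation is counted against that node in docker info (the Reserved Memory line, shown as 0 B / 1.019 GiB in the output above). A quick check:
 docker -H 172.16.10.219:2376 info | grep 'Reserved Memory'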

 A container can also be pinned to nodes where its image has already been pulled, with -e affinity:image==ubuntu:14.04
The containers below will not be scheduled onto node 219, because that node has not pulled ubuntu:14.04:
 docker -H 172.16.10.219:2376 run -itd -e affinity:image==ubuntu:14.04 --name ubantupc004 --net olnet01 ubuntu:14.04
 docker -H 172.16.10.219:2376 run -itd -e affinity:image==ubuntu:14.04 --name ubantupc005 --net olnet01 ubuntu:14.04
Scheduled onto the node with the fewest containers (it ends up on 219; since 219 does not have the ubuntu:14.04 image, the image has to be pulled first, so the container is slow to start):
 docker -H 172.16.10.219:2376 run -itd --name ubantupc006 --net olnet01 ubuntu:14.04
 ------------------------------------------------------------------------
Swarm rescheduling (not yet reproduced successfully in this experiment) ********************************************
 docker -H 172.16.10.219:2376 run -itd -l 'com.docker.swarm.reschedule-policy=["on-node-failure"]' ubuntu:14.04
 docker -H 172.16.10.219:2376 run -itd -e reschedule:on-node-failure ubuntu:14.04
docker -H 172.16.10.219:2376 run -itd -e reschedule:on-node-failure --name ubantupc008 --net olnet01 ubuntu:14.04
 docker -H 172.16.10.219:2376 run -itd -e reschedule:on-node-failure --name ubantupc009 --net olnet01 ubuntu:14.04
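One way to test rescheduling by hand (a sketch only; whether the container is actually rescheduled depends on the Swarm version's rescheduling support, which this experiment could not confirm) is to stop the Docker daemon on whichever node is running ubantupc008 and then ask a surviving Swarm manager where the container lives:
 systemctl stop docker                      # run on the node that currently hosts ubantupc008
 docker -H 172.16.10.216:2376 ps | grep ubantupc008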
