
Swarm Scheduling, Clustering, and Operations in Practice [Summary]


---------------------------Swarm scheduling practice-----------------------------
docker run --rm swarm  list etcd://172.16.40.7:2379/swarmcluser    # view cluster information

Scheduler
The scheduler module is used at container-creation time to pick an optimal node. The selection happens in two phases:

The first phase is filtering: nodes that meet the requirements are selected according to a set of conditions. There are five filters (a configuration sketch follows the list):

(1) Constraints, the constraint filter. It can filter on conditions such as the node's operating system type, kernel version, and storage type. You can also define custom constraints by passing labels to the daemon at startup to describe the characteristics of that host.

(2) Affinity, the affinity filter. It supports container affinity and image affinity. For example, for a web application where you want the db container and the web container on the same node, this filter does the job.

(3) Dependency, the dependency filter. If a container is created with --volumes-from/--link/--net referring to another container, it is placed on the same node as the container it depends on.

(4) Health filter. It filters on node status and removes failed nodes.

(5) Ports filter. It filters nodes based on port usage.
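These filters are enabled on the manager (the Filters line in the docker info output below lists the active ones). With standalone Swarm the active set can also be narrowed when the manager is started, via the manage command's --filter option; a minimal sketch in the style of the manager commands used later in this post:

docker run -d -p 2376:2375 swarm manage --filter=constraint --filter=affinity etcd://172.16.40.7:2379/swarmcluser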

The second phase of scheduling is choosing an optimal node according to a strategy. There are three strategies (a sketch of selecting one follows the list):

(1) Binpack: under equal conditions, choose the node with the highest resource usage. This strategy packs containers together.

(2) Spread: under equal conditions, choose the node with the lowest resource usage. This strategy spreads containers evenly across nodes.

(3) Random: choose a node at random.
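The strategy is fixed when the manager is started, through swarm manage --strategy; spread is the default (the docker info output below shows Strategy: spread). A sketch of launching a manager that packs containers instead:

docker run -d -p 2376:2375 swarm manage --strategy binpack etcd://172.16.40.7:2379/swarmcluser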

(1) 172.16.40.7

[root@07Node ~]# vim /lib/systemd/system/docker.service

[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network.target docker.socket
Requires=docker.socket

[Service] 
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/docker daemon --label storage=ssd  --label nodenum=007  -H tcp://0.0.0.0:2375  -H unix:///var/run/docker.sock    --cluster-advertise=172.16.40.7:2375   --cluster-store=etcd://172.16.40.7:2379,172.16.10.216:2379,172.16.10.219:2379 --storage-driver=devicemapper
MountFlags=slave
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
TimeoutStartSec=0
# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes

[Install]
WantedBy=multi-user.target

(2)172.16.10.216

[root@216Node ~]#  vim /lib/systemd/system/docker.service

[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network.target docker.socket
Requires=docker.socket

[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/docker daemon  --label storage=disk  --label nodenum=216   -H tcp://0.0.0.0:2375  -H unix:///var/run/docker.sock    --cluster-advertise=172.16.10.216:2375   --cluster-store=etcd://172.16.40.7:2379,172.16.10.216:2379,172.16.10.219:2379 --storage-driver=devicemapper
MountFlags=slave
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
TimeoutStartSec=0
# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes

[Install]
WantedBy=multi-user.target

(3)172.16.10.219
[root@219Node ~]# vim /lib/systemd/system/docker.service

[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network.target docker.socket
Requires=docker.socket

[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/docker daemon --label nodenum=219  -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock    --cluster-advertise=172.16.10.219:2375   --cluster-store=etcd://172.16.40.7:2379,172.16.10.216:2379,172.16.10.219:2379/dockerEtcd --storage-driver=devicemapper
MountFlags=slave
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
TimeoutStartSec=0
# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes

[Install]
WantedBy=multi-user.target

---------------------------Restart the docker daemon-----------------
systemctl daemon-reload
systemctl restart docker
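Before querying through Swarm, the labels can be checked on each daemon directly; assuming this Docker version lists daemon labels in its info output, something like:

docker -H 172.16.40.7:2375 info     # the Labels section should show storage=ssd and nodenum=007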

(4) Querying the cluster information through Swarm shows that the labels have taken effect:
[root@219Node ~]# docker -H 172.16.10.216:2376 info
Containers: 12
 Running: 6
 Paused: 0
 Stopped: 6
Images: 20
Server Version: swarm/1.2.0
Role: primary
Strategy: spread
Filters: health, port, dependency, affinity, constraint
Nodes: 3
 07Node: 172.16.40.7:2375
  └ Status: Healthy
  └ Containers: 4
  └ Reserved CPUs: 0 / 1
  └ Reserved Memory: 0 B / 1.019 GiB
  └ Labels: executiondriver=, kernelversion=3.10.0-327.13.1.el7.x86_64, nodenum=007, operatingsystem=CentOS Linux 7 (Core), storage=ssd, storagedriver=devicemapper
  └ Error: (none)
  └ UpdatedAt: 2016-05-25T02:09:05Z
  └ ServerVersion: 1.11.1
 216Node: 172.16.10.216:2375
  └ Status: Healthy
  └ Containers: 4
  └ Reserved CPUs: 0 / 1
  └ Reserved Memory: 0 B / 1.019 GiB
  └ Labels: executiondriver=, kernelversion=3.10.0-327.13.1.el7.x86_64, nodenum=216, operatingsystem=CentOS Linux 7 (Core), storage=disk, storagedriver=devicemapper
  └ Error: (none)
  └ UpdatedAt: 2016-05-25T02:09:13Z
  └ ServerVersion: 1.11.1
 219Node: 172.16.10.219:2375
  └ Status: Healthy
  └ Containers: 4
  └ Reserved CPUs: 0 / 1
  └ Reserved Memory: 0 B / 1.019 GiB
  └ Labels: executiondriver=, kernelversion=3.10.0-327.13.1.el7.x86_64, nodenum=219, operatingsystem=CentOS Linux 7 (Core), storagedriver=devicemapper
  └ Error: (none)
  └ UpdatedAt: 2016-05-25T02:09:02Z
  └ ServerVersion: 1.11.1
Plugins: 
 Volume: 
 Network: 
Kernel Version: 3.10.0-327.13.1.el7.x86_64
Operating System: linux
Architecture: amd64
CPUs: 3
Total Memory: 3.056 GiB
Name: af2034bde8b6
Docker Root Dir: 
Debug mode (client): false
Debug mode (server): false
WARNING: No kernel memory limit support
---------------------------------------------------------
(5) Remove all containers in the Exited state:
[root@219Node ~]# docker -H 172.16.40.7:2376  rm `docker -H 172.16.40.7:2376  ps  -a |awk '{print $0}' | grep Exited  | awk '{print $1}' `
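An equivalent form that avoids the grep/awk pipeline, assuming this Docker version supports the ps status filter:

docker -H 172.16.40.7:2376 rm $(docker -H 172.16.40.7:2376 ps -aq --filter status=exited)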



Remove cluster networks:
[root@219Node ~]# docker -H 172.16.10.216:2376 network ls
NETWORK ID          NAME                      DRIVER
11670f2de987        07Node/bridge             bridge              
6ff9e687037b        07Node/docker_gwbridge    bridge              
f9780d3d9142        07Node/host               host                
402545cc54b5        07Node/none               null                
4cb88ef7be76        216Node/bridge            bridge              
14a0baa95b49        216Node/docker_gwbridge   bridge              
ed741aad52e8        216Node/host              host                
b2a3353efccc        216Node/none              null                
abf480d37539        219Node/bridge            bridge              
3442074de477        219Node/docker_gwbridge   bridge              
31ffb5981e46        219Node/host              host                
1663bf477daa        219Node/none              null                
169d5c9b0ed9        overlaynetCluster         overlay             
6f2ceadac04f        overlaynetCluster2        overlay             
c06ce0837fd3        overlaynetCluster2        overlay             
[root@219Node ~]# docker -H 172.16.10.216:2376 network  rm -h

Usage:	docker network rm [OPTIONS] NETWORK [NETWORK...]

Deletes one or more networks

  --help             Print usage
flag: help requested
[root@219Node ~]# docker -H 172.16.10.216:2376 network  rm 169d5c9b0ed9
[root@219Node ~]# docker -H 172.16.10.216:2376 network  rm 6f2ceadac04f
[root@219Node ~]# docker -H 172.16.10.216:2376 network  rm c06ce0837fd3
Error response from daemon: Error response from daemon: network overlaynetCluster2 has active endpoints
[root@219Node ~]# 

If a network still has active endpoints, deleting it fails, as shown above.
Create a new overlay network olnet01:  docker -H 172.16.10.216:2376  network create -d overlay  olnet01

The network was created successfully on 40.7 and 10.216 but failed on 10.219. The cause is that the etcd discovery configured in /lib/systemd/system/docker.service on 10.219 is wrong: the URL should not carry the trailing /dockerEtcd path.
10.219 : ExecStart=/usr/bin/docker daemon --label nodenum=219  -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock    --cluster-advertise=172.16.10.219:2375   --cluster-store=etcd://172.16.40.7:2379,172.16.10.216:2379,172.16.10.219:2379/dockerEtcd --storage-driver=devicemapper
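The corrected ExecStart for 10.219 simply drops the trailing /dockerEtcd path so that all three daemons share the same cluster-store prefix, matching the other two nodes:

ExecStart=/usr/bin/docker daemon --label nodenum=219  -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock    --cluster-advertise=172.16.10.219:2375   --cluster-store=etcd://172.16.40.7:2379,172.16.10.216:2379,172.16.10.219:2379 --storage-driver=devicemapper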


-------------

Run new containers:
Run a container on a node labeled storage=ssd:
docker -H 172.16.10.219:2376  run  -itd  -e constraint:storage==ssd  --name ubantupc001 --net olnet01 ubuntu:14.04

docker -H 172.16.10.219:2376  run  -itd -m 1g -e constraint:storage==ssd  --name ubantupc001 --net olnet01 ubuntu:14.04



Run a new container on the same node as the container ubantupc001:
docker -H 172.16.10.219:2376  run  -itd  -e affinity:container==ubantupc001   --name ubantupc001.1 --net olnet01 ubuntu:14.04


You can also require a container to run on a node that has already pulled a given image: -e affinity:image==ubuntu:14.04

The containers below will not be scheduled onto node 219, because ubuntu:14.04 has not been pulled on that node:
docker -H 172.16.10.219:2376  run  -itd  -e affinity:image==ubuntu:14.04   --name ubantupc004 --net olnet01 ubuntu:14.04
docker -H 172.16.10.219:2376  run  -itd  -e affinity:image==ubuntu:14.04   --name ubantupc005 --net olnet01 ubuntu:14.04

Scheduled onto the node with the fewest containers (it landed on 219; since 219 did not have the ubuntu:14.04 image, the image had to be pulled first, which made the container slow to start):
docker -H 172.16.10.219:2376  run  -itd    --name ubantupc006 --net olnet01 ubuntu:14.04
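To check where a container actually landed, the node name shows up as the prefix in the NAMES column of docker ps; Swarm also adds a Node section to the container's inspect output, so (assuming that field is present in this Swarm version) the placement can be read directly:

docker -H 172.16.10.219:2376 ps -a --filter name=ubantupc006
docker -H 172.16.10.219:2376 inspect -f '{{ .Node.Name }}' ubantupc006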
-----------------------------------------------

Swarm rescheduling (not yet successfully reproduced in these experiments) ********************************************
docker -H 172.16.10.219:2376  run  -itd  -l 'com.docker.swarm.reschedule-policy=["on-node-failure"]'   ubuntu:14.04
docker -H 172.16.10.219:2376  run  -itd  -e reschedule:on-node-failure   ubuntu:14.04

docker -H 172.16.10.219:2376  run  -itd  -e reschedule:on-node-failure   --name ubantupc008reschedule --net olnet01 ubuntu:14.04
docker -H 172.16.10.219:2376  run  -itd  -e reschedule:on-node-failure   --name ubantupc009 --net olnet01 ubuntu:14.04

My preliminary guess is that this is a Swarm bug.

docker -H 172.16.10.219:2376  run  -itd  -e reschedule:on-node-failure   --name ubantupc009reschedule  ubuntu:14.04
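A sketch of how the reschedule policy is meant to be exercised (untested here, given the problems above): stop the Docker daemon on the node hosting the container to simulate a node failure, then watch from a manager on a surviving node whether Swarm recreates the container elsewhere.

systemctl stop docker                                                      # on the node currently hosting ubantupc009reschedule
docker -H 172.16.10.216:2376 ps -a --filter name=ubantupc009reschedule    # from a manager on a surviving node; check the NAMES prefix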


------------------------Swarm cluster high-availability practice------------------------------------

The environment consists of three virtual machines:

docker1:172.16.40.7
docker2:172.16.10.216
docker3:172.16.10.219


systemctl daemon-reload

systemctl restart docker

etcd://172.16.40.7:2379,172.16.10.216:2379,172.16.10.219:2379/nodes


(I) etcd service discovery for the cluster:
(1) Single etcd node

docker run --name swarm_manager --restart=always -p 2376:2375 -d swarm manage etcd://172.16.40.7:2379/nodes

docker run --name swarm_agent --restart=always -d swarm join --addr=172.16.40.7:2375  etcd://172.16.40.7:2379/nodes

docker run --name swarm_agent --restart=always -d swarm join --addr=172.16.10.216:2375  etcd://172.16.40.7:2379/nodes

docker run --name swarm_agent --restart=always -d swarm join --addr=172.16.10.219:2375  etcd://172.16.40.7:2379/nodes



(2) Multiple etcd nodes
docker run --name swarm_manager --restart=always -p 2376:2375 -d swarm manage etcd://172.16.40.7:2379,172.16.10.216:2379,172.16.10.219:2379/nodes

docker run --name swarm_agent --restart=always -d swarm join --addr=172.16.40.7:2375 etcd://172.16.40.7:2379,172.16.10.216:2379,172.16.10.219:2379/nodes
docker run --name swarm_agent --restart=always -d swarm join --addr=172.16.10.216:2375 etcd://172.16.40.7:2379,172.16.10.216:2379,172.16.10.219:2379/nodes
docker run --name swarm_agent --restart=always -d swarm join --addr=172.16.10.219:2375  etcd://172.16.40.7:2379,172.16.10.216:2379,172.16.10.219:2379/nodes




(3) Querying cluster information
docker -H 172.16.40.7:2376 info

docker run --rm swarm  list etcd://172.16.40.7:2379/nodes

docker run --rm swarm  list etcd://172.16.10.219:2379/nodes




(II) Swarm manager high-availability cluster nodes:

Manager replication flags (all three manager candidates below are started with the same flags; the managers elect a primary among themselves):
--replication tells Swarm that this manager is part of a multi-manager (replicated) setup and is a candidate for the primary role.
--advertise specifies the address this manager advertises; the primary uses it to announce itself to the replica managers.


docker run --name swarm_manager001 --restart=always -p 2376:2375 -d  swarm manage --replication --advertise 172.16.40.7:2376 etcd://172.16.40.7:2379,172.16.10.216:2379,172.16.10.219:2379/swarmcluser 
 
docker run --name swarm_manager002 --restart=always -p 2376:2375 -d   swarm manage    --replication --advertise 172.16.10.216:2376 etcd://172.16.40.7:2379,172.16.10.216:2379,172.16.10.219:2379/swarmcluser 

docker run  --name swarm_manager003  --restart=always -p 2376:2375 -d   swarm manage   --replication --advertise 172.16.10.219:2376 etcd://172.16.40.7:2379,172.16.10.216:2379,172.16.10.219:2379/swarmcluser 
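To see which manager won the election, query each manager's info endpoint; the primary reports Role: primary (as in the info output earlier in this post), and in this Swarm version the replicas should also report the primary's address:

docker -H 172.16.40.7:2376 info | grep -E 'Role|Primary'
docker -H 172.16.10.216:2376 info | grep -E 'Role|Primary'
docker -H 172.16.10.219:2376 info | grep -E 'Role|Primary'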


------------------------nohup mode-------------------------------------------------
nohup docker run --name swarm_manager001 --restart=always -p 2376:2375 -d  swarm manage --replication --advertise 172.16.40.7:2376 etcd://172.16.40.7:2379,172.16.10.216:2379,172.16.10.219:2379/swarmcluser > /var/log/swarmcluser.log 2>&1 &
 
nohup docker run --name swarm_manager002 --restart=always -p 2376:2375 -d   swarm manage    --replication --advertise 172.16.10.216:2376 etcd://172.16.40.7:2379,172.16.10.216:2379,172.16.10.219:2379/swarmcluser > /var/log/swarmcluser.log 2>&1 &

nohup docker run  --name swarm_manager003  --restart=always -p 2376:2375 -d   swarm manage   --replication --advertise 172.16.10.219:2376 etcd://172.16.40.7:2379,172.16.10.216:2379,172.16.10.219:2379/swarmcluser > /var/log/swarmcluser.log 2>&1 &

 --------------swarm agent-------------------
 
docker run --name swarm_agent --restart=always -d swarm join --addr=172.16.40.7:2375 etcd://172.16.40.7:2379,172.16.10.216:2379,172.16.10.219:2379/swarmcluser

docker run --name swarm_agent --restart=always -d swarm join --addr=172.16.10.216:2375 etcd://172.16.40.7:2379,172.16.10.216:2379,172.16.10.219:2379/swarmcluser

docker run --name swarm_agent --restart=always -d swarm join --addr=172.16.10.219:2375  etcd://172.16.40.7:2379,172.16.10.216:2379,172.16.10.219:2379/swarmcluser

-----------------------------Query cluster information---------------------------------------------------
docker -H 172.16.40.7:2376 info

docker run --rm swarm  list etcd://172.16.40.7:2379/swarmcluser

docker run --rm swarm  list etcd://172.16.10.219:2379/swarmcluser


(III) Token-based cluster discovery

[root@localhost ~]# docker run --rm swarm create
10cef56a3a44f2049012d8e9473a30ec
Join the cluster:
docker run -d swarm join --addr=172.16.40.7:2375 token://10cef56a3a44f2049012d8e9473a30ec  
docker run -d swarm join --addr=172.16.10.216:2375 token://10cef56a3a44f2049012d8e9473a30ec
docker run -d swarm join --addr=172.16.10.219:2375 token://10cef56a3a44f2049012d8e9473a30ec
Start the manager:
docker run -d -p 2376:2375 swarm manage token://10cef56a3a44f2049012d8e9473a30ec

View the cluster information:
docker run --rm swarm list  token://10cef56a3a44f2049012d8e9473a30ec

docker -H 172.16.10.216:2376 info

etcd for service discovery:

docker pull index.tenxcloud.com/google_containers/etcd:2.2.1


curl -L  https://github.com/coreos/etcd/releases/download/v2.3.3/etcd-v2.3.3-linux-amd64.tar.gz -o etcd-v2.3.3-linux-amd64.tar.gz
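The commands above only fetch etcd; a minimal single-node startup, assuming the tarball has been extracted and the binary is run in place, could look like the following (the peer/--initial-cluster flags needed for the three-node etcd cluster used in this post are omitted):

./etcd --name etcd219 --data-dir /var/lib/etcd \
  --listen-client-urls http://0.0.0.0:2379 \
  --advertise-client-urls http://172.16.10.219:2379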


List containers:
docker ps -a

Start the container with a given ID:

docker start d0f04fc0ebfc


ps -ef | grep swarm    # view the local swarm processes



 
 -------------------------------Swarm operations practice-------------------------------------------------------
 
 1. Remove all containers in the Exited state across the cluster
docker -H 172.16.40.7:2376  rm `docker -H 172.16.40.7:2376  ps  -a |awk '{print $0}' | grep Exited  | awk '{print $1}' `
Note: the argument after rm is wrapped in backticks (command substitution).
[root@localhost ~]# docker -H 172.16.40.7:2376  rm `docker -H 172.16.40.7:2376  ps  -a |awk '{print $0}' | grep Created  | awk '{print $1}' `

2. Swarm network operations

[root@216Node etcd-v2.3.3-linux-amd64]# docker -H 172.16.40.7:2376 network ls
NETWORK ID          NAME                   DRIVER
099c353db598        07Node/bridge          bridge              
3f3c7ce382e6        07Node/host            host                
ee80a159a083        07Node/none            null                
6727b276c126        07Node/swarm_network   bridge              
7876950716e3        216Node/bridge         bridge              
805f02c8622b        216Node/host           host                
67f2fceb459a        216Node/none           null                
bf21a27dd42b        219Node/bridge         bridge              
e1d0666e26e6        219Node/host           host                
e067944532b0        219Node/none           null                
fac48871599a        overlaynet             overlay             
169d5c9b0ed9        overlaynetCluster      overlay             
[root@216Node etcd-v2.3.3-linux-amd64]# docker -H 172.16.40.7:2376 network rm 6727b276c126
[root@216Node etcd-v2.3.3-linux-amd64]# docker -H 172.16.40.7:2376 network rm fac48871599a

3. Swarm container scheduling
 docker -H 172.16.40.7:2376 run hello-world    # run this several times in a row
 
 
 You can see that hello-world gets scheduled onto different nodes:
 
 [root@216Node etcd-v2.3.3-linux-amd64]# docker -H 172.16.40.7:2376 ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED                  STATUS                              PORTS                        NAMES
855ce9d886bc        hello-world         "/hello"                 Less than a second ago   Exited (0) Less than a second ago                                219Node/sick_mccarthy
52db1947eb95        hello-world         "/hello"                 About a minute ago       Exited (0) About a minute ago                                    216Node/insane_sinoussi
8b4aaacd2482        hello-world         "/hello"                 About a minute ago       Exited (0) About a minute ago                                    07Node/thirsty_einstein
e662c73a0daf        hello-world         "/hello"                 2 minutes ago            Exited (0) About a minute 
[root@216Node etcd-v2.3.3-linux-amd64]# 

[root@219Node ~]# docker  -H 172.16.40.7:2376 run  -itd --name net1c18 --net overlaynetCluster2 ubuntu:14.04
a26dd40128aa27202f48719f33cf7b063c19b66da073a0523248aa1064206708

[root@216Node ~]# docker  -H tcp://172.16.40.7:2376 ps
CONTAINER ID        IMAGE               COMMAND             CREATED              STATUS              PORTS               NAMES
e0577c1fe39b        ubuntu:14.04        "/bin/bash"         21 minutes ago       Up 21 minutes                           07Node/net1c5
97261b262aff        ubuntu:14.04        "/bin/bash"         22 minutes ago       Up 22 minutes                           216Node/net1c4
c45764753ed2        ubuntu:14.04        "/bin/bash"         25 minutes ago       Up 25 minutes                           216Node/net1c2
7aa3dba0475b        ubuntu:14.04        "/bin/bash"         42 minutes ago       Up 42 minutes                           07Node/net1c1


If the selected node does not have the base image the container needs, the image is pulled onto that node first and the container is then started.

-------------------->
Stopping containers through Swarm:
[root@219Node ~]# docker -H 172.16.40.7:2376  rm `docker -H 172.16.40.7:2376  ps  -a |awk '{print $0}' | grep "07Node/net1"  | awk '{print $1}' `
Error response from daemon: 409 Conflict: You cannot remove a running container 23fb4d7de5a26153373307159076417c44659f33abca0c31f14a3f80f566e61f. Stop the container before attempting removal or use -f
[root@219Node ~]# docker -H 172.16.40.7:2376  stop  `docker -H 172.16.40.7:2376  ps  -a |awk '{print $0}' | grep "07Node/net1"  | awk '{print $1}' `
23fb4d7de5a2
2ff3ab6171ce

[root@219Node docker]# docker  -H tcp://172.16.40.7:2376 run  -itd --name ubantu014 --net overlaynetCluster2 ubuntu:14.04

4. Swarm operations on cluster networks
docker network ls
docker network -h
docker network create -h
docker network create -d overlay  ovlnet1
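If you want control over the overlay's address range rather than the automatic 10.0.x.0/24 allocations seen in the ifconfig output below, the subnet can be set at creation time; a sketch with a hypothetical network name ovlnet2:

docker -H 172.16.10.216:2376 network create -d overlay --subnet 10.0.3.0/24 ovlnet2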

docker  -H tcp://172.16.40.7:2376 run  -itd --name ubantupc002 --net ovlnet1 ubuntu:14.04
[root@219Node ~]# docker -H 172.16.10.219:2376  run  -itd -m 1g -e constraint:storage==ssd  --name ubantupc001 --net ovlnet1  ubuntu:14.04


[root@219Node ~]# docker -H 172.16.10.219:2376  ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                          PORTS                          NAMES
52287cf282ca        ubuntu:14.04        "/bin/bash"              4 minutes ago       Exited (0) About a minute ago                                  07Node/ubantupc003
2e65752e58ba        ubuntu:14.04        "/bin/bash"              4 minutes ago       Up 4 minutes                                                   219Node/ubantupc002
fb1e8fe02859        ubuntu:14.04        "/bin/bash"              9 minutes ago       Up 9 minutes                                                   216Node/ubantupc001
68c8f02d5bb2        swarm               "/swarm manage --repl"   About an hour ago   Up About an hour                172.16.40.7:2376->2375/tcp     07Node/swarm_manager001
3ac9219f5723        swarm               "/swarm manage --repl"   About an hour ago   Up About an hour                172.16.10.219:2376->2375/tcp   219Node/swarm_manager003
af2034bde8b6        swarm               "/swarm manage --repl"   About an hour ago   Up About an hour                172.16.10.216:2376->2375/tcp   216Node/swarm_manager002
67b913c8290d        swarm               "/swarm join --addr=1"   2 hours ago         Up 2 hours                      2375/tcp                       219Node/swarm_agent
fb15ec94ef3b        swarm               "/swarm join --addr=1"   2 hours ago         Up 2 hours                      2375/tcp                       216Node/swarm_agent
e0d934030937        swarm               "/swarm join --addr=1"   2 hours ago         Up 2 hours                      2375/tcp                       07Node/swarm_agent
[root@219Node ~]# 


[root@219Node ~]# docker -H 172.16.10.219:2376  attach 52287cf282ca
You cannot attach to a stopped container, start it first
[root@219Node ~]# docker -H 172.16.10.219:2376  start  52287cf282ca
52287cf282ca
[root@219Node ~]# docker -H 172.16.10.219:2376  attach 52287cf282ca
root@52287cf282ca:/# 
root@52287cf282ca:/# ls
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
root@52287cf282ca:/# ifconfig
eth0      Link encap:Ethernet  HWaddr 02:42:0a:00:02:05  
          inet addr:10.0.2.5  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::42:aff:fe00:205/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:15 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:1206 (1.2 KB)  TX bytes:648 (648.0 B)

eth1      Link encap:Ethernet  HWaddr 02:42:ac:12:00:02  
          inet addr:172.18.0.2  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: fe80::42:acff:fe12:2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:8 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:648 (648.0 B)  TX bytes:648 (648.0 B)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

root@52287cf282ca:/# 


---------------Overlay network: cross-host container communication
[root@07Node ~]# docker -H 172.16.10.219:2376 attach 2e65752e58ba
root@2e65752e58ba:/# 
root@2e65752e58ba:/# ls
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
root@2e65752e58ba:/# ifconfig
eth0      Link encap:Ethernet  HWaddr 02:42:0a:00:00:02  
          inet addr:10.0.0.2  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::42:aff:fe00:2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:14 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:1116 (1.1 KB)  TX bytes:648 (648.0 B)

eth1      Link encap:Ethernet  HWaddr 02:42:ac:12:00:02  
          inet addr:172.18.0.2  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: fe80::42:acff:fe12:2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:16 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:1248 (1.2 KB)  TX bytes:648 (648.0 B)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

root@2e65752e58ba:/# ping 10.0.2.5
PING 10.0.2.5 (10.0.2.5) 56(84) bytes of data.
From 10.0.13.65 icmp_seq=5 Time to live exceeded
From 10.0.13.65 icmp_seq=12 Time to live exceeded
From 10.0.13.65 icmp_seq=35 Time to live exceeded

---------------------------------Route/host entries that were added---------------------------------------------
root@2e65752e58ba:/# vi /etc/hosts

127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
10.0.0.2        2e65752e58ba
172.18.0.2      2e65752e58ba

--------------Detaching from an attached container
When leaving a container you attached to with docker attach my-web, never use Ctrl+C; that stops the container. Use the detach key sequence instead: press Ctrl+P, then Ctrl+Q.
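Alternatively, if all that is needed is a shell inside a running container, docker exec side-steps the attach/detach issue entirely, because exiting the exec'd shell does not stop the container:

docker -H 172.16.10.219:2376 exec -it 52287cf282ca bash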
------------------------------------How Swarm scheduling works--------------------------------------------------
3.1 swarm create
In Swarm, the swarm create command creates a cluster token that Swarm uses for node discovery when managing a Docker cluster.

When this command is issued, Swarm contacts the discovery service built into Docker Hub and obtains a globally unique token that uniquely identifies the Docker cluster managed by Swarm.

Note: Swarm requires a service-discovery backend. This one is built into Docker Hub and is still in alpha; its endpoint is https://discovery-stage.hub.docker.com.

3.2 swarm manage
swarm manage is the most important management command in Swarm. Once swarm manage is invoked on a Swarm node, it means the user wants Swarm to start managing a Docker cluster.

From the point of view of the execution flow, Swarm goes through two main stages: starting up, and then receiving and handling Docker cluster management requests.

Swarm startup consists of three steps:

Discover the nodes in the Docker cluster, collect their status and role information, and watch for changes in node status;
Initialize the internal scheduler module;
Create and start the API listening service module.
In the first step, Swarm discovers the nodes in the Docker cluster. Discovery is the mechanism Swarm uses to maintain the state of the Docker cluster. Since discovery is involved, registration must come first: Swarm has a dedicated discovery module, and the form that registration takes differs depending on the discovery mode.

Currently, Swarm provides five discovery mechanisms: Node Discovery, File Discovery, Consul Discovery, Etcd Discovery, and Zookeeper Discovery.

In the second step, Swarm's internal scheduler module is initialized. Through the discovery mechanism, Swarm finds all registered Docker Nodes and collects their status and details. From then on, whenever Swarm receives a Docker management request, it processes the request, uses the status and details of all Docker Nodes to filter out the nodes that satisfy the requirements, and then forwards the request to one specific Docker Node according to a strategy.

In the third step, Swarm creates and initializes the API listening service module, which can be thought of as the Swarm Server. Note that although the Swarm Server is compatible with the Docker API, quite a few Docker commands are not yet supported; after all, managing a Docker cluster differs somewhat from managing a single Docker daemon. Once the Swarm Server is initialized and listening, users can send Docker cluster management requests to Swarm through the Docker client.

Handling Docker cluster management requests in swarm manage is the result of several Swarm modules cooperating: the Swarm Server is the request entry point, the Scheduler is the processing engine, and node information comes from Discovery.

3.3 swarm join
The swarm join command adds a Docker Node to the Docker cluster managed by Swarm. This also shows that swarm join runs on the Docker Node itself, so Swarm must be installed on the Docker Node first; since that Swarm instance only runs swarm join, it can be regarded as the registration agent on the Docker Node.

Functionally, swarm join completes the Docker Node's registration with the Swarm node, so that Swarm can discover the Docker Node when swarm manage runs. However, not every one of the five discovery modes mentioned above supports swarm join: Node Discovery and File Discovery do not.

Once swarm join has been executed on a Docker Node, the node has registered with Swarm and requested to join the Docker cluster managed by Swarm. Through the registration information, Swarm discovers the Docker Node and obtains its status and details, which it later uses as the basis for scheduling Docker requests.

3.4 swarm list
The swarm list command lists the Docker Nodes in the Docker cluster.

The information about Docker Nodes comes entirely from the Docker Nodes registered with the Swarm node, and a Docker Node's registration consists of nothing more than its IP address and the port Docker listens on.

When using swarm list, the discovery type must be specified; the types include token, etcd, file, zk, and so on. swarm list does not show dynamic information about the cluster, such as a Docker Node's actual running state or the role it plays in the Docker cluster.

4. Summary
----------------------------Swarm cluster etcd key/value data--------------------
[root@219Node ~]# curl http://172.16.10.219:2379/v2/keys
{
    "action": "get",
    "node": {
        "dir": true,
        "nodes": [
            {
                "key": "/docker",
                "dir": true,
                "modifiedIndex": 4,
                "createdIndex": 4
            },
            {
                "key": "/swarmcluser",
                "dir": true,
                "modifiedIndex": 6,
                "createdIndex": 6
            }
        ]
    }
}

[root@219Node ~]# curl http://172.16.10.219:2379/v2/keys/swarmcluser
{
    "action": "get",
    "node": {
        "key": "/swarmcluser",
        "dir": true,
        "nodes": [
            {
                "key": "/swarmcluser/docker",
                "dir": true,
                "modifiedIndex": 6,
                "createdIndex": 6
            }
        ],
        "modifiedIndex": 6,
        "createdIndex": 6
    }
}

[root@219Node ~]# curl http://172.16.10.219:2379/v2/keys/swarmcluser/docker

{
    "action": "get",
    "node": {
        "key": "/swarmcluser/docker",
        "dir": true,
        "nodes": [
            {
                "key": "/swarmcluser/docker/swarm",
                "dir": true,
                "modifiedIndex": 6,
                "createdIndex": 6
            }
        ],
        "modifiedIndex": 6,
        "createdIndex": 6
    }
}

[root@219Node ~]# curl http://172.16.10.219:2379/v2/keys/swarmcluser/docker/swarm

{
    "action": "get",
    "node": {
        "key": "/swarmcluser/docker/swarm",
        "dir": true,
        "nodes": [
            {
                "key": "/swarmcluser/docker/swarm/nodes",
                "dir": true,
                "modifiedIndex": 6,
                "createdIndex": 6
            },
            {
                "key": "/swarmcluser/docker/swarm/leader",
                "value": "172.16.40.7:2376",
                "expiration": "2016-05-30T02:28:12.673387916Z",
                "ttl": 14,
                "modifiedIndex": 90165,
                "createdIndex": 58
            }
        ],
        "modifiedIndex": 6,
        "createdIndex": 6
    }
}

[root@219Node ~]# curl http://172.16.10.219:2379/v2/keys/swarmcluser/docker/swarm/leader

{
    "action": "get",
    "node": {
        "key": "/swarmcluser/docker/swarm/leader",
        "value": "172.16.40.7:2376",
        "expiration": "2016-05-30T02:29:06.006676038Z",
        "ttl": 18,
        "modifiedIndex": 90184,
        "createdIndex": 58
    }
}

[root@219Node ~]# curl http://172.16.10.219:2379/v2/keys/swarmcluser/docker/swarm/nodes
{
    "action": "get",
    "node": {
        "key": "/swarmcluser/docker/swarm/nodes",
        "dir": true,
        "nodes": [
            {
                "key": "/swarmcluser/docker/swarm/nodes/172.16.40.7:2375",
                "value": "172.16.40.7:2375",
                "expiration": "2016-05-30T02:31:35.349659329Z",
                "ttl": 156,
                "modifiedIndex": 90180,
                "createdIndex": 90180
            },
            {
                "key": "/swarmcluser/docker/swarm/nodes/172.16.10.219:2375",
                "value": "172.16.10.219:2375",
                "expiration": "2016-05-30T02:31:08.898274583Z",
                "ttl": 130,
                "modifiedIndex": 90170,
                "createdIndex": 90170
            },
            {
                "key": "/swarmcluser/docker/swarm/nodes/172.16.10.216:2375",
                "value": "172.16.10.216:2375",
                "expiration": "2016-05-30T02:31:08.896933661Z",
                "ttl": 130,
                "modifiedIndex": 90171,
                "createdIndex": 90171
            }
        ],
        "modifiedIndex": 6,
        "createdIndex": 6
    }
}
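Because the leader key is written with a TTL, the etcd v2 watch API can be used to observe primary-manager failover; the request below blocks until the key next changes:

curl "http://172.16.10.219:2379/v2/keys/swarmcluser/docker/swarm/leader?wait=true"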



 
