
[docker 17 source analysis] docker run container source analysis, part 1: docker create

2016-12-02 15:53



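Question: we use docker run to run a container, but what actually happens after docker run is executed?

     First, the user types docker run in the Docker client to create a container. After parsing the arguments supplied by the user, the Docker client sends two requests in turn:

    docker create and docker start. This article analyzes the docker create command.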

I. docker create container source analysis

1. Client side: ContainerCreate

    The function is in client/container_create.go, shown below:

func (cli *Client) ContainerCreate(ctx context.Context, config *container.Config, hostConfig *container.HostConfig,
networkingConfig *network.NetworkingConfig, containerName string) (container.ContainerCreateCreatedBody, error) {
var response container.ContainerCreateCreatedBody

if err := cli.NewVersionError("1.25", "stop timeout"); config != nil && config.StopTimeout != nil && err != nil {
return response, err
}

// When using API 1.24 and under, the client is responsible for removing the container
if hostConfig != nil && versions.LessThan(cli.ClientVersion(), "1.25") {
hostConfig.AutoRemove = false
}

query := url.Values{}
if containerName != "" {
query.Set("name", containerName)
}

body := configWrapper{
Config:           config,
HostConfig:       hostConfig,
NetworkingConfig: networkingConfig,
}

serverResp, err := cli.post(ctx, "/containers/create", query, body, nil)

err = json.NewDecoder(serverResp.body).Decode(&response)
ensureReaderClosed(serverResp)
return response, err
}


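      The client sends a POST request to the daemon API router. The request body carries the container's basic configuration (container.Config), the host-dependent configuration (container.HostConfig), and the network configuration (network.NetworkingConfig).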
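
To make the shape of that request concrete, here is a minimal sketch that builds the path and JSON body for POST /containers/create. The trimmed Config, HostConfig and NetworkingConfig types, the createBody wrapper and the buildCreateRequest helper are stand-ins invented for this illustration; embedding Config in the wrapper (so its fields appear at the top level of the JSON) is an assumption of the sketch, not something shown in the excerpt above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// Hypothetical, heavily trimmed stand-ins for container.Config,
// container.HostConfig and network.NetworkingConfig.
type Config struct {
	Image string
	Cmd   []string `json:",omitempty"`
}

type HostConfig struct {
	AutoRemove bool
}

type NetworkingConfig struct{}

// createBody plays the role of configWrapper. Config is embedded here, so
// its fields are promoted to the top level of the JSON document.
type createBody struct {
	*Config
	HostConfig       *HostConfig
	NetworkingConfig *NetworkingConfig
}

// buildCreateRequest returns the request path (with the optional "name"
// query parameter) and the JSON body for POST /containers/create.
func buildCreateRequest(name string, c *Config, hc *HostConfig, nc *NetworkingConfig) (string, []byte, error) {
	query := url.Values{}
	if name != "" {
		query.Set("name", name)
	}
	body, err := json.Marshal(createBody{Config: c, HostConfig: hc, NetworkingConfig: nc})
	if err != nil {
		return "", nil, err
	}
	path := "/containers/create"
	if len(query) > 0 {
		path += "?" + query.Encode()
	}
	return path, body, nil
}

func main() {
	path, body, err := buildCreateRequest("web", &Config{Image: "nginx"}, &HostConfig{AutoRemove: true}, &NetworkingConfig{})
	if err != nil {
		panic(err)
	}
	fmt.Println(path)
	fmt.Println(string(body))
}
```

Note that the container name travels in the query string, not in the body, which matches the query.Set("name", containerName) call in ContainerCreate.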

type Config struct {
Hostname        string              // Hostname
Domainname      string              // Domainname
User            string              // User that will run the command(s) inside the container, also support user:group
AttachStdin     bool                // Attach the standard input, makes possible user interaction
AttachStdout    bool                // Attach the standard output
AttachStderr    bool                // Attach the standard error
ExposedPorts    nat.PortSet         `json:",omitempty"` // List of exposed ports
Tty             bool                // Attach standard streams to a tty, including stdin if it is not closed.
OpenStdin       bool                // Open stdin
StdinOnce       bool                // If true, close stdin after the 1 attached client disconnects.
Env             []string            // List of environment variable to set in the container
Cmd             strslice.StrSlice   // Command to run when starting the container
Healthcheck     *HealthConfig       `json:",omitempty"` // Healthcheck describes how to check the container is healthy
ArgsEscaped     bool                `json:",omitempty"` // True if command is already escaped (Windows specific)
Image           string              // Name of the image as it was passed by the operator (e.g. could be symbolic)
Volumes         map[string]struct{} // List of volumes (mounts) used for the container
WorkingDir      string              // Current directory (PWD) in the command will be launched
Entrypoint      strslice.StrSlice   // Entrypoint to run when starting the container
NetworkDisabled bool                `json:",omitempty"` // Is network disabled
MacAddress      string              `json:",omitempty"` // Mac Address of the container
OnBuild         []string            // ONBUILD metadata that were defined on the image Dockerfile
Labels          map[string]string   // List of labels set to this container
StopSignal      string              `json:",omitempty"` // Signal to stop a container
StopTimeout     *int                `json:",omitempty"` // Timeout (in seconds) to stop a container
Shell           strslice.StrSlice   `json:",omitempty"` // Shell for shell-form of RUN, CMD, ENTRYPOINT
}

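        Config holds the portable information about a container and is independent of the host; non-portable settings live in the HostConfig struct. Config covers the container's basic information, such as the hostname and the standard input/output stream settings. The struct's own comments are clear, so the fields are not explained further here.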
    
r.routes = []router.Route{
// POST
router.NewPostRoute("/containers/create", r.postContainersCreate),
router.NewPostRoute("/containers/{name:.*}/kill", r.postContainersKill),
......
}


     According to the server's route table (container/container.go), the request is dispatched to the postContainersCreate function.
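
The dispatch step can be pictured with a tiny route table. This is only a sketch: the route struct, the regular-expression patterns and the dispatch helper are hypothetical, and handlers are represented by their names instead of real functions. It is not how the daemon's router is implemented.

```go
package main

import (
	"fmt"
	"regexp"
)

// route pairs an HTTP method and a path pattern with a handler name.
// A real route would hold a handler function instead of a name.
type route struct {
	method  string
	pattern *regexp.Regexp
	handler string
}

// Two entries modelled on the excerpt above; the regular expressions are
// an assumption standing in for the "{name:.*}" path template.
var routes = []route{
	{"POST", regexp.MustCompile(`^/containers/create$`), "postContainersCreate"},
	{"POST", regexp.MustCompile(`^/containers/(.*)/kill$`), "postContainersKill"},
}

// dispatch returns the name of the first handler whose method and pattern
// match the request, and false when no route matches.
func dispatch(method, path string) (string, bool) {
	for _, r := range routes {
		if r.method == method && r.pattern.MatchString(path) {
			return r.handler, true
		}
	}
	return "", false
}

func main() {
	for _, p := range []string{"/containers/create", "/containers/abc/kill", "/images/json"} {
		h, ok := dispatch("POST", p)
		fmt.Println(p, h, ok)
	}
}
```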

2. Server side: postContainersCreate

    The function is in api/server/router/container/container_routes.go:
func (s *containerRouter) postContainersCreate(ctx context.Context, w http.ResponseWriter, r *http.Request, vars map[string]string) error {
......
name := r.Form.Get("name")

config, hostConfig, networkingConfig, err := s.decoder.DecodeConfig(r.Body)

// When using API 1.24 and under, the client is responsible for removing the container
if hostConfig != nil && versions.LessThan(version, "1.25") {
hostConfig.AutoRemove = false
}

ccr, err := s.backend.ContainerCreate(types.ContainerCreateConfig{
Name:             name,
Config:           config,
HostConfig:       hostConfig,
NetworkingConfig: networkingConfig,
AdjustCPUShares:  adjustCPUShares,
})

return httputils.WriteJSON(w, http.StatusCreated, ccr)
}
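
Both the client and the server clear AutoRemove when the API version is below 1.25. The comparison itself can be sketched as follows; compare, lessThan and adjustAutoRemove are illustrative helpers written for this article, under the assumption that versions are dot-separated integers, and are not the versions package used by the daemon.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// compare returns -1, 0 or 1 when v1 is lower than, equal to or greater
// than v2. Components are compared numerically, so "1.9" < "1.25".
// Missing or non-numeric components count as 0.
func compare(v1, v2 string) int {
	p1 := strings.Split(v1, ".")
	p2 := strings.Split(v2, ".")
	for i := 0; i < len(p1) || i < len(p2); i++ {
		var a, b int
		if i < len(p1) {
			a, _ = strconv.Atoi(p1[i])
		}
		if i < len(p2) {
			b, _ = strconv.Atoi(p2[i])
		}
		if a < b {
			return -1
		}
		if a > b {
			return 1
		}
	}
	return 0
}

func lessThan(v, other string) bool { return compare(v, other) < 0 }

type hostConfig struct{ AutoRemove bool }

// adjustAutoRemove clears AutoRemove for API versions below 1.25, where
// the client is responsible for removing the container.
func adjustAutoRemove(hc *hostConfig, apiVersion string) {
	if hc != nil && lessThan(apiVersion, "1.25") {
		hc.AutoRemove = false
	}
}

func main() {
	hc := &hostConfig{AutoRemove: true}
	adjustAutoRemove(hc, "1.24")
	fmt.Println(hc.AutoRemove)
}
```

A plain string comparison would get "1.9" versus "1.25" wrong, which is why the components are parsed as numbers.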

    
    ContainerCreate, in daemon/create.go, is shown below:

func (daemon *Daemon) ContainerCreate(params types.ContainerCreateConfig) (containertypes.ContainerCreateCreatedBody, error) {
return daemon.containerCreate(params, false)
}

func (daemon *Daemon) containerCreate(params types.ContainerCreateConfig, managed bool) (containertypes.ContainerCreateCreatedBody, error) {
......
container, err := daemon.create(params, managed)
if err != nil {
return containertypes.ContainerCreateCreatedBody{Warnings: warnings}, daemon.imageNotExistToErrcode(err)
}
containerActions.WithValues("create").UpdateSince(start)

return containertypes.ContainerCreateCreatedBody{ID: container.ID, Warnings: warnings}, nil
}

    The main function, daemon.create, is covered in detail in the daemon create section that follows.

2. Server side: daemon create

    2.1 Look up the image. Image lookup is analyzed later; for now it is enough to know that the image exists.

if params.Config.Image != "" {
img, err = daemon.GetImage(params.Config.Image)
if err != nil {
return nil, err
}
}


    2.2 mergeAndVerifyConfig mainly merges the image's configuration (img) into params.Config:

if err := daemon.mergeAndVerifyConfig(params.Config, img); err != nil {
return nil, err
}
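
The idea behind the merge is that values given by the user win and the image only fills the gaps. Below is a simplified sketch under that assumption; the config type and the merge function are hypothetical and cover only a handful of fields, so the rules in the daemon may differ in detail.

```go
package main

import (
	"fmt"
	"strings"
)

// config is a trimmed, hypothetical stand-in for container.Config.
type config struct {
	User       string
	WorkingDir string
	Env        []string
	Cmd        []string
	Entrypoint []string
}

// merge copies image defaults into user wherever the user left a gap.
func merge(user, image *config) {
	if user.User == "" {
		user.User = image.User
	}
	if user.WorkingDir == "" {
		user.WorkingDir = image.WorkingDir
	}
	// Env: keep the user's variables and append image variables whose key
	// the user did not set.
	seen := map[string]bool{}
	for _, kv := range user.Env {
		seen[strings.SplitN(kv, "=", 2)[0]] = true
	}
	for _, kv := range image.Env {
		key := strings.SplitN(kv, "=", 2)[0]
		if !seen[key] {
			user.Env = append(user.Env, kv)
			seen[key] = true
		}
	}
	// Entrypoint and Cmd: the image's Cmd is used only when the user gave
	// neither an entrypoint nor a command.
	if len(user.Entrypoint) == 0 {
		if len(user.Cmd) == 0 {
			user.Cmd = image.Cmd
		}
		user.Entrypoint = image.Entrypoint
	}
}

func main() {
	user := &config{Env: []string{"A=1"}}
	image := &config{User: "www", Env: []string{"A=0", "B=2"}, Cmd: []string{"nginx"}}
	merge(user, image)
	fmt.Printf("%+v\n", *user)
}
```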


     2.3 daemon.newContainer builds a container struct, which includes determining the container's ID and name, as shown below:

func (daemon *Daemon) newContainer(name string, platform string, config *containertypes.Config,
hostConfig *containertypes.HostConfig, imgID image.ID, managed bool) (*container.Container, error) {

id, name, err = daemon.generateIDAndName(name)

base := daemon.newBaseContainer(id)
base.Created = time.Now().UTC()
base.Managed = managed
base.Path = entrypoint
base.Args = args //FIXME: de-duplicate from config
base.Config = config
base.HostConfig = &containertypes.HostConfig{}
base.ImageID = imgID
base.NetworkSettings = &network.Settings{IsAnonymousEndpoint: noExplicitName}
base.Name = name
base.Driver = daemon.GraphDriverName(platform)
base.Platform = platform
return base, err
}
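
How an ID and a name might be settled can be sketched as below. Everything here is an assumption made for illustration: the registry type, the 64-character hex ID drawn from crypto/rand, and the "auto_" fallback name are hypothetical and do not reproduce the daemon's generateIDAndName.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
)

// registry maps reserved container names to container IDs.
type registry struct {
	names map[string]string
}

// newID returns 32 random bytes encoded as 64 hex characters.
func newID() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// generateIDAndName picks a fresh ID, falls back to a generated name when
// none is given, and reserves the name so it cannot be reused.
func (r *registry) generateIDAndName(name string) (string, string, error) {
	id, err := newID()
	if err != nil {
		return "", "", err
	}
	if name == "" {
		name = "auto_" + id[:12] // hypothetical fallback naming scheme
	}
	if _, taken := r.names[name]; taken {
		return "", "", errors.New("name already in use: " + name)
	}
	r.names[name] = id
	return id, name, nil
}

func main() {
	r := &registry{names: map[string]string{}}
	id, name, err := r.generateIDAndName("web")
	fmt.Println(len(id), name, err)
	_, _, err = r.generateIDAndName("web")
	fmt.Println(err)
}
```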


   2.4 With the container struct created, the remaining work prepares the container for start-up, beginning with the read-write layer:


// Set RWLayer for container after mount labels have been set
if err := daemon.setRWLayer(container); err != nil {
return nil, err
}


func (daemon *Daemon) setRWLayer(container *container.Container) error {
var layerID layer.ChainID
if container.ImageID != "" {
img, err := daemon.stores[container.Platform].imageStore.Get(container.ImageID)
layerID = img.RootFS.ChainID()
}

rwLayerOpts := &layer.CreateRWLayerOpts{
MountLabel: container.MountLabel,
InitFunc:   daemon.getLayerInit(),
StorageOpt: container.HostConfig.StorageOpt,
}

rwLayer, err := daemon.stores[container.Platform].layerStore.CreateRWLayer(container.ID, layerID, rwLayerOpts)
container.RWLayer = rwLayer

return nil
}   


    CreateRWLayerOpts carries the parameters for creating the read-write layer. The key function is CreateRWLayer, located in layer/layer_store.go and covered in section 3.

   2.5 Create the container's directory under /var/lib/docker/containers, owned by the root uid and gid, and create a checkpoints directory inside it:

rootIDs := daemon.idMappings.RootPair()
if err := idtools.MkdirAndChown(container.Root, 0700, rootIDs); err != nil {
return nil, err
}
if err := idtools.MkdirAndChown(container.CheckpointDir(), 0700, rootIDs); err != nil {
return nil, err
}
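
The directory layout produced by this step can be reproduced in a temporary directory. The sketch below only creates the directories with mode 0700; the chown to the root uid/gid pair is left out because it needs privileges, and createContainerDirs and demo are hypothetical helpers.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// createContainerDirs creates <root>/<id> and <root>/<id>/checkpoints,
// both with mode 0700. Ownership changes are intentionally omitted.
func createContainerDirs(root, id string) (string, error) {
	dir := filepath.Join(root, id)
	if err := os.MkdirAll(dir, 0700); err != nil {
		return "", err
	}
	if err := os.MkdirAll(filepath.Join(dir, "checkpoints"), 0700); err != nil {
		return "", err
	}
	return dir, nil
}

// demo runs the helper inside a throwaway directory and verifies that
// both directories exist afterwards.
func demo() error {
	root, err := os.MkdirTemp("", "containers-sketch")
	if err != nil {
		return err
	}
	defer os.RemoveAll(root)

	dir, err := createContainerDirs(root, "0123456789ab")
	if err != nil {
		return err
	}
	for _, p := range []string{dir, filepath.Join(dir, "checkpoints")} {
		info, err := os.Stat(p)
		if err != nil {
			return err
		}
		if !info.IsDir() {
			return fmt.Errorf("%s is not a directory", p)
		}
	}
	return nil
}

func main() {
	fmt.Println(demo())
}
```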


   2.6 setHostConfig

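daemon.registerMountPoints registers all data volumes mounted into the container (see 2.6.1).
daemon.registerLinks loads all links (including parent/child relationships) and writes the host configuration to a file (see 2.6.2).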

func (daemon *Daemon) setHostConfig(container *container.Container, hostConfig *containertypes.HostConfig) error {
// Do not lock while creating volumes since this could be calling out to external plugins
// Don't want to block other actions, like `docker ps` because we're waiting on an external plugin
if err := daemon.registerMountPoints(container, hostConfig); err != nil {
return err
}

container.Lock()
defer container.Unlock()

// Register any links from the host config before starting the container
if err := daemon.registerLinks(container, hostConfig); err != nil {
return err
}

runconfig.SetDefaultNetModeIfBlank(hostConfig)
container.HostConfig = hostConfig
return container.CheckpointTo(daemon.containersReplica)
}


    2.6.1 The registerMountPoints function:

Located in daemon/volumes.go, it registers all data volumes and bind mounts attached to the container. Mount points come from three sources:

Volumes the container already carries, that is, the content of the Volumes key in the container's image JSON file;

// 1. Read already configured mount points.
for destination, point := range container.MountPoints {
mountPoints[destination] = point
}


Volumes mounted from other containers (--volumes-from);

// 2. Read volumes from other containers.
for _, v := range hostConfig.VolumesFrom {
containerID, mode, err := volume.ParseVolumesFrom(v)
c, err := daemon.GetContainer(containerID)

for _, m := range c.MountPoints {
cp := &volume.MountPoint{
Type:        m.Type,
Name:        m.Name,
Source:      m.Source,
RW:          m.RW && volume.ReadWrite(mode),
Driver:      m.Driver,
Destination: m.Destination,
Propagation: m.Propagation,
Spec:        m.Spec,
CopyData:    false,
}

if len(cp.Source) == 0 {
v, err := daemon.volumes.GetWithRef(cp.Name, cp.Driver, container.ID)
cp.Volume = v
}
dereferenceIfExists(cp.Destination)
mountPoints[cp.Destination] = cp
}
}


Volumes bound to the host through the -v command-line option; Docker calls host-bound volumes bind mounts;

// 3. Read bind mounts
for _, b := range hostConfig.Binds {
bind, err := volume.ParseMountRaw(b, hostConfig.VolumeDriver)

_, tmpfsExists := hostConfig.Tmpfs[bind.Destination]
if binds[bind.Destination] || tmpfsExists {
return fmt.Errorf("Duplicate mount point '%s'", bind.Destination)
}

if bind.Type == mounttypes.TypeVolume {
// create the volume
v, err := daemon.volumes.CreateWithRef(bind.Name, bind.Driver, container.ID, nil, nil)

bind.Volume = v
bind.Source = v.Path()
// bind.Name is an already existing volume, we need to use that here
bind.Driver = v.DriverName()
if bind.Driver == volume.DefaultDriverName {
setBindModeIfNull(bind)
}
}

binds[bind.Destination] = true
dereferenceIfExists(bind.Destination)
mountPoints[bind.Destination] = bind
}
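
The three passes above all write into one map keyed by destination, so a later source replaces an earlier one for the same path, and two binds to the same destination are rejected. A compact sketch of that ordering follows; mountPoint and collect are simplified, hypothetical names, and the tmpfs check and volume creation are omitted.

```go
package main

import "fmt"

// mountPoint is a trimmed, hypothetical stand-in for volume.MountPoint.
type mountPoint struct {
	Source      string
	Destination string
	RW          bool
}

// collect merges mount points from the three sources in order:
// already configured, inherited from other containers, then binds.
func collect(existing map[string]mountPoint, fromOthers, binds []mountPoint) (map[string]mountPoint, error) {
	result := map[string]mountPoint{}
	for dst, mp := range existing {
		result[dst] = mp
	}
	for _, mp := range fromOthers {
		result[mp.Destination] = mp
	}
	seen := map[string]bool{}
	for _, b := range binds {
		if seen[b.Destination] {
			return nil, fmt.Errorf("duplicate mount point %q", b.Destination)
		}
		seen[b.Destination] = true
		result[b.Destination] = b
	}
	return result, nil
}

func main() {
	existing := map[string]mountPoint{"/data": {Source: "vol1", Destination: "/data", RW: true}}
	others := []mountPoint{{Source: "vol2", Destination: "/logs"}}
	binds := []mountPoint{{Source: "/host/data", Destination: "/data", RW: true}}
	m, err := collect(existing, others, binds)
	fmt.Println(len(m), m["/data"].Source, err)
}
```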


    2.6.2 registerLinks records parent/child and alias relationships, then writes the host config to the hostconfig.json file

// registerLinks writes the links to a file.
func (daemon *Daemon) registerLinks(container *container.Container, hostConfig *containertypes.HostConfig) error {
if hostConfig == nil || hostConfig.NetworkMode.IsUserDefined() {
return nil
}

for _, l := range hostConfig.Links {
name, alias, err := opts.ParseLink(l)

child, err := daemon.GetContainer(name)

for child.HostConfig.NetworkMode.IsContainer() {
parts := strings.SplitN(string(child.HostConfig.NetworkMode), ":", 2)
child, err = daemon.GetContainer(parts[1])
}
if child.HostConfig.NetworkMode.IsHost() {
return runconfig.ErrConflictHostNetworkAndLinks
}
if err := daemon.registerLink(container, child, alias); err != nil {
return err
}
}

// After we load all the links into the daemon
// set them to nil on the hostconfig
_, err := container.WriteHostConfig()
return err
}
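
The inner loop of registerLinks follows "container:<name>" network modes until it reaches the container that actually owns the network stack, and rejects targets on host networking. Here is a self-contained sketch of that walk; the ctr type, the lookup map and the cycle guard are assumptions added for the illustration and are not part of the excerpt.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ctr is a hypothetical, minimal container record.
type ctr struct {
	Name        string
	NetworkMode string // e.g. "default", "host" or "container:<name>"
}

var errHostNetworkAndLinks = errors.New("host networking cannot be combined with links")

// resolveLinkTarget follows "container:<name>" network modes until it
// finds the container that owns the network namespace.
func resolveLinkTarget(all map[string]*ctr, name string) (*ctr, error) {
	child, ok := all[name]
	if !ok {
		return nil, fmt.Errorf("no such container: %s", name)
	}
	visited := map[string]bool{child.Name: true} // guard against cycles (sketch only)
	for strings.HasPrefix(child.NetworkMode, "container:") {
		parts := strings.SplitN(child.NetworkMode, ":", 2)
		next, ok := all[parts[1]]
		if !ok {
			return nil, fmt.Errorf("no such container: %s", parts[1])
		}
		if visited[next.Name] {
			return nil, fmt.Errorf("network mode cycle at %s", next.Name)
		}
		visited[next.Name] = true
		child = next
	}
	if child.NetworkMode == "host" {
		return nil, errHostNetworkAndLinks
	}
	return child, nil
}

func main() {
	all := map[string]*ctr{
		"db":    {Name: "db", NetworkMode: "default"},
		"proxy": {Name: "proxy", NetworkMode: "container:db"},
		"mon":   {Name: "mon", NetworkMode: "host"},
	}
	c, err := resolveLinkTarget(all, "proxy")
	fmt.Println(c.Name, err)
	_, err = resolveLinkTarget(all, "mon")
	fmt.Println(err)
}
```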


    2.7 createContainerPlatformSpecificSettings 

The Mount call creates the container's mount point under /var/lib/docker/aufs/mnt, after which the working directory is set up:

// createContainerPlatformSpecificSettings performs platform specific container create functionality
func (daemon *Daemon) createContainerPlatformSpecificSettings(container *container.Container, config *containertypes.Config, hostConfig *containertypes.HostConfig) error {
if err := daemon.Mount(container); err != nil {
return err
}

rootIDs := daemon.idMappings.RootPair()
if err := container.SetupWorkingDirectory(rootIDs); err != nil {
return err
}

for spec := range config.Volumes {
name := stringid.GenerateNonCryptoID()
destination := filepath.Clean(spec)

// Skip volumes for which we already have something mounted on that
// destination because of a --volume-from.
if container.IsDestinationMounted(destination) {
continue
}
path, err := container.GetResourcePath(destination)

stat, err := os.Stat(path)
if err == nil && !stat.IsDir() {
return fmt.Errorf("cannot mount volume over existing file, file exists %s", path)
}

v, err := daemon.volumes.CreateWithRef(name, hostConfig.VolumeDriver, container.ID, nil, nil)

if err := label.Relabel(v.Path(), container.MountLabel, true); err != nil {
return err
}

container.AddMountPointWithVolume(destination, v, true)
}
return daemon.populateVolumes(container)
}


    2.8 SetDefaultNetModeIfBlank sets the network mode to default when none has been specified

func SetDefaultNetModeIfBlank(hc *container.HostConfig) {
if hc != nil {
if hc.NetworkMode == container.NetworkMode("") {
hc.NetworkMode = container.NetworkMode("default")
}
}
}


3. Server side: creating the read-write layer with CreateRWLayer
    The layer relationships involved are explained in section 4

func (ls *layerStore) CreateRWLayer(name string, parent ChainID, opts *CreateRWLayerOpts) (RWLayer, error) {
if opts != nil {
mountLabel = opts.MountLabel
storageOpt = opts.StorageOpt
initFunc = opts.InitFunc
}

ls.mountL.Lock()
defer ls.mountL.Unlock()
m, ok := ls.mounts[name]

if string(parent) != "" {
p = ls.get(parent)
if p == nil {
return nil, ErrLayerDoesNotExist
}
pid = p.cacheID
}

m = &mountedLayer{
name:       name,
parent:     p,
mountID:    ls.mountID(name),
layerStore: ls,
references: map[RWLayer]*referencedRWLayer{},
}

if initFunc != nil {
pid, err = ls.initMount(m.mountID, pid, mountLabel, initFunc, storageOpt)
m.initID = pid
}

createOpts := &graphdriver.CreateOpts{
StorageOpt: storageOpt,
}

if err = ls.driver.CreateReadWrite(m.mountID, pid, createOpts); err != nil {
return nil, err
}

if err = ls.saveMount(m); err != nil {
return nil, err
}

return m.getReference(), nil
}


   3.1 The initMount function is shown below; it calls CreateReadWrite to create the init layer

func (ls *layerStore) initMount(graphID, parent, mountLabel string, initFunc MountInit, storageOpt map[string]string) (string, error) {
// Use "<graph-id>-init" to maintain compatibility with graph drivers
// which are expecting this layer with this special name. If all
// graph drivers can be updated to not rely on knowing about this layer
// then the initID should be randomly generated.
initID := fmt.Sprintf("%s-init", graphID)

createOpts := &graphdriver.CreateOpts{
MountLabel: mountLabel,
StorageOpt: storageOpt,
}

if err := ls.driver.CreateReadWrite(initID, parent, createOpts); err != nil {
return "", err
}
p, err := ls.driver.Get(initID, "")
if err != nil {
return "", err
}

if err := initFunc(p); err != nil {
ls.driver.Put(initID)
return "", err
}

if err := ls.driver.Put(initID); err != nil {
return "", err
}

return initID, nil
}
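
Putting CreateRWLayer and initMount together, the read-write layer ends up stacked on a "<mount-id>-init" layer, which in turn sits on the image's top layer. The sketch below shows that chaining with an in-memory driver; memDriver and createRWLayer are hypothetical, and the driver here takes no options and skips the Get/Put and initFunc steps.

```go
package main

import (
	"errors"
	"fmt"
)

// memDriver is a hypothetical in-memory graph driver that only records
// which layer is the parent of which.
type memDriver struct {
	parents map[string]string // layer ID -> parent ID
}

func (d *memDriver) CreateReadWrite(id, parent string) error {
	if _, exists := d.parents[id]; exists {
		return errors.New("layer already exists: " + id)
	}
	d.parents[id] = parent
	return nil
}

// createRWLayer creates the optional init layer on top of the image layer
// and then the read-write layer on top of that. It returns the ID that
// became the parent of the read-write layer.
func createRWLayer(d *memDriver, mountID, imageLayer string, withInit bool) (string, error) {
	parent := imageLayer
	if withInit {
		initID := fmt.Sprintf("%s-init", mountID)
		if err := d.CreateReadWrite(initID, parent); err != nil {
			return "", err
		}
		parent = initID
	}
	if err := d.CreateReadWrite(mountID, parent); err != nil {
		return "", err
	}
	return parent, nil
}

func main() {
	d := &memDriver{parents: map[string]string{"img": ""}}
	parent, err := createRWLayer(d, "m1", "img", true)
	fmt.Println(parent, err)
	fmt.Println(d.parents["m1"], d.parents["m1-init"])
}
```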


 
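   3.1.1 The CreateReadWrite function is shown below. Under /var/lib/docker/aufs it creates the layer's directories in mnt and diff, creates the file /var/lib/docker/aufs/layers/${id}, looks up the layer's parents, and records all parent layer IDs in that file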

func (a *Driver) CreateReadWrite(id, parent string, opts *graphdriver.CreateOpts) error {
return a.Create(id, parent, opts)
}

// Create three folders for each id
// mnt, layers, and diff
func (a *Driver) Create(id, parent string, opts *graphdriver.CreateOpts) error {
if err := a.createDirsFor(id); err != nil {
return err
}
// Write the layers metadata
f, err := os.Create(path.Join(a.rootPath(), "layers", id))
if err != nil {
return err
}
defer f.Close()

if parent != "" {
ids, err := getParentIDs(a.rootPath(), parent)
if err != nil {
return err
}

// The direct parent comes first, followed by the parent's own ancestors
if _, err := fmt.Fprintln(f, parent); err != nil {
return err
}
for _, i := range ids {
if _, err := fmt.Fprintln(f, i); err != nil {
return err
}
}
}

return nil
}
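
The layers/${id} file therefore lists the direct parent first, followed by everything already listed in the parent's own file. The sketch below reproduces that bookkeeping in a temporary directory; writeLayerFile, readLines and demo are hypothetical helpers, and the mnt and diff directories are left out.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
)

// readLines returns the non-empty lines of a file.
func readLines(file string) ([]string, error) {
	f, err := os.Open(file)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	var lines []string
	s := bufio.NewScanner(f)
	for s.Scan() {
		if s.Text() != "" {
			lines = append(lines, s.Text())
		}
	}
	return lines, s.Err()
}

// writeLayerFile creates <root>/layers/<id> and records the parent
// followed by the parent's own ancestors, one ID per line.
func writeLayerFile(root, id, parent string) error {
	f, err := os.Create(filepath.Join(root, "layers", id))
	if err != nil {
		return err
	}
	defer f.Close()
	if parent == "" {
		return nil
	}
	ancestors, err := readLines(filepath.Join(root, "layers", parent))
	if err != nil {
		return err
	}
	if _, err := fmt.Fprintln(f, parent); err != nil {
		return err
	}
	for _, a := range ancestors {
		if _, err := fmt.Fprintln(f, a); err != nil {
			return err
		}
	}
	return nil
}

// demo builds the chain base <- mid <- top and returns top's file content.
func demo() ([]string, error) {
	root, err := os.MkdirTemp("", "aufs-sketch")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)
	if err := os.MkdirAll(filepath.Join(root, "layers"), 0755); err != nil {
		return nil, err
	}
	for _, l := range [][2]string{{"base", ""}, {"mid", "base"}, {"top", "mid"}} {
		if err := writeLayerFile(root, l[0], l[1]); err != nil {
			return nil, err
		}
	}
	return readLines(filepath.Join(root, "layers", "top"))
}

func main() {
	lines, err := demo()
	fmt.Println(lines, err)
}
```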


   3.2 The saveMount function operates under the /var/lib/docker/image/aufs/layerdb/mounts directory, as shown below:

func (ls *layerStore) saveMount(mount *mountedLayer) error {
if err := ls.store.SetMountID(mount.name, mount.mountID); err != nil {
return err
}

if mount.initID != "" {
if err := ls.store.SetInitID(mount.name, mount.initID); err != nil {
return err
}
}

if mount.parent != nil {
if err := ls.store.SetMountParent(mount.name, mount.parent.chainID); err != nil {
return err
}
}

ls.mounts[mount.name] = mount

return nil
}


    3.2.1 SetMountID, located in layer/filestore.go, creates the mount's directory under /var/lib/docker/image/aufs/layerdb/mounts and writes ${mount-id} to the mount-id file

func (fms *fileMetadataStore) SetMountID(mount string, mountID string) error {
if err := os.MkdirAll(fms.getMountDirectory(mount), 0755); err != nil {
return err
}
return ioutil.WriteFile(fms.getMountFilename(mount, "mount-id"), []byte(mountID), 0644)
}


    3.2.2 SetInitID writes ${mount-id}-init to the init-id file

func (fms *fileMetadataStore) SetInitID(mount string, init string) error {
if err := os.MkdirAll(fms.getMountDirectory(mount), 0755); err != nil {
return err
}
return ioutil.WriteFile(fms.getMountFilename(mount, "init-id"), []byte(init), 0644)
}


    3.2.3 SetMountParent records the parent image layer in the parent file

func (fms *fileMetadataStore) SetMountParent(mount string, parent ChainID) error {
if err := os.MkdirAll(fms.getMountDirectory(mount), 0755); err != nil {
return err
}
return ioutil.WriteFile(fms.getMountFilename(mount, "parent"), []byte(digest.Digest(parent).String()), 0644)
}
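
Taken together, the three setters leave up to three small files per container under the mounts directory. The following sketch writes and reads them back in a temporary directory; saveMount here is a hypothetical flattening of the three methods, and the IDs and digest used in demo are made-up sample values.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// saveMount writes mount-id, and optionally init-id and parent, into
// <root>/mounts/<name>/.
func saveMount(root, name, mountID, initID, parent string) error {
	dir := filepath.Join(root, "mounts", name)
	if err := os.MkdirAll(dir, 0755); err != nil {
		return err
	}
	files := map[string]string{"mount-id": mountID}
	if initID != "" {
		files["init-id"] = initID
	}
	if parent != "" {
		files["parent"] = parent
	}
	for file, content := range files {
		if err := os.WriteFile(filepath.Join(dir, file), []byte(content), 0644); err != nil {
			return err
		}
	}
	return nil
}

// demo stores one mount and returns the content of each metadata file.
func demo() (map[string]string, error) {
	root, err := os.MkdirTemp("", "layerdb-sketch")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)

	// Sample values only; real IDs and digests are much longer.
	if err := saveMount(root, "container1", "abc", "abc-init", "sha256:0011"); err != nil {
		return nil, err
	}
	out := map[string]string{}
	for _, file := range []string{"mount-id", "init-id", "parent"} {
		b, err := os.ReadFile(filepath.Join(root, "mounts", "container1", file))
		if err != nil {
			return nil, err
		}
		out[file] = string(b)
	}
	return out, nil
}

func main() {
	m, err := demo()
	fmt.Println(m, err)
}
```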


4. Introduction to AUFS layers
    The layerStore struct is shown below:

type layerStore struct {
store  MetadataStore
driver graphdriver.Driver

layerMap map[ChainID]*roLayer
layerL   sync.Mutex

mounts map[string]*mountedLayer
mountL sync.Mutex

useTarSplit bool

platform string
}


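    MetadataStore is an interface whose methods mainly retrieve basic information about layers. Metadata is extra information attached to a layer: it lets Docker obtain run-time and build-time information and also records the layer's place in the parent hierarchy (both read-only and read-write layers carry metadata).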

// MetadataStore represents a backend for persisting
// metadata about layers and providing the metadata
// for restoring a Store.
type MetadataStore interface {
// StartTransaction starts an update for new metadata
// which will be used to represent an ID on commit.
StartTransaction() (MetadataTransaction, error)

GetSize(ChainID) (int64, error)
GetParent(ChainID) (ChainID, error)
GetDiffID(ChainID) (DiffID, error)
GetCacheID(ChainID) (string, error)
GetDescriptor(ChainID) (distribution.Descriptor, error)
GetPlatform(ChainID) (Platform, error)
TarSplitReader(ChainID) (io.ReadCloser, error)

SetMountID(string, string) error
SetInitID(string, string) error
SetMountParent(string, ChainID) error

GetMountID(string) (string, error)
GetInitID(string) (string, error)
GetMountParent(string) (ChainID, error)

// List returns the full list of referenced
// read-only and read-write layers
List() ([]ChainID, []string, error)

Remove(ChainID) error
RemoveMount(string) error
}


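    graphdriver.Driver is also an interface. This article focuses on aufs; the daemon/graphdriver directory contains implementations for aufs, btrfs, devmapper, overlay and others. Apart from the diff and change methods, the most important graphdriver methods are Get, Put, Create and Remove.
type ProtoDriver interface {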
// String returns a string representation of this driver.
String() string
// CreateReadWrite creates a new, empty filesystem layer that is ready
// to be used as the storage for a container. Additional options can
// be passed in opts. parent may be "" and opts may be nil.
CreateReadWrite(id, parent string, opts *CreateOpts) error
// Create creates a new, empty, filesystem layer with the
// specified id and parent and options passed in opts. Parent
// may be "" and opts may be nil.
Create(id, parent string, opts *CreateOpts) error
// Remove attempts to remove the filesystem layer with this id.
Remove(id string) error
// Get returns the mountpoint for the layered filesystem referred
// to by this id. You can optionally specify a mountLabel or "".
// Returns the absolute path to the mounted layered filesystem.
Get(id, mountLabel string) (dir string, err error)
// Put releases the system resources for the specified id,
// e.g, unmounting layered filesystem.
Put(id string) error
// Exists returns whether a filesystem layer with the specified
// ID exists on this driver.
Exists(id string) bool
// Status returns a set of key-value pairs which give low
// level diagnostic status about this driver.
Status() [][2]string
// Returns a set of key-value pairs which give low level information
// about the image/container driver is managing.
GetMetadata(id string) (map[string]string, error)
// Cleanup performs necessary tasks to release resources
// held by the driver, e.g., unmounting all layered filesystems
// known to this driver.
Cleanup() error
}

// DiffDriver is the interface to use to implement graph diffs
type DiffDriver interface {
// Diff produces an archive of the changes between the specified
// layer and its parent layer which may be "".
Diff(id, parent string) (io.ReadCloser, error)
// Changes produces a list of changes between the specified layer
// and its parent layer. If parent is "", then all changes will be ADD changes.
Changes(id, parent string) ([]archive.Change, error)
// ApplyDiff extracts the changeset from the given diff into the
// layer with the specified id and parent, returning the size of the
// new layer in bytes.
// The archive.Reader must be an uncompressed stream.
ApplyDiff(id, parent string, diff io.Reader) (size int64, err error)
// DiffSize calculates the changes between the specified id
// and its parent and returns the size in bytes of the changes
// relative to its base filesystem directory.
DiffSize(id, parent string) (size int64, err error)
}

// Driver is the interface for layered/snapshot file system drivers.
type Driver interface {
ProtoDriver
DiffDriver
}


    Every layer holds a pointer to its parent layer. A layer without this pointer is the bottom layer.

type roLayer struct {
chainID    ChainID
diffID     DiffID
parent     *roLayer
cacheID    string
size       int64
layerStore *layerStore
descriptor distribution.Descriptor
platform   Platform

referenceCount int
references     map[Layer]struct{}
}
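
Walking those parent pointers is enough to recover a layer's full ancestry. A minimal sketch, using a hypothetical two-field layer type in place of roLayer:

```go
package main

import "fmt"

// layer is a hypothetical, trimmed version of roLayer.
type layer struct {
	chainID string
	parent  *layer
}

// isBase reports whether the layer sits at the bottom of the stack.
func isBase(l *layer) bool { return l.parent == nil }

// ancestry returns the chain IDs from the given layer down to the base.
func ancestry(l *layer) []string {
	var ids []string
	for cur := l; cur != nil; cur = cur.parent {
		ids = append(ids, cur.chainID)
	}
	return ids
}

func main() {
	base := &layer{chainID: "base"}
	mid := &layer{chainID: "mid", parent: base}
	top := &layer{chainID: "top", parent: mid}
	fmt.Println(ancestry(top), isBase(top), isBase(base))
}
```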


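Summary:
   The command line calls the Docker API endpoint /containers/create; the body carries the configuration (container config, host config and network config)

   1. The daemon initializes a container object and creates a read-write layer; under aufs, entries such as mnt, diff and parent record the parent/child relationships between layer IDs
   2. It creates the container's root directory and the mnt directory
   3. It sets up mounts, networking and so on
     docker create mainly prepares the container's layers and configuration files
     Docker merges the user-supplied parameters with some of the parameters from the image's configuration, then stores the resulting container configuration under /var/lib/docker/containers, in a directory named after the container ID

config.v2.json: general configuration, such as the container name and the command to execute

hostconfig.json: host-related configuration that depends on the operating system platform, such as cgroup settings

checkpoints: container checkpoints; this feature is still experimental in the current version.

  /var/lib/docker/aufs holds the mnt, diff and layers directories

   /var/lib/docker/image/aufs records the relationships between layers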