

translate doc
Signed-off-by: 泰铭曹 <[email protected]>
caotaiming committed Jul 24, 2018
1 parent 6ba9623 commit 15b3a99
Showing 3 changed files with 26 additions and 27 deletions.
Binary file removed .DS_Store
origin doc link : https://github.com/pouchcontainer/blog/blob/master/blog-cn/Pou

### 6. IO stream processing

Kubernetes provides functions such as kubectl exec/attach/port-forward to enable direct interaction between the user and a specific Pod or container, as shown below:

```sh
master $ kubectl exec -it shell-demo -- /bin/bash
boot etc lib media opt root sbin sys usr
root@shell-demo:/#
```

As you can see, exec-ing into a Pod is equivalent to logging into the container via `ssh`. In the following, we analyze the processing of IO requests in Kubernetes and the role CRI Manager plays in the execution flow of `kubectl exec`.


![stream-3.png | left | 827x296](https://cdn.yuque.com/lark/0/2018/png/103564/1527478375654-1c891ac5-7dd0-4432-9f72-56c4feb35ac6.png "")


As shown in the picture above, the steps to execute a `kubectl exec` command are as follows:

1. The essence of the `kubectl exec` command is to execute an exec command on a container in the Kubernetes cluster and forward the resulting IO stream to the user. The request is first forwarded to the Kubelet on the node where the container is located, and the Kubelet then calls the `Exec` API of the CRI according to the configuration. The requested configuration parameters are as follows:

```go
type ExecRequest struct {
	ContainerId string   // ID of the container in which to execute the command
	Cmd         []string // command to execute
	Tty         bool     // whether to exec the command in a TTY
	Stdin       bool     // whether to stream stdin
	Stdout      bool     // whether to stream stdout
	Stderr      bool     // whether to stream stderr
}
```
2. Surprisingly, CRI Manager's `Exec` method does not directly call the Container Manager to execute the exec command on the target container, but instead calls the built-in Stream Server's `GetExec`.
3. The Stream Server's `GetExec` method saves the contents of the exec request to the Request Cache shown above and returns a token; with this token, the corresponding exec request can later be retrieved from the Request Cache. Finally, the token is written into a URL, which is returned to the ApiServer as the result. (A minimal sketch of this token flow appears after this list.)
4. Using the returned URL, the ApiServer directly initiates an http request to the Stream Server on the node of the target container. The request header contains the "Upgrade" field, requesting that the http protocol be upgraded to a streaming protocol such as websocket or SPDY to support the processing of multiple IO streams. This article takes SPDY as an example.
5. The Stream Server processes the request sent by the ApiServer. First, according to the token in the URL, it retrieves the previously saved exec request configuration from the Request Cache. It then replies to the http request, agreeing to upgrade the protocol to SPDY, and waits for the ApiServer to create the number of streams specified in the exec request configuration, corresponding to standard input (Stdin), standard output (Stdout), and standard error (Stderr).
6. After the Stream Server obtains the specified number of streams, it calls the Container Manager's `CreateExec` and `startExec` methods in turn, performs the exec operation on the target container, and forwards the IO to the corresponding streams.
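
Below is a minimal, self-contained Go sketch of the token flow in steps 3 and 5. It is an illustration only, not PouchContainer's actual implementation, and every type and function name in it is an assumption:

```go
// Illustrative sketch of a Stream Server request cache; not PouchContainer code.
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"sync"
)

// ExecConfig holds the parts of an exec request the Stream Server must remember.
type ExecConfig struct {
	ContainerID string
	Cmd         []string
	Tty         bool
}

// RequestCache maps one-time tokens to pending exec configurations.
type RequestCache struct {
	mu      sync.Mutex
	pending map[string]ExecConfig
}

func NewRequestCache() *RequestCache {
	return &RequestCache{pending: make(map[string]ExecConfig)}
}

// Insert stores cfg under a fresh random token and returns the token (step 3).
func (c *RequestCache) Insert(cfg ExecConfig) (string, error) {
	buf := make([]byte, 8)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	token := hex.EncodeToString(buf)
	c.mu.Lock()
	defer c.mu.Unlock()
	c.pending[token] = cfg
	return token, nil
}

// Consume redeems a token exactly once, when the ApiServer's upgraded
// streaming request arrives (step 5).
func (c *RequestCache) Consume(token string) (ExecConfig, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	cfg, ok := c.pending[token]
	delete(c.pending, token)
	return cfg, ok
}

// GetExec caches the request and returns the URL handed back to the ApiServer.
func GetExec(cache *RequestCache, addr string, cfg ExecConfig) (string, error) {
	token, err := cache.Insert(cfg)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("http://%s/exec/%s", addr, token), nil
}

func main() {
	cache := NewRequestCache()
	url, _ := GetExec(cache, "10.0.0.2:10010", ExecConfig{
		ContainerID: "shell-demo",
		Cmd:         []string{"/bin/bash"},
		Tty:         true,
	})
	fmt.Println("URL returned to the ApiServer:", url)
	// The Stream Server later extracts the token from the URL path and
	// redeems it before agreeing to upgrade the connection to SPDY.
	token := url[len(url)-16:] // tokens here are 16 hex characters
	if cfg, ok := cache.Consume(token); ok {
		fmt.Println("recovered exec config for container:", cfg.ContainerID)
	}
}
```

The one-time token is what ties the ApiServer's later streaming request back to the cached exec configuration, so the Stream Server does not need to carry the full request parameters in the URL.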
In fact, before the introduction of CRI, Kubernetes handled IO in essentially the same way as described above.

### 7. Summary

This paper begins with the reasons for introducing CRI, briefly describes the architecture of CRI, and focuses on the implementation of PouchContainer's core functional modules. The presence of CRI makes it easier and faster for PouchContainer containers to join the Kubernetes ecosystem. We also believe that the unique features of PouchContainer will make the Kubernetes ecosystem more diverse.
The design and implementation of PouchContainer CRI is a joint research project of the Alibaba-Zhejiang University Frontier Technology Joint Research Center, which aims to help PouchContainer become a mature container runtime and actively embrace CNCF at the ecosystem level. The outstanding technical strength of Zhejiang University's SEL Lab has effectively helped Pouch close the CRI gap, and it is expected to create immeasurable value at Alibaba and in other data centers using PouchContainer.

origin doc link : https://github.com/pouchcontainer/blog/blob/master/blog-cn/Po
# PouchContainer supporting LXCFS to achieve high-reliability container isolation

## Introduction
PouchContainer is an open-source container runtime developed by Alibaba. The latest released version is 0.3.0, available at [https://github.com/alibaba/pouch](https://github.com/alibaba/pouch). PouchContainer is designed to support LXCFS to realize highly reliable container isolation. While Linux adopts cgroup technology to achieve resource isolation, a problem remains: because the host machine's /proc file system is usually still mounted in the container, users obtain host information instead of the container's actual information when reading files such as /proc/meminfo. This lack of `/proc view isolation` causes a series of problems that stall or even obstruct the containerization of enterprise business. LXCFS ([https://github.com/lxc/lxcfs](https://github.com/lxc/lxcfs)) is an open-source FUSE file system for resolving the `/proc view isolation` issue, making the container behave more like a traditional virtual machine at the presentation layer. This article first introduces the business scenarios appropriate for LXCFS and then describes how LXCFS works in PouchContainer.


## LXCFS Business Scenario
In the era of physical machines and virtual machines, Alibaba developed an internal toolbox covering compiling, packaging, application deployment, and unified monitoring. These tools have been providing stable services for applications deployed on physical machines and virtual machines. Next, we present in detail how LXCFS works in the containerization process, from the aspects of monitoring, maintenance tools, and application deployment.


### Monitoring and Operational Tools
Most monitoring tools rely on the /proc file system to retrieve system information. At Alibaba, for example, part of the infrastructure monitoring tools collect information through tsar ([https://github.com/alibaba/tsar](https://github.com/alibaba/tsar)), and tsar's collection of memory and CPU information depends on the /proc file system. We can download the tsar source code to see how tsar uses the files under /proc.

```
$ git remote -v
$ grep -r cpuinfo .
$ grep -r meminfo .
./include/define.h:#define MEMINFO "/proc/meminfo"
./include/public.h:#define MEMINFO "/proc/meminfo"
./info.md:memory counter is in /proc/meminfo, there are some key elements
./modules/mod_proc.c: /* read total mem from /proc/meminfo */
./modules/mod_proc.c: fp = fopen("/proc/meminfo", "r");
./modules/mod_swap.c: * Read swapping statistics from /proc/vmstat & /proc/meminfo.
./modules/mod_swap.c: /* read /proc/meminfo */
$ grep -r diskstats .
./include/public.h:#define DISKSTATS "/proc/diskstats"
./info.md:IO counter file is /proc/diskstats, for example:
./modules/mod_io.c:#define IO_FILE "/proc/diskstats"
./modules/mod_io.c:FILE *iofp; /* /proc/diskstats*/
./modules/mod_io.c: handle_error("Can't open /proc/diskstats", !iofp);
```

It is obvious that tsar's monitoring of processes, IO, and CPU relies on the /proc file system.

When the information provided by the /proc file system comes from the host machine, these tools cannot monitor what happens inside the container. To meet the business demand for proper container monitoring, it would even be necessary to develop a separate set of monitoring tools specifically for containers. Such issues naturally stall or even obstruct the containerization of existing enterprise business. Container technology must therefore be compatible with existing monitoring tools, so that new tools need not be developed each time and the interfaces engineers are accustomed to are preserved.

PouchContainer supports LXCFS precisely to eliminate the issues listed above. With LXCFS, monitoring and maintenance tools that depend on the /proc file system behave identically whether deployed inside the container or on the host, so existing tools can move into containers and perform in-container monitoring and operations without architectural changes or re-development.

Next, let's look at an example of installing PouchContainer 0.3.0 on Ubuntu:



```
Cached: 433928 kB
/ # cat /proc/uptime
2594376.56 2208749.32
```
It is obvious that the outputs of the /proc/meminfo and uptime files are consistent with those on the host machine: although we specified a 50M memory limit at start time, /proc/meminfo does not reflect the memory limit of the container.

Now start the LXCFS service on the host machine, manually invoke the pouchd process, and specify the related LXCFS parameters:

```
Cached: 4 kB
/ # cat /proc/uptime
10.00 10.00
```

In a container started with LXCFS, reading the in-container /proc files yields information about the container itself.

### Business Applications
For most applications that rely heavily on the operating system, the launch procedure needs to obtain information about the system's memory, CPU, and so on.
When the /proc files in the container do not accurately reflect the container's resource limits, these applications are significantly affected.

For example, some Java applications dynamically set the stack size of the running program in their launch scripts by checking /proc/meminfo. When the container's memory limit is less than the host's memory, the program fails to launch because memory allocation fails.
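
As an illustration of this pattern — a hedged sketch, not any particular application's startup code — the following Go program sizes a memory budget from the MemTotal field of /proc/meminfo. In a container without LXCFS it budgets against the host's memory rather than the container's limit:

```go
// Sketch: derive a startup memory budget from /proc/meminfo, as some
// launch scripts do; hypothetical logic, not a real application's.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// memTotalKB parses the MemTotal line, e.g. "MemTotal: 2046892 kB".
func memTotalKB() (int64, error) {
	f, err := os.Open("/proc/meminfo")
	if err != nil {
		return 0, err
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "MemTotal:" {
			return strconv.ParseInt(fields[1], 10, 64)
		}
	}
	return 0, fmt.Errorf("MemTotal not found")
}

func main() {
	kb, err := memTotalKB()
	if err != nil {
		panic(err)
	}
	// Budget 70% of "total" memory -- the host's total when /proc is not
	// isolated, which is how the launch failure described above arises.
	fmt.Printf("memory budget: %d kB\n", kb*70/100)
}
```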

For DPDK-related applications, the tools need to obtain from /proc/cpuinfo the CPU information and the logical CPU cores used during initialization of the EAL layer. If this information cannot be accurately obtained in the container, the corresponding DPDK tools have to be modified.
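
The same failure mode can be sketched for CPU discovery — again a hedged illustration in Go, not DPDK's actual EAL code — by counting logical cores in /proc/cpuinfo. Without LXCFS this counts the host's cores rather than those available to the container:

```go
// Sketch: count logical CPUs the way a naive EAL-style probe might;
// hypothetical logic, not DPDK code.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func logicalCores() (int, error) {
	f, err := os.Open("/proc/cpuinfo")
	if err != nil {
		return 0, err
	}
	defer f.Close()
	n := 0
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		// Each logical CPU contributes one "processor : N" line.
		if strings.HasPrefix(sc.Text(), "processor") {
			n++
		}
	}
	return n, sc.Err()
}

func main() {
	n, err := logicalCores()
	if err != nil {
		panic(err)
	}
	fmt.Printf("detected %d logical cores\n", n)
}
```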


## PouchContainer integrates LXCFS
PouchContainer has supported LXCFS since version 0.1.0; see this pull request: [https://github.com/alibaba/pouch/pull/502](https://github.com/alibaba/pouch/pull/502)

In short, when the container starts, the LXCFS mount points on the host, under /var/lib/lxc/lxcfs/proc/, are mounted through `-v` onto the virtual filesystem directory /proc inside the container.

At this point, in the /proc directory of the container, you can see proc files such as meminfo, uptime, swaps, stat, diskstats, and cpuinfo.
The parameters are as follows:
```
-v /var/lib/lxc/lxcfs/proc/meminfo:/proc/meminfo
-v /var/lib/lxc/lxcfs/proc/uptime:/proc/uptime
-v /var/lib/lxc/lxcfs/proc/swaps:/proc/swaps
-v /var/lib/lxc/lxcfs/proc/stat:/proc/stat
-v /var/lib/lxc/lxcfs/proc/diskstats:/proc/diskstats
-v /var/lib/lxc/lxcfs/proc/cpuinfo:/proc/cpuinfo
```

To simplify usage, the pouch create and run command lines provide the `--enableLxcfs` flag; if you specify it when creating the container, you can omit the complicated `-v` parameters.

After a period of use and testing, we found that when lxcfs restarts, its proc and cgroup mounts are rebuilt, which causes a `connect failed` error when users access /proc inside the container.

To enhance the stability of LXCFS, pull request [https://github.com/alibaba/pouch/pull/885](https://github.com/alibaba/pouch/pull/885) refined the management of LXCFS by placing it under systemd's guarantee: an ExecStartPost command added to lxcfs.service traverses the containers that use LXCFS and performs the mounts again inside each container.

## Summary

PouchContainer supports using LXCFS to implement view isolation for the /proc filesystem within containers. This lets enterprises keep their original tool chains and their operation and maintenance habits while containerizing existing applications, and thus speeds up the process itself. PouchContainer will provide strong support to enterprises for a smooth transition from traditional virtualization to container virtualization.

