The Story of Processes and Threads

2015-12-27 17:38

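Overview

Multitasking can be achieved with multiple processes, or with multiple threads inside a single process.

A process is made up of one or more threads; every process has at least one thread.

Threads are execution units supported directly by the operating system, so most high-level languages have built-in support for multithreading. Python is no exception, and Python threads are real operating-system threads (POSIX threads on Unix/Linux) rather than simulated ones.

Python's standard library provides two modules, thread and threading. thread is the low-level module, and threading is the high-level module that wraps it. In the vast majority of cases the high-level threading module is all we need. (In Python 3 the low-level module is named _thread.)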

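Unix/Linux operating systems provide a fork() system call, and it is a very unusual one. An ordinary function is called once and returns once, but fork() is called once and returns twice: the operating system copies the current process (the parent process) to create a new one (the child process), and the call then returns in both the parent and the child. In the child, fork() always returns 0; in the parent, it returns the ID of the child process. The reason is that a parent can fork many children, so it has to record the ID of each child, whereas a child only needs to call getppid() to get the ID of its parent.

Python's os module wraps the common system calls, fork among them, so creating a child process from a Python program is easy: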

# multiprocessing.py
import os

print('Process (%s) start...' % os.getpid())
# fork() returns 0 in the child and the child's PID in the parent
pid = os.fork()
if pid == 0:
    print('I am child process (%s) and my parent is %s.' % (os.getpid(), os.getppid()))
else:
    print('I (%s) just created a child process (%s).' % (os.getpid(), pid))

Running it produces:

Process (876) start...
I (876) just created a child process (877).
I am child process (877) and my parent is 876.

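Because Windows has no fork call, the code above cannot run on Windows.

With fork, a process that receives a new task can create a child process to handle it. A familiar example is the Apache server in its prefork mode, where a parent process forks child processes that serve the incoming HTTP requests.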
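The point that a parent must record the ID of every child can be sketched as follows. This is a minimal illustration for Unix/Linux, written to run on Python 3; the function name fork_children and the use of the loop index as the exit code are assumptions made for the example, not part of the original article.

```python
import os

def fork_children(count):
    """Fork `count` children and return {pid: exit_code} once all have exited."""
    pids = []
    for index in range(count):
        pid = os.fork()
        if pid == 0:
            # Child: fork() returned 0. Finish the "task" and leave at once
            # with os._exit so the child never runs the parent's remaining code.
            os._exit(index)
        # Parent: fork() returned the child's PID, so remember it.
        pids.append(pid)

    exit_codes = {}
    for pid in pids:
        # waitpid blocks until that child has exited and returns (pid, status).
        _, status = os.waitpid(pid, 0)
        exit_codes[pid] = os.WEXITSTATUS(status)
    return exit_codes

if __name__ == '__main__':
    for pid, code in fork_children(3).items():
        print('child %s exited with code %s' % (pid, code))
```

The child leaves through os._exit rather than falling out of the loop; otherwise each child would keep forking children of its own.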

multiprocessing

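If you plan to write a multi-process service, Unix/Linux is without doubt the right choice. But since Windows has no fork call, does that mean multi-process programs cannot be written in Python on Windows?

Python is cross-platform, so it naturally provides cross-platform multi-process support: the multiprocessing module.

The multiprocessing module provides a Process class that represents a process object. The following example starts a child process and waits for it to finish: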

from multiprocessing import Process
import os

# code executed by the child process
def run_proc(name):
    print('Run child process %s (%s)...' % (name, os.getpid()))

if __name__ == '__main__':
    print('Parent process %s.' % os.getpid())
    p = Process(target=run_proc, args=('test',))
    print('Process will start.')
    p.start()
    p.join()
    print('Process end.')

The output is:

Parent process 928.
Process will start.
Run child process test (929)...
Process end.

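To create a child process, pass in the function to run and its arguments to create a Process instance, then call start() to launch it. Creating a process this way is even simpler than calling fork().

The join() method waits for the child process to finish before execution continues; it is typically used to synchronize processes.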
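As a small illustration of start() and join() with more than one child, the sketch below starts several processes, waits for all of them, and then reads each one's exit status from the exitcode attribute of Process. The names worker and run_workers are hypothetical, and the code is written for Python 3.

```python
from multiprocessing import Process
import os
import sys

def worker(code):
    # Runs in the child; the status passed to sys.exit is what the
    # parent later sees in Process.exitcode.
    print('child %s exiting with status %s' % (os.getpid(), code))
    sys.exit(code)

def run_workers(codes):
    """Start one child per status code, wait for all, return their exit codes."""
    procs = [Process(target=worker, args=(code,)) for code in codes]
    for p in procs:
        p.start()   # all children are now running concurrently
    for p in procs:
        p.join()    # block until this particular child has finished
    return [p.exitcode for p in procs]

if __name__ == '__main__':
    print(run_workers([0, 0, 3]))
```

Starting every child before joining any of them is what lets them run at the same time; joining inside the first loop would run them one after another.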

Pool

To start a large number of child processes, create them in bulk with a process pool:

from multiprocessing import Pool
import os, time, random

def long_time_task(name):
    print('Run task %s (%s)...' % (name, os.getpid()))
    start = time.time()
    time.sleep(random.random() * 3)
    end = time.time()
    print('Task %s runs %0.2f seconds.' % (name, (end - start)))

if __name__ == '__main__':
    print('Parent process %s.' % os.getpid())
    # with no argument, the pool size defaults to the number of CPU cores
    p = Pool()
    for i in range(5):
        p.apply_async(long_time_task, args=(i,))
    print('Waiting for all subprocesses done...')
    p.close()
    p.join()
    print('All subprocesses done.')

The output is:

Parent process 669.
Waiting for all subprocesses done...
Run task 0 (671)...
Run task 1 (672)...
Run task 2 (673)...
Run task 3 (674)...
Task 2 runs 0.14 seconds.
Run task 4 (673)...
Task 1 runs 0.27 seconds.
Task 3 runs 0.86 seconds.
Task 0 runs 1.41 seconds.
Task 4 runs 1.91 seconds.
All subprocesses done.

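Notes on the code:

Calling join() on a Pool object waits for all child processes to finish. close() must be called before join(), and once close() has been called no new tasks can be submitted to the pool.

Look closely at the output: tasks 0, 1, 2 and 3 start immediately, while task 4 has to wait until one of the earlier tasks finishes. That is because the default size of a Pool on my machine is 4, so at most four processes run at the same time. This is a deliberate limit of Pool, not a limit of the operating system. If you change the code to:

p = Pool(5)

then five processes can run at once.

Because the default Pool size is the number of CPU cores, on an 8-core CPU you would have to submit at least 9 tasks to see the waiting effect described above.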
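To see the pool size limit without relying on timing in the printed output, one option is to have every task report the PID of the worker that ran it. The sketch below is only an illustration for Python 3; which_worker and worker_pids are hypothetical names, and the short sleep is there only to give the other workers a chance to pick up tasks.

```python
from multiprocessing import Pool, cpu_count
import os
import time

def which_worker(task_id):
    # Runs inside a pool worker and reports which process handled the task.
    time.sleep(0.05)
    return os.getpid()

def worker_pids(pool_size, task_count):
    """Run task_count tasks on pool_size workers; return the PID that ran each task."""
    pool = Pool(pool_size)
    try:
        # map blocks until every task has produced its result
        return pool.map(which_worker, range(task_count))
    finally:
        pool.close()
        pool.join()

if __name__ == '__main__':
    print('CPU cores: %s' % cpu_count())
    pids = worker_pids(2, 6)
    print('%s tasks ran on %s distinct workers' % (len(pids), len(set(pids))))
```

However many tasks are submitted, the number of distinct PIDs cannot exceed the pool size.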
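
Inter-process communication

Processes certainly need to communicate with each other, and operating systems provide many mechanisms for this. Python's multiprocessing module wraps the underlying mechanisms and offers several ways to exchange data, such as Queue and Pipe.

Taking Queue as the example, the parent process creates two child processes: one writes data to the Queue and the other reads data from it: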

from multiprocessing import Process, Queue
import os, time, random

# code executed by the writer process
def write(q):
    for value in ['A', 'B', 'C']:
        print('Put %s to queue...' % value)
        q.put(value)
        time.sleep(random.random())

# code executed by the reader process
def read(q):
    while True:
        # block until an item is available
        value = q.get(True)
        print('Get %s from queue.' % value)

if __name__ == '__main__':
    # the parent creates the Queue and passes it to both children
    q = Queue()
    pw = Process(target=write, args=(q,))
    pr = Process(target=read, args=(q,))
    # start the writer child pw
    pw.start()
    # start the reader child pr
    pr.start()
    # wait for pw to finish
    pw.join()
    # pr runs an infinite loop, so it cannot be joined and has to be terminated
    pr.terminate()

The output is:

Put A to queue...
Get A from queue.
Put B to queue...
Get B from queue.
Put C to queue...
Get C from queue.
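In the example above the reader loops forever and has to be terminated. A common alternative, sketched here under the assumption that None never occurs as real data, is to send a sentinel value that tells the reader to stop, so that both children can be joined normally. The names STOP, transfer and the second results queue are hypothetical additions for this Python 3 illustration.

```python
from multiprocessing import Process, Queue

STOP = None  # hypothetical sentinel meaning "no more data"

def write(q, values):
    for value in values:
        q.put(value)
    q.put(STOP)  # tell the reader that the writer is done

def read(q, results):
    while True:
        value = q.get(True)
        if value is STOP:
            break  # leave the loop instead of waiting to be terminated
        results.put('got ' + value)
    results.put(STOP)

def transfer(values):
    """Send values through a writer and a reader process; return what the reader saw."""
    q, results = Queue(), Queue()
    pw = Process(target=write, args=(q, values))
    pr = Process(target=read, args=(q, results))
    pw.start()
    pr.start()
    received = []
    while True:
        # drain the results before joining, so the reader is never blocked
        item = results.get()
        if item is STOP:
            break
        received.append(item)
    pw.join()
    pr.join()  # returns because the reader left its loop on its own
    return received

if __name__ == '__main__':
    print(transfer(['A', 'B', 'C']))
```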

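On Unix/Linux, the multiprocessing module wraps the fork() call, so we do not need to care about the details of fork(). Windows has no fork call, so multiprocessing has to "simulate" the effect of fork: the Python objects that the parent hands to the child must be serialized with pickle and then sent over to the child process. So if a multiprocessing call fails on Windows, first check whether pickling failed.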
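One way to check for a pickling problem ahead of time is to try pickle.dumps on the objects you intend to pass to the child. The helper below, is_picklable, is a hypothetical name for a minimal Python 3 sketch; it catches exceptions broadly because pickle raises different exception types depending on the object.

```python
import pickle
import threading

def is_picklable(obj):
    """Return True if obj can be serialized with pickle, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        # PicklingError, TypeError or AttributeError, depending on the object
        return False

def run_proc(name):
    return name

if __name__ == '__main__':
    candidates = [
        ('a tuple of strings', ('test',)),
        # a module-level function is pickled by its importable name
        ('a module-level function', run_proc),
        # a lambda has no importable name
        ('a lambda', lambda name: name),
        # a lock wraps operating-system state that cannot be copied
        ('a threading lock', threading.Lock()),
    ]
    for label, obj in candidates:
        print('%s: %s' % (label, is_picklable(obj)))
```

This is why the target of a Process is normally a function defined at the top level of a module rather than a lambda or a nested function.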
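
In summary:

On Unix/Linux, multiple processes can be created with the fork() call.
For cross-platform multi-process code, use the multiprocessing module.
Inter-process communication is done through Queue, Pipe and similar mechanisms.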
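The article only demonstrates Queue. For comparison, a Pipe connects exactly two endpoints, and each end has send() and recv() methods. The sketch below is a minimal Python 3 illustration; child and ask_child are hypothetical names, and upper-casing the message merely stands in for real work.

```python
from multiprocessing import Process, Pipe

def child(conn):
    # Receive one message, answer it, then close this end of the pipe.
    message = conn.recv()
    conn.send(message.upper())
    conn.close()

def ask_child(message):
    """Send a message to a child process over a pipe and return its reply."""
    parent_conn, child_conn = Pipe()  # two connected ends, duplex by default
    p = Process(target=child, args=(child_conn,))
    p.start()
    parent_conn.send(message)
    reply = parent_conn.recv()  # blocks until the child has answered
    p.join()
    return reply

if __name__ == '__main__':
    print(ask_child('hello'))
```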

Threads

A thread is the smallest unit of work in an application.

Use the threading module for thread-related operations.

import time, threading

# code executed by the new thread
def loop():
    print('thread %s is running...' % threading.current_thread().name)
    n = 0
    while n < 5:
        n = n + 1
        print('thread %s >>> %s' % (threading.current_thread().name, n))
        time.sleep(1)
    print('thread %s ended.' % threading.current_thread().name)

print('thread %s is running...' % threading.current_thread().name)
t = threading.Thread(target=loop, name='LoopThread')
t.start()
t.join()
print('thread %s ended.' % threading.current_thread().name)

This program produces the following output:

thread MainThread is running...
thread LoopThread is running...
thread LoopThread >>> 1
thread LoopThread >>> 2
thread LoopThread >>> 3
thread LoopThread >>> 4
thread LoopThread >>> 5
thread LoopThread ended.
thread MainThread ended.

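Every process starts one thread by default, which we call the main thread, and the main thread can in turn start new threads. Python's threading module has a current_thread() function that always returns the instance of the current thread. The main thread's instance is named MainThread; the name of a child thread is given when it is created, and here we named it LoopThread. The name is only used for display when printing and has no other meaning. If you do not give a name, Python automatically names the threads Thread-1, Thread-2, and so on.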

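More methods:

start: marks the thread as ready to run; it then waits to be scheduled on a CPU
setName: sets the thread's name
getName: returns the thread's name
setDaemon: makes the thread a daemon (background) thread or a non-daemon (foreground) thread, which is the default
    A daemon thread runs alongside the main thread; when the main thread finishes, the daemon thread is stopped whether or not it has completed its work
    A non-daemon thread also runs alongside the main thread; when the main thread finishes, the program waits for the non-daemon thread to finish before it exits
join: blocks the calling thread until the joined thread finishes, then continues; calling join() right after each start() makes the threads run one at a time, which defeats the purpose of multithreading
run: the method of the Thread object that is executed once the thread has been scheduled

In Python 3, the name and daemon attributes are preferred over setName, getName and setDaemon.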
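The effect of where join() is called can be measured directly. The sketch below, written for Python 3, runs the same sleeping threads twice: once joining each thread right after starting it, and once starting them all before joining. The names nap and run_threads, the thread names and the sleep duration are all assumptions made for the illustration.

```python
import threading
import time

def nap(seconds):
    time.sleep(seconds)

def run_threads(count, seconds, join_immediately):
    """Run `count` sleeping threads and return the elapsed wall-clock time."""
    started = time.time()
    threads = []
    for i in range(count):
        t = threading.Thread(target=nap, args=(seconds,), name='Napper-%d' % i)
        t.daemon = True  # same effect as setDaemon(True); must be set before start()
        t.start()
        if join_immediately:
            t.join()  # waits here, so the next thread starts only after this one ends
        threads.append(t)
    for t in threads:
        t.join()  # joining a thread that has already finished returns at once
    return time.time() - started

if __name__ == '__main__':
    print('join after each start: %.2f s' % run_threads(3, 0.2, True))
    print('join after all starts: %.2f s' % run_threads(3, 0.2, False))
```

Joining after all threads have been started lets the sleeps overlap, so the second run should take roughly the time of one sleep rather than three.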

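Thread locks

The biggest difference between multithreading and multiprocessing is this: with multiple processes, each process has its own copy of every variable, and the copies do not affect one another. With multiple threads, all variables are shared by all threads, so any variable can be modified by any thread. The greatest danger of sharing data between threads is therefore that several threads change the same variable at the same time and corrupt its contents.
Let's see how several threads operating on one variable at the same time can corrupt it: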

import time, threading

# assume this is your bank balance
balance = 0

def change_it(n):
    # deposit first, then withdraw; the result should be 0
    global balance
    balance = balance + n
    balance = balance - n

def run_thread(n):
    for i in range(100000):
        change_it(n)

t1 = threading.Thread(target=run_thread, args=(5,))
t2 = threading.Thread(target=run_thread, args=(8,))
t1.start()
t2.start()
t1.join()
t2.join()
print(balance)

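We define a shared variable balance with an initial value of 0 and start two threads, each of which deposits and then withdraws. In theory the result should be 0. However, thread scheduling is decided by the operating system, so when t1 and t2 run alternately, with enough loop iterations the final value of balance is not necessarily 0.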
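The reason is that a single statement in a high-level language becomes several instructions when it is executed. Even a simple calculation such as:

balance = balance + n

takes two steps:

Compute balance + n and store the result in a temporary variable;
Assign the value of the temporary variable to balance.

In other words, it can be viewed as:

x = balance + n
balance = x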

Because x is a local variable, each thread has its own x. When the code runs normally:

Initial value: balance = 0

t1: x1 = balance + 5 # x1 = 0 + 5 = 5
t1: balance = x1     # balance = 5
t1: x1 = balance - 5 # x1 = 5 - 5 = 0
t1: balance = x1     # balance = 0

t2: x2 = balance + 8 # x2 = 0 + 8 = 8
t2: balance = x2     # balance = 8
t2: x2 = balance - 8 # x2 = 8 - 8 = 0
t2: balance = x2     # balance = 0

Result: balance = 0

But t1 and t2 run alternately. If the operating system executes t1 and t2 in the following order:

Initial value: balance = 0

t1: x1 = balance + 5  # x1 = 0 + 5 = 5

t2: x2 = balance + 8  # x2 = 0 + 8 = 8
t2: balance = x2      # balance = 8

t1: balance = x1      # balance = 5
t1: x1 = balance - 5  # x1 = 5 - 5 = 0
t1: balance = x1      # balance = 0

t2: x2 = balance - 8  # x2 = 0 - 8 = -8
t2: balance = x2      # balance = -8

Result: balance = -8

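The root cause is that modifying balance takes several steps, and a thread can be interrupted between those steps, which lets multiple threads corrupt the contents of the same object.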
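Because a real race depends on scheduling and may not show up on every run, the interleaving can also be replayed deterministically in a single thread. The sketch below simply executes the steps in the problematic order, with t1 depositing and withdrawing 5 and t2 depositing and withdrawing 8 as in run_thread above; replay_interleaving is a hypothetical name used only for this illustration.

```python
def replay_interleaving():
    """Replay one bad interleaving of t1 (n=5) and t2 (n=8) step by step."""
    balance = 0
    x1 = balance + 5  # t1 reads 0 and computes 5, then is interrupted
    x2 = balance + 8  # t2 reads 0 and computes 8
    balance = x2      # t2 writes 8
    balance = x1      # t1 resumes and overwrites with 5: t2's deposit is lost
    x1 = balance - 5  # t1: 5 - 5 = 0
    balance = x1      # t1 writes 0
    x2 = balance - 8  # t2: 0 - 8 = -8
    balance = x2      # t2 writes -8
    return balance

if __name__ == '__main__':
    print(replay_interleaving())  # prints -8
```

The lost update happens at the fourth step: t1 writes a value computed from a stale read of balance.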

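When two threads deposit and withdraw at the same time, the balance can end up wrong. You certainly do not want your bank balance to turn negative for no apparent reason, so we must make sure that while one thread is modifying balance, no other thread can modify it.

The natural solution is a lock: a thread must acquire the lock before it runs the code that modifies the shared variable, and other threads have to wait until the lock is released.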

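#!/usr/bin/env python
#coding:utf-8

import threading
import time

gl_num = 0

# RLock is a re-entrant lock: the thread that holds it may acquire it again
lock = threading.RLock()

def Func():
    global gl_num
    # only one thread at a time can run the code between acquire and release
    lock.acquire()
    try:
        gl_num += 1
        time.sleep(1)
        print(gl_num)
    finally:
        # always release the lock, even if the code above raises
        lock.release()

for i in range(10):
    t = threading.Thread(target=Func)
    t.start()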
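The same idea can be applied to the earlier bank balance example. The sketch below is one possible version for Python 3, not the author's code: it uses a plain threading.Lock rather than the RLock shown above, since no thread needs to acquire the lock twice, and it wraps the thread setup in a hypothetical main() function.

```python
import threading

balance = 0
lock = threading.Lock()

def change_it(n):
    global balance
    # "with lock" acquires on entry and releases on exit, even on an exception
    with lock:
        balance = balance + n
        balance = balance - n

def run_thread(n):
    for i in range(100000):
        change_it(n)

def main():
    t1 = threading.Thread(target=run_thread, args=(5,))
    t2 = threading.Thread(target=run_thread, args=(8,))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return balance

if __name__ == '__main__':
    print(main())
```

Because the deposit and the withdrawal happen while the lock is held, no other thread can observe or overwrite an intermediate value, and the final balance is 0.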
