
Lua 5.3 Source Code Analysis (10): Thread Execution and Interruption

2017-08-18 09:56

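Lua programs run in units of threads. Each Lua thread runs independently until it interrupts itself, leaving the information about the interruption in the state machine. Threads do not interfere with one another, and each can independently resume the execution it previously interrupted.

Lua threads are unrelated to operating-system threads, so Lua does not create a separate system stack for each Lua thread. It uses a thread stack that it maintains itself, which makes the memory cost far lower than that of a system thread.

Lua is an embedded language, and mixing it with C is the norm. Once a C library called from Lua tries to interrupt the thread, resuming that thread becomes a hard problem.

Exception handling

If Lua were implemented as a language that runs purely on a bytecode VM, it could easily implement its own threads and exception handling as long as execution never left the VM.

In fact, as long as no C function is involved in the chain of Lua function calls, the call stack at the C level does not grow deeper.

Things get much more complicated when a Lua function calls a C function and that C function in turn calls back into a Lua function.

The pairs function in the Lua standard library is a typical case: it is a C extension function, yet it can call back into Lua (through the __pairs metamethod).

At the bottom layer, Lua handles exceptions and thread interruption with the same mechanism: the longjmp facility of standard C.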

#if !defined(LUAI_THROW) /* { */

#if defined(__cplusplus) && !defined(LUA_USE_LONGJMP)   /* { */

/* C++ exceptions */
#define LUAI_THROW(L,c)     throw(c)
#define LUAI_TRY(L,c,a) \
try { a } catch(...) { if ((c)->status == 0) (c)->status = -1; }
#define luai_jmpbuf     int  /* dummy variable */

#elif defined(LUA_USE_POSIX)                /* }{ */

/* in POSIX, try _longjmp/_setjmp (more efficient) */
#define LUAI_THROW(L,c)     _longjmp((c)->b, 1)
#define LUAI_TRY(L,c,a)     if (_setjmp((c)->b) == 0) { a }
#define luai_jmpbuf     jmp_buf

#else                           /* }{ */

/* ISO C handling with long jumps */
#define LUAI_THROW(L,c)     longjmp((c)->b, 1)
#define LUAI_TRY(L,c,a)     if (setjmp((c)->b) == 0) { a }
#define luai_jmpbuf     jmp_buf

#endif                          /* } */

#endif                          /* } */

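Each thread L stores its current longjmp return point in errorJmp, whose type is struct lua_longjmp. This is a linked list: every time a piece of protected Lua code runs, a new error return point is created and linked onto the list.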

/* chain list of long jump buffers */
struct lua_longjmp {
struct lua_longjmp *previous;
luai_jmpbuf b;
volatile int status;  /* error code */
};

The longjmp return point is set up by luaD_rawrunprotected.

int luaD_rawrunprotected (lua_State *L, Pfunc f, void *ud) {
unsigned short oldnCcalls = L->nCcalls;
struct lua_longjmp lj;
lj.status = LUA_OK;
lj.previous = L->errorJmp;  /* chain new error handler */
L->errorJmp = &lj;
LUAI_TRY(L, &lj,
(*f)(L, ud);
);
L->errorJmp = lj.previous;  /* restore old error handler */
L->nCcalls = oldnCcalls;
return lj.status;
}

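This code is easy to follow: set up a new jmpbuf, link it onto the list, and call the function. When the call finishes, restore the state that was in effect on entry.

To return directly to the nearest error recovery point, all that is needed is a call to longjmp. Lua wraps this step in the internal API luaD_throw.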

l_noret luaD_throw (lua_State *L, int errcode) {
if (L->errorJmp) {  /* thread has an error handler? */
L->errorJmp->status = errcode;  /* set status */
LUAI_THROW(L, L->errorJmp);  /* jump to it */
}
else {  /* thread has no error handler */
global_State *g = G(L);
L->status = cast_byte(errcode);  /* mark it as dead */
if (g->mainthread->errorJmp) {  /* main thread has a handler? */
setobjs2s(L, g->mainthread->top++, L->top - 1);  /* copy error obj. */
luaD_throw(g->mainthread, errcode);  /* re-throw in main thread */
}
else {  /* no handler at all; abort */
if (g->panic) {  /* panic function? */
seterrorobj(L, errcode, L->top);  /* assume EXTRA_STACK */
if (L->ci->top < L->top)
L->ci->top = L->top;  /* pushing msg. can break this invariant */
lua_unlock(L);
g->panic(L);  /* call panic function (last chance to jump out) */
}
abort();
}
}
}

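A newly created thread may be running without protection. Any error raised in that situation must still be caught so that the program does not crash. The reasonable way to handle it is to mark the running thread as dead and rethrow the exception in the main thread.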
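The chain of recovery points and the throw that jumps to the innermost one can be modelled in a few lines of plain C. This is only a sketch under simplifying assumptions: mini_state, jmp_node, mini_runprotected and mini_throw are hypothetical names that stand in for lua_State, struct lua_longjmp, luaD_rawrunprotected and luaD_throw, and the model has no panic path and no main-thread rethrow.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical miniature of struct lua_longjmp: one node per protected call. */
struct jmp_node {
  struct jmp_node *previous;  /* enclosing recovery point */
  jmp_buf b;
  volatile int status;        /* 0 means "no error" */
};

/* Hypothetical stand-in for lua_State: only the handler chain is kept. */
struct mini_state {
  struct jmp_node *error_jmp;
};

typedef void (*pfunc)(struct mini_state *S, void *ud);

/* Jump to the innermost recovery point (the handler branch of luaD_throw). */
static void mini_throw(struct mini_state *S, int errcode) {
  assert(S->error_jmp != NULL);  /* this model has no panic path */
  S->error_jmp->status = errcode;
  longjmp(S->error_jmp->b, 1);
}

/* Run f under a fresh recovery point, in the style of luaD_rawrunprotected. */
static int mini_runprotected(struct mini_state *S, pfunc f, void *ud) {
  struct jmp_node node;
  node.status = 0;
  node.previous = S->error_jmp;  /* chain new handler */
  S->error_jmp = &node;
  if (setjmp(node.b) == 0) {
    f(S, ud);
  }
  S->error_jmp = node.previous;  /* restore old handler */
  return node.status;
}

static void inner(struct mini_state *S, void *ud) {
  (void)ud;
  mini_throw(S, 2);  /* caught by the innermost protected call */
}

static void outer(struct mini_state *S, void *ud) {
  int *seen = ud;
  *seen = mini_runprotected(S, inner, NULL);  /* nested protection */
  mini_throw(S, 7);  /* caught by the outer protected call */
}

/* Returns 10 * (status seen by the inner call) + (status of the outer call). */
int demo_nested_throw(void) {
  struct mini_state S = { NULL };
  int inner_status = 0;
  int outer_status = mini_runprotected(&S, outer, &inner_status);
  return inner_status * 10 + outer_status;
}
```

Calling demo_nested_throw() yields 27: the inner throw is absorbed by the inner recovery point, and the second throw unwinds to the outer one, which is the same nesting behaviour the errorJmp list gives Lua.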

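Function calls

Function calls are divided into protected calls and unprotected calls.

A protected call is a complete process at the C level. In Lua code, pcall is implemented as a function rather than as a language construct, so a protected call always enters and leaves the C call stack once.

It is implemented by a separate internal API, luaD_pcall. The public API lua_pcallk is only a thin wrapper around it.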

int luaD_pcall (lua_State *L, Pfunc func, void *u,
ptrdiff_t old_top, ptrdiff_t ef) {
int status;
CallInfo *old_ci = L->ci;
lu_byte old_allowhooks = L->allowhook;
unsigned short old_nny = L->nny;
ptrdiff_t old_errfunc = L->errfunc;
L->errfunc = ef;
status = luaD_rawrunprotected(L, func, u);
if (status != LUA_OK) {  /* an error occurred? */
StkId oldtop = restorestack(L, old_top);
luaF_close(L, oldtop);  /* close possible pending closures */
seterrorobj(L, status, oldtop);
L->ci = old_ci;
L->allowhook = old_allowhooks;
L->nny = old_nny;
luaD_shrinkstack(L);
}
L->errfunc = old_errfunc;
return status;
}

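This code shows how pcall works: it uses the C stack to save and restore state.

L->ci, L->allowhook, L->nny and L->errfunc are all saved on the C stack of luaD_pcall, so they can be restored correctly once luaD_rawrunprotected returns.

nny stands for "number of non-yieldable calls". Because C itself offers no support for continuations, Lua cannot make every function yieldable. Once one level of the call chain is non-yieldable, no deeper level can yield. This variable tracks that state so that an error can be reported when a yield is attempted. Whether a given C call allows a yield depends on whether a C continuation has been set, or on whether the Lua core considers that a yield during this call cannot be handled correctly. Both are specified through the last parameter of luaD_call.

When luaD_rawrunprotected does not return normally, old_top is used to locate the function that was just called on the stack and to clean up after it (the call to luaF_close involves the GC handling of upvalues).

Because luaD_rawrunprotected is given a function pointer rather than an index on the data stack, an extra variable is needed to locate that position.

The restorestack macro is used here to locate the address on the stack because the memory address of the data stack changes when the stack is resized. Saving an address would not work; a relative offset has to be remembered instead. The savestack and restorestack macros do exactly this.

#define savestack(L,p) ((char *)(p) - (char *)L->stack)

#define restorestack(L,n) ((TValue *)((char *)L->stack + (n)))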
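A small stand-alone sketch shows why an offset survives a stack reallocation while a raw pointer would not. The types slot and mini_state and the mini_ prefixed macros are hypothetical stand-ins for TValue, lua_State and the two macros above; the real stack growth code in Lua does more than a bare realloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for TValue and the stack fields of lua_State. */
typedef struct { int value; } slot;
typedef struct { slot *stack; size_t size; } mini_state;

/* Same idea as savestack/restorestack: keep a byte offset, not an address. */
#define mini_savestack(S, p)    ((char *)(p) - (char *)(S)->stack)
#define mini_restorestack(S, n) ((slot *)((char *)(S)->stack + (n)))

/* Grow the stack; realloc may move the whole block. Returns 0 on success. */
static int mini_growstack(mini_state *S, size_t newsize) {
  assert(newsize >= S->size);
  slot *bigger = realloc(S->stack, newsize * sizeof(slot));
  if (bigger == NULL) return -1;
  S->stack = bigger;
  S->size = newsize;
  return 0;
}

/* Returns the value read back through the restored pointer, or -1 on failure. */
int demo_offset_survives_realloc(void) {
  mini_state S;
  S.size = 4;
  S.stack = malloc(S.size * sizeof(slot));
  if (S.stack == NULL) return -1;
  S.stack[2].value = 42;
  ptrdiff_t saved = mini_savestack(&S, &S.stack[2]);  /* offset of slot 2 */
  if (mini_growstack(&S, 4096) != 0) { free(S.stack); return -1; }
  slot *p = mini_restorestack(&S, saved);  /* valid even if the block moved */
  int result = p->value;
  free(S.stack);
  return result;
}
```

A pointer to S.stack[2] taken before the growth could dangle afterwards; the saved offset is re-based on whatever address the stack has at the time of use.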

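An ordinary Lua-level function call does not correspond to a function call at the C level. For a Lua function, a call should be seen as creating a new CallInfo, adjusting the data stack, and moving the bytecode execution position to the start of the called function. A return from a Lua function does the reverse: it restores the data stack, pops the CallInfo, and moves the bytecode execution position back into the original instruction sequence.

With that in mind, it becomes clear why the low-level API is split into luaD_precall and luaD_poscall.

luaD_precall does the work of the call half, and luaD_poscall does the work of the return half.

Seen from the C level, nested Lua function calls are not recursive. For a C function, the whole call is completed in one go and there is no need to wait for a later luaD_poscall, so luaD_precall finishes the call itself and returns 1. For a Lua function, luaD_precall only switches the execution state of the lua_State; the bytecode of the called function has not run yet, and luaD_precall returns 0. luaD_poscall is called to complete the call only when the Lua VM reaches the corresponding OP_RETURN instruction.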

call

/*
** returns true if function has been executed (C function)
*/
int luaD_precall (lua_State *L, StkId func, int nresults) {
lua_CFunction f;
CallInfo *ci;
int n;  /* number of arguments (Lua) or returns (C) */
ptrdiff_t funcr = savestack(L, func);
switch (ttype(func)) {
case LUA_TLCF:  /* light C function */
f = fvalue(func);
goto Cfunc;
case LUA_TCCL: {  /* C closure */
f = clCvalue(func)->f;
Cfunc:
luaD_checkstack(L, LUA_MINSTACK);  /* ensure minimum stack size */
ci = next_ci(L);  /* now 'enter' new function */
ci->nresults = nresults;
ci->func = restorestack(L, funcr);
ci->top = L->top + LUA_MINSTACK;
lua_assert(ci->top <= L->stack_last);
ci->callstatus = 0;
luaC_checkGC(L);  /* stack grow uses memory */
if (L->hookmask & LUA_MASKCALL)
luaD_hook(L, LUA_HOOKCALL, -1);
lua_unlock(L);
n = (*f)(L);  /* do the actual call */
lua_lock(L);
api_checknelems(L, n);
luaD_poscall(L, L->top - n);
return 1;
}
case LUA_TLCL: {  /* Lua function: prepare its call */
StkId base;
Proto *p = clLvalue(func)->p;
n = cast_int(L->top - func) - 1;  /* number of real arguments */
luaD_checkstack(L, p->maxstacksize);
for (; n < p->numparams; n++)
setnilvalue(L->top++);  /* complete missing arguments */
if (!p->is_vararg) {
func = restorestack(L, funcr);
base = func + 1;
}
else {
base = adjust_varargs(L, p, n);
func = restorestack(L, funcr);  /* previous call can change stack */
}
ci = next_ci(L);  /* now 'enter' new function */
ci->nresults = nresults;
ci->func = func;
ci->u.l.base = base;
ci->top = base + p->maxstacksize;
lua_assert(ci->top <= L->stack_last);
ci->u.l.savedpc = p->code;  /* starting point */
ci->callstatus = CIST_LUA;
L->top = ci->top;
luaC_checkGC(L);  /* stack grow uses memory */
if (L->hookmask & LUA_MASKCALL)
callhook(L, ci);
return 0;
}
default: {  /* not a function */
luaD_checkstack(L, 1);  /* ensure space for metamethod */
func = restorestack(L, funcr);  /* previous call may change stack */
tryfuncTM(L, func);  /* try to get '__call' metamethod */
return luaD_precall(L, func, nresults);  /* now it must be a function */
}
}
}

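A light C function and a C closure differ only in how they are stored; the handling logic is the same: push a new CallInfo, set up the top of the data stack, call the C function, and then call luaD_poscall.

Lua functions are a little more involved. First, the number of call arguments n on the data stack is computed from the difference between the position of the function object and the stack top. The number of fixed parameters the function declares is available in the numparams field of the function prototype; if the stack provides fewer arguments than that, the missing ones are filled with nil. When the called function takes a variable number of arguments, further processing is needed: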

static StkId adjust_varargs (lua_State *L, Proto *p, int actual) {
int i;
int nfixargs = p->numparams;
StkId base, fixed;
lua_assert(actual >= nfixargs);
/* move fixed parameters to final position */
luaD_checkstack(L, p->maxstacksize);  /* check again for new 'base' */
fixed = L->top - actual;  /* first fixed argument */
base = L->top;  /* final position of first argument */
for (i=0; i<nfixargs; i++) {
setobjs2s(L, L->top++, fixed + i);
setnilvalue(fixed + i);
}
return base;
}

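The notion of a variable argument list exists only for Lua functions. When a function accepts variable arguments, those arguments sit at the tail of the previous data stack frame. adjust_varargs copies the fixed parameters into the new data stack frame of the called function and leaves the variable arguments where they are.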
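The resulting stack layout is easier to see with concrete values. The sketch below mirrors the loop of adjust_varargs on a hypothetical integer stack (mini_stack, with NIL as a made-up marker for a cleared slot), for a function with two fixed parameters called with the arguments 10, 20, 30, 40.

```c
#include <assert.h>

#define NIL (-1)  /* hypothetical marker for a cleared slot */

/* Hypothetical miniature stack: ints instead of TValue, indices instead of StkId. */
typedef struct { int slots[32]; int top; } mini_stack;

/* Mirrors adjust_varargs: copy the fixed parameters above the current top,
   clear their old slots, and return the index of the new base. */
static int mini_adjust_varargs(mini_stack *s, int nfixargs, int actual) {
  assert(actual >= nfixargs);
  int fixed = s->top - actual;  /* first fixed argument */
  int base = s->top;            /* final position of first argument */
  for (int i = 0; i < nfixargs; i++) {
    s->slots[s->top++] = s->slots[fixed + i];
    s->slots[fixed + i] = NIL;
  }
  return base;
}

/* Sets up "f(10, 20, 30, 40)" for a function with 2 fixed parameters
   and returns the new base index. */
int demo_varargs_layout(mini_stack *s) {
  int args[] = { 10, 20, 30, 40 };
  s->top = 0;
  s->slots[s->top++] = 0;  /* slot 0 stands for the function object */
  for (int i = 0; i < 4; i++) s->slots[s->top++] = args[i];
  return mini_adjust_varargs(s, 2, 4);
}
```

After the call the slots read: function, NIL, NIL, 30, 40, 10, 20. The new base is index 5, the fixed parameters start there, and the variable arguments 30 and 40 remain just below it.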

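Next, a new level of the call stack, a CallInfo, has to be constructed. This structure needs its bytecode execution pointer savedpc initialized to point at the bytecode area of the Lua function object.

The total stack space a Lua function needs is known when its bytecode is generated, so luaD_checkstack can allocate it in one step and the stack top in the CallInfo can be set directly. The number of return values and the function object referenced by the CallInfo are initialized in turn, and finally the call status field callstatus is initialized; marking it with CIST_LUA is enough.

How the Lua function actually gets run is decided by the caller of luaD_precall.

Some objects have their call behavior driven by a metatable. In that case the function that is really called has to be obtained through tryfuncTM.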

static void tryfuncTM (lua_State *L, StkId func) {
const TValue *tm = luaT_gettmbyobj(L, func, TM_CALL);
StkId p;
if (!ttisfunction(tm))
luaG_typeerror(L, func, "call");
/* Open a hole inside the stack at 'func' */
for (p = L->top; p > func; p--)
setobjs2s(L, p, p-1);
L->top++;  /* slot ensured by caller */
setobj2s(L, func, tm);  /* tag method is the new function to be called */
}

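By Lua's definition, a call made through a metamethod differs from a native function call: the object itself is passed in as the first argument. This requires shifting the data stack so that the object lands in the first argument position, which the source code shows clearly.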
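The stack shift performed by tryfuncTM can be traced on a toy stack. This sketch assumes a hypothetical integer stack (mini_stack) and made-up values OBJ and CALL_TM for the callable object and its __call handler; metatable lookup and type checks are left out.

```c
#include <assert.h>

/* Hypothetical miniature stack: ints instead of TValue, indices instead of StkId. */
typedef struct { int slots[16]; int top; } mini_stack;

/* Mirrors the loop in tryfuncTM: shift everything from 'func' upward by one
   slot, then store the handler at 'func'. */
static void mini_insert_handler(mini_stack *s, int func, int handler) {
  assert(s->top < 16);  /* the caller must have ensured one free slot */
  for (int p = s->top; p > func; p--)
    s->slots[p] = s->slots[p - 1];
  s->top++;
  s->slots[func] = handler;
}

/* Sets up "obj(1, 2)" and rewrites it into "handler(obj, 1, 2)".
   Returns the new stack top. */
int demo_call_metamethod(mini_stack *s) {
  enum { OBJ = 100, CALL_TM = 200 };
  s->top = 0;
  s->slots[s->top++] = OBJ;
  s->slots[s->top++] = 1;
  s->slots[s->top++] = 2;
  mini_insert_handler(s, 0, CALL_TM);
  return s->top;
}
```

The stack goes from (OBJ, 1, 2) to (CALL_TM, OBJ, 1, 2), so the object becomes the first argument of its handler.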

return

luaD_poscall does a simple job, mainly adjusting the data stack.

int luaD_poscall (lua_State *L, StkId firstResult) {
StkId res;
int wanted, i;
CallInfo *ci = L->ci;
if (L->hookmask & (LUA_MASKRET | LUA_MASKLINE)) {
if (L->hookmask & LUA_MASKRET) {
ptrdiff_t fr = savestack(L, firstResult);  /* hook may change stack */
luaD_hook(L, LUA_HOOKRET, -1);
firstResult = restorestack(L, fr);
}
L->oldpc = ci->previous->u.l.savedpc;  /* 'oldpc' for caller function */
}
res = ci->func;  /* res == final position of 1st result */
wanted = ci->nresults;
L->ci = ci = ci->previous;  /* back to caller */
/* move results to correct place */
for (i = wanted; i != 0 && firstResult < L->top; i--)
setobjs2s(L, res++, firstResult++);
while (i-- > 0)
setnilvalue(res++);
L->top = res;
return (wanted - LUA_MULTRET);  /* 0 iff wanted == LUA_MULTRET */
}

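The data stack is adjusted according to the number of return values nresults that luaD_precall stored in the CallInfo and the number of values this call actually left on the data stack. Surplus values are discarded, and missing ones are filled with nil.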
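The three possible outcomes (too few results, too many, and "keep them all") can be checked with a model of the copy loop. The names mini_stack, NIL and MULTRET are hypothetical; MULTRET plays the role of LUA_MULTRET and, like it, is assumed to be -1.

```c
#include <assert.h>

#define NIL (-1)      /* hypothetical marker for a nil slot */
#define MULTRET (-1)  /* same role as LUA_MULTRET */

/* Hypothetical miniature stack: ints instead of TValue, indices instead of StkId. */
typedef struct { int slots[32]; int top; } mini_stack;

/* Mirrors the tail of luaD_poscall: move results down to 'res' (the slot of
   the called function), keeping 'wanted' of them. */
static void mini_move_results(mini_stack *s, int res, int first_result,
                              int wanted) {
  int i;
  assert(res <= first_result);
  for (i = wanted; i != 0 && first_result < s->top; i--)
    s->slots[res++] = s->slots[first_result++];
  while (i-- > 0)
    s->slots[res++] = NIL;  /* pad missing results */
  s->top = res;
}

/* The callee sits in slot 1 and has produced the two results 7 and 8.
   Returns the stack top after the caller asked for 'wanted' results. */
int demo_results(mini_stack *s, int wanted) {
  s->top = 0;
  s->slots[s->top++] = 0;   /* something belonging to the caller */
  s->slots[s->top++] = 99;  /* the called function */
  s->slots[s->top++] = 5;   /* a local of the callee */
  s->slots[s->top++] = 7;   /* first result */
  s->slots[s->top++] = 8;   /* second result */
  mini_move_results(s, 1, 3, wanted);
  return s->top;
}
```

Asking for three results leaves 7, 8, NIL starting at slot 1; asking for one leaves only 7; asking for MULTRET keeps both results, because the counter starts negative and the loop stops only when the results run out.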

luaD_call

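luaD_call is mainly used to implement the public API lua_callk. After calling luaD_precall, it calls luaV_execute to run the bytecode of the function itself.

The last parameter of luaD_call indicates whether the thread may be suspended inside this call, because many situations during Lua VM execution introduce a new level of C function call.

A Lua thread does not own a separate C stack, so not every suspension that happens inside a C function can be handled correctly. Only the caller of luaD_call knows whether a yield is acceptable.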

void luaD_call (lua_State *L, StkId func, int nResults, int allowyield) {
if (++L->nCcalls >= LUAI_MAXCCALLS) {
if (L->nCcalls == LUAI_MAXCCALLS)
luaG_runerror(L, "C stack overflow");
else if (L->nCcalls >= (LUAI_MAXCCALLS + (LUAI_MAXCCALLS>>3)))
luaD_throw(L, LUA_ERRERR);  /* error while handing stack error */
}
if (!allowyield)
L->nny++;
if (!luaD_precall(L, func, nResults))  /* is a Lua function? */
luaV_execute(L);  /* call it */
if (!allowyield)
L->nny--;
L->nCcalls--;
}

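As described earlier, when the Lua VM interprets bytecode it does not use luaD_call directly for calls to Lua functions. Since no C-level function call takes place, the problem of suspending a thread inside a C function is avoided as far as possible. In some cases, however, doing the same would make the VM implementation considerably more complex. These cases include function calls triggered by the for statement and function calls triggered by metamethods. Lua uses luaD_call there to keep the implementation simple.

Suspending a thread from a C function

How can C functions be made to cooperate correctly?

C does not support continuations. If you leave a C function through longjmp, you can never get back to the point you jumped from. For most Lua features, which work on VM bytecode, this is not a problem.

However, pcall and metatables both involve C function calls, and this restriction made Lua less complete. Lua applies a series of techniques to get around it and support yieldable pcall and metamethods.

The Lua reference manual has a short section, Handling Yields in C, that revolves around this problem. Let us first look at how the problem arises: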

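A resume always starts with a call to lua_resume. In Lua 5.1 and earlier, a yield always ended in a call to lua_yield, and the C function that called it had to return immediately. No C function was allowed to be left half executed in between. Only then could the Lua VM work correctly.

(C) lua_resume -> Lua functions -> coroutine.yield -> (C) lua_yield -> (C) return

In this flow, however many levels of Lua functions there are, they are all managed by the call stack of lua_State. So when the final C return goes back to the original resume point there is no problem, and the next resume can continue correctly. In other words, at the time of a yield the Lua call stack may contain unfinished Lua functions but must not contain unfinished C functions.

The situation changes if we write a C extension whose C function calls back into a Lua function passed to it.

(C) lua_resume -> Lua function -> C function -> (C) lua_call -> Lua function -> coroutine.yield -> (C) lua_yield

If a Lua function called from C through lua_call calls coroutine.yield, then after the yield a later resume can no longer continue from the line following lua_call. When Lua meets this situation it raises the error "attempt to yield across a C-call boundary".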

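At the boundary between C and Lua, how can a resume after a yield continue running the C code that follows the boundary?

All Lua coroutines share one C stack. longjmp can jump out from deep inside a chain of calls, but it cannot get back to that position, because the stack is destroyed as soon as the jump is made.

Four APIs form the boundary where C enters Lua: lua_call, lua_pcall, lua_resume and lua_yield. The key problem is that a call to a Lua function may have two return paths.

A normal return from the Lua function should continue with the C code after the lua_call. If a yield happens midway, execution instead goes back to the C code on the line after the earlier lua_resume call.

In the latter case, when lua_resume is called again, execution still needs to get back to the point after lua_call to finish the rest of the C logic. C does not allow this, because the original C stack no longer exists.

Lua 5.2 introduced the API lua_callk to solve this problem. Since C execution cannot return to the line after lua_callk once a yield has happened, the C programmer supplies a continuation function k to carry on.

The parameter k can be understood this way: when no yield happens in the Lua function called by lua_callk, the call returns normally. Once a yield happens, the caller must understand that the C code cannot continue normally, and the Lua VM will call k to finish the remaining work when the thread is continued. k receives the correct L, which holds the correct lua_State state, so it looks as if a new C execution sequence has replaced the original one.

An easy-to-understand usage is to make lua_callk the last thing a C function does: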

lua_callk(L, 0, LUA_MULTRET, 0, k);
return k(L, LUA_OK, 0);  /* in 5.3 a continuation takes (L, status, ctx) */

static int luaB_dofile (lua_State *L) {
const char *fname = luaL_optstring(L, 1, NULL);
lua_settop(L, 1);
if (luaL_loadfile(L, fname) != LUA_OK)
return lua_error(L);
lua_callk(L, 0, LUA_MULTRET, 0, dofilecont);
return dofilecont(L, 0, 0);
}

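That is, the logic that follows callk is placed in a separate C function k, which is either called right after callk or handed to the framework so that the framework calls it after resume. The state of the Lua state machine is saved correctly in L, while the C function's stack is destroyed by the yield. What if k needs the C function state from before the continuation point? Lua provides lua_KContext ctx to help record C state.

The thread structure L of Lua holds the complete CallInfo call stack. When the C-level call stack is destroyed, each C function that has not yet returned has left a continuation function in its CallInfo before entering the Lua VM. The chain of C function calls that was originally maintained by native code and the system C stack is flattened into continuation functions stored in the CallInfo entries of L. To continue a Lua thread whose C call stack has been destroyed, it is enough to call the continuation functions in the CallInfo entries one after another to carry out the same logic.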
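The two return paths and the role of k and ctx can be imitated without Lua at all. The following sketch is a toy model, not Lua's implementation: mini_thread, mini_callk and mini_resume are hypothetical names, the "callee" is reduced to a flag that says whether it yields, and only one pending continuation is stored where the real CallInfo chain can hold one per unfinished C function.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

typedef struct mini_thread mini_thread;
typedef int (*kfunc)(mini_thread *T, int ctx);

/* Hypothetical miniature of a coroutine with one pending continuation. */
struct mini_thread {
  jmp_buf resume_point;  /* where a yield jumps back to */
  kfunc k;               /* continuation left by the interrupted C function */
  int ctx;               /* C state that must survive the lost C stack */
  int suspended;
  int log;               /* records what ran, for the demo */
};

/* Stand-in for lua_callk: the callee either returns normally or yields. */
static void mini_callk(mini_thread *T, int callee_yields, int ctx, kfunc k) {
  if (callee_yields) {
    T->k = k;
    T->ctx = ctx;
    T->suspended = 1;
    longjmp(T->resume_point, 1);  /* the C frames above resume are abandoned */
  }
}

/* Second half of the C function; it needs only T and ctx, not the old frame. */
static int finish_work(mini_thread *T, int ctx) {
  T->log = T->log * 10 + ctx;
  return 1;  /* "finished" */
}

/* First half of the C function, written in the callk-then-k style. */
static int work(mini_thread *T, int callee_yields) {
  int partial = 4;  /* local state that would be lost on a yield */
  T->log = 1;
  mini_callk(T, callee_yields, partial, finish_work);
  return finish_work(T, partial);  /* normal return path */
}

/* Returns 1 when the work finished and 0 when the thread yielded. */
static int mini_resume(mini_thread *T, int callee_yields) {
  if (T->suspended) {  /* continue: the original frame is gone, so call k */
    assert(T->k != NULL);
    T->suspended = 0;
    return T->k(T, T->ctx);
  }
  if (setjmp(T->resume_point) == 0)
    return work(T, callee_yields);
  return 0;  /* arrived here through a yield */
}

/* Returns 10 * log + final status; both paths should agree. */
int demo_continuation(int callee_yields) {
  mini_thread T;
  T.k = NULL; T.ctx = 0; T.suspended = 0; T.log = 0;
  int status = mini_resume(&T, callee_yields);
  if (status == 0)                /* yielded: resume once more */
    status = mini_resume(&T, 0);
  return T.log * 10 + status;
}
```

demo_continuation(0) and demo_continuation(1) both return 141: whether finish_work is reached by falling through after mini_callk or by being called from mini_resume after the C frame was abandoned, the same logic runs with the same ctx.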

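Suspending and continuing

With Lua 5.2's approach to suspending and continuing threads understood, the related code becomes easier to read.

Interrupting and suspending a thread and hitting an error during thread execution look similar to the Lua VM: both use luaD_throw to return to the nearest protection point.

The difference lies in the thread status. A voluntary suspension goes through the API lua_yieldk, which saves the function currently being executed in the extra field of the CallInfo, sets the thread status to LUA_YIELD, and then throws.

Unlike an error, a yield can only be triggered by lua_yieldk, which is a C API rather than a Lua VM instruction. In other words, a yield always originates in a C function call and exits from luaD_call or luaD_pcall. There are far fewer such points than places where an error can occur.

Let us look at the implementation of lua_yieldk first.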

LUA_API int lua_yieldk (lua_State *L, int nresults, lua_KContext ctx,
lua_KFunction k) {
CallInfo *ci = L->ci;
luai_userstateyield(L, nresults);
lua_lock(L);
api_checknelems(L, nresults);
if (L->nny > 0) {
if (L != G(L)->mainthread)
luaG_runerror(L, "attempt to yield across a C-call boundary");
else
luaG_runerror(L, "attempt to yield from outside a coroutine");
}
L->status = LUA_YIELD;
ci->extra = savestack(L, ci->func);  /* save current 'func' */
if (isLua(ci)) {  /* inside a hook? */
api_check(k == NULL, "hooks cannot continue after yielding");
}
else {
if ((ci->u.c.k = k) != NULL)  /* is there a continuation? */
ci->u.c.ctx = ctx;  /* save context */
ci->func = L->top - nresults - 1;  /* protect stack below results */
luaD_throw(L, LUA_YIELD);
}
lua_assert(ci->callstatus & CIST_HOOKED);  /* must be inside a hook */
lua_unlock(L);
return 0;  /* return to 'luaD_hook' */
}

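Not every C function can be resumed properly. If even one such C function is present in the call chain, yield cannot work correctly. This is detected through the value of nny.

lua_yieldk is a public API intended only for C extension modules written for Lua programs, so while inside it, execution is always within a C function call. Hook functions are the exception. A hook is itself a C function, but it is not entered through the usual C API call path. When a hook is triggered inside a Lua function, the current state is considered to be in the middle of executing that Lua function. Yielding is allowed at that point, but a C-level continuation cannot be handled correctly, so passing a continuation function is forbidden. For a normal C call, a continuation k may be set to change the execution flow. Here it is enough to store k and ctx in the current CallInfo of L; everything else is left to resume.

lua_resume is much more complicated. The code is listed first and analyzed afterwards.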

LUA_API int lua_resume (lua_State *L, lua_State *from, int nargs) {
int status;
int oldnny = L->nny;  /* save "number of non-yieldable" calls */
lua_lock(L);
luai_userstateresume(L, nargs);
L->nCcalls = (from) ? from->nCcalls + 1 : 1;
L->nny = 0;  /* allow yields */
api_checknelems(L, (L->status == LUA_OK) ? nargs + 1 : nargs);
status = luaD_rawrunprotected(L, resume, L->top - nargs);
if (status == -1)  /* error calling 'lua_resume'? */
status = LUA_ERRRUN;
else {  /* continue running after recoverable errors */
while (errorstatus(status) && recover(L, status)) {
/* unroll continuation */
status = luaD_rawrunprotected(L, unroll, &status);
}
if (errorstatus(status)) {  /* unrecoverable error? */
L->status = cast_byte(status);  /* mark thread as 'dead' */
seterrorobj(L, status, L->top);  /* push error message */
L->ci->top = L->top;
}
else lua_assert(status == L->status);  /* normal end or yield */
}
L->nny = oldnny;  /* restore 'nny' */
L->nCcalls--;
lua_assert(L->nCcalls == ((from) ? from->nCcalls : 0));
lua_unlock(L);
return status;
}

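When lua_resume starts running, it is equivalent to a protected call, so the C functions it calls directly are allowed to yield. nny is set to 0 here to enable that. The first half of the work is then done through a protected call to the resume function.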

static void resume (lua_State *L, void *ud) {
int nCcalls = L->nCcalls;
StkId firstArg = cast(StkId, ud);
CallInfo *ci = L->ci;
if (nCcalls >= LUAI_MAXCCALLS)
resume_error(L, "C stack overflow", firstArg);
if (L->status == LUA_OK) {  /* may be starting a coroutine */
if (ci != &L->base_ci)  /* not in base level? */
resume_error(L, "cannot resume non-suspended coroutine", firstArg);
/* coroutine is in base level; start running it */
if (!luaD_precall(L, firstArg - 1, LUA_MULTRET))  /* Lua function? */
luaV_execute(L);  /* call it */
}
else if (L->status != LUA_YIELD)
resume_error(L, "cannot resume dead coroutine", firstArg);
else {  /* resuming from previous yield */
L->status = LUA_OK;  /* mark that it is running (again) */
ci->func = restorestack(L, ci->extra);
if (isLua(ci))  /* yielded inside a hook? */
luaV_execute(L);  /* just continue running Lua code */
else {  /* 'common' yield */
if (ci->u.c.k != NULL) {  /* does it have a continuation function? */
int n;
lua_unlock(L);
n = (*ci->u.c.k)(L, LUA_YIELD, ci->u.c.ctx); /* call continuation */
lua_lock(L);
api_checknelems(L, n);
firstArg = L->top - n;  /* yield results come from continuation */
}
luaD_poscall(L, firstArg);  /* finish 'luaD_precall' */
}
unroll(L, NULL);  /* run continuation */
}
lua_assert(nCcalls == L->nCcalls);
}

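If resume is starting a function afresh, it only has to follow the same normal call procedure as luaD_call.

If a previous call is being continued, then, as described above, the thread can only have been suspended by lua_yieldk from within a C call. A hook function is a special case: it is a C function but appears to be inside Lua. Restoring the function that was running from the extra field of the CallInfo makes it possible to recognize this case.

If it is a Lua call, the thread must have switched out from a hook function, and no Lua VM instruction has been interrupted, so bytecode interpretation simply continues through luaV_execute. If it is a C function, the continuation k is called as the continuation convention requires, and luaD_poscall then completes the call.

Finishing the steps above does not necessarily finish all the work, because the full call hierarchy is held in the CallInfo chain of L rather than on the current C call stack. If the Lua call stack is found to hold unfinished work, that work has to be completed. This is done by the unroll function.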

static void unroll (lua_State *L, void *ud) {
if (ud != NULL)  /* error status? */
finishCcall(L, *(int *)ud);  /* finish 'lua_pcallk' callee */
while (L->ci != &L->base_ci) {  /* something in the stack */
if (!isLua(L->ci))  /* C function? */
finishCcall(L, LUA_YIELD);  /* complete its execution */
else {  /* Lua function */
luaV_finishOp(L);  /* finish interrupted instruction */
luaV_execute(L);  /* execute down to higher C 'boundary' */
}
}
}

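When unroll finds that the current function in L is a Lua function, bytecode interpretation may itself have been interrupted midway, for example because a metamethod was triggered and luaD_call was invoked. So luaV_finishOp has to be called first, and luaV_execute then restarts the VM to run the unfinished bytecode.

When execution was interrupted inside a C function call, finishCcall completes the remaining work of the half-executed C function.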

static void finishCcall (lua_State *L, int status) {
CallInfo *ci = L->ci;
int n;
/* must have a continuation and must be able to call it */
lua_assert(ci->u.c.k != NULL && L->nny == 0);
/* error status can only happen in a protected call */
lua_assert((ci->callstatus & CIST_YPCALL) || status == LUA_YIELD);
if (ci->callstatus & CIST_YPCALL) {  /* was inside a pcall? */
ci->callstatus &= ~CIST_YPCALL;  /* finish 'lua_pcall' */
L->errfunc = ci->u.c.old_errfunc;
}
/* finish 'lua_callk'/'lua_pcall'; CIST_YPCALL and 'errfunc' already
handled */
adjustresults(L, ci->nresults);
/* call continuation function */
lua_unlock(L);
n = (*ci->u.c.k)(L, status, ci->u.c.ctx);
lua_lock(L);
api_checknelems(L, n);
/* finish 'luaD_precall' */
luaD_poscall(L, L->top - n);
}

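As mentioned earlier, the thread is in a healthy state at this point, so the earlier work must have stopped in lua_callk or lua_pcallk. The work lua_callk left unfinished has to be completed first (after calling luaD_call, lua_callk and lua_pcallk run the same code, adjustresults(L, ci->nresults), so they can be treated alike). Then the continuation function set by the C function is called. Since this is an unfinished C function call, it must stem from an interrupted luaD_precall, and the only cleanup left is luaD_poscall(L, L->top - n).

When this first half of resume completes, the outcome is either a clean finish with status LUA_OK or a voluntary suspension with LUA_YIELD. In both cases little work remains and the state of L is entirely normal. When an error occurs in the code, things are more complicated.

By definition, lua_resume must be able to catch errors. lua_pcallk has the same ability. If a lua_pcallk on the call stack catches the error before lua_resume does, execution should pass to what follows lua_pcallk, that is, to the continuation function set by lua_pcallk.

From the point of view of lua_resume, the error has been caught by lua_pcallk and the program should keep running. lua_resume is therefore responsible for honoring the continuation convention. This is done with the recover and unroll functions.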

static int recover (lua_State *L, int status) {
StkId oldtop;
CallInfo *ci = findpcall(L);
if (ci == NULL) return 0;  /* no recovery point */
/* "finish" luaD_pcall */
oldtop = restorestack(L, ci->extra);
luaF_close(L, oldtop);
seterrorobj(L, status, oldtop);
L->ci = ci;
L->allowhook = getoah(ci->callstatus);  /* restore original 'allowhook' */
L->nny = 0;  /* should be zero to be yieldable */
luaD_shrinkstack(L);
L->errfunc = ci->u.c.old_errfunc;
return 1;  /* continue running the coroutine */
}

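recover directs the error to the continuation of the nearest lua_pcallk on the call stack.

It first walks back through the CallInfo stack to find the place where lua_pcallk was called from C. That lua_pcallk was interrupted partway through its protected call.

Next, the work that luaD_pcall should have done but never got the chance to do has to be completed. That is why the code that follows looks very similar to the second half of luaD_pcall.

Finally, the error status has to reach the continuation. In the 5.3 code shown here, lua_resume passes a pointer to status as the ud argument of a protected call to unroll, and unroll hands it to finishCcall. The protected unroll then runs the logically following steps according to the Lua call stack.

Back in lua_resume, the from parameter is used to count the depth of the C call stack more accurately. The purpose of nCcalls is to let the Lua VM detect infinite recursion and raise an error safely before a C stack overflow causes a catastrophic failure. Since a thread can now be switched out from inside a C function, the place that issues the resume may be logically deep in the call hierarchy. The caller therefore has to pass in the thread that the resume comes from so that nCcalls is computed correctly.

lua_callk and lua_pcallk

With the groundwork above, the public APIs lua_callk and lua_pcallk are easy to understand.

lua_callk is just a thin wrapper around luaD_call. Before the call, it stores the continuation k and ctx in the current CallInfo structure when needed.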

LUA_API void lua_callk (lua_State *L, int nargs, int nresults,
lua_KContext ctx, lua_KFunction k) {
StkId func;
lua_lock(L);
api_check(k == NULL || !isLua(L->ci),
"cannot use continuations inside hooks");
api_checknelems(L, nargs+1);
api_check(L->status == LUA_OK, "cannot do calls on non-normal thread");
checkresults(L, nargs, nresults);
func = L->top - (nargs+1);
if (k != NULL && L->nny == 0) {  /* need to prepare continuation? */
L->ci->u.c.k = k;  /* save continuation */
L->ci->u.c.ctx = ctx;  /* save context */
luaD_call(L, func, nresults, 1);  /* do the call */
}
else  /* no continuation or no yieldable */
luaD_call(L, func, nresults, 0);  /* just do the call */
adjustresults(L, nresults);
lua_unlock(L);
}

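lua_pcallk is similar: it is just a thin wrapper around luaD_pcall.

If no continuation is needed, or the thread is in a state where it cannot be suspended, a plain call to luaD_pcall is enough. Otherwise no protection point can be set up; instead the continuation and ctx are set before the call and the CallInfo is marked with CIST_YPCALL, so that the recover function can find it during resume.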

LUA_API int lua_pcallk (lua_State *L, int nargs, int nresults, int errfunc,
lua_KContext ctx, lua_KFunction k) {
struct CallS c;
int status;
ptrdiff_t func;
lua_lock(L);
api_check(k == NULL || !isLua(L->ci),
"cannot use continuations inside hooks");
api_checknelems(L, nargs+1);
api_check(L->status == LUA_OK, "cannot do calls on non-normal thread");
checkresults(L, nargs, nresults);
if (errfunc == 0)
func = 0;
else {
StkId o = index2addr(L, errfunc);
api_checkstackindex(errfunc, o);
func = savestack(L, o);
}
c.func = L->top - (nargs+1);  /* function to be called */
if (k == NULL || L->nny > 0) {  /* no continuation or no yieldable? */
c.nresults = nresults;  /* do a 'conventional' protected call */
status = luaD_pcall(L, f_call, &c, savestack(L, c.func), func);
}
else {  /* prepare continuation (call is already protected by 'resume') */
CallInfo *ci = L->ci;
ci->u.c.k = k;  /* save continuation */
ci->u.c.ctx = ctx;  /* save context */
/* save information for error recovery */
ci->extra = savestack(L, c.func);
ci->u.c.old_errfunc = L->errfunc;
L->errfunc = func;
setoah(ci->callstatus, L->allowhook);  /* save value of 'allowhook' */
ci->callstatus |= CIST_YPCALL;  /* function can do error recovery */
luaD_call(L, c.func, nresults, 1);  /* do the call */
ci->callstatus &= ~CIST_YPCALL;
L->errfunc = ci->u.c.old_errfunc;
status = LUA_OK;  /* if it is here, there were no errors */
}
adjustresults(L, nresults);
lua_unlock(L);
return status;
}

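Exception handling

Lua's internal runtime errors, the ones with error code LUA_ERRRUN, are all raised directly or indirectly by luaG_errormsg.

By Lua's convention, such an error leaves an error message on the data stack, or calls a user-defined error handling function.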

l_noret luaG_errormsg (lua_State *L) {
if (L->errfunc != 0) {  /* is there an error handling function? */
StkId errfunc = restorestack(L, L->errfunc);
setobjs2s(L, L->top, L->top - 1);  /* move argument */
setobjs2s(L, L->top - 1, errfunc);  /* push function */
L->top++;  /* assume EXTRA_STACK */
luaD_call(L, L->top - 2, 1, 0);  /* call it */
}
luaD_throw(L, LUA_ERRRUN);
}

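It tries to read errfunc from L and calls it through luaD_call. If another error occurs inside errfunc, luaG_errormsg is entered again, so the error handler can recurse. Once the calls reach a certain depth, nCcalls exceeds its limit and a LUA_ERRERR is finally produced, which ends the process.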
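The two thresholds that stop this recursion are the ones tested at the top of luaD_call. The sketch below restates that test as a pure function. MINI_MAXCCALLS is a hypothetical constant set to 200 purely for illustration (the real limit is whatever LUAI_MAXCCALLS is configured to), and the CALL_ enumerators are made-up labels for the three outcomes.

```c
#include <assert.h>

enum { MINI_MAXCCALLS = 200 };  /* assumed limit, for illustration only */
enum { CALL_OK, CALL_OVERFLOW, CALL_ERRERR };

/* Mirrors the check at the top of luaD_call; nCcalls is the counter value
   after it has been incremented for the new call. */
int classify_depth(int nCcalls) {
  assert(nCcalls >= 0);
  if (nCcalls >= MINI_MAXCCALLS) {
    if (nCcalls == MINI_MAXCCALLS)
      return CALL_OVERFLOW;  /* ordinary error: "C stack overflow" */
    else if (nCcalls >= MINI_MAXCCALLS + (MINI_MAXCCALLS >> 3))
      return CALL_ERRERR;    /* error while handling the overflow error */
  }
  return CALL_OK;
}
```

With the assumed limit, depth 200 raises the ordinary overflow error, depths 201 to 224 are tolerated so that the error handler has room to run, and depth 225 gives up with the equivalent of LUA_ERRERR.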

The public API lua_error is a thin wrapper around it. luaL_error, listed below, formats a message and then calls lua_error.

LUALIB_API int luaL_error (lua_State *L, const char *fmt, ...) {
va_list argp;
va_start(argp, fmt);
luaL_where(L, 1);
lua_pushvfstring(L, fmt, argp);
va_end(argp);
lua_concat(L, 2);
return lua_error(L);
}
