
Read-copy-update

From Wikipedia, the free encyclopedia



In computer operating systems, read-copy-update (RCU)
is a synchronization mechanism implementing a kind of mutual
exclusion[note 1] which
can sometimes be used as an alternative to a readers-writer lock. It allows
extremely low overhead, wait-free reads. However, RCU updates
can be expensive, as they must leave the old versions of the data structure in place to accommodate pre-existing readers. These old versions are reclaimed after all pre-existing readers finish their accesses.


Contents

1 Overview
2 Uses
3 Advantages and disadvantages
4 Patents
5 Sample RCU interface
6 Simple implementation
7 Analogy with reader-writer locking
8 Name
9 History
10 See also
11 Notes
12 References
13 External links


Overview

A key property of RCU is that readers can access a data structure even when it is in the process of being updated: There is absolutely nothing that RCU updaters can do to block readers or even
to force them to retry their accesses. Some people find this property to be surprising and even counter-intuitive, but this property is absolutely essential to RCU. This overview therefore starts by showing how data can be safely inserted into and deleted
from linked structures despite concurrent readers. The following diagram depicts a four-state insertion procedure, with time advancing from left to right:

[Diagram: four-state insertion procedure]
The first state shows a global pointer named "gptr" that is initially NULL, colored red to indicate that it might be accessed by a reader at any time, thus requiring updaters to take care.
Allocating memory for a new structure transitions to the second state. This structure has indeterminate state (indicated by the question marks) but is inaccessible to readers (indicated by the green color). Because the structure is inaccessible to readers,
the updater may carry out any desired operation without fear of disrupting concurrent readers. Initializing this new structure transitions to the third state, which shows the initialized values of the structure's fields. Assigning a reference to this new structure
to gptr transitions to the fourth and final state. In this state, the structure is accessible to readers, and is therefore colored red. The rcu_assign_pointer() primitive is used to carry out this assignment, and ensures that the assignment is atomic in the
sense that concurrent readers will either see a NULL pointer or a valid pointer to the new structure, but not some mash-up of the two values. Additional properties of rcu_assign_pointer() are described later in this article.
This procedure demonstrates how new data may be inserted into a linked data structure even though readers are concurrently traversing the data structure before, during, and after the insertion.
The following diagram depicts a four-state deletion procedure, again with time advancing from left to right:

[Diagram: four-state deletion procedure]
The first state shows a linked list containing elements A, B, and C. All three elements are colored red to indicate that an RCU reader might reference any of them at any time. Using list_del_rcu()
to remove element B from this list transitions to the second state. Note that the link from element B to C is left intact in order to allow readers currently referencing element B to traverse the remainder of the list. Readers accessing the link from element
A will either obtain a reference to element B or element C, but either way, each reader will see a valid and correctly formatted linked list. Element B is now colored yellow to indicate that while pre-existing readers might still have a reference to element
B, new readers have no way to obtain a reference. A wait-for-readers operation transitions to the third state. Note that this wait-for-readers operation need only wait for pre-existing readers, but not new readers. Element B is now colored green to indicate
that readers can no longer be referencing it. Therefore, it is now safe for the updater to free element B, thus transitioning to the fourth and final state.
It is important to reiterate that in the second state different readers can see two different versions of the list, either with or without element B. In other words, RCU provides coordination
in space (different versions of the list) as well as in time (different states in the deletion procedures). This is in stark contrast with more traditional synchronization primitives such as locking or transactions that
coordinate in time, but not in space. RCU's use of both space and time allows exceedingly fast and scalable readers.
This procedure demonstrates how old data may be removed from a linked data structure even though readers are concurrently traversing the data structure before, during, and after the deletion.
Given insertion and deletion, a wide variety of data structures can be implemented using RCU.
RCU's readers execute within read-side critical sections, which are normally delimited by rcu_read_lock() and rcu_read_unlock(). Any statement that is not within an RCU read-side critical
section is said to be in a quiescent state, and such statements are not permitted to hold references to RCU-protected data structures, nor is the wait-for-readers operation required to wait for threads in quiescent states. Any time period during which
each thread resides at least once in a quiescent state is called a grace period. By definition, any RCU read-side critical section in existence at the beginning of a given grace period must complete before the end of that grace period, which constitutes
the fundamental guarantee provided by RCU. In addition, the wait-for-readers operation must wait for at least one grace period to elapse. It turns out that this guarantee can be provided with extremely small read-side overheads; in fact, in the limiting case that is actually realized by server-class Linux-kernel builds, the read-side overhead is exactly zero.[1]
RCU's fundamental guarantee may be used by splitting updates into removal and reclamation phases. The removal phase removes references to data items within a data structure
(possibly by replacing them with references to new versions of these data items), and can run concurrently with RCU read-side critical sections. The reason that it is safe to run the removal phase concurrently with RCU readers is the semantics of modern CPUs
guarantee that readers will see either the old or the new version of the data structure rather than a partially updated reference. Once a grace period has elapsed, there can no longer be any readers referencing the old version, so it is then safe for the reclamation
phase to free (reclaim) the data items that made up that old version.[2]
Splitting an update into removal and reclamation phases allows the updater to perform the removal phase immediately, and to defer the reclamation phase until all readers active during the removal
phase have completed, in other words, until a grace period has elapsed.[note 2]
So the typical RCU update sequence goes something like the following:[3]

Ensure that all readers accessing RCU-protected data structures carry out their references from within an RCU read-side critical section.
Remove pointers to a data structure, so that subsequent readers cannot gain a reference to it.
Wait for a grace period to elapse, so that all previous readers (which might still have pointers to the data structure removed in the prior step) will have completed their RCU read-side critical sections.
At this point, there cannot be any readers still holding references to the data structure, so it now may safely be reclaimed (e.g., freed).[note 3]

In the above procedure (which matches the earlier diagram), the updater performs both the removal and the reclamation step, but it is often helpful for an entirely different thread to do the reclamation. Reference counting can be used to let readers perform removal, so even if the same thread performs both the update step (step (2) above) and the reclamation step (step (4) above), it is often helpful to think of them separately.


Uses

As of early 2008, there were almost 2,000 uses of the RCU API within the Linux kernel,[4] including the networking protocol stacks[5] and the memory-management system.[6] As of 2011, there were more than 5,000 uses.[7] Since 2006, researchers have applied RCU and similar techniques to a number of problems, including management of metadata used in dynamic analysis,[8] managing the lifetime of clustered objects,[9] managing object lifetime in the K42 research operating system,[10][11] and optimizing software transactional memory implementations.[12][13] DragonFly BSD uses a technique similar to RCU that most closely resembles Linux's Sleepable RCU (SRCU) implementation.


Advantages and disadvantages

The ability to wait until all readers are done allows RCU readers to use much lighter-weight synchronization—in some cases, absolutely no synchronization at all. In contrast, in more conventional
lock-based schemes, readers must use heavy-weight synchronization in order to prevent an updater from deleting the data structure out from under them. This is because lock-based updaters typically update data items in place, and must therefore exclude readers.
In contrast, RCU-based updaters typically take advantage of the fact that writes to single aligned pointers are atomic on modern CPUs, allowing atomic insertion, removal, and replacement of data items in a linked structure without disrupting readers. Concurrent
RCU readers can then continue accessing the old versions, and can dispense with the atomic read-modify-write instructions, memory barriers, and cache misses that are so expensive on modern SMP computer
systems, even in the absence of lock contention.[14][15] The lightweight nature of RCU's read-side primitives provides additional advantages beyond excellent performance, scalability, and real-time response. For example, they provide immunity to most deadlock and livelock conditions.[note 4]
Of course, RCU also has disadvantages. For example, RCU is a specialized technique that works best in situations with mostly reads and few updates, but is often less applicable to update-only
workloads. For another example, although the fact that RCU readers and updaters may execute concurrently is what enables the lightweight nature of RCU's read-side primitives, some algorithms may not be amenable to read/update concurrency.
Despite well over a decade of experience with RCU, the exact extent of its applicability is still a research topic.


Patents

The technique is covered by U.S. software patent 5,442,758, issued August 15, 1995, and assigned to Sequent Computer Systems, as well as by patents 5,608,893, 5,727,209, 6,219,690, and 6,886,162. The now-expired US Patent 4,809,168 covers a closely related technique. RCU is also the topic of one claim in the SCO v. IBM lawsuit.


Sample RCU interface

RCU is available in a number of operating systems, and was added to the Linux
kernel in October 2002. User-level implementations such as liburcu are also available.[16]
The implementation of RCU in version 2.6 of the Linux kernel is among the better-known RCU implementations, and will be used as an inspiration for the RCU API in the remainder of this article.
The core API (Application Programming Interface)
is quite small:[17]

rcu_read_lock(): Marks the beginning of an RCU read-side critical section; any RCU-protected data structure accessed within that critical section is guaranteed not to be reclaimed for its full duration.

rcu_read_unlock(): Used by a reader to inform the reclaimer that the reader is exiting an RCU read-side critical section. Note that RCU read-side critical sections may be nested and/or overlapping.

synchronize_rcu(): Blocks until all pre-existing RCU read-side critical sections on all CPUs have completed. Note that synchronize_rcu will not necessarily wait for any subsequent RCU read-side critical sections to complete. For example, consider the following sequence of events:

         CPU 0                  CPU 1                  CPU 2
    -----------------      ------------------------  -----------------
 1.  rcu_read_lock()
 2.                        enters synchronize_rcu()
 3.                                                  rcu_read_lock()
 4.  rcu_read_unlock()
 5.                        exits synchronize_rcu()
 6.                                                  rcu_read_unlock()


Since synchronize_rcu is the API that must figure out when readers are done, its implementation is key to RCU. For RCU to be useful in all but the most read-intensive situations, synchronize_rcu's overhead must also be quite small.

Alternatively, instead of blocking, synchronize_rcu may register a callback to be invoked after all ongoing RCU read-side critical sections have completed. This callback variant is called call_rcu in the Linux kernel.

rcu_assign_pointer(): The updater uses this function to assign a new value to an RCU-protected pointer, in order to safely communicate the change in value from the updater to the reader. This function returns the new value, and
also executes any memory barrier instructions required for a given CPU architecture. Perhaps more importantly,
it serves to document which pointers are protected by RCU.

rcu_dereference(): The reader uses rcu_dereference to fetch an RCU-protected pointer, which returns a value that may then be safely dereferenced. It also executes any directives required by the compiler or the CPU, for example, a volatile cast for gcc, a memory_order_consume load for C/C++11, or the memory-barrier instruction required by the old DEC Alpha CPU. The value returned by rcu_dereference is valid only within the enclosing RCU read-side critical section. As with rcu_assign_pointer, an important function of rcu_dereference is to document which pointers are protected by RCU.

The following diagram shows how each API communicates among the reader, updater, and reclaimer.

[Diagram: RCU API communication among reader, updater, and reclaimer]
The RCU infrastructure observes the time sequence of rcu_read_lock, rcu_read_unlock, synchronize_rcu, and call_rcu invocations in order to determine when (1) synchronize_rcu invocations may return to their callers and (2) call_rcu callbacks may be invoked. Efficient implementations of the RCU infrastructure make heavy use of batching in order to amortize their overhead over many uses of the corresponding APIs.


Simple implementation

RCU has extremely simple "toy" implementations that can aid understanding of RCU. This section presents one such "toy" implementation that works in a non-preemptive environment.[18]
void rcu_read_lock(void) { }

void rcu_read_unlock(void) { }

void call_rcu(void (*callback)(void *), void *arg)
{
        /* add callback/arg pair to a list */
}

void synchronize_rcu(void)
{
        int cpu;

        for_each_cpu(cpu)
                schedule_current_task_to(cpu);

        /* then, for each entry on the call_rcu list:
                entry->callback(entry->arg); */
}

You can ignore rcu_assign_pointer and rcu_dereference without missing much. But here they are anyway.
#define rcu_assign_pointer(p, v)        ({ \
        smp_wmb(); \
        ACCESS_ONCE(p) = (v); \
})

#define rcu_dereference(p)              ({ \
        typeof(p) _value = ACCESS_ONCE(p); \
        smp_read_barrier_depends(); /* nop on most architectures */ \
        (_value); \
})

Note that rcu_read_lock and rcu_read_unlock do absolutely nothing. This is the great strength of classic RCU in a non-preemptive kernel: read-side overhead is precisely zero, as smp_read_barrier_depends() is an empty macro on all but DEC Alpha CPUs;[19] such memory barriers are not needed on modern CPUs. The ACCESS_ONCE() macro is a volatile cast that generates no additional code in most cases. And there is absolutely no way that rcu_read_lock can participate in a deadlock cycle, cause a realtime process to miss its scheduling deadline, precipitate priority inversion, or result in high lock contention. However, in this toy RCU implementation, blocking within an RCU read-side critical section is illegal, just as is blocking while holding a pure spinlock.
The implementation of synchronize_rcu moves the caller of synchronize_rcu to each CPU in turn, thus blocking until all CPUs have been able to perform a context switch. Recall that this is a non-preemptive environment and that blocking within an RCU read-side critical section is illegal, which together imply that there can be no preemption points within an RCU read-side critical section. Therefore, if a given CPU executes a context switch (to schedule another process), we know that this CPU must have completed all preceding RCU read-side critical sections. Once all CPUs have executed a context switch, all preceding RCU read-side critical sections will have completed.


Analogy with reader-writer locking

Although RCU can be used in many different ways, a very common use of RCU is analogous to reader-writer locking. The following side-by-side code display shows how closely related reader-writer
locking (on the left) and RCU (on the right) can be.[20]
1 struct el {                           1 struct el {
2   struct list_head lp;                2   struct list_head lp;
3   long key;                           3   long key;
4   spinlock_t mutex;                   4   spinlock_t mutex;
5   int data;                           5   int data;
6   /* Other data fields */             6   /* Other data fields */
7 };                                    7 };
8 DEFINE_RWLOCK(listmutex);             8 DEFINE_SPINLOCK(listmutex);
9 LIST_HEAD(head);                      9 LIST_HEAD(head);

1 int search(long key, int *result)     1 int search(long key, int *result)
2 {                                     2 {
3   struct el *p;                       3   struct el *p;
4                                       4
5   read_lock(&listmutex);              5   rcu_read_lock();
6   list_for_each_entry(p, &head, lp) { 6   list_for_each_entry_rcu(p, &head, lp) {
7     if (p->key == key) {              7     if (p->key == key) {
8       *result = p->data;              8       *result = p->data;
9       read_unlock(&listmutex);        9       rcu_read_unlock();
10       return 1;                      10       return 1;
11     }                                11     }
12   }                                  12   }
13   read_unlock(&listmutex);           13   rcu_read_unlock();
14   return 0;                          14   return 0;
15 }                                    15 }

1 int delete(long key)                  1 int delete(long key)
2 {                                     2 {
3   struct el *p;                       3   struct el *p;
4                                       4
5   write_lock(&listmutex);             5   spin_lock(&listmutex);
6   list_for_each_entry(p, &head, lp) { 6   list_for_each_entry(p, &head, lp) {
7     if (p->key == key) {              7     if (p->key == key) {
8       list_del(&p->lp);               8       list_del_rcu(&p->lp);
9       write_unlock(&listmutex);       9       spin_unlock(&listmutex);
10       synchronize_rcu();
10       kfree(p);                      11       kfree(p);
11       return 1;                      12       return 1;
12     }                                13     }
13   }                                  14   }
14   write_unlock(&listmutex);          15   spin_unlock(&listmutex);
15   return 0;                          16   return 0;
16 }                                    17 }

The differences between the two approaches are quite small. Read-side locking moves to rcu_read_lock and rcu_read_unlock, update-side locking moves from a reader-writer lock to a simple spinlock, and a synchronize_rcu precedes the kfree.
However, there is one potential catch: the read-side and update-side critical sections can now run concurrently. In many cases, this will not be a problem, but it is necessary to check carefully
regardless. For example, if multiple independent list updates must be seen as a single atomic update, converting to RCU will require special care.
Also, the presence of synchronize_rcu means that the RCU version of delete can now block. If this is a problem, call_rcu could be used like call_rcu(kfree, p) in place of synchronize_rcu. This is especially useful in combination with reference counting.


Name

The name comes from the way that RCU is used to update a linked structure in place. A thread wishing to do this uses the following steps:

create a new structure,
copy the data from the old structure into the new one, and save a pointer to the old structure,
modify the new, copied structure,
update the global pointer to refer to the new structure, and then
sleep until the operating system kernel determines that there are no readers left using the old structure, for example, in the Linux kernel, by using synchronize_rcu().

When the thread which made the copy is awakened by the kernel, it can safely deallocate the old structure.
So the structure is read concurrently with a thread copying in order to do an update, hence the name "read-copy update". The abbreviation "RCU" was one of many contributions by the Linux community. Other names for similar techniques include passive serialization and MP defer by VM/XA programmers and generations by K42 and Tornado programmers.


History

Techniques and mechanisms resembling RCU have been independently invented multiple times:[21]

H. T. Kung and Q. Lehman described use of garbage collectors to implement RCU-like access to a binary search tree.[22]
Udi Manber and Richard Ladner extended Kung's and Lehman's work to non-garbage-collected environments by deferring reclamation until all threads running at removal time have terminated, which works in environments that do not have long-lived threads.[23]
Richard Rashid et al. described a lazy translation lookaside buffer (TLB) implementation that deferred reclaiming virtual-address space until all CPUs had flushed their TLBs, which is similar in spirit to some RCU implementations.[24]
J. Hennessy, D. Osisek, and J. Seigh were granted US Patent 4,809,168 in 1989 (since lapsed). This patent describes an RCU-like mechanism that was apparently used in VM/XA on IBM mainframes.[25]
William Pugh described an RCU-like mechanism that relied on explicit flag-setting by readers.[26]
Aju John proposed an RCU-like implementation where updaters simply wait for a fixed period of time, under the assumption that readers would all complete within that fixed time, as might be appropriate in a hard real-time system.[27] Van Jacobson proposed a similar scheme in 1993 (verbal communication).
J. Slingwine and P. E. McKenney received US Patent 5,442,758 in August 1995, which describes RCU as implemented in DYNIX/ptx and later in the Linux kernel.[28]
B. Gamsa, O. Krieger, J. Appavoo, and M. Stumm described an RCU-like mechanism used in the University of Toronto Tornado research operating system and the closely related IBM Research K42 research operating system.[29]
Rusty Russell and Phil Rumpf described RCU-like techniques for handling unloading of Linux kernel modules.[30][31]
D. Sarma added RCU to version 2.5.43 of the Linux kernel in October 2002.
Robert Colvin et al. formally verified a lazy concurrent list-based set algorithm that resembles RCU.[32]
M. Desnoyers et al. published a description of user-space RCU.[33]
A. Gotsman et al. derived formal semantics for RCU based on separation logic.

PS: This can also be implemented using std::shared_ptr.