@kiraSally
2018-03-12T19:01:53.000000Z
字数 22728
阅读 3912
并发
- 本番主要是笔者对(笔者认为的)论文精华部分的翻译提取,因此是篇转译文
- Doug Lea The java.util.concurrent Synchronizer Framework
- 由于排版需要,笔者对原著部分篇章做了一些转移(但不会影响阅读),若有不足异议之处强烈欢迎指点!
Java release J2SE-1.5 introduces pacakge java.util.concurrent,a collection of medium-level concurrent support classes created via JCP JSR 166.Among these components are a set of synchronizers - ADT classes that maintain an internal synchronization state (for example,representing whether a lock is locked or unlocked),operations to update and inspect that state,and at least one method that will cause a calling thread block if the state requires it,resuming when some other thread changes the synchronization state to permit it.Examples include various forms of mutual exclusion locks,read-write locks,semaphores,barriers,futures,event indicators,and handoff queues.
并发包起源: 基于JCP的JSR166规范,JDK1.5引入j.u.c包,其提供了一系列支持中等程度并发的类
核心组件: 并发包的核心组件是一系列的同步器(抽象数据类型(ADT))
实例: 不同形式的互斥排它锁、读写锁、信号量、屏障、Futures、事件指示器以及传送队列
JSR166 establishes a small framework centered on class AbstractQueuedSynchronizer,that provoides common mechanics that are used by most of the provided synchcronizers in the pacakge,as well as other classes that users may define themseleves.
AQS核心价值: JSR166构建了AQS框架,其为构造同步器提供一种通用的机制,并且被j.u.c包中大部分类使用,同时用户也可以通过其定义自己的同步器
Synchronizers possess two kinds of methods : at least one acquire operation that blocks the calling thread unless/until the synchronization state allows it to proceed, and at least one release operation that changes synchronization state in a way that may allow one or more blocked threads to unblock.
1. acquire: 阻塞调用线程,除非或直到同步状态允许其继续执行
2. release: 通过某种方式改变同步状态,允许一个或多个(被aquire)阻塞的线程继续执行
The java.util.concurrent package does not define a single unified API for synchronizers. Some are defined via common interfaces (e.g., Lock), but others contain only specialized versions. So, acquire and release operations take a range of names and forms across different classes. For example, methods Lock.lock,Semaphore.acquire,CountDownLatch.await,and FutureTask.get all map to acquire operations in the framework.
j.u.c包中同步器并没有统一的API定义: 一些类遵循通用接口定义(如Lock),另外一些则定义其专属版本,因此acquire和release操作的名称和形式在不同类中定义或有不同
以下方法都属于acquire操作: Lock.lock,Semaphore.acquire,CountDownLatch.await和FutureTask.get
However, the package does maintain consistent conventions across classes to support a range of common usage options.When meaningful, each synchronizer supports:
• Nonblocking synchronization attempts (for example, tryLock) as well as blocking versions.
• Optional timeouts, so applications can give up waiting.
• Cancellability via interruption, usually separated into one version of acquire that is cancellable, and one that isn't.
Synchronizers may vary according to whether they manage only exclusive states – those in which only one thread at a time may continue past a possible blocking point – versus possible shared states in which multiple threads can at least sometimes proceed. Regular lock classes of course maintain only exclusive state, but counting semaphores, for example, may be acquired by as many threads as the count permits. To be widely useful, the framework must support both modes of operation.
适用性:为了框架更具适用性,这两种操作模式都必须支持
The java.util.concurrent package also defines interface Condition, supporting monitor-style await/signal operations that may be associated with exclusive Lock classes, and whose implementations are intrinsically intertwined with their associated Lock classes.
管程形式: j.u.c包通过定义Condition接口提供管程形式的await/signal操作
Lock关联: await/signal操作与其关联的独占模式Lock类紧密相关
The primary performance goal here is scalability: to predictably maintain efficiency even, or especially, when synchronizers are contended. Ideally, the overhead required to pass a synchronization point should be constant no matter how many threads are trying to do so. Among the main goals is to minimize the total amount of time during which some thread is permitted to pass a synchronization point but has not done so. However, this must be balanced against resource considerations, including total CPU time requirements, memory traffic, and thread scheduling overhead. For example, spinlocks usually provide shorter acquisition times than blocking locks, but usually waste cycles and generate memory contention, so are not often applicable.
These goals carry across two general styles of use. Most applications should maximize aggregate throughput, tolerating, at best, probabilistic guarantees about lack of starvation. However in applications such as resource control, it is far more important to maintain fairness of access across threads, tolerating poor aggregate throughput. No framework can decide between these conflicting goals on behalf of users; instead different fairness policies must be accommodated.
框架本身不能决定用户的选择:相反框架必须同时适应这两种公平策略,如ReentrantLock
中的FairSync
和NonfairSync
No matter how well-crafted they are internally, synchronizers will create performance bottlenecks in some applications. Thus, the framework must make it possible to monitor and inspect basic operations to allow users to discover and alleviate bottlenecks. This minimally (and most usefully) entails providing a way to determine how many threads are blocked.
问题:无论同步器内部设计多么精巧,其在某些应用中还是会产生性能瓶颈
解决方案: 框架必须提供提供让用户能发现和缓解瓶颈的监听和检测方案,因此至少需要提供一种方式来确定有多少线程被阻塞,比如通过阻塞队列记录阻塞线程数
//阻塞
//循环判断同步状态是否可取
while (synchronization state does not allow acquire) {
//不可取时,线程入队列(若尚未进入队列)
enqueue current thread if not already queued;
//不可取时,将当前调用线程阻塞
//这里补充一点,线程阻塞次数应该不受限,即释放线程应与调用阻塞次数无关
possibly block current thread;
}
dequeue current thread if it was queued;
//解除阻塞
//更新同步状态
update synchronization state;
//判断同步状态是否可取
if (state may permit a blocked thread to acquire)
//状态可取,从阻塞队列中释放一个或多个阻塞线程
unblock one or more queued threads;
acquire和release方法的具体实现: 请参看 3.4.5 acquire和release方法实现的一般形式
Support for these operations requires the coordination of three basic components:
•Atomically managing synchronization state
•Blocking and unblocking threads
•Maintaining queues
The central design decision in the synchronizer framework was to choose a concrete implementation of each of these three components, while still permitting a wide range of options in how they are used. This intentionally limits the range of applicability, but provides efficient enough support that there is practically never a reason not to use the framework (and instead build synchronizers from scratch) in those cases where it does apply.
核心决策:同步器框架的核心决策是为这三个组件选择一个具体实现,同时在使用方式上提供大量可用选项
拿来主义:虽然有意限制了其适用范围,但提供了足够的效率,因此可以不用重复"造轮子"直接"拿来主义"即可
Class AbstractQueuedSynchronizer maintains synchro-nization state using only a single (32bit) int, and exports getState, setState, and compareAndSetState operations to access and update this state. These methods in turn rely on java.util.concurrent.atomic support providing JSR133 (Java Memory Model) compliant volatile semantics on reads and writes, and access to native compare-and-swap or load-linked/store-conditional instructions to implement compare-AndSetState , that atomically sets state to a given new value only if it holds a given expected value.
保存同步状态: AQS类通过单个int(32位)来保存同步状态
操作同步状态: AQS类提供getState
、setState
和compareAndSetState
方法来读取和更新该状态
实现原理: 同步状态的读取和更新依赖于atomic包,原理变量类提供了兼容JSR133(JMM)的volatile的内存读写语义,并提供本地的compare-and-swap
或load-linked/store-conditional
指令来实现compareAndSetState
,使得仅当同步状态持有一个期望值时,才能够被原子赋新值
Restricting synchronization state to a 32bit int was a pragmatic decision. While JSR166 also provides atomic operations on 64bit long fields, these must still be emulated using internal locks on enough platforms that the resulting synchronizers would not perform well. In the future, it seems likely that a second base class, specialized for use with 64bit state (i.e., with long control arguments), will be added. However, there is not now a compelling reason to include it in the package. Currently, 32 bits suffice for most applications. Only one java.util.concurrent synchronizer class, CyclicBarrier, would require more bits to maintain state, so instead uses locks (as do most higher-level utilities in the package).
32位考量:限制同步状态为32位是个实践性的决定
64位考量:尽管JSR166也提供了64位long字段的原子性操作,但由于多数平台仍是使用内部锁实现该操作,因此可能导致其性能表现不佳
CyclicBarrier:只有一个同步器CyclicBarrier
可能需要多于32位来维持状态,因此其使用了锁
补充: JDK1.6 已提供 AbstractQueuedLongSynchronizer
支持64位long用于维护同步状态
Concrete classes based on AbstractQueuedSynchronizer must define methods tryAcquire and tryRelease in terms of these exported state methods in order to control the acquire and release operations. The tryAcquire method must return true if synchronization was acquired, and the tryRelease method must return true if the new synchronization state may allow future acquires. These methods accept a single int argument that can be used to communicate desired state; for example in a reentrant lock, to re-establish the recursion count when re-acquiring the lock after returning from a condition wait. Many synchronizers do not need such an argument, and so just ignore it.
tryAcquire
和tryRelease
方法以控制acquire
和release
操作tryAcquire
方法必须返回truetryRelease
方法也必须返回truepackage includes a LockSupport class with methods that address this problem. Method LockSupport.park blocks the current thread unless or until a LockSupport.unpark has been issued. (Spurious wakeups are also permitted.) Calls to unpark are not "counted", so multiple unparks before a park only unblock a single park. Additionally, this applies per-thread, not per-synchronizer. A thread invoking park on a new synchronizer might return immediately because of a "leftover" unpark from a previous usage. However, in the absence of an unpark, its next invocation will block. While it would be possible to explicitly clear this state, it is not worth doing so. It is more efficient to invoke park multiple times when it happens to be necessary.
LockSupport
的park
和unpark
方法提供的线程阻塞和解除阻塞的功能注意:它们作用于每个线程而不是每个同步器,一个线程在一个新同步器上调用park操作可能立即返回,因为此前可能已执行多次unpark操作;然后下一次调用park会被阻塞当之前缺少一次unpark调用
注意:相比多次调用unpark方法,多次调用park方法会显得更有效
补充:更多内容请参见笔者的 并发番@LockSupport一文通(1.8版)
The heart of the framework is maintenance of queues of blocked threads, which are restricted here to FIFO queues. Thus, the framework does not support priority-based synchronization.
AQS框架关键: 维护被阻塞的线程队列 - 其必须是严格的先进先出队列,因此不支持基于优先级的同步
These days, there is little controversy that the most appropriate choices for synchronization queues are non-blocking data structures that do not themselves need to be constructed using lower-level locks. And of these, there are two main candidates: variants of Mellor-Crummey and Scott (MCS) locks , and variants of Craig, Landin, and Hagersten (CLH) locks.
同步队列的最佳选择: 最合适的是没有使用底层锁进行构建的非阻塞数据结构,如MCS和CLH锁
However, they appeared more amenable than MCS for use in the synchronizer framework because they are more easily adapted to handle cancellation and timeouts, so were chosen as a basis. The resulting design is far enough removed from the original CLH structure to require explanation.
选择CLH:因为CLH锁更容易地去实现取消和超时功能,但是最终的设计与原始CLH结构差异较大
A CLH queue is not very queue-like, because its enqueuing and dequeuing operations are intimately tied to its uses as a lock. It is a linked queue accessed via two atomically updatable fields, head and tail, both initially pointing to a dummy node.
锁用途: CLH队列并不像一个队列,因为它的入队和出队操作都与其作为锁的用途密切相关
链表结构: CLH是链表队列,通过head和tail字段进行原子更新并在初始化时都指向一个空节点(默认构造器)
操作支持: CLH锁队列支持入队、自旋和出队操作,当无法获取到锁时就自旋
//一个新节点的入队操作是个原子操作
do {
pred = tail;
} while(!tail.compareAndSet(pred, node));//遵循先进先出,新入的作为最后一个节点
//每一个节点的'释放'状态都存储于其前驱节点中,因此自旋锁的'自旋'操作如下:
while (pred.status != RELEASED); // spin 自旋,直到前驱节点被释放
//出队操作仅需在自旋后,将刚获得锁的节点指向head字段即可
head = node;
Among the advantages of CLH locks are that enqueuing and dequeuing are fast, lock-free, and obstruction free (even under contention, one thread will always win an insertion race so will make progress); that detecting whether any threads are waiting is also fast (just check if head is the same as tail); and that release status is decentralized, avoiding some memory contention.
In the original versions of CLH locks, there were not even links connecting nodes. In a spinlock, the pred variable can be held as a local. However, Scott and Scherer showed that by explicitly maintaining predecessor fields within nodes, CLH locks can deal with timeouts and other forms of cancellation: If a node's predecessor cancels, the node can slide up to use the previous node's status field.
补充: 原始的CLH锁中节点间并没有链接互联。自旋锁中,pred变量可以是一个局部变量。然而,Scott和Scherer证明通过在节点中显式地维护前驱节点,CLH锁就可以处理'超时'和各种形式的'取消':如果一个节点的前驱节点被取消,该节点就能继续往前并使用前一个节点的状态字段
The main additional modification needed to use CLH queues for blocking synchronizers is to provide an efficient way for one node to locate its successor. In spinlocks, a node need only change its status, which will be noticed on next spin by its successor, so links are unnecessary. But in a blocking synchronizer, a node needs to explicitly wake up (unpark) its successor.
改造点: 对CLH最主要的新增变化是提供一个有效的方式去定位后继节点以便支持阻塞同步器
自旋锁:在自旋锁中一旦节点状态发生变化,下次自旋中其后继节点就会注意该变化,因此节点互链非必要
阻塞同步器:但在阻塞同步器中,一个节点需要显式地唤醒(unpark)其后继节点
An AbstractQueuedSynchronizer queue node contains a next link to its successor. But because there are no applicable techniques for lock-free atomic insertion of double-linked list nodes using compareAndSet, this link is not atomically set as part of insertion.
实现: AQS队列节点通过next字段指向其后继节点
非原子操作: 由于没有针对双向链表节点的类似CAS的原子性无锁插入指令支持,因此next只是在节点插入后进行简单赋值,并非是原子操作
- it is simply assigned after the insertion :
//节点插入后,next只是简单赋值而并非原子操作
pred.next = node;
The next link is treated only as an optimized path. If a node's successor does not appear to exist (or appears to be cancelled) via its next field, it is always possible to start at the tail of the list and traverse backwards using the pred field to accurately check if there really is one.
注意: next只是一种优化手段 - 当节点的next不存在时,即其后继节点不存在(或看似被取消),总当通过某个节点的next字段发现其后继结点不存在(或看似被取消了),总能够通过从pred开始向后遍历精确得知其后续节点是否存在
A second set of modifications is to use the status field kept in each node for purposes of controlling blocking, not spinning.
改造点: 对CLH的第二个新增变化是每个节点都通过维护一个状态字段用于控制阻塞而非自旋
In the synchronizer framework, a queued thread can only return from an acquire operation if it passes the tryAcquire method defined in a concrete subclass; a single "released" bit does not suffice. But control is still needed to ensure that an active thread is only allowed to invoke tryAcquire when it is at the head of the queue; in which case it may fail to acquire, and (re)block. This does not require a per-node status flag because permission can be determined by checking that the current node's predecessor is the head. And unlike the case of spinlocks, there is not enough memory contention reading head to warrant replication. However, cancellation status must still be present in the status field.
保存多种状态:保存'释放'状态和'取消'状态
原因:在同步器框架中,线程仅在调用子类的tryAcquire方法返回true时才能从aquire操作中返回,单个'释放'位是不够的;但确保只有处于队列头部的活动线程才能调用tryAquire方法的控制是必要的;此时acquire可能会失败,然后(重新)阻塞。此时无须获取前一个节点的状态标识,检查当前节点的前驱节点是否是head即可;不同于自旋锁,读取head以保证复制时不会有太多的内存竞争;然而,'取消'状态必须保存在状态字段中
The queue node status field is also used to avoid needless calls to park and unpark. While these methods are relatively fast as blocking primitives go, they encounter avoidable overhead in the boundary crossing between Java and the JVM runtime and/or OS. Before invoking park, a thread sets a "signal me" bit, and then rechecks synchronization and node status once more before invoking park. A releasing thread clears status. This saves threads from needlessly attempting to block often enough to be worthwhile, especially for lock classes in which lost time waiting for the next eligible thread to acquire a lock accentuates other contention effects. This also avoids requiring a releasing thread to determine its successor unless the successor has set the signal bit, which in turn eliminates those cases where it must traverse multiple nodes to cope with an apparently null next field unless signalling occurs in conjunction with cancellation.
避免不必要调用:用于避免不必要的park和unpark调用
Perhaps the main difference between the variant of CLH locks used in the synchronizer framework and those employed in other languages is that garbage collection is relied on for managing storage reclamation of nodes, which avoids complexity and overhead. However, reliance on GC does still entail nulling of link fields when they are sure to never to be needed. This can normally be done when dequeuing. Otherwise, unused nodes would still be reachable, causing them to be uncollectable.
垃圾回收:与其他语言中的相比,同步框架中的改造CLH锁最主要区别可能是CLH锁需要依赖垃圾回收来管理节点的内存,从而避免了一些复杂性和开销
注意:但即使依赖GC也仍需将无用的链接设置为null,这通常与出队操作一起执行。否则,由于无用节点仍是可达的,因此无法被回收(有兴趣的读者可了解一下GC的变量可达性问题)
//基本的acquire操作的最终实现的一般形式如下(互斥,非中断,无超时):
//当尝试获取锁失败时,说明锁已被其他线程获取
if (!tryAcquire(arg)) {
//创建一个新节点并入队CLH
node = create and enqueue new node;
//获取前驱节点
pred = effective predecessor of the node;
//当前驱节点非头节点 或 再次尝试获取锁失败
while (pred is not head node || !tryAcquire(arg)) {
//前驱节点的状态位已被设置为true,直接阻塞当前线程
if (signal bit of the pred is set){
park();
} else {
//否则通过CAS设置前驱节点的状态位为true
compareAndSet signal bit of the pred to true;
//设置前驱节点为当前节点的有效前驱节点
pred = effective predecessor of the node;
}
}
//设置头节点
head = node;
}
//基本的release操作如下:
//尝试释放线程成功 同时 头节点状态位已被设置为true
if (tryRelease(arg) && signal bit of the head node is set) {
//CAS变更头节点状态位为true
compareAndSet signal bit of the head to false;
//若头节点的后继节点存在,解除其的阻塞
unpark successor of the head, if one exists
}
The number of iterations of the main acquire loop depends, of course, on the nature of tryAcquire . Otherwise, in the absence of cancellation, each component of acquire and release is a constant-time O(1) operation, amortized across threads, disregarding any OS thread scheduling occuring within park.
循环次数: acquire的循环次数依赖于具体实现类中tryAcquire的实现方式
时间复杂度: 在没有'取消'操作的情况下,(忽略park中发生的所有操作系统线程调度)每一个线程的acquire和release操作的时间复杂度都是O(1)
Cancellation support mainly entails checking for interrupt or timeout upon each return from park inside the acquire loop. A cancelled thread due to timeout or interrupt sets its node status and unparks its successor so it may reset links. With cancellation, determining predecessors and successors and resetting status may include O(n) traversals (where n is the length of the queue). Because a thread never again blocks for a cancelled operation, links and status fields tend to restabilize quickly.
支持取消操作原因: 主要是要在acquire循环里的park返回时检查中断或超时
取消策略:因超时或中断而被取消的线程会设置其节点状态然后unpark其后继节点以便重置链接
策略补充:由于操作被'取消',因此线程不会再次被阻塞,其链接和状态字段能快速重新稳定(即快速确定和设置)
时间复杂度: 在需要'取消'时,可能需要时间复杂度为O(n)的遍历(n=队列的长度)来决定其前驱节点和后继节点以及重置状态
The synchronizer framework provides a ConditionObject class for use by synchronizers that maintain exclusive synchronization and conform to the Lock interface. Any number of condition objects may be attached to a lock object, providing classic monitor-style await, signal, and signalAll operations, including those with timeouts, along with some inspection and monitoring methods.This class supports only Java- style monitor access rules in which condition operations are legal only when the lock owning the condition is held by the current thread.
起源: AQS框架提供一个ConditionObject类给同步器和Lock接口使用
锁对象: 一个锁对象可以关联任意多个ConditionObject对象
管程风格: ConditionObject提供管程风格的await、singal和singalAll操作,包括超时以及一些检测和监控方法
使用限制: 仅当当前线程持有锁且要操作的条件属于该锁时,条件操作才是合法的
A ConditionObject uses the same internal queue nodes as synchronizers, but maintains them on a separate condition queue.The signal operation is implemented as a queue transfer from the condition queue to the lock queue, without necessarily waking up the signalled thread before it has re-acquired its lock.The transfer operation simply unlinks the first node from the condition queue, and then uses CLH insertion to attach it to the lock queue.Because these operations are performed only when the lock is held, they can use sequential linked queue operations (using a nextWaiter field in nodes) to maintain the condition queue.
节点维护: ConditionObject使用与同步器一样的内部队列节点,但是在一个单独的条件队列中维护这些节点的
signal原理: 将节点从条件队列转移到锁队列中,因此无须在需要唤醒的线程重新获取到锁之前将其唤醒
节点转移: 转移操作仅仅把第一个节点从条件队列中的链接解除,然后通过CLH入队操作将其插入到锁队列
顺序链表: 因为只有在持有锁的时候才能执行条件队列操作,因此可以使用顺序链表队列操作来维护条件队列(在节点中使用一个nextWaiter字段)
- The basic await operation is:
//创建一个新节点并入队(条件队列)
create and add new node to condition queue;
//释放锁 -- 进入条件队列必须要释放锁,以让后续线程可以获取锁
release lock;
//阻塞当前线程直到对应节点处于锁队列(CLH) -- 当满足条件时,节点会从条件队列移出到锁队列
block until node is on lock queue;
//满足条件之后就可以重新获取锁
re-acquire lock;
- And the signal operation is:
//将条件队列的头节点移出到锁队列(CLH)
transfer the first node from condition queue to lock queue;
The main complication in implementing these operations is dealing with cancellation of condition waits due to timeouts or Thread.interrupt. A cancellation and signal occuring at approximately the same time encounter a race whose outcome conforms to the specifications for built-in monitors. As revised in JSR133, these require that if an interrupt occurs before a signal, then the await method must, after re-acquiring the lock, throw InterruptedException. But if it is interrupted after a signal, then the method must return without throwing an exception, but with its thread interrupt status set.
主要复杂度: 如何处理因超时或Thread.interrupt导致条件等待取消情况
竞态问题: '取消'和'唤醒'几乎同时发生就会有竞态问题,最终的结果遵循内置管程相关的规范
To maintain proper ordering, a bit in the queue node status records whether the node has been (or is in the process of being) transferred. Both the signalling code and the cancelling code try to compareAndSet this status. If a signal operation loses this race, it instead transfers the next node on the queue, if one exists. If a cancellation loses, it must abort the transfer, and then await lock re-acquisition.
状态转移:为了维护有序性,队列节点会通过节点状态的一个位记录节点是否发生转移(或正在发生转移)
This latter case introduces a potentially unbounded spin. A cancelled wait cannot commence lock re-acquisition until the node has been successfully inserted on the lock queue, so must spin waiting for the CLH queue insertion compareAndSet being performed by the signalling thread to succeed. The need to spin here is rare, and employs a Thread.yield to provide a scheduling hint that some other thread, ideally the one doing the signal, should instead run. While it would be possible to implement here a helping strategy for the cancellation to insert the node, the case is much too rare to justify the added overhead that this would entail. In all other cases, the basic mechanics here and elsewhere use no spins or yields, which maintains reasonable performance on uniprocessors.
潜在自旋:'取消'失败会引起一个潜在的无限自旋等待 - 被'取消'的等待不能重新获得锁直到该节点被成功插入到锁队列之前,因此必须自旋等待直到被'唤醒'线程成功执行CLH队列的CAS插入
取消失败情况: 实际上极少使用自旋且会调用Thread.yield暗示调度某一其他线程(理想情况就是执行signal操作的线程);虽然(引入自旋)可能为'取消'操作提供一个帮助策略以成功插入节点,但由于情况极少因此无需增加这些额外开销
其他情况: 在其它所有的情况下,这个基本的机制都不需要自旋或yield,因此在单处理器上保持着合理的性能
并发番@AQS框架一文通 由 黄志鹏kira 创作,采用 知识共享 署名-非商业性使用 4.0 国际 许可协议 进行许可。
本站文章除注明转载/出处外,均为本站原创或翻译,转载前请务必署名。