Introduction to thread synchronization

Info
Original title: Introduction to thread synchronization
Original link: https://www.internalpointers.com/post/introduction-thread-synchronization
Original author: Triangles
Translation link: https://www.white-winds.com/post/introduction to thread synchronization

A look at one of the most popular ways of concurrency control in a multithreaded application.
As seen in my previous introduction to multithreading, writing concurrent code can be tricky. Two big problems might emerge: data races, when a writer thread modifies the memory while a reader thread is reading it, and race conditions, when two or more threads do their job in an unpredictable order. Luckily for us there are several ways to prevent these errors: in this article I will take a look at the most common one, known as synchronization.
What is synchronization
Synchronization is a bag of tricks that make sure two or more threads behave themselves. More specifically, synchronization will help you achieve at least two important features in your multithreaded program:

- atomicity — if your code contains instructions that operate on data shared across multiple threads, an unregulated concurrent access to that data might trigger a data race. The code segment that contains those instructions is called a critical section. You want to make sure that critical sections are executed atomically: as defined in the previous episode, an atomic operation can't be broken into smaller ones, so that while a thread is executing it no other thread can slip through (a minimal example of such a critical section follows this list);
- ordering — sometimes you want two or more threads to perform their job in a predictable order, or to put a restriction on how many threads can access a specific resource. Normally you don't have control over these things, which might be the root cause of race conditions. With synchronization you can orchestrate your threads to perform according to a plan.
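To make the atomicity point concrete, here is a minimal C++ sketch (not from the original article; the counter and the loop count are just illustrative) of a critical section left unprotected: two threads incrementing a shared integer, which is a textbook data race.

```cpp
#include <iostream>
#include <thread>

int counter = 0;  // data shared across two threads

void work() {
    for (int i = 0; i < 100000; ++i)
        ++counter;  // the critical section: a non-atomic read-modify-write
}

int main() {
    std::thread t1(work);
    std::thread t2(work);
    t1.join();
    t2.join();
    // Without synchronization this is a data race: the final value is
    // unpredictable (and the behavior is formally undefined).
    std::cout << counter << '\n';
}
```

The mutex section below shows one way to make that increment effectively atomic.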
Synchronization is implemented through special objects called synchronization primitives, provided by the operating system or by any programming language that supports threading. You then make use of such synchronization primitives in your code to make sure your threads don't trigger data races, race conditions, or both.
Synchronization takes place both in hardware and software, as well as between threads and operating system processes. This article is about the synchronization of software threads: the hardware counterpart and process synchronization are fascinating topics that will surely get some love in a future post.
Common synchronization primitives
The most important synchronization primitives are mutexes, semaphores and condition variables. There are no official definitions for these terms, so different texts and implementations associate slightly different characteristics with each primitive.
Operating systems provide these tools natively. For example, Linux and macOS support POSIX threads, also known as pthreads, a set of functions that allows you to write safe multithreaded applications. Windows has its own synchronization tools in the C Run-Time Libraries (CRT): conceptually similar to the POSIX threads functions, but with different names.
Unless you are writing very low-level code, you usually want to employ the synchronization primitives shipped with the programming language of your choice. Every programming language that deals with multithreading has its own toolbox of synchronization primitives, along with other functions to fiddle around with threads. For example, Java provides the `java.util.concurrent` package, modern C++ has its own `thread` library, C# ships the `System.Threading` namespace, and so on. All these functions and objects are based upon the underlying operating system primitives, of course.
There are many other synchronization tools around. In this article I'll stick to the three mentioned above, as they act as a foundation often used to build more complex entities. Let's take a closer look at each of them.
Mutexes
A mutex (mutual exclusion) is a synchronization primitive that puts a restriction around a critical section in order to prevent data races. A mutex guarantees atomicity by making sure that only one thread accesses the critical section at a time.
Technically, a mutex is a global object in your application, shared across multiple threads, that provides two functions usually called `lock` and `unlock`. A thread that is about to enter the critical section calls `lock` to lock the mutex; when it's done, that is when the critical section is over, the same thread calls `unlock` to unlock it. The important feature of a mutex: only the thread that locked the mutex is allowed to unlock it later on.
If another thread jumps in and tries to lock an already locked mutex, the operating system puts it to sleep until the first thread has finished its task and has unlocked the mutex. This way only one thread can access the critical section; any other thread is excluded from it and must wait for the unlock. For this reason a mutex is also known as a locking mechanism.
You can use a mutex to protect simple actions like a concurrent read and write of a shared variable, as well as bigger and more complex operations that need to be executed by one thread at a time, such as writing to a log file or modifying a database. Either way, the mutex lock/unlock operations always match the boundaries of the critical section.
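As an illustration (not part of the original article), here is how the data race sketched earlier could be fixed with C++'s `std::mutex`. The RAII helper `std::lock_guard` calls `lock` on construction and `unlock` on destruction, so the lock/unlock pair matches the critical section exactly:

```cpp
#include <iostream>
#include <mutex>
#include <thread>

int counter = 0;
std::mutex counter_mutex;  // guards the critical section around `counter`

void work() {
    for (int i = 0; i < 100000; ++i) {
        std::lock_guard<std::mutex> guard(counter_mutex);  // lock()
        ++counter;  // only one thread at a time can be here
    }   // unlock() when `guard` goes out of scope
}

int main() {
    std::thread t1(work);
    std::thread t2(work);
    t1.join();
    t2.join();
    std::cout << counter << '\n';  // always 200000
}
```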
Recursive mutexes
In any regular mutex implementation, a thread that locks a mutex twice causes an error. A recursive mutex allows this instead: a thread can lock a recursive mutex multiple times without unlocking it first. However, no other thread can lock the recursive mutex until all the locks held by the first thread have been released. This synchronization primitive is also known as a reentrant mutex, where reentrancy is the ability to call a function multiple times (i.e. to enter it again) before the previous invocations are over.
Recursive mutexes are difficult to work with and are error-prone. You have to keep track of which thread has locked the mutex, and how many times, and make sure the same thread unlocks it completely. Failing to do so would leave locked mutexes around, with nasty consequences. Most of the time a regular mutex is enough.
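For completeness, a minimal sketch (my own, not from the article) of the reentrancy idea using C++'s `std::recursive_mutex`; the two functions are hypothetical:

```cpp
#include <mutex>

std::recursive_mutex m;
int shared_value = 0;

void set_value(int v) {
    std::lock_guard<std::recursive_mutex> guard(m);  // may be the 1st or 2nd lock taken by this thread
    shared_value = v;
}

void reset() {
    std::lock_guard<std::recursive_mutex> guard(m);  // 1st lock
    set_value(0);  // re-enters and locks `m` again; a plain std::mutex would deadlock here
}
```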
Reader/writer mutexes
As we know from the previous episode, multiple threads can concurrently read from a shared resource without harm as long as they don't modify it. So why bother locking a mutex if some of your threads are operating in "read-only" mode? For example, consider a concurrent database that is frequently read by many threads, while another thread seldom writes updates. You certainly need a mutex to protect the read/write access, but most of the time you would end up locking it just for read operations, preventing other reading threads from doing their job.
A reader/writer mutex allows concurrent reads from multiple threads and exclusive writes from a single thread to a shared resource. It can be locked in read mode or in write mode. To modify a resource, a thread must first acquire the exclusive write lock, and no exclusive write lock is granted until all read locks have been released.
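A sketch of the database scenario above, assuming C++17's `std::shared_mutex` (the map standing in for the database is of course illustrative): `std::shared_lock` takes the read lock, `std::unique_lock` the exclusive write lock.

```cpp
#include <map>
#include <shared_mutex>
#include <string>

std::map<std::string, std::string> db;  // the shared "database"
std::shared_mutex db_mutex;

std::string read_entry(const std::string& key) {
    std::shared_lock<std::shared_mutex> lock(db_mutex);  // read lock: many readers at once
    auto it = db.find(key);
    return it == db.end() ? "" : it->second;
}

void write_entry(const std::string& key, const std::string& value) {
    std::unique_lock<std::shared_mutex> lock(db_mutex);  // write lock: exclusive
    db[key] = value;
}
```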
Semaphores
A semaphore is a synchronization primitive used to orchestrate threads: which one starts first, how many threads can access a resource, and so on. Like a street semaphore regulates the traffic, a programming semaphore regulates the multithreading flow: for this reason a semaphore is also known as a signaling mechanism. It can be seen as an evolution of a mutex, because it guarantees both ordering and atomicity. However, in a few paragraphs I will show you why using semaphores for atomicity only is not a great idea.
Technically, a semaphore is a global object in your application, shared across multiple threads, that contains a numeric counter managed by two functions: one that increases the counter and another one that decreases it. Historically called `P` and `V`, modern implementations use friendlier names for those functions, such as `acquire` and `release`.
A semaphore controls the access to a shared resource: the counter determines the maximum number of threads that can simultaneously access it. At the beginning of your program, when the semaphore gets initialized, you choose that number according to your needs. Then, a thread that wants to access a shared resource calls `acquire`:

- if the counter is greater than zero, the thread can proceed. The counter gets reduced by one right away, then the current thread starts doing its job. When done, it calls `release`, which in turn increases the counter by one;
- if the counter is equal to zero, the thread cannot proceed: other threads have already filled up the available space. The current thread is put to sleep by the operating system and will wake up when the semaphore counter becomes greater than zero again (that is, when any other thread calls `release` once its job is done).
Unlike a mutex, any thread can release a semaphore, not only the one that acquired it in the first place.
A single semaphore is used to limit the number of threads accessing a shared resource: for example, to cap the number of concurrent database connections, where each thread is triggered by someone connecting to your server.
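A possible sketch of that connection cap, assuming a C++20 compiler with `std::counting_semaphore` (the limit of 4 and the handler function are made up for the example):

```cpp
#include <semaphore>
#include <thread>
#include <vector>

// At most 4 threads may use a database connection at the same time.
std::counting_semaphore<4> db_slots(4);

void handle_client() {
    db_slots.acquire();   // counter > 0 ? decrement and continue : sleep until a slot frees up
    // ... talk to the database through one of the 4 connections ...
    db_slots.release();   // give the slot back, waking up a sleeping thread if any
}

int main() {
    std::vector<std::thread> clients;
    for (int i = 0; i < 10; ++i)
        clients.emplace_back(handle_client);
    for (auto& t : clients)
        t.join();
}
```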
By combining multiple semaphores together you can solve thread ordering problems: for example, the thread that renders a web page in your browser must start after the thread that downloads the HTML files from the Internet. Thread A notifies thread B when it's done, so that B can wake up and proceed with its job: this is the idea behind the famous Producer-Consumer problem.
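Sketching that ordering with a semaphore initialized to zero (again assuming C++20, here `std::binary_semaphore`; the downloader/renderer pair is just the example above):

```cpp
#include <iostream>
#include <semaphore>
#include <thread>

std::binary_semaphore html_ready(0);  // starts at 0: the renderer will have to wait

void download_html() {
    // ... fetch the HTML files from the network ...
    std::cout << "download finished\n";
    html_ready.release();  // signal: wake up the renderer
}

void render_page() {
    html_ready.acquire();  // sleeps here until the downloader signals
    std::cout << "rendering the page\n";
}

int main() {
    std::thread renderer(render_page);
    std::thread downloader(download_html);
    downloader.join();
    renderer.join();
}
```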
Binary semaphores
A semaphore whose counter is restricted to the values 0 and 1 is called a binary semaphore: only one thread at a time can access the shared resource. Wait: this is basically a mutex protecting a critical section! You can actually replicate the mutex behavior with a binary semaphore. However, there are two important points to keep in mind:

- a mutex can be unlocked only by the thread that locked it first, while a semaphore can be released from any other thread. This could lead to confusion and subtle bugs if what you want is just a locking mechanism;
- semaphores are signaling mechanisms that orchestrate threads, while mutexes are locking mechanisms that protect shared resources. You should not use semaphores to protect shared resources, nor mutexes for signaling: this way your intent will be clearer, both to you and to anyone who reads your code.
Condition variables
Condition variables are another synchronization primitive, designed for ordering. They are used for sending a wake-up signal across different threads. A condition variable always goes hand in hand with a mutex; using it alone doesn't make sense.
Technically, a condition variable is a global object in your application, shared across multiple threads, that provides three functions usually called `wait`, `notify_one` and `notify_all`, plus a mechanism to pass it an existing mutex to work with (the exact way depends on the implementation).
A thread that calls `wait` on the condition variable is put to sleep by the operating system. Then another thread that wants to wake it up invokes `notify_one` or `notify_all`. The difference is that `notify_one` unfreezes only one thread, while `notify_all` sends the wake-up call to all the threads that went to sleep after calling `wait` on that condition variable. The mutex is used internally to provide the sleep/wake-up mechanism.
Condition variables are a powerful mechanism for sending signals between threads that you couldn't achieve with mutexes alone. For example, you can use them to solve the Producer-Consumer problem once again, where thread A emits a signal when it's done so that thread B can start its job.
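Here is one way that Producer-Consumer hand-off could look with C++'s `std::condition_variable` (a sketch with made-up names; note how the condition variable is always used together with a mutex):

```cpp
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

std::queue<int> items;            // shared between producer and consumer
std::mutex items_mutex;           // the mutex the condition variable works with
std::condition_variable items_cv;

void producer() {
    for (int i = 0; i < 5; ++i) {
        {
            std::lock_guard<std::mutex> lock(items_mutex);
            items.push(i);
        }
        items_cv.notify_one();    // wake up the consumer: there is work to do
    }
}

void consumer() {
    for (int received = 0; received < 5; ++received) {
        std::unique_lock<std::mutex> lock(items_mutex);
        items_cv.wait(lock, [] { return !items.empty(); });  // sleep, releasing the mutex
        std::cout << "got " << items.front() << '\n';
        items.pop();
    }
}

int main() {
    std::thread c(consumer);
    std::thread p(producer);
    p.join();
    c.join();
}
```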
Common problems in synchronization
All the synchronization primitives described in this article have something in common: they put threads to sleep. For this reason they are also called blocking mechanisms. A blocking mechanism is a good way to prevent concurrent access to a shared resource if you want to avoid data races or race conditions: a sleeping thread does no harm. But blocking can trigger unfortunate side effects. Let's take a quick look at them.
Deadlock
A deadlock occurs when a thread is waiting for a shared resource that another thread holds, while that second thread is waiting for a shared resource held by the first one. These things usually happen when working with multiple mutexes: the two threads remain waiting forever in an infinite circular loop: thread A waits for thread B, which waits for thread A, which waits for thread B, which...
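A classic way to run into it (an illustrative sketch, not from the article) is two threads taking two mutexes in opposite order:

```cpp
#include <mutex>
#include <thread>

std::mutex mutex_a;
std::mutex mutex_b;

void thread_1() {
    std::lock_guard<std::mutex> la(mutex_a);  // holds A...
    std::lock_guard<std::mutex> lb(mutex_b);  // ...and waits for B
}

void thread_2() {
    std::lock_guard<std::mutex> lb(mutex_b);  // holds B...
    std::lock_guard<std::mutex> la(mutex_a);  // ...and waits for A -> possible deadlock
}

int main() {
    std::thread t1(thread_1);
    std::thread t2(thread_2);
    t1.join();  // with unlucky timing, neither join ever returns
    t2.join();
}
```

Common remedies are to always acquire the mutexes in the same order, or to acquire them together, for example with C++17's `std::scoped_lock`, which locks several mutexes at once using a deadlock-avoidance algorithm.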
Starvation
A thread goes into starvation mode when it doesn't get enough love: it remains stuck indefinitely in its sleep state while waiting for access to a shared resource that is continuously handed to other threads. For example, a poorly designed semaphore-based algorithm might give precedence only to a subset of the waiting threads and forget to wake up one of the many others behind them in the waiting line. The starving thread would wait forever without doing any useful work.
Spurious wake-ups
This is a subtle problem that comes from how condition variables are actually implemented in some operating systems. With a spurious wake-up, a thread wakes up even though it has not been signaled through the condition variable. That's why most synchronization primitives also include a way to check whether the wake-up signal really comes from the condition variable the thread is waiting on.
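In practice this means a `wait` is almost always wrapped in a loop that re-checks the actual condition. A small sketch (with a made-up flag), assuming `std::condition_variable`:

```cpp
#include <condition_variable>
#include <mutex>

std::mutex m;
std::condition_variable cv;
bool data_ready = false;  // the real condition the thread is waiting for

void wait_for_data() {
    std::unique_lock<std::mutex> lock(m);
    // Re-check the condition after every wake-up: a spurious wake-up
    // simply sends the thread back to sleep instead of letting it through.
    while (!data_ready)
        cv.wait(lock);
    // Equivalent, using the predicate overload that loops internally:
    // cv.wait(lock, [] { return data_ready; });
}
```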
Priority inversion
Priority inversion occurs when a thread performing a high-priority task is blocked waiting for a lower-priority thread to release a resource, such as a mutex. For example, when the thread that outputs audio to the sound card (high priority) is blocked by the thread that displays the interface (low priority), the result is a bad glitch through your speakers.
What's next
All these synchronization problems have been studied for years, and many solutions, both technical and architectural, are available. A careful design and a bit of experience help a lot in prevention. Also, given the non-deterministic (i.e. extremely hard to reason about) nature of multithreaded applications, people have developed interesting tools to detect errors and potential pitfalls in concurrent code. Projects like Google's TSan or Helgrind are just a few of them.
However, sometimes you want to take a different route and get rid of any blocking mechanism in your multithreaded application. This means entering the non-blocking realm: a very low-level territory where threads are never put to sleep by the operating system and concurrency is regulated through atomic primitives and lock-free data structures. It's a challenging field, not always necessary, which can boost the speed of your software or wreak havoc on it. But that is a story for the next episode...
Sources
Wikipedia - Synchronization (computer science)
Wikipedia - Reentrant mutex
Wikipedia - Reentrancy (computing)
Wikipedia - Semaphore (programming)
Wikipedia - Spurious Wakeup
Wikipedia - Priority inversion
Wikipedia - Deadlock
Wikipedia - Starvation (computer science)
Columbia Engineering - Synchronization primitives
StackOverflow - Definition of “synchronization primitive”
StackOverflow - Lock, mutex, semaphore… what's the difference?
StackOverflow - Why is locking a std::mutex twice 'Undefined Behaviour'?
Operating Systems: Three Easy Pieces - Concurrency
Jaka's Corner - Data race and mutex
Java 10 API specs - Class Semaphore
Oracle's Multithreaded Programming Guide - Read-Write Lock Attributes
Oracle's Multithreaded Programming Guide - Avoiding Deadlock
Just Software Solutions - Locks, Mutexes, and Semaphores: Types of Synchronization Objects
Just Software Solutions - Definitions of Non-blocking, Lock-free and Wait-free
Cppreference - std::shared_mutex
Cppreference - std::condition_variable
Quora - What is the difference between a mutex and a semaphore?
gerald-fahrnholz.eu - Using condition variables - the safe way
Politecnico di Milano - Thread Posix: Condition Variables
SoftwareEngineering - Spurious wakeups explanation sounds like a bug that just isn't worth fixing, is that right?
Android Open Source Project - Avoiding Priority Inversion