A gentle introduction to multithreading


Approaching the world of concurrency, one step at a time.


Modern computers have the ability to perform multiple operations at the same time. Supported by hardware advancements and smarter operating systems, this feature makes your programs run faster, both in terms of speed of execution and responsiveness.

Writing software that takes advantage of such power is fascinating, yet tricky: it requires you to understand what happens under your computer's hood. In this first episode I'll try to scratch the surface of threads, one of the tools provided by operating systems to perform this kind of magic. Let's go!


Processes and threads: naming things the right way


Modern operating systems can run multiple programs at the same time. That's why you can read this article in your browser (a program) while listening to music on your media player (another program). Each running program is known as a process. The operating system knows many software tricks to make a process run along with others, and it also takes advantage of the underlying hardware. Either way, the final outcome is that you perceive all your programs as running simultaneously.


Running processes in an operating system is not the only way to perform several operations at the same time. Each process is able to run simultaneous sub-tasks within itself, called threads. You can think of a thread as a slice of the process itself. Every process triggers at least one thread on startup, which is called the main thread. Then, according to the program/programmer's needs, additional threads may be started or terminated. Multithreading is about running multiple threads within a single process.
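As a minimal sketch of the idea (in Python, chosen here only for illustration; the thread name `audio-engine` and the function `play_music` are made up), the main thread below starts one additional thread and waits for it to terminate:

```python
import threading

names = []

def play_music():
    # Runs in the additional thread started by the main thread.
    names.append(threading.current_thread().name)

# The process already has one thread at startup: the main thread.
names.append(threading.main_thread().name)

audio = threading.Thread(target=play_music, name="audio-engine")
audio.start()   # ask for an additional thread
audio.join()    # wait until it has terminated

print(names)
```

Both names end up in the same list because the two threads live inside the same process.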


For example, it is likely that your media player runs multiple threads: one for rendering the interface (this is usually the main thread), another one for playing the music, and so on.


You can think of the operating system as a container that holds multiple processes, where each process is a container that holds multiple threads. In this article I will focus on threads only, but the whole topic is fascinating and deserves more in-depth analysis in the future.


  1. Operating systems can be seen as a box that contains processes, which in turn contain one or more threads.


The differences between processes and threads

Each process has its own chunk of memory assigned by the operating system. By default that memory cannot be shared with other processes: your browser has no access to the memory assigned to your media player and vice versa. The same thing happens if you run two instances of the same program, that is, if you launch your browser twice. The operating system treats each instance as a new process with its own separate portion of memory assigned. So, by default, two or more processes have no way to share data, unless they perform advanced tricks known as inter-process communication (IPC).


Unlike processes, threads share the same chunk of memory assigned to their parent process by the operating system: data in the media player main interface can be easily accessed by the audio engine and vice versa. Therefore it is easier for two threads to talk to each other. On top of that, threads are usually lighter than a process: they take fewer resources and are faster to create. That's why they are also called lightweight processes.
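A small Python sketch of this sharing, under the assumption of a made-up `playlist` list and `audio_engine` function: the second thread reads and modifies the very same object the main thread created, with no copying and no IPC involved.

```python
import threading

# Hypothetical shared state: one object, visible to every thread in the process.
playlist = ["track-1"]

def audio_engine():
    # Reads what the main thread stored and appends to the same list object.
    playlist.append("now playing: " + playlist[0])

worker = threading.Thread(target=audio_engine)
worker.start()
worker.join()

print(playlist)
```

Doing the same across two separate processes would require some form of IPC, because each process would get its own copy of `playlist`.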

Threads are a handy way to make your program perform multiple operations at the same time. Without threads you would have to write one program per task, run them as processes and synchronize them through the operating system. This would be more difficult (IPC is tricky) and slower (processes are heavier than threads).


Green threads, or fibers


Threads mentioned so far are an operating system thing: a process that wants to fire a new thread has to talk to the operating system. Not every platform natively supports threads, though. Green threads, also known as fibers, are a kind of emulation that makes multithreaded programs work in environments that don't provide that capability. For example, a virtual machine might implement green threads in case the underlying operating system doesn't have native thread support.


Green threads are faster to create and to manage because they completely bypass the operating system, but they also have disadvantages. I will write about this topic in one of the next episodes.

The name "green threads" refers to the Green Team at Sun Microsystems that designed the original Java thread library in the 90s. Today Java no longer makes use of green threads: they switched to native ones back in 2000. Some other programming languages (Go, Haskell or Rust, to name a few) implement equivalents of green threads instead of native ones.


What threads are used for

Why should a process employ multiple threads? As I mentioned before, doing things in parallel can greatly speed things up. Say you are about to render a movie in your movie editor. The editor could be smart enough to spread the rendering operation across multiple threads, where each thread processes a chunk of the final movie. So if with one thread the task would take, say, one hour, with two threads it would take 30 minutes; with four threads 15 minutes, and so on.

Is it really that simple? There are three important points to consider:

  1. not every program needs to be multithreaded. If your app performs sequential operations or often waits on the user to do something, multithreading might not be that beneficial;
  2. you can't just throw more threads at an application to make it run faster: each sub-task has to be thought out and designed carefully to perform parallel operations;
  3. it is not 100% guaranteed that threads will perform their operations truly in parallel, that is, at the same time: it really depends on the underlying hardware.

The last one is crucial: if your computer doesn't support multiple operations at the same time, the operating system has to fake them. We will see how in a minute. For now let's think of concurrency as the perception of having tasks that run at the same time, and of true parallelism as tasks that literally run at the same time.


  2. Parallelism is a subset of concurrency.


What makes concurrency and parallelism possible

The central processing unit (CPU) in your computer does the hard work of running programs. It is made of several parts, the main one being the so-called core: that's where computations are actually performed. A core is capable of running only one operation at a time.


This is of course a major drawback. For this reason operating systems have developed advanced techniques to give the user the ability to run multiple processes (or threads) at once, especially in graphical environments, even on a single-core machine. The most important one is called preemptive multitasking, where preemption is the ability to interrupt a task, switch to another one and then resume the first task at a later time.

So if your CPU has only one core, part of the operating system's job is to spread that single core's computing power across multiple processes or threads, which are executed one after the other in a loop. This operation gives you the illusion of having more than one program running in parallel, or a single program doing multiple things at the same time (if multithreaded). Concurrency is met, but true parallelism (the ability to run processes simultaneously) is still missing.

Modern CPUs have more than one core under the hood, where each one performs an independent operation at a time. This means that with two or more cores true parallelism is possible. For example, my Intel Core i7 has four cores: it can run four different processes or threads simultaneously.

Operating systems are able to detect the number of CPU cores and assign processes or threads to each one of them. A thread may be allocated to whatever core the operating system likes, and this kind of scheduling is completely transparent for the program being run. Additionally, preemptive multitasking might kick in when all cores are busy. This gives you the ability to run more processes and threads than the actual number of cores available in your machine.
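A rough Python sketch of both points, with illustrative names only (`task` and `finished` are made up): it asks the operating system how many cores it reports, then deliberately starts twice as many threads. Note that `os.cpu_count()` counts logical cores, which may be more than the physical ones.

```python
import os
import threading

cores = os.cpu_count() or 1      # cpu_count() may return None
n_threads = cores * 2            # deliberately more threads than cores
finished = [False] * n_threads

def task(i):
    finished[i] = True           # each thread writes only to its own slot

threads = [threading.Thread(target=task, args=(i,)) for i in range(n_threads)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"{cores} logical cores, {sum(finished)} of {n_threads} threads completed")
```

The program never says which core runs which thread: that decision is left entirely to the scheduler.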


Multithreaded applications on a single core: does it make sense?


True parallelism on a single-core machine is impossible to achieve. Nevertheless it still makes sense to write a multithreaded program, if your application can benefit from it. When a process employs multiple threads, preemptive multitasking can keep the app running even if one of those threads performs a slow or blocking task.

Say for example you are working on a desktop app that reads some data from a very slow disk. If you write the program with just one thread, the whole app would freeze until the disk operation is finished: the CPU power assigned to the only thread is wasted while waiting for the disk to wake up. Of course the operating system is running many other processes besides this one, but your specific application will not be making any progress.

Let's rethink your app in a multithreaded way. Thread A is responsible for the disk access, while thread B takes care of the main interface. If thread A gets stuck waiting because the device is slow, thread B can still run the main interface, keeping your program responsive. This is possible because, having two threads, the operating system can switch the CPU resources between them without getting stuck on the slower one.
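The scenario can be sketched in Python as follows. This is only an approximation: `time.sleep` stands in for the slow disk read, and a simple counter stands in for redrawing the interface. The function name `read_from_slow_disk` is invented for the example.

```python
import threading
import time

def read_from_slow_disk(result):
    time.sleep(0.5)                    # stand-in for a blocking disk read
    result["data"] = "file contents"

result = {}
thread_a = threading.Thread(target=read_from_slow_disk, args=(result,))
thread_a.start()

# The main thread plays the role of thread B: it keeps working
# while thread A is stuck waiting.
ticks = 0
while thread_a.is_alive():
    ticks += 1                         # stand-in for refreshing the interface
    time.sleep(0.05)

thread_a.join()
print(f"interface refreshed {ticks} times while waiting; got {result['data']!r}")
```

With a single thread, the loop could not start until the read had finished, and the counter would stay at zero for the whole wait.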


More threads, more problems


As we know, threads share the same chunk of memory of their parent process. This makes it extremely easy for two or more of them to exchange data within the same application. For example: a movie editor might hold a big portion of shared memory containing the video timeline. Such shared memory is being read by several worker threads designated for rendering the movie to a file. They all just need a handle (e.g. a pointer) to that memory area in order to read from it and output rendered frames to disk.


Things run smoothly as long as two or more threads only read from the same memory location. Trouble kicks in when at least one of them writes to the shared memory while others are reading from it. Two problems can occur at this point:

  • data race — while a writer thread modifies the memory, a reader thread might be reading from it. If the writer has not finished its work yet, the reader will get corrupted data;
  • race condition — a reader thread is supposed to read only after a writer has written. What if the opposite happens? More subtle than a data race, a race condition is about two or more threads doing their job in an unpredictable order, when in fact the operations should be performed in the proper sequence to be done correctly. Your program can trigger a race condition even if it has been protected against data races.


The concept of thread safety


A piece of code is said to be thread-safe if it works correctly, that is without data races or race conditions, even if many threads are executing it simultaneously. You might have noticed that some programming libraries declare themselves as being thread-safe: if you are writing a multithreaded program you want to make sure that any other third-party function can be used across different threads without triggering concurrency problems.


The root cause of data races

We know that a CPU core can perform only one machine instruction at a time. Such an instruction is said to be atomic because it's indivisible: it can't be broken into smaller operations. The Greek word "atom" (ἄτομος; atomos) means uncuttable.


The property of being indivisible makes atomic operations thread-safe by nature. When a thread performs an atomic write on shared data, no other thread can read the modification half-complete. Conversely, when a thread performs an atomic read on shared data, it reads the entire value as it appeared at a single moment in time. There is no way for a thread to slip through an atomic operation, thus no data race can happen.

The bad news is that the vast majority of operations out there are non-atomic. Even a trivial assignment like x = 1 on some hardware might be composed of multiple atomic machine instructions, making the assignment itself non-atomic as a whole. So a data race is triggered if a thread reads x while another one performs the assignment.
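The same effect can be observed one level above the hardware. The sketch below uses Python bytecode as a stand-in for machine instructions, which is an analogy rather than the real thing: it disassembles a one-line update of a shared variable and lists the separate steps behind it. The function `increment` is made up for the example.

```python
import dis

x = 0

def increment():
    global x
    x += 1   # looks like a single step in the source code

# Collect the names of the bytecode instructions behind increment().
steps = [ins.opname for ins in dis.get_instructions(increment)]
print(steps)
```

The exact list depends on the Python version, but it contains a separate load and a separate store of `x`. Another thread could run in between the two, which is what makes the statement non-atomic as a whole.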


The root cause of race conditions


Preemptive multitasking gives the operating system full control over thread management: it can start, stop and pause threads according to advanced scheduling algorithms. You as a programmer cannot control the time or order of execution. In fact, if your code starts a writer thread and then, on the very next line, a reader thread, there is no guarantee that the two threads will actually begin running in that order. Run such a program several times and you may notice that it behaves differently across runs: sometimes the writer thread starts first, sometimes the reader does instead. You will surely hit a race condition if your program needs the writer to always run before the reader.
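A minimal Python sketch of that situation, assuming two hypothetical functions `writer` and `reader` that only record who ran first. With functions this small one ordering may dominate in practice, but nothing in the program guarantees it.

```python
import threading

order = []                      # records which thread got to run first
order_lock = threading.Lock()   # protects the list itself

def writer():
    with order_lock:
        order.append("writer")

def reader():
    with order_lock:
        order.append("reader")

writer_thread = threading.Thread(target=writer)
reader_thread = threading.Thread(target=reader)

writer_thread.start()
reader_thread.start()

writer_thread.join()
reader_thread.join()
print(order)   # the order is up to the scheduler
```

The lock keeps the list consistent, yet it says nothing about who goes first: the program is free of data races and still exposed to a race condition.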


This behavior is called non-deterministic: the outcome changes each time and you can't predict it. Debugging programs affected by a race condition is very annoying because you can't always reproduce the problem in a controlled way.


Teach threads to get along: concurrency control


Both data races and race conditions are real-world problems: some people even died because of them. The art of accommodating two or more concurrent threads is called concurrency control: operating systems and programming languages offer several solutions to take care of it. The most important ones:

  • synchronization — a way to ensure that resources will be used by only one thread at a time. Synchronization is about marking specific parts of your code as "protected" so that two or more concurrent threads do not simultaneously execute it, screwing up your shared data;
  • atomic operations — a bunch of non-atomic operations (like the assignment mentioned before) can be turned into atomic ones thanks to special instructions provided by the operating system. This way the shared data is always kept in a valid state, no matter how other threads access it;
  • immutable data — shared data is marked as immutable, nothing can change it: threads are only allowed to read from it, eliminating the root cause. As we know threads can safely read from the same memory location as long as they don't modify it. This is the main philosophy behind functional programming.
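As a taste of the first technique, here is a small Python sketch of synchronization with a lock. The names `counter`, `counter_lock` and `add_many` are invented for the example. Four threads update the same counter, and the lock marks the update as "protected" so that only one thread at a time executes it.

```python
import threading

counter = 0
counter_lock = threading.Lock()

def add_many(times):
    global counter
    for _ in range(times):
        with counter_lock:      # only one thread at a time past this point
            counter += 1        # read-modify-write on shared data

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)   # 40000
```

Because every update happens while holding the lock, no increment can be lost and the final value is always 4 × 10,000.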


I will cover all these fascinating topics in the next episodes of this mini-series about concurrency. Stay tuned!

