Arrays, slices (and strings): The mechanics of 'append'
Info
Original title: Arrays, slices (and strings): The mechanics of 'append'
Original link: https://blog.golang.org/slices
Original author: Rob Pike
This article: https://www.white-winds.com/post/arrays, slices (and strings) the mechanics of append
Introduction
One of the most common features of procedural programming languages is the concept of an array. Arrays seem like simple things but there are many questions that must be answered when adding them to a language, such as:
- fixed-size or variable-size?
- is the size part of the type?
- what do multidimensional arrays look like?
- does the empty array have meaning?
The answers to these questions affect whether arrays are just a feature of the language or a core part of its design.
In the early development of Go, it took about a year to decide the answers to these questions before the design felt right. The key step was the introduction of slices, which built on fixed-size arrays to give a flexible, extensible data structure. To this day, however, programmers new to Go often stumble over the way slices work, perhaps because experience from other languages has colored their thinking.
In this post we'll attempt to clear up the confusion. We'll do so by building up the pieces to explain how the append built-in function works, and why it works the way it does.
Arrays
Arrays are an important building block in Go, but like the foundation of a building they are often hidden below more visible components. We must talk about them briefly before we move on to the more interesting, powerful, and prominent idea of slices.
Arrays are not often seen in Go programs because the size of an array is part of its type, which limits its expressive power.
The declaration
var buffer [256]byte
declares the variable buffer, which holds 256 bytes. The type of buffer includes its size, [256]byte. An array with 512 bytes would be of the distinct type [512]byte.
The data associated with an array is just that: an array of elements. Schematically, our buffer looks like this in memory,
buffer: byte byte byte ... 256 times ... byte byte byte
That is, the variable holds 256 bytes of data and nothing else. We can access its elements with the familiar indexing syntax, buffer[0], buffer[1], and so on through buffer[255]. (The index range 0 through 255 covers 256 elements.) Attempting to index buffer with a value outside this range will crash the program.
There is a built-in function called len that returns the number of elements of an array or slice and also of a few other data types. For arrays, it's obvious what len returns. In our example, len(buffer) returns the fixed value 256.
Arrays have their place (they are a good representation of a transformation matrix, for instance), but their most common purpose in Go is to hold storage for a slice.
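A minimal sketch of the points above, assuming only the buffer declaration from the text. The helper name arrayFacts is hypothetical and introduced purely for illustration; it checks that len of an array is fixed by its type, and that assigning an array copies all of its elements rather than sharing them.

```go
package main

import "fmt"

// arrayFacts is a hypothetical helper for this illustration. It returns the
// length of a [256]byte array and whether a copy of that array is independent
// of the original.
func arrayFacts() (int, bool) {
	var buffer [256]byte
	buffer[0] = 1

	// Assigning an array copies every element; the two arrays share nothing.
	// (Assigning buffer to a variable of type [512]byte would not compile,
	// because the size is part of the type.)
	other := buffer
	other[0] = 2

	return len(buffer), buffer[0] == 1 && other[0] == 2
}

func main() {
	n, independent := arrayFacts()
	fmt.Println("len(buffer) =", n)
	fmt.Println("copy is independent:", independent)
}
```

This copying behavior is the contrast to keep in mind for the slice sections that follow, where the copy is of a small header and the elements stay shared.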
Slices: The slice header
Slices are where the action is, but to use them well one must understand exactly what they are and what they do.
A slice is a data structure describing a contiguous section of an array stored separately from the slice variable itself. A slice is not an array. A slice describes a piece of an array.
Given our buffer array variable from the previous section, we could create a slice that describes elements 100 through 150 (to be precise, 100 through 149, inclusive) by slicing the array:
var slice []byte = buffer[100:150]
In that snippet we used the full variable declaration to be explicit. The variable slice has type []byte, pronounced "slice of bytes", and is initialized from the array, called buffer, by slicing elements 100 (inclusive) through 150 (exclusive). The more idiomatic syntax would drop the type, which is set by the initializing expression:
var slice = buffer[100:150]
Inside a function we could use the short declaration form,
slice := buffer[100:150]
What exactly is this slice variable? It's not quite the full story, but for now think of a slice as a little data structure with two elements: a length and a pointer to an element of an array. You can think of it as being built like this behind the scenes:
type sliceHeader struct {
	Length        int
	ZerothElement *byte
}
slice := sliceHeader{
	Length:        50,
	ZerothElement: &buffer[100],
}
Of course, this is just an illustration. Despite what this snippet says, that sliceHeader struct is not visible to the programmer, and the type of the element pointer depends on the type of the elements, but this gives the general idea of the mechanics.
So far we've used a slice operation on an array, but we can also slice a slice, like this:
slice2 := slice[5:10]
Just as before, this operation creates a new slice, in this case with elements 5 through 9 (inclusive) of the original slice, which means elements 105 through 109 of the original array. The underlying sliceHeader struct for the slice2 variable looks like this:
slice2 := sliceHeader{
	Length:        5,
	ZerothElement: &buffer[105],
}
Notice that this header still points to the same underlying array, stored in the buffer variable.
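To make the sharing concrete, here is a small sketch that writes through slice2 and reads the result back through both the first slice and the array. The helper name sharedWrite is hypothetical; the buffer, slice, and slice2 names follow the text.

```go
package main

import "fmt"

// sharedWrite is a hypothetical helper for this illustration. It writes
// through slice2 and returns what the array and the first slice see,
// along with the length of slice2.
func sharedWrite() (byte, byte, int) {
	var buffer [256]byte
	slice := buffer[100:150]
	slice2 := slice[5:10]

	// slice2[0], slice[5], and buffer[105] all name the same byte.
	slice2[0] = 42

	return buffer[105], slice[5], len(slice2)
}

func main() {
	fromArray, fromSlice, n := sharedWrite()
	fmt.Println(fromArray, fromSlice, n)
}
```

All three views agree because each header only describes a section of buffer; none of them owns a separate copy of the data.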
We can also reslice, which is to say slice a slice and store the result back in the original slice structure. After
slice = slice[5:10]
the sliceHeader structure for the slice variable looks just like it did for the slice2 variable. You'll see reslicing used often, for example to truncate a slice. This statement drops the first and last elements of our slice:
slice = slice[1:len(slice)-1]
[Exercise: Write out what the sliceHeader struct looks like after this assignment.]
You'll often hear experienced Go programmers talk about the "slice header" because that really is what's stored in a slice variable. For instance, when you call a function that takes a slice as an argument, such as bytes.IndexRune, that header is what gets passed to the function. In this call,
slashPos := bytes.IndexRune(slice, '/')
the slice argument that is passed to the IndexRune function is, in fact, a "slice header".
There's one more data item in the slice header, which we talk about below, but first let's see what the existence of the slice header means when you program with slices.
Passing slices to functions
It's important to understand that even though a slice contains a pointer, it is itself a value. Under the covers, it is a struct value holding a pointer and a length. It is not a pointer to a struct.
This matters.
When we called IndexRune in the previous example, it was passed a copy of the slice header. That behavior has important ramifications.
Consider this simple function:
func AddOneToEachElement(slice []byte) {
	for i := range slice {
		slice[i]++
	}
}
It does just what its name implies, iterating over the indices of a slice (using a for range loop), incrementing its elements.
Try it:
func main() {
	slice := buffer[10:20]
	for i := 0; i < len(slice); i++ {
		slice[i] = byte(i)
	}
	fmt.Println("before", slice)
	AddOneToEachElement(slice)
	fmt.Println("after", slice)
}
Even though the slice header is passed by value, the header includes a pointer to elements of an array, so both the original slice header and the copy of the header passed to the function describe the same array. Therefore, when the function returns, the modified elements can be seen through the original slice variable.
The argument to the function really is a copy, as this example shows:
func SubtractOneFromLength(slice []byte) []byte {
	slice = slice[0 : len(slice)-1]
	return slice
}
func main() {
	fmt.Println("Before: len(slice) =", len(slice))
	newSlice := SubtractOneFromLength(slice)
	fmt.Println("After: len(slice) =", len(slice))
	fmt.Println("After: len(newSlice) =", len(newSlice))
}
Here we see that the contents of a slice argument can be modified by a function, but its header cannot. The length stored in the slice variable is not modified by the call to the function, since the function is passed a copy of the slice header, not the original. Thus if we want to write a function that modifies the header, we must return it as a result parameter, just as we have done here. The slice variable is unchanged but the returned value has the new length, which is then stored in newSlice.
Pointers to slices: Method receivers
Another way to have a function modify the slice header is to pass a pointer to it. Here's a variant of our previous example that does this:
func PtrSubtractOneFromLength(slicePtr *[]byte) {
	slice := *slicePtr
	*slicePtr = slice[0 : len(slice)-1]
}
func main() {
	fmt.Println("Before: len(slice) =", len(slice))
	PtrSubtractOneFromLength(&slice)
	fmt.Println("After: len(slice) =", len(slice))
}
It seems clumsy in that example, especially dealing with the extra level of indirection (a temporary variable helps), but there is one common case where you see pointers to slices. It is idiomatic to use a pointer receiver for a method that modifies a slice.
Let's say we wanted to have a method on a slice that truncates it at the final slash. We could write it like this:
type path []byte
func (p *path) TruncateAtFinalSlash() {
	i := bytes.LastIndex(*p, []byte("/"))
	if i >= 0 {
		*p = (*p)[0:i]
	}
}
func main() {
	pathName := path("/usr/bin/tso") // Conversion from string to path.
	pathName.TruncateAtFinalSlash()
	fmt.Printf("%s\n", pathName)
}
If you run this example you'll see that it works properly, updating the slice in the caller.
[Exercise: Change the type of the receiver to be a value rather than a pointer and run it again. Explain what happens.]
On the other hand, if we wanted to write a method for path that upper-cases the ASCII letters in the path (parochially ignoring non-English names), the method's receiver could be a value, because the value receiver will still point to the same underlying array.
type path []byte
func (p path) ToUpper() {
	for i, b := range p {
		if 'a' <= b && b <= 'z' {
			p[i] = b + 'A' - 'a'
		}
	}
}
func main() {
	pathName := path("/usr/bin/tso")
	pathName.ToUpper()
	fmt.Printf("%s\n", pathName)
}
Here the ToUpper method uses two variables in the for range construct to capture the index and slice element. This form of loop avoids writing p[i] multiple times in the body.
[Exercise: Convert the ToUpper method to use a pointer receiver and see if its behavior changes.]
[Advanced exercise: Convert the ToUpper method to handle Unicode letters, not just ASCII.]
Capacity
Look at the following function that extends its argument slice of ints by one element:
func Extend(slice []int, element int) []int {
	n := len(slice)
	slice = slice[0 : n+1]
	slice[n] = element
	return slice
}
(Why does it need to return the modified slice?) Now run it:
func main() {
	var iBuffer [10]int
	slice := iBuffer[0:0]
	for i := 0; i < 20; i++ {
		slice = Extend(slice, i)
		fmt.Println(slice)
	}
}
See how the slice grows until... it doesn't.
It's time to talk about the third component of the slice header: its capacity. Besides the array pointer and length, the slice header also stores its capacity:
type sliceHeader struct {
	Length        int
	Capacity      int
	ZerothElement *byte
}
The Capacity field records how much space the underlying array actually has; it is the maximum value the Length can reach. Trying to grow the slice beyond its capacity will step beyond the limits of the array and will trigger a panic.
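The panic can be observed without crashing the program by recovering from it. The sketch below assumes the iBuffer example from this section; the helper name growPastCap is hypothetical.

```go
package main

import "fmt"

// growPastCap is a hypothetical helper. It reslices s to one element past
// its capacity and reports whether that triggered a run-time panic.
func growPastCap(s []int) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	s = s[0 : cap(s)+1] // upper bound exceeds cap(s): slice bounds out of range
	_ = s
	return false
}

func main() {
	var iBuffer [10]int
	slice := iBuffer[0:0]
	fmt.Println("len:", len(slice), "cap:", cap(slice))
	fmt.Println("panicked:", growPastCap(slice))
}
```

Recovering here is only a way to watch the failure happen; the sections that follow take the usual approach of checking the capacity before reslicing.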
After our example slice is created by
slice := iBuffer[0:0]
its header looks like this:
slice := sliceHeader{
	Length:        0,
	Capacity:      10,
	ZerothElement: &iBuffer[0],
}
The Capacity field is equal to the length of the underlying array, minus the index in the array of the first element of the slice (zero in this case). If you want to inquire what the capacity is for a slice, use the built-in function cap:
if cap(slice) == len(slice) {
	fmt.Println("slice is full!")
}
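As a worked instance of that rule, slicing the 256-byte buffer from the earlier sections at index 100 should leave a capacity of 256 - 100 = 156, whatever the length is. The helper name lenAndCap is hypothetical.

```go
package main

import "fmt"

// lenAndCap is a hypothetical helper that slices a 256-byte array and
// returns the length and capacity of the resulting slice.
func lenAndCap(low, high int) (int, int) {
	var buffer [256]byte
	s := buffer[low:high]
	return len(s), cap(s)
}

func main() {
	// Capacity = array length (256) minus index of the first element (100).
	l, c := lenAndCap(100, 150)
	fmt.Printf("len: %d, cap: %d\n", l, c)
}
```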
Make
What if we want to grow the slice beyond its capacity? You can't! By definition, the capacity is the limit to growth. But you can achieve an equivalent result by allocating a new array, copying the data over, and modifying the slice to describe the new array.
Let's start with allocation. We could use the new built-in function to allocate a bigger array and then slice the result, but it is simpler to use the make built-in function instead. It allocates a new array and creates a slice header to describe it, all at once. The make function takes three arguments: the type of the slice, its initial length, and its capacity, which is the length of the array that make allocates to hold the slice data. This call creates a slice of length 10 with room for 5 more (15-10), as you can see by running it:
slice := make([]int, 10, 15)
fmt.Printf("len: %d, cap: %d\n", len(slice), cap(slice))
This snippet doubles the capacity of our int slice but keeps its length the same:
slice := make([]int, 10, 15)
fmt.Printf("len: %d, cap: %d\n", len(slice), cap(slice))
newSlice := make([]int, len(slice), 2*cap(slice))
for i := range slice {
	newSlice[i] = slice[i]
}
slice = newSlice
fmt.Printf("len: %d, cap: %d\n", len(slice), cap(slice))
After running this code the slice has much more room to grow before needing another reallocation.
When creating slices, it's often true that the length and capacity will be the same. The make built-in has a shorthand for this common case. The capacity argument defaults to the length, so you can leave it out to set them both to the same value. After
gophers := make([]Gopher, 10)
the gophers slice has both its length and capacity set to 10.
Copy
When we doubled the capacity of our slice in the previous section, we wrote a loop to copy the old data to the new slice. Go has a built-in function, copy, to make this easier. Its arguments are two slices, and it copies the data from the right-hand argument to the left-hand argument. Here's our example rewritten to use copy:
newSlice := make([]int, len(slice), 2*cap(slice))
copy(newSlice, slice)
The copy function is smart. It only copies what it can, paying attention to the lengths of both arguments. In other words, the number of elements it copies is the minimum of the lengths of the two slices. This can save a little bookkeeping. Also, copy returns an integer value, the number of elements it copied, although it's not always worth checking.
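A short sketch of the "minimum of the two lengths" rule, trying a destination that is shorter than the source and one that is longer. The helper name copyInto is hypothetical.

```go
package main

import "fmt"

// copyInto is a hypothetical helper: it allocates a destination of length
// dstLen, copies src into it, and returns the count reported by copy along
// with the destination slice.
func copyInto(dstLen int, src []int) (int, []int) {
	dst := make([]int, dstLen)
	n := copy(dst, src)
	return n, dst
}

func main() {
	src := []int{1, 2, 3, 4, 5}

	// Destination shorter than source: only 3 elements fit.
	n, dst := copyInto(3, src)
	fmt.Println(n, dst)

	// Destination longer than source: only 5 elements are available.
	n, dst = copyInto(8, src)
	fmt.Println(n, dst)
}
```

Note that copy looks at lengths, not capacities: a destination created with length 0 receives nothing, however much capacity it has.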
The copy function also gets things right when source and destination overlap, which means it can be used to shift items around in a single slice. Here's how to use copy to insert a value into the middle of a slice.
// Insert inserts the value into the slice at the specified index,
// which must be in range.
// The slice must have room for the new element.
func Insert(slice []int, index, value int) []int {
	// Grow the slice by one element.
	slice = slice[0 : len(slice)+1]
	// Use copy to move the upper part of the slice out of the way and open a hole.
	copy(slice[index+1:], slice[index:])
	// Store the new value.
	slice[index] = value
	// Return the result.
	return slice
}
There are a couple of things to notice in this function. First, of course, it must return the updated slice because its length has changed. Second, it uses a convenient shorthand. The expression
slice[i:]
means exactly the same as
slice[i:len(slice)]
Also, although we haven't used the trick yet, we can leave out the first element of a slice expression too; it defaults to zero. Thus
slice[:]
just means the slice itself, which is useful when slicing an array. This expression is the shortest way to say "a slice describing all the elements of the array":
array[:]
Now that's out of the way, let's run our Insert function.
slice := make([]int, 10, 20) // Note capacity > length: room to add element.
for i := range slice {
	slice[i] = i
}
fmt.Println(slice)
slice = Insert(slice, 5, 99)
fmt.Println(slice)
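The same overlapping copy works in the other direction. As a sketch, here is a Delete function that closes a hole instead of opening one. Delete is a hypothetical counterpart to Insert and does not appear in the original post.

```go
package main

import "fmt"

// Delete is a hypothetical counterpart to Insert. It removes the element at
// the specified index, which must be in range, and returns the shortened slice.
func Delete(slice []int, index int) []int {
	// Shift the upper part down by one, overwriting the deleted element.
	// Source and destination overlap; copy handles that correctly.
	copy(slice[index:], slice[index+1:])
	// Shrink the slice by one element.
	return slice[:len(slice)-1]
}

func main() {
	slice := []int{0, 1, 2, 3, 4}
	slice = Delete(slice, 2)
	fmt.Println(slice)
}
```

As with Insert, the result must be returned because the length in the header changes. Unlike Insert, Delete never needs extra capacity.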
Append: An example
A few sections back, we wrote an Extend function that extends a slice by one element. It was buggy, though, because if the slice's capacity was too small, the function would crash. (Our Insert example has the same problem.) Now we have the pieces in place to fix that, so let's write a robust implementation of Extend for integer slices.
func Extend(slice []int, element int) []int {
	n := len(slice)
	if n == cap(slice) {
		// Slice is full; must grow.
		// We double its size and add 1, so if the size is zero we still grow.
		newSlice := make([]int, len(slice), 2*len(slice)+1)
		copy(newSlice, slice)
		slice = newSlice
	}
	slice = slice[0 : n+1]
	slice[n] = element
	return slice
}
In this case it's especially important to return the slice, since when it reallocates the resulting slice describes a completely different array. Here's a little snippet to demonstrate what happens as the slice fills up:
slice := make([]int, 0, 5)
for i := 0; i < 10; i++ {
	slice = Extend(slice, i)
	fmt.Printf("len=%d cap=%d slice=%v\n", len(slice), cap(slice), slice)
	fmt.Println("address of 0th element:", &slice[0])
}
Notice the reallocation when the initial array of size 5 is filled up. Both the capacity and the address of the zeroth element change when the new array is allocated.
With the robust Extend function as a guide we can write an even nicer function that lets us extend the slice by multiple elements. To do this, we use Go's ability to turn a list of function arguments into a slice when the function is called. That is, we use Go's variadic function facility.
Let's call the function Append. For the first version, we can just call Extend repeatedly so the mechanism of the variadic function is clear. The signature of Append is this:
func Append(slice []int, items ...int) []int
What that says is that Append takes one argument, a slice, followed by zero or more int arguments. Those arguments are exactly a slice of int as far as the implementation of Append is concerned, as you can see:
// Append appends the items to the slice.
// First version: just loop calling Extend.
func Append(slice []int, items ...int) []int {
	for _, item := range items {
		slice = Extend(slice, item)
	}
	return slice
}
Notice the for range loop iterating over the elements of the items argument, which has implied type []int. Also notice the use of the blank identifier _ to discard the index in the loop, which we don't need in this case.
Try it:
slice := []int{0, 1, 2, 3, 4}
fmt.Println(slice)
slice = Append(slice, 5, 6, 7, 8)
fmt.Println(slice)
Another new technique in this example is that we initialize the slice by writing a composite literal, which consists of the type of the slice followed by its elements in braces:
slice := []int{0, 1, 2, 3, 4}
The Append function is interesting for another reason. Not only can we append elements, we can append a whole second slice by "exploding" the slice into arguments using the '...' notation at the call site:
slice1 := []int{0, 1, 2, 3, 4}
slice2 := []int{55, 66, 77}
fmt.Println(slice1)
slice1 = Append(slice1, slice2...) // The '...' is essential!
fmt.Println(slice1)
Of course, we can make Append more efficient by allocating no more than once, building on the innards of Extend:
// Append appends the elements to the slice.
// Efficient version.
func Append(slice []int, elements ...int) []int {
	n := len(slice)
	total := len(slice) + len(elements)
	if total > cap(slice) {
		// Reallocate. Grow to 1.5 times the new size, so we can still grow.
		newSize := total*3/2 + 1
		newSlice := make([]int, total, newSize)
		copy(newSlice, slice)
		slice = newSlice
	}
	slice = slice[:total]
	copy(slice[n:], elements)
	return slice
}
Here, notice how we use copy twice, once to move the slice data to the newly allocated memory, and then to copy the appending items to the end of the old data.
Try it; the behavior is the same as before:
slice1 := []int{0, 1, 2, 3, 4}
slice2 := []int{55, 66, 77}
fmt.Println(slice1)
slice1 = Append(slice1, slice2...) // The '...' is essential!
fmt.Println(slice1)
Append: The built-in function
And so we arrive at the motivation for the design of the append built-in function. It does exactly what our Append example does, with equivalent efficiency, but it works for any slice type.
When this post was written, a weakness of Go was that any generic-type operations had to be provided by the run-time. That has since changed (type parameters were added in Go 1.18), but append predates them: to make working with slices easier, Go provides a built-in generic append function. It works the same as our int slice version, but for any slice type.
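For comparison, the efficient Append from the previous section can be written for any element type using type parameters. This is only a sketch and assumes Go 1.18 or later, which is newer than the original post; the type parameter T is the only change to the int version, and the built-in append remains the tool to use in practice.

```go
package main

import "fmt"

// Append is a sketch of the article's efficient Append rewritten with a
// type parameter (an assumption: this requires Go 1.18 or later).
func Append[T any](slice []T, elements ...T) []T {
	n := len(slice)
	total := len(slice) + len(elements)
	if total > cap(slice) {
		// Reallocate. Grow to 1.5 times the new size, so we can still grow.
		newSize := total*3/2 + 1
		newSlice := make([]T, total, newSize)
		copy(newSlice, slice)
		slice = newSlice
	}
	slice = slice[:total]
	copy(slice[n:], elements)
	return slice
}

func main() {
	fmt.Println(Append([]string{"a"}, "b", "c"))
	fmt.Println(Append([]int(nil), 1, 2))
}
```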
Remember, since the slice header is always updated by a call to append, you need to save the returned slice after the call. In fact, the compiler won't let you call append without saving the result.
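A small sketch of why the result must be saved: the header held by the caller does not change, even when append reuses the same array. The helper name appendKeepsOriginal is hypothetical.

```go
package main

import "fmt"

// appendKeepsOriginal is a hypothetical helper. It appends to a slice that
// has spare capacity and returns the length of the original slice, the length
// of the result, and the first element of the original after a write made
// through the result.
func appendKeepsOriginal() (int, int, int) {
	base := make([]int, 3, 10)
	grown := append(base, 4)

	// The capacity was sufficient, so append reused the array: both headers
	// describe the same storage, but only grown has the new length.
	grown[0] = 99

	return len(base), len(grown), base[0]
}

func main() {
	baseLen, grownLen, first := appendKeepsOriginal()
	fmt.Println(baseLen, grownLen, first)
}
```

Here base still has length 3 while grown has length 4, and the write through grown is visible through base because no reallocation took place. Had the capacity been exhausted, grown would describe a different array and the write would not be shared.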
Here are some one-liners intermingled with print statements. Try them, edit them and explore:
// Create a couple of starter slices.
slice := []int{1, 2, 3}
slice2 := []int{55, 66, 77}
fmt.Println("Start slice: ", slice)
fmt.Println("Start slice2:", slice2)
// Add an item to a slice.
slice = append(slice, 4)
fmt.Println("Add one item:", slice)
// Add one slice to another.
slice = append(slice, slice2...)
fmt.Println("Add one slice:", slice)
// Make a copy of a slice (of int).
slice3 := append([]int(nil), slice...)
fmt.Println("Copy a slice:", slice3)
// Copy a slice to the end of itself.
fmt.Println("Before append to self:", slice)
slice = append(slice, slice...)
fmt.Println("After append to self:", slice)
It's worth taking a moment to think about the final one-liner of that example in detail to understand how the design of slices makes it possible for this simple call to work correctly.
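One way to reason about it, offered as a sketch rather than a description of the run-time's internals: the second argument is passed as a copy of the slice header carrying the old length, so exactly the old elements are copied onto the end, whether append has to reallocate or can grow in place. The helper name appendToSelf is hypothetical.

```go
package main

import "fmt"

// appendToSelf is a hypothetical helper that appends a slice to itself.
func appendToSelf(s []int) []int {
	return append(s, s...)
}

func main() {
	// A composite literal has len == cap, so append must reallocate.
	full := []int{1, 2, 3}

	// Spare capacity: append can grow in place.
	roomy := make([]int, 3, 10)
	copy(roomy, full)

	fmt.Println(appendToSelf(full))
	fmt.Println(appendToSelf(roomy))
}
```

Both calls produce the same six elements, which is the property the one-liner relies on.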
There are lots more examples of append, copy, and other ways to use slices on the community-built "Slice Tricks" Wiki page.
Nil
As an aside, with our newfound knowledge we can see what the representation of a nil slice is. Naturally, it is the zero value of the slice header:
sliceHeader{
	Length:        0,
	Capacity:      0,
	ZerothElement: nil,
}
or just
sliceHeader{}
The key detail is that the element pointer is nil too. The slice created by
array[0:0]
has length zero (and maybe even capacity zero) but its pointer is not nil, so it is not a nil slice.
As should be clear, an empty slice can grow (assuming it has non-zero capacity), but a nil slice has no array to put values in and can never grow, by reslicing, to hold even one element.
That said, a nil slice is functionally equivalent to a zero-length slice, even though it points to nothing. It has length zero and can be appended to, with allocation. As an example, look at the one-liner above that copies a slice by appending to a nil slice.
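The distinction can be checked directly by comparing against nil. In this sketch the helper name nilFacts is hypothetical; array and the [0:0] slice expression follow the text.

```go
package main

import "fmt"

// nilFacts is a hypothetical helper. It reports whether a zero-value slice is
// nil, whether an empty slice of an array is nil, and the result of appending
// one element to the nil slice.
func nilFacts() (bool, bool, []int) {
	var s []int // the zero value of a slice: nil

	var array [4]int
	empty := array[0:0] // length zero, but the pointer is not nil

	return s == nil, empty == nil, append(s, 1)
}

func main() {
	isNil, emptyIsNil, appended := nilFacts()
	fmt.Println(isNil, emptyIsNil, appended)
}
```

Appending to the nil slice works because append allocates a new array when the capacity (zero here) is insufficient.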
Strings
Now a brief section about strings in Go in the context of slices.
Strings are actually very simple: they are just read-only slices of bytes with a bit of extra syntactic support from the language.
Because they are read-only, there is no need for a capacity (you can't grow them), but otherwise for most purposes you can treat them just like read-only slices of bytes.
For starters, we can index them to access individual bytes:
slash := "/usr/ken"[0] // yields the byte value '/'.
We can slice a string to grab a substring:
usr := "/usr/ken"[0:4] // yields the string "/usr"
It should be obvious now what's going on behind the scenes when we slice a string.
We can also take a normal slice of bytes and create a string from it with the simple conversion:
str := string(slice)
and go in the reverse direction as well:
slice := []byte(usr)
The array underlying a string is hidden from view; there is no way to access its contents except through the string. That means that when we do either of these conversions, a copy of the array must be made. Go takes care of this, of course, so you don't have to. After either of these conversions, modifications to the array underlying the byte slice don't affect the corresponding string.
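A brief sketch of that last point: after converting a string to a byte slice, writing to the slice leaves the string untouched. The helper name convertAndModify is hypothetical; usr and slice follow the snippets above.

```go
package main

import "fmt"

// convertAndModify is a hypothetical helper. It converts a substring to a
// byte slice, modifies the slice, and returns the original string alongside
// a string built from the modified bytes.
func convertAndModify() (string, string) {
	usr := "/usr/ken"[0:4] // the string "/usr"

	slice := []byte(usr) // the conversion copies the bytes
	slice[0] = '!'

	return usr, string(slice)
}

func main() {
	original, modified := convertAndModify()
	fmt.Println(original, modified)
}
```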
An important consequence of this slice-like design for strings is that creating a substring is very efficient. All that needs to happen is the creation of a two-word string header. Since the string is read-only, the original string and the string resulting from the slice operation can share the same array safely.
A historical note: The earliest implementation of strings always allocated, but when slices were added to the language, they provided a model for efficient string handling. Some of the benchmarks saw huge speedups as a result.
There's much more to strings, of course, and a separate blog post covers them in greater depth.
Conclusion
To understand how slices work, it helps to understand how they are implemented. There is a little data structure, the slice header, that is the item associated with the slice variable, and that header describes a section of a separately allocated array. When we pass slice values around, the header gets copied but the array it points to is always shared.
Once you appreciate how they work, slices become not only easy to use, but powerful and expressive, especially with the help of the copy and append built-in functions.
More reading
There's lots to find around the intertubes about slices in Go. As mentioned earlier, the "Slice Tricks" Wiki page has many examples. The Go Slices blog post describes the memory layout details with clear diagrams. Russ Cox's Go Data Structures article includes a discussion of slices along with some of Go's other internal data structures.
There is much more material available, but the best way to learn about slices is to use them.