The eval function is really dangerous
Info
Original title: Eval really is dangerous
Original link: https://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html
Original author: Ned Batchelder
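Translator's note: I translated this article because a lot of our business code uses eval for type conversion, for example eval("[1,2,3]"). That code either does no filtering of the incoming string at all, or simply empties the globals, as in eval("[1,2,3]", {}). This article analyzes in detail why eval is unsafe. If all you need is type conversion, use ast.literal_eval instead.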
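A minimal sketch of the replacement suggested above, assuming Python 3: ast.literal_eval accepts only literal syntax (strings, numbers, tuples, lists, dicts, sets, booleans, None), so a function-call payload is rejected with ValueError instead of being executed. The helper name try_literal is hypothetical, introduced only for this illustration. Its documentation still warns that malformed or very deeply nested input can raise other errors such as MemoryError or RecursionError.

```python
import ast

# Type conversion of the eval("[1,2,3]") kind: literal syntax only.
numbers = ast.literal_eval("[1, 2, 3]")
config = ast.literal_eval("{'retries': 3, 'verbose': False}")


def try_literal(source):
    """Hypothetical helper: return the parsed literal, or a rejection message."""
    try:
        return ast.literal_eval(source)
    except ValueError as exc:
        return f"rejected: {exc}"


# A function call is not a literal, so nothing is imported or executed.
attack = try_literal("__import__('os').system('clear')")
print(numbers, config, attack)
```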
Python has an eval() function which evaluates a string of Python code:
assert eval("2 + 3 * len('hello')") == 17
This is very powerful, but it is also very dangerous if you accept strings to evaluate from untrusted input. Suppose the string being evaluated is os.system('rm -rf /'). It will really start deleting all the files on your computer. (In the examples that follow, I'll use 'clear' instead of 'rm -rf /' to prevent accidental foot-shootings.)
Some have claimed that you can make eval safe by providing it with no globals. eval() takes a second argument, a dictionary of the global values to use during the evaluation. If you don't provide a globals dictionary, eval uses the current globals, which is why os might be available. If you provide an empty dictionary, then there are no globals, and this now raises NameError: name 'os' is not defined:
eval("os.system('clear')", {})
But we can still import modules and use them, with the builtin function __import__. This succeeds:
eval("__import__('os').system('clear')", {})
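To see this without risking a real shell command, here is a sketch that swaps the os.system payload for a harmless math.sqrt call (the substitution is mine, not the article's), assuming Python 3. The empty globals hide the caller's own math import, but __import__ is still reachable:

```python
import math  # imported in the caller, but invisible to the eval below

# Empty globals: names from the caller's module are gone...
try:
    eval("math.sqrt(16)", {})
    name_lookup_failed = False
except NameError:
    name_lookup_failed = True

# ...but eval still supplies the builtins, so __import__ can load the module itself.
# Harmless stand-in for the os.system('clear') payload above.
result = eval("__import__('math').sqrt(16)", {})
print(name_lookup_failed, result)
```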
The next attempt to make things safe is to refuse access to the builtins. Names like __import__ and open are available to you in Python 2 because they are in the __builtins__ global. We can explicitly specify that there are no builtins by defining that name as an empty dictionary in our globals. Now this raises a NameError:
eval("__import__('os').system('clear')", {"__builtins__": {}})
Note
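The signature of eval is eval(expression[, globals[, locals]]).
You may wonder why, when globals has already been set to {} above, we still need to set __builtins__ to {} explicitly. According to the official Python 2.7 documentation, if a globals dictionary is passed to eval but lacks __builtins__, the current globals are copied into it before the expression is parsed. (The Python 3 documentation describes this more precisely: a reference to the dictionary of the builtins module is inserted under the __builtins__ key.) So to disable the builtins, the globals dictionary you pass should be {"__builtins__": {}}.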
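A small check of the Note's point, assuming CPython 3. The helper run is hypothetical, and math stands in for os so that nothing destructive happens:

```python
def run(source, globs):
    """Evaluate source; report a NameError as a string instead of raising."""
    try:
        return eval(source, globs)
    except NameError as exc:
        return f"NameError: {exc}"


# A globals dict without __builtins__ gets one inserted by eval.
g = {}
eval("1 + 1", g)
inserted = "__builtins__" in g

payload = "__import__('math').sqrt(16)"  # harmless stand-in for the os payload
open_result = run(payload, {})                      # builtins filled in: 4.0
closed_result = run(payload, {"__builtins__": {}})  # builtins explicitly empty
print(inserted, open_result, closed_result)
```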
Are we safe now? Some say yes, but they are wrong. As a demonstration, running this in CPython will segfault your interpreter:
bomb = """
(lambda fc=(
lambda n: [
c for c in
().__class__.__bases__[0].__subclasses__()
if c.__name__ == n
][0]
):
fc("function")(
fc("code")(
# 2.7: 0,0,0,0,"BOOM",(),(),(),"","",0,""
# 3.5-3.7: 0,0,0,0,0,b"BOOM",(),(),(),"","",0,b""
# 3.8-3.10: 0,0,0,0,0,0,b"BOOM",(),(),(),"","",0,b""
# 3.11: 0,0,0,0,0,0,b"BOOM",(),(),(),"","","",0,b"",b"",b"",b"",(),()
),{}
)()
)()
"""
eval(bomb, {"__builtins__": {}})
The "BOOM" line in the middle needs to change depending on the version of Python. Uncomment the right one to see the crash.
Let's unpack this beast and see what's going on. At the center we find this:
().__class__.__bases__[0]
which is a fancy way of saying object: the first base class of tuple is object. Remember, we can't simply say object, since we have no builtins. But we can create objects with literal syntax, and then use attributes from there.
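A quick way to confirm this, assuming Python 3: with builtins emptied, the name object is not defined, yet the attribute chain still returns the very same class.

```python
NO_BUILTINS = {"__builtins__": {}}

# The bare name `object` is a builtin, so it cannot be looked up here...
try:
    eval("object", NO_BUILTINS)
    object_name_visible = True
except NameError:
    object_name_visible = False

# ...but an attribute chain starting from a tuple literal still reaches it.
recovered = eval("().__class__.__bases__[0]", NO_BUILTINS)
print(object_name_visible, recovered is object)
```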
Once we have object, we can get the list of all the subclasses of object:
().__class__.__bases__[0].__subclasses__()
or in other words, a list of the classes that derive directly from object and have been created up to this point in the program. We'll come back to this at the end. If we shorthand this as ALL_CLASSES, then this is a list comprehension that examines those classes to find one named n:
[c for c in ALL_CLASSES if c.__name__ == n][0]
We'll use this to find classes by name, and because we need to use it twice, we'll create a function for it:
lambda n: [c for c in ALL_CLASSES if c.__name__ == n][0]
But we're in an eval, so we can't use a def statement or an assignment statement to give this function a name. However, default arguments to a function are also a form of assignment, and lambdas can have default arguments. So we put the rest of our code in a lambda function and use its default argument as the assignment:
(lambda fc=(lambda n: [c for c in ALL_CLASSES if c.__name__ == n][0]):
# code goes here...
)()
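A harmless sketch of the same default-argument trick, assuming Python 3: fc looks up int by name (instead of code or function) and uses it to convert a string, all without any builtins. FIND_AND_USE is a hypothetical name.

```python
FIND_AND_USE = """
(lambda fc=(lambda n: [c for c in ().__class__.__bases__[0].__subclasses__()
                       if c.__name__ == n][0]):
    fc("int")("42") + 1
)()
"""

# fc is bound through the default argument; the body then uses it like a variable.
result = eval(FIND_AND_USE, {"__builtins__": {}})
print(result)  # 43
```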
Now that we have our "find class" function fc, what will we do with it? We can make a code object! It isn't easy: in Python 2.7 you need to provide 12 arguments to the constructor (later versions need more, as the version-specific lines in the bomb above show), but most can be given simple default values.
fc("code")(0, 0, 0, 0, "BOOM", (), (), (), "", "", 0, "")
The string "BOOM" is the actual bytecode to use in the code object, and as you can probably guess, "BOOM" is not a valid sequence of bytecodes. Actually, any one of these bytecodes would be enough: they are all binary operators that will try to operate on an empty operand stack, which will segfault CPython. "BOOM" is just more fun, thanks to lvh for it.
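The reason the "BOOM" line must change per version is that the code constructor's positional parameters grow as CPython evolves. As a side illustration, assuming Python 3.8 or later (where types.CodeType has a replace() method), here is a sturdier way to derive a new code object from an existing one, wrapped in types.FunctionType because a code object cannot be called directly. The function name template is hypothetical, and the swapped constant keeps the bytecode valid, so nothing crashes.

```python
import types


def template():
    return "original"


code = template.__code__

# CodeType.replace() copies a code object with selected fields changed;
# here only one constant is swapped, so the bytecode stays valid.
new_consts = tuple("patched" if c == "original" else c for c in code.co_consts)
patched_code = code.replace(co_consts=new_consts)

# A code object is not callable by itself; wrap it in a function first.
patched = types.FunctionType(patched_code, {})
print(callable(patched_code), patched())  # False patched
```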
This gives us a code object: fc("code") finds the class code for us, and then we invoke it with the 12 arguments. You can't invoke a code object directly, but you can create a function with one:
fc("function")(CODE_OBJECT, {})
And of course, once you have a function, you can call it, which runs the code in its code object. In this case, that executes our bogus bytecodes, which segfaults the CPython interpreter. Here's the dangerous string again, in more compact form:
(lambda fc=(lambda n: [c for c in ().__class__.__bases__[0].__subclasses__() if c.__name__ == n][0]):
fc("function")(fc("code")(0,0,0,0,"BOOM",(),(),(),"","",0,""), {})()
)()
So eval is not safe, even if you remove all the globals and the builtins!
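To watch the chain work without the crash, here is a sketch that keeps the whole structure (find a class by name, build a function from a code object, call it) but borrows a valid code object from a lambda instead of constructing bogus bytecode. It assumes Python 3, where the attribute is __code__; HARMLESS_CHAIN is a hypothetical name.

```python
HARMLESS_CHAIN = """
(lambda fc=(lambda n: [c for c in ().__class__.__bases__[0].__subclasses__()
                       if c.__name__ == n][0]):
    fc("function")((lambda: 6 * 7).__code__, {})()
)()
"""

# Same chain as the bomb, but the code object comes from a real lambda,
# so calling the rebuilt function returns normally instead of crashing.
result = eval(HARMLESS_CHAIN, {"__builtins__": {}})
print(result)  # 42
```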
We used the list of all subclasses of object here to make a code object and a function. You can of course find other classes and use them. Which classes you can find depends on where the eval() call actually is. In a real program, many classes will already have been created by the time the eval() happens, and all of them will be reachable from our ALL_CLASSES list. As an example:
s = """
[
c for c in
().__class__.__bases__[0].__subclasses__()
if c.__name__ == "Quitter"
][0](0)()
"""
The standard site module defines a class called Quitter; the name quit is bound to an instance of it, so that you can type quit() at the interactive prompt to exit the interpreter. So in eval we simply find Quitter, instantiate it, and call it. This string cleanly exits the Python interpreter. (In Python 3 the class lives in the _sitebuiltins module, which site imports, and its constructor takes two arguments, name and eof, so the instantiation there needs (0, 0) instead of (0).)
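A sketch that performs only the lookup half of this example, assuming Python 3 with the site module loaded. It never instantiates or calls Quitter, because calling an instance raises SystemExit. FIND_QUITTER is a hypothetical name.

```python
import builtins

FIND_QUITTER = (
    "[c for c in ().__class__.__bases__[0].__subclasses__()"
    " if c.__name__ == 'Quitter']"
)

# Locate the class with builtins stripped, exactly as the attack string does,
# but stop there.
matches = eval(FIND_QUITTER, {"__builtins__": {}})

# The list is empty if the site module was never imported (e.g. python -S).
quitter_cls = matches[0] if matches else None
if quitter_cls is not None:
    # The interactive quit() helper is an instance of the class we just found.
    print(isinstance(builtins.quit, quitter_cls))
```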
Of course, in a real system, there will be all sorts of powerful classes lying around that an eval'ed string could instantiate and invoke. There's no end to the havoc that could be caused.
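The list an attacker searches is live: a class the program defines directly on object shows up in it as soon as the class statement runs. A sketch, assuming Python 3, with an invented class name (HypotheticalAdminTool is for illustration only). Note that __subclasses__() returns only direct subclasses, so deeper classes are reached by calling it again on those.

```python
def direct_subclass_names():
    """Names of the classes that object.__subclasses__() reports right now."""
    return {cls.__name__ for cls in object.__subclasses__()}


before = "HypotheticalAdminTool" in direct_subclass_names()


class HypotheticalAdminTool:  # hypothetical "powerful" class, demo only
    pass


after = "HypotheticalAdminTool" in direct_subclass_names()

# __subclasses__() reports only direct subclasses: bool derives from int.
int_listed = int in object.__subclasses__()
bool_listed = bool in object.__subclasses__()
print(before, after, int_listed, bool_listed)  # False True True False
```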
The problem with all of these attempts to protect eval() is that they are blacklists: they explicitly remove things that could be dangerous. That is a losing battle, because if just one item is left off the list, you can attack the system.
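For contrast, here is a minimal sketch of the opposite approach, an allowlist. This is a swapped-in technique, not something the article proposes: parse the input with ast and interpret only a handful of arithmetic node types, rejecting everything else, so eval() is never called. The names safe_arith and rejects are hypothetical, and ast.Constant assumes Python 3.8+.

```python
import ast
import operator

# Allowlist: only these operator node types are accepted.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_arith(source: str) -> float:
    """Evaluate +, -, *, / over int/float literals; reject every other node."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"disallowed syntax: {type(node).__name__}")

    # The tree is interpreted here; eval() and compile() are never called.
    return walk(ast.parse(source, mode="eval"))


def rejects(source: str) -> bool:
    """Hypothetical helper: True if safe_arith refuses the input."""
    try:
        safe_arith(source)
    except ValueError:
        return True
    return False


print(safe_arith("2 + 3 * 4"), safe_arith("-(1 + 2)"), rejects("().__class__"))
```

Because unknown node types fall through to the error, a forgotten feature fails closed instead of open. Exponentiation is deliberately left out, since inputs like 9**9**9 could tie up CPU and memory.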
While I was poking around on this topic, I stumbled on Python's restricted evaluation mode, which seems to be an attempt to plug some of these holes. Here we try to access the code object for a lambda, and find we aren't allowed to:
>>> eval("(lambda:0).func_code", {'__builtins__':{}})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 1, in <module>
RuntimeError: function attributes not accessible in restricted mode
Restricted mode is an explicit attempt to blacklist certain "dangerous" attribute access. It's triggered specifically when code executes with builtins that are not the official builtins. There's a much more detailed explanation, with links to other discussion of this topic, on Tav's blog. As we've seen, the existing restricted mode isn't enough to prevent mischief.
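The traceback above comes from Python 2. As far as I know, Python 3 removed this restricted mode entirely (and renamed func_code to __code__), so the same probe simply succeeds there. A quick check, assuming CPython 3:

```python
import types

# Python 2's restricted mode refused func_code here; Python 3 has no restricted
# mode, and the attribute is spelled __code__.
leaked = eval("(lambda: 0).__code__", {"__builtins__": {}})
print(isinstance(leaked, types.CodeType))  # True
```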
So, can eval be made safe? Hard to say. At this point, my best guess is that you can't do any harm if you can't use any double underscores, so maybe if you exclude any string containing double underscores you are safe. Maybe...
Update: from a Reddit thread about recovering cleared globals, here is a similar snippet that will get you the original builtins:
[
c for c in ().__class__.__base__.__subclasses__()
if c.__name__ == 'catch_warnings'
][0]()._module.__builtins__