Making things simple is a lot of work. At dotCloud, we package terribly complex things – such as deploying and scaling web applications – into the simplest possible experience for developers. But how does it work behind the scenes?
From kernel-level virtualization to monitoring, from high-throughput network routing to distributed locks, from dealing with EBS issues to collecting millions of system metrics per minute.
As someone once commented, scaling a PaaS is “like disneyland for systems engineers on crack”.
Still with us? Read on!
This is the 4th installment of a series of posts exploring the architecture and internals of platform-as-a-service in general, and dotCloud in particular.
You can find episode 1 on kernel namespaces here. Episode 2 covered cgroups, which you can find here. Our third episode explored AUFS and you can catch it here.
For this episode we will focus on GRSEC.
Part 4: GRSEC
GRSEC is a fairly large patch for the Linux kernel, providing strong security features to prevent many kinds of attacks (or “exploits”) and to detect suspicious activity (i.e. people looking for new exploits, or trying to find out if known exploits are working on the current system).
There are many different things in GRSEC, so our goal is just to provide a quick overview of some relevant features.
Randomize Address Space
A lot of exploits will rely on the fact that the base address for the heap or the stack is always the same.
As an example, consider the following classic scenario for the attack of a remote service:
- A bug is found in the service; e.g. some index is not checked properly, and this can be used to alter the stack, and cause a jump to an arbitrary address (when a function returns)
- The stack is altered to introduce some malicious code
- A pointer to this malicious code is placed on the stack as well
- The bug is triggered: the service jumps to the malicious code and executes it
If the address space of the stack is randomized, those attacks are much harder, because the attacker has to find out where the malicious code will end up before being able to jump to it.
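If you want to observe this randomization from user space, here is a minimal sketch in Python. The helper names are ours, for illustration only; nothing here is part of GRSEC. It relies on the fact that, on Linux, /proc/self/maps lists the memory regions of the current process, with the stack and heap labeled [stack] and [heap].

```python
def region_bases(maps_text, names=("[stack]", "[heap]")):
    """Return {region name: start address} for the named regions.

    maps_text is the content of /proc/<pid>/maps, one region per line:
    "start-end perms offset dev inode   pathname"
    """
    bases = {}
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous region: nothing in the name column
        name = fields[5].strip()
        if name in names:
            start = fields[0].split("-")[0]
            bases[name] = int(start, 16)
    return bases


def current_process_bases():
    """Read the stack and heap bases of the running interpreter (Linux only)."""
    with open("/proc/self/maps") as f:
        return region_bases(f.read())
```

Call current_process_bases() in two separate interpreter processes: when randomization is active, the two runs should report different addresses.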
Prevent Execution of Arbitrary Code
There are two steps to make sure that arbitrary code can’t make it inside a running program.
First, program code must be loaded in an area which is marked by the memory management unit as being read-only. This prevents code from modifying itself. Self-modifying code is sometimes referred to as polymorphic code; and while there are some legitimate uses for polymorphic code, it is more generally associated with dubious intentions!
Second, the heap and the stack must be marked as non-executable. After all, they’re supposed to contain data structures, function parameters, and return addresses – not a single opcode in there. On architectures supporting it, the heap and the stack regions will be marked as non-executable at the hardware level, effectively preventing accidental or intentional execution of code located in them.
At this point, there is no memory which is both executable and writable – good!
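As a rough way to check that property on a live process, the sketch below scans the content of /proc/<pid>/maps and reports every region whose permissions include both w and x. This only observes the result; it is not how GRSEC enforces anything, and the function name is ours.

```python
def writable_and_executable(maps_text):
    """Return (address range, name) for every region mapped with both w and x."""
    found = []
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # not a maps line
        perms = fields[1]  # e.g. "r-xp": read, write, execute, private/shared
        if "w" in perms and "x" in perms:
            name = fields[5].strip() if len(fields) > 5 else "(anonymous)"
            found.append((fields[0], name))
    return found
```

Feed it the maps file of a process you own. An empty list means no region is both writable and executable; a runtime that generates code on the fly may legitimately show entries.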
We mentioned that there were some legitimate uses for memory regions with both write and exec permissions. When does that happen, and what can be done about it?
The most common case is on-the-fly code generation for optimization purposes. If you use Java and its JIT (Just-In-Time) compiler, you’re in this situation.
Good news: GRSEC lets you flag some specific executables, to allow them to write to their code region or execute their data region.
This of course reduces the security for those specific processes; but the assumed trade-off is the following: to exploit a bug, it has to be a bug in e.g. the JVM itself, not in your program – and bugs in the JVM are likely to be found and fixed faster than bugs in your program. This is not about the quality of your code: this is about the number of users and scrutiny that the JVM has.
Audit Suspicious Activity
Another interesting security feature of GRSEC is the ability to log some specific events. For instance, it is possible to record in the kernel log each time a process is terminated by SIGSEGV, a.k.a. Segmentation Fault.
What’s the point? Well, a potential attacker will probably try to run a number of known exploits in an attempt to gain escalated privileges. Hopefully, all those exploits will fail. But very often, a failed exploit causes the process running it to commit a segmentation violation and be killed by SIGSEGV.
Any C programmer will tell you that there are of course other cases when programs can be terminated by SIGSEGV; but seeing many different programs, all started by the same user and all killed that way, is a telltale sign that someone is trying to break into the system.
If you’re not familiar with those concepts, think of scratches around a keyhole: a few of them don’t mean anything, but if the area around the lock is covered with dents, you can bet that someone tried to pick it!
There are many other similar events that are logged by GRSEC. The kernel logs can then be analyzed in realtime, and suspicious patterns can be detected. This allows you to lock out malicious users, or, alternatively, to monitor closely what they’re doing. The latter allows you to be the first one to know if someone successfully breaks into the system, and how they did it – which is an essential step to eventually patch an uncovered security hole.
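To make the pattern detection concrete, here is a small sketch that flags users whose processes were killed by SIGSEGV in several different programs. Beware: the "key=value" line format used here is a made-up stand-in, not the actual format of GRSEC kernel log messages, and the threshold of three programs is an arbitrary assumption.

```python
from collections import defaultdict

SIGSEGV = 11  # signal number of a segmentation fault on Linux


def parse_event(line):
    """Parse a 'key=value key=value' line into a dict (hypothetical format)."""
    return dict(field.split("=", 1) for field in line.split() if "=" in field)


def suspicious_uids(lines, min_programs=3):
    """Return {uid: sorted program names} for users with many crashing programs."""
    crashed = defaultdict(set)
    for line in lines:
        event = parse_event(line)
        if event.get("signal") == str(SIGSEGV) and "uid" in event:
            crashed[event["uid"]].add(event.get("comm", "?"))
    return {
        uid: sorted(programs)
        for uid, programs in crashed.items()
        if len(programs) >= min_programs
    }
```

A single buggy program crashing over and over counts only once per user, which matches the intuition above: it is the variety of crashing programs that looks suspicious.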
Compile-time Security Features
GRSEC also plays its part during the kernel compilation. It enables a compiler plugin, which will “constify” some kernel structures. It will automatically add the const keyword to all structures containing only function pointers (unless they have a special “non const” marker to evade the process).
In other words, instead of being mutable by default unless marked const, function tables are now const by default, unless specified otherwise. Accordingly, attempts to modify function tables will be detected at compile-time. The rationale is to make sure that any code that manipulates a function table will be closely audited before the function table is marked “non const”.
Why this emphasis on function tables? Because if they can be abused, they are a convenient way for a potential attacker to jump to arbitrary code – remember the technique explained in the beginning of the post!
Marking those data structures as const helps at compile time, but also later when the kernel is running, because those data structures will be laid out in a memory region which will be made read-only by the memory management unit.
This not only reduces exposure to attacks, but also makes it much harder for successful attackers to cover up their tracks by hijacking existing function tables.
…And Many More
As mentioned in the introduction, this is just a quick overview. If you want to learn about other features, you can check http://grsecurity.net/.
If you want to quench your thirst for technical details, four steps will get you a nearly exhaustive list of the features, along with a fairly detailed description of what each one does:
- Get the kernel sources
- Apply the GRSEC patch set
- Run make menuconfig
- Navigate to the compilation options related to GRSEC
- Navigate to the compilation options related to GRSEC
Almost every feature of GRSEC can be enabled or disabled at compilation time, and will therefore be listed there. The help provided with each compilation option is fairly informative.
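Once the configuration has been saved, the choices end up in the kernel .config file, and a few lines of Python can summarize them. This sketch assumes that the relevant option names start with GRKERNSEC or PAX; treat those prefixes, and the function name, as assumptions to adjust for your patch version.

```python
import re

SET_RE = re.compile(r"^CONFIG_(\w+)=(.*)$")
UNSET_RE = re.compile(r"^# CONFIG_(\w+) is not set$")
PREFIXES = ("GRKERNSEC", "PAX")  # assumed prefixes of the security options


def security_options(config_text, prefixes=PREFIXES):
    """Return {option name: value} for options matching the prefixes.

    Options recorded as "# CONFIG_X is not set" are reported with value "n".
    """
    options = {}
    for line in config_text.splitlines():
        line = line.strip()
        match = SET_RE.match(line)
        if match:
            name, value = match.group(1), match.group(2)
        else:
            match = UNSET_RE.match(line)
            if not match:
                continue  # comment, blank line, or unrelated content
            name, value = match.group(1), "n"
        if name.startswith(prefixes):
            options[name] = value
    return options
```

Reading the file with open(".config").read() and passing the result to security_options() gives a quick view of what is enabled; the help texts themselves are still best read in menuconfig.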
In addition to GRSEC, dotCloud has a few other extra layers of security.
Each service runs in its own container. The benefits of container isolation were explained in a previous part of this blog post series.
Also, dotCloud users do not have root access. “No root access” means that users cannot SSH as root, cannot login as root, and cannot get a root shell through sudo. All processes run under a regular, non-privileged UID. Furthermore, SUID binaries are restricted to a set of well-known, well-audited programs, like ping.
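To illustrate the idea of restricting SUID binaries to a known set, here is a sketch of an audit that lists SUID files missing from an allowlist. It is not dotCloud's actual tooling; the function names and the allowlist content in the usage note are hypothetical.

```python
import os
import stat


def is_setuid(mode):
    """True if a st_mode value describes a regular file with the SUID bit set."""
    return stat.S_ISREG(mode) and bool(mode & stat.S_ISUID)


def unexpected_suid(entries, allowed):
    """entries: iterable of (path, st_mode). Return SUID paths not in allowed."""
    return sorted(
        path for path, mode in entries if is_setuid(mode) and path not in allowed
    )


def scan(root):
    """Yield (path, st_mode) for every file below root, not following symlinks."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                yield path, os.lstat(path).st_mode
            except OSError:
                continue  # file vanished or is unreadable; skip it
```

For example, unexpected_suid(scan("/usr"), {"/usr/bin/ping"}) would return every SUID file under /usr other than the one allowed path.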
Each of those security layers is pretty strong on its own, but we believe that combining them is a good way to provide an adequate level of security for massively scaled, multi-tenant platforms.