Buffer Overflows

From Embedded Lab Vienna for IoT & Security
Revision as of 19:48, 19 December 2021 by CBektas (memory structure added)

Definition

A buffer overflow is a security vulnerability that should not be underestimated, as it has long been among the most common types of vulnerability. Buffer overflow vulnerabilities are widespread and often easy to exploit, which makes them a building block of many attacks. In the field of cyber attacks in particular, attackers exploit them to gain access to vulnerable servers and take control of them.


Creation

A buffer overflow typically arises when a program processes user input. If the program stores the input in a buffer of fixed size and does not check the length of the input, input that is longer than the buffer overflows it. An example is shown below.

[Figure: buffer overflow example]


A buffer stores data temporarily while it is transferred from one location to another. Suppose 8 bytes are allocated and the user enters 10 bytes of data: the buffer overflows, because 2 bytes more than allowed were written. The corresponding source code example is shown below.

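#include <iostream>

using namespace std;

int main()
{
    char buffer[8];
    cout << "Input data: ";
    // Before C++20, operator>> does not limit the input length, so 8 or more
    // characters (plus the terminating '\0') overflow buffer. Since C++20,
    // this overload reads at most sizeof(buffer) - 1 characters.
    cin >> buffer;

    return 0;
}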

An array of type char named buffer is declared, which can store 8 bytes of data. Since a string also needs a terminating null byte, at most 7 characters of input fit into it.
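Avoiding the overflow means checking the length of the input before storing it. The following C sketch illustrates this, assuming the input is already available as a null-terminated string; the helper name copy_checked is hypothetical. For interactive input, the standard function fgets(buffer, sizeof buffer, stdin) gives a similar guarantee, since it reads at most sizeof buffer - 1 characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the string src into dst only if it fits,
 * including the terminating '\0'. Returns 0 on success and -1 if src is
 * too long, in which case dst is left unchanged. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    size_t len = strlen(src);
    if (len >= dst_size)          /* len characters + '\0' need len + 1 bytes */
        return -1;

    memcpy(dst, src, len + 1);    /* copies the '\0' as well */
    return 0;
}
```

With char buffer[8], copy_checked(buffer, sizeof buffer, "1234567") succeeds, while an 8-character input is rejected instead of overflowing the array.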

History

In the past, buffer overflows were behind a number of well-known incidents. In the 1980s, Internet worms, malicious programs that replicate themselves and spread through network connections, became a prominent and sensitive topic. The best-known example is the Morris worm, named after its author Robert Tappan Morris. On November 2, 1988, thousands of computers were infected by this self-replicating program within a short time; one of the ways it spread was a buffer overflow in the fingerd service. The event changed the way people thought about networks and the Internet, and to this day it is perhaps one of the most significant events in the history of computing.

Buffer overflows have also been used to bypass various security measures, for example to remove software restrictions from firmware or to get around copy protection. This was the case with the Android and iOS operating systems, where it was possible to remove various locks and modify a smartphone according to one's own wishes: installing apps that were not available in the official store, changing the boot animation, accessing hidden system files, removing manufacturer-specific apps, removing network locks, and much more. The relevant keywords are "jailbreaking" for Apple devices and "rooting" for Android devices. On Nintendo's Game Boy, the game Pokémon Yellow could even be modified from within the game by manipulating its memory so that it executed injected shellcode.

Memory structure

In order to understand how a buffer-overflow-based attack works, it is important to look at the memory structure in detail. Executable binaries come in different file formats. One of them is ELF, which stands for Executable and Linking Format and is used by Linux and many other Unix-like systems. When such a binary is loaded by the loader and the program is executed, its program code is placed in main memory and executed by the CPU. Different parts of the program, such as the program code, constants and data, are stored in different memory segments. The memory structure is shown below.

[Figure: memory structure]
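To observe these segments on a real system, a program can print one address from each of them. This is a sketch under the assumption of a typical Unix-like system; the helper name print_segment_addresses is hypothetical, and the printed values (and even their relative order) depend on the OS, the compiler and address space layout randomization.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;   /* usually placed in the data segment */
int uninitialized_global;      /* usually placed in BSS, zeroed at start-up */

/* Hypothetical helper: prints one address from each segment and returns 0,
 * or -1 if the heap allocation fails. */
int print_segment_addresses(void)
{
    int local = 0;                            /* lives on the stack */
    int *dynamic = malloc(sizeof *dynamic);   /* lives on the heap */
    if (dynamic == NULL)
        return -1;

    assert(uninitialized_global == 0);        /* BSS starts out zeroed */

    /* Converting a function pointer to void * is not guaranteed by ISO C,
     * but it works on POSIX systems. */
    printf("code  %p\n", (void *)print_segment_addresses);
    printf("data  %p\n", (void *)&initialized_global);
    printf("bss   %p\n", (void *)&uninitialized_global);
    printf("heap  %p\n", (void *)dynamic);
    printf("stack %p\n", (void *)&local);

    free(dynamic);
    return 0;
}
```

Running the program several times on a system with ASLR enabled usually shows different heap and stack addresses on each run.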

Code

The code segment (also called the text segment) contains the machine instructions that the CPU executes. It is read-only, which also allows several processes running the same program to share it. Therefore, this segment is not the target of a buffer overflow: an attempt to write to it causes a memory access violation (a segmentation fault on Unix-like systems), and the program is terminated.

Data and BSS

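The data and BSS segments store global (and static) variables, which can be accessed by any function. The data segment holds variables with an explicit initial value, while the BSS segment holds variables without one, which are set to zero when the program starts. Since variables should not contain executable code, these segments are marked as non-executable where the hardware supports it: if the instruction pointer were made to point into them, the program would be terminated.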
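The following C sketch shows two functions sharing global variables; the names are hypothetical. Because the globals live in the data and BSS segments rather than in a stack frame, their values persist between calls.

```c
#include <assert.h>

int request_count;        /* no initializer: BSS, set to 0 at start-up */
int request_limit = 3;    /* explicit initializer: data segment */

/* Both functions read and update the same globals; the values survive
 * after each function returns. */
void handle_request(void)
{
    assert(request_count < request_limit);
    request_count++;
}

int limit_reached(void)
{
    return request_count >= request_limit;
}
```

After three calls to handle_request(), limit_reached() returns 1, even though neither function keeps any local state.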

Stack

The stack segment stores the local variables that are used in a function and accessed by its instructions. The stack works on the last-in-first-out principle: only the top element can be accessed directly, and to reach the element below it, the top element must first be removed. The stack is located at the end of the memory accessible to the program and grows downward, so the top element is actually the element with the lowest address, but it is still called the top element. The CPU has a special register, the stack pointer, which holds the address of the current top of the stack and therefore changes whenever the stack grows or shrinks.
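The last-in-first-out behaviour and the downward growth can be modelled with a small array-based stack. This C sketch is a toy model, not how the CPU implements its stack; the type and function names are hypothetical. The index sp plays the role of the stack pointer: it starts one past the highest slot and is decremented on every push, so the top element always has the lowest used index.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_SLOTS 4

/* Toy model: sp is the index of the current top element. */
typedef struct {
    int slots[STACK_SLOTS];
    size_t sp;
} toy_stack;

void stack_init(toy_stack *s)
{
    assert(s != NULL);
    s->sp = STACK_SLOTS;              /* empty: sp points past the end */
}

int stack_push(toy_stack *s, int value)
{
    if (s->sp == 0)
        return -1;                    /* no room left below the top */
    s->slots[--s->sp] = value;        /* grow downward, then store */
    return 0;
}

int stack_pop(toy_stack *s, int *out)
{
    if (s->sp == STACK_SLOTS)
        return -1;                    /* empty */
    *out = s->slots[s->sp++];         /* read the top, then shrink upward */
    return 0;
}
```

Pushing 1 and then 2 moves sp down by two slots; the first pop returns 2, the last value pushed.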

Heap

The heap segment is a memory area from which memory can be requested and released while the program is running. For example, if a program needs 1000 bytes of memory, it can request them with the function malloc() and hand them back with the function free() once they are no longer needed. This is useful when the amount of memory needed cannot be predicted in advance, for example because it depends on the program's input. In contrast to the stack, the heap starts at lower addresses and grows upward.
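A typical case where the required size is only known at run time is joining two strings. The following C sketch allocates exactly as many bytes as needed on the heap; the function name join_on_heap is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Joins two strings into a new heap buffer whose size is only known at
 * run time. Returns NULL if the allocation fails; otherwise the caller
 * must release the result with free(). */
char *join_on_heap(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    size_t la = strlen(a);
    size_t lb = strlen(b);
    char *out = malloc(la + lb + 1);  /* +1 for the terminating '\0' */
    if (out == NULL)
        return NULL;

    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1);      /* also copies b's '\0' */
    return out;
}
```

The memory stays allocated after join_on_heap returns, which is exactly what distinguishes it from a local array on the stack; forgetting the matching free() leaks it.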

Types

Three types of buffer overflow are distinguished: stack overflows, heap overflows and integer overflows.

Stack Overflow

A stack overflow, in this context a stack-based buffer overflow, is a program error in which more data is written into a buffer on the stack than the buffer can hold. The excess data does not stay in the buffer but overwrites the adjacent memory. Such an attack exploits the fact that when a function is called, the CPU stores the address of the next instruction (the return address) on the stack, and the compiler usually also stores the frame pointer there, which marks the beginning of the current stack area. Each called function gets its own block of memory in the stack area, its stack frame; since the stack grows downward, newer frames lie at lower addresses. Within a frame, however, a buffer is filled from lower to higher addresses, so writing past its end reaches the saved frame pointer and the return address. If a buffer is allocated on the stack and user input is copied into it without verification, the attacker can make the input long enough to overwrite this data, so that the function returns to an address of the attacker's choosing.
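The effect on the return address can be simulated without actually corrupting the real stack. In this C sketch, a stack frame is modelled as a byte array with a hypothetical layout: an 8-byte buffer directly followed by a saved 32-bit return address (real frames usually also contain a saved frame pointer and padding). The type and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy frame: 8-byte local buffer, then the saved return address. */
#define BUF_LEN 8

typedef struct {
    unsigned char bytes[BUF_LEN + sizeof(uint32_t)];
} toy_frame;

void set_return_address(toy_frame *f, uint32_t addr)
{
    memcpy(f->bytes + BUF_LEN, &addr, sizeof addr);
}

uint32_t get_return_address(const toy_frame *f)
{
    uint32_t addr;
    memcpy(&addr, f->bytes + BUF_LEN, sizeof addr);
    return addr;
}

/* Like strcpy into a local array, this trusts the input length and may
 * write past the 8-byte buffer. The assert only keeps the demo inside the
 * toy frame, so the simulated overflow remains well-defined C. */
void unchecked_copy(toy_frame *f, const unsigned char *input, size_t len)
{
    assert(len <= sizeof f->bytes);
    memcpy(f->bytes, input, len);
}
```

An 8-byte input leaves the saved address intact, while a 12-byte input whose last four bytes hold an attacker-chosen value replaces it completely.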

Heap Overflow

Heap overflows work on a similar principle to stack overflows, but differ among other things in that the affected memory persists between function calls. As a result, the overflow often goes unnoticed as long as the overwritten memory area is not used later. This is possible because memory allocated with malloc() remains allocated until free() is called.
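The delayed effect can be illustrated with a toy heap. This C sketch assumes a hypothetical layout in which two 16-byte chunks lie back to back inside one allocation, as an allocator may place two consecutive malloc() results; real allocators also keep metadata between chunks, which is omitted here. All names are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy heap: two adjacent 16-byte chunks in one allocation. */
#define CHUNK 16

typedef struct {
    char *arena;    /* backing memory for both chunks */
    char *name;     /* first chunk: filled with user input */
    char *role;     /* second chunk: read by the program later */
} toy_heap;

int toy_heap_init(toy_heap *h)
{
    h->arena = malloc(2 * CHUNK);
    if (h->arena == NULL)
        return -1;
    h->name = h->arena;
    h->role = h->arena + CHUNK;
    h->name[0] = '\0';
    strcpy(h->role, "user");
    return 0;
}

/* Unchecked copy into the first chunk: input longer than CHUNK - 1
 * characters silently spills into the second chunk. The assert only keeps
 * the demo inside the arena so that it remains well-defined C. */
void set_name_unchecked(toy_heap *h, const char *input)
{
    assert(strlen(input) < 2 * CHUNK);
    strcpy(h->name, input);
}
```

Nothing fails at the moment of the write; the overflow only becomes visible when the program later reads role and finds a value the attacker placed there.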

Integer Overflow

An integer overflow occurs when an arithmetic operation attempts to produce a value that lies outside the range that can be represented with the given number of bits. The most common result of an overflow is that only the least significant representable bits of the result are stored. An overflow can therefore lead to unintended behavior; in particular, if the programmer did not anticipate it, it can affect the reliability and security of a program, for example when a wrapped size calculation causes a buffer to be allocated too small. A code example is shown below.

int main(void) {
    unsigned char a = 255;
    unsigned char b = 2;
    unsigned char Result = a + b;  /* 257 does not fit into 8 bits: Result is 1 */
    return Result;
}

The data type unsigned char is used, which comprises 8 bits on common platforms, giving a value range of 0 to 255. The variable "a" is assigned the value 255 and the variable "b" the value 2. Adding them yields 257, a result that needs more bits than the type provides. The corresponding binary calculation is shown below.

  11111111 (a)
+ 00000010 (b)
----------
 100000001 (Result)

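The leading one, the ninth bit, does not fit into the 8 bits of an unsigned char. Only the lowest 8 bits are kept, so the result is 1 and not 257. Strictly speaking, C first computes a + b as an int with the value 257; the wraparound happens when this value is converted back to unsigned char, which reduces it modulo 256.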
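The wraparound can be reproduced, and avoided, with a few small functions. This C sketch is illustrative and its function names are hypothetical: the checked variants test whether the result would fit before computing it, and the last one applies the same test to a size calculation of the kind passed to malloc(), where a wrapped product would produce a buffer far smaller than the code expects.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* a + b is computed as int; converting back to unsigned char wraps it
 * modulo 256, so 255 + 2 yields 1. */
unsigned char add_u8_wrapping(unsigned char a, unsigned char b)
{
    return (unsigned char)(a + b);
}

/* Checks before adding: returns false if the true sum would not fit. */
bool add_u8_checked(unsigned char a, unsigned char b, unsigned char *out)
{
    assert(out != NULL);
    if (a > UCHAR_MAX - b)
        return false;
    *out = (unsigned char)(a + b);
    return true;
}

/* Same idea for count * size before passing it to malloc(). */
bool mul_size_checked(size_t count, size_t size, size_t *out)
{
    assert(out != NULL);
    if (size != 0 && count > SIZE_MAX / size)
        return false;
    *out = count * size;
    return true;
}
```

For example, add_u8_wrapping(255, 2) returns 1, matching the binary calculation above, while add_u8_checked(255, 2, &r) returns false and leaves r untouched.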

Summary

Since the rise of C in the early 1970s, buffer overflows have been a serious security vulnerability. Even though memory-safe high-level programming languages are typically not affected, the number of vulnerable systems is still rising.

At the same time, a wide array of countermeasures is increasingly adopted and applied. Features like executable space protection (e.g. Data Execution Prevention under Windows) have been deployed since the mid-2000s, and on the compiler side, technologies like StackGuard support several detection and prevention mechanisms (e.g. different types of canaries). Furthermore, almost every widely used operating system supports Address Space Layout Randomization (ASLR), which makes buffer overflow attacks considerably harder. For example, at the beginning of 2020, most major operating systems (Linux, Windows, macOS, iOS, Android, Solaris, OpenBSD, etc.) offered support for ASLR.

Requirements

  • Operating system: not limited
  • A vulnerable library (or function) within the attacked binary

Description

A buffer overflow occurs when more data is written to a memory region than it can hold. For example, a C program might write user input directly into a character array with a size of ten bytes. If the user then enters ten or more characters (the terminating null byte needs one more byte) and the program copies this data into the smaller array, an overflow occurs.

Basic Vulnerability

#include <string.h>

int main(int argc, char *argv[]) {
    char buffer[6];
    if (argc < 2)
        return 1;                 /* no argument given */
    strcpy(buffer, argv[1]);      /* vulnerable: no length check */
    return 0;
}

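In this example, an argument with 6 or more characters passed to the executable (i.e. the binary compiled from this source) overflows the buffer, since strcpy also copies the terminating null byte. However, the input length needed to actually affect the program flow is usually larger, because compilers may place padding and other saved values between the buffer and the return address. The exact number depends on the compiler, its flags and the architecture; on 32-bit binaries, stack slots are typically 4 bytes wide, so the distance is often a multiple of 4.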
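A bounded alternative to the strcpy call replaces it with snprintf, which never writes more than the given size (including the terminating null byte) and returns the length the complete string would have had, so truncation can be detected. A minimal sketch; the helper name copy_arg is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies arg into buffer without ever writing more
 * than size bytes. Returns 0 if arg fit completely and -1 if it was
 * truncated (or snprintf failed). */
int copy_arg(char *buffer, size_t size, const char *arg)
{
    assert(buffer != NULL && size > 0 && arg != NULL);

    int needed = snprintf(buffer, size, "%s", arg);
    if (needed < 0 || (size_t)needed >= size)
        return -1;   /* buffer now holds a truncated, still terminated copy */
    return 0;
}
```

In the example program, this would be called as copy_arg(buffer, sizeof buffer, argv[1]) after checking argc; an over-long argument is then truncated and reported instead of overwriting the stack.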

These types of vulnerabilities can be exploited in several different ways. Most prominently, ROP (return-oriented programming) attacks target the return addresses stored on the stack. By overwriting a return address, the attacker influences the control flow of the program. This remains viable even when security features such as executable-space protection are in place: instead of injecting executable instructions directly (which might be blocked by the OS), the attacker reuses gadgets, short instruction sequences ending in a return instruction that are already present in the binary, and chains them together by placing their addresses on the stack.

Countermeasures

Canaries

Canaries can be used to detect whether a buffer overflow has occurred. In the simplest case, a canary is a specific value placed in memory right after the buffer it protects, i.e. between the buffer and the saved return address on the stack. If this value has changed when it is checked, typically before the function returns, it is safe to assume that a buffer overflow occurred, and countermeasures (e.g. termination) can be taken. Typical types of canaries, which are supported by security hardening technologies like ProPolice or StackGuard (GCC), are terminator canaries, random canaries and random XOR canaries.
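The check can be modelled in a few lines. This C sketch uses a toy frame with a hypothetical layout (buffer, then canary, then return address); compilers such as GCC insert and check a real canary themselves, for example with -fstack-protector, so this only illustrates the idea. The caller passes in the canary value, which for a random canary would be chosen once at program start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy frame layout: buffer | canary | return address. */
#define BUF_LEN 8
#define CANARY_OFF BUF_LEN
#define RET_OFF (BUF_LEN + sizeof(uint32_t))

typedef struct {
    unsigned char bytes[BUF_LEN + 2 * sizeof(uint32_t)];
} guarded_frame;

static void store_u32(guarded_frame *f, size_t off, uint32_t v)
{
    memcpy(f->bytes + off, &v, sizeof v);
}

static uint32_t load_u32(const guarded_frame *f, size_t off)
{
    uint32_t v;
    memcpy(&v, f->bytes + off, sizeof v);
    return v;
}

/* "Prologue": place the canary between the buffer and the return address. */
void frame_enter(guarded_frame *f, uint32_t canary, uint32_t ret)
{
    store_u32(f, CANARY_OFF, canary);
    store_u32(f, RET_OFF, ret);
}

/* "Epilogue": returns 0 if the canary is intact and -1 if it was
 * overwritten; real code would terminate the program at this point
 * instead of returning to a possibly corrupted address. */
int frame_leave(const guarded_frame *f, uint32_t canary)
{
    assert(f != NULL);
    return load_u32(f, CANARY_OFF) == canary ? 0 : -1;
}
```

Because the canary sits directly after the buffer, any linear overflow that reaches the return address must overwrite the canary first, which the epilogue check then detects.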

Data Execution Prevention (DEP)

Data Execution Prevention (DEP) is a security feature originally developed by Microsoft for Windows XP SP2. There are two basic variants, hardware-based DEP and software-based DEP. If it is supported by the CPU and enabled for the process, hardware-based DEP is used; otherwise, DEP has to be carried out in software, which is part of the Windows operating system. The basic function of DEP is to prevent applications from executing code in memory areas marked as non-executable.

Address Space Layout Randomization (ASLR)

Since code reuse attacks (e.g. ROP attacks) require the attacker to know the memory addresses of gadgets, techniques that randomize these addresses have become increasingly popular. ASLR, which randomizes the location of data and code regions, offers a plausible defensive strategy: code region layout randomization hinders code reuse in exploits, and data randomization impedes the redirection of control flow by making it difficult to guess the location of injected code (partly paraphrased from the 2013 paper by Snow et al.).

Courses

References