Buffer Overflows
Definition
A buffer overflow occurs when a program writes more data into a buffer than the buffer can hold, so that the excess data overwrites adjacent memory. It is a security vulnerability that should not be underestimated: for decades, buffer overflows have been among the most common types of vulnerabilities. Because they are widespread and often easy to exploit, they play an important role in many attacks. In cyber attacks in particular, attackers exploit them to gain access to vulnerable servers and take control of them.
Functionality
Whether a buffer overflow occurs depends on how a program handles input that does not meet its expectations. A typical case: a program reads data from the user into a buffer of fixed (static) size without checking the length of the input. If the input is longer than the buffer, the excess bytes are written into the memory behind it, which results in a buffer overflow. An example is shown below.
A buffer stores data temporarily while it is transferred from one location to another. Suppose 8 bytes are allocated for a buffer and the user enters 10 characters: the input exceeds the buffer by 2 bytes (3 bytes, counting the terminating null character appended to the string), which results in a buffer overflow. The corresponding source code example is shown below.
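#include <iostream>
using namespace std;

int main() {
    char buffer[8];            // room for 7 characters plus the terminating '\0'
    cout << "Input data: ";
    cin >> buffer;             // before C++20, the input length is not checked:
                               // more than 7 characters overflow buffer
                               // (C++20 limits the extraction to the array size)
    return 0;
}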
An array of type char named buffer is declared, which can store 8 bytes of data, i.e., at most 7 characters plus the terminating null character.
Practical Guide
The following example shows a piece of code in which a buffer overflow manipulates the output of the program. Make sure the GCC compiler is installed on your machine (on Windows, for example, via MinGW).
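#include <stdio.h>
#include <string.h>

int main(int argc, char **argv) {
    char jayce[4] = "Oum";       /* 3 characters + '\0' */
    char herc[8] = "Gillian";    /* 7 characters + '\0' */
    /* "BrookFlora" needs 11 bytes (10 characters + '\0'), but herc holds
       only 8: the last 3 bytes ('r', 'a', '\0') are written past its end.
       This is undefined behavior; whether they land in jayce depends on
       how the compiler arranges the arrays on the stack. */
    strcpy(herc, "BrookFlora");
    printf("%s\n", jayce);       /* prints "ra" if jayce directly follows herc */
    return 0;
}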
To run the program, open the command prompt in the directory of the source file. Compile the file with the following command:
$ gcc -o example.out example.c
Subsequently, execute the compiled file to display its output:
$ ./example.out
ra
$
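Without the overflow, the program would print "Oum", the contents of jayce. As the following illustration shows, the two buffers are stored back to back on the stack. Since "BrookFlora" requires eleven bytes (ten characters plus the terminating null character) but herc is only eight bytes long, the copy overflows into the adjacent buffer jayce, which afterwards contains the string "ra". Note that the stack layout depends on the compiler and its options (e.g., stack protection); with a different layout, the output may differ.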
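Initial stack organization:
|\0| m| u| O|   jayce
|\0| n| a| i|   herc (bytes 4-7)
| l| l| i| G|   herc (bytes 0-3)

Overflowed memory layout:
|\0|\0| a| r|   jayce
| o| l| F| k|   herc (bytes 4-7)
| o| o| r| B|   herc (bytes 0-3)

The first diagram shows the initial stack organization, the second the overflowed memory layout. In both, addresses increase from right to left within a row and from bottom to top.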
History
Buffer overflows have been behind a number of notable incidents. One of the best known is the Morris worm. A worm is malicious software that reproduces itself and spreads through network connections. The Morris worm was named after its creator, Robert T. Morris, at the time a computer science student at Cornell University, and its release on November 2, 1988 changed the way people thought about networks and the Internet: thousands of computers were quickly infected with the self-replicating program. Among other techniques, the worm exploited a buffer overflow in the fingerd service, which read network input with the unsafe function gets(), a function that was very commonly used at the time. The worm spread at an alarming rate and severely disrupted the Internet of that time. Morris became the first person convicted under the Computer Fraud and Abuse Act, which further demonstrates how dangerous buffer overflows can be. To this day, the attack is considered one of the most significant events in the history of computing.

Buffer overflows have also been used to bypass various security measures, for example to remove software restrictions from firmware or to circumvent copy protection. This was the case with the Android and iOS operating systems, where it was possible to remove various locks and modify the smartphone according to one's own wishes: for example, to install apps that were not available in the official store, change the boot animation, access hidden system files, remove manufacturer-specific apps, remove network locks, and much more. The corresponding terms are "jailbreaking" for Apple devices and "rooting" for Android devices. Even on Nintendo's Game Boy, the game Pokémon Yellow could be modified from within by injecting and executing custom code (shellcode).
While many of these events lie in the past, buffer overflows do not. They recur so frequently that in 2023 they still secured a spot on the CWE Top 25 Most Dangerous Software Weaknesses list, with out-of-bounds write (CWE-787) ranked first. Whether through adaptations of the exploit's mechanism or simply through a focus on new targets, buffer overflows remain relevant. Anyone dealing with cybersecurity of any sort cannot ignore them.
Affected Programming Languages
Buffer overflows appear mostly, if not exclusively, in the programming languages C and C++. The reason is that these are low-level, closely hardware-related languages that offer very few safety mechanisms out of the box: for example, neither checks array bounds at run time. By the same reasoning, programs written in assembly language, which is as hardware-related as programming languages get, are even more prone to buffer overflows. More abstract, "heavyweight" programming languages such as C# and Java, on the other hand, already come with many safeguards.
Memory structure
To understand how a buffer overflow attack works, it is important to look at the memory structure of a running program in detail. Executable binaries come in different file formats. One of them is ELF, the Executable and Linkable Format, which is used by Linux and many other Unix-like systems. When such a binary is loaded by the operating system's loader and executed, the program code is loaded into main memory and executed by the CPU. Different parts of the program, such as the program code, constants, and data, are stored in different memory segments, which are described below.
Code
The code segment (also called the text segment) contains the machine instructions that the CPU reads and executes. This memory segment is read-only and executable; because it cannot be modified, it can be shared by several running instances of the program at the same time. This segment is therefore not the target of a buffer overflow: an attempt to write to it results in a memory access violation (e.g., a segmentation fault on Linux), and the program is terminated.
Data and BSS
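The Data and BSS segments store global (and static) variables, which can be accessed by any function. Initialized variables are stored in the Data segment, uninitialized ones in the BSS segment, which is filled with zeros when the program starts. Since variables should not contain executable code, these memory segments are not executable: on systems that enforce non-executable memory, pointing the instruction pointer into them causes the program to be terminated.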
Stack
The stack segment stores the local variables of a function, which can be accessed by that function's instructions. The stack follows the last-in-first-out (LIFO) principle: elements can only be added (pushed) to or removed (popped) from the top. Each function call creates a new stack frame for its data, which is cleaned up automatically when the function returns. The stack is located at the high end of the memory accessible to the program and grows downward, so the top element is actually the one with the lowest address, although it is still called the top element. The CPU has a special register, the stack pointer (SP), which keeps track of the current top of the stack as the stack grows and shrinks.
Heap
The heap segment is a memory area from which memory can be requested and released while a program is running. If a programmer needs 1000 bytes of memory, they can request them at run time with the function malloc() and return them to the system with the function free(). This is useful in situations where the programmer cannot predict how much memory the program will need, for example because it depends on the user's input. However, memory must be allocated and released manually by the programmer. In contrast to the stack, the heap typically starts at a lower address and grows upward.
Buffer overflow variations
Three types of buffer overflow-related vulnerabilities are distinguished: stack overflow, heap overflow, and integer overflow. Stack overflow and heap overflow, also referred to as stack-based and heap-based buffer overflow, differ in where the overflow takes place, namely on the stack or on the heap. An integer overflow is not a buffer overflow in itself, but it often leads to one, for example when the overflowed value is used to calculate the size of a buffer.
Stack Overflow
A stack-based buffer overflow occurs when a program writes more data into a buffer on the stack than the buffer can hold. Instead of staying inside the buffer, the excess data overwrites the adjacent memory locations. Such an attack exploits the fact that when a function is called, the CPU stores the return address, i.e., the address of the next instruction in the caller, on the stack. In addition, the compiler typically saves the caller's frame pointer on the stack; the frame pointer marks the beginning of the current stack frame. Each called function then creates its own stack frame, and because the stack grows downward, each new frame lies below that of its caller. Buffers, however, are filled from lower to higher addresses, so data written past the end of a local buffer runs into the saved frame pointer and the return address. If a buffer is allocated on the stack and user input is copied into it without a length check, the attacker can make the input long enough to overwrite this data. When the function returns, execution then continues at an address chosen by the attacker, more often than not pointing into the buffer itself, which contains the attacker's malicious code.
Heap Overflow
Heap overflows work on a similar principle as stack overflows but differ from them in several ways; among other things, heap memory persists between function calls. As a result, an overflow may go unnoticed until the overwritten memory area is used later. This is possible because memory allocated with malloc() remains allocated until free() is called.
Integer Overflow
An integer overflow occurs when an arithmetic operation attempts to produce a value that lies outside the range that can be represented with a given number of bits. The most common result is that only the least significant bits of the result that fit into the type are stored; in C, this wrap-around is guaranteed for unsigned types, whereas signed overflow is undefined behavior. If such an overflow is not anticipated, it leads to unintended behavior and can affect the reliability and security of a program. A code example is shown below.
unsigned char a = 255;
unsigned char b = 2;
unsigned char Result = a + b;   /* 257 does not fit into 8 bits: Result is 1 */
The data type unsigned char comprises 8 bits, so its value range is 0 to 255. The variable a is assigned the value 255 and the variable b the value 2. Their sum, 257, requires more bits than the data type provides. The corresponding binary calculation is shown below.
  11111111   (a = 255)
+ 00000010   (b = 2)
----------
 100000001   (a + b = 257)
The leading 1, the ninth bit, no longer fits into the 8 bits of the data type unsigned char. Only the last 8 bits are stored, so Result is 1 and not 257.
32-bit vs. 64-bit systems
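Buffer overflows also vary depending on the system used. They occur on both 32-bit and 64-bit systems, but they are usually easier to exploit on 32-bit systems. One reason is that 64-bit systems have a much larger address space, which gives Address Space Layout Randomization far more room to place memory regions randomly, so guessing addresses by brute force becomes impractical. In addition, user-space addresses on 64-bit systems typically contain null bytes, at which string functions such as strcpy stop copying, which makes it harder to inject complete addresses through such functions.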
Use After Free Bug
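A use-after-free bug occurs when a program keeps using memory after it has been released. If a program frees memory but the pointer to that memory is not cleared (a so-called dangling pointer), the memory may be reallocated for other data. An attacker who manages to control the new contents can then use this bug to influence what the program reads or executes through the old pointer and thereby gain access and control.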
Countermeasures
Canaries
Canaries can be used to detect whether a buffer overflow has occurred. In the simplest case, a canary is a specific value in memory placed directly after the buffer it protects, e.g., between a local buffer and the saved return address. If this value has changed when it is checked, typically right before the function returns, it is safe to assume that a buffer overflow occurred, and countermeasures (e.g., terminating the program) can be taken. Typical types of canaries, which are supported by security hardening technologies such as ProPolice and StackGuard (both for GCC), are terminator canaries, random canaries, and random XOR canaries.
Data Execution Prevention (DEP)
Data Execution Prevention (DEP) is a security feature introduced by Microsoft in Windows XP SP2. There are two variants: hardware-based DEP and software-based DEP. If both the CPU and the process support it, hardware-based DEP is used, which relies on the processor's ability to mark memory pages as non-executable (the NX/XD bit). Otherwise, DEP is carried out in software, which is part of the Windows operating system. The basic purpose of DEP is to prevent applications from executing code in non-executable areas of memory, such as shellcode injected into a buffer on the stack or heap.
Address Space Layout Randomization (ASLR)
Code reuse attacks, such as return-oriented programming (ROP), chain together short existing instruction sequences ("gadgets") and therefore require the attacker to know their memory addresses. Techniques that randomize these addresses have thus become increasingly popular. ASLR randomizes the locations of memory regions such as the stack, the heap, shared libraries, and, for position-independent executables, the program code itself. This offers a plausible defensive strategy: randomizing the code layout hinders code reuse in exploits, and randomizing the data layout impedes the redirection of control flow by making it difficult to guess the location of injected code (partly paraphrased from the 2013 paper by Snow et al.).
Standard Template Library (STL)
Using the C++ Standard Template Library (STL) can significantly lower the risk, as it provides safer alternatives to raw character arrays and C arrays, such as std::string and std::vector, which manage their own memory.
C++ Compiler
Compiling C code with a C++ compiler can improve safety, because C++ compilers perform stricter type checking (for example, an implicit conversion from void * to another pointer type is an error in C++).
High-level Programming Languages
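Higher-level languages such as Java and C# can be used instead. They come with built-in features such as bounds-checked arrays and native string types, which offer more security: an out-of-bounds array access raises an exception (e.g., ArrayIndexOutOfBoundsException in Java) instead of silently overwriting memory.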
Fuzz Testing
Fuzz testing involves testing applications with random and unexpected inputs to find potential vulnerabilities, including buffer overflows.
Unsafe Functions
Replacing unsafe functions that do not check the size of the destination buffer, such as the commonly used strcpy, strcat, and sprintf, with bounded counterparts (e.g., snprintf) can drastically decrease the likelihood of exploits.
Summary
Since the rise of C in the early 1970s, buffer overflows have been a serious security vulnerability. Even though high-level programming languages are typically not affected, the number of vulnerable systems is actually rising. At the same time, a wide array of countermeasures is increasingly adopted. Features such as executable space protection (e.g., Data Execution Prevention under Windows) have been deployed since the mid-2000s, and on the compiler side, technologies such as StackGuard support several detection and prevention mechanisms (e.g., different types of canaries). Furthermore, almost every widely used operating system supports Address Space Layout Randomization to reduce the attack surface for buffer overflow attacks: as of 2020, most major operating systems (Linux, Windows, macOS, iOS, Android, Solaris, OpenBSD, etc.) support ASLR.
Another key point is the expansion of the Internet of Things (IoT). These widely distributed networks of hardware endpoints have proven to be an ideal target for buffer overflow attacks, because IoT applications are mostly written in low-level, closely hardware-related languages such as C and C++, the languages almost exclusively affected by buffer overflows.
Courses
- Ausgewählte Kapitel der IT-Security (2021, 2022)
- Ausgewählte Kapitel der IT-Security (2023, 2024)
References
- Smashing the stack for fun and profit (1996, Aleph One)
- Basic Integer Overflows (2002, Blexim)
- Just-In-Time Code Reuse: On the Effectiveness of Fine-Grained Address Space Layout Randomization (2013, Snow et al.)
- Buffer Overflow for Dummies (2002, Josef Nelißen)
- A Beginner’s Guide to Buffer Overflow (2021, Raj Chandel)
- 24 Deadly Sins of Software Security (2009, Michael Howard, David LeBlanc, and John Viega)
- United States v. Morris (1991, U.S. Dept. of Justice)
- 2023 CWE Top 25 Most Dangerous Software Weaknesses (2023, CWE)
- Stack memory (2023, Bill MacKenty)
- Heap memory (2023, Bill MacKenty)