Buffer Overflows

From Embedded Lab Vienna for IoT & Security
Buffer Overflow example [1]

Introduction

Buffer overflows are a class of security vulnerability that should not be underestimated: they have been among the most frequently reported types of vulnerability for decades. They play a role in many attacks because they are widespread and, without mitigations, comparatively easy to exploit. Attackers use them in particular to gain access to vulnerable servers and take control of them.

Definitions

Buffer

A buffer is defined as a limited, contiguously allocated set of memory. The most common buffer in C is an array. [2]

Buffer Overflow

Buffer Overflows are possible because in the C and C++ languages, there exists no inherent bounds-checking to ensure that data being copied into a buffer will not be larger than what the buffer was initialized to hold. Consequently, if the person writing the program has not explicitly coded the program to check for oversize input, it is possible for data to fill a buffer, and if that data is large enough, to continue to write past the end of the buffer. [2]
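The explicit check described above can be sketched as follows. This is a minimal illustration, not code from the cited source; the function name copy_if_fits and its return convention are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Copies src into dst only if it fits, including the terminating null byte.
 * Returns 0 on success and -1 if the input is too large; in that case
 * dst is left untouched. (Hypothetical helper for illustration.) */
int copy_if_fits(char *dst, size_t dst_size, const char *src)
{
    size_t needed = strlen(src) + 1;   /* +1 for the terminating '\0' */

    if (needed > dst_size) {
        return -1;                     /* reject oversize input */
    }
    memcpy(dst, src, needed);          /* safe: needed <= dst_size */
    return 0;
}
```

A caller would pass sizeof on the destination array as dst_size and treat a return value of -1 as an input error instead of continuing with a partially written buffer.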

Common Weakness Enumeration (CWE)

The importance of addressing buffer overflow vulnerabilities can be seen by examining MITRE’s respective parent category: [3]

  • CWE-119: Improper Restriction of Operations within the Bounds of a Memory Buffer

There is a large variety of direct and indirect child categories that further help create a taxonomy of the issues at hand. Some of those are listed here: [4] [5]

  • CWE-788: Access of Memory Location After End of Buffer
  • CWE-787: Out-of-bounds Write
  • CWE-786: Access of Memory Location Before Start of Buffer
  • CWE-125: Out-of-bounds Read
  • CWE-120: Buffer Copy Without Checking Size of Input (’Classic Buffer Overflow’)
  • CWE-121: Stack-based Buffer Overflow
  • CWE-122: Heap-based Buffer Overflow
  • CWE-126: Buffer Over-read

History

Buffer overflows have been at the center of several well-known security incidents. The best-known early example is the Morris worm. A worm is malicious software that reproduces itself and spreads through network connections. On November 2, 1988, thousands of computers became infected with such a self-replicating program within a short time, an event that changed the way people thought about networks and about the Internet. Computer science student Robert T. Morris had written the worm, which exploited an unsafe function that was very widely used at the time. Through the practical application of a buffer overflow, the worm spread at an alarming rate and brought large parts of the early Internet to a standstill. Morris became the first person convicted under the Computer Fraud and Abuse Act, which further demonstrates how dangerous buffer overflows can be. To this day, this attack is regarded as one of the most significant events in the history of computing.

Buffer overflows have also been used to bypass security measures, for example to remove software restrictions from firmware or to circumvent copy protection. On the Android and iOS operating systems, such exploits made it possible to remove various locks and modify the smartphone according to one's own wishes: installing apps that are not available in the official store, changing the boot animation, accessing hidden system files, removing manufacturer-specific apps, removing network locks, and much more. This is known as "jailbreaking" on Apple devices and "rooting" on Android devices. On Nintendo's Game Boy, the game Pokémon Yellow could be manipulated from the inside by injecting shellcode.

While many of these events lie in the past, buffer overflows do not. They recur so frequently that in 2023 buffer-overflow weaknesses still held places on the CWE Top 25 Most Dangerous Software Weaknesses list. Whether through the adaptation of the exploit's mechanism or simply by focusing on a new set of targets, buffer overflows stay relevant. That is why there is no way around them when dealing with cybersecurity of any sort.

Technical Background and Context

Memory Layout of a Process

Informally, a process is a program in execution. The status of the current activity of a process is represented by the value of the program counter/instruction pointer and the content of the processor’s registers. Different parts of the program are stored in different memory segments, as shown in the figure below.

Figure: Memory structure of a process (Memory structure.png) [1]

  • Stack section: temporary data storage when invoking functions (such as function parameters, return addresses, and local variables)
  • Heap section: memory that is dynamically allocated during program run time
  • Data section: global variables (initialized and uninitialized)
  • Code/Text section: the executable machine code of the program

Readable, Writeable, Executable Memory

Memory regions have different rights with respect to the process they belong to. For instance, the text section will usually be marked read-only, and any attempt to write to it will result in a Segmentation Fault. [6] All other sections have to be writable in order for the process to work properly. [2] Whether a section is executable will largely depend on the applied settings during the compilation of the program and the platform on which it is executed.

The Stack and Important Control Structures

The stack’s primary purpose is to implement and help with the use of functions. A function call alters the flow of execution through a program. However, when its task is completed, a function returns control to the statement or instruction following the function call. Furthermore, the stack is used to allocate memory for the local variables used in the functions and to return values from the function. [6]

During a function call on 32-bit x86 (using the common cdecl calling convention), arguments are pushed onto the stack in reverse order, and the return address (the value the instruction pointer EIP should continue with) is saved on the stack to enable returning to the caller after the function has executed. Key stack-related registers are:

  • EIP (Instruction Pointer): Stores the next instruction's address.
  • ESP (Stack Pointer): Points to the top of the stack.
  • EBP (Base Pointer): Marks the base of the stack frame for the current function.

When a function is called, the stack is manipulated using PUSH and POP instructions. A return address is pushed onto the stack, allowing the program to return to the calling function. Buffer overflows can occur if a function does not properly limit the size of data written to buffers, potentially overwriting the return address. An attacker may overwrite the return pointer to redirect the flow of execution.

Causes

C/C++: Vulnerable Functions

C is the most affected programming language when it comes to creating buffer overflows, closely followed by C++. [7] When writing C code, the programmer is responsible for data integrity. If this responsibility were shifted over to the compiler, the resulting binaries would be significantly slower and less efficient. Furthermore, C’s simplicity increases the programmer’s control. However, this can result in programs that are vulnerable to buffer overflows and memory leaks if the programmer isn’t careful. [8]

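Some vulnerable functions from the C/C++ standard library are listed here, each with its respective safer counterpart:

Vulnerable    Safer
strcpy()      strncpy()
gets()        fgets()
strcat()      strncat()
memcpy()      memmove()
scanf()       sscanf()
memset()      -

These counterparts are only safer when the size argument passed to them matches the destination buffer: strncpy() does not null-terminate the destination if the source is too long, and memmove() and sscanf() still require the caller to guarantee that the destination is large enough.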
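As a hedged illustration of using a size-aware function from the standard library, the following sketch copies a string with snprintf(), which always null-terminates the destination (for a non-zero size) and reports whether the input had to be truncated. The helper name copy_truncating and its return values are assumptions made for this example.

```c
#include <stddef.h>
#include <stdio.h>

/* Copies src into dst, truncating if necessary. For dst_size > 0 the
 * destination is always null-terminated.
 * Returns 0 if the whole string fit, 1 if it was truncated, -1 on error.
 * (Hypothetical helper for illustration.) */
int copy_truncating(char *dst, size_t dst_size, const char *src)
{
    /* snprintf returns the length the full output would have had */
    int written = snprintf(dst, dst_size, "%s", src);

    if (written < 0) {
        return -1;
    }
    return (size_t)written >= dst_size ? 1 : 0;
}
```

With an 8-byte destination, copying "excessive" this way would store "excessi" plus the null byte and signal truncation, rather than writing past the end of the buffer.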

IoT/Embedded Devices

"The constrained resources of embedded systems leads to the predominant use of the C language. This, along with the tight processing requirements, leave associated devices open to BOF based attacks." [9]

Dangers

Arguably, the most significant consequence of a buffer overflow is that the attacker is able to execute their own malicious shellcode, which is also referred to as arbitrary code execution. The injected code runs with the privileges of the vulnerable program, so it is entirely up to the attacker what to do with them: open a reverse shell to execute commands remotely, steal data, or manipulate system settings. If the program runs with elevated privileges, or the attacker manages to escalate them, code can also be run as administrator or root. Another potential consequence is a Denial of Service (DoS) attack. Instead of executing code, the goal is to cause unpredictable behavior or crash the system by overwriting as much data as possible until the program is unable to handle it. The downtime of a program can be costly for large organizations. Data corruption, system instability, and information disclosure are further potential consequences of a buffer overflow. Ultimately, once vulnerable code has been identified, the outcome is limited mainly by the creativity of the attacker.

Summed up, these four are the main dangers:

  • Data Corruption
  • Program Crashes
  • Exploitation: Control Flow Alteration
  • Exploitation (via Shellcode Injection): Arbitrary Code Execution

Types of Buffer Overflow Vulnerabilities

The following section will introduce descriptions and simple examples related to the most relevant categories of buffer overflow vulnerabilities. The distinction between Buffer Over-Read and Buffer Over-Write provides a basic categorization of the issues at hand. The respective sections will include simple, illustrative examples. The sections on Stack Buffer Overflow and Heap Buffer Overflow will present more complex concepts, introducing the mechanics and control structures of each memory region and how they could potentially be exploited. Note that Over-Read and Over-Write vulnerabilities can occur in both sections of process memory.

Buffer Over-Read

The following is a simple example of an out-of-bounds read operation that can be performed by a C program:


#include <stdio.h>

int main(void) {
    int array[5] = {1, 2, 3, 4, 5};  // valid indices are 0 to 4

    // Out-of-bounds read: index 5 is one past the last element
    printf("%d\n", array[5]);

    return 0;
}

This program illustrates a common error that new programmers might run into when learning about arrays: array indices start at 0, so array[5] refers to a sixth element, although the array was only declared to hold five. When this program is run, it reads beyond the bounds of the array. In C this is undefined behavior, and the program typically prints whatever value happens to be stored in the adjacent memory.
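One way to avoid such an over-read is to check the index against the array length before accessing the element. The following is a minimal sketch; the function name read_checked and its signature are assumptions made for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Reads array[index] into *out only if index is within bounds.
 * Returns true on success, false if the index is out of range;
 * *out is not modified in that case. (Hypothetical helper.) */
bool read_checked(const int *array, size_t length, size_t index, int *out)
{
    if (index >= length) {
        return false;          /* would read past the end of the array */
    }
    *out = array[index];
    return true;
}
```

For the five-element array from the listing, a call with index 4 succeeds, while a call with index 5 is rejected instead of reading adjacent memory.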

Buffer Over-Write

In this section, a simple out-of-bounds write vulnerability will be examined. Although this particular example showcases a stack-based buffer overflow, it is used here to introduce the general concept of an over-write vulnerability. How the stack's mechanics and control structures can be exploited specifically is covered in the later section "Stack-based Buffer Overflow".

The following program deliberately uses the insecure strcpy() function to copy a large string into a buffer, which is actually too small for the string. The following listing displays the complete source code.

#include <stdio.h> 
#include <string.h>

int main() {

    char B[3] = "03";      // in this run, placed directly after A in memory
    char A[8] = "0000000"; // 7 characters plus the terminating null byte

    printf("Before overflow: A = %s, B = %s\n", A, B);

    strcpy(A, "excessive"); // dangerous: copies 10 bytes into an 8-byte buffer
    printf("After overflow: A = %s, B = %s\n", A, B);

    return 0;
}

Console output running the program (bash).

Before overflow: A = 0000000, B = 03
After overflow: A = excessive, B = e

Inspection of memory before the overflow (gdb). The memory addresses on the left are the locations of the two buffers.

0xffffcc85:  0x30  0x30  0x30  0x30  0x30  0x30  0x30  0x00
0xffffcc8d:  0x30  0x33  0x00

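Inspection of memory after the overflow (gdb). The string "excessive" needs 10 bytes including its terminating null byte, so its last two bytes (0x65 and 0x00) spill over into B, which now reads "e".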

0xffffcc85:  0x65  0x78  0x63  0x65  0x73  0x73  0x69  0x76
0xffffcc8d:  0x65  0x00  0x00

Integer Overflow

Integer overflows occur when an arithmetic operation attempts to generate a value that lies outside the range that can be represented with the given number of bits. For unsigned types in C, the result wraps around, so only the least significant representable bits are stored; for signed types, the behavior is undefined. If this possibility is not anticipated, an overflow can lead to unintended behavior and affect the reliability and safety of a program. Integer overflows are relevant here because a wrapped-around length or size calculation can in turn lead to a buffer overflow. A code example is shown below.

unsigned char a = 255;
unsigned char b = 2;
unsigned char Result = a + b;

The data type unsigned char is used, which typically comprises 8 bits and therefore covers the value range from 0 to 255. The variable "a" is assigned the value 255 and the variable "b" the value 2. Adding them yields a result that requires more bits than the data type provides. The corresponding binary calculation is shown below.

  11111111 (a)
+ 00000010 (b)
----------
 100000001 (Result)

The leading one, the ninth bit, no longer fits into the 8 bits of the data type unsigned char. Since only the lowest 8 bits are stored, Result holds 1 and not 257.
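The wrap-around and a possible guard against it can be sketched as follows, assuming an 8-bit unsigned char. The function names add_wrapping and add_checked are assumptions made for this example.

```c
#include <limits.h>
#include <stdbool.h>

/* Adds two unsigned char values the way the example above does:
 * a + b is computed as int, and the conversion back to unsigned char
 * keeps only the lowest bits (modulo UCHAR_MAX + 1). */
unsigned char add_wrapping(unsigned char a, unsigned char b)
{
    return (unsigned char)(a + b);
}

/* Adds two unsigned char values only if the sum is representable.
 * Returns false (and leaves *result untouched) if it would overflow. */
bool add_checked(unsigned char a, unsigned char b, unsigned char *result)
{
    if (a > UCHAR_MAX - b) {
        return false;          /* a + b would exceed UCHAR_MAX */
    }
    *result = (unsigned char)(a + b);
    return true;
}
```

With a = 255 and b = 2, add_wrapping() yields 1, matching the binary calculation, while add_checked() reports the overflow to the caller.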

The Ariane 5 disaster of 1996 illustrates what can happen in the real world when integer overflows are not handled properly. About 37 seconds into the flight sequence, a severe software error occurred when a 64-bit floating-point value was converted to a 16-bit signed integer that could not hold it, and the rocket was destroyed shortly afterwards. This shows how unhandled integer overflows can have disastrous consequences, particularly in safety-critical systems like aerospace control software. The incident could have been prevented by more robust input validation and error handling, particularly during the conversion from floating-point to integer data types. [10]
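A range check before such a conversion can be sketched in C as follows. This is only an illustration of the idea and does not reflect the actual flight software; the function name double_to_int16_checked is an assumption made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Converts a double to a 16-bit signed integer only if the value is
 * within the representable range. Returns false for out-of-range
 * values and for NaN; *out is not modified in that case.
 * (Hypothetical helper for illustration.) */
bool double_to_int16_checked(double value, int16_t *out)
{
    /* NaN fails both comparisons and is therefore rejected as well */
    if (!(value >= INT16_MIN && value <= INT16_MAX)) {
        return false;
    }
    *out = (int16_t)value;     /* in range: fraction is truncated toward zero */
    return true;
}
```

The caller is forced to decide what happens when the value does not fit, instead of silently continuing with a meaningless result.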

Stack-based Buffer Overflow

If more data is written to a buffer than the size assigned to it, the excess overwrites the adjacent memory. When the buffer resides on the stack, this is called a stack-based buffer overflow. Since the overwritten memory contains valuable information such as the return address, it is a prime target for exploitation. As an example, consider an input string that is copied to the stack by a vulnerable function that does not validate its size. Whatever is written to this string beyond the size of the buffer will overwrite the memory next to it. If an attacker crafts a string that overwrites the return address with an address of their choosing, the program will jump to exactly that location when the function returns. This type of attack is called "stack smashing". It is frequently used and well known in the context of buffer overflows.

A common problem in stack smashing is finding the starting address of the injected shellcode. Guessing or brute-forcing the exact address would take the attacker a long time and an enormous number of attempts. The effort can be reduced by placing a series of NOP instructions (a "NOP sled") in front of the shellcode. A NOP instruction does nothing except pass execution on to the next instruction, so execution slides through the sled until it reaches the malicious code. The longer the sled, the greater the chance of reaching the desired code: as long as the return address points to any NOP instruction, the shellcode will be executed, and the attacker does not have to hit its exact starting address.

Stack-based overflows that alter return addresses are one of the most dangerous methods of gaining unauthorized access to a system. They allow an attacker to circumvent standard program execution and potentially gain complete control. [6]
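The effect of the NOP instructions on the attacker's guessing problem can be expressed as a simple range check. The sketch below assumes single-byte NOP instructions placed directly in front of the code; the function name reaches_code and all addresses used with it are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if execution starting at 'guess' would reach the code at
 * 'code_start', given 'sled_len' single-byte NOPs directly in front of it.
 * Without a sled (sled_len == 0) only the exact address works.
 * (Hypothetical model for illustration, no real addresses involved.) */
bool reaches_code(uintptr_t guess, uintptr_t code_start, size_t sled_len)
{
    return guess <= code_start && code_start - guess <= sled_len;
}
```

In this model, a sled of 256 bytes turns a single valid target address into 257 valid ones, which is why longer sleds reduce the number of attempts needed.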

Heap-based Buffer Overflow

A program may not know at compile time how much memory it will need. Such segments can be allocated dynamically, during runtime, and will be placed on the heap. Special system calls brk() and mmap() can be used by Linux programs to achieve that. The functions malloc(), calloc(), realloc() and free() provide convenient wrapper functions around these system calls and help manage those memory segments. [2]

To be efficient, any malloc() implementation stores a lot of meta-data about the location of the chunks, the size of the chunks, and perhaps some special areas for small chunks. It also organizes this information. In dlmalloc, it is organized into buckets, and in many other malloc implementations it is organized into a balanced tree structure. This information is stored in two places: in global variables used by the malloc() implementation itself, and in the memory block before and/or after the allocated user space. Thus, the heap contains important information about the state of memory stored directly after any user-allocated buffer. [2] Writing past the end of a heap buffer can therefore corrupt this meta-data or the contents of neighboring chunks, which is what a heap-based buffer overflow exploits.
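A frequent source of heap-based overflows is an allocation that is too small for the data copied into it, for example by forgetting the terminating null byte. The following minimal sketch sizes the allocation from the input; the function name duplicate_string is an assumption made for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of src, or NULL if allocation fails.
 * The allocation size includes the terminating null byte, so the copy
 * never writes past the end of the chunk. The caller must free() it.
 * (Hypothetical helper for illustration.) */
char *duplicate_string(const char *src)
{
    size_t size = strlen(src) + 1;   /* +1 for the terminating '\0' */
    char *copy = malloc(size);

    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, src, size);         /* copies exactly the allocated size */
    return copy;
}
```

Deriving both the allocation size and the copy length from the same variable keeps the two from drifting apart when the code is changed later.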

Use-After-Free Vulnerability

Use-after-free (UAF) vulnerabilities are a specialized form of memory management bug that occurs when a program uses a memory object after it has been deallocated. This can happen because the pointer to the corresponding memory location may remain accessible even though the memory itself has been marked as available for other allocations. If a program accesses this freed memory, it can cause unexpected behavior or even a security breach, since an attacker could exploit this "dangling pointer" to gain unauthorized access to the system or leak information meant to be kept confidential. [11]
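A simple manual defense is to set a pointer to NULL as soon as its memory has been freed, so that a later accidental use fails visibly instead of touching reused memory. The sketch below shows this idea for a single pointer; the function name release_buffer is an assumption made for this example, and other copies of the pointer elsewhere in the program are not covered by it.

```c
#include <stdlib.h>

/* Frees the memory *buffer points to and clears the caller's pointer,
 * so that no dangling pointer remains in that variable.
 * Calling it again is harmless because free(NULL) does nothing.
 * (Hypothetical helper for illustration.) */
void release_buffer(char **buffer)
{
    free(*buffer);
    *buffer = NULL;
}
```

A caller would write release_buffer(&p) instead of free(p), after which p compares equal to NULL.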

Mitigation Techniques and How to Disable Them

This section will briefly introduce some mitigation techniques that can potentially prevent buffer overflow attacks. First, it will explain the basic concepts in a simple way, and second, it will provide brief guides on how to disable certain mitigation techniques for research purposes.

Address Space Layout Randomization (ASLR)

ASLR randomizes the memory address space layout of processes, making it harder to predict the location of specific functions or buffers. [2] Since code reuse attacks (e.g., ROP attacks) require the memory addresses of gadgets to be known to an attacker, techniques to randomize their entry points have become increasingly popular.

ASLR can be temporarily disabled at the system level (Linux):

echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

Re-enable after testing:

echo 2 | sudo tee /proc/sys/kernel/randomize_va_space
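Before and after changing the setting, the current value can be read back. The following sketch does this from C on Linux; the function name read_aslr_mode is an assumption made for this example. On Linux, 0 means randomization is disabled, while 1 and 2 enable it to increasing degrees.

```c
#include <stdio.h>

/* Reads the current ASLR setting from the Linux proc file system.
 * Returns the stored value (0, 1 or 2 on Linux), or -1 if the file
 * cannot be read, for example on a non-Linux system.
 * (Hypothetical helper for illustration.) */
int read_aslr_mode(void)
{
    FILE *f = fopen("/proc/sys/kernel/randomize_va_space", "r");
    int mode = -1;

    if (f == NULL) {
        return -1;
    }
    if (fscanf(f, "%d", &mode) != 1) {
        mode = -1;             /* file content was not a number */
    }
    fclose(f);
    return mode;
}
```

Reading the file does not require root privileges, so a test program can verify the state of ASLR before relying on fixed addresses.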

Non-executable Stack (nx-stack)

A non-executable stack, or nx-stack, marks the stack memory region as non-executable. This prevents code placed on the stack from being executed and thereby mitigates buffer overflow attacks that inject shellcode into that region. [2]

To disable nx-stack on a program, use the -z execstack option during compilation. This allows execution of code in the stack memory:

gcc -z execstack -o vulnerable_program source.c

Data Execution Prevention (DEP)

Data Execution Prevention (DEP) is a security feature originally developed by Microsoft for Windows XP SP2. There are two basic variants: hardware-based DEP and software-based DEP. If supported by the CPU as well as the process, hardware DEP will be employed; otherwise, DEP has to be carried out in software, which is part of the Windows operating system. The basic functionality of DEP is to prevent applications from executing code in a non-executable area of memory.

Stack Canaries

Stack canaries are small values, typically random, placed on the stack between the local buffers and the saved return address to detect buffer overflows. If an overflow overwrites the return address, it also changes the canary. The changed value is detected before the function returns, and the program is aborted instead of jumping to the overwritten address. [2] Typical types of canaries, which are supported by security hardening technologies like ProPolice or Stackguard (GCC), are terminator canaries, random canaries, and random XOR canaries.

To disable stack canaries when compiling a program, use the -fno-stack-protector option with gcc:

gcc -fno-stack-protector -o vulnerable_program source.c
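The canary principle can be illustrated with a small simulation. This is not how the compiler implements it: the "frame" here is an ordinary byte array holding a buffer directly followed by a canary, and the canary is a fixed value, whereas real canaries are usually random. All names (Frame, CANARY, write_and_check) are assumptions made for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE    8
#define CANARY_SIZE 4

/* Simulated stack frame: BUF_SIZE buffer bytes, then the canary. */
typedef struct {
    unsigned char bytes[BUF_SIZE + CANARY_SIZE];
} Frame;

/* Fixed value for the simulation only; real canaries are usually random. */
static const unsigned char CANARY[CANARY_SIZE] = {0xDE, 0xAD, 0xC0, 0xDE};

/* Places the canary, copies len bytes to the start of the frame without
 * checking them against BUF_SIZE, and reports whether the canary survived. */
bool write_and_check(Frame *frame, const unsigned char *data, size_t len)
{
    /* "Prologue": store the canary behind the buffer */
    memcpy(frame->bytes + BUF_SIZE, CANARY, CANARY_SIZE);

    /* Keep the simulation inside the frame so the behavior stays defined */
    if (len > sizeof frame->bytes) {
        len = sizeof frame->bytes;
    }
    memcpy(frame->bytes, data, len);   /* the "unchecked" copy */

    /* "Epilogue": an intact canary means the buffer was not overrun */
    return memcmp(frame->bytes + BUF_SIZE, CANARY, CANARY_SIZE) == 0;
}
```

Copying 8 bytes leaves the canary intact, while copying 9 or more bytes overwrites at least its first byte, which the final comparison detects.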

Conclusion

Since the rise of C in the early 1970s, buffer overflows have become a serious security vulnerability. Even though high-level programming languages are typically not affected, the number of vulnerable systems is actually rising. At the same time, a wide array of countermeasures is increasingly being adopted and applied. Features like executable space protection (e.g., data execution prevention under Windows) have been deployed since the mid-2000s, and on the compiler side, technologies like Stackguard support several detection and prevention mechanisms (e.g., different types of canaries). Furthermore, almost every widely used operating system supports Address Space Layout Randomization in order to minimize the attack surface for buffer overflow attacks. For example, by the beginning of 2020, most major operating systems (Linux, Windows, macOS, iOS, Android, Solaris, OpenBSD, etc.) offered support for ASLR.

Another key point is the expansion of the Internet of Things (IoT). These widely distributed networks of hardware endpoints have proven to be attractive targets for buffer overflow attacks. This stems from the fact that IoT applications mostly use low-level, hardware-oriented languages such as C and C++, both of which are particularly prone to buffer overflows.

References

  1. What is a Buffer Overflow Attack, [Online]. Available: https://www.wallarm.com/what/buffer-overflow-attack-definition-types-use-by-hackers-part-1. Accessed: Dec. 11, 2024.
  2. C. Anley, The Shellcoder’s Handbook: Discovering and Exploiting Security Holes, 2nd ed., Indianapolis: Wiley, 2007.
  3. MITRE, "CVE Search Results for 'buffer overflow'," [Online]. Available: https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=buffer+overflow. Accessed: Oct. 16, 2024.
  4. MITRE, "CWE-119: Improper Restriction of Operations within the Bounds of a Memory Buffer," [Online]. Available: https://cwe.mitre.org/data/definitions/119.html. Accessed: Oct. 16, 2024.
  5. MITRE, "CWE-788: Access of Memory Location After End of Buffer," [Online]. Available: https://cwe.mitre.org/data/definitions/788.html. Accessed: Oct. 16, 2024.
  6. A. One, "Smashing the Stack for Fun and Profit," Phrack, vol. 7, no. 49, Nov. 1996.
  7. M. Howard, D. LeBlanc, and J. Viega, 24 Deadly Sins of Software Security: Programming Flaws and How to Fix Them, 1st ed. USA: McGraw-Hill, Inc., 2009.
  8. J. Erickson, Hacking: The Art of Exploitation, 2nd ed. San Francisco: No Starch Press, 2008.
  9. G. Mullen and L. Meany, "Assessment of Buffer Overflow Based Attacks On an IoT Operating System," 2019 Global IoT Summit (GIoTS), Aarhus, Denmark, 2019, pp. 1-6, doi: 10.1109/GIOTS.2019.8766434.
  10. Innovative Bytes, "Ariane-5 disaster - (integer overflow - space requirements)," Medium, Oct. 2022.
  11. B. Lee, C. Song, Y. Jang, T. Wang, T. Kim, L. Lu, and W. Lee, "Preventing use-after-free with dangling pointers nullification," in Network and Distributed System Security Symposium, 2015.

Further Reading