Format String Bug Introduction

From Embedded Lab Vienna for IoT & Security


This document gives an introduction to format string bugs. A Format String Bug occurs when the programmer passes a user-controlled buffer to a Format Function as the Format String. The Format Function then interprets any Format Parameters contained in the user input. An exploit can read from the stack and perform arbitrary writes, and thereby change the program's behavior. This can lead to program crashes (segmentation faults) and to security compromises such as disclosure of secrets in memory (information disclosure) or execution of arbitrary commands.


Binaries are tested on platform: Debian 4.9.210-1 (2020-01-20)

 git clone
 cd ccst/format_string

All code examples discussed here can be found in the example folder. They are compiled for the 32-bit x86 architecture using gcc version 6.3.0 (posix).

 gcc example.c -m32 -o example

Exploits for the x64 architecture may differ.

Description of Format String Vulnerability

The Format Function is an ANSI C conversion function, such as `printf`, which converts a variable into a human-readable string representation. The Format String is the argument of the Format Function. It contains an ASCII string with text and Format Parameters, such as:

 printf ("Guess solution: %d\n", 42);

Here 42 is interpreted as a decimal number by the Format Function. The Format Parameter/Specifier (%d, %s, %p, %u, ...) defines the type of conversion.

Format Parameter/Specifier Table

Parameter   Output                                                   Passed as
%p          External representation of a pointer to void             Reference
%d          Decimal                                                  Value
%c          Character                                                Value
%u          Unsigned decimal                                         Value
%x          Hexadecimal                                              Value
%s          String                                                   Reference
%n          Writes the number of characters printed so far           Reference
            into a pointer

In vulnerable code, the user input is passed directly to the function as the Format String, as in printf(userinput), instead of being passed as an argument to a Format String that contains a Format Parameter, as in printf("%s", userinput).

Vulnerable Format Functions Table

Format function   Description
fprintf           Prints a formatted string to a file
printf            Prints a formatted string to stdout
sprintf           Prints into a string
snprintf          Prints into a string, checking the length
vfprintf          Prints a va_arg structure to a file
vprintf           Prints a va_arg structure to stdout
vsprintf          Prints a va_arg structure to a string
vsnprintf         Prints a va_arg structure to a string, checking the length


The Format String Bug has been publicly known since at least September 1999[1]. It gained major attention after the public release of exploit code against wu-ftpd 2.6.0 in June 2000[2].

Read From the Stack

Example - Read from the stack

Shows safe and vulnerable usage of Format Function

   #include <stdio.h>
   int main(int argc, char **argv)
   {
       // This line is safe
       printf("%s\n", argv[1]);
       // This line is vulnerable: the argument is used as the Format String
       printf(argv[1]);
       return 0;
   }

Execute with different arguments:

   ./example "%p %p %p %p %p %p %p %p %p %p %p %p %p %p %p"
   ./example "%08x.%08x.%08x.%08x.%08x%08x.%08x.%08x.%08x.%08x"
   ./example "%s"

In the first call, the Format Function interprets the command line argument as a Format String with 15 pointer conversions. Because no matching arguments were passed, it prints the next 15 values it finds on the stack, which means we are able to dump the stack.

What happens exactly?

  • Stack with correctly coded Format String

   printf("this is a %s, with a number %d, and address %08x", a, b, &c);

The stack grows toward lower addresses and is used in a LIFO manner; the arguments are pushed onto the stack in reverse order. The Format Function reads the values a, b and &c from the stack and converts them into the string that is sent to stdout. (Source of image: [3])


  • Stack with missing variables

printf(userinput) with the input string "this is a %s, with a number %d, and address %08x" evaluates to

   printf("this is a %s, with a number %d, and address %08x");

The arguments a, b and &c are missing on the stack. (Source of image: [3])


The Format Function reads the next values from the stack even though they were never passed as arguments. If the source code is available, a Format String vulnerability can be recognized by checking every call to a Format Function and verifying that a Format String with the matching Format Specifiers is provided. If only a binary is available, test it by simply entering Format Specifiers such as "%x %x %s" as input.

Read by Direct Stack Parameter Access

File example2.c

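   #include <stdio.h>
   #include <string.h>
   void vuln() {
       char name[200];
       printf("What's your name? ");
       // The line that reads the input is missing from this listing; fgets is an assumption
       fgets(name, sizeof(name) - 2, stdin);
       name[strcspn(name, "\n")] = '\0';
       printf("Nice to meet you, ");
       // Vulnerable: the user input is used as the Format String
       printf(strcat(name, "!\n"));
   }
   int main(int argc, char **argv) {
       setvbuf(stdin, NULL, _IONBF, 0);
       setvbuf(stdout, NULL, _IONBF, 0);
       vuln();
       return 0;
   }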

We can read an arbitrary argument from the stack with "%<some number>$x": to read the first argument on the stack we use %1$08x, for the second %2$08x, and so on. To get an idea of where we are reading, we can enter a recognizable marker followed by a series of %08x conversions:

   What's your name? AAAA.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x
   Nice to meet you,   

We can then determine the position of the input buffer on the stack by looking for 41414141 in the output. Here the input buffer name is the 6th argument on the stack, which can be accessed with AAAA.%6$08x. Caution: data on the stack is stored in little-endian byte order on x86 machines. Test this by inserting

   What's your name? ABCD.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x
   Nice to meet you, ABCD.f18281a0.0000004f.ffffffaf.85571440.00000012.44434241.30252e78.2e783830.3830252e.252e7838.78383025.30252e78.2e783830.3830252e.252e7838!

The Format String with our input ABCD... is found on the stack beginning with the 6th argument, 44434241 in little-endian order. This can be verified with ABCD.%6$08x.

Arbitrary Read

We want to read data in memory from a known address. The previous examples showed that we control the content of the Format String on the stack. We therefore need a Format Parameter that takes an address from the stack (passed by reference) and displays the memory at that address, which is %s. We place the address we want to read at the start of our input and make sure that the 6th item on the stack, which is that address, is consumed by %s and read as a string.

   address = 0x08480110
   address (encoded as 32 bit le string): "\x10\x01\x48\x08"
   printf ("\x10\x01\x48\x08_%08x.%08x.%08x.%08x.%08x|%s|");

(With this information you should be able to solve CH0_easyprintf)

Write to the Stack

Write with %n

The %n Format Parameter writes the number of bytes printed so far into a variable. The address of the variable is given to the Format Function by placing an integer pointer as a parameter on the stack. File: example3

   int n_chars = 0;
   printf("Hello, World%n is ", &n_chars);
   printf("%d bytes long.\n",n_chars);

In the first printf call in example3, the %n Format Parameter writes 12, the number of bytes in "Hello, World", into the variable `n_chars`.

%n can be used to write to a specified address.


The next printf calls show how the value of n_chars can be controlled by padding the output to a chosen field width.

   n_chars = 0;
   printf("%10u%n   ", 1, &n_chars);        //n_chars=10
   n_chars = 0;
   printf("%150u%n   ", 2, &n_chars);        //n_chars=150

Addresses in memory are 4 bytes long on this platform, and a single %n always stores a full integer, which may overwrite more of the stack than we want. Therefore a technique that sets just one byte per write is applied several times. Let us have a closer look at what is happening, because writing just one byte this way has side effects. An unsigned integer is stored in memory in 4 bytes in little-endian order, so a number like 0x0000014c is \x4c\x01\x00\x00 in memory. File: example4

   unsigned char   canary[5];
   unsigned char   foo[4];
   memset (foo, '\x00', sizeof (foo));
   /* 0 * before */ strcpy ((char *) canary, "AAAA");
   /* 1 */  printf ("%16u%n", 7350, (int *) &foo[0]);            //foo[0]=0x10
   /* 2 */  printf ("%32u%n", 7350, (int *) &foo[1]);            //foo[1]=0x20
   /* 3 */  printf ("%64u%n", 7350, (int *) &foo[2]);            //foo[2]=0x40
   /* 4 */  printf ("%128u%n", 7350, (int *) &foo[3]);           //foo[3]=0x80
   /* 5 * after */ printf ("%02x%02x%02x%02x\n", foo[0], foo[1],foo[2], foo[3]);
   printf ("canary: %02x%02x%02x%02x\n", canary[0],canary[1], canary[2], canary[3]);

In step 0, foo is initialized with four \x00 bytes, and the canary, the next variable on the stack, is set to AAAA, represented by \x41\x41\x41\x41. After the four byte-wise write operations, foo contains 10204080 as expected, but 3 bytes of the canary have been overwritten: \x00\x00\x00\x41 = 00000041. Each %n stores a full 4-byte integer, so every write also zeroes the three bytes that follow the targeted byte. The next figure illustrates this behavior of the write procedure. (Image source: [4])


Multiple bytes can also be written in a single call, as long as the written values are in ascending order, because the count used by %n only increases within one printf call. File: example4

   strcpy (canary, "AAAA");
   printf ("%16u%n%16u%n%32u%n%64u%n",1, (int *) &foo[0], 1, (int *) &foo[1],1, (int *) &foo[2], 1, (int *) &foo[3]);
   printf ("%02x%02x%02x%02x\n", foo[0], foo[1],foo[2], foo[3]);
   printf ("canary: %02x%02x%02x%02x\n", canary[0], canary[1], canary[2], canary[3]);

Output again foo: 10204080 and canary: 00000041.

Write Short %hn

Instead of writing each byte separately, we can write short integers using the %hn Format Specifier. The advantage of %hn is that it stores only 2 bytes and therefore does not destroy data next to the targeted address, which makes it the preferred method. Again we want to write 10204080 into foo. Because of the little-endian byte order, the first short must be 0x2010 and the second 0x8040.

First padding value: 0x2010 = 8208
Second padding value: 0x8040 - 0x2010 = 32832 - 8208 = 24624

Source: example5

   printf ("%.8208u%hn%.24624u%hn",1, (short int *) &foo[0], 1, (short int *) &foo[2]);

Output: foo: 10204080 canary: 41414141

Writing short values enables exploitation of the Format String Bug without overwriting nearby memory locations.

To overwrite a 4-byte address we need just two write steps. Take care to pass the parameters in the correct order.

Write Short Example: Given a binary vuln with a Format String vulnerability, we want to overwrite a variable test on the stack, located at the known address test_addr=0xffeeecac, and set it to test=0x1337babe. The input buffer used in the printf function is the 6th element on the stack. The exploit string has the form p32(addr) + p32(addr+2) + '%' + str(first) + 'x%6$hn' + '%' + str(second) + 'x%7$hn'. A Python script for exploitation looks like

   from pwn import *
   p = process('./vuln')

(Now you should be able to solve the challenge CH1_root_me2 and CH2_printfun)


  • Reading arbitrary locations (leak addresses or canaries)
  • Writing arbitrary locations
  • Executing arbitrary code (Overwrite Return, .GOT addresses)


  • Programmers should use the safe form of the Format Functions. Never pass user input directly to a Format Function without a Format Specifier.[5]

    snprintf(buf, sizeof buf, "%s", userinputbuffer);

  • FormatGuard: Automatic Protection From printf Format String Vulnerabilities [6]



Interesting CTF writeups


  1. Tymm Twillman, Exploit for proftpd 1.2.0pre6, BugTraq, 20-09-1999, [accessed 05-21-2020].
  2. tf8, WuFTPD: Providing *remote* root since at least 1994, BugTraq, 22-06-2000, [accessed 05-21-2020].
  3. Saif El Sherei, Format String Exploitation-Tutorial, [accessed 05-21-2020].
  4. scut / team teso, Exploiting Format String Vulnerabilities, version 1.2, 09-01-2001, [accessed 05-21-2020].
  5. Michael Howard, David LeBlanc and John Viega, 19 deadly sins of software security programming flaws and how to fix them, Chapter 2, 2005, [accessed 05-22-2020].
  6. Crispin Cowan, Matt Barringer, Steve Beattie, Greg Kroah-Hartman, Mike Frantzen and Jamie Lokier, FormatGuard: Automatic Protection From printf Format String Vulnerabilities, Usenix, 2001, [accessed 05-21-2020].