Exploring the Principles of Coredump Problems, Linux x86 Edition, Section 3.2: Stack Layout and the Function Frame

Consider an example:

void FuncA()
{
}

void FuncB()
{
    FuncA();
}

int main()
{
    FuncB();
    return 0;
}

Compile its release version with the following command:

[buckxu@xuzhina 1]$ g++ -o xuzhina_dump_c3_s1_rel xuzhina_dump_c3_s1.cpp

Before discussing their stacks, first look at the assembly of the three functions main, FuncB, and FuncA:

(gdb) disassemble FuncA
Dump of assembler code for function _Z5FuncAv:
  0x08048470 <+0>:    push   %ebp
  0x08048471 <+1>:    mov    %esp,%ebp
  0x08048473 <+3>:    pop    %ebp
  0x08048474 <+4>:    ret   
End of assembler dump.

(gdb) disassemble FuncB
Dump of assembler code for function _Z5FuncBv:
  0x08048475 <+0>:    push   %ebp
  0x08048476 <+1>:    mov    %esp,%ebp
  0x08048478 <+3>:    call   0x8048470 <_Z5FuncAv>
  0x0804847d <+8>:    pop    %ebp
  0x0804847e <+9>:    ret   
End of assembler dump.

(gdb) disassemble main
Dump of assembler code for function main:
  0x0804847f <+0>:    push   %ebp
  0x08048480 <+1>:    mov    %esp,%ebp
  0x08048482 <+3>:    call   0x8048475 <_Z5FuncBv>
  0x08048487 <+8>:    mov    $0x0,%eax
  0x0804848c <+13>:   pop    %ebp
  0x0804848d <+14>:   ret   
End of assembler dump.

The assembly shows that all three functions begin with these instructions:

push  %ebp
mov   %esp,%ebp

and end with these instructions:

pop   %ebp
ret

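Functions compiled without gcc's -fomit-frame-pointer option generally have this opening and closing sequence, so these instructions can be treated as the signature of a function's start and end. An empty leaf function such as FuncA typically consists of nothing but these two pieces joined together.

On x86, ebp holds the function frame pointer, esp points to the current top of the stack, and eip holds the address of the next instruction to execute.

The two instructions at the start of a function therefore mean the following:

push  %ebp             // esp = esp - 4, then store the value of ebp at the address esp points to
mov   %esp,%ebp        // copy the value of esp into ebp, i.e. ebp = esp

The two instructions at the end of a function mean the following:

pop   %ebp             // load the word at the address esp points to into ebp, then esp = esp + 4
ret                    // load the word at the address esp points to (the return address) into eip, then esp = esp + 4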
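The register updates described above can be sketched as a small toy model. This is only an illustration under simplifying assumptions: `ToyCpu`, its sparse `mem` map, and the method names are hypothetical and model nothing beyond the four instructions discussed here.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy model of ebp, esp, eip and a sparse stack (address -> 32-bit word).
// Hypothetical illustration only; not how a real CPU or gdb works internally.
struct ToyCpu {
    uint32_t ebp = 0;
    uint32_t esp = 0;
    uint32_t eip = 0;
    std::map<uint32_t, uint32_t> mem;

    // push: esp = esp - 4, then store the value at the new top of stack.
    void push(uint32_t value) {
        esp -= 4;
        mem[esp] = value;
    }

    // pop: load the word at the top of stack, then esp = esp + 4.
    uint32_t pop() {
        assert(mem.count(esp) == 1);  // the toy model only pops words it pushed
        uint32_t value = mem[esp];
        esp += 4;
        return value;
    }

    // Function start: push %ebp ; mov %esp,%ebp
    void prologue() {
        push(ebp);  // save the caller's frame pointer
        ebp = esp;  // the new frame pointer addresses the saved one
    }

    // Function end: pop %ebp ; ret
    void epilogue() {
        ebp = pop();  // restore the caller's frame pointer
        eip = pop();  // ret: pop the return address into eip
    }
};
```

To try it, set `esp` and `ebp` to a starting value, `push` a return address as `call` would, then run `prologue()` followed by `epilogue()`: `ebp` and `esp` return to their starting values and `eip` ends up holding the pushed return address.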

Now verify the above. Set a breakpoint at 0x0804847f, the address of main's first instruction:

(gdb) tbreak *0x0804847f
Temporary breakpoint 1 at 0x804847f

Then step through and check:

(gdb) r
Starting program: /home/buckxu/work/3/1/xuzhina_dump_c3_s1_rel
 
Temporary breakpoint 1, 0x0804847f in main()
(gdb) i r ebp esp
ebp            0x0      0x0
esp            0xbffff4dc       0xbffff4dc
(gdb) x /4x $esp
0xbffff4dc:     0x4a8bf635      0x00000001      0xbffff574      0xbffff57c
(gdb) ni
0x08048480 in main ()
(gdb) i r ebp esp
ebp            0x0      0x0
esp            0xbffff4d8       0xbffff4d8
(gdb) x /4x $esp
0xbffff4d8:     0x00000000      0x4a8bf635      0x00000001      0xbffff574

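Sure enough, after executing

push  %ebp

esp changes from 0xbffff4dc to 0xbffff4d8, and the word it now points to holds the value of ebp, which is 0. In effect, this operation saves the old function frame pointer on the stack.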

Continue executing:

(gdb) ni
0x08048482 in main ()
(gdb) i r ebp esp
ebp            0xbffff4d8       0xbffff4d8
esp            0xbffff4d8       0xbffff4d8
(gdb) x /4x $esp
0xbffff4d8:     0x00000000      0x4a8bf635      0x00000001      0xbffff574

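This shows that

mov   %esp,%ebp

does assign the value of esp to ebp. In effect, this operation sets up the function frame pointer, and the address the new frame pointer points to holds the old frame pointer.

Why set up a function frame at all? Examining main alone may not reveal the pattern, so continue into FuncB and FuncA and see whether one emerges.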

(gdb) si
0x08048475 in FuncB() ()
(gdb) i r esp ebp
esp            0xbffff4d4       0xbffff4d4
ebp            0xbffff4d8       0xbffff4d8
(gdb) x /4x $esp
0xbffff4d4:     0x08048487      0x00000000      0x4a8bf635      0x00000001

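Execution has only just entered FuncB and none of its instructions has run, yet esp is already 4 bytes below ebp, at 0xbffff4d4, and the word at 0xbffff4d4 holds 0x08048487. That is exactly the address of the instruction that follows

  0x08048482 <+3>:    call   0x8048475 <_Z5FuncBv>

namely:

   0x08048487 <+8>:     mov   $0x0,%eax

The reason is that when main calls FuncB with the call instruction, call pushes 0x08048487 onto the stack. This is the return address: the point in main where execution resumes after FuncB returns.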
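The effect of `call` on esp and eip can be sketched as follows. The struct and function names are hypothetical, and the sketch assumes the 5-byte near relative form of `call` (one opcode byte plus a 4-byte displacement), which matches the distance between 0x08048482 and 0x08048487 in the disassembly above.

```cpp
#include <cassert>
#include <cstdint>

// Result of executing one call instruction in this toy model.
struct CallResult {
    uint32_t esp;     // stack pointer after the call
    uint32_t pushed;  // return address stored at the new top of stack
    uint32_t eip;     // address where execution continues
};

// Models a near relative call on 32-bit x86 (assumed to be 5 bytes long).
// Hypothetical helper for illustration only.
CallResult do_call(uint32_t call_addr, uint32_t target, uint32_t esp) {
    const uint32_t kCallLength = 5;  // opcode + 4-byte displacement
    assert(esp >= 4);                // room for one more word on the stack
    CallResult r;
    r.pushed = call_addr + kCallLength;  // address of the next instruction
    r.esp = esp - 4;                     // call pushes that address
    r.eip = target;                      // then jumps to the callee
    return r;
}
```

With the values from the session, `do_call(0x08048482, 0x08048475, 0xbffff4d8)` gives `pushed == 0x08048487`, `esp == 0xbffff4d4` and `eip == 0x08048475`, matching the registers and stack contents gdb printed on entry to FuncB.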

Continue executing:

(gdb) ni
0x08048476 in FuncB() ()
(gdb) ni         
0x08048478 in FuncB() ()
(gdb) i r ebp esp
ebp            0xbffff4d0       0xbffff4d0
esp            0xbffff4d0       0xbffff4d0
(gdb) i r eip
eip            0x8048478        0x8048478 <FuncB()+3>
(gdb) x /4x $esp
0xbffff4d0:     0xbffff4d8      0x08048487      0x00000000      0x4a8bf635
(gdb) x /4x 0xbffff4d8
0xbffff4d8:     0x00000000      0x4a8bf635      0x00000001      0xbffff574

From the above, the function frame pointers appear to form a singly linked list whose head pointer is kept in ebp. Expressed in C/C++:

struct  FramePointer
{
       struct  FramePointer* next;
};

Is it really a linked list? Continue executing into FuncA.

(gdb) tbreak FuncA
Temporary breakpoint 2 at 0x8048473
(gdb) c
Continuing.
 
Temporary breakpoint 2, 0x08048473 in FuncA() ()
(gdb) i r ebp esp
ebp            0xbffff4c8       0xbffff4c8
esp            0xbffff4c8       0xbffff4c8
(gdb) x /4x $ebp
0xbffff4c8:     0xbffff4d0      0x0804847d      0xbffff4d8      0x08048487
(gdb) x /4x 0xbffff4d0
0xbffff4d0:     0xbffff4d8      0x08048487      0x00000000      0x4a8bf635
(gdb) x /4x 0xbffff4d8
0xbffff4d8:     0x00000000      0x4a8bf635      0x00000001      0xbffff574

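So the function frame pointers do match struct FramePointer above, with ebp as the list head. Moreover, comparing eip with the word that follows each frame pointer stored on the stack reveals the following: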

(gdb) i r eip
eip            0x8048473        0x8048473 <FuncA()+3>
(gdb) info symbol 0x0804847d
FuncB() + 8 in section .text of /home/buckxu/work/3/1/xuzhina_dump_c3_s1_rel
(gdb) info symbol 0x08048487
main + 8 in section .text of /home/buckxu/work/3/1/xuzhina_dump_c3_s1_rel

These match the addresses shown in the backtrace exactly:

(gdb) bt
#0  0x08048473 in FuncA() ()
#1  0x0804847d in FuncB() ()
#2  0x08048487 in main ()

The FramePointer structure can therefore be extended as follows:

struct FramePointer
{
       struct FramePointer* next;
       void*  ret;               // return address
};
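As a rough sketch of how this structure can be followed, the walker below traverses the chain over a snapshot of the stack words gdb printed while stopped in FuncA. `Memory`, `walk_frames` and `funcA_snapshot` are hypothetical names, and the code is a simplified model, not gdb's actual unwinder.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using Memory = std::map<uint32_t, uint32_t>;  // address -> 32-bit word

// Follows the saved-ebp chain starting at ebp and collects the return
// address stored next to each saved frame pointer. Stops at a null next
// pointer, an unreadable word, or after max_depth frames.
std::vector<uint32_t> walk_frames(const Memory& mem, uint32_t ebp,
                                  int max_depth = 64) {
    assert(max_depth > 0);
    std::vector<uint32_t> return_addresses;
    for (int depth = 0; depth < max_depth && ebp != 0; ++depth) {
        auto next = mem.find(ebp);     // FramePointer::next
        auto ret = mem.find(ebp + 4);  // FramePointer::ret
        if (next == mem.end() || ret == mem.end()) break;  // unreadable frame
        return_addresses.push_back(ret->second);
        ebp = next->second;
    }
    return return_addresses;
}

// Stack words copied from the gdb output shown while stopped in FuncA.
Memory funcA_snapshot() {
    return {
        {0xbffff4c8, 0xbffff4d0}, {0xbffff4cc, 0x0804847d},
        {0xbffff4d0, 0xbffff4d8}, {0xbffff4d4, 0x08048487},
        {0xbffff4d8, 0x00000000}, {0xbffff4dc, 0x4a8bf635},
    };
}
```

Calling `walk_frames(funcA_snapshot(), 0xbffff4c8)` yields 0x0804847d, 0x08048487 and 0x4a8bf635. The first two are the FuncB and main entries of the backtrace; the last is presumably main's own return address into the startup code, which gdb does not print because by default it stops the backtrace at main.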

[Figure: the chain of FramePointer nodes on the stack, with ebp as the list head]

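gdb relies on this same pattern to parse the stack, which is how it displays a correct backtrace. What, then, does an incorrect stack look like?

Before addressing that, examine the signature instructions at the end of a function:

pop   %ebp             // load the word at the address esp points to into ebp, then esp = esp + 4
ret                    // load the word at the address esp points to (the return address) into eip, then esp = esp + 4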

Since the program has already run into FuncA, check whether ebp, eip, and esp really change this way when FuncA finishes:

(gdb) x /4x $ebp
0xbffff4c8:     0xbffff4d0     0x0804847d      0xbffff4d8      0x08048487
(gdb) i r eip esp ebp
eip            0x8048473        0x8048473 <FuncA()+3>
esp            0xbffff4c8       0xbffff4c8
ebp            0xbffff4c8       0xbffff4c8
(gdb) ni
0x08048474 in FuncA() ()
(gdb) ni
0x0804847d in FuncB() ()
(gdb) i r eip esp ebp
eip            0x804847d        0x804847d <FuncB()+8>
esp            0xbffff4d0       0xbffff4d0
ebp            0xbffff4d0       0xbffff4d0

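Sure enough, ebp, esp, and eip change as expected.

Now look closely at the closing instructions. pop %ebp loads the word at the top of the stack into ebp. What happens if that word has been modified so that it points to an invalid location?

To find out, rerun the program and modify the stack contents just before FuncA returns.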

(gdb) tbreak FuncA
Temporary breakpoint 1 at 0x8048473
(gdb) r
Starting program: /home/buckxu/work/3/1/xuzhina_dump_c3_s1_rel

Temporary breakpoint 1, 0x08048473 in FuncA() ()
(gdb) bt
#0  0x08048473 in FuncA() ()
#1  0x0804847d in FuncB() ()
#2  0x08048487 in main ()
(gdb) i r ebp eip esp
ebp            0xbffff4c8       0xbffff4c8
eip            0x8048473        0x8048473 <FuncA()+3>
esp            0xbffff4c8       0xbffff4c8
(gdb) x /4x $esp
0xbffff4c8:     0xbffff4d0      0x0804847d      0xbffff4d8      0x08048487
(gdb) set *0xbffff4c8 =0x10
(gdb) ni
0x08048474 in FuncA() ()
(gdb) i r ebp eip esp
ebp            0x10     0x10
eip            0x8048474        0x8048474 <FuncA()+4>
esp            0xbffff4cc       0xbffff4cc
(gdb) ni
Cannot access memory at address 0x14
(gdb) bt
#0  0x0804847d in FuncB() ()
Cannot access memory at address 0x14
(gdb) c
Continuing.
[Inferior 1 (process 1178) exited normally]

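As shown, gdb can resolve only the FuncB frame and cannot resolve main's frame. FuncB still appears only because the return address into FuncB was not modified. The program nevertheless exits normally, because FuncB's own pop %ebp reloads the correct saved frame pointer from the stack before the bogus ebp value is ever dereferenced.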
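The truncated backtrace can be reproduced with a simplified frame-pointer walker. Everything here is a hypothetical sketch: `Unwind` and `unwind` are made-up names, and the walker reads the return-address slot at ebp+4 first, a choice that happens to reproduce the address in gdb's message. gdb's real unwinder uses considerably more information than this.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using Memory = std::map<uint32_t, uint32_t>;  // address -> 32-bit word

struct Unwind {
    std::vector<uint32_t> frames;  // code address of each recovered frame
    bool failed = false;           // true if a stack word could not be read
    uint32_t bad_address = 0;      // first address that could not be read
};

// Frame #0 is eip itself. Each further frame needs the return address at
// ebp+4 and the saved frame pointer at ebp. Simplified model, not gdb.
Unwind unwind(const Memory& mem, uint32_t eip, uint32_t ebp) {
    Unwind u;
    u.frames.push_back(eip);
    for (int depth = 0; depth < 64 && ebp != 0; ++depth) {
        auto ret = mem.find(ebp + 4);
        if (ret == mem.end()) {
            u.failed = true;
            u.bad_address = ebp + 4;
            break;
        }
        auto next = mem.find(ebp);
        if (next == mem.end()) {
            u.failed = true;
            u.bad_address = ebp;
            break;
        }
        u.frames.push_back(ret->second);
        ebp = next->second;
    }
    return u;
}
```

After the corrupted `pop %ebp` and the `ret`, the registers are eip = 0x0804847d and ebp = 0x10. Calling `unwind` with those values, on a memory map that contains only the real stack words, recovers the single frame 0x0804847d and then fails at address 0x14, the same address gdb reports.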

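Now consider the second closing instruction. ret loads the word esp points to into eip. If that word is invalid, what does the stack look like?

Examine the program once more, this time modifying the word at esp+4, which holds the return address.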

(gdb) tbreak FuncA
Temporary breakpoint 1 at 0x8048473
(gdb) r
Starting program: /home/buckxu/work/3/1/xuzhina_dump_c3_s1_rel
 
Temporary breakpoint 1, 0x08048473 in FuncA() ()
(gdb) bt
#0  0x08048473 in FuncA() ()
#1  0x0804847d in FuncB() ()
#2  0x08048487 in main ()
(gdb) i r eip esp ebp
eip            0x8048473        0x8048473 <FuncA()+3>
esp            0xbffff4c8       0xbffff4c8
ebp            0xbffff4c8       0xbffff4c8
(gdb) x /4x $esp
0xbffff4c8:     0xbffff4d0      0x0804847d      0xbffff4d8      0x08048487
(gdb) set *0xbffff4cc= 0x10
(gdb) ni
0x08048474 in FuncA() ()
(gdb) ni
0x00000010 in ?? ()
(gdb) bt
#0  0x00000010 in ?? ()
#1  0xbffff4d8 in ?? ()
#2  0x08048487 in main ()
(gdb) c
Continuing.
 
Program received signal SIGSEGV, Segmentation fault.
0x00000010 in ?? ()
(gdb) bt
#0  0x00000010 in ?? ()
#1  0xbffff4d8 in ?? ()
#2  0x08048487 in main ()

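As shown, frames marked "??" appear, again because the function's return address was modified. What happens to the stack, then, when both the frame pointer and the return address stored on the stack are modified?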

(gdb) tbreak FuncA
Temporary breakpoint 1 at 0x8048473
(gdb) r
Starting program: /home/buckxu/work/3/1/xuzhina_dump_c3_s1_rel

Temporary breakpoint 1, 0x08048473 in FuncA() ()
(gdb) bt
#0  0x08048473 in FuncA() ()
#1  0x0804847d in FuncB() ()
#2  0x08048487 in main ()
(gdb) i r eip esp ebp
eip            0x8048473        0x8048473 <FuncA()+3>
esp            0xbffff4c8       0xbffff4c8
ebp            0xbffff4c8       0xbffff4c8
(gdb) x /4x $esp
0xbffff4c8:     0xbffff4d0      0x0804847d      0xbffff4d8      0x08048487
(gdb) set *0xbffff4c8= 0x10
(gdb) set *0xbffff4cc= 0x20
(gdb) ni
0x08048474 in FuncA() ()
(gdb) ni
0x00000020 in ?? ()
(gdb) i r eip esp ebp           
eip            0x20     0x20
esp            0xbffff4d0       0xbffff4d0
ebp            0x10     0x10
(gdb) x /4x $esp           
0xbffff4d0:     0xbffff4d8      0x08048487      0x00000000      0x4a8bf635
(gdb) bt
#0  0x00000020 in ?? ()
#1  0xbffff4d8 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) c
Continuing.
 
Program received signal SIGSEGV, Segmentation fault.
0x00000020 in ?? ()
(gdb) bt
#0  0x00000020 in ?? ()
#1  0xbffff4d8 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

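As shown, this is exactly the kind of stack seen in the preface. It is now clear that frames marked "??" appear because the frame pointer and return address stored on the stack have been modified. In real-world development this is often caused by memory copies. The condition is called stack overflow (more precisely, a stack buffer overflow).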
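A memory copy that clobbers these two words can be imitated safely inside an ordinary byte array. The layout below (an 8-byte local buffer directly below the saved ebp and the return address) is an assumption made for illustration, and `ToyFrame`, `careless_copy` and the offsets are hypothetical names. Real frame layouts depend on the compiler and its options.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed layout of one frame, lowest address first:
//   bytes 0..7    local buffer
//   bytes 8..11   saved ebp
//   bytes 12..15  return address
struct ToyFrame {
    unsigned char bytes[16];
};

const std::size_t kBufSize = 8;
const std::size_t kSavedEbpOffset = 8;
const std::size_t kRetOffset = 12;

// Reads one 32-bit word from the toy frame.
uint32_t read_word(const ToyFrame& f, std::size_t offset) {
    uint32_t w;
    std::memcpy(&w, f.bytes + offset, sizeof w);
    return w;
}

// Writes one 32-bit word into the toy frame.
void write_word(ToyFrame& f, std::size_t offset, uint32_t w) {
    std::memcpy(f.bytes + offset, &w, sizeof w);
}

// Copies len bytes into the local buffer without comparing len to kBufSize,
// which is the mistake being illustrated. The assert only keeps the demo
// itself inside the 16-byte array.
void careless_copy(ToyFrame& f, const unsigned char* src, std::size_t len) {
    assert(len <= sizeof f.bytes);
    std::memcpy(f.bytes, src, len);
}
```

Copying `kBufSize` bytes leaves the saved ebp and return address untouched. Copying 16 bytes of 0x41 makes both read back as 0x41414141, which corresponds to the situation produced by hand with the two `set` commands above.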

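The last section of this chapter, "Coredump Example", shows how to recover part of a normal stack. Why operations such as memory copies lead to stack overflow is explained in Chapter 5.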