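This article is compiled from several websites. Segmentation faults are common in C/C++ applications on Linux, and the main topic here is how to locate the faulting code quickly and accurately. Two approaches are covered, one for each of two environments: a development environment where gdb is available, and an embedded or release environment where it is not. Both rely on the program's call stack at the time of the fault to locate the error.
Keywords: segmentation fault, SIGSEGV, gdb, backtrace, objdump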
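I. A brief introduction to and analysis of segmentation faults
[See http://www.upsdn.net/html/2006-11/775.html]
In short, a segmentation fault occurs when a program accesses memory it should not: typically memory it has no permission to access, or an address with no physical memory behind it. Accessing address 0 is an especially common case.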
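More precisely, a segmentation fault means the program accessed memory outside the address space the system has granted it. On x86, segment-level protection is described by the GDT. The GDTR is a 48-bit register: 32 bits hold the base address of the GDT and the remaining 16 bits hold the table limit. A 16-bit segment selector then picks an entry: its upper 13 bits are the index into the GDT, and its low 3 bits are the table indicator (GDT or LDT) and the 2-bit requested privilege level. Each GDT entry is a 64-bit descriptor that records the base address and limit of a code or data segment, whether the segment is present in memory, its privilege level, its granularity, and so on. When a program makes an out-of-bounds access, the CPU raises a protection exception, and the result is a segmentation fault. (Linux uses a flat segment model, so in practice most such violations are caught by the paging unit as page faults, which the kernel reports to the process as SIGSEGV.)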
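The 13-bit table index and the 3 low bits mentioned above are the layout of an x86 segment selector. As a rough illustration, the fields can be pulled apart with shifts and masks. The struct and function names below are hypothetical and exist only for this sketch; the value 0x23 in the usage note is just an arithmetic example.

```c
#include <stdint.h>

/* Hypothetical container for the three fields of a 16-bit selector. */
struct selector_fields {
    unsigned index;  /* bits 15..3: index into the descriptor table */
    unsigned table;  /* bit 2: 0 selects the GDT, 1 selects the LDT */
    unsigned rpl;    /* bits 1..0: requested privilege level (0..3) */
};

/* Split a selector value into its fields. */
struct selector_fields decode_selector(uint16_t selector)
{
    struct selector_fields f;

    f.index = (unsigned)selector >> 3;
    f.table = ((unsigned)selector >> 2) & 0x1u;
    f.rpl   = (unsigned)selector & 0x3u;
    return f;
}
```

For example, decode_selector(0x23) yields index 4, table 0 (GDT) and privilege level 3, since 0x23 is 100011 in binary.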
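In practice, the following kinds of mistakes commonly cause segmentation faults, and almost all of them come down to misusing pointers:
1) Accessing system data areas, especially writing to a memory address that the system protects.
The most common case is using a pointer that holds address 0.
2) Out-of-bounds memory access (array index out of range, mismatched variable types, and so on) that reaches memory not belonging to your program.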
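The first case can be observed without crashing the calling program by doing the bad write in a child process and asking the kernel how the child died. This is a minimal sketch, assuming a Linux system where address 0 is not mapped; the function name signal_from_null_write is hypothetical.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Perform a write through a null pointer in a child process.
 * Returns the number of the signal that killed the child,
 * 0 if the child exited normally, or -1 if fork/waitpid failed.
 */
int signal_from_null_write(void)
{
    pid_t pid = fork();
    int status = 0;

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* The pointer object is volatile so the compiler cannot
         * prove it is null and rewrite or drop the store. */
        unsigned char *volatile ptr = NULL;

        *ptr = 0x00;   /* the invalid write */
        _exit(0);      /* reached only if the write did not fault */
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On a typical Linux system the function is expected to return SIGSEGV, which is the signal the shell reports as "Segmentation fault".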
II. Solutions
- With a GDB debugging environment
Let's start with an example. The file d.c contains the following:
1 void dummy_function (void)
2 {
3 unsigned char *ptr = 0x00;
4 *ptr = 0x00;
5 }
6
7 int main (void)
8 {
9 dummy_function ();
10
11 return 0;
12 }
Line 4 causes the segmentation fault.
gcc -g d.c -o d builds the executable d.
Running it gives:
[chen@localhost seg]$ ./d
Segmentation fault
Debug it with gdb:
[chen@localhost seg]$ gdb ./d
GNU gdb Red Hat Linux (6.5-25.el5rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) r
Starting program: /home/chen/code/seg/d
Program received signal SIGSEGV, Segmentation fault.
0x08048364 in dummy_function () at d.c:4
4 *ptr = 0x00;
(gdb) backtrace
#0 0x08048364 in dummy_function () at d.c:4
#1 0x0804837c in main () at d.c:9
Following these steps, we can clearly see that SIGSEGV is raised on line 4 of d.c, inside dummy_function(), just as we expected. The call stack printed by backtrace confirms it.
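Next, a C++ example. It calls into the template library, which makes debugging a little harder but is closer to real-world code.
The file iterbug.cpp contains the following: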
1 #include <iostream>
2 #include <vector>
3 #include <iterator>
4 #include <algorithm>
5 using namespace std;
6
7 void dummy_function(void)
8 {
9 vector<int> coll1;
10 vector<int> coll2;
11
12 /*
13 * RUNTIME ERROR:
14 * - beginning is behind the end of the range
15 */
16 vector<int>::iterator pos = coll1.begin();
17 reverse (++pos,coll1.end());
18
19
20 for ( int i=1; i<=9 ;++i )
21 coll2.push_back(i);
22
23 /*
24 * RUNTIME ERROR:
25 * - overwriting nonexisting elements
26 */
27 copy(coll2.begin(), coll2.end(),
28 coll1.begin());
29
30 /*
31 * RUNTIME ERROR:
32 * - collections mistaken
33 * begin() and end() mistaken
34 */
35 copy(coll1.begin(), coll2.end(),
36 coll1.end());
37 }
38 int main()
39 {
40 dummy_function();
41 return 0;
42 }
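All three runtime errors are marked in the comments. The last two never execute, because the first one terminates the program.
Compile the program:
g++ -g iterbug.cpp -o iterbug
Then run it: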
[chen@localhost seg]$ ./iterbug
Segmentation fault
[chen@localhost seg]$ gdb ./iterbug
GNU gdb Red Hat Linux (6.5-25.el5rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) r
Starting program: /home/chen/code/seg/iterbug
Program received signal SIGSEGV, Segmentation fault.
0x08048bbf in std::swap<int> (__a=@0x4, __b=@0xfffffffc)
at /usr/lib/gcc/i386-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_algobase.h:97
97 _Tp __tmp = __a;
At this point execution is inside the STL code, and it is not obvious where our own code went wrong. Use backtrace to inspect the call stack:
(gdb) backtrace
#0 0x08048bbf in std::swap<int> (__a=@0x4, __b=@0xfffffffc)
at /usr/lib/gcc/i386-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_algobase.h:97
#1 0x08048c03 in std::__iter_swap<true>::iter_swap<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > > (__a={_M_current = 0x4}, __b=
{_M_current = 0xfffffffc})
at /usr/lib/gcc/i386-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_algobase.h:127
#2 0x08048c22 in std::iter_swap<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > > (__a={_M_current = 0x4}, __b=
{_M_current = 0xfffffffc})
at /usr/lib/gcc/i386-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_algobase.h:163
#3 0x08048c5f in std::__reverse<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > > (__first={_M_current = 0x4}, __last=
{_M_current = 0xfffffffc})
at /usr/lib/gcc/i386-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_algo.h:1586
#4 0x08048cc3 in std::reverse<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > > (__first={_M_current = 0x4}, __last=
{_M_current = 0x0})
at /usr/lib/gcc/i386-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_algo.h:1611
#5 0x080487bc in dummy_function () at iterbug.cpp:17
#6 0x0804891a in main () at iterbug.cpp:40
The first frame that belongs to our own code is
0x080487bc in dummy_function () at iterbug.cpp:17, so the error originates on line 17 of iterbug.cpp. Let's look at the code:
(gdb) l dummy_function()
2 #include <vector>
3 #include <iterator>
4 #include <algorithm>
5 using namespace std;
6
7 void dummy_function(void)
8 {
9 vector<int> coll1;
10 vector<int> coll2;
11
(gdb) l
12 /*
13 * RUNTIME ERROR:
14 * - beginning is behind the end of the range
15 */
16 vector<int>::iterator pos = coll1.begin();
17 reverse (++pos,coll1.end());
18
19
20 for ( int i=1; i<=9 ;++i )
21 coll2.push_back(i);
This confirms where the error is.
What remains is to fix the logic errors.
The examples above come from
http://www.upsdn.net/html/2006-11/775.html and The C++ Standard Library: A Tutorial and Reference, respectively.
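- Without a GDB debugging environment
Many embedded development environments cannot provide gdb, because its runtime footprint is too large. They usually do provide tools such as objdump, however, so we can obtain the stack information with backtrace and then use objdump to relate the addresses to the code.
See the description at
http://www.gnu.org/software/libc/manual/html_node/Debugging-Support.html#Debugging-Support
An excerpt follows: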
33 Debugging support
Applications are usually debugged using dedicated debugger programs. But sometimes this is not possible and, in any case, it is useful to provide the developer with as much information as possible at the time the problems are experienced. For this reason a few functions are provided which a program can use to help the developer more easily locate the problem.
33.1 Backtraces
A backtrace is a list of the function calls that are currently active in a thread. The usual way to inspect a backtrace of a program is to use an external debugger such as gdb. However, sometimes it is useful to obtain a backtrace programmatically from within a program, e.g., for the purposes of logging or diagnostics.
The header file execinfo.h declares three functions that obtain and manipulate backtraces of the current thread.
— Function: int backtrace (void **buffer, int size)
The backtrace function obtains a backtrace for the current thread, as a list of pointers, and places the information into buffer. The argument size should be the number of void * elements that will fit into buffer. The return value is the actual number of entries of buffer that are obtained, and is at most size.
The pointers placed in buffer are actually return addresses obtained by inspecting the stack, one return address per stack frame.
Note that certain compiler optimizations may interfere with obtaining a valid backtrace. Function inlining causes the inlined function to not have a stack frame; tail call optimization replaces one stack frame with another; frame pointer elimination will stop backtrace from interpreting the stack contents correctly.
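Because the return value is at most size, a result equal to size may mean that older frames were cut off. The following is a minimal sketch of a wrapper that reports this, assuming glibc's execinfo.h; the name capture_trace and the capacity MAX_FRAMES are hypothetical choices for illustration.

```c
#include <execinfo.h>

enum { MAX_FRAMES = 64 };  /* assumed buffer capacity, not from the manual */

/*
 * Fill frames[] with up to capacity return addresses.
 * Returns the number of entries stored and sets *truncated to 1
 * when the buffer was filled completely, in which case the oldest
 * frames may be missing.
 */
int capture_trace(void **frames, int capacity, int *truncated)
{
    int count = backtrace(frames, capacity);

    *truncated = (count == capacity);
    return count;
}
```

A caller would declare void *frames[MAX_FRAMES], pass it with MAX_FRAMES, and retry with a larger buffer if the truncated flag comes back set.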
— Function: char ** backtrace_symbols (void *const *buffer, int size)
The backtrace_symbols function translates the information obtained from the backtrace function into an array of strings. The argument buffer should be a pointer to an array of addresses obtained via the backtrace function, and size is the number of entries in that array (the return value of backtrace).
The return value is a pointer to an array of strings, which has size entries just like the array buffer. Each string contains a printable representation of the corresponding element of buffer. It includes the function name (if this can be determined), an offset into the function, and the actual return address (in hexadecimal).
Currently, the function name and offset can only be obtained on systems that use the ELF binary format for programs and libraries. On other systems, only the hexadecimal return address will be present. Also, you may need to pass additional flags to the linker to make the function names available to the program. (For example, on systems using GNU ld, you must pass -rdynamic.)
The return value of backtrace_symbols is a pointer obtained via the malloc function, and it is the responsibility of the caller to free that pointer. Note that only the return value need be freed, not the individual strings.
The return value is NULL if sufficient memory for the strings cannot be obtained.
— Function: void backtrace_symbols_fd (void *const *buffer, int size, int fd)
The backtrace_symbols_fd function performs the same translation as the backtrace_symbols function. Instead of returning the strings to the caller, it writes the strings to the file descriptor fd, one per line. It does not use the malloc function, and can therefore be used in situations where that function might fail.
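As a small illustration of the file-descriptor variant, the sketch below captures the current stack into an on-stack array and writes the symbolic form straight to a descriptor, with no heap allocation in the function itself. The name write_trace_to_fd and the depth TRACE_DEPTH are assumptions made for this example.

```c
#include <execinfo.h>
#include <unistd.h>

enum { TRACE_DEPTH = 32 };  /* assumed maximum depth for this sketch */

/*
 * Write the current call stack to fd, one frame per line.
 * Returns the number of frames written.
 */
int write_trace_to_fd(int fd)
{
    void *frames[TRACE_DEPTH];                 /* lives on the stack */
    int count = backtrace(frames, TRACE_DEPTH);

    backtrace_symbols_fd(frames, count, fd);   /* no strings to free */
    return count;
}
```

Calling write_trace_to_fd(STDERR_FILENO) prints the trace to standard error; any writable descriptor, such as a log file or a pipe, works the same way.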
The following program illustrates the use of these functions. Note that the array to contain the return addresses returned by backtrace is allocated on the stack. Therefore code like this can be used in situations where the memory handling via malloc does not work anymore (in which case the backtrace_symbols has to be replaced by a backtrace_symbols_fd call as well). The number of return addresses is normally not very large. Even complicated programs rather seldom have a nesting level of more than, say, 50 and with 200 possible entries probably all programs should be covered.
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

/* Obtain a backtrace and print it to stdout. */
void print_trace (void)
{
  void *array[10];
  size_t size;
  char **strings;
  size_t i;

  size = backtrace (array, 10);
  strings = backtrace_symbols (array, size);

  printf ("Obtained %zd stack frames.\n", size);

  for (i = 0; i < size; i++)
    printf ("%s\n", strings[i]);

  free (strings);
}

/* A dummy function to make the backtrace more interesting. */
void dummy_function (void)
{
  print_trace ();
}

int main (void)
{
  dummy_function ();
  return 0;
}
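With this description we have a rough idea of how the three backtrace functions are called and what they do.
For more detail, see the man page at
http://www.kernel.org/doc/man-pages/online/pages/man3/backtrace.3.html
Now return to the example given at
http://www.upsdn.net/html/2006-11/775.html, which we analyze with backtrace and objdump: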
[chen@localhost seg]$ cat -n backtrace.c
1 #include <stdio.h>
2 #include <execinfo.h>
3 #include <stdlib.h>
4 #include <signal.h>
5
6 /*
7 * A dummy function to make the backtrace more interesting.
8 */
9 void
10 dummy_function(void)
11 {
12 unsigned char *ptr = 0x00;
13 *ptr = 0x00;
14 }
15
16 void dump(int signo)
17 {
18 void *array[10];
19 size_t size;
20 char **strings;
21 size_t i;
22
23 size = backtrace(array, 10);
24 strings = backtrace_symbols(array,size);
25
26 printf("Obtained %zd stack frames.\n", size);
27
28
29 for ( i = 0; i < size ; ++i )
30 printf("%s\n",strings[i]);
31
32 free(strings);
33 exit(0);
34 }
35
36 int
37 main(void)
38 {
39 signal(SIGSEGV, &dump);
40 dummy_function();
41
42 return 0;
43 }
Again the error is on line 13.
Compile and run:
gcc -g -rdynamic backtrace.c -o backtrace
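The man page notes that programs using the backtrace functions should be built with -rdynamic; the flag is what makes function names such as dump and main available to backtrace_symbols.
[chen@localhost seg]$ ./backtrace
Obtained 5 stack frames.
./backtrace(dump+0x19) [0x80486c2]
[0x192420]
./backtrace(main+0x2a) [0x8048756]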
/lib/libc.so.6(__libc_start_main+0xdc) [0x3e1dec]
./backtrace [0x80485e1]
This prints the call stack.
Now use objdump to find out what addresses such as 0x8048756 correspond to:
objdump -d -S backtrace > backtrace.dump
Open backtrace.dump and search for the hexadecimal addresses above:
backtrace: file format elf32-i386
Disassembly of section .init:
...
08048694 <dummy_function>:
* A dummy function to make the backtrace more interesting.
*/
void
dummy_function(void)
{
8048694: 55 push %ebp
8048695: 89 e5 mov %esp,%ebp
8048697: 83 ec 10 sub $0x10,%esp
unsigned char *ptr = 0x00;
804869a: c7 45 fc 00 00 00 00 movl $0x0,0xfffffffc(%ebp)
*ptr = 0x00;
80486a1: 8b 45 fc mov 0xfffffffc(%ebp),%eax
80486a4: c6 00 00 movb $0x0,(%eax)
}
80486a7: c9 leave
80486a8: c3 ret
080486a9 <dump>:
void dump(int signo)
{
80486a9: 55 push %ebp
80486aa: 89 e5 mov %esp,%ebp
80486ac: 83 ec 48 sub $0x48,%esp
void *array[10];
size_t size;
char **strings;
size_t i;
size = backtrace(array, 10);
80486af: c7 44 24 04 0a 00 00 movl $0xa,0x4(%esp)
80486b6: 00
80486b7: 8d 45 cc lea 0xffffffcc(%ebp),%eax
80486ba: 89 04 24 mov %eax,(%esp)
80486bd: e8 c6 fe ff ff call 8048588 <backtrace@plt>
80486c2: 89 45 f4 mov %eax,0xfffffff4(%ebp)
strings = backtrace_symbols(array,size);
80486c5: 8b 45 f4 mov 0xfffffff4(%ebp),%eax
80486c8: 89 44 24 04 mov %eax,0x4(%esp)
80486cc: 8d 45 cc lea 0xffffffcc(%ebp),%eax
80486cf: 89 04 24 mov %eax,(%esp)
80486d2: e8 91 fe ff ff call 8048568 <backtrace_symbols@plt>
80486d7: 89 45 f8 mov %eax,0xfffffff8(%ebp)
printf("Obtained %zd stack frames.\n", size);
80486da: 8b 45 f4 mov 0xfffffff4(%ebp),%eax
80486dd: 89 44 24 04 mov %eax,0x4(%esp)
80486e1: c7 04 24 40 88 04 08 movl $0x8048840,(%esp)
80486e8: e8 8b fe ff ff call 8048578 <printf@plt>
for ( i = 0; i < size ; ++i )
80486ed: c7 45 fc 00 00 00 00 movl $0x0,0xfffffffc(%ebp)
80486f4: eb 17 jmp 804870d <dump+0x64>
printf("%s\n",strings[i]);
80486f6: 8b 45 fc mov 0xfffffffc(%ebp),%eax
80486f9: c1 e0 02 shl $0x2,%eax
80486fc: 03 45 f8 add 0xfffffff8(%ebp),%eax
80486ff: 8b 00 mov (%eax),%eax
8048701: 89 04 24 mov %eax,(%esp)
8048704: e8 8f fe ff ff call 8048598 <puts@plt>
8048709: 83 45 fc 01 addl $0x1,0xfffffffc(%ebp)
804870d: 8b 45 fc mov 0xfffffffc(%ebp),%eax
8048710: 3b 45 f4 cmp 0xfffffff4(%ebp),%eax
8048713: 72 e1 jb 80486f6 <dump+0x4d>
free(strings);
8048715: 8b 45 f8 mov 0xfffffff8(%ebp),%eax
8048718: 89 04 24 mov %eax,(%esp)
804871b: e8 38 fe ff ff call 8048558 <free@plt>
exit(0);
8048720: c7 04 24 00 00 00 00 movl $0x0,(%esp)
8048727: e8 7c fe ff ff call 80485a8 <exit@plt>
0804872c <main>:
}
int
main(void)
{
804872c: 8d 4c 24 04 lea 0x4(%esp),%ecx
8048730: 83 e4 f0 and $0xfffffff0,%esp
8048733: ff 71 fc pushl 0xfffffffc(%ecx)
8048736: 55 push %ebp
8048737: 89 e5 mov %esp,%ebp
8048739: 51 push %ecx
804873a: 83 ec 14 sub $0x14,%esp
signal(SIGSEGV, &dump);
804873d: c7 44 24 04 a9 86 04 movl $0x80486a9,0x4(%esp)
8048744: 08
8048745: c7 04 24 0b 00 00 00 movl $0xb,(%esp)
804874c: e8 d7 fd ff ff call 8048528 <signal@plt>
dummy_function();
8048751: e8 3e ff ff ff call 8048694 <dummy_function>
return 0;
8048756: b8 00 00 00 00 mov $0x0,%eax
}
...
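The addresses from the backtrace output (highlighted in red in the original post) can be found in the listing above: 80486c2 inside dump and 8048756 inside main.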
[chen@localhost seg]$ ./backtrace
Obtained 5 stack frames.
./backtrace(dump+0x19) [0x80486c2]
[0x192420]
./backtrace(main+0x2a) [0x8048756]
/lib/libc.so.6(__libc_start_main+0xdc) [0x3e1dec]
./backtrace [0x80485e1]
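Let's analyze these addresses.
The topmost frame, ./backtrace(dump+0x19) [0x80486c2], is the return address of the backtrace call inside dump.
./backtrace(main+0x2a) [0x8048756] is the last location in our own code before the fault.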
8048756: b8 00 00 00 00 mov $0x0,%eax
This instruction corresponds to return 0. Since 8048756 is a return address, the call that faulted is the instruction just before it:
8048751: e8 3e ff ff ff call 8048694 <dummy_function>
So even without gdb we can determine that the segmentation fault occurred inside dummy_function, although this output alone does not tell us which line.
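The faulting instruction is missing from the trace because it is not a return address on the stack. One possible way to recover it, sketched below under several assumptions, is to install the handler with SA_SIGINFO and read the saved program counter from the ucontext argument. The register names REG_RIP and REG_EIP are glibc-specific and only cover x86; on other platforms this sketch falls back to a null address. The names fault_pc, on_fault, install_fault_handler, fault_child_exit_code and FAULT_EXIT_CODE are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE           /* glibc: exposes REG_RIP / REG_EIP */
#endif
#include <execinfo.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <ucontext.h>
#include <unistd.h>

enum { FAULT_EXIT_CODE = 43 };  /* assumed value for this sketch */

/* Extract the saved program counter; NULL where unsupported. */
static void *fault_pc(void *context)
{
#if defined(__x86_64__) && defined(REG_RIP)
    ucontext_t *uc = context;
    return (void *)(uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
#elif defined(__i386__) && defined(REG_EIP)
    ucontext_t *uc = context;
    return (void *)(uintptr_t)uc->uc_mcontext.gregs[REG_EIP];
#else
    (void)context;
    return NULL;
#endif
}

/* Handler: print the faulting instruction address, then terminate. */
static void on_fault(int signo, siginfo_t *info, void *context)
{
    void *pc = fault_pc(context);

    (void)signo;
    (void)info;
    backtrace_symbols_fd(&pc, 1, STDERR_FILENO);  /* one "frame": the pc */
    _exit(FAULT_EXIT_CODE);
}

/* Install the three-argument handler. Returns 0 on success. */
int install_fault_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Demonstration: fault in a child and return its exit code (-1 on error). */
int fault_child_exit_code(void)
{
    pid_t pid = fork();
    int status = 0;

    if (pid < 0)
        return -1;

    if (pid == 0) {
        unsigned char *volatile ptr = NULL;

        if (install_fault_handler() != 0)
            _exit(1);
        *ptr = 0x00;          /* the faulting store */
        _exit(0);
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

On x86 the printed address should fall inside the function that performed the bad access, and it can be looked up in the objdump -d -S listing in the same way as the return addresses above.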
For readers who want to dig further, these references are recommended:
http://www.upsdn.net/html/2006-11/775.html
http://www.gnu.org/software/libc/manual/html_node/Debugging-Support.html#Debugging-Support
http://www.kernel.org/doc/man-pages/online/pages/man3/backtrace.3.html
<End>