How to remove "noise" from GCC/clang assembly output?
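I want to inspect the assembly output of applying boost::variant in my code in order to see which intermediate calls are optimized away.
When I compile the following example (with GCC 5.3, using g++ -O3 -std=c++14 -S), it seems as if the compiler optimizes everything away and directly returns 100: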
(...)
main:
.LFB9320:
.cfi_startproc
movl $100, %eax
ret
.cfi_endproc
(...)
#include <boost/variant.hpp>

struct Foo
{
    int get() { return 100; }
};

struct Bar
{
    int get() { return 999; }
};

using Variant = boost::variant<Foo, Bar>;

int run(Variant v)
{
    return boost::apply_visitor([](auto& x){return x.get();}, v);
}

int main()
{
    Foo f;
    return run(f);
}
However, the full assembly output contains a lot more than the excerpt above. Is there a way to tell GCC/clang to remove all that "noise" and output only what is actually called when the program runs?
Full assembly output:
.file "main1.cpp"
.section .rodata.str1.8,"aMS",@progbits,1
.align 8
.LC0:
.string "/opt/boost/include/boost/variant/detail/forced_return.hpp"
.section .rodata.str1.1,"aMS",@progbits,1
.LC1:
.string "false"
.section .text.unlikely._ZN5boost6detail7variant13forced_returnIvEET_v,"axG",@progbits,_ZN5boost6detail7variant13forced_returnIvEET_v,comdat
.LCOLDB2:
.section .text._ZN5boost6detail7variant13forced_returnIvEET_v,"axG",@progbits,_ZN5boost6detail7variant13forced_returnIvEET_v,comdat
.LHOTB2:
.p2align 4,,15
.weak _ZN5boost6detail7variant13forced_returnIvEET_v
.type _ZN5boost6detail7variant13forced_returnIvEET_v, @function
_ZN5boost6detail7variant13forced_returnIvEET_v:
.LFB1197:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
movl $_ZZN5boost6detail7variant13forced_returnIvEET_vE19__PRETTY_FUNCTION__, %ecx
movl $49, %edx
movl $.LC0, %esi
movl $.LC1, %edi
call __assert_fail
.cfi_endproc
.LFE1197:
.size _ZN5boost6detail7variant13forced_returnIvEET_v, .-_ZN5boost6detail7variant13forced_returnIvEET_v
.section .text.unlikely._ZN5boost6detail7variant13forced_returnIvEET_v,"axG",@progbits,_ZN5boost6detail7variant13forced_returnIvEET_v,comdat
.LCOLDE2:
.section .text._ZN5boost6detail7variant13forced_returnIvEET_v,"axG",@progbits,_ZN5boost6detail7variant13forced_returnIvEET_v,comdat
.LHOTE2:
.section .text.unlikely._ZN5boost6detail7variant13forced_returnIiEET_v,"axG",@progbits,_ZN5boost6detail7variant13forced_returnIiEET_v,comdat
.LCOLDB3:
.section .text._ZN5boost6detail7variant13forced_returnIiEET_v,"axG",@progbits,_ZN5boost6detail7variant13forced_returnIiEET_v,comdat
.LHOTB3:
.p2align 4,,15
.weak _ZN5boost6detail7variant13forced_returnIiEET_v
.type _ZN5boost6detail7variant13forced_returnIiEET_v, @function
_ZN5boost6detail7variant13forced_returnIiEET_v:
.LFB9757:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
movl $_ZZN5boost6detail7variant13forced_returnIiEET_vE19__PRETTY_FUNCTION__, %ecx
movl $39, %edx
movl $.LC0, %esi
movl $.LC1, %edi
call __assert_fail
.cfi_endproc
.LFE9757:
.size _ZN5boost6detail7variant13forced_returnIiEET_v, .-_ZN5boost6detail7variant13forced_returnIiEET_v
.section .text.unlikely._ZN5boost6detail7variant13forced_returnIiEET_v,"axG",@progbits,_ZN5boost6detail7variant13forced_returnIiEET_v,comdat
.LCOLDE3:
.section .text._ZN5boost6detail7variant13forced_returnIiEET_v,"axG",@progbits,_ZN5boost6detail7variant13forced_returnIiEET_v,comdat
.LHOTE3:
.section .text.unlikely,"ax",@progbits
.LCOLDB4:
.text
.LHOTB4:
.p2align 4,,15
.globl _Z3runN5boost7variantI3FooJ3BarEEE
.type _Z3runN5boost7variantI3FooJ3BarEEE, @function
_Z3runN5boost7variantI3FooJ3BarEEE:
.LFB9310:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
movl (%rdi), %eax
cltd
xorl %edx, %eax
cmpl $19, %eax
ja .L7
jmp *.L9(,%rax,8)
.section .rodata
.align 8
.align 4
.L9:
.quad .L30
.quad .L10
.quad .L7
.quad .L7
.quad .L7
.quad .L7
.quad .L7
.quad .L7
.quad .L7
.quad .L7
.quad .L7
.quad .L7
.quad .L7
.quad .L7
.quad .L7
.quad .L7
.quad .L7
.quad .L7
.quad .L7
.quad .L7
.text
.p2align 4,,10
.p2align 3
.L7:
call _ZN5boost6detail7variant13forced_returnIiEET_v
.p2align 4,,10
.p2align 3
.L30:
movl $100, %eax
.L8:
addq $8, %rsp
.cfi_remember_state
.cfi_def_cfa_offset 8
ret
.p2align 4,,10
.p2align 3
.L10:
.cfi_restore_state
movl $999, %eax
jmp .L8
.cfi_endproc
.LFE9310:
.size _Z3runN5boost7variantI3FooJ3BarEEE, .-_Z3runN5boost7variantI3FooJ3BarEEE
.section .text.unlikely
.LCOLDE4:
.text
.LHOTE4:
.globl _Z3runN5boost7variantI3FooI3BarEEE
.set _Z3runN5boost7variantI3FooI3BarEEE,_Z3runN5boost7variantI3FooJ3BarEEE
.section .text.unlikely
.LCOLDB5:
.section .text.startup,"ax",@progbits
.LHOTB5:
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB9320:
.cfi_startproc
movl $100, %eax
ret
.cfi_endproc
.LFE9320:
.size main, .-main
.section .text.unlikely
.LCOLDE5:
.section .text.startup
.LHOTE5:
.section .rodata
.align 32
.type _ZZN5boost6detail7variant13forced_returnIvEET_vE19__PRETTY_FUNCTION__, @object
.size _ZZN5boost6detail7variant13forced_returnIvEET_vE19__PRETTY_FUNCTION__, 58
_ZZN5boost6detail7variant13forced_returnIvEET_vE19__PRETTY_FUNCTION__:
.string "T boost::detail::variant::forced_return() [with T = void]"
.align 32
.type _ZZN5boost6detail7variant13forced_returnIiEET_vE19__PRETTY_FUNCTION__, @object
.size _ZZN5boost6detail7variant13forced_returnIiEET_vE19__PRETTY_FUNCTION__, 57
_ZZN5boost6detail7variant13forced_returnIiEET_vE19__PRETTY_FUNCTION__:
.string "T boost::detail::variant::forced_return() [with T = int]"
.ident "GCC: (Ubuntu 5.3.0-3ubuntu1~14.04) 5.3.0 20151204"
.section .note.GNU-stack,"",@progbits
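Stripping out the .cfi directives, unused labels, and comment lines is a solved problem: the scripts behind Matt Godbolt's compiler explorer are open source on its GitHub project. It can even do colour highlighting to match source lines to asm lines (using the debug information).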
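You can set it up locally so you can feed it files that are part of your project, with all the #include paths and so on (using -I/...). That way you can use it on private source code that you don't want to send out over the Internet.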
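Matt Godbolt's CppCon2017 talk “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid” shows how to use it (it's fairly self-explanatory, but has some neat features if you read the docs on github), and also how to read x86 asm, with a gentle introduction to x86 asm itself for total beginners and to looking at compiler output. He goes on to show some neat compiler optimizations (e.g. for dividing by a constant), and what kind of functions give useful asm output for looking at optimized compiler output (function args, not int a = 123;).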
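As a small sketch of that last point (the function names here are made up for illustration, not taken from the talk): taking the value as a function argument keeps the interesting code in the output, whereas a local constant gets folded away at compile time. With optimization enabled, GCC and clang typically implement the division with a multiply and shift rather than a div instruction.

```cpp
// Hypothetical names, for illustration only.

// Uninteresting: the input is a compile-time constant, so an optimizing
// compiler folds the whole body down to a constant return value.
unsigned folded() {
    unsigned a = 123;
    return a / 10;
}

// Interesting: the input arrives as an argument, so the asm has to show
// how the compiler actually divides by 10.
unsigned div_by_10(unsigned x) {
    return x / 10;
}
```

Compile it with something like g++ -O3 -S -o- and compare the two function bodies.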
With plain gcc/clang (not g++), -fno-asynchronous-unwind-tables avoids .cfi directives. Possibly also useful: -fno-exceptions -fno-rtti -masm=intel. Make sure to omit -g.
Copy/paste this for local use:
g++ -fno-asynchronous-unwind-tables -fno-exceptions -fno-rtti -fverbose-asm \
-Wall -Wextra foo.cpp -O3 -masm=intel -S -o- | less
But really, I'd recommend just using Godbolt directly (online or set it up locally)! You can quickly flip between versions of gcc and clang to see if old or new compilers do something dumb. (Or what ICC does, or even what MSVC does.) There's even ARM / ARM64 gcc 6.3, and various gcc for PowerPC, MIPS, AVR, MSP430. (It can be interesting to see what happens on a machine where int is wider than a register, or isn't 32-bit. Or on a RISC vs. x86.)
For C instead of C++, use -xc -std=gnu11 or something; the compiler explorer site only provides g++ / clang++, not gcc / clang. (Or you can use C mode in the language dropdown, but that has a different selection of compilers that's mostly more limited. And it resets your source pane, so it's more of an ordeal to flip between C and C++.)
Useful compiler options for making asm for human consumption:
- Remember, your code only has to compile, not link: passing a pointer to an external function like void ext(int*p) is a good way to stop something from optimizing away. You only need a prototype for it, with no definition, so the compiler can't inline it or make any assumptions about what it does.
- I'd recommend using -O3 -Wall -Wextra -fverbose-asm -march=haswell for looking at code. (-fverbose-asm can just make the source look noisy, though, when all you get are numbered temporaries as names for the operands.) When you're fiddling with the source to see how it changes the asm, you definitely want compiler warnings enabled. You don't want to waste time scratching your head over the asm when the explanation is that you did something that deserves a warning in the source.
- To see how the calling convention works, you often want to look at caller and callee without inlining. You can use __attribute__((noinline,noclone)) foo_t foo(bar_t x) { ... } on a definition, or compile with gcc -O3 -fno-inline-functions -fno-inline-functions-called-once -fno-inline-small-functions to disable inlining. (But those command line options don't disable cloning a function for constant-propagation.) See "From compiler perspective, how is reference for array dealt with, and, why passing by value(not decay) is not allowed?" for an example.
- Or if you just want to see how functions pass / receive args of different types, you could use different names but the same prototype so the compiler doesn't have a definition to inline. This works with any compiler.
- -ffast-math will get many libm functions to inline, some to a single instruction (esp. with SSE4 available for roundsd). Some will inline with just -fno-math-errno, or other "safer" parts of -ffast-math, without the parts that allow the compiler to round differently. If you have FP code, definitely look at it with/without -ffast-math. If you can't safely enable any of -ffast-math in your regular build, maybe you'll get an idea for a safe change you can make in the source to allow the same optimization without -ffast-math.
- -O3 -fno-tree-vectorize will optimize without auto-vectorizing, so you can get full optimization without it if you want to compare with -O2 (which doesn't enable autovectorization on gcc, but does on clang).
- clang unrolls loops by default, so -fno-unroll-loops can be useful in complex functions. You can get a sense of "what the compiler did" without having to wade through the unrolled loops. (gcc enables -funroll-loops with -fprofile-use, but not with -O3.) (This is a suggestion for human-readable code, not for code that would run faster.)
- Definitely enable some level of optimization, unless you specifically want to know what -O0 did. Its "predictable debug behaviour" requirement makes the compiler store/reload everything between every C statement, so you can modify C variables with a debugger and even "jump" to a different source line within the same function, and have execution continue as if you did that in the C source. -O0 output is so noisy with stores/reloads (and so slow) not just from lack of optimization, but from forced de-optimization to support debugging.
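A minimal sketch of functions that are worth compiling several times with the flags from the list above (the names are hypothetical). With GCC on x86-64, sqrt_of typically compiles to a sqrtsd plus a fallback call to sqrt for setting errno, and adding -fno-math-errno usually removes that fallback; scale_all lets you compare -O3 output with and without -fno-tree-vectorize.

```cpp
#include <cmath>

// Hypothetical test function: compare the asm with and without
// -fno-math-errno (or the full -ffast-math).
double sqrt_of(double x) {
    return std::sqrt(x);
}

// Hypothetical test function: compare -O3 against -O3 -fno-tree-vectorize
// to see the plain scalar loop next to the auto-vectorized one.
void scale_all(float *data, int n, float factor) {
    for (int i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}
```

Both functions take their inputs as arguments, so nothing here can be constant-folded away before you get to look at it.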
To get a mix of source and asm, use gcc -Wa,-adhln -c -g foo.c | less to pass extra options to as. (More discussion of this in a blog post, and another blog.) Note that the output of this isn't valid assembler input, because the C source is there directly, not as an assembler comment. So don't call it a .s. A .lst might make sense if you want to save it to a file.
Godbolt's color highlighting serves a similar purpose, and is great at helping you see when multiple non-contiguous asm instructions come from the same source line. I haven't used that gcc listing command at all, so IDK how well it does, and how easy it is for the eye to see, in that case.
I like the high code density of godbolt's asm pane, so I don't think I'd like having source lines mixed in. At least not for simple functions. Maybe with a function that was too complex to get a handle on the overall structure of what the asm does...
And remember, when you want to just look at the asm, leave out the main() and the compile-time constants. You want to see the code for dealing with a function arg in a register, not the code after constant-propagation turns it into return 42, or at least optimizes away some stuff.
Removing static and/or inline from functions will produce a stand-alone definition for them, as well as a definition for any callers, so you can just look at that.
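A short sketch of the difference (hypothetical names): at -O3, GCC and clang will usually inline the static helper into its caller and emit no separate code for it, while the helper with external linkage keeps a stand-alone definition you can read on its own.

```cpp
// Internal linkage: once it has been inlined into every caller, the compiler
// has no reason to emit a separate definition, so it usually vanishes.
static int triple_hidden(int x) {
    return x * 3;
}

// External linkage: another translation unit might call this, so a
// stand-alone definition is normally emitted even if it is also inlined.
int triple_visible(int x) {
    return x * 3;
}

// Both helpers are likely inlined here; only triple_visible also appears
// as its own function in the asm output.
int caller(int x) {
    return triple_hidden(x) + triple_visible(x);
}
```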
Don't put your code in a function called main(). gcc knows that main is special and assumes it will only be called once, so it marks it as "cold" and optimizes it less.
The other thing you can do: if you did make a main(), you can run it and use a debugger. stepi (si) steps by instruction. See the bottom of the x86 tag wiki for instructions. But remember that code might optimize away after inlining into main with compile-time-constant args.
__attribute__((noinline)) may help, on a function that you want to not be inlined. gcc will also make constant-propagation clones of functions, i.e. a special version with one of the args as a constant, for call-sites that know they're passing a constant. The symbol name will be .clone.foo.constprop_1234 or something in the asm output. You can use __attribute__((noclone)) to disable that, too.
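A minimal sketch of that, assuming GCC (the function names are hypothetical, and clang may warn that it does not know noclone): the attributes keep a real call visible in the caller even though the argument is a compile-time constant.

```cpp
// noinline keeps the call in the caller; noclone stops GCC from creating a
// constant-propagated copy specialised for one particular argument value.
__attribute__((noinline, noclone))
int times_six(int x) {
    return x * 6;
}

// Without the attributes this would typically collapse to a constant return;
// with them, the asm should show the argument being set up and a call.
int call_with_constant() {
    return times_six(7);
}
```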
For example
If you want to see how the compiler multiplies two integers: I put the following code on the Godbolt compiler explorer to get the asm (from gcc -O3 -march=haswell -fverbose-asm) for the wrong way and the right way to test this.
// the wrong way, which people often write when they're used to creating a runnable test-case with a main() and a printf
// or worse, people will actually look at the asm for such a main()
int constants() { int a = 10, b = 20; return a * b; }
mov eax, 200 #,
ret # compiles the same as return 200; not interesting
// the right way: compiler doesn't know anything about the inputs
// so we get asm like what would happen when this inlines into a bigger function.
int variables(int a, int b) { return a * b; }
mov eax, edi # D.2345, a
imul eax, esi # D.2345, b
ret
(This mix of asm and C was hand-crafted by copy-pasting the asm output from godbolt into the right place. I find it's a good way to show how a short function compiles in SO answers / compiler bug reports / emails.)
You can always look at the generated assembly from the object file, instead of using the compiler's assembly output. objdump comes to mind.
You can even tell objdump to intermix source with assembly, making it easier to figure out what source line corresponds to what instructions. Example session:
$ cat test.cc
int foo(int arg)
{
return arg + 1;
}
$ g++ -g -O3 -std=c++14 -c test.cc -o test.o && objdump -dS -M intel test.o
test.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <_Z3fooi>:
int foo(int arg)
{
return arg + 1;
0: 8d 47 01 lea eax,[rdi+0x1]
}
3: c3 ret
Explanation of objdump flags:
- -d disassembles all executable sections
- -S intermixes assembly with source (-g required while compiling with g++)
- -M intel chooses intel syntax over ugly AT&T syntax (optional)
I like to insert labels that I can easily grep out of the objdump output.
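void do_something();  // the code under inspection, defined elsewhere

int main() {
    // %= expands to a number unique to each asm statement instance
    asm volatile ("interesting_part_begin%=:":);
    do_something();
    asm volatile ("interesting_part_end%=:":);
}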
I haven't had a problem with this yet, but asm volatile can be very hard on a compiler's optimizer because it tends to leave such code untouched.
ReferenceURL : https://stackoverflow.com/questions/38552116/how-to-remove-noise-from-gcc-clang-assembly-output