Problem description
What would cause _mm_setzero_si128() to SIGSEGV?
Possible Duplicate: Qt, GCC, SSE and stack alignment
I am converting a simulator from TinyPTC to WxWidgets. Some graphics routines are optimized with SSE intrinsics. During the initialization of the GUI, the initial state is rendered once, and all of the SSE routines work perfectly. However, if I call them later from an event handler, I get a SIGSEGV.
At first I thought those were some weird alignment issues, but it even happens for:
__m128i zero = _mm_setzero_si128();
When I replace the SSE routines with non-optimized code, everything works fine.
I suppose the event handling happens in a different thread than the initialization. Is there anything to watch out for when using SSE from different threads? What else could possibly cause this behavior?
The SIGSEGV happens at a movdqa %xmm0, -40(%ebp) instruction (there are several of those). If I compile with -O1, the movdqa instructions are completely optimized away, and the program runs fine. It seems to be an alignment issue with the stack after all, as already pointed out in the comments.
Here is the command CodeLite generates for compilation:
g++ -c "x:/some/folder/sse.cpp" -g -O1 -Wall -std=gnu++0x -msse3
-mthreads -DHAVE_W32API_H -D__WXMSW__ -D__WXDEBUG__ -D_UNICODE
-ID:\CodeLite\wxWidgets\lib\gcc_dll\mswud -ID:\CodeLite\wxWidgets\include
-DWXUSINGDLL -Wno-ctor-dtor-privacy -pipe -fmessage-length=0 -o ./Debug/sse.o -I.
Anything unusual? Is it possible that WxWidgets changes the alignment settings somewhere?
Reference solutions
Method 1:
Your stack pointer is probably misaligned. Aligned SSE loads and stores such as movdqa require their memory operands to be 16-byte aligned. The fault isn't in _mm_setzero_si128 itself, which just zeroes an SSE register, but in the instruction that the compiler generated to store that register back into memory on the stack.
First, make sure you're not using an outdated version of GCC (older versions had issues with stack alignment with SSE). Then try adding the -mstackrealign option for that translation unit, which forcibly realigns the stack to 16-byte alignment on function entry (at a very small runtime cost).
See Volume 2B, page 4-67 of the Intel Architectures Software Developer Manuals for more details on the movdqa instruction and the exact conditions under which it can generate exceptions.
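To illustrate the alignment condition in code you control, the sketch below contrasts the aligned store intrinsic _mm_store_si128 (usually emitted as movdqa) with the unaligned one, _mm_storeu_si128 (usually movdqu). The helper names is_aligned16 and store_zero16 are hypothetical. This does not change the stores the compiler itself generates for stack temporaries, which is the case in the question.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if the address is a multiple of 16.
bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical helper: writes 16 zero bytes at dst, choosing a store
// that is valid for the alignment dst actually has.
void store_zero16(unsigned char* dst) {
    assert(dst != nullptr);
    const __m128i zero = _mm_setzero_si128();
    if (is_aligned16(dst)) {
        // Aligned store: faults if dst is not 16-byte aligned.
        _mm_store_si128(reinterpret_cast<__m128i*>(dst), zero);
    } else {
        // Unaligned store: accepts any address.
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst), zero);
    }
}
```

In practice, calling _mm_storeu_si128 unconditionally is often simpler; the branch is here only to make the two cases visible.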
Method 2:
AFAIK, wxWidgets event handling runs in the main thread (the GUI thread). You should be able to confirm this by running in a debugger. The debugger should also give some hints as to where the segmentation fault occurs.
Method 3:
You may have a bug in the SSE routines. SSE instructions write data in larger blocks. It's possible that you are overrunning the end of the array when zeroing it out with SSE. For example, check whether the size of the zeroed-out array is not a multiple of 16 bytes, the width of one SSE store. If so, you may want to handle the odd end of the array with non-optimized instructions.
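A minimal sketch of that tail handling, assuming a hypothetical helper zero_bytes (not taken from the question's code): full 16-byte blocks are cleared with SSE, and whatever is left over is cleared byte by byte so nothing is written past the end.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstddef>

// Hypothetical helper: zero exactly n bytes starting at p.
void zero_bytes(unsigned char* p, std::size_t n) {
    assert(p != nullptr || n == 0);
    const __m128i zero = _mm_setzero_si128();
    std::size_t i = 0;
    // Full 16-byte blocks. The unaligned store does not require p
    // to be 16-byte aligned.
    for (; i + 16 <= n; i += 16) {
        _mm_storeu_si128(reinterpret_cast<__m128i*>(p + i), zero);
    }
    // Remaining 0 to 15 bytes, one at a time.
    for (; i < n; ++i) {
        p[i] = 0;
    }
}
```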
(by fredoverflow、Adam Rosenfield、ravenspoint、Rafael Baptista)