REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1303
Today's code generates assembly code as below for
InternalSyncIncrement:
__asm__ __volatile__ (
"movl $1, %%eax \n\t"
"lock \n\t"
"xadd %%eax, %1 \n\t"
"inc %%eax \n\t"
: "=a" (Result), // %0
"+m" (*Value) // %1
: // no inputs that aren't also outputs
: "memory",
"cc"
);
0: 55 pushl %ebp
1: 89 e5 movl %esp, %ebp
3: 8b 45 08 movl 8(%ebp), %eax
6: b8 01 00 00 00 movl $1, %eax
b: f0 lock
c: 0f c1 00 xaddl %eax, _InternalSyncIncrement(%eax)
f: 40 incl %eax
10: 5d popl %ebp
11: c3 retl
Line #3 and Line #6 both use EAX as destination register.
Line #c uses EAX and (EAX).
The output operand "=a" tells GCC that EAX is used for output.
But GCC only assumes that EAX will be used in the very last
instruction.
Per GCC document,
"Use the '&' constraint modifier on all output operands that must
not overlap an input. Otherwise, GCC may allocate the output
operand in the same register as an unrelated input operand, on
the assumption that the assembler code consumes its inputs before
producing outputs. This assumption may be false if the assembler
code actually consists of more than one instruction."
"=&a" should be used to tell GCC not use EAX before the assembly.
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Ruiyu Ni <ruiyu.ni@intel.com>
Cc: Liming Gao <liming.gao@intel.com>
Cc: Michael Kinney <michael.d.kinney@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Fixes: 8a94eb9283
Fixes: 17634d026f
The IA32 variant of InternalSyncCompareExchange64() is correct, but we can
simplify it. We don't need to load the lower 32 bits of ExchangeValue into
EBX in two steps (first into a general register, then into EBX); we can
ask GCC to populate EBX like that itself.
Cc: Liming Gao <liming.gao@intel.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1208
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Michael D Kinney <michael.d.kinney@intel.com>
Reviewed-by: Liming Gao <liming.gao@intel.com>
(This patch is identical to the last one, except for the
InternalSyncCompareExchange16() -> InternalSyncCompareExchange32() and
"cmpxchgw" -> "cmpxchgl" replacements.)
The CMPXCHG instruction has the following operands:
- AX (implicit, CompareValue): input and output
- destination operand (*Value): input and output
- source operand (ExchangeValue): input
The IA32 version of InternalSyncCompareExchange32() correctly marks
CompareValue as input/output, but it marks (*Value) only as input.
The X64 version of InternalSyncCompareExchange32() attempts to mark both
CompareValue and (*Value) as input/output, but it doesn't use the
appropriate constraints for either operand.
Fix these issues. Furthermore, prefer the short "+" constraint for I/O
operands over the <output-operand-number> constraint that can be applied
to the input instances of I/O operands.
Cc: Liming Gao <liming.gao@intel.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1208
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Michael D Kinney <michael.d.kinney@intel.com>
Reviewed-by: Liming Gao <liming.gao@intel.com>
The CMPXCHG instruction has the following operands:
- AX (implicit, CompareValue): input and output
- destination operand (*Value): input and output
- source operand (ExchangeValue): input
The IA32 version of InternalSyncCompareExchange16() correctly marks
CompareValue as input/output, but it marks (*Value) only as input.
The X64 version of InternalSyncCompareExchange16() attempts to mark both
CompareValue and (*Value) as input/output, but it doesn't use the
appropriate constraints for either operand.
Fix these issues. Furthermore, prefer the short "+" constraint for I/O
operands over the <output-operand-number> constraint that can be applied
to the input instances of I/O operands.
Cc: Liming Gao <liming.gao@intel.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1208
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Michael D Kinney <michael.d.kinney@intel.com>
Reviewed-by: Liming Gao <liming.gao@intel.com>
The "GccInline.c" files have some inconsistent whitespace, and missing (or
incorrect) operand comments. Fix and unify them.
This patch doesn't change behavior.
Cc: Liming Gao <liming.gao@intel.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1208
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Michael D Kinney <michael.d.kinney@intel.com>
Reviewed-by: Liming Gao <liming.gao@intel.com>
Currently, "gcc-4.8.5-28.el7_5.1.x86_64" generates the following code for
me, from the XADD inline assembly added to "X64/GccInline.c" in commit
17634d026f96:
> 0000000000004383 <InternalSyncIncrement>:
> UINT32
> EFIAPI
> InternalSyncIncrement (
> IN volatile UINT32 *Value
> )
> {
> 4383: 55 push %rbp
> 4384: 48 89 e5 mov %rsp,%rbp
> 4387: 48 83 ec 10 sub $0x10,%rsp
> 438b: 48 89 4d 10 mov %rcx,0x10(%rbp)
> UINT32 Result;
>
> __asm__ __volatile__ (
> 438f: 48 8b 55 10 mov 0x10(%rbp),%rdx
> 4393: 48 8b 45 10 mov 0x10(%rbp),%rax
> 4397: b8 01 00 00 00 mov $0x1,%eax
> 439c: f0 0f c1 00 lock xadd %eax,(%rax)
> 43a0: ff c0 inc %eax
> 43a2: 89 45 fc mov %eax,-0x4(%rbp)
> : "m" (*Value) // %2
> : "memory",
> "cc"
> );
>
> return Result;
> 43a5: 8b 45 fc mov -0x4(%rbp),%eax
> }
> 43a8: c9 leaveq
> 43a9: c3 retq
>
The MOV $0X1,%EAX instruction corrupts the address of Value in %RAX before
we reach the XADD instruction. In fact, it makes no sense for XADD to use
%EAX as source operand and (%RAX) as destination operand at the same time.
The XADD instruction's destination operand is a read-write operand. The
GCC documentation states:
> The ordinary output operands must be write-only; GCC will assume that
> the values in these operands before the instruction are dead and need
> not be generated. Extended asm supports input-output or read-write
> operands. Use the constraint character `+' to indicate such an operand
> and list it with the output operands. You should only use read-write
> operands when the constraints for the operand (or the operand in which
> only some of the bits are to be changed) allow a register.
(The above is intentionally quoted from the oldest GCC release that edk2
supports, namely gcc-4.4:
<https://gcc.gnu.org/onlinedocs/gcc-4.4.7/gcc/Extended-Asm.html>.)
Fix the operand list accordingly.
With the patch applied, I get:
> 0000000000004383 <InternalSyncIncrement>:
> UINT32
> EFIAPI
> InternalSyncIncrement (
> IN volatile UINT32 *Value
> )
> {
> 4383: 55 push %rbp
> 4384: 48 89 e5 mov %rsp,%rbp
> 4387: 48 83 ec 10 sub $0x10,%rsp
> 438b: 48 89 4d 10 mov %rcx,0x10(%rbp)
> UINT32 Result;
>
> __asm__ __volatile__ (
> 438f: 48 8b 55 10 mov 0x10(%rbp),%rdx
> 4393: 48 8b 45 10 mov 0x10(%rbp),%rax
> 4397: b8 01 00 00 00 mov $0x1,%eax
> 439c: f0 0f c1 02 lock xadd %eax,(%rdx)
> 43a0: ff c0 inc %eax
> 43a2: 89 45 fc mov %eax,-0x4(%rbp)
> : // no inputs that aren't also outputs
> : "memory",
> "cc"
> );
>
> return Result;
> 43a5: 8b 45 fc mov -0x4(%rbp),%eax
> }
> 43a8: c9 leaveq
> 43a9: c3 retq
Note that some other bugs remain in
"BaseSynchronizationLib/*/GccInline.c"; those should be addressed later,
under <https://bugzilla.tianocore.org/show_bug.cgi?id=1208>.
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Liming Gao <liming.gao@intel.com>
Cc: Michael Kinney <michael.d.kinney@intel.com>
Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1207
Fixes: 17634d026f
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Ruiyu Ni <ruiyu.ni@intel.com>
REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1197
Today's InterlockedIncrement()/InterlockedDecrement() guarantees to
perform atomic increment/decrement but doesn't guarantee the return
value equals to the new value.
The patch fixes the behavior to use "XADD" instruction to guarantee
the return value equals to the new value.
The patch calls intrinsic functions for MSVC tool chain, calls the
NASM implementation for INTEL tool chain and calls GCC inline
assembly implementation (GccInline.c) for GCC tool chain.
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Ruiyu Ni <ruiyu.ni@intel.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Reviewed-by: Liming Gao <liming.gao@intel.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
1. Do not use tab characters
2. No trailing white space in one line
3. All files must end with CRLF
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Liming Gao <liming.gao@intel.com>
NASM has replaced ASM and S files.
1. Remove ASM from all modules.
2. Remove S files from the drivers only.
3. https://bugzilla.tianocore.org/show_bug.cgi?id=881
After NASM is updated, S files can be removed from Library.
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Liming Gao <liming.gao@intel.com>
Reviewed-by: Michael Kinney <michael.d.kinney@intel.com>
The SpinLock functions in the SynchronicationLib use volatile
parameters to keep compiler from optimizing these functions
too much. The volatile keyword is missing from the Interlocked*()
functions in this same library instance. Update the library instance
to consistently use volatile on all functions in the
SynchronizationLib class.
Cc: Liming Gao <liming.gao@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Andrew Fish <afish@apple.com>
Cc: Jeff Fan <jeff.fan@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Michael Kinney <michael.d.kinney@intel.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Liming Gao <liming.gao@intel.com>
Remove extra qword in nasm code to make it pass build.
This file is only built in INTEL ICC compiler. So, there is missing
build check for it.
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Liming Gao <liming.gao@intel.com>
Reviewed-by: Jeff Fan <jeff.fan@intel.com>
Some processor may return small cache line size, we should return 32 bytes at
least for spin lock alignment.
Cc: Liming Gao <liming.gao@intel.com>
Cc: Michael Kinney <michael.d.kinney@intel.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jeff Fan <jeff.fan@intel.com>
Reviewed-by: Jiewen Yao <jiewen.yao@intel.com>
The BaseTools/Scripts/ConvertMasmToNasm.py script was used to convert
Ia32/InterlockedIncrement.asm to Ia32/InterlockedIncrement.nasm
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
The BaseTools/Scripts/ConvertMasmToNasm.py script was used to convert
Ia32/InterlockedDecrement.asm to Ia32/InterlockedDecrement.nasm
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
The BaseTools/Scripts/ConvertMasmToNasm.py script was used to convert
Ia32/InterlockedCompareExchange16.asm to Ia32/InterlockedCompareExchange16.nasm
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
The BaseTools/Scripts/ConvertMasmToNasm.py script was used to convert
Ia32/InterlockedCompareExchange32.asm to Ia32/InterlockedCompareExchange32.nasm
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
The BaseTools/Scripts/ConvertMasmToNasm.py script was used to convert
Ia32/InterlockedCompareExchange64.asm to Ia32/InterlockedCompareExchange64.nasm
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
From Intel(R) 64 and IA-32 Architectures Software Developer's Manual, one lock
or semaphore is suggested to be present within a cache line. If the processors
are based on Intel NetBurst microarchitecture, two cache lines are suggested.
This could minimize the bus traffic required to service locks.
Cc: Michael Kinney <michael.d.kinney@intel.com>
Cc: Liming Gao <liming.gao@intel.com>
Cc: Star Zeng <star.zeng@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jeff Fan <jeff.fan@intel.com>
Reviewed-by: Liming Gao <liming.gao@intel.com>
Reviewed-by: Star Zeng <star.zeng@intel.com>
This implements the function InterlockedCompareExchange16 () for all
architectures, using architecture and toolchain specific intrinsics
or primitive assembler instructions.
Contributed-under: TianoCore Contribution Agreement 1.0
Reviewed-by: Olivier Martin <olivier.martin@arm.com>
Acked-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Michael Kinney <michael.d.kinney@intel.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
git-svn-id: https://svn.code.sf.net/p/edk2/code/trunk/edk2@16966 6f19259b-4bc3-4df7-8a09-765794883524