mirror of https://github.com/acidanthera/audk.git
c4e76d2fba
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=4171 A typical QEMU fw_cfg read bytes with IOMMU for td guest is that: (QemuFwCfgReadBytes@QemuFwCfgLib.c is the example) 1) Allocate DMA Access buffer 2) Map actual data buffer 3) start the transfer and wait for the transfer to complete 4) Free DMA Access buffer 5) Un-map actual data buffer In step 1/2, Private memories are allocated, converted to shared memories. In Step 4/5 the shared memories are converted to private memories and accepted again. The final step is to free the pages. This is time-consuming and impacts td guest's boot perf (both direct boot and grub boot) badly. In a typical grub boot, there are about 5000 calls of page allocation and private/share conversion. Most of page size is less than 32KB. This patch allocates a memory region and initializes it into pieces of memory with different sizes. A piece of such memory consists of 2 parts: the first page is of private memory, and the other pages are shared memory. This is to meet the layout of common buffer. When allocating bounce buffer in IoMmuMap(), IoMmuAllocateBounceBuffer() is called to allocate the buffer. Accordingly when freeing bounce buffer in IoMmuUnmapWorker(), IoMmuFreeBounceBuffer() is called to free the bounce buffer. CommonBuffer is allocated by IoMmuAllocateCommonBuffer and accordingly freed by IoMmuFreeCommonBuffer. This feature is tested in Intel TDX pre-production platform. It saves up to hundreds of ms in a grub boot. Cc: Erdem Aktas <erdemaktas@google.com> Cc: James Bottomley <jejb@linux.ibm.com> Cc: Jiewen Yao <jiewen.yao@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Jiewen Yao <Jiewen.yao@intel.com> Signed-off-by: Min Xu <min.m.xu@intel.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> |
||
---|---|---|
.. | ||
AmdSevIoMmu.c | ||
AmdSevIoMmu.h | ||
IoMmuBuffer.c | ||
IoMmuDxe.c | ||
IoMmuDxe.inf | ||
IoMmuInternal.h |