When SEV-SNP is active, the CPUID and Secrets memory range contains the
information that is used during the VM boot. The content need to be persist
across the kexec boot. Mark the memory range as Reserved in the EFI map
so that guest OS or firmware does not use the range as a system RAM.
Cc: Michael Roth <michael.roth@amd.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: Min Xu <min.m.xu@intel.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Ard Biesheuvel <ardb+tianocore@kernel.org>
Cc: Erdem Aktas <erdemaktas@google.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=3275
Version 2 of the GHCB specification added the support to query the
hypervisor feature bitmap. The feature bitmap provide information
such as whether to use the AP create VmgExit or use the AP jump table
approach to create the APs. The MpInitLib will use the
PcdGhcbHypervisorFeatures to determine which method to use for creating
the AP.
Query the hypervisor feature and set the PCD accordingly.
Cc: Michael Roth <michael.roth@amd.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: Min Xu <min.m.xu@intel.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Ard Biesheuvel <ardb+tianocore@kernel.org>
Cc: Erdem Aktas <erdemaktas@google.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Jiewen Yao <Jiewen.yao@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=3275
The MpInitLib uses the ConfidentialComputingAttr PCD to determine whether
AMD SEV is active so that it can use the VMGEXITs defined in the GHCB
specification to create APs.
Cc: Michael Roth <michael.roth@amd.com>
Cc: Ray Ni <ray.ni@intel.com>
Cc: Rahul Kumar <rahul1.kumar@intel.com>
Cc: Eric Dong <eric.dong@intel.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: Min Xu <min.m.xu@intel.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Ard Biesheuvel <ardb+tianocore@kernel.org>
Cc: Erdem Aktas <erdemaktas@google.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Suggested-by: Jiewen Yao <jiewen.yao@intel.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=3275
When SEV-SNP is active, a memory region mapped encrypted in the page
table must be validated before access. There are two approaches that
can be taken to validate the system RAM detected during the PEI phase:
1) Validate on-demand
OR
2) Validate before access
On-demand
=========
If memory is not validated before access, it will cause a #VC
exception with the page-not-validated error code. The VC exception
handler can perform the validation steps.
The pages that have been validated will need to be tracked to avoid
the double validation scenarios. The range of memory that has not
been validated will need to be communicated to the OS through the
recently introduced unaccepted memory type
https://github.com/microsoft/mu_basecore/pull/66, so that OS can
validate those ranges before using them.
Validate before access
======================
Since the PEI phase detects all the available system RAM, use the
MemEncryptSevSnpValidateSystemRam() function to pre-validate the
system RAM in the PEI phase.
For now, choose option 2 due to the dependency and the complexity
of the on-demand validation.
Cc: Michael Roth <michael.roth@amd.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: Min Xu <min.m.xu@intel.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Ard Biesheuvel <ardb+tianocore@kernel.org>
Cc: Erdem Aktas <erdemaktas@google.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Jiewen Yao <Jiewen.yao@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=3275
The SEV-SNP guest requires that GHCB GPA must be registered before using.
See the GHCB specification section 2.3.2 for more details.
Cc: Michael Roth <michael.roth@amd.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: Min Xu <min.m.xu@intel.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Ard Biesheuvel <ardb+tianocore@kernel.org>
Cc: Erdem Aktas <erdemaktas@google.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Jiewen Yao <Jiewen.yao@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
REF: https://bugzilla.tianocore.org/show_bug.cgi?id=3737
Apply uncrustify changes to .c/.h files in the OvmfPkg package
Cc: Andrew Fish <afish@apple.com>
Cc: Leif Lindholm <leif@nuviainc.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Signed-off-by: Michael Kubacki <michael.kubacki@microsoft.com>
Reviewed-by: Andrew Fish <afish@apple.com>
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=3429
Both the TDX and SEV support needs to reserve a page in MEMFD as a work
area. The page will contain meta data specific to the guest type.
Currently, the SEV-ES support reserves a page in MEMFD
(PcdSevEsWorkArea) for the work area. This page can be reused as a TDX
work area when Intel TDX is enabled.
Based on the discussion [1], it was agreed to rename the SevEsWorkArea
to the OvmfWorkArea, and add a header that can be used to indicate the
work area type.
[1] https://edk2.groups.io/g/devel/message/78262?p=,,,20,0,0,0::\
created,0,SNP,20,2,0,84476064
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: Min Xu <min.m.xu@intel.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Ard Biesheuvel <ardb+tianocore@kernel.org>
Cc: Erdem Aktas <erdemaktas@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Reviewed-by: Min Xu <min.m.xu@intel.com>
Reviewed-by: Jiewen Yao <Jiewen.yao@intel.com>
The "OvmfPkg/PlatformPei/PlatformPei.inf" module is used by the following
platform DSCs:
OvmfPkg/AmdSev/AmdSevX64.dsc
OvmfPkg/OvmfPkgIa32.dsc
OvmfPkg/OvmfPkgIa32X64.dsc
OvmfPkg/OvmfPkgX64.dsc
Remove Xen support from "OvmfPkg/PlatformPei", including any dependencies
that now become unused. The basic idea is to substitute FALSE for "mXen".
Remove "OvmfPkg/PlatformPei" from the "OvmfPkg: Xen-related modules"
section of "Maintainers.txt".
This patch is best reviewed with "git show -b -W".
Cc: Andrew Fish <afish@apple.com>
Cc: Ard Biesheuvel <ardb+tianocore@kernel.org>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Leif Lindholm <leif@nuviainc.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=2122
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20210526201446.12554-22-lersek@redhat.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Leif Lindholm <leif@nuviainc.com>
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=3275
The Flush parameter is used to provide a hint whether the specified range
is Mmio address. Now that we have a dedicated helper to clear the
memory encryption mask for the Mmio address range, its safe to remove the
Flush parameter from MemEncryptSev{Set,Clear}PageEncMask().
Since the address specified in the MemEncryptSev{Set,Clear}PageEncMask()
points to a system RAM, thus a cache flush is required during the
encryption mask update.
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: Min Xu <min.m.xu@intel.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Ard Biesheuvel <ardb+tianocore@kernel.org>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Erdem Aktas <erdemaktas@google.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Message-Id: <20210519181949.6574-14-brijesh.singh@amd.com>
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=3108
Protect the GHCB backup pages used by an SEV-ES guest when S3 is
supported.
Regarding the lifecycle of the GHCB backup pages:
PcdOvmfSecGhcbBackupBase
(a) when and how it is initialized after first boot of the VM
If SEV-ES is enabled, the GHCB backup pages are initialized when a
nested #VC is received during the SEC phase
[OvmfPkg/Library/VmgExitLib/SecVmgExitVcHandler.c].
(b) how it is protected from memory allocations during DXE
If S3 and SEV-ES are enabled, then InitializeRamRegions()
[OvmfPkg/PlatformPei/MemDetect.c] protects the ranges with an AcpiNVS
memory allocation HOB, in PEI.
If S3 is disabled, then these ranges are not protected. PEI switches to
the GHCB backup pages in permanent PEI memory and DXE will use these
PEI GHCB backup pages, so we don't have to preserve
PcdOvmfSecGhcbBackupBase.
(c) how it is protected from the OS
If S3 is enabled, then (b) reserves it from the OS too.
If S3 is disabled, then the range needs no protection.
(d) how it is accessed on the S3 resume path
It is rewritten same as in (a), which is fine because (b) reserved it.
(e) how it is accessed on the warm reset path
It is rewritten same as in (a).
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@arm.com>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: Julien Grall <julien@xen.org>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <119102a3d14caa70d81aee334a2e0f3f925e1a60.1610045305.git.thomas.lendacky@amd.com>
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=3108
In order to be able to issue messages or make interface calls that cause
another #VC (e.g. GetLocalApicBaseAddress () issues RDMSR), add support
for nested #VCs.
In order to support nested #VCs, GHCB backup pages are required. If a #VC
is received while currently processing a #VC, a backup of the current GHCB
content is made. This allows the #VC handler to continue processing the
new #VC. Upon completion of the new #VC, the GHCB is restored from the
backup page. The #VC recursion level is tracked in the per-vCPU variable
area.
Support is added to handle up to one nested #VC (or two #VCs total). If
a second nested #VC is encountered, an ASSERT will be issued and the vCPU
will enter CpuDeadLoop ().
For SEC, the GHCB backup pages are reserved in the OvmfPkgX64.fdf memory
layout, with two new fixed PCDs to provide the address and size of the
backup area.
For PEI/DXE, the GHCB backup pages are allocated as boot services pages
using the memory allocation library.
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@arm.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <ac2e8203fc41a351b43f60d68bdad6b57c4fb106.1610045305.git.thomas.lendacky@amd.com>
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=3108
The early assembler code performs validation for some of the SEV-related
information, specifically the encryption bit position. The new
MemEncryptSevGetEncryptionMask() interface provides access to this
validated value.
To ensure that we always use a validated encryption mask for an SEV-ES
guest, update all locations that use CPUID to calculate the encryption
mask to use the new interface.
Also, clean up some call areas where extra masking was being performed
and where a function call was being used instead of the local variable
that was just set using the function.
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@arm.com>
Cc: Rebecca Cran <rebecca@bsdio.com>
Cc: Peter Grehan <grehan@freebsd.org>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: Julien Grall <julien@xen.org>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <9de678c0d66443c6cc33e004a4cac0a0223c2ebc.1610045305.git.thomas.lendacky@amd.com>
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2198
After having transitioned from UEFI to the OS, the OS will need to boot
the APs. For an SEV-ES guest, the APs will have been parked by UEFI using
GHCB pages allocated by UEFI. The hypervisor will write to the GHCB
SW_EXITINFO2 field of the GHCB when the AP is booted. As a result, the
GHCB pages must be marked reserved so that the OS does not attempt to use
them and experience memory corruption because of the hypervisor write.
Change the GHCB allocation from the default boot services memory to
reserved memory.
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@arm.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Regression-tested-by: Laszlo Ersek <lersek@redhat.com>
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2198
Protect the SEV-ES work area memory used by an SEV-ES guest.
Regarding the lifecycle of the SEV-ES memory area:
PcdSevEsWorkArea
(a) when and how it is initialized after first boot of the VM
If SEV-ES is enabled, the SEV-ES area is initialized during
the SEC phase [OvmfPkg/ResetVector/Ia32/PageTables64.asm].
(b) how it is protected from memory allocations during DXE
If SEV-ES is enabled, then InitializeRamRegions()
[OvmfPkg/PlatformPei/MemDetect.c] protects the ranges with either
an AcpiNVS (S3 enabled) or BootServicesData (S3 disabled) memory
allocation HOB, in PEI.
(c) how it is protected from the OS
If S3 is enabled, then (b) reserves it from the OS too.
If S3 is disabled, then the range needs no protection.
(d) how it is accessed on the S3 resume path
It is rewritten same as in (a), which is fine because (b) reserved it.
(e) how it is accessed on the warm reset path
It is rewritten same as in (a).
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@arm.com>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: Julien Grall <julien@xen.org>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Regression-tested-by: Laszlo Ersek <lersek@redhat.com>
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2198
The SEV support will clear the C-bit from non-RAM areas. The early GDT
lives in a non-RAM area, so when an exception occurs (like a #VC) the GDT
will be read as un-encrypted even though it is encrypted. This will result
in a failure to be able to handle the exception.
Move the GDT into RAM so it can be accessed without error when running as
an SEV-ES guest.
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@arm.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Regression-tested-by: Laszlo Ersek <lersek@redhat.com>
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2198
Allocate memory for the GHCB pages and the per-CPU variable pages during
SEV initialization for use during Pei and Dxe phases. The GHCB page(s)
must be shared pages, so clear the encryption mask from the current page
table entries. Upon successful allocation, set the GHCB PCDs (PcdGhcbBase
and PcdGhcbSize).
The per-CPU variable page needs to be unique per AP. Using the page after
the GHCB ensures that it is unique per AP. Only the GHCB page is marked as
shared, keeping the per-CPU variable page encyrpted. The same logic is
used in DXE using CreateIdentityMappingPageTables() before switching to
the DXE pagetables.
The GHCB pages (one per vCPU) will be used by the PEI and DXE #VC
exception handlers. The #VC exception handler will fill in the necessary
fields of the GHCB and exit to the hypervisor using the VMGEXIT
instruction. The hypervisor then accesses the GHCB associated with the
vCPU in order to perform the requested function.
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@arm.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Regression-tested-by: Laszlo Ersek <lersek@redhat.com>
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2198
Protect the memory used by an SEV-ES guest when S3 is supported. This
includes the page table used to break down the 2MB page that contains
the GHCB so that it can be marked un-encrypted, as well as the GHCB
area.
Regarding the lifecycle of the GHCB-related memory areas:
PcdOvmfSecGhcbPageTableBase
PcdOvmfSecGhcbBase
(a) when and how it is initialized after first boot of the VM
If SEV-ES is enabled, the GHCB-related areas are initialized during
the SEC phase [OvmfPkg/ResetVector/Ia32/PageTables64.asm].
(b) how it is protected from memory allocations during DXE
If S3 and SEV-ES are enabled, then InitializeRamRegions()
[OvmfPkg/PlatformPei/MemDetect.c] protects the ranges with an AcpiNVS
memory allocation HOB, in PEI.
If S3 is disabled, then these ranges are not protected. DXE's own page
tables are first built while still in PEI (see HandOffToDxeCore()
[MdeModulePkg/Core/DxeIplPeim/X64/DxeLoadFunc.c]). Those tables are
located in permanent PEI memory. After CR3 is switched over to them
(which occurs before jumping to the DXE core entry point), we don't have
to preserve PcdOvmfSecGhcbPageTableBase. PEI switches to GHCB pages in
permanent PEI memory and DXE will use these PEI GHCB pages, so we don't
have to preserve PcdOvmfSecGhcbBase.
(c) how it is protected from the OS
If S3 is enabled, then (b) reserves it from the OS too.
If S3 is disabled, then the range needs no protection.
(d) how it is accessed on the S3 resume path
It is rewritten same as in (a), which is fine because (b) reserved it.
(e) how it is accessed on the warm reset path
It is rewritten same as in (a).
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@arm.com>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: Julien Grall <julien@xen.org>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Regression-tested-by: Laszlo Ersek <lersek@redhat.com>
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2198
When SEV-ES is enabled, then SEV is also enabled. Add support to the SEV
initialization function to also check for SEV-ES being enabled, and if
enabled, set the SEV-ES enabled PCD (PcdSevEsIsEnabled).
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@arm.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Regression-tested-by: Laszlo Ersek <lersek@redhat.com>
Some OvmfPkg modules already depend on "EmbeddedPkg.dec"; thus, replace
the open-coded memory type info defaults in the source code with the
EmbeddedPkg PCDs that stand for the same purpose. Consequently, platform
builders can override these values with the "--pcd" option of "build",
without source code updates.
While at it, sort the memory type names alphabetically.
Cc: Ard Biesheuvel <ard.biesheuvel@arm.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=2706
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20200508121651.16045-4-lersek@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@arm.com>
The previous patch has no effect -- i.e., it cannot stop the tracking of
BS Code/Data in MemTypeInfo -- if the virtual machine already has a
MemoryTypeInformation UEFI variable.
In that case, our current logic allows the DXE IPL PEIM to translate the
UEFI variable to the HOB, and that translation is verbatim. If the
variable already contains records for BS Code/Data, the issues listed in
the previous patch persist for the virtual machine.
For this reason, *always* install PlatformPei's own MemTypeInfo HOB. This
prevents the DXE IPL PEIM's variable-to-HOB translation.
In PlatformPei, consume the records in the MemoryTypeInformation UEFI
variable as hints:
- Ignore all memory types for which we wouldn't by default install records
in the HOB. This hides BS Code/Data from any existent
MemoryTypeInformation variable.
- For the memory types that our defaults cover, enable the records in the
UEFI variable to increase (and *only* to increase) the page counts.
This lets the MemoryTypeInformation UEFI variable function as designed,
but it eliminates a reboot when such a new OVMF binary is deployed (a)
that has higher memory consumption than tracked by the virtual machine's
UEFI variable previously, *but* (b) whose defaults also reflect those
higher page counts.
Cc: Ard Biesheuvel <ard.biesheuvel@arm.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=2706
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20200508121651.16045-3-lersek@redhat.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@arm.com>
In commit d42fdd6f83 ("OvmfPkg: improve SMM comms security with adaptive
MemoryTypeInformation", 2020-03-12), we enabled the boot-to-boot tracking
of the usages of various UEFI memory types.
Both whitepapers listed in that commit recommend that BS Code/Data type
memory *not* be tracked. This recommendation was confirmed by Jiewen in
the following two messages as well:
[1] https://edk2.groups.io/g/devel/message/55741http://mid.mail-archive.com/74D8A39837DF1E4DA445A8C0B3885C503F97B579@shsmsx102.ccr.corp.intel.com
[2] https://edk2.groups.io/g/devel/message/55749http://mid.mail-archive.com/74D8A39837DF1E4DA445A8C0B3885C503F97BDC5@shsmsx102.ccr.corp.intel.com
While tracking BS Code/Data type memory has one benefit (it de-fragments
the UEFI memory map), the downsides outweigh it. Spikes in BS Data type
memory usage are not uncommon in particular, and they may have the
following consequences:
- such reboots during normal boot that look "spurious" to the end user,
and have no SMM security benefit,
- a large BS Data record in MemoryTypeInformation may cause issues when
the DXE Core tries to prime the according bin(s), but the system's RAM
size has been reduced meanwhile.
Removing the BS Code/Data entries from MemoryTypeInformation leads to a
bit more fragmentation in the UEFI memory map, but that should be
harmless.
Cc: Ard Biesheuvel <ard.biesheuvel@arm.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=2706
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20200508121651.16045-2-lersek@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@arm.com>
The UPDATE_BOOLEAN_PCD_FROM_FW_CFG() macro currently calls the
module-private helper function GetNamedFwCfgBoolean(). Replace the latter
with QemuFwCfgParseBool() from QemuFwCfgSimpleParserLib.
This change is compatible with valid strings accepted previously.
Cc: Ard Biesheuvel <ard.biesheuvel@arm.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Per Sundstrom <per_sundstrom@yahoo.com>
Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=2681
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20200424075353.8489-4-lersek@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@arm.com>
Replace the
- QemuFwCfgFindFile(),
- QemuFwCfgSelectItem(),
- QemuFwCfgReadBytes(),
- AsciiStrDecimalToUint64()
sequence in the GetFirstNonAddress() function with a call to
QemuFwCfgSimpleParserLib.
This change is compatible with valid strings accepted previously.
Cc: Ard Biesheuvel <ard.biesheuvel@arm.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Per Sundstrom <per_sundstrom@yahoo.com>
Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=2681
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20200424075353.8489-3-lersek@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@arm.com>
The UEFI properties table and the associated memory protection feature was
severely broken from the start, and has been deprecated for a while. Let's
drop all references to it from OVMF so we can safely remove it from the
DXE core as well.
Link: https://bugzilla.tianocore.org/show_bug.cgi?id=2633
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@arm.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Jiewen Yao <Jiewen.yao@intel.com>
Acked-by: Liming Gao <liming.gao@intel.com>
Reviewed-by: Jian J Wang <jian.j.wang@intel.com>
Add a code comment that explains the nature of the NumberOfPages field
values. Including this kind of historical information was suggested by
Leif in <https://edk2.groups.io/g/devel/message/55797> (alternative link:
<http://mid.mail-archive.com/20200312104006.GB23627@bivouac.eciton.net>).
Right now, the most recent commit updating the page counts has been commit
991d956362 ("[...] Update default memory type information to reduce EFI
Memory Map fragmentation.", 2010-07-16).
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Leif Lindholm <leif@nuviainc.com>
Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
Suggested-by: Leif Lindholm <leif@nuviainc.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20200312223555.29267-2-lersek@redhat.com>
Reviewed-by: Leif Lindholm <leif@nuviainc.com>
* In the Intel whitepaper:
--v--
A Tour Beyond BIOS -- Secure SMM Communication
https://github.com/tianocore/tianocore.github.io/wiki/EDK-II-Security-White-Papershttps://github.com/tianocore-docs/Docs/raw/master/White_Papers/A_Tour_Beyond_BIOS_Secure_SMM_Communication.pdf
--^--
bullet#3 in section "Assumption and Recommendation", and bullet#4 in "Call
for action", recommend enabling the (adaptive) Memory Type Information
feature.
* In the Intel whitepaper:
--v--
A Tour Beyond BIOS -- Memory Map and Practices in UEFI BIOS
https://github.com/tianocore/tianocore.github.io/wiki/EDK-II-white-papershttps://github.com/tianocore-docs/Docs/raw/master/White_Papers/A_Tour_Beyond_BIOS_Memory_Map_And_Practices_in_UEFI_BIOS_V2.pdf
--^--
figure#6 describes the Memory Type Information feature in detail; namely
as a feedback loop between the Platform PEIM, the DXE IPL PEIM, the DXE
Core, and BDS.
Implement the missing PlatformPei functionality in OvmfPkg, for fulfilling
the Secure SMM Communication recommendation.
In the longer term, OVMF should install the WSMT ACPI table, and this
patch contributes to that.
Notes:
- the step in figure#6 where the UEFI variable is copied into the HOB is
covered by the DXE IPL PEIM, in the DxeLoadCore() function,
- "PcdResetOnMemoryTypeInformationChange" must be reverted to the DEC
default TRUE value, because both whitepapers indicate that BDS needs to
reset the system if the Memory Type Information changes.
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=386
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20200310222739.26717-6-lersek@redhat.com>
Acked-by: Leif Lindholm <leif@nuviainc.com>
Now that the SMRAM at the default SMBASE is honored everywhere necessary,
implement the actual detection. The (simple) steps are described in
previous patch "OvmfPkg/IndustryStandard: add MCH_DEFAULT_SMBASE* register
macros".
Regarding CSM_ENABLE builds: according to the discussion with Jiewen at
https://edk2.groups.io/g/devel/message/48082http://mid.mail-archive.com/74D8A39837DF1E4DA445A8C0B3885C503F7C9D2F@shsmsx102.ccr.corp.intel.com
if the platform has SMRAM at the default SMBASE, then we have to
(a) either punch a hole in the legacy E820 map as well, in
LegacyBiosBuildE820() [OvmfPkg/Csm/LegacyBiosDxe/LegacyBootSupport.c],
(b) or document, or programmatically catch, the incompatibility between
the "SMRAM at default SMBASE" and "CSM" features.
Because CSM is out of scope for the larger "VCPU hotplug with SMM"
feature, option (b) applies. Therefore, if the CSM is enabled in the OVMF
build, then PlatformPei will not attempt to detect SMRAM at the default
SMBASE, at all. This is approach (4) -- the most flexible one, for
end-users -- from:
http://mid.mail-archive.com/868dcff2-ecaa-e1c6-f018-abe7087d640c@redhat.comhttps://edk2.groups.io/g/devel/message/48348
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1512
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20200129214412.2361-12-lersek@redhat.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
When OVMF runs in a SEV guest, the initial SMM Save State Map is
(1) allocated as EfiBootServicesData type memory in OvmfPkg/PlatformPei,
function AmdSevInitialize(), for preventing unintended information
sharing with the hypervisor;
(2) decrypted in AmdSevDxe;
(3) re-encrypted in OvmfPkg/Library/SmmCpuFeaturesLib, function
SmmCpuFeaturesSmmRelocationComplete(), which is called by
PiSmmCpuDxeSmm right after initial SMBASE relocation;
(4) released to DXE at the same location.
The SMRAM at the default SMBASE is a superset of the initial Save State
Map. The reserved memory allocation in InitializeRamRegions(), from the
previous patch, must override the allocating and freeing in (1) and (4),
respectively. (Note: the decrypting and re-encrypting in (2) and (3) are
unaffected.)
In AmdSevInitialize(), only assert the containment of the initial Save
State Map, in the larger area already allocated by InitializeRamRegions().
In SmmCpuFeaturesSmmRelocationComplete(), preserve the allocation of the
initial Save State Map into OS runtime, as part of the allocation done by
InitializeRamRegions(). Only assert containment.
These changes only affect the normal boot path (the UEFI memory map is
untouched during S3 resume).
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1512
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jiewen Yao <jiewen.yao@intel.com>
Message-Id: <20200129214412.2361-9-lersek@redhat.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
The 128KB SMRAM at the default SMBASE will be used for protecting the
initial SMI handler for hot-plugged VCPUs. After platform reset, the SMRAM
in question is open (and looks just like RAM). When BDS signals
EFI_DXE_MM_READY_TO_LOCK_PROTOCOL (and so TSEG is locked down), we're
going to lock the SMRAM at the default SMBASE too.
For this, we have to reserve said SMRAM area as early as possible, from
components in PEI, DXE, and OS runtime.
* QemuInitializeRam() currently produces a single resource descriptor HOB,
for exposing the system RAM available under 1GB. This occurs during both
normal boot and S3 resume identically (the latter only for the sake of
CpuMpPei borrowing low RAM for the AP startup vector).
But, the SMRAM at the default SMBASE falls in the middle of the current
system RAM HOB. Split the HOB, and cover the SMRAM with a reserved
memory HOB in the middle. CpuMpPei (via MpInitLib) skips reserved memory
HOBs.
* InitializeRamRegions() is responsible for producing memory allocation
HOBs, carving out parts of the resource descriptor HOBs produced in
QemuInitializeRam(). Allocate the above-introduced reserved memory
region in full, similarly to how we treat TSEG, so that DXE and the OS
avoid the locked SMRAM (black hole) in this area.
(Note that these allocations only occur on the normal boot path, as they
matter for the UEFI memory map, which cannot be changed during S3
resume.)
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1512
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jiewen Yao <jiewen.yao@intel.com>
Message-Id: <20200129214412.2361-8-lersek@redhat.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
The permanent PEI RAM that is published on the normal boot path starts
strictly above MEMFD_BASE_ADDRESS (8 MB -- see the FDF files), regardless
of whether PEI decompression will be necessary on S3 resume due to
SMM_REQUIRE. Therefore the normal boot permanent PEI RAM never overlaps
with the SMRAM at the default SMBASE (192 KB).
The S3 resume permanent PEI RAM is strictly above the normal boot one.
Therefore the no-overlap statement holds true on the S3 resume path as
well.
Assert the no-overlap condition commonly for both boot paths in
PublishPeiMemory().
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1512
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jiewen Yao <jiewen.yao@intel.com>
Message-Id: <20200129214412.2361-7-lersek@redhat.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Introduce the Q35SmramAtDefaultSmbaseInitialization() function for
detecting the "SMRAM at default SMBASE" feature.
For now, the function is only a skeleton, so that we can gradually build
upon the result while the result is hard-coded as FALSE. The actual
detection will occur in a later patch.
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1512
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jiewen Yao <jiewen.yao@intel.com>
Message-Id: <20200129214412.2361-6-lersek@redhat.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Before adding another SMM-related, and therefore Q35-only, dynamically
detectable feature, extract the current board type check from
Q35TsegMbytesInitialization() to a standalone function.
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1512
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jiewen Yao <jiewen.yao@intel.com>
Message-Id: <20200129214412.2361-5-lersek@redhat.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
MaxCpuCountInitialization() currently handles the following options:
(1) QEMU does not report the boot CPU count (FW_CFG_NB_CPUS is 0)
In this case, PlatformPei makes MpInitLib enumerate APs up to the
default PcdCpuMaxLogicalProcessorNumber value (64) minus 1, or until
the default PcdCpuApInitTimeOutInMicroSeconds (50,000) elapses.
(Whichever is reached first.)
Time-limited AP enumeration had never been reliable on QEMU/KVM, which
is why commit 45a70db3c3 strated handling case (2) below, in OVMF.
(2) QEMU reports the boot CPU count (FW_CFG_NB_CPUS is nonzero)
In this case, PlatformPei sets
- PcdCpuMaxLogicalProcessorNumber to the reported boot CPU count
(FW_CFG_NB_CPUS, which exports "PCMachineState.boot_cpus"),
- and PcdCpuApInitTimeOutInMicroSeconds to practically "infinity"
(MAX_UINT32, ~71 minutes).
That causes MpInitLib to enumerate exactly the present (boot) APs.
With CPU hotplug in mind, this method is not good enough. Because,
using QEMU terminology, UefiCpuPkg expects
PcdCpuMaxLogicalProcessorNumber to provide the "possible CPUs" count
("MachineState.smp.max_cpus"), which includes present and not present
CPUs both (with not present CPUs being subject for hot-plugging).
FW_CFG_NB_CPUS does not include not present CPUs.
Rewrite MaxCpuCountInitialization() for handling the following cases:
(1) The behavior of case (1) does not change. (No UefiCpuPkg PCDs are set
to values different from the defaults.)
(2) QEMU reports the boot CPU count ("PCMachineState.boot_cpus", via
FW_CFG_NB_CPUS), but not the possible CPUs count
("MachineState.smp.max_cpus").
In this case, the behavior remains unchanged.
The way MpInitLib is instructed to do the same differs however: we now
set the new PcdCpuBootLogicalProcessorNumber to the boot CPU count
(while continuing to set PcdCpuMaxLogicalProcessorNumber identically).
PcdCpuApInitTimeOutInMicroSeconds becomes irrelevant.
(3) QEMU reports both the boot CPU count ("PCMachineState.boot_cpus", via
FW_CFG_NB_CPUS), and the possible CPUs count
("MachineState.smp.max_cpus").
We tell UefiCpuPkg about the possible CPUs count through
PcdCpuMaxLogicalProcessorNumber. We also tell MpInitLib the boot CPU
count for precise and quick AP enumeration, via
PcdCpuBootLogicalProcessorNumber. PcdCpuApInitTimeOutInMicroSeconds is
irrelevant again.
This patch is a pre-requisite for enabling CPU hotplug with SMM_REQUIRE.
As a side effect, the patch also enables S3 to work with CPU hotplug at
once, *without* SMM_REQUIRE.
(Without the patch, S3 resume fails, if a CPU is hot-plugged at OS
runtime, prior to suspend: the FW_CFG_NB_CPUS increase seen during resume
causes PcdCpuMaxLogicalProcessorNumber to increase as well, which is not
permitted.
With the patch, PcdCpuMaxLogicalProcessorNumber stays the same, namely
"MachineState.smp.max_cpus". Therefore, the CPU structures allocated
during normal boot can accommodate the CPUs at S3 resume that have been
hotplugged prior to S3 suspend.)
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Julien Grall <julien.grall@arm.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1515
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20191022221554.14963-4-lersek@redhat.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Philippe Mathieu-Daude <philmd@redhat.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
EFI_XEN_OVMF_INFO is only useful to retrieve the E820 table. The
mXenHvmloaderInfo isn't used yet, but will be use in a further patch to
retrieve the E820 table.
Also remove the unused pointer from the XenInfo HOB as that information
is only useful in the XenPlatformPei.
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1689
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20190813113119.14804-11-anthony.perard@citrix.com>
Change referenced MSR name to avoid later build failure.
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Eric Dong <eric.dong@intel.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
REF:https://bugzilla.tianocore.org/show_bug.cgi?id=1843
For the driver's INF file, this commit will remove the redundant reference
to 'IntelFrameworkModulePkg/IntelFrameworkModulePkg.dec'.
Cc: Ray Ni <ray.ni@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Hao A Wu <hao.a.wu@intel.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
(This is a replacement for commit 39b9a5ffe6 ("OvmfPkg/PlatformPei: fix
MTRR for low-RAM sizes that have many bits clear", 2019-05-16).)
Reintroduce the same logic as seen in commit 39b9a5ffe6 for the pc
(i440fx) board type.
For q35, the same approach doesn't work any longer, given that (a) we'd
like to keep the PCIEXBAR in the platform DSC a fixed-at-build PCD, and
(b) QEMU expects the PCIEXBAR to reside at a lower address than the 32-bit
PCI MMIO aperture.
Therefore, introduce a helper function for determining the 32-bit
"uncacheable" (MMIO) area base address:
- On q35, this function behaves statically. Furthermore, the MTRR setup
exploits that the range [0xB000_0000, 0xFFFF_FFFF] can be marked UC with
just two variable MTRRs (one at 0xB000_0000 (size 256MB), another at
0xC000_0000 (size 1GB)).
- On pc (i440fx), the function behaves dynamically, implementing the same
logic as commit 39b9a5ffe6 did. The PciBase value is adjusted to the
value calculated, similarly to commit 39b9a5ffe6. A further
simplification is that we show that the UC32 area size truncation to a
whole power of two automatically guarantees a >=2GB base address.
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1859
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Philippe Mathieu-Daude <philmd@redhat.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
This reverts commit 60e95bf509.
The original fix for <https://bugzilla.tianocore.org/show_bug.cgi?id=1814>
triggered a bug / incorrect assumption in QEMU.
QEMU assumes that the PCIEXBAR is below the 32-bit PCI window, not above
it. When the firmware doesn't satisfy this assumption, QEMU generates an
\_SB.PCI0._CRS object in the ACPI DSDT that does not reflect the
firmware's 32-bit MMIO BAR assignments. This causes OSes to re-assign
32-bit MMIO BARs.
Working around the problem in the firmware looks less problematic than
fixing QEMU. Revert the original changes first, before implementing an
alternative fix.
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1859
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Philippe Mathieu-Daude <philmd@redhat.com>
This reverts commit 9a2e8d7c65.
The original fix for <https://bugzilla.tianocore.org/show_bug.cgi?id=1814>
triggered a bug / incorrect assumption in QEMU.
QEMU assumes that the PCIEXBAR is below the 32-bit PCI window, not above
it. When the firmware doesn't satisfy this assumption, QEMU generates an
\_SB.PCI0._CRS object in the ACPI DSDT that does not reflect the
firmware's 32-bit MMIO BAR assignments. This causes OSes to re-assign
32-bit MMIO BARs.
Working around the problem in the firmware looks less problematic than
fixing QEMU. Revert the original changes first, before implementing an
alternative fix.
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1859
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Philippe Mathieu-Daude <philmd@redhat.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
This reverts commit 75136b2954.
The original fix for <https://bugzilla.tianocore.org/show_bug.cgi?id=1814>
triggered a bug / incorrect assumption in QEMU.
QEMU assumes that the PCIEXBAR is below the 32-bit PCI window, not above
it. When the firmware doesn't satisfy this assumption, QEMU generates an
\_SB.PCI0._CRS object in the ACPI DSDT that does not reflect the
firmware's 32-bit MMIO BAR assignments. This causes OSes to re-assign
32-bit MMIO BARs.
Working around the problem in the firmware looks less problematic than
fixing QEMU. Revert the original changes first, before implementing an
alternative fix.
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1859
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Philippe Mathieu-Daude <philmd@redhat.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
This reverts commit 39b9a5ffe6.
The original fix for <https://bugzilla.tianocore.org/show_bug.cgi?id=1814>
triggered a bug / incorrect assumption in QEMU.
QEMU assumes that the PCIEXBAR is below the 32-bit PCI window, not above
it. When the firmware doesn't satisfy this assumption, QEMU generates an
\_SB.PCI0._CRS object in the ACPI DSDT that does not reflect the
firmware's 32-bit MMIO BAR assignments. This causes OSes to re-assign
32-bit MMIO BARs.
Working around the problem in the firmware looks less problematic than
fixing QEMU. Revert the original changes first, before implementing an
alternative fix.
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1859
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Philippe Mathieu-Daude <philmd@redhat.com>
Assume that we boot OVMF in a QEMU guest with 1025 MB of RAM. The
following assertion will fire:
> ASSERT_EFI_ERROR (Status = Out of Resources)
> ASSERT OvmfPkg/PlatformPei/MemDetect.c(696): !EFI_ERROR (Status)
That's because the range [1025 MB, 4 GB) that we try to mark as
uncacheable with MTRRs has size 3071 MB:
0x1_0000_0000
-0x0_4010_0000
--------------
0x0_BFF0_0000
The integer that stands for the uncacheable area size has 11 (eleven) bits
set to 1. As a result, covering this size requires 11 variable MTRRs (each
MTRR must cover a naturally aligned, power-of-two sized area). But, if we
need more variable MTRRs than the CPU can muster (such as 8), then
MtrrSetMemoryAttribute() fails, and we refuse to continue booting (which
is justified, in itself).
Unfortunately, this is not difficult to trigger, and the error message is
well-hidden from end-users, in the OVMF debug log. The following
mitigation is inspired by SeaBIOS:
Truncate the uncacheable area size to a power-of-two, while keeping the
end fixed at 4 GB. Such an interval can be covered by just one variable
MTRR.
This may leave such an MMIO gap, between the end of low-RAM and the start
of the uncacheable area, that is marked as WB (through the MTRR default).
Raise the base of the 32-bit PCI MMIO aperture accordingly -- the gap will
not be used for anything.
On Q35, the minimal 32-bit PCI MMIO aperture (triggered by RAM size 2815
MB) shrinks from
0xE000_0000 - 0xAFF0_0000 = 769 MB
to
0xE000_0000 - 0xC000_0000 = 512 MB
On i440fx, the minimal 32-bit PCI MMIO aperture (triggered by RAM size
3583 MB) shrinks from
0xFC00_0000 - 0xDFF0_0000 = 449 MB
to
0xFC00_0000 - 0xE000_0000 = 448 MB
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1814
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1666941
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1701710
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Philippe Mathieu-Daude <philmd@redhat.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Commit 7b8fe63561 ("OvmfPkg: PlatformPei: enable PCIEXBAR (aka MMCONFIG
/ ECAM) on Q35", 2016-03-10) claimed that,
On Q35 machine types that QEMU intends to support in the long term, QEMU
never lets the RAM below 4 GB exceed 2 GB.
Alas, this statement came from a misunderstanding that occurred while we
worked out the interface contract. In fact QEMU does allow the 32-bit RAM
extend up to 0xB000_0000 (exclusive), in case the RAM size falls in the
range (0x8000_0000, 0xB000_0000) (i.e., the RAM size is greater than
2048MB and smaller than 2816MB).
In turn, such a RAM size (justifiedly) triggers
ASSERT (TopOfLowRam <= PciExBarBase);
in MemMapInitialization(), because we placed the 256MB PCIEXBAR at
0x8000_0000 (2GB) exactly, relying on the interface contract. (And, the
32-bit PCI window would follow the PCIEXBAR, covering the [0x9000_0000,
0xFC00_0000) range.)
In order to fix this, reorder the 32-bit PCI window against the PCIEXBAR,
as follows:
- start the 32-bit PCI window where it starts on i440fx as well, that is,
at 2GB or TopOfLowRam, whichever is higher;
- unlike on i440fx, where the 32-bit PCI window extends up to 0xFC00_0000,
stop it at 0xE000_0000 on q35,
- place the PCIEXBAR at 0xE000_0000.
(We cannot place the PCIEXBAR at 0xF000_0000 because the 256MB MMIO area
that starts there is not entirely free.)
Before this patch, the 32-bit PCI window used to only *end* at the same
spot (namely, 0xFC00_0000) between i440fx and q35; now it will only
*start* at the same spot (namely, 2GB or TopOfLowRam, whichever is higher)
between both boards.
On q35, the maximal window shrinks from
0xFC00_0000 - 0x9000_0000 = 0x6C00_0000 == 1728 MB
to
0xE000_0000 - 0x8000_0000 == 1536 MB.
We lose 192 MB of the aperture; however, the aperture is now aligned at
1GB, rather than 256 MB, and so it could fit a 1GB BAR even.
Regarding the minimal window (triggered by RAM size 2815MB), its size is
0xE000_0000 - 0xAFF0_0000 = 769 MB
which is not great, but probably better than a failed ASSERT.
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1814
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1666941
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1701710
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Philippe Mathieu-Daude <philmd@redhat.com>
In the MemMapInitialization() function, we currently assign PciBase
different values, on both branches of the board type check. Hoist the
PciBase assignment from the i440fx branch in front of the "if". This is a
no-op for the i440fx branch. On the q35 branch, we overwrite this value,
hence the change is a no-op on q35 as well.
This is another refactoring for simplifying the rest of this series.
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1814
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1666941
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1701710
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>