2009-05-27 23:10:18 +02:00
|
|
|
/**@file
|
|
|
|
Memory Detection for Virtual Machines.
|
|
|
|
|
2016-04-21 08:31:55 +02:00
|
|
|
Copyright (c) 2006 - 2016, Intel Corporation. All rights reserved.<BR>
|
2019-04-04 01:06:33 +02:00
|
|
|
SPDX-License-Identifier: BSD-2-Clause-Patent
|
2009-05-27 23:10:18 +02:00
|
|
|
|
|
|
|
Module Name:
|
|
|
|
|
|
|
|
MemDetect.c
|
|
|
|
|
|
|
|
**/
|
|
|
|
|
|
|
|
//
|
|
|
|
// The package level header files this module uses
|
|
|
|
//
|
OvmfPkg/PlatformPei: support >=1TB high RAM, and discontiguous high RAM
In OVMF we currently get the upper (>=4GB) memory size with the
GetSystemMemorySizeAbove4gb() function.
The GetSystemMemorySizeAbove4gb() function is used in two places:
(1) It is the starting point of the calculations in GetFirstNonAddress().
GetFirstNonAddress() in turn
- determines the placement of the 64-bit PCI MMIO aperture,
- provides input for the GCD memory space map's sizing (see
AddressWidthInitialization(), and the CPU HOB in
MiscInitialization()),
- influences the permanent PEI RAM cap (the DXE core's page tables,
built in permanent PEI RAM, grow as the RAM to map grows).
(2) In QemuInitializeRam(), GetSystemMemorySizeAbove4gb() determines the
single memory descriptor HOB that we produce for the upper memory.
Respectively, there are two problems with GetSystemMemorySizeAbove4gb():
(1) It reads a 24-bit count of 64KB RAM chunks from the CMOS, and
therefore cannot return a larger value than one terabyte.
(2) It cannot express discontiguous high RAM.
Starting with version 1.7.0, QEMU has provided the fw_cfg file called
"etc/e820". Refer to the following QEMU commits:
- 0624c7f916b4 ("e820: pass high memory too.", 2013-10-10),
- 7d67110f2d9a ("pc: add etc/e820 fw_cfg file", 2013-10-18)
- 7db16f2480db ("pc: register e820 entries for ram", 2013-10-10)
Ever since these commits in v1.7.0 -- with the last QEMU release being
v2.9.0, and v2.10.0 under development --, the only two RAM entries added
to this E820 map correspond to the below-4GB RAM range, and the above-4GB
RAM range. And, the above-4GB range exactly matches the CMOS registers in
question; see the use of "pcms->above_4g_mem_size":
pc_q35_init() | pc_init1()
pc_memory_init()
e820_add_entry(0x100000000ULL, pcms->above_4g_mem_size, E820_RAM);
pc_cmos_init()
val = pcms->above_4g_mem_size / 65536;
rtc_set_memory(s, 0x5b, val);
rtc_set_memory(s, 0x5c, val >> 8);
rtc_set_memory(s, 0x5d, val >> 16);
Therefore, remedy the above OVMF limitations as follows:
(1) Start off GetFirstNonAddress() by scanning the E820 map for the
highest exclusive >=4GB RAM address. Fall back to the CMOS if the E820
map is unavailable. Base all further calculations (such as 64-bit PCI
MMIO aperture placement, GCD sizing etc) on this value.
At the moment, the only difference this change makes is that we can
have more than 1TB above 4GB -- given that the sole "high RAM" entry
in the E820 map matches the CMOS exactly, modulo the most significant
bits (see above).
However, Igor plans to add discontiguous (cold-plugged) high RAM to
the fw_cfg E820 RAM map later on, and then this scanning will adapt
automatically.
(2) In QemuInitializeRam(), describe the high RAM regions from the E820
map one by one with memory HOBs. Fall back to the CMOS only if the
E820 map is missing.
Again, right now this change only makes a difference if there is at
least 1TB high RAM. Later on it will adapt to discontiguous high RAM
(regardless of its size) automatically.
-*-
Implementation details: introduce the ScanOrAdd64BitE820Ram() function,
which reads the E820 entries from fw_cfg, and finds the highest exclusive
>=4GB RAM address, or produces memory resource descriptor HOBs for RAM
entries that start at or above 4GB. The RAM map is not read in a single
go, because its size can vary, and in PlatformPei we should stay away from
dynamic memory allocation, for the following reasons:
- "Pool" allocations are limited to ~64KB, are served from HOBs, and
cannot be released ever.
- "Page" allocations are seriously limited before PlatformPei installs the
permanent PEI RAM. Furthermore, page allocations can only be released in
DXE, with dedicated code (so the address would have to be passed on with
a HOB or PCD).
- Raw memory allocation HOBs would require the same freeing in DXE.
Therefore we process each E820 entry as soon as it is read from fw_cfg.
-*-
Considering the impact of high RAM on the DXE core:
A few years ago, installing high RAM as *tested* would cause the DXE core
to inhabit such ranges rather than carving out its home from the permanent
PEI RAM. Fortunately, this was fixed in the following edk2 commit:
3a05b13106d1, "MdeModulePkg DxeCore: Take the range in resource HOB for
PHIT as higher priority", 2015-09-18
which I regression-tested at the time:
http://mid.mail-archive.com/55FC27B0.4070807@redhat.com
Later on, OVMF was changed to install its high RAM as tested (effectively
"arming" the earlier DXE core change for OVMF), in the following edk2
commit:
035ce3b37c90, "OvmfPkg/PlatformPei: Add memory above 4GB as tested",
2016-04-21
which I also regression-tested at the time:
http://mid.mail-archive.com/571E8B90.1020102@redhat.com
Therefore adding more "tested memory" HOBs is safe.
Cc: Jordan Justen <jordan.l.justen@intel.com>
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1468526
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-07-08 01:28:37 +02:00
|
|
|
#include <IndustryStandard/E820.h>
|
OvmfPkg/PlatformPei: set 32-bit UC area at PciBase / PciExBarBase (pc/q35)
(This is a replacement for commit 39b9a5ffe661 ("OvmfPkg/PlatformPei: fix
MTRR for low-RAM sizes that have many bits clear", 2019-05-16).)
Reintroduce the same logic as seen in commit 39b9a5ffe661 for the pc
(i440fx) board type.
For q35, the same approach doesn't work any longer, given that (a) we'd
like to keep the PCIEXBAR in the platform DSC a fixed-at-build PCD, and
(b) QEMU expects the PCIEXBAR to reside at a lower address than the 32-bit
PCI MMIO aperture.
Therefore, introduce a helper function for determining the 32-bit
"uncacheable" (MMIO) area base address:
- On q35, this function behaves statically. Furthermore, the MTRR setup
exploits that the range [0xB000_0000, 0xFFFF_FFFF] can be marked UC with
just two variable MTRRs (one at 0xB000_0000 (size 256MB), another at
0xC000_0000 (size 1GB)).
- On pc (i440fx), the function behaves dynamically, implementing the same
logic as commit 39b9a5ffe661 did. The PciBase value is adjusted to the
value calculated, similarly to commit 39b9a5ffe661. A further
simplification is that we show that the UC32 area size truncation to a
whole power of two automatically guarantees a >=2GB base address.
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1859
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Philippe Mathieu-Daude <philmd@redhat.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2019-05-29 14:49:55 +02:00
|
|
|
#include <IndustryStandard/I440FxPiix4.h>
|
2017-07-04 14:50:43 +02:00
|
|
|
#include <IndustryStandard/Q35MchIch9.h>
|
2009-05-27 23:10:18 +02:00
|
|
|
#include <PiPei.h>
|
2019-09-20 17:07:43 +02:00
|
|
|
#include <Register/Intel/SmramSaveStateMap.h>
|
2009-05-27 23:10:18 +02:00
|
|
|
|
|
|
|
//
|
|
|
|
// The Library classes this module consumes
|
|
|
|
//
|
2017-07-04 14:50:43 +02:00
|
|
|
#include <Library/BaseLib.h>
|
2014-03-04 09:03:23 +01:00
|
|
|
#include <Library/BaseMemoryLib.h>
|
2009-05-27 23:10:18 +02:00
|
|
|
#include <Library/DebugLib.h>
|
|
|
|
#include <Library/HobLib.h>
|
|
|
|
#include <Library/IoLib.h>
|
OvmfPkg/PlatformPei: Reserve GHCB-related areas if S3 is supported
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2198
Protect the memory used by an SEV-ES guest when S3 is supported. This
includes the page table used to break down the 2MB page that contains
the GHCB so that it can be marked un-encrypted, as well as the GHCB
area.
Regarding the lifecycle of the GHCB-related memory areas:
PcdOvmfSecGhcbPageTableBase
PcdOvmfSecGhcbBase
(a) when and how it is initialized after first boot of the VM
If SEV-ES is enabled, the GHCB-related areas are initialized during
the SEC phase [OvmfPkg/ResetVector/Ia32/PageTables64.asm].
(b) how it is protected from memory allocations during DXE
If S3 and SEV-ES are enabled, then InitializeRamRegions()
[OvmfPkg/PlatformPei/MemDetect.c] protects the ranges with an AcpiNVS
memory allocation HOB, in PEI.
If S3 is disabled, then these ranges are not protected. DXE's own page
tables are first built while still in PEI (see HandOffToDxeCore()
[MdeModulePkg/Core/DxeIplPeim/X64/DxeLoadFunc.c]). Those tables are
located in permanent PEI memory. After CR3 is switched over to them
(which occurs before jumping to the DXE core entry point), we don't have
to preserve PcdOvmfSecGhcbPageTableBase. PEI switches to GHCB pages in
permanent PEI memory and DXE will use these PEI GHCB pages, so we don't
have to preserve PcdOvmfSecGhcbBase.
(c) how it is protected from the OS
If S3 is enabled, then (b) reserves it from the OS too.
If S3 is disabled, then the range needs no protection.
(d) how it is accessed on the S3 resume path
It is rewritten same as in (a), which is fine because (b) reserved it.
(e) how it is accessed on the warm reset path
It is rewritten same as in (a).
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@arm.com>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: Julien Grall <julien@xen.org>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Regression-tested-by: Laszlo Ersek <lersek@redhat.com>
2020-08-12 22:21:40 +02:00
|
|
|
#include <Library/MemEncryptSevLib.h>
|
2010-01-04 17:17:59 +01:00
|
|
|
#include <Library/PcdLib.h>
|
2017-07-04 14:50:43 +02:00
|
|
|
#include <Library/PciLib.h>
|
2009-05-27 23:10:18 +02:00
|
|
|
#include <Library/PeimEntryPoint.h>
|
|
|
|
#include <Library/ResourcePublicationLib.h>
|
2011-10-28 08:04:01 +02:00
|
|
|
#include <Library/MtrrLib.h>
|
OvmfPkg: PlatformPei: determine the 64-bit PCI host aperture for X64 DXE
The main observation about the 64-bit PCI host aperture is that it is the
highest part of the useful address space. It impacts the top of the GCD
memory space map, and, consequently, our maximum address width calculation
for the CPU HOB too.
Thus, modify the GetFirstNonAddress() function to consider the following
areas above the high RAM, while calculating the first non-address (i.e.,
the highest inclusive address, plus one):
- the memory hotplug area (optional, the size comes from QEMU),
- the 64-bit PCI host aperture (we set a default size).
While computing the first non-address, capture the base and the size of
the 64-bit PCI host aperture at once in PCDs, since they are natural parts
of the calculation.
(Similarly to how PcdPciMmio32* are not rewritten on the S3 resume path
(see the InitializePlatform() -> MemMapInitialization() condition), nor
are PcdPciMmio64*. Only the core PciHostBridgeDxe driver consumes them,
through our PciHostBridgeLib instance.)
Set 32GB as the default size for the aperture. Issue#59 mentions the
NVIDIA Tesla K80 as an assignable device. According to nvidia.com, these
cards may have 24GB of memory (probably 16GB + 8GB BARs).
As a strictly experimental feature, the user can specify the size of the
aperture (in MB) as well, with the QEMU option
-fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=65536
The "X-" prefix follows the QEMU tradition (spelled "x-" there), meaning
that the property is experimental, unstable, and might go away any time.
Gerd has proposed heuristics for sizing the aperture automatically (based
on 1GB page support and PCPU address width), but such should be delayed to
a later patch (which may very well back out "X-PciMmio64Mb" then).
For "everyday" guests, the 32GB default for the aperture size shouldn't
impact the PEI memory demand (the size of the page tables that the DXE IPL
PEIM builds). Namely, we've never reported narrower than 36-bit addresses;
the DXE IPL PEIM has always built page tables for 64GB at least.
For the aperture to bump the address width above 36 bits, either the guest
must have quite a bit of memory itself (in which case the additional PEI
memory demand shouldn't matter), or the user must specify a large aperture
manually with "X-PciMmio64Mb" (and then he or she is also responsible for
giving enough RAM to the VM, to satisfy the PEI memory demand).
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Thomas Lamprecht <t.lamprecht@proxmox.com>
Ref: https://github.com/tianocore/edk2/issues/59
Ref: http://www.nvidia.com/object/tesla-servers.html
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-04 19:30:45 +01:00
|
|
|
#include <Library/QemuFwCfgLib.h>
|
2020-04-24 09:53:48 +02:00
|
|
|
#include <Library/QemuFwCfgSimpleParserLib.h>
|
2009-05-27 23:10:18 +02:00
|
|
|
|
|
|
|
#include "Platform.h"
|
|
|
|
#include "Cmos.h"
|
|
|
|
|
2015-06-26 18:09:39 +02:00
|
|
|
UINT8 mPhysMemAddressWidth;
|
|
|
|
|
2016-07-13 00:52:54 +02:00
|
|
|
STATIC UINT32 mS3AcpiReservedMemoryBase;
|
|
|
|
STATIC UINT32 mS3AcpiReservedMemorySize;
|
|
|
|
|
2017-07-04 12:44:05 +02:00
|
|
|
STATIC UINT16 mQ35TsegMbytes;
|
|
|
|
|
2019-09-20 14:02:14 +02:00
|
|
|
BOOLEAN mQ35SmramAtDefaultSmbase;
|
|
|
|
|
OvmfPkg/PlatformPei: set 32-bit UC area at PciBase / PciExBarBase (pc/q35)
(This is a replacement for commit 39b9a5ffe661 ("OvmfPkg/PlatformPei: fix
MTRR for low-RAM sizes that have many bits clear", 2019-05-16).)
Reintroduce the same logic as seen in commit 39b9a5ffe661 for the pc
(i440fx) board type.
For q35, the same approach doesn't work any longer, given that (a) we'd
like to keep the PCIEXBAR in the platform DSC a fixed-at-build PCD, and
(b) QEMU expects the PCIEXBAR to reside at a lower address than the 32-bit
PCI MMIO aperture.
Therefore, introduce a helper function for determining the 32-bit
"uncacheable" (MMIO) area base address:
- On q35, this function behaves statically. Furthermore, the MTRR setup
exploits that the range [0xB000_0000, 0xFFFF_FFFF] can be marked UC with
just two variable MTRRs (one at 0xB000_0000 (size 256MB), another at
0xC000_0000 (size 1GB)).
- On pc (i440fx), the function behaves dynamically, implementing the same
logic as commit 39b9a5ffe661 did. The PciBase value is adjusted to the
value calculated, similarly to commit 39b9a5ffe661. A further
simplification is that we show that the UC32 area size truncation to a
whole power of two automatically guarantees a >=2GB base address.
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1859
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Philippe Mathieu-Daude <philmd@redhat.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2019-05-29 14:49:55 +02:00
|
|
|
UINT32 mQemuUc32Base;
|
|
|
|
|
2017-07-04 12:44:05 +02:00
|
|
|
VOID
|
|
|
|
Q35TsegMbytesInitialization (
|
|
|
|
VOID
|
|
|
|
)
|
|
|
|
{
|
2017-07-04 14:50:43 +02:00
|
|
|
UINT16 ExtendedTsegMbytes;
|
|
|
|
RETURN_STATUS PcdStatus;
|
|
|
|
|
2019-09-20 13:36:56 +02:00
|
|
|
ASSERT (mHostBridgeDevId == INTEL_Q35_MCH_DEVICE_ID);
|
2017-07-04 14:50:43 +02:00
|
|
|
|
|
|
|
//
|
|
|
|
// Check if QEMU offers an extended TSEG.
|
|
|
|
//
|
|
|
|
// This can be seen from writing MCH_EXT_TSEG_MB_QUERY to the MCH_EXT_TSEG_MB
|
|
|
|
// register, and reading back the register.
|
|
|
|
//
|
|
|
|
// On a QEMU machine type that does not offer an extended TSEG, the initial
|
|
|
|
// write overwrites whatever value a malicious guest OS may have placed in
|
|
|
|
// the (unimplemented) register, before entering S3 or rebooting.
|
|
|
|
// Subsequently, the read returns MCH_EXT_TSEG_MB_QUERY unchanged.
|
|
|
|
//
|
|
|
|
// On a QEMU machine type that offers an extended TSEG, the initial write
|
|
|
|
// triggers an update to the register. Subsequently, the value read back
|
|
|
|
// (which is guaranteed to differ from MCH_EXT_TSEG_MB_QUERY) tells us the
|
|
|
|
// number of megabytes.
|
|
|
|
//
|
|
|
|
PciWrite16 (DRAMC_REGISTER_Q35 (MCH_EXT_TSEG_MB), MCH_EXT_TSEG_MB_QUERY);
|
|
|
|
ExtendedTsegMbytes = PciRead16 (DRAMC_REGISTER_Q35 (MCH_EXT_TSEG_MB));
|
|
|
|
if (ExtendedTsegMbytes == MCH_EXT_TSEG_MB_QUERY) {
|
|
|
|
mQ35TsegMbytes = PcdGet16 (PcdQ35TsegMbytes);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
DEBUG ((
|
|
|
|
DEBUG_INFO,
|
|
|
|
"%a: QEMU offers an extended TSEG (%d MB)\n",
|
|
|
|
__FUNCTION__,
|
|
|
|
ExtendedTsegMbytes
|
|
|
|
));
|
|
|
|
PcdStatus = PcdSet16S (PcdQ35TsegMbytes, ExtendedTsegMbytes);
|
|
|
|
ASSERT_RETURN_ERROR (PcdStatus);
|
|
|
|
mQ35TsegMbytes = ExtendedTsegMbytes;
|
2017-07-04 12:44:05 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
|
2019-09-20 14:02:14 +02:00
|
|
|
VOID
|
|
|
|
Q35SmramAtDefaultSmbaseInitialization (
|
|
|
|
VOID
|
|
|
|
)
|
|
|
|
{
|
|
|
|
RETURN_STATUS PcdStatus;
|
|
|
|
|
|
|
|
ASSERT (mHostBridgeDevId == INTEL_Q35_MCH_DEVICE_ID);
|
|
|
|
|
|
|
|
mQ35SmramAtDefaultSmbase = FALSE;
|
OvmfPkg/PlatformPei: detect SMRAM at default SMBASE (for real)
Now that the SMRAM at the default SMBASE is honored everywhere necessary,
implement the actual detection. The (simple) steps are described in
previous patch "OvmfPkg/IndustryStandard: add MCH_DEFAULT_SMBASE* register
macros".
Regarding CSM_ENABLE builds: according to the discussion with Jiewen at
https://edk2.groups.io/g/devel/message/48082
http://mid.mail-archive.com/74D8A39837DF1E4DA445A8C0B3885C503F7C9D2F@shsmsx102.ccr.corp.intel.com
if the platform has SMRAM at the default SMBASE, then we have to
(a) either punch a hole in the legacy E820 map as well, in
LegacyBiosBuildE820() [OvmfPkg/Csm/LegacyBiosDxe/LegacyBootSupport.c],
(b) or document, or programmatically catch, the incompatibility between
the "SMRAM at default SMBASE" and "CSM" features.
Because CSM is out of scope for the larger "VCPU hotplug with SMM"
feature, option (b) applies. Therefore, if the CSM is enabled in the OVMF
build, then PlatformPei will not attempt to detect SMRAM at the default
SMBASE, at all. This is approach (4) -- the most flexible one, for
end-users -- from:
http://mid.mail-archive.com/868dcff2-ecaa-e1c6-f018-abe7087d640c@redhat.com
https://edk2.groups.io/g/devel/message/48348
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1512
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20200129214412.2361-12-lersek@redhat.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2019-09-22 11:52:48 +02:00
|
|
|
if (FeaturePcdGet (PcdCsmEnable)) {
|
|
|
|
DEBUG ((DEBUG_INFO, "%a: SMRAM at default SMBASE not checked due to CSM\n",
|
|
|
|
__FUNCTION__));
|
|
|
|
} else {
|
|
|
|
UINTN CtlReg;
|
|
|
|
UINT8 CtlRegVal;
|
|
|
|
|
|
|
|
CtlReg = DRAMC_REGISTER_Q35 (MCH_DEFAULT_SMBASE_CTL);
|
|
|
|
PciWrite8 (CtlReg, MCH_DEFAULT_SMBASE_QUERY);
|
|
|
|
CtlRegVal = PciRead8 (CtlReg);
|
|
|
|
mQ35SmramAtDefaultSmbase = (BOOLEAN)(CtlRegVal ==
|
|
|
|
MCH_DEFAULT_SMBASE_IN_RAM);
|
|
|
|
DEBUG ((DEBUG_INFO, "%a: SMRAM at default SMBASE %a\n", __FUNCTION__,
|
|
|
|
mQ35SmramAtDefaultSmbase ? "found" : "not found"));
|
|
|
|
}
|
|
|
|
|
2019-09-20 14:02:14 +02:00
|
|
|
PcdStatus = PcdSetBoolS (PcdQ35SmramAtDefaultSmbase,
|
|
|
|
mQ35SmramAtDefaultSmbase);
|
|
|
|
ASSERT_RETURN_ERROR (PcdStatus);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
OvmfPkg/PlatformPei: set 32-bit UC area at PciBase / PciExBarBase (pc/q35)
(This is a replacement for commit 39b9a5ffe661 ("OvmfPkg/PlatformPei: fix
MTRR for low-RAM sizes that have many bits clear", 2019-05-16).)
Reintroduce the same logic as seen in commit 39b9a5ffe661 for the pc
(i440fx) board type.
For q35, the same approach doesn't work any longer, given that (a) we'd
like to keep the PCIEXBAR in the platform DSC a fixed-at-build PCD, and
(b) QEMU expects the PCIEXBAR to reside at a lower address than the 32-bit
PCI MMIO aperture.
Therefore, introduce a helper function for determining the 32-bit
"uncacheable" (MMIO) area base address:
- On q35, this function behaves statically. Furthermore, the MTRR setup
exploits that the range [0xB000_0000, 0xFFFF_FFFF] can be marked UC with
just two variable MTRRs (one at 0xB000_0000 (size 256MB), another at
0xC000_0000 (size 1GB)).
- On pc (i440fx), the function behaves dynamically, implementing the same
logic as commit 39b9a5ffe661 did. The PciBase value is adjusted to the
value calculated, similarly to commit 39b9a5ffe661. A further
simplification is that we show that the UC32 area size truncation to a
whole power of two automatically guarantees a >=2GB base address.
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1859
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Philippe Mathieu-Daude <philmd@redhat.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2019-05-29 14:49:55 +02:00
|
|
|
VOID
|
|
|
|
QemuUc32BaseInitialization (
|
|
|
|
VOID
|
|
|
|
)
|
|
|
|
{
|
|
|
|
UINT32 LowerMemorySize;
|
|
|
|
UINT32 Uc32Size;
|
|
|
|
|
|
|
|
if (mXen) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (mHostBridgeDevId == INTEL_Q35_MCH_DEVICE_ID) {
|
|
|
|
//
|
|
|
|
// On q35, the 32-bit area that we'll mark as UC, through variable MTRRs,
|
|
|
|
// starts at PcdPciExpressBaseAddress. The platform DSC is responsible for
|
|
|
|
// setting PcdPciExpressBaseAddress such that describing the
|
|
|
|
// [PcdPciExpressBaseAddress, 4GB) range require a very small number of
|
|
|
|
// variable MTRRs (preferably 1 or 2).
|
|
|
|
//
|
|
|
|
ASSERT (FixedPcdGet64 (PcdPciExpressBaseAddress) <= MAX_UINT32);
|
|
|
|
mQemuUc32Base = (UINT32)FixedPcdGet64 (PcdPciExpressBaseAddress);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
ASSERT (mHostBridgeDevId == INTEL_82441_DEVICE_ID);
|
|
|
|
//
|
|
|
|
// On i440fx, start with the [LowerMemorySize, 4GB) range. Make sure one
|
|
|
|
// variable MTRR suffices by truncating the size to a whole power of two,
|
|
|
|
// while keeping the end affixed to 4GB. This will round the base up.
|
|
|
|
//
|
|
|
|
LowerMemorySize = GetSystemMemorySizeBelow4gb ();
|
|
|
|
Uc32Size = GetPowerOfTwo32 ((UINT32)(SIZE_4GB - LowerMemorySize));
|
|
|
|
mQemuUc32Base = (UINT32)(SIZE_4GB - Uc32Size);
|
|
|
|
//
|
|
|
|
// Assuming that LowerMemorySize is at least 1 byte, Uc32Size is at most 2GB.
|
|
|
|
// Therefore mQemuUc32Base is at least 2GB.
|
|
|
|
//
|
|
|
|
ASSERT (mQemuUc32Base >= BASE_2GB);
|
|
|
|
|
|
|
|
if (mQemuUc32Base != LowerMemorySize) {
|
|
|
|
DEBUG ((DEBUG_VERBOSE, "%a: rounded UC32 base from 0x%x up to 0x%x, for "
|
|
|
|
"an UC32 size of 0x%x\n", __FUNCTION__, LowerMemorySize, mQemuUc32Base,
|
|
|
|
Uc32Size));
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
|
OvmfPkg/PlatformPei: support >=1TB high RAM, and discontiguous high RAM
In OVMF we currently get the upper (>=4GB) memory size with the
GetSystemMemorySizeAbove4gb() function.
The GetSystemMemorySizeAbove4gb() function is used in two places:
(1) It is the starting point of the calculations in GetFirstNonAddress().
GetFirstNonAddress() in turn
- determines the placement of the 64-bit PCI MMIO aperture,
- provides input for the GCD memory space map's sizing (see
AddressWidthInitialization(), and the CPU HOB in
MiscInitialization()),
- influences the permanent PEI RAM cap (the DXE core's page tables,
built in permanent PEI RAM, grow as the RAM to map grows).
(2) In QemuInitializeRam(), GetSystemMemorySizeAbove4gb() determines the
single memory descriptor HOB that we produce for the upper memory.
Respectively, there are two problems with GetSystemMemorySizeAbove4gb():
(1) It reads a 24-bit count of 64KB RAM chunks from the CMOS, and
therefore cannot return a larger value than one terabyte.
(2) It cannot express discontiguous high RAM.
Starting with version 1.7.0, QEMU has provided the fw_cfg file called
"etc/e820". Refer to the following QEMU commits:
- 0624c7f916b4 ("e820: pass high memory too.", 2013-10-10),
- 7d67110f2d9a ("pc: add etc/e820 fw_cfg file", 2013-10-18)
- 7db16f2480db ("pc: register e820 entries for ram", 2013-10-10)
Ever since these commits in v1.7.0 -- with the last QEMU release being
v2.9.0, and v2.10.0 under development --, the only two RAM entries added
to this E820 map correspond to the below-4GB RAM range, and the above-4GB
RAM range. And, the above-4GB range exactly matches the CMOS registers in
question; see the use of "pcms->above_4g_mem_size":
pc_q35_init() | pc_init1()
pc_memory_init()
e820_add_entry(0x100000000ULL, pcms->above_4g_mem_size, E820_RAM);
pc_cmos_init()
val = pcms->above_4g_mem_size / 65536;
rtc_set_memory(s, 0x5b, val);
rtc_set_memory(s, 0x5c, val >> 8);
rtc_set_memory(s, 0x5d, val >> 16);
Therefore, remedy the above OVMF limitations as follows:
(1) Start off GetFirstNonAddress() by scanning the E820 map for the
highest exclusive >=4GB RAM address. Fall back to the CMOS if the E820
map is unavailable. Base all further calculations (such as 64-bit PCI
MMIO aperture placement, GCD sizing etc) on this value.
At the moment, the only difference this change makes is that we can
have more than 1TB above 4GB -- given that the sole "high RAM" entry
in the E820 map matches the CMOS exactly, modulo the most significant
bits (see above).
However, Igor plans to add discontiguous (cold-plugged) high RAM to
the fw_cfg E820 RAM map later on, and then this scanning will adapt
automatically.
(2) In QemuInitializeRam(), describe the high RAM regions from the E820
map one by one with memory HOBs. Fall back to the CMOS only if the
E820 map is missing.
Again, right now this change only makes a difference if there is at
least 1TB high RAM. Later on it will adapt to discontiguous high RAM
(regardless of its size) automatically.
-*-
Implementation details: introduce the ScanOrAdd64BitE820Ram() function,
which reads the E820 entries from fw_cfg, and finds the highest exclusive
>=4GB RAM address, or produces memory resource descriptor HOBs for RAM
entries that start at or above 4GB. The RAM map is not read in a single
go, because its size can vary, and in PlatformPei we should stay away from
dynamic memory allocation, for the following reasons:
- "Pool" allocations are limited to ~64KB, are served from HOBs, and
cannot be released ever.
- "Page" allocations are seriously limited before PlatformPei installs the
permanent PEI RAM. Furthermore, page allocations can only be released in
DXE, with dedicated code (so the address would have to be passed on with
a HOB or PCD).
- Raw memory allocation HOBs would require the same freeing in DXE.
Therefore we process each E820 entry as soon as it is read from fw_cfg.
-*-
Considering the impact of high RAM on the DXE core:
A few years ago, installing high RAM as *tested* would cause the DXE core
to inhabit such ranges rather than carving out its home from the permanent
PEI RAM. Fortunately, this was fixed in the following edk2 commit:
3a05b13106d1, "MdeModulePkg DxeCore: Take the range in resource HOB for
PHIT as higher priority", 2015-09-18
which I regression-tested at the time:
http://mid.mail-archive.com/55FC27B0.4070807@redhat.com
Later on, OVMF was changed to install its high RAM as tested (effectively
"arming" the earlier DXE core change for OVMF), in the following edk2
commit:
035ce3b37c90, "OvmfPkg/PlatformPei: Add memory above 4GB as tested",
2016-04-21
which I also regression-tested at the time:
http://mid.mail-archive.com/571E8B90.1020102@redhat.com
Therefore adding more "tested memory" HOBs is safe.
Cc: Jordan Justen <jordan.l.justen@intel.com>
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1468526
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-07-08 01:28:37 +02:00
|
|
|
/**
|
|
|
|
Iterate over the RAM entries in QEMU's fw_cfg E820 RAM map that start outside
|
|
|
|
of the 32-bit address range.
|
|
|
|
|
|
|
|
Find the highest exclusive >=4GB RAM address, or produce memory resource
|
|
|
|
descriptor HOBs for RAM entries that start at or above 4GB.
|
|
|
|
|
|
|
|
@param[out] MaxAddress If MaxAddress is NULL, then ScanOrAdd64BitE820Ram()
|
|
|
|
produces memory resource descriptor HOBs for RAM
|
|
|
|
entries that start at or above 4GB.
|
|
|
|
|
|
|
|
Otherwise, MaxAddress holds the highest exclusive
|
|
|
|
>=4GB RAM address on output. If QEMU's fw_cfg E820
|
|
|
|
RAM map contains no RAM entry that starts outside of
|
|
|
|
the 32-bit address range, then MaxAddress is exactly
|
|
|
|
4GB on output.
|
|
|
|
|
|
|
|
@retval EFI_SUCCESS The fw_cfg E820 RAM map was found and processed.
|
|
|
|
|
|
|
|
@retval EFI_PROTOCOL_ERROR The RAM map was found, but its size wasn't a
|
|
|
|
whole multiple of sizeof(EFI_E820_ENTRY64). No
|
|
|
|
RAM entry was processed.
|
|
|
|
|
|
|
|
@return Error codes from QemuFwCfgFindFile(). No RAM
|
|
|
|
entry was processed.
|
|
|
|
**/
|
|
|
|
STATIC
|
|
|
|
EFI_STATUS
|
|
|
|
ScanOrAdd64BitE820Ram (
|
|
|
|
OUT UINT64 *MaxAddress OPTIONAL
|
|
|
|
)
|
|
|
|
{
|
|
|
|
EFI_STATUS Status;
|
|
|
|
FIRMWARE_CONFIG_ITEM FwCfgItem;
|
|
|
|
UINTN FwCfgSize;
|
|
|
|
EFI_E820_ENTRY64 E820Entry;
|
|
|
|
UINTN Processed;
|
|
|
|
|
|
|
|
Status = QemuFwCfgFindFile ("etc/e820", &FwCfgItem, &FwCfgSize);
|
|
|
|
if (EFI_ERROR (Status)) {
|
|
|
|
return Status;
|
|
|
|
}
|
|
|
|
if (FwCfgSize % sizeof E820Entry != 0) {
|
|
|
|
return EFI_PROTOCOL_ERROR;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (MaxAddress != NULL) {
|
|
|
|
*MaxAddress = BASE_4GB;
|
|
|
|
}
|
|
|
|
|
|
|
|
QemuFwCfgSelectItem (FwCfgItem);
|
|
|
|
for (Processed = 0; Processed < FwCfgSize; Processed += sizeof E820Entry) {
|
|
|
|
QemuFwCfgReadBytes (sizeof E820Entry, &E820Entry);
|
|
|
|
DEBUG ((
|
|
|
|
DEBUG_VERBOSE,
|
|
|
|
"%a: Base=0x%Lx Length=0x%Lx Type=%u\n",
|
|
|
|
__FUNCTION__,
|
|
|
|
E820Entry.BaseAddr,
|
|
|
|
E820Entry.Length,
|
|
|
|
E820Entry.Type
|
|
|
|
));
|
|
|
|
if (E820Entry.Type == EfiAcpiAddressRangeMemory &&
|
|
|
|
E820Entry.BaseAddr >= BASE_4GB) {
|
|
|
|
if (MaxAddress == NULL) {
|
|
|
|
UINT64 Base;
|
|
|
|
UINT64 End;
|
|
|
|
|
|
|
|
//
|
|
|
|
// Round up the start address, and round down the end address.
|
|
|
|
//
|
|
|
|
Base = ALIGN_VALUE (E820Entry.BaseAddr, (UINT64)EFI_PAGE_SIZE);
|
|
|
|
End = (E820Entry.BaseAddr + E820Entry.Length) &
|
|
|
|
~(UINT64)EFI_PAGE_MASK;
|
|
|
|
if (Base < End) {
|
|
|
|
AddMemoryRangeHob (Base, End);
|
|
|
|
DEBUG ((
|
|
|
|
DEBUG_VERBOSE,
|
|
|
|
"%a: AddMemoryRangeHob [0x%Lx, 0x%Lx)\n",
|
|
|
|
__FUNCTION__,
|
|
|
|
Base,
|
|
|
|
End
|
|
|
|
));
|
|
|
|
}
|
|
|
|
} else {
|
|
|
|
UINT64 Candidate;
|
|
|
|
|
|
|
|
Candidate = E820Entry.BaseAddr + E820Entry.Length;
|
|
|
|
if (Candidate > *MaxAddress) {
|
|
|
|
*MaxAddress = Candidate;
|
|
|
|
DEBUG ((
|
|
|
|
DEBUG_VERBOSE,
|
|
|
|
"%a: MaxAddress=0x%Lx\n",
|
|
|
|
__FUNCTION__,
|
|
|
|
*MaxAddress
|
|
|
|
));
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
return EFI_SUCCESS;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
2014-02-01 22:22:43 +01:00
|
|
|
UINT32
|
2011-01-21 17:51:00 +01:00
|
|
|
GetSystemMemorySizeBelow4gb (
|
2014-02-01 22:22:43 +01:00
|
|
|
VOID
|
2009-05-27 23:10:18 +02:00
|
|
|
)
|
|
|
|
{
|
|
|
|
UINT8 Cmos0x34;
|
|
|
|
UINT8 Cmos0x35;
|
|
|
|
|
|
|
|
//
|
|
|
|
// CMOS 0x34/0x35 specifies the system memory above 16 MB.
|
|
|
|
// * CMOS(0x35) is the high byte
|
|
|
|
// * CMOS(0x34) is the low byte
|
|
|
|
// * The size is specified in 64kb chunks
|
|
|
|
// * Since this is memory above 16MB, the 16MB must be added
|
|
|
|
// into the calculation to get the total memory size.
|
|
|
|
//
|
|
|
|
|
|
|
|
Cmos0x34 = (UINT8) CmosRead8 (0x34);
|
|
|
|
Cmos0x35 = (UINT8) CmosRead8 (0x35);
|
|
|
|
|
2014-09-25 04:29:10 +02:00
|
|
|
return (UINT32) (((UINTN)((Cmos0x35 << 8) + Cmos0x34) << 16) + SIZE_16MB);
|
2009-05-27 23:10:18 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
|
2011-01-21 17:51:00 +01:00
|
|
|
STATIC
|
|
|
|
UINT64
|
|
|
|
GetSystemMemorySizeAbove4gb (
|
|
|
|
)
|
|
|
|
{
|
|
|
|
UINT32 Size;
|
|
|
|
UINTN CmosIndex;
|
|
|
|
|
|
|
|
//
|
|
|
|
// CMOS 0x5b-0x5d specifies the system memory above 4GB MB.
|
|
|
|
// * CMOS(0x5d) is the most significant size byte
|
|
|
|
// * CMOS(0x5c) is the middle size byte
|
|
|
|
// * CMOS(0x5b) is the least significant size byte
|
|
|
|
// * The size is specified in 64kb chunks
|
|
|
|
//
|
|
|
|
|
|
|
|
Size = 0;
|
|
|
|
for (CmosIndex = 0x5d; CmosIndex >= 0x5b; CmosIndex--) {
|
|
|
|
Size = (UINT32) (Size << 8) + (UINT32) CmosRead8 (CmosIndex);
|
|
|
|
}
|
|
|
|
|
|
|
|
return LShiftU64 (Size, 16);
|
|
|
|
}
|
|
|
|
|
2015-06-26 18:09:39 +02:00
|
|
|
|
2016-03-04 17:23:35 +01:00
|
|
|
/**
|
|
|
|
Return the highest address that DXE could possibly use, plus one.
|
|
|
|
**/
|
|
|
|
STATIC
|
|
|
|
UINT64
|
|
|
|
GetFirstNonAddress (
|
|
|
|
VOID
|
|
|
|
)
|
|
|
|
{
|
|
|
|
UINT64 FirstNonAddress;
|
OvmfPkg: PlatformPei: determine the 64-bit PCI host aperture for X64 DXE
The main observation about the 64-bit PCI host aperture is that it is the
highest part of the useful address space. It impacts the top of the GCD
memory space map, and, consequently, our maximum address width calculation
for the CPU HOB too.
Thus, modify the GetFirstNonAddress() function to consider the following
areas above the high RAM, while calculating the first non-address (i.e.,
the highest inclusive address, plus one):
- the memory hotplug area (optional, the size comes from QEMU),
- the 64-bit PCI host aperture (we set a default size).
While computing the first non-address, capture the base and the size of
the 64-bit PCI host aperture at once in PCDs, since they are natural parts
of the calculation.
(Similarly to how PcdPciMmio32* are not rewritten on the S3 resume path
(see the InitializePlatform() -> MemMapInitialization() condition), nor
are PcdPciMmio64*. Only the core PciHostBridgeDxe driver consumes them,
through our PciHostBridgeLib instance.)
Set 32GB as the default size for the aperture. Issue#59 mentions the
NVIDIA Tesla K80 as an assignable device. According to nvidia.com, these
cards may have 24GB of memory (probably 16GB + 8GB BARs).
As a strictly experimental feature, the user can specify the size of the
aperture (in MB) as well, with the QEMU option
-fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=65536
The "X-" prefix follows the QEMU tradition (spelled "x-" there), meaning
that the property is experimental, unstable, and might go away any time.
Gerd has proposed heuristics for sizing the aperture automatically (based
on 1GB page support and PCPU address width), but such should be delayed to
a later patch (which may very well back out "X-PciMmio64Mb" then).
For "everyday" guests, the 32GB default for the aperture size shouldn't
impact the PEI memory demand (the size of the page tables that the DXE IPL
PEIM builds). Namely, we've never reported narrower than 36-bit addresses;
the DXE IPL PEIM has always built page tables for 64GB at least.
For the aperture to bump the address width above 36 bits, either the guest
must have quite a bit of memory itself (in which case the additional PEI
memory demand shouldn't matter), or the user must specify a large aperture
manually with "X-PciMmio64Mb" (and then he or she is also responsible for
giving enough RAM to the VM, to satisfy the PEI memory demand).
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Thomas Lamprecht <t.lamprecht@proxmox.com>
Ref: https://github.com/tianocore/edk2/issues/59
Ref: http://www.nvidia.com/object/tesla-servers.html
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-04 19:30:45 +01:00
|
|
|
UINT64 Pci64Base, Pci64Size;
|
2020-04-24 09:53:48 +02:00
|
|
|
UINT32 FwCfgPciMmio64Mb;
|
OvmfPkg: PlatformPei: determine the 64-bit PCI host aperture for X64 DXE
The main observation about the 64-bit PCI host aperture is that it is the
highest part of the useful address space. It impacts the top of the GCD
memory space map, and, consequently, our maximum address width calculation
for the CPU HOB too.
Thus, modify the GetFirstNonAddress() function to consider the following
areas above the high RAM, while calculating the first non-address (i.e.,
the highest inclusive address, plus one):
- the memory hotplug area (optional, the size comes from QEMU),
- the 64-bit PCI host aperture (we set a default size).
While computing the first non-address, capture the base and the size of
the 64-bit PCI host aperture at once in PCDs, since they are natural parts
of the calculation.
(Similarly to how PcdPciMmio32* are not rewritten on the S3 resume path
(see the InitializePlatform() -> MemMapInitialization() condition), nor
are PcdPciMmio64*. Only the core PciHostBridgeDxe driver consumes them,
through our PciHostBridgeLib instance.)
Set 32GB as the default size for the aperture. Issue#59 mentions the
NVIDIA Tesla K80 as an assignable device. According to nvidia.com, these
cards may have 24GB of memory (probably 16GB + 8GB BARs).
As a strictly experimental feature, the user can specify the size of the
aperture (in MB) as well, with the QEMU option
-fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=65536
The "X-" prefix follows the QEMU tradition (spelled "x-" there), meaning
that the property is experimental, unstable, and might go away any time.
Gerd has proposed heuristics for sizing the aperture automatically (based
on 1GB page support and PCPU address width), but such should be delayed to
a later patch (which may very well back out "X-PciMmio64Mb" then).
For "everyday" guests, the 32GB default for the aperture size shouldn't
impact the PEI memory demand (the size of the page tables that the DXE IPL
PEIM builds). Namely, we've never reported narrower than 36-bit addresses;
the DXE IPL PEIM has always built page tables for 64GB at least.
For the aperture to bump the address width above 36 bits, either the guest
must have quite a bit of memory itself (in which case the additional PEI
memory demand shouldn't matter), or the user must specify a large aperture
manually with "X-PciMmio64Mb" (and then he or she is also responsible for
giving enough RAM to the VM, to satisfy the PEI memory demand).
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Thomas Lamprecht <t.lamprecht@proxmox.com>
Ref: https://github.com/tianocore/edk2/issues/59
Ref: http://www.nvidia.com/object/tesla-servers.html
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-04 19:30:45 +01:00
|
|
|
EFI_STATUS Status;
|
|
|
|
FIRMWARE_CONFIG_ITEM FwCfgItem;
|
|
|
|
UINTN FwCfgSize;
|
|
|
|
UINT64 HotPlugMemoryEnd;
|
2016-10-21 11:59:36 +02:00
|
|
|
RETURN_STATUS PcdStatus;
|
2016-03-04 17:23:35 +01:00
|
|
|
|
OvmfPkg/PlatformPei: support >=1TB high RAM, and discontiguous high RAM
In OVMF we currently get the upper (>=4GB) memory size with the
GetSystemMemorySizeAbove4gb() function.
The GetSystemMemorySizeAbove4gb() function is used in two places:
(1) It is the starting point of the calculations in GetFirstNonAddress().
GetFirstNonAddress() in turn
- determines the placement of the 64-bit PCI MMIO aperture,
- provides input for the GCD memory space map's sizing (see
AddressWidthInitialization(), and the CPU HOB in
MiscInitialization()),
- influences the permanent PEI RAM cap (the DXE core's page tables,
built in permanent PEI RAM, grow as the RAM to map grows).
(2) In QemuInitializeRam(), GetSystemMemorySizeAbove4gb() determines the
single memory descriptor HOB that we produce for the upper memory.
Respectively, there are two problems with GetSystemMemorySizeAbove4gb():
(1) It reads a 24-bit count of 64KB RAM chunks from the CMOS, and
therefore cannot return a larger value than one terabyte.
(2) It cannot express discontiguous high RAM.
Starting with version 1.7.0, QEMU has provided the fw_cfg file called
"etc/e820". Refer to the following QEMU commits:
- 0624c7f916b4 ("e820: pass high memory too.", 2013-10-10),
- 7d67110f2d9a ("pc: add etc/e820 fw_cfg file", 2013-10-18)
- 7db16f2480db ("pc: register e820 entries for ram", 2013-10-10)
Ever since these commits in v1.7.0 -- with the last QEMU release being
v2.9.0, and v2.10.0 under development --, the only two RAM entries added
to this E820 map correspond to the below-4GB RAM range, and the above-4GB
RAM range. And, the above-4GB range exactly matches the CMOS registers in
question; see the use of "pcms->above_4g_mem_size":
pc_q35_init() | pc_init1()
pc_memory_init()
e820_add_entry(0x100000000ULL, pcms->above_4g_mem_size, E820_RAM);
pc_cmos_init()
val = pcms->above_4g_mem_size / 65536;
rtc_set_memory(s, 0x5b, val);
rtc_set_memory(s, 0x5c, val >> 8);
rtc_set_memory(s, 0x5d, val >> 16);
Therefore, remedy the above OVMF limitations as follows:
(1) Start off GetFirstNonAddress() by scanning the E820 map for the
highest exclusive >=4GB RAM address. Fall back to the CMOS if the E820
map is unavailable. Base all further calculations (such as 64-bit PCI
MMIO aperture placement, GCD sizing etc) on this value.
At the moment, the only difference this change makes is that we can
have more than 1TB above 4GB -- given that the sole "high RAM" entry
in the E820 map matches the CMOS exactly, modulo the most significant
bits (see above).
However, Igor plans to add discontiguous (cold-plugged) high RAM to
the fw_cfg E820 RAM map later on, and then this scanning will adapt
automatically.
(2) In QemuInitializeRam(), describe the high RAM regions from the E820
map one by one with memory HOBs. Fall back to the CMOS only if the
E820 map is missing.
Again, right now this change only makes a difference if there is at
least 1TB high RAM. Later on it will adapt to discontiguous high RAM
(regardless of its size) automatically.
-*-
Implementation details: introduce the ScanOrAdd64BitE820Ram() function,
which reads the E820 entries from fw_cfg, and finds the highest exclusive
>=4GB RAM address, or produces memory resource descriptor HOBs for RAM
entries that start at or above 4GB. The RAM map is not read in a single
go, because its size can vary, and in PlatformPei we should stay away from
dynamic memory allocation, for the following reasons:
- "Pool" allocations are limited to ~64KB, are served from HOBs, and
cannot be released ever.
- "Page" allocations are seriously limited before PlatformPei installs the
permanent PEI RAM. Furthermore, page allocations can only be released in
DXE, with dedicated code (so the address would have to be passed on with
a HOB or PCD).
- Raw memory allocation HOBs would require the same freeing in DXE.
Therefore we process each E820 entry as soon as it is read from fw_cfg.
-*-
Considering the impact of high RAM on the DXE core:
A few years ago, installing high RAM as *tested* would cause the DXE core
to inhabit such ranges rather than carving out its home from the permanent
PEI RAM. Fortunately, this was fixed in the following edk2 commit:
3a05b13106d1, "MdeModulePkg DxeCore: Take the range in resource HOB for
PHIT as higher priority", 2015-09-18
which I regression-tested at the time:
http://mid.mail-archive.com/55FC27B0.4070807@redhat.com
Later on, OVMF was changed to install its high RAM as tested (effectively
"arming" the earlier DXE core change for OVMF), in the following edk2
commit:
035ce3b37c90, "OvmfPkg/PlatformPei: Add memory above 4GB as tested",
2016-04-21
which I also regression-tested at the time:
http://mid.mail-archive.com/571E8B90.1020102@redhat.com
Therefore adding more "tested memory" HOBs is safe.
Cc: Jordan Justen <jordan.l.justen@intel.com>
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1468526
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-07-08 01:28:37 +02:00
|
|
|
//
|
|
|
|
// set FirstNonAddress to suppress incorrect compiler/analyzer warnings
|
|
|
|
//
|
|
|
|
FirstNonAddress = 0;
|
|
|
|
|
|
|
|
//
|
|
|
|
// If QEMU presents an E820 map, then get the highest exclusive >=4GB RAM
|
|
|
|
// address from it. This can express an address >= 4GB+1TB.
|
|
|
|
//
|
|
|
|
// Otherwise, get the flat size of the memory above 4GB from the CMOS (which
|
|
|
|
// can only express a size smaller than 1TB), and add it to 4GB.
|
|
|
|
//
|
|
|
|
Status = ScanOrAdd64BitE820Ram (&FirstNonAddress);
|
|
|
|
if (EFI_ERROR (Status)) {
|
|
|
|
FirstNonAddress = BASE_4GB + GetSystemMemorySizeAbove4gb ();
|
|
|
|
}
|
OvmfPkg: PlatformPei: determine the 64-bit PCI host aperture for X64 DXE
The main observation about the 64-bit PCI host aperture is that it is the
highest part of the useful address space. It impacts the top of the GCD
memory space map, and, consequently, our maximum address width calculation
for the CPU HOB too.
Thus, modify the GetFirstNonAddress() function to consider the following
areas above the high RAM, while calculating the first non-address (i.e.,
the highest inclusive address, plus one):
- the memory hotplug area (optional, the size comes from QEMU),
- the 64-bit PCI host aperture (we set a default size).
While computing the first non-address, capture the base and the size of
the 64-bit PCI host aperture at once in PCDs, since they are natural parts
of the calculation.
(Similarly to how PcdPciMmio32* are not rewritten on the S3 resume path
(see the InitializePlatform() -> MemMapInitialization() condition), nor
are PcdPciMmio64*. Only the core PciHostBridgeDxe driver consumes them,
through our PciHostBridgeLib instance.)
Set 32GB as the default size for the aperture. Issue#59 mentions the
NVIDIA Tesla K80 as an assignable device. According to nvidia.com, these
cards may have 24GB of memory (probably 16GB + 8GB BARs).
As a strictly experimental feature, the user can specify the size of the
aperture (in MB) as well, with the QEMU option
-fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=65536
The "X-" prefix follows the QEMU tradition (spelled "x-" there), meaning
that the property is experimental, unstable, and might go away any time.
Gerd has proposed heuristics for sizing the aperture automatically (based
on 1GB page support and PCPU address width), but such should be delayed to
a later patch (which may very well back out "X-PciMmio64Mb" then).
For "everyday" guests, the 32GB default for the aperture size shouldn't
impact the PEI memory demand (the size of the page tables that the DXE IPL
PEIM builds). Namely, we've never reported narrower than 36-bit addresses;
the DXE IPL PEIM has always built page tables for 64GB at least.
For the aperture to bump the address width above 36 bits, either the guest
must have quite a bit of memory itself (in which case the additional PEI
memory demand shouldn't matter), or the user must specify a large aperture
manually with "X-PciMmio64Mb" (and then he or she is also responsible for
giving enough RAM to the VM, to satisfy the PEI memory demand).
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Thomas Lamprecht <t.lamprecht@proxmox.com>
Ref: https://github.com/tianocore/edk2/issues/59
Ref: http://www.nvidia.com/object/tesla-servers.html
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-04 19:30:45 +01:00
|
|
|
|
|
|
|
//
|
|
|
|
// If DXE is 32-bit, then we're done; PciBusDxe will degrade 64-bit MMIO
|
|
|
|
// resources to 32-bit anyway. See DegradeResource() in
|
|
|
|
// "PciResourceSupport.c".
|
|
|
|
//
|
|
|
|
#ifdef MDE_CPU_IA32
|
|
|
|
if (!FeaturePcdGet (PcdDxeIplSwitchToLongMode)) {
|
|
|
|
return FirstNonAddress;
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
|
|
|
//
|
|
|
|
// Otherwise, in order to calculate the highest address plus one, we must
|
|
|
|
// consider the 64-bit PCI host aperture too. Fetch the default size.
|
|
|
|
//
|
|
|
|
Pci64Size = PcdGet64 (PcdPciMmio64Size);
|
|
|
|
|
|
|
|
//
|
|
|
|
// See if the user specified the number of megabytes for the 64-bit PCI host
|
2020-04-24 09:53:48 +02:00
|
|
|
// aperture. Accept an aperture size up to 16TB.
|
OvmfPkg: PlatformPei: determine the 64-bit PCI host aperture for X64 DXE
The main observation about the 64-bit PCI host aperture is that it is the
highest part of the useful address space. It impacts the top of the GCD
memory space map, and, consequently, our maximum address width calculation
for the CPU HOB too.
Thus, modify the GetFirstNonAddress() function to consider the following
areas above the high RAM, while calculating the first non-address (i.e.,
the highest inclusive address, plus one):
- the memory hotplug area (optional, the size comes from QEMU),
- the 64-bit PCI host aperture (we set a default size).
While computing the first non-address, capture the base and the size of
the 64-bit PCI host aperture at once in PCDs, since they are natural parts
of the calculation.
(Similarly to how PcdPciMmio32* are not rewritten on the S3 resume path
(see the InitializePlatform() -> MemMapInitialization() condition), nor
are PcdPciMmio64*. Only the core PciHostBridgeDxe driver consumes them,
through our PciHostBridgeLib instance.)
Set 32GB as the default size for the aperture. Issue#59 mentions the
NVIDIA Tesla K80 as an assignable device. According to nvidia.com, these
cards may have 24GB of memory (probably 16GB + 8GB BARs).
As a strictly experimental feature, the user can specify the size of the
aperture (in MB) as well, with the QEMU option
-fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=65536
The "X-" prefix follows the QEMU tradition (spelled "x-" there), meaning
that the property is experimental, unstable, and might go away any time.
Gerd has proposed heuristics for sizing the aperture automatically (based
on 1GB page support and PCPU address width), but such should be delayed to
a later patch (which may very well back out "X-PciMmio64Mb" then).
For "everyday" guests, the 32GB default for the aperture size shouldn't
impact the PEI memory demand (the size of the page tables that the DXE IPL
PEIM builds). Namely, we've never reported narrower than 36-bit addresses;
the DXE IPL PEIM has always built page tables for 64GB at least.
For the aperture to bump the address width above 36 bits, either the guest
must have quite a bit of memory itself (in which case the additional PEI
memory demand shouldn't matter), or the user must specify a large aperture
manually with "X-PciMmio64Mb" (and then he or she is also responsible for
giving enough RAM to the VM, to satisfy the PEI memory demand).
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Thomas Lamprecht <t.lamprecht@proxmox.com>
Ref: https://github.com/tianocore/edk2/issues/59
Ref: http://www.nvidia.com/object/tesla-servers.html
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-04 19:30:45 +01:00
|
|
|
//
|
|
|
|
// As signaled by the "X-" prefix, this knob is experimental, and might go
|
|
|
|
// away at any time.
|
|
|
|
//
|
2020-04-24 09:53:48 +02:00
|
|
|
Status = QemuFwCfgParseUint32 ("opt/ovmf/X-PciMmio64Mb", FALSE,
|
|
|
|
&FwCfgPciMmio64Mb);
|
|
|
|
switch (Status) {
|
|
|
|
case EFI_UNSUPPORTED:
|
|
|
|
case EFI_NOT_FOUND:
|
|
|
|
break;
|
|
|
|
case EFI_SUCCESS:
|
|
|
|
if (FwCfgPciMmio64Mb <= 0x1000000) {
|
|
|
|
Pci64Size = LShiftU64 (FwCfgPciMmio64Mb, 20);
|
|
|
|
break;
|
OvmfPkg: PlatformPei: determine the 64-bit PCI host aperture for X64 DXE
The main observation about the 64-bit PCI host aperture is that it is the
highest part of the useful address space. It impacts the top of the GCD
memory space map, and, consequently, our maximum address width calculation
for the CPU HOB too.
Thus, modify the GetFirstNonAddress() function to consider the following
areas above the high RAM, while calculating the first non-address (i.e.,
the highest inclusive address, plus one):
- the memory hotplug area (optional, the size comes from QEMU),
- the 64-bit PCI host aperture (we set a default size).
While computing the first non-address, capture the base and the size of
the 64-bit PCI host aperture at once in PCDs, since they are natural parts
of the calculation.
(Similarly to how PcdPciMmio32* are not rewritten on the S3 resume path
(see the InitializePlatform() -> MemMapInitialization() condition), nor
are PcdPciMmio64*. Only the core PciHostBridgeDxe driver consumes them,
through our PciHostBridgeLib instance.)
Set 32GB as the default size for the aperture. Issue#59 mentions the
NVIDIA Tesla K80 as an assignable device. According to nvidia.com, these
cards may have 24GB of memory (probably 16GB + 8GB BARs).
As a strictly experimental feature, the user can specify the size of the
aperture (in MB) as well, with the QEMU option
-fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=65536
The "X-" prefix follows the QEMU tradition (spelled "x-" there), meaning
that the property is experimental, unstable, and might go away any time.
Gerd has proposed heuristics for sizing the aperture automatically (based
on 1GB page support and PCPU address width), but such should be delayed to
a later patch (which may very well back out "X-PciMmio64Mb" then).
For "everyday" guests, the 32GB default for the aperture size shouldn't
impact the PEI memory demand (the size of the page tables that the DXE IPL
PEIM builds). Namely, we've never reported narrower than 36-bit addresses;
the DXE IPL PEIM has always built page tables for 64GB at least.
For the aperture to bump the address width above 36 bits, either the guest
must have quite a bit of memory itself (in which case the additional PEI
memory demand shouldn't matter), or the user must specify a large aperture
manually with "X-PciMmio64Mb" (and then he or she is also responsible for
giving enough RAM to the VM, to satisfy the PEI memory demand).
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Thomas Lamprecht <t.lamprecht@proxmox.com>
Ref: https://github.com/tianocore/edk2/issues/59
Ref: http://www.nvidia.com/object/tesla-servers.html
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-04 19:30:45 +01:00
|
|
|
}
|
2020-04-24 09:53:48 +02:00
|
|
|
//
|
|
|
|
// fall through
|
|
|
|
//
|
|
|
|
default:
|
|
|
|
DEBUG ((DEBUG_WARN,
|
|
|
|
"%a: ignoring malformed 64-bit PCI host aperture size from fw_cfg\n",
|
|
|
|
__FUNCTION__));
|
|
|
|
break;
|
OvmfPkg: PlatformPei: determine the 64-bit PCI host aperture for X64 DXE
The main observation about the 64-bit PCI host aperture is that it is the
highest part of the useful address space. It impacts the top of the GCD
memory space map, and, consequently, our maximum address width calculation
for the CPU HOB too.
Thus, modify the GetFirstNonAddress() function to consider the following
areas above the high RAM, while calculating the first non-address (i.e.,
the highest inclusive address, plus one):
- the memory hotplug area (optional, the size comes from QEMU),
- the 64-bit PCI host aperture (we set a default size).
While computing the first non-address, capture the base and the size of
the 64-bit PCI host aperture at once in PCDs, since they are natural parts
of the calculation.
(Similarly to how PcdPciMmio32* are not rewritten on the S3 resume path
(see the InitializePlatform() -> MemMapInitialization() condition), nor
are PcdPciMmio64*. Only the core PciHostBridgeDxe driver consumes them,
through our PciHostBridgeLib instance.)
Set 32GB as the default size for the aperture. Issue#59 mentions the
NVIDIA Tesla K80 as an assignable device. According to nvidia.com, these
cards may have 24GB of memory (probably 16GB + 8GB BARs).
As a strictly experimental feature, the user can specify the size of the
aperture (in MB) as well, with the QEMU option
-fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=65536
The "X-" prefix follows the QEMU tradition (spelled "x-" there), meaning
that the property is experimental, unstable, and might go away any time.
Gerd has proposed heuristics for sizing the aperture automatically (based
on 1GB page support and PCPU address width), but such should be delayed to
a later patch (which may very well back out "X-PciMmio64Mb" then).
For "everyday" guests, the 32GB default for the aperture size shouldn't
impact the PEI memory demand (the size of the page tables that the DXE IPL
PEIM builds). Namely, we've never reported narrower than 36-bit addresses;
the DXE IPL PEIM has always built page tables for 64GB at least.
For the aperture to bump the address width above 36 bits, either the guest
must have quite a bit of memory itself (in which case the additional PEI
memory demand shouldn't matter), or the user must specify a large aperture
manually with "X-PciMmio64Mb" (and then he or she is also responsible for
giving enough RAM to the VM, to satisfy the PEI memory demand).
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Thomas Lamprecht <t.lamprecht@proxmox.com>
Ref: https://github.com/tianocore/edk2/issues/59
Ref: http://www.nvidia.com/object/tesla-servers.html
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-04 19:30:45 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
if (Pci64Size == 0) {
|
|
|
|
if (mBootMode != BOOT_ON_S3_RESUME) {
|
2020-04-29 23:53:27 +02:00
|
|
|
DEBUG ((DEBUG_INFO, "%a: disabling 64-bit PCI host aperture\n",
|
OvmfPkg: PlatformPei: determine the 64-bit PCI host aperture for X64 DXE
The main observation about the 64-bit PCI host aperture is that it is the
highest part of the useful address space. It impacts the top of the GCD
memory space map, and, consequently, our maximum address width calculation
for the CPU HOB too.
Thus, modify the GetFirstNonAddress() function to consider the following
areas above the high RAM, while calculating the first non-address (i.e.,
the highest inclusive address, plus one):
- the memory hotplug area (optional, the size comes from QEMU),
- the 64-bit PCI host aperture (we set a default size).
While computing the first non-address, capture the base and the size of
the 64-bit PCI host aperture at once in PCDs, since they are natural parts
of the calculation.
(Similarly to how PcdPciMmio32* are not rewritten on the S3 resume path
(see the InitializePlatform() -> MemMapInitialization() condition), nor
are PcdPciMmio64*. Only the core PciHostBridgeDxe driver consumes them,
through our PciHostBridgeLib instance.)
Set 32GB as the default size for the aperture. Issue#59 mentions the
NVIDIA Tesla K80 as an assignable device. According to nvidia.com, these
cards may have 24GB of memory (probably 16GB + 8GB BARs).
As a strictly experimental feature, the user can specify the size of the
aperture (in MB) as well, with the QEMU option
-fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=65536
The "X-" prefix follows the QEMU tradition (spelled "x-" there), meaning
that the property is experimental, unstable, and might go away any time.
Gerd has proposed heuristics for sizing the aperture automatically (based
on 1GB page support and PCPU address width), but such should be delayed to
a later patch (which may very well back out "X-PciMmio64Mb" then).
For "everyday" guests, the 32GB default for the aperture size shouldn't
impact the PEI memory demand (the size of the page tables that the DXE IPL
PEIM builds). Namely, we've never reported narrower than 36-bit addresses;
the DXE IPL PEIM has always built page tables for 64GB at least.
For the aperture to bump the address width above 36 bits, either the guest
must have quite a bit of memory itself (in which case the additional PEI
memory demand shouldn't matter), or the user must specify a large aperture
manually with "X-PciMmio64Mb" (and then he or she is also responsible for
giving enough RAM to the VM, to satisfy the PEI memory demand).
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Thomas Lamprecht <t.lamprecht@proxmox.com>
Ref: https://github.com/tianocore/edk2/issues/59
Ref: http://www.nvidia.com/object/tesla-servers.html
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-04 19:30:45 +01:00
|
|
|
__FUNCTION__));
|
2016-10-21 11:59:36 +02:00
|
|
|
PcdStatus = PcdSet64S (PcdPciMmio64Size, 0);
|
|
|
|
ASSERT_RETURN_ERROR (PcdStatus);
|
OvmfPkg: PlatformPei: determine the 64-bit PCI host aperture for X64 DXE
The main observation about the 64-bit PCI host aperture is that it is the
highest part of the useful address space. It impacts the top of the GCD
memory space map, and, consequently, our maximum address width calculation
for the CPU HOB too.
Thus, modify the GetFirstNonAddress() function to consider the following
areas above the high RAM, while calculating the first non-address (i.e.,
the highest inclusive address, plus one):
- the memory hotplug area (optional, the size comes from QEMU),
- the 64-bit PCI host aperture (we set a default size).
While computing the first non-address, capture the base and the size of
the 64-bit PCI host aperture at once in PCDs, since they are natural parts
of the calculation.
(Similarly to how PcdPciMmio32* are not rewritten on the S3 resume path
(see the InitializePlatform() -> MemMapInitialization() condition), nor
are PcdPciMmio64*. Only the core PciHostBridgeDxe driver consumes them,
through our PciHostBridgeLib instance.)
Set 32GB as the default size for the aperture. Issue#59 mentions the
NVIDIA Tesla K80 as an assignable device. According to nvidia.com, these
cards may have 24GB of memory (probably 16GB + 8GB BARs).
As a strictly experimental feature, the user can specify the size of the
aperture (in MB) as well, with the QEMU option
-fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=65536
The "X-" prefix follows the QEMU tradition (spelled "x-" there), meaning
that the property is experimental, unstable, and might go away any time.
Gerd has proposed heuristics for sizing the aperture automatically (based
on 1GB page support and PCPU address width), but such should be delayed to
a later patch (which may very well back out "X-PciMmio64Mb" then).
For "everyday" guests, the 32GB default for the aperture size shouldn't
impact the PEI memory demand (the size of the page tables that the DXE IPL
PEIM builds). Namely, we've never reported narrower than 36-bit addresses;
the DXE IPL PEIM has always built page tables for 64GB at least.
For the aperture to bump the address width above 36 bits, either the guest
must have quite a bit of memory itself (in which case the additional PEI
memory demand shouldn't matter), or the user must specify a large aperture
manually with "X-PciMmio64Mb" (and then he or she is also responsible for
giving enough RAM to the VM, to satisfy the PEI memory demand).
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Thomas Lamprecht <t.lamprecht@proxmox.com>
Ref: https://github.com/tianocore/edk2/issues/59
Ref: http://www.nvidia.com/object/tesla-servers.html
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-04 19:30:45 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
//
|
|
|
|
// There's nothing more to do; the amount of memory above 4GB fully
|
|
|
|
// determines the highest address plus one. The memory hotplug area (see
|
|
|
|
// below) plays no role for the firmware in this case.
|
|
|
|
//
|
|
|
|
return FirstNonAddress;
|
|
|
|
}
|
|
|
|
|
|
|
|
//
|
|
|
|
// The "etc/reserved-memory-end" fw_cfg file, when present, contains an
|
|
|
|
// absolute, exclusive end address for the memory hotplug area. This area
|
|
|
|
// starts right at the end of the memory above 4GB. The 64-bit PCI host
|
|
|
|
// aperture must be placed above it.
|
|
|
|
//
|
|
|
|
Status = QemuFwCfgFindFile ("etc/reserved-memory-end", &FwCfgItem,
|
|
|
|
&FwCfgSize);
|
|
|
|
if (!EFI_ERROR (Status) && FwCfgSize == sizeof HotPlugMemoryEnd) {
|
|
|
|
QemuFwCfgSelectItem (FwCfgItem);
|
|
|
|
QemuFwCfgReadBytes (FwCfgSize, &HotPlugMemoryEnd);
|
2018-03-28 13:26:35 +02:00
|
|
|
DEBUG ((DEBUG_VERBOSE, "%a: HotPlugMemoryEnd=0x%Lx\n", __FUNCTION__,
|
|
|
|
HotPlugMemoryEnd));
|
OvmfPkg: PlatformPei: determine the 64-bit PCI host aperture for X64 DXE
The main observation about the 64-bit PCI host aperture is that it is the
highest part of the useful address space. It impacts the top of the GCD
memory space map, and, consequently, our maximum address width calculation
for the CPU HOB too.
Thus, modify the GetFirstNonAddress() function to consider the following
areas above the high RAM, while calculating the first non-address (i.e.,
the highest inclusive address, plus one):
- the memory hotplug area (optional, the size comes from QEMU),
- the 64-bit PCI host aperture (we set a default size).
While computing the first non-address, capture the base and the size of
the 64-bit PCI host aperture at once in PCDs, since they are natural parts
of the calculation.
(Similarly to how PcdPciMmio32* are not rewritten on the S3 resume path
(see the InitializePlatform() -> MemMapInitialization() condition), nor
are PcdPciMmio64*. Only the core PciHostBridgeDxe driver consumes them,
through our PciHostBridgeLib instance.)
Set 32GB as the default size for the aperture. Issue#59 mentions the
NVIDIA Tesla K80 as an assignable device. According to nvidia.com, these
cards may have 24GB of memory (probably 16GB + 8GB BARs).
As a strictly experimental feature, the user can specify the size of the
aperture (in MB) as well, with the QEMU option
-fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=65536
The "X-" prefix follows the QEMU tradition (spelled "x-" there), meaning
that the property is experimental, unstable, and might go away any time.
Gerd has proposed heuristics for sizing the aperture automatically (based
on 1GB page support and PCPU address width), but such should be delayed to
a later patch (which may very well back out "X-PciMmio64Mb" then).
For "everyday" guests, the 32GB default for the aperture size shouldn't
impact the PEI memory demand (the size of the page tables that the DXE IPL
PEIM builds). Namely, we've never reported narrower than 36-bit addresses;
the DXE IPL PEIM has always built page tables for 64GB at least.
For the aperture to bump the address width above 36 bits, either the guest
must have quite a bit of memory itself (in which case the additional PEI
memory demand shouldn't matter), or the user must specify a large aperture
manually with "X-PciMmio64Mb" (and then he or she is also responsible for
giving enough RAM to the VM, to satisfy the PEI memory demand).
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Thomas Lamprecht <t.lamprecht@proxmox.com>
Ref: https://github.com/tianocore/edk2/issues/59
Ref: http://www.nvidia.com/object/tesla-servers.html
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-04 19:30:45 +01:00
|
|
|
|
|
|
|
ASSERT (HotPlugMemoryEnd >= FirstNonAddress);
|
|
|
|
FirstNonAddress = HotPlugMemoryEnd;
|
|
|
|
}
|
|
|
|
|
|
|
|
//
|
|
|
|
// SeaBIOS aligns both boundaries of the 64-bit PCI host aperture to 1GB, so
|
|
|
|
// that the host can map it with 1GB hugepages. Follow suit.
|
|
|
|
//
|
|
|
|
Pci64Base = ALIGN_VALUE (FirstNonAddress, (UINT64)SIZE_1GB);
|
|
|
|
Pci64Size = ALIGN_VALUE (Pci64Size, (UINT64)SIZE_1GB);
|
|
|
|
|
|
|
|
//
|
|
|
|
// The 64-bit PCI host aperture should also be "naturally" aligned. The
|
|
|
|
// alignment is determined by rounding the size of the aperture down to the
|
|
|
|
// next smaller or equal power of two. That is, align the aperture by the
|
|
|
|
// largest BAR size that can fit into it.
|
|
|
|
//
|
|
|
|
Pci64Base = ALIGN_VALUE (Pci64Base, GetPowerOfTwo64 (Pci64Size));
|
|
|
|
|
|
|
|
if (mBootMode != BOOT_ON_S3_RESUME) {
|
|
|
|
//
|
|
|
|
// The core PciHostBridgeDxe driver will automatically add this range to
|
|
|
|
// the GCD memory space map through our PciHostBridgeLib instance; here we
|
|
|
|
// only need to set the PCDs.
|
|
|
|
//
|
2016-10-21 11:59:36 +02:00
|
|
|
PcdStatus = PcdSet64S (PcdPciMmio64Base, Pci64Base);
|
|
|
|
ASSERT_RETURN_ERROR (PcdStatus);
|
|
|
|
PcdStatus = PcdSet64S (PcdPciMmio64Size, Pci64Size);
|
|
|
|
ASSERT_RETURN_ERROR (PcdStatus);
|
|
|
|
|
2020-04-29 23:53:27 +02:00
|
|
|
DEBUG ((DEBUG_INFO, "%a: Pci64Base=0x%Lx Pci64Size=0x%Lx\n",
|
OvmfPkg: PlatformPei: determine the 64-bit PCI host aperture for X64 DXE
The main observation about the 64-bit PCI host aperture is that it is the
highest part of the useful address space. It impacts the top of the GCD
memory space map, and, consequently, our maximum address width calculation
for the CPU HOB too.
Thus, modify the GetFirstNonAddress() function to consider the following
areas above the high RAM, while calculating the first non-address (i.e.,
the highest inclusive address, plus one):
- the memory hotplug area (optional, the size comes from QEMU),
- the 64-bit PCI host aperture (we set a default size).
While computing the first non-address, capture the base and the size of
the 64-bit PCI host aperture at once in PCDs, since they are natural parts
of the calculation.
(Similarly to how PcdPciMmio32* are not rewritten on the S3 resume path
(see the InitializePlatform() -> MemMapInitialization() condition), nor
are PcdPciMmio64*. Only the core PciHostBridgeDxe driver consumes them,
through our PciHostBridgeLib instance.)
Set 32GB as the default size for the aperture. Issue#59 mentions the
NVIDIA Tesla K80 as an assignable device. According to nvidia.com, these
cards may have 24GB of memory (probably 16GB + 8GB BARs).
As a strictly experimental feature, the user can specify the size of the
aperture (in MB) as well, with the QEMU option
-fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=65536
The "X-" prefix follows the QEMU tradition (spelled "x-" there), meaning
that the property is experimental, unstable, and might go away any time.
Gerd has proposed heuristics for sizing the aperture automatically (based
on 1GB page support and PCPU address width), but such should be delayed to
a later patch (which may very well back out "X-PciMmio64Mb" then).
For "everyday" guests, the 32GB default for the aperture size shouldn't
impact the PEI memory demand (the size of the page tables that the DXE IPL
PEIM builds). Namely, we've never reported narrower than 36-bit addresses;
the DXE IPL PEIM has always built page tables for 64GB at least.
For the aperture to bump the address width above 36 bits, either the guest
must have quite a bit of memory itself (in which case the additional PEI
memory demand shouldn't matter), or the user must specify a large aperture
manually with "X-PciMmio64Mb" (and then he or she is also responsible for
giving enough RAM to the VM, to satisfy the PEI memory demand).
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Thomas Lamprecht <t.lamprecht@proxmox.com>
Ref: https://github.com/tianocore/edk2/issues/59
Ref: http://www.nvidia.com/object/tesla-servers.html
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-04 19:30:45 +01:00
|
|
|
__FUNCTION__, Pci64Base, Pci64Size));
|
|
|
|
}
|
|
|
|
|
|
|
|
//
|
|
|
|
// The useful address space ends with the 64-bit PCI host aperture.
|
|
|
|
//
|
|
|
|
FirstNonAddress = Pci64Base + Pci64Size;
|
2016-03-04 17:23:35 +01:00
|
|
|
return FirstNonAddress;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
2015-06-26 18:09:39 +02:00
|
|
|
/**
|
|
|
|
Initialize the mPhysMemAddressWidth variable, based on guest RAM size.
|
|
|
|
**/
|
|
|
|
VOID
|
|
|
|
AddressWidthInitialization (
|
|
|
|
VOID
|
|
|
|
)
|
|
|
|
{
|
|
|
|
UINT64 FirstNonAddress;
|
|
|
|
|
|
|
|
//
|
|
|
|
// As guest-physical memory size grows, the permanent PEI RAM requirements
|
|
|
|
// are dominated by the identity-mapping page tables built by the DXE IPL.
|
|
|
|
// The DXL IPL keys off of the physical address bits advertized in the CPU
|
|
|
|
// HOB. To conserve memory, we calculate the minimum address width here.
|
|
|
|
//
|
2016-03-04 17:23:35 +01:00
|
|
|
FirstNonAddress = GetFirstNonAddress ();
|
2015-06-26 18:09:39 +02:00
|
|
|
mPhysMemAddressWidth = (UINT8)HighBitSet64 (FirstNonAddress);
|
|
|
|
|
|
|
|
//
|
|
|
|
// If FirstNonAddress is not an integral power of two, then we need an
|
|
|
|
// additional bit.
|
|
|
|
//
|
|
|
|
if ((FirstNonAddress & (FirstNonAddress - 1)) != 0) {
|
|
|
|
++mPhysMemAddressWidth;
|
|
|
|
}
|
|
|
|
|
|
|
|
//
|
|
|
|
// The minimum address width is 36 (covers up to and excluding 64 GB, which
|
|
|
|
// is the maximum for Ia32 + PAE). The theoretical architecture maximum for
|
|
|
|
// X64 long mode is 52 bits, but the DXE IPL clamps that down to 48 bits. We
|
|
|
|
// can simply assert that here, since 48 bits are good enough for 256 TB.
|
|
|
|
//
|
|
|
|
if (mPhysMemAddressWidth <= 36) {
|
|
|
|
mPhysMemAddressWidth = 36;
|
|
|
|
}
|
|
|
|
ASSERT (mPhysMemAddressWidth <= 48);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/**
|
|
|
|
Calculate the cap for the permanent PEI memory.
|
|
|
|
**/
|
|
|
|
STATIC
|
|
|
|
UINT32
|
|
|
|
GetPeiMemoryCap (
|
|
|
|
VOID
|
|
|
|
)
|
|
|
|
{
|
|
|
|
BOOLEAN Page1GSupport;
|
|
|
|
UINT32 RegEax;
|
|
|
|
UINT32 RegEdx;
|
|
|
|
UINT32 Pml4Entries;
|
|
|
|
UINT32 PdpEntries;
|
|
|
|
UINTN TotalPages;
|
|
|
|
|
|
|
|
//
|
|
|
|
// If DXE is 32-bit, then just return the traditional 64 MB cap.
|
|
|
|
//
|
|
|
|
#ifdef MDE_CPU_IA32
|
|
|
|
if (!FeaturePcdGet (PcdDxeIplSwitchToLongMode)) {
|
|
|
|
return SIZE_64MB;
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
|
|
|
//
|
|
|
|
// Dependent on physical address width, PEI memory allocations can be
|
|
|
|
// dominated by the page tables built for 64-bit DXE. So we key the cap off
|
|
|
|
// of those. The code below is based on CreateIdentityMappingPageTables() in
|
|
|
|
// "MdeModulePkg/Core/DxeIplPeim/X64/VirtualMemory.c".
|
|
|
|
//
|
|
|
|
Page1GSupport = FALSE;
|
|
|
|
if (PcdGetBool (PcdUse1GPageTable)) {
|
|
|
|
AsmCpuid (0x80000000, &RegEax, NULL, NULL, NULL);
|
|
|
|
if (RegEax >= 0x80000001) {
|
|
|
|
AsmCpuid (0x80000001, NULL, NULL, NULL, &RegEdx);
|
|
|
|
if ((RegEdx & BIT26) != 0) {
|
|
|
|
Page1GSupport = TRUE;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (mPhysMemAddressWidth <= 39) {
|
|
|
|
Pml4Entries = 1;
|
|
|
|
PdpEntries = 1 << (mPhysMemAddressWidth - 30);
|
|
|
|
ASSERT (PdpEntries <= 0x200);
|
|
|
|
} else {
|
|
|
|
Pml4Entries = 1 << (mPhysMemAddressWidth - 39);
|
|
|
|
ASSERT (Pml4Entries <= 0x200);
|
|
|
|
PdpEntries = 512;
|
|
|
|
}
|
|
|
|
|
|
|
|
TotalPages = Page1GSupport ? Pml4Entries + 1 :
|
|
|
|
(PdpEntries + 1) * Pml4Entries + 1;
|
|
|
|
ASSERT (TotalPages <= 0x40201);
|
|
|
|
|
|
|
|
//
|
|
|
|
// Add 64 MB for miscellaneous allocations. Note that for
|
|
|
|
// mPhysMemAddressWidth values close to 36, the cap will actually be
|
|
|
|
// dominated by this increment.
|
|
|
|
//
|
|
|
|
return (UINT32)(EFI_PAGES_TO_SIZE (TotalPages) + SIZE_64MB);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
2013-12-08 02:36:07 +01:00
|
|
|
/**
|
|
|
|
Publish PEI core memory
|
|
|
|
|
|
|
|
@return EFI_SUCCESS The PEIM initialized successfully.
|
|
|
|
|
|
|
|
**/
|
|
|
|
EFI_STATUS
|
|
|
|
PublishPeiMemory (
|
|
|
|
VOID
|
|
|
|
)
|
|
|
|
{
|
|
|
|
EFI_STATUS Status;
|
|
|
|
EFI_PHYSICAL_ADDRESS MemoryBase;
|
|
|
|
UINT64 MemorySize;
|
2016-07-14 17:59:44 +02:00
|
|
|
UINT32 LowerMemorySize;
|
2015-06-26 18:09:39 +02:00
|
|
|
UINT32 PeiMemoryCap;
|
2013-12-08 02:36:07 +01:00
|
|
|
|
2016-07-13 00:52:54 +02:00
|
|
|
LowerMemorySize = GetSystemMemorySizeBelow4gb ();
|
|
|
|
if (FeaturePcdGet (PcdSmmSmramRequire)) {
|
|
|
|
//
|
|
|
|
// TSEG is chipped from the end of low RAM
|
|
|
|
//
|
2017-07-04 12:44:05 +02:00
|
|
|
LowerMemorySize -= mQ35TsegMbytes * SIZE_1MB;
|
2016-07-13 00:52:54 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
//
|
|
|
|
// If S3 is supported, then the S3 permanent PEI memory is placed next,
|
|
|
|
// downwards. Its size is primarily dictated by CpuMpPei. The formula below
|
|
|
|
// is an approximation.
|
|
|
|
//
|
|
|
|
if (mS3Supported) {
|
|
|
|
mS3AcpiReservedMemorySize = SIZE_512KB +
|
2016-11-24 15:18:44 +01:00
|
|
|
mMaxCpuCount *
|
2016-07-13 00:52:54 +02:00
|
|
|
PcdGet32 (PcdCpuApStackSize);
|
|
|
|
mS3AcpiReservedMemoryBase = LowerMemorySize - mS3AcpiReservedMemorySize;
|
|
|
|
LowerMemorySize = mS3AcpiReservedMemoryBase;
|
|
|
|
}
|
|
|
|
|
2014-03-04 09:02:16 +01:00
|
|
|
if (mBootMode == BOOT_ON_S3_RESUME) {
|
2016-07-13 00:52:54 +02:00
|
|
|
MemoryBase = mS3AcpiReservedMemoryBase;
|
|
|
|
MemorySize = mS3AcpiReservedMemorySize;
|
2014-03-04 09:02:16 +01:00
|
|
|
} else {
|
2015-06-26 18:09:39 +02:00
|
|
|
PeiMemoryCap = GetPeiMemoryCap ();
|
2020-04-29 23:53:27 +02:00
|
|
|
DEBUG ((DEBUG_INFO, "%a: mPhysMemAddressWidth=%d PeiMemoryCap=%u KB\n",
|
2015-06-26 18:09:39 +02:00
|
|
|
__FUNCTION__, mPhysMemAddressWidth, PeiMemoryCap >> 10));
|
|
|
|
|
2014-03-04 09:02:16 +01:00
|
|
|
//
|
|
|
|
// Determine the range of memory to use during PEI
|
|
|
|
//
|
OvmfPkg: decompress FVs on S3 resume if SMM_REQUIRE is set
If OVMF was built with -D SMM_REQUIRE, that implies that the runtime OS is
not trusted and we should defend against it tampering with the firmware's
data.
One such datum is the PEI firmware volume (PEIFV). Normally PEIFV is
decompressed on the first boot by SEC, then the OS preserves it across S3
suspend-resume cycles; at S3 resume SEC just reuses the originally
decompressed PEIFV.
However, if we don't trust the OS, then SEC must decompress PEIFV from the
pristine flash every time, lest we execute OS-injected code or work with
OS-injected data.
Due to how FVMAIN_COMPACT is organized, we can't decompress just PEIFV;
the decompression brings DXEFV with itself, plus it uses a temporary
output buffer and a scratch buffer too, which even reach above the end of
the finally installed DXEFV. For this reason we must keep away a
non-malicious OS from DXEFV too, plus the memory up to
PcdOvmfDecomprScratchEnd.
The delay introduced by the LZMA decompression on S3 resume is negligible.
If -D SMM_REQUIRE is not specified, then PcdSmmSmramRequire remains FALSE
(from the DEC file), and then this patch has no effect (not counting some
changed debug messages).
If QEMU doesn't support S3 (or the user disabled it on the QEMU command
line), then this patch has no effect also.
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
git-svn-id: https://svn.code.sf.net/p/edk2/code/trunk/edk2@19037 6f19259b-4bc3-4df7-8a09-765794883524
2015-11-30 19:41:24 +01:00
|
|
|
// Technically we could lay the permanent PEI RAM over SEC's temporary
|
|
|
|
// decompression and scratch buffer even if "secure S3" is needed, since
|
|
|
|
// their lifetimes don't overlap. However, PeiFvInitialization() will cover
|
|
|
|
// RAM up to PcdOvmfDecompressionScratchEnd with an EfiACPIMemoryNVS memory
|
|
|
|
// allocation HOB, and other allocations served from the permanent PEI RAM
|
|
|
|
// shouldn't overlap with that HOB.
|
|
|
|
//
|
|
|
|
MemoryBase = mS3Supported && FeaturePcdGet (PcdSmmSmramRequire) ?
|
|
|
|
PcdGet32 (PcdOvmfDecompressionScratchEnd) :
|
|
|
|
PcdGet32 (PcdOvmfDxeMemFvBase) + PcdGet32 (PcdOvmfDxeMemFvSize);
|
2014-03-04 09:02:16 +01:00
|
|
|
MemorySize = LowerMemorySize - MemoryBase;
|
2015-06-26 18:09:39 +02:00
|
|
|
if (MemorySize > PeiMemoryCap) {
|
|
|
|
MemoryBase = LowerMemorySize - PeiMemoryCap;
|
|
|
|
MemorySize = PeiMemoryCap;
|
2014-03-04 09:02:16 +01:00
|
|
|
}
|
2013-12-08 02:36:07 +01:00
|
|
|
}
|
|
|
|
|
2019-09-20 17:07:43 +02:00
|
|
|
//
|
|
|
|
// MEMFD_BASE_ADDRESS separates the SMRAM at the default SMBASE from the
|
|
|
|
// normal boot permanent PEI RAM. Regarding the S3 boot path, the S3
|
|
|
|
// permanent PEI RAM is located even higher.
|
|
|
|
//
|
|
|
|
if (FeaturePcdGet (PcdSmmSmramRequire) && mQ35SmramAtDefaultSmbase) {
|
|
|
|
ASSERT (SMM_DEFAULT_SMBASE + MCH_DEFAULT_SMBASE_SIZE <= MemoryBase);
|
|
|
|
}
|
|
|
|
|
2013-12-08 02:36:07 +01:00
|
|
|
//
|
|
|
|
// Publish this memory to the PEI Core
|
|
|
|
//
|
|
|
|
Status = PublishSystemMemory(MemoryBase, MemorySize);
|
|
|
|
ASSERT_EFI_ERROR (Status);
|
|
|
|
|
|
|
|
return Status;
|
|
|
|
}
|
|
|
|
|
2011-01-21 17:51:00 +01:00
|
|
|
|
OvmfPkg/PlatformPei: reserve the SMRAM at the default SMBASE, if it exists
The 128KB SMRAM at the default SMBASE will be used for protecting the
initial SMI handler for hot-plugged VCPUs. After platform reset, the SMRAM
in question is open (and looks just like RAM). When BDS signals
EFI_DXE_MM_READY_TO_LOCK_PROTOCOL (and so TSEG is locked down), we're
going to lock the SMRAM at the default SMBASE too.
For this, we have to reserve said SMRAM area as early as possible, from
components in PEI, DXE, and OS runtime.
* QemuInitializeRam() currently produces a single resource descriptor HOB,
for exposing the system RAM available under 1GB. This occurs during both
normal boot and S3 resume identically (the latter only for the sake of
CpuMpPei borrowing low RAM for the AP startup vector).
But, the SMRAM at the default SMBASE falls in the middle of the current
system RAM HOB. Split the HOB, and cover the SMRAM with a reserved
memory HOB in the middle. CpuMpPei (via MpInitLib) skips reserved memory
HOBs.
* InitializeRamRegions() is responsible for producing memory allocation
HOBs, carving out parts of the resource descriptor HOBs produced in
QemuInitializeRam(). Allocate the above-introduced reserved memory
region in full, similarly to how we treat TSEG, so that DXE and the OS
avoid the locked SMRAM (black hole) in this area.
(Note that these allocations only occur on the normal boot path, as they
matter for the UEFI memory map, which cannot be changed during S3
resume.)
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1512
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jiewen Yao <jiewen.yao@intel.com>
Message-Id: <20200129214412.2361-8-lersek@redhat.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2019-09-20 15:12:27 +02:00
|
|
|
STATIC
|
|
|
|
VOID
|
|
|
|
QemuInitializeRamBelow1gb (
|
|
|
|
VOID
|
|
|
|
)
|
|
|
|
{
|
|
|
|
if (FeaturePcdGet (PcdSmmSmramRequire) && mQ35SmramAtDefaultSmbase) {
|
|
|
|
AddMemoryRangeHob (0, SMM_DEFAULT_SMBASE);
|
|
|
|
AddReservedMemoryBaseSizeHob (SMM_DEFAULT_SMBASE, MCH_DEFAULT_SMBASE_SIZE,
|
|
|
|
TRUE /* Cacheable */);
|
|
|
|
STATIC_ASSERT (
|
|
|
|
SMM_DEFAULT_SMBASE + MCH_DEFAULT_SMBASE_SIZE < BASE_512KB + BASE_128KB,
|
|
|
|
"end of SMRAM at default SMBASE ends at, or exceeds, 640KB"
|
|
|
|
);
|
|
|
|
AddMemoryRangeHob (SMM_DEFAULT_SMBASE + MCH_DEFAULT_SMBASE_SIZE,
|
|
|
|
BASE_512KB + BASE_128KB);
|
|
|
|
} else {
|
|
|
|
AddMemoryRangeHob (0, BASE_512KB + BASE_128KB);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
|
2009-05-27 23:10:18 +02:00
|
|
|
/**
|
2014-02-01 22:22:48 +01:00
|
|
|
Peform Memory Detection for QEMU / KVM
|
2009-05-27 23:10:18 +02:00
|
|
|
|
|
|
|
**/
|
2014-02-01 22:22:48 +01:00
|
|
|
STATIC
|
|
|
|
VOID
|
|
|
|
QemuInitializeRam (
|
|
|
|
VOID
|
2009-05-27 23:10:18 +02:00
|
|
|
)
|
|
|
|
{
|
2011-01-21 17:51:00 +01:00
|
|
|
UINT64 LowerMemorySize;
|
|
|
|
UINT64 UpperMemorySize;
|
OvmfPkg: PlatformPei: invert MTRR setup in QemuInitializeRam()
At the moment we work with a UC default MTRR type, and set three memory
ranges to WB:
- [0, 640 KB),
- [1 MB, LowerMemorySize),
- [4 GB, 4 GB + UpperMemorySize).
Unfortunately, coverage for the third range can fail with a high
likelihood. If the alignment of the base (ie. 4 GB) and the alignment of
the size (UpperMemorySize) differ, then MtrrLib creates a series of
variable MTRR entries, with power-of-two sized MTRR masks. And, it's
really easy to run out of variable MTRR entries, dependent on the
alignment difference.
This is a problem because a Linux guest will loudly reject any high memory
that is not covered my MTRR.
So, let's follow the inverse pattern (loosely inspired by SeaBIOS):
- flip the MTRR default type to WB,
- set [0, 640 KB) to WB -- fixed MTRRs have precedence over the default
type and variable MTRRs, so we can't avoid this,
- set [640 KB, 1 MB) to UC -- implemented with fixed MTRRs,
- set [LowerMemorySize, 4 GB) to UC -- should succeed with variable MTRRs
more likely than the other scheme (due to less chaotic alignment
differences).
Effects of this patch can be observed by setting DEBUG_CACHE (0x00200000)
in PcdDebugPrintErrorLevel.
Cc: Maoming <maoming.maoming@huawei.com>
Cc: Huangpeng (Peter) <peter.huangpeng@huawei.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Tested-by: Maoming <maoming.maoming@huawei.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
git-svn-id: https://svn.code.sf.net/p/edk2/code/trunk/edk2@17722 6f19259b-4bc3-4df7-8a09-765794883524
2015-06-26 18:09:52 +02:00
|
|
|
MTRR_SETTINGS MtrrSettings;
|
|
|
|
EFI_STATUS Status;
|
2009-05-27 23:10:18 +02:00
|
|
|
|
2020-04-29 23:53:27 +02:00
|
|
|
DEBUG ((DEBUG_INFO, "%a called\n", __FUNCTION__));
|
2009-05-27 23:10:18 +02:00
|
|
|
|
|
|
|
//
|
|
|
|
// Determine total memory size available
|
|
|
|
//
|
2011-01-21 17:51:00 +01:00
|
|
|
LowerMemorySize = GetSystemMemorySizeBelow4gb ();
|
|
|
|
UpperMemorySize = GetSystemMemorySizeAbove4gb ();
|
2009-05-27 23:10:18 +02:00
|
|
|
|
2016-07-07 09:29:12 +02:00
|
|
|
if (mBootMode == BOOT_ON_S3_RESUME) {
|
|
|
|
//
|
|
|
|
// Create the following memory HOB as an exception on the S3 boot path.
|
|
|
|
//
|
|
|
|
// Normally we'd create memory HOBs only on the normal boot path. However,
|
|
|
|
// CpuMpPei specifically needs such a low-memory HOB on the S3 path as
|
|
|
|
// well, for "borrowing" a subset of it temporarily, for the AP startup
|
|
|
|
// vector.
|
|
|
|
//
|
|
|
|
// CpuMpPei saves the original contents of the borrowed area in permanent
|
|
|
|
// PEI RAM, in a backup buffer allocated with the normal PEI services.
|
|
|
|
// CpuMpPei restores the original contents ("returns" the borrowed area) at
|
|
|
|
// End-of-PEI. End-of-PEI in turn is emitted by S3Resume2Pei before
|
2016-09-09 22:32:15 +02:00
|
|
|
// transferring control to the OS's wakeup vector in the FACS.
|
2016-07-07 09:29:12 +02:00
|
|
|
//
|
|
|
|
// We expect any other PEIMs that "borrow" memory similarly to CpuMpPei to
|
|
|
|
// restore the original contents. Furthermore, we expect all such PEIMs
|
|
|
|
// (CpuMpPei included) to claim the borrowed areas by producing memory
|
|
|
|
// allocation HOBs, and to honor preexistent memory allocation HOBs when
|
|
|
|
// looking for an area to borrow.
|
|
|
|
//
|
OvmfPkg/PlatformPei: reserve the SMRAM at the default SMBASE, if it exists
The 128KB SMRAM at the default SMBASE will be used for protecting the
initial SMI handler for hot-plugged VCPUs. After platform reset, the SMRAM
in question is open (and looks just like RAM). When BDS signals
EFI_DXE_MM_READY_TO_LOCK_PROTOCOL (and so TSEG is locked down), we're
going to lock the SMRAM at the default SMBASE too.
For this, we have to reserve said SMRAM area as early as possible, from
components in PEI, DXE, and OS runtime.
* QemuInitializeRam() currently produces a single resource descriptor HOB,
for exposing the system RAM available under 1GB. This occurs during both
normal boot and S3 resume identically (the latter only for the sake of
CpuMpPei borrowing low RAM for the AP startup vector).
But, the SMRAM at the default SMBASE falls in the middle of the current
system RAM HOB. Split the HOB, and cover the SMRAM with a reserved
memory HOB in the middle. CpuMpPei (via MpInitLib) skips reserved memory
HOBs.
* InitializeRamRegions() is responsible for producing memory allocation
HOBs, carving out parts of the resource descriptor HOBs produced in
QemuInitializeRam(). Allocate the above-introduced reserved memory
region in full, similarly to how we treat TSEG, so that DXE and the OS
avoid the locked SMRAM (black hole) in this area.
(Note that these allocations only occur on the normal boot path, as they
matter for the UEFI memory map, which cannot be changed during S3
resume.)
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1512
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jiewen Yao <jiewen.yao@intel.com>
Message-Id: <20200129214412.2361-8-lersek@redhat.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2019-09-20 15:12:27 +02:00
|
|
|
QemuInitializeRamBelow1gb ();
|
2016-07-07 09:29:12 +02:00
|
|
|
} else {
|
2014-03-04 09:02:30 +01:00
|
|
|
//
|
|
|
|
// Create memory HOBs
|
|
|
|
//
|
OvmfPkg/PlatformPei: reserve the SMRAM at the default SMBASE, if it exists
The 128KB SMRAM at the default SMBASE will be used for protecting the
initial SMI handler for hot-plugged VCPUs. After platform reset, the SMRAM
in question is open (and looks just like RAM). When BDS signals
EFI_DXE_MM_READY_TO_LOCK_PROTOCOL (and so TSEG is locked down), we're
going to lock the SMRAM at the default SMBASE too.
For this, we have to reserve said SMRAM area as early as possible, from
components in PEI, DXE, and OS runtime.
* QemuInitializeRam() currently produces a single resource descriptor HOB,
for exposing the system RAM available under 1GB. This occurs during both
normal boot and S3 resume identically (the latter only for the sake of
CpuMpPei borrowing low RAM for the AP startup vector).
But, the SMRAM at the default SMBASE falls in the middle of the current
system RAM HOB. Split the HOB, and cover the SMRAM with a reserved
memory HOB in the middle. CpuMpPei (via MpInitLib) skips reserved memory
HOBs.
* InitializeRamRegions() is responsible for producing memory allocation
HOBs, carving out parts of the resource descriptor HOBs produced in
QemuInitializeRam(). Allocate the above-introduced reserved memory
region in full, similarly to how we treat TSEG, so that DXE and the OS
avoid the locked SMRAM (black hole) in this area.
(Note that these allocations only occur on the normal boot path, as they
matter for the UEFI memory map, which cannot be changed during S3
resume.)
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1512
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jiewen Yao <jiewen.yao@intel.com>
Message-Id: <20200129214412.2361-8-lersek@redhat.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2019-09-20 15:12:27 +02:00
|
|
|
QemuInitializeRamBelow1gb ();
|
OvmfPkg: PlatformPei: account for TSEG size with PcdSmmSmramRequire set
PlatformPei calls GetSystemMemorySizeBelow4gb() in three locations:
- PublishPeiMemory(): on normal boot, the permanent PEI RAM is installed
so that it ends with the RAM below 4GB,
- QemuInitializeRam(): on normal boot, memory resource descriptor HOBs are
created for the RAM below 4GB; plus MTRR attributes are set
(independently of S3 vs. normal boot)
- MemMapInitialization(): an MMIO resource descriptor HOB is created for
PCI resource allocation, on normal boot, starting at max(RAM below 4GB,
2GB).
The first two of these is adjusted for the configured TSEG size, if
PcdSmmSmramRequire is set:
- In PublishPeiMemory(), the permanent PEI RAM is kept under TSEG.
- In QemuInitializeRam(), we must keep the DXE out of TSEG.
One idea would be to simply trim the [1MB .. LowerMemorySize] memory
resource descriptor HOB, leaving a hole for TSEG in the memory space
map.
The SMM IPL will however want to massage the caching attributes of the
SMRAM range that it loads the SMM core into, with
gDS->SetMemorySpaceAttributes(), and that won't work on a hole. So,
instead of trimming this range, split the TSEG area off, and report it
as a cacheable reserved memory resource.
Finally, since reserved memory can be allocated too, pre-allocate TSEG
in InitializeRamRegions(), after QemuInitializeRam() returns. (Note that
this step alone does not suffice without the resource descriptor HOB
trickery: if we omit that, then the DXE IPL PEIM fails to load and start
the DXE core.)
- In MemMapInitialization(), the start of the PCI MMIO range is not
affected.
We choose the largest option (8MB) for the default TSEG size. Michael
Kinney pointed out that the SMBASE relocation in PiSmmCpuDxeSmm consumes
SMRAM proportionally to the number of CPUs. From the three options
available, he reported that 8MB was both necessary and sufficient for the
SMBASE relocation to succeed with 255 CPUs:
- http://thread.gmane.org/gmane.comp.bios.edk2.devel/3020/focus=3137
- http://thread.gmane.org/gmane.comp.bios.edk2.devel/3020/focus=3177
Cc: Michael Kinney <michael.d.kinney@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Michael Kinney <michael.d.kinney@intel.com>
git-svn-id: https://svn.code.sf.net/p/edk2/code/trunk/edk2@19039 6f19259b-4bc3-4df7-8a09-765794883524
2015-11-30 19:41:33 +01:00
|
|
|
|
|
|
|
if (FeaturePcdGet (PcdSmmSmramRequire)) {
|
|
|
|
UINT32 TsegSize;
|
|
|
|
|
2017-07-04 12:44:05 +02:00
|
|
|
TsegSize = mQ35TsegMbytes * SIZE_1MB;
|
OvmfPkg: PlatformPei: account for TSEG size with PcdSmmSmramRequire set
PlatformPei calls GetSystemMemorySizeBelow4gb() in three locations:
- PublishPeiMemory(): on normal boot, the permanent PEI RAM is installed
so that it ends with the RAM below 4GB,
- QemuInitializeRam(): on normal boot, memory resource descriptor HOBs are
created for the RAM below 4GB; plus MTRR attributes are set
(independently of S3 vs. normal boot)
- MemMapInitialization(): an MMIO resource descriptor HOB is created for
PCI resource allocation, on normal boot, starting at max(RAM below 4GB,
2GB).
The first two of these is adjusted for the configured TSEG size, if
PcdSmmSmramRequire is set:
- In PublishPeiMemory(), the permanent PEI RAM is kept under TSEG.
- In QemuInitializeRam(), we must keep the DXE out of TSEG.
One idea would be to simply trim the [1MB .. LowerMemorySize] memory
resource descriptor HOB, leaving a hole for TSEG in the memory space
map.
The SMM IPL will however want to massage the caching attributes of the
SMRAM range that it loads the SMM core into, with
gDS->SetMemorySpaceAttributes(), and that won't work on a hole. So,
instead of trimming this range, split the TSEG area off, and report it
as a cacheable reserved memory resource.
Finally, since reserved memory can be allocated too, pre-allocate TSEG
in InitializeRamRegions(), after QemuInitializeRam() returns. (Note that
this step alone does not suffice without the resource descriptor HOB
trickery: if we omit that, then the DXE IPL PEIM fails to load and start
the DXE core.)
- In MemMapInitialization(), the start of the PCI MMIO range is not
affected.
We choose the largest option (8MB) for the default TSEG size. Michael
Kinney pointed out that the SMBASE relocation in PiSmmCpuDxeSmm consumes
SMRAM proportionally to the number of CPUs. From the three options
available, he reported that 8MB was both necessary and sufficient for the
SMBASE relocation to succeed with 255 CPUs:
- http://thread.gmane.org/gmane.comp.bios.edk2.devel/3020/focus=3137
- http://thread.gmane.org/gmane.comp.bios.edk2.devel/3020/focus=3177
Cc: Michael Kinney <michael.d.kinney@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Michael Kinney <michael.d.kinney@intel.com>
git-svn-id: https://svn.code.sf.net/p/edk2/code/trunk/edk2@19039 6f19259b-4bc3-4df7-8a09-765794883524
2015-11-30 19:41:33 +01:00
|
|
|
AddMemoryRangeHob (BASE_1MB, LowerMemorySize - TsegSize);
|
|
|
|
AddReservedMemoryBaseSizeHob (LowerMemorySize - TsegSize, TsegSize,
|
|
|
|
TRUE);
|
|
|
|
} else {
|
|
|
|
AddMemoryRangeHob (BASE_1MB, LowerMemorySize);
|
|
|
|
}
|
|
|
|
|
OvmfPkg/PlatformPei: support >=1TB high RAM, and discontiguous high RAM
In OVMF we currently get the upper (>=4GB) memory size with the
GetSystemMemorySizeAbove4gb() function.
The GetSystemMemorySizeAbove4gb() function is used in two places:
(1) It is the starting point of the calculations in GetFirstNonAddress().
GetFirstNonAddress() in turn
- determines the placement of the 64-bit PCI MMIO aperture,
- provides input for the GCD memory space map's sizing (see
AddressWidthInitialization(), and the CPU HOB in
MiscInitialization()),
- influences the permanent PEI RAM cap (the DXE core's page tables,
built in permanent PEI RAM, grow as the RAM to map grows).
(2) In QemuInitializeRam(), GetSystemMemorySizeAbove4gb() determines the
single memory descriptor HOB that we produce for the upper memory.
Respectively, there are two problems with GetSystemMemorySizeAbove4gb():
(1) It reads a 24-bit count of 64KB RAM chunks from the CMOS, and
therefore cannot return a larger value than one terabyte.
(2) It cannot express discontiguous high RAM.
Starting with version 1.7.0, QEMU has provided the fw_cfg file called
"etc/e820". Refer to the following QEMU commits:
- 0624c7f916b4 ("e820: pass high memory too.", 2013-10-10),
- 7d67110f2d9a ("pc: add etc/e820 fw_cfg file", 2013-10-18)
- 7db16f2480db ("pc: register e820 entries for ram", 2013-10-10)
Ever since these commits in v1.7.0 -- with the last QEMU release being
v2.9.0, and v2.10.0 under development --, the only two RAM entries added
to this E820 map correspond to the below-4GB RAM range, and the above-4GB
RAM range. And, the above-4GB range exactly matches the CMOS registers in
question; see the use of "pcms->above_4g_mem_size":
pc_q35_init() | pc_init1()
pc_memory_init()
e820_add_entry(0x100000000ULL, pcms->above_4g_mem_size, E820_RAM);
pc_cmos_init()
val = pcms->above_4g_mem_size / 65536;
rtc_set_memory(s, 0x5b, val);
rtc_set_memory(s, 0x5c, val >> 8);
rtc_set_memory(s, 0x5d, val >> 16);
Therefore, remedy the above OVMF limitations as follows:
(1) Start off GetFirstNonAddress() by scanning the E820 map for the
highest exclusive >=4GB RAM address. Fall back to the CMOS if the E820
map is unavailable. Base all further calculations (such as 64-bit PCI
MMIO aperture placement, GCD sizing etc) on this value.
At the moment, the only difference this change makes is that we can
have more than 1TB above 4GB -- given that the sole "high RAM" entry
in the E820 map matches the CMOS exactly, modulo the most significant
bits (see above).
However, Igor plans to add discontiguous (cold-plugged) high RAM to
the fw_cfg E820 RAM map later on, and then this scanning will adapt
automatically.
(2) In QemuInitializeRam(), describe the high RAM regions from the E820
map one by one with memory HOBs. Fall back to the CMOS only if the
E820 map is missing.
Again, right now this change only makes a difference if there is at
least 1TB high RAM. Later on it will adapt to discontiguous high RAM
(regardless of its size) automatically.
-*-
Implementation details: introduce the ScanOrAdd64BitE820Ram() function,
which reads the E820 entries from fw_cfg, and finds the highest exclusive
>=4GB RAM address, or produces memory resource descriptor HOBs for RAM
entries that start at or above 4GB. The RAM map is not read in a single
go, because its size can vary, and in PlatformPei we should stay away from
dynamic memory allocation, for the following reasons:
- "Pool" allocations are limited to ~64KB, are served from HOBs, and
cannot be released ever.
- "Page" allocations are seriously limited before PlatformPei installs the
permanent PEI RAM. Furthermore, page allocations can only be released in
DXE, with dedicated code (so the address would have to be passed on with
a HOB or PCD).
- Raw memory allocation HOBs would require the same freeing in DXE.
Therefore we process each E820 entry as soon as it is read from fw_cfg.
-*-
Considering the impact of high RAM on the DXE core:
A few years ago, installing high RAM as *tested* would cause the DXE core
to inhabit such ranges rather than carving out its home from the permanent
PEI RAM. Fortunately, this was fixed in the following edk2 commit:
3a05b13106d1, "MdeModulePkg DxeCore: Take the range in resource HOB for
PHIT as higher priority", 2015-09-18
which I regression-tested at the time:
http://mid.mail-archive.com/55FC27B0.4070807@redhat.com
Later on, OVMF was changed to install its high RAM as tested (effectively
"arming" the earlier DXE core change for OVMF), in the following edk2
commit:
035ce3b37c90, "OvmfPkg/PlatformPei: Add memory above 4GB as tested",
2016-04-21
which I also regression-tested at the time:
http://mid.mail-archive.com/571E8B90.1020102@redhat.com
Therefore adding more "tested memory" HOBs is safe.
Cc: Jordan Justen <jordan.l.justen@intel.com>
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1468526
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-07-08 01:28:37 +02:00
|
|
|
//
|
|
|
|
// If QEMU presents an E820 map, then create memory HOBs for the >=4GB RAM
|
|
|
|
// entries. Otherwise, create a single memory HOB with the flat >=4GB
|
|
|
|
// memory size read from the CMOS.
|
|
|
|
//
|
|
|
|
Status = ScanOrAdd64BitE820Ram (NULL);
|
|
|
|
if (EFI_ERROR (Status) && UpperMemorySize != 0) {
|
2016-04-21 08:31:55 +02:00
|
|
|
AddMemoryBaseSizeHob (BASE_4GB, UpperMemorySize);
|
2015-06-26 18:09:48 +02:00
|
|
|
}
|
2014-03-04 09:02:30 +01:00
|
|
|
}
|
2009-05-27 23:10:18 +02:00
|
|
|
|
OvmfPkg: PlatformPei: invert MTRR setup in QemuInitializeRam()
At the moment we work with a UC default MTRR type, and set three memory
ranges to WB:
- [0, 640 KB),
- [1 MB, LowerMemorySize),
- [4 GB, 4 GB + UpperMemorySize).
Unfortunately, coverage for the third range can fail with a high
likelihood. If the alignment of the base (ie. 4 GB) and the alignment of
the size (UpperMemorySize) differ, then MtrrLib creates a series of
variable MTRR entries, with power-of-two sized MTRR masks. And, it's
really easy to run out of variable MTRR entries, dependent on the
alignment difference.
This is a problem because a Linux guest will loudly reject any high memory
that is not covered my MTRR.
So, let's follow the inverse pattern (loosely inspired by SeaBIOS):
- flip the MTRR default type to WB,
- set [0, 640 KB) to WB -- fixed MTRRs have precedence over the default
type and variable MTRRs, so we can't avoid this,
- set [640 KB, 1 MB) to UC -- implemented with fixed MTRRs,
- set [LowerMemorySize, 4 GB) to UC -- should succeed with variable MTRRs
more likely than the other scheme (due to less chaotic alignment
differences).
Effects of this patch can be observed by setting DEBUG_CACHE (0x00200000)
in PcdDebugPrintErrorLevel.
Cc: Maoming <maoming.maoming@huawei.com>
Cc: Huangpeng (Peter) <peter.huangpeng@huawei.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Tested-by: Maoming <maoming.maoming@huawei.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
git-svn-id: https://svn.code.sf.net/p/edk2/code/trunk/edk2@17722 6f19259b-4bc3-4df7-8a09-765794883524
2015-06-26 18:09:52 +02:00
|
|
|
//
|
|
|
|
// We'd like to keep the following ranges uncached:
|
|
|
|
// - [640 KB, 1 MB)
|
|
|
|
// - [LowerMemorySize, 4 GB)
|
|
|
|
//
|
|
|
|
// Everything else should be WB. Unfortunately, programming the inverse (ie.
|
|
|
|
// keeping the default UC, and configuring the complement set of the above as
|
|
|
|
// WB) is not reliable in general, because the end of the upper RAM can have
|
|
|
|
// practically any alignment, and we may not have enough variable MTRRs to
|
|
|
|
// cover it exactly.
|
|
|
|
//
|
|
|
|
if (IsMtrrSupported ()) {
|
|
|
|
MtrrGetAllMtrrs (&MtrrSettings);
|
|
|
|
|
|
|
|
//
|
|
|
|
// MTRRs disabled, fixed MTRRs disabled, default type is uncached
|
|
|
|
//
|
|
|
|
ASSERT ((MtrrSettings.MtrrDefType & BIT11) == 0);
|
|
|
|
ASSERT ((MtrrSettings.MtrrDefType & BIT10) == 0);
|
|
|
|
ASSERT ((MtrrSettings.MtrrDefType & 0xFF) == 0);
|
|
|
|
|
|
|
|
//
|
|
|
|
// flip default type to writeback
|
|
|
|
//
|
|
|
|
SetMem (&MtrrSettings.Fixed, sizeof MtrrSettings.Fixed, 0x06);
|
|
|
|
ZeroMem (&MtrrSettings.Variables, sizeof MtrrSettings.Variables);
|
|
|
|
MtrrSettings.MtrrDefType |= BIT11 | BIT10 | 6;
|
|
|
|
MtrrSetAllMtrrs (&MtrrSettings);
|
2011-10-28 08:04:01 +02:00
|
|
|
|
OvmfPkg: PlatformPei: invert MTRR setup in QemuInitializeRam()
At the moment we work with a UC default MTRR type, and set three memory
ranges to WB:
- [0, 640 KB),
- [1 MB, LowerMemorySize),
- [4 GB, 4 GB + UpperMemorySize).
Unfortunately, coverage for the third range can fail with a high
likelihood. If the alignment of the base (ie. 4 GB) and the alignment of
the size (UpperMemorySize) differ, then MtrrLib creates a series of
variable MTRR entries, with power-of-two sized MTRR masks. And, it's
really easy to run out of variable MTRR entries, dependent on the
alignment difference.
This is a problem because a Linux guest will loudly reject any high memory
that is not covered my MTRR.
So, let's follow the inverse pattern (loosely inspired by SeaBIOS):
- flip the MTRR default type to WB,
- set [0, 640 KB) to WB -- fixed MTRRs have precedence over the default
type and variable MTRRs, so we can't avoid this,
- set [640 KB, 1 MB) to UC -- implemented with fixed MTRRs,
- set [LowerMemorySize, 4 GB) to UC -- should succeed with variable MTRRs
more likely than the other scheme (due to less chaotic alignment
differences).
Effects of this patch can be observed by setting DEBUG_CACHE (0x00200000)
in PcdDebugPrintErrorLevel.
Cc: Maoming <maoming.maoming@huawei.com>
Cc: Huangpeng (Peter) <peter.huangpeng@huawei.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Tested-by: Maoming <maoming.maoming@huawei.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
git-svn-id: https://svn.code.sf.net/p/edk2/code/trunk/edk2@17722 6f19259b-4bc3-4df7-8a09-765794883524
2015-06-26 18:09:52 +02:00
|
|
|
//
|
|
|
|
// Set memory range from 640KB to 1MB to uncacheable
|
|
|
|
//
|
|
|
|
Status = MtrrSetMemoryAttribute (BASE_512KB + BASE_128KB,
|
|
|
|
BASE_1MB - (BASE_512KB + BASE_128KB), CacheUncacheable);
|
|
|
|
ASSERT_EFI_ERROR (Status);
|
2011-10-28 08:04:01 +02:00
|
|
|
|
OvmfPkg: PlatformPei: invert MTRR setup in QemuInitializeRam()
At the moment we work with a UC default MTRR type, and set three memory
ranges to WB:
- [0, 640 KB),
- [1 MB, LowerMemorySize),
- [4 GB, 4 GB + UpperMemorySize).
Unfortunately, coverage for the third range can fail with a high
likelihood. If the alignment of the base (ie. 4 GB) and the alignment of
the size (UpperMemorySize) differ, then MtrrLib creates a series of
variable MTRR entries, with power-of-two sized MTRR masks. And, it's
really easy to run out of variable MTRR entries, dependent on the
alignment difference.
This is a problem because a Linux guest will loudly reject any high memory
that is not covered my MTRR.
So, let's follow the inverse pattern (loosely inspired by SeaBIOS):
- flip the MTRR default type to WB,
- set [0, 640 KB) to WB -- fixed MTRRs have precedence over the default
type and variable MTRRs, so we can't avoid this,
- set [640 KB, 1 MB) to UC -- implemented with fixed MTRRs,
- set [LowerMemorySize, 4 GB) to UC -- should succeed with variable MTRRs
more likely than the other scheme (due to less chaotic alignment
differences).
Effects of this patch can be observed by setting DEBUG_CACHE (0x00200000)
in PcdDebugPrintErrorLevel.
Cc: Maoming <maoming.maoming@huawei.com>
Cc: Huangpeng (Peter) <peter.huangpeng@huawei.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Tested-by: Maoming <maoming.maoming@huawei.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
git-svn-id: https://svn.code.sf.net/p/edk2/code/trunk/edk2@17722 6f19259b-4bc3-4df7-8a09-765794883524
2015-06-26 18:09:52 +02:00
|
|
|
//
|
OvmfPkg/PlatformPei: set 32-bit UC area at PciBase / PciExBarBase (pc/q35)
(This is a replacement for commit 39b9a5ffe661 ("OvmfPkg/PlatformPei: fix
MTRR for low-RAM sizes that have many bits clear", 2019-05-16).)
Reintroduce the same logic as seen in commit 39b9a5ffe661 for the pc
(i440fx) board type.
For q35, the same approach doesn't work any longer, given that (a) we'd
like to keep the PCIEXBAR in the platform DSC a fixed-at-build PCD, and
(b) QEMU expects the PCIEXBAR to reside at a lower address than the 32-bit
PCI MMIO aperture.
Therefore, introduce a helper function for determining the 32-bit
"uncacheable" (MMIO) area base address:
- On q35, this function behaves statically. Furthermore, the MTRR setup
exploits that the range [0xB000_0000, 0xFFFF_FFFF] can be marked UC with
just two variable MTRRs (one at 0xB000_0000 (size 256MB), another at
0xC000_0000 (size 1GB)).
- On pc (i440fx), the function behaves dynamically, implementing the same
logic as commit 39b9a5ffe661 did. The PciBase value is adjusted to the
value calculated, similarly to commit 39b9a5ffe661. A further
simplification is that we show that the UC32 area size truncation to a
whole power of two automatically guarantees a >=2GB base address.
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1859
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Philippe Mathieu-Daude <philmd@redhat.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2019-05-29 14:49:55 +02:00
|
|
|
// Set the memory range from the start of the 32-bit MMIO area (32-bit PCI
|
|
|
|
// MMIO aperture on i440fx, PCIEXBAR on q35) to 4GB as uncacheable.
|
OvmfPkg: PlatformPei: invert MTRR setup in QemuInitializeRam()
At the moment we work with a UC default MTRR type, and set three memory
ranges to WB:
- [0, 640 KB),
- [1 MB, LowerMemorySize),
- [4 GB, 4 GB + UpperMemorySize).
Unfortunately, coverage for the third range can fail with a high
likelihood. If the alignment of the base (ie. 4 GB) and the alignment of
the size (UpperMemorySize) differ, then MtrrLib creates a series of
variable MTRR entries, with power-of-two sized MTRR masks. And, it's
really easy to run out of variable MTRR entries, dependent on the
alignment difference.
This is a problem because a Linux guest will loudly reject any high memory
that is not covered my MTRR.
So, let's follow the inverse pattern (loosely inspired by SeaBIOS):
- flip the MTRR default type to WB,
- set [0, 640 KB) to WB -- fixed MTRRs have precedence over the default
type and variable MTRRs, so we can't avoid this,
- set [640 KB, 1 MB) to UC -- implemented with fixed MTRRs,
- set [LowerMemorySize, 4 GB) to UC -- should succeed with variable MTRRs
more likely than the other scheme (due to less chaotic alignment
differences).
Effects of this patch can be observed by setting DEBUG_CACHE (0x00200000)
in PcdDebugPrintErrorLevel.
Cc: Maoming <maoming.maoming@huawei.com>
Cc: Huangpeng (Peter) <peter.huangpeng@huawei.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Tested-by: Maoming <maoming.maoming@huawei.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
git-svn-id: https://svn.code.sf.net/p/edk2/code/trunk/edk2@17722 6f19259b-4bc3-4df7-8a09-765794883524
2015-06-26 18:09:52 +02:00
|
|
|
//
|
OvmfPkg/PlatformPei: set 32-bit UC area at PciBase / PciExBarBase (pc/q35)
(This is a replacement for commit 39b9a5ffe661 ("OvmfPkg/PlatformPei: fix
MTRR for low-RAM sizes that have many bits clear", 2019-05-16).)
Reintroduce the same logic as seen in commit 39b9a5ffe661 for the pc
(i440fx) board type.
For q35, the same approach doesn't work any longer, given that (a) we'd
like to keep the PCIEXBAR in the platform DSC a fixed-at-build PCD, and
(b) QEMU expects the PCIEXBAR to reside at a lower address than the 32-bit
PCI MMIO aperture.
Therefore, introduce a helper function for determining the 32-bit
"uncacheable" (MMIO) area base address:
- On q35, this function behaves statically. Furthermore, the MTRR setup
exploits that the range [0xB000_0000, 0xFFFF_FFFF] can be marked UC with
just two variable MTRRs (one at 0xB000_0000 (size 256MB), another at
0xC000_0000 (size 1GB)).
- On pc (i440fx), the function behaves dynamically, implementing the same
logic as commit 39b9a5ffe661 did. The PciBase value is adjusted to the
value calculated, similarly to commit 39b9a5ffe661. A further
simplification is that we show that the UC32 area size truncation to a
whole power of two automatically guarantees a >=2GB base address.
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1859
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Philippe Mathieu-Daude <philmd@redhat.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2019-05-29 14:49:55 +02:00
|
|
|
Status = MtrrSetMemoryAttribute (mQemuUc32Base, SIZE_4GB - mQemuUc32Base,
|
|
|
|
CacheUncacheable);
|
OvmfPkg: PlatformPei: invert MTRR setup in QemuInitializeRam()
At the moment we work with a UC default MTRR type, and set three memory
ranges to WB:
- [0, 640 KB),
- [1 MB, LowerMemorySize),
- [4 GB, 4 GB + UpperMemorySize).
Unfortunately, coverage for the third range can fail with a high
likelihood. If the alignment of the base (ie. 4 GB) and the alignment of
the size (UpperMemorySize) differ, then MtrrLib creates a series of
variable MTRR entries, with power-of-two sized MTRR masks. And, it's
really easy to run out of variable MTRR entries, dependent on the
alignment difference.
This is a problem because a Linux guest will loudly reject any high memory
that is not covered my MTRR.
So, let's follow the inverse pattern (loosely inspired by SeaBIOS):
- flip the MTRR default type to WB,
- set [0, 640 KB) to WB -- fixed MTRRs have precedence over the default
type and variable MTRRs, so we can't avoid this,
- set [640 KB, 1 MB) to UC -- implemented with fixed MTRRs,
- set [LowerMemorySize, 4 GB) to UC -- should succeed with variable MTRRs
more likely than the other scheme (due to less chaotic alignment
differences).
Effects of this patch can be observed by setting DEBUG_CACHE (0x00200000)
in PcdDebugPrintErrorLevel.
Cc: Maoming <maoming.maoming@huawei.com>
Cc: Huangpeng (Peter) <peter.huangpeng@huawei.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Tested-by: Maoming <maoming.maoming@huawei.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
git-svn-id: https://svn.code.sf.net/p/edk2/code/trunk/edk2@17722 6f19259b-4bc3-4df7-8a09-765794883524
2015-06-26 18:09:52 +02:00
|
|
|
ASSERT_EFI_ERROR (Status);
|
2011-01-21 17:51:00 +01:00
|
|
|
}
|
2009-05-27 23:10:18 +02:00
|
|
|
}
|
|
|
|
|
2014-02-01 22:22:48 +01:00
|
|
|
/**
|
|
|
|
Publish system RAM and reserve memory regions
|
|
|
|
|
|
|
|
**/
|
|
|
|
VOID
|
|
|
|
InitializeRamRegions (
|
|
|
|
VOID
|
|
|
|
)
|
|
|
|
{
|
2014-02-01 22:22:54 +01:00
|
|
|
if (!mXen) {
|
|
|
|
QemuInitializeRam ();
|
|
|
|
} else {
|
|
|
|
XenPublishRamRegions ();
|
|
|
|
}
|
2014-03-04 09:02:16 +01:00
|
|
|
|
|
|
|
if (mS3Supported && mBootMode != BOOT_ON_S3_RESUME) {
|
|
|
|
//
|
|
|
|
// This is the memory range that will be used for PEI on S3 resume
|
|
|
|
//
|
|
|
|
BuildMemoryAllocationHob (
|
2016-07-13 00:52:54 +02:00
|
|
|
mS3AcpiReservedMemoryBase,
|
|
|
|
mS3AcpiReservedMemorySize,
|
2014-03-04 09:02:16 +01:00
|
|
|
EfiACPIMemoryNVS
|
|
|
|
);
|
2014-03-04 09:02:45 +01:00
|
|
|
|
|
|
|
//
|
|
|
|
// Cover the initial RAM area used as stack and temporary PEI heap.
|
|
|
|
//
|
|
|
|
// This is reserved as ACPI NVS so it can be used on S3 resume.
|
|
|
|
//
|
|
|
|
BuildMemoryAllocationHob (
|
|
|
|
PcdGet32 (PcdOvmfSecPeiTempRamBase),
|
|
|
|
PcdGet32 (PcdOvmfSecPeiTempRamSize),
|
|
|
|
EfiACPIMemoryNVS
|
|
|
|
);
|
2014-03-04 09:02:52 +01:00
|
|
|
|
OvmfPkg: PlatformPei: protect SEC's GUIDed section handler table thru S3
OVMF's SecMain is unique in the sense that it links against the following
two libraries *in combination*:
- IntelFrameworkModulePkg/Library/LzmaCustomDecompressLib/
LzmaCustomDecompressLib.inf
- MdePkg/Library/BaseExtractGuidedSectionLib/
BaseExtractGuidedSectionLib.inf
The ExtractGuidedSectionLib library class allows decompressor modules to
register themselves (keyed by GUID) with it, and it allows clients to
decompress file sections with a registered decompressor module that
matches the section's GUID.
BaseExtractGuidedSectionLib is a library instance (of type BASE) for this
library class. It has no constructor function.
LzmaCustomDecompressLib is a compatible decompressor module (of type
BASE). Its section type GUID is
gLzmaCustomDecompressGuid == EE4E5898-3914-4259-9D6E-DC7BD79403CF
When OVMF's SecMain module starts, the LzmaCustomDecompressLib constructor
function is executed, which registers its LZMA decompressor with the above
GUID, by calling into BaseExtractGuidedSectionLib:
LzmaDecompressLibConstructor() [GuidedSectionExtraction.c]
ExtractGuidedSectionRegisterHandlers() [BaseExtractGuidedSectionLib.c]
GetExtractGuidedSectionHandlerInfo()
PcdGet64 (PcdGuidedExtractHandlerTableAddress) -- NOTE THIS
Later, during a normal (non-S3) boot, SecMain utilizes this decompressor
to get information about, and to decompress, sections of the OVMF firmware
image:
SecCoreStartupWithStack() [OvmfPkg/Sec/SecMain.c]
SecStartupPhase2()
FindAndReportEntryPoints()
FindPeiCoreImageBase()
DecompressMemFvs()
ExtractGuidedSectionGetInfo() [BaseExtractGuidedSectionLib.c]
ExtractGuidedSectionDecode() [BaseExtractGuidedSectionLib.c]
Notably, only the extraction depends on full-config-boot; the registration
of LzmaCustomDecompressLib occurs unconditionally in the SecMain EFI
binary, triggered by the library constructor function.
This is where the bug happens. BaseExtractGuidedSectionLib maintains the
table of GUIDed decompressors (section handlers) at a fixed memory
location; selected by PcdGuidedExtractHandlerTableAddress (declared in
MdePkg.dec). The default value of this PCD is 0x1000000 (16 MB).
This causes SecMain to corrupt guest OS memory during S3, leading to
random crashes. Compare the following two memory dumps, the first taken
right before suspending, the second taken right after resuming a RHEL-7
guest:
crash> rd -8 -p 1000000 0x50
1000000: c0 00 08 00 02 00 00 00 00 00 00 00 00 00 00 00 ................
1000010: d0 33 0c 00 00 c9 ff ff c0 10 00 01 00 88 ff ff .3..............
1000020: 0a 6d 57 32 0f 00 00 00 38 00 00 01 00 88 ff ff .mW2....8.......
1000030: 00 00 00 00 00 00 00 00 73 69 67 6e 61 6c 6d 6f ........signalmo
1000040: 64 75 6c 65 2e 73 6f 00 00 00 00 00 00 00 00 00 dule.so.........
vs.
crash> rd -8 -p 1000000 0x50
1000000: 45 47 53 49 01 00 00 00 20 00 00 01 00 00 00 00 EGSI.... .......
1000010: 20 01 00 01 00 00 00 00 a0 01 00 01 00 00 00 00 ...............
1000020: 98 58 4e ee 14 39 59 42 9d 6e dc 7b d7 94 03 cf .XN..9YB.n.{....
1000030: 00 00 00 00 00 00 00 00 73 69 67 6e 61 6c 6d 6f ........signalmo
1000040: 64 75 6c 65 2e 73 6f 00 00 00 00 00 00 00 00 00 dule.so.........
The "EGSI" signature corresponds to EXTRACT_HANDLER_INFO_SIGNATURE
declared in
MdePkg/Library/BaseExtractGuidedSectionLib/BaseExtractGuidedSectionLib.c.
Additionally, the gLzmaCustomDecompressGuid (quoted above) is visible at
guest-phys offset 0x1000020.
Fix the problem as follows:
- Carve out 4KB from the 36KB gap that we currently have between
PcdOvmfLockBoxStorageBase + PcdOvmfLockBoxStorageSize == 8220 KB
and
PcdOvmfSecPeiTempRamBase == 8256 KB.
- Point PcdGuidedExtractHandlerTableAddress to 8220 KB (0x00807000).
- Cover the area with an EfiACPIMemoryNVS type memalloc HOB, if S3 is
supported and we're not currently resuming.
The 4KB size that we pick is an upper estimate for
BaseExtractGuidedSectionLib's internal storage size. The latter is
calculated as follows (see GetExtractGuidedSectionHandlerInfo()):
sizeof(EXTRACT_GUIDED_SECTION_HANDLER_INFO) + // 32
PcdMaximumGuidedExtractHandler * (
sizeof(GUID) + // 16
sizeof(EXTRACT_GUIDED_SECTION_DECODE_HANDLER) + // 8
sizeof(EXTRACT_GUIDED_SECTION_GET_INFO_HANDLER) // 8
)
OVMF sets PcdMaximumGuidedExtractHandler to 16 decimal (which is the
MdePkg default too), yielding 32 + 16 * (16 + 8 + 8) == 544 bytes.
Regarding the lifecycle of the new area:
(a) when and how it is initialized after first boot of the VM
The library linked into SecMain finds that the area lacks the signature.
It initializes the signature, plus the rest of the structure. This is
independent of S3 support.
Consumption of the area is also limited to SEC (but consumption does
depend on full-config-boot).
(b) how it is protected from memory allocations during DXE
It is not, in the general case; and we don't need to. Nothing else links
against BaseExtractGuidedSectionLib; it's OK if DXE overwrites the area.
(c) how it is protected from the OS
When S3 is enabled, we cover it with AcpiNVS in InitializeRamRegions().
When S3 is not supported, the range is not protected.
(d) how it is accessed on the S3 resume path
Examined by the library linked into SecMain. Registrations update the
table in-place (based on GUID matches).
(e) how it is accessed on the warm reset path
If S3 is enabled, then the OS won't damage the table (due to (c)), hence
see (d).
If S3 is unsupported, then the OS may or may not overwrite the
signature. (It likely will.) This is identical to the pre-patch status.
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
git-svn-id: https://svn.code.sf.net/p/edk2/code/trunk/edk2@15433 6f19259b-4bc3-4df7-8a09-765794883524
2014-04-05 23:26:09 +02:00
|
|
|
//
|
|
|
|
// SEC stores its table of GUIDed section handlers here.
|
|
|
|
//
|
|
|
|
BuildMemoryAllocationHob (
|
|
|
|
PcdGet64 (PcdGuidedExtractHandlerTableAddress),
|
|
|
|
PcdGet32 (PcdGuidedExtractHandlerTableSize),
|
|
|
|
EfiACPIMemoryNVS
|
|
|
|
);
|
|
|
|
|
2014-03-04 09:02:52 +01:00
|
|
|
#ifdef MDE_CPU_X64
|
|
|
|
//
|
|
|
|
// Reserve the initial page tables built by the reset vector code.
|
|
|
|
//
|
|
|
|
// Since this memory range will be used by the Reset Vector on S3
|
|
|
|
// resume, it must be reserved as ACPI NVS.
|
|
|
|
//
|
|
|
|
BuildMemoryAllocationHob (
|
|
|
|
(EFI_PHYSICAL_ADDRESS)(UINTN) PcdGet32 (PcdOvmfSecPageTablesBase),
|
|
|
|
(UINT64)(UINTN) PcdGet32 (PcdOvmfSecPageTablesSize),
|
|
|
|
EfiACPIMemoryNVS
|
|
|
|
);
|
OvmfPkg/PlatformPei: Reserve GHCB-related areas if S3 is supported
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2198
Protect the memory used by an SEV-ES guest when S3 is supported. This
includes the page table used to break down the 2MB page that contains
the GHCB so that it can be marked un-encrypted, as well as the GHCB
area.
Regarding the lifecycle of the GHCB-related memory areas:
PcdOvmfSecGhcbPageTableBase
PcdOvmfSecGhcbBase
(a) when and how it is initialized after first boot of the VM
If SEV-ES is enabled, the GHCB-related areas are initialized during
the SEC phase [OvmfPkg/ResetVector/Ia32/PageTables64.asm].
(b) how it is protected from memory allocations during DXE
If S3 and SEV-ES are enabled, then InitializeRamRegions()
[OvmfPkg/PlatformPei/MemDetect.c] protects the ranges with an AcpiNVS
memory allocation HOB, in PEI.
If S3 is disabled, then these ranges are not protected. DXE's own page
tables are first built while still in PEI (see HandOffToDxeCore()
[MdeModulePkg/Core/DxeIplPeim/X64/DxeLoadFunc.c]). Those tables are
located in permanent PEI memory. After CR3 is switched over to them
(which occurs before jumping to the DXE core entry point), we don't have
to preserve PcdOvmfSecGhcbPageTableBase. PEI switches to GHCB pages in
permanent PEI memory and DXE will use these PEI GHCB pages, so we don't
have to preserve PcdOvmfSecGhcbBase.
(c) how it is protected from the OS
If S3 is enabled, then (b) reserves it from the OS too.
If S3 is disabled, then the range needs no protection.
(d) how it is accessed on the S3 resume path
It is rewritten same as in (a), which is fine because (b) reserved it.
(e) how it is accessed on the warm reset path
It is rewritten same as in (a).
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@arm.com>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: Julien Grall <julien@xen.org>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Regression-tested-by: Laszlo Ersek <lersek@redhat.com>
2020-08-12 22:21:40 +02:00
|
|
|
|
|
|
|
if (MemEncryptSevEsIsEnabled ()) {
|
|
|
|
//
|
|
|
|
// If SEV-ES is enabled, reserve the GHCB-related memory area. This
|
|
|
|
// includes the extra page table used to break down the 2MB page
|
|
|
|
// mapping into 4KB page entries where the GHCB resides and the
|
|
|
|
// GHCB area itself.
|
|
|
|
//
|
|
|
|
// Since this memory range will be used by the Reset Vector on S3
|
|
|
|
// resume, it must be reserved as ACPI NVS.
|
|
|
|
//
|
|
|
|
BuildMemoryAllocationHob (
|
|
|
|
(EFI_PHYSICAL_ADDRESS)(UINTN) PcdGet32 (PcdOvmfSecGhcbPageTableBase),
|
|
|
|
(UINT64)(UINTN) PcdGet32 (PcdOvmfSecGhcbPageTableSize),
|
|
|
|
EfiACPIMemoryNVS
|
|
|
|
);
|
|
|
|
BuildMemoryAllocationHob (
|
|
|
|
(EFI_PHYSICAL_ADDRESS)(UINTN) PcdGet32 (PcdOvmfSecGhcbBase),
|
|
|
|
(UINT64)(UINTN) PcdGet32 (PcdOvmfSecGhcbSize),
|
|
|
|
EfiACPIMemoryNVS
|
|
|
|
);
|
2021-01-07 19:48:24 +01:00
|
|
|
BuildMemoryAllocationHob (
|
|
|
|
(EFI_PHYSICAL_ADDRESS)(UINTN) PcdGet32 (PcdOvmfSecGhcbBackupBase),
|
|
|
|
(UINT64)(UINTN) PcdGet32 (PcdOvmfSecGhcbBackupSize),
|
|
|
|
EfiACPIMemoryNVS
|
|
|
|
);
|
OvmfPkg/PlatformPei: Reserve GHCB-related areas if S3 is supported
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2198
Protect the memory used by an SEV-ES guest when S3 is supported. This
includes the page table used to break down the 2MB page that contains
the GHCB so that it can be marked un-encrypted, as well as the GHCB
area.
Regarding the lifecycle of the GHCB-related memory areas:
PcdOvmfSecGhcbPageTableBase
PcdOvmfSecGhcbBase
(a) when and how it is initialized after first boot of the VM
If SEV-ES is enabled, the GHCB-related areas are initialized during
the SEC phase [OvmfPkg/ResetVector/Ia32/PageTables64.asm].
(b) how it is protected from memory allocations during DXE
If S3 and SEV-ES are enabled, then InitializeRamRegions()
[OvmfPkg/PlatformPei/MemDetect.c] protects the ranges with an AcpiNVS
memory allocation HOB, in PEI.
If S3 is disabled, then these ranges are not protected. DXE's own page
tables are first built while still in PEI (see HandOffToDxeCore()
[MdeModulePkg/Core/DxeIplPeim/X64/DxeLoadFunc.c]). Those tables are
located in permanent PEI memory. After CR3 is switched over to them
(which occurs before jumping to the DXE core entry point), we don't have
to preserve PcdOvmfSecGhcbPageTableBase. PEI switches to GHCB pages in
permanent PEI memory and DXE will use these PEI GHCB pages, so we don't
have to preserve PcdOvmfSecGhcbBase.
(c) how it is protected from the OS
If S3 is enabled, then (b) reserves it from the OS too.
If S3 is disabled, then the range needs no protection.
(d) how it is accessed on the S3 resume path
It is rewritten same as in (a), which is fine because (b) reserved it.
(e) how it is accessed on the warm reset path
It is rewritten same as in (a).
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@arm.com>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: Julien Grall <julien@xen.org>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Regression-tested-by: Laszlo Ersek <lersek@redhat.com>
2020-08-12 22:21:40 +02:00
|
|
|
}
|
2014-03-04 09:02:52 +01:00
|
|
|
#endif
|
OvmfPkg: PlatformPei: lifecycle fixes for the LockBox area
If (mBootMode == BOOT_ON_S3_RESUME) -- that is, we are resuming --, then
the patch has no observable effect.
If (mBootMode != BOOT_ON_S3_RESUME && mS3Supported) -- that is, we are
booting or rebooting, and S3 is supported), then the patch has no
observable effect either.
If (mBootMode != BOOT_ON_S3_RESUME && !mS3Supported) -- that is, we are
booting or rebooting, and S3 is unsupported), then the patch effects the
following two fixes:
- The LockBox storage is reserved from DXE (but not the OS). Drivers in
DXE may save data in the LockBox regardless of S3 support, potentially
corrupting any overlapping allocations. Make sure there's no overlap.
- The LockBox storage is cleared. A LockBox inherited across a non-resume
reboot, populated with well-known GUIDs, breaks drivers that want to
save entries with those GUIDs.
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Tested-by: Matt Fleming <matt.fleming@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
git-svn-id: https://svn.code.sf.net/p/edk2/code/trunk/edk2@15418 6f19259b-4bc3-4df7-8a09-765794883524
2014-03-31 22:35:50 +02:00
|
|
|
}
|
2014-03-04 09:03:23 +01:00
|
|
|
|
OvmfPkg: PlatformPei: lifecycle fixes for the LockBox area
If (mBootMode == BOOT_ON_S3_RESUME) -- that is, we are resuming --, then
the patch has no observable effect.
If (mBootMode != BOOT_ON_S3_RESUME && mS3Supported) -- that is, we are
booting or rebooting, and S3 is supported), then the patch has no
observable effect either.
If (mBootMode != BOOT_ON_S3_RESUME && !mS3Supported) -- that is, we are
booting or rebooting, and S3 is unsupported), then the patch effects the
following two fixes:
- The LockBox storage is reserved from DXE (but not the OS). Drivers in
DXE may save data in the LockBox regardless of S3 support, potentially
corrupting any overlapping allocations. Make sure there's no overlap.
- The LockBox storage is cleared. A LockBox inherited across a non-resume
reboot, populated with well-known GUIDs, breaks drivers that want to
save entries with those GUIDs.
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Tested-by: Matt Fleming <matt.fleming@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
git-svn-id: https://svn.code.sf.net/p/edk2/code/trunk/edk2@15418 6f19259b-4bc3-4df7-8a09-765794883524
2014-03-31 22:35:50 +02:00
|
|
|
if (mBootMode != BOOT_ON_S3_RESUME) {
|
2015-11-30 19:42:10 +01:00
|
|
|
if (!FeaturePcdGet (PcdSmmSmramRequire)) {
|
|
|
|
//
|
|
|
|
// Reserve the lock box storage area
|
|
|
|
//
|
|
|
|
// Since this memory range will be used on S3 resume, it must be
|
|
|
|
// reserved as ACPI NVS.
|
|
|
|
//
|
|
|
|
// If S3 is unsupported, then various drivers might still write to the
|
|
|
|
// LockBox area. We ought to prevent DXE from serving allocation requests
|
|
|
|
// such that they would overlap the LockBox storage.
|
|
|
|
//
|
|
|
|
ZeroMem (
|
|
|
|
(VOID*)(UINTN) PcdGet32 (PcdOvmfLockBoxStorageBase),
|
|
|
|
(UINTN) PcdGet32 (PcdOvmfLockBoxStorageSize)
|
|
|
|
);
|
|
|
|
BuildMemoryAllocationHob (
|
|
|
|
(EFI_PHYSICAL_ADDRESS)(UINTN) PcdGet32 (PcdOvmfLockBoxStorageBase),
|
|
|
|
(UINT64)(UINTN) PcdGet32 (PcdOvmfLockBoxStorageSize),
|
|
|
|
mS3Supported ? EfiACPIMemoryNVS : EfiBootServicesData
|
|
|
|
);
|
|
|
|
}
|
OvmfPkg: PlatformPei: account for TSEG size with PcdSmmSmramRequire set
PlatformPei calls GetSystemMemorySizeBelow4gb() in three locations:
- PublishPeiMemory(): on normal boot, the permanent PEI RAM is installed
so that it ends with the RAM below 4GB,
- QemuInitializeRam(): on normal boot, memory resource descriptor HOBs are
created for the RAM below 4GB; plus MTRR attributes are set
(independently of S3 vs. normal boot)
- MemMapInitialization(): an MMIO resource descriptor HOB is created for
PCI resource allocation, on normal boot, starting at max(RAM below 4GB,
2GB).
The first two of these is adjusted for the configured TSEG size, if
PcdSmmSmramRequire is set:
- In PublishPeiMemory(), the permanent PEI RAM is kept under TSEG.
- In QemuInitializeRam(), we must keep the DXE out of TSEG.
One idea would be to simply trim the [1MB .. LowerMemorySize] memory
resource descriptor HOB, leaving a hole for TSEG in the memory space
map.
The SMM IPL will however want to massage the caching attributes of the
SMRAM range that it loads the SMM core into, with
gDS->SetMemorySpaceAttributes(), and that won't work on a hole. So,
instead of trimming this range, split the TSEG area off, and report it
as a cacheable reserved memory resource.
Finally, since reserved memory can be allocated too, pre-allocate TSEG
in InitializeRamRegions(), after QemuInitializeRam() returns. (Note that
this step alone does not suffice without the resource descriptor HOB
trickery: if we omit that, then the DXE IPL PEIM fails to load and start
the DXE core.)
- In MemMapInitialization(), the start of the PCI MMIO range is not
affected.
We choose the largest option (8MB) for the default TSEG size. Michael
Kinney pointed out that the SMBASE relocation in PiSmmCpuDxeSmm consumes
SMRAM proportionally to the number of CPUs. From the three options
available, he reported that 8MB was both necessary and sufficient for the
SMBASE relocation to succeed with 255 CPUs:
- http://thread.gmane.org/gmane.comp.bios.edk2.devel/3020/focus=3137
- http://thread.gmane.org/gmane.comp.bios.edk2.devel/3020/focus=3177
Cc: Michael Kinney <michael.d.kinney@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Michael Kinney <michael.d.kinney@intel.com>
git-svn-id: https://svn.code.sf.net/p/edk2/code/trunk/edk2@19039 6f19259b-4bc3-4df7-8a09-765794883524
2015-11-30 19:41:33 +01:00
|
|
|
|
|
|
|
if (FeaturePcdGet (PcdSmmSmramRequire)) {
|
|
|
|
UINT32 TsegSize;
|
|
|
|
|
|
|
|
//
|
|
|
|
// Make sure the TSEG area that we reported as a reserved memory resource
|
|
|
|
// cannot be used for reserved memory allocations.
|
|
|
|
//
|
2017-07-04 12:44:05 +02:00
|
|
|
TsegSize = mQ35TsegMbytes * SIZE_1MB;
|
OvmfPkg: PlatformPei: account for TSEG size with PcdSmmSmramRequire set
PlatformPei calls GetSystemMemorySizeBelow4gb() in three locations:
- PublishPeiMemory(): on normal boot, the permanent PEI RAM is installed
so that it ends with the RAM below 4GB,
- QemuInitializeRam(): on normal boot, memory resource descriptor HOBs are
created for the RAM below 4GB; plus MTRR attributes are set
(independently of S3 vs. normal boot)
- MemMapInitialization(): an MMIO resource descriptor HOB is created for
PCI resource allocation, on normal boot, starting at max(RAM below 4GB,
2GB).
The first two of these is adjusted for the configured TSEG size, if
PcdSmmSmramRequire is set:
- In PublishPeiMemory(), the permanent PEI RAM is kept under TSEG.
- In QemuInitializeRam(), we must keep the DXE out of TSEG.
One idea would be to simply trim the [1MB .. LowerMemorySize] memory
resource descriptor HOB, leaving a hole for TSEG in the memory space
map.
The SMM IPL will however want to massage the caching attributes of the
SMRAM range that it loads the SMM core into, with
gDS->SetMemorySpaceAttributes(), and that won't work on a hole. So,
instead of trimming this range, split the TSEG area off, and report it
as a cacheable reserved memory resource.
Finally, since reserved memory can be allocated too, pre-allocate TSEG
in InitializeRamRegions(), after QemuInitializeRam() returns. (Note that
this step alone does not suffice without the resource descriptor HOB
trickery: if we omit that, then the DXE IPL PEIM fails to load and start
the DXE core.)
- In MemMapInitialization(), the start of the PCI MMIO range is not
affected.
We choose the largest option (8MB) for the default TSEG size. Michael
Kinney pointed out that the SMBASE relocation in PiSmmCpuDxeSmm consumes
SMRAM proportionally to the number of CPUs. From the three options
available, he reported that 8MB was both necessary and sufficient for the
SMBASE relocation to succeed with 255 CPUs:
- http://thread.gmane.org/gmane.comp.bios.edk2.devel/3020/focus=3137
- http://thread.gmane.org/gmane.comp.bios.edk2.devel/3020/focus=3177
Cc: Michael Kinney <michael.d.kinney@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Michael Kinney <michael.d.kinney@intel.com>
git-svn-id: https://svn.code.sf.net/p/edk2/code/trunk/edk2@19039 6f19259b-4bc3-4df7-8a09-765794883524
2015-11-30 19:41:33 +01:00
|
|
|
BuildMemoryAllocationHob (
|
|
|
|
GetSystemMemorySizeBelow4gb() - TsegSize,
|
|
|
|
TsegSize,
|
|
|
|
EfiReservedMemoryType
|
|
|
|
);
|
OvmfPkg/PlatformPei: reserve the SMRAM at the default SMBASE, if it exists
The 128KB SMRAM at the default SMBASE will be used for protecting the
initial SMI handler for hot-plugged VCPUs. After platform reset, the SMRAM
in question is open (and looks just like RAM). When BDS signals
EFI_DXE_MM_READY_TO_LOCK_PROTOCOL (and so TSEG is locked down), we're
going to lock the SMRAM at the default SMBASE too.
For this, we have to reserve said SMRAM area as early as possible, from
components in PEI, DXE, and OS runtime.
* QemuInitializeRam() currently produces a single resource descriptor HOB,
for exposing the system RAM available under 1GB. This occurs during both
normal boot and S3 resume identically (the latter only for the sake of
CpuMpPei borrowing low RAM for the AP startup vector).
But, the SMRAM at the default SMBASE falls in the middle of the current
system RAM HOB. Split the HOB, and cover the SMRAM with a reserved
memory HOB in the middle. CpuMpPei (via MpInitLib) skips reserved memory
HOBs.
* InitializeRamRegions() is responsible for producing memory allocation
HOBs, carving out parts of the resource descriptor HOBs produced in
QemuInitializeRam(). Allocate the above-introduced reserved memory
region in full, similarly to how we treat TSEG, so that DXE and the OS
avoid the locked SMRAM (black hole) in this area.
(Note that these allocations only occur on the normal boot path, as they
matter for the UEFI memory map, which cannot be changed during S3
resume.)
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1512
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jiewen Yao <jiewen.yao@intel.com>
Message-Id: <20200129214412.2361-8-lersek@redhat.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2019-09-20 15:12:27 +02:00
|
|
|
//
|
|
|
|
// Similarly, allocate away the (already reserved) SMRAM at the default
|
|
|
|
// SMBASE, if it exists.
|
|
|
|
//
|
|
|
|
if (mQ35SmramAtDefaultSmbase) {
|
|
|
|
BuildMemoryAllocationHob (
|
|
|
|
SMM_DEFAULT_SMBASE,
|
|
|
|
MCH_DEFAULT_SMBASE_SIZE,
|
|
|
|
EfiReservedMemoryType
|
|
|
|
);
|
|
|
|
}
|
OvmfPkg: PlatformPei: account for TSEG size with PcdSmmSmramRequire set
PlatformPei calls GetSystemMemorySizeBelow4gb() in three locations:
- PublishPeiMemory(): on normal boot, the permanent PEI RAM is installed
so that it ends with the RAM below 4GB,
- QemuInitializeRam(): on normal boot, memory resource descriptor HOBs are
created for the RAM below 4GB; plus MTRR attributes are set
(independently of S3 vs. normal boot)
- MemMapInitialization(): an MMIO resource descriptor HOB is created for
PCI resource allocation, on normal boot, starting at max(RAM below 4GB,
2GB).
The first two of these is adjusted for the configured TSEG size, if
PcdSmmSmramRequire is set:
- In PublishPeiMemory(), the permanent PEI RAM is kept under TSEG.
- In QemuInitializeRam(), we must keep the DXE out of TSEG.
One idea would be to simply trim the [1MB .. LowerMemorySize] memory
resource descriptor HOB, leaving a hole for TSEG in the memory space
map.
The SMM IPL will however want to massage the caching attributes of the
SMRAM range that it loads the SMM core into, with
gDS->SetMemorySpaceAttributes(), and that won't work on a hole. So,
instead of trimming this range, split the TSEG area off, and report it
as a cacheable reserved memory resource.
Finally, since reserved memory can be allocated too, pre-allocate TSEG
in InitializeRamRegions(), after QemuInitializeRam() returns. (Note that
this step alone does not suffice without the resource descriptor HOB
trickery: if we omit that, then the DXE IPL PEIM fails to load and start
the DXE core.)
- In MemMapInitialization(), the start of the PCI MMIO range is not
affected.
We choose the largest option (8MB) for the default TSEG size. Michael
Kinney pointed out that the SMBASE relocation in PiSmmCpuDxeSmm consumes
SMRAM proportionally to the number of CPUs. From the three options
available, he reported that 8MB was both necessary and sufficient for the
SMBASE relocation to succeed with 255 CPUs:
- http://thread.gmane.org/gmane.comp.bios.edk2.devel/3020/focus=3137
- http://thread.gmane.org/gmane.comp.bios.edk2.devel/3020/focus=3177
Cc: Michael Kinney <michael.d.kinney@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Michael Kinney <michael.d.kinney@intel.com>
git-svn-id: https://svn.code.sf.net/p/edk2/code/trunk/edk2@19039 6f19259b-4bc3-4df7-8a09-765794883524
2015-11-30 19:41:33 +01:00
|
|
|
}
|
2020-08-12 22:21:41 +02:00
|
|
|
|
|
|
|
#ifdef MDE_CPU_X64
|
|
|
|
if (MemEncryptSevEsIsEnabled ()) {
|
|
|
|
//
|
|
|
|
// If SEV-ES is enabled, reserve the SEV-ES work area.
|
|
|
|
//
|
|
|
|
// Since this memory range will be used by the Reset Vector on S3
|
|
|
|
// resume, it must be reserved as ACPI NVS.
|
|
|
|
//
|
|
|
|
// If S3 is unsupported, then various drivers might still write to the
|
|
|
|
// work area. We ought to prevent DXE from serving allocation requests
|
|
|
|
// such that they would overlap the work area.
|
|
|
|
//
|
|
|
|
BuildMemoryAllocationHob (
|
|
|
|
(EFI_PHYSICAL_ADDRESS)(UINTN) FixedPcdGet32 (PcdSevEsWorkAreaBase),
|
|
|
|
(UINT64)(UINTN) FixedPcdGet32 (PcdSevEsWorkAreaSize),
|
|
|
|
mS3Supported ? EfiACPIMemoryNVS : EfiBootServicesData
|
|
|
|
);
|
|
|
|
}
|
|
|
|
#endif
|
2014-03-04 09:02:16 +01:00
|
|
|
}
|
2014-02-01 22:22:48 +01:00
|
|
|
}
|