This reverts commit 6693f359b3c213513c5096a06c6f67244a44dc52..
678f851312.
Python3 migration is the fundamental change. It requires every developer
to install Python3. Before this migration, the well communication and wide
verification must be done. But now, most people is not aware of this change,
and not try it. So, Python3 migration is reverted and be moved to edk2-staging
Python3 branch for the edk2 user evaluation.
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Liming Gao <liming.gao@intel.com>
We test a simple case of UTF-8 with and without the UTF-8 BOM.
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Michael D Kinney <michael.d.kinney@intel.com>
Reviewed-by: Yingke Liu <yingke.d.liu@intel.com>
git-svn-id: https://svn.code.sf.net/p/edk2/code/trunk/edk2@17699 6f19259b-4bc3-4df7-8a09-765794883524
Surrogate pair characters can be encoded in UTF-8 files, but they are
not valid UCS-2 characters.
For example, this python interpreter code:
>>> import codecs
>>> codecs.encode(u'\ud801', 'utf-8')
'\xed\xa0\x81'
But, the range of 0xd800 - 0xdfff should be rejected as unicode code
points because they are reserved for the surrogate pair usage in
UTF-16 files.
We test that this case is rejected for UTF-8 with and without the
UTF-8 BOM.
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Michael D Kinney <michael.d.kinney@intel.com>
Reviewed-by: Yingke Liu <yingke.d.liu@intel.com>
git-svn-id: https://svn.code.sf.net/p/edk2/code/trunk/edk2@17698 6f19259b-4bc3-4df7-8a09-765794883524
Since UTF-8 .uni unicode files might contain strings with unicode code
points larger than 16-bits, and UEFI only supports UCS-2 characters,
we need to make sure that BaseTools rejects these characters in UTF-8
.uni source files.
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Michael D Kinney <michael.d.kinney@intel.com>
Reviewed-by: Yingke Liu <yingke.d.liu@intel.com>
git-svn-id: https://svn.code.sf.net/p/edk2/code/trunk/edk2@17697 6f19259b-4bc3-4df7-8a09-765794883524
Supplementary Plane characters can exist in UTF-16 files,
but they are not valid UCS-2 characters.
For example, this python interpreter code:
>>> import codecs
>>> codecs.encode(u'\U00010300', 'utf-16')
'\xff\xfe\x00\xd8\x00\xdf'
Therefore the UCS-4 0x00010300 character is encoded as two
16-bit numbers (0xd800 0xdf00) in a little endian UTF-16
file.
For more information, see:
http://en.wikipedia.org/wiki/UTF-16#U.2B10000_to_U.2B10FFFF
This test checks to make sure that BaseTools will reject these
characters in UTF-16 files.
The range of 0xd800 - 0xdfff should also be rejected as unicode code
points because they are reserved for the surrogate pair usage in
UTF-16 files.
This test was fixed by the previous commit:
"BaseTools/UniClassObject: Verify valid UCS-2 chars in UTF-16 .uni files"
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Michael D Kinney <michael.d.kinney@intel.com>
Reviewed-by: Yingke Liu <yingke.d.liu@intel.com>
git-svn-id: https://svn.code.sf.net/p/edk2/code/trunk/edk2@17695 6f19259b-4bc3-4df7-8a09-765794883524
This verifies that a UTF-16 data (with BOM) .uni file is successfully
read.
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Michael D Kinney <michael.d.kinney@intel.com>
Reviewed-by: Yingke Liu <yingke.d.liu@intel.com>
git-svn-id: https://svn.code.sf.net/p/edk2/code/trunk/edk2@17693 6f19259b-4bc3-4df7-8a09-765794883524