coreboot-kgpe-d16/src/lib/decompressor.c

72 lines
1.8 KiB
C
Raw Normal View History

Introduce bootblock self-decompression Masked ROMs are the silent killers of boot speed on devices without memory-mapped SPI flash. They often contain awfully slow SPI drivers (presumably bit-banged) that take hundreds of milliseconds to load our bootblock, and every extra kilobyte of bootblock size has a hugely disproportionate impact on boot speed. The coreboot timestamps can never show that component, but it impacts our users all the same. This patch tries to alleviate that issue a bit by allowing us to compress the bootblock with LZ4, which can cut its size down to nearly half. Of course, masked ROMs usually don't come with decompression algorithms built in, so we need to introduce a little decompression stub that can decompress the rest of the bootblock. This is done by creating a new "decompressor" stage which runs before the bootblock, but includes the compressed bootblock code in its data section. It needs to be as small as possible to get a real benefit from this approach, which means no device drivers, no console output, no exception handling, etc. Besides the decompression algorithm itself we only include the timer driver so that we can measure the boot speed impact of decompression. On ARM and ARM64 systems, we also need to give SoC code a chance to initialize the MMU, since running decompression without MMU is prohibitively slow on these architectures. This feature is implemented for ARM and ARM64 architectures for now, although most of it is architecture-independent and it should be relatively simple to port to other platforms where a masked ROM loads the bootblock into SRAM. It is also supposed to be a clean starting point from which later optimizations can hopefully cut down the decompression stub size (currently ~4K on RK3399) a bit more. NOTE: Bootblock compression is not for everyone. Possible side effects include trying to run LZ4 on CPUs that come out of reset extremely underclocked or enabling this too early in SoC bring-up and getting frustrated trying to find issues in an undebuggable environment. Ask your SoC vendor if bootblock compression is right for you. Change-Id: I0dc1cad9ae7508892e477739e743cd1afb5945e8 Signed-off-by: Julius Werner <jwerner@chromium.org> Reviewed-on: https://review.coreboot.org/26340 Tested-by: build bot (Jenkins) <no-reply@coreboot.org> Reviewed-by: Aaron Durbin <adurbin@chromium.org>
2018-05-16 23:14:04 +02:00
/*
* This file is part of the coreboot project.
*
* Copyright 2018 Google Inc.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License as
* published by the Free Software Foundation; version 2 of
* the License.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*/
#include <bootblock_common.h>
#include <commonlib/compression.h>
#include <delay.h>
#include <program_loading.h>
#include <symbols.h>
#include <timestamp.h>
Introduce bootblock self-decompression Masked ROMs are the silent killers of boot speed on devices without memory-mapped SPI flash. They often contain awfully slow SPI drivers (presumably bit-banged) that take hundreds of milliseconds to load our bootblock, and every extra kilobyte of bootblock size has a hugely disproportionate impact on boot speed. The coreboot timestamps can never show that component, but it impacts our users all the same. This patch tries to alleviate that issue a bit by allowing us to compress the bootblock with LZ4, which can cut its size down to nearly half. Of course, masked ROMs usually don't come with decompression algorithms built in, so we need to introduce a little decompression stub that can decompress the rest of the bootblock. This is done by creating a new "decompressor" stage which runs before the bootblock, but includes the compressed bootblock code in its data section. It needs to be as small as possible to get a real benefit from this approach, which means no device drivers, no console output, no exception handling, etc. Besides the decompression algorithm itself we only include the timer driver so that we can measure the boot speed impact of decompression. On ARM and ARM64 systems, we also need to give SoC code a chance to initialize the MMU, since running decompression without MMU is prohibitively slow on these architectures. This feature is implemented for ARM and ARM64 architectures for now, although most of it is architecture-independent and it should be relatively simple to port to other platforms where a masked ROM loads the bootblock into SRAM. It is also supposed to be a clean starting point from which later optimizations can hopefully cut down the decompression stub size (currently ~4K on RK3399) a bit more. NOTE: Bootblock compression is not for everyone. Possible side effects include trying to run LZ4 on CPUs that come out of reset extremely underclocked or enabling this too early in SoC bring-up and getting frustrated trying to find issues in an undebuggable environment. Ask your SoC vendor if bootblock compression is right for you. Change-Id: I0dc1cad9ae7508892e477739e743cd1afb5945e8 Signed-off-by: Julius Werner <jwerner@chromium.org> Reviewed-on: https://review.coreboot.org/26340 Tested-by: build bot (Jenkins) <no-reply@coreboot.org> Reviewed-by: Aaron Durbin <adurbin@chromium.org>
2018-05-16 23:14:04 +02:00
extern u8 compressed_bootblock[];
asm (
".pushsection .data.compressed_bootblock,\"a\",@progbits\n\t"
".type compressed_bootblock, %object\n\t"
".balign 8\n"
"compressed_bootblock:\n\t"
".incbin \"" __BUILD_DIR__ "/cbfs/" CONFIG_CBFS_PREFIX "/bootblock.lz4\"\n\t"
".size compressed_bootblock, . - compressed_bootblock\n\t"
".popsection\n\t"
);
struct bootblock_arg arg = {
.base_timestamp = 0,
.num_timestamps = 2,
.timestamps = {
{ .entry_id = TS_START_ULZ4F },
{ .entry_id = TS_END_ULZ4F },
},
};
struct prog prog_bootblock = {
.type = PROG_BOOTBLOCK,
.entry = (void *)_bootblock,
.arg = &arg,
};
__weak void decompressor_soc_init(void) { /* no-op */ }
void main(void)
{
init_timer();
if (CONFIG(COLLECT_TIMESTAMPS))
Introduce bootblock self-decompression Masked ROMs are the silent killers of boot speed on devices without memory-mapped SPI flash. They often contain awfully slow SPI drivers (presumably bit-banged) that take hundreds of milliseconds to load our bootblock, and every extra kilobyte of bootblock size has a hugely disproportionate impact on boot speed. The coreboot timestamps can never show that component, but it impacts our users all the same. This patch tries to alleviate that issue a bit by allowing us to compress the bootblock with LZ4, which can cut its size down to nearly half. Of course, masked ROMs usually don't come with decompression algorithms built in, so we need to introduce a little decompression stub that can decompress the rest of the bootblock. This is done by creating a new "decompressor" stage which runs before the bootblock, but includes the compressed bootblock code in its data section. It needs to be as small as possible to get a real benefit from this approach, which means no device drivers, no console output, no exception handling, etc. Besides the decompression algorithm itself we only include the timer driver so that we can measure the boot speed impact of decompression. On ARM and ARM64 systems, we also need to give SoC code a chance to initialize the MMU, since running decompression without MMU is prohibitively slow on these architectures. This feature is implemented for ARM and ARM64 architectures for now, although most of it is architecture-independent and it should be relatively simple to port to other platforms where a masked ROM loads the bootblock into SRAM. It is also supposed to be a clean starting point from which later optimizations can hopefully cut down the decompression stub size (currently ~4K on RK3399) a bit more. NOTE: Bootblock compression is not for everyone. Possible side effects include trying to run LZ4 on CPUs that come out of reset extremely underclocked or enabling this too early in SoC bring-up and getting frustrated trying to find issues in an undebuggable environment. Ask your SoC vendor if bootblock compression is right for you. Change-Id: I0dc1cad9ae7508892e477739e743cd1afb5945e8 Signed-off-by: Julius Werner <jwerner@chromium.org> Reviewed-on: https://review.coreboot.org/26340 Tested-by: build bot (Jenkins) <no-reply@coreboot.org> Reviewed-by: Aaron Durbin <adurbin@chromium.org>
2018-05-16 23:14:04 +02:00
arg.base_timestamp = timestamp_get();
decompressor_soc_init();
if (CONFIG(COLLECT_TIMESTAMPS))
Introduce bootblock self-decompression Masked ROMs are the silent killers of boot speed on devices without memory-mapped SPI flash. They often contain awfully slow SPI drivers (presumably bit-banged) that take hundreds of milliseconds to load our bootblock, and every extra kilobyte of bootblock size has a hugely disproportionate impact on boot speed. The coreboot timestamps can never show that component, but it impacts our users all the same. This patch tries to alleviate that issue a bit by allowing us to compress the bootblock with LZ4, which can cut its size down to nearly half. Of course, masked ROMs usually don't come with decompression algorithms built in, so we need to introduce a little decompression stub that can decompress the rest of the bootblock. This is done by creating a new "decompressor" stage which runs before the bootblock, but includes the compressed bootblock code in its data section. It needs to be as small as possible to get a real benefit from this approach, which means no device drivers, no console output, no exception handling, etc. Besides the decompression algorithm itself we only include the timer driver so that we can measure the boot speed impact of decompression. On ARM and ARM64 systems, we also need to give SoC code a chance to initialize the MMU, since running decompression without MMU is prohibitively slow on these architectures. This feature is implemented for ARM and ARM64 architectures for now, although most of it is architecture-independent and it should be relatively simple to port to other platforms where a masked ROM loads the bootblock into SRAM. It is also supposed to be a clean starting point from which later optimizations can hopefully cut down the decompression stub size (currently ~4K on RK3399) a bit more. NOTE: Bootblock compression is not for everyone. Possible side effects include trying to run LZ4 on CPUs that come out of reset extremely underclocked or enabling this too early in SoC bring-up and getting frustrated trying to find issues in an undebuggable environment. Ask your SoC vendor if bootblock compression is right for you. Change-Id: I0dc1cad9ae7508892e477739e743cd1afb5945e8 Signed-off-by: Julius Werner <jwerner@chromium.org> Reviewed-on: https://review.coreboot.org/26340 Tested-by: build bot (Jenkins) <no-reply@coreboot.org> Reviewed-by: Aaron Durbin <adurbin@chromium.org>
2018-05-16 23:14:04 +02:00
arg.timestamps[0].entry_stamp = timestamp_get();
size_t out_size = ulz4f(compressed_bootblock, _bootblock);
prog_segment_loaded((uintptr_t)_bootblock, out_size, SEG_FINAL);
if (CONFIG(COLLECT_TIMESTAMPS))
Introduce bootblock self-decompression Masked ROMs are the silent killers of boot speed on devices without memory-mapped SPI flash. They often contain awfully slow SPI drivers (presumably bit-banged) that take hundreds of milliseconds to load our bootblock, and every extra kilobyte of bootblock size has a hugely disproportionate impact on boot speed. The coreboot timestamps can never show that component, but it impacts our users all the same. This patch tries to alleviate that issue a bit by allowing us to compress the bootblock with LZ4, which can cut its size down to nearly half. Of course, masked ROMs usually don't come with decompression algorithms built in, so we need to introduce a little decompression stub that can decompress the rest of the bootblock. This is done by creating a new "decompressor" stage which runs before the bootblock, but includes the compressed bootblock code in its data section. It needs to be as small as possible to get a real benefit from this approach, which means no device drivers, no console output, no exception handling, etc. Besides the decompression algorithm itself we only include the timer driver so that we can measure the boot speed impact of decompression. On ARM and ARM64 systems, we also need to give SoC code a chance to initialize the MMU, since running decompression without MMU is prohibitively slow on these architectures. This feature is implemented for ARM and ARM64 architectures for now, although most of it is architecture-independent and it should be relatively simple to port to other platforms where a masked ROM loads the bootblock into SRAM. It is also supposed to be a clean starting point from which later optimizations can hopefully cut down the decompression stub size (currently ~4K on RK3399) a bit more. NOTE: Bootblock compression is not for everyone. Possible side effects include trying to run LZ4 on CPUs that come out of reset extremely underclocked or enabling this too early in SoC bring-up and getting frustrated trying to find issues in an undebuggable environment. Ask your SoC vendor if bootblock compression is right for you. Change-Id: I0dc1cad9ae7508892e477739e743cd1afb5945e8 Signed-off-by: Julius Werner <jwerner@chromium.org> Reviewed-on: https://review.coreboot.org/26340 Tested-by: build bot (Jenkins) <no-reply@coreboot.org> Reviewed-by: Aaron Durbin <adurbin@chromium.org>
2018-05-16 23:14:04 +02:00
arg.timestamps[1].entry_stamp = timestamp_get();
prog_run(&prog_bootblock);
}