Fallback mechanism

From coreboot

Latest revision as of 20:59, 25 February 2018

Introduction

This mechanism makes it possible to test new coreboot images and to recover from certain images that fail to boot.

This works by having two coreboot images in the same flash chip:

  • One fallback/ image: The working image.
  • One normal/ image: The image to be tested.

This feature has not been widely tested across boards. It also requires the board to export a reboot_counter in its CMOS layout.

It does not protect against human error when using the feature, nor against bugs in the code responsible for switching between the two images.

Use cases

  • Test new images much faster: if the image doesn't boot, coreboot falls back to the old known-working image, which saves a long reflashing procedure. This is handy for faster bisecting.
  • Test new images more safely: despite the recommendation to have a way to reflash externally, many new users don't have one. Still, this method is not totally foolproof.
  • More compact testing setup: since reflashing tools are no longer mandatory, tests can be done with less hardware, which is very useful when traveling.

How it works

Coreboot increments a reboot count at each boot but never clears it. Clearing it is the responsibility of whatever runs after coreboot.

That way, the count can be cleared by the OS once it has fully booted.

If a certain threshold<ref>Defined by CONFIG_MAX_REBOOT_CNT, typically 3</ref> is reached at boot, coreboot will boot the fallback image.
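
The counting logic above can be modelled in a few lines of shell. This is only an illustrative sketch, not coreboot's bootblock code: the function name choose_prefix, the variable MAX_REBOOT_CNT (standing in for CONFIG_MAX_REBOOT_CNT) and the exact comparison used are assumptions made for the example.

```shell
#!/bin/sh
# Illustrative model only: this is NOT coreboot's bootblock code.
# MAX_REBOOT_CNT stands in for CONFIG_MAX_REBOOT_CNT (typically 3).
MAX_REBOOT_CNT=${MAX_REBOOT_CNT:-3}

# choose_prefix BOOT_OPTION REBOOT_COUNTER
# Prints the CBFS prefix this model would boot from.
choose_prefix() {
    boot_option=$1
    reboot_counter=$2
    if [ "$boot_option" = "Normal" ] && [ "$reboot_counter" -lt "$MAX_REBOOT_CNT" ]; then
        echo "normal"
    else
        echo "fallback"
    fi
}

# Simulate four consecutive boots of a broken normal/ image: nothing
# ever clears the counter, so it grows by one at each boot.
counter=0
for boot in 1 2 3 4; do
    echo "boot $boot: counter=$counter -> $(choose_prefix Normal "$counter")"
    counter=$((counter + 1))
done
```

In this model, with the default threshold of 3, the first three boots pick normal and the fourth picks fallback, because nothing ever reset the counter.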

Warnings

Because two images are used, it is easy to misidentify which image booted:

  • If the user mistakenly thinks the normal image is booting...
  • But the fallback image is the one that always boots...
  • And the normal image doesn't work...
  • And the user flashes the normal image to fallback/ because she thinks it boots fine...
  • Then the user has bricked her device and has to reflash it externally.
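
One way to reduce that risk is to check the coreboot console log before promoting an image. The sketch below is hypothetical: it assumes the log has been saved to a file (for instance with cbmem -c, if that tool is installed) and that the log names the ramstage it loaded as normal/ramstage or fallback/ramstage. The exact wording of the log differs between coreboot versions, so the patterns may need adjusting.

```shell
#!/bin/sh
# Hypothetical helper: guess which prefix booted from a saved console log.
# Assumption: the log mentions the ramstage it loaded as "normal/ramstage"
# or "fallback/ramstage"; adjust the patterns to match your own log.
booted_prefix() {
    log_file=$1
    saw_normal=no
    saw_fallback=no
    if grep -q 'normal/ramstage' "$log_file"; then
        saw_normal=yes
    fi
    if grep -q 'fallback/ramstage' "$log_file"; then
        saw_fallback=yes
    fi
    case "$saw_normal/$saw_fallback" in
        yes/no) echo "normal" ;;
        no/yes) echo "fallback" ;;
        *)      echo "unknown" ;;   # neither or both seen: do not trust it
    esac
}
```

A possible use is to run sudo cbmem -c > console.log, then booted_prefix console.log, and to flash the image to fallback/ only if the answer is "normal".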

Fallback build

To configure it for fallback, do:

$ make menuconfig

Then in "General setup --->", near the top use "fallback" in "CBFS prefix to use":

(fallback) CBFS prefix to use

Then near the bottom, make sure to have:

[ ] Update existing coreboot.rom image

And in the "Chipset --->" menu at the bottom:

Bootblock behaviour (Switch to normal if CMOS says so)  --->
[*] Do not clear reboot count after successful boot

You can then build the fallback image with the fallback.sh script.
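
Before building, the resulting .config can be checked against the settings listed above. The following is a minimal sketch: the function name is hypothetical, and the CONFIG_* symbol names are assumptions about how the menu entries are spelled in .config, so verify them against your own tree.

```shell
#!/bin/sh
# Hypothetical helper: check a coreboot .config against the fallback
# settings described above. The CONFIG_* names are assumptions; compare
# them with the symbols that actually appear in your .config.
check_fallback_config() {
    config_file=$1
    status=0
    if ! grep -q '^CONFIG_CBFS_PREFIX="fallback"$' "$config_file"; then
        echo "CBFS prefix is not set to fallback"
        status=1
    fi
    if grep -q '^CONFIG_UPDATE_IMAGE=y$' "$config_file"; then
        echo "Update existing coreboot.rom image must be disabled"
        status=1
    fi
    if ! grep -q '^CONFIG_BOOTBLOCK_NORMAL=y$' "$config_file"; then
        echo "Bootblock behaviour is not Switch to normal if CMOS says so"
        status=1
    fi
    if ! grep -q '^CONFIG_SKIP_MAX_REBOOT_CNT_CLEAR=y$' "$config_file"; then
        echo "Do not clear reboot count after successful boot is not set"
        status=1
    fi
    return $status
}
```

Running check_fallback_config .config from the top of the coreboot tree prints one line per mismatch and returns a non-zero status if any was found.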

Normal build

To configure it for normal, do:

$ make menuconfig

Then in "General setup --->", near the top use "normal" in "CBFS prefix to use":

(normal) CBFS prefix to use

Then near the bottom, make sure to have:

[*] Update existing coreboot.rom image

And in the "Chipset --->" menu at the bottom:

Bootblock behaviour (Switch to normal if CMOS says so)  --->
[*] Do not clear reboot count after successful boot

You can then build the normal part with the normal.sh script, which takes an existing coreboot image as its argument.
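
Once the normal part has been added, it is worth confirming that the image really contains both sets of files. The sketch below works on a listing saved from cbfstool's print command. The function name is hypothetical, and the stage names checked (romstage, ramstage, payload) are assumptions that may not match every coreboot version.

```shell
#!/bin/sh
# Hypothetical helper: check a saved "cbfstool <image> print" listing
# for both prefixes. The file names checked are assumptions and may
# differ between coreboot versions.
check_image_listing() {
    listing=$1
    missing=0
    for prefix in fallback normal; do
        for stage in romstage ramstage payload; do
            if ! grep -q "^$prefix/$stage[[:space:]]" "$listing"; then
                echo "missing: $prefix/$stage"
                missing=1
            fi
        done
    done
    return $missing
}
```

For instance, save the listing with cbfstool build/coreboot.rom print > listing.txt (the location of the cbfstool binary depends on your tree), then run check_image_listing listing.txt.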

OS configuration

The manual way

One approach is to run switch-to-normal.sh before trying an image. It is, however, more error-prone than the systemd approach because:

  • You have to do it manually, each time, before testing an image.
  • If you then want to keep using that new image, you have to flash it again, this time to fallback/.

switch-to-normal.sh

#!/bin/sh
nvramtool -w boot_option=Normal
nvramtool -w reboot_counter=0

switch-to-fallback.sh

#!/bin/sh
nvramtool -w boot_option=Fallback
nvramtool -w reboot_counter=15

(Assuming that 15 is the maximum that can be stored in reboot_counter.)
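
To see what the two scripts above changed, the same parameters can be read back. This is a small sketch using nvramtool's -r option, which reads a single CMOS parameter. The function name is hypothetical, and reading the nvram usually requires root.

```shell
#!/bin/sh
# Sketch: print the two CMOS parameters written by the scripts above.
# show_boot_state is a hypothetical name; nvramtool must be in PATH.
show_boot_state() {
    for param in boot_option reboot_counter; do
        nvramtool -r "$param" || echo "$param: could not be read"
    done
}
```

Calling show_boot_state before and after switch-to-normal.sh is a quick way to confirm that the writes took effect.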

Systemd

Here we use systemd to automatically reset the boot counter after each successful boot (or resume).

The normal image is then meant for daily use, with fallback used only when something goes wrong.

To install it, first install nvramtool (from coreboot sources):

$ cd util/nvramtool
$ make
$ sudo make install

Then add the following systemd units at their respective paths:

  • /etc/systemd/system/coreboot@boot.service
  • /etc/systemd/system/coreboot@resume.service

Then enable them with:

$ sudo systemctl enable coreboot@boot.service
$ sudo systemctl start coreboot@boot.service
$ sudo systemctl enable coreboot@resume.service
$ sudo systemctl start coreboot@resume.service

Current limitations

  • The same cmos.layout must be used in fallback and normal!
  • The user may wrongly identify which image booted and, because of that, end up reflashing a non-working image.
  • Some issues can arise when the nvram layout is not the same between normal/ and fallback/.
  • The number of failed boots tolerated is fixed at compile time.
  • In order to fully boot, some boards reset conditionally during the boot process, which increments the boot count unpredictably.
  • Example scripts exist only for systemd. Still, they are trivial to adapt to other init systems.
  • Payloads sometimes have fixed default locations when loading things from cbfs:
    • When using GRUB as a payload, grub.cfg is at etc/grub.cfg by default, so if you want to test GRUB as a payload, remember to change grub.cfg's path so that it does not interfere with the fallback's GRUB configuration.
    • Changing the path of what SeaBIOS loads from cbfs is probably configurable with SeaBIOS cbfs symlinks, but this has not yet been tested or documented in combination with the fallback mechanism.
  • Tested boards need to be listed somewhere.
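
As a starting point for adapting the approach to another init system, the sketch below wraps the two writes from switch-to-normal.sh in a function that reports failure. It assumes that resetting these two parameters is all the hook needs to do. The function name is hypothetical, and how to call it late in the boot sequence depends on the init system and is not shown.

```shell
#!/bin/sh
# Sketch of an init-agnostic "boot succeeded" hook.
# Assumption: writing the same two parameters as switch-to-normal.sh is
# all that is needed; call this as late as possible in the boot sequence.
mark_boot_ok() {
    if nvramtool -w boot_option=Normal && nvramtool -w reboot_counter=0; then
        echo "coreboot: reboot counter cleared"
    else
        echo "coreboot: could not clear the reboot counter" >&2
        return 1
    fi
}
```

Returning a non-zero status on failure lets the calling init script log the problem instead of silently leaving the counter untouched.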

Issues

thinkpad_acpi

This Linux driver can interact badly with the fallback/normal mechanism: when it is used with the volume_control=1 option, volume_mode=1 is also required; otherwise, after the computer is shut down, it will always boot from fallback.

This may be because the default volume_mode setting touches the nvram: it probably corrupts it at shutdown when saving the ALSA state of the volume buttons' "sound card" (called EC Mixer). At the next boot, coreboot then detects a corrupted nvram and restores its valid defaults.
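
One way to make both options persistent is a modprobe configuration file. The sketch below only writes the options line. The function name and the default file name are assumptions, writing under /etc requires root, and the driver has to be reloaded (or the machine rebooted) before the options apply.

```shell
#!/bin/sh
# Sketch: write a modprobe options line for thinkpad_acpi.
# The default path is an assumption; files ending in .conf under
# /etc/modprobe.d are normally read by modprobe.
write_thinkpad_acpi_options() {
    conf_file=${1:-/etc/modprobe.d/thinkpad_acpi.conf}
    echo "options thinkpad_acpi volume_control=1 volume_mode=1" > "$conf_file"
}
```

Passing a path as the first argument writes the line somewhere else, which is convenient for checking the result before installing it.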

References

<references/>