PCR 14992 - CPSR is not properly restored after an interrupt in kernel mode
Summary: CPSR is not properly restored after an interrupt in kernel mode
Status: NEW
Alias: None
Product: Kernel
Classification: Deos
Component: Kernel
Version: 10.8.0
Hardware: ARM Deos
: Hold
: Limitation
Target Milestone: 10.8.0
Assignee: .Kernel
URL:
Whiteboard:
Depends on:
Blocks:
 
Reported: 2023-06-02 09:07 MST by rroffelsen
Modified: 2023-11-13 16:24 MST

See Also:
Impact Assessment: Trivial
Organization: DDC-I, Inc.


Description rroffelsen 2023-06-02 09:07:31 MST
+++ This PCR was initially created as a clone of PCR #14991 +++

This limitation only impacts ARM processors. 

When returning from an interrupt that occurred in kernel mode, some bits of the CPSR can differ from the state they were in prior to the interrupt. This could have the following effects:
* Kernel mode T32 code could have CONSTRAINED UNPREDICTABLE behavior. See ARM_v8_ARCH_REF_D.a section K1.1.7 "CONSTRAINED UNPREDICTABLE behavior associated with IT instructions and PSTATE.IT"
* If boot sets CPSR.A to 1, unmasking SErrors, it may temporarily become set to 0 longer than intended while a thread is executing in kernel mode. This is a transient state that will be resolved when the thread returns to user mode. If a SError occurs during this time it will be unmasked when the thread returns to user mode or a context switch occurs. A SError is a fatal event so the effect is the potential delay of a call to boot's kernel mode error function.

Workaround:

* To avoid CONSTRAINED UNPREDICTABLE behavior when executing T32 code in kernel mode, ensure all kernel mode T32 code executes within a kernel critical. 

** To locate T32 code within a shared object file use the following command: 'arm-eabi-objdump --syms --special-syms <shared object file>.dbg | grep \$t'. This will return the address of the start of each T32 section of code. The sections can then be further analyzed to ensure that, if executed in kernel mode, they are within a kernel critical. Note: This command assumes the .dbg variant contains the debug symbols for the shared object file in question.

** The Startup GCC component may contain T32 instructions, so shared objects that link in the Startup GCC files can contain T32 code even if all of the other sources for the shared object are compiled A32. When a shared object that runs in kernel mode is loaded, the kernel will call the initialization function (_init) in a kernel critical. Kernel mode shared object files are never unloaded; therefore, the kernel never calls the termination function (_fini). As long as there are no other calls to Startup GCC functions, its T32 code is assured to execute within a kernel critical.

* There is no workaround for the potential delay of a call to boot's kernel mode error function. Note: The calling of this function is documented to be "best effort"; therefore, systems should not depend on this function ever being called.

Analysis:

The instruction used to load the SPSR_irq (which is used to restore the CPSR on exception return) is "msr spsr, r0" which gcc translates into "msr spsr_cf, r0" causing only bits 0-7 and 24-31 to be written. Bits 8-23 are unchanged.

The bits impacted are: A bit 8, E bit 9, IT[2:7] bits 10-15, GE bits 16-19, IL bit 20, DIT bit 21, PAN bit 22
Note: Bit 23 is reserved.
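
The split between written and preserved bits can be checked with a short calculation. The sketch below is illustrative only. It assumes the standard AArch32 MSR field-mask layout (c = bits 0-7, x = bits 8-15, s = bits 16-23, f = bits 24-31), and the names FIELD_MASKS, CPSR_FIELDS, msr_write_mask and preserved_fields are hypothetical helpers, not anything taken from the kernel sources.

```python
# Illustrative sketch: which SPSR bits does "msr spsr_<fields>, r0" write?
# Assumes the standard AArch32 MSR field-mask layout; all names are hypothetical.

FIELD_MASKS = {
    "c": 0x000000FF,  # control field, bits 0-7
    "x": 0x0000FF00,  # extension field, bits 8-15
    "s": 0x00FF0000,  # status field, bits 16-23
    "f": 0xFF000000,  # flags field, bits 24-31
}

# Bit ranges (low, high) of the fields listed in this analysis.
CPSR_FIELDS = {
    "A": (8, 8),
    "E": (9, 9),
    "IT[2:7]": (10, 15),
    "GE": (16, 19),
    "IL": (20, 20),
    "DIT": (21, 21),
    "PAN": (22, 22),
}


def msr_write_mask(fields):
    """Return the mask of bits written by an MSR with the given field letters."""
    mask = 0
    for letter in fields:
        mask |= FIELD_MASKS[letter]
    return mask


def preserved_fields(fields):
    """Return the names of the listed CPSR fields that the MSR leaves untouched."""
    written = msr_write_mask(fields)
    names = []
    for name, (low, high) in CPSR_FIELDS.items():
        field_mask = ((1 << (high - low + 1)) - 1) << low
        if field_mask & written == 0:
            names.append(name)
    return names


if __name__ == "__main__":
    written = msr_write_mask("cf")
    print(f"written:   0x{written:08X}")
    print(f"preserved: 0x{~written & 0xFFFFFFFF:08X}")
    print("untouched fields:", ", ".join(preserved_fields("cf")))
```

Under these assumptions the "cf" form leaves every field listed above untouched, while a write mask built from all four field letters covers the whole register.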

This analysis must address the possibility of each bit changing spontaneously any time the kernel mode code can be interrupted. But because SPSR_irq is not user mode writable, and is only written with valid CPSR[8-23] values (by the processor on each interrupt entry and by the kernel on a user mode interrupt return), only valid CPSR values need to be considered (i.e. a SPSR_irq bit can never hold a value that is never used in the CPSR).

The sequence of operations that needs to happen to possibly change the state of a bit are:

1 - ThreadA running in kernel mode is interrupted and its CPSR state is written to SPSR_irq.
2 - context switch to another thread as part of the kernel's interrupt handler.
3 - Before ThreadA is context switched in again another thread is interrupted and its CPSR state is written to SPSR_irq.
4 - When ThreadA is context switched in again, its saved SPSR_irq state is moved to SPSR_irq, preserving bits 8-23 that were written in step #3.
5 - eret, causing CPSR[8-23] to potentially be incorrectly changed.
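
The sequence above can be walked through with a small model. This is a sketch for illustration, not kernel code: it assumes the restore writes only bits 0-7 and 24-31 of SPSR_irq as described in this analysis, it ignores the changes the processor makes to the CPSR itself on interrupt entry, and it does not model the hardware forcing IT to zero on a return to A32 state. The class Cpu, the function run_sequence and the example CPSR values are all hypothetical.

```python
# Illustrative model of the five-step sequence; not kernel code.
# Assumption: the restore writes only bits 0-7 and 24-31 of SPSR_irq.
# All names and example values are hypothetical.

WRITE_MASK = 0xFF0000FF               # bits written by "msr spsr_cf, r0"
KEEP_MASK = ~WRITE_MASK & 0xFFFFFFFF  # bits 8-23, left unchanged


class Cpu:
    """Tiny model holding only the current CPSR and SPSR_irq."""

    def __init__(self, cpsr):
        self.cpsr = cpsr
        self.spsr_irq = 0

    def take_irq(self):
        # Steps 1 and 3: the interrupted CPSR is written to SPSR_irq.
        self.spsr_irq = self.cpsr

    def partial_restore(self, saved):
        # Step 4: only bits 0-7 and 24-31 are written; bits 8-23 keep
        # whatever the most recent interrupt entry left in SPSR_irq.
        self.spsr_irq = (saved & WRITE_MASK) | (self.spsr_irq & KEEP_MASK)

    def exception_return(self):
        # Step 5: the CPSR is reloaded from SPSR_irq.
        self.cpsr = self.spsr_irq


def run_sequence(thread_a_cpsr, other_cpsr):
    """Return ThreadA's CPSR after the five steps."""
    cpu = Cpu(thread_a_cpsr)
    cpu.take_irq()                # step 1
    saved_a = cpu.spsr_irq        # kernel saves ThreadA's SPSR_irq
    cpu.cpsr = other_cpsr         # step 2: another thread is now running
    cpu.take_irq()                # step 3
    cpu.partial_restore(saved_a)  # step 4
    cpu.exception_return()        # step 5
    return cpu.cpsr


if __name__ == "__main__":
    thread_a = 0x00000033  # example only: T bit set, bits 8-23 clear
    other = 0x00000430     # example only: T bit set, one IT bit (bit 10) set
    print(f"ThreadA before: 0x{thread_a:08X}")
    print(f"ThreadA after:  0x{run_sequence(thread_a, other):08X}")
```

In this model ThreadA resumes with bit 10 set even though it never set it. If the other thread's bits 8-23 happen to match ThreadA's, the result is unchanged.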

The LLKMI feature allows for interrupts in kernel mode when in a kernel critical section, but step #2 can never happen as part of a LLKMI handler. Therefore, a kernel critical is sufficient to ensure this limitation will not be realized.

CPSR.A is not user mode modifiable. It is set to a boot defined value and unchanged by the kernel for both user and kernel mode. The processor will set the CPSR.A to 0 on an interrupt or abort exception. Therefor, it is possible for CPSR.A to switch form 1 to 0 as a result of this limitation. The effect of this is an SError interrupt (which is a fatal condition) could be delayed until the interrupted thread returns to user mode or context switches to another thread. This is a transient condition and once the thread returns to user mode CPSR.A will be restored to its correct value. While not optimal, the result is "safe" because in a fatal condition boot's kernel mode error function is called as a "best effort" and not guaranteed.

CPSR.E is not user mode modifiable because SCTLR.SED is set to 1 by boot. CPSR.E is set to 0 by boot and never changed. Therefore, this limitation will have no impact on CPSR.E.

CPSR.IT[2:7] are user modifiable. These bits are only permitted to be set during an IT block (T32 instructions). According to the v8 manual section K1.1.7 "CONSTRAINED UNPREDICTABLE behavior associated with IT instructions and PSTATE.IT", the CPSR.IT bits are forced to zero on an exception return to A32 state. Therefore, this limitation will have no impact as long as all kernel mode interruptible code is A32 code.

CPSR.GE are user writable. These bits are only used by SIMD instructions. SIMD instructions are prohibited in kernel mode. Therefore, this limitation will have no impact.

CPSR.IL is not user mode modifiable. It is set to 0 by boot and never changed. Therefore, this limitation will have no impact on CPSR.IL.

CPSR.DIT is implemented in v8.4 and reserved in prior versions. All processors the kernel has currently been verified to support are prior to v8.1. Therefore, this limitation will have no impact on CPSR.DIT.

CPSR.PAN is implemented in v8.1 and reserved in prior versions. All processors the kernel has currently been verified to support are prior to v8.1. Therefore, this limitation will have no impact on CPSR.PAN.
Comment 1 rroffelsen 2023-06-02 13:04:46 MST
This limitation potentially impacts only PALs, PRLs and any libraries they use.
DDC-I provided PALs and PRLs that need to be inspected to see if they are impacted:
- trickyfish (pal)
- vfile (prl)
- dvms (prl)

The inspection must identify any library used in kernel mode and include them in the inspection. Results of the inspection(s) must be documented on this PCR.
Comment 2 Christopher Pow 2023-06-13 11:25:47 MST
vfile-prl is not impacted by this limitation.

vfile-prl uses -marm compile option to avoid T32 code in general and thus the majority of it is not affected by this limitation.

vfile-prl does link GCC Startup. As noted in the workaround this is not a concern.

To confirm no T32 code exists in the non-GCC Startup code in vfile-prl:

The analysis step given in the workaround yields:
$ arm-eabi-objdump --syms --special-syms output/arm-deos/release/libvfile-prl.so.dbg | grep \$t
00000a3c l       .fini  00000000 $t
0000029c l       .init  00000000 $t
000009bc l       .text  00000000 $t
000009f8 l       .text  00000000 $t

.init and .fini are not a concern, as stated in the workaround. 9bc is the start of gcc_invoke_ctors, a GCC Startup invoked function. Similarly, 9f8 is the address of gcc_invoke_init_array, another GCC Startup function.
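
For larger shared objects the same check can be scripted. The sketch below is only an illustration: it groups the "$t" symbols by section, using the objdump output quoted in this comment as sample input. It assumes the five-column layout shown above (address, flags, section, size, name), and thumb_symbols is a hypothetical helper name.

```python
# Illustrative helper: group "$t" symbols by section.
# Assumes the five-column objdump --syms layout quoted in this comment.
from collections import defaultdict

SAMPLE = """\
00000a3c l       .fini  00000000 $t
0000029c l       .init  00000000 $t
000009bc l       .text  00000000 $t
000009f8 l       .text  00000000 $t
"""


def thumb_symbols(objdump_text):
    """Map section name -> sorted list of addresses where T32 code starts."""
    by_section = defaultdict(list)
    for line in objdump_text.splitlines():
        parts = line.split()
        # Expected columns: address, flags, section, size, name.
        if len(parts) == 5 and parts[4] == "$t":
            by_section[parts[2]].append(int(parts[0], 16))
    return {section: sorted(addrs) for section, addrs in by_section.items()}


if __name__ == "__main__":
    for section, addrs in thumb_symbols(SAMPLE).items():
        print(section, ", ".join(f"0x{addr:x}" for addr in addrs))
```

Each address reported for .text still has to be matched to a function by hand, as was done above for 9bc and 9f8.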
Comment 3 Richard Frost 2023-06-13 18:12:37 MST
trickyfish pal and dvms prl are in development. They will ensure the limitation is not applicable before verf complete. The Limitation Analysis will validate they are not susceptible.
Comment 4 deosbugs.ccb 2023-08-23 10:12:05 MST
CCB visited this PCR on 2023-08-23-61658
Comment 5 deosbugs.ccb 2023-08-23 10:13:03 MST
PCR being placed on HOLD for jupiter, to remain open.
Comment 6 rroffelsen 2023-11-13 16:24:59 MST
(In reply to rroffelsen from comment #0)
> * If boot sets CPSR.A to 1, unmasking SErrors, it may temporarily become set
> to 0 longer than intended while a thread is executing in kernel mode. This
> is a transient state that will be resolved when the thread returns to user
> mode. If a SError occurs during this time it will be unmasked when the
> thread returns to user mode or a context switch occurs. A SError is a fatal
> event so the effect is the potential delay of a call to boot's kernel mode
> error function.

The above statement incorrectly stated that CPSR.A is set to 1 to unmask SErrors. Setting CPSR.A to 0 unmasks SErrors.

So it should read:

* If boot sets CPSR.A to 0, unmasking SErrors, it may temporarily become set to 1 longer than intended while a thread is executing in kernel mode. This is a transient state that will be resolved when the thread returns to user mode. If a SError occurs during this time it will be unmasked when the thread returns to user mode or a context switch occurs. A SError is a fatal event so the effect is the potential delay of a call to boot's kernel mode error function.
 
> CPSR.A is not user mode modifiable. It is set to a boot defined value and
> unchanged by the kernel for both user and kernel mode. The processor will
> set the CPSR.A to 0 on an interrupt or abort exception. Therefor, it is
> possible for CPSR.A to switch form 1 to 0 as a result of this limitation.
> The effect of this is an SError interrupt (which is a fatal condition) could
> be delayed until the interrupted thread returns to user mode or context
> switches to another thread. This is a transient condition and once the
> thread returns to user mode CPSR.A will be restored to its correct value.
> While not optimal, the result is "safe" because in a fatal condition boot's
> kernel mode error function is called as a "best effort" and not guaranteed.

This statement made the same error and should read:
CPSR.A is not user mode modifiable. It is set to a boot defined value and unchanged by the kernel for both user and kernel mode. The processor will set CPSR.A to 1 on an interrupt or abort exception. Therefore, it is possible for CPSR.A to switch from 0 to 1 as a result of this limitation. The effect of this is that an SError interrupt (which is a fatal condition) could be delayed until the interrupted thread returns to user mode or context switches to another thread. This is a transient condition and once the thread returns to user mode CPSR.A will be restored to its correct value. While not optimal, the result is "safe" because in a fatal condition boot's kernel mode error function is called as a "best effort" and not guaranteed.