Cache Trasher Project

Design discussion for the proposed cache trasher, hopefully with material that can be incorporated into the final documentation. The project will support trashing caches restricted to specific memory pools.

Reference PCR:10118

Overview

The purpose of a cache trasher is to force the worst-case execution time that a computation can incur due to cache behavior. Determining the worst-case execution time due to other considerations, such as code path, scheduling, blocking, and interrupts, is outside the scope of the cache trasher.

There are two primary considerations:

  1. When the trashing occurs, and
  2. Which caches, or parts thereof, are trashed.

When Trashing Occurs

For single-core execution the caches are trashed prior to the beginning of execution; for multi-core the times are TBD, but trashing would likely be continuous. There is a secondary question: what corresponds to "the computation"?

  1. A thread, this would be the natural choice for RMA.
  2. A window, e.g., a partition in 653 or POSIX.

There is another dimension to "when": when does the trashing begin, and is it possible to stop it?

  1. From coldstart
  2. On command from some agent, e.g., status monitor.


Which Caches

The cache trasher is limited to caches for memory. TLBs, branch predictors, and other processor caches are out of scope.

Memory caches are typically composed of L1 instruction and L1 data caches, plus (usually unified) L2 and possibly L3 caches. Cache partitioning can be used to limit the effective size of the L2 and L3 caches, and sometimes the L1 caches.

Which caches to trash is partly policy, partly mechanism.

The policy should be driven by the platform integrator. Our guidance to the platform integrator: for single core, only caches that are accessible to the computation and shared with some other entity need to be trashed. The effects of BIT and systemic events, e.g., warmstart, should also be considered. For example, even if the platform limited BIT to testing only a single page of memory per period, even non-shared caches could be affected. Note: BIT will change the state of the cache but will not leave it dirty, so the effect is not as bad as a cache trasher's.

The mechanism should permit the trashing of any number of the cache partitions. Trashing should consist of loading the unified and data caches with irrelevant data, and invalidating the instruction cache.
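
To make the mechanism concrete, here is a minimal sketch of the data/unified-cache side; the range descriptor, function names, and 64-byte line size are illustrative assumptions, not the final API:

  /* Illustrative sketch only -- descriptor and names are placeholders. */
  #include <stddef.h>
  #include <stdint.h>

  #define CACHE_LINE_SIZE 64  /* assumed line size; platform specific */

  struct trash_range {
      volatile uint8_t *base;   /* virtual address mapped by the IPAL */
      size_t            length;
  };

  /* Dirty every cache line covered by the given ranges.  Writing (rather
   * than just reading) leaves the lines dirty, so each one must be written
   * back before the computation can refill it -- the worst case. */
  static void trash_data_cache(const struct trash_range *ranges, size_t count)
  {
      for (size_t r = 0; r < count; r++) {
          for (size_t off = 0; off < ranges[r].length; off += CACHE_LINE_SIZE) {
              ranges[r].base[off] = (uint8_t)off;  /* any irrelevant value */
          }
      }
  }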

ISSUE: Should trashing only part of a partition be supported (as per the one-page BIT example above)? RLR: To test BIT impact, the trasher could write back and invalidate all cache lines in the pools used by the thread that are not being trashed.

The specification of which caches to trash is TBD.

Implementation Issues

The kernel does not specify cache partitions; it only specifies memory pools. The suggestion is that the cache trasher take the same approach; however, the kernel does not care about the number of ways, and the cache trasher would require that information.

It is probably acceptable to always invalidate the entire L1 instruction cache: first, instruction caches are generally small and hence not readily partitionable; second, many processors have fast ways to invalidate the entire instruction cache.
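
A sketch of the invalidation, assuming a GCC-compatible toolchain (__builtin___clear_cache is a standard GCC builtin; the function name here is illustrative):

  /* Invalidate the instruction cache over a code region.  On x86 this is
   * (nearly) a no-op because the I-cache is kept coherent with the data
   * cache; on PowerPC/ARM the builtin emits the per-line sequence. */
  static void trash_instruction_cache(char *begin, char *end)
  {
      __builtin___clear_cache(begin, end);
  }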

If the specification of "when" becomes complicated, it may make sense to let the customer specify the "when" in code, hacking up their own PAL or PAL interceptor, etc., as appropriate. The specific suggestion is to provide an interface for the trashing itself along with a simple policy or two, and let the customer override or replace the policy if that is not good enough.
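
A hypothetical shape for that mechanism/policy split, reusing the range descriptor from the sketch above (all names are illustrative):

  #include <stddef.h>
  #include <stdint.h>

  struct trash_range;  /* range descriptor from the earlier sketch */

  /* Mechanism: trash the cache partition(s) covered by the given ranges. */
  void trasherTrashRanges(const struct trash_range *ranges, size_t count);

  /* Policy hook: decides "when".  Called from the integration point
   * (e.g., the PAL interceptor's timerWrite() path).  A default
   * implementation would apply the simple thread-handle rules; the
   * customer may install a replacement. */
  typedef void (*trasher_policy_fn)(uintptr_t currentThreadHandle);
  void trasherSetPolicy(trasher_policy_fn policy);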


Notes from RLR (edited, some removed and included above)

As far as I know, the existing cache trasher is a modified PAL plus status monitor. The PAL has been modified to look for switches to a thread handle of interest. During timerWrite() it gets the current thread handle; if that differs from the previous handle and matches a handle of interest, a large memory region outside of any process's memory is accessed in a way that leaves the cache "dirty" (each cache line must be written back to memory before it can be refilled). It should also invalidate the instruction cache. Status monitor was modified to add commands that communicate the thread handle of interest to the PAL.

Our cache trasher should be more advanced:

  1. It should not require a special PAL. It should use a PAL interceptor or, perhaps even better, a PRL (we would likely need a kernel update to support the latter).
  2. The use of status monitor to control it seems reasonable, but it should be integrated with OA.

Notes from kickoff meeting

The Idea:

  • Add an OpenArbor example/lesson that shows how to build, integrate, and load an "Interceptor" PAL. The Interceptor PAL example would actually implement a Cache Trasher.
  • The Customer would need to specify, likely in a "C" header file, rules as shown in the pseudo-code below. This header file would be compiled as part of the Cache Trasher Interceptor PAL example above.
  • The Status Monitor would be altered to accept a command that allows the Customer to associate a particular thread handle with a specific rule defined in the "C" header file above (see SET TRASHER below). This enables the Customer to inflict thread-specific cache trashing behavior, e.g., trashing only the cache lines associated with the memory pool the thread is using.

Initial Granite Features:

  • X86 only
  • The "trigger" is a thread handle.

Benefits:

  • Does not require a special PAL.
  • Gives the customer control, i.e., they can have it trash the entire cache or just a pool-based subset.
  • Can eventually scale to more complex trashing schemes and triggers, as well as PPC, ARM, etc., with minimal (ideally no) BSP or kernel involvement.

Rule:
   Trigger
   set of Pool // the Pools to write to when the trigger matches

Trigger:
   a thread handle
   a process handle
   a list of window indexes


// A Pool is a description of how to trash the cache partition associated with a
// memory pool.  The user is responsible for ensuring that writing to the
// address range(s) will "trash" the cache associated with the memory pool.
// Typically this requires that the memory range(s) not be user writable and
// that the number of cache-conflicting ranges is at least as large as the
// number of cache ways.
// AL: perhaps PoolTrasher rather than Pool?
Pool:
  list of [physical address, length] 
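
Rendered as the kind of "C" header the Customer might supply (a sketch; the field names and the PoolTrasher naming from the note above are illustrative):

  #include <stddef.h>
  #include <stdint.h>

  /* One physically contiguous range to write when trashing. */
  struct PoolRange {
      uint64_t physAddr;
      uint64_t length;
  };

  /* How to trash the cache partition associated with one memory pool. */
  struct PoolTrasher {
      const struct PoolRange *ranges;
      size_t                  rangeCount;
  };

  enum TriggerKind { TRIGGER_THREAD, TRIGGER_PROCESS, TRIGGER_WINDOWS };

  struct Trigger {
      enum TriggerKind kind;
      uintptr_t        handle;         /* thread or process handle */
      const int       *windowIndexes;  /* used when kind == TRIGGER_WINDOWS */
      size_t           windowCount;
  };

  struct Rule {
      struct Trigger            trigger;
      const struct PoolTrasher *pools;  /* Pools to write when the trigger matches */
      size_t                    poolCount;
  };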

In the initial implementation the user would supply the pool info and be responsible for ensuring the addresses are not available to the user.
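
As a worked example of the conflicting-range requirement (cache parameters assumed purely for illustration): in a W-way set-associative cache of size S, physical addresses S/W bytes apart index the same set, so evicting every way of a set takes writes to at least W such addresses.

  /* Illustrative arithmetic only -- parameters are assumed, not queried. */
  #define CACHE_SIZE (1024u * 1024u)  /* e.g., a 1 MiB unified L2    */
  #define CACHE_WAYS 8u               /* e.g., 8-way set associative */
  #define LINE_SIZE  64u              /* e.g., 64-byte cache lines   */

  /* Addresses SET_STRIDE bytes apart map to the same set, so a pool needs
   * at least CACHE_WAYS conflicting ranges at this stride (here 128 KiB)
   * to be able to evict every way of the sets it covers. */
  #define SET_STRIDE (CACHE_SIZE / CACHE_WAYS)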


Eventually, the pool description would be derived from information in the registry.

The trigger is evaluated on timerWrite():
  currentThreadHandle() can be used.
  If current == last, don't trash, although the UG should note that some
  cases where current == last might need to be trashed, e.g., for ISR
  threads.
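
A sketch of that evaluation in the interceptor, building on the header sketch above (currentThreadHandle() is from the notes; trasherFindRule() and trasherTrashRule() are illustrative placeholders):

  extern uintptr_t currentThreadHandle(void);
  extern const struct Rule *trasherFindRule(uintptr_t threadHandle);
  extern void trasherTrashRule(const struct Rule *rule);

  static uintptr_t lastHandle;

  void ipalTimerWrite(void)
  {
      uintptr_t current = currentThreadHandle();

      /* Skip self-switches; per the note above, some current == last
       * cases (e.g., ISR threads) might still need trashing. */
      if (current != lastHandle) {
          const struct Rule *rule = trasherFindRule(current);
          if (rule != NULL)
              trasherTrashRule(rule);  /* write its Pools; invalidate I-cache */
          lastHandle = current;
      }

      /* ...then forward to the real timerWrite()... */
  }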

At coldstart time the IPAL should map virtual addresses for the pool physical addresses and (a guess) map a user-writable page that can be used by the "SET TRASHER RULE TRIGGER" API.

Status monitor interface:

  SET TRASHER RULE TRIGGER rulenum threadHandle
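
For example (rule number and handle value purely illustrative), binding rule 1 to a thread of interest might look like:

  SET TRASHER RULE TRIGGER 1 0x4e20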