Deos with Native 653 Support Project


Description

This project's purpose is to develop and deploy ARINC 653 support for Deos. The project received funding from Honeywell in October 2010.

Coordination with Honeywell

Document exchange between Honeywell and DDC-I uses Honeywell's Daptive site. The following people should have access to that site:

  • Aaron Larson: alarson@ddci.com
  • Bill Cronk: bcronk@ddci.com
  • Gary Kindorf: gkindorf@ddci.com
  • Greg Rose: grose@ddci.com
  • Ryan Roffelsen: rroffelsen@ddci.com

References

Marketing overview

  • Can run 653 apps using Deos scheduling (including slack) or 653 scheduling.
  • Can run both 653 and Deos apps simultaneously.
  • Backward compatible with preceding Deos APIs
  • Support 653 applications
    • process level API & scheduling
    • partition level scheduling
  • integration between IOI and 653 sampling and queuing ports
  • Commercially supported
  • IDE (more details?)
  • x86 & PPC support
  • XML system config tool

Substantive Non-Compliances

The following are DDC-I 653 implementation non-compliances with the 653 spec.

  1. Deos/653 will only support one reader and one writer to queuing ports.
    • The main reason here is that supporting multiple writers/readers requires a trusted intermediary (e.g., the kernel) and the runtime overhead and implementation cost is not deemed worth it.
  2. Deos/653 will not support delayed start.

Project Management

Short-term program management and setup are needed.

Upcoming Releases and SOW Deliverables

  • 2011-03-18 verf tools (GK) from PDR actions Issue 12.
  • 2011-04-04 L2 Cache brief (AL) from PDR actions Issue 3.
  • 2011-04-04 Integ Tool usability (GK) from PDR actions Issue 13.
  • 2011-06 Partition scheduler Release (Kernel & IT)
    • These are the high level actions that need to be performed:
      1. Update Kernel Release notes
      2. Generate basic Window scheduler documentation (Readme).
      3. Ad Hoc window testing
      4. DONE: Regression on Fit-pc2 and ep440c
      5. Window Example including Timemap usage
      6. EP8343M 653-BSP
    • Nice to have:
      1. Fit-PC2 653-BSP
      2. Status Monitor Release (New events logged)
  • 2011-07 MIPS kickoff meeting.
  • 2011-10 653 Process Scheduler Release

Risks

  1. The Health monitor section of the specification is imprecise and spans the kernel and the process scheduler. Need good coordination there.
    • This needs to be covered in the PDR slides, especially division of labor between kernel and library, and what controls on module reset.
  2. Are 32-bit/64-bit timers a performance issue? Ref [1] Issue 2.

Miscellaneous SOW tasks

Things required by the SOW that don't naturally fit into one of the below projects.

  1. Update DeosDoc1a PCR:6886
  2. Technical brief on cache management mechanisms (ref SOW 4.4.1)

Kernel

  • Lead: RLR
  • Tasks
    • Partition Scheduler (WAT, etc.)
    • Interface APIs required by runtime library
      • ThreadContext as KIO
        • Desirement is to have budget tracking. Could be via log, or (preferred) with high/low watermarks.
      • Current Window accessor
      • Time to next system tick PAL interface function.
        • Must be user callable.
        • Return time in us.
      • Boot shutdown/halt function pointer & kernel PRL APIs for calling it from user mode. Ref PCR:1876
        • Needed to support HM module restart.
      • Either document restart behavior of setThreadTLSPointer() or implement setter for process user data.
        • Needed to support HM partition restart.
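The time-to-next-system-tick interface in the task list above is essentially a remainder computation. A minimal sketch, assuming a fixed tick period with all times in microseconds (the function name here is illustrative, not the actual PAL interface name):

```c
#include <stdint.h>

/* Illustrative sketch only: microseconds remaining until the next
 * system tick, given the current time and a fixed tick period.
 * The real PAL interface name and semantics are per the task list. */
uint32_t time_to_next_tick_us(uint64_t now_us, uint64_t tick_period_us)
{
    uint64_t elapsed_in_period = now_us % tick_period_us;
    return (uint32_t)(tick_period_us - elapsed_in_period);
}
```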

The Kernel is responsible for the following from the SOW:

  1. For ensuring 653 partition window activations are executed on time and without interference.
  2. Health monitor reset support: (perhaps delegated to a platform resource library).
  3. Section 5.11 paragraph 1: "...co-schedule...
  4. Section 5.11 paragraph 3: "... flush caches..."
  5. Section 5.11 paragraph 4: "... fast packet... bypass L2..."
    1. SOW says we described our approach in our quote, should look at what we said.
  6. Section 5.11 paragraph 5: "...executable segment level control..."
  7. Section 6.3 context switch times.

BSP with A653 Support

  • Lead: MH
  • Tasks:
    • Initial boot/pal/config support on reference platforms
      • Window Activation Table (WAT) support completed on VMware, Fit-PC2, and EP8343 targets
      • Additional 653 support API required on VMware, Fit-PC2, and EP8343 targets for nanosecond timer values
    • Initial boot/pal/config support on MIPS reference platform
      • Boot
        • Get the emulator operational
        • Determine Processor initialization sequence
        • Determine Memory map and TLB setup
        • Determine Cache setup
        • Determine RAM and FLASH Init
        • Determine Exception Handler installation requirements
      • PAL
        • Provide WAT and A653 Support
      • Config
        • Provide a basic PI file for integration
      • Network
        • Ensure network support

653 Part1 Runtime

  • Lead: RLF
  • Tasks:
    • Define Interface between configuration tool binary and runtime.

Partitioning of responsibility for the SOW into runtime and i/o library is TBD. Special notes:

  1. Section 5.11 paragraph 2: "...replace or supplement default sampling and queuing port behavior..."
  2. Section 5.11 paragraph 6: "... partition warmstarts ... traditional Deos..."

Inter-Partition IO library

  • Lead: GK (at least for analysis phase)

Tooling

ARINC 653 Configuration tool

  • Lead: GK
  • Status as of Aug 8 2011:
    • deos653config tool version 1.0.0-9 is in unreleased form in the cygwin installer.
    • If you suspect a bug in the binary configuration file, there is a utility called dump.py (in svn, in the tools/code folder) that can be used to dump a config file's content into text.
  • Mini milestones:
    • [done] generate WAT content to the registry
    • [done] ioi config xml generation (the process level xml files)
    • [done] i/o binding specification and mechanism.
      • [done] add binding specification to deos653 XML
      • [done] auto-binding capability for 653 ports and queues
      • [done] support pure 653 I/O where the deos653 tool generates the ioi binder file
      • [hold] support a hybrid I/O system where the deos653 tool contributes to an existing ioi binder file. This will be done by modifying the IOI configuration tool, enabling it to accept multiple ioi.binder.xml files. Those binder files will be merged into a single temporary file, which is then effectively passed to the tool, thereby keeping the single-binder-file invariant intact. From the 653 config tool perspective, we're done; the ioi config tool needs some work.
    • [done] Resolve WAT interface to registry and kernel. (Previously on hold pending group discussion with Richard and Ryan.)
    • [done] validation rules
    • [done] regression test suite
    • [done] meaningful UG content (framework is in place)
    • [40h not-started] health monitor xml, table generation
    • [16h not-started] manage config file namespaces within the deos653 binary file (i.e., the name of the 653 config file will be specified via argv, but the name of the ioi binary file to use is... ?)

For the Aug 19 release, only "Resolve WAT interface" and "meaningful UG content" need to be done.

Status Monitor

  • Lead: RLR
  • Displays for ThreadContext.

Debugger

  • Lead: ???

Need to make sure that ThreadContext changes don't adversely affect the GDB server, or if they do, that the DDS debugger accommodates the changes.

Timemap

  • Lead: ???

Create timemap tool (integrate old tool?) in DDS and add necessary runtime message decode.

Integration Tool updates

  • Lead: GK

Add WAT and ThreadContext info.

Testing

Note that Part 3 of the 653 specification is dedicated to describing testing, and contains what we would call test cases in our process.

The Project

Hybrid System

The system must allow both Deos and 653 schedulers to be active simultaneously, which does not preclude either scheduler from using all of a CPU. The basic idea is to provide a 653 partition scheduler at the same level as Deos, and run a 653 process scheduler as a user mode shared library. The product would support both the 653 API and 653 process level scheduling, and optionally 653 partition level scheduling (otherwise a 653 partition would be scheduled as a normal Deos thread and thus would not require a fixed timeline schedule, could use and contribute slack if desired, etc.).

The hybrid system (opposed to a library wrapper solution or a standalone solution) has some distinct advantages:

  • Leverage Deos platforms: All hardware architectures supported by Deos would be inherited by 653. The 653 architecture proposed later in this document does not require hardware-specific support (context switching, VAS management); instead, Deos continues to perform those functions.
  • Native 653: The solution proposed here is not a mapping of 653 calls to Deos calls. The 653 library will contain the 653 API and will not rely on Deos for any functionality.
  • Allows Deos and 653 Applications to interact/co-reside: The hybrid approach allows applications coded to either OS API to exist and execute on the same CPU. Deos IPC mechanisms can be used by applications to synchronize/communicate between 653 and Deos.


Historical Information

Most of the legacy text for this page has been deleted. The original text is available in the current page history.

Potentially still relevant kernel notes

Note that there is not a 1:1 correlation between a Deos thread and a 653 process. Making thread blocking calls to Deos from a 653 partition will block the entire 653 partition (i.e., only the non-blocking subset of the Deos API is available for use as a 653 extension).

The following depends on several Deos Kernel modifications as described in the kernel 653 ISR priority ceiling proposal.

Each 653 partition is configured as a Deos process in the Deos Integration Tool. The main thread of that Deos process will be an ISR thread. All 653 processes execute in the context of the Deos ISR thread, and the 653 scheduler can (in user mode) perform context switching between processes simply by saving and restoring CPU registers.
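The user-mode context switch described above can be illustrated with POSIX ucontext as a stand-in for the register save/restore the 653 process scheduler would perform itself. This is a sketch of the mechanism only, not the Deos653 implementation; all names are illustrative:

```c
#define _XOPEN_SOURCE 700
#include <ucontext.h>
#include <string.h>

/* Two "contexts": the scheduler (main) and one 653 process, switched
 * entirely in user mode by saving and restoring CPU registers. */
static ucontext_t main_ctx, proc_ctx;
static char proc_stack[64 * 1024];
static char trace[64];

static void proc_body(void)
{
    strcat(trace, "proc-runs;");
    /* Yield: save this process's registers, restore the scheduler's. */
    swapcontext(&proc_ctx, &main_ctx);
}

void run_once(void)
{
    getcontext(&proc_ctx);
    proc_ctx.uc_stack.ss_sp = proc_stack;
    proc_ctx.uc_stack.ss_size = sizeof proc_stack;
    proc_ctx.uc_link = &main_ctx;
    makecontext(&proc_ctx, proc_body, 0);

    strcat(trace, "sched;");
    swapcontext(&main_ctx, &proc_ctx);  /* dispatch the process */
    strcat(trace, "sched-back;");       /* resumed after the yield */
}
```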

Control

The bulk of the 653 API implementation will be in a user mode library; the APEX data structures for a partition will be kept in that partition's address space. The one Deos ISR thread inherits virtual address space management and other HAL functionality, but there is a compromise: that thread is scheduled by Deos, yet it represents 653 processes that are scheduled by APEX.

The technical issue to solve is in getting the 653 process scheduler and the Deos thread scheduler coordinated. When a 653 partition is swapped out, its Deos ISR thread must be blocked. The control mechanisms available in the Deos PAL (where the partition interrupt handler is) include the ability to raise exceptions and to generate software interrupts.

Before a 653 ISR is given its activation software interrupt, all interrupts (except PDLA, tick, thread timer, and other critical system interrupts) are masked. 653 ISR threads have a scheduling priority higher than any other defined scheduling priority, allowing a 653 partition to truly preempt all-things-Deos.

Using the proposed solution above, the typical sequence of operation in the PAL tick interrupt handler/partition interrupt handler would be:

When a partition interrupt indicates a Deos partition is ending:

  • mask interrupts

When a partition interrupt indicates a Deos partition is starting:

  • clear & unmask interrupts (at least the thread timer interrupt should be cleared)
  • call raiseSystemTick()

When a partition interrupt indicates a 653 partition is ending:

  • raiseExceptionToThread( the ISR thread associated with the partition )

When a partition interrupt indicates a 653 partition is starting:

  • perform any platform specific processing (AIMS APEX, for example, flushed and invalidated the caches for determinism). AIMS APEX also allocated a fixed amount of time for this variable-time operation and spun in an idle loop for the worst-case duration. This architecture neither requires nor precludes such operations. Note that these operations could be located in the end-of-partition exception handler.
  • raisePlatformInterrupt() for the Deos ISR thread for this partition
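The dispatch logic above can be sketched as follows. The PAL/kernel primitives are modeled here as stubs that merely record their invocation, since the real entry points are platform specific; the names are illustrative renderings of the operations listed above:

```c
#include <string.h>

enum event { DEOS_PART_END, DEOS_PART_START, A653_PART_END, A653_PART_START };

/* Trace of stub invocations, for illustration only. */
static char trace[256];
static void record(const char *s) { strcat(trace, s); strcat(trace, ";"); }

/* Stubs standing in for the real PAL/kernel primitives. */
static void mask_interrupts(void)             { record("mask"); }
static void clear_and_unmask_interrupts(void) { record("unmask"); }
static void raise_system_tick(void)           { record("tick"); }
static void raise_exception_to_thread(void)   { record("exc"); }
static void platform_specific_entry(void)     { record("plat"); }
static void raise_platform_interrupt(void)    { record("swint"); }

/* One partition-interrupt dispatch, following the four cases above. */
void partition_interrupt_handler(enum event e)
{
    switch (e) {
    case DEOS_PART_END:   mask_interrupts();                           break;
    case DEOS_PART_START: clear_and_unmask_interrupts();
                          raise_system_tick();                         break;
    case A653_PART_END:   raise_exception_to_thread();                 break;
    case A653_PART_START: platform_specific_entry();
                          raise_platform_interrupt();                  break;
    }
}
```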

The 653 ISR thread must not block or complete-for-period via any Deos mechanism. A TBE, for example, will leave the ISR thread blocked in Deos, rather than blocked in 653 where it belongs, and it will be scheduled with its high priority by Deos, not by the partition activation table.

Slack Time

If all the processes in a partition give up the CPU prior to the end of the partition, what is to be done with that slack time? Aaron believes it can be transferred to Deos as slack.

653 API Location

The 653 scheduling state data, API, and associated code will be located in a library that partitions are linked with. This means a partition could corrupt its own 653 state. Additional implications:

  • Overhead of trapping to the PAL/supervisor mode for API calls is 0
  • Configuration data for processes and other objects in the partition needs to be file based so that it can be referenced, as already suggested.
  • Context switching between processes in the same partition can be done in user mode, and is very fast (no timers to manage, no switching VAS).

653 Library Overview

As mentioned earlier, the 653 Library is where the 653 API and most (if not all) of its implementation resides. Context switching between processes in the same partition can be done in user mode by swapping CPU registers. The run-queues, process state data, etc. for a particular partition reside in the partition's user space.

This library does not map 653 API calls to Deos calls; the primary problem with that approach is the differences in the schedulers, thread/process states, and blocking models. Instead, each 653 API call will be implemented in the 653 library. Making no Deos calls from that library is a goal (although this overall architecture relies on Deos for memory management, so if those capabilities are needed, Deos will be used).

Effort Reducers

  • There is an external entity that validates systems for 653 compliance
  • Is the 653 specification sufficient to be the bulk of the requirements?

PDR

PDR was held on 3/1/2011. Action item list is still "in work". Preliminary meeting notes are here.

PDR Action-items Meeting Prep

Monday, March 21st, Meeting 1: 653 Scheduler Issues

DDC-I required attendees: Richard & Gary

ISSUE 5 - CLOSED:

Description:

Determine when deadline misses should be evaluated.

Much discussion occurred relative to when deadline misses should be evaluated. Some of these concerns were performance related and others were related to determinism.

Response:

DDC-I has no action here.

Resolution:

It was determined that evaluation of timeout conditions (deadline misses, semaphore waits, timed waits, etc.) at scheduling points, as proposed by DDC-I, was acceptable. This is the strategy that the other OS vendors use. Richard Frost pointed out that the timeout conditions will be stored in time-ordered lists so the check for a condition will be as efficient as possible.
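A minimal sketch of such a time-ordered timeout list (illustrative only; the real implementation is internal to the 653 runtime). Insertion keeps the list sorted by expiry time, so checking for an expired condition at a scheduling point is a constant-time inspection of the head:

```c
#include <stddef.h>
#include <stdint.h>

/* One pending timeout (deadline miss, timed wait, semaphore wait, ...). */
struct timeout {
    uint64_t expiry_us;
    struct timeout *next;
};

/* Insert keeping the list sorted by ascending expiry time. */
void timeout_insert(struct timeout **head, struct timeout *t)
{
    while (*head && (*head)->expiry_us <= t->expiry_us)
        head = &(*head)->next;
    t->next = *head;
    *head = t;
}

/* Pop the earliest timeout if it has expired; O(1) head check. */
struct timeout *timeout_pop_expired(struct timeout **head, uint64_t now_us)
{
    struct timeout *t = *head;
    if (t == NULL || t->expiry_us > now_us)
        return NULL;
    *head = t->next;
    return t;
}
```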

ISSUE 7 - CLOSED:

Investigate 653 partition cold start versus warm start

Description:

Determine what can be done to differentiate cold starts and warm starts for fault response. DDC-I to look at cold vs. warm starts for faults: is there any option other than treating them as the same thing, and what are the implications? The concern is that cold starts can take a long time. What could be done to make a warm start faster than a cold start?

DDCI ACTION: Investigate Partition Warm-start vs. Cold-start. Could flash copy be eliminated from Warm-start case? (due 3/18)

Response:

We have two approaches for warm_start vs cold_start at the 653 runtime library level:

1) The 653 runtime treats them the same, but there is a different start condition the user 653 process can query to skip some logic during initialization. The 653 runtime calls the Deos restartProcess API. This is what was proposed during the PDR. If the copy from flash is not wanted, then during boot copy the kernel file system from flash to RAM, and then mark the Deos process (653 partition) as run-from-flash.

2) The 653 runtime manually reinitializes all of its global data, sets the start condition, etc., and then calls the Deos API restartThread (on the primary thread, after changing the thread's entry point to effectively be main()). The runtime will not reallocate any memory or reread the configuration file; that state is maintained and simply reset to having no allocated objects. User global data is not reinitialized. The partition then runs the initialization process with no other 653 processes defined. All handles held in global data will be invalid: when creating a process or other object, the ID returned will not match the ID from before the warm start, and using an invalid ID in an API will return the INVALID_PARAM code where an object does not exist. The application logic is free to decide what needs to be redone; global data that is still valid can be used to determine what the 653 processes do when started.

With the 2nd approach, user health monitor warm start responses must understand what RAM corruption may have occurred or other behavior that would warrant a cold start and reinitialization of global data.

Approach #2 is the current plan.
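A toy model of the handle behavior under Approach #2: if object IDs embed a restart generation (one possible encoding, purely illustrative; this is not the actual Deos653 ID format), any handle held across a warm start fails validation with INVALID_PARAM:

```c
#include <stdint.h>

/* Illustrative return codes only. */
#define INVALID_PARAM (-1)
#define NO_ERROR      0

/* Bumped on each warm start; IDs created earlier stop matching. */
static uint32_t generation;

void warm_start(void) { generation++; }

/* Create an object; the returned ID encodes the current generation. */
uint32_t create_object(uint32_t index)
{
    return (generation << 16) | (index & 0xFFFF);
}

/* An API call validates the ID's generation before use. */
int use_object(uint32_t id)
{
    if ((id >> 16) != generation)
        return INVALID_PARAM;   /* stale handle from before the restart */
    return NO_ERROR;
}
```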

Resolution:

DDC-I proposed a partition warm-start implementation that performs less initialization than the cold-start implementation (whereas GHS performs the same steps for both). This was Option 2 in the DDC-I response. The proposed warm-start implementation does not recopy partition code from flash, global data is not re-initialized, constructors are not re-run, and library entry point initialization is not performed. This implementation places considerable responsibility on the partition software to do explicit initialization. Of course, the use of the warm-start restart type is optional, and participants on the call wanted this option.

An alternative to using this partition warm-start behavior was discussed as well. This was Option 1 in the DDC-I response. This option can be implemented today with Deos (and will continue to be an option with Deos653). The entire flash file image can be copied to RAM by boot; Deos653 then operates out of RAM and runs its RMA processes and 653 partitions directly out of this “simulated flash”. With this strategy, Deos653 would be told via its platform registry to not copy executable code into RAM, but rather to execute directly from “flash”. With this implementation, all partition restarts could be “cold-starts”, since even cold-starts will be very quick without the flash copy. This option would probably work well for COMAC C919 FCS given the relatively small flash image size and the large amount of available RAM. This may still be the selected approach for C919 FCS, but those in the meeting wanted DDC-I’s Option 2 to allow more design flexibility.

DDC-I to provide Option 2 behavior for Partition warm-starts in Deos653 as proposed in the action item response document. Platforms will decide which strategy to use at a later date.

ISSUE 9 - CLOSED:

Fault logging via RAISE_APPLICATION_ERROR when no error process exists.

Description:

The default behavior of RAISE_APPLICATION_ERROR does not support fault logging, however this behavior is needed.

DDC-I ACTION: DDC-I to evaluate request and respond (due 3/18).

Response:

If an error handler process exists, it will have the responsibility of logging an error by calling the REPORT_APPLICATION_MESSAGE service. If there is no error handler process, or a fault occurs during the error handler process, the Health Monitor function will invoke a partition-level fault response.

Immediately prior to any partition-level response, the Health Monitor will invoke the LOG_HM_MESSAGE function. This is the same function REPORT_APPLICATION_MESSAGE will use to log a message. The LOG_HM_MESSAGE will be provided by the user and specified as an XML attribute in the 653 configuration files.

Resolution:

DDC-I proposed that in the absence of an error handler process, the Health Monitor would invoke LOG_HM_MESSAGE. When an error handler is present, the handler would be responsible for making that call. The solution proposed by DDC-I was accepted by the participants.

DDC-I to implement as proposed in the action item response document.

ISSUE 10 - CLOSED:

Alternative behavior required for READ_SAMPLING_MESSAGE

Description:

Green Hills provided a Honeywell specified custom behavior for the READ_SAMPLING_MESSAGE function. The default behavior provides no means to determine if data is stale or fresh. The custom behavior provides a freshness indication. This implementation is necessary for C919 FCS as well.

DDC-I ACTION: DDC-I to evaluate request and respond (due 3/18).

Response:

This issue was triggered during the PDR by a slide describing how 653 I/O behaviors are different from IOI behaviors. Specifically, it was the item that said 653 will read a stale message, where the IOI will not. In the proposed architecture, the freshness checking will be disabled in the IOI, and instead implemented within the 653 I/O library. We do understand Honeywell desires IOI-like behavior where stale messages are not read.

Does the Green Hills modification provide additional capabilities other than just freshness management?

Before answering that, there was also a discussion on the use of IOI formatting functions, and whether they could be used for anything beyond providing the variable length message protocol. The answer to that is now “yes, that will be included.” We are hoping that meets all your needs. How it works:

In the 653 XML, each port may optionally be associated with a “formatting function” in the form of an image name/function name pair (e.g., somelib.so/someFunction). If that “formatting function” is not specified, the default formatting function will be used, and that will yield the undesired 653 standard behavior. The 653 Configuration Tool will generate IOI XML, and each 653 port will be associated with either the default formatting function, or a 653 XML specified formatting function.

The default formatting function adds additional metadata to the message to support the variable length protocol (i.e., another DWORD per (IOI) message containing the message size). This formatting function will probably add a 64-bit ns timestamp as well, as the IOI timestamp storage support is limited to 32 bits. At run-time (during initialization), all formatting function images are loaded into the address spaces of the partitions that use them (see the Deos API loadLibraryDeos()). The function entry points are then found within the images and cached within IOI data structures. In this proposal, the stored function pointer will be either to the default formatting function or to the 653 XML specified formatting function. The default formatting function will be contained within the 653 I/O library itself (as it is today for NGIMA). During read/write calls, the function pointer is simply dereferenced. If you write your own “formatting function”, documentation will be provided on how to implement the required protocol found in the default formatting function. You are then free to do whatever you want in your function: custom freshness, wrapping payloads in CRCs, forking a data write to multiple devices/destinations, whatever. Formatting functions are not special in any way; they are just user mode code.

From within the formatting function, you will have access to the following data items:

  • The address of the message in shared memory, which gets you to the metadata shown below and the actual payload address
  • The user specified address to read-into or write-from
  • The freshness interval associated with the port
  • When writing data, the size of the message to write
  • The address to write a 653 return code to.

In shared memory, the metadata associated with the message is as follows (you can of course add more if you like via your formatting functions):

  • The IOI message sequence number
  • The IOI message timestamp
  • The 653 variable message length
  • The 653 64-bit message timestamp

You can use the 653 API GET_TIME() along with the data provided above to determine freshness. The default formatting function will do this, and if all you want is to avoid reading stale data, that would be a simple modification to a copy of that function plus some XML configuration.

We will provide examples of how to get formatting functions written.
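As a sketch of the freshness logic such a formatting function could apply: the real formatting-function signature is defined by the 653 I/O library, so this helper and its names are illustrative only, assuming the 64-bit ns message timestamp and per-port freshness interval described above:

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative freshness check a custom formatting function could make
 * before copying a sampled message out to the reader. Assumes now_ns is
 * at or after the message's write timestamp (both in nanoseconds). */
bool message_is_fresh(uint64_t now_ns, uint64_t msg_timestamp_ns,
                      uint64_t freshness_interval_ns)
{
    return (now_ns - msg_timestamp_ns) <= freshness_interval_ns;
}
```

A stale-skipping READ_SAMPLING_MESSAGE variant would simply decline the copy and set the appropriate return code when this check fails.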

Resolution:

DDC-I proposed allowing the use of “formatting functions” in 653 XML tables where any kind of user desired processing could be implemented (such as a copy conditional on freshness, computation of a CRC on the incoming/outgoing data, etc). These formatting functions would be specified in the form of an image name/function name pair in XML tables. Supplying a formatting function would supplant the default 653 behavior. DDC-I mentioned that the default functionality is a very small function and can be provided to the platforms team for reference along with instructions on how to create a new function. The meeting attendees accepted this approach.

DDC-I to implement formatting function approach as proposed in Action item response document.

ISSUE 11 - Spawned Follow-on Action:

Verification Tools to ease V&V burden

Description:

CVT preferred over "checker tools".

DDC-I ACTION: Investigate and provide feedback on what could be done to reduce V&V effort associated with binary output from DDC-I tools (due 3/18).

Response:

There are two strategies in use at this time: the checker-style verification tool (which ensures internal consistency and generates a text report of values), and the CVT approach. The checker style is our legacy style and is being used by all Honeywell Deos users (e.g., the regcheck tool). Honeywell Deos users running Agave or later may also be using the CVT approach on some middleware components (e.g., IOI), but still must use the checker style (e.g., regcheck) in order to assist with the verification of the Deos registry. Both approaches are valid, and it has been noted that the CVT approach does make verification activities easier. However, when quoting, planning, and scheduling for this SOW, DDC-I took the position that what was already in use was acceptable for Deos653. Therefore, while replacing checker-style tools with CVT-style tools is possible, a modification to the quote/SOW would be necessary (i.e., it is not trivial).

The “CVT approach” is illustrated below:

Discussion:

During the PDR, the overhead of hand-verifying 653 and IO related data tables was discussed. Dan Will related that this manual effort for 787 FCS was approximately eight weeks of one-time effort, with follow-on reviews limited to analyzing differences. Like Green Hills, DDC-I does not intend to produce a Configuration Verification Tool that regenerates the XML table information in order to allow simple diffs of the input and output files. Rather, they intend to provide “Checker” tools that scan the created binary images for errors that may have been induced in the translation process by the generation tools. DDC-I indicated that they could produce such tooling, but it is outside the scope of the SOW and would therefore need to be quoted separately.

Action: SWCOE and DDC-I to discuss a potential tool improvement (CVT style tool for validating Registry and 653 XML files). Goal: Determine desired functionality and obtain a quote from DDC-I. Assess perceived long-term benefits vs. cost. Due 4/15

Tuesday, March 22nd, Meeting 2: Cache/Kernel Interface Issues

Attendees: Miller, Larry; Schmidt, Brian; King, Wayne; Will, Daniel; Thompson, John; Johnson, G Craig; Saw, William; Rische, Ronald; Larson, Aaron; Roffelsen, Ryan; Kindorf, Gary; Anderson, Mark; Morin, Brent; Kimball, John; Hancock, Bill

ISSUE 3 - Spawned Follow-on Actions:

Action to confirm how L2 load and lock works

Description:

787 FCS locks portions of application code (and kernel code) in L2/Fast RAM for performance reasons. The need for this capability came up during initial negotiations and is a required capability within the SOW. A meeting between DDC-I and Honeywell is needed to revisit the topic and ensure that the planned design will meet C919 FCS (and future Honeywell program) needs.

DDCI ACTION: Section 4.4.1 in SOW requires a written technical brief on Deos653 cache management capabilities. DDC-I to provide technical brief on cache management prior to (or in response to) the above meeting (due 4/4/11)

Response:

Deos (as it exists today) provides sufficient functionality for a Deos process (653 partition) to place its code and/or data (or subset) into a physical memory range that has been locked into cache. The basic approach is as follows:

  • Have Boot/PAL/PRL lock a physical memory range outside of Deos RAM into cache.
  • Have the process of interest use the Deos "attach platform resource" functionality to map the memory into the address space.

We (DDC-I) can (and will if necessary) provide a library that can be used to re-map a process's code and/or data into the platform resource. We will not be providing the code to lock the memory into cache because that is processor/platform specific.

Resolution:

There was significant dialog between DDC-I and Honeywell with Dan, Brent and Wayne describing how L2 cache, and Fast RAM (Polaris) is used by the two CPUs on 787 FCS. Much effort has gone into optimizing the placement of code and data to achieve the best performance. The placement of specific code and data segments at specific physical addresses is accomplished via tools and does not require OS intervention.

DDC-I described the capabilities that Deos653 will provide. Each 653 partition and Deos RMA process will have an executable and one or more dynamic libraries. Any portion of the executable or a library can be relocated to a physical address that could correspond to L2-locked RAM or Fast RAM (4K resolution). The L2-locked RAM and Fast RAM address ranges will be divided into Deos platform resources that can then be assigned to an RMA process or 653 partition via the platform registry. It will also be possible to share libraries among two or more RMA processes/653 partitions, but it may require a mechanism similar to what is done today on 787 FCS (function pointer tables). DDC-I did not promise a means to lock the kernel itself in L2-locked RAM or Fast RAM; today that isn’t possible. The motive to do so would be to reduce context switch times, but since we don’t have solid context switch performance numbers yet, it isn’t known whether this will be necessary. This issue will be revisited if necessary after a prototype partition scheduler is released later this year.

Actions:
  • SWCOE to assess context switch performance of prototype partition scheduler (available Jun 2011) in order to determine if a means of locating the kernel at a specific physical address is required. Due 7/15
  • Wayne King to provide documentation on 787 L2/Fast-RAM memory use by COM/MON CPUs to DDC-I. Due 4/10
  • SWCOE to review DDC-I provided Cache white paper (due 4/4/11) and assess for compatibility with platform needs. Due 5/1


ISSUE 4 - CLOSED:

Hook issue, SKP kernel interface and how is that enabled with Deos653

Description:

Several issues were raised regarding the 787 FCS SKP partition, including its need to scrub RAM for bit errors. We need to review what this partition does and how these functions can be performed within Deos653.

Response:

The Deos kernel provides support for a feature known as a Platform Resource Library (PRL, pronounced "pearl"). A PRL is an access-controlled library that allows the user to run code in kernel mode. If you are familiar with Deos, a PRL is an alternative/replacement for the limited and potentially problematic PAL-based kernel extension. The original intent of a PRL was to allow users to write high criticality kernel mode device drivers that could be called directly by low criticality applications.

It is possible to implement a PRL that performs memory scrubbing over all of RAM. It would work something like this:

  1. disable processor interrupts
  2. disable data paging
  3. flush cache line of physical address of interest
  4. read physical address of interest
  5. write back physical address of interest
  6. flush cache line of physical address of interest
  7. enable data paging
  8. enable processor interrupts
  9. advance the physical address to the next cache line and repeat until the end of RAM is reached.
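As a rough sketch of the loop above, the following C fragment performs one scrub pass over a simulated RAM region. The privileged operations (disabling interrupts/paging, flushing cache lines) are stubbed out as no-op placeholders; none of these names are Deos APIs, and the cache line size is an assumption for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE_SIZE 32u          /* line size assumed for illustration */
#define RAM_SIZE        (4u * 1024u) /* simulated RAM region */

static uint8_t ram[RAM_SIZE];        /* stand-in for physical RAM */

/* Hypothetical stubs for the privileged operations a real PRL would perform. */
static void disable_interrupts(void) {}
static void enable_interrupts(void)  {}
static void disable_data_paging(void) {}
static void enable_data_paging(void)  {}
static void flush_cache_line(volatile uint8_t *addr) { (void)addr; }

/* One scrub pass: read and write back every cache line so EDAC/ECC
 * hardware corrects latent single-bit errors. */
static void scrub_ram(void)
{
    for (uint32_t off = 0; off < RAM_SIZE; off += CACHE_LINE_SIZE) {
        volatile uint8_t *line = &ram[off];
        disable_interrupts();
        disable_data_paging();
        flush_cache_line(line);
        for (uint32_t i = 0; i < CACHE_LINE_SIZE; i++)
            line[i] = line[i];       /* read, then write back */
        flush_cache_line(line);
        enable_data_paging();
        enable_interrupts();
    }
}
```

Keeping the interrupt-disable window to a single cache line, as in the enumerated steps, bounds the latency the scrub adds to the rest of the system.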

For more information on PRLs, see the Deos Users Guide.

Resolution:

Dan described the current functionality of the SKP partition. In summary it does the following:

  • Performs buffered BITE flash writes to NVM as directed by the BMP partition.
  • Continuous scrub of TMR RAM (entire RAM must be read/written once/hour in order to clear RAM errors)
  • Dumps Ethernet buffers for the GHS debugger and FIDO partition (level E data).

In discussing what this partition does, we concluded that the BITE flash writes were most likely done in this partition because of their high rate rather than out of any necessity to perform them in a "system" partition. Deos653 provides a couple of ways to perform the continuous scrub of RAM, so duplicating that functionality should be no problem. Unlike Integrity, under Deos user processes can directly utilize the network interface and even share it. It may be possible to combine BMP and SKP, and doing so is certainly feasible under Deos653 if that proves advantageous. In all, no issues were identified for this item.

Bonus Issue - Health Monitoring:

Health Monitoring was addressed to some degree during the PDR but details were lacking. The issue came up again today and a productive discussion ensued.

Discussion:

Dan and Brent described the Green Hills Health Monitoring mechanism, the aspects of the design that the team does not like, and the trials the team went through to get the implementation to its current state. The team dislikes the fact that shadow Health Monitor partition execution windows must be defined to overlay all user partitions so that, when a serious fault occurs, the health monitor can take over for the lame-duck partition that caused the fault and perform the required recovery action. A module reset is forced by writing to a register in the Polaris ASIC rather than by under-pulsing the watchdog hardware.

DDC-I does not intend to have a separate Health Monitor partition. Rather, it will perform platform-specific health monitoring functions via its Platform Resource Library (PRL) mechanism, which allows a design in which only registry-designated partitions can induce platform-wide actions such as a platform reset. This same mechanism will be usable by designated RMA processes, allowing a fatal error in a critical RMA process to force a platform reset while denying that capability to low-criticality processes. The attendees had no issues with the DDC-I approach, but it was decided that Honeywell should provide DDC-I with the Health Monitor document delivered to Green Hills. It may or may not be relevant given the differences between the Green Hills and Deos implementations, and it must be viewed in light of ARINC 653 specification changes in Health Monitoring functionality since the paper was written.

Actions:
  • Dan Will to provide to DDC-I the Honeywell-produced Health Monitor white paper delivered to Green Hills during the 787 FCS project. Due 4/4
  • Follow-on Action: DDC-I to evaluate the Health Monitoring paper, assess relevance and glean useful information. Due 5/1

Attendees:

  • Miller, Larry
  • Smith, Stephen
  • Schmidt, Brian
  • King, Wayne
  • Will, Daniel
  • Thompson, John
  • Johnson, G Craig
  • Rische, Ronald
  • Larson, Aaron
  • Frost, Richard
  • Roffelsen, Ryan
  • Kindorf, Gary
  • Kimball, John
  • Anderson, Mark
  • Saw, William
  • Hancock, Bill

ISSUE 1 - Spawned Actions:

Harmonic window rate issue, scheduling issues.

Description:

The proposed design allows Deos RMA threads to be scheduled during time slots when no ARINC 653 partition is active. However, for RMA thread budget guarantee purposes, the design assumes that a fixed portion of each tick period is dedicated to ARINC 653 windows. This design does not allow RMA threads to utilize budget created in frames where a smaller than worst-case 653 footprint exists. This means that full utilization of the CPU (budget-wise) would depend upon the ability to apportion ARINC 653 execution windows evenly across each Deos tick frame.

The ARINC 653 partitions that will be ported from the 787 FCS design apparently do not have a constant minor frame to minor frame footprint.

Response:

This subject generated a substantial amount of discussion during the design phase. Deos computes the available CPU time at the fastest period; then, as processes and threads are created/deleted, it subtracts from or adds to the available CPU time using a budget normalized to the fastest period. This allows the Deos kernel to quickly decide whether a process or thread can be created. In order to provide Deos RMA threads guaranteed (i.e., non-slack) access to budget created in frames where a smaller-than-worst-case 653 footprint exists, the Deos kernel would have to track the CPU quota available at given rates, which would add additional complexity to an already complex system. We (the Deos kernel team) decided to preclude this for the following reasons:

  1. Performance: At a minimum this would slow down context switches (due to budget and slack manipulations), process creation/deletion, thread creation/deletion, and mutex lock/unlock times.
  2. Complexity: The changes would be far-reaching within the scheduler, impacting some of the most complex code we have. This directly impacts development time and cost.
  3. Use complexity: This would likely complicate the jobs of the user and the platform integrator.
  4. Nothing new: In a standard 653 system, users typically load level the system. Failure to do so results in wasted CPU time. In Deos this time is available as slack time, so it is not completely wasted.
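As an illustration of the admission scheme described above, the following C sketch normalizes budgets to the fastest period so creation/deletion reduces to a single comparison and an add/subtract. The function names, microsecond units, and single global quota are assumptions for illustration, not the Deos kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPU time still unallocated within each fastest-period frame (illustrative). */
static uint32_t available_us = 1000;

/* Normalize a thread's budget to the fastest period: a thread at a slower
 * harmonic period consumes a proportionally smaller share of each frame. */
static uint32_t normalize(uint32_t budget_us, uint32_t period_us,
                          uint32_t fastest_period_us)
{
    return budget_us * fastest_period_us / period_us;
}

/* Thread creation: admit only if normalized budget fits the remaining quota. */
static bool admit_thread(uint32_t budget_us, uint32_t period_us,
                         uint32_t fastest_period_us)
{
    uint32_t n = normalize(budget_us, period_us, fastest_period_us);
    if (n > available_us)
        return false;            /* creation rejected: no CPU quota left */
    available_us -= n;
    return true;
}

/* Thread deletion: return the normalized budget to the pool. */
static void delete_thread(uint32_t budget_us, uint32_t period_us,
                          uint32_t fastest_period_us)
{
    available_us += normalize(budget_us, period_us, fastest_period_us);
}
```

Tracking quota per rate instead of as one normalized pool is exactly the extra bookkeeping the response above declines to add to the scheduler.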

If it is not possible to load level the 653 threads, there are some possible workarounds:

  • Load level your system using non-653 threads. The Window Activation Table can be used to schedule non-653 threads in the holes.
  • Use the unused 653 time as "guaranteed slack". If you know that at a given rate the 653 timeline will generate slack, you can design a system that enables that time to be used by high-criticality apps.

Discussion:

The initial DDC-I response pointed out the difficulty and complexity associated with providing the desired behavior (RMA scheduling across a timeline with differing amounts of reserved 653 time in each minor frame). The response also offered some alternative approaches to use blocks of time in underutilized minor frames.

Since the C919 FCS software architecture has not been established, it is uncertain whether the limitation is a big issue. Some opportunities were identified to even out differences from minor frame to minor frame. The BMP partition is an offender in that regard and could perhaps be spread across frames in smaller windows, or it could possibly be converted into an RMA process instead. However, considerable IO uncertainty exists since C919 FCS IO will differ from 787 FCS IO. Without RTOS support for fully utilizing uneven minor frames, good utilization of the CPU could thus hinge upon the ability to spread IO partition windows evenly across minor frames. The success of such an effort is difficult to predict at this time.

The decision was made to task the Platforms team with studying the feasibility of addressing the frame-to-frame unevenness caused by the BMP via one of several possible strategies, keeping in mind the required hardware-centric functionality, such as the Flash writes it performs. Combining BMP and SKP functionality was also discussed as a possibility in an earlier meeting and could be impacted by this issue. It is understood that the Platforms team will not be able to do a full assessment at this time given the lack of IO detail.

As a fall-back strategy in case minor-frame variation can't be scheduled out, Wayne suggested that DDC-I provide a ROM estimate of the cost of providing RMA thread scheduling that can utilize all non-653 partition time across minor frames.

Actions:
  • Platforms Team to study SKP/BMP design alternatives which could yield even utilization across minor frames. Due 4/29
  • DDC-I to provide ROM cost estimate to provide the capability to fully utilize non-653 partition time within each minor frame. Due 4/15



ISSUE 6 - Spawned Actions:

RMA thread execution during unused 653 execution window time could lead to unexpected behavior.

Description:

While the ability for RMA threads to utilize slack time created during a partition execution window is often desirable, it could also allow RMA threads to execute at unexpected times, which could break ordering assumptions between 653 and RMA elements.

Response:

This issue also came up during the design, and the conclusion was that if there is an ordering dependency between 653 and RMA elements, the RMA element can no longer be considered a strictly RMA element (i.e., it can no longer be scheduled by RMA rules); instead, the RMA element must be scheduled via the Window Activation Table, like 653 elements.

Discussion:

The initial DDC-I response to this item was deemed unsatisfactory. Considerable time was spent better describing the challenges. A couple of potential solutions were proposed by Honeywell, including:

  • An attribute on each 653 activation window that would allow (or preclude) RMA scheduling within that time slot. With this approach, RMA scheduling could be constrained not to occur within time slots associated with a key partition or partitions. The kernel would spin rather than give the time to an RMA thread if the partition did not use the time.
  • Schedule-before relationships between RMA threads and 653 partition activations. This proposal would provide more fine-grained control but DDC-I expressed concern over the complexity of implementing such a mechanism since the planned implementation does not allow a 653 partition activation window to be involved in a schedule-before relationship.

DDC-I was asked to reexamine the issue and provide a proposal that addresses the concerns.

Actions:
  • DDC-I to provide a new proposal for a means to control/constrain RMA thread execution in relation to a 653 partition execution window. Due 4/29

CDR

CDR was held on 10/3/2011. See Larry's meeting notes and Action item list.

The following are DDC-I's action items:

Issue 1: Full Use of non-653 Reserved Time by RMA Threads

  • Assignee: Ryan
  • Due Nov 28, 2011

DDC-I ACTION: Provide a ROM estimate of cost to provide a scheduling solution that can take full advantage of time unused by ARINC 653 windows.

Issue 4: ARINC 653 Main Process Stack Size

  • Assignee: Gary
  • Due: Nov 18, 2011

DDC-I ACTION: Provide a proposal for a means to allow a partition developer to specify the required main process stack size.

The XML associated with deos653config tool version 1.1.0 contains an attribute allowing the main process stack size to be specified.

Issue 5: Determine ARINC 653 process execution times.

  • Assignee: Richard
  • Due: Feb 6, 2011

DDC-I ACTION: Provide information on how 653 process execution times will be conveyed to Deos653 users.

Issue 10: No stack underflow/overflow mechanism in place

  • Assignee: Richard
  • Response Due: Nov 18, 2011

DDC-I ACTION: Provide plan on how 653 process stack overflows can be trapped.

  • Response Sent: Nov 17, 2011

DDC-I RESPONSE: Enclosed is the response for CDR action item #10, related to stack overflow handling in Deos 653.

Status Monitor Support: The status monitor will be updated to show 653 process stack utilization, similar to Deos thread stack usage. This is accomplished by the status monitor knowing where process stacks start and their length. The status monitor can read backwards looking for the pattern that was written at a specified interval. The first location where the pattern is not present is the high-water mark of the stack, which will be reported.

Stack initialization occurs in the kernel during partition initialization according to the kernelAttribute stackTagIntervalInDWORDs. This includes initialization of the TLS, which is used by the runtime for 653 process stacks.
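A minimal C sketch of the high-water-mark scan described above, using a simulated downward-growing stack. The tag value, interval, and function names are illustrative assumptions; the real tag interval comes from stackTagIntervalInDWORDs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS  256u
#define STACK_TAG    0xDEADBEEFu  /* illustrative tag, not the Deos value */
#define TAG_INTERVAL 4u           /* cf. stackTagIntervalInDWORDs */

/* Simulated 653 process stack; use grows downward from the highest index. */
static uint32_t stack[STACK_WORDS];

/* Kernel-side initialization: write the tag at the configured interval. */
static void tag_stack(void)
{
    for (size_t i = 0; i < STACK_WORDS; i += TAG_INTERVAL)
        stack[i] = STACK_TAG;
}

/* Status-monitor style scan: walk from the deep (low-address) end toward
 * the top; the first overwritten tag marks the deepest point the stack
 * reached. Returns the high-water mark in words (to tag-interval
 * resolution), or 0 if no tag was disturbed. */
static size_t high_water_mark(void)
{
    for (size_t i = 0; i < STACK_WORDS; i += TAG_INTERVAL)
        if (stack[i] != STACK_TAG)
            return STACK_WORDS - i;
    return 0;
}
```

Note that the reported mark is only as precise as the tag interval, which is the trade-off the stackTagIntervalInDWORDs attribute exposes.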


Health Monitor - STACK_OVERFLOW Exception: In order for the Health Monitor to detect stack overflow, a gap must exist between 653 process stacks which is filled with a known pattern. The size of the gap will be configurable, with zero indicating no gap; in that case, overflow will not be caught by the 653 runtime library. When a gap is present, another configurable setting will indicate how frequently to write the pattern in the gap. The more frequent the pattern and the larger the gap, the more likely an errant write past the end of the stack will be caught.

During every 653 process context switch, the runtime will evaluate each location in the gap that should have a pattern to ensure that the pattern exists. If any location does not match, a stack overflow has occurred. Therefore, the larger the gap and the more frequent the pattern, the more locations must be compared, slowing down 653 process context switch times.

The configuration of the gap size and pattern frequency is controlled in the 653 configuration file, and is set on a per partition basis.
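The gap check performed at context switch time might look like the following C sketch; the pattern value, gap size, frequency, and function names are illustrative assumptions, not the Deos653 configuration schema.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GAP_WORDS    16u
#define GAP_PATTERN  0xCAFEF00Du  /* illustrative pattern value */
#define PATTERN_FREQ 2u           /* write the pattern every 2nd word */

/* Simulated gap between two adjacent 653 process stacks. */
static uint32_t gap[GAP_WORDS];

/* Partition-initialization-time fill of the gap at the configured frequency. */
static void fill_gap(void)
{
    for (size_t i = 0; i < GAP_WORDS; i += PATTERN_FREQ)
        gap[i] = GAP_PATTERN;
}

/* Per-context-switch check: if any patterned location was overwritten, a
 * stack overflow has occurred and the Health Monitor would be notified
 * with STACK_OVERFLOW. */
static bool gap_intact(void)
{
    for (size_t i = 0; i < GAP_WORDS; i += PATTERN_FREQ)
        if (gap[i] != GAP_PATTERN)
            return false;
    return true;
}
```

The loop makes the cost trade-off in the response concrete: the number of comparisons per context switch is the gap size divided by the pattern frequency.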

Issue 12: Verf of IOI PIPC buffer

  • Assignee: Gary
  • Due Nov. 12, 2011

DDC-I ACTION: Provide a ROM quote for V&V of the PIPC buffering model for IOI. Also make sure Honeywell fully understands that 653 data will be modeled with the ring buffer. Although the IOI will allow each produced item to declare the type of buffer it will use, any consumer of a data item must use the producer-selected buffer (that happens implicitly). So although the PIPC buffer could be provided, it could not be used to interface to any 653 data; it would be there for Deos-app-to-Deos-app purposes only. Make this clear first, then find out if Honeywell is still interested.
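As background on why a ring buffer pairs naturally with the one-reader/one-writer queuing-port restriction noted elsewhere on this page, here is a minimal single-producer/single-consumer ring in C. This is illustrative only, not the IOI implementation: with exactly one writer and one reader, each index is updated by only one side, so no trusted intermediary is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_CAPACITY 8u   /* power of two keeps wraparound cheap */

typedef struct {
    uint32_t buf[RING_CAPACITY];
    volatile uint32_t head;  /* written only by the producer */
    volatile uint32_t tail;  /* written only by the consumer */
} ring_t;

/* Producer side: returns false when the ring is full. */
static bool ring_put(ring_t *r, uint32_t v)
{
    if (r->head - r->tail == RING_CAPACITY)
        return false;
    r->buf[r->head % RING_CAPACITY] = v;
    r->head++;
    return true;
}

/* Consumer side: returns false when the ring is empty. */
static bool ring_get(ring_t *r, uint32_t *v)
{
    if (r->head == r->tail)
        return false;
    *v = r->buf[r->tail % RING_CAPACITY];
    r->tail++;
    return true;
}
```

Supporting multiple readers or writers would require arbitrating these index updates through a trusted intermediary such as the kernel, which is the overhead the Deos653 queuing-port restriction avoids.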

Issue 13: RMA Thread Execution during Unused ARINC 653 reserved time

  • Assignee: Aaron.
  • Due: Nov. 12, 2011

DDC-I ACTION: Investigate using the RMA Thread Slack Enable attribute to enable/disable utilization of unused ARINC 653 window time.

653 Working Group Activity

20110514-Paris

The primary focus of the meeting was a review of draft 7 of Part 4. Draft 8 was introduced as a result of that meeting; that draft can be found in the working group material here.

Actions:

  • Part 3 is dedicated to testing 653 systems. The test team should reference this Part.
  • Part 1 Supplement 3 Appendix D contains a number of APEX range restrictions (priority, for example, is 1..239). I don't think we were aware of this before the meeting. In the current implementation, the ranges are explicitly expressed in the XML. PCR:7047 has been written to capture this action.
  • Support a distinction in the configuration tooling for Part 1 vs Part 4 to enable the generation of warnings for Part 4 configurations that are not compatible with Part 1. See PCR:7048