Kernel 64

From DDCIDeos

Tasks

64-bit tasks for the kernel, up to the point where Deos is counting (i.e., the idle process is running). For other components, see Deos_64-bit_Port#Component_Status.

Task | Priority | Assignee | Status | Remarks
kernel - Get updated manuals | 1 | RLR | Done | ELF spec, ARM ARM
qemu-vm | 4 | | Done | Enable 64-bit gdb interface for at least aarch64 and x86_64
kernel - aarch64 core register requirements | 2 | AL | Done-ish |
kernel - aarch64 image requirements | 2 | RLR | Done-ish | This is done enough for now. There are several TODOs that will need to be addressed as we gain more 64-bit knowledge, notably after we figure out what to do with TLS.
kernel - x86_64 core register requirements | 10 | AL | In Work |
kernel - x86_64 image requirements | 10 | RLR | Done-ish | This is done enough for now. There are several TODOs that will need to be addressed as we gain more 64-bit knowledge, notably after we figure out what to do with %fs. -fno-stack-protector was added to build-utils and OpenArbor as a temporary workaround.
64-bit compilers | 3 | | Done | aarch64, powerpc64, and x86_64 already built for both Linux and Cygwin.
Build utils updates for 64-bit compiles | 3 | AL | Done | Sufficiently complete to start development
qemu-aarch64 boot/pal/config | 4 | RLR | Done |
compiler startup-aarch64 | 4 | RLR | In Work | Add support for single step.
qemu-x86_64 boot/pal/config | 14 | RLR | Done-ish |
compiler startup-x86_64 | 14 | RLR | Done |
aarch64 bsp-devkit | 4 | RLR | Done |
hypstart aarch64 | 7 | | Done | Add support for 64-bit
hypstart x86_64 | 14 | RLR | Done |
debug_library aarch64 | 7 | RLR | Done-ish | The library builds but is completely untested.
debug_library x86_64 | 14 | | In Work |
integ-tool/makereg | 7 | GK | Done | size_t. Does this include a change to offsets?
kernel - update VAS to 39 bits | 8 | RLR | Done-ish | Iterator mostly done and ported to all targets. TODO: documentation, resolve TODO items.
kernel - aarch64 context switch code | 6 | AL | Done |
kernel - x86_64 context switch code | 16 | AL | In Work |
kernel - aarch64 exception handlers | 6 | AL | Done-ish | Done except for ABC instrumentation
kernel - x86_64 exception handlers | 16 | AL | In Work |
kernel - aarch64 startup | 5 | AL | Done-ish |
kernel - x86_64 startup | 14 | AL | In Work |
kernel - aarch64 other HAL | 6 | AL | Done-ish |
kernel - x86_64 other HAL | 16 | | In Work | Perhaps get the KVM accelerator working to test this out? Otherwise FP crashes the kernel on QEMU.
kernel - aarch64 image load | 7 | RLR | Done-ish | This is done enough for now. There are several TODOs that will need to be addressed as we gain more 64-bit knowledge.
kernel - x86_64 image load | 12 | RLR | Done-ish | This is done enough for now. There are several TODOs that will need to be addressed as we gain more 64-bit knowledge.
desk-python-tools | 4 | | Done | Needed to support hypstart

Priorities

  •  0-9: aarch64
  • 10-20: x86_64

Be a maintainer

From a kismet distribution:

  1. Install the unreleased version of:
    1. gdb-cross-debuggers (gdb-11)

Uncategorized

Proposed solutions to BTree limitation PCR:14792

  1. Make names statically specified in the registry; change binding to be via CAS assignment.
    • - backward compatibility issues; likely requires changes to several (many?) other components
    • + Eliminates stack and btree analysis
    • + Eliminates a large Level 3 lock
  2. Expose numericName APIs (i.e., ones that don't DEOS_hash() their argument).
    • This would permit applications to specify numeric values, e.g., via linguistic enums, or other mechanisms (range of integers) that would avoid the need to compose unique strings that might be difficult to statically assign.
    • It is not clear this is a viable/complete alternative, but it does introduce the potential for apps to move the search to compile time in some cases.
  3. Introduce a new "name state": PresentButNotBound.
    • On entry into a create API, first add the name (same btree as today) to the namespace(s), in state PresentButNotBound
    • Create the object, passing in the name, which will be bound to refer to the newly created object via an atomic assignment.
    • the getHandle functions will treat PresentButNotBound as "not present".
    • RLR: implied is the process and SMO name spaces would be protected by two new level 2 locks.
    • - Introduces new object creation race condition (object create fail + namespace bind fail are now independent).
    • + Other than the new race condition, backward compatible; i.e., would not affect other components.
    • - The namespace critical would be retained as a level 2 lock (in the case of Process, a pair of independent ones, most likely with a new race condition).
    • - btree analysis is still required, but stack portion likely simplified since namespace add/delete would be closer to KFunction.
    • - need some way to bound the upper limit on the number of process names/aliases. Currently there is nothing that prevents each process from defining (2^16 - 1) process aliases.
    • TLBs appear to be a significant factor. Might need some way to minimize them.
  4. Variant of "new name state": Require all aliases to be specified in the registry.
  5. A variant that eliminates the second process name BTree lookup but has fewer backward compatibility concerns.
    • - Probably introduces new failure cases (alias used by another process).
    • - Probably would still need a separate critical.
  6. Kernel shutdown hook. Many notes on BSP_Support_PhysicalAlloc.
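Option 3 (the PresentButNotBound name state) can be sketched roughly as follows. This is a minimal user-space illustration under stated assumptions: a std::map stands in for the kernel btree, a std::mutex for the proposed level 2 lock, and Namespace, NameEntry, and KObject are hypothetical names, not Deos types. It shows the three steps: reserve the name, create the object, then bind it with a single atomic store, with getHandle treating an unbound entry as not present.

```cpp
#include <atomic>
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Hypothetical kernel object; stands in for a Deos process or SMO.
struct KObject { int id; };

// A name is PresentButNotBound while object == nullptr; binding is one
// atomic store, so readers never see a half-initialized binding.
struct NameEntry {
  std::atomic<KObject*> object{nullptr};
};

class Namespace {
 public:
  // Step 1: reserve the name. Returns nullptr if the name already exists
  // (bound or not), which rejects duplicates before the object is created.
  NameEntry* addPresentButNotBound(const std::string& name) {
    std::lock_guard<std::mutex> guard(lock_);
    auto [it, inserted] = entries_.try_emplace(name);
    return inserted ? &it->second : nullptr;
  }

  // Step 2 (after the object was created): bind the reserved name.
  void bind(NameEntry* entry, KObject* obj) {
    entry->object.store(obj, std::memory_order_release);
  }

  // Undo a reservation, e.g., when object creation fails.
  void remove(const std::string& name) {
    std::lock_guard<std::mutex> guard(lock_);
    entries_.erase(name);
  }

  // getHandle: a PresentButNotBound entry is reported as "not present".
  KObject* getHandle(const std::string& name) {
    std::lock_guard<std::mutex> guard(lock_);
    auto it = entries_.find(name);
    if (it == entries_.end()) return nullptr;
    return it->second.object.load(std::memory_order_acquire);
  }

 private:
  std::mutex lock_;                           // stands in for the level 2 lock
  std::map<std::string, NameEntry> entries_;  // stands in for the btree
};
```

If object creation fails, the caller must call remove() to drop the reservation; that separate cleanup path is the independent failure case (the new race condition) noted in the list.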

Make it easier for hypdump to find the PVA.

  1. Currently hypdump has to search for the PVA, and the new kfs-cvt tests can introduce errors that cause the search to fail. See the kernel maintainers' post for more info.
  2. Suggestions:
    1. Put the PVA in an ELF section/segment.
  1. Have the kernel pass the PAL a struct of PPI functions, and have the PAL call the kernel via these pointers. This would facilitate layered PAL implementations and eliminate (or at least simplify) the PAL interceptor logic during testing.
  2. Add a DEOSABIAPI symbol that resolves to the normal ABI conventions for the target, e.g., CDECL
    1. Lots of libs create their own symbol, e.g., SAL API, all needlessly.
    2. At this point the base api and *foo*API symbols are still used for auto-generating the list of exported symbols.
  3. Add PPI functions to get properties of platform resources.
    1. Helpful for PRL registration functions, and perhaps even PAL (e.g., to get GIC address).
  4. Make all PRC_* tags be public symbols so they get an xref-alternate which could then be used by the CVTs.
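A minimal sketch of the PPI function table idea (the first item above), assuming a table of plain function pointers. PpiTable, Pal, and the member names log and reportError are hypothetical; the actual set of PPI functions is not listed on this page. Because the PAL reaches the kernel only through the table it is handed, a test can pass a table of fakes instead of relying on interceptor logic.

```cpp
#include <cassert>

// Hypothetical table of PPI (kernel) functions passed to the PAL at startup.
struct PpiTable {
  void (*log)(const char* msg);
  void (*reportError)(int code);
};

// A PAL that calls the kernel only through the table it was given.
class Pal {
 public:
  void init(const PpiTable* table) {
    assert(table != nullptr);
    ppi_ = table;
  }
  void trace(const char* msg) const { ppi_->log(msg); }
  void fail(int code) const { ppi_->reportError(code); }

 private:
  const PpiTable* ppi_ = nullptr;
};

// Test doubles: a test hands the PAL these instead of the kernel's table.
int logCalls = 0;
int lastErrorCode = 0;
void fakeLog(const char*) { ++logCalls; }
void fakeReportError(int code) { lastErrorCode = code; }
```

A layered PAL could work the same way: an upper layer keeps the table it received and passes its own table (possibly wrapping some entries) to the layer below.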

TODO

Non-backward-compatible changes we wish to make to the kernel before the first release:

  • Deprecate virtualReserve(), get/setNextLibraryStartAddress(), and others.
  • Make the paging attribute interpreter macros into functions.
  • Change the system tick user interface to a 64-bit value? For 64-bit only. Make it a size_t?
  • Remove addresses from registries (make them relocatable). Perhaps we can deprecate the APIs?
    • Need to identify the impacted APIs. High priority to fix these APIs. Low priority to fix the registry.
    • May need to punt.
  • Check the "length" of things (API parameters) to make sure they are 64-bit, e.g., platform resources are currently limited to 32 bits.
  • (Partly done, probably defer for kismet) Re-layout the VAS.
  • Make ELF symbol resolution follow the ABI? x86 (at a minimum) needs to continue to support the current behavior.
  • Address issues related to gnu-language
    1. linguistic TLS for executables can create
      1. COPY relocations
        • COPY relocations for executables could be supported if ELF symbol resolution followed the spec. Not possible for shared object images.
      2. References to segment registers
      3. Assumptions that the TLS for threads in the .exe's module are initialized (I think).
    2. References to vtable entries from gnu-language's operator new classes can cause
      1. COPY relocations if the executable is not linked -fPIC (or presumably -fpic).
  • Many ARM processors don't support trapped floating point exceptions, and users would like some way to detect FPU errors without writing special code. Proposed "solutions" are:
    1. Add ARM/AArch64 FPU error detection to the FPU context switch code. This is not a perfect solution, but perhaps it is "good enough".
    2. Other?
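Option 1 might look roughly like the sketch below. On AArch64 the cumulative exception bits in FPSR (IOC, DZC, OFC, UFC, IXC, IDC) are set when an untrapped exception occurs, so the FPU context switch could harvest and clear them per thread. The sketch uses the standard <cfenv> calls as a portable stand-in for reading and clearing FPSR directly; onFpuContextSave and FpuErrorRecord are hypothetical names.

```cpp
#include <cassert>
#include <cfenv>

// Per-thread record of sticky FP exception flags seen so far.
struct FpuErrorRecord {
  int flags = 0;  // bitwise OR of FE_* values observed for this thread
};

// Hypothetical hook run when the outgoing thread's FPU context is saved:
// record any accumulated exception flags, then clear them so the next
// thread starts with a clean status.
void onFpuContextSave(FpuErrorRecord& rec) {
  rec.flags |= std::fetestexcept(FE_ALL_EXCEPT);
  std::feclearexcept(FE_ALL_EXCEPT);
}

bool hadDivideByZero(const FpuErrorRecord& rec) {
  return (rec.flags & FE_DIVBYZERO) != 0;
}
```

The record could then be exposed to the application (or checked by a health monitor) without the application having to poll the status register itself.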

Defer

  • Do we want to add 32-bit read functions for (unattached) platform resources and files?
    • Perhaps add some support routines/structs, e.g., something that extracts indexed 8, 16, 32-bit values from a 64-bit value?
  • read/write controls on kernel mode VAS accesses
  • VAS no execute access?
  • switch object access rights to POSIX like UGO access?
  • Stop having the ARM PAL provide readTimeStamp(), or start having the PPC and x86 PALs provide it?
  • Remove required pad pool?
  • Stop measuring the TSC rate at startup?
  • Other?
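For the first item, a helper that extracts an indexed 8-, 16-, or 32-bit value from a 64-bit value could be as small as the sketch below. extractIndexed is a hypothetical name, and the sketch assumes index 0 denotes the least significant field of the value; if fields are instead defined by memory order, big-endian targets would need the index reversed.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Return the index-th T-sized field of a 64-bit value (index 0 = least
// significant). T must be an unsigned type narrower than 64 bits.
template <typename T>
T extractIndexed(std::uint64_t value, unsigned index) {
  static_assert(std::is_unsigned<T>::value && sizeof(T) < sizeof(std::uint64_t),
                "T must be an unsigned type narrower than 64 bits");
  constexpr unsigned bits = 8 * sizeof(T);
  assert(index < 64 / bits);  // index must name a field inside the value
  return static_cast<T>(value >> (index * bits));
}
```

For example, extractIndexed<std::uint16_t>(0x1122334455667788, 1) yields 0x5566.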

Recently Completed

  • DONE. remove virtualReserve(), suggest use of adjustNextLibraryStartAddress() instead. x86_64 and aarch64 only.
  • DONE. Minimize the number of APIs that return pointers into kernel space
  • DONE. Remove align{1,2,4,8}.h, unalign.h, pack.h, unpack.h
  • DONE. Fix virtualQueryEx() performance
  • DONE. Remove deprecated APIs.
  • DONE. Remove deprecated types
  • DONE. Remove Produce/consume APIs.

Out

  • Remove the word swap of PPC page table entries.
  • Remove PE support - Note: PE support only on x86, x86_64 does not have PE support.
  • Remove KERNEL_DLL address range.
    • Requires all kernel images to be relocatable
      • interceptor PAL issues result (interceptor needs to find the rpal).
  • make object file record part of file's data segment
  • change x86 PAL to kernel interrupt interface to match ppc and arm?

Remove level 3 locks proposal

How to remove the last of the level 3 locks:

_publicSlackBudgetVector:

  • Make the public slack buvec per core.
  • During the window switch, loop over all cores, merging each core's public slack buvec into the private slack buvec.
    • Note: this merge is already done for the public slack buvec, so the execution time only increases by (number of cores - 1) merges.

memory pools:

  • In each process, track all returned pages instead of returning them to the global pool.
  • On allocation:
 enter critical
 lock process
 if ramQuota > 1:
   ramQuota--
   if numReturnedPages > 0:
     numReturnedPages--
     retVal = returnedPagesList.pop()
     unlock process
     exit critical
     return retVal
   else:
     unlock process
     exit critical
     enter critical
     lock memoryPool 
     retVal = memoryPoolPagesList.pop()
     unlock memoryPool
     exit critical
     return retVal
 else:
   unlock process
   exit critical
   return 0
  • On process deletion, merge the returnedPagesList into the memoryPoolPagesList.
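The allocation path above can be sketched in C++ roughly as follows. The sketch assumes std::mutex for the process and memoryPool locks and omits the enter/exit critical steps, which have no user-space equivalent; it keeps the pseudocode's ramQuota > 1 test as written. Two behaviors are assumptions not stated in the pseudocode: returning a page restores one unit of quota, and an empty global pool gives the quota back and returns 0. The point it illustrates is that the two locks are only ever taken one at a time, never nested.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

using PhysPage = std::uintptr_t;  // 0 means "no page", as in the pseudocode

struct MemoryPool {
  std::mutex lock;
  std::vector<PhysPage> pages;  // memoryPoolPagesList
};

struct Process {
  std::mutex lock;
  unsigned ramQuota = 0;
  std::vector<PhysPage> returnedPages;  // returnedPagesList

  PhysPage allocate(MemoryPool& pool) {
    {
      std::lock_guard<std::mutex> guard(lock);
      if (ramQuota <= 1) return 0;  // pseudocode proceeds only if ramQuota > 1
      --ramQuota;
      if (!returnedPages.empty()) {  // fast path: reuse a page this process returned
        PhysPage page = returnedPages.back();
        returnedPages.pop_back();
        return page;
      }
    }  // process lock released before the pool lock is taken
    PhysPage page = 0;
    {
      std::lock_guard<std::mutex> guard(pool.lock);
      if (!pool.pages.empty()) {
        page = pool.pages.back();
        pool.pages.pop_back();
      }
    }
    if (page == 0) {  // assumption: give the quota back if the pool was empty
      std::lock_guard<std::mutex> guard(lock);
      ++ramQuota;
    }
    return page;
  }

  // Assumption: a returned page stays with the process and restores quota.
  void returnPage(PhysPage page) {
    std::lock_guard<std::mutex> guard(lock);
    returnedPages.push_back(page);
    ++ramQuota;
  }

  // Process deletion: move returnedPagesList into memoryPoolPagesList,
  // taking the process lock and the pool lock one after the other.
  void destroy(MemoryPool& pool) {
    std::vector<PhysPage> pages;
    {
      std::lock_guard<std::mutex> guard(lock);
      pages.swap(returnedPages);
    }
    std::lock_guard<std::mutex> guard(pool.lock);
    pool.pages.insert(pool.pages.end(), pages.begin(), pages.end());
  }
};
```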

system lock

See PCR 14867 comment #6

VAS Iterator Proposal

Proposal for supporting VAS with more than two levels of paging structure.

Call all the paging tables used by a VAS, including the page directory table and all page tables, the VPTs (Virtual Paging Tables).

The VPTs, in conjunction with the Virtual Memory Register and the VPs, comprise the VAS Paging Structures (VPS). Each thing in the VPS is assigned a level. The legal levels extend both above (the VM register) and below (the pages) the levels of the VPTs. Here is an example VPS with 3 levels of VPTs:

                                     VPTs
                           -----------------------------
               Virtual     Page            
               Memory      Directory               Page
               Register    Table          ...      Table   Page
VPS Level   :  4           3=N            2        1       0
Level Names :  VMRLevel    PDTLevel                PTLevel PageLevel

There are no VPTs at VMRLevel or PageLevel; these two levels exist only to handle end conditions. NotPresentLevel is an alias for VMRLevel.

The PDT is always pointed to by the VM register. The level 1 VPT is always a page table. PDTLevel is always N (i.e., the number of levels between PDTLevel and PTLevel can vary).

A VPS iterator iterates over the VPTs in a VAS.

 Constructor   VPS(VAS, page range) // start+len, or start,end


The terms "lowest level" and "highest level" are qualified by various properties. For example:

present entry
    The processor-specific "valid bit" is set, and the entry refers to a
    physical page (either a VP or a VPT).
lowestLevelPresent()
    The lowest level for which a VPT has a present entry: PTLevel (i.e., 1)
    if the page is present, VMRLevel if the PDT doesn't have a present entry.
    Should consider changing the definition to "lowest level where a paging structure is present".
highestLevelLastEntry()
    The highest level for which the current index is the last entry of the VPT and
    all lower levels are also at their last entry.
    E.g., 0 if the current index is less than 511, 1 if the current index = 511,
    2 if index = (512*512)-1.
stepLevel(level)
    Step to the next entry at the given level. level=PTLevel advances by 1 page,
    level=2 advances 512 pages, etc. Advancing at a level greater
    than level 1 zeroes the lower-level indexes.
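The index computations behind these definitions can be checked with a small sketch, assuming 4 KiB pages and 512-entry VPTs (9 index bits per level), as on aarch64 and x86_64 with a 4 KiB granule. The functions mirror the terms above but are free functions written for illustration, not the iterator's actual interface. pdtLevelFor() also shows why a 39-bit VAS gives PDTLevel = 3: (39 - 12) / 9 = 3.

```cpp
#include <cassert>
#include <cstddef>

using VirtualPageIndex = std::size_t;  // virtual address >> 12

constexpr unsigned kPageShift = 12;    // 4 KiB pages
constexpr unsigned kBitsPerLevel = 9;  // 512 entries per VPT
constexpr VirtualPageIndex kEntryMask = (VirtualPageIndex(1) << kBitsPerLevel) - 1;

// Number of VPT levels (i.e., PDTLevel) needed for a VAS of vaBits bits.
constexpr unsigned pdtLevelFor(unsigned vaBits) {
  return (vaBits - kPageShift + kBitsPerLevel - 1) / kBitsPerLevel;
}

// Index of the entry used at VPT level `level` (PTLevel = 1) for page vpi.
constexpr VirtualPageIndex indexAtLevel(VirtualPageIndex vpi, unsigned level) {
  return (vpi >> (kBitsPerLevel * (level - 1))) & kEntryMask;
}

// Next entry at `level`: advance 512^(level-1) pages and zero lower indexes.
constexpr VirtualPageIndex stepLevel(VirtualPageIndex vpi, unsigned level) {
  const VirtualPageIndex span = VirtualPageIndex(1) << (kBitsPerLevel * (level - 1));
  return (vpi + span) & ~(span - 1);
}

// Highest level L such that vpi is at the last entry of its VPT at every
// level 1..L; 0 if it is not the last entry of its page table.
unsigned highestLevelLastEntry(VirtualPageIndex vpi, unsigned pdtLevel) {
  unsigned level = 0;
  while (level < pdtLevel && indexAtLevel(vpi, level + 1) == kEntryMask) ++level;
  return level;
}
```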

Iterator Pseudocode

typedef size_t virtualPageIndex; // virtual addr >> 12

class it
{
  portals[PDTLevel]; // Could be "-1" since don't need an entry for "Page" level.
  // The entries at each level that were used to open the portal.
  _entries[PDTLevel]; // same comment as for portals.

  // end is first page not to be "visited".
  // todo rename _current,_start,_end to something like _current_vpi
  virtualPageIndex _current, _start, _end;
  unsigned _lowestLevelPresent;
};

portal get, open, slide, close, free;

// protocol (functions that can be called)

VPSIterator vpi(range, vas)
{
  get all portals;
  open all portals starting at PDT level going down until a not present entry;
  not present levels are opened to some present page, e.g., the pdt;
  _lowestLevelPresent set accordingly;
  start,end=range;
  current = start;
}

// Is called with lock held
// postcondition: portals are "open appropriately"
unsigned lowestLevelPresent(out level, out entry)
{
  if (_lowestLevelPresent > PTLevel)
  {
    // start at _lowestLevelPresent and slide portals to any newly allocated pages
    // update _lowestLevelPresent
  }
  return _lowestLevelPresent, entry at _lowestLevelPresent;
}

// Preconditions:
//  - 0 < level <= PDTLevel(?)
//  - level >= _lowestLevelPresent-1
stepLevel(level)
{
  // increment _current
  old_current = _current;
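  // Step one entry at the given level and zero the lower-level index bits,
  // i.e., advance 512^(level-1) pages from the start of the current entry.
  _current = (_current + (size_t(1) << (9*(level-1)))) & ~((size_t(1) << (9*(level-1))) - 1);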

  // let lev be the highest level where the index in old and new_current differs
  updatePortalsAtAndBelow(lev)
}

private function updatePortalsAtAndBelow(level)
{
  for (L in level to PTLevel step -1)
  {
    if present
    {
      // slide portal at level L to the present page
    }
    else { set _lowestLevelPresent and break }
  }
}

// entry at level+1 must be present
// assumes portals are open appropriately
// Caller must ensure VPTs stability and has permission (e.g., ownership).
// this may be (practically) restricted to level=PTLevel.
updateVPSEntry(level, newEntry)
{ 
  update entry;
  // maintain portals if level is > PTLevel (may not be possible/necessary).
  // alternate make it user's responsibility.
}

installPageIfNoChange(level, newEntry, newPage)
{
  // Perhaps change all VAS locking to use CAS?
  get lock/critical; // or use CAS?
  if (CAS(entryAddress(level), old=entryAtLevel(level), new=newEntry) fails)
  {
    return_page(newPage);
  }
  updatePortalsAtAndBelow(level); // perhaps this becomes public
}

cachedEntryAtLevel(level); //??
entryAtLevel(level)
{
  // Perhaps this function goes away on use of CAS?
  // Don't use locally cached entries
  return _entries[...];
}

addrRangeSpansVPS(level);// just an index computation
highestLevelLastEntry(); // just an index computation
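installPageIfNoChange() could use a compare-and-swap, as the comments above suggest. The sketch below assumes a paging entry fits in a 64-bit word and uses std::atomic::compare_exchange_strong; FreePages is a hypothetical stand-in for RAM::returnPage(), the low bit used as a "valid" bit in the usage is illustrative only, and the portal update that follows in the pseudocode is left to the caller (it happens on both win and loss).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

using PagingEntry = std::uint64_t;  // raw entry bits; layout is target specific

// Hypothetical stand-in for RAM::returnPage().
struct FreePages {
  std::vector<std::uint64_t> pages;
  void returnPage(std::uint64_t physAddr) { pages.push_back(physAddr); }
};

// Install newEntry only if the slot still holds expectedOld. If another
// thread changed the entry first, hand the freshly allocated page back.
// Returns true if this caller won the race.
bool installPageIfNoChange(std::atomic<PagingEntry>& slot, PagingEntry expectedOld,
                           PagingEntry newEntry, std::uint64_t newPage, FreePages& ram) {
  PagingEntry observed = expectedOld;
  if (slot.compare_exchange_strong(observed, newEntry, std::memory_order_acq_rel,
                                   std::memory_order_acquire)) {
    return true;
  }
  ram.returnPage(newPage);  // lost: someone else already replaced the entry
  return false;
}
```

With a CAS, no VAS lock is needed around the install itself, which is the trade-off the "Perhaps change all VAS locking to use CAS?" comment raises.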


VAS Pseudocode

Introduce alternates of several of the existing functions that take a VPS level as a parameter, e.g.:

addrRangeSpansVPS(level)
    Replaces addrRangeSpansPT().
updateVPSEntry(level)
    Replaces updatePTEntry(), updatePDTEntry(), etc.
installPageIfNoChange(level)
    Updates the iterator on both win and loss.
// Delete all virtually visible pages in the range for the VAS
destroy(range)
    VPSIterator vpi(range)
    while (vpi)
        vpi.acquireLock() // some sort of locking
        level = vpi.lowestLevelPresent()
        // TODO: this code is missing an ownership test. Should check at all higher levels,
        // or at least at PTLevel or PTLevel+1, depending on what level ownership is restricted to.
        if level == PTLevel // A virtually visible page is present.
            // Delete Page
            pg = vpi.entryAtLevel(PTLevel).physicalAddress()
            vpi.updateVPSEntry(PTLevel, notpresentEntry(PTLevel))
            vpi.unlock()
            if isRAM(pg)
                RAM.returnPage(pg)
        else
            level-- // step through each entry in the (lower level) present PDS
            vpi.unlock()
        vpi.stepLevel(level) // steps 512^(level-1) pages, e.g., 1, 512, 512*512, etc.

bool thawOrCopyPages(const void *startPageAddr, size_t numPages, const RAMQuotaType *quota, CoreLock *vasLock,
                         UserAccessTYP userAccess, bool mayUseUftMemory, bool ensureCachesCoherent, bool forceCopy)
  VPSIterator sourceIt(startPageAddr, numPages)
  while (sourceIt)
    level,entry = sourceIt.lowestLevelPresent()
    if level == PTLevel
      origPTE = entry;  // save for installPageIfNoChange
      // Iterator may have to stop descending when hitting a !malleable entry.
      if (entry.present() && ! entry.malleable())
        foo = highest level that is !malleable (some sort of loop)
        sourceIt.stepLevel(foo)
        continue
      if (entry.present() && entry.malleable() && (forceCopy || entry.frozen()))  // frozen implies present
        if ! ensureVPSLevelThawedAndOwned(sourceIt, level)
          return failure
        newPage = RAM::getFreePage(quota, mayUseUftMemory)
        void *srcVirtAddr  = srcPortal->open(entry.physicalAddress(), PTEntry::cacheWriteBack);
        void *destVirtAddr = destPortal->open(newPage, PTEntry::cacheWriteBack);
        ::memcpy(destVirtAddr, srcVirtAddr, bytesPerPage);
        manipulate entry as needed
        CPU::ensureCachesCoherent(destVirtAddr, bytesPerPage, true /* aliased */);
        installPageIfNoChange(..., true); // always free the page on failure
        invalidateTLBentry(...);
    else
      level--  // as in destroy(): step through each entry of the lower-level present VPT
    sourceIt.stepLevel(level)
  return success

bool MAP(sourceIt, destIt, lock)
  while (sourceIt)
    level,entry = sourceIt.lowestLevelPresent()
    if level == PTLevel
      // We have to thaw the destination page table.
      if ! ensureVPSLevelThawedAndOwned(destIt, level)
        return true;  // failure
      enterCritical
      prevLock = acquireLockAndOuterDummyLocks
      initialDestPTE = destIt.getVPSEntry(level)
      newDestPTE = entry;
      manipulate newDestPTE as needed
      destIt.updateVPSEntry(PTLevel, newDestPTE)
      unlock, exit crit
      invalidateTLBentry(destIt.currentVirtualAddress())
      if initialDestPTE.owned()
        RAM::returnPage(initialDestPTE.physicalAddress(), destQuota);
    else
      level--  // as in destroy(): step through each entry of the lower-level present VPT
    destIt.stepLevel(level)
    sourceIt.stepLevel(level)
  return false  // success

// Ensure level and all levels above it are thawed and owned.
// Perhaps level is always PTLevel, or maybe PTLevel and PageLevel.
// Responsible for ensuring that iterator portals are updated as necessary.
ensureVPSLevelThawedAndOwned(it, level)
  for i = PDTLevel .. level (step -1)  // PDTLevel is the highest level where an entry can be not owned
    if it.entryAtLevel(i) is not owned:
      allocate RAM, copy page,
      it.installPageIfNoChange(i)  // locks either in or around this
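To make the interaction between lowestLevelPresent() and stepLevel() in destroy() concrete, the sketch below runs the same loop on a toy in-memory VPS. It assumes only 2 index bits per level (4 entries per VPT instead of 512) and PDTLevel = 3 so the example stays small; Table, mapPage, and pageTableFor are hypothetical helpers, and locking, ownership checks, and RAM accounting are omitted.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>

// Toy VPS: PDTLevel = 3 with 2 index bits per level (4 entries per VPT).
constexpr unsigned kBits = 2;
constexpr std::size_t kFanout = std::size_t(1) << kBits;
constexpr unsigned kPTLevel = 1, kPDTLevel = 3, kVMRLevel = kPDTLevel + 1;

struct Table {
  // A non-null entry is "present". At PTLevel the child object only marks a
  // present page; at higher levels it is the next lower VPT.
  std::array<std::unique_ptr<Table>, kFanout> entry;
};

std::size_t indexAt(std::size_t vpi, unsigned level) {
  return (vpi >> (kBits * (level - 1))) & (kFanout - 1);
}

std::size_t stepLevel(std::size_t vpi, unsigned level) {
  const std::size_t span = std::size_t(1) << (kBits * (level - 1));
  return (vpi + span) & ~(span - 1);
}

void mapPage(Table& pdt, std::size_t vpi) {  // test helper: populate all levels
  Table* t = &pdt;
  for (unsigned level = kPDTLevel; level >= kPTLevel; --level) {
    auto& e = t->entry[indexAt(vpi, level)];
    if (!e) e = std::make_unique<Table>();
    t = e.get();
  }
}

// Lowest level whose VPT has a present entry for vpi: PTLevel if the page
// is present, VMRLevel if the PDT entry itself is not present.
unsigned lowestLevelPresent(const Table& pdt, std::size_t vpi) {
  const Table* t = &pdt;
  for (unsigned level = kPDTLevel; level >= kPTLevel; --level) {
    const auto& e = t->entry[indexAt(vpi, level)];
    if (!e) return level + 1;
    t = e.get();
  }
  return kPTLevel;
}

// Page table holding vpi's entry; all levels above PTLevel must be present.
Table& pageTableFor(Table& pdt, std::size_t vpi) {
  Table* t = &pdt;
  for (unsigned level = kPDTLevel; level > kPTLevel; --level)
    t = t->entry[indexAt(vpi, level)].get();
  return *t;
}

// The destroy(range) loop: remove every present page in [start, end) and
// return how many were removed.
std::size_t destroy(Table& pdt, std::size_t start, std::size_t end) {
  std::size_t removed = 0;
  for (std::size_t vpi = start; vpi < end;) {
    unsigned level = lowestLevelPresent(pdt, vpi);
    if (level == kPTLevel) {
      pageTableFor(pdt, vpi).entry[indexAt(vpi, kPTLevel)].reset();
      ++removed;
    } else {
      --level;  // step through each entry of the lower-level present VPT
    }
    vpi = stepLevel(vpi, level);
  }
  return removed;
}
```

With pages mapped at indexes 0, 5, 17, and 40, destroy(pdt, 1, 40) removes pages 5 and 17, and stepping at the level below lowestLevelPresent() skips unpopulated ranges in whole-VPT-entry strides (e.g., from page 8 to 12, then 16). Like the pseudocode, it does not free VPTs that become empty.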