POSIX Access Controls

From DDCIDeos

Overview

Design for POSIX access control in Deos.

The primary driver at this time (Jun 2013) is to support cache partitioning.

Access control is needed to support cache partitioning on top of memory pools to:

  1. Prevent process creation using a process template with access to a RAM pool.
    1. I.e., use file system access to prevent process creation.
  2. Prevent cache pollution/thrashing.
    1. Files live in a FLASH pool, so preventing read access to them prevents cache pollution.

It's not clear that full ACL or capability support is strictly necessary, since we don't know how complicated the cache pool process inter-dependencies will be. The increased flexibility undoubtedly has a higher cost, but probably not a hugely higher one.

POSIX Introduction

POSIX defines access control based on both users/groups and #POSIX_Capabilities. Users/groups define the access a process has to files, other processes, and user-visible objects in general. Capabilities control access to kernel APIs/actions.

Every process has an effective user (EUID) and group (EGID). There is also a set of capabilities represented as a bit-vector of 32 or 64 bits.

Access to files is controlled by read, write, and execute permissions for the owning user, the owning group, and other, and also by access control lists (ACLs), which are effectively lists of users and groups along with the access each is granted. There are some special rules that ensure ACL-based access is "consistent" with the owning UID/GID-based access.

In most existing *nix implementations, the #POSIX_ACLs and #POSIX_Capabilities are stored in a separate #Extended_Attributes storage area associated with each file. This makes sense because the ACLs are potentially arbitrarily long, although in many systems the size of the extended attributes is limited, e.g., to 4K, or 64K.

It is also worth noting that in POSIX, the IDs are the primary means of matching, and user and group names are secondary. Deos places all the ACL information in the registry, so names can be the primary means of interchange, which (at least for now) minimizes the importance of the IDs. Eventually, when full setuid() and setgid() support is added, application source code may come to depend more on IDs, so IDs may increase in significance over time, necessitating a user-defined name-to-ID mapping.

Deos will not support *nix "root" semantics; instead, it will use capabilities.

Deos will not implement the #stat() API at this time. Debug accessors may define kstat() or something else that is similar to stat, but Deos will not use the stat header file name, or the stat function or structure names.

Historical Overview

This design stores file system access control information in the registry using file names as the shared key. Some other designs considered (and recorded in the history of this page) had the information stored in the file system, but that didn't work out very well because different registries could potentially need different access models, and changing the file system layout caused numerous backward compatibility issues.

Issues

  1. How does this relate to names?
    1. Named processes are unaffected, although permitted binder could probably now be implemented via file system access rights in most cases.

Changes To Deos

Objects

The following is the logical data required for the objects. The physical layout and bit/field packing is left to detailed design.

User

The User class will be stored in the registry. The list of all defined users should be sorted by uid. The name field is not used by the kernel except for diagnostics. The names should appear in the registry, null-terminated, packed together, preferably not interleaved with other data used at runtime.

 class User
   const char *name;
   uid_t uid
   gid_t gid
   supplemental gids
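Because the user list is sorted by uid, a lookup can be a binary search. The following C sketch assumes a hypothetical in-memory form of the entry (the `User` struct field names and the fixed `MAX_SUPPLEMENTAL_GIDS` bound are illustrative, not part of this design) and uses the standard `bsearch()`:

```c
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>   /* uid_t, gid_t */

/* Hypothetical in-memory form of a registry User entry. The physical
 * layout is left to detailed design; these names are illustrative. */
#define MAX_SUPPLEMENTAL_GIDS 8

typedef struct {
    const char *name;                          /* diagnostics only */
    uid_t uid;
    gid_t gid;                                 /* primary group */
    size_t numSupplemental;
    gid_t supplemental[MAX_SUPPLEMENTAL_GIDS];
} User;

/* bsearch() comparator: key is a uid_t, element is a User. */
static int compareUid(const void *key, const void *elem)
{
    uid_t k = *(const uid_t *)key;
    uid_t u = ((const User *)elem)->uid;
    return (k > u) - (k < u);   /* avoids overflow of k - u */
}

/* Find a user in a list sorted by ascending uid; NULL if absent. */
const User *findUserByUid(const User *users, size_t count, uid_t uid)
{
    return bsearch(&uid, users, count, sizeof *users, compareUid);
}
```

For example, with entries for uids 0, 1001, and 65534, `findUserByUid(users, 3, 1001)` returns the second entry, and an unknown uid yields NULL.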

There should be a pre-defined user "nobody" and a pre-defined group "nogroup". User nobody should have group nogroup, with no supplemental gids. nogroup should not appear as the gid or supplemental gid of any other user. Conventionally there is also a user "root" with UID=0. That is not required, but it's probably a good idea NOT to use 0 for nobody, and in general reserving 0 would probably be a good idea for future POSIX compatibility.

FileAttributes

A sequence of FileAttributes is stored in the registry. The primary access key is the filename, so the sequence should be sorted or hashed to optimize lookup. Suggest all names appear in the registry, null-terminated, packed together. Whether the name strings are in sorted order, or optimized for cache access is left to detailed design.

 class FileAttributes:
   const char *filename;
   "other"   access (read, execute)
   <userID,  access (read, execute, setuid)>
   <groupID, access (read, execute, setgid)>
   userACLs  list of <userID,  access (read, execute)>
   groupACLs list of <groupID, access (read, execute)>
   cap_t capabilities

See #stat() for suggested ordering of rwx and setuid bits.

Supporting wildcards (file globbing) is optional. If wildcards are supported, some means to ensure unambiguous specifications should be implemented.
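One possible way to ensure unambiguous specifications, assuming wildcards use POSIX `fnmatch()` glob syntax (an assumption, not a decided format), is to have the integration tool reject any file matched by more than one pattern. `countMatchingPatterns()` and `isUnambiguous()` are hypothetical helper names:

```c
#include <fnmatch.h>
#include <stddef.h>

/* Count how many FileAttributes patterns match a filename.
 * Sketch only: assumes fnmatch() glob syntax for registry patterns. */
size_t countMatchingPatterns(const char *const *patterns, size_t count,
                             const char *filename)
{
    size_t matches = 0;
    for (size_t i = 0; i < count; ++i) {
        if (fnmatch(patterns[i], filename, 0) == 0) {
            ++matches;
        }
    }
    return matches;
}

/* Unambiguous for this file if at most one pattern matches it
 * (zero matches means the default entry applies). */
int isUnambiguous(const char *const *patterns, size_t count,
                  const char *filename)
{
    return countMatchingPatterns(patterns, count, filename) <= 1;
}
```

Treating multiple matches as an error is only one option; a most-specific-match rule would be another, at the cost of having to define "most specific".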

Because the registry is not tightly coupled to the file system, it is not possible to require that the sequence of FileAttributes specify all files in the file system. Consequently, there must be a default entry that applies to files that do not have an associated FileAttributes entry. The default value of the default entry should be:

 user=nobody, group=nogroup; access is rx for user, group, and other.
 No ACLs
 capabilities = (Can perform BIT);

The default entry should be specifiable in the IT XML.
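A minimal sketch of filename lookup with a fallback to the default entry, assuming the sequence is sorted with `strcmp()` (i.e., case-sensitive, as POSIX filenames are) and using a trimmed-down, hypothetical `FileAttributes` struct that omits IDs, ACLs, and capabilities:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down FileAttributes entry: only the fields
 * needed to illustrate lookup. */
typedef struct {
    const char *filename;
    unsigned otherAccess;   /* illustrative access bits */
} FileAttributes;

static int compareFilename(const void *key, const void *elem)
{
    return strcmp((const char *)key,
                  ((const FileAttributes *)elem)->filename);
}

/* Look up `filename` in entries sorted by strcmp() order. Files with
 * no explicit entry get `defaultEntry`, since the registry need not
 * describe every file in the file system. */
const FileAttributes *lookupFileAttributes(const FileAttributes *entries,
                                           size_t count,
                                           const char *filename,
                                           const FileAttributes *defaultEntry)
{
    const FileAttributes *fa =
        bsearch(filename, entries, count, sizeof *entries, compareFilename);
    return fa != NULL ? fa : defaultEntry;
}
```

A hashed table would serve equally well; sorting is shown only because it also makes the registry listing easy to read.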

It is TBD whether setKernelAttributes() will require the "can perform BIT" capability. The "password" and PAL enabling interface is being phased out as part of the introduction of setKernelAttributesEx().

UIDs and GIDs could be limited to 16 bits, as could all the fields above. Since Deos will not provide a means to set UIDs/GIDs (i.e., users can't depend on the IDs assigned), the integration tool could base its specification entirely on user/group names and assign IDs densely. The IDs could then probably be substantially smaller than 16 bits, leaving plenty of space for access control bits in a 16-bit word.

Note that Linux defines uid_t and gid_t as 32-bit unsigned integers (UNSIGNED32), so the API and kernel internals should use those types. The registry could use smaller types.
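To illustrate the space argument, the sketch below packs a densely assigned ID and access bits into one 16-bit registry word and widens the ID back to the 32-bit `uid_t` for the API. The 12/4 bit split and the `REG_*` names are hypothetical choices, not part of this design:

```c
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical 16-bit registry word: densely assigned ID in the low
 * 12 bits, access bits in the high 4. The split is illustrative. */
#define REG_ID_BITS   12u
#define REG_ID_MASK   ((1u << REG_ID_BITS) - 1u)   /* 0x0FFF */
#define REG_ACC_R     (1u << 12)
#define REG_ACC_W     (1u << 13)   /* reserved for future writable file systems */
#define REG_ACC_X     (1u << 14)
#define REG_ACC_SETID (1u << 15)   /* setuid or setgid, depending on the entry */

/* Pack an ID and access bits; returns 0 on success, -1 if they don't fit. */
int packEntry(uint32_t id, unsigned access, uint16_t *out)
{
    if (id > REG_ID_MASK || (access & REG_ID_MASK) != 0 || access > 0xFFFFu) {
        return -1;
    }
    *out = (uint16_t)(id | access);
    return 0;
}

/* Widen back to the 32-bit uid_t used by the API and kernel internals. */
uid_t unpackUid(uint16_t word)
{
    return (uid_t)(word & REG_ID_MASK);
}

unsigned unpackAccess(uint16_t word)
{
    return (unsigned)word & ~REG_ID_MASK;
}
```

With 12 ID bits, up to 4096 users or groups fit; `packEntry()` reports an error rather than silently truncating a larger ID.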

Any metadata fields above are recommended to be stored as offsets, although pointers could probably be used as well. Ideally, though, the kernel would return the FileAttributes information via some (debug library) API, and it would be nice to avoid returning pointers.

Note that pointers to PIB data structures do not currently escape the kernel.
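As an illustration of offsets instead of pointers, this sketch resolves a byte offset into a packed area of null-terminated names, as suggested for User and FileAttributes names. The `NameTable` type and `nameAt()` function are hypothetical; the bounds check guards against a corrupt or out-of-range offset:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packed name area: null-terminated names stored back to
 * back and referenced by byte offset, so an entry can be copied out
 * through an API without exposing kernel addresses. */
typedef struct {
    const char *base;   /* start of the packed string area */
    size_t size;        /* total bytes in the area */
} NameTable;

/* Resolve an offset to a string, or NULL if the offset is out of
 * range or the name is not terminated inside the table. */
const char *nameAt(const NameTable *t, uint32_t offset)
{
    if (offset >= t->size) {
        return NULL;
    }
    for (size_t i = offset; i < t->size; ++i) {
        if (t->base[i] == '\0') {
            return t->base + offset;
        }
    }
    return NULL;
}
```

For the table `"nobody\0root\0User_A"`, offset 0 resolves to "nobody", 7 to "root", and 12 to "User_A".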

There are some vagaries of the POSIX access control model involving the mask entry, e.g., the mask limits the maximum access that named user, named group, and owning group entries can grant, but we don't need to support that at this time.

Even though the file system is read-only in normal mode, we still might want to specify "write" access for future use of the FileAttributes info by other file systems, perhaps just reserving a bit for now. This is a "nice to have".

Capabilities

The following capabilities must be defined:

  1. Can perform BIT
    • Unclear if there is a reasonable Linux equivalent, perhaps CAP_SYS_RAWIO.
  2. Can set Kernel Attributes
    • This should replace the registry kernelAttributesOwnerList
    • Closest Linux equivalent appears to be CAP_SYS_BOOT.
  3. Can modify kernel file system
    • This should replace the registry fileSystemOwnerList
    • Closest Linux equivalent appears to be CAP_FOWNER, or perhaps CAP_SYS_RESOURCE.

The associated routines should be modified to check capabilities rather than using names.

The above is intended to not be a substantive backward compatibility concern.

The integration tool should convert the name lists to capabilities on upgrade.
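A minimal sketch of the capability set as a 64-bit vector, with a check that replaces name-list membership and the corresponding upgrade step in the integration tool. The `deos_cap_t` type, the `DEOS_CAP_*` bit assignments, and the helper names are assumptions; a distinct type name is used because libcap's `cap_t` is an opaque pointer rather than a bit vector:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical capability vector and bit assignments; only the three
 * capabilities themselves come from this design. */
typedef uint64_t deos_cap_t;

#define DEOS_CAP_PERFORM_BIT           ((deos_cap_t)1 << 0)
#define DEOS_CAP_SET_KERNEL_ATTRIBUTES ((deos_cap_t)1 << 1)
#define DEOS_CAP_MODIFY_KERNEL_FS      ((deos_cap_t)1 << 2)

/* True only if every capability in `required` is present in `held`. */
static inline bool hasCapabilities(deos_cap_t held, deos_cap_t required)
{
    return (held & required) == required;
}

/* Upgrade step sketch: a process named in an old owner list
 * (e.g., fileSystemOwnerList) gains the corresponding capability. */
deos_cap_t capsFromOwnerList(const char *processName,
                             const char *const *ownerList, size_t count,
                             deos_cap_t capability)
{
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(ownerList[i], processName) == 0) {
            return capability;
        }
    }
    return 0;
}
```

A kernel routine would then test, for example, `hasCapabilities(proc->capabilities, DEOS_CAP_MODIFY_KERNEL_FS)` instead of searching a name list.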

Process

New fields:

 uid_t euid		// Effective uid
 gid_t egid		// Effective gid 
 cap_t capabilities

Let fileAttr be the FileAttributes matching the process template's partitionASCIIName.

Create process shall fail if the creating process does not have "x" access to the fileAttr.

If the fileAttr has the setuid bit set, the created process' euid shall be set to the fileAttr.userID.

If the fileAttr has the setgid bit set, the created process' egid shall be set to the fileAttr.groupID.

If the setuid or setgid bits are not set, then the created process' euid and/or egid shall either be set from autoCreatedProcInfo or be inherited from the creating process. autoCreatedProcInfo should refer to a #User, whose user.gid is used as the egid of the newly created process.
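The effective-ID rules above can be summarized in a short C function. This sketch assumes the "x" access check has already passed, and leaves the choice between inheritance and autoCreatedProcInfo to the caller through the `base` argument; `ExecFileAttr`, `Creds`, and `effectiveIdsForNewProcess()` are hypothetical names:

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical, trimmed-down types with only the fields this rule needs. */
typedef struct {
    uid_t userID;
    gid_t groupID;
    bool setuid;
    bool setgid;
} ExecFileAttr;

typedef struct {
    uid_t euid;
    gid_t egid;
} Creds;

/* Compute the created process' effective IDs. `base` is either the
 * creating process' credentials (inheritance) or the uid/gid of the
 * User named by autoCreatedProcInfo. */
Creds effectiveIdsForNewProcess(const ExecFileAttr *fa, Creds base)
{
    Creds out = base;
    if (fa->setuid) {
        out.euid = fa->userID;    /* setuid: take the file's user */
    }
    if (fa->setgid) {
        out.egid = fa->groupID;   /* setgid: take the file's group */
    }
    return out;
}
```

For example, a process with euid 100 and egid 10 that creates a process from a file owned by 300:30 with both bits set yields euid 300 and egid 30; with neither bit set, the new process keeps 100 and 10.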

The only anticipated reason to have an API that returns the euid or egid is testing. Nevertheless, Deos will support the new kernel APIs geteuid() and getegid().

http://linux.die.net/man/2/geteuid http://pubs.opengroup.org/onlinepubs/009695399/functions/getegid.html

There do not appear to be any POSIX or portable *nix APIs to get the euid or egid of a different process. If such APIs are needed, they will be added to libdebuggerkernelsupport.so. It appears that /proc/$pid/uid is reasonably portable, but Deos will not introduce the /proc file system at this time.

Since Deos does not support saved and real IDs, or CAP_SETUID, there is no need to implement functions that modify uids or gids.

POSIX Compatibility

Note that although Deos will support setuid and setgid bits in the file system, Deos will not support the full POSIX semantics. POSIX retains the "real {user,group} ids", which means that suid programs always retain the "real uid" of the creating process. At this time Deos processes don't need real user ID, so Deos will not be POSIX compliant, but neither will it be incompatible (just not complete).

The setuid/setgid file attributes are required to provide a means to change the effective IDs in order to support the required access control.

Future Implementation Note: The effective ID behavior matches POSIX-specified behavior. However, if real IDs were implemented along with access control information, Deos could not both maintain backward compatibility and match POSIX, since Deos does not grant child processes permission to affect the creating process. Since Deos will not implement real IDs at this time, there is no inconsistency. The behavior of real IDs can be resolved when execve() is implemented; i.e., the treatment of real IDs by execve() and createProcess() would likely need to differ. Alternatively, users could be required to create FileAttributes with setuid/setgid to get the backward compatible behavior.

Image mapping code

Update mapViewOfKernelFile*() and readFromKernelFile() to require "read" access to all files.

Since mapping an image file (.exe or .so) requires 4 different file operations, the cost of access control lookup could be noticeable. Consequently, caching the permissions for the last file may be desirable. Post-implementation testing should determine whether caching is necessary.
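If testing shows caching is needed, a one-entry cache may be enough, since the four operations needed to map an image file presumably check the same file in succession. This sketch assumes FileAttributes never change at run time (so entries need no invalidation) and keys on (filename, euid, egid), which suffices because the supplemental gids follow from the euid's User. `LastAccessCache`, `cacheLookup()`, `cacheStore()`, and `CACHE_NAME_MAX` are hypothetical:

```c
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

#define CACHE_NAME_MAX 64   /* hypothetical limit; longer names are not cached */

/* One-entry cache of the last read-access decision. Assumes registry
 * FileAttributes do not change at run time. Not thread-safe. */
typedef struct {
    bool valid;
    char filename[CACHE_NAME_MAX];
    uid_t euid;
    gid_t egid;
    bool readAllowed;
} LastAccessCache;

/* On a hit, store the cached decision in *allowed and return true. */
bool cacheLookup(const LastAccessCache *c, const char *filename,
                 uid_t euid, gid_t egid, bool *allowed)
{
    if (c->valid && c->euid == euid && c->egid == egid &&
        strcmp(c->filename, filename) == 0) {
        *allowed = c->readAllowed;
        return true;
    }
    return false;
}

/* Record the result of a full access check for the next lookup. */
void cacheStore(LastAccessCache *c, const char *filename,
                uid_t euid, gid_t egid, bool allowed)
{
    size_t len = strlen(filename);
    if (len >= CACHE_NAME_MAX) {
        c->valid = false;   /* too long to cache; later calls do full checks */
        return;
    }
    memcpy(c->filename, filename, len + 1);
    c->euid = euid;
    c->egid = egid;
    c->readAllowed = allowed;
    c->valid = true;
}
```

The caller performs the full access check only on a miss and then calls `cacheStore()`; the filename is copied so the cache never holds a pointer to a caller's transient buffer.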

Access Checking Rules

The following assumes #User and #Group are implemented as noted.

The following defines the access a process has (either read, write, or execute) to a file (let fileAttr be the FileAttribute matching the file):

 If
     process.euid == fileAttr.userID, then fileAttr.userID.access determines access
 else if
     for some x in fileAttr.userACLs, process.euid == x.userID, then x.access determines access
 else if
     (the egid, or any process.euid.supplemental gids) matches (the
     fileAttr.groupID or any of the fileAttr.groupACLs.groupIDs) and the
     matching entry's access grants the requested permissions, that entry determines access
 else if
     (the egid, or any process.euid.supplemental gids) matches (the
     fileAttr.groupID or any of the fileAttr.groupACLs.groupIDs) and
     none of the matching entries' access grants the requested
     permissions, access is denied.
 else
     the fileAttr.other entry determines access.

Note that the above permits group to specify broader access than user, and "other" to specify broader access than is granted by a matching user or group. This is intentional.
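The rules above translate fairly directly into C. This is a sketch, not the kernel implementation: the `FileAttr`, `ProcCreds`, `UserAce`, and `GroupAce` types and the `ACC_*` bit values are hypothetical, the ACL mask is not modeled (as noted earlier, it is not needed at this time), and "grants" is taken to mean that every requested bit is present:

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical access bits, in POSIX rwx order. */
#define ACC_X 1u
#define ACC_W 2u
#define ACC_R 4u

typedef struct { uid_t id; unsigned access; } UserAce;
typedef struct { gid_t id; unsigned access; } GroupAce;

/* Trimmed FileAttributes: only what the access check needs. */
typedef struct {
    uid_t userID;  unsigned userAccess;
    gid_t groupID; unsigned groupAccess;
    unsigned otherAccess;
    const UserAce *userACLs;   size_t numUserACLs;
    const GroupAce *groupACLs; size_t numGroupACLs;
} FileAttr;

/* egid plus the supplemental gids of the User identified by euid. */
typedef struct {
    uid_t euid;
    gid_t egid;
    const gid_t *supplemental; size_t numSupplemental;
} ProcCreds;

static bool inGroups(const ProcCreds *p, gid_t g)
{
    if (p->egid == g) return true;
    for (size_t i = 0; i < p->numSupplemental; ++i)
        if (p->supplemental[i] == g) return true;
    return false;
}

static bool grants(unsigned have, unsigned want)
{
    return (have & want) == want;
}

/* Returns true if `want` (a combination of ACC_* bits) is granted. */
bool accessAllowed(const ProcCreds *p, const FileAttr *fa, unsigned want)
{
    if (p->euid == fa->userID)                       /* owning user */
        return grants(fa->userAccess, want);

    for (size_t i = 0; i < fa->numUserACLs; ++i)     /* named users */
        if (fa->userACLs[i].id == p->euid)
            return grants(fa->userACLs[i].access, want);

    /* Owning group and group ACLs: any matching entry that grants `want`
     * allows access; if entries match but none grants, deny. */
    bool groupMatched = false;
    if (inGroups(p, fa->groupID)) {
        groupMatched = true;
        if (grants(fa->groupAccess, want)) return true;
    }
    for (size_t i = 0; i < fa->numGroupACLs; ++i) {
        if (inGroups(p, fa->groupACLs[i].id)) {
            groupMatched = true;
            if (grants(fa->groupACLs[i].access, want)) return true;
        }
    }
    if (groupMatched) return false;

    return grants(fa->otherAccess, want);            /* everyone else */
}
```

With the C.exe attributes from the use cases below (User_C rx, Group_C rx, other none, ACL User_A x), User_A gets "x" but not "r", and User_B gets neither. A process whose group matches an entry that grants nothing is denied even if "other" would grant access.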

The rules for access checking in POSIX ACLs were never standardized. This is a concise statement of the semantics: http://users.suse.com/~agruen/acl/linux-acls/online#sec:permission-check The above Deos design is intended to be consistent with that specification.

Failing access control is a new way for a process to fail to load. Since the access control info only applies to files that exist, it is likely that diagnostic output (log event) can be made meaningful enough that elfchk does not need to be enhanced. If not, then an elfcheck update may be required.

E.g.,

  logEvent(file_not_readable, filehandle);

File Display

Either the FTP server should be updated to return the file system access information, or OpenArbor should be changed to be able to dump the FileAttribute information applicable to each file. If ftpserver is the choice, suggest defining POSIX-like APIs, e.g., #stat(), to get the #POSIX_ACLs and #POSIX_Capabilities. If the APIs are not POSIX compliant, use a Deos-specific name.

Note that POSIX ls(1) indicates the existence of ACLs by appending a "+" to the "rwxrwxrwx" protection mask when a file is listed, e.g., "rwxrwxrwx+", and that "s" overlays "x" for setuid/setgid. It would be nice if ftpserver honored that convention. ls(1) does not have a special marker for capabilities.
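A sketch of the ls(1)-style display an ftpserver could produce. It follows the POSIX ls(1) convention of "s" when set-ID and execute are both set and "S" when set-ID is set without execute, and appends "+" when ACLs exist. `DisplayPerms`, the `P_*` bits, and `formatPerms()` are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical display view of one FileAttributes entry. */
#define P_R 4u
#define P_W 2u
#define P_X 1u

typedef struct {
    unsigned user, group, other;   /* each a combination of P_R/P_W/P_X */
    bool setuid, setgid;
    bool hasACLs;
} DisplayPerms;

/* Write one "rwx" triad; `special` is the setuid/setgid flag. */
static void triad(char *out, unsigned bits, bool special)
{
    out[0] = (bits & P_R) ? 'r' : '-';
    out[1] = (bits & P_W) ? 'w' : '-';
    if (special)
        out[2] = (bits & P_X) ? 's' : 'S';
    else
        out[2] = (bits & P_X) ? 'x' : '-';
}

/* Fill buf (at least 11 bytes) with e.g. "r-sr-s---+" and a NUL. */
void formatPerms(const DisplayPerms *p, char buf[11])
{
    triad(buf + 0, p->user, p->setuid);
    triad(buf + 3, p->group, p->setgid);
    triad(buf + 6, p->other, false);
    size_t n = 9;
    if (p->hasACLs)
        buf[n++] = '+';
    buf[n] = '\0';
}
```

For C.exe in the use cases below (rx with setuid, rx with setgid, no access for other, one ACL), this yields `r-sr-s---+`.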

Supporting Cache Partitioning

To implement cache partitioning, it is necessary to specify a set of users and/or groups in such a manner that created processes have access only to the proper memory pools, and that only authorized processes are permitted to create a process that has access to a given memory pool. Assume the following conditions:

 Field                    Process A    Process B   Process C
 euid                     User_A       User_B      User_C
 egid                     Group_A      Group_B     Group_C
 executable               A.exe        B.exe       C.exe
 Accessible memory pools  Pools A, C   Pool B      Pool C

Use Cases

Assume process A and B already exist. Process A needs to create process C, but process B should not have access to Pool A or Pool C, and process C is not supposed to have access to Pool A. To accomplish this, establish the following FileAttributes:

          User/Group     Access
 A.exe
   user:  User_A        rx, setuid   setuid/gid Or autocreate with 
   group: Group_A       rx, setgid   appropriate euid and egid
   other:               none
   ACLs none
 
 B.exe
   user:  User_B        rx, setuid   setuid/gid Or autocreate with 
   group: Group_B       rx, setgid   appropriate euid and egid
   other:               none
   ACLs   none
 
 C.exe
   user:  User_C        rx, setuid
   group: Group_C       rx, setgid
   other:               none
   ACLs   User_A        x

Process B can't create an instance of Process A or Process C because Process B does not have "x" access to either A.exe or C.exe. Similarly Process C cannot create instances of Process A or Process B because Process C does not have "x" access to either A.exe or B.exe.

Process A has "x" access to C.exe, so Process A can create an instance of Process C. Because C.exe has setuid and setgid, when Process A creates Process C, Process C will be created with euid=User_C, and egid=Group_C. Note that Process A must have sufficient quota in memory pool C to create Process C.

Using access control mechanisms alone, there is no way to prevent Process A from creating multiple instances of Process C. However, this can be achieved if Process A has sufficient pool C quota for only one instance of C.




None of what follows describes required behavior at this time.



POSIX Object Overview

Random notes on the POSIX objects that are involved in access control that may (eventually) need to find their way into Deos.

POSIX Users

  • POSIX states that a user with UID of 0 has "appropriate privileges", i.e., "has access to everything" aka "root" access.
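  • "nobody" is conventionally a high UID (commonly 65534 on Linux); (uid_t)-1 is reserved, e.g., chown() treats it as "do not change".
  • Suggest reserving UIDs 0 to 999 for (Deos) system use.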

There is a reasonable overview of the following at: http://man7.org/linux/man-pages/man7/credentials.7.html

Note that packing the UID space is probably not reasonable since the attributes in the file system have longer extent than a registry or other data structures that we'd likely use to represent the list of users.

POSIX Group

 name
 ID

POSIX also has a group password.

POSIX User

 name
 UID
 GID  primary group ID
 supplementary group IDs

Linux used to permit 32 supplementary groups per user and reserved 15 bits per user/group ID. Linux now supports 65536 supplementary groups (ref http://man7.org/linux/man-pages/man7/credentials.7.html).

POSIX also has a user password.

POSIX Process

 euid Effective user ID,
 saved user id,
 real user id
 egid  effective group ID
 gid   real group ID.
 sgid  saved group ID.

Only euid and egid are required if we don't support SUID programs or root access.

Note that the GID should not be confused with a "process group". Process group and session IDs are a means for managing terminal sessions.

mode_t

In stat.h (12 total bits)

 user access   (rwx)
 group access  (rwx)
 other access  (rwx)
 suid   set UID on execute   AL: Support or not?
 sgid   set GID on execute   AL: Support or not?
 isvtx  On directories, restricted deletion flag.

Note that suid and sgid have additional directory specific behavior.

Ref: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_stat.h.html

File Properties

This stuff is returned by stat()

Ref http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_stat.h.html

stat // There is both a type stat and a function stat(). Nice.

 //device
 inode         file/PIB handle
 mode_t
 //nlink       Number of hard links
 uid           owner
 gid           group
 // st_rdev
 st_size      length of file in bytes
 // timestamps for access, modification, status change

The commented-out fields above probably would not be supported for the kernel file system.

  • POSIX filenames are case sensitive.
  • POSIX portable filenames only need to support letters, digits, period, underscore and hyphen.

Interesting POSIX APIs

stat()

Returns file system information. Ref http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_stat.h.html

exec() and SUID/SGID

Ref: http://en.wikipedia.org/wiki/Setuid

On exec of a program with suid or sgid set, the executing process gets its effective uid set to the file's owner, its effective gid set to the file's group, or both.

There are also special semantics for directories.

Modern Linux is replacing setuid programs with #POSIX_Capabilities.

setuid()

Allows root to change the real, effective, and saved user ID, or suid executables (processes) to change the effective uid.

Ref: http://pubs.opengroup.org/onlinepubs/009695399/functions/setuid.html

setgid()

Like #setuid()

http://pubs.opengroup.org/onlinepubs/009695399/functions/setregid.html

chown()

Change owner and group of a file

Ref: http://pubs.opengroup.org/onlinepubs/009695399/functions/chown.html

Predefined files

These seem to be pretty standard, but are not part of POSIX. I think we could ignore them for now. However, we'll need some way to specify, both in the registry and to hypstart, what the current users and groups are, and the numeric IDs will have to match.

 /etc/passwd   Ref: http://en.wikipedia.org/wiki/Passwd
 /etc/group    Ref: http://linux.die.net/man/5/group

kill()

Kill sends signals (in Deos terminology, raises exceptions) to other processes. W.r.t. access control, the sender must have the CAP_KILL capability, or the real or effective user ID of the sending process must equal the real or saved set-user-ID of the target process.

Ref: http://man7.org/linux/man-pages//man2/kill.2.html
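The kill() permission rule quoted above can be expressed as a predicate. This models the rule only as stated here; Deos has no real or saved IDs, and `KillCreds`, `hasCapKill`, and `maySignal()` are hypothetical stand-ins:

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical credentials relevant to the kill() permission rule. */
typedef struct {
    uid_t ruid, euid, suid;   /* real, effective, saved set-user-ID */
    bool hasCapKill;          /* stands in for CAP_KILL */
} KillCreds;

/* Sender's real or effective uid must equal the target's real or
 * saved set-user-ID, unless the sender holds CAP_KILL. */
bool maySignal(const KillCreds *sender, const KillCreds *target)
{
    if (sender->hasCapKill)
        return true;
    return sender->ruid == target->ruid || sender->ruid == target->suid ||
           sender->euid == target->ruid || sender->euid == target->suid;
}
```

Note that the target's effective uid plays no part: a target running setuid keeps its real uid, so its original user can still signal it.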

POSIX ACLs

POSIX also defines Access Control Lists (ACLs). Effectively this is a way to specify access for more than one user or group, e.g., user:alarson:rwx (ref http://users.suse.com/~agruen/acl/linux-acls/online/). ACLs are stored as file #Extended_Attributes.

POSIX Capabilities

Capabilities are defined with #defines starting with "CAP_" and there are standard macros for converting between capability sets and strings, and setting and clearing capabilities. Ref http://linux.die.net/man/3/cap_to_text and related.

Adding capabilities to files may not be a good match since capabilities may vary by which registry is loaded. E.g., ftp in download has "may modify file system".

Extended Attributes

Ref: