POSIX Access Controls
Overview
Design for POSIX access control in Deos.
The primary driver at this time (Jun 2013) is to support cache partitioning.
Access control is needed to support cache partitioning on top of memory pools to:
- Prevent process creation using a process template with access to a RAM pool.
- I.e., use file system access to prevent process creation.
- To prevent cache pollution/thrashing.
- Files live in a FLASH pool; preventing read access prevents cache pollution.
It's not clear that full ACL or capability support is strictly necessary, since we don't know how complicated the cache pool process inter-dependencies will be. The increased flexibility undoubtedly has a higher cost, but probably not much higher.
POSIX Introduction
POSIX defines access control based on both users/groups, and #POSIX_Capabilities. Users/groups define the access a process has to files, other processes, and in general user visible objects. Capabilities control access to kernel APIs/actions.
Every process has an effective user (EUID) and group (EGID). There is also a set of capabilities represented as a bit-vector of 32 or 64 bits.
Access to files is controlled by owning user, group, and other with appropriate read, write, execute, and also access control lists (ACLs), which are effectively a list of users and groups along with the access each grants. There are some special rules that ensure ACL based access is "consistent" with the owning UID/GID based access.
In most existing *nix implementations, the #POSIX_ACLs and #POSIX_Capabilities are stored in a separate #Extended_Attributes storage area associated with each file. This makes sense because the ACLs are potentially arbitrarily long, although in many systems the size of the extended attributes is limited, e.g., to 4K, or 64K.
It is also worth noting that in POSIX, the IDs are the primary means of matching, and user and group names are secondary. Deos places all the ACL information in the registry, so the names can be the primary means of interchange, which (at least for now) minimizes the importance of the IDs. Eventually, when full setuid() and setgid() support is added, more dependence on IDs may make its way into application source code, so IDs may increase in significance over time, thus necessitating a user-defined name-to-ID mapping.
Deos will not support *nix "root" semantics; instead it will use capabilities.
Deos will not implement the #stat() API at this time. Debug accessors may define kstat() or something else that is similar to stat, but Deos will not use the stat header file name, or the stat function or structure names.
Historical Overview
This design stores file system access control information in the registry using file names as the shared key. Some other designs considered (and recorded in the history of this page) had the information stored in the file system, but that didn't work out very well because different registries could potentially need different access models, and changing the file system layout caused numerous backward compatibility issues.
Issues
- How does this relate to names?
- Named processes are unaffected, although permitted binder could probably now be implemented via file system access rights in most cases.
Changes To Deos
Objects
The following is the logical data required for the objects. The physical layout and bit/field packing is left to detailed design.
User
The User class will be stored in the registry. The list of all defined users should be sorted by uid. The name field is not used by the kernel except for diagnostics. The names should appear in the registry, null-terminated, packed together, preferably not interleaved with other data used at runtime.
class User
    const char *name;
    uid_t uid
    gid_t gid
    supplemental gids
There should be a pre-defined user "nobody" and a pre-defined group "nogroup". User nobody should have group nogroup, with no supplemental gids. nogroup should not appear as the gid or supplemental gid of any other user. Conventionally there is also a user "root" with UID=0, but that is not required. It is probably a good idea NOT to use 0 for nobody, and in general reserving 0 would probably be a good idea for future POSIX compatibility.
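As a rough illustration of the User layout described above, the following C sketch stores users sorted by uid and finds one by binary search. The names deos_uid_t, deos_gid_t, MAX_SUPP_GIDS, find_user and user_name are hypothetical, as is the fixed-size supplemental gid array; the name is kept as an offset into a packed, null-terminated name pool and is used for diagnostics only.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical type names, chosen to avoid clashing with a host uid_t/gid_t. */
typedef uint32_t deos_uid_t;
typedef uint32_t deos_gid_t;

#define MAX_SUPP_GIDS 4 /* assumed limit, not specified by the design */

typedef struct {
    uint32_t   name_offset;              /* offset into the packed name pool */
    deos_uid_t uid;
    deos_gid_t gid;
    uint8_t    num_supp_gids;
    deos_gid_t supp_gids[MAX_SUPP_GIDS];
} User;

/* bsearch comparator: key is a uid, element is a User. */
static int cmp_uid(const void *key, const void *elem)
{
    deos_uid_t k = *(const deos_uid_t *)key;
    deos_uid_t u = ((const User *)elem)->uid;
    return (k > u) - (k < u);
}

/* Returns the matching User, or NULL. The table must be sorted by uid. */
const User *find_user(const User *users, size_t count, deos_uid_t uid)
{
    return bsearch(&uid, users, count, sizeof(User), cmp_uid);
}

/* Diagnostic-only name lookup in the packed, null-terminated name pool. */
const char *user_name(const char *name_pool, const User *u)
{
    return name_pool + u->name_offset;
}
```

Keeping the strings in a separate pool means the entries scanned at runtime contain only fixed-size fields.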
FileAttributes
A sequence of FileAttributes is stored in the registry. The primary access key is the filename, so the sequence should be sorted or hashed to optimize lookup. Suggest all names appear in the registry, null-terminated, packed together. Whether the name strings are in sorted order, or optimized for cache access is left to detailed design.
class FileAttributes:
    const char *filename;
    "other" access (read, execute)
    <userID, access (read, execute, setuid)>
    <groupID, access (read, execute, setgid)>
    userACLs list of <userID, access (read, execute)>
    groupACLs list of <groupID, access (read, execute)>
    cap_t capabilities
See #stat() for suggested ordering of rwx and setuid bits.
Supporting wildcards (file globbing) is optional. If wildcards are supported, some means to ensure unambiguous specifications should be implemented.
Because the registry is not tightly coupled to the file system, it is not possible to require that the sequence of FileAttributes specify all files in the file system. Consequently there must be a default entry that applies to files in the file system that do not have an associated FileAttributes entry. Unless overridden, the default entry should be:
    user=nobody, group=nogroup
    access is rx for user, group and other.
    No ACLs
    capabilities = (Can perform BIT);
The default entry should be specifiable in the IT XML.
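The lookup-with-default behavior can be sketched as below. This is a minimal illustration that assumes the FileAttributes sequence is sorted by filename; the struct is trimmed to a few fields, and lookup_file_attr and the ACC_* bit values are hypothetical names rather than existing Deos identifiers.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed access bit values, for illustration only. */
enum { ACC_R = 1 << 0, ACC_X = 1 << 1 };

/* Trimmed FileAttributes: ACLs, IDs and capabilities omitted for brevity. */
typedef struct {
    const char *filename;
    unsigned    user_access;
    unsigned    group_access;
    unsigned    other_access;
} FileAttributes;

/* bsearch comparator: key is a filename, element is a FileAttributes. */
static int cmp_name(const void *key, const void *elem)
{
    return strcmp((const char *)key,
                  ((const FileAttributes *)elem)->filename);
}

/* Returns the entry for filename, or default_entry when none is listed.
 * The table must be sorted by filename (strcmp order). */
const FileAttributes *lookup_file_attr(const FileAttributes *table,
                                       size_t count,
                                       const FileAttributes *default_entry,
                                       const char *filename)
{
    const FileAttributes *hit =
        bsearch(filename, table, count, sizeof *table, cmp_name);
    return hit ? hit : default_entry;
}
```

A hashed table would work equally well; the fallback to the default entry is the part the design requires.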
It is TBD whether setKernelAttributes() will require the "can perform BIT" capability. The "password" and PAL enabling interface is being phased out as part of the introduction of setKernelAttributesEx().
UIDs and GIDs could be limited to 16 bits, as could all the fields above. Since Deos will not provide a means to set the UID/GIDs (i.e., the user can't have a dependency on the IDs assigned), the integration tool could have its specification entirely based on user/group names and assign IDs densely, which means they could probably be substantially smaller than 16 bits which would leave plenty of space for access control bits in a 16-bit word.
Note that Linux defines uid_t and gid_t as 32-bit UNSIGNED32, so the API and kernel internal use should use that type. The registry could use smaller types.
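To make the 16-bit packing idea concrete, here is a small sketch that stores a densely assigned ID in the low 12 bits of a word and four access bits in the high bits. The 12/4 split, the bit assignments, and the helper names are assumptions chosen for illustration; the actual split is left to detailed design.

```c
#include <stdint.h>

/* Assumed split: 12 bits of ID, 4 bits of access flags. */
#define ID_BITS 12u
#define ID_MASK ((1u << ID_BITS) - 1u)

/* Assumed flag values; PACK_W is reserved for future file systems. */
enum { PACK_R = 1, PACK_W = 2, PACK_X = 4, PACK_SETID = 8 };

/* Pack a registry entry: low bits hold the ID, high bits the access flags. */
static inline uint16_t pack_entry(uint16_t id, unsigned access)
{
    return (uint16_t)((id & ID_MASK) | ((access & 0xFu) << ID_BITS));
}

/* Widen to 32 bits for API and kernel-internal use. */
static inline uint32_t entry_id(uint16_t entry)
{
    return (uint32_t)(entry & ID_MASK);
}

static inline unsigned entry_access(uint16_t entry)
{
    return (unsigned)entry >> ID_BITS;
}
```

With this split the registry could describe up to 4096 users or groups per table while the API still exposes 32-bit IDs.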
Any metadata fields in the above are recommended to be offsets, although pointers could probably be used as well. Ideally the kernel would return the FileAttributes information via some (debug library) API, and it would be preferable not to return pointers.
Note that pointers to PIB data structures do not currently escape the kernel.
There are some vagaries of the POSIX access control model involving the mask field, e.g., "mask" caps the access that named user, named group, and owning group entries can grant, but we don't need to support that at this time.
Even though the file system is read only in normal mode, we still might want to specify "write" access for future use of the FileAttributes info by other file systems. Perhaps just reserving a bit for now. This is a "nice to have".
Capabilities
The following capabilities must be defined:
- Can perform BIT
- Unclear if there is a reasonable Linux equivalent, perhaps CAP_SYS_RAWIO.
- Can set Kernel Attributes
- This should replace the registry kernelAttributesOwnerList
- Closest Linux equivalent appears to be CAP_SYS_BOOT.
- Can modify kernel file system
- This should replace the registry fileSystemOwnerList
- Closest Linux equivalent appears to be CAP_FOWNER, or perhaps CAP_SYS_RESOURCE.
The associated routines should be modified to check capabilities rather than using names.
The above is intended to not be a substantive backward compatibility concern.
The integration tool should convert the name lists to capabilities on upgrade.
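A capability check on a bit-vector might look like the following sketch. The type name deos_cap_t, the DEOS_CAP_* constants, and their bit positions are hypothetical; only the three capabilities themselves come from the list above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability set: one bit per capability. */
typedef uint32_t deos_cap_t;

enum {
    DEOS_CAP_PERFORM_BIT           = 1 << 0, /* "Can perform BIT" */
    DEOS_CAP_SET_KERNEL_ATTRIBUTES = 1 << 1, /* replaces kernelAttributesOwnerList */
    DEOS_CAP_MODIFY_KERNEL_FS      = 1 << 2  /* replaces fileSystemOwnerList */
};

/* True only if every capability in 'required' is present in 'held'. */
bool has_capability(deos_cap_t held, deos_cap_t required)
{
    return (held & required) == required;
}
```

A routine that previously compared the caller's name against an owner list would instead test the caller's capability set, e.g. has_capability(process_caps, DEOS_CAP_MODIFY_KERNEL_FS).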
Process
New fields:
    uid_t euid // Effective uid
    gid_t egid // Effective gid
    cap_t capabilities
Let fileAttr be the FileAttributes matching the process template's partitionASCIIName.
Create process shall fail if the creating process does not have "x" access to the fileAttr.
If the fileAttr has the setuid bit set, the created process' euid shall be set to the fileAttr.userID.
If the fileAttr has the setgid bit set, the created process' egid shall be set to the fileAttr.groupID.
If the setuid bit is not set, the created process' euid shall either be taken from autoCreatedProcInfo or be inherited from the creating process; the same applies to the egid when the setgid bit is not set. The autoCreatedProcInfo should refer to a #User, and that user's gid is used as the egid of the newly created process.
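The credential rules for process creation can be sketched as a small function. This is an illustration under assumptions: Creds, FileIds and derive_child_creds are hypothetical names, and the choice to prefer the autoCreatedProcInfo user over the creating process when both are available is an assumption the text does not settle. The "x" access check that precedes creation is not shown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t euid;
    uint32_t egid;
} Creds;

/* The subset of FileAttributes that matters for credential derivation. */
typedef struct {
    uint32_t user_id;
    uint32_t group_id;
    bool     setuid;
    bool     setgid;
} FileIds;

/* auto_created may be NULL when the process is not auto-created
 * (assumption: when present, it takes precedence over the parent). */
Creds derive_child_creds(const FileIds *fa,
                         const Creds *parent,
                         const Creds *auto_created)
{
    const Creds *base = auto_created ? auto_created : parent;
    Creds child;
    child.euid = fa->setuid ? fa->user_id : base->euid;   /* setuid wins */
    child.egid = fa->setgid ? fa->group_id : base->egid;  /* setgid wins */
    return child;
}
```

Note that the euid and egid are decided independently, so a file with only setuid set yields the file's user ID together with the inherited group ID.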
The only anticipated reason to have an API that returns the euid or egid is testing. However Deos will support new kernel APIs geteuid() and getegid().
Ref: http://linux.die.net/man/2/geteuid
Ref: http://pubs.opengroup.org/onlinepubs/009695399/functions/getegid.html
There do not appear to be any POSIX or *nix portable APIs to get the euid or egids of a different process. If such APIs are needed, they will be added to libdebuggerkernelsupport.so. It appears that /proc/$pid/uid is reasonably portable, but Deos will not introduce the /proc file system at this time.
Since Deos does not support saved and real IDs, or CAP_SETUID, there is no need to implement functions that modify uids or gids.
POSIX Compatibility
Note that although Deos will support setuid and setgid bits in the file system, Deos will not support the full POSIX semantics. POSIX retains the "real {user,group} ids", which means that suid programs always retain the "real uid" of the creating process. At this time Deos processes don't need real user ID, so Deos will not be POSIX compliant, but neither will it be incompatible (just not complete).
The setuid/setgid file attributes are required to provide a means to change the effective IDs in order to support the required access control.
Future Implementation Note: Effective IDs behavior matches POSIX specified behavior, however to maintain backward compatibility if real IDs were implemented with access control information, Deos could not match POSIX since Deos does not grant child processes permission to affect the creating process. However since Deos will not implement "real IDs", at this time there is no inconsistency. The behavior of real IDs can be resolved when execve() is implemented. I.e., the treatment of real IDs by execve() and createProcess() would likely need to be different. Alternatively users could be required to create FileAttributes with setuid/setgid to get the backward compatible behavior.
Image mapping code
Update mapViewOfKernelFile*(), and readFromKernelFile() to require "read" access to all files.
Since mapping an image file (.exe or .so) requires 4 different file operations, the cost of the access control lookup could be noticeable. Consequently, caching permissions for the last file may be desirable. Post-implementation testing should be done to determine if caching is necessary.
Access Checking Rules
The following assumes #User and #Group are implemented as noted.
The following defines the access a process has (either read, write, or execute) to a file (let fileAttr be the FileAttribute matching the file):
If
process.euid == fileAttr.userID, then fileAttr.userID.access determines access
else if
for some entry x in fileAttr.userACLs, process.euid == x.userID, then x.access determines access
else if
(the egid, or any process.euid.supplemental gids) matches (the
fileAttr.groupID or any of the fileAttr.groupACLs.groupIDs) and the
matching entry's access grants the requested permissions, that entry determines access
else if
(the egid, or any process.euid.supplemental gids) matches (the
fileAttr.groupID or any of the fileAttr.groupACLs.groupIDs) and
none of the matching entries' access grants the requested
permissions, access is denied.
else
the fileAttr.other entry determines access.
Note that the above permits group to specify broader access than user, and "other" to specify broader access than is granted by a matching user or group. This is intentional.
The rules for access checking in POSIX ACLs were never standardized. This is a concise statement of the semantics: http://users.suse.com/~agruen/acl/linux-acls/online#sec:permission-check The above Deos design is intended to be consistent with that specification.
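The access checking rules above can be sketched in C as follows. This is an illustration, not the kernel implementation: the struct layouts, the ACC_* bit values, and the function name access_permitted are assumptions, and pointers are used for the ACL lists only to keep the example short.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed access bit values. */
enum { ACC_R = 1, ACC_W = 2, ACC_X = 4 };

typedef struct { uint32_t id; unsigned access; } AclEntry;

typedef struct {
    AclEntry        user;         /* owning user and its access */
    AclEntry        group;        /* owning group and its access */
    unsigned        other;        /* "other" access */
    const AclEntry *user_acls;  size_t n_user_acls;
    const AclEntry *group_acls; size_t n_group_acls;
} FileAttributes;

typedef struct {
    uint32_t        euid, egid;
    const uint32_t *supp_gids;  size_t n_supp_gids;
} Process;

static bool grants(unsigned access, unsigned wanted)
{
    return (access & wanted) == wanted;
}

/* True if gid is the process egid or one of its supplemental gids. */
static bool in_groups(const Process *p, uint32_t gid)
{
    if (p->egid == gid) return true;
    for (size_t i = 0; i < p->n_supp_gids; i++)
        if (p->supp_gids[i] == gid) return true;
    return false;
}

bool access_permitted(const Process *p, const FileAttributes *fa,
                      unsigned wanted)
{
    /* 1. Owning user entry decides. */
    if (p->euid == fa->user.id)
        return grants(fa->user.access, wanted);

    /* 2. A matching user ACL entry decides. */
    for (size_t i = 0; i < fa->n_user_acls; i++)
        if (fa->user_acls[i].id == p->euid)
            return grants(fa->user_acls[i].access, wanted);

    /* 3. Any matching group entry that grants the request permits it. */
    bool group_matched = false;
    if (in_groups(p, fa->group.id)) {
        if (grants(fa->group.access, wanted)) return true;
        group_matched = true;
    }
    for (size_t i = 0; i < fa->n_group_acls; i++) {
        if (in_groups(p, fa->group_acls[i].id)) {
            if (grants(fa->group_acls[i].access, wanted)) return true;
            group_matched = true;
        }
    }

    /* 4. Groups matched but none granted the request: denied. */
    if (group_matched) return false;

    /* 5. Otherwise the "other" entry decides. */
    return grants(fa->other, wanted);
}
```

Because step 1 and step 2 return immediately, a process whose euid matches a user entry never falls through to the group or "other" entries, even if those would grant more.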
Failing access control is a new way for a process to fail to load. Since the access control info only applies to files that exist, it is likely that diagnostic output (log event) can be made meaningful enough that elfchk does not need to be enhanced. If not, then an elfcheck update may be required.
E.g.,
logEvent(file_not_readable, filehandle);
File Display
Either the FTP server should be updated to return the file system access, or OpenArbor should be changed to be able to dump the FileAttributes information applicable to each file. If ftpserver is the choice, suggest defining POSIX-like APIs, e.g., #stat(), to get the #POSIX_ACLs and #Capabilities. If the APIs are not POSIX compliant, use a Deos-specific name.
Note that POSIX ls(1) indicates the existence of ACLs by adding a "+" to the "rwxrwxrwx" protection mask when a file is listed, e.g., "rwxrwxrwx+", and "s" overlays "x" for setuid/setgid. It would be nice if ftpserver honored that convention. ls(1) does not have a special marker for capabilities.
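The ls(1) display convention could be produced by a helper along these lines. The function name format_mode and the M_* constants are hypothetical; the bit values follow the conventional POSIX octal layout. Rendering a setuid/setgid bit without the corresponding "x" bit as "S" follows common ls implementations and goes beyond what the text above states.

```c
#include <stdbool.h>

/* Mode bits in the conventional POSIX octal layout. */
enum {
    M_SETUID = 04000, M_SETGID = 02000,
    M_UR = 0400, M_UW = 0200, M_UX = 0100,
    M_GR = 0040, M_GW = 0020, M_GX = 0010,
    M_OR = 0004, M_OW = 0002, M_OX = 0001
};

/* Character for an execute slot that may be overlaid by setuid/setgid. */
static char exec_char(bool x, bool setid)
{
    if (setid) return x ? 's' : 'S';
    return x ? 'x' : '-';
}

/* Writes e.g. "rwxr-xr-x" or "r-sr-s---+" into out (at least 11 bytes). */
void format_mode(char *out, unsigned mode, bool has_acl)
{
    out[0] = (mode & M_UR) ? 'r' : '-';
    out[1] = (mode & M_UW) ? 'w' : '-';
    out[2] = exec_char((mode & M_UX) != 0, (mode & M_SETUID) != 0);
    out[3] = (mode & M_GR) ? 'r' : '-';
    out[4] = (mode & M_GW) ? 'w' : '-';
    out[5] = exec_char((mode & M_GX) != 0, (mode & M_SETGID) != 0);
    out[6] = (mode & M_OR) ? 'r' : '-';
    out[7] = (mode & M_OW) ? 'w' : '-';
    out[8] = (mode & M_OX) ? 'x' : '-';
    out[9] = has_acl ? '+' : '\0';   /* "+" marks the presence of ACLs */
    out[10] = '\0';
}
```

For C.exe from the use case below (user and group rx with setuid/setgid, other none, one user ACL), this would produce "r-sr-s---+".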
Supporting Cache Partitioning
To implement cache partitioning, it is necessary to specify a set of user and/or groups in such a manner that created processes only have access to the proper memory pools, and that only authorized processes are permitted to create a process that has access to the memory pool. Assume the following condition:
| Field | Process A | Process B | Process C |
|---|---|---|---|
| euid | User_A | User_B | User_C |
| egid | Group_A | Group_B | Group_C |
| executable | A.exe | B.exe | C.exe |
| Accessible Memory Pools | Pool A, C | Pool B | Pool C |
Use Cases
Assume process A and B already exist. Process A needs to create process C, but process B should not have access to Pool A or Pool C, and process C is not supposed to have access to Pool A. To accomplish this, establish the following FileAttributes:
| File | User/Group | Access | Notes |
|---|---|---|---|
| A.exe | user: User_A | rx, setuid | setuid/gid, or autocreate with appropriate euid and egid |
| | group: Group_A | rx, setgid | |
| | other: | none | |
| | ACLs | none | |
| B.exe | user: User_B | rx, setuid | setuid/gid, or autocreate with appropriate euid and egid |
| | group: Group_B | rx, setgid | |
| | other: | none | |
| | ACLs | none | |
| C.exe | user: User_C | rx, setuid | |
| | group: Group_C | rx, setgid | |
| | other: | none | |
| | ACLs | User_A x | |
Process B can't create an instance of Process A or Process C because Process B does not have "x" access to either A.exe or C.exe. Similarly Process C cannot create instances of Process A or Process B because Process C does not have "x" access to either A.exe or B.exe.
Process A has "x" access to C.exe, so Process A can create an instance of Process C. Because C.exe has setuid and setgid, when Process A creates Process C, Process C will be created with euid=User_C, and egid=Group_C. Note that Process A must have sufficient quota in memory pool C to create Process C.
Using access control mechanisms alone, there is no way to prevent Process A from creating multiple instances of Process C. However, such inhibition can be achieved if Process A only has sufficient pool C quota for one instance of C.
None of what follows describes required behavior at this time.
POSIX Object Overview
Random notes on the POSIX objects that are involved in access control that may (eventually) need to find their way into Deos.
POSIX Users
- POSIX states that a user with UID of 0 has "appropriate privileges", i.e., "has access to everything" aka "root" access.
- "nobody" is uid_t(-1)
- Suggest reserving UIDs from 0--999 for (Deos) system use.
There is a reasonable overview of the following at: http://man7.org/linux/man-pages/man7/credentials.7.html
Note that packing the UID space is probably not reasonable since the attributes in the file system have longer extent than a registry or other data structures that we'd likely use to represent the list of users.
POSIX Group
- name
- ID
POSIX also has a group password.
POSIX User
- name
- UID
- GID (primary group ID)
- supplementary group IDs
Linux used to permit 32 supplementary groups per user and reserved 15 bits per user/group ID. Linux now supports 65536 supplementary groups (ref http://man7.org/linux/man-pages/man7/credentials.7.html).
POSIX also has a user password.
POSIX Process
    euid effective user ID, saved user ID, real user ID
    egid effective group ID
    gid real group ID
    sgid saved group ID
Only euid and egid are required if we don't support SUID programs or root access.
Note that the GID should not be confused with a "process group". Process group and session IDs are a means for managing terminal sessions.
mode_t
In stat.h (12 total bits)
    user access (rwx)
    group access (rwx)
    other access (rwx)
    suid set UID on execute AL: Support or not?
    sgid set GID on execute AL: Support or not?
    isvtx On directories, restricted deletion flag.
Note that suid and sgid have additional directory specific behavior.
Ref: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_stat.h.html
File Properties
This stuff is returned by stat()
Ref http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_stat.h.html
stat // There is both a type stat and a function stat(). Nice.
    //device
    inode file/PIB handle
    mode_t
    //nlink Number of hard links
    uid owner
    gid group
    // st_rdev
    st_size length of file in bytes
    // timestamps for access, modification, status change
Commented out fields above, probably would not be supported for the kernel file system.
- POSIX filenames are case sensitive.
- POSIX portable filenames only need to support letters, digits, period, underscore and hyphen.
Interesting POSIX APIs
stat()
Returns file system information. Ref http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_stat.h.html
exec() and SUID/SGID
Ref: http://en.wikipedia.org/wiki/Setuid
On exec of a program with suid or sgid, the executing process gets its effective uid or effective gid set (or both).
There is also special semantics for directories.
Modern Linux is replacing setuid programs with #Capabilities.
setuid()
Allows root to change the real, effective, and saved user ID, or suid executables (processes) to change the effective uid.
Ref: http://pubs.opengroup.org/onlinepubs/009695399/functions/setuid.html
setgid()
Like #setuid()
http://pubs.opengroup.org/onlinepubs/009695399/functions/setregid.html
chown()
Change owner and group of a file
Ref: http://pubs.opengroup.org/onlinepubs/009695399/functions/chown.html
Predefined files
These seem to be pretty standard, but not part of POSIX. I think we could ignore them for now. However, we'll need some way to specify both in the registry and to hypstart, what the current users and groups are, and the numeric IDs will have to match.
/etc/passwd Ref: http://en.wikipedia.org/wiki/Passwd
/etc/group Ref: http://linux.die.net/man/5/group
kill()
kill() sends signals (in Deos terminology, raises an exception) to other processes. W.r.t. access control, the sender must have the CAP_KILL capability, or the real or effective user ID of the sending process must equal the real or saved set-user-ID of the target process.
Ref: http://man7.org/linux/man-pages//man2/kill.2.html
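The kill() permission rule can be written out as a predicate. This sketch only restates the Linux rule quoted above; ProcCreds and may_signal are hypothetical names, and since Deos currently tracks only effective IDs, the real and saved fields exist here purely to illustrate the POSIX model.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t ruid;     /* real user ID */
    uint32_t euid;     /* effective user ID */
    uint32_t suid;     /* saved set-user-ID */
    bool     cap_kill; /* holds CAP_KILL */
} ProcCreds;

/* True if 'sender' may signal 'target' under the rule quoted above. */
bool may_signal(const ProcCreds *sender, const ProcCreds *target)
{
    if (sender->cap_kill)
        return true;
    return sender->ruid == target->ruid || sender->ruid == target->suid ||
           sender->euid == target->ruid || sender->euid == target->suid;
}
```

Note that the target's effective user ID plays no part in the check.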
POSIX ACLs
POSIX also defines Access Control Lists (ACLs). Effectively this is a way to specify access for more than one user or group, e.g., user:alarson:rwx (ref http://users.suse.com/~agruen/acl/linux-acls/online/). ACLs are stored as file #Extended_Attributes.
POSIX Capabilities
Capabilities are defined with #defines starting with "CAP_" and there are standard macros for converting between capability sets and strings, and setting and clearing capabilities. Ref http://linux.die.net/man/3/cap_to_text and related.
Adding capabilities to files may not be a good match since capabilities may vary by which registry is loaded. E.g., ftp in download has "may modify file system".
- Ref: http://linux.die.net/man/7/capabilities The description of File capabilities is provably incomprehensible.
- Ref: http://www.cis.syr.edu/~wedu/seed/Labs/Documentation/Linux/How_Linux_Capability_Works.pdf
- Ref: http://www.friedhoff.org/posixfilecaps.html
- Old: https://www.kernel.org/pub/linux/libs/security/linux-privs/kernel-2.2/capfaq-0.2.txt
- The withdrawn POSIX.1e spec (ref http://wt.tuxomania.net/publications/posix.1e/).
- Capability Myths Demolished and its refutation
Extended Attributes
Ref: