Short Notes: Unix System Calls

How do Processes interact with the hardware?

Krishanu Konar


Unix Systems

Unix systems were created in the 1970s, and by the 1980s the two most prevalent families were System V (created by AT&T) and BSD (Berkeley Software Distribution). Many variants exist nowadays, including the many Linux distributions, macOS (based on Darwin), FreeBSD etc.

Unix is a modular OS made up of a number of essential components, including the kernel, shell, file system and a core set of utilities or programs. At the heart of the Unix OS is the kernel, a master control program that provides services to start and end programs. It also handles low-level operations, such as allocating memory, managing files, responding to system calls and scheduling tasks.

But what “is” Unix? In the late 1980s, an open operating system standardization effort now known as POSIX provided a common baseline for Unix-like operating systems.

  • POSIX (Portable Operating System Interface): The IEEE based POSIX on the common structure of the major competing variants of the Unix system, and published the first POSIX standard in 1988.
  • SUS (Single Unix Specifications): In the early 1990s, a separate but very similar effort was started by an industry consortium, the Common Open Software Environment (COSE) initiative, which eventually became the Single UNIX Specification (SUS) administered by The Open Group.

Unix systems nowadays follow these specifications.

User Space and Kernel Space

The OS provides programs with a consistent view of the computer’s hardware. In addition, the operating system must account for independent operation of programs and protection against unauthorized access to resources. This is possible only if the CPU enforces protection of system software from the applications.

All current processors have at least two protection levels; when several levels exist, the highest and lowest levels are used. Under Unix, the kernel executes in the highest level (also called supervisor mode), where everything is allowed, whereas applications execute in the lowest level (the so-called user mode), where the processor regulates direct access to hardware and unauthorized access to memory. We usually refer to the execution modes as kernel space and user space.

User space processes can only access a small part of the kernel via an interface exposed by the kernel - the system calls. A system call is a programmatic way a program requests a service from the kernel. The system call interface includes a number of functions that the operating system exports to the applications running on top of it. These functions allow actions like opening files, creating network connections, reading and writing from files, and so on.

System calls are divided into 5 main categories:

  • Process Control: These system calls handle process creation, process termination, etc. fork(), exit() and exec() are some common examples.

  • File Management: These system calls handle file manipulation jobs like creating, reading and writing files. The Linux system calls under this category include open(), read(), write() and close().

  • Device Management: These system calls handle device manipulation, like reading from and writing into device buffers. The typical Linux system call under this category is ioctl().

  • Information Maintenance: These system calls handle information and its transfer between the OS and the user program. Examples are getpid(), alarm() and sleep().

  • Communication: These system calls are used for inter-process communication, which covers both message passing and shared memory. Examples are pipe(), shmget() and mmap().
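
As a rough illustration of how these categories show up in a program, the sketch below uses Python's os module, whose functions are thin wrappers over the system calls of the same name. It assumes a Unix-like system where fork() is available; the helper name child_says is made up for this example.

```python
import os


def child_says(message: bytes) -> tuple:
    """Fork a child that sends `message` to the parent through a pipe.

    Returns (bytes received by the parent, exit status of the child).
    """
    read_fd, write_fd = os.pipe()          # communication: pipe()
    pid = os.fork()                        # process control: fork()
    if pid == 0:
        # Child: send the message, then leave immediately via _exit()
        # so that none of the parent's cleanup code runs twice.
        os.close(read_fd)
        os.write(write_fd, message)        # file management: write()
        os.close(write_fd)
        os._exit(0)                        # process control: exit()

    # Parent: close the unused write end so read() can observe EOF.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)     # file management: read()
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    _, status = os.waitpid(pid, 0)         # process control: wait()
    return b"".join(chunks), os.WEXITSTATUS(status)
```

Calling child_says(b"hi") should hand back the same bytes together with exit status 0.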

Syscall Flow

[Figure: System Call Flow]

Example: calling fwrite(), which eventually invokes the write system call (sys_write in the kernel):

$ strace ./hello_userspace

execve("./hello_userspace", ["./hello_userspace"], 0x7ffd769d3740 /* 23 vars */) = 0
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
... read dynamic library linking index
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
... read symbols in libc library
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
... obtain the standard output
write(1, "USER: Hello World!\n", 19)    = 19

The trace lists the system calls the program invoked, such as execve, access, openat, read, write and close.
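
The source of hello_userspace is not shown here. A hypothetical Python stand-in that ends in the same write call might look like the sketch below; the names MESSAGE and say_hello are invented for illustration.

```python
import os

# Same 19-byte payload that appears in the trace above.
MESSAGE = b"USER: Hello World!\n"


def say_hello(fd: int = 1) -> int:
    """Issue one write() on `fd` (1 is standard output) and return the byte count."""
    return os.write(fd, MESSAGE)
```

Calling say_hello() from a script run under strace should produce a write(1, ...) line similar to the last line of the trace, surrounded by the interpreter's own start-up calls.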

Unix System Calls

Unix system calls provide an interface for interacting with the hardware. They abstract away the low-level programming, so users do not have to worry about the implementation of the underlying system and devices. This improves the security and portability of programs.

System calls are functions in the OS code, which programs can invoke using special instructions. They let programs interact with hardware, for example to write to a device or send data over the network.

The Linux kernel requires that for an entity to be a filesystem, it must implement the operations behind the following system calls:

  • open(): Open a file (creating it if requested) and return a new file descriptor (FD).
  • close(): Release the related FD.
  • read(): Copy bytes from the file into memory.
  • write(): Copy bytes from memory into the file.
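
The four calls can be sketched together as a round trip through a temporary file. This is a minimal example under the assumption of a writable temp directory; the function name roundtrip and the file name demo.bin are hypothetical.

```python
import os
import tempfile


def roundtrip(payload: bytes) -> bytes:
    """Write `payload` to a fresh file and read it back using raw FD calls."""
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "demo.bin")
    try:
        # open(): create the file and get a descriptor for writing.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        try:
            written = 0
            while written < len(payload):
                # write() reports how many bytes it accepted, so loop.
                written += os.write(fd, payload[written:])
        finally:
            os.close(fd)                   # close(): release the descriptor

        fd = os.open(path, os.O_RDONLY)
        try:
            # read() until it returns an empty result, which signals EOF.
            return b"".join(iter(lambda: os.read(fd, 4096), b""))
        finally:
            os.close(fd)
    finally:
        if os.path.exists(path):
            os.remove(path)
        os.rmdir(directory)
```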

An FD is a number that uniquely identifies an open file within the process. The read/write functions usually take the FD. Behind each descriptor is an underlying data structure, the open file “description”, which represents the open file. FDs can be copied, and the copies point to the same description.
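
One way to see that a copied FD shares its description is to check that both copies advance the same file position. The sketch below uses dup() for the copy; the function name dup_shares_offset is made up for this example.

```python
import os
import tempfile


def dup_shares_offset() -> tuple:
    """Read through two descriptors that refer to one open file description."""
    with tempfile.TemporaryFile() as tmp:
        fd = tmp.fileno()
        os.write(fd, b"abcdef")
        os.lseek(fd, 0, os.SEEK_SET)       # rewind to the start
        copy = os.dup(fd)                  # second FD, same description
        try:
            first = os.read(fd, 3)         # moves the shared position to 3
            second = os.read(copy, 3)      # continues from position 3
            position = os.lseek(fd, 0, os.SEEK_CUR)
        finally:
            os.close(copy)
    return first, second, position
```

If the two descriptors had independent descriptions, the second read would start again at the beginning of the file.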

Writes generally go to a write buffer, i.e. process -> write syscall -> write buffer (OS) -> disk. This is done mostly for performance reasons. The process might not know when (or even if) the data is written to disk, since the OS is responsible for flushing these buffers to the file. By default processes get no confirmation, and may even terminate before the data is written. Write-critical systems like databases can request confirmation explicitly, for example with fsync().
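
A program that needs to know its data has left the OS buffers can ask for that explicitly. The sketch below assumes fsync() is the mechanism of choice, which is one option among several; the function name durable_write is hypothetical.

```python
import os
import tempfile


def durable_write(payload: bytes) -> str:
    """Write `payload` to a new temp file, flush it to storage, return the path."""
    fd, path = tempfile.mkstemp()
    try:
        written = 0
        while written < len(payload):
            written += os.write(fd, payload[written:])   # lands in the OS buffer
        # fsync() returns only after the kernel reports that the buffered
        # data for this file has been handed to the storage device.
        os.fsync(fd)
    finally:
        os.close(fd)
    return path
```

The caller owns the returned file and is expected to remove it when done.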

The same flow exists for reads as well, i.e. disk -> read buffer (OS) -> process. Again, this is done mostly for performance reasons. A read initially blocks, as the process needs the data and disks are slow. More data than requested might be read into the buffer for performance reasons.

Writes usually do not block, since they return once the data is in the OS buffer, while reads may or may not block depending on whether the data is already present in the buffers.

read decides how much data to return: you request a maximum number of bytes, but read may return fewer (this might depend on the read buffers). It is the responsibility of the application to check how much data was received and act accordingly.

If there is data left to be read from the file, read always returns some of it; it does not return empty. If there is no data in the buffer, it blocks until data is received. Since an “empty” return indicates EOF, read needs to block the process in order to distinguish EOF from buffer delays. A marker (the file offset) keeps track of the last processed byte.
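
Both behaviours, a read that returns less than requested and an empty result at EOF, can be observed with a pipe. This is a small sketch; the helper names read_until_eof and short_read_demo are invented for illustration.

```python
import os


def read_until_eof(fd: int, chunk_size: int = 4096) -> bytes:
    """Keep calling read() until it returns an empty result (EOF)."""
    chunks = []
    while True:
        chunk = os.read(fd, chunk_size)
        if not chunk:                      # empty result: no more data, ever
            return b"".join(chunks)
        chunks.append(chunk)


def short_read_demo() -> tuple:
    """Show a short read followed by reading the rest up to EOF."""
    r, w = os.pipe()
    os.write(w, b"hello")
    first = os.read(r, 100)                # asks for up to 100 bytes, 5 are available
    os.write(w, b" world")
    os.close(w)                            # closing the last writer makes EOF reachable
    rest = read_until_eof(r)
    os.close(r)
    return first, rest
```

Without the os.close(w) call, the final read would block forever, because the kernel could not tell a slow writer from the end of the data.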

File size can automatically increase on writes, but does not shrink automatically. truncate() can be used to shrink (or increase) the file size. lseek() can be used to move the marker to arbitrary positions. umask() can be used to set/get the mask that restricts the default permissions of new files/directories created afterwards in a given process.
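
These three calls can be tried out on a scratch file. The sketch below uses ftruncate(), the FD-based sibling of truncate(); the function names resize_and_seek and mode_with_umask are hypothetical, and the permission check assumes a temp directory without default ACLs.

```python
import os
import stat
import tempfile


def resize_and_seek() -> tuple:
    """Grow a file by writing, shrink it with ftruncate(), then seek and read."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"0123456789")
        grown = os.fstat(fd).st_size       # size after the write
        os.ftruncate(fd, 4)                # keep only the first 4 bytes
        shrunk = os.fstat(fd).st_size
        os.lseek(fd, 2, os.SEEK_SET)       # move the marker to byte 2
        tail = os.read(fd, 100)            # reads from byte 2 to the new end
    finally:
        os.close(fd)
        os.remove(path)
    return grown, shrunk, tail


def mode_with_umask(mask: int) -> int:
    """Create a file requesting mode 0o666 under `mask`; return its permission bits."""
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "f")
    previous = os.umask(mask)              # umask() returns the old mask
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o666)
        os.close(fd)
    finally:
        os.umask(previous)                 # restore the process-wide mask
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.remove(path)
    os.rmdir(directory)
    return mode
```

The mask removes bits from the requested mode, so a request for 0o666 under a mask of 0o027 ends up as 0o640.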

It is important to note that reads and writes are not atomic: they can be interrupted by the OS in the middle, and the data buffers are written to and read from with no coordination. Multiple processes using the same files for reads and writes may see the data of all those writes/reads interleaved.

A process runs in isolation (its own memory space), and system calls are the way to break out of this isolation. In most modern OSes, the kernel code is mapped into the address space of the process itself, just above the stack. These pages are marked so that the process itself cannot access them; only when the process invokes a system call does execution actually jump to the kernel code.

When a system call is invoked, it uses the process’s stack (on Linux, the process’s own kernel-mode stack) to store a stack frame for that system call. This allows system calls to execute in the context of the process that invokes them, which avoids a context switch. It also allows system calls to be naturally interrupted. If system calls ran outside the context of the process, it would be tricky for the kernel to suspend a process while it is invoking a system call.

System calls exist for various purposes: processes, files, sockets, signals, IPC etc. Since system calls are invoked through special instructions, the standard libraries that ship with compilers and interpreters usually provide functions that invoke the system calls on the program’s behalf. These libraries bridge the semantic gap between the function used by the programmer and the actual system call.
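
The layering can be made visible by calling the C library's wrapper directly instead of going through the interpreter's own function. The sketch below assumes that ctypes.CDLL(None) exposes the C library already loaded into the process, which holds on typical Linux and macOS setups; the function name getpid_via_libc is made up.

```python
import ctypes
import os

# CDLL(None) gives access to symbols already linked into this process,
# which on typical Unix systems includes the C library.
libc = ctypes.CDLL(None, use_errno=True)
libc.getpid.argtypes = []
libc.getpid.restype = ctypes.c_int


def getpid_via_libc() -> int:
    """Call the C library's getpid() wrapper, bypassing os.getpid()."""
    return libc.getpid()
```

Both getpid_via_libc() and os.getpid() end up in the same system call, so they should agree.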

Most operating systems are monolithic in nature (e.g. Linux). The kernel is a single executable file, and all of it runs in system space. The kernel binary contains all Linux subsystems. The other type of OS is built on a microkernel (e.g. Mach), where subsystems communicate by passing messages and only the core kernel is present in system space. That core handles message passing, interrupts, I/O and process management.

How does a system call work

  • The kernel provides a set of interfaces by which processes running in user-space can interact with the system. These interfaces give applications controlled access to hardware, a mechanism with which to create new processes and communicate with existing ones, and the capability to request other operating system resources. Whenever a process needs these special resources like reading/writing to a file, getting any information from the hardware or requesting a space in memory etc., system calls will be used to get them access to these resources.

  • The libc code provides a common API for UNIX programs, and a portion of that API is the system calls. These APIs act as messengers between applications and the kernel: the application issues various requests and the kernel fulfills them (or returns an error). System calls provide a layer between the hardware and user-space processes. But system calls and APIs are not the same thing. APIs are function definitions that specify “how to obtain a specific service”. You generally don’t make system calls directly; instead you use an API. Each system call has a corresponding wrapper routine, which is the API that the application program must use to invoke that system call.

  • Because kernels are very dependent on the hardware architecture they run on, and libc needs to talk to the kernel, much of the code you’ll find in the libc source is organized by architecture. For example, if we look for stat, we see basically a single line of code that calls a function called __xstat. If we run find . -name xstat.c, we’ll see that we want ./sysdeps/unix/sysv/linux/i386/xstat.c, which is the implementation of stat for Linux on i386.

  • System calls are invoked through special predefined instructions that the process can execute to make a request to the operating system. The program calls these interfaces in its code when needed. Such an instruction is also known as a trap or a software interrupt. It triggers a transition from user mode to kernel mode (a mode switch, which is cheaper than a full context switch between processes).

  • System calls are executed in kernel mode on a priority basis. When the OS sees the system call, it temporarily halts the program’s execution and gives control to the kernel. The system uses an index to identify the system call and locate the corresponding kernel function. The kernel then performs the requested operation by running that function, and prepares the output that is to be returned. Once finished, the operating system switches back to user mode, gives control back to the program and resumes from where the interrupt occurred.
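
The "or returns an error" part of this flow is visible from user space: when the kernel rejects a request, the wrapper reports an error number. In Python that surfaces as an OSError; the sketch below turns it into the symbolic errno name. The function name open_errno is hypothetical.

```python
import errno
import os


def open_errno(path: str):
    """Try to open `path` read-only.

    Returns None on success, or the symbolic errno name on failure.
    """
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        # exc.errno carries the error number reported for the failed call.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return None
```

Opening a path that does not exist should yield "ENOENT", while opening os.devnull should succeed and return None.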

[Figure: System Call Flow]

  • To be more specific, when a user-mode process invokes a system call, the CPU switches to kernel mode and starts executing a kernel function (which happens to be an assembly language function) called the system call handler. This handler has a structure similar to that of other “exception handlers”.
  • Every system call has a number associated with it. This number is passed to the kernel, and that’s how the kernel knows which system call was made. When a user program issues a system call, it is actually calling a library routine. On 32-bit x86, the library routine issues a trap to the Linux kernel by executing the INT 0x80 assembly instruction and passes the system call number in the EAX register. The arguments of the system call are passed in other registers (EBX, ECX, etc.). On x86-64, the dedicated SYSCALL instruction is used instead, with the number in RAX.
  • The system call handler first saves the contents of the registers on the kernel-mode stack. Then, based on the system call number, it calls the relevant system call service routine, which in Linux is a C function that implements the functionality requested by the user process. After that’s done, the registers are restored to their previous values and the CPU switches back to user mode.
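
The role of the system call number can be sketched from user space with the C library's generic syscall() function, which takes the number as its first argument. The numbers in the table below are assumptions that apply only to 64-bit Linux on the listed architectures, so the sketch returns None anywhere else; the names GETPID_NUMBERS and getpid_by_number are made up for illustration.

```python
import ctypes
import os
import platform
import sys

# Assumed system call numbers for getpid on 64-bit Linux only.
# Other systems and ABIs number their calls differently.
GETPID_NUMBERS = {"x86_64": 39, "aarch64": 172}


def getpid_by_number():
    """Invoke getpid through its number, or return None if unsupported here."""
    if not sys.platform.startswith("linux") or sys.maxsize <= 2**32:
        return None
    number = GETPID_NUMBERS.get(platform.machine())
    if number is None:
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    # The library routine places the number where the kernel expects it
    # and executes the trap instruction on our behalf.
    return libc.syscall(ctypes.c_long(number))
```

On a supported platform the result should match os.getpid(), since both paths reach the same kernel service routine.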

Parameter Passing

OS writers and user programs rely upon convention when choosing where to store parameters and return values:

  • Simplest: put all values in registers (hopefully there are enough!)
  • Memory region: write the values to memory, then store the starting memory address in a register.
  • Stack: push values onto the stack; the OS will pop them off (based upon the stack register).
  • Usually, hardware constraints dictate which system call convention is used.
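
The memory-region convention is what read() relies on: the caller owns a buffer and passes its address and size, and the kernel fills it in. The sketch below makes that explicit by calling the C library's read() through ctypes, assuming ctypes.CDLL(None) exposes the C library; the function name read_into_buffer is hypothetical.

```python
import ctypes
import os

libc = ctypes.CDLL(None, use_errno=True)
# ssize_t read(int fd, void *buf, size_t count)
libc.read.argtypes = [ctypes.c_int, ctypes.c_void_p, ctypes.c_size_t]
libc.read.restype = ctypes.c_ssize_t


def read_into_buffer(fd: int, capacity: int) -> bytes:
    """Read up to `capacity` bytes from `fd` into a caller-owned buffer."""
    buf = ctypes.create_string_buffer(capacity)   # memory region owned by the caller
    count = libc.read(fd, buf, capacity)          # small values: fd, address, size
    if count < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return buf.raw[:count]                        # the return value is the byte count
```

Only three small values travel with the call itself; the bulk data moves through the buffer whose address was passed.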

System calls and library functions look similar from the caller’s side: both are invoked like ordinary functions, and in both cases the code resides outside the user program (in the kernel and in a library, respectively).

Types of System Calls

Process Control: These System calls play an essential role in controlling system processes. They enable you to create new processes or terminate existing ones, load and execute programs within a process’s space, schedule processes and set execution attributes, such as priority or wait for a process to complete or signal upon its completion. Some common system calls include fork(), exec(), wait(), exit(), getpid() etc.

File Management: These System calls support a wide array of file operations, such as reading from or writing to files, opening and closing files, deleting or modifying file attributes, moving or renaming files etc. Some common system calls include open(), close(), read(), write(), link(), lseek() etc.

Device Management: These System calls can be used to facilitate device management by requesting device access and releasing it after use, setting device attributes or parameters, reading from or writing to devices or mapping logical device names to physical devices. Some common system calls include ioctl(), and mmap() and munmap() when used to map device memory.

Information Maintenance: This type of system call enables processes to retrieve or modify various system attributes, set the system date and time or query system performance metrics. Some common system calls include time(), alarm(), getuid() etc.

Communication: The communication call type facilitates sending or receiving messages between processes, synchronizing actions between user processes, establishing shared memory regions for inter-process communication, networking via sockets etc. Some common system calls include socket(), bind(), send(), recv(), listen() etc.

Security and Access Control: System calls contribute to security and access control by determining which processes or users get access to specific resources and who can read, write, and execute resources. They also facilitate user authentication procedures. Some common system calls include chmod(), chroot(), chown(), umask() etc.
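
As a small example from the last category, the sketch below uses chmod() to restrict a scratch file to owner read-only and then reads the permission bits back with stat(). The function name restrict_to_owner is made up for this example.

```python
import os
import stat
import tempfile


def restrict_to_owner() -> int:
    """Create a temp file, make it owner read-only, return its permission bits."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        os.chmod(path, 0o400)                       # chmod(): r-- --- ---
        return stat.S_IMODE(os.stat(path).st_mode)  # stat(): read the bits back
    finally:
        # Removal depends on the directory's permissions, not the file's.
        os.remove(path)
```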

