Explore the fundamentals of systems programming starting from kernel API and filesystem to network programming and process communications
Key Features
Book Description
System software and applications were largely created using low-level languages such as C or C++. Go is a modern language that combines simplicity, concurrency, and performance, making it a good alternative for building system applications for Linux and macOS.
This Go book introduces Unix and systems programming to help you understand the components the OS has to offer, ranging from the kernel API to the filesystem, and familiarize yourself with Go and its specifications. You'll also learn how to optimize input and output operations with files and streams of data, which are useful tools in building pseudo terminal applications. You'll gain insights into how processes communicate with each other, and learn about processes and daemon control using signals, pipes, and exit codes. This book will also enable you to understand how to use network communication using various protocols, including TCP and HTTP.
As you advance, you'll focus on Go's best feature, concurrency, which helps you handle communication with channels and goroutines, use other concurrency tools to synchronize shared resources, and write elegant applications with the context package.
By the end of this book, you will have learned how to build concurrent system applications using Go.
What you will learn
Who this book is for
If you are a developer who wants to learn system programming with Go, this book is for you. Although no knowledge of Unix and Linux system programming is necessary, intermediate knowledge of Go will help you understand the concepts covered in the book.
Page count: 489
Year of publication: 2019
Copyright © 2019 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Richa Tripathi
Acquisition Editor: Shriram Shekhar
Content Development Editor: Tiksha Sarang
Senior Editor: Afshaan Khan
Technical Editor: Sabaah Navlekar
Copy Editor: Safis Editing
Language Support Editor: Storm Mann
Project Coordinator: Prajakta Naik
Proofreader: Safis Editing
Indexer: Rekha Nair
Production Designer: Shraddha Falebhai
First published: July 2019
Production reference: 1040719
Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.
ISBN 978-1-78980-407-2
www.packtpub.com
Packt.com
Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Fully searchable for easy access to vital information
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Alex Guerrieri is a software developer who specializes in backend and distributed systems. Go has been his favorite tool for the job since first using it in 2013. He holds a degree in computer science engineering and has been working in the field for more than 6 years. Originally from Italy, where he completed his studies and started his career as a full-stack developer, he now lives in Spain, where he previously worked in two start-ups—source{d} and Cabify. He is now working for three companies—as a software crafter for BBVA, one of the biggest Spanish banks; as a software architect for Security First, London, a company focusing on digital and physical security; and as a cofounder of DauMau, the company behind Vidsey, software that generates videos in a procedural manner.
Corey Scott is a principal software engineer currently living in Melbourne, Australia. He has been programming professionally since 2000, with the last 5 years spent building large-scale distributed services in Go.
A blogger on a variety of software-related topics, he is passionate about designing and building quality software. He believes that software engineering is a craft that should be honed, debated, and continuously improved. He takes a pragmatic, non-zealous approach to coding, and is always up for a good debate about software engineering, continuous delivery, testing, or clean coding.
Janani Selvaraj is currently working as a data analytics consultant for Gaddiel Technologies, Trichy, where she focuses on providing data analytics solutions for start-up companies. Her previous experience includes training and research development in relation to data analytics and machine learning.
She has a PhD in environmental management and has more than 5 years' research experience with regard to statistical modeling. She is also proficient in a number of programming languages, including R, Python, and Go.
She reviewed a book entitled Go Machine Learning Projects, and also coauthored a book entitled Machine Learning Using Go, published by Packt Publishing.
Arun Muralidharan is a software developer with over 9 years' experience as a systems developer. Distributed system design, architecture, event systems, scalability, performance, and programming languages are some of the aspects of a product that interest him the most. Professionally, he spends most of his time coding in C++, Python, and C (and perhaps Go in the near future). Away from his job, he also develops software in Go and Rust.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
Hands-On System Programming with Go
Dedication
About Packt
Why subscribe?
Contributors
About the author
About the reviewers
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Code in Action
Playground examples
Conventions used
Get in touch
Reviews
Section 1: An Introduction to System Programming and Go
An Introduction to System Programming
Technical requirements
Beginning with system programming
Software for software
Languages and system evolution
System programming and software engineering
Application programming interfaces
Types of APIs
Operating systems
Libraries and frameworks
Remote APIs
Web APIs
Understanding the protection ring
Architectural differences
Kernel space and user space
Diving into system calls
Services provided
Process control
File management
Device management
Information maintenance
Communication
The difference between operating systems
Understanding the POSIX standard
POSIX standards and features
POSIX.1 – core services
POSIX.1b and POSIX.1c – real-time and thread extensions
POSIX.2 – shell and utilities 
OS adherence
Linux and macOS
Windows
Summary
Questions
Unix OS Components
Technical requirements
Memory management
Techniques of management
Virtual memory
Understanding files and filesystems
Operating systems and filesystems
Linux
macOS
Windows
Files and hard and soft links
Unix filesystem
Root and inodes
Directory structure
Navigation and interaction
Mounting and unmounting
Processes
Process properties
Process life cycle
Foreground and background
Killing a job
Users, groups, and permissions
Users and groups
Owner, group, and others
Read, write, and execute
Changing permission
Process communications
Exit codes
Signals
Pipes
Sockets
Summary
Questions
An Overview of Go
Technical requirements
Language features
History of Go
Strengths and weaknesses
Namespace
Imports and exporting symbols
Type system
Basic types
Composite types
Custom-defined types
Variables and functions
Handling variables
Declaration
Operations
Casting
Scope
Constants
Functions and methods
Values and pointers
Understanding flow control
Condition
Looping
Exploring built-in functions
Defer, panic, and recover
Concurrency model
Understanding channels and goroutines
Understanding memory management
Stack and heap
The history of GC in Go
Building and compiling programs
Install
Build
Run
Summary
Questions
Section 2: Advanced File I/O Operations
Working with the Filesystem
Technical requirements
Handling paths
Working directory
Getting and setting the working directory
Path manipulation
Reading from files
Reader interface
The file structure
Using buffers
Peeking content
Closer and seeker
Writing to file
Writer interface
Buffers and format
Efficient writing
File modes
Other operations
Create
Truncate
Delete
Move
Copy
Stats
Changing properties
Third-party packages
Virtual filesystems
Filesystem events
Summary
Questions
Handling Streams
Technical requirements
Streams
Input and readers
The bytes reader
The strings reader
Defining a reader
Output and writers
The bytes writer
The string writer
Defining a writer
Built-in utilities
Copying from one stream to another
Connected readers and writers
Extending readers
Writers and decorators
Summary
Questions
Building Pseudo-Terminals
Technical requirements
Understanding pseudo-terminals
Beginning with teletypes
Pseudo teletypes
Creating a basic PTY
Input management
Selector
Command execution
Some refactor
Improving the PTY
Multiline input
Providing color support to the pseudo-terminal
Suggesting commands
Extensible commands
Commands with status
Volatile status
Persistent status
Upgrading the Stack command
Summary
Questions
Section 3: Understanding Process Communication
Handling Processes and Daemons
Technical requirements
Understanding processes
Current process
Standard input
User and group ID
Working directory
Child processes
Accessing child properties
Standard input
Beginning with daemons
Operating system support
Daemons in action
Services
Creating a service
Third-party packages
Summary
Questions
Exit Codes, Signals, and Pipes
Technical requirements
Using exit codes
Sending exit codes
Exit codes in bash
The exit value bit size
Exit and deferred functions
Panics and exit codes
Exit codes and goroutines
Reading child process exit codes
Handling signals
Handling incoming signals
The signal package
Graceful shutdowns
Exit cleanup and resource release
Configuration reload
Sending signals to other processes
Connecting streams
Pipes
Anonymous pipes
Standard input and output pipes
Summary
Questions
Network Programming
Technical requirements
Communicating via networks
OSI model
Layer 1 – Physical layer
Layer 2 – Data link layer
Layer 3 – Network layer
Layer 4 – Transport layer
Layer 5 – Session layer
Layer 6 – Presentation layer
Layer 7 – Application layer
TCP/IP – Internet protocol suite
Layer 1 – Link layer
Layer 2 – Internet layer
Layer 3 – Transport layer
Layer 4 – Application layer
Understanding socket programming
Network package
TCP connections
UDP connections
Encoding and checksum
Web servers in Go
Web server
HTTP protocol
HTTP/2 and Go
Using the standard package
Making a HTTP request
Creating a simple server
Serving filesystem
Navigating through routes and methods
Multipart request and files
HTTPS
Third-party packages
gorilla/mux
gin-gonic/gin
Other functionalities
HTTP/2 Pusher
WebSockets protocol
Beginning with the template engine
Syntax and basic usage
Creating, parsing, and executing templates
Conditions and loops
Template functions
RPC servers
Defining a service
Creating the server
Creating the client
Summary
Questions
Data Encoding Using Go
Technical requirements
Understanding text-based encoding
CSV
Decoding values
Encoding values
Custom options
JSON
Field tags
Decoder
Encoder
Marshaler and unmarshaler
Interfaces
Generating structs
JSON schemas
XML
Structure
Document Type Definition
Decoding and encoding
Field tags
Marshaler and unmarshaler
Generating structs
YAML
Structure
Decoding and encoding
Learning about binary encoding
BSON
Encoding
Decoding
gob
Interfaces
Encoding
Decoding
Interfaces
Proto
Structure
Code generation
Encoding
Decoding
gRPC protocol
Summary
Questions
Section 4: Deep Dive into Concurrency
Dealing with Channels and Goroutines
Technical requirements
Understanding goroutines
Comparing threads and goroutines
Threads
Goroutines
New goroutine
Multiple goroutines
Argument evaluation
Synchronization
Exploring channels
Properties and operations
Capacity and size
Blocking operations
Closing channels
One-way channels
Waiting receiver
Special values
nil channels
Closed channels
Managing multiple operations
Default clause
Timers and tickers
Timers
AfterFunc
Tickers
Combining channels and goroutines
Rate limiter
Workers
Pool of workers
Semaphores
Summary
Questions
Synchronization with sync and atomic
Technical requirements
Synchronization primitives
Concurrent access and lockers
Mutex
RWMutex
Write starvation
Locking gotchas
Synchronizing goroutines
Singleton in Go
Once and Reset
Resource recycling
Slices recycling issues
Conditions
Synchronized maps
Semaphores
Atomic operations
Integer operations
clicker
Thread-safe floats
Thread-safe Boolean
Pointer operations
Value
Under the hood
Summary
Questions
Coordination Using Context
Technical requirements
Understanding context
The interface
Default contexts
Background
TODO
Cancellation, timeout, and deadline
Cancellation
Deadline
Timeout
Keys and values 
Context in the standard library
HTTP requests
Passing scoped values
Request cancellation
HTTP server
Shutdown
Passing values
TCP dialing
Cancelling a connection
Database operations
Experimental packages
Context in your application
Things to avoid
Wrong types as keys
Passing parameters
Optional arguments
Globals
Building a service with Context
Main interface and usage
Exit and entry points
Exclude list
Handling directories
Checking file names and contents
Summary
Questions
Implementing Concurrency Patterns
Technical requirements
Beginning with generators
Avoiding leaks
Sequencing with pipelines
Muxing and demuxing
Fan-out
Fan-in
Producers and consumers
Multiple producers (N * 1)
Multiple consumers (1 * M)
Multiple consumers and producers (N*M)
Other patterns
Error groups
Leaky bucket
Sequencing
Summary
Questions
Section 5: A Guide to Using Reflection and CGO
Using Reflection
Technical requirements
What's reflection?
Type assertions
Interface assertion
Understanding basic mechanics
Value and Type methods
Kind
Value to interface
Manipulating values
Changing values
Creating new values
Handling complex types
Data structures
Changing fields
Using tags
Maps and slices
Maps
Slices
Functions
Analyzing a function
Invoking a function
Channels
Creating channels
Sending, receiving, and closing
Select statement
Reflecting on reflection
Performance cost
Usage in the standard library
Using reflection in a package
Property files
Using the package
Summary
Questions
Using CGO
Technical requirements
Introduction to CGO
Calling C code from Go
Calling Go code from C
The C and Go type systems
Strings and byte slices
Integers
Float types
Unsafe conversions
Editing a byte slice directly
Numbers
Working with slices
Working with structs
Structures in Go
Manual padding
Structures in C
Unpacked structures
Packed structures
CGO recommendations
Compilation and speed
Performance
Dependency from C
Summary
Questions
Assessments
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
Chapter 8
Chapter 9
Chapter 10
Chapter 11
Chapter 12
Chapter 13
Chapter 14
Chapter 15
Chapter 16
Other Books You May Enjoy
Leave a review - let other readers know what you think
This book will provide good, in-depth explanations of various interesting Go concepts. It begins with Unix and system programming, which will help you understand what components the Unix operating system has to offer, from the kernel API to the filesystem, and allow you to familiarize yourself with the basic concepts of system programming.
Next, it moves on to cover the application of I/O operations, focusing on the filesystem, files, and streams in the Unix operating system. It covers many topics, including reading from and writing to files, among other I/O operations.
This book also shows how various processes communicate with one another. It explains how to use Unix pipe-based communication in Go, how to handle signals inside an application, and how to use a network to communicate effectively. Also, it shows how to encode data to improve communication speed.
The book will, toward the end, help you to understand the most modern feature of Go—concurrency. It will introduce you to the tools the language has, along with sync and channels, and how and when to use each one.
This book is for developers who want to learn system programming with Go. Although no prior knowledge of Unix and Linux system programming is necessary, some intermediate knowledge of Go will help you to understand the concepts covered in the book.
Chapter 1, An Introduction to System Programming, introduces you to Go and system programming and provides some basic concepts and an overview of Unix and its resources, including the kernel API. It also defines many concepts that are used throughout the rest of the book.
Chapter 2, Unix OS Components, focuses on the Unix operating system and the components that you will interact with—files and the filesystem, processes, users and permissions, threads, and others. It also explains the various memory management techniques of the operating system, and how Unix handles resident and virtual memory.
Chapter 3, An Overview of Go, takes a look at Go, starting with some history of the language and then explaining, one by one, all its basic concepts, starting with namespaces and the type system, variables, and flow control, and finishing with built-in functions and the concurrency model, while also offering an explanation of how Go manages its memory.
Chapter 4, Working with the Filesystem, helps you to understand how the Unix filesystem works and how to master the Go standard library to handle file path operations, file reading, and file writing.
Chapter 5, Handling Streams, helps you to learn about the interfaces for the input and output streams that Go uses to abstract data flows. It explains how they work and how to combine them and best use them without leaking information.
Chapter 6, Building Pseudo-Terminals, helps you understand how a pseudo-terminal application works and how to create one. The result will be an interactive application that uses standard streams just as the command line does.
Chapter 7, Handling Processes and Daemons, provides an explanation of what processes are and how to handle them in Go, how to start child processes from a Go application, and how to create a command-line application that will stay in the background (a daemon) and interact with it.
Chapter 8, Exit Codes, Signals, and Pipes, discusses Unix inter-process communication. It explains how to use exit codes effectively. It shows you how signals are handled by default inside an application, and how to manage them with some patterns for effective signal handling. Furthermore, it explains how to connect the output and input of different processes using pipes.
Chapter 9, Network Programming, explains how to use a network to make processes communicate. It explains how network communication protocols work. It initially focuses on low-level socket communication, such as TCP and UDP, before moving on to web server development using the well-known HTTP protocol. Finally, it shows how to use the Go template engine.
Chapter 10, Data Encoding Using Go, explains how to leverage the Go standard library to encode complex data structures in order to facilitate process communications. It analyzes both text-based protocols, such as XML and JSON, and binary-based protocols, such as GOB.
Chapter 11, Dealing with Channels and Goroutines, explains the basics of concurrency and channels and some general rules that prevent the creation of deadlocks and resource-leaking inside an application.
Chapter 12, Synchronization with sync and atomic, discusses the sync and sync/atomic packages of the standard library, and how they can be used instead of channels to achieve concurrency easily. It also focuses on avoiding the leaking of resources and on recycling resources.
Chapter 13, Coordination Using Context, discusses Context, a relatively new package introduced in Go that offers a simple way of handling asynchronous operations effectively.
Chapter 14, Implementing Concurrency Patterns, uses the tools from the previous three chapters and demonstrates how to use and combine them to communicate effectively. It focuses on the most common patterns used in Go for concurrency.
Chapter 15, Using Reflection, explains what reflection is and whether you should use it. It shows where it's used in the standard library and guides you in creating a practical example. It also shows how to avoid using reflection where there is no need to.
Chapter 16, Using CGO, explains how CGO works and why and when you should use it. It explains how to use C code inside a Go application, and vice versa.
Some basic knowledge of Go is required to try the examples and to build modern applications.
Each chapter includes a set of questions that will help you to gauge your understanding of the chapter. The answers to these questions are provided in the Assessments section of the book. These questions will prove very beneficial for you, as they will help you revisit each chapter at a glance.
Apart from this, each chapter provides you with instructions on how to run the code files, while the GitHub repository of the book provides the requisite details.
You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
Log in or register at www.packt.com.
Select the SUPPORT tab.
Click on Code Downloads & Errata.
Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Hands-On-System-Programming-with-Go. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781789804072_ColorImages.pdf.
Visit the following link to check out videos of the code being run: http://bit.ly/2ZWgJb5.
In the course of the book you will find many snippets of code followed by a link to https://play.golang.org, a service that allows you to run Go applications with some limitations. You can read more about it at https://blog.golang.org/playground.
In order to see the full source code of such examples, you need to visit the Playground link. Once on the website, you can press the Run button to execute the application. The bottom part of the page will show the output. The following is an example of the code running in the Go Playground:
If you want, you can experiment by adding to and editing the code in the examples, and then running them.
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "This type of service includes load, which adds a program to memory and prepares for its execution before passing control to the program itself, or execute, which runs an executable file in the context of a pre-existing process."
A block of code is set as follows:
<meta name="go-import" content="package-name vcs repository-url">
Any command-line input or output is written as follows:
Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux
Bold: Indicates a new term, an important word, or words that you see on screen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "In the meantime, systems started to get distributed, and applications started to get shipped in containers, orchestrated by other system software, such as Kubernetes."
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packt.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in, and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.
This section is an introduction to Unix and system programming. It will help you understand what components Unix operating systems have to offer, from the kernel API to the filesystem, and you will become familiar with system programming's basic concepts.
This section consists of the following chapters:
Chapter 1, An Introduction to System Programming
Chapter 2, Unix OS Components
Chapter 3, An Overview of Go
This chapter is an introduction to system programming, exploring a range of topics from its original definition to how it has shifted over time as systems evolved. This chapter provides some basic concepts and an overview of Unix and its resources, including the kernel and the application programming interfaces (APIs). Many of these concepts are defined here and are used in the rest of the book.
The following topics will be covered in this chapter:
What is system programming?
Application programming interfaces
Understanding how the protection ring works
An overview of system calls
The POSIX standard
This chapter does not require you to install any special software if you're on Linux.
If you are a Windows user, you can install the Windows Subsystem for Linux (WSL). Follow these steps in order to install WSL:
Open PowerShell as administrator and run the following:
Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux
Restart your computer when prompted.
Install your favorite Linux distribution from the Microsoft Store.
Over the years, the IT landscape has shifted dramatically. Multicore CPUs that challenge the Von Neumann machine, the internet, and distributed systems are just some of the changes that occurred in the last 30 years. So, where does system programming stand in this landscape?
Let's start with the standard textbook definition first.
System programming (or systems programming) is the activity of programming computer system software. The primary distinguishing characteristic of system programming when compared to application programming is that application programming aims to produce software that provides services directly to the user (for example, a word processor), whereas system programming aims to produce software and software platforms that provide services to other software, and are designed to work in performance-constrained environments, for example, operating systems, computational science applications, game engines and AAA video games, industrial automation, and software as a service applications.
The definition highlights two main characteristics of system applications:
Software that is used by other software, not directly by the final user.
The software is hardware aware (it knows how the hardware works), and is oriented toward performance.
This makes it easy to classify operating system kernels, hardware drivers, compilers, and debuggers as system software, and to exclude a chat client or a word processor.
Historically, system programs were created using Assembly and C. Then came the shells and the scripting languages that were used to tie together the functionality offered by system programs. Another characteristic of system languages was the control of the memory allocation.
In the last decade, scripting languages gained so much popularity that some received significant performance improvements and entire systems were built with them. Consider, for example, the V8 engine for JavaScript and the PyPy implementation of Python, which dramatically improved the performance of these languages.
Other languages, such as Go, proved that garbage collection and performance are not mutually exclusive. In particular, Go managed to replace its own memory allocator written in C with a native version written in Go in release 1.5, improving it to the point where the performance was comparable.
In the meantime, systems started to get distributed and the applications started to get shipped in containers, orchestrated by other system software, such as Kubernetes. These systems are meant to sustain huge throughput and achieve it in two main ways:
By scaling—augmenting the number or the resources of the machines that are hosting the system
By optimizing the software in order to be more resource effective
Some of the practices of system programming, such as tying an application to the hardware, orienting it toward performance, and working in a resource-constrained environment, can also be valid when building distributed systems, where constraining resource usage reduces the number of instances needed. System programming thus looks like a good way of addressing generic software engineering problems.
This means that learning the concepts of system programming that concern using the machine's resources efficiently, from memory usage to filesystem access, will be useful when building any type of application.
APIs are series of subroutine definitions, communication protocols, and tools for building software. The most important aspects of an API are the functionalities that it offers, combined with its documentation, which helps the user to use the software and integrate it into other software. An API can be the interface that allows application software to use system software.
An API usually has a specific release policy that is meant to be used by a specific group of recipients. This can be the following:
Private and for internal use only
Partner and usable by determined groups only—this may include companies that want to integrate the service with theirs
Public and available for every user
We'll see that there are several types of APIs, from the ones used to make different application software work together, to the inner ones exposed by the operating system to other software.
An API can specify how to interface an application and the operating system. For instance, Windows, Linux, and macOS have an interface that makes it possible to operate with the filesystem and files.
The API related to a software library describes and prescribes (provides instructions on how to use it) how each of its elements should behave, including the most common error scenarios. The behavior and interfaces of the API are usually referred to as the library specification, while the library is the implementation of the rules described in that specification. Libraries and frameworks are usually language bound, but there are some tools that make it possible to use a library in a different language. You can use C code in Go using cgo, and in Python you can call C code through extension modules written against the CPython C API.
Remote APIs make it possible to manipulate remote resources using specific standards for communication that allow different technologies to work together, regardless of the language or platform. A good example is the Java Database Connectivity (JDBC) API, which allows querying many different types of databases with the same set of functions, or the Java remote method invocation API (Java RMI), which allows the use of remote functions as if they were local.
Web APIs are interfaces that define a series of specifications about the protocol used, message encoding, and available endpoints with their expected input and output values. There are two main paradigms for this kind of API—REST and SOAP:
REST APIs have the following characteristics:
They treat data as a resource.
Each resource is identified by a URL.
The type of operation is specified by the HTTP method.
SOAP protocols have the following characteristics:
They are defined by the W3C standard.
XML is the only encoding used for messages.
They use a series of XML schema to verify the data.
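The REST characteristics listed above can be sketched with Go's net/http package. This is only an illustration: the /users/42 URL, the userHandler function, and the response texts are hypothetical names chosen for the example, and the handler is exercised in memory through net/http/httptest instead of a real server.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// userHandler serves a hypothetical "user" resource identified by its URL.
// The HTTP method selects the operation performed on the resource.
func userHandler(w http.ResponseWriter, r *http.Request) {
	switch r.Method {
	case http.MethodGet:
		fmt.Fprintf(w, "read %s", r.URL.Path)
	case http.MethodPut:
		fmt.Fprintf(w, "update %s", r.URL.Path)
	case http.MethodDelete:
		fmt.Fprintf(w, "delete %s", r.URL.Path)
	default:
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
	}
}

// call runs the handler in memory and returns the status code and the body.
func call(method, url string) (int, string) {
	req := httptest.NewRequest(method, url, nil)
	rec := httptest.NewRecorder()
	userHandler(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	for _, m := range []string{http.MethodGet, http.MethodDelete, http.MethodPatch} {
		code, body := call(m, "/users/42")
		fmt.Println(m, code, body)
	}
}
```

The same URL is used for every request; only the method changes, which is what distinguishes reading, updating, and deleting the resource.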
The protection ring, also referred to as hierarchical protection domains, is the mechanism used to protect a system against failure. Its name is derived from the hierarchical structure of its levels of permission, represented by concentric rings, with privilege decreasing when moving to the outside rings. Between each ring there are special gates that allow the outer ring to access the inner ring resources in a restricted manner.
The number and order of rings depend on the CPU architecture. They are usually numbered with decreasing privilege, making ring 0 the most privileged one. This is true for the i386 and x64 architectures, which use four rings (from ring 0 to ring 3), but it's not true for ARM, which uses the reverse order (from EL3 to EL0). Most operating systems do not use all four levels; they end up using a two-level hierarchy: user/application (ring 3) and kernel (ring 0).
Software that runs under an operating system will be executed at user (ring 3) level. In order to access the machine resources, it will have to interact with the operating system kernel (that runs at ring 0). Here's a list of some of the operations a ring 3 application cannot do:
Modify the current segment descriptor, which determines the current ring
Modify the page tables, preventing one process from seeing the memory of other processes
Use the LGDT and LIDT instructions, preventing them from registering interrupt handlers
Use I/O instructions such as in and out that would ignore file permissions and read directly from disk
The access to the content of the disk, for instance, will be mediated by the kernel that will verify that the application has permission to access the data. This kind of negotiation improves security and avoids failures, but comes with an important overhead that impacts the application performance.
Some applications can be designed to run directly on the hardware without the framework provided by an operating system. This is true for real-time systems, where there is no compromise on response times and performance.
System calls are the way operating systems provide applications with access to resources. They form an API implemented by the kernel for accessing the hardware safely.
There are some categories that we can use to split the numerous functions offered by the operating system. These include the control of the running applications and their flow, the filesystem access, and the network.
Process control services include load, which adds a program to memory and prepares for its execution before passing control to the program itself, and execute, which runs an executable file in the context of a pre-existing process. Other operations that belong to this category are as follows:
end and abort: the first requires the application to exit while the second forces it.
CreateProcess, also known as fork on Unix systems or NtCreateProcess in Windows.
Terminate process.
Get/set process attributes.
Wait for time, wait event, or signal event.
Allocate and free memory.
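As a minimal sketch of these process control operations from Go, the following example creates a child process, waits for it, and reads its exit status through the os/exec package. It assumes a Unix system where the sh shell is available; runAndWait and the script passed to it are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// runAndWait creates a child process, waits for it to terminate, and
// returns its exit code. The shell script is only an illustration.
func runAndWait(script string) (int, error) {
	cmd := exec.Command("sh", "-c", script)
	if err := cmd.Start(); err != nil { // create the process
		return -1, err
	}
	err := cmd.Wait() // block until the child terminates
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return exitErr.ExitCode(), nil // the child ended with a non-zero status
	}
	if err != nil {
		return -1, err
	}
	return 0, nil
}

func main() {
	code, err := runAndWait("exit 3")
	fmt.Println("exit code:", code, "error:", err)
}
```

Start and Wait are kept separate here to mirror the create and wait operations in the list; cmd.Run would combine the two steps.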
The handling of files and filesystems belongs to file management system calls. There are create and delete files that make it possible to add or remove an entry from the filesystem, and open and close operations that make it possible to gain control of a file in order to execute read and write operations. It is also possible to read and change file attributes.
Device management handles all other devices but the filesystem, such as frame buffers or display. It includes all operations from the request of a device, including the communication to and from it (read, write, seek), and its release. It also includes all the operations of changing device attributes and logically attaching and detaching them.
Reading and writing the system date and time belongs to the information maintenance category. This category also takes care of other system data, such as the environment. Another important set of operations that belongs here is the request and the manipulation of processes, files, and device attributes.
All the network operations from handling sockets to accepting connections fall into the communication category. This includes the creation, deletion, and naming of connections, and sending and receiving messages.
Windows has a series of different system calls that cover all the kernel operations. Many of these correspond exactly with the Unix equivalent. Here's a list of some of the overlapping system calls:
Category                  Windows                          Unix
Process control           CreateProcess()                  fork()
                          ExitProcess()                    exit()
                          WaitForSingleObject()            wait()
File manipulation         CreateFile()                     open()
                          ReadFile()                       read()
                          WriteFile()                      write()
                          CloseHandle()                    close()
File protection           SetFileSecurity()                chmod()
                          InitializeSecurityDescriptor()   umask()
                          SetSecurityDescriptorGroup()     chown()
Device management         SetConsoleMode()                 ioctl()
                          ReadConsole()                    read()
                          WriteConsole()                   write()
Information maintenance   GetCurrentProcessId()            getpid()
                          SetTimer()                       alarm()
                          Sleep()                          sleep()
Communication             CreatePipe()                     pipe()
                          CreateFileMapping()              shmget()
                          MapViewOfFile()                  mmap()
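Some of the Unix calls in the table can be reached almost directly from Go through the syscall package. The following is a minimal sketch, assuming a Unix system; writeAndRead, demo, and the demo.txt file name are hypothetical and exist only for this example. In everyday Go code, the higher-level os package is normally preferred over raw system calls.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// writeAndRead uses the open, write, read, and close system calls
// to store data in a file and read it back.
func writeAndRead(path string, data []byte) ([]byte, error) {
	fd, err := syscall.Open(path, syscall.O_CREAT|syscall.O_WRONLY|syscall.O_TRUNC, 0600)
	if err != nil {
		return nil, err
	}
	if _, err := syscall.Write(fd, data); err != nil {
		syscall.Close(fd)
		return nil, err
	}
	if err := syscall.Close(fd); err != nil {
		return nil, err
	}

	fd, err = syscall.Open(path, syscall.O_RDONLY, 0)
	if err != nil {
		return nil, err
	}
	defer syscall.Close(fd)
	buf := make([]byte, len(data))
	n, err := syscall.Read(fd, buf)
	if err != nil {
		return nil, err
	}
	return buf[:n], nil
}

// demo runs writeAndRead on a file inside a temporary directory.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "syscalls")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	out, err := writeAndRead(filepath.Join(dir, "demo.txt"), []byte("hello"))
	return string(out), err
}

func main() {
	out, err := demo()
	fmt.Println("read back:", out, "error:", err)
	fmt.Println("getpid:", syscall.Getpid())
}
```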
In order to ensure consistency between operating systems, IEEE formalized some standards for operating systems. These are described in the following sections.
Portable Operating System Interface (POSIX) for Unix represents a series of standards for operating system interfaces. The first version dates back to 1988 and covers a series of topics like filenames, shells, and regular expressions.
There are many features defined by POSIX, and they are organized in four different standards, each one focusing on a different aspect of the Unix compliance. They are all named POSIX followed by a number.
POSIX.1 is the 1988 original standard, which was initially named POSIX but was renamed to make it possible to add more standards to the family without giving up the name. It defines the following features:
Process creation and control
Signals:
Floating point exceptions
Segmentation/memory violations
Illegal instructions
Bus errors
Timers
File and directory operations
Pipes
C library (standard C)
I/O port interface and control
Process triggers
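Signals are one of the POSIX.1 features that are easy to try from Go. The sketch below, which assumes a Unix system, registers interest in SIGUSR1 with the os/signal package and then sends that signal to the current process. The choice of SIGUSR1, the two-second timeout, and the catchOwnSignal name are assumptions made for the example.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// catchOwnSignal asks to be notified of SIGUSR1, sends that signal to
// the current process, and returns the signal that was received.
func catchOwnSignal() (os.Signal, error) {
	ch := make(chan os.Signal, 1) // buffered, so the notification is not lost
	signal.Notify(ch, syscall.SIGUSR1)
	defer signal.Stop(ch)

	if err := syscall.Kill(syscall.Getpid(), syscall.SIGUSR1); err != nil {
		return nil, err
	}
	select {
	case sig := <-ch:
		return sig, nil
	case <-time.After(2 * time.Second):
		return nil, fmt.Errorf("timed out waiting for the signal")
	}
}

func main() {
	sig, err := catchOwnSignal()
	fmt.Println("received:", sig, "error:", err)
}
```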
POSIX.1b focuses on real-time applications and on applications that need high performance. It covers these aspects:
Priority scheduling
Real-time signals
Clocks and timers
Semaphores
Message passing
Shared memory
Asynchronous and synchronous I/O
Memory locking interface
POSIX.1c introduces the multithread paradigm and defines the following:
Thread creation, control, and cleanup
Thread scheduling
Thread synchronization
Signal handling
POSIX.2 specifies standards for both a command-line interpreter and utility programs such as cd, echo, or ls.
Not all operating systems are POSIX compliant. Windows was born after the standard and it is not compliant, for instance. From a certification point of view, macOS is more compliant than Linux, because the latter uses another standard built on top of POSIX.
Most Linux distributions follow the Linux Standard Base (LSB), which is another standard that includes POSIX and much more, focusing on maintaining the inter-compatibility between different Linux distributions. Linux is not considered officially compliant because its developers did not go through the certification process.
However, macOS became fully compatible in 2007 with the Leopard release, and it has been POSIX-certified since then.
Windows is not POSIX compliant, but there have been many attempts to make it so. There are open source initiatives such as Cygwin, which provides a largely POSIX-compliant environment, and MinGW, which provides a less POSIX-compliant development environment and supports C applications using the Microsoft Visual C runtime library. Microsoft itself has made some attempts at POSIX compatibility, such as the Microsoft POSIX subsystem. The latest compatibility layer made by Microsoft is the Windows Subsystem for Linux (WSL), which is an optional feature that can be activated in Windows 10 and has been well received by developers (including myself).
In this chapter, we saw what system programming means: writing system software that has some strict requirements, such as being tied to the hardware, using a low-level language, and working in a resource-constrained environment. Its practices can be really useful when building distributed systems that normally require optimizing resource usage. We discussed APIs, definitions that allow software to be used by other software, and listed the different types: the ones in the operating system, libraries and frameworks, and remote and web APIs.
We analyzed how, in operating systems, the access to resources is arranged in hierarchical levels called protection rings that prevent uncontrolled usage in order to improve security and avoid failures from the applications. The Linux model simplifies this hierarchy to just two levels called user and kernel space. All the applications are running in the user space, and in order to access the machine's resources they need the kernel to intercede.
Then we saw one specific type of API called system calls, which allow applications to request resources from the kernel and which mediate process control, access to and management of files and devices, and network communications.
We gave an overview of the POSIX standard, which defines Unix system interoperability. Among the features defined, there are also the C API, CLI utilities, shell language, environment variables, program exit status, regular expressions, directory structures, filenames, and command-line utility API conventions.
In the next chapter, we will explore the Unix operating system resources such as the filesystem and the Unix permission model. We will look at what processes are, how they communicate with each other, and how they handle errors.
What is the difference between application and system programming?
What is an API? Why are APIs so important?
Could you explain how protection rings work?
Can you give some examples of what cannot be done in user space?
What's a system call?
Which calls are used in Unix to manage a process?
Why is POSIX useful?
Is Windows POSIX compliant?
This chapter will be focusing on Unix OS and on the components that the user will interact with: files and filesystems, processes, users and permissions, and so on. It will also explain some basic process communication and how system program error handling works. All these parts of the operating system will be the ones we will be interacting with when creating system applications.
The following topics will be covered in this chapter:
Memory management
Files and filesystems
Processes
Users, groups, and permissions
Process communications
In the same way as the previous chapter, this one does not require any software to be installed: any POSIX-compliant shell is enough.
You could choose, for instance, Bash (https://www.gnu.org/software/bash/), which is recommended, Zsh (http://www.zsh.org/), or fish (https://fishshell.com/).
The operating system handles the primary and secondary memory usage of the applications. It keeps track of how much of the memory is used, by which process, and what parts are free. It also handles the allocation of new memory to the processes and memory de-allocation when the processes are complete.
There are different techniques for handling memory, including the following:
Single allocation: All the memory, besides the part reserved for the OS, is available for the application. This means that there can only be one application in execution at a time, like in Microsoft Disk Operating System (MS-DOS).
Partitioned allocation: This divides the memory into different blocks called partitions. Using one of these blocks per process makes it possible to execute more than one process at once. The partitions can be relocated and compacted in order to obtain more contiguous memory space for the next processes.
Paged memory: The memory is divided into parts called frames, which have a fixed size. A process' memory is divided into parts of the same size called pages. There is a mapping between pages and frames that makes the process see its own virtual memory as contiguous. This process is also known as pagination.
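The arithmetic behind paged memory can be illustrated with a short Go sketch. The pagesNeeded and split helpers are hypothetical names introduced for this example; os.Getpagesize is the standard library call that reports the page size used by the underlying system.

```go
package main

import (
	"fmt"
	"os"
)

// pagesNeeded returns how many fixed-size pages are required to hold
// size bytes, rounding up to a whole page.
func pagesNeeded(size, pageSize int) int {
	if size <= 0 || pageSize <= 0 {
		return 0
	}
	return (size + pageSize - 1) / pageSize
}

// split translates a virtual address into a page number and an offset
// inside that page.
func split(addr, pageSize int) (page, offset int) {
	return addr / pageSize, addr % pageSize
}

func main() {
	ps := os.Getpagesize()
	fmt.Println("page size:", ps)
	fmt.Println("pages for 10000 bytes:", pagesNeeded(10000, ps))
	page, offset := split(10000, ps)
	fmt.Println("address 10000 is in page", page, "at offset", offset)
}
```

With a page size of 4,096 bytes, for instance, 10,000 bytes need three pages, and address 10,000 falls in page 2 at offset 1,808.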
Unix uses the paged memory management technique, abstracting its memory for each application into contiguous virtual memory. It also uses a technique called swapping, which extends the virtual memory to the secondary memory (hard drive or solid state drives (SSD)) using a swap file.
When memory is scarce, the operating system moves pages that belong to sleeping processes into the swap file in order to make space for active processes that are requesting more memory. This operation is called swap-out. When a page that is in the swap file is needed by a process in execution, it gets loaded back into the main memory. This is called swap-in.
The main issue of swapping is the performance drop when interacting with secondary memory, but it is very useful for extending multitasking capabilities and for dealing with applications that are bigger than the physical memory, by loading just the pieces that are actually needed at a given time. Creating memory-efficient applications is a way of increasing performance by avoiding or reducing swapping.
The top command shows details about available memory, swap, and memory consumption for each process:
RES is the physical primary memory used by the process.
VIRT is the total memory used by the process, including the swapped memory, so it's equal to or bigger than RES.
SHR is the part of VIRT that is actually shareable, such as loaded libraries.
A filesystem is a method used to structure data in a disk, and a file is the abstraction used for indicating a piece of self-contained information. If the filesystem is hierarchical, it means that files are organized in a tree of directories, which are special files used for arranging stored files.
Over the last 50 years, a large number of filesystems have been invented and used, and each one has its own characteristics regarding space management, filenames and directories, metadata, and access restriction. Each modern operating system mainly uses a single type of filesystem.
Linux's filesystem (FS) of choice is the extended filesystem (EXT) family, but other ones are also supported, including XFS, Journaled File System (JFS), and B-tree File System (Btrfs). It is also compatible with the older File Allocation Table (FAT) family (FAT16 and FAT32) and New Technology File System (NTFS). The filesystem most commonly used remains the latest version of EXT (EXT4), which was released in 2006 and expanded its predecessor's capacities, including support for bigger disks.
macOS uses the Apple File System (APFS), which supports Unix permissions and has journaling. It is also metadata-rich and case-preserving, while being a case-insensitive filesystem. It offers support for other filesystems, including HFS+ and FAT32, and supports NTFS for read-only operations. To write to such a filesystem, we can use an experimental feature or third-party applications.
The main filesystem used by Windows is NTFS. As well as being case-insensitive, the signature feature that distinguishes Windows FS from others is the use of a letter followed by a colon to represent a partition in paths, combined with the use of a backslash as a folder separator, instead of a forward slash. Drive letters, and the use of C for the primary partition, come from MS-DOS, where A and B were reserved drive letters used for floppy disk drives.
Windows also natively supports other filesystems, such as FAT, which is a filesystem family that was very popular between the late seventies and the late nineties, and Extended File Allocation Table (exFAT), which is a format developed by Microsoft on top of FAT for removable devices.
Most files are regular files, containing a certain amount of data. For instance, a text file contains a sequence of human-readable characters represented by a certain encoding, while a bitmap contains some metadata about the size and the bit used for each pixel, followed by the content of each pixel.
Files are arranged inside directories that make it possible to have different namespaces to reuse filenames. These are referred to with a name, their human-readable identifier, and organized in a tree structure. The path is a unique identifier that represents a directory, and it is made of the names of all the parents of the directory joined by a separator (/ in Unix, \ in Windows), descending from the root to the desired leaf. For instance, if a directory named a is located under another named b, which is under one called c, it will have a path that starts from the root and concatenates all the directories, up to the directory itself: /c/b/a.
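The way a path is built from the names of the parent directories can be reproduced with Go's path package, which always uses the forward slash as a separator. The buildPath helper is a hypothetical name used for this sketch; for paths that must follow the conventions of the host operating system, the path/filepath package offers the same functions.

```go
package main

import (
	"fmt"
	"path"
)

// buildPath joins directory names, from the root down to the leaf,
// into an absolute Unix-style path.
func buildPath(names ...string) string {
	return path.Join(append([]string{"/"}, names...)...)
}

func main() {
	p := buildPath("c", "b", "a")
	fmt.Println(p)            // /c/b/a
	fmt.Println(path.Dir(p))  // /c/b
	fmt.Println(path.Base(p)) // a
}
```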
When more than one filename points to the same content, we have a hard link, but this is not allowed in all filesystems (FAT, for example, does not support it). A soft link, also called a symbolic link, is a small file with its own data, which is the path of another file. A hard link can be removed without affecting the other links to the same content, but this is not true for soft links: when the target of a soft link is removed, the link is left broken. Unlike hard links, soft links can also point to other filesystems, or to files and directories that do not exist (that will be a broken link).
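The different behavior of the two kinds of link can be observed with a small Go experiment. This is a sketch that assumes a Unix filesystem with support for both hard and symbolic links; linkDemo and the file names are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// linkDemo creates a file with a hard link and a symbolic link to it,
// removes the original name, and reports what is still reachable.
func linkDemo() (hardContent string, softBroken bool, err error) {
	dir, err := os.MkdirTemp("", "links")
	if err != nil {
		return "", false, err
	}
	defer os.RemoveAll(dir)

	orig := filepath.Join(dir, "original.txt")
	hard := filepath.Join(dir, "hard.txt")
	soft := filepath.Join(dir, "soft.txt")

	if err := os.WriteFile(orig, []byte("data"), 0600); err != nil {
		return "", false, err
	}
	if err := os.Link(orig, hard); err != nil { // second name, same content
		return "", false, err
	}
	if err := os.Symlink(orig, soft); err != nil { // file that stores a path
		return "", false, err
	}
	if err := os.Remove(orig); err != nil {
		return "", false, err
	}

	b, err := os.ReadFile(hard) // the content is still reachable
	if err != nil {
		return "", false, err
	}
	_, statErr := os.Stat(soft) // Stat follows the link to the missing target
	return string(b), os.IsNotExist(statErr), nil
}

func main() {
	content, broken, err := linkDemo()
	fmt.Println("hard link content:", content)
	fmt.Println("soft link broken:", broken, "error:", err)
}
```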
In Unix, some resources that are not actually files are represented as files, and communication with these resources is achieved by writing to or reading from their corresponding files. For instance, the /dev/sda file represents an entire disk, while /dev/stdout, /dev/stdin, and /dev/stderr are standard output, input, and error. The main advantage of Everything is a file is that the same tools that can be used for files can also interact with other devices (network and pipes) or entities (processes).
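Because these resources look like files, the ordinary file functions of Go's os package work on them too. The sketch below reads from /dev/zero and writes to /dev/null; these two devices are not mentioned above and are an assumption of the example (they exist on Linux and macOS), as is the readDevice helper.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// readDevice opens a device file like any other file and reads n bytes.
func readDevice(path string, n int) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	buf := make([]byte, n)
	if _, err := io.ReadFull(f, buf); err != nil {
		return nil, err
	}
	return buf, nil
}

func main() {
	// /dev/zero produces an endless stream of zero bytes.
	b, err := readDevice("/dev/zero", 4)
	fmt.Println(b, err)

	// Writing works the same way: /dev/null discards what it receives.
	if err := os.WriteFile("/dev/null", []byte("ignored"), 0); err != nil {
		fmt.Println("write failed:", err)
	}
}
```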
The principles contained in this section are specific to the filesystems used by Linux, such as EXT4.
In Linux and macOS, each file and directory is represented by an inode, which is a special data structure that stores all the information about the file except its name and its actual data.
Inode 0 is used for a null value, which means that there is no inode. Inode 1 is used to record any bad block on the disk. The root of the hierarchical structure of the filesystem uses inode 2. It is represented by /.
From the latest Linux kernel source, we can see how the first inodes are reserved. This is shown as follows:
#define EXT4_BAD_INO          1 /* Bad blocks inode */
#define EXT4_ROOT_INO         2 /* Root inode */
#define EXT4_USR_QUOTA_INO    3 /* User quota inode */
#define EXT4_GRP_QUOTA_INO    4 /* Group quota inode */
#define EXT4_BOOT_LOADER_INO  5 /* Boot loader inode */
#define EXT4_UNDEL_DIR_INO    6 /* Undelete directory inode */
#define EXT4_RESIZE_INO       7 /* Reserved group descriptors inode */
#define EXT4_JOURNAL_INO      8 /* Journal inode */
This link is the source for the preceding code block: https://elixir.bootlin.com/linux/latest/source/fs/ext4/ext4.h#L212.
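The inode number of a file can be inspected from Go on Unix systems through the platform-specific data returned by os.Stat. The following is a minimal sketch; inodeOf is a hypothetical helper, and the number printed for / depends on the filesystem in use (it is 2 on EXT4, but it can differ elsewhere, for example inside a container).

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// inodeOf returns the inode number of the file at path (Unix only).
func inodeOf(path string) (uint64, error) {
	fi, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	st, ok := fi.Sys().(*syscall.Stat_t)
	if !ok {
		return 0, fmt.Errorf("inode information is not available")
	}
	return uint64(st.Ino), nil
}

func main() {
	for _, p := range []string{"/", "/tmp"} {
		ino, err := inodeOf(p)
		fmt.Println(p, ino, err)
	}
}
```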
In Unix filesystems, there is a series of other directories under the root, each one used for a specific purpose, making it possible to maintain a certain interoperability between different operating systems and enabling compiled software to run on different OSes, making the binaries portable.
This is a comprehensive list of the directories with their scope:
Directory   Description
/bin        Executable files for all users
/boot       Files for booting the system
/dev        Device drivers
/etc        Configuration files for applications and system
/home       Home directory for users
/kernel     Kernel files
/lib        Shared library files and other kernel-related files
/mnt        Temporary filesystems, from floppy disks and CDs to flash drives
/proc       File with process numbers for active processes
/sbin       Executable files for administrators
/tmp        Temporary files that should be safe to delete
/usr        Administrative commands, shared files, library files, and others
/var        Variable-length files (logs and print files)
While using a shell, one of the directories will be the working directory. When paths are relative (for example, file.sh or dir/subdir/file.txt), the working directory is used as a prefix to obtain an absolute one. The working directory is usually shown in the prompt of the command line, but it can be printed with the pwd command (print working directory).
The cd (change directory) command can be used to change the current working directory. To create a new directory, there's the mkdir (make directory) command.
To show the list of files for a directory, there's the ls command, which accepts a series of options, including more information (-l), showing hidden files and directories (-a), and sorting by time (-t) and size (-S).
There is a series of other commands that can be used to interact with files: the touch command creates a new empty file with the given name, and to edit its content you can use a series of editors, including vi and nano, while cat, more, and less are some of the commands that make it possible to read them.
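The shell commands described above have close equivalents in Go's os package. The sketch below mirrors pwd, cd, mkdir, touch, and ls inside a temporary directory; shellTour and the docs and notes.txt names are hypothetical and chosen only for the example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// shellTour repeats a short shell session with os package calls and
// returns the names listed in the newly created directory.
func shellTour() ([]string, error) {
	base, err := os.MkdirTemp("", "tour")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(base)

	old, err := os.Getwd() // pwd
	if err != nil {
		return nil, err
	}
	defer os.Chdir(old) // return to the previous working directory

	if err := os.Chdir(base); err != nil { // cd
		return nil, err
	}
	if err := os.Mkdir("docs", 0755); err != nil { // mkdir docs
		return nil, err
	}
	f, err := os.Create(filepath.Join("docs", "notes.txt")) // touch docs/notes.txt
	if err != nil {
		return nil, err
	}
	f.Close()

	entries, err := os.ReadDir("docs") // ls docs
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		names = append(names, e.Name())
	}
	return names, nil
}

func main() {
	names, err := shellTour()
	fmt.Println(names, err)
}
```

The relative paths docs and docs/notes.txt are resolved against the working directory set by os.Chdir, in the same way a shell resolves them.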
The operating system splits the hard drive into logical units called partitions, and each one can be a different file system. When the operating system starts, it makes some partitions available using the mount command for each line of the /etc/fstab file, which looks more or less like this:
# device # mount-point # fstype # options # dumpfreq # passno
/dev/sda1 / ext4 defaults 0 1
This configuration mounts /dev/sda1 to / using an ext4 filesystem and default options, with no backing up (0) and the root integrity check (1). The mount command can be used at any time to expose partitions in the filesystem. Its counterpart, umount, is needed to remove these partitions from the main filesystem. The empty directory used for the operation is called the mount point, and it represents the root under which the filesystem is connected.
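To make the meaning of the six columns explicit, the following Go sketch parses a line in the format shown above. The fstabEntry type and the parseFstabLine function are hypothetical, and the parser is deliberately strict: it assumes that all six fields are present, while real /etc/fstab files may omit the last two.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// fstabEntry holds the six whitespace-separated fields of an fstab line.
type fstabEntry struct {
	Device     string
	MountPoint string
	FSType     string
	Options    []string
	DumpFreq   int
	PassNo     int
}

// parseFstabLine splits a line into its fields and converts the numeric ones.
func parseFstabLine(line string) (fstabEntry, error) {
	fields := strings.Fields(line)
	if len(fields) != 6 {
		return fstabEntry{}, fmt.Errorf("expected 6 fields, got %d", len(fields))
	}
	dump, err := strconv.Atoi(fields[4])
	if err != nil {
		return fstabEntry{}, fmt.Errorf("invalid dump frequency: %v", err)
	}
	pass, err := strconv.Atoi(fields[5])
	if err != nil {
		return fstabEntry{}, fmt.Errorf("invalid pass number: %v", err)
	}
	return fstabEntry{
		Device:     fields[0],
		MountPoint: fields[1],
		FSType:     fields[2],
		Options:    strings.Split(fields[3], ","),
		DumpFreq:   dump,
		PassNo:     pass,
	}, nil
}

func main() {
	e, err := parseFstabLine("/dev/sda1 / ext4 defaults 0 1")
	fmt.Printf("%+v %v\n", e, err)
}
```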
When an application is launched, it becomes a process: a special instance provided by the operating system that includes all the resources that are used by the running application. This program must be in Executable and Linkable Format (ELF), in order to allow the operating system to interpret its instructions.
Each process is identified by a number called the process ID (PID), traditionally up to five digits long, which represents the process for its whole life cycle. This means that there cannot be two processes with the same PID at the same time. This uniqueness makes it possible to access a specific process by knowing its PID. Once a process is terminated, its PID can be reused for another process, if necessary.
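A Go program can read its own PID and use a PID to reach a process. The sketch below assumes a Unix system, where sending signal 0 is a common way to check whether a PID is in use without affecting the target process; isRunning and describeSelf are hypothetical helpers.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// isRunning reports whether a process with the given PID exists and can
// be signaled by the current user. Signal 0 performs only the checks.
func isRunning(pid int) bool {
	p, err := os.FindProcess(pid) // on Unix this does not fail for unknown PIDs
	if err != nil {
		return false
	}
	return p.Signal(syscall.Signal(0)) == nil
}

// describeSelf returns the PID of the current process and whether it is running.
func describeSelf() (int, bool) {
	pid := os.Getpid()
	return pid, isRunning(pid)
}

func main() {
	pid, running := describeSelf()
	fmt.Println("pid:", pid, "running:", running)
}
```

Note that isRunning also returns false for a process that exists but belongs to another user, because the signal is then refused for lack of permission.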
Similar to PID, there are other properties that characterize a process. These are as follows:
PPID: The parent process ID of the process that started this process
Nice number: Degree of friendliness of this process toward other processes
Terminal or TTY
