Command Line Fundamentals - Vivek Nagarajan - E-Book

Description

The most basic interface to a computer—the command line—remains the most flexible and powerful way of processing data and performing and automating various day-to-day tasks.

Command Line Fundamentals begins by exploring the basics, and then focuses on the most common tool, the Bash shell (which is standard on most Linux distributions and available on macOS). As you make your way through the book, you'll explore the traditional Unix command-line programs as implemented by the GNU project. You'll also learn to use redirection and pipelines to assemble these programs to solve complex problems.

By the end of this book, you'll have explored the basics of shell scripting, allowing you to easily and quickly automate tasks.

Page count: 341

Publication year: 2018

Command Line Fundamentals

Learn to use the Unix command-line tools and Bash shell scripting

Vivek N

Copyright © 2018 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Author: Vivek N

Technical Reviewer: Sundeep Agarwal

Managing Editor: Neha Nair

Acquisitions Editor: Koushik Sen

Production Editor: Samita Warang

Editorial Board: David Barnes, Ewan Buckingham, Simon Cox, Manasa Kumar, Alex Mazonowicz, Douglas Paterson, Dominic Pereira, Shiny Poojary, Saman Siddiqui, Erol Staveley, Ankita Thakur, and Mohita Vyas

First Published: December 2018

Production Reference: 1211218

ISBN: 978-1-78980-776-9

Table of Contents

Preface

Introduction to the Command Line
Introduction
Command Line: History, Shells, and Terminology
History of the Command Line
Command-Line Shells
Command-Line Terminology
Exploring the Filesystem
Filesystems
Navigating Filesystems
Exercise 1: Exploring Filesystem Contents
Manipulating a Filesystem
Exercise 2: Manipulating the Filesystem
Activity 1: Navigating the Filesystem and Viewing Files
Activity 2: Modifying the Filesystem
Shell History Recall, Editing, and Autocompletion
Command History Recall
Exercise 3: Exploring Shell History
Command-Line Shortcuts
Exercise 4: Using Shell Keyboard Shortcuts
Command-Line Autocompletion
Exercise 5: Completing a Folder Path
Exercise 6: Completing a Command
Exercise 7: Completing a Command using Options
Activity 3: Command-Line Editing
Shell Wildcards and Globbing
Wildcard Syntax and Semantics
Wildcard Expansion or Globbing
Exercise 8: Using Wildcards
Activity 4: Using Simple Wildcards
Activity 5: Using Directory Wildcards
Summary

Command-Line Building Blocks
Introduction
Redirection
Input and Output Streams
Use of Operators for Redirection
Using Multiple Redirections
Heredocs and Herestrings
Buffering
Exercise 9: Working with Command Redirection
Pipes
Exercise 10: Working with Pipes
Text-Processing Commands
Shell Input Concepts
Filtering Commands
Exercise 11: Working with Filtering Commands
Transformation Commands
Exercise 12: Working with Transformation Commands
Activity 6: Processing Tabular Data – Reordering Columns
Activity 7: Data Analysis
Summary

Advanced Command-Line Concepts
Introduction
Command Lists
Command List Operators
Using Multiple Operators
Command Grouping
Exercise 13: Using Command Lists
Job Control
Keyboard Shortcuts for Controlling Jobs
Commands for Controlling Jobs
Regular Expressions
Elements
Quantifiers
Anchoring
Subexpressions and Backreferences
Exercise 14: Using Regular Expressions
Activity 8: Word Matching with Regular Expressions
Shell Expansion
Environment Variables and Variable Expansion
Arithmetic Expansion
Brace Expansion
Recursive Expansion with eval
Command Substitution
Process Substitution
Exercise 15: Using Shell Expansions
Activity 9: String Processing with eval and Shell Expansion
Summary

Shell Scripting
Introduction
Conditionals and Loops
Conditional Expressions
Conditional Statements
Loops
Loop Control
Shell Functions
Function Definition
Function Arguments
Return Values
Local Variables, Scope, and Recursion
Exercise 16: Using Conditional Statements, Loops, and Functions
Shell Line Input
Line Input Commands
Internal Field Separator
Exercise 17: Using Shell Input Interactively
Shell Scripts
Shell Command Categories
Program Launch Process
Script Interpreters
Practical Case Study 1: Chess Game Extractor
Understanding the Problem
Exercise 18: Chess Game Extractor – Parsing a PGN File
Exercise 19: Chess Game Extractor – Extracting a Desired Game
Refining Our Script
Exercise 20: Chess Game Extractor – Handling Options
Adding Features
Exercise 21: Chess Game Extractor – Counting Game Moves
Tips and Tricks
Suppressing Command Output
Arithmetic Expansion
Declaring Typed Variables
Numeric for Loops
echo
Array Reverse Indexing
shopt
Extended Wildcards
man and info Pages
shellcheck
Activity 10: PGN Game Extractor Enhancement
Practical Case Study 2: NYC Yellow Taxi Trip Analysis
Understanding the Dataset
Exercise 22: Taxi Trip Analysis – Extracting Trip Time
Exercise 23: Taxi Trip Analysis – Calculating Average Trip Speed
Exercise 24: Taxi Trip Analysis – Calculating Average Fare
Activity 11: Shell Scripting – NYC Taxi Trip Analysis
Summary

Appendix


Preface

About

This section briefly introduces the author, the coverage of this book, the technical skills you'll need to get started, and the hardware and software required to complete all of the included activities and exercises.

About the Book

From the Bash shell to traditional UNIX programs, and from redirection and pipes to automating tasks, Command Line Fundamentals teaches you all you need to know about how command lines work.

The most basic interface to a computer, the command line, remains the most flexible and powerful way of processing data and performing and automating various day-to-day tasks. Command Line Fundamentals begins by exploring the basics and then focuses on the most common tool, the Bash shell (which is standard on most Linux distributions and available on macOS). As you make your way through the book, you'll explore the traditional UNIX command-line programs implemented by the GNU project. You'll also learn how to use redirection and pipelines to assemble these programs to solve complex problems.

By the end of this book, you'll have explored the basics of shell scripting, which will allow you to easily and quickly automate tasks.

About the Author

Vivek N is a self-taught programmer who has been programming for almost 30 years now, since the age of 8, with experience in X86 Assembler, C, Delphi, Python, JavaScript, and C++. He has been working with various command-line shells since the days of DOS 4.01, and is keen to introduce the new generation of computer users to the power it holds to make their lives easier. You can reach out to him through his Gmail ID rep.movsd.

Objectives

Use the Bash shell to run commands
Utilize basic Unix utilities such as cat, tr, sort, and uniq
Explore shell wildcards to manage groups of files
Apply useful keyboard shortcuts in shell
Employ redirection and pipes to process data
Write both basic and advanced shell scripts to automate tasks

Audience

Command Line Fundamentals is for programmers who use GUIs but want to understand how to use the command line to complete tasks more quickly.

Approach

Command Line Fundamentals takes a hands-on approach to the practical aspects of exploring UNIX command-line tools. It contains multiple activities that use real-life business scenarios for you to practice and apply your new skills in a highly relevant context.

Hardware Requirements

For the optimal student experience, we recommend the following hardware configuration:

Processor: Any modern processor manufactured after 2010
Memory: 4 GB RAM
Storage: 4 GB available hard disk space

Software Requirements

The ideal OS for this book is a modern Linux distribution. However, there are dozens of flavors of Linux, with different versions, and several other widely used OS platforms, including Windows and macOS. To make the book accessible to students using any OS platform or version, we will use a virtual machine to ensure a uniform, isolated environment. If you are not familiar with the term, a virtual machine lets an entire computer be simulated within your existing one; hence, you can use another OS (in this case, a tiny cut-down Linux distribution) as if it were running on actual hardware, completely isolated from your regular OS.

The advantage of this approach is a simple, uniform experience for all students, regardless of the system used. Another advantage is that the VM is sandboxed and anything performed within it will not interfere in any way with the existing system. Finally, VMs allow snapshotting, which allows you to undo any serious mistakes you may make with little effort.

Once you have completed the exercises and activities in this book in the VM, you can experiment with the command-line support that is available on your individual system. Those who wish to use the commands learned in this book on their systems directly should refer to the documentation for their specific platforms, to ensure that they work as expected. For the most part, the behaviors are standard, but some platforms might only support older versions of some commands, might lack some options for some commands, or completely lack support for certain commands:

Linux: All up-to-date Linux distributions will support all the commands and techniques taught in this book. Some may require the installation of additional packages.
Windows: The Windows Subsystem for Linux (WSL) allows a few Linux distributions, such as Ubuntu and Debian, to run from within Windows. Some packages may require installation to support everything covered in this book.
macOS: This OS is derived from BSD, which is a variant of UNIX, and ships BSD versions of most of the standard tools, which can differ from their GNU counterparts. Some packages may require installation to support everything covered in this book.

Note

If you use the VM, all the sample data required to complete the exercises and activities in this book will automatically be fetched and installed in the correct location, when the VM is started the first time. On the other hand, if you decide to use your native OS install, you will have to download the ZIP files (Lesson1.zip to Lesson4.zip) present in the code repository on GitHub and extract them into the home directory of your user account. The data consists of four folders, called Lesson1 to Lesson4, and several commands in the exercises rely on the data being in the locations ~/Lesson1 and so on. It is recommended that you stick to the VM approach unless you know what you are doing.

Installation and Setup

Before you start this book, you need to install the following software. You will find the steps to install these here:

Installing VirtualBox

Download the latest version of VirtualBox from https://www.virtualbox.org/wiki/Downloads and install it.

Setting up the VM

Download the VM appliance file, Packt-CLI.ova, from the Git repository here: https://github.com/TrainingByPackt/Command-Line-Fundamentals/blob/master/Packt-CLI.ova.
Launch VirtualBox and select File | Import Appliance:
Figure 0.1: A screenshot showing how to make the selection

The following dialog box will appear:

Figure 0.2: A screenshot displaying the dialog box
Browse for the Packt-CLI.ova file downloaded earlier and click Next, after which the following dialog box should be shown. The path where the Virtual Disk Image is to be saved can be changed if you wish, but the default location should be fine. Ensure there is at least 4 GB of free space available:
Figure 0.3: A screenshot showing the path where the Virtual Disk Image will be saved
Click Import to create the virtual machine. After the process completes, the VM name will be visible in the left-hand panel of the VirtualBox window:
Figure 0.4: A screenshot showing the successful installation of VirtualBox
Double-click the VM entry, Packt-CLI, to start the VM. You will see a lot of text scroll by as it boots up, and after a few seconds a GUI desktop will show up. The window may maximize to your entire screen; however, you can resize it to whatever is convenient. The desktop inside will adjust to fit in. Your system is called a host and the VM within is called a guest. VirtualBox may show a couple of information popups at the top of the VM. Read the information to understand how the VM mouse and keyboard capture works. You can click the little buttons at the extreme right of the popups that have the message Do not show this message again to prevent them from showing up again. More information can be found at https://www.virtualbox.org/manual/ch01.html#keyb_mouse_normal.

Note

In case the VM doesn't start at all, or you see an error message "Kernel Panic" in the VM window, you can usually solve this by enabling the virtualization settings in BIOS. See https://www.howtogeek.com/213795/how-to-enable-intel-vt-x-in-your-computers-bios-or-uefi-firmware/ for an example tutorial.

When the VM starts up for the first time, it will download the sample data and snippets for this book automatically. The following window will appear:

Figure 0.5: A screenshot displaying the first-time setup script progress

There are four launcher icons in the toolbar on top, which are shown here:

Figure 0.6: A screenshot displaying the launcher icons
The first launcher is the Root menu, which is like the Start menu of Windows. Since the guest OS is a minimal, stripped-down version, many of the programs shown there will not run. The only entry you will need to use during this book is the Log Out option.
Figure 0.7: A screenshot showing the Root menu
The second launcher is the Thunar file manager. By default, it opens the home directory of the current user, called guest (note that the guest username has no connection to the term "guest" used in the context of virtual machines). The sample data for the chapters is in the folders Lesson1 to Lesson4. All the snippets and examples in the book material assume this location. The Snippets folder contains a subfolder for each chapter with all the exercises and activity solutions as text files.
Figure 0.8: A screenshot showing the Thunar file manager
The third launcher is the command-line terminal application. This is what you will need to use throughout the book. Notice that it starts in the home directory of the logged-in user, guest.
Figure 0.9: A screenshot showing the command-line terminal
The final launcher is a text editor called Mousepad, which will be useful for viewing the snippets and writing scripts during the book:
Figure 0.10: A screenshot of the text editor

Guidelines and Tips for using the VM

The desktop environment in the guest OS is called XFCE and is very similar to Windows XP. The top toolbar shows the running tasks. The windows behave just like any other desktop environment.
Within the console, you can select text with the mouse and paste it onto the command line with the middle mouse button (this is distinct from the clipboard). To copy selected text in the console to the clipboard, press Ctrl+Shift+C and to paste from the clipboard into the command line press Ctrl+Shift+V (or right-click and choose Paste). This will be useful when you try out the snippets. You can copy from the editor and paste into the command line, although it is recommended that you type them out. Be careful not to paste multiple or incomplete commands into the console, as it could lead to errors.
To shut down the guest OS, click Log Out from the Root menu to get the following dialog:
Figure 0.11: A screenshot showing the dialogue box that appears on shut down
To close the VM (preserving its state) and resume later, close the VM window and choose Save the machine state. The next time the VM is started, it resumes from where it was. Usually, this option is preferable to choosing Shut down, as shown earlier.
Figure 0.12: A screenshot showing how to save your work before closing the VM
The VM allows the guest and host OS to share the clipboard, so that text that you copy to the clipboard in the host can be pasted into applications in the guest VM and vice versa. This is useful if you prefer to use your own editor rather than the one included in the VM.
It is strongly recommended that you close the shell window after completion of each exercise or activity, and open a fresh instance for the next.
During the book, it is possible that you, by mistake, will end up changing the sample data (or the guest OS itself) in such a way that you cannot complete the exercises. To avoid starting from scratch, you are advised to create a snapshot of the VM after each exercise or activity is performed. This can be done by clicking Snapshots in the VirtualBox window:
Figure 0.13: A screenshot showing the Snapshots window
Click Take to save the current state of the VM as a snapshot:
Figure 0.14: A screenshot showing how to take a snapshot

You can take any number of snapshots and restore them, taking the guest OS back to the exact state as when you saved it. Note that snapshots can only be restored when the guest has been shut down. Snapshots will take up some disk space. Deleting a snapshot does not affect the current state:

Figure 0.15: A screenshot showing how to restore the OS
You are free to customize the color scheme, fonts, and preferences of the editor and console application to suit your own tastes, but be sure to take a snapshot before changing things, to avoid being left with an unusable guest OS.
If the VM somehow becomes completely unusable (which is quite unlikely), you can always delete it and repeat the setup process.
If you get logged out by mistake, log in as guest with the password packt.

Installing the Code Bundle

Copy the code bundle for the class to the C:/Code folder.

Conventions

Code words in text, folder names, filenames, file extensions, pathnames, user input, and example strings are shown as follows: "Navigate to the data folder inside the Lesson2 folder."

A block of code is set as follows. The text typed by the user is in bold, and the output printed by the system is in regular font:

$ echo 'Hello World'

Hello World

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "Click Log Out from the Root menu."

Additional Resources

The code bundle for this book is also hosted on GitHub at https://github.com/TrainingByPackt/Command-Line-Fundamentals.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

You can also find links to the Official GNU Bash Manual and Linux man pages at https://www.gnu.org/software/bash/manual/html_node/index.html and https://linux.die.net/man/, respectively.

1

Introduction to the Command Line

Learning Objectives

By the end of this chapter, you will be able to:

Describe the basics of a filesystem
Navigate a filesystem with command-line tools
Perform file management tasks using the command line
Utilize shell history and tab completion to efficiently compose commands
Utilize shell-editing shortcuts to efficiently work with the command line
Write and use wildcard expressions to manage groups of files and folders

This chapter gives a brief history of the command line, explains filesystems, and describes how to get into and out of the command line.

Introduction

Today, with the widespread use of computing devices, graphical user interfaces (GUIs) are all-pervasive and easily learned by almost anyone. However, we should not ignore one of the most powerful tools from a bygone era, which is the command-line interface (CLI).

GUIs and CLIs approach user interaction from different angles. While GUIs emphasize user-friendliness, instant feedback, and visual aesthetics, CLIs target automation and repeatability of tasks, and composition of complicated task workflows that can be executed in one shot. These features result in the command line having widespread utility even today, nearly half a century since its invention. For instance, it is useful for web administrators to administer a web server via a shell command-line interface: instead of running a local CLI on your machine, you remotely control one that is running thousands of miles away, as if it were right in front of you. Similarly, it is useful for developers who create the backends of websites. This role requires them to learn how to use a command line, since they often need to replicate the web server environment on their local machine for development.

Even outside the purely tech-oriented professions, almost everyone works with computers, and automation is a very helpful tool that can save a lot of time and drudgery. The CLI is specifically built to help automate things. Consider the task of a graphic designer, who downloads a hundred images from a website and resizes all of them into a standard size and creates thumbnails; a personnel manager, who takes 20 spreadsheet files with personnel data and converts all names to upper case, checking for duplicates; or a web content creator, who quickly replaces a person's name with another across an entire website's content.

Using a GUI for these tasks would usually be tedious, considering that these tasks may need to be performed on a regular basis. Hence, rather than repeating these manually using specific applications, such as a download manager, photo editor, spreadsheet, and so on, or getting a custom application written, the professional in each case can use the command line to automate these jobs, consequently reducing drudgery, avoiding errors, and freeing the person to engage in the more important aspects of their job. Besides this, every new version of a GUI invalidates a lot of what you learned earlier. Menus change, toolbars look different, things move around, and features get removed or changed. It is often a re-learning exercise filled with frustration. On the other hand, much of what we learn about the command line is almost 100% compatible with the command line of 30 years ago, and will remain so for the foreseeable future. Rarely is a feature added that will invalidate what was valid before.
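As a small taste of this kind of automation, the personnel manager's task of upper-casing names and checking for duplicates can be sketched in a single pipeline. This is only an illustration: the sample names are invented, and the tools used here (tr, sort, and uniq) are covered properly later in the book.

```shell
# Hypothetical input: one name per line, with inconsistent capitalization.
names='Alice Smith
bob jones
ALICE SMITH
Carol White'

# tr converts lower case to upper case, sort brings identical lines together,
# and uniq -d prints only the lines that occur more than once.
printf '%s\n' "$names" | tr '[:lower:]' '[:upper:]' | sort | uniq -d
```

With the sample data above, the only name reported as a duplicate is ALICE SMITH, since the two spellings become identical once converted to upper case.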

Everyone should use the command line because it can make life so much easier, but there is an aura of mystery surrounding the command line. Popular depictions of command-line users are stereotypical asocial geniuses. This skewed perception makes people feel it is very arcane, complex, and difficult to learn—as if it were magic and out of the reach of mere mortals. However, just like any other thing in the world, it can be learned incrementally step-by-step, and unlike learning GUI programs, which have no connection to one another, each concept or tool you learn in the command line adds up.

Command Line: History, Shells, and Terminology

It is necessary for us to explore a little bit of computing history to fully comprehend the rationale behind why CLIs came into being.

History of the Command Line

At the dawn of the computing age, computers were massive electro-mechanical calculators, with little or no interactivity. Stacks of data and program code in the form of punched cards would be loaded into a system, and after a lengthy execution, punched cards containing the results of the computation would be spit out by the machines.

This was called batch processing (this paradigm is still used in many fields of computing even today). The essence of batch processing is to prepare the complete input dataset and the program code by hand and feed it to the machine in a batch. The computation is queued up for execution, and as soon as it finishes, the output is delivered, following which the next computation in the queue is processed.

As the field progressed, the age of the teletypewriter (TTY) arrived. Computers would take input and produce human-readable output interactively through a typewriter-like device. This was the first time that people sat at a terminal and interacted continuously with the system, looking at results of their computations live.

Eventually, TTYs with paper and mechanical keyboards were replaced by TTYs with text display screens and electronic keyboards. This method of interaction with a computer via a keyboard and text display device is called a command-line interface (CLI), and works as follows:

The system prompts the user to type a sentence (a command line).
The system executes the command, if valid, and prints out the results.
This sequence repeats indefinitely, and the user conducts their work step by step.

In a more generic sense, a CLI is also called a REPL, which stands for Read, Evaluate, Print, Loop, and is defined as follows:

Read an input command from the user.
Evaluate the command.
Print the result.
Loop back to the first step.
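The four steps can be sketched as a toy loop written in the shell itself. This is a minimal illustration, not how a real shell is implemented: mini_repl is a hypothetical name, the "quit" keyword is an assumption made for the example, and the read and eval commands it relies on are explained later in the book.

```shell
# mini_repl: a toy read-evaluate-print loop (hypothetical helper for illustration).
mini_repl() {
    local line
    # Read: fetch one line of input; the loop ends when the input runs out.
    while IFS= read -r line; do
        # Let the user leave the loop explicitly (an assumption of this sketch).
        if [ "$line" = "quit" ]; then
            break
        fi
        # Evaluate and Print: run the line as a command; its output appears on screen.
        eval "$line"
        # Loop: control returns to the top to read the next line.
    done
}

# Non-interactive demonstration: feed two commands and then "quit".
printf '%s\n' 'echo hello' 'echo $((2 + 3))' 'quit' | mini_repl
```

Calling mini_repl without the pipe reads commands from the keyboard instead. A real shell adds a prompt, error handling, and much more on top of this skeleton.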

The concept of a REPL is seen in many places—even the flight control computer on NASA's 1998 Deep Space 1 mission spacecraft had a REPL controlled from Earth, which allowed scientists to troubleshoot a failure in real-time and prevent the mission from failing.

Command-Line Shells

CLIs that interface with the operating system are called shells. As shells evolved, they went from being able to execute just one command at a time, to multiple commands in sequence, repeat commands multiple times, re-invoke commands from the past, and so on. Most of this evolution happened in the UNIX world, and the UNIX CLI remains the de facto standard to this day.

There are many different CLIs in UNIX itself, which are analogous to different dialects of a language—in other words, the way they interpret commands from the user varies. These CLIs are called shells because they form a shell between the internals of the operating system and the user.

There are several shells that are widely used, such as the Bourne shell, Korn shell, and C shell, to name a few. Shells for other operating systems such as Windows exist too (PowerShell and the Command Prompt, which descends from the DOS command interpreter). In this book, we will learn a modern reincarnation of the Bourne shell, called Bash (Bourne Again Shell), which is the most widely used, and considered the most standard. The Bash shell is part of the GNU project from the Free Software Foundation that was founded by Richard Stallman, which provides free and open source software.

During this book, we will sometimes introduce common abbreviations for lengthy terms, which the students should get accustomed to.

Command-Line Terminology

Before we delve into the chapters, we will learn some introductory command-line terms that will come in handy throughout the book.

Commands: They refer to the names that are typed to execute some function. They can be built into the shell or be external programs. Any program that's available on the system is a command.
Arguments: The strings typed after a command are called its arguments. They tell the command how to operate or what to operate on. They are typically options or names of some data resource such as a file, URL, and so on.
Switches/Options/Flags: These are arguments that typically start with a single or double hyphen and request a certain optional behavior from a command. Usually, an option has a short form, which is a hyphen followed by a single character, and a longer version of the same option, as a double hyphen followed by an entire word. The long option is easier to remember and often makes the command easier to read. Note that options are always case-sensitive.

The following are some examples of switches and arguments in commands:

ls -l --color --classify

grep -n --ignore-case 'needle' haystack.txt 'my data.txt'

In the preceding snippet, ls and grep are commands, -l, --color, --classify, -n, and --ignore-case are flags, and 'needle', haystack.txt and 'my data.txt' are arguments.
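To see how the shell hands these pieces to a command, the following sketch defines a small function that reports what it receives. The name show_args is hypothetical, and shell functions are only introduced later in the book. The function treats flags and other arguments alike; it is up to each command to interpret them.

```shell
# show_args: print how many arguments were received, then each one in brackets.
show_args() {
    printf 'count: %s\n' "$#"
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# The same arguments that were passed to grep in the earlier example.
show_args -n --ignore-case 'needle' haystack.txt 'my data.txt'
```

The function reports five arguments. The quotes themselves are not passed along, and they keep my data.txt together as a single argument despite the space inside it.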

Exploring the Filesystem

The space in which a command line operates is called a filesystem (FS). A lot of shell activity revolves around manipulating and organizing files; thus, learning the basics of filesystems is imperative to learning the command line. In this topic, we will learn about filesystems, and how to navigate, examine, and modify them via the shell. For regular users of computers, some of these ideas may seem familiar, but it is necessary to revisit them to have a clear and unambiguous understanding.

Filesystems

The UNIX design philosophy is to represent every object on a computer as a file; thus, the main objects that we manipulate with a command line are files. There are many different types of file-like objects under UNIX, but for our purposes, we will deal with simple data files, typically ASCII text files, that are human readable.

From this UNIX perspective, the system is accessible under what is termed a filesystem (FS). An FS is a representation of the system that's analogous to a series of nested boxes, each of which is called a directory or folder. Most of us are familiar with this folder structure, which we would have encountered when using a GUI file manager.

A directory that contains another directory is called the parent of the latter. The latter is called a sub-directory of the former. On UNIX-like systems, the outermost directory is called the root directory, and each directory can contain either files or other directories in turn. Some files are not data, but rather represent devices or other resources on the system. To be concise, we will refer to folders, regular files, and special files as FS objects.

Typically, every user of a system has their own distinct home directory, named after the user's name, where they store their own data. Various other directories used by the operating system, called system directories, exist on the filesystem, but we need not concern ourselves with them for the purposes of this book. For the sake of simplicity, we will assume that our entire filesystem resides on only a single disk or partition (although this is not true in general):

Figure 1.1: An illustration of an example structure of a typical filesystem

The notation used to refer to a location in a filesystem is called a path. A path consists of the list of directories that need to be navigated to reach some FS object. The list is separated by a forward slash, which is called a path separator. The complete location of an FS object, including its path from the root directory onward, is called a fully qualified pathname.

Paths can be absolute or relative. An absolute path starts at the root directory, whereas a relative path starts at what is called the current working directory (CWD). Every process that runs on a system is started with its CWD set to some location. This includes the command-line process itself. When an FS object is accessed within the CWD, the name of the object alone is enough to refer to it.

The root directory itself is represented by a single forward slash; thus, any absolute path starts with a single forward slash. The following is an example of an absolute path, which begins at the root directory:

/home/robin/Lesson1/data/cupressaceae/juniperus/indica

Special syntax is used to refer to the current, parent, and user's home directories:

./ refers to the current directory explicitly. The CWD is implicit in many cases, but ./ is useful when the current directory needs to be explicitly specified as an argument to some commands. For instance, the same directory that we've just seen can be expressed relative to the CWD (/home/robin, in this case) in two ways, one pathname specifying ./ explicitly and one without:

./Lesson1/data/cupressaceae/juniperus/indica

Lesson1/data/cupressaceae/juniperus/indica

../ refers to the parent directory. This can be extended further, such as ../../../, and so on. For instance, the preceding directory can be expressed relative to the parent of the CWD, as follows:

../robin/Lesson1/data/cupressaceae/juniperus/indica

The ../ takes us one level up to the parent of all the user home directories, and then we go back down to robin and the rest of the path.

~/ refers to the home directory of the current user.

~robin/ refers to the home directory of a user called "robin". This is a useful shorthand, because the home directory of a user could be configured to be anywhere in the filesystem. For example, macOS keeps users' home directories in /Users, whereas Linux systems usually keep them in /home.

Note

The trailing slash symbol at the end of a directory pathname is optional. The shell does not mandate this. It is usually typed only to make it obvious that it is the name of a directory rather than a file.
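The following is a minimal sketch of how these notations relate to each other. It assumes a Bash shell and builds a throwaway directory tree; the sandbox location, the temporary HOME setting, and the resolve helper are hypothetical names introduced only for this illustration:

```shell
#!/usr/bin/env bash
# Build a throwaway tree that mimics the layout used in this topic.
# The sandbox and the fake HOME are assumptions made for illustration only.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/home/robin/Lesson1/data"

# Pretend that robin's home directory lives inside the sandbox.
HOME="$sandbox/home/robin"
cd "$HOME" || exit 1

# Hypothetical helper: resolve any path notation to a canonical absolute path.
# The subshell keeps the caller's CWD unchanged.
resolve() { (cd "$1" && pwd -P); }

abs=$(resolve "$sandbox/home/robin/Lesson1/data")  # absolute path
rel=$(resolve Lesson1/data)                        # relative to the CWD
dot=$(resolve ./Lesson1/data)                      # explicit ./
up=$(resolve ../robin/Lesson1/data)                # via the parent directory
tilde=$(resolve ~/Lesson1/data)                    # via the home directory

# All five notations should name the same directory.
printf '%s\n' "$abs" "$rel" "$dot" "$up" "$tilde"
```

If the five printed lines are identical, every notation refers to the same location. Note that ~ is left unquoted, since the shell expands it only outside quotes.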

Navigating Filesystems

We will now look briefly at the most common commands for moving around the filesystem and examining its contents:

The cd (change directory) command changes the CWD to the path specified as its argument. If the path is non-existent, it prints an error message. Specifying just a single hyphen as the argument to cd changes the CWD to the last directory that was navigated from.

The pwd (print working directory) command simply displays the absolute path of the CWD.

The pushd and popd (push directory and pop directory) commands are used to bookmark the CWD and return to it later, respectively. They work by pushing and popping entries on to an internal directory stack, hence the names pushd and popd. Since they use a stack, you can push multiple values and pop them later in reverse order.

The tree command displays the hierarchical structure of a directory as a text-based diagram.

The ls (list) command displays the contents of one or more specified directories (by default, the CWD) in various formats.

The cat (concatenate) command outputs the concatenation of the contents of the files specified to it. If only one file is specified, it simply displays the file. This is a quick way to look at a file's content, if the files are small. cat can also apply some transformations on its output, such as numbering the lines or suppressing multiple blank lines.

The less command can be used to interactively scroll through one or more files easily, search for a string, and so on. This command is called a pager (it lets text content be viewed page by page). On most systems, less is configured to be the default pager. Other commands that require a pager interface will request the default pager from the system for this purpose. Here are some of the most useful keyboard shortcuts for less:

(a) The up and down arrow keys and the Page Up and Page Down keys scroll vertically.

(b) The Enter and spacebar keys scroll down by one line and one screenful, respectively.

(c) The < and > characters (or, equivalently, g and G) scroll to the beginning and end of the file, respectively.

(d) / followed by a string and then Enter searches for the specified string. The occurrences are also highlighted.

(e) n and N jump to the next or previous match, respectively.

(f) Esc followed by u turns off the highlights.

(g) h shows a help screen, with the list of shortcuts and commands that are supported.

(h) q exits the application or exits the help screen if it is being shown.

There are many more features for navigating, searching, and editing that less provides, which we will not cover in this basic introduction.
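As a rough illustration of the cat transformations mentioned above, the following sketch uses the -n (number lines) and -s (squeeze repeated blank lines) options. Both are available in GNU cat; the file names and contents are made up for this example:

```shell
#!/usr/bin/env bash
# Create two small sample files in a scratch directory (names are hypothetical).
workdir=$(mktemp -d)
printf 'first\n\n\n\nsecond\n' > "$workdir/a.txt"
printf 'third\n' > "$workdir/b.txt"

# With several arguments, cat outputs the files one after another.
joined=$(cat "$workdir/a.txt" "$workdir/b.txt")

# -n prefixes every output line with its line number.
numbered=$(cat -n "$workdir/b.txt")

# -s squeezes each run of blank lines down to a single blank line.
squeezed=$(cat -s "$workdir/a.txt")

printf '%s\n' "--- joined ---" "$joined"
printf '%s\n' "--- numbered ---" "$numbered"
printf '%s\n' "--- squeezed ---" "$squeezed"
```

The three blank lines in a.txt should collapse into one in the squeezed output, while the joined output keeps them all.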

Commonly Used Options for the Commands

The following options are used with the ls command:

The -l option (which stands for long list) shows the contents with one entry per line. Each column in the listing shows some specific information, namely permissions, link count, owner, group, size, and modification time, followed by the name, respectively. For the purposes of this book, we will only consider the size and the name. Information about the type of each FS object is indicated in the first character of the permissions field, for example, - for a file and d for a directory.

The --reverse option sorts the entries in reverse order. This is an example of a long option, where the option is a complete word, which is easy to remember. Long options are usually aliases for short options. In this case, the corresponding short option is -r.

The --color option is used to make different kinds of FS objects display in different colors. There is no corresponding short option for this.
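The sketch below illustrates two of these points: that -r and --reverse produce the same listing, and that the first character of each -l entry encodes the object type. It assumes GNU ls (long options such as --reverse are not available in every ls implementation), and the file names are invented for the example:

```shell
#!/usr/bin/env bash
# Scratch directory with two regular files and one subdirectory.
workdir=$(mktemp -d)
touch "$workdir/alpha" "$workdir/beta"
mkdir "$workdir/gamma"

# The short and long forms of the option should give identical output.
short=$(ls -r "$workdir")
long=$(ls --reverse "$workdir")   # assumes GNU ls

# Skip the leading "total" line, then print the first character of the
# permissions field followed by the entry name (the last field).
types=$(ls -l "$workdir" | awk 'NR > 1 { print substr($1, 1, 1), $NF }')

printf '%s\n' "$short"
printf '%s\n' "$types"
```

The awk step relies on the names containing no spaces, which holds for this sample data but not for file names in general.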

The following options are used with the tree command:

The -d option prints only directories and skips files.

The -o option writes the output to a file rather than the display.

The -H option generates a formatted HTML output, and typically would be used along with -o to generate an HTML listing to serve on a website.
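As a hedged sketch of how these options combine, the following script writes a directories-only listing and an HTML listing to files. It assumes the tree program is installed (it is not part of every default installation), so the script checks for it first. The directory layout, the output file names, and the use of "." as the base location that -H expects for its links are all illustrative choices:

```shell
#!/usr/bin/env bash
# Small sample hierarchy in a scratch directory (names are hypothetical).
workdir=$(mktemp -d)
mkdir -p "$workdir/taxaceae/taxus/baccata"
touch "$workdir/taxaceae/taxus/baccata/data.txt"

if command -v tree >/dev/null 2>&1; then
    # -d lists directories only; -o sends the diagram to a file.
    tree -d -o "$workdir/dirs.txt" "$workdir/taxaceae"

    # -H takes the base location used for the generated links;
    # "." is only a placeholder here.
    tree -H . -o "$workdir/index.html" "$workdir/taxaceae"
    tree_status="generated"
else
    tree_status="tree not installed"
fi
echo "$tree_status"
```

When tree is available, dirs.txt should mention the directories but not data.txt, since -d excludes files.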

Before going ahead with the exercises, let's establish some conventions for the rest of this book. Each chapter of this book includes some test data to practice on. Throughout this book, we will assume that each chapter's data is in its own folder called Lesson1, Lesson2, and so on.

In all of the exercises that follow, it is assumed that the work is in the home directory of the logged-in user (here, the user is called robin).

Exercise 1: Exploring Filesystem Contents

In this exercise, we will navigate through a complex directory structure and view files using the commands learned so far. The sample data used here is a dataset of conifer trees, hierarchically structured as per botanic classification, which will be used in future activities and exercises too.

Open the command-line shell.

Navigate to the Lesson1 directory and examine the contents of the folder with the ls command:

robin ~ $ cd Lesson1

robin ~/Lesson1 $ ls

data data1

In the preceding code snippet, the part of each command line up to and including the $ symbol is called a prompt. The system is prompting for a command to be typed. The prompt shows the current user, in this case robin, followed by the CWD (~ on the first line and ~/Lesson1 on the second). The text shown after the command is what the command itself prints as output.

Note

Recall that ~ means the home directory of the current user.

Use the cd command to navigate to the data directory and examine its contents with ls:

robin ~/Lesson1 $ cd data

robin ~/Lesson1/data $ ls

cupressaceae pinaceae podocarpaceae taxaceae

Note

Notice that the prompt shown afterward displays the new CWD. This is not always true. Depending on the configuration of the system, the prompt may vary, and may even be a simple $ symbol with no other information shown.

The ls command can be provided with one or more arguments, which are the names of files and folders to list. By default, it lists only the CWD. The following snippet can be used to view the subdirectories within the taxaceae and podocarpaceae directories:

robin ~/Lesson1/data $ ls taxaceae podocarpaceae

podocarpaceae/:

acmopyle dacrydium lagarostrobos margbensonia parasitaxus podocarpus saxegothaea

afrocarpus falcatifolium lepidothamnus microcachrys pherosphaera prumnopitys stachycarpus

dacrycarpus halocarpus manoao nageia phyllocladus retrophyllum sundacarpus

taxaceae/:

amentotaxus  austrotaxus  cephalotaxus  pseudotaxus  taxus  torreya

The dataset contains a directory for every member of the botanical families of coniferous trees. Here, we can see the top-level directories for each botanical family. Each of these has subdirectories for the genera, and those in turn for the species.

You can also use ls to request a long output in color, as follows:

robin ~/Lesson1/data $ ls -l --color

total 16

drwxr-xr-x 36 robin robin 4096 Aug 20 14:01 cupressaceae

drwxr-xr-x 15 robin robin 4096 Aug 20 14:01 pinaceae

drwxr-xr-x 23 robin robin 4096 Aug 20 14:01 podocarpaceae

drwxr-xr-x 8 robin robin 4096 Aug 20 14:01 taxaceae

Navigate into the taxaceae folder, and then use the tree command to visualize the directory structure at this point. For clarity, specify the -d option, which instructs it to display only directories and exclude files:

robin ~/Lesson1/data $ cd taxaceae

robin ~/Lesson1/data/taxaceae $ tree -d

You should get the following output on running the preceding command:

Figure 1.2: The directory structure of the taxaceae folder (not shown entirely)

cd can be given a single hyphen as an argument to jump back to the last directory that was navigated from:

robin ~/Lesson1/data/taxaceae $ cd taxus

robin ~/Lesson1/data/taxaceae/taxus $ cd -

/home/robin/Lesson1/data/taxaceae

Observe that it prints out the absolute path of the directory it is changing to.

Note

Home directories are stored in /home on most Linux systems. Other operating systems, such as macOS, may place them in other locations, so the output of some of the following commands may slightly differ from that shown here.

We can move upwards in the hierarchy by using .. any number of times. Type the first command that follows to reach the home directory, which is three levels up. Then, use cd - to return to the previous location:

robin ~/Lesson1/data/taxaceae $ cd ../../..

robin ~ $ cd -

/home/robin/Lesson1/data/taxaceae

robin ~/Lesson1/data/taxaceae $

Use cd without any arguments to go to the home directory. Then, once again, use cd - to return to the previous location:

robin ~/Lesson1/data/taxaceae $ cd

robin ~ $ cd -

/home/robin/Lesson1/data/taxaceae

robin ~/Lesson1/data/taxaceae $
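The behavior of cd - seen in the last few steps can be sketched in a script. Bash keeps the previous working directory in the OLDPWD variable, and cd - switches to that location. The directory layout below is created in a scratch location purely for illustration:

```shell
#!/usr/bin/env bash
# Work inside a scratch directory so that nothing outside it is touched.
cd "$(mktemp -d)" || exit 1
start=$PWD
mkdir -p Lesson1/data/taxaceae

cd Lesson1/data/taxaceae || exit 1
deep=$PWD

# Go three levels up, as in the exercise.
cd ../../.. || exit 1
top=$PWD
previous=$OLDPWD        # the shell remembers where we came from

# cd - changes to $OLDPWD and prints the new location;
# the printed path is discarded here to keep the output short.
cd - > /dev/null || exit 1
back=$PWD

printf 'top:      %s\nprevious: %s\nback:     %s\n' "$top" "$previous" "$back"
```

The previous and back lines should show the same path, which is the directory we were in before moving up.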

Now, we will explore commands that help us navigate the folder structure, such as pwd, pushd, and popd. Use the pwd command to display the path of the CWD, as follows:

robin ~/Lesson1/data/taxaceae $ pwd

/home/robin/Lesson1/data/taxaceae

The pwd command may not seem very useful when the CWD is being displayed in the prompt, but it is useful in some situations, for example, to copy the path to the clipboard for use in another command, or to share it with someone.

Use the pushd command to navigate into a folder, while remembering the CWD:

robin ~/Lesson1/data/taxaceae $ pushd taxus/baccata/

~/Lesson1/data/taxaceae/taxus/baccata ~/Lesson1/data/taxaceae

Use it once again, saving this location to the stack too:

robin ~/Lesson1/data/taxaceae/taxus/baccata $ pushd ../sumatrana/

~/Lesson1/data/taxaceae/taxus/sumatrana ~/Lesson1/data/taxaceae/taxus/baccata ~/Lesson1/data/taxaceae

Use it yet again. Now we have three saved folders on the stack:

robin ~/Lesson1/data/taxaceae/taxus/sumatrana $ pushd ../../../pinaceae/cedrus/deodara/