A comprehensive guide to programming with network sockets, implementing internet protocols, designing IoT devices, and much more with C
Key Features
Book Description
Network programming enables processes to communicate with each other over a computer network, but it is a complex task that requires programming with multiple libraries and protocols. With its support for third-party libraries and structured documentation, C is an ideal language to write network programs.
Complete with step-by-step explanations of essential concepts and practical examples, this C network programming book begins with the fundamentals of Internet Protocol, TCP, and UDP. You'll explore client-server and peer-to-peer models for information sharing and connectivity with remote computers. The book will also cover HTTP and HTTPS for communicating between your browser and website, and delve into hostname resolution with DNS, which is crucial to the functioning of the modern web. As you advance, you'll gain insights into asynchronous socket programming and streams, and explore debugging and error handling. Finally, you'll study network monitoring and implement security best practices.
By the end of this book, you'll have experience of working with client-server applications and be able to implement new network programs in C.
The code in this book is compatible with the older C99 version as well as the latest C18 and C++17 standards. You'll work with robust, reliable, and secure code that is portable across operating systems, including Winsock sockets for Windows and POSIX sockets for Linux and macOS.
What you will learn
Who this book is for
If you're a developer or a system administrator who wants to get started with network programming, this book is for you. Basic knowledge of C programming is assumed.
Page count: 518
Year of publication: 2019
Copyright © 2019 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Richa Tripathi
Acquisition Editor: Shriram Shekhar
Content Development Editor: Digvijay Bagul
Technical Editor: Abin Sebastian
Copy Editor: Safis Editing
Project Coordinator: Prajakta Naik
Proofreader: Safis Editing
Indexer: Tejal Daruwale Soni
Graphics Coordinator: Tom Scaria
Production Coordinator: Aparna Bhagat
First published: May 2019
Production reference: 1100519
Published by Packt Publishing Ltd., Livery Place, 35 Livery Street, Birmingham B3 2PB, UK.
ISBN 978-1-78934-986-3
www.packtpub.com
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Lewis Van Winkle is a software programming consultant, entrepreneur, and founder of a successful IoT company. He has over 20 years of programming experience after publishing his first successful software product at the age of 12. He has over 15 years of programming experience with the C programming language on a variety of operating systems and platforms. He is active in the open source community and has published several popular open source programs and libraries—many of them in C. Today, Lewis spends much of his time consulting, where he loves taking on difficult projects that other programmers have given up on. He specializes in network systems, financial systems, machine learning, and interoperation between different programming languages.
Daniele Lacamera is a software technologist and researcher with vast experience in software design and development on embedded systems for different industries. He is currently working as a freelance software developer and trainer. He is a worldwide expert in TCP/IP and transport protocol design and optimization, with more than 20 academic publications on the topic. He supports free software by contributing to several projects, including the Linux kernel, and is involved in a number of communities and organizations that promote the use of free and open source software in the IoT.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
Hands-On Network Programming with C
Dedication
About Packt
Why subscribe?
Packt.com
Contributors
About the author
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Section 1 - Getting Started with Network Programming
Introducing Networks and Protocols
Technical requirements
The internet and C
OSI layer model
TCP/IP layer model
Data encapsulation
Internet Protocol
What is an address?
Domain names
Internet routing
Local networks and address translation
Subnetting and CIDR
Multicast, broadcast, and anycast
Port numbers
Clients and servers
Putting it together
What's your address?
Listing network adapters from C
Listing network adapters on Windows
Listing network adapters on Linux and macOS
Summary
Questions
Getting to Grips with Socket APIs
Technical requirements
What are sockets?
Socket setup
Two types of sockets
Socket functions
Anatomy of a socket program
TCP program flow
UDP program flow
Berkeley sockets versus Winsock sockets
Header files
Socket data type
Invalid sockets
Closing sockets
Error handling
Our first program
A motivating example
Making it networked
Working with IPv6
Supporting both IPv4 and IPv6
Networking with inetd
Summary
Questions
An In-Depth Overview of TCP Connections
Technical requirements
Multiplexing TCP connections
Polling non-blocking sockets
Forking and multithreading
The select() function
Synchronous multiplexing with select()
select() timeout
Iterating through an fd_set
select() on non-sockets
A TCP client
TCP client code
A TCP server
TCP server code
Building a chat room
Blocking on send()
TCP is a stream protocol
Summary
Questions
Establishing UDP Connections
Technical requirements
How UDP sockets differ
UDP client methods
UDP server methods
A first UDP client/server
A simple UDP server
A simple UDP client
A UDP server
Summary
Questions
Hostname Resolution and DNS
Technical requirements
How hostname resolution works
DNS record types
DNS security
Name/address translation functions
Using getaddrinfo()
Using getnameinfo()
Alternative functions
IP lookup example program
The DNS protocol
DNS message format
DNS message header format
Question format
Answer format
Endianness
A simple DNS query
A DNS query program
Printing a DNS message name
Printing a DNS message
Sending the query
Summary
Questions
Further reading
Section 2 - An Overview of Application Layer Protocols
Building a Simple Web Client
Technical requirements
The HTTP protocol
HTTP request types
HTTP request format
HTTP response format
HTTP response codes
Response body length
What's in a URL
Parsing a URL
Implementing a web client
HTTP POST requests
Encoding form data
File uploads
Summary
Questions
Further reading
Building a Simple Web Server
Technical requirements
The HTTP server
The server architecture
Content types
Returning Content-Type from a filename
Creating the server socket
Multiple connections buffering
get_client()
drop_client()
get_client_address()
wait_on_clients()
send_400()
send_404()
serve_resource()
The main loop
Security and robustness
Open source servers
Summary
Questions
Further reading
Making Your Program Send Email
Technical requirements
Email servers
SMTP security
Finding an email server
SMTP dialog
The format of an email
A simple SMTP client program
Enhanced emails
Email file attachments
Spam-blocking pitfalls
Summary
Questions
Further reading
Section 3 - Understanding Encrypted Protocols and OpenSSL
Loading Secure Web Pages with HTTPS and OpenSSL
Technical requirements
HTTPS overview
Encryption basics
Symmetric ciphers
Asymmetric ciphers
How TLS uses ciphers
The TLS protocol
Certificates
Server name identification
OpenSSL
Encrypted sockets with OpenSSL
Certificates
A simple HTTPS client
Other examples
Summary
Questions
Further reading
Implementing a Secure Web Server
Technical requirements
HTTPS and OpenSSL summary
Certificates
Self-signed certificates with OpenSSL
HTTPS server with OpenSSL
Time server example
A full HTTPS server
HTTPS server challenges
OpenSSL alternatives
Alternatives to TLS
Summary
Questions
Further reading
Establishing SSH Connections with libssh
Technical requirements
The SSH protocol
libssh
Testing out libssh
Establishing a connection
SSH authentication
Server authentication
Client authentication
Executing a remote command
Downloading a file
Summary
Questions
Further reading
Section 4 - Odds and Ends
Network Monitoring and Security
Technical requirements
The purpose of network monitoring
Testing reachability
Checking a route
How traceroute works
Raw sockets
Checking local connections
Snooping on connections
Deep packet inspection
Capturing all network traffic
Network security
Application security and safety
Network-testing etiquette
Summary
Questions
Further reading
Socket Programming Tips and Pitfalls
Technical requirements
Error handling
Obtaining error descriptions
TCP socket tips
Timeout on connect()
TCP flow control and avoiding deadlock
Congestion control
The Nagle algorithm
Delayed acknowledgment
Connection tear-down
The shutdown() function
Preventing address-in-use errors
Sending to a disconnected peer
Socket's local address
Multiplexing with a large number of sockets
Summary
Questions
Web Programming for the Internet of Things
Technical requirements
What is the IoT?
Connectivity options
Wi-Fi
Ethernet
Cellular
Bluetooth
IEEE 802.15.4 WPANs
Hardware choices
Single-board computers
Microcontrollers
FPGAs
External transceivers and modems
IoT protocols
Firmware updates
Ethics of IoT
Privacy and data collection
End-of-life planning
Security
Summary
Questions
Answers to Questions
Chapter 1, Introducing Networks and Protocols
Chapter 2, Getting to Grips with Socket APIs
Chapter 3, An In-Depth Overview of TCP Connections
Chapter 4, Establishing UDP Connections
Chapter 5, Hostname Resolution and DNS
Chapter 6, Building a Simple Web Client
Chapter 7, Building a Simple Web Server
Chapter 8, Making Your Program Send Email
Chapter 9, Loading Secure Web Pages with HTTPS and OpenSSL
Chapter 10, Implementing a Secure Web Server
Chapter 11, Establishing SSH Connections with libssh
Chapter 12, Network Monitoring and Security
Chapter 13, Socket Programming Tips and Pitfalls
Chapter 14, Web Programming for the Internet of Things
Setting Up Your C Compiler on Windows
Installing MinGW GCC
Installing Git
Installing OpenSSL
Installing libssh
Alternatives
Setting Up Your C Compiler on Linux
Installing GCC
Installing Git
Installing OpenSSL
Installing libssh
Setting Up Your C Compiler on macOS
Installing Homebrew and the C compiler
Installing OpenSSL
Installing libssh
Example Programs
Code license
Code included with this book
Chapter 1 – Introducing Networks and Protocols
Chapter 2 – Getting to Grips with Socket APIs
Chapter 3 – An In-Depth Overview of TCP Connections
Chapter 4 – Establishing UDP Connections
Chapter 5 – Hostname Resolution and DNS
Chapter 6 – Building a Simple Web Client
Chapter 7 – Building a Simple Web Server
Chapter 8 – Making Your Program Send Email
Chapter 9 – Loading Secure Web Pages with HTTPS and OpenSSL
Chapter 10 – Implementing a Secure Web Server
Chapter 11 – Establishing SSH Connections with libssh
Chapter 12 – Network Monitoring and Security
Chapter 13 – Socket Programming Tips and Pitfalls
Chapter 14 – Web Programming for the Internet of Things
Other Books You May Enjoy
Leave a review - let other readers know what you think
Packt first contacted me about writing this book nearly a year ago. It's been a long journey, harder than I anticipated at times, and I've learned a lot. The book you hold now is the culmination of many long days, and I'm proud to finally present it.
I think C is a beautiful programming language. No other language in everyday use gets you as close to the machine as C does. I've used C to program 8-bit microcontrollers with only 16 bytes of RAM, just the same as I've used it to program modern desktops with multi-core, multi-GHz processors. It's truly remarkable that C works efficiently in both contexts.
Network programming is a fun topic, but it's also a very deep one; a lot is going on at many levels. Some programming languages hide these details behind abstractions. In the Python programming language, for example, you can download an entire web page using only one line of code. This isn't the case in C! In C, if you want to download a web page, you have to know how everything works. You need to know sockets, you need to know Transmission Control Protocol (TCP), and you need to know HTTP. In C network programming, nothing is hidden.
C is a great language to learn network programming in. This is not only because we get to see all the details, but also because the popular operating systems all use kernels written in C. No other language gives you the same first-class access as C does. In C, everything is under your control – you can lay out your data structures exactly how you want, manage memory precisely as you please, and even shoot yourself in the foot just the way you want.
When I first began writing this book, I surveyed other resources related to learning network programming with C. I found much misinformation, not only on the web but even in print. There is a lot of C networking code that is done wrong. Internet tutorials about C sockets often use deprecated functions and ignore memory safety completely. When it comes to network programming, you can't take the "it works, so it's good enough" programming-by-coincidence approach. You have to use reasoning.
In this book, I take care to approach network programming in a modern and safe way. The example programs are carefully designed to work with both IPv4 and IPv6, and they are all written in a portable, operating system-independent way, whenever possible. Wherever there is an opportunity for memory errors, I try to take notice and point out these concerns. Security is too often left as an afterthought. I believe security is important, and it should be planned in the system from the beginning. Therefore, in addition to teaching network basics, this book spends a lot of time working with secure protocols, such as TLS.
I hope you enjoy reading this book as much as I enjoyed writing it.
This book is for the C or C++ programmer who wants to add networking features to their software. It is also designed for the student or professional who simply wants to learn about network programming and common network protocols.
It is assumed that the reader already has some familiarity with the C programming language. This includes a basic proficiency with pointers, basic data structures, and manual memory management.
Chapter 1, Introducing Networks and Protocols, introduces the important concepts related to networking. This chapter includes example programs to determine your IP address programmatically.
Chapter 2, Getting to Grips with Socket APIs, introduces socket programming APIs and has you build your first networked program—a tiny web server.
Chapter 3, An In-Depth Overview of TCP Connections, focuses on programming TCP sockets. In this chapter, example programs are developed for both the client and server sides.
Chapter 4, Establishing UDP Connections, covers programming with User Datagram Protocol (UDP) sockets.
Chapter 5, Hostname Resolution and DNS, explains how hostnames are translated into IP addresses. In this chapter, we build an example program to perform manual DNS query lookups using UDP.
Chapter 6, Building a Simple Web Client, introduces HTTP—the protocol that powers websites. We dive right in and build an HTTP client in C.
Chapter 7, Building a Simple Web Server, describes how to construct a fully functional web server in C. This program is able to serve a static website to any modern web browser.
Chapter 8, Making Your Program Send Email, describes Simple Mail Transfer Protocol (SMTP)—the protocol that is powering email. In this chapter, we develop a program that can send email over the internet.
Chapter 9, Loading Secure Web Pages with HTTPS and OpenSSL, explores TLS—the protocol that secures web pages. In this chapter, we develop an HTTPS client that is capable of downloading web pages securely.
Chapter 10, Implementing a Secure Web Server, continues on the security theme and explores the construction of a secure HTTPS web server.
Chapter 11, Establishing SSH Connections with libssh, continues with the secure protocol theme. The use of Secure Shell (SSH) is covered to connect to a remote server, execute commands, and download files securely.
Chapter 12, Network Monitoring and Security, discusses the tools and techniques used to test network functionality, troubleshoot problems, and eavesdrop on insecure communication protocols.
Chapter 13, Socket Programming Tips and Pitfalls, goes into detail about TCP and addresses many important edge cases that appear in socket programming. The techniques covered are invaluable for creating robust network programs.
Chapter 14, Web Programming for the Internet of Things, gives an overview of the design and programming for Internet of Things (IoT) applications.
Appendix A, Answers to Questions, provides answers to the comprehension questions given at the end of each chapter.
Appendix B, Setting Up Your C Compiler on Windows, gives instructions for setting up a development environment on Windows that is needed for compiling all of the example programs in this book.
Appendix C, Setting Up Your C Compiler on Linux, provides the setup instructions for preparing your Linux computer to be capable of compiling all of the example programs in this book.
Appendix D, Setting Up Your C Compiler on macOS, gives step-by-step instructions for configuring your macOS system to be capable of compiling all of the example programs in this book.
Appendix E, Example Programs, lists each example program, by chapter, included in this book's code repository.
The reader is expected to be proficient in the C programming language. This includes a familiarity with memory management, the use of pointers, and basic data structures.
A Windows, Linux, or macOS development machine is recommended; you can refer to the appendices for setup instructions.
This book takes a hands-on approach to learning and includes 44 example programs. Working through these examples as you read the book will help reinforce the concepts.
The code for this book is released under the MIT open source license. The reader is encouraged to use, modify, improve, and even publish their changes to these example programs.
You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.
The code bundle for the book is also publicly hosted on GitHub at https://github.com/codeplea/hands-on-network-programming-with-c. In case there's an update to the code, it will be updated on that GitHub repository. Each chapter that introduces example programs begins with the commands needed to download the book's code.
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/9781789349863_ColorImages.pdf.
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, variable names, function names, directory names, filenames, file extensions, pathnames, URLs, and user input. Here is an example: "Use the select() function to wait for network data."
A block of code is set as follows:
/* example program */
#include <stdio.h>
int main() {
    printf("Hello World!\n");
    return 0;
}
Any command-line input or output is written as follows:
gcc hello.c -o hello
./hello
Bold: Indicates a new term, an important word, or words that you see on screen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select System info from the Administration panel."
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packt.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in, and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.
This section will get the reader up and running with the basics of networking, the relevant network protocols, and basic socket programming.
The following chapters are in this section:
Chapter 1, Introducing Networks and Protocols
Chapter 2, Getting to Grips with Socket APIs
Chapter 3, An In-Depth Overview of TCP Connections
Chapter 4, Establishing UDP Connections
Chapter 5, Hostname Resolution and DNS
In this chapter, we will review the fundamentals of computer networking. We'll look at abstract models that attempt to explain the main concerns of networking, and we'll explain the operation of the primary network protocol, the Internet Protocol. We'll look at address families and end with writing programs to list your computer's local IP addresses.
The following topics are covered in this chapter:
Network programming and C
OSI layer model
TCP/IP reference model
The Internet Protocol
IPv4 addresses and IPv6 addresses
Domain names
Internet protocol routing
Network address translation
The client-server paradigm
Listing your IP addresses programmatically from C
Most of this chapter focuses on theory and concepts. However, we do introduce some sample programs near the end. To compile these programs, you will need a good C compiler. We recommend MinGW on Windows and GCC on Linux and macOS. See Appendix B, Setting Up Your C Compiler on Windows, Appendix C, Setting Up Your C Compiler on Linux, and Appendix D, Setting Up Your C Compiler on macOS, for compiler setup.
The code for this book can be found at: https://github.com/codeplea/Hands-On-Network-Programming-with-C.
From the command line, you can download the code for this chapter with the following command:
git clone https://github.com/codeplea/Hands-On-Network-Programming-with-C
cd Hands-On-Network-Programming-with-C/chap01
On Windows, using MinGW, you can use the following commands to compile and run the code:
gcc win_list.c -o win_list.exe -liphlpapi -lws2_32
win_list
On Linux and macOS, you can use the following commands:
gcc unix_list.c -o unix_list
./unix_list
Today, the internet needs no introduction. Certainly, millions of desktops, laptops, routers, and servers are connected to the internet and have been for decades. However, billions of additional devices are now connected as well—mobile phones, tablets, gaming systems, vehicles, refrigerators, television sets, industrial machinery, surveillance systems, doorbells, and even light bulbs. The new Internet of Things (IoT) trend has people rushing to connect even more unlikely devices every day.
Over 20 billion devices are estimated to be connected to the internet now. These devices use a wide variety of hardware. They connect over an Ethernet connection, Wi-Fi, cellular, a phone line, fiber optics, and other media, but they likely have one thing in common: they likely use C.
The use of the C programming language is ubiquitous. Almost every network stack is programmed in C. This is true for Windows, Linux, and macOS. If your mobile phone uses Android or iOS, then even though the apps for these were programmed in a different language (Java and Objective-C), the kernel and networking code was written in C. It is very likely that the network routers that your internet data goes through are programmed in C. Even if the user interface and higher-level functions of your modem or router are programmed in another language, the networking drivers are still probably implemented in C.
Networking encompasses concerns at many different abstraction levels. The concerns your web browser has with formatting a web page are much different than the concerns your router has with forwarding network packets. For this reason, it is useful to have a theoretical model that helps us to understand communications at these different levels of abstraction. Let's look at these models now.
It's clear that if all of the disparate devices composing the internet are going to communicate seamlessly, there must be agreed-upon standards that define their communications. These standards are called protocols. Protocols define everything from the voltage levels on an Ethernet cable to how a JPEG image is compressed on a web page. It's clear that, when we talk about the voltage on an Ethernet cable, we are at a much different level of abstraction compared to talking about the JPEG image format. If you're programming a website, you don't want to think about Ethernet cables or Wi-Fi frequencies. Likewise, if you're programming an internet router, you don't want to have to worry about how JPEG images are compressed. For this reason, we break the problem down into many smaller pieces.
One common method of breaking down the problem is to place levels of concern into layers. Each layer then provides services for the layer on top of it, and each upper layer can rely on the layers underneath it without concern for how they work.
The most popular layer system for networking is called the Open Systems Interconnection model (OSI model). Work on it began in 1977, and it was published as ISO 7498 in 1984. It has seven layers:
Let's understand these layers one by one:
Physical (1): This is the level of physical communication in the real world. At this level, we have specifications for things such as the voltage levels on an Ethernet cable, what each pin on a connector is for, the radio frequency of Wi-Fi, and the light flashes over an optic fiber.
Data Link (2): This level builds on the physical layer. It deals with protocols for directly communicating between two nodes. It defines how a direct message between nodes starts and ends (framing), error detection and correction, and flow control.
Network layer (3): The network layer provides the methods to transmit data sequences (called packets) between nodes in different networks. It provides methods to route packets from one node to another (without a direct physical connection) by transferring through many intermediate nodes. This is the layer that the Internet Protocol is defined on, which we will go into in some depth later.
Transport layer (4): At this layer, we have methods to reliably deliver variable length data between hosts. These methods deal with splitting up data, recombining it, ensuring data arrives in order, and so on. The Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) are commonly said to exist on this layer.
Session layer (5): This layer builds on the transport layer by adding methods to establish, checkpoint, suspend, resume, and terminate dialogs.
Presentation layer (6): This is the lowest layer at which data structure and presentation for an application are defined. Concerns such as data encoding, serialization, and encryption are handled here.
Application layer (7): The applications that the user interfaces with (for example, web browsers and email clients) exist here. These applications make use of the services provided by the six lower layers.
In the OSI model, an application, such as a web browser, exists in the application layer (layer 7). A protocol from this layer, such as HTTP used to transmit web pages, doesn't have to concern itself with how the data is being transmitted. It can rely on services provided by the layer underneath it to effectively transmit data. This is illustrated in the following diagram:
It should be noted that chunks of data are often referred to by different names depending on the OSI layer they're on. A data unit on layer 2 is called a frame, since layer 2 is responsible for framing messages. A data unit on layer 3 is referred to as a packet, while a data unit on layer 4 is a segment if it is part of a TCP connection or a datagram if it is a UDP message.
In this book, we often use the term packet as a generic term to refer to a data unit on any layer. However, segment will only be used in the context of a TCP connection, and datagram will only refer to UDP datagrams.
As we will see in the next section, the OSI model doesn't fit precisely with the common protocols in use today. However, it is still a handy model to explain networking concerns, and it is still in widespread use for that purpose today.
The TCP/IP protocol suite is the most common network communication model in use today. The TCP/IP reference model differs a bit from the OSI model, as it has only four layers instead of seven.
The following diagram illustrates how the four layers of the TCP/IP model line up to the seven layers of the OSI model:
Notably, the TCP/IP model doesn't match up exactly with the layers in the OSI model. That's OK. In both models, the same functions are performed; they are just divided differently.
The TCP/IP reference model was developed after the TCP/IP protocol was already in common use. It differs from the OSI model by subscribing to a less rigid, although still hierarchical, model. For this reason, the OSI model is sometimes better for understanding and reasoning about networking concerns, but the TCP/IP model reflects a more realistic view of how networking is commonly implemented today.
The four layers of the TCP/IP model are as follows:
Network Access layer
(1): On this layer, physical connections and data framing happen. Sending an Ethernet or Wi-Fi packet is an example of a layer 1 concern.
Internet layer
(2): This layer deals with the concerns of addressing packets and routing them over multiple interconnected networks. It's at this layer that an IP address is defined.
Host-to-Host layer
(3): The host-to-host layer provides two protocols, TCP and UDP, which we will discuss in the next few chapters. These protocols address concerns such as data order, data segmentation, network congestion, and error correction.
Process/Application layer
(4): The process/application layer is where protocols such as HTTP, SMTP, and FTP are implemented. Most of the programs that feature in this book could be considered to take place on this layer while consuming functionality provided by our operating system's implementation of the lower layers.
Regardless of your chosen abstraction model, real-world protocols do work at many levels. Lower levels are responsible for handling data for the higher levels. These lower-level data structures must, therefore, encapsulate data from the higher levels. Let's look at encapsulating data now.
The advantage of these abstractions is that, when programming an application, we only need to consider the highest-level protocol. For example, a web browser needs only to implement the protocols dealing specifically with websites—HTTP, HTML, CSS, and so on. It does not need to bother with implementing TCP/IP, and it certainly doesn't have to understand how an Ethernet or Wi-Fi packet is encoded. It can rely on ready-made implementations of the lower layers for these tasks. These implementations are provided by the operating system (for example, Windows, Linux, and macOS). When communicating over a network, data must be processed down through the layers at the sender and up again through the layers at the receiver. For example, if we have a web server, Host A, which is transmitting a web page to the receiver, Host B, it may look like this:
The web page contains a few paragraphs of text, but the web server doesn't only send the text by itself. For the text to be rendered correctly, it must be encoded in an HTML structure:
In some cases, the text is already preformatted into HTML and saved that way but, in this example, we are considering a web application that dynamically generates the HTML, which is the most common paradigm for dynamic web pages. As the text cannot be transmitted directly, neither can the HTML. It instead must be transmitted as part of an HTTP response. The web server does this by applying the appropriate HTTP response header to the HTML:
The HTTP is transmitted as part of a TCP session. This isn't done explicitly by the web server, but is taken care of by the operating system's TCP/IP stack:
The TCP packet is routed by an IP packet:
This is transmitted over the wire in an Ethernet packet (or another protocol):
Luckily for us, the lower-level concerns are handled automatically when we use the socket APIs for network programming. It is still useful to know what happens behind the scenes. Without this knowledge, dealing with failures or optimizing for performance is difficult if not impossible.
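To make the first encapsulation step concrete, here is a minimal sketch of how a server might wrap an HTML body in an HTTP response before handing it to the operating system. The function name build_http_response and the particular header fields chosen are our own assumptions for illustration; a real web server would send more headers and handle many more cases. The TCP, IP, and Ethernet wrapping that follows is done by the operating system and is not shown.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the book's code): wraps an HTML body in a
 * minimal HTTP/1.1 response. Returns the number of bytes written, not
 * counting the terminating NUL, or -1 if the buffer is too small. */
int build_http_response(char *out, size_t cap, const char *html)
{
    int n = snprintf(out, cap,
                     "HTTP/1.1 200 OK\r\n"
                     "Content-Type: text/html\r\n"
                     "Content-Length: %zu\r\n"
                     "Connection: close\r\n"
                     "\r\n"          /* blank line ends the HTTP header */
                     "%s",           /* the HTML is carried as the body */
                     strlen(html), html);

    if (n < 0 || (size_t)n >= cap) {
        return -1;                   /* encoding error or output truncated */
    }
    return n;
}
```

Calling build_http_response(buf, sizeof buf, "<html>...</html>") fills buf with the header followed by the unchanged HTML, which mirrors how each layer adds its own header in front of the data it receives from the layer above.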
With some of the theory out of the way, let's dive into the actual protocols powering modern networking.
The Internet Protocol can only route packets to an IP address, not a name. So, if you try to connect to a website, such as example.com, your system must first resolve that domain name, example.com, into an IP address for the server that hosts that website. This is done by connecting to a Domain Name System (DNS) server. You connect to a domain name server by knowing in advance its IP address. The IP address for a domain name server is usually assigned by your ISP.
Many other domain name servers are made publicly available by different organizations. Here are a few free and public DNS servers:
| DNS Provider | IPv4 Addresses | IPv6 Addresses |
| --- | --- | --- |
| Cloudflare 1.1.1.1 | 1.1.1.1 | 2606:4700:4700::1111 |
| | 1.0.0.1 | 2606:4700:4700::1001 |
| FreeDNS | 37.235.1.174 | |
| | 37.235.1.177 | |
| Google Public DNS | 8.8.8.8 | 2001:4860:4860::8888 |
| | 8.8.4.4 | 2001:4860:4860::8844 |
| OpenDNS | 208.67.222.222 | 2620:0:ccc::2 |
| | 208.67.220.220 | 2620:0:ccd::2 |
To resolve a hostname, your computer sends a UDP message to your domain name server and asks it for an AAAA-type record for the domain you're trying to resolve. If this record exists, an IPv6 address is returned. You can then connect to a server at that address to load the website. If no AAAA record exists, then your computer queries the server again, but asks for an A record. If this record exists, you will receive an IPv4 address for the server. In many cases, a site will publish an A record and an AAAA record that route to the same server.
It is also possible, and common, for multiple records of the same type to exist, each pointing to a different address. This is useful for redundancy in the case where multiple servers can provide the same service.
We will see a lot more about DNS queries in Chapter 5, Hostname Resolution and DNS.
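As a small preview, the following sketch asks the operating system's resolver for the first address of a hostname. It assumes a POSIX system (Windows needs different headers and Winsock initialization), and the helper name resolve_first is our own. Note that the program does not send the AAAA and A queries itself; getaddrinfo() leaves that to the system resolver, which decides what to ask and in which order.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Hypothetical helper: resolves host and writes the first address returned,
 * in text form, into out. Returns 0 on success, nonzero on failure. */
int resolve_first(const char *host, char *out, size_t cap)
{
    struct addrinfo hints;
    struct addrinfo *results = NULL;
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* accept IPv6 (AAAA) or IPv4 (A) */
    hints.ai_socktype = SOCK_STREAM;  /* avoid one result per socket type */

    if (getaddrinfo(host, NULL, &hints, &results) != 0) {
        return -1;
    }

    /* NI_NUMERICHOST asks for the address as text, not a reverse lookup. */
    rc = getnameinfo(results->ai_addr, results->ai_addrlen,
                     out, (socklen_t)cap, NULL, 0, NI_NUMERICHOST);

    freeaddrinfo(results);
    return rc;
}
```

Passing a name such as example.com requires a working DNS server. Passing a numeric address such as 127.0.0.1 works without any network access, because no lookup is needed.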
Now that we have a basic understanding of IP addresses and names, let's look in detail at how IP packets are routed over the internet.
If every network contained at most two devices, there would be no need for routing. Computer A would just send its data directly over the wire, and computer B, being the only other device, would receive it:
The internet today has an estimated 20 billion devices connected. When you make a connection over the internet, your data first transmits to your local router. From there, it is transmitted to another router, which is connected to another router, and so on. Eventually, your data reaches a router that is connected to the receiving device, at which point, the data has reached its destination:
Imagine that each router in the preceding diagram is connected to tens, hundreds, or even thousands of other routers and systems. It's an amazing feat that IP can discover the correct path and deliver traffic seamlessly.
Windows includes a utility, tracert, which lists the routers between your system and the destination system.
Here is an example of using the tracert command on Windows 10 to trace the route to example.com:
As you can see from the example, there are 11 hops between our system and the destination system (example.com, 93.184.216.34). The IP addresses are listed for many of these intermediate routers, but a few are missing and show the Request timed out message instead. This usually means that the system in question doesn't support the part of the Internet Control Message Protocol (ICMP) needed. It's not unusual to see a few such systems when running tracert.
In Unix-based systems, the utility to trace routes is called traceroute. You would use it like traceroute example.com, for example, but the information obtained is essentially the same.
More information on tracert and traceroute can be found in Chapter 12, Network Monitoring and Security.
Sometimes, when IP packets are transferred between networks, their addresses must be translated. This is especially common when using IPv4. Let's look at the mechanism for this next.
It's common for households and organizations to have small Local Area Networks (LANs). As mentioned previously, there are IPv4 address ranges reserved for use in these small local networks.
These reserved private ranges are as follows:
10.0.0.0 to 10.255.255.255
172.16.0.0 to 172.31.255.255
192.168.0.0 to 192.168.255.255
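The three ranges above are easy to test for in code. The following is a minimal sketch, assuming a POSIX system for inet_pton(); the function name is_private_ipv4 is our own. Each range corresponds to a fixed number of leading bits (8, 12, and 16 respectively), so one mask-and-compare per range is enough.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: returns 1 if ip lies in a private IPv4 range,
 * 0 if it does not, and -1 if ip is not a valid dotted-quad address. */
int is_private_ipv4(const char *ip)
{
    struct in_addr addr;
    uint32_t a;

    if (inet_pton(AF_INET, ip, &addr) != 1) {
        return -1;
    }
    a = ntohl(addr.s_addr);  /* host byte order, so the masks read naturally */

    if ((a & 0xFF000000u) == 0x0A000000u) return 1;  /* 10.0.0.0 - 10.255.255.255 */
    if ((a & 0xFFF00000u) == 0xAC100000u) return 1;  /* 172.16.0.0 - 172.31.255.255 */
    if ((a & 0xFFFF0000u) == 0xC0A80000u) return 1;  /* 192.168.0.0 - 192.168.255.255 */
    return 0;
}
```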
When a packet originates from a device on an IPv4 local network, it must undergo Network Address Translation (NAT) before being routed on the internet. A router that implements NAT remembers which local address a connection is established from.
The devices on the same LAN can directly address one another by their local address. However, any traffic communicated to the internet must undergo address translation by the router. The router does this by modifying the source IP address from the original private LAN IP address to its public internet IP address:
Likewise, when the router receives the return communication, it must modify the destination address from its public IP to the private IP of the original sender. It knows the private IP address because it was stored in memory after the first outgoing packet:
Network address translation can be more complicated than it first appears. In addition to modifying the source IP address in the packet, it must also update the checksums in the packet. Otherwise, the packet would be detected as containing errors and discarded by the next router. The NAT router must also remember which private IP address sent the packet in order to route the reply. Without remembering the translation address, the NAT router wouldn't know where to send the reply to on the private network.
NATs will also modify the packet data in some cases. For example, in the File Transfer Protocol (FTP), some connection information is sent as part of the packet's data. In these cases, the NAT router will look at the packet's data in order to know how to forward future incoming packets. IPv6 largely avoids the need for NAT, as it is possible (and common) for each device to have its own publicly routable address.
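The bookkeeping a NAT router performs can be sketched as a small lookup table. This is a toy model under our own assumptions, not how any particular router is implemented: all names are hypothetical, the table has a fixed size, entries never expire, and checksum updates are left out. It also assumes the router rewrites the source port as well as the source address, which is common in practice because two private hosts may pick the same local port.

```c
#include <stdint.h>
#include <string.h>

#define NAT_MAX_ENTRIES 64
#define NAT_FIRST_PORT  40000  /* arbitrary start of the public port pool */

struct nat_entry {
    uint32_t private_ip;    /* original source address on the LAN */
    uint16_t private_port;  /* original source port on the LAN */
    uint16_t public_port;   /* port used on the router's public address */
    int      in_use;
};

struct nat_table {
    struct nat_entry entries[NAT_MAX_ENTRIES];
};

void nat_init(struct nat_table *t)
{
    memset(t, 0, sizeof *t);
}

/* Outgoing packet: remember the sender and return the public port to use.
 * Returns 0 if the table is full. */
uint16_t nat_outbound(struct nat_table *t, uint32_t private_ip,
                      uint16_t private_port)
{
    int free_slot = -1;

    for (int i = 0; i < NAT_MAX_ENTRIES; i++) {
        struct nat_entry *e = &t->entries[i];
        if (e->in_use && e->private_ip == private_ip &&
            e->private_port == private_port) {
            return e->public_port;           /* mapping already exists */
        }
        if (!e->in_use && free_slot < 0) {
            free_slot = i;
        }
    }
    if (free_slot < 0) {
        return 0;
    }

    t->entries[free_slot].private_ip = private_ip;
    t->entries[free_slot].private_port = private_port;
    t->entries[free_slot].public_port = (uint16_t)(NAT_FIRST_PORT + free_slot);
    t->entries[free_slot].in_use = 1;
    return t->entries[free_slot].public_port;
}

/* Incoming reply: find which private host the public port belongs to.
 * Returns 1 and fills the outputs if found, 0 otherwise. */
int nat_inbound(const struct nat_table *t, uint16_t public_port,
                uint32_t *private_ip, uint16_t *private_port)
{
    for (int i = 0; i < NAT_MAX_ENTRIES; i++) {
        const struct nat_entry *e = &t->entries[i];
        if (e->in_use && e->public_port == public_port) {
            *private_ip = e->private_ip;
            *private_port = e->private_port;
            return 1;
        }
    }
    return 0;
}
```

A reply arriving on a public port that has no entry is simply not deliverable, which is why a host behind a NAT generally cannot be reached unless it sent the first packet.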
You may be wondering how a router knows whether a message is locally deliverable or whether it must be forwarded. This is done using a netmask, subnet mask, or CIDR.
IP addresses can be split into parts. The most significant bits are used to identify the network or subnetwork, and the least significant bits are used to identify the specific device on the network.
This is similar to how your home address can be split into parts. Your home address includes a house number, a street name, and a city. The city is analogous to the network part, the street name could be the subnetwork part, and your house number is the device part.
IPv4 traditionally uses a mask notation to identify the IP address parts. For example, consider a router on the 10.0.0.0 network with a subnet mask of 255.255.255.0. This router can take any incoming packet and perform a bitwise AND operation with the subnet mask to determine whether the packet belongs on the local subnet or needs to be forwarded on. For example, this router receives a packet to be delivered to 10.0.0.105. It does a bitwise AND operation on this address with the subnet mask of 255.255.255.0, which produces 10.0.0.0. That matches the subnet of the router, so the traffic is local. If, instead, we consider a packet destined for 10.0.15.22, the result of the bitwise AND with the subnet mask is 10.0.15.0. This address doesn't match the subnet the router is on, and so it must be forwarded.
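The bitwise AND check just described takes only a few lines of C. The following sketch assumes a POSIX system for inet_pton(), and the function name on_local_subnet is our own. Because a bitwise AND works the same on every byte, there is no need to convert the addresses out of network byte order first.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: returns 1 if dest is on the subnet described by
 * network and mask, 0 if it must be forwarded, -1 on a parse error. */
int on_local_subnet(const char *dest, const char *network, const char *mask)
{
    struct in_addr d, n, m;

    if (inet_pton(AF_INET, dest, &d) != 1 ||
        inet_pton(AF_INET, network, &n) != 1 ||
        inet_pton(AF_INET, mask, &m) != 1) {
        return -1;
    }

    /* Keep only the network bits of each address, then compare. */
    return (d.s_addr & m.s_addr) == (n.s_addr & m.s_addr);
}
```

With the values from the text, on_local_subnet("10.0.0.105", "10.0.0.0", "255.255.255.0") reports a local address, while the same call with "10.0.15.22" reports that the packet must be forwarded.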
IPv6 uses CIDR. Networks and subnetworks are specified using the CIDR notation we described earlier. For example, if the IPv6 subnet is /112, then the router knows that any address that matches on the first 112 bits is on the local subnet.
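The same idea can be sketched for IPv6 by comparing the leading bits of two addresses. The function name ipv6_prefix_match is our own, and the sketch assumes a POSIX system where struct in6_addr exposes its 16 bytes as s6_addr. Whole bytes are compared first, and then any remaining bits of a prefix length that is not a multiple of eight.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: returns 1 if a and b agree on their first
 * prefix_len bits, 0 otherwise (including when prefix_len > 128). */
int ipv6_prefix_match(const struct in6_addr *a, const struct in6_addr *b,
                      unsigned prefix_len)
{
    unsigned full_bytes = prefix_len / 8;
    unsigned extra_bits = prefix_len % 8;

    if (prefix_len > 128) {
        return 0;
    }
    if (memcmp(a->s6_addr, b->s6_addr, full_bytes) != 0) {
        return 0;
    }
    if (extra_bits != 0) {
        /* Mask selecting the top extra_bits bits of the next byte. */
        unsigned char mask = (unsigned char)(0xFF << (8 - extra_bits));
        if ((a->s6_addr[full_bytes] & mask) != (b->s6_addr[full_bytes] & mask)) {
            return 0;
        }
    }
    return 1;
}
```

For a /112 subnet, only the final 16 bits may differ, so 2001:db8::1:5 and 2001:db8::1:ffff match while 2001:db8::2:5 does not.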
So far, we've covered only routing with one sender and one receiver. While this is the most common situation, let's consider alternative cases too.
When a packet is routed from one sender to one receiver, it uses unicast addressing. This is the simplest and most common type of addressing. All of the protocols we deal with in this book use unicast addressing.
Broadcast addressing allows a single sender to address a packet to all recipients simultaneously. It is typically used to deliver a packet to every receiver on an entire subnet.
If a broadcast is a one-to-all communication, then multicast is a one-to-many communication. Multicast involves some group management, and a message is addressed and delivered to members of a group.
Anycast addressed packets are used to deliver a message to one recipient when you don't care who that recipient is. This is useful if you have several servers that provide the same functionality, and you simply want one of them (you don't care which) to handle your request.
IPv4 and lower network levels support local broadcast addressing. IPv4 provides some optional (but commonly implemented) support for multicasting. IPv6 mandates multicasting support while providing additional features over IPv4's multicasting. Though IPv6 does not support broadcast as such, its multicasting functionality can essentially emulate it.
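For IPv4, the address used to reach every receiver on a subnet is conventionally the one with all of the device bits set to 1. The following sketch computes it from a network address and subnet mask. The function name broadcast_address is our own, and the sketch assumes a POSIX system for inet_pton() and inet_ntop().

```c
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: writes the subnet broadcast address for the given
 * address and mask into out. Returns 0 on success, -1 on failure. */
int broadcast_address(const char *address, const char *mask,
                      char *out, size_t cap)
{
    struct in_addr a, m, b;

    if (inet_pton(AF_INET, address, &a) != 1 ||
        inet_pton(AF_INET, mask, &m) != 1) {
        return -1;
    }

    /* Keep the network bits and set every device bit to 1. */
    b.s_addr = a.s_addr | ~m.s_addr;

    return inet_ntop(AF_INET, &b, out, (socklen_t)cap) != NULL ? 0 : -1;
}
```

For the 10.0.0.0 network with a mask of 255.255.255.0, this yields 10.0.0.255.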
It's worth noting that these alternative addressing methods don't generally work over the broader internet. Imagine if one peer was able to broadcast a packet to every connected internet device. It would be a mess!
If you can use IP multicasting on your local network, though, it is worthwhile to implement it. Sending one IP level multicast conserves bandwidth compared to sending the same unicast message multiple times.
However, multicasting is often done at the application level. That is, when the application wants to deliver the same message to several recipients, it sends the message multiple times – once to each recipient. In Chapter 3, An In-Depth Overview of TCP Connections, we build a chat room. This chat room could be said to use application-level multicasting, but it does not take advantage of IP multicasting.
We've covered how messages are routed through a network. Now, let's see how a message knows which application is responsible for it once it arrives at a specific system.
An IP address alone isn't quite enough. We need port numbers. To return to the telephone analogy, if IP addresses are phone numbers, then port numbers are like phone extensions.
Generally, an IP address gets a packet routed to a specific system, but a port number is used to route the packet to a specific application on that system.
For example, on your system, you may be running multiple web browsers, an email client, and a video-conferencing client. When your computer receives a TCP segment or UDP datagram, your operating system looks at the destination port number in that packet. That port number is used to look up which application should handle it.
Port numbers are stored as unsigned 16-bit integers. This means that they are between 0 and 65,535 inclusive.
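Because a port must fit in 16 bits, programs that accept a port number from the user should check the range before using it. Here is one way to do that, using only the C standard library. The function name parse_port is our own.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parses a decimal port number in the range 0-65535.
 * Returns 0 and stores the value in *out on success, -1 otherwise. */
int parse_port(const char *s, uint16_t *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(s, &end, 10);

    if (errno != 0 || end == s || *end != '\0') {
        return -1;  /* overflow, no digits at all, or trailing characters */
    }
    if (value < 0 || value > 65535) {
        return -1;  /* does not fit in an unsigned 16-bit integer */
    }

    *out = (uint16_t)value;
    return 0;
}
```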
Some port numbers for common protocols are as follows:
| Port Number | Transport | Protocol | Chapter |
| --- | --- | --- | --- |
| 20, 21 | TCP | File Transfer Protocol (FTP) | |
| 22 | TCP | Secure Shell (SSH) | Chapter 11, Establishing SSH Connections with libssh |
| 23 | TCP | Telnet | |
| 25 | TCP | Simple Mail Transfer Protocol (SMTP) | Chapter 8, Making Your Program Send Email |
| 53 | UDP | Domain Name System (DNS) | Chapter 5, Hostname Resolution and DNS |
| 80 | TCP | Hypertext Transfer Protocol (HTTP) | Chapter 6, Building a Simple Web Client; Chapter 7, Building a Simple Web Server |
| 110 | TCP | Post Office Protocol, Version 3 (POP3) | |
| 143 | TCP | Internet Message Access Protocol (IMAP) | |
| 194 | TCP | Internet Relay Chat (IRC) | |
| 443 | TCP | HTTP over TLS/SSL (HTTPS) | Chapter 9, Loading Secure Web Pages with HTTPS and OpenSSL; Chapter 10, Implementing a Secure Web Server |
| 993 | TCP | IMAP over TLS/SSL (IMAPS) | |
| 995 | TCP | POP3 over TLS/SSL (POP3S) | |
Each of these listed port numbers is assigned by the Internet Assigned Numbers Authority (IANA). They are responsible for the official assignments of port numbers for specific protocols. Unofficial port usage is very common for applications implementing custom protocols. In this case, the application should try to choose a port number that is not in common use to avoid conflict.
In the telephone analogy, a call must be initiated first by one party. The initiating party dials the number for the receiving party, and the receiving party answers.
This is also a common paradigm in networking called the client-server model. In this model, a server listens for connections. The client, knowing the address and port number that the server is listening on, establishes the connection by sending the first packet.
For example, the web server at example.com listens on port 80 (HTTP) and port 443 (HTTPS). A web browser (client) must establish the connection by sending the first packet to the web server address and port.
A socket is one end-point of a communication link between systems. It's an abstraction in which your application can send and receive data over the network, in much the same way that your application can read and write to a file using a file handle.
An open socket is uniquely defined by a 5-tuple consisting of the following:
Local IP address
Local port
Remote IP address
Remote port
Protocol (UDP or TCP)
This 5-tuple is important, as it is how your operating system knows which application is responsible for any packets received. For example, if you use two web browsers to establish two simultaneous connections to example.com on port 80, then your operating system keeps the connections separate by looking at the local IP address, local port, remote IP address, remote port, and protocol. In this case, the local IP addresses, remote IP addresses, remote port (80), and protocol (TCP) are identical.
The deciding factor then is the local port (also called the ephemeral port), which the operating system will have chosen to be different for each connection. This 5-tuple is also important for understanding how NAT works. A private network may have many systems accessing the same outside resource, and the NAT router must store this 5-tuple for each connection in order to know how to route received packets back into the private network.
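We can observe the ports in this 5-tuple on a live connection. The following sketch assumes a POSIX system with a working loopback interface, and all names in it are our own. It opens a listening socket on 127.0.0.1, connects to it from the same process, and then uses getsockname() and getpeername() to read the port numbers on both ends. Binding to port 0 asks the operating system to pick a free port for us.

```c
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical record of the ports seen at each end of one connection. */
struct tuple_ports {
    uint16_t client_local, client_remote;
    uint16_t server_local, server_remote;
};

/* Reads the local (peer == 0) or remote (peer != 0) port of a socket. */
static int port_of(int fd, int peer, uint16_t *port)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof sa;
    int rc = peer ? getpeername(fd, (struct sockaddr *)&sa, &len)
                  : getsockname(fd, (struct sockaddr *)&sa, &len);
    if (rc == 0) {
        *port = ntohs(sa.sin_port);
    }
    return rc;
}

/* Connects to ourselves over loopback. Returns 0 on success, -1 on error. */
int loopback_ports(struct tuple_ports *out)
{
    int listener = -1, client = -1, accepted = -1, rc = -1;
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                      /* let the OS choose a port */

    listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0) goto done;
    if (bind(listener, (struct sockaddr *)&addr, sizeof addr) < 0) goto done;
    if (listen(listener, 1) < 0) goto done;

    /* Find out which port the listener was actually given. */
    if (getsockname(listener, (struct sockaddr *)&addr, &len) < 0) goto done;

    client = socket(AF_INET, SOCK_STREAM, 0);
    if (client < 0) goto done;
    if (connect(client, (struct sockaddr *)&addr, sizeof addr) < 0) goto done;

    accepted = accept(listener, NULL, NULL);
    if (accepted < 0) goto done;

    if (port_of(client, 0, &out->client_local) ||
        port_of(client, 1, &out->client_remote) ||
        port_of(accepted, 0, &out->server_local) ||
        port_of(accepted, 1, &out->server_remote)) goto done;
    rc = 0;

done:
    if (accepted >= 0) close(accepted);
    if (client >= 0) close(client);
    if (listener >= 0) close(listener);
    return rc;
}
```

After a successful call, client_remote equals server_local (the listening port), while client_local holds the ephemeral port that the operating system picked for the client. The server sees that same ephemeral port as server_remote, so each end holds a mirror image of the other's tuple.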
You can find your IP address using the ipconfig command on Windows, or the ifconfig command on Unix-based systems (such as Linux and macOS).
Using the ipconfig command from Windows PowerShell looks like this:
In this example, you can find that the IPv4 address is listed under Ethernet adapter Ethernet0. Your system may have more network adapters, and each will have its own IP address. We can tell that this computer is on a local network because the IP address, 192.168.182.133, is in the private IP address range.
On Unix-based systems, we use either the ifconfig or ip addr commands. The ifconfig command is the old way and is now deprecated on some systems. The ip addr command is the new way, but not all systems support it yet.
Using the ifconfig command from a macOS terminal looks like this:
The IPv4 address is listed next to inet. In this case, we can see that it's 192.168.182.128. Again, we see that this computer is on a local network because of the IP address range. The same adapter has an IPv6 address listed next to inet6.
The following screenshot shows using the ip addr command on Ubuntu Linux:
The preceding screenshot shows the local IPv4 address as 192.168.182.145. We can also see that the link-local IPv6 address is fe80::df60:954e:211:7ff0.
These commands, ifconfig, ip addr, and ipconfig, show the IP address or addresses for each adapter on your computer. You may have several. If you are on a local network, the IP addresses you see will be your local private network IP addresses.
If you are behind a NAT, there is often no good way to know your public IP address. Usually, the only resort is to contact an internet server that provides an API that informs you of your IP address.
A few free and public APIs for this are as follows:
http://api.ipify.org/
http://helloacm.com/api/what-is-my-ip-address/
http://icanhazip.com/
http://ifconfig.me/ip
You can test out these APIs in a web browser:
Each of these listed web pages should return your public IP address and not much else. These sites are useful for when you need to determine your public IP address from behind a NAT programmatically. We look at writing a small HTTP client capable of downloading these web pages and others in Chapter 6, Building a Simple Web Client.
Now that we've seen the built-in utilities for determining our local IP addresses, let's next look at how to accomplish this from C.
Sometimes, it is useful for your C programs to know what your local address is. For most of this book, we are able to write code that works both on Windows and Unix-based (Linux and macOS) systems. However, the API for listing local addresses is very different between systems. For this reason, we split this program into two: one for Windows and one for Unix-based systems.
We will address the Windows case first.
