Leverage the power of Git to smooth out the development cycle
Professional Git takes a professional approach to learning this massively popular software development tool, and provides an up-to-date guide for new users. More than just a development manual, this book helps you get into the Git mindset--extensive discussion of corollaries to traditional systems as well as considerations unique to Git help you draw upon existing skills while looking out--and planning for--the differences. Connected labs and exercises are interspersed at key points to reinforce important concepts and deepen your understanding, and a focus on the practical goes beyond technical tutorials to help you integrate the Git model into your real-world workflow.
Git greatly simplifies the software development cycle, enabling users to create, use, and switch between versions as easily as you switch between files. This book shows you how to harness that power and flexibility to streamline your development cycle.
* Understand the basic Git model and overall workflow
* Learn the Git versions of common source management concepts and commands
* Track changes, work with branches, and take advantage of Git's full functionality
* Avoid trip-ups and missteps common to new users
Git works with the most popular software development tools and is used by almost all of the major technology companies. More than 40 percent of software developers use it as their primary source control tool, and that number continues to grow; the ability to work effectively with Git is rapidly approaching must-have status, and Professional Git is the comprehensive guide you need to get up to speed quickly.
Page count: 713
Publication year: 2016
Cover
Title Page
Introduction
How this Book is Unique
Target Audience
Structure and Content
Reader Value
Next Steps
PART I: UNDERSTANDING GIT CONCEPTS
Chapter 1: What Is Git?
History of Git
Industry-Standard Tooling
The Git Ecosystem
Git's Advantages and Challenges
Summary
Chapter 2: Key Concepts
Design Concepts: User-Facing
Design Concepts: Internal
Repository Design Considerations
Summary
Chapter 3: The Git Promotion Model
The Levels of Git
Summary
Connected Lab 1: Installing Git
Installing Git for Windows
Steps
Installing Git on Mac OS X
Installing Git on Linux
PART II: USING GIT
Chapter 4: Configuration and Setup
Executing Commands in Git
Configuring Git
Initializing a Repository
Advanced Topics
Summary
Chapter 5: Getting Productive
Getting Help
The Multiple Repositories Model
Adding Content to Track—Add
Finalizing Changes—Commit
Putting It All Together
Advanced Topics
Summary
Connected Lab 2: Creating and Exploring a Git Repository and Managing Content
Prerequisites
Optional Advanced Deep-Dive into the Repository Structure
Steps
Chapter 6: Tracking Changes
Git Status
Git Diff
Summary
Connected Lab 3: Tracking Content through the File Status Life Cycle
Prerequisites
Steps
Chapter 7: Working with Changes over Time and Using Tags
The Log Command
Git Blame
Seeing History Visually
Tags
Undoing Changes in History
Advanced Topics
Summary
Connected Lab 4: Using Git History, Aliases, and Tags
Prerequisites
Steps
Chapter 8: Working with Local Branches
What Is a Branch?
Advanced Topics
Summary
Connected Lab 5: Working with Branches
Prerequisites
Steps
Chapter 9: Merging Content
The Basics of Merging
Dealing with Conflicts
Visual Merging
Advanced Topics
Summary
Connected Lab 6: Practicing with Merging
Prerequisites
Steps
Chapter 10: Supporting Files in Git
The Git Attributes File
The Git Ignore File
Summary
Chapter 11: Doing More with Git
Modifying the Layout of Files and Directories in Your Local Environment
Commands for Searching
Working with Patches and Archives for Changes
Commands for Cleaning Up
Advanced Topics
Summary
Connected Lab 7: Deleting, Renaming, and Stashing
Prerequisites
Steps
Chapter 12: Understanding Remotes—Branches and Operations
Remotes
Summary
Connected Lab 8: Setting Up a GitHub Account and Cloning a Repository
Prerequisites
Steps
Chapter 13: Understanding Remotes—Workflows for Changes
The Basic Conflict and Merge Resolution Workflow in Git
Hosted Repositories
Summary
Connected Lab 9: Using the Overall Workflow with a Remote Repository
Prerequisites
Steps
Chapter 14: Working with Trees and Modules in Git
Worktrees
Submodules
Subtrees
Summary
About Connected Labs 10–12
Connected Lab 10: Working with Worktrees
Prerequisites
Steps
Connected Lab 11: Working with Submodules
Prerequisites
Steps
Connected Lab 12: Working with Subtrees
Prerequisites
Steps
Chapter 15: Extending Git Functionality with Git Hooks
Installing Hooks
Updating Hooks
Common Hook Attributes
Hook Descriptions
Other Hooks
Hooks Quick Reference
Summary
End User License Agreement
Chapter 1: What Is Git?
Figure 1.1 Example GitHub page
Figure 1.2 GitLab project screen
Figure 1.3 Examples of GUIs available for Git (from git-scm.org)
Figure 1.4 Example Gerrit screen
Chapter 2: Key Concepts
Figure 2.1 A traditional centralized version control model
Figure 2.2 A distributed version control model
Figure 2.3 Disconnected development
Figure 2.4 The delta storage model
Figure 2.5 The snapshot storage model
Figure 2.6 A representation of Git's packing behavior to optimize content size
Chapter 3: The Git Promotion Model
Figure 3.1 A simple dev-test-prod environment
Figure 3.2 The levels of a Git system
Figure 3.3 The local versus remote environments
Figure 3.4 Git in one picture
Chapter 4: Configuration and Setup
Figure 4.1 Understanding the scopes of Git configuration files
Figure 4.2 Tree listing of a .git directory (local repository)
Figure 4.3 Mapping files and directories to Git repositories
Chapter 5: Getting Productive
Figure 5.1 Abbreviated version of help invoked with the -h option
Figure 5.2 Git browser-based man page
Figure 5.3 Working with multiple repositories
Figure 5.4 Overlaying configuration files on your model
Figure 5.5 Where adding and staging fit in
Figure 5.6 An edit session for a hunk
Figure 5.7 Where commit fits in
Figure 5.8 The basic workflow for multiple commits
Figure 5.9 Workflow for an amended commit
Figure 5.10 The editor session for a commit message using a template file and the --verbose --verbose options
Chapter 6: Tracking Changes
Figure 6.1 Empty local environment levels.
Figure 6.2 File created in working directory
Figure 6.3 Version a of the file is staged.
Figure 6.4 Update made to working directory version
Figure 6.5 Version b staged
Figure 6.6 The file is committed.
Figure 6.7 Starting point for diffing—working directory clean
Figure 6.8 Workflow of git diff between working directory and Git (checking the staging area)
Figure 6.9 Workflow of git diff between working directory and Git (checking the local repository)
Figure 6.10 Local version updated to b
Figure 6.11 Diff between modified local version and Git
Figure 6.12 Diffing further up the chain
Figure 6.13 Diffing from the working directory with a version in the staging area
Figure 6.14 Diffing starting at the staging area
Figure 6.15 Diffing directly against a SHA1 (HEAD)
Figure 6.16 Vimdiff
Figure 6.17 WinMerge
Figure 6.18 Meld
Figure 6.19 KDiff3
Chapter 7: Working with Changes over Time and Using Tags
Figure 7.1 Using the gitk tool to browse local history
Figure 7.2 Tagging a commit
Figure 7.3 Starting repository contents
Figure 7.4 Resetting back to an absolute SHA1
Figure 7.5 Resetting relative to a tag
Figure 7.6 Resetting for revert
Figure 7.7 Local environment after the revert
Chapter 8: Working with Local Branches
Figure 8.1 Progression of chain of commits
Figure 8.2 Your starting chain of commits
Figure 8.3 After the creation of a testing branch
Figure 8.4 After checking out the testing branch
Figure 8.5 The current branch pointer is moved to indicate that the newest commit is the latest content on that branch.
Figure 8.6 Local repository—active branch: master
Figure 8.7 Git checkout master
Figure 8.8 Git checkout testing
Figure 8.9 Git checkout master (again)
Figure 8.10 Local repository with two branches
Figure 8.11 After deleting the testing branch
Figure 8.12 The master-as-production model
Figure 8.13 The master-to-release model
Figure 8.14 The master-as-integration model
Figure 8.15 The parallel model
Figure 8.16 Repository before checkout of fc28c0d
Figure 8.17 Repository after checkout of fc28c0d
Figure 8.18 Repository state after the new commit
Figure 8.19 Repository after you switch back to feature1
Figure 8.20 After creating a new branch off of your commit
Figure 8.21 After a checkout of experimental
Figure 8.22 The two paths of your two branches
Chapter 9: Merging Content
Figure 9.1 Setup for the fast-forward example
Figure 9.2 The fast-forward merge
Figure 9.3 Setup for the three-way merge example—not eligible for fast-forward
Figure 9.4 The three points considered for the three-way merge
Figure 9.5 The three-way merge process
Figure 9.6 The new merge commit after the three-way merge
Figure 9.7 Setup for the rebase example
Figure 9.8 Identifying a common ancestor
Figure 9.9 Computing deltas from the source branch
Figure 9.10 Applying deltas on the destination tip
Figure 9.11 Completed rebase of a feature on master
Figure 9.12 Setup for the cherry-pick example
Figure 9.13 End result of the cherry-pick
Figure 9.14 The merge process in the local environment
Figure 9.15 Master branch with three topic branches
Figure 9.16 After a merge of the three topic branches
Figure 9.17 The earlier cherry-pick example
Figure 9.18 C5 cannot be cherry-picked due to a conflict.
Figure 9.19 The choices for options to pick one version
Figure 9.20 Completed cherry-pick with C5 from feature
Figure 9.21 After the octopus merge
Figure 9.22 Merging with vimdiff
Figure 9.23 Merging with WinMerge
Figure 9.24 Merging with Meld
Figure 9.25 Merging with KDiff3
Figure 9.26 Setup for an advanced rebase
Figure 9.27 Topic's chain of commits
Figure 9.28 Computing the deltas to rebase
Figure 9.29 Applying the deltas to master
Figure 9.30 The completed rebase
Figure 9.31 Topic merged into master
Figure 9.32 Beginning state of your branch
Figure 9.33 Temporary file created for scripting the rebase actions
Figure 9.34 Edited interactive rebase to-do script
Figure 9.35 Screen to enter commit message for squashed commits
Figure 9.36 Adding a new commit message for the squashed commits
Figure 9.37 Your chains of commits after the interactive rebase is completed
Chapter 10: Supporting Files in Git
Figure 10.1 The Git model with smudge and clean filters
Chapter 11: Doing More with Git
Figure 11.1 Local environment with an uncommitted change
Figure 11.2 After the initial stash
Figure 11.3 Another change in your local environment with an untracked file
Figure 11.4 After stashing, including the untracked file
Figure 11.5 Another change in your local environment
Figure 11.6 The third element on the queue
Figure 11.7 Queue and local environment after an apply and pop from the stash
Figure 11.8 Changing the format of a patch received in e-mail
Figure 11.9 Starting state for bisect
Figure 11.10 Checking for a good version
Figure 11.11 Initial bisect trial
Figure 11.12 Bisecting—the next steps
Figure 11.13 Narrowing in on the first bad commit
Figure 11.14 The first bad commit is found
Figure 11.15 gitk view of a bisect
Chapter 12: Understanding Remotes—Branches and Operations
Figure 12.1 Arrangement of local versus remote environments
Figure 12.2 Login access (top) versus SSH access (bottom)
Figure 12.3 Start and end of a cloning operation
Figure 12.4 A way to think about cloning multi-level paths
Figure 12.5 Initial changes in the local repository
Figure 12.6 After a push to the remote repository
Figure 12.7 Remote tracking branch created in the local repository
Figure 12.8 After a commit into the local repository
Figure 12.9 Before and after a fetch operation
Figure 12.10 The local repository before and after the merge
Figure 12.11 Before and after a pull operation
Chapter 13: Understanding Remotes—Workflows for Changes
Figure 13.1 File granularity corresponding to delta changes
Figure 13.2 Commits are a snapshot of files and directories.
Figure 13.3 Two users with the same cloned contents
Figure 13.4 User 1 successfully pushes their changes.
Figure 13.5 User 2 attempts to push their changes and is rejected.
Figure 13.6 User 2 pulls the latest changes to merge updates locally.
Figure 13.7 Merged content is pushed back into the remote.
Figure 13.8 Forking a repository
Figure 13.9 The typical Git lifecycle on a forked repository
Figure 13.10 Sending a pull request to the owner
Figure 13.11 Repository owner pulls changes.
Figure 13.12 A workflow model for making and incorporating changes
Chapter 14: Working with Trees and Modules in Git
Figure 14.1 Illustration of multiple working trees
Figure 14.2 Illustration of how submodules work
Figure 14.3 Illustration of a subtree layout
Chapter 3: The Git Promotion Model
Table 3.1 Core Commands for Moving Content between Levels in Git
Chapter 4: Configuration and Setup
Table 4.1 Components of a Git Command Line Invocation
Table 4.2 Porcelain Commands in Git
Table 4.3 Plumbing Commands in Git
Chapter 6: Tracking Changes
Table 6.1 Git Status Codes for Short Options
Chapter 10: Supporting Files in Git
Table 10.1 The File Scope for Git Attributes
Table 10.2 Options for Specifying Attributes
Table 10.3 Scopes and Precedence for Git Ignore Files
Chapter 12: Understanding Remotes—Branches and Operations
Table 12.1 Summarizing the Types of Branches in Git
Chapter 15: Extending Git Functionality with Git Hooks
Table 15.1 List of Git Hooks by Operation
Brent Laster
Welcome. If your job or interests involve designing, creating, or testing software, or managing any part of a software development lifecycle, chances are that you’ve heard of Git and, at some level, have tried to use and understand it. This book will help you reach that goal. To put it simply, Professional Git is intended to help you understand and use Git to get your job done, whether that job is a personal project or a professional requirement. In the process, it will also make Git part of your professional comfort zone. Throughout the book, I’ve provided the background and concepts that you need to know (and understand) to make sense of Git, while you learn how to interact with it.
This section will provide you with a quick introduction to the book. It explains how this book differs from other books about Git, who the intended audience is, how the book is structured, and some of the value it offers you.
I encourage you to take a few minutes and read through this section. Then, you can dive into the material at your own pace, and build your skills and understanding of Git through the text and the included hands-on labs. Or, if you’d like to quickly see additional information about the range of content, you can browse the table of contents.
Thanks for taking a look at Professional Git.
While many books about Git are already on the market, most are aimed at providing the technical usage of the application as their major and singular goal. Professional Git will provide you with that, but it will also provide you with an understanding of Git in terms of concepts that you probably already know. As well, most books do not provide practical ways to integrate the concepts they describe. Learning is most effective when you have actual examples to work through so you can internalize the concepts and gain proficiency at your pace. Professional Git includes Connected Labs that you can work through to absorb what you’ve just read.
I’ve included simple, clear illustrations to help you visualize key ideas and workflows. I’ve also included Advanced Topics sections at the end of many chapters. These sections provide additional explanations of how to use some lesser-known features of Git as well as how to go beyond the standard Git features to gain extra value.
It is easy to experience a bad transition from another source management system to Git, if you don’t understand Git. To be most effective, you need to comprehend the Git model and workflow. You should also know what to watch out for as you make the transition and why it’s important to consider not only the commands and workflow, but also the structure and scope of its underlying repositories. I cover all of this in Professional Git.
This book is based on my years of training people on Git; these people worked at all levels and came from many different backgrounds—developers, testers, project managers, people managers, documentation specialists, and so on. I have presented the basic materials outlined in this book through many workshops at industry conferences and corporate training sessions. I’ve presented them at locations across the United States, as well as internationally. I’ve been successful in helping people to walk away with a newfound confidence in using Git.
I make only one assumption in this book: that you have experience with at least one source management system. It doesn’t matter which one; CVS, Subversion, Mercurial, or any other will do. I just assume that you have a basic awareness of what a source management system does as well as fundamental concepts such as checking in and checking out code and branching. Beyond that, you do not require any prior knowledge or experience. And even if you have significant experience with Git or another system, you’ll find something of benefit here. In fact, if you’re reading this, then you probably fall into one of the following categories:
You are new to Git and know that you need to learn it.
You have used Git but have been trying to use it the same way you used your previous source control system.
You have used Git and feel that you know “just enough to be dangerous.”
You are getting by with Git, but really want to understand why it works the way it does and how to really use it as intended.
You work with, or manage, people who either use Git or need to learn it. Given that association, you need to know about Git and to understand the fundamental concepts.
You’ve heard about the potential benefits of Git, and so you are curious about it and about what it can do for you and the organization you work with.
You may actually see yourself in more than one of these categories. However, you probably just want to be able to get your job done (whether that job is a personal or professional goal). This book was built on that premise.
Git requires a mind shift. In fact, it requires a series of mind shifts. However, each shift is easy to understand once you can relate it to something you already know. Understanding each of these shifts will, in turn, allow you to be more productive and to harness the features of this powerful tool—and that’s what this book is about.
This book is organized as a series of chapters that present Git from the ground up, teaching you what you need to know and build on to become proficient before adding new concepts.
In the first three chapters, I cover the foundational concepts of Git: how it’s different from other systems, the ecosystem that’s been built around it, its advantages and challenges, and the model that allows you to understand its workflow and manage content effectively with it. This section will provide you with a basic understanding of the ideas, goals, and essential terminology of Git.
In the remaining chapters of the book, I cover the usage and features of Git, from performing basic operations to create repositories and commit changes into them, to creating branches, doing merges, and working with content in public repositories.
Notice that I don’t have you using Git right away. (If you want to do that, feel free to jump ahead to Chapter 4, which quickly enables you to start getting hands-on with Git.) However, I highly recommend reading the first three chapters. If you’re new to Git (or it’s been a while), the background reading, especially in Chapters 2 and 3, will provide the foundation you need to understand the remaining chapters. And even if you’ve used Git before, reading these chapters may clear up questions that you’ve had about Git, give you a better mental model to work from, and form a basis to understand some of the more advanced concepts.
Throughout the book, you’ll find examples and guidance on the commands and workflows you need to be productive with Git. Each chapter includes ways to relate concepts to what you already know and understand. In addition to the text, you’ll find many illustrations to help you understand concepts visually. As I’ve already mentioned, this book also adds a feature that allows you to get hands-on experience with Git, via Connected Labs interspersed throughout the chapters. These labs are designed to reinforce the concepts presented in the text of the preceding chapter(s) and to get you actively involved in the learning process, allowing you to better grasp the concepts. To get the most out of the book, you should take the time to complete each lab—usually only a few minutes. You’ll find that these simple steps will greatly increase your overall understanding and confidence when using Git.
As well, take a look at the Advanced Topics sections, located at the end of some chapters. You’ll likely find explanations and ideas to leverage Git functionality in ways you may not have considered before, or you may find out how to use that feature you’ve always wondered about.
For the later labs, custom Git repositories with example content are provided at http://github.com/professional-git. In addition, downloadable copies of the code for the hooks from the last chapter are available at http://github.com/professional-git/hooks. If GitHub is not available, you can find the needed files at www.wrox.com/go/professionalgit.
If this sounds like the book for you, then I encourage you to keep reading and to start making the connections and mind shifts that will help you succeed with Git. As you progress through the book, you’ll find many ideas, insights, and “a-ha” moments that will serve you well. And with that knowledge, you’ll soon be working at the level of “Professional Git.”
CHAPTER 1: What Is Git?
CHAPTER 2: Key Concepts
CHAPTER 3: The Git Promotion Model
A brief introduction to Git and its history
The different ways to find and access Git
Types of applications that incorporate Git
The advantages of using Git
The challenges of using Git
In this chapter, you'll be introduced to Git and will learn about it from a product perspective—what it is, why it's used, the different kinds of interfaces you can use with it, and the good parts and challenging parts of working with it. This will provide an important foundation for understanding the technical details that follow in the subsequent chapters.
If I were to summarize what Git is in one paragraph, it would go something like this:
Git is a popular and widely used source management system that greatly simplifies the development cycle. It enables users to create, use, and switch between branches for content development as easily as people create and switch between files in their daily workflow. It is implemented using a fast, efficient architecture that allows for ease of experimentation and refinement of local changes in an isolated environment before sharing them with others. In short, it allows everyday users to focus on getting the content right instead of worrying about source management, while providing more advanced users with the ability to record, edit, and share changes at any level of detail.
In short, Git is different—really. When you're experienced with using Git and understand it, this will make you feel empowered and productive. When you're new to Git, and trying to understand it, you will encounter a model that will lead you to think differently about managing content in source control.
To illustrate, there's an old saying that “when all you have is a hammer, everything looks like a nail.” When all you have is a traditional centralized source management system, everything looks like a file-by-file change that is expensive to branch.
Not so with Git. Git is one of those nice tools that actually allows users to focus on developing content and simplifying workflows. It's not just another tool in the toolbox, it is the toolbox. It contains all of the tools you need to manage tracking anything from a few files for a single user to projects spanning hundreds of users and a huge scope, such as the Linux kernel. Today, many large companies use Git. It's free, it's powerful, it scales, and its model works when used as designed.
Git also has a certain “feel” that's appealing to many people. Git is structured more like a series of individual utilities that you can run against your content, similar to how users work with operating systems. However, it doesn't try to be the system; it gives users ultimate control over their content, even to the point of being able to update history if needed. Git manages basic units that equate to directory structures rather than individual files, so content that extends across file and directory boundaries can be managed together. Git simplifies branching, to a point where creating, merging, or deleting branches becomes nearly as quick and easy as creating, merging, or deleting files. It also provides a local environment with full source management control that can be updated independently of the shared, public environment.
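To make the claim about branching concrete, here is a minimal sketch of creating, merging, and deleting a branch entirely in a local repository. It assumes Git is installed; the function name `branch_demo`, the file name `notes.txt`, the branch name `experiment`, and the placeholder identity are all made up for illustration and do not come from the book's labs.

```shell
# Hypothetical walk-through: create, merge, and delete a branch locally.
# Runs in a subshell (note the parentheses) so the caller's directory is untouched.
branch_demo() (
  demo_dir=$(mktemp -d) || exit 1      # throwaway directory for the sketch
  cd "$demo_dir" || exit 1
  git init -q . || exit 1

  # Placeholder identity, stored only in this repository's configuration.
  git config user.name "Example User"
  git config user.email "user@example.com"

  # Remember the default branch name (master or main, depending on setup).
  base=$(git symbolic-ref --short HEAD)

  echo "first line" > notes.txt
  git add notes.txt
  git commit -q -m "Add notes"

  git checkout -q -b experiment        # create a branch and switch to it
  echo "second line" >> notes.txt
  git commit -q -a -m "Extend notes"

  git checkout -q "$base"              # switch back to the original branch
  git merge -q experiment              # bring the branch's commit across
  git branch -q -d experiment          # delete the branch once it is merged

  git log --oneline                    # one line per commit now on the base branch

  cd .. && rm -rf "$demo_dir"          # clean up the throwaway repository
)

branch_demo
```

Every step here happens in the local environment; nothing is shared with anyone else until you choose to share it.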
Given that it is different from other source code management (SCM) systems, it's useful to understand how Git originated. The following section includes some of its history.
Git has its roots in the development environment for the Linux kernel. In the early 2000s, the team working on the kernel began using a proprietary distributed source control system called BitKeeper (sometimes abbreviated as BK). The team was initially allowed to use this system for free. Over time, differences of opinion developed around the use of BK to the point that the owner of that system revoked the free use of the product. At that time (in 2005), Linus Torvalds, the creator of Linux, set out to create a new system that maintained the distributed ideal, but also incorporated several additional concepts he had been working with. Perhaps most importantly, he wanted it to provide the fast performance that a project on the scope of the Linux kernel would need. Thus the motivation and ideas for what became Git came into being.
Development began in early April of 2005, and an initial release was ready by July. Originally, there was an idea of purposing Git as a toolkit that could have other systems implemented on top of it. However, over time, it has been made into a full-fledged SCM in its own right.
If you're wondering about the name, there are multiple definitions for the word Git, but all of them imply a negative connotation about a person. Git was given its name by its creator. Linus jokingly stated that he named all his projects after himself.
For those interested in learning more about this phase of Git development, detailed historical information is available on the Internet.
From these early beginnings, Git has grown to become an industry-standard tool. Of course, industry standard is a relative term. Nevertheless, based on nearly any criteria, Git fits. It is used across all levels of industry. Huge projects, such as the Linux kernel, are managed in it, and also mandate its use (see the following list). It is a key component of many continuous integration/continuous delivery pipelines. Demand for knowledge about it is ever increasing. Commercial and open-source projects and applications recognize that if they require source management services, they have to integrate with Git. Projects and companies using Git include
Microsoft
Netflix
O'Reilly
PostgreSQL
Android
Linux
Eclipse
As with any sufficiently successful open-source technology, an entire ecosystem has sprung up around Git. This point is worth discussing for a moment. The basic tool that is Git has given rise to a seemingly endless number of applications to further help users who want to work with it—most named with some wordplay based on git. If you start discussing Git with someone, you may hear such names as GitHub, Gitolite, Easy Git, Git Extensions, EGit, and so on. To the uninitiated, it can be challenging to understand how each one of these names relates to the original Git tooling. To help clarify some of the confusion, I'll give you an overview of how the different offerings are categorized.
Broadly, you can break down the Git-based offerings into a few categories: core Git, Git-hosting sites, self-hosting packages, ease-of-use packages, plug-ins, tools that incorporate Git, and Git libraries.
In the core Git category, you have the basic Git executables, configuration files, and repository management tooling that you can install and use through the command line interface. (These can be installed from https://git-scm.com/downloads.) In addition to the basic pieces, the distributions usually include some supporting tools such as a simple GUI (git gui), a history visualization tool (gitk), and in some cases, an alternate interface such as a Bash shell that runs on Windows. The distribution for Windows is now called Git for Windows. Similarly, there is a ported version of Git for OS X. This version can be installed directly from the git-scm.com site, installed via the Homebrew package manager, or built via the MacPorts application.
When installing on Linux systems, the recommended method is to use the preferred package manager for your distribution. Example commands are shown in the following list.
Debian/Ubuntu
$ apt-get install git
Fedora (up to 21)
$ yum install git
Fedora (22 and beyond)
$ dnf install git
FreeBSD
$ cd /usr/ports/devel/git
$ make install
Gentoo
$ emerge --ask --verbose dev-vcs/git
OpenBSD
$ pkg_add git
Solaris 11 Express
$ pkg install developer/versioning/git
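Whichever package manager you use, a quick check along the following lines can confirm that the install worked. This is only a sketch: `check_git` is a hypothetical helper name, and the version string that is printed will vary by platform and release.

```shell
# Hypothetical helper: report the installed Git version, or say that none was found.
check_git() {
  if command -v git >/dev/null 2>&1; then
    git --version                      # prints the installed version string
  else
    echo "git not found on PATH"
  fi
}

check_git
```

If the function reports that Git was not found, the usual causes are a failed install or a shell session that was started before the install completed.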
Git-hosting sites are websites that provide hosting services for Git repositories, both for personal and shared projects. Customers may be individuals, open-source collaborators, or businesses. Many open-source projects have their Git repositories hosted on these sites. In addition to the basic hosting services, these sites offer added value in the form of custom browsing features, easy web interfaces to Git commands, integrated bug tracking, and the ability to easily set up and share access among teams or groups of individuals.
These sites typically provide a workflow intended to allow users to contribute back to projects on the site. At a high level, this usually involves getting a copy of another user's repository, making changes in the copy, and then requesting that the original user review and incorporate the changes; this is sometimes known as the fork and pull model. (This model is explained in more detail in Chapter 13.)
For hosting, there is a pricing model that depends on the level of access, number of users, number of repositories, or features needed. For example, if a repository is intended to be public—with open access to anyone—it may be hosted for free. If access to a repository needs to be limited or it needs a higher level of service, then there may also be a charge. In addition, the hosting site may offer services such as consulting or training to generate revenue.
Examples of these types of sites include GitHub and Bitbucket. Figure 1.1 shows an example of a GitHub repository page.
Figure 1.1 Example GitHub page
Based on the success of the model and usage of the hosting sites, several packages have been developed to provide a similar functionality and experience for users and groups without having to rely on an external service. For some, this is their primary target market (GitLab), while others are stand-alone (also known as on-premise) versions of the popular web-hosting sites (such as GitHub Enterprise).
These packages are more palatable to businesses that don't want to host their code externally (on someone else's servers), but still want the collaborative features and control that are provided with the model. The cost structure usually depends on factors relating to the scale of use, such as the number of users or repositories. Figure 1.2 shows an example of a GitLab project screen.
Figure 1.2 GitLab project screen
The ease-of-use category encompasses applications that sit on top of the basic Git tooling with the intention of simplifying user interaction with Git. Typically, this means they provide graphical interfaces for working with repositories and may support GUI-based conventions such as drag-and-drop to move content between levels. In the same way, they often provide graphical tools for labor-intensive operations such as merging.
Examples include SourceTree, SmartGit, TortoiseGit, and Git Extensions. Typically, these packages are free for non-commercial use. You can see a more comprehensive list at https://git-scm.com/downloads/guis.
Figure 1.3 shows some examples of available packages.
Figure 1.3 Examples of GUIs available for Git (from git-scm.com)
One of the questions that frequently comes up when using Git is which stand-alone interface is best. There is no right answer here, but as a good default, the command line provides the most value for a number of reasons.
Although a large number and variety of GUIs are available to use with Git, there is no accepted standard. GUIs come and go, and vary highly in their degree of functionality, completeness, and utility. The command line is consistent and universally applicable.
Not all functionality is exposed through any one GUI for Git. However, all functionality available to users is exposed through the command line. If you need to do something that isn't available through a GUI, you can always drop back to the command line to accomplish it. In addition, Git includes man pages for all command line usage, so help is readily available for that interface.
If you understand the command line operations and options, it's generally easy to translate and map them to the corresponding items in a GUI.
Once you understand the basic command line operation, you'll have more insight into what you want and need to do with a GUI interface. You'll also be in a better position to choose one if desired.
As a side note, one of the main advantages of having a graphical interface with Git is having a graphical merge tool. Git also allows you to configure a third-party tool for merges from the command line interface. We'll explore configuring merge tools in Chapter 9.
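As a small preview, the following sketch records a merge tool choice in a throwaway repository. The tool name vimdiff is only an illustrative assumption (use whichever tool you have installed), and the setting is made at the repository level so nothing in your global configuration changes.

```shell
# Work in a temporary repository so no existing configuration is touched.
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# Record the preferred merge tool for this repository only.
# "vimdiff" is an example choice, not a recommendation.
git config merge.tool vimdiff

# Read the setting back to confirm what Git will use for "git mergetool".
configured_tool=$(git config --get merge.tool)
echo "merge tool: $configured_tool"
```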
Plug-ins are software components that add interfaces for working with Git to existing applications. Common plug-ins that users may deal with are those for popular IDEs such as Eclipse, IntelliJ, or Visual Studio, or those that integrate with workflow tools such as Jenkins or TeamCity. It is now becoming more common for applications to include a Git plug-in by default, or, in some cases, to just build it in directly.
Over the past few years, tooling has emerged that directly incorporates and uses Git as part of its model. One example is Gerrit, a tool designed primarily to do code reviews on changes targeted for Git remote repositories. At its core, Gerrit manages Git repositories and inserts itself into the Git workflow. It wraps Git repositories in a project structure with access controls, a code review workflow and tooling, and the ability to configure other validations and checks on the code. Figure 1.4 shows an example of a Gerrit screen.
Figure 1.4 Example Gerrit screen
For interfacing with some programming languages, developers have implemented libraries that wrap those languages or re-implement the Git functionality. One of the best-known examples of this is JGit. JGit is a Java library that re-implements Git and is used by a number of applications such as Gerrit (mentioned in the previous section). These implementations make interfacing with Git programmatically much more direct. However, there is sometimes a cost in terms of waiting, when new features or bug fixes that are implemented in the core Git tooling have to be re-implemented in these libraries.
Everyone has opinions, and anyone who's tried Git has an opinion about it. These usually vary from believing it's the greatest thing since sliced bread to wondering how they could ever effectively use it. In this section, you'll look at some of the advantages and challenges that Git offers (in no particular order). Granted, these lists are subjective, but themes in each area seem to consistently emerge.
Git is popular for many reasons. There are some things it just does better (faster, easier) than other source management systems and some things that it takes a totally different approach on. Learning about and leveraging the aspects outlined here will allow you to get the most out of this tool.
The Git model provides a local environment where you can work with a local copy of a server-side repository (this server-side repository is known as the remote in Git terminology). This copy resides within your workspace. When you are satisfied with your changes in this local repository, you then sync the local repository's contents up with the remote side.
All of the source management commands that you need to make changes can be run in this local environment. There's no need to access the remote repository until you're ready to sync content. Because of this, you do not need a connection to the remote repository to conduct source management. You just work against the local copy.
Because you can perform source management tasks in your local environment without needing a connection to the remote-server side, you can work disconnected from the remote and even disconnected from a network. This is what disconnected development means.
One important factor to keep in mind is that until you sync up with the remote, all of your changes and data are only in the local environment on your system. This is usually the local disk on your machine.
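To make the idea of disconnected development concrete, here is a minimal sketch that creates a repository, commits to it, and reads its history without any remote being defined. The directory, file name, and user identity are hypothetical values chosen for the demonstration.

```shell
# Create a throwaway local repository; no remote or network is involved.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"        # hypothetical identity for the demo
git config user.email "user@example.com"

# Make a change and record it entirely in the local environment.
echo "first draft" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"

# History is available locally even though no remote has been configured.
commit_count=$(git rev-list --count HEAD)
remote_count=$(git remote | wc -l | tr -d ' ')
echo "commits: $commit_count, remotes: $remote_count"
```

Until a remote is added and the content is pushed, this commit exists only on the local disk.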
Git stores a lot of information. (I'll describe its internal storage model in the next chapter.) However, it is efficient both in the way it stores content and in the way it retrieves it. Internally, Git packs together similar objects. Externally, it uses a good compression model to send significant amounts of data efficiently through a network. Of course, this network performance may be reduced by limiting factors such as network latency, but as a general rule, wait times for Git operations from the server are not a factor.
For changes in the local environment, Git is as fast as its commands can be executed on your disk. Because it only has to interact with a local repository (in most cases not going across a network connection), the performance is equivalent to operating system commands.
Another factor that aids Git's performance is that it is designed to manage multiple smaller repositories—rather than larger aggregate ones that may be present in traditional source control systems. For example, consider how you might store the source code for a large Java project. In a traditional source control management (SCM) system, you might have a single large Java repository with all of the source code in subdirectories for the different JARs. However, in Git you would typically have a separate repository for the source code for each JAR. This granularity contributes to the smaller amount of content that has to be moved around in Git, and thus to a faster operation.
Finally, branching is extremely fast in Git. I'll explain why in Chapter 8, but essentially, as fast as you can create a file on your OS, you can create a branch in Git. This means there is no more waiting for extended periods while the source management system branches your content. Deleting branches is just as quick. Merging is generally quick as well, assuming there are no conflicts.
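As a rough illustration of how lightweight these operations are, the sketch below creates a branch and deletes it again in a throwaway repository. The branch name experiment and the other names are hypothetical.

```shell
# Set up a temporary repository with a single commit to branch from.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"        # hypothetical identity for the demo
git config user.email "user@example.com"
echo "content" > file.txt
git add file.txt
git commit --quiet -m "Initial commit"

# Creating a branch just records a new name for the current commit.
git branch experiment
branch_tip=$(git rev-parse experiment)
head_tip=$(git rev-parse HEAD)

# Deleting the branch removes the name; the commit itself is untouched.
git branch -d experiment >/dev/null
remaining=$(git branch --list experiment)
echo "branch pointed at $branch_tip"
```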
Learning to use Git requires a paradigm shift, and you have to understand Git before it will seem easy to use. However, once you grasp the concepts and start to use this tool regularly, it becomes both easy to use and powerful. There are simple default forms of commands and options. As your proficiency grows, there are extended forms that can allow you to do nearly anything you need to do with your content. In addition, almost everything about Git settings is configurable so that you can customize your working environment. (Git configuration is discussed in detail in Chapter 4.)
The primary mistake that most new Git users make is trying to use it in the same way that they've always used their traditional source management system. Usually this means that they are trying to map commands and workflow concepts from the previous system to Git's commands. However, trying to adhere too strictly to this approach with Git will actually make the learning curve steeper. A better approach is to consider what sort of source management outcome is needed (files in the repository, viewing history, and so on), and then take the time to learn how that workflow is done with Git. (The Connected Labs included throughout this book will aid this process significantly by providing hands-on experience with Git.)
The strange-looking name SHA1 is an abbreviation for Secure Hash Algorithm 1. In short, it's a checksum. (Its design is related to that of MD5, if you're familiar with that.) Git computes SHA1s internally as keys for everything it stores in its repositories. This means that every change in Git has a unique identifier and that it's not possible to change content that Git manages without Git knowing about it, because the checksum would change. In Git, SHA1s represent a direct way to identify and specify the exact change that you want to work with.
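You can see this behavior for yourself with the git hash-object command, which computes the identifier Git would assign to a piece of content. The following is a minimal sketch; the sample strings and variable names are arbitrary.

```shell
# Compute the SHA1 that Git would use as the key for this content.
# Nothing is stored; --stdin simply hashes what is piped in.
id_one=$(echo 'test content' | git hash-object --stdin)

# Changing even one character of the content produces a different SHA1.
id_two=$(echo 'test content!' | git hash-object --stdin)

# Hashing the same content again yields the same SHA1.
id_again=$(echo 'test content' | git hash-object --stdin)

echo "$id_one"
echo "$id_two"
```

The same content always maps to the same identifier, which is what lets Git detect any modification.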
One aspect of Git that is different from most other source management systems is the ability to rewrite or redo previous versions of content stored in the repository—that is, history. Git provides functionality that allows you to traverse previous versions, edit and update them, and place the updated versions back in the same sequence of changes stored in the repository. This is a powerful feature of the tool, but it can also be dangerous (see the section, “The Challenges: Ability to Rewrite History,” later in this chapter).
When content that you're working on in your local environment hasn't yet been synched to the remote side, this is a safe operation. And when you need it, it can be very beneficial. For example, consider a case where you forget to include a file with a change, or even just need to do something as simple as modify the message associated with the change. Git provides an amend option that allows you to update or replace the last change made in the local repository.
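The forgotten-file case can be sketched as follows. This assumes a throwaway repository where nothing has been pushed; the file names and commit messages are hypothetical.

```shell
# Set up a temporary repository.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"        # hypothetical identity for the demo
git config user.email "user@example.com"

# First attempt: helper.txt should have been part of this change but was forgotten.
echo "main code" > app.txt
git add app.txt
git commit --quiet -m "Add app and helper"

# Stage the forgotten file and replace the last commit, fixing the message too.
echo "helper code" > helper.txt
git add helper.txt
git commit --quiet --amend -m "Add app and helper files"

# There is still only one commit, and it now contains both files.
commit_count=$(git rev-list --count HEAD)
file_count=$(git ls-tree --name-only HEAD | wc -l | tr -d ' ')
last_message=$(git log -1 --format=%s)
echo "$commit_count commit, $file_count files: $last_message"
```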
Additional functionality makes it possible to take selected changes from one branch and incorporate them directly into the line of changes in another branch. Beyond that are levels of functionality for doing editing throughout the history of one or more branches. An example case would be removing a hard-coded password that was accidentally introduced into the history months ago from all affected versions.
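Taking a selected change from one branch into another is what the cherry-pick command does. Here is a minimal sketch under the assumption of a small throwaway repository; the branch name feature and the file names are hypothetical.

```shell
# Set up a temporary repository with one commit on the starting branch.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"        # hypothetical identity for the demo
git config user.email "user@example.com"
echo "base" > base.txt
git add base.txt
git commit --quiet -m "Base commit"
start_branch=$(git rev-parse --abbrev-ref HEAD)

# Make a change on a separate branch and remember its SHA1.
git checkout --quiet -b feature
echo "fix" > fix.txt
git add fix.txt
git commit --quiet -m "Add fix"
fix_commit=$(git rev-parse HEAD)

# Return to the starting branch and apply just that one change there.
git checkout --quiet "$start_branch"
git cherry-pick "$fix_commit" >/dev/null
picked_subject=$(git log -1 --format=%s)
echo "now on $start_branch: $picked_subject"
```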
Git includes an intermediate level between the directory where content is created and edited, and the repository where content is committed. New users typically don't see this extra level as a positive, due to the perceived inconvenience of having to move content through another level. However, it does provide a separate area for use in some of Git's advanced operations, such as the amend option discussed previously. It also simplifies some status tracking. I'll cover the staging area in detail in Chapter 3.
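One practical effect of this intermediate level is that you choose exactly what goes into a commit. The sketch below, with hypothetical file names, stages only one of two new files and then commits.

```shell
# Set up a temporary repository.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"        # hypothetical identity for the demo
git config user.email "user@example.com"

# Create two files, but move only one into the staging area.
echo "ready" > staged.txt
echo "not ready" > unstaged.txt
git add staged.txt

# The short status shows one staged file and one file Git is not yet tracking.
git status --short

# Only staged content is committed; unstaged.txt stays in the working directory.
git commit --quiet -m "Commit only the staged file"
committed_files=$(git ls-tree --name-only HEAD)
echo "committed: $committed_files"
```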
Using branches is a core concept of Git. Earlier, I mentioned the speed with which users can create, delete, and manipulate branches. However, beyond that, Git provides capabilities for changing branch points and reproducing changes from one branch onto another branch—a feature referred to as rebasing. This ease in working with and manipulating branches forms the basis for a development model with Git. In this model, branches are managed as easily as files are in some other systems. Later in the book, I devote entire chapters to branching concepts.
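As a brief taste of rebasing, the following sketch moves a branch so that it starts from the latest commit on the branch it came from. It assumes a throwaway repository with changes in different files (so no conflicts arise); all names are hypothetical.

```shell
# Set up a temporary repository with a base commit.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"        # hypothetical identity for the demo
git config user.email "user@example.com"
echo "base" > base.txt
git add base.txt
git commit --quiet -m "Base"
trunk=$(git rev-parse --abbrev-ref HEAD)

# Do some work on a feature branch.
git checkout --quiet -b feature
echo "feature" > feature.txt
git add feature.txt
git commit --quiet -m "Feature work"

# Meanwhile, the original branch moves ahead.
git checkout --quiet "$trunk"
echo "trunk" > trunk.txt
git add trunk.txt
git commit --quiet -m "Trunk work"

# Reproduce the feature changes on top of the updated original branch.
git checkout --quiet feature
git rebase --quiet "$trunk"

# The feature branch now branches off from the tip of the original branch.
fork_point=$(git merge-base feature "$trunk")
trunk_head=$(git rev-parse "$trunk")
echo "feature now starts at $fork_point"
```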
It is rare these days for source management users to only be concerned with one release of content. Even when products are managed via a continuous delivery process, in a user's local environment, there are typically multiple changes underway, for new features, bug fixes, and so on. Traditionally, the best way to develop these multiple changes in parallel has been in separate workspaces, and, depending on the scope and ease of use of the source management application, in separate branches. With legacy SCM systems, maintaining these multiple workspaces, switching contexts between them, and ensuring they are up to date with the correct source code is a multi-step process that requires tracking and coordination by the user.
In Git, this is a single-step process managed by Git. Git allows you to work in one workspace for a repository, regardless of how many branches you may have or need to use. It manages updating the content in the workspace to ensure it is consistent with whichever branch is active. You never need to leave that workspace. Also, while working in one branch, you still have the expected access to view, merge, or create other branches.
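The single-workspace behavior can be sketched like this: the same directory shows different content depending on which branch is active. The branch and file names are hypothetical.

```shell
# Set up a temporary repository with a base commit.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"        # hypothetical identity for the demo
git config user.email "user@example.com"
echo "base" > base.txt
git add base.txt
git commit --quiet -m "Base"
start_branch=$(git rev-parse --abbrev-ref HEAD)

# Add a file that exists only on a feature branch.
git checkout --quiet -b feature
echo "feature" > feature.txt
git add feature.txt
git commit --quiet -m "Feature work"

# Switch back: Git updates the workspace, so the feature file disappears.
git checkout --quiet "$start_branch"
on_start=$([ -f feature.txt ] && echo present || echo absent)

# Switch again: the file is restored, all within the same directory.
git checkout --quiet feature
on_feature=$([ -f feature.txt ] && echo present || echo absent)
echo "on $start_branch: $on_start, on feature: $on_feature"
```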
If you do find yourself needing to work in multiple branches at the same time, recent versions of Git have introduced a new feature to support this—worktrees (otherwise known as working trees). Worktrees provide a way to have and use multiple working directories with different branches (at the same time) all tied back to the same local Git repository.
I discuss worktrees in detail in Chapter 14.
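In the meantime, here is a minimal sketch of the idea, assuming a version of Git recent enough to include the worktree command. The directory names main-tree and hotfix-tree and the branch name hotfix are hypothetical.

```shell
# Set up a temporary repository with one commit.
base=$(mktemp -d)
git init --quiet "$base/main-tree"
cd "$base/main-tree"
git config user.name "Example User"        # hypothetical identity for the demo
git config user.email "user@example.com"
echo "base" > base.txt
git add base.txt
git commit --quiet -m "Base"

# Add a second working directory with its own new branch,
# backed by the same local repository.
git worktree add -b hotfix "$base/hotfix-tree" >/dev/null 2>&1

# Each working directory has a different branch checked out at the same time.
hotfix_branch=$(git -C "$base/hotfix-tree" rev-parse --abbrev-ref HEAD)
tree_count=$(git worktree list | wc -l | tr -d ' ')
echo "$tree_count working trees; second one is on $hotfix_branch"
```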
Now, to balance out the picture, let's look at a few of the things about Git that can be challenging—especially for new users. I'll have more to say about this topic, including what to watch out for, and strategies for effectively dealing with these challenges, throughout the book.
Going from a more traditional, centralized version control system to a distributed version control system such as Git requires a change in how you think about your source management workflow. Git implements a local environment with multiple levels in addition to a separate remote repository. It also operates with units that map more closely to directory tree structures than to individual files. This leads to considerations when creating and working in Git repositories, in terms of size and scope, that you don't usually worry about with centralized systems.
In most traditional source control systems, there are one or two commands for getting content out (checkout) and one or two for putting content in (check-in, commit), with options for modifying their behavior to work in different ways if needed.
With Git, there are different commands for moving content between the different layers, and these commands must be used in a particular sequence. This isn't really an issue after you've been working with Git for a while, and actually is clearer when talking about the workflow. However, it can be a little confusing to new users.
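The usual sequence can be sketched as follows. To keep the example self-contained, a local bare repository stands in for the server-side remote; that stand-in, along with the paths and file names, is an assumption made purely for illustration.

```shell
# Create a stand-in "remote" and a local repository connected to it.
work=$(mktemp -d)
git init --quiet --bare "$work/remote.git"
git init --quiet "$work/local"
cd "$work/local"
git config user.name "Example User"        # hypothetical identity for the demo
git config user.email "user@example.com"
git remote add origin "$work/remote.git"

echo "hello" > readme.txt                  # 1. edit in the working directory
git add readme.txt                         # 2. working directory -> staging area
git commit --quiet -m "Add readme"         # 3. staging area -> local repository
branch=$(git rev-parse --abbrev-ref HEAD)
git push --quiet origin "$branch"          # 4. local repository -> remote

# After the push, the remote's branch points at the same commit as the local one.
local_head=$(git rev-parse HEAD)
remote_head=$(git --git-dir="$work/remote.git" rev-parse "$branch")
echo "local:  $local_head"
echo "remote: $remote_head"
```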
As previously mentioned, Git includes a staging level. This is an intermediate area that new code has to travel through on its way to the local repository. This will seem cumbersome at first, because content must flow through it, even in some situations where it doesn't appear to add value. However, once you are comfortable with it, it will allow you to work with a power and flexibility that you haven't experienced previously.
All of the things I'm talking about as advantages and challenges contribute to the power of Git—as well as the learning curve. As I alluded to previously, one of the fundamental mistakes that new Git users make is trying to map too many concepts and workflows that they've used in the past with other systems, too closely to Git concepts and workflows. They often expect a one-to-one fit, just with different names. The basic principles of source management still apply—tracking changes, putting code in, getting code out, and so on. However, Git adds layers of flexibility and power on top of those principles, at the cost of requiring you to think differently about the units and stages of source control.
This requires a learning curve and a willingness to accept some features and requirements as useful, even if they don't immediately appear so. It's one of those situations where a feature won't seem beneficial until it is. As you continue to use the tool, it's a pleasant experience when you encounter those situations where you need to do X, you wonder if Git can do X, and you discover (in most cases) it can. Of course, there's also a learning curve with figuring out the exact invocation, and implications, of doing X.
Part of the mind shift comes early on in thinking about what should be in your Git repositories and branches. Just converting existing repositories one-to-one from another source management system is seldom the best approach. This is due to the way that Git manages scope in terms of changes and repositories. I'll discuss more about this as you learn more about Git.
Finally, it's worth pointing out that Git offers a built-in way to learn and explore the tool and workflow as you're going through this mind shift and learning curve—the local environment. I'll talk more about this in the next couple of chapters, but for now, know that you have the ability to make any source management changes (and mistakes) you need to in your local environment before you ever push them over to the remote environment, where others can see or access them.
Most source management systems do not have strong support for binary files, and Git is no exception. There are two aspects of dealing with binary files that are challenging here: internal format and size.
Because of the internal format of these types of files where the bits rather than the characters are what is important, standard source management operations can be difficult to apply or may not make sense at all. An example of the former would be diffing. An example of the latter would be managing line endings. If the SCM does not recognize or understand that a particular file is binary and tries to execute these types of operations against it, the results can be confusing and problematic.
The size of binary files can routinely be much larger than text ones. Very large binary files can pose a challenge for a system like Git since they usually cannot be compressed very much, and so can require more time and space to manage, leading to extended operation times when the system has to pass around these files, such as when copying to a local system.
Of course, larger text files can also pose size challenges, but with text files, the ability to compute differences between versions and more compressibility can work better with Git's internal strategies for efficiently storing and serving these files.
Git has built-in mechanisms for identifying files as binary. However, it is also possible (and a best practice) to use one of its supporting files—the Git Attributes file—to explicitly identify which types of files are binary. Git Attribute files are covered in detail in Chapter 10.
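As a small preview, the sketch below marks two file types as binary in a Git Attributes file and then asks Git how it will treat particular paths. The choice of *.png and *.jar, and the example file names, are assumptions for illustration only.

```shell
# Set up a temporary repository.
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# Explicitly identify these file types as binary.
printf '%s\n' '*.png binary' '*.jar binary' > .gitattributes

# Ask Git which paths carry the binary attribute.
# The files do not need to exist for the check to work.
png_attr=$(git check-attr binary -- logo.png)
txt_attr=$(git check-attr binary -- notes.txt)
echo "$png_attr"
echo "$txt_attr"
```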
The challenges with large binary files for source management in general have led to the development of several separate applications to help. Artifact repositories, such as Artifactory and Nexus, are targeted specifically at storing and managing revisions of binary files. And the Git community itself has created various applications targeted at helping with this. Currently, the best-known one is probably Git LFS (Git Large File Storage), a solution from the Git hosting site, GitHub. This application stores large files in a separate repository and stores text pointers in the traditional Git repository to those large files.
As referenced in the previous section on SHA1s, Git creates checksums (SHA1s) for everything that it stores. From one perspective, the overall SHA1 value for a change can function like the version number in most other source control systems. However, unlike traditional version or revision numbers, these are not short, easily remembered identifiers. SHA1s are actually 40-character hexadecimal strings. So, from a user perspective, SHA1s are not as convenient to remember, find, or communicate about. Typing one also requires some care.
Fortunately, in any Git instance, you only need to use enough of the characters from any SHA1 to uniquely identify that SHA1 from any other—usually the first seven characters. You can also use other references, such as tags or branch names, to indicate revisions where appropriate.
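The following sketch shows an abbreviated SHA1 being resolved back to the full value. It assumes a small throwaway repository, where a short prefix is unambiguous; the names used are hypothetical.

```shell
# Set up a temporary repository with one commit.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"        # hypothetical identity for the demo
git config user.email "user@example.com"
echo "content" > file.txt
git add file.txt
git commit --quiet -m "Initial commit"

# The full 40-character SHA1 and the abbreviated form Git suggests for it.
full_sha=$(git rev-parse HEAD)
short_sha=$(git rev-parse --short HEAD)

# Git accepts the abbreviated form anywhere a revision is expected.
resolved_sha=$(git rev-parse "$short_sha")
echo "full:  $full_sha"
echo "short: $short_sha"
```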
While talking about the Git model, I mentioned that Git thinks in units that map more closely to directory structures than to individual files. This difference in granularity provides advantages in managing and manipulating changes in source control. However, it can also create disadvantages in merge situations where there are conflicts. Simply put, any two changes by different users within the scope of a commit can be a conflict, even if they are in entirely different files or directories. As a result, the more people that are making changes within the scope of a repository, the more likely they are to encounter merge conflicts when trying to get their updates in. This is a factor to consider when planning how to structure your Git repositories.
Git's ability to rewrite history falls into both categories. On the challenging side of the scale is the potential impact that uncoordinated use can have on other users. Suppose that multiple users have obtained content from a remote (shared) Git repository. One user decides to perform an operation that changes the revision history. Changing the history results in new internal checksums (SHA1s) for changes in the repository, starting at whatever points the revisions were made. Once the updates are put back on the remote side, any other users that need to merge in updates will have to deal not only with the newest content, but also with the changes to the revisions in the history made by the other user. At best, this can be surprising. At worst, it can be very time-consuming and resource-intensive, because it requires them to incorporate all of the changes.
As a highly recommended guideline, changes that alter history should only be made in a user's local environment before the affected revisions are pushed across to the remote side. If there is a critical need to change revisions in the history of a repository after it has been made available on the remote side, then there is a recommended approach: other users should be informed in advance, and given a chance to get their changes in before the changes to the history are made. After the changes are completed, they can get a fresh copy to work with locally. This will allow them to avoid potentially difficult merge situations.
