38,39 €
Delphi is a cross-platform Integrated Development Environment (IDE) that supports rapid application development for Microsoft Windows, Apple Mac OS X, Google Android, iOS, and now Linux with RAD Studio 10.2. This book will be your guide to build efficient high performance applications with Delphi.
The book begins by explaining how to find performance bottlenecks and apply the correct algorithm to fix them. It will teach you how to improve your algorithms before taking you through parallel programming. You’ll then explore various tools to build highly concurrent applications.
After that, you’ll delve into improving the performance of your code and master cross-platform RTL improvements. Finally, we’ll go through memory management with Delphi and you’ll see how to leverage several external libraries to write better performing programs.
By the end of the book, you’ll have the knowledge to create high performance applications with Delphi.
Das E-Book können Sie in Legimi-Apps oder einer beliebigen App lesen, die das folgende Format unterstützen:
Seitenzahl: 439
Veröffentlichungsjahr: 2018
Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Merint MathewAcquisition Editor: Nitin DasanContent Development Editor: Nikhil BorkarTechnical Editor: Jijo MaliyekalCopy Editor: Safis EditingProject Coordinator: Ulhas KambaliProofreader: Safis EditingIndexer: Mariammal ChettiyarGraphics: Tania DuttaProduction Coordinator: Deepika Naik
First published: February 2018
Production reference: 1230218
Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.
ISBN 978-1-78862-545-6
www.packtpub.com
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Primož Gabrijelčič started coding in Pascal on 8-bit micros in the 1980s and he never looked back. In the last 20 years, he was mostly programming high-availability server applications used in the broadcasting industry. A result of this focus was the open sourced parallel programming library for Delphi—OmniThreadLibrary. He's also an avid writer and has written several hundred articles, and he is a frequent speaker at Delphi conferences where he likes to talk about complicated topics, ranging from memory management to creating custom compilers.
Stefan Glienke, since his very first steps with Turbo Pascal in school, was passionate about programming. In his spare time, he worked at a software company using Visual Basic and started learning C++ and Delphi and decided to start an apprenticeship as a software developer after graduation.
Over the years, he developed an interest in software architecture and began helping fellow developers improve their skills, participating in several open source projects, such as Spring4D. He is also the author of TestInsight.
Stefan has worked as reviewer on the books Coding in Delphi and Dependency Injection in Delphi.
Bruce McGee is the founder of Glooscap Software, a software development and consulting company in Toronto, Ontario. He is also a long-time Delphi user, which he continues to work with every day.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
Delphi High Performance
Packt Upsell
Why subscribe?
PacktPub.com
Contributors
About the author
About the reviewers
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Conventions used
Get in touch
Reviews
About Performance
What is performance?
Different types of speed
Algorithm complexity
Big O and Delphi data structures
Data structures in practice
Mr. Smith's first program
Looking at code through the Big O eyes
Don't guess, measure!
Profiling with TStopwatch
Profilers
AsmProfiler
Sampling Profiler
AQTime
Nexus Quality Suite
Summary
Fixing the Algorithm
Responsive user interfaces
Updating a progress bar
Bulk updates
Virtual display
Caching
Dynamic cache
Speeding up SlowCode
Summary
Fine-Tuning the Code
Delphi compiler settings
Code inlining control
Optimization
Record field alignment
Assertions
Overflow checking
Range checking
Extracting common expressions
The helpful CPU window
Behind the scenes
A plethora of types
Simple types
Strings
Arrays
Records
Classes
Interfaces
Optimizing method calls
Parameter passing
Method inlining
The magic of pointers
Going the assembler way
Returning to SlowCode
Summary
Memory Management
Optimizing strings and array allocations
Memory management functions
Dynamic record allocation
FastMM internals
Memory allocation in a parallel world
Replacing the default memory manager
ScaleMM
TBBMalloc
Fine-tuning SlowCode
Summary
Getting Started with the Parallel World
Processes and threads
When to parallelize the code?
Most common problems
Never access UI from a background thread
Simultaneous reading and writing
Sharing a variable
Synchronization
Critical sections
Other locking mechanisms
A short note on coding style
Shared data with built-in locking
Interlocked operations
Object life cycle
Communication
Windows messages
Synchronize and Queue
Polling
Performance
Third-party libraries
Summary
Working with Parallel Tools
TThread
Advanced TThread
Setting up a communication channel
Sending messages from a thread
Implementing a timer
Summary
Exploring Parallel Practices
Tasks and patterns
Variable capturing
Tasks
Exceptions in tasks
Parallelizing a loop
Thread pooling
Async/Await
Join
Join/Await
Future
Parallel for
Pipelines
Creating the pipeline
Stages
Displaying the result and shutting down
Summary
Using External Libraries
Using object files
Object file formats
Object file linking in practice
Using C++ libraries
Using a proxy DLL in Delphi
Summary
Best Practices
About performance
Fixing the algorithm
Fine-tuning the code
Memory management
Getting started with the parallel world
Working with parallel tools
Exploring parallel practices
Using external libraries
Final words
Other Books You May Enjoy
Leave a review - let other readers know what you think
Performance matters!
I started programming on 8-bit micros, and boy, was that an interesting time! Memory was typically not a problem as we didn't write big programs, but they certainly weren't running fast, especially if you run them with a built-in BASIC interpreter. It is not surprising that I quickly learned assembler and spent lots of early years shifting bits and registers around. So did almost everybody else who wanted to release a commercial application written for one of those computers. There were, more or less, no games and applications written in BASIC simply because they would run too slow and nobody would use them.
Time has changed; computers are now fast—incredibly fast! If you don't believe me, check the code examples for this book. A lot of times, I had to write loops that spin over many million iterations so that the result of changing the code would be noticed at all. The raw speed of processors has also changed the software development world. Low-level languages such as assembler and C mostly gave way to more abstract approaches—C#, C++, Delphi, F#, Java, Python, Ruby, JavaScript, Go, and so on. The choice is yours. Almost anything you write in these languages runs fast or at least fast enough.
Computers are so fast that we sometimes forget the basic rule—performance matters. Customers like programs that operate so fast that they don't have to think about it. If they have to wait 10 seconds for a form to appear after clicking on a button, they won't be very happy. They'll probably still use the software, though, provided that it works for them and doesn't crash. On the other hand, if you write a data processing application that needs 26 hours for a job that executes daily, you'll certainly lose them.
I'm not saying that you should switch to assembler. Low-level languages are fast, but coding in them is too slow for modern times, and the probability of introducing bugs is just too high. High-level languages are just fine, but you have to know how to use them. You have to know what is fast and what not and—preferably—you should take this into account when designing the code.
This book will walk you through the different approaches that will help you write better code. Writing fast code is not the same as optimizing a few lines of your program to the extreme. Most of the time, that is in fact the completely wrong approach! However, I'm getting ahead of myself. Let the book speak for itself.
This book was written for all Delphi programmers out there. You will find something interesting inside, whether you are new to programming or a seasoned old soul. I'm talking about basic stuff, about strings and arrays, lists and objects, but I'm also discussing parallel programming, memory manager internals, and object linking. There is also plenty of dictionaries, pointers, algorithmic complexities, code inlining, parameter passing, and what not.
So, whoever you are, dear reader, I'm pretty sure you'll find something new in this book. Enjoy!
Chapter 1, About Performance, talks about performance. We'll dissect the term itself and try to find out what users actually mean when they say that a program is performing (or not performing) well. Then, we will move into the area of algorithm complexity. We'll skip all the boring mathematics and just mention the parts relevant to programming. We will also look at different ways of finding the slow (non-performant) parts of the program, from pure guesswork to measuring tools of a different sophistication, homemade and commercial.
Chapter 2, Fixing the Algorithm, examines a few practical examples where changing an algorithm can speed up a program dramatically. In the first part, we'll look at graphical user interfaces and what we can do when a simple update to TListBox takes too long. The second part of the chapter explores the idea of caching and presents a reusable caching class with very fast implementation. In the last part, we'll revisit some code from Chapter 1, About Performance, and make it faster, again, by changing an algorithm.
Chapter 3, Fine-Tuning the Code, deals with lots of small things. Sometimes, performance lies in many small details, and this chapter shows how to use them to your advantage. We'll check the Delphi compiler settings and see which ones affect the code speed. We'll look at the implementation details for built-in data types and method calls. Using a correct type in a right way can mean a lot. Of course, we won't forget about the practical side. This chapter will give examples of different optimization techniques, such as extracting common expressions, using pointers to manipulate data, and implementing parts of the solution in assembler. At the end, we'll revisit the code from Chapter 1, About Performance, and make it even faster.
Chapter 4, Memory Management, is all about memory. It starts with a discussion on strings, arrays, and how their memory is managed. After that, we will move to the memory functions exposed by Delphi. We'll see how we can use them to manage memory. Next, we'll cover records—how to allocate them, how to initialize them, and how to create useful dynamically-allocated generic records. We'll then move into the murky waters of memory manager implementation. I'll sketch a very rough overview of FastMM, the default memory manager in Delphi. First, I'll explain why FastMM is excellent and then I'll show when and why it may slow you down. We'll see how to analyze memory performance problems and how to switch the memory manager for a different one. In the last part, we'll revisit the SlowCode program and reduce the number of memory allocations it makes.
Chapter 5, Getting Started with the Parallel World, moves the topic to parallel programming. In the introduction, I'll talk about processes and threads, and multithreading and multitasking to establish some common ground for discussion. After that, you'll start learning what not to do when writing parallel code. I'll explain how the user interface must be handled from background threads and what problems are caused by sharing data between threads. Then, I'll start fixing those problems by implementing various kinds of synchronization mechanisms and interlocked operations. We'll also deal with the biggest problem synchronization brings to the code—deadlocking. As synchronization inevitably slows the program down, I'll explain how to achieve the highest possible speed using data duplication, aggregation, and communication. At the end, I'll introduce two third-party libraries that contain helpful parallel functions and data structures.
Chapter 6, Working with Parallel Tools, focuses on a single topic, Delphi's TThread class. In the introduction, I'll explain why I believe that TThread is still important even in this modern age. I will explore different ways in which TThread based threads can be managed in your code. After that, I'll go through the most important TThread methods and properties and explain what they're good for. In the second part of the chapter, I'll extend TThread into something more modern and easier to use. Firstly, I'll add a communication channel so that you'll be able to send messages to the thread. After that, I'll implement a derived class designed to handle one specific usage pattern and show how this approach simplifies writing parallel code to the extreme.
Chapter 7, Exploring Parallel Practices, moves the multithreaded programming to more abstract terms. In this chapter, I'll discuss modern multithreading concepts: tasks and patterns. I'll look into Delphi's own implementation, Parallel Programming Library, and demonstrate the use of TTask/ITask. We'll look at topics such as task management, exception handling, and thread pooling. After that, I'll move on to patterns and talk about all Parallel Programming Library patterns: Join, Future, and Parallel For. I will also introduce two custom patterns—Async/Await and Join/Await—and finish the chapter with a discussion on the Pipeline pattern from OmniThreadLibrary.Chapter 8, Using External Libraries, admits that sometimes Delphi is not enough. Sometimes the problem is too complicated to be efficiently solved by a human. Sometimes Pascal is just lacking the speed. In such cases, we can try finding an existing library that solves our problem. In most cases, it will not support Delphi directly but will provide some kind of C or C++ interface. This chapter looks into linking with C object files and describes typical problems that you'll encounter on the way. In the second half, I'll present a complete example of linking to a C++ library, from writing a proxy DLL to using it in Delphi.
Chapter 9, Best Practices, wraps it all up. In this last chapter, I'll revisit all the important topics I explored in previous chapters. At the same time, I'll drop in some additional tips, tricks, and techniques.
Although you can read this book in bed or on the beach, you will need a computer and Delphi to play with the code examples. The code was written in Delphi 10.2 Tokyo, but it should also work without a problem in the older versions. I did use some modern features in demos—and dedicated a chapter to Parallel Programming Library that was introduced in Delphi XE7—so anything older than that is hit and miss.
This book does not refer to any functionality specific to the Enterprise edition. You'll be able to test all the code with the entry-level professional edition.
You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
Log in or register at
www.packtpub.com
.
Select the
SUPPORT
tab.
Click on
Code Downloads & Errata
.
Enter the name of the book in the
Search
box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Delphi-High-Performance/. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "A string parameter value is present in a string list."
A block of code is set as follows:
function IsPresentInList(strings: TStrings; const value: string): Boolean;var i: Integer;begin Result := False; for i := 0 to strings.Count - 1 do if SameText(strings[i], value) then Exit(True);end;
Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Go to Options | Options, then select General | Search directory."
Feedback from our readers is always welcome.
General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packtpub.com.
"My program is not fast enough. Users are saying that it is not performing well. What can I do?"
These are the words I hear a lot when consulting on different programming projects. Sometimes the answer is simple, sometimes hard, but almost always the critical part of the answer lies in the question. More specifically, in one word - performing.
What do we mean when we say that a program is performing well? Actually, nobody cares. What we have to know is what users mean when they say that the program is not performing well. And users, you'll probably admit, look at the world in a very different way than we programmers.
Before starting to measure and improve the performance of a program, we have to find out what users really mean by the word performance. Only then can we do something productive about it.
We will cover the following topics in this chapter:
What is performance?
What do we mean when we say that a program performs well?
What can we tell about the code speed by looking at the algorithm?
How does the knowledge of compiler internals help us write fast programs?
Why is it better to measure than to guess?
What tools can we use to find the slow parts of a program?
To better understand what we mean when we say that a program is performing well, let's take a look at a user story. In this book, we will use a fictitious person, namely Mr. Smith, Chief of Antarctica Department of Forestry. Mr. Smith is stationed in McMurdo Base, Antarctica, and he doesn't have much real work to do. He has already mapped all the forests in the vicinity of the station and half of the year it is too dark to be walking around and counting trees, anyway. That's why he spends most of his time behind a computer. And that's also why he is very grumpy when his programs are not performing well.
Some days he writes long documents analyzing the state of forests in Antarctica. When he is doing that, he wants the document editor to perform well. By that he actually means that the editor should work fast enough so that he doesn't feel any delay (or lag, as we call the delay when dealing with user input) while typing, preparing graphs, formatting tables, and so on.
In this scenario, performance simply means working fast enough and nothing else. If we speed up the operation of the document editor by a factor of two, or even by a factor of ten, that would make no noticeable improvement for our Mr. Smith. The document editor would simply stay fast enough as far as he is concerned.
The situation completely changes when he is querying a large database of all of the forests on Earth and comparing the situation across the world to the local specifics of Antarctica. He doesn't like to wait and he wants each database query to complete in as short a time as possible. In this case, performance translates to speed. We will make Mr. Smith a happier person if we find a way to speed up his database searches by a factor a ten. Or even a factor of five; or two. He will be happy with any speedup and he'd praise us up to the heavens.
After all this hard work, Mr. Smith likes to play a game. While the computer is thinking about a next move, a video call comes in. Mr. Smith knows he's in for a long chat and he starts resizing the game window so that it will share the screen with a video call application. But the game is thinking hard and is not processing user input and poor Mr. Smith is unable to resize it, which makes him unhappy.
In this example, Mr. Smith simply expects that the application's user interface will respond to his commands. He doesn't care if the application takes some time to find the next move, as long as he can do with the application what he wants to. In other words, he wants a user interface that doesn't block.
It is obvious from the previous example that we don't always mean the same thing when we talk about a program's speed. There is a real speed, as in the database example, and there is a perceived speed, hinted at in the document editor and game scenario. Sometimes we don't need to improve the program speed at all. We just have to make it not stutter while working (by making the user interface responsive at all times) and users will be happy.
We will deal with two types of performance in this book:
Programs that react quickly to user input
Programs that perform computations quickly
As you'll see, the techniques to achieve the former and the latter are somehow different. To make a program react quickly, we can sometimes just put a long operation (as was the calculation of the next move in the fictitious game) into a background thread. The code will still run as long as in the original version but the user interface won't be blocked and everybody will be happy.
To speed up a program (which can also help with a slowly-reacting user interface), we can use different techniques, from changing the algorithm to changing the code so that it will use more than one CPU at once to using a hand-optimized version, either written by us or imported from an external library.
To do anything, we have to know which part of the code is causing a problem. If we are dealing with a big legacy program, problematic part may be hard to find. In the rest of this chapter, we will look at different ways to locate such code. We'll start by taking an educated guess and then we'll improve that by measuring the code speed, first by using home-grown tools and then with a few different open source and commercial programs.
Before we start with the dirty (and fun) job of improving program speed, I'd like to present a bit of computer science theory, namely the Big O notation.
You don't have to worry, I will not use pages of mathematical formulas and talk about infinitesimal asymptotics. Instead, I will just present the essence of the Big O notation, the parts that are important to every programmer.
In the literature and, of course, on the web, you will see expressions such as O(n), O(n^2), O(1) and similar. This fancy-looking notation hides a really simple story. It tells us how much slower the algorithm will become if we increase the data size by a factor of n.
Let's say we have an algorithm with complexity of O(n), which on average takes T seconds to process input data of size N. If we increase the size of the data by a factor of 10 (to 10*N), then the algorithm will (on average) also use 10 times more time (that is, 10*T) to process the data. If we process 1,000 times more data, the program will also run 1,000 times slower.
If the algorithm complexity is O(n2), increasing the size of the data by a factor of 10 will cause the algorithm to run 102 or 100 times longer. If we want to process 1,000 times more data, then the algorithm will take 1,0002 or a million times longer, which is quite a hit. Such algorithms are typically not very useful if we have to process large amounts of data.
You may have noticed that I was using the word average a lot in the last few paragraphs. When talking about the algorithm complexity, we are mostly interested in the average behavior, but sometimes we will also need to know about the worst behavior. We rarely talk about best behavior because users don't really care much if the program is sometimes faster than average.
Let's look at an example. The following function checks whether a string parameter value is present in a string list:
function IsPresentInList(strings: TStrings; const value: string): Boolean;var i: Integer;begin Result := False; for i := 0 to strings.Count - 1 do if SameText(strings[i], value) then Exit(True);end;
What can we tell about this function? The best case is really simple—it will find that the value is equal to strings[0] and it will exit. Great! The best behavior for our function is O(1). That, sadly, doesn't tell us much as that won't happen frequently in practice.
The worst behavior is also easy to find. If the value is not present in the list, the code will have to scan all of the strings list before deciding that it should return False. In other words, the worst behavior is O(n), if the n represents the number of elements in the list. Incidentally (and without proof), the average behavior for this kind of search is also O(n).
There are better ways to check whether an element is present in some data than searching the list sequentially. We will explore one of them in the next section, Big O and Delphi data structures.
While the function of n inside the O() notation can be anything, there are some O functions that appear constantly in standard programming problems. The following table shows those Big O limits and the most common examples of problems that belong to each class:
Time complexity
Common examples of problems with that time complexity
O(1)
Accessing array elements
O(log n)
Search in an ordered list
O(n)
Linear search
O(n log n)
Quick sort (average behavior)
O(n
2
)
Quick sort (worst behavior), naive sort (bubblesort, insertion sort, selection sort)
O(c
n
)
Recursive Fibonacci, travelling salesman problem using dynamic programming (c is some numeric constant)
If we care about program performance, then O(1) algorithms are of special interest to us as they present algorithms which don't get slower (at least not noticeably) when we increase the problem size. We'll see an example of such O(1) algorithms in the next section.
When we deal with algorithms that search in some datasets, we usually try to make them behave as O(log n), not O(n), as the former slows down much, much slower than the latter.
Another big class of problems deals with sorting the data. While the naive approaches sort in O(n2), better algorithms (such as mergesort and quicksort) need on average just O(n log n) steps.
The following image shows how the time complexity for these typical limits (we have used 2n as an example of a more generic cn) grows when we increase the problem size up to 20-fold:
We can see that O(1) and O(log n) grow very slowly. While O(n log n) grows faster than O(n), it also grows much slower than O(n2), which we had to stop plotting when data was increased nine-fold.
The O(2n) starts slowly and looks like a great solution for small data sizes (small n), but then it starts rising terribly fast, much faster than O(n2).
The following table shows how fast O(n log n) and O(n2) are growing if we compare them with O(n) and how quickly O(2n) explodes.
The data column shows the data size increase factor. The number 10 in this column, for example, represents input with 10 times more elements than in the original data:
Data size
O(1)
O(log n)
O(n)
O(n log n)
O(n
2
)
O(2
n
)
1
1
1
1
1
1
1
2
1
2
2
4
4
2
10
1
4
10
43
100
512
20
1
5
20
106
400
524,288
100
1
8
100
764
10,000
10
29
300
1
9
300
2,769
90,000
10
90
We can see from this table that O(log n) algorithms present a big improvement over O(n) algorithms (8 versus 100 times increase in time when data increases 100-fold). We can also see that the O(2n) quickly becomes completely unmanageable.
The last cell in this table is particularly interesting. There are different estimates for the number of elementary particles (electrons, protons, neutrons, and so on) in the visible universe, but they all lie somewhere around 1090. Suppose we have a computer which can solve an O(2n) in a reasonable time. If we would increase the input data by a factor of just 300, then we would need 1090 computers to solve the new problem in the same time. That is as much as the number of particles in the visible universe!
Delphi's Run-Time Library (RTL) contains many data structures (classes that are specifically designed to store and retrieve data), mostly stored in System.Classes and System.Generics.Collection units that greatly simplify everyday work. We should, however, be aware of their good and bad sides.
Every data structure in the world is seeking a balance between four different types of data access: accessing the data, inserting the data, searching for data, and deleting data. Some data structures are good in some areas, others in different ones, but no data structure in this world can make all four operations independent of data size.
When designing a program, we should therefore know what our needs are. That will help us select the appropriate data structure for the job.
The most popular data structure in Delphi is undoubtedly TStringList. It can store a large amount of strings and assign an object to each of them. It can—and this is important—work in two modes, unsorted and sorted. The former, which is a default, keeps strings in the same order as they were added while the latter keeps them alphabetically ordered.
This directly affects the speed of some operations. While accessing any element in a string list can always be done in a constant time (O(1)), adding to a list can take O(1) when the list is not sorted and O(log n) when the list is sorted.
Why that big difference? When the list is unsorted, Add just adds a string at its end. If the list is, however, sorted, Add must first find a correct insertion place. It does this by executing a bisection search, which needs O(log n) steps to find the correct place.
The reverse holds true for searching in a string list. If it is not sorted, IndexOf needs to search (potentially) the whole list to find an element. In a sorted list, it can do it much faster (again by using a bisection) in O(log n) steps.
We can see that TStringList offers us two options - either a fast addition of elements or a fast lookup, but not both. In a practical situation, we must look at our algorithm and think wisely about what we really need and what will behave better.
To sort a string list, you can call its Sort method or you can set its Sorted property to True. There is, however, a subtle difference that you should be aware of. While calling Sort sorts the list, it doesn't set its internal is sorted flag and all operations on the list will proceed as if the list is unsorted. Setting Sorted := True, on the other hand, does both - it sets the internal flag and calls the Sort method to sort the data.
To store any (non-string) data, we can use traditional TList and TObjectList classes or their more modern generic counterparts, TList<T> and TObjectList<T>. They all always work in an unsorted mode and so adding an element takes O(1) while finding and removing an element takes O(n) steps.
All provide a Sort function which sorts the data with a quicksort algorithm (O(n log n) on average) but only generic versions have a BinarySearch method, which searches for an element with a bisection search taking O(log n) steps. Be aware that BinarySearch requires the list to be sorted but doesn't make any checks to assert that. It is your responsibility to sort the list before you use this function.
If you need a very quick element lookup, paired with a fast addition and removal, then TDictionary is the solution. It has methods for adding (Add), removing (Remove) and finding a key (ContainsKey and TryGetValue) that, on average, function in a constant time, O(1). Their worst behavior is actually quite bad, O(n), but that will only occur on specially crafted sets of data that you will never see in practical applications.
I've told you before that there's no free lunch and so we can't expect that TDictionary is perfect. The big limitation is that we can't access the elements it is holding in a direct way. In other words, there is no TDictionary[i]. We can walk over all elements in a dictionary by using a for statement, but we can't access any of its elements directly. Another limitation of TDictionary is that it does not preserve the order in which elements were added.
Delphi also offers two simple data structures that mimic standard queue—TQueue<T>—and stack—TStack<T>. Both have very fast O(1) methods for adding and removing the data, but they don't offer any bells and whistles—there is no direct data access, we cannot search for data, and so on. We can only insert (Enqueue in queue or Push in stack) and remove (Dequeue and Pop) data.
To help you select the right tool for the job, I have put together a table showing the most important data structures and their most important methods, together with average and (when they differ from the average) worst-case time complexities:
Data structure
Operation
Average
Worst
TStringList
Direct access
O(1)
Add
O(1) / O(
log n
)
Insert
O(1)
Delete
O(1)
IndexOf
O(
n
) / O(
log n
)
Sort
O(
n log n
)
O(
n
2
)
TList, TObjectList
Direct access
O(1)
Add
O(1)
Insert
O(1)
Delete
O(1)
Remove
O(n)
IndexOf
O(n)
Sort
O(
n log n
)
O(
n
2
)
TList<T>, TObjectList<T>
Direct access
O(1)
Add
O(1)
Insert
O(1)
Delete
O(1)
Remove
O(
n
)
IndexOf
O(
n
)
BinarySearch
O(
log n
)
Sort
O(
n log n
)
O(
n
2
)
TDictionary
Direct access
Not possible
Add
O(1)
O(
n
)
Remove
O(1)
O(
n
)
TryGetValue
O(1)
O(
n
)
ContainsKey
O(1)
O(
n
)
ContainsValue
O(n)
TQueue<T>
Direct access
Not possible
Enqueue
O(1)
Dequeue
O(1)
TStack<T>
Direct access
Not possible
Push
O(1)
Pop
O(1)
The table shows the time complexity of the most important operations on built-in Delphi data structures. Complexity for the worst case is only listed if it differs from the average complexity.