Scala Microservices - Jatin Puri - E-Book

Description

Design, build, and run Microservices using Scala elegantly

About This Book

  • Build robust microservices using Play Framework and Lagom
  • Model your data for highly interactive applications and scale using Event Sourcing & CQRS
  • Build applications that are resilient to failures by using Message Passing for communication
  • Deploy and manage Scala Microservices for scale by using docker containers with Kubernetes for orchestration

Who This Book Is For

It is assumed that the reader knows Scala or is proficient in a comparable programming language such as Java, C#, or Ruby, with some exposure to Scala. Some experience with writing web services would also be ideal but is not mandatory.

What You Will Learn

  • Learn the essentials behind microservices and the advantages and perils associated with them
  • Build low latency, high throughput applications using Play and Lagom
  • Dive deeper into being asynchronous and understand the advantages it provides
  • Model your complex domain data for scale and simplicity with CQRS and Event Sourcing
  • Be resilient to failures by using message passing
  • Look at best practices for version control workflows, testing, continuous integration, and deployments
  • Understand operating system-level virtualization using Linux containers, with Docker used to explain how containers work
  • Automate your infrastructure with Kubernetes

In Detail

In this book, we will learn what it takes to build great applications using microservices, the pitfalls associated with such a design, and the techniques to avoid them.

We learn to build highly performant applications using the Play Framework. You will understand the importance of writing code that is asynchronous and non-blocking, and how Play leverages this paradigm for higher throughput. The book introduces the Reactive Manifesto and uses the Lagom Framework to implement its suggested paradigms. Lagom teaches us to build applications that are scalable and resilient to failures, and it solves problems faced with microservices, such as the service gateway, service discovery, communication, and so on. Message passing is used as a means to achieve resilience, and CQRS with Event Sourcing helps us model data for highly interactive applications.

The book also shares effective development processes for large teams, using a good version control workflow, continuous integration, and deployment strategies. We introduce Docker containers and the Kubernetes orchestrator. Finally, we look at the end-to-end deployment of a set of Scala microservices in Kubernetes, with load balancing, service discovery, and rolling deployments.

Style and approach

The book will step through each of the core microservice concepts in Scala, building an overall picture of their capabilities. This book adopts a systematic approach, allowing you to build upon what you've learnt in previous chapters. By the end of this book, you'll have an understanding of the complex aspects of building microservices in Scala and will be able to take that knowledge with you into further projects.


Page count: 416

Publication year: 2017




Scala Microservices

Develop, deploy, and run microservices with Scala

Jatin Puri
Selvam Palanimalai

BIRMINGHAM - MUMBAI

Scala Microservices

Copyright © 2017 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, nor its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: September 2017

Production reference: 1150917

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.

ISBN 978-1-78646-934-2

www.packtpub.com

Credits

Authors

Jatin Puri

Selvam Palanimalai

Copy Editor

Safis Editing

Reviewer

Mark Elston

Project Coordinator

Prajakta Naik

Commissioning Editor

Kunal Parikh

Proofreader

Safis Editing

Acquisition Editor

Chaitanya Nair

Indexer

Francy Puthiry

Content Development Editor

Siddhi Chavan

Graphics

Abhinash Sahu

Technical Editor

Supriya Thabe

Production Coordinator

Nilesh Mohite

About the Authors

Jatin Puri is a passionate engineer and programming language enthusiast. He holds a master's degree in mathematics. He is a Lightbend-certified Scala trainer and is involved with spreading goodness of Scala through Hyderabad Scala Meetup, Stack Overflow, training workshops, and open source contributions. When he is not programming, he teaches meditation and stress elimination techniques under the aegis of The Art of Living foundation.

To my family who mean the world to me. And to my guru, His Holiness Sri Sri Ravi Shankar Ji, who made the world my family.
The authors are grateful to some lovely people who were instrumental in different aspects of the book: Sri Sunil Vyas Ji for making it happen; Rajmahendra Hegde for the original idea; Nabarun Mondal and Lekha Sachdev for the metaphors; Aayush Surana, Mohit Mandokhot, Niraj Patel, Danish Puri, Mannat Vij, Kshtregya Vij, Aarushi Vij, Shravya Vij, Agam Dhall, Samidha Dhall, Nityesh Sachdev, Prakhar Srivastav, and Deepak Agrawal for the proofreads.

 

 

 

 

 

Selvam Palanimalai is a Production Engineer currently working in data pipeline automation using Kubernetes and Spark in downtown Toronto. He is passionate about technology-driven problem solving, clean data, and merciless automation. He is active in the open source community on GitHub, contributing to the Statistical group (SOCR) at the University of Michigan, Ann Arbor.

I would like to thank my Dad for all the inspiration and motivation in life. And my co-author, Jatin, for his energy and guidance in writing this book. This book wouldn't have been possible without the valuable feedback of Nehil Jain, Keshav Raghu, and Mark Elston.

About the Reviewer

Mark Elston has been a software developer for over 30 years and has developed software in a wide range of fields, including systems simulation, embedded hardware control systems, desktop applications, and tester operating systems for semiconductor test systems. He has been with Advantest America for over 20 years as a software engineer and software architect.

Advantest, his current employer, produces a wide variety of test equipment for the semiconductor industry.

He has also reviewed the book Mastering Android Wear Application Development by Packt.

www.PacktPub.com

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www.packtpub.com/mapt

Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.

Why subscribe?

Fully searchable across every book published by Packt

Copy and paste, print, and bookmark content

On demand and accessible via a web browser

Customer Feedback

Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review. If you'd like to join our team of regular reviewers, you can e-mail us at [email protected]. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!

Table of Contents

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

Introduction to Microservices

Business idea

Data collection

Linking users across sites

Rank developers

User interaction

Implementation

Development issues

Configuration and maintenance hazards

Difficult to get started

New functionality

Restart and update

Testing and deployment

Scalability

Isolation

Isolation in space

Isolation in time

Overview of application design till now

Microservices

Code example

Restructuring

What exactly are microservices

Sharing of a database

Defining microservice

Micro in microservice

Polyglot

The dark side of microservices architecture

Why Scala

Summary

Introduction to Play Framework

Quick introduction to Play 2.6

Getting started

Hello world

Structure of the Play project

Routing

Routes

HTTP POST

Actions

Auth

Templates

REST

JSON marshalling/Unmarshalling

Reading JSON

Macro to generate Reads

Generating JSON

Macro to generate Write

Play production mode

Summary

Asynchronous and Non-Blocking

Being asynchronous

Scenario 1 - synchronous

Scenario 2 - synchronous

Scenario 3 - asynchronous

Scenario 4 - waiting

Being asynchronous in Scala

ExecutionContext

Synchronous ExecutionContext

Future

Functional composition with Futures

Blocking

scala.concurrent.blocking

Non-blocking I/O

Blocking and synchronous, non-blocking and asynchronous

Work Stealing - what makes Play fast!

Scheduling improvements - 1

Scheduling improvements - 2

Work Stealing in Play

Play thread pools

Mixing synchronous and asynchronous code

Asynchronous WebService calls

Summary

Dive Deeper

Talent search engine

Project structure

The build.sbt file

Brief overview of the application

Security

Inter-microservice authentication

The auth-app microservice explained

Action.async

Brief introduction to Slick

Slick evolutions

Slick is asynchronous

The web-app microservice

The rank-app microservice

The stackoverflow-app (so-app) microservice

github-app (github-app)

The commons project

Pitfalls

Summary

Reactive Manifesto

Reactive hype?

Reactive Manifesto

Manifesto explained

Elastic

Resilience

Message-driven

Brief overview of Akka

Akka versus message brokers

Isolation

Flow control

Location Transparency

Immutability

Event driven versus message-driven

Summary

Introduction to Lagom

Why Lagom?

Brief overview of Lagom

Lagom Service API

Minimized chirper application

Anatomy of a Lagom project

Dependency injection in Lagom

API and Impl

Defining services

ServiceCall

Brief overview of Macwire

Implementing friend-impl

Akka Streams

Chirp service

Activity Stream

Frontend

Running the application

Multi project builds

Summary

CQRS and Event Sourcing

Data modelling

Bounded context

Domain-driven design

Event Sourcing

Advantages of Event Sourcing

Event

CQRS

CQRS example

Pain points with CQRS

Event Sourcing and CQRS

Conclusion

Lagom Persistence API

Getting started with setup

Managing Friends with CQRS

Command

Events

State

PersistentEntity

Implementing FriendService

Summary

Effective Communication

Isolation

Isolation - part 2

Message brokers

Apache Kafka

Conclusion

Lagom Message Broker API

Friend Recommendation

build.sbt

Friend-api

The friend-recommendation-api

Summary

Development Process

Getting to know the key ingredients

The feedback loop

Code versioning

Testing strategies

Continuous integration and deployments

Deployments

Jenkins pipelines

Jenkins pipeline - In action

Traditional deployments and machine images

Dependency versus package manager

Containers

Introducing hardware virtualization

OS-level virtualization

What do we expect from containers?

Container runtime - Docker

What made Docker a possibility?

Container images

Deployments with containers

Container concerns

Summary

Production Containers

Distributed systems and their essentials

Distributed system - definition

Reasons to distribute

Components of a distributed system

Domain name service

Server automation

What does an automation framework look like?

Infrastructure as code - Ansible

Ansible primitives

Ansible - creating a Pageview counter

Ansible cloud modules

An introduction to cluster orchestration

Kubernetes

K8s internals

API objects

Config management

K8s setup guide

K8s example app

K8s monitoring

K8s security

Network level

Service level

Certificates

Authentication

Authorization using RBAC

Caveats

Summary

Example Application in K8s

Talent search engine example

Dockerize

Creating a Docker image

Testing the Docker image

Pushing the Docker image

Deployment topology

K8s configurations

Auth-service

Installation of K8s

Deploy!

Rolling deployments

Persistent workloads

Persistent disks

Summary

Preface

Microservices is an architectural style and pattern that is becoming very popular and adopted by many organizations because of the advantages that it offers. In this book, you will learn what it takes to build great applications using microservices, the pitfalls associated with such a design, and the techniques to avoid them.

We will start by shedding light on traditional monoliths, and the problems faced in such architectures and how microservices are an obvious evolution to tackle such problems. We will then learn to build performant web-services using Play Framework. You will understand the importance of writing code that is asynchronous and non-blocking and how Play leverages this paradigm internally for a higher throughput.

Next, you will learn about the Reactive Manifesto and understand its practical benefits by leveraging it in your design. We will introduce the Lagom Framework, which serves two purposes: building reactive applications that are scalable and resilient to failures, and solving the problems associated with microservices architecture, such as service gateway, service discovery, inter-microservice communication and streaming, and so on. Message passing is used as a means to achieve resilience, and CQRS with Event Sourcing helps us model data for highly interactive applications.

We will proceed by learning about effective development processes for large teams. Using a good version control workflow and continuous integration and deployments, we can ship code with high confidence. Next, we will contrast traditional deployments with operating system-level virtualization using Docker.

We will look at the theory of distributed systems first. This justifies the need for a cluster orchestrator, such as Kubernetes, for the efficient use of Docker containers. Finally, we will look at the actual end-to-end deployment of a set of Scala microservices completely in Kubernetes, with load balancing, service discovery, and rolling deployments.

What this book covers

Chapter 1, Introduction to Microservices, introduces the term microservices and what we mean by it. It sheds light on the problems faced with monoliths and how Microservices architecture helps us solve those problems gracefully.

Chapter 2, Introduction to Play Framework, provides a brief overview of the Play Framework. We will also look at Play-related elements such as Guice and Play-JSON.

Chapter 3, Asynchronous and Non-Blocking, thoroughly discusses the importance of being asynchronous and how the Play Framework leverages this paradigm for scalability and high performance. You will learn about the Scala Future API and Work Stealing in Play.

Chapter 4, Dive Deeper, demonstrates how to build a sample search engine to screen developers. This is built in a microservices-based architecture using the Play Framework. In the process, we also become aware of the problems faced in building microservices.

Chapter 5, Reactive Manifesto, introduces the Reactive Manifesto. You will learn about the different guidelines provided by the manifesto to build responsive applications that are resilient to failures.

Chapter 6, Introduction to Lagom, provides an overview of the Lagom Framework and how it handles problems usually faced in the Microservices-based architecture. We will explore Lagom Service API in this chapter.

Chapter 7, CQRS and Event Sourcing, explains Event Sourcing and CQRS and the advantages they provide in the scenarios they best fit in. We will adapt this paradigm in an example using the Lagom Persistence API.

Chapter 8, Effective Communication, explains the importance of asynchronous communication using message passing in building robust Microservices. It introduces Apache Kafka as a broker, and we will explore the Lagom Message API as a means for message passing.

Chapter 9, Development Process, talks about a scalable development model to build microservices using code versioning, continuous integration, and testing. It also covers the basics of Docker containers and images.

Chapter 10, Production Containers, looks into server automation and deploying and managing containers on production systems. It also delves deep into a popular container orchestrator called Kubernetes: its internals, monitoring, and security.

Chapter 11, Example Application in K8s, dockerizes all the services of our example application from Chapter 4, Dive Deeper. Using Kubernetes, it deploys our application's microservices with load-balancing and service discovery features.

What you need for this book

You will require Java 8 and SBT for this book.

Who this book is for

It is assumed that the reader knows Scala or is proficient in a comparable programming language, such as Java, C#, or Ruby, with some exposure to Scala. Some experience with writing web services would also be ideal but is not mandatory.

This book is for software developers and architects who wish to have a comprehensive understanding of microservices and build them in Scala. If any one of the following is true, this book is for you:

  • You have a huge monolith that is making development painful for your team
  • You wish to get started with Play Framework
  • You wish to build scalable microservices using the Lagom Framework
  • You want more axes to scale your application by effectively modeling data with CQRS and making it elastic and resilient using message passing
  • You want to deploy already existing microservices
  • You want to start using Docker

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book--what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail [email protected], and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

You can download the code files by following these steps:

1. Log in or register to our website using your e-mail address and password.
2. Hover the mouse pointer on the SUPPORT tab at the top.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box.
5. Select the book for which you're looking to download the code files.
6. Choose from the drop-down menu where you purchased this book from.
7. Click on Code Download.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR / 7-Zip for Windows
  • Zipeg / iZip / UnRarX for Mac
  • 7-Zip / PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/scala-microservices-book/book-examples/. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/ScalaMicroservices_ColorImages.pdf.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books--maybe a mistake in the text or the code--we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at [email protected] with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at [email protected], and we will do our best to address the problem.

Introduction to Microservices

Generally, the probability of a great idea striking is higher when one is in a joyful state of mind. So, one day during your holiday, while interacting with your friends, a fantastic business idea comes to your mind. The idea is to build a search engine as a tool for HR (Human Resources) to find the best talent. HR would be able to search for their ideal candidate for the respective profession.

However, because this is a very vast idea and each profession has its own complexities, it would be wise to start with a single profession. Being a programmer yourself, you decide to write a search engine that will help HR find their ideal developer with a single click. The initial idea is to be able to search for people based on the respective technology and location; for example, Android developers in London.

In this chapter, we will design our search engine for HR using the conventional methods first. In doing so, we will cover the following topics:

  • Designing all functionalities embedded as part of a single application
  • Understanding the advantages and issues faced with such a design
  • Introducing microservices as an alternative yet obvious approach

Business idea

To be able to implement a search engine, you need to decide the source of data. You decide to implement a search based on the data available at Stack Overflow, GitHub, and LinkedIn.

Data collection

The data has to be scraped from Stack Overflow, GitHub, and LinkedIn. We will need to write a web crawler that crawls all the publicly available data on each source, then parses the HTML files and extracts data from each. Of course, we will need to write specific parsers for Stack Overflow, GitHub, and LinkedIn, as the data you wish to extract will be structured differently on each site. For example, GitHub can provide the probable location of a user if the developer has mentioned it.

GitHub also provides an API (https://developer.github.com/v3/) to obtain the desired information, as do Stack Overflow (https://api.stackexchange.com/docs) and LinkedIn (https://developer.linkedin.com/). It makes a lot of sense to use these APIs because the information obtained is well structured and intact. They are much easier to maintain compared to HTML parsers, which can fail at any time as they are subject to changes in the page source.

Maybe you wish to have a combination of both, so as to not rely on just the APIs: the website owners could simply disable them temporarily without prior notification for numerous reasons, such as a higher load on their servers due to some event, disabling them for public users to safeguard their own applications, high throttling from your IP address, or a temporary IP ban. Some of these services do provide a larger rate limit if purchased, but some don't. So, crawling is not dispensable.

The data collected individually from the aforementioned sites would be very rich. Stack Overflow can provide us with the following:

  • The list of all users on their website along with individual reputation, location, display name, website URL, and more.
  • All the tags on the Stack Exchange platform. This information can be useful to generate our database of tags, such as Android, Java, iOS, JavaScript, and many more.
  • A split of reputation gained by individual users on different tags. At the time of writing, Jon Skeet, who has the highest reputation on Stack Overflow, had 18,286 posts on C#, 10,178 posts on Java, and so on. Reputation on each tag can give us a sense of how knowledgeable the developer is about a particular technology.

The following piece of code is the JSON response on calling the URL https://api.stackexchange.com/docs/users, which provides a list of all users in descending order of reputation:

{
  "items": [
    {
      "badge_counts": { "bronze": 7518, "silver": 6603, "gold": 493 },
      "account_id": 11683,
      "is_employee": false,
      "last_modified_date": 1480008647,
      "last_access_date": 1480156016,
      "age": 40,
      "reputation_change_year": 76064,
      "reputation_change_quarter": 12476,
      "reputation_change_month": 5673,
      "reputation_change_week": 1513,
      "reputation_change_day": 178,
      "reputation": 909588,
      "creation_date": 1222430705,
      "user_type": "registered",
      "user_id": 22656,
      "accept_rate": 86,
      "location": "Reading, United Kingdom",
      "website_url": "http://csharpindepth.com",
      "link": "http://stackoverflow.com/users/22656/jon-skeet",
      "profile_image": "https://www.gravatar.com/avatar/6d8ebb117e8d83d74ea95fbdd0f87e13?s=128&d=identicon&r=PG",
      "display_name": "Jon Skeet"
    },
    .....
}

In a similar manner, GitHub can also provide statistics for each user based on the user's contributions to different repositories via its API. A higher number of commits to a Scala-based repository on GitHub might represent his/her prowess with Scala. If the contributed repository has a higher number of stars and forks, then a developer's contribution to such a repository gives them a higher score, as the repository has a higher reputation. For example, there is a strong probability that a person contributing to Spring's source code is actually strong with Spring when compared to a person with a pet project based on Spring that is not starred by many. It is not a guarantee, but a matter of probability.
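
The repository-weighting intuition above can be sketched in a few lines of Scala. This is purely our own illustration (the `Repo` type, the log-dampened weight, and the numbers are all made up for the example), not code from the book:

```scala
// Illustrative sketch of weighting commits by repository reputation.
// Repo, repoWeight, and contributionScore are hypothetical names.
case class Repo(name: String, stars: Int, forks: Int)

// A commit to a highly starred/forked repository counts for more;
// log1p dampens the effect so one famous repository does not dominate.
def repoWeight(r: Repo): Double = 1.0 + math.log1p(r.stars + r.forks)

def contributionScore(commitsPerRepo: Map[Repo, Int]): Double =
  commitsPerRepo.map { case (repo, commits) => commits * repoWeight(repo) }.sum

val popular = Repo("spring-framework", stars = 30000, forks = 8000)
val pet     = Repo("my-pet-project", stars = 2, forks = 0)

// Same commit count, very different signal strength
val strongSignal = contributionScore(Map(popular -> 10))
val weakSignal   = contributionScore(Map(pet -> 10))
```

The exact weighting function matters far less than the principle the text describes: the reputation of the repository scales the value of a contribution to it.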

LinkedIn can give very structured data about the current occupation, location, interests, blog posts, connections, and others.

Apart from the aforementioned sources, it might be a good idea to also build an infrastructure to manually insert and correct data. You could have a small operations team later, who will be able to delete/update/add entries to refine the data.

Once the data is collected, you will need to transform it into some desired format and have persistent storage for it. The data will also have to be processed and indexed to be made available in-memory, maybe by using Apache Lucene (http://lucene.apache.org/core/), to be able to execute faster queries on it. Data that is readily available in RAM is many times faster to access than data read from disk.
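
To see why keeping processed data in memory makes such queries fast, here is a toy inverted index in Scala. It is only a sketch of the idea; a real implementation would use Apache Lucene as mentioned above, and all the names and sample data here are our own:

```scala
// Toy in-memory inverted index: each tag maps to the developers who have it.
case class Developer(name: String, location: String, tags: Set[String])

class InMemoryIndex(devs: Seq[Developer]) {
  // tag -> developers, precomputed once; queries never touch the disk.
  // Tags are assumed to be stored lowercase.
  private val byTag: Map[String, Seq[Developer]] =
    devs.flatMap(d => d.tags.map(t => t -> d)).groupMap(_._1)(_._2)

  def search(tag: String, location: String): Seq[Developer] =
    byTag.getOrElse(tag.toLowerCase, Seq.empty)
      .filter(_.location.equalsIgnoreCase(location))
}

val index = new InMemoryIndex(Seq(
  Developer("Asha", "London", Set("android", "java")),
  Developer("Ben", "Berlin", Set("android")),
  Developer("Chen", "London", Set("scala"))
))

// "Android developers in London" is answered from the precomputed map
val hits = index.search("Android", "London")
```

The lookup costs one map access plus a filter over the few matching developers, instead of a scan of the whole data set on disk.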

Linking users across sites

Now that we have planned how to collect all the developer data, we will also have to build our single global developer database across all websites, comprising developer information such as name, contact information, location, LinkedIn handle, and Stack Overflow handle. This would, of course, be obtained from the data generated by a web crawler or the APIs.

We need to have the ability to link people. For example, a developer with the handle abcxyz on LinkedIn might be the same person with the same or a different handle on GitHub. So, now we can associate different profiles on different websites with a single user. This would provide much richer data that would leave us in a better position to rate that particular person.
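
A minimal sketch of the linking idea in Scala, assuming profiles carry an e-mail address we can join on. Real linking would combine several weaker signals (handles, display names, locations); the types and sample data here are our own illustration:

```scala
// Profiles from different sites are grouped when they share a normalized e-mail.
case class Profile(site: String, handle: String, email: Option[String])

def linkProfiles(profiles: Seq[Profile]): Map[String, Seq[Profile]] =
  profiles
    .collect { case p @ Profile(_, _, Some(email)) => email.trim.toLowerCase -> p }
    .groupMap(_._1)(_._2)

val linked = linkProfiles(Seq(
  Profile("linkedin", "abcxyz", Some("Dev@Example.com")),
  Profile("github", "abc-xyz", Some("dev@example.com")),
  Profile("stackoverflow", "someoneelse", Some("other@example.com"))
))
// "dev@example.com" now groups the LinkedIn and GitHub identities of one person
```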

Rank developers

We also need to have the ability to rate developers. This is a difficult problem to solve. We could calculate a rank for each user on each website and then normalize across all the websites. However, we need to be careful of data inconsistencies. For example, a user might have a high score on GitHub but a poor score on Stack Overflow (maybe because he is not very active on Stack Overflow).

Ultimately, we would need a rank of each developer for each specific technology.
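
One possible scheme for the normalization mentioned above is min-max scaling each site's raw scores to [0, 1] and then averaging only over the sites where a developer is actually present, so inactivity on one site is not an automatic penalty. This is our own sketch of one such scheme, not the book's algorithm:

```scala
// Scale one site's raw scores into [0, 1] so sites become comparable.
def minMaxNormalize(scores: Map[String, Double]): Map[String, Double] = {
  val (lo, hi) = (scores.values.min, scores.values.max)
  if (hi == lo) scores.map { case (dev, _) => dev -> 1.0 }
  else scores.map { case (dev, v) => dev -> (v - lo) / (hi - lo) }
}

// siteScores: site name -> (developer -> raw score on that site)
def overallRank(siteScores: Map[String, Map[String, Double]]): Map[String, Double] = {
  val perDev = siteScores.values.map(minMaxNormalize).flatten.groupMap(_._1)(_._2)
  perDev.map { case (dev, vs) => dev -> vs.sum / vs.size }
}

val ranks = overallRank(Map(
  "github"        -> Map("a" -> 900.0, "b" -> 100.0),
  "stackoverflow" -> Map("a" -> 10.0, "b" -> 50.0)
))
// "a" dominates GitHub, "b" dominates Stack Overflow; both average out to 0.5
```

Averaging per-site positions this way papers over exactly the inconsistency described above: being weak on one site does not erase strength on another.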

User interaction

Now that our backend is sorted out, we will of course need a fancy but minimal user interface for the HR manager to search. A query engine will be needed to parse the search queries users enter. For example, users might enter Full Stack Engineers in Singapore. So, we will need an engine to understand the implication of being Full Stack.

Maybe there is a need to also provide a Domain Specific Language (DSL) that users could use for complex searches, such as Location in (Singapore, Malaysia) AND Language in (Java, JavaScript) AND Knowledge of (Spring, Angular).
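
Such a DSL maps naturally onto an algebraic data type in Scala. The grammar and evaluator below are our own sketch of what the example query could parse into, not a design taken from the book:

```scala
// A tiny query algebra for the hypothetical search DSL
sealed trait Query
case class LocationIn(cities: Set[String]) extends Query
case class LanguageIn(langs: Set[String]) extends Query
case class KnowledgeOf(techs: Set[String]) extends Query
case class And(left: Query, right: Query) extends Query

case class Candidate(city: String, langs: Set[String], techs: Set[String])

// Evaluate a query against one candidate profile
def matches(q: Query, c: Candidate): Boolean = q match {
  case LocationIn(cs)  => cs.contains(c.city)
  case LanguageIn(ls)  => ls.exists(c.langs)
  case KnowledgeOf(ts) => ts.exists(c.techs)
  case And(l, r)       => matches(l, c) && matches(r, c)
}

// Location in (Singapore, Malaysia) AND Language in (Java, JavaScript)
// AND Knowledge of (Spring, Angular)
val q = And(
  And(LocationIn(Set("Singapore", "Malaysia")), LanguageIn(Set("Java", "JavaScript"))),
  KnowledgeOf(Set("Spring", "Angular")))

val dev = Candidate("Singapore", Set("Java"), Set("Spring"))
```

A parser would turn the textual query into this tree; the evaluator then needs no further knowledge of the surface syntax.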

The user interface will also need a web view for visualization of the responses, and it should store user preferences, such as a default city or technology, and past searches.

Now that most of the functionality is sorted out on paper, and given the business idea's potential to change how people search for talent online, confidence is sky high. The spontaneous belief is that it has all the potential to be the next market disruptor.

Implementation

Fundamentally, there are two parts to the application:

We have a web-based user interface, which is the source for queries and visualizing the result set

We have a lot of background batch work where we obtain data, extract content, run machine learning algorithms, and rank users

For the second part, we write background batch jobs that collect all the information from different sources, extract content, rank developers, and then persist them in some respective storage.

To expose the results that we have generated in the second part, we will write a set of standard service classes for each type of source (Stack Overflow, LinkedIn, and so on) that implement a common interface (just like any other traditional application):

interface SiteService {
    List<Developers> getTopDevelopers(Location location, Tech tech, int responseCount);
    .....
}

class LinkedInSiteServiceImpl implements SiteService { ... }
class StackoverflowSiteServiceImpl implements SiteService { ... }
class GithubSiteServiceImpl implements SiteService { ... }
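In Scala, the same contract could be sketched as a trait. The types and the stubbed implementation below are illustrative assumptions, not the book's actual code:

```scala
// Domain types assumed for illustration.
case class Location(city: String)
case class Tech(name: String)
case class Developer(name: String)

// The common contract each site-specific service implements.
trait SiteService {
  def getTopDevelopers(location: Location, tech: Tech, responseCount: Int): List[Developer]
}

// A stubbed implementation; a real one would query the site's persistence storage.
class StackoverflowSiteService extends SiteService {
  def getTopDevelopers(location: Location, tech: Tech, responseCount: Int): List[Developer] =
    List(Developer("alice"), Developer("bob")).take(responseCount)
}
```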

Similarly, we have Data Access Objects (DAO) to query persistence storage and provide results.

For the first part, we plan to build the web interface, just like in any other traditional approach, to be Model View Controller (MVC) based. We need to have security as well as session management for the user so that they don't have to log in each time. For the user interface, we could either use templates (Play templates, Velocity templates, and so on) or a Single Page Application (https://en.wikipedia.org/wiki/Single-page_application), where the page is dynamically loaded and built using network calls in the background, using either Ajax or WebSockets.

As we can see, the application is growing in magnitude as we think deeper to solve our problem. The complexity has increased and so has the functionality. There is, therefore, the need for more people to be involved with the project.

Development issues

Now we have backend developers, database architects, frontend developers, and architects all working on the same project.

Configuration and maintenance hazards

Now, some of the developers in the team realize that there is a need for each type of service to use its own persistence mechanism. The data received by querying Stack Overflow is in itself very complete (suppose a developer has 10k points under the Scala tag alone; this by itself provides a summary of the user). So, this information can easily be stored using any standard relational database.

However, the same might not apply for GitHub, where it could get complex, as a project can have multiple developers and each developer can contribute to multiple projects with varying contributions. Developers decide that Neo4J (https://neo4j.com/), a graph database, best fits the schema to persist data associated with GitHub.

The LinkedIn based implementation might settle with a relational database again, but might use MongoDB to store the preprocessed response so that it is faster to respond with JSON rather than building it again and again.

Worse, all three of them might use different caching mechanisms to store their results. So, the reliance on different technologies in the application has increased. This means that the configuration setup, such as URLs and ports, the authentication mechanism with the database, the database connection pool size, the default time zone to be used by the database, and other configurations, has grown significantly.
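To make the sprawl concrete, a single configuration file for such an application might grow to look something like this (a hypothetical Typesafe Config/HOCON sketch; all names and values are invented):

```
# Hypothetical application.conf illustrating the configuration sprawl
stackoverflow.db {
  url       = "jdbc:postgresql://so-db:5432/so"
  pool-size = 20
  timezone  = "UTC"
}
github.neo4j {
  uri  = "bolt://neo4j:7687"
  auth = { user = "neo4j", password = ${?NEO4J_PASS} }
}
linkedin {
  db.url    = "jdbc:postgresql://li-db:5432/li"
  mongo.uri = "mongodb://mongo:27017/linkedin-cache"
}
```

Every new engine and every new store adds another such block, each with its own credentials, pools, and tuning knobs.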

This is only the backend part of our application. In the frontend, we might use CoffeeScript or Scala.js, coupled with some JavaScript framework, such as Angular, to develop responsive user interfaces.

Because it is one large application that does everything, the developers working on one part cannot turn their backs on other modules. If some configuration is set incorrectly and throws an exception when a module starts or, worse, does not let the whole application start or causes a build failure, it results in a waste of productivity and can seriously impact the morale of the developers at large.

By the end, the number of configurations has increased.

Modularity is lost.

We have defined service objects that provide us top developers for respective locations and technologies for each of Stack Overflow, LinkedIn, and GitHub. These service objects rely on the following:

Data Access Objects to obtain information from the persistence storage.

Cache objects to cache content. Multiple services might refer to the same cache objects for caching. For example, the same in-memory cache may be used to cache data associated with LinkedIn and Stack Overflow.

These service objects may then be used by different parts of our application, such as controllers that receive HTTP requests from the users.
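The coupling described above can be sketched as follows; the classes are hypothetical, but they show how a single shared cache instance ends up serving multiple site services:

```scala
// Hypothetical wiring of the layers: a service depends on a DAO for persistence
// and on a Cache object, and the same cache instance is shared across services.
trait Cache {
  def getOrElseUpdate(key: String, compute: => String): String
}

class InMemoryCache extends Cache {
  private val store = scala.collection.mutable.Map.empty[String, String]
  def getOrElseUpdate(key: String, compute: => String): String =
    store.getOrElseUpdate(key, compute)
}

class DeveloperDao {
  // Stub standing in for a query against persistence storage.
  def topFor(site: String): String = s"top-devs-of-$site"
}

class SiteRankingService(site: String, dao: DeveloperDao, cache: Cache) {
  def topDevelopers: String = cache.getOrElseUpdate(site, dao.topFor(site))
}
```

A single `InMemoryCache` instance would be passed to both the LinkedIn and Stack Overflow services, which is exactly the kind of sharing that erodes module boundaries over time.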

The user information is also stored in a relational database. An SQL database was used to store Stack Overflow data, so we decide to use the same SQL instance to persist user information, as well as reuse the same database drivers, database authentication, connection pooling, transaction managers, and so on. Worse, we could use a common class, as nothing stops us from doing so.

With increasing functionality, the intended boundary designed initially gets lost. The same class may be used by multiple modules to avoid code duplication. Everything in the application starts using everything else.

All this makes refactoring the code harder, as any change in the behavior of a class may knowingly or unknowingly impact many different modules. Also, as the code base grows bigger, the probability of code duplication increases, as it is difficult to keep track of replicated effort.

Difficult to get started

People leaving teams and new people joining happens all the time. Ideally, for a new person joining the team, it should be straightforward to get started with development. As the configuration and maintenance gets messier, and the modularity is lost, it becomes difficult to get started.

A lack of modularity makes it necessary for developers to become well acquainted with the complete code base, even if they intend to work on a single module in the project. Due to this, the time needed for a new recruit to start contributing to the project increases by months.

New functionality

Our current system ranks developers based only on Stack Overflow, LinkedIn, and GitHub. We now decide to also include Topcoder, Kaggle, and developer blogs in our ranking criteria. This means that we will increase our code base size by incorporating newer classes, with reliance on newer/existing databases, caching infrastructure, additions to the list of periodic background cron jobs, and data maintenance. The list is unending.

In an ideal setup, given a set of modules m1, m2, m3, ..., mn, we would want the net development complexity to be Max(m1, m2, m3, ..., mn), or m1 + m2 + m3 + ... + mn. However, in the case of our application, it tends to be m1 * m2 * m3 * ... * mn. That is, the complexity increases dramatically with the addition of new functionality.

A single module can affect every other module; two developers working on different modules might knowingly or unknowingly affect each other in many possible ways. A single incorrect commit by a developer (and a mistake missed by the reviewer) might not only impact their module but everything else (for example, an application startup failure caused by one module affects every other module in the application).

All of this makes it very difficult to start working on new functionality in the code base, as the complexity keeps increasing.

With time, if a decision is taken to expand the search engine to include not only developers but also creative artists and photographers, it will become a daunting task to quickly come up with newer functionalities. This will lead to increasing costs and, worse, lost business due to delays in development time.

Restart and update

A user reports a bug with the application, and you realize that the bug is associated specifically with the Stack Overflow engine we have created. The developers in your team quickly find the fix in the Stack Overflow engine and are ready to deploy it in production. However, to deploy it, we will need to restart the complete application. Restarting the application is overkill, as the change only affects one particular module.

These are very common scenarios in software development. The bigger the application, the greater the number of reasons for bugs and their fixes, and of course, the application restarts (unless the language provides hot swapping as a first class functionality).

Hot swapping is the ability to alter the running code of a program without needing to interrupt its execution. Erlang is a well-known example that provides the ability to hot swap. In Erlang, one can simply recompile and load the new version at runtime. This feature makes Erlang very attractive for applications that need near 100% availability in telecom and banking. Common Lisp is another such example.

For the JVM, ZeroTurnaround's proprietary JRebel offers the functionality to hot swap Java code (method bodies, instance variables, and others) at runtime. Because the JVM by itself does not provide a built-in interface for this, JRebel applies multiple levels of smartness, with dynamic class rewriting at runtime and JVM integration to version individual classes. In short, it uses very complex yet impressive strategies to exhibit hot swapping, which is not a first-class feature of the JVM. As a side effect, JRebel is mostly used for faster development, to avoid restarts, and not normally in production.

Testing and deployment

We now have one giant application that does everything. It relies on a number of frontend modules, frameworks, databases, build scripts, and other infrastructure, with a ton of configurations.

All this not only makes integration testing difficult, but also makes the deployment process frustrating and error prone. The startup time significantly increases as the number of operations increases. This, in most cases, also applies to test cases, where the application context loading time impacts the time it takes to start running tests, leading to a loss of developer productivity.

Moreover, there can always be a range of hiccups. Between version upgrades, there might be a range of database operations that need to be performed. There might be single or multiple SQL files that contain the set of SQL statements to be run for version upgrades. Multiple modules might have different sets of SQL files. Sometimes, due to eventual tight integration, a few of the modules might rely on common tables. Any schema/DML upgrade on a table by one module might unintentionally impact other modules. In such cases, the change has to be appropriately communicated to other teams. Worse, we might not know all the affected teams, and this could lead to a production failure.

Scalability

The famous Pokemon Go mobile app had 15 million global downloads within a week of its launch. On August 3, in Japan, people watched an airing of Castle in the Sky. At one point, they took to Twitter so much that it hit a one-second peak of 143,199 tweets per second. The average then was 5,700 tweets per second; thus, there was a 25 times increase in traffic all of a sudden (https://blog.twitter.com/2013/new-tweets-per-second-record-and-how).

In technology businesses, such events are not rare. Although the surge might not be as dramatic as Twitter's, it can nonetheless be significantly higher than anticipated. Our application is not designed to be scalable. As the load increases, we might scale vertically by adding more memory and CPU, but this cannot be done forever. What if the application goes down, or a database starts misbehaving and takes an eternity to respond?

Apart from adding memory and CPU, we could scale horizontally by having multiple instances of the same application running, with a load balancer routing requests to different servers based on the load on individual servers (more traffic could be routed to the application server with the lower load). However, this leaves a lot of unanswered questions:

What do we do with our databases? Will we have a single database that multiple application servers access? If we go with this setup, then a higher load on this database server would affect all the cloned servers, ultimately increasing the response time of all the servers as they all access the same database.

Do we have a separate database for each application server? Then how do we deal with consistency issues across databases? The data written to one database would have to be copied to the other database servers for consistency. What happens if the data was not copied in time and a user requested it?

To solve this problem, a solution could be to ensure that the application servers sync up with each other in a timely manner. But what if there is a network partition and the application servers cannot reach each other? What happens to consistency in such scenarios?

How many servers do we have installed? If we install more, but the traffic load is low, then it results in a waste of resources and money (at the time of writing, a standard 8 core, 32 GB RAM instance would cost 4,200 USD per annum on Amazon Web Services (AWS)).

In short, our current setup is ill-equipped to scale. It needs to be redesigned from the ground up to handle a different range of issues, not patched with an ad hoc fix. What if you could scale from 10 to 100,000 requests per minute with just a configuration change, without a complete revamp effort? This ability has to be incorporated as an abstraction and designed in from ground zero.

Sudden higher loads are opportunities to excel as a business. You would not want the application to fail when many users try accessing it for the first time. It would be an opportunity lost.

Isolation

What we currently have is this:

As humans, we are affected by our surroundings. At work, our peers affect us in positive or negative ways. Countries get affected by the peaceful/destructive atmospheres of neighboring countries. Even in the human body, long-standing hypertension can affect the kidneys. Deranged kidney function leads to accumulation of waste products that were normally excreted from the body. These metabolites or waste products then affect brain function. And worse, in the case of kidney dysfunction, water is not excreted effectively, which can lead to cardiac dysfunction. So, if the kidneys start dysfunctioning, they go on to affect everything else.

In the case of the human body, the organs' tendency to impact each other is not by design but rather by designoid. Failures in software are inevitable and cannot be avoided. Though it is good to write code that can prevent failures, failures come in all unexpected forms: bugs/mistakes in code, network failures, an unstable host due to high CPU/memory utilization, disk failures, a JVM crash, the thundering herd problem, and many others. How do we design to deal with failures? One strategy to handle failures is to have a backup mechanism. For example, Windows has layered device drivers (https://msdn.microsoft.com/en-us/library/windows/hardware/ff543100(v=vs.85).aspx). So, if one layer is not functional, the next higher layer can start working, thus potentially eliminating downtime. Another example is the octopus, which has three hearts and nine brains.

If you are brainy and heartbroken, that should cheer you up.

They have those extra components in place because they want to be resilient and fault tolerant (at least they would like to believe so). Humans have two kidneys and can still survive on one kidney if the other fails. The whole of evolutionary biology teaches us how to be componentized.

But what happens if the backups also start failing? Backups fix the problem and certainly make the system more resilient compared to its previous state, but they are not foolproof. An alternative approach is to accept failures and embrace them. Because we accept the fact that failures do happen, we try devising a strategy where other parts of the application remain unaffected by them. Though, of course, we will try everything in the world to prevent them, we accept that failures are a reality.

A way to remain unaffected by failures in our surroundings is to provide the right isolation across modules. We quantify isolation as isolation in both space and time.

Isolation in space

With software, the modules of an application do impact each other, for good or bad. The CPU utilization, memory consumption, and resources used by one part of our monolith significantly affect the application. In our application, an OutOfMemoryError caused by the Stack Overflow engine would destabilize the complete application. Excessive locking in the database, or higher CPU on the database server due to higher load (or abuse?) by the LinkedIn engine, would obstruct other modules from using the database server.

What we need is isolation in space. That is, for the modules to be completely separated in all forms so that they cannot impact each other. Perhaps splitting our application into different applications, with a separate:

LinkedIn engine application

GitHub engine application

Stack Overflow engine application, and so on

Maybe we could deploy them on different hosts in a cloud service. Sometimes we have multiple instances of the same application running with a load balancer in front to handle higher load. In such cases, these instances could even be on different continents, so that a natural calamity at one location does not impact the instances running at other locations.

Having them as different applications allows failures to be captured, signaled, and managed at a fine-grained level instead of letting them cascade to other components. But this does not fully solve the problem yet. Applications could interact with each other using REST via HTTP. In our case, the Stack Overflow engine might wish to push the consolidated ranks to our main developer-ranking application. If the developer-ranking application was down at that moment, our Stack Overflow engine would not be able to push the data.

If applications interact with each other, then it is required that:

They are both alive at that moment

The operation was a success

Failure of any one then leads to failure of the operation.

One way of dealing with such a failure is to retry the same operation at a future interval. This adds extra boilerplate code and could get tedious if there were too many different API calls across applications.
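Such retry logic might be factored into a small helper like the following sketch (illustrative, not the book's code), which retries a by-name operation with an exponentially growing delay between attempts:

```scala
import scala.util.{Failure, Success, Try}

// A minimal retry-with-backoff helper. Each failed attempt sleeps
// and then doubles the delay before retrying.
object Retry {
  @annotation.tailrec
  def retry[T](attempts: Int, delayMs: Long = 100)(op: => T): Try[T] =
    Try(op) match {
      case success @ Success(_) => success
      case Failure(_) if attempts > 1 =>
        Thread.sleep(delayMs)
        retry(attempts - 1, delayMs * 2)(op)
      case failure => failure
    }
}
```

Every cross-application call would need to be wrapped this way, which is precisely the boilerplate referred to above.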