Bioinformatics with Python Cookbook - Tiago Antao - E-Book

Description

Bioinformatics is an active research field that uses a range of simple-to-advanced computations to extract valuable information from biological data, and this book will show you how to manage these tasks using Python.
This updated third edition of the Bioinformatics with Python Cookbook begins with a quick overview of the various tools and libraries in the Python ecosystem that will help you convert, analyze, and visualize biological datasets. Next, you'll cover key techniques for next-generation sequencing, single-cell analysis, genomics, metagenomics, population genetics, phylogenetics, and proteomics with the help of real-world examples. You'll learn how to work with important pipeline systems, such as Galaxy servers and Snakemake, and understand the various modules in Python for functional and asynchronous programming. This book will also help you explore topics such as SNP discovery using statistical approaches under high-performance computing frameworks, including Dask and Spark. In addition to this, you’ll explore the application of machine learning algorithms in bioinformatics.
By the end of this bioinformatics Python book, you'll be equipped with the knowledge you need to implement the latest programming techniques and frameworks, empowering you to deal with bioinformatics data on every scale.

Formats: EPUB, MOBI

Pages: 388

Publication year: 2022




Bioinformatics with Python Cookbook Third Edition

Use modern Python libraries and applications to solve real-world computational biology problems

Tiago Antao

BIRMINGHAM—MUMBAI

Bioinformatics with Python Cookbook
Third Edition

Copyright © 2022 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Publishing Product Manager: Devika Battike

Senior Editor: David Sugarman

Content Development Editor: Joseph Sunil

Technical Editor: Rahul Limbachiya

Copy Editor: Safis Editing

Project Coordinator: Farheen Fathima

Proofreader: Safis Editing

Indexer: Pratik Shirodkar

Production Designer: Shankar Kalbhor

Marketing Coordinator: Priyanka Mhatre

First published: June 2015

Second edition: November 2018

Third edition: September 2022

Production reference: 1090922

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-80323-642-1

www.packt.com

Contributors

About the author

Tiago Antao is a bioinformatician who is currently working in the field of genomics. A former computer scientist, Tiago moved into computational biology with an MSc in bioinformatics from the Faculty of Sciences at the University of Porto, Portugal, and a PhD on the spread of drug-resistant malaria from the Liverpool School of Tropical Medicine, UK. After his doctorate, Tiago worked with human datasets at the University of Cambridge, UK, and with mosquito whole-genome sequencing data at the University of Oxford, UK, before helping to set up the bioinformatics infrastructure at the University of Montana, USA. He currently works as a data engineer in the biotechnology field in Boston, MA. He is one of the co-authors of Biopython, a major bioinformatics package written in Python.

About the reviewers

Urminder Singh is a bioinformatician, computer scientist, and developer of multiple open source bioinformatics tools. He holds degrees in physics, computer science, and computational biology, including a PhD in bioinformatics from Iowa State University, USA.

His diverse research interests include novel gene evolution, precision medicine, sociogenomics, machine learning in medicine, and developing tools and algorithms for big heterogeneous data. You can visit him online at urmi-21.github.io.

Tiffany Ho works as a bioinformatics associate at Embark Veterinary. She holds a BSc in genetics and genomics from the University of California, Davis, and an MPS in plant breeding and genetics from Cornell University.

Table of Contents

Preface

1

Python and the Surrounding Software Ecology

Installing the required basic software with Anaconda

Getting ready

How to do it...

There’s more...

Installing the required software with Docker

Getting ready

How to do it...

See also

Interfacing with R via rpy2

Getting ready

How to do it...

There’s more...

See also

Performing R magic with Jupyter

Getting ready

How to do it...

There’s more...

See also

2

Getting to Know NumPy, pandas, Arrow, and Matplotlib

Using pandas to process vaccine-adverse events

Getting ready

How to do it...

There’s more...

See also

Dealing with the pitfalls of joining pandas DataFrames

Getting ready

How to do it...

There’s more...

Reducing the memory usage of pandas DataFrames

Getting ready

How to do it…

See also

Accelerating pandas processing with Apache Arrow

Getting ready

How to do it...

There’s more...

Understanding NumPy as the engine behind Python data science and bioinformatics

Getting ready

How to do it…

See also

Introducing Matplotlib for chart generation

Getting ready

How to do it...

There’s more...

See also

3

Next-Generation Sequencing

Accessing GenBank and moving around NCBI databases

Getting ready

How to do it...

There’s more...

See also

Performing basic sequence analysis

Getting ready

How to do it...

There’s more...

See also

Working with modern sequence formats

Getting ready

How to do it...

There’s more...

See also

Working with alignment data

Getting ready

How to do it...

There’s more...

See also

Extracting data from VCF files

Getting ready

How to do it...

There’s more...

See also

Studying genome accessibility and filtering SNP data

Getting ready

How to do it...

There’s more...

See also

Processing NGS data with HTSeq

Getting ready

How to do it...

There’s more...

4

Advanced NGS Data Processing

Preparing a dataset for analysis

Getting ready

How to do it…

Using Mendelian error information for quality control

How to do it…

There’s more…

Exploring the data with standard statistics

How to do it…

There’s more…

Finding genomic features from sequencing annotations

How to do it…

There’s more…

Doing metagenomics with QIIME 2 Python API

Getting ready

How to do it...

There’s more...

5

Working with Genomes

Technical requirements

Working with high-quality reference genomes

Getting ready

How to do it...

There’s more...

See also

Dealing with low-quality genome references

Getting ready

How to do it...

There’s more...

See also

Traversing genome annotations

Getting ready

How to do it...

There’s more...

See also

Extracting genes from a reference using annotations

Getting ready

How to do it...

There’s more...

See also

Finding orthologues with the Ensembl REST API

Getting ready

How to do it...

There’s more...

Retrieving gene ontology information from Ensembl

Getting ready

How to do it...

There’s more...

See also

6

Population Genetics

Managing datasets with PLINK

Getting ready

How to do it...

There’s more...

See also

Using sgkit for population genetics analysis with xarray

Getting ready

How to do it...

There’s more...

Exploring a dataset with sgkit

Getting ready

How to do it...

There’s more...

See also

Analyzing population structure

Getting ready

How to do it...

See also

Performing a PCA

Getting ready

How to do it...

There’s more...

See also

Investigating population structure with admixture

Getting ready

How to do it...

There’s more...

7

Phylogenetics

Preparing a dataset for phylogenetic analysis

Getting ready

How to do it...

There’s more...

See also

Aligning genetic and genomic data

Getting ready

How to do it...

Comparing sequences

Getting ready

How to do it...

There’s more...

Reconstructing phylogenetic trees

Getting ready

How to do it...

There’s more...

Playing recursively with trees

Getting ready

How to do it...

There’s more...

Visualizing phylogenetic data

Getting ready

How to do it...

There’s more...

8

Using the Protein Data Bank

Finding a protein in multiple databases

Getting ready

How to do it...

There’s more

Introducing Bio.PDB

Getting ready

How to do it...

There’s more

Extracting more information from a PDB file

Getting ready

How to do it...

Computing molecular distances on a PDB file

Getting ready

How to do it...

Performing geometric operations

Getting ready

How to do it...

There’s more

Animating with PyMOL

Getting ready

How to do it...

There’s more

Parsing mmCIF files using Biopython

Getting ready

How to do it...

There’s more

9

Bioinformatics Pipelines

Introducing Galaxy servers

Getting ready

How to do it…

There’s more

Accessing Galaxy using the API

Getting ready

How to do it…

Deploying a variant analysis pipeline with Snakemake

Getting ready

How to do it…

There’s more

Deploying a variant analysis pipeline with Nextflow

Getting ready

How to do it…

There’s more

10

Machine Learning for Bioinformatics

Introducing scikit-learn with a PCA example

Getting ready

How to do it...

There’s more...

Using clustering over PCA to classify samples

Getting ready

How to do it...

There’s more...

Exploring breast cancer traits using Decision Trees

Getting ready

How to do it...

Predicting breast cancer outcomes using Random Forests

Getting ready

How to do it…

There’s more...

11

Parallel Processing with Dask and Zarr

Reading genomics data with Zarr

Getting ready

How to do it...

There’s more...

See also

Parallel processing of data using Python multiprocessing

Getting ready

How to do it...

There’s more...

See also

Using Dask to process genomic data based on NumPy arrays

Getting ready

How to do it...

There’s more...

See also

Scheduling tasks with dask.distributed

Getting ready

How to do it...

There’s more...

See also

12

Functional Programming for Bioinformatics

Understanding pure functions

Getting ready

How to do it...

There’s more...

Understanding immutability

Getting ready

How to do it...

There’s more...

Avoiding mutability as a robust development pattern

Getting ready

How to do it...

There’s more...

Using lazy programming for pipelining

Getting ready

How to do it...

There’s more...

The limits of recursion with Python

Getting ready

How to do it...

There’s more...

A showcase of Python’s functools module

Getting ready

How to do it...

There’s more...

See also...

Index

Other Books You May Enjoy