111,99 €
Explore the multidisciplinary nature of complex networks through machine learning techniques Statistical and Machine Learning Approaches for Network Analysis provides an accessible framework for structurally analyzing graphs by bringing together known and novel approaches on graph classes and graph measures for classification. By providing different approaches based on experimental data, the book uniquely sets itself apart from the current literature by exploring the application of machine learning techniques to various types of complex networks. Comprised of chapters written by internationally renowned researchers in the field of interdisciplinary network theory, the book presents current and classical methods to analyze networks statistically. Methods from machine learning, data mining, and information theory are strongly emphasized throughout. Real data sets are used to showcase the discussed methods and topics, which include: * A survey of computational approaches to reconstruct and partition biological networks * An introduction to complex networks--measures, statistical properties, and models * Modeling for evolving biological networks * The structure of an evolving random bipartite graph * Density-based enumeration in structured data * Hyponym extraction employing a weighted graph kernel Statistical and Machine Learning Approaches for Network Analysis is an excellent supplemental text for graduate-level, cross-disciplinary courses in applied discrete mathematics, bioinformatics, pattern recognition, and computer science. The book is also a valuable reference for researchers and practitioners in the fields of applied discrete mathematics, machine learning, data mining, and biostatistics.
Sie lesen das E-Book in den Legimi-Apps auf:
Seitenzahl: 550
Veröffentlichungsjahr: 2012
Contents
Cover
Title Page
Copyright
Dedication
Preface
Contributors
Chapter 1: A Survey of Computational Approaches to Reconstruct and Partition Biological Networks
1.1 Introduction
1.2 Biological Networks
1.3 Genome-wide Measurements
1.4 Reconstruction of Biological Networks
1.5 Partitioning Biological Networks
1.6 Discussion
References
Chapter 2: Introduction to Complex Networks: Measures, Statistical Properties, and Models
2.1 Introduction
2.2 Representation of Networks
2.3 Classical Network
2.4 Scale-Free Network
2.5 Small-World Network
2.6 Clustered Network
2.7 Hierarchical Modularity
2.8 Network Motif
2.9 Assortativity
2.10 Reciprocity
2.11 Weighted Networks
2.12 Network Complexity
2.13 Centrality
2.14 Conclusion
References
Chapter 3: Modeling for Evolving Biological Networks
3.1 Introduction
3.2 Unified Evolving Network Model: Reproduction of Heterogeneous Connectivity, Hierarchical Modularity, and Disassortativity
3.3 Modeling Without Parameter Tuning: A Case Study of Metabolic Networks
3.4 Bipartite Relationship: A Case Study of Metabolite Distribution
3.5 Conclusion
References
Chapter 4: Modularity Configurations in Biological Networks with Embedded Dynamics
4.1 Introduction
4.2 Methods
4.3 Results
4.4 Discussion and Concluding Remarks
Acknowledgment
Supporting Information
References
Chapter 5: Influence of Statistical Estimators on the Large-Scale Causal Inference of Regulatory Networks
5.1 Introduction
5.2 Methods
5.3 Results
5.4 Conclusion and Summary
Acknowledgment
References
Chapter 6: Weighted Spectral Distribution: A Metric for Structural Analysis of Networks
6.1 Introduction
6.2 Weighted Spectral Distribution
6.3 A Simple Worked Example
6.4 The Internet Autonomous System Topology
6.5 Comparing Topology Generators
6.6 Tuning Topology Generator Parameters
6.7 Generating Topologies with Optimum Parameters
6.8 Internet Topology Evolution
6.9 Conclusions
References
Chapter 7: The Structure of an Evolving Random Bipartite Graph
7.1 Introduction
7.2 The Structure of a Sparse Bipartite Graph
7.3 Enumerating Bipartite Graphs
7.4 Asymptotic Expansion via the Saddle Point Method
7.5 Proofs of the Main Theorems
7.6 Empirical Data
7.7 Conclusion and Summary
References
Chapter 8: Graph Kernels
8.1 Introduction
8.2 Convolution Kernels
8.3 Random Walk Graph Kernels
8.4 Path-Based Graph Kernels
8.5 Tree-Pattern Graph Kernels
8.6 Cyclic Pattern Kernels
8.7 Graphlet Kernels
8.8 Optimal Assignment Kernels
8.9 Other Graph Kernels
8.10 Applications in Bio-and Cheminformatics
8.11 Summary and Conclusions
Acknowledgments
References
Chapter 9: Network-Based Information Synergy Analysis for Alzheimer Disease
9.1 Introduction
9.2 Datasets and Methods
9.3 Results
9.4 Summary and Conclusions
Acknowledgment
References
Chapter 10: Density-Based Set Enumeration in Structured Data
10.1 Introduction
10.2 Unsupervised Pattern Discovery in Structured Data
10.3 Dense Cluster Enumeration in Weighted Interaction Networks
10.4 Dense Cluster Enumeration in Higher-Order Association Data
10.5 Discussion
References
Chapter 11: Hyponym Extraction Employing a Weighted Graph Kernel
11.1 Introduction
11.2 Related Work
11.3 Drawbacks of Current Approaches
11.4 Semantic Networks Following the MultiNet Formalism
11.5 Support Vector Machines and Kernels
11.6 Architecture
11.7 Graph Kernel
11.8 Graph Kernel Extensions
11.9 Distance Weighting
11.10 Features for Hyponymy Extraction
11.11 Evaluation
11.12 Conclusion and Outlook
Acknowledgments
References
Index
Copyright © 2012 by John Wiley & Sons, Inc. All rights reserved
Published by John Wiley & Sons, Inc., Hoboken, New Jersey
Published simultaneously in Canada
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data:
ISBN: 978-0-470-19515-4
To Christina
Preface
An emerging trend in many scientific disciplines is a strong tendency toward being transformed into some form of information science. One important pathway in this transition has been via the application of network analysis. The basic methodology in this area is the representation of the structure of an object of investigation by a graph representing a relational structure. It is because of this general nature that graphs have been used in many diverse branches of science including bioinformatics, molecular and systems biology, theoretical physics, computer science, chemistry, engineering, drug discovery, and linguistics, to name just a few. An important feature of the book “Statistical and Machine Learning Approaches for Network Analysis” is to combine theoretical disciplines such as graph theory, machine learning, and statistical data analysis and, hence, to arrive at a new field to explore complex networks by using machine learning techniques in an interdisciplinary manner.
The age of network science has definitely arrived. Large-scale generation of genomic, proteomic, signaling, and metabolomic data is allowing the construction of complex networks that provide a new framework for understanding the molecular basis of physiological and pathological states. Networks and network-based methods have been used in biology to characterize genomic and genetic mechanisms as well as protein signaling. Diseases are looked upon as abnormal perturbations of critical cellular networks. Onset, progression, and intervention in complex diseases such as cancer and diabetes are analyzed today using network theory.
Once the system is represented by a network, methods of network analysis can be applied to extract useful information regarding important system properties and to investigate its structure and function. Various statistical and machine learning methods have been developed for this purpose and have already been applied to networks. The purpose of the book is to demonstrate the usefulness, feasibility, and the impact of the methods on the scientific field. The 11 chapters in this book written by internationally reputed researchers in the field of interdisciplinary network theory cover a wide range of topics and analysis methods to explore networks statistically.
The topics we are going to tackle in this book range from network inference and clustering, graph kernels to biological network analysis for complex diseases using statistical techniques. The book is intended for researchers, graduate and advanced undergraduate students in the interdisciplinary fields such as biostatistics, bioinformatics, chemistry, mathematical chemistry, systems biology, and network physics. Each chapter is comprehensively presented, accessible not only to researchers from this field but also to advanced undergraduate or graduate students.
Many colleagues, whether consciously or unconsciously, have provided us with input, help, and support before and during the preparation of the present book. In particular, we would like to thank Maria and Gheorghe Duca, Frank Emmert-Streib, Boris Furtula, Ivan Gutman, Armin Graber, Martin Grabner, D. D. Lozovanu, Alexei Levitchi, Alexander Mehler, Abbe Mowshowitz, Andrei Perjan, Ricardo de Matos Simoes, Fred Sobik, Dongxiao Zhu, and apologize to all who have not been named mistakenly. Matthias Dehmer thanks Christina Uhde for giving love and inspiration. We also thank Frank Emmert-Streib for fruitful discussions during the formation of this book.
We would also like to thank our editor Susanne Steitz-Filler from Wiley who has been always available and helpful. Last but not the least, Matthias Dehmer thanks the Austrian Science Funds (project P22029-N13) and the Standortagentur Tirol for supporting this work.
Finally, we sincerely hope that this book will serve the scientific community of network science reasonably well and inspires people to use machine learning-driven network analysis to solve interdisciplinary problems successfully.
Matthias Dehmer Subhash C. Basak
Contributors
Lipi Acharya, Department of Computer Science, University of New Orleans, New Orleans, LA, USA
Enrico Capobianco, Laboratory for Integrative Systems Medicine (LISM) IFC-CNR, Pisa (IT); Center for Computational Science, University of Miami, Miami, FL, USA
Christina Chan, Departments of Chemical Engineering and Material Sciences, Genetics Program, Computer Science and Engineering, and Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA
Ricardo de Matos Simoes, Computational Biology and Machine Learning Lab, Center for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, UK
Frank Emmert-Streib, Computational Biology and Machine Learning Lab, Center for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, UK
Damien Fay, Computer Laboratory, Systems Research Group, University of Cambridge, UK
Hirosha Geekiyanage, Genetics Program, Michigan State University, East Lansing, MI, USA
Elisabeth Georgii, Department of Information and Computer Science, Helsinki Institute for Information Technology, Aalto University School of Science and Technology, Aalto, Finland
Hamed Haddadi, Computer Laboratory, Systems Research Group, University of Cambridge, UK
Thair Judeh, Department of Computer Science, University of New Orleans, New Orleans, LA, USA
Reinhard Kutzelnigg, Math.Tec, Heumühlgasse, Wien, Vienna, Austria
Elisabetta Marras, CRS4 Bioinformatics Laboratory, Polaris Science and Technology Park, Pula, Italy
Andrew W. Moore, School of Computer Science, Carnegie Mellon University, USA
Richard Mortier, Horizon Institute, University of Nottingham, UK
Chikoo Oosawa, Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka 820-8502, Japan
Matthias Rupp, Machine Learning Group, Berlin Institute of Technology, Berlin, Germany, and, Institute of Pure and Applied Mathematics, University of California, Los Angeles, CA, USA; currently at the Institute of Pharmaceutical Sciences, ETH Zurich, Zurich, Switzerland.
Kazuhiro Takemoto, Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka 820-8502, Japan; PRESTO, Japan Science and Technology Agency, Kawaguchi, Saitama 332-0012, Japan
Andrew G. Thomason, Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, UK
Antonella Travaglione, CRS4 Bioinformatics Laboratory, Polaris Science and Technology Park, Pula, Italy
Koji Tsuda, Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology AIST, Tokyo, Japan
Steve Uhlig, School of Electronic Engineering and Computer Science, Queen Mary University of London, UK
Tim vor der Brück, Department of Computer Science, Text Technology Lab, Johann Wolfgang Goethe University, Frankfurt, Germany
Xuewei Wang, Department of Chemical Engineering and Material Sciences, Michigan State University, East Lansing, MI, USA
Dongxiao Zhu, Department of Computer Science, University of New Orleans; Research Institute for Children, Children's Hospital; Tulane Cancer Center, New Orleans, LA, USA
Chapter 1
A Survey of Computational Approaches to Reconstruct and Partition Biological Networks
Lipi Acharya, Thair Judeh, Dongxiao Zhu
“Everything is deeply intertwingled”
Theodor Holm Nelson
The above quote by Theodor Holm Nelson, the pioneer of information technology, states a deep interconnectedness among the myriad topics of this world. The biological systems are no exceptions, which comprise of a complex web of biomolecular interactions and regulation processes. In particular, the field of computational systems biology aims to arrive at a theory that reveals complicated interaction patterns in the living organisms, which result in various biological phenomenon. Recognition of such patterns can provide insights into the biomolecular activities, which pose several challenges to biology and genetics. However, complexity of biological systems and often an insufficient amount of data used to capture these activities make a reliable inference of the underlying network topology as well as characterization of various patterns underlying these topologies, very difficult. As a result, two problems that have received a considerable amount of attention among researchers are (1) reverse engineering of biological networks from genome-wide measurements and (2) inference of functional units in large biological networks (Fig 1.1).
Figure 1.1 Approaches addressing two fundamental problems in computational systems biology (1) reconstruction of biological networks from two complementary forms of data resources, gene expression data and gene sets and (2) partitioning of large biological networks to extract functional units. Two classes of problems in network partitioning are graph clustering and community detection.
Rapid advances in high-throughput technologies have brought about a revolution in our understanding of biomolecular interaction mechanisms. A reliable inference of these mechanisms directly relates to the measurements used in the inference procedure. High throughput molecular profiling technologies, such as microarrays and second-generation sequencing, have enabled a systematic study of biomolecular activities by generating an enormous amount of genome-wide measurements, which continue to accumulate in numerous databases. Indeed, simultaneous profiling of expression levels of tens of thousands of genes allows for large-scale quantitative experiments. This has resulted in substantial interest among researchers in the development of novel algorithms to reliably infer the underlying network topology using gene expression data. However, gaining biological insights from large-scale gene expression data is very challenging due to the curse of dimensionality. Correspondingly, a number of computational and experimental methods have been developed to arrange genes in various groups or clusters, on the basis of certain similarity criterion. Thus, an initial characterization of large-scale gene expression data as well as conclusions derived from biological experiments result in the identification of several smaller components comprising of genes sharing similar biological properties. We refer to these components as gene sets. Availability of effective computational and experimental strategies have led to the emergence of gene sets as a completely new form of data for the reverse engineering of gene regulatory relationships. Gene set based approaches have gained more attention for their inherent ability to incorporate higher-order interaction mechanisms as opposed to individual genes.
There has been a sequence of computational efforts addressing the problem of network reconstruction from gene expression data and gene sets. Gaussian graphical models (GGMs) [1–3], probabilistic Boolean networks (PBNs) [4–7], Bayesian networks (BNs) [8, 9], differential equation based [10, 11] and mutual information networks such as relevance networks (RNs) [12, 13], ARACNE [14], CLR [15], MRNET [16] are viable approaches capitalizing on the use of gene expression data, whereas collaborative graph model (cGraph) [17], frequency method (FM) [18], and network inference from cooccurrences (NICO) [19, 20] are suitable for the reverse engineering of biological networks from gene sets.
After a biological network is reconstructed, it may be too broad or abstract of a representation for a particular biological process of interest. For example, given a specific signal transduction, only a part of the underlying network is activated as opposed to the entire network. A finer level of detail is needed. Furthermore, these parts may represent the functional units of a biological network. Thus, partitioning a biological network into different clusters or communities is of paramount importance.
Network partitioning is often associated with several challenges, which make the problem NP-hard [21]. Finding the optimal partitions of a given network is only feasible for small networks. Most algorithms heuristically attempt to find a good partitioning based on some chosen criteria. Algorithms are often suited to a specific problem domain. Two major classes of algorithms in network partitioning find their roots in computer science and sociology, respectively [22]. To avoid confusion, we will refer to the first class of algorithms as graph clustering algorithms and the second class of algorithms as community detection algorithms. For graph clustering algorithms, the relevant applications include very large-scale integration (VLSI) and distributing jobs on a parallel machine. The most famous algorithm in this domain is the Kernighan–Lin algorithm [23], which still finds use as a subroutine for various other algorithms. Other graph clustering algorithms include techniques based on spectral clustering [24]. Originally community detection algorithms focused on social networks in sociology. They now cover networks of interest to biologists, mathematicians, and physicists. Some popular community detection algorithms include Girvan–Newman algorithm [25], Newman's eigenvector method [21, 22], clique percolation algorithm [26], and Infomap [27]. Additional community detection algorithms include methods based on spin models [28, 29], mixture models [30], and label propagation [31].
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!