Function and Evolution of Repeated DNA Sequences (E-Book)

Description

The genome of a living being is composed of DNA sequences with diverse origins. Beyond single-copy genes, whose product has a biological function that can be inferred by experimentation, certain DNA sequences, present in a large number of copies, escape the most refined approaches aimed at elucidating their precise role. The existence of what 20th century geneticists had already perceived (and wrongly described as "junk DNA"!) was confirmed by the sequencing of the first complex genomes, including that of Homo sapiens. A large part of what defines a living thing is not unique, but repeated, sometimes a very large number of times, increasing in complexity with successive duplications and multiplication. Understanding and defining the many functions of this myriad of repeated sequences, as well as their evolution through natural selection, has become one of the major challenges for 21st century genomics.

Page count: 704

Year of publication: 2023


Table of Contents

Cover

Table of Contents

Title Page

Copyright Page

Foreword

Introduction: About Repeated Genomes

I.1. The “C-value” paradox

I.2. Recycling junk DNA

I.3. The different repeat types

I.4. References

1 Whole-Genome Duplications, a Source of Redundancy at the Entire-Genome Scale

1.1. Prevalence of polyploids in the tree of life

1.2. Mechanisms for the appearance of whole-genome duplications

1.3. Cellular consequences of whole-genome duplications

1.4. Rediploidization: evolutionary reduction in genetic redundancy

1.5. Functions and evolution of duplicated genes

1.6. Whole-genome duplications and evolutionary diversification

1.7. Perspectives and conclusions

1.8. References

2 Segmental Duplications and CNVs: Adaptive Potential of Structural Polymorphism

2.1. The multiple facets of genetic polymorphism

2.2. From Segmental Duplications to Copy Number Variants: terminology

2.3. SDs: a general overview

2.4. Methodologies for detecting structural variation in genomes

2.5. The molecular mechanisms at the origin of structural variation

2.6. Regions rich in SDs/LCRs favor the creation of CNVs: insertions/duplications, deletions and inversions

2.7. From SDs to CNVs in humans and primates

2.8. SDs in little-studied species: general genomic profiles

2.9. SD content: impact of a duplicated environment on sequences that make up the SDs

2.10. SDs and epigenetic modifications

2.11. The adaptive potential of SDs: between the benefit of innovation and the cost of pathology

2.12. SDs and associated CNVs: their roles in species adaptation to changes in environments

2.13. Conclusion

2.14. Glossary of terms

2.15. References

3 Transposable Elements: Parasites that Shape Genome Evolution

3.1. Transposable elements in eukaryotic genomes

3.2. Classification of TEs and transposition mechanisms

3.3. TE self-regulation

3.4. TE restriction by the host

3.5. The impact of transposition events on genomes

3.6. Conclusion

3.7. References

4 Insights Into the Evolutionary Diversity of Centromeres

4.1. The centromere

4.2. Monocentromeres

4.3. Holocentromeres

4.4. Open questions

4.5. Acknowledgments

4.6. References

5 Evolution and Functions of Telomeres

5.1. Primary structure of telomeres

5.2. A telomere specific higher order structure: the T-loop

5.3. Telomere lengthening mechanisms

5.4. Telomere length homeostasis

5.5. Telomeres and genome organization and function

5.6. Cell senescence, aging and disease

5.7. Conclusion

5.8. Acknowledgments

5.9. References

6 G-quadruplexes: Structure, Detection and Functions

6.1. From guanine-guanine base-pairing to a secondary structure

6.2. The G4 structure: variations on a theme

6.3. Finding G-quadruplexes in a genome

6.4. Biological roles of G-quadruplexes

6.5. Perspective: G-quadruplexes as anticancer therapeutic targets

6.6. References

7 Satellite DNA, Microsatellites and Minisatellites

7.1. Satellite DNAs, origin and definition

7.2. From semantics to biology

7.3. The evolutionary mechanisms of tandem repeats

7.4. Microsatellites in human diseases

7.5. De novo formation and evolution of tandem repeats

7.6. Perspectives

7.7. Acknowledgments

7.8. References

8 CRISPR-Cas: An Adaptive Immune System

8.1. A brief history of the discovery of CRISPR-Cas systems

8.2. General characteristics of CRISPR-Cas systems

8.3. Evolution of CRISPR-Cas systems

8.4. An adaptive immune system

8.5. Phage escape mechanisms

8.6. Biological cost of CRISPR-Cas systems

8.7. Importance in nature: impact of ecological factors

8.8. Conclusions and perspectives

8.9. References

List of Authors

Index

End User License Agreement

List of Tables

Chapter 2

Table 2.1. Summary of SD detection methods

Table 2.2. Information on the SDs extracted from 12 genomes

Table 2.3. SDs in humans – Gene, modification, phenotype-disease, mechanism an...

Table 2.4. CNVs and examples of traits affected during domestication

Chapter 3

Table 3.1. Examples of “transposopathies” for which TE insertion is associated...

Table 3.2. Some notable or iconic examples of domestication of different TE en...

Table 3.3. Examples of natural transposition events, selected by humans

Table 3.4. Examples of molecular biology tools developed from TE

Chapter 6

Table 6.1. Biophysical and biochemical methods to study G-quadruplexes (for ad...

Table 6.2. Putative G-quadruplex sequences in 12 genomes (for a recent and det...

Table 6.3. Resolved G-quadruplex structures in gene promoters

List of Illustrations

Introduction

Figure I.1. Example of Cot curve

Figure I.2. Comparison of genome sizes and gene numbers

Figure I.3. The different types of repeated DNA sequences

Chapter 1

Figure 1.1. Whole-genome duplications identified in the eukaryotic phylogeneti...

Figure 1.2. Tetraploidization by endoreplication. In the case of a normal cell...

Figure 1.3. Allopolyploidizations in the lineage of wheat (Triticum aestivum)....

Figure 1.4. Restoration of meiosis in polyploids. In polyploid species, homolo...

Figure 1.5. Organization of homeologous regions in the genome of rainbow trout...

Figure 1.6. Homeologous regions with double-conserved synteny in teleost fish ...

Figure 1.7. Ancestral and delayed rediploidization. In the most classical case...

Figure 1.8. Outcome of duplicated genes after a polyploidization event. During...

Figure 1.9. Example of ohnologous genes with divergent territories of expressi...

Figure 1.10. Preferential retention mechanisms of ohnolog copies of genes. The...

Figure 1.11. Disentangling of a gene regulatory block during rediploidization....

Chapter 2

Figure 2.1. General view of mutations affecting eukaryotic genomes, their impa...

Figure 2.2. The human Y chromosome, its major duplications, and the alteration...

Figure 2.3. Representation of interchromosomal (center) and intrachromosomal (...

Figure 2.4. Distribution of duplications on human Y chromosomes compared to a ...

Figure 2.5. Usual fates of a duplicated gene: (i) conservation; (ii) subfuncti...

Figure 2.6. Rearrangements involving SDs that are directly (duplication/deleti...

Chapter 3

Figure 3.1. Proportion of TE sequences in several genomes. While the proportio...

Figure 3.2. Classification of autonomous TEs in eukaryotes. This classificatio...

Figure 3.3. Main mechanisms controlling the expression and mobility of TEs. Ev...

Figure 3.4. Examples of integration sites targeted by TEs. Chromosome features...

Figure 3.5. Recognition and repression of TEs by the KRAB-ZFP complex. The KRA...

Figure 3.6. Biogenesis of piRNAs derived from uni-strand piRNA clusters. Uni-s...

Figure 3.7. Biogenesis of piRNAs derived from dual-strand piRNA clusters in Dr...

Figure 3.8. Mutagenesis induced by the insertion of a TE in a gene. Shown at t...

Figure 3.9. Functional consequences of the insertion of a TE in the regulatory...

Figure 3.10. Consequences of recombination between TEs of the same family (NAH...

Figure 3.11. Complex interactions between TEs and their host. TE activity is r...

Chapter 4

Figure 4.1. (A) Illustrations of salamander cells with chromosomes (monocentri...

Figure 4.2. Phylogeny of several fungal organisms along with their respective ...

Figure 4.3. Schematics of models for different holocentric architectures of C....

Chapter 5

Figure 5.1. On top, canonical nucleotide sequence of vertebrate telomeres. The...

Figure 5.2. Emergence and evolution of linear chromosomes (simplified from Vil...

Figure 5.3. The nucleoprotein structure of human telomeres. The telomere-speci...

Figure 5.4. Mechanisms of telomere replication

Figure 5.5. The end-replication problem at telomeres is a leading mechanism pr...

Figure 5.6. Telomere length shortens with age and exaggerated shortening is as...

Chapter 6

Figure 6.1. From guanines to the G-quadruplex structure. From left to right: a...

Figure 6.2. Strand orientation and G-quadruplex topologies. The glycosidic bon...

Figure 6.3. Parallel-stranded DNA G-quadruplex model reconstruction. The UCSF ...

Figure 6.4. Examples of loop conformations from published G-quadruplex structu...

Figure 6.5. Targeting G-quadruplexes. The model reconstruction of a G-quadrupl...

Chapter 7

Figure 7.1. Length distribution of the different tandem repeats. The abscissa ...

Figure 7.2. Number of citations in the PubMed database for different tandem re...

Figure 7.3. Microsatellite expansion disorders. On the left: diseases are clas...

Figure 7.4. Replication slippage model. During DNA synthesis in a tandemly rep...

Figure 7.5. Model of slippage during homologous recombination

Figure 7.6. Mechanisms of microsatellite formation by mutation. An unrepeated ...

Figure 7.7. Mechanisms of microsatellite formation by end-joining. A double-st...

Figure 7.8. Mechanism of formation of a minisatellite. A slippage between two ...

Figure 7.9. Mechanisms of evolution of mini- and megasatellites. (A) The dupli...

Figure 7.10. The limits of microsatellite detection algorithms. Depending on t...

Chapter 8

Figure 8.1. (A) Representation of the CRISPR genomic sequence with its success...

Figure 8.2. Evolution of the CRISPR array following the infection of a sensiti...

Figure 8.3. Conservation of the secondary structure of repeats of the same gro...

Figure 8.4. (A) Genetic organization of class 1 and class 2 systems. (B) Modul...

Figure 8.5. The three stages of the immune response. Adaptation corresponds to...

Figure 8.6. Diversity of molecular mechanisms of type I, III and II systems du...

SCIENCES

Biology, Field Director – Marie-Christine Maurel

Genetics, Epigenetics, Subject Head – Bernard Dujon

Function and Evolution of Repeated DNA Sequences

Coordinated by

Guy-Franck Richard

First published 2023 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:

ISTE Ltd
27-37 St George’s Road
London SW19 4EU
UK

www.iste.co.uk

John Wiley & Sons, Inc.
111 River Street
Hoboken, NJ 07030
USA

www.wiley.com

© ISTE Ltd 2023

The rights of Guy-Franck Richard to be identified as the author of this work have been asserted by him in accordance with the Copyright, Designs and Patents Act 1988.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s), contributor(s) or editor(s) and do not necessarily reflect the views of ISTE Group.

Library of Congress Control Number: 2022949377

British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library
ISBN 978-1-78945-119-1

ERC code:
LS2 Genetics, ‘Omics’, Bioinformatics and Systems Biology
LS2_5 Epigenetics and gene regulation

Foreword

Our modern societies are too preoccupied with immediate performance to conceive of a world where the costs and efforts to achieve a result are not rationally minimized. And yet, life offers this image as soon as we take time to study it closely. The remarkable adaptation of different organisms to their living conditions revolves around genomes which are far from the products of what we may consider to be rational engineering. There is no such thing as a minimal genome: all of them are too large in comparison to the number of genes considered necessary to produce the organism hosting them. They are often far too large: ours appears to be 50 times too large. They all contain identical (or almost identical) repeated sequences, sometimes repeated numerous times in the same genome, whereas random combinations of the four nucleotides would make this phenomenon extremely unlikely, if not practically impossible.

This situation was observed as far back as the mid-20th century, long before the emergence of genomics, through the study of the renaturation kinetics of DNA molecules. The excessive amount of DNA and the abundance of repeated sequences remained a puzzle that some tended to quickly dismiss by referring to junk DNA, continuing to focus their studies only on what they already knew! Genome sequencing would come to solve this enigma by demonstrating just how incomplete our prior knowledge was. A new vision of genome organization and function is now provided, in which temporal dynamics combine with the present, because all genomes are simply imperfect copies of the genomes that preceded them and not new constructs. From now on, traces of the past mix with present events and together they lay the foundations of the future.

As the present work illustrates so remarkably, repeats found in genomes can result from major evolutionary accidents such as whole-genome duplications, which, in a singular phenomenon, tend to coincide with the transitions between major geological eras. But they may also come from repeated interactions with infectious elements – of the viral type – that eventually integrate into chromosomes and are transmitted to the offspring. Thus, there are both endogenous and exogenous causes for the existence of repeated sequences in genomes. In turn, repeats can form the basis of the formation of chromosome functional elements such as centromeres, telomeres and guanine quadruplexes. Copy number variation of long repeated sequences can play a critical role in phenotypes and in organism adaptation. Similarly, the instability of short-sequence repeats allows us to easily differentiate between individuals from the same population. However, this can sometimes lead to very serious syndromes. Finally, the different mobile elements, kinds of specialized molecular machines, present in various numbers in the different genomes cannot be ignored. By the mid-20th century, they had already been identified by their genetic effects – they are mutagenic – but we now have a much broader view of their diversity and of the consequences, sometimes considerable, of their activity.

If our knowledge of repeated genome sequences has only progressed belatedly, this is partly due to technical difficulties encountered in their sequencing. Until the recent advent of new technologies allowing longer reads, it was very difficult to correctly assemble repeated sequences and many so-called whole-genome sequences were in fact incomplete. For example, about 8% of the human genome, made up of highly repetitive sequences, remained unknown for two decades, until the application of special technologies this year. Similarly, the study of copy number variations due to segmental duplications, which have long been underestimated, is only just beginning. And let us not forget that our exploration of the living world is not only far from complete but is very much biased in favor of the already well-known groups of organisms. We can therefore expect new discoveries, or even surprises, in the study of this part of genomes, overlooked for too long, which demonstrates just how real long-term success differs from the illusion of immediate performance.

Gif-sur-Yvette
Bernard DUJON
Professor Emeritus at Sorbonne Université and the Institut Pasteur
Member of the Institut de France (Académie des Sciences)

May 2023

Introduction: About Repeated Genomes

Guy-Franck RICHARD

Instabilités naturelles & synthétiques des génomes, Institut Pasteur, CNRS UMR3525, Paris, France

Genome [ˈdʒiːnəʊm] nm Biol.: Set of hereditary characteristics of a living being, of which a small part is composed of genes providing a function to the organism, and the majority is composed of repeated sequences for which it is unknown whether they have a function.

Taking matters a little further, this could be a modern definition of the word “genome”, in the light of the knowledge garnered across three decades from sequencing the DNA content of living beings, in particular that of eukaryotic organisms, whose genomes are more complex than those of their bacterial and archaebacterial ancestors. Biologists were already aware back in the 1960s, long before the invention of the first DNA sequencing methods, that the content of genomes was difficult to comprehend. Denaturation-renaturation experiments showed that the rate at which the double helix renatures depends on DNA concentration: the more copies of a sequence are present, the faster each strand finds its complement. The Cot value is the product of the initial DNA concentration (C0) and the elapsed time (t); the half-renaturation value, Cot1/2, is the Cot at which half of the genomic DNA has renatured under controlled conditions. Each organism could then be characterized by the Cot1/2 value of its genome. In trying to establish the Cot values of genomes of the simplest organisms – phages or bacteria – or of more complex organisms, such as vertebrates, it transpired that the latter contained three types of sequences presenting very different Cot values (see Figure I.1).

Figure I.1. Example of Cot curve

It is thus possible to show that the mouse genome, for example, is composed of 70% unique sequences with slow renaturation, 20% moderately repeated sequences present in 1,000 to 100,000 copies per genome and 10% highly repeated sequences representing at least 1 million copies per genome and showing rapid renaturation (Britten and Kohne 1968). This approach, based on the physicochemical properties of DNA, slightly underestimated the quantity of repeated sequences because their renaturation rate depends on the identity between these sequences, divergent sequences (such as long terminal repeats (LTRs)) renaturing more slowly than identical sequences. Nowadays, Cot curves are still sometimes used to separate the highly repetitive fraction of a genome from its unique fraction in order to sequence specific DNA of either fraction (Peterson et al. 2008).
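The shape of such a curve can be sketched numerically. The short Python example below assumes ideal second-order renaturation kinetics, in which the fraction of a component still single-stranded is 1 / (1 + Cot / Cot1/2), and sums three components weighted by the mouse proportions quoted above. The Cot1/2 values assigned to each component are hypothetical, chosen only to separate the three transitions; they are not measurements from Britten and Kohne (1968).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    fraction: float  # share of the genome (mouse proportions from the text)
    cot_half: float  # HYPOTHETICAL half-renaturation value, in mol·s/L


# Fractions come from the text; the cot_half values are illustrative only.
MOUSE = (
    Component("highly repeated", 0.10, 1e-3),
    Component("moderately repeated", 0.20, 1e0),
    Component("unique", 0.70, 1e3),
)


def single_stranded_fraction(cot, components=MOUSE):
    """Fraction of DNA still single-stranded at a given Cot value.

    Each component is assumed to follow ideal second-order kinetics:
    C / C0 = 1 / (1 + Cot / Cot_half).
    """
    return sum(c.fraction / (1.0 + cot / c.cot_half) for c in components)


if __name__ == "__main__":
    # Sample the curve on a logarithmic Cot axis, as Cot curves are usually drawn.
    for exponent in range(-5, 6):
        cot = 10.0 ** exponent
        remaining = single_stranded_fraction(cot)
        print(f"Cot = 1e{exponent:+d}  single-stranded fraction = {remaining:.3f}")
```

Plotted against log(Cot), the output falls in three steps, one per component, which is how the three classes of sequences show up on an experimental curve.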

I.1. The “C-value” paradox

From the moment it was proven that DNA was the support of heredity, and theoretically contained all the genes necessary for the development of a living being, it seemed logical that the most sophisticated organisms had to contain more genes and therefore more DNA in their genome (the “C value”) to encode these genes. This idea was to be questioned in the 1950s with the discovery that the nuclei of certain amphibians and fish contained 20 times more DNA than the nuclei of mammals. Given that the latter presented a greater developmental complexity, this appeared very much paradoxical, and was even used as an argument by the opponents of DNA being the sole support of heredity (Thomas 1971). This “C-value paradox” could finally be explained only decades later, when the first genomes were sequenced. It is now known that the number of genes in an organism has little to do with its size or level of complexity. The baker’s yeast genome contains about 6,000 genes, that of fruit flies about 14,000, and the human genome (like those of its very close cousins, the great apes) makes do with about 20,000 genes, with which it manages a very sophisticated level of developmental and behavioral complexity. But what about the paramecium with its 40,000 genes, twice as many as the human genome? Or Trichomonas vaginalis, a parasite of the genital tract, with its 60,000 genes? Or indeed wheat and its 124,000 genes, more than six times as many as we have? Clearly, this so-called complexity could not be measured by the number of genes in an organism. Studies of comparative genomics1 have shown that this high number of genes in certain organisms does in fact conceal ancestral events of partial or total genome duplication, followed by variable amounts of gene loss (Wolfe and Shields 1997; Jaillon et al. 2004). These events contribute substantially to genetic redundancy; their identification and their underlying mechanisms will be addressed in Chapter 1.

If the complexity of an organism has nothing to do with the number of genes contained, the same is true of the amount of DNA. The human genome, with just over three times as many genes as brewer’s yeast, contains 200 times more DNA. The genome of a rotifer – a small animal measuring just a few millimeters that lives in freshwaters – contains three times more genes than the human genome in 12 times less DNA! (see Figure I.2).

The genomic sequences of all these organisms showed that some of them had evolved a very compact genome, with high gene density, while others contained a multitude of repeated DNA sequences whose function did not appear obvious at first glance, and which some authors did not hesitate to call “junk DNA” (Ohno 1972).
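The contrast between gene number and DNA content can be made concrete with a little arithmetic. The Python sketch below uses only figures quoted in this introduction: about 6,000 genes and 12.5 million nucleotides for yeast, and about 20,000 genes for humans. The human genome size is not a measured value here; it is derived from the statement that the human genome contains 200 times more DNA than that of yeast, so the resulting densities are rough orders of magnitude.

```python
# Figures quoted in the text (rounded values).
YEAST_GENES = 6_000
YEAST_GENOME_MB = 12.5          # 12.5 million nucleotides
HUMAN_GENES = 20_000

# ASSUMPTION: derived from the text's "200 times more DNA" statement,
# not taken from a genome assembly.
HUMAN_GENOME_MB = YEAST_GENOME_MB * 200


def genes_per_mb(genes, genome_mb):
    """Average number of genes per million nucleotides."""
    return genes / genome_mb


yeast_density = genes_per_mb(YEAST_GENES, YEAST_GENOME_MB)
human_density = genes_per_mb(HUMAN_GENES, HUMAN_GENOME_MB)

print(f"Gene ratio human/yeast:  {HUMAN_GENES / YEAST_GENES:.1f}")       # 3.3
print(f"Yeast density: {yeast_density:.0f} genes per Mb")                # 480
print(f"Human density: {human_density:.0f} genes per Mb")                # 8
print(f"Density ratio yeast/human: {yeast_density / human_density:.0f}") # 60
```

With these rounded inputs, yeast packs about 60 times more genes per unit of DNA than the human genome does, which is the gap that repeated sequences largely fill.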

Figure I.2. Comparison of genome sizes and gene numbers

I.2. Recycling junk DNA

About 2% of the human genome is translated into proteins. Even by adding the untranslated genes (rRNA, tRNA, siRNA, snRNA, etc.), the percentage of “useful” DNA barely increases. So, what is the purpose of the 98% of DNA in our genome that has, apparently, no function? One conceivable answer is that it has none. The consortium led by Jeff Boeke, professor of genetics at Johns Hopkins University in Baltimore, set out to create the first synthetic yeast genome, using synthetic oligonucleotides. The brewer’s yeast Saccharomyces cerevisiae is a eukaryotic organism whose genome contains 12.5 million nucleotides distributed across 16 chromosomes. The synthetic chromosomes were reconstructed one by one from 70-nucleotide-long sequences assembled in blocks of 750 base pairs, themselves assembled in mega-blocks of 2–4 kb, which were introduced one after the other, in a hierarchical manner, into the yeast genome to replace the natural sequences (Muller and Koszul 2015). When designing synthetic chromosomes, it was decided that all repeated sequences would be removed from the genome. All tRNA-encoding DNAs were grouped on a single circular chromosome, specifically built to carry them. Retrotransposons, microsatellites, minisatellites and other repeated elements inessential to life were removed from the new sequence. These synthetic chromosomes, with their junk DNA removed, are perfectly able to sustain life in the yeast cells containing them, without any apparent phenotypic defect, at least under laboratory growth conditions (Dymond et al. 2011; Annaluru et al. 2014). One might conclude from the results of this project that junk DNA is useless. However, that would be a mistake.

The human reference genome contains about 443,000 residual elements of past retroviral invasions, covering 8.3% of the total sequence (International Human Genome Sequencing Consortium 2001). These retroviral scars are the remains of successive invasions, occurring over the past hundred million years, of our mammalian ancestors by exogenous elements, which left the trace of their passage in the form of LTR2. These retroviral remains are therefore part of our junk DNA. Nevertheless, as we will see, their presence in our genome testifies to their distant but indispensable role in the existence of our lineage. Therian mammals, that is, those possessing a uterus within which the fertilized egg develops, are classified into two groups. Eutherians (or placentals) like humans and mice have a very elaborate placenta connecting the wall of the uterus to the embryo and allowing it to develop in complete safety throughout the entire gestation period. Marsupials (kangaroos and koalas) do not have placentas and the development of their young takes place mainly outside the uterus. Genome sequencing showed that the two human genes specifically expressed in the placenta, syncytin-1 and syncytin-2, were derived from a gene encoding an ancestral viral protein, which infected the primate lineage 25–40 million years ago. Remarkably, the genome of the mouse, another placental mammal, also contains two viral genes having the same function as human genes but deriving from a slightly more recent viral infection than that of the human lineage. Thus, the placenta was invented twice, independently, in two lineages of mammals, by capture of genes of retroviral origin (Dupressoir et al. 2009). Another example is even more striking. Sexual reproduction was invented at the origin of the eukaryotic world. From the first primitive eukaryotic cells, a syngamy3 system was developed that allowed the nuclei of two haploid cells to fuse to give birth to a diploid cell. 
The protein responsible for the fusion of male and female gametes is the same in plants and animals; it is the product of the HAP2 gene. This protein is of viral origin and allows the envelope of a virus to fuse with the plasma membrane of its host’s cells (Fédry et al. 2017). Thus, a gene essential to sexual reproduction was captured from a virus by the genome of the very first eukaryotic cells about 1.5 billion years ago.

Other examples exist in which a piece of a transposable element was captured, creating a new gene and a new function. Junk DNA is therefore regularly recycled during the course of evolution to bring diversity and novelty. As François Jacob (1977) said more than 40 years ago, evolution “tinkers”: it makes new from old, reusing bits of genes, cutting them, splicing them and fusing them with others in order to create novelty. What appears today to the geneticists of the 21st century as junk DNA perhaps served in the past – or will serve in the future – to create diversity. The tremendous success of the eukaryotic world in invading all ecological niches under all climates and latitudes stems in part from the extraordinary flexibility of its genome and its ability to accumulate genetic elements that are seemingly useless but will be recycled in the long run to create novelty and enable the appearance of new living species.

I.3. The different repeat types

There are often several ways to classify genetic elements. Some authors have chosen to distinguish dispersed repeated elements from tandem repeats, the latter being repeated at least twice in a row at the same genetic locus, unlike the former, which are repeated at different loci (Richard et al. 2008). But some dispersed repeats are so numerous in genomes that they appear to be tandemly repeated. This is the case for Alu sequences in humans, which are frequently found grouped in introns or intergenic sequences. Repeated sequences of exogenous origin, that is to say originating from an organism other than the cell in which they are observed, could also be distinguished from repeated sequences of endogenous origin, manufactured by the cell in which they are observed. Transposable elements would belong to the first category, having invaded the genomes of eukaryotic (or prokaryotic) lineages, while the different satellite DNAs would belong to the second, being manufactured by molecular processes specific to the genomes that contain them. But other problems then arise. It is known, for example, that Alu elements, inactive retrotransposons that can be mobilized in trans by the machinery of other retroelements, are of endogenous origin. They result in fact from the duplication of the non-coding 7SL RNA, which is involved in the synthesis of secreted proteins. This duplication, prior to mammalian radiation, resulted in the fusion of two monomers of 130 nucleotides derived from 7SL RNA, separated by a short adenine-rich region (Ullu and Tschudi 1984). Achieving a coherent classification of the repeated sequences therefore proves a complicated task, particularly in the genomes of evolved plants and animals, within which they are plethoric, both in structure and number.
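The working definition of a tandem repeat given above (a unit repeated at least twice in a row at the same locus) translates directly into a pattern search. The Python sketch below is a deliberately naive illustration: the function name `find_tandem_repeats`, its parameters and the example sequence are hypothetical, and it only finds perfect repeats of short units, whereas real repeats often carry mismatches that dedicated detection tools must tolerate.

```python
import re


def find_tandem_repeats(seq, max_unit=6, min_copies=2):
    """Yield (start, unit, copies) for perfect tandem repeats in a DNA string.

    A unit of 1..max_unit nucleotides must occur at least min_copies times
    in a row. The lazy quantifier makes the regex try the shortest unit
    first, so ATATATAT is reported as AT x 4 rather than ATAT x 2.
    """
    if min_copies < 2:
        raise ValueError("a tandem repeat needs at least two copies")
    # Group 1 captures the unit; \1{n,} requires n further adjacent copies.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        yield match.start(), unit, len(match.group(0)) // len(unit)


if __name__ == "__main__":
    # Hypothetical sequence containing four adjacent copies of CAG.
    example = "ACGTCAGCAGCAGCAGTACG"
    for start, unit, copies in find_tandem_repeats(example, min_copies=3):
        print(f"position {start}: ({unit}) x {copies}")
    # prints: position 4: (CAG) x 4
```

Dispersed repeats cannot be found this way, since their copies are not adjacent; detecting them requires comparing a sequence against the rest of the genome or against a library of known elements.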

We have therefore tried in the rest of this work to present the repeat elements in relation to their role (proven or assumed) in genomes, rather than according to their structure or their assumed origin (see Figure I.3).

Figure I.3. The different types of repeated DNA sequences

After exploring total or partial genome duplications in Chapter 1, the duplications of large DNA segments, sometimes in multiple copies in tandem or dispersed within genomes, will be described in Chapter 2. These contribute significantly to the level of genetic redundancy and gene duplication, and their study, although essential to understand the dynamics of complex genomes and the heritability of certain traits, is still in its infancy. Transposons and retrotransposons will be presented in Chapter 3, and their role in the generation of genetic novelties will be detailed. In most species, centromeres are present at a rate of one per chromosome. These very particular repeated elements are essential for the proper segregation of sister chromatids during cell divisions. They will be studied in Chapter 4 and, as we will see, holocentric organisms depart from this rule by exhibiting several tens of centromeres per chromosome. Telomeres are highly repeated sequences found at the ends of chromosomes to prevent loss of genetic information. Their sequence and structure vary greatly from organism to organism, with some species having developed highly original telomeres that are made up of tandemly repeated elements. These concepts will be studied in Chapter 5. G-quadruplexes, secondary DNA structures formed by regularly spaced runs of guanines, are present in all eukaryotic genomes. Their distribution and their role in DNA transcription and replication will be discussed in Chapter 6. The different types of satellite DNA found in large numbers in eukaryotes, and whose precise function is not always clear, will be described in Chapter 7. As we will see, although prokaryotic genomes contain only a few of them, some bacteria use them as camouflage to escape their host’s immune system. 
Remaining in the world of prokaryotes, we will end in Chapter 8 with the fascinating study of another bacterial defense mechanism, directed against these other enemies that are plasmids and bacteriophages: the CRISPR-Cas system. The acquisition of small, tandemly repeated pieces of DNA from invaders foreign to the cell provides eubacteria and archaea with a robust line of defense. And it offers 21st-century geneticists myriad tools to manipulate their preferred genomes at their own convenience.

I.4. References

Annaluru, N., Muller, H., Mitchell, L.A., Ramalingam, S., Stracquadanio, G., Richardson, S.M., Dymond, J.S., Kuang, Z., Scheifele, L.Z., Cooper, E.M. et al. (2014). Total synthesis of a functional designer eukaryotic chromosome. Science, 344(6179), 55–58.

Britten, R.J. and Kohne, D.E. (1968). Repeated sequences in DNA. Science, 161, 529–540.

Dupressoir, A., Vernochet, C., Bawa, O., Harper, F., Pierron, G., Opolon, P., Heidmann, T. (2009). Syncytin-A knockout mice demonstrate the critical role in placentation of a fusogenic, endogenous retrovirus-derived, envelope gene. Proceedings of the National Academy of Sciences, 106(29), 12127–12132.

Dymond, J.S., Richardson, S.M., Coombes, C.E., Babatz, T., Muller, H., Annaluru, N., Blake, W.J., Schwerzmann, J.W., Dai, J., Lindstrom, D.L. et al. (2011). Synthetic chromosome arms function in yeast and generate phenotypic diversity by design. Nature, 477(7365), 471–476.

Fédry, J., Liu, Y., Péhau-Arnaudet, G., Pei, J., Li, W., Tortorici, M.A., Traincard, F., Meola, A., Bricogne, G., Grishin, N.V. et al. (2017). The ancient gamete fusogen HAP2 is a eukaryotic class II fusion protein. Cell, 168(5), 904–915.e10.

International Human Genome Sequencing Consortium (2001). Initial sequencing and analysis of the human genome. Nature, 409, 860–921.

Jacob, F. (1977). Evolution and tinkering. Science, 196(4295), 1161–1166.

Jaillon, O., Aury, J.-M., Brunet, F., Petit, J.-L., Stange-Thomann, N., Mauceli, E., Bouneau, L., Fischer, C., Ozouf-Costaz, C., Bernot, A. et al. (2004). Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature, 431(7011), 946–957.

Muller, H. and Koszul, R. (2015). Conception et synthèse de néochromosomes. Médecine thérapeutique/Médecine de la reproduction, gynécologie et endocrinologie, 17(4), 228–236.

Ohno, S. (1972). So much “junk” DNA in our genome. Evolution of Genetic Systems, 23, 366–370.

Peterson, D.G., Schulze, S.R., Sciara, E.B., Lee, S.A., Bowers, J.E., Nagel, A., Jiang, N., Tibbitts, D.C., Wessler, S.R., Paterson, A.H. (2008). Integration of Cot analysis, DNA cloning, and high-throughput sequencing facilitates genome characterization and gene discovery. Genome Research, 12, 795–807.

Richard, G.-F., Kerrest, A., Dujon, B. (2008). Comparative genomics and molecular dynamics of DNA repeats in eukaryotes. Microbiol. Mol. Biol. Rev., 72(4), 686–727.

Thomas Jr., C.A. (1971). The genetic organization of chromosomes. Annu. Rev. Genet., 5, 237–256.

Ullu, E. and Tschudi, C. (1984). Alu sequences are processed 7SL RNA genes. Nature, 312, 171–172.

Wolfe, K.H. and Shields, D.C. (1997). Molecular evidence for an ancient duplication of the entire yeast genome. Nature, 387, 708–713.

Notes

1. Comparative genomics is the field of genomics that focuses on the comparison of entire genomes with each other and not just of genes. Analysis tools have been specifically developed to compare the organization, structure, synteny (gene order along a chromosome) of genomes, considered as objects to be studied as a whole.

2. LTR (long terminal repeat): repeated sequences typically found at the insertion sites of retroviruses.

3. Syngamy: nuclear fusion of two cells of opposite mating type.

1 Whole-Genome Duplications, a Source of Redundancy at the Entire-Genome Scale

Elise PAREY1 and Camille BERTHELOT1,2

1Institut de biologie de l’École Normale Supérieure, CNRS UMR8197, INSERM U1024, Paris, France

2Génomique fonctionnelle comparative, Institut Pasteur, CNRS UMR3525, Paris, France

Whole-genome duplications, or polyploidizations, are high-impact mutational events, generating a copy of the entirety of a cell’s chromosomes. Although genomes are made up of numerous locally repeated sequences (transposable elements, satellite DNA, duplicated genes, etc.), whole-genome duplications introduce a redundancy of the entire genome. Along with expansions of transposable elements, whole-genome duplications are the major contributors to the evolution of genome size.

This chapter presents the main examples of polyploids, the mechanisms at the origin of their formation and their consequences for genome functioning and organism evolution. Although this chapter focuses on polyploid organisms, we will also discuss the case of polyploid cells and cell populations within haploid or diploid individuals, and their implications in the normal and pathological physiology of organisms.

Box 1.1. Glossary of terms

Glossary

Orthologs, paralogs and ohnologs: these terms correspond to the evolutionary origins of two genes (or sequences) observed in one or several different species. Orthologous genes are descended from a common ancestor by speciation (“same gene” in different species), whereas paralogs are descended from a duplication of the gene in an ancestor. Ohnologs are a subclass of paralogs and correspond specifically to paralogs derived from polyploidization.

Homeologs: duplicated chromosomes derived from polyploidization, as opposed to homologs, which correspond to pairs of chromosomes in a diploid genome. By extension, this term can also refer to duplicated chromosomal regions within a polyploid genome.

Synteny: the order and organization of genes, and more generally of sequences, along chromosomes. Synteny is said to be “conserved” when the gene order is maintained during the course of evolution. It can be disrupted by genomic rearrangements and by sequence deletions and insertions.

Paleopolyploids and neopolyploids: paleopolyploid species are descended from an ancient whole-genome duplication event. Their genome bears the traces of this duplication but no longer behaves like a polyploid genome. Neopolyploid species are recent polyploids whose genome is a clear assortment of two or more subgenomes, which have particular behaviors owing to their polyploid state.
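The distinction between orthologs, paralogs and ohnologs depends only on the type of evolutionary event at which two gene lineages diverged. The short Python sketch below illustrates this rule. The `Event` enumeration and the `classify_homologs` function are hypothetical names introduced for illustration only, and the sketch assumes that the divergence event has already been inferred, for example from a reconciled gene tree.

```python
from enum import Enum


class Event(Enum):
    """Type of event at the node where two gene lineages diverged (illustrative)."""
    SPECIATION = "speciation"
    SMALL_SCALE_DUPLICATION = "small-scale duplication"
    WHOLE_GENOME_DUPLICATION = "whole-genome duplication"


def classify_homologs(divergence_event: Event) -> str:
    """Name the relationship between two homologous genes.

    Ohnologs are a subclass of paralogs, so they are reported as such.
    """
    if divergence_event is Event.SPECIATION:
        return "orthologs"
    if divergence_event is Event.WHOLE_GENOME_DUPLICATION:
        return "paralogs (ohnologs)"
    return "paralogs"


if __name__ == "__main__":
    for event in Event:
        print(f"{event.value}: {classify_homologs(event)}")
```

In practice, the difficult step is inferring the event itself, which is where synteny and homeologous regions become useful.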

1.1. Prevalence of polyploids in the tree of life

1.1.1. Whole duplications in eukaryotes

Whole-genome duplications have profoundly marked the evolutionary history of eukaryotic genomes, and it is in this group that they are best documented (see Figure 1.1). We distinguish here between species descended from polyploid ancestors, known as paleopolyploids, and extant polyploids, known as neopolyploids. The “model” whole-duplication events, that is, those most studied in eukaryotes, are the duplication in the yeast Saccharomyces cerevisiae; many examples in plants such as maize, wheat and rockcress; the two duplications at the stem of the phylogeny of vertebrates, known as 1R and 2R; and lastly, the duplication in the ancestor of teleost fish, to which several model organisms such as zebrafish and medaka belong. The most common polyploidizations in eukaryotes are triploidizations, at the origin of genomes with three sets of chromosomes (3n), tetraploidizations (4n) and hexaploidizations (6n). Here, we provide a quick, non-exhaustive overview of well-documented polyploidization events in major eukaryotic phyla.

Figure 1.1. Whole-genome duplications identified in the eukaryotic phylogenetic tree. Autopolyploidization events are shown in blue, allopolyploidizations in red and duplications with unidentified parental origins in black. Only paleopolyploidies are represented: neopolyploidies (<5 Mya, such as those of wheat) are not indicated on the tree for legibility reasons. The geological time scale is represented below the tree, highlighting how numerous polyploidization events in plants correspond to the Cretaceous-Tertiary crisis (K-Pg boundary). The majority of events presented correspond to the data cataloged by Van de Peer et al. (2017) and Clark and Donoghue (2018), updated to reflect the current literature.

Yeasts: baker’s yeast S. cerevisiae was the first eukaryotic species to have its genome sequenced (Goffeau et al. 1996). It was through analysis of this sequence that its paleopolyploid origin was formally demonstrated for the first time (Wolfe and Shields 1997; Dujon et al. 2004; Kellis et al. 2004): sets of duplicated genes were observed along the genome, making it possible to link pairs of ancestrally duplicated chromosomes, today known as “homeologs”. This signature is typical of paleopolyploid genomes. Subsequent sequencing of other yeast species enabled this duplication to be identified as an evolutionary event common to all yeasts of the genus Saccharomyces.
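The signature described above, in which sets of duplicated genes link pairs of chromosomes, can be illustrated with a minimal Python sketch. It is a toy example, not the method used in the cited studies: the gene names, the chromosome assignments and the `min_pairs` threshold are all hypothetical, and a real analysis would also check that gene order is conserved within the linked regions.

```python
from collections import Counter


def count_chromosome_links(gene_to_chrom, paralog_pairs):
    """Count the paralog pairs that link each unordered pair of distinct chromosomes."""
    links = Counter()
    for gene_a, gene_b in paralog_pairs:
        chrom_a, chrom_b = gene_to_chrom[gene_a], gene_to_chrom[gene_b]
        if chrom_a != chrom_b:
            links[tuple(sorted((chrom_a, chrom_b)))] += 1
    return links


def candidate_homeologs(links, min_pairs=3):
    """Keep chromosome pairs supported by at least min_pairs paralog pairs.

    The threshold is an arbitrary illustrative value.
    """
    return sorted(pair for pair, n in links.items() if n >= min_pairs)


# Hypothetical toy data: three duplicated genes shared by chrI and chrII,
# and a single isolated duplicate between chrI and chrIII.
gene_to_chrom = {
    "a1": "chrI", "a2": "chrI", "a3": "chrI",
    "b1": "chrII", "b2": "chrII", "b3": "chrII",
    "c1": "chrIII",
}
paralog_pairs = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a1", "c1")]

links = count_chromosome_links(gene_to_chrom, paralog_pairs)
print(candidate_homeologs(links))
```

With these toy data, only the chrI and chrII pair passes the threshold, while the single duplicate shared with chrIII is treated as an isolated duplication.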

Plants: while yeast duplication was the first to be strictly demonstrated, polyploids had already been studied in plants for over 50 years. The geneticist Stebbins’ founding work, based on the comparison of karyotypes of different species, had already laid the first foundations for polyploidy studies (Stebbins 1947; Soltis et al. 2014). The first confirmation of a genome duplication in plants came in 2000 with the sequencing of the model species Arabidopsis thaliana (Kaul et al. 2000), whose genome proved to be that of an ancient tetraploid. Since then, it has become known that whole-genome duplications have been recurrent during the course of plant evolution (see Figure 1.1): 20% of extant angiosperm species (flowering plants) are recent polyploids (Barker et al. 2016), and the evolutionary group of angiosperms is descended from at least two successive paleopolyploidization events.

Many neopolyploids are described in plants, particularly in cultivated species such as wheat (hexaploid) (The International Wheat Genome Sequencing Consortium 2014) or cotton (tetraploid cultivars coexisting with diploid cultivars) (Li et al. 2014). The most recent publications estimate that at least 250 whole-genome duplications occurred in plants (Leebens-Mack et al. 2019), and that all plants are paleopolyploids descended from at least one ancient genome duplication (Nieto Feliner et al. 2020).

Vertebrates: the hypothesis of total duplications in the vertebrate ancestral lineage originated in the work of Susumu Ohno in the 1960s and 1970s. After observing the existence of many families of duplicated genes distributed across chromosomes in vertebrate genomes, Ohno (1970) proposed the classically accepted hypothesis known as the “2R hypothesis”, that is, the presence of two whole-genome duplications in the ancestor of the vertebrate lineage, denoted 1R and 2R. This hypothesis was then reinforced by the study of clusters of Hox genes, which exist as a single cluster in invertebrates, versus four in vertebrates (Popovici et al. 2001), then by studying multigene families on the whole genomes of several model vertebrates (Dehal and Boore 2005) as well as by reconstructions of ancestral genomes (Sacerdot et al. 2018). Similarly, the seven clusters of Hox genes present in the teleost fish genome have made it possible to propose a third duplication specific to this evolutionary group (known as “3R”) (Amores et al. 1998), confirmed with the sequencing of the Tetraodon genome (Jaillon et al. 2004). In general, whole duplications in vertebrates are ancient and therefore still poorly characterized compared with events occurring in plants and yeasts. Several duplications took place subsequent to events 1R and 2R, for example, in clawed frogs (Xenopus), including the well-documented duplication of the model species Xenopus laevis (Session et al. 2016), as well as in non-teleost fish, notably sturgeon (Du et al. 2020). Additional duplications are also cataloged in teleosts (after 3R), including one in carp (Chen et al. 2019; Xu et al. 2019) and one in salmonids (Berthelot et al. 2014; Lien et al. 2016). With no whole-genome duplication being cataloged in warm-blooded vertebrate species, some authors have proposed that polyploids are not stable or viable in these groups (Wertheim et al. 2013).

Paramecia: in addition to yeast, paramecia represent another unicellular model of interest having undergone several whole duplications of their genome. Three duplications were highlighted by sequencing of the genome of Paramecium tetraurelia (Aury et al. 2006). The most recent duplication is shared by 15 morphologically very similar species (the P. aurelia complex). The notable genomic feature of paramecia is the separation of the somatic genome (MAC: macronucleus) from the germline genome (MIC: micronucleus). Only the MIC is subject to meiosis and is transmitted to the descendant cell and can thus be the transmitter of whole-genome duplications; the MAC is systematically reformed from the MIC during the paramecium lifecycle. In addition to the genomic redundancy provided by total duplications, the MACs of paramecia contain a second form of redundancy: they correspond to a rearranged and extremely polyploid version of the MIC, which can contain up to 800 identical copies of the chromosomes. The implications of this nuclear dimorphism on the ability to tolerate total duplications, as well as their consequences, remain poorly understood in the absence of other documented examples in eukaryotes.

Invertebrates: polyploidizations have been relatively little studied, and therefore more rarely documented, in animal species outside the vertebrate phylum. Three whole-duplication events have been suggested in the horseshoe-crab lineage (Kenny et al. 2016), as well as tetraploidization in the ancestor of scorpions and spiders (Schwager et al. 2017). It should be noted, however, that uncertainties regarding the phylogeny of these species and the still low number of species sequenced do not permit accurate dating of these events (Ballesteros and Sharma 2019; Nong et al. 2021). In addition, in hexapods, transcriptomic data suggest several events of massive gene duplications (Li et al. 2018). Nevertheless, in the absence of genomes sequenced to reveal the signatures of homeologous regions characteristic of whole-genome duplications, the origin of these gene copies is debated (Roelofs et al. 2020). Lastly, in mollusks, a whole-duplication event in the lineage of the giant African snail has recently been demonstrated through the sequencing of its genome (Liu et al. 2021).

Many authors have questioned the overrepresentation of genome duplications in plants compared to animals. The hypothesis most commonly put forward proposes that the cellular disruptions involved in total duplication, to which we will return later, would not be tolerated given the developmental constraints of certain groups of animals, particularly mammals. In contrast, polyploidizations would be a common mechanism favored by natural selection in plant evolution. Some authors have also proposed that the large number of polyploids in plants may reflect a “ratchet” effect, in which genome duplications would be well tolerated but irreversible. Unable to be lost once acquired, the polyploidies would then be fated to progressive accumulation during the course of evolution in this phylum (Meyers and Levin 2006).

1.1.2. Polyploidies in prokaryotic organisms

The existence of polyploid organisms is now also well documented in bacteria and archaea. Many examples have been described, notably in the majority of halophilic and methanogenic Euryarchaeota clades, in Proteobacteria, Cyanobacteria, Deinococci and Gram-positive bacteria. In these groups of unicellular organisms, the ploidy of the organisms can reach beyond 50 copies of the genome per cell (50n) (Markov and Kaznacheev 2016), and even hundreds of thousands in the case of Epulopiscium spp., a group of Firmicutes known to be symbiotic with certain fishes (Mendell et al. 2008). This ploidy is rarely stable and is often higher when the cells are in the growth phase compared to the stationary phase. Thus, wide variations in ploidy can be observed within a species or even within a population.

The evolutionary origin and dating of these polyploidization events are often difficult to determine in the absence of exhaustive genomic and phylogenetic studies to trace these events. The functional and evolutionary consequences of genome duplications are also less studied in these groups, although several avenues have been proposed. While the consequences linked to genome redundancy, which we will discuss later, present points in common with eukaryotes, it can nevertheless be noted that other advantages have been proposed in prokaryotes, without any real link to the genetic information itself. For example, certain species such as Haloferax volcanii, a halophilic archaeon, use the duplication of their genome as a phosphate storage mechanism, independently of any genetic consideration (Zerulla et al. 2014). In Epulopiscium spp., the very high ploidy leads to giant cells that escape predation by ciliates and can migrate into the digestive tract of their host (Mendell et al. 2008). As these consequences are unrelated to the presence of duplicated genomic sequences and the information that they carry, we will not discuss them any further in this chapter, but we invite readers to read Soppa’s (2014) review for further examples. It should be noted, however, that prokaryotic polyploidies have been proposed as one of the original mechanisms involved in the evolution of mitosis, meiosis and sexual reproduction at the time of the appearance of eukaryotes (Markov and Kaznacheev 2016).

1.1.3. Polyploid cells in normal and pathological physiology

Genome duplications do not always concern entire organisms, and can also occur in the somatic tissues of diploid multicellular individuals where these cells can form polyploid clonal lineages, sometimes called endopolyploids. This phenomenon of polyploidization can be normal and constitutive, for example in certain cells in the liver or the placenta in humans (Ganem and Pellman 2007), in the intestine in Drosophila (Fox et al. 2010), or in many tissues in angiosperm plants, where endopolyploidy is frequent (D’Amato 1984). In the most extreme cases, endopolyploid cells can contain several hundred thousand copies of the genome, as in certain neural cells of Aplysia (Edgar et al. 2014). Endopolyploidy can also appear in a normal physiological context in response to stress factors or the environment, as is the case with wound healing in humans (Scholes and Paige 2015; Yant and Bomblies 2015).

However, it is in the context of tumorigenesis that polyploid cells are most often described in usually diploid organisms. It is estimated that more than 30% of human tumors contain cells with a duplicated genome and that endopolyploidy is part of the common processes of carcinogenesis, with karyotype rearrangements (Bielski et al. 2018). Thus, the presence of polyploid cells forms part of the classical progression markers in many types of cancers, including breast and cervical cancers.

1.2. Mechanisms for the appearance of whole-genome duplications

1.2.1. Non-separation of chromosomes after replication

Whether concerning polyploid individuals or somatic lineages, the appearance of whole-genome duplications involves similar mechanisms leading to the appearance of a first polyploid cell either during mitosis or meiosis. The main mechanism of polyploid cell formation occurs by successive replications of DNA content without cell division, through a mechanism called endoreplication (Edgar et al. 2014; Yant and Bomblies 2015) (see Figure 1.2). Endoreplication is evolutionarily well-conserved from plants to animals. It is reflected in a variation of the classical cell cycle: the G1 → S → G2 → M → G1 cycle is altered either by a truncated division phase (incomplete mitosis, without cytokinesis: endomitosis) or by a reduction to G → S → G (without mitosis: endocycle). Programmed endoreplication of healthy polyploid cells is triggered by developmental or environmental cues, and is often related to stress response mechanisms, particularly in plants (Scholes and Paige 2015). These cues result in inhibition of the entry into division as well as activation of the cycle progression and exit signals (see the review by Fox and Duronio (2013) for details of the molecular mechanisms that may be involved). Constitutive endopolyploidy thus leads to the formation of specialized, differentiated cells, which lose their ability to divide.

Figure 1.2. Tetraploidization by endoreplication. In the case of a normal cell cycle, the genome is replicated in the S phase, then the sister chromatids separate during mitosis in the M phase. During an endoreplication cycle, chromatids can separate without entering the M phase (endocycle, in red), or the cell can initiate the mitosis phase but not complete it until separation of the daughter cells, producing a single tetraploid cell (endomitosis, in yellow). The mechanism is essentially identical for endoreplications during meiosis. If the chromatids do not separate, this mechanism can give rise to polytene chromosomes.
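The bookkeeping behind these cycle variants can be sketched in a few lines of Python. This is a deliberately simplified illustration that only tracks the number of genome copies per cell and the number of cells. The function name `run_cycle` and the mode labels are hypothetical, and the sketch ignores all molecular regulation.

```python
def run_cycle(copies_per_cell: int, n_cells: int, mode: str) -> tuple[int, int]:
    """Apply one cycle and return (genome copies per cell, number of cells).

    mode 'mitotic':     G1, S, G2, M, with separation of two daughter cells
    mode 'endomitosis': M is initiated but the daughter cells never separate
    mode 'endocycle':   G, S, G, with the M phase skipped entirely
    """
    replicated = copies_per_cell * 2  # the S phase doubles the DNA content
    if mode == "mitotic":
        # The replicated genome is shared between two daughter cells.
        return replicated // 2, n_cells * 2
    if mode in ("endomitosis", "endocycle"):
        # No cell division: the single cell keeps the doubled content.
        return replicated, n_cells
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    state = (2, 1)  # one diploid cell (2n)
    for mode in ("mitotic", "endocycle", "endocycle"):
        state = run_cycle(*state, mode)
        print(f"{mode}: {state[0]}n per cell, {state[1]} cell(s)")
```

Under this simplification, endomitosis and the endocycle give the same counts. They differ in whether mitosis is initiated, not in the resulting DNA content.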

In the pathological case of cancer cells, endoreplication is aberrantly triggered. When these polyploid cells regain their ability to proliferate, their unstable genome and mitoses make them particularly prone to the accumulation of mutations, a characteristic that facilitates the progression of cancers (Yant and Bomblies 2015).

In the particular case of meiosis, endoreplication leads to unreduced gametes, most frequently diploid. The frequency of unreduced meioses is variable according to eukaryotic clades and increased in hybrid species: it is estimated to be 0.73% in humans (Egozcue et al. 2002) and 0.56% on average in non-hybrid plants compared with 27.52% in hybrids (Ramsey and Schemske 1998). It is also well documented that unreduced gametes increase in response to environmental conditions, notably changes in temperature, across many species groups among plants, fish and amphibians.

1.2.2. Autopolyploidization, a perfect genome redundancy

Individuals carrying whole-genome duplications occur mostly following the fusion of unreduced gametes. In the example of a diploid species, the fusion of two unreduced 2n gametes results in tetraploidization: the transition to a 4n state. More rarely, polyploid formation can also occur after somatic doubling during the early stages of development. Autopolyploidy results from the fusion of genomes of individuals of the same species, in contrast to allopolyploidy (developed in the following section), where the parental genomes come from different species (Stebbins 1947). In plants, autopolyploids can result from the duplication of the genome of the same individual by self-fertilization, in which case genetic redundancy can be total.
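The arithmetic of gamete fusion can be made explicit with a short Python sketch. The ploidy of the zygote is simply the sum of the chromosome sets carried by the two gametes. The second function goes beyond what is stated in this chapter: it assumes, purely for illustration, that both parents produce unreduced gametes at the same rate and that gametes fuse independently and at random, ignoring any difference in viability. The function names are hypothetical, and the rates used are the plant averages quoted in the previous section.

```python
def zygote_ploidy(sets_a: int, sets_b: int) -> int:
    """Number of chromosome sets in the zygote: the sum of both gametes' sets."""
    return sets_a + sets_b


def fusion_frequencies(p_unreduced: float) -> dict[int, float]:
    """Expected zygote ploidies for a diploid species.

    Illustrative assumption: independent random fusion of reduced (n) and
    unreduced (2n) gametes, with the same unreduced rate in both parents.
    """
    p_reduced = 1.0 - p_unreduced
    return {
        zygote_ploidy(1, 1): p_reduced ** 2,               # diploid (2n)
        zygote_ploidy(1, 2): 2 * p_reduced * p_unreduced,  # triploid (3n)
        zygote_ploidy(2, 2): p_unreduced ** 2,             # tetraploid (4n)
    }


if __name__ == "__main__":
    rates = {"non-hybrid plants": 0.0056, "hybrid plants": 0.2752}
    for label, rate in rates.items():
        freqs = fusion_frequencies(rate)
        print(label, {f"{ploidy}n": round(f, 6) for ploidy, f in freqs.items()})
```

Even under these crude assumptions, the sketch shows why the much higher rate of unreduced gametes in hybrids matters: the expected frequency of tetraploid zygotes grows with the square of that rate.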

As a result, autopolyploid genomes initially contain almost-identical genome copies: the only differences correspond to the polymorphisms carried by the individual(s) at the origin of the polyploidization event. Many examples of natural neo-autopolyploids are known in plants, such as the common Biscutella (Parisod et al. 2010) or Dactylis glomerata (cat grass) (Lumaret et al. 1989). Several paleopolyploids are also suggested to be of autopolyploid origin: the most recent duplication of the banana genome (so-called “alpha” duplication) (D’Hont et al. 2012), that of the soybean genome (Schmutz et al. 2010), or that of the poplar genome (Garsmeur et al. 2014). In vertebrates, the parental origins of the majority of whole-genome duplications are not known, but two events are generally accepted as probable cases of autopolyploidization: the tetraploidization at the origin of the salmonid lineage (Berthelot et al. 2014; Lien et al. 2016) and that in the sturgeon lineage (Du et al. 2020).

1.2.3. Allopolyploidization, an overlapping of genomes of similar species

Allopolyploidy can result, similarly to autopolyploidy, from the fusion of unreduced gametes but originating from individuals of two genetically similar species. Nevertheless, in the case of allopolyploidy, duplication generally occurs after an initial hybridization stage between the two species leading to the emergence of a polyploid population. Indeed, meioses of hybrids are often unstable, with a higher probability of forming unreduced gametes (Ramsey and Schemske 1998).

Figure 1.3. Allopolyploidizations in the lineage of wheat (Triticum aestivum). Common wheat is a hexaploid species (2n = 42) resulting from two recent allopolyploidization events, dated to 820 ka and 430 ka ago. Wheat is descended from the hybridization of still-existing species: einkorn wheat (Triticum urartu), Aegilops speltoides, and a species of wild goatgrass (Aegilops tauschii). These species each have a haploid karyotype with seven chromosomes, which are preserved in the current wheat genome in the form of subgenomes classically annotated A, B and D. The karyotypes shown are the haploid karyotypes.

In comparison with autopolyploids, allopolyploids contain two copies of more divergent genomes, where the different alleles can present an assortment of more varied polymorphisms. Depending on the distance between the parental species, the two subgenomes may have been subject to different evolutionary dynamics for a relatively long time and accumulated differences in terms of transposable element repertoires, genomic composition (GC, codon usage) and a divergence in coding and non-coding sequences. Indeed, it is the observation of these divergences between the two subgenomes that makes it possible to classify a polyploidization event as corresponding to an auto- or allopolyploidy. Again, many examples exist of allopolyploidizations in plants, such as the ancient maize genome duplication (Garsmeur et al. 2014), and more recent polyploidizations in wheat (The International Wheat Genome Sequencing Consortium 2014) and cotton lineages (Li et al. 2015). In certain cases, the parental genomes that have hybridized and duplicated are known, as is the case for common wheat, a hexaploid species derived from two successive allopolyploidization events between still-extant wild cereal species (see Figure 1.3). In vertebrates, the best-documented allopolyploidies are the duplication of the genome of carp, of which at least one of the progenitors is derived from the barb clade, the other remaining unidentified (Xu et al. 2019), as well as that of the Xenopus laevis genome, derived from the hybridization of two ancestral Xenopus species, approximately 17 Mya (Session et al. 2016). Lastly, whole-genome duplication of the yeast S. cerevisiae is also an allopolyploidy (Marcet-Houben and Gabaldón 2015).
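The chromosome count of common wheat follows directly from its three subgenomes, as given in the caption of Figure 1.3. The Python sketch below spells out the calculation. The dictionary and function names are hypothetical and chosen only for this illustration.

```python
# Haploid chromosome number contributed by each progenitor subgenome,
# as stated in the caption of Figure 1.3.
WHEAT_SUBGENOMES = {"A": 7, "B": 7, "D": 7}


def somatic_chromosome_number(subgenomes: dict[str, int]) -> int:
    """Each subgenome is present in two copies in a somatic cell of the allopolyploid."""
    return 2 * sum(subgenomes.values())


if __name__ == "__main__":
    n_sets = 2 * len(WHEAT_SUBGENOMES)  # six chromosome sets: hexaploid
    total = somatic_chromosome_number(WHEAT_SUBGENOMES)
    print(f"{n_sets} chromosome sets, 2n = {total}")
```

Three subgenomes of seven chromosomes, each present twice, give the 2n = 42 quoted for hexaploid wheat.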

The relative frequency of auto- and allopolyploidies in eukaryotes is poorly known. Indeed, for many ancient events, the ancestral species are not known and the question of their origin has not been resolved. In plants, allopolyploidies were initially thought to constitute the majority. Nevertheless, it would seem that this observation is the result of an overrepresentation bias with respect to allopolyploids in published studies, whereas the two mechanisms are in fact now considered to occur in equal measure (Barker et al. 2016).

1.3. Cellular consequences of whole-genome duplications

1.3.1. Disruption of cell and nucleus organization

At the cellular level, one of the most obvious and universal consequences of genome duplications is the increase in cell size. Polyploid cells are typically larger than those of normal ploidy, and this is observed across all species groups from prokaryotes to vertebrates, as well as in normal or pathological polyploid cell lineages (Yant and Bomblies 2015). The increase in the amount of DNA hosted logically leads to an increase in the nuclear volume (Cavalier-Smith 1978), as well as an increase in the duration of the cell cycle, where the replication phase in particular is lengthened (Mable 2001). In plants, it has been proposed that the consequences of this nucleotypic effect reach as far as the developmental duration and generation time of polyploid and paleopolyploid species (Levin 1983), although the effect is less documented in other groups of the tree of life. In general, neopolyploid species are larger in adulthood than their diploid parents, with an allometric effect established between genome size, cell size and the size of the individual (Gregory et al. 2000; Otto 2007).

The doubling of genetic material also disrupts the structural organization of the nucleus: the presence of additional copies of the genome requires a reorganization of the nuclear territories. In cotton, where the existence of diploid and tetraploid cultivars allows detailed comparisons of the nuclear organization, one study notably showed that tetraploidization was accompanied by modifications in the topological arrangement of the chromosomes in the nucleus and by an inversion of open and closed genomic compartments (Wang et al. 2018). These modifications also extend to higher resolution topological structures, with a reorganization of certain chromatin domains (TADs1), whose structure is important for the regulation of gene expression. Also in cotton, new TADs appeared after polyploidization, preferentially in the open-chromatin regions. These TADs maintain the homeologous regions in a spatially closed environment, suggesting that these regions could remain co-regulated (Wang et al. 2018). In Arabidopsis, whose genome is not organized into well-defined TADs, similar modifications have nevertheless been observed with artificial induction of tetraploidization, with an increase in nuclear contacts between interchromosomal regions and a restriction of short-range interactions (Zhang et al. 2019).

Given that the techniques allowing detailed investigation of the three-dimensional organization of the nucleus are relatively recent, studies that analyze the effects of genome duplications are currently few in number and are focused on the phylum of plants, where the numerous neopolyploids allow a comparison between parental and polyploid genomes. Thus, the effects on nuclear organization in the other clades remain poorly known.

1.3.2. Modifications in the expression of genes and transposons

It is now well documented that genome duplications are accompanied by profound and immediate changes in gene expression (Adams and Wendel 2005). These effects have been clearly demonstrated by experiments with artificial tetraploidization, particularly in Arabidopsis