Transposable Elements and Genome Evolution - E-Book

Description

Since their discovery by Barbara McClintock in the mid-20th century, the importance of transposable elements in shaping the architecture, function and evolution of genomes has gradually been unveiled.

These DNA sequences populate nearly all genomes and are viewed as genomic parasites. They are mobile, capable of proliferating within genomes and also commonly travel between species.

These elements are mutagenic and are responsible for several human genetic disorders, but they also constitute a major source of genetic diversity. Some insertions have beneficial effects for the host and are selected for, giving rise to significant evolutionary innovations. Their dynamics within genomes are intricate, as are their interactions with other genome components. To limit their proliferation, the genome has evolved sophisticated defense mechanisms.

While researchers commonly use these elements as genetic tools, identifying them in newly sequenced genomes remains a challenge, due not only to their extensive diversity but also to their large copy numbers.


Pages: 517

Year of publication: 2024




Table of Contents

Cover

Table of Contents

Dedication Page

Title Page

Copyright Page

Acknowledgments

Introduction

I.1. Almost 80 years of research on transposable elements

I.2. What is exactly a transposable element?

I.3. How do transposable elements survive in the genome?

I.4. Structure, classification and origin

I.5. References

1 Transposable Elements in Eukaryotes

1.1. Introduction

1.2. Classification, structure and transposition mechanism

1.3. Abundance, diversity and distribution

1.4. Origins of transposable elements and evolutionary relationships with other genetic elements

1.5. Genomic impact

1.6. References

2 Prokaryotic Transposable Elements

2.1. Introduction

2.2. Transposases: the enzymes driving transposition

2.3. Insertion sequences

2.4. Transposons (Tn)

2.5. Conclusion

2.6. References

3 Transposable Elements and Human Diseases

3.1. The moving parts of the human genome

3.2. TE insertion and its impact on the genome and gene expression

3.3. TE involvement in human cancers

3.4. Involvement of TEs in noncancerous pathologies

3.5. The role of stress and environmental pollution in TE mobility

3.6. Conclusion

3.7. References

4 The Silencing Mechanisms Inhibiting Transposable Element Activity in Somatic and Germ Cells

4.1. Introduction

4.2. Silencing of transposable elements in somatic tissues

4.3. Silencing of transposable elements in the germline

4.4. The specific case of somatic cells surrounding the germline

4.5. Transmission of silencing through generations

4.6. Environmental stresses and their influence on TEs

4.7. Conclusion

4.8. References

5 Transposable Elements and Adaptation

5.1. Transposable elements are mobile genomic sequences

5.2. Transposable elements and insecticide resistance

5.3. Transposable elements and the immune response

5.4. Transposable elements and environmental shock response

5.5. Conclusion

5.6. Acknowledgments

5.7. References

6 Domestication (Exaptation) of Transposable Elements

6.1. Introduction

6.2. Host genes derived from transposons

6.3. TEs can disperse noncoding regulatory sequences across the genome

6.4. TEs form structural components of the genome

6.5. Summary

6.6. References

7 Horizontal Transfers and Transposable Elements

7.1. Introduction

7.2. Mechanisms and prerequisites for horizontal transfers of transposable elements

7.3. Bioinformatics methods for detecting horizontal transfer of transposable elements

7.4. Documented examples of horizontal transfers of transposable elements

7.5. The impact of horizontal transfers of transposable elements

7.6. Conclusion

7.7. References

8 Genome Invasion Dynamics

8.1. The lifecycle of transposable elements

8.2. Transposable elements as parasites of sexual reproduction

8.3. Limiting the spread

8.4. Long-term evolution

8.5. The intriguing case of asexuals

8.6. Transposable element genomics

8.7. Conclusion

8.8. References

9 The Ecology of Transposable Elements

9.1. Introduction

9.2. Cellular, population and specific dynamics

9.3. The “genome ecology” approach

9.4. Conclusion

9.5. References

10 Transposable Elements as Tools

10.1. Introduction

10.2. Development of DNA transposons as genetic tools

10.3. DNA transposons as efficient gene transfer tools applied in important model organisms

10.4. Insertional mutagenesis based on engineered transposons

10.5. Application of transposons in human gene therapy

10.6. Transposase as an excision tool

10.7. Toward specific gene targeting by fusing transposases with other nucleases

10.8. Conclusion

10.9. References

11 Genomic Characterization of Transposable Elements: Databases and Software

11.1. Introduction

11.2. Databases

11.3. Search strategies for transposable element characterization

11.4. Nature of input sequences

11.5. Population genomics of transposable elements

11.6. How to evaluate the most suitable database and TE/MGEs search strategy for your study?

11.7. Acknowledgments

11.8. References

List of Authors

Index

End User License Agreement

List of Tables

Chapter 3

Table 3.1. Examples of cancers linked to transposable elements

Table 3.2. Examples of noncancerous pathologies linked to transposable element...

Chapter 10

Table 10.1. DNA transposons and their main characteristics

Chapter 11

Table 11.1. List of representative TE databases with broad (non-restricted to ...

Table 11.2. List of representative software used for transposable elements sea...

List of Illustrations

Introduction

Figure I.1. Famous examples of TE insertions that affect phenotypes or partic...

Figure I.2. TEs identification: historical timeline. TEs have two properties: ...

Chapter 1

Figure 1.1. The two classes of transposable elements as defined by Finnegan (1...

Figure 1.2. The different transposition mechanisms of eukaryotic transposable ...

Figure 1.4. Different ways to increase copy number and to repair the excision ...

Figure 1.5. Variation of TE contents (in percentage of the genome) in major Eu...

Figure 1.6. Chromosomal rearrangements triggered by ectopic recombination betw...

Chapter 2

Figure 2.1. Homoduplex electron microscopy for transposon visualization. The c...

Figure 2.2. (A and B). The major DDE mechanisms involved in transposition. (A)...

Figure 2.3. IS and their derivatives. An IS91 family IS showing the secondary ...

Figure 2.4. IS. Direct target repeats (DR), terminal inverted repeats (IR), tr...

Figure 2.5. Common translational regulation in IS. (A) Programmed -1 translati...

Figure 2.6. IS and their derivatives: transporter IS (tIS). Direct target repe...

Figure 2.7. IS and their derivatives: multiple IR (IS1380). Direct target repe...

Figure 2.8. IS and their derivatives. Direct target repeats (DRs), terminal in...

Figure 2.9. The major transposon families: the compound transposon Tn10. The h...

Figure 2.10. Intermolecular transposition models. Replicative cointegration by...

Figure 2.11. The major transposon families: Tn3. The transposons are not to sc...

Figure 2.12. Evolution of the integron-containing Tn21 clade. (A) Ancestor Tn5...

Figure 2.13. The major transposon families: Tn7. The horizontal black bar abov...

Figure 2.14. Repeated sequences in transposon ends. The sequences, represented...

Figure 2.15. Organization of Tn7 transposition proteins. The relative position...

Figure 2.16. Tn7 transposition pathways. Tn7 infecting a naïve cell using TnsA...

Figure 2.17. The major transposon families: Tn402. The horizontal black bar ab...

Figure 2.18. Casposons. Target Site Duplication (TSD) terminal inverted repeat...

Figure 2.19. Capture and excision of integron gene cassettes. The integrase ge...

Figure 2.20. The major transposon families. The horizontal black bar above sho...

Figure 2.21. Integrative conjugative elements (ICE). (A) Tn916 encodes a tyros...

Figure 2.22. pdif. The figure shows (A) a typical recombination site composed ...

Figure 2.23. Genomic islands. Passenger genes are shown labeled 1-10, together...

Chapter 3

Figure 3.1. Proportions of TEs in the human genome and distribution in differe...

Figure 3.2. Alu element insertion and recombination. (A) Schematic structure o...

Figure 3.3. Example of an SVA element insertion at the origin of X-linked dyst...

Figure 3.4. Insertion of a LINE-1 element in the APC gene, resulting in a new ...

Figure 3.5. Example of an inserted HERV-E element at the origin of X-linked Op...

Chapter 4

Figure 4.1. Mechanisms of TE epigenetic silencing in somatic cells. Difference...

Figure 4.2. In Drosophila germ cells, piRNAs are amplified through the ping-po...

Chapter 5

Figure 5.1. Transposable elements can generate mutations through different mol...

Figure 5.2. Expression and insecticide resistance assays to understand the rol...

Figure 5.3. Insecticide-exposure and temperature-tolerance assays in Rdl wild-...

Figure 5.4. Genomic structure of the Rdl region and molecular mechanism genera...

Figure 5.5. Unraveling the role of ATCOPIA93 in immune stress response.

Figure 5.6. Tf1 insertions induced by temperature and oxidative stress in Schi...

Figure 5.7. Tf1 insertions play a role in the response to environmental shock....

Chapter 6

Figure 6.1. The various ways a transposon can be domesticated. Transposon-gene ...

Figure 6.2. The V(D)J recombination process allows recognition of a wide varie...

Chapter 7

Figure 7.1. Lifecycle of a TE family. Gray bars represent a genome, small rect...

Figure 7.2. Evidence of HTT that can be observed and searched for

Figure 7.3. Steps for determining horizontally transferred sequences by compar...

Figure 7.4. Steps in identifying HTTs using the VHICA method

Chapter 8

Figure 8.1. Illustration of the lifecycle of a TE family. The host diploid gen...

Figure 8.2. Illustration of the selfish DNA theory applied to TEs. In absence ...

Figure 8.3. Illustration of how selection and transposition can lead to a dyna...

Figure 8.4. Illustration of the reconstruction of transposition dynamics from ...

Figure 8.5. Copy number dynamics during the TE lifecycle. When a new TE arrive...

Chapter 9

Figure 9.1. Lifecycle of a family of transposable elements. See text and Figur...

Figure 9.2. Fate of a newly arrived copy in a naive genome (i.e. one that has ...

Figure 9.3. Evolutionary forces affecting copy number within a family of trans...

Figure 9.4. Evolution of the number of autonomous and non-autonomous copies, a...

Figure 9.5. Structuring of a metapopulation into interconnected subpopulations...

Figure 9.6. All of the interactions undergone by TEs in their nuclear, cellula...

Chapter 10

Figure 10.1. Structure of cut-and-paste transposons and their transposition me...

Figure 10.2. Experimental pipeline for the development of genetic tools based ...

Figure 10.3. Taxonomic distribution of Tc1/mariner and DD? D/pogo transposons....

Figure 10.4. Transposon-based vectors for functional genomics. Catalogue of tr...

Figure 10.5. Sperm mutagenesis. Left: Mice enhancer trap (ET) vector and trans...

Chapter 11

Figure 11.1. Databases for mobile genetic elements of prokaryotic origin (MGEs...

Figure 11.2. Main strategies for TE/MGE characterization for species-centered ...

Figure 11.3. Software for population-centered studies to detect and estimate t...


To my unforgettable cat muse

Aurélie Hua-Van

SCIENCES

Ecosystems and Environment, Field Directors: Françoise Gaill and Dominique Joly

Evolution, Subject Head: Dominique Joly

Transposable Elements and Genome Evolution

Coordinated by

Aurélie Hua-Van

Pierre Capy

First published 2024 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:

ISTE Ltd
27-37 St George’s Road
London SW19 4EU
UK

www.iste.co.uk

John Wiley & Sons, Inc.
111 River Street
Hoboken, NJ 07030
USA

www.wiley.com

© ISTE Ltd 2024
The rights of Cécile Tannier to be identified as the author of this work have been asserted by her in accordance with the Copyright, Designs and Patents Act 1988.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s), contributor(s) or editor(s) and do not necessarily reflect the views of ISTE Group.

Library of Congress Control Number: 2023949519

British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library
ISBN 978-1-78945-178-8

ERC code:
LS2 Genetics, ’Omics’, Bioinformatics and Systems Biology
LS2_6 Genomics (e.g. comparative genomics, functional genomics)
LS8 Ecology, Evolution and Environmental Biology
LS8_5 Evolutionary genetics

Acknowledgments

We extend our heartfelt appreciation to all of the authors for their significant contributions to this book. Additionally, we would like to express our sincere gratitude to Wolfgang Miller for providing insightful comments on specific chapters, Arnaud Le Rouzic for his invaluable assistance during the production phase, and Dominique Joly for her meticulous reviews of the chapters and unwavering support throughout the entire process.

Aurélie HUA-VAN

Pierre CAPY

Introduction

Aurélie HUA-VAN and Pierre CAPY

Évolution, Génomes, Comportement et Écologie (EGCE), CNRS, IRD, Université Paris-Saclay, Gif-sur-Yvette, France

Any biologist interested in the genetic material occupying the cells of their preferred species has been, or will one day be, confronted with transposable elements (TEs). TEs can represent a large part of the genome, although their raison d’être is not to participate in the core genetic program necessary for cell survival.

TE activity lies behind numerous major scientific discoveries in genetics and evolution, and underpins important technological advances in life science research. Our species also exploits TEs, unknowingly, in its efforts to diversify the most important resource it needs (like all heterotrophic species), namely food. More importantly, TEs are responsible for some spectacular key evolutionary innovations, but also for many diseases that affect our species. Some examples are shown in Figure I.1.

Discovery of the laws of genetics: Gregor Mendel, the father of genetics, formalized in 1865 the basic laws that govern the transmission of phenotypic characters controlled by a single gene. Rediscovered simultaneously by several authors at the beginning of the 20th century, these laws still hold true. One of Mendel’s models was the transmission of the wrinkled phenotype of pea seeds (Figure I.1). In 1990, it was found that the wrinkled phenotype is due to the insertion of a TE into a gene encoding a starch-branching enzyme (Bhattacharyya et al. 1990).

Figure I.1. Famous examples of TE insertions that affect phenotypes or participate in evolution. By crossing smooth and wrinkled peas, Mendel discovered dominance/recessivity between alleles. Strong environmental pressure triggered a rapid shift in morphotype in the peppered moth. Insertion of a TE in the promoter of a gene and subsequent rearrangement may change the color of grapes. The programmed recombination process generating the diversity of immunoglobulin chains is mediated by an enzyme originating from a TE.

Illustration of natural selection acting in real time: industrial melanism is a textbook example of rapid evolution and adaptation by natural selection. During the industrial revolution in England (mid-1800s), the peppered moth Biston betularia shifted rapidly from a predominantly light-winged morphotype to a dark one (Figure I.1). This species rests on trees such as birches during the day, and its color serves as camouflage, protecting the moth from predation by birds. The darkening of tree trunks due to pollution may have favored the expansion of the dark morphotype, as supported by the famous capture-mark-recapture experiment of Kettlewell (1955). In this species, wing color is determined by a single gene. The mutated allele responsible for wing darkening was recently characterized (Van’t Hof et al. 2016): the mutation is an insertion originating from a TE. More examples of the involvement of TEs in adaptation to environmental stress are given in Chapter 5.

Contribution to biotechnology: as soon as TEs were characterized, they proved useful to researchers as polymorphic markers, for strain typing and for genome engineering, from mutagenesis to gene therapy. Some recent uses of TEs in biotechnology and research are illustrated in Chapter 10.

Contribution to agronomic improvement: humans have practiced artificial selection since the Neolithic period and the beginning of domestication, in order to improve the quality and quantity of their supplies. Human-directed selection is still widely used today, for example in agriculture and horticulture to create new varieties. Notably, the color diversity or color variegation of cultivated flowers is often due to TE activity; TEs are also responsible for the diversity of grape colors (Figure I.1) and for blood orange varieties (Lisch 2013).

Examples of evolutionary innovations: besides the genetic diversity of food and flowers enjoyed by humans, some TE insertions were naturally selected in the past, giving rise to important genetic novelties that contributed to species success, such as adaptive immunity by V(D)J recombination in jawed vertebrates (Figure I.1). Some famous examples of this phenomenon, called molecular domestication, are illustrated in Chapter 6.

Impact on human health: in eukaryotes, TEs are generally silenced by epigenetic mechanisms. This silencing is imperfect, and low transposition activity may be observed, which can increase when epigenetic marks are altered. In humans, de novo insertions and recombination between the very numerous TE copies are sometimes involved in diseases and cancers. Numerous examples are given in Chapter 3.

I.1. Almost 80 years of research on transposable elements

The existence of mobile genetic factors in eukaryotic genomes was suspected as early as 1944, when Barbara McClintock, a maize cytogeneticist, observed strange phenomena such as phenotypic variegation and phenotypic alterations of varying degrees, sometimes with precise timing or reversibility, in plants that had undergone chromosome breakage-and-fusion cycles. At the chromosomal level, she observed rearrangements, changes in locus size and transposition of the locus to another position; she called this locus Ds (for Dissociation). She realized that all of these phenomena were due to controlling elements that could move (transpose) throughout the genome, modify the expression of nearby genes and cause chromosomal rearrangements. The system she described was bipartite: the effect of the Ds locus was only observed in the presence of the Ac (for Activator) locus, itself transposable (McClintock 1950).

Later, at the beginning of the 1970s, puzzling observations such as male recombination (which does not normally occur in Drosophila), sterility and high mutation rates were reported in Drosophila melanogaster in particular crosses between natural and laboratory strains, leading to the notion of hybrid dysgenesis. At about the same time, the first mobile insertion sequences (ISs), the TEs of bacteria, were molecularly identified in Escherichia coli. Similar mobile sequences, the P element and the I factor, were eventually found to be responsible for hybrid dysgenesis in Drosophila. It soon became apparent that numerous TEs existed in Drosophila and that some of them transposed through an RNA intermediate.

TEs were then discovered in more and more species, after the cloning of loci responsible for phenotypic changes. It became clear that most visible mutations in plants and animals were due to TE activity, in particular induced mutations (UV, X-rays, etc.) or selected ones. The noticeable effects, visible in multicellular organisms, are variegation (somatic transposition), reversibility (sometimes) and mild phenotypic effects. It later appeared that the middle repetitive fraction of eukaryotic genomes was mainly composed of TEs. (The repetitiveness of a genome can be measured by denaturing its DNA and then letting it slowly renature. Three compartments were detected: highly repeated sequences renature very quickly, whereas unique sequences take much longer to find their complementary strand.) We now know that all eukaryotic genomes carry TEs, including those of unicellular eukaryotes. For most TEs, we have deciphered their transposition mechanisms and how they are regulated. We know their mutagenic consequences and their impact on genome evolution at both the functional and structural levels. In the genomic era, a new challenge is to efficiently identify TEs in newly sequenced genomes (Chapter 11).
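The three-compartment picture obtained from renaturation experiments (the so-called Cot analysis, where Cot is DNA concentration multiplied by time) can be sketched numerically. In the idealized second-order model assumed here, a component repeated r times renatures with a rate constant r times higher than single-copy DNA. The fraction of DNA still single-stranded is therefore the sum, over components, of f/(1 + k·r·Cot). The component fractions, repetition frequencies and the rate constant `K_UNIQUE` below are hypothetical values chosen for illustration, not data from this book.

```python
"""Idealized Cot curve: fraction of genomic DNA still single-stranded."""

# Hypothetical genome composition: component -> (fraction of genome,
# repetition frequency). Illustrative values only, not measurements.
COMPONENTS = {
    "highly repeated": (0.10, 1e5),
    "middle repetitive (mostly TEs)": (0.30, 1e2),
    "unique": (0.60, 1.0),
}

# Assumed second-order rate constant for single-copy DNA, in L/(mol*s).
K_UNIQUE = 1e-3


def fraction_single_stranded(cot: float) -> float:
    """Fraction of total DNA not yet renatured at a given Cot (mol*s/L).

    Each component follows ideal second-order kinetics,
    C/C0 = 1 / (1 + k * Cot), with Cot computed on total DNA and
    k equal to K_UNIQUE scaled by the component's repetition frequency.
    """
    return sum(
        fraction / (1.0 + K_UNIQUE * repetition * cot)
        for fraction, repetition in COMPONENTS.values()
    )


def half_renaturation_cot(repetition: float) -> float:
    """Cot at which half of a component with this repetition has renatured."""
    return 1.0 / (K_UNIQUE * repetition)


for name, (fraction, repetition) in COMPONENTS.items():
    print(f"{name}: {fraction:.0%} of genome, "
          f"half renatured at Cot = {half_renaturation_cot(repetition):g}")
```

Because the half-renaturation Cot values of the three components are separated by orders of magnitude, the curve drops in three distinct steps. This is why the compartments can be distinguished experimentally.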

I.2. What is exactly a transposable element?

We can give a very minimal definition of a transposable element (TE): a DNA sequence able to move and replicate within the genome of a single cell. That is, TEs are mobile, repeated and dispersed in the genome. Although not incorrect, this definition is too narrow to capture the extraordinary properties of these sequences.

Figure I.2. TE identification: historical timeline. TEs have two properties: they are mobile and repeated. Their discovery first relied on the phenotypic changes associated with their mobility, mainly in model organisms. For the past 20 years, in the genomic era, new TEs have primarily been identified in the sequenced genomes of non-model organisms, thanks to their repeated nature. In between, the golden age of molecular biology allowed a deep understanding of their biology and their impact on the genome. The first identifications of the main groups of eukaryotic TEs are indicated.

TEs are also often described as junk DNA, parasitic elements, selfish DNA, jumping genes or mobile genetic elements. We will first try to define and distinguish these terms and determine how each of them applies to TEs.

Mobile genetic elements (MGEs): MGEs are nucleic acid sequences that are not permanently embedded in the genome of their host cell and are able to replicate independently of genome replication. TEs belong to the large group of MGEs, which also includes viruses and plasmids. This group encompasses all nonliving genetic entities that spend some or all of their time outside the genome of their host. Whereas some viruses are not integrative and are always found outside the genome, TEs have to be within the genome at some point in their transposition process. Another difference is that viruses are infectious: they can exit a cell and enter another one, from the same individual or from another individual. In theory, TEs are confined to the cell. Yet TEs can escape not only the cell, but also the individual and even the species. The boundaries with viruses also fade when we examine the origin of these entities: some viruses and some TEs share a common origin.

Jumping genes: this expression was initially used to describe short or longer DNA segments shown to change position on the chromosome in bacteria, or to move from plasmid to plasmid. This led to the discovery of ISs (bacterial TEs similar to eukaryotic Class II TIR elements) and of composite transposons (Tn), which often carry antibiotic resistance genes. These elements can form a hairpin structure because the two ends of a single strand are complementary (terminal inverted repeats (TIRs)). The link was then made with eukaryotic DNA pieces able to move, to turn off the expression of adjacent genes or to induce genomic rearrangements involved in rapid evolution. Molecular cloning of these eukaryotic jumping genes revealed that they share structural similarities (the TIRs) with ISs.
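The hairpin-forming property follows from a simple sequence relationship: the first bases of the element are the reverse complement of its last bases. A minimal Python check for this relationship is sketched below. The function names, the mismatch tolerance and the toy sequence are hypothetical. Real TIRs are often imperfect and must be found without knowing their length in advance, which this sketch does not attempt.

```python
"""Toy check for terminal inverted repeats (TIRs) at the ends of a sequence."""

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_terminal_inverted_repeats(seq: str, tir_length: int,
                                  max_mismatches: int = 0) -> bool:
    """True if the first `tir_length` bases match the reverse complement
    of the last `tir_length` bases, allowing up to `max_mismatches`.

    When this holds, a single strand can fold back on itself and pair
    its two ends, forming the hairpin mentioned in the text.
    """
    if tir_length <= 0 or 2 * tir_length > len(seq):
        return False
    left = seq[:tir_length].upper()
    right_rc = reverse_complement(seq[-tir_length:]).upper()
    mismatches = sum(a != b for a, b in zip(left, right_rc))
    return mismatches <= max_mismatches


# Hypothetical element: 7-bp TIR, internal region, inverted copy of the TIR.
tir = "GGTTAGG"
element = tir + "ACGTACGT" + reverse_complement(tir)
print(element, has_terminal_inverted_repeats(element, len(tir)))
```

A direct (non-inverted) copy of the same 7-bp motif at both ends would fail this test, because the two ends of one strand would then be identical rather than complementary.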

Selfish DNA: selfish DNA has the intrinsic property of propagating with its host (cell, individual, population or species). The notion of selfish DNA refers to the selfish gene theory proposed by Dawkins (1976). Yet the two concepts differ. The selfish gene theory is a reformulation of Darwin’s theory of natural selection from the gene’s perspective. It merely states that a fitter allele will invade a population by progressively replacing less fit alternative alleles at the same locus. According to Dawkins, life began with the emergence of chemical entities able to replicate, which he called replicators. As a replicator, an allele can only survive in the long term if it replicates more than the others. Hence, genes (alleles) are selfish because non-selfish alleles have disappeared. To secure their survival, genes cooperate to build their shared survival vehicle (an individual), capable of producing numerous next-generation vehicles that will propagate them. From this point of view, vehicles are selfish, species are selfish and, well… life is selfish.

The fate of a TE is no different: for it to be, it has to propagate. Yet it uses quite different means. TEs propagate by directly making copies of themselves within the genome (Orgel and Crick 1980). Hence, all autonomous TEs contain genes encoding proteins whose function is to copy the TE (their own vehicle), or at least to move it to a better place. At the same time, TEs also use the vehicle of the host genes to propagate in the population. Furthermore, since they are not permanently attached to the rest of the genome (they have an extrachromosomal step), they sometimes take advantage of this to escape from their vehicle (the host) and reach a new one. This strategy allows them to propagate more efficiently than host genes. In brief, through these two features (making copies of themselves and excising from the host genome), they behave more selfishly than selfish genes, and are therefore, along with viruses, the most successful genetic entities on the planet.
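Dawkins's statement that a fitter allele progressively replaces its alternatives can be made concrete with the simplest textbook model of allele replacement: haploid selection at one locus. This model is our choice of illustration and is not taken from this book. An allele with relative fitness 1 + s rises from frequency p to p(1 + s)/(1 + ps) in each generation, and the sketch below iterates this recursion. The function names and parameter values are hypothetical.

```python
"""Haploid selection at one locus: how fast a fitter allele takes over."""


def next_frequency(p: float, s: float) -> float:
    """One generation of haploid selection.

    The focal allele has relative fitness 1 + s; its alternative has fitness 1.
    """
    mean_fitness = 1.0 + p * s  # equals p*(1 + s) + (1 - p)
    return p * (1.0 + s) / mean_fitness


def generations_to_reach(p0: float, s: float, target: float) -> int:
    """Iterate the recursion until the focal allele reaches `target` frequency."""
    if not (0.0 < p0 < target < 1.0) or s <= 0.0:
        raise ValueError("need 0 < p0 < target < 1 and s > 0")
    p, generations = p0, 0
    while p < target:
        p = next_frequency(p, s)
        generations += 1
    return generations


# A 10% fitter allele starting at 1% frequency, until it reaches 99%.
print(generations_to_reach(0.01, 0.10, 0.99))
```

In this model, an allele can only spread by outcompeting the other alleles at its own locus, and its frequency can never exceed 1. A TE, by contrast, adds new copies at new loci, so its propagation is not confined to a single locus in this way.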

Genomic parasites: TEs are called genomic parasites because they populate the genome and hijack the cell machinery for their own propagation, without bringing any apparent benefit to the host. Indeed, TEs encode genes that are structurally similar to any other protein-coding genes in the genome. They contain regulatory regions and promoters recognized by the transcription machinery, and the resulting messenger RNA is translated by the cell’s ribosomes. This hijacking of host cell resources is likely somewhat simpler than what is found in some viruses, and thus less harmful. Indeed, TE activity does not cause the death or collapse of the cell that carries them. Yet TEs may have detrimental effects because they are mobile or repeated.

Junk DNA: the term junk DNA was initially coined by Ohno (1972) to designate all of the noncoding (or nontranscribed) DNA in the genome. It included the repeated fractions of genomes, now known to correspond mainly to TEs or TE-derived sequences, as well as micro- and minisatellite sequences (sequences that are tandemly repeated, unlike TEs, which are dispersed across the genome). TEs are not always noncoding, since some of them contain genes. However, these genes do not perform any obvious useful function for the cell (Doolittle and Sapienza 1980). A more accurate definition would be that junk DNA has no function for the cell: it has no effect on the phenotype and therefore no potential advantage for the host. This definition is more accurate because it excludes regions that are not transcribed but are involved in gene regulation, and thus have an obvious advantage for the host.
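The distinction between tandem and dispersed repeats can be illustrated with a toy classifier. It locates exact, non-overlapping copies of a known repeat unit and inspects the spacing between consecutive copies. Back-to-back copies indicate a tandem array, while copies separated by other sequence indicate a dispersed, TE-like repeat. This is a minimal sketch under strong assumptions: exact matches, a repeat unit known in advance, and hypothetical function names and toy sequences. Real repeat annotation must cope with diverged copies and unknown consensus sequences.

```python
"""Toy classification of a repeat as tandem or dispersed."""


def find_copies(genome: str, unit: str) -> list[int]:
    """Start positions of exact, non-overlapping copies of `unit`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    positions = []
    start = genome.find(unit)
    while start != -1:
        positions.append(start)
        start = genome.find(unit, start + len(unit))
    return positions


def classify_repeat(genome: str, unit: str) -> str:
    """Label the copies of `unit` by the spacing between consecutive copies."""
    positions = find_copies(genome, unit)
    if len(positions) < 2:
        return "not repeated"
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if all(gap == len(unit) for gap in gaps):
        return "tandem"     # copies back to back, like a microsatellite
    if all(gap > len(unit) for gap in gaps):
        return "dispersed"  # copies separated by other sequence, like TEs
    return "mixed"


te = "TTAGGCATCGA"  # hypothetical TE-like sequence
genome = "AAAA" + te + "CCCCCCCC" + te + "GGGG" + te + "AAAA"
print(classify_repeat(genome, te))                      # dispersed
print(classify_repeat("GGG" + "CA" * 6 + "TTT", "CA"))  # tandem
```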

I.3. How do transposable elements survive in the genome?

Considering that TEs are mildly harmful to the host, we may wonder why they are still present in every genome. Natural selection should act against them, and they should then disappear. Furthermore, if they have no function and no phenotypic effect, they are not expected to be subject to selection. Hence, they should freely accumulate mutations in their sequences and ultimately become inactivated. It is tempting to put forward the explanation that TEs are maintained because they can be beneficial, since they are major actors in evolution.

However, several points must be considered: (i) harm to the cell caused by insertion or ectopic rearrangement is associated with particular insertions, not with all copies of a TE family; (ii) as long as a TE family is active, it may continue to replicate, replacing copies that have been counterselected; (iii) even if a TE family accumulates mutations over time and becomes inactive, invasion by new TE families from outside remains possible, and this seems to occur relatively frequently through horizontal transfer, in bacteria of course, but also in eukaryotes (see Chapter 7); (iv) the genome has evolved sophisticated defense mechanisms to silence TEs (see Chapter 4), but not to actively get rid of TE sequences. It is therefore expected that every genome, at any point in evolutionary time, will contain at least some active families and some (or many) inactive relics, even if TEs have no beneficial impact on the host. This is indeed what we observe.
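Points (ii) and (iv) can be illustrated with a deliberately simple deterministic model of one TE family. This is our own toy model, not one proposed in this book. Each generation, every active copy produces new intact copies at rate u and becomes inactive by mutation at rate m. Every copy, active or inactive, is lost at rate d through counterselection or deletion. The class name, function names and parameter values are hypothetical. When u exceeds m + d, replication outpaces losses. When it does not, active copies dwindle while inactive relics persist for a long time, because d is assumed to be small.

```python
"""Toy deterministic model of active and inactive copies of one TE family."""
from dataclasses import dataclass


@dataclass(frozen=True)
class TEFamily:
    active: float          # copies still able to transpose
    inactive: float = 0.0  # relics inactivated by mutation


def step(family: TEFamily, u: float, m: float, d: float) -> TEFamily:
    """Advance one generation.

    u: new intact copies produced per active copy (transposition)
    m: fraction of active copies inactivated by mutation
    d: fraction of all copies lost (counterselection or deletion)
    """
    active = family.active * (1.0 + u - m - d)
    inactive = family.inactive * (1.0 - d) + family.active * m
    return TEFamily(active, inactive)


def simulate(family: TEFamily, u: float, m: float, d: float,
             generations: int) -> list[TEFamily]:
    """Return the state of the family at every generation, including the start."""
    history = [family]
    for _ in range(generations):
        family = step(family, u, m, d)
        history.append(family)
    return history


# Active family: transposition outpaces inactivation and losses.
growing = simulate(TEFamily(active=10.0), u=0.05, m=0.01, d=0.01, generations=100)
# Declining family: active copies vanish, but inactive relics remain.
fading = simulate(TEFamily(active=100.0), u=0.01, m=0.02, d=0.001, generations=2000)
print(f"{growing[-1].active:.0f} active copies after 100 generations")
print(f"{fading[-1].active:.2e} active vs {fading[-1].inactive:.1f} inactive copies")
```

Real TE dynamics are stochastic and depend on population size, regulation and the fitness effects of individual insertions. The sketch only shows how replication, inactivation and loss combine to produce the mixture of active families and inactive relics described above.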

Yet each trajectory of a TE family in a species is unique, and each species has a unique cocktail of TEs in terms of diversity and abundance. Why is there so much diversity? We still poorly understand the dynamics of TE families and the interplay of the different factors involved. An important parameter in eukaryotes is sexual reproduction, which allows TEs to spread through non-Mendelian mechanisms (Chapter 8). Within the genome, TEs live together, cooperate and compete, and interact with genes and other genomic entities. This has led to viewing the genome as an ecosystem (Chapter 9).

I.4. Structure, classification and origin

TEs have various structures that reflect their diverse origins and transposition mechanisms (see Chapters 1 and 2). They must contain genes encoding the proteins necessary for transposition, as well as specific cis sequences for recognition of the TE termini. The required biochemical activities are the basic activities of nucleic acid metabolism: polymerase, strand transferase (strand transfer), endonuclease and ligase. A polymerase is needed by all replicative TEs, and this activity is often encoded by the element itself. Strand transferase activity is present in all TEs except those that replicate directly at the integration site. Endonuclease activity is present in all TEs, being required either for excision from the donor site or for preparing the target site. DNA cleavage can often be performed by the same catalytic core that performs strand transfer (the strand transferase). Ligase is usually provided by the cell and completes the transposition process by restoring a continuous DNA molecule at the donor (excision) and target sites. Depending on the TE, additional enzymatic activities may be needed, such as a helicase in Helitrons, which breaks the hydrogen bonds between the two complementary DNA strands, or an RNase H or a protease in Class I elements.

The origin of TEs is unknown. It is certainly not unique, considering the high diversity of structures. Both TEs and viruses can be viewed as purely parasitic, exploiting their host in order to expand. Most likely, these nucleic acid parasites emerged very early in the history of life. As early cheaters, they may have been present as soon as the first replicators emerged. Coupling an enzymatic activity involved in nucleic acid metabolism with a DNA-binding domain would provide the necessary specificity. Surround this gene with the appropriate DNA sequences, and we obtain a fair TE!

Among extant TEs, group II introns are probably the most interesting candidates to reflect how parasites may have evolved at the very beginning of life. They are usually neglected because they are not abundant in prokaryotes, are found in eukaryotes only in organelle genomes (of some plants and fungi) and have a particular way of moving. Mobility occurs at the RNA level, in a way resembling the splicing of spliceosomal introns; group II introns are therefore suspected to be the progenitors of spliceosomal introns. The excision (or splicing) is mediated by a ribozyme present in the sequence (ribozymes may have been present at the very beginning of life). The RNA is then reverse transcribed by a reverse transcriptase (RT) encoded in the sequence (related to bacterial RTs, such as those of retrons) and reinserted in a sequence-specific manner into an empty homologous site. Interestingly, a number of ribozymes have recently been identified in some TEs (Cervera and de la Peña 2020). Yet tracing back the origin of TEs remains extremely challenging.

I.5. References

Bhattacharyya, M.K., Smith, A.M., Ellis, T.H., Hedley, C., Martin, C. (1990). The wrinkled-seed character of pea described by Mendel is caused by a transposon-like insertion in a gene encoding starch-branching enzyme. Cell, 60(1), 115–122.

Cervera, A. and de la Peña, M. (2020). Small circRNAs with self-cleaving ribozymes are highly expressed in diverse metazoan transcriptomes. Nucleic Acids Res., 48(9), 5054–5064.

Dawkins, R. (1976). The Selfish Gene. Oxford University Press, London.

Doolittle, W.F. and Sapienza, C. (1980). Selfish genes, the phenotype paradigm and genome evolution. Nature, 284(5757), 601–603.

Kettlewell, H.B.D. (1955). Selection experiments on industrial melanism in the Lepidoptera. Heredity, 9(3), 323–342.

Lisch, D. (2013). How important are transposons for plant evolution? Nature Reviews Genetics, 14(1), 49–61.

McClintock, B. (1950). The origin and behavior of mutable loci in maize. Proceedings of the National Academy of Sciences (PNAS) USA, 36(6), 344–355.

Ohno, S. (1972). So much “junk” DNA in our genome. Brookhaven Symposia in Biology, 23, 366–370.

Orgel, L.E. and Crick, F.H.C. (1980). Selfish DNA: The ultimate parasite. Nature, 284(5757), 604–607.

Van’t Hof, A.E., Campagne, P., Rigden, D.J., Yung, C.J., Lingley, J., Quail, M.A., Hall, N., Darby, A.C., Saccheri, I.J. (2016). The industrial melanism mutation in British peppered moths is a transposable element. Nature, 534(7605), 102–105.

1. Transposable Elements in Eukaryotes

Aurélie HUA-VAN

Évolution, Génomes, Comportement et Écologie (EGCE), CNRS, IRD, Université Paris-Saclay, Gif-sur-Yvette, France

1.1. Introduction

While transposable elements (TEs) are found in all three domains of life, they have expanded the most in eukaryotes, where they sometimes make up 80% of the genome, with copy numbers exceeding a million. It is also in eukaryotes that their diversity is the highest: these selfish DNAs have developed different kinds of transposition strategies. Each species can host thousands of different TE families, and even closely related species may have specific TE families. Their evolutionary history is quite complicated, as is their relationship with their respective hosts. At the mercy of their host’s silencing strategies, they can escape repression and subsequent extinction by invading new species, where they evolve and amplify freely until the new host puts defenses in place. The long-standing companionship they maintain with their host is still visible in eukaryotes, unlike in prokaryotes, where turnover is very rapid, keeping genomes small.

In this chapter, I will review the basics of structure and transposition mechanisms of eukaryotic TEs, their origins, what we know about their distribution and abundance in eukaryotes, and finally the impact of such selfish elements in genome size, structure, function and evolution.

1.2. Classification, structure and transposition mechanism

The first attempt at TE classification was by Finnegan (1989). At that time, three types of TEs were known: LTR retroelements, non-LTR retroelements (LINEs) and DNA transposons. Two classes were defined, based on the nature of the transposition intermediate, with Class I (LTR and non-LTR retroelements) transposing through an RNA intermediate and Class II (DNA transposons) transposing directly from DNA to DNA (Figure 1.1).

Figure 1.1. The two classes of transposable elements as defined by Finnegan (1989). (A) Class I is represented by LTR retrotransposons and non-LTR retrotransposons, which transpose through a copy-and-paste mechanism. An inserted copy is transcribed, then reverse transcribed into DNA and integrated at a new genomic location. RT: reverse transcriptase; INT: integrase; RH: ribonuclease H; PR: protease; EN: endonuclease. (B) Class II transposons transpose through a cut-and-paste mechanism: the copy is excised from the genome and reinserted into the target site.

For most non-specialist researchers, TEs belong to one of these three classic types. However, in the last few decades, a huge diversity of TEs has been uncovered in eukaryotes, making this classification too simplistic. Most of the new TE groups were identified through genome mining. These correspond to less active groups, and deciphering their transposition mechanisms remains challenging. Nevertheless, refinements of the former classification, as well as new classification schemes, have been proposed.

In 2003, a classification based on the type of enzyme responsible for integration was proposed (Curcio and Derbyshire 2003). Interestingly, similar catalytic activities (represented by catalytic motifs) could be found independently in the two main classes. The classification included both eukaryotic and prokaryotic elements. However, it did not capture all of the diversity of TE transposition, since it focused mainly on the integration process.

A major revision was proposed by Wicker et al. (2007). While the two main classes remain valid, new hierarchical subdivisions (subclasses, orders and superfamilies) were added to take into account variation in the transposition mechanism (number of cut strands), structure and, finally, phylogeny/sequence homology. The classification system included the most recently discovered types of TEs, but remained focused on eukaryotic TEs and did not integrate some particular elements once considered as not belonging to TEs, such as group II introns. It has since been challenged (Kapitonov and Jurka 2008; Piégu et al. 2015), but it currently remains the most widely used.

Figure 1.2. The different transposition mechanisms of eukaryotic transposable elements.

COMMENTS ON Figure 1.2.– Class I TEs use an RNA intermediate, so no cleavage occurs at the donor site, unlike Class II TEs, for which one strand (Polintons, Helitrons) or both strands (TIR, Cryptons) are excised from the donor site. Whatever the number of cut strands, intermediates can be linear or circular. If DNA synthesis is required, it can take place on the extrachromosomal intermediate before integration, be coupled with reintegration, or occur after integration. Linear intermediates usually reinsert after a staggered cut of the target site, which creates small direct duplications at both ends of the insertion, called TSDs (target site duplications). Circular intermediates (DIRS, Helitrons, Cryptons) usually do not.
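How a staggered cut produces a TSD can be sketched with a toy string model. This is a minimal illustration under stated assumptions: the function name `insert_with_tsd`, the target sequence and the 3 bp stagger are hypothetical, and the model ignores strand chemistry, simply assuming that the segment between the two nicks is copied when the host fills in the single-stranded gaps on either side of the inserted element.

```python
def insert_with_tsd(target: str, te: str, cut: int, stagger: int) -> str:
    """Return the target sequence after insertion of `te`, including the TSD.

    Hypothetical toy model: the top strand is nicked at `cut` and the bottom
    strand `stagger` bp downstream. Gap filling copies the single-stranded
    segment between the two nicks, so it ends up on both sides of the element.
    """
    if stagger < 0 or not 0 <= cut <= len(target) - stagger:
        raise ValueError("staggered cut must fall inside the target sequence")
    tsd = target[cut:cut + stagger]   # segment lying between the two nicks
    left = target[:cut]
    right = target[cut + stagger:]
    return left + tsd + te + tsd + right


site = "AAAATTGCCCC"
print(insert_with_tsd(site, "[TE]", cut=4, stagger=3))  # AAAATTG[TE]TTGCCCC
```

Setting `stagger=0` models a blunt cut, which in this toy model leaves no duplication.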

Recently, Arkhipova (2017) proposed a revised classification that attempted to integrate all previous classification systems, and also to include prokaryotic elements, as well as other mobile genetic elements (see Introduction). Figure 1.2 illustrates the various transposition mechanisms that characterize the main groups of eukaryotic TEs.

1.2.1. Class I

This class is likely the most abundant in eukaryotes, owing to the replicative nature of its transposition process. All TEs in this class transpose via a copy-and-paste mechanism, which involves the production of an RNA intermediate that is then reverse transcribed into a new DNA copy, ready to insert at another location in the genome. The initial copy at the donor site is always left intact. According to Wicker et al. (2007), this class is divided into several orders, characterized by different structures and transposition mechanisms. The two best known are the LTR and LINE orders, which correspond to the initial A and B groups defined by Finnegan. Other orders also exist; their structures are less typical and their diversity is less well known.

1.2.1.1. LTR retroelements

LTR retroelements share their structure with LTR retroviruses, and their transposition mechanism is well known. They are characterized by two long terminal repeats (LTRs) in direct orientation that frame a DNA region several kb long, which comprises one or more open reading frames (ORFs) encoding the proteins necessary for transposition. The first ORF usually encodes a structural protein that assembles into virus-like particles (VLPs). A second gene encodes several proteins that are translated as a single polyprotein and later cleaved by a protease. The reverse transcriptase (RT) generates the cDNA complementary to the transcribed genomic RNA, synthesizing the new copies, while an RNase H degrades the RNA in the DNA-RNA hybrid. Reverse transcription usually takes place after formation of the VLP, which encapsulates the genomic RNA and the needed proteins, along with the two short RNAs needed for initiation of cDNA synthesis. Finally, the integrase integrates the new copy at the new site through strand transfer, after target site cleavage. Several superfamilies have been recognized, based on gene order and on the phylogeny of the RT conserved domains: Copia, Gypsy/Ty3, ERV (endogenous retrovirus) and BEL/Pao. These superfamilies have been included in the International Virus Taxonomy, within the order Ortervirales. For example, ERVs are classified in the family Retroviridae, which also includes retroviruses such as HIV. In fact, the main distinction between LTR retroviruses and LTR retrotransposons is that the former are infectious, able to exit the cell/individual before entering a new one, while LTR retrotransposons complete their whole cycle within the cell. Some LTR retrotransposons contain a third ORF encoding an envelope protein, theoretically allowing them to leave the cell. The Gypsy element of D. melanogaster was first considered a retrotransposon, but it is actually able to move between cells and can therefore functionally be considered a virus.
The distribution of env genes in the phylogenetic tree (based on RT) is quite patchy: different envelope genes have been acquired independently and subsequently lost. LTR retroelements are found in large numbers in all eukaryotic genomes; Gypsy- and Copia-like elements are particularly frequent.

Figure 1.3. Classification of eukaryotic transposable elements showing their main features and catalytic domains. GAG: capsid-like protein; HUH: Rep endonuclease domain; Hel: helicase domain; APE: apurinic/apyrimidinic endonuclease; RLE: restriction-like endonuclease; DDE: DDE integrase; YR: tyrosine recombinase; SF1-HEL: helicase (superfamily 1); RT: reverse transcriptase; DJR: double jelly roll; PolB: DNA-dependent DNA polymerase B. D: direct; I: inverted. P-LTR: pseudo-LTR.

1.2.1.2. DIRS

DIRS elements also possess LTRs, but they present some specific features and are clearly different from classical LTR retroelements. Their LTRs can be direct or inverted, depending on the DIRS family. An internal/subterminal region homologous to the LTR sequence is present, at least in the DIRS-like family, and appears to be crucial for the transposition mechanism. DIRS encode a reverse transcriptase that seems phylogenetically close to that of LTR retroelements, as well as an RNase H and a GAG-like protein; however, they typically contain a gene encoding a tyrosine recombinase, which replaces the DDE integrase. They also sometimes carry more specific genes (methylase, or hydrolase, of unknown function). DIRS remain a poorly characterized order among Class I. Four different superfamilies have been recognized (Malicki et al. 2020).

1.2.1.3. Non-LTR retroelements

Non-LTR retrotransposons lack LTRs and have few structural characteristics at their ends, including a TA-rich stretch at the 3’ end, which is necessary for transposition. Non-LTR retrotransposons include LINEs (long interspersed elements), which form a large order of Class I elements. They nevertheless share catalytic activities with LTR elements: a GAG-like protein (usually called ORF1), a reverse transcriptase and an endonuclease, which can be of two types, restriction-like endonuclease (RLE) or apurinic/apyrimidinic endonuclease (APE), defining two subgroups. The transposition mechanism is quite different from that of LTR elements. After transcription (and translation of the proteins), reverse transcription takes place directly at the reintegration site, through a mechanism called TPRT (target-primed reverse transcription). The RNA hybridizes through its TA-rich tail to the complementary single-stranded DNA exposed by the staggered cut made by the endonuclease. The DNA is then synthesized by extension from the cut end, using the RNA as template. One consequence of this mechanism is that if reverse transcription is aborted prematurely, a 5’-truncated element is inserted. Since the transcription signals are usually located at the 5’ end, such copies are themselves unable to transpose again and are called dead-on-arrival (DOA). 5’-truncated copies are indeed frequent in genomes. LINEs are classified into five large superfamilies based on the specificity of the endonuclease domain and RT phylogeny (Arkhipova 2017; Kojima 2020). They are present in all genomes and can reach huge copy numbers (about 500,000 for L1 in the human genome).
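The link between aborted TPRT and dead-on-arrival copies can be expressed as a short bookkeeping sketch. It assumes, as stated above, that reverse transcription proceeds from the 3’ end of the RNA toward its 5’ end and that the transcription signals sit at the extreme 5’ end of the element. The function names and the labelled toy "sequence" (region names standing in for real bases) are hypothetical.

```python
def tprt_insertion(element: str, bases_copied: int) -> tuple[int, str]:
    """Return (start coordinate, sequence) of a new TPRT insertion.

    Reverse transcription starts at the 3' end of the RNA and moves toward
    its 5' end, so an aborted reaction keeps only the 3' part of the element.
    """
    n = max(0, min(bases_copied, len(element)))  # clamp to the element length
    start = len(element) - n                      # first element position present in the copy
    return start, element[start:]


def is_dead_on_arrival(start: int) -> bool:
    # Assumption: the promoter begins at position 0 (the 5' end), so any
    # 5' truncation (start > 0) removes at least part of it.
    return start > 0


# Toy element: region labels stand in for real sequence.
element = "PROMOTER-ORF1-ORF2-TAIL"
full = tprt_insertion(element, len(element))
partial = tprt_insertion(element, 9)
print(full, is_dead_on_arrival(full[0]))        # (0, 'PROMOTER-ORF1-ORF2-TAIL') False
print(partial, is_dead_on_arrival(partial[0]))  # (14, 'ORF2-TAIL') True
```

The truncated copy keeps the 3’ tail and part of the coding region but no promoter, which is why it cannot be transcribed and mobilized again.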

Tightly associated with LINEs are SINEs (short interspersed elements). These are highly abundant in some genomes (for example, in some mammals). They have no coding capacity and rely entirely on LINEs for their transposition. SINEs are derived from non-mobile short RNA sequences such as tRNA or 7SL RNA transcripts. Some carry at their 3’ end short regions homologous to LINEs, suggesting de novo emergence by template switching during reverse transcription. In humans, Alu SINEs have amplified to reach about 1 million copies.

1.2.1.4. PLEs

Penelope-like elements (PLEs) are a particular subclass among Class I elements. They form an ancient group characterized by a reverse transcriptase more closely related to eukaryotic telomerase reverse transcriptases (TERTs). They contain atypical LTRs, called pseudo-LTRs, which can be direct or inverted. Other unusual features include the occasional presence of introns, and of hammerhead-like ribozymes in the pseudo-LTRs. An endonuclease is sometimes present. Its absence correlates with a preference for insertion near telomeres, taking advantage of the short RNA sequences from which telomeric repeats originate. The absence of integrase or recombinase suggests that PLEs may reinsert through a TPRT process. Recent studies have uncovered a broader distribution than expected, with elements present in all major eukaryotic kingdoms (Craig et al. 2021).

1.2.2. Class II

1.2.2.1. TIR

TIR elements are the best-known Class II TEs. While the first TIR elements were identified after their insertion into a gene triggered a phenotypic change, most recently described elements have been identified exclusively through genome sequence analysis. About 20 different superfamilies are now recognized (Yuan and Wessler 2011; Kojima 2020), and they are nearly universal (present in almost every sequenced genome). All TIR TEs share characteristic features, such as terminal inverted repeats (TIRs), ranging from 11 bp to several hundred bp, that flank one or two genes, at least one of which encodes a DDE transposase. The DDE motif takes its name from three key residues in the catalytic core of the transposase. Initially identified only in Tc1-mariner elements, variant DDE motifs have since been recognized in all known TIR superfamilies. DDE transposases belong to the large RNase H-fold superfamily of proteins. DDE-containing proteins are also found in Polintons, as well as in LTR retroelements (integrase). Interestingly, most bacterial TEs also contain DDE transposases, and several eukaryotic TIR superfamilies share more homology with their bacterial counterparts than with other eukaryotic TIR superfamilies (Yuan and Wessler 2011). This argues for an ancient diversification, prior to the split between eubacteria and archaea/eukaryotes.
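Because the two TIRs of an element are reverse complements of each other, a first-pass check on a candidate TIR element is to compare its two ends base by base. The sketch below is a simplification for illustration: the function names and the toy sequence are hypothetical, its 5 bp TIR is shorter than real ones, only perfect matches are counted (real TIRs often contain mismatches, which annotation tools tolerate), and only uppercase A/C/G/T is handled.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def perfect_tir_length(element: str) -> int:
    """Length of the perfectly matching terminal inverted repeat.

    Base i from the left end must be the complement of base i from the
    right end; the scan stops at the first mismatch or at the midpoint.
    """
    seq = element.upper()
    n = 0
    while n < len(seq) // 2 and seq[n] == seq[-1 - n].translate(_COMPLEMENT):
        n += 1
    return n


toy = "CAGTGAAAACACTG"  # hypothetical element with 5 bp TIRs
k = perfect_tir_length(toy)
print(k)                                      # 5
print(toy[:k], reverse_complement(toy[-k:]))  # CAGTG CAGTG
```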

The mechanism of transposition is referred to as “cut-and-paste”. Clearly, this mechanism per se cannot amplify the number of copies. However, TIR elements are sometimes present in large numbers of copies. In eukaryotes, two main processes have been proposed to explain this apparent contradiction. (In prokaryotes, in which similar elements exist, the transposition intermediates joining two DNA molecules are usually resolved by replication, a common mechanism; see Chapter 3.) In eukaryotes, the fate of a transposing copy depends partly on genome replication. Indeed, if a copy present at an already replicated site transposes into an unreplicated site, this copy will be duplicated in one of the daughter cells (Figure 1.4). Conversely, if transposition occurs from an unreplicated to a replicated site, one daughter cell will lose the copy. Hence, to be efficient, transposition of Class II elements should be synchronized with replication. There is, however, another way for a TIR element to amplify. Excision from the donor site leaves a double-strand DNA break (DSB) that must be repaired by the cellular machinery for the cell to survive. From this point of view, Class II elements are quite harmful! Several pathways exist in eukaryotic cells to repair a DSB (Figure 1.4). One is homology-dependent template repair, which uses a homologous DNA sequence to restore the intact site. Homologous sites are found in sister chromatids after replication, or in homologous chromosomes when the cell is at the diploid stage. If this homologous locus carries the TE copy, a full copy can be restored at the empty site. Template-dependent repair is often interrupted prematurely; the DSB is then repaired by an alternative mechanism known as non-homologous end joining (NHEJ), which uses microhomology for direct ligation of the two broken ends.
In such cases, an internally deleted copy can be generated, as shown for the P element in Drosophila. For other elements, NHEJ may be the main repair pathway, which explains the excision footprint usually found at the empty donor site.
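The copy-number consequences of transposing relative to replication can be checked by simple bookkeeping. The sketch below is a toy model under explicit assumptions: a single TE copy, sister chromatids labelled A and B (each segregating to the daughter cell of the same name), an excising copy that sits on chromatid A whenever the donor site is already replicated, and no repair of the empty donor site from a homologous template. The function name and its parameters are hypothetical.

```python
def daughter_copies(donor_replicated: bool, target_replicated: bool,
                    insert_into: str = "A") -> tuple[int, int]:
    """Copies of one TE inherited by daughter cells A and B after one cut-and-paste event.

    insert_into: chromatid receiving the insertion when the target site is
    already replicated (an unreplicated target is later copied onto both).
    """
    if insert_into not in ("A", "B"):
        raise ValueError("insert_into must be 'A' or 'B'")
    copies = {"A": 0, "B": 0}

    # Donor site: if already replicated, both sisters carried the copy and
    # excision removes only the one on chromatid A. If not yet replicated,
    # the single copy leaves before replication, so neither sister inherits it.
    if donor_replicated:
        copies["B"] += 1

    # Target site: an insertion into unreplicated DNA is duplicated by the
    # replication fork; an insertion into replicated DNA lands on one sister only.
    if target_replicated:
        copies[insert_into] += 1
    else:
        copies["A"] += 1
        copies["B"] += 1

    return copies["A"], copies["B"]


# Without transposition, each daughter would inherit exactly one copy.
print(daughter_copies(donor_replicated=True, target_replicated=False))  # (1, 2)
print(daughter_copies(donor_replicated=False, target_replicated=True))  # (1, 0)
```

In the first case one daughter gains a copy; in the second, one daughter loses it, which is the asymmetry behind the argument that Class II transposition should be synchronized with replication.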

1.2.2.2. Cryptons

Cryptons were discovered in 2003 in several pathogenic fungi (Goodwin et al. 2003), and more recently in animals and some stramenopiles (Kojima and Jurka 2011). They lack terminal inverted repeats and transpose using a tyrosine recombinase (YR), which presumably excises them as a circular DNA intermediate and reinserts them into target sites sharing some homology with the element; accordingly, short direct repeats are found at both termini. This group is poorly characterized, and its members share relatively low similarity. Cryptons have been proposed to be at the origin of DIRS retroelements (Class I), and some domesticated versions have been identified in humans (Kojima and Jurka 2011).

1.2.2.3. Helitrons

Helitrons were discovered in silico in 2001 as a new group of Class II elements, characterized by a rolling-circle replication initiator protein (Rep) similar to those of some prokaryotic TEs (such as IS91) and of other mobile genetic elements that replicate through a rolling-circle mechanism (Kapitonov and Jurka 2001). The Rep domain corresponds to an HUH endonuclease and is associated with a C-terminal helicase domain. Helitrons have been found in almost all eukaryotic supergroups and are particularly prevalent in some species. Two groups are recognized that differ in their termini. Unlike TIR elements, Helitrons from group 1 do not have TIRs. They end with short sequences (LTS and RTS, for left and right terminal sequences, respectively) with several characteristic features (5’TC and 3’CTRR, with a subterminal hairpin structure at the RTS that functions as a termination signal). In the second group, Helentrons contain an extra endonuclease domain similar to the one found in LINEs, and possess small subterminal inverted repeats in addition to hairpin structures, which can be present at both ends (Thomas and Pritham 2015).

Some aspects of the proposed transposition mechanism have recently been verified experimentally (Grabundzija et al. 2018; Kosek et al. 2021). The top strand of the donor DNA (coding strand) is cut at the LTS and displaced by the helicase activity, while synthesis of a new strand is initiated. The 3’ cut at the RTS is triggered by a pause of the protein at the hairpin structure and allows the formation of a single-stranded circular intermediate that theoretically can undergo rolling-circle replication. However, it seems probable that what reinserts is single-stranded DNA (ssDNA), the double strand at the target site being reconstituted during genome replication. This mechanism is referred to as “peel-and-paste”. More details are provided in Chapter 3 for prokaryotic TEs.

Figure 1.4. Different ways to increase copy number and to repair the excision site. (A) Transposition can proceed from a replicated site to an unreplicated one. After replication is complete, at least one daughter cell will have an extra copy. (B) The excision leaves a double-strand break (DSB) that can be repaired using a homologous template. If this template contains the insertion, the TE is restored at the excision site. Processes A and B can combine. (C) Another way to repair the DSB is direct ligation through the non-homologous end joining pathway. This relies on microhomology at both ends, which allows annealing of the two protruding ends, followed by ligation. This repair mechanism can be used to repair aborted SDSA (synthesis-dependent strand annealing), leading to internally deleted elements, and can result in an excision footprint, depending on how the element is excised.

The 3’ termination of the strand displacement at the donor site is loosely defined by the short 3’ hairpin, and adjacent sequences are frequently included in the transposition intermediate; therefore, these elements are able to capture genes, although this transduction-like mechanism may not account for all cases of gene capture observed. Several other models have been proposed but will not be discussed here (Kapitonov and Jurka 2007).

1.2.2.4. Polintons

Polintons (also known as Mavericks) were discovered relatively recently (Kapitonov and Jurka 2006). They are long (around 20 kb) and, like TIR elements, possess TIRs. Between the TIRs, several genes are present, including a DDE transposase gene and a DNA polymerase B gene. Hence, these elements are thought to be replicative and have been called self-synthesizing. However, their precise transposition mechanism has not yet been fully deciphered. Polintons may be neither as active nor as widespread as TIR elements, although they have been identified in several protists, fungi and animals, but not in plants (Haapa-Paananen et al. 2014). However, they can be abundant in some species (30% of the genome in Trichomonas vaginalis) (Pritham et al. 2007).

1.2.3. Autonomous, non-autonomous and relics

Most of these groups also include shorter versions of TEs that retain the cis sequences necessary for transposition but lack coding capacity. These elements cannot transpose by themselves, but they can still move and amplify using the transposition machinery of autonomous elements. They are called non-autonomous and can be very short (between 200 bp and 1 kb). Non-autonomous elements derived from TIR elements are very frequent and often outnumber the autonomous copies. They are called MITEs (miniature inverted-repeat transposable elements); often, they have retained only the TIRs, and the short central region appears unrelated to, or highly degenerated compared with, that of the autonomous element. Every TIR superfamily has MITEs. Non-autonomous short versions also exist for Helitrons (for example, the DINE element of Drosophila), LTR retroelements (LARDs, TRIMs), LINEs and PLEs. In humans, the most numerous TEs are Alu elements, which hijack the transposition machinery of L1 elements. Recently, a new group of non-autonomous elements has been identified in plants, characterized by the presence of a hammerhead-like ribozyme within their LTRs. Such ribozymes are also found in non-LTR retroelements from animals. These elements have been called retrozymes (Cervera and de la Peña 2020).

1.3. Abundance, diversity and distribution

Eukaryotic TEs are found everywhere, but each species has a unique TE landscape. Abundance may vary from a few percent of the genome to more than 80%, and this variation does not follow the major eukaryotic groups (Figure 1.5). Class I is usually more abundant, owing to either LTR retroelements or LINEs, but species in which Class II elements predominate can be found in any group. Hence, TE content (the proportions of the different types of TEs) is also quite variable (Wells and Feschotte 2020). Closely related species tend to have similar landscapes, which can be explained by phylogenetic inertia (the TE landscape is ancestral). Shared TEs can also be more abundant in closely related species because of increased rates of horizontal transfer (Gilbert et al. 2021). However, very different patterns are sometimes observed between species belonging to the same clade. The huge increase in sequenced genomes should allow better deciphering of the tendencies associated with phylogenetic clades. However, drawing an exhaustive picture is certainly impossible. TE annotation is performed with different tools, on assemblies of highly variable quality, and with TE databases of varying completeness, all of which introduce many biases (see Chapter 11). Some recent analyses combining homology-based and de novo searches report up to 50% of unclassified repetitive sequences. In brief, there are too many genomes and not enough comparative studies.

Figure 1.5. Variation of TE content (as a percentage of the genome) in major eukaryotic groups

The known diversity of TEs increases with the number of sequenced species. There is no exhaustive database, but the most widespread superfamilies are divided into thousands of different TE families. For example, Ty3/Gypsy (Class I) and Tc1-mariner (Class II) are almost universal, each comprising tens or hundreds of related families, and several families (lineages) of the same superfamily often coexist in the same genome. Usually, only a few families are active, and the remaining ones correspond to ancient, extinct (or weakly active) families. Why TEs have exploded in some genomes and not in others, and why the success of a given superfamily is so variable, are still unresolved questions. Population size, lifestyle and demography are thought to impose some constraints, or at least to have an influence. The genome’s ability to control TE transposition or to efficiently purge nonessential sequences, and its permissiveness to horizontal transfer, are likely important factors as well.

Within genomes, TEs are not randomly distributed. Some genomic regions, such as heterochromatin, are enriched in TEs. This may be due to stronger selection against insertions in gene-rich euchromatic arms. Hence, TEs tend to accumulate in regions less important for function or in specific regions (centromeres, telomeres, sex chromosomes, dispensable chromosomes, low-recombination regions). This also depends on TE insertional preferences, which range from none (TEs insert almost randomly) to highly specific (as in some LINE elements that target a specific sequence in the rDNA genes, which are nevertheless repeated!).

1.4. Origins of transposable elements and evolutionary relationships with other genetic elements