Genre Analysis and Corpus Design - Ulrike Henny-Krahmer - E-Book


Description

This work in the field of digital literary stylistics and computational literary studies is concerned with theoretical aspects of literary genre, with the design of a corpus of nineteenth-century Spanish-American novels, and with its empirical analysis in terms of subgenres of the novel. The digital text corpus consists of 256 Argentine, Cuban, and Mexican novels from the period between 1830 and 1910. It has been created with the goal of analyzing, by means of computational text categorization methods, thematic subgenres and literary currents that were represented in numerous novels in the nineteenth century. To categorize the texts, statistical classification and a family resemblance analysis relying on network analysis are used with the aim of examining how the subgenres, which are understood as communicative, conventional phenomena, can be captured on the stylistic, textual level of the novels that participate in them.


Number of pages: 1101

Year of publication: 2024

Schriften des Instituts für Dokumentologie und Editorik

edited by:

Bernhard Assmann

Alexander Czmiel

Oliver Duntze

Christiane Fritze

Frederike Neuber

Malte Rehbein

Patrick Sahle

Gerlinde Schneider

Martina Scholger

Nadine Sutor

Roman Bleier

Stefan Dumont

Franz Fischer

Ulrike Henny-Krahmer

Christopher Pollin

Torsten Roeder

Torsten Schaßan

Markus Schnöpf

Philipp Steinkrüger

Georg Vogeler

Volume 17

Acknowledgements

My thanks go to my three supervisors, Christof Schöch, Fotis Jannidis, and Hanno Ehrlicher, who supported me with their professional ideas, hints, and feedback throughout the creation of this thesis, from the first steps in defining the topic, through guidance during the working process, to the finalization of the text. This work has involved much more than reading, thinking, excerpting, structuring, and writing. The project also consisted of collecting, modeling, managing, analyzing, visualizing, and publishing data, working with digital tools and programming. My supervisors assisted and encouraged me in all these activities and are role models for me in following established as well as new paths in academic work and in the Digital Humanities.

This work on the computational analysis of subgenres of nineteenth-century Spanish-American novels was undertaken within the framework of the BMBF-funded project “Computational Literary Genre Stylistics” (CLiGS) at the Chair of Computational Philology and Newer German Literary History of the University of Würzburg. I am grateful for the fact that I was able to concentrate on my dissertation by working in the project and for the exciting and instructive time and cooperation with the other staff members in CLiGS and at the chair. I very much hope that our paths will continue to cross. I would like to thank, in particular, my co-doctoral students José Calvo Tello, Daniel Schlör, and Stefanie Popp, as well as the project members Robert Hesselbach, Katrin Betz, and Steffen Pielström. For their support in the preparation of my corpus of novels, I would like to thank the student assistants Constanze Ludewig and Jakob Stahl. The CLiGS project has advanced computational genre analysis with corpora in Romance languages, in particular, and I am glad to have been a part of this initiative.

Further supporters have made the elaboration, completion, and publication of this dissertation possible: Thank you to the Ibero-American Institute (IAI) in Berlin for their support in the digitization of several of the Spanish-American novels, which are now part of the IAI’s digital library and of the Conha19 corpus that I created as part of this dissertation project. I would also like to thank Thomas Schmid and the Graduate School of the Humanities (GSH) of the University of Würzburg, which not only guided me through the administrative process of doctoral studies but also accompanied the steady progress of the project and opened up additional qualification opportunities. I thank DARIAH-EU for creating the Open Access Monograph Bursary for Early Career Researchers in Digital Humanities. I am very honored to be the first one to receive it. It encouraged me to publish the book immediately in Open Access and, above all, to pursue the approach of linking research data, program code, and text even more consistently. It is only because of the DARIAH bursary that there are now also TEI and HTML versions of this dissertation.

For proofreading and valuable advice on content and form, I thank my supervisors, the members of the Institute for Documentology and Scholarly Editing (IDE), Sean Winslow from the University of Graz and Rebecca Collin from the Academic Writing Consultancy at the University of Rostock. Thank you to Bernhard Assmann for his support in preparing the PDF version of this monograph and to Christopher Pollin for helping me with HTML and CSS complexities. Many thanks to Stefan Dumont for creating the book cover. Thank you to the Ehrenfelder Musikschule and the Urania Theater in Cologne, where I was able to sit, work and write during the pandemic when all libraries were closed.

I would also like to thank those who marked my professional path before my doctoral studies. These are, above all, the staff of the Cologne Center for eHumanities (CCeH) at the University of Cologne, where my work as a digital humanities researcher began and where I later, as a guest, always had a place in an office. I also thank the members of the IDE, which I joined in 2012. The IDE members are colleagues and friends, a group that, regardless of age, locality, or institutional affiliation, advances the topic of digital documentology and editing through joint activities. I am proud that my dissertation can now appear as a publication in the IDE’s publication series SIDE, even if it thematically reaches out in the direction of quantitative analysis of historical literary texts, beyond the core field of the IDE. As IDE members always say, “IDE is who you are.” I took that seriously. I would like to thank all the members of the IDE and especially Frederike Neuber, Martina Scholger, Patrick Sahle and Franz Fischer, with whom I have collaborated the most in the past and present.

Looking back even further, I thank my host families, classmates, and friends in Mexico who welcomed me during my year abroad there in 1999 and 2000, through whom I learned Spanish and developed a desire to further engage with their language, culture, and literature. Without them, the topic of this thesis would certainly be very different – or this thesis would not exist at all.

Finally, my thanks go to my friends and family. Thanks to my parents for letting me go out into the world and do what I was interested in, even in faraway places and on uncertain paths. Thanks to my parents-in-law for taking care of me and having my back when it matters. Thank you, Constantin and Ivo, for bearing with me when I work, for distracting me as well, and for being there. I dedicate this work to you.

Genre Analysis and Corpus Design: Nineteenth-Century Spanish-American Novels (1830–1910)

Summary

This work in the field of digital stylistics and computational literary studies is concerned with theoretical aspects of literary genre, with the design of a corpus of nineteenth-century Spanish-American novels, and with its empirical analysis in terms of subgenres of the novel. The digital text corpus consists of 256 Argentine, Cuban, and Mexican novels from the period between 1830 and 1910. It has been created with the goal of analyzing, by means of computational text categorization methods, thematic subgenres and literary currents that were represented in numerous novels in the nineteenth century. The texts have been gathered from different sources, encoded in the standard of the Text Encoding Initiative (TEI), and enriched with detailed bibliographic and subgenre-related metadata, as well as with structural information.

To categorize the texts, statistical classification and a family resemblance analysis relying on network analysis are used with the aim of examining how the subgenres, which are understood as communicative, conventional phenomena, can be captured on the stylistic, textual level of the novels that participate in them. The result is that both thematic subgenres and literary currents are textually coherent to degrees of 70–90 %, depending on the individual subgenre constellation, meaning that the communicatively established subgenre classifications can be accurately captured to this extent in terms of textually defined classes.

Besides the empirical focus, the dissertation also aims to relate literary theoretical genre concepts to the ones used in digital genre stylistics and computational literary studies as subfields of digital humanities. It is argued that literary text types, conventional literary genres, and textual literary genres should be distinguished on a theoretical level to improve the conceptualization of genre for digital text analysis.

Análisis de género y diseño de corpus: Novelas hispanoamericanas del siglo XIX (1830–1910)

Resumen

Este trabajo en el campo de la estilística literaria digital y los estudios literarios computacionales se ocupa de las preocupaciones teóricas del género literario, del diseño de un corpus de novelas hispanoamericanas del siglo XIX y de su análisis empírico en términos de subgéneros de la novela. El corpus de textos digitales consta de 256 novelas argentinas, cubanas y mexicanas del período comprendido entre 1830 y 1910. Ha sido creado con el objetivo de analizar los subgéneros temáticos y las corrientes literarias que estaban representadas en numerosas novelas del siglo XIX mediante métodos de categorización computacional de textos. Los textos han sido recogidos de diferentes fuentes, codificados en el estándar de la Iniciativa de Codificación de Textos (TEI), y enriquecidos con detallados metadatos bibliográficos y de subgéneros, así como con información estructural.

Para la categorización de los textos se utiliza una clasificación estadística y un análisis de semejanza familiar basado en el análisis de redes, con el fin de examinar cómo los subgéneros, entendidos como fenómenos comunicativos y convencionales, pueden ser captados en el plano estilístico y textual de las novelas que participan en ellos. El resultado es que tanto los subgéneros temáticos como las corrientes literarias son textualmente coherentes en grados del 70–90 %, dependiendo de la constelación individual de subgéneros, lo que significa que las clasificaciones de subgéneros establecidas comunicativamente pueden ser capturadas con precisión hasta este punto en términos de clases textualmente definidas.

Además del enfoque empírico, la disertación también pretende relacionar los conceptos teóricos de género literario con los utilizados en la estilística de género digital y los estudios literarios computacionales como subcampos de las humanidades digitales. Se argumenta que los tipos de texto literario, los géneros literarios convencionales y los géneros literarios textuales deberían distinguirse a nivel teórico para mejorar la conceptualización del género para el análisis de textos digitales.

Gattungsanalyse und Korpusaufbau: Hispanoamerikanische Romane im 19. Jahrhundert (1830–1910)

Zusammenfassung

Diese Arbeit ist in den Forschungsfeldern der digitalen literaturwissenschaftlichen Stilistik und der Computational Literary Studies angesiedelt und setzt sich mit theoretischen Gattungsproblemen, mit der Erstellung eines Korpus von hispanoamerikanischen Romanen des 19. Jahrhunderts und mit ihrer empirischen Analyse nach Untergattungen auseinander. Das digitale Textkorpus umfasst 256 argentinische, kubanische und mexikanische Romane aus der Zeit von 1830 bis 1910 und ist mit dem Ziel erstellt worden, thematische Untergattungen und literarische Strömungen, die im 19. Jahrhundert durch zahlreiche Romane repräsentiert waren, mit Hilfe computergestützter Methoden der Textkategorisierung zu analysieren. Die Texte wurden aus verschiedenen Quellen zusammengetragen und gemäß dem Standard der Text Encoding Initiative (TEI) codiert, wobei die Dokumente mit detaillierten bibliographischen und untergattungsbezogenen Metadaten sowie mit textstrukturellen Informationen angereichert wurden.

Um die Texte zu kategorisieren, werden Verfahren der statistischen Klassifikation und eine Familienähnlichkeitsanalyse verwendet, die auf einer Netzwerkanalyse basiert. Das Ziel der Analysen ist es zu untersuchen, inwieweit die Untergattungen, die primär als Phänomene der Kommunikation und Konvention verstanden werden, auf der stilistischen, textlichen Ebene der Romane, die an ihnen teilhaben, erfasst werden können. Das Ergebnis ist, dass sowohl die thematischen Untergattungen als auch die literarischen Strömungen zu 70–90 % textlich kohärent sind, in Abhängigkeit von der gewählten Untergattungskonstellation, womit gemeint ist, dass die kommunikativ etablierten Untergattungsklassifikationen in diesem Maß an Genauigkeit auch als textlich definierte Klassen erfasst werden können.

Über die empirische Ausrichtung hinaus ist ein weiteres Ziel der Dissertation, literaturtheoretische Gattungskonzepte zu denjenigen in Beziehung zu setzen, die in der digitalen Gattungsstilistik als einer Teildisziplin der Digital Humanities verwendet werden. Es wird argumentiert, dass literarische Texttypen, konventionelle literarische Gattungen und textliche literarische Gattungen auf einer theoretischen Ebene unterschieden werden sollten, um die Konzeption von Gattung für die digitale Textanalyse zu verbessern.

Contents

Acknowledgements

Summary

Resumen

Zusammenfassung

1 Introduction

2 Concepts

2.1 Literary Genres

2.1.1 Disciplinary Locations of Genre Studies

2.1.2 Ontological Status and Relevance of Genres

2.1.2.1 Semiotic Models of Genres

2.1.2.2 Genres and Digital Genre Stylistics: The Roles of Corpora, Genre Labels, Features, and Text Style

2.1.3 System and History

2.1.3.1 A Conceptual Proposal for Digital Genre Stylistics: Literary Text Types, Conventional Literary Genres, and Textual Literary Genres

2.1.3.2 Text Types, Conventional Genres, and Textual Genres in Semiotic Models of Generic Terms

2.1.3.3 Literary Currents, Schools, and Movements

2.1.3.4 Genre Systems and Hierarchies

2.1.3.5 Genre Identity and Variability

2.1.4 Categorization

2.1.4.1 Logical Classes

2.1.4.2 Prototype Categories

2.1.4.3 Family Resemblance Networks

2.2 Style

2.3 Subgenres of the Nineteenth-Century Spanish-American Novel

2.3.1 Thematic Subgenres

2.3.1.1 Novela histórica

2.3.1.2 Novela de costumbres

2.3.1.3 Novela sentimental

2.3.2 Subgenres Related to Literary Currents

2.3.2.1 Novela romántica

2.3.2.2 Novela realista

2.3.2.3 Novela naturalista

3 Corpus

3.1 Selection Criteria

3.1.1 Boundaries of the Novel

3.1.1.1 Fictionality

3.1.1.2 Narrativity

3.1.1.3 Prose

3.1.1.4 Length

3.1.1.5 Independent Publication

3.1.1.6 Additional Criteria

3.1.1.7 A Working Definition of the Novel

3.1.2 Borders of Argentina, Cuba, and Mexico

3.1.3 Limits of the Nineteenth Century

3.2 Bibliographical Database

3.2.1 Sources

3.2.2 Data Model and Text Encoding

3.2.3 Assignment of Subgenre Labels

3.2.3.1 An Example

3.2.3.2 Levels of Subgenre Terms

3.2.3.3 Explicit and Implicit Subgenre Signals

3.2.3.4 Interpretive Subgenre Labels

3.2.3.5 Literary-Historical Subgenre Labels

3.2.3.6 A Discursive Model of Generic Terms

3.3 Text Corpus

3.3.1 Selection of Novels and Sources

3.3.2 Text Treatment

3.3.3 Metadata and Text Encoding

3.3.3.1 TEI Header

3.3.3.1.1 Title and Publication Statements

3.3.3.1.2 Declaration of Rights

3.3.3.1.3 Source Description

3.3.3.1.4 Encoding Description

3.3.3.1.5 Abstracts

3.3.3.1.6 Text Classification with Keywords

3.3.3.1.7 Revision Description

3.3.3.2 TEI Body

3.3.3.2.1 Typographically Marked Subdivisions of the Text

3.3.3.2.2 Typographically Highlighted Words or Phrases

3.3.3.2.3 Gaps

3.3.3.2.4 Verse Lines

3.3.3.2.5 Dramatic Text

3.3.3.2.6 Representations of Written Text

3.3.3.2.7 Quotations

3.3.3.2.8 Direct Speech and Thought

3.3.3.2.9 Embedded Texts

3.3.3.3 TEI Schema

3.3.4 Assignment of Subgenre Labels

3.3.5 Derivative Formats and Publication

4 Analysis

4.1 Metadata Analysis

4.1.1 On Representativeness

4.1.2 Authors

4.1.3 Works

4.1.3.1 Comparison of Bib-ACMé and Conha19

4.1.3.2 Corpus-specific Overviews

4.1.4 Editions

4.1.5 Subgenres

4.1.5.1 Explicit Signals, Implicit Signals, and Literary-Historical Labels

4.1.5.2 Discursive Levels of Subgenre Labels

4.1.5.2.1 Theme

4.1.5.2.2 Literary Currents

4.1.5.2.3 Mode of Representation

4.1.5.2.4 Mode of Reality

4.1.5.2.5 Identity

4.1.5.2.6 Medium

4.1.5.2.7 Attitude

4.1.5.2.8 Intention

4.1.5.3 Subgenre Labels Selected for Text Analysis

4.1.5.3.1 Primary Thematic Labels

4.1.5.3.2 Primary Literary Currents

4.2 Text Analysis

4.2.1 Features

4.2.1.1 General Features: MFW

4.2.1.2 Semantic Features: Topics

4.2.2 Categorization

4.2.2.1 Classification

4.2.2.1.1 Thematic Subgenres

4.2.2.1.2 Literary Currents

4.2.2.2 Family Resemblance: Network Analysis

4.2.2.2.1 Method

4.2.2.2.2 Data

4.2.2.2.3 Results

5 Conclusion

References

Appendix

Sources of the Novels in the Corpus

Appendix of Figures

Index of Figures

Index of Tables

Index of Examples

1 Introduction

If people are asked about the objects, beings, or events around them, they will most probably name the categories that the things belong to, for example, a book on a shelf, a bird in a tree, a dancer on stage, or a thunderstorm in the sky. Research in cognitive development has shown that even small babies begin to recognize what is around them in terms of categories when they are about a year old (Gopnik, Meltzoff, and Kuhl 2009, 79–83). Paradoxically, however, all objects and beings are unique: “All you ever see are individual objects: this particular sweet pea, this individual dollar bill. There is no ‘sweet-peaness’ or ‘dollarhood’ in the world. So how could it ever be informative to say that this individual thing belongs to this nonexistent, mythical category, when the individual thing itself is all we ever actually experience?” (Gopnik, Meltzoff, and Kuhl 2009, 79). Categorizing serves a basic human need to confer meaning on what is perceived and to leave aside the individuality of things. It helps people to grasp the world around them. However, the perception of the individual is also dependent on the understanding of the general, so that what is special about something emerges from the background of the familiar.

Literary texts are no exception. One type of category that they are commonly associated with is genre. A poem is something different than a drama, and a science fiction novel is not to be confused with a sentimental one. One comes across literary genres in everyday life, for example, as an organizing principle in a bookstore, in a library, or on the covers of the books themselves. Experiences in daily life usually suggest that the assignment of individual texts to genres does not cause particular problems, only that one might be disappointed, surprised, or impressed when the book bought is different from what its genre label led one to expect. In literary theory and its antecedents, however, the “genre problem” has been discussed intensely for thousands of years, starting with the attempt of Aristotle and Plato to formulate a theory of poetry (Zymner 2003, 10). One of the main questions in the debates about genres is what type of category they can be conceived as: as logical classes with clear boundaries into which all the literary works can neatly be sorted? As prototypical categories with exemplary masterpieces at the center and mediocre imitations at the edge? As networks of related texts that form generic families? Or as some other kind of category that can be described as a combination of necessary and optional features?1 Moreover, it has been debated whether genres can be assumed to exist at all beyond pure naming conventions, given that the literary works associated with them can be so different. This is connected to the problem of genre change and of dependence on the cultural context, because literary historians must deal with the fact that the same genre names are applied to phenomena with quite distinct textual characteristics across time and place. At times, attempts have been made to avoid the challenges that genres as categories of literary texts pose by denying their relevance altogether (for example, by Croce 1905). However, the practical relevance that genres have not only in daily life but also for students of literature and literary scholars cannot be denied. Topics of courses and exams are often defined in terms of genres, for example, a seminar on the Spanish picaresque novel or classic drama. Literary histories are also often structured in terms of genres that were important for certain periods. Finally, the interpretation of individual literary works does not happen in a vacuum. In order to assess the value of single texts, they are often examined with regard to a specific literary tradition or genre (Keckeis and Michler 2020, 7–8). As a way to approach the genre problem theoretically, recent literary genre theory tends to see the phenomenon as one that can be described in several interlinked dimensions: cognitive, communicative, social, and textual (Gymnich and Neumann 2007).

This dissertation aims to enter the theoretical discussion about genres from an interdisciplinary perspective. It is located in the field of digital literary stylistics, which is part of the wider discipline of digital humanities, in which humanities research is combined with methods from information science and computer science, and which includes interdisciplinary fields such as computational linguistics, computational philology, or computational literary studies. The subfield of digital stylistics is concerned with the analysis of linguistic and literary style with computational methods. An important subject is the investigation of the style of individual authors, but genre style has also been the focus of digital stylists.2 To examine genre on the level of style means that the approach is primarily text-centered, and it also entails empirical work.

Digital literary stylistics is not exclusively but predominantly applied research. It is based on digital corpora of literary texts, which are designed for a specific language, period, set of authors, or genre, or for combinations of several of these if the aim is a contrastive analysis. The topic of this dissertation is, therefore, genre analysis and corpus design, both as a theoretical discussion of genre as a concept in literary theory and digital stylistics and as an empirical corpus study. A specific corpus was built for this purpose as a basis for an analysis of metadata and texts in terms of genre. The genres that the empirical part of this study is concerned with are the novel and its subgenres in the context of nineteenth-century Spanish-American literature, more precisely Argentine, Cuban, and Mexican novels that were published between 1830 and 1910. There is a growing number of digital stylistic studies concerned with texts in Romance languages, as the contributions to the conference “Digital Stylistics in Romance Studies and Beyond”, which took place at the University of Würzburg in 2019, show.3 Nevertheless, most of the digital stylistic studies on literary texts are still based on corpora of texts in English.4 Comprehensive central repositories of digital literary texts, which are curated following scholarly standards, such as the Digital Library in the TextGrid Repository (TextGrid n.d.) or the German Text Archive (Berlin-Brandenburgische Akademie der Wissenschaften 2022) for German texts, are not yet available for Spanish literary works. There are, however, initiatives that also promote the building of corpora of literary texts in Spanish, many of which go back to individual work, community initiatives, or research projects, for example, the “Corpus of Spanish Golden-Age Sonnets” (Navarro-Colorado, Ribes Lafoz, and Sánchez 2016; Navarro-Colorado 2020) or the multi-language corpora DraCor (Fischer et al. 2019, n.d.) and ELTEC (Odebrecht, Burnard, and Schöch 2021), which include Spanish drama and novels, respectively. In addition, the project “Computational Literary Genre Stylistics” (CLiGS), the context in which this dissertation was written, was concerned with building and analyzing digital corpora of literary texts in French, Spanish, Italian, and also Portuguese.5 Building such corpora is important for several reasons: it strengthens digital quantitative research in the respective disciplines and language areas, and it helps to make the empirical results of digital stylistics in general more reliable if they are based on findings derived from a broad range of different corpora and if they are not specific to certain languages or genres.

The Spanish-American nineteenth-century novel is well studied, and knowledge about it is consolidated in literary histories and monographs.6 Various subgenres of this novel have also been analyzed in depth by literary scholars, for instance, the historical novel, the anti-slavery novel, or novels of the romantic, naturalistic, and modernist currents (Löfquist 1995; Peñaranda Medina 1994; Read 1939; Rivas 1990; Schlickers 2003; Suárez-Murias 1963). However, nineteenth-century Spanish-American novels and their subgenres have not yet been analyzed on the basis of a comprehensive digital text corpus and by means of stylistic computational methods. There are several reasons why a quantitative digital analysis of the subgenres of that novel is of interest. First, many studies on nineteenth-century Spanish-American novels focus on selected works that have a canonical status. The effect is that only a specific section of the whole literary production of the time forms the basis of the literary-historical knowledge about the novel and its subgenres in that period.7 There is also qualitative research based on larger corpora, but in these cases, scholars mostly concentrate either on the novel as a whole or on one specific subgenre.8

A digital approach in which several subgenres of the novel are contrasted can contribute new insights into the characteristics of the texts that are distinctive for the different subgenres. Second, more novels can be taken into account if a comparatively large corpus is used – not only the well-known novels but also works that have thus far not received much critical attention. This can shed new light on the concepts of the subgenres, on the one hand because the quantitative relevance of the subgenres becomes clearer, and on the other hand because lesser-known works possibly represent the subgenres that they are associated with in a different way, through other textual traits and stylistic means. Third, a quantitative digital study differs from a qualitative approach even if the same number of novels and subgenres were analyzed, because the way knowledge about the texts is extracted and summarized is not directly dependent on the human reader but results from a mechanical treatment of the texts and computational processing. This can produce new findings about the subgenres that remain unrecognized by close reading methods. Although the nineteenth century is long past, the literature of that time is still of importance because that century marked the rise of the novel as a genre and the beginning of the national literatures of the different Spanish-American countries. Many of the subgenres that were practiced or emerged in the nineteenth century are still relevant in twenty-first-century literature, such as historical or crime novels. For digital genre stylistics in general, the subgenres of the Spanish-American novel are an interesting empirical case because they combined generic concepts of European origin with specific local inventions. Especially regarding literary currents, there was no neat chronological succession, so that several currents were en vogue at once (on these aspects, see, for instance, Varela Jácome [1982] 2000). It is an interesting question to what extent different theoretical concepts of genre categories are suitable to capture the various nineteenth-century Spanish-American subgenres.

Although this dissertation is concerned with texts in Spanish, it is written in English because the field of digital genre stylistics and digital humanities in general is highly interdisciplinary. The aim is to provide results that can be appreciated by scholars of Spanish-American literature but also by digital humanists from around the world. A second linguistic and cultural background of this thesis is German, and much research literature from German-speaking countries has been taken into account, especially literature on genre theory but also digital stylistics papers and research on the Spanish-American novel. In general, quotes are not translated, assuming that the context provides enough information to grasp their meaning.

Before the specific goals and questions of this thesis and its structure are outlined, it must be clarified what is not covered here. The period that is covered is 1830 to 1910, the whole long nineteenth century, but the corpus of novels that is analyzed is treated as a synchronic one. In the discussion of results, the publication date of the novels has been taken into account to see if that had an influence on the results, but no inherently diachronic analysis of the subgenres is pursued here. A second aspect that is not addressed fully at the moment of the publication of this thesis is the one of sustainable research data management in connection with the publication of the corpus. Its basic publication strategy is presented, and it is published in Open Access and in standard formats in a public code repository on GitHub and Zenodo, including versioning, but the publication method is not discussed explicitly in relationship to the FAIR principles (Wilkinson et al. 2016) or other best practices for research data publication. In the longer term, it is planned to prepare the corpus for long-term preservation and accessibility in suitable institutional or subject-specific repositories, but in the context of this dissertation, the initial focus was on its creation, analysis, and basic availability for transparency and re-use.

As mentioned above, the main goals of this dissertation lie, first, in the area of genre theory, second, in the construction of a digital corpus of novels, and third, in its computer-assisted analysis. The theoretical foundations of the thesis are clarified in chapter 2, “Concepts”. On the level of genre theory (chapter 2.1, “Literary Genres”), the aim is to work out which of the existing concepts of the ontological status of genres (chapter 2.1.2, “Ontological Status and Relevance of Genres”) and their historical and theoretical nature (chapter 2.1.3, “System and History”) are relevant and applicable and how these concepts need to be adapted for digital genre stylistics. In this context, three aspects are specifically addressed. The first aspect is the question of how generic terms can be modeled and defined. This is an important issue because genre labels are the main feature through which genre conventions enter a digital stylistic text analysis. This aspect is explored further in chapters 2.1.2.1 (“Semiotic Models of Genres”) and 2.1.2.2 (“Genres and Digital Genre Stylistics: The Roles of Corpora, Genre Labels, Features, and Text Style”). Second, definitions of genre and text types stemming from literary theory and linguistics are compared to see to what degree they are suitable for digital stylistics. In particular, the question of whether and how a conventional or historical level of genre and a textual one should be separated is discussed. Building on existing approaches, a new proposal for conceptual differentiation is made in chapter 2.1.3.1 (“A Conceptual Proposal for Digital Genre Stylistics: Literary Text Types, Conventional Literary Genres, and Textual Literary Genres”). Third, three main concepts that have been proposed to conceptualize genres as categories, namely logical classes, prototypical structures, and family resemblance networks, are related to the distinction between conventional and textual levels of genre (chapter 2.1.4, “Categorization”). It is then outlined how these three concepts can be implemented in text-based digital genre analyses by referring to computational methods of text classification, clustering, and network analysis. The theoretical part also explains the concept of literary style (chapter 2.2, “Style”) that underlies the analyses in the empirical part. The part on concepts concludes with a presentation of three major thematic subgenres and three literary currents chosen for text analysis (chapter 2.3, “Subgenres of the Nineteenth-Century Spanish-American Novel”). These are the historical novel, the sentimental novel, and the novel of customs as thematic subgenres, and the romantic novel, the realist novel, and the naturalistic novel as literary currents. Several hypotheses are formulated regarding the textual and stylistic characteristics and coherence expected for these subgenres and currents.

The empirical part of the work has two main parts: chapter 3, “Corpus”, and chapter 4, “Analysis”. The first main goal of this part has been to build up a comprehensive digital bibliography of Argentine, Mexican, and Cuban nineteenth-century novels and a corresponding digital corpus of 256 texts. Both have been elaborated as a prerequisite and basis for the text analysis of subgenres. The selection of novels from the three countries is motivated, and the diachronic limits of the bibliography and the corpus are clarified. General defining characteristics of the novel are discussed as a basis for the selection of works for both digital resources (chapter 3.1, “Selection Criteria”). A special focus is on how the subgenre labels were collected, modeled, and encoded (chapters 3.2.3 and 3.3.4, “Assignment of Subgenre Labels”, for the bibliography and the corpus, respectively). An empirically based adaptation of the semiotic models of Raible (1980) and Schaeffer (1983), which are also presented in the first chapter on genre theory, provides the theoretical foundation for the organization of the subgenre labels in the bibliography and the corpus. The preparation of the bibliography and corpus is explained in detail, including the availability and usage of bibliographical and full-text sources, the treatment of the extracted full-texts, the collection of metadata and text encoding, and the chosen publication strategy. Both resources are published on the web and offered to other scholars for reuse (Henny-Krahmer 2017–2021, 2021a).

The creation of the two collections of data and texts was primarily motivated by the goal of analyzing subgenres of Spanish-American nineteenth-century novels with quantitative methods. Therefore the selection of the materials was guided by the question about the specific subgenres that are the focus of interest here. However, the bibliography and the corpus also aim to provide a foundation for future analysis in other contexts. There are aspects of the corpus that are not employed in the analyses in this dissertation but are nonetheless presented as relevant for the design of corpora for digital literary genre studies. Examples are chapter structures and paragraphs that were encoded in the corpus but not considered in the analysis. Another example is the separation of direct speech and narrated text, which was realized only for a part of the corpus and was analyzed only on a test basis. Such additional encoding prepares for future analyses beyond the scope of this dissertation. In addition, some structural units of the corpus have already been used for analyses in the CLiGS project, although they are not the focus of the work here.9 The digital text corpus created here thus claims to go beyond limited, project-specific use. It aims to be a community data collection that can be used by different representatives of a research community, is suitable for addressing different questions from a specific research field, is comprehensive, follows discipline-specific standards, and is designed to be archived and reusable in the medium term (Schöch 2017a, 224; National Science Board 2005, 20–21).

Of the two resources, the bibliography constitutes the sampling frame for the novels in the corpus, which means that it represents the larger population of all the novels that were published between 1830 and 1910 in Argentina, Cuba, and Mexico. Of course, the bibliography does not contain information about all these works, as it cannot be known with certainty how many and which novels were published in that time, but it aspires to approximate that number of novels. It is then possible to compare the novels contained in the bibliography to the ones in the corpus to see how representative the latter is for the novel and its subgenres of the chosen years and countries. This is done in the first part of the analysis chapter, in chapter 4.1, “Metadata Analysis”. That chapter tackles not only the question of representativeness but also the question of which subgenres were quantitatively relevant on which discursive levels. In addition, it is analyzed how the novels can be characterized by other parameters that have a possible impact on the analysis of genre style, for instance, the narrative perspective of the novels or the decades that they were published in. The metadata analysis chapter also provides a general overview of which authors and works are included in the bibliography and corpus and to which subgenres the works are assigned. This informs potential subsequent users of the resources in detail about their content and the distribution of the content in quantitative terms.
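The comparison between sampling frame and corpus can be illustrated with a minimal sketch. It assumes that representativeness is inspected by comparing the share of each subgenre in the bibliography with its share in the corpus; the function names and all counts are hypothetical and serve only as an illustration, not as the procedure or the figures reported in chapter 4.1.

```python
def proportions(counts):
    """Turn absolute counts per subgenre into shares that sum to 1."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def proportion_gaps(frame_counts, sample_counts):
    """Share in the sample minus share in the sampling frame, per subgenre.

    Positive values mean the subgenre is over-represented in the sample,
    negative values mean it is under-represented.
    """
    frame = proportions(frame_counts)
    sample = proportions(sample_counts)
    labels = sorted(set(frame) | set(sample))
    return {label: sample.get(label, 0.0) - frame.get(label, 0.0)
            for label in labels}


# Hypothetical counts, invented for this illustration only.
bibliography = {"historical": 200, "sentimental": 150, "customs": 50}
corpus = {"historical": 40, "sentimental": 30, "customs": 30}

gaps = proportion_gaps(bibliography, corpus)
for label, gap in gaps.items():
    print(f"{label}: {gap:+.3f}")
```

In this invented example, the novel of customs would be over-represented in the corpus relative to the bibliography, while the other two subgenres would be slightly under-represented.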

The second part of the analysis chapter, chapter 4.2, “Text Analysis”, is concerned with the text analysis of the corpus of 256 novels. Two main types of stylistic features are employed in the analysis: most frequent words (MFW) and topics. In the first part of the text analysis chapter (4.2.1, “Features”), both types of features are presented and it is discussed how they relate to literary concepts of style and theme. In the second part of the analysis chapter (4.2.2, “Categorization”), the texts are categorized, first by statistical classification and then with a family resemblance network analysis as an alternative categorization approach. The novels are analyzed on two discursive levels of genre: thematic subgenres and literary currents. Only the subgenres and currents that are most relevant in quantitative terms are analyzed in this part. One goal of the text analysis is to show in empirical experiments how statistical classification and network analysis can be employed to analyze genres on the textual level in terms of different categorical concepts. Another goal is to find out if the conventionally, historically, and theoretically defined thematic subgenres and literary currents can be captured at all on the stylistic level of a group of texts, and if yes, how textually coherent the groups of novels associated with these subgenres are. In the classification setting (chapter 4.2.2.1, “Classification”), textual coherence means the degree to which the communicatively established subgenre classifications of the novels can be captured accurately in terms of textually defined classes, and it is measured in terms of classification accuracy. A further question is what can be learned about the subgenres and the individual texts from the errors that the classifier makes.
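How textual coherence can be expressed as classification accuracy may be sketched as follows. This is a minimal illustration under stated assumptions: the toy texts are invented, the features are relative frequencies of the most frequent words, and the nearest-centroid classifier with Manhattan distance is a stand-in chosen for brevity, not necessarily the classifier used in chapter 4.2.2.1. All function names are hypothetical.

```python
from collections import Counter


def mfw_vocabulary(docs, n):
    """Return the n most frequent words across all tokenized documents."""
    totals = Counter()
    for tokens in docs:
        totals.update(tokens)
    # Sort by descending count, then alphabetically, to keep ties deterministic.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:n]]


def relative_frequencies(tokens, vocab):
    """Represent one text as relative frequencies of the vocabulary words."""
    counts = Counter(tokens)
    return [counts[word] / len(tokens) for word in vocab]


def centroid(vectors):
    """Component-wise mean of a list of equally long vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def classify(vector, centroids):
    """Assign the label of the closest class centroid."""
    return min(sorted(centroids), key=lambda label: manhattan(vector, centroids[label]))


def accuracy(predicted, gold):
    """Share of texts whose predicted label equals the conventional label."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Invented toy texts with conventionally assigned subgenre labels.
train = [
    ("historical", "el rey y la guerra y el rey".split()),
    ("historical", "la guerra del rey y el soldado".split()),
    ("sentimental", "el amor y la carta y el amor".split()),
    ("sentimental", "la carta del amor y el llanto".split()),
]
test = [
    ("historical", "el rey y el soldado".split()),
    ("sentimental", "el amor y la carta".split()),
]

vocab = mfw_vocabulary([tokens for _, tokens in train], n=8)

by_label = {}
for label, tokens in train:
    by_label.setdefault(label, []).append(relative_frequencies(tokens, vocab))
centroids = {label: centroid(vectors) for label, vectors in by_label.items()}

predictions = [classify(relative_frequencies(tokens, vocab), centroids)
               for _, tokens in test]
acc = accuracy(predictions, [label for label, _ in test])
print(predictions, acc)
```

The misclassified texts, if there are any, are the cases from which something can be learned about the subgenres: comparing `predictions` with the conventional labels text by text shows which novels are textually closer to another subgenre than to their own.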

Besides the statistical classification approach, a family resemblance analysis (chapter 4.2.2.2, “Family Resemblance: Network Analysis”) is pursued. While a classificatory approach assumes strict boundaries between the various groups of texts, in a network structure, the focus is on direct and indirect relationships between groups of novels, and the results are more open. In this context, the question of textual coherence refers to the extent to which textually based groups of novels in the network are also related to the same genre or subgenre of novels from a communicative perspective. In this case, coherence cannot simply be measured with an accuracy value but must be assessed by evaluating and interpreting the clusters found in the network. That way, the family resemblance network analysis can also answer questions about the internal structure of subgenres, and it takes into account factors other than the genre that may influence the groupings of texts found in the network.
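The contrast with classification can be made concrete in a small sketch of a similarity network. It assumes that two texts are linked when the cosine similarity of their feature vectors reaches a threshold and that clusters are read off as connected components; the feature vectors, the threshold of 0.9, the novel identifiers, and the function names are all hypothetical, and the network construction used in chapter 4.2.2.2 may differ.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two equally long vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_network(vectors, threshold):
    """Link every pair of texts whose similarity reaches the threshold."""
    edges = {name: set() for name in vectors}
    for a, b in combinations(sorted(vectors), 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            edges[a].add(b)
            edges[b].add(a)
    return edges


def connected_components(edges):
    """Group texts that are linked directly or indirectly."""
    seen, groups = set(), []
    for start in sorted(edges):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(edges[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Invented feature vectors and conventional subgenre labels.
vectors = {
    "novel_a": [1.0, 0.0, 0.1],
    "novel_b": [0.9, 0.1, 0.2],
    "novel_c": [0.8, 0.3, 0.1],
    "novel_d": [0.1, 1.0, 0.0],
    "novel_e": [0.0, 0.9, 0.2],
}
labels = {
    "novel_a": "historical",
    "novel_b": "historical",
    "novel_c": "sentimental",
    "novel_d": "sentimental",
    "novel_e": "sentimental",
}

network = build_network(vectors, threshold=0.9)
groups = connected_components(network)
composition = [dict(Counter(labels[name] for name in group)) for group in groups]
for group, counts in zip(groups, composition):
    print(group, counts)
```

In this toy network, one cluster mixes two subgenre labels. Such a result cannot be summarized by a single accuracy value; it has to be interpreted, for instance by asking which other factors bring the conventionally sentimental text close to the historical ones.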

Just as for the digital bibliography and text corpus, all Python and XSLT scripts used to perform the analyses and all associated data are published on GitHub and Zenodo in script and data repositories (Henny-Krahmer 2021a, 2021b, 2021c, 2021d). From the text of this dissertation, links are always provided to the relevant individual scripts and data in these repositories. Selected result data are also included directly in the text in the form of XML examples, tables, and figures. This book is, therefore, to be understood as an enhanced monograph: the text is a chain of argumentation and a narrative that leads through the data and scripts and becomes complete only with them. In addition, the text of this dissertation itself has been encoded in TEI and is available in a web-based HTML format and as a PDF.10 Finally, it must be said that this work was submitted as a dissertation in early 2021. Updates could be made only partly, so in essence, the contents reflect the state of research at the time of submission.11

1 For an overview of the categorization aspect of genres, see Zymner (2003, 99–104).

2 For an introduction to the background and goals of digital literary stylistics, see the website (SIG-DLS n.d.) of the corresponding special interest group of the Alliance of Digital Humanities Organizations (ADHO).

3 See the call for papers (CLiGS n.d.) and the conference proceedings to be published in 2023 (Hesselbach et al., forthcoming).

4 See, for instance, the influential studies of Jockers (2013) and Underwood (2019).

5 One outcome of the project is the Textbox, a collection of small to medium-sized corpora of literary texts in Romance languages of different genres, which are published on GitHub and free to reuse (Schöch, Calvo Tello et al. 2018, 2019). Beyond the Textbox, the following more extensive individual corpora resulting from the CLiGS project are worth mentioning: the “Corpus of Novels of the Spanish Silver Age” (CoNSSA, Calvo Tello 2021a) and a text collection of over 800 French dramatic texts (Schöch 2017b) derived from the corpus Théâtre Classique (Fièvre 2007–2022). The latter is also available as part of the multilingual DraCor corpus, where it is called FreDraCor (Milling, Fischer, and Göbel 2021).

6 For general literary histories on Spanish-American literature that also cover the nineteenth-century novel and for specialized monographs, see, among others, Alegría (1959), Anderson Imbert (1954), Dill (1999), Gálvez (1990), Goić (2009), Íñigo Madrigal, Alvar, and Aínsa (1982), Lindstrom (2004), Rössner (2007), and Sánchez (1953).

7 Rivas (1990), for instance, establishes the concept of the anti-slavery novel based on seven different novels. Gnutzmann (1998) likewise studies the Argentine naturalistic novel with a corpus of seven texts.

8 For example, Löfquist (1995) on the Chilean historical novel, Read (1939) on the Mexican historical novel, or Schlickers (2003) on the Spanish-American naturalistic novel. Another approach is to consider the novel as a whole for an individual country and for a certain period. Lichtblau (1959), for example, studies the nineteenth-century novel in Argentina, and Molina (2011) the Argentine novel between 1838 and 1872.

9 There are two studies based on subparts of the corpus in which the internal structure of the texts was exploited: Schöch, Henny et al. (2016) on the development of topics in different parts of the novels, depending on the subgenres, and Henny-Krahmer (2018) on the connection of sentiments and direct speech versus narrated text in different subgenres.

10 The web-based edition of this dissertation can be accessed at https://side17.i-d-e.de/.

11 In the meantime, for example, the dissertation of my co-doctoral student José Calvo Tello from the CLiGS project has been published (Calvo Tello 2021b), the content of which could not be considered here because the dissertations were prepared at the same time. Due to the joint research project in which the two theses were written, there are, of course, common foundations and references between them.

2 Concepts

A computational stylistic genre analysis of Spanish-American novels builds on terms and concepts from several disciplines. These must be clarified and related to each other, which is the goal of this chapter, in which genre-theoretical aspects, concepts of literary style, and literary-historical basics on the Spanish-American novel are discussed. In the first part of this chapter (2.1), concepts of literary genre are approached. First, it is outlined which scholarly disciplines are concerned with genre studies, which ones are relevant for digital genre stylistics, and how they relate (2.1.1). Then three issues that have caused much debate in literary genre theory are discussed, namely the ontological status and relevance of genres (2.1.2), the relationship between systems or theories of genres and their history (2.1.3), and three main types of concepts for genres as categories – logical classes, prototype categories, and family resemblance networks (2.1.4). All of these theoretical issues are related to the practices of digital stylistic genre analysis to find out which genre-theoretical concepts are useful and applicable in that field and how literary genre theory and computational genre stylistics interact. In the second main part of this chapter (2.2), a working definition of literary style is presented as a basis for analyzing metadata and text in the empirical part of the thesis. In the last part, in section 2.3, literary-historical background information is given for three major thematic subgenres and three literary currents of nineteenth-century Spanish-American novels to formulate hypotheses and establish a basis on which they can be analyzed textually.

2.1 Literary Genres

2.1.1 Disciplinary Locations of Genre Studies

In general language, the term “genre” is used to designate kinds of communicative acts that may be written, spoken, or otherwise represented. The term designates not individual instances of communicative acts but the characteristics of groups of them. Genres may be of any sort of communication, for example, instruction manuals or podcasts, but in most cases, “genre” refers to forms of art such as kinds of works in the visual arts, performing arts, music, and literature, the latter being at the center of interest here, more precisely in their written form. This investigation thus focuses on literary genres.12 In a general sense, literary genres can be understood as groups of literary texts that share a group name or can be referred to with one because they have something in common. For example, Agatha Christie’s “Murder on the Orient Express”, Henning Mankell’s “Innan frosten”, or Mario Vargas Llosa’s “Lituma en los Andes” can all be considered novels and, more specifically, crime novels. There has been much debate in literary studies about what the genre names are or should be, what the commonality of the texts belonging to a genre is, and what role genres play for literary texts at all. The investigation of literary genres is an old but still central problem of literary studies, whether on a theoretical or historical level. The discussion about genres can be dated back at least to antiquity, and often, Aristotle’s Poetics from c. 335 BCE is cited as one of the initial texts concerned with genre theory.13 Still today, there is an ongoing debate on the definition of genres, both in the sense of general concepts and on the level of concrete individual genres, as the vast literature on genre theory and the history of genres shows.14

However, literary genres have not only been investigated in literary studies themselves but also within the broader context of textual genres and text classes, for example, in general linguistics, computational linguistics, and information science. While in literary studies, genres are usually understood as kinds of literary works, in linguistics, they tend to be conceived as all sorts of texts, also non-literary ones, and are therefore often referred to as “text types”.15 In the field of computational processing of text, there is a tradition, especially in computational linguistics, of describing, detecting, and distinguishing genres and types of text.16 In computer science, the task of automatically grouping different kinds of texts has been pursued under the labels of “text categorization” or “text classification”.17 The term “categorization” is used in different ways in computer science. Sometimes it is understood as equivalent to “classification”, and in other cases, it is only used for unsupervised methods such as clustering.18 Here, in contrast, the term “categorization” is used in a more general sense to comprise all different kinds of category building. This is the sense of the term that is usually used in literary genre theory (see Müller 2010).

The concern with literary genres, the linguistic characteristics of text types, and the computational processing of text converges in digital literary studies, computational philology, or computational literary studies and more specifically in digital stylistics, or “stylometry”. The scope of digital literary studies is broad and comprises all points of contact between literature and the computer.19 The term “computational philology” can also be understood as a collective term for all possible uses of the computer in literary studies, with a focus on the creation and use of digital editions, for example, but also on computational text analysis (Jannidis 2007; 2010, 109). Computational literary studies, on the other hand, is a newer term for a subfield of the digital humanities in which a particular emphasis is placed on quantitative text analysis methods.20 Digital stylistics, in turn, focuses on studying style with digital methods. Stylistics can be defined as “a sub-discipline of linguistics that is concerned with the systematic analysis of style in language and how this can vary according to such factors as, for example, genre, context, historical period and author” (Jeffries and McIntyre 2010, 12). The paradigmatic case of a digital stylistic study is authorship attribution, i.e., the use of statistical methods to clarify cases of anonymous or disputed authorship. However, quantitative digital methods have also recently been used for genre stylistics.21 It should be added that stylistics is also a sub-discipline of literary studies when its methods are applied and developed in the context of literary scholarship, especially because style is considered an important characteristic of literary texts (Spillner 2001, 234).

The present study, which aims to create and analyze a corpus of nineteenth-century Spanish-American novels and their subgenres, is situated in the field of quantitative digital literary studies, computational literary studies, or, more precisely, digital genre stylistics. Therefore, the theoretical discussions of genre in general literary studies are only one point of reference. Still, they constitute a central theoretical frame for analyzing literary genres in digital stylistics, and it has to be clarified which aspects of genre can be and usually are analyzed with the text analytical digital approach.

Three issues that have been at the center of genre-theoretical discussions in the twentieth and twenty-first centuries are taken up here and related to questions of the design and analysis of digital corpora of literary texts in terms of genre: the question of the ontological status of genres (are they just abstract terms, or do they exist?) and their relevance, the debate about the relationship between systematic descriptions and definitions of genres and their historical manifestations, and the question of the type of category that genres can be conceived as.22 These issues are considered especially relevant for literary texts and genres. They are interrelated because they all center around the question of the individuality of texts and the variability of the characteristics of text groups. The following chapters serve to address these essential issues of literary genre theory and relate them to digital genre stylistics.

2.1.2 Ontological Status and Relevance of Genres

The first of the controversial issues of twentieth-century literary genre theory that is taken up here is the question of whether genres actually exist. Another question related to it is whether genres are a relevant category for literary analysis at all because if they did not exist, why should they be investigated? Both in the early and late twentieth century, there were theoretical approaches that fundamentally questioned the relevance of genres. According to nominalistic positions, generic terms are just abstract labels to aggregate and subsume similar texts, but genres do not exist. On the other hand, representatives of realistic positions argue that genres exist independently of individual texts, for example, as psychological dispositions or anthropologically basic world views (Zipfel 2010, 213–214). In his book “Gattungstheorie. Information und Synthese”, which was published in 1973, Hempfer discusses both kinds of positions in detail by surveying a broad range of approaches that can be subsumed under the labels “nominalistic” versus “realistic”. An important early critic of considering art in terms of genre was Croce, who emphasized the uniqueness and individuality of works of art as a result of the aesthetic and creative impetus of human activity. He considers genres useless and views them as intermediate pseudo-concepts between the individual and the universal, unable to capture or describe the individual expressions (Hempfer 1973, 38–41). Genre categories were also questioned later, in particular in post-structuralist theories. For example, Derrida finds that literary texts essentially break rules, while genres start from the opposite idea of a set of normative rules for text production and reception.23 Still, he indirectly also recognizes the relevance of genre for the production and reception of literary works by stating that texts participate in genres even if they cannot be neatly assigned to them:

Before going about putting a certain example to the test, I shall attempt to formulate, in a manner as elliptical, economical, and formal as possible, what I shall call the law of the law of genre. It is precisely a principle of contamination, a law of impurity, a parasitical economy. In the code of set theories, if I may use it at least figuratively, I would speak of a sort of participation without belonging—a taking part in without being part of, without having membership in a set. (Derrida 1980, 59)

According to Derrida, texts usually mark their relationship to genres, and for literature, he even sees this characteristic as necessary.24 This remark can be made consciously or unconsciously, explicitly or implicitly, it can be made relative to several different genres, and it can be made in ways undermining the referenced genre, “mendacious, false, inadequate, or ironic” (Derrida 1980, 64). Frow interprets Derrida’s critique of genre as rooted in a very specific concept of it – one that relates genre to prescription and taxonomic endeavors (Frow 2015, 28) – but that is not without alternatives:

The conception of genre that I have been working towards here represents a shift away from an ‘Aristotelian’ model of taxonomy in which a relationship of hierarchical belonging between a class and its members predominates, to a more reflexive model in which texts are thought to use or to perform the genres by which they are shaped. (Frow 2015, 26–27)

Another direction of the post-structuralist critique of genre is the one developing the concept of écriture, which was initially formulated by Barthes. He defines écriture as a level between language and style, on which authors can express themselves individually and consciously, engaging in the history of literature and pursuing social intentions. Language, in turn, is naturally given to the writers of a certain period and linguistic context, and it works as a prescriptive and habitual frame. Style, on the other hand, is an individual characteristic of each writer and is just as little controlled as the general language use (Barthes [1953] 2002, 16–18). Compared to genre, the concept of écriture focuses more on the singularity of texts, their individual interrelationships, and the writing process. From this viewpoint, genres are seen as mere terms that suggest clear differentiations where, in fact, the texts interrelate more freely and openly. In this sense, the idea of écriture is linked to recent theories of intertextuality. Nevertheless, as in Derrida’s law of genre, the genres remain a point of reference when texts allude to generic terms and conventions, even if only to break them (Schmitz-Emans 2010, 107–109).

On the realistic side, there are, amongst others, normative and anthropological conceptions of genre, but also communicative and semiotic approaches, including conceptualist positions.25 In general, communicative theories assume that genres exist as concepts that influence the production and reception of literary works. In a narrower sense, communicative genre theories are linguistically oriented. In a wider sense, theories that emphasize the social functions of genres can also be subsumed under this term. An influential proposition was Voßkamp’s idea to describe genres as “literary-social institutions” that undergo stabilization and dissolution processes and in which socio-historical communicative needs are condensed in a particular time and place. As such, genres are communicative models that are not mere text-internal literary phenomena but determined by a broader societal context (Voßkamp 1977, 30, 32; Zipfel 2010, 215). In semiotically oriented communicative genre theories, texts, genres, and generic terms are all conceived as complex linguistic signs, and genres can be understood as conventionalized models of an intended message or reality (Raible 1980, 324–326). It is assumed that such conventions and models influence authors producing literary texts and that readers, in their turn, use them to categorize and make sense of individual literary works. That way, genres become part of the communicative process and manifest themselves in it without being equated with a particular part of the process. Statements on and expectations about genres are controlled and triggered through generic signals that can accompany literary texts, be inscribed into them, and interpreted from them.

According to Hempfer, genres are only truly communicatively and semiotically determined if they are understood as a precondition for the comprehension of literary texts that authors are forced to take into account and not only as historically possible but not necessary options of communication (Hempfer 1973, 90–92). It follows from this that, communicatively speaking, literary works cannot be without genre. It does not mean, though, that every work needs to be associated with exactly one genre on one specific level. On the contrary, texts can be influenced by several genres and also on different levels of generality. The previously mentioned “Murder on the Orient Express” and “Lituma en los Andes” can be interpreted as instances of crime novels and, at the same time, novels and, more generally, narrative. However, “Murder on the Orient Express” can also be analyzed more specifically as a “detective novel” and “Lituma en los Andes” has also been assessed as a “novela indigenista” (Martínez Cantón 2008). Then again, other texts are only framed by the genre “novel” but not a specific subtype of it. They are sometimes called “general fiction” or “literary fiction”, if the literary merit of the works is stressed.26 As Raible puts it: “To see a work as an exemplar of a genre means to place it in a series of works that are analogous to a precedent” (Raible 1980, 334). One work alone does not constitute a genre, but when it is produced and received according to communicative models that have formed and have been formed by other works, it becomes part of a system of generic conventions.

If texts that participate in genres – to speak in Derrida’s terms – are understood as communicative objects, they should be described both on the level of the communicative situation and on the level of the textual sign itself. This means that both text-external features, for example, the time and place of its publication, and text-internal features, such as certain elements of content or style, determine how a text participates in a genre. Text-external factors can considerably determine a text’s form, and they can narrow down the possibilities of a text’s interpretation. However, literary works, especially written ones, are functionally less determined than other types of texts (Raible 1980, 334).

An approach reconciling aspects of the nominalistic and realistic conceptions of genre presented so far is Hempfer’s position, which he calls “the constructivist synthesis”. Following Piaget’s theory of knowledge, on the level of scholarly description, he sees genres as structures that emerge from the interaction between the subject that seeks to understand them and the objects to which the structure is applied. These structures constitute a process of approximation between subject and object. As Hempfer formulates it:

On the level of historical development, the ‘genres’ cannot be understood as ‘facts’ in the same sense as, say, the birth of Napoleon; rather, as is emphasized in the most diverse semiotically oriented genre theories, they are norms of communication that can be more or less interiorized. Since these norms can be read off concrete texts, however, they become ‘facts’ for the analyst and can accordingly be understood in general as faits normatifs, a term that Piaget introduced from sociology into psychology to designate analogous phenomena. In scholarly analysis, a certain description is then assigned to these faits normatifs, which as such always represents a construct arising from the interaction between the knowing subject and the object to be known. (Hempfer 1973, 125)

The more interiorized the communicative norms are, the more they approach the status of ahistorical constants (for example, knowledge about what narrative is). Hempfer aims to differentiate the ahistorical constants from historical norms that are less interiorized and more subject to open (for example, poetological) discussion and change (Hempfer 1973, 126–127).27

This study follows Hempfer’s idea that genres are not to be understood as objective facts, but as communicative phenomena that can, however, leave traces in texts. If genres are understood as norms, then such textual traces can be conceived as normative facts in Hempfer’s sense. The connection between genres as communicative norms and the texts on which they have an influence results in turn from the communicatively established assignment of texts to genres. How is it made clear that a text participates in a genre? This can be expressed, for example, through generic signals in the texts but also through signals that accompany the texts (e.g., in subtitles or paratexts). Thus, genre signals and genre names used in connection with literary works have a special significance for establishing genre affiliations. The various references and levels of meaning of such linguistic expressions of genres are broken down, in particular, in semiotic genre theories. Since genre labels are the primary means by which digital genre stylistics accesses communicative genre norms, semiotic genre models are discussed in more detail in the following chapter.

2.1.2.1 Semiotic Models of Genres

One aspect that semiotic models of genres focus on is the multilayered meanings of generic terms, which point to the many communicative levels that genres can be defined on and the complexity of genres as signs. As signs, the generic terms can be understood as models for the even more complex models that the genres themselves are conceived as (Raible 1980, 334). Two semiotic models for generic terms are presented in more detail here. These are used as a basis for an empirically established discursive model of subgenre terms for the corpus of nineteenth-century Spanish-American novels created and analyzed in the context of this dissertation.28 The first of the two models has been formulated by Raible (1980, 342–345) and involves six dimensions from which generic terms usually draw their meaning and classificatory features:

the communicative situation between sender and recipient (“Kommunikationssituation”)

the object area of the texts involving persons and things (“Objektbereich”)

the higher order structure of texts (“übergeordnete Ordnungsstruktur”)

the relationship between text and reality (“Verhältnis zwischen Text und Wirklichkeit”)

the communicative medium that the text uses (“Medium”)

the way of linguistic representation (“sprachliche Darstellungsweise”).