DIFFUSIONS IN ARCHITECTURE
A guide to diffusion models and their impact on design, with insight into how this novel artificial intelligence technology may disrupt the industry
Diffusions in Architecture: Artificial Intelligence and Image Generators delves into the impact of generative AI models on architectural design and aesthetics. The book presents an in-depth analysis of how these new technologies are revolutionizing the field of architecture and changing the way architects approach their work. The architects presented in the book focus on the application of specific AI techniques and tools used in generative design, such as diffusion models, DALL-E 2, Stable Diffusion, and Midjourney. It discusses how these techniques can generate synthetic images that are both realistic and imaginative, creating new possibilities for architectural design and aesthetics.
Twenty-two leading designers and theorists offer their insights, providing disciplinary depth by covering the full impact of these learning tools on architecture.
Page count: 413
Year of publication: 2024
Cover
Table of Contents
Title Page
Copyright
Acknowledgments
Preface
Introduction to Diffusion in Architecture
The Terms
AI as a Cultural Perception
From Representation to Prediction: AI Images and Media History
Mapping Between Media
Simulation and Originality
Content vs Style in AI Images
Prologue: The Weird Ontology of Diffusion Models
Ontology of Image Generators
A Linguistic Turn in Architecture
Suppositions
Commorancies
Vestures
Estrangements
Suppositions
Rice or Pasta? Choose Your AI
Taxonomy or the Differences Between Humans and Machines Organizing Data
Diffusions
Pictorial Turn
Diffusion
Learning and Unlearning
Blind Eyes
Text Prompts
Guilty Pleasures
The Labors of AI
The Latent Space of Labor: Hidden but Visible
Giving Credit in the Wild West of Data Acquisition
Foucault, Archaeology, and Dataset
Digital Unions
An Artificial Tale
Resistance Is Fertile: Three Directions for Friction
Dataset as Archive
Tune the Dataset
Explore the Potential for Pleasure and Fear
Revive, Like a Monster
Ontology of Diffusion Models: Tools, Language and Architecture Design
Methods: Basics of Contemporary Diffusion Models
About Tools That Learn
Idea as Tool and Tool as Idea
Where There Are Tools, Technique Is not Far
Tools That Make Tools
The Limit of My Language Is the Limit of My World, or ‐ Locke vs Wittgenstein
Anatomy of a Prompt
Commorancies
The Etiology of a New Collective Architecture
Jellyfish Housing
Machinic Housing Domains
Bio Wood Series
Noise, Pixels, and Ancient Metamorphic Rocks: A Zoo of Alpine Villas
Villa near Hallstatt
Villa, Zugspitze, Tyrol, Austria
Variations on the Topic of the Alpine Villa
Villa, Zugspitze, Tyrol, Austria
Villa above the Rotelstein, Salzburg, Austria
Timberpunk, Prompt Odyssey, Synthetic Ground, and Other Projects
TimberPunk
Prompt Odyssey
Fluidity Tangle
Swirling Steel
Synthetic Ground
Computation Past Forward: The Endless Recurring of the New
Ars Mechanica: Tools for Making and Tools for Thinking
A Drawing Is Not a Building
Algebraic Calculus and Analytical Geometry
Synthetic Intelligence
Diffusion Models: A Historical Continuum
Vestures
Design as a Latent Condition
Prompt Encoding
AI, Architecture, and Art
AI Affecting Architects and Artists
AI and Personal Design Styles
AI as a Poetry Instrument
The Advent of Trees in Architecture or the Reversal of Autonomy with Large‐Scale Models
Designing with Cities
Modeling Circular Living
Foresting Architecture
Models to model
Five Points of Architecture and AI
The Challenge of Bias
The Cultivation of Sensibility
The Crisis of Labor
The Freedom of Incoherence
The Redefinition of Authorship
De‐Coding Visual Cliches and Verbal Biases: Hybrid Intelligence and Data Justice
Human–Machine Intelligence and Empty Signifier
Generative AI and Geo‐Specificity
Data Justice and the Aesthetics of Incomplete Hybrids
Where, Why, and How Tetrahedron Voxels Explode?
Geo‐Tectonics and Blended Modalities of Representation
Hybrid Picture Planes
Light Architecture: The Future of Nonhuman Spaces via Circular Thinking for Urban Lifestyle
Synthetic Nature and Synthetic Light
Trees in Daylight
Austin City Streetscapes
Driftwood Rock
Timber Beach Houses
Timber Forest Houses
Material Driven Forms
Digital Spoliare
Estrangements
Imago Mundi Imaginibus Mundi
Theatrum Mundi
Portable Theatrum Mundi
Clouds
Hybrids
Architectural Fire
Flying Machinery
Machines Producing Machines
3D Diffusion or 3D Disfiguration?
3D‐Diffusion Pandas as Figure‐Ground Body Studies
3D‐Diffusion House as Volume Studies
3D‐Diffusion House as Surface Studies
3D‐Diffusion Chair as Physical Studies
Role Play
You and AI
Pick Me
The Doghouse: Exhibition Installation, MAK Vienna, Austria
The Doghouse ‐ Midjourney Sections
The Doghouse
Do Humans Dream of Furry Houses?
Furry Futures
Chinchilla Villa
Animal Architecture
Allies in Exile
Appendix
Epilogue
Language, Authorship, and Estrangement
The Wicked and the Tame
Glossary
Contributor Profiles
References
Prologue
Suppositions
Commorancies
Vestures
Estrangements
Bibliography
Index
Image Credits
About the Author
End User License Agreement
FIRST EDITION
Edited by Matias del Campo
Copyright © 2024 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.
Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates in the United States and other countries and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book. These fonts provided by Mr. G. S. Dykes: Goudy-Old-Style-Bold, GoudyHundred, GoudyOldStyleBT-Bold, GoudyOldStyleBT-Roman, GoudyOldStyleT-Bold, GoudyOldStyleT-Italic, GoudyOldStyleT-Regular
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data Applied for:
Paperback ISBN: 9781394191772
Graphic Design: Matias del Campo
Cover Image: Courtesy of Matias del Campo
This project was made possible with funding from the Taubman College of Architecture and Urban Planning at the University of Michigan.
To those who teach machines to think and feel, Who embrace the estrangements of paths yet to be taken, And show us how imagination can be real.
“Diffusion models offer a powerful new approach to generative modeling, enabling us to capture the complex dynamics of real‐world phenomena in a way that was not previously possible. As a result, we can create more nuanced and sophisticated aesthetics in our generative outputs.”
Yann LeCun
“The creative imagination is the synthesis of existing elements into a new whole.”
Henri Bergson
“A generative model is like having a genie in a bottle which can make all your wishes come true, but you are facing the problem that you still have to find out yourself what it is that you actually want.”
Mario Klingemann
“Synthetic imagination is the art of taking two ideas that have never met before and creating a new idea.”
Albert Einstein
The completion of this book is the culmination of a fast-paced journey that has been both rewarding and challenging, yet one that I would not have traversed without the invaluable support and guidance of those who have walked alongside me.
To the architects, designers, and thinkers whose creativity and passion have inspired me, I offer my heartfelt thanks. Your work has been a guiding light, illuminating the path forward for those of us who seek to push the boundaries of what is possible in reimagining the built environment through the lens of artificial intelligence. First and foremost, the architects who made this book possible: Cesare Battelli, Kory Bieg, Daniel Bolojan, Niccolo Casa, Virginia San Fratello, Soomeen Hahm, Immanuel Koh, Andrew Kudless, Elena Manferdini, Ryan Vincent Manning, Sandra Manninger, Sina Mostafavi, Rasa Navasaityte, Igor Pantic, Kyle Steinfeld, Marco Vanucci, and Dustin White, I owe you.

To the scholars and researchers who have contributed thoughtful texts unpacking the problem at hand, and whose rigorous inquiries have informed the ideas presented in this book, I am deeply grateful. Your insights have been a source of clarity and perspective, and have helped to shape my own thinking on this complex and ever‐evolving field. Thank you, Mario Carpo, Bart Lootsma, and Joy Knoblauch, for your invaluable contributions.

To the AI experts whose technical expertise has enabled the realization of new forms, functions, and aesthetics, I extend my appreciation. Your contributions have been indispensable in translating conceptual ideas into tangible realities. Alexandra Carlson, Danish Syed, Janpreet Singh, Justin Johnson, Jessy Grizzle, and the many others who have shared their knowledge within the AR2IL laboratory: I thank you.

Without the support that I have experienced through the Taubman College of Architecture and Urban Planning at the University of Michigan, this book would not have been possible. My thanks go out in particular to Dean Jonathan Massey for his continuous support of my work and to Associate Dean of Research Kathy Velikov and her team at R+CP for making this publication and many others of my endeavors possible. To Vishal Rohira, Fatima Azahra Addou, and Siya Sha, thank you for helping to make this book a reality.

To the readers who engage with this work, I invite you to journey with me through a landscape of synthetic imagination, and to explore the possibilities that arise at the intersection of architecture and AI.

Finally, to my loved ones who have supported me my entire life, providing encouragement and understanding in equal measure, I offer my heartfelt gratitude to you, Sandra Manninger, my wife, partner in crime, and inspiration for everything I do. Mother, sister: your unwavering love has been a constant source of motivation and has sustained me through the highs and lows of every creative process.
In the words of Ernest Hemingway, The world breaks everyone, and afterward, some are strong at the broken places. This book is the product of the creative and intellectual challenges that have tested me, and the support that has strengthened me. To all those who have helped me on this journey, I extend my deepest thanks.
With gratitude and humility,
Matias del Campo
Lev Manovich
This book is a unique document of the beginning of a true revolution in cultural imagination and creation. This revolution has been in development for over 20 years. The first AI papers proposing that the web universe of texts, images, and other cultural artifacts could be used to train computers to do various tasks appeared between 1991 and 2001. In 2015, Google's DeepDream and style transfer methods attracted a great deal of attention: suddenly computers could create new artistic images mimicking the styles of many famous artists. The release of DALL‐E in 2021 was another milestone: now computers could synthesize images from text descriptions. Midjourney, Stable Diffusion, and DALL‐E 2 all contributed to the acceleration of this evolution in 2022. Synthetic images could now have many aesthetics, ranging from photorealism to any kind of physical or digital medium, including mosaics, oil paintings, street photography, or 3D CG rendering. The code for producing such images (also referred to as a model in the field of artificial intelligence) was made public in August 2022, sparking a flurry of experiments and accelerating development.
I’ve been using computer tools for art and design since 1984, and I’ve seen a few major media revolutions, including the introduction of Mac computers, the development of photorealistic 3D computer graphics and animation, the rise of the web after 1993, and the rise of social media sites after 2006. The new AI generative media revolution appears to be as significant as any of them. Indeed, it is possible that it is as significant as the invention of photography in the 19th century or the adoption of linear perspective in Western art in the 16th century. In what follows, I will describe a few aspects of AI visual generative media (as of Spring 2023) that I believe are particularly significant or novel. But first, let’s define our terms.
In this introduction, artist or creator refers to any skilled person who creates cultural objects in any media or their combinations – architects and designers are included. Indeed, because many architects and architecture students are now experimenting with AI image and animation generation, the old concept of paper architecture is resurfacing. Let’s face it: building architecture is only a small part of what architects imagine, design, and debate. Images that articulate new ideas, aesthetics, ideals, and arguments have always been the primary focus of the architecture field, and we should not be ashamed of this. So, in my opinion, this book contains real architecture because it raises many interesting questions and points of view. It is unimportant whether some of the ideas expressed in these images will be realized as architecture for VR or physical spaces.
The terms generative media, AI media, generative AI and synthetic media are all interchangeable. They refer to the process of creating new media objects with deep neural networks, such as images, animation, video, text, music, 3D models and scenes, and other types of media. Neural networks are also used to generate specific elements and types of content, such as photorealistic human faces and human poses and movements, in addition to such objects. They can also be used in media editing, such as replacing a portion of an image or video with another content that fits spatially. These networks are trained on vast collections of media objects already in existence. Popular artificial neural network types for media generation include diffusion models, text‐to‐image models, generative adversarial networks (GAN), and transformers. For the generation of still and moving images using neural networks, the terms image generation, synthetic image, AI image, and AI visuals can be used interchangeably.
Note that the word generative can also be used in different ways to refer to making cultural artifacts using any algorithmic processes (as opposed to only neural networks) or even a rule‐based process that does not use computers. This is how the phrases generative art and generative design are typically used today in cultural discourse and popular media. Here, I am using generative in a more restrictive way to designate deep network methods and apps for media generation that use these methods.
There is no one specific technology or a single research project called AI. It is our cultural perception that evolves over time. When an allegedly uniquely human ability or skill is being automated, we refer to it as AI. As soon as this automation is successful, we stop referring to it as AI. In other words, AI refers to technologies and methodologies that are starting to function but aren't quite there yet.
AI was present in the earliest computer media tools. The first interactive drawing and design system, Ivan Sutherland's Sketchpad (1961–1962), had a feature that would automatically finish any rectangles or circles you started drawing. In other words, it knew what you were trying to make. So this was undoubtedly AI already. My first experience with a desktop paint program running on an Apple II was in 1984, and it was truly amazing to move your mouse and see simulated paint brushstrokes appear on the screen. But today, we no longer consider this AI. Or take the Photoshop function, added many years ago, that automatically selects the outline of an object – this, too, is AI. The history of digital media systems and tools is full of such AI moments – amazing at first, then taken for granted and forgotten as AI after a while. (In AI history books, this phenomenon is referred to as the AI effect.) At the moment, creative AI/artistic AI stands for recently developed methods where computers transform some inputs into new media outputs (e.g., text‐to‐image models) and specific techniques (e.g., certain types of deep neural networks). However, we must remember that these methods are neither the first nor the last in the long history and future of simulating human art abilities or assisting humans in media creation.
Historically, humans created images of existing or imagined scenes by a number of methods, from manual drawing to 3D CG (see below for an explanation of the methods). With AI generative media, a fundamentally new method emerges. Computers use large datasets of existing representations in various media to predict new images (still and animated). I can certainly propose different historical paths leading to visual generative media today, or divide one historical time line into different stages, but here is one such possible trajectory:
Creating representations manually (e.g., drawing with various instruments and carving). More mechanical stages and parts were sometimes executed by human assistants, typically trained in the master's studio – so there is already some delegation of functions.
Creating manually but using assistive devices (e.g., perspective machines and camera lucida). From hands to hands + device. Now some functions are delegated to mechanical and optical devices.
Photography, X‐ray, video, volumetric capture, remote sensing, photogrammetry. From using hands to recording information using machines. From human assistants to machine assistants.
3D CG. You define a 3D model in a computer and use algorithms that simulate effects of light sources, shadows, fog, transparency, translucency, natural textures, depth of field, motion blur, etc. From recording to simulation.
Generative AI – using media datasets to predict still and moving images. From simulation to prediction.
Prediction is the term AI researchers themselves often use in publications describing visual generative media methods. While the word can be used evocatively, it also describes what literally happens when you use image‐generative tools: if you are working with a text‐to‐image model, the network attempts to predict the images that correspond best to your text input.
I am certainly not suggesting that the other, already accepted terms such as generative media are bad. But if we want to better understand the difference between AI visual media synthesis methods and the other representational methods developed in human history, prediction captures this difference well.
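To make the idea of prediction concrete, here is a minimal sketch of text‐to‐image generation. It assumes the openly released Stable Diffusion weights and the Hugging Face diffusers library as a stand‐in for whatever tool a reader actually uses; the prompt and file name are hypothetical.

```python
# A minimal text-to-image sketch, assuming the open Stable Diffusion weights,
# the Hugging Face `diffusers` library, and a CUDA-capable GPU.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The network starts from random latent noise and iteratively denoises it
# toward an image it predicts will best correspond to the text prompt.
prompt = "an alpine timber villa above a fog bank, overcast light, 35mm photograph"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("predicted_villa.png")
```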
There are several methods for creating AI media. One method transforms human media input while retaining the same media type. Text entered by the user, for example, can be summarized, rewritten, expanded, and so on. The output, like the input, is a text. Alternatively, in the image‐to‐image generation method, one or more input images are used to generate new images.
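The image‐to‐image method mentioned above can be sketched in the same spirit. This is an illustrative example only, again assuming the diffusers library; the file names and prompt are placeholders, not anyone's actual workflow.

```python
# Illustrative image-to-image sketch, assuming the Hugging Face `diffusers`
# library; file names and prompt are placeholders.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("hand_sketch.png").convert("RGB").resize((768, 512))

# `strength` controls how far the output may drift from the input image:
# low values stay close to the sketch, high values let the model reimagine it.
result = pipe(
    prompt="concrete housing block overgrown with moss, late afternoon light",
    image=init_image, strength=0.6, guidance_scale=7.5,
).images[0]
result.save("reimagined_sketch.png")
```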
However, there is another path that is equally intriguing from the historical and theoretical perspectives. AI media can be created by automatically "translating" content between media types. Because this is not a literal one‐to‐one translation, I put the word translation in quotes. Instead, input from one medium instructs a neural network to predict the appropriate output in another. Such input can also be said to be mapped to some outputs in other media. Text is mapped into new styles of text, images, animation, video, 3D models, and music. Video is converted into 3D models or animation. Images are translated into text, and so on. Text‐to‐image translation is currently more advanced than the other methods, but they will catch up eventually. Translations (or mappings) between one medium and another were done manually throughout human history, often with artistic intent. Novels have been adapted into plays and films, and comic books have been adapted into television series. A fictional or nonfictional text is illustrated with images. Each of these translations was a deliberate cultural act requiring professional skills and knowledge of the appropriate media.
Some of these translations can now be performed automatically on a massive scale thanks to AI, becoming a new means of communication and culture creation. What was once a skilled artistic act is now a technological capability available to everyone. We can be sad about everything that will be lost as a result of the automation and democratization of this critical cultural operation – skills, originality, deep creativity, and so on. However, any such loss may be only temporary if the development of cultural AI improves its ability to be original and to understand context.
Because the majority of people in our society can read and write in at least one language, text‐to‐another media methods are currently the most popular. Text‐to‐image, text‐to‐animation, text‐to‐3D model, and text‐to‐music are among them. These AI tools can be used by anyone who can write, or by using Google Translate to create a prompt in a language these tools understand well, such as English. However, other media mappings can be equally interesting for professional creators. Throughout the course of human cultural history, various translations between media types have attracted attention. They include translations between video and music (club culture); long literary narratives turned into movies and television series; any texts illustrated with images in various media such as engravings; numbers turned into images (digital art); texts describing paintings (ekphrasis, which began in Ancient Greece); and mappings between sounds and colors (especially popular in modernist art).
The continued development of AI models for mappings between all types of media, without privileging text, has the potential to be extremely fruitful, and I hope that more tools can accomplish this. These tools can be used alone or in conjunction with other tools and techniques. I am not claiming that AI will be able to create innovative interpretations of Hamlet like those by avant‐garde theater directors such as Peter Brook, or astonishing abstract films like those by Oskar Fischinger that explored musical and visual correspondences. It is sufficient that new media‐mapping AI tools stimulate our imagination, provide us with new ideas, and enable us to explore numerous variations of specific designs.
Both the modern human creation process and the AI generative media process seem to function similarly. A neural network is trained using unstructured collections of cultural content, such as billions of images and their descriptions or trillions of web and book pages. The net learns associations between these artifacts’ constituent parts (such as which words frequently appear next to one another) as well as their common patterns and structures. The trained net then uses these structures, patterns, and culture atoms to create new artifacts when we ask it to. Depending on what we ask for, these AI artifacts might closely resemble what already exists or they might not.
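For readers who want to see the mechanics behind this training process, the following toy sketch shows the core step of a diffusion model in PyTorch: corrupt the data with noise, then train a network to predict that noise. It is a deliberately simplified illustration under stated assumptions (real models are far larger, condition on the timestep and on text embeddings, and train on billions of images), not the procedure of any particular tool named in this book.

```python
# Toy denoising-diffusion training step in PyTorch. A real model uses a large
# U-Net or transformer conditioned on the timestep and a text embedding; here a
# tiny MLP and random vectors stand in for the network and the image dataset.
import torch
import torch.nn as nn
import torch.nn.functional as F

denoiser = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 64))
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

T = 1000
betas = torch.linspace(1e-4, 0.02, T)               # noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # cumulative signal retention

def training_step(x0: torch.Tensor) -> float:
    t = torch.randint(0, T, (x0.shape[0],))          # random timestep per sample
    noise = torch.randn_like(x0)
    a = alphas_cumprod[t].unsqueeze(1)
    xt = a.sqrt() * x0 + (1.0 - a).sqrt() * noise    # forward process: add noise
    loss = F.mse_loss(denoiser(xt), noise)           # learn to predict the noise
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

for _ in range(100):                                 # random vectors stand in for a dataset
    training_step(torch.randn(32, 64))
```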
Similarly, our life is an ongoing process of both supervised and unsupervised cultural training. We take art and art history courses; view websites, videos, magazines, and exhibition catalogs; visit museums; and travel in order to absorb new cultural information. And when we prompt ourselves to make some new cultural artifacts, our own nervous networks (infinitely more complex than any AI nets to date) generate such artifacts based on what we've learned so far: general patterns we've observed, templates for making particular things (such as drawing a human head with correct proportions or editing an interview video), and often concrete parts of existing artifacts. In other words, our creations may contain both exact replicas of previously observed artifacts and new things that we represent using templates we have learned, such as color combinations and linear perspective. Additionally, both human creators and AI models frequently have a default house style (the actual term used by Midjourney developers). If you don't specify the style yourself, the AI will generate the image using this default aesthetic. A description of the medium, the kind of lighting, the colors and shading, and/or a phrase like in the style of followed by the name of a well‐known artist, illustrator, photographer, fashion designer, or architect are examples of such specifications.
Because it can simulate tens of thousands of already‐existing aesthetics and styles and interpolate between them to create new hybrids, AI is more capable than any single human creator in this regard. However, at present, skilled and highly experienced human creators also have a significant advantage. Both humans and artificial intelligence are capable of imagining and representing both nonexistent and existing objects and scenes. However, human‐made images can include particular content, certain details, and distinctive aesthetics that are currently beyond the capabilities of AI. In other words, a large group of highly skilled and experienced illustrators, photographers, and designers can represent everything a trained neural net can do (although it will take much longer), but they can also visualize objects and compositions and use aesthetics that the neural net cannot do at this time.
What is the cause of the aesthetic and content gap between human and artificial creators? Most frequently occurring cultural atoms, structures, and patterns in the training data are successfully learned during the process of training an artificial neural network. In the mind of a neural net, they gain more importance. On the other hand, atoms and structures that happen very infrequently or only once are not learned. They do not enter the artificial culture universe as learned by AI. Consequently, when we ask AI to synthesize them, it is unable to do so.
Because of this, Midjourney, Stable Diffusion, or RunwayML are not currently able to generate drawings in my style, expand my drawings by adding newly generated parts, or replace specific portions of my drawings with new content drawn in my style (e.g., perform outpainting or inpainting). Instead, AI generates more generic, common objects than what I frequently draw when I attempt to do such operations. Or it produces something that is merely ambiguous but uninteresting.
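For reference, the inpainting operation mentioned above looks roughly like this in code. This is a hypothetical sketch assuming the diffusers inpainting pipeline, with placeholder image paths; it is not a claim about how Midjourney, Stable Diffusion, or RunwayML handle any individual artist's style.

```python
# Hypothetical inpainting sketch, assuming the Hugging Face `diffusers` library;
# image paths and prompt are placeholders. As argued above, the model fills the
# masked region with what is statistically common rather than with a rare personal style.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

drawing = Image.open("my_drawing.png").convert("RGB").resize((512, 512))
mask = Image.open("region_to_replace.png").convert("RGB").resize((512, 512))  # white = repaint

result = pipe(
    prompt="dense crosshatched ink drawing of a spiral stair",
    image=drawing, mask_image=mask,
).images[0]
result.save("inpainted_drawing.png")
```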
I am certainly not claiming that the style and the world shown in my drawings are completely unique. They are also a result of specific cultural encounters I had, things I observed, and things I noticed. But because they are uncommon (and thus unpredictable), AI finds it difficult to simulate them, at least without additional training using my data. Here, we encounter the greatest obstacle we face as creators in using AI‐generated media. Frequently, AI generates new media artifacts that are more generic and stereotypical than what we intended. This may include elements of content, lighting, crosshatching, atmosphere, spatial structure, and details of 3D shapes, among others. Occasionally, this is immediately apparent, in which case you can either attempt to correct it or disregard the results. Very often, however, such substitutions are so subtle that we cannot detect them without extensive observation or, in some cases, the use of a computer to quantitatively analyze numerous images.
In other words, new AI generative media models, like the discipline of statistics since its inception in the 18th century and the field of data science since the end of the 2010s, deal well with frequently occurring items and patterns in the data, but do not know what to do with the infrequent and uncommon. We can hope that AI researchers will be able to solve this problem in the future, but it is so fundamental that we should not anticipate a solution immediately.
In the arts, the relationship between content and form has been extensively discussed and theorized. This brief section does not attempt to engage in all of these debates or to initiate discussions with all relevant theories. Instead, I’d like to consider how these concepts play out in AI’s generative culture. But instead of using content and form, I’ll use different pairs of terms that are more common in AI research publications and online conversations between users. They are subject and style.
At first glance, AI media tools appear capable of clearly distinguishing between the subject and style of a representation. In text‐to‐image models, for instance, you can generate countless images of the same subject. Adding the names of specific artists, media, materials, and art historical periods is all that is required for the same subject to be represented differently to match these references.
Photoshop filters began to differentiate between subject and style in the 1990s, but AI‐generative media tools are more capable. For instance, if you specify oil painting in your prompt, simulated brushstrokes will vary in size and direction across a generated image based on the objects depicted. AI media tools appear to understand the semantics of the representation, as opposed to earlier filters that simply applied the same transformation to each image region regardless of its content. For instance, when I used a painting by Malevich and a painting by Bosch in a prompt, AI generated an image of space that contained Malevich‐like abstract shapes as well as many small human and animal figures that were properly scaled for perspective.
AI tools routinely add content to an image that I did not specify in my text prompt, in addition to representing what I requested. This frequently occurs when the prompt includes in the style of or by followed by the name of a renowned visual artist or photographer. In one experiment, I used the same prompt in the Midjourney AI image tool 148 times, each time adding the name of a different photographer. The subject in the prompt remained always the same – an empty landscape with some buildings, a road, and electric poles with wires going to the horizon. Sometimes adding a photographer's name had no effect on the elements of a generated image that fit our concept of style, such as contrast, perspective, and atmosphere. Every now and again, Midjourney also modified image content. For example, when well‐known photographs by a particular photographer feature human figures in specific poses, the tool would occasionally add such figures to my landscapes. (Like Malevich and Bosch, they were transformed to fit the spatial composition of the landscape rather than mechanically duplicated.) Midjourney has also sometimes changed the content of my image to correspond to the historical period when a well‐known photographer created his most well‐known photographs. Here's another thing I noticed. When we ask Midjourney or a similar tool to create an image in the style of a specific artist, and the subject we describe in the prompt is related to the artist's subjects, the results can be very successful. However, when the subject of our prompt and the imagery of this artist are very different, rendering this subject in this style frequently fails.
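An experiment of this kind is easy to script when the generator exposes a programmable interface. Midjourney offers no public Python API, so the sketch below uses Stable Diffusion through the diffusers library as a stand‐in; the photographer names are placeholders, and the prompt merely paraphrases the one described above.

```python
# Sketch of a systematic prompt-variation experiment, assuming the Hugging Face
# `diffusers` library as a stand-in for Midjourney (which has no public API).
# Photographer names are placeholders; the experiment described above used 148 of them.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

base_prompt = ("empty landscape with some buildings, a road, "
               "and electric poles with wires going to the horizon")
photographers = ["Ansel Adams", "Dorothea Lange", "Stephen Shore"]

for name in photographers:
    image = pipe(
        f"{base_prompt}, photograph by {name}",
        num_inference_steps=30, guidance_scale=7.5,
        generator=torch.Generator("cuda").manual_seed(0),  # same seed isolates the style variable
    ).images[0]
    image.save(f"landscape_{name.replace(' ', '_')}.png")
```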
To summarize, in order to successfully simulate a given visual style using current AI tools, you may need to change the content you intended to represent. Not every subject can be rendered successfully in any style. This observation, I believe, complicates the binary opposition between the concepts of content and style. For some artists, AI can extract their style from examples of their work and then apply it to different types of content. But for other artists, their style and content can’t be separated. For me, these kinds of observations and subsequent thoughts are one of the most important reasons for using new media technologies like AI‐generative media and learning how they work. Of course, I had been thinking about the relationships between subject and style (or content and form) for a long time, but being able to conduct systematic experiments like the one I described brings new ideas and allows us to look back at cultural history in new ways. And this is why I’m particularly excited to write this introduction to the book that brings together the ideas, experiments, and explorations of a number of architects who are part of the current AI revolution.
Matias del Campo
The idea for this book started to take shape in July and August 2022. It was at this time that the phenomenon of employing diffusion models to speculate about architecture began to snowball, growing larger with each passing day. The proliferation of images across social media platforms, coupled with the lightning‐fast speed at which dedicated channels were emerging to showcase architectural phantasms created with the aid of text‐to‐image generators, served to cement the need to create a record of the explosive burst of diffusion models into the architecture scene. The advent of these tools marks a significant turning point in the relationship between human creativity and algorithmic intelligence (AI), opening up a new realm of architectural possibility and imbuing the field with a fresh vitality.
The aphoristic words of Ludwig Wittgenstein, The limits of my language mean the limits of my world, have taken on a new resonance in the era of natural language text‐to‐image applications powered by AI algorithms. Applications such as Midjourney, Stable Diffusion, and DALL‐E 2 have spread like wildfire throughout the architecture community, yielding thousands of stunning images. This explosion of a novel design tool has given rise to two notable outcomes. Firstly, it has produced an abundance of extraordinary images. Secondly, it provokes theoretical inquiries within the architecture discipline, suggesting the dawn of a posthuman design methodology1. The confluence of these factors has precipitated a seismic shift within the architectural landscape, fundamentally altering the relationship between the human and the machine in the act of creation.

This burgeoning field of natural language text‐to‐image applications is forging an epic shift in the architectural discourse. By deploying AI‐assisted image generators, architects are able to test the waters of the vast ocean of AI without having to bear the burden of coding neural networks from scratch or undergoing the tribulations of creating their own datasets. The advent of ChatGPT has resulted in the obsolescence of promptism and prompt engineering, making it feasible for even the most inexperienced user to come up with functional code and complex image prompts. All of this has resulted in the proliferation of astonishing images, engendering a new epoch of design that blends human creativity with algorithmic intelligence. In doing so, diffusion models have emerged as a possible new design tool, enabling architects to mine the multilayered, deep historical repositories of architectural knowledge for chimeras, capriccios, and mutants, and encouraging them to discover a new voice for the architecture of the 21st century ‐ one that is rooted in bold and visionary experimentation by default.

This moment is truly remarkable, not just for the technological innovations revolutionizing the field of architecture, but also for the profound shift in our understanding of design. By working in tandem with AI, architects are not merely creating new images, but rather probing and subverting the very bedrock of traditional design methods. One could even view these image generators as highly advanced accelerators of human ingenuity – expanding the limited possibilities of the mind and allowing humans to peek into the exotic realm of latent space. In doing so, they are opening up new avenues for creativity and experimentation, allowing architecture to transcend the constraints of the past and move into a new era of design. The processing of images through diffusion models, breaking them down to mere noise only to rebuild them as something entirely surprising, unexpected, and occasionally novel, is akin to how architects can deconstruct traditional architectural values and mount them back together into fresh ideas and concepts. Such concepts may better serve the needs of a rapidly evolving 21st century. The proliferation of this new tendency has resulted in a wealth of architectural imagery shared on social media platforms, discussed in the comments sections, and exchanged in informal online meetings. But beyond the excitement and wonder of these images, there lies a deeper discourse that needs to be explored.
A House Made of Feathers, Midjourney V.3, May 22, 2022. It was shown in the exhibition Strange in Fort Worth, Texas, in June 2022. SPAN (MdC & MS)
The ontology of this new architecture, which is based on the use of neural networks trained on pre‐existing datasets and pretrained models, has led to an important question about the originality of its outputs. Can a neural network truly create something new when it is built upon existing data? On the other end of the spectrum are, of course, the epistemological questions, which result in a fundamental challenge to the very nature of creativity in architecture. Furthermore, this inquiry goes beyond the question of whether the creations of the neural network are new or not.2 For all intents and purposes, the question of whether the results are new or not might not be relevant at all – as long as they provoke the architect to come up with a novel solution to a design problem. It also probes the extent to which these creations can transform our understanding, methodology, and representation of architecture. Can the transformer's outputs challenge our existing perceptions and practices of architecture, leading us to a new and more profound understanding of the field? This epistemological investigation, therefore, is not limited to the creation of innovative designs but also encompasses the core significance of architecture itself. The very essence of architecture as a discipline and a practice is being questioned, and the answer to this inquiry could have far‐reaching implications for the future of architecture. It forces us to think deeply about the meaning of architecture and how we can create meaningful and impactful architectural works in the age of AI.

In addition, the dawn of a new era necessitates a shift in the ethical considerations that underpin the discipline of architecture. One of these considerations is the urgent need to adopt more scientific and collective methods of design, as opposed to clinging to the outdated Romantic idea of the solitary genius. Although the concept of the star architect has largely faded,3 it still holds sway over certain territories of the field. It is therefore imperative that we re‐examine the meaning of imitation4 and encourage the ethical sharing of knowledge and ideas. The discipline must also confront the ethical implications of employing a technology that is dependent on the work of countless others. Only by doing so can it be ensured that architecture remains a socially responsible and sustainable practice.
The linguistic turn5 represents a shift away from traditional philosophical inquiries and toward an exploration of the role of language in shaping human experience and understanding. This mid‐20th century movement was influenced by the works of thinkers such as Wittgenstein,6 Austin,7 and Merleau‐Ponty.8 It has fundamentally altered thinking about language and its impact on everyday life. For the linguistic turn, language is not just a tool for communication, but rather a means by which reality is constructed and the world around us is interpreted. It is through language that we make meaning, and it is through meaning that we understand our place in the world. The impact of the linguistic turn is undeniable, as it continues to shape our understanding of the world and our place in it.

The emergence of text‐to‐image models has given rise to a linguistic turn in architecture, a renewed interest in the power of language and its ability to shape the built environment. At the core of this turn is the understanding that language is not merely a tool for communication but is in fact a fundamental component of architecture itself, one that shapes how we perceive, interact with, and understand the built environment. As such, the linguistic turn in architecture is not simply a technological phenomenon, but a cultural and philosophical one as well. It challenges us to rethink our fundamental assumptions about the nature of creativity, authorship, agency, sensibility, and the relationship between language and the world. By drawing on the work of thinkers like Foucault9 and Barthes,10 it invites us to explore the possibilities of a more democratic, collaborative, and open‐ended approach to architecture. Together, the linguistic turn and the rise of image generators have created a rich field of inquiry within architecture, one that draws on the insights of thinkers and theorists from a wide range of disciplines. By exploring the complex interplay between language, image, and the built environment, architects and theorists alike are challenging traditional modes of practice and opening up new possibilities for the future of architecture.
The book Diffusions in Architecture: Artificial Intelligence and Image Generators presents itself as a collection of images and comments by 25 architects and theorists. It is divided into four large blocks: Suppositions, Commorancies, Vestures, and Estrangements.
Beneath any hypothesis, theory, or model lies a set of silent assumptions known as suppositions. These enigmatic propositions serve as the foundation for our conceptual frameworks, guiding the trajectory of our intellectual pursuits. Like threads woven into a fabric, suppositions shape the texture and structure of our theories, often hidden but undeniably influential. They mark the beginning of our intellectual journey, and their ambiguity and uncertainty reflect the elusive nature of knowledge. While we may draw on evidence and logic, suppositions ultimately rest on our subjective perceptions and intuitions.
Commorancies are a broad collection of architectural entities that can be inhabited, ranging from houses and dwellings to temporary shelters. This archaic term allows for a flexible interpretation of inhabitation, serving as an elastic envelope for the examples presented in this section of the book. The term’s morphing abilities also allow it to be translated as place of residence or simply place. As Edward Casey argued in The Fate of Place: A Philosophical History11, house and home have broader connotations that primarily refer to their spatiality. The strange places depicted in this chapter evoke a sense of spatiality as it would occur in a physical environment, even though they are generated from numerical data. Despite being reduced to images, these places will inevitably spill over into our physical realm. However, a place never becomes merely parasitic in relation to its architectural properties, nor is it merely a by‐product of powerful image generators. It retains its own features and fate, its own local being, whether actual or virtual.
Within the taxonomy of synthetic imaginations,12 vestures are unique as they handle the challenge of the exterior, the façade, the frontage, or the wrapping of an object. Vesture13 is typically associated with clothing or attire, especially as a symbol of rank or status (like coronation robes, regalia, chasubles, dalmatics, kasayas, and so on), as well as with the covering of a specific object or surface, such as a building or piece of furniture. It can also describe the process of putting clothing on someone or something, such as dressing a doll or preparing furniture for a photo shoot.14 This chapter investigates how diffusion models tend to infuse imagery with moods and atmospheres (analogous to layers of fabric), creating a painterly and cinematic appearance that highlights the vestures of architecture.
The concept of estrangement in architecture exists within the interstices of the recognizable and the defamiliarized, the known and the enigmatic, and the ordinary and the evocative. It is a complex and nuanced phenomenon that eludes easy comprehension, necessitating a deep exploration of the language of architecture: its forms, materials, and spaces. The projects in this section live happily in this territory of interrogation. At its core, estrangement serves as a means to challenge assumptions about the world we inhabit. It involves unraveling our perception of architecture (built and unbuilt), pushing beyond the boundaries of our habitual understanding, and allowing us to see the world in a new light.
The explorations presented in this book delve into the intricate interplay between diffusion models, architectural designs, and AI. It is an inquiry that seeks to uncover the nuances of the relationship between human creativity and algorithmic intelligence. By synthesizing these two disparate elements, a novel approach to design is forged, one that is marked by a radical departure from the conventions of the past. This new design paradigm has engendered an entirely new era in which the boundaries between human and synthetic imagination have become increasingly blurred. In this way, the use of diffusion models has ushered in a new level of creativity and experimentation, providing fertile ground for the emergence of hitherto unexplored architectural possibilities.