Do you remember the iconic opening of the movie “The Matrix” (Wachowski, 1999), where a curtain of symbols transported us into a world of synthetic images and sounds? A believable, realistic environment where nothing could reveal to its inhabitants that it was unreal, except perhaps the occasional glitch manifesting as “déjà vu”.
“I know kung fu” is the line from the scene in which Neo, the protagonist of The Matrix, suddenly acquires the rules and moves of this martial art, thanks to a program preloaded into his mind: something like a Kung Fu AI.
Like Neo, you can now generate an image in the style of Van Gogh in a few seconds. Anyone can do it; anyone with a computer and an Internet connection, that is. The AI does it for you.
The symbols that encoded the world of the Matrix, and which our computers ultimately represent in the binary language of zeros and ones, have now become the meeting place of sounds, proteins, images, and newspaper articles. Artistic and scientific production converge as never before, and without complexes, in a tsunami of numerical matrices where data is related, combined, and generated.
Ever since the first computer systems, the idea of designing machines that could create on their own has always been around. The field of computational creativity is devoted specifically to studying the relationship between creativity and artificial systems, converging with disciplines as interesting as cognitive psychology.
Currently, the star algorithms in art generation are based on deep learning, a family of artificial intelligence that can generate new data from patterns and structures found in pre-existing data. What many did not suspect is that all those years of storing our photos for free, and labeling them, served not only to teach algorithms to identify the cat in the picture: thanks to the sheer number and variety of cat images “seen”, models were also trained to create versions of cats with astonishing attributes.
But is this ability to generate variations enough to call “it” art? Can an AI-generated image of a cat in the style of Johannes Vermeer be considered art, or should we demand something more? Every irruption of a new technology shakes the foundations of the figure of the work, the artist, and the creative process. In 1935, Walter Benjamin examined the work of art and the concept of aura in the age of mechanical reproduction. Today, the debate is no longer so much about the mechanical reproducibility of the original, but about whether the original is actually produced by a machine.
Generative algorithms create on the basis of patterns they have extracted from other works of art, which makes the boundary between genuinely generated and partially copied an ongoing source of debate. The intention of this text is not to pass judgment, but to discuss some points that may provide arguments for reflection.
Within digital art generation we can distinguish two approaches based on the degree of intervention of the artist. On the one hand, the artwork can be produced by programming the parameters that configure the objects in the scene, intervening at the pixel or polygon level, or by directly integrating mathematical equations that define, for example, geometric structures or physical behavior. On the other hand, we have a type of digital art that emerges from studying other works of art. Here the artist does not have to worry about mathematically parameterizing the object and/or its behavior, because an almost instantaneous result can be obtained by providing text or a reference image as input. The biggest difference between the two generative models is undoubtedly the mode of learning.
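As a minimal sketch of the first, equation-driven approach (our illustration, not part of the original text; the curve and its parameters are arbitrary choices), a few lines of Python can render a geometric structure directly from a formula:

```python
import numpy as np
import matplotlib.pyplot as plt

# Art defined directly by equations: a Lissajous curve.
# The frequencies (3, 4) and phase are our arbitrary aesthetic choices.
t = np.linspace(0, 2 * np.pi, 2000)
x = np.sin(3 * t + np.pi / 2)
y = np.sin(4 * t)

plt.figure(figsize=(5, 5))
plt.plot(x, y, linewidth=0.8, color="black")
plt.axis("off")
plt.savefig("lissajous.png", dpi=200)
```

Here the artist intervenes at the level of the equation itself; changing a single frequency reshapes the whole figure.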
If we think of the algorithm as a student, the teacher's main task is to offer it ideal learning conditions. In the field of artificial intelligence this means, on the one hand, providing representative data in terms of quantity and variability and, on the other, making sure that the student does not merely memorize but actually understands.
If you had to remember one of the following sequences, which do you think you would remember best?
– first sequence: abc, abc, abc, abc, abc, abc
– second sequence: baa, caa, abc, aba, bab, cba
You would most likely answer the first one, right? That is because you found a pattern: you memorize “abc” six times. This way of compressing information has a double advantage: the most obvious is that it requires less storage space; less obviously, it forces the extraction of hidden structures and features in our data.
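To make that intuition concrete, here is a minimal Python sketch (ours, not the article's) that compresses both sequences with the standard zlib library; the patterned one shrinks further because the compressor finds the repeating structure:

```python
import zlib

# The two 18-character sequences from the text.
regular   = b"abcabcabcabcabcabc"  # "abc" repeated six times
irregular = b"baacaaabcabababcba"  # the second, patternless sequence

print(len(zlib.compress(regular)))    # fewer bytes: the repeat is encoded once
print(len(zlib.compress(irregular)))  # more bytes: mostly raw literals, little structure
```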
Just as there are schools with different learning methodologies, there are AI training designs for this purpose: those based on reconstructing the original (autoencoders such as the VAE); and those in which two networks compete, one generating content and the other estimating whether it is real or fake (GAN), the design behind the famous “deep fakes”.
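As an illustration of the competitive design, here is a minimal, self-contained sketch of one GAN training step in PyTorch (our toy example on 2-D data; architectures and hyperparameters are arbitrary):

```python
import torch
import torch.nn as nn

# G maps noise to fake samples; D estimates "real vs fake".
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(64, 2) + 3.0   # a toy cluster standing in for "real" data
noise = torch.randn(64, 8)

# Discriminator step: push D(real) -> 1 and D(fake) -> 0.
fake = G(noise).detach()
loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: fool the discriminator so that D(G(z)) -> 1.
loss_g = bce(D(G(noise)), torch.ones(64, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

Repeated over many steps, the generator gradually learns to produce samples the discriminator can no longer tell apart from the real data.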
There are also other strategies, such as those based on diffusion models, where the original content is corrupted with noise and the network is forced to reconstruct it, guided by another type of content such as text or an image. This approach is the basis of the artificial intelligences known as DALL·E 2, Stable Diffusion, and Midjourney, all three able to generate content with high quality and variability.
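A sketch of the corruption (forward) side of that idea, following the standard DDPM formulation (our example; the schedule values are typical but arbitrary):

```python
import numpy as np

# Forward diffusion: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise.
# The generative network is then trained to undo this corruption step by step.
T = 1000
betas = np.linspace(1e-4, 0.02, T)    # noise schedule
alpha_bar = np.cumprod(1.0 - betas)   # cumulative signal retention per timestep

x0 = np.random.rand(64, 64, 3)        # a stand-in for a training image
t = 500                               # an intermediate timestep
noise = np.random.randn(*x0.shape)
x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * noise
```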
“Latent space” is one of the key terms of this revolution, a kind of Platonic world of ideas inhabited by numerical matrices. After training, the network has crystallized the most relevant features of the input data, and even certain semantic relationships, into the distances of this space.
When we say that two objects are similar according to a generative model, it really means that in this latent space they lie closer together than two other, different objects.
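A toy sketch of “similarity as distance in latent space” (the vectors below are illustrative, not real model embeddings):

```python
import numpy as np

def cosine_similarity(a, b):
    # Angle-based similarity: 1.0 for identical directions, near 0 for unrelated.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

cat    = np.array([0.9, 0.1, 0.3])
tiger  = np.array([0.8, 0.2, 0.4])
teapot = np.array([0.1, 0.9, 0.0])

print(cosine_similarity(cat, tiger))   # high: nearby in latent space
print(cosine_similarity(cat, teapot))  # low: far apart
```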
In turn, we can generate content from Colab notebooks, through integrations in applications such as Photoshop, or directly from a company's website, as is the case with DALL·E 2. Most of the time we generate an image using a text prompt, which works as the key between the noise image and a recognizable one; this is why these prompts can even be traded like secret codes. There are also communities and websites, such as Lexica, where you can find structured prompt references: the main object you want to generate (cat), the style (conceptual art), and the reference artist (Marcel Duchamp), as well as additional words that improve results, such as mentions of art portals like ArtStation.
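The structured pattern described above can be sketched in a few lines (our own illustration; the field names are not any tool's API):

```python
# Hypothetical prompt builder following the subject / style / artist / modifier pattern.
subject   = "a cat"
style     = "conceptual art"
artist    = "by Marcel Duchamp"
modifiers = "highly detailed, trending on artstation"

prompt = ", ".join([subject, style, artist, modifiers])
print(prompt)  # a cat, conceptual art, by Marcel Duchamp, highly detailed, trending on artstation
```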
Google, Meta, Microsoft, and OpenAI are the industry's major players in AI-powered image-generation tools, but StabilityAI has dared to stand apart from the rest by teaming up with Runwayml, LAION, and CompVis on the research behind Stable Diffusion, an open-source model released to the community, which has since improved the original model and developed new features.
Another aspect in which access to the models differs is the level of customization of the generative process. We can control, for example, how closely the generation sticks to the input text or image (a creativity limit), as well as the number of iterations in the process, adding more precision to the result (a refinement limit).
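A sketch of those two knobs using Hugging Face's diffusers library (assuming it is installed and a GPU is available; the model id and parameter values are illustrative, not a recommendation):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion checkpoint released to the community.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "a cat, conceptual art, by Marcel Duchamp",
    guidance_scale=7.5,       # how closely the image follows the text prompt
    num_inference_steps=50,   # more denoising iterations, more refined result
).images[0]
image.save("cat.png")
```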
Although we use natural language for generation, it is often difficult to achieve exactly the result we expect, for example that figures adopt a certain pose or that two generated images remain consistent with each other. That is why improvements are constantly being added, such as the recent ControlNet, which lets you steer the generation with sketches or body poses, among other inputs, in addition to text. Hugging Face, another agent of this revolution, lets you try all these features on its website through friendly interfaces.
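A hedged sketch of ControlNet conditioning with diffusers (the model ids are commonly published ones; in practice the conditioning image would be a real edge map, e.g. from cv2.Canny, rather than the blank placeholder used here):

```python
import torch
import numpy as np
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

# An auxiliary network trained to steer generation with edge maps.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Placeholder conditioning image; a real one would outline the desired pose or shape.
edges = Image.fromarray(np.zeros((512, 512, 3), dtype=np.uint8))
image = pipe("a dancing figure, oil painting", image=edges).images[0]
```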
Text_to_Image generation is the most common pipeline and the one we have focused on here, but we can also generate Text_to_Audio, Text_to_Text, and more.
So far we’ve talked about data, learning strategies, and ways to navigate this latent space, but what other comparisons can we make? It’s time to talk about the process.
The artist is assumed to be motivated to project a part of their personality into the work, while the algorithm must be guided. A person has life experience; the algorithm has data at its disposal, mostly scraped indiscriminately from the Internet, something that may change when most of the world's data comes from machines such as robots.
Another aspect from which we can evaluate a work directly is its end result or product. Here it is harder to draw distinctions. The reason is that, although the algorithms require guidance, we cannot simply dismiss the product as a copy, because the models' job is to generalize from the data, not to memorize it. Philosophically, isn't there something in this process similar to what some people do when they create? That is, seeing many works and then producing variations from that experience.
AI production is not limited to images; it is expanding every day into music, video, 3D… and generation times keep getting shorter.
With these advances, can you imagine an on-demand movie-generation service? Would it be possible to generate infinite variations of the Star Wars saga, even featuring a character like you? Is there a danger of creating a personalized art bubble that limits our exposure to, and discovery of, new artistic trends?
It is in our hands how we create artificial intelligence and how we deal with its consequences. The debate over whether what artificial intelligence generates is art will continue, but what seems almost certain is that the algorithm itself is already here.
Source: El Diario