Caveat: You need a computer with a dedicated GPU and at least 8GB of VRAM. If you don't have one, you can still benefit from the code-based part in Google Colab.
Introduction
Generative AI has taken the world by storm, opening up new realms of creativity and innovation. From chatbots powered by large language models (LLMs) to text-to-image models like Stable Diffusion, the possibilities seem boundless. The allure of AI in the art world has captivated enthusiasts, offering a unique blend of technology and creativity. In this blog post, I’ll share my journey of creating AI art of myself using open source models and tools, such as Stable Diffusion, Dreambooth, and the user-friendly GUI-based tool “Automatic1111.”
The Catalyst: A Black Friday Gaming PC
Last Black Friday, I decided to invest in a gaming desktop equipped with a powerful GPU, initially to improve my video editing workflow (the results of which can already be seen in my recent Thailand travel video). But another reason for the investment was that I wanted to take the plunge into the world of generative AI and be able to test these powerful models locally. This decision was driven by the desire to explore the creative potential of AI models and algorithms, especially those designed for transforming images and generating unique pieces of digital art.
Learning the Ropes
Armed with my new gaming desktop, I embarked on a journey of self-discovery through various Udemy courses dedicated to generative AI. For text-to-image models I followed the course Master AI image generation using Stable Diffusion, which gave a good overview and came with excellent Google Colab notebooks containing example implementations. The course provided me with a foundational understanding of the tools and techniques involved in using GenAI. At the time, however, I was still busy with work and looking for a new job, so I could not find enough time to test these tools on my new PC.
Overcoming Hurdles
When I came back to the topic, I wanted to play with image generation models using a graphical user interface (GUI) rather than code: creating these images involves a lot of trial and error and tweaking of settings, which works much better with buttons and sliders than by adjusting code snippets all the time. On YouTube I stumbled upon discussions of Automatic1111 and how to use Dreambooth to fine-tune an existing model on new images.
The initial stages of my exploration were not without their share of hurdles. Wrestling with Python dependencies and configuring the right settings proved to be a daunting task. It took me a while to get things running; installing the xformers package in particular required a few trips through Google and StackOverflow (and unfortunately I have by now forgotten what the trick was 😅), but I finally succeeded.
Then I checked out some popular video tutorials by Olivio Sarikas and Dr. Furkan Gözükara, which helped explain the menus, the settings, and the general process of using the Dreambooth extension.
However, since those videos were produced, Automatic1111 has been updated, with changes to the menus and often also to the default values of many settings, making it harder to reproduce the results shown in the videos. While I managed to create some interesting pictures using the text-to-image functionality of the models, I was not really able to fine-tune a model on my face, either because the results looked bad or because the process would simply take too long on my system with only 8GB of VRAM available on my GPU.
So only recently did I take another go at it: I remembered that fine-tuning with Dreambooth was also featured in the Udemy course I took last year, and I revisited the Google Colab notebook to re-run it with my own training data.
Have a look at my adjusted copy of that notebook. There you can fine-tune the Stable Diffusion 1.5 model (or any other model available on Hugging Face) with your own data. To bring your images into the right format you can use the web tool BIRME, which resizes images in bulk and lets you pick a good focal point (where an image is not already square). You can then copy the resulting model checkpoint to your Google Drive, download it to your local machine later, and use it in Automatic1111.
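In case you prefer to prepare the images in code rather than in the browser, here is a minimal Pillow sketch of a plain center-crop-and-resize to 512x512. The folder names are placeholders, and BIRME's focal-point selection is smarter than a naive center crop, so treat this only as a rough alternative:

```python
# Minimal sketch: bulk center-crop and resize training images to 512x512 with Pillow.
# Folder names are placeholders; this is not the BIRME tool itself.
from pathlib import Path
from PIL import Image

SRC = Path("raw_photos")      # hypothetical input folder
DST = Path("training_512")    # hypothetical output folder
DST.mkdir(exist_ok=True)

for img_path in SRC.glob("*.jpg"):
    img = Image.open(img_path).convert("RGB")
    # Center-crop to a square (a focal-point-aware crop would be better for portraits).
    side = min(img.size)
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side))
    # Resize to the 512x512 resolution typically used for Stable Diffusion 1.5 fine-tuning.
    img = img.resize((512, 512), Image.LANCZOS)
    img.save(DST / img_path.name, quality=95)
```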
One additional step is needed before using the downloaded model locally: converting the .ckpt file into a .safetensors file, otherwise Automatic1111 will throw an error (at least it did for me). I used the Ckpt2Safetensors Conversion Tool to perform the conversion.
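For reference, the conversion itself essentially boils down to loading the pickled checkpoint and re-saving its tensors. Here is a rough sketch of that idea using torch and the safetensors library; the file names are placeholders, and this is not the code of the tool I actually used:

```python
# Rough sketch of a .ckpt -> .safetensors conversion; file names are placeholders.
import torch
from safetensors.torch import save_file

ckpt = torch.load("my_dreambooth_model.ckpt", map_location="cpu")
# Stable Diffusion .ckpt files usually wrap the weights in a "state_dict" key.
state_dict = ckpt.get("state_dict", ckpt)
# safetensors can only store tensors, so drop non-tensor entries (e.g. step counters)
# and make the tensors contiguous before saving.
tensors = {k: v.contiguous() for k, v in state_dict.items() if isinstance(v, torch.Tensor)}
save_file(tensors, "my_dreambooth_model.safetensors")
```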
The Breakthrough: Creating Impressive AI Art
After navigating the intricacies of generative AI, I finally achieved a breakthrough, producing impressive AI art of myself using Stable Diffusion, Dreambooth, and the intuitive GUI-based tool “Automatic1111.” The amalgamation of these open source models and tools allowed me to transcend traditional boundaries and explore the endless possibilities of self-expression through artificial intelligence.
Learnings so far:
- As described in the other tutorials, make sure your training data has enough variation in terms of angles, lighting, clothing and facial expression. So in case you don't have enough, take some "selfies" looking to the side, upwards, etc. Following the video tutorials' suggestions, I mostly included pictures of me taken from fairly close up, roughly from the waist up. However, in almost all generated images where the generated "me" is smaller and further away, the face is messed up. So I wonder whether this would improve with more training images of me taken from a greater distance, and I will try re-training. But it could also be a general artifact of fine-tuning at a low resolution (512x512 pixels), and such pictures may need to be refined later anyway.
- Pay attention to your prompts: make them as detailed as possible and also use negative prompts. There are many websites that offer inspiration for good prompts (e.g. here), and the Automatic1111 extension "Prompt Generator" is very useful for generating more ideas.
- Google Colab in the free version is actually not that powerful. During the training process the T4 GPU always used less than 8GB of VRAM, which is also what I would have available locally with my GeForce RTX 4070 Ti. So I wonder if the issue was simply different settings. I downloaded the generated .json files containing the fine-tuning parameters, so I will try testing those settings locally again.
- Play around again and again, also testing different settings. The "X/Y/Z plot" script functionality can be a useful tool to test different values for parameters like "CFG Scale" or the sampler, or simply different random seeds. It might be a good idea to generate a larger batch of images using one CFG value, then take the seeds of the pictures you like and create image grids with different CFG Scale values to refine further (see the sketch after this list for what such a sweep looks like in code).
- There is still so much more to explore! Additional models like ControlNets can help fix common issues such as malformed faces or hands (with extra fingers), or upscale images to higher resolutions.
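As a complement to the prompt and CFG points above, here is a rough diffusers-based sketch of what such a seed/CFG grid search can look like in code, outside of Automatic1111. The model path, the instance token "mysubject", and the prompts are all placeholders, so adapt them to your own fine-tuned checkpoint:

```python
# Rough sketch of sweeping CFG scale over fixed seeds with diffusers.
# Model file, instance token "mysubject" and prompts are placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_single_file(
    "my_dreambooth_model.safetensors", torch_dtype=torch.float16
).to("cuda")

prompt = "portrait photo of mysubject person, detailed face, soft light"
negative_prompt = "blurry, deformed, extra fingers, low quality"

for seed in [11, 42, 1234]:          # seeds of images you already liked
    for cfg in [5.0, 7.5, 10.0]:     # CFG scale values to compare
        generator = torch.Generator("cuda").manual_seed(seed)
        image = pipe(
            prompt,
            negative_prompt=negative_prompt,
            guidance_scale=cfg,
            num_inference_steps=30,
            generator=generator,
        ).images[0]
        image.save(f"seed{seed}_cfg{cfg}.png")
```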
Conclusion:
My journey into the world of generative AI art has been a rollercoaster of challenges and triumphs. From acquiring a gaming PC on Black Friday to navigating the complexities of Python and software updates, the experience has been both educational and rewarding. Through perseverance and dedication, I’ve not only learned to harness the power of AI for artistic expression but have also witnessed the transformative potential of these open source models and tools. As the realm of generative AI continues to evolve, so too does the canvas of possibilities for creative individuals eager to explore the intersection of technology and art.
Note: this blog post was co-authored by ChatGPT (GPT-3.5 Turbo), so also expect a post on LLMs in the near future! ;-)