Stable Diffusion
My page where I write some notes and links about Stable Diffusion. Stable Diffusion is a diffusion model that turns text into images, just like Midjourney.
Stable Diffusion is not as good as Midjourney, but it's open source and free. You can install it on your computer, train your own model, tweak it, and do so many more things with it than with Midjourney that it has attracted a huge community of makers.
If you found that using Midjourney via Discord was complicated, wait until you try using Stable Diffusion. This is really a tool for geeks. You'll have much more freedom, but many more choices to make, concepts to understand, and probably error messages to tackle.
You'll need a pretty good graphics card too.
Vocabulary
Stable Diffusion comes with a lot of concepts that you will need to understand in order to use it to its full potential.
- checkpoint: a model used to generate images. Many people have trained their own models using their collections of images. These models are usually called checkpoints rather than models, because the file is a snapshot of the model's weights saved at some point during training. Checkpoints are very big (2 to 7 GB).
- lora: a fine-tuning of a model that you apply on top of an already existing model. The name stands for Low-Rank Adaptation. By adding a lora on top of a model, you can teach it new things (concepts, characters, words, styles, ...). A lora is a kind of increment to a model, and is much lighter than a complete model (10 to 200 MB).
- embeddings (or textual inversions): an embedding is a vector that represents a certain concept or character. When you give a prompt to a diffusion model (Stable Diffusion, Midjourney, ...), that text is first turned into vectors that are supposed to represent that prompt. Those vectors are then given to the model so that it can generate an image. Textual inversion trains a new vector for a new word, so that the model can recognize that word without being modified itself. Embeddings are typically from 10 to 100 kB in size. Yes, that's small, but keep in mind that an embedding is associated with just one concept (like John Lennon).
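To make the size difference between these three kinds of files more concrete, here is a toy sketch in plain Python. It is not how any real tool loads a lora. It only illustrates the general low-rank idea (a small pair of matrices B and A whose product is added to an existing weight matrix) and does some back-of-the-envelope size arithmetic. The function names, the 3x3 example layer, the rank of 8, the 768-wide vectors, and the 4 bytes per value are all assumptions chosen for illustration.

```python
# Toy illustration of a LoRA-style low-rank update, in pure Python.
# All names and sizes here are illustrative assumptions, not values
# read from a real checkpoint, lora, or embedding file.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def apply_lora(weight, lora_b, lora_a, scale=1.0):
    """Return weight + scale * (lora_b @ lora_a), leaving the inputs untouched."""
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

def full_params(d_out, d_in):
    """Number of values in one full weight matrix of the base model."""
    return d_out * d_in

def lora_params(d_out, d_in, rank):
    """Number of values in the two small matrices B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)

def embedding_bytes(n_vectors, dim=768, bytes_per_value=4):
    """Rough size of an embedding made of n_vectors vectors of width dim."""
    return n_vectors * dim * bytes_per_value

# A 3x3 "layer" of the base model and a rank-1 update (B is 3x1, A is 1x3).
weight = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 1.0]]
lora_b = [[1.0], [0.0], [2.0]]
lora_a = [[0.5, 0.0, -1.0]]

# The base weights are not changed; the lora is added on top of them.
patched = apply_lora(weight, lora_b, lora_a)

print(full_params(768, 768))     # 589824 values for one full 768x768 matrix
print(lora_params(768, 768, 8))  # 12288 values for a rank-8 update of the same matrix
print(embedding_bytes(8))        # 24576 bytes for an embedding made of 8 vectors
```

With these assumed numbers, the rank-8 update stores 48 times fewer values than the matrix it modifies, and an embedding made of a handful of vectors lands in the tens of kilobytes, which is the same order of magnitude as the file sizes quoted above.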
Versions
Stable Diffusion 2.0 and 2.1 are the latest versions of the Stable Diffusion model. Unfortunately, these models were trained differently and a lot of people don't find them any better than the previous 1.5 model. So most people stick with the older 1.5 model.
That's the thing with Stable Diffusion: you'll get the freedom to choose which model suits your needs best, but you might get a bit lost with all these available models.
Links
- Civitai: a website with a lot of resources for Stable Diffusion (checkpoints, loras, embeddings, ...), with download counts, ratings, and comments. This is the place to go for the best resources for using Stable Diffusion.
- List of models on Stable-Diffusion-Art.com
- Automatic1111: the most popular web GUI for Stable Diffusion. It is not a Stable Diffusion model, but software that acts as a web interface so that you can use a Stable Diffusion model from your web browser on your local machine.
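Besides the browser, the Automatic1111 web UI can also be driven from a script. Here is a minimal sketch, assuming the web UI was started with its `--api` option and is listening on the default local address `http://127.0.0.1:7860`. The helper names `build_txt2img_payload` and `txt2img`, and the default values for steps and image size, are my own choices for this example. Check the API page of your own install for the full list of fields.

```python
import base64
import json
import urllib.request

# Assumption: the web UI runs locally with its API enabled (--api).
BASE_URL = "http://127.0.0.1:7860"

def build_txt2img_payload(prompt, negative_prompt="", steps=20, width=512, height=512):
    """Build the JSON body for a text-to-image request (hypothetical helper)."""
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "steps": steps,
        "width": width,
        "height": height,
    }

def txt2img(prompt, base_url=BASE_URL, **options):
    """Send a prompt to the local web UI and return the images as PNG bytes."""
    payload = build_txt2img_payload(prompt, **options)
    request = urllib.request.Request(
        f"{base_url}/sdapi/v1/txt2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # The response carries the generated images as base64-encoded strings.
    return [base64.b64decode(image) for image in body["images"]]
```

Calling `txt2img("a lighthouse at sunset", steps=30)` only works while the web UI is running. Each item of the returned list can be written to a `.png` file opened in binary mode.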