Even though my work is part of the bigger Stable Diffusion model under various names, I've noticed that it's not that great at reproducing the style of lesser-known artists without a lot of extra work. People complain that Stable Diffusion is stealing their precious little art styles through sampling, but the reality is that unless you're a big name with a lot of online presence, the system isn't that good at recreating your work. In many ways, this is a good thing. But when you want the system to help you recreate your own art and remix your past work into more interesting future work, it's not much help.
Sure, I can get partway there by adding little details and naming my influences, but it's still tough to get something that reliably recreates my art style in a way that's useful for making new art.
So, my solution was to train a custom Stable Diffusion model focused on my avatar art.
Initially, I tried taking shortcuts and doing things the 'normie' way, but it turned out to be a complete disaster. The overtraining these systems do led to the model spitting out its training data with little variation. Follow the directions on some of the more technical guides, and you get a lot of mud and a system that's forgotten what bodies look like. The first couple I did produced these beautiful abstract kaleidoscopic masterpieces, heads inside heads floating in the ether, to be gazed at by the daring or the stupid. It was great! But at the end of the day, I didn't quite get to the more mundane, useful place I wanted to be with it.
Part of the problem here was training data. I learned that if you're going to train a non-cognitive system on something as specific as an art style, you have to be really intentional about what you tell it to recreate. And it's interesting what the system gloms onto and what it doesn't. For example, Stable Diffusion seems to have a strong bias towards generating images of thin women with large boobs; one busty character can influence all of it. Also, things like scales, tattoos, and other markings that you might think of as incidental end up being very important to the machine.
The other thing that derailed the earlier experiments was the tools themselves. While Dreambooth gives you the full set of options to work with, third-party services aimed at non-technical or power-user demographics skimp on some of the more advanced features you actually need to make style training work, in favor of usability.
It's frustrating. On the one hand, the normie-oriented services wouldn't quite get me there. On the other, Dreambooth can seem like a daunting mountain of unknowns you have to climb. But I do think it's the right tool for the job, at least if your aim, like mine, is to recreate your own art style in a reliable way.
Dreambooth exposed me to a variety of settings I didn't quite understand, and it took some trial and error to get my models to match my art style. But once I figured it out, the results were impressive.
To build my training dataset, I used selections from the last two years of my 3D character creations, mostly CC4 toon models that I crafted myself. Some of the earlier ones had to be reworked for less realism and a more toony look, so as not to skew the dataset with too much photorealistic content. Then I opened them all up in iClone and added backgrounds, which made all the difference.
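The mechanical side of the prep is boring, but for the curious, here's roughly what it looks like once the renders are exported. This is a minimal sketch, not my exact pipeline: it assumes Pillow, placeholder folder names, and the 512x512 resolution that Stable Diffusion 1.5 expects.

```python
# Center-crop and resize exported renders to 512x512 for SD 1.5.
# Folder names are placeholders, not my actual project layout.
from pathlib import Path
from PIL import Image, ImageOps

SRC = Path("renders")        # exported iClone renders
DST = Path("training_data")  # Dreambooth instance images go here
DST.mkdir(exist_ok=True)

for path in SRC.glob("*.png"):
    img = Image.open(path).convert("RGB")
    # ImageOps.fit center-crops to the target aspect ratio, then resizes
    img = ImageOps.fit(img, (512, 512), Image.LANCZOS)
    img.save(DST / path.name)
```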
Once the dataset was in place, I set the system to use 60 sample steps instead of 100. Additionally, I'm using both CLIP and DeepDanbooru to ensure that my prompts have the flexibility and variability I need. They work together to fill in each other's shortcomings and, hopefully, create a more detailed representation of the overall art style. With both in place on the language side, the model accounts for the variability of prompting grammar a little better, which makes it harder to break. I know, this is me talking about not breaking a diffusion model, lol. I get it. But remember, the whole goal here is reliability.
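If you want to picture that captioning pass, here's a minimal sketch of the idea: a natural-language caption from CLIP plus a booru-style tag list from DeepDanbooru, merged into a sidecar text file per image. It assumes the clip-interrogator package; the DeepDanbooru side is left as a hypothetical helper, since there are several ways to run it.

```python
# Sketch only: combine a CLIP interrogation with DeepDanbooru-style tags
# into one caption file per training image.
from pathlib import Path
from PIL import Image
from clip_interrogator import Config, Interrogator

ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))

def deepdanbooru_tags(image):
    # Hypothetical helper: run the image through DeepDanbooru however
    # you like (webui interrogator, CLI, etc.) and return a tag list.
    return []

for path in Path("training_data").glob("*.png"):
    image = Image.open(path).convert("RGB")
    caption = ci.interrogate(image)               # natural-language description
    tags = ", ".join(deepdanbooru_tags(image))    # booru-style tags
    # Each image gets a matching .txt caption combining both styles.
    path.with_suffix(".txt").write_text(f"{caption}, {tags}".rstrip(", "))
```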
I'm also not using EMA weights or a VAE to get the final product. Instead, I'm using an instance prompt, which creates a hardcoded art style that can be called up by a keyword. My instance prompt here is "Lynn3-style." However, since the training data affects both the instance prompt and the overall training of the model, I'm including samples from both to show you the results at different steps in the process.
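Once the model is trained, calling up the style is just a matter of dropping the keyword into a prompt. Here's a minimal sketch using the diffusers library; the checkpoint path is hypothetical and the prompt is just an example.

```python
# Generate with the trained style by including the instance keyword.
import torch
from diffusers import StableDiffusionPipeline

# Hypothetical local path to the Dreambooth-trained checkpoint.
pipe = StableDiffusionPipeline.from_pretrained(
    "./lynn3-style-model", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "portrait of a woman in a forest, Lynn3-style",
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("lynn3_test.png")
```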
Here we are at 10 steps:
This time, you can really see my sample data taking hold. It's not over-fitting at all, and it's beginning to capture the essence of the art style, which is fantastic. But it's a little rough. So much boobage, though.
Here's ten steps without the instance prompt, so we can see how the training is affecting the model at large. Only glimmers of it are peeking through, but it looks promising. You can also see some of my LUTs and color grading coming through, even if you can't see the actual style in a more pronounced way like you can with the instance prompt.
At 20 Steps:
It's rough, but this is basically it. With the instance prompt, this is basically my art style. The assets are iffy. One thing I'm noticing at this phase of the process is how incredibly limited the clothing selections are in the training data. This is a problem that's going to get so much worse as the overall style starts to melt together a little more. So the lesson here? Lighting is important, yes. But don't forget to provide the machine with diverse and novel clothing options.
And the larger dataset is starting to take shape too, taking more style cues from the training data and slowly beginning to melt them down, so that my style is all it knows how to create. This is going pretty smoothly at this point.
At 46ish Steps:
The training started and stopped a few times, so the numbers get a little wonky. Let's just call this one... sort of around 30 steps? Now we start to see it get a lot sharper. The lighting's better, the images are crisper and less blurry, and they don't have as much of the 3D stink to them. 3D renders have this texture to them no matter how you do it, and there's still some of it here, but it's getting better.
That said, even at this point in the training, there's definitely some over-fitting going on towards the end. It's not awful, but it is there. If I were training for specific characters, I would actually want over-training, since that would let me deliver more consistent results for those characters. But since I'm training a style, not characters, the fact that it's beginning to glom onto characters like this is a little concerning.
About half of these look like totally new characters.
But what surprised me was the influence on the larger set. It's interesting what's happening here as the training goes on.
And 59 Steps:
This is actually pretty smooth, I think. It tones down the 3D stink and creates something a bit smoother and more cohesive. Some of the over-fitting here is intense, though. Images 5, 8, and 9 are basically raw training data (minus the hair color). They're close enough that I'm concerned about it.
And we leave the final set of samples in a place where it's clearly influenced by my training data, but the influence isn't quite as pronounced as it is with the instance prompt. This certainly warrants further experiments at higher sample rates in the future.
So children, what did we learn through all this?
Well, a lot, actually.
1. There's a real struggle between consistency and smoothness that's both interesting and hard to overcome. You can get a rough likeness of your style pretty easily, but go too far and you see some pretty serious over-fitting.
2. Diversity of clothing options and backgrounds is important.
3. Don't train weird skin textures unless your goal is to have weird skin textures on everything in your model. If you do train weird skin textures or colors, the thing you want to do is annotate the hell out of them, because neither CLIP nor DeepDanbooru picks up on that (see the sketch after this list). Also, on a related note: investigate other interrogation systems.
4. Stable Diffusion likes LUTs and color grading, a lot. More than lighting. Good to know.
5. Don't train gigantic boobs unless you want gigantic boobs on everything.
6. Stable Diffusion 1.5 has a strong bias towards thin, often unattainable, frequently impossible body types. I'm no saint here, and I feel like art in general has the same problem, but we need to plan on breaking that in the future.
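On the annotation point in lesson three, here's a tiny sketch of what I mean: appending the tags the interrogators miss to the sidecar caption files. The file names and tags below are examples only, not my actual data.

```python
# Append manually written tags (scales, markings, skin color) to the
# auto-generated caption files. Names and tags below are examples only.
from pathlib import Path

MANUAL_TAGS = {
    "lynn_scales_01.png": "iridescent scales, teal skin",
    "lynn_tattoo_03.png": "full-back tattoo, geometric markings",
}

for name, extra in MANUAL_TAGS.items():
    caption_file = (Path("training_data") / name).with_suffix(".txt")
    text = caption_file.read_text().rstrip()
    caption_file.write_text(f"{text}, {extra}")
```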
In any case darlings, thank you for reading this far.
I hope my happy little post was useful and informative.