Prompts behave very differently when you manipulate the starting noise patterns (img2img) than when you run them as plain text2img. That's why we do it. This first image is our starting point: the noise painting. Don't worry, this one's pretty simple, but it does rearrange the latent space in an interesting way.
Note: For the purposes of this discussion, a noise painting is a handmade image with novel noise patterns, often using anamorphic imagery, logographic shorthand, and abstract elements, put in place with the intention of breaking the typical Stable Diffusion use case.
In theory, anything you describe badly can function the same way, but I engineer my noise paintings to exploit common errors and mistakes the AI makes, using them to generate novel or unusual visual presentations. Below is the one I used to create this project. It was created in Clip Studio Paint and makes heavy use of the liquify brush. While I have created noise paintings that are more traditional and designed to be finished pieces of art... this is not one of them.
This is the final piece based on the original noise painting, and it's cool, so I'm happy with it. We got there by long tail prompting to the tune of about 150 tokens of positive prompt and 75 of negative (150/75). That's normal for me. We set the denoising strength at .63, I think, and the secondary guidance (CFG) at 16.
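Denoising strength (.63 here) decides how much of the sampling schedule actually runs on top of the noised-up painting: the rest is skipped, which is how the painting's structure survives. Here's a minimal sketch modeled on the arithmetic in Hugging Face diffusers' img2img pipeline, assuming 50 sampling steps (a number picked for illustration, not the setting used for this piece). Other UIs may round a little differently.

```python
def img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Number of denoising steps that actually run for a given denoising strength.

    Sketch of the diffusers-style calculation; other UIs may round differently.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    # The first t_start steps are skipped: the painting is noised up to that
    # point in the schedule and denoising starts from there.
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start


print(img2img_steps(50, 0.63))  # prints: 31
```

A strength of 1.0 runs every step and throws away almost all of the painting; a strength near 0 barely touches it.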
Note: For the sake of brevity, treat a token as roughly equivalent to a word (CLIP's tokenizer actually splits longer or rarer words into several tokens), so we're talking about a total instructional payload of around 225 words between the positive prompt and the shorter negative prompt.
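Why 75 is a meaningful number: the CLIP text encoder Stable Diffusion uses reads at most 77 tokens, two of which are start and end markers, leaving 75 for your prompt. UIs such as AUTOMATIC1111's web UI handle longer prompts by splitting them into 75-token chunks and encoding each one separately. Assuming that chunking behavior, the arithmetic for a 150/75 prompt looks like this:

```python
import math

CLIP_CONTEXT = 77                    # CLIP text encoder context length, in tokens
USABLE_PER_CHUNK = CLIP_CONTEXT - 2  # minus the start and end tokens


def chunks_needed(token_count: int) -> int:
    """How many 75-token chunks a prompt fills, assuming the UI chunks long prompts."""
    return max(1, math.ceil(token_count / USABLE_PER_CHUNK))


positive, negative = 150, 75
print(positive + negative, chunks_needed(positive), chunks_needed(negative))  # prints: 225 2 1
```

So the positive prompt here fills two full chunks and the negative prompt fills one.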
But bear in mind that 16 is high. I like to turn it up for long prompts, because the higher you turn up the CFG, the closer the system tries to stay to your prompt. And, since we are dadaists, we like to keep our computers confused, dizzy, and hallucinating like crazy. But we also want some level of order in this chaos we're intentionally creating, and we want to make the math on the back end as needlessly complicated as possible, so the computer has to use that cute little neural net to optimize the differential. Work it baby!
Note: Most Stable Diffusion interfaces label this setting the "CFG scale." CFG stands for classifier-free guidance, not "configuration." I'm using "CFG" and "secondary guidance" interchangeably; they mean the same setting.
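Here's why a higher CFG keeps the image closer to the prompt. At every step, classifier-free guidance makes two noise predictions, one without your prompt and one with it, then moves from the first toward the second, overshooting by the guidance scale. When you use a negative prompt, its prediction takes the place of the "without your prompt" one. A minimal sketch with plain Python lists standing in for the model's predictions (the function name is illustrative):

```python
def classifier_free_guidance(uncond, cond, scale):
    """Combine two noise predictions the way classifier-free guidance does.

    uncond: prediction with an empty (or negative) prompt
    cond:   prediction with your prompt
    scale:  the CFG / guidance scale (16 for this piece)
    """
    # Start at the unprompted prediction and push along the direction the
    # prompt adds, multiplied by the scale.
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]
```

At a scale of 1 you get the prompted prediction back unchanged; at 16, whatever the prompt adds is amplified sixteen times.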
Good art comes from dizzy computers.
Still with me after all that AI jargon? You can just lie and nod your head, it's okay. You're pretty, so I'll let you get away with it. Shhhhh... I won't tell a soul.
Anyway, that's what gives us our final result, which has nearly all of the elements we described in the prompt, in more or less the places I told the machine to put them. I like this one because it is precisely as weird and surreal as I wanted it to be. Granted, it could still use some cleanup, but for now, I think we're fine to talk about it as is.
Testing Against the Default Noise Levels
But, see, the thing is... I'm not really content to leave it there.
Are you? No, of course not. You want me to push further, so I will.
The very next thing I want to try is the original prompt we developed during this workflow, but instead of running it as an image modifier (img2img), I'm going to run it as plain text (text2img), so I can compare the final piece to the effect of the prompt by itself. If I'm successful, the prompt-based version with the same seed should look very different, and it does.
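Why the same seed makes this a fair comparison: the seed fixes the starting noise. Text-to-image starts from that noise alone, while image-to-image blends it with the encoded noise painting using the standard diffusion noising formula x_t = sqrt(ᾱ)·x_0 + sqrt(1 − ᾱ)·ε. Here's a minimal sketch with plain Python lists standing in for latents. The function names are hypothetical, and `alpha_bar` is an illustrative parameter; in a real sampler it comes from the noise schedule and the denoising strength.

```python
import math
import random


def seeded_noise(seed: int, n: int) -> list:
    """Deterministic Gaussian noise: the same seed always yields the same values."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def txt2img_start(seed: int, n: int) -> list:
    # Text-to-image starts from pure seeded noise.
    return seeded_noise(seed, n)


def img2img_start(seed: int, painting_latent: list, alpha_bar: float) -> list:
    # Image-to-image starts from the painting with the same seeded noise mixed in.
    # alpha_bar (0..1) is how much of the painting survives; lower means more noise.
    eps = seeded_noise(seed, len(painting_latent))
    keep, noise = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [keep * x + noise * e for x, e in zip(painting_latent, eps)]
```

With the seed held fixed, the only difference between the two starting points is the painting, so any difference in the output comes from the noise manipulation rather than luck of the draw.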
Don't worry, this beautiful abomination wasn't rendered to be a production piece. Not enough samples, very little if any refinement or correction. We just want to get the broad strokes of what the prompt does, minus the intentional noise manipulation we were doing earlier.
So far, so good. We now have evidence that I thoroughly broke the machine using our happy little noise manipulation scheme, and that our combination of starting image and long tail prompt did the job we set out to do.
Note: It's important to test, because there are a million things that can go wrong with your settings, and the two versions won't always come out this different.
Breaking Out of Stereotypes
The next metric I like to look at is trainability.
Note for the Artstation version of this post: So, by now you've probably heard some uninformed idiot somewhere say that AI is like this virus that sucks up everything there is, can copy anything, and no art anywhere is safe, omg, the sky is on fire, everybody run for cover, and bring out your pitchforks because it's the end of art, omg, omg, omg... or something to that effect.
Well, I'm happy to report that this kind of hyperbole simply isn't true. AIs don't work that way. In reality, they work on the principle of data flattening. They observe and record stereotypes, which makes them useful as base tools, but you can actually measure how far you are from any given stereotype by comparing the source image you're testing against the automated description of that image, generated by interrogation tools built on CLIP's ViT (Vision Transformer) models. Basic interrogation is going to be a mandatory part of every artist's toolkit over the next few years. Just... get used to it?
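CLIP maps images and text into a shared embedding space, so one way to put a number on the gap between an image and its automated description is the cosine distance between the image embedding and the caption's text embedding. Here's a minimal sketch, assuming you already have both embeddings as plain lists of floats. Getting them out of a real CLIP model isn't shown, and the function names are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


def distance_from_caption(image_embedding, caption_embedding):
    """0.0 means CLIP thinks the caption describes the image perfectly;
    larger values mean the image sits further from its own description."""
    return 1.0 - cosine_similarity(image_embedding, caption_embedding)
```

Cosine distance ignores vector length and only compares direction, which is the usual way CLIP embeddings are compared.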
Now, I personally don't mind if my images are trained or not. I encourage people to take my noise paintings and renders and do whatever they want to them. Steal them, mint them, repurpose them, take credit for them. I do not care. But, I do want to make sure that I'm operating above the stereotype, because that's a test of compositional uniqueness that I feel is important.
High trainability means that I'm at, near, or inside the stereotype, which for my purposes means boring, like anything by Karla Ortiz.
Note: I'm using the term "stereotype" in the technical sense, not necessarily the derogatory sense, though there can be, and often is, overlap. The stereotype is the average of all art ever created and scanned, relative to whatever you're evaluating.
But perhaps the better way to think about it is in terms of how image interrogation works: CLIP gives me back a string of text that will generate the same image, or something eerily similar, when I push it back through Stable Diffusion with the same seed at default noise levels, minus the image reference.
If we end up with a dozen images of house-faced ladies in dark rooms with cars and trees, then we have triggered the failure scenario and should probably start again. Some similarity is okay, and unavoidable when doing this, but my personal goal is to end up with something different. Not because I don't want to be trained, per se, but because I want to produce something interesting and somewhat unique.
Low trainability means that entering the same CLIP caption will generate something entirely different visually. This tells us that the image we created based on the noise painting doesn't easily fit within the bland average of all art ever made.
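The high/low trainability test is a round trip: interrogate the image, feed the caption back through text-to-image with the same seed and no image reference, then compare the result with the original. Here's a minimal sketch of that loop. `interrogate`, `txt2img`, and `similarity` are hypothetical stand-ins for a CLIP interrogator, a Stable Diffusion run, and an image comparison (for example, cosine similarity of CLIP image embeddings). The 0.9 threshold is an arbitrary cut-off, not a calibrated value.

```python
from typing import Callable


def trainability(
    image,
    interrogate: Callable[[object], str],
    txt2img: Callable[[str, int], object],
    similarity: Callable[[object, object], float],
    seed: int,
    threshold: float = 0.9,
) -> str:
    """Round-trip test: image -> CLIP caption -> txt2img (same seed, no init image).

    Returns "high" if the regenerated image lands close to the original
    (the piece sits near the stereotype) and "low" otherwise.
    """
    caption = interrogate(image)          # hypothetical CLIP interrogation step
    regenerated = txt2img(caption, seed)  # default settings, no image reference
    score = similarity(image, regenerated)
    return "high" if score >= threshold else "low"
```

Passing the three steps in as functions keeps the sketch independent of any particular UI or library.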
Part of the fun of rearranging the noise patterns is that you can get the machine to put out images it couldn't easily take back in. But you lose your objectivity when you look at massive volumes of art, which I think is normal, so it helps to have at least a rough way to evaluate how well you're doing against the metrics that matter.
And, as you can see... the work generated by the CLIP caption looks... nice, and it's still a fairly complex piece for a Stable Diffusion render. But you'll notice that the output is much simpler, and stylistically different from the earlier one.
CLIP Caption (for educational purposes only): a painting of a woman, russ mills, glimpse of red, abstract facades of buildings, perky woman made of petals, david luong, maze of streets, her gaze is downcast, black and red scheme, mark brooks detailed, image split in half, spiralling
A job well done, I say!
Hang on while I pat myself on the back. *pat pat pat*
And there you go. A look into my little brain for usable data! I'm not willing to share the prompts themselves in a public post, but if you would like to play with them, let me know, and I'll drop you the PNGs with elevendy eleven embeds in them.
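If you do get one of those PNGs, the settings usually ride along inside the file as PNG text chunks. AUTOMATIC1111's web UI, for example, writes them under the keyword `parameters`. Here's a minimal standard-library sketch that pulls out the uncompressed `tEXt` chunks. It assumes the generator used `tEXt`; compressed `zTXt` and international `iTXt` chunks are skipped.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text(data: bytes) -> dict:
    """Return the uncompressed tEXt chunks of a PNG as {keyword: text}."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    found = {}
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, body, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        end = pos + 8 + length
        if end + 4 > len(data):
            raise ValueError("truncated chunk")
        body = data[pos + 8:end]
        (crc,) = struct.unpack(">I", data[end:end + 4])
        if crc != zlib.crc32(ctype + body) & 0xFFFFFFFF:
            raise ValueError(f"bad CRC in {ctype!r} chunk")
        if ctype == b"tEXt":
            # tEXt body: Latin-1 keyword, a zero byte, then Latin-1 text.
            keyword, _, text = body.partition(b"\x00")
            found[keyword.decode("latin-1")] = text.decode("latin-1")
        elif ctype == b"IEND":
            break
        pos = end + 4
    return found
```

Reading a file would look like `read_png_text(open("render.png", "rb").read()).get("parameters")`, where `render.png` is a hypothetical file name.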
Anyway babies, I love you all.
You're beautiful and sweet.
Talk soon!