Image Generation using Generative AI

We will discuss techniques for getting better output from diffusion models and for controlling the format and style of the images they produce.

Format Modifiers

The most basic practice in image generation is to specify the format of the image. AI image models can produce a wide variety of formats, from stock photos to oil paintings. The result often looks completely different depending on the format, including the style of any objects or people in the image.

Input:
a stock photo of a business meeting

If your blog post or website imagery would look better with something more artistic, you can ask for that instead simply by naming a different format in the prompt.

Input:
an oil painting of a business meeting
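
When comparing formats side by side, it can help to generate the prompt variants programmatically rather than typing each one. The following is a minimal sketch of that idea; the function name and the list of formats are illustrative assumptions, not part of any image tool's API.

```python
def format_variants(subject, formats):
    """Return one prompt per format, e.g. 'an oil painting of <subject>'."""
    prompts = []
    for fmt in formats:
        # Pick the article from the first letter of the format name.
        article = "an" if fmt[0].lower() in "aeiou" else "a"
        prompts.append(f"{article} {fmt} of {subject}")
    return prompts


# Reproduces the two prompts shown above.
for prompt in format_variants("a business meeting", ["stock photo", "oil painting"]):
    print(prompt)
```

Each resulting string can then be pasted into the image tool of your choice, which makes it easier to see how much the format alone changes the result.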

Art Style Modifiers

One of the great powers of AI image models is their ability to replicate popular art styles and artists. The most common examples shared on social media and in AI demos are images in the style of Van Gogh, Dali, or Picasso.

We recommend that AI artists exercise caution when generating art in the distinctive style of any living artist and, as a rule of thumb, stick to artists who have died (seek legal counsel for any planned commercial use).

Input:
illustration of a dragon, in the style of Alice's Adventures in Wonderland by Dali

Reverse Engineering Prompts

If you didn’t go to art school or don’t know much about film or photography, it can be daunting to figure out which art styles, formats, and artists you want to take advantage of. Often you see a picture you like and have no way of describing it in enough detail to re-create it with a prompt. Thankfully, Midjourney’s Describe functionality allows you to reverse engineer a prompt from an image by typing /describe and then uploading the image. It works for AI-generated images as well as ordinary images from other sources.

Negative Prompts

Often two concepts are so intertwined in the training data that they appear together when you generate images of just one of them, even if that’s not what you specified or intended. For example, when you ask for oil paintings, you often get the accompanying frame and surrounding wall, because that’s what appears in the photographs of many museum collections of these paintings.

Most AI image models support negative prompts, which let you specify what you don’t want in the image. Negative prompts can be used to separate two intertwined concepts and keep out anything you were hoping to avoid. Taking the example of oil paintings and frames, in Midjourney you can add --no to the end of the prompt, and anything in a comma-separated list after that flag will be negated. To fix the frames problem, add “frame” and “wall” as negative prompts.

Input:
oil painting in the style of Rembrandt --no frame, wall
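
If you build prompts in code, the negative terms can be kept in a list and appended at the end. This is a small sketch assuming the Midjourney-style --no flag described above; the helper name with_negatives is hypothetical, and other tools may expect negative prompts in a separate field instead.

```python
def with_negatives(prompt, negatives):
    """Append a Midjourney-style --no flag listing the terms to exclude."""
    # Drop empty entries so we never emit a dangling comma or a bare flag.
    terms = [term.strip() for term in negatives if term.strip()]
    if not terms:
        return prompt
    return f"{prompt} --no {', '.join(terms)}"


print(with_negatives("oil painting in the style of Rembrandt", ["frame", "wall"]))
```

Keeping the unwanted terms in a list makes it easy to reuse the same exclusions across many prompts.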

Weighted Terms

Negative prompts are useful if you want to completely negate something, but often you just want to dial it down. To mix and match different concepts, it can be helpful to have control over how much of each you want.

By default all words in a prompt have an equal weighting of 1, although words at the beginning of the prompt have a greater effect, which is why we typically put the subject of our image there by convention. You can change the weights of sections of the prompt in Midjourney by adding a hard break with two colon characters, ::, and a number denoting the new weight. With this method you can make an image that is primarily Van Gogh but with a dash of Dali.

Input:
painting of the Golden Gate Bridge::1 in the style of Van Gogh::0.8, in the style of Dali::0.2

Weights can go higher than 1 as needed for emphasis, or lower if you want to de-emphasize something. You can also add negative weights to the prompt to remove that aspect to varying degrees. The --no parameter used for negative prompts is actually just a shortcut for adding ::-0.5 to that section of the prompt. Using the prior example, you could strip any Van Gogh influence out of Dali’s work by adding a -1 weight to Van Gogh and dialing up the Dali weight to 5.

Input:
painting of the Golden Gate Bridge::1 in the style of Van Gogh::-1, in the style of Dali::5
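
The weighting syntax is easy to get wrong by hand, so a small helper can assemble it from (text, weight) pairs. The sketch below is illustrative only: weighted_prompt is a hypothetical name, it joins sections with a space rather than a comma, and the check that the weights sum to a positive number reflects our understanding of Midjourney's rules rather than anything stated above.

```python
def weighted_prompt(parts):
    """Join (text, weight) pairs using the :: weighting syntax."""
    total = sum(weight for _, weight in parts)
    # Assumption: the tool rejects prompts whose weights do not sum above zero.
    if total <= 0:
        raise ValueError("weights should sum to a positive number")
    # The :g format prints 1.0 as "1" and keeps 0.8 as "0.8".
    return " ".join(f"{text}::{weight:g}" for text, weight in parts)


print(weighted_prompt([
    ("painting of the Golden Gate Bridge", 1),
    ("in the style of Van Gogh", -1),
    ("in the style of Dali", 5),
]))
```

Because each weight sits next to its text in the list, you can adjust one number at a time and regenerate to see how the balance between styles shifts.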

Prompting with an Image

Many AI image generation tools let you prompt the model not just with text but with an image. Supplying an example image of what you’re going for can give you a great baseline for building something more unique and original, while still matching the style you need. In the Stable Diffusion community this is called Img2Img, whereas in Midjourney you simply link to an image in the prompt.

The image link should be pasted at the beginning of the Midjourney prompt, followed by your text prompt. You don’t need to be as descriptive now that you have given a base image (a picture is worth a thousand words). The result won’t match the original exactly, but it will be similar enough to be recognizable if you know what image was supplied and how it was modified by the prompt.

Input:
https://s.mj.run/XkIHsYIdUxc in the style of The Great Gatsby
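
The ordering rule (image link first, text second) can be captured in a short helper. This is a sketch under the assumption that the link must be a plain web URL; the function name image_prompt is hypothetical and it only builds the prompt string, it does not call Midjourney.

```python
from urllib.parse import urlparse


def image_prompt(image_url, text):
    """Place the image link before the text, as described above."""
    parsed = urlparse(image_url)
    # Reject anything that is not an http(s) link with a host name.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a web URL: {image_url!r}")
    return f"{image_url} {text.strip()}"


print(image_prompt("https://s.mj.run/XkIHsYIdUxc", "in the style of The Great Gatsby"))
```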

Inpainting

Working with AI image generation tools is always an iterative process. Rarely do you get the complete final image on the first try. There are usually artifacts that you want to address, or styles that you want to change. For example, say you had generated an image with Midjourney of a woman in a 1920s-style flapper dress but wanted to change what she was wearing without regenerating the entire image.

The solution is inpainting, which is available in most implementations of Stable Diffusion, in Midjourney via a feature called Vary Region, and in Adobe Photoshop as Generative Fill. However, DALL-E pioneered this functionality, and it is still our personal preference in terms of the quality of the results. To use it, first generate an image with DALL-E in ChatGPT (Plus), and then erase the part of the image you want to regenerate.

Outpainting

Related to inpainting in DALL-E is outpainting, where you generate outside of the frame of the existing image. This technique can in effect zoom out from the existing image to add context around it. This can be used to fill in more detail in an image you have generated or uploaded. Outpainting is no longer available in OpenAI’s labs interface and is not yet available in ChatGPT, but it is called Zoom Out in Midjourney and presents itself as an option for images that have been upscaled.

Meme Unbundling

The main issue with replicating an artist’s style, or emulating an art movement, is that it’s relatively unoriginal. Nobody knows what the legal implications of AI art will be, but certainly artists like Greg Rutkowski and others have already spoken out about the immorality of copying their style.

One way to still get the benefit of the shortcut of referring to a successful artist or art movement, while being more original in your style, is to unbundle the memes of an artist. The word meme doesn’t just mean a funny viral image: it refers to any piece of cultural information that gets copied from person to person. If you can decompose an art style into its component parts and characteristics, then you can use these subcomponents to remix your own style.

Say, for example, you were trying to make a painting of Times Square in the style of Salvador Dali. Start by asking a text generation model (in this case ChatGPT-4) to describe the style of the artist you’re emulating; you can then use that description as a prompt.

Input:
Briefly describe the characteristics and artistic style of "The Persistence of Memory" by Salvador Dali, without mentioning the artwork or the artist:

Output:
This is a notable example of Surrealist art, steeped in dream-like, fantastical imagery that pushes the boundaries of traditional understanding and interpretation. The painting is composed with a distinctly precise and detailed technique, reflecting a deceptive realism.
…
Contrasting with the melting objects, there are other items in the scene that exhibit solidity and permanence, creating an uncanny juxtaposition. The painting's composition and imagery invite deep exploration and various interpretations, concerning themes of reality, dreams, time, and the subconscious.

Now that you have a description of one of Dali’s most iconic paintings, you can understand what makes it so visually appealing. From here you need to abbreviate the description to fit into an image model prompt.

Input:
Convert this description into a shortened prompt for an AI image generation model such as DALL-E:

Output:
Surrealist landscape with subdued colors of brown, blue, and yellow. Depict a barren, craggy shoreline and calm sea meeting a clear sky. Include amorphous, melting objects resembling common items, indicating fluidity and impermanence of time. Contrast these with other solid, permanent elements. Evoke a sense of stillness, dreamlike atmosphere, and quietude.

Take this output and add the subject of your painting, Times Square. It can also help to modify the prompt to make it flow better, as the prompts ChatGPT writes can be too instructive:

Input:
Painting of Times Square, surrealist landscape with subdued colors of brown, blue, and yellow, a barren, craggy shoreline and calm sea meeting a clear sky. Include amorphous, melting objects resembling common items, indicating fluidity and impermanence of time. Contrast these with other solid, permanent elements. Evoke a sense of stillness, dreamlike atmosphere, and quietude.
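
The three steps above (describe the style, shorten the description, add your subject) can be chained in code. The following is a minimal sketch, not a definitive implementation: unbundle_style and the ask callable are hypothetical names, ask stands in for whatever text model client you use, and the description is pasted into the second request explicitly because this sketch does not assume a shared chat history.

```python
from typing import Callable


def unbundle_style(artwork: str, artist: str, subject: str,
                   ask: Callable[[str], str]) -> str:
    """Describe a style, shorten the description, then add the subject."""
    # Step 1: ask for the style without naming the artwork or the artist.
    description = ask(
        f'Briefly describe the characteristics and artistic style of "{artwork}" '
        f"by {artist}, without mentioning the artwork or the artist:"
    )
    # Step 2: compress the description into an image prompt.
    style_prompt = ask(
        "Convert this description into a shortened prompt for an AI image "
        f"generation model such as DALL-E:\n\n{description}"
    )
    # Step 3: put the subject first, since early words carry more weight.
    return f"Painting of {subject}, {style_prompt}"


def fake_ask(prompt: str) -> str:
    """Stand-in for a real text model call; returns canned text."""
    if prompt.startswith("Briefly describe"):
        return "A dreamlike Surrealist scene painted with precise, realistic detail."
    return "surrealist landscape, melting objects, dreamlike stillness"


print(unbundle_style("The Persistence of Memory", "Salvador Dali",
                     "Times Square", fake_ask))
```

In practice you would replace fake_ask with a function that sends the prompt to your text model, and you would still want to edit the final prompt by hand so that it flows well.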