Lesson 5.1 — AI Image Generation

A few years ago, creating a high-quality illustration or digital painting required expensive software, years of skill development, or a professional designer. Today, you can type a sentence and receive a photorealistic image, a watercolour painting, a logo concept, or a cinematic still in seconds. AI image generation is one of the most visible and discussed applications of modern AI, and it is genuinely useful for a wide range of everyday tasks.

This lesson explains how it works in plain terms, compares the main tools available, teaches you the five-element prompt framework that separates mediocre results from excellent ones, and covers copyright considerations you should know before using AI images commercially.

How It Works (Plain English)

AI image generators are trained on vast collections of images and their associated descriptions — captions, alt text, metadata, web page text. Through this training, the model learns the statistical relationships between words and visual elements: what "golden hour lighting" looks like, how "impressionist" style differs from "hyperrealistic," what a Labrador retriever typically looks like.

When you type a prompt, the model starts with a random field of pixels (pure noise) and refines it step by step, guided by the prompt, until a coherent image emerges. The technique is called diffusion: during training, the model sees images with noise gradually added to them and learns to reverse that process. Generation runs the learned reversal, starting from pure noise.
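The step-by-step refinement can be illustrated with a toy sketch. This is an analogy only, not how a real generator is implemented: the "image" is a short list of brightness values, and the function `toy_denoise` (a hypothetical name) cheats by knowing the target in advance. A real diffusion model has no target to look at; a trained neural network predicts what noise to remove at each step, guided by the prompt.

```python
import random


def toy_denoise(target, steps=20, seed=0):
    """Start from random noise and move a little closer to `target` each step.

    `target` stands in for "what the prompt describes". A real model never
    sees a target; it predicts the noise to remove with a trained network.
    """
    rng = random.Random(seed)
    # Step 0: pure noise, one random brightness value per "pixel".
    image = [rng.uniform(0.0, 1.0) for _ in target]
    history = [image]
    for step in range(steps):
        remaining = steps - step
        # Remove an equal share of the remaining difference at every step,
        # so the last step lands on the target.
        image = [px + (t - px) / remaining for px, t in zip(image, target)]
        history.append(image)
    return history


def error(image, target):
    """Average absolute difference between two images of equal length."""
    return sum(abs(a - b) for a, b in zip(image, target)) / len(target)


if __name__ == "__main__":
    # A tiny one-dimensional "image": a smooth dark-to-light gradient.
    gradient = [i / 7 for i in range(8)]
    frames = toy_denoise(gradient, steps=20, seed=1)
    for step in (0, 5, 10, 15, 20):
        print(f"step {step:2d}: error = {error(frames[step], gradient):.4f}")
```

Running it prints an error value that shrinks at each checkpoint, which mirrors the way a generated image sharpens from static into a recognisable picture.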

You are not retrieving an existing image. You are generating a new one that has never existed before — though it is built from patterns learned from millions of real images.

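Key takeaway: AI image generators do not work by retrieving or copying images from the internet. They generate new images based on patterns learned during training, although near-copies of images that appear many times in the training data have been observed in rare cases. This matters both for understanding the technology and for copyright discussions.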

Tool Comparison

Tool	Best for	Strengths	Weaknesses	Cost
DALL-E 3 (via ChatGPT)	Beginners, text-in-images	Excellent prompt understanding, text rendering, integrated with ChatGPT	Less control over style details, safety filters can be restrictive	Included with ChatGPT Plus
Midjourney	High-quality artistic images	Stunning aesthetic quality, consistent style, active community	Historically Discord-based, which is awkward for new users (a web interface is now also available); no free tier	From $10/month
Adobe Firefly	Commercial use, design workflows	Trained on licensed content (strong copyright position), integrates with Photoshop	Sometimes less photorealistic than competitors	Included with Creative Cloud; limited free credits
Leonardo.ai	Game assets, characters, concept art	Strong on stylised art, character consistency, fine-tuned models	Learning curve for advanced features	Free tier available; Pro from ~$10/month
Stable Diffusion	Technical users who want full control	Open source, runs locally, unlimited generation	Requires technical setup, no guided interface	Free (but needs hardware or hosting)

For beginners: Start with DALL-E 3 inside ChatGPT. The interface is familiar and the results are excellent for most everyday use cases. Move to Midjourney when you want more control over aesthetic quality.

The Five-Element Prompt Framework

Most people type a single sentence and are disappointed by the results. Excellent prompts use five elements, and the difference in output quality is dramatic.

Element 1: Subject

What or who is in the image? Be specific.

Weak: "a dog"
Strong: "a golden retriever puppy lying in autumn leaves"

Element 2: Style

What artistic or visual style do you want?

Photography styles: "photorealistic," "35mm film photography," "studio portrait"
Art styles: "watercolour," "oil painting," "anime," "flat design illustration," "Art Nouveau"
Specific artists (check copyright guidance below): "in the style of Monet"

Element 3: Mood and Atmosphere

What feeling should the image evoke?

"Melancholic and misty," "bright and cheerful," "cinematic and dramatic," "cosy and warm"

Element 4: Technical Parameters

Lighting, perspective, and composition details that photographers and cinematographers use:

Lighting: "golden hour," "soft overcast light," "dramatic side lighting," "ring light"
Perspective: "bird's eye view," "close-up macro," "wide-angle establishing shot"
Camera: "shot on Canon 5D," "85mm portrait lens," "shallow depth of field with a blurred background"

Element 5: Negative Prompts (where supported)

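What to exclude: "no text," "no people," "avoid dark colours," "avoid cartoonish style"

In tools with a dedicated negative-prompt field (such as most Stable Diffusion interfaces), list only the unwanted items themselves, for example "text, people". In tools without one, phrase the exclusions inside the main prompt as shown above.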
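The five elements can be treated as slots to fill in. The following is a minimal sketch, assuming a comma-separated prompt is acceptable to your tool; the function names `build_prompt` and `inline_exclusions` are hypothetical helpers written for this lesson, not part of any image tool's API.

```python
def build_prompt(subject, style=None, mood=None, technical=None, exclude=None):
    """Combine the five elements into a prompt.

    Returns (prompt, exclusions): the positive prompt as one string, and the
    list of things to exclude, kept separate so it can go into a
    negative-prompt field where the tool has one.
    """
    if not subject or not subject.strip():
        raise ValueError("subject is required")
    parts = [subject.strip()]                              # Element 1: subject
    for element in (style, mood):                          # Elements 2 and 3
        if element and element.strip():
            parts.append(element.strip())
    parts.extend(t.strip() for t in (technical or []))     # Element 4
    exclusions = [e.strip() for e in (exclude or [])]      # Element 5
    return ", ".join(parts), exclusions


def inline_exclusions(prompt, exclusions):
    """For tools without a negative-prompt field, append 'no X' phrases."""
    if not exclusions:
        return prompt
    return prompt + ", " + ", ".join(f"no {item}" for item in exclusions)


if __name__ == "__main__":
    prompt, exclusions = build_prompt(
        subject="a golden retriever puppy lying in autumn leaves",
        style="35mm film photography",
        mood="cosy and warm",
        technical=["golden hour", "close-up macro"],
        exclude=["text", "people"],
    )
    print(prompt)
    print(inline_exclusions(prompt, exclusions))
```

Keeping the exclusions as a separate list is a design choice: the same five elements can then be pasted into a tool with a negative-prompt field or folded into a single prompt for a tool without one.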

Before and After: The Framework in Practice

Before (basic prompt):

"A coffee shop"

Result: Generic interior, flat lighting, nothing distinctive.

After (five-element prompt):

"A small independent coffee shop interior, morning light streaming through large windows, warm and cosy atmosphere, photorealistic, 35mm film photography aesthetic, shallow depth of field, latte art visible on the counter, no people, no text, warm amber tones"

Result: A specific, atmospheric, usable image that could appear in a magazine or on a website.

Before:

"A person working from home"

After:

"A young woman working at a minimalist wooden desk, large monitor with code visible, afternoon light from a window on the left, realistic photography, shallow depth of field background blur, cosy home office aesthetic, plants visible behind her, no phone visible, natural skin tones, professional but relaxed mood"

The second prompt gives you something you could actually use in a presentation or on a website. The first gives you a stock photo cliché.

Common Failure Modes and Fixes

Problem	Likely cause	Fix
Hands look wrong	Known AI weakness	Use negative prompts; regenerate multiple times
Text in image is garbled	Models struggle with text	DALL-E 3 handles text better than most; keep text very short
Image is generic/boring	Prompt lacks specificity	Add style, mood, and technical elements
Wrong number of subjects	AI often duplicates or drops items	Regenerate; describe positioning explicitly
Style not consistent	No style anchor	Reference a specific art movement or photographic style

Copyright: What You Need to Know

This is a rapidly evolving area, but here is where things stand as of 2025:

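Who owns AI-generated images? In many jurisdictions, copyright requires human authorship, so an image produced purely from a prompt may not be protectable by the person who wrote the prompt. The US Copyright Office has taken this position for purely AI-generated output. UK law differs on paper: it contains a provision for computer-generated works, but how that provision applies to generative AI is unsettled. Outputs that a human has significantly modified may qualify for protection.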

Can you use AI images commercially? It depends on the tool:

Tool	Commercial use	Notes
DALL-E 3	Generally yes, per OpenAI terms	Check current terms; some restrictions apply
Midjourney	Yes, on paid plans	There is no free tier; check the terms for your subscription level
Adobe Firefly	Yes, explicitly designed for commercial use	Lowest infringement risk of the tools listed, because it is trained on licensed content
Leonardo.ai	Paid plans include commercial rights	Check per-plan details
Stable Diffusion	Generally yes for outputs	Training data copyright debates ongoing

The "in the style of" question: Prompting for "in the style of [living artist]" is ethically contested and legally uncertain. Style itself is generally not copyrightable, but many artists object strongly to having their names used this way. For commercial work, stick to historical movements ("in the style of Impressionism") or describe the style without naming an individual artist.

Safe practice: For anything commercial, use Adobe Firefly or generate images with a tool whose terms explicitly permit commercial use on your plan level.

Practical Uses for Beginners

Blog and social media images — unique illustrations without stock photo licences
Presentation visuals — custom diagrams, conceptual images, cover slides
Product mockups — rough visualisations of ideas before committing to design
Brainstorming — generating visual options quickly to explore a creative direction
Personal projects — custom artwork, gifts, creative exploration

Practice Task

Open DALL-E 3 (via ChatGPT) or Midjourney. First, type a simple one-line prompt and note the result. Then build a full five-element prompt for the same subject and compare. The difference in quality should be immediately visible.
