Lesson 5.1 — AI Image Generation

[Image: Colorful digital artwork created by AI on a screen]

A few years ago, creating a high-quality illustration or digital painting required expensive software, years of skill development, or a professional designer. Today, you can type a sentence and receive a photorealistic image, a watercolour painting, a logo concept, or a cinematic still in seconds. AI image generation is one of the most visible and widely discussed applications of modern AI, and it is genuinely useful for a wide range of everyday tasks.

This lesson explains how it works in plain terms, compares the main tools available, teaches you the five-element prompt framework that separates mediocre results from excellent ones, and covers copyright considerations you should know before using AI images commercially.


How It Works (Plain English)

AI image generators are trained on vast collections of images and their associated descriptions — captions, alt text, metadata, web page text. Through this training, the model learns the statistical relationships between words and visual elements: what "golden hour lighting" looks like, how "impressionist" style differs from "hyperrealistic," what a Labrador retriever typically looks like.

When you type a prompt, the model starts from a random field of pixels (pure noise) and refines it step by step, guided by the prompt, until a coherent image emerges. The technique is called diffusion: during training, the model learns to undo noise that has been gradually added to real images, and generating an image runs that denoising process starting from pure noise.
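To make the "start from noise, refine step by step" idea concrete, here is a deliberately simplified Python sketch. It is not how any real image generator is implemented: the function name `toy_denoise` is made up for this illustration, and the `target` list merely stands in for "what the prompt describes". A real diffusion model has no target image to aim at; a trained neural network predicts which noise to remove at each step.

```python
import random


def toy_denoise(target, steps=10, seed=0):
    """Toy illustration only: refine random noise toward `target` step by step.

    `target` is a stand-in for "what the prompt describes". Real models have
    no such target; a trained network guides each refinement step instead.
    """
    rng = random.Random(seed)
    # Start from pure noise: one random value per "pixel".
    image = [rng.gauss(0.0, 1.0) for _ in target]
    errors = []
    for step in range(steps):
        # Close a growing share of the remaining gap; the last step closes it fully.
        blend = 1.0 / (steps - step)
        image = [px + blend * (t - px) for px, t in zip(image, target)]
        # Track the average distance from the target after this step.
        errors.append(sum(abs(t - px) for px, t in zip(image, target)) / len(target))
    return image, errors


if __name__ == "__main__":
    final, errors = toy_denoise([0.2, 0.8, 0.5, 0.1])
    # The error shrinks at every step as the noise is refined away.
    print([round(e, 3) for e in errors])
```

The point to notice is the shape of the loop: the image is never looked up anywhere, it is gradually shaped out of random values.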

You are not retrieving an existing image. You are generating a new one that has never existed before — though it is built from patterns learned from millions of real images.

Key takeaway: AI image generators don't copy images from the internet. They generate new images based on patterns learned during training. This matters both for understanding the technology and for copyright discussions.


Tool Comparison

| Tool | Best for | Strengths | Weaknesses | Cost |
|---|---|---|---|---|
| DALL-E 3 (via ChatGPT) | Beginners, text-in-images | Excellent prompt understanding, text rendering, integrated with ChatGPT | Less control over style details, safety filters can be restrictive | Included with ChatGPT Plus |
| Midjourney | High-quality artistic images | Stunning aesthetic quality, consistent style, active community | Runs via Discord (awkward for new users), no free tier | From $10/month |
| Adobe Firefly | Commercial use, design workflows | Trained on licensed content (strong copyright position), integrates with Photoshop | Sometimes less photorealistic than competitors | Included with Creative Cloud; limited free credits |
| Leonardo.ai | Game assets, characters, concept art | Strong on stylised art, character consistency, fine-tuned models | Learning curve for advanced features | Free tier available; Pro from ~$10/month |
| Stable Diffusion | Technical users who want full control | Open source, runs locally, unlimited generation | Requires technical setup, no guided interface | Free (but needs hardware or hosting) |

For beginners: Start with DALL-E 3 inside ChatGPT. The interface is familiar and the results are excellent for most everyday use cases. Move to Midjourney when you want more control over aesthetic quality.


The Five-Element Prompt Framework

Most people type a single sentence and are disappointed by the results. Excellent prompts use five elements, and the difference in output quality is dramatic.

Element 1: Subject

What or who is in the image? Be specific.

  • Weak: "a dog"
  • Strong: "a golden retriever puppy lying in autumn leaves"

Element 2: Style

What artistic or visual style do you want?

  • Photography styles: "photorealistic," "35mm film photography," "studio portrait"
  • Art styles: "watercolour," "oil painting," "anime," "flat design illustration," "Art Nouveau"
  • Specific artists (check copyright guidance below): "in the style of Monet"

Element 3: Mood and Atmosphere

What feeling should the image evoke?

  • "Melancholic and misty," "bright and cheerful," "cinematic and dramatic," "cosy and warm"

Element 4: Technical Parameters

Lighting, perspective, and composition details that photographers and cinematographers use:

  • Lighting: "golden hour," "soft overcast light," "dramatic side lighting," "ring light"
  • Perspective: "bird's eye view," "close-up macro," "wide-angle establishing shot"
  • Camera: "shot on Canon 5D," "85mm portrait lens," "depth of field blur on background"

Element 5: Negative Prompts (where supported)

What to exclude: "no text," "no people," "avoid dark colours," "avoid cartoonish style"
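If you find yourself writing many prompts, the five elements can be treated as slots to fill in. The Python sketch below shows one possible way to assemble them into a single comma-separated prompt. The helper `build_prompt` and its parameter names are hypothetical and not part of any tool's API. It also assumes exclusions are written inline as "no ..." phrases; some tools take negative prompts in a separate field or parameter instead.

```python
def build_prompt(subject, style="", mood="", technical=(), exclude=()):
    """Join the five elements into one comma-separated prompt string.

    Hypothetical helper for illustration only. Empty elements are skipped,
    and each item in `exclude` is written inline as "no <item>".
    """
    parts = [subject, style, mood, *technical]
    parts += [f"no {item}" for item in exclude]
    return ", ".join(p.strip() for p in parts if p and p.strip())


if __name__ == "__main__":
    prompt = build_prompt(
        subject="a golden retriever puppy lying in autumn leaves",
        style="35mm film photography",
        mood="cosy and warm",
        technical=["golden hour", "close-up macro"],
        exclude=["text", "people"],
    )
    print(prompt)
    # a golden retriever puppy lying in autumn leaves, 35mm film photography,
    # cosy and warm, golden hour, close-up macro, no text, no people
```

Filling the slots one at a time makes it obvious when an element is missing, which is usually the reason a result looks generic.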


Before and After: The Framework in Practice

Before (basic prompt):

"A coffee shop"

Result: Generic interior, flat lighting, nothing distinctive.

After (five-element prompt):

"A small independent coffee shop interior, morning light streaming through large windows, warm and cosy atmosphere, photorealistic, 35mm film photography aesthetic, shallow depth of field, latte art visible on the counter, no people, no text, warm amber tones"

Result: A specific, atmospheric, usable image that could appear in a magazine or on a website.


Before:

"A person working from home"

After:

"A young woman working at a minimalist wooden desk, large monitor with code visible, afternoon light from a window on the left, realistic photography, shallow depth of field background blur, cosy home office aesthetic, plants visible behind her, no phone visible, natural skin tones, professional but relaxed mood"

The second prompt gives you something you could actually use in a presentation or on a website. The first gives you a stock photo cliché.


Common Failure Modes and Fixes

| Problem | Likely cause | Fix |
|---|---|---|
| Hands look wrong | Known AI weakness | Use negative prompts; regenerate multiple times |
| Text in image is garbled | Models struggle with text | DALL-E 3 handles text better than most; keep text very short |
| Image is generic/boring | Prompt lacks specificity | Add style, mood, and technical elements |
| Wrong number of subjects | AI often duplicates or drops items | Regenerate; describe positioning explicitly |
| Style not consistent | No style anchor | Reference a specific art movement or photographic style |

Copyright: What You Need to Know

This is a rapidly evolving area, but here is where things stand as of 2025:

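Who owns AI-generated images? In many jurisdictions, an image generated purely from a prompt cannot be copyrighted by the person who wrote the prompt, because copyright requires human authorship. In the US, the Copyright Office has taken the position that purely AI-generated output is not protectable, while output that a human has significantly modified may be. The UK is less clear-cut: its copyright statute includes a provision for computer-generated works, but how that provision applies to generative AI output is largely untested.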

Can you use AI images commercially? It depends on the tool:

| Tool | Commercial use | Notes |
|---|---|---|
| DALL-E 3 | Generally yes, per OpenAI terms | Check current terms; some restrictions apply |
| Midjourney | Paid plans only; not on free tier | Check your subscription level |
| Adobe Firefly | Yes; explicitly designed for commercial use | Strongest copyright protection due to training data |
| Leonardo.ai | Paid plans include commercial rights | Check per-plan details |
| Stable Diffusion | Generally yes for outputs | Training data copyright debates ongoing |

The "in the style of" question: Prompting for "in the style of [living artist]" is ethically contested and legally uncertain. Style itself is generally not copyrightable, but many artists object strongly to having their names used this way. For commercial work, stick to historical movements ("in the style of Impressionism") or describe the style without naming an individual artist.

Safe practice: For anything commercial, use Adobe Firefly or generate images with a tool whose terms explicitly permit commercial use on your plan level.


Practical Uses for Beginners

  • Blog and social media images — unique illustrations without stock photo licences
  • Presentation visuals — custom diagrams, conceptual images, cover slides
  • Product mockups — rough visualisations of ideas before committing to design
  • Brainstorming — generating visual options quickly to explore a creative direction
  • Personal projects — custom artwork, gifts, creative exploration

Practice Task

Open DALL-E 3 (via ChatGPT) or Midjourney. First, type a simple one-line prompt and note the result. Then build a full five-element prompt for the same subject and compare. The difference in quality should be immediately visible.