Image Generation AI
Lesson 5.1 — AI Image Generation
A few years ago, creating a high-quality illustration or digital painting required expensive software, years of skill development, or a professional designer. Today, you can type a sentence and receive a photorealistic image, a watercolour painting, a logo concept, or a cinematic still in seconds. AI image generation is one of the most visible and widely discussed applications of modern AI, and it is genuinely useful for a wide range of everyday tasks.
This lesson explains how it works in plain terms, compares the main tools available, teaches you the five-element prompt framework that separates mediocre results from excellent ones, and covers copyright considerations you should know before using AI images commercially.
How It Works (Plain English)
AI image generators are trained on vast collections of images and their associated descriptions — captions, alt text, metadata, web page text. Through this training, the model learns the statistical relationships between words and visual elements: what "golden hour lighting" looks like, how "impressionist" style differs from "hyperrealistic," what a Labrador retriever typically looks like.
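When you type a prompt, most current image generators start with a field of random noise and refine it step by step, guided by the prompt, until a coherent image emerges. The technique is called diffusion: during training, the model sees images that have been progressively corrupted with noise and learns to reverse that corruption, so generating an image amounts to running the denoising process starting from pure noise.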
You are not retrieving an existing image. You are generating a new one that has never existed before — though it is built from patterns learned from millions of real images.
Key takeaway: AI image generators don't copy images from the internet. They generate new images based on patterns learned during training. This matters both for understanding the technology and for copyright discussions.
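The step-by-step refinement from noise can be pictured with a deliberately simplified toy. The sketch below is not a diffusion model: the `target` list, the `toy_denoise` function, and the fixed 0.2 step size are all illustrative assumptions, and a real generator has no target image to aim at (a trained neural network predicts what noise to remove at each step). It only shows the shape of the loop: begin with random values, then improve them a little at a time.

```python
import random


def toy_denoise(target, steps=50, seed=0):
    """Toy illustration only: start from random values and nudge them toward `target`.

    `target` stands in for "what the prompt describes". A real diffusion model
    has no such target; a trained network predicts the noise to remove.
    """
    rng = random.Random(seed)
    # Step 0: pure noise, one random value per "pixel".
    image = [rng.uniform(0.0, 1.0) for _ in target]
    history = [image]
    for _ in range(steps):
        # Each step removes 20% of the remaining difference (an arbitrary choice).
        image = [pixel + 0.2 * (goal - pixel) for pixel, goal in zip(image, target)]
        history.append(image)
    return history


def error(image, target):
    """Mean absolute difference between the current values and the target."""
    return sum(abs(p - g) for p, g in zip(image, target)) / len(target)


target = [0.0, 0.25, 0.5, 0.75, 1.0]  # hypothetical five-"pixel" image
history = toy_denoise(target)

print("error at start:", error(history[0], target))
print("error at end:  ", error(history[-1], target))
```

The error shrinks at every step, which is the only point of the example: no single step produces the image, but many small corrections do.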
Tool Comparison
| Tool | Best for | Strengths | Weaknesses | Cost |
|---|---|---|---|---|
| DALL-E 3 (via ChatGPT) | Beginners, text-in-images | Excellent prompt understanding, text rendering, integrated with ChatGPT | Less control over style details, safety filters can be restrictive | Included with ChatGPT Plus |
| Midjourney | High-quality artistic images | Stunning aesthetic quality, consistent style, active community | Originally Discord-only, which can be awkward for new users (a web interface is now available); no free tier | From $10/month |
| Adobe Firefly | Commercial use, design workflows | Trained on licensed content (strong copyright position), integrates with Photoshop | Sometimes less photorealistic than competitors | Included with Creative Cloud; limited free credits |
| Leonardo.ai | Game assets, characters, concept art | Strong on stylised art, character consistency, fine-tuned models | Learning curve for advanced features | Free tier available; Pro from ~$10/month |
| Stable Diffusion | Technical users who want full control | Open source, runs locally, unlimited generation | Requires technical setup, no guided interface | Free (but needs hardware or hosting) |
For beginners: Start with DALL-E 3 inside ChatGPT. The interface is familiar and the results are excellent for most everyday use cases. Move to Midjourney when you want more control over aesthetic quality.
The Five-Element Prompt Framework
Most people type a single sentence and are disappointed by the results. Excellent prompts use five elements, and the difference in output quality is dramatic.
Element 1: Subject
What or who is in the image? Be specific.
- Weak: "a dog"
- Strong: "a golden retriever puppy lying in autumn leaves"
Element 2: Style
What artistic or visual style do you want?
- Photography styles: "photorealistic," "35mm film photography," "studio portrait"
- Art styles: "watercolour," "oil painting," "anime," "flat design illustration," "Art Nouveau"
- Specific artists (check copyright guidance below): "in the style of Monet"
Element 3: Mood and Atmosphere
What feeling should the image evoke?
- "Melancholic and misty," "bright and cheerful," "cinematic and dramatic," "cosy and warm"
Element 4: Technical Parameters
Lighting, perspective, and composition details that photographers and cinematographers use:
- Lighting: "golden hour," "soft overcast light," "dramatic side lighting," "ring light"
- Perspective: "bird's eye view," "close-up macro," "wide-angle establishing shot"
- Camera: "shot on Canon 5D," "85mm portrait lens," "depth of field blur on background"
Element 5: Negative Prompts (where supported)
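What to exclude: "no text," "no people," "avoid dark colours," "avoid cartoonish style". Support varies by tool: Stable Diffusion interfaces usually offer a separate negative-prompt field, Midjourney accepts exclusions through its --no parameter, and DALL-E 3 has no dedicated field, so exclusions go in the prompt text itself.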
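The five elements can be treated as fields that are filled in separately and then joined. The sketch below is one possible way to do that, not part of any tool's API: the `ImagePrompt` class and its method names are hypothetical. The only tool-specific detail it relies on is Midjourney's `--no` parameter; for tools without a dedicated negative-prompt field, exclusions are folded into the text as "no ..." phrases.

```python
from dataclasses import dataclass, field


@dataclass
class ImagePrompt:
    """Hypothetical container for the five prompt elements."""
    subject: str                                         # Element 1
    style: str = ""                                      # Element 2
    mood: str = ""                                       # Element 3
    technical: list[str] = field(default_factory=list)   # Element 4
    negative: list[str] = field(default_factory=list)    # Element 5

    def positive_text(self) -> str:
        """Join elements 1 to 4, skipping any that were left empty."""
        parts = [self.subject, self.style, self.mood, *self.technical]
        return ", ".join(p for p in parts if p)

    def as_plain_text(self) -> str:
        """For tools with no negative-prompt field: write exclusions into the text."""
        exclusions = [f"no {item}" for item in self.negative]
        return ", ".join([self.positive_text(), *exclusions])

    def as_midjourney(self) -> str:
        """For Midjourney: pass exclusions through the --no parameter."""
        if not self.negative:
            return self.positive_text()
        return f"{self.positive_text()} --no {', '.join(self.negative)}"


prompt = ImagePrompt(
    subject="a golden retriever puppy lying in autumn leaves",
    style="photorealistic",
    mood="bright and cheerful",
    technical=["golden hour", "85mm portrait lens"],
    negative=["text", "people"],
)

print(prompt.as_plain_text())
print(prompt.as_midjourney())
```

Keeping the elements separate makes it easy to change one at a time (for example, swapping "photorealistic" for "watercolour") and compare the results.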
Before and After: The Framework in Practice
Before (basic prompt):
"A coffee shop"
Result: Generic interior, flat lighting, nothing distinctive.
After (five-element prompt):
"A small independent coffee shop interior, morning light streaming through large windows, warm and cosy atmosphere, photorealistic, 35mm film photography aesthetic, shallow depth of field, latte art visible on the counter, no people, no text, warm amber tones"
Result: A specific, atmospheric, usable image that could appear in a magazine or on a website.
Before:
"A person working from home"
After:
"A young woman working at a minimalist wooden desk, large monitor with code visible, afternoon light from a window on the left, realistic photography, shallow depth of field background blur, cosy home office aesthetic, plants visible behind her, no phone visible, natural skin tones, professional but relaxed mood"
The second prompt gives you something you could actually use in a presentation or on a website. The first gives you a stock photo cliché.
Common Failure Modes and Fixes
| Problem | Likely cause | Fix |
|---|---|---|
| Hands look wrong | Known AI weakness | Use negative prompts; regenerate multiple times |
| Text in image is garbled | Models struggle with text | DALL-E 3 handles text better than most; keep text very short |
| Image is generic/boring | Prompt lacks specificity | Add style, mood, and technical elements |
| Wrong number of subjects | AI often duplicates or drops items | Regenerate; describe positioning explicitly |
| Style not consistent | No style anchor | Reference a specific art movement or photographic style |
Copyright: What You Need to Know
This is a rapidly evolving area, but here is where things stand as of 2025:
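Who owns AI-generated images? In many jurisdictions, an image generated purely by AI cannot be copyrighted by the person who wrote the prompt, because copyright requires human authorship. This is the position of the US Copyright Office. The UK is a partial exception: its copyright law contains a provision for computer-generated works, although how that provision applies to generative AI has not been settled. Significantly modified outputs may qualify for protection.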
Can you use AI images commercially? It depends on the tool:
| Tool | Commercial use | Notes |
|---|---|---|
| DALL-E 3 | Generally yes, per OpenAI terms | Check current terms; some restrictions apply |
| Midjourney | Yes, on paid plans | There is no free tier; check what your subscription level allows |
| Adobe Firefly | Yes, explicitly designed for commercial use | Lowest infringement risk of the tools listed, because it was trained on licensed content |
| Leonardo.ai | Paid plans include commercial rights | Check per-plan details |
| Stable Diffusion | Generally yes for outputs | Training data copyright debates ongoing |
The "in the style of" question: Prompting for "in the style of [living artist]" is ethically contested and legally uncertain. Style itself is generally not copyrightable, but many artists object strongly to having their names used this way. For commercial work, stick to historical movements ("in the style of Impressionism") or describe the style without naming an individual artist.
Safe practice: For anything commercial, use Adobe Firefly or generate images with a tool whose terms explicitly permit commercial use on your plan level.
Practical Uses for Beginners
- Blog and social media images — unique illustrations without stock photo licences
- Presentation visuals — custom diagrams, conceptual images, cover slides
- Product mockups — rough visualisations of ideas before committing to design
- Brainstorming — generating visual options quickly to explore a creative direction
- Personal projects — custom artwork, gifts, creative exploration
Practice Task
Open DALL-E 3 (via ChatGPT) or Midjourney. First, type a simple one-line prompt and note the result. Then build a full five-element prompt for the same subject and compare. The difference in quality should be immediately visible.