
The Discomfort Felt When Viewing Generated Images
AI image generation attempts to faithfully recreate the input text. Its sincerity in executing exactly what it is told, and its ability to learn from a vast database, amount to a 'divine skill' that we humans cannot match. Despite this, the resulting images often fail to captivate the human heart. This time, we explore why AI-generated images feel uncomfortable when compared with photographs.
From the Perspective of Whitespace
Many generated images convert text into images correctly. Most of them contain nothing superfluous; in other words, they lack 'whitespace'. Here, 'whitespace' refers to space or information that is left out, whether intentionally or unintentionally. AI-generated images often overflow with detail and lack this whitespace. They convey information faithfully, but they do not include context.

Image by sora.KagiAke
'Whitespace', however, carries human emotional nuances and hidden meanings. This kind of context-based understanding, rooted in emotion, is where AI, which lacks a physical body, is weakest. AI excels at rendering an image literally, but it is still developing the ability to create whitespace that holds emotional depth and hidden meaning.
Humanity Reflected in Whitespace
Photos taken by humans carry meaning beyond the mere recording of an image. They may include unpredictability, a spectrum of emotions, and sometimes elements considered 'mistakes'. All of these accidental by-products become part of the story the photo tells, giving viewers something to empathize with and room for imagination.

Image by ザワ
The Process of AI Image Generation
We have seen that the discomfort felt when comparing AI-generated images with photographs depends on whether 'whitespace' is present. Let's deepen our understanding by looking at how AI generates an image.
- Text Conversion: AI converts the input words into concepts it can interpret easily. For example, if you input the text 'birds flying in the sky', AI treats this as a description of the image's elements and converts it into basic instructions for depicting the sky and the birds' figures.
- Image Generation from Noise: AI starts with completely random noise (a collection of random pixels without features) and gradually removes this noise to form a concrete image.
- Image Decoding: Image decoding converts the image data generated by AI into a form the human eye can easily understand. This includes turning the internal representation into a viewable image format such as JPEG or PNG, and adjusting the image's resolution and color.
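The three steps above can be sketched as a deliberately tiny toy in Python. This is not how a real image generator is implemented: a real system uses trained neural networks for the text encoder, the denoiser, and the decoder, whereas here the "concepts" are a hand-written table and the denoiser already knows its target. All names (`CONCEPTS`, `encode_text`, `denoise`, `decode`) and all numbers are assumptions made up for illustration.

```python
import random

# Hypothetical toy vocabulary: each known word maps to a 2x2 grayscale
# pattern (4 brightness values in [0, 1]). A real model learns this mapping.
CONCEPTS = {
    "sky": [0.9, 0.9, 0.6, 0.6],
    "birds": [0.1, 0.8, 0.8, 0.1],
}


def encode_text(prompt):
    """Step 1 (Text Conversion): average the patterns of the known words."""
    known = [CONCEPTS[w] for w in prompt.lower().split() if w in CONCEPTS]
    if not known:
        raise ValueError("prompt contains no known concept")
    return [sum(values) / len(known) for values in zip(*known)]


def denoise(target, steps=50, rate=0.2, seed=0):
    """Step 2 (Image Generation from Noise): start from random values and
    remove a fixed fraction of the remaining noise at every step."""
    rng = random.Random(seed)
    image = [rng.random() for _ in target]  # pure noise, no features
    for _ in range(steps):
        image = [x + rate * (t - x) for x, t in zip(image, target)]
    return image


def decode(image):
    """Step 3 (Image Decoding): map internal floats to 8-bit pixel values."""
    return [max(0, min(255, round(x * 255))) for x in image]


pixels = decode(denoise(encode_text("birds flying in the sky")))
print(pixels)  # four integers between 0 and 255
```

Note that words the table does not know ('flying', 'in', 'the') are silently dropped, and the loop converges on exactly the values the calculation dictates. Even in this toy, nothing unplanned survives into the result.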

Image by mars
The Dissonance Created by Differences in AI and Human Perspectives
When AI generates an image, the removal of 'noise' fundamentally follows mathematical calculations and algorithms. Removing noise brings the image closer to reality, but there are limits. The spontaneity, emotional richness, and imperfections naturally present in human photography are difficult to capture through calculation. Already at this stage, AI and humans differ in how they perceive noise.
Across the detailed processes involved in generating images, it is currently difficult for AI to fully understand and reproduce human complexity and subtle nuance. AI excels at literal interpretation and data-based output, but it cannot yet fully mimic the depth of human experience and emotion. To improve accuracy, what AI may need now is to embrace the seemingly meaningless 'whitespace' of everyday human life.