Advancing AI depends on a steady supply of high-quality training data. In recent years, however, it has been argued that the data available on the internet has largely been used up, and many AI companies are exploring new ways to source data.
This article covers the background, what companies are doing about it, and the points creators should watch.
Depletion of Training Data and Use of Synthetic Data
Elon Musk has said that "the cumulative knowledge of humanity has been exhausted in AI training." In response to this data depletion problem, many companies have started using "synthetic data."
Synthetic data is data generated by AI that is then fed back in as training material. There are concerns that this loop can degrade data quality and, in turn, reduce AI accuracy.
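The degradation concern can be illustrated with a toy simulation. The sketch below is a deliberately simplified assumption, not a description of how any company trains its models: the "model" is just a Gaussian distribution, and each generation is fitted only to samples drawn from the previous one. The function name `simulate_self_training` and its parameters are hypothetical and chosen for illustration.

```python
import random
import statistics


def simulate_self_training(generations=30, sample_size=20, seed=0):
    """Refit a Gaussian to its own samples and record its spread per generation.

    Toy illustration only: generation 0 stands in for "real" data, and every
    later generation sees nothing but the previous generation's output.
    """
    rng = random.Random(seed)
    mean, std = 0.0, 1.0
    spreads = [std]
    for _ in range(generations):
        # The current model generates a finite synthetic dataset.
        samples = [rng.gauss(mean, std) for _ in range(sample_size)]
        # The next model is fitted to that synthetic data alone.
        mean = statistics.fmean(samples)
        std = statistics.pstdev(samples)
        spreads.append(std)
    return spreads


if __name__ == "__main__":
    for generation, spread in enumerate(simulate_self_training()):
        if generation % 5 == 0:
            print(f"generation {generation:2d}: spread = {spread:.3f}")
```

In this setup each refit loses a little of the original spread on average, so the diversity of the data tends to drift downward over many generations, although individual runs fluctuate. Real image or language models are far more complex, so this should be read only as an intuition for why reusing generated data can erode quality.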
Generation of Synthetic Data and Impact on Photographic Works
To address this data shortage, some companies are developing technologies that train AI models from a small number of real images. Others are experimenting with retraining models on AI-generated output, that is, on synthetic data.
Photographic data is a particular challenge, however. Synthetic data alone is said to struggle to reproduce realistic textures and compositions, so models remain heavily dependent on high-quality real photographs. As a result, large volumes of existing photographs tend to be used for training.
Rights Protection and Measures Creators Should Pay Attention To
Creators' works are commonly used as AI training data, which raises copyright and ethical issues. It is important to find out whether your work is being used for AI training and, if necessary, to take steps to protect your rights.
In addition, as AI-generated content increases, there is a growing movement to re-evaluate the value of originality.