Text-to-image AI systems are booming in both ability and popularity right now, and what better proof than their appearance in the world’s hottest app: TikTok.
The video platform recently added a new effect it calls “AI greenscreen” that lets users type in a text prompt, which the software then turns into an image. This image can then be used as the background to a video, potentially a very useful tool for creators.
The output of TikTok’s system is pretty basic compared to that of state-of-the-art text-to-image models like Google’s Imagen, OpenAI’s DALL-E 2, or Midjourney’s eponymous software. It creates only rather abstract and swirling images, a quality reflected in the dreamy nature of TikTok’s suggested prompts like “astronaut in the ocean” and “flower galaxy.” Other models, by comparison, can produce both photorealistic imagery and complex, coherent illustrations that look like they were drawn or painted by humans.
The limitations of TikTok’s model may well be intentional, though. First, more advanced models require greater computing power, which would be expensive and resource-intensive for the company to implement. Second, TikTok has more than a billion users, and giving all these individuals the power to create photorealistic images of anything they can imagine would almost certainly produce some troubling results.
For example, we tested the model’s ability to create nudity and gore, two types of output that text-to-image generators often try to limit. Pictures based on violent prompts like “assassination of Boris Johnson” and “assassination of Joe Biden” produce mostly abstract swirls, with a just-about-recognizable face for the UK’s prime minister (though the man’s familiar blond mop does make caricature particularly easy).
Likewise, a request involving nudity — “naked model on beach” — produces thematically appropriate colors, including flesh-tones, sandy oranges, and ocean blues, but nothing that would make a vicar blush.
What’s notable about the appearance of TikTok’s “AI greenscreen,” then, is that it shows just how fast this technology is going mainstream. The latest cycle of development for text-to-image AI arguably began in 2021 with the original release of DALL-E by OpenAI. Less than two years later, the tech is already in the hands of millions via an app like TikTok.
Given the potential of these systems for both harm and good, things are only going to get stranger from here on in.