Generate Images Directly in ChatGPT with GPT-4o

No More Gobbledygook: OpenAI’s New Image Model Produces Clear Text

OpenAI has introduced “Images in ChatGPT,” which for the first time lets users generate images with the GPT-4o model directly in their chat sessions. The launch represents a major advancement in AI-generated content.

“Images in ChatGPT” is available across all tiers, including Plus, Pro, Team, and the free version, to promote equitable access to sophisticated image creation. According to OpenAI spokesperson Taya Christianson, free-tier users will face limits similar to those for DALL-E 3, approximately three images per day, though these limits might change depending on user demand. DALL-E will remain available to its enthusiasts through a dedicated custom GPT.

OpenAI’s research lead Gabriel Goh described GPT-4o as an “omnimodal” foundation capable of processing text, images, audio, and video. The model also shows improved binding, that is, keeping attributes such as color and shape attached to the correct objects, which is a common point of failure in AI image generation. GPT-4o can keep the relationships among up to 20 objects accurate without mixing colors or shapes, whereas earlier models struggled to do so.

The most remarkable advancement in this model is its improved text rendering. AI-generated images have typically displayed garbled or nonsensical text. Goh explained that reaching the desired results took many months of iterative refinement. The team acknowledges that rendering is still imperfect for small text but says the model has reached reliably usable levels for text in images.

The system architecture differs from the diffusion models used by most image generators because it relies on an autoregressive technique. Images are generated sequentially, from left to right and top to bottom, much as text is generated, and this approach is believed to improve both text rendering and binding.
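The left-to-right, top-to-bottom ordering described above can be illustrated with a toy sketch. This is not OpenAI's implementation, whose details are not public. The function names, the small integer "token" vocabulary, and the neighbor-copying rule are all hypothetical stand-ins for a learned model. The sketch only shows the control flow: each position is filled in raster order and may depend only on positions already generated.

```python
import random


def raster_order(height, width):
    """Yield (row, col) positions top to bottom, left to right within each row."""
    for row in range(height):
        for col in range(width):
            yield row, col


def toy_next_token(grid, row, col, vocab_size, rng):
    """Hypothetical stand-in for a learned model.

    Samples a token for (row, col) using only tokens generated earlier
    in raster order (the left neighbor and the neighbor above).
    """
    context = []
    if col > 0:
        context.append(grid[row][col - 1])  # left neighbor, already generated
    if row > 0:
        context.append(grid[row - 1][col])  # neighbor above, already generated
    # Favor repeating a context token so the output shows some local structure.
    if context and rng.random() < 0.7:
        return rng.choice(context)
    return rng.randrange(vocab_size)


def generate(height, width, vocab_size=4, seed=0):
    """Fill a height x width grid of tokens one position at a time."""
    rng = random.Random(seed)
    grid = [[None] * width for _ in range(height)]
    for row, col in raster_order(height, width):
        grid[row][col] = toy_next_token(grid, row, col, vocab_size, rng)
    return grid


if __name__ == "__main__":
    for line in generate(4, 8):
        print(" ".join(str(token) for token in line))
```

A diffusion model, by contrast, refines the whole image at once over many denoising steps rather than committing to one position at a time.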

OpenAI demonstrated how its system can produce scientific illustrations with detailed labeling, such as Newton’s prism experiment, along with multi-panel comics featuring coherent characters and dialogue, and informational posters containing precise text. The demonstration also included practical applications such as images with transparent backgrounds for stickers, restaurant menus, and logos.

Jackie Shannon, who leads multimodal products for ChatGPT, highlighted the system’s ability to draw on extensive world knowledge. She explained that when she creates images herself, she works within her own limitations but draws on her knowledge of the world. The model brings world knowledge into the process too, which means you can request an image of Newton’s prism experiment and receive it without providing any background information.

OpenAI believes the improved quality and capabilities of their image generation make the slight increase in processing time worthwhile. Shannon acknowledged that the system’s latency needs improvement, but stated that the superior quality of images, along with enhanced capabilities and world knowledge, compensates for users’ longer waiting times.

OpenAI addressed potential misuse worries by highlighting its commitment to strong protective measures. The system protects against watermark removal while preventing sexual deepfake creation and rejecting CSAM requests. Generated images will carry standard C2PA metadata, which identifies them as OpenAI products despite lacking visual watermarks. OpenAI operates its own internal tools to verify images.

Shannon stated that while no system achieves perfection in this area, the team is continually strengthening its protective measures and considers this the initial phase. Users own all images they produce with ChatGPT and can use them freely within OpenAI’s usage policies.

OpenAI has expanded the capabilities of its main product through “Images in ChatGPT” and pushed the limits of AI-generated creativity to offer users powerful visual tools for expression within their chat interface.
