Invention Title:

AI TEXT FOR VIDEO STREAMS

Publication number:

US20250119587

Publication date:

Section:

Electricity

Class:

H04N19/70

Inventors:

Assignee:

Applicant:

Smart overview of the Invention

The invention describes a method and apparatus for video processing using artificial intelligence (AI). A text prompt is received from a user device; the prompt instructs an AI generative process to create images. The images are then encoded into a video bitstream, and the text prompt is included in that bitstream as supplemental enhancement information (SEI). The prompt is thus signaled together with the encoded video, so that AI applications receiving the stream can read it.

Field of Application

This technology concerns video coding and decoding, specifically the carriage of AI-generated content within coded video streams. It builds on existing video compression standards, which reduce bandwidth and storage requirements, and adds AI text prompts to the coded stream. The invention addresses the need for efficient carriage of AI-generated imagery and related data within video streams.

Background

Video compression is essential due to the high data requirements of uncompressed digital video. Techniques like lossless and lossy compression are employed to manage bandwidth and storage, with lossy compression being prevalent in consumer applications. SEI messages are a key component in current video standards, allowing additional information to be carried alongside video data. However, existing protocols do not adequately support emerging AI applications that generate new imagery without embedding neural network models in the stream.

Summary of Invention

In the method, a text prompt guides an AI generative process to create images, which are then encoded into a video bitstream. The bitstream includes an SEI message containing the text prompt, so that the prompt travels with the AI-generated content alongside the conventional video data. The apparatus comprises memory and one or more processors that carry out this encoding, and supports conversion between visual media files and bitstreams that contain the AI text prompt.

Detailed Description

The communication system involves multiple terminals connected via a network, supporting both unidirectional and bidirectional transmission of coded video data. This setup is applicable to various devices such as servers, PCs, and smartphones across different network types. The system also includes a streaming environment where captured uncompressed video is encoded and stored for retrieval by clients, who decode it for display. This architecture supports diverse applications like streaming, videoconferencing, and digital TV.