US20250342698
2025-11-06
Physics
G06V20/44
The patent application details a system and method for creating structured reports from video footage using artificial intelligence. The system processes video inputs by extracting frames, identifying objects, and adjusting their importance based on contextual relevance. It employs a Long Short-Term Memory (LSTM) network to analyze temporal patterns and integrates spatial data from feature point identification and geomapping techniques. The system's event detection modules identify key actions, while scene understanding and semantic segmentation provide environmental context and pixel-level detail.
The process begins with frame extraction and pre-processing to improve the accuracy of downstream analysis. Techniques such as normalization, noise reduction, and frame stabilization are applied. Object detection is performed using models such as YOLO or SSD to identify key objects within each frame. Object tracking algorithms such as SORT or DeepSORT maintain continuity across frames by assigning persistent IDs to objects, even when they move or are temporarily obscured.
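The persistent-ID idea can be illustrated with a minimal sketch. This is not SORT or DeepSORT: it is a simplified stand-in that greedily matches boxes by intersection over union and omits the Kalman filter, Hungarian assignment, and appearance features those trackers use. The class name `GreedyIouTracker` and the parameters `iou_threshold` and `max_missed` are hypothetical and do not come from the application.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Track:
    box: Box
    missed: int = 0  # consecutive frames without a matching detection


@dataclass
class GreedyIouTracker:
    """Hypothetical simplified tracker; assumed names and defaults."""
    iou_threshold: float = 0.3
    max_missed: int = 5
    tracks: Dict[int, Track] = field(default_factory=dict)
    next_id: int = 1

    def update(self, detections: List[Box]) -> Dict[int, Box]:
        """Match detections to tracks; return {track_id: box} for this frame."""
        # Score every (track, detection) pair, best overlap first.
        pairs = sorted(
            ((iou(t.box, d), tid, di)
             for tid, t in self.tracks.items()
             for di, d in enumerate(detections)),
            reverse=True,
        )
        used_tracks, used_dets, assigned = set(), set(), {}
        for score, tid, di in pairs:
            if score < self.iou_threshold:
                break
            if tid in used_tracks or di in used_dets:
                continue
            used_tracks.add(tid)
            used_dets.add(di)
            self.tracks[tid] = Track(detections[di])  # resets the miss counter
            assigned[tid] = detections[di]
        # Unmatched tracks age and are dropped after max_missed frames,
        # which lets an ID survive a short occlusion.
        for tid in list(self.tracks):
            if tid not in used_tracks:
                self.tracks[tid].missed += 1
                if self.tracks[tid].missed > self.max_missed:
                    del self.tracks[tid]
        # Unmatched detections start new tracks with fresh IDs.
        for di, d in enumerate(detections):
            if di not in used_dets:
                self.tracks[self.next_id] = Track(d)
                assigned[self.next_id] = d
                self.next_id += 1
        return assigned
```

Calling `update` once per frame with that frame's detector boxes yields a mapping from ID to box. An object that disappears for up to `max_missed` frames and reappears near its last position keeps its ID, which is the continuity property described above.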
The system conducts object importance adjustment by aggregating metadata on object appearances and behaviors. A neural network adjusts the importance of objects based on their relevance to the context. The processed data feeds into an LSTM network that analyzes the temporal progression of events and interactions, integrating inputs from object tracking, feature point identification, geomapping, and Gaussian splatting. This enables tracking of subtle changes in behavior within a three-dimensional space.
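A rough sketch of the aggregation and importance-adjustment step follows. The application describes a neural network for the adjustment; the hand-weighted logistic score here is a swapped-in stand-in chosen only to show the data flow. The `Observation` fields, the `context_weights` mapping, and every coefficient are assumptions, not details from the application.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Observation(NamedTuple):
    """One tracked detection in one frame (hypothetical record layout)."""
    track_id: int
    label: str
    confidence: float


def aggregate(observations: Iterable[Observation]) -> Dict[int, dict]:
    """Collect per-object appearance counts and mean detector confidence."""
    grouped: Dict[int, List[Observation]] = defaultdict(list)
    for obs in observations:
        grouped[obs.track_id].append(obs)
    return {
        tid: {
            "label": group[0].label,
            "appearances": len(group),
            "mean_confidence": sum(o.confidence for o in group) / len(group),
        }
        for tid, group in grouped.items()
    }


def importance(meta: dict, context_weights: Dict[str, float],
               total_frames: int) -> float:
    """Assumed score in (0, 1): logistic of a weighted feature sum.

    The coefficients are illustrative; a trained network would learn them.
    """
    presence = meta["appearances"] / total_frames
    relevance = context_weights.get(meta["label"], 0.0)
    z = 2.0 * relevance + 1.0 * presence + 0.5 * meta["mean_confidence"] - 1.5
    return 1.0 / (1.0 + math.exp(-z))
```

With weights such as `{"person": 1.0, "bench": 0.1}`, a person seen in every frame scores higher than a bench seen in half of them, so the downstream temporal model can attend to the more relevant object.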
Outputs from the LSTM, event detection, scene understanding, and semantic segmentation components feed into a large language model (LLM). This LLM generates a coherent natural language description of the events in the video, capturing their temporal progression and spatial relationships. A second LLM then formats the narrative according to organization-specific templates so that the report complies with applicable standards, such as those required by police departments.
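The two-stage hand-off can be sketched without calling any model. The code below only shows one plausible way to package the component outputs into a prompt and to fill an organizational template; the application uses a second LLM for formatting, and plain string substitution is substituted here for illustration. The function names, prompt wording, and template fields are all hypothetical.

```python
import json
from string import Template
from typing import Dict, List


def build_narrative_prompt(components: Dict[str, List[str]]) -> str:
    """Serialize upstream outputs into one prompt for the narrative model."""
    payload = json.dumps(components, indent=2, sort_keys=True)
    return (
        "Describe the events in chronological order, noting where each "
        "occurred, using only the observations below.\n" + payload
    )


# Hypothetical organizational template; real field names would come from
# the receiving organization.
REPORT_TEMPLATE = Template(
    "INCIDENT REPORT\n"
    "Date: $date\n"
    "Location: $location\n"
    "Narrative:\n$narrative\n"
)


def format_report(narrative: str, fields: Dict[str, str]) -> str:
    """Fill the template; substitute() raises KeyError if a field is missing."""
    return REPORT_TEMPLATE.substitute(narrative=narrative, **fields)
```

Using `substitute` rather than `safe_substitute` is a deliberate choice in this sketch: a missing required field fails loudly instead of producing a report with an unfilled placeholder.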
The system provides an interface for users to review and edit the generated report before submission. Once finalized, the report is submitted to the organization's database. The system evaluates AI components for effectiveness and alignment with organizational requirements, including behavioral analysis to identify teachable moments for future training.