Project "German AI (Artificial Intelligence) startup offers next generation video understanding AI to analyze videos on a new level and is searching for industry partners"

Project objectives:

Summary

A German artificial intelligence startup has developed an innovative AI for video understanding. The AI combines state-of-the-art computer vision and natural language processing to analyze video on several levels for a deep understanding of the content. The vision is to create a European standard general-purpose technology for video understanding. The startup is looking for industry partners to conclude commercial agreements with technical assistance, research cooperation agreements, or technical cooperation agreements.

Description

In various industries, the real obstacle to creating value with AI is the lack of a suitable data basis. For example, media companies produce large amounts of content in complex data formats (especially video) that is unstructured and insufficiently documented with metadata. As a result, they often have no overview of their content and no transparency within their archives, and much potential remains unused. Current best practice is to have human staff (up to 50 archivists) annotate videos, which is very slow, expensive, and not scalable. Because of the overwhelming amount of content produced, they can manually annotate only a small fraction of their videos.

This is why the AI startup, based in the south of Germany, developed a novel AI-based analysis platform for video assets. It uses state-of-the-art video captioning techniques to create meaningful, semantically deep annotations that describe in full sentences what is actually happening in the videos. The platform is based on a human-in-the-loop approach: the AI annotates the videos, and the human refines only those annotations where the AI was not perfect. Based on this human input, the AI constantly learns and improves, specializing to each customer's individual needs.
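The human-in-the-loop workflow described above can be illustrated with a minimal sketch. It is not the startup's implementation: the `Annotation` type, the confidence score, the 0.8 threshold, and both function names are assumptions made for illustration only. The listing does not say how the platform decides which annotations need human refinement; a confidence threshold is simply one common way to do it.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Annotation:
    video_id: str
    caption: str
    confidence: float  # hypothetical model confidence in [0, 1]
    reviewed: bool = False


def route_for_review(
    annotations: List[Annotation], threshold: float = 0.8
) -> Tuple[List[Annotation], List[Annotation]]:
    """Split AI drafts into auto-accepted ones and ones a human should refine."""
    accepted = [a for a in annotations if a.confidence >= threshold]
    needs_review = [a for a in annotations if a.confidence < threshold]
    return accepted, needs_review


def apply_human_corrections(
    needs_review: List[Annotation], correct: Callable[[Annotation], str]
) -> List[Tuple[str, str]]:
    """Apply the human's corrected captions and collect them as training examples."""
    training_examples = []
    for annotation in needs_review:
        annotation.caption = correct(annotation)
        annotation.reviewed = True
        training_examples.append((annotation.video_id, annotation.caption))
    return training_examples


# Usage with made-up drafts; the lambda stands in for a human editor.
drafts = [
    Annotation("clip-001", "A reporter interviews a cyclist.", 0.93),
    Annotation("clip-002", "People stand in a room.", 0.41),
]
accepted, needs_review = route_for_review(drafts)
examples = apply_human_corrections(
    needs_review, lambda a: "Delegates vote at a party conference."
)
```

In this sketch the corrected captions in `examples` are what a later fine-tuning step would consume, which is the point at which the model would adapt to the customer's own vocabulary and needs.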

Compared to other solutions, the unique selling point (USP) is that the AI does not merely extract simple objects, faces, etc., but understands the multi-modal context of the video and summarizes it as multiple scenes. By taking multiple views (such as image and speech) into account, the AI can describe the actual gist and, according to the startup, surpasses current state-of-the-art video captioning models. Furthermore, instead of trying to replace the human with an out-of-the-box AI, the platform integrates the human into an innovative, AI-supported workflow that makes the tasks more natural for humans and lets the AI continuously learn and adapt toward human-level intelligence.

The startup is interested in research and technical cooperation agreements, as well as commercial agreements with technical assistance, with industry partners who want to apply, further develop, and leverage this technology for their industry-specific use cases. In particular, it is important that the cooperation partner provides a relevant video dataset, ideally already annotated with relevant labels.

Advantages & innovations

The solution has three innovative USPs:

1. Increased efficiency through semi-automation: The human-in-the-loop approach combines the best of both worlds: high-quality human assessment with scalable AI automation.
2. Improved quality through context understanding: The AI leverages both visual and audio information to understand the video content better than other AI models.
3. Fast adaptivity through continual learning: The AI continuously adapts to individual needs, and the processes become more and more efficient over time.

This combination of USPs enables five distinct advantages:

1. Accelerated, scalable video tagging processes: The AI can bring the tagging process close to 1:1, resulting in a tagging process that is 10x faster. This frees up the human labelers' time so they can focus on more important work.
2. Transparency over the video archive: The AI can easily be scaled up and applied to the whole video archive, leveraging the other 90% of created content and bringing a new level of transparency that allows any piece of content to be searched effectively.
3. Cost reduction for content creation: With a well-annotated video archive, content creators can easily find the perfect scene or clip to re-use and enhance content creation. They can also create new products and services, such as automatic video descriptions for visually impaired people.
4. Sustainable and increasing AI capabilities: The AI improves on a daily basis by continuously watching and learning from its users. This is crucial in the fast-paced world of content creation and keeps raising the number of videos that can be handled.
5. Network effect between organizations: The AI can share learnings between different users, departments, and even whole organizations to create synergies from the combined knowledge of everyone, without sharing their actual content.
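The "1:1", "10x", and "other 90%" figures above fit together if one assumes that manual tagging takes roughly ten hours of work per hour of video and that only about 10% of the archive is tagged today. Neither assumption is stated explicitly in this listing; they are inferred from the quoted figures, and the 1,000-hour archive is a made-up example size. Under those assumptions, a back-of-the-envelope check looks like this:

```python
def tagging_hours(video_hours: int, work_hours_per_video_hour: int) -> int:
    """Total human working hours needed to tag the given amount of video."""
    return video_hours * work_hours_per_video_hour


archive_hours = 1000        # hypothetical archive size, in hours of video
manual_ratio = 10           # assumed: 10 work hours per video hour (implied by "10x")
assisted_ratio = 1          # "close to 1:1" with AI assistance
manually_covered_hours = archive_hours // 10  # assumed: 10% tagged today

# Effort spent today to tag 10% of the archive by hand.
manual_effort = tagging_hours(manually_covered_hours, manual_ratio)

# Effort needed to tag the whole archive with AI assistance.
assisted_effort = tagging_hours(archive_hours, assisted_ratio)

speedup = manual_ratio // assisted_ratio
```

With these numbers, the effort that currently covers a tenth of the archive would cover the whole archive, which is one way to read the claim about leveraging the remaining 90% of content.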

Stage of development

Available for demonstration

Contact / source: Detail (europa.eu)
