Tencent launches new image-to-video model Follow-Your-Pose-v2, which generates multi-person motion videos
2024-06-11 11:47 · Source: ChinaZ (Webmaster's Home)
ChinaZ.com, June 11: Tencent's Hunyuan team, together with Sun Yat-sen University and the Hong Kong University of Science and Technology, has launched a new image-to-video model named "Follow-Your-Pose-v2". The model extends video generation from a single person to multiple people: it can take a group photo and animate everyone in it at the same time.
Main highlights:
Multi-person motion generation: generates motion videos of multiple people with less inference time.
Strong generalization: produces high-quality video regardless of age, clothing, ethnicity, background clutter, or motion complexity.
Works with everyday photos and videos: ordinary photos (including casual snapshots) and videos can be used for both model training and generation, so there is no need to hunt for high-quality source material.
Correct handling of occlusion: when several people's bodies overlap in a single image, the model generates frames with the correct front-to-back relationships.
Technical implementation:
The model uses an "optical flow guider" to bring in background optical-flow information, so it can generate stable background animation even when the camera shakes or the background is unstable.
Through an "inference map guider" and a "depth map guider", the model better understands the spatial information of the people in the image and the spatial relationships among them, which addresses both multi-character animation and body occlusion.
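To see why per-pixel depth information helps with occlusion, consider a toy compositing step: where two people overlap, the one closer to the camera should be visible. The sketch below is only an illustration of that idea and is not the model's actual method; the function name `composite_by_depth`, the label values, and the tiny 1x4 "image" are all hypothetical.

```python
def composite_by_depth(layers, height, width, background=0):
    """Toy occlusion resolver (illustrative assumption, not the paper's method).

    layers: list of (label, mask, depth) tuples, where mask and depth are
    height x width nested lists. A smaller depth value means closer to the camera.
    Returns a height x width grid holding the label visible at each pixel.
    """
    visible = [[background] * width for _ in range(height)]
    nearest = [[float("inf")] * width for _ in range(height)]
    for label, mask, depth in layers:
        for y in range(height):
            for x in range(width):
                # A character claims a pixel only if it covers it and is
                # closer than whatever has been drawn there so far.
                if mask[y][x] and depth[y][x] < nearest[y][x]:
                    nearest[y][x] = depth[y][x]
                    visible[y][x] = label
    return visible


# Hypothetical example: person 1 stands behind person 2, and they overlap
# in the two middle pixels of a 1x4 image.
person_1 = (1, [[1, 1, 1, 0]], [[2.0, 2.0, 2.0, 2.0]])
person_2 = (2, [[0, 1, 1, 1]], [[1.0, 1.0, 1.0, 1.0]])
result = composite_by_depth([person_1, person_2], height=1, width=4)
print(result)  # [[1, 2, 2, 2]]
```

The order in which the layers are passed does not matter here, because the depth comparison, not the drawing order, decides which person is visible.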
Evaluation and comparison:
The team also proposed a new benchmark, Multi-Character, which contains about 4,000 frames of multi-character video for evaluating multi-character generation.
Experimental results show that "Follow-Your-Pose-v2" outperforms the latest state-of-the-art methods by more than 35% on two public datasets (TikTok and TED talks) across seven metrics.
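The article does not say how the "more than 35%" figure is computed. A common convention, assumed here, is relative change against the baseline, with the sign flipped depending on whether a lower or a higher score is better. The helper name and the numbers in this sketch are hypothetical and are not results from the paper.

```python
def relative_improvement(baseline, new, lower_is_better=True):
    """Relative improvement of `new` over `baseline` as a fraction.

    Assumed convention (not confirmed by the article): the gain is measured
    relative to the baseline score.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    gain = (baseline - new) if lower_is_better else (new - baseline)
    return gain / abs(baseline)


# Hypothetical scores for an error-style metric where lower is better.
improvement = relative_improvement(baseline=100.0, new=65.0)
print(f"{improvement:.0%}")  # 35%
```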
Application prospects:
Image-to-video generation has broad application prospects in film and television content production, augmented reality, game production, advertising, and other industries, and is one of the most closely watched AI technologies of 2024.
Other information:
Tencent's Hunyuan team also announced an acceleration library for its open-source text-to-image model, Hunyuan-DiT, which greatly improves inference efficiency and cuts image generation time by 75%.
The barrier to using Hunyuan-DiT has also been lowered: users can call the model with three lines of code from the official Hugging Face model library.
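A 75% cut in generation time can also be read as a speedup factor. Assuming the figure means the new time is 25% of the old time, the conversion is speedup = 1 / (1 - reduction). The function name below is hypothetical, and the sketch only shows the arithmetic.

```python
def speedup_from_time_reduction(reduction):
    """Convert a fractional time reduction (0 <= reduction < 1) into a speedup factor."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in the range [0, 1)")
    return 1.0 / (1.0 - reduction)


# A 75% shorter generation time corresponds to a 4x speedup.
speedup = speedup_from_time_reduction(0.75)
print(speedup)  # 4.0
```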