
Tencent launches image-to-video model Follow Your Pose v2 for generating multi-person motion videos

2024-06-11 11:47 · Source: ChinaZ (Webmaster's Home)

ChinaZ.com (Webmaster's Home), June 11: Tencent's Hunyuan team, together with Sun Yat-sen University and the Hong Kong University of Science and Technology, has launched a new image-to-video model named "Follow Your Pose-v2". The model extends video generation from a single person to multiple people: given a group photo, it can animate everyone in it at the same time.

Main highlights:

  • Multi-person motion generation: generates videos of several people in motion with relatively short inference time.

  • Strong generalization: produces high-quality video regardless of the subjects' age, clothing, or ethnicity, and despite cluttered backgrounds or complex motion.

  • Works with everyday photos and videos: ordinary photos (including casual snapshots) or videos can be used for both training and generation, with no need to source high-quality images or footage.

  • Correct handling of occlusion: when several people in a single image block one another, the model can generate the occluded regions with the correct front-to-back ordering.


Technical approach:

The model uses an "optical flow guider" to incorporate background optical flow information, so it can generate stable background animation even when the camera shakes or the background is unstable.
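
To make "background optical flow" concrete: a dense flow field stores, for every pixel, a displacement (dx, dy) describing how content moved between two frames, and a camera pan shows up as roughly uniform flow across the background. The sketch below is a generic illustration, not the paper's guider, which is a learned module. It assumes tiny grayscale frames stored as nested lists, integer displacements, and nearest-pixel backward warping; the helpers `uniform_flow` and `warp_backward` are hypothetical names.

```python
from typing import List, Tuple

Frame = List[List[float]]
Flow = List[List[Tuple[int, int]]]  # per-pixel (dx, dy); integers keep the sketch simple


def uniform_flow(height: int, width: int, dx: int, dy: int) -> Flow:
    """Flow field for a pure camera pan: every pixel moves by the same (dx, dy)."""
    return [[(dx, dy) for _ in range(width)] for _ in range(height)]


def warp_backward(prev: Frame, flow: Flow, fill: float = 0.0) -> Frame:
    """Predict the next frame from `prev` using backward warping.

    The flow is indexed at the target pixel: pixel (y, x) of the output is
    fetched from (y - dy, x - dx) in `prev`. Pixels whose source falls
    outside the previous frame are set to `fill`.
    """
    h, w = len(prev), len(prev[0])
    out: Frame = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = prev[sy][sx]
    return out


prev = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]
# Background content shifts one pixel to the right between frames.
nxt = warp_backward(prev, uniform_flow(2, 3, dx=1, dy=0))
print(nxt)  # [[0.0, 1.0, 2.0], [0.0, 4.0, 5.0]]
```

A flow field like this tells a generator how the background should move from frame to frame, which is why conditioning on it can help keep backgrounds stable under camera shake.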

Through an "Inference Graph Guider" and a "Depth Map Guider", the model better understands the spatial information of the characters in the image and the positional relationships among multiple characters, which effectively addresses multi-character animation and body occlusion.
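
Why depth helps with occlusion can be illustrated with a classic z-buffer: at each pixel, whichever character is closest to the camera is the one that stays visible. This is a minimal sketch under stated assumptions, not the paper's method (its depth guider is a learned conditioning module). It assumes each character is given as a hypothetical per-pixel depth map, with `None` where the character is absent; `visible_owner` is an illustrative name.

```python
from typing import List, Optional

# One layer per character: per-pixel depth (distance from camera), or None
# where that character does not cover the pixel. Values are hypothetical.
Layer = List[List[Optional[float]]]


def visible_owner(layers: List[Layer]) -> List[List[int]]:
    """Return, for each pixel, the index of the nearest character (-1 = background)."""
    h, w = len(layers[0]), len(layers[0][0])
    owner = [[-1] * w for _ in range(h)]
    zbuf = [[float("inf")] * w for _ in range(h)]
    for idx, layer in enumerate(layers):
        for y in range(h):
            for x in range(w):
                d = layer[y][x]
                # Keep this character only if it is closer than anything seen so far.
                if d is not None and d < zbuf[y][x]:
                    zbuf[y][x] = d
                    owner[y][x] = idx
    return owner


N = None
person_a = [[2.0, 2.0, N]]  # 1x3 strip: A covers the left two pixels at depth 2
person_b = [[N, 1.0, 1.0]]  # B covers the right two pixels and is closer (depth 1)
print(visible_owner([person_a, person_b]))  # [[0, 1, 1]]
```

Because the decision depends only on depth, the result does not depend on the order in which characters are listed: the middle pixel always goes to the closer person B.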

Evaluation and comparison:

The team also proposed a new benchmark, Multi-Character, containing about 4,000 frames of multi-character video, to evaluate multi-character generation.

Experimental results show that Follow Your Pose-v2 outperforms the latest state-of-the-art methods by more than 35% across seven metrics on two public datasets (TikTok and TED talks).
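
A figure such as "more than 35%" is a relative improvement, and its direction depends on the metric: for lower-is-better metrics (for example an error or distance score such as FVD) it is (baseline - ours) / baseline, and for higher-is-better metrics it is (ours - baseline) / baseline. The article does not say which metrics were used or how the 35% was aggregated, so the function below is a generic sketch and the numbers are hypothetical, not results from the paper.

```python
def relative_improvement(baseline: float, ours: float, lower_is_better: bool) -> float:
    """Fractional improvement of `ours` over `baseline` (0.35 means 35% better)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    if lower_is_better:
        # e.g. an error or distance metric: a smaller value is an improvement
        return (baseline - ours) / baseline
    # e.g. a similarity or quality score: a larger value is an improvement
    return (ours - baseline) / baseline


# Hypothetical values for illustration only.
print(relative_improvement(200.0, 130.0, lower_is_better=True))  # 0.35
print(relative_improvement(0.5, 0.75, lower_is_better=False))    # 0.5
```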

Application prospects:

Image-to-video generation has broad application prospects in film and content production, augmented reality, game development, advertising, and other industries, and is expected to be one of the most closely watched AI technologies of 2024.

Other information:

Tencent's Hunyuan team also announced an open-source acceleration library for its large text-to-image model Hunyuan-DiT, which substantially improves inference efficiency and cuts image generation time by 75%.
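
A 75% cut in generation time is the same as a 4x speedup, because the new time is 25% of the old one. The helper below is a hypothetical name used only to show that arithmetic; the 20-second baseline is an assumed figure, not a published benchmark.

```python
def speedup_from_time_reduction(reduction: float) -> float:
    """Convert a fractional cut in run time into a speedup factor.

    new_time = old_time * (1 - reduction), so speedup = old_time / new_time.
    """
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


print(speedup_from_time_reduction(0.75))  # 4.0

old_seconds = 20.0  # hypothetical per-image generation time
print(old_seconds * (1 - 0.75))  # 5.0
```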

The barrier to using Hunyuan-DiT has also been lowered: users can call the model with three lines of code from Hugging Face's official model library.

Paper: https://arxiv.org/pdf/2406.03035

Project page: https://top.aibase.com/tool/follow-your-pose
