Tencent launches Follow-Your-Pose-v2, a new image-to-video model that generates multi-person motion videos
2024-06-11 11:47 · Source: ChinaZ (Webmaster's Home)
ChinaZ.com, June 11: Tencent's Hunyuan team, together with Sun Yat-sen University and the Hong Kong University of Science and Technology, has launched a new image-to-video model named "Follow-Your-Pose-v2". The model takes video generation from a single person to multiple people: given a group photo, it can make everyone in the picture move in the video at the same time.
Main highlights:
Multi-person motion generation: generates motion video for multiple people with less inference time.
Strong generalization: produces high-quality video regardless of age, clothing, race, background clutter, or motion complexity.
Works with everyday photos and videos: ordinary photos (including snapshots) and videos can be used for training and generation, with no need to find high-quality images or footage.
Correct handling of occlusion: when several people block one another in a single picture, the model generates occluded frames with the correct front-to-back ordering.
Technical implementation:
The model uses an "optical flow guider" to introduce background optical flow information, so it can generate stable background animation even when the camera shakes or the background is unstable.
Through an "inference map guider" and a "depth map guider", the model better understands the spatial information of the characters in the picture and the spatial relationships among multiple characters, which effectively addresses multi-character animation and body occlusion.
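To see why per-pixel depth helps with occlusion, here is a minimal sketch. It is not the model's actual mechanism, which the article does not describe. It only shows the general idea that, where characters overlap, the one with the smaller depth should be the one that is drawn. The function name `visible_character` and the data layout are hypothetical.

```python
from typing import List, Optional

# One depth grid per character: depth[y][x] is the distance to the camera
# (smaller = closer), or None where the character does not cover the pixel.
Grid = List[List[Optional[float]]]


def visible_character(depth_maps: List[Grid]) -> List[List[Optional[int]]]:
    """Return, for every pixel, the index of the character nearest the camera.

    Pixels covered by no character are left as None (background).
    """
    height = len(depth_maps[0])
    width = len(depth_maps[0][0])
    owner: List[List[Optional[int]]] = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            best_depth = None
            for idx, depth in enumerate(depth_maps):
                d = depth[y][x]
                # Keep the character with the smallest depth seen so far.
                if d is not None and (best_depth is None or d < best_depth):
                    best_depth = d
                    owner[y][x] = idx
    return owner


# Two characters overlap in the middle pixel of a 1x3 strip (illustrative values).
front = [[1.0, 1.0, None]]
back = [[None, 2.0, 2.0]]
owner = visible_character([front, back])
print(owner)  # [[0, 0, 1]]
```

In the overlapping pixel the closer character (index 0) wins, which is the "correct front-to-back ordering" that depth information makes possible.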
Evaluation and comparison:
The team proposed a new benchmark, Multi-Character, which contains about 4,000 frames of multi-character video, to evaluate multi-character generation.
The experimental results show that "Follow-Your-Pose-v2" outperforms the state of the art by more than 35% on two public datasets (TikTok and TED talks) across seven metrics.
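The article does not say how the 35% figure was computed. One common reading is relative improvement over the best prior method, with the sign flipped for metrics where lower is better. The sketch below shows that arithmetic only. The function name and the numbers are made up for illustration and are not results from the paper.

```python
def relative_improvement(baseline: float, new: float, higher_is_better: bool) -> float:
    """Relative gain of `new` over `baseline`, positive when `new` is better."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    change = (new - baseline) / abs(baseline)
    # For error-style metrics a decrease is an improvement, so flip the sign.
    return change if higher_is_better else -change


# Hypothetical values, NOT taken from the paper.
gain_quality = relative_improvement(0.5, 0.75, higher_is_better=True)
gain_error = relative_improvement(40.0, 24.0, higher_is_better=False)
print(gain_quality, gain_error)  # 0.5 0.4
```

Under this reading, "more than 35% on seven metrics" would mean every such value exceeds 0.35.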
Application prospect:
Image-to-video generation has broad application prospects in film and television content production, augmented reality, game production, advertising, and other industries, and is one of the AI technologies attracting the most attention in 2024.
Other information:
Tencent's Hunyuan team also announced an acceleration library for its open-source text-to-image model, Hunyuan-DiT, which greatly improves inference efficiency and cuts image generation time by 75%.
The barrier to using Hunyuan-DiT has also been lowered: users can call the model with three lines of code through Hugging Face's official model library.
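As a rough idea of what such a call could look like, here is a minimal sketch. It assumes that the library meant is Hugging Face Diffusers, that its `HunyuanDiTPipeline` class is the entry point, and that the checkpoint is published as `Tencent-Hunyuan/HunyuanDiT-Diffusers`. The article confirms none of these details, and the wrapper function `generate_image` is hypothetical.

```python
# Assumed checkpoint name on the Hugging Face Hub (not stated in the article).
MODEL_ID = "Tencent-Hunyuan/HunyuanDiT-Diffusers"


def generate_image(prompt: str, device: str = "cuda"):
    """Load the pipeline and return one generated PIL image for `prompt`."""
    # Imported here so the module loads even without the heavy dependencies.
    import torch
    from diffusers import HunyuanDiTPipeline

    # The core "three lines": load the pipeline, move it to the GPU, run it.
    pipe = HunyuanDiTPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
    pipe.to(device)
    return pipe(prompt=prompt).images[0]
```

Calling `generate_image("an astronaut riding a horse").save("astronaut.png")` would download the model weights on first use and is expected to need a GPU with enough memory for the model.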