Tencent's Hunyuan team has launched ArtifactsBench, billed as the world's first comprehensive benchmark dedicated to evaluating the quality of AI-generated visual and interactive code. The benchmark comprises 1,825 test tasks drawn from real application scenarios. Its evaluation pipeline actually executes the generated code, captures dynamic screenshots of the running result, and has a multimodal AI judge score each artifact across ten dimensions, including functionality, aesthetics, and user experience. According to the reported experiments, the benchmark's verdicts agree with human expert judgments more than 90% of the time and reach 94.4% consistency with WebDev Arena, the industry's gold standard, setting a new bar for evaluating AI code generation capability.
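To make the execute-screenshot-judge loop concrete, here is a minimal sketch of how such a pipeline might look. It is an illustration under stated assumptions, not ArtifactsBench's actual implementation: the use of Playwright for rendering, the helper `score_with_multimodal_judge`, the rubric names, and the file paths are all hypothetical.

```python
# Sketch of an execute -> dynamic-screenshot -> multimodal-judge loop.
# Assumption: Playwright renders the generated artifact in a headless browser;
# score_with_multimodal_judge is a hypothetical wrapper around an MLLM API.

from pathlib import Path
from playwright.sync_api import sync_playwright

# Three of the ten scoring dimensions mentioned in the announcement.
RUBRIC_DIMENSIONS = ["functionality", "aesthetics", "user experience"]

def capture_dynamic_screenshots(html_file: Path, out_dir: Path, shots: int = 3) -> list[Path]:
    """Open the generated artifact in a headless browser and take several
    time-separated screenshots so animations and interactive state are visible."""
    out_dir.mkdir(parents=True, exist_ok=True)
    frames: list[Path] = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(html_file.resolve().as_uri())
        for i in range(shots):
            page.wait_for_timeout(1000)  # let dynamic content evolve between frames
            frame = out_dir / f"frame_{i}.png"
            page.screenshot(path=str(frame), full_page=True)
            frames.append(frame)
        browser.close()
    return frames

def score_with_multimodal_judge(task: str, screenshots: list[Path]) -> dict[str, float]:
    """Hypothetical judge call: send the task description plus screenshots to a
    multimodal LLM and parse per-dimension scores. Plug in a real API here."""
    raise NotImplementedError("connect a multimodal LLM judge")

if __name__ == "__main__":
    artifact = Path("generated_artifact.html")  # code produced by the model under test
    frames = capture_dynamic_screenshots(artifact, Path("screenshots"))
    scores = score_with_multimodal_judge("Build an interactive to-do list app", frames)
    print(scores)
```

In this sketch, taking several time-separated screenshots rather than a single snapshot is what lets the judge assess dynamic behavior such as animations and state changes, which is the gap the benchmark is designed to cover.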