Microsoft shakes up productivity: customizable Copilots arrive, AI PCs natively support PyTorch, and Altman previews the next model

04:41, May 22, 2024 Heart of Machine Pro

1.8 million people are using GitHub Copilot, and it has already changed the world.

What will AI productivity look like in the future? The world is waiting for Microsoft's answer.

In the early morning of May 22, the Microsoft Build 2024 developer conference opened in Seattle, where the company unveiled its latest AI technologies and the new tools built on them.

"For more than 30 years, Microsoft has always had two dreams about computers - first, to let computers understand us, not to let us understand computers; second, in a world of increasing information, to let computers help us effectively reason, plan and act according to information. The wave of artificial intelligence has found the answer to our dream." Said Satya Nadella, CEO of Microsoft.

Today's Build keynote centered on the latest Copilot, the new class of hardware built for generative AI, and the tool stack that exposes these new AI capabilities.

Beyond the sweeping application integrations and the partnerships with AI startups and hardware makers, Microsoft also introduced a new AI model of its own.

Copilot+ PC, with dedicated on-device models

Native PyTorch support

First up is the new form of PC: the Copilot+ PC. Microsoft said the first models, going on sale June 18, will be powered by Qualcomm Snapdragon X series processors, with more such devices based on Intel and AMD processors arriving later this year.

With an NPU delivering over 40 TOPS of AI compute, these machines run AI workloads 20 times faster and 100 times more efficiently than previously released PCs. On that hardware foundation, an AI PC is far more than a window onto GPT-4o in the cloud: Windows now delivers AI experiences at three levels.

The Copilot stack now extends into Windows itself through the Windows Copilot Runtime. AI reshapes the system from the inside out and lets developers accelerate AI development on Windows.

Nadella said the Windows Copilot Runtime includes a set of APIs backed by more than 40 on-device AI models that ship with Windows, among them a small language model (SLM) called Phi Silica designed specifically for the NPUs in Copilot+ PCs. These models power tasks such as intelligent search, real-time translation, and image generation and processing.

Microsoft said that Phi Silica runs inference entirely on the NPU, delivering first-token output at 650 tokens per second while drawing only about 1.5 watts of power, leaving the CPU and GPU free for other computing tasks. During sustained generation, text generation reuses the KV cache held on the NPU and runs on the CPU, producing about 27 tokens per second.

Microsoft also introduced the Windows Semantic Index, a new operating-system capability that redefines search on Windows and underpins new experiences such as Recall. Microsoft will later expose it to developers through a Vector Embedding API, so they can build their own vector stores and RAG on top of data inside their applications.
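To make the vector-store-plus-RAG pattern described above concrete, here is a minimal, illustrative sketch in Python. The `embed` function is a hypothetical placeholder for an on-device embedding call, not the actual Windows Vector Embedding API (whose surface is not described in this article), and the document strings are made up.

```python
# Illustrative sketch of the vector-store + RAG pattern described above.
# `embed` is a hypothetical placeholder for an on-device embedding call --
# it is NOT the actual Windows Vector Embedding API.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder embedding: a deterministic pseudo-random vector per text.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

# Build a tiny local "vector store" over app data.
documents = ["Q3 sales summary", "Trip photos from Kyoto", "NPU driver release notes"]
index = np.stack([embed(d) for d in documents])

def retrieve(query: str, k: int = 2) -> list[str]:
    # Cosine similarity between the query vector and every stored vector.
    q = embed(query)
    scores = (index @ q) / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
    return [documents[i] for i in np.argsort(-scores)[:k]]

# The retrieved snippets would then be handed to a local model (e.g. Phi Silica)
# as grounding context -- the "retrieval-augmented generation" half of the pattern.
print(retrieve("where are my NPU notes?"))
```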

The new Copilot+ PCs also come with native AI frameworks and toolchains that make it easier for developers to bring their own on-device models to Windows. Microsoft announced that, via DirectML, PyTorch and the Web Neural Network (WebNN) API will now run natively on Windows. This gives developers more tools to work with and allows thousands of Hugging Face models to run on Windows, with the NPU helping these tools complete tasks faster than ever before.

Just as DirectX is to graphics, DirectML is a low-level, high-performance API for machine learning on Windows. It abstracts the different hardware that Microsoft's IHV partners bring to the Windows ecosystem, supporting GPUs and NPUs, with CPU support coming soon, and it integrates with the major AI frameworks such as ONNX Runtime, PyTorch, and WebNN.
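As a rough feel for what "PyTorch over DirectML" looks like in practice, here is a minimal sketch using the existing torch-directml package; the package name and device API come from Microsoft's current torch-directml releases and may differ from what ships for Copilot+ PC NPUs.

```python
# Minimal sketch: running a PyTorch computation on a DirectML device.
# Assumes the torch-directml package (pip install torch-directml); the exact
# packaging for Copilot+ PC NPUs may differ from this.
import torch
import torch_directml

dml = torch_directml.device()      # default DirectML device (GPU/NPU)
x = torch.randn(1024, 1024).to(dml)  # move data onto the DirectML device
y = x @ x.T                          # the matmul executes through DirectML
print(y.device, y.shape)
```

An ONNX Runtime session can similarly target DirectML by requesting the DmlExecutionProvider when creating an InferenceSession (available in the onnxruntime-directml build).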

In addition, the Windows Subsystem for Linux (WSL) runs Windows and Linux workloads side by side, providing a platform for AI development on Windows. Developers can share files, GUI applications, GPUs, and more across environments without extra configuration.

Microsoft introduced the new Windows 11 AI PCs to the world yesterday; together with the capabilities announced at Build today, Windows is becoming a highly open AI and developer platform.

Taken together, these moves could win developers over from the Mac almost overnight.

Copilot keeps leveling up

Now taking on team collaboration

Next is a series of AI productivity improvements for individuals and teams.

On the AI software-development front, GitHub launched the first set of GitHub Copilot Extensions, built by Microsoft and third-party partners and now available in an invite-only preview. They let developers and enterprises customize their GitHub Copilot experience directly in Copilot Chat using their preferred services, such as Azure, Docker, and Sentry.

One of Microsoft's own extensions, GitHub Copilot for Azure, shows how natural language and broader capabilities can speed up development: with it, developers can explore and manage Azure resources, troubleshoot issues, and find relevant logs and code from within Copilot Chat.

At the conference, Microsoft also showed how Copilot can improve team collaboration and business efficiency across an organization. Nadella highlighted three upgrades:

  • Team Copilot: extends Copilot beyond a personal assistant to work on behalf of a team, improving collaboration and project management.

  • Agents: custom Copilots that let customers orchestrate and automate business processes.

  • Copilot extensions and connectors: make it easier to customize and extend Copilot for specific business needs.

Team Copilot

Team Copilot turns Copilot from a personal assistant into a valuable team member that participates and contributes alongside everyone else. You stay in control throughout, assigning tasks or responsibilities to Copilot so the whole team works more efficiently, collaboratively, and creatively.

Team Copilot can be used in Microsoft Teams, Microsoft Loop, Microsoft Planner and other collaborative applications.

Specifically, Team Copilot can play the following three roles.

Meeting facilitator: Copilot makes in-meeting discussion more productive by managing the agenda and taking notes that everyone can co-author.

Group collaborator: Copilot helps everyone get more out of chats by surfacing the most important information, tracking action items, and addressing unresolved issues.

Project manager: Copilot keeps every project on track by creating and assigning tasks, tracking deadlines, and notifying team members when their input is needed.

These features will be available in preview later in 2024 for customers with a Microsoft Copilot for Microsoft 365 license.

Agents

Agents are new custom Copilots that can automate business processes. Every business process has efficiency to gain and new value to unlock, and every process is different.

To address this, Microsoft announced new capabilities in Microsoft Copilot Studio for building custom Copilots that act independently as agents under the customer's direction. These agents can:

  • Automate long-running business processes

  • Reason over actions and user input

  • Use memory to bring in context

  • Learn from user feedback

  • Log exceptional requests and ask for help

Below is a demonstration of creating a custom Copilot (i.e., an agent).

These agent capabilities are available to customers in the Early Access Program.

In addition, Microsoft has further enriched Copilot's capabilities through Copilot extensions and Copilot connectors.

With the new Copilot extensions, anyone can customize Copilot's behavior and extend it to their own data and line-of-business systems. Developers can build these extensions with Copilot Studio or the Teams Toolkit for Visual Studio.

Microsoft also introduced Copilot connectors in Copilot Studio, so developers can create Copilot extensions more quickly and easily.

Together, these capabilities make it easier for developers to bring AI into their own products and services.

New additions to the Phi-3 small-model family

Multimodal Phi-3-vision makes its debut

OpenAI's latest flagship model, GPT-4o, is now available in Azure AI Studio and as an API. This groundbreaking multimodal model integrates text, image, and audio processing, setting a new standard for generative and conversational AI experiences.
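As an illustration of what using GPT-4o through Azure looks like, here is a hedged sketch using the openai Python SDK's AzureOpenAI client; the endpoint, key, deployment name, api_version string, and image URL are placeholders rather than values given in the article.

```python
# Sketch: calling a GPT-4o deployment through Azure OpenAI with the openai SDK.
# Endpoint, key, deployment name, api_version, and image URL are placeholders.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",            # assumed; use your resource's version
)

response = client.chat.completions.create(
    model="gpt-4o",                       # name of your GPT-4o deployment
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what this chart shows."},
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```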

Microsoft's Phi-3 family of small language models (SLMs) has also gained a new multimodal member, Phi-3-vision, now available on Azure.

Developers can try these state-of-the-art models in the Azure AI Playground and start building with and customizing them in Azure AI Studio.

OPPO, a global technology brand known for its innovative smartphones and smart devices, is trying out Azure AI Speech to Text, Fast Transcription, and Azure AI Text to Speech on its new smartphones to bring new experiences to customers.

The Phi-3 family now comprises four models, each developed and tuned to Microsoft's responsible AI and safety standards so it can be used out of the box:

  • Phi-3-vision: a 4.2B-parameter multimodal model with language and vision capabilities, supporting a 128K context length.

  • Phi-3-mini: a 3.8B-parameter language model, available in 128K and 4K context-length variants.

  • Phi-3-small: a 7B-parameter language model, available in 128K and 8K context-length variants.

  • Phi-3-medium: a 14B-parameter language model, available in 128K and 4K context-length variants.

Among them, Phi-3-vision is the first multimodal model in the Phi-3 family. It combines text and images, reasoning about real-world images as well as extracting and reasoning over the text within them. It is also optimized for chart understanding, which can be used to generate insights and answer questions.

Built on the language capabilities of Phi-3-mini, Phi-3-vision packs strong language and image reasoning into a small model. As the figure below shows, it can generate insights from charts and diagrams.

Phi-3-small and Phi-3-medium outperform language models of the same size as well as larger ones.

The 7B-parameter Phi-3-small beats GPT-3.5 Turbo across a range of language, reasoning, coding, and math benchmarks.

The 14B-parameter Phi-3-medium continues the trend, outperforming Gemini 1.0 Pro.

And the 4.2B-parameter Phi-3-vision outperforms larger models such as Claude 3 Haiku and Gemini 1.0 Pro V on general visual reasoning, OCR, and table and chart understanding tasks.

Of course, the Phi-3-vision model is open source.

Hugging Face Address: https://huggingface.co/microsoft/Phi-3-vision-128k-instruct
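Since the checkpoint is open, here is a rough sketch of loading it with Hugging Face transformers and asking it about an image. The <|image_1|> prompt convention and processor calls follow the model card's general pattern at the time of writing, and the image URL is a placeholder, so check the card above for the exact, current usage.

```python
# Sketch: running the open Phi-3-vision checkpoint with Hugging Face transformers.
# Prompt format follows the model card's general pattern; verify against the card.
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3-vision-128k-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Placeholder image URL -- substitute a real chart or photo.
image = Image.open(requests.get("https://example.com/chart.png", stream=True).raw)

prompt = processor.tokenizer.apply_chat_template(
    [{"role": "user", "content": "<|image_1|>\nSummarize this chart."}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = processor(prompt, [image], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
answer = processor.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(answer)
```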

The release has excited researchers, some of whom are already imagining applications in robotics.

From Phi-3 to Phi Silica, Microsoft's exploration of large models is clearly application-focused, carving out a niche distinct from OpenAI's.

Cheerleader-in-chief: OpenAI CEO Altman

After Microsoft announced its slate of updates, OpenAI CEO Sam Altman took the stage to show his support. He encouraged developers and startups to seize the current AI boom, which he called the most exciting moment since the rise of mobile devices, perhaps even since the emergence of the internet.

On models, Altman revealed that GPT-4o will become both faster and cheaper over time. He was also happy to report that the next big model is coming soon, and that Microsoft has built an even larger supercomputer for the work (one on the scale of a killer whale).

Altman suggested that new modalities and overall intelligence will be the keys to OpenAI's next model. "The most important point, which may sound like the most boring thing I could say... is that the models will just keep getting more intelligent, generally, across the board."

He also cautioned, however, that the new technology does not spare developers the hard work, which still has to be done: developers must figure out how to make these technologies genuinely useful to people, and it is best not to forget that bringing them to life is not easy.
