
    Intel Discloses Inference Performance of Xeon 6 Processors for Meta Llama 3 Models

    [Original by Zhongguancun Online] Author: eleven

    Meta recently launched Meta Llama 3, its large open-source model, in 8-billion-parameter and 70-billion-parameter versions. The model introduces new capabilities such as improved reasoning and more model sizes, and it uses a new tokenizer designed to improve language encoding efficiency and overall model performance.

    As soon as the model was released, Intel verified that Llama 3 can run on its hardware, including Intel Xeon processors, and disclosed the inference performance of the upcoming Intel Xeon 6 performance-core processor (code-named Granite Rapids) on the Meta Llama 3 model.

    Intel Xeon processors can meet the requirements of demanding end-to-end AI workloads. In the fifth-generation Xeon processor, for example, every core has a built-in AMX acceleration engine that delivers strong AI inference and training performance. This processor has already been adopted by many mainstream cloud service providers. Xeon processors also offer lower latency for general-purpose computing and can handle multiple workloads at the same time.

    Intel has been continuously optimizing large-model inference performance on the Xeon platform. For example, since the Llama 2 launch, software improvements in PyTorch and Intel Extension for PyTorch have reduced latency. These optimizations use the Paged Attention algorithm and tensor parallelism to make the fullest use of the available compute and memory bandwidth. The figure below shows the inference performance of the 8-billion-parameter Meta Llama 3 model on the AWS m7i.metal-48xl instance, which is based on the fourth-generation Intel Xeon Scalable processor.


    Figure 1: Next-Token Latency of Llama 3 on an AWS Instance

    Intel also disclosed, for the first time, performance test results for the upcoming Intel Xeon 6 performance-core processor (code-named Granite Rapids) running Meta Llama 3. The results show that, compared with the fourth-generation Xeon processor, the Intel Xeon 6 processor delivers a 2x improvement in inference latency for the 8-billion-parameter Llama 3 model. It can also run larger models, such as the 70-billion-parameter Llama 3, on a single two-socket server with a per-token latency of less than 100 milliseconds.


    Figure 2: Next-Token Latency of Llama 3 on the Intel Xeon 6 Performance-Core Processor (Code-Named Granite Rapids)
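
    To put the per-token figures in perspective, next-token latency converts directly into a single-stream generation rate. The sketch below is illustrative only: the function names are hypothetical, the 100 ms value is the upper bound quoted above rather than a measured result, and the 80 ms baseline is an invented number used purely to show what a 2x improvement means.

```python
def tokens_per_second(next_token_latency_ms: float) -> float:
    """Convert next-token latency (ms) into single-stream tokens per second."""
    if next_token_latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / next_token_latency_ms


def latency_after_speedup(baseline_latency_ms: float, speedup: float) -> float:
    """Latency after an N-times improvement (speedup=2.0 halves the latency)."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_latency_ms / speedup


if __name__ == "__main__":
    # A latency below 100 ms per token implies more than 10 tokens per second.
    print(tokens_per_second(100.0))
    # Hypothetical 80 ms baseline (not from the article), improved by 2x.
    print(latency_after_speedup(80.0, 2.0))
```

    Under these assumptions, staying below 100 ms per token corresponds to generating more than 10 tokens per second for a single request.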

    Because Llama 3 has a more efficient tokenizer, the test used a randomly selected prompt for a quick comparison of Llama 3 and Llama 2. For the same prompt, Llama 3 produced 18% fewer tokens than Llama 2. As a result, even though the 8-billion-parameter Llama 3 model has more parameters than the 7-billion-parameter Llama 2 model, the overall inference latency for the prompt is almost the same when running BF16 inference on the AWS m7i.metal-48xl instance (in this evaluation, Llama 3 was 1.04 times faster than Llama 2).
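
    The trade-off described above can be checked with back-of-the-envelope arithmetic. The sketch assumes, as a simplification not stated in the article, that overall latency equals the token count multiplied by an average per-token latency; the function name is hypothetical. Under that assumption, the two published figures (18% fewer tokens, 1.04 times faster overall) imply that Llama 3 8B spends roughly 17% more time per token than Llama 2 7B, which the smaller token count more than offsets.

```python
def implied_per_token_ratio(token_reduction: float, overall_speedup: float) -> float:
    """Return (Llama 3 per-token latency) / (Llama 2 per-token latency).

    Assumes total_latency = num_tokens * per_token_latency for both models.
    """
    token_ratio = 1.0 - token_reduction   # Llama 3 tokens / Llama 2 tokens
    total_ratio = 1.0 / overall_speedup   # Llama 3 total time / Llama 2 total time
    return total_ratio / token_ratio


if __name__ == "__main__":
    ratio = implied_per_token_ratio(token_reduction=0.18, overall_speedup=1.04)
    print(f"{ratio:.3f}")  # about 1.173
```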

    This is an original article. If you reproduce it, please credit the source: Intel Discloses Inference Performance of Xeon 6 Processors for Meta Llama 3 Models, https://server.zol.com.cn/867/8672608.html
