AI & Big Data

Yingxin

Blogged

05/29 15:13

Everyone is piling on longer context windows. Do you still need RAG?

Since Google released Gemini 1.5 Pro, many people in the industry have been declaring RAG dead. On the one hand, Gemini's performance is genuinely outstanding. According to the official technical report, Gemini 1.5 Pro can stably process up to 1 million tokens, equivalent to one hour of video, 11 hours of audio, more than 30,000 lines of code or 700,000 words, and its processing limit is 10 million tokens, equivalent to the trilogy of The Lord of the Rings, setting a record for the longest context window. With its ultra-long context understanding, Gemini 1.5 Pro has won recognition from many users. Many have tested Gemini 1.5 Pro …


CloudWeGo

Blogged

04/26 17:51

Cloud Native ✖️ Best practices for microservice architecture in the AI era: CloudWeGo Technology Salon · Shanghai Station is now open

Event introduction: In the more than two years since CloudWeGo was open-sourced, the community has grown rapidly and the ecosystem has become increasingly rich. More than 40 enterprise users have adopted it in production, covering AI, e-commerce, finance, games, the Internet and other industries. At the same time, as cloud native and AI technologies continue to develop vigorously, we find that enterprise users face more and more challenges in performance, cost and stability. Systems need to support elastic scaling and remain stable under tidal traffic, so they also need a high-performance, easily extensible, feature-rich microservice architecture. We sincerely invite enterprise users and developers to participate in CloudWeGo …


OneFlow Deep Learning Framework

Blogged

05/15 08:17

Maximizing the Effective Throughput of LLM Serving

Today's LLM applications have diverse latency requirements. For example, a chatbot may need a fast initial response (for example, under 0.2 seconds) but only needs a decoding speed that matches human reading speed, while code completion needs a fast end-to-end generation time to deliver real-time code suggestions. This article shows that existing serving systems optimized for throughput are not the optimal choice under latency requirements. The author advocates using goodput, that is, the number of requests completed per second that meet the service level objective (SLO), as the improved metric for measuring LLM serving performance, in order to consider the cost …
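As a rough illustration of the metric described in this excerpt, the sketch below counts only the requests that meet their latency targets and divides by the measurement window. The `Request` fields, the function names and the per-token target are assumptions made for this example; only the 0.2-second initial-response figure comes from the excerpt.

```python
from dataclasses import dataclass


@dataclass
class Request:
    ttft_s: float  # time to first token, in seconds (hypothetical field)
    tpot_s: float  # average time per output token, in seconds (hypothetical field)


def throughput(requests, window_s):
    """Requests completed per second, regardless of latency."""
    return len(requests) / window_s


def goodput(requests, window_s, ttft_slo_s, tpot_slo_s):
    """Requests completed per second that meet both latency targets."""
    within_slo = sum(
        1 for r in requests
        if r.ttft_s <= ttft_slo_s and r.tpot_s <= tpot_slo_s
    )
    return within_slo / window_s


# Four requests completed in a 2-second window; two of them miss a target.
completed = [
    Request(ttft_s=0.10, tpot_s=0.05),
    Request(ttft_s=0.30, tpot_s=0.05),  # first token arrives too late
    Request(ttft_s=0.15, tpot_s=0.20),  # decoding is too slow
    Request(ttft_s=0.19, tpot_s=0.09),
]
raw = throughput(completed, window_s=2.0)
good = goodput(completed, window_s=2.0, ttft_slo_s=0.2, tpot_slo_s=0.1)
```

Here the system finishes 2 requests per second, but only 1 request per second counts toward the SLO-aware figure, which is the gap the excerpt is pointing at.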


RWKV Yuanshi Intelligent

Blogged

05/24 18:38

Ai00 Server, a local deployment tool for RWKV models: a step-by-step tutorial

Learn about Ai00. Introduction: Ai00 Server is an inference API server for RWKV language models, built on the web-rwkv inference engine. It is open source software under the MIT license, developed by the Ai00-x development team, which is made up of RWKV open source community members [@cryscan](https://github.com/cryscan) and [@Gu Zhenniu](https://github.com/cgisky1980). Ai00 Server uses Vulkan as its inference backend, supports Vulkan parallel and concurrent batch inference, and can run on all GPUs that support Vulkan. In fact, Ai00 Server supports most …


Kangaroo cloud number stack

Blogged

05/30 10:18

A deep dive into the principle and practice of token bucket rate limiting

In today's Internet era, as the number of users and requests keeps growing, system performance and stability face enormous challenges. [Rate limiting algorithms](https://www.dtstack.com/dtinsight?src=szsm) are one of the important means of ensuring system stability and are widely used across services and applications. The core purpose of rate limiting is to cap the number of requests within a given time window, keep the system available and stable, and prevent slowdowns or outages caused by traffic spikes. Comparison of common rate limiting algorithms: there are four common rate limiting algorithms: ● Token Bucket …
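The excerpt names the token bucket but is cut off before describing it, so the following is only a minimal sketch of the general technique, not the article's implementation. The class name, its parameters and the injectable `clock` are assumptions for illustration: tokens refill at a fixed rate up to a capacity, and a request is admitted only if a token is available.

```python
import time


class TokenBucket:
    """Minimal single-threaded token bucket (illustrative sketch)."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = float(capacity)        # maximum burst size
        self.refill_rate = float(refill_rate)  # tokens added per second
        self._clock = clock                    # injectable for testing
        self._tokens = float(capacity)         # start with a full bucket
        self._last = clock()

    def allow(self, cost=1.0):
        """Return True and consume tokens if the request may proceed."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the elapsed time, never exceeding capacity.
        self._tokens = min(self.capacity,
                           self._tokens + elapsed * self.refill_rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


# Usage with a fake clock so the behavior is deterministic.
fake_now = [0.0]
bucket = TokenBucket(capacity=2, refill_rate=1, clock=lambda: fake_now[0])
burst = [bucket.allow(), bucket.allow(), bucket.allow()]
fake_now[0] = 1.0  # one second later, one token has been refilled
after_refill = [bucket.allow(), bucket.allow()]
```

The capacity bounds how large a burst is admitted, while the refill rate bounds the sustained request rate. A production limiter would also need locking for concurrent callers.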


HuggingFace

product manager

Blogged

05/15 18:30

Introducing Idefics2: a powerful 8B vision-language model for the community

We are pleased to release Idefics2, a general multimodal model that accepts arbitrary sequences of text and images as input and generates text in response. It can answer questions about images, describe visual content, create stories based on multiple images, extract information from documents, and perform basic arithmetic operations. Idefics2: https://hf.co/HuggingFaceM4/idefics2-8b. Idefics2 improves on Idefics1. It has 8B parameters, an open license (Apache 2.0) and greatly enhanced OCR (Optical Character Recognition) capabilities, so it is expected to become a solid multimodal community …


HuggingFace

product manager

Blogged

05/17 10:10

PaliGemma officially released - Google's latest cutting-edge open vision-language model

PaliGemma is a new family of vision-language models from Google that takes image and text input and generates text output. The Google team has released three types of models: pretrained (PT) models, mix models and fine-tuned (FT) models. These models come in different resolutions and are available in multiple precisions. All models are published on the Hugging Face Hub with model cards and licenses, and support transformers integration. What is PaliGemma? PaliGemma (Github) is a series of models with vision and language processing capabilities …


HuggingFace

product manager

Blogged

05/24 19:30

Use Intel Gaudi 2 and Xeon CPU to build cost-effective enterprise RAG applications

Retrieval Augmented Generation (RAG) can incorporate fresh domain knowledge stored in external databases into a large language model to enhance its text generation capability. It provides a way to separate company data from the knowledge the language model learned during training, which helps us strike an effective balance between performance, accuracy, security and privacy. In this article, you will learn how Intel can help you develop and deploy RAG applications through OPEA, the open source Open Platform for Enterprise AI project. You will also learn about the Intel Gaudi 2 AI accelerator and Xeon CPU through real RAG use cases …


Huawei Cloud Developer Alliance

Blogged

05/28 11:48

Teach you how to implement MindSpore model training based on Huawei Cloud

This article is shared from "MindSpore Huawei Cloud Model Training" in the [Ascend Development Full Process] series of the Huawei Cloud Community, by the author "addicted to sk". Foreword: Learn how to install and configure Huawei Cloud ModelArts and the Atlas 200I DK A2 development board, and walk through the whole process from training on Ascend910 to inference on Ascend310. Training phase. A. Environment setup: MindSpore Huawei Cloud model training. Step 1: Create an OBS parallel file. Log in to Huawei Cloud -> Console -> select "Object Storage Service OBS" in the left navigation bar -> select "Bucket List" in the left navigation bar -> click "Create Bucket" in the upper right corner, as shown in the following figure: On the left …


Huawei Cloud Developer Alliance

Blogged

05/29 11:01

One article teaches you how to call Ascend C operators

This article is shared from "One Article Teaches You How to Call Ascend C Operators" in the Huawei Cloud Community, by Ascend CANN. Ascend C is a programming language launched by CANN for operator development scenarios. It natively supports the C and C++ standard specifications and balances development efficiency with runtime performance. Operator programs written in Ascend C run on the Ascend AI processor through compiler compilation and runtime scheduling. With Ascend C, developers can efficiently implement custom, innovative algorithms on Ascend AI hardware. This article focuses on how to call a custom operator for verification after it has been developed and deployed with the Ascend C operator programming language …


Baihai_IDP

Blogged

05/29 10:23

A new paradigm of human-computer cooperation? An AI Agents "parchment scroll" for everyone

Editor's Note: Today's popular large language models and retrieval-augmented generation models have made breakthrough progress in language understanding and content generation, but they still have many limitations. They lack the ability to direct their behavior toward goals, to learn continuously and to interact with the environment, and they struggle to cope with the demands of complex and changeable real-world scenarios.

In the article we bring you today, the author's view is that the field of AI is moving toward more intelligent and autonomous AI Agent systems, which will completely change the way we use AI.

The author believes that the future of AI will be more intelligent and autonomous …


Huawei Cloud Developer Alliance

Blogged

05/27 10:24

The principle of the attention mechanism explained, with a Python implementation of the deep learning model

This article is shared from "Implementing the Deep Learning Model with Python: Attention Mechanism" in the Huawei Cloud Community, by Echo_Wish. In the world of deep learning, the attention mechanism is a powerful technique that is widely used in natural language processing (NLP) and computer vision (CV). It helps the model focus on important information when handling complex tasks, thereby improving performance. In this article, we will introduce the principle of the attention mechanism in detail and use Python and TensorFlow/Keras to implement a simple attention mechanism model. 1. Attention …


Huawei Cloud Developer Alliance

Blogged

05/27 09:45

What is a token? Why do large models need to count tokens?

This article is shared from "[Technology Sharing] What Is a Token? Why Does GPT Price by Token" in the Huawei Cloud Community, by Tracy, the little assistant of Kaitian aPaaS. When using LLMs, we often encounter the keyword "token". For example, the latest GPT-4 Turbo model supports a context of up to 128K tokens; Claude-2.1, once GPT's strongest competitor, supported a context of up to 200K tokens; and when creating a role in the GPT Store, the core prompt supports up to 8,000 tokens. 1. What is a token? GPT does not count "characters" directly, but turns the characters into a …


Zilliz

programmer

Blogged

05/24 21:07

An in-depth analysis of ColBERT

In recent years, the field of vector search has experienced explosive growth, especially after the advent of large language models (LLMs). Academics began to focus on how to enhance embedding models by expanding training data and adopting advanced training methods and new architectures. In the previous article, we thoroughly discussed the various types of embedding vectors and models designed for efficient information retrieval, including dense, sparse and binary embedding vectors designed for specific use cases, along with their respective advantages and disadvantages. In addition, we also introduced various embedding models, such as those for dense …


Kangaroo cloud number stack

Blogged

05/22 14:15

A detailed look at EasyMR's adaptation practices for IT localization

IT localization refers to using domestic information technology products and services to build an independent and controllable information technology system. In recent years, as the country places increasing emphasis on network security and information security, [IT localization](https://www.dtstack.com/dtengine/easymr?src=szsm) has become an important part of the national strategy and has shown the following major trends: ● Policy driven, with accelerated development. The country has issued a series of policies and regulations that vigorously support the development of the domestic IT innovation industry. For example, the "14th Five-Year" Digital Economy Development Plan proposes that by 2025, core technologies in key information technology fields will be tackled …


Little white rabbit likes to eat big gray wolf

Blogged

05/23 15:28

Analysis and interpretation of the development of China's humanoid robot industry in 2024

Humanoid robots are one of the most promising industries in science and technology today. With the continuous progress of science and technology and the rapid development of artificial intelligence, humanoid robots, as a new track for future industries and a new engine of economic growth, will profoundly change how people produce and live and will reshape the global industrial landscape. Recently, GGII compiled and released the Blue Book on the Development of China's Humanoid Robot Industry (2024), which provides a comprehensive analysis and outlook of China's humanoid robot industry. The following is a summary of its core content: Industry overview: humanoid robots, as a high-potential industry in science and technology, are expected to be in 2025 …


Huawei Cloud Developer Alliance

Blogged

05/23 13:58

14 Flink SQL performance optimization practices

This article is shared from "Flink SQL Performance Optimization Practice" in the Huawei Cloud Community, written by Chaomeng. In the field of big data processing, Apache Flink has become the first choice of many enterprises thanks to its ability to unify stream processing and batch processing. However, as data volumes grow, performance optimization becomes critical. This article discusses, in an accessible way, the common performance problems, tuning methods, error-prone points and tuning techniques of Flink SQL, and provides code examples. 1. Common performance problems. 1.1 Low data source read efficiency due to insufficient parallelism: the default parallelism may not make full use of hardware resources. -- Set parallelism …


Huawei Cloud Developer Alliance

Blogged

05/23 10:11

One article teaches you to build local knowledge base question answering with LangChain and ChatGLM3

This article is shared from "[Cloud Resident Co creation] LangChain + ChatGLM3 to Realize Local Knowledge Base, Transfer to Huawei Cloud ModelArts, and Realize AI Application Development of Big Model" in the Huawei Cloud Community, by Ye Yiyi. 1. Preface: The Huawei Cloud lecturer in this issue is Jason, an engineer in Huawei Cloud EI's developer ecosystem. The topic is question answering over a local knowledge base based on LangChain + ChatGLM3. The development of large language models has now reached a new height, and their application scenarios span thousands of industries. Huawei Cloud EI has full-stack AI capabilities, and its ModelArts is a one-stop AI development platform that can help developers to be intelligent, high …


Yingxin

Blogged

05/23 11:58

Robin Li Attends VivaTech: The biggest difference between Chinese AI and the West lies in the application

On May 22, at the main forum of "Viva Technology" held in Paris, France, Robin Li (Li Yanhong), Baidu's founder, chairman and CEO, spoke with Maurice Levy, chairman of the supervisory board of Publicis Group, and said that the biggest difference between Chinese AI and the West lies in applications. China has hundreds of foundation models, but people are increasingly discussing what the super application of the AI era will be. He said that applications have driven the rapid development of AI in China. VivaTech is the largest science and technology innovation event in Europe. Since its founding in 2016, it has reached its eighth edition. French President Macron, Musk, Tu …


Official Alluxio

Blogged

05/21 16:46

Case Sharing | Application and Deployment of Alluxio in Autonomous Driving Model Training

Guest speaker: Yang Linsan, Huixi Intelligent. About Huixi Intelligent: Huixi Intelligent is a start-up founded in 2022 that makes autonomous driving chips. It is committed to creating an innovative in-vehicle intelligent computing platform, providing high-level intelligent driving chips, an easy-to-use open toolchain and full-stack autonomous driving solutions, helping automakers achieve high-quality, efficient mass production and delivery of autonomous driving, building low-cost, large-scale and automated iteration capabilities, and leading high-level intelligent mobility in the data-driven era. Outline: How is Alluxio used in a start-up? The process of adopting Alluxio from 0 to 1 (research, deployment, production). Sharing practical experience …

