Cloud native data lake analysis

Data Lake Analytics (DLA for short) uses an elastic architecture to provide one-stop data lake analysis and computing services. It supports ETL, machine learning, streaming, and interactive analysis, and can analyze and integrate data from object storage (OSS), RDS (MySQL, etc.), and NoSQL (MongoDB, etc.) sources. Its functions include data ingestion into the lake, metadata management with automatic discovery, and two engines: Presto and Spark.

Cloud native data lake analysis (DLA) product delisting announcement

Product advantages

Compatible with Presto and Spark

DLA is compatible with the syntax and multiple versions of open source Presto and Spark, so you can get started quickly.

Out of the box

DLA is serverless: there are no resources to purchase, and the service is directly accessible over the Internet. This reduces operation and maintenance costs and removes the burden of building a large database system.

Real-time ingestion of multi-source data into the lake

OSS data is analyzed directly to build a large-scale analysis dataset, with a delay of about 10 minutes.

Instant scaling of massive computing power

The cluster scales out rapidly on demand. At the fastest, 300 nodes can be started within one minute to respond flexibly to business changes.

Product Functions

Rich product series, fully covering the needs of multiple scenarios

DLA adopts a serverless form, supports the Presto and Spark engines, and scales cluster capacity elastically within minutes. It costs less than an offline deployment in your own machine room.

Serverless form

Data Lake Analytics adopts a serverless form, with no infrastructure or management costs. It is directly accessible over the Internet, works out of the box, and is billed by usage, so you do not carry analysis costs over long periods. Upgrades have little impact on the business, and product iteration is fast and agile.

Presto engine

The Presto engine is the interactive analysis engine of Data Lake Analytics, built on Presto. It is exposed through the MySQL protocol, so any tool compatible with the MySQL protocol can be used for data analysis. It suits scenarios such as ad hoc queries, BI analysis, and lightweight ETL.

Spark engine

The Spark engine is a managed big data analysis and computing service based on open source Spark. It is compatible with open source Spark syntax, all APIs, and multiple versions, and supports both SQL and DataFrame code. It suits massive data cleaning, streaming, and scenarios that are difficult to express in SQL and call for Java, Scala, or Python code.

Flexible billing modes to meet different cost requirements

The CU-hour resource package and the CU version support both the Presto and Spark engines. The scan volume version supports only the Presto engine.

CU-hour resource package

The pay-as-you-go plus resource package payment mode suits scenarios with large and frequent fluctuations in business volume. The resource package offsets the consumption of all pay-as-you-go Data Lake Analytics instances (Presto and Spark engine instances). During the usage period, pay-as-you-go instances can be upgraded flexibly without having to settle a price difference for remaining prepaid time. Resources that are not used are not deducted from the package. This is more flexible and cost-effective than using monthly subscription or pay-as-you-go instances alone.

CU version

Supports Data Lake Analytics Presto and Spark engine instances, and suits scenarios with high query frequency and large query data volumes. It is charged according to CPU and memory specifications. For example, 1 core with 4 GB of memory is 1 CU, at a unit price of 0.35 yuan per hour. Two billing modes are supported: monthly subscription and pay-as-you-go.

Scan volume version

Supports only Data Lake Analytics Presto engine instances, and suits scenarios with low query frequency and small query data volumes. It is charged by the number of bytes scanned. For example, scanning 1 TB of data costs 28 yuan. Two billing modes are supported: traffic package (monthly subscription) and pay-by-scan (pay-as-you-go).
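As a rough illustration of how the two example prices above compare, the following Python sketch estimates the cost of each mode and the scan volume at which they break even. It assumes only the example figures quoted in the text (0.35 yuan per CU per hour, 28 yuan per TB scanned); the function names are hypothetical, and real bills depend on the actual price list, resource packages, and discounts.

```python
# Example prices quoted in the text above (assumptions, not a price list).
CU_PRICE_YUAN_PER_HOUR = 0.35   # 1 CU = 1 core + 4 GB of memory
SCAN_PRICE_YUAN_PER_TB = 28.0   # scan volume version


def cu_cost(cu_count: int, hours: float) -> float:
    """Cost in yuan of running `cu_count` CUs for `hours` hours."""
    return cu_count * hours * CU_PRICE_YUAN_PER_HOUR


def scan_cost(tb_scanned: float) -> float:
    """Cost in yuan of scanning `tb_scanned` terabytes."""
    return tb_scanned * SCAN_PRICE_YUAN_PER_TB


def break_even_tb(cu_count: int, hours: float) -> float:
    """Scan volume (TB) at which both modes cost the same."""
    return cu_cost(cu_count, hours) / SCAN_PRICE_YUAN_PER_TB


if __name__ == "__main__":
    cus, hours = 64, 720  # hypothetical workload: 64 CUs for a 30-day month
    print(f"CU version:      {cu_cost(cus, hours):.2f} yuan")
    print(f"Break-even scan: {break_even_tb(cus, hours):.1f} TB")
```

Under these assumptions, a workload that scans less than the break-even volume in the same period is cheaper on the scan volume version, which matches the guidance that it suits low query frequency and small data volumes.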

Multiple enterprise-level capabilities covering various business needs

DLA offers strong elasticity, supports metadata discovery, supports real-time, one-click analysis of data from multiple sources, and can analyze dozens of data sources such as OSS directly with SQL.

Superior elasticity

The Spark engine of Data Lake Analytics supports job-level elasticity. You can set long-term reserved resources (MIN) and an elastic resource upper limit (MAX); the minimum value of MIN is 0. The instance automatically scales out and in between MIN and MAX according to business peaks and troughs, so there is no need to reserve resources in advance, which reduces costs while keeping the business running stably. It also supports startup within seconds: at present, 500 to 1000 computing nodes can be started per minute to respond quickly to business resource requirements.
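The MIN/MAX behavior described above can be pictured as clamping demand to a range and limiting how fast capacity ramps up. The Python sketch below is a simplified model only, not DLA's scheduler: the function names are hypothetical, the default ramp rate of 500 nodes per minute is the lower bound quoted in the text, and immediate scale-in is an assumption.

```python
def target_nodes(demand: int, min_nodes: int, max_nodes: int) -> int:
    """Clamp the requested node count to the [MIN, MAX] range."""
    if min_nodes < 0 or max_nodes < min_nodes:
        raise ValueError("require 0 <= MIN <= MAX")
    return max(min_nodes, min(demand, max_nodes))


def next_node_count(current: int, demand: int, min_nodes: int,
                    max_nodes: int, max_start_per_minute: int = 500) -> int:
    """Node count one minute later under a simple ramp-up limit.

    Scale-out is capped at `max_start_per_minute`; scale-in is assumed
    to take effect immediately (an assumption of this sketch).
    """
    target = target_nodes(demand, min_nodes, max_nodes)
    if target > current:
        return min(target, current + max_start_per_minute)
    return target


if __name__ == "__main__":
    nodes = 0
    # Hypothetical per-minute demand: a peak followed by an idle period.
    for demand in [1200, 1200, 1200, 0]:
        nodes = next_node_count(nodes, demand, min_nodes=0, max_nodes=2000)
        print(demand, nodes)
```

With MIN set to 0, the modeled instance holds no nodes while idle, which is the cost-saving case the text describes.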

Metadata discovery

DLA can automatically create and update data lake metadata for data files on OSS to simplify analysis and computing. It automatically detects file fields and types, maps directories to partitions, detects new columns and partitions, and groups files into tables.
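To make the idea of mapping directories to partitions and grouping files into tables concrete, here is a minimal Python sketch. It assumes Hive-style `key=value` directory names and treats the path before the first such segment as the table name; both conventions, the function name, and the sample object keys are assumptions for illustration and are not taken from the DLA implementation.

```python
from collections import defaultdict


def discover_tables(object_keys):
    """Group object keys into tables and collect their partitions.

    Assumes Hive-style layout: <table path>/<k1=v1>/<k2=v2>/<file>.
    Returns {table: sorted list of partitions}, where each partition
    is a tuple of (column, value) pairs.
    """
    tables = defaultdict(set)
    for key in object_keys:
        segments = key.strip("/").split("/")[:-1]  # drop the file name
        table_parts, partition = [], []
        for segment in segments:
            if "=" in segment:
                column, _, value = segment.partition("=")
                partition.append((column, value))
            elif not partition:
                # Plain directories before the first key=value belong
                # to the table path.
                table_parts.append(segment)
        tables["/".join(table_parts)].add(tuple(partition))
    return {table: sorted(parts) for table, parts in tables.items()}


if __name__ == "__main__":
    keys = [  # hypothetical OSS object keys
        "logs/dt=2021-01-01/part-0.csv",
        "logs/dt=2021-01-02/part-0.csv",
        "orders/region=cn/dt=2021-01-01/a.parquet",
    ]
    for table, partitions in discover_tables(keys).items():
        print(table, partitions)
```

Rerunning such a scan after new files arrive would surface new partitions, which is the kind of change the automatic discovery described above is meant to pick up.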

Real-time analysis of multi-source data entering the lake

DLA supports building a real-time data lake with a delay of about 10 minutes. Without ETL, you can use SQL to analyze data across OSS, relational databases (PostgreSQL, MySQL, etc.), and NoSQL stores (TableStore, etc.), with the differences in accessing each data source hidden from you. The analysis environment is isolated from the production database, so analysis does not affect the business system on the data source side.

Mature ecosystem, low learning and usage cost

DLA offers a database-like experience, support for multiple GUI tools and SaaS data visualization tools, and a low learning cost.

Rich GUI tools

Supports multiple MySQL GUI management tools such as MicroStrategy, MySQL Workbench, and DBeaver.

Support for multiple visualization tools

Highly integrated and compatible with BI tools such as QuickBI, Tableau, and DataV.

Compatible with standard SQL

Compatible with the SQL:2003 standard, with support for the standard JDBC/ODBC protocols, a rich set of built-in functions, and a database-like user experience.

Application scenarios

Cloud native data lake analysis architecture

Real-time Lakehouse ingestion into the lake

Optimization of data warehouse building cost

Log archiving for compliance and analysis

Adopt cloud native architecture to significantly reduce the operation and maintenance workload

The serverless Spark version addresses the following business challenges: custom code for logic that is difficult to express in SQL, written in Java, Scala, Python, or SQL; large-scale cleaning, such as cleaning 1 TB to 1 PB of OSS data per day; and workloads that need algorithm support and streaming support.
The serverless Presto version addresses the following business challenges: quickly building reports, such as QuickBI acceleration and analysis of data returned from Youmeng (Umeng); and lightweight ETL, where data can be cleaned quickly using plain SQL.

What it provides

Developed on the basis of open source Apache Spark, and compatible with Spark, the PySpark ecosystem, open source algorithm libraries, and more.

Developed on the basis of open source Presto, and compatible with the Presto ecosystem.

Serverless Spark offers job-level elasticity. You can set long-term reserved resources (MIN) and an elastic resource upper limit (MAX); the minimum value of MIN is 0. The Data Lake Analytics instance automatically scales out and in between MIN and MAX according to business peaks and troughs, without reserving resources in advance, which reduces costs while keeping the business running stably.

Performs data ETL and builds the warehouse in the lake.

Serverless Presto provides a built-in cache, time-based elasticity, and partition projection to help users quickly build BI reports.

Recommended matching products

RDS MySQL

Object Storage OSS

Real-time analysis of heterogeneous data to accelerate data-driven business

This scenario requires joint queries and real-time analysis across multiple types of data sources, and the traditional solution is tedious and time-consuming. The cloud native data lake provides federated analysis across multiple data sources, hides the differences in accessing each source, and quickly uncovers the value of data.

What it provides

The production database is isolated from the analysis environment, which keeps the production database lighter and better performing. Historical data is analyzed in the analysis environment, with no impact on production.

It supports building a real-time data lake, ingests database CDC and message data (such as Kafka) into the lake, and builds a large-scale dataset that supports inserts, deletes, updates, and queries, with a delay of about 10 minutes.

It supports federated queries and aggregate queries through standard JDBC on data scattered across various types of data sources such as MySQL, SQL Server, PostgreSQL, and OSS.

It is compatible with the MySQL protocol and requires no ETL. You can use SQL to directly analyze dozens of data sources such as OSS, and start big data analysis quickly and at low cost.

Recommended matching products

RDS MySQL

Object Storage OSS

Massive data analysis is slow, and a self-built data warehouse is costly

Analyzing massive data directly on the production database not only affects online business but may also cause timeouts and query failures. A self-built data warehouse, however, requires substantial software and hardware resources, R&D costs, and operation and maintenance costs.

What it provides

It supports one-click warehouse creation from RDS and fast query and analysis of massive data. With simple configuration in the console, data can be synchronized into OSS, and workloads that used to consume RDS computing resources can be migrated to Data Lake Analytics + OSS, reducing pressure on the RDS business database.

It offers rich ecosystem support, including GUI management tools such as MicroStrategy and MySQL Workbench, and visualization tools such as QuickBI, Tableau, and DataV.

It is compatible with the MySQL protocol and based on SQL analysis, so there is no additional learning cost. It hides the complexity of the underlying technology and greatly reduces operation and maintenance costs.

It is available on demand with low preparation cost. Instances can be scaled out before a business peak and scaled in afterward, responding quickly and matching the resource fluctuations that business tides bring.

Recommended matching products

RDS MySQL

Object Storage OSS

Site-wide application access is accelerated, and behavior log analysis is readily available

Cloud native data lake analysis provides end-to-end support for data collection, fast query and analysis, and storage. It delivers site-wide acceleration together with log archiving and analysis in one step, driving business growth with data.

What it provides

Log return is an end-to-end offering. You do not need to attend to the intermediate ETL process; you directly obtain the cleaned result log table.

Quickly build reports, such as analyses of error code distribution and user access paths, to achieve link traceability.

Meet log compliance requirements, including the requirements of supervising authorities on how long log data must be retained.

Recommended matching products

RDS MySQL

Content Delivery Network (CDN)

Object Storage OSS

Customer Stories

Jiahe Technology

Jiahe Technology handles business peaks and troughs with the strong analytical capabilities of Data Lake Analytics + OSS. The serverless elastic service of Data Lake Analytics is billed on demand, so there is no need to purchase fixed resources or assign dedicated operation and maintenance staff. Existing code is portable, with no additional learning cost. Relative cost performance improved by 30%, the acceptance rate for ad hoc business requests increased by 200% to 300%, and average task time decreased by 67%.

E-Dianxia

Cloud native data lake analysis helps E-Dianxia improve time, cost, security, and computing efficiency across the whole pipeline of data collection, storage, and analysis, reducing overall operating cost by about 50%. Direct SQL analysis of dozens of data sources such as OSS greatly improves data query and analysis capabilities and supports business growth.

Product Dynamics

2018-05-11 New Products

Alibaba Cloud Data Lake Analytics public beta release

2018-08-10 New Features/Specifications

Data Lake Analytics supports writing data back to multiple data sources

2018-08-17 New Features/Specifications

Data Lake Analytics supports writing data back to multiple data sources

2018-10-19 New Region/New Availability Zone

Data Lake Analytics officially launched in South China region

2018-11-01 New Region/New Availability Zone

Alibaba Cloud Data Lake Analytics officially launched in the UK region

2018-11-14 New Products

Alibaba Cloud Data Lake Analytics officially commercialized

2019-01-15 New Features/Specifications

Data Lake Analytics releases the table creation wizard function, supporting OSS data sources

2019-01-18 New Features/Specifications

Alibaba Cloud Data Lake Analytics supports the json_extract function for MongoDB data sources

2019-02-01 New Features/Specifications

Data Lake Analytics supports MongoDB Connector

2019-03-15 New Features/Specifications

Data Lake Analytics integrates with DataWorks

2019-03-15 New Features/Specifications

Data Lake Analytics supports Redis Connector

2019-03-29 New Features/Specifications

Data Lake Analytics integrates with the MNS and ONS messaging systems

2019-04-15 New Region/New Availability Zone

Data Lake Analytics newly launched in the United States (Virginia) region

2019-05-15 New Features/Specifications

Data Lake Analytics supports POLARDB Connector

2019-06-17 New Features/Specifications

Data Lake Analytics supports MaxCompute Connector

2019-07-12 New Features/Specifications

Data Lake Analytics supports accessing multi-index tables in TableStore

2019-08-30 New Features/Specifications

Alibaba Cloud Data Lake Analytics adds functions that map IP addresses to country, province, and city

2019-09-02 New Features/Specifications

One-click warehouse building in Data Lake Analytics quickly builds a data warehouse system based on RDS

2019-11-15 New Region/New Availability Zone

Data Lake Analytics launches in new regions, including India (Mumbai), the United States (Silicon Valley), and Japan (Tokyo)

2019-12-30 New Features/Specifications

Data Lake Analytics is fully upgraded to support the Presto analysis engine

2019-12-31 New Features/Specifications

Data Lake Analytics supports the release by Youmeng (Umeng) of the "data open U-DOP" overall solution

2020-02-13 New Features/Specifications

Data Lake Analytics releases multiple optimizations such as SQL completion and console cache, greatly improving the console user experience

2020-02-24 Function Optimization

Optimizes OSS multi-version performance and console prompts

2020-03-13 Function Optimization

Optimizes the overall performance of DROP TABLE when there are many partitions

2020-03-13 Function Optimization

Empty files in gz format encountered during SQL execution are skipped directly

2020-04-02 New Features/Specifications

Data Lake Analytics supports the AnalyticDB for PostgreSQL data source

2020-04-02 New Features/Specifications

Data Lake Analytics supports writing data to MaxCompute

2020-04-02 New Features/Specifications

Data Lake Analytics supports analysis of ActionTrail logs

2020-04-16 New Features/Specifications

Data Lake Analytics supports view creation, deletion, and authorization

2020-04-16 New Features/Specifications

Data Lake Analytics supports modifying column names, types, and comments

2020-04-23 New Features/Specifications

Data Lake Analytics supports converting between different file formats for OSS users

2020-04-24 New Features/Specifications

Data Lake Analytics releases the Serverless Spark computing engine

2020-05-15 New Features/Specifications

Data Lake Analytics supports MongoDB read-only instances

2020-06-22 New Features/Specifications

Data Lake Analytics supports the analysis of MaxCompute external table data

2020-06-22 New Features/Specifications

Data Lake Analytics supports analysis of the Druid data source

2020-06-22 New Features/Specifications

Data Lake Analytics supports analysis of the ElasticSearch data source

2020-08-14 New Features/Specifications

Data Lake Analytics supports data lake management

2020-08-28 New Features/Specifications

Data Lake Analytics releases the SQL (Presto compatible) CU version

2020-08-28 New Features/Specifications

RAM sub-users can submit serverless Spark jobs

2020-09-04 New Features/Specifications

Cloud native data lake analysis metadata crawling supports automatic discovery of NGINX logs

2020-09-10 New Features/Specifications

Cloud native data lake analysis supports the CU version package

2020-09-18 New Features/Specifications

Cloud native data lake analysis SQL (Presto) engine supports access to user-built Hive

2020-09-25 New Features/Specifications

Cloud native data lake analysis Spark engine supports accessing user Hive

2020-09-25 New Features/Specifications

Cloud native data lake analysis Spark engine supports accessing user Hadoop

2020-09-25 New Features/Specifications

Cloud native data lake analysis Spark engine supports accessing user HBase

2020-10-30 New Features/Specifications

Cloud native data lake analysis supports the Kudu data source

2020-12-18 New Features/Specifications

Cloud native data lake analysis metadata crawling supports the TableStore data source

2020-12-18 New Features/Specifications

Cloud native data lake analysis metadata discovery supports data delivered to OSS by SLS

2021-02-05 New Features/Specifications

Cloud native data lake analysis launches a data lake analysis acceleration function based on Alluxio

2021-02-10 New Features/Specifications

Cloud native data lake analysis releases support for database mode in OSS metadata discovery

2021-04-28 New Features/Specifications

Cloud native data lake analysis supports Lakehouse ingestion into the lake, helping users import RDS and PolarDB business data into the lake

2021-04-28 New Features/Specifications

Cloud native data lake analysis Spark engine supports query submission in SparkSQL mode on the console

2021-04-28 New Features/Specifications

Cloud native data lake analysis Spark engine supports connecting to user-built Jupyter

2021-05-21 New Features/Specifications

Cloud native data lake analysis Presto engine CU version supports monitoring of CPU, memory, and other metrics

2021-06-30 New Features/Specifications

DLA Lakehouse supports reading RDS MySQL and PolarDB MySQL slave databases to simplify warehouse building in the lake

2021-08-09 New Features/Specifications

DLA Lakehouse supports near-real-time warehouse building in the lake from Kafka, which can be used for query and analysis of behavior logs and other scenarios

Documentation and Tools

Product documentation

View Data Lake Analysis Usage Document

API & SDK

Understand Open API and SDK usage

Product pricing

Understand the pricing and billing methods of data lake analysis

FAQ