Cloud native data lake analysis
Data Lake Analytics (DLA for short) uses an elastic architecture to provide one-stop data lake analysis and computing services. It supports ETL, machine learning, streaming, and interactive analysis, and can analyze and integrate data from object storage (OSS), RDS (MySQL, etc.), and NoSQL (MongoDB, etc.) sources. Its functions include data ingestion into the lake, metadata management with automatic discovery, and two engines: Presto and Spark.

Product advantages

Compatible with Presto and Spark
It is compatible with the syntax and multiple versions of open source Presto and Spark, so existing users can get started quickly.
Out of the box
The service is serverless: there are no resources to purchase, and it can be accessed directly over the Internet, which reduces operation and maintenance costs and removes the burden of building a large database system.
Real-time ingestion of multi-source data into the lake
Data in OSS is analyzed directly to build large-scale analysis datasets, with a latency of about 10 minutes.
Instant scale-out of massive computing power
The cluster scales out rapidly on demand; 300 nodes can be started in as little as one minute to respond flexibly to business changes.

Product Functions

Rich product series covering the needs of multiple scenarios
Data Lake Analytics is serverless, supports the Presto and Spark engines, and scales cluster capacity elastically within minutes. It costs less than an on-premises deployment in a self-managed data center.
Serverless form
Data Lake Analytics is serverless, so there are no infrastructure or management costs. It is directly accessible over the Internet, works out of the box, and is billed on demand, so you do not carry analysis costs over the long term. Upgrades have little impact on the business, and product iteration is fast and agile.
Presto engine
The Presto engine is the interactive analysis engine of Data Lake Analytics, based on Presto. It speaks the MySQL protocol, so any tool compatible with the MySQL protocol can be used for data analysis. It suits scenarios such as ad hoc queries, BI analysis, and lightweight ETL.
Spark engine
The Spark engine is a managed big data analysis and computing service based on open source Spark. It is compatible with open source Spark syntax, all APIs, and multiple versions, and supports both SQL and DataFrame code. It suits massive data cleaning, streaming, and workloads that are hard to express in SQL and are instead written in Java, Scala, or Python.
Flexible billing modes to meet different cost requirements
The CU-hour resource package and the CU version support both the Presto and Spark engines; the scan volume version supports only the Presto engine.
CU-hour resource package
The pay-as-you-go plus resource package payment mode suits scenarios where business volume fluctuates widely and often. The resource package offsets the consumption of all pay-as-you-go Data Lake Analytics instances (both Presto and Spark engine instances). During the usage period, pay-as-you-go instances can be upgraded freely, with no need to account for a price difference on the remaining prepaid time after an upgrade. Unused resource package capacity cannot be offset. Compared with monthly subscription and plain pay-as-you-go instances, this mode is more flexible and cost-effective.
CU version
Supports Data Lake Analytics Presto and Spark engine instances and suits scenarios with high query frequency and large query data volumes. It is billed by CPU and memory specification: for example, 1 core with 4 GB of memory is 1 CU, at a unit price of 0.35 yuan per hour. Two billing modes are supported: monthly subscription and pay-as-you-go.
Scan volume version
Supports only Data Lake Analytics Presto engine instances and suits scenarios with low query frequency and small query data volumes. It is billed by bytes scanned: for example, scanning 1 TB of data costs 28 yuan. Two billing modes are supported: traffic package (monthly subscription) and pay-per-scan.
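As a rough illustration of how the two example prices above compare, the following Python sketch computes the pay-as-you-go cost of a CU instance and the scan volume at which the scan-based price would reach the same amount. It uses only the example figures quoted above (0.35 yuan per CU-hour, 28 yuan per TB scanned). The function names are hypothetical, and the sketch deliberately ignores resource packages, monthly subscriptions, and any other pricing rules.

```python
# Example prices quoted in the text above; real prices may differ.
CU_PRICE_PER_HOUR = 0.35   # yuan per CU-hour (1 CU = 1 core + 4 GB memory)
SCAN_PRICE_PER_TB = 28.0   # yuan per TB scanned


def cu_cost(cu_count, hours):
    """Pay-as-you-go cost in yuan of running `cu_count` CUs for `hours` hours."""
    return cu_count * CU_PRICE_PER_HOUR * hours


def scan_cost(tb_scanned):
    """Cost in yuan of scanning `tb_scanned` TB in the scan volume version."""
    return tb_scanned * SCAN_PRICE_PER_TB


def break_even_tb(cu_count, hours):
    """Scan volume (TB) whose scan-based cost equals the CU cost."""
    return cu_cost(cu_count, hours) / SCAN_PRICE_PER_TB


if __name__ == "__main__":
    cost = cu_cost(64, 24)
    print(f"64 CUs for 24 hours: about {cost:.1f} yuan")
    print(f"Same cost as scanning about {break_even_tb(64, 24):.1f} TB")
```

Under these assumptions, a 64 CU instance running for 24 hours costs about 537.6 yuan, the same as scanning about 19.2 TB. Workloads that scan less than that in the same period would be cheaper on the scan volume version, which is consistent with the guidance above about query frequency and data volume.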
Multiple enterprise-level capabilities covering various business needs
Data Lake Analytics offers strong elasticity, metadata discovery, real-time analysis of multi-source data entering the lake, and one-click warehouse building, and can analyze dozens of data sources such as OSS directly with SQL.
Superior elasticity
The Spark engine of Data Lake Analytics supports job-level elasticity. You can set long-term reserved resources (MIN) and an elastic resource upper limit (MAX); MIN can be as low as 0. The instance scales out and in automatically between MIN and MAX as business peaks and troughs come and go, so there is no need to reserve resources in advance, which reduces costs while keeping the business stable. It also supports startup within seconds: currently 500 to 1,000 computing nodes can be started per minute to respond quickly to resource demand.
Metadata discovery
Data Lake Analytics can automatically create and update data lake metadata for data files on OSS, which simplifies analysis and computation. It automatically infers file fields and types, maps directories and partitions, detects new columns and partitions, and groups files into tables.
Real-time analysis of multi-source data entering the lake
Data Lake Analytics supports building a real-time data lake with a latency of about 10 minutes. Without ETL, you can use SQL to analyze data across OSS, relational databases (PostgreSQL, MySQL, etc.), and NoSQL stores (TableStore, etc.), with the differences in how each source is accessed hidden from you. The analysis environment is isolated from the production database, so analysis does not affect the business systems on the data source side.
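The metadata discovery behavior described above (mapping directories and partitions, grouping files into tables) can be pictured with a minimal Python sketch. It assumes Hive-style `name=value` directory names in object keys, which is a common convention but is not stated in the text above. The function names and sample keys are hypothetical, and the actual discovery logic of Data Lake Analytics is not described in this document.

```python
from collections import defaultdict


def parse_key(key):
    """Split an object key into (table_path, partitions, filename).

    Assumption: directories shaped like `name=value` are partitions;
    every other directory is part of the table path.
    """
    *dirs, filename = key.strip("/").split("/")
    table_dirs, partitions = [], {}
    for d in dirs:
        if "=" in d:
            name, value = d.split("=", 1)
            partitions[name] = value
        else:
            table_dirs.append(d)
    return "/".join(table_dirs), partitions, filename


def group_into_tables(keys):
    """Group object keys into tables, collecting each distinct partition once."""
    tables = defaultdict(list)
    for key in keys:
        table, partitions, _ = parse_key(key)
        if partitions not in tables[table]:
            tables[table].append(partitions)
    return dict(tables)


# Hypothetical object keys for illustration only.
sample_keys = [
    "logs/nginx/dt=2021-02-05/part-0.csv",
    "logs/nginx/dt=2021-02-06/part-0.csv",
    "logs/nginx/dt=2021-02-06/part-1.csv",
]
discovered = group_into_tables(sample_keys)
```

Here the three files collapse into a single table path, `logs/nginx`, with two `dt` partitions. A new directory such as `dt=2021-02-07` would show up as a new partition on the next pass, which is the kind of incremental detection the text describes.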
Mature ecosystem with low learning and usage costs
A database-like experience, multiple GUI tools, and support for SaaS data visualization tools keep the learning cost low.
Rich GUI tools
Supports multiple MySQL-compatible GUI management tools such as Microstrategy, MySQL Workbench, and DBeaver.
Multiple visualization tools
Integrates closely and is compatible with BI tools such as QuickBI, Tableau, and DataV.
Compatible with standard SQL
Compatible with the SQL:2003 standard, supports the standard JDBC/ODBC protocols, and provides rich built-in functions and a database-like experience.

Application scenarios

Cloud native data lake analysis architecture
Real-time Lakehouse ingestion into the lake
Reducing the cost of building a data warehouse
Log archiving compliance and analysis
Adopt a cloud-native architecture to significantly reduce operation and maintenance workload
The serverless Spark edition addresses the following business challenges: logic that requires custom code and is hard to express in SQL, such as jobs written in Java, Scala, or Python or with conditional SQL; large-scale cleaning, such as cleaning 1 TB to 1 PB of OSS data per day; and workloads that need algorithm or streaming support.
The serverless Presto edition addresses the following business challenges: quickly building reports, for example QuickBI acceleration and analysis of data returned from Youmeng; and lightweight ETL, where data can be cleaned quickly using SQL alone.
Capabilities provided
Built on open source Apache Spark and compatible with the Spark and PySpark ecosystems, open source algorithm libraries, and more.
Built on open source Presto and compatible with the Presto ecosystem.
Serverless Spark offers job-level elasticity. You can set long-term reserved resources (MIN) and an elastic resource upper limit (MAX); MIN can be as low as 0. The Data Lake Analytics instance scales automatically between MIN and MAX with business peaks and troughs, with no need to reserve resources in advance, which reduces costs while keeping the business stable.
Runs data ETL and builds the warehouse in the lake.
Serverless Presto provides a built-in cache, time-based elasticity, and partition projection to help users build BI reports quickly.
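The MIN/MAX behavior of serverless Spark described above can be pictured as clamping the resources a job asks for into the configured range. The sketch below is only an illustration under that assumption: `target_resources` and its parameters are hypothetical names, not part of any Data Lake Analytics API, and the real scheduler is not described in this document.

```python
def target_resources(demand, min_reserved, max_elastic):
    """Clamp the requested resources into the configured [MIN, MAX] range.

    demand       -- resources the workload currently asks for
    min_reserved -- long-term reserved resources (MIN), may be 0
    max_elastic  -- elastic resource upper limit (MAX)
    """
    if min_reserved < 0 or max_elastic < min_reserved:
        raise ValueError("require 0 <= MIN <= MAX")
    return max(min_reserved, min(demand, max_elastic))


# Hypothetical demand over several periods, in arbitrary resource units.
demand_by_period = [0, 0, 40, 250, 120, 10]

# With MIN = 0 nothing is held during idle periods, and the peak is capped at MAX.
plan = [target_resources(d, min_reserved=0, max_elastic=200) for d in demand_by_period]
```

With MIN set to 0, idle periods hold no resources, which is where the cost saving described above comes from. Raising MIN trades some of that saving for capacity that is always available.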
Real-time analysis of heterogeneous data to accelerate data-driven business
This scenario requires joint queries and real-time analysis across multiple types of data sources, and the traditional approach is tedious and time-consuming. The cloud-native data lake provides federated analysis across multiple data sources, hides the differences in how each source is accessed, and quickly surfaces the value of the data.
Capabilities provided
The production database is isolated from the analysis environment, which keeps the production database lighter and better performing. Historical data is analyzed in the analysis environment, with no impact on the production environment.
Supports building a real-time data lake: database CDC and message data (such as Kafka) can be ingested into the lake to build large-scale datasets that support inserts, deletes, updates, and queries, with a latency of about 10 minutes.
Supports federated and aggregate queries over data scattered across various types of data sources such as MySQL, SQL Server, PostgreSQL, and OSS through standard JDBC.
Compatible with the MySQL protocol and requires no ETL. You can use SQL to directly analyze dozens of data sources such as OSS and start big data analysis quickly and at low cost.
Massive data analysis is slow, and a self-built data warehouse is costly
Analyzing massive data directly on the production database not only affects online business but can also lead to timeouts and query failures, while a self-built data warehouse requires substantial software and hardware resources, R&D costs, and operation and maintenance costs.
Capabilities provided
Supports one-click warehouse building for RDS and fast query and analysis of massive data. With simple configuration in the console, data can be synchronized into OSS, and workloads that used to consume RDS computing resources can be migrated to Data Lake Analytics plus OSS, reducing pressure on the RDS business database.
Rich ecosystem support, including GUI management tools such as Microstrategy and MySQL Workbench and visualization tools such as QuickBI, Tableau, and DataV.
Compatible with the MySQL protocol and based on SQL analysis, so there is no additional learning cost; it hides the complexity of the underlying technology and greatly reduces operation and maintenance costs.
Ready to use on demand with low setup cost. Instances can be scaled out before a business peak and scaled in after it, responding quickly and matching the resource fluctuations that business peaks bring.
Site-wide application access acceleration with behavior log analysis readily available
Cloud-native Data Lake Analytics provides full-link support for data collection, fast query and analysis, and storage. It covers whole-site acceleration and log archiving analysis in one step to drive business growth with data.
Capabilities provided
Log return is a full-link product: there is no need to manage the intermediate ETL process, and the cleaned result log table is obtained directly.
Quickly build reports, such as analyzing error code distribution and user access paths, to achieve link traceability.
Meets log compliance requirements, including supervisory authorities' requirements on how long log data must be retained.

Customer Stories

Jiahe Technology
Jiahe Technology handles business peaks and troughs with the strong analytical capabilities of Data Lake Analytics plus OSS. The elastic serverless service provided by Data Lake Analytics is billed on demand, so there is no need to purchase fixed resources or dedicate operation and maintenance staff. The code is portable, with no additional learning cost. Relative cost performance improved by 30%, the temporary business acceptance rate increased by 200% to 300%, and average task time decreased by 67%.
E-Dianxia
Cloud-native Data Lake Analytics helps E-Dianxia improve time, cost, security, and computing efficiency across the whole pipeline of data collection, storage, and analysis, reducing overall operating costs by about 50%. It supports direct analysis of dozens of data sources such as OSS through SQL statements, greatly improving data query and analysis capabilities and supporting business growth.

Product Updates

2018-05-11 New products
Alibaba Cloud Data Lake Analytics public beta release
2018-08-10 New Features/Specifications
Data Lake Analytics supports reflow of multiple data sources
2018-08-17 New Features/Specifications
Data Lake Analytics supports reflow of multiple data sources
2018-10-19 New Region/New Availability Zone
Data Lake Analytics officially launched in South China region
2018-11-01 New Region/New Availability Zone
Alibaba Cloud Data Lake Analytics officially launched in the UK region
2018-11-14 New products
Alibaba Cloud Data Lake Analytics officially commercialized
2019-01-15 New Features/Specifications
Data Lake Analytics releases the table creation wizard function, supporting OSS data sources
2019-01-18 New Features/Specifications
Alibaba Cloud Data Lake Analytics' json_extract function for MongoDB data source
2019-02-01 New Features/Specifications
Data Lake Analytics supports MongoDB Connector
2019-03-15 New Features/Specifications
Data Lake Analytics Access to DataWorks
2019-03-15 New Features/Specifications
Data Lake Analytics supports Redis Connector
2019-03-29 New Features/Specifications
Data Lake Analytics access to MNS and ONS messaging systems
2019-04-15 New Region/New Availability Zone
Data Lake Analytics newly launched in the United States (Virginia) region
2019-05-15 New Features/Specifications
Data Lake Analytics supports POLARDB Connector
2019-06-17 New Features/Specifications
Data Lake Analytics supports MaxCompute Connector
2019-07-12 New Features/Specifications
Data Lake Analytics supports accessing multi index tables of table store
2019-08-30 New Features/Specifications
Alibaba Cloud Data Lake Analytics adds the corresponding function between IP and country, province and city
2019-09-02 New Features/Specifications
One click warehouse building function of Data Lake Analytics to quickly build a data warehouse system based on RDS
2019-11-15 New Region/New Availability Zone
Data Lake Analytics launched new regions such as India (Mumbai), the United States (Silicon Valley), and Japan (Tokyo)
2019-12-30 New Features/Specifications
Data lake analysis is fully upgraded to support Presto analysis engine
2019-12-31 New Features/Specifications
Data Lake Analytics supports Youmeng to release the overall solution of "data open U-DOP"
2020-02-13 New Features/Specifications
Data Lake Analytics releases multiple optimizations such as SQL completion and console cache, which greatly improve the console user experience
2020-02-24 Function optimization
Optimizes OSS multi-version performance and console prompts
2020-03-13 Function optimization
Optimizes the overall performance of DROP TABLE when there are many partitions
2020-03-13 Function optimization
Empty files in gz format encountered during SQL execution are now skipped directly
2020-04-02 New Features/Specifications
Data Lake Analytics supports the AnalyticDB for PostgreSQL data source
2020-04-02 New Features/Specifications
Data Lake Analytics supports writing data to MaxCompute
2020-04-02 New Features/Specifications
Data Lake Analytics supports analysis of Actiontrail logs
2020-04-16 New Features/Specifications
Data Lake Analytics supports view creation, deletion, and authorization
2020-04-16 New Features/Specifications
Data Lake Analytics supports modifying column names, types, and comments
2020-04-23 New Features/Specifications
Data Lake Analytics supports the conversion of different file formats for OSS users
2020-04-24 New Features/Specifications
Data Lake Analytics releases the Serverless Spark computing engine
2020-05-15 New Features/Specifications
Data Lake Analytics supports MongoDB read-only instances
2020-06-22 New Features/Specifications
Data Lake Analytics supports the analysis of MaxCompute external table data
2020-06-22 New Features/Specifications
Data Lake Analytics supports analysis of the Druid data source
2020-06-22 New Features/Specifications
Data Lake Analytics supports analysis of the ElasticSearch data source
2020-08-14 New Features/Specifications
Data Lake Analytics supports data lake management
2020-08-28 New Features/Specifications
Data Lake Analytics releases the SQL (Presto compatible) CU version
2020-08-28 New Features/Specifications
RAM sub users can submit serverless Spark jobs
2020-09-04 New Features/Specifications
Cloud-native Data Lake Analytics metadata crawling supports automatic discovery of NGINX logs
2020-09-10 New Features/Specifications
Cloud-native Data Lake Analytics supports the CU version package
2020-09-18 New Features/Specifications
Cloud-native Data Lake Analytics SQL (Presto) engine supports access to user-built Hive
2020-09-25 New Features/Specifications
Cloud-native Data Lake Analytics Spark engine supports accessing user Hive
2020-09-25 New Features/Specifications
Cloud-native Data Lake Analytics Spark engine supports accessing user Hadoop
2020-09-25 New Features/Specifications
Cloud-native Data Lake Analytics Spark engine supports accessing user Hbase
2020-10-30 New Features/Specifications
Cloud-native Data Lake Analytics supports the Kudu data source
2020-12-18 New Features/Specifications
Cloud-native Data Lake Analytics metadata crawling supports the TableStore data source
2020-12-18 New Features/Specifications
Cloud-native Data Lake Analytics metadata discovery supports data delivered to OSS by SLS
2021-02-05 New Features/Specifications
Cloud-native Data Lake Analytics launches a data lake analysis acceleration function based on Alluxio
2021-02-10 New Features/Specifications
Cloud-native Data Lake Analytics releases support for database mode in OSS metadata discovery
2021-04-28 New Features/Specifications
Cloud-native Data Lake Analytics supports Lakehouse ingestion, helping users import RDS and PolarDB business data into the lake
2021-04-28 New Features/Specifications
Cloud-native Data Lake Analytics Spark engine supports query submission in SparkSQL mode on the console
2021-04-28 New Features/Specifications
Cloud-native Data Lake Analytics Spark engine supports integration with user-built Jupyter
2021-05-21 New Features/Specifications
Cloud-native Data Lake Analytics Presto engine CU version supports monitoring CPU, memory, and other metrics
2021-06-30 New Features/Specifications
DLA Lakehouse supports reading RDS MySQL and PolarDB MySQL slave databases to simplify warehouse building in the lake
2021-08-09 New Features/Specifications
DLA Lakehouse supports near-real-time warehouse building in the lake from Kafka, which can be used for query and analysis of behavior logs and other scenarios

Documentation and Tools