Data Lake Formation
As a core component of cloud-native data lake architecture, Data Lake Formation (DLF) helps users quickly build a cloud-native data lake. DLF provides unified management of metadata on the lake, enterprise-grade permission control, and seamless integration with multiple computing engines, breaking down data silos and helping businesses gain insight into data value.

Product advantages

Data ingestion into the lake
Supports multiple data types and ingestion channels
Supports unified data cleansing
Metadata service
Intelligent metadata discovery service
Unified metadata collection that avoids scattered management
Permission management
Enterprise-grade data permission management
Permissions can be set at the database, table, and column level
Multi-engine integration
Supports multiple upstream computing engines
Easily build an end-to-end data lake service
Open ecosystem
Compatible with Hive Metastore
Multi-language OpenAPI for easy integration (see the connection sketch after this list)
Data acceleration
Unique JindoFS data acceleration feature
Provides high-performance acceleration for data lake analytics
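
Because the metadata service is compatible with Hive Metastore, engines that speak the Hive Metastore protocol can use the lake catalog through ordinary configuration. The sketch below shows the general idea from PySpark; the thrift URI is a placeholder rather than a real DLF endpoint, and the exact integration settings for your engine and region should come from the DLF documentation.

from pyspark.sql import SparkSession

# Minimal sketch: point a Spark session at a Hive Metastore-compatible
# catalog endpoint. The URI below is a placeholder, not a real DLF address.
spark = (
    SparkSession.builder
    .appName("dlf-metastore-demo")
    .config("hive.metastore.uris", "thrift://metastore.example.internal:9083")
    .enableHiveSupport()
    .getOrCreate()
)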

Application scenarios

Building an open-source data lake ecosystem
Building an integrated lake-warehouse data system
Real-time analysis of data lake data
Machine learning on the data lake
Building an open-source data lake ecosystem
Users have built their own data processing and analysis platforms on Alibaba Cloud's open-source big data ecosystem (E-MapReduce, Realtime Compute for Apache Flink, DLA, and other products). As data volumes expand rapidly, however, storage resources no longer scale in step with computing resources, creating a need for cost optimization. In addition, the big data ecosystem is rich and data sources are diverse, so metadata is scattered and hard to manage; users want to manage metadata across different storage systems in a unified way.
Solution value
Metadata management
DLF supports automatic collection and discovery of metadata from multiple engines, enabling unified management and avoiding data silos (see the sketch below)
Ecosystem advantages
The Alibaba Cloud big data team provides expert service support
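
As an illustration of what a unified metadata view looks like from a compute engine, the sketch below enumerates the lake's databases and tables through the Spark catalog API. It assumes a session already configured against the lake's Hive Metastore-compatible endpoint, as in the earlier connection sketch.

from pyspark.sql import SparkSession

# Assumes the session is configured against the lake's metastore
# (see the connection sketch above).
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Browse the unified catalog: every registered database and table is
# visible here, regardless of which engine produced it.
for db in spark.catalog.listDatabases():
    print("database:", db.name)
    for table in spark.catalog.listTables(db.name):
        print("  table:", table.name, "| type:", table.tableType)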
Building an integrated lake-warehouse data system
Data warehouses and data lakes represent two design orientations in big data architecture. A lake-first design maximizes flexibility for data entering the lake by exposing the underlying file storage, while a warehouse-first design focuses on enterprise-grade growth requirements such as data usage efficiency, management at scale, and security and compliance. Flexibility and growth matter differently to an enterprise at different stages. As users' businesses become clearer and more settled, they face the need to converge data lake and data warehouse architectures. Built on Alibaba Cloud data warehouses (MaxCompute, Hologres, ADB, and other products) together with the data lake, DLF helps users build an integrated lake-warehouse data system in which data and computation flow freely between the lake and the warehouse, forming a complete and organic big data technology ecosystem.
Solution value
O&M-free
DLF is a fully managed service that helps users build a data lake system on the cloud with a few clicks
Security guaranteed
A unified permission management system controls access at the database, table, and column level (see the sketch below)
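
To make the permission model concrete, here is a purely illustrative sketch of a column-level SELECT grant. The endpoint, action name, and payload fields are hypothetical placeholders, not the documented DLF OpenAPI; real grants should go through the official SDK or console.

import requests

# Hypothetical sketch only: the URL, action name, and field names are
# placeholders that illustrate database/table/column-scoped grants.
grant = {
    "Principal": "user/analyst01",        # who receives the permission
    "Catalog": "default",                 # target catalog
    "Database": "sales_db",               # database-level scope
    "Table": "orders",                    # table-level scope
    "Columns": ["order_id", "amount"],    # column-level scope
    "Permissions": ["SELECT"],            # action being granted
}

resp = requests.post(
    "https://dlf.example-region.example.com/?Action=GrantPermissions",  # placeholder URL
    json=grant,
    timeout=10,
)
print(resp.status_code)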
Real-time analysis of data lake data
Large volumes of data of different types are stored in OSS. Users want to analyze and query this data across various dimensions, such as real-time data analysis and OLAP queries, and feed the results back to business systems. They also want to connect easily to multiple computing engines on the cloud and query the data in place, without extracting all of it into a separate query system.
Solution value
Real-time data ingestion
Provides real-time data ingestion into the lake, ensuring business timeliness
Automatic metadata discovery
DLF automatically crawls, organizes, and prepares data for analysis, avoiding complex manual work (a query sketch follows)
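
As a minimal sketch of querying lake data where it lives, the example below registers an external table over an OSS path and runs an aggregation on it with Spark SQL. The bucket, path, and schema are made-up placeholders, and the example assumes an engine (for example EMR Spark) already configured with OSS access and the lake's metastore.

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Register an external table over data that stays in OSS (placeholder
# bucket, path, and schema); nothing is copied into the query system.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS logs_raw (
        user_id STRING,
        event   STRING,
        ts      TIMESTAMP
    )
    STORED AS PARQUET
    LOCATION 'oss://example-bucket/logs/'
""")

# Query the data in place.
spark.sql("""
    SELECT event, COUNT(*) AS cnt
    FROM logs_raw
    GROUP BY event
    ORDER BY cnt DESC
""").show()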
Machine learning on the data lake
Big data is the foundation of AI, and AI is the future of big data. The data lake serves users well in both classic machine learning and deep learning scenarios. In machine learning scenarios, users face problems such as large data volumes, slow model training, and poor algorithm performance, so the data lake needs to connect to mature machine learning platforms. In deep learning scenarios, users need to adjust GPU resource usage dynamically to save costs.
Solution value
Easy to use
DLF connects seamlessly with the Alibaba Cloud machine learning platform and provides multi-language OpenAPIs for easy integration
Data normalization
DLF supports cleansing and standardizing data as it enters the lake, making subsequent analysis with machine learning models easier (a small sketch follows)
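
The sketch below illustrates the kind of lightweight cleansing and standardization that can be applied as data lands in the lake. The column names, rules, and paths are illustrative placeholders rather than a pipeline provided by DLF itself.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Read newly arrived raw data (placeholder OSS path).
raw = spark.read.json("oss://example-bucket/raw/events/")

# Example standardization rules: trim and lowercase the event name,
# parse the timestamp, and drop records with no user id.
clean = (
    raw.withColumn("event", F.lower(F.trim(F.col("event"))))
       .withColumn("ts", F.to_timestamp("ts"))
       .dropna(subset=["user_id"])
)

# Write the standardized data back to the lake for downstream ML use.
clean.write.mode("append").parquet("oss://example-bucket/clean/events/")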

Customer Stories

Online education data lake practice
Online gaming data lake practice
Interactive entertainment new media data lake practice
Online education data lake practice
An online education platform with more than 100 million users.
Customer requirements
The customer wants courseware materials, application logs, learning samples, and other data to be stored and managed centrally, and to support courseware playback, offline analysis, and machine learning on the different data types, enabling a range of online education scenarios.
Customer value
DLF fits naturally with OSS as the data store and integrates with a large number of computing engines to meet the customer's different analysis needs.
Online gaming data lake practice
A leading interactive entertainment company in Asia.
Customer requirements
The customer wants to adjust game level difficulty, drop rates, and resource output rates promptly through data analysis, to protect the player experience and improve retention. The customer also wants cloud resources that scale and upgrade flexibly; the data lake solution removes the tight coupling of compute and storage found in traditional big data clusters and provides greater flexibility.
Customer value
DLF helps the customer quickly build a cloud data lake service, decouples storage from compute, and integrates with real-time computing and analysis engines so that the business can be adjusted in real time.
Interactive entertainment new media data lake practice
A new Internet media platform with more than 100 million monthly users.
Customer requirements
The customer wants to manage metadata across multiple storage systems in a unified way and to provide data sharing and analysis capabilities that support business growth.
Customer value
DLF centrally manages the previously scattered metadata, and its discovery capability collects and classifies data by directory from the customer's databases and object storage.

Product Updates

2021-06-15 New Features
Public beta release of the data exploration feature
2021-10-15 New Features
New metadata migration feature
2021-11-01 New Features
Data permission feature released: supports database/table/column-level permission control
2022-01-01 New Region/Availability Zone
Launched on the International site, with the Singapore region added
2022-01-01 New Features
Lake management storage overview feature released
2022-05-20 New Features
Data lake permission management released
2022-06-15 New Features
Lifecycle management released
2022-07-06 New Features
DLF supports multiple catalogs

More products and services

E-MapReduce
An open-source big data PaaS built on Alibaba Cloud ECS, covering the Hadoop, Spark, HBase, Hive, and Flink ecosystem
Big data computing service MaxCompute
A fast, fully managed, PB-scale data warehouse solution for analyzing and processing massive data economically and efficiently
Interactive analysis (Hologres)
A real-time interactive analysis product compatible with the PostgreSQL protocol
Object Storage OSS
A massive-scale, secure, low-cost, and highly reliable cloud storage service providing 99.999999999999% data durability