Data Lake Formation
As a core component of cloud-native data lake architecture, Data Lake Formation (DLF) helps users quickly build a cloud-native data lake. DLF provides unified management of metadata on the lake, enterprise-grade permission control, and seamless integration with multiple computing engines, breaking down data silos and helping businesses gain insight into data value.

Product advantages

Data ingestion into the lake
Supports multiple data types and ingestion channels
Supports unified data cleansing
Metadata service
Intelligent metadata discovery service
Unified metadata collection that avoids scattered management
Permission management
Enterprise-grade data permission management
Permissions can be set at the database, table, and column level
Multi-engine integration
Supports multiple upstream computing engines
Easily build an end-to-end data lake service
Open ecosystem
Compatible with Hive Metastore
Multi-language OpenAPI for easy integration (see the connection sketch after this list)
Data acceleration
Unique JindoFS data acceleration feature
Provides high-performance acceleration for data lake analytics
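
Because the metadata service is compatible with Hive Metastore, engines that speak the Hive Metastore protocol can use the lake catalog through ordinary configuration. The sketch below shows the general idea from PySpark; the thrift URI is a placeholder rather than a real DLF endpoint, and the exact integration settings for your engine and region should come from the DLF documentation.

from pyspark.sql import SparkSession

# Minimal sketch: point a Spark session at a Hive Metastore-compatible
# catalog endpoint. The URI below is a placeholder, not a real DLF address.
spark = (
    SparkSession.builder
    .appName("dlf-metastore-demo")
    .config("hive.metastore.uris", "thrift://metastore.example.internal:9083")
    .enableHiveSupport()
    .getOrCreate()
)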

Application scenarios

Building an open-source data lake ecosystem
Building an integrated lake-warehouse data system
Real-time analysis of data lake data
Machine learning on the data lake
Building an open-source data lake ecosystem
Users have built their own data processing and analysis platforms on Alibaba Cloud's open-source big data ecosystem (E-MapReduce, Realtime Compute for Apache Flink, DLA, and other products). As data volumes expand rapidly, however, storage resources no longer scale in step with computing resources, creating a need for cost optimization. In addition, the big data ecosystem is rich and data sources are diverse, so metadata is scattered and hard to manage; users want to manage metadata across different storage systems in a unified way.
Solution value
Metadata management
DLF supports automatic collection and discovery of metadata from multiple engines, enabling unified management and avoiding data silos (see the sketch below)
Ecosystem advantages
The Alibaba Cloud big data team provides expert service support
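
As an illustration of what a unified metadata view looks like from a compute engine, the sketch below enumerates the lake's databases and tables through the Spark catalog API. It assumes a session already configured against the lake's Hive Metastore-compatible endpoint, as in the earlier connection sketch.

from pyspark.sql import SparkSession

# Assumes the session is configured against the lake's metastore
# (see the connection sketch above).
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Browse the unified catalog: every registered database and table is
# visible here, regardless of which engine produced it.
for db in spark.catalog.listDatabases():
    print("database:", db.name)
    for table in spark.catalog.listTables(db.name):
        print("  table:", table.name, "| type:", table.tableType)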
Building an integrated lake-warehouse data system
Data warehouses and data lakes represent two design orientations in big data architecture. A lake-first design maximizes flexibility for data entering the lake by exposing the underlying file storage, while a warehouse-first design focuses on enterprise-grade growth requirements such as data usage efficiency, management at scale, and security and compliance. Flexibility and growth matter differently to an enterprise at different stages. As users' businesses become clearer and more settled, they face the need to converge data lake and data warehouse architectures. Built on Alibaba Cloud data warehouses (MaxCompute, Hologres, ADB, and other products) together with the data lake, DLF helps users build an integrated lake-warehouse data system in which data and computation flow freely between the lake and the warehouse, forming a complete and organic big data technology ecosystem.
Solution value
O&M-free
DLF is a fully managed service that helps users build a data lake system on the cloud with a few clicks
Security guaranteed
A unified permission management system controls access at the database, table, and column level (see the sketch below)
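
To make the permission model concrete, here is a purely illustrative sketch of a column-level SELECT grant. The endpoint, action name, and payload fields are hypothetical placeholders, not the documented DLF OpenAPI; real grants should go through the official SDK or console.

import requests

# Hypothetical sketch only: the URL, action name, and field names are
# placeholders that illustrate database/table/column-scoped grants.
grant = {
    "Principal": "user/analyst01",        # who receives the permission
    "Catalog": "default",                 # target catalog
    "Database": "sales_db",               # database-level scope
    "Table": "orders",                    # table-level scope
    "Columns": ["order_id", "amount"],    # column-level scope
    "Permissions": ["SELECT"],            # action being granted
}

resp = requests.post(
    "https://dlf.example-region.example.com/?Action=GrantPermissions",  # placeholder URL
    json=grant,
    timeout=10,
)
print(resp.status_code)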
Real-time analysis of data lake data
Large volumes of data of different types are stored in OSS. Users want to analyze and query this data across various dimensions, such as real-time data analysis and OLAP queries, and feed the results back to business systems. They also want to connect easily to multiple computing engines on the cloud and query the data in place, without extracting all of it into a separate query system.
Solution value
Real-time data ingestion
Provides real-time data ingestion into the lake, ensuring business timeliness
Automatic metadata discovery
DLF automatically crawls, organizes, and prepares data for analysis, avoiding complex manual work (a query sketch follows)
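
As a minimal sketch of querying lake data where it lives, the example below registers an external table over an OSS path and runs an aggregation on it with Spark SQL. The bucket, path, and schema are made-up placeholders, and the example assumes an engine (for example EMR Spark) already configured with OSS access and the lake's metastore.

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Register an external table over data that stays in OSS (placeholder
# bucket, path, and schema); nothing is copied into the query system.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS logs_raw (
        user_id STRING,
        event   STRING,
        ts      TIMESTAMP
    )
    STORED AS PARQUET
    LOCATION 'oss://example-bucket/logs/'
""")

# Query the data in place.
spark.sql("""
    SELECT event, COUNT(*) AS cnt
    FROM logs_raw
    GROUP BY event
    ORDER BY cnt DESC
""").show()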
Machine learning on the data lake
Big data is the foundation of AI, and AI is the future of big data. The data lake serves users well in both classic machine learning and deep learning scenarios. In machine learning scenarios, users face problems such as large data volumes, slow model training, and poor algorithm performance, so the data lake needs to connect to mature machine learning platforms. In deep learning scenarios, users need to adjust GPU resource usage dynamically to save costs.
Solution value
Easy to use
DLF connects seamlessly with the Alibaba Cloud machine learning platform and provides multi-language OpenAPIs for easy integration
Data normalization
DLF supports cleansing and standardizing data as it enters the lake, making subsequent analysis with machine learning models easier (a small sketch follows)
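
The sketch below illustrates the kind of lightweight cleansing and standardization that can be applied as data lands in the lake. The column names, rules, and paths are illustrative placeholders rather than a pipeline provided by DLF itself.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Read newly arrived raw data (placeholder OSS path).
raw = spark.read.json("oss://example-bucket/raw/events/")

# Example standardization rules: trim and lowercase the event name,
# parse the timestamp, and drop records with no user id.
clean = (
    raw.withColumn("event", F.lower(F.trim(F.col("event"))))
       .withColumn("ts", F.to_timestamp("ts"))
       .dropna(subset=["user_id"])
)

# Write the standardized data back to the lake for downstream ML use.
clean.write.mode("append").parquet("oss://example-bucket/clean/events/")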

Customer Stories

Online education data lake practice
Online gaming data lake practice
Interactive entertainment new media data lake practice
Online education data lake practice
An online education platform with more than 100 million users.
Customer requirements
The customer wants courseware materials, application logs, learning samples, and other data to be stored and managed centrally, and to support courseware playback, offline analysis, and machine learning on the different data types, enabling a range of online education scenarios.
Customer value
DLF fits naturally with OSS as the data store and integrates with a large number of computing engines to meet the customer's different analysis needs.
Online gaming data lake practice
A leading interactive entertainment company in Asia.
Customer requirements
The customer wants to adjust game level difficulty, drop rates, and resource output rates promptly through data analysis, to protect the player experience and improve retention. The customer also wants cloud resources that scale and upgrade flexibly; the data lake solution removes the tight coupling of compute and storage found in traditional big data clusters and provides greater flexibility.
Customer value
DLF helps the customer quickly build a cloud data lake service, decouples storage from compute, and integrates with real-time computing and analysis engines so that the business can be adjusted in real time.
Interactive entertainment new media data lake practice
A new Internet media platform with more than 100 million monthly users.
Customer requirements
The customer wants to manage metadata across multiple storage systems in a unified way and to provide data sharing and analysis capabilities that support business growth.
Customer value
DLF centrally manages the previously scattered metadata, and its discovery capability collects and classifies data by directory from the customer's databases and object storage.

Product Updates

2021-06-15 New Features
Public beta release of the data exploration feature
2021-10-15 New Features
New metadata migration feature
2021-11-01 New Features
Data permission feature released: supports database/table/column-level permission control
2022-01-01 New Region/Availability Zone
Launched on the International site, with the Singapore region added
2022-01-01 New Features
Lake management storage overview feature released
2022-05-20 New Features
Data lake permission management released
2022-06-15 New Features
Lifecycle management released
2022-07-06 New Features
DLF supports multiple catalogs

More products and services

E-MapReduce
An open-source big data PaaS built on Alibaba Cloud ECS, covering the Hadoop, Spark, HBase, Hive, and Flink ecosystem
Big data computing service MaxCompute
A fast, fully managed, PB-scale data warehouse solution for analyzing and processing massive data economically and efficiently
Interactive analysis (Hologres)
A real-time interactive analysis product compatible with the PostgreSQL protocol
Object Storage OSS
A massive-scale, secure, low-cost, and highly reliable cloud storage service providing 99.999999999999% data durability