Cloud-native enterprise data lake solution
A data lake is a unified storage pool that can ingest data from multiple input channels. It stores structured, semi-structured, and unstructured data at any scale. The data lake connects seamlessly with a variety of computing and analytics platforms, so data can be processed and analyzed in place, breaking down data silos and surfacing business value. It also provides hot/cold storage tiering, covering the entire data lifecycle.

Solution architecture

Data Lake Storage
Object Storage Service (OSS) is designed for twelve nines of data durability. It can store data at any scale, supports hot/cold tiering, and connects to business applications as well as a wide range of computing and analytics platforms, making it well suited as the foundation of an enterprise data lake.
Why build a data lake on OSS
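For example, an open-source Hadoop cluster can read and write OSS directly through the hadoop-aliyun connector that ships with Apache Hadoop. A minimal core-site.xml sketch follows; the endpoint and credential values are placeholders, not real settings:

```xml
<!-- Sketch of core-site.xml entries for the hadoop-aliyun OSS connector.
     Endpoint and credentials below are placeholders. -->
<configuration>
  <property>
    <name>fs.oss.impl</name>
    <value>org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem</value>
  </property>
  <property>
    <name>fs.oss.endpoint</name>
    <value>oss-cn-hangzhou-internal.aliyuncs.com</value>
  </property>
  <property>
    <name>fs.oss.accessKeyId</name>
    <value>YOUR_ACCESS_KEY_ID</value>
  </property>
  <property>
    <name>fs.oss.accessKeySecret</name>
    <value>YOUR_ACCESS_KEY_SECRET</value>
  </property>
</configuration>
```

With this in place, jobs can address lake data with URIs such as oss://your-bucket/path instead of hdfs:// paths.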
Massive elasticity: compute and storage are separated, so storage capacity scales elastically
Open ecosystem: Hadoop-compatible and seamlessly integrated with Alibaba Cloud computing platforms
High cost effectiveness: a unified storage pool avoids duplicate copies and offers multiple hot/cold storage tiers
Easier management: unified handling of encryption, authorization, lifecycle rules, cross-zone replication, and more
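As an illustration of the lifecycle-based hot/cold tiering mentioned above, a bucket lifecycle rule can transition objects to colder storage classes as they age. The sketch below shows the general shape of an OSS lifecycle configuration; the prefix and day counts are example values, not recommendations:

```xml
<LifecycleConfiguration>
  <Rule>
    <ID>tier-raw-logs</ID>
    <!-- Applies only to objects under this example prefix -->
    <Prefix>raw/logs/</Prefix>
    <Status>Enabled</Status>
    <!-- Move objects to Infrequent Access after 30 days -->
    <Transition>
      <Days>30</Days>
      <StorageClass>IA</StorageClass>
    </Transition>
    <!-- Move objects to Archive after 180 days -->
    <Transition>
      <Days>180</Days>
      <StorageClass>Archive</StorageClass>
    </Transition>
    <!-- Delete objects after roughly three years -->
    <Expiration>
      <Days>1095</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>
```

A single rule like this replaces the manual data-migration scripts typically needed to tier a self-built HDFS deployment.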
Challenges resolved
Inelasticity: self-built HDFS wastes resources; compute and storage are tightly coupled, making capacity expansion difficult
High cost: self-built HDFS is expensive to run and has no hot/cold data tiering scheme
Lack of service: compared with Alibaba Cloud EMR, self-built big data clusters lack expert support
Difficult management: data is scattered across multiple clusters without unified data management
Recommended products