Cloud-native enterprise data lake solution
A data lake is a unified storage pool that can ingest data from multiple input channels. It stores structured, semi-structured, and unstructured data at any scale. The data lake connects seamlessly with a variety of computing and analytics platforms, so data can be processed and analyzed in place, breaking down data silos and surfacing business value. It also provides hot/cold storage tiering, covering the entire data lifecycle.

Solution architecture

Data Lake Storage
Object Storage Service (OSS) is designed for twelve nines of data durability. It can store data at any scale, supports hot/cold tiering, and connects to business applications as well as a wide range of computing and analytics platforms, making it well suited as the foundation of an enterprise data lake.
Why build a data lake on OSS
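For example, an open-source Hadoop cluster can read and write OSS directly through the hadoop-aliyun connector that ships with Apache Hadoop. A minimal core-site.xml sketch follows; the endpoint and credential values are placeholders, not real settings:

```xml
<!-- Sketch of core-site.xml entries for the hadoop-aliyun OSS connector.
     Endpoint and credentials below are placeholders. -->
<configuration>
  <property>
    <name>fs.oss.impl</name>
    <value>org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem</value>
  </property>
  <property>
    <name>fs.oss.endpoint</name>
    <value>oss-cn-hangzhou-internal.aliyuncs.com</value>
  </property>
  <property>
    <name>fs.oss.accessKeyId</name>
    <value>YOUR_ACCESS_KEY_ID</value>
  </property>
  <property>
    <name>fs.oss.accessKeySecret</name>
    <value>YOUR_ACCESS_KEY_SECRET</value>
  </property>
</configuration>
```

With this in place, jobs can address lake data with URIs such as oss://your-bucket/path instead of hdfs:// paths.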
Massive elasticity: compute and storage are separated, so storage capacity scales elastically
Open ecosystem: Hadoop-compatible and seamlessly integrated with Alibaba Cloud computing platforms
High cost effectiveness: a unified storage pool avoids duplicate copies and offers multiple hot/cold storage tiers
Easier management: unified handling of encryption, authorization, lifecycle rules, cross-zone replication, and more
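As an illustration of the lifecycle-based hot/cold tiering mentioned above, a bucket lifecycle rule can transition objects to colder storage classes as they age. The sketch below shows the general shape of an OSS lifecycle configuration; the prefix and day counts are example values, not recommendations:

```xml
<LifecycleConfiguration>
  <Rule>
    <ID>tier-raw-logs</ID>
    <!-- Applies only to objects under this example prefix -->
    <Prefix>raw/logs/</Prefix>
    <Status>Enabled</Status>
    <!-- Move objects to Infrequent Access after 30 days -->
    <Transition>
      <Days>30</Days>
      <StorageClass>IA</StorageClass>
    </Transition>
    <!-- Move objects to Archive after 180 days -->
    <Transition>
      <Days>180</Days>
      <StorageClass>Archive</StorageClass>
    </Transition>
    <!-- Delete objects after roughly three years -->
    <Expiration>
      <Days>1095</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>
```

A single rule like this replaces the manual data-migration scripts typically needed to tier a self-built HDFS deployment.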
Challenges resolved
Inelasticity: self-built HDFS wastes resources; compute and storage are tightly coupled, making capacity expansion difficult
High cost: self-built HDFS is expensive to run and has no hot/cold data tiering scheme
Lack of service: compared with Alibaba Cloud EMR, self-built big data clusters lack expert support
Difficult management: data is scattered across multiple clusters without unified data management
Recommended products