Cloud-native enterprise data lake solution
The data lake is a unified storage pool that accepts data from multiple ingestion channels and stores structured, semi-structured, and unstructured data of any size. It connects directly to a variety of computing and analysis platforms, so data can be processed and analyzed in place, which breaks down data silos and surfaces business value. The data lake also supports conversion between hot and cold storage tiers, covering the entire data lifecycle.

Solution architecture

Data Lake Storage
Object Storage Service (OSS) is designed for 12 nines (99.9999999999%) of data durability. It can store data of any size, supports hot and cold tiering, and connects to business applications and a variety of computing and analysis platforms, which makes it well suited as the foundation of an enterprise data lake.
Why build a data lake on OSS
Massive elasticity: compute and storage are decoupled, and storage capacity scales elastically
Open ecosystem: compatible with the Hadoop ecosystem and seamlessly connected to Alibaba Cloud computing platforms
Cost-effective: a unified storage pool avoids duplicate copies, and multiple hot and cold storage tiers are available
Easier management: unified management of encryption, authorization, lifecycle rules, cross-region replication, and more
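To make the idea of hot and cold tiering concrete, here is a minimal sketch of the decision a lifecycle rule encodes: the older an object is, the colder the storage class it moves to. The storage class names follow OSS naming, but the age thresholds and the function name are assumptions chosen for illustration, not recommended values.

```python
from datetime import date

# Hypothetical thresholds (minimum age in days -> storage class).
# Real thresholds depend on access patterns and pricing.
TIER_RULES = [
    (365, "ColdArchive"),
    (180, "Archive"),
    (30, "IA"),
]


def choose_storage_class(last_modified: date, today: date) -> str:
    """Return the storage class an object of this age would transition to."""
    age_days = (today - last_modified).days
    # Rules are ordered from coldest to warmest, so the first match wins.
    for min_age_days, storage_class in TIER_RULES:
        if age_days >= min_age_days:
            return storage_class
    return "Standard"
```

In practice the same policy would be expressed as a bucket lifecycle rule rather than evaluated in application code; the function only shows the logic such a rule applies.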
Challenges addressed
Inelasticity: self-built HDFS clusters waste resources, and because compute and storage are coupled, capacity is difficult to expand
High cost: self-built HDFS is expensive and offers no hot and cold data tiering
Lack of service: compared with Alibaba Cloud EMR, self-built big data clusters lack expert support
Difficult to manage: data is scattered across multiple clusters with no unified data management
Recommended products

Application scenarios

Build a data lake on the open source ecosystem
Build a fully managed, massive-scale data warehouse
Hot and cold tiered storage for big data
Interactive queries on massive data
Build machine learning capabilities on the data lake
Application scenarios
• Customers build data processing and analysis on the Hadoop ecosystem
• Widely used in the Internet, finance, manufacturing, transportation, and other industries
User pain points
• Data volume grows rapidly, the expansion of storage and compute resources cannot keep pace, and customers need to optimize costs
• Data comes from many sources, so the storage system must interface with a variety of them, including application data
Why Alibaba Cloud
• OSS supports EB-scale data lakes and multiple ingestion channels, covering data sources such as logs, messages, databases, and HDFS
• OSS connects seamlessly to EMR Hive, Spark, Presto, Impala, and other big data processing engines, eliminating data silos
• Alibaba Cloud EMR provides big data expert service support
• Alibaba Cloud Data Lake Formation provides data lake metadata management, data lake acceleration, and other services
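As an illustration of how a processing engine can be pointed at OSS, the sketch below builds the Spark settings that forward OSS options to Hadoop. It assumes the open source hadoop-aliyun connector and its `fs.oss.*` property names; a managed EMR cluster may already have OSS access configured, in which case none of this is needed. The helper names `oss_spark_conf` and `oss_uri` are hypothetical.

```python
def oss_spark_conf(endpoint: str, access_key_id: str, access_key_secret: str) -> dict:
    """Build Spark settings that pass OSS connector options through to Hadoop."""
    # Property names assume the open source hadoop-aliyun connector.
    hadoop_options = {
        "fs.oss.impl": "org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem",
        "fs.oss.endpoint": endpoint,
        "fs.oss.accessKeyId": access_key_id,
        "fs.oss.accessKeySecret": access_key_secret,
    }
    # Spark copies every "spark.hadoop.*" setting into the Hadoop configuration.
    return {f"spark.hadoop.{key}": value for key, value in hadoop_options.items()}


def oss_uri(bucket: str, key: str) -> str:
    """Return the oss:// URI an engine would use to read or write the object."""
    return f"oss://{bucket}/{key.lstrip('/')}"
```

The resulting dictionary can be applied to a Spark session builder one key at a time, after which paths such as `oss_uri("my-bucket", "logs/2024/")` (a placeholder bucket) can be read like any other file system path.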

Application Practice

Online education data lake practice
Online game data lake practice
Interactive entertainment and new media data lake practice
Internet advertising data lake practice
An online education platform with more than 100 million users
Customer requirements
Store courseware materials, application logs, learning samples, and other data centrally
Provide courseware playback, offline analysis, and machine learning on different types of data to cover the different scenarios of online education
Customer value
OSS stores all types of data centrally, including audio, video, images, and logs, connects seamlessly to big data processing, and distributes teaching courseware on demand

Industry Scenario Best Practices

Data Lake Solution: Game Industry Best Practices
Mine the value of data and improve the game experience through data-driven, cloud-based refined operations

Hands-on lessons

Lesson 1: Efficiently migrate massive HDFS files to OSS
Lesson 2: Worry-free data: use checksums when migrating HDFS data to OSS
Lesson 3: How to archive HDFS data to OSS
Lesson 4: How to archive Hive data to OSS by partition
Lesson 5: The fastest way to access object storage such as OSS: JindoFS SDK
Lesson 6: Accelerating Hadoop/Spark access to OSS
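The checksum-based migration check mentioned in the lessons can be sketched generically: compute a checksum for every file on both sides, then report any path that is missing or differs on the target. The code below is only an illustration of that idea. It uses local directories as stand-ins for HDFS and OSS and SHA-256 as the checksum, both of which are assumptions; the actual migration tooling and its checksum algorithm are not described here, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): file_checksum(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_mismatches(source: dict, target: dict) -> list:
    """Return source paths that are missing from the target or differ in checksum."""
    return sorted(
        path for path, checksum in source.items() if target.get(path) != checksum
    )
```

An empty result from `find_mismatches(build_manifest(src), build_manifest(dst))` means every source file arrived intact; any listed path is a candidate for re-copying.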

Customer Stories

Liulishuo data lake practice
Yidiantianxia data lake practice
Jiahe Science and Technology data lake practice
With a data lake solution tailored by Alibaba Cloud, Liulishuo unified the storage of all the data behind its applications and built a "Chinese English speech database" with a data scale in the hundreds of billions. The data lake takes full advantage of an architecture that decouples compute from storage. Combined with Alibaba Cloud ECS elastic instances and Kubernetes (K8s), compute resources scale out and in dynamically according to actual business demand. Compute resources no longer have to be kept provisioned for peak load, which helps minimize costs.
Product recommendation
Free pre-sales expert service
Contact experts: based on the requirements you submit, pre-sales experts provide a free consultation.