E-MapReduce Serverless Spark
E-MapReduce (hereinafter referred to as "EMR") Serverless Spark is a fully managed, one-stop data computing platform based on Spark, offered by the open source big data platform E-MapReduce. It provides a full range of productized services covering task development, debugging, release, scheduling, and operations and maintenance. This significantly simplifies big data computing workflows and lets users focus on analyzing data and extracting value from it.

Product advantages

Cloud-native, high-speed computing engine
Built-in Spark Native Engine, which improves performance by 200% over open source Spark;
Built-in Celeborn (Remote Shuffle Service), which supports PB-level shuffle data and can reduce the total cost of computing resources by up to 30%.
Open data lake architecture
Supports the separation of compute and storage: compute scales elastically, and storage is billed on a pay-as-you-go basis;
Integrates with OSS-HDFS, cloud storage that is fully compatible with HDFS, for smooth migration to the cloud;
Centralized metadata in DLF, fully connecting data warehouse metadata.
One-stop development experience
Provides a one-stop data development experience covering job development, debugging, release, and scheduling;
Built-in version management and isolation between development and production, meeting enterprise-level development and release standards.
Serverless resource platform
Works out of the box, with no need to manually manage or operate cloud infrastructure;
Elastic scaling, with resources provisioned within seconds;
Pay-as-you-go billing based on the computing resources actually used, further reducing the total computing cost.
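
The performance and billing claims above can be related with a little arithmetic. The following is a minimal sketch, not an official pricing formula: it assumes that a 200% performance improvement means a job finishes in one third of the time on the same resources, and that pay-as-you-go cost is simply runtime multiplied by resources and a unit price. The function names, the unit price, and the job sizes are all hypothetical and chosen only for illustration; real workloads and real billing rules will differ.

```python
def speedup_from_improvement(improvement_pct: float) -> float:
    """Convert an 'improves performance by N%' claim into a throughput multiplier.

    A 200% improvement means 1 + 200/100 = 3 times the baseline performance.
    """
    return 1.0 + improvement_pct / 100.0


def job_cost(runtime_hours: float, compute_units: float, price_per_unit_hour: float) -> float:
    """Pay-as-you-go cost under the simplifying assumption that a job is
    billed only for the compute units it holds while it is running."""
    return runtime_hours * compute_units * price_per_unit_hour


# All figures below are hypothetical, chosen only to illustrate the arithmetic.
baseline_hours = 3.0   # runtime of the job on open source Spark
compute_units = 100.0  # resources held by the job while it runs
price = 0.05           # made-up price per compute unit per hour

speedup = speedup_from_improvement(200.0)
native_hours = baseline_hours / speedup  # assumes runtime scales inversely with performance

baseline_cost = job_cost(baseline_hours, compute_units, price)
native_cost = job_cost(native_hours, compute_units, price)

print(f"speedup: {speedup:.1f}x")
print(f"baseline cost: {baseline_cost:.2f}, cost with faster engine: {native_cost:.2f}")
```

Under these assumptions a faster engine lowers cost only because the job holds its resources for less time, which is why engine performance and pay-as-you-go billing are listed together as cost advantages.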

Product functions

Enterprise-level, fully managed data platform services
Easy to use: Designed to provide a high-quality product experience. Customers can start development without building complex infrastructure.
High performance: Based on the Spark Native Engine, provides up to three times the performance of open source Spark.
High scalability: Built on Alibaba Cloud serverless infrastructure, provides elastic resources to absorb sudden peaks in ETL jobs and further reduce the actual cost of computing resources.
Observable resources: Provides observable metrics and alerting at the resource and task instance levels.
High security: Deployed in an Alibaba Cloud VPC with VPC access, along with finer-grained access control and a higher level of security protection.
Open architecture and ecosystem integration
Integration with upstream and downstream Alibaba Cloud products: Connects seamlessly with Alibaba Cloud OSS-HDFS/OSS, Data Lake Formation (DLF), and DataWorks for maximum convenience.

Application scenarios

Build a data platform on EMR Serverless Spark
Thanks to its open product architecture, EMR Serverless Spark makes it easy and efficient to analyze and process structured and unstructured data in the data lake. Its built-in task scheduling system lets users build and manage ETL tasks, automating data pipelines and periodic data processing. EMR Serverless Spark also includes a version management system and fully isolates development and production environments, which helps enterprise users meet strict requirements for their R&D and release processes. Together, these features support reliable and efficient data processing that meets the standards of enterprise applications.
Advantages:
Fully managed, with no operations and maintenance required
Elastic scalability
Open data lake architecture
One-stop data development platform
Recommended combination