
What is EMR on ECS

EMR on ECS is the deployment mode in which EMR runs on ECS instances. It combines the big data processing capabilities of EMR with the container deployment advantages of ECS, so you can configure and manage EMR clusters more flexibly and adapt to complex data processing and analysis scenarios. With EMR on ECS, you can quickly create, manage, and operate EMR clusters, and use computing and storage resources more efficiently.

Product advantages

EMR provides convenient, controllable, enterprise-grade open source big data services. You can quickly build open source big data services such as Hadoop, Spark, Flink, Kafka, and HBase.

  • Uses 100% community open source components, adapted and optimized so that performance is much higher than that of the open source versions.

  • Supports time-based elastic scaling, and preemptible instances can further reduce costs.

  • Decouples computing from storage, enabling elastic use of resources.

  • Creates and scales out clusters within minutes, with no need to deploy or start services manually.
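
The time-based elastic scaling mentioned in the list can be pictured with a small sketch. This is only an illustration under stated assumptions: `TimeRule` and `target_nodes` are hypothetical names, not part of any EMR API, and the node counts are made up. Each rule names a daily time window and the number of task nodes to keep during it.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class TimeRule:
    """Hypothetical rule: keep `nodes` task nodes between start and end.

    Windows are assumed not to cross midnight.
    """
    start: time
    end: time
    nodes: int


def target_nodes(now, rules, baseline):
    """Return the node count of the first rule whose window contains `now`,
    or the baseline count when no rule matches."""
    for rule in rules:
        if rule.start <= now < rule.end:
            return rule.nodes
    return baseline


# Illustrative values only: scale out for a nightly batch window.
rules = [TimeRule(time(1, 0), time(5, 0), nodes=20)]
print(target_nodes(time(2, 30), rules, baseline=4))  # 20
print(target_nodes(time(12, 0), rules, baseline=4))  # 4
```

Outside the window the cluster falls back to the baseline size, which is where the cost saving comes from.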

Product billing

EMR on ECS supports the following billing methods:

  • Monthly subscription: Purchase resources for a fixed period. You pay first and use later.

  • Pay-as-you-go: Provision and release resources on demand. You pay after use.

For detailed billing rules, see Billing Overview.
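
The trade-off between paying up front and paying after use can be sketched with simple arithmetic. The prices below are hypothetical placeholders, not real EMR or ECS rates, and `Pricing`, `monthly_cost`, and `break_even_hours` are names invented for this illustration. Actual bills depend on the rules in Billing Overview.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Hypothetical per-node prices; not real EMR or ECS rates."""
    prepaid_per_month: float       # pay first: one node for one month
    pay_as_you_go_per_hour: float  # pay after use: one node for one hour


def monthly_cost(pricing, nodes, hours_used):
    """Return (prepaid_cost, pay_as_you_go_cost) for one month."""
    prepaid = pricing.prepaid_per_month * nodes
    pay_as_you_go = pricing.pay_as_you_go_per_hour * nodes * hours_used
    return prepaid, pay_as_you_go


def break_even_hours(pricing):
    """Hours of use per month above which prepaying is cheaper."""
    return pricing.prepaid_per_month / pricing.pay_as_you_go_per_hour


pricing = Pricing(prepaid_per_month=300.0, pay_as_you_go_per_hour=1.0)
print(break_even_hours(pricing))                       # 300.0
print(monthly_cost(pricing, nodes=4, hours_used=100))  # (1200.0, 400.0)
```

With these made-up numbers, a cluster that runs only 100 hours a month is cheaper on pay-as-you-go, while one that runs more than 300 hours a month is cheaper when prepaid.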

Comparison with a self-built Hadoop cluster

The following table compares the open source big data platform EMR with a self-built Hadoop cluster.

| Item | Alibaba Cloud EMR | Self-built Hadoop cluster |
| --- | --- | --- |
| Cost | Supports pay-as-you-go and monthly subscription billing. Cluster resources can be adjusted flexibly, data storage is tiered, and resource utilization is high. No additional software license fees. | Resources must be estimated in advance and are relatively fixed, so resource utilization is low. If a Hadoop distribution is adopted, additional license fees apply. |
| Performance | Greatly improved over the open source version. | Uses the open source community version; you must tune performance yourself. |
| Ease of use | A Hadoop cluster launches within minutes, so you can respond quickly to business needs. | You must purchase servers and deploy Hadoop ecosystem components, which takes several weeks. |
| Elasticity | Clusters can be started and destroyed temporarily for individual jobs. Cluster resources can be adjusted automatically based on time schedules or cluster load. With the JindoFS compute-storage separation architecture, computing and storage resources can be scaled independently. | Computing and storage are coupled. Resources are relatively fixed and cannot be adjusted flexibly. |
| Security | Supports enterprise-grade multi-tenant resource management; table-, column-, and row-level permission control; log auditing; and data encryption. | You must configure multi-tenant management yourself. The capability is incomplete and cannot meet enterprise-level requirements. |
| Reliability | Tested in large-scale, enterprise-level environments. Components are upgraded along with the open source versions and pass professional compatibility verification tests, providing a better experience than the community version. | You must upgrade the open source versions yourself, verify the compatibility of each component version, and fix community bugs yourself. |
| Service | A team of senior big data experts provides after-sales technical support. | The community version has no service support. A Hadoop distribution requires additional license and service fees. |
