CloudOps Orchestration Service (OOS)

A unified, cloud-based platform for automated operations and maintenance (O&M). It orchestrates ECS, RDS, SLB, and other cloud products to make day-to-day O&M more efficient.

Product Introduction

CloudOps Orchestration Service (OOS for short) is a collection of system management automation tools. It includes automated tasks (batch operations, scheduled tasks, task templates, and cross-region O&M, plus approval and notification for important O&M scenarios), patch management, parameter management, and configuration inventory. Together, these tools make it more efficient to orchestrate ECS, RDS, SLB, and other cloud products. OOS also supports the open source tool Terraform for management and operation. You can use OOS for free to orchestrate and manage cloud products and resources: OOS itself does not charge, but you pay for the cloud resources that are used.

Product advantages
  • Fully managed, serverless automated execution. Tasks run without consuming your own computing resources (such as ECS instances), which suits the automated O&M needs of startups, small and medium-sized enterprises, and large enterprises alike. Because execution is fully automated, nobody has to watch over it, and you can focus on growing your business.

  • In traditional setups, batch tasks are significantly more complex to manage than single tasks. OOS provides real-time progress tracking, execution status statistics, and fast error location, which improves overall O&M efficiency.

  • Routine O&M tasks are provided as templates, and templates are managed the way code is managed: they are created, approved, and then synchronized to production accounts. Subsequent O&M tasks only select operations from these standard templates, so O&M actions are as standardized and safe as source code. This puts the operations-as-code best practice into effect.

Product Functions
Getting Started and Trying Out
Get started quickly
  • 01 Prepare resources

    1. Create or prepare a pay-as-you-go ECS instance

  • 02 Create a scheduled task in OOS

    1. Log in to the OOS console and open the automated tasks page

    2. Select common O&M tasks --> scheduled startup/shutdown

    3. Select the target instances, the startup/shutdown times, and the order of operations
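The scheduled startup/shutdown task configured in the steps above comes down to a daily time window. The following Python snippet is only a minimal sketch of that idea and is not how OOS implements it. The function name `desired_state`, the state names, and the 09:00/19:00 schedule are hypothetical and chosen for illustration.

```python
from datetime import datetime, time


def desired_state(now: datetime, start_at: time, stop_at: time) -> str:
    """Return "Running" inside the daily window [start_at, stop_at), else "Stopped"."""
    current = now.time()
    if start_at <= stop_at:
        inside = start_at <= current < stop_at
    else:
        # The window wraps past midnight, for example 22:00 -> 06:00.
        inside = current >= start_at or current < stop_at
    return "Running" if inside else "Stopped"


# Hypothetical schedule: start instances at 09:00 and stop them at 19:00.
START_AT = time(9, 0)
STOP_AT = time(19, 0)

for hour in (8, 12, 20):
    moment = datetime(2024, 1, 15, hour, 0)
    print(moment.strftime("%H:%M"), desired_state(moment, START_AT, STOP_AT))
```

In the real service the schedule lives in the OOS task configuration, so you would not run such a loop yourself. The sketch only makes the start/stop window explicit.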

Free trial

A cloud-based automated O&M management platform that provides multiple ECS management capabilities, such as batch operations and scheduled task execution.

Scheduled ECS management with OOS
If you need to perform repetitive O&M on ECS, such as periodically fixing system vulnerabilities, regularly running certain commands on ECS instances, or replacing the system disks of ECS instances in batches, OOS makes this easy. The automated tasks that OOS supports include scheduled tasks, periodic tasks, and batch tasks.
35 minutes
Technical solutions
Periodically update the image

Enterprise customers need to maintain several types of custom images and update them periodically in response to system, application, or security updates. Using OOS for this reduces repetitive manual work and significantly improves O&M efficiency.

  • Avoid the inefficiency and misoperation risk of manual processing

    You no longer need to manually create a temporary instance from the source image, connect to it remotely to apply updates, and then release it. The whole process runs automatically.

  • Integrate automatically with application updates, code releases, and more

    OOS templates are invoked through the API, which allows automatic integration with the customer's own O&M platform and with cloud service interfaces.

Batch manage instance resources

Instances are tagged by purpose, and operations such as batch configuration, deployment, and O&M are then carried out based on those tags.

  • Classify and manage instance resources

    Manage the tags of ECS instances in batches. Tags can classify instances by purpose, for example OSType, AppVersion, or Env.

  • Risk control for batch operations

    Run Cloud Assistant commands in batches through OOS, verify that the results of each batch meet expectations, and only then proceed to the next batch.
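The batch risk control described above (run one batch, verify, then continue) can be sketched in plain Python. This is an illustration under stated assumptions, not the OOS implementation. `run_in_batches`, `fake_run`, and the instance IDs are hypothetical, and `run_command` stands in for whatever actually executes a Cloud Assistant command.

```python
from typing import Callable, Dict, List


def run_in_batches(
    instance_ids: List[str],
    batch_size: int,
    run_command: Callable[[str], str],
    is_expected: Callable[[str], bool],
) -> Dict[str, List[str]]:
    """Run a command batch by batch; stop before the next batch if any result is unexpected."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    succeeded: List[str] = []
    failed: List[str] = []
    for start in range(0, len(instance_ids), batch_size):
        batch = instance_ids[start:start + batch_size]
        for instance_id in batch:
            output = run_command(instance_id)
            (succeeded if is_expected(output) else failed).append(instance_id)
        if failed:
            # Verification failed: leave all later batches untouched.
            skipped = instance_ids[start + batch_size:]
            return {"succeeded": succeeded, "failed": failed, "skipped": skipped}
    return {"succeeded": succeeded, "failed": failed, "skipped": []}


# Hypothetical stand-in for a remote command: one instance misbehaves.
def fake_run(instance_id: str) -> str:
    return "error" if instance_id == "i-003" else "ok"


ids = [f"i-{n:03d}" for n in range(1, 7)]
result = run_in_batches(ids, batch_size=2, run_command=fake_run,
                        is_expected=lambda out: out == "ok")
print(result)
```

With the stand-in above, the second batch contains the failing instance, so the third batch is never started. Limiting the blast radius in this way is the point of verifying between batches.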

Manage preemptible instances

Preemptible instances suit stateless workloads, such as scalable website services, image rendering, big data analysis, and large-scale parallel computing. Because a preemptible instance can be released once its protection period ends, keeping the cluster stable requires detecting the instance's status promptly. When an instance is about to be released, it is replaced automatically so that the switchover is smooth and business continuity is maintained.

  • Release awareness for preemptible instances

    Event-triggered: the status of preemptible instances is detected automatically

  • Save instance state before release

    Before the instance is released, its internal logs and any other data that must be persisted are transferred automatically

  • Smooth instance replacement

    The instance is removed from load balancing automatically, and a new instance is created and added to load balancing
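The three steps above (persist state, remove the instance from load balancing, attach a replacement) can be sketched with an in-memory model. Everything here is hypothetical: the `Cluster` class, `handle_release_event`, and the ID format are illustration only, and no real ECS, SLB, or OOS API is called.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class Cluster:
    """In-memory stand-in for a load balancer backend pool plus a log archive."""
    backends: Set[str] = field(default_factory=set)
    archived_logs: Dict[str, str] = field(default_factory=dict)
    next_id: int = 1

    def create_instance(self) -> str:
        instance_id = f"i-new-{self.next_id}"
        self.next_id += 1
        return instance_id


def handle_release_event(
    cluster: Cluster, instance_id: str, read_logs: Callable[[str], str]
) -> str:
    """React to an 'instance is about to be released' event; return the replacement ID."""
    # 1. Persist data that would otherwise be lost with the instance.
    cluster.archived_logs[instance_id] = read_logs(instance_id)
    # 2. Take the instance out of load balancing.
    cluster.backends.discard(instance_id)
    # 3. Create a replacement and attach it to load balancing.
    replacement = cluster.create_instance()
    cluster.backends.add(replacement)
    return replacement


cluster = Cluster(backends={"i-a", "i-b"})
new_id = handle_release_event(cluster, "i-a", lambda i: f"logs of {i}")
print(new_id, sorted(cluster.backends))
```

The order follows the list above. A production flow might attach the replacement before detaching the old instance so that capacity never dips, which is a design choice this sketch does not explore.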

Security Compliance

  • O&M approval

    OOS supports approval of O&M operations. For high-risk operations, such as releasing important ECS instances, you can configure a task pause in the OOS template. OOS then sends a notification containing an approval link to the administrator and continues or cancels the execution according to the administrator's decision, which avoids O&M risks.

  • Encrypted parameter hosting

    You can use OOS encrypted parameters to store sensitive information, such as database passwords. Encrypted parameters are encrypted with KMS when they are stored, so that sensitive information is not disclosed during creation or use.

  • Automatic system patching

    With OOS patch management, you can see in good time which system patches your ECS instances still need, and configure operating system patches to be installed automatically, which keeps your server assets secure and compliant.

  • RAM permission settings

    Resource Access Management (RAM) is Alibaba Cloud's unified service for managing user identities and resource access permissions. Through RAM, you can control at the account level which OOS resources sub-users or roles can access. In addition, you can specify the role that OOS assumes when it executes a template by setting the RAM role parameter.

  • Operation record audit

    Every O&M operation performed through OOS can be traced. The task execution history shows the operator, execution time, execution result, and other information, which helps you quickly locate the cause of an abnormal operation.
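Two of the capabilities above, approval for high-risk operations and the execution history used for auditing, fit together naturally. The Python snippet below is a minimal sketch of that combination. It assumes a hypothetical `approve` callback in place of the real approval link, and `ExecutionRecord` and `run_with_approval` are illustrative names rather than OOS APIs.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List


@dataclass
class ExecutionRecord:
    operator: str
    action: str
    executed_at: datetime
    result: str


def run_with_approval(
    operator: str,
    action: str,
    high_risk: bool,
    approve: Callable[[str], bool],
    execute: Callable[[], None],
    history: List[ExecutionRecord],
) -> ExecutionRecord:
    """Pause high-risk actions for approval, then record the outcome for auditing."""
    if high_risk and not approve(action):
        result = "Cancelled"
    else:
        try:
            execute()
            result = "Success"
        except Exception as exc:  # keep the failure reason in the audit trail
            result = f"Failed: {exc}"
    record = ExecutionRecord(operator, action, datetime.now(timezone.utc), result)
    history.append(record)
    return record


history: List[ExecutionRecord] = []
released: List[str] = []

# The administrator rejects the first request and approves the second.
rejected = run_with_approval("alice", "release i-001", True,
                             lambda a: False, lambda: released.append("i-001"), history)
approved = run_with_approval("alice", "release i-001", True,
                             lambda a: True, lambda: released.append("i-001"), history)
print([(r.action, r.result) for r in history])
```

A cancelled request is recorded as well, so the history shows who attempted the operation even when nothing was executed.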

FAQ
Q: How do I resolve the error "User has no permission to do the action: (ListTemplates)"?
A: The sub-account does not have sufficient permissions to call the specified OOS API. Log in to the RAM console as an administrator or with the primary account, and grant the required permissions to the sub-account that reported the error. The authorization can cover only the relevant APIs or all APIs. For example, "Action": "oos:*" grants access to all OOS APIs.
Q: Executing a template reports the error "Assets role failed Code: NoPermission, msg: You are not authorized to do this action. You should be authorized by RAM". How do I resolve it?
A: The corresponding RAM role has no trust policy configured for the OOS service. The primary account or an administrator needs to log in to the RAM console and add the corresponding RAM role, OOSServiceRole, with a trust policy for the OOS service. See Setting RAM Permissions for OOS Services.
Q: A temporary bandwidth upgrade fails with "code: InvalidAccountStatus.NotEnoughBalance message: Your account does not have enough balance". How do I resolve it?
A: The account balance is insufficient, so the new order cannot be completed. Top up the account and try again.
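The first two answers above both involve RAM policy documents. The Python sketch below builds what such documents might look like. The "Action": "oos:*" value comes from the answer above. The overall layout ("Version", "Statement", "Effect", "Resource") and the service principal `oos.aliyuncs.com` are assumptions based on common RAM policy conventions and should be checked against the RAM documentation before use. The function names are hypothetical.

```python
import json
from typing import Any, Dict, List, Union


def oos_permission_policy(actions: Union[str, List[str]] = "oos:*") -> Dict[str, Any]:
    """Permission policy for a sub-account; "oos:*" allows every OOS API."""
    return {
        "Version": "1",
        "Statement": [
            {"Effect": "Allow", "Action": actions, "Resource": "*"}
        ],
    }


def oos_trust_policy() -> Dict[str, Any]:
    """Trust policy that lets the OOS service assume a role such as OOSServiceRole."""
    return {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                # Assumed service principal for OOS; verify before use.
                "Principal": {"Service": ["oos.aliyuncs.com"]},
            }
        ],
    }


print(json.dumps(oos_permission_policy(), indent=2))
print(json.dumps(oos_permission_policy(["oos:ListTemplates"]), indent=2))
print(json.dumps(oos_trust_policy(), indent=2))
```

Granting only the specific actions that a sub-account needs, as in the second call, is usually preferable to "oos:*".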