sundry

Oceanbase distributed aggregation push down

Aggregation operators in the database, such as sum, avg, group by, sort, etc., consume a lot of CPU and IO resources for computing, reading and writing. By parallel execution, the corresponding operators are pushed down to each computing node, making full use of the computing resources of the cluster, and improving the execution efficiency.

In general, the downward pressure of aggregation operators can be divided into two plans

Partition-Wise Aggregation

  1. explain select sum(c2) from t2 group by c1\G  
  2. *************************** 1. row ***************************  
  3. Query Plan:   
  4. ============================================  
  5. |ID|OPERATOR           |NAME|EST.  ROWS|COST|  
  6. --------------------------------------------  
  7. |0 |EXCHANGE IN DISTR  |    |1000     |2834|  
  8. |1 | EXCHANGE OUT DISTR|    |1000     |2740|  
  9. |2 |  HASH GROUP BY    |    |1000     |2740|  
  10. |3 |   TABLE SCAN      |t2  |4000     |499 |  
  11. ============================================  
  12.   
  13. Outputs & filters:   
  14. -------------------------------------  
  15.   0 - output([T_FUN_SUM(t2.c2)]), filter(nil)  
  16.   1 - output([T_FUN_SUM(t2.c2)]), filter(nil)  
  17.   2 - output([T_FUN_SUM(t2.c2)]), filter(nil),   
  18.       group([t2.c1]), agg_func([T_FUN_SUM(t2.c2)])  
  19.   3 - output([t2.c1], [t2.c2]), filter(nil),   
  20.       access([t2.c1], [t2.c2]), partitions(p[0-3])  

If the semantic of aggregate query is to group by partition key and aggregate within the group, such aggregation operation is called partition wise aggregation. All operations are pushed down to the partition for parallel execution. It should be noted that partition wise aggregation is not necessarily the optimal execution, because parallelism is limited by the number of partitions. In OceanBase 1. x, there is no other alternative plan for this situation.

Push down and two-stage polymerization

  1. explain select sum(c1) from t2 group by c2\G  
  2. *************************** 1. row ***************************  
  3. Query Plan:   
  4. =============================================  
  5. |ID|OPERATOR            |NAME|EST.  ROWS|COST|  
  6. ---------------------------------------------  
  7. |0 |HASH GROUP BY       |    |1000     |3395|  
  8. |1 | EXCHANGE IN DISTR  |    |1000     |2834|  
  9. |2 |  EXCHANGE OUT DISTR|    |1000     |2740|  
  10. |3 |   HASH GROUP BY    |    |1000     |2740|  
  11. |4 |    TABLE SCAN      |t2  |4000     |499 |  
  12. =============================================  
  13.   
  14. Outputs & filters:   
  15. -------------------------------------  
  16.   0 - output([T_FUN_SUM(T_FUN_SUM(t2.c1))]), filter(nil),   
  17.       group([t2.c2]), agg_func([T_FUN_SUM(T_FUN_SUM(t2.c1))])  
  18.   1 - output([T_FUN_SUM(t2.c1)], [t2.c2]), filter(nil)  
  19.   2 - output([T_FUN_SUM(t2.c1)], [t2.c2]), filter(nil)  
  20.   3 - output([T_FUN_SUM(t2.c1)], [t2.c2]), filter(nil),   
  21.       group([t2.c2]), agg_func([T_FUN_SUM(t2.c1)])  
  22.   4 - output([t2.c1], [t2.c2]), filter(nil),   
  23.       access([t2.c1], [t2.c2]), partitions(p[0-3])  

In more general cases, the aggregation operation may not be performed according to the partition key as a grouping. In this case, OceanBase will adopt the two-stage aggregation method shown in Example 2, press the aggregation operation down to obtain partial aggregation results, and then summarize to obtain the final results. It should be noted that at present, the second phase of two-phase aggregation is not executed in parallel.

fabulous ( one hundred and seventy-three )

This article is written by Ji Changxin Author, article address: https://blog.isoyu.com/archives/oceanbasefenbushijuhexiaya.html
use Knowledge Sharing Attribution 4.0 International License Agreement. Unless the reprint/source is indicated, they are all original or translated by this website. Please sign your name before reprinting. Last editing time: September 16, 2018 at 04:34 p.m

key word:

Popular articles

Comments:

4 comments, 0 visitors, 0 bloggers
  1.  Headlines
    Headlines Published on:

    The article is good, I really like it

  2.  Marshal ratio
    Marshal ratio Published on:

    Good morning, boss

  3.  Xifengli Blog
    Xifengli Blog Published on:

    I just read it as a heavenly book. What is the concept of pressure?

Post reply

[Required]

I am a human?

Please wait three seconds after submission to avoid unsubmission and repetition