Apache CarbonData 2.0 Online Release

2020/05/30 18:45






|Organizers: Apache CarbonData, KAIYUANSHE
|Editor: Chen Meimei
|Design: Ye Xiuyuan




Apache CarbonData 2.0 Online Release


2020/06/03 (Wednesday) 19:30 - 21:00


Apache CarbonData x KAIYUANSHE






Event introduction

Overview



Apache CarbonData is a high-performance, EB-scale, Hadoop-native analytical data warehouse. It provides high-performance detail queries and interactive queries over EB-scale data on object storage, real-time data synchronization and updates, support and acceleration for major ETL workloads, and a machine learning library that supports data annotation and training analysis.

As one of the few Apache top-level projects contributed by Chinese companies, CarbonData officially debuted in 2017. Let's briefly review its history:

  • CarbonData 1.0 becomes an Apache top-level project (early 2017): a number of large customers in China and abroad tried it out, and in their performance tests at the time, Spark on CarbonData was 1.5 to 2 times faster than Spark on Parquet. This led to CarbonData 1.0 being released as an official product, and CarbonData became the first Apache top-level project contributed by a Chinese company.

  • CarbonData 1.5 and 1.6 (early 2019): ACID capabilities for the Hadoop ecosystem, including transactions, fault tolerance, and metadata management.

  • CarbonData 2.0 release (current): the system architecture has been redesigned for cloud environments and adds dozens of advanced features, including optimizations for the separation of storage and compute, indexes and materialized views, data lake capabilities, and real-time data synchronization and updates.

Since its debut, CarbonData has stayed true to the open source spirit of practice and exploration and has kept forging ahead, becoming an important Chinese force within the Apache Software Foundation.
Now let's take a quick look at the key milestone features of CarbonData 2.0.


Separation of storage and compute

  1. Storage optimization: metadata management optimized for object storage, avoiding the high cost of moving and listing objects during data management
  2. Computing ecosystem: support for Spark 2.4.5, Flink, Hive, Alluxio, Presto, PyTorch, and TensorFlow


Detail queries and interactive analysis

  1. Detail queries: secondary indexes, spatial indexes, and segment-level MinMax indexes deliver second-level responses for detail queries over PB-scale data
  2. Complex queries: materialized views, time-series pre-aggregation, and bucket indexes deliver second-level responses for complex queries
  3. Data lake index management: a distributed index cache (IndexServer), with support for preloading indexes into memory


Real-time data synchronization and updates

  1. Performance enhancements for Insert, Update, and Delete, plus support for Merge syntax


ETL support and acceleration

  1. Hive can read and write CarbonData transactional tables, with deeply optimized read and write performance


AI

  1. A machine learning library that supports data annotation and training analysis


To help you learn more about Apache CarbonData 2.0, we will host an online live broadcast for the release.


Live broadcast information



During the broadcast, guests from the CarbonData community will present the latest features and performance of CarbonData, and several CarbonData developers will share their experience and practices with big data applications.


Special Guests


  • Chen Liang (Huawei, Apache CarbonData PMC & Committer)
  • Li Kun (Huawei, Apache CarbonData PMC & Committer)
  • Kunal Kapoor (Apache CarbonData PMC & Committer)
  • Ravindra Pesala (Development Bank of Singapore, Apache CarbonData PMC & Committer)
  • Vimal Das (Uber, Apache CarbonData PMC & Committer)
  • Zhichao Zhang (Kyligence, Apache CarbonData PMC & Committer)
  • Cao Lu (Big Data Architect, SAIC Data Business Department, Apache CarbonData Committer)
  • He Xiaoqiao (Data Platform Engineer, Meituan-Dianping, Apache CarbonData Committer)
  • Hao Xingjun (core contributor to Apache CarbonData)
  • Lin Lvqiang, Richard Lin (Director, KAIYUANSHE; host of this launch event)


Agenda





Click "Read the original text" to register directly.





About KAIYUANSHE


KAIYUANSHE (the Open Source Society) is a vendor-neutral, volunteer-based, non-profit open source alliance formed by enterprises, communities, and individuals in China and abroad who support open source. Guided by the principle of "contribution, consensus, and co-governance", it aims to build a healthy and sustainable open source ecosystem and to help the Chinese open source community become an active participant in and contributor to global open source software. Its focus areas are open source governance, international integration, community development, and open source projects.








This article is shared from the WeChat official account of KAIYUANSHE.