An analysis of Jute, Zookeeper's serialization component

2017/01/05 19:05

Brief introduction
Jute is the serialization component in Zookeeper. It was originally the default serialization component in Hadoop, where its predecessor was Hadoop Record I/O. Later, Apache Avro offered better cross-language support, richer data structures, and MapReduce support, and could easily be used for RPC calls, so Hadoop abandoned Record I/O in favor of Avro. Record I/O was then split out as an independent serialization component and renamed Jute.
Zookeeper has used Jute as its serialization tool since the earliest versions, and the latest release at the time of writing, zookeeper-3.4.9, still uses it. As for why it has not been replaced by serialization components with better performance and broader adoption, such as Apache Avro, Thrift, or Protobuf, the main reason is compatibility between old and new versions; besides, Jute has never been a bottleneck for Zookeeper. The following analyzes how Jute is used, along with some of its source code.

Simple use
First, let's use Jute in a simple way to get a preliminary feel for it:
1. Provide a bean that implements the Record interface

public class TestBean implements Record {
    private int intV;
    private String stringV;

    public TestBean() {
    }

    public TestBean(int intV, String stringV) {
        this.intV = intV;
        this.stringV = stringV;
    }

    // getters and setters omitted

    @Override
    public void deserialize(InputArchive archive, String tag) throws IOException {
        archive.startRecord(tag);
        this.intV = archive.readInt("intV");
        this.stringV = archive.readString("stringV");
        archive.endRecord(tag);
    }

    @Override
    public void serialize(OutputArchive archive, String tag) throws IOException {
        archive.startRecord(this, tag);
        archive.writeInt(intV, "intV");
        archive.writeString(stringV, "stringV");
        archive.endRecord(this, tag);
    }
}

The Record interface that the bean implements declares just two methods: serialize and deserialize.

2. Serialization and deserialization

public class BinaryTest1 {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        BinaryOutputArchive boa = BinaryOutputArchive.getArchive(baos);
        new TestBean(1, "testbean1").serialize(boa, "tag1");
        byte array[] = baos.toByteArray();

        ByteArrayInputStream bais = new ByteArrayInputStream(array);
        BinaryInputArchive bia = BinaryInputArchive.getArchive(bais);
        TestBean newBean1 = new TestBean();
        newBean1.deserialize(bia, "tag1");
        System.out.println("intV = " + newBean1.getIntV() + ",stringV = " + newBean1.getStringV());

        bais.close();
        baos.close();
    }
}

A BinaryOutputArchive serializer and a BinaryInputArchive deserializer are created, the TestBean is serialized and then deserialized under the tag "tag1", and finally the field values before and after serialization can be compared.

Use Analysis
The example above uses Jute in its simplest form. Of course, you can also step into the source code as you use it. Let's first look at the code structure of Jute:

Start with the Record interface implemented by the bean. The source code is as follows:

public interface Record {
    public void serialize(OutputArchive archive, String tag) throws IOException;
    public void deserialize(InputArchive archive, String tag) throws IOException;
}

It is very simple: just two methods, serialize and deserialize, each taking two parameters. OutputArchive is the serializer, InputArchive is the deserializer, and tag identifies the object being written; the same serializer can serialize multiple objects, so each object needs its own identifier (see the sketch below).
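To make the role of the tag concrete, here is a minimal sketch (reusing the TestBean from above; the class name MultiRecordTest is just for illustration) that pushes two records through one archive. Note that the binary archive does not actually write the tag into the stream, so the records have to be read back in the same order they were written:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.jute.BinaryInputArchive;
import org.apache.jute.BinaryOutputArchive;

public class MultiRecordTest {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        BinaryOutputArchive boa = BinaryOutputArchive.getArchive(baos);
        // Two records written through the same archive, each under its own tag
        new TestBean(1, "first").serialize(boa, "bean1");
        new TestBean(2, "second").serialize(boa, "bean2");

        BinaryInputArchive bia = BinaryInputArchive
                .getArchive(new ByteArrayInputStream(baos.toByteArray()));
        // Read back in the same order they were written
        TestBean b1 = new TestBean();
        b1.deserialize(bia, "bean1");
        TestBean b2 = new TestBean();
        b2.deserialize(bia, "bean2");
        System.out.println(b1.getStringV() + ", " + b2.getStringV());
    }
}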

Similarly, the OutputArchive serializer is also an interface. The source code is as follows:

public interface OutputArchive {
    public void writeByte(byte b, String tag) throws IOException;
    public void writeBool(boolean b, String tag) throws IOException;
    public void writeInt(int i, String tag) throws IOException;
    public void writeLong(long l, String tag) throws IOException;
    public void writeFloat(float f, String tag) throws IOException;
    public void writeDouble(double d, String tag) throws IOException;
    public void writeString(String s, String tag) throws IOException;
    public void writeBuffer(byte buf[], String tag) throws IOException;
    public void writeRecord(Record r, String tag) throws IOException;
    public void startRecord(Record r, String tag) throws IOException;
    public void endRecord(Record r, String tag) throws IOException;
    public void startVector(List v, String tag) throws IOException;
    public void endVector(List v, String tag) throws IOException;
    public void startMap(TreeMap v, String tag) throws IOException;
    public void endMap(TreeMap v, String tag) throws IOException;
}

The interface defines the types that can be serialized:
Basic types: byte, boolean, int, long, float, double
Non-basic types: String, byte[], nested Record, vector (List), map (TreeMap)
The corresponding InputArchive deserializer supports the same set of types; rather than walking through it method by method, a sketch of its interface follows.
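For reference, InputArchive mirrors OutputArchive with read methods. The sketch below shows the shape of the interface as used by the bean code above (startVector and startMap return an Index that drives the read loop); check the zookeeper-3.4.9 source for the exact declarations:

public interface InputArchive {
    public byte readByte(String tag) throws IOException;
    public boolean readBool(String tag) throws IOException;
    public int readInt(String tag) throws IOException;
    public long readLong(String tag) throws IOException;
    public float readFloat(String tag) throws IOException;
    public double readDouble(String tag) throws IOException;
    public String readString(String tag) throws IOException;
    public byte[] readBuffer(String tag) throws IOException;
    public void readRecord(Record r, String tag) throws IOException;
    public void startRecord(String tag) throws IOException;
    public void endRecord(String tag) throws IOException;
    public Index startVector(String tag) throws IOException;
    public void endVector(String tag) throws IOException;
    public Index startMap(String tag) throws IOException;
    public void endMap(String tag) throws IOException;
}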

From the code structure, the main implementation classes of OutputArchive and InputArchive are:
OutputArchive implementation classes: BinaryOutputArchive, CsvOutputArchive and XmlOutputArchive
InputArchive implementation classes: BinaryInputArchive, CsvInputArchive and XmlInputArchive
Their purposes:
BinaryOutputArchive: used for network transmission and local disk storage
CsvOutputArchive: renders data objects in a human-readable text form, convenient for inspection
XmlOutputArchive: saves and restores data as XML
What Zookeeper mostly needs is network transmission and local disk storage, so BinaryOutputArchive is the most widely used; the example above also uses it as the serialization class.
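As a quick contrast, swapping in CsvOutputArchive yields a text rendering of the same TestBean. This is only a sketch: it assumes CsvOutputArchive exposes a getArchive(OutputStream) factory like its binary counterpart (in the source it may declare UnsupportedEncodingException), so check the actual class before relying on it:

import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.jute.CsvOutputArchive;

public class CsvTest {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // Assumed factory method, mirroring BinaryOutputArchive.getArchive
        CsvOutputArchive coa = CsvOutputArchive.getArchive(baos);
        new TestBean(1, "testbean1").serialize(coa, "tag1");
        // Print the human-readable form instead of raw bytes
        System.out.println(baos.toString("UTF-8"));
    }
}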

Here's a brief look at the implementation code of BinaryOutputArchive:

private ByteBuffer bb = ByteBuffer.allocate(1024);

private DataOutput out;

public static BinaryOutputArchive getArchive(OutputStream strm) {
    return new BinaryOutputArchive(new DataOutputStream(strm));
}

/** Creates a new instance of BinaryOutputArchive */
public BinaryOutputArchive(DataOutput out) {
    this.out = out;
}

public void writeByte(byte b, String tag) throws IOException {
    out.writeByte(b);
}

// Serialization of the other types is omitted; see the source code

In the code above, BinaryOutputArchive offers two ways to construct an instance: the static method getArchive(OutputStream strm) and the constructor taking a DataOutput.
Either way a DataOutput has to be provided, and the serialization of every type is ultimately delegated to the JDK's DataOutput. Because Jute does not implement its own low-level encoding, it has certain limitations and cannot apply space optimizations such as variable-length integer encoding.
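To illustrate the delegation, here is a simplified sketch of how a length-prefixed string write can be layered on DataOutput. This is not the exact zookeeper-3.4.9 source (which routes the bytes through an internal ByteBuffer), but it conveys the wire layout: a 4-byte length followed by the UTF-8 bytes, with -1 marking null:

import java.io.DataOutput;
import java.io.IOException;

public class StringWriteSketch {
    // Simplified sketch of a length-prefixed string write on top of DataOutput;
    // not the exact BinaryOutputArchive implementation.
    static void writeString(DataOutput out, String s) throws IOException {
        if (s == null) {
            out.writeInt(-1);           // null is encoded as length -1
            return;
        }
        byte[] bytes = s.getBytes("UTF-8");
        out.writeInt(bytes.length);     // fixed 4-byte, big-endian length prefix
        out.write(bytes);               // raw UTF-8 payload
    }
}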
So far we have briefly analyzed several important classes in Jute serialization. Next, a more complete bean covering all of the supported data types listed above is provided.

A more comprehensive example

public class TestBeanAll implements Record {
    private byte byteV;
    private boolean booleanV;
    private int intV;
    private long longV;
    private float floatV;
    private double doubleV;
    private String stringV;
    private byte[] bytesV;
    private Record recodeV;
    private List<Integer> listV;
    private TreeMap<Integer, String> mapV;

    @Override
    public void deserialize(InputArchive archive, String tag) throws IOException {
        archive.startRecord(tag);
        this.byteV = archive.readByte("byteV");
        this.booleanV = archive.readBool("booleanV");
        this.intV = archive.readInt("intV");
        this.longV = archive.readLong("longV");
        this.floatV = archive.readFloat("floatV");
        this.doubleV = archive.readDouble("doubleV");
        this.stringV = archive.readString("stringV");
        this.bytesV = archive.readBuffer("bytes");
        // the nested record must be instantiated with its concrete type before it can be read into
        this.recodeV = new TestBean();
        archive.readRecord(recodeV, "recodeV");
        // list
        Index vidx1 = archive.startVector("listV");
        if (vidx1 != null) {
            listV = new ArrayList<>();
            for (; !vidx1.done(); vidx1.incr()) {
                listV.add(archive.readInt("listInt"));
            }
        }
        archive.endVector("listV");
        // map
        Index midx1 = archive.startMap("mapV");
        mapV = new TreeMap<>();
        for (; !midx1.done(); midx1.incr()) {
            Integer k1 = new Integer(archive.readInt("k1"));
            String v1 = archive.readString("v1");
            mapV.put(k1, v1);
        }
        archive.endMap("mapV");
        archive.endRecord(tag);
    }

    @Override
    public void serialize(OutputArchive archive, String tag) throws IOException {
        archive.startRecord(this, tag);
        archive.writeByte(byteV, "byteV");
        archive.writeBool(booleanV, "booleanV");
        archive.writeInt(intV, "intV");
        archive.writeLong(longV, "longV");
        archive.writeFloat(floatV, "floatV");
        archive.writeDouble(doubleV, "doubleV");
        archive.writeString(stringV, "stringV");
        archive.writeBuffer(bytesV, "bytes");
        archive.writeRecord(recodeV, "recodeV");
        // list
        archive.startVector(listV, "listV");
        if (listV != null) {
            int len1 = listV.size();
            for (int vidx1 = 0; vidx1 < len1; vidx1++) {
                archive.writeInt(listV.get(vidx1), "listInt");
            }
        }
        archive.endVector(listV, "listV");
        // map
        archive.startMap(mapV, "mapV");
        Set<Entry<Integer, String>> es1 = mapV.entrySet();
        for (Iterator<Entry<Integer, String>> midx1 = es1.iterator(); midx1.hasNext();) {
            Entry<Integer, String> me1 = midx1.next();
            Integer k1 = me1.getKey();
            String v1 = me1.getValue();
            archive.writeInt(k1, "k1");
            archive.writeString(v1, "v1");
        }
        archive.endMap(mapV, "mapV");
        archive.endRecord(this, tag);
    }
}

The example above covers every type that Jute supports, which gives a more intuitive picture. Writing a bean like this by hand every time would be maddening. Fortunately, most serialization tools today support a data description language (DDL), and Jute is no exception. In fact, if you look through the Zookeeper source code, you will find that many classes start with the comment: // File generated by hadoop record compiler. Do not edit.
Classes carrying that comment were generated from Jute's data description language.

Data Description Language
Many classes in Zookeeper are generated from the description language, and the corresponding description file can be found in Zookeeper's package: the zookeeper.jute file under zookeeper-3.4.9/src. It contains all of the bean definitions that Zookeeper needs generated; you can open it and have a look yourself. Here is a fairly complete example of a description file:

module test {
    class TestBean {
        int intV;
        ustring stringV;
    }
    class TestBeanAll {
        byte byteV;
        boolean booleanV;
        int intV;
        long longV;
        float floatV;
        double doubleV;
        ustring stringV;
        buffer bytes;
        test.TestBean record;
        vector<int> listV;
        map<int, ustring> mapV;
    }
}

module specifies the package name, class specifies the class name, and the class body lists the field types; the supported types are the ones listed earlier.
The description file above covers all of the types, and the generated class file looks much like the TestBeanAll class above.
Given a description file, how is the class file generated? The relevant code lives under the compiler package, which was collapsed in the class-structure picture above; expanding it shows the following:

From the class structure we can see four generator classes: JavaGenerator, CSharpGenerator, CppGenerator, and CGenerator, which generate Java, C#, C++, and C class files respectively.
Tracing upward layer by layer, we eventually find the Rcc class as the entry point; here is part of its code:

public static void main(String args[]) {
    String language = "java";
    ArrayList recFiles = new ArrayList();
    JFile curFile = null;
    for (int i = 0; i < args.length; i++) {
        if ("-l".equalsIgnoreCase(args[i]) || "--language".equalsIgnoreCase(args[i])) {
            language = args[i + 1].toLowerCase();
            i++;
        } else {
            recFiles.add(args[i]);
        }
    }
    if (!"c++".equals(language) && !"java".equals(language) && !"c".equals(language)) {
        System.out.println("Cannot recognize language:" + language);
        System.exit(1);
    }
    // The rest is omitted
}

The default language is java, and the language is specified with -l or --language. A C# generator exists in the code but is not accepted here; I don't know why, perhaps the C# generator has bugs.
So we can write a simple test case:

public class ParseTest {
    public static void main(String[] args) {
        String params[] = new String[3];
        params[0] = "-l";
        params[1] = "java";
        params[2] = "test.jute";
        Rcc.main(params);
    }
}

Three parameters are specified; of course, multiple Jute description files can be passed here, and the corresponding class files are generated after it runs.

Simple comparison with Protobuf

Here Jute is compared with Protobuf 3 on serialization/deserialization time and on serialized byte size. The versions used are:
Protobuf: protobuf-3.0.0
Jute: zookeeper-3.4.9

Each has its own description file, defining fields of the same types with the same names, as shown below:
Protobuf description file:

syntax = "proto3";
option java_package = "protobuf.clazz";
option java_outer_classname = "GoodsPicInfo";

message PicInfo {
    int32 ID = 1;
    int64 GoodID = 2;
    string Url = 3;
    string Guid = 4;
    string Type = 5;
    int32 Order = 6;
}

Jute description file:

module test {
    class PicInfo {
        int ID;
        long GoodID;
        ustring Url;
        ustring Guid;
        ustring Type;
        int Order;
    }
}

Then the corresponding class files are generated with each tool's own generator. The test code follows.

Protobuf test code:

public class Protobuf_Test {
    public static void main(String[] args) throws InvalidProtocolBufferException {
        long startTime = System.currentTimeMillis();
        byte[] result = null;
        for (int i = 0; i < 50000; i++) {
            GoodsPicInfo.PicInfo.Builder builder = GoodsPicInfo.PicInfo.newBuilder();
            builder.setGoodID(100);
            builder.setGuid("11111-22222-3333-444");
            builder.setOrder(0);
            builder.setType("ITEM");
            builder.setID(10);
            builder.setUrl("http://xxx.jpg");
            GoodsPicInfo.PicInfo info = builder.build();
            result = info.toByteArray();
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Byte size:" + result.length + ", serialization time:" + (endTime - startTime) + "ms");

        for (int i = 0; i < 50000; i++) {
            GoodsPicInfo.PicInfo newBean = GoodsPicInfo.PicInfo.getDefaultInstance();
            MessageLite prototype = newBean.getDefaultInstanceForType();
            newBean = (PicInfo) prototype.newBuilderForType().mergeFrom(result).build();
        }
        long endTime2 = System.currentTimeMillis();
        System.out.println("Deserialization time:" + (endTime2 - endTime) + "ms");
    }
}

Jute test code:

public class Jute_test {
    public static void main(String[] args) throws IOException {
        long startTime = System.currentTimeMillis();
        byte array[] = null;
        for (int i = 0; i < 50000; i++) {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            BinaryOutputArchive boa = BinaryOutputArchive.getArchive(baos);
            new PicInfo(10, 100, "http://xxx.jpg", "11111-22222-3333-444", "ITEM", 0)
                    .serialize(boa, "tag" + i);
            array = baos.toByteArray();
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Byte size:" + array.length + ", serialization time:" + (endTime - startTime) + "ms");

        for (int i = 0; i < 50000; i++) {
            ByteArrayInputStream bais = new ByteArrayInputStream(array);
            BinaryInputArchive bia = BinaryInputArchive.getArchive(bais);
            PicInfo newBean = new PicInfo();
            newBean.deserialize(bia, "tag1");
        }
        long endTime2 = System.currentTimeMillis();
        System.out.println("Deserialization time:" + (endTime2 - endTime) + "ms");
    }
}

Running 50,000 serialization and 50,000 deserialization operations for each gives the following results:
Protobuf: byte size: 48, serialization time: 141ms, deserialization time: 62ms
Jute: byte size: 66, serialization time: 94ms, deserialization time: 62ms
Jute has an edge in serialization time, but its serialized size is less impressive, largely because its binary format writes fixed-width ints and longs and length-prefixed strings, while Protobuf uses variable-length (varint) encoding with compact field tags.
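As a rough cross-check on the 66-byte figure (assuming the URL is the 14-character "http://xxx.jpg" and that strings are written as a 4-byte length prefix followed by their UTF-8 bytes, as sketched earlier), the fixed-width Jute layout adds up as:
ID (int) 4 + GoodID (long) 8 + Url (4 + 14) 18 + Guid (4 + 20) 24 + Type (4 + 4) 8 + Order (int) 4 = 66 bytes.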

Summary
This article started with a simple example, then analyzed several core classes, from which we learned the data types Jute supports, the languages it can generate, and that its serialization and deserialization are built on the JDK's DataOutput and DataInput. We then looked at Jute's data description language, and finally compared it with Protobuf, finding that Jute still has advantages of its own; I think this is part of the reason Zookeeper has always used Jute as its serialization tool.
