@Yano
2018-11-24T14:46:10.000000Z
Java
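This post only examines the byte array produced when a Thrift object is serialized, and how Thrift serialization and deserialization work. Other parts of the source code will be covered in separate posts.
struct Person {
    1: required i32 age;
    2: required string name;
}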
thrift -r --gen java test.thrift
@Test
public void testPerson() throws TException {
    Person person = new Person().setAge(18).setName("yano");
    System.out.println(person);

    TSerializer serializer = new TSerializer();
    byte[] bytes = serializer.serialize(person);
    System.out.println(Arrays.toString(bytes));

    Person parsePerson = new Person();
    TDeserializer deserializer = new TDeserializer();
    deserializer.deserialize(parsePerson, bytes);
    System.out.println(parsePerson);
}
com.yano.nankai.spring.thrift.Person(age:18, name:yano)
[8, 0, 1, 0, 0, 0, 18, 11, 0, 2, 0, 0, 0, 4, 121, 97, 110, 111, 0]
com.yano.nankai.spring.thrift.Person(age:18, name:yano)
The test above first creates a Person object, which has only two fields, and then serializes it with Thrift's TSerializer.
The resulting byte array is:
[8, 0, 1, 0, 0, 0, 18, 11, 0, 2, 0, 0, 0, 4, 121, 97, 110, 111, 0]
The serialize method of TSerializer is shown below; it ultimately calls the write method of the person object.
public byte[] serialize(TBase base) throws TException {
    this.baos_.reset();
    base.write(this.protocol_);
    return this.baos_.toByteArray();
}
The write method of the Person class:
public void write(TProtocol oprot) throws TException {
    validate();
    oprot.writeStructBegin(STRUCT_DESC);
    oprot.writeFieldBegin(AGE_FIELD_DESC);
    oprot.writeI32(this.age);
    oprot.writeFieldEnd();
    if (this.name != null) {
        oprot.writeFieldBegin(NAME_FIELD_DESC);
        oprot.writeString(this.name);
        oprot.writeFieldEnd();
    }
    oprot.writeFieldStop();
    oprot.writeStructEnd();
}
Here the TProtocol defaults to TBinaryProtocol, whose writeStructBegin() and writeStructEnd() methods are empty.
oprot.writeFieldBegin(AGE_FIELD_DESC);
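The implementation in TBinaryProtocol is:
public void writeFieldBegin(TField field) throws TException {
    this.writeByte(field.type);
    this.writeI16(field.id);
}
As you can see, it first writes one byte that indicates the field's type, followed by the field id as a 16-bit integer. The TField AGE_FIELD_DESC used here is defined as: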
private static final TField AGE_FIELD_DESC = new TField("age", TType.I32, (short)1);
The first field defined in the thrift file is:
1: required i32 age;
TType is defined as follows:
public final class TType {
    public static final byte STOP = 0;
    public static final byte VOID = 1;
    public static final byte BOOL = 2;
    public static final byte BYTE = 3;
    public static final byte DOUBLE = 4;
    public static final byte I16 = 6;
    public static final byte I32 = 8;
    public static final byte I64 = 10;
    public static final byte STRING = 11;
    public static final byte STRUCT = 12;
    public static final byte MAP = 13;
    public static final byte SET = 14;
    public static final byte LIST = 15;
    public static final byte ENUM = 16;

    public TType() {}
}
So the first element of the byte array is the type code for i32, which is 8.
Next comes the id defined for this field. The id of the age field is 1, and it occupies two bytes, so the next two elements of the byte array are 0, 1.
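To make the field header concrete, here is a minimal sketch in plain Java that does not use the Thrift library. The class name FieldHeaderSketch and the helper fieldHeader are hypothetical names chosen for illustration; the sketch only assumes the layout described above, a one-byte type followed by a two-byte big-endian id.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper class, not part of Thrift.
final class FieldHeaderSketch {
    static final byte TYPE_I32 = 8; // same value as TType.I32

    // Builds the 3-byte field header: 1-byte type, then 2-byte id.
    // ByteBuffer is big-endian by default, so id 1 becomes the bytes 0, 1.
    static byte[] fieldHeader(byte type, short id) {
        return ByteBuffer.allocate(3).put(type).putShort(id).array();
    }

    public static void main(String[] args) {
        // Header of the age field (type i32, id 1)
        System.out.println(Arrays.toString(fieldHeader(TYPE_I32, (short) 1))); // [8, 0, 1]
    }
}
```

The three bytes printed here match the first three elements of the serialized array shown earlier.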
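The name field works the same way.
The meaning of each value in the output byte array:
8                  // type: i32
0, 1               // field id 1
0, 0, 0, 18        // value of field id 1 (age), 4 bytes
11                 // type: string
0, 2               // field id 2 (name)
0, 0, 0, 4         // length of the string name, 4 bytes
121, 97, 110, 111  // the 4 ASCII codes of "yano" (actually UTF-8 encoded)
0                  // stop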
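The layout above can be reproduced by hand with nothing but the JDK. The following is a rough sketch under that assumption, not Thrift's own implementation: PersonEncodingSketch and encodePerson are hypothetical names, and DataOutputStream stands in for the protocol because it also writes big-endian integers.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical hand-rolled encoder, for illustration only.
final class PersonEncodingSketch {
    static final byte TYPE_STOP = 0, TYPE_I32 = 8, TYPE_STRING = 11; // TType values

    static byte[] encodePerson(int age, String name) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeByte(TYPE_I32);     // field type
            out.writeShort(1);           // field id 1 (age), 2 bytes
            out.writeInt(age);           // value, 4 bytes

            byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
            out.writeByte(TYPE_STRING);  // field type
            out.writeShort(2);           // field id 2 (name), 2 bytes
            out.writeInt(utf8.length);   // byte length of the string, 4 bytes
            out.write(utf8);             // UTF-8 payload

            out.writeByte(TYPE_STOP);    // end of struct
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encodePerson(18, "yano")));
    }
}
```

Running main with age 18 and name "yano" should print the same 19 values as the TSerializer output in the test above.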
The deserialization code is:
Person parsePerson = new Person();
TDeserializer deserializer = new TDeserializer();
deserializer.deserialize(parsePerson, bytes);
The read method of the Person class:
public void read(TProtocol iprot) throws TException {
    TField field;
    iprot.readStructBegin();
    while (true) {
        field = iprot.readFieldBegin();
        if (field.type == TType.STOP) {
            break;
        }
        switch (field.id) {
            case 1: // AGE
                if (field.type == TType.I32) {
                    this.age = iprot.readI32();
                    setAgeIsSet(true);
                } else {
                    TProtocolUtil.skip(iprot, field.type);
                }
                break;
            case 2: // NAME
                if (field.type == TType.STRING) {
                    this.name = iprot.readString();
                } else {
                    TProtocolUtil.skip(iprot, field.type);
                }
                break;
            default:
                TProtocolUtil.skip(iprot, field.type);
        }
        iprot.readFieldEnd();
    }
    iprot.readStructEnd();

    // check for required fields of primitive type, which can't be checked in the validate method
    if (!isSetAge()) {
        throw new TProtocolException("Required field 'age' was not found in serialized data! Struct: " + toString());
    }
    validate();
}
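The code is simple and clear: it first reads a TField header from the byte array (3 bytes: a 1-byte type plus a 2-byte id), then uses the id to assign the value to the corresponding field. Fields with an unknown id or an unexpected type are skipped.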
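The read loop can also be mimicked without the Thrift library. The sketch below is a simplified illustration, not the generated code: PersonDecodingSketch and decode are hypothetical names, only the i32 and string types are handled, and any other type is rejected rather than skipped.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical hand-rolled decoder, for illustration only.
final class PersonDecodingSketch {
    static final byte TYPE_STOP = 0, TYPE_I32 = 8, TYPE_STRING = 11; // TType values

    // Returns the decoded values keyed by field id.
    static Map<Short, Object> decode(byte[] bytes) {
        ByteBuffer in = ByteBuffer.wrap(bytes); // big-endian by default
        Map<Short, Object> fields = new LinkedHashMap<>();
        while (true) {
            byte type = in.get();               // 1-byte field type
            if (type == TYPE_STOP) {
                break;                          // no id follows the stop byte
            }
            short id = in.getShort();           // 2-byte field id
            if (type == TYPE_I32) {
                fields.put(id, in.getInt());
            } else if (type == TYPE_STRING) {
                byte[] utf8 = new byte[in.getInt()]; // 4-byte length prefix
                in.get(utf8);
                fields.put(id, new String(utf8, StandardCharsets.UTF_8));
            } else {
                // Simplification: the generated code would skip the value instead.
                throw new IllegalArgumentException("unsupported type " + type);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        byte[] bytes = {8, 0, 1, 0, 0, 0, 18, 11, 0, 2, 0, 0, 0, 4, 121, 97, 110, 111, 0};
        System.out.println(decode(bytes)); // {1=18, 2=yano}
    }
}
```

Keying the result by field id mirrors the switch on field.id in the read method: the field name never appears on the wire, only its id and type.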
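There are many more details that I won't go through one by one. My explanation is no clearer than the source code itself.
I previously analyzed the serialized bytes of Google Protocol Buffers in the post "Google Protocol Buffers serialization algorithm analysis". The two differ considerably in how they lay out the serialized byte array: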