Protocol Buffers Developer Guide

1 Overview

Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages.

Protocol buffers currently support generated code in Java, Python, and C++. With our new proto3 language version, you can also work with Go, JavaNano, Ruby, and C#, with more languages to come.

2 Developer Guide

Welcome to the developer documentation for protocol buffers – a language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more.

This documentation is aimed at Java, C++, or Python developers who want to use protocol buffers in their applications. This overview introduces protocol buffers and tells you what you need to do to get started – you can then go on to follow the tutorials or delve deeper into protocol buffer encoding. API reference documentation is also provided for all three languages, as well as language and style guides for writing .proto files.

2.1 What are protocol buffers?

Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages. You can even update your data structure without breaking deployed programs that are compiled against the "old" format.

2.2 How do they work?

You specify how you want the information you're serializing to be structured by defining protocol buffer message types in .proto files. Each protocol buffer message is a small logical record of information, containing a series of name-value pairs. Here's a very basic example of a .proto file that defines a message containing information about a person:

message Person {
    required string name = 1;
    required int32 id = 2;
    optional string email = 3;

    enum PhoneType {
        MOBILE = 0;
        HOME = 1;
        WORK = 2;
    }

    message PhoneNumber {
        required string number = 1;
        optional PhoneType type = 2 [default = HOME];
    }

    repeated PhoneNumber phone = 4;
}

As you can see, the message format is simple – each message type has one or more uniquely numbered fields, and each field has a name and a value type, where value types can be numbers (integer or floating-point), booleans, strings, raw bytes, or even (as in the example above) other protocol buffer message types, allowing you to structure your data hierarchically. You can specify optional fields, required fields, and repeated fields. You can find more information about writing .proto files in the Protocol Buffer Language Guide.

Once you've defined your messages, you run the protocol buffer compiler for your application's language on your .proto file to generate data access classes. These provide simple accessors for each field (like name() and set_name()) as well as methods to serialize/parse the whole structure to/from raw bytes – so, for instance, if your chosen language is C++, running the compiler on the above example will generate a class called Person. You can then use this class in your application to populate, serialize, and retrieve Person protocol buffer messages. You might then write some code like this:

Person person;
person.set_name("John Doe");
person.set_id(1234);
person.set_email("jdoe@example.com");
fstream output("myfile", ios::out | ios::binary);
person.SerializeToOstream(&output);

Then, later on, you could read your message back in:

fstream input("myfile", ios::in | ios::binary);
Person person;
person.ParseFromIstream(&input);
cout << "Name: " << person.name() << endl;
cout << "E-mail: " << person.email() << endl;

You can add new fields to your message formats without breaking backwards-compatibility; old binaries simply ignore the new field when parsing. So if you have a communications protocol that uses protocol buffers as its data format, you can extend your protocol without having to worry about breaking existing code.
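
One way to picture why old binaries tolerate new fields: every field on the wire is preceded by a key carrying its field number and wire type, so a parser can step over anything it does not recognize. The following is a hand-written sketch of that idea, not the generated parser. The names OldPerson, ReadVarint, and ParseOldPerson are hypothetical, and only the varint and length-delimited wire types are handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical "old" view of Person that knows only fields 1 (name) and 2 (id).
struct OldPerson {
    std::string name;
    int32_t id = 0;
};

// Reads a base-128 varint starting at *pos; returns false on truncated input.
bool ReadVarint(const std::string& buf, size_t* pos, uint64_t* value) {
    *value = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        if (*pos >= buf.size()) return false;
        uint8_t byte = static_cast<uint8_t>(buf[(*pos)++]);
        *value |= static_cast<uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return true;
    }
    return false;  // longer than 10 bytes: malformed
}

// Parses the fields it knows and skips every other field by wire type.
bool ParseOldPerson(const std::string& buf, OldPerson* out) {
    size_t pos = 0;
    while (pos < buf.size()) {
        uint64_t key = 0;
        if (!ReadVarint(buf, &pos, &key)) return false;
        uint32_t field = static_cast<uint32_t>(key >> 3);
        uint32_t wire_type = static_cast<uint32_t>(key & 7);
        if (wire_type == 0) {  // varint payload
            uint64_t v = 0;
            if (!ReadVarint(buf, &pos, &v)) return false;
            if (field == 2) out->id = static_cast<int32_t>(v);
            // Any other varint field is unknown here and is dropped.
        } else if (wire_type == 2) {  // length-delimited payload
            uint64_t len = 0;
            if (!ReadVarint(buf, &pos, &len)) return false;
            if (len > buf.size() - pos) return false;
            if (field == 1) out->name.assign(buf, pos, len);
            pos += len;  // known or not, step over the payload
        } else {
            return false;  // other wire types are outside this sketch
        }
    }
    return true;
}
```

Given a buffer that also contains an email (field 3) and some later-added field 9, ParseOldPerson still fills in name and id and returns true; the extra fields are simply skipped.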

You'll find a complete reference for using generated protocol buffer code in the API Reference section, and you can find out more about how protocol buffer messages are encoded in Protocol Buffer Encoding.

2.3 Why not just use XML?

Protocol buffers have many advantages over XML for serializing structured data. Protocol buffers:

  • are simpler
  • are 3 to 10 times smaller
  • are 20 to 100 times faster
  • are less ambiguous
  • generate data access classes that are easier to use programmatically

For example, let's say you want to model a person with a name and an email. In XML, you need to do:

<person>
  <name>John Doe</name>
  <email>jdoe@example.com</email>
</person>

while the corresponding protocol buffer message (in protocol buffer text format) is:

# Textual representation of a protocol buffer.
# This is *not* the binary format used on the wire.
person {
    name: "John Doe"
    email: "jdoe@example.com"
}

When this message is encoded to the protocol buffer binary format (the text format above is just a convenient human-readable representation for debugging and editing), it would probably be 28 bytes long and take around 100-200 nanoseconds to parse. The XML version is at least 69 bytes if you remove whitespace, and would take around 5,000-10,000 nanoseconds to parse.
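
The 28-byte figure can be checked by hand. The sketch below assumes the field numbers from the Person definition above (name = 1, email = 3) and encodes only those two string fields, each as a one-byte key, a one-byte length, and the string bytes. The function names are hypothetical, and the required id field is left out to match the two-field example, so this is an arithmetic illustration rather than a valid Person message.

```cpp
#include <cstdint>
#include <string>

// Appends one length-delimited (wire type 2) string field to *out.
// Assumes field_number < 16 and value.size() < 128, so the key and the
// length each fit in a single byte; that holds for this example only.
void AppendStringField(int field_number, const std::string& value,
                       std::string* out) {
    out->push_back(static_cast<char>((field_number << 3) | 2));  // key byte
    out->push_back(static_cast<char>(value.size()));             // length byte
    out->append(value);                                          // payload
}

// Encodes name (field 1) and email (field 3) from the example above.
std::string EncodePersonExample() {
    std::string out;
    AppendStringField(1, "John Doe", &out);
    AppendStringField(3, "jdoe@example.com", &out);
    return out;
}

// The same data as XML with all whitespace between elements removed.
const std::string kPersonXml =
    "<person><name>John Doe</name><email>jdoe@example.com</email></person>";
```

EncodePersonExample() produces (1 + 1 + 8) + (1 + 1 + 16) = 28 bytes, while kPersonXml is 69 bytes long.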

Also, manipulating a protocol buffer is much easier:

cout << "Name: " << person.name() << endl;
cout << "E-mail: " << person.email() << endl;

Whereas with XML you would have to do something like:

cout << "Name: "
<< person.getElementsByTagName("name")->item(0)->innerText()
<< endl;
cout << "E-mail: "
<< person.getElementsByTagName("email")->item(0)->innerText()
<< endl;

However, protocol buffers are not always a better solution than XML – for instance, protocol buffers would not be a good way to model a text-based document with markup (e.g. HTML), since you cannot easily interleave structure with text. In addition, XML is human-readable and human-editable; protocol buffers, at least in their native format, are not. XML is also – to some extent – self-describing. A protocol buffer is only meaningful if you have the message definition (the .proto file).

Sounds like the solution for me! How do I get started?

Download the package – this contains the complete source code for the Java, Python, and C++ protocol buffer compilers, as well as the classes you need for I/O and testing. To build and install your compiler, follow the instructions in the README.

Once you're all set, try following the tutorial for your chosen language – this will step you through creating a simple application that uses protocol buffers.

2.4 Introducing proto3

Our most recent version 3 alpha release introduces a new language version, Protocol Buffers language version 3 (aka proto3), as well as some new features in our existing language version (aka proto2). Proto3 simplifies the protocol buffer language, both for ease of use and to make it available in a wider range of programming languages: our current alpha release lets you generate protocol buffer code in Java, C++, Python, JavaNano, and Ruby, with some limitations. In addition, you can generate proto3 code for Go using the latest Go protoc plugin, available from the golang/protobuf GitHub repository. More languages are in the pipeline.

We currently recommend trying proto3 only:

  • If you want to try using protocol buffers in one of our newly supported languages.
  • If you want to try our new open-source RPC implementation gRPC (currently also in alpha release) – we recommend using proto3 for all new gRPC servers and clients as it avoids compatibility issues.

Note that the two language version APIs are not completely compatible. To avoid inconvenience to existing users, we will continue to support the previous language version in new protocol buffers releases.

You can see the major differences from the current default version in the release notes and learn about proto3 syntax in the Proto3 Language Guide. Full documentation for proto3 is coming soon!

(If the names proto2 and proto3 seem a little confusing, it's because when we originally open-sourced protocol buffers it was actually Google's second version of the language – also known as proto2. This is also why our open source version number started from v2.0.0).

2.5 A bit of history

Protocol buffers were initially developed at Google to deal with an index server request/response protocol. Prior to protocol buffers, there was a format for requests and responses that used hand marshalling/unmarshalling of requests and responses, and that supported a number of versions of the protocol. This resulted in some very ugly code, like:

if (version == 3) {
    ...
} else if (version > 4) {
    if (version == 5) {
        ...
    }
    ...
}

Explicitly formatted protocols also complicated the rollout of new protocol versions, because developers had to make sure that all servers between the originator of the request and the actual server handling the request understood the new protocol before they could flip a switch to start using the new protocol.

Protocol buffers were designed to solve many of these problems:

  • New fields could be easily introduced, and intermediate servers that didn't need to inspect the data could simply parse it and pass through the data without needing to know about all the fields.
  • Formats were more self-describing, and could be dealt with from a variety of languages (C++, Java, etc.).

However, users still needed to hand-write their own parsing code.

As the system evolved, it acquired a number of other features and uses:

  • Automatically-generated serialization and deserialization code avoided the need for hand parsing.
  • In addition to being used for short-lived RPC (Remote Procedure Call) requests, people started to use protocol buffers as a handy self-describing format for storing data persistently (for example, in Bigtable).
  • Server RPC interfaces started to be declared as part of protocol files, with the protocol compiler generating stub classes that users could override with actual implementations of the server's interface.

Protocol buffers are now Google's lingua franca for data – at time of writing, there are 48,162 different message types defined in the Google code tree across 12,183 .proto files. They're used both in RPC systems and for persistent storage of data in a variety of storage systems.

3 Tutorials

3.1 Tutorials Overview

Each tutorial in this section shows you how to implement a simple application using protocol buffers in your favourite language, introducing you to the language's protocol buffer API as well as showing you the basics of creating and using .proto files. The complete sample code for each application is also provided.

The tutorials don't assume that you know anything about protocol buffers, but do assume that you are comfortable writing code in your chosen language, including using file I/O.

  • C++ Tutorial
  • C# Tutorial
  • Go Tutorial
  • Java Tutorial
  • Python Tutorial

3.2 Basics: C++

This tutorial provides a basic C++ programmer's introduction to working with protocol buffers. By walking through creating a simple example application, it shows you how to

  • Define message formats in a .proto file.

  • Use the protocol buffer compiler.

  • Use the C++ protocol buffer API to write and read messages.

This isn't a comprehensive guide to using protocol buffers in C++. For more detailed reference information, see the Protocol Buffer Language Guide, the C++ API Reference, the C++ Generated Code Guide, and the Encoding Reference.

3.3 Why Use Protocol Buffers?

The example we're going to use is a very simple "address book" application that can read and write people's contact details to and from a file. Each person in the address book has a name, an ID, an email address, and a contact phone number.

How do you serialize and retrieve structured data like this? There are a few ways to solve this problem:

  • The raw in-memory data structures can be sent/saved in binary form. Over time, this is a fragile approach, as the receiving/reading code must be compiled with exactly the same memory layout, endianness, etc. Also, as files accumulate data in the raw format and copies of software that are wired for that format are spread around, it's very hard to extend the format.

  • You can invent an ad-hoc way to encode the data items into a single string – such as encoding 4 ints as "12:3:-23:67". This is a simple and flexible approach, although it does require writing one-off encoding and parsing code, and the parsing imposes a small run-time cost. This works best for encoding very simple data.

  • Serialize the data to XML. This approach can be very attractive since XML is (sort of) human readable and there are binding libraries for lots of languages. This can be a good choice if you want to share data with other applications/projects. However, XML is notoriously space intensive, and encoding/decoding it can impose a huge performance penalty on applications. Also, navigating an XML DOM tree is considerably more complicated than navigating simple fields in a class normally would be.
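
To make the second option concrete, here is a minimal sketch of the one-off encoding and parsing code that an ad-hoc format such as "12:3:-23:67" calls for. The function names EncodeInts and DecodeInts are made up for illustration.

```cpp
#include <cstddef>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Joins integers with ':' as in "12:3:-23:67".
std::string EncodeInts(const std::vector<int>& values) {
    std::ostringstream out;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i > 0) out << ':';
        out << values[i];
    }
    return out.str();
}

// Splits on ':' and converts each piece; returns false if a piece is not
// an integer.
bool DecodeInts(const std::string& text, std::vector<int>* values) {
    values->clear();
    std::istringstream in(text);
    std::string piece;
    while (std::getline(in, piece, ':')) {
        try {
            std::size_t used = 0;
            int v = std::stoi(piece, &used);
            if (used != piece.size()) return false;  // trailing junk
            values->push_back(v);
        } catch (const std::exception&) {
            return false;  // not a number, or out of range for int
        }
    }
    return true;
}
```

Even this small format needs its own error handling, and adding a fifth value or a string field later means revisiting every reader and writer, which is the maintenance cost the list above points at.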

Protocol buffers are the flexible, efficient, automated solution to solve exactly this problem. With protocol buffers, you write a .proto description of the data structure you wish to store. From that, the protocol buffer compiler creates a class that implements automatic encoding and parsing of the protocol buffer data with an efficient binary format. The generated class provides getters and setters for the fields that make up a protocol buffer and takes care of the details of reading and writing the protocol buffer as a unit. Importantly, the protocol buffer format supports the idea of extending the format over time in such a way that the code can still read data encoded with the old format.

3.4 Where to Find the Example Code

The example code is included in the source code package, under the "examples" directory. Download it here.

3.5 Defining Your Protocol Format

To create your address book application, you'll need to start with a .proto file. The definitions in a .proto file are simple: you add a message for each data structure you want to serialize, then specify a name and a type for each field in the message. Here is the .proto file that defines your messages, addressbook.proto.

package tutorial;

message Person {
    required string name = 1;
    required int32 id = 2;
    optional string email = 3;

    enum PhoneType {
        MOBILE = 0;
        HOME = 1;
        WORK = 2;
    }

    message PhoneNumber {
        required string number = 1;
        optional PhoneType type = 2 [default = HOME];
    }

    repeated PhoneNumber phone = 4;
}

message AddressBook {
    repeated Person person = 1;
}

As you can see, the syntax is similar to C++ or Java. Let's go through each part of the file and see what it does.

The .proto file starts with a package declaration, which helps to prevent naming conflicts between different projects. In C++, your generated classes will be placed in a namespace matching the package name.

Next, you have your message definitions. A message is just an aggregate containing a set of typed fields. Many standard simple data types are available as field types, including bool, int32, float, double, and string. You can also add further structure to your messages by using other message types as field types – in the above example the Person message contains PhoneNumber messages, while the AddressBook message contains Person messages. You can even define message types nested inside other messages – as you can see, the PhoneNumber type is defined inside Person. You can also define enum types if you want one of your fields to have one of a predefined list of values – here you want to specify that a phone number can be one of MOBILE, HOME, or WORK.

The "= 1", "= 2" markers on each element identify the unique "tag" that field uses in the binary encoding. Tag numbers 1-15 require one less byte to encode than higher numbers, so as an optimization you can decide to use those tags for the commonly used or repeated elements, leaving tags 16 and higher for less-commonly used optional elements. Each element in a repeated field requires re-encoding the tag number, so repeated fields are particularly good candidates for this optimization.
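
The one-byte boundary at tag 15 follows from how the key in front of each field is stored: the tag number is shifted left by three bits, combined with a 3-bit wire type, and written as a base-128 varint. A small sketch of that arithmetic, with hypothetical helper names:

```cpp
#include <cstdint>

// Number of bytes a base-128 varint needs for the given value.
int VarintSize(uint64_t value) {
    int bytes = 1;
    while (value >= 0x80) {
        value >>= 7;
        ++bytes;
    }
    return bytes;
}

// Size of the key written before each field value: the field number shifted
// left by three bits, combined with the wire type, stored as a varint.
int KeySize(uint32_t field_number, uint32_t wire_type) {
    return VarintSize((static_cast<uint64_t>(field_number) << 3) | wire_type);
}
```

KeySize(15, 2) is 1 and KeySize(16, 2) is 2, because 16 << 3 is 128, the first value that no longer fits in the seven payload bits of a single varint byte.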

Each field must be annotated with one of the following modifiers:

  • required: a value for the field must be provided, otherwise the message will be considered "uninitialized". If libprotobuf is compiled in debug mode, serializing an uninitialized message will cause an assertion failure. In optimized builds, the check is skipped and the message will be written anyway. However, parsing an uninitialized message will always fail (by returning false from the parse method). Other than this, a required field behaves exactly like an optional field.

  • optional: the field may or may not be set. If an optional field value isn't set, a default value is used. For simple types, you can specify your own default value, as we've done for the phone number type in the example. Otherwise, a system default is used: zero for numeric types, the empty string for strings, false for bools. For embedded messages, the default value is always the "default instance" or "prototype" of the message, which has none of its fields set. Calling the accessor to get the value of an optional (or required) field which has not been explicitly set always returns that field's default value.

  • repeated: the field may be repeated any number of times (including zero). The order of the repeated values will be preserved in the protocol buffer. Think of repeated fields as dynamically sized arrays.

Required Is Forever: You should be very careful about marking fields as required. If at some point you wish to stop writing or sending a required field, it will be problematic to change the field to an optional field – old readers will consider messages without this field to be incomplete and may reject or drop them unintentionally. You should consider writing application-specific custom validation routines for your buffers instead. Some engineers at Google have come to the conclusion that using required does more harm than good; they prefer to use only optional and repeated. However, this view is not universal.
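
An application-specific validation routine might look something like the sketch below. PersonData is a hypothetical plain struct standing in for a generated message whose fields are all optional, and the rule that an id must be positive is invented purely for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a generated message whose fields are all
// optional; a real program would use the class generated by protoc.
struct PersonData {
    bool has_name = false;
    std::string name;
    bool has_id = false;
    int id = 0;
};

// Application-level rules live here instead of in the .proto file, so they
// can be relaxed later without old readers rejecting new messages.
std::vector<std::string> ValidatePerson(const PersonData& p) {
    std::vector<std::string> errors;
    if (!p.has_name || p.name.empty()) {
        errors.push_back("name is missing");
    }
    if (!p.has_id) {
        errors.push_back("id is missing");
    } else if (p.id <= 0) {
        errors.push_back("id must be positive");  // illustrative rule only
    }
    return errors;
}
```

Because the check is ordinary application code, each reader decides for itself what counts as complete, and loosening a rule is a code change rather than a wire-format change.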

You'll find a complete guide to writing .proto files – including all the possible field types – in the Protocol Buffer Language Guide. Don't go looking for facilities similar to class inheritance, though – protocol buffers don't do that.

3.6 Compiling Your Protocol Buffers

Now that you have a .proto, the next thing you need to do is generate the classes you'll need to read and write AddressBook (and hence Person and PhoneNumber) messages. To do this, you need to run the protocol buffer compiler protoc on your .proto:

  1. If you haven't installed the compiler, download the package and follow the instructions in the README.

  2. Now run the compiler, specifying the source directory (where your application's source code lives – the current directory is used if you don't provide a value), the destination directory (where you want the generated code to go; often the same as $SRC_DIR), and the path to your .proto. In this case, you…:

    protoc -I=$SRC_DIR --cpp_out=$DST_DIR $SRC_DIR/addressbook.proto
    

    Because you want C++ classes, you use the --cpp_out option – similar options are provided for other supported languages.

This generates the following files in your specified destination directory:

  • addressbook.pb.h, the header which declares your generated classes.

  • addressbook.pb.cc, which contains the implementation of your classes.

3.7 The Protocol Buffer API

Let's look at some of the generated code and see what classes and functions the compiler has created for you. If you look in addressbook.pb.h, you can see that you have a class for each message you specified in addressbook.proto. Looking closer at the Person class, you can see that the compiler has generated accessors for each field. For example, for the name, id, email, and phone fields, you have these methods:

// name
inline bool has_name() const;
inline void clear_name();
inline const ::std::string& name() const;
inline void set_name(const ::std::string& value);
inline void set_name(const char* value);
inline ::std::string* mutable_name();

// id
inline bool has_id() const;
inline void clear_id();
inline int32_t id() const;
inline void set_id(int32_t value);

// email
inline bool has_email() const;
inline void clear_email();
inline const ::std::string& email() const;
inline void set_email(const ::std::string& value);
inline void set_email(const char* value);
inline ::std::string* mutable_email();

// phone
inline int phone_size() const;
inline void clear_phone();
inline const ::google::protobuf::RepeatedPtrField< ::tutorial::Person_PhoneNumber >& phone() const;
inline ::google::protobuf::RepeatedPtrField< ::tutorial::Person_PhoneNumber >* mutable_phone();
inline const ::tutorial::Person_PhoneNumber& phone(int index) const;
inline ::tutorial::Person_PhoneNumber* mutable_phone(int index);
inline ::tutorial::Person_PhoneNumber* add_phone();

As you can see, the getters have exactly the same name as the field in lowercase, and the setter methods begin with set_. There are also has_ methods for each singular (required or optional) field which return true if that field has been set. Finally, each field has a clear_ method that un-sets the field back to its empty state.

While the numeric id field just has the basic accessor set described above, the name and email fields have a couple of extra methods because they're strings – a mutable_ getter that lets you get a direct pointer to the string, and an extra setter. Note that you can call mutable_email() even if email is not already set; it will be initialized to an empty string automatically. If you had a singular message field in this example, it would also have a mutable_ method but not a set_ method.
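
The behaviour of these accessors can be imitated in a few lines. The class below is a simplified, hand-written stand-in for a single optional string field and is not the code protoc emits; EmailField is a hypothetical name, and the real generated class differs in its internals.

```cpp
#include <string>

// Simplified, hand-written imitation of the accessors generated for an
// optional string field; the real generated class differs internally.
class EmailField {
public:
    bool has_email() const { return has_email_; }
    const std::string& email() const { return email_; }
    void set_email(const std::string& value) {
        email_ = value;
        has_email_ = true;
    }
    // Returns a pointer to the stored string and marks the field as set,
    // so it may be called before any set_email().
    std::string* mutable_email() {
        has_email_ = true;
        return &email_;
    }
    void clear_email() {
        email_.clear();
        has_email_ = false;
    }

private:
    bool has_email_ = false;
    std::string email_;  // empty string is the default value
};
```

This is why the tutorial program can call getline(cin, *person->mutable_name()): the pointer refers to storage inside the message, and obtaining it already counts as setting the field.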

Repeated fields also have some special methods – if you look at the methods for the repeated phone field, you'll see that you can

  • check the repeated field's _size (in other words, how many phone numbers are associated with this Person).

  • get a specified phone number using its index.

  • update an existing phone number at the specified index.

  • add another phone number to the message which you can then edit (repeated scalar types have an add_ that just lets you pass in the new value).

For more information on exactly what members the protocol compiler generates for any particular field definition, see the C++ generated code reference.

3.8 Enums and Nested Classes

The generated code includes a PhoneType enum that corresponds to your .proto enum. You can refer to this type as Person::PhoneType and its values as Person::MOBILE, Person::HOME, and Person::WORK (the implementation details are a little more complicated, but you don't need to understand them to use the enum).

The compiler has also generated a nested class for you called Person::PhoneNumber. If you look at the code, you can see that the "real" class is actually called Person_PhoneNumber, but a typedef defined inside Person allows you to treat it as if it were a nested class. The only case where this makes a difference is if you want to forward-declare the class in another file – you cannot forward-declare nested types in C++, but you can forward-declare Person_PhoneNumber.
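
The arrangement described in this section can be reduced to the following shape. This is a simplified imitation written by hand so that it compiles on its own; the real generated header contains considerably more, and the member named type is added here only for illustration.

```cpp
#include <type_traits>

// Simplified imitation of the layout described above.
enum Person_PhoneType {
    Person_PhoneType_MOBILE = 0,
    Person_PhoneType_HOME = 1,
    Person_PhoneType_WORK = 2
};

class Person_PhoneNumber;  // a top-level class can be forward-declared

class Person {
public:
    typedef Person_PhoneType PhoneType;
    static const PhoneType MOBILE = Person_PhoneType_MOBILE;
    static const PhoneType HOME = Person_PhoneType_HOME;
    static const PhoneType WORK = Person_PhoneType_WORK;
    typedef Person_PhoneNumber PhoneNumber;  // enables Person::PhoneNumber
};

class Person_PhoneNumber {
public:
    Person::PhoneType type = Person::HOME;  // mirrors [default = HOME]
};

static_assert(std::is_same<Person::PhoneNumber, Person_PhoneNumber>::value,
              "Person::PhoneNumber is an alias, not a distinct nested class");
```

Another header that only needs a pointer or reference to a phone number can therefore write class Person_PhoneNumber; without including the generated header, which would not be possible if PhoneNumber were a true nested class.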

3.9 Standard Message Methods

Each message class also contains a number of other methods that let you check or manipulate the entire message, including:

  • bool IsInitialized() const;: checks if all the required fields have been set.

  • string DebugString() const;: returns a human-readable representation of the message, particularly useful for debugging.

  • void CopyFrom(const Person& from);: overwrites the message with the given message's values.

  • void Clear();: clears all the elements back to the empty state.

These and the I/O methods described in the following section implement the Message interface shared by all C++ protocol buffer classes. For more info, see the complete API documentation for Message.

3.10 Parsing and Serialization

Finally, each protocol buffer class has methods for writing and reading messages of your chosen type using the protocol buffer binary format. These include:

  • bool SerializeToString(string* output) const;: serializes the message and stores the bytes in the given string. Note that the bytes are binary, not text; we only use the string class as a convenient container.

  • bool ParseFromString(const string& data);: parses a message from the given string.

  • bool SerializeToOstream(ostream* output) const;: writes the message to the given C++ ostream.

  • bool ParseFromIstream(istream* input);: parses a message from the given C++ istream.
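
The remark that the string holds binary bytes rather than text has a practical consequence: serialized data may contain zero bytes, so it must be handled by explicit length. The sketch below uses a hand-built four-byte buffer (arbitrary bytes, not a real serialized message) and hypothetical function names to show the difference.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Builds a 4-byte buffer containing 0x08 0x00 0xFF 0x01, which is not text.
std::string MakeBinaryBuffer() {
    const char raw[] = {'\x08', '\x00', '\xFF', '\x01'};
    return std::string(raw, sizeof(raw));  // length given explicitly
}

// Length as seen by C string functions, which stop at the first zero byte.
std::size_t LengthAsCString(const std::string& bytes) {
    return std::strlen(bytes.c_str());
}
```

MakeBinaryBuffer().size() is 4, while LengthAsCString reports 1 for the same buffer. Code that stores or transmits the output of SerializeToString should therefore use data() and size(), and open files in binary mode as the earlier fstream examples do.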

These are just a couple of the options provided for parsing and serialization. Again, see the Message API reference for a complete list.

Protocol Buffers and O-O Design: Protocol buffer classes are basically dumb data holders (like structs in C++); they don't make good first class citizens in an object model. If you want to add richer behaviour to a generated class, the best way to do this is to wrap the generated protocol buffer class in an application-specific class. Wrapping protocol buffers is also a good idea if you don't have control over the design of the .proto file (if, say, you're reusing one from another project). In that case, you can use the wrapper class to craft an interface better suited to the unique environment of your application: hiding some data and methods, exposing convenience functions, etc. You should never add behaviour to the generated classes by inheriting from them. This will break internal mechanisms and is not good object-oriented practice anyway.
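
A wrapper of the kind recommended here could be sketched as follows. GeneratedPerson is a hypothetical stand-in so that the example compiles on its own (real code would use the class from addressbook.pb.h), and Contact and DisplayLabel are invented names.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for a generated message class so that the sketch
// compiles on its own; in real code this would come from addressbook.pb.h.
class GeneratedPerson {
public:
    const std::string& name() const { return name_; }
    void set_name(const std::string& v) { name_ = v; }
    const std::string& email() const { return email_; }
    void set_email(const std::string& v) { email_ = v; }

private:
    std::string name_, email_;
};

// Application-specific wrapper: holds the message as a member (composition)
// instead of deriving from it, and exposes only what the application needs.
class Contact {
public:
    explicit Contact(GeneratedPerson data) : data_(std::move(data)) {}

    const std::string& name() const { return data_.name(); }

    // Convenience function that the generated class does not offer.
    std::string DisplayLabel() const {
        if (data_.email().empty()) return data_.name();
        return data_.name() + " <" + data_.email() + ">";
    }

    // Read-only access for code that needs to serialize the message.
    const GeneratedPerson& message() const { return data_; }

private:
    GeneratedPerson data_;
};
```

The wrapper owns the message and forwards to it, so the generated class stays untouched and can be regenerated freely when the .proto file changes.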

3.11 Writing A Message

Now let's try using your protocol buffer classes. The first thing you want your address book application to be able to do is write personal details to your address book file. To do this, you need to create and populate instances of your protocol buffer classes and then write them to an output stream.

Here is a program that reads an AddressBook from a file, adds one new Person to it based on user input, and writes the new AddressBook back out to the file again. The parts that directly call or reference code generated by the protocol compiler are the addressbook.pb.h include, the types in the tutorial namespace, and the accessor and serialization calls made on those types.

#include <iostream>
#include <fstream>
#include <string>
#include "addressbook.pb.h"
using namespace std;

// This function fills in a Person message based on user input.
void PromptForAddress(tutorial::Person* person) {
    cout << "Enter person ID number: ";
    int id;
    cin >> id;
    person->set_id(id);
    cin.ignore(256, '\n');

    cout << "Enter name: ";
    getline(cin, *person->mutable_name());

    cout << "Enter email address (blank for none): ";
    string email;
    getline(cin, email);
    if (!email.empty()) {
        person->set_email(email);
    }

    while (true) {
        cout << "Enter a phone number (or leave blank to finish): ";
        string number;
        getline(cin, number);
        if (number.empty()) {
            break;
        }

        tutorial::Person::PhoneNumber* phone_number = person->add_phone();
        phone_number->set_number(number);

        cout << "Is this a mobile, home, or work phone? ";
        string type;
        getline(cin, type);
        if (type == "mobile") {
            phone_number->set_type(tutorial::Person::MOBILE);
        } else if (type == "home") {
            phone_number->set_type(tutorial::Person::HOME);
        } else if (type == "work") {
            phone_number->set_type(tutorial::Person::WORK);
        } else {
            cout << "Unknown phone type.  Using default." << endl;
        }
    }
}

// Main function:  Reads the entire address book from a file,
//   adds one person based on user input, then writes it back out to the same
//   file.
int main(int argc, char* argv[]) {
    // Verify that the version of the library that we linked against is
    // compatible with the version of the headers we compiled against.
    GOOGLE_PROTOBUF_VERIFY_VERSION;

    if (argc != 2) {
        cerr << "Usage:  " << argv[0] << " ADDRESS_BOOK_FILE" << endl;
        return -1;
    }

    tutorial::AddressBook address_book;

    {
        // Read the existing address book.
        fstream input(argv[1], ios::in | ios::binary);
        if (!input) {
            cout << argv[1] << ": File not found.  Creating a new file." << endl;
        } else if (!address_book.ParseFromIstream(&input)) {
            cerr << "Failed to parse address book." << endl;
            return -1;
        }
    }

    // Add an address.
    PromptForAddress(address_book.add_person());

    {
        // Write the new address book back to disk.
        fstream output(argv[1], ios::out | ios::trunc | ios::binary);
        if (!address_book.SerializeToOstream(&output)) {
            cerr << "Failed to write address book." << endl;
            return -1;
        }
    }

    // Optional:  Delete all global objects allocated by libprotobuf.
    google::protobuf::ShutdownProtobufLibrary();

    return 0;
}

Notice the GOOGLE_PROTOBUF_VERIFY_VERSION macro. It is good practice – though not strictly necessary – to execute this macro before using the C++ Protocol Buffer library. It verifies that you have not accidentally linked against a version of the library which is incompatible with the version of the headers you compiled with. If a version mismatch is detected, the program will abort. Note that every .pb.cc file automatically invokes this macro on startup.

Also notice the call to ShutdownProtobufLibrary() at the end of the program. All this does is delete any global objects that were allocated by the Protocol Buffer library. This is unnecessary for most programs, since the process is just going to exit anyway and the OS will take care of reclaiming all of its memory. However, if you use a memory leak checker that requires that every last object be freed, or if you are writing a library which may be loaded and unloaded multiple times by a single process, then you may want to force Protocol Buffers to clean up everything.

3.12 Reading A Message

Of course, an address book wouldn't be much use if you couldn't get any information out of it! This example reads the file created by the above example and prints all the information in it.

#include <iostream>
#include <fstream>
#include <string>
#include "addressbook.pb.h"
using namespace std;

// Iterates through all people in the AddressBook and prints info about them.
void ListPeople(const tutorial::AddressBook& address_book) {
    for (int i = 0; i < address_book.person_size(); i++) {
        const tutorial::Person& person = address_book.person(i);

        cout << "Person ID: " << person.id() << endl;
        cout << "  Name: " << person.name() << endl;
        if (person.has_email()) {
            cout << "  E-mail address: " << person.email() << endl;
        }

        for (int j = 0; j < person.phone_size(); j++) {
            const tutorial::Person::PhoneNumber& phone_number = person.phone(j);

            switch (phone_number.type()) {
            case tutorial::Person::MOBILE:
                cout << "  Mobile phone #: ";
                break;
            case tutorial::Person::HOME:
                cout << "  Home phone #: ";
                break;
            case tutorial::Person::WORK:
                cout << "  Work phone #: ";
                break;
            }
            cout << phone_number.number() << endl;
        }
    }
}

// Main function:  Reads the entire address book from a file and prints all
//   the information inside.
int main(int argc, char* argv[]) {
    // Verify that the version of the library that we linked against is
    // compatible with the version of the headers we compiled against.
    GOOGLE_PROTOBUF_VERIFY_VERSION;

    if (argc != 2) {
        cerr << "Usage:  " << argv[0] << " ADDRESS_BOOK_FILE" << endl;
        return -1;
    }

    tutorial::AddressBook address_book;

    {
        // Read the existing address book.
        fstream input(argv[1], ios::in | ios::binary);
        if (!address_book.ParseFromIstream(&input)) {
            cerr << "Failed to parse address book." << endl;
            return -1;
        }
    }

    ListPeople(address_book);

    // Optional:  Delete all global objects allocated by libprotobuf.
    google::protobuf::ShutdownProtobufLibrary();

    return 0;
}

3.13 Extending a Protocol Buffer

Sooner or later after you release the code that uses your protocol buffer, you will undoubtedly want to "improve" the protocol buffer's definition. If you want your new buffers to be backwards-compatible, and your old buffers to be forward-compatible – and you almost certainly do want this – then there are some rules you need to follow. In the new version of the protocol buffer:

  • you must not change the tag numbers of any existing fields.

  • you must not add or delete any required fields.

  • you may delete optional or repeated fields.

  • you may add new optional or repeated fields but you must use fresh tag numbers (i.e. tag numbers that were never used in this protocol buffer, not even by deleted fields).

(There are some exceptions to these rules, but they are rarely used.)

If you follow these rules, old code will happily read new messages and simply ignore any new fields. To the old code, optional fields that were deleted will simply have their default value, and deleted repeated fields will be empty. New code will also transparently read old messages. However, keep in mind that new optional fields will not be present in old messages, so you will need to either check explicitly whether they're set with the has_ accessor, or provide a reasonable default value in your .proto file with [default = value] after the tag number. If no default value is specified for an optional element, a type-specific default is used instead: the empty string for strings, false for booleans, and zero for numeric types. Note also that if you add a new repeated field, your new code will not be able to tell whether it was left empty (by new code) or never set at all (by old code), since there is no has_ flag for it.
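The two ways of coping with an optional field that is missing from an old message can be sketched as below. This is an illustration under stated assumptions: sketch::Record is a hand-written stand-in that imitates the has_/getter pattern of generated code (it is not protoc output), and EmailForDisplay, PhoneTypeLabel, and the placeholder text are hypothetical names invented for the example.

```cpp
#include <cassert>
#include <string>

namespace sketch {

enum PhoneType { MOBILE = 0, HOME = 1, WORK = 2 };

// Hand-written stand-in for a generated message; NOT real protoc output.
class Record {
 public:
  // Models an optional string field with no declared default.
  bool has_email() const { return has_email_; }
  const std::string& email() const { return email_; }  // "" when unset
  void set_email(const std::string& value) {
    email_ = value;
    has_email_ = true;
  }

  // Models an optional enum field declared with [default = HOME].
  bool has_type() const { return has_type_; }
  PhoneType type() const { return type_; }  // HOME when unset
  void set_type(PhoneType value) {
    type_ = value;
    has_type_ = true;
  }

 private:
  std::string email_;
  bool has_email_ = false;
  PhoneType type_ = HOME;
  bool has_type_ = false;
};

// Option 1: check has_ explicitly and pick a fallback in application code.
std::string EmailForDisplay(const Record& record) {
  if (record.has_email()) {
    return record.email();
  }
  return "(no email on file)";
}

// Option 2: rely on the default declared in the .proto file; the getter
// already returns it when the field was never set.
const char* PhoneTypeLabel(const Record& record) {
  switch (record.type()) {
    case MOBILE: return "mobile";
    case HOME: return "home";
    case WORK: return "work";
  }
  return "unknown";
}

}  // namespace sketch
```

A default-constructed Record plays the role of an old message read by new code: neither field is set, so the first helper returns the placeholder and the second falls back to the declared default.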

3.14 Optimization Tips

The C++ Protocol Buffers library is heavily optimized. However, proper usage can improve performance even more. Here are some tips for squeezing every last drop of speed out of the library:

  • Reuse message objects when possible. Messages try to keep around any memory they allocate for reuse, even when they are cleared. Thus, if you are handling many messages with the same type and similar structure in succession, it is a good idea to reuse the same message object each time to take load off the memory allocator. However, objects can become bloated over time, especially if your messages vary in "shape" or if you occasionally construct a message that is much larger than usual. You should monitor the sizes of your message objects by calling the SpaceUsed method and delete them once they get too big.

  • Your system's memory allocator may not be well optimized for allocating many small objects from multiple threads. Try using Google's tcmalloc instead.
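The first tip (reuse a message object, but discard it once it grows too large) could look something like the sketch below. It assumes only that the message type is default-constructible and offers Clear() and the SpaceUsed() method mentioned above. ReusableMessage, FakeMessage, and the byte budget are hypothetical names made up for this example; FakeMessage is a stand-in so the sketch compiles without the protobuf library.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Stand-in for a generated message: Clear() empties the field, and
// SpaceUsed() reports the memory the object currently holds.
class FakeMessage {
 public:
  void set_payload(const std::string& value) { payload_ = value; }
  const std::string& payload() const { return payload_; }
  void Clear() { payload_.clear(); }
  int SpaceUsed() const {
    return static_cast<int>(sizeof(*this) + payload_.capacity());
  }

 private:
  std::string payload_;
};

// Hypothetical helper that hands out one message object over and over,
// replacing it with a fresh one when it exceeds a byte budget.
template <typename Message>
class ReusableMessage {
 public:
  explicit ReusableMessage(int max_bytes)
      : max_bytes_(max_bytes), message_(new Message()) {
    assert(max_bytes > 0);
  }

  // Returns an empty message ready for the next parse or fill.
  Message* Acquire() {
    if (message_->SpaceUsed() > max_bytes_) {
      message_.reset(new Message());  // delete the bloated object
      ++replacements_;
    } else {
      message_->Clear();  // reuse the existing object
    }
    return message_.get();
  }

  // Number of times the object had to be thrown away.
  int replacements() const { return replacements_; }

 private:
  int max_bytes_;
  std::unique_ptr<Message> message_;
  int replacements_ = 0;
};
```

Holding the message through a unique_ptr makes the "delete them once they get too big" step explicit: the oversized object is destroyed outright rather than assigned over, so its memory is actually returned to the allocator.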

3.15 Advanced Usage

Protocol buffers have uses that go beyond simple accessors and serialization. Be sure to explore the C++ API reference to see what else you can do with them.

One key feature provided by protocol message classes is reflection. You can iterate over the fields of a message and manipulate their values without writing your code against any specific message type. One very useful way to use reflection is for converting protocol messages to and from other encodings, such as XML or JSON. A more advanced use of reflection might be to find differences between two messages of the same type, or to develop a sort of "regular expressions for protocol messages" in which you can write expressions that match certain message contents. If you use your imagination, it's possible to apply Protocol Buffers to a much wider range of problems than you might initially expect!

Reflection is provided by the Message::Reflection interface.

Author: lsl

Created: 2016-08-07 Sun 19:30
