Protocol Buffers Developer Guide


Translated from the Protocol Buffers Developer Guide.

1 Overview

Protocol Buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages.

Protocol buffers 是 Google 提供的一种语言无关、平台无关、可扩展的结构化数据序列化机制,类似 XML,但是更小、更快、更简单。只需定义一次数据的结构,就可以使用特殊生成的源代码,以多种语言轻松地在各种数据流中读写结构化数据。

Protocol buffers currently support generated code in Java, Python, and C++. With our new proto3 language version, you can also work with Go, JavaNano, Ruby, and C#, with more languages to come.

Protocol buffers 目前支持使用 Java、Python 和 C++生成代码。使用新的 proto3 语言版本可以和 Go、JavaNano、Ruby 和 C#协同工作,还有更多语言将要添加。

2 Developer Guide

Welcome to the developer documentation for protocol buffers – a language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more.

欢迎阅读 protocol buffers 开发者文档。Protocol buffers 是一种语言无关、平台无关、可扩展的结构化数据序列化方法,可用于通信协议、数据存储等方面。

This documentation is aimed at Java, C++, or Python developers who want to use protocol buffers in their applications. This overview introduces protocol buffers and tells you what you need to do to get started – you can then go on to follow the tutorials or delve deeper into protocol buffer encoding. API reference documentation is also provided for all three languages, as well as language and style guides for writing .proto files.

该文档面向想要在应用中使用 protocol buffers 的 Java、C++ 和 Python 开发者。本概述介绍 protocol buffers,并告诉你入门需要做哪些事情,之后可以继续学习教程,或者深入了解 protocol buffer 编码。文档还为这三种语言提供了 API 参考文档,以及编写 .proto 文件的语言指南和风格指南。

2.1 What are protocol buffers?

Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages. You can even update your data structure without breaking deployed programs that are compiled against the "old" format.

Protocol buffers 是一种灵活、高效、自动化的结构化数据序列化机制,类似 XML,但是更小、更快、更简单。只需定义一次数据的结构,就可以使用特殊生成的源代码,以多种语言轻松地在各种数据流中读写结构化数据。甚至可以在更新数据结构的同时,不破坏那些按照"旧"格式编译并已部署的程序。

2.2 How do they work?

You specify how you want the information you're serializing to be structured by defining protocol buffer message types in .proto files. Each protocol buffer message is a small logical record of information, containing a series of name-value pairs. Here's a very basic example of a .proto file that defines a message containing information about a person:

通过在 .proto 文件中定义 protocol buffer 消息类型,可以指定要序列化的信息的结构。每个 protocol buffer 消息是一小段逻辑信息记录,包含一系列名字-值对。下面是一个非常基本的 .proto 文件示例,它定义了一个包含个人信息的消息:

message Person {
	required string name = 1;
	required int32 id = 2;
	optional string email = 3;

	enum PhoneType {
		MOBILE = 0;
		HOME = 1;
		WORK = 2;
	}

	message PhoneNumber {
		required string number = 1;
		optional PhoneType type = 2 [default = HOME];
	}

	repeated PhoneNumber phone = 4;
}

As you can see, the message format is simple – each message type has one or more uniquely numbered fields, and each field has a name and a value type, where value types can be numbers (integer or floating-point), booleans, strings, raw bytes, or even (as in the example above) other protocol buffer message types, allowing you to structure your data hierarchically. You can specify optional fields, required fields, and repeated fields. You can find more information about writing .proto files in the Protocol Buffer Language Guide.

如你所见,消息格式很简单—每个消息类型有一个或多个被唯一编号的字段,每个字段有名字和值类型,值类型可以是数字(整数或浮点数),布尔值,字符串,原始字节,甚至其他 protocol buffer 消息类型(如上所示),允许分层结构化数据。可以指定可选字段,必填字段和可重复字段。可以在 Protocol Buffer 语言指南中找到更多有关编写.proto 文件的信息。
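Because every field carries a unique number, the binary encoding can identify fields by number rather than by name. As a hedged illustration, assuming the standard wire format described in Protocol Buffer Encoding, the hand-written C++ sketch below (not protoc output) computes the key that precedes each `Person` field on the wire: the field number shifted left by three bits, OR-ed with a wire type (0 for varints such as `int32` and enum values, 2 for length-delimited values such as strings and embedded messages). `WireType` and `MakeKey` are hypothetical names introduced only for this example.

```cpp
#include <cstdint>

// Wire types needed by the Person fields (a subset of the full list).
enum class WireType : std::uint32_t {
  kVarint = 0,           // int32, int64, bool, enum values, ...
  kLengthDelimited = 2,  // string, bytes, embedded messages, ...
};

// Hypothetical helper: the key written before each field on the wire.
// The low three bits hold the wire type; the remaining bits hold the
// field number.
constexpr std::uint32_t MakeKey(std::uint32_t field_number, WireType type) {
  return (field_number << 3) | static_cast<std::uint32_t>(type);
}

// Keys for the Person message above. Field numbers 1 to 15 produce keys
// below 0x80, so each of these fits in a single varint byte.
constexpr std::uint32_t kNameKey  = MakeKey(1, WireType::kLengthDelimited);  // 0x0A
constexpr std::uint32_t kIdKey    = MakeKey(2, WireType::kVarint);           // 0x10
constexpr std::uint32_t kEmailKey = MakeKey(3, WireType::kLengthDelimited);  // 0x1A
constexpr std::uint32_t kPhoneKey = MakeKey(4, WireType::kLengthDelimited);  // 0x22
```

Since the number, not the name, is what gets written, changing a field's number after data has been stored or exchanged breaks compatibility with existing bytes and binaries, while renaming a field does not affect the binary encoding.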

Once you've defined your messages, you run the protocol buffer compiler for your application's language on your .proto file to generate data access classes. These provide simple accessors for each field (like name() and set_name()) as well as methods to serialize/parse the whole structure to/from raw bytes – so, for instance, if your chosen language is C++, running the compiler on the above example will generate a class called Person. You can then use this class in your application to populate, serialize, and retrieve Person protocol buffer messages. You might then write some code like this:

定义好消息之后,针对应用所用的语言对 .proto 文件运行 protocol buffer 编译器,即可生成数据访问类。它们为每个字段提供简单的访问器(比如 name() 和 set_name()),也提供将整个结构序列化为原始字节、以及从原始字节解析整个结构的方法。例如,如果选择 C++ 语言,针对上面的例子运行编译器将会生成一个叫做 Person 的类。之后可以在应用中使用该类来填充、序列化和获取 Person protocol buffer 消息。然后可能会编写如下的代码:

Person person;
person.set_name("John Doe");
person.set_id(1234);
person.set_email("jdoe@example.com");
fstream output("myfile", ios::out | ios::binary);
person.SerializeToOstream(&output);

Then, later on, you could read your message back in:


fstream input("myfile", ios::in | ios::binary);
Person person;
person.ParseFromIstream(&input);
cout << "Name: " << person.name() << endl;
cout << "E-mail: " << person.email() << endl;

You can add new fields to your message formats without breaking backwards-compatibility; old binaries simply ignore the new field when parsing. So if you have a communications protocol that uses protocol buffers as its data format, you can extend your protocol without having to worry about breaking existing code.

可以向消息格式中添加新的字段而不破坏向后兼容性;旧的二进制程序在解析时会直接忽略新字段。所以,如果通信协议使用 protocol buffers 作为数据格式,就可以扩展协议而无需担心破坏已有代码。
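Ignoring a new field works because every field on the wire starts with a key that encodes its wire type, and the wire type alone tells a parser how many bytes to skip. The hand-written C++ sketch below, assuming the standard wire format and not taken from the protobuf library, plays the role of an "old" binary that only knows `name` (field 1) and `email` (field 3): it reads those two fields and skips any other field number. `OldPerson`, `ReadVarint`, and `ParseOldPerson` are hypothetical names. The protobuf runtime generally retains unknown fields so they can be re-serialized; this sketch simply discards them, and it rejects the deprecated group wire types to stay short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical "old" view of Person: this binary was built before any
// newer fields existed, so it only knows name (1) and email (3).
struct OldPerson {
  std::string name;
  std::string email;
};

// Reads a base-128 varint starting at `pos` and advances `pos` past it.
// Returns std::nullopt if the input ends early or the varint is too long.
std::optional<std::uint64_t> ReadVarint(const std::string& in, std::size_t& pos) {
  std::uint64_t result = 0;
  for (int shift = 0; shift < 64 && pos < in.size(); shift += 7) {
    const auto byte = static_cast<std::uint8_t>(in[pos++]);
    result |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return result;  // high bit clear: last byte
  }
  return std::nullopt;
}

// Parses the fields this binary knows and skips every other field using
// only its wire type. Returns std::nullopt on malformed input or on the
// deprecated group wire types (3 and 4), which this sketch does not handle.
std::optional<OldPerson> ParseOldPerson(const std::string& in) {
  OldPerson person;
  std::size_t pos = 0;
  while (pos < in.size()) {
    const auto key = ReadVarint(in, pos);
    if (!key) return std::nullopt;
    const std::uint64_t field_number = *key >> 3;
    switch (*key & 0x7) {  // low three bits: wire type
      case 0:  // varint; no known field uses it here, so read and discard
        if (!ReadVarint(in, pos)) return std::nullopt;
        break;
      case 1:  // fixed 64-bit value
        if (in.size() - pos < 8) return std::nullopt;
        pos += 8;
        break;
      case 5:  // fixed 32-bit value
        if (in.size() - pos < 4) return std::nullopt;
        pos += 4;
        break;
      case 2: {  // length-delimited: length varint followed by the bytes
        const auto len = ReadVarint(in, pos);
        if (!len || *len > in.size() - pos) return std::nullopt;
        if (field_number == 1) person.name = in.substr(pos, *len);
        if (field_number == 3) person.email = in.substr(pos, *len);
        pos += *len;  // unknown field numbers are skipped without a copy
        break;
      }
      default:
        return std::nullopt;
    }
  }
  assert(pos == in.size());  // every byte was consumed exactly once
  return person;
}
```

For example, feeding it the bytes for `name`, then an unknown varint field 5, then `email` returns both known strings, while a length prefix that runs past the end of the buffer makes it return `std::nullopt`.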

You'll find a complete reference for using generated protocol buffer code in the API Reference section, and you can find out more about how protocol buffer messages are encoded in Protocol Buffer Encoding.

在 API 参考部分可以找到使用生成的 protocol buffer 代码的完整参考,在 Protocol Buffer Encoding 中可以进一步了解 protocol buffer 消息是如何编码的。

2.3 Why not just use XML?

Protocol buffers have many advantages over XML for serializing structured data. Protocol buffers:

相比 XML,Protocol buffers 在序列化结构数据方面有很多优势:

  • are simpler
  • are 3 to 10 times smaller
  • are 20 to 100 times faster
  • are less ambiguous
  • generate data access classes that are easier to use programmatically


For example, let's say you want to model a person with a name and an email. In XML, you need to do:

例如,假设要对一个有名字和 email 的人建模,在 XML 中需要这样写:

  <person>
    <name>John Doe</name>
    <email>jdoe@example.com</email>
  </person>

while the corresponding protocol buffer message (in protocol buffer text format) is:

对应的 protocol buffer 消息(以 protocol buffer 文本格式):

# Textual representation of a protocol buffer.
# This is *not* the binary format used on the wire.
person {
	name: "John Doe"
	email: "jdoe@example.com"
}

When this message is encoded to the protocol buffer binary format (the text format above is just a convenient human-readable representation for debugging and editing), it would probably be 28 bytes long and take around 100-200 nanoseconds to parse. The XML version is at least 69 bytes if you remove whitespace, and would take around 5,000-10,000 nanoseconds to parse.

该消息编码为 protocol buffer 二进制格式(上面的文本形式只是一种用于调试和编辑的人类可读的表示)时,可能只有 28 字节长,解析需要 100-200 纳秒。XML 版本即使删掉空白符至少也有 69 字节,解析需要 5,000-10,000 纳秒。
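The 28-byte figure can be checked by hand. Assuming the message carries only `name` and `email` (field numbers 1 and 3) and that the `person { ... }` label in the text format is not itself encoded, each string field costs one key byte, one length byte, and its payload: (1 + 1 + 8) + (1 + 1 + 16) = 28. The sketch below is a hand-rolled C++ encoder for just those two fields, written for illustration rather than taken from the protobuf library; `AppendVarint`, `AppendStringField`, and `EncodePersonSketch` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Appends `value` as a base-128 varint: 7 bits per byte, least significant
// group first, with the high bit set on every byte except the last.
void AppendVarint(std::string& out, std::uint64_t value) {
  while (value >= 0x80) {
    out.push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<char>(value));
}

// Appends one length-delimited field (wire type 2): key, length, payload.
void AppendStringField(std::string& out, std::uint32_t field_number,
                       const std::string& value) {
  assert(field_number >= 1 && field_number <= 536870911);  // 2^29 - 1
  AppendVarint(out, (static_cast<std::uint64_t>(field_number) << 3) | 2);
  AppendVarint(out, value.size());
  out += value;
}

// Hypothetical stand-in for serializing a message whose only fields are
// name = 1 and email = 3, as in the text-format example above.
std::string EncodePersonSketch(const std::string& name,
                               const std::string& email) {
  std::string out;
  AppendStringField(out, 1, name);
  AppendStringField(out, 3, email);
  return out;
}
```

Calling `EncodePersonSketch("John Doe", "jdoe@example.com")` yields a 28-byte string that begins with 0x0A 0x08 (the key and length of `name`). A generated class does this work inside its serialization methods such as `SerializeToString()`; the sketch omits everything except the two string fields.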

Also, manipulating a protocol buffer is much easier:

同时,操作 protocol buffer 更简单:

cout << "Name: " << person.name() << endl;
cout << "E-mail: " << person.email() << endl;

Whereas with XML you would have to do something like:

然而使用 XML 不得不这样做:

cout << "Name: "
<< person.getElementsByTagName("name")->item(0)->innerText()
<< endl;
cout << "E-mail: "
<< person.getElementsByTagName("email")->item(0)->innerText()
<< endl;

However, protocol buffers are not always a better solution than XML – for instance, protocol buffers would not be a good way to model a text-based document with markup (e.g. HTML), since you cannot easily interleave structure with text. In addition, XML is human-readable and human-editable; protocol buffers, at least in their native format, are not. XML is also – to some extent – self-describing. A protocol buffer is only meaningful if you have the message definition (the .proto file).

然而,protocol buffers 并不总是比 XML 更好。例如,protocol buffers 不适合对带有标记的文本文档(例如 HTML)建模,因为无法方便地将结构与文本交错在一起。另外,XML 是人类可读、可编辑的;protocol buffers 至少在原生格式下不是。XML 在某种程度上还是自描述的,而 protocol buffer 只有在拥有消息定义(.proto 文件)时才有意义。

Sounds like the solution for me! How do I get started?


Download the package – this contains the complete source code for the Java, Python, and C++ protocol buffer compilers, as well as the classes you need for I/O and testing. To build and install your compiler, follow the instructions in the README.

下载包含 Java,Python 和 C++ protocol buffer 编译器的全部代码的包,还有进行 I/O 和测试所需要的类。按照 README 中的说明构建和安装编译器。

Once you're all set, try following the tutorial for your chosen language – this will step you through creating a simple application that uses protocol buffers.

一旦都准备好了,尝试按照所选语言的教程创建一个使用 protocol buffers 的简单应用。

2.4 Introducing proto3

Our most recent version 3 alpha release introduces a new language version - Protocol Buffers language version 3 (aka proto3), as well as some new features in our existing language version (aka proto2). Proto3 simplifies the protocol buffer language, both for ease of use and to make it available in a wider range of programming languages: our current alpha release lets you generate protocol buffer code in Java, C++, Python, JavaNano, and Ruby, with some limitations. In addition you can generate proto3 code for Go using the latest Go protoc plugin, available from the golang/protobuf Github repository. More languages are in the pipeline.

最新的 3 alpha 版本引入了一个新的语言版本:Protocol Buffers 语言版本 3(也叫 proto3),同时也为现有语言版本(也叫 proto2)加入了一些新特性。Proto3 简化了 protocol buffer 语言,既为了更易于使用,也为了能支持更广泛的编程语言:当前 alpha 版本可以针对 Java,C++,Python,JavaNano 和 Ruby 生成 protocol buffer 代码,但有一些限制。此外,使用最新的 Go protoc 插件还可以为 Go 生成 proto3 代码,该插件可以从 golang/protobuf Github 仓库获取。更多语言正在计划中。

We currently recommend trying proto3 only:

当前推荐只在以下情况使用 proto3:

  • If you want to try using protocol buffers in one of our newly-supported languages.

    如果想在某种新支持的语言中使用 protocol buffers。

  • If you want to try our new open-source RPC implementation gRPC (currently also in alpha release); we recommend using proto3 for all new gRPC servers and clients as it avoids compatibility issues.

    如果想尝试新开源的 RPC 实现 gRPC(当前也处于 alpha 版本):推荐为所有新的 gRPC 服务器和客户端使用 proto3,以避免兼容性问题。

Note that the two language version APIs are not completely compatible. To avoid inconvenience to existing users, we will continue to support the previous language version in new protocol buffers releases.

注意两种语言版本的 API 并不完全兼容。为了避免给现有用户带来不便,新的 protocol buffers 版本将继续支持之前的语言版本。

You can see the major differences from the current default version in the release notes and learn about proto3 syntax in the Proto3 Language Guide. Full documentation for proto3 is coming soon!

可以在发布说明中看到与当前默认版本的主要差异,从 Proto3 语言指南中可以学习 proto3 语法。很快会有完整的 proto3 文档。

(If the names proto2 and proto3 seem a little confusing, it's because when we originally open-sourced protocol buffers it was actually Google's second version of the language – also known as proto2. This is also why our open source version number started from v2.0.0).

(如果觉得 proto2 和 proto3 这两个名字有点令人困惑,这是因为最初开源 protocol buffers 时,它实际上已经是 Google 的第二个语言版本,也称为 proto2。这也是为什么我们的开源版本号从 v2.0.0 开始。)

2.5 A bit of history

Protocol buffers were initially developed at Google to deal with an index server request/response protocol. Prior to protocol buffers, there was a format for requests and responses that used hand marshalling/unmarshalling of requests and responses, and that supported a number of versions of the protocol. This resulted in some very ugly code, like:

最初 google 开发 Protocol buffers 用于处理索引服务器请求/响应协议。protocol buffers 之前,有一个使用手动编组和解组请求和响应的格式,支持很多版本的协议。这导致一些非常丑陋的代码,比如:

if (version == 3) {
	...
} else if (version > 4) {
	if (version == 5) {
		...
	}
	...
}

Explicitly formatted protocols also complicated the rollout of new protocol versions, because developers had to make sure that all servers between the originator of the request and the actual server handling the request understood the new protocol before they could flip a switch to start using the new protocol.


Protocol buffers were designed to solve many of these problems:

Protocol buffers 的设计目的是解决其中的许多问题:

  • New fields could be easily introduced, and intermediate servers that didn't need to inspect the data could simply parse it and pass through the data without needing to know about all the fields.


  • Formats were more self-describing, and could be dealt with from a variety of languages (C++, Java, etc.)

    自描述格式,能够使用多种语言(C++,Java 等)处理。

However, users still needed to hand-write their own parsing code.


As the system evolved, it acquired a number of other features and uses:


  • Automatically-generated serialization and deserialization code avoided the need for hand parsing.


  • In addition to being used for short-lived RPC (Remote Procedure Call) requests, people started to use protocol buffers as a handy self-describing format for storing data persistently (for example, in Bigtable).

    除了用于短暂的 RPC(远程过程调用)请求,人们开始使用 protocol buffers 作为方便的自描述格式持久化存储数据(例如,在 Bigtable 中)。

  • Server RPC interfaces started to be declared as part of protocol files, with the protocol compiler generating stub classes that users could override with actual implementations of the server's interface.

    服务器 RPC 接口开始被声明为 protocol 文件的一部分,由 protocol 编译器生成桩(stub)类,用户可以用服务器接口的实际实现来重写这些类。

Protocol buffers are now Google's lingua franca for data – at time of writing, there are 48,162 different message types defined in the Google code tree across 12,183 .proto files. They're used both in RPC systems and for persistent storage of data in a variety of storage systems.

protocol buffers 现在是 Google 数据的通用语言。截至撰写本文时,Google 代码树中的 12,183 个 .proto 文件定义了 48,162 种不同的消息类型。它们既用于 RPC 系统,也用于在各种存储系统中持久化存储数据。

Author: 刘尚亮

Created: 2017-08-30 Wed 16:06
