Protobuf notes

Latest revision as of 17:55, 9 January 2025

Overview

There's a lot going on with Protobuf, so this local Neela Nurseries page was started to capture some links to online docs and tutorials.

https://protobuf.dev/getting-started/cpptutorial/

https://protobuf.dev/programming-guides/proto3/

The following page, one of a full collection of nanopb documentation pages, is also a good and more detailed introduction. It contains example C code to "get the size of the message without storing it anywhere":

https://chromium.googlesource.com/external/github.com/nanopb/nanopb/+/refs/heads/master/docs/concepts.md
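
The "size without storing" idea can be sketched without nanopb itself. This is only an illustration under stated assumptions: `sketch_ostream_t` and the `sketch_` functions are hypothetical stand-ins, not nanopb API. The encoder runs as usual, but when the stream has no buffer the bytes are counted and thrown away.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an output stream. When buf is NULL the
 * stream is a "sizing" stream: bytes are counted but not stored. */
typedef struct {
    uint8_t *buf;
    size_t   capacity;
    size_t   bytes_written;
} sketch_ostream_t;

static int sketch_write(sketch_ostream_t *s, const uint8_t *data, size_t len)
{
    if (s->buf != NULL) {
        if (s->bytes_written + len > s->capacity)
            return 0;                       /* no room left */
        memcpy(s->buf + s->bytes_written, data, len);
    }
    s->bytes_written += len;
    return 1;
}

/* Write value as a base-128 varint, least significant group first. */
static int sketch_write_varint(sketch_ostream_t *s, uint64_t value)
{
    uint8_t tmp[10];
    size_t n = 0;
    do {
        uint8_t byte = (uint8_t)(value & 0x7F);
        value >>= 7;
        if (value != 0)
            byte |= 0x80;                   /* more bytes follow */
        tmp[n++] = byte;
    } while (value != 0);
    return sketch_write(s, tmp, n);
}

/* Encode one uint32 field: tag with wire type 0, then the value. */
static int sketch_encode_uint32_field(sketch_ostream_t *s,
                                      uint32_t field_number, uint32_t value)
{
    return sketch_write_varint(s, (uint64_t)field_number << 3)
        && sketch_write_varint(s, value);
}
```

Running the same encode call first against a stream with `buf == NULL` and then against a real buffer gives the required size up front, at the cost of encoding twice.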

Another reference with possibly good detail:

https://www.swi-prolog.org/pldoc/man?section=protobufs-tags

Encoding details of protobuf "on the wire":

https://protobuf.dev/programming-guides/encoding/

^ Terminology and Elements Of

Protobuf has several features and "moving parts". Two of these, which we'll loop back to and write a bit more about, are field names and field numbers. Both are identifiers expressed in the .proto message definition files used at build time by senders and receivers of protobuf formatted messages. Because they are fixed at build time, changing them after software has been built and released can cause trouble. The field number is what identifies a field in the binary encoding, so changing a number will likely cause message interpretation errors: software built against the older definition does not understand messages in the newer format. Field names do not appear in the binary encoding, but renaming a field still breaks source code that uses the generated identifiers, and it changes the JSON and text encodings.
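
To see why the field number matters on the wire, here is a minimal sketch of how the key in front of each field is formed, following the encoding guide linked in the overview: the field number shifted left by three bits, combined with the wire type, written as a varint. The `sketch_` helper names are made up for this page.

```c
#include <stddef.h>
#include <stdint.h>

/* Wire types as listed in the protobuf encoding guide. */
enum { WT_VARINT = 0, WT_64BIT = 1, WT_LEN = 2, WT_32BIT = 5 };

/* Hypothetical helper: write value as a base-128 varint into out and
 * return the number of bytes written (at most 10). */
static size_t sketch_put_varint(uint8_t *out, uint64_t value)
{
    size_t n = 0;
    while (value >= 0x80) {
        out[n++] = (uint8_t)(value | 0x80); /* low 7 bits, continuation bit set */
        value >>= 7;
    }
    out[n++] = (uint8_t)value;
    return n;
}

/* Hypothetical helper: the key that precedes every field on the wire. */
static size_t sketch_put_tag(uint8_t *out, uint32_t field_number, int wire_type)
{
    return sketch_put_varint(out,
                             ((uint64_t)field_number << 3) | (uint64_t)wire_type);
}
```

Field 1 with a varint value yields the single key byte 0x08. Renumber that field to 3 and the key becomes 0x18, which an older receiver treats as a different, unknown field.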

Protobuf has a means for handling cases where field names and numbers need to be "removed": change the message definitions to mark those defunct identifiers as reserved. Reserved names and numbers cannot be re-used, so new fields must choose different ones, but the old ones are taken "out of circulation". Read further at:

https://protobuf.dev/programming-guides/proto3/#assigning

^ To Factor Protobuf Message Definitions

The following online guide speaks to defining multiple related protobuf messages in a single .proto file:

https://protobuf.dev/programming-guides/proto3/#adding-types

^ Nanopb

2022-01-08 Saturday

https://docs.python.org/3/tutorial/modules.html

CMake script to locate Nanopb headers and sources:

https://chromium.googlesource.com/external/github.com/nanopb/nanopb/+/nanopb-0.2.9.1/extra/FindNanopb.cmake

A nanopb API reference:

https://jpa.kapsi.fi/nanopb/docs/reference.html

https://jpa.kapsi.fi/nanopb/docs/reference.html#pb_encode

https://jpa.kapsi.fi/nanopb/docs/reference.html#pb_encode_submessage

A further reference from University of Hannover:

https://gitlab.uni-hannover.de/tci-gateway-module/grpc/-/blob/47a06ace92d0db299e6fa9ecc9a9d26db8d85c62/third_party/nanopb/docs/reference.rst#pb-encode

pb_encode.c has an interesting function . . .

/* Encode a field with callback semantics. This means that a user function is
 * called to provide and encode the actual data. */
static bool checkreturn encode_callback_field(pb_ostream_t *stream,
    const pb_field_t *field, const void *pData)
{
    const pb_callback_t *callback = (const pb_callback_t*)pData;

#ifdef PB_OLD_CALLBACK_STYLE
    const void *arg = callback->arg;
#else
    void * const *arg = &(callback->arg);
#endif

    if (callback->funcs.encode != NULL)
    {
        if (!callback->funcs.encode(stream, field, arg))
            PB_RETURN_ERROR(stream, "callback error");
    }
    return true;
}
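
The pattern in that function, a struct carrying a function pointer plus a user argument, can be tried out in isolation. The sketch below only mimics the shape: `sketch_callback_t` and `sketch_stream_t` are hypothetical simplifications (nanopb's real `pb_callback_t` keeps the pointer in a `funcs` member and also passes the field descriptor).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size output stream. */
typedef struct {
    uint8_t buf[32];
    size_t  len;
} sketch_stream_t;

/* Shaped like a callback field: user function plus user argument. */
typedef struct {
    bool (*encode)(sketch_stream_t *stream, void * const *arg);
    void *arg;
} sketch_callback_t;

static bool sketch_write(sketch_stream_t *s, const void *data, size_t n)
{
    if (n > sizeof s->buf - s->len)
        return false;                       /* would overflow the buffer */
    memcpy(s->buf + s->len, data, n);
    s->len += n;
    return true;
}

/* User side: *arg points at a NUL-terminated string to emit. */
static bool sketch_encode_text(sketch_stream_t *stream, void * const *arg)
{
    const char *text = (const char *)*arg;
    return sketch_write(stream, text, strlen(text));
}

/* Library side: call the user function only when one was assigned. */
static bool sketch_encode_callback_field(sketch_stream_t *stream,
                                         const sketch_callback_t *cb)
{
    if (cb->encode != NULL && !cb->encode(stream, &cb->arg))
        return false;
    return true;
}
```

A field whose callback is left NULL encodes nothing and still reports success, which is the behaviour of the `callback->funcs.encode != NULL` check in the listing above.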

^ Handling of values of zero

To save space, nanopb does not encode fields whose value is zero, and during decoding it assumes that missing fields carry a value of zero. This matches the proto3 rules, under which a field without explicit presence is not serialized when it holds its default value:

https://github.com/nanopb/nanopb/issues/696
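
A small sketch of that behaviour, assuming a hypothetical two-field message (`count` as field 1 and `level` as field 2, both uint32). The encoder skips zero values and the decoder starts from zero, so a skipped field reads back as zero. None of these names come from nanopb.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message: uint32 count = 1; uint32 level = 2; */
typedef struct {
    uint32_t count;
    uint32_t level;
} sketch_msg_t;

static size_t sketch_put_varint(uint8_t *out, uint64_t value)
{
    size_t n = 0;
    while (value >= 0x80) {
        out[n++] = (uint8_t)(value | 0x80);
        value >>= 7;
    }
    out[n++] = (uint8_t)value;
    return n;
}

/* proto3-style encode: a field whose value is zero is left out.
 * out must have room for 12 bytes. Returns the encoded length. */
static size_t sketch_encode(const sketch_msg_t *m, uint8_t *out)
{
    size_t n = 0;
    if (m->count != 0) { out[n++] = 0x08; n += sketch_put_varint(out + n, m->count); }
    if (m->level != 0) { out[n++] = 0x10; n += sketch_put_varint(out + n, m->level); }
    return n;
}

/* Decode: fields absent from the input keep the value zero.
 * Returns 1 on success, 0 on malformed or unknown input. */
static int sketch_decode(const uint8_t *in, size_t len, sketch_msg_t *m)
{
    size_t i = 0;
    m->count = 0;
    m->level = 0;
    while (i < len) {
        uint8_t key = in[i++];
        uint64_t v = 0;
        int shift = 0;
        for (;;) {                          /* read one varint */
            uint8_t b;
            if (i >= len || shift > 63)
                return 0;
            b = in[i++];
            v |= (uint64_t)(b & 0x7F) << shift;
            if ((b & 0x80) == 0)
                break;
            shift += 7;
        }
        if (key == 0x08)      m->count = (uint32_t)v;
        else if (key == 0x10) m->level = (uint32_t)v;
        else return 0;                      /* sketch: unknown fields rejected */
    }
    return 1;
}
```

A message with both fields at zero encodes to zero bytes. A receiver therefore cannot tell "sent as zero" from "not sent" unless the .proto gives the field explicit presence.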

^ Set Up Errors

https://github.com/zephyrproject-rtos/zephyr/issues/70065

^ Protobuf C Code Examples

When nanopb is compiled as part of a C language program, fields whose maximum size is not known at build time (for example strings, or repeated and nested messages without size options) are generated as the nanopb-defined struct type `pb_callback_t`, and the program must supply callback functions to encode and decode them. Some examples of this on github:

https://github.com/particle-iot/device-os/blob/9338b13b1e611f1b57f71f702795c2ca71142c1f/proto_defs/src/cloud/describe.pb.h#L9

https://github.com/hello/kitsune/blob/6a28afa80dd4547907e58341a84bd0b5bec5d88e/kitsune/protobuf/periodic.pb.h#L11

In the first example an early instance of `pb_callback_t` in the file occurs on line 56. Looking further, this project has a few dozen protoc generated files . . . switching to a possibly smaller project:

https://github.com/hello/kitsune

In kitsune project, looking at:

(1) file kitsune/kitsune/audio_features_upload_task.c function setup_protbuf( . . . )
(2) file audio_features_upload_task_helpers.c function encode_repeated_streaming_bytes_and_mark_done(pb_ostream_t *stream, const pb_field_t *field, void * const *arg)
(3) in same file reviewing function write_streams(pb_ostream_t *stream, const pb_field_t *field,hlo_stream_t * hlo_stream)

Here is an excerpt from proto_utils.c which appears to contain a `pb_callback_t` encode function, followed by the routine which references it in a function pointer assignment:

bool encode_device_id_string(pb_ostream_t *stream, const pb_field_t *field, void * const *arg) {
    //char are twice the size, extra 1 for null terminator
    char hex_device_id[2*DEVICE_ID_SZ+1] = {0};
    if(!get_device_id(hex_device_id, sizeof(hex_device_id)))
    {
        return false;
    }

    return pb_encode_tag_for_field(stream, field) && pb_encode_string(stream, (uint8_t*)hex_device_id, strlen(hex_device_id));
}

void pack_batched_periodic_data(batched_periodic_data* batched, periodic_data_to_encode* encode_wrapper)
{
    if(NULL == batched || NULL == encode_wrapper)
    {   
        LOGE("null param\n");
        return;
    }   

    batched->data.funcs.encode = encode_all_periodic_data;  // This is smart :D
    batched->data.arg = encode_wrapper;
    batched->firmware_version = KIT_VER;
    batched->device_id.funcs.encode = encode_device_id_string;
}

A search for calls to `pack_batched_periodic_data()`:

$ grep -nr pack_batched_periodic_data ./*
./commands.c:1069:			pack_batched_periodic_data(&data_batched, &periodicdata);
./proto_utils.c:158:void pack_batched_periodic_data(batched_periodic_data* batched, periodic_data_to_encode* encode_wrapper)
./proto_utils.h:29:void pack_batched_periodic_data(batched_periodic_data* batched, periodic_data_to_encode* encode_wrapper);

Tracing yet further back, the kitsune project's commands.c has the following routine, which declares and uses a `periodic_data` type:

void thread_tx(void* unused) {
        batched_periodic_data data_batched = {0};
#ifdef UPLOAD_AP_INFO
        batched_periodic_data_wifi_access_point ap;
#endif
        periodic_data forced_data;
        bool got_forced_data = false;

        LOGI(" Start polling  \n");
        while (1) {
                if (uxQueueMessagesWaiting(data_queue) >= data_queue_batch_size
                 || got_forced_data ) {
                        LOGI(   "sending data\n" );

                        periodic_data_to_encode periodicdata;
                        periodicdata.num_data = 0;
                        periodicdata.data = (periodic_data*)pvPortMalloc(MAX_BATCH_SIZE*sizeof(periodic_data));

                        if( !periodicdata.data ) {
                                LOGI( "failed to alloc periodicdata\n" );
                                vTaskDelay(1000);
                                continue;
                        }
                        if( got_forced_data ) {
                                memcpy( &periodicdata.data[periodicdata.num_data], &forced_data, sizeof(forced_data) );
                                ++periodicdata.num_data;
                        }
                        while( periodicdata.num_data < MAX_BATCH_SIZE && xQueueReceive(data_queue, &periodicdata.data[periodicdata.num_data], 1 ) ) {
                                ++periodicdata.num_data;
                        }

                        pack_batched_periodic_data(&data_batched, &periodicdata);

                        data_batched.has_uptime_in_second = true;
                        data_batched.uptime_in_second = xTaskGetTickCount() / configTICK_RATE_HZ;

                        if( !is_test_boot() && provisioning_mode ) {

 . . .

In this kitsune project see also `kitsune/kitsune/protobuf/provision.pb.h`.

^ To Predetermine Encoded Data Size

This section might also be titled "To Determine Encoded Data Size at Build Time".

Determining the maximum possible encoded message size at build time is often not possible for messages of variable and unknown length, but it can be useful for projects whose messages have a fixed number of fixed-size fields. This section collects tools and public online discussions that cover this protobuf analysis topic.

https://stackoverflow.com/questions/30915704/maximum-serialized-protobuf-message-size
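
For a message made only of scalar fields the bound can be worked out by hand. A rough sketch, using the two-field `sensorUpdates` message from the end of this page as the example; the `SKETCH_` constant names are made up here. Each field costs one key byte (for field numbers 1 to 15) plus the worst case for its type.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to store value as a varint (1 to 10). */
static size_t sketch_varint_size(uint64_t value)
{
    size_t n = 1;
    while (value >= 0x80) {
        value >>= 7;
        n++;
    }
    return n;
}

/* Worst-case payload sizes per scalar type. A negative int32 is
 * sign-extended to 64 bits on the wire, hence 10 bytes. */
enum {
    SKETCH_KEY_FIELD_1_TO_15 = 1,
    SKETCH_MAX_INT32         = 10,
    SKETCH_MAX_UINT32        = 5,
    SKETCH_FIXED32           = 4,   /* float, fixed32, sfixed32 */

    /* sensorUpdates: int32 message_id = 1; float vrms = 2; */
    SKETCH_SENSOR_UPDATES_MAX =
        (SKETCH_KEY_FIELD_1_TO_15 + SKETCH_MAX_INT32) +
        (SKETCH_KEY_FIELD_1_TO_15 + SKETCH_FIXED32)
};
```

Because the total is an enum constant, it can size a static buffer such as `uint8_t buf[SKETCH_SENSOR_UPDATES_MAX];` at build time. nanopb's generator emits a similar `<MessageName>_size` define in the .pb.h file when it is able to compute the bound.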

^ Encoding Submessages

https://stackoverflow.com/questions/56739667/nanopb-protocol-buffers-library-repeated-sub-messages-encode

https://groups.google.com/g/nanopb/c/OT4Kw3Siuio

In a `pb_callback_t` encode function it may be necessary to call `pb_encode_tag()` followed by `pb_encode_submessage()`.

https://github.com/nanopb/nanopb/issues/331
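
The reason for the two calls shows up in the wire layout of a nested message: a key with wire type 2, then the length of the submessage as a varint, then the submessage bytes. The sketch below writes that layout by hand; the `sketch_` functions are hypothetical and stand in for the tag call and the submessage call respectively.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static size_t sketch_put_varint(uint8_t *out, uint64_t value)
{
    size_t n = 0;
    while (value >= 0x80) {
        out[n++] = (uint8_t)(value | 0x80);
        value >>= 7;
    }
    out[n++] = (uint8_t)value;
    return n;
}

/* Write one length-delimited submessage field into out:
 * key (wire type 2), length, then the already encoded submessage.
 * out must have room for sub_len + 15 bytes. Returns bytes written. */
static size_t sketch_put_submessage(uint8_t *out, uint32_t field_number,
                                    const uint8_t *sub, size_t sub_len)
{
    size_t n = 0;
    n += sketch_put_varint(out + n, ((uint64_t)field_number << 3) | 2u);
    n += sketch_put_varint(out + n, (uint64_t)sub_len);
    memcpy(out + n, sub, sub_len);
    return n + sub_len;
}
```

For a repeated submessage field the whole sequence is written once per element, which is why a callback for such a field repeats the tag and submessage calls inside its loop.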

Evidently protobuf callbacks for decoding are not supported for `oneof` types:

https://stackoverflow.com/questions/39854434/nanopb-correctly-encoding-and-decoding-repeated-construct-fields-in-submessage

How to encode strings, hint: use the .arg message structure member:

https://stackoverflow.com/questions/57569586/how-to-encode-a-string-when-it-is-a-pb-callback-t-type

^ Length Prefixing

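One way to send large data sets via protobuf is to break them into smaller messages and precede each message with its length, so that the receiver can tell where one message ends and the next begins (an encoded protobuf message does not mark its own end). The protobuf definition then gives each piece a meaning both sender and receiver can understand. See Eli Bendersky's article on this strategy: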

https://eli.thegreenplace.net/2011/08/02/length-prefix-framing-for-protocol-buffers
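
A minimal framing sketch in C. The 4-byte big-endian length prefix is an assumption made for this example, as are the function names; sender and receiver only need to agree on some fixed prefix format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write one frame into out: 4-byte big-endian length, then the
 * message bytes. Returns bytes written, or 0 if out is too small. */
static size_t sketch_frame_write(uint8_t *out, size_t cap,
                                 const uint8_t *msg, uint32_t len)
{
    if (cap < 4 || cap - 4 < len)
        return 0;
    out[0] = (uint8_t)(len >> 24);
    out[1] = (uint8_t)(len >> 16);
    out[2] = (uint8_t)(len >> 8);
    out[3] = (uint8_t)len;
    memcpy(out + 4, msg, len);
    return 4 + (size_t)len;
}

/* Read one frame from in. On success sets *msg and *len to the
 * payload and returns the bytes consumed. Returns 0 when the frame
 * is not yet complete, so the caller can wait for more data. */
static size_t sketch_frame_read(const uint8_t *in, size_t avail,
                                const uint8_t **msg, uint32_t *len)
{
    uint32_t n;
    if (avail < 4)
        return 0;
    n = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
        ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
    if (avail - 4 < n)
        return 0;
    *msg = in + 4;
    *len = n;
    return 4 + (size_t)n;
}
```

The payload handed back by `sketch_frame_read` is exactly one encoded message, ready to pass to the protobuf decoder. Advance the input pointer by the return value to reach the next frame.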

^ Other Protobuf Libraries

Pigweed protobuf library . . .

https://pigweed.dev/pw_protobuf/#comparison-with-other-protobuf-libraries

^ Google protoc compiler

How to pass options via cmake to protoc . . .

https://github.com/nanopb/nanopb/issues/432

^ References To Sort

Protobuf references. This is a somewhat arbitrary starting point, yet it introduces some key topics of the Protobuf standard and its use cases:

JSON supported data types:

https://www.w3schools.com/js/js_json_datatypes.asp

First Protobuf .proto file, compiles using `protoc-c`, part of a package available with Ubuntu 20.04:

// syntax = "proto3";
syntax = "proto2";

// Notes:
// $ protoc-c --c_out=. ./first.proto

message sensorUpdates {
  required int32 message_id = 1;
  optional float vrms = 2;
}

. . . It appears that the integer values assigned to message elements are tantamount to key names in JSON.
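
A hand-written decoder for the `sensorUpdates` message makes that comparison concrete: each field arrives as a key followed by a value, and the field number inside the key selects which struct member to fill, much like looking up a JSON key. This is only a sketch. The struct and function names are hypothetical, it assumes field numbers below 16 (one-byte keys) and an IEEE 754 32-bit `float`, and it is not what `protoc-c` generates.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory form of the sensorUpdates message. */
typedef struct {
    int32_t message_id;
    float   vrms;
    int     has_vrms;
} sketch_sensor_updates_t;

/* Returns 1 on success, 0 on truncated or unexpected input. */
static int sketch_decode_sensor_updates(const uint8_t *in, size_t len,
                                        sketch_sensor_updates_t *m)
{
    size_t i = 0;
    memset(m, 0, sizeof *m);
    while (i < len) {
        uint8_t  key          = in[i++];
        uint32_t field_number = (uint32_t)key >> 3;
        uint32_t wire_type    = (uint32_t)key & 0x07;

        if (field_number == 1 && wire_type == 0) {
            /* message_id: varint */
            uint64_t v = 0;
            int shift = 0;
            for (;;) {
                uint8_t b;
                if (i >= len || shift > 63)
                    return 0;
                b = in[i++];
                v |= (uint64_t)(b & 0x7F) << shift;
                if ((b & 0x80) == 0)
                    break;
                shift += 7;
            }
            m->message_id = (int32_t)(uint32_t)v;
        } else if (field_number == 2 && wire_type == 5) {
            /* vrms: 4 bytes, little-endian */
            uint32_t bits;
            if (len - i < 4)
                return 0;
            bits = (uint32_t)in[i] | ((uint32_t)in[i + 1] << 8) |
                   ((uint32_t)in[i + 2] << 16) | ((uint32_t)in[i + 3] << 24);
            memcpy(&m->vrms, &bits, sizeof bits);
            m->has_vrms = 1;
            i += 4;
        } else {
            return 0;   /* sketch: unknown fields are rejected, not skipped */
        }
    }
    return 1;
}
```

Unlike JSON, the name `message_id` never appears in the encoded bytes; only the number 1 does, packed into the key byte 0x08.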