Difference between revisions of "Protobuf notes"
Latest revision as of 17:55, 9 January 2025
Overview
There's a lot going on with Protobuf, so this local Neela Nurseries page was started to capture some links to online docs and tutorials.
For a good, more detailed introduction, the following documentation page is one of a full collection of documents. It contains example C code to "get the size of the message without storing it anywhere":
Another reference with possibly good detail:
Encoding details of protobuf "on the wire":
^ Terminology and Elements Of
Protobuf has several features and "moving parts". Two of these, which we'll loop back to and write a bit more about, are field names and field numbers. Both are important identifiers, expressed in the .proto message definition files used at build time by senders and receivers of protobuf-formatted messages. Because they are fixed at build time, changing these name and numeric field identifiers after software has been built and released can, and likely will, cause message interpretation errors: an older version of the software does not understand the newer message format.
Protobuf has a means of handling cases where field names and numbers need to be "removed": the message definitions are changed to mark those defunct identifiers as reserved. Reserved identifiers are taken "out of circulation" and cannot be re-used, so different names and numbers must be chosen for any new fields. Read further at:
^ To Factor Protobuf Message Definitions
The following online guide speaks to defining multiple related protobuf messages in a single .proto file:
^ Nanopb
2022-01-08 Saturday
- https://github.com/nanopb/nanopb/blob/master/generator/proto/nanopb.proto
- https://jpa.kapsi.fi/nanopb/docs/whats_new.html
- https://jpa.kapsi.fi/nanopb/docs/
Cmake script to locate Nanopb headers and sources:
A nanopb API reference:
A further reference from University of Hannover:
pb_encode.c has an interesting function . . .
    /* Encode a field with callback semantics. This means that a user function is
     * called to provide and encode the actual data. */
    static bool checkreturn encode_callback_field(pb_ostream_t *stream,
        const pb_field_t *field, const void *pData)
    {
        const pb_callback_t *callback = (const pb_callback_t*)pData;

    #ifdef PB_OLD_CALLBACK_STYLE
        const void *arg = callback->arg;
    #else
        void * const *arg = &(callback->arg);
    #endif

        if (callback->funcs.encode != NULL)
        {
            if (!callback->funcs.encode(stream, field, arg))
                PB_RETURN_ERROR(stream, "callback error");
        }
        return true;
    }
^ Handling of values of zero
To save space, nanopb (and possibly protobuf by its specification) does not encode fields whose value is zero, and during decoding the protobuf implementation assumes that any missing field carries a value of zero:
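A minimal sketch of the zero-omission idea, written as a stand-alone toy rather than with nanopb itself. The `toy_msg_t` type, its two fields and all function names are hypothetical, only the varint wire type is handled, and the caller is assumed to supply a buffer of at least 12 bytes. As far as I understand, this omission applies to proto3-style fields without explicit presence; a proto2 `optional` field with its `has_` flag set may still be sent when zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-field message, for illustration only. */
typedef struct {
    uint32_t message_id;  /* field number 1 */
    uint32_t count;       /* field number 2 */
} toy_msg_t;

/* Write v as a base-128 varint; returns the number of bytes written. */
static size_t put_varint(uint64_t v, uint8_t *buf)
{
    size_t n = 0;
    while (v >= 0x80) {
        buf[n++] = (uint8_t)(v | 0x80);  /* low 7 bits plus continuation bit */
        v >>= 7;
    }
    buf[n++] = (uint8_t)v;
    return n;
}

/* Read one varint; returns bytes consumed, or 0 on truncated/overlong input. */
static size_t get_varint(const uint8_t *buf, size_t len, uint64_t *out)
{
    uint64_t v = 0;
    size_t n = 0;
    unsigned shift = 0;
    while (n < len && shift < 64) {
        uint8_t b = buf[n++];
        v |= (uint64_t)(b & 0x7F) << shift;
        if ((b & 0x80) == 0) { *out = v; return n; }
        shift += 7;
    }
    return 0;
}

/* Encode, leaving out any field whose value is zero. Returns bytes written. */
static size_t toy_encode(const toy_msg_t *m, uint8_t *buf)
{
    size_t n = 0;
    assert(m != NULL && buf != NULL);
    if (m->message_id != 0) {
        buf[n++] = (uint8_t)((1u << 3) | 0u);  /* key: field 1, wire type 0 */
        n += put_varint(m->message_id, buf + n);
    }
    if (m->count != 0) {
        buf[n++] = (uint8_t)((2u << 3) | 0u);  /* key: field 2, wire type 0 */
        n += put_varint(m->count, buf + n);
    }
    return n;
}

/* Decode: start from all zeros and overwrite only the fields that are present.
 * Returns 1 on success, 0 on malformed input. */
static int toy_decode(const uint8_t *buf, size_t len, toy_msg_t *m)
{
    size_t pos = 0;
    m->message_id = 0;
    m->count = 0;
    while (pos < len) {
        uint64_t key, value;
        size_t used = get_varint(buf + pos, len - pos, &key);
        if (used == 0) return 0;
        pos += used;
        if ((key & 7u) != 0) return 0;  /* sketch handles varint fields only */
        used = get_varint(buf + pos, len - pos, &value);
        if (used == 0) return 0;
        pos += used;
        switch (key >> 3) {
            case 1: m->message_id = (uint32_t)value; break;
            case 2: m->count = (uint32_t)value; break;
            default: break;  /* unknown field number: ignored */
        }
    }
    return 1;
}
```

With this toy, a message whose fields are all zero encodes to zero bytes, and the receiver cannot tell "zero was sent" apart from "nothing was sent".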
^ Set Up Errors
^ Protobuf C Code Examples
When compiling the nanopb Protobuf library as part of C language programs, nested Protobuf messages may require use of the nanopb-defined callback structure type `pb_callback_t` in order to encode and decode those nested messages. Some examples of this on github:
In the first example, an early instance of `pb_callback_t` in the file occurs on line 56. Looking further, this project has a few dozen protoc-generated files . . . switching to a possibly smaller project:
In kitsune project, looking at:
(1) file kitsune/kitsune/audio_features_upload_task.c function setup_protbuf( . . . )
(2) file audio_features_upload_task_helpers.c function encode_repeated_streaming_bytes_and_mark_done(pb_ostream_t *stream, const pb_field_t *field, void * const *arg)
(3) in same file reviewing function write_streams(pb_ostream_t *stream, const pb_field_t *field,hlo_stream_t * hlo_stream)
Here is an excerpt from proto_utils.c which appears to contain an encode callback of the kind assigned to a `pb_callback_t`:

    bool encode_device_id_string(pb_ostream_t *stream, const pb_field_t *field, void * const *arg) {
        //char are twice the size, extra 1 for null terminator
        char hex_device_id[2*DEVICE_ID_SZ+1] = {0};
        if(!get_device_id(hex_device_id, sizeof(hex_device_id)))
        {
            return false;
        }

        return pb_encode_tag_for_field(stream, field) && pb_encode_string(stream, (uint8_t*)hex_device_id, strlen(hex_device_id));
    }

The following routine references the routine above in a function pointer assignment:

    void pack_batched_periodic_data(batched_periodic_data* batched, periodic_data_to_encode* encode_wrapper)
    {
        if(NULL == batched || NULL == encode_wrapper)
        {
            LOGE("null param\n");
            return;
        }

        batched->data.funcs.encode = encode_all_periodic_data;  // This is smart :D
        batched->data.arg = encode_wrapper;
        batched->firmware_version = KIT_VER;
        batched->device_id.funcs.encode = encode_device_id_string;
    }

A search for calls to `pack_batched_periodic_data()`:

    $ grep -nr pack_batched_periodic_data ./*
    ./commands.c:1069: pack_batched_periodic_data(&data_batched, &periodicdata);
    ./proto_utils.c:158:void pack_batched_periodic_data(batched_periodic_data* batched, periodic_data_to_encode* encode_wrapper)
    ./proto_utils.h:29:void pack_batched_periodic_data(batched_periodic_data* batched, periodic_data_to_encode* encode_wrapper);
Tracing yet further back, the kitsune project's commands.c has the following routine, which declares and uses a `periodic_data` type. Line numbers are kept here because the grep result above points at line 1069:

    1038 void thread_tx(void* unused) {
    1039     batched_periodic_data data_batched = {0};
    1040 #ifdef UPLOAD_AP_INFO
    1041     batched_periodic_data_wifi_access_point ap;
    1042 #endif
    1043     periodic_data forced_data;
    1044     bool got_forced_data = false;
    1045
    1046     LOGI(" Start polling \n");
    1047     while (1) {
    1048         if (uxQueueMessagesWaiting(data_queue) >= data_queue_batch_size
    1049             || got_forced_data ) {
    1050             LOGI( "sending data\n" );
    1051
    1052             periodic_data_to_encode periodicdata;
    1053             periodicdata.num_data = 0;
    1054             periodicdata.data = (periodic_data*)pvPortMalloc(MAX_BATCH_SIZE*sizeof(periodic_data));
    1055
    1056             if( !periodicdata.data ) {
    1057                 LOGI( "failed to alloc periodicdata\n" );
    1058                 vTaskDelay(1000);
    1059                 continue;
    1060             }
    1061             if( got_forced_data ) {
    1062                 memcpy( &periodicdata.data[periodicdata.num_data], &forced_data, sizeof(forced_data) );
    1063                 ++periodicdata.num_data;
    1064             }
    1065             while( periodicdata.num_data < MAX_BATCH_SIZE && xQueueReceive(data_queue, &periodicdata.data[periodicdata.num_data], 1 ) ) {
    1066                 ++periodicdata.num_data;
    1067             }
    1068
    1069             pack_batched_periodic_data(&data_batched, &periodicdata);
    1070
    1071             data_batched.has_uptime_in_second = true;
    1072             data_batched.uptime_in_second = xTaskGetTickCount() / configTICK_RATE_HZ;
    1073
    1074             if( !is_test_boot() && provisioning_mode ) {
         . . .
In this kitsune project see also `kitsune/kitsune/protobuf/provision.pb.h`.
^ To Predetermine Encoded Data Size
This section might also be titled "To Determine Encoded Data Size at Build Time".
Determining the maximum possible encoded message size at build time is often not possible for messages of variable and unknown length, but it can be useful for projects whose messages have a fixed number of fixed-size fields. This section collects the tools and public online discussions that cover this protobuf analysis topic.
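As a rough sketch of the arithmetic involved, the worst case for the small `sensorUpdates` message defined further down this page (an `int32` in field 1 and a `float` in field 2) can be worked out by hand and checked at build time. The macro names, the `headroom()` helper and the 32-byte buffer size are hypothetical. The byte counts follow the protobuf wire format: a key fits in one byte for field numbers 1 to 15, a negative `int32` is sign-extended to a 10-byte varint, and a `float` is always 4 bytes. As far as I know, nanopb's generator also emits a `<MessageName>_size` define in the .pb.h file when it can compute this bound, which would be the value to prefer in a real project.

```c
#include <assert.h>
#include <stddef.h>

/* Worst-case wire sizes in bytes for the building blocks used below. */
#define WIRE_KEY_SIZE_SMALL  1u   /* key byte for field numbers 1..15 */
#define WIRE_INT32_MAX_SIZE  10u  /* negative int32 becomes a 10-byte varint */
#define WIRE_FLOAT_SIZE      4u   /* float is always 4 bytes (fixed32) */

/* message sensorUpdates { int32 message_id = 1; float vrms = 2; } */
#define SENSOR_UPDATES_MAX_SIZE \
    ((WIRE_KEY_SIZE_SMALL + WIRE_INT32_MAX_SIZE) + \
     (WIRE_KEY_SIZE_SMALL + WIRE_FLOAT_SIZE))

/* Hypothetical transmit buffer size, for illustration only. */
#define TX_BUFFER_SIZE 32u

/* The build fails here if the message can outgrow the transmit buffer. */
_Static_assert(SENSOR_UPDATES_MAX_SIZE <= TX_BUFFER_SIZE,
               "sensorUpdates does not fit in the TX buffer");

/* Spare bytes left in a buffer after one worst-case message. */
static size_t headroom(size_t buffer_size)
{
    assert(buffer_size >= SENSOR_UPDATES_MAX_SIZE);
    return buffer_size - SENSOR_UPDATES_MAX_SIZE;
}
```

The bound here works out to 16 bytes. Any `string`, `bytes` or `repeated` field without a declared maximum would make such a bound impossible, which matches the caveat above.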
^ Encoding Submessages
In a `pb_callback_t` encode function it may be necessary to call `pb_encode_tag()` followed by `pb_encode_submessage()`.
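To show why the tag comes first and the submessage second, here is a stand-alone sketch of what ends up on the wire for a nested message, written without nanopb. The function names are hypothetical. A submessage is a length-delimited field (wire type 2): key, then the payload length as a varint, then the already-encoded inner message. Because the length precedes the payload, the encoder has to know the inner size before writing it; my understanding is that `pb_encode_submessage()` handles this with a sizing pass, but treat that as an assumption and check the nanopb docs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write v as a base-128 varint; returns the number of bytes written. */
static size_t sub_put_varint(uint64_t v, uint8_t *buf)
{
    size_t n = 0;
    while (v >= 0x80) {
        buf[n++] = (uint8_t)(v | 0x80);
        v >>= 7;
    }
    buf[n++] = (uint8_t)v;
    return n;
}

/* Append a length-delimited field: key, payload length, payload bytes.
 * The caller must supply a buffer large enough for all three parts.
 * Returns the total number of bytes written. */
static size_t put_submessage(uint32_t field_number,
                             const uint8_t *payload, size_t payload_len,
                             uint8_t *buf)
{
    size_t n;
    assert(buf != NULL && (payload != NULL || payload_len == 0));
    n = sub_put_varint(((uint64_t)field_number << 3) | 2u, buf);  /* key, wire type 2 */
    n += sub_put_varint(payload_len, buf + n);                    /* length prefix */
    if (payload_len > 0) {
        memcpy(buf + n, payload, payload_len);                    /* inner message */
    }
    return n + payload_len;
}
```

For example, wrapping the three inner bytes `08 96 01` (field 1 = 150) as field 3 of an outer message gives `1a 03 08 96 01`.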
Evidently protobuf callbacks for decoding are not supported for `oneof` types:
How to encode strings (hint: use the `.arg` member of the callback structure):
^ Length Prefixing
One way to send large data sets via protobuf is to break them into smaller pieces, and apply protobuf definition to give these pieces a meaning both sender and receiver can understand. See one Mr. Eli's article on this strategy:
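A minimal sketch of length-prefix framing, which is what this section's title refers to: each encoded piece is sent as a fixed-size length header followed by the payload, so the receiver knows where one message ends and the next begins. The 4-byte big-endian header and the names `frame_write()` and `frame_read()` are assumptions made for this example, not something taken from the article or from nanopb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_HEADER_SIZE 4u  /* assumption: 4-byte big-endian length prefix */

/* Write [length][payload] into out.
 * Returns the total frame size, or 0 if the frame does not fit. */
static size_t frame_write(const uint8_t *payload, uint32_t len,
                          uint8_t *out, size_t out_cap)
{
    assert(out != NULL);
    if (out_cap < FRAME_HEADER_SIZE || len > out_cap - FRAME_HEADER_SIZE) {
        return 0;
    }
    out[0] = (uint8_t)(len >> 24);
    out[1] = (uint8_t)(len >> 16);
    out[2] = (uint8_t)(len >> 8);
    out[3] = (uint8_t)len;
    if (len > 0) {
        memcpy(out + FRAME_HEADER_SIZE, payload, len);
    }
    return FRAME_HEADER_SIZE + (size_t)len;
}

/* Read one frame from in. On success sets *payload and *len and returns the
 * number of bytes consumed; returns 0 if the frame is not yet complete. */
static size_t frame_read(const uint8_t *in, size_t in_len,
                         const uint8_t **payload, uint32_t *len)
{
    uint32_t n;
    if (in_len < FRAME_HEADER_SIZE) {
        return 0;
    }
    n = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
        ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
    if (n > in_len - FRAME_HEADER_SIZE) {
        return 0;
    }
    *payload = in + FRAME_HEADER_SIZE;
    *len = n;
    return FRAME_HEADER_SIZE + (size_t)n;
}
```

The receiver calls `frame_read()` repeatedly on its input buffer, advancing by the returned count each time, and hands each payload to the protobuf decoder.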
^ Other Protobuf Libraries
Pigweed protobuf library . . .
- https://pigweed.dev/pw_protobuf/#comparison-with-other-protobuf-libraries
^ Google protoc compiler
How to pass options via cmake to protoc . . .
- https://github.com/nanopb/nanopb/issues/432
^ References To Sort
Protobuf references. This is a somewhat arbitrary starting point, yet these introduce some key topics of the Protobuf standard and its use cases:
- https://www.crankuptheamps.com/blog/posts/2017/10/12/protobuf-battle-of-the-syntaxes/
- https://www.educative.io/edpresso/what-is-the-difference-between-protocol-buffers-and-json
JSON supported data types:
First Protobuf .proto file, compiles using `protoc-c`, part of a package available with Ubuntu 20.04:
    // syntax = "proto3";
    syntax = "proto2";

    // Notes:
    // $ protoc-c --c_out=. ./first.proto

    message sensorUpdates {
        required int32 message_id = 1;
        optional float vrms = 2;
    }
. . . It appears that the integer values assigned to message elements are tantamount to key names in JSON.
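A small sketch that supports this reading: on the wire, each field is introduced by a key computed from its field number and wire type, and the field name never appears. The function names below are hypothetical and only varint fields (wire type 0) are handled. The caller is assumed to supply a buffer of at least 20 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write value as a base-128 varint; returns the number of bytes written. */
static size_t encode_varint(uint64_t value, uint8_t *buf)
{
    size_t n = 0;
    assert(buf != NULL);
    while (value >= 0x80) {
        buf[n++] = (uint8_t)(value | 0x80);  /* low 7 bits, continuation bit set */
        value >>= 7;
    }
    buf[n++] = (uint8_t)value;
    return n;
}

/* Encode one varint field: key = (field_number << 3) | wire_type, then value. */
static size_t encode_varint_field(uint32_t field_number, uint64_t value,
                                  uint8_t *buf)
{
    size_t n = encode_varint(((uint64_t)field_number << 3) | 0u, buf);
    n += encode_varint(value, buf + n);
    return n;
}
```

Calling `encode_varint_field(1, 150, buf)`, as for `message_id = 150` in the message above, writes the three bytes `08 96 01`. The leading `08` is the field number 1 shifted left by three bits, playing the role that the key string would play in JSON.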