
Edge AI Model Deployment: A Novel Solution

  • jimli44
  • Jun 11
  • 3 min read

Updated: Jul 3

A novel solution for Edge AI model deployment that addresses three major pain points: high framework overhead, model IP exposure, and system integration effort.


Let's start with a quick comparison of the standard approach (left) and the proposed solution (right) for deploying a neural network model onto a target platform.


[Figure: standard deployment approach (left) vs proposed model packaging solution (right)]

The key to the standard approach is the NN deployment framework; TensorFlow Lite (Micro) is a popular choice for embedded platforms. Deploying a model this way relies on that framework being available on the target platform, and on the know-how to convert the model for it.
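For a sense of what that involves on the application side, here is a minimal sketch using TensorFlow Lite's C API. TensorFlow Lite Micro exposes an equivalent C++ API with a similar flow, plus a statically allocated tensor arena and an op resolver; the model symbols g_model_data and g_model_data_len below are placeholders for the converted flatbuffer compiled into the firmware.

#include <stdint.h>
#include "tensorflow/lite/c/c_api.h"

extern const unsigned char g_model_data[];    /* converted .tflite flatbuffer */
extern const unsigned int  g_model_data_len;

int run_inference(const int16_t in[64], int16_t out[64])
{
    /* Build an interpreter around the flatbuffer model. */
    TfLiteModel* model = TfLiteModelCreate(g_model_data, g_model_data_len);
    TfLiteInterpreterOptions* options = TfLiteInterpreterOptionsCreate();
    TfLiteInterpreter* interpreter = TfLiteInterpreterCreate(model, options);

    if (TfLiteInterpreterAllocateTensors(interpreter) != kTfLiteOk)
        return -1;

    /* Copy the input in, run the graph, copy the output out. */
    TfLiteTensor* input = TfLiteInterpreterGetInputTensor(interpreter, 0);
    TfLiteTensorCopyFromBuffer(input, in, 64 * sizeof(int16_t));
    TfLiteInterpreterInvoke(interpreter);
    const TfLiteTensor* output = TfLiteInterpreterGetOutputTensor(interpreter, 0);
    TfLiteTensorCopyToBuffer(output, out, 64 * sizeof(int16_t));

    /* Tear down. */
    TfLiteInterpreterDelete(interpreter);
    TfLiteInterpreterOptionsDelete(options);
    TfLiteModelDelete(model);
    return 0;
}

All of this, plus the framework library itself, has to be ported, built and maintained on the target.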


In the proposed solution, the model and all its dependencies, including data and processing routines, are packaged into one file. Producing the package does require some skill, but the result is that the target platform can integrate the model processing like an ordinary function library.
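As a sketch of what that "ordinary function library" interface might look like (the names nn_package_init and nn_package_process are hypothetical, not an actual product API), the entire package could expose as little as:

/* nn_package.h -- hypothetical interface exposed by a packaged model.      */
/* Everything else (weights, buffers, processing kernels) lives inside the  */
/* single pre-built library file that the model developer ships.            */
#include <stdint.h>

int nn_package_init(void);                    /* one-time setup, 0 on success */
int nn_package_process(const int16_t *in,     /* 64-sample input frame        */
                       int16_t *out);         /* 64-sample output frame       */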


Expanding on the model packaging approach, here are a few of the advantages it offers.


Minimal framework overhead

The deployment framework is essentially a big library of code. Take deploying a modest 66k-parameter model as an example (details in the appendix): the model file itself is 84 kB of binary. However, when compiled with the deployment framework, an extra 166 kB is added, bringing the total program memory requirement of this NN-based feature to 250 kB. In the small-footprint embedded world, this additional 166 kB could well decide whether the feature fits into the product system or not.

In the proposed model packaging approach, the same model requires less than 5 kB of supporting code. Leaner overhead means a wider choice of deployable platforms, and with it greater commercial opportunity.



Model IP protection

For model developers, the model is the IP. Having to ship the model file out for final integration exposes that IP, which is NOT a desirable approach. Packaging the model into a binary provides an invaluable layer of protection.

Moreover, not only is the model now hidden away, security features such as license key checking can also be added to the package to facilitate licensing management.
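As a purely hypothetical illustration, extending the nn_package interface sketched earlier, a license gate could be folded into the package so the model never runs without a valid key. The verify_license logic here is a placeholder; a real check (device ID binding, signatures, expiry) would stay hidden inside the shipped binary.

/* Hypothetical extension of the nn_package interface: the integrator must */
/* register a valid key before processing will run.                        */
#include <stddef.h>
#include <stdint.h>

static int license_ok = 0;

/* Placeholder check -- real logic stays hidden inside the package binary. */
static int verify_license(const uint8_t *key, size_t len)
{
    return (key != NULL && len == 16 && key[0] == 0xA5);
}

int nn_package_set_license(const uint8_t *key, size_t len)
{
    license_ok = verify_license(key, len);
    return license_ok ? 0 : -1;
}

int nn_package_process(const int16_t *in, int16_t *out)
{
    if (!license_ok)
        return -1;                 /* refuse to run without a valid key */
    /* ... the actual packaged model processing would follow here ...   */
    (void)in; (void)out;
    return 0;
}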



Zero-hassle system integration

In collaborative projects, adding the required deployment framework or adapting the model to an existing framework can be hard work or even impractical. Model packaging makes system integration as simple as linking in the model package and calling the processing function (see the sketch below), significantly lowering integration risk and protecting time-to-market. The same package can also be reused across multiple platforms, as long as they share the same architecture.
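Under the hypothetical interface sketched above, the application-side integration could indeed come down to a couple of calls, plus linking the shipped package into the build:

/* app.c -- hypothetical application-side integration.                 */
/* Build: link the shipped package alongside the application objects,  */
/* e.g.  cc app.c nn_package.a -o app   (names are illustrative).      */
#include <stdint.h>
#include "nn_package.h"

int main(void)
{
    int16_t in[64] = {0};          /* e.g. one frame of sensor samples */
    int16_t out[64];

    nn_package_init();             /* line 1: one-time setup           */
    nn_package_process(in, out);   /* line 2: run the packaged model   */

    return 0;
}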



To experience a packaged model in your own application, see the "free trial: integrate NN processing in MCU & DSP with 2 lines of C code".


In the next post, we will dive into exactly how model packaging works. Stay tuned!




Appendix


Example model conversion log.

_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 input_1 (InputLayer)        [(1, 64)]                 0
 reshape (Reshape)           (1, 1, 64)                0
 dense (Dense)               (1, 1, 128)               8320
 conv1d (Conv1D)             (1, 1, 128)               16512
 dense_1 (Dense)             (1, 1, 128)               16512
 conv1d_1 (Conv1D)           (1, 1, 128)               16512
 dense_2 (Dense)             (1, 1, 64)                8256
=================================================================
Total params: 66,112
Trainable params: 66,112
Non-trainable params: 0
_________________________________________________________________
=== TFLite ModelAnalyzer ===
Your TFLite model has '1' subgraph(s). In the subgraph description below,
T# represents the Tensor numbers. For example, in Subgraph#0, the FULLY_CONNECTED op takes
tensor #0 and tensor #10 and tensor #7 as input and produces tensor #13 as output.
Subgraph#0 main(T#0) -> [T#26]
  Op#0 FULLY_CONNECTED(T#0, T#10, T#7) -> [T#13]
  Op#1 TANH(T#13) -> [T#14]
  Op#2 RESHAPE(T#14, T#3[1, 1, 1, 128]) -> [T#15]
  Op#3 CONV_2D(T#15, T#5, T#1) -> [T#16]
  Op#4 TANH(T#16) -> [T#17]
  Op#5 RESHAPE(T#17, T#4[1, 1, 128]) -> [T#18]
  Op#6 FULLY_CONNECTED(T#18, T#11, T#9) -> [T#19]
  Op#7 TANH(T#19) -> [T#20]
  Op#8 RESHAPE(T#20, T#3[1, 1, 1, 128]) -> [T#21]
  Op#9 CONV_2D(T#21, T#6, T#2) -> [T#22]
  Op#10 TANH(T#22) -> [T#23]
  Op#11 RESHAPE(T#23, T#4[1, 1, 128]) -> [T#24]
  Op#12 FULLY_CONNECTED(T#24, T#12, T#8) -> [T#25]
  Op#13 TANH(T#25) -> [T#26]
Tensors of Subgraph#0
  T#0(serving_default_input_1:0) shape:[1, 64], type:INT16
  T#1(model/conv1d/BiasAdd/ReadVariableOp) shape:[128], type:INT64 RO 1024 bytes, buffer: 2, data:[??, ??, ??, ??, ??, ...]
  T#2(model/conv1d_1/BiasAdd/ReadVariableOp) shape:[128], type:INT64 RO 1024 bytes, buffer: 3, data:[??, ??, ??, ??, ??, ...]
  T#3(model/conv1d_1/Conv1D/ExpandDims) shape:[4], type:INT32 RO 16 bytes, buffer: 4, data:[1, 1, 1, 128]
  T#4(model/dense/Tensordot/shape) shape:[3], type:INT32 RO 12 bytes, buffer: 5, data:[1, 1, 128]
  T#5(model/conv1d/Conv1D) shape:[128, 1, 1, 128], type:INT8 RO 16384 bytes, buffer: 6, data:[., W, ., ., ., ...]
  T#6(model/conv1d_1/Conv1D) shape:[128, 1, 1, 128], type:INT8 RO 16384 bytes, buffer: 7, data:[., 0, W, ., b, ...]
......
  T#24(model/conv1d_1/Tanh;model/conv1d_1/BiasAdd;model/conv1d_1/Conv1D/Squeeze;model/conv1d_1/BiasAdd/ReadVariableOp1) shape:[1, 1, 128], type:INT16
  T#25(model/dense_2/Tensordot;model/dense_2/BiasAdd) shape:[1, 1, 64], type:INT16
  T#26(StatefulPartitionedCall:0) shape:[1, 1, 64], type:INT16
---------------------------------------------------------------
              Model size:      84056 bytes
    Non-data buffer size:      13772 bytes (16.38 %)
  Total data buffer size:      70284 bytes (83.62 %)
    (Zero value buffers):          0 bytes (00.00 %)

 
 
 


Author


Weiming Li

  • LinkedIn
