Deployment-aware design
Since the last case study post, readers have asked what can be done to make it easier to get models running on low-power platforms. It's a big topic, but the most effective steps start right at the beginning, in the model design phase.

You have likely heard of "quantization-aware training"; "deployment-aware design" is the same idea applied during model design, for the purpose of smooth deployment at the end.
Check compatibility
When the initial design of your model takes shape, it's a good idea to check its compatibility with the deployment framework you intend to use. Unlike GPU deployment, low-power platforms often come with significant limitations. Take the popular framework TensorFlow for example: last time I counted, only around 10% of TensorFlow core operators are supported natively by its Lite version, and that percentage is even lower for the TensorFlow Lite Micro variant. The rest will depend heavily on the converter and on the developer's craftsmanship.
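As a sketch, the earliest version of this check can be a simple diff between the operators a design uses and the set the target runtime claims to support. The operator names and the supported list below are hypothetical placeholders, not TensorFlow Lite's actual coverage:

```python
# Hypothetical operator inventory check: compare the ops a model design
# uses against the ops the target runtime claims to support.
SUPPORTED_OPS = {"CONV_2D", "DEPTHWISE_CONV_2D", "FULLY_CONNECTED",
                 "RELU", "SOFTMAX", "ADD", "RESHAPE"}  # placeholder list

def unsupported_ops(model_ops):
    """Return the ops that would need custom kernels or a redesign."""
    return sorted(set(model_ops) - SUPPORTED_OPS)

# Example: a design leaning on GELU and EINSUM would be flagged early,
# before any training budget is spent on it.
design = ["CONV_2D", "GELU", "EINSUM", "RELU", "SOFTMAX"]
print(unsupported_ops(design))  # -> ['EINSUM', 'GELU']
```

In practice the supported set would come from the framework's documentation or from a trial conversion, but keeping the check scriptable means it can run on every design revision.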
Once you have the code for all the operators used, a trial run generating the model file will confirm whether they are compatible with the model generation procedure. For the fixed-point formats preferred by most low-power environments, quantization is often a problematic step: having the implementation code is one thing; whether the framework knows how to quantize the model parameters accordingly is a different matter.
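To make that quantization concern concrete, here is a minimal pure-Python sketch of the affine (scale/zero-point) int8 scheme that converters commonly apply to parameters. It shows why the framework needs to know a tensor's value range, not just its operator implementation:

```python
def quantize_int8(values):
    """Affine int8 quantization: real_value ~= scale * (q - zero_point)."""
    # The representable range must cover zero so that padding etc. stays exact.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # guard against an all-zero tensor
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map int8 codes back to approximate real values."""
    return [scale * (v - zero_point) for v in q]

weights = [-0.9, -0.1, 0.0, 0.4, 1.2]
q, s, zp = quantize_int8(weights)
approx = dequantize(q, s, zp)  # each value recovered within one scale step
```

Without a recorded range (from calibration data or fake-quant nodes inserted during training), the converter cannot pick `scale` and `zero_point`, which is exactly where trial conversions tend to fail.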
Resolving any compatibility issues before going too deep into the design phase will save a lot of headaches later on.
Tune performance/compute ratio
When dealing with a very limited compute budget, chasing ultimate performance is usually unrealistic; finding the desirable performance/compute ratio is the key. Common practices:
Squeeze dimension size.
Explore simpler activation functions.
Reduce depth.
Trade things off for a better ratio, for example accepting a slightly bigger dimension size but fewer layers to achieve the same performance.
If some sort of runtime environment is used for deployment, completing the previous step allows quick measurement of resource usage.
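The width-versus-depth trade-off in the last bullet can be explored on paper before any training run. A sketch, using a plain dense MLP and hypothetical candidate shapes, counting parameters (weights plus biases) per design:

```python
def mlp_params(dims):
    """Parameter count of a dense MLP given its layer widths (weights + biases)."""
    return sum(i * o + o for i, o in zip(dims, dims[1:]))

# Two hypothetical candidates for the same task:
deep_narrow = mlp_params([64, 32, 32, 32, 10])  # more layers, smaller dims
shallow_wide = mlp_params([64, 48, 10])         # fewer layers, bigger dims
print(deep_narrow, shallow_wide)  # -> 4522 3610
```

If both candidates reach the target accuracy, the shallower design wins the ratio here, and a similar count of multiply-accumulates per inference would refine the comparison further.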
Remove redundancy
This step requires developers to be critical of themselves: review the model you have been working so hard on and try to find things that do not contribute to the final objective. Some low-hanging fruit:
Matrix manipulation that helps code readability but is not necessary.
Calculations performed on the entire data matrix when only a small portion of the result is used.
Information produced to help debugging but not required for calculating the result.
The fruit higher up the tree will likely require modifying the model's data path and running some sort of sensitivity test to find the data and calculations that do not really contribute to the objective.
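The second low-hanging fruit above can be sketched directly: a full matrix product is computed, but only some output rows are ever consumed downstream. Computing just those rows removes the redundant work (naive pure-Python matmul here, for illustration only):

```python
def matmul(A, B):
    """Naive dense matrix product (pure-Python illustration)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def matmul_rows(A, B, rows):
    """Compute only the requested output rows instead of the full product."""
    return [matmul([A[r]], B)[0] for r in rows]

A = [[1, 2], [3, 4], [5, 6]]
B = [[7, 8], [9, 10]]
full = matmul(A, B)              # 3x2 result, every row computed
needed = matmul_rows(A, B, [0])  # only row 0 is actually used downstream
assert needed == [full[0]]
```

On a low-power target the same idea applies at the operator level: slice before the expensive op rather than after it.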
With the above steps, the finished model will be lean and deployable, in a solid position for the next step.