Deployment-aware design
Since the last case study post, readers have asked what can be done to make it easier to get models running on low-power platforms. It's a big topic, but the most effective steps start right at the beginning, in the model design phase.

You have likely heard of "quantization-aware training"; "deployment-aware design" is the same idea applied during model design, for the purpose of smooth deployment at the end.
Check compatibility
When the initial design of your model takes shape, it's a good idea to check its compatibility with the deployment framework you intend to use. Unlike GPU deployment, low-power platforms often come with significant limitations. Take the popular framework TensorFlow for example: last time I counted, only around 10% of TensorFlow core operators are supported natively by its Lite version, and that percentage is even lower for the TensorFlow Lite Micro variant. The rest will depend heavily on the converter and on the developer's craftsmanship.
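As a sketch, the earliest version of this check can be a simple diff between the operators a design uses and the set the target runtime claims to support. The operator names and the supported list below are hypothetical placeholders, not TensorFlow Lite's actual coverage:

```python
# Hypothetical operator inventory check: compare the ops a model design
# uses against the ops the target runtime claims to support.
SUPPORTED_OPS = {"CONV_2D", "DEPTHWISE_CONV_2D", "FULLY_CONNECTED",
                 "RELU", "SOFTMAX", "ADD", "RESHAPE"}  # placeholder list

def unsupported_ops(model_ops):
    """Return the ops that would need custom kernels or a redesign."""
    return sorted(set(model_ops) - SUPPORTED_OPS)

# Example: a design leaning on GELU and EINSUM would be flagged early,
# before any training budget is spent on it.
design = ["CONV_2D", "GELU", "EINSUM", "RELU", "SOFTMAX"]
print(unsupported_ops(design))  # -> ['EINSUM', 'GELU']
```

In practice the supported set would come from the framework's documentation or from a trial conversion, but keeping the check scriptable means it can run on every design revision.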
Once you have the code for all the operators used, a trial run generating the model file will confirm whether they are compatible with the model generation procedure. For the fixed-point formats preferred by most low-power environments, quantization is often a problematic step: having the implementation code is one thing; whether the framework knows how to quantize the model parameters accordingly is a different matter.
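To make that quantization concern concrete, here is a minimal pure-Python sketch of the affine (scale/zero-point) int8 scheme that converters commonly apply to parameters. It shows why the framework needs to know a tensor's value range, not just its operator implementation:

```python
def quantize_int8(values):
    """Affine int8 quantization: real_value ~= scale * (q - zero_point)."""
    # The representable range must cover zero so that padding etc. stays exact.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # guard against an all-zero tensor
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map int8 codes back to approximate real values."""
    return [scale * (v - zero_point) for v in q]

weights = [-0.9, -0.1, 0.0, 0.4, 1.2]
q, s, zp = quantize_int8(weights)
approx = dequantize(q, s, zp)  # each value recovered within one scale step
```

Without a recorded range (from calibration data or fake-quant nodes inserted during training), the converter cannot pick `scale` and `zero_point`, which is exactly where trial conversions tend to fail.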
Resolving any compatibility issues before going too deep into the design phase will save a lot of headaches later on.
Tune performance/compute ratio
When dealing with a very limited compute budget, chasing ultimate performance is usually unrealistic; finding the desirable performance/compute ratio is the key. Common practices:
Squeeze dimension size.
Explore simpler activation functions.
Reduce depth.
Trade things off for a better ratio, for example accepting a slightly bigger dimension size but fewer layers to achieve the same performance.
If some sort of runtime environment is used for deployment, completing the previous step allows quick measurement of resource usage.
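The width-versus-depth trade-off in the last bullet can be explored on paper before any training run. A sketch, using a plain dense MLP and hypothetical candidate shapes, counting parameters (weights plus biases) per design:

```python
def mlp_params(dims):
    """Parameter count of a dense MLP given its layer widths (weights + biases)."""
    return sum(i * o + o for i, o in zip(dims, dims[1:]))

# Two hypothetical candidates for the same task:
deep_narrow = mlp_params([64, 32, 32, 32, 10])  # more layers, smaller dims
shallow_wide = mlp_params([64, 48, 10])         # fewer layers, bigger dims
print(deep_narrow, shallow_wide)  # -> 4522 3610
```

If both candidates reach the target accuracy, the shallower design wins the ratio here, and a similar count of multiply-accumulates per inference would refine the comparison further.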
Remove redundancy
This step requires developers to be critical of themselves: review the model you have been working so hard on and try to find things that do not contribute to the final objective. Some low-hanging fruit:
Matrix manipulation that helps code readability but is not necessary.
Calculations performed on the entire data matrix when only a small portion of the result is used.
Information produced to help debugging but not required for calculating the result.
The fruit higher up the tree will likely require modifying the model's data path and running some sort of sensitivity test to find the data and calculations that do not really contribute to the objective.
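The second low-hanging fruit above can be sketched directly: a full matrix product is computed, but only some output rows are ever consumed downstream. Computing just those rows removes the redundant work (naive pure-Python matmul here, for illustration only):

```python
def matmul(A, B):
    """Naive dense matrix product (pure-Python illustration)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def matmul_rows(A, B, rows):
    """Compute only the requested output rows instead of the full product."""
    return [matmul([A[r]], B)[0] for r in rows]

A = [[1, 2], [3, 4], [5, 6]]
B = [[7, 8], [9, 10]]
full = matmul(A, B)              # 3x2 result, every row computed
needed = matmul_rows(A, B, [0])  # only row 0 is actually used downstream
assert needed == [full[0]]
```

On a low-power target the same idea applies at the operator level: slice before the expensive op rather than after it.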
With the above steps, the finished model will be lean and deployable, in a solid position for the next step.