
From LSTM to GRU

  • jimli44
  • May 19
  • 1 min read

This post is a quick share of my experience with RNNs.


When considering a Recurrent Neural Network (RNN) for a model, LSTM is the go-to choice. However, even when we are happy with LSTM, we should still give GRU a quick try.

 

GRU is “pin compatible” with LSTM; in other words, it is very easy to swap LSTM for GRU in most modeling frameworks. To find out whether GRU works for your model, it is often as simple as changing the layer name from LSTM to GRU and rerunning training.
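As a minimal sketch of how small the change is (assuming TensorFlow/Keras; the input shape and layer sizes here are made up for illustration):

import tensorflow as tf

def build_model(rnn_layer):
    # rnn_layer is either tf.keras.layers.LSTM or tf.keras.layers.GRU
    return tf.keras.Sequential([
        tf.keras.Input(shape=(None, 64)),  # variable-length sequences, 64 features per step
        rnn_layer(64),
        tf.keras.layers.Dense(1),
    ])

lstm_model = build_model(tf.keras.layers.LSTM)
gru_model = build_model(tf.keras.layers.GRU)  # the entire swap is this one name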

 

GRU is more efficient: ballpark 25% fewer parameters and roughly 30% faster than LSTM.

Take a layer with 64 RNN units and a 64-dimensional input as an example: an LSTM implementation has 33,024 parameters, while a GRU needs only 24,960. Given the similar gated-memory architecture, such a saving in parameter count translates into a similar saving in computation.
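For the curious, the arithmetic behind those two numbers is easy to verify (a small sketch; the GRU count assumes the common reset_after variant with separate input-side and recurrent-side biases, as Keras uses by default):

units, inputs = 64, 64

# LSTM: 4 gates (input, forget, cell, output), each with input weights,
# recurrent weights, and one bias vector
lstm_params = 4 * (inputs * units + units * units + units)

# GRU: 3 gates (update, reset, candidate), each with input weights,
# recurrent weights, and two bias vectors (input-side and recurrent-side)
gru_params = 3 * (inputs * units + units * units + 2 * units)

print(lstm_params)  # 33024
print(gru_params)   # 24960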

 

Now the million-dollar question: does GRU perform the same?

It is case by case, depending on the actual model and the goal. My experience so far, mainly on signal-processing use cases, shows that GRU can achieve the same performance as LSTM. A few published evaluations comparing the two should also serve as good references.



The bottom line: when something is this easy to swap in and offers such potential benefits, it is definitely worth a try.

 
 
 


Author


Weiming Li

  • LinkedIn

© 2025 by MLSP.ai. All Rights Reserved
