From LSTM to GRU
- jimli44
- May 19
- 1 min read
This post is a quick experience share about RNNs.

When thinking about employing a Recurrent Neural Network (RNN) in a model, LSTM is the go-to choice. However, even when we are happy with LSTM, we should still give GRU a quick try.
GRU is “pin compatible” with LSTM; in other words, it’s very easy to swap LSTM for GRU in most modeling frameworks. To find out whether GRU works for your model, it’s often as simple as changing the layer name from LSTM to GRU and re-running training.
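As a minimal sketch of how small the change is (assuming Keras/TensorFlow, which the post does not name, and a hypothetical toy architecture), the recurrent cell is a single line in the model definition:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(rnn_layer=layers.LSTM):
    """Tiny sequence model; only the recurrent layer type differs."""
    return models.Sequential([
        layers.Input(shape=(None, 64)),  # variable-length sequences of 64 features
        rnn_layer(64),                   # the only line that changes between LSTM and GRU
        layers.Dense(1),
    ])

lstm_model = build_model(layers.LSTM)
gru_model = build_model(layers.GRU)      # same architecture, different recurrent cell
```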
GRU is more efficient: ballpark 25% fewer parameters and about 30% faster compared with LSTM.
Take a layer with 64 RNN units and a 64-dimensional input as an example: an LSTM implementation has 33,024 parameters while a GRU needs only 24,960. Given the similar gated-memory architecture, when there is such a saving in the number of parameters, it’s not hard to imagine a similar saving in computation.
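The counts above can be reproduced from the standard gate formulas (these match Keras defaults, including the GRU’s `reset_after=True` double bias; other implementations may differ slightly):

```python
units, inputs = 64, 64

# LSTM: 4 gates, each with input weights, recurrent weights, and a bias vector.
lstm_params = 4 * (inputs * units + units * units + units)

# GRU: 3 gates; with reset_after=True each gate carries two bias vectors.
gru_params = 3 * (inputs * units + units * units + 2 * units)

print(lstm_params)                   # 33024
print(gru_params)                    # 24960
print(1 - gru_params / lstm_params)  # ~0.24, i.e. roughly 25% fewer parameters
```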
Now the million-dollar question: does GRU perform the same?
It’s case by case, depending on the actual model and the goal. My experience so far, mainly on signal-processing use cases, shows that GRU can achieve the same performance as LSTM. Here are a few other evaluations that should serve as good references.
LSTM and GRU Neural Network Performance Comparison Study: Taking Yelp Review Dataset as an Example (https://ieeexplore.ieee.org/document/9221727)
A comparison of LSTM and GRU networks for learning symbolic sequences (https://arxiv.org/abs/2107.02248)
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling (https://arxiv.org/abs/1412.3555)
The bottom line is: when something is so easy to swap in and offers such potential benefits, you should definitely give it a try.