Creating a Recurrent Neural Network from scratch using C#
Let’s build a Recurrent Neural Network in C#!
The main objective of this post is to implement an RNN from scratch in C# and to explain each step simply, so that it is useful to readers.
Implementing a neural network from scratch at least once is a valuable exercise: it helps you understand how neural networks actually work. An RNN brings its own added complexity, which makes it a good opportunity to hone these skills.
What is an RNN?
An RNN is a neural network designed to process sequences. It is used in natural language processing (NLP), speech recognition, and text recognition.
Recurrent neural networks are artificial neural networks where the computation graph contains directed cycles.
Unlike feedforward neural networks, where information flows strictly in one direction from layer to layer, recurrent neural networks (RNNs) pass information around in loops, so that the state of the model at each step is influenced by its previous states.
While feedforward neural networks can be thought of as stateless, RNNs have a memory which allows the model to store information about its past computations. This allows recurrent neural networks to exhibit dynamic temporal behavior and model sequences of input-output pairs.
Another way to think about RNNs is that they have a “memory” which captures information about what has been calculated so far.
RNNs are called recurrent because they perform the same computation (determined by the weights, biases, and activation functions) for every element in the input sequence. The difference between the outputs for different elements of the input sequence comes from the different hidden states, which are dependent on the current element in the input sequence and the value of the hidden states at the last time step.
It might be tempting to model sequences with feedforward neural networks, but two problems become apparent upon investigation. The first is that the lengths of the input x and the output y vary from one input-output pair to the next. The second is that a feedforward network does not share what it learns at one position of a sequence with the other positions.
Input: x(t) is the input to the network at time step t. For example, x(1) could be a one-hot vector corresponding to a word of a sentence.
Hidden state: h(t) represents a hidden state at time t and acts as “memory” of the network. h(t) is calculated based on the current input and the previous time step’s hidden state: $h^{(t)} = f\left(U x^{(t)} + W h^{(t-1)}\right)$.
The function f is taken to be a non-linear transformation such as tanh or ReLU.
Weights: The RNN has input-to-hidden connections parameterized by a weight matrix U, hidden-to-hidden recurrent connections parameterized by a weight matrix W, and hidden-to-output connections parameterized by a weight matrix V. All of these weights (U, V, W) are shared across time.
Output: o(t) is the output of the network at time t. In the figure, o(t) is simply followed by an arrow; in practice o(t) is often passed through a further non-linearity, especially when the network contains more layers downstream.
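To make the hidden-state update concrete, here is a minimal C# sketch of a single time step, h(t) = f(U x(t) + W h(t−1)) with f = tanh. It assumes matrices are stored as `double[rows, cols]` and vectors as `double[]`; the names `RnnStep`, `NextHidden`, and `MatVec` are illustrative and not taken from the original source code. Biases are omitted to match the equation as written.

```csharp
using System;

// Sketch of one recurrent step: h(t) = tanh(U x(t) + W h(t-1)).
// Class and method names are hypothetical, for illustration only.
public static class RnnStep
{
    // y = M v for a rows x cols matrix M and a vector v of length cols.
    static double[] MatVec(double[,] m, double[] v)
    {
        int rows = m.GetLength(0), cols = m.GetLength(1);
        if (v.Length != cols) throw new ArgumentException("Dimension mismatch");
        var y = new double[rows];
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                y[i] += m[i, j] * v[j];
        return y;
    }

    public static double[] NextHidden(double[,] U, double[,] W, double[] x, double[] hPrev)
    {
        double[] ux = MatVec(U, x);     // input-to-hidden contribution
        double[] wh = MatVec(W, hPrev); // hidden-to-hidden (recurrent) contribution
        var h = new double[ux.Length];
        for (int i = 0; i < h.Length; i++)
            h[i] = Math.Tanh(ux[i] + wh[i]); // f = tanh
        return h;
    }
}
```

Because the same `U` and `W` are passed in at every step, calling `NextHidden` in a loop over x(1), x(2), … is the weight sharing across time described above.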
Input and data initialization
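As a hedged sketch of this step, the code below assumes that words are encoded as one-hot vectors over a vocabulary indexed from 0, that the weights U, W, V are drawn uniformly from [−0.01, 0.01), and that the biases b and c start at zero. These choices and the type names `OneHotEncoding` and `RnnParameters` are assumptions for illustration, not the original implementation.

```csharp
using System;

// Hypothetical helper: encodes a word index as a one-hot input vector x(t).
public static class OneHotEncoding
{
    public static double[] OneHot(int index, int vocabSize)
    {
        if (index < 0 || index >= vocabSize)
            throw new ArgumentOutOfRangeException(nameof(index));
        var x = new double[vocabSize]; // all zeros by default
        x[index] = 1.0;                // a single 1 marks the word
        return x;
    }
}

// Hypothetical container for the parameters U, W, V, b, c.
public sealed class RnnParameters
{
    public double[,] U; // hiddenSize x inputSize   (input-to-hidden)
    public double[,] W; // hiddenSize x hiddenSize  (hidden-to-hidden)
    public double[,] V; // outputSize x hiddenSize  (hidden-to-output)
    public double[] b;  // hidden bias
    public double[] c;  // output bias

    public RnnParameters(int inputSize, int hiddenSize, int outputSize, int seed = 42)
    {
        var rng = new Random(seed); // fixed seed for reproducible runs
        U = RandomMatrix(hiddenSize, inputSize, rng);
        W = RandomMatrix(hiddenSize, hiddenSize, rng);
        V = RandomMatrix(outputSize, hiddenSize, rng);
        b = new double[hiddenSize]; // biases start at zero
        c = new double[outputSize];
    }

    // Uniform values in [-0.01, 0.01): NextDouble() is in [0, 1).
    static double[,] RandomMatrix(int rows, int cols, Random rng)
    {
        var m = new double[rows, cols];
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                m[i, j] = (rng.NextDouble() * 2.0 - 1.0) * 0.01;
        return m;
    }
}
```

For example, with a three-word vocabulary, `OneHotEncoding.OneHot(1, 3)` returns `[0, 1, 0]`, and `new RnnParameters(3, 4, 3)` sets up 3 inputs, 4 hidden units, and 3 outputs. Small random weights break the symmetry between hidden units while keeping tanh in its near-linear range at the start of training.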
Feedforward Pass
The RNN forward pass can thus be represented by the following set of equations:
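Assuming the standard formulation that uses the same parameters U, V, W, b, c named in this post, with f = tanh and a softmax output, the forward pass is $a^{(t)} = b + W h^{(t-1)} + U x^{(t)}$, $h^{(t)} = \tanh(a^{(t)})$, $o^{(t)} = c + V h^{(t)}$, $\hat{y}^{(t)} = \mathrm{softmax}(o^{(t)})$, starting from $h^{(0)} = 0$. The C# below is a minimal sketch of that formulation; `RnnForward`, `Forward`, `MatVec`, and `Softmax` are hypothetical names, not the author's code.

```csharp
using System;

// Forward-pass sketch under an assumed standard formulation:
//   a(t) = b + W h(t-1) + U x(t),   h(t) = tanh(a(t))
//   o(t) = c + V h(t),              yhat(t) = softmax(o(t)),   h(0) = 0
public static class RnnForward
{
    // y = M v for a rows x cols matrix M.
    static double[] MatVec(double[,] m, double[] v)
    {
        int rows = m.GetLength(0), cols = m.GetLength(1);
        var y = new double[rows];
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                y[i] += m[i, j] * v[j];
        return y;
    }

    // Softmax with the maximum subtracted first to avoid overflow in Math.Exp.
    static double[] Softmax(double[] o)
    {
        double max = double.NegativeInfinity;
        foreach (double v in o) max = Math.Max(max, v);
        var e = new double[o.Length];
        double sum = 0.0;
        for (int i = 0; i < o.Length; i++) { e[i] = Math.Exp(o[i] - max); sum += e[i]; }
        for (int i = 0; i < o.Length; i++) e[i] /= sum;
        return e;
    }

    // Runs the RNN over the sequence xs and returns h(1..T) and yhat(1..T).
    public static (double[][] Hidden, double[][] Outputs) Forward(
        double[][] xs, double[,] U, double[,] W, double[,] V, double[] b, double[] c)
    {
        int T = xs.Length, H = b.Length;
        var hs = new double[T][];
        var yhats = new double[T][];
        var hPrev = new double[H]; // h(0) = 0
        for (int t = 0; t < T; t++)
        {
            double[] ux = MatVec(U, xs[t]);
            double[] wh = MatVec(W, hPrev);
            var h = new double[H];
            for (int i = 0; i < H; i++)
                h[i] = Math.Tanh(b[i] + wh[i] + ux[i]);
            double[] o = MatVec(V, h);
            for (int k = 0; k < o.Length; k++) o[k] += c[k];
            hs[t] = h;
            yhats[t] = Softmax(o);
            hPrev = h; // the same U, W, V, b, c are reused at every time step
        }
        return (hs, yhats);
    }
}
```

Subtracting the maximum logit before exponentiating leaves the softmax result unchanged but keeps `Math.Exp` from overflowing on large logits.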
Backpropagation Pass
Computing Gradients / Loss
Given our loss function L, we need to calculate the gradients for our three weight matrices U, V, W and the bias terms b, c, and update them with a learning rate α. As in ordinary backpropagation, the gradient tells us how the loss changes with respect to each parameter. We update the weights W to minimize the loss with the following equation: $W \leftarrow W - \alpha \frac{\partial L}{\partial W}$. The same rule is applied to U, V, b, and c.
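The training step can be sketched as follows, under stated assumptions: a cross-entropy loss $L = -\sum_t \log \hat{y}^{(t)}_{y_t}$ over a softmax output, f = tanh, and inputs and targets given as word indices (equivalent to one-hot x(t)). The method runs the forward pass, backpropagates through time (BPTT), and applies the gradient-descent update to all five parameters. `RnnTrainer` and `TrainStep` are hypothetical names; this is not the original implementation.

```csharp
using System;

// Minimal training-step sketch: forward pass, cross-entropy loss,
// backpropagation through time (BPTT), and a gradient-descent update.
// Names are hypothetical, not the original author's API.
public static class RnnTrainer
{
    // Updates U, W, V, b, c in place and returns the loss
    // L = -sum_t log yhat(t)[targets[t]] computed before the update.
    public static double TrainStep(int[] inputs, int[] targets,
        double[,] U, double[,] W, double[,] V, double[] b, double[] c, double alpha)
    {
        int T = inputs.Length, H = b.Length, K = c.Length;
        var hs = new double[T + 1][]; // hs[t + 1] holds h at step t; hs[0] = 0
        var ys = new double[T][];
        hs[0] = new double[H];
        double loss = 0.0;

        // Forward pass. x(t) is one-hot, so U x(t) is column inputs[t] of U.
        for (int t = 0; t < T; t++)
        {
            var h = new double[H];
            for (int i = 0; i < H; i++)
            {
                double a = b[i] + U[i, inputs[t]];
                for (int j = 0; j < H; j++) a += W[i, j] * hs[t][j];
                h[i] = Math.Tanh(a);
            }
            hs[t + 1] = h;
            var p = new double[K]; // logits o(t), then softmax probabilities
            double max = double.NegativeInfinity;
            for (int k = 0; k < K; k++)
            {
                p[k] = c[k];
                for (int i = 0; i < H; i++) p[k] += V[k, i] * h[i];
                max = Math.Max(max, p[k]);
            }
            double sum = 0.0;
            for (int k = 0; k < K; k++) { p[k] = Math.Exp(p[k] - max); sum += p[k]; }
            for (int k = 0; k < K; k++) p[k] /= sum;
            ys[t] = p;
            loss -= Math.Log(p[targets[t]]);
        }

        // Backward pass (BPTT), from the last step to the first.
        var dU = new double[H, U.GetLength(1)];
        var dW = new double[H, H];
        var dV = new double[K, H];
        var db = new double[H];
        var dc = new double[K];
        var dhNext = new double[H]; // W^T da(t+1), flowing back from step t+1
        for (int t = T - 1; t >= 0; t--)
        {
            var dO = (double[])ys[t].Clone();
            dO[targets[t]] -= 1.0; // dL/do(t) = yhat(t) - onehot(target)
            var da = new double[H];
            for (int i = 0; i < H; i++)
            {
                double dh = dhNext[i];
                for (int k = 0; k < K; k++) dh += V[k, i] * dO[k];
                da[i] = dh * (1.0 - hs[t + 1][i] * hs[t + 1][i]); // tanh' = 1 - h^2
            }
            for (int k = 0; k < K; k++)
            {
                dc[k] += dO[k];
                for (int i = 0; i < H; i++) dV[k, i] += dO[k] * hs[t + 1][i];
            }
            dhNext = new double[H];
            for (int i = 0; i < H; i++)
            {
                db[i] += da[i];
                dU[i, inputs[t]] += da[i]; // x(t) is one-hot
                for (int j = 0; j < H; j++)
                {
                    dW[i, j] += da[i] * hs[t][j]; // hs[t] is h(t-1)
                    dhNext[j] += W[i, j] * da[i];
                }
            }
        }

        // Gradient-descent update with learning rate alpha.
        Update(U, dU, alpha);
        Update(W, dW, alpha);
        Update(V, dV, alpha);
        for (int i = 0; i < H; i++) b[i] -= alpha * db[i];
        for (int k = 0; k < K; k++) c[k] -= alpha * dc[k];
        return loss;
    }

    static void Update(double[,] p, double[,] g, double alpha)
    {
        for (int i = 0; i < p.GetLength(0); i++)
            for (int j = 0; j < p.GetLength(1); j++)
                p[i, j] -= alpha * g[i, j];
    }
}
```

Calling `TrainStep` repeatedly on the same sequence should drive the returned loss down. Practical RNN training code often also clips gradients to guard against exploding gradients; this sketch leaves that out.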
Conclusion
What a journey, right? We’ve learned a lot about the inner workings of Recurrent Neural Network models. More importantly, we’ve implemented the backpropagation algorithm. Hopefully, you now have a practical understanding of the processes involved in training a Recurrent Neural Network. Can you adapt the code to build a Deep Neural Network?
You can find the source code on my GitHub account.
You can find the video on my YouTube channel.