Creating Convolutional Neural Network from scratch using C#

Let’s build Convolutional Neural Network in C#!

The main objective of this post is to implement an CNN from scratch using c# and provide an easy explanation as well to make it useful for the readers.

convolutional neural network process

What is Convolutional Neural Network?

Convolutional neural networks are the workhorse behind a lot of the progress made in deep learning during the 2010s. These networks have revolutionized tasks such as image classification and object detection, but they also work remarkably well in other contexts such as text classification, speech recognition, or any domain where a filter can be used to detect similarities in regions of input data.

In this post I will go over how to build a basic CNN in from scratch using C#. This exercise goes into the nuts and bolts for how these networks actually work, it is impossible to understand what a convolutional neural network is actually doing at each step when all you have to do is type a few lines of code to create a CNN.


When Yann LeCun published his work on the development of a new kind of neural network architecture, the Convolutional Neural Network (CNN), his work went largely unnoticed. It took 14 years and a team of researchers from The University of Toronto to bring CNN’s into the public’s view during the 2012 ImageNet Computer Vision competition. Their entry, which they named AlexNet after chief architect Alex Krizhevsky, achieved an error of only 15.8% when tasked with classifying millions of images from thousands of categories. Fast forward to 2018 and the current state-of-the-art Convolutional Neural Networks achieve accuracies that surpass human-level performance.

Process of CNN

Using already existing models in ML/DL libraries might be helpful in some cases. But to have better control and understanding, you should try to implement them yourself. This article shows how a CNN is implemented just using C#.

Convolutional neural network (CNN) is the state-of-art technique for analyzing multidimensional signals such as images. There are different libraries that already implements CNN such as CNTK, TensorFlow and Keras. Such libraries isolates the developer from some details and just give an abstract API to make life easier and avoid complexity in the implementation. But in practice, such details might make a difference. Sometimes, the data scientist have to go through such details to enhance the performance. The solution in such situation is to build every piece of such model your own. This gives the highest possible level of control over the network. Also, it is recommended to implement such models to have better understanding over them.

In this article, CNN is created using only C# library. Five layers are created which are Convolution, ReLU, Max pooling , Flatten and Fully Connected. The major steps involved are as follows:


creating the inputs from image

Convolution Layer

This holds the raw pixel values of the training image as input. In the example above an image (deer) of width 32, height 32, and with three colour channels R, G, B is used. It goes through the forward propagation step and finds the output probabilities for each class. This layer ensures the spatial relationship between pixels by learning image features using small squares of input data.

The green section resembles our 5x5x1 input image, I. The element involved in carrying out the convolution operation in the first part of a Convolutional Layer is called the Kernel/Filter, K, represented in the color yellow.

We have selected Filter as a 3x3x1 matrix.

The filter used in the diagram could be used for sharpening an image , edge detection, blur and identity.

Rectified Linear Unit (ReLU) Layer

Relu activation layer gives you the non negative values.

ReLU stands for Rectified Linear Unit for a non-linear operation. The output is ƒ(x) = max(0,x).

There are other non linear functions such as tanh or sigmoid that can also be used instead of ReLU. Most of the data scientists use ReLU since performance wise ReLU is better than the other two.

Pooling Layer

Max pooling is used to pick the maximum value of the each filter size 2x2

Max pooling, the most common type of pooling, simply means taking the maximum value from a given array of numbers.

In this case, we split up the feature map into a bunch of n×n boxes and choose only the maximum value from each box. Here is what that looks like.

Flatten Layer

Flatten layer is used to convert the multi dimension to single dimension vector values.

flatten is the single line

Fully Connected Layer

Fully connected layer is similar to feed forward neural network.

The final layer of a convolution neural network is called the fully connected layer. This is a standard neural network .

It is applied to the dot product of an input and a matrix of weights. Then a softmax function can convert the output into a list of probabilities for classification.

Thanks for reading this post. you can follow me.

CTO @ Preline,Inc