Machine Learning made fun and easy (Part 2)

This post is the second part of Machine Learning made fun and easy; you can check out the first part in the link below:

Neural Networks

As we went through all the popular and effective algorithms before, it’s time to tackle the big boss: Neural Networks

Inspired by the brain’s neural functionality, neural networks were designed to simulate how the human brain works and moves information through the system, but how exactly???

Well, just as every human learns from repetition, mistakes and guidance (from learning how to walk as a baby to perfecting skills in your own domain), neural networks use the same principles to train models to be as smart as possible: “repeatedly” going through the data, correcting the “mistakes” at each iteration, while being “guided” by the parameters given at the start.

To put it even more simply:
Neural refers to the nodes/neurons and Network refers to the connections between those neurons.

At the start:
The neurons in the first column contain the input values of our data, while all the other neurons are at 0. Each connection (called a synapse) carries a “weight”, which is a random value at first, and each neuron carries a “bias”.

The weight represents how much that synapse contributes to the value of the next neuron.
The bias represents a favoritism mechanic that expresses how much the model leans towards a result (it’s like when you wanna pick vanilla or chocolate: your usual favorite flavor impacts your in-the-moment decision).

Neural Network architecture

Neurons are organized in layers: the first one is the input layer (number of neurons = number of inputs), the last one is the output layer, and the layers in between are called hidden layers, where all the calculations happen 😉.

Each neuron in a hidden layer is connected to the neurons of the layer before it (and the same goes for the output layer), which allows the information to flow through the whole neural network (like how information flows in our brains) and lets each neuron contribute to the whole system.
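To make the layered structure and the random starting weights concrete, here is a minimal NumPy sketch (the 3-4-2 layer sizes are just an arbitrary example chosen for illustration):

```python
import numpy as np

# A tiny example architecture: 3 inputs -> 4 hidden neurons -> 2 outputs.
layer_sizes = [3, 4, 2]

# One weight matrix per pair of consecutive layers, starting as random values,
# and one bias vector per layer (except the input layer), starting at zero.
weights = [np.random.randn(n_in, n_out) * 0.1
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

for i, (w, b) in enumerate(zip(weights, biases)):
    print(f"layer {i} -> layer {i + 1}: weights {w.shape}, bias {b.shape}")
```

Every entry in a weight matrix corresponds to one synapse between a neuron in one layer and a neuron in the next.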

When the flow goes in the direction from the input layer to the output layer, it’s called “feed forward”, as you keep feeding neurons with new data while going forward in the network.

When you go in the opposite direction, it’s called “back propagation”, as you use your findings at the end of the network to optimize the results (compare the output with the desired result, then go back and correct the weights and biases to adapt to what we want).

This process of feed forward and back propagation is how neural networks are trained.
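To get a feel for that cycle before we dig into the details, here is a toy example with a single linear neuron (no activation function yet); all the numbers are made up purely for illustration:

```python
import numpy as np

x = np.array([1.0, 2.0])   # input values
y_true = 3.0                # the desired output
w = np.array([0.5, -0.3])   # weights start as arbitrary values
b = 0.0                     # bias starts at zero
lr = 0.1                    # learning rate: how big each correction step is

for step in range(20):
    # Feed forward: flow from the input towards the output
    y_pred = np.dot(w, x) + b
    # Back propagation: use the mistake at the output to correct weights and bias
    error = y_pred - y_true
    w -= lr * error * x
    b -= lr * error

print(round(np.dot(w, x) + b, 3))  # after training, this lands very close to 3.0
```

Real networks have many neurons and many layers, but the rhythm is exactly this: predict, measure the mistake, correct, repeat.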

Example of a neural network

The feed forward process advances the data from one layer to the next by performing calculations until it reaches the output layer.

But what calculations exactly? 🤔

The new value X(new) of a neuron is calculated by:
-Taking the value of each neuron X in the previous layer and multiplying it by the weight W of its synapse
-Summing up all of those products
-Then adding the bias value B to that sum, which gives us the value we seek (a small worked sketch follows below)

Then just do this for all the neurons of the network and your feed forward is done 😉.

Calculation of perceptron values within the neural network
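Here is that calculation for a single neuron, as a minimal sketch with made-up numbers:

```python
import numpy as np

x_prev = np.array([0.2, 0.7, 0.1])   # values of the neurons in the previous layer
w = np.array([0.4, -0.6, 0.9])       # weight of each incoming synapse
b = 0.05                              # bias of the neuron

# X(new) = sum of (each previous value * its weight) + bias
x_new = np.dot(w, x_prev) + b
print(x_new)  # 0.2*0.4 + 0.7*(-0.6) + 0.1*0.9 + 0.05 ≈ -0.2
```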

That is how it works mathematically speaking, but in practice there is another element that interferes with this equation: the activation function.

Looking back at us humans, the brain passes along the “right” information by firing specific neurons in a specific order, which then produces the signal we desired.

The same analogy goes for neural nets: the network decides which neurons to fire after performing the calculations we discussed previously. But since the calculated values are not bound to any particular range, what if they all fire at once or something 😲😲😲?

This is where the activation function comes into play, keeping the values generated by each neuron (post calculations) within a certain range so that only certain neurons fire, “activating” the desired ones.
Without these functions, the whole neural network collapses into a linear function, which means it cannot solve complex problems (that’s the answer to why 😉)

Let’s understand by seeing what some activation functions do (a code sketch of these follows the figure below):
-Sigmoid: maps values to probabilities (values between 0 and 1), good for binary classification.
-Softmax: a more generalized logistic function, good for multiclass classification.
-TanH: while sigmoid’s range is 0 to 1, TanH’s is between -1 and 1; the advantage is that negative inputs are mapped strongly negative and zero inputs are mapped near zero.
-ReLU: maps negative values to 0 while keeping the positive ones as they are; it’s the most used function due to its computational efficiency.
-Linear: A = b·x, used mostly in the output layer of regression problems.

Different activation functions
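For reference, here is what those functions look like as a minimal NumPy sketch (written by hand, no framework assumed):

```python
import numpy as np

def sigmoid(x):
    # squashes any value into the (0, 1) range
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # turns a vector of scores into probabilities that sum to 1
    e = np.exp(x - np.max(x))   # subtract the max for numerical stability
    return e / e.sum()

def tanh(x):
    # squashes any value into the (-1, 1) range
    return np.tanh(x)

def relu(x):
    # keeps positive values, maps negative ones to 0
    return np.maximum(0.0, x)

def linear(x, b=1.0):
    # identity-style activation (scaled by b), often used for regression outputs
    return b * x

values = np.array([-2.0, 0.0, 3.0])
print(sigmoid(values), tanh(values), relu(values), softmax(values), sep="\n")
```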

So far we have explained the feed forward process in general terms, but what about back propagation? How do we make sure the weights are updated well?

Here comes the concept of the “Loss Function”, also called the error function or cost function (too many nicknames 😂), which can be explained as simply as:
-It is calculated from the difference between the “desired” output and the “predicted” output
-The “gradient” of this function is used to update the weights in each back propagation phase (also called optimizing the weights; see the small sketch below)

The goal of having this function is to keep “minimizing the loss” in the network, which, in turn, increases the accuracy of the model.
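In practice, “using the gradient to update the weights” usually means a gradient descent step like this one (the weight value, gradient and learning rate here are made-up examples):

```python
learning_rate = 0.01   # small step size that you pick
old_weight = 0.8       # example current value of one weight
gradient = 2.5         # example gradient of the loss with respect to that weight

# Step the weight a little bit in the direction that reduces the loss
new_weight = old_weight - learning_rate * gradient
print(new_weight)      # 0.775
```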

The most popular/used loss functions are listed below (with a small code sketch after the list):
-MSE (Mean Squared Error): used for regression problems; it takes the mean of the squared differences between the predicted and target values.
-Binary Crossentropy: used for binary classification; it uses the log function to compare the predicted and target values, and should always be preceded by a sigmoid activation to guarantee values between 0 and 1.
-Categorical Crossentropy: used for multiclass classification; same principle as before, but with softmax as the activation function of the output layer.
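Here is a minimal hand-written NumPy sketch of those three losses (in a real project you would use the versions built into your framework):

```python
import numpy as np

def mse(y_pred, y_true):
    # mean of the squared differences between predictions and targets
    return np.mean((y_pred - y_true) ** 2)

def binary_crossentropy(y_pred, y_true, eps=1e-7):
    # y_pred is expected to come out of a sigmoid (values between 0 and 1)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def categorical_crossentropy(y_pred, y_true, eps=1e-7):
    # y_pred is expected to come out of a softmax, y_true is one-hot encoded
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=-1))

print(mse(np.array([2.5, 0.0]), np.array([3.0, -0.5])))                     # 0.25
print(binary_crossentropy(np.array([0.9, 0.2]), np.array([1.0, 0.0])))      # ~0.16
print(categorical_crossentropy(np.array([[0.7, 0.2, 0.1]]),
                               np.array([[1.0, 0.0, 0.0]])))                # ~0.36
```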

Conclusion

There is still a whole ocean of things to be explained in the universe of machine learning, but as long as you get a solid foundation at the start, everything becomes manageable with some effort and perseverance 😉.
