Deep Learning / Machine Learning / Artificial Intelligence
Currently, AI is advancing at a remarkable pace, and deep learning is one of the most important contributors to it. So, in the second post of the 101 for Dummies Like Me series, I'll take you through the fundamentals of deep learning. You'll find the first post here, where we talked about the fundamentals of PyTorch, inspired by Intro to Deep Learning with PyTorch from Udacity. Some of the images in this post are taken from the Udacity Deep Learning Nanodegree, which is a great starting point for beginners. In recent years, applications of deep learning have made huge advancements in many domains, astonishing people who didn't expect the technology and the world to change so fast. Refer to the image below for a high-level comparison of AI vs. ML vs. DL.
The hype started in March 2016 when Lee Sedol, the 18-time world champion, was beaten 4 to 1 by AlphaGo. This match had an enormous influence on the Go community, as AlphaGo invented completely new moves that made people try to understand and reproduce them, creating an entirely new perspective on how to play the game. But that's not all: in 2017 DeepMind introduced AlphaGo Zero. The newer version of an already unbeatable machine was able to learn everything without any starting data or human help, all with four times less computational power than its predecessor!
What is Deep Learning all about?
Deep learning is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data, using artificial neural networks inspired by the structure and function of the brain.
In a simple case, consider the image on your left, where you have some sets of neurons: the leftmost layer of the network is called the input layer (L1), and the rightmost layer the output layer (L3) (which, in this example, has just one node). The middle layer of nodes is called the hidden layer (L2) because its values aren't observed in the training set. We also say that our example neural network has 3 input units (not counting the bias unit), 3 hidden units, and 1 output unit. In a deep network, there are many layers between the input and output (and the layers aren't made of neurons, but it can help to think of them that way), allowing the algorithm to use multiple processing layers composed of multiple linear and non-linear transformations.
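The 3-3-1 network described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a trained model: the random weights and the input values are made up for the example.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the (0, 1) range.
    return 1.0 / (1.0 + np.exp(-z))

# 3 input units -> 3 hidden units -> 1 output unit
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 3)), np.zeros(3)   # input layer (L1) -> hidden layer (L2)
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)   # hidden layer (L2) -> output layer (L3)

x = np.array([0.5, -1.2, 3.0])   # one example with 3 features
hidden = sigmoid(x @ W1 + b1)    # 3 hidden unit activations
output = sigmoid(hidden @ W2 + b2)  # single output node

print(hidden.shape, output.shape)   # (3,) (1,)
```

Each `@` is exactly the "multiply inputs by weights, add a bias, apply an activation" step that every layer performs.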
Sidenote from biology: the human brain has many neurons arranged in a hierarchy, a network of neurons interconnected via axons that pass electrical signals from one layer to another through synapses. This is how we humans learn things. Whenever we see, hear, feel, or think something, a synapse (electrical impulse) is fired from one neuron to another in the hierarchy, which enables us to learn, remember, and memorize things in our daily lives since the day we were born.
A neural network having more than one hidden layer is generally referred to as a Deep Neural Network.
How does Deep Learning differ from Traditional Machine Learning?
One of the main differences between machine learning and deep learning models lies in feature extraction. In machine learning, feature extraction is done by a human, whereas a deep learning model figures out the features by itself.
Deep learning models tend to keep improving as the amount of data grows, whereas traditional machine learning models stop improving after a saturation point.
Machine learning and deep learning algorithms take different problem-solving approaches: a machine learning algorithm breaks the problem into different levels, solves the problem at each level, and then combines the solutions of all levels to solve the whole problem, while in deep learning the problem is solved end-to-end as a whole.
Machine learning algorithms work with crisp, interpretable rules while deep learning doesn't, i.e., result interpretation is easier in machine learning, while deep learning lacks this interpretability.
In general, the training time of deep learning algorithms is high due to the large number of parameters they contain, whereas machine learning comparatively takes less time to train. This is reversed at testing time: the testing time for machine learning is higher than for deep learning.
What are activation functions all about?
Activation functions are really important for an artificial neural network to learn and make sense of something really complicated: the non-linear, complex functional mappings between the inputs and the response variable. They introduce non-linear properties to our network. Their main purpose is to convert the input of a node in an ANN to an output, which is then used as an input in the next layer in the stack.
One of the simplest activation functions is the Heaviside step function. This function returns 0 if the linear combination is less than 0, and returns 1 if the linear combination is positive or equal to zero.
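As a quick sketch, here is the Heaviside step function applied to a perceptron; the weights, bias, and input are arbitrary values chosen for illustration:

```python
import numpy as np

def heaviside(z):
    # Returns 1 if the linear combination is >= 0, else 0.
    return np.where(z >= 0, 1, 0)

w = np.array([1.0, -2.0])   # example weights (made up)
b = 0.5                     # example bias
x = np.array([2.0, 1.0])    # example input

z = np.dot(w, x) + b        # linear combination: 2.0 - 2.0 + 0.5 = 0.5
print(heaviside(z))         # 1, because z >= 0
```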
Specifically, in an ANN we take the sum of the products of the inputs (X) and their corresponding weights (W), apply an activation function f(x) to it to get the output of that layer, and feed it as an input to the next layer. Other activation functions you'll see are the logistic (often called the sigmoid), tanh, and softmax functions.
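The three activations just mentioned can be sketched directly in NumPy (tanh ships with NumPy; the others follow their standard formulas):

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Converts a vector of scores into probabilities that sum to 1.
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])
print(sigmoid(0.0))      # 0.5
print(np.tanh(0.0))      # 0.0
print(softmax(z))        # three probabilities summing to 1
```

Sigmoid suits single probabilities, tanh centers outputs around zero, and softmax is the usual choice for a multi-class output layer.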
What is the error and loss function?
In most learning networks, the error is calculated as the difference between the actual output and the predicted output.
The function used to compute this error is known as the loss function (J). Different loss functions will give different errors for the same prediction, and thus have a considerable effect on the performance of the model. One of the most widely used loss functions is mean squared error (MSE), which calculates the square of the difference between the actual value and the predicted value. Different loss functions are used for different kinds of tasks, i.e., regression and classification. Thus, loss functions are essential for training a neural network: given an input and a target, they calculate the loss, i.e., the difference between the output and the target variable. Loss functions fall into three major categories:
Regressive loss functions, which are used in regression problems, that is, when the target variable is continuous. Some examples are: Mean Squared Error, Absolute Error, and Smooth Absolute Error.
Classification loss functions, used when the target variable y is a binary variable, 1 for true and -1 for false. Some examples are: Binary Cross-Entropy, Negative Log-Likelihood, Margin Classifier, and Soft Margin Classifier.
Embedding loss functions, which deal with problems where we have to measure whether two inputs are similar or dissimilar. Some examples are:
1. L1 Hinge Error: calculates the L1 distance between two inputs.
2. Cosine Error: calculates the cosine distance between two inputs.
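To make the three categories concrete, here is a small sketch with one loss from each family: MSE for regression, binary cross-entropy for classification (using the common 0/1 label convention), and cosine distance for embeddings. The formulas follow the standard definitions; the sample values are invented.

```python
import numpy as np

def mse(y_true, y_pred):
    # Regressive loss: mean of squared differences.
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, p):
    # Classification loss: y_true in {0, 1}, p is the predicted probability of class 1.
    eps = 1e-12                      # avoid log(0)
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def cosine_distance(a, b):
    # Embedding loss: 1 - cosine similarity; 0 for identical directions.
    return 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

y, y_hat = np.array([1.0, 2.0, 3.0]), np.array([1.5, 2.0, 2.5])
reg_loss = mse(y, y_hat)                                       # (0.25 + 0 + 0.25) / 3
cls_loss = binary_cross_entropy(np.array([1, 0]), np.array([0.9, 0.2]))
emb_loss = cosine_distance(np.array([1.0, 0.0]), np.array([1.0, 0.0]))
print(reg_loss, cls_loss, emb_loss)
```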
How did Gradient Descent come into play?
“A gradient measures how much the output of a function changes if you change the inputs a little bit.” — Lex Fridman (MIT)
Gradient descent is an optimization algorithm used to find the values of the parameters (coefficients) of a function (f) that minimize a cost function. Gradient descent is best used when the parameters cannot be calculated analytically (e.g., using linear algebra) and must be searched for by an optimization algorithm. In other words, gradient descent is used to find the minimum error by minimizing a "cost" function.
Stochastic Gradient Descent performs a parameter update for every training example, unlike standard Gradient Descent, which performs just one update per pass over the data. It is thus much faster. Gradient descent algorithms can be further improved by tuning important parameters like momentum (which determines the speed with which the learning rate has to be increased as we approach the minima) and the learning rate (a hyper-parameter that controls how much we adjust the weights of our network with respect to the loss gradient).
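The update rule itself fits in a few lines. Here is a bare-bones sketch minimizing a one-parameter cost J(w) = (w - 3)^2, whose gradient is 2(w - 3); the starting point, learning rate, and iteration count are arbitrary choices:

```python
def grad(w):
    # Derivative of the cost J(w) = (w - 3) ** 2 with respect to w.
    return 2 * (w - 3)

w = 0.0     # initial guess
lr = 0.1    # learning rate (hyper-parameter)
for _ in range(100):
    w -= lr * grad(w)   # step against the gradient, downhill on the cost

print(round(w, 4))  # converges towards the minimum at w = 3
```

SGD follows the same rule but computes the gradient from a single example (or a mini-batch) at a time instead of the full dataset.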
What does the lifecycle of a typical Deep Learning application look like?
The lifecycle of a typical (supervised) deep learning application consists of various steps, starting from raw data and ending with predictions in the wild.
- In neural networks, you forward propagate to get the output and compare it with the real value to get the error. Then, to minimize the error, you propagate backward by finding the derivative of the error with respect to each weight and subtracting this value from the weight. This is called Back Propagation; read this post to understand Backpropagation in 5 minutes.
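The forward-then-backward loop above can be sketched for a single sigmoid neuron trained on one example. The derivative follows the chain rule for a squared-error loss; the data point, learning rate, and iteration count are made up for the demo.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, target = np.array([1.0, 0.5]), 1.0   # one training example (invented)
w, b, lr = np.zeros(2), 0.0, 0.5

for _ in range(200):
    # Forward pass: compute the prediction.
    y = sigmoid(np.dot(w, x) + b)
    # Backward pass: dE/dz for E = (y - target)**2 / 2,
    # chained through the sigmoid derivative y * (1 - y).
    delta = (y - target) * y * (1 - y)
    # Update: subtract the derivative of the error w.r.t. each weight.
    w -= lr * delta * x
    b -= lr * delta

print(sigmoid(np.dot(w, x) + b))  # prediction has moved close to the target of 1.0
```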
- Regularisation is a technique used to solve the over-fitting problem. Over-fitting happens when the model learns the training data too closely, including its noise, and fails to generalize to new data.
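One common form is L2 regularisation (weight decay), which adds a penalty proportional to the squared weights to the loss, discouraging the model from fitting noise with large weights. A minimal sketch; the lambda value and sample numbers are arbitrary:

```python
import numpy as np

def l2_regularized_loss(y_true, y_pred, weights, lam=0.01):
    # Data term (MSE) plus an L2 penalty that shrinks large weights.
    data_term = np.mean((y_true - y_pred) ** 2)
    penalty = lam * np.sum(weights ** 2)
    return data_term + penalty

w = np.array([0.5, -1.0])
loss = l2_regularized_loss(np.array([1.0]), np.array([0.8]), w)
print(loss)  # 0.04 (MSE) + 0.01 * 1.25 (penalty) = 0.0525
```

During training, minimizing this combined loss trades a little training accuracy for smaller weights, which usually generalize better.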