# Deep Learning / Machine Learning / Artificial Intelligence

Currently, **AI** is advancing at a remarkable pace, and deep learning is one of the most important contributors to it. So, in the 2nd post of the 101 for Dummies like Me series, I’ll take you through the fundamentals of deep learning. You’ll find the first post here, where we talked about the fundamentals of PyTorch, which was inspired by Intro to Deep Learning with PyTorch from Udacity. Some of the images in this post are taken from the Udacity Deep Learning Nanodegree, which is a great starting point for beginners. In recent years, applications of deep learning have made huge advancements in many domains, arousing astonishment in people who didn’t expect the technology and the world to change so fast. Refer to the image below for a high-level comparison of AI vs. ML vs. DL.

The hype started in March 2016 when Lee Sedol, the 18-time world champion, was beaten 4 to 1 by the super-computer AlphaGo. This match had an enormous influence on the Go community, as AlphaGo invented completely new moves that made people try to understand and reproduce them, and it created a completely new perspective on how to play the game. But that’s not all: in 2017 DeepMind introduced AlphaGo Zero. The newer version of an already unbeatable machine was able to learn everything without any starting data or human help, all with 4 times less computational power than its predecessor!

**What is Deep Learning all about?**

Deep learning is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data, inspired by the structure and function of the brain, called artificial neural networks.

In a simple case, consider the image on your left, where you have some sets of neurons: the leftmost layer of the network is called the input layer (L1), and the rightmost layer the output layer (L3) (which, in this example, has just one node). The middle layer of nodes is called the hidden layer (L2), because its values aren’t observed in the training set. We also say that our example neural network has 3 input units (not counting the bias unit), 3 hidden units, and 1 output unit. Similarly, if it’s a deep network, there are many layers between the input and output (and the layers aren’t made of neurons, but it can help to think of them that way), allowing the algorithm to use multiple processing layers, composed of multiple linear and non-linear transformations.
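To make this concrete, here is a minimal sketch in plain Python of the 3-3-1 network described above. The weights and inputs are made-up numbers chosen only for illustration, and the sigmoid activation is one common choice (activations are covered later in this post):

```python
import math

def sigmoid(x):
    """Squash a value into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """Compute sigmoid(W·x + b) for one fully connected layer."""
    return [sigmoid(sum(w * i for w, i in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

# 3 input units -> 3 hidden units -> 1 output unit (weights are arbitrary)
x = [0.5, -1.0, 2.0]                 # input layer L1
W1 = [[0.1, 0.4, -0.2],              # one row of weights per hidden unit
      [-0.3, 0.2, 0.5],
      [0.25, -0.5, 0.1]]
b1 = [0.1, 0.0, -0.1]
W2 = [[0.3, -0.6, 0.8]]              # a single output unit
b2 = [0.05]

hidden = layer(x, W1, b1)            # hidden layer L2
output = layer(hidden, W2, b2)       # output layer L3
print(output)                        # a single value in (0, 1)
```

Each layer is just a matrix of weights plus a bias, passed through a non-linearity; stacking more such layers between input and output is what makes the network “deep”.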

Sidenote from Biology: our human brain has billions of neurons in a hierarchy, a network of neurons interconnected with one another via axons, passing electrical signals from one layer to another through synapses. This is how we humans learn things. Whenever we see, hear, feel or think something, a synapse (electrical impulse) is fired from one neuron to another in the hierarchy, which enables us to learn, remember and memorize things in our daily life since the day we were born.

**A neural network having more than one hidden layer is generally referred to as a Deep Neural Network**.

**How does Deep Learning differ from Traditional Machine Learning?**

One of the main differences between machine learning and deep learning models is in the feature extraction step. Feature extraction is done by humans in machine learning, whereas a deep learning model figures out the features by itself.

Deep learning models tend to keep improving as the amount of data grows, whereas older machine learning models stop improving after a saturation point.

Machine learning algorithms and deep learning algorithms have different problem-solving approaches. A machine learning pipeline breaks the problem into different levels where, at each level, a sub-problem is solved, and then the solutions of the levels are combined to form the solution of the whole problem, while in deep learning the problem is solved end-to-end as a whole.

Machine learning algorithms produce crisp, interpretable rules while deep learning doesn’t, i.e. result interpretation is easier in machine learning, while deep learning lacks this quality.

In general, the training time of deep learning algorithms is high due to the presence of so many parameters, whereas machine learning comparatively takes less time for training. This is reversed at testing time: the testing time for machine learning is higher than for deep learning.

**What are activation functions all about?**

Activation functions are really important for an Artificial Neural Network to learn and make sense of something really complicated: the non-linear complex functional mappings between the inputs and the response variable. They introduce non-linear properties to our network. Their main purpose is to convert the **input** of a node in an A-NN to an **output**. That output is then used as an input to the next layer in the stack.

One of the simplest activation functions is the Heaviside step function. This function returns a 0 if the linear combination is less than 0. It returns a 1 if the linear combination is positive or equal to zero.

Specifically, in an A-NN we take the sum of products of inputs (X) and their corresponding weights (W) and apply an activation function f(x) to it to get the output of that layer, which is fed as an input to the next layer. Other activation functions you’ll see are the logistic (often called the sigmoid), tanh, and softmax functions.
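A small sketch of the activation functions just mentioned, in plain Python, with an arbitrary example of the sum-of-products step (the inputs and weights are made up for illustration):

```python
import math

def step(z):
    """Heaviside step: 0 if the linear combination is negative, else 1."""
    return 0 if z < 0 else 1

def sigmoid(z):
    """Logistic (sigmoid) function: smooth, outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Hyperbolic tangent: outputs in (-1, 1)."""
    return math.tanh(z)

def softmax(zs):
    """Turn a vector of scores into probabilities that sum to 1."""
    m = max(zs)                        # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

# f(W·X) for a single node: sum of products of inputs and weights
X, W = [1.0, 2.0, -1.0], [0.5, -0.25, 0.75]
z = sum(w * x for w, x in zip(W, X))   # linear combination = -0.75
print(step(z))                         # 0, since z < 0
print(sigmoid(z), tanh(z))
print(softmax([1.0, 2.0, 3.0]))        # three probabilities summing to 1
```

Note how the step function gives a hard 0/1 decision, while sigmoid and tanh give smooth, differentiable versions of it, which is what gradient-based training needs.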

**What is the error and loss function?**

In most learning networks, the error is calculated as the difference between the actual output and the predicted output.

The function that is used to compute this error is known as the Loss Function (J). Different loss functions will give different errors for the same prediction, and thus have a considerable effect on the performance of the model. One of the most widely used loss functions is mean square error (MSE), which calculates the square of the difference between the actual value and the predicted value. Different loss functions are used for different types of tasks, i.e. regression and classification. Thus, loss functions are what we use to train a neural network: given an input and a target, they calculate the loss, i.e. the difference between the output and the target variable. Loss functions fall into three major categories:

Regressive loss functions, which are used in the case of regressive problems, that is, when the target variable is continuous. Some examples are: Mean Square Error, Absolute Error & Smooth Absolute Error.

Classification loss functions, used when the target variable y is a binary variable, 1 for true and -1 for false. Some examples are: Binary Cross Entropy, Negative Log Likelihood, Margin Classifier & Soft Margin Classifier.

Embedding loss functions, which deal with problems where we have to measure whether two inputs are similar or dissimilar. Some examples are:

1. L1 Hinge Error- Calculates the L1 distance between two inputs.

2. Cosine Error- Cosine distance between two inputs.
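Here is a minimal sketch of one loss from each category, in plain Python. The sample values are made up for illustration, and binary cross entropy is written in its common {0, 1}-target form (the margin-style losses above use {1, -1} targets instead):

```python
import math

def mse(actual, predicted):
    """Regressive loss: mean of squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def binary_cross_entropy(targets, probs):
    """Classification loss for targets in {0, 1} and predicted probabilities."""
    eps = 1e-12                        # avoid log(0)
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for t, p in zip(targets, probs)) / len(targets)

def cosine_distance(u, v):
    """Embedding loss: 1 - cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

print(mse([3.0, 5.0], [2.5, 5.5]))             # 0.25
print(binary_cross_entropy([1, 0], [0.9, 0.2]))
print(cosine_distance([1, 0], [0, 1]))         # 1.0 (orthogonal vectors)
```

Notice that all three reduce a prediction and a target to a single number, which is exactly what gradient descent (next section) needs to minimize.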

**How did Gradient Descent come into play?**

> *“A **gradient** measures how much the output of a function changes if you change the inputs a little bit.”* — **Lex Fridman (MIT)**

Gradient descent is an optimization algorithm used to find the values of the parameters (coefficients) of a function (f) that minimize a cost function (cost). Gradient descent is best used when the parameters can’t be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm. Gradient descent is used to find the minimum error by minimizing a “cost” function.
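A minimal sketch of the idea in plain Python, using a toy one-parameter cost function rather than a full network (the cost J(w) = (w − 3)² and the learning rate are chosen only for illustration):

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to minimize a cost function."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * grad(w)   # update rule: w := w - lr * dJ/dw
    return w

# Toy cost J(w) = (w - 3)^2, whose gradient is dJ/dw = 2 * (w - 3)
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(round(w_min, 4))                 # converges toward 3.0, the minimum
```

In a real network, `w` is the whole set of weights, the gradient comes from backpropagation, and the cost is one of the loss functions from the previous section; the update rule is the same.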

Stochastic Gradient Descent performs a parameter update for every training example, unlike normal Gradient Descent which performs just one update per pass over the dataset. Thus it’s much faster. Gradient descent algorithms can further be improved by tuning important parameters like momentum (which accumulates past gradients to speed up updates in directions that are consistently downhill) and the learning rate (a hyper-parameter that controls how much we adjust the weights of our network with respect to the loss gradient), etc.

**The lifecycle of a typical Deep Learning application?**

The lifecycle of a typical (supervised) deep learning application consists of several steps, starting from data collection and ending with predictions in the wild.

# Reference:

- In neural networks, you **Forward Propagate** to get the output and compare it with the real value to get the error. Then, to minimize the error, you propagate backward, finding the derivative of the error with respect to each weight and subtracting this value from the weight value. This is called **Back Propagation**.
- Regularisation is the technique used to solve the over-fitting problem. Over-fitting happens when the model is biased to one type of dataset.
- Ian Goodfellow, Yoshua Bengio, and Aaron Courville (2016). Deep Learning. MIT Press. Online.
- Applications of Deep Learning
- https://towardsdatascience.com/deep-learning-101-for-dummies-like-me-a53e3caf31b1
