# Week 3: Computational Basics – Numerical computation and optimization, Introduction to Machine Learning packages

• How machines store numbers
• Machine only stores numbers to a finite precision
• The floating-point format: ±S × b^e, where S is the significand (mantissa), b the base, and e the exponent
• In the 64-bit IEEE 754 standard (double precision): 1 sign bit, 11-bit biased exponent, 52-bit mantissa
• Underflow and overflow
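The effects above can be seen directly in Python, whose `float` is the IEEE 754 64-bit type. A minimal sketch (the particular values are illustrative):

```python
import sys

# Finite precision: 0.1 + 0.2 is not exactly 0.3 in binary floating point
print(0.1 + 0.2 == 0.3)        # False
print(abs((0.1 + 0.2) - 0.3))  # small nonzero rounding error

# Limits of the 64-bit IEEE 754 format
print(sys.float_info.max)      # largest finite double, ~1.8e308
print(sys.float_info.min)      # smallest positive normal double, ~2.2e-308

# Overflow: exceeding the largest representable value gives inf
print(sys.float_info.max * 2)  # inf

# Underflow: values far below the smallest representable magnitude become 0.0
print(sys.float_info.min * 1e-20)  # 0.0
```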
• Condition number
• The condition number of a function with respect to an argument measures how much the output value of the function can change for a small change in the input argument.
• For symmetric matrices, the condition number is the ratio of the largest to smallest eigenvalue in absolute value.
• cond(A) = ‖A‖ ‖A⁻¹‖, where A is a square non-singular matrix.
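A quick NumPy illustration of these definitions (the matrices are illustrative examples, not from the lecture):

```python
import numpy as np

# Well-conditioned matrix: small input changes cause small output changes
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])
print(np.linalg.cond(A))  # 2.0 = ratio of largest to smallest singular value

# Nearly singular matrix: huge condition number, solutions are unreliable
B = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
print(np.linalg.cond(B))  # very large (~4e4)

# cond(A) = ||A|| * ||A^-1||, here in the 2-norm
print(np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2))  # 2.0
```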
• Derivative
• The derivative of a function of a real variable measures the sensitivity to change of the function value with respect to a change in its argument.
• Partial derivative of a function of several variables is its derivative with respect to one of those variables, with the others held constant.
• The gradient is a multi-variable generalization of the derivative. While a derivative can be defined on functions of a single variable, for functions of several variables, the gradient takes its place. The gradient is a vector-valued function, as opposed to a derivative, which is scalar-valued.
• The gradient of a function ƒ, denoted as ∇ƒ, is the collection of all its partial derivatives into a vector.
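The "partial derivative with the others held constant" idea translates directly into a finite-difference sketch. A minimal example, using an illustrative function f(x, y) = x² + 3y² (not from the lecture):

```python
import numpy as np

def f(x):
    # f(x, y) = x^2 + 3y^2, a scalar-valued function of two variables
    return x[0]**2 + 3 * x[1]**2

def numerical_gradient(f, x, h=1e-6):
    # Central differences: perturb one coordinate at a time,
    # holding the others constant (partial derivatives), then
    # collect the results into a vector (the gradient).
    grad = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

x = np.array([1.0, 2.0])
print(numerical_gradient(f, x))  # close to the analytic gradient [2x, 6y] = [2, 12]
```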
• Hessian
• It is a square matrix of second-order partial derivatives of a scalar-valued function.
• Jacobian
• the Jacobian matrix is the matrix of all first-order partial derivatives of a vector-valued function.
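The same finite-difference idea extends column by column to the Jacobian of a vector-valued function. A sketch with an illustrative F: ℝ² → ℝ² (my example, not the lecture's):

```python
import numpy as np

def F(x):
    # Vector-valued function F(x, y) = (x^2 y, 5x + sin y)
    return np.array([x[0]**2 * x[1], 5 * x[0] + np.sin(x[1])])

def jacobian(F, x, h=1e-6):
    # J[i, j] = dF_i / dx_j, estimated by central differences
    n, m = len(F(x)), len(x)
    J = np.zeros((n, m))
    for j in range(m):
        e = np.zeros(m)
        e[j] = h
        J[:, j] = (F(x + e) - F(x - e)) / (2 * h)
    return J

x = np.array([1.0, 2.0])
print(jacobian(F, x))
# Analytic Jacobian: [[2xy, x^2], [5, cos y]] = [[4, 1], [5, cos(2)]]
```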
• Taylor Series
• A Taylor series is a representation of a function as an infinite sum of terms that are calculated from the values of the function’s derivatives at a single point.
• A function can be approximated by using a finite number of terms of its Taylor series.
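A small sketch of a truncated Taylor series, using e^x expanded about 0 as an illustrative example: the approximation error shrinks as more terms are kept.

```python
import math

def exp_taylor(x, n_terms):
    # e^x ≈ sum_{k=0}^{n_terms-1} x^k / k!, the Taylor series about 0
    return sum(x**k / math.factorial(k) for k in range(n_terms))

x = 1.0
for n in (2, 5, 10):
    approx = exp_taylor(x, n)
    print(n, approx, abs(approx - math.exp(x)))  # error shrinks as terms are added
```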
• Optimization
• The general optimization task is to maximize or minimize a function f(x) by varying x.
• The function f(x) is called the objective function, cost function, or loss function.
• Any maximization problem can be written as minimization of -f(x).
• Unconstrained problem
• Find x that minimizes f(x) with x ∊ ℝ
• Any local extremum will have the property f ′(x)=0
• Such points are called stationary points or critical points
• The stationary points may be a (local) minimum, maximum or saddle point.
• If f ′′(x)>0, it is a local minimum.
• If f ′′(x)<0,it is a local maximum.
• If f ′′(x)=0, it could be a saddle point.
• The absolute lowest/highest value of f(x) is called the global minimum/maximum.
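The second-derivative test above can be walked through on a concrete (illustrative) example, f(x) = x³ − 3x, whose stationary points solve f′(x) = 3x² − 3 = 0, i.e. x = ±1:

```python
# Classify the stationary points of f(x) = x^3 - 3x.
# f'(x) = 3x^2 - 3 = 0 at x = -1 and x = 1;  f''(x) = 6x.
def f_pp(x):
    return 6 * x

for x in (-1.0, 1.0):
    if f_pp(x) > 0:
        kind = "local minimum"      # f'' > 0
    elif f_pp(x) < 0:
        kind = "local maximum"      # f'' < 0
    else:
        kind = "inconclusive (possible saddle point)"  # f'' = 0
    print(x, kind)  # -1: local maximum, 1: local minimum
```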
• Optimization - Multivariate x
• Since x is now a vector quantity, we need to evaluate the gradient.
• The type of critical point is decided by the nature of the Hessian.
• If H is positive definite, it is a local minimum.
• If H is negative definite, it is a local maximum.
• If H is indefinite (neither positive nor negative definite, i.e. it has both positive and negative eigenvalues), it is a saddle point.
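Since a symmetric Hessian is positive (negative) definite exactly when all its eigenvalues are positive (negative), the classification rules above can be sketched as an eigenvalue check (the example matrices are illustrative):

```python
import numpy as np

def classify(H):
    # Decide the type of critical point from the eigenvalue signs
    # of a symmetric Hessian H.
    eig = np.linalg.eigvalsh(H)
    if np.all(eig > 0):
        return "local minimum (positive definite)"
    if np.all(eig < 0):
        return "local maximum (negative definite)"
    if np.any(eig > 0) and np.any(eig < 0):
        return "saddle point (indefinite)"
    return "inconclusive (semi-definite)"

print(classify(np.array([[2.0, 0.0], [0.0, 3.0]])))    # local minimum
print(classify(np.array([[-2.0, 0.0], [0.0, -1.0]])))  # local maximum
print(classify(np.array([[2.0, 0.0], [0.0, -3.0]])))   # saddle point
```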
• Constrained optimization
• The general constrained optimization task is to maximize or minimize a function f(x) by varying x given certain constraints on x.
• All constraints can be converted to two types of constraints:
• Equality constraints
• Inequality constraints
• Canonical form of optimization problems.
• Generalized Lagrange function
• It is a strategy for finding the local maxima and minima of a function subject to equality constraints.
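A minimal worked sketch of the Lagrange-function idea, on an assumed example (not from the lecture): minimize f(x, y) = x² + y² subject to the equality constraint x + y = 1. Here the stationarity conditions of the Lagrangian happen to be linear, so NumPy can solve them directly:

```python
import numpy as np

# Lagrangian: L(x, y, λ) = x^2 + y^2 - λ(x + y - 1)
# Stationarity conditions:
#   ∂L/∂x = 2x - λ = 0
#   ∂L/∂y = 2y - λ = 0
#   ∂L/∂λ = -(x + y - 1) = 0
# These form a linear system in (x, y, λ):
A = np.array([[2.0, 0.0, -1.0],
              [0.0, 2.0, -1.0],
              [1.0, 1.0,  0.0]])
b = np.array([0.0, 0.0, 1.0])
x, y, lam = np.linalg.solve(A, b)
print(x, y, lam)  # 0.5 0.5 1.0 — the constrained minimum is at (1/2, 1/2)
```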
• Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient of the function at the current point.
• It is possible for the gradient descent algorithm to:
• Diverge
• Oscillate without diverging or converging
• Converge slowly
• Converge rapidly
• When should iteration stop?
1. Decide on a learning rate (α), stopping precision, and stopping criteria.
2. Make an initial guess for w = w0
3. Calculate w_{k+1} = w_k − α ∇J
4. If the stopping criteria are not satisfied, go to step 3
5. Stop
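The steps above can be sketched in a few lines of Python. The objective J(w) = (w − 3)², the learning rate, and the stopping values are illustrative choices:

```python
# Gradient descent on J(w) = (w - 3)^2, with gradient ∇J = 2(w - 3).
alpha = 0.1        # step 1: learning rate
tol = 1e-8         # step 1: stopping precision on the gradient magnitude
max_iters = 1000   # step 1: cap on iterations as a stopping criterion

w = 0.0            # step 2: initial guess w0
for k in range(max_iters):
    grad = 2 * (w - 3)       # ∇J at the current point
    if abs(grad) < tol:      # step 4: stopping criterion satisfied
        break
    w = w - alpha * grad     # step 3: w_{k+1} = w_k - α ∇J

print(w)  # converges to the minimizer w = 3
```

With this α the iteration converges rapidly; a much larger α would make it oscillate or diverge, which is exactly the behaviour listed above.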
• Packages/Tools
• Python
• Scikit-Learn
• TensorFlow
• Keras
• PyTorch
• Caffe
• MATLAB

### Assignment Tips

• Question 1: The first question is similar to the example shown in the video lecture. The Python code for the example problem is given below. Try to run it in Google Colab.
