Programming DL Basics

Normalization

Normalize each row of the matrix using the np.linalg.norm function.

For example, given a matrix $x$, `np.linalg.norm(x, axis=1, keepdims=True)` computes the norm of each row as a column vector, and dividing $x$ by it normalizes every row to unit length. Dividing arrays of different shapes works thanks to broadcasting; see the broadcasting documentation.
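A minimal numpy sketch of this row normalization (the function name normalize_rows and the sample matrix are just illustrative):

```python
import numpy as np

def normalize_rows(x):
    """Normalize each row of x to have unit L2 norm."""
    # Norm of every row; keepdims=True keeps the shape (n_rows, 1)
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)
    # Broadcasting divides each row by its own norm
    return x / x_norm

x = np.array([[0.0, 3.0, 4.0],
              [1.0, 6.0, 4.0]])
print(normalize_rows(x))
```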

Activation functions

Sigmoid

import numpy as np

def sigmoid(x):
    """
    Compute the sigmoid of x

    Arguments:
    x -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(x)
    """
    s = 1/(1+np.exp(-x))
    return s

## Softmax

You can think of softmax as a normalizing function used when your algorithm needs to classify two or more classes.

Instructions:

$ \text{for } x \in \mathbb{R}^{1\times n} \text{, } softmax(x) =softmax(\begin{bmatrix}x_1 &&x_2 &&... &&x_n \end{bmatrix}) =\begin{bmatrix}\frac{e^{x_1}}{\sum_{j}e^{x_j}} &&\frac{e^{x_2}}{\sum_{j}e^{x_j}} &&... &&\frac{e^{x_n}}{\sum_{j}e^{x_j}} \end{bmatrix} $

$\text{for a matrix } x \in \mathbb{R}^{m \times n} \text{, } x_{ij} \text{ maps to the element in the } i^{th} \text{ row and } j^{th} \text{ column of } x \text{, thus we have: }$

$softmax(x) = softmax\begin{bmatrix} x_{11} & x_{12} & x_{13} & \dots & x_{1n} \\ x_{21} & x_{22} & x_{23} & \dots & x_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & x_{m3} & \dots & x_{mn} \end{bmatrix} = \begin{bmatrix} \frac{e^{x_{11}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{12}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{13}}}{\sum_{j}e^{x_{1j}}} & \dots & \frac{e^{x_{1n}}}{\sum_{j}e^{x_{1j}}} \\ \frac{e^{x_{21}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{22}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{23}}}{\sum_{j}e^{x_{2j}}} & \dots & \frac{e^{x_{2n}}}{\sum_{j}e^{x_{2j}}} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \frac{e^{x_{m1}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m2}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m3}}}{\sum_{j}e^{x_{mj}}} & \dots & \frac{e^{x_{mn}}}{\sum_{j}e^{x_{mj}}} \end{bmatrix} = \begin{pmatrix} softmax\text{(first row of x)} \\ softmax\text{(second row of x)} \\ \vdots \\ softmax\text{(last row of x)} \end{pmatrix}$
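A minimal numpy sketch of this row-wise softmax (the function name softmax and the sample matrix are illustrative):

```python
import numpy as np

def softmax(x):
    """Compute the softmax of each row of a 2-D array x of shape (m, n)."""
    x_exp = np.exp(x)
    # Row sums with keepdims=True have shape (m, 1), so broadcasting applies
    x_sum = np.sum(x_exp, axis=1, keepdims=True)
    return x_exp / x_sum

x = np.array([[9.0, 2.0, 5.0, 0.0, 0.0],
              [7.0, 5.0, 0.0, 0.0, 0.0]])
print(softmax(x))
```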

Preprocessing

Inspecting the data

A trick when you want to flatten a matrix X of shape (a, b, c, d) to a matrix X_flatten of shape (b∗c∗d, a) is to use `X_flatten = X.reshape(X.shape[0], -1).T`.
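For instance, a minimal sketch with illustrative shapes (the variable names are illustrative; dividing by 255 is the usual way to "standardize" image data):

```python
import numpy as np

# Illustrative data: 209 RGB images of 64 x 64 pixels, shape (a, b, c, d)
train_set_x_orig = np.random.randint(0, 256, size=(209, 64, 64, 3))

# Flatten: (a, b, c, d) -> (a, b*c*d), then transpose to (b*c*d, a)
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
print(train_set_x_flatten.shape)   # (12288, 209)

# "Standardize" image data by scaling pixel values into [0, 1]
train_set_x = train_set_x_flatten / 255.0
```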

What you need to remember:

Common steps for pre-processing a new dataset are:

  • Figure out the dimensions and shapes of the problem (m_train, m_test, num_px, ...)

  • Reshape the datasets such that each example is now a vector of size (num_px * num_px * 3, 1)

  • "Standardize" the data

Logistic Regression as a Neural Network

Mathematical expression of the algorithm:

For one example $x^{(i)}$:

$z^{(i)} = w^T x^{(i)} + b$

$\hat{y}^{(i)} = a^{(i)} = \sigma(z^{(i)})$

$\mathcal{L}(a^{(i)}, y^{(i)}) = -y^{(i)}\log(a^{(i)}) - (1-y^{(i)})\log(1-a^{(i)})$

The cost is then computed by summing over all training examples:

$J = \frac{1}{m}\sum_{i=1}^{m}\mathcal{L}(a^{(i)}, y^{(i)})$

Key steps: In this exercise, you will carry out the following steps:

  • Initialize the parameters of the model

  • Learn the parameters for the model by minimizing the cost

  • Use the learned parameters to make predictions (on the test set)

  • Analyse the results and conclude

The main steps for building a Neural Network are:

  1. Define the model structure (such as number of input features)

  2. Initialize the model's parameters

  3. Loop:

    • Calculate current loss (forward propagation)

    • Calculate current gradient (backward propagation)

    • Update parameters (gradient descent)

You often build 1-3 separately and integrate them into one function we call model().

Forward Propagation:

  • You get X

  • You compute $A = \sigma(w^T X + b) = (a^{(1)}, a^{(2)}, \dots, a^{(m)})$

  • You calculate the cost function: $J = -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\log(a^{(i)})+(1-y^{(i)})\log(1-a^{(i)})\right)$
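A minimal sketch of this forward pass together with the corresponding gradients (the function name propagate and the argument shapes are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def propagate(w, b, X, Y):
    """Forward and backward propagation for logistic regression.

    w -- weights, shape (n_x, 1)
    b -- bias, a scalar
    X -- data, shape (n_x, m)
    Y -- labels in {0, 1}, shape (1, m)
    """
    m = X.shape[1]

    # Forward propagation: activations A and cost J
    A = sigmoid(np.dot(w.T, X) + b)                              # shape (1, m)
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

    # Backward propagation: gradients of the cost with respect to w and b
    dw = np.dot(X, (A - Y).T) / m
    db = np.sum(A - Y) / m

    return {"dw": dw, "db": db}, cost
```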

Implementation with the sklearn package:
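A minimal sketch with scikit-learn (the toy arrays only stand in for the flattened, standardized dataset; note that sklearn expects one example per row, i.e. the transpose of the layout used above):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-in data: 200 training and 50 test examples with 20 features each
rng = np.random.default_rng(0)
X_train, y_train = rng.random((200, 20)), rng.integers(0, 2, 200)
X_test, y_test = rng.random((50, 20)), rng.integers(0, 2, 50)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```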

Neural Network

Mathematically:

For one example $x^{(i)}$:

$z^{[1](i)} = W^{[1]} x^{(i)} + b^{[1]}$

$a^{[1](i)} = \tanh(z^{[1](i)})$

$z^{[2](i)} = W^{[2]} a^{[1](i)} + b^{[2]}$

$\hat{y}^{(i)} = a^{[2](i)} = \sigma(z^{[2](i)})$

Given the predictions on all the examples, you can also compute the cost $J$ as follows:

$J = -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\log\left(a^{[2](i)}\right) + (1-y^{(i)})\log\left(1-a^{[2](i)}\right)\right)$
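A minimal numpy sketch of this forward pass and cost (the parameter names W1, b1, W2, b2 follow the equations above; everything else is illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward_propagation(X, parameters):
    """Forward pass of the one-hidden-layer network (tanh hidden, sigmoid output)."""
    W1, b1 = parameters["W1"], parameters["b1"]
    W2, b2 = parameters["W2"], parameters["b2"]

    Z1 = np.dot(W1, X) + b1     # shape (n_h, m)
    A1 = np.tanh(Z1)            # hidden-layer activations
    Z2 = np.dot(W2, A1) + b2    # shape (1, m)
    A2 = sigmoid(Z2)            # predictions

    cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}
    return A2, cache

def compute_cost(A2, Y):
    """Cross-entropy cost J averaged over the m examples."""
    m = Y.shape[1]
    logprobs = Y * np.log(A2) + (1 - Y) * np.log(1 - A2)
    return -np.sum(logprobs) / m
```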

Reminder: The general methodology to build a Neural Network is to:

  1. Define the neural network structure ( # of input units, # of hidden units, etc).

  2. Initialize the model's parameters

  3. Loop:

    • Implement forward propagation

    • Compute loss

    • Implement backward propagation to get the gradients

    • Update parameters (gradient descent)

You often build helper functions to compute steps 1-3 and then merge them into one function we call nn_model(). Once you've built nn_model() and learnt the right parameters, you can make predictions on new data.

Multi-layer Neural Network

  • Initialize the parameters for a two-layer network and for an $L$-layer neural network.

  • Implement the forward propagation module (shown in purple in the figure below).

    • Complete the LINEAR part of a layer's forward propagation step (resulting in $Z^{[l]}$).

    • We give you the ACTIVATION function (relu/sigmoid).

    • Combine the previous two steps into a new [LINEAR->ACTIVATION] forward function.

    • Stack the [LINEAR->RELU] forward function L-1 times (for layers 1 through L-1) and add a [LINEAR->SIGMOID] at the end (for the final layer L). This gives you a new L_model_forward function (see the sketch after this list).

  • Compute the loss.

  • Implement the backward propagation module (denoted in red in the figure below).

    • Complete the LINEAR part of a layer's backward propagation step.

    • We give you the gradient of the ACTIVATION function (relu_backward/sigmoid_backward).

    • Combine the previous two steps into a new [LINEAR->ACTIVATION] backward function.

    • Stack [LINEAR->RELU] backward L-1 times and add [LINEAR->SIGMOID] backward in a new L_model_backward function

  • Finally update the parameters.
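A minimal sketch of the stacked [LINEAR->RELU] * (L-1) -> LINEAR->SIGMOID forward pass described above (it assumes parameters are stored as W1, b1, ..., WL, bL; the cache layout is simplified and illustrative):

```python
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def L_model_forward(X, parameters):
    """Forward pass through L-1 [LINEAR->RELU] layers and one [LINEAR->SIGMOID] layer."""
    caches = []
    A = X
    L = len(parameters) // 2                      # number of layers with parameters

    # Hidden layers: LINEAR -> RELU, repeated L-1 times
    for l in range(1, L):
        A_prev = A
        Z = np.dot(parameters["W" + str(l)], A_prev) + parameters["b" + str(l)]
        A = relu(Z)
        caches.append((A_prev, Z))

    # Output layer: LINEAR -> SIGMOID
    ZL = np.dot(parameters["W" + str(L)], A) + parameters["b" + str(L)]
    AL = sigmoid(ZL)
    caches.append((A, ZL))

    return AL, caches
```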

Define the neural network structure

Initialize the model's parameters

Loop

Forward propagation

Forward propagation for the multi-layer network

1. Linear Forward

2. Linear-Activation Forward

3. L_model_forward

### Loss function

Backward propagation

(Figure: summary of the gradient formulas for backward propagation.)
  • Tips:

    • To compute dZ1 you'll need to compute $g^{[1]'}(Z^{[1]})$. Since $g^{[1]}(.)$ is the tanh activation function, if $a = g^{[1]}(z)$ then $g^{[1]'}(z) = 1-a^2$. So you can compute $g^{[1]'}(Z^{[1]})$ using (1 - np.power(A1, 2)).
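Combining the tip above with the other gradients, a minimal sketch of the backward pass for the one-hidden-layer model (the names match the forward-pass sketch earlier and are otherwise illustrative):

```python
import numpy as np

def backward_propagation(parameters, cache, X, Y):
    """Gradients for the one-hidden-layer network (tanh hidden, sigmoid output)."""
    m = X.shape[1]
    W2 = parameters["W2"]
    A1, A2 = cache["A1"], cache["A2"]

    dZ2 = A2 - Y                                        # shape (1, m)
    dW2 = np.dot(dZ2, A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m

    dZ1 = np.dot(W2.T, dZ2) * (1 - np.power(A1, 2))     # tanh'(Z1) = 1 - A1**2
    dW1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m

    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}
```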

Backward propagation for the multi-layer network

1. Linear backward

For layer $l$, the linear part is: $Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$ (followed by an activation).

Suppose you have already calculated the derivative $dZ^{[l]} = \frac{\partial \mathcal{L} }{\partial Z^{[l]}}$. You want to get $(dW^{[l]}, db^{[l]} ,dA^{[l-1]})$.
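Since $dW^{[l]} = \frac{1}{m} dZ^{[l]} A^{[l-1]T}$, $db^{[l]} = \frac{1}{m}\sum_{i=1}^{m} dZ^{[l](i)}$ and $dA^{[l-1]} = W^{[l]T} dZ^{[l]}$, a minimal sketch of this step looks like the following (the cache layout is illustrative):

```python
import numpy as np

def linear_backward(dZ, cache):
    """Linear part of backward propagation for a single layer l.

    dZ    -- gradient of the cost with respect to Z[l]
    cache -- tuple (A_prev, W, b) stored during the forward pass of this layer
    """
    A_prev, W, b = cache
    m = A_prev.shape[1]

    dW = np.dot(dZ, A_prev.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = np.dot(W.T, dZ)

    return dA_prev, dW, db
```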

2. Linear-Activation backward

3. L-Model Backward

Update parameters

Putting everything together

Prediction

Generating the data

Tuning the hidden layer size

## Decision boundary

Putting everything together

Reference
