Sigmoid Generator

Training deep networks with logistic (sigmoid) units is hard. The idea is to make the network nearly linear at the beginning of training and let it learn the degree of nonlinearity (in effect, its capacity) by itself.

Forward

  • Let $f(x,\alpha)=\alpha{1\over 1+e^{-x}} + (1-\alpha)(x-0.5)$, so that $\alpha=1$ gives the logistic sigmoid and $\alpha=0$ gives the linear map $x-0.5$
  • $z_i=S^{t-1}_i$
  • $S^t_i=f(z_i, b_i)$, where the learnable per-unit coefficient $b_i$ plays the role of $\alpha$
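
The forward step above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the function names `sigmoid` and `forward`, and the choice to treat $z$ and $b$ as arrays of the same shape, are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(z, b):
    """Elementwise f(z, b) = b * sigmoid(z) + (1 - b) * (z - 0.5).

    z: input from the previous layer (S^{t-1} in the notes)
    b: per-unit mixing coefficient (assumed to have the same shape as z)
    """
    return b * sigmoid(z) + (1.0 - b) * (z - 0.5)

z = np.array([-1.0, 0.0, 2.0])
linear_out = forward(z, np.zeros(3))   # b = 0: purely linear, equals z - 0.5
sigmoid_out = forward(z, np.ones(3))   # b = 1: plain logistic sigmoid
print(linear_out, sigmoid_out)
```

With $b_i=0$ the unit is purely linear and with $b_i=1$ it is a standard logistic unit; values in between blend the two.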

Backward

  • Let $g(x)={1\over 1+e^{-x}}$
  • ${\partial E\over\partial z_i}
    ={\partial S^t_i\over\partial z_i}{\partial E\over\partial S^t_i}
    =(b_i g(z_i)(1-g(z_i))+1-b_i){\partial E\over\partial S^t_i}
    $
  • ${\partial E\over\partial S^{t-1}_i}
    ={\partial z_i\over\partial S^{t-1}_i}{\partial E\over\partial z_i}
    ={\partial E\over\partial z_i}
    $
  • ${\partial E\over\partial b_i}
    ={\partial S^t_i\over\partial b_i}{\partial E\over\partial S^t_i}
    =(g(z_i)-(z_i-0.5)){\partial E\over\partial S^t_i}
    $
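
The gradients above can be checked numerically. The sketch below implements the backward formulas in NumPy and compares them with central finite differences. The names `backward` and `numeric_grad`, and the toy error $E=\sum_i w_i S^t_i$ with arbitrary weights `w`, are assumptions introduced only for this check.

```python
import numpy as np

def sigmoid(x):
    """Logistic function g(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(z, b):
    """Elementwise f(z, b) = b * g(z) + (1 - b) * (z - 0.5)."""
    return b * sigmoid(z) + (1.0 - b) * (z - 0.5)

def backward(z, b, grad_out):
    """Return (dE/dz, dE/db) given grad_out = dE/dS^t.

    dE/dS^{t-1} equals dE/dz because z = S^{t-1}.
    """
    g = sigmoid(z)
    grad_z = (b * g * (1.0 - g) + 1.0 - b) * grad_out
    grad_b = (g - (z - 0.5)) * grad_out
    return grad_z, grad_b

def numeric_grad(fn, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function of a 1-D array."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (fn(x + step) - fn(x - step)) / (2.0 * eps)
    return grad

rng = np.random.default_rng(0)
z = rng.normal(size=4)
b = rng.uniform(size=4)
w = rng.normal(size=4)  # hypothetical upstream weights: E = sum(w * S^t)

grad_z, grad_b = backward(z, b, w)  # dE/dS^t = w for this toy error
num_z = numeric_grad(lambda v: np.sum(w * forward(v, b)), z)
num_b = numeric_grad(lambda v: np.sum(w * forward(z, v)), b)
print(np.allclose(grad_z, num_z, atol=1e-6), np.allclose(grad_b, num_b, atol=1e-6))
```

Because $f$ is linear in $b_i$, the finite-difference estimate for $\partial E/\partial b_i$ matches the analytic value up to rounding error; the estimate for $\partial E/\partial z_i$ carries a small truncation error from the sigmoid's curvature.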