We usually choose a loss function that is not piecewise linear (e.g. cross entropy or L2 distance), so the Hessian of the loss with respect to the parameters is nonzero.

Moreover, methods that use finite differences do not suffer from this issue.
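
To make the curvature point concrete, here is a toy 1-D sketch (the data point, parameter value, and losses are made up for illustration): a finite-difference estimate of the second derivative is nonzero for an L2 loss but zero almost everywhere for a piecewise-linear loss such as the absolute error.

```python
def second_derivative(f, w, eps=1e-4):
    """Central finite-difference estimate of f''(w)."""
    return (f(w + eps) - 2.0 * f(w) + f(w - eps)) / eps**2

x, y = 3.0, 1.0                    # one made-up data point, one parameter
l2 = lambda w: (w * x - y) ** 2    # smooth loss: f''(w) = 2 * x**2 = 18
l1 = lambda w: abs(w * x - y)      # piecewise-linear loss: f'' = 0 a.e.

w = 2.0                            # any point away from the kink at w = y/x
print(second_derivative(l2, w))    # ~18.0: nonzero curvature
print(second_derivative(l1, w))    # ~0.0: no curvature
```

Away from the kink the absolute-error loss is exactly linear, so its second derivative vanishes; the quadratic loss has constant curvature 2x² everywhere.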

I think the most interesting property of RBMs is that they have a consistent generation model p(x | h) and inference model p(h | x), which correspond to the same joint p(x, h). Models like variational autoencoders, on the other hand, have to learn an approximate inference model q(h | x), which may not be consistent with the generation model p(x | h).
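
This consistency can be checked directly on a tiny RBM (weights and biases below are made up for illustration): the closed-form sigmoid conditional p(h_j = 1 | x) matches the conditional obtained by brute-force enumeration of the joint p(x, h) ∝ exp(−E(x, h)).

```python
import math
from itertools import product

# Tiny binary RBM with 2 visible and 2 hidden units (arbitrary parameters).
W = [[0.5, -1.0], [1.5, 0.2]]   # W[i][j]: visible unit i <-> hidden unit j
b = [0.1, -0.3]                 # visible biases
c = [-0.2, 0.4]                 # hidden biases

def energy(x, h):
    return -(sum(b[i] * x[i] for i in range(2))
             + sum(c[j] * h[j] for j in range(2))
             + sum(x[i] * W[i][j] * h[j] for i in range(2) for j in range(2)))

states = list(product([0, 1], repeat=2))
Z = sum(math.exp(-energy(x, h)) for x in states for h in states)
joint = {(x, h): math.exp(-energy(x, h)) / Z for x in states for h in states}

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

x = (1, 0)
# Exact conditional p(h_0 = 1 | x), computed from the enumerated joint...
p_x = sum(joint[(x, h)] for h in states)
p_h0_exact = sum(joint[(x, h)] for h in states if h[0] == 1) / p_x
# ...equals the standard closed-form RBM conditional.
p_h0_form = sigmoid(c[0] + sum(x[i] * W[i][0] for i in range(2)))
print(p_h0_exact, p_h0_form)  # same value either way
```

The same check works for p(x | h): both conditionals fall out of the one joint, which is exactly what an approximate posterior q(h | x) in a VAE is not guaranteed to do.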

In general, for that question I'm not sure about the difference in behaviour between 1) and 2) (unless (c) doesn't apply to (1)).
