print("Before optimization, ", terminator: "") print("x: \(x) and f(x): \(f(x))")
// Optimization loop for _ in 1...maxIterations { /// Derivative of `f` w.r.t. `x`. let 𝛁xF = gradient(at: x) { x -> Float in return f(x) } // Optimization step: update `x` to maximize `f` x += η * 𝛁xF }
```swift
for _ in 1...maxIterations {
    let 𝛁xF = gradient(at: x) { x -> Float in
        return f(x)
    }
    // Optimization step: update `x` against the gradient to minimize `f`
    x.move(along: 𝛁xF.scaled(by: -η))
}
print("After gradient descent, ", terminator: "")
print("input: \(x) and output: \(f(x))")
```
Explanation
This principle is easy to understand. Around a maximum of a function, the derivative is positive on the left side and negative on the right side, so the update `x += η * 𝛁xF` pushes `x` toward the maximum from either direction (gradient ascent).
Similarly, around a minimum, the derivative is negative on the left side and positive on the right side, so stepping against the gradient with `x.move(along: 𝛁xF.scaled(by: -η))` pushes `x` toward the minimum (gradient descent).
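As a concrete illustration, here is a minimal sketch (the quadratic and the probe points 1 and 5 are made up for this example; it assumes the same Swift for TensorFlow toolchain as the snippets above, where `gradient(at:in:)` is available without extra imports). For f(x) = (x − 3)², the derivative is negative to the left of the minimum at x = 3 and positive to the right, so a step of −η times the gradient moves x toward 3 from either side:

```swift
// Derivative of (x - 3)^2 to the left of the minimum: 2 * (1 - 3) = -4 (negative).
let gradLeft = gradient(at: Float(1)) { x -> Float in
    return (x - 3) * (x - 3)
}
// Derivative to the right of the minimum: 2 * (5 - 3) = 4 (positive).
let gradRight = gradient(at: Float(5)) { x -> Float in
    return (x - 3) * (x - 3)
}
print(gradLeft, gradRight)  // -4.0 4.0
// Subtracting η times the gradient therefore increases x at 1 and decreases x at 5,
// moving toward the minimum in both cases.
```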
In this article, we mainly use the function `gradient(at:in:)`, whose signature is:
```swift
@inlinable
public func gradient<T, R>(
    at x: T,
    in f: @differentiable (T) -> Tensor<R>
) -> T.TangentVector where T: Differentiable, R: TensorFlowFloatingPoint
```
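For reference, a minimal usage sketch of this overload (the closure and the input values are made up for illustration): the closure must return a scalar (rank-0) `Tensor`, and the result is the gradient of that scalar with respect to `x`.

```swift
import TensorFlow

let x = Tensor<Float>([1, 2, 3])
// The closure returns a scalar Tensor; `gradient(at:in:)` evaluates
// d(sum(x * x)) / dx at the given point.
let 𝛁x = gradient(at: x) { x -> Tensor<Float> in
    return (x * x).sum()
}
print(𝛁x)  // [2.0, 4.0, 6.0]
```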