Dynamic System Control using a Static Generative Model
Chinese version: 静态生成式模型的动态系统控制算法 - 知乎 (zhihu.com)
To recap from the last blog post, we formulated the control problem as a generative model in the following fashion:
\[[v,w,x] = g(z), \quad z \sim \mathbb{D},\]
in which \(v\) is the collection of control objectives to be minimized, \(w\) contains the output control signals, and \(x\) represents the input signals that are thought to carry information useful for the control problem.
This model is static in that it lacks an explicit formulation of the continuous evolution of the system states \([v, w, x]\). It is possible to make the model dynamic, but in this blog post we present an algorithm that offers dynamic control using the static model as-is.
\[ \begin{align}
& \text{Initialize } \epsilon_0, w_0, \text{ and } [v_0, x_0] = r(u_0, w_0) \\
& \text{for } t = 1 \text{ to } \infty \text{ do} \\
& \quad \text{Initialize } [v', w', x'] = [v_{t-1}, w_{t-1}, x_{t-1}] \\
& \quad \text{while } v_{t-1} - v' < \epsilon_{t-1} \text{ and } v' > v_{\text{min}} \text{ do} \\
& \quad \quad z' = \underset{z}{\text{argmin}} \, \| g(z) - [v' - \epsilon_{t-1}, w', x'] \| \\
& \quad \quad [v', w', x'] = g(z') \\
& \quad w_t = w' \\
& \quad [v_t, x_t] = r(u_t, w_t) \\
& \quad \text{if } \|[v_t, x_t] - [v', x']\| < \delta_{\text{min}} \text{ then} \\
& \quad \quad \epsilon_t = \gamma_{\text{upscale}} \cdot \epsilon_{t-1} \\
& \quad \text{else if } \|[v_t, x_t] - [v', x']\| > \delta_{\text{max}} \text{ then} \\
& \quad \quad \epsilon_t = \gamma_{\text{downscale}} \cdot \epsilon_{t-1} \\
& \quad \text{else} \\
& \quad \quad \epsilon_t = \epsilon_{t-1}
\end{align} \]
In the algorithm above, \([v, x] = r(u, w)\) represents the objectives and the input signals obtained from the system after applying the control signals \(w\), with \(u\) denoting internal unobserved states. Intuitively, the more of these internal signals we can observe, i.e., the more we move from \(u\) into \(x\), the better we may be able to disentangle the uncertainty of the controlled system. This is an open area for theoretical research, but the algorithm does not need \(u\) as long as it can run the controlled system to obtain \([v, x]\).
The algorithm attempts to reduce the objectives \(v_{t-1}\) by an amount \(\epsilon_{t-1}\), then verifies whether the model's prediction matches what the system actually returns. If the prediction is accurate (the mismatch is below \(\delta_{\text{min}}\)), the next step attempts a larger reduction; if it is inaccurate (the mismatch exceeds \(\delta_{\text{max}}\)), the next step uses a smaller \(\epsilon_t\). This mechanism is controlled by a set of parameters satisfying \(\delta_{\text{min}} \leq \delta_{\text{max}}\) and \(0 < \gamma_{\text{downscale}} < 1 < \gamma_{\text{upscale}}\).
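Putting the pieces together, the following is a minimal Python sketch of the loop. Everything concrete in it, the toy linear \(g\), the toy system \(r\), the dimensions, the parameter defaults, and the use of scipy's Nelder-Mead to solve the \(\text{argmin}\), is an assumption made for illustration rather than part of the algorithm:

```python
import numpy as np
from scipy.optimize import minimize

# Toy dimensions for [v, w, x] and the latent z (illustrative only).
DIM_V, DIM_W, DIM_X, DIM_Z = 1, 2, 3, 4

# Stand-in generative model g: a fixed random linear map from z to [v, w, x].
_A = np.random.default_rng(0).standard_normal((DIM_V + DIM_W + DIM_X, DIM_Z))

def g(z):
    out = _A @ z
    return out[:DIM_V], out[DIM_V:DIM_V + DIM_W], out[DIM_V + DIM_W:]

def r(w):
    """Stand-in controlled system: returns [v, x] after applying control w.
    The internal state u is unobserved, so it does not appear here."""
    v = np.array([np.sum(w ** 2)])            # objective: control effort
    x = np.concatenate([w, [np.sum(w)]])      # observations derived from w
    return v, x

def invert_g(target, z0):
    """z' = argmin_z || g(z) - target ||, via a generic derivative-free solver."""
    def loss(z):
        v, w, x = g(z)
        return np.sum((np.concatenate([v, w, x]) - target) ** 2)
    return minimize(loss, z0, method="Nelder-Mead").x

def control_loop(eps0=0.5, v_min=1e-3, delta_min=1e-2, delta_max=1.0,
                 up=1.5, down=0.5, steps=20, max_inner=10):
    eps, w = eps0, np.ones(DIM_W)             # initialize eps_0 and w_0
    v, x = r(w)                               # [v_0, x_0] = r(u_0, w_0)
    z = np.zeros(DIM_Z)
    for t in range(1, steps + 1):
        vp, wp, xp = v.copy(), w.copy(), x.copy()
        # Inner loop: ask the model for states whose objective is eps lower,
        # until the predicted cumulative reduction reaches eps.
        for _ in range(max_inner):            # cap added to guarantee exit
            if (v - vp).item() >= eps or vp.item() <= v_min:
                break
            z = invert_g(np.concatenate([vp - eps, wp, xp]), z)
            vp, wp, xp = g(z)
        w = wp                                # apply the model-proposed control
        v, x = r(w)                           # observe the actual response
        mismatch = np.linalg.norm(np.concatenate([v, x]) -
                                  np.concatenate([vp, xp]))
        if mismatch < delta_min:              # model is trustworthy: be bolder
            eps *= up
        elif mismatch > delta_max:            # model is off: take smaller steps
            eps *= down
    return w, v
```

With a trained, differentiable \(g\), the inner \(\text{argmin}\) would more naturally be solved by gradient descent through the model; the inner iteration cap is a safeguard the pseudocode leaves implicit.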
Since the loop on \([v', w', x']\) is a fixed process with inputs \([v_{t-1}, w_{t-1}, x_{t-1}]\) and \(\epsilon_{t-1}\), the uncertainties in its outputs are disambiguated. As a result, we can amortize the process as \(w_t = h(v_{t-1}, w_{t-1}, x_{t-1}, \epsilon_{t-1})\). Brandon Amos has written a good mathematical tutorial on amortization.
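As a minimal illustration of the amortization step, one can log each outer step's inputs \([v_{t-1}, w_{t-1}, x_{t-1}, \epsilon_{t-1}]\) together with the control \(w_t\) the inner loop produced, and fit a regressor \(h\) on those pairs. A closed-form linear fit stands in below for what would more plausibly be a neural network; all names and the data layout are illustrative assumptions:

```python
import numpy as np

def fit_amortized_h(inputs, outputs):
    """Fit w_t ≈ h(v_{t-1}, w_{t-1}, x_{t-1}, eps_{t-1}) by least squares.

    inputs:  (N, d_in) array of logged inner-loop inputs, one row per step
    outputs: (N, d_w) array of the controls w_t the inner loop produced
    Returns a function h predicting w_t in one shot, skipping the loop."""
    X = np.hstack([inputs, np.ones((inputs.shape[0], 1))])  # add bias column
    W, *_ = np.linalg.lstsq(X, outputs, rcond=None)         # closed-form fit
    return lambda inp: np.append(inp, 1.0) @ W

# Usage sketch: while running control_loop above, log each pair
#   (np.concatenate([v, w, x, [eps]]), w_next)
# then h = fit_amortized_h(np.stack(ins), np.stack(outs)) replaces the loop.
```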
Finally, all the samples \([v_t, w_t, x_t]\) obtained from the controlled system \(r\) can be fed back as training data to improve the model \(g\), realizing what is perhaps the simplest form of System-2-to-System-1 knowledge learning. A philosophical discussion of both amortization and System-2-to-System-1 learning can be found in Yann LeCun's position paper.
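As a sketch of this feedback step, the snippet below refits a deliberately simple generative model, a linear-Gaussian one, on a buffer of samples \([v_t, w_t, x_t]\) gathered while controlling; a practical \(g\) would be a richer model (e.g., a VAE or a diffusion model) fine-tuned on the same buffer:

```python
import numpy as np

class GaussianG:
    """Stand-in generative model g, refit from system samples [v, w, x].
    Linear-Gaussian: z ~ N(0, I) maps to mu + L z, so the latent here has
    the same dimension as the data. A real g would be a learned deep model."""
    def __init__(self, dim):
        self.mu = np.zeros(dim)
        self.L = np.eye(dim)      # Cholesky factor of the covariance

    def fit(self, samples):
        """samples: (N, dim) array whose rows are observed [v_t, w_t, x_t]."""
        self.mu = samples.mean(axis=0)
        cov = np.cov(samples, rowvar=False) + 1e-6 * np.eye(samples.shape[1])
        self.L = np.linalg.cholesky(cov)

    def __call__(self, z):
        # g(z): map a standard-normal latent into the space of [v, w, x].
        return self.mu + self.L @ z

# Usage sketch: append np.concatenate([v, w, x]) to a buffer after each outer
# step, and periodically call g_model.fit(np.stack(buffer)) so that System-2
# search results become System-1 knowledge baked into g.
```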