5  Aim 2 - Multi-input CLOC

Rationale

The power of closed-loop optogenetic control (CLOC, henceforth referring specifically to feedback control; see Section 3.1) is limited by the degrees of freedom provided by the optogenetic stimulation. Naturally, we would want at least as many actuators as degrees of freedom in the system to control it effectively—for example, we may want to stimulate different layers, cell types, or columns separately in the cortex. Moreover, actuation/stimulation can be unidirectional or bidirectional, referring to whether a single opsin type or both excitatory and inhibitory opsins are used simultaneously. Unidirectional control has obvious shortcomings: for example, an excitatory opsin alone could not lower the firing rate of transfected cells, making it unsuitable to clamp baseline activity or follow a time-varying reference with steep drops.

While a previous study (1) has implemented bidirectional CLOC, it does not feature the generalizability and scalability of the model-based, optimal control algorithms introduced by later work (2) (see Section 3.3) for unidirectional actuation. This adaptive linear-quadratic regulator (LQR) approach is more robust to disturbances and can scale to multi-input multi-output (MIMO) systems. Moreover, its behavior can be easily configured by setting penalties on state error, the control signal, and even the derivative of the control signal to encourage smooth actuation.

Thus, a natural goal for furthering CLOC is to combine the advantages of multi-input/bidirectional actuation and model-based optimal control—however, this poses additional challenges and opportunities. The adaptive LQR method previously developed has limited application for multi-input actuation because it does not model the constraint that the input (light intensity) must be nonnegative. This would cause problems in the case of bidirectional (or any spatially overlapping) actuation, since the controller might call for negative excitatory input rather than positive inhibitory input. A heuristic workaround is to place light sources in spatially non-overlapping pairs (e.g., blue and red) and treat each pair as the positive and negative directions of a single actuator. This allows the continued use of simple and fast LQR methods but fails to model kinetic differences or spectral crosstalk between inhibitory and excitatory opsins and precludes alternate light configurations. While I hypothesize that methods for optimal control with constraints will outperform this heuristic LQR, such methods are more costly computationally. It is thus unclear which method(s) are preferable for real-time control on the timescale of network-level variables of neural activity using compute resources typically available to an experimental neuroscience lab.
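
The pairing heuristic can be sketched in a few lines (a sketch, not from the cited work; `split_bidirectional` and `u_signed` are illustrative names): the unconstrained LQR computes a signed input per actuator pair, whose positive and negative parts are routed to the excitatory and inhibitory light sources, respectively.

```python
import numpy as np

def split_bidirectional(u_signed):
    """Map a signed control vector to nonnegative (excitatory, inhibitory) intensities."""
    u = np.asarray(u_signed, dtype=float)
    u_exc = np.maximum(u, 0.0)   # positive part drives the excitatory opsin
    u_inh = np.maximum(-u, 0.0)  # negative part drives the inhibitory opsin
    return u_exc, u_inh

u_exc, u_inh = split_bidirectional([0.8, -0.3])
# u_exc = [0.8, 0.0], u_inh = [0.0, 0.3]
```

The clipping step is exactly where this heuristic loses information: each direction is treated as a perfect mirror of the other, ignoring kinetic and spectral differences between the two opsins.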

Innovation

I propose addressing this problem by comparing LQR to model predictive control (MPC), which is widely used for its flexibility in implementing optimal control with constraints. Rather than computing the control signal from the current error signal at each step, MPC looks ahead, optimizing over the predicted trajectory some finite number of steps into the future, in what is known as “receding horizon control.” The quadratic program optimization required at every control step, however, introduces latency which could harm control performance compared to LQR. I thus plan on testing my hypothesis that MPC will be able to better optimize multi-input optogenetic stimulation while accommodating experimental constraints and considerations during real-time control on timescales relevant to network-level descriptions of neural activity. I will do this by assessing control quality of MPC compared to the heuristic LQR approach previously described as I simulate multi-output feedback control of firing rates and gamma oscillations observed in local field potentials (LFP).

Figure 5.1: An illustration of how MPC optimizes the system input over a receding horizon. By Martin Behrendt, licensed under CC BY-SA 3.0.

Approach

System and controller formulation

Naturally, the model is a vital element of MPC. I will use a previously developed Gaussian linear dynamical system (GLDS) model (2), which has been shown to reliably capture firing rate dynamics in a light-driven spiking neuron system. The discrete-time GLDS is governed by the following equations:

\[ \begin{aligned} x_{t + 1} &= Ax_{t} + Bu_{t} + w_t \\ y_{t} &= Cx_{t} + d \\ z_{t} &= y_{t} + v_t \\ \end{aligned} \]

where \(x_{t} \in \mathbb{R}^n\) is the \(n\)-dimensional state, \(u_{t} \in \mathbb{R}^k\) is the \(k\)-dimensional stimulus (i.e., \(k = 2\) for two opsins, one light source each), \(y_{t} \in \mathbb{R}^m\) is the firing rate in spikes/timestep (for each of \(m\) measured neurons), and \(z_{t} \in \mathbb{R}^m\) is the number of binned spikes observed at time \(t\). \(A \in \mathbb{R}^{n \times n}\), \(B \in \mathbb{R}^{n \times k}\), and \(C \in \mathbb{R}^{m \times n}\) are the state transition, input, and output matrices, respectively. \(w_{t} \sim \mathcal{N}\left( 0, Q \right)\) and \(v_{t} \sim \mathcal{N}\left( 0, R \right)\) are Gaussian-distributed process and measurement noise, respectively, and \(d \in \mathbb{R}^{m}\) represents baseline firing rates. Model order (\(n\)) and horizon length (\(T \in \mathbb{N}\)) will be chosen to balance complexity and prediction error for noise-driven fitting data generated from the test network. The latent state \(x_{t}\) will be estimated online using the Kalman filter (3), driven by the prediction error \(z_{t} - {\widehat{y}}_{t|t - 1}\).
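
As a concrete reference, one predict/update cycle of the Kalman filter for this GLDS might look as follows (a minimal sketch; matrix names follow the equations above, while `kalman_step` and the numeric values are illustrative placeholders):

```python
import numpy as np

def kalman_step(x, P, u, z, A, B, C, d, Q, R):
    """One predict/update cycle of the Kalman filter for the GLDS."""
    # Predict the next latent state and its covariance
    x_pred = A @ x + B @ u
    P_pred = A @ P @ A.T + Q
    y_pred = C @ x_pred + d              # predicted rate y_hat_{t|t-1}
    # Update using the prediction error (innovation) z_t - y_hat_{t|t-1}
    S = C @ P_pred @ C.T + R             # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ (z - y_pred)
    P_new = (np.eye(len(x_pred)) - K @ C) @ P_pred
    return x_new, P_new

# Illustrative 1-D example (values are placeholders)
A = np.array([[0.95]]); B = np.array([[0.4]]); C = np.array([[1.0]])
d = np.array([0.1]); Q = np.array([[0.01]]); R = np.array([[0.05]])
x, P = np.zeros(1), np.eye(1)
x, P = kalman_step(x, P, u=np.array([1.0]), z=np.array([0.6]),
                   A=A, B=B, C=C, d=d, Q=Q, R=R)
```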

I will set hard non-negative constraints on the light input as well as a ceiling determined by hardware limitations (i.e., the maximum voltage deliverable to the LED driver). To design an appropriate cost function, I will use a conventional per-time step quadratic form

\[\ell( x_{t},r_{t},u_{t} ) = ( x_{t} - r_{t} )^{T}Q^{\text{ctrl}}( x_{t} - r_{t} ) + u_{t}^{T}R^{\text{ctrl}}u_{t}\ ,\]

where \(r_{t} \in \mathbb{R}^n\) is the reference trajectory at time \(t\). \(Q^{\text{ctrl}}\) and \(R^{\text{ctrl}}\) are real \(n \times n\) and \(k \times k\) matrices chosen to appropriately penalize tracking error and stimulus size, respectively. This quadratic cost function makes the problem amenable to standard optimization techniques—combined with a linear dynamical system and Gaussian noise, it constitutes the classical linear-quadratic-Gaussian (LQG) control problem.

Then, at every time step \(t\) the controller solves the following quadratic program:

\[ \begin{aligned} \text{minimize}{} \quad & \sum_{\tau=t}^{t+T-1} \ell(x_\tau, r_\tau, u_\tau) \\ \text{subject to} \quad & 0 \preceq u_\tau \preceq u_{\max} \\ & x_{\tau + 1} = Ax_{\tau}+Bu_\tau \\ \end{aligned} \]

where \(T \in \mathbb{N}\) is the number of steps in the prediction/control horizon, \(u_{\max}\) is the hardware-imposed ceiling on light intensity, and \(\preceq\) indicates an inequality for each element of \(u_\tau \in \mathbb{R}^k\). This yields the solution \(\tilde{u}_t,\ldots,\tilde{u}_{t+T-1}\), of which we take just the first step \(\tilde{u}_t\) to apply to the system.
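
To make the receding-horizon step concrete, the condensed QP can be sketched in plain NumPy, with projected gradient descent standing in for a dedicated QP solver (a sketch under that assumption; `mpc_step` and all numeric values are illustrative, and a production controller would use a specialized solver):

```python
import numpy as np

def mpc_step(A, B, x0, r, Qc, Rc, T, u_max, iters=500):
    """One receding-horizon step: condense the dynamics into X = F x0 + G U,
    solve the box-constrained QP by projected gradient descent, and return
    only the first input."""
    n, k = B.shape
    # Prediction matrices stacking x_{t+1},...,x_{t+T}
    F = np.vstack([np.linalg.matrix_power(A, i + 1) for i in range(T)])
    G = np.zeros((T * n, T * k))
    for i in range(T):
        for j in range(i + 1):
            G[i*n:(i+1)*n, j*k:(j+1)*k] = np.linalg.matrix_power(A, i - j) @ B
    Qbar = np.kron(np.eye(T), Qc)
    Rbar = np.kron(np.eye(T), Rc)
    Rref = np.tile(r, T)
    H = G.T @ Qbar @ G + Rbar          # Hessian of the condensed QP
    f = G.T @ Qbar @ (F @ x0 - Rref)   # linear term
    U = np.zeros(T * k)
    step = 1.0 / (2 * np.linalg.norm(H, 2))  # 1/Lipschitz constant of gradient
    for _ in range(iters):
        U = np.clip(U - step * 2 * (H @ U + f), 0.0, u_max)  # project onto box
    return U[:k]  # apply only the first input of the horizon

# Illustrative scalar system: drive the rate from 0 toward a reference of 1
u0 = mpc_step(A=np.array([[0.9]]), B=np.array([[0.5]]), x0=np.zeros(1),
              r=np.array([1.0]), Qc=np.array([[1.0]]), Rc=np.array([[0.01]]),
              T=10, u_max=2.0)
```

Timing this step across horizon lengths and channel counts is exactly the latency measurement the comparison below requires.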

Control method comparison in silico

To confirm the assumption that bidirectional and multi-channel configurations will improve control quality, I will test both unidirectional and bidirectional actuation as well as one-channel and multi-channel configurations for each experiment. (In the bidirectional actuation case, one “channel” includes both an inhibition-triggering and an excitation-triggering light source.) Another condition of interest will be to penalize a low-pass-filtered version of the stimulus, which could reflect overheating or ion imbalances caused by prolonged stimulation (46). This could be added to the linear dynamics and quadratic cost functions without changing the optimization methods, but, as with opsin and channel count, the increased size of the problem increases latency, which could affect which method performs best. Most importantly, I will compare open-loop, heuristic LQR, and MPC approaches to see whether MPC attains better performance despite a longer computational delay, as hypothesized. Control algorithm computation time will be measured for each method and used during simulations as a realistic delay. To evaluate controller performance, I will use metrics such as the mean-squared error (MSE).
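
One way the low-pass-filtered stimulus penalty could enter the linear-quadratic framework (a sketch; \(s_t\), \(\alpha\), and \(S^{\text{ctrl}}\) are introduced here for illustration) is to augment the state with a filtered copy of the stimulus:

\[ \begin{bmatrix} x_{t+1} \\ s_{t+1} \end{bmatrix} = \begin{bmatrix} A & 0 \\ 0 & (1 - \alpha) I \end{bmatrix} \begin{bmatrix} x_t \\ s_t \end{bmatrix} + \begin{bmatrix} B \\ \alpha I \end{bmatrix} u_t, \]

where \(s_t \in \mathbb{R}^k\) is a first-order low-pass filter of the stimulus with smoothing factor \(\alpha \in (0, 1]\), and the per-step cost gains an additional quadratic term \(s_t^T S^{\text{ctrl}} s_t\). The dynamics remain linear and the cost quadratic, so the same LQR and MPC machinery applies, at the cost of a larger state.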

In the first experiment, I will simulate multi-output control of firing rates in an attempt to clamp population activity. (7) and (8) laid the foundation for this by clamping aggregate population firing rates, and (2) took this further by treating firing rates of individual neurons separately, though with a single optic fiber input. This is an obvious case where multiple inputs should provide more control, allowing us to manipulate neurons (or groups of neurons in the case of unsorted threshold crossings) more independently, each to its own baseline. The experiment will be performed on a Poisson linear dynamical system (PLDS) model (9) fit to optogenetic input/spike output data from a spiking neural network (SNN) model. The SNN model will contain necessary features, such as cell types, time-varying exogenous input, and connectivity profiles, to produce stochastic firing patterns with unpredictable disturbances and will be simulated together with the recording and stimulation facilities of Aim 1.

As a second experiment, I propose another form of network-level control: manipulating the oscillatory signatures found in LFP signals. Gamma oscillations (30–90 Hz) are one such signature; they have been shown to have time-varying properties and to interact with other oscillatory bands. Interestingly, gamma rhythms can even show phase coherence across distant brain regions such as visual cortex in opposite hemispheres, which has been hypothesized to play a role in information integration (10). I plan to test control methods on a virtual experiment of this phenomenon by simulating two populations of neurons exhibiting gamma oscillations (11, 12) with long-range connections mediating gamma-frequency coherence, again using the framework from Aim 1. I will then perform feedback control on the LFP signal to counter the natural phase locking that arises, providing an opportunity to test how well LFP can be controlled on the <100-ms timescale of a single gamma cycle.

Expected results

I expect that bidirectional, multi-channel, and model predictive control will perform better than unidirectional, single-channel, and LQR control, respectively, for both experiments, despite the added computational cost these methods require. I also expect that restricting the controller through hard constraints or prolonged-stimulation penalties will be possible with the more sophisticated methods without greatly increasing the error.

Preliminary results

Basic simulations controlling a linear dynamical system model fit to experimental data show the advantages of bidirectional control and of MPC (see Figure 5.2). Bidirectional actuation allows the system to avoid overshooting the reference, in the case of LQR, or to minimize error faster by first exciting then inhibiting, in the case of MPC. MPC’s advantages in looking ahead also clearly allow it to follow the reference more closely than the heuristic LQR controller (assigning negative inputs to the second light source). However, these results do not yet account for control signal computation latency.

Figure 5.2: Simulated control of a linear dynamical system with 1- and 2-input control, using LQR and MPC controllers. The top panel of each contains the reference and the actual firing rate, in spikes/second. The bottom contains the light intensity, in mW/mm², where blue represents light for an excitatory opsin (such as ChR2) and red-orange that for an inhibitory opsin (such as Jaws).

Potential pitfalls & alternative strategies

There are some limitations in the proposed GLDS model that may need to be addressed. While it was adequate for the experiments in (2), a Poisson (9) or other output nonlinearity may be needed if a standard GLDS does not fit the data well. While this would likely make the estimation of \(x\) more expensive, the underlying dynamics could remain linear, leaving the same quadratic program for the controller to solve.

This touches on another potential concern: the speed of MPC. If solving the optimization problem is slow, there are a few options to explore. One is that some variations on the control scheme can help balance speed and performance, such as letting the control horizon be shorter than the prediction horizon, which shrinks the optimization problem. Likewise, the control period can be longer than the time step of the system, reducing how often the control signal is computed. If conventional methods such as these are unsuccessful, I may turn to approximate methods such as that described in (13) or training an artificial neural network as an approximate controller. Another potential solution is explicit MPC (14), which precomputes a piecewise-affine explicit solution to the quadratic program; this can be faster than obtaining the implicit solution for small enough problems.
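
The shorter-control-horizon variation can be made concrete with a "move blocking" matrix that holds the last free input for the remainder of the prediction horizon (a sketch; `blocking_matrix` is an illustrative name, not from any MPC library):

```python
import numpy as np

def blocking_matrix(T, Tc, k):
    """Map Tc free inputs to a T-step input sequence, holding the last
    free input constant for the remaining T - Tc steps (move blocking)."""
    M = np.zeros((T * k, Tc * k))
    for i in range(T):
        j = min(i, Tc - 1)  # reuse the last free input after step Tc
        M[i*k:(i+1)*k, j*k:(j+1)*k] = np.eye(k)
    return M

M = blocking_matrix(T=5, Tc=2, k=1)
# Substituting U = M @ U_free into the condensed QP shrinks the decision
# vector from T*k to Tc*k entries, reducing per-step solve time.
```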