A Path to Energy-efficient Spiking Delayed Feedback Reservoir Computing System for Brain-inspired Neuromorphic Processors

Kangjun Bai, Yang Yi
Bradley Department of Electrical and Computing Engineering
Virginia Tech, Blacksburg, Virginia, 24061, USA
E-mail: {kangjun, yangyi8}@vt.edu

Abstract
Following the computation revolution in the field of machine learning, the reservoir computing system has shown promise for mimicking the mammalian brain, with performance comparable to that of other conventional neuromorphic computing systems. In this work, we propose a spiking delayed feedback reservoir (S-DFR) computing system, which embeds a temporal encoding scheme, the Mackey-Glass (MG) nonlinear transfer function, and a dynamic delayed feedback loop. By adopting the temporal encoding scheme as the signal processing module, pre- and post-neuron signals are represented by digitized pulse trains with alterable time intervals. Experimental results demonstrate rich dynamic behaviors with merely 206μW of power consumption; furthermore, the system robustness is studied and analyzed through Monte-Carlo simulation. To the best of our knowledge, the proposed S-DFR computing system represents the first analog integrated circuit (IC) implementation of the time delay reservoir (TDR) computing system.

Keywords
Delayed feedback reservoir, edge of chaos, spiking neural network, neuromorphic computing

1. Introduction
As the amount of data processed in our daily lives grows enormously, improvements in computing performance are hindered by the gap between the complementary metal–oxide–semiconductor (CMOS) technology of the microprocessor and the storage capacity of the memory. The modern von Neumann computing architecture, which computes and stores data in separate locations, has become inefficient for many applications, such as speech, image, and video frame recognition. The power spent on data processing in supercomputers has become a burden on the world's energy consumption.

As is well known, the human brain can solve complex tasks such as learning, analyzing, and classifying surrounding information with merely 10 Watts of power consumption [1]. Neuromorphic computing systems, which mimic the working mechanism of mammalian brains, are therefore needed to break through the performance barrier of the traditional von Neumann computing architecture.

To mimic the neurology of mammalian brains, the artificial neural network (ANN) is used to model biological nervous systems. Two well-known ANNs, the feedforward neural network (FNN) and the recurrent neural network (RNN), are capable of efficiently solving complex tasks that are not feasible with the traditional von Neumann computing architecture. As an important branch of RNNs, the Liquid State Machine (LSM) closely mimics biological nervous systems and is capable of processing temporal spiking information. However, training the recurrent connections is computationally expensive. Introduced in the early 2000s by Jaeger [2] and Maass [3], the reservoir computing system, a novel concept in the machine learning paradigm, exploits the dynamic behavior of RNNs.

In the conventional reservoir computing system, the high-dimensional nonlinear input projection, as achieved in traditional RNNs, is constructed with a layer of fixed and untrained recurrent neural network, namely the reservoir. Because only the output weights are trained, the training complexity is greatly reduced. In the past decade, reservoir computing systems have been developed to solve complex tasks [4-7]. To further reduce the design complexity and more closely mimic the dynamic behavior of mammalian brains, the time delay reservoir (TDR) computing system [8], which embeds a single nonlinear neuron and a static delay feedback loop, was introduced.

In this context, the TDR computing system utilizes the feedback as a persistent dynamic memory to process time-series input signals. Because it satisfies both the persistent memory and the separation properties, the TDR computing system shows promise for mimicking the mammalian brain. With the delay embedded, the TDR computing system is not only easy to implement in hardware but also exhibits near-chaotic behavior. Recently, the photonic implementation of the TDR computing system has attracted worldwide attention [9-11] and has benefited multifaceted applications [12]. However, energy-hungry peripheral devices, such as analog-to-digital (A/D) and digital-to-analog (D/A) converters, as well as significant design area are required to facilitate the computation. To the best of our knowledge, an analog integrated circuit (IC) implementation of TDR computing systems has not yet been reported.

In this work, we propose a novel and practical spiking delayed feedback reservoir (S-DFR) computing system that is built on the temporal encoding scheme and the Mackey-Glass (MG) nonlinear transfer function. The contributions of our work include: 1) the proposed S-DFR computing system exhibits higher power efficiency with smaller design area; 2) our experimental results demonstrate rich dynamic behaviors with merely 206μW of power consumption; 3) to the best of our knowledge, our work represents the first analog IC implementation of the TDR computing system.

2. Architecture and Operating Principle

In the past several decades, the neuron function was modeled as a linear integration of multiple synaptic inputs followed by a threshold nonlinearity. However, the model of neuron computation has recently evolved conceptually into a much more sophisticated mixed-signal evaluation with biological synapses [13], such as RNNs. The critical challenge in RNNs is that all weights need to be trained, which results in significantly high computational complexity. The concept of the TDR computing system has been proposed and realized to reduce this computational complexity while still solving complex tasks.

Delay is ubiquitous in almost every system; examples include the diffusion of substances (cellular metabolites, oxygen, and carbon dioxide in the blood) and the intrinsic time for transportation between neurons [14]. With the delay embedded, the TDR computing system exhibits near-chaotic behavior. Neuron nodes within the reservoir layer perform the nonlinear mapping whereby the biological behavior of the neural system is achieved. TDR computing systems should possess the following properties: 1) high dimensionality; 2) persistent memory. The TDR computing system is conceptually evolved in its training mechanism, and the implementation of its reservoir layer is highly flexible.

In [8], it was shown that the TDR computing system achieves practically identical performance to the conventional reservoir computing system. However, real-time operation requires an interface with analog signals; hence, energy-hungry peripheral devices, such as analog multipliers and A/D and D/A converters, are needed to facilitate the computation of time-continuous signals. With these peripheral components, the digital implementation often results in extremely high computational energy and a large design area. Unlike the digital implementation, the analog realization computes analog signals directly without A/D and D/A conversion, thereby greatly reducing the power consumption and the design area. Most importantly, the analog realization closely mimics the working mechanism of biological nervous systems. By naturally performing neuron-like functions, the analog realization combines high-speed operation with small design area and low power consumption. Thus, we propose a novel and practical S-DFR computing system that is built on the temporal encoding scheme, the MG nonlinear transfer function, and the dynamic delayed feedback loop. Fig. 1 illustrates the overview of the proposed S-DFR computing system architecture.

2.1. System Overview

Similar to the conventional TDR computing system, the proposed S-DFR computing system is constructed of three layers, namely the input layer, the reservoir layer, and the output layer. In the proposed S-DFR computing system, the masking interface in the input layer is replaced by the temporal encoder [15, 16], where pre- and post-neuron signals are represented by digitized pulse trains with alterable time intervals. The time interval between spikes, $D_t$, can be expressed as:

$$D_t = f(C_m, V_{th}) - f(C_{m-1}, V_{th-1}),$$

where the function, $f(C_m, V_{th})$, could be represented as:

$$f(C_m, V_{th}) = (C_m + 1) [\beta \cdot (V_{th} - \gamma) + \theta],$$

where $\beta$, $\gamma$, and $\theta$ are design parameters that relate to the charging and the refractory period of the temporal encoder, and $C_m$ and $V_{th}$ are the membrane capacitance and the firing threshold potential, respectively.
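The two expressions above can be evaluated directly. The following is a minimal sketch that simply transcribes them; the function names and all numerical values are hypothetical, unitless placeholders chosen only to exercise the formulas, and are not design values from this work.

```python
def encoder_time(c_m, v_th, beta, gamma, theta):
    """f(C_m, V_th) = (C_m + 1) * [beta * (V_th - gamma) + theta]."""
    return (c_m + 1.0) * (beta * (v_th - gamma) + theta)


def spike_interval(c_m, v_th, c_m_prev, v_th_prev, beta, gamma, theta):
    """D_t = f(C_m, V_th) - f(C_{m-1}, V_{th-1})."""
    return (encoder_time(c_m, v_th, beta, gamma, theta)
            - encoder_time(c_m_prev, v_th_prev, beta, gamma, theta))


# Hypothetical, unitless placeholder values (not taken from the paper).
BETA, GAMMA, THETA = 2.0, 0.1, 0.5

d_t = spike_interval(c_m=1.0, v_th=0.6, c_m_prev=0.0, v_th_prev=0.4,
                     beta=BETA, gamma=GAMMA, theta=THETA)
print(f"D_t = {d_t:.3f}")
```

Because $f$ is linear in $V_{th}$ for a fixed $C_m$, a change in the threshold potential maps to a proportional change in the spike interval, which is what allows the interval to carry the analog information.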

Within the reservoir layer, the high-dimensional nonlinear transformation is constructed by a single nonlinear node with a dynamic delayed feedback loop. The nonlinear node projects the encoded input signal onto a high-dimensional space to linearly separate classification results. Within the delayed feedback loop, the total delay time, $T$, is divided into $N$ equidistant delay neurons, which can be expressed as:

$$T = N \cdot \tau,$$

where $\tau$ is the delay time of each delay neuron. By combining the feedback signal with the next incoming data, the S-DFR computing system utilizes the feedback loop to form the persistent dynamic memory. The delay loop creates the recurrent connection, which represents nonlinear signals at different delay times. Under the influence of the varied delay unit, the reservoir layer exhibits a transient response to process temporal signals, and the results are classified at the output layer by a linear weighted sum of the individual delay neurons.
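The interplay of the nonlinear node, the $N$-stage delay loop, and the linear readout can be illustrated with a discrete-time abstraction. This sketch is an assumption-laden simplification: it ignores the spike encoding entirely, advances the delay line by one $\tau$ per input sample, and uses a static MG-style nonlinearity; the names `run_delay_reservoir`, `feedback_gain`, and the readout weights are hypothetical.

```python
def mg_node(x, a=2.0, n=2):
    """Static Mackey-Glass-style nonlinearity a*x / (1 + x**n)."""
    return a * x / (1.0 + x ** n)


def run_delay_reservoir(inputs, n_delay=5, feedback_gain=0.5):
    """Return one row of N delay-neuron states per input sample."""
    loop = [0.0] * n_delay  # delay line; the last entry leaves the loop next
    states = []
    for u in inputs:
        # Combine the new input with the signal leaving the delay loop.
        node_out = mg_node(u + feedback_gain * loop[-1])
        # Shift the delay line by one tau and inject the new node output.
        loop = [node_out] + loop[:-1]
        states.append(list(loop))
    return states


def readout(state, weights):
    """Linear weighted sum over the individual delay neurons."""
    return sum(w * s for w, s in zip(weights, state))


states = run_delay_reservoir([0.5, 0.0, 0.0], n_delay=3)
weights = [0.2, 0.3, 0.5]  # hypothetical; normally obtained by training
outputs = [readout(s, weights) for s in states]
print(states)
print(outputs)
```

In this toy run a single input pulse travels one stage per step, so it takes $N$ steps (a total delay of $T = N \cdot \tau$) before it re-enters the nonlinear node as feedback.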

2.2. Mackey-Glass (MG) Nonlinear Function

The system has a nonlinear transformation mechanism to project incoming signals onto high-dimensional spaces. Sigmoid and hyperbolic tangent functions are commonly used as the nonlinear transfer function in conventional reservoir computing systems. However, neither the sigmoid nor the hyperbolic tangent function has the delay property needed to mimic the biological behavior of nervous systems. On the other hand, the Mackey-Glass (MG) function, a delay differential equation (DDE), falls into the category of delayed systems.

The MG function was originally introduced to model diseases whose symptoms exhibit oscillatory instabilities [17]. The dynamics of the MG function depend on both the current and the previous states, which can be expressed as:

$$\frac{dV}{dt} = \frac{a \cdot V(t-\tau)}{1 + V(t-\tau)^n} - b \cdot V(t),$$

where $a$ and $b$ are arbitrary design parameters, $n$ is the nonlinearity exponent, and $\tau$ is the delay time.
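To make the role of the delayed term concrete, the MG equation can be integrated numerically. The sketch below uses a simple forward-Euler scheme with a constant initial history; the parameter values ($a = 0.2$, $b = 0.1$, $n = 10$, $\tau = 17$) are values commonly used in the MG literature and are assumptions here, not the circuit parameters of this work.

```python
def simulate_mackey_glass(a=0.2, b=0.1, n=10, tau=17.0,
                          dt=0.1, steps=5000, v0=1.2):
    """Forward-Euler integration of dV/dt = a*V(t-tau)/(1+V(t-tau)**n) - b*V(t)."""
    delay_steps = int(round(tau / dt))
    # Constant initial history on the interval [-tau, 0].
    history = [v0] * (delay_steps + 1)
    for _ in range(steps):
        v_now = history[-1]
        v_delayed = history[-1 - delay_steps]  # V(t - tau)
        dv = a * v_delayed / (1.0 + v_delayed ** n) - b * v_now
        history.append(v_now + dt * dv)
    # Drop the artificial pre-history so the trace starts at t = 0.
    return history[delay_steps:]


trace = simulate_mackey_glass(steps=2000)
print(f"samples: {len(trace)}, min: {min(trace):.3f}, max: {max(trace):.3f}")
```

The state update needs a buffer of past values because the derivative depends on $V(t-\tau)$; this buffer is the software counterpart of the delayed feedback loop.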

As illustrated in Fig. 2, the introduction of delay enables the MG function to handle time-delayed feedback structures in a way that mimics the biophysical behavior of nervous systems. Such a system was shown in [8] to have the potential to serve as the nonlinear transformation for reservoir computing systems. Both high dimensionality and persistent memory can be achieved by the delayed system.

![Figure 2: Simplified diagram of Mackey-Glass (MG) nonlinear function.](image)

Recently, only a few studies have aimed at implementing the MG nonlinear function via electronic circuits. In [18], the MG function is modeled using three fabricated analog multipliers. Moreover, as demonstrated in [19], the MG function is formed by an n-type and a p-type junction gate field-effect transistor (JFET). Due to its simple structure, this electronic circuit has been widely used to model the MG nonlinear function. However, output signals from these nonlinear devices are relatively small and often require operational amplifiers (op-amps) to amplify the outgoing signals to the desired level. Although these results demonstrate the capability of nonlinear transformation, within low-power CMOS technology the output signals of these electronic circuits are usually limited to several hundred microvolts, which requires an op-amp with a finite gain of $10^6$ to reach a sufficient potential level.

To the best of our knowledge, an IC implementation of the MG function and the TDR computing system has not yet been reported. By implementing the spiking scheme and the short-term dynamic memory property, the proposed S-DFR computing system exhibits higher energy efficiency with smaller design area.

3. System Implementation

Inspired by the delay behavior of biological neurons and the MG nonlinear function, the highly nonlinear neurons of the conventional reservoir computing system are here built on the temporal encoding scheme and the Mackey-Glass (MG) nonlinear function, followed by the dynamic delayed feedback loop. Such a system has the potential to project input signals onto high-dimensional spaces, where they can be separated linearly for classification, and to handle spiking temporal neuron signals.

3.1. S-DFR Computing System

The overview of the proposed S-DFR computing system is shown in Fig. 3. During operation, the analog input signal is first encoded into a spiking temporal code by the temporal encoder; the encoded signal is then projected onto a high-dimensional space by the nonlinear node and injected into the dynamic delayed feedback loop.

To enable the function of the nonlinear node and the dynamic delayed feedback loop, further encoding and decoding operations are embedded. The persistent memory signal, which combines the delayed signal from the previous data with the next incoming data, is encoded into a spike train by a temporal encoder (TE). Hence, post-neuron signals are represented by a digitized pulse train. The dynamic delayed feedback loop, which is constructed from integrate-and-fire (IF) neurons, is implemented by using the output spike train from the previous stage as a clock triggering signal for the following stage. When a spike train is generated by the TE, it resets the following IF neuron; meanwhile, the membrane potential of the corresponding IF neuron starts to charge up. After the delay period $\tau$, the IF neuron fires a spike train, which yields the nonlinear signal at the given delay time. This nonlinear-representative spike train travels along the delay line until it reaches the last IF neuron.

To maintain the accuracy of the signal processing, the delayed nonlinear signal in the last IF neuron is decoded into an analog signal, followed by a gain regulator, such that the new input data is dominant.

3.2. Charge Pump based MG Nonlinear Node

Recent research on the TDR computing system shows that the classification accuracy depends mainly on the nonlinear function. To improve the circuit implementation for better classification accuracy, the charge pump based MG nonlinear node is implemented, as illustrated in Fig. 4.

![Figure 3: An overview of spiking delayed feedback reservoir (S-DFR) computing system.](image)

loop to maintain the stability of the conversion process, is optimized such that it can operate in the sub-threshold region to achieve the minimum power consumption without losing the computational accuracy.

3.3. Integrate-and-fire (IF) Delay Neuron

The IF delay neuron utilizes the capacitor-sensing technique from the traditional leaky-IF (LIF) neuron model, as depicted in Fig. 5.

In the IF delay neuron, the delay time can be regulated by the total integrating time over the membrane capacitor, $C_m$. The delay time constant, $\tau$, can be expressed as

$$\tau = C_m \cdot \frac{V_{th(in)}}{i_{ex}},$$

where $V_{th(in)}$ is the threshold voltage at the input stage of the IF delay neuron, and $i_{ex}$ is the controllable excitation current. Theoretically, the mathematical model of the IF delay neuron is similar to that of the traditional resistor-capacitor delay unit, since the input impedance of the input stage, $R_{in}$, is equivalent to $\frac{V_{th(in)}}{i_{ex}}$. Thus, the delay time constant of the IF delay neuron can be rewritten as

$$\tau = C_m \cdot R_{in}.$$

However, unlike the traditional resistor-capacitor delay unit, which requires a large capacitor, the delay time of the IF delay neuron can be regulated through the input impedance. Consequently, a large delay time can be achieved by increasing the equivalent input impedance, that is, by reducing $i_{ex}$.
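The inverse dependence of the delay on the excitation current can be checked numerically. In the sketch below, the membrane capacitance (100 fF) and the input threshold voltage (0.6 V) are hypothetical values picked for illustration only; they are not reported in this work.

```python
def delay_time(c_m, v_th_in, i_ex):
    """tau = C_m * V_th(in) / i_ex, in seconds."""
    return c_m * v_th_in / i_ex


def input_impedance(v_th_in, i_ex):
    """Equivalent input impedance R_in = V_th(in) / i_ex, in ohms."""
    return v_th_in / i_ex


C_M = 100e-15   # hypothetical membrane capacitance: 100 fF
V_TH_IN = 0.6   # hypothetical input threshold voltage: 0.6 V

for i_ex in (50e-9, 100e-9, 300e-9):
    tau = delay_time(C_M, V_TH_IN, i_ex)
    r_in = input_impedance(V_TH_IN, i_ex)
    print(f"i_ex = {i_ex * 1e9:5.0f} nA  R_in = {r_in / 1e6:5.1f} MOhm  "
          f"tau = {tau * 1e6:.2f} us")
```

With the capacitor fixed, a sixfold reduction of $i_{ex}$ lengthens the delay sixfold, which is why a long delay does not require a large capacitor.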

During operation, the membrane capacitor, $C_m$, senses the excitation current, $i_{ex}$, and continues to charge up its voltage potential. When $V_{th(in)}$ exceeds the potential level of the reference transistors ($M_{4,8}$), two cascaded inverters ($M_{5,6}$ and $M_{11,12}$) fire a spike as output. Once the firing process takes place, a feedback loop ($M_{17-10}$) generates a high voltage at $V_{reset}$ to trigger the reset transistor ($M_2$), which resets the membrane capacitor. As such, the firing process for one output spike is completed.
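The charge, fire, and reset cycle can be mimicked behaviorally with a time-stepped model. This is only an idealized sketch: it assumes a constant excitation current, an instantaneous reset, and no leakage or refractory period, and it reuses hypothetical component values (100 fF, 0.6 V, 100 nA) that are not taken from the fabricated circuit.

```python
def simulate_if_neuron(i_ex, c_m, v_th, t_end, dt):
    """Return the spike times of an ideal integrate-and-fire neuron."""
    v = 0.0
    spike_times = []
    steps = int(round(t_end / dt))
    for k in range(1, steps + 1):
        v += i_ex * dt / c_m          # membrane capacitor integrates i_ex
        if v >= v_th:                 # threshold crossing: fire a spike
            spike_times.append(k * dt)
            v = 0.0                   # reset the membrane capacitor
    return spike_times


# Hypothetical values: 100 nA, 100 fF, 0.6 V, simulated for 2 us at 1 ns steps.
spikes = simulate_if_neuron(i_ex=100e-9, c_m=100e-15, v_th=0.6,
                            t_end=2e-6, dt=1e-9)
print([f"{t * 1e6:.3f} us" for t in spikes])
```

The spacing between consecutive spikes in this model approximates $\tau = C_m \cdot V_{th(in)} / i_{ex}$, up to the resolution of the time step.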

4. Performance Evaluation

The proposed S-DFR computing system is implemented and fabricated using the standard Global Foundries 130nm CMOS technology. In this section, the detailed experimental performance of the proposed S-DFR computing system and its dynamic behavior are evaluated. Furthermore, the system robustness is studied and analyzed through Monte-Carlo simulations in the Cadence Virtuoso platform.

4.1. Transformation of Nonlinear Node

To achieve the nonlinear behavior, the nonlinear node is designed to emulate the MG delay differential equation. As depicted in Fig. 6, it can be observed that the nonlinear correlation between the input and output signal is successfully achieved.

Figure 6: Nonlinear regime of MG function and circuit implementation.

Similar to the nonlinear characteristic of the ideal MG function, the nonlinearity of the transfer function in the circuit implementation can be regulated by controlling the charge pump current, $I_{cp}$. To demonstrate this feature, the charge pump current, $I_{cp}$, is altered to achieve various degrees of nonlinearity of the transfer function, as plotted in Fig. 7. Fig. 7(b) presents the ideal MG function with different nonlinearity exponents, $n$, as discussed in Section 2. As $n$ increases, the nonlinearity of the transfer function rises accordingly. The same characteristic can be observed in the circuit implementation: by increasing the charge pump current, the nonlinearity of the circuit's transfer function can be regulated, as illustrated in Fig. 7(a).
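The effect of the exponent $n$ on the ideal MG transfer function can be reproduced with a few lines of code. The sketch below covers only the ideal function $a \cdot x / (1 + x^n)$ of Fig. 7(b); it does not model the circuit or the relation between $I_{cp}$ and $n$, and the choice $a = 1$ and the helper names are assumptions made for illustration.

```python
def mg_transfer(x, a=1.0, n=2):
    """Static MG transfer function a*x / (1 + x**n)."""
    return a * x / (1.0 + x ** n)


def analytic_peak(n):
    """Location of the maximum of x / (1 + x**n), valid for n > 1.

    Setting the derivative to zero gives x**n = 1 / (n - 1).
    """
    return (1.0 / (n - 1)) ** (1.0 / n)


def numeric_peak(n, x_max=3.0, points=3001):
    """Locate the maximum on a uniform grid as a cross-check."""
    xs = [x_max * i / (points - 1) for i in range(points)]
    return max(xs, key=lambda x: mg_transfer(x, n=n))


for n in (2, 4, 8):
    x_pk = analytic_peak(n)
    print(f"n = {n}: peak at x = {x_pk:.3f}, "
          f"peak value = {mg_transfer(x_pk, n=n):.3f}, "
          f"value at x = 2: {mg_transfer(2.0, n=n):.4f}")
```

For larger $n$ the curve rises higher before the peak and falls off much more sharply after it, which is the sense in which the nonlinearity increases with $n$.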

Figure 7: Nonlinearity of the transfer function in: a) circuit implementation; b) ideal MG function.

4.2. Regulation of Static Delay Loop

In the proposed S-DFR computing system, the dynamic behavior can be varied from periodic to chaotic by regulating the total delay time within the dynamic delayed feedback loop. As plotted in Fig. 8(a), the delay time of a single IF delay neuron can be regulated from 180ns to 1.5μs by controlling the excitation current from 50nA to 300nA. Fig. 8(b) shows the output spike trains from the first six IF delay neurons within the dynamic delayed feedback loop. Experimental results indicate that the delay time between successive IF delay neurons is approximately identical.

Figure 8: a) The delay time of a single IF delay neuron is measured at various $I_{ex}$; b) output spike trains within the dynamic delayed feedback loop.

4.3. System Robustness

The robustness of the nonlinear transfer function is studied through Monte-Carlo simulation with both temperature and process variation. To evaluate the system robustness, the proposed charge pump based nonlinear node is simulated under process variation with 200 sampling points. The voltage across the loop filter, $V_L$, is examined in this task. As depicted in Fig. 9(a), the average offset voltage at $V_L$ is 6.17mV, and the standard deviation of $V_L$ is 9.38mV, which indicates that 100% of data points lie within a band of $3\sigma$. Furthermore, the temperature variation is analyzed by sweeping the temperature from 0°C to 100°C, as plotted in Fig. 9(b). In the simulated results, the voltage across the loop filter maintains a constant level with an average error rate of 2.5% while the temperature is below 40°C. As the temperature increases further, the error rate rises up to 12.5%.
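As a quick check of the $3\sigma$ band, the reported mean and standard deviation give the following limits, under the assumption that the band is centered on the mean offset voltage:

$$\mu \pm 3\sigma = 6.17\,\text{mV} \pm 3 \times 9.38\,\text{mV} = 6.17\,\text{mV} \pm 28.14\,\text{mV},$$

that is, a band from about $-21.97$ mV to $34.31$ mV. For a Gaussian distribution, roughly 99.7% of samples fall within $3\sigma$, so with 200 sampling points fewer than one outlier is expected on average ($200 \times 0.003 \approx 0.6$), which is consistent with all points lying inside the band.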

Figure 9: System robustness in: a) process variation; b) temperature variation.

4.4. System Performance

The phase portrait is a graphical tool for visualizing how solutions of a given nonlinear system behave in the long run. The dynamic behavior of the S-DFR computing system models the DDE with varied delay time. As demonstrated in Fig. 10, the phase portraits obtained from the simulation are plotted using two signals within the reservoir layer, one of which is collected with a time delay. When the total delay time within the system is held at 100μs, the delayed signal, $V(t - \tau)$, repeatedly traces its initial path even in the long run, which indicates periodic behavior, as plotted in Fig. 10(a). As the total delay time within the system increases to 500μs, the delayed signal diverges from its initial path without straying from the equilibrium point after a long run, which indicates operation at the edge of chaos, as plotted in Fig. 10(b).
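A phase portrait of this kind is built by pairing each sample of a signal with its delayed copy. The sketch below shows that construction on a toy periodic sequence rather than on simulated reservoir signals; the helper name `delay_embed` and the test sequence are hypothetical.

```python
def delay_embed(signal, delay_steps):
    """Pair each sample V(t) with its delayed copy V(t - tau)."""
    return [(signal[k], signal[k - delay_steps])
            for k in range(delay_steps, len(signal))]


# Toy periodic sequence with period 4, repeated five times.
periodic = [0, 1, 2, 3] * 5
pairs = delay_embed(periodic, delay_steps=1)
distinct_points = set(pairs)

print(f"{len(pairs)} points plotted, {len(distinct_points)} distinct")
```

For a periodic signal the pairs keep revisiting the same points, so the portrait is a closed path; a signal at the edge of chaos would instead keep adding new points while staying within a bounded region.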

![Figure 10: Phase portrait of the dynamic system in: a) τ = 100μs; b) τ = 500μs.](Image)

The proposed S-DFR computing system is fabricated in the standard Global Foundries 130nm CMOS technology, as depicted in Fig. 11. The whole chip occupies 1.5mm × 1.5mm, while each S-DFR computing system module occupies 175μm × 56μm. The design specification of the proposed S-DFR computing system and the comparison to our previous work [20] are summarized in Table 1.

**Table 1:** The design specification of the proposed S-DFR computing system vs. the hardware implementation of our previous work [20].

<table>
<thead>
<tr>
<th></th>
<th>Previous Work [20]</th>
<th>This Work</th>
</tr>
</thead>
<tbody>
<tr>
<td>Design Structure</td>
<td colspan="2">Delayed Feedback Architecture</td>
</tr>
<tr>
<td>Implementation</td>
<td colspan="2">Analog IC Implementation</td>
</tr>
<tr>
<td>Technology</td>
<td>180nm</td>
<td>130nm</td>
</tr>
<tr>
<td>Supply Voltage</td>
<td>1.8V</td>
<td>1.2V</td>
</tr>
<tr>
<td>Nonlinear Function</td>
<td>Inverted Sigmoid</td>
<td>Mackey-Glass</td>
</tr>
<tr>
<td>Delay Structure</td>
<td>LIF-based</td>
<td>IF-based</td>
</tr>
<tr>
<td>Frequency</td>
<td>20MHz</td>
<td></td>
</tr>
<tr>
<td>Design Area</td>
<td>/</td>
<td>0.0098mm²</td>
</tr>
<tr>
<td>Average Power</td>
<td>526μW</td>
<td>206μW</td>
</tr>
<tr>
<td>Power Reduction</td>
<td>/</td>
<td>61%</td>
</tr>
</tbody>
</table>

Compared to state-of-the-art TDR computing systems with an embedded MG nonlinear function, which are implemented via energy-hungry peripheral devices [18, 19, 21] or a field-programmable gate array (FPGA) [22], our work represents the first pure analog IC implementation of the TDR computing system. Benefiting from the 130nm CMOS technology, the proposed S-DFR computing system with the embedded MG nonlinear function achieves 2X better energy efficiency compared to our previous work [20, 23].
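The figures in Table 1 can be cross-checked with simple arithmetic. Treating the ratio of average power as a proxy for the energy-efficiency gain is an assumption made here for illustration:

$$\frac{526\,\mu\text{W} - 206\,\mu\text{W}}{526\,\mu\text{W}} = \frac{320}{526} \approx 0.608, \qquad \frac{526\,\mu\text{W}}{206\,\mu\text{W}} \approx 2.55, \qquad 175\,\mu\text{m} \times 56\,\mu\text{m} = 9800\,\mu\text{m}^2 = 0.0098\,\text{mm}^2.$$

These values agree with the 61% power reduction, the roughly 2X efficiency improvement, and the 0.0098mm² design area listed in the table.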

![Figure 11: Layout view of the proposed S-DFR computing system.](Image)

5. Conclusion

In this work, we propose a novel and practical S-DFR computing system, which embeds the temporal encoding scheme, the MG nonlinear function, and the dynamic delayed feedback loop. It is a pure analog signal processing element that utilizes digitized spike trains for data processing. Such a system naturally mimics the biological behavior of mammalian brains with analog components, thereby greatly reducing the power consumption and the design area compared to state-of-the-art TDR computing systems. The performance and robustness of the proposed S-DFR computing system are studied through Monte-Carlo simulation. In experimental results, the proposed S-DFR computing system exhibits rich dynamic behaviors with merely 206μW of power consumption, which indicates the successful implementation of the proposed S-DFR architecture that mimics biological neurons with the delay property.

Acknowledgement

This material is based upon work funded by the Air Force Research Lab (AFRL) under AFRL Grant Nos. FA8750-16-2-0120 and FA8750-15-1-0052. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of AFRL.

References

[23] Li, J., et al. Analog hardware implementation of spike-based delayed feedback reservoir computing system. In 2017 International Joint Conference on Neural Networks (IJCNN). IEEE, 2017.