Audio-Visual Data Processing and Concept Formation


Brain-Like Artificial Intelligence (BLAI) was pioneered by Prof. Nikola Kasabov; here it is applied to a specific problem domain.

This project develops novel methods and systems that are based on the NeuCube SNN brain-like architecture, for:

  1. Audio data processing - pattern recognition, modelling, prediction, and understanding of music, speech, or any other type of sound;
  2. Visual information processing using a DVS input device to encode video into spikes, with subsequent learning in a specialised NeuCube model for pattern recognition, modelling, prediction, and understanding;
  3. Integrated multi-modal audio-visual information processing: pattern recognition, modelling, concept formation, understanding, and language modelling.

Efficient multimodal audio-visual information processing remains a challenge for the information sciences, and the human brain is still the most efficient machine for achieving it. Understanding how the brain and its processes function, and utilising this knowledge to create more sophisticated intelligent systems, have been challenging tasks and an ongoing focus of research in both neurophysiology and artificial intelligence (AI). Recent advancements in computer science have led to the emergence of brain-inspired computing paradigms that can support the analysis of these processes due to their similarity to the real brain.

This study aims to create a three-dimensional, spatio-temporal brain-like model of integrated audio-visual information processing using the NeuCube spiking neural network (SNN) architecture. The model will utilise principles of how the brain perceives information through the auditory and visual pathways for object recognition, in order to investigate the formation of semantic concepts. Based on previous research, which has shown that brain-inspired evolving systems handle multi-modal input very well, this study was designed to use a biologically realistic brain template with up to two million neurons to compute and simulate neural behaviour. The auditory and visual input stimuli will be transformed into electrical signals (spikes) using artificial cochlear and retinal simulation software (and possibly devices), and entered into their corresponding processing areas of the brain model.

It is hoped that such a system can benefit the search for new AI technologies for human-computer interaction, along with a better understanding of multimodal audio-visual information processing and concept formation in the human brain.
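As a rough illustration of the encoding step described above, the sketch below converts a continuous signal into positive/negative spike events using a simple step-forward threshold scheme. The function name and threshold value are illustrative assumptions; the actual cochlear and retinal simulation models are considerably more elaborate.

```python
import numpy as np

def step_forward_encode(signal, threshold):
    """Encode a 1-D signal into positive/negative spike events.

    A spike (+1 or -1) is emitted whenever the signal moves more than
    `threshold` away from a running baseline; the baseline then steps
    towards the signal. Returns an array of {-1, 0, +1} per sample.
    """
    spikes = np.zeros(len(signal), dtype=int)
    baseline = signal[0]
    for t in range(1, len(signal)):
        if signal[t] > baseline + threshold:
            spikes[t] = 1
            baseline += threshold
        elif signal[t] < baseline - threshold:
            spikes[t] = -1
            baseline -= threshold
    return spikes

# Example: a rising-then-falling waveform produces positive spikes on the
# rising flank and negative spikes on the falling flank.
wave = np.concatenate([np.linspace(0, 1, 50), np.linspace(1, 0, 50)])
events = step_forward_encode(wave, threshold=0.1)
```

This kind of encoding preserves the temporal dynamics of the input, which is what the SNN reservoir subsequently learns from.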

FIGURE 1. The audio-visual information processing model. Illustrations of the ear and eye used with permission.

Learning the Auditory Characteristics of Classical Music

Music has been a vital part of human life for centuries, expressing cultural, emotional, and spiritual aspects of existence. The human brain can memorise and recall musical pieces by learning the dynamic, characteristic patterns of soundwaves through its auditory pathway and related brain areas. In this study, we tried to understand these patterns by creating a computer model that simulates the processes in the inner ear and the auditory cortex, using the NeuCube Deep Spiking Neural Network software architecture. We applied this model to three pieces of classical music by Mozart, Vivaldi, and Bach, and found distinct differences between the neural connections created in each network. Furthermore, when taking 0.5-second samples of the music, our model could identify which of the three pieces a sample belonged to with 97.43% accuracy. Our research suggests that brain-like models can outperform the human brain in terms of music memorisation and recognition.
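The 0.5-second sampling used in this case study amounts to fixed-length windowing of the waveform before encoding and classification. A minimal sketch follows; the function name and sample rate are assumptions for illustration only.

```python
import numpy as np

def make_windows(audio, sample_rate, window_s=0.5):
    """Split a mono waveform into non-overlapping fixed-length windows.

    Trailing samples that do not fill a whole window are discarded,
    so every returned sample has identical length.
    """
    n = int(sample_rate * window_s)          # samples per window
    n_windows = len(audio) // n
    return audio[: n_windows * n].reshape(n_windows, n)

# Example: 3.2 s of audio at 16 kHz yields six complete 0.5 s windows.
audio = np.random.randn(int(3.2 * 16000))
windows = make_windows(audio, 16000)         # shape (6, 8000)
```

Each window would then be spike-encoded and presented to the trained model as an independent classification sample.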

FIGURE 2. Results of the case study on classical music.

Learning the Visual Characteristics of Written Digits

Vision is a vital sense for humans when interacting with and understanding their environment. The human brain can interpret and memorise visual cues from our surroundings by learning the dynamic, characteristic patterns of lightwaves through its visual pathway and related brain areas. In this study, we tried to understand these patterns by creating a computer model that simulates the processes in the retina and the visual cortex, using the NeuCube Deep Spiking Neural Network software architecture. We applied this model to a dataset of written digits [1] and achieved 92.05% accuracy when replicating the experimental setup of previous studies [2]. Our model improves on previously reported results and suggests that brain-like models have an advantage over established computational methods for visual processing tasks.
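The DVS recordings used here consist of per-pixel ON/OFF events emitted when local brightness changes. A minimal frame-based approximation of this address-event representation (AER) is sketched below; the function name and threshold are illustrative assumptions, and a real DVS operates on log-intensity changes asynchronously rather than on frames.

```python
import numpy as np

def frames_to_events(frames, threshold=0.1):
    """Approximate DVS output: emit an ON (+1) / OFF (-1) event wherever
    the intensity change between consecutive frames exceeds `threshold`.

    Returns a list of (t, y, x, polarity) tuples, mimicking the
    address-event representation (AER) used by DVS recordings.
    """
    events = []
    for t in range(1, len(frames)):
        diff = frames[t] - frames[t - 1]
        ys, xs = np.nonzero(np.abs(diff) > threshold)
        for y, x in zip(ys, xs):
            events.append((t, int(y), int(x), 1 if diff[y, x] > 0 else -1))
    return events

# Example: a single pixel brightening between frames 0 and 1
# generates exactly one ON event at that address.
frames = np.zeros((2, 4, 4))
frames[1, 2, 3] = 1.0
evts = frames_to_events(frames)   # [(1, 2, 3, 1)]
```

Because only changes generate events, static backgrounds produce no data, which is what makes event streams a natural input for spiking neural networks.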

FIGURE 3. Results of the case study on written digits.

[1] Yousefzadeh, A., Serrano-Gotarredona, T., & Linares-Barranco, B. (n.d.). MNIST-DVS and FLASH-MNIST-DVS Databases. Retrieved August 27, 2017, from

[2] Zhao, B., Ding, R., Chen, S., Linares-Barranco, B., & Tang, H. (2015). Feedforward categorization on AER motion events using cortex-like features in a spiking neural network. IEEE Transactions on Neural Networks and Learning Systems, 26(9), 1963-1978.

[3] Henderson, J. A., Gibson, T. A., & Wiles, J. (2015). Spike event based learning in neural networks. arXiv preprint arXiv:1502.05777.

Related Papers and Benchmarking

The proposed methods and systems showed superior results in the following aspects:

  1. Better data analysis and classification/regression accuracy (see table below);
  2. Better visualisation of the created models, with a possible use of VR;
  3. Better understanding of the data and the processes that are measured;
  4. Enabling new information and knowledge discovery through meaningful interpretation of the models.
| Approach | Architecture | Learning algorithm | Train/test ratio | Accuracy (%) |
|---|---|---|---|---|
| Henderson et al., 2015 [3] | Feed-forward SNN | A new scheme for spike-based learning | Random selection | |
| Zhao et al., 2015 [2] | Composite system, including convolution, feature spike conversion, motion detector, and SNN classifier | Tempotron learning | Random selection | |
| This work (NeuCube) | Composite system, including a bio-inspired spike encoding algorithm and retinotopic mapping into a brain-like SNN for learning and classification | STDP learning | 10-fold cross-validation | 92.05 |
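The STDP learning listed in the comparison can be illustrated with a minimal pair-based weight update: a synapse is strengthened when the presynaptic spike precedes the postsynaptic one and weakened otherwise. This is a generic sketch, not the exact NeuCube implementation; the function name and all parameter values are assumptions.

```python
import math

def stdp_update(w, dt, a_plus=0.01, a_minus=0.012, tau=20.0,
                w_min=0.0, w_max=1.0):
    """Pair-based STDP: potentiate when the presynaptic spike precedes
    the postsynaptic spike (dt = t_post - t_pre > 0), depress otherwise.

    The magnitude of the change decays exponentially with |dt| (ms),
    and the weight is clipped to [w_min, w_max].
    """
    if dt > 0:
        w += a_plus * math.exp(-dt / tau)
    else:
        w -= a_minus * math.exp(dt / tau)
    return min(max(w, w_min), w_max)

# A causal pairing (pre 5 ms before post) strengthens the synapse...
w_pot = stdp_update(0.5, dt=5.0)
# ...while an anti-causal pairing weakens it.
w_dep = stdp_update(0.5, dt=-5.0)
```

Repeated over many spike pairs, this rule lets the network's connectivity reflect the temporal structure of its input, which is how the distinct connection patterns reported above emerge.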

See also some of the related papers:

Kasabov, N. K. (2014). NeuCube: A spiking neural network architecture for mapping, learning and understanding of spatio-temporal brain data. Neural Networks, 52, 62-76.

Kasabov, N., Scott, N. M., Tu, E., Marks, S., Sengupta, N., Capecci, E., Othman, M., Gholami Doborjeh, M., Murli, N., Hartono, R., Espinosa-Ramos, J. I., Zhou, L., Alvi, F., Wang, G., Taylor, D., Feigin, V., Gulyaev, S., Mahmoud, M., Hou, Z. G., Yang, J. (2016). Evolving spatio-temporal data machines based on the NeuCube neuromorphic framework: design methodology and selected applications. Neural Networks, 78, 1-14.

R&D System

For this project, an R&D system has been developed based on NeuCube. The system can be obtained subject to a licensing agreement.


The developers of this project are:

Anne Wendt

Giovanni Saraceno

Lukas Paulun