Unsupervised Learning of Mid-level Representations for Event-based Data
Synopsis
In this project, we will develop a system that extracts mid-level representations from the output of an event camera, thereby supporting a broad range of downstream vision tasks. At the initial stage, the student will carry out a survey of existing event-based learning algorithms, not limited to deep learning work, alongside a potential idea being explored by our team. The student will then be able to evaluate the techniques that look most promising and feasible for a graduate thesis. This project also offers a unique opportunity to access exciting real-world neuromorphic systems being developed at ICNS, such as the Robotic Foosball table.
Description
What better way for machines to sense than to emulate the human senses? Event-based cameras belong to a novel family of asynchronous, frame-free vision sensors whose principle of operation is based on abstractions of the functioning of biological retinas. These sensors acquire the content of a scene, i.e., its changes, asynchronously: every pixel is independent and autonomously encodes the visual information in its field of view into precisely time-stamped events. In the last few years, processing this unconventional output has been a research task undertaken by many labs around the world. A key problem that remains unexplored is mid-level feature extraction from events, and this project aims to address this gap.
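To make the data format concrete, the sketch below accumulates a handful of synthetic events into a simple "time surface", a common dense summary of recent pixel activity. The (x, y, t, p) field names, sensor resolution, decay constant, and events themselves are illustrative assumptions, not a fixed specification.

```python
import numpy as np

# A minimal sketch of event-camera output: each event is an (x, y, t, p)
# tuple giving pixel coordinates, a microsecond timestamp, and polarity
# (+1 for a brightness increase, -1 for a decrease). The 240x180 sensor
# resolution and the synthetic events below are assumptions for illustration.
WIDTH, HEIGHT = 240, 180

events = np.array(
    [(120, 90, 1_000, 1), (121, 90, 1_050, 1), (60, 45, 1_200, -1)],
    dtype=[("x", "u2"), ("y", "u2"), ("t", "u8"), ("p", "i1")],
)

def time_surface(events, width=WIDTH, height=HEIGHT, tau=50_000.0):
    """Turn an asynchronous event stream into a dense 'time surface':
    each pixel holds exp(-(t_now - t_last) / tau), so recently active
    pixels are close to 1 and stale pixels decay towards 0."""
    t_last = np.full((height, width), -np.inf)
    for ev in events:
        t_last[ev["y"], ev["x"]] = ev["t"]
    t_now = events["t"].max()
    return np.exp(-(t_now - t_last) / tau)

surface = time_surface(events)
print(surface[90, 120])  # close to 1.0: this pixel fired recently
```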
Many successful computer vision models for scene recognition transform low-level features, such as Gabor filter responses, into richer representations of intermediate or mid-level complexity. This process can often be broken down into two steps: (1) a coding step that transforms the features into a representation better suited to the task, and (2) a pooling step that aggregates the coded features over larger receptive fields. Extracting mid-level representations is thus a useful stepping stone towards other intermediate vision tasks, such as accurate flow estimation and macrofeature extraction, and towards subsequent high-level vision tasks. This is especially significant for the visually sparse, motion-dependent, and often disjoint information recorded by event cameras.
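A minimal sketch of this two-step pipeline is shown below, using hard-assignment coding against a k-means codebook followed by max-pooling over a coarse spatial grid. The random descriptors, codebook size, and 2x2 grid are placeholders for illustration, not a proposed design.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Step 0 (placeholder): local low-level descriptors, one per location.
# In practice these might be Gabor responses or event-based descriptors.
n_desc, dim = 500, 32
descriptors = rng.normal(size=(n_desc, dim))
positions = rng.uniform(0, 1, size=(n_desc, 2))  # normalised (x, y)

# Step 1 -- coding: learn a small codebook and hard-assign each
# descriptor to its nearest codeword, yielding one-hot codes.
n_words = 16
codebook = KMeans(n_clusters=n_words, n_init=10, random_state=0).fit(descriptors)
codes = np.eye(n_words)[codebook.predict(descriptors)]  # (n_desc, n_words)

# Step 2 -- pooling: max-pool the codes over a 2x2 grid of spatial
# cells (i.e. larger receptive fields), then concatenate.
grid = 2
pooled = np.zeros((grid, grid, n_words))
cells = np.minimum((positions * grid).astype(int), grid - 1)
for (cx, cy), code in zip(cells, codes):
    pooled[cy, cx] = np.maximum(pooled[cy, cx], code)

mid_level = pooled.reshape(-1)  # fixed-length mid-level representation
print(mid_level.shape)          # (2 * 2 * 16,) = (64,)
```

Max-pooling is used here only as one common choice; average pooling or soft coding slot into the same two-step structure.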
From a biological vision perspective, there is evidence that much of the ventral stream organization in our brain can be explained by relatively coarse mid-level features, without requiring explicit recognition of the objects themselves (Long, Yu, & Konkle, PNAS 2018). This property is neatly demonstrated by the intermediate layers of deep convolutional neural networks, which support extraction of mid-level representations from standard RGB data. However, this aspect remains relatively unexplored for event cameras. At the initial stage of this project, the student shall carry out a survey of existing event-based learning algorithms, not limited to deep learning work, in addition to a potential idea being explored by our team. This survey will allow us to evaluate the techniques that look most promising and feasible for a graduate thesis.
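As a concrete reference point for the RGB case, such mid-level features can be read off a truncated pretrained network. The sketch below uses torchvision's VGG-16; the choice of network and cut-off layer is an illustrative assumption, not something prescribed by this project.

```python
import torch
from torchvision import models

# Truncate a pretrained VGG-16 after an intermediate conv block; the
# resulting feature maps are a common proxy for mid-level representations.
# The cut-off index (16, i.e. up to the end of the conv3 block) is an
# illustrative choice.
vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()
mid_level_extractor = torch.nn.Sequential(*list(vgg.features.children())[:16])

with torch.no_grad():
    rgb = torch.rand(1, 3, 224, 224)      # stand-in for a real image
    features = mid_level_extractor(rgb)   # (1, 256, 56, 56) feature maps
print(features.shape)
```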
The applications of mid-level feature extraction include projects that offer a unique opportunity to access exciting real-world neuromorphic systems developed at ICNS, such as the Robotic Foosball table and the Robotic Pinball system. In these settings, tracking fast-moving, locally consistent objects is potentially made easier by richer intermediate representations.
Nature: Design, Computational, Analytical
Pre-requisite: Good programming skills in MATLAB and Python. Data analysis in R is additionally helpful.
Interested candidates should get in touch with Bharath Ramesh to discuss opportunities.