CNN Accelerator Aided Visitor Verification System on MAX78000FTHR

Published November 14, 2024

This project aims to build a Visitor Verification System for domestic use on the MAX78000FTHR, leveraging state-of-the-art CNN models such as MTCNN and FaceNet for face recognition and verification.

Why this project?

Every day, we are visited by numerous people at our homes. Some are familiar faces, while others are strangers. At times, these strangers can become intruders, especially when the residents are elderly. Due to their inability to respond swiftly, elderly individuals may find themselves at a disadvantage, allowing strangers to intrude into their homes for theft or other malicious purposes. To avoid this situation, the visitor can be verified before opening the door to interact with them.

Choice of hardware and algorithm:

The MAX78000FTHR Evaluation Kit features an ARM Cortex-M4F processor, which benefits from a large community and extensive documentation support. These resources help developers maximise the use of available tools and solve any issues they encounter. The MAX78000FTHR has a CNN accelerator and includes an on-board VGA CMOS camera making it suitable for image processing applications.

Since visitors arrive throughout the day, their identification and verification must be performed continuously. Developing deep learning models typically requires thousands of labelled samples per class, which makes data collection costly and time-consuming. For this purpose, Face Identification (FaceID), built from MTCNN (Multi-task Cascaded Convolutional Neural Network) and FaceNet using a knowledge distillation approach, is run at the edge with minimized latency and optimized power consumption on the MAX78000 CNN inference engine.

What's unique?

  1. On device processing - Instead of relying on cloud-based services, our system performs facial recognition and verification directly on the MAX78000 hardware. This ensures faster response times and eliminates concerns about data privacy, as sensitive information never leaves the local device. It also avoids subscriptions to cloud services, which can be quite expensive for advanced features like facial recognition.

  2. Lower Power Consumption - The MAX78000 is optimized for energy efficiency, making our solution ideal for continuous operation in battery-powered setups, unlike many other systems that require frequent charging or depend on wired power.

  3. Knowledge Distillation approach - Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. Large models have a higher knowledge capacity than small models, yet that capacity may not be fully utilised. The aim, therefore, is to teach the behaviour of the large network to the smaller network (here, from FaceNet to the FaceID model).

This project aligns with Sustainable Development Goal 11: Sustainable Cities and Communities, specifically focusing on Target 11.7 – ensuring safe, inclusive, and accessible public spaces, particularly for vulnerable populations such as the elderly.

Impact Statement

The Visitor Verification System built on the MAX78000FTHR helps the residents of a house verify a visitor before starting any conversation with them. The MAX78000FTHR, an ultra-low-power Convolutional Neural Network (CNN) inference engine that runs Artificial Intelligence (AI) computations at the IoT edge, enables real-time monitoring of visitors without raising privacy or security concerns. The uniqueness of this project lies in its use of the MAX78000FTHR, which offers significantly lower power consumption per operation than traditional software-based CNN implementations, making it an ideal solution for continuous, low-power AI-driven applications at the edge.

Components Required

  • MAX78000 Feather (MAX78000FTHR)

  • TFT 2.4inch SPI 240x320 LCD Display based on ILI9341 Controller IC

  • SD card with a minimum of 512MB or above of memory capacity (2GB SD card is used in our demonstration)

To see the full demonstration video, click on the YouTube Video below.

Circuit Description

In this project, we use the on-board VGA CMOS camera of the MAX78000FTHR to capture still images. For display, we plan to use a 2.4-inch 240x320 LCD based on the ILI9341 driver IC.

Since the MAX78000FTHR doesn't have an on-board display, we have to connect the SPI pins of the main board to the 2.4-inch LCD display unit. Additionally, we wish to include touch functionality so that the user can interact with the display, which requires connecting the necessary I2C pins on the board to the display.

A visual representation of the circuit diagram is shown below.

Circuit Diagram CNN Accelerator

Since the deployment of this device would be indoors near the entrance of a house, we will be using 5V USB Power rails to power up the MAX78000FTHR and the TFT display.

Face Detection And Identification

The block diagram below explains the process of deploying the CNN model on the MAX78000 microcontroller.

Block Diagram  of CNN Accelerator

FaceID Model Description

The face identification is solved in three main steps:

  • Face Extraction: Detection of the faces in the image to extract a rectangular subimage that contains only one face.

  • Face Alignment: Determination of the rotation angles (in 3D) of the face in the subimage to compensate for them with an affine transformation.

  • Face Identification: Identification of the person using the extracted and aligned subimage.

A multi-task cascaded convolutional neural network (MTCNN) solves both the face detection and alignment steps. Face identification is based on learning a signature, i.e., an embedding, for each facial image, whose distance to another embedding gives a measure of facial similarity. Small distances are expected between faces of the same person and large distances between faces of distinct people.
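The distance-based decision above can be sketched in a few lines of plain Python. The toy 4-dimensional vectors and the threshold of 1.0 are illustrative assumptions; the real model compares 512-length embeddings with a tuned threshold.

```python
import math

def euclidean_distance(emb_a, emb_b):
    """L2 distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(emb_a, emb_b)))

def same_person(emb_a, emb_b, threshold=1.0):
    """Declare a match when the embeddings are closer than the
    (illustrative) decision threshold."""
    return euclidean_distance(emb_a, emb_b) < threshold

# Toy 4-dimensional embeddings (the real model produces 512 values).
known = [0.1, 0.9, 0.2, 0.4]
visitor_same = [0.12, 0.88, 0.21, 0.39]   # small distance -> match
visitor_other = [0.9, 0.1, 0.8, 0.7]      # large distance -> no match

print(same_person(known, visitor_same))   # True
print(same_person(known, visitor_other))  # False
```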

FaceNet is one of the most popular CNN-based models developed for the embedding-based face identification approach. The triplet loss is the key behind its success. This loss function takes three input samples: the anchor, a positive sample from the same identity as the anchor, and a negative sample from a different identity. The triplet loss gives low values when the anchor is close to the positive sample and far from the negative one.
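As a rough sketch of that idea (the margin value and toy 3-dimensional vectors are illustrative assumptions, not FaceNet's actual settings):

```python
import math

def l2(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss: zero when the anchor is closer to the positive
    than to the negative by at least `margin`, positive otherwise."""
    return max(0.0, l2(anchor, positive) - l2(anchor, negative) + margin)

anchor   = [0.1, 0.9, 0.2]
positive = [0.15, 0.85, 0.25]  # same identity as the anchor
negative = [0.9, 0.1, 0.8]     # a different identity

print(triplet_loss(anchor, positive, negative))  # 0.0 -> well-separated
```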

A knowledge distillation approach is adopted to develop this tinier CNN model (of 450k parameters) from FaceNet, which typically has 7.5 million parameters, as it is a widely appreciated neural network for FaceID applications.

FaceID Model Main Steps

The above block diagram explains how MTCNN and FaceNet models are leveraged to create the compact FaceID model, AI85FaceIdNet. The embeddings from the FaceNet model serve as the target for AI85FaceIdNet. Techniques such as center loss or triplet loss are not used, as these are assumed to be addressed by the FaceNet model. Instead, the model's loss function is based on the Mean Squared Error (MSE) between the target and predicted embeddings, which is also used to define face similarity.
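The distillation objective can be illustrated in plain Python; the 4-dimensional toy vectors below stand in for the real 512-length embeddings.

```python
def mse(target, predicted):
    """Mean Squared Error between the teacher (FaceNet) embedding and
    the student (AI85FaceIdNet) embedding -- the distillation loss."""
    return sum((t - p) ** 2 for t, p in zip(target, predicted)) / len(target)

teacher_embedding = [0.2, 0.7, 0.1, 0.5]     # from FaceNet (the teacher)
student_embedding = [0.25, 0.65, 0.1, 0.55]  # from the compact FaceID model

loss = mse(teacher_embedding, student_embedding)
print(round(loss, 6))  # 0.001875
```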

The training of the MAX78000 FaceID model, AI85FaceIdNet, is performed using VGGFace-2 and YouTubeFaces databases and validated using MaximCeleb dataset.

Firmware

Based on our research into the MSDK firmware development environment provided by Analog Devices (which acquired Maxim Integrated), we will have to use:

  • ai8x-training for training models for the MAX series microcontrollers.

  • ai8x-synthesis for quantizing the trained model and generating the C code required to configure the CNN accelerator unit and the model in general.

  • The MSDK toolkit, which can then be used with VS Code to write firmware in C for the MAX78000 chip.

For the peripherals mentioned,

  • Camera: The MSDK provides the required HAL for the on-board CMOS VGA camera. Reference: camera.h

  • TFT LCD Display: The MSDK also provides the HAL to interface with LCD displays based on the ILI9341. Reference: tft_ili9341.h

At the time of submission, we have implemented the face detection and face identification models on the CNN accelerator, with code support for the TFT display as well. Unfortunately, as of this writing, we were unable to obtain a TFT display with an SPI interface (the one we currently have uses an 8-bit parallel interface, for which no software driver is available in the MSDK). Because of this, our current demo uses the board's USB COM port (UART terminal) to print the CNN model's inferences to the PC console.

As discussed previously in the face detection and identification section, we use 3 CNN models to build this project. The weights of the 2nd CNN model, which generates the 512-length encoding for the input image, are too large to fit into the flash of this device, so we store them in the root directory of an SD card under the name weights_2.bin.
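As a rough illustration of the idea (not the actual ai8x tooling or its file format), packing quantized 8-bit weights into a flat binary file that a firmware loader can stream back in order might look like this:

```python
import io
import struct

# Hypothetical quantized 8-bit weights; the real weights_2.bin is produced
# by the ai8x tools and is far larger than the MAX78000's flash can hold.
weights = [12, -7, 127, -128, 0, 45]

# Pack as signed bytes into a flat binary blob (stand-in for weights_2.bin).
blob = struct.pack(f"{len(weights)}b", *weights)

# The firmware-side loader simply streams the bytes back in order,
# simulated here with an in-memory file instead of an SD card.
stream = io.BytesIO(blob)
restored = list(struct.unpack(f"{len(weights)}b", stream.read()))
print(restored == weights)  # True
```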

Since this is a multi-file project, we are not attaching the code snippets here in this page. All the code used for this project can be accessed at Project GitHub Repo.

Firmware Build Steps

Create a new Python 3.11.x virtual environment and install the required Python modules for training:

  • GitPython==3.1.43

  • PyGithub==2.3.0

  • PyYAML==6.0.1

  • matplotlib==3.9.1

  • numpy==1.26.4

  • onnx==1.16.1

  • pytest==8.3.2

  • rich==13.7.1

  • xxhash==3.4.1

  • batch-face==1.5.0

  • pytorch (CPU only)


  • Referring to the project structure in Project GitHub Repo, a db directory is to be created which contains individual directories for the images of people whom we wish to store as known individuals (relatives or friends of the old person) in the model. Around 5 images of each known individual are stored in the individual directories of that person.

  • The gen_db.sh shell file is used to train the pre-trained CNN FaceID model on new input data (that is, the images stored in the db directory). Once the training is done with the help of PyTorch (or ai8x-training), the weights of the model are quantized using the ai8x-synthesis tool to support the MAX78000 CNN hardware accelerator.

  • Running the shell file generates the new embeddings.h and weights_3.h file that are used in the next steps.

  • Since the weights of the 2nd model, which generates the 512-length encoding feature vector for the input image, are too large to be stored in the flash of the chip, they are stored in the root directory of an SD card as the weights_2.bin file. This file is present in the SDHC_weights directory in the GitHub repo.

  • The project is then built using the make command that generates the necessary elf file to run the project.

  • The SD card containing the weights_2.bin file in the root directory is first inserted into the SD card slot before powering and flashing (or programming) the chip.

  • Once the elf file is flashed, the program begins running on the chip.
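The gen_db.sh step above emits embeddings.h for the known individuals. As a hedged, hypothetical sketch of that kind of header generation (the helper and its output layout below are illustrative, not the actual tooling's format):

```python
def to_c_header(embeddings):
    """Render per-person embedding vectors as a C header snippet.

    `embeddings` maps a subject name to a toy-sized embedding vector;
    the real embeddings.h produced by the tooling will differ in layout.
    """
    lines = [f"#define NUM_SUBJECTS {len(embeddings)}"]
    for i, (person, vec) in enumerate(embeddings.items()):
        vals = ", ".join(str(v) for v in vec)
        lines.append(f"// subject {i}: {person}")
        lines.append(f"static const int8_t emb_{i}[] = {{{vals}}};")
    return "\n".join(lines)

# Two hypothetical subjects with 3-value embeddings (real ones are 512-long).
db = {"Emma_W": [12, -3, 44], "Leo_D": [-8, 90, 5]}
print(to_c_header(db))
```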

Execution Flow

The terminal will log the action that takes place.

First, the chip initialises by loading the 1st and 3rd CNN models' weights. The 2nd CNN model's weights are then loaded into the chip by mounting the SD card and reading weights_2.bin into the CNN accelerator.

After this, the face detection and identification loop starts. If a face is detected in the input image, the face identification function is executed; otherwise, the loop continues. The terminal logs are shown below:
MAX78000 Feather Facial Recognition Demo
I[main : 106] Initializing...
I[main : 180] Initializing SD Card...
I[sd : 554] Card inserted
I[sd : 86] SD card mounted.
I[sd : 566] SD Card Opened
I[main : 185] -----
I[facedetection: 148] Image Capture Time : 33ms
I[facedetection: 149] FaceDetect Process Time : 70ms
I[facedetection: 150] Total FaceDetect Time : 104ms
I[main : 192] Face detected!
I[faceid : 364] FaceID result: subject id: 5
I[faceid : 366] FaceID result: subject name: Emma_W
I[faceid : 96] FaceID Processing Time : 344ms
I[main : 202] ----- (Total loop time: 470ms)

An average total loop time of 470ms is achieved in case of both face detection and identification. Obtaining inferences from the CNN model in such a short compute time on a microcontroller is made possible by the CNN hardware accelerator, which not only enhances processing speed but also significantly reduces power consumption.
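For context, the logged timings translate into throughput as follows (simple arithmetic on the numbers from the log above):

```python
face_detect_ms = 104   # image capture + detection, from the log
face_id_ms = 344       # identification, from the log
total_loop_ms = 470    # full loop, including overhead

overhead_ms = total_loop_ms - face_detect_ms - face_id_ms
fps = 1000 / total_loop_ms
print(overhead_ms)     # 22 ms of loop overhead
print(round(fps, 2))   # 2.13 identifications per second
```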

Terminal Window Screenshots

On showing an image of Leonardo DiCaprio (one of our test images)

Terminal Window Screenshots Leonardo DiCaprio


On showing an image of Emma Watson

Terminal Window Screenshot Emma Watson

Future works

  • A fully working system with the LCD display and a custom UI, deployed in the field (the entrance of a home in this case).

  • LVGL support for MAX series microcontrollers is mentioned in the MSDK GitHub repository. We aim to use this graphics library for rendering images, text, and buttons, and for implementing user interactions (depending on the flash availability on the microcontroller and the time left).

  • Explore on-chip training for unknown faces (new entry in the model).


We would like to thank CircuitDigest and Digikey India Team for organising and supporting this project initiative.

For all the code, follow the GitHub link below:

Aided Visitor Verification System Code
Aided Visitor Verification System Code Zip File
