The DAIR Program is no longer accepting applications for cloud resources, but access to BoosterPacks and their resources remains available. BoosterPacks will be maintained and supported until January 17, 2025.

After January 17, 2025: 

  • Screenshots should remain accurate; however, where you are instructed to log in to your DAIR account in AWS, you will need to log in to a personal AWS account instead.
  • Links to AWS CloudFormation scripts for the automated deployment of each sample application should remain intact and functional. 
  • Links to GitHub repositories for downloading BoosterPack source code will remain valid as they are owned and sustained by the BoosterPack Builder (original creators of the open-source sample applications). 

Introduction

This Sample Solution, from the Time-Series AI Anomaly Detection BoosterPack, demonstrates how Chillwall AI uses deep learning models (Autoencoders) to solve the problem of anomaly detection.

The Time-Series AI Anomaly Detection BoosterPack demonstrates the use of advanced technologies to simplify a complex and costly big data solution that is often overlooked yet critical to business operations.

This material is informed by the solutions that we develop at Chillwall AI to bring the power of AI to everyday personal context predictions.

Problem Statement

In today’s ever-changing landscape of Big Data, anomaly detection plays a pivotal role in safeguarding business stability.

Anomaly Detection Matters

Do you use Big Data in your solution? Nearly every industry is reliant on data. Left undetected, data anomalies can be highly disruptive, affecting product safety, exposing cyber threats, causing financial losses, and undermining process optimization.

Business as Usual Challenge

Within the vast operational data that represents normal business lurk data anomalies: unexpected deviations, subtle shifts, or unusual occurrences. Here lies the potential to disrupt your everyday business.

Complex Solution – Often Overlooked by SMEs

Organizations use top-level solutions like security detection software, but early detection is best handled at the system or code level. Canadian SMEs often lack the expertise and development time to build complex anomaly detection algorithms (especially where machine learning and data science are not core competencies). This limits speed to market and rapid prototyping of innovative solutions. Purpose-built anomaly detection requires specialized data scientists, network engineers, and software developers, and can take six months or more to build.

Advanced Deep Learning Solution – At your Fingertips

This Sample Solution utilizes Artificial Neural Networks (Autoencoders) to identify abnormal patterns in time-series operational data. An autoencoder is employed for unsupervised learning: it compresses input data and reconstructs it with minimal error. The challenge lies in training the autoencoder on a representative set of normal data so that it captures the underlying structure. Once trained, the autoencoder reconstructs new data instances, and anomalies are identified by reconstruction errors that deviate from the norm. An ECG sample dataset is included as a reference for the data structure and a sample model run.
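
To make the approach concrete, the sketch below shows roughly how such an autoencoder can be built and scored with TensorFlow/Keras. The layer sizes, training settings, and function names are illustrative assumptions, not the exact model shipped with this BoosterPack.

# Illustrative autoencoder anomaly-detection sketch (layer sizes and names are assumptions)
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def build_autoencoder(n_features: int) -> tf.keras.Model:
    model = models.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Dense(32, activation="relu"),    # encoder compresses the signal
        layers.Dense(8, activation="relu"),     # bottleneck representation
        layers.Dense(32, activation="relu"),    # decoder reconstructs the signal
        layers.Dense(n_features, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="mae")
    return model

def fit_and_score(x_normal: np.ndarray, x_new: np.ndarray) -> np.ndarray:
    # Train only on normal records so the model learns the "normal" structure.
    autoencoder = build_autoencoder(x_normal.shape[1])
    autoencoder.fit(x_normal, x_normal, epochs=20, batch_size=128, validation_split=0.1)
    # Reconstruction error (MAE) per record is the anomaly score: higher = more anomalous.
    reconstructions = autoencoder.predict(x_new)
    return np.mean(np.abs(reconstructions - x_new), axis=1)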

This BoosterPack offers significant value to an SME:

  • Turn-key solution with limited technical knowledge required.
  • Virtual cloud environment set up for you.
  • Allows upload of raw unsupervised data sets.
  • Automated AI model trains on your data.
  • Low-cost fast solution. Runs within your DAIR budget.

Time-Series AI Anomaly Detection in the DAIR Cloud – Sample Solution

Sample Solution Overview Diagram

The diagram below illustrates the structure of the Sample Solution.

Component Descriptions

Significant components used in the Sample Solution are summarized in the table below:

This section guides you through the configuration and deployment of the application powered by artificial neural networks (ANNs).

How to Deploy and Configure

If you’re a DAIR account holder, you can deploy the Sample Solution by navigating to the Time-Series AI Anomaly Detection BoosterPack and following the instructions on its page for deploying a new instance. The solution will auto-configure your virtual environment.

1. Click Deploy, then sign in to your DAIR account to open the Anomaly Detection BoosterPack stack URL. Click Next on the Create stack page.

2. On the Specify stack details page, configure the parameters as shown below:

3. On the Configure stack options page, select Next.

4. On the Review and create page, select Submit.

5. The stack will be created and deployed as shown below; click the refresh button to check the updated stack status.

1. Once your stack is created successfully, click the Outputs tab to view your instance IP address under Value. Use this IP address to access your GPU instance via SSH.

2. After gaining access to your GPU instance, wait until the DAIR BoosterPack Installing Complete message shown below appears in your console. Press Enter to advance to the next step.

Sample code and a dataset are preloaded, delivering high model accuracy out of the box while letting you change parameters to familiarize yourself with the model. See the Technology Demonstration and Data Architecture sections below for more guidance on parameter settings and uploading your own dataset.

3. Run the following three commands to activate your virtual environment:

. /home/ubuntu/miniconda/etc/profile.d/conda.sh
conda init
conda activate anomaly_detection

This section guides you through a demonstration of the Time-Series AI Anomaly Detection BoosterPack. Its use of deep learning Artificial Neural Networks (Autoencoders) is compelling because it identifies operational anomalies at the lowest base level.

The demonstration will illustrate a sample use case as guidance for your operational modeling.

Electrocardiogram (ECG) Dataset Example

Understanding Your Dataset

The original dataset is a 20-hour-long ECG downloaded from PhysioNet: the BIDMC Congestive Heart Failure Database (chfdb), record “chf07”. From this dataset, we use 5,000 electrocardiograms. Columns 0-139 contain the ECG time-series data points for a particular patient, and each example has been labeled as either abnormal rhythm or normal rhythm.

In this anomaly detection Sample Solution, we demonstrate the transformative capabilities of Autoencoder deep learning models. Through sophisticated encoding and decoding mechanisms, our system learns intricate patterns from ECG time-series data, enabling it to find irregular heart rates, heartbeats, and rhythms. See the ECG sample dataset as a reference.
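
For orientation, a minimal look at the expected data layout might resemble the following. The file name ecg.csv is a placeholder, and the label convention (1 = normal, 0 = abnormal) follows the output format described later in this document.

# Hedged illustration of loading and splitting the ECG sample data (file name is a placeholder)
import pandas as pd

df = pd.read_csv("ecg.csv", header=None)    # 141 columns: 140 ECG readings + 1 label
features = df.iloc[:, :-1].to_numpy()       # columns 0-139: ECG time-series points
labels = df.iloc[:, -1].to_numpy()          # last column: 1 = normal, 0 = abnormal

normal_records = features[labels == 1]      # the autoencoder trains on normal data only
abnormal_records = features[labels == 0]    # held out to evaluate anomaly detection
print(features.shape, normal_records.shape, abnormal_records.shape)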

1. Model Training: To train the model, run the following two commands:

cd anomaly_detection/src/
python3 train_model.py

2. Setting a Threshold: After training the Autoencoder model on the training dataset, we calculate the reconstruction loss for every ECG record and draw a histogram. The graph is saved at /reports/training_MAE_loss.jpg.

Although some abnormal rhythms seem to fool the Autoencoder, their data exhibits a discernible feature that distinguishes it from normal data. By setting a threshold that separates normal from abnormal behavior patterns, we can control the rate of false positives while still capturing most anomalous data points. The default value is 0.3 for the ECG dataset; if you use your own dataset, you need to find a new threshold value and update it in the scripts.
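
As a hedged illustration of how such a threshold could be derived from the training reconstruction losses (the shipped scripts simply use 0.3 as the default for the ECG data), a percentile rule looks like the sketch below; the function name, output path, and percentile value are assumptions, not the repository's exact code.

# Illustrative threshold selection from training reconstruction losses
import numpy as np
import matplotlib.pyplot as plt

def choose_threshold(x_train, reconstructed_train, percentile=99.0):
    # MAE reconstruction loss per training record
    train_mae_loss = np.mean(np.abs(reconstructed_train - x_train), axis=1)
    # Histogram comparable to the saved training_MAE_loss.jpg figure
    plt.hist(train_mae_loss, bins=50)
    plt.xlabel("Training MAE loss")
    plt.ylabel("Number of ECG records")
    plt.savefig("training_MAE_loss.jpg")
    # Flag the top (100 - percentile)% of training losses as anomalous
    return float(np.percentile(train_mae_loss, percentile))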

3. Model Accuracy Test: After setting the threshold, we test model accuracy using the test dataset; you will see the report of accuracy, precision, and recall shown below.

Model Accuracy Report

Accuracy: Accuracy shows how often a classification ML model is correct overall.
Precision: Precision is the ratio between the True Positives and all predicted Positives. For our problem statement, that is the measure of patients we correctly identify as having heart disease out of all the patients the model flags as having it.
Recall: Recall is the measure of our model correctly identifying True Positives. Thus, for all the patients who actually have heart disease, recall tells us how many we correctly identified as having heart disease.
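
The sketch below shows how these three numbers relate to the thresholded reconstruction errors. It treats abnormal rhythm as the positive class and is an illustration under that assumption, not the exact evaluation code in the repository.

# Hedged metric computation from reconstruction errors and a threshold
import numpy as np

def evaluate(test_mae_loss, y_true, threshold):
    # A record is flagged abnormal when its reconstruction error exceeds the threshold.
    pred_abnormal = test_mae_loss > threshold
    true_abnormal = (y_true == 0)            # labels: 1 = normal, 0 = abnormal
    tp = np.sum(pred_abnormal & true_abnormal)
    fp = np.sum(pred_abnormal & ~true_abnormal)
    fn = np.sum(~pred_abnormal & true_abnormal)
    accuracy = np.mean(pred_abnormal == true_abnormal)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall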

4. Run Anomaly Model Inference: After the model is built and selected as the final model, run the following command for model inference:

python3 predict_model.py

Result

Your results will be generated and saved at /result/predicting.csv.

1. After finishing the sample demonstration, navigate to Services > Management & Governance > CloudFormation.

2. Select the check box for your stack and click Delete. The stack, EC2 instance, and security group will then be deleted automatically from your account.

Factors to Consider

The Anomaly Detection Sample Solution BoosterPack utilizes a cost-effective g4dn.2xlarge GPU instance, which meets the needs of most use case scenarios.

Top Reasons for Choosing Artificial Neural Networks (Autoencoders)

One of the key technology decisions in this solution was the use of Autoencoders, which are well suited to the anomaly detection task.

Alternative Solutions

If you have a small dataset, the anomalies are not complex to identify, or your use case depends heavily on model interpretability, consider these alternative solutions:

Sample Solution Data Format Guidelines

Data Upload – Sample Dataset Preloaded

The ECG dataset saved in the Sample Solution GitHub repo is downloaded automatically after the stack/instance is created, giving first-time users an opportunity to learn and test the model.

Data Upload – New Datasets

Users can send their own datasets to the instance using the SCP command. For a step-by-step guide, please refer to Transferring Files between your laptop and Amazon instance. It is important to use the same file name as the current file so that it replaces the old dataset.

Input Data Format

Sample Dataset

The ECG dataset is in CSV format. It has 140 columns that represent the ECG readings, plus a label column encoded as 0 or 1 to show whether the ECG is abnormal or normal. This is a sample of the data format to use as a guide for new datasets.
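
A quick, hedged sanity check that a replacement CSV follows this layout (numeric feature columns followed by a 0/1 label column) could look like the following; the file name is a placeholder.

# Hedged format check for a replacement dataset (file name is a placeholder)
import pandas as pd

df = pd.read_csv("my_dataset.csv", header=None)
assert set(df.iloc[:, -1].unique()) <= {0, 1}, "label column must be encoded as 0 or 1"
assert df.dtypes.iloc[:-1].map(pd.api.types.is_numeric_dtype).all(), "feature columns must be numeric"
print(f"{len(df)} records, {df.shape[1] - 1} feature columns plus 1 label column")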

New Dataset

If you use your own dataset, these are the format requirements:

Input Data Quality Check

The quality of the prediction is linked to the quality of the training dataset, making dataset quality one of the most important factors in machine learning. When you select your own dataset, try to avoid the following:

Output Data Format

The output dataset format is CSV with a header and an index: the first column is the index, and the second column is the predicted result, encoded as 0 or 1 to show whether the ECG is abnormal (0) or normal (1).
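
For example, the output could be loaded and summarized as follows; the column position of the prediction is assumed from the description above.

# Hedged example of reading the generated predictions
import pandas as pd

results = pd.read_csv("/result/predicting.csv", index_col=0)
predictions = results.iloc[:, 0]             # 1 = normal, 0 = abnormal
print(predictions.value_counts())            # count of normal vs. abnormal records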

The Autoencoder-based anomaly detection solution is designed as a basic framework to speed up the development stage of your solution. The following security concerns and mitigating controls should be considered:

Concern: Data transmitted to and from EC2 instances could be intercepted, exposing sensitive information.
Mitigation: Use SSH for secured data transmission.

Concern: Unauthorized access to EC2 instances could lead to data breaches or tampering with the system.
Mitigation: Implement AWS Identity and Access Management (IAM) to control access to EC2 instances, ensuring only authorized users and services have access. Utilize security groups to restrict network traffic.

Concern: AWS login passwords.
Mitigation: Always change default passwords using your own strong passwords.

Not applicable to this BoosterPack.

Scaling in Anomaly Detection refers to the ability of the technology to handle increasing volumes of data and maintain its performance as the size of the dataset grows. Utilizing parallel processing techniques, such as multi-threading or multi-processing, can help improve the scalability of anomaly detection systems by leveraging multiple computing resources simultaneously. We utilize TensorFlow and CUDA cores for multiprocessing when performing operations for model training. CUDA is a parallel computing platform and application programming interface (API) model, which allows TensorFlow to efficiently utilize GPU resources for parallel computation. If multi-threading or multi-processing are needed for data pre-processing, please refer to Thread-based parallelism and Process-based parallelism for more details.
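
Before training, you can confirm that TensorFlow actually sees the instance GPU with the standard device-listing call; if the list is empty, training falls back to the CPU.

# Check GPU visibility from within the activated environment
import tensorflow as tf

print("TensorFlow version:", tf.__version__)
print("GPUs visible:", tf.config.list_physical_devices("GPU"))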

Our project deploys on a DAIR AWS EC2 instance. EC2 instances are deployed across multiple physical servers within an Availability Zone (AZ), so if one server fails, the instance can be quickly migrated to another healthy server. Regularly back up data to mitigate potential data loss.

Not applicable to this BoosterPack.

Not applicable to this BoosterPack.

This BoosterPack is a low-cost solution designed to be run many times per month within your DAIR account budget. The application deploys on an On-Demand Linux g4dn.2xlarge instance priced at USD 0.835/hour.

All code written by Chillwall AI is licensed under the MIT license.

Source code for the Sample Solution can be found in the Chillwall AI repository at: https://github.com/Chillwall/anomaly_detection.git

Glossary

The following terminology, as defined below, is used throughout this document.

API: Application Programming Interface
DAIR: Digital Accelerator for Innovation and Research. Where used in this document without a version qualifier, DAIR should be understood to mean the DAIR BoosterPack Pilot environment (a hybrid cloud comprised of public and private clouds, vs. the legacy DAIR private cloud service).
Deep Learning: Machine learning with artificial neural networks.
ECG: Electrocardiogram
GPU: Graphics Processing Unit
Hyper-parameter: A parameter set prior to model training, as opposed to those derived during training.
Instance: A virtual machine running in the cloud environment.
LSTM: Long Short-Term Memory (network)
Machine Learning: A framework for building models without explicit programming.
Overfitting: The production of an analysis that corresponds too closely or exactly to a particular set of data and may therefore fail to fit additional data or predict future observations reliably.
Time-Series: A sequence of data taken at successive, equally spaced time intervals.