Flight Plan: TensorRT
TensorRT is NVIDIA's platform for deep learning inference. It includes a deep learning inference optimizer and runtime that deliver low latency and high throughput for deep learning models. Models can be developed and trained in any of several deep learning frameworks (such as TensorFlow) and then, with TensorRT, optimized, calibrated for lower precision, and deployed to production. TensorRT is built on CUDA, NVIDIA's parallel programming model, allowing you to maximize GPU utilization.
Resources
The table below provides a non-comprehensive list of links to useful introductory resources.
Tutorials
The table below provides a non-comprehensive list of links to tutorials the author has found to be most useful.
Documentation
Please see the table below for a set of documentation resources for TensorRT.
Support
Support resources for TensorRT are described in the support section of the documentation.
Best Practices
A comprehensive list of best practices can be found in the “Best Practices” section of the official TensorRT documentation.
Tips and Traps
Versions
Each version of TensorRT is compatible only with specific versions of CUDA and TensorFlow, and each CUDA version in turn requires specific GPU driver versions. Consult the TensorRT Compatibility Matrix and the CUDA Toolkit release notes to avoid incompatibility issues. NVIDIA also provides Docker containers pre-packaged with compatible versions. See the Movie Recommender: Sample Solution page for example installation instructions.
Python vs C++ API
TensorRT provides both Python and C++ APIs. Much of the early documentation and many of the tutorials were written for C++. If you have trouble finding Python tutorials or support online, it can be worth investigating the C++ solutions, as the two APIs are largely equivalent and straightforward to translate.
Tech Spotlight: Multilayer Perceptron for Collaborative Filtering
Collaborative Filtering is a widely used approach to building recommender systems. Collaborative filtering methods draw on users' behaviours, activities, or preferences to predict what a user will like based on their similarity to other users.
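The idea can be illustrated with a minimal user-based collaborative filtering sketch: score items for a user as a similarity-weighted sum of other users' interactions. The interaction matrix and function names below are purely illustrative, not from the paper or TensorRT.

```python
import numpy as np

# Toy user-item interaction matrix (rows: users, columns: movies).
# 1 = watched, 0 = no interaction. All data here is made up.
ratings = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
])

def cosine_similarity(a, b):
    """Cosine similarity between two interaction vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

def predict_scores(user_idx, ratings):
    """Score each item for a user as a similarity-weighted sum of
    the other users' interactions (user-based collaborative filtering)."""
    other_idx = [i for i in range(ratings.shape[0]) if i != user_idx]
    sims = np.array([
        cosine_similarity(ratings[user_idx], ratings[i]) for i in other_idx
    ])
    others = ratings[other_idx]
    return sims @ others / (sims.sum() + 1e-9)

scores = predict_scores(0, ratings)
```

Here user 0 is most similar to user 1, so movie 2 (watched by user 1) scores higher for user 0 than movie 3 (watched only by the dissimilar user 2).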
A Multilayer Perceptron is a type of neural network consisting of an input layer that receives the data, an output layer that makes a prediction given the input, and, in between, an arbitrary number of hidden layers. Each hidden layer applies a non-linear transformation, and composing these transformations allows the network to learn complex functions.
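The forward pass of such a network can be sketched in a few lines of NumPy. The layer sizes, activations, and random weights below are illustrative assumptions, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    """Non-linearity applied by each hidden layer."""
    return np.maximum(0.0, x)

def sigmoid(x):
    """Squashes the output into (0, 1), a probability-like score."""
    return 1.0 / (1.0 + np.exp(-x))

# Layer sizes: 8-dim input, two hidden layers, scalar output (assumed).
sizes = [8, 16, 8, 1]
weights = [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def mlp_forward(x):
    """Forward pass: hidden layers use ReLU; the output layer uses a
    sigmoid, matching a binary-classification setup."""
    h = x
    for w, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ w + b)
    return sigmoid(h @ weights[-1] + biases[-1])

y = mlp_forward(rng.normal(size=8))
```

In practice the weights would be learned by gradient descent in a framework such as TensorFlow, and the trained model could then be optimized with TensorRT for inference.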
In the paper "Neural Collaborative Filtering" (He et al., 2017), the authors propose a deep learning framework for collaborative filtering. One of the models the authors evaluate is a Multilayer Perceptron. The task is framed as a binary classification problem: the model is trained using movies watched and rated by users as positive examples, and sampled unwatched movies as negative examples.
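Constructing that training set amounts to pairing each watched movie with a few sampled unwatched ones. A minimal sketch, with made-up user and movie data (the ratio of negatives per positive is a tunable assumption):

```python
import random

random.seed(42)

# Hypothetical implicit-feedback data: movie IDs each user has watched.
watched = {
    "alice": {0, 3, 5},
    "bob": {1, 2, 5},
}
all_movies = set(range(6))

def build_examples(watched, all_movies, negatives_per_positive=2):
    """Label each (user, watched movie) pair as a positive example and
    sample unwatched movies for the same user as negative examples."""
    examples = []
    for user, movies in watched.items():
        unwatched = list(all_movies - movies)
        for movie in movies:
            examples.append((user, movie, 1))  # positive example
            for neg in random.sample(unwatched, negatives_per_positive):
                examples.append((user, neg, 0))  # negative example
    return examples

examples = build_examples(watched, all_movies)
```

Each resulting (user, movie, label) triple can then be fed to the classifier, with user and movie IDs typically mapped to embedding vectors first.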
Resources
Please see the sections below for resources to learn more about multilayer perceptrons for collaborative filtering.
Tutorials
The table below provides a non-comprehensive list of links to tutorials the author has found to be most useful.
Documentation
Please see the table below for a set of documentation resources.
Best Practices
General best practices for machine learning projects apply. There are many online resources on this topic, such as: