The DAIR Program is no longer accepting applications for cloud resources, but access to BoosterPacks and their resources remains available. BoosterPacks will be maintained and supported until January 17, 2025. 

After January 17, 2025: 

  • Screenshots should remain accurate; however, where you are instructed to log in to your DAIR account in AWS, you will need to log in to a personal AWS account. 
  • Links to AWS CloudFormation scripts for the automated deployment of each sample application should remain intact and functional. 
  • Links to GitHub repositories for downloading BoosterPack source code will remain valid as they are owned and sustained by the BoosterPack Builder (original creators of the open-source sample applications). 

The Geospatial-AI Information Toolbox (GAIT)

A Cloud-based Platform for Geospatial Intelligence with Machine Learning

 

Provided by: Ecosystem Informatics (ESI)

Ready for takeoff?

This Flight Plan is divided into two parts: Geospatial Intelligence and Machine Learning. The Sample Solution will detail how we combine these technologies in the DAIR cloud.

Part 1
  • Geospatial Intelligence Overview
  • Best Practices
  • Tips and Traps
  • Resources
Part 2
  • Machine Learning Overview
  • Best Practices
  • Tips and Traps
  • Resources

Geospatial Intelligence Overview

What is geospatial intelligence?

Geospatial intelligence is a broad field that combines geospatial data with other types of data from sources like social, political, and environmental sciences. 

The Intelligence Community defines geospatial intelligence as:

“…the use and analysis of geospatial information to assess geographically referenced activities on Earth.”

While often associated with national defense, geospatial intelligence is increasingly being leveraged by civilian and private sector organizations in the telecommunications, transportation, public health and safety, and real estate industries. This data helps them improve their products and services and better serve their customers.

The basic principle is to organize and combine all available data around its geographical location on Earth and then use it to prepare products that can be used by planners, emergency responders, and decision makers.

In this BoosterPack, you will deploy an application that uses machine learning algorithms to analyze geospatial data and its relationship with a target variable (the entity we want to predict). This data is overlaid on a map to effectively visualize patterns and relationships within a set of geospatial data.

What value has it added to my business?

The merging of cloud computing with machine learning and satellite imagery has allowed us to gather more accurate global insights about everything from extreme weather and sea-level rise to water and air pollution. We collect and analyze information by harnessing geospatial data and then – using machine learning algorithms – detect trends, patterns, and changes. This helps us translate environmental data into easy-to-digest information for potential clients in sectors like agriculture, health care, insurance, and government.

Best Practices

The data structure for this project requires three important dimensions:

In most data frames with all three features, the spatial dimension typically takes the form of addresses, which must be converted into longitude and latitude coordinates that can be overlaid on a map. Several APIs are used to facilitate this conversion process and to work with data that does not have associated longitude and latitude values.
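
As a minimal sketch of the conversion step described above, the following Python shows how address records might be given latitude and longitude fields. The `fake_geocode` function, the `FAKE_GAZETTEER` lookup table, the approximate coordinates, and the record field names are all hypothetical stand-ins for whichever geocoding API you use; they are not part of GAIT.

```python
from typing import Callable, Dict, List, Optional, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude) in decimal degrees

# Hypothetical lookup table standing in for a real geocoding API.
# The coordinates are approximate and for illustration only.
FAKE_GAZETTEER: Dict[str, Coordinates] = {
    "111 wellington st, ottawa, on": (45.4236, -75.7009),
    "100 queen st w, toronto, on": (43.6534, -79.3841),
}


def fake_geocode(address: str) -> Optional[Coordinates]:
    """Return (lat, lon) for a known address, or None if it is unknown."""
    return FAKE_GAZETTEER.get(address.strip().lower())


def add_coordinates(
    records: List[dict],
    geocode: Callable[[str], Optional[Coordinates]] = fake_geocode,
) -> List[dict]:
    """Return copies of the records with 'lat' and 'lon' keys added.

    Records whose address cannot be resolved get None for both keys,
    so they can be reviewed or dropped before mapping.
    """
    enriched = []
    for record in records:
        coords = geocode(record["address"])
        lat, lon = coords if coords is not None else (None, None)
        enriched.append({**record, "lat": lat, "lon": lon})
    return enriched
```

In practice you would pass your own function as `geocode`, wrapping the API of your choice; keeping it as a parameter makes the conversion step easy to test without network access.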

See these resources for an overview of spatio-temporal data and geocoding for data analysis.

Tips and Traps

The output of a geospatial analysis (with spatio-temporal data) should show change over space and time, and shouldn’t be confused with a time-series model or analysis. Cluster analysis of several points within a spatial region may produce spurious output if the data is treated as a purely time-series dataset. Treating the data this way can also limit the accurate prediction of variables that depend on other variables which are significantly affected by their spatial attributes.

For example, real estate costs are dependent on many factors. Graphical plots showing price changes over time provide a one-dimensional view of what is happening temporally. However, they give little understanding of how other factors may be contributing to this change. Spatial elements, like proximity to public transportation, schools, and law-enforcement presence (patrol routes, police stations, or effective neighborhood watch organizations), can influence the price of homes in comparison with other neighborhoods. Factoring all or some of these elements (both spatial and temporal) into the analysis process greatly improves home price predictions over time.
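
To make the real estate example concrete, here is a small sketch of how one spatial feature, the distance to the nearest transit stop, could be computed and attached to a time-stamped sale record. The haversine formula is standard; the field names (`lat`, `lon`, `transit_km`) and the helper functions are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_stop_km(lat, lon, stops):
    """Distance from a point to the closest stop in a list of (lat, lon)."""
    return min(haversine_km(lat, lon, s_lat, s_lon) for s_lat, s_lon in stops)


def add_spatial_feature(sale, stops):
    """Return a copy of a sale record with a 'transit_km' feature added."""
    distance = nearest_stop_km(sale["lat"], sale["lon"], stops)
    return {**sale, "transit_km": distance}
```

A model trained on records like these sees both the sale date and the spatial feature, rather than the price-over-time view alone.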

Resources

Tutorials

The table below provides a non-comprehensive list of tutorials the author found most useful.


Machine Learning Overview

What is Machine Learning?

Machine learning is a branch of artificial intelligence (AI) and computer science which focuses on the use of data and algorithms to imitate the way that humans learn, thus gradually improving its accuracy. Machine learning is a vital component in the growing field of data science.

What value has it added to my business?

Using statistical methods, algorithms are trained to make classifications or predictions, uncovering key insights within data mining projects. These insights subsequently drive decision making within applications and businesses, ideally impacting key growth metrics.

The convergence of cloud computing with machine learning and satellite imagery means there’s now a better way to gather global insights about everything from extreme weather and sea-level rise to water and air pollution. Collecting and analyzing information by harnessing geospatial data, and then using machine learning algorithms to detect trends, patterns, and changes, helps our business translate environmental data into easy-to-digest information for potential clients in sectors like agriculture, health care, insurance, and government.

Why choose machine learning over the alternatives?

We chose machine learning to leverage the power of advanced pattern recognition, which leads to highly accurate results and the ability to predict future trends.

Machine learning models can be built to be highly modular and scalable, enabling deployment at scale.

Best Practices

Technology infrastructure has multiple roles when it comes to machine learning applications. One of the major tasks is to define how we gather, process, and receive new data. After that, we need to decide how we train our models and version them. Finally, we must consider how to deploy the model in a production environment. In all these tasks, infrastructure plays a crucial role. You will probably spend more time working on the infrastructure of your system than on the machine learning model itself.
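
One lightweight way to approach the "train and version" step is to derive a version identifier from the content of the trained model's parameters, so that identical models always receive the same identifier. The sketch below illustrates that idea with the Python standard library only. The `params` dictionary, the metadata fields, and the file layout are hypothetical and do not describe how GAIT versions its models.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional


def model_version(params: dict) -> str:
    """Derive a short, deterministic version id from model parameters."""
    # Sorting keys makes the id independent of dictionary ordering.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def save_model(params: dict, registry_dir: str,
               metadata: Optional[dict] = None) -> Path:
    """Write parameters and metadata to '<registry_dir>/model-<id>.json'."""
    version = model_version(params)
    path = Path(registry_dir) / f"model-{version}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {"version": version, "params": params,
               "metadata": metadata or {}}
    path.write_text(json.dumps(payload, indent=2, sort_keys=True),
                    encoding="utf-8")
    return path


def load_model(path) -> dict:
    """Load a saved model and check it against its recorded version id."""
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    if model_version(payload["params"]) != payload["version"]:
        raise ValueError("stored parameters do not match recorded version")
    return payload
```

A dedicated model registry offers far more than this, but even a simple content-derived identifier makes it possible to tell which model produced a given prediction once several versions are deployed.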

Microservices architecture can help you achieve modularity and scalability. Using technologies like Docker and Kubernetes, you should be able to encapsulate separate parts of the system. This way you can make incremental improvements in each of them and replace each component independently as necessary. Also, scaling with Kubernetes is typically a low-effort process.

Tips and Traps

When choosing an algorithm, always consider accuracy, training time, and ease of use. Many users put accuracy first, while beginners tend to focus on algorithms they know best.

When presented with a dataset, first consider how to obtain results, no matter what those results might look like. Beginners tend to choose algorithms that are easy to implement and can obtain results quickly; this is fine when it is just the first step in the process. Once you obtain results and become familiar with the data, you may spend more time using more sophisticated algorithms to strengthen your understanding of the data and further improve the results.

The best algorithms may not have the highest reported accuracy. An algorithm usually requires careful tuning and extensive training to obtain its best performance.
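
The trade-off between accuracy and training time can be measured rather than guessed. The sketch below, using only the Python standard library, fits a trivial mean-value baseline and a one-variable least-squares line on the same data and reports the error and fit time of each. The function names and the choice of these two models are illustrative assumptions, not the algorithms used in the Sample Solution.

```python
import time
from statistics import mean


def fit_mean(xs, ys):
    """Baseline: always predict the average of the training targets."""
    avg = mean(ys)
    return lambda x: avg


def fit_line(xs, ys):
    """Ordinary least squares for a single input variable."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def mean_absolute_error(model, xs, ys):
    """Average absolute difference between predictions and targets."""
    return mean(abs(model(x) - y) for x, y in zip(xs, ys))


def compare(fitters, train, test):
    """Fit each model on train and report held-out error and fit time."""
    results = {}
    for name, fit in fitters.items():
        start = time.perf_counter()
        model = fit(*train)
        elapsed = time.perf_counter() - start
        results[name] = {
            "mae": mean_absolute_error(model, *test),
            "fit_seconds": elapsed,
        }
    return results
```

For example, `compare({"mean": fit_mean, "line": fit_line}, train, test)`, where `train` and `test` are each an `(xs, ys)` pair, gives a quick first result with the baseline and shows whether the extra effort of a more sophisticated model pays off on held-out data.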

Resources

Documentation

The table below includes documentation resources for machine learning and cloud platform building and deployment.

Got it? Now let us show you how we deployed it on the DAIR Cloud…

There is a substantial need for an open-source platform that combines a NoSQL database application, its interface, and AI capabilities, that is automated and quick to deploy in the cloud, and that is compatible with geospatial data. This need is general and relevant to several business types and industries. Such a platform enables users to record, edit, browse, and query spatio-temporal data to answer questions of where, when, how, what, and who, for informed decision-making.

This Sample Solution demonstrates how a modular and scalable cloud-based platform can be built to leverage the power of geospatial intelligence and machine learning. Once deployed, the basic infrastructure platform can be used as a foundation to build and deploy similar cloud-based platforms at scale.