July 11, 2023

Esri partners with data and AI company Databricks

As a result of the new partnership, data scientists now have access to spatial analytics tools from Esri’s ArcGIS software running natively within Databricks’ big data platform.

Databricks provides the Databricks Lakehouse Platform, a big data platform that unifies data engineering, data science, machine learning, and analytics. Spatial analytics is often added to the mix to put large amounts of data in proper context and give data scientists more valuable insights. The new partnership brings the advanced spatial analytics capabilities of Esri’s ArcGIS software to the Databricks Lakehouse Platform, so users can perform spatial analytics at scale.

How it works

More specifically, two Esri offerings integrate with the Databricks Lakehouse Platform: ArcGIS GeoAnalytics Engine and the Big Data Toolkit. Both were built specifically to let users perform spatial analysis on big datasets. Because both integrate directly into the Databricks Lakehouse Platform, users do not need to switch between applications or move their data from one environment to another.

Instead, users can install GeoAnalytics Engine and run spatial SQL functions and analysis tools on a Spark cluster managed by Databricks. Because GeoAnalytics Engine extends PySpark (the Python API for Apache Spark, one of the open-source projects that underpin Databricks), users can spatially enable their data wherever it lives and run spatial analysis workflows alongside other data science and machine learning tools in a Databricks notebook. The way the Esri Big Data Toolkit integrates with Databricks is described below.

Extending Apache Spark with ArcGIS GeoAnalytics Engine

ArcGIS GeoAnalytics Engine is a cloud-native library that extends Apache Spark with geoanalytics tools and functions, providing spatial analysis on top of big data. Esri’s documentation describes these as ready-to-use SQL functions and analysis tools. The product consists of a Spark plugin and a Python library that contains the analysis tools. SQL is a language for writing queries against databases. Apache Spark is an open-source unified analytics engine for large-scale data processing. The Apache Spark website describes it as a “multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters”.

GeoAnalytics Engine can be installed on Databricks in Azure, AWS, or Google Cloud Platform, and it is also available on Amazon EMR and Azure Synapse Analytics. Esri provides extensive online documentation on installing GeoAnalytics Engine on each of these cloud offerings. The documentation also covers installing and setting it up on a personal computer, a standalone Spark cluster, or a managed Spark service in the cloud. Note that GeoAnalytics Engine functions and tools operate on vector geometry data only: points, lines, polygons, multipoints, and generic vector geometries.

Big Data Toolkit in Azure Databricks

Esri’s Big Data Toolkit (BDT) allows customers to aggregate, analyze, and enrich big data within their existing big data environment. It is delivered as a term subscription and includes a set of spatial analysis and data interoperability tools. These tools help data scientists and analysts enhance their big data analytics with spatial capabilities that take advantage of the massive computing capacity they already have. BDT is an Esri Professional Services solution. Its documentation includes a Setup section that explains how to install and set up the Esri Big Data Toolkit in Azure Databricks.
