Databricks for Profiles Sync

Unify requires a Business tier account and is included with Engage.
See the available plans, or contact Support.

With Databricks for Profiles Sync, you can use Profiles Sync to sync Segment profiles into your Databricks Lakehouse.

Getting started

Before getting started with Databricks Profiles Sync, note the following prerequisites for setup.

  • The target Databricks workspace must be Unity Catalog enabled. Segment doesn’t support the Hive metastore. See the Databricks guide on enabling Unity Catalog for more information.
  • Segment creates managed tables in the Unity Catalog.
  • Segment supports only OAuth (M2M) for authentication.

Warehouse size and performance

A SQL warehouse is required for compute. Segment recommends a warehouse with the following characteristics:

  • Size: small
  • Type: Serverless, otherwise Pro
  • Clusters: minimum of 2, maximum of 6

To improve Delta Lake query performance, Segment recommends creating a compaction job for each table using OPTIMIZE, following Databricks recommendations.
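
As a rough illustration of per-table compaction, the sketch below builds one OPTIMIZE statement for each table using three-level Unity Catalog names. The catalog, schema, and table names are hypothetical placeholders, and the function name is invented for this example. How you schedule and run the statements (for example, as a Databricks job) is left to your own setup.

```python
def optimize_statements(catalog: str, schema: str, tables: list[str]) -> list[str]:
    """Build one OPTIMIZE statement per table using catalog.schema.table names."""

    def quote(identifier: str) -> str:
        # Backtick-quote each identifier, doubling any embedded backticks.
        return "`" + identifier.replace("`", "``") + "`"

    return [
        f"OPTIMIZE {quote(catalog)}.{quote(schema)}.{quote(table)}"
        for table in tables
    ]


# Hypothetical names: substitute your own catalog, schema, and synced tables.
statements = optimize_statements("segment", "profiles_sync", ["table_a", "table_b"])
for statement in statements:
    print(statement)
```

Each generated statement can then be run against the SQL warehouse on whatever schedule suits your data volume.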

Segment recommends manually starting your SQL warehouse before setting up your Databricks destination. If the SQL warehouse isn’t running, Segment attempts to start it to validate the connection, which can cause a timeout when you click the Test Connection button during setup.

Set up Databricks for Profiles Sync

  1. From your Segment app, navigate to Unify > Profiles Sync.
  2. Click Add Warehouse.
  3. Select Databricks as your warehouse type.
  4. Use the following steps to connect your warehouse.

Connect your Databricks warehouse

Use the five steps below to connect to your Databricks warehouse.

To configure your warehouse, you’ll need read and write permissions.

Step 1: Name your schema

Pick a name to help you identify this space in the warehouse, or use the default name provided. You can’t change this name once the warehouse is connected.

Step 2: Enter the Databricks compute resources URL

Segment uses your Databricks workspace URL to access your workspace API.

Check your browser’s address bar when inside the workspace. The workspace URL should resemble: https://<workspace-deployment-name>.cloud.databricks.com. Remove any characters after this portion and note the URL for later use.
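
The trimming described above can be sketched in a few lines of Python. This is only an illustration: the function name and the example address are hypothetical, and the function keeps just the scheme and host without checking that the host is a real Databricks deployment.

```python
from urllib.parse import urlsplit


def normalize_workspace_url(address_bar_url: str) -> str:
    """Keep only the scheme and host of a URL copied from the browser."""
    parts = urlsplit(address_bar_url.strip())
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError(f"Expected an https workspace URL, got: {address_bar_url!r}")
    # Drop the path, query string, and fragment that follow the host.
    return f"https://{parts.hostname}"


# Hypothetical address copied from the browser while inside a workspace.
copied = "https://example-workspace.cloud.databricks.com/sql/warehouses?o=12345#details"
workspace_url = normalize_workspace_url(copied)
print(workspace_url)
```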

Step 3: Enter a Unity catalog name

This catalog is the target catalog where Segment lands your schemas and tables.

  1. Follow the Databricks guide for creating a catalog. Be sure to select the storage location created earlier. You can use any valid catalog name (for example, “Segment”). Note this name for later use.
  2. Select the catalog you’ve just created.
    1. Select the Permissions tab, then click Grant.
    2. Select the Segment service principal from the dropdown, and check ALL PRIVILEGES.
    3. Click Grant.
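
If you prefer SQL to the Permissions tab, the grant above can likely be expressed as a Databricks GRANT statement. The sketch below only builds the statement text. It assumes the service principal is referenced by its application ID, and both the catalog name and the all-zero ID are placeholders.

```python
def grant_all_privileges(catalog: str, principal: str) -> str:
    """Build a GRANT statement giving a principal all privileges on a catalog."""

    def quote(identifier: str) -> str:
        # Backtick-quote each identifier, doubling any embedded backticks.
        return "`" + identifier.replace("`", "``") + "`"

    return f"GRANT ALL PRIVILEGES ON CATALOG {quote(catalog)} TO {quote(principal)}"


# Placeholder values: use your own catalog name and the service principal's
# application ID (assumed here to be how the principal is identified).
grant = grant_all_privileges("segment", "00000000-0000-0000-0000-000000000000")
print(grant)
```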

Step 4: Add the SQL warehouse details from your Databricks warehouse

Next, add SQL warehouse details about your compute resource.

  • HTTP Path: The connection details for your SQL warehouse.
  • Port: The port number of your SQL warehouse.

Step 5: Add the service principal client ID and client secret

Segment uses the service principal to access your Databricks workspace and associated APIs.

Service principal client ID: Follow the Databricks guide for adding a service principal to your account. The service principal’s name can be anything, but Segment recommends one that identifies its purpose (for example, “Segment Profiles Sync”). Segment doesn’t require the Account admin or Marketplace admin roles.

The service principal needs the following setup:

Client secret: Follow the Databricks instructions to generate an OAuth secret.

Once you’ve configured your warehouse, test the connection and click Next.
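
To check a client ID and secret outside of Segment, you can exercise the OAuth machine-to-machine (client credentials) flow yourself. The sketch below builds, but does not send, a token request against the workspace-level `/oidc/v1/token` endpoint with the `all-apis` scope, as described in Databricks’ OAuth M2M documentation. Whether Segment performs the exchange in exactly this way is an assumption, and the credentials shown are placeholders.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(workspace_url: str, client_id: str, client_secret: str) -> Request:
    """Build (without sending) an OAuth client-credentials token request."""
    # The client ID and secret are passed as HTTP Basic credentials.
    credentials = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urlencode({"grant_type": "client_credentials", "scope": "all-apis"}).encode()
    return Request(
        url=f"{workspace_url.rstrip('/')}/oidc/v1/token",
        data=body,
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


# Placeholder values: never hard-code a real secret.
token_request = build_token_request(
    "https://example-workspace.cloud.databricks.com",
    "example-client-id",
    "example-client-secret",
)
print(token_request.get_method(), token_request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the exchange. A successful JSON response containing an access token suggests the service principal and secret are valid.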

Set up selective sync

With selective sync, you can choose exactly which tables you want synced to the Databricks warehouse. By default, Segment also syncs materialized view tables.

Select tables to sync, then click Next. Segment creates the warehouse and connects Databricks to your Profiles Sync space.

You can view the sync status and the tables you’re syncing from the Profiles Sync overview page.

Learn more about using selective sync with Profiles Sync.

This page was last modified: 05 Feb 2024


