Connect Amazon Redshift
Overview
Warehouse Native Experimentation allows you to run experiments on data that already lives in your data warehouse (such as Snowflake, BigQuery, Redshift, or Databricks), leveraging its processing power and existing data infrastructure. By connecting Harness FME directly to your Amazon Redshift instance, you can securely query and analyze experiment data from your source of truth.
To begin, connect your Amazon Redshift instance as a data source through a direct connection or using IAM role-based authentication.
Prerequisites
Ensure that you have the following before getting started:
- Access to your organization's Redshift cluster endpoint and database
- A designated results table where experiment results are stored in Amazon Redshift
Setup
Harness recommends the following best practices:
- Use IAM Role authentication instead of static credentials.
- Restrict access to read-only privileges.
- Keep Redshift clusters within secure VPCs and use SSL connections.
- Regularly audit IAM Roles and access policies.
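To follow the read-only recommendation, you can scope the Harness FME database user to usage and select privileges only. A minimal sketch, assuming the example schema (analytics), user (fme_user), and results table (FME_RESULTS) used later in this guide:

```sql
-- Read-only access to the experimentation schema (names are examples)
GRANT USAGE ON SCHEMA analytics TO fme_user;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO fme_user;

-- The results table is the one exception: Harness FME writes experiment
-- analysis results to it, so it also needs insert privileges there.
GRANT INSERT ON analytics.FME_RESULTS TO fme_user;
```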
To integrate Amazon Redshift as a data warehouse for Warehouse Native Experimentation:
- From the Harness FME navigation menu, click FME Settings, then click View on a project on the Projects page. Navigate to the Data Sources tab.
- Select Redshift as your data warehouse. In the Data Sources tab of your Harness FME project, select Redshift from the list of supported data warehouses.

  Project experimentation type: A project uses a single experimentation type based on the metric source used. When you add a data source to a project, the project's experimentation type is set to Warehouse Native, and all metrics in the project must then use Warehouse Native metric sources. If a project instead uses metrics created from an ingested event source, the project's experimentation type is set to Cloud.

- Enter the following connection details:
  | Field | Description | Example |
  |---|---|---|
  | Server (Host URL) | The endpoint of your Redshift cluster. | redshift-cluster.analytics.us-east-1.redshift.amazonaws.com |
  | Port | The port number used by your Redshift instance (by default, set to 5439). | 5439 |
  | Database | The database containing your experimentation data. | experiments |
  | Username | The database user that Harness FME uses to authenticate with your Redshift cluster. Ensure this user has read access to the required schemas and tables. | fme_user |
  | Schema | The schema within your database containing your experiment or metric tables. | analytics |
  | IAM Role ARN | The IAM role with permissions to access your Redshift cluster. | arn:aws:iam::123456789012:role/FMEAccessRole |
  | External ID | A unique identifier used in the IAM role trust policy to ensure that only Harness FME can assume the role. This is commonly used for secure cross-account or third-party access and is required if the IAM role trust policy specifies an External ID. | AbcDeFg-hYjkClm7nop-sKViFlWo5REVqatyHp3zbrZvmd3QuJxCBhkYDVKEhWwjHbWBYFdKQKbKlmi7Hf2DRgc-6NduSqZ5Q |
  | Results Table Name | The name of the table where experiment results are stored. | FME_RESULTS |
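Since Redshift speaks the PostgreSQL wire protocol, the fields above map directly onto a standard PostgreSQL-style connection string. The following is a minimal sketch using the example values from the table; all names are illustrative, and `sslmode=require` reflects the SSL recommendation from the best practices above:

```python
# Build a PostgreSQL-style DSN from the example connection details.
# All values are illustrative placeholders, not real credentials.
def build_dsn(host: str, port: int, database: str, user: str) -> str:
    # sslmode=require enforces SSL, per the best practices above
    return f"host={host} port={port} dbname={database} user={user} sslmode=require"

dsn = build_dsn(
    host="redshift-cluster.analytics.us-east-1.redshift.amazonaws.com",
    port=5439,
    database="experiments",
    user="fme_user",
)
print(dsn)
```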
- Select an authentication method. Harness FME supports two authentication methods for connecting to Amazon Redshift:

  - IAM role-based authentication (recommended): Harness FME can assume an IAM role to obtain temporary credentials for accessing your Redshift cluster.
    - Create or use an existing IAM role with permissions to access the cluster.
    - Attach a policy granting Redshift read access to relevant databases and schemas.
    - Provide the IAM Role ARN in Harness FME.
  - Username and password authentication: Provide a Redshift database username and password with read access to the required databases and schemas. Ensure the user has permission to query the tables required for experiment metrics.
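For the IAM role option, the role's trust policy must allow Harness FME to call `sts:AssumeRole`, optionally gated by the External ID entered above. A sketch of the general shape of such a trust policy follows; the principal and external ID are placeholders, and the actual principal to trust is provided by Harness FME when you configure the connection:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "<HARNESS_PRINCIPAL_ARN>" },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": { "sts:ExternalId": "<EXTERNAL_ID_FROM_HARNESS>" }
      }
    }
  ]
}
```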
- Configure network access for private VPC deployments. If your Amazon Redshift cluster is deployed in a private VPC, Harness must be allowed network access to run experiment calculations against your warehouse.
To enable connectivity:
- Update your Redshift cluster's security group inbound rules to allow connections from the Harness outbound IP addresses.
  - Allow traffic on the Redshift port (default: 5439), or on your custom port if configured.
Harness FME maintains a list of static outbound IP addresses used for warehouse connections. You must allow these addresses in your security group rules.
- Test the connection by clicking Test Connection. If the test fails, verify the following:
- The IAM Role has the correct trust policy and permissions.
- The Redshift cluster is publicly accessible (or within a connected VPC).
- The correct database, schema, and port are entered.
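Before digging into IAM or schema issues, you can rule out basic network problems with a quick TCP reachability check. This sketch only verifies that the host and port are reachable, not that authentication works; the host and port in the commented example are the illustrative values used throughout this guide:

```python
import socket

def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example values from the configuration above; checks TCP reachability only.
# print(can_reach("redshift-cluster.analytics.us-east-1.redshift.amazonaws.com", 5439))
```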
- Select a database and a schema. After authentication, Harness FME retrieves the list of accessible databases and schemas based on your IAM Role permissions. Select the database and schema containing your experiment exposure and event/metric data.
- Specify a results table. Create a results table where Harness FME will write experiment analysis results. Ensure the following:
- The table exists in your database.
- The schema matches the expected format for experiment results below.
  | Field | Type | Description |
  |---|---|---|
  | METRICID | VARCHAR | Unique identifier for the metric being calculated. |
  | METRICNAME | VARCHAR | Human-readable name of the metric being calculated. |
  | METRICRESULTID | VARCHAR | Unique identifier representing a specific calculation per metric, per experiment, per analysis run. |
  | EXPID | VARCHAR | Unique identifier for the experiment associated with this metric calculation. |
  | EXPNAME | VARCHAR | Human-readable name of the experiment associated with this metric calculation. |
  | TREATMENT | VARCHAR | The experiment variant (e.g., Control or Treatment) associated with the metric results. |
  | DIMENSIONNAME | VARCHAR | The name of the dimension being analyzed (e.g., country, platform). |
  | DIMENSIONVALUE | VARCHAR | The corresponding value of the analyzed dimension. |
  | ATTRIBUTEDKEYSCOUNT | BIGINT | Count of unique keys (users, sessions, etc.) attributed to this metric result. |
  | REQUESTTIMESTAMP | TIMESTAMP | Timestamp when the metric computation request occurred. |
  | MIN | FLOAT8 | Minimum observed value for the metric. |
  | MAX | FLOAT8 | Maximum observed value for the metric. |
  | COUNT | BIGINT | Total number of observations included in the metric calculation. |
  | SUM | FLOAT8 | Sum of all observed metric values. |
  | MEAN | FLOAT8 | Average (mean) of the metric values. |
  | P50 | FLOAT8 | 50th percentile (median) metric value. |
  | P95 | FLOAT8 | 95th percentile metric value. |
  | P99 | FLOAT8 | 99th percentile metric value. |
  | VARIANCE | FLOAT8 | Variance of the metric values. |
  | EXCLUDEDUSERCOUNT | BIGINT | Number of users excluded from the analysis (due to filters, SRM, etc.). |
  | ASOFTIMESTAMP | TIMESTAMP | Timestamp representing when the result snapshot was written. |

  To create the results table with the correct structure, run the following SQL statement in Amazon Redshift:
  ```sql
  CREATE TABLE IF NOT EXISTS <DATABASE_NAME>.<SCHEMA_NAME>.<TABLE_NAME> (
    METRICRESULTID VARCHAR(256),
    TREATMENT VARCHAR(256),
    DIMENSIONNAME VARCHAR(256),
    DIMENSIONVALUE VARCHAR(256),
    ATTRIBUTEDKEYSCOUNT BIGINT,
    REQUESTTIMESTAMP TIMESTAMP,
    MIN FLOAT8,
    MAX FLOAT8,
    COUNT BIGINT,
    SUM FLOAT8,
    MEAN FLOAT8,
    P50 FLOAT8,
    P95 FLOAT8,
    P99 FLOAT8,
    VARIANCE FLOAT8,
    EXCLUDEDUSERCOUNT BIGINT,
    ASOFTIMESTAMP TIMESTAMP,
    METRICID VARCHAR(256),
    METRICNAME VARCHAR(256),
    EXPID VARCHAR(256),
    EXPNAME VARCHAR(256)
  );
  ```
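If you build tooling around the results table, it can be useful to sanity-check rows against the expected columns before reading or loading them. A small sketch, assuming rows are represented as dicts; the column names come from the schema above, everything else is illustrative:

```python
# Expected columns of the results table (names from the schema above).
EXPECTED_COLUMNS = {
    "METRICID", "METRICNAME", "METRICRESULTID", "EXPID", "EXPNAME",
    "TREATMENT", "DIMENSIONNAME", "DIMENSIONVALUE", "ATTRIBUTEDKEYSCOUNT",
    "REQUESTTIMESTAMP", "MIN", "MAX", "COUNT", "SUM", "MEAN",
    "P50", "P95", "P99", "VARIANCE", "EXCLUDEDUSERCOUNT", "ASOFTIMESTAMP",
}

def missing_columns(row: dict) -> set:
    """Return expected result-table columns that are absent from a row dict."""
    return EXPECTED_COLUMNS - row.keys()
```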
- Save and activate. Once the test passes, click Save to create the connection.
Your Redshift data source can now be used to create assignment and metric sources for Warehouse Native Experimentation.
Example Redshift configuration
| Setting | Example |
|---|---|
| Cluster Endpoint | redshift-cluster.analytics.us-east-1.redshift.amazonaws.com |
| Port | 5439 |
| Database | experiments |
| Schema | analytics |
| IAM Role ARN | arn:aws:iam::123456789012:role/FMEAccessRole |
| Results Table | FME_RESULTS |