Skip to main content

Harness Query Language (HQL) Reference

Last updated on

HQL is a domain-specific query language for querying heterogeneous data sources in the Harness Data Platform. It provides a unified interface for querying events, entities, metrics, and views across multiple database backends (StarRocks, AlloyDB, BigQuery, PostgreSQL, MySQL) with pipe-based operations and automatic SQL generation.


Data Sources

HQL supports four types of data sources. Type identifiers can be unquoted for simple names or quoted for names with special characters (like colons).

Source TypeDescriptionExamples
eventTime-series and log dataSpans, logs, cost events, clickstream
entityBusiness objects and transactional dataPipeline executions, artifacts, API entities
metricAggregated and analytical dataAggregated cost metrics, performance metrics
viewVirtual tables that expand to CTEs at query timeCustom views like successful_pipelines
-- Unquoted identifier
find event span

-- Quoted identifier with special characters
find event "ccm:unified_table"

-- Entity type
find entity "pipeline:pipeline_execution"

-- Metric type
find metric "ccm:cost_metrics"

Table Aliases

Assign aliases to data sources for use in joins and field references.

find entity "pipeline:pipeline_execution" p
| filter p.status = "FAILED"

find event "ccm:unified_table" t
| select { t.region, sum(t.cost) as total }
| group_by t.region

Operations

Operations are chained using the pipe (|) operator. Each operation transforms the result of the previous one.

Filter

Filters rows based on conditions. Supports equality, comparison, pattern matching, membership, string containment, and null checks.

-- Basic equality
find event "ccm:unified_table"
| filter region = "us-east-1"

-- Date range
find entity "pipeline:pipeline_execution"
| filter start_time >= ago(30d)

-- OR conditions
find event span
| filter span_kind = "internal" or span_kind = "server"

-- IN operator
find entity artifact
| filter tag in ["v1.0", "v1.1", "v2.0"]

-- NULL checks
find entity api
| filter description is not null

-- String contains
find event logs
| filter message contains "error"

-- Comparison with AND
find event "ccm:unified_table"
| filter cost > 100 and region = "us-west-1"
OperatorDescription
=, !=Equality / Inequality
>, >=, <, <=Comparison
=~Pattern matching (regex)
in, !inMembership
contains, !containsString containment
is null, is not nullNull checks
is empty, is not emptyEmpty checks
and, orLogical operators (and has higher precedence)

Select

Selects specific fields or expressions. Use -> for nested field access.

find entity "pipeline:pipeline_execution"
| select { pipeline_id, status, start_time }

-- With expressions and aliases
find event "ccm:unified_table"
| select {
region,
sum(cost) as total_cost,
count() as record_count
}

-- Nested field access
find entity event e
| select { e.tags->env, e.user->profile->name }

Group By

Groups rows by specified expressions. All non-aggregated fields in select must be included in group_by.

find event "ccm:unified_table"
| select { region, cloudProvider, sum(cost) as total_cost }
| group_by region, cloudProvider

Aggregate

Performs second-order aggregations (aggregations on already aggregated data).

find entity "pipeline:pipeline_execution"
| select { status, count() as count }
| group_by status
| aggregate { sum(count) as total }

Order By, Limit, Offset, Distinct

find entity "pipeline:pipeline_execution"
| filter start_time >= ago(7d)
| select { pipeline_id, status, start_time }
| order_by start_time desc
| limit 100
| offset 20

-- Distinct values
find event span
| select distinct { service_name }
| order_by service_name asc

Functions

Aggregation Functions

FunctionDescription
count()Count of rows
count(expr)Count of non-null values
sum(expr)Sum of values
avg(expr)Average of values
min(expr)Minimum value
max(expr)Maximum value
approx_count_distinct(expr)Approximate distinct count

String Functions

FunctionDescription
lower(str)Convert to lowercase
upper(str)Convert to uppercase
trim(str)Remove leading/trailing whitespace
concat(a, b, ...)Concatenate strings
substr(str, start, len)Substring extraction
length(str)String length
replace(str, from, to)Replace occurrences

Math Functions

FunctionDescription
abs(n)Absolute value
round(n, d)Round to d decimal places
floor(n)Round down
ceil(n)Round up
mod(n, m)Modulo
sqrt(n)Square root
pow(n, e)Raise to power

Time Functions

FunctionDescription
now()Current timestamp
ago(duration)Timestamp relative to now (e.g., ago(30d), ago(1h))
date_trunc(unit, ts)Truncate timestamp to unit ('day', 'hour', 'week', 'month')
date_diff(unit, a, b)Difference between two timestamps
to_timestamp(expr)Cast to timestamp
extract(field from ts)Extract field from timestamp (year, month, day, etc.)
find event "ccm:unified_table"
| filter startTime >= ago(30d)
| select {
date_trunc('day', startTime) as day,
sum(cost) as daily_cost
}
| group_by date_trunc('day', startTime)

Conditional Expressions (CASE WHEN)

find entity "pipeline:pipeline_execution"
| select {
pipeline_id,
case
when status = "SUCCESS" then "passed"
when status = "FAILED" then "failed"
else "other"
end as result
}

-- Inline conditional aggregation
find entity "pipeline:pipeline_execution"
| filter start_time >= ago(30d)
| select {
pipeline_id,
count() as total,
sum(case when is_successful = true then 1 else 0 end) as successes,
case
when count() > 0
then (sum(case when is_successful = true then 1 else 0 end) * 100.0 / count())
else 0
end as success_rate
}
| group_by pipeline_id

Cast and Interval Expressions

-- Cast
find event "ccm:unified_table"
| select { cast(cost as double) as cost_float }

-- Interval arithmetic
find entity "pipeline:pipeline_execution"
| filter start_time >= now() - interval '7' day

Common Table Expressions (CTEs)

CTEs let you define named subqueries that can be referenced in the main query. They are useful for breaking down complex queries, reusing subqueries, and improving readability. CTEs cannot be nested (no CTEs inside CTE definitions).

with high_duration_spans as (
find event span
| filter duration > 1000
),
error_logs as (
find event log l
| filter l.level = "ERROR"
)
find high_duration_spans
| select { id, duration }
-- CTE with full operations
with recent_failures as (
find entity "pipeline:pipeline_execution"
| filter is_failed = true
| filter start_time >= ago(7d)
| select { pipeline_id, error_message }
| order_by start_time desc
| limit 100
)
find recent_failures
| select { pipeline_id, count() as failure_count }
| group_by pipeline_id

Joins

HQL supports joining CTEs or direct data sources with inner (default), left, right, and full join types.

with recent_pipelines as (
find entity "pipeline:pipeline_execution"
| filter start_time >= ago(30d)
)
find entity artifact a
| left join recent_pipelines rp on a.pipeline_id = rp.pipeline_id
| select { a.id, rp.status }

-- Multiple join conditions
find entity "pipeline:pipeline_execution" p
| join entity artifact a on p.artifact_id = a.id and p.account_id = a.account_id
| select { p.pipeline_id, a.tag }

-- Complex join with multiple CTEs
with pipeline_exec as (
find entity "pipeline:pipeline_execution" pes
| filter pes.is_deleted = false
| filter pes.start_time >= ago(30d)
),
artifact_scans as (
find entity "sto:scan" s
| filter s.status = "ACTIVE"
)
find pipeline_exec
| join artifact_scans on pipeline_exec.pipeline_id = artifact_scans.pipeline_id
| select {
pipeline_exec.pipeline_id,
pipeline_exec.status,
artifact_scans.scan_id,
artifact_scans.scan_type
}
| order_by pipeline_exec.start_time desc

Examples

Cost Analysis by Region

find event "ccm:unified_table"
| filter startTime >= ago(30d)
| select {
region,
cloudProvider,
sum(awsBlendedCost) as total_cost,
count() as record_count
}
| group_by region, cloudProvider
| order_by total_cost desc
| limit 20

Pipeline Execution Statistics

find entity "pipeline:pipeline_execution"
| filter start_time >= ago(30d)
| select {
date_trunc('day', start_time) as day,
status,
count() as execution_count,
count(case when is_successful = true then 1 end) as successful_count
}
| group_by date_trunc('day', start_time), status
| order_by day desc, execution_count desc

Top API Endpoints by Latency

find event span
| filter span_kind = "server"
| filter startTime >= ago(24h)
| select {
name as endpoint,
avg(duration) as avg_latency_ms,
max(duration) as max_latency_ms,
count() as request_count
}
| group_by name
| order_by avg_latency_ms desc
| limit 20

Conditional Aggregation (Success Rate)

find entity "pipeline:pipeline_execution"
| filter start_time >= ago(30d)
| select {
pipeline_id,
count() as total,
case
when count() > 0
then (sum(is_successful) * 100.0 / count())
else 0
end as success_rate
}
| group_by pipeline_id
| order_by success_rate asc

Nested Field Access

find entity event e
| select {
e.tags->env,
e.metadata->labels->app,
e.user->profile->name
}
| filter e.metadata->tags->env = "production"
| limit 100

Time-Series Analysis

find event "ccm:unified_table"
| filter startTime >= ago(90d)
| select {
date_trunc('week', startTime) as week,
cloudProvider,
sum(awsBlendedCost) as weekly_cost,
avg(awsBlendedCost) as avg_daily_cost
}
| group_by date_trunc('week', startTime), cloudProvider
| order_by week desc, weekly_cost desc

Error Rate with CTEs

with error_events as (
find event logs
| filter level = "ERROR"
| filter timestamp >= ago(24h)
),
all_events as (
find event logs
| filter timestamp >= ago(24h)
)
find error_events
| select {
date_trunc('hour', timestamp) as hour,
count() as error_count
}
| group_by date_trunc('hour', timestamp)
| order_by hour desc

Best Practices

1. Use Aliases for Clarity

-- Good: aliased source makes join references unambiguous
find entity "pipeline:pipeline_execution" p
| join entity artifact a on p.artifact_id = a.id
| select { p.pipeline_id, a.tag, p.status }

2. Filter Early

Apply filters as early as possible to reduce data processing.

-- More efficient — filter before aggregation
find event "ccm:unified_table"
| filter startTime >= ago(30d)
| select { region, sum(cost) }
| group_by region

-- Less efficient — filter after aggregation
find event "ccm:unified_table"
| select { region, sum(cost) }
| group_by region
| filter sum(cost) > 1000

3. Use CTEs for Complex Queries

with recent_failures as (
find entity "pipeline:pipeline_execution"
| filter is_failed = true
| filter start_time >= ago(7d)
),
failure_summary as (
find recent_failures
| select { pipeline_id, count() as count }
| group_by pipeline_id
)
find failure_summary
| order_by count desc
| limit 10

4. Group By All Non-Aggregated Fields

All fields in select that are not wrapped in an aggregation function must appear in group_by.

-- Correct
find event "ccm:unified_table"
| select { region, cloudProvider, sum(cost) as total }
| group_by region, cloudProvider

-- Incorrect — cloudProvider missing from group_by
find event "ccm:unified_table"
| select { region, cloudProvider, sum(cost) as total }
| group_by region

5. Quote Type Identifiers with Special Characters

-- Unquoted (simple names only)
find event span

-- Quoted (required when name contains colons or spaces)
find entity "ccm:unified_table"
find entity "pipeline:pipeline_execution"

6. Always Limit Result Sets

Use limit for queries that might return large result sets to avoid performance issues.

find entity "pipeline:pipeline_execution"
| filter start_time >= ago(7d)
| order_by start_time desc
| limit 500

Reference

Operation Precedence

Operations are applied in the order they appear in the query.

OrderOperation
1find (data source)
2filter
3select
4group_by
5aggregate
6order_by
7limit / offset

Expression Precedence

PriorityOperator
1 (highest)Parentheses ()
2Multiplicative * / %
3Additive + -
4Comparison = != > >= < <=
5Logical AND
6 (lowest)Logical OR

Data Type Mapping

HQL TypePostgreSQLStarRocks
stringVARCHAR / TEXTVARCHAR
intINTEGERINT
longBIGINTBIGINT
doubleDOUBLE PRECISIONDOUBLE
boolBOOLEANBOOLEAN
timestampTIMESTAMPDATETIME

Reserved Keywords

The following keywords are reserved but can be used as field names in qualified references (e.g., alias.select).

find, filter, select, group_by, aggregate, order_by, limit, offset, distinct, join, left, right, inner, full, on, with, as, and, or, not, in, is, null, true, false, case, when, then, else, end, cast, asc, desc, contains, by

SQL Dialect Translation

HQL queries are automatically translated to the appropriate SQL dialect based on the data source: StarRocks (MySQL-compatible), AlloyDB (PostgreSQL-compatible), PostgreSQL, BigQuery, and MySQL. The query engine selects the correct dialect automatically.