ModelScan Step Configuration
The ModelScan step in Harness STO uses the open-source scanner ModelScan to scan your machine learning (ML) models for security vulnerabilities. You can perform ModelScan scans in both Orchestration and Ingestion modes. This document will guide you through configuring the ModelScan step in your STO pipeline.
- To run scans as a non-root user, you can use custom STO scan images and pipelines. See Configure your pipeline to use STO images from private registry.
- STO supports multiple workflows for loading self-signed certificates. See Run STO scans with custom SSL certificates.
Supported ML Libraries and Formats
The following table lists the ML libraries and serialization formats, along with their support status in the ModelScan step.
ML Library | Serialization Format | Support Status |
---|---|---|
PyTorch | Pickle | ✅ Supported |
Keras | HDF5 (Hierarchical Data Format) | ✅ Supported |
Classic ML Libraries (Sklearn, XGBoost, etc.) | Pickle, Cloudpickle, Dill, Joblib | ✅ Supported |
TensorFlow | Protocol Buffer | ❌ Not Supported |
Keras | Keras V3 (Hierarchical Data Format) | ❌ Not Supported |
Scanning ML models in binary files is not supported. Your models must be in one of the supported formats listed above.
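As a rough illustration, the support table above can be expressed as a lookup that a pre-flight check might consult before a pipeline runs. This is a minimal sketch: the `SUPPORT` dictionary and the `is_supported` helper are hypothetical names, not part of Harness STO or ModelScan, and "Classic ML" stands in for the Sklearn/XGBoost row.

```python
# Support matrix transcribed from the table above; keys are (library, format).
# "HDF5" is the Hierarchical Data Format row; "Classic ML" covers Sklearn, XGBoost, etc.
SUPPORT = {
    ("PyTorch", "Pickle"): True,
    ("Keras", "HDF5"): True,
    ("Classic ML", "Pickle"): True,
    ("Classic ML", "Cloudpickle"): True,
    ("Classic ML", "Dill"): True,
    ("Classic ML", "Joblib"): True,
    ("TensorFlow", "Protocol Buffer"): False,
    ("Keras", "Keras V3"): False,
}


def is_supported(library: str, fmt: str) -> bool:
    """Return True only for combinations the table marks as supported.

    Unknown combinations are treated as unsupported.
    """
    key = (library.strip().lower(), fmt.strip().lower())
    normalized = {(lib.lower(), f.lower()): ok for (lib, f), ok in SUPPORT.items()}
    return normalized.get(key, False)
```

Treating unknown combinations as unsupported is a conservative choice for this sketch; it avoids assuming coverage the table does not state.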
ModelScan step settings
The recommended workflow is to add a ModelScan step to a Security or Build stage and then configure it as described below.
Scan Mode
- Orchestration mode: In this mode, the step executes the scan, then processes the results by normalizing and deduplicating them.
- Ingestion mode: In this mode, the ModelScan step ingests scan results from a specified file. The scan results file must be in JSON format.
Scan Configuration
The predefined configuration to use for the scan. All scan steps have at least one configuration.
Target
Type
- Repository: Scan a codebase repo.
In most cases, you specify the codebase using a code repo connector that connects to the Git account or repository where your code is stored. For information, go to Configure codebase.
You can also scan models stored in Hugging Face repositories by using the Harness GitHub connector, configured to connect to your Hugging Face account.
Target and variant detection
When Auto is enabled for code repositories, the step detects these values using git:
- To detect the target, the step runs git config --get remote.origin.url.
- To detect the variant, the step runs git rev-parse --abbrev-ref HEAD. The default assumption is that the HEAD branch is the one you want to scan.
Note the following:
- Auto is not available when the Scan Mode is Ingestion.
- By default, Auto is selected when you add the step. You can change this setting if needed.
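To preview the values Auto would likely pick up, you can run the same two git commands yourself. The following is a minimal sketch, not the step's actual implementation: `run_git` and `detect_target_and_variant` are hypothetical helper names, and the `git` parameter exists only so the logic can be exercised without a real repository.

```python
import subprocess
from typing import Callable, Dict, Sequence


def run_git(args: Sequence[str], cwd: str = ".") -> str:
    """Run a git command in cwd and return its trimmed standard output."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def detect_target_and_variant(
    cwd: str = ".", git: Callable[[Sequence[str], str], str] = run_git
) -> Dict[str, str]:
    """Mirror the two detection commands described above."""
    # Target: the URL of the "origin" remote.
    target = git(["config", "--get", "remote.origin.url"], cwd)
    # Variant: the short name of the branch HEAD points to.
    variant = git(["rev-parse", "--abbrev-ref", "HEAD"], cwd)
    return {"target": target, "variant": variant}
```

Calling `detect_target_and_variant("/path/to/clone")` inside a checked-out repository returns the remote URL and the current branch name. In a detached-HEAD checkout, `git rev-parse --abbrev-ref HEAD` prints `HEAD` rather than a branch name, which is one reason you might set the variant manually.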
Name
The identifier for the target, such as codebaseAlpha or jsmith/myalphaservice. Descriptive target names make it much easier to navigate your scan data in the STO UI.
It is good practice to specify a baseline for every target.
Variant
The identifier for the specific variant to scan. This is usually the branch name, image tag, or product version. Harness maintains a historical trend for each variant.
Workspace
The workspace path on the pod running the scan step. The workspace path is /harness by default.
You can override this if you want to scan only a subset of the workspace. For example, suppose the pipeline publishes artifacts to a subfolder /tmp/artifacts and you want to scan these artifacts only. In this case, you can specify the workspace path as /harness/tmp/artifacts.
You can also specify an individual file to scan. For instance, if you only want to scan a specific file like /tmp/iac/infra.tf, you can specify the workspace path as /harness/tmp/iac/infra.tf.
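The path mapping described above (a pipeline subfolder or file joined onto the default /harness workspace) can be sketched as follows. The `workspace_path` helper is hypothetical and only illustrates the path arithmetic; the guard against paths that climb out of the workspace is an added assumption, not documented step behavior.

```python
import posixpath

DEFAULT_WORKSPACE = "/harness"  # default workspace path stated above


def workspace_path(subpath: str = "") -> str:
    """Join a pipeline-relative folder or file onto the default workspace."""
    relative = subpath.strip().lstrip("/")
    if not relative:
        return DEFAULT_WORKSPACE
    joined = posixpath.normpath(posixpath.join(DEFAULT_WORKSPACE, relative))
    # Assumption for this sketch: reject paths that resolve outside the workspace.
    if joined != DEFAULT_WORKSPACE and not joined.startswith(DEFAULT_WORKSPACE + "/"):
        raise ValueError(f"{subpath!r} resolves outside {DEFAULT_WORKSPACE}")
    return joined
```

For example, `workspace_path("/tmp/artifacts")` yields `/harness/tmp/artifacts`, and `workspace_path("/tmp/iac/infra.tf")` yields `/harness/tmp/iac/infra.tf`, matching the two cases in the text.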
Ingestion File
The path to your scan results when running an Ingestion scan, for example /shared/scan_results/myscan.latest.sarif.
- The data file must be in a supported format for the scanner.
- The data file must be accessible to the scan step. It's good practice to save your results files to a shared path in your stage. In the visual editor, go to the stage where you're running the scan. Then go to Overview > Shared Paths. You can also add the path to the YAML stage definition like this:
- stage:
    spec:
      sharedPaths:
        - /shared/scan_results
The ingestion file must be in JSON format.
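Before running an Ingestion scan, it can help to confirm the two requirements above: the results file sits under a shared path and parses as JSON. The following is a minimal sketch of such a check. `check_ingestion_file` and `is_under` are hypothetical helpers, and the check validates JSON syntax only, not the schema the step expects.

```python
import json
import os
from typing import List


def is_under(path: str, parent: str) -> bool:
    """Return True if path is parent itself or located somewhere below it."""
    path = os.path.normpath(path)
    parent = os.path.normpath(parent)
    return path == parent or path.startswith(parent.rstrip(os.sep) + os.sep)


def check_ingestion_file(path: str, shared_paths: List[str]) -> List[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if not any(is_under(path, shared) for shared in shared_paths):
        problems.append("file is not under a shared path")
    try:
        with open(path, encoding="utf-8") as handle:
            json.load(handle)
    except OSError as exc:
        problems.append(f"file is not readable: {exc}")
    except ValueError as exc:  # covers json.JSONDecodeError and decoding errors
        problems.append(f"file is not valid JSON: {exc}")
    return problems
```

A call such as `check_ingestion_file("/shared/scan_results/results.json", ["/shared/scan_results"])` returns an empty list when the file exists and parses; the file name here is illustrative.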
Log Level
The minimum severity of the messages you want to include in your scan logs. You can specify one of the following:
- DEBUG
- INFO
- WARNING
- ERROR
Fail on Severity
Every STO scan step has a Fail on Severity setting. If the scan finds any vulnerability with the specified severity level or higher, the pipeline fails automatically. You can specify one of the following:
- CRITICAL
- HIGH
- MEDIUM
- LOW
- INFO
- NONE: Do not fail on severity
The YAML definition looks like this:
fail_on_severity: critical # | high | medium | low | info | none
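The threshold rule ("the specified severity level or higher") can be illustrated with a short sketch. It assumes the ordering info < low < medium < high < critical implied by the list above; `SEVERITY_ORDER` and `should_fail` are hypothetical names, and the real step evaluates this internally.

```python
# Severities from lowest to highest, as implied by the list above.
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]


def should_fail(finding_severities, fail_on_severity="none"):
    """Return True if any finding is at or above the configured threshold."""
    threshold = fail_on_severity.lower()
    if threshold == "none":
        # NONE means the pipeline never fails on severity.
        return False
    minimum = SEVERITY_ORDER.index(threshold)
    return any(
        SEVERITY_ORDER.index(severity.lower()) >= minimum
        for severity in finding_severities
    )
```

With `fail_on_severity` set to `high`, a scan that reports one low and one high finding fails the pipeline, while a scan that reports only medium findings does not.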
Additional Configuration
The fields under Additional Configuration vary based on the type of infrastructure. Depending on the infrastructure type selected, some fields may or may not appear in your settings. The available fields are:
- Override Security Test Image
- Privileged
- Image Pull Policy
- Run as User
- Set Container Resources
- Timeout
Advanced settings
In the Advanced settings, you can use the following options:
Proxy settings
This step supports Harness Secure Connect if you're using Harness Cloud infrastructure. During the Secure Connect setup, the HTTPS_PROXY and HTTP_PROXY variables are automatically configured to route traffic through the secure tunnel. If there are specific addresses that you want to bypass the Secure Connect proxy, you can define those in the NO_PROXY variable. This can be configured in the Settings of your step.
If you need to configure a different proxy (not using Secure Connect), you can manually set the HTTPS_PROXY, HTTP_PROXY, and NO_PROXY variables in the Settings of your step.
Definitions of proxy variables:
- HTTPS_PROXY: Specify the proxy server for HTTPS requests, for example https://sc.internal.harness.io:30000.
- HTTP_PROXY: Specify the proxy server for HTTP requests, for example http://sc.internal.harness.io:30000.
- NO_PROXY: Specify the domains, as comma-separated values, that should bypass the proxy. This allows you to exclude certain traffic from being routed through the proxy.
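To reason about which hosts a comma-separated NO_PROXY value would exclude, the sketch below applies a common convention: an entry matches the host itself and any of its subdomains, and `*` matches everything. This is an assumption for illustration. Tools differ in how they interpret NO_PROXY, and `parse_no_proxy` and `bypasses_proxy` are hypothetical helpers, not part of Harness.

```python
def parse_no_proxy(value: str) -> list:
    """Split a comma-separated NO_PROXY value into clean, lowercase entries."""
    return [entry.strip().lower() for entry in value.split(",") if entry.strip()]


def bypasses_proxy(host: str, no_proxy: str) -> bool:
    """Return True if host matches an entry under the suffix convention."""
    host = host.lower()
    for entry in parse_no_proxy(no_proxy):
        if entry == "*":
            return True
        suffix = entry.lstrip(".")
        # Match the domain itself or any subdomain of it.
        if host == suffix or host.endswith("." + suffix):
            return True
    return False
```

For instance, with `NO_PROXY` set to `localhost,.internal.example.com` (illustrative values), `api.internal.example.com` would bypass the proxy under this convention, while `example.org` would still be routed through it.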