ModelScan Step Configuration



The ModelScan step in Harness STO uses the open-source scanner ModelScan to scan your machine learning (ML) models for security vulnerabilities. You can run ModelScan scans in either Orchestration or Ingestion mode. This document explains how to configure the ModelScan step in your STO pipeline.

Supported ML Libraries and Formats

The following table lists the ML libraries and serialization formats, along with their support status in the ModelScan step.

| ML Library | Serialization Format | Support Status |
| --- | --- | --- |
| PyTorch | Pickle | ✅ Supported |
| Keras | HDF5 (Hierarchical Data Format) | ✅ Supported |
| Classic ML Libraries (Sklearn, XGBoost, etc.) | Pickle, Cloudpickle, Dill, Joblib | ✅ Supported |
| TensorFlow | Protocol Buffer | ❌ Not Supported |
| Keras | Keras V3 (Hierarchical Data Format) | ❌ Not Supported |

Scanning ML models in binary files is not supported. Your models must be in one of the supported formats listed above.

ModelScan step settings

The recommended workflow is to add a ModelScan step to a Security or Build stage and then configure it as described below.

Scan Mode

  • Orchestration mode: In this mode, the step executes the scan, then processes the results by normalizing and deduplicating them.
  • Ingestion mode: In this mode, the ModelScan step ingests scan results from a specified file. The scan results file must be in JSON format.

Scan Configuration

The predefined configuration to use for the scan. All scan steps have at least one configuration.

Target

Type

  • Repository: Scan a codebase repo.

    In most cases, you specify the codebase using a code repo connector that connects to the Git account or repository where your code is stored. For more information, go to Configure codebase.

You can also scan models stored in Hugging Face repositories by using the Harness GitHub connector, configured to connect to your Hugging Face account.

Target and variant detection

When Auto is enabled for code repositories, the step detects these values using git:

  • To detect the target, the step runs git config --get remote.origin.url.
  • To detect the variant, the step runs git rev-parse --abbrev-ref HEAD. The default assumption is that the HEAD branch is the one you want to scan.

Note the following:

  • Auto is not available when the Scan Mode is Ingestion.
  • By default, Auto is selected when you add the step. You can change this setting if needed.
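
As a rough illustration, an Orchestration-mode step with auto-detection might look like the sketch below in pipeline YAML. The layout (mode, config, target.detection) follows the pattern commonly seen in STO scan steps; the step type ModelScan, the config value default, and the name and identifier values are assumptions, so compare this against the YAML that the visual editor generates for your own pipeline.

```yaml
# Hypothetical step definition: type, name, identifier, and config value are assumptions.
- step:
    type: ModelScan
    name: modelscan_orchestration
    identifier: modelscan_orchestration
    spec:
      mode: orchestration   # the step runs the scan, then normalizes and deduplicates results
      config: default       # the predefined scan configuration
      target:
        type: repository
        detection: auto     # target and variant are detected using git
```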

Name

The identifier for the target, such as codebaseAlpha or jsmith/myalphaservice. Descriptive target names make it much easier to navigate your scan data in the STO UI.

It is good practice to specify a baseline for every target.

Variant

The identifier for the specific variant to scan. This is usually the branch name, image tag, or product version. Harness maintains a historical trend for each variant.

Workspace

The workspace path on the pod running the scan step. The workspace path is /harness by default.

You can override this if you want to scan only a subset of the workspace. For example, suppose the pipeline publishes artifacts to a subfolder /tmp/artifacts and you want to scan these artifacts only. In this case, you can specify the workspace path as /harness/tmp/artifacts.

You can also specify an individual file to scan. For instance, if you only want to scan a specific file like /tmp/iac/infra.tf, you can specify the workspace path as /harness/tmp/iac/infra.tf.
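
The target settings described above (Name, Variant, and Workspace) could be combined roughly as follows when Auto is turned off. This is a sketch only: the detection, name, variant, and workspace keys are assumed to sit under target as in other STO scan steps, and the variant value main is a hypothetical branch name.

```yaml
# Fragment of a step's spec; key names are assumptions based on other STO scan steps.
target:
  type: repository
  detection: manual                   # set name and variant explicitly instead of using git
  name: jsmith/myalphaservice         # descriptive target name, shown in the STO UI
  variant: main                       # hypothetical branch name
  workspace: /harness/tmp/artifacts   # scan only this subfolder of the workspace
```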

Ingestion File

The path to your scan results when running an Ingestion scan, for example /shared/scan_results/myscan.latest.sarif.

  • The data file must be in a supported format for the scanner.

  • The data file must be accessible to the scan step. It's good practice to save your results files to a shared path in your stage. In the visual editor, go to the stage where you're running the scan. Then go to Overview > Shared Paths. You can also add the path to the YAML stage definition like this:

    - stage:
        spec:
          sharedPaths:
            - /shared/scan_results

The ingestion file must be in JSON format.
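
Putting the shared path and the ingestion file together, an Ingestion-mode setup might look like the sketch below. The step type, the name and identifier values, the target values, and the results file name modelscan_results.json are all hypothetical; the nesting of ingestion.file under spec is assumed from the pattern used by other STO scan steps.

```yaml
# Hypothetical stage fragment; verify field names against your generated pipeline YAML.
- stage:
    spec:
      sharedPaths:
        - /shared/scan_results          # makes the results file visible to the scan step
      execution:
        steps:
          - step:
              type: ModelScan           # assumed step type
              name: modelscan_ingest
              identifier: modelscan_ingest
              spec:
                mode: ingestion
                config: default
                target:                 # Auto is not available in Ingestion mode
                  type: repository
                  name: jsmith/myalphaservice
                  variant: main         # hypothetical branch name
                ingestion:
                  file: /shared/scan_results/modelscan_results.json   # must be JSON
```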

Log Level

The minimum severity of the messages you want to include in your scan logs. You can specify one of the following:

  • DEBUG
  • INFO
  • WARNING
  • ERROR

Fail on Severity

Every STO scan step has a Fail on Severity setting. If the scan finds any vulnerability with the specified severity level or higher, the pipeline fails automatically. You can specify one of the following:

  • CRITICAL
  • HIGH
  • MEDIUM
  • LOW
  • INFO
  • NONE — Do not fail on severity

The YAML definition looks like this:

    fail_on_severity: critical # | high | medium | low | info | none
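
For context, the Log Level and Fail on Severity settings might sit together in the step definition roughly as shown here. Placing both under an advanced block is an assumption based on how other STO scan steps are laid out, not something this page confirms.

```yaml
# Fragment of a step definition; the nesting under "advanced" is an assumption.
spec:
  advanced:
    log:
      level: info                 # debug | info | warning | error
    fail_on_severity: critical    # fail the pipeline on critical findings only
```

With a threshold of critical, findings of high severity or lower are still reported but do not fail the pipeline.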

Additional Configuration

The fields under Additional Configuration vary based on the type of infrastructure, so some fields may not appear in your settings. Below are the details for each field.

Advanced settings

In the Advanced settings, you can use the following options:

Proxy settings

This step supports Harness Secure Connect if you're using Harness Cloud infrastructure. During the Secure Connect setup, the HTTPS_PROXY and HTTP_PROXY variables are automatically configured to route traffic through the secure tunnel. If there are specific addresses that you want to bypass the Secure Connect proxy, you can define those in the NO_PROXY variable. This can be configured in the Settings of your step.

If you need to configure a different proxy (not using Secure Connect), you can manually set the HTTPS_PROXY, HTTP_PROXY, and NO_PROXY variables in the Settings of your step.

Definitions of Proxy variables:

  • HTTPS_PROXY: Specify the proxy server for HTTPS requests, for example https://sc.internal.harness.io:30000.
  • HTTP_PROXY: Specify the proxy server for HTTP requests, for example http://sc.internal.harness.io:30000.
  • NO_PROXY: Specify the domains as comma-separated values that should bypass the proxy. This allows you to exclude certain traffic from being routed through the proxy.
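
As a sketch, the three proxy variables could be set as key-value pairs in the step's Settings along these lines. The settings block under spec is assumed from other STO scan steps, the proxy URLs are the examples given above, and the NO_PROXY entries are hypothetical placeholders.

```yaml
# Fragment of a step definition; the "settings" placement is an assumption.
spec:
  settings:
    HTTPS_PROXY: https://sc.internal.harness.io:30000
    HTTP_PROXY: http://sc.internal.harness.io:30000
    NO_PROXY: localhost,artifacts.example.internal   # hypothetical hosts that bypass the proxy
```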