Import from GitHub
Multi-Repo catalog population script (GitHub)
In large engineering organizations, service sprawl across hundreds or even thousands of GitHub repositories is common. Manually onboarding each service into the Harness Software Catalog, either by creating catalog-info YAMLs individually or configuring one catalog location per repository, quickly becomes unmanageable. Moreover, using the default discovery plugins to register each repository as a location can lead to fragility; a single failure during sync can prevent the entire catalog from updating correctly.
This script provides a scalable and GitOps-friendly solution by automating the end-to-end onboarding process using the Harness IDP 2.0 Entities API and Git Experience (GitX). It programmatically fetches all repositories from your GitHub organization, dynamically generates a valid idp.yaml for each service, and pushes that file into a centralized GitHub repository. Each YAML is committed to a unique path using the Git connector you have configured in Harness. This ensures that service metadata is not only standardized but also version-controlled in Git, promoting visibility and auditability.
Script source
curl -o idp-catalog-population-multirepo-github.py https://raw.githubusercontent.com/harness-community/idp-samples/main/IDP-2.0-Samples/catalog-scripts/idp-catalog-population-multirepo-github.py
Before you begin
- Python 3 with the requests and python-dotenv libraries installed.
- A .env file configured with the following:
GITHUB_TOKEN = '<github-token>'
HARNESS_API_KEY = '<harness-api-key>'
HARNESS_ACCOUNT_ID = '<harness-account-id>'
CONNECTOR_REF = '<harness-git-connector-ref>'
ORG_IDENTIFIER = '<harness-org-id>'
PROJECT_IDENTIFIER = '<harness-project-id>'
CENTRAL_REPO = '<name-of-central-repo-to-store-yamls>'
GITHUB_ORG = '<github-org-name>'
- GITHUB_TOKEN must have repo and read:org permissions.
- HARNESS_API_KEY must be a User/API Key with write access to IDP entities.
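Before running the script, it can help to confirm that every setting is actually present. The following is a minimal sketch, not part of the published script: the helper name missing_settings is hypothetical, and the list of names is copied from the .env listing above.

```python
# Names taken from the .env listing above.
REQUIRED_SETTINGS = (
    "GITHUB_TOKEN",
    "HARNESS_API_KEY",
    "HARNESS_ACCOUNT_ID",
    "CONNECTOR_REF",
    "ORG_IDENTIFIER",
    "PROJECT_IDENTIFIER",
    "CENTRAL_REPO",
    "GITHUB_ORG",
)


def missing_settings(env):
    """Return the required names that are absent or blank in `env`.

    `env` is any mapping of names to values, for example os.environ.
    This helper is illustrative and is not part of the Harness script.
    """
    return [
        name
        for name in REQUIRED_SETTINGS
        if not str(env.get(name, "")).strip()
    ]
```

A typical use would be to call load_dotenv() from python-dotenv first, then pass os.environ to missing_settings and stop if the returned list is not empty.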
Execution
After creating your .env file, run the script:
python3 idp-catalog-population-multirepo-github.py
This will:
- Fetch all repositories from your GitHub org.
- For each repo:
  - Sanitize the identifier.
  - Generate a valid idp.yaml.
  - Push it to the specified folder in CENTRAL_REPO.
  - Register the entity in Harness using the Entities API.
Output structure
The catalog YAML files will be stored in the following pattern inside your central GitHub repository:
central-repo/
├── service-one/
│ └── idp.yaml
├── service-two/
│ └── idp.yaml
└── ...
Each YAML will look like:
apiVersion: harness.io/v1
kind: component
orgIdentifier: <your-org>
projectIdentifier: <your-project>
type: Service
identifier: sanitized_unique_id
name: repo-name
owner: group:account/IDP_Test
spec:
  lifecycle: production
metadata:
  description: "repo description from GitHub"
  annotations:
    backstage.io/source-location: url:https://github.com/<your-org>/<repo-name>
    backstage.io/techdocs-ref: dir:.
  tags:
    - auto-onboarded
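To illustrate how a repository could be turned into the YAML above, here is a minimal sketch. Both function names are hypothetical, and the sanitization rule (keep letters, digits, and underscores; never start with a digit) is an assumption rather than the script's confirmed behavior. The owner default is simply the value from the sample.

```python
import json
import re


def sanitize_identifier(repo_name):
    """Turn a repository name into an identifier-safe string.

    Assumed rule (not confirmed by the script): only letters, digits, and
    underscores are kept, and the result must not start with a digit.
    """
    identifier = re.sub(r"[^0-9A-Za-z_]", "_", repo_name)
    if not identifier or identifier[0].isdigit():
        identifier = "_" + identifier
    return identifier


def render_idp_yaml(repo_name, description, github_org, org_id, project_id,
                    owner="group:account/IDP_Test"):
    """Render the entity YAML shown above for one repository."""
    # json.dumps yields a double-quoted string that is also valid YAML,
    # so quotes or colons in the description cannot break the document.
    quoted_description = json.dumps(description or "", ensure_ascii=False)
    lines = [
        "apiVersion: harness.io/v1",
        "kind: component",
        f"orgIdentifier: {org_id}",
        f"projectIdentifier: {project_id}",
        "type: Service",
        f"identifier: {sanitize_identifier(repo_name)}",
        f"name: {repo_name}",
        f"owner: {owner}",
        "spec:",
        "  lifecycle: production",
        "metadata:",
        f"  description: {quoted_description}",
        "  annotations:",
        "    backstage.io/source-location: "
        f"url:https://github.com/{github_org}/{repo_name}",
        "    backstage.io/techdocs-ref: dir:.",
        "  tags:",
        "    - auto-onboarded",
    ]
    return "\n".join(lines) + "\n"
```

Sanitized names can collide (for instance, my-repo and my.repo both map to my_repo under this assumed rule), so a real run may need a uniqueness check before registering entities.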
Logs & troubleshooting
- Script output includes a status message for each repo (success/failure).
- Failures are logged with full error messages from the Harness API.
- For personal GitHub accounts, change the GitHub API URL from:
https://api.github.com/orgs/{GITHUB_ORG}/repos
to:
https://api.github.com/users/{GITHUB_ORG}/repos
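If the same script needs to handle both cases, the URL choice can be isolated in a small helper. This is only a sketch: the function name and the personal_account flag are hypothetical and do not exist in the published script.

```python
def repos_url(owner, personal_account=False):
    """Return the GitHub REST API URL that lists repositories for `owner`.

    Organizations use the /orgs/ path; personal accounts use /users/.
    """
    scope = "users" if personal_account else "orgs"
    return f"https://api.github.com/{scope}/{owner}/repos"
```

Note that the /users/ endpoint returns only public repositories for that user, so private repositories in a personal account would not be picked up this way.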
If you want to try other requests for working with entities, you can use the new Entities APIs.