github-idp-2
Multi-Repo Catalog Population Script (GitHub)
In large engineering organizations, service sprawl across hundreds or even thousands of GitHub repositories is common. Manually onboarding each service into the Harness Software Catalog — either by creating catalog-info YAMLs individually or configuring one catalog location per repository — quickly becomes unmanageable. Moreover, using the default discovery plugins to register each repository as a location can lead to fragility; a single failure during sync can prevent the entire catalog from updating correctly.
This script provides a scalable and GitOps-friendly solution by automating the end-to-end onboarding process using the Harness IDP 2.0 Entities API and Git Experience (GitX). It programmatically fetches all repositories from your GitHub organization, dynamically generates a valid idp.yaml for each service, and pushes that file into a centralized GitHub repository. Each YAML is committed to a unique path using the Git connector you've configured in Harness. This ensures that service metadata is not only standardized but also version-controlled in Git, promoting visibility and auditability.
Script Source
curl -o idp-catalog-population-multirepo-github.py https://raw.githubusercontent.com/harness-community/idp-samples/main/IDP-2.0-Samples/catalog-scripts/idp-catalog-population-multirepo-github.py
Pre-Requisites
- Python 3 with
requestsandpython-dotenvlibraries installed. - A
.envfile configured with the following:
GITHUB_TOKEN = '<github-token>'
HARNESS_API_KEY = '<harness-api-key>'
HARNESS_ACCOUNT_ID = '<harness-account-id>'
CONNECTOR_REF = '<harness-git-connector-ref>'
ORG_IDENTIFIER = '<harness-org-id>'
PROJECT_IDENTIFIER = '<harness-project-id>'
CENTRAL_REPO = '<name-of-central-repo-to-store-yamls>'
GITHUB_ORG = '<github-org-name>'
GITHUB_TOKENmust haverepoandread:orgpermissions.HARNESS_API_KEYmust be a User/API Key with write access to IDP entities.
Execution
After creating your .env file, run the script:
python3 idp-catalog-population-multirepo-github.py
This will:
-
Fetch all repositories from your GitHub org.
-
For each repo:
- Sanitize the identifier.
- Generate a valid
idp.yaml. - Push it to the specified folder in
CENTRAL_REPO. - Register the entity in Harness using the Entities API.
Output Structure
The catalog YAML files will be stored in the following pattern inside your central GitHub repository:
central-repo/
├── service-one/
│ └── idp.yaml
├── service-two/
│ └── idp.yaml
└── ...
Each YAML will look like:
apiVersion: harness.io/v1
kind: component
orgIdentifier: <your-org>
projectIdentifier: <your-project>
type: Service
identifier: sanitized_unique_id
name: repo-name
owner: group:account/IDP_Test
spec:
lifecycle: production
metadata:
description: "repo description from GitHub"
annotations:
backstage.io/source-location: url:https://github.com/<your-org>/<repo-name>
backstage.io/techdocs-ref: dir:.
tags:
- auto-onboarded
Logs & Troubleshooting
- Script output includes a status message for each repo (success/failure).
- Failures are logged with full error messages from the Harness API.
- For personal GitHub accounts, change the GitHub API URL from:
https://api.github.com/orgs/{GITHUB_ORG}/repos
to:
https://api.github.com/users/{GITHUB_ORG}/repos
If you're interested to try out different request to work with entities, you can use the new Entities APIs.