Suggestions to Review before Performing Spinnaker Updates and Upgrades
The following is a list of articles and suggestions to review when planning an upgrade path for a Spinnaker environment. These are general directions to help admins make their upgrade process as painless as possible; they are not meant to be a definitive upgrade checklist. They are intended as guidance to help customers mitigate some of the risks of upgrading the environment. The actual settings for upgrading Spinnaker are fairly straightforward, but ensuring the best experience for users may be more complicated depending on the environment's structure and design. Some of the best practices listed here also apply to testing plugins and connectivity with resources external to Spinnaker.
Review the Release Notes for the latest release. The Release Notes contain information about known issues in the release, from the OSS community and from Armory testing at the time of release, and they are updated with discoveries made after the release.
- For example, Armory Cloud registration is required as of Armory 2.27.x, so administration teams should account for this registration as part of the upgrade process.
If customers are planning to jump several versions (for example, from 2.20 to 2.25), the general guidance is to upgrade and test each intermediate version in turn.
- For example, run the UAT testing strategy for the upgrade from 2.20 to 2.21, confirm it, roll it out, and then allow a cooling-off period before the next upgrade from 2.21 to 2.22.
- This strategy makes a rollback easier if one becomes necessary, and it pinpoints which version introduces a breaking issue for the environment. Note that backward compatibility often depends heavily on how end users design their pipelines.
Administrators should check Operator version compatibility: https://docs.armory.io/armory-enterprise/release-notes/rn-armory-operator/
- For example, customers who are changing their Kubernetes version need to review the Armory Operator release notes for compatibility. For Kubernetes versions >= 1.21, customers should use Operator versions >= 1.7 (https://docs.armory.io/armory-enterprise/release-notes/rn-armory-operator/armory-operator-v1-7-0/#highlighted-updates).
It may be necessary to declare a date and time for a maintenance window for your teams. Running or paused pipelines may prevent Spinnaker from rolling over to the pods that contain the new version of a service.
- Admins should also consider reasonable termination grace periods and set pod disruption budget expectations with their teams.
- Admins can check for running pipelines as described in Leverage Spinnaker API to Pull a List of Running Pipelines, and then stop or cancel them.
- Administrators are advised to manually restart the services BEFORE the upgrade, to prevent pipelines from being triggered by stale artifacts.
- Administrators may also want to look at Quieting their Echo Service. This helps administrators who want to prevent pipelines from triggering during maintenance windows, for example on changes from Pipelines as Code PRs.
- Customers are encouraged to back up their persistent storage and databases before performing any upgrade, in case data needs to be restored to an earlier version.
Plugin updates should also be reviewed when upgrading Spinnaker, as plugins may need changes to work with the new Spinnaker version.
- Teams should review upgrading the Armory Scale Agent to match the new Spinnaker version. Review the following documentation specific to Scale Agent and Agent Plugin versions:
  https://docs.armory.io/scale-agent/release-notes/agent-service/
  https://docs.armory.io/scale-agent/release-notes/agent-plugin/
- Armory Plugins: https://docs.armory.io/armory-enterprise/plugin-guide/
If teams are using custom artifacts as the result of a custom patch, they should review the overrides and check whether they are still necessary.
- For example, if customers have declared the following hotfix:
  artifactId: armory/clouddriver-armory:2022.04.14.18.47.56.release-2.26.x
  it may no longer be necessary when upgrading to 2.27.x or 2.28.x, and it may cause incompatibilities.
- Likewise, customizations within the hotfix may need to be accounted for in the version upgrade.
- Customers should review the release notes and BOM to see whether the artifact is still necessary or whether a new hotfix is needed.
There are several strategies for cutting over a Spinnaker environment and performing the actual upgrade.
- Depending on the complexity of the environment, Cloud Administrators may use a Blue-Green/Red-Black deployment strategy, an upgrade-in-place strategy, or other methodologies.
To help plan upgrades in more detail, we invite customers to open a support case so that we can consult with them and understand their needs for the upgrade.
Notes about creating a UAT environment for testing
If your production environment must be highly available, or the team wants to ensure minimal disruption, teams should consider a UAT testing strategy for Spinnaker and the environment's infrastructure. A UAT environment should replicate the production environment as closely as possible, so that the Spinnaker Admin team can capture any breaking issues before updating production. The following is a generalized strategy for creating and testing a UAT environment.
Spin up a UAT environment running the current in-production version of Spinnaker, with test data.
- The closer a UAT environment is to the production environment, the more accurate the data the Cloud Administration team can gather.
Your UAT environment should not share common backend database resources with production (for example, the Clouddriver, Orca, and Front50 backends). Be careful whenever sharing resources, and consider whether sharing could overwrite or interfere with the production environment.
- If these resources are shared, deployment errors can occur, because the data cached in the backends is used to determine things such as deployment targets.
- Cloud Administrators should consult their DBA about exporting data from the existing database and loading it into the UAT environment to replicate the test data.
- Be careful about sharing persistent storage resources as well, since changes made in the testing environment would modify pipelines and applications in the production environment. Cloud Administrators should consider making a copy of the data.
- Changes to security settings and credentials can often lead to unexpected account lockouts. Common causes include errors when transferring credential information and account policy limits, such as a policy restricting simultaneous sign-ins.
Test and confirm that the UAT environment functions as a copy of production. The Cloud Admin and the developers should test the UAT environment to verify that it functions as expected.
- Test typical and critical pipelines, applications, and processes.
- Ensure that user credentials work and that access is both available and restricted as expected.
Upgrade the UAT environment to the newer Spinnaker version.
- While upgrading, admins should test the upgrade steps along with the upgrade checklist developed by the team.
- The team should note any issues or unexpected errors during the upgrade.
Monitor the upgraded UAT environment. Monitoring Spinnaker is a crucial aspect of running Spinnaker.
- The promotion test should also look for elevated error rates or similar symptoms to pinpoint unforeseen problems.
- Stakeholders should test all critical pipelines in the UAT environment for functionality, access, and capability. This step should include integration testing of critical pipelines and workflows to confirm that their behavior has not changed.
- While the environment is being tested, Spinnaker Admins should monitor the Spinnaker environment for unexpected resource spikes or issues in the logs.
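The rule from the UAT steps above, that a UAT environment must not share backend resources with production, can be checked before the UAT environment is started. This sketch assumes admins have already collected, for each service, the backend endpoint it uses into plain dictionaries. An endpoint might be a SQL connection URL or a Front50 storage bucket, for example. The function names and the example endpoints are hypothetical.

```python
from typing import Dict, List


def normalize(endpoint: str) -> str:
    """Make trivially different spellings of one endpoint compare equal."""
    return endpoint.strip().lower().rstrip("/")


def shared_backends(prod: Dict[str, str], uat: Dict[str, str]) -> List[str]:
    """Return UAT services whose backend endpoint is also used by production.

    Compares against every production endpoint, not just the same service,
    so a UAT service pointed at any production backend is caught.
    """
    prod_endpoints = {normalize(e) for e in prod.values()}
    return sorted(svc for svc, e in uat.items() if normalize(e) in prod_endpoints)


if __name__ == "__main__":
    # Hypothetical endpoints for illustration only.
    prod = {"clouddriver": "jdbc:mysql://prod-db:3306/clouddriver",
            "front50": "s3://prod-front50"}
    uat = {"clouddriver": "jdbc:mysql://uat-db:3306/clouddriver",
           "front50": "s3://prod-front50/"}
    print(shared_backends(prod, uat))  # ['front50']
```

Any service this check reports should get its own copy of the data, as described in the UAT steps, before testing begins.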
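Looking for elevated error rates during the promotion test, as described above, amounts to comparing each service's post-upgrade error rate with a pre-upgrade baseline. This sketch assumes error and request counts have already been exported from your monitoring stack. The 2x ratio and the 1% floor are placeholder thresholds, not Armory recommendations.

```python
from typing import Dict, List, Tuple

Counts = Tuple[int, int]  # (error count, request count) for one service


def error_rate(errors: int, requests: int) -> float:
    return errors / requests if requests else 0.0


def elevated_services(baseline: Dict[str, Counts], current: Dict[str, Counts],
                      ratio: float = 2.0, floor: float = 0.01) -> List[str]:
    """Flag services whose error rate rose above ratio x baseline and floor.

    The defaults are placeholders; tune them to your own traffic.
    """
    flagged = []
    for service, counts in current.items():
        now = error_rate(*counts)
        # A service with no baseline counts is treated as a 0% baseline.
        before = error_rate(*baseline.get(service, (0, 0)))
        if now >= floor and now > before * ratio:
            flagged.append(service)
    return sorted(flagged)


if __name__ == "__main__":
    before = {"orca": (5, 1000), "gate": (10, 1000)}
    after = {"orca": (60, 1000), "gate": (12, 1000), "echo": (30, 500)}
    print(elevated_services(before, after))  # ['echo', 'orca']
```

The floor keeps tiny absolute changes on low-traffic services from being flagged just because their baseline was near zero.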
Once the checklist has been confirmed and the process verified, apply the established procedure to production.
- The Cloud Administration team can release the UAT environment resources once testing is complete, but should keep the environment until it is confirmed that rollback testing is no longer required.