docs: add untaint controller documentation to CNI setup page #17190

Open
nagendrareddy10 wants to merge 18 commits into istio:master from nagendrareddy10:fix/untaint-controller-docs
Conversation

nagendrareddy10 (Member) commented Feb 27, 2026

Description

Adds documentation for the Untaint Controller feature (added in istio/istio#48818) to the CNI node agent installation guide.

The untaint controller prevents a race condition where pods can be scheduled on new nodes (e.g., in autoscaler environments like Karpenter) before istio-cni is ready. It works by having the infrastructure provider place a NoSchedule taint (cni.istio.io/not-ready) on new nodes, and the untaint controller automatically removes it once the CNI agent reports ready.

This addresses active user confusion: the feature exists but has no documentation, which leads to issues for users with Job pods and node autoscalers (see issue comments like this one).
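
The taint-and-untaint flow described above can be modelled in a few lines. This is a minimal sketch, not Istio's implementation: the `Node` class, `reconcile`, and `schedulable` are hypothetical names introduced for illustration, and only the taint key and the `NoSchedule` effect come from the description.

```python
from dataclasses import dataclass, field

# Taint key and effect as named in the PR description.
NOT_READY_TAINT = ("cni.istio.io/not-ready", "NoSchedule")


@dataclass
class Node:
    """Hypothetical stand-in for a Kubernetes node, not the real API object."""
    name: str
    taints: set = field(default_factory=set)
    cni_ready: bool = False  # whether the istio-cni agent on this node reports ready


def reconcile(node: Node) -> bool:
    """Remove the not-ready taint once the CNI agent is ready.

    Returns True if a taint was removed. This model never adds the taint:
    per the description, that is the infrastructure provider's job.
    """
    if node.cni_ready and NOT_READY_TAINT in node.taints:
        node.taints.discard(NOT_READY_TAINT)
        return True
    return False


def schedulable(node: Node) -> bool:
    """A pod without a matching toleration cannot land on a NoSchedule-tainted node."""
    return all(effect != "NoSchedule" for _, effect in node.taints)


# A new node joins, already tainted by the infrastructure provider.
node = Node("new-node", taints={NOT_READY_TAINT})
assert not schedulable(node)      # pods are held back while the CNI agent starts
assert reconcile(node) is False   # agent not ready yet, so the taint stays

node.cni_ready = True
assert reconcile(node) is True    # agent ready, so the taint is removed
assert schedulable(node)
```

The point of the model is the division of responsibility: if the provider never applies the taint, `reconcile` has nothing to remove and the race condition is not mitigated.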

Changes

Added a new "Untaint controller" subsection under "Race condition & mitigation" in content/en/docs/setup/additional-setup/cni/index.md, documenting:

  • What the untaint controller does and when to use it
  • The responsibility of the cluster operator/owner to ensure new nodes are tainted by their infrastructure provider
  • How to enable via IstioOperator and Helm
  • Full configuration reference (values.pilot.taint.enabled and PILOT_ENABLE_NODE_UNTAINT_CONTROLLERS)
  • Relationship to the existing repair mechanism

Reviewers

  • Ambient
  • Docs
  • Installation
  • Networking
  • Performance and Scalability
  • Extensions and Telemetry
  • Security
  • Test and Release
  • User Experience
  • Developer Infrastructure
  • Localization/Translation

Fixes: #15003

Adds a new 'Untaint controller' section to the CNI node agent installation
guide, documenting the feature added in istio/istio#48818.

The untaint controller addresses a race condition where pods can be scheduled
on new nodes before istio-cni is ready, by placing a NoExecute taint
(cni.istio.io/not-ready) on new nodes and removing it once the CNI agent
is ready.

Documents:
- What the untaint controller does and when to use it
- How to enable via IstioOperator and Helm
- Full configuration reference (taint.enabled, PILOT_ENABLE_NODE_UNTAINT_CONTROLLERS, taint.namespace)
- Relationship to the existing repair mechanism

Fixes: istio#15003
@nagendrareddy10 nagendrareddy10 requested a review from a team as a code owner February 27, 2026 08:27
@istio-testing istio-testing added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-ok-to-test labels Feb 27, 2026
istio-testing (Contributor) commented

Hi @nagendrareddy10. Thanks for your PR.

I'm waiting for an Istio member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

dhawton (Member) left a comment

A few breaking changes in here, such as the removal of spaces in the YAMLs.

…controller PR

Per @dhawton's CHANGES_REQUESTED review: the PR inadvertently reformatted
several existing sections, removing indentation from YAML code blocks and
changing list formatting.

Fixes:
- Restore frontmatter 'aliases:' to 4-space indent (was accidentally changed to 2-space)
- Restore prerequisite list markers from '-' back to '*' (original format)
- Restore 'spec:' child indentation in 3 existing IstioOperator YAML blocks:
  * cni_agent_operator_install (components/cni/namespace/enabled)
  * Handling init container injection for revisions (revision/values/pilot/cni)
  * Canary upgrade IstioOperator (profile/components/cni/values), including
    fixing the broken 'excludeNamespaces: - istio-system' back to proper YAML list
- Fix indentation in new untaint controller IstioOperator YAML example

The new 'Untaint controller' subsection content is unchanged.

Arhell (Member) left a comment

/ok-to-test

@istio-testing istio-testing added ok-to-test Set this label allow normal testing to take place for a PR not submitted by an Istio org member. and removed needs-ok-to-test labels Feb 28, 2026
Two build failures addressed:

1. lint_istio.io (MD004): The 'Additional configuration' bullet list used
   '-' markers but the surrounding document uses '*'. Changed back to '*'
   to maintain consistent unordered list style per MD004 rule.

2. gencheck_istio.io: The new IstioOperator YAML code block in the untaint
   controller section was missing 'snip_id=none', causing the snip
   generator to create a new snip in snips.sh that was not committed.
   Added 'snip_id=none' to exclude it from automatic snip generation,
   as this YAML block is illustrative (not a test snippet).
@istio-testing istio-testing added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 28, 2026
…x error

Hugo shortcodes cannot mix positional and named parameters.
'{{< text yaml snip_id=none >}}' mixes positional ('yaml') and
named ('snip_id=none'), causing a Hugo build failure:
  'got named parameter snip_id. Cannot mix named and positional parameters'

Fix by using the named form: '{{< text syntax=yaml snip_id=none >}}'
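
The rule behind this failure can be illustrated with a small checker. This is a rough sketch, not Hugo's parser: `classify_params` is a hypothetical helper that treats any token containing `=` as a named parameter, which is a simplification (a quoted positional value containing `=` would be misclassified).

```python
import shlex


def classify_params(shortcode_args: str) -> str:
    """Return 'named', 'positional', or 'none'; raise ValueError on a mix."""
    tokens = shlex.split(shortcode_args)
    kinds = {"named" if "=" in token else "positional" for token in tokens}
    if len(kinds) > 1:
        raise ValueError("Cannot mix named and positional parameters")
    return kinds.pop() if kinds else "none"


# The fixed form from the commit message uses only named parameters.
assert classify_params("syntax=yaml snip_id=none") == "named"

# The original form mixes a positional 'yaml' with a named 'snip_id=none'.
try:
    classify_params("yaml snip_id=none")
except ValueError as err:
    print(err)
```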
mdspell does not recognise 'untaint' as a valid word, causing 13
spelling errors in the new untaint controller documentation section.
Added 'untaint' to the .spelling whitelist in sorted order.
Two .spelling issues:
1. 'untaint' was placed after 'untar' but alphabetically 'untai' < 'untar'
   ('i' < 'r'), so 'untaint' must come before 'untar'. The gencheck CI
   auto-sorts the file and detected the wrong order.

2. 'Karpenter' (the node autoscaler referenced in the untaint controller
   docs) was missing from the dictionary, causing 2 spelling errors in
   lint. Added in correct sorted position (after Karma's, before katacoda).
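
The ordering argument above can be checked directly. This sketch assumes the dictionary is sorted with a case-insensitive comparison; the exact sort used by the gencheck CI is not stated in this PR, and `spelling_sorted` is a hypothetical helper.

```python
def spelling_sorted(words):
    """Sort dictionary entries case-insensitively (assumed comparison rule)."""
    return sorted(words, key=str.casefold)


# 'untai' sorts before 'untar' because 'i' < 'r'.
assert spelling_sorted(["untar", "untaint"]) == ["untaint", "untar"]

# 'Karpenter' lands after "Karma's" and before 'katacoda'.
assert spelling_sorted(["katacoda", "Karpenter", "Karma's"]) == [
    "Karma's",
    "Karpenter",
    "katacoda",
]
```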
Comment thread content/en/docs/setup/additional-setup/cni/index.md Outdated
Comment thread content/en/docs/setup/additional-setup/cni/index.md Outdated
Comment thread content/en/docs/setup/additional-setup/cni/index.md Outdated

sridhargaddam (Contributor) left a comment

The commit message still mentions that untaint controller will add the taint. Please update the commit message as well.

Comment thread content/en/docs/setup/additional-setup/cni/index.md Outdated

sridhargaddam (Contributor) left a comment

LGTM, thank you for addressing the review comments.

@nagendrareddy10 nagendrareddy10 added the auto-merge Set this label on a PR to auto-merge it on success of presubmit tests label Mar 13, 2026
@dhawton dhawton removed the auto-merge Set this label on a PR to auto-merge it on success of presubmit tests label Mar 13, 2026
@nagendrareddy10 nagendrareddy10 added the auto-merge Set this label on a PR to auto-merge it on success of presubmit tests label Mar 13, 2026
Comment thread content/en/docs/setup/additional-setup/cni/index.md Outdated
Comment thread content/en/docs/setup/additional-setup/cni/index.md Outdated
Comment thread content/en/docs/setup/additional-setup/cni/index.md Outdated
Comment thread content/en/docs/setup/additional-setup/cni/index.md Outdated
Comment thread content/en/docs/setup/additional-setup/cni/index.md Outdated
Comment thread content/en/docs/setup/additional-setup/cni/index.md Outdated
Comment thread .spelling Outdated
@dhawton dhawton removed the auto-merge Set this label on a PR to auto-merge it on success of presubmit tests label Mar 13, 2026
dhawton (Member) commented Mar 13, 2026

Do not set the auto-merge label. It only works for PRs created by our automation. Your PR will not merge until approved by a Docs maintainer.

nagendrareddy10 and others added 7 commits March 13, 2026 17:44
Co-authored-by: Daniel Hawton <daniel.hawton@solo.io>
Co-authored-by: Daniel Hawton <daniel.hawton@solo.io>
Co-authored-by: Daniel Hawton <daniel.hawton@solo.io>
Co-authored-by: Daniel Hawton <daniel.hawton@solo.io>
Co-authored-by: Daniel Hawton <daniel.hawton@solo.io>
Co-authored-by: Daniel Hawton <daniel.hawton@solo.io>
Co-authored-by: Daniel Hawton <daniel.hawton@solo.io>
@nagendrareddy10 nagendrareddy10 requested a review from dhawton March 13, 2026 12:18
@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label Mar 20, 2026
@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Mar 20, 2026
nagendrareddy10 (Member, Author) commented

@dhawton Addressed the comments. Please review the PR.

Labels

kind/docs ok-to-test Set this label allow normal testing to take place for a PR not submitted by an Istio org member. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

Untaint controller needs documentation

6 participants