View Node Monitoring Details
This section describes how to view the monitoring details of a single node.
Prerequisites
You need to have the platform-admin role on the KubeSphere platform. For more information, see Users and Platform Roles.
Steps
- Log in to the KubeSphere web console as a user who has the platform-admin role.
- In the upper-right corner of the page, click the icon and select WizTelemetry Observability Platform.
- In the left navigation pane, click Global Monitoring > Nodes.
- In the node list, click the name of a node to open its details page.
- In the upper area of the details page, view the basic information of the current node.
  - Click Manage Node to go to the node details page.
  - Click the icon to hide this area.
- Under the Overview tab of the details page, view the overview information of the node, which is organized into the following areas.
Node Health
Displays the node's scheduling status, ready status, network availability, memory pressure, disk pressure, and process pressure. An icon next to each item indicates whether the status is normal or an alert is present.
  - Scheduling Status: Whether new pods can be scheduled to the node.
  - Ready Status: Whether the node is healthy and ready to run pods.
  - Network Availability: Whether the node's network is correctly configured.
  - Memory Pressure: Whether the node's available memory is below the threshold. The default threshold is 100 MiB.
  - Disk Pressure: Whether the node's available disk space or number of free inodes is below the threshold. The default thresholds are 10% of the total disk space and 5% of the total number of inodes.
  - Process Pressure: Whether the number of processes that can still be created on the node is below the threshold. A newly installed KubeSphere cluster has no process count threshold set by default.
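The default memory and disk thresholds above match the kubelet's default hard eviction thresholds on Linux (memory.available<100Mi, nodefs.available<10%, nodefs.inodesFree<5%), and the kubelet sets no default for pid.available. The following Python sketch illustrates how such thresholds turn node statistics into pressure flags. The NodeStats fields and the pressure_flags function are hypothetical; this is not how KubeSphere or the kubelet actually computes these conditions.

```python
from dataclasses import dataclass
from typing import Optional

MIB = 1024 * 1024


@dataclass
class NodeStats:
    # Hypothetical snapshot of node statistics; field names are illustrative.
    memory_available_bytes: int
    disk_total_bytes: int
    disk_available_bytes: int
    inodes_total: int
    inodes_free: int
    pids_available: int


def pressure_flags(stats: NodeStats,
                   memory_threshold_bytes: int = 100 * MIB,
                   disk_threshold_ratio: float = 0.10,
                   inode_threshold_ratio: float = 0.05,
                   pid_threshold: Optional[int] = None) -> dict:
    """Return True for each pressure condition whose threshold is crossed."""
    memory = stats.memory_available_bytes < memory_threshold_bytes
    # Disk pressure is reported if either free space or free inodes run low.
    disk = (stats.disk_available_bytes < disk_threshold_ratio * stats.disk_total_bytes
            or stats.inodes_free < inode_threshold_ratio * stats.inodes_total)
    # With no PID threshold configured (the default), process pressure is never reported.
    pid = pid_threshold is not None and stats.pids_available < pid_threshold
    return {"MemoryPressure": memory, "DiskPressure": disk, "PIDPressure": pid}


stats = NodeStats(memory_available_bytes=80 * MIB,
                  disk_total_bytes=100 * 1024**3,
                  disk_available_bytes=20 * 1024**3,
                  inodes_total=1_000_000,
                  inodes_free=30_000,
                  pids_available=30_000)
print(pressure_flags(stats))
# {'MemoryPressure': True, 'DiskPressure': True, 'PIDPressure': False}
```

In this example, disk space is fine (20% free) but free inodes are at 3%, so disk pressure is still reported.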
Real-time Resource Usage
The real-time usage and total capacity of CPU, memory, and disk on the current node.
Click an area to view the real-time usage percentage of that resource.
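The usage percentage is the current usage divided by the total amount. Kubernetes expresses these amounts as quantity strings such as 1500m (1.5 CPU cores) or 16Gi, so the following Python sketch converts them to numbers first. It handles only a subset of the Kubernetes quantity suffixes, and the function names and sample values are hypothetical.

```python
from decimal import Decimal

# A subset of Kubernetes quantity suffixes: binary multiples (Ki, Mi, Gi, Ti),
# decimal multiples (k, M, G, T), and milli (m).
SUFFIXES = {
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30, "Ti": Decimal(2) ** 40,
    "k": Decimal(10) ** 3, "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9, "T": Decimal(10) ** 12,
    "m": Decimal("0.001"),
}


def parse_quantity(quantity: str) -> Decimal:
    """Convert a quantity string such as '1500m' or '16Gi' to a plain number."""
    for suffix, multiplier in SUFFIXES.items():
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * multiplier
    return Decimal(quantity)  # no suffix: a plain number such as "4"


def usage_percent(used: str, total: str) -> float:
    """Usage as a percentage of the total amount."""
    return float(parse_quantity(used) / parse_quantity(total) * 100)


print(usage_percent("1500m", "4"))   # CPU: 1.5 of 4 cores -> 37.5
print(usage_percent("12Gi", "16Gi"))  # memory -> 75.0
```

Decimal is used so that milli-units and binary multiples are converted without floating-point rounding before the final division.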
Node Quota Statistics
The CPU, memory, and ephemeral storage quotas of the current node, including the reserved, limit, and total amounts.
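A minimal sketch of how such quota figures can be derived, assuming that the reserved amount is the sum of the pods' resource requests, the limit amount is the sum of their limits, and the total amount is the node's allocatable capacity. These mappings, the quota_summary function, and the sample CPU values are assumptions for illustration, not the console's documented formula.

```python
from typing import Dict, List


def quota_summary(pods: List[List[Dict[str, float]]], allocatable: float) -> Dict[str, float]:
    """Sum container requests and limits across pods and compare them with allocatable.

    Each pod is a list of containers; each container is a dict with optional
    'request' and 'limit' keys (here in CPU cores). A missing value counts as 0,
    which understates exposure for containers that have no limit at all.
    """
    requests = sum(c.get("request", 0.0) for pod in pods for c in pod)
    limits = sum(c.get("limit", 0.0) for pod in pods for c in pod)
    return {
        "requests": requests,
        "limits": limits,
        "total": allocatable,
        "requests_pct": 100 * requests / allocatable,
        "limits_pct": 100 * limits / allocatable,
    }


pods = [
    [{"request": 0.5, "limit": 1.0}],                      # single-container pod
    [{"request": 0.25, "limit": 0.5}, {"request": 0.25}],  # second container has no limit
]
summary = quota_summary(pods, allocatable=4.0)
print(summary["requests"], summary["limits"], summary["requests_pct"], summary["limits_pct"])
# 1.0 1.5 25.0 37.5
```

Because limits can be overcommitted, the limit percentage can exceed 100%, whereas the scheduler places pods based on requests, so the request percentage normally stays at or below 100%.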
Pods
The number of pods on the current node, broken down by status and QoS type.
Pod status types include:
  - Pending: The pod has been accepted by the system, but at least one container has not been created or is not running. In this state, the pod may be waiting to be scheduled or waiting for the container image to be downloaded.
  - Running: The pod has been assigned to a node, all containers in the pod have been created, and at least one container is running, starting, or restarting.
  - Succeeded: All containers in the pod have terminated successfully (with exit code 0) and will not be restarted.
  - Failed: All containers in the pod have terminated, and at least one container terminated with a non-zero exit code.
  - Unknown: The system is unable to retrieve the pod's status. This is typically caused by a communication failure between the system and the host where the pod is located.
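Each pod reports one of these values in its status.phase field. The following Python sketch groups a hypothetical list of pods, represented as dictionaries that mirror the shape of the Kubernetes pod object, by phase. It is an illustration, not the console's implementation.

```python
from collections import Counter

PHASES = ("Pending", "Running", "Succeeded", "Failed", "Unknown")


def count_by_phase(pods):
    """Count pods per phase; phases with no pods are reported as 0."""
    counts = Counter(p["status"]["phase"] for p in pods)
    return {phase: counts.get(phase, 0) for phase in PHASES}


pods = [
    {"metadata": {"name": "web-1"}, "status": {"phase": "Running"}},
    {"metadata": {"name": "web-2"}, "status": {"phase": "Running"}},
    {"metadata": {"name": "job-1"}, "status": {"phase": "Succeeded"}},
    {"metadata": {"name": "db-0"}, "status": {"phase": "Pending"}},
]
print(count_by_phase(pods))
# {'Pending': 1, 'Running': 2, 'Succeeded': 1, 'Failed': 0, 'Unknown': 0}
```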
Pod QoS (Quality of Service) types include:
  - Guaranteed: Every container in the pod has CPU and memory requests and limits set, and each limit equals the corresponding request.
  - Burstable: The pod does not meet the Guaranteed criteria, and at least one container has a CPU or memory request or limit set.
  - BestEffort: No container in the pod has any CPU or memory requests or limits set.
The QoS type of a pod determines its runtime priority. When node resources are insufficient to run all pods, the system gives priority to Guaranteed pods, then Burstable pods, and finally BestEffort pods.
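The QoS rules above can be expressed as a short classification function. In this hypothetical Python sketch, CPU and memory quantities are assumed to be already converted to numbers (cores and bytes). It also applies the Kubernetes behavior of defaulting a missing request to the limit when only a limit is set. Kubernetes evaluates init containers as well, so include them in the list to mirror that behavior.

```python
from typing import Dict, List

# A container is modelled as {"requests": {...}, "limits": {...}}, with CPU in
# cores and memory in bytes (a hypothetical, pre-parsed format).
Container = Dict[str, Dict[str, float]]
RESOURCES = ("cpu", "memory")


def qos_class(containers: List[Container]) -> str:
    """Classify a pod as Guaranteed, Burstable, or BestEffort."""
    any_set = False
    guaranteed = True
    for c in containers:
        limits = {r: v for r, v in c.get("limits", {}).items() if r in RESOURCES}
        explicit = {r: v for r, v in c.get("requests", {}).items() if r in RESOURCES}
        # Kubernetes defaults a missing request to the limit when only the limit is set.
        requests = {**limits, **explicit}
        if requests:  # requests already includes every key present in limits
            any_set = True
        for r in RESOURCES:
            # Guaranteed needs both a limit and an equal request for CPU and memory.
            if r not in limits or requests[r] != limits[r]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


GiB = 1024 ** 3
print(qos_class([{"requests": {"cpu": 0.5, "memory": GiB}, "limits": {"cpu": 0.5, "memory": GiB}}]))  # Guaranteed
print(qos_class([{"limits": {"cpu": 1.0, "memory": GiB}}]))  # Guaranteed: requests default to limits
print(qos_class([{"requests": {"memory": GiB // 2}}]))       # Burstable
print(qos_class([{}]))                                        # BestEffort
```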
Number of Pods Terminated and Restarted Due to OOM: The number of pods that the system forcibly terminated and automatically restarted because of insufficient memory (out of memory, OOM), for example when a container exceeds its memory limit.
Number of Pending Pods: The number of pods that have been created but cannot start due to insufficient resources or scheduling issues.
Number of Restarted Pods: The number of pods that have been automatically restarted due to failures or configuration changes.
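How KubeSphere computes these counters is not described in this section. As a hypothetical approximation, they can be derived from standard pod status fields: a container whose lastState.terminated.reason is OOMKilled, a pod whose status.phase is Pending, and a container whose restartCount is greater than 0. The pod_counters function and the sample data below are illustrative only.

```python
def pod_counters(pods):
    """Approximate the three counters from pod status fields (hypothetical mapping)."""
    oom, pending, restarted = 0, 0, 0
    for pod in pods:
        status = pod.get("status", {})
        statuses = status.get("containerStatuses", [])
        if status.get("phase") == "Pending":
            pending += 1
        if any(s.get("restartCount", 0) > 0 for s in statuses):
            restarted += 1
        # lastState records only the most recent termination, so older OOM kills are missed.
        if any(s.get("lastState", {}).get("terminated", {}).get("reason") == "OOMKilled"
               for s in statuses):
            oom += 1
    return {"oom_restarted": oom, "pending": pending, "restarted": restarted}


pods = [
    {"status": {"phase": "Running", "containerStatuses": [
        {"restartCount": 2, "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}}]}},
    {"status": {"phase": "Running", "containerStatuses": [
        {"restartCount": 1, "lastState": {"terminated": {"reason": "Error", "exitCode": 1}}}]}},
    {"status": {"phase": "Pending", "containerStatuses": []}},
]
print(pod_counters(pods))
# {'oom_restarted': 1, 'pending': 1, 'restarted': 2}
```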
Kubelet Health Status
  - Pod Startup Latency: The time required for a pod to go from creation to the running state.
  - PLEG Relist Duration: The time taken by the kubelet's Pod Lifecycle Event Generator (PLEG) to relist pods and containers from the container runtime and detect state changes.
  - Runtime Operator Duration: The time taken by the container runtime to perform operations, such as starting or stopping containers.
  - Storage Operator Duration: The time taken by the kubelet to perform storage operations, such as mounting volumes.
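The kubelet exposes durations like these as Prometheus histograms, for example kubelet_pod_start_duration_seconds, kubelet_pleg_relist_duration_seconds, kubelet_runtime_operations_duration_seconds, and storage_operation_duration_seconds. Whether the console charts these exact metrics, and at which quantile, is an assumption here. The following Python sketch shows how a quantile such as p99 is estimated from cumulative histogram buckets by linear interpolation, in the spirit of PromQL's histogram_quantile(); it is a simplified version that ignores edge cases the real function handles, and the bucket data is hypothetical.

```python
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with +Inf, as in Prometheus.
    Inside the bucket that contains the target rank, the value is linearly
    interpolated between that bucket's lower and upper bounds.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if upper_bound == float("inf"):
                # Cannot interpolate into +Inf; fall back to the last finite bound.
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical PLEG relist durations in seconds: cumulative counts per "le" bucket.
buckets = [(0.005, 0), (0.01, 40), (0.025, 90), (0.05, 98), (0.1, 100), (float("inf"), 100)]
print(round(histogram_quantile(0.5, buckets), 4))   # 0.013
print(round(histogram_quantile(0.99, buckets), 4))  # 0.075
```

The estimate is only as precise as the bucket layout: any value inside a bucket is assumed to be spread evenly between its bounds.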
- Click the Pods tab on the details page to view monitoring information for all pods on the node.
  - Click the drop-down list above the list to select the sort field and sort order.
  - Enter a keyword in the search box above the list to search for pods by name.
  - Click the refresh icon in the upper-right corner of the list to refresh the list.
  - Click the icon in the upper-right corner of the list to customize the information displayed in the list.
- Click the Monitoring tab on the details page to view monitoring metrics for the node over a specified time range.
  - Click the icon in the upper-right corner to set the time range.
  - Click the corresponding icon in the upper-right corner to enable or disable real-time data refresh.
  - Click the refresh icon in the upper-right corner to refresh the data manually.