This section describes how to view the monitoring details of a single cluster.

Prerequisites

  • You need to have the platform-admin role on the KubeSphere platform. For more information, refer to Users and Platform Roles.

  • The host cluster and the member clusters to be monitored need to have WizTelemetry Global Monitoring enabled.

    Note

    If a member cluster does not have WizTelemetry Global Monitoring enabled, no monitoring data can be retrieved from that member cluster.

Steps

  1. Log in to the KubeSphere web console as a user who has the platform-admin role.

  2. In the upper right corner of the page, click the grid icon and select WizTelemetry Observability Platform.

  3. Click Global Monitoring > Clusters in the left navigation pane.

  4. In the cluster list, click the name of a cluster to open its details page.

  5. In the top area of the details page, view the basic information of the current cluster.

    • Click Manage Cluster to go to the management page of the current cluster.

    • Click the chevron-down icon to hide this area.

  6. Under the Overview tab of the details page, view the overview information of the cluster.

    Area Description

    Resource Statistics

    The number and status of nodes in the current cluster, and the number of created workspaces, projects, Deployments, StatefulSets, DaemonSets, Jobs, Services, and pods.

    Real-time Resource Usage

    The real-time usage and total amount of CPU, memory, and disk for all nodes in the current cluster.

    Click the corresponding area to view the real-time usage percentage of that resource.

    Cluster Quota Statistics

    The CPU and memory quotas for containers and projects in the current cluster, including reserved amount, limit amount, and total amount.

    Cluster Alerts

    The number of alerts generated by global alert rule groups in the current cluster. Alerts displayed here do not include those generated by cluster and project alert rule groups. Global alert rule groups are managed by platform administrators in Global Alerting.

    Alert severity types include Info, Warning, Important, and Critical.

    Alert status types include:

    • Verifying: The monitoring metrics meet the preset conditions but have not yet satisfied the preset duration.

    • Triggered: The monitoring metrics meet the preset conditions and have satisfied the preset duration.
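
The difference between the two alert statuses above can be sketched in a few lines. This is only an illustration of the rule as described here, not the platform's implementation: the function name, its parameters, and the extra "Inactive" value (for a condition that is not met at all) are hypothetical.

```python
from typing import Optional


def alert_status(condition_met_since: Optional[float],
                 now: float,
                 duration_seconds: float) -> str:
    """Return the status of an alert at time `now` (all times in seconds).

    condition_met_since: when the metric started meeting the preset
    condition, or None if it does not currently meet it (assumed input).
    duration_seconds: the preset duration of the alert rule.
    """
    if condition_met_since is None:
        # Hypothetical label; the text above only names the two statuses below.
        return "Inactive"
    elapsed = now - condition_met_since
    if elapsed < duration_seconds:
        # Condition met, but not yet for the full preset duration.
        return "Verifying"
    # Condition met continuously for at least the preset duration.
    return "Triggered"
```

For example, with a preset duration of 60 seconds, a condition that has held for 30 seconds yields Verifying, and one that has held for 60 seconds or longer yields Triggered.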

    Pods

    The number of various types of pods in the current cluster.

    Pod status types include:

    • Pending: The pod has been accepted by the system, but at least one container has not been created or is not running. In this state, the pod may be waiting to be scheduled or waiting for the container image to be downloaded.

    • Running: The pod has been assigned to a node, all containers in the pod have been created, and at least one container is running, starting, or restarting.

    • Succeeded: All containers in the pod have terminated successfully (terminated with exit code 0) and will not be restarted.

    • Failed: All containers in the pod have terminated, and at least one container terminated in failure, that is, it exited with a non-zero exit code or was terminated by the system.

    • Unknown: The system is unable to retrieve the pod’s status. This state typically occurs due to communication failure between the system and the host where the pod is located.

    Pod QoS (Quality of Service) types include:

    • Guaranteed: Every container in the pod has memory limits, memory requests, CPU limits, and CPU requests set, with the memory limit equal to the memory request and the CPU limit equal to the CPU request.

    • Burstable: The pod does not meet the requirements for the Guaranteed type, but at least one container in the pod has a memory or CPU request or limit set.

    • BestEffort: The containers in the pod are not configured with any memory limits, memory requests, CPU limits, or CPU requests.

    The QoS type of a pod determines its priority when resources run short. When system resources are insufficient to run all pods, the system first keeps Guaranteed pods running, then Burstable pods, and lastly BestEffort pods.
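
The QoS rules above can be expressed as a small classification function. This is a minimal sketch under stated assumptions, not the logic Kubernetes or the platform actually runs: the function name and the input shape (one dictionary per container with optional "requests" and "limits" entries) are hypothetical, quantities are compared as plain strings (so "500m" and "0.5" would count as different), and defaulting behavior applied by the cluster is ignored.

```python
from typing import Dict, List

# Assumed input shape: {"requests": {"cpu": ..., "memory": ...},
#                       "limits":   {"cpu": ..., "memory": ...}}
ContainerResources = Dict[str, Dict[str, str]]


def qos_class(containers: List[ContainerResources]) -> str:
    """Classify a pod from the CPU and memory settings of its containers."""
    any_value_set = False
    all_guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for resource in ("cpu", "memory"):
            request = requests.get(resource)
            limit = limits.get(resource)
            if request is not None or limit is not None:
                any_value_set = True
            # Guaranteed needs both values present and equal to each other.
            if request is None or limit is None or request != limit:
                all_guaranteed = False
    if not any_value_set:
        return "BestEffort"
    if all_guaranteed:
        return "Guaranteed"
    return "Burstable"
```

A pod whose only container sets a CPU request and nothing else is classified as Burstable, while a pod whose containers set no requests or limits at all is classified as BestEffort.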

    Number of Pods Terminated and Restarted Due to OOM: The number of pods that were forcibly terminated and automatically restarted by the system because they ran out of memory (OOM, Out Of Memory).

    Number of Pending Pods: The number of pods that have been created but cannot start due to insufficient resources or scheduling issues.

    Number of Restarted Pods: The number of pods that have been automatically restarted due to failures or configuration changes.

  7. Click the Nodes, Workspaces, or Projects tab on the details page to view monitoring information for all nodes, workspaces, or projects in the cluster.

    • Click the drop-down list above the list to select the sort field and sort order.

    • Click the search box above the list and enter a keyword to search for objects by name.

    • Click the refresh icon in the upper right corner of the list to refresh the list information.

    • Click the cogwheel icon in the upper right corner of the list to customize the information displayed in the list.

  8. Click the Monitoring tab on the details page to view detailed monitoring metrics for a specified time range.

    • Click the timed-task icon in the upper right corner to set the time range.

    • Click the start/pause icon in the upper right corner to enable or disable real-time data refresh.

    • Click the refresh icon in the upper right corner to refresh the data manually.