O&M monitoring

This section helps users solve problems related to product operation and maintenance, data migration, alerting, and monitoring.

How are Licenses calculated?

Licenses on the platform fall into two categories by usage scenario: private cloud licenses and cloud management licenses.

Private cloud licenses are calculated by the number of CPUs in the hosts, and cloud management licenses are calculated by the number of servers.

  • Number of CPUs: the total number of CPUs (sockets) of all servers that are enabled in the infrastructure. For example, if a four-socket x86 server has two CPUs installed and is enabled in the infrastructure, it uses 2 Licenses.
  • Number of servers: the total number of servers on all public cloud platforms managed by the cloud management platform.
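To make the CPU rule concrete, the following shell sketch sums the sockets of enabled hosts. It is an illustration only, not the platform's own accounting: the input format (one "name status sockets" line per host) and the function name count_cpu_licenses are hypothetical.

```shell
# Hypothetical helper: count CPU-based Licenses from a host inventory.
# Input on stdin, one host per line: <name> <status> <sockets>
# Only hosts whose status is "enabled" are counted; "+ 0" makes an
# empty inventory print 0 instead of an empty line.
count_cpu_licenses() {
    awk '$2 == "enabled" { total += $3 } END { print total + 0 }'
}

# Example:
# printf 'host01 enabled 2\nhost02 disabled 4\n' | count_cpu_licenses   # prints 2
```

The disabled host contributes nothing, matching the rule that only servers enabled in the infrastructure consume Licenses.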

Host service issues

After the Host service is installed, it is disabled by default and must be enabled before it can be used. A host can be enabled in either of the following ways.

  • Enable the host in the host list of the cloud management platform.

  • Enable the host with the climc command on the control node.

    $ climc host-enable <host-id>
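If several hosts need to be enabled at once, the climc command can be wrapped in a loop. A minimal sketch, assuming climc is on the control node's PATH and that host IDs are passed as arguments; enable_hosts is a hypothetical helper name.

```shell
# Hypothetical helper: enable each host ID given as an argument via climc.
# Stops at the first failure and reports which host could not be enabled.
enable_hosts() {
    for host_id in "$@"; do
        if ! climc host-enable "$host_id"; then
            echo "failed to enable host: $host_id" >&2
            return 1
        fi
    done
}

# Example (on the control node):
# enable_hosts host01 host02
```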
    

Why does the Host service go offline?

The region’s HostPingDetectionTask sets a host service to offline if it has not received a ping from it for more than 3 minutes, and sets the status of the servers on that host to unknown.
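The offline rule amounts to a timestamp comparison. The sketch below only illustrates the 3-minute threshold described above; it is not the region service's actual code, and host_ping_state with its epoch-seconds arguments is hypothetical.

```shell
# Hypothetical illustration of the HostPingDetectionTask rule:
# a host whose last ping is more than 180 seconds (3 minutes) old is offline.
# Arguments: <last_ping_epoch_seconds> <now_epoch_seconds>
host_ping_state() {
    if [ $(( $2 - $1 )) -gt 180 ]; then
        echo offline
    else
        echo online
    fi
}

# Example: host_ping_state 1000 1200   # prints "offline" (200 s > 180 s)
```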

The host service fails to start and reports the error “Fail to get network info: no networks”.

This problem is usually caused by no network being registered for the host. Create an IP subnet for the host in the cloud management platform, or create a network with the climc command on the control node, for example:

$ climc network-create bcast0 host02 10.168.222.226 10.168.222.226 24 --gateway 10.168.222.1
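Before running network-create, it can help to confirm that the gateway lies in the same subnet as the host address for the given prefix length (24 in the example above). A minimal POSIX shell sketch; ip_to_int and same_subnet are hypothetical helpers, not part of climc.

```shell
# Hypothetical helper: convert a dotted IPv4 address to a 32-bit integer.
# printf "%.0f" avoids awk printing large values in exponent notation.
ip_to_int() {
    echo "$1" | awk -F. '{ printf "%.0f\n", (($1 * 256 + $2) * 256 + $3) * 256 + $4 }'
}

# Hypothetical helper: succeed if two IPv4 addresses share the same subnet.
# Arguments: <ip_a> <ip_b> <prefix_length>
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    # Number of addresses covered by the host bits, e.g. 256 for /24.
    block=$(( 1 << (32 - $3) ))
    # Same network part means same quotient when dividing by the block size.
    [ $(( a / block )) -eq $(( b / block )) ]
}

# Example:
# same_subnet 10.168.222.226 10.168.222.1 24 && echo "gateway OK"
```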

How to solve the problem of qemu version mismatch?

  • Symptom: When starting a server, you may encounter an error message similar to the following.

    uses a qcow2 feature which is not supported by this qemu version: QCOW version 3
    
  • Cause: The qcow2 versions are inconsistent. The image was created with a newer version of qemu-img (qcow2 compat=1.1, i.e. QCOW version 3), but the qemu on the host is an older version that does not support this format.

  • Solution: Use the newer qemu-img to convert the image to the older compat=0.10 format (QCOW version 2), for example with the following command, and re-add the image after the conversion completes.

    $ qemu-img convert -o compat=0.10 -f qcow2 -O qcow2 centos6-cloud-init.qcow2 centos-st-ssh-key.qcow2
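To check whether an image needs this conversion, the compat field reported by qemu-img info can be inspected: compat 1.1 corresponds to QCOW version 3 and compat 0.10 to QCOW version 2. A sketch that parses the human-readable qemu-img info output; qcow2_compat and needs_compat_conversion are hypothetical helpers, and the exact output layout may vary between qemu versions.

```shell
# Hypothetical helper: read `qemu-img info <image>` output on stdin and print
# the qcow2 compat level from the "compat:" line under the format details.
qcow2_compat() {
    awk -F': *' '$1 ~ /^ *compat$/ { print $2; exit }'
}

# Hypothetical helper: succeed if the image uses compat 1.1 (QCOW version 3)
# and therefore needs converting to compat=0.10 for an older qemu.
needs_compat_conversion() {
    [ "$(qcow2_compat)" = "1.1" ]
}

# Example:
# qemu-img info centos6-cloud-init.qcow2 | needs_compat_conversion && \
#   qemu-img convert -o compat=0.10 -f qcow2 -O qcow2 centos6-cloud-init.qcow2 centos-st-ssh-key.qcow2
```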
    

How to migrate servers from other KVM platforms to the system?

  1. Export the image of the server (.qcow2 file) via libvirt.
  2. Upload the image to an HTTP server.
  3. Import the image into the OneCloud image service.
  4. Create a server from the image.
  5. If the original server has data disks (cloud drives) attached, migrate them in either of the following ways.
    • As with the system disk, generate an image of the data disk and import it in the same way, use this image to create a data disk in OneCloud, and then attach the disk to the server.
    • Create a disk of the same size in OneCloud, find its storage path, copy the data of the original disk directly to that path, and finally attach the disk to the server.
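For step 1, the disk files of a libvirt server can be located with virsh domblklist. The sketch below, a hypothetical helper named list_domain_disks, filters that command's output down to file-backed disks; it assumes the standard two-line table header, and the server should be shut down before its files are copied so the images are consistent.

```shell
# Hypothetical helper: list the disk image files of a libvirt domain.
# Reads `virsh domblklist <domain>` output on stdin, skips the two header
# lines, and prints "<target> <source>" only for disks backed by a file path
# (entries such as an empty CD-ROM, shown as "-", are skipped).
list_domain_disks() {
    awk 'NR > 2 && $2 ~ /^\// { print $1, $2 }'
}

# Example (on the source KVM host, with the server shut down):
# virsh domblklist myvm | list_domain_disks
```

Typically the first disk (for example vda) is the system disk handled by steps 1 to 4, and any further disks are the data disks covered by step 5.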

How to check if a server or host supports hardware virtualization?

Run the command egrep "vmx|svm" /proc/cpuinfo in a terminal. If it produces any output, hardware virtualization is supported. This command cannot be used on ESXi hosts or Windows systems.
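The two flags differ by vendor: vmx indicates Intel VT-x and svm indicates AMD-V. A small sketch that reports which one is present; virt_support is a hypothetical helper, and its optional file argument exists only so it can be tried against a saved copy of cpuinfo.

```shell
# Hypothetical helper: report which hardware virtualization extension the CPU
# advertises. Argument: path to a cpuinfo file (defaults to /proc/cpuinfo).
# Returns non-zero when neither flag is present.
virt_support() {
    cpuinfo=${1:-/proc/cpuinfo}
    if grep -qw vmx "$cpuinfo"; then
        echo "Intel VT-x"
    elif grep -qw svm "$cpuinfo"; then
        echo "AMD-V"
    else
        echo "none"
        return 1
    fi
}

# Example: virt_support
```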

How can server services be restored automatically after the server room is powered down for maintenance?

When the server room must be powered off for maintenance, no configuration is needed on the OneCloud platform: once power is restored and the servers are powered on, their services are restored automatically.

How can I view the system components and their logs?

Because OneCloud is deployed in containers, you can use Kubernetes commands to view the platform's system components, their logs, and so on.

View running component pods

System components run as k8s pods. Use the following commands to view the system components of the OneCloud platform, their running status, and other details.

# -n specifies the namespace; the services are currently deployed in the onecloud namespace. List the pods of all components
$ kubectl get pods -n onecloud
# -o wide shows more details about the pods, such as the node each one runs on
$ kubectl get pods -n onecloud -o wide
# View the details of a specific pod, e.g. the pod of the region component
$ kubectl describe pods -n onecloud default-region-759b4bff4c-hpmdd
# View all pods running on a specific host
$ kubectl get pods -n onecloud -o wide --field-selector=spec.nodeName=<host-name>
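To spot problem components quickly, the output of kubectl get pods can be filtered for pods that are not Running or Completed, or that are Running but not fully ready. A sketch assuming the default column order (NAME, READY, STATUS, ...); unhealthy_pods is a hypothetical helper.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print
# NAME, READY and STATUS for pods that need attention:
#   - STATUS is neither Running nor Completed, or
#   - STATUS is Running but fewer containers are ready than exist (e.g. 0/1).
unhealthy_pods() {
    awk 'NR > 1 {
        split($2, ready, "/")
        if (($3 != "Running" && $3 != "Completed") || ($3 == "Running" && ready[1] != ready[2]))
            print $1, $2, $3
    }'
}

# Example:
# kubectl get pods -n onecloud | unhealthy_pods
```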

Restart the service

On a Kubernetes cluster, most component pods are managed by Deployments, which automatically recreate a pod when it is deleted. To restart a component service, you can therefore simply delete its pods.

# Restart the web service, e.g. delete the web front-end pod
$ kubectl delete pods $web_pod_name -n onecloud
# Restart the host service, e.g. delete all host pods
$ kubectl get pods -n onecloud -o wide | grep default-host | awk '{print $1}' | xargs kubectl delete pods -n onecloud
# Restart all services; the names of all service pods start with default
$ kubectl get pods -n onecloud | grep default | awk '{print $1}' | xargs kubectl delete pods -n onecloud
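As an alternative to deleting pods, kubectl rollout restart (available since kubectl 1.15) performs a rolling restart of a Deployment. The sketch assumes the component's pods come from a Deployment and follow the usual <deployment>-<replicaset-hash>-<suffix> naming, as default-region-759b4bff4c-hpmdd does; pods created by other controllers, such as DaemonSets, are not handled. deployment_from_pod and restart_component are hypothetical helpers.

```shell
# Hypothetical helper: derive the Deployment name from a pod name by removing
# the trailing "-<replicaset-hash>-<random-suffix>",
# e.g. default-region-759b4bff4c-hpmdd -> default-region.
deployment_from_pod() {
    echo "$1" | sed 's/-[a-z0-9]*-[a-z0-9]*$//'
}

# Hypothetical helper: restart a component through its Deployment instead of
# deleting its pods by hand.
restart_component() {
    kubectl rollout restart "deployment/$(deployment_from_pod "$1")" -n onecloud
}

# Example:
# restart_component default-region-759b4bff4c-hpmdd
```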

Update the service configuration and restart the service

Every OneCloud component service has a corresponding ConfigMap that stores its configuration. When the configuration needs to change, update the ConfigMap and apply it with the following steps.

# Take the region service as an example and edit its ConfigMap
$ kubectl edit configmaps default-region -n onecloud
# After saving the changes, delete the pods of the corresponding service so the new configuration takes effect
$ kubectl get pods -n onecloud | grep region
$ kubectl delete pods $region_pod_name -n onecloud
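Note that grep region matches every pod whose name contains region, which may include pods of other components (for example, a pod named default-region-dns-<hash>-<suffix>, if present). A sketch that selects only the pods of one Deployment by exact name; component_pods is a hypothetical helper and assumes the usual <deployment>-<hash>-<suffix> pod naming.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print the
# names of pods whose name is exactly <deployment>-<hash>-<suffix>, so that
# "default-region" does not also match "default-region-dns-...".
# Argument: Deployment name, e.g. default-region.
component_pods() {
    awk -v name="$1" 'NR > 1 && $1 ~ ("^" name "-[a-z0-9]+-[a-z0-9]+$") { print $1 }'
}

# Example: after editing the ConfigMap, recreate only the region pods
# (-r is a GNU xargs option that skips kubectl when nothing matches)
# kubectl get pods -n onecloud | component_pods default-region | xargs -r kubectl delete pods -n onecloud
```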

Viewing service logs

The following takes the region component as an example to show how to view a component's log information.

# First find the pod where the region service runs
$ kubectl get pods -n onecloud | grep region
# View the logs of the region service container. -f (follow) streams the log continuously, similar to journalctl -f; --since 5m shows only the last 5 minutes. Press CTRL+C to stop the output
$ kubectl logs -n onecloud $region_pod_name -f --since 5m
# Save all region container logs from the last 5 minutes to region.log
$ kubectl logs -n onecloud $region_pod_name --since 5m > region.log
# Some services run two containers; for example, the host service has containers named host and host-image. Use -c to specify which container's logs to view
$ kubectl logs -n onecloud $host_pod_name -c host-image -f
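When a pod has several containers, their logs can be saved in one pass. A sketch, assuming the container names are already known; they can be listed with kubectl get pod $host_pod_name -n onecloud -o jsonpath='{.spec.containers[*].name}'. save_pod_logs is a hypothetical helper, and it writes one <pod>-<container>.log file per container to the current directory.

```shell
# Hypothetical helper: save the recent logs of each named container in a pod
# to <pod>-<container>.log in the current directory.
# Arguments: <pod_name> <since, e.g. 5m> <container>...
save_pod_logs() {
    pod=$1
    since=$2
    shift 2
    for container in "$@"; do
        kubectl logs -n onecloud "$pod" -c "$container" --since "$since" \
            > "$pod-$container.log" || return 1
    done
}

# Example: save both containers of a host pod
# save_pod_logs "$host_pod_name" 5m host host-image
```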

Other common management commands

For more kubectl commands, see the official kubectl documentation.

View platform version information

# onecloudcluster can be abbreviated to oc; default is the name of the OneCloudCluster; -o yaml outputs the API object of the onecloudcluster resource as YAML
$ kubectl get onecloudcluster -n onecloud default -o yaml | grep version

View MySQL information

# View the MySQL connection configuration. oc is short for onecloudcluster, default is the name of the oc, and grep -A 4 also prints the 4 lines following each match
$ kubectl get oc -n onecloud default -o yaml | grep -A 4 mysql

View OC’s API object information

# List the OC resources
$ kubectl get onecloudcluster -n onecloud
# View the OC's API object as YAML; it contains all of the cluster's configuration
$ kubectl get oc -n onecloud -o yaml

Common administrative commands for the deployment management tool ocadm

The deployment management tool ocadm is similar to the kubeadm tool in a Kubernetes cluster.

# Create a cluster
$ ocadm cluster create
# View the cluster authentication information
$ ocadm cluster rcadmin
# Switch the image repository to the Alibaba Cloud image repository
$ ocadm cluster update --image-repository registry.cn-beijing.aliyuncs.com/yunionio --wait
# Upgrade or roll back the product to a specific version; this command works only when the system image repository is the Alibaba Cloud repository
$ ocadm cluster update --version $version
# Disable the host service of the node
$ ocadm node disable-host-agent --node $node_name 
# Enable the node's host service
$ ocadm node enable-host-agent --node $node_name 
# Disable the controller service of the node
$ ocadm node disable-onecloud-controller --node $node_name 
# Enable the controller service of the node
$ ocadm node enable-onecloud-controller --node $node_name 
# Disable Baremetal service
$ ocadm baremetal disable --node $node_name
# Enable the baremetal service on host node1, listening on the br0 NIC
$ ocadm baremetal enable --node node1 --listen-interface br0
# On the First Node, create a token for nodes that will join the cluster
$ ocadm token create
# On the First Node, list the token information
$ ocadm token list
# Switch to the open source front end; ce (community edition) is the open source front end
$ ocadm component web use-ce
# Switch to the commercial front end; ee (enterprise edition) is the commercial front end
$ ocadm component web use-ee
# Enable itsm component
$ ocadm component enable itsm
# Disable itsm component
$ ocadm component disable itsm
# Clean up the environment after a failed installation; use this command with caution
$ ocadm reset --force