Hello Timothé,

For the last few months we have been working on a project, CEEMS [1], that can monitor CPU, memory and disk usage of VMs and projects. We originally started the project to quantify the energy and carbon footprint of compute workloads on HPC platforms, and later extended it to support OpenStack as well. It is effectively a Prometheus exporter that exports usage and performance metrics of batch jobs and OpenStack VMs.

We fetch CPU, memory and block disk usage stats directly from the cgroups of the VMs. The exporter supports gathering node-level energy usage from either RAPL or the BMC (IPMI/Redfish). We split the total energy between the VMs based on their relative CPU usage. For emissions, the exporter supports historical emission factors and real-time factors (from Electricity Maps [2] and RTE eCo2 [3]). The exporter also supports monitoring network activity (TCP, UDP, IPv4/IPv6) and IO on network shares for each VM, based on eBPF [4]. Besides the exporter, the stack ships an API server that stores and updates the aggregate usage metrics of VMs and projects.
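
To make the energy split concrete, here is a minimal Python sketch of the idea, not the actual CEEMS code: node energy for an interval is shared between VMs in proportion to the CPU time each one used, then converted to emissions with an emission factor. The function names, the input dictionaries and the units (joules, CPU seconds, gCO2e/kWh) are assumptions made for illustration only.

```python
def split_energy(node_energy_joules, vm_cpu_seconds):
    """Share node energy between VMs in proportion to their CPU time.

    node_energy_joules: energy measured for the whole node over an interval
                        (hypothetical input, e.g. derived from RAPL or BMC).
    vm_cpu_seconds:     mapping of VM name -> CPU seconds used in the same
                        interval (hypothetical input, e.g. read from cgroups).
    """
    total_cpu = sum(vm_cpu_seconds.values())
    if total_cpu <= 0:
        # No CPU activity recorded: attribute no energy to any VM.
        return {vm: 0.0 for vm in vm_cpu_seconds}
    return {
        vm: node_energy_joules * cpu / total_cpu
        for vm, cpu in vm_cpu_seconds.items()
    }


def emissions_grams(energy_joules, factor_g_per_kwh):
    """Convert energy to emissions using a factor in gCO2e per kWh."""
    kwh = energy_joules / 3.6e6  # 1 kWh = 3.6e6 J
    return kwh * factor_g_per_kwh


# Example: two VMs on a node that consumed 3600 J in the interval.
shares = split_energy(3600.0, {"vm-a": 30.0, "vm-b": 10.0})
```

In this example vm-a used three quarters of the CPU time, so it is attributed three quarters of the node's energy. A real implementation would also have to decide how to account for idle power and for non-CPU consumers such as memory or GPUs, which this sketch ignores.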

A demo instance [5] is available if you want to explore the Grafana dashboards. More details on the stack can be found in the docs [6].

Hope that helps!

Regards

Mahendra

[1] https://github.com/mahendrapaipuri/ceems

[2] https://app.electricitymaps.com/map/24h

[3] https://www.rte-france.com/en/eco2mix/co2-emissions

[4] https://ebpf.io/

[5] https://ceems-demo.myaddr.tools

[6] https://mahendrapaipuri.github.io/ceems/

On 11/12/2024 11:23, Timothé Bauge wrote:

Hello Uday,

Thank you very much for your reply and inputs, I will look into it.

I may have given too little information about our setup: since we went with Red Hat for this implementation (RHOSP), everything was deployed with TripleO and is containerised with Podman.

Our control nodes are running pacemaker for bundling galera, rabbitmq and haproxy.

If anyone would like to share what they are using and how, with similar setup or not, feel free to do so.

Best regards,

Timothé BAUGÉ

timothe.bauge@covage.com

COVAGE I Wholesale B2B du Groupe Altitude

From: Uday Dikshit <udaydikshit2007@gmail.com>
Sent: Tuesday, 10 December 2024 20:03
To: Timothé Bauge <Timothe.Bauge@covage.com>
Cc: openstack-discuss@lists.openstack.org <openstack-discuss@lists.openstack.org>
Subject: Re: OpenStack metrics and logs


Hello Timothé

Since you are already using Grafana, you can add OpenStack's MariaDB database as a data source, which will help you collect basic information on the capacity of your cloud.
Another add-on you might like is benchmarking your OpenStack services as part of regular health checks. The OpenStack Rally project can easily be used for this; it will help you capture service uptime and response times.
I hope this information helps with your use case.

On Tue, Dec 10, 2024, 19:03 Timothé Bauge <Timothe.Bauge@covage.com> wrote:

Hello Stackers,

I would like to know what you are all using to monitor and supervise your OpenStack clusters.

We are in the process of setting up our own private cloud based on Red Hat OpenStack Platform 17.1. We chose not to go with the Red Hat Service Telemetry Platform, and were strongly advised against Ceilometer and Aodh.
At the moment, we have built our own stack based on Prometheus (with plenty of exporters) for metrics, Graylog + OpenSearch for logs, and Grafana for visualisation.

For now, we are only looking to retrieve basic information so that we know when something malfunctions, but the end goal might be to go as far as counting CPU/memory/disk usage per hour, per VM, per project, etc.

So, what are you all using and how did you implement it?

Best regards,

Timothé BAUGÉ

timothe.bauge@covage.com

COVAGE I Wholesale B2B du Groupe Altitude