Hello Timothé,
We have been working on a project, CEEMS [1], for the last few months
that can monitor the CPU, memory and disk usage of VMs and projects.
Originally we started the project to quantify the energy
and carbon footprint of compute workloads on HPC platforms. Later
we extended it to support OpenStack as well. It is effectively a
Prometheus exporter that exports different usage and performance
metrics of batch jobs and OpenStack VMs.
We fetch CPU, memory and block disk usage stats directly from the
cgroups of the VMs. The exporter supports gathering node-level energy
usage from either RAPL or BMC (IPMI/Redfish). We split the total
energy between different VMs based on their relative CPU usage.
For the emissions, the exporter supports historical emission factors
and real-time factors (from Electricity Maps [2] and RTE eCo2
[3]). The exporter also supports monitoring network activity (TCP,
UDP, IPv4/IPv6) and I/O on network shares for each VM using eBPF
[4]. Besides the exporter, the stack ships an API server that can
store and update the aggregate usage metrics of VMs and projects.
A demo instance [5] is available to explore the Grafana dashboards. More details on the stack can be found in the docs [6].
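To illustrate the CPU-proportional energy split mentioned above, here is a minimal sketch in Python. It is not the actual CEEMS code; the function names, the input dictionary and the units (joules, CPU seconds, gCO2e/kWh) are assumptions chosen for illustration only.

```python
# Illustrative sketch only: names and units are hypothetical, not CEEMS internals.

def split_energy_by_cpu(node_energy_joules, vm_cpu_seconds):
    """Apportion a node's energy to its VMs by their share of CPU time.

    node_energy_joules: energy measured for the whole node over an interval
                        (for example from RAPL or the BMC).
    vm_cpu_seconds:     mapping of VM identifier -> CPU time used by that VM
                        over the same interval (for example from its cgroup).
    """
    total_cpu = sum(vm_cpu_seconds.values())
    if total_cpu <= 0:
        # No CPU activity recorded: attribute nothing rather than divide by zero.
        return {vm: 0.0 for vm in vm_cpu_seconds}
    return {
        vm: node_energy_joules * cpu / total_cpu
        for vm, cpu in vm_cpu_seconds.items()
    }


def emissions_grams(energy_joules, factor_g_per_kwh):
    """Convert energy to emissions using a factor in gCO2e per kWh."""
    joules_per_kwh = 3.6e6
    return energy_joules / joules_per_kwh * factor_g_per_kwh


if __name__ == "__main__":
    shares = split_energy_by_cpu(1000.0, {"vm-a": 30.0, "vm-b": 10.0})
    for vm, joules in shares.items():
        print(vm, joules, emissions_grams(joules, 50.0))
```

Note that a split like this attributes the node's idle power to the VMs in the same proportion as their CPU usage, which is a simplification.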
Hope that helps!
Regards
Mahendra
[1] https://github.com/mahendrapaipuri/ceems
[2] https://app.electricitymaps.com/map/24h
[3] https://www.rte-france.com/en/eco2mix/co2-emissions
[4] https://ebpf.io/
[5] https://ceems-demo.myaddr.tools
[6] https://mahendrapaipuri.github.io/ceems/
Hello Uday,
Thank you very much for your reply and input; I will look into it.
I might have given too little information regarding our setup. As we went with Red Hat for this implementation (RHOSP), everything was deployed with TripleO and is containerised inside Podman. Our control nodes are running Pacemaker to bundle Galera, RabbitMQ and HAProxy.
If anyone would like to share what they are using and how, with a similar setup or not, feel free to do so.
Best regards,
From: Uday Dikshit <udaydikshit2007@gmail.com>
Sent: Tuesday, 10 December 2024 20:03
To: Timothé Bauge <Timothe.Bauge@covage.com>
Cc: openstack-discuss@lists.openstack.org <openstack-discuss@lists.openstack.org>
Subject: Re: OpenStack metrics and logs
Hello Timothé
Since you are already using Grafana, you can add OpenStack's MariaDB database as a data source, which will help you collect basic information on the capacity of your cloud ecosystem.
Another add-on you might like is benchmarking your OpenStack services as part of regular health checks. The Rally OpenStack project can easily be used for this; it will help you capture service uptime and response time.
I hope this information helps with your use case.
On Tue, Dec 10, 2024, 19:03 Timothé Bauge <Timothe.Bauge@covage.com> wrote:
Hello Stackers,
I would like to know what you are all using to monitor and supervise your OpenStack clusters.
We are in the process of setting up our own private cloud based on Red Hat OpenStack Platform 17.1. We chose not to go with the Red Hat Service Telemetry Platform, and were strongly advised against Ceilometer and Aodh.
At the moment, we have built our own stack based on Prometheus (with plenty of exporters) for the metrics, Graylog + OpenSearch for the logs, and Grafana for the visualisation.
For now, we are only looking to retrieve basic information in order to know if a malfunction occurs, but the end goal might be to go as far as counting the CPU/memory/disk usage per hour, per VM, per project, etc.
So, what are you all using and how did you implement it?
Best regards,