Re: [telemetry] Team meeting agenda for tomorrow
On 5/8/19 7:12 AM, openstack-discuss-request@lists.openstack.org wrote:
Hello Trinh,

Where does the meeting happen? Will it be via the IRC Telemetry channel, or in the Etherpad (https://etherpad.openstack.org/p/telemetry-meeting-agenda)? I would like to discuss and understand a bit better the context behind the Telemetry events deprecation.
Unfortunately, I have a conflict at that time and will not be able to attend. I do have a little bit of context on the Events deprecation to share.

First, you will note the commit message from the commit [0] when Events were deprecated:

"Deprecate event subsystem
This subsystem has never been finished and is not maintained. Deprecate it for future removal."

I got the impression from jd at the time that there were a number of features in Telemetry, including Panko, that were not really "finished" and that the engineers who had worked on them had moved on to other things, so the features had become unsupported. In late 2018 there was an effort to clean up things that were not well maintained or didn't fit the direction of Telemetry. See also: https://julien.danjou.info/lessons-from-openstack-telemetry-deflation/

Events is one feature that often gets requested, but the use cases and demand for it are not expressed strongly or well understood by most people. If the Telemetry project has demand to de-deprecate Event handling (including Panko), I'd suggest a review of the requirements for event handling and possibly choosing a champion for maintaining the Panko service.

Also note: over in Monasca we have a spec [1] for handling Events ingestion which I hope we will be completing in Train. Contributions and comments welcome. :)

joseph

[0] https://github.com/openstack/ceilometer/commit/8a0245a5b3e1357d35ad6653be37c...
[1] https://github.com/openstack/monasca-specs/blob/master/specs/stein/approved/...
On Wed, May 8, 2019 at 12:19 AM Trinh Nguyen <dangtrinhnt@gmail.com> wrote:
Hi team,
As planned, we will have a team meeting at 02:00 UTC, May 9th on #openstack-telemetry to discuss what we are going to do for the next milestone (Train-1) and continue from where we left off at the last meeting.
I put the agenda here [1], thinking that it should be fine for a one-hour meeting. If you have anything to talk about, please add it there too.
[1] https://etherpad.openstack.org/p/telemetry-meeting-agenda
Bests,
--
Trinh Nguyen
www.edlab.xyz <https://www.edlab.xyz>
Unfortunately, I have a conflict at that time and will not be able to attend.
I do have a little bit of context on the Events deprecation to share.
First, you will note the commit message from the commit [0] when Events were deprecated:
"
Deprecate event subsystem
This subsystem has never been finished and is not maintained. Deprecate it for future removal.
"
I got the impression from jd at the time that there were a number of features in Telemetry,
including Panko, that were not really "finished" and that the engineers who had worked on them
had moved on to other things, so the features had become unsupported. In late 2018 there was
an effort to clean up things that were not well maintained or didn't fit the direction of Telemetry.
See also: https://julien.danjou.info/lessons-from-openstack-telemetry-deflation/
Thanks for the reply Joseph,

I have seen the commit message, and I also read the blog you referenced (and other pages related to the same topic), which made us a bit worried. I will try to explain our perspective and impressions when we read those blog pages. It is also worth noting that we have just started engaging with the OpenStack community (so, pardon my ignorance with some parts of OpenStack and how this open source community works). We are already making some contributions to Kolla-ansible, and we want to start to contribute back to Telemetry as well.

Before getting to the topic of Telemetry, and to be more precise, Ceilometer, let me state that I have taken part in other open source projects and communities before, but those communities are managed by a different organization.

So, Ceilometer: when we were designing and building our OpenStack cloud, billing was a crucial part of it. Ceilometer was chosen because it fits our requirements, working "out of the box" to provide valuable data for billing in a high-availability fashion. It certainly lacks some features, but that is ok when one works with open source. The pollers and event managers we are missing, we would like to create and contribute back to the community.

Having said that, what puzzled me, and worried us, is the fact that features that work are being removed from a project just because some contributors/committers left the community. There wasn't (at least I did not see) a good technical reason to remove this feature (e.g. it does not deliver what is promised, an alternative solution has been created somewhere and effort is being concentrated there, nobody uses it, and so on). If the feature were broken, and there were no people to fix it, I would understand, but that is not the case. The feature works, and it delivers what is promised. Moreover, reading the blog you referenced does not give a good feeling about how the community has managed the event in question (the project losing part of its contributors). Open source projects have cycles, and it is understandable and normal that sometimes not many people are working on something. As you can see, we are now starting and trying to engage with the Telemetry project/community. What is hard for us to understand is that the contributors, while leaving, are also "killing" the project by removing part of its features (features that are very interesting and valuable for us).

Why is that important for us? When we work with open source we know that we might need to put effort into customizing/adapting things to our business workflow, and we expect that the community will be there to receive and discuss these changes. That gives us predictability that the software/system we base our business on will be there, and that we can contribute back to improve it. An open source community could and should live on even if the project has no active contributors for a while; then, if people regroup and start to work on it again, the community is able to flourish.

Events is one feature that often gets requested, but the use cases and demand for it are not expressed strongly or well understood by most people. If the Telemetry project has demand to de-deprecate Event handling (including Panko), I'd suggest a review of the requirements for event handling and possibly choosing a champion for maintaining the Panko service.

Also note: over in Monasca we have a spec [1] for handling Events ingestion which I hope we will be completing in Train. Contributions and comments welcome. :)

joseph

[0] https://github.com/openstack/ceilometer/commit/8a0245a5b3e1357d35ad6653be37c...
[1] https://github.com/openstack/monasca-specs/blob/master/specs/stein/approved/...
It is awesome that you might have a similar spec (not developed yet) for Monasca, but the question would remain for us: one, two, or three years from now, what will happen if you, your team, or the people behind this spec/feature decide to leave the community? Will this feature be removed from Monasca too?

On Wed, May 8, 2019 at 6:23 PM Joseph Davis <joseph.davis@suse.com> wrote:
<snip>
-- Rafael Weingärtner
On 5/8/19 5:45 PM, Rafael Weingärtner wrote: <snip>
Thanks for the reply Joseph,
I have seen the commit message, and I also read the blog you referenced (and other pages related to the same topic) which made us a bit worried. I will try to explain our perspective and impressions when we read those blog pages. It is also worth noting that we have just started engaging with the OpenStack community (so, pardon my ignorance with some parts of OpenStack, and how this OpenSource community works). We are already making some contributions to Kolla-ansible, and we want to start to contribute back to Telemetry as well.
Before getting to the topic of Telemetry, and to be more precise, Ceilometer, let me state that I have taken part in other OpenSource projects and communities before, but these communities are managed by a different organization.
So, Ceilometer; when we were designing and building our OpenStack Cloud, where billing is a crucial part of it. Ceilometer was chosen because it fits our requirements, working "out of the box" to provide valuable data for billing in a high availability fashion. It for sure lacks some features, but that is ok when one works with OpenSource. The pollers and event managers we are missing, we would like to create and contribute back to the community.
Having said that, what puzzled me, and worried us, is the fact that features that work are being removed from a project just because some contributors/committers left the community. There wasn't (at least I did not see) a good technical reason to remove this feature (e.g. it does not deliver what is promised, or an alternative solution has been created somewhere and effort is being concentrated there, nobody uses it, and so on). If the features were broken, and there were no people to fix it, I would understand, but that is not the case. The feature works, and it delivers what is promised. Moreover, reading the blog you referenced does not provide a good feeling about how the community has managed the event (the project losing part of its contributors) in question. OpenSource has cycles, and it is understandable that sometimes we do not have many people working on something. OpenSource projects have cycles, and that is normal. As you can see, now there would be us starting/trying to engage with the Telemetry project/community. What is hard for us to understand is that the contributors while leaving are also "killing" the project by removing part of its features (that are very interesting and valuable for us).
Yeah, the history of Telemetry is a bit unusual in how it developed, and I could give editorials and opinions about decisions that were made and how well they worked in the community, but I'll save that for another time. I will say that communication with the community could have been better. And while I think that simplifying Ceilometer was a good choice at the time when the number of contributors was dwindling, I agree that cutting out a feature that is being used by users is not how OpenStack ought to operate. And now I'm starting to give opinions, so I'll stop.

I will say that it may be beneficial to the Telemetry project if you can write out your use case for the Telemetry stack and describe why you want Events to be captured and how you will use them. Describe how they are important to your billing solution (*), and whether you are matching the event notifications up with other metering data. You can discuss with the team in the meeting whether that use case and set of requirements goes in Storyboard or elsewhere.

(*) I am curious if you are using CloudKitty or another solution.
Why is that important for us? When we work with OpenSource we now that we might need to put effort to customize/adapt things to our business workflow, and we expect that the community will be there to receive and discuss these changes. Therefore, we have predictability that the software/system we base our business will be there, and we can contribute back to improve it. An open source community could and should live even if the project has no community for a while, then if people regroup and start to work on it again, the community is able to flourish.
I'm really glad you recognize the benefits of contributing back to the community. It gives me hope. :) <snip>
It is awesome that you might have a similar spec (not developed yet) for Monasca, but the question would remain for us. One, two, or three years from now, what will happen if you, your team, or the people behind this spec/feature decide to leave the community? Will this feature be removed from Monasca too?
Developers leaving the community is a normal part of the lifecycle, so I think you would agree that part of having a healthy project is ensuring that when that happens the project can go on. Monasca has already seen a number of developers come and go, and will continue on for the foreseeable future. That is part of why we wrote a spec for the events-listener, so that if needed the work could change hands and continue with context. We try to plan and get cross-company agreement in the community. Of course, there are priorities and trade-offs and limits on developers, but Monasca and OpenStack seem to do a good job of being 'open' about it. <snip>
-- Rafael Weingärtner
joseph
Thanks Joseph and Rafael for the great comments. Understanding the users' use cases is a very important step to keep a feature alive.

On Thu, May 9, 2019 at 10:33 AM Joseph Davis <joseph.davis@suse.com> wrote:
<snip>
--
Trinh Nguyen
www.edlab.xyz <https://www.edlab.xyz>
Is it time to rethink the approach to telemetry a bit?

Having each project provide its telemetry data (such as Swift with statsd - https://docs.openstack.org/swift/latest/admin/objectstorage-monitoring.html or using a framework like Prometheus)? In the end, the projects are the ones who have the best knowledge of how to get the metrics.

Tim

From: Rafael Weingärtner <rafaelweingartner@gmail.com>
Date: Thursday, 9 May 2019 at 02:51
To: Joseph Davis <joseph.davis@suse.com>
Cc: openstack-discuss <openstack-discuss@lists.openstack.org>, Trinh Nguyen <dangtrinhnt@gmail.com>
Subject: Re: [telemetry] Team meeting agenda for tomorrow

<snip>
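To make the Swift/statsd option concrete: the emitters described in the monitoring doc above are enabled with a few lines of server configuration. A minimal sketch, assuming a statsd daemon listening on its default port on each node (option names as documented in the Swift monitoring guide):

    [DEFAULT]
    log_statsd_host = localhost
    log_statsd_port = 8125
    log_statsd_default_sample_rate = 1.0
    log_statsd_metric_prefix = proxy01

With that in place each Swift service emits its own counters and timings, which is the "each project provides its telemetry data" model; a bridge such as statsd_exporter can then expose the same numbers to Prometheus.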
Hi Tim,

It's exactly the right time for your idea, as we are trying to develop the new roadmap/vision for Telemetry. I put your comment in the brainstorming etherpad [1].

[1] https://etherpad.openstack.org/p/telemetry-train-roadmap

Bests,

On Thu, May 9, 2019 at 4:24 PM Tim Bell <Tim.Bell@cern.ch> wrote:
Is it time to rethink the approach to telemetry a bit?
Having each project provide its telemetry data (such as Swift with statsd - https://docs.openstack.org/swift/latest/admin/objectstorage-monitoring.html
or using a framework like Prometheus)?
In the end, the projects are the ones who have the best knowledge of how to get the metrics.
Tim
<snip>
--
Trinh Nguyen
www.edlab.xyz <https://www.edlab.xyz>
On Thu, May 09, 2019 at 07:24:43AM +0000, Tim Bell wrote:
Is it time to rethink the approach to telemetry a bit?
Having each project provide its telemetry data (such as Swift with statsd - https://docs.openstack.org/swift/latest/admin/objectstorage-monitoring.html or using a framework like Prometheus)?
In the end, the projects are the ones who have the best knowledge of how to get the metrics.
Tim
Yes please! I'd have some ideas here.

Prometheus has been mentioned so many times now as a requirement/request. There are also other projects to mention here, such as collectd or OPNFV Barometer.

Unfortunately, having a meeting at 4 am does not really work for me. May I kindly request to move the meeting to a more friendly hour?

--
Matthias Runge <mrunge@matthias-runge.de>
On 2019-05-09 10:35:58 +0200 (+0200), Matthias Runge wrote: [...]
Unfortunately, having a meetig at 4 am in the morning does not really work for me. May I kindly request to move the meeting to a more friendly hour?
The World is round, and your "friendly" times are always someone else's "unfriendly" times. Asking the folks interested in participating in the meeting to agree on a consensus timeslot between them is fair, but please don't characterize someone else's locale as "unfriendly" just because it's on the opposite side of the planet from you.

--
Jeremy Stanley
On Thu, May 09, 2019 at 12:43:01PM +0000, Jeremy Stanley wrote:
On 2019-05-09 10:35:58 +0200 (+0200), Matthias Runge wrote: [...]
Unfortunately, having a meetig at 4 am in the morning does not really work for me. May I kindly request to move the meeting to a more friendly hour?
The World is round, and your "friendly" times are always someone else's "unfriendly" times. Asking the folks interested in participating in the meeting to agree on a consensus timeslot between them is fair, but please don't characterize someone else's locale as "unfriendly" just because it's on the opposite side of the planet from you.
Right. It was not my intention to sound devaluing. However, for myself, I could not imagine a worse time.

When asking people to join an effort, it usually helps to have meetings at hours that are accessible to them; the alternative would be to switch to asynchronous methods.

Matthias

--
Matthias Runge <mrunge@matthias-runge.de>
Agree. Instrumenting the code is the most efficient and recommended way to monitor the applications. We discussed it during the Self-healing SIG PTG session last week.

The problem is that the telemetry topic is not, and never will be, a high priority for individual projects, so a coordination effort from the community is required here. I think this is one of the areas where the Telemetry and Monasca teams could work together.

Cheers
Witek

On 5/9/19 9:24 AM, Tim Bell wrote:
Is it time to rethink the approach to telemetry a bit?
Having each project provide its telemetry data (such as Swift with statsd - https://docs.openstack.org/swift/latest/admin/objectstorage-monitoring.html
or using a framework like Prometheus)?
In the end, the projects are the ones who have the best knowledge of how to get the metrics.
Tim
<snip>
On 5/9/19 9:24 AM, Tim Bell wrote:
Is it time to rethink the approach to telemetry a bit?
Having each project provide its telemetry data (such as Swift with statsd - https://docs.openstack.org/swift/latest/admin/objectstorage-monitoring.html
or using a framework like Prometheus)?
In the end, the projects are the ones who have the best knowledge of how to get the metrics.
Tim
Tim,

statsd for Swift is for monitoring, it is *not* a usage metric. Likewise with Prometheus, which won't care if some data are missing.

I very much would love to have each project handle metrics collection by themselves. In particular, I always thought that the polling system implemented in Ceilometer is just wrong, and that every service should be able to report its own metrics rather than being polled. I understand however that doing polling is easier than implementing such a change in every service, so I get why it has been done this way.

But then we need some kind of timeseries framework within OpenStack as a whole (through an Oslo library?), and also we must decide on a backend. Right now, the only serious thing we have is Gnocchi, since InfluxDB has gone the open core route. Or do you have something else to suggest?

Cheers,

Thomas Goirand (zigo)
But then we need some kind of timeseries framework within OpenStack as a whole (through an Oslo library?),
What would be the requirements and the scope of this framework from your point of view?
and also we must decide on a backend. Right now, the only serious thing we have is Gnocchi, since influxdb is gone through the open core model. Or do you have something else to suggest?
Monasca can be used as the backend. As the TSDB it uses Apache Cassandra, with native clustering support, or InfluxDB. Monasca uses Apache Kafka as the message queue. It can replicate and partition the measurements into independent InfluxDB instances. Additionally, the Monasca API could act as the load balancer, monitoring the health of the InfluxDB instances and routing the queries to the assigned shards.

We want to work in the Train cycle to add upstream all the configuration options to allow such a setup [1]. Your feedback, comments and contributions are very welcome.

Cheers
Witek

[1] https://storyboard.openstack.org/#!/story/2005620
On 5/9/19 2:35 PM, Witek Bedyk wrote:
But then we need some kind of timeseries framework within OpenStack as a whole (through an Oslo library?),
What would be the requirements and the scope of this framework from your point of view?
Currently, Ceilometer pushes values to a timeseries. We could see services doing this directly, without having Ceilometer in the middle doing the polling. I'm thinking about bandwidth and IO usage, which can potentially be quite resource intensive. For example, we could have neutron-metering-agent sending metrics to a timeseries directly, without going through the loop of rabbitmq. That's just an idea I'm throwing out there...

Cheers,

Thomas Goirand (zigo)
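For what it's worth, the "push directly to the timeseries" path already exists at the HTTP level: Gnocchi accepts measures with a plain POST, so an agent could write its samples without rabbitmq in the loop. A rough sketch (the endpoint, port and metric id are illustrative, and auth is whatever the deployment uses):

    curl -X POST "http://gnocchi-api:8041/v1/metric/<metric-uuid>/measures" \
         -H "X-Auth-Token: $TOKEN" \
         -H "Content-Type: application/json" \
         -d '[{"timestamp": "2019-05-09T14:00:00", "value": 42.0}]'

Whether every agent should talk to the TSDB directly or go through a thin Oslo-provided shim, as suggested above, is the real design question.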
Hi Tim,

I added your question as Proposal C to the roadmap etherpad [1]. Feel free to change it if I got something wrong. :)

[1] https://etherpad.openstack.org/p/telemetry-train-roadmap

joseph

On 5/9/19 12:24 AM, Tim Bell wrote:
<snip>
Hi Joseph,

Thanks for the update. I would suggest creating another mail thread, since this is barely related to yesterday's meeting.

Bests,

On Fri, May 10, 2019 at 12:57 AM Joseph Davis <joseph.davis@suse.com> wrote:
<snip>
--
Trinh Nguyen
www.edlab.xyz <https://www.edlab.xyz>
Hi guys,

Please cast your votes for the new meeting time: https://doodle.com/poll/cd9d3ksvpms4frud

Bests,

On Fri, May 10, 2019 at 10:42 AM Trinh Nguyen <dangtrinhnt@gmail.com> wrote:
<snip>
--
Trinh Nguyen
www.edlab.xyz <https://www.edlab.xyz>
Hello,

I am working on Watcher, and we are currently changing how metrics are retrieved from different datasources such as Monasca or Gnocchi. Because of this major overhaul I would like to validate that everything is working correctly.

Almost all of the optimization strategies in Watcher require the CPU utilization of an instance as a metric, but with newer versions of Ceilometer this has become unavailable.

On IRC I received the information that Gnocchi could be used to configure an aggregate, and this aggregate would then report CPU utilization; however, I have been unable to find documentation on how to achieve this.

I was also notified that cpu_util is something that could be computed from other metrics. Reading https://docs.openstack.org/ceilometer/rocky/admin/telemetry-measurements.htm... the documentation seems to agree, as it states that cpu_util is measured using a 'rate of change' transformer. But I have not been able to find how this can be computed.

I was hoping someone could spare the time to provide documentation or information on how this is currently best achieved.

Kind Regards,
Corne Lukken (Dantali0n)
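For reference, the 'rate of change' transformer mentioned above is how pre-Stein Ceilometer derived cpu_util from the cumulative cpu meter. A sketch of the old pipeline.yaml stanza (roughly the Rocky-era default; the transformer code was removed in Stein, so this no longer works there):

    - name: cpu_sink
      transformers:
          - name: "rate_of_change"
            parameters:
                target:
                    name: "cpu_util"
                    unit: "%"
                    type: "gauge"
                    scale: "100.0 / (10**9 * (resource_metadata.cpu_number or 1))"

The scale factor is the whole trick: cpu is cumulative CPU time in nanoseconds, so the per-second rate is scaled by 100 / (10^9 * vCPUs) to turn it into a percentage.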
The message at the end of this email is some three months old. I have the same problem. The question is: how to use the new rate metrics in Gnocchi? I am using a Stein Devstack for my tests.

For example, I need the CPU rate, formerly named cpu_util. I created a new archive policy that uses rate:mean aggregation and has a 1 minute granularity:

$ gnocchi archive-policy show ceilometer-medium-rate
+---------------------+------------------------------------------------------------------+
| Field               | Value                                                            |
+---------------------+------------------------------------------------------------------+
| aggregation_methods | rate:mean, mean                                                  |
| back_window         | 0                                                                |
| definition          | - points: 10080, granularity: 0:01:00, timespan: 7 days, 0:00:00 |
| name                | ceilometer-medium-rate                                           |
+---------------------+------------------------------------------------------------------+

I added the new policy to the publishers in pipeline.yaml:

$ tail -n5 /etc/ceilometer/pipeline.yaml
sinks:
    - name: meter_sink
      publishers:
          - gnocchi://?archive_policy=medium&filter_project=gnocchi_swift
          - gnocchi://?archive_policy=ceilometer-medium-rate&filter_project=gnocchi_swift

After restarting all of Ceilometer, my hope was that the CPU rate would magically appear in the metric list. But no: all metrics are linked to archive policy medium, and looking at the details of an instance, I don't detect anything rate-related:

$ gnocchi resource show ae3659d6-8998-44ae-a494-5248adbebe11
+-----------------------+---------------------------------------------------------------------+
| Field                 | Value                                                               |
+-----------------------+---------------------------------------------------------------------+
...
| metrics               | compute.instance.booting.time: 76fac1f5-962e-4ff2-8790-1f497c99c17d |
|                       | cpu: af930d9a-a218-4230-b729-fee7e3796944                           |
|                       | disk.ephemeral.size: 0e838da3-f78f-46bf-aefb-aeddf5ff3a80           |
|                       | disk.root.size: 5b971bbf-e0de-4e23-ba50-a4a9bf7dfe6e                |
|                       | memory.resident: 09efd98d-c848-4379-ad89-f46ec526c183               |
|                       | memory.swap.in: 1bb4bb3c-e40a-4810-997a-295b2fe2d5eb                |
|                       | memory.swap.out: 4d012697-1d89-4794-af29-61c01c925bb4               |
|                       | memory.usage: 93eab625-0def-4780-9310-eceff46aab7b                  |
|                       | memory: ea8f2152-09bd-4aac-bea5-fa8d4e72bbb1                        |
|                       | vcpus: e1c5acaf-1b10-4d34-98b5-3ad16de57a98                         |
| original_resource_id  | ae3659d6-8998-44ae-a494-5248adbebe11                                |
...
| type                  | instance                                                            |
| user_id               | a9c935f52e5540fc9befae7f91b4b3ae                                    |
+-----------------------+---------------------------------------------------------------------+

Obviously, I am missing something. Where is the missing link? What do I have to do to get CPU usage rates? Do I have to create metrics? Do I have to ask Ceilometer to create metrics? How? Right now, no instructions seem to exist at all. If that is correct, I would be happy to write documentation once I understand how it works.

Thanks a lot.

Bernd

On 5/10/2019 3:49 PM, info@dantalion.nl wrote:
Hi Bernd,

A lot of people have asked the same question before; unfortunately, I don't know the answer either (we are still using an old version of Ceilometer). The original cpu_util support has been removed from Ceilometer in favor of Gnocchi, but AFAIK there is no doc in Gnocchi that mentions how to achieve the same thing, and no clear answer from the Gnocchi maintainers.

It'd be much appreciated if you could share the answer once you find it, or if someone who has already solved the issue could chime in.

Best regards,
Lingxian Kong
Catalyst Cloud
Lingxian,

Thanks for "bumping" my request and keeping it alive. The reason I need an answer: I am updating courseware to Stein that includes autoscaling based on CPU and disk I/O rates. Looks like I am "cutting edge" :)

I don't think the problem is in the Gnocchi camp, but rather in Ceilometer. To store rates of measures, the following is needed:

* A metric. Raw measures are sent to the metric.
* An archive policy. The metric has an archive policy.
* The archive policy includes one or more rate aggregates.

My cloud has archive policies with rate aggregates, but the question is about the first bullet: how can I configure Ceilometer so that it creates the corresponding metrics and sends measures to them? In other words, how is Ceilometer's output connected to my archive policy? From my experience, just adding the archive policy to Ceilometer's publishers is not sufficient.

Ceilometer's source code includes .../publisher/data/gnocchi_resources.yaml, which might well be the place where this can be configured. I am not sure how to do it though, and this file is not documented. I can read the source, but my developer skills are insufficient for understanding how everything fits together.

Bernd
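(As the resolution later in this thread confirms, gnocchi_resources.yaml is indeed the place where Ceilometer's meters are mapped to Gnocchi metrics. A minimal sketch of the relevant section, showing only the cpu override - the real file lists many more metrics, and field names should be checked against the copy shipped with your release:

    resources:
      - resource_type: instance
        metrics:
          ...
          cpu:
            archive_policy_name: ceilometer-medium-rate

The archive policy named here must already exist in Gnocchi, or be defined in the same file, before Ceilometer creates the metric.)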
Hello Bernd, hello Lingxian,

+1

You are not alone in your fruitless endeavor. Sadly, I cannot come up with a solution; we are stuck at the same point.

Maybe some day a dedicated member of the OpenStack community will give the Ceilometer folks a push to explain their service. For us, also using Stein, it is in the state of "not production ready".

Cheers,
Ralf T.
I have a solution. At least it works for me. Be aware that this is Devstack, but I think nothing I did to solve my problem is Devstack-specific. Also, I don't know whether there are more efficient or canonical ways to reconfigure Ceilometer. But it's good enough for me.

These are my steps - you may not need all of them.

* In pipeline.yaml, set the publisher to gnocchi://

* In the resource definition file, define my new archive policy. By default, this file resides in the Ceilometer source tree at .../ceilometer/publisher/data/gnocchi_resources.yaml, but you can use the config parameter resources_definition_file to change the default (I didn't try). Example:

    - name: ceilometer-medium-rate
      aggregation_methods:
        - mean
        - rate:mean
      back_window: 0
      definition:
        - granularity: 1 minute
          timespan: 7 days
        - granularity: 1 hour
          timespan: 365 days

* In the same resource definition file, adjust the archive policy of rate metrics. Example:

    - resource_type: instance
      metrics:
        ...
        cpu:
          archive_policy_name: ceilometer-medium-rate

* Delete all existing metrics and resources from Gnocchi. Probably only necessary when Ceilometer is already running, and not needed if you reconfigure it before its first start. This is a drastic measure, but if you do it at the beginning of a deployment, it won't cause loss of much data. Why is this required? A metric contains an archive policy that can't be changed, so existing metrics need to be recreated. Why remove resources? Because they reference the metrics that I removed.

* Restart all Ceilometer services. This is required for re-reading the pipeline and the resource definition files.

Ceilometer will create resources and metrics as needed when it sends its samples to Gnocchi. I tested this by running a CPU hogging instance and listing its measures after a few minutes:

$ gnocchi measures show --resource f28f6b78-9dd5-49cc-a6ac-28cb14477bf0 --aggregation rate:mean cpu
+---------------------------+-------------+---------------+
| timestamp                 | granularity | value         |
+---------------------------+-------------+---------------+
| 2019-08-01T20:23:00+09:00 |        60.0 |  1810000000.0 |
| 2019-08-01T20:24:00+09:00 |        60.0 | 39940000000.0 |
| 2019-08-01T20:25:00+09:00 |        60.0 | 40110000000.0 |
+---------------------------+-------------+---------------+

This means that the instance accumulated 39940000000 nanoseconds of CPU time in the 60 seconds at 20:24:00. Note that the old cpu_util was expressed in percent, so Aodh alarms and Heat autoscaling definitions must be adapted.

Good luck. Hire me as a Ceilometer consultant if you get stuck :)

Bernd
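To relate the rate values back to the old percentage-style cpu_util: 39940000000 ns of CPU time accumulated over a 60-second period is 39.94 s / 60 s, roughly 66.6% of one vCPU (divide by the instance's vcpus count as well if you want utilization across all cores). A quick check of that arithmetic:

    $ python3 -c "print(39940000000 / (60 * 10**9) * 100)"
    66.56666666666666

Going the other way, an alarm threshold of, say, 80% at 60-second granularity corresponds to a rate:mean threshold of roughly 0.8 * 60 * 10**9 = 48000000000 ns when adapting Aodh alarms or Heat autoscaling definitions.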
Hey Bernd,

Can you try with just one publisher instead of two, and also drop the archive_policy query parameter and its value? Then Ceilometer should publish metrics based on the map defined in gnocchi_resources.yaml.

And while you are at it: could you post a list of the archive policies already defined in Gnocchi? I believe this list should match what is listed in gnocchi_resources.yaml.

Hope that helps,
Sumit
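Concretely, that suggestion amounts to a single sink publisher along these lines (keeping the filter_project parameter from the earlier examples; whether it is needed depends on the deployment):

    sinks:
        - name: meter_sink
          publishers:
              - gnocchi://?filter_project=gnocchi_swift

and then comparing the output of

    $ gnocchi archive-policy list

against the archive_policy_name values referenced in gnocchi_resources.yaml.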
Thanks much, Sumit. I did not detect your reply until now; it's still hard to manage the openstack-discuss mailing list.

In the meantime, I have made a lot of progress and understand how Ceilometer creates its own archive policies and adds resources and their metrics to Gnocchi - based on gnocchi_resources.yaml, as you correctly remarked.

Thanks to help from the Gnocchi team, I also know how to generate CPU utilization figures. See this issue on GitHub if you are interested: https://github.com/gnocchixyz/gnocchi/issues/1044.

My ultimate goal is autoscaling based on CPU utilization. I have not solved that problem yet, but it's a different question. One question at a time!

Thanks again; this immediate question is answered.

Bernd
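One way to turn the stored cpu rate into a utilization percentage without storing a separate metric is Gnocchi's dynamic aggregates API. A sketch of such a query - the operations syntax and the resource-search argument are assumptions based on the Gnocchi 4.x aggregates documentation, not taken from the linked issue, and should be verified there:

    $ gnocchi aggregates '(* (/ (metric cpu rate:mean) 60000000000.0) 100)' id=f28f6b78-9dd5-49cc-a6ac-28cb14477bf0

where 60000000000.0 is the 60-second granularity expressed in nanoseconds, so the result is percent of one vCPU per period.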
On Wed, May 08, 2019 at 09:45:38PM -0300, Rafael Weingärtner wrote:
Having said that, what puzzled me, and worried us, is the fact that features that work are being removed from a project just because some contributors/committers left the community. There wasn't (at least I did not see) a good technical reason to remove this feature (e.g. it does not
If I remember correctly, it was the other way around. The idea was to make things cleaner: Ceilometer just gathers data and sends it along, Gnocchi handles storage, Panko handles events, etc.
deliver what is promised, or an alternative solution has been created somewhere and effort is being concentrated there, nobody uses it, and so on). If the features were broken, and there were no people to fix it, I would understand, but that is not the case. The feature works, and it delivers what is promised. Moreover, reading the blog you referenced does not provide a good feeling about how the community has managed the event (the project losing part of its contributors) in question. OpenSource has cycles, and it is understandable that sometimes we do not have many people working on something. OpenSource projects have cycles, and that is normal. As you can see, now there would be us starting/trying to engage with the Telemetry project/community. What is hard for us to understand is that the contributors while leaving are also "killing" the project by removing part of its features (that are very interesting and valuable for us).
So, let's set your understanding of what/how open source works aside, please. I am sure nobody is trying to kill their baby when leaving a project.
Why is that important for us? When we work with OpenSource we now that we might need to put effort to customize/adapt things to our business workflow, and we expect that the community will be there to receive and discuss these changes. Therefore, we have predictability that the software/system we base our business will be there, and we can contribute back to improve it. An open source community could and should live even if the project has no community for a while, then if people regroup and start to work on it again, the community is able to flourish.
Right. We're at the point "after no community", and it is up to the community to start something new, taking over the corresponding code (if they choose to do so). Matthias -- Matthias Runge <mrunge@matthias-runge.de>
Interesting thread! I was very intrigued reading it and reading through Julien's blog posts. We are consuming the various Telemetry projects and I would like to get more involved in the Telemetry effort.

The mentioned Train roadmap etherpad [1] is a great start on defining the focus for the Telemetry projects. I think this is a great start toward getting to the roots of how, in Julien's words, Ceilometer should have been built, rather than how it was built to work around all the limitations of the old OpenStack era.

Most new deployments are probably already using Gnocchi - myself included - where even third parties have implemented billing connections to the API. In my opinion the bigger question here is how the various Telemetry parts should evolve based on the premise that storage is already provided. A lot of thought should be put into the difference between a metrics-based and a billing-based storage solution in the back end.

I'll check out the Doodle and see if I can make the next meeting.

Best regards
Tobias

[1] https://etherpad.openstack.org/p/telemetry-train-roadmap
participants (14)
- Bernd Bausch
- info@dantalion.nl
- Jeremy Stanley
- Joseph Davis
- Lingxian Kong
- Matthias Runge
- Rafael Weingärtner
- Sumit Jamgade
- Teckelmann, Ralf, NMU-OIP
- Thomas Goirand
- Tim Bell
- Tobias Urdin
- Trinh Nguyen
- Witek Bedyk