RE: Data Center Survival in case of Disaster / HW Failure in DC
Hello,

First, OpenStack isn't VMware. Critical applications need to be designed to be multi-region/multi-AZ aware. The best practice is two regions with two control planes (Keystone and a Galera cluster on each, not shared), two storage systems, and the application split across both 😉 You can also do it with two AZs, but then the control plane (Galera cluster and RabbitMQ) is shared between the two AZs. If you are trying to do the same thing as a stretched VMware cluster across a dual site, it's complicated; if you have three sites with low latency (<1 ms), you can!

1. In case of any sudden hardware failure of one or more controller, compute, or storage nodes, what immediate redundant recovery setup needs to be employed?

=> You need to split the control plane across multiple AZs (3 is best for the Galera cluster and RabbitMQ; 2 if you have VMware-style HA underneath).
=> Network nodes work with keepalived, so they also need to be split across two AZs/regions. DHCP and metadata services must likewise be made redundant across multiple network nodes.
=> Compute: OpenStack HA (like VMware HA) seems to work since Wallaby with Masakari. Otherwise, you currently have to fail over instances yourself.

Storage is the biggest problem. You need stretched storage that works with Cinder. Ceph can do the job, but again you need three sites (with 3 monitors, the quorum needs 2 of them alive). The NetApp solution (Trident, NFS) also seems to work well.

2. In case of H/W failure, our recovery needs to happen as soon as possible, for example in less than 30 minutes after the first failure occurs.

The Galera cluster and RabbitMQ need to be checked first. With two sites, 30 minutes is a real challenge; with three sites it should be fine!

3. Are there setup options like a hot standby or similar, or what do we need to employ?

Two AZs, two storage systems, and the application split across both is the best practice 😉

4. To meet the RTO (< 30 minutes of downtime) and RPO (from the exact point of the crash, all applications and data must be consistent): see the three-site layout above; with only two sites these targets are very hard to guarantee.
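The "three sites" requirement for both Galera and the Ceph monitors comes down to majority quorum: a cluster spread over N sites stays writable only while a strict majority of its members are alive. A minimal sketch of that arithmetic (plain Python, names are illustrative, not from any OpenStack or Ceph API):

```python
def quorum(members: int) -> int:
    """Minimum number of live members needed for a strict majority."""
    return members // 2 + 1

def survives_site_loss(sites: int, members_per_site: int = 1) -> bool:
    """Does a cluster spread evenly across `sites` keep quorum
    after losing one entire site?"""
    total = sites * members_per_site
    alive = (sites - 1) * members_per_site
    return alive >= quorum(total)

# Two sites: losing either one leaves only 50% of members -- no majority,
# so a dual-site Galera cluster or Ceph monitor set blocks writes.
print(survives_site_loss(2))   # False

# Three sites (e.g. 3 Galera nodes or 3 Ceph monitors, one per site):
# losing one site still leaves 2 of 3 alive -- quorum holds.
print(survives_site_loss(3))   # True
```

This is why a dual-site "stretched" design cannot ride out a full site failure on its own, while the same components distributed over three sites can.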
Regards,
Stéphane Chalansonnet

-----Original Message-----
From: openstack-discuss-request@lists.openstack.org <openstack-discuss-request@lists.openstack.org>
Sent: Thursday, 5 May 2022 11:56
To: openstack-discuss@lists.openstack.org
Subject: openstack-discuss Digest, Vol 43, Issue 12

Message: 1
Date: Thu, 5 May 2022 14:46:11 +0530
From: KK CHN <kkchn.in@gmail.com>
To: openstack-discuss@lists.openstack.org
Subject: Data Center Survival in case of Disaster / HW Failure in DC

List,

We have an old cloud setup with OpenStack Ussuri on Debian (QEMU/KVM). I know it's very old and we can't upgrade to newer versions right now. The deployment is as follows:

A. 3 controller (cum compute) nodes in HA mode (VMs run on the controllers too).
B. 6 separate compute nodes.
C. 3 separate storage nodes with Ceph RBD.

The questions are:

1. In case of any sudden hardware failure of one or more controller, compute, or storage nodes, what immediate redundant recovery setup needs to be employed?
2. In case of H/W failure, our recovery needs to happen as soon as possible, for example in less than 30 minutes after the first failure occurs.
3. Are there setup options like a hot standby or similar setups, or what do we need to employ?
4. How to meet the RTO (< 30 minutes of downtime) and RPO (from the exact point of the crash, all applications and data must be consistent)?
5. Please share your thoughts on reliable crash/fault-resistant configuration options in the DC.

We have a remote DR setup in a remote location right now. I would also like to know if there is a recommended way to bring the remote DR site up and running automatically, or how to automate failover to the DR site to meet the exact RTO and RPO.

Any thoughts are most welcome.

Regards,
Krish