RabbitMQ config changes not applied in newer format
Hi *,

this might not be the right place to ask, but I'm curious how others configure their RabbitMQ.

RabbitMQ version: 3.8.3
OpenStack version: Ussuri/Victoria (two different clusters)

We have a highly available cloud controlled by Pacemaker. The deployment is based on our own Salt states. We noticed that our config is not applied (e.g. after a power outage); we have to import the definitions manually with 'rabbitmqctl import_definitions /path/to/definitions.json'. Comparing it to an older cloud version (Rocky, RabbitMQ version 3.6.16) I noticed that the config format differs between those versions; the older version uses the classic Erlang term format:

---snip---
# old format
cat /etc/rabbitmq/rabbitmq.config
[
 {mnesia, [
   {dump_log_write_threshold, 300},
   {dump_log_time_threshold, 180000}
 ]},
 {rabbit, [
   {collect_statistics_interval, 5000},
   {tcp_listen_options, [
     {backlog, 128}, {nodelay, true}, {keepalive, false}
   ]},
   {tcp_listeners, [ {"x.x.x.x", 5672} ]},
   {cluster_partition_handling, pause_minority},
   {queue_master_locator, <<"min-masters">>},
   {disk_free_limit, 50000000}
 ]},
 {rabbitmq_management, [
   {listener, [{ip, "x.x.x.x"}, {port, 15673}]},
   {load_definitions, "/etc/rabbitmq/definitions.json"}
 ]}
].

# new format
[...]
listeners.tcp.default = 5672
listeners.tcp.other_port = 5673
listeners.tcp.other_ip = x.x.x.x:5672
[...]
---snip---

If I switch to the old format and restart RabbitMQ on that node, the changes are applied successfully. Is this a known issue? Should I just stick to the old format? Any comments are appreciated!

Regards,
Eugen
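For what it's worth, most of the classic settings above do have counterparts in the new-style rabbitmq.conf. A hedged sketch of the mapping (key names as I read the 3.8 docs, so double-check them; the mnesia tuning has no sysctl-style key and would have to stay behind in an advanced.config in the classic Erlang term format):

---snip---
# sketch: possible new-format equivalents of the classic config above
listeners.tcp.other_ip = x.x.x.x:5672
tcp_listen_options.backlog = 128
tcp_listen_options.nodelay = true
tcp_listen_options.keepalive = false
collect_statistics_interval = 5000
cluster_partition_handling = pause_minority
queue_master_locator = min-masters
disk_free_limit.absolute = 50000000
management.tcp.ip = x.x.x.x
management.tcp.port = 15673
management.load_definitions = /etc/rabbitmq/definitions.json
---snip---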
Update: it doesn't seem to be a problem with the config file format but with the clustering. On a VM (single control node) I tried the "newer" config file format and it worked. But that node is not part of a cluster; in our production cloud we have the "classic_config" backend enabled and a list of all cluster nodes in the config file:

cluster_formation.peer_discovery_backend = classic_config
cluster_formation.classic_config.nodes.1 = rabbit@controller01.fqdn
cluster_formation.classic_config.nodes.2 = rabbit@controller02.fqdn
cluster_formation.classic_config.nodes.3 = rabbit@controller01
cluster_formation.classic_config.nodes.4 = rabbit@controller02

When I remove that cluster part from the newer config file the node doesn't join the cluster. This is what the config looks like:

listeners.tcp.all_ips = 0.0.0.0:5672
management.load_definitions = /etc/rabbitmq/definitions.json
management.ssl.port = 15671
management.ssl.ip = 0.0.0.0
management.ssl.cacertfile = /path/to/cafile
management.ssl.certfile = /path/to/certfile
management.ssl.keyfile = /path/to/keyfile

This is the error message:

2022-03-24 15:25:49.183 [info] <0.1227.0> WAL: recovering ["/var/lib/rabbitmq/mnesia/rabbit@controller01/quorum/rabbit@controller01/00000002.wal"]
2022-03-24 15:25:49.359 [warning] <0.1227.0> wal: encountered error during recovery: badarg

Then I switch back to the old format, where no clustering is mentioned at all, and it works. Could anyone give me a hint what I'm doing wrong?
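One thing worth noting: if the new format is kept, the peer-discovery block presumably needs to stay in the same rabbitmq.conf next to the listener/management keys rather than being dropped; a minimal sketch, reusing the node names from above:

---snip---
cluster_formation.peer_discovery_backend = classic_config
cluster_formation.classic_config.nodes.1 = rabbit@controller01.fqdn
cluster_formation.classic_config.nodes.2 = rabbit@controller02.fqdn
---snip---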
Update #2: after playing around with my test setup (without Pacemaker involved) I learned that the "new config file format" is just converted into the Erlang format during RabbitMQ startup. That's why our production setup works (we use the Erlang format).

But I noticed that in the HA setup there's no /var/lib/rabbitmq/schema/rabbitmq_management.schema file present. Apparently that file is created when RabbitMQ is started via systemd; it is not created when Pacemaker starts RabbitMQ, although the plugin is enabled. The zipped plugin files are all there, but the schema files don't seem to be extracted into the working directory like in my test setup. Also, the file /var/lib/rabbitmq/config/generated/rabbitmq.config is not generated in the HA setup.

So the workaround is clearly to use the Erlang config file format, but I'd like to understand the difference. I'll continue the investigation to see if I can find it myself, but I'd appreciate any hints.
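To illustrate the conversion point: the actual translator is RabbitMQ's cuttlefish library, driven by the *.schema files mentioned above, but the basic idea can be sketched as a fold of the dotted "sysctl" keys into the nested term structure of the classic config (a toy illustration only, not RabbitMQ's real code):

```python
def fold(lines):
    """Fold 'a.b.c = v' lines into a nested dict, roughly mirroring how
    new-format keys map onto the nested classic rabbitmq.config terms."""
    root = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, _, value = (p.strip() for p in line.partition("="))
        node = root
        *path, leaf = key.split(".")
        for part in path:
            node = node.setdefault(part, {})
        node[leaf] = value
    return root

conf = """
listeners.tcp.default = 5672
cluster_formation.peer_discovery_backend = classic_config
cluster_formation.classic_config.nodes.1 = rabbit@controller01
"""
print(fold(conf.splitlines()))
```

In the real thing, the schema files additionally validate and type the values, which is presumably why a missing rabbitmq_management.schema would matter for the management keys.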
participants (1)
-
Eugen Block