[openstack-dev] [magnum] supported OS images and magnum spawn failures for Swarm and Kubernetes

Tobias Urdin tobias.urdin at binero.se
Thu Aug 23 14:46:07 UTC 2018


Now with Fedora Atomic 26, etcd is available but fails to start.

[root@swarm-u2rnie4d4ik6-master-0 ~]# /usr/bin/etcd --name="${ETCD_NAME}" \
    --data-dir="${ETCD_DATA_DIR}" \
    --listen-client-urls="${ETCD_LISTEN_CLIENT_URLS}" --debug
2018-08-23 14:34:15.596516 E | etcdmain: error verifying flags, --advertise-client-urls is required when --listen-client-urls is set explicitly. See 'etcd --help'.
2018-08-23 14:34:15.596611 E | etcdmain: When listening on specific address(es), this etcd process must advertise accessible url(s) to each connected client.
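
The rule etcd reports above can be checked before starting the service. This is a minimal sketch, assuming the two settings are available as the environment variables ETCD_LISTEN_CLIENT_URLS and ETCD_ADVERTISE_CLIENT_URLS; the function name is hypothetical and is not part of etcd or Magnum.

```shell
#!/bin/sh
# Hypothetical pre-flight helper; not part of etcd or Magnum.
# Mirrors the rule from the etcd error above: if client listen URLs are
# set explicitly, advertise URLs must be set as well.
check_etcd_client_urls() {
    if [ -n "${ETCD_LISTEN_CLIENT_URLS:-}" ] && [ -z "${ETCD_ADVERTISE_CLIENT_URLS:-}" ]; then
        echo "ETCD_ADVERTISE_CLIENT_URLS is required when ETCD_LISTEN_CLIENT_URLS is set" >&2
        return 1
    fi
    return 0
}
```

Calling check_etcd_client_urls in the same environment the unit uses returns non-zero in the situation shown in the log above.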

There is an issue where --advertise-client-urls and the TLS flags
--cert-file and --key-file are not passed in the systemd unit file.
Changing the command to:
/usr/bin/etcd --name="${ETCD_NAME}" --data-dir="${ETCD_DATA_DIR}" 
--listen-client-urls="${ETCD_LISTEN_CLIENT_URLS}" 
--advertise-client-urls="${ETCD_ADVERTISE_CLIENT_URLS}" 
--cert-file="${ETCD_PEER_CERT_FILE}" --key-file="${ETCD_PEER_KEY_FILE}"

makes it work. Any thoughts?
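
One way to apply a change like the one above without editing the packaged unit file is a systemd drop-in. This is a minimal sketch, assuming the packaged etcd.service already provides the ETCD_* variables (for example through an EnvironmentFile); the function name is hypothetical, and the quotes around the variables are dropped because systemd passes each ${VAR} as a single argument.

```shell
#!/bin/sh
# Hypothetical helper: prints a systemd drop-in that overrides ExecStart
# for etcd with the flags from the message above.
# Assumption: the packaged etcd.service sets the ETCD_* variables
# (for example via EnvironmentFile); adjust if your unit differs.
render_etcd_dropin() {
    # The quoted 'EOF' keeps ${...} literal so systemd expands it, not the shell.
    cat <<'EOF'
[Service]
# An empty ExecStart= clears the command inherited from the packaged unit.
ExecStart=
ExecStart=/usr/bin/etcd --name=${ETCD_NAME} \
    --data-dir=${ETCD_DATA_DIR} \
    --listen-client-urls=${ETCD_LISTEN_CLIENT_URLS} \
    --advertise-client-urls=${ETCD_ADVERTISE_CLIENT_URLS} \
    --cert-file=${ETCD_PEER_CERT_FILE} \
    --key-file=${ETCD_PEER_KEY_FILE}
EOF
}
```

The output could be written as root to a .conf file under /etc/systemd/system/etcd.service.d/, followed by systemctl daemon-reload and a restart of etcd. This has not been tested against the Magnum-generated unit.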

Best regards
Tobias

On 08/23/2018 03:54 PM, Tobias Urdin wrote:
> Found the issue, I assume I have to use Fedora Atomic 26 until Rocky 
> where I can start using Fedora Atomic 27.
> Will Fedora Atomic 28 be supported for Rocky?
>
> https://bugs.launchpad.net/magnum/+bug/1735381 (Run etcd and flanneld 
> in system containers, In Fedora Atomic 27 etcd and flanneld are 
> removed from the base image.)
> https://review.openstack.org/#/c/524116/ (Run etcd and flanneld in a 
> system container)
>
> Still wondering about the "The Parameter (nodes_affinity_policy) was 
> not provided" when using Mesos + Ubuntu?
>
> Best regards
> Tobias
>
> On 08/23/2018 02:56 PM, Tobias Urdin wrote:
>> Thanks for all of your help everyone,
>>
>> I've been busy with other things but was able to pick up where I left
>> off regarding Magnum.
>> After fixing some issues I have been able to provision a working 
>> Kubernetes cluster.
>>
>> I'm still having issues getting Docker Swarm working. I've tried
>> both Docker and flannel as the networking layer, but neither works.
>> After investigating, the issue seems to be that etcd.service is not
>> installed (the unit file doesn't exist), so the master doesn't work.
>> The minion swarm node is provisioned but cannot join the cluster
>> because there is no etcd.
>>
>> Anybody seen this issue before? I've been digging through all 
>> cloud-init logs and cannot see anything that would cause this.
>>
>> I also have a separate issue: when provisioning using the
>> magnum-ui in Horizon and selecting Ubuntu with Mesos, I get the error
>> "The Parameter (nodes_affinity_policy) was not provided". The
>> nodes_affinity_policy option does have a default value in magnum.conf,
>> so I'm starting to think this might be an issue with the magnum-ui
>> dashboard?
>>
>> Best regards
>> Tobias
>>
>> On 08/04/2018 06:24 PM, Joe Topjian wrote:
>>> We recently deployed Magnum and I've been making my way through 
>>> getting both Swarm and Kubernetes running. I also ran into some 
>>> initial issues. These notes may or may not help, but thought I'd 
>>> share them in case:
>>>
>>> * We're using Barbican for SSL. I have not tried with the internal 
>>> x509keypair.
>>>
>>> * I was only able to get things running with Fedora Atomic 27, 
>>> specifically the version used in the Magnum docs: 
>>> https://docs.openstack.org/magnum/latest/install/launch-instance.html
>>>
>>> Anything beyond that wouldn't even boot in my cloud. I haven't dug 
>>> into this.
>>>
>>> * Kubernetes requires a Cluster Template to have a label of 
>>> cert_manager_api=true set in order for the cluster to fully come up 
>>> (at least, it didn't work for me until I set this).
>>>
>>> As far as troubleshooting methods go, check the cloud-init logs on 
>>> the individual instances to see if any of the "parts" have failed to 
>>> run. Manually re-run the parts on the command-line to get a better 
>>> idea of why they failed. Review the actual script, figure out the 
>>> variable interpolation and how it relates to the Cluster Template 
>>> being used.
>>>
>>> Eventually I was able to get clusters running with the stock 
>>> driver/templates, but wanted to tune them in order to better fit in 
>>> our cloud, so I've "forked" them. This is in no way a slight against 
>>> the existing drivers/templates nor do I recommend doing this until 
>>> you reach a point where the stock drivers won't meet your needs. But 
>>> I mention it because it's possible to do and it's not terribly hard. 
>>> This is still a work-in-progress and a bit hacky:
>>>
>>> https://github.com/cybera/magnum-templates
>>>
>>> Hope that helps,
>>> Joe
>>>
>>> On Fri, Aug 3, 2018 at 6:46 AM, Tobias Urdin <tobias.urdin at binero.se> wrote:
>>>
>>>     Hello,
>>>
>>>     I'm testing around with Magnum and have so far only had issues.
>>>     I've tried deploying Docker Swarm (on Fedora Atomic 27, Fedora
>>>     Atomic 28) and Kubernetes (on Fedora Atomic 27) and haven't been
>>>     able to get it working.
>>>
>>>     Running Queens, is there any information about supported images?
>>>     Is Magnum maintained to support Fedora Atomic still?
>>>     What is in charge of populating the certificates inside the
>>>     instances? This seems to be the root of all the issues. I'm
>>>     not using Barbican but the x509keypair driver;
>>>     is that the reason?
>>>
>>>     Perhaps I missed some documentation that x509keypair does not
>>>     support what I'm trying to do?
>>>
>>>     I've seen the following issues:
>>>
>>>     Docker:
>>>     * Master does not start and listen on TCP because of certificate
>>>     issues
>>>     dockerd-current[1909]: Could not load X509 key pair (cert:
>>>     "/etc/docker/server.crt", key: "/etc/docker/server.key")
>>>
>>>     * Node does not start with:
>>>     Dependency failed for Docker Application Container Engine.
>>>     docker.service: Job docker.service/start failed with result
>>>     'dependency'.
>>>
>>>     Kubernetes:
>>>     * Master etcd does not start because /run/etcd does not exist
>>>     ** When that is created it fails to start because of certificate
>>>     2018-08-03 12:41:16.554257 C | etcdmain: open
>>>     /etc/etcd/certs/server.crt: no such file or directory
>>>
>>>     * Master kube-apiserver does not start because of certificate
>>>     unable to load server certificate: open
>>>     /etc/kubernetes/certs/server.crt: no such file or directory
>>>
>>>     * Master heat script just sleeps forever waiting for port 8080
>>>     to become available (kube-apiserver) so it can never kubectl
>>>     apply the final steps.
>>>
>>>     * Node does not even start and times out when Heat deploys it,
>>>     probably because master never finishes
>>>
>>>     Any help is appreciated; perhaps I've missed something crucial.
>>>     I've not tested Kubernetes on CoreOS yet.
>>>
>>>     Best regards
>>>     Tobias
