[openstack-dev] [magnum] supported OS images and magnum spawn failures for Swarm and Kubernetes
Tobias Urdin
tobias.urdin at binero.se
Thu Aug 23 14:46:07 UTC 2018
Now with Fedora 26 I have etcd available but etcd fails.
[root@swarm-u2rnie4d4ik6-master-0 ~]# /usr/bin/etcd \
--name="${ETCD_NAME}" --data-dir="${ETCD_DATA_DIR}" \
--listen-client-urls="${ETCD_LISTEN_CLIENT_URLS}" --debug
2018-08-23 14:34:15.596516 E | etcdmain: error verifying flags,
--advertise-client-urls is required when --listen-client-urls is set
explicitly. See 'etcd --help'.
2018-08-23 14:34:15.596611 E | etcdmain: When listening on specific
address(es), this etcd process must advertise accessible url(s) to each
connected client.
There is an issue where --advertise-client-urls and the TLS options --cert-file
and --key-file are not passed in the systemd unit file. Changing the command to:
/usr/bin/etcd --name="${ETCD_NAME}" --data-dir="${ETCD_DATA_DIR}" \
--listen-client-urls="${ETCD_LISTEN_CLIENT_URLS}" \
--advertise-client-urls="${ETCD_ADVERTISE_CLIENT_URLS}" \
--cert-file="${ETCD_PEER_CERT_FILE}" --key-file="${ETCD_PEER_KEY_FILE}"
makes it work. Any thoughts?
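
For anyone who wants to try the same change without editing the packaged
unit file, here is a rough sketch of doing it as a systemd drop-in. It
assumes the unit is named etcd.service and that the unit's environment
already defines all of the ETCD_* variables used in the command; the
function name write_etcd_dropin and the file name 10-client-urls.conf are
made up for illustration. It mirrors the command line from this message and
is not taken from the Magnum templates.

```shell
#!/bin/sh
# Sketch only: write a systemd drop-in that overrides ExecStart for etcd.
# The default directory assumes the unit is called etcd.service; the file
# name 10-client-urls.conf is an arbitrary, hypothetical choice.
write_etcd_dropin() {
    dropin_dir="${1:-/etc/systemd/system/etcd.service.d}"
    mkdir -p "$dropin_dir" || return 1
    # The quoted heredoc delimiter keeps ${...} literal, so systemd expands
    # the variables from the unit's environment when the service starts.
    cat > "$dropin_dir/10-client-urls.conf" <<'EOF'
[Service]
# An empty ExecStart= clears the packaged command before a new one is set.
ExecStart=
ExecStart=/usr/bin/etcd \
    --name=${ETCD_NAME} \
    --data-dir=${ETCD_DATA_DIR} \
    --listen-client-urls=${ETCD_LISTEN_CLIENT_URLS} \
    --advertise-client-urls=${ETCD_ADVERTISE_CLIENT_URLS} \
    --cert-file=${ETCD_PEER_CERT_FILE} \
    --key-file=${ETCD_PEER_KEY_FILE}
EOF
}
```

After calling write_etcd_dropin as root, "systemctl daemon-reload" followed
by "systemctl restart etcd" should pick up the override. A drop-in leaves
the packaged unit file untouched, so the change is easy to revert.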
Best regards
Tobias
On 08/23/2018 03:54 PM, Tobias Urdin wrote:
> Found the issue, I assume I have to use Fedora Atomic 26 until Rocky
> where I can start using Fedora Atomic 27.
> Will Fedora Atomic 28 be supported for Rocky?
>
> https://bugs.launchpad.net/magnum/+bug/1735381 (Run etcd and flanneld
> in system containers, In Fedora Atomic 27 etcd and flanneld are
> removed from the base image.)
> https://review.openstack.org/#/c/524116/ (Run etcd and flanneld in a
> system container)
>
> Still wondering about the "The Parameter (nodes_affinity_policy) was
> not provided" when using Mesos + Ubuntu?
>
> Best regards
> Tobias
>
> On 08/23/2018 02:56 PM, Tobias Urdin wrote:
>> Thanks for all of your help everyone,
>>
>> I've been busy with other things but was able to pick up where I left
>> off regarding Magnum.
>> After fixing some issues I have been able to provision a working
>> Kubernetes cluster.
>>
>> I'm still having issues getting Docker Swarm working. I've tried
>> with both Docker and flannel as the networking layer, but
>> neither works. After investigating, the issue seems to be that
>> etcd.service is not installed (the unit file doesn't exist), so the
>> master doesn't work. The minion swarm node is provisioned but cannot
>> join the cluster because there is no etcd.
>>
>> Anybody seen this issue before? I've been digging through all
>> cloud-init logs and cannot see anything that would cause this.
>>
>> I also have a separate issue: when provisioning using the
>> magnum-ui in Horizon and selecting Ubuntu with Mesos I get the error
>> "The Parameter (nodes_affinity_policy) was not provided". The
>> nodes_affinity_policy option does have a default value in magnum.conf,
>> so I'm starting to think this might be an issue with the magnum-ui
>> dashboard?
>>
>> Best regards
>> Tobias
>>
>> On 08/04/2018 06:24 PM, Joe Topjian wrote:
>>> We recently deployed Magnum and I've been making my way through
>>> getting both Swarm and Kubernetes running. I also ran into some
>>> initial issues. These notes may or may not help, but thought I'd
>>> share them in case:
>>>
>>> * We're using Barbican for SSL. I have not tried with the internal
>>> x509keypair.
>>>
>>> * I was only able to get things running with Fedora Atomic 27,
>>> specifically the version used in the Magnum docs:
>>> https://docs.openstack.org/magnum/latest/install/launch-instance.html
>>>
>>> Anything beyond that wouldn't even boot in my cloud. I haven't dug
>>> into this.
>>>
>>> * Kubernetes requires a Cluster Template to have a label of
>>> cert_manager_api=true set in order for the cluster to fully come up
>>> (at least, it didn't work for me until I set this).
>>>
>>> As far as troubleshooting methods go, check the cloud-init logs on
>>> the individual instances to see if any of the "parts" have failed to
>>> run. Manually re-run the parts on the command-line to get a better
>>> idea of why they failed. Review the actual script, figure out the
>>> variable interpolation and how it relates to the Cluster Template
>>> being used.
>>>
>>> Eventually I was able to get clusters running with the stock
>>> driver/templates, but wanted to tune them in order to better fit in
>>> our cloud, so I've "forked" them. This is in no way a slight against
>>> the existing drivers/templates nor do I recommend doing this until
>>> you reach a point where the stock drivers won't meet your needs. But
>>> I mention it because it's possible to do and it's not terribly hard.
>>> This is still a work-in-progress and a bit hacky:
>>>
>>> https://github.com/cybera/magnum-templates
>>>
>>> Hope that helps,
>>> Joe
>>>
>>> On Fri, Aug 3, 2018 at 6:46 AM, Tobias Urdin <tobias.urdin at binero.se> wrote:
>>>
>>> Hello,
>>>
>>> I'm testing around with Magnum and have so far only had issues.
>>> I've tried deploying Docker Swarm (on Fedora Atomic 27, Fedora
>>> Atomic 28) and Kubernetes (on Fedora Atomic 27) and haven't been
>>> able to get it working.
>>>
>>> Running Queens, is there any information about supported images?
>>> Is Magnum still maintained to support Fedora Atomic?
>>> What is in charge of populating the certificates inside the
>>> instances? This seems to be the root of all the issues. I'm
>>> not using Barbican but the x509keypair driver;
>>> is that the reason?
>>>
>>> Perhaps I missed some documentation that x509keypair does not
>>> support what I'm trying to do?
>>>
>>> I've seen the following issues:
>>>
>>> Docker:
>>> * Master does not start and listen on TCP because of certificate
>>> issues
>>> dockerd-current[1909]: Could not load X509 key pair (cert:
>>> "/etc/docker/server.crt", key: "/etc/docker/server.key")
>>>
>>> * Node does not start with:
>>> Dependency failed for Docker Application Container Engine.
>>> docker.service: Job docker.service/start failed with result
>>> 'dependency'.
>>>
>>> Kubernetes:
>>> * Master etcd does not start because /run/etcd does not exist
>>> ** When that is created it fails to start because of a missing certificate
>>> 2018-08-03 12:41:16.554257 C | etcdmain: open
>>> /etc/etcd/certs/server.crt: no such file or directory
>>>
>>> * Master kube-apiserver does not start because of a missing certificate
>>> unable to load server certificate: open
>>> /etc/kubernetes/certs/server.crt: no such file or directory
>>>
>>> * Master heat script just sleeps forever waiting for port 8080
>>> to become available (kube-apiserver) so it can never kubectl
>>> apply the final steps.
>>>
>>> * Node does not even start and times out when Heat deploys it,
>>> probably because master never finishes
>>>
>>> Any help is appreciated; perhaps I've missed something crucial.
>>> I've not tested Kubernetes on CoreOS yet.
>>>
>>> Best regards
>>> Tobias
>>>