[openstack-dev] Openstack-Zun Service Appears down
Hongbin Lu
hongbin034 at gmail.com
Sat Jun 23 04:29:25 UTC 2018
Hi Muhammad,
I am not sure what the exact problem is, but here is a list of things you
might want to check:
1. Make sure the container's security group allows the traffic you are
testing. This document explains how to find the security group(s) of the
container:
https://docs.openstack.org/zun/latest/admin/security-groups.html#find-container-s-security-groups
2. Check whether you can ping the container from outside it (here, from the
network's DHCP namespace):
$ NET_ID=$(openstack network show private | awk '/ id /{print $4}')
$ sudo ip netns | grep $NET_ID
qdhcp-6d688072-a0c3-4f1c-979e-2d1882564931
$ sudo ip netns exec qdhcp-6d688072-a0c3-4f1c-979e-2d1882564931 ping 10.0.0.9
PING 10.0.0.9 (10.0.0.9) 56(84) bytes of data.
64 bytes from 10.0.0.9: icmp_seq=1 ttl=64 time=0.845 ms
64 bytes from 10.0.0.9: icmp_seq=2 ttl=64 time=0.258 ms
...
3. Check whether you can ping an outside address from inside the container:
$ zun list
...
| 2c5d01ef-11f9-46f6-8ef2-da59914b6a10 | pi-24-container | nginx | Running | None | 10.0.0.9, fd66:a11d:c60:0:f816:3eff:fe9c:8f46 | [80] |
...
$ CONTAINER_ID=$(zun show pi-24-container | awk '/ uuid /{print $4}')
$ docker ps | grep $CONTAINER_ID
f9cd8aa9a911   nginx:latest   "nginx -g 'daemon of…"   21 minutes ago   Up 21 minutes   zun-2c5d01ef-11f9-46f6-8ef2-da59914b6a10
$ docker inspect -f '{{.State.Pid}}' f9cd8aa9a911
15001
$ sudo nsenter -t 15001 -n ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=49 time=1.46 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=49 time=0.957 ms
4. Run traceroute from inside the container:
$ sudo nsenter -t 15001 -n traceroute 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets
1 ip-10-0-0-1.ec2.internal (10.0.0.1) 3.173 ms 3.154 ms 3.139 ms
2 ip-172-24-4-1.ec2.internal (172.24.4.1) 3.086 ms 3.074 ms 3.064 ms
...
5. Run ping and tcpdump on various interfaces to find where the traffic is
being blocked, for example:
$ sudo ip netns exec qdhcp-6d688072-a0c3-4f1c-979e-2d1882564931 tcpdump -i tap2813b3ae-8d
...
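The namespace lookup in step 2 and the interface name in step 5 can be derived with two small helpers, sketched below under stated assumptions. The function names `dhcp_netns_for` and `neutron_tap_name` are hypothetical and not part of any OpenStack tool. The first keeps only the first field of `ip netns` output, since newer iproute2 versions may append an ` (id: N)` suffix that breaks copy-and-paste into `ip netns exec`. The second assumes Neutron's convention of naming a port's tap device `tap` plus the first 11 characters of the port UUID, which is consistent with `tap2813b3ae-8d` above.

```shell
#!/bin/sh
# Hypothetical helpers for the ping/tcpdump checks; a sketch, not OpenStack code.

# Print the DHCP namespace of a Neutron network, reading `ip netns` output
# on stdin. Only the first field is compared and printed, so an
# " (id: N)" suffix on the line is ignored.
dhcp_netns_for() {
    local net_id="$1"
    awk -v ns="qdhcp-${net_id}" '$1 == ns { print $1 }'
}

# Print the tap device name for a Neutron port ID, assuming the
# "tap" + first 11 characters of the port UUID naming convention.
# %.11s truncates the port ID to 11 characters.
neutron_tap_name() {
    local port_id="$1"
    printf 'tap%.11s\n' "$port_id"
}
```

Possible usage, assuming a single DHCP port on the network named "private":

$ NET_ID=$(openstack network show private -f value -c id)
$ NS=$(sudo ip netns | dhcp_netns_for "$NET_ID")
$ sudo ip netns exec "$NS" ping -c 3 10.0.0.9
$ DHCP_PORT=$(openstack port list --network private --device-owner network:dhcp -f value -c ID)
$ sudo ip netns exec "$NS" tcpdump -n -i "$(neutron_tap_name "$DHCP_PORT")" icmp

Likewise, since the `docker ps` output above names the container `zun-` followed by its Zun UUID, `docker inspect -f '{{.State.Pid}}' "zun-$CONTAINER_ID"` would probably work without the grep; that naming is inferred from the output above only.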
See whether the steps above turn up anything. In any case, you might
consider restarting the Neutron processes, which might magically fix
everything. If not, I would need more details about your setup.
Best regards,
Hongbin
On Fri, Jun 22, 2018 at 5:05 AM Usman Awais <usman.awais at gmail.com> wrote:
> Hi Hongbin,
>
> Many thanks, I got it running, that was awesome :) The problem was
> unsynchronized time. I installed chrony and it started working.
>
> Now I am running into another problem: container networking.
> The container starts, I can shell into it through the appcontainer API,
> and it even gets the correct IP address on my private network (named
> priv-net) in OpenStack through DHCP. But when I ping any address, even
> the address of priv-net's gateway, I get no reply. I have the following
> configuration:
>
> neutron-openvswitch-agent is running
> neutron-ovs-cleanup is running
> neutron-destroy-patch-ports is running
> kuryr-libnetwork is running
> docker is running
> zun-compute is running
>
> The eth0 network card has the standard configuration for an OVSBridge.
>
> When I create a new container, taps and patch ports are also created on
> the compute node. Next I am going to try the kuryr script to test what
> happens with "bridged" and "host" networks.
>
> Muhammad Usman Awais
>
>
>
> On Thu, Jun 21, 2018 at 1:14 PM, Hongbin Lu <hongbin034 at gmail.com> wrote:
>
>> Hi Muhammad,
>>
>> Here is the code (it runs on the controller node) that decides whether a
>> service is up:
>> https://github.com/openstack/zun/blob/master/zun/api/servicegroup.py
>> Several things can cause a service to be reported as 'down':
>> 1. The service was forced down via the API (e.g. someone explicitly
>> issued a command like "appcontainer service forcedown").
>> 2. The zun compute process has not sent a heartbeat for longer than a
>> certain period of time (CONF.service_down_time).
>> 3. The zun compute process is sending heartbeats properly, but the clocks
>> on the controller node and the compute node are out of sync.
>>
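The decision described by the three cases above can be sketched as a small shell function. This is only an illustration under assumptions, not Zun's code: the function name `service_state` is hypothetical, timestamps are passed as Unix epoch seconds, and it assumes the check compares the absolute elapsed time, so a heartbeat that appears to come from the future (compute clock ahead of the controller) also counts as down. servicegroup.py remains the authoritative logic.

```shell
#!/bin/sh
# Hypothetical sketch of the up/down decision; not Zun's implementation.
# Usage: service_state FORCED_DOWN LAST_HEARTBEAT NOW SERVICE_DOWN_TIME
#   FORCED_DOWN        "true" if the service was forced down via the API
#   LAST_HEARTBEAT     last recorded heartbeat, in epoch seconds
#   NOW                controller's current time, in epoch seconds
#   SERVICE_DOWN_TIME  threshold in seconds (CONF.service_down_time)
service_state() {
    local forced_down="$1"
    local last_heartbeat="$2"
    local now="$3"
    local down_time="$4"
    # Case 1: explicitly forced down.
    if [ "$forced_down" = "true" ]; then
        echo down
        return 0
    fi
    local elapsed=$(( now - last_heartbeat ))
    # Assumed absolute-value check: a heartbeat stamped in the future
    # (compute clock ahead) is treated like a stale one.
    if [ "$elapsed" -lt 0 ]; then
        elapsed=$(( -elapsed ))
    fi
    # Cases 2 and 3: no heartbeat within the window, or clock skew
    # larger than the window.
    if [ "$elapsed" -le "$down_time" ]; then
        echo up
    else
        echo down
    fi
}
```

With an example threshold of 60 seconds (not necessarily Zun's default), `service_state false 1000 1030 60` prints `up`, while `service_state false 1300 1000 60` prints `down` because the heartbeat appears 300 seconds in the future.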
>> In the past, #3 has been the most common pitfall. If it is not #3, you
>> might want to check whether the zun compute process is sending heartbeats
>> properly. Each zun compute process runs a periodic task that updates its
>> state in the DB:
>> https://github.com/openstack/zun/blob/master/zun/servicegroup/zun_service_periodic.py
>> The call to 'report_state_up' records in the DB that this service is up
>> at the current time. You might want to check whether this periodic task
>> is running properly, and whether the service's state is actually being
>> updated in the DB.
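One way to check whether heartbeats are arriving is to compare the "Updated At" value from `openstack appcontainer service list` with the controller's current UTC time, repeating this a few times. A minimal sketch, assuming GNU `date` and that "Updated At" is shown in UTC; the function name `heartbeat_age` is hypothetical. An age that keeps growing suggests the periodic task is not reporting, and a negative age suggests the compute node's clock is ahead of the controller's.

```shell
#!/bin/sh
# Hypothetical helper: seconds between an "Updated At" timestamp
# (assumed UTC, "YYYY-MM-DD HH:MM:SS") and a reference time.
# Usage: heartbeat_age "2018-06-20 13:14:31" [NOW_EPOCH]
# NOW_EPOCH defaults to the current time; requires GNU date for -d.
heartbeat_age() {
    local updated_at="$1"
    local now="${2:-$(date -u +%s)}"
    local updated_epoch
    # -u makes date interpret the input timestamp as UTC.
    updated_epoch=$(date -u -d "$updated_at" +%s) || return 1
    echo $(( now - updated_epoch ))
}
```

For example, running `heartbeat_age "2018-06-20 13:14:31"` on the controller and comparing the result with CONF.service_down_time indicates which of cases 2 and 3 is more likely.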
>>
>> Above is my best guess. Please feel free to follow up with me or other
>> team members if you need further assistance with this issue.
>>
>> Best regards,
>> Hongbin
>>
>> On Wed, Jun 20, 2018 at 9:14 AM Usman Awais <usman.awais at gmail.com>
>> wrote:
>>
>>> Dear Zuners,
>>>
>>> I have installed RDO Pike. I stopped the openstack-nova-compute service
>>> on one of the hosts and installed a zun-compute service. Although all the
>>> services seem to be running OK on both the controller and compute nodes,
>>> when I run
>>>
>>> openstack appcontainer service list
>>>
>>> It gives me the following:
>>>
>>>
>>> +----+--------------+-------------+-------+----------+-----------------+---------------------+-------------------+
>>> | Id | Host         | Binary      | State | Disabled | Disabled Reason | Updated At          | Availability Zone |
>>> +----+--------------+-------------+-------+----------+-----------------+---------------------+-------------------+
>>> | 1  | node1.os.lab | zun-compute | down  | False    | None            | 2018-06-20 13:14:31 | nova              |
>>> +----+--------------+-------------+-------+----------+-----------------+---------------------+-------------------+
>>>
>>> I have checked all ports in both directions and they are open, including
>>> the etcd ports and others. All services are running; only the docker
>>> service shows a warning message saying "failed to retrieve docker-runc
>>> version: exec: \"docker-runc\": executable file not found in $PATH".
>>> There is also a message from zun-compute:
>>> "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/default_comparator.py:161:
>>> SAWarning: The IN-predicate on "container.uuid" was invoked with an empty
>>> sequence. This results in a contradiction, which nonetheless can be
>>> expensive to evaluate. Consider alternative strategies for improved
>>> performance."
>>>
>>> Please advise.
>>>
>>> Regards,
>>> Muhammad Usman Awais
>>>
>>>
>>>
>>> __________________________________________________________________________
>>> OpenStack Development Mailing List (not for usage questions)
>>> Unsubscribe:
>>> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>
>>
>