[Openstack] VM can't ping self floating IP after a snapshot is taken

Evan Callicoat diopter at gmail.com
Thu Aug 23 20:38:12 UTC 2012


Hello all!

I'm the original author of the hairpin patch, and things have changed a
little bit in Essex and Folsom from the original Diablo target. I believe I
can shed some light on what should be done here to solve the issue in
either case.

---
For Essex (stable/essex), in nova/virt/libvirt/connection.py:
---

Currently _enable_hairpin() is only being called from spawn(). However,
spawn() is not the only place that vifs (veth#) get added to a bridge
(which is when we need to enable hairpin_mode on them). The more relevant
function is _create_new_domain(), which is called from spawn() and other
places. Without changing the information that gets passed to
_create_new_domain() (which is just 'xml' from to_xml()), we can easily
rewrite the first 2 lines in _enable_hairpin(), as follows:

def _enable_hairpin(self, xml):
    interfaces = self.get_interfaces(xml['name'])

Then, we can move the self._enable_hairpin(instance) call from spawn() up
into _create_new_domain(), and pass it xml as follows:

[...]
self._enable_hairpin(xml)
return domain

This will run the hairpin code every time a domain gets created, which is
also when the domain's vif(s) get inserted into the bridge with the
default of hairpin_mode=0.
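
As a rough illustration of what "enabling hairpin" amounts to for a single
vif, the sketch below writes 1 to the port's hairpin_mode file under the
bridge's brif directory, which is the same file the manual workaround quoted
later in this thread writes to. This is not nova's code: the function name,
the sysfs_root parameter (there only so the helper can be tried against a
scratch directory), and the example names are assumptions for illustration.

```python
import os


def enable_hairpin(bridge, port, sysfs_root="/sys/class/net"):
    """Set hairpin_mode=1 for one port of a Linux bridge.

    Hypothetical helper, not part of nova. Writing to the real sysfs
    tree requires root privileges.
    """
    path = os.path.join(sysfs_root, bridge, "brif", port, "hairpin_mode")
    with open(path, "w") as f:
        f.write("1")
    return path
```

For example, enable_hairpin("br1000", "<virtual-interface-name>") run as root
would mirror the manual workaround for one interface.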

---
For Folsom (trunk), in nova/virt/libvirt/driver.py:
---

There've been a lot more changes made here, but the same strategy as above
should work. Here, _create_new_domain() has been split into
_create_domain() and _create_domain_and_network(), and _enable_hairpin()
was moved from spawn() to _create_domain_and_network(). That seems like
it'd be the right thing to do, but it doesn't quite cover all of the cases
of vif reinsertion, since _create_domain() is the only function which
actually creates the domain (_create_domain_and_network() just calls it
after doing some pre-work). The solution here is likewise fairly simple:
make the same changes to the first 2 lines of _enable_hairpin():

def _enable_hairpin(self, xml):
    interfaces = self.get_interfaces(xml['name'])

And move it from _create_domain_and_network() to _create_domain(), like
before:

[...]
self._enable_hairpin(xml)
return domain
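
To illustrate why the placement of the call matters, here is a toy model of
the call chain. None of it is nova code: ToyDriver, its hairpin dictionary,
and recreate_after_snapshot() are hypothetical names, and it assumes (as this
post does) that 'xml' is a dictionary with a 'name' key. It only shows that
when _enable_hairpin() lives in _create_domain(), every path that creates a
domain ends with hairpin enabled.

```python
class ToyDriver:
    """Toy stand-in for the libvirt driver; not nova code."""

    def __init__(self):
        # Maps domain name -> hairpin flag of its (single) vif.
        self.hairpin = {}

    def _enable_hairpin(self, xml):
        self.hairpin[xml['name']] = 1

    def _create_domain(self, xml):
        # Creating a domain (re)inserts the vif into the bridge with
        # hairpin off, so the flag is reset here.
        self.hairpin[xml['name']] = 0
        domain = xml['name']
        self._enable_hairpin(xml)
        return domain

    def _create_domain_and_network(self, xml):
        # Pre-work would happen here before the domain is created.
        return self._create_domain(xml)

    def spawn(self, xml):
        return self._create_domain_and_network(xml)

    def recreate_after_snapshot(self, xml):
        # Hypothetical path that calls _create_domain() directly.
        return self._create_domain(xml)
```

If the _enable_hairpin() call sat in _create_domain_and_network() instead,
the hypothetical direct path above would leave the flag at 0.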

I haven't yet tested this on my Essex clusters and I don't have a Folsom
cluster handy at present, but the change is simple and makes sense. Looking
at to_xml() and _prepare_xml_info(), it appears that the 'xml' variable
_create_[new_]domain() gets is just a python dictionary, and xml['name'] =
instance['name'], exactly what _enable_hairpin() was using the 'instance'
variable for previously.
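
Since the change is untested, one way to check its effect is to read the
hairpin_mode file of every port on the bridge before and after taking a
snapshot. The helper below is a minimal sketch for that; the function name
and the sysfs_root parameter are assumptions for illustration, and the
bridge name in the usage note is the one from the workaround quoted below.

```python
import os


def hairpin_status(bridge, sysfs_root="/sys/class/net"):
    """Return {port_name: hairpin_mode} for every port on the bridge.

    Hypothetical helper, not part of nova.
    """
    brif = os.path.join(sysfs_root, bridge, "brif")
    status = {}
    for port in sorted(os.listdir(brif)):
        with open(os.path.join(brif, port, "hairpin_mode")) as f:
            status[port] = int(f.read().strip())
    return status
```

Calling hairpin_status("br1000") on the compute node should show 1 for the
instance's vif after a snapshot if the fix is doing its job.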

Let me know if this works, or doesn't work, or doesn't make sense, or if
you need an address to send gifts, etc. Hope it's solved!

-Evan

On Thu, Aug 23, 2012 at 11:20 AM, Sam Su <susltd.su at gmail.com> wrote:

> Hi Oleg,
>
> Thank you for your investigation. Good luck!
>
> Can you let me know if you find out how to fix the bug?
>
> Thanks,
> Sam
>
> On Wed, Aug 22, 2012 at 12:50 PM, Oleg Gelbukh <ogelbukh at mirantis.com> wrote:
>
>> Hello,
>>
>> Is it possible that, during snapshotting, libvirt just tears down the
>> virtual interface at some point and then re-creates it, with hairpin_mode
>> disabled again?
>> This bugfix [https://bugs.launchpad.net/nova/+bug/933640] implies that the
>> fix is applied when the instance is spawned. This means that upon resume
>> after a snapshot, hairpin is not restored. Maybe inserting the
>> _enable_hairpin() call into the snapshot procedure would help.
>> We're currently investigating this issue in one of our environments and
>> hope to come up with an answer by tomorrow.
>>
>> --
>> Best regards,
>> Oleg
>>
>>  On Wed, Aug 22, 2012 at 11:29 PM, Sam Su <susltd.su at gmail.com> wrote:
>>
>>> My friend has found a way to let the VM ping itself again once this
>>> problem has happened, but has not found out why it happens:
>>> echo 1 | sudo tee /sys/class/net/br1000/brif/<virtual-interface-name>/hairpin_mode
>>>
>>> I filed a ticket to report this problem:
>>> https://bugs.launchpad.net/nova/+bug/1040255
>>>
>>> Hopefully someone can find out why this happens and solve it.
>>>
>>> Thanks,
>>> Sam
>>>
>>>
>>> On Fri, Jul 20, 2012 at 3:50 PM, Gabriel Hurley <
>>> Gabriel.Hurley at nebula.com> wrote:
>>>
>>>> I ran into some similar issues with the _enable_hairpin() call. The
>>>> call is allowed to fail silently and (in my case) was failing. I couldn’t
>>>> for the life of me figure out why, though, and since I’m really not a
>>>> networking person I didn’t trace it along too far.
>>>>
>>>> Just thought I’d share my similar pain.
>>>>
>>>> - Gabriel
>>>>
>>>> From: openstack-bounces+gabriel.hurley=nebula.com at lists.launchpad.net
>>>> On Behalf Of Sam Su
>>>> Sent: Thursday, July 19, 2012 11:50 AM
>>>> To: Brian Haley
>>>> Cc: openstack
>>>> Subject: Re: [Openstack] VM can't ping self floating IP after a
>>>> snapshot is taken
>>>>
>>>> Thank you for your support.
>>>>
>>>> I checked the file nova/virt/libvirt/connection.py, and the statement
>>>> self._enable_hairpin(instance) has already been added to the
>>>> function _hard_reboot().
>>>>
>>>> It looks like there is some difference between taking a snapshot and
>>>> rebooting an instance. I tried to figure out how to fix this bug but
>>>> failed.
>>>>
>>>> It would be much appreciated if anyone could give some hints.
>>>>
>>>> Thanks,
>>>> Sam
>>>>
>>>> On Thu, Jul 19, 2012 at 8:37 AM, Brian Haley <brian.haley at hp.com> wrote:
>>>>
>>>> On 07/17/2012 05:56 PM, Sam Su wrote:
>>>> > Hi,
>>>> >
>>>> > This always happens in the Essex release. After I take a snapshot of
>>>> > my VM (I tried Ubuntu 12.04 and CentOS 5.8), the VM can't ping its own
>>>> > floating IP; before I take a snapshot, though, the VM can ping its own
>>>> > floating IP.
>>>> >
>>>> > This looks closely related to
>>>> > https://bugs.launchpad.net/nova/+bug/933640, but is still a little
>>>> > different. In 933640, it sounds like the VM can't ping its own
>>>> > floating IP regardless of whether we take a snapshot or not.
>>>> >
>>>> > Any suggestions for an easy fix? And what is the root cause of the
>>>> > problem?
>>>>
>>>> It might be because there's a missing _enable_hairpin() call in the
>>>> reboot() function.  Try something like this...
>>>>
>>>> nova/virt/libvirt/connection.py, _hard_reboot():
>>>>
>>>>              self._create_new_domain(xml)
>>>> +            self._enable_hairpin(instance)
>>>>              self.firewall_driver.apply_instance_filter(instance, network_info)
>>>>
>>>> At least that's what I remember doing myself recently when testing
>>>> after a reboot; I don't know about snapshot.
>>>>
>>>> Folsom has changed enough that something different would need to be
>>>> done there.
>>>>
>>>> -Brian
>>>>
>>>
>>>
>>> _______________________________________________
>>> Mailing list: https://launchpad.net/~openstack
>>> Post to     : openstack at lists.launchpad.net
>>> Unsubscribe : https://launchpad.net/~openstack
>>> More help   : https://help.launchpad.net/ListHelp
>>>
>>>
>>
>