Hello Eugen,

IP configuration on the VM is happening via Neutron DHCP.
Yesterday, however, we enabled config-drive metadata on the Ubuntu guest image, and after that we still faced the same issue.
We then restarted memcached, the Neutron metadata agent, the Neutron DHCP agent and the Neutron L3 agent, and added an ingress security group rule for TCP with remote IP 169.254.169.254. I launched a VM immediately afterwards, but it didn't work.
When I tried again about 8 hours later, i.e. this morning, with Debian, CentOS and Ubuntu guest images, everything worked fine.
This is very strange behaviour: it had not been working for the last two days, but now it has started working.

Could you please share your thoughts or a root cause analysis (RCA) for this?
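
The two failures reported in this thread are not the same: the curl test quoted below ended in "connection refused", while the cloud-init log from 2022 ended in a timeout. A refusal means something answered and rejected the connection, whereas a timeout means nothing answered at all. As a rough aid for telling them apart from inside a guest, here is a minimal Python sketch. The function name, the return format and the 3-second default timeout are my own assumptions, and it only tests the TCP connection, not the HTTP response.

```python
import socket


def metadata_reachable(host="169.254.169.254", port=80, timeout=3.0):
    """Try a plain TCP connect and report how it ended.

    Returns (ok, reason). Only the TCP handshake is tested, so a True
    result does not prove that the metadata API answers correctly.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except ConnectionRefusedError:
        # Something replied, but nothing is listening on that port.
        return False, "connection refused"
    except socket.timeout:
        # No reply at all within the timeout.
        return False, "timed out"
    except OSError as exc:
        # Anything else, e.g. no route to host.
        return False, f"error: {exc}"
```

Calling `metadata_reachable()` inside the VM right after boot, and again later, would at least show whether the failure mode changes over time.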


Thanks 
Arihant Jain 



On Wed, 9 Oct, 2024, 2:48 pm Eugen Block, <eblock@nde.ag> wrote:
Hi,

for external networks you will need to inject metadata via 
config-drive. Does your VM have the IP configured which neutron 
assigned to it?
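
To check from inside the guest whether a config drive was attached and actually carries the key pair, something like the following Python sketch could be used. It assumes the drive has already been mounted by hand (for example from the device labelled config-2) and that it follows the usual OpenStack layout with openstack/latest/meta_data.json. The mount point and the function name are hypothetical.

```python
import json
from pathlib import Path


def read_config_drive_keys(mount_point):
    """Return the {key_name: public_key} mapping found on a mounted config drive.

    mount_point is wherever the config drive was mounted, for example
    /mnt/config (an assumed path, not something the image sets up itself).
    """
    meta_path = Path(mount_point) / "openstack" / "latest" / "meta_data.json"
    if not meta_path.is_file():
        raise FileNotFoundError(f"no config drive metadata at {meta_path}")
    with meta_path.open() as handle:
        meta = json.load(handle)
    # An instance booted without a key pair may have no "public_keys" entry.
    return meta.get("public_keys", {})


# Example, assuming the drive is mounted at /mnt/config:
#   print(read_config_drive_keys("/mnt/config"))
```

An empty result would suggest the instance was created without a key pair, while a FileNotFoundError would suggest that no config drive is mounted at that path.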

Quoting AJ_ sunny <jains8550@gmail.com>:

> Hello team,
>
> I am also facing a similar problem: cloud-init is not able to push the
> key pair into the image, so I am not able to SSH into the VM.
>
> Inside the VM, running curl http://169.254.169.254/openstack returns:
> Failed to connect to 169.254.169.254 port 80 connection refused
>
>
> A VM with a direct external IP cannot be reached over SSH,
> but a VM on a tenant network with a floating IP can. This is a very
> strange scenario.
>
> In the Neutron logs I am also getting this error:
> Unexpected number of DHCP interface for metadata proxy expected 1, got2
>
>
> Please provide assistance with this.
>
>
> Thanks
> Arihant Jain
>
> On Fri, 18 Nov, 2022, 7:24 am Tobias McNulty, <tobias@caktusgroup.com>
> wrote:
>
>> Thank you all for the helpful responses and suggestions. I tried these
>> steps, but I am afraid the problem was user error.
>>
>> I thought I had adequately tested the internal network previously, but
>> that was not the case. cloud-init and security groups now appear to
>> work seamlessly on an internal subnet. Furthermore, floating IPs from
>> the external subnet are properly allocated and are reachable from the
>> LAN.
>>
>> I believe the issue was that I accidentally left DHCP disabled on the
>> internal subnet previously. When I disable DHCP on the internal subnet
>> now, a new instance will hang for ~400-500 seconds at this point in
>> the boot process:
>>
>>          Starting Load AppArmor pro…managed internally by snapd...
>>          Starting Initial cloud-init job (pre-networking)...
>>          Mounting Arbitrary Executable File Formats File System...
>> [  OK  ] Mounted Arbitrary Executable File Formats File System.
>> [    7.673299] cloud-init[508]: Cloud-init v. 22.3.4-0ubuntu1~22.04.1 running 'init-local' at Fri, 18 Nov 2022 01:18:29 +0000. Up 7.61 seconds.
>> [  OK  ] Finished Load AppArmor pro…s managed internally by snapd.
>>
>> Eventually the instance finishes booting and displays the timeout
>> attempting to reach 169.254.169.254:
>>
>> [  430.150383] cloud-init[551]: Cloud-init v. 22.3.4-0ubuntu1~22.04.1
>> running 'init' at Fri, 18 Nov 2022 01:25:31 +0000. Up 430.12 seconds.
>> <snip>
>> [  430.210288] cloud-init[551]: 2022-11-18 01:25:31,748 -
>> url_helper.py[ERROR]: Timed out, no response from urls:
>> ['http://169.254.169.254/openstack']
>> [  430.217100] cloud-init[551]: 2022-11-18 01:25:31,749 -
>> util.py[WARNING]: No active metadata service found
>>
>> In summary, I believe that:
>>
>> * cloud-init will time out if DHCP is disabled (presumably because it
>> has no IP with which to make a request?)
>> * Security groups may not work as expected for instances created in an
>> external subnet. The proper configuration is to create instances in a
>> virtual subnet and assign floating IPs from the external subnet.
>>
>> Hopefully this message is helpful to someone in the future, and thank
>> you all for your patience and support!
>>
>> Tobias
>>
>> On Tue, Nov 15, 2022 at 12:27 PM Sean Mooney <smooney@redhat.com> wrote:
>> >
>> > On Tue, 2022-11-15 at 09:02 -0800, Clark Boylan wrote:
>> > > On Tue, Nov 15, 2022, at 6:14 AM, Tobias McNulty wrote:
>> > > > As an update, I tried the non-HWE kernel with the same result. Could it
>> > > > be a hardware/driver issue with the 10G NICs? It's so repeatable. I'll
>> > > > look into finding some other hardware to test with.
>> > > >
>> > > > Has anyone else experienced such a complete failure with cloud-init
>> > > > and/or security groups, and do you have any advice on how I might
>> > > > continue to debug this?
>> > >
>> > > I'm not sure this will be helpful since you seem to have narrowed down
>> > > the issue to VM networking, but here are some of the things that I do when
>> > > debugging boot time VM setup failures:
>> > >
>> > > * Use config drive instead of metadata service. The metadata service
>> > >   hasn't always been reliable.
>> > > * Bake information like DHCP config for interfaces and user ssh keys
>> > >   into an image and boot that. This way you don't need to rely on actions
>> > >   taken at boot time.
>> > > * Use a different boot time configurator tool. Glean is the one the
>> > >   OpenDev team uses for test nodes. When I debug things there I tend to test
>> > >   with cloud-init to compare glean behavior. But you can do this in reverse.
>> > >
>> > > Again, I'm not sure this is helpful in this specific instance. But
>> > > thought I'd send it out anyway to help those who may land here through
>> > > Google search in the future.
>> >
>> > One thing that you should check, in addition to considering the above,
>> > is that the nova API is configured to use memcache.
>> >
>> > cloud-init only retries requests until the first request succeeds.
>> > Once the first request works, it assumes that the rest will. If you are
>> > using a load balancer and multiple nova-metadata-api processes
>> > without memcache, and it takes more than 10-30 seconds (I can't recall how
>> > long cloud-init waits) to build the metadata response, then
>> > cloud-init can fail. Basically, if the second request needs to rebuild
>> > everything again because it's not in a shared cache (memcache),
>> > then the request can time out and cloud-init won't try again.
>> >
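
The failure mode described just above can be illustrated with a small toy model in Python. This is not nova or cloud-init code. The worker class, the routing list, the 40-second build time and the 10-second client timeout are all made-up assumptions, chosen only to show why a shared cache matters when the client retries just the first request.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """One simulated metadata API process with its own (or a shared) cache."""
    cache: set = field(default_factory=set)


def fetch(worker, instance_id, build_seconds, timeout_seconds):
    """Return True if the simulated request finishes within the client timeout."""
    if instance_id in worker.cache:
        return True  # warm cache: the answer is treated as immediate
    # Cold cache: assume the build still completes in the background,
    # so the next request to this cache will be fast.
    worker.cache.add(instance_id)
    return build_seconds <= timeout_seconds


def boot(route, shared_cache, build_seconds=40, timeout_seconds=10):
    """Simulate one boot. route lists which worker each request lands on."""
    shared = set()
    workers = [Worker(shared if shared_cache else set())
               for _ in range(max(route) + 1)]
    hops = iter(route)
    # Phase 1: retry until the first request succeeds.
    for index in hops:
        if fetch(workers[index], "vm-1", build_seconds, timeout_seconds):
            break
    else:
        return "no metadata service found"
    # Phase 2: every later request gets a single attempt.
    for index in hops:
        if not fetch(workers[index], "vm-1", build_seconds, timeout_seconds):
            return "follow-up request timed out"
    return "ok"


# Two tries on worker 0, then the follow-up lands on worker 1.
print(boot([0, 0, 1], shared_cache=False))
print(boot([0, 0, 1], shared_cache=True))
```

With per-process caches the follow-up request hits a cold worker and times out, and with a shared cache it does not. Real deployments differ in routing and timing, so this only shows the shape of the problem.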
>> > >
>> > > >
>> > > > Many thanks,
>> > > > Tobias
>> > >
>> >
>>
>>