[openstack-dev] [Fuel] Order of network interfaces for bootstrap nodes

Dmitriy Shulyak dshulyak at mirantis.com
Thu Nov 20 20:01:55 UTC 2014


Hi folks,

There was some interesting research today into the random ordering of NICs on
nodes in the bootstrap stage, and in my opinion it deserves a separate thread.
I will try to describe the problem and several ways to solve it.
Maybe I am missing a simpler way; if you see one, please join the discussion.
Link to LP bug: https://bugs.launchpad.net/fuel/+bug/1394466

When a node boots for the first time, it registers its interfaces in nailgun.
Here is a sample of the data (only the parts relevant to this discussion):
- name: eth0
  ip: 10.0.0.3/24
  mac: 00:00:03
- name: eth1
  ip: None
  mac: 00:00:04
* eth0 is the admin network interface that was used for the initial PXE boot

We have networks; for simplicity, let's assume there are two:
 - admin
 - public
When the node is added to a cluster, you will generally see the following schema:
- name: eth0
  ip: 10.0.0.3/24
  mac: 00:00:03
  networks:
    - admin
    - public
- name: eth1
  ip: None
  mac: 00:00:04

At this stage the node is still using the default system with the bootstrap
profile, so there is no custom system with udev rules. On the next reboot there
is no way to guarantee that the kernel will discover the network cards in the
same order. If the cards are discovered in a different order from the original
one and the NIC configuration is updated, it is possible to end up with:
- name: eth0
  ip: None
  mac: 00:00:04
  networks:
    - admin
    - public
- name: eth1
  mac: 00:00:03
  ip: 10.0.0.3/24
Here you can see that the networks are left connected to eth0 (in the db), and
of course this schema doesn't reflect the physical infrastructure. I hope the
problem is clear now.
If you want to investigate it yourself, there is a db dump in the snapshot
attached to the bug, where you will be able to find the case described here.
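
To make the failure concrete, here is a minimal Python sketch of an update
that matches interface records by name. It is not nailgun's actual code: the
dictionary layout and the update_by_name helper are assumptions made purely
for illustration, and the MACs are the abbreviated ones from the example above.

```python
import copy

# Interface records as stored in the db after the node was added to a cluster.
stored = [
    {"name": "eth0", "ip": "10.0.0.3/24", "mac": "00:00:03",
     "networks": ["admin", "public"]},
    {"name": "eth1", "ip": None, "mac": "00:00:04", "networks": []},
]

# Data reported after a reboot in which the kernel discovered the
# cards in the opposite order.
reported = [
    {"name": "eth0", "ip": None, "mac": "00:00:04"},
    {"name": "eth1", "ip": "10.0.0.3/24", "mac": "00:00:03"},
]


def update_by_name(stored_ifaces, reported_ifaces):
    """Hypothetical update: match records by name, overwrite ip and mac."""
    result = copy.deepcopy(stored_ifaces)
    by_name = {iface["name"]: iface for iface in result}
    for new in reported_ifaces:
        record = by_name.get(new["name"])
        if record is not None:
            record["ip"] = new["ip"]
            record["mac"] = new["mac"]
    return result


updated = update_by_name(stored, reported)
for iface in updated:
    print(iface["name"], iface["mac"], iface["networks"])
```

After this update the admin and public networks are attached to the record
with MAC 00:00:04, while the card that actually did the PXE boot (00:00:03) is
now recorded as eth1 with no networks.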
What happens next:
1. netcfg/choose_interface for the kernel is misconfigured: in my example it
will be 00:00:04, but it should be 00:00:03
2. the network configuration for l23network will simply be corrupted

So, possible solutions:
1. Reflect the new interface ordering and reassign the networks accordingly:
hard and hackish
2. Do not update any interface info if networks are assigned to the interfaces;
then udev rules will be applied and the NICs will be reordered into their
original state. I would say this is the easy and reliable solution
3. Create a cobbler system when the node boots for the first time, and add udev
rules. It looks to me like the proper solution, but it requires design
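
Option 2 can be sketched in a few lines of Python. This is only an
illustration under assumed data structures, not nailgun code:
update_interfaces and udev_rule are hypothetical helpers. The sketch freezes
the whole node as soon as any interface has networks, which is one possible
reading of option 2. The rule text follows the common persistent-net udev rule
format and would need to be checked against the target distribution.

```python
def has_assigned_networks(ifaces):
    """True if at least one stored interface has networks assigned."""
    return any(iface.get("networks") for iface in ifaces)


def update_interfaces(stored_ifaces, reported_ifaces):
    """Option 2 (assumed behaviour): ignore newly reported data once
    networks are assigned, so the stored name/MAC pairs stay authoritative."""
    if has_assigned_networks(stored_ifaces):
        return stored_ifaces
    # No networks assigned yet: accept the reported data as the new state.
    return [dict(new, networks=[]) for new in reported_ifaces]


def udev_rule(iface):
    """Build a udev rule line that pins an interface name to its MAC."""
    return ('SUBSYSTEM=="net", ACTION=="add", '
            'ATTR{address}=="%s", NAME="%s"' % (iface["mac"], iface["name"]))


stored = [
    {"name": "eth0", "ip": "10.0.0.3/24", "mac": "00:00:03",
     "networks": ["admin", "public"]},
    {"name": "eth1", "ip": None, "mac": "00:00:04", "networks": []},
]
# Reported after a reboot with the discovery order swapped.
reported = [
    {"name": "eth0", "ip": None, "mac": "00:00:04"},
    {"name": "eth1", "ip": "10.0.0.3/24", "mac": "00:00:03"},
]

kept = update_interfaces(stored, reported)
rules = [udev_rule(iface) for iface in kept]
for rule in rules:
    print(rule)
```

Because the stored pairs are preserved, the generated rules still name
00:00:03 as eth0, so the provisioned system would restore the original order.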

Please share your thoughts and ideas; AFAIK this issue is not rare in
large-scale deployments.
Thank you