[openstack-dev] PCI-passthrough - preliminary meeting this WEDNESDAY 10am PST / 7pm CET

Irena Berezovsky irenab at mellanox.com
Fri Apr 5 16:06:36 UTC 2013

Thank you for hosting the discussion.
I feel we jumped into the details without formalizing the tenant/provider use cases. I think it is worth defining end-to-end user stories and taking gradual steps toward a solution. This can definitely be outlined in the blueprint.
I think the Quantum-related questions are still to be discussed:
-- Is it an extension or a plugin? If it's a plugin, is it generic or vendor-specific?
-- Impact on the Nova - Quantum interaction
-- Does it require Quantum scheduling?
We can discuss it either prior to the summit (webex, emails) or at the summit.


From: froggienator at gmail.com [mailto:froggienator at gmail.com] On Behalf Of Ian Wells
Sent: Wednesday, April 03, 2013 9:35 PM
To: Jiang, Yunhong; Itzik Brown; Vladimir Popovski; Irena Berezovsky; niilo.minkkinen.ext at nsn.com; Chuck Short; OpenStack Development Mailing List
Cc: Ian Wells (iawells at cisco.com)
Subject: Re: PCI-passthrough - preliminary meeting this WEDNESDAY 10am PST / 7pm CET

Thanks for attending, everyone!

Here are the notes from the meeting today (generally formatted as title - slide content - comments from meeting).  Out of attachment-phobia I've taken the text out of the presentation.  They're a bit brief, so feel free to ask for clarification if it's needed.

As I said, if there's enough interest we could organise another meeting next week; and maybe it would be a good idea to get the outline of a blueprint ready for discussion at the Summit session.

Problems to solve
- Configuring available devices
- Permitting users to use passthrough
- Marking passthrough as supported
- Booting with passthrough
- Finding a free device
- Mapping the device
- Networking-specific issues
- Migration - possible but hard; not for the first release
- Run-time assignment - possible but not even supported for virtual devices at the moment; not for the first release

Configuring available devices
- On the compute server:
  pci_dev={JSON describing the device}
  Includes PCI path
  Includes PCI device type (or nova-compute could discover this via the hypervisor driver)
  May include undiscoverable attributes like where a network card is connected

What attributes need specifying?
Hypervisors other than KVM? (Xen should be possible directly via libvirt; xenapi, VMware, ...)
Given meeting attendance, likely our first target would be libvirt/KVM, though
Direct PCI versus SR-IOV (but the SR-IOV function count is typically static, so the functions could be listed in the Nova config)
Another option: don't map the (network) device directly but use a vNIC linked directly to it (see Mellanox code, below)
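
To make the pci_dev idea above a little more concrete, here is a minimal sketch of parsing one such JSON entry on the compute server. The key names (address, vendor_id, product_id, physical_network) and the values are assumptions invented for illustration; the meeting did not settle on a schema.

```python
import json
import re

# Hypothetical pci_dev value; every key name and value here is an assumption.
PCI_DEV_EXAMPLE = """
{
  "address": "0000:06:00.1",
  "vendor_id": "8086",
  "product_id": "10ed",
  "physical_network": "physnet1"
}
"""

# PCI path in the usual domain:bus:slot.function form.
_PCI_ADDRESS = re.compile(
    r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$"
)


def parse_pci_dev(raw):
    """Parse one pci_dev JSON entry and check that it carries a PCI path."""
    dev = json.loads(raw)
    address = dev.get("address", "")
    if not _PCI_ADDRESS.match(address):
        raise ValueError(
            "pci_dev needs a PCI path like 0000:06:00.1, got %r" % address
        )
    return dev


if __name__ == "__main__":
    device = parse_pci_dev(PCI_DEV_EXAMPLE)
    # physical_network stands in for an "undiscoverable" attribute.
    print(device["address"], device.get("physical_network"))
```

The device type is left out of the validation on purpose, since the notes suggest nova-compute might discover it via the hypervisor driver instead.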

Permitting passthrough
- Nova must have extra markers in the flavor specifying that the VM is entitled to PCI devices
- Customers can be charged differently

In the future, maybe 'start with PCI if PCI is available?' rather than 'start with PCI and don't start if none is available'

Marking support on the Glance image
- (Optional)
- Show which images support a device type; refuse to start unless the device is supported
- Mark an image with properties
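
The two checks above (flavor entitlement and optional image marking) could be combined roughly as follows. This is only a sketch: the flavor key "pci_passthrough:allowed_types" and the image property "pci_supported_types" are hypothetical names, and treating an unmarked image as unrestricted is an assumption about what "optional" should mean.

```python
def _split(value):
    """Turn a comma-separated string into a list of trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def may_boot_with_passthrough(flavor_extra_specs, image_properties, device_type):
    """Return True if the flavor entitles the VM to the device type and the
    image (when it is marked at all) declares support for it."""
    # Hypothetical flavor marker listing the device types the flavor may use.
    allowed = _split(flavor_extra_specs.get("pci_passthrough:allowed_types", ""))
    if device_type not in allowed:
        return False
    # Assumption: image marking is optional, so no marker means no restriction.
    supported = image_properties.get("pci_supported_types")
    if supported is None:
        return True
    return device_type in _split(supported)


if __name__ == "__main__":
    flavor = {"pci_passthrough:allowed_types": "ixgbe"}
    print(may_boot_with_passthrough(flavor, {}, "ixgbe"))
```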

Booting with passthrough
- Extra arguments to the 'nova boot' command
- May be general purpose arguments...
  ... or special specific arguments
  --nic net_id=XXX,passthrough_dev=ixgbe

Likely that we have to specify a list of constraints (driver type, device class, tagged in a certain way) rather than a simple type
Could put characteristics on Quantum port and not on nic parameter
Some things will already have to be loaded off the port, e.g. network discriminator, firewalling
Some things will be available there in the future, likely QoS information
Locality between the PCI device and the CPU (i.e. the PCI device being local to the CPU to which the VM is bound, to avoid putting the QPI interprocessor interconnect in the PCI datapath) - of future interest
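
As a sketch of the "list of constraints" idea, the nic argument value from the slide can be split into key/value pairs and then matched against a device description. The function names and the device dictionary keys ("driver", "class") are made up for this example, and exact-match constraints are a simplifying assumption.

```python
def parse_nic_arg(value):
    """Split 'key=value,key=value' into a dict, rejecting malformed items."""
    result = {}
    for item in value.split(","):
        key, sep, val = item.partition("=")
        if not sep or not key.strip() or not val.strip():
            raise ValueError("malformed nic option: %r" % item)
        result[key.strip()] = val.strip()
    return result


def device_matches(device, constraints):
    """True if the device satisfies every constraint (exact matches only)."""
    return all(device.get(key) == want for key, want in constraints.items())


if __name__ == "__main__":
    nic = parse_nic_arg("net_id=XXX,passthrough_dev=ixgbe")
    # Treat passthrough_dev as a driver-type constraint (an assumption).
    constraints = {"driver": nic["passthrough_dev"]}
    print(device_matches({"driver": "ixgbe", "class": "network"}, constraints))
```

If the characteristics were stored on the Quantum port instead, the same matching step could run on constraints read from the port rather than from the command line.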

Finding a free device
- New scheduler behaviour
- Compute devices report passthrough resources
- Scheduler tries to find a free resource
- Resources updated when VMs use and free them
- New information in DB
- Need to keep track of the above resources

Needs checking - does this information get stored to the DB or is it in the scheduler only?
Capabilities, properties - 'a device connected to this network', 'a device of this type'
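
The report / find / update cycle described above might look something like this in miniature. It is an in-memory sketch only: the class and method names are hypothetical, it is not Nova's scheduler code, and it deliberately sidesteps the open question of whether this state lives in the DB.

```python
class PassthroughTracker:
    """In-memory view of free passthrough devices per compute host."""

    def __init__(self):
        self._free = {}  # host name -> list of free device dicts

    def report(self, host, devices):
        """Record the free devices a compute host has reported."""
        self._free[host] = list(devices)

    def claim(self, constraints):
        """Find and reserve a free device matching all constraints.

        Returns (host, device), or None when nothing suitable is free.
        """
        for host in sorted(self._free):
            for device in self._free[host]:
                if all(device.get(k) == v for k, v in constraints.items()):
                    self._free[host].remove(device)
                    return host, device
        return None

    def release(self, host, device):
        """Return a device to the free pool when its VM goes away."""
        self._free.setdefault(host, []).append(device)


if __name__ == "__main__":
    tracker = PassthroughTracker()
    tracker.report("compute1", [{"type": "ixgbe", "address": "0000:06:00.1"}])
    print(tracker.claim({"type": "ixgbe"}))
```

Constraints such as 'a device of this type' or 'a device connected to this network' would simply be extra keys in the dictionaries being matched.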

Mapping the device
- Changes to the hypervisor driver to configure the VM
  We've changed libvirt/driver.py
  Changing the XML is easy
- Bare-metal guys may also want to use the scheduler
  If their baremetal machines are described they'll get free PCI device selection from the same scheduler
  The device will be present whether you use it or not; some VMs will get 'freebies'
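
For reference, the XML change for libvirt amounts to adding a <hostdev> element to the domain definition. The sketch below builds one with the Python standard library; it is not the actual change made to libvirt/driver.py, and the helper name hostdev_xml is invented for this example.

```python
import re
import xml.etree.ElementTree as ET

_PCI_ADDRESS = re.compile(
    r"^([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def hostdev_xml(pci_address):
    """Build a libvirt <hostdev> element for a PCI address like 0000:06:00.1."""
    match = _PCI_ADDRESS.match(pci_address)
    if match is None:
        raise ValueError("not a PCI address: %r" % pci_address)
    domain, bus, slot, function = match.groups()
    # managed='yes' asks libvirt to detach the device from the host driver.
    hostdev = ET.Element("hostdev", mode="subsystem", type="pci", managed="yes")
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(
        source,
        "address",
        domain="0x" + domain,
        bus="0x" + bus,
        slot="0x" + slot,
        function="0x" + function,
    )
    return ET.tostring(hostdev, encoding="unicode")


if __name__ == "__main__":
    print(hostdev_xml("0000:06:00.1"))
```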

Networking-specific cases
- We want to integrate a NIC into a Quantum network
  This would involve a PCI-enabled NIC driver
  pci_dev in config may say how the NIC's attached to the network
  Hypervisor driver tells Quantum this information
  Quantum configures the switch that the network device is attached to, instead of the OVS that the vNIC is attached to
  Or Quantum may indicate that the PF should be set to encapsulate all that VF's traffic (with VLAN tags, for instance)

Firewalling will be a problem - may need programming on the NIC or the switch, may simply not be available
Should we be programming the VF encap from the Quantum agent or from information passed to Nova from the Quantum plugging call?

Zadara Storage/Cisco Code
- Code that Cisco received from Zadara and updated (Folsom-based)
- Currently experimental, we're only using it for networking
- Has enabling, scheduling, mapping
- We use it for networking; code can't attach to specific networks
- Not the end result, merely something that works for now

Mellanox code
- Available now, attempting to merge into Quantum
- Nova side just maps NICs into libvirt VMs
- Quantum side does most of the work - NICs are programmed to use the correct encap for the network they're used on via the Quantum agent
- 2 modes - one is SRIOV and passthrough, one uses an SRIOV device with macvtap and a virtual NIC in the VM
- No scheduling - assumption is that there are more SRIOV devices available than there are VMs using them

Tieto code
- collaboration with Intel
- ...? (short on details, sorry)

On 2 April 2013 01:13, Ian Wells <ijw.ubuntu at cack.org.uk> wrote:
Here are the webex details for anyone who would care to discuss PCI device passthrough in OpenStack.  This will be a preliminary meeting to see who's interested, what their use cases are, and for anyone who has code to give a quick briefing on what they have.

Ian Wells invites you to attend this online meeting.

Topic: Openstack and PCI passthrough
Date: Wednesday, April 3, 2013
Time: 7:00 pm, Europe Summer Time (Amsterdam, GMT+02:00)
Meeting Number: 206 594 344
Meeting Password: cisco


You can contact me at:
iawells at cisco.com

