[ops][nova][designate] Does anyone rely on fully-qualified instance names?
When attaching a port to an instance, nova will check for DNS support in neutron and set a 'dns_name' attribute if found. To populate this attribute, nova uses a sanitised version of the instance name, stored in the instance.hostname attribute. This sanitisation simply strips out any unicode characters and replaces underscores and spaces with dashes, before truncating to 63 characters. It does not currently replace periods and this is the cause of bug 1581977 [1], where an instance name such as 'ubuntu20.04' will fail to schedule since neutron identifies '04' as an invalid TLD.
The question now is what to do to resolve this. There are two obvious paths available to us. The first is to simply catch these invalid hostnames and replace them with an arbitrary hostname of format 'Server-{serverUUID}'. This is what we currently do for purely unicode instance names and is what I've proposed at [2]. The other option is to strip all periods, or rather replace them with hyphens, when sanitizing the instance name. This is more predictable but breaks the ability to use the instance name as a FQDN. Such usage is something I'm told we've never supported, but I'm concerned that there are users out there who are relying on this all the same and I'd like to get a feel for whether this is the case first.
So, the question: does anyone currently rely on this inadvertent "feature"?
Cheers, Stephen
[1] https://launchpad.net/bugs/1581977 [2] https://review.opendev.org/c/openstack/nova/+/764482
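To make the failure concrete, here is a minimal Python sketch of the sanitisation as described above. It is reconstructed only from this description, not taken from nova: sanitize_hostname_sketch and has_numeric_tld are hypothetical helpers, and reading "unicode characters" as "anything outside ASCII" is an assumption.

```python
import re


def sanitize_hostname_sketch(name: str) -> str:
    """Approximate the sanitisation described in the post (hypothetical helper)."""
    # Assumption: "strips out any unicode characters" means dropping non-ASCII.
    name = name.encode("ascii", "ignore").decode("ascii")
    # Replace underscores and spaces with dashes.
    name = re.sub(r"[ _]", "-", name)
    # Periods are deliberately left alone, which is the root of the bug.
    return name[:63]


def has_numeric_tld(name: str) -> bool:
    """True if the last dot-separated label is all digits (hypothetical check)."""
    if "." not in name:
        return False
    return name.rsplit(".", 1)[1].isdigit()


if __name__ == "__main__":
    for server_name in ["my_server 1", "ubuntu20.04", "哈哈哈"]:
        hostname = sanitize_hostname_sketch(server_name)
        print(repr(server_name), "->", repr(hostname), has_numeric_tld(hostname))
```

Under these assumptions 'my_server 1' becomes 'my-server-1', a purely non-ASCII name such as '哈哈哈' collapses to an empty string (the case the existing Server-{serverUUID} fallback covers), and 'ubuntu20.04' passes through unchanged with an all-numeric last label.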
On Mon, 2020-11-30 at 11:51 +0000, Stephen Finucane wrote:
When attaching a port to an instance, nova will check for DNS support in neutron and set a 'dns_name' attribute if found. To populate this attribute, nova uses a sanitised version of the instance name, stored in the instance.hostname attribute. This sanitisation simply strips out any unicode characters and replaces underscores and spaces with dashes, before truncating to 63 characters. It does not currently replace periods and this is the cause of bug 1581977 [1], where an instance name such as 'ubuntu20.04' will fail to schedule since neutron identifies '04' as an invalid TLD.
Stripping out the Unicode characters is actually incorrect behavior. Hostnames are allowed to contain Unicode characters. The ASCII subset is recommended, but I would consider that transformation itself to be a bug in the implementation. It is certainly not guaranteed to happen in the API and is not documented, so it is not something people should rely on in any way.
The question now is what to do to resolve this. There are two obvious paths available to us. The first is to simply catch these invalid hostnames and replace them with an arbitrary hostname of format 'Server-{serverUUID}'. This is what we currently do for purely unicode instance names and is what I've proposed at [2]. The other option is to strip all periods, or rather replace them with hyphens, when sanitizing the instance name. This is more predictable but breaks the ability to use the instance name as a FQDN. Such usage is something I'm told we've never supported, but I'm concerned that there are users out there who are relying on this all the same and I'd like to get a feel for whether this is the case first.
So, the question: does anyone currently rely on this inadvertent "feature"?
The other option is to return a 400 Bad Request and not do magic in the API. I would strongly prefer not to transform the user's input in any way and to require that they pass a valid hostname, which is what our current API constraint is and always has been. Any usage of an FQDN previously was undefined behavior that may or may not have worked, but was never actually allowed. I personally don't consider https://launchpad.net/bugs/1581977 to be a valid bug, outside of the fact that we are not returning a 400 when you pass an FQDN. To do either feature you describe above, you would really need a spec if you want this to be part of the API contract, as right now any sanitization or transformation we do is not part of the API guarantees.
Cheers, Stephen
[1] https://launchpad.net/bugs/1581977 [2] https://review.opendev.org/c/openstack/nova/+/764482
On 2020-11-30 11:51:35 +0000 (+0000), Stephen Finucane wrote: [...]
This is more predictable but breaks the ability to use the instance name as a FQDN. Such usage is something I'm told we've never supported, but I'm concerned that there are users out there who are relying on this all the same and I'd like to get a feel for whether this is the case first.
So, the question: does anyone currently rely on this inadvertent "feature"? [...]
Just to be clear because I'm not entirely sure I follow the question: Are you asking whether users rely on being able to set FQDNs as instance names? Or are you asking whether users rely on Neutron setting those instance names automatically as DNS names? If it's the first, then yes lots of people (including my personal servers, as well as all of the control plane infrastructure for the OpenDev Collaboratory) enter the canonical DNS names of servers as their instance names. We don't rely on any direct DNS integration in our providers, but we do manually match instance names with address records in the relevant DNS zones we maintain. This also came up in another bug report recently, by the way: https://launchpad.net/bugs/1888722 -- Jeremy Stanley
On Mon, 2020-11-30 at 13:58 +0000, Jeremy Stanley wrote:
On 2020-11-30 11:51:35 +0000 (+0000), Stephen Finucane wrote: [...]
This is more predictable but breaks the ability to use the instance name as a FQDN. Such usage is something I'm told we've never supported, but I'm concerned that there are users out there who are relying on this all the same and I'd like to get a feel for whether this is the case first.
So, the question: does anyone currently rely on this inadvertent "feature"? [...]
Just to be clear because I'm not entirely sure I follow the question: Are you asking whether users rely on being able to set FQDNs as instance names? Or are you asking whether users rely on Neutron setting those instance names automatically as DNS names?
If it's the first, then yes lots of people (including my personal servers, as well as all of the control plane infrastructure for the OpenDev Collaboratory) enter the canonical DNS names of servers as their instance names. We don't rely on any direct DNS integration in our providers, but we do manually match instance names with address records in the relevant DNS zones we maintain.
The two questions are unfortunately intertwined. The same information - 'instance.hostname' - is used both by cloud-init (via the metadata service/config drive) to initialize the instance name [1] and by neutron when attaching ports on a network with DNS integration [2]. Unless we decouple those, any change will affect both. Stephen [1] https://github.com/openstack/nova/blob/16cabdd10/nova/api/metadata/base.py#L... [2] https://github.com/openstack/nova/blob/16cabdd10/nova/network/neutron.py#L15...
This also came up in another bug report recently, by the way:
On 2020-11-30 14:45:52 +0000 (+0000), Stephen Finucane wrote: [...]
The two questions are unfortunately intertwined. The same information - 'instance.hostname' - is used both by cloud-init (via the metadata service/config drive) to initialize the instance name [1] and by neutron when attaching ports on a network with DNS integration [2]. Unless we decouple those, any change will affect both. [...]
Okay, then there was also a hidden additional question in there. Do people who set DNS(-like) names for their server instance names also rely on the hostname reported in the instance metadata for configuring the guest OS? In our case we don't, we explicitly set up /etc/hosts and such with configuration management rather than trusting something like cloud-init to get it correct. In our case I don't think we particularly care what "hostname" gets reported in the metadata, only that the instance name remains flexible. -- Jeremy Stanley
On Mon, 2020-11-30 at 13:58 +0000, Jeremy Stanley wrote:
On 2020-11-30 11:51:35 +0000 (+0000), Stephen Finucane wrote: [...]
This is more predictable but breaks the ability to use the instance name as a FQDN. Such usage is something I'm told we've never supported, but I'm concerned that there are users out there who are relying on this all the same and I'd like to get a feel for whether this is the case first.
So, the question: does anyone currently rely on this inadvertent "feature"? [...]
Just to be clear because I'm not entirely sure I follow the question: Are you asking whether users rely on being able to set FQDNs as instance names? Or are you asking whether users rely on Neutron setting those instance names automatically as DNS names?
Yes, Stephen was asking whether users rely on being able to set FQDNs as instance names.
If it's the first, then yes lots of people (including my personal servers, as well as all of the control plane infrastructure for the OpenDev Collaboratory) enter the canonical DNS names of servers as their instance names.
So that has never actually been supported: the server name has to be a hostname, not an FQDN. Use of an FQDN was never intended to be supported. The fact that it ever worked is a result of incorrect sanitisation in the API, which should have rejected it. The sanitisation code was originally added to address https://bugs.launchpad.net/nova/+bug/1495388 by trying to munge server names into valid hostnames: https://github.com/openstack/nova/commit/bc6f30de953303604625e84ad2345cfb595... We could extend it, but if we really want to support FQDNs, that really feels like a new feature that would require an API microversion, not a backportable bug fix.
That bug fix is also not really correct: '哈哈哈', which was the display name that resulted in an empty hostname, is not invalid. https://tools.ietf.org/html/rfc5890 and the related documents, which describe Internationalized Domain Names for Applications, cover the use of Unicode in domain names. Granted, back in 2014 our Unicode support in OpenStack was not great, but longer term we should allow internationalised hostnames. That is out of scope for this conversation, however. Wide use could temper that perspective, if enough existing misuse of the API exists that we need to standardise it.
Technically you can set an FQDN in /etc/hostname, but it is discouraged. From https://www.freedesktop.org/software/systemd/man/hostname.html: "The hostname may be a free-form string up to 64 characters in length; however, it is recommended that it consists only of 7-bit ASCII lower-case characters and no spaces or dots, and limits itself to the format allowed for DNS domain name labels, even though this is not a strict requirement."
From https://www.freedesktop.org/wiki/Software/systemd/hostnamed/: "Generate a single DNS label only, not an FQDN. That means no dots allowed. Strip them, or replace them by "-". It's probably safer not to use any non-ASCII chars, even if DNS allows this in some way these days. In fact, restrict your charset to a-zA-Z0-9, -. Strip other chars, or try to replace them in some smart way with chars from this set, for example "ä" → "ae" and suchlike, and use "-" as replacement for all kinds of punctuation chars or spaces. Try to avoid creating repeated "-", as well as "-" as the first or last char. Limit the hostname to 63 chars, which is the length of a DNS label. If after stripping special chars the empty string is the result, you can pass this as-is to hostnamed in which case it will automatically make "localhost" out of this. It probably is a good idea to replace uppercase by lowercase chars."
If we are to munge the server name to create the hostname, the conventional approach would be stripping '.' and replacing it with '-'. If we choose to support FQDNs, we need to ensure that any server name that is set as an FQDN is handled explicitly and that we do not add the dhcp_domain name to it when generating the server metadata. We also probably need to strip the domain from it so that /etc/hostname is correctly set to just the host name. Converting the name to server-<server-uuid> would work, but it might be surprising to some who expect the hostname in the server to match the server name, which is why I prefer the 400 error for invalid input. Regardless of what we choose, however, we should document this behavior, as we don't document it at all.
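The hostnamed guidance quoted above translates fairly directly into code. A rough Python sketch, assuming a hypothetical helper name (hostname_from_pretty_name); it drops non-ASCII characters instead of attempting the "ä" → "ae" transliteration the guidance suggests, and it returns "localhost" itself rather than handing an empty string to hostnamed.

```python
import re


def hostname_from_pretty_name(pretty: str) -> str:
    """Munge a free-form name into a single DNS label (hypothetical helper)."""
    # Drop non-ASCII characters (no transliteration in this sketch).
    name = pretty.encode("ascii", "ignore").decode("ascii")
    # Lowercase, as the guidance recommends.
    name = name.lower()
    # Replace every character outside [a-z0-9-], including '.', with '-'.
    name = re.sub(r"[^a-z0-9-]", "-", name)
    # Avoid repeated '-' and a leading or trailing '-'.
    name = re.sub(r"-{2,}", "-", name).strip("-")
    # A DNS label is at most 63 characters; re-strip in case truncation exposed a '-'.
    name = name[:63].rstrip("-")
    # Mirror hostnamed's behaviour of turning an empty result into "localhost".
    return name or "localhost"


if __name__ == "__main__":
    for pretty in ["ubuntu20.04", "My Server_01", "哈哈哈"]:
        print(repr(pretty), "->", hostname_from_pretty_name(pretty))
```

This is the "strip '.' and replace it with '-'" approach: it always yields a usable label, at the cost of flattening any FQDN into a single label.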
We don't rely on any direct DNS integration in our providers, but we do manually match instance names with address records in the relevant DNS zones we maintain.
This also came up in another bug report recently, by the way:
On 2020-11-30 15:16:08 +0000 (+0000), Sean Mooney wrote: [...]
So that has never actually been supported: the server name has to be a hostname, not an FQDN. Use of an FQDN was never intended to be supported. [...] Technically you can set an FQDN in /etc/hostname, but it is discouraged. [...]
Yes, we directly set /etc/hostname to contain the "short" hostname and set the FQDN in /etc/hosts as an alias for it bound to a (secondary) v4 loopback consistent with the default configurations for Debian/Ubuntu systems. We don't rely on hostname metadata or cloud-init to provide correct representations, as cloud providers like to mess around with things like default domains and that leads to an inconsistent experience across different environments. -- Jeremy Stanley
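The setup Jeremy describes (short name in /etc/hostname, FQDN in /etc/hosts bound to a secondary IPv4 loopback) could be rendered by configuration management roughly as follows. A sketch only: render_host_files is a hypothetical name, and 127.0.1.1 is assumed as the secondary loopback because that is the address Debian-style installers conventionally use for this entry, with the FQDN listed first and the short name after it.

```python
from typing import Tuple


def render_host_files(fqdn: str, loopback: str = "127.0.1.1") -> Tuple[str, str]:
    """Return (/etc/hostname contents, /etc/hosts line) for an FQDN (hypothetical)."""
    fqdn = fqdn.rstrip(".")          # tolerate a trailing root dot
    short = fqdn.split(".", 1)[0]    # /etc/hostname gets only the first label
    hostname_file = short + "\n"
    # FQDN first, short name as an alias, bound to the secondary loopback address.
    hosts_line = f"{loopback}\t{fqdn}\t{short}\n"
    return hostname_file, hosts_line


if __name__ == "__main__":
    hostname_file, hosts_line = render_host_files("web01.example.org")
    print(hostname_file, end="")
    print(hosts_line, end="")
```

Nothing here reads the instance metadata, which is why this approach is indifferent to whatever "hostname" the cloud reports.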
I think we should align to the RFCs. First thing to note is hostnames do not have the same rules as domain names. For terminology purposes, I will use the RFC terminology: <hostname>.<domain name> RFC 952 [1] and RFC 1123 [2] define the valid strings for a hostname. A short summary is alphabet (A-Z), digits (0-9), minus sign (-) up to 255 characters. Internationalized hostnames are converted to these rules using Punycode [3]. This is also true for domain names. One of the most common differences between hostnames and domain names is underscores. Underscore is invalid in a hostname, but valid in a domain name. So, I think I am in alignment with Sean. The hostname component should return a 400 if there is an illegal character in the string, such as a period. We also should use Punycode to store/handle unicode hostnames correctly. Michael [1] https://tools.ietf.org/html/rfc952 [2] https://tools.ietf.org/html/rfc1123#page-13 [3] https://www.rfc-editor.org/rfc/rfc3492.txt On Mon, Nov 30, 2020 at 8:54 AM Jeremy Stanley <fungi@yuggoth.org> wrote:
[...]
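Michael's proposal (validate the hostname against RFC 952/1123, use Punycode for Unicode input, and return a 400 otherwise) could look roughly like this in Python. This is a sketch, not nova code: validate_hostname is a hypothetical name, ValueError stands in for the HTTP 400, and it relies on Python's built-in "idna" codec, which implements the older IDNA 2003 rules rather than IDNA 2008 (RFC 5890).

```python
import re

# One hostname label: letters, digits and '-', not starting or ending with '-',
# at most 63 characters (RFC 952 as relaxed by RFC 1123).
_LABEL_RE = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")


def validate_hostname(name: str) -> str:
    """Return the ASCII (Punycode) form of name or raise ValueError."""
    try:
        # The "idna" codec Punycode-encodes each non-ASCII label ("xn--...").
        ascii_name = name.encode("idna").decode("ascii")
    except UnicodeError as exc:
        raise ValueError(f"cannot encode {name!r}: {exc}") from exc
    # A hostname is a single label, so '.', '_' and spaces are all rejected here.
    if not _LABEL_RE.match(ascii_name):
        raise ValueError(f"invalid hostname {name!r}")
    return ascii_name


if __name__ == "__main__":
    for candidate in ["web01", "哈哈哈", "ubuntu20.04", "my_host"]:
        try:
            print(repr(candidate), "->", validate_hostname(candidate))
        except ValueError as err:
            print(repr(candidate), "-> 400:", err)
```

Note that under this rule '哈哈哈' becomes a valid xn-- label instead of an empty string, while 'ubuntu20.04' is rejected outright rather than rewritten.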
On 2020-11-30 10:13:00 -0800 (-0800), Michael Johnson wrote: [...]
So, I think I am in alignment with Sean. The hostname component should return a 400 if there is an illegal character in the string, such as a period. We also should use Punycode to store/handle unicode hostnames correctly. [...]
So to be clear, you're suggesting that if someone asks for an instance name which can't be converted to a (legal) hostname, then the nova boot call be rejected outright, even though there are likely plenty of people who 1. (manually) align their instance names with FQDNs, and 2. don't use the hostname metadata anyway? -- Jeremy Stanley
On Mon, 2020-11-30 at 18:16 +0000, Jeremy Stanley wrote:
On 2020-11-30 10:13:00 -0800 (-0800), Michael Johnson wrote: [...]
So, I think I am in alignment with Sean. The hostname component should return a 400 if there is an illegal character in the string, such as a period. We also should use Punycode to store/handle unicode hostnames correctly. [...]
So to be clear, you're suggesting that if someone asks for an instance name which can't be converted to a (legal) hostname, then the nova boot call be rejected outright, even though there are likely plenty of people who 1. (manually) align their instance names with FQDNs, and 2. don't use the hostname metadata anyway?
Unfortunately we can't do that, even if it's the right thing to do technically.
I think the path forward has to be something more like this:
1.) Add a new workaround config option, e.g. disable_strict_server_name_check. For upgrade reasons it would default to true in Wallaby. When strict server name checking is disabled, we will transform any malformed numeric TLD by replacing the '.' with '-', or by replacing the hostname with server-<uuid> as we do with Unicode hostnames.
2.) Add a new API microversion and do one of: a.) reject servers with invalid server names that contain '.' with a 400; b.) transform server names according to the RFCs (replace all '.' and other disallowed characters with '-'); c.) add support for FQDNs to the API. This could be something like adding a new field to store the FQDN as a top-level field on the server: make hostname contain just the host name, with the full FQDN in the new fqdn field. If the server name is an FQDN, then the fqdn field would just be the server name. If the server name is a hostname, then the nova dhcp_domain will be appended to the server name. This will allow the removal of dhcp_domain from the compute node for config drive generation, and we can generate the metadata from the new fqdn field. If Designate is enabled, then the FQDN will be taken from the port info. In the metadata we will store instance.hostname, which will never be an FQDN, in all local hostname keys. We can store the FQDN in the public_hostname key in the EC2 metadata and in a new fqdn field. This will make the values consistent and useful. With the new microversion we will no longer transform the hostname, except for multi-create where it will be used as a template, i.e. <server name>-<vm index>. Whether the new microversion will continue to transform Unicode hostnames to server-<uuid> or allow them is TBD and out of scope for now.
3.) Change the workaround config option default to false, enforcing the new behavior for old microversions, but do not remove it. This will enable all VMs, even with old clients, to have the correct behavior of requiring the hostname to be an actual hostname. We will not remove this workaround option going forward, allowing clouds that want the old behavior to continue to have it, but end users can rely on the consistent behavior by opting in to the new microversion if they have a preference.
This is a long way to say that I think we need a spec for the new API behavior that adds support for FQDNs officially. Whatever we decide, we need to document the actual expected behavior in both the API reference and the general server create documentation.
If we backport any part of this, I think the new behavior should be disabled by default, e.g. transforming the numeric top-level domain, so that the API behavior continues to be unaffected unless operators opt in to enabling it.
Thoughts?
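Option c.) above can be sketched as a small pure function. Everything here is hypothetical: the ServerNames type, the field names, and the assumption that the hostname is simply the first label of the server name. The metadata keys follow the EC2-style local-hostname/public-hostname naming, with "fqdn" as the proposed new key; "novalocal" in the example is just an illustrative dhcp_domain value.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ServerNames:
    """Hypothetical split of the fields discussed in option c.)."""
    display_name: str  # whatever the user passed as the server name
    hostname: str      # a single label, never an FQDN
    fqdn: str


def derive_names(server_name: str, dhcp_domain: Optional[str] = None,
                 port_fqdn: Optional[str] = None) -> ServerNames:
    hostname = server_name.split(".", 1)[0]
    if port_fqdn:
        # With Designate/neutron DNS integration, take the FQDN from the port.
        fqdn = port_fqdn.rstrip(".")
    elif "." in server_name:
        # The server name already is an FQDN: do not append dhcp_domain.
        fqdn = server_name.rstrip(".")
    elif dhcp_domain:
        fqdn = f"{server_name}.{dhcp_domain}"
    else:
        fqdn = server_name
    return ServerNames(server_name, hostname, fqdn)


def metadata_hostname_keys(names: ServerNames) -> Dict[str, str]:
    # Local keys only ever carry the short hostname; the FQDN goes elsewhere.
    return {
        "hostname": names.hostname,
        "local-hostname": names.hostname,
        "public-hostname": names.fqdn,
        "fqdn": names.fqdn,  # hypothetical new key
    }


if __name__ == "__main__":
    for name in ["web01", "web01.example.org"]:
        print(metadata_hostname_keys(derive_names(name, dhcp_domain="novalocal")))
```

The point of keeping this as one function is that the config drive and the metadata service would then read the same derived values instead of each re-applying dhcp_domain.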
On 2020-11-30 19:50:22 +0000 (+0000), Sean Mooney wrote: [...]
We can generate the metadata from the new fqdn field. If Designate is enabled, then the FQDN will be taken from the port info. In the metadata we will store instance.hostname, which will never be an FQDN, in all local hostname keys. We can store the FQDN in the public_hostname key in the EC2 metadata and in a new fqdn field. This will make the values consistent and useful. With the new microversion we will no longer transform the hostname, except for multi-create where it will be used as a template, i.e. <server name>-<vm index>. Whether the new microversion will continue to transform Unicode hostnames to server-<uuid> or allow them is TBD and out of scope for now. [...]
If I'm understanding, this proposes to separate the instance name from the hostname, allowing them to be configured independently in API calls. If so, I agree this sounds like the sanest eventual behavior, even if getting there will require microversion bumps and non-backportable improvements. That would allow me to continue setting whatever instance names make sense for me, and I can still ignore the metadata's hostname content, but could also even start using it if it becomes a reliable way to set one across providers (in the far distant future when they've all upgraded). -- Jeremy Stanley
On 11/30/20 11:50, Sean Mooney wrote:
On Mon, 2020-11-30 at 18:16 +0000, Jeremy Stanley wrote:
On 2020-11-30 10:13:00 -0800 (-0800), Michael Johnson wrote: [...]
So, I think I am in alignment with Sean. The hostname component should return a 400 if there is an illegal character in the string, such as a period. We also should use Punycode to store/handle unicode hostnames correctly. [...]
So to be clear, you're suggesting that if someone asks for an instance name which can't be converted to a (legal) hostname, then the nova boot call be rejected outright, even though there are likely plenty of people who 1. (manually) align their instance names with FQDNs, and 2. don't use the hostname metadata anyway?
Unfortunately we can't do that, even if it's the right thing to do technically.
I think the path forward has to be something more like this:
1.) Add a new workaround config option, e.g. disable_strict_server_name_check. For upgrade reasons it would default to true in Wallaby. When strict server name checking is disabled, we will transform any malformed numeric TLD by replacing the '.' with '-', or by replacing the hostname with server-<uuid> as we do with Unicode hostnames.
Note that the model we've been using for workarounds is that they default to "False" and that users "opt in" to working around existing behavior by setting them to "True". So I think this should be something more like [workarounds]enable_strict_server_name_check = False by default and users can opt-in by setting True.
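One way to read the proposed option with Melanie's naming is sketched below: with enable_strict_server_name_check left at False the API rewrites a trailing all-numeric label, and with True it rejects the request. The thread has not settled which behaviour the default should give, so treat this mapping, the BadRequest stand-in for a 400 and the function name apply_server_name_policy as assumptions.

```python
import re


class BadRequest(Exception):
    """Stand-in for an HTTP 400 response (hypothetical)."""


# A trailing label made only of digits, e.g. the "04" in "ubuntu20.04".
_NUMERIC_TLD = re.compile(r"\.\d+$")


def apply_server_name_policy(hostname: str,
                             enable_strict_server_name_check: bool = False) -> str:
    """One possible reading of [workarounds]enable_strict_server_name_check."""
    if not _NUMERIC_TLD.search(hostname):
        # Names without a numeric last label are passed through untouched.
        return hostname
    if enable_strict_server_name_check:
        # Operator opted in: reject instead of rewriting.
        raise BadRequest(f"{hostname!r} ends in an all-numeric label")
    # Default (False): keep the request working by turning dots into dashes.
    return hostname.replace(".", "-")
```

Either way this is config-driven API behaviour: the same 'ubuntu20.04' request succeeds on one cloud and fails on another depending on the operator's setting.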
2.) Add a new API microversion and do one of: a.) reject servers with invalid server names that contain '.' with a 400; b.) transform server names according to the RFCs (replace all '.' and other disallowed characters with '-'); c.) add support for FQDNs to the API. This could be something like adding a new field to store the FQDN as a top-level field on the server: make hostname contain just the host name, with the full FQDN in the new fqdn field. [...]
This would be my preference for the long term: add a new request parameter in the API for 'hostname' using a new microversion and decouple from the display name and let the display name be a nickname for the server. This is what I hacked into the API downstream while I was at Yahoo 7-ish years ago when I was too newb to know how to propose it upstream and also hadn't yet experienced the full pain of carrying downstream patches long term. I think what I did was mostly just route the new 'hostname' request param into the existing code for the display name and then left display name alone to be only a label on the instance that is not used anywhere else. I think having the display name and hostname as separate request parameters in the API is the way we should have been doing this upstream long ago. Cheers, -melanie
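A sketch of the decoupling Melanie describes: a separate, validated hostname parameter when the (hypothetical) new microversion is in use, with the display name kept as a free-form label. The names resolve_names, legacy_hostname and hostname_param_supported are illustrative only; hostname_param_supported stands in for "the request was made at the new microversion", and legacy_hostname is a rough stand-in for today's sanitisation.

```python
import re
from typing import Dict, Optional


class BadRequest(Exception):
    """Stand-in for an HTTP 400 response (hypothetical)."""


# A single RFC 1123 label: letters, digits, '-', no leading/trailing '-'.
_LABEL = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")


def legacy_hostname(display_name: str) -> str:
    # Rough stand-in for the existing display-name sanitisation.
    return re.sub(r"[ _]", "-", display_name)[:63]


def resolve_names(body: Dict[str, str], hostname_param_supported: bool) -> Dict[str, str]:
    display_name = body["name"]  # free-form nickname, never parsed further
    requested: Optional[str] = body.get("hostname") if hostname_param_supported else None
    if requested is None:
        # Old microversion (or no hostname given): derive it as today.
        hostname = legacy_hostname(display_name)
    elif _LABEL.match(requested):
        hostname = requested
    else:
        raise BadRequest(f"invalid hostname {requested!r}")
    return {"display_name": display_name, "hostname": hostname}
```

With this split, someone like Jeremy can keep 'web01.example.org' as the display name while the guest-facing hostname is validated independently.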
Yeah, I think it is important that we decouple hostname from domain as well. This will be consistent with the existing neutron API and allow proper validation of the hostname for use by the guest OS. Michael On Mon, Nov 30, 2020 at 12:31 PM melanie witt <melwittt@gmail.com> wrote:
[...]
On Mon, 2020-11-30 at 19:50 +0000, Sean Mooney wrote:
On Mon, 2020-11-30 at 18:16 +0000, Jeremy Stanley wrote:
On 2020-11-30 10:13:00 -0800 (-0800), Michael Johnson wrote: [...]
So, I think I am in alignment with Sean. The hostname component should return a 400 if there is an illegal character in the string, such as a period. We also should use Punycode to store/handle unicode hostnames correctly. [...]
So to be clear, you're suggesting that if someone asks for an instance name which can't be converted to a (legal) hostname, then the nova boot call be rejected outright, even though there are likely plenty of people who 1. (manually) align their instance names with FQDNs, and 2. don't use the hostname metadata anyway?
Unfortunately we can't do that, even if it's the right thing to do technically.
I think the path forward has to be something more like this:
1.) Add a new workaround config option, e.g. disable_strict_server_name_check. For upgrade reasons it would default to true in Wallaby. When strict server name checking is disabled, we will transform any malformed numeric TLD by replacing the '.' with '-', or by replacing the hostname with server-<uuid> as we do with Unicode hostnames.
This sounds like config-driven API behavior, in that the API will respond differently to the same request on different clouds (an 'ubuntu18.04' server name will work on one cloud and fail on the other). That's generally a big no-no. What makes this different?
2.) Add a new API microversion and do one of: a.) reject servers with invalid server names that contain '.' with a 400; b.) transform server names according to the RFCs (replace all '.' and other disallowed characters with '-'); c.) add support for FQDNs to the API. This could be something like adding a new field to store the FQDN as a top-level field on the server: make hostname contain just the host name, with the full FQDN in the new fqdn field. If the server name is an FQDN, then the fqdn field would just be the server name. If the server name is a hostname, then the nova dhcp_domain will be appended to the server name. This will allow the removal of dhcp_domain from the compute node for config drive generation, and we can generate the metadata from the new fqdn field. If Designate is enabled, then the FQDN will be taken from the port info. In the metadata we will store instance.hostname, which will never be an FQDN, in all local hostname keys. We can store the FQDN in the public_hostname key in the EC2 metadata and in a new fqdn field. This will make the values consistent and useful. With the new microversion we will no longer transform the hostname, except for multi-create where it will be used as a template, i.e. <server name>-<vm index>. Whether the new microversion will continue to transform Unicode hostnames to server-<uuid> or allow them is TBD and out of scope for now.
This sounds great, and I agree(d) that it's ultimately the correct solution [1], but it doesn't solve anything for users that try the following on any release to date when Designate is configured:
openstack server create ... ubuntu18.04
We need to fix this in a backportable manner, hence my suggestion to simply rewrite hostnames if we detect that they're still invalid after sanitization. Remember, we already do sanitization, and with this, people that are using valid FQDNs like Ruby or Jeremy can continue to do so, and people that don't know about this "feature" are able to create instances. I don't see what the downside of that would be. Stephen [1] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2...
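Stephen's backportable suggestion (sanitise as today, then fall back to Server-{serverUUID} if the result is still not an acceptable DNS name) is compared below with the replace-every-period alternative from the original post. A Python sketch under assumptions: sanitize approximates the existing behaviour as described in the thread, and is_acceptable_dns_name is a hypothetical check standing in for whatever neutron actually enforces.

```python
import re
import uuid

# A single RFC 1123 label: letters, digits, '-', no leading/trailing '-'.
_LABEL = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")


def sanitize(name: str) -> str:
    # Approximation of the existing sanitisation: drop non-ASCII,
    # turn spaces/underscores into '-', truncate; periods are kept.
    name = name.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[ _]", "-", name)[:63]


def is_acceptable_dns_name(name: str) -> bool:
    labels = name.split(".")
    if not all(_LABEL.match(label) for label in labels):
        return False
    # Reject an all-numeric last label such as the "04" in "ubuntu20.04".
    return len(labels) == 1 or not labels[-1].isdigit()


def hostname_with_fallback(server_name: str, server_uuid: uuid.UUID) -> str:
    """Option 1: keep the sanitised name if valid, else Server-{uuid}."""
    candidate = sanitize(server_name)
    if is_acceptable_dns_name(candidate):
        return candidate
    return f"Server-{server_uuid}"


def hostname_replacing_periods(server_name: str) -> str:
    """Option 2: always replace periods with hyphens."""
    return sanitize(server_name).replace(".", "-")
```

Under these assumptions, an FQDN-style name like 'web01.example.org' survives the fallback approach untouched, while the period-replacement approach flattens it to 'web01-example-org'; both turn 'ubuntu20.04' into a name without an all-numeric last label.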
3.) Change the workaround config option default to false, enforcing the new behavior for old microversions, but do not remove it. This will enable all VMs, even with old clients, to have the correct behavior of requiring the hostname to be an actual hostname. We will not remove this workaround option going forward, allowing clouds that want the old behavior to continue to have it, but end users can rely on the consistent behavior by opting in to the new microversion if they have a preference.
This is a long way to say that I think we need a spec for the new API behavior that adds support for FQDNs officially. Whatever we decide, we need to document the actual expected behavior in both the API reference and the general server create documentation.
If we backport any part of this, I think the new behavior should be disabled by default, e.g. transforming the numeric top-level domain, so that the API behavior continues to be unaffected unless operators opt in to enabling it.
Thoughts?
On Mon, 2020-11-30 at 19:50 +0000, Sean Mooney wrote:
On Mon, 2020-11-30 at 18:16 +0000, Jeremy Stanley wrote:
On 2020-11-30 10:13:00 -0800 (-0800), Michael Johnson wrote: [...]
So, I think I am in alignment with Sean. The hostname component should return a 400 if there is an illegal character in the string, such as a period. We also should use Punycode to store/handle unicode hostnames correctly. [...]
So to be clear, you're suggesting that if someone asks for an instance name which can't be converted to a (legal) hostname, then the nova boot call be rejected outright, even though there are likely plenty of people who 1. (manually) align their instance names with FQDNs, and 2. don't use the hostname metadata anyway? unfortunetly we cant do that even if its the right thing to do technically.
i think the path forward has to be something more like this.
1.) add a new workaround config option e.g. disable_strict_server_name_check. for upgrade reason it would default to true in wallaby when strict server name checking is disabled we will transform any malformed numeric tld by replacing the '.' with '-' or by replacing the hostname with server-<uuid> as we do with unicode hostnames.
This sounds like config-driven API behavior, in that the API will respond differently to the same request on different clouds (an 'ubuntu18.04' server name will work on one cloud and fail on the other). That's generally a big no-no. What makes this different?
On Tue, 2020-12-01 at 10:30 +0000, Stephen Finucane wrote:
This is the backportable bit, where we allow numeric TLDs if you opt into it. It is config-driven API behavior, but I think it is the only valid way to backport a behavioral API change. We should not do that without a way to disable it, hence a workaround config option.
2.) Add a new API microversion and do one of:
a.) reject servers with invalid server names that contain '.' with a 400;
b.) transform server names according to the RFCs (replace all '.' and other disallowed characters with '-');
c.) add support for FQDNs to the API. This could be something like adding a new field to store the FQDN as a top-level field on the server. Make hostname contain just the host name, with the full FQDN in the new fqdn field. If the server name is an FQDN, then the fqdn field would just be the server name. If the server name is a hostname, then the nova dhcp_domain will be appended to the server name. This will allow the removal of dhcp_domain from the compute node for config drive generation, and we can generate the metadata from the new fqdn field. If Designate is enabled, then the FQDN will be taken from the port info. In the metadata we will store instance.hostname, which will never be an FQDN, in all local hostname keys. We can store the FQDN in the public_hostname key in the EC2 metadata and in a new fqdn field. This will make the values consistent and useful.
With the new microversion we will no longer transform the hostname, except for multi-create, where it will be used as a template, i.e. <server name>-<vm index>. Whether the new microversion will continue to transform unicode hostnames to server-<uuid> or allow them is TBD, and out of scope for now.
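Option c.) can be pictured as a pure function from the server name to a (hostname, fqdn) pair. The Python sketch below is a hedged reading of that proposal: `derive_hostname_and_fqdn`, `multi_create_names` and the `port_fqdn` parameter (standing in for whatever FQDN Designate/neutron reports for the port) are hypothetical, and the 1-based multi-create index is an assumption.

```python
from typing import List, Optional, Tuple


def derive_hostname_and_fqdn(
    server_name: str,
    dhcp_domain: str,
    port_fqdn: Optional[str] = None,
) -> Tuple[str, str]:
    """Split a server name into (hostname, fqdn) as proposed in option c.

    - hostname is never an FQDN: it is the first dot-separated label.
    - if DNS integration supplied an FQDN for the port, prefer it.
    - else a dotted server name is treated as the FQDN itself.
    - else dhcp_domain is appended to the bare server name.
    """
    hostname = server_name.split(".", 1)[0]
    if port_fqdn:
        fqdn = port_fqdn
    elif "." in server_name:
        fqdn = server_name
    else:
        fqdn = f"{server_name}.{dhcp_domain}"
    return hostname, fqdn


def multi_create_names(server_name: str, count: int) -> List[str]:
    """Multi-create template from the proposal: <server name>-<vm index>."""
    return [f"{server_name}-{index}" for index in range(1, count + 1)]
```

Keeping the hostname and the FQDN in separate fields is what lets every "local" metadata key carry a plain hostname while the FQDN lives in one well-defined place.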
This sounds great, and I agree(d) that it's ultimately the correct solution [1], but it doesn't solve anything for users that try the following on any release to date when Designate is configured:
openstack server create ... ubuntu18.04
We need to fix this in a backportable manner, hence my suggestion to simply rewrite hostnames if we detect that they're still invalid after sanitization.
Well, I'm not sure we do. ubuntu18.04 would always have failed, regardless of whether you have Designate or not. Numeric TLDs have never worked; it was one of the first things I learned when I started working on Havana, so I don't think we have to "fix" that in a backportable way, as it is not new with Designate. But if we must, then we can do so with the workaround option, although Melanie is right that we should name it enable_strict_server_name_checking=false rather than disable_strict_server_name_check=true to follow convention.
Remember, we already do sanitization, and with this people that are using valid FQDNs like Ruby or Jeremy can continue to do so and people that don't know about this "feature" are able to create instances. I don't see what the downside of that would be.
Unless we also address the fact that the FQDN you enter will not be available to the instance in any way, I really don't see the point in just allowing it to be set. The server will be presented with 4 different hostnames, none of which will be the one that you entered, as I pointed out in my previous mail. The only reason you can even ping it is because of the DNS search domain; if that is not configured, then the server name will not resolve at all unless you also register that DNS name manually or set it in /etc/hosts after the fact.
You can do both of those without actually setting the server name to match, so I'm not seeing a compelling reason to allow numeric TLDs, Ruby's and Jeremy's use cases considered. Given numeric TLDs never worked, it should not affect them, provided we continue to allow FQDNs in the server name while enable_strict_server_name_checking=false.
Stephen
[1] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2...
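Sean's point that the FQDN only resolves thanks to the DNS search domain follows from the resolv.conf(5) search rules. Below is a minimal Python sketch of the order in which a glibc-style stub resolver tries names, using the documented `search` and `ndots` semantics; systemd-resolved's own handling may differ, and `candidate_queries` is a hypothetical helper.

```python
from typing import List


def candidate_queries(name: str, search: List[str], ndots: int = 1) -> List[str]:
    """Order in which a resolv.conf(5)-style resolver tries names.

    A trailing dot marks the name as absolute, so no search domain is used.
    Otherwise, a name with at least `ndots` dots is tried as-is first and
    then with each search domain appended; a name with fewer dots gets the
    search domains first and is tried as-is last.
    """
    if name.endswith("."):
        return [name]
    with_search = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        return [name] + with_search
    return with_search + [name]
```

With the default ndots of 1, test-dns.invalid.dns is tried as an absolute name first and, when that presumably fails, as test-dns.invalid.dns.cloud.seanmooney.info, which matches the ping output quoted in this thread.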
On Mon, Nov 30, 2020 at 6:56 AM Stephen Finucane <stephenfin@redhat.com> wrote:
When attaching a port to an instance, nova will check for DNS support in neutron and set a 'dns_name' attribute if found. To populate this attribute, nova uses a sanitised version of the instance name, stored in the instance.hostname attribute. This sanitisation simply strips out any unicode characters and replaces underscores and spaces with dashes, before truncating to 63 characters. It does not currently replace periods and this is the cause of bug 1581977 [1], where an instance name such as 'ubuntu20.04' will fail to schedule since neutron identifies '04' as an invalid TLD.
The question now is what to do to resolve this. There are two obvious paths available to us. The first is to simply catch these invalid hostnames and replace them with an arbitrary hostname of format 'Server-{serverUUID}'. This is what we currently do for purely unicode instance names and is what I've proposed at [2]. The other option is to strip all periods, or rather replace them with hyphens, when sanitizing the instance name. This is more predictable but breaks the ability to use the instance name as a FQDN. Such usage is something I'm told we've never supported, but I'm concerned that there are users out there who are relying on this all the same and I'd like to get a feel for whether this is the case first.
So, the question: does anyone currently rely on this inadvertent "feature"?
I took a look and we (at Verizon Media) have users that create instances with fqdn-like names (VMs and BMs). I didn't look to see how many instances have such names, but we have tens of thousands of instances and eyeballing one of our clusters, > 90% of them have such names. --ruby
Cheers, Stephen
[1] https://launchpad.net/bugs/1581977 [2] https://review.opendev.org/c/openstack/nova/+/764482
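The sanitization described in the quoted message can be approximated in a few lines. This Python sketch follows only that prose description (strip non-ASCII, map '_' and ' ' to '-', truncate to 63 characters, fall back to 'Server-{serverUUID}' when nothing is left); it is not nova's actual implementation, and `sanitize_hostname_sketch` is a hypothetical name.

```python
import re


def sanitize_hostname_sketch(name: str, server_uuid: str) -> str:
    """Approximate the sanitization described in the quoted message."""
    # Drop every non-ASCII character ("strips out any unicode characters").
    ascii_only = name.encode("ascii", "ignore").decode("ascii")
    # Replace underscores and spaces with dashes.
    dashed = re.sub(r"[_ ]", "-", ascii_only)
    # Truncate to 63 characters.
    truncated = dashed[:63]
    # Purely unicode names end up empty and fall back to Server-{uuid}.
    if not truncated:
        return f"Server-{server_uuid}"
    # Periods are left alone, so a trailing numeric label survives.
    return truncated
```

Because periods survive untouched, 'ubuntu20.04' comes out unchanged and still ends in the numeric label '04' that neutron rejects.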
On Mon, 2020-11-30 at 10:55 -0500, Ruby Loo wrote:
On Mon, Nov 30, 2020 at 6:56 AM Stephen Finucane <stephenfin@redhat.com> wrote:
When attaching a port to an instance, nova will check for DNS support in neutron and set a 'dns_name' attribute if found. To populate this attribute, nova uses a sanitised version of the instance name, stored in the instance.hostname attribute. This sanitisation simply strips out any unicode characters and replaces underscores and spaces with dashes, before truncating to 63 characters. It does not currently replace periods and this is the cause of bug 1581977 [1], where an instance name such as 'ubuntu20.04' will fail to schedule since neutron identifies '04' as an invalid TLD.
The question now is what to do to resolve this. There are two obvious paths available to us. The first is to simply catch these invalid hostnames and replace them with an arbitrary hostname of format 'Server-{serverUUID}'. This is what we currently do for purely unicode instance names and is what I've proposed at [2]. The other option is to strip all periods, or rather replace them with hyphens, when sanitizing the instance name. This is more predictable but breaks the ability to use the instance name as a FQDN. Such usage is something I'm told we've never supported, but I'm concerned that there are users out there who are relying on this all the same and I'd like to get a feel for whether this is the case first.
So, the question: does anyone currently rely on this inadvertent "feature"?
I took a look and we (at Verizon Media) have users that create instances with fqdn-like names (VMs and BMs). I didn't look to see how many instances have such names, but we have tens of thousands of instances and eyeballing one of our clusters, > 90% of them have such names.
OK, based on this we can't outright block FQDNs in the API. Their use is still undefined, but we at least need to provide a deprecation cycle and upgrade procedure if we want to remove them, though it sounds like we actually need to add support, which means we need to agree on the semantics.
I have done some testing with Designate enabled and using FQDNs:
http://paste.openstack.org/show/800564/
tl;dr it's totally inconsistent and broken in various ways, but you can boot a VM and networking appears to work.
I booted a server with the server name test-dns.invalid.dns. The default domain name for my deployment in Designate is cloud.seanmooney.info.
The hostname in the VM is test-dns:
ubuntu@test-dns:~$ cat /etc/hostname
test-dns
ubuntu@test-dns:~$ hostname
test-dns
However, the FQDN is reported as test-dns.cloud.seanmooney.info:
ubuntu@test-dns:~$ hostname -f
test-dns.cloud.seanmooney.info
If we list all FQDNs we get:
ubuntu@test-dns:~$ hostname -A
test-dns.invalid.dns.cloud.seanmooney.info test-dns.invalid.dns.cloud.seanmooney.info
Note that it appends the default Designate domain to the server name.
Looking at the EC2 metadata, the server name is appended to the value of the nova dhcp_domain config option:
"hostname": "test-dns.invalid.dns.novalocal",
"instance-action": "none",
"instance-id": "i-00000109",
"instance-type": "small-multi-numa",
"local-hostname": "test-dns.invalid.dns.novalocal",
"local-ipv4": "172.20.4.32",
"placement": { "availability-zone": "nova" },
"public-hostname": "test-dns.invalid.dns.novalocal",
and the OpenStack variant is:
"local-hostname": "test-dns",
So the server has 4 possible hostnames and none of them is the server name "test-dns.invalid.dns".
test-dns.invalid.dns does resolve on the host:
ubuntu@test-dns:~$ ping test-dns.invalid.dns
PING test-dns.invalid.dns(test-dns.invalid.dns.cloud.seanmooney.info (2001:470:1836:1:f816:3eff:fe59:f619)) 56 data bytes
64 bytes from test-dns.invalid.dns.cloud.seanmooney.info (2001:470:1836:1:f816:3eff:fe59:f619): icmp_seq=1 ttl=64 time=0.039 ms
64 bytes from test-dns.invalid.dns.cloud.seanmooney.info (2001:470:1836:1:f816:3eff:fe59:f619): icmp_seq=2 ttl=64 time=0.062 ms
^C
--- test-dns.invalid.dns ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.039/0.050/0.062/0.011 ms
ubuntu@test-dns:~$ nslookup test-dns.invalid.dns
Server: 127.0.0.53
Address: 127.0.0.53#53
Non-authoritative answer:
Name: test-dns.invalid.dns
Address: 172.20.4.32
Name: test-dns.invalid.dns
Address: 2001:470:1836:1:f816:3eff:fe59:f619
But it only resolves because of the DNS search path:
ubuntu@test-dns:~$ cat /etc/resolv.conf
# This file is managed by man:systemd-resolved(8). Do not edit.
#
# This is a dynamic resolv.conf file for connecting local clients to the
# internal DNS stub resolver of systemd-resolved. This file lists all
# configured search domains.
#
# Run "resolvectl status" to see details about the uplink DNS servers
# currently in use.
#
# Third party programs must not access this file directly, but only through the
# symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a different way,
# replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.
nameserver 127.0.0.53
options edns0
search cloud.seanmooney.info
ubuntu@test-dns:~$ systemd-resolve --status
Global
LLMNR setting: no
MulticastDNS setting: no
DNSOverTLS setting: no
DNSSEC setting: no
DNSSEC supported: no
DNSSEC NTA: 10.in-addr.arpa 16.172.in-addr.arpa 168.192.in-addr.arpa 17.172.in-addr.arpa 18.172.in-addr.arpa 19.172.in-addr.arpa 20.172.in-addr.arpa 21.172.in-addr.arpa 22.172.in-addr.arpa 23.172.in-addr.arpa 24.172.in-addr.arpa 25.172.in-addr.arpa 26.172.in-addr.arpa 27.172.in-addr.arpa 28.172.in-addr.arpa 29.172.in-addr.arpa 30.172.in-addr.arpa 31.172.in-addr.arpa corp d.f.ip6.arpa home internal intranet lan local private test
Link 2 (enp1s0)
Current Scopes: DNS
DefaultRoute setting: yes
LLMNR setting: yes
MulticastDNS setting: no
DNSOverTLS setting: no
DNSSEC setting: no
DNSSEC supported: no
Current DNS Server: 172.20.4.2
DNS Servers: 172.20.4.2 2001:470:1836:1:f816:3eff:fe7b:583
DNS Domain: cloud.seanmooney.info
As far as I can tell, there is no file or data source that will allow you to see the server name value test-dns.invalid.dns; the closest you can get is the instance UUID 9387d654-93eb-41cf-9f59-a6099e0daba1 in the cloud metadata.
So if you are using FQDNs today, it does not work in any useful way.
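The four names above can all be reproduced from three inputs: the server name, nova's dhcp_domain and the Designate zone. A minimal Python sketch follows; the mapping is inferred purely from the output in this mail (not from nova, neutron, Designate or cloud-init code), and `observed_names` is a hypothetical helper.

```python
from typing import Dict


def observed_names(server_name: str, dhcp_domain: str, dns_zone: str) -> Dict[str, str]:
    """Rebuild the four hostnames seen in the test from its inputs."""
    first_label = server_name.split(".", 1)[0]
    return {
        # /etc/hostname, `hostname` and the OpenStack "local-hostname"
        "local_hostname": first_label,
        # `hostname -f`: the short hostname plus the DNS domain
        "hostname_f": f"{first_label}.{dns_zone}",
        # `hostname -A`: the full server name plus the Designate zone
        "hostname_A": f"{server_name}.{dns_zone}",
        # EC2 "hostname", "local-hostname" and "public-hostname"
        "ec2_hostname": f"{server_name}.{dhcp_domain}",
    }
```

Calling it with "test-dns.invalid.dns", "novalocal" and "cloud.seanmooney.info" yields four distinct names, none of which equals the server name, which is the inconsistency described above.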
If we are going to continue to use the "name" API field to populate metadata for the instance and set the dns_name field of the port, yes. Really, the right answer is to add a hostname field to the nova API that has proper validation and IDN support. That way the "name" field is simply API metadata and not overloaded for multiple purposes.
Michael
On Mon, Nov 30, 2020 at 10:52 AM Sean Mooney <smooney@redhat.com> wrote:
On Mon, Nov 30, 2020 at 2:20 PM Michael Johnson <johnsomor@gmail.com> wrote:
If we are going to continue to use the "name" API field to populate metadata for the instance and set the dns_name field of the port, yes.
Really, the right answer is to add a hostname field to the nova API that has proper validation and IDN support. That way the "name" field is simply API metadata and not overloaded for multiple purposes.
Yeah, I don't think anyone's going to disagree with this. But this would be a new microversion, and so cannot be backported to stable releases. Our (Red Hat's) pickle is what to do in the case of someone wanting to use domain-like VM names with Neutron DNS integration. Currently what ends up in Neutron's dns_name field (and in Nova's instance.hostname) is derived from the VM name, but that derivation allows things that Neutron doesn't agree are valid, namely endings in .<number>. I think Stephen's original question is around modifying how we do that derivation, and whether starting to transform <name>.<number> into Server-<UUID> would break anyone. My proposal would be to just guard that new transformation with a [workarounds] config option. I know it's lazy and dirty, but as you pointed out, the real solution is a new API microversion that splits the VM display name from the hostname/FQDN entirely. The new [workarounds] option means anyone hitting bug 1581977 can turn it on fully aware of what's going to happen (namely, ubuntu.04 getting a hostname of Server-<UUID>), and anyone not affected can carry on as before.
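Artom's proposal can be sketched as a guard around the new transformation. The Python below assumes a boolean standing in for a [workarounds] option whose name the thread never settles on; `hostname_with_workaround` is hypothetical, and its input is assumed to be the already-sanitized hostname.

```python
import re

# Matches a hostname ending in .<number>, e.g. "ubuntu20.04".
_ENDS_IN_NUMBER = re.compile(r"\.[0-9]+$")


def hostname_with_workaround(sanitized: str, server_uuid: str,
                             workaround_enabled: bool) -> str:
    """Fall back to Server-<UUID> for <name>.<number>, only when opted in."""
    if workaround_enabled and _ENDS_IN_NUMBER.search(sanitized):
        return f"Server-{server_uuid}"
    # Option off, or nothing neutron would reject: keep current behavior.
    return sanitized
```

With the option off, behavior is unchanged; with it on, only names ending in .<number> are rewritten, so FQDN-like names such as 'web1.example.org' keep working.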
On Mon, 2020-11-30 at 11:51 +0000, Stephen Finucane wrote:
When attaching a port to an instance, nova will check for DNS support in neutron and set a 'dns_name' attribute if found. To populate this attribute, nova uses a sanitised version of the instance name, stored in the instance.hostname attribute. This sanitisation simply strips out any unicode characters and replaces underscores and spaces with dashes, before truncating to 63 characters. It does not currently replace periods and this is the cause of bug 1581977 [1], where an instance name such as 'ubuntu20.04' will fail to schedule since neutron identifies '04' as an invalid TLD.
The question now is what to do to resolve this. There are two obvious paths available to us. The first is to simply catch these invalid hostnames and replace them with an arbitrary hostname of format 'Server-{serverUUID}'. This is what we currently do for purely unicode instance names and is what I've proposed at [2]. The other option is to strip all periods, or rather replace them with hyphens, when sanitizing the instance name. This is more predictable but breaks the ability to use the instance name as a FQDN. Such usage is something I'm told we've never supported, but I'm concerned that there are users out there who are relying on this all the same and I'd like to get a feel for whether this is the case first.
So, the question: does anyone currently rely on this inadvertent "feature"?
Thanks to everyone who replied to this. We discussed this in today's nova meeting [1] and decided we're okay with changing how we generate instance hostnames, and that we can backport this, since there are no guarantees made in either the documentation or the API reference as to what an instance's hostname will be, and existing instances won't see their hostname change. There are two options available to us:
* Replace periods with dashes
This has the best results for people that are naming their instances with FQDNs, since the hostname looks sane.
'test-instance.mydomain.org' -> 'test-instance-mydomain-org'
'ubuntu18.04' -> 'ubuntu18-04'
* Strip everything after the first period
This has the best results for everyone else, since the hostname better reflects the original display name.
'test-instance.mydomain.org' -> 'test-instance'
'ubuntu18.04' -> 'ubuntu18'
If anyone has strong feelings on either approach, please let us know. If not, we'll duke this out ourselves on #openstack-nova next week.
Also, as an aside, I think we all realize that long term, the best solution for this would probably be an API change. This would allow us to add an 'openstack server create --hostname' parameter that is correctly validated against the various RFCs. I'm not currently planning to work on this, but I'd be happy to assist anyone that was interested in doing so.
Cheers, Stephen
[1] http://eavesdrop.openstack.org/meetings/nova/2020/nova.2020-12-03-16.00.log....
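The two candidate transformations from Stephen's mail can be checked against his examples with a short Python sketch (the function names are illustrative only, not nova's):

```python
def replace_periods(name: str) -> str:
    """Option 1: replace every '.' with '-'."""
    return name.replace(".", "-")


def strip_after_first_period(name: str) -> str:
    """Option 2: keep only the text before the first '.'."""
    return name.split(".", 1)[0]
```

Note that the second option returns an empty string for a name such as '.foo', so a fallback such as the existing 'Server-{serverUUID}' would presumably still be needed.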
So, coming in very late here with a more... radical? idea. This is just brainstorming, but here goes:
Neutron explodes when we update the port with an invalid `dns_name` here [1]. The request we send in [1] is populated here [2]. So... why not just *not* do that? IOW, the port will not have a `dns_name` set at all by Nova when we create the VM, and users can use Neutron's port-update API [3] to set the hostname they desire. That way, if Neutron does return a BadRequest, it will really be because the FQDN is invalid, not because Nova tried to be "smart".
[1] https://github.com/openstack/nova/blob/stable/train/nova/network/neutronv2/a...
[2] https://github.com/openstack/nova/blob/stable/train/nova/network/neutronv2/a...
[3] https://docs.openstack.org/api-ref/network/v2/index.html?expanded=update-por...
On Thu, Dec 3, 2020 at 12:20 PM Stephen Finucane <stephenfin@redhat.com> wrote:
On Mon, 2020-11-30 at 11:51 +0000, Stephen Finucane wrote:
When attaching a port to an instance, nova will check for DNS support in neutron and set a 'dns_name' attribute if found. To populate this attribute, nova uses a sanitised version of the instance name, stored in the instance.hostname attribute. This sanitisation simply strips out any unicode characters and replaces underscores and spaces with dashes, before truncating to 63 characters. It does not currently replace periods and this is the cause of bug 1581977 [1], where an instance name such as 'ubuntu20.04' will fail to schedule since neutron identifies '04' as an invalid TLD.
The question now is what to do to resolve this. There are two obvious paths available to us. The first is to simply catch these invalid hostnames and replace them with an arbitrary hostname of format 'Server-{serverUUID}'. This is what we currently do for purely unicode instance names and is what I've proposed at [2]. The other option is to strip all periods, or rather replace them with hyphens, when sanitizing the instance name. This is more predictable but breaks the ability to use the instance name as a FQDN. Such usage is something I'm told we've never supported, but I'm concerned that there are users out there who are relying on this all the same and I'd like to get a feel for whether this is the case first.
So, the question: does anyone currently rely on this inadvertent "feature"?
Cheers, Stephen
[1] https://launchpad.net/bugs/1581977 [2] https://review.opendev.org/c/openstack/nova/+/764482
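Under Artom's proposal, setting the DNS name becomes the user's job via Neutron's port-update call [3]. As a rough sketch, assuming the DNS integration extension is enabled and leaving out authentication and the endpoint URL, the request only needs to carry the `dns_name` attribute. The helper `build_dns_name_update` and the port ID are hypothetical placeholders.

```python
import json


def build_dns_name_update(port_id: str, dns_name: str) -> tuple:
    """Return (method, path, body) for a Neutron port update that sets
    only the port's dns_name attribute (hypothetical helper)."""
    body = {"port": {"dns_name": dns_name}}
    return "PUT", f"/v2.0/ports/{port_id}", json.dumps(body)


if __name__ == "__main__":
    # "PORT_ID" stands in for a real port UUID.
    method, path, body = build_dns_name_update("PORT_ID", "ubuntu20")
    print(method, path, body)
```

If Neutron rejects a request like this, the error concerns the name the user chose rather than one Nova derived from the display name.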
On 11/30/20 12:51 PM, Stephen Finucane wrote:
The other option is to strip all periods, or rather replace them with hyphens, when sanitizing the instance name. This is more predictable but breaks the ability to use the instance name as a FQDN. Such usage is something I'm told we've never supported, but I'm concerned that there are users out there who are relying on this all the same and I'd like to get a feel for whether this is the case first.
Hi,
We don't use Designate *yet*, but we're planning to. Using an FQDN for the instance name is what we've done so far. Even if that's not something that *was* supported, it would IMO be desirable to support it, at least in the future.
Just my 2 cents, hoping to help, Cheers,
Thomas Goirand (zigo)
To be clear, I think there is some confusion between three names:
#1 the instance display name
#2 the /etc/hostname
#3 the related /etc/hosts name
For #1, having an FQDN [1] is OK. Also, as it's an API field, we can't change it or it would need a new microversion.
For #2, as Jeremy said, in general you just have the short instance name, not the whole FQDN, so I think it's totally fine to strip the name after the first period (and AFAICT, that's why you already see the short name, as the OS already cuts it).
For #3, you can *either* have short names or FQDNs, but given we see problems with Designate, I'd argue that we should also strip the name instead of using the whole FQDN, as the domain is not verified by Nova anyway.
-Sylvain
[1] By FQDN, I mean a name like "instance.tld" where "tld" is "domain[\..*]+"
On Thu, Dec 3, 2020 at 10:16 PM Thomas Goirand <zigo@debian.org> wrote:
On 11/30/20 12:51 PM, Stephen Finucane wrote:
The other option is to strip all periods, or rather replace them with hyphens, when sanitizing the instance name. This is more predictable but breaks the ability to use the instance name as a FQDN. Such usage is something I'm told we've never supported, but I'm concerned that there are users out there who are relying on this all the same and I'd like to get a feel for whether this is the case first.
Hi,
We don't use Designate *yet*, but we're planning to. Using an FQDN for the instance name is what we've done so far. Even if that's not something that *was* supported, it would IMO be desirable to support it, at least in the future.
Just my 2 cents, hoping to help, Cheers,
Thomas Goirand (zigo)
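Sylvain's split between the three names can be sketched as follows, assuming the simplest reading of his proposal: the display name is returned untouched, and both /etc/hostname and /etc/hosts get the part before the first period. `guest_names` and the FQDN regular expression (a loose reading of his footnote, not an RFC-complete check) are hypothetical.

```python
import re

# Loose reading of the footnote: two or more non-empty labels separated
# by periods. This is not a full RFC 1123 / RFC 1035 validation.
FQDN_RE = re.compile(r"[^.]+(?:\.[^.]+)+")


def looks_like_fqdn(display_name: str) -> bool:
    return FQDN_RE.fullmatch(display_name) is not None


def guest_names(display_name: str) -> dict:
    """Map one display name onto the three names discussed above."""
    short = display_name.split(".", 1)[0]
    return {
        "display_name": display_name,  # #1: API field, left unchanged
        "etc_hostname": short,         # #2: short name only
        "etc_hosts": short,            # #3: short name; the domain is unverified
    }


if __name__ == "__main__":
    print(looks_like_fqdn("instance.example.org"), guest_names("instance.example.org"))
```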
On Mon, 2020-11-30 at 11:51 +0000, Stephen Finucane wrote:
When attaching a port to an instance, nova will check for DNS support in neutron and set a 'dns_name' attribute if found. To populate this attribute, nova uses a sanitised version of the instance name, stored in the instance.hostname attribute. This sanitisation simply strips out any unicode characters and replaces underscores and spaces with dashes, before truncating to 63 characters. It does not currently replace periods and this is the cause of bug 1581977 [1], where an instance name such as 'ubuntu20.04' will fail to schedule since neutron identifies '04' as an invalid TLD.
The question now is what to do to resolve this. There are two obvious paths available to us. The first is to simply catch these invalid hostnames and replace them with an arbitrary hostname of format 'Server-{serverUUID}'. This is what we currently do for purely unicode instance names and is what I've proposed at [2]. The other option is to strip all periods, or rather replace them with hyphens, when sanitizing the instance name. This is more predictable but breaks the ability to use the instance name as a FQDN. Such usage is something I'm told we've never supported, but I'm concerned that there are users out there who are relying on this all the same and I'd like to get a feel for whether this is the case first.
So, the question: does anyone currently rely on this inadvertent "feature"?
A quick update. I've reworked the change [1] such that it will always replace periods with hyphens. From the sounds of things, there are people who name their instances using FQDNs for management purposes, but there does not appear to be anyone using the name published via the metadata service for DNS integration purposes. This makes replacing the periods the least complex solution to the immediate issue. A future change can look at exposing a way to configure this information via the API when creating a new instance. We might also want to change from stripping unicode to replacing it using punycode.
If anyone missed this discussion the first time around and has concerns, please raise them here or on the review.
Cheers, Stephen
[1] https://review.opendev.org/c/openstack/nova/+/764482
Cheers, Stephen
[1] https://launchpad.net/bugs/1581977 [2] https://review.opendev.org/c/openstack/nova/+/764482
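To make the before/after concrete, here is a rough Python approximation of three things: the sanitisation described in the original post (drop non-ASCII characters, turn underscores and spaces into dashes, truncate to 63 characters); the all-numeric final label that trips Neutron for names like 'ubuntu20.04'; and the direction in this update, where periods become hyphens and, as the possible follow-up, non-ASCII text is punycode-encoded with Python's built-in 'idna' codec (IDNA 2003) instead of being dropped, falling back to a 'Server-{serverUUID}' name when nothing usable remains. All function names are hypothetical, this is not nova's code, and Neutron's real validation is more involved than the single check shown.

```python
import re


def current_hostname(display_name: str) -> str:
    """Approximation of the behaviour described in the original post."""
    # Drop every non-ASCII character.
    name = display_name.encode("ascii", "ignore").decode("ascii")
    # Underscores and spaces become dashes; periods are left alone.
    name = re.sub(r"[_ ]", "-", name)
    return name[:63]


def has_numeric_final_label(hostname: str) -> bool:
    """True when the part after the last period is all digits, e.g. '04'
    in 'ubuntu20.04', which Neutron reports as an invalid TLD."""
    labels = hostname.split(".")
    return len(labels) > 1 and labels[-1].isdigit()


def proposed_hostname(display_name: str, server_uuid: str) -> str:
    """Periods, underscores and spaces become hyphens; non-ASCII text is
    punycode-encoded instead of dropped (hypothetical sketch)."""
    name = re.sub(r"[._ ]", "-", display_name)
    # Keep ASCII letters, digits and hyphens, plus any non-ASCII
    # characters for the IDNA step below.
    name = "".join(c for c in name if not c.isascii() or c.isalnum() or c == "-")
    name = name.strip("-")[:63].rstrip("-")
    fallback = f"Server-{server_uuid}"
    try:
        # Non-ASCII labels are nameprep-normalised and encoded as
        # 'xn--...'; pure-ASCII input is returned unchanged.
        label = name.encode("idna").decode("ascii")
    except UnicodeError:
        # e.g. the encoded label would be longer than 63 characters
        return fallback
    return label or fallback
```

Because the proposed sketch never leaves a period behind, the numeric-TLD case cannot arise. If the newer IDNA 2008 rules are preferred, the third-party `idna` package implements them.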
participants (9)
- Artom Lifshitz
- Jeremy Stanley
- melanie witt
- Michael Johnson
- Ruby Loo
- Sean Mooney
- Stephen Finucane
- Sylvain Bauza
- Thomas Goirand