On Mon, 2020-11-30 at 19:50 +0000, Sean Mooney wrote:
On Mon, 2020-11-30 at 18:16 +0000, Jeremy Stanley wrote:
On 2020-11-30 10:13:00 -0800 (-0800), Michael Johnson wrote: [...]
So, I think I am in alignment with Sean. The hostname component should return a 400 if there is an illegal character in the string, such as a period. We also should use Punycode to store/handle unicode hostnames correctly. [...]
So to be clear, you're suggesting that if someone asks for an instance name which can't be converted to a (legal) hostname, then the nova boot call be rejected outright, even though there are likely plenty of people who 1. (manually) align their instance names with FQDNs, and 2. don't use the hostname metadata anyway?
unfortunately we can't do that even if it's the right thing to do technically.
i think the path forward has to be something more like this.
1.) add a new workaround config option e.g. disable_strict_server_name_check. for upgrade reasons it would default to true in wallaby. when strict server name checking is disabled we will transform any malformed numeric tld by replacing the '.' with '-' or by replacing the hostname with server-<uuid> as we do with unicode hostnames.
This sounds like config-driven API behavior, in that the API will respond differently to the same request on different clouds (an 'ubuntu18.04' server name will work on one cloud and fail on the other). That's generally a big no-no. What makes this different?
On Tue, 2020-12-01 at 10:30 +0000, Stephen Finucane wrote: this is the backportable bit where we allow the numeric tlds if you opt into it. it is config driven api behavior, but it is the only way i think it's valid to backport a behavioral api change. we should not do that without a way to disable it, hence a workaround config option.
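A minimal sketch of the relaxed transform from step 1, assuming it only touches names whose last label is all digits and falls back to server-<uuid> when the result is still not a legal label. The function names here are hypothetical and are not taken from nova.

```python
import re

# A single DNS label: letters, digits and hyphens, 1-63 characters,
# not starting or ending with a hyphen.
_LABEL_RE = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$")


def has_numeric_tld(name: str) -> bool:
    """Return True when the last dot-separated label is all ASCII digits."""
    if "." not in name:
        return False
    return re.fullmatch(r"[0-9]+", name.rsplit(".", 1)[1]) is not None


def relaxed_hostname(display_name: str, server_uuid: str) -> str:
    """Hypothetical non-strict transform for names with a numeric tld."""
    name = display_name.strip().lower()
    if not has_numeric_tld(name):
        # Ordinary hostnames and FQDNs are left alone in this sketch.
        return name
    candidate = name.replace(".", "-")
    if _LABEL_RE.match(candidate):
        return candidate
    # Same fallback the thread describes for unicode hostnames.
    return "server-" + server_uuid
```

With this, relaxed_hostname("ubuntu18.04", some_uuid) gives "ubuntu18-04", while a name such as "web01.example.org" passes through unchanged.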
2.) add a new api microversion and do one of:
a.) reject servers with invalid server names that contain '.' with a 400
b.) transform server names according to the RFEs (replace all '.' and other disallowed characters with '-')
c.) add support for FQDNs to the api.
this could be something like adding a new field to store the fqdn as a top level field on the server. make hostname contain just the host name, with the full FQDN in the new fqdn field. if the server name is an fqdn then the fqdn field would just be the server name. if the server name is a hostname then the nova dhcp_domain will be appended to the server name. this will allow the removal of dhcp_domain from the compute node for config drive generation, and we can generate the metadata from the new fqdn field. if designate is enabled then the fqdn will be taken from the port info.
in the metadata we will store the instance.hostname, which will never be an fqdn, in all local hostname keys. we can store the fqdn in the public_hostname key in the ec2 metadata and in a new fqdn field. this will make the values consistent and useful.
with the new microversion we will no longer transform the hostname except for multi-create, where it will be used as a template, i.e. <server name>-<vm index>.
TBD if the new microversion will continue to transform unicode hostnames to server-<uuid> or allow them; out of scope for now.
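Option (c) could be sketched roughly as follows. This is only an illustration: the Names tuple and both functions are hypothetical, the designate case (fqdn taken from the port info) is left out, and treating the first label of an FQDN as the hostname is an assumption.

```python
from typing import NamedTuple


class Names(NamedTuple):
    hostname: str  # short host name only, never an FQDN
    fqdn: str      # value for the proposed new fqdn field


def derive_names(server_name: str, dhcp_domain: str) -> Names:
    """Derive hostname and fqdn from a server name, per option (c)."""
    name = server_name.strip().rstrip(".")
    if "." in name:
        # The server name is already an FQDN: keep it as the fqdn and
        # use its first label as the hostname (assumption).
        return Names(hostname=name.split(".", 1)[0], fqdn=name)
    # Bare hostname: append the configured dhcp_domain, if there is one.
    domain = dhcp_domain.strip(".")
    fqdn = f"{name}.{domain}" if domain else name
    return Names(hostname=name, fqdn=fqdn)


def multi_create_name(server_name: str, vm_index: int) -> str:
    """Multi-create template from the thread: <server name>-<vm index>."""
    return f"{server_name}-{vm_index}"
```

For example, derive_names("web01", "novalocal") yields hostname "web01" and fqdn "web01.novalocal", whereas derive_names("web01.example.org", "novalocal") keeps "web01.example.org" as the fqdn.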
This sounds great, and I agree(d) that it's ultimately the correct solution [1], but it doesn't solve anything for users that try the following on any release to date when Designate is configured:
openstack server create ... ubuntu18.04
We need to fix this in a backportable manner, hence my suggestion to simply rewrite hostnames if we detect that they're still invalid after sanitization.
well i'm not sure we do. ubuntu18.04 would have always failed regardless of whether you have designate or not. numeric tlds have never worked; it was one of the first things i learned when i started working on havana. so i don't think we have to "fix" that in a backportable way, as that is not new with designate, but if we must then we can do so with the workaround option. although melanie is right, we should name it enable_strict_server_name_checking=false rather than disable_strict_server_name_check=true to follow convention.
Remember, we already do sanitization, and with this people that are using valid FQDN like Ruby or Jeremy can continue to do so and people that don't know about this "feature" are able to create instances. I don't see what the downside of that would be.
unless we also address the fact that the FQDN you enter will not be available to the instance in any way, i really don't see the point in just allowing it to be set. the server will be presented with 4 different host names, none of which will be the one that you entered, as i pointed out in my previous mail. the only reason you can even ping it is because of the dns search domain. if that is not configured then the server name will not resolve at all unless you also register that dns name manually or set it in /etc/hosts after the fact.
you can do both of those without actually setting the server name to match, so i'm not seeing a compelling reason to allow numeric tlds. as for Ruby's and Jeremy's use cases, given numeric tlds never worked it should not affect them, provided we continue to allow fqdns in the server name while enable_strict_server_name_checking=false
Stephen
[1] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2...
3.) change the workaround config option default to false, enforcing the new behavior for old microversions, but do not remove it. this will enable all vms, even with old clients, to have the correct behavior of requiring the hostname to be an actual hostname. we will not remove this workaround option going forward, allowing clouds that want the old behavior to continue to have it, but end users can rely on the consistent behavior by opting in to the new microversion if they have a preference.
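The interaction between the microversion and the workaround option across steps 1 to 3 can be summarised in a small sketch. It assumes, purely for illustration, that the new microversion ends up rejecting invalid names (option a); the Action enum and name_policy function are hypothetical.

```python
from enum import Enum


class Action(Enum):
    TRANSFORM = "transform"  # old behavior: rewrite the invalid name
    REJECT = "reject"        # strict behavior: respond with a 400


def name_policy(new_microversion: bool, strict_checking: bool) -> Action:
    """Pick the handling for a server name that is not a valid hostname.

    new_microversion: the request opted in to the proposed microversion.
    strict_checking: operator setting, i.e. the value of the proposed
        enable_strict_server_name_checking option (assumed name).
    """
    if new_microversion:
        # Clients on the new microversion always get consistent behavior.
        return Action.REJECT
    # Old microversions follow whatever the operator configured.
    return Action.REJECT if strict_checking else Action.TRANSFORM
```

Under step 1 the option would be off by default, so old microversions keep transforming; under step 3 the default flips and only clouds that turn it off again keep the old behavior.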
this is a long way to say that i think we need a spec for the new api behavior that adds support for FQDNs officially. whatever we decide, we need to document the actual expected behavior in both the api reference and the general server create documentation.
if we backport any part of this i think the new behavior should be disabled by default, e.g. transforming the numeric top level domain, so that the api behavior continues to be unaffected unless operators opt in to enabling it.
thoughts?