On Mon, 2020-11-30 at 19:50 +0000, Sean Mooney wrote:
On Mon, 2020-11-30 at 18:16 +0000, Jeremy Stanley wrote:
On 2020-11-30 10:13:00 -0800 (-0800), Michael Johnson wrote: [...]
So, I think I am in alignment with Sean. The API should return a 400 if the hostname component contains an illegal character, such as a period. We should also use Punycode to store and handle unicode hostnames correctly. [...]
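As a rough illustration of the two points above, here is a minimal Python sketch. It assumes the hostname component is a single DNS label and uses Python's built-in "idna" codec for the Punycode conversion. The check_hostname helper is hypothetical, not nova's code, and a ValueError stands in for what the API would turn into a 400.

```python
import re

# One DNS label: 1-63 chars of [a-z0-9-], not starting or ending with '-'.
_LABEL = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$")


def check_hostname(name: str) -> str:
    """Hypothetical validator for the hostname component.

    Converts unicode to its ASCII (Punycode) form, then rejects anything
    that is not a single legal label, e.g. a name containing a period.
    A ValueError here is what an API layer would map to HTTP 400.
    """
    # The built-in "idna" codec (IDNA 2003) Punycode-encodes each label.
    # UnicodeError, raised for empty or over-long labels, subclasses ValueError.
    ascii_name = name.lower().encode("idna").decode("ascii")
    if not _LABEL.match(ascii_name):
        raise ValueError("invalid hostname: %r" % name)
    return ascii_name
```

With this sketch, check_hostname("bücher") returns "xn--bcher-kva", while check_hostname("ubuntu18.04") raises ValueError because of the period.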
So to be clear, you're suggesting that if someone asks for an instance name which can't be converted to a (legal) hostname, then the nova boot call should be rejected outright, even though there are likely plenty of people who 1. (manually) align their instance names with FQDNs, and 2. don't use the hostname metadata anyway?
Unfortunately we can't do that, even if it's the technically right thing to do.
I think the path forward has to be something more like this:
1.) Add a new workaround config option, e.g. disable_strict_server_name_check. For upgrade reasons it would default to true in Wallaby. When strict server name checking is disabled, we will transform any malformed numeric TLD by replacing the '.' with '-', or by replacing the hostname with server-<uuid> as we do with unicode hostnames.
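A minimal sketch of the lenient transformation in step 1, assuming a "numeric TLD" means a final dot-separated label made only of digits (as in ubuntu18.04). The function name and its use_uuid switch are hypothetical and only illustrate the two options mentioned above.

```python
import re

# Final label consisting only of digits, e.g. the ".04" in "ubuntu18.04".
_NUMERIC_TLD = re.compile(r"\.[0-9]+$")


def transform_numeric_tld(name: str, instance_uuid: str,
                          use_uuid: bool = False) -> str:
    """Hypothetical lenient handling used when strict checking is disabled."""
    if not _NUMERIC_TLD.search(name):
        return name  # no malformed numeric TLD, leave the name alone
    if use_uuid:
        # Same fallback as for unicode names: server-<uuid>
        return "server-%s" % instance_uuid
    # Otherwise replace every '.' with '-'
    return name.replace(".", "-")
```

Either option keeps the boot request working; the '-' variant preserves more of what the user typed, while server-<uuid> matches existing unicode handling.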
This sounds like config-driven API behavior, in that the API will respond differently to the same request on different clouds (an 'ubuntu18.04' server name will work on one cloud and fail on another). That's generally a big no-no. What makes this different?
2.) Add a new API microversion and do one of:
a.) reject servers with invalid server names that contain '.' with a 400;
b.) transform server names according to the RFCs (replace all '.' and other disallowed characters with '-');
c.) add support for FQDNs to the API. This could be something like adding a new field to store the FQDN as a top-level field on the server. hostname would contain just the host name, with the full FQDN in the new fqdn field. If the server name is an FQDN, the fqdn field would just be the server name. If the server name is a hostname, the nova dhcp_domain will be appended to the server name. This will allow the removal of dhcp_domain from the compute node for config drive generation, and we can generate the metadata from the new fqdn field. If Designate is enabled, the FQDN will be taken from the port info. In the metadata we will store instance.hostname, which will never be an FQDN, in all local hostname keys. We can store the FQDN in the public_hostname key in the EC2 metadata and in a new fqdn field. This will make the values consistent and useful.
With the new microversion we will no longer transform the hostname, except for multi-create, where it will be used as a template, i.e. <server name>-<vm index>. It is TBD whether the new microversion will continue to transform unicode hostnames to server-<uuid> or allow them; that is out of scope for now.
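To make option c.) more concrete, here is a minimal sketch of how hostname and the proposed fqdn field could be derived. It assumes a server name containing a '.' is treated as an FQDN and that invalid names were already rejected or transformed. The Names type, derive_names and the way vm_index is applied to the host label are all hypothetical; the Designate case, where the FQDN comes from the port, is not modelled.

```python
from typing import NamedTuple, Optional


class Names(NamedTuple):
    hostname: str  # short host name, never an FQDN
    fqdn: str      # hypothetical new top-level fqdn field


def derive_names(server_name: str, dhcp_domain: str,
                 vm_index: Optional[int] = None) -> Names:
    # Assumption: a server name containing a '.' is treated as an FQDN.
    if "." in server_name:
        host, _, domain = server_name.partition(".")
    else:
        # Plain hostname: append the configured dhcp_domain.
        host, domain = server_name, dhcp_domain
    if vm_index is not None:
        # Multi-create: use the name as a template, <server name>-<vm index>
        # (assumed here to apply to the host label only).
        host = "%s-%d" % (host, vm_index)
    fqdn = "%s.%s" % (host, domain) if domain else host
    return Names(hostname=host, fqdn=fqdn)
```

For example, derive_names("web", "example.org") gives hostname "web" and fqdn "web.example.org", so the local hostname keys and the FQDN stay consistent.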
This sounds great, and I agree(d) that it's ultimately the correct solution [1], but it doesn't solve anything for users who try the following on any release to date when Designate is configured:

    openstack server create ... ubuntu18.04

We need to fix this in a backportable manner, hence my suggestion to simply rewrite hostnames if we detect that they're still invalid after sanitization. Remember, we already do sanitization, and with this, people who are using valid FQDNs like Ruby or Jeremy can continue to do so, and people who don't know about this "feature" are able to create instances. I don't see what the downside of that would be.

Stephen

[1] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2...
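The backportable approach suggested here, rewriting the hostname when it is still invalid after sanitization, could look roughly like the following sketch. The sanitize function is a simplified, hypothetical stand-in for the existing sanitization, not nova's actual code, and the validity check assumes RFC 1123 style labels with a non-numeric final label.

```python
import re

# One DNS label: 1-63 chars of [a-z0-9-], not starting or ending with '-'.
_LABEL = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$")


def sanitize(name: str) -> str:
    """Simplified, hypothetical stand-in for the existing sanitization."""
    name = name.lower().replace(" ", "-").replace("_", "-")
    name = re.sub(r"[^a-z0-9.-]", "", name)  # drop other disallowed chars
    return name.strip(".-")


def is_valid_hostname(name: str) -> bool:
    if not name or len(name) > 253:
        return False
    labels = name.split(".")
    # An all-numeric final label (e.g. "04") is not a legal top-level name.
    if labels[-1].isdigit():
        return False
    return all(_LABEL.match(label) for label in labels)


def hostname_for(server_name: str, instance_uuid: str) -> str:
    """Sanitize first; fall back to server-<uuid> if still invalid."""
    candidate = sanitize(server_name)
    if is_valid_hostname(candidate):
        return candidate
    return "server-%s" % instance_uuid
```

Under these assumptions "ubuntu18.04" becomes server-<uuid>, a valid FQDN such as "web.example.com" is kept as-is, and "My Server" becomes "my-server", so no request is rejected.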
3.) Change the workaround config option default to false, enforcing the new behavior for old microversions, but do not remove it. This will give all VMs, even those created with old clients, the correct behavior of requiring the hostname to be an actual hostname. We will not remove this workaround option going forward, allowing clouds that want the old behavior to continue to have it, but end users can rely on the consistent behavior by opting in to the new microversion if they have a preference.
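A rough sketch of how the workaround option and the new microversion could interact across steps 1 to 3. The function, the tuple-based version comparison and the new_version parameter are hypothetical; no microversion number has been agreed in this thread. Only requests below the new microversion consult disable_strict_server_name_check, so flipping its default in step 3 changes old-microversion behavior while new-microversion behavior stays the same on every cloud.

```python
from typing import Tuple

Version = Tuple[int, int]  # (major, minor), e.g. (2, 1)


def name_policy(requested: Version, new_version: Version,
                disable_strict_server_name_check: bool) -> str:
    """Return "strict" or "lenient" for server name handling (hypothetical)."""
    if requested >= new_version:
        # Opting in to the new microversion always gives the consistent,
        # strict behavior regardless of cloud configuration.
        return "strict"
    # Older microversions follow the workaround option: True (the proposed
    # Wallaby default) keeps the lenient transform, False (step 3) enforces.
    return "lenient" if disable_strict_server_name_check else "strict"
```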
This is a long way of saying that I think we need a spec for the new API behavior that officially adds support for FQDNs. Whatever we decide, we need to document the actual expected behavior in both the API reference and the general server create documentation.
If we backport any part of this, I think the new behavior (e.g. transforming the numeric top-level domain) should be disabled by default, so that the API behavior remains unaffected unless operators opt in to enabling it.
Thoughts?