Hey Clark, Pls find comments below. I need to justify the product that I maintain, but I perfectly understand you.
On 2. Jun 2022, at 18:22, Clark Boylan <cboylan@sapwetik.org> wrote:
On Thu, Jun 2, 2022, at 3:44 AM, Artem Goncharov wrote:
Hey,
https://review.opendev.org/c/openstack/openstacksdk/+/844414 <https://review.opendev.org/c/openstack/openstacksdk/+/844414> should address the issue. For R1 of SDK we want to ensure all interfaces of SDK use same code branch and not 3 different like it was before. So with the latest changes we now rely here on the regular mechanism that actually watches how to handle each known parameter. All of the arbitrary parameters in Swift are managed as container/object metadata and should get X-Container-Meta- or X-Object-Meta- prefixes in order to be considered and returned back by manila Swift. This is sadly not the case for Rax as you rightfully mention.
Note that the docs [3] still say that headers are passed through, which isn't quite the case in 0.99.0. Filtered headers for what the SDK thinks are valid are passed through. This change should probably update the docs too.
Partially yes, partially no. Documentation says it will pass them and the SW does that. It passes, however, only what it know would be accepted. I see the point here and will try to address this, but the issue is that we have generally a mixture of header and metadata props (which are implemented as headers). I will try to reimplement current handling to be able to guess better what user means.
So for now I have added header required by Rax into “known” exceptions and added explicit unit test for that. Would be cool to find way how I could test that really in Rax to ensure particular Zuul use case will work.
Probably the simplest thing is to have a test that checks the right headers are sent. Then if/when clouds change behavior we can update the tests to follow.
This is how I implemented that, but it is only a unit test. And here I was rather wondering if there would be possibility to have func test for that to save all of us from another analysis sessions on what the heck is going on right now. Point is that we only know that cloud changed behaviour by noticing Zuul fails now (in similar case to the current issue).
Generally we would need to think whether SDK should continue “tell user he is wrong before reaching out to the server” approach or to switch to the “believe the user and let server reject” approach generally. This here is precisely one of those cases. Alternatively we would need to widen our concept of vendors and move all of the specific Rax logic over there. We surely have lot of OpenStack based clouds, which do not explicitly implement everything in the compatible way. So in order to make SDK/CLI working with all of them we either drop most of the validation logic in SDK or really start implementing cloud specific overrides as part of “extensions”.
The problem with telling the user they are wrong before talking to the server is that it assumes that clouds don't do weird things. History has shown us that cloud definitely do weird things and this isn't related to any single cloud. Off the top of my head I can come up with a few examples and if we look through the git history of nodepool and opendev system-config I'm sure we'll find many many more occurences. This isn't a specific cloud problem either; they all do it one way or another.
It is my opinion that tools like openstacksdk should be forgiving in what they accept to enable users to use openstack in its many forms. Shade is a great example of this. Eventually shade was rolled into openstacksdk and doesn't really exist anymore. It feels like the spirit of shade has been lost though and we're regressing back to 2014 when you couldn't talk to two different clouds without 4 different client tools.
Similarly support for old versions of APIs have been ripped out of these tools. Unfortunately some clouds are still deploying old versions of clouds and ideally as a user I wouldn't need to think too hard about that. It isn't my fault this is the situation, I just want the software to work.
I perfectly see your point. To some extend you are right, but let me describe reasoning for all of those changes we have now: - Zuul is using one interface of SDK - Ansible is using other interface (moreover it mixes 3 different interfaces) - OSC is either not using SDK or using one of 3 mentioned interfaces - Those mentioned interfaces try to do same thing, but eventually differently - Vanila services evolve and require we permanently add new features to all of those interfaces simultaneously - Amount of contributions is incredibly low what makes it impossible to maintain multiple interfaces for doing same job - So what I am trying to do now is to have all of the mentioned tools rely on the single branch in code to be able to maintain it easier and reduce amount of questions like “hmm, it works if I use OSC, but it doesn’t when I use Ansible. What is wrong?”. Target is just to exactly reduce amount of those “different tools for different tasks and or clouds” as you mention and instead to have a single tool that work for every cloud. We do not take care of shade any more. We try to deprecate all of the individual clients exactly to have a single OSC be able to do everything. And it should just know all of those corner cases. We try to get rid of as much “reimplementation” from Ansible modules as possible. - Yes, maybe idea of trying to help user to know that he will fail once we reach server was wrong, but you can judge only from experience (when clouds behave differently). Idea as such is not bad, is having its right to live and is generally targeting to be more user friendly than the server response that just returns you 409 with “good luck figuring out what is wrong”. Here I was spending days trying to find issue in my request with only 409 “Bad Request” back. And when target clouds are different you as a user are left in even more problems. Generally idea of SDK is to save user from even knowing about differences of clouds he want to talk to. Without that the whole burden is on the user and you need to have all of those cloud hacks directly in your code (is it cool to have all of those hacks inside Zuul and reimplementing them through the different use cases? And repeating the same hacks in OSC and Ansible? Same for all of the user written applications?) - I know this work breaks some corner cases. And I perfectly see the pain my work is causing (even for myself). I have invested huge amount of time in rewriting this and joining code branches and tests. Trust me this was not fun. But unless this is done we wither stuck with old code without possibility to add new features (i.e. micro versions) or we continue living in the mess of tons of different ways to do the same thing with no maintainers. =========== So overall, while I understand all of the issues I am causing to projects right now, I kindly ask, be patient, report broken issues to me and help me fix this insanity. Believe me, I am not doing all of that with bad intents, but rather hoping to resolve issues we have and unify things in the light of low contributions. Otherwise SDK, CLI, Ansible, eventually Zuul are not able to properly keep up with OpenStack services themselves. I listen to complains carefully, and that is the reason for R1.0 not to exist as of now. Artem
Artem
On 1. Jun 2022, at 19:55, Clark Boylan <cboylan@sapwetik.org> wrote:
Hello,
Last week the OpenDev team upgraded Zuul which deployed zuul-executors with openstacksdk==0.99.0 installed. After this, test job logs uploaded to the rackspace swift location were no longer viewable in the Zuul dashboard. The reason for this is these objects did not have properly configured CORS headers any longer. The objects uploaded to OVH appear to have been fine (potentially because the headers are not required for OVH).
Today we downgraded the version of openstacksdk on the zuul-executors to 0.61.0 and the rackspace uploaded job log objects have functional CORS headers again. For this reason we're fairly confident that something about the 0.99.0 release is impacting the setting of these headers.
The code that does these uploads with openstacksdk can be found here [0]. I suspect that the problem is going to be related to this header set on line 227 [1]. In particular that appears to be rackspace specific and this issue is rackspace specific under 0.99.0. However, I wasn't able to find anything in openstacksdk that would have a problem with this. Does anyone else have ideas? I'm somewhat concerned that this represents a class of data loss (metadata is going missing) for swift object uploads.
Additionally, we rely on setting the x-delete-after header to timeout, expire, and prune our logs in swift [2]. If there is a general problem with 0.99.0 setting headers on objects it is possible that all of the objects we created using 0.99.0 do not have this metadata set and will leak in their swift containers.
[0] https://opendev.org/zuul/zuul-jobs/src/commit/e69d879caecb454c529a7d757b80ae... [1] https://opendev.org/zuul/zuul-jobs/src/commit/e69d879caecb454c529a7d757b80ae... [2] https://opendev.org/zuul/zuul-jobs/src/commit/e69d879caecb454c529a7d757b80ae...
[3] https://docs.openstack.org/openstacksdk/latest/user/connection.html#openstac... <https://docs.openstack.org/openstacksdk/latest/user/connection.html#openstack.connection.Connection.create_object>