Hello, Zorillas and interested stackers!
Thank you for attending last week's PTG. We had productive discussions and came up with good plans for this cycle!
Here is a summary of the topics we discussed during the PTG. The etherpad with all the notes is available at [0]. Recordings of the discussions are available on the Manila YouTube channel [1].
- Share Transfer Future development
- The share transfer feature was introduced in Antelope, but there is room for enhancement, such as transferring shares that have snapshots, replicated shares, and share networks along with their share servers and shares.
- Snapshots shouldn't be very difficult to cover but we would need to keep in mind quota allocations while doing so.
- Transferring shares with replicas is the bigger challenge, since it requires modifying the access rules.
- The next step would be to work on transferring share networks.
- AIs: Haixin plans to work on this feature during Bobcat.
- For the Bobcat cycle, we will focus on:
- Landing even more tests and enabling the oslo policy defaults.
- Changing our tests to use the reader role, as opposed to the member role, when testing read-only APIs.
- Auditing all cross-service interactions and isolating actions performed by the "service" user.
- Pushing for the completion of unit tests for Secure RBAC.
- AIs: Manila team will continue promoting events to get tests reviewed and implemented
- Cross-project discussion with Nova for VirtIO FS
- The Manila and Nova teams discussed approaches for preventing shares from being deleted while they are attached to instances.
- We agreed that an admin/service-level API for locking/unlocking share deletion would be the best way to prevent shares from being deleted while mounted. This API would ideally be a 'locked by <consumer>' action instead of a simple boolean.
- As part of this cross-project effort, the Manila team has also been focusing on getting more APIs available in OpenStack SDK. We have been making great progress, and we plan to have all the APIs Nova is using merged in OpenStack SDK by the Bobcat cycle.
- Manila team to write a spec describing the behavior of the locking/unlocking API for shares.
- Manila team to focus on having all the APIs Nova is using in OpenStack SDK by the Bobcat cycle.
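
To illustrate why a 'locked by <consumer>' action is preferable to a boolean, here is a minimal sketch in plain Python. It is not Manila code and does not reflect the spec that is yet to be written; the class and method names are hypothetical. The point is that with per-consumer locks, one consumer releasing its lock does not unlock a share that another consumer still depends on.

```python
class ShareDeletionLocks:
    """Hypothetical in-memory registry of deletion locks, keyed by share."""

    def __init__(self):
        # share_id -> set of consumer names currently holding a lock
        self._locks = {}

    def lock(self, share_id, consumer):
        """Record that `consumer` needs the share to stay around."""
        self._locks.setdefault(share_id, set()).add(consumer)

    def unlock(self, share_id, consumer):
        """Release only the lock held by `consumer`."""
        holders = self._locks.get(share_id, set())
        holders.discard(consumer)
        if not holders:
            # No consumers left, so the share is no longer locked.
            self._locks.pop(share_id, None)

    def locked_by(self, share_id):
        """Return the consumers holding a lock, sorted for stable output."""
        return sorted(self._locks.get(share_id, ()))

    def can_delete(self, share_id):
        """A share can be deleted only when nobody holds a lock on it."""
        return share_id not in self._locks
```

With a plain boolean, the second consumer's lock would be lost as soon as the first consumer unlocked the share.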
- Scalable NFS Ganesha for CephFS - state of the kitchen
- We shared a status update on our focus during the previous release and on some important bugs that have been fixed.
- Updates on moving from a standalone ganesha to a cephadm-deployed one:
- There are three changes being worked on at the moment, after past discussions:
- Ensure that the shares' access rules are applied when moving from the old server to the new one.
- Ensure that the share exists in both the old and new ganesha servers, and reapply all access rules in both.
- Expose export locations from both the old and new servers while both cephfs_nfs_cluster_id and ganesha_server_ips are present in the backend.
- With these three changes we would cover all the scenarios considering release upgrades.
- The changes are being tested and need feedback.
- A job for automated testing is in progress. We are aiming for a multi-node CI job that can deploy ganesha using cephadm and run tests against it.
- AI: carloss and gouthamr will pursue closure for the three changes related to the upgrades and ashrodri will continue to work on the multi-node testing job.
- SQLAlchemy 2.0 and other DB challenges/changes
- DB testing is failing sporadically in the CI, and we discussed possible changes to avoid the rechecks these failures cause.
- We intend to work on missing indexes for queries, so we can gain performance in the database.
- Implementing __repr__ for Manila models
- Others are already doing this, and it could be a huge help for debugging, as one of the operators suggested in the session.
- We will investigate and apply this in Manila
- AIs: gouthamr will open bugs for tracking the work on the joinedloads and a blueprint for the missing indexes.
- The SQLAlchemy 2.0 changes have already started, and we have a list of changes yet to merge. stephenfin is doing great work on them.
- AIs: manila team to review the changes proposed by stephenfin
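
As a rough illustration of the __repr__ idea from this topic, here is a minimal sketch in plain Python. It is not taken from Manila; the mixin, the `repr_fields` attribute, and the stand-in `Share` class are all hypothetical. A real model would also inherit from the ORM base class, and only cheap, already-loaded attributes should be listed so that printing an object does not trigger extra queries.

```python
class ReprMixin:
    """Builds a repr from the attribute names listed in `repr_fields`."""

    # Hypothetical convention: each model lists the fields worth showing.
    repr_fields = ("id",)

    def __repr__(self):
        pairs = ", ".join(
            f"{name}={getattr(self, name, None)!r}"
            for name in self.repr_fields
        )
        return f"<{type(self).__name__}({pairs})>"


class Share(ReprMixin):
    # Stand-in for a model class, used only to demonstrate the mixin.
    repr_fields = ("id", "status", "size")

    def __init__(self, id, status, size):
        self.id = id
        self.status = status
        self.size = size


print(repr(Share("abc", "available", 10)))
# prints: <Share(id='abc', status='available', size=10)>
```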
- Create share from replicated snapshot / keep source share on replica promote
- Users of this feature are looking for a way to test inactive replicas before actually promoting them, so that they continue with the promotion only if everything is correct.
- One suggestion to get this working is a replication pause API, which would allow the inactive instance to be mounted.
- We do think this feature could be relevant and some suggestions from the Manila team are:
- Letting users know whether they can pause/unpause replication even before they create their share replicas.
- Letting users learn about errors through the API (something similar to the check mechanism on security-service-update).
- AIs: carthaca and team will work on a spec and will discuss it further with the Manila team.
- We brought up the current issues with our CI, mostly caused by tests being very resource intensive.
- We discussed approaches for scenario testing and think it is worth trying to mount shares in containers during scenario tests instead of spawning VMs.
- Attendees also mentioned that it is difficult to keep track of all of these issues, that it would be better to have them documented somewhere, and that we could bring them up more often.
- Migration to Ubuntu 22 is yet to complete for some jobs due to failures in package installation and the usage of quagga.
- carloss will work on a wiki with tech debt items and will bring it up more often in the Manila weekly meetings, so that people who have time to work on these items have more chances to volunteer.
- carloss and gouthamr will work to fix the issues with jobs yet to migrate to Ubuntu 22
- Scheduler data placement based on a vendor-specific tool
- NetApp has a tool called Active IQ that uses machine learning to identify the best pool of disks for a request, based on custom parameters (latency, IOPS and so on).
- They are considering having this tool interact with Manila, so that share placement can benefit from it when NetApp is the only storage vendor in the cloud.
- The suggestion is to have a new weigher that makes calls to the storage based on the request and uses the Active IQ response to identify where the share should be placed.
- Testing this feature could be a challenge, considering that it's not straightforward to foresee what a weigher would return at a given moment.
- AIs: felipe_rodriguess alongside the NetApp team will work on a spec so this can be further discussed.
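
To make the weigher idea and the testing concern more concrete, here is a minimal sketch in plain Python. It does not use Manila's scheduler classes or any real Active IQ API; `weigh_pools` and the `advisor` callable are hypothetical stand-ins. Passing the advisor in as a parameter is one possible way to make tests deterministic, since a fake advisor with fixed scores can replace the external tool.

```python
def weigh_pools(pools, request, advisor):
    """Return `pools` ordered best-first by the advisor's score.

    `advisor(pool, request)` is a hypothetical callable standing in for
    the external tool. It returns a number (higher is better) or None.
    """
    scored = []
    for pool in pools:
        try:
            score = advisor(pool, request)
        except Exception:
            # If the external tool fails, fall back to a neutral score
            # instead of failing the whole scheduling request.
            score = None
        scored.append((0.0 if score is None else float(score), pool))
    # Python's sort is stable, so pools with equal scores keep their order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [pool for _, pool in scored]


# A fake advisor with fixed scores makes the outcome predictable in tests.
fake_scores = {"pool_a": 0.2, "pool_b": 0.9, "pool_c": None}
ranked = weigh_pools(
    ["pool_a", "pool_b", "pool_c"],
    {"size": 10},
    lambda pool, request: fake_scores[pool],
)
print(ranked)  # prints: ['pool_b', 'pool_a', 'pool_c']
```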
Thank you!
carloss