<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Hi Gorka and Renat</p>
<p><br>
</p>
<p>Thanks you for your suggestions and sorry to have forgotten the
[mistral] subject prefix .<br>
</p>
<p><br>
</p>
<p>>Renat:<br>
>workflow should probably be responsible for
tracking a status of an operation. <br>
</p>
<p>>Gorka:<br>
>Instead of a sleep, which may get you through this issue but
fall into a<br>
>different one and won't return the right status code, you
should<br>
>probably have a loop checking the status of the backup and
return a non<br>
>zero status code if it ends up in "error" state.
</p>
<p>Gorka's idea sounds good.<br>
</p>
<p>If you look at Jose Castro's snapshot workflow, you will find
a similar snippet:<br>
</p>
<p>
#<a class="moz-txt-link-freetext" href="https://techblog.web.cern.ch/techblog/post/scheduled-snapshots/">https://techblog.web.cern.ch/techblog/post/scheduled-snapshots/</a><br>
#<a class="moz-txt-link-freetext" href="https://gitlab.cern.ch/cloud-infrastructure/mistral-workflows/raw/master/workflows/instance_snapshot.yaml">https://gitlab.cern.ch/cloud-infrastructure/mistral-workflows/raw/master/workflows/instance_snapshot.yaml</a>
| sed -e 's%action_region: "cern"%action_region: "ch-zh1"%'
>instance_snapshot.yaml<br>
</p>
<pre>stop_instance:
  description: 'Stops the instance for consistency'
  action: nova.servers_stop
  input:
    server: <% $.instance %>
    action_region: <% $.action_region %>
  on-success:
    - wait_for_stop_instance
  on-error:
    - error_task

wait_for_stop_instance:
  description: 'Waits until the instance is shutoff to continue'
  action: nova.servers_find
  input:
    id: <% $.instance %>
    status: 'SHUTOFF'
    action_region: <% $.action_region %>
  retry:
    delay: 5
    count: 40
  on-success:
    - check_boot_source
  on-error:
    - error_task
</pre>
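<p>The retry policy in that snippet (delay: 5, count: 40) is the same
polling idea Gorka suggests for the backup. As a rough Python sketch,
assuming a hypothetical get_status callable that returns the current
backup status as a string (for Cinder backups, values such as
"creating", "available" and "error"), a loop that returns a non-zero
code on failure could look like the following. The function name, its
parameters and the stand-in status source are illustrative only; they
are not part of Mistral or of any OpenStack client.</p>
<pre>
```python
import time


def wait_for_status(get_status, wanted="available", failed="error",
                    delay=5.0, count=40, sleep=time.sleep):
    """Poll get_status() up to count times, delay seconds apart.

    Returns 0 when the resource reaches the wanted status, and 1 when
    it reaches the failed status or the attempts run out.
    """
    for attempt in range(count):
        status = get_status()
        if status == wanted:
            return 0
        if status == failed:
            return 1
        # No need to sleep after the final attempt.
        if attempt + 1 != count:
            sleep(delay)
    return 1


if __name__ == "__main__":
    # Stand-in for a real status lookup (hypothetical data).
    fake_statuses = iter(["creating", "creating", "available"])
    exit_code = wait_for_status(lambda: next(fake_statuses),
                                sleep=lambda seconds: None)
    print(exit_code)
```
</pre>
<p>Unlike a fixed sleep, this gives up after a bounded time and
reports the failure, so the execution result would no longer be
"success" when the backup ends up in "error".</p>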
<p><br>
</p>
<p>>We’ve discussed a more generic solution in the past for
similar situations but it seems to be virtually impossible to find
it.</p>
<p>OK, so it looks like this issue cannot be fixed with a small
bugfix. <br>
It would require a feature extension.</p>
<p>I can imagine that quite a few API calls from the different
OpenStack modules/services are asynchronous and would require
Mistral to check their progress status in a different ad hoc
manner each time.<br>
That would make such a new feature in Mistral quite expensive
to implement.</p>
<p>It would be great if every async call in each service returned a
job_id in a standard form.<br>
Mistral would then be able to track them in a uniform way.<br>
This would also allow the OpenStack client to run in sync or async
mode, according to the user's needs.<br>
</p>
<p>But such a design requirement would have had to be set on day one;
it is likely too late to change all OpenStack services...</p>
<p><br>
</p>
<p>However, there is a minor enhancement that could be done:<br>
let the user specify whether a cron trigger should auto-delete itself
after its last execution or not.<br>
</p>
<p>Keeping expired cron triggers could be nice for:<br>
- avoiding race conditions such as the one with swift/radosgw<br>
- allowing the user to edit and reschedule an expired cron trigger<br>
</p>
<p>What do you think?<br>
</p>
<p><br>
</p>
<p>Best Regards</p>
<p>Francois<br>
</p>
<div class="moz-cite-prefix">On 9/24/19 8:36 AM, Renat Akhmerov
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:4f779c2f-e43e-4f1a-a3a2-44a4e5515ef7@Spark">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title></title>
<div name="messageBodySection">
<div dir="auto">Hi!
<div dir="auto"><br>
</div>
<div dir="auto">I would kindly ask you to add [mistral] into
the subject of the emails related to Mistral. I just saw
this thread accidentally (since I can’t read everything) and
missed it in the first place.</div>
<div dir="auto"><br>
</div>
<div dir="auto">On the issue itself… So yes, the discovery you
made makes perfect sense. I agree that a workflow should <span
style="caret-color: rgb(39, 39, 40);">probably</span><span
style="caret-color: rgb(39, 39, 40);"> </span>be
responsible for tracking a status of an operation. We’ve
discussed a more generic solution in the past for similar
situations but it seems to be virtually impossible to find
it. If you have some ideas, please share. We can discuss it.</div>
<div dir="auto"><br>
</div>
</div>
</div>
<div name="messageSignatureSection"><br>
<div class="matchFont">Thanks<br>
<br>
Renat Akhmerov<br>
@Nokia</div>
</div>
<div name="messageReplySection">On 23 Sep 2019, 14:41 +0700, Gorka
Eguileor <a class="moz-txt-link-rfc2396E" href="mailto:geguileo@redhat.com"><geguileo@redhat.com></a>, wrote:<br>
<blockquote type="cite" class="spark_quote" style="margin: 5px
5px; padding-left: 10px; border-left: thin solid #1abc9c;">On
20/09, Francois Scheurer wrote:<br>
<blockquote type="cite" class="spark_quote" style="margin: 5px
5px; padding-left: 10px; border-left: thin solid #e67e22;">Hi
Gorka<br>
<br>
<br>
<blockquote type="cite" class="spark_quote" style="margin:
5px 5px; padding-left: 10px; border-left: thin solid
#3498db;">Then I assume you prefer the Swift backup driver
over the Ceph one<br>
because you are using one of the OpenStack releases that
had trouble with<br>
Incremental Backups on the Ceph backup driver.<br>
</blockquote>
<br>
<br>
You are probably right. But I cannot answer that because I
was not involved<br>
in that decision.<br>
<br>
<br>
Ok in the radosgw logs I see this:<br>
<br>
<br>
2019-09-20 15:40:06.805529 7f19edb9b700 20
token_id=gAAAAABdhNauRvNev5P90ovX7_cb5_4MkY1tg5JHFpAH8JL-_0vDs06lHW5F9Iphua7fxCWTxxdL-0fRzhR8We_nN6Hx9z3FTWcTXLUMtIUPe0WMKQgW6JkUTP8RwSjAfF4W04OztEg3VAUGN_5gWRlBX-KT9uypnEszadG1yA7gpjkCokNnD8oaIeE6arvs_EjfJib51rao<br>
2019-09-20 15:40:06.805664 7f19edb9b700 20 sending request
to<br>
<a class="moz-txt-link-freetext" href="https://keystone.service.stage.ewcs.ch/v3/auth/tokens">https://keystone.service.stage.ewcs.ch/v3/auth/tokens</a><br>
2019-09-20 15:40:06.805803 7f19edb9b700 20 ssl verification
is set to off<br>
2019-09-20 15:40:07.235356 7f19edb9b700 20 sending request
to<br>
<a class="moz-txt-link-freetext" href="https://keystone.service.stage.ewcs.ch/v3/auth/tokens">https://keystone.service.stage.ewcs.ch/v3/auth/tokens</a><br>
2019-09-20 15:40:07.235404 7f19edb9b700 20 ssl verification
is set to off<br>
2019-09-20 15:40:07.267091 7f19edb9b700 5 Failed keystone
auth from<br>
<a class="moz-txt-link-freetext" href="https://keystone.service.stage.ewcs.ch/v3/auth/tokens">https://keystone.service.stage.ewcs.ch/v3/auth/tokens</a> with
404<br>
BTW: our radosgw is configured to delegate user
authentication to keystone.<br>
<br>
In keystone logs I see this:<br>
<br>
2019-09-20 15:40:07.218 24 INFO keystone.token.provider<br>
[req-21b2f11c-9e67-4487-af05-420acfb65ace - - - - -] Token
being processed:<br>
token.user_id [f7c7296949f84a4387c5172808a0965b],<br>
token.expires_at[2019-09-21T13:40:07.000000Z],<br>
token.audit_ids[[u'hFweMPCrSO2D00rNcRNECw']],
token.methods[[u'password']],<br>
token.system[None], token.domain_id[None],<br>
token.project_id[4120792f50bc4cf2b4f97c4546462f06],
token.trust_id[None],<br>
token.federated_groups[None],
token.identity_provider_id[None],<br>
token.protocol_id[None],<br>
token.access_token_id[None],token.application_credential_id[None].<br>
2019-09-20 15:40:07.257 21 INFO keystone.common.wsgi<br>
[req-9f858abb-68f9-42cf-b71a-f1cafca91844
f7c7296949f84a4387c5172808a0965b<br>
4120792f50bc4cf2b4f97c4546462f06 - default default] GET<br>
<a class="moz-txt-link-freetext" href="http://keystone.service.stage.ewcs.ch/v3/auth/tokens">http://keystone.service.stage.ewcs.ch/v3/auth/tokens</a><br>
2019-09-20 15:40:07.265 21 WARNING keystone.common.wsgi<br>
[req-9f858abb-68f9-42cf-b71a-f1cafca91844
f7c7296949f84a4387c5172808a0965b<br>
4120792f50bc4cf2b4f97c4546462f06 - default default] Could
not find trust:<br>
934ed82d2b14413899023da0bee6a953.: TrustNotFound: Could not
find trust:<br>
934ed82d2b14413899023da0bee6a953.<br>
<br>
<br>
So what happens is the following:<br>
<br>
1. when the user creates the cron trigger, mistral creates a
trust<br>
2. when the cron trigger executes the workflow, openstack
creates a<br>
volume snapshot (an rbd image), then copies it to swift (rgw),
then<br>
deletes the snapshot<br>
3. when the execution finishes, if the cron trigger has no
remaining<br>
executions scheduled, then mistral removes the cron trigger
and the trust<br>
<br>
The problem is a race condition: apparently the copying of the
snapshot to<br>
swift runs in the background and mistral removes the trust
before the<br>
operation completes...<br>
<br>
That explains the error in keystone and also the cron
trigger execution<br>
result which is "success" even if the resulting backup is
actually "failed".<br>
<br>
<br>
To test this theory I set up the same cron trigger with more
than one<br>
scheduled execution and the backups were suddenly created
correctly ;-).<br>
<br>
<br>
So something needs to be done in the code to deal with this
race condition.<br>
<br>
In the meantime, I will try to put a sleep action after the
'create backup'<br>
action.<br>
<br>
</blockquote>
<br>
Hi,<br>
<br>
Congrats on figuring out the issue. :-)<br>
<br>
Instead of a sleep, which may get you through this issue but
fall into a<br>
different one and won't return the right status code, you
should<br>
probably have a loop checking the status of the backup and
return a non<br>
zero status code if it ends up in "error" state.<br>
<br>
Cheers,<br>
Gorka.<br>
<br>
<blockquote type="cite" class="spark_quote" style="margin: 5px
5px; padding-left: 10px; border-left: thin solid #e67e22;"><br>
Best Regards<br>
<br>
Francois<br>
<br>
<br>
<br>
On 9/20/19 4:02 PM, Gorka Eguileor wrote:<br>
<blockquote type="cite" class="spark_quote" style="margin:
5px 5px; padding-left: 10px; border-left: thin solid
#3498db;">On 20/09, Francois Scheurer wrote:<br>
<blockquote type="cite" class="spark_quote" style="margin:
5px 5px; padding-left: 10px; border-left: thin solid
#d35400;">Hi Gorka<br>
<br>
<br>
We have a swift endpoint set up on openstack, which
points to our ceph<br>
radosgw backend<br>
<br>
Radosgw provides s3 & swift.<br>
<br>
So the swift logs here are actually the radosgw logs.<br>
<br>
</blockquote>
Hi,<br>
<br>
OK, thanks for the clarification.<br>
<br>
Then I assume you prefer the Swift backup driver over the
Ceph one<br>
because you are using one of the OpenStack releases that
had trouble<br>
with Incremental Backups on the Ceph backup driver.<br>
<br>
Cheers,<br>
Gorka.<br>
<br>
<br>
<blockquote type="cite" class="spark_quote" style="margin:
5px 5px; padding-left: 10px; border-left: thin solid
#d35400;">Cheers<br>
<br>
Francois<br>
<br>
<br>
<br>
On 9/20/19 2:46 PM, Gorka Eguileor wrote:<br>
<blockquote type="cite" class="spark_quote"
style="margin: 5px 5px; padding-left: 10px;
border-left: thin solid #34495e;">On 20/09, Francois
Scheurer wrote:<br>
<blockquote type="cite" class="spark_quote"
style="margin: 5px 5px; padding-left: 10px;
border-left: thin solid #2ecc71;">Dear Gorka and
Hervé<br>
<br>
<br>
Thanks for your hints.<br>
<br>
I have set the debug log level on radosgw.<br>
<br>
I will retest now and post here the results.<br>
<br>
<br>
Cheers<br>
<br>
Francois<br>
</blockquote>
Hi,<br>
<br>
Sorry, I may have missed something in the
conversation, weren't you<br>
using Swift?<br>
<br>
I think you need to see the Swift logs as well, since
that's the API<br>
service that complained about the authorization.<br>
<br>
Cheers,<br>
Gorka.<br>
<br>
<blockquote type="cite" class="spark_quote"
style="margin: 5px 5px; padding-left: 10px;
border-left: thin solid #2ecc71;"><br>
<br>
--<br>
<br>
<br>
EveryWare AG<br>
François Scheurer<br>
Senior Systems Engineer<br>
Zurlindenstrasse 52a<br>
CH-8003 Zürich<br>
<br>
tel: +41 44 466 60 00<br>
fax: +41 44 466 60 10<br>
mail: <a class="moz-txt-link-abbreviated" href="mailto:francois.scheurer@everyware.ch">francois.scheurer@everyware.ch</a><br>
web: <a class="moz-txt-link-freetext" href="http://www.everyware.ch">http://www.everyware.ch</a><br>
</blockquote>
</blockquote>
--<br>
<br>
<br>
EveryWare AG<br>
François Scheurer<br>
Senior Systems Engineer<br>
Zurlindenstrasse 52a<br>
CH-8003 Zürich<br>
<br>
tel: +41 44 466 60 00<br>
fax: +41 44 466 60 10<br>
mail: <a class="moz-txt-link-abbreviated" href="mailto:francois.scheurer@everyware.ch">francois.scheurer@everyware.ch</a><br>
web: <a class="moz-txt-link-freetext" href="http://www.everyware.ch">http://www.everyware.ch</a><br>
</blockquote>
<br>
</blockquote>
--<br>
<br>
<br>
EveryWare AG<br>
François Scheurer<br>
Senior Systems Engineer<br>
Zurlindenstrasse 52a<br>
CH-8003 Zürich<br>
<br>
tel: +41 44 466 60 00<br>
fax: +41 44 466 60 10<br>
mail: <a class="moz-txt-link-abbreviated" href="mailto:francois.scheurer@everyware.ch">francois.scheurer@everyware.ch</a><br>
web: <a class="moz-txt-link-freetext" href="http://www.everyware.ch">http://www.everyware.ch</a><br>
<br>
</blockquote>
<br>
<br>
<br>
</blockquote>
</div>
</blockquote>
<pre class="moz-signature" cols="72">--
EveryWare AG
François Scheurer
Senior Systems Engineer
Zurlindenstrasse 52a
CH-8003 Zürich
tel: +41 44 466 60 00
fax: +41 44 466 60 10
mail: <a class="moz-txt-link-abbreviated" href="mailto:francois.scheurer@everyware.ch">francois.scheurer@everyware.ch</a>
web: <a class="moz-txt-link-freetext" href="http://www.everyware.ch">http://www.everyware.ch</a> </pre>
</body>
</html>