[openstack-dev] [Security] [Bandit] Using multiprocessing/threading to speed up analysis
jamie.finnigan at hp.com
Mon Jun 8 17:17:36 UTC 2015
On 6/8/15, 8:26 AM, "Ian Cordasco" <ian.cordasco at RACKSPACE.COM> wrote:
>I drew up a blueprint
>hecks) to add the ability to use multiprocessing (or threading) to Bandit.
>This essentially means that each "thread" will be fed a file and analyze
>it and return the results. (A file will only ever be analyzed by one
>This has led to significant speed improvements in Flake8 when running
>against a project like Nova and I think the same improvements could be
>made to Bandit.
We skipped parallel processing earlier in Bandit development to keep
things simple, but if we can speed up execution against the bigger code
bases with minimal additional complexity (still needs to be 'easy' to add
new checks) then that would be a nice win.
I don't think we'd lose anything by processing in parallel vs. serial. If
we do ever add additional state tracking more broadly than per-file,
checks would need to be run at the end of execution anyway to take
advantage of the full state.
>I'd love some feedback on the following points:
>1. Should this be on by default?
> Technically, this is backwards incompatible (unless we decide to order
>the output before printing results) but since we're still in the 0.x
>release series of Bandit, SemVer allows backwards incompatible releases. I
>don't know if we want to take advantage of that or not though.
It looks like flake8 default behavior is off (1 "thread"), which makes
sense to me...
>2. Is output ordering significant/important to people?
Output ordering is important - output for repeated runs against an
unchanged code base should be predictable/repeatable. We also want to
continue to support aggregating output by filename or by issue. Under this
model though, we'd just collect results then order/output at completion
rather than during execution.
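A minimal sketch of that collect-then-order model, not Bandit's actual code. The analyze() check and the (filename, line, issue) tuple layout are hypothetical stand-ins. It uses ThreadPool from the standard library's multiprocessing.pool so there are no external dependencies; multiprocessing.Pool exposes the same imap_unordered() call if separate processes are wanted.

```python
from multiprocessing.pool import ThreadPool


def analyze(item):
    """Hypothetical per-file check: flag lines that call eval()."""
    filename, source = item
    return [
        (filename, lineno, "eval-used")
        for lineno, line in enumerate(source.splitlines(), start=1)
        if "eval(" in line
    ]


def run(files, workers=2):
    """Analyze (filename, source) pairs in parallel, return sorted issues."""
    collected = []
    with ThreadPool(workers) as pool:
        # imap_unordered yields results in completion order, which can
        # differ between runs; nothing is printed at this stage.
        for issues in pool.imap_unordered(analyze, files):
            collected.extend(issues)
    # Sort once at completion so repeated runs give identical output.
    # Tuples sort by filename first, then line number.
    return sorted(collected)


def by_issue(results):
    """Regroup the sorted results by issue name instead of filename."""
    grouped = {}
    for filename, lineno, issue in results:
        grouped.setdefault(issue, []).append((filename, lineno))
    return grouped
```

Because the sort happens after every worker has finished, the output does not depend on the number of workers or on which file finishes first.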
>3. If this is off by default, should the flag accept a special value,
>e.g., 'auto', to tell Bandit to always use the number of CPUs present on
That seems like a tidy way to do things.
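As a rough illustration of how such a flag could be parsed, assuming a hypothetical option name --jobs (neither the name nor the parser below comes from Bandit): the default stays at 1, and 'auto' resolves to the CPU count reported by the standard library.

```python
import argparse
import multiprocessing


def parse_jobs(value):
    """Convert the flag value to a worker count; 'auto' means all CPUs."""
    if value == "auto":
        return multiprocessing.cpu_count()
    jobs = int(value)  # a ValueError here is reported by argparse
    if jobs < 1:
        raise argparse.ArgumentTypeError("jobs must be >= 1 or 'auto'")
    return jobs


def build_parser():
    # Hypothetical parser; "--jobs" is an assumed flag name.
    parser = argparse.ArgumentParser(prog="bandit-sketch")
    parser.add_argument(
        "--jobs",
        type=parse_jobs,
        default=1,  # off by default: a single worker
        help="number of workers, or 'auto' for one per CPU",
    )
    return parser
```

With this shape, omitting the flag keeps today's serial behavior, and "--jobs auto" opts in to one worker per CPU.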
Overall, this feels like a nice evolutionary step. Not opposed (in fact,
I'd support it), but would want to make sure it doesn't overly complicate
What library/ies would you suggest using? I still like the idea of
keeping as few external dependencies as possible.