[openstack-dev] [Security] [Bandit] Using multiprocessing/threading to speed up analysis

Finnigan, Jamie jamie.finnigan at hp.com
Mon Jun 8 17:17:36 UTC 2015


On 6/8/15, 8:26 AM, "Ian Cordasco" <ian.cordasco at RACKSPACE.COM> wrote:

>Hey everyone,
>
>I drew up a blueprint
>(https://blueprints.launchpad.net/bandit/+spec/use-threading-when-running-checks)
>to add the ability to use multiprocessing (or threading) to Bandit.
>This essentially means that each "thread" will be fed a file, analyze it,
>and return the results. (A file will only ever be analyzed by one
>thread.)
>
>This has led to significant speed improvements in Flake8 when running
>against a project like Nova and I think the same improvements could be
>made to Bandit.

We skipped parallel processing earlier in Bandit development to keep
things simple, but if we can speed up execution against the bigger code
bases with minimal additional complexity (it still needs to be 'easy' to add
new checks) then that would be a nice win.

I don't think we'd lose anything by processing in parallel vs. serial.  If
we do ever add additional state tracking more broadly than per-file,
checks would need to be run at the end of execution anyway to take
advantage of the full state.
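As a rough sketch of that per-file model (not Bandit's actual code: `analyze_file`, `run_checks` and the eval() check are hypothetical stand-ins for real plugins), each worker parses exactly one file and returns only that file's results, so no state crosses file boundaries. It sticks to the standard library: concurrent.futures ships with Python 3 (Python 2.7 would need the `futures` backport, whereas multiprocessing.Pool is available in both).

```python
import ast
from concurrent.futures import ThreadPoolExecutor


def analyze_file(path):
    """Hypothetical per-file check: parse one file and report each call to eval().

    Only this call touches the file, so workers share no state.
    """
    with open(path, encoding="utf-8") as handle:
        tree = ast.parse(handle.read(), filename=path)
    issues = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "eval"):
            issues.append((path, node.lineno, "use of eval()"))
    return issues


def run_checks(paths, jobs=1):
    """Analyze each file exactly once; jobs=1 keeps today's serial behaviour."""
    if jobs <= 1:
        per_file = [analyze_file(path) for path in paths]
    else:
        with ThreadPoolExecutor(max_workers=jobs) as pool:
            # map() hands each path to one worker and yields results in input order.
            per_file = list(pool.map(analyze_file, paths))
    # Flatten the per-file lists into a single result list.
    return [issue for issues in per_file for issue in issues]
```

ThreadPoolExecutor keeps the sketch simple; since AST walking is CPU-bound, real speedups would probably need ProcessPoolExecutor (same interface) to get past the GIL, at the cost of requiring picklable, module-level worker functions.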


>
>I'd love some feedback on the following points:
>
>1. Should this be on by default?
>
>   Technically, this is backwards incompatible (unless we decide to order
>the output before printing results) but since we're still in the 0.x
>release series of Bandit, SemVer allows backwards incompatible releases. I
>don't know if we want to take advantage of that or not though.

It looks like flake8's default behavior is off (1 "thread"), which makes
sense to me...


>
>2. Is output ordering significant/important to people?

Output ordering is important - output for repeated runs against an
unchanged code base should be predictable/repeatable. We also want to
continue to support aggregating output by filename or by issue. Under this
model, though, we'd just collect results and then order and output them at
completion rather than during execution.
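One way to keep output repeatable under that model, sketched with a hypothetical `(filename, line_number, check_name)` tuple rather than Bandit's real result objects: collect everything, sort once on a total key, then group by filename or by issue, so the order in which workers finish never reaches the report.

```python
from operator import itemgetter


def order_results(issues, group_by="filename"):
    """Group collected issues deterministically, independent of worker order.

    Each issue is a hypothetical (filename, line_number, check_name) tuple.
    """
    if group_by == "filename":
        # Sort by file, then line, then check name; one group per file.
        sort_key, group_key = itemgetter(0, 1, 2), itemgetter(0)
    elif group_by == "issue":
        # Sort by check name, then file, then line; one group per check.
        sort_key, group_key = itemgetter(2, 0, 1), itemgetter(2)
    else:
        raise ValueError("group_by must be 'filename' or 'issue'")

    grouped = {}
    for issue in sorted(issues, key=sort_key):
        # dicts preserve insertion order (Python 3.7+), so groups come out sorted too.
        grouped.setdefault(group_key(issue), []).append(issue)
    return grouped
```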


>
>3. If this is off by default, should the flag accept a special value,
>e.g., 'auto', to tell Bandit to always use the number of CPUs present on
>the machine?

That seems like a tidy way to do things.
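A minimal sketch of such a flag, assuming a hypothetical `-j/--jobs` option (not an existing Bandit flag) that defaults to 1, i.e. off, as in flake8; `parse_jobs` and `build_parser` are made-up helpers built on the standard argparse and os modules.

```python
import argparse
import os


def parse_jobs(value):
    """Convert a --jobs argument to a worker count; 'auto' means one per CPU."""
    if value == "auto":
        # os.cpu_count() returns None when the count can't be determined.
        return os.cpu_count() or 1
    try:
        jobs = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError("expected a positive integer or 'auto'") from None
    if jobs < 1:
        raise argparse.ArgumentTypeError("expected a positive integer or 'auto'")
    return jobs


def build_parser():
    parser = argparse.ArgumentParser(prog="bandit-sketch")
    # A default of 1 keeps analysis serial unless the user opts in.
    parser.add_argument("-j", "--jobs", type=parse_jobs, default=1,
                        help="number of parallel workers, or 'auto' for one per CPU")
    return parser
```

With this, passing `-j auto` resolves to the machine's CPU count, while omitting the flag keeps the current serial behaviour.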


Overall, this feels like a nice evolutionary step.  Not opposed (in fact,
I'd support it), but would want to make sure it doesn't overly complicate
things.

What library/ies would you suggest using?  I still like the idea of
keeping as few external dependencies as possible.


Jamie



