[openstack-dev] [Security] [Bandit] Using multiprocessing/threading to speed up analysis

Ian Cordasco ian.cordasco at RACKSPACE.COM
Mon Jun 8 21:40:15 UTC 2015

On 6/8/15, 12:17, "Finnigan, Jamie" <jamie.finnigan at hp.com> wrote:

>On 6/8/15, 8:26 AM, "Ian Cordasco" <ian.cordasco at RACKSPACE.COM> wrote:
>>Hey everyone,
>>I drew up a blueprint
>>hecks) to add the ability to use multiprocessing (or threading) to
>>This essentially means that each "thread" will be fed a file and analyze
>>it and return the results. (A file will only ever be analyzed by one
>>This has led to significant speed improvements in Flake8 when running
>>against a project like Nova and I think the same improvements could be
>>made to Bandit.
>We skipped parallel processing earlier in Bandit development to keep
>things simple, but if we can speed up execution against the bigger code
>bases with minimal additional complexity (still needs to be 'easy' to add
>new checks) then that would be a nice win.
>I don't think we'd lose anything by processing in parallel vs. serial.  If
>we do ever add additional state tracking more broadly than per-file,
>checks would need to be run at the end of execution anyway to take
>advantage of the full state.

Yeah, I don't see this affecting the current state of checks or plugins.
So unless something drastically changes in how we write them, this
shouldn't affect them.

>>I'd love some feedback on the following points:
>>1. Should this be on by default?
>>   Technically, this is backwards incompatible (unless we decide to order
>>the output before printing results) but since we're still in the 0.x
>>release series of Bandit, SemVer allows backwards incompatible releases.
>>I don't know if we want to take advantage of that or not though.
>It looks like flake8 default behavior is off (1 "thread"), which makes
>sense to me...

It's actually "auto", which means however many CPUs you have:

>>2. Is output ordering significant/important to people?
>Output ordering is important - output for repeated runs against an
>unchanged code base should be predictable/repeatable. We also want to
>continue to support aggregating output by filename or by issue. Under this
>model though, we'd just collect results then order/output at completion
>rather than during execution.

Right. Ordering would be done after execution.
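
To make "collect, then order" concrete, here is a minimal sketch. It is not
Bandit code: analyze_file and analyze_all are hypothetical names, and the
worker just returns a placeholder issue per file. It uses the thread-backed
multiprocessing.dummy.Pool so it runs anywhere; multiprocessing.Pool exposes
the same API and could be swapped in to use processes instead.

```python
from multiprocessing.dummy import Pool  # thread-backed, same API as multiprocessing.Pool


def analyze_file(path):
    """Hypothetical stand-in for running every check against one file."""
    # A real worker would parse the file and run each check. Here we
    # return a single placeholder issue: (filename, line, check name).
    return [(path, 1, "EXAMPLE-CHECK")]


def analyze_all(paths, jobs=2):
    """Analyze each file exactly once and return issues in a stable order."""
    with Pool(jobs) as pool:
        # imap_unordered yields results as workers finish, so the
        # arrival order may differ from one run to the next.
        per_file = list(pool.imap_unordered(analyze_file, paths))
    issues = [issue for file_issues in per_file for issue in file_issues]
    # Sorting once at the end keeps repeated runs against an unchanged
    # code base byte-for-byte identical, whatever order workers finished in.
    return sorted(issues)
```

Sorting the tuples orders by filename first; aggregating by issue instead
would only need a different sort key, not a different collection step.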

>>3. If this is off by default, should the flag accept a special value,
>>e.g., 'auto', to tell Bandit to always use the number of CPUs present on
>>the machine?
>That seems like a tidy way to do things.
>Overall, this feels like a nice evolutionary step.  Not opposed (in fact,
>I'd support it), but would want to make sure it doesn't overly complicate

I can probably work on a PoC implementation this weekend to give y'all a
better idea of what it would look like.
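
In the meantime, a rough sketch of how the 'auto' value from point 3 might be
parsed. The --jobs flag name, the jobs_value helper, and the default of 1 are
all assumptions for illustration; nothing here is an existing Bandit option.

```python
import argparse
import multiprocessing


def jobs_value(text):
    """Convert a hypothetical --jobs argument into a positive worker count."""
    if text == "auto":
        # 'auto' resolves to the number of CPUs on this machine.
        return multiprocessing.cpu_count()
    try:
        count = int(text)
    except ValueError:
        raise argparse.ArgumentTypeError("expected a positive integer or 'auto'")
    if count < 1:
        raise argparse.ArgumentTypeError("expected a positive integer or 'auto'")
    return count


# Off by default (one worker) unless the user opts in.
parser = argparse.ArgumentParser(prog="example")
parser.add_argument("--jobs", type=jobs_value, default=1)
```

With this, `--jobs auto` and `--jobs 4` both work, and leaving the flag off
keeps the current serial behaviour.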

>What library/ies would you suggest using?  I still like the idea of
>keeping as few external dependencies as possible.

No extra dependencies. Flake8's dependencies are also very light. This
will just take advantage of the multiprocessing library that has been
included in the standard library since Python 2.6.

