[openstack-dev] [Security] [Bandit] Using multiprocessing/threading to speed up analysis

Murphy, Grant grant.murphy at hp.com
Tue Jun 9 03:55:35 UTC 2015


If you guys go down this road I would suggest using
https://docs.python.org/2/library/multiprocessing.html rather than Python
threads, if that is what is being proposed.
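
As a rough sketch of how that could look, assuming one file per worker and
results collected and sorted at the end so repeated runs print the same
thing. Everything named here (resolve_jobs, analyze_file, run, the 'eval('
check, and the idea of a jobs value that accepts 'auto') is made up for
illustration and is not Bandit's actual API:

```python
import multiprocessing


def resolve_jobs(value):
    """Turn a hypothetical jobs setting into a worker count.

    'auto' means "use the number of CPUs on this machine".
    """
    if value == "auto":
        return multiprocessing.cpu_count()
    jobs = int(value)
    if jobs < 1:
        raise ValueError("jobs must be >= 1")
    return jobs


def analyze_file(path):
    """Stand-in for running the checks on one file (not a real Bandit check).

    Flags every line that contains the text 'eval('.
    """
    findings = []
    with open(path) as handle:
        for lineno, line in enumerate(handle, start=1):
            if "eval(" in line:
                findings.append((path, lineno, "use of eval"))
    return findings


def run(paths, jobs=1):
    """Analyze each file exactly once and return findings in a stable order."""
    if jobs == 1:
        # Serial path: no worker processes are started at all.
        per_file = [analyze_file(path) for path in paths]
    else:
        pool = multiprocessing.Pool(processes=jobs)
        try:
            # Each path is handed to exactly one worker process.
            per_file = pool.map(analyze_file, paths)
        finally:
            pool.close()
            pool.join()
    # Sort by (path, line, message) so output does not depend on scheduling.
    return sorted(f for findings in per_file for f in findings)
```

Something like run(paths, jobs=resolve_jobs("auto")) would then use every
CPU, while the default of jobs=1 keeps today's serial behaviour. On
platforms that spawn rather than fork worker processes, the call needs to
sit under an if __name__ == "__main__": guard.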





On 6/8/15, 10:17 AM, "Finnigan, Jamie" <jamie.finnigan at hp.com> wrote:

>On 6/8/15, 8:26 AM, "Ian Cordasco" <ian.cordasco at RACKSPACE.COM> wrote:
>
>>Hey everyone,
>>
>>I drew up a blueprint
>>(https://blueprints.launchpad.net/bandit/+spec/use-threading-when-running-checks)
>>to add the ability to use multiprocessing (or threading) to Bandit.
>>This essentially means that each "thread" will be fed a file and analyze
>>it and return the results. (A file will only ever be analyzed by one
>>thread.)
>>
>>This has led to significant speed improvements in Flake8 when running
>>against a project like Nova and I think the same improvements could be
>>made to Bandit.
>
>We skipped parallel processing earlier in Bandit development to keep
>things simple, but if we can speed up execution against the bigger code
>bases with minimal additional complexity (still needs to be 'easy' to add
>new checks) then that would be a nice win.
>
>I don't think we'd lose anything by processing in parallel vs. serial.  If
>we do ever add additional state tracking more broadly than per-file,
>checks would need to be run at the end of execution anyway to take
>advantage of the full state.
>
>
>>
>>I'd love some feedback on the following points:
>>
>>1. Should this be on by default?
>>
>>   Technically, this is backwards incompatible (unless we decide to order
>>the output before printing results) but since we're still in the 0.x
>>release series of Bandit, SemVer allows backwards incompatible releases.
>>I don't know if we want to take advantage of that or not though.
>
>It looks like flake8 default behavior is off (1 "thread"), which makes
>sense to me...
>
>
>>
>>2. Is output ordering significant/important to people?
>
>Output ordering is important - output for repeated runs against an
>unchanged code base should be predictable/repeatable. We also want to
>continue to support aggregating output by filename or by issue. Under this
>model though, we'd just collect results then order/output at completion
>rather than during execution.
>
>
>>
>>3. If this is off by default, should the flag accept a special value,
>>e.g., 'auto', to tell Bandit to always use the number of CPUs present on
>>the machine?
>
>That seems like a tidy way to do things.
>
>
>Overall, this feels like a nice evolutionary step.  Not opposed (in fact,
>I'd support it), but would want to make sure it doesn't overly complicate
>things.
>
>What library/ies would you suggest using?  I still like the idea of
>keeping as few external dependencies as possible.
>
>
>Jamie
>
>