[openstack-dev] [searchlight] Searching admin-only fields

McLellan, Steven steve.mclellan at hpe.com
Tue Nov 10 18:49:26 UTC 2015


An issue discovered at the end of the Liberty cycle [1] meant that we had to disable searching of admin-only fields, although they are still present in results for admins (TL;DR: a non-admin can fish for results to discover values for admin-only fields by searching e.g. "hostId": "a*"). We'd like to fix this, though I've only come up with three ideas. Any feedback would be very welcome.

1. There is a plugin for Elasticsearch called Shield (unfortunately only available commercially) that provides field-level access control by allowing roles to specify a list of fields they can search and see in results. The list is inclusive and can use wildcards. Shield has access to the parsed query and so can exclude any terms that refer to blocked fields (it treats them as having no results). It also disables the _all field for roles where a field list is specified. Even were we able to use it, Shield is more restrictive than we would like.

2. Create multiple indices (searchlight-admin, searchlight-user) and index some fields only in the admin index. This has several things going for it:
 * it's free
 * it's reasonably easy to understand
 * the code isn't complicated
 * it's secure
 * it allows full-text searching of _all

The strikes:
 * more indices (especially where someone configures indices specific to a plugin, though we could also allow plugins to not require an admin index)
 * double the data stored
 * greater risk of inconsistency (Elasticsearch doesn't have transactions)
 * complicates the effort for zero-downtime reindexing
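
As a rough sketch of what option 2 might look like at indexing and query time (this is not Searchlight code; the field list and function names are made up for illustration, and only top-level fields are handled):

```python
# Hypothetical admin-only field list; in practice each plugin would
# presumably declare its own.
ADMIN_ONLY_FIELDS = {"hostId"}


def split_for_indices(doc, admin_only=ADMIN_ONLY_FIELDS):
    """Return (index_name, body) pairs for a single source document.

    The admin index gets the full document; the user index gets a copy
    with the admin-only fields removed, so they are never indexed there
    and cannot be matched by any query (including one against _all).
    """
    user_doc = {k: v for k, v in doc.items() if k not in admin_only}
    return [
        ("searchlight-admin", dict(doc)),
        ("searchlight-user", user_doc),
    ]


def index_for_request(is_admin):
    """Pick the index to search based on the caller's role."""
    return "searchlight-admin" if is_admin else "searchlight-user"


server = {"id": "abc123", "name": "web-1", "hostId": "a1b2c3"}
for index_name, body in split_for_indices(server):
    print(index_name, sorted(body))
```

The two writes per document are where the "double the data" and "risk of inconsistency" strikes come from: if one write succeeds and the other fails, the indices disagree.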

3. Implement something similar to Shield's field control ourselves. We'd need to exclude fields from _all (because there's no good way to intercept queries against it), and scrub incoming queries against the admin-only field list.

Naively, it's not too hard to conceive of doing this, but I envisage a trickle of edge cases that work around the protection. For instance, to protect 'hostId' one might take the incoming dictionary and look for all instances of 'hostId', returning a 403 if it's found. This will find false positives (e.g. another type has a non-admin field called hostId), and (worse) false negatives; a query such as {"query_string": {"query": "hostId:a*"}} would escape it. Even scrubbing the entire input string would have holes ({"multi_match": {"fields": ["hostI*"], "query": "aaabbccc"}}). We would probably be able to determine many of the issues, but I'd always worry about finding more holes. Shield has the advantage of operating after the query has been parsed.
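
To make the holes concrete, here is a minimal sketch of the two naive checks described above, run against the example queries. The function names are hypothetical and this is not a proposed implementation; it only shows how each check is bypassed.

```python
import json


def mentions_key(query, field):
    """Naive check 1: does `field` appear anywhere as a dictionary key?"""
    if isinstance(query, dict):
        return any(k == field or mentions_key(v, field)
                   for k, v in query.items())
    if isinstance(query, list):
        return any(mentions_key(item, field) for item in query)
    return False


def mentions_text(query, field):
    """Naive check 2: does `field` appear anywhere in the serialized query?"""
    return field in json.dumps(query)


term = {"term": {"hostId": "abc"}}
query_string = {"query_string": {"query": "hostId:a*"}}
wildcard = {"multi_match": {"fields": ["hostI*"], "query": "aaabbccc"}}

# The key scan catches the plain term query...
print(mentions_key(term, "hostId"))          # True
# ...but not a field name embedded in a query_string value.
print(mentions_key(query_string, "hostId"))  # False
# Scanning the whole string catches that case...
print(mentions_text(query_string, "hostId")) # True
# ...but a wildcarded field name slips through both checks.
print(mentions_text(wildcard, "hostId"))     # False
```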

4. ???

Conclusion:
My view is that a separate index is the only sensible way to do this, but I am willing to be swayed.

[1] https://bugs.launchpad.net/searchlight/+bug/1504399



