[Openstack-operators] [openstack][nova] total region lockdown (very weird)

Jay Pipes jaypipes at gmail.com
Wed Feb 19 19:55:19 UTC 2014


On Wed, 2014-02-19 at 15:52 -0300, Alejandro Comisario wrote:
> Hi community, the weirdest thing happened to one of our OpenStack
> regions, which runs more than 200 vms.

> This region is :
> * openstack essex 2012.1.4
> * ubuntu 12.04.2
> * Linux DC4-r59-02vms 3.2.0-49-generic #75-Ubuntu SMP Tue Jun 18
> 17:39:32 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
> 
> On Saturday, the network throughput on all 16 nodes suddenly increased
> at exactly the same time, for no apparent reason, and the nodes
> saturated themselves (peaking at 500Mb/s). It involved only that
> vlan/region.
> 
> Something weird is that the graphs show lots of traffic going out
> from the vms via the vnet interfaces and from the host itself, but
> inside the virtual machines you see nothing of the sort; on the
> contrary, traffic there decreases. (I'm attaching graphs of everything.)
> We see lots of packets discarded by the ToR and aggregation switches
> of this region, but we don't know what caused this burst in
> bandwidth.
> 
> Here are the graphs:
> -----------------------
> The view from the compute node, with bandwidth usage increasing
> drastically; the same happened on all 15 compute nodes.
> http://oi58.tinypic.com/246261k.jpg
> 
> 
> 
> The view of the vm traffic from the compute node's perspective:
> every vnet, in theory, increasing its traffic.
> http://oi62.tinypic.com/2el9j0o.jpg
> 
> 
> 
> The view from inside the vms, where traffic drops sharply because of
> this kind of self-saturation on every compute node (this view is of
> one compute node).
> http://oi59.tinypic.com/vh991l.jpg

Hi Alejandro,

You may want to check to see if software running in certain VMs was
compromised. It looks very much like a DDoS attack to me.

Is there anything common to these instances?

e-00002d88
e-00003386
e-00003387
e-00002bec

Those instances look like they were hacked and started executing
something that kicked off a bunch of outbound net I/O.

You can tell that it is something inside those particular tenants
because the outbound network I/O on the management plane (eth0) was
hardly anything (4.22 MB/s), while the tenant-facing data plane
(bond0/vnet10) was showing outbound I/O at 17-21 MB/s over the same time.
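
A minimal sketch of how one might repeat that comparison directly on a
compute node, assuming a Linux host that exposes per-interface counters
in /proc/net/dev. The function names and the 5-second sampling interval
are illustrative assumptions, not part of nova or of the graphs above.
It samples the transmit byte counters twice and reports outbound MB/s
per interface, so eth0 can be compared with bond0 and the vnet devices.

```python
import time


def parse_tx_bytes(proc_net_dev_text):
    """Return {interface: transmitted_bytes} from /proc/net/dev content."""
    tx = {}
    for line in proc_net_dev_text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, counters = line.split(":", 1)
        fields = counters.split()
        # Fields 0-7 are receive counters; field 8 is transmitted bytes.
        tx[name.strip()] = int(fields[8])
    return tx


def tx_rates_mb_per_s(before, after, seconds):
    """Outbound rate in MB/s for every interface present in both samples."""
    return {
        iface: (after[iface] - before[iface]) / seconds / 1e6
        for iface in after
        if iface in before
    }


def sample_tx_rates(interval=5.0, path="/proc/net/dev"):
    """Read the counters twice, `interval` seconds apart (assumed default)."""
    with open(path) as f:
        before = parse_tx_bytes(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_tx_bytes(f.read())
    return tx_rates_mb_per_s(before, after, interval)
```

Calling sample_tx_rates() on a compute node and sorting the result by
rate should put the busiest vnet interfaces at the top; each of those
then has to be mapped back to its instance by whatever means the
hypervisor provides.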

I'd run a virus or rootkit hunter on the instances above.

Best,
-jay




