Open Stack

Tue Jan 3 02:54:48 UTC 2017

On 03/01/17 07:03, Jeremy Stanley wrote:
> On 2016-12-30 12:38:02 +0800 (+0800), Tom Fifield wrote:
>> This problem (legitimate users having all posts flatly rejected as
>> spam) is still happening. Any progress?
>
> We're basically ready to retry the previous upgrade now that some
> issues have been identified/corrected by Marton. Change
> https://review.openstack.org/408657 to trigger it is waiting on
> https://review.openstack.org/416072 to grant him a shell account on
> the production server for improved reactivity if it fails again, so
> that he can collaborate with us more directly troubleshooting before
> we have to make the call to fix or revert.

Excellent!

>> I've been doing what I can without access to the server, but my
>> latest attempt - completely deleting an account so it could be
>> re-created by the affected user - was thwarted by a 500 error. Did
>> that appear in the logs?
>
> Given that I don't know the exact time you tried nor your IP
> address, and am unsure what that failure would look like in the logs
> aside from coming with a 500 error code this Apache access log entry
> 18 minutes before your E-mail's timestamp stands out:
>
>     [30/Dec/2016:04:20:40 +0000] "GET /admin/askbot/post/25842/
>     HTTP/1.1" 500 917
>     "https://ask.openstack.org/admin/auth/user/2253/delete/"
>     "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:50.0) Gecko/20100101
>     Firefox/50.0"
>
> I don't see anything related around that timeframe in the Apache
> error log files, the Askbot application log, the Solr Jetty log...
> though there's this in dmesg which is suspiciously close (timestamps
> in dmesg aren't entirely reliable, so this could have happened 5
> minutes earlier):
>
>     [Fri Dec 30 04:25:25 2016] apache2 invoked oom-killer:
>     gfp_mask=0x200da, order=0, oom_score_adj=0
>
> And indeed, there's a spike in swap utilization around that time
> which, given the five-minute granularity could certainly support the
> notion that a runaway process ate all available virtual memory on
> the system:
>
> <URL: http://cacti.openstack.org/cacti/graph.php?action=zoom&local_graph_id=2546&rra_id=2&view_type=&graph_start=1483029900&graph_end=1483156284&graph_height=120&graph_width=500&title_font_size=10 >
>
> The RAM usage graph suggests we were doing okay with a target
> utilization of ~50% so something eating an additional 6GiB of memory
> in a matter of a few minutes would definitely count as anomalous:
>
> <URL: http://cacti.openstack.org/cacti/graph.php?action=zoom&local_graph_id=2544&rra_id=2&view_type=&graph_start=1483029900&graph_end=1483156284&graph_height=120&graph_width=500&title_font_size=10 >
>
> Was the memory spike caused by that deletion? Or was the deletion
> error coincidental and caused by a memory event which just happened
> to be going on at the same time? I have insufficient knowledge of
> the system to be able to say either way. It's possible there are
> additional logs I don't know to look at which could tell us.
>

Ah, how silly of me to fail to specify information - apologies!

If you check for a POST after a GET of:

/admin/auth/user/2253/delete/

Around

2017-02-03 02:52

from

1.169.254.207

that should be it.

I believe the OOM error comes from trying to view any post in the django 
admin interface (/admin/askbot/post/%d/ ) -- those things essentially 
never stop loading for some reason.

Open Stack

[OpenStack-Infra] Logs for Ask.O.o - chasing false positive spam labeling

OpenStack

Community

Documentation

Branding & Legal