[Openstack] Unable to launch hadoop cluster in Sahara

varun bhatnagar varun292006 at gmail.com
Mon Oct 19 15:57:07 UTC 2015


Hi,

Got a bit further with the 4.0.2mrv2 image; I saw some commands getting
executed:

2015-10-19 16:47:55.573 1434 DEBUG sahara.utils.ssh_remote [-]
[launchnew-test402-001] Executing "apt-get install --force-yes -y
mapr-jobtracker mapr-tasktracker" _log_command
/usr/lib/python2.7/site-packages/sahara/utils/ssh_remote.py:767

2015-10-19 16:48:32.451 1434 DEBUG sahara.utils.ssh_remote [-]
[launchnew-test402-001] _execute_command took 36.9 seconds to complete
_log_command /usr/lib/python2.7/site-packages/sahara/utils/ssh_remote.py:767
2015-10-19 16:48:33.733 1434 DEBUG sahara.utils.ssh_remote [-]
[launchnew-test402-001] Executing "apt-get install --force-yes -y
mapr-zookeeper mapr-webserver" _log_command
/usr/lib/python2.7/site-packages/sahara/utils/ssh_remote.py:767

2015-10-19 16:49:30.532 1434 DEBUG sahara.utils.ssh_remote [-]
[launchnew-test402-001] Executing "apt-get install --force-yes -y
mapr-oozie-internal=4.0.1* mapr-oozie=4.0.1*" _log_command
/usr/lib/python2.7/site-packages/sahara/utils/ssh_remote.py:767


But after some time it met with a new error:

2015-10-19 17:20:18.310 1434 ERROR sahara.service.ops [-] Error during
operating on cluster launchNEW (reason: An error occurred in thread
'configure-sh-dba1f3ab-5dbb-4752-b1a4-ece3c8c97f02': 'Operation' timed out
after 600 second(s)
Error ID: dd43c7da-cf29-4264-9563-4bc199bba1dc
Error ID: ba104047-f1f0-460e-bb18-1cf8eac6f865)
2015-10-19 17:20:18.677 1434 INFO sahara.utils.general [-] Cluster status
has been changed: id=c8ece084-20f5-4905-b01f-95679cbdb4fc, New status=Error


After logging into the node I can see that the packages did get installed:

mapr-core-internal is already the newest version.
mapr-core-internal set to manually installed.
mapr-fileserver is already the newest version.
mapr-hadoop-core is already the newest version.
mapr-hadoop-core set to manually installed.
mapr-jobtracker is already the newest version.
mapr-mapreduce1 is already the newest version.
mapr-mapreduce1 set to manually installed.
mapr-mapreduce2 is already the newest version.
mapr-mapreduce2 set to manually installed.
mapr-nfs is already the newest version.
mapr-tasktracker is already the newest version.
mapr-webserver is already the newest version.
mapr-zk-internal is already the newest version.
mapr-zk-internal set to manually installed.
mapr-zookeeper is already the newest version.
mapr-oozie is already the newest version.
mapr-oozie-internal is already the newest version.
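
For anyone hitting the same thing: my reading of the log is that Sahara
applies a fixed timeout to each remote step, so apt-get can still finish on
the node after Sahara has already flagged the cluster as Error. A minimal
sketch of that pattern (my own illustration, not Sahara's actual code; the
timeout values are arbitrary):

```python
# Sketch of a command run under a fixed timeout, roughly mirroring how
# Sahara treats its remote configure steps (illustrative, not Sahara code).
import subprocess

def run_with_timeout(cmd, timeout_s):
    """Run cmd; return 'ok' on success or 'timed out' on timeout."""
    try:
        subprocess.run(cmd, check=True, timeout=timeout_s)
        return "ok"
    except subprocess.TimeoutExpired:
        # Sahara gives up here and marks the cluster Error. Since the real
        # command runs on a remote node over SSH, apt-get can keep going
        # and finish later -- hence "already the newest version" above.
        return "timed out"

print(run_with_timeout(["sleep", "2"], timeout_s=1))   # -> timed out
print(run_with_timeout(["true"], timeout_s=5))         # -> ok
```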


Any ideas/suggestions to fix this?

BR,
Varun

On Sun, Oct 18, 2015 at 1:22 PM, varun bhatnagar <varun292006 at gmail.com>
wrote:

> Hi Chris,
>
> Thanks for the suggestion. I have posted my question on his blog, but it
> has not yet been approved and is still pending.
> Is it possible to provide any other suggestion to solve this problem? I am
> really out of ideas at the moment, and I really want to fix this problem as
> my actual work is on hold.
>
> Can anyone please suggest something else?
>
> BR,
> Varun
>
> On Fri, Oct 16, 2015 at 7:07 PM, Chris Buccella <
> chris.buccella at antallagon.com> wrote:
>
>> Those "No route to host" log messages are DEBUG level; they don't
>> indicate a problem per se. It may just be that the instance hadn't come up
>> yet. If the cluster transitioned from Waiting to Configuring, I assume the
>> instance did become accessible.
>>
>> You might try reaching out to Abizer, the author of the blog, directly.
>>
>> On Fri, Oct 16, 2015 at 8:36 AM, varun bhatnagar <varun292006 at gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Now I have tried launching one more cluster with a single instance
>>> (all-in-one setup). While the cluster was in the "Waiting" state I saw
>>> the below messages getting logged in the sahara log file:
>>>
>>> 2015-10-16 13:21:00.045 27689 DEBUG sahara.service.engine [-] Can't
>>> login to node mymapr-allinone-001 172.24.4.235, reason error: [Errno 113]
>>> No route to host _is_accessible
>>> /usr/lib/python2.7/site-packages/sahara/service/engine.py:128
>>> 2015-10-16 13:21:08.702 27689 DEBUG sahara.service.engine [-] Can't
>>> login to node mymapr-allinone-001 172.24.4.235, reason error: [Errno 113]
>>> No route to host _is_accessible
>>> /usr/lib/python2.7/site-packages/sahara/service/engine.py:128
>>> 2015-10-16 13:21:17.106 27689 DEBUG sahara.service.engine [-] Can't
>>> login to node mymapr-allinone-001 172.24.4.235, reason error: [Errno 113]
>>> No route to host _is_accessible
>>> /usr/lib/python2.7/site-packages/sahara/service/engine.py:128
>>>
>>> The instance was running and reachable by its floating IP, and ssh also
>>> worked fine, so why is this message being thrown by Sahara?
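
One note on the messages above: Sahara's _is_accessible check attempts an
SSH connection, not a ping, so an ICMP reply from the floating IP does not
guarantee the check passes. A rough way to approximate what the engine sees
is to probe TCP port 22 from the host running sahara-engine. This is a
sketch of my own, not Sahara's actual code:

```python
# Probe TCP reachability of a node's SSH port, approximating the SSH
# connection attempt behind Sahara's "Can't login to node" debug message.
import socket

def tcp_reachable(host, port=22, timeout_s=3.0):
    """Return True if a TCP connection to host:port succeeds.
    '[Errno 113] No route to host' surfaces here as an OSError."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False

# Example with the address from the thread, run from the sahara host:
#   tcp_reachable("172.24.4.235")
```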
>>>
>>>
>>> [root at controller ~(keystone_admin)]# nova list
>>>
>>> +--------------------------------------+---------------------+--------+------------+-------------+----------------------------------+
>>> | ID                                   | Name                | Status | Task State | Power State | Networks                         |
>>> +--------------------------------------+---------------------+--------+------------+-------------+----------------------------------+
>>> | ceb87326-1f84-476b-86fa-fc004a9c5744 | mymapr-allinone-001 | ACTIVE | -          | Running     | internal=11.0.0.12, 172.24.4.235 |
>>> +--------------------------------------+---------------------+--------+------------+-------------+----------------------------------+
>>> [root at controller ~(keystone_admin)]# ping 172.24.4.235
>>> PING 172.24.4.235 (172.24.4.235) 56(84) bytes of data.
>>> 64 bytes from 172.24.4.235: icmp_seq=1 ttl=63 time=10.6 ms
>>> 64 bytes from 172.24.4.235: icmp_seq=2 ttl=63 time=1.85 ms
>>> 64 bytes from 172.24.4.235: icmp_seq=3 ttl=63 time=1.02 ms
>>> ^C
>>> --- 172.24.4.235 ping statistics ---
>>> 3 packets transmitted, 3 received, 0% packet loss, time 2004ms
>>> rtt min/avg/max/mdev = 1.028/4.520/10.680/4.369 ms
>>> [root at controller ~(keystone_admin)]# ssh -i Downloads/testkey.pem
>>> ubuntu at 172.24.4.235
>>> The authenticity of host '172.24.4.235 (172.24.4.235)' can't be
>>> established.
>>> ECDSA key fingerprint is a3:cc:5c:4e:fb:c8:83:80:46:54:77:31:e7:60:c5:c2.
>>> Are you sure you want to continue connecting (yes/no)? yes
>>> Warning: Permanently added '172.24.4.235' (ECDSA) to the list of known
>>> hosts.
>>> Welcome to Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-44-generic x86_64)
>>>
>>>  * Documentation:  https://help.ubuntu.com/
>>>
>>>   System information as of Fri Oct 16 11:41:12 UTC 2015
>>>
>>>   System load:  0.77              Processes:           124
>>>   Usage of /:   4.7% of 98.30GB   Users logged in:     0
>>>   Memory usage: 1%                IP address for eth0: 11.0.0.12
>>>   Swap usage:   0%
>>>
>>>   Graph this data and manage this system at:
>>>     https://landscape.canonical.com/
>>>
>>>   Get cloud support with Ubuntu Advantage Cloud Guest:
>>>     http://www.ubuntu.com/business/services/cloud
>>>
>>> 7 packages can be updated.
>>> 7 updates are security updates.
>>>
>>>
>>> ubuntu at mymapr-allinone-001:~$ ifconfig
>>> eth0      Link encap:Ethernet  HWaddr fa:16:3e:46:f5:5d
>>>           inet addr:11.0.0.12  Bcast:11.0.0.255  Mask:255.255.255.0
>>>           inet6 addr: fe80::f816:3eff:fe46:f55d/64 Scope:Link
>>>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>           RX packets:954 errors:0 dropped:0 overruns:0 frame:0
>>>           TX packets:944 errors:0 dropped:0 overruns:0 carrier:0
>>>           collisions:0 txqueuelen:1000
>>>           RX bytes:109259 (109.2 KB)  TX bytes:134857 (134.8 KB)
>>>
>>> lo        Link encap:Local Loopback
>>>           inet addr:127.0.0.1  Mask:255.0.0.0
>>>           inet6 addr: ::1/128 Scope:Host
>>>           UP LOOPBACK RUNNING  MTU:65536  Metric:1
>>>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>           collisions:0 txqueuelen:0
>>>           RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
>>>
>>> ubuntu at mymapr-allinone-001:~$
>>> ubuntu at mymapr-allinone-001:~$
>>>
>>>
>>> After some time the cluster entered the "Configuring" state, and later
>>> it ran into Error and displayed the below message:
>>>
>>> 2015-10-16 14:19:10.603 27689 ERROR sahara.service.ops [-] Error during
>>> operating on cluster myMapR (reason: SSHException: Error reading SSH
>>> protocol banner)
>>> 2015-10-16 14:19:10.994 27689 INFO sahara.utils.general [-] Cluster
>>> status has been changed: id=e15529e9-be16-40a2-ba87-7132713f7460, New
>>> status=Error
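
Regarding "Error reading SSH protocol banner": that message (from paramiko,
the SSH library Sahara uses) means the TCP connection opened but the server
never sent its "SSH-2.0-..." identification line in time, which usually
points at an overloaded node or something intercepting port 22 rather than a
plain network outage. As a sanity check, this sketch (my own, not Sahara
code) reads the banner directly:

```python
# Read the SSH identification line that sshd sends first on a connection;
# paramiko's "Error reading SSH protocol banner" means this step failed.
import socket

def read_ssh_banner(host, port=22, timeout_s=5.0):
    """Return the server's banner, e.g. 'SSH-2.0-OpenSSH_6.6.1p1'."""
    with socket.create_connection((host, port), timeout=timeout_s) as s:
        s.settimeout(timeout_s)
        banner = s.recv(256).decode("ascii", "replace").strip()
    if not banner.startswith("SSH-"):
        raise ValueError("unexpected banner: %r" % banner)
    return banner

# Example with the address from the thread:
#   read_ssh_banner("172.24.4.235")
```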
>>>
>>>
>>> Can anyone please suggest some way to configure this cluster?
>>>
>>> BR,
>>> Varun
>>>
>>>
>>> On Fri, Oct 16, 2015 at 9:26 AM, varun bhatnagar <varun292006 at gmail.com>
>>> wrote:
>>>
>>>> Hi Chris,
>>>>
>>>> Thanks a lot for your answer.
>>>> I still have a couple of questions.
>>>>
>>>> You mentioned something about configuring keys -- could you please tell
>>>> me where and how I should do that?
>>>> Also, you mentioned that the node couldn't access the outside world --
>>>> is it mandatory for the nodes to reach the outside world? I have
>>>> assigned floating IPs, but these are just internal and won't be able to
>>>> reach the outside world, as this is my test lab, which doesn't have
>>>> outside connectivity. The traffic doesn't go out of the OpenStack
>>>> environment.
>>>>
>>>> I am using neutron as my network component.
>>>>
>>>>
>>>> BR,
>>>> Varun
>>>>
>>>>
>>>> On Fri, Oct 16, 2015 at 1:10 AM, Chris Buccella <
>>>> chris.buccella at antallagon.com> wrote:
>>>>
>>>>> I haven't tried the mapr plugin, but here are some thoughts:
>>>>>
>>>>> Sahara's error reporting is pretty bad... in my experience, the plugin
>>>>> rarely logs the true cause of an error; you'll need to dig for it.
>>>>> Configure your cluster to use a key so you can log in to the nodes. You
>>>>> can then log in to the controller node and look at the log files there.
>>>>> For a timeout error, it could be that a node couldn't access another
>>>>> node, or couldn't access the outside world. In that case, it might be
>>>>> as simple as ensuring your security groups are permissive enough.
>>>>>
>>>>> On Wed, Oct 14, 2015 at 10:49 AM, varun bhatnagar <
>>>>> varun292006 at gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have an OpenStack Kilo single-node setup and I am trying to
>>>>>> launch a Hadoop cluster using the MapR plugin in Sahara.
>>>>>>
>>>>>> I am following the example described on the site below:
>>>>>>
>>>>>>
>>>>>> https://www.mapr.com/blog/tutorial-how-set-mapr-private-cloud-using-sahara-devstack#.Vg5ntfmqqko
>>>>>>
>>>>>> It starts off well, but after a certain time the cluster status
>>>>>> changes to "Configuring", and then after some time it runs into an
>>>>>> error due to a timeout. Can anyone please help me launch the cluster
>>>>>> successfully?
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2015-10-14 16:28:37.869 30280 DEBUG sahara.context [-] Thread
>>>>>> configure-sh-c1050100-1775-4654-93a8-05c2ba350864 failed with exception:
>>>>>> 'Operation' timed out after 600 second(s)
>>>>>> Error ID: e2f61330-2c84-482e-97d0-d3947a5e5f02 _wrapper
>>>>>> /usr/lib/python2.7/site-packages/sahara/context.py:193
>>>>>> 2015-10-14 16:28:37.923 30280 ERROR sahara.service.ops [-] Error
>>>>>> during operating on cluster LaunchCluster (reason: An error occurred in
>>>>>> thread 'configure-sh-c1050100-1775-4654-93a8-05c2ba350864': 'Operation'
>>>>>> timed out after 600 second(s)
>>>>>> Error ID: e2f61330-2c84-482e-97d0-d3947a5e5f02
>>>>>> Error ID: f533bba0-afa8-4c99-bb73-a13f8c47d4a9)
>>>>>> 2015-10-14 16:28:38.572 30280 INFO sahara.utils.general [-] Cluster
>>>>>> status has been changed: id=2ee0f93f-195b-4e53-a6be-241bdeff5958, New
>>>>>> status=Error
>>>>>> 2015-10-14 16:28:39.576 30280 DEBUG keystonemiddleware.auth_token
>>>>>> [-] Removing headers from request environment:
>>>>>> X-Service-Catalog,X-Identity-Status,X-Service-Identity-Status,X-Roles,X-Service-Roles,X-Domain-Name,X-Service-Domain-Name,X-Project-Id,X-Service-Project-Id,X-Project-Domain-Name,X-Service-Project-Domain-Name,X-User-Id,X-Service-User-Id,X-User-Name,X-Service-User-Name,X-Project-Name,X-Service-Project-Name,X-User-Domain-Id,X-Service-User-Domain-Id,X-Domain-Id,X-Service-Domain-Id,X-User-Domain-Name,X-Service-User-Domain-Name,X-Project-Domain-Id,X-Service-Project-Domain-Id,X-Role,X-User,X-Tenant-Name,X-Tenant-Id,X-Tenant
>>>>>> _remove_auth_headers
>>>>>> /usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py:672
>>>>>>
>>>>>> BR,
>>>>>> Varun
>>>>>>
>>>>>> _______________________________________________
>>>>>> Mailing list:
>>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>>>>> Post to     : openstack at lists.openstack.org
>>>>>> Unsubscribe :
>>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

