[Openstack] [Savanna] Vanilla Plugin with default configuration throws "Connection Refused"
Marc Solanas Tarre -X (msolanas - AAP3 INC at Cisco)
msolanas at cisco.com
Wed Dec 18 18:25:14 UTC 2013
Hi,
I asked this question in Launchpad (https://answers.launchpad.net/savanna/+question/240969), but I thought it might reach more people if I use the list.
My setup is:
Ubuntu 12.04
OpenStack Havana with Vanilla Plugin
I have deployed a cluster with the following node groups:
1 x master:
-Uses 1 cinder volume: 2 TB
-namenode
-secondarynamenode
-oozie
-datanode
-jobtracker
-tasktracker
2 x slaves:
-Uses 1 cinder volume: 2 TB
-datanode
-tasktracker
Both node groups use the following flavor:
VCPUs: 32
RAM: 250000 MB
Root disk: 300 GB
Ephemeral: 300 GB
Swap: 0
They also use the default Ubuntu Hadoop Vanilla image downloadable from https://savanna.readthedocs.org/en/latest/userdoc/vanilla_plugin.html
The /etc/hosts file in all nodes is:
127.0.0.1 localhost
10.0.0.2 test-master2T-001.novalocal test-master2T-001
10.0.0.3 test-slave2T-001.novalocal test-slave2T-001
10.0.0.4 test-slave2T-002.novalocal test-slave2T-002
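In case name resolution is part of the problem, here is a quick sketch I can run on each node to confirm that these hostnames resolve to the IPs above. The hostname/IP pairs are simply taken from the hosts file; nothing else is assumed.

#!/usr/bin/env python
# Check that each cluster hostname resolves to the IP listed in /etc/hosts.
# A mismatch here would explain seemingly random connection errors.
import socket

EXPECTED = {
    "test-master2T-001": "10.0.0.2",
    "test-slave2T-001": "10.0.0.3",
    "test-slave2T-002": "10.0.0.4",
}

for host, ip in EXPECTED.items():
    try:
        resolved = socket.gethostbyname(host)
    except socket.error as e:
        print("%s: resolution failed (%s)" % (host, e))
        continue
    status = "OK" if resolved == ip else "MISMATCH"
    print("%s -> %s (%s)" % (host, resolved, status))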
Without changing any of the default configuration, the cluster boots correctly.
The problem is that, when running a job (for example, a 100 GB teragen), the map tasks fail many times and have to be retried, which increases the job time considerably. The failures seem random, coming from one slave or the other depending on the run.
Checking the logs of the datanodes on the slaves, I can see this error:
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: java.net.ConnectException: Call to test-master2T-001/10.0.0.2:8020 failed on connection exception: java.net.ConnectException: Connection refused
Full error: http://pastebin.com/DDp39yqt
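To see whether the refusals are intermittent or permanent, I can run a small probe from a slave against the namenode RPC port. This is only a sketch: the hostname and port 8020 are taken from the log line above; adjust them if your layout differs.

#!/usr/bin/env python
# Repeatedly try a TCP connection from a slave to the namenode RPC port,
# to see whether "Connection refused" happens every time or only sometimes.
import socket
import time

HOST = "test-master2T-001"   # namenode host, from the datanode log above
PORT = 8020                  # default HDFS namenode RPC port
ATTEMPTS = 20

for i in range(ATTEMPTS):
    try:
        s = socket.create_connection((HOST, PORT), timeout=5)
        s.close()
        print("attempt %d: connected" % i)
    except Exception as e:
        # Covers connection refused, timeouts, and resolution failures.
        print("attempt %d: failed (%s)" % (i, e))
    time.sleep(1)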
The log of the datanode on the master gives this error:
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: checkDiskError: exception:
java.net.SocketException: Original Exception : java.io.IOException: Connection reset by peer
Full error: http://pastebin.com/NXYXELQX
I have tried changing hadoop.tmp.dir to point to the 2 TB cinder volume (/volumes/disk1/lib/hadoop/hdfs/tmp), but nothing changed.
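For completeness, this is roughly how I checked that the hadoop.tmp.dir override actually took effect on each node. The core-site.xml path below is an assumption; adjust it to wherever the conf directory lives on the vanilla image.

#!/usr/bin/env python
# Print hadoop.tmp.dir from core-site.xml to confirm the override is in place.
# The config path is an assumption for this sketch, not the guaranteed
# location on the Savanna vanilla image.
import xml.etree.ElementTree as ET

CONF = "/etc/hadoop/core-site.xml"

root = ET.parse(CONF).getroot()
for prop in root.findall("property"):
    if prop.findtext("name") == "hadoop.tmp.dir":
        print("hadoop.tmp.dir = %s" % prop.findtext("value"))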
Thank you in advance.
Marc