[Openstack] [Savanna] Vanilla Plugin with default configuration throws "Connection Refused"

Dmitry Mescheryakov dmescheryakov at mirantis.com
Wed Dec 18 19:15:17 UTC 2013


Marc,

I don't believe we have run into this problem so far. Did you test the
network connection between the nodes for stability and throughput? The
error may be caused by network oversaturation.
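
As a rough way to check connection stability, here is a minimal Python
sketch that repeatedly opens a TCP connection to the NameNode address from
your DataNode log (test-master2T-001, port 8020) and counts the failures.
The function names probe and summarize and all parameter defaults are
illustrative assumptions, not part of Savanna or Hadoop. It only tests
whether a connection can be opened; it does not measure throughput.

```python
import socket
import time


def probe(host, port, attempts=20, timeout=2.0, pause=0.5):
    """Try to open a TCP connection `attempts` times.

    Returns one (ok, value) tuple per attempt: value is the connect time
    in seconds on success, or the error message on failure.
    """
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            # Connect and close immediately; no Hadoop RPC is sent.
            with socket.create_connection((host, port), timeout=timeout):
                results.append((True, time.monotonic() - start))
        except OSError as exc:  # refused, timed out, name not resolved, ...
            results.append((False, str(exc)))
        time.sleep(pause)
    return results


def summarize(results):
    """Condense probe() output into counts and the distinct error messages."""
    connect_times = [value for ok, value in results if ok]
    errors = [value for ok, value in results if not ok]
    return {
        "attempts": len(results),
        "failed": len(errors),
        "max_connect_s": max(connect_times) if connect_times else None,
        "errors": sorted(set(errors)),
    }


# Example, run on a slave while the job is running (host and port are the
# ones shown in the DataNode log):
#   print(summarize(probe("test-master2T-001", 8020)))
```

If some attempts fail only while a job is running, that would point towards
load on the network or on the master rather than a static misconfiguration.
A dedicated tool such as iperf would be needed to measure throughput.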

Although the errors point to the network, it might be worth checking
with the Hadoop community whether such exceptions can have a cause
other than a network malfunction.

Dmitry




2013/12/18 Marc Solanas Tarre -X (msolanas - AAP3 INC at Cisco) <msolanas at cisco.com>

>   Hi,
>
> I asked this question in Launchpad (
> https://answers.launchpad.net/savanna/+question/240969), but I thought it
> might reach more people if I use the list.
>
> My setup is:
>
> Ubuntu 12.04
> OpenStack Havana with Vanilla Plugin
>
> I have deployed a cluster with the following node groups:
>
> 1 x master:
>
>   -Uses 1 cinder volume: 2TB
>
>   -namenode
>   -secondarynamenode
>   -oozie
>   -datanode
>   -jobtracker
>   -tasktracker
>
> 2 x slaves:
>
>   -Uses 1 cinder volume: 2TB
>
>   -datanode
>   -tasktracker
>
> Both node groups used the following flavor:
>
> VCPUs: 32
> RAM: 250000
> Root disk: 300GB
> Ephemeral: 300GB
> Swap: 0
>
> They also use the default Ubuntu Hadoop Vanilla image downloadable from
> https://savanna.readthedocs.org/en/latest/userdoc/vanilla_plugin.html
>
> The /etc/hosts file on all nodes is:
> 127.0.0.1 localhost
> 10.0.0.2 test-master2T-001.novalocal test-master2T-001
> 10.0.0.3 test-slave2T-001.novalocal test-slave2T-001
> 10.0.0.4 test-slave2T-002.novalocal test-slave2T-002
>
> Without changing any of the default configuration, the cluster boots
> correctly.
>
> The problem is that, when running a job (for example, teragen 100GB), the
> map tasks fail many times and have to be re-run, which increases the job
> time. The failures seem random: they occur on one slave or the other,
> depending on the run.
>
> Checking the logs of the datanodes on the slaves, I can see this error:
>
> WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
> java.net.ConnectException: Call to test-master2T-001/10.0.0.2:8020 failed on
> connection exception: java.net.ConnectException: Connection refused
>
> Full error: http://pastebin.com/DDp39yqt
>
> The log of the datanode on the master gives this error:
>
> WARN org.apache.hadoop.hdfs.server.datanode.DataNode: checkDiskError:
> exception:
> java.net.SocketException: Original Exception : java.io.IOException:
> Connection reset by peer
>
> Full error: http://pastebin.com/NXYXELQX
>
> I have tried changing hadoop.tmp.dir to point to the 2TB cinder volume
> /volumes/disk1/lib/hadoop/hdfs/tmp, but nothing changed.
>
> Thank you in advance.
>
>  Marc
>