[Openstack] [Savanna] Vanilla Plugin with default configuration throws "Connection Refused"

Dmitry Mescheryakov dmescheryakov at mirantis.com
Wed Dec 18 19:42:11 UTC 2013


And I just got another idea. The Vanilla plugin has data locality
disabled by default (I don't remember exactly why). I am not sure how the
mappers will behave without data-locality info, i.e. whether they will
understand that they should prefer data from their own node. So it is worth
trying to enable it. Here is the doc:

http://savanna.readthedocs.org/en/0.3/userdoc/features.html#data-locality
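
From what I remember of that page, enabling it is mostly a change in the
Savanna configuration file plus a topology file, roughly like this (option
names and file paths are from memory, so please double-check them against
the doc above):

    # in the Savanna configuration file
    enable_data_locality=True
    compute_topology_file=/etc/savanna/compute.topology
    swift_topology_file=/etc/savanna/swift.topology

    # compute.topology: one "<hostname or IP> <rack path>" entry per compute node
    compute1 /rack1
    compute2 /rack2

I believe the setting is applied at provisioning time, so a cluster has to
be created anew after the change.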

Dmitry


2013/12/18 Dmitry Mescheryakov <dmescheryakov at mirantis.com>

> Mark,
>
> I believe we haven't faced this problem so far. Did you test the network
> connection between the nodes for stability and throughput? Maybe the errors
> are caused by network oversaturation.
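>
> Something along these lines would give a rough picture (assuming ping,
> iperf and nc are available on the image; hostnames are from your
> /etc/hosts):
>
>     # latency and packet loss from a slave to the master
>     ping -c 20 test-master2T-001
>
>     # throughput: run "iperf -s" on the master, then on a slave:
>     iperf -c test-master2T-001 -t 30
>
>     # is the NameNode RPC port from the error reachable at all?
>     nc -zv test-master2T-001 8020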
>
> Though the errors point to the network as the problem, it might be worth
> checking with the Hadoop community whether such exceptions can be caused by
> something other than a network malfunction.
>
> Dmitry
>
>
>
>
> 2013/12/18 Marc Solanas Tarre -X (msolanas - AAP3 INC at Cisco) <
> msolanas at cisco.com>
>
>>   Hi,
>>
>> I asked this question on Launchpad (
>> https://answers.launchpad.net/savanna/+question/240969), but I thought
>> it might reach more people on the list.
>>
>> My setup is:
>>
>> Ubuntu 12.04
>> OpenStack Havana with Vanilla Plugin
>>
>> I have deployed a cluster with the following node groups:
>>
>> 1x master:
>>
>>   -Uses 1 cinder volume: 2TB
>>
>>   -namenode
>>   -secondarynamenode
>>   -oozie
>>   -datanode
>>   -jobtracker
>>   -tasktracker
>>
>> 2x slaves:
>>
>>   -Uses 1 cinder volume: 2TB
>>
>>   -datanode
>>   -tasktracker
>>
>> Both node groups use the following flavor:
>>
>> VCPUs: 32
>> RAM: 250000 MB
>> Root disk: 300 GB
>> Ephemeral: 300 GB
>> Swap: 0
>>
>> They also use the default Ubuntu Hadoop Vanilla image downloadable from
>> https://savanna.readthedocs.org/en/latest/userdoc/vanilla_plugin.html
>>
>> The /etc/hosts file in all nodes is:
>> 127.0.0.1 localhost
>> 10.0.0.2 test-master2T-001.novalocal test-master2T-001
>> 10.0.0.3 test-slave2T-001.novalocal test-slave2T-001
>> 10.0.0.4 test-slave2T-002.novalocal test-slave2T-002
>>
>> Without changing any of the default configuration, the cluster boots
>> correctly.
>>
>> The problem is that when running a job (for example, teragen 100GB), the
>> map tasks fail many times and have to be repeated, which increases the job
>> time. They seem to fail randomly, on one slave or the other, depending on
>> the execution.
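>>
>> For reference, the teragen run is just the standard example job, roughly
>> the following (the examples jar path here is a guess; locate it on the
>> image):
>>
>>     # teragen writes 100-byte rows, so 10^9 rows ~= 100GB
>>     hadoop jar /usr/share/hadoop/hadoop-examples-1.2.1.jar teragen \
>>         1000000000 /user/hadoop/teragen-out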
>>
>> Checking the logs of the datanodes on the slaves, I can see this error:
>>
>> WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
>> java.net.ConnectException: Call to test-master2T-001/10.0.0.2:8020 failed on
>> connection exception: java.net.ConnectException: Connection refused
>>
>> Full error: http://pastebin.com/DDp39yqt
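>>
>> A sanity check I can run on the master, to rule out the NameNode simply
>> not listening on that port, would be something like:
>>
>>     sudo netstat -ntlp | grep 8020   # is the NameNode bound, and on which address?
>>     hadoop dfsadmin -report          # are all three datanodes reported as live?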
>>
>> The logs of the datanode on the master give this error:
>>
>> WARN org.apache.hadoop.hdfs.server.datanode.DataNode: checkDiskError:
>> exception:
>> java.net.SocketException: Original Exception : java.io.IOException:
>> Connection reset by peer
>>
>> Full error: http://pastebin.com/NXYXELQX
>>
>> I have tried changing hadoop.tmp.dir to point to the 2TB cinder volume
>> /volumes/disk1/lib/hadoop/hdfs/tmp, but nothing changed.
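>>
>> For reference, that change boils down to the usual core-site.xml property,
>> i.e. something like:
>>
>>     <property>
>>       <name>hadoop.tmp.dir</name>
>>       <value>/volumes/disk1/lib/hadoop/hdfs/tmp</value>
>>     </property>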
>>
>> Thank you in advance.
>>
>>  Marc
>>
>>
>