
Wireshark-users: Re: [Wireshark-users] FCIP issues with SAN replication

From: Hansang Bae <for_list_hbae@xxxxxxxxxx>
Date: Thu, 15 Jul 2010 23:26:48 -0400
On 7/14/2010 2:52 PM, Chandler, Mel wrote:

Gerald,

We do have window scaling enabled, but the window is hard-set at 32 KB per HP’s recommendations.  I will recommend we “supersize” it when we discuss this issue again.

Bill,

I’m starting to see the pattern here; I’ll recommend to HP that we set the window as large as it’ll go.  They’re following a chart in the user guide, based on CIR and latency, that tells them what window size and scaling factor to set.  Their magic formula has indicated we should be using 32 KB, but I’ve run tests with iperf and seen that larger windows give higher data rates.  I suspect this is not the issue, though, as replication still fails with high TCP timer expired errors.

Martin,

The near side is 10.244.249.31/32 and the far side is 10.245.249.31/32.  I think IP addressing is fine; otherwise, we would have worse problems and replication would not even start.

We’re following HP’s recommendations, and so far they’ve wanted to set the window at 32 KB.  I agree with you that it should be as large as possible, but for now, this is HP’s show.  I will recommend a larger window next time we discuss the issue with them.

I agree there is a configuration problem or a timeout that needs to be adjusted, or possibly a malfunctioning NIC in their FC gateways.

Now the update:

I discovered that if I disable the FCP decode, Wireshark decodes the traffic correctly as FCIP.

We applied a QoS config to mark SAN replication traffic as DSCP EF and have seen consistent ping times of ~36 ms between sites, with bandwidth climbing as high as 45 Mbps on a 1 Gbps link.  The replication jobs still fail after running for a few hours.  Last time we watched them replicate for 12 hours and then fail.  The TCP timer exceed counter seems to indicate that is the problem, but I have nothing significant in the Wireshark captures to support this.

HP has decided that the MPX110 on the far side needs to be replaced.  I'll post an update after that's done.



Mel,
I'm coming into this thread a bit late, but do you have the trace files handy?  You can always use editcap (its -s snaplen option) to chop each packet off at the header.  FCIP is very jitter sensitive, but it sounds like you have a big enough pipe.  Please remember one thing: the rule of TCP says that you *CANNOT* transfer more than one window's worth of data per round trip.  THAT'S IT!  So in your case, a single TCP connection will NEVER transfer more than 32 KB per 36 ms.
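
To put rough numbers on that rule, here is a minimal sketch of the arithmetic. It assumes 32 KB means 32 * 1024 bytes, takes the ~36 ms ping time as the round-trip time, and ignores header overhead and loss. The function names are only for illustration.

```python
# Back-of-the-envelope TCP window arithmetic (illustrative names).
# Assumptions: 32 KB = 32 * 1024 bytes, RTT = 36 ms, no loss, no header overhead.

def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound for one TCP connection: one window per round trip."""
    return window_bytes * 8 / rtt_seconds

def window_needed_bytes(rate_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to sustain rate_bps."""
    return rate_bps * rtt_seconds / 8

RTT = 0.036          # ~36 ms ping time between sites
WINDOW = 32 * 1024   # 32 KB window

ceiling = max_throughput_bps(WINDOW, RTT)
print(f"32 KB window ceiling: {ceiling / 1e6:.2f} Mbit/s")                    # 7.28 Mbit/s
print(f"Window to fill 1 Gbps: {window_needed_bytes(1e9, RTT) / 1e6:.1f} MB")  # 4.5 MB
print(f"Window to sustain 45 Mbps: {window_needed_bytes(45e6, RTT):.0f} bytes")
```

If those assumptions hold, one connection with a 32 KB window tops out well below the 45 Mbps you reported, so either more than one TCP connection is carrying the traffic or the effective window is larger than 32 KB. The trace should show which.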

If you're only using 32 KB, you don't need window scaling, because the unscaled 16-bit window field already covers windows up to 64 KB.  So I'm not sure why HP would recommend turning on RFC 1323 features if you're not going to use them.
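
As a quick sketch of why that is: the TCP header's window field is 16 bits (65,535 bytes at most), and the RFC 1323 window scale option shifts it left by up to 14 bits. The helper below is hypothetical and just finds the smallest shift that can advertise a given window.

```python
# Smallest RFC 1323 window-scale shift for a target receive window.
# required_shift is a hypothetical helper, not part of any tool or library.

MAX_UNSCALED_WINDOW = 65535  # largest value of the 16-bit window field
MAX_SHIFT = 14               # largest shift count RFC 1323 allows

def required_shift(window_bytes: int) -> int:
    """Return the smallest shift count that can advertise window_bytes."""
    shift = 0
    while (MAX_UNSCALED_WINDOW << shift) < window_bytes:
        shift += 1
        if shift > MAX_SHIFT:
            raise ValueError("window is too large for TCP window scaling")
    return shift

print(required_shift(32 * 1024))   # 0: a 32 KB window needs no scaling
print(required_shift(4_500_000))   # 7: ~4.5 MB, i.e. 1 Gbps * 36 ms
```

A shift of 0 means the option buys nothing at 32 KB. It only starts to matter once the window goes past 65,535 bytes.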

Also, the Stevens graph may show you something that jumps out, so you should take a look.  The sessions I presented at Sharkfest 2010 may give you additional clues as to what to look for.

hsb