ZFS configuration and tuning example on Sun Fire X4540

This is nothing new at my workplace: I set it up about 3.5 years ago and the systems are still running perfectly, so I think it is worth writing down here to share. I don't even know whether this hardware is still on the market, but some of the concepts, such as the RAID layout and the network tuning, never really go out of date.

Hardware and setup

The X4540 has 6 controllers with 8 disks each, for a total of 48 disks. By default, the system is configured with raidz1 devices built from disks on each of the 6 controllers. That redundant configuration is optimized for space with single-parity data protection, not for performance.
So what I did instead was use raidz2, ZFS's double-parity equivalent of RAID 6, and go further with wide 9+2 and 10+2 vdevs for each array, which gives me both performance and redundancy.

Here is the layout for the data arrays and the system disks:

Solaris 10 OS mirror 1+1

<hostname>_1 raidz2 9+2

<hostname>_2 raidz2 9+2

<hostname>_3 raidz2 10+2

<hostname>_4 raidz2 10+2



        t0    t1    t2    t3    t4    t5    t6    t7
  c0  c0t0  c0t1  c0t2  c0t3  c0t4  c0t5  c0t6  c0t7
  c1  c1t0  c1t1  c1t2  c1t3  c1t4  c1t5  c1t6  c1t7
  c2  c2t0  c2t1  c2t2  c2t3  c2t4  c2t5  c2t6  c2t7
  c3  c3t0  c3t1  c3t2  c3t3  c3t4  c3t5  c3t6  c3t7
  c4  c4t0  c4t1  c4t2  c4t3  c4t4  c4t5  c4t6  c4t7
  c5  c5t0  c5t1  c5t2  c5t3  c5t4  c5t5  c5t6  c5t7
Each array, including the mirrored pair for the system disks, takes at most two disks from any one controller, so it can survive the failure of an entire controller or (for the raidz2 arrays) any two disks, which is exactly what raidz2 is good for. A quick check follows the creation scripts below.

ZFS creation scripts:
zpool create -f -m /spool12_1 spool12_1 raidz2 c0t1d0 c1t1d0 c2t0d0 c2t1d0 c3t0d0 c3t1d0 c4t0d0 c4t1d0 c5t0d0 c5t1d0 c0t2d0
zpool create -f -m /spool12_2 spool12_2 raidz2 c0t3d0 c1t2d0 c1t3d0 c2t2d0 c2t3d0 c3t2d0 c3t3d0 c4t2d0 c4t3d0 c5t2d0 c5t3d0
zpool create -f -m /spool12_3 spool12_3 raidz2 c0t4d0 c0t5d0 c1t4d0 c1t5d0 c2t4d0 c2t5d0 c3t4d0 c3t5d0 c4t4d0 c4t5d0 c5t4d0 c5t5d0
zpool create -f -m /spool12_4 spool12_4 raidz2 c0t6d0 c0t7d0 c1t6d0 c1t7d0 c2t6d0 c2t7d0 c3t6d0 c3t7d0 c4t6d0 c4t7d0 c5t6d0 c5t7d0
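A small sanity check of my own (a sketch, not part of the original build): count how many member disks each pool takes from each controller, since a raidz2 vdev only rides out a controller failure if no controller contributes more than two of its disks.

for p in spool12_1 spool12_2 spool12_3 spool12_4; do
    echo "== $p =="
    # list the member disks, then count them per controller (the c0..c5 prefix)
    zpool status $p | awk '/c[0-9]t[0-9]d0/ { split($1, a, "t"); n[a[1]]++ }
                           END { for (c in n) print c, n[c] }'
done

Every count should come back as 2 or less; anything higher would mean a controller failure could take out more disks than raidz2 can absorb.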

Here is the physical disk layout for the server; you can also get the same information with the hd -x command.

 

                                                X4540 Rear

 3:c0t3   7:c0t7  11:c1t3  15:c1t7  19:c2t3  23:c2t7  27:c3t3  31:c3t7  35:c4t3  39:c4t7  43:c5t3  47:c5t7
 2:c0t2   6:c0t6  10:c1t2  14:c1t6  18:c2t2  22:c2t6  26:c3t2  30:c3t6  34:c4t2  38:c4t6  42:c5t2  46:c5t6
 1:c0t1   5:c0t5   9:c1t1  13:c1t5  17:c2t1  21:c2t5  25:c3t1  29:c3t5  33:c4t1  37:c4t5  41:c5t1  45:c5t5
 0:c0t0   4:c0t4   8:c1t0  12:c1t4  16:c2t0  20:c2t4  24:c3t0  28:c3t4  32:c4t0  36:c4t4  40:c5t0  44:c5t4

                                                X4540 Front

 

Here is the status of one of the four ZFS pools:

pool: spool05_3
 state: ONLINE
 scan: scrub repaired 0 in 15h27m with 0 errors on Sat Jul 13 01:48:19 2013
config:

        NAME        STATE     READ WRITE CKSUM
        spool05_3   ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c2t4d0  ONLINE       0     0     0
            c2t5d0  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c3t5d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
            c5t4d0  ONLINE       0     0     0
            c5t5d0  ONLINE       0     0     0

errors: No known data errors

Tuning

     1. ZFS ARC setting 

        One line in /etc/system keeps the ARC from grabbing too much memory and squeezing out the OS and other applications. The box has 64 GB of memory; capping the ARC at 48 GiB (51539607552 bytes) leaves roughly 12 GB for the OS and applications.

        set zfs:zfs_arc_max = 51539607552
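        To confirm the cap actually took effect after a reboot (/etc/system is only read at boot), a quick check is to read the standard Solaris ZFS arcstats kstats; this is a sketch of mine, not something from the original setup:

        # current ARC size and the configured maximum, in bytes
        kstat -p zfs:0:arcstats:size
        kstat -p zfs:0:arcstats:c_max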

      2. Network tuning for long-haul transfers

        The network setup is an aggregated interface across four 1 Gb Ethernet ports; the initial settings turned out to be very poor for long-haul transfers, at under 2 MB/sec.

        ndd -set /dev/tcp tcp_max_buf 67108864
        ndd -set /dev/tcp tcp_cwnd_max 33554432
        ndd -set /dev/tcp tcp_recv_hiwat 1048576
        ndd -set /dev/tcp tcp_xmit_hiwat 1048576
        After tuning, we were able to write at a sustained aggregate rate of about 6.5 GB/sec across 20 identical nodes over the LAN.
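        One caveat worth adding (my own note, not from the original setup): ndd settings do not survive a reboot, so a common approach is to replay them from a boot-time script, for example a hypothetical /etc/rc2.d/S99tcptune:

        #!/bin/sh
        # Reapply the TCP buffer tuning at boot; ndd changes are lost on reboot.
        ndd -set /dev/tcp tcp_max_buf 67108864
        ndd -set /dev/tcp tcp_cwnd_max 33554432
        ndd -set /dev/tcp tcp_recv_hiwat 1048576
        ndd -set /dev/tcp tcp_xmit_hiwat 1048576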

     3. ZFS Scrub

Scrub each ZFS pool three times a year; the script I use is in my other article, but a rough sketch follows below.
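A minimal sketch of what that can look like (this is not the actual script from the other article, just an illustration): root crontab entries that start a scrub on each pool every four months, staggered by a day so they do not all run at once.

# min hour day-of-month month day-of-week  command
0 2 1 1,5,9 * /usr/sbin/zpool scrub spool12_1
0 2 2 1,5,9 * /usr/sbin/zpool scrub spool12_2
0 2 3 1,5,9 * /usr/sbin/zpool scrub spool12_3
0 2 4 1,5,9 * /usr/sbin/zpool scrub spool12_4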