Date: Fri, 19 May 1995 12:06:36 -0600
From: Mark Weil [Kodak FE Abq]
Message-Id: <9505191806.AA01073@zia.West.Sun.COM>
To: Paul Caskey
Subject: scsi



HISTORY:

While preparing an IPC to be taken home I encountered some problems using
a Sun 1.05GB 3.5" disk.  The solution may be interesting to some.


SYSTEM CONFIGURATION:

Sun IPC  ---  internal Sun 1.05GB 3.5" disk (id=3, sd0) Empty
	 ---  external Seagate WREN V 500MB 5.25" disk (id=1, sd1) SunOS
	 ---  external Sun 150MB .25" tape drive (id=4, st0)

The intent was to copy the OS from the WREN V to the 1.05GB internal disk
so the IPC could be taken home.


PROBLEM SUMMARY:

During the disk-to-disk dump (500MB external to 1.05GB internal) messages
in the console window indicated (at least to me) very bad problems.  The
messages were of the form:

Jan  4 14:45:07 sass001 vmunix: sd0a:   Error for command 'write'
Jan  4 14:45:07 sass001 vmunix: sd0a:   Error Level: Retryable
Jan  4 14:45:07 sass001 vmunix: sd0a:   Block 17256, Absolute Block: 17256
Jan  4 14:45:07 sass001 vmunix: sd0a:   Sense Key: Aborted Command
Jan  4 14:45:07 sass001 vmunix: sd0a:   Vendor 'SEAGATE' error code: 0x47

These messages were repeated with different block numbers and were
generated continuously, on the order of ten per second (lots!!!).

I interpreted these messages to mean the disk was bad, and the operation
was NOT successful.  I assumed the resulting 1.05GB disk would not be
usable as an OS disk.

I called Mark @ Sun and discussed the problem with him.  He located a
wonderful "fix it" document on the SunSolve CD.  I have included that document
at the end of this one.


SOLUTION SUMMARY:

The "fix" was to use the adb utility to patch the kernel so that the
SCSI bus operates only in asynchronous (slow) mode.

	adb -w /vmunix
	scsi_options?W 58
	$q

Now reboot the system and repeat the tests.

The subsequent disk-to-disk dumps were successful - no errors were generated!


OTHER APPLICATIONS:

I have seen a similar problem on moat when booting and also during the
monthly disk-to-disk dumps.  The messages are a little different, but still
indicate the need for the "fix".  The messages on moat were of the form:

Dec  6 15:19:17 moat vmunix: esp0:      data transfer overrun
Dec  6 15:19:17 moat vmunix:    State=DATA Last State=DATA_DONE
Dec  6 15:19:17 moat vmunix:    Latched stat=0x11<XZERO,IO> intr=0x10<BUS> fifo 0x1
Dec  6 15:19:17 moat vmunix:    last msg out: <unknown msg 0xff>; last msg in: IDENTIFY
Dec  6 15:19:17 moat vmunix:    DMA csr=0x80000000
Dec  6 15:19:17 moat vmunix:    addr=fff0c000 last=fff05e01 last_count=61ff
Dec  6 15:19:17 moat vmunix:    Cmd dump for Target 1 Lun 0:
Dec  6 15:19:17 moat vmunix:    cdb=[ 0x8 0x0 0x7a 0x20 0x50 0x0 0x0 0x0 0x0 0x0 ]
Dec  6 15:19:17 moat vmunix:    pkt_state 0xf<XFER,CMD,SEL,ARB> pkt_flags 0x0 pkt_statistics 0x3
Dec  6 15:19:17 moat vmunix:    cmd_flags=0x21 cmd_timeout 35
Dec  6 15:19:17 moat vmunix:    Mapped Dma Space:
Dec  6 15:19:17 moat vmunix:            Base = 0x2000 Count = 0xa000
Dec  6 15:19:17 moat vmunix:    Transfer History:
Dec  6 15:19:17 moat vmunix:            Base = 0x2000 Count = 0xa000
Dec  6 15:19:17 moat vmunix:    current phase 0x26=DATAIN       stat=0x1    0x61ff
Dec  6 15:19:17 moat vmunix:    current phase 0x1b=RESEL        stat=0x7    0x1     0x0
Dec  6 15:19:17 moat vmunix:    current phase 0x5=MSG_IN        stat=0x7    0x4
Dec  6 15:19:17 moat vmunix:    current phase 0x28=DISCONNECT   stat=0x7    0xa000
Dec  6 15:19:17 moat vmunix:    current phase 0x2c=SAVEDP       stat=0x7    0xa000
Dec  6 15:19:17 moat vmunix:    current phase 0x26=DATAIN       stat=0x11   0xa000
Dec  6 15:19:17 moat vmunix:    current phase 0x1b=RESEL        stat=0x17   0x1     0x0
Dec  6 15:19:17 moat vmunix:    current phase 0x5=MSG_IN        stat=0x17   0x4
Dec  6 15:19:17 moat vmunix:    current phase 0x28=DISCONNECT   stat=0x17   0xa000
Dec  6 15:19:17 moat vmunix:    current phase 0x2c=SAVEDP       stat=0x17   0xa000
Dec  6 15:19:17 moat vmunix:    current phase 0x20=SELECT       stat=0x10   0x1     0x0
Dec  6 15:19:17 moat vmunix:    current phase 0x1=CMD_START     stat=0x10   0x8     0x20
Dec  6 15:19:17 moat vmunix:    current phase 0xb=CMD_CMPLT     stat=0x17   0x400
Dec  6 15:19:17 moat vmunix:    current phase 0x27=STATUS       stat=0x17   0x0
Dec  6 15:19:17 moat vmunix:    current phase 0xb=CMD_CMPLT     stat=0x13
Dec  6 15:19:17 moat vmunix:    current phase 0x26=DATAIN       stat=0x11   0x400
Dec  6 15:19:17 moat vmunix: esp0: Target 1.0 reducing sync. transfer rate
Dec  6 15:19:17 moat vmunix: esp0: Reverting to slow SCSI cable mode
Dec  6 15:19:17 moat vmunix: sd1:    SCSI transport failed: reason 'data_ovr': retrying command
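
As a cross-check on the dump above, the cdb bytes can be decoded by hand.
The Python sketch below is only an illustration and was not part of the
original troubleshooting: it assumes the standard 6-byte SCSI READ layout
(opcode 0x08) and 512-byte disk blocks, and the name decode_read6 is made up
for this note.

```python
def decode_read6(cdb):
    """Decode the first six bytes of a group-0 (6-byte) SCSI read/write CDB."""
    opcode = cdb[0]
    lun = (cdb[1] >> 5) & 0x07                      # top 3 bits of byte 1
    lba = ((cdb[1] & 0x1F) << 16) | (cdb[2] << 8) | cdb[3]  # 21-bit address
    blocks = cdb[4] or 256                          # a length byte of 0 means 256
    return {"opcode": opcode, "lun": lun, "lba": lba, "blocks": blocks}

# First six bytes of the cdb that esp0 printed for Target 1 Lun 0 on moat.
moat_cdb = [0x08, 0x00, 0x7A, 0x20, 0x50, 0x00]
BLOCK_SIZE = 512  # assumption: 512-byte disk blocks

fields = decode_read6(moat_cdb)
print(fields["lba"], fields["blocks"])        # 31264 80
print(hex(fields["blocks"] * BLOCK_SIZE))     # 0xa000
```

Under those assumptions the command is a read of 80 blocks starting at block
31264, and 80 x 512 bytes is 0xa000, which agrees with the Count = 0xa000
reported under "Mapped Dma Space".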

I applied the kernel fix today and will reboot moat this evening.  I will
examine the boot messages for errors, but I expect none.


THINGS TO REMEMBER:

The "fix" outlined in this document is a change to the kernel binary.  Any
time a kernel is rebuilt this change will have to be reapplied.  For example,
if you rebuild a kernel to increase the MAXUSERS variable, you must remember
to apply the "adb" patch outlined above prior to rebooting.

According to the Help Document located by Mark @ Sun, these messages do
not indicate a fatal problem and everything should work fine.  Still, a
message appearing in the console window every 1/10th of a second can't be
good for performance!


HELP DOCUMENT LOCATED BY Mark@Sun:

INFODOC ID         : 1109

SYNOPSIS           : guidelines for support of fast (10MB/sec) SCSI systems

DETAIL DESCRIPTION : SCSI Configurations using Single-Ended Devices

SCOPE:

	The high performance SCSI devices now available provide
	the capability of significantly improving system performance
	for some applications.  One of the special capabilities of
	these devices is the ability to transfer data at a 10-megabyte-
	per-second data rate using the "fast SCSI" synchronous transfer
	timings defined by the SCSI-2 standard. These high performance
	SCSI devices are fully compatible with standard SCSI devices
	and will operate in almost all normal SCSI configurations.
	Some SCSI enclosures, cables, and terminators do not take into
	account the special loading and impedance matching requirements
	for fast SCSI.  The attachment of such peripherals may cause
	systems using fast SCSI devices to operate incorrectly.  Such
	nonconforming SCSI cables and enclosures include some of Sun's
	early designs and some third-party cables, terminators, and
	peripheral device enclosures.

	The installation manuals for all fast SCSI devices and all new
	Sun installation manuals contain the strong recommendation that
	fast SCSI devices not be placed on the same SCSI port with SCSI
	components that do not conform with the requirements for fast
	SCSI.  This paper provides recommendations for the technical
	modifications that can be made in a SCSI system to allow the
	operation of fast SCSI and nonconforming enclosures, cables,
	or terminators on the same system.

SOLUTION SUMMARY:

1.0     IDENTIFICATION OF SUN SYSTEMS REQUIRING SPECIAL ATTENTION

	Differential SCSI host adapters and devices, including the
	DSBE/S card and the Differential SCSI Data Center Disk Tray,
	are all designed to meet fast SCSI requirements and will
	operate at 10 Megabytes per second.  The maximum total cable
	length of a differential SCSI system is 25 meters.  The
	installation guides for the SCSI devices indicate the
	equivalent cable length of the device.

	SCSI host systems that operate at 5 megabytes per second,
	including all Sun SPARC-based systems developed prior to the
	SPARCsystem 10, will support any presently defined
	configuration of 5 megabyte SCSI devices.  A fast SCSI device
	can be installed on such systems, since the host and the fast
	SCSI device automatically negotiate the proper operational
	speed.  Fast SCSI devices attached to 5 megabyte hosts will
	only operate at 5 megabytes, but the capacity and access
	latency improvements provided by many such devices can still
	improve the flexibility and performance of such systems.
	Single-ended SCSI systems operating at 5 megabytes have a
	maximum total cable length of 6 meters.


1.1     SCSI systems and host adapters that operate at 10 megabytes per
	second, including the SPARCsystem 600MP series, the SPARCsystem
	10, and the FSBE/S host adapter, will support any presently
	defined configuration of 5 megabyte devices.  Again, the host
	will determine automatically that the devices are 5 megabyte
	per second devices and negotiate the proper operational speed
	with each device.

	SCSI host systems that operate at 10 megabytes per second and
	have at least one fast SCSI device attached require that the
	entire SCSI port configuration be composed of components that
	will support fast SCSI.  The components include cables, device
	enclosures, and terminators.  The recent Sun SCSI products,
	including the Desktop Storage Pack, the Desktop Storage Module,
	and SCSI Expansion Pedestal are devices and enclosures that
	meet the fast SCSI requirements.  The regulated terminator (Sun
	part number 150-1785-02) meets the fast SCSI requirements.  The
	host will negotiate with the 10 megabyte devices to perform 10
	megabyte transfers and with each of the other devices to
	perform transfers at their preferred rates.  Single-ended SCSI
	systems operating at 10 megabytes using the proper components
	have a maximum total cable length of 6 meters, in accordance
	with the proposed SCSI-3 standard.

1.2     Those Sun enclosures with the three-row 50-pin D connector,
	including the External Storage Module, do not meet the fast
	SCSI requirements.  Those Sun enclosures with the
	Centronics-style 50-pin flat ribbon contact connector,
	including the Front Load 1/2-inch Tape Drive, do not meet the
	fast SCSI requirements. The Sun SCSI terminators other than
	150-1785-02 do not meet the fast SCSI requirements.  Section 3
	of this paper defines the steps that must be taken to assure
	reliable operation of fast SCSI systems containing combinations
	of fast SCSI devices and components that do not meet the fast
	SCSI requirements.  The maximum total cable length for such
	systems should not exceed 6 meters.


			 SUMMARY OF SYSTEM REQUIREMENTS
				   TABLE 1

	   |  SCSI Host  |  fast SCSI  |  5 Mbyte SCSI  |    Special    |
	   |    Type     |    device   |     device     | Modifications |
	   |             |  installed? |   installed?   |    Required?  |
	   |_____________|_____________|________________|_______________|
	   |             |             |                |               |
	   | 5 megabyte  |  don't care |   don't care   |       no      |
	   |_____________|_____________|________________|_______________|
	   |             |             |                |               |
	   | 10 megabyte |     no      |   don't care   |       no      |
	   |_____________|_____________|________________|_______________|
	   |             |             |                |               |
	   | 10 megabyte |     yes     |   all conform  |       no      |
	   |             |             |  to fast SCSI  |               |
	   |             |             |  requirements  |               |
	   |_____________|_____________|________________|_______________|
	   |             |             |                |               |
	   | 10 megabyte |     yes     |   one or more  |      yes      |
	   |             |             | don't conform  | see section 3 |
	   |             |             |  to fast SCSI  |               |
	   |             |             |  requirements  |               |
	   |_____________|_____________|________________|_______________|
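
Table 1 can be read as a simple decision rule.  The Python sketch below is
an illustration only and is not part of the InfoDoc; the function and
parameter names are hypothetical.

```python
def special_modifications_required(host_mb_per_sec, fast_device_installed,
                                   all_components_conform):
    """Return True when Table 1 calls for special modifications.

    host_mb_per_sec        -- 5 or 10, the SCSI host type
    fast_device_installed  -- True if any fast SCSI device is on the port
    all_components_conform -- True if every other device, cable, enclosure
                              and terminator meets the fast SCSI requirements
    """
    if host_mb_per_sec not in (5, 10):
        raise ValueError("Table 1 only covers 5 and 10 megabyte hosts")
    if host_mb_per_sec == 5:
        return False                     # row 1: everything else is "don't care"
    if not fast_device_installed:
        return False                     # row 2: no fast device on the port
    return not all_components_conform    # rows 3 and 4

# Row 4: 10 megabyte host, fast device, at least one nonconforming component.
print(special_modifications_required(10, True, False))   # True
```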


2.0     IDENTIFICATION OF MIXED VENDOR SYSTEMS REQUIRING SPECIAL ATTENTION

	SCSI peripheral devices, connectors, and cables provided by
	companies other than Sun are not tested by Sun in the fast SCSI
	environment.  If any of the following symptoms occur when using
	such devices in Sun fast SCSI systems, it may be because the
	peripheral device, related components, or the configuration
	does not conform to the fast SCSI requirements.  The steps
	described in section 3 can usually be used to correct these
	symptoms if the components meet the standard SCSI
	requirements.  The system will usually continue operating
	normally, even if these errors do occur, because as part of the
	software error recovery, the SCSI data rate is slowed to allow
	reliable operation.

	The maximum total cable length for such devices should be 6
	meters if they properly follow the recommendations of the SCSI
	standards committee.

				CHART OF SYMPTOMS
	   RELATED TO SCSI DEVICES NOT MEETING FAST SCSI REQUIREMENTS

	SunOS 4.1.3
	      Examples of the warning system messages that occur during
		 boot are contained in the appendix to this paper.  The
		 key words of one symptom are:

		 Target 1.0 reducing sync. transfer rate
		 SCSI transport failed: reason 'reset': retrying command

		 Target 1.0 reverting to async. mode
		 SCSI transport failed: reason 'reset': retrying command

		 A second symptom may be:

		 Current command timeout for Target 3 Lun 0
		 Cmd dump for Target 3 Lun 0:

		 Target 3.0 reducing sync. transfer rate
		 SCSI transport failed: reason 'reset': retrying command

		 A third symptom may be:

		 Error for command 'read'
		 Error Level: Retryable
		 Sense Key: Aborted Command
		 Vendor 'XXYYZZ' error code: 0x47

	 Sun Solaris 2.x

	      Examples of the warning system messages that occur during
		 boot are contained in the appendix to this paper.  The
		 key words of one symptom are:

		 WARNING: ....
		      SCSI bus DATA IN phase parity error
		 WARNING: ....
		      Error for command 'read'     Error Level: Retryable
		      Sense Key: Aborted Command
			 ......

		 A second symptom may be:

		 WARNING: ....
		      SCSI transport failed: reason 'timeout':retrying command

	      The present negotiated data rate in kilobytes per second
		 can be determined for a disk by requesting the necessary
		 data with the prtconf command as shown below.  If the
		negotiated rate is lower than expected, error recovery
		procedures may have been executed because of nonconforming
		devices in the configuration.

		 # prtconf -v

			 esp, unit #0
			    Driver software properties:
				 name <target1-sync-speed> length <4>
				    value <0x00002710>.

		 The value 0x00002710 is 10000 kilobytes per second in
		 decimal.
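
The conversion above can be automated.  The following Python sketch is not
part of the InfoDoc; it assumes that the prtconf -v output has exactly the
layout shown in the sample above (real output may differ), and the function
name sync_speeds is hypothetical.

```python
import re

def sync_speeds(prtconf_text):
    """Map target number -> negotiated rate in kilobytes per second."""
    pattern = re.compile(
        r"name <target(\d+)-sync-speed> length <4>\s*value <0x([0-9a-fA-F]+)>")
    return {int(target): int(value, 16)
            for target, value in pattern.findall(prtconf_text)}

# Sample text copied from the prtconf -v excerpt above.
sample = """
esp, unit #0
   Driver software properties:
        name <target1-sync-speed> length <4>
           value <0x00002710>.
"""
print(sync_speeds(sample))   # {1: 10000}
```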

	      If the boot process was not observed, the boot messages
		 are stored in the file /var/adm/messages for reference.
		 The messages can be displayed by performing the command:

		 # dmesg | more
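
The key words from the chart of symptoms can also be searched for in saved
messages.  The Python sketch below is an illustration only: the phrase list
is taken from the symptoms quoted in this section, and the function name
count_symptoms is hypothetical.

```python
# Key phrases taken from the chart of symptoms in this section.
SYMPTOMS = (
    "reducing sync. transfer rate",
    "reverting to async. mode",
    "SCSI transport failed",
    "Sense Key: Aborted Command",
    "phase parity error",
)

def count_symptoms(lines):
    """Count how many lines contain each symptom phrase."""
    counts = {phrase: 0 for phrase in SYMPTOMS}
    for line in lines:
        for phrase in SYMPTOMS:
            if phrase in line:
                counts[phrase] += 1
    return counts

sample_log = [
    "Sep 16 15:53:23 b34a vmunix: esp0: Target 1.0 reducing sync. transfer rate",
    "Sep 16 15:53:23 b34a vmunix: sd1:  SCSI transport failed: reason 'reset':",
    "Sep 16 16:36:51 b34a vmunix: sd3c: Sense Key: Aborted Command",
]
hits = count_symptoms(sample_log)
print({phrase: n for phrase, n in hits.items() if n})
```

To scan a real system, the same function could be given an open file such as
/var/adm/messages in place of sample_log.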

 3.0    METHODS FOR MANAGING FAST SCSI SYSTEMS WITH NONCONFORMING COMPONENTS

	3.1     Follow installation recommendations

	The use of fast SCSI hosts and fast SCSI peripherals provides
	significant performance improvements for some types of
	applications.  To take full advantage of those performance
	improvements, the installation guides for SCSI devices
	recommend that only those components and peripheral devices
	supporting fast SCSI requirements be installed on a fast SCSI
	port.  If nonconforming devices must also be installed on a
	host, a separate SCSI host adapter should be installed and all
	the nonconforming devices should be installed on that SCSI
	port, isolated from all the fast SCSI devices that are running
	on fast SCSI host adapters.

	3.2     Actively terminate SCSI configurations containing the ESM

	The External Storage Module (ESM) is a special case, since it
	conforms to the fast SCSI requirements except for its adapter
	cable and terminator.  The following procedure allows the
	correct termination of the External Storage Module and allows
	correct fast SCSI operation for all fast SCSI devices installed
	on the SCSI port as well as normal synchronous operation for
	the devices installed in the ESM.

	One or two ESMs may be installed in the middle of a string of
	SCSI devices.  Use a Desktop Storage Pack or Desktop Storage
	Module with a regulated terminator (Sun part number
	150-1785-02) as the device farthest away from the host on the
	SCSI port.  Connect the ESMs into the string of SCSI devices
	using 0.8 m Sun cables (Sun part number 530-1829-01,
	Rev.51).  Do not exceed the maximum total cable length of 6
	meters.


	3.3     Slow all SCSI ports to asynchronous operation.

	For all other fast SCSI hosts attaching devices that do not
	conform with the fast SCSI requirements, the operating system
	should be modified to run all SCSI ports in asynchronous mode.
	This slower mode fully interlocks all the SCSI data transfer
	signals and provides for reliable operation of the External
	Storage Module at the end of a SCSI bus.  It allows Sun
	configurations containing both fast SCSI drives and
	nonconforming devices to operate reliably on fast SCSI ports.

	If the system configuration meets the standard SCSI
	requirements,  reliable operation can usually be provided
	with third-party components and peripherals as well.  The
	slower data rate applies to all SCSI ports on the system.  Some
	applications may show a decrease in performance because of the
	slower data rate.

     For SunOS 4.1.x:

		 To change to the slower asynchronous data rate, type:

			 adb -w /vmunix
			 scsi_options?W 58
			 $q

		 then reboot the system.

		To turn synchronous transfer back on at the
		highest possible speed, use the same procedure,
		replacing the middle line with:

			scsi_options?W 178

      For Solaris 2.x:

		To change to the slower asynchronous data rate,
		add the following line to /etc/system file:

			 set scsi_options = 0x58

		then reboot the system.

		To turn synchronous transfer back on at the
		highest possible speed without using tagged
		queueing, change the scsi_options line to:

			set scsi_options = 0x178

		To turn synchronous transfer back on at the
		highest possible speed allowing tagged queueing
		(if available in the operating system),
		change the scsi_options line to:

			set scsi_options = 0x1f8
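
The three scsi_options values used in this section differ only in a few
bits.  The Python sketch below is not part of the InfoDoc.  The bit names
are an assumption based on the Solaris SCSI option flag definitions and
should be verified against the headers shipped with the OS release in use;
the function name decode_scsi_options is hypothetical.

```python
# ASSUMED bit assignments; verify against your system's SCSI header files.
SCSI_OPTION_BITS = {
    0x008: "disconnect/reconnect",
    0x010: "linked commands",
    0x020: "synchronous transfer",
    0x040: "parity",
    0x080: "tagged queueing",
    0x100: "fast SCSI",
}

def decode_scsi_options(value):
    """List the names of the option bits that are set in value."""
    return [name for bit, name in sorted(SCSI_OPTION_BITS.items())
            if value & bit]

for value in (0x58, 0x178, 0x1F8):
    print(hex(value), decode_scsi_options(value))
```

Under that assumed bit layout, 0x58 leaves the synchronous and fast bits
clear, 0x178 sets both, and 0x1f8 additionally sets the tagged queueing bit,
which is consistent with the descriptions given above for each value.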




				 APPENDIX A

		      SAMPLES OF 4.1.3 ERROR MESSAGES


 In this example, target 1 (sd1 on esp0) is a fast SCSI disk:

 Sep 16 15:53:23 b34a vmunix: esp0: Target 1.0 reducing sync. transfer rate
 Sep 16 15:53:23 b34a vmunix: sd1:  SCSI transport failed: reason 'reset':
				    retrying command
 Sep 16 15:53:23 b34a vmunix: esp0: Current command timeout for Target 1 Lun 0
 Sep 16 15:53:23 b34a vmunix: esp0: State=DATA_DONE (0xa), Last State=DATA (0x9)
 Sep 16 15:53:23 b34a vmunix: esp0: Cmd dump for Target 1 Lun 0:
 Sep 16 15:53:23 b34a vmunix: esp0: cdb=[0x8 0x0 0x7e 0x0 0x10 0x0 0x0 0x0
					  0x0 0x0]
 Sep 16 15:53:23 b34a vmunix: esp0: Target 1.0 reverting to async. mode
 Sep 16 15:53:23 b34a vmunix: sd1:  SCSI transport failed: reason 'reset':
				    retrying command
 or

 Sep 16 15:57:41 b34a vmunix: sd3 at esp0 target 0 lun 0
 Sep 16 15:57:41 b34a vmunix: sd3: <SUN0669 cyl 1614 alt 2 hd 15 sec 54>
 Sep 16 16:01:12 b34a vmunix: esp0: Current command timeout for Target 3 Lun 0
 Sep 16 16:01:12 b34a vmunix: esp0: State=DATA_DONE (0xa), Last State=DATA (0x9)
 Sep 16 16:01:12 b34a vmunix: esp0: Cmd dump for Target 3 Lun 0:
 Sep 16 16:01:12 b34a vmunix: esp0: cdb=[0x8 0x0 0x0 0x0 0x7e 0x0 0x0 0x0
					  0x0 0x0]
 Sep 16 16:01:12 b34a vmunix: esp0: Target 3.0 reducing sync. transfer rate
 Sep 16 16:01:12 b34a vmunix: sd0:  SCSI transport failed: reason 'reset':
				    retrying command
 Sep 16 16:01:12 b34a vmunix: sd1:  SCSI transport failed: reason 'reset':
				    retrying command
or

 Sep 16 16:36:51 b34a vmunix: sd3c: Error for command 'read'
 Sep 16 16:36:51 b34a vmunix: sd3c: Error Level: Retryable
 Sep 16 16:36:51 b34a vmunix: sd3c: Block 1386, Absolute Block: 1386
 Sep 16 16:36:51 b34a vmunix: sd3c: Sense Key: Aborted Command
 Sep 16 16:36:51 b34a vmunix: sd3c: Vendor 'MICROP' error code: 0x47


		 SAMPLES OF SOLARIS 2.x ERROR MESSAGES


 In this example internal disk 1 (target 1) is a 10 MB/sec disk:

 WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000 (esp0):
	   SCSI bus DATA IN phase parity error

 WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@1,0 (sd1):
	 Error for command 'read' Error Level: Retryable
	 Block 59640, Absolute Block: 59640
	 Sense Key: Aborted Command
	 Vendor 'SEAGATE' error code: 0x48 (<unknown extended sense code 0x48>), 0x0

 or:

 WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@1,0 (sd1):
	   SCSI transport failed: reason 'timeout': retrying command


			    APPENDIX   B

 TABLE OF DEVICES, SYSTEMS, AND THEIR FAST-SCSI CHARACTERISTICS

 SYSTEMS AND HOST ADAPTERS

 Official Name                                           SCSI Data Rate

   SPARCsystem 10                                         fast SCSI

	 424 Megabyte internal Disk                      5 MByte SCSI
	 1.05 Gigabyte internal Disk                     fast SCSI

   SPARCstation 1                                        5 MByte SCSI
   SPARCstation 1+                                       5 MByte SCSI
   SPARCstation IPC                                      5 MByte SCSI
   SPARCstation SLC                                      5 MByte SCSI
   SPARCstation IPX                                      5 MByte SCSI
   SPARCstation ELC                                      5 MByte SCSI
   SPARCstation 2                                        5 MByte SCSI
   SPARCserver  4/330                                    5 MByte SCSI
   SPARCserver  4/370                                    5 MByte SCSI
   SPARCserver  4/390                                    5 MByte SCSI
   SPARCserver  630MP                                    presently fast SCSI
   SPARCserver  670MP                                    presently fast SCSI
   SPARCserver  690MP                                    presently fast SCSI
   SBus SCSI Host Adapter                                5 MByte SCSI
   SBE/S Host Adapter                                    5 MByte SCSI
   FSBE/S Host Adapter                                   fast SCSI
   DSBE/S Host Adapter                                   differential fast SCSI

 PERIPHERALS

 Official Name                           Common Name     SCSI Data Rate

   Desktop Storage Pack                  Lunchbox

	 207 Megabyte Disk                               5 MByte SCSI
	 424 Megabyte Disk                               5 MByte SCSI
	 Sun CD ROM                                      5 MByte SCSI
	 150 Megabyte 1/4" Tape                          5 MByte SCSI

   Desktop Storage Module                Dinnerbox

	 1.3 Gigabyte Disk                               5 MByte SCSI
	 2.3 Gigabyte 8 mm Tape Drive                    5 MByte SCSI
	 5.0 Gigabyte 8 mm Tape Drive                    5 MByte SCSI

   SCSI Expansion Pedestal               Bullwinkle

	 1.3 Gigabyte Disk                               5 MByte SCSI
	 2.3 Gigabyte 8 mm Tape Drive                    5 MByte SCSI
	 5.0 Gigabyte 8 mm Tape Drive                    5 MByte SCSI
	 Sun CD ROM                                      5 MByte SCSI
	 2.1 Gigabyte Disk                               differential fast SCSI

   Differential SCSI Data Center Disk Tray   Tarzan
	 2.1 Gigabyte Disk                               differential fast SCSI

   Front Load Tape Drive 1/2" tape                       5 MByte SCSI

   External Storage Module               P-Box           5 MByte SCSI

KEYWORDS           : SCSI configurations, using single-ended devices

PRODUCT            : Prphl