Problem with SAS disks

mr44er · 20 February 2018

Hi!

I've got two SAS disks here. One of them spits out the following in dmesg:

Code:

(da3:mps0:0:5:0): Descriptor 0x80: f7 01
(da3:mps0:0:5:0): Descriptor 0x81: 00 00 01 00 00 00
(da3:mps0:0:5:0): Retrying command (per sense data)
(da3:mps0:0:5:0): READ(6). CDB: 08 00 00 00 01 00
(da3:mps0:0:5:0): CAM status: SCSI Status Error
(da3:mps0:0:5:0): SCSI status: Check Condition
(da3:mps0:0:5:0): SCSI sense: MEDIUM ERROR asc:31,0 (Medium format corrupted)
(da3:mps0:0:5:0):
(da3:mps0:0:5:0): Field Replaceable Unit: 0
(da3:mps0:0:5:0): Command Specific Info: 0
(da3:mps0:0:5:0):
(da3:mps0:0:5:0): Descriptor 0x80: f7 01
(da3:mps0:0:5:0): Descriptor 0x81: 00 00 01 00 00 00
(da3:mps0:0:5:0): Error 5, Retries exhausted

The other one:

Code:

(da0:mps0:0:1:0): Descriptor 0x80: f8 21
(da0:mps0:0:1:0): Descriptor 0x81: 00 00 00 00 00 00
(da0:mps0:0:1:0): Error 22, Unretryable error
(da0:mps0:0:1:0): READ(10). CDB: 28 00 e8 e0 88 af 00 00 01 00
(da0:mps0:0:1:0): CAM status: SCSI Status Error
(da0:mps0:0:1:0): SCSI status: Check Condition
(da0:mps0:0:1:0): SCSI sense: ILLEGAL REQUEST asc:20,0 (Invalid command operation code)
(da0:mps0:0:1:0):
(da0:mps0:0:1:0): Field Replaceable Unit: 0
(da0:mps0:0:1:0): Command Specific Info: 0
(da0:mps0:0:1:0):
(da0:mps0:0:1:0): Descriptor 0x80: f8 21
(da0:mps0:0:1:0): Descriptor 0x81: 00 00 00 00 00 00
(da0:mps0:0:1:0): Error 22, Unretryable error
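The CDB lines in these logs show which block each failed read was aimed at. Here is a minimal Python sketch that decodes the two CDBs above. It assumes the standard SBC layouts: READ(6) has opcode 0x08, a 21-bit LBA in bytes 1-3 and the length in byte 4, and READ(10) has opcode 0x28, a 32-bit LBA in bytes 2-5 and the length in bytes 7-8. `decode_read_cdb` is a hypothetical helper and is not part of camcontrol or CAM.

```python
from typing import Tuple


def decode_read_cdb(hex_bytes: str) -> Tuple[str, int, int]:
    """Return (command name, LBA, number of blocks) for a READ(6) or READ(10) CDB.

    hex_bytes is the CDB as printed by CAM, e.g. "08 00 00 00 01 00".
    """
    cdb = bytes.fromhex(hex_bytes)  # spaces between the hex pairs are ignored
    opcode = cdb[0]
    if opcode == 0x08:
        # READ(6): 21-bit LBA in bytes 1-3 (top 3 bits of byte 1 are reserved)
        lba = int.from_bytes(cdb[1:4], "big") & 0x1FFFFF
        # a transfer length of 0 means 256 blocks for READ(6)
        length = cdb[4] or 256
        return "READ(6)", lba, length
    if opcode == 0x28:
        # READ(10): 32-bit LBA in bytes 2-5, 16-bit transfer length in bytes 7-8
        lba = int.from_bytes(cdb[2:6], "big")
        length = int.from_bytes(cdb[7:9], "big")
        return "READ(10)", lba, length
    raise ValueError(f"unsupported opcode 0x{opcode:02x}")


if __name__ == "__main__":
    print(decode_read_cdb("08 00 00 00 01 00"))              # ('READ(6)', 0, 1)
    print(decode_read_cdb("28 00 e8 e0 88 af 00 00 01 00"))  # ('READ(10)', 3907029167, 1)
```

Decoded this way, da3 failed on block 0 and da0 on block 3,907,029,167. smartctl reports 2,000,398,934,016 bytes for da0, and 2,000,398,934,016 / 512 = 3,907,029,168 blocks, so that is the very last block of the disk.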

The disks are 'used', but they only have about 200 hours on them.

Code:

smartctl -a /dev/da3
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-RELEASE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:  HITACHI
Product:  HUS723020ALS641
Revision:  MS06
Compliance:  SPC-4
Rotation Rate:  7200 rpm
Form Factor:  3.5 inches
Logical Unit id:  0x5000cca01b8ef204
Serial number:  YFJJM1YD
Device type:  disk
Transport protocol:  SAS (SPL-3)
Local Time is:  Tue Feb 20 18:48:31 2018 CET
device is NOT READY (e.g. spun down, busy)
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

On /dev/da3 I'm currently running a 'camcontrol format da3', hoping it helps. I assume that's also why reading SMART data doesn't work right now.

Code:

smartctl -a /dev/da0
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-RELEASE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:  HITACHI
Product:  HUS723020ALS641
Revision:  MS06
Compliance:  SPC-4
User Capacity:  2,000,398,934,016 bytes [2.00 TB]
Logical block size:  512 bytes
Formatted with type 2 protection
Rotation Rate:  7200 rpm
Form Factor:  3.5 inches
Logical Unit id:  0x5000cca01b8e7158
Serial number:  YFJJAH9D
Device type:  disk
Transport protocol:  SAS (SPL-3)
Local Time is:  Tue Feb 20 18:47:52 2018 CET
SMART support is:  Available - device has SMART capability.
SMART support is:  Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:  29 C
Drive Trip Temperature:  55 C

Manufactured in week 48 of year 2012
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  19
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  23
Elements in grown defect list: 0

Vendor (Seagate) cache information
  Blocks sent to initiator = 994519810048

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0         36         25.988           0
write:         0        0         0         0       1591         20.014           0
verify:        0        0         0         0        145         77.352           0

Non-medium error count:  0

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -     200                 - [-   -    -]

Long (extended) Self Test duration: 6 seconds [0.1 minutes]

Controller info (flashed to IT mode, because ZFS):

Code:

mps0 Adapter:
  Board Name: H220
  Board Assembly: H3-25278-05D
  Chip Name: LSISAS2308
  Chip Revision: ALL
  BIOS Revision: 7.39.02.00
  Firmware Revision: 20.00.07.00
  Integrated RAID: no

Am I doing something wrong here? I'm still putting the box together and testing; it's not in production yet.

mr44er · 20 February 2018

Update: Formatted both disks, and the error messages are gone. My guess: some odd sector size had probably been put on them.

double-p · 21 February 2018

I'd still keep an eye on them, though. I've had "fresh from the factory" disks die completely in under 20 hours of operation.

mr44er · 21 February 2018

Yep, this is going to be a RAIDZ3, so I'm fairly relaxed about it.

mr44er · 22 February 2018

Update: I've been shoveling data onto the pool for over 12 hours now, and it's running like clockwork.

mr44er · 22 February 2018

Update2:

The cause kept nagging at me, especially since the disks now run flawlessly, and then I stumbled over something that's easy to miss:

Formatted with type 2 protection

smartctl printed this before the format. (It is/was located between "Logical block size" and "Rotation Rate".)
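If you have more used disks to check, you can look for that line before a disk goes into the pool. Below is a minimal Python sketch. It assumes the text has already been captured from `smartctl -a /dev/daX`, and that the line only shows up when protection is enabled, as in the da0 output above. `protection_type` is a hypothetical helper name, not a smartctl feature.

```python
import re
from typing import Optional

# Matches the line smartctl printed for da0 above, e.g.
# "Formatted with type 2 protection" (leading whitespace tolerated).
_PROT_RE = re.compile(r"^\s*Formatted with type (\d+) protection", re.MULTILINE)


def protection_type(smartctl_output: str) -> Optional[int]:
    """Return the T10 PI type found in smartctl output, or None if the line is absent."""
    match = _PROT_RE.search(smartctl_output)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    sample = (
        "Logical block size:  512 bytes\n"
        "Formatted with type 2 protection\n"
        "Rotation Rate:  7200 rpm\n"
    )
    print(protection_type(sample))  # 2
```

A `None` result only means the line was not found. It is not proof that the disk has no protection information, so treat it as a quick filter.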

A quick Google search turned up this:
http://talesinit.blogspot.de/2015/11/formatted-with-type-2-protection-huh.html

Not knowing what this was, I then went down a seemingly never ending spiral of T10 Protection Information [PDF] standards. It's pretty neat, how I understand it is the disk controller formats the platters to 520 byte sectors, instead of the more traditional 512 byte sectors, these 8 extra bytes per sector are there for the controller to make sure that the data written to that sector is the same data that is read from it, sort of like data verification. The disk controller can then present the system (HBA controller or raid card) with the normal 512 bytes of data per sector, and any SCSI compatible controller should be able to read and write to it just fine.
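To make those 8 extra bytes concrete, here is a rough Python sketch of one 520-byte sector. It assumes the T10 DIF tuple layout, all big-endian: a 2-byte guard tag (CRC-16 over the 512 data bytes, polynomial 0x8BB7), a 2-byte application tag and a 4-byte reference tag. The function names are hypothetical, and the code only illustrates the check; it does not reflect how drive firmware or the HBA actually implement it. As far as I know, putting the LBA into the reference tag matches type 1. With type 2, the initial reference tag comes from the 32-byte CDB instead.

```python
import struct

SECTOR_DATA = 512  # bytes per logical block that the host sees
PI_BYTES = 8       # protection information appended to each block


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16/T10-DIF: poly 0x8BB7, init 0, no reflection, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def make_protected_sector(data: bytes, app_tag: int, ref_tag: int) -> bytes:
    """Append an 8-byte PI tuple (guard, application tag, reference tag) to 512 data bytes."""
    if len(data) != SECTOR_DATA:
        raise ValueError("expected exactly 512 data bytes")
    pi = struct.pack(">HHI", crc16_t10dif(data), app_tag & 0xFFFF, ref_tag & 0xFFFFFFFF)
    return data + pi


def verify_protected_sector(sector: bytes) -> bool:
    """Recompute the guard CRC over the data part and compare it with the stored guard tag."""
    if len(sector) != SECTOR_DATA + PI_BYTES:
        raise ValueError("expected a 520-byte sector")
    data, pi = sector[:SECTOR_DATA], sector[SECTOR_DATA:]
    guard, _app_tag, _ref_tag = struct.unpack(">HHI", pi)
    return crc16_t10dif(data) == guard


if __name__ == "__main__":
    sector = make_protected_sector(bytes(range(256)) * 2, app_tag=0, ref_tag=0)
    print(len(sector))                      # 520
    print(verify_protected_sector(sector))  # True
    damaged = bytearray(sector)
    damaged[100] ^= 0xFF                    # flip one data byte
    print(verify_protected_sector(bytes(damaged)))  # False
```

This also fits the guess about the "odd sector size": the host still sees 512-byte blocks, but the medium holds 520 bytes per block.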

So I acted correctly on instinct and also guessed right about the sector size.
