SSD keeps getting ejected/detached

gadean

Hi everyone,
I've got a problem again and haven't been able to pin it down so far.
For reasons I can't figure out, the SSD gets detached at irregular intervals.
The SSD serves as the log device in a ZFS pool and is attached through an additional SATA controller, since all ports on the mainboard are in use.

I've checked the cabling and also replaced the cable as a precaution. For the first few months this setup ran without problems.
The following controller is installed: HighPoint Rocket 620
Code:
pcib1: <ACPI PCI-PCI bridge> irq 17 at device 28.0 on pci0
pci1: <ACPI PCI bus> on pcib1
ahci0: <Marvell 88SE912x AHCI SATA controller> port 0xd080-0xd087,0xd000-0xd003,0xcc00-0xcc07,0xc880-0xc883,0xc800-0xc80f mem 0xfeaff000-0xfeaff7ff irq 16 at device 0.0 on pci1
ahci0: AHCI v1.00 with 2 6Gbps ports, Port Multiplier supported with FBS
ahci0: quirks=0x140<EDGEIS,NOBSYRES>
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
atapci0: <Marvell 88SE912x UDMA133 controller> port 0xdc00-0xdc07,0xd880-0xd883,0xd800-0xd807,0xd480-0xd483,0xd400-0xd40f mem 0xfeaffc00-0xfeaffc0f irq 17 at device 0.1 on pci1
S.M.A.R.T. data of the SSD (I aborted the short test and started a long one):
Code:
# smartctl -a /dev/ada0
smartctl 6.3 2014-07-26 r3976 [FreeBSD 10.0-RELEASE-p3 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Marvell based SanDisk SSDs
Device Model:     SanDisk SD6SB1M064G1022I
Serial Number:    14074*******
LU WWN Device Id: 5 001b44 bd0b8a2bc
Firmware Version: X231600
User Capacity:    64,023,257,088 bytes [64.0 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      Unknown (0x000a)
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Apr 19 21:57:40 2015 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  10) minutes.

SMART Attributes Data Structure revision number: 4
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   ---    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   124   100   ---    Old_age   Always       -       124
 12 Power_Cycle_Count       0x0032   100   100   ---    Old_age   Always       -       23
166 Min_W/E_Cycle           0x0032   100   100   ---    Old_age   Always       -       1
167 Min_Bad_Block/Die       0x0032   100   100   ---    Old_age   Always       -       15
168 Maximum_Erase_Cycle     0x0032   100   100   ---    Old_age   Always       -       905
169 Total_Bad_Block         0x0032   100   100   ---    Old_age   Always       -       70
171 Program_Fail_Count      0x0032   100   100   ---    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   ---    Old_age   Always       -       0
173 Avg_Write_Erase_Ct      0x0032   100   100   ---    Old_age   Always       -       312
174 Unexpect_Power_Loss_Ct  0x0032   100   100   ---    Old_age   Always       -       18
187 Reported_Uncorrect      0x0032   100   100   ---    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   063   038   ---    Old_age   Always       -       37 (Min/Max 24/38)
212 SATA_PHY_Error          0x0032   100   100   ---    Old_age   Always       -       0
230 Perc_Write_Erase_Count  0x0032   100   100   ---    Old_age   Always       -       2600
232 Perc_Avail_Resrvd_Space 0x0033   100   100   004    Pre-fail  Always       -       100
233 Total_NAND_Writes_GiB   0x0032   100   100   ---    Old_age   Always       -       20309
241 Total_Writes_GiB        0x0030   253   253   ---    Old_age   Offline      -       2
242 Total_Reads_GiB         0x0030   253   253   ---    Old_age   Offline      -       2
243 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       124         -
# 2  Short offline       Aborted by host               00%       123         -

Selective Self-tests/Logging not supported
Output that keeps recurring on the console/in dmesg:
Code:
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <SanDisk SD6SB1M064G1022I X231600> ATA-8 SATA 3.x device
ada0: Serial Number 14074*******
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)
ada0: Command Queueing enabled
ada0: 61057MB (125045424 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad4
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <SanDisk SD6SB1M064G1022I X231600> s/n 14074******* detached
(ada0:ahcich0:0:0:0): Periph destroyed
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <SanDisk SD6SB1M064G1022I X231600> ATA-8 SATA 3.x device
ada0: Serial Number 14074*******
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)
ada0: Command Queueing enabled
ada0: 61057MB (125045424 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad4
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <SanDisk SD6SB1M064G1022I X231600> s/n 14074******* detached
(ada0:ahcich0:0:0:0): Periph destroyed
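
To get a feel for how often this happens, the attach/detach messages can be counted per device. The following is only a rough sketch: the function name `count_events` and the two regular expressions are my own, written against the message format shown above, and other FreeBSD versions or drivers may word these lines differently.

```python
import re

# Patterns modelled on the dmesg lines quoted above (an assumption, not an
# official message format):
#   ada0: <SanDisk ... X231600> ATA-8 SATA 3.x device      -> attach
#   ada0: <SanDisk ... X231600> s/n 14074******* detached  -> detach
ATTACH_RE = re.compile(r"^(\w+): <[^>]*> .* device$")
DETACH_RE = re.compile(r"^(\w+): <[^>]*> s/n \S+ detached$")

SAMPLE = """\
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <SanDisk SD6SB1M064G1022I X231600> ATA-8 SATA 3.x device
ada0: Serial Number 14074*******
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <SanDisk SD6SB1M064G1022I X231600> s/n 14074******* detached
(ada0:ahcich0:0:0:0): Periph destroyed
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <SanDisk SD6SB1M064G1022I X231600> ATA-8 SATA 3.x device
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <SanDisk SD6SB1M064G1022I X231600> s/n 14074******* detached
(ada0:ahcich0:0:0:0): Periph destroyed
"""


def count_events(log_text):
    """Return {device: {"attached": n, "detached": m}} for the given log text."""
    counts = {}
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        for key, pattern in (("attached", ATTACH_RE), ("detached", DETACH_RE)):
            match = pattern.match(line)
            if match:
                device = match.group(1)
                entry = counts.setdefault(device, {"attached": 0, "detached": 0})
                entry[key] += 1
                break
    return counts


if __name__ == "__main__":
    print(count_events(SAMPLE))
```

Feeding it the contents of the system log instead of `SAMPLE` should show whether only ada0 is affected or whether other devices drop out as well.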

Do you have any tips or pointers? Am I overlooking something?
I'm grateful for any hints.

Regards
 
Hi,

this could be a detach caused by power loss or voltage fluctuation. Check the power supply and the cables to make sure everything is OK there. While you're at it, also check whether the installed firmware version is the latest one.
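
One way to test the power theory: the SMART output in the first post shows Unexpect_Power_Loss_Ct at 18 against a Power_Cycle_Count of 23. If that counter goes up every time the drive is detached, that might point towards power; if it stays put, the link or the controller seems more likely. How this particular firmware counts such events is an assumption on my part. A minimal sketch for comparing two `smartctl -a` outputs (the helper names `parse_raw_values` and `changed_counters` are made up for this example):

```python
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
 12 Power_Cycle_Count       0x0032   100   100   ---    Old_age   Always       -       23
174 Unexpect_Power_Loss_Ct  0x0032   100   100   ---    Old_age   Always       -       18
194 Temperature_Celsius     0x0022   063   038   ---    Old_age   Always       -       37 (Min/Max 24/38)
212 SATA_PHY_Error          0x0032   100   100   ---    Old_age   Always       -       0
"""


def parse_raw_values(smart_output):
    """Map attribute name -> integer raw value from a smartctl attribute table."""
    values = {}
    for line in smart_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the tenth column is the (first token of the) raw value.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            values[fields[1]] = int(fields[9])
    return values


def changed_counters(before, after):
    """Return {name: difference} for attributes whose raw value changed."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }


if __name__ == "__main__":
    snapshot = parse_raw_values(SAMPLE)
    print(snapshot["Unexpect_Power_Loss_Ct"], snapshot["Power_Cycle_Count"])
```

Take one snapshot right after boot and another after the next detach, then pass both parsed dictionaries to `changed_counters`.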

Regards, Bummibär
 
OK, so I can rule out the power supply as well (I replaced it); the problem still occurs. The firmware should be current (2012).
Any other suggestions?
 
Sorry, I completely forgot: only this one drive is attached to the controller, and I had already swapped the port as well.
 