Hard disk problems and corrupt UFS2

schorsch_76

FreeBSD Fanboy
Hello BSD colleagues,

On my NAS I have this disk running as a backup disk. The SMART values themselves show no problems, but the UFS keeps getting corrupted. There are, however, plenty of DMA errors.

What can I do here? Replace the mainboard? I have already put in a new cable.... :confused:



Code:
root@nas-dsm:/mnt # smartctl -a /dev/ada1
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-RELEASE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Samsung SpinPoint M8 (AF)
Device Model:     ST1000LM024 HN-M101MBB
Serial Number:    S2R8J9GCA06016
LU WWN Device Id: 5 0004cf 208b663f2
Firmware Version: 2AR10002
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 1.5 Gb/s)
Local Time is:    Mon Sep 24 21:37:42 2018 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (13560) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 226) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       1
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   089   089   025    Pre-fail  Always       -       3469
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1781
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       1504
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       11
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1062
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       5
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   062   055   000    Old_age   Always       -       38 (Min/Max 14/55)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   252   252   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   098   098   000    Old_age   Always       -       1359
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       796
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       11
225 Load_Cycle_Count        0x0032   097   097   000    Old_age   Always       -       33429

SMART Error Log Version: 1
ATA Error Count: 28 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 28 occurred at disk power-on lifetime: 1462 hours (60 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 31 71 46 4d e0  Error: ICRC, ABRT at LBA = 0x004d4671 = 5064305

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ca 00 40 62 46 4d e0 00      00:19:28.824  WRITE DMA
  c6 00 10 00 00 00 e0 00      00:19:28.824  SET MULTIPLE MODE
  ef 03 45 00 00 00 e0 00      00:19:28.824  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 e0 00      00:19:28.824  IDENTIFY DEVICE
  ec 00 00 00 00 00 e0 00      00:19:28.824  IDENTIFY DEVICE

Error 27 occurred at disk power-on lifetime: 1462 hours (60 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 31 71 46 4d e0  Error: ICRC, ABRT at LBA = 0x004d4671 = 5064305

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ca 00 40 62 46 4d e0 00      00:19:28.674  WRITE DMA
  ca 00 40 22 46 4d e0 00      00:19:28.673  WRITE DMA
  ca 00 40 22 46 4d e0 00      00:19:28.673  WRITE DMA
  ca 00 40 22 46 4d e0 00      00:19:28.673  WRITE DMA
  ca 00 40 22 46 4d e0 00      00:19:28.673  WRITE DMA

Error 26 occurred at disk power-on lifetime: 1462 hours (60 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 31 31 46 4d e0  Error: ICRC, ABRT at LBA = 0x004d4631 = 5064241

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ca 00 40 22 46 4d e0 00      00:19:28.673  WRITE DMA
  ca 00 40 22 46 4d e0 00      00:19:28.673  WRITE DMA
  ca 00 40 22 46 4d e0 00      00:19:28.673  WRITE DMA
  ca 00 40 22 46 4d e0 00      00:19:28.673  WRITE DMA
  ca 00 40 22 46 4d e0 00      00:19:28.673  WRITE DMA

Error 25 occurred at disk power-on lifetime: 1462 hours (60 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 31 31 46 4d e0  Error: ICRC, ABRT at LBA = 0x004d4631 = 5064241

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ca 00 40 22 46 4d e0 00      00:19:28.673  WRITE DMA
  ca 00 40 22 46 4d e0 00      00:19:28.673  WRITE DMA
  ca 00 40 22 46 4d e0 00      00:19:28.673  WRITE DMA
  ca 00 40 22 46 4d e0 00      00:19:28.673  WRITE DMA
  ca 00 40 e2 45 4d e0 00      00:19:28.673  WRITE DMA

Error 24 occurred at disk power-on lifetime: 1462 hours (60 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 31 31 46 4d e0  Error: ICRC, ABRT at LBA = 0x004d4631 = 5064241

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ca 00 40 22 46 4d e0 00      00:19:28.673  WRITE DMA
  ca 00 40 22 46 4d e0 00      00:19:28.673  WRITE DMA
  ca 00 40 22 46 4d e0 00      00:19:28.673  WRITE DMA
  ca 00 40 e2 45 4d e0 00      00:19:28.673  WRITE DMA
  ca 00 40 e2 45 4d e0 00      00:19:28.673  WRITE DMA

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      1504         -
# 2  Offline             Completed without error       00%         0         -
# 3  Offline             Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Completed [00% left] (0-65535)
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 
Here is the error message from the unclean UFS:

Code:
root@nas-dsm:~ # ./open-backup.sh
mount: /dev/gpt/GELI-NAS-Backup.eli: R/W mount of /mnt/USB3.0-1T denied. Filesystem is not clean - run fsck.: Operation not permitted
root@nas-dsm:~ # cat open-backup.sh
#!/bin/sh

KEY=/root/scripts/nas-backup.key
TARGET_DISK=/dev/gpt/GELI-NAS-Backup
TARGET_PATH=/mnt/USB3.0-1T

geli attach -p -k "$KEY" "$TARGET_DISK"
mount "$TARGET_DISK".eli "$TARGET_PATH"

root@nas-dsm:~ # fsck_ufs -y /dev/gpt/GELI-NAS-Backup.eli
** /dev/gpt/GELI-NAS-Backup.eli
** Last Mounted on /mnt/USB3.0-1T
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? yes

SUMMARY INFORMATION BAD
SALVAGE? yes

BLK(S) MISSING IN BIT MAPS
SALVAGE? yes

3 files, 26684426 used, 209837413 free (5 frags, 26229676 blocks, 0.0% fragmentation)

***** FILE SYSTEM MARKED CLEAN *****

***** FILE SYSTEM WAS MODIFIED *****
root@nas-dsm:~ # mount /dev/gpt/GELI-NAS-Backup
GELI-NAS-Backup%     GELI-NAS-Backup.eli%
root@nas-dsm:~ # mount /dev/gpt/GELI-NAS-Backup.eli /mnt/USB3.0-1T/
root@nas-dsm:~ # ls /mnt/USB3.0-1T/
lost+found              zroot@2018-09-25.zfs
root@nas-dsm:~ # ls /mnt/USB3.0-1T/lost+found/
 
Hi,

Try swapping the SATA cable and cleaning the connectors.

If possible, also switch to a different SATA port on the board as a test.

holger
 
It was already mentioned that he swapped the cable. The actual error message would be interesting. What does dmesg say, and what is the output of fsck?
Could it be that the file system was never "properly" repaired? That is, running fsck on the disk from a live system repeatedly until no more errors show up?
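
The "run fsck until nothing more comes up" idea could be sketched roughly like this. It is only a sketch: it assumes the file system is not mounted, and it treats the "FILE SYSTEM WAS MODIFIED" line from the fsck_ufs output posted earlier as the sign that another pass is needed. The names `fsck_until_clean` and `FSCK` and the pass limit are made up for illustration.

```shell
#!/bin/sh
# Sketch: repeat fsck until a pass no longer modifies the file system.
# FSCK and fsck_until_clean are hypothetical names, not part of FreeBSD.
FSCK=${FSCK:-fsck_ufs -y}

fsck_until_clean() {
    dev=$1
    max=${2:-5}          # give up after this many passes (arbitrary limit)
    pass=1
    while [ "$pass" -le "$max" ]; do
        # Unquoted on purpose so "fsck_ufs -y" splits into command + flag.
        out=$($FSCK "$dev" 2>&1)
        printf '%s\n' "$out"
        case $out in
            *"FILE SYSTEM WAS MODIFIED"*) pass=$((pass + 1)) ;;
            *) echo "pass $pass: no modifications reported"; return 0 ;;
        esac
    done
    echo "still modified after $max passes" >&2
    return 1
}

# Usage (device name taken from the thread):
# fsck_until_clean /dev/gpt/GELI-NAS-Backup.eli
```

Note that a pass which fails for some other reason (for example a missing device) also ends the loop, so the printed fsck output still needs a look.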
 
I posted the output of fsck_ufs above. After that no more errors appear, or rather FreeBSD reports nothing further for the time being. When I then run another backup and try to push 100 GB of new data onto the disk, the UFS is corrupt again afterwards.

I haven't looked at dmesg yet. I'll do that this evening :)

Since this box is about 10 years old, I suspect that either a) the mainboard is giving up the ghost or b) the power supply is no longer good. Unfortunately it is a box similar to an HP MicroServer, with a non-standard power supply.
 
Code:
mount: /dev/gpt/GELI-NAS-Backup.eli: R/W mount of /mnt/USB3.0-1T denied. Filesystem is not clean - run fsck.: Operation not permitted
The fact that the file system is marked as unclean strongly suggests that it is not being unmounted cleanly. So either no unmount is performed at all, or it never reaches the disk. That is where I would keep digging.
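
To rule out the "never unmounted" case, a counterpart to the open-backup.sh shown earlier might look like the sketch below. The script is hypothetical (the thread never shows how the disk gets closed). The paths are copied from open-backup.sh, `umount` and `geli detach` are the regular FreeBSD commands, and the `RUN` switch and the function name are made up for illustration.

```shell
#!/bin/sh
# Hypothetical close-backup.sh, mirroring open-backup.sh from the thread.
TARGET_DISK=${TARGET_DISK:-/dev/gpt/GELI-NAS-Backup}
TARGET_PATH=${TARGET_PATH:-/mnt/USB3.0-1T}
RUN=${RUN:-}    # set RUN=echo to print the commands instead of running them

close_backup() {
    # Flush dirty buffers, then unmount; stop if the unmount fails so the
    # GELI provider is never detached under a mounted file system.
    $RUN sync || return 1
    $RUN umount "$TARGET_PATH" || return 1
    $RUN geli detach "$TARGET_DISK".eli
}

# Usage:
# close_backup || echo "unmount failed, file system may stay dirty" >&2
```

If the unmount succeeds and the file system is still unclean on the next attach, the write errors themselves are the more likely cause.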
 
FreeBSD reports nothing further for the time being. When I then run another backup and try to push 100 GB of new data onto the disk, the UFS is corrupt again afterwards.
If it is that reproducible, you can run all sorts of tests to narrow down the fault. For example:
- can you use a different port on the mainboard?
- can you try a different disk instead? On the mainboard or externally (which could then rule out the power supply)?
- can you use a different power supply?
- mount and unmount manually instead of using the script
And then simply try out whether and when the error occurs. Maybe there are further options, such as trying a different FS (although I can't see right now what sense that would make). The real trouble is sporadic errors that you cannot pin down.
After all, the other disks in the NAS seem to run without problems and the system stays stable. With a defective power supply or mainboard I would expect problems there as well.
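
For each of these tests it may help to compare the drive's UDMA_CRC_Error_Count (attribute 199 in the smartctl output above) before and after. A counter that keeps rising during the test points at the link between disk and controller rather than at the platters. A minimal sketch, assuming the attribute table layout shown in the first post; `crc_count` is a made-up helper name:

```shell
#!/bin/sh
# Sketch: read the raw value of SMART attribute 199 (UDMA_CRC_Error_Count)
# from "smartctl -A" style output on stdin. crc_count is a made-up helper.
crc_count() {
    # Column 10 of the attribute table is RAW_VALUE.
    awk '$1 == 199 { print $10 }'
}

# Usage around a test run (device name from the thread):
# before=$(smartctl -A /dev/ada1 | crc_count)
# ... copy test data to the disk ...
# after=$(smartctl -A /dev/ada1 | crc_count)
# echo "new CRC errors: $((after - before))"
```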
 
Strange .... today it was no longer reachable and had to be switched off and on again with the power switch. It would not shut down via ACPI any more either...... :(
 
dmesg shows:

Code:
Sep 26 03:02:30 nas-dsm kernel: GEOM_ELI: Device gpt/GELI-NAS-Backup.eli created.
Sep 26 03:02:30 nas-dsm kernel: GEOM_ELI: Encryption: AES-XTS 256
Sep 26 03:02:30 nas-dsm kernel: GEOM_ELI:     Crypto: software
Sep 26 03:02:30 nas-dsm kernel: WARNING: /mnt/USB3.0-1T: GJOURNAL flag on fs but no gjournal provider below
Sep 26 03:12:34 nas-dsm kernel: (ada1:ata0:0:1:0): WRITE_DMA48. ACB: 35 00 a2 c9 63 40 53 00 00 00 00 01
Sep 26 03:12:34 nas-dsm kernel: (ada1:ata0:0:1:0): CAM status: ATA Status Error
Sep 26 03:12:34 nas-dsm kernel: (ada1:ata0:0:1:0): ATA status: 51 (DRDY SERV ERR), error: 84 (ICRC ABRT )
Sep 26 03:12:34 nas-dsm kernel: (ada1:ata0:0:1:0): RES: 51 84 a1 ca 63 53 53 00 00 01 00
Sep 26 03:12:34 nas-dsm kernel: (ada1:ata0:0:1:0): Retrying command
Sep 26 03:12:34 nas-dsm kernel: (ada1:ata0:0:1:0): WRITE_DMA48. ACB: 35 00 a2 0d 65 40 53 00 00 00 00 01
Sep 26 03:12:34 nas-dsm kernel: (ada1:ata0:0:1:0): CAM status: ATA Status Error
Sep 26 03:12:34 nas-dsm kernel: (ada1:ata0:0:1:0): ATA status: 51 (DRDY SERV ERR), error: 84 (ICRC ABRT )
Sep 26 03:12:34 nas-dsm kernel: (ada1:ata0:0:1:0): RES: 51 84 81 0e 65 53 53 00 00 21 00
Sep 26 03:12:34 nas-dsm kernel: (ada1:ata0:0:1:0): Retrying command
Sep 26 03:12:35 nas-dsm kernel: (ada1:ata0:0:1:0): WRITE_DMA48. ACB: 35 00 a2 58 65 40 53 00 00 00 00 01
Sep 26 03:12:35 nas-dsm kernel: (ada1:ata0:0:1:0): CAM status: ATA Status Error
Sep 26 03:12:35 nas-dsm kernel: (ada1:ata0:0:1:0): ATA status: 51 (DRDY SERV ERR), error: 84 (ICRC ABRT )
Sep 26 03:12:35 nas-dsm kernel: (ada1:ata0:0:1:0): RES: 51 84 21 59 65 53 53 00 00 81 00
Sep 26 03:12:35 nas-dsm kernel: (ada1:ata0:0:1:0): Retrying command
Sep 26 03:12:35 nas-dsm kernel: (ada1:ata0:0:1:0): WRITE_DMA48. ACB: 35 00 a2 af 65 40 53 00 00 00 00 01
Sep 26 03:12:35 nas-dsm kernel: (ada1:ata0:0:1:0): CAM status: ATA Status Error
Sep 26 03:12:35 nas-dsm kernel: (ada1:ata0:0:1:0): ATA status: 51 (DRDY SERV ERR), error: 84 (ICRC ABRT )
Sep 26 03:12:35 nas-dsm kernel: (ada1:ata0:0:1:0): RES: 51 84 41 b0 65 53 53 00 00 61 00
Sep 26 03:12:35 nas-dsm kernel: (ada1:ata0:0:1:0): Retrying command
Sep 26 03:12:35 nas-dsm kernel: (ada1:ata0:0:1:0): WRITE_DMA48. ACB: 35 00 a2 4f 66 40 53 00 00 00 00 01
Sep 26 03:12:35 nas-dsm kernel: (ada1:ata0:0:1:0): CAM status: ATA Status Error
Sep 26 03:12:35 nas-dsm kernel: (ada1:ata0:0:1:0): ATA status: 51 (DRDY SERV ERR), error: 84 (ICRC ABRT )
Sep 26 03:12:35 nas-dsm kernel: (ada1:ata0:0:1:0): RES: 51 84 d1 4f 66 53 53 00 00 d1 00
Sep 26 03:12:35 nas-dsm kernel: (ada1:ata0:0:1:0): Retrying command
Sep 26 03:12:35 nas-dsm kernel: (ada1:ata0:0:1:0): WRITE_DMA48. ACB: 35 00 a2 85 66 40 53 00 00 00 00 01
Sep 26 03:12:35 nas-dsm kernel: (ada1:ata0:0:1:0): CAM status: ATA Status Error
Sep 26 03:12:35 nas-dsm kernel: (ada1:ata0:0:1:0): ATA status: 51 (DRDY SERV ERR), error: 84 (ICRC ABRT )
Sep 26 03:12:35 nas-dsm kernel: (ada1:ata0:0:1:0): RES: 51 84 c1 85 66 53 53 00 00 e1 00
Sep 26 03:12:35 nas-dsm kernel: (ada1:ata0:0:1:0): Retrying command
Sep 26 03:12:36 nas-dsm kernel: (ada1:ata0:0:1:0): WRITE_DMA48. ACB: 35 00 a2 aa 66 40 53 00 00 00 00 01
Sep 26 03:12:36 nas-dsm kernel: (ada1:ata0:0:1:0): CAM status: ATA Status Error
Sep 26 03:12:36 nas-dsm kernel: (ada1:ata0:0:1:0): ATA status: 51 (DRDY SERV ERR), error: 84 (ICRC ABRT )
Sep 26 03:12:36 nas-dsm kernel: (ada1:ata0:0:1:0): RES: 51 84 91 ab 66 53 53 00 00 11 00
Sep 26 03:12:36 nas-dsm kernel: (ada1:ata0:0:1:0): Retrying command
Sep 26 03:12:36 nas-dsm kernel: (ada1:ata0:0:1:0): WRITE_DMA48. ACB: 35 00 a2 c0 66 40 53 00 00 00 00 01
Sep 26 03:12:36 nas-dsm kernel: (ada1:ata0:0:1:0): CAM status: ATA Status Error
Sep 26 03:12:36 nas-dsm kernel: (ada1:ata0:0:1:0): ATA status: 51 (DRDY SERV ERR), error: 84 (ICRC ABRT )
Sep 26 03:12:36 nas-dsm kernel: (ada1:ata0:0:1:0): RES: 51 84 61 c1 66 53 53 00 00 41 00
...
Sep 26 03:15:33 nas-dsm kernel: (ada1:ata0:0:1:0): RES: 51 84 f1 36 d2 58 58 00 00 f1 00
Sep 26 03:15:33 nas-dsm kernel: (ada1:ata0:0:1:0): Error 5, Retries exhausted
Sep 26 03:15:33 nas-dsm kernel: GEOM_ELI: g_eli_write_done() failed (error=5) gpt/GELI-NAS-Backup.eli[WRITE(offset=762967851008, length=131072)]
Sep 26 03:15:33 nas-dsm kernel: g_vfs_done():gpt/GELI-NAS-Backup.eli[WRITE(offset=762967851008, length=131072)]error = 5
Sep 26 03:15:33 nas-dsm kernel: (ada1:ata0:0:1:0): WRITE_DMA48. ACB: 35 00 e2 37 d2 40 58 00 00 00 00 01
Sep 26 03:15:33 nas-dsm kernel: (ada1:ata0:0:1:0): CAM status: ATA Status Error
Sep 26 03:15:33 nas-dsm kernel: (ada1:ata0:0:1:0): ATA status: 51 (DRDY SERV ERR), error: 84 (ICRC ABRT )
Sep 26 03:15:33 nas-dsm kernel: (ada1:ata0:0:1:0): RES: 51 84 f1 37 d2 58 58 00 00 f1 00
Sep 26 03:15:33 nas-dsm kernel: (ada1:ata0:0:1:0): Retrying command
....
This keeps repeating....
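
To see at a glance how many of these ICRC errors were logged and for which device, something like the following sketch could be run over the log. It assumes the kernel messages end up in /var/log/messages in the format shown above; `icrc_per_device` is a made-up helper name.

```shell
#!/bin/sh
# Sketch: count "ICRC" error lines per device in kernel log text on stdin.
# icrc_per_device is a made-up helper name.
icrc_per_device() {
    awk '/error: 84 \(ICRC/ {
             # The device tag looks like "(ada1:ata0:0:1:0):"; keep "ada1".
             if (match($0, /\([a-z]+[0-9]+:/)) {
                 dev = substr($0, RSTART + 1, RLENGTH - 2)
                 n[dev]++
             }
         }
         END { for (d in n) print d, n[d] }'
}

# Usage:
# icrc_per_device < /var/log/messages
```

If only ada1 shows up while the other disks on the same machine stay quiet, that narrows things down to this disk's port, cable or the disk itself.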
 
Even if the S.M.A.R.T. values say everything is OK, the HDD can still be defective. Can you test a different one? Do you perhaps still have an OS installed on the disk?
 
A SpinPoint M8 like this is already a few years old, so I would simply retire it...
Provided that another disk runs without problems.
 
The hard disks worked without problems in another computer. So I am ending the "life" of this old machine ;) I have now ordered new hardware: new mainboard, new CPU, new RAM, new power supply, new case.

Since the kernel is GENERIC, I should be able to use the old system disk in a new machine without problems. Is that correct?
 
Since the kernel is GENERIC, I should be able to use the old system disk in a new machine without problems. Is that correct?
In principle, yes.
The only thing you need to watch out for is the difference between UEFI and BIOS.
But since practically every UEFI today still offers BIOS emulation, it should be bootable somehow even in the worst case.
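
Before moving the disk, it may be worth noting how the old machine currently boots. The sketch below assumes the `machdep.bootmethod` sysctl is available on the installed release (it reports UEFI or BIOS on amd64); `describe_boot` and its messages are made up for illustration.

```shell
#!/bin/sh
# Sketch: turn the current boot method into a note for the new board.
# describe_boot is a made-up helper; the messages are only suggestions.
describe_boot() {
    case $1 in
        UEFI) echo "booted via UEFI: the new board must find the EFI system partition" ;;
        BIOS) echo "booted via BIOS: enable legacy/CSM boot on the new board" ;;
        *)    echo "unknown boot method: $1" ;;
    esac
}

# Usage on the old machine (assumes the sysctl exists there):
# describe_boot "$(sysctl -n machdep.bootmethod)"
```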
 
But since practically every UEFI today still offers BIOS emulation, it should be bootable somehow even in the worst case.
Intel wants to change that, by the way. Starting with their 2020 generation or so, BIOS emulation is supposed to be dropped. But maybe they will reconsider, now that there is competition again that could market it as an advantage...
 
Intel wants to change that, by the way. Starting with their 2020 generation or so, BIOS emulation is supposed to be dropped.
Of course it is extra stuff that has to be maintained on top.
In general I welcome a step toward less complexity.

However, I would then first kill UEFI altogether and replace it with a lean solution à la Coreboot (nobody has been able to explain the point of UEFI to me anyway; and a colorful setup screen doesn't count).

And while Intel is at it, they may as well remove the Management Engine, security extensions and similar junk from their CPUs.
 
However, I would then first kill UEFI altogether and replace it with a lean solution à la Coreboot (nobody has been able to explain the point of UEFI to me anyway; and a colorful setup screen doesn't count).
Agreed.

And while Intel is at it, they may as well remove the Management Engine, security extensions and similar junk from their CPUs.
I fully agree. However, it is rather wishful thinking to believe that this will ever happen. Hell will freeze over first. ;)
 
I fully agree. However, it is rather wishful thinking to believe that this will ever happen.
We'll see. Maybe there will be a rethink after all.
At least among the professionals, the realization has taken hold that all of this is bullshit.

Alternatively, we move away from the x86 track and build something based on RISC-V, for example. :-)
 