Troubleshooting

Description

apt-get autoremove reports damaged symlinks and suggests re-running the boot loader (grub)

apt-get autoremove
...
The link /vmlinuz.old is a damaged link
Removing symbolic link vmlinuz.old
 you may need to re-run your boot loader[grub]
The link /initrd.img.old is a damaged link
Removing symbolic link initrd.img.old
 you may need to re-run your boot loader[grub]

Reason

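Broken symlinks: /vmlinuz.old and /initrd.img.old pointed to files that no longer exist, most likely because autoremove removed the old kernel they referred to.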
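
To see which links are affected, the dangling symlinks can be listed directly. This is a minimal sketch, assuming GNU find (the -xtype test is a GNU extension); the function name broken_links is made up for this example.

```shell
# List symlinks directly inside a directory whose target no longer exists.
# Assumes GNU find: with the default -P behaviour, "-xtype l" only matches
# symlinks that cannot be resolved to a real file.
broken_links() {
    find "$1" -maxdepth 1 -xtype l
}

# Usage (as root):
#   broken_links /
#   broken_links /boot
```

If the function prints nothing, no dangling links are left in that directory.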

Fix

Run “update-grub”

update-grub
 Generating grub configuration file ...
 Found linux image: /boot/vmlinuz-3.13.0-157-generic
 Found initrd image: /boot/initrd.img-3.13.0-157-generic
 Found linux image: /boot/vmlinuz-3.13.0-153-generic
 Found initrd image: /boot/initrd.img-3.13.0-153-generic
 Found memtest86+ image: /boot/memtest86+.elf
 Found memtest86+ image: /boot/memtest86+.bin
 done



Linux starts in emergency mode - faulty logical volume (xfs)

Description

After entering your root password and running “journalctl -xb”, you find entries highlighted in red, similar to these

kernel: [299102] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
kernel: [299102] ffff880308ed2000: c7 00 00 00 48 89 5c 24 08 48 89 74 24 10 57 48  ....H.\$.H.t$.WH
kernel: [299103] ffff880308ed2010: 83 ec 30 48 8d 54 24 50 e8 03 38 c9 ff 85 c0 0f  ..0H.T$P..8.....
kernel: [299104] ffff880308ed2020: 88 97 00 00 00 48 8b 5c 24 50 48 8d 54 24 58 48  .....H.\$PH.T$XH
kernel: [299104] ffff880308ed2030: 8b cb e8 e1 25 d2 ff 85 c0 78 74 48 8b 7c 24 58  ....%....xtH.|$X
kernel: [299114] XFS (dm-2): Metadata corruption detected at xfs_inode_buf_verify+0x66/0xc0 [xfs], xfs_inode block 0x13bfa0
kernel: [299115] XFS (dm-2): Unmount and run xfs_repair
kernel: [299115] XFS (dm-2): First 64 bytes of corrupted metadata buffer:
kernel: [299116] ffff880308ed2000: c7 00 00 00 48 89 5c 24 08 48 89 74 24 10 57 48  ....H.\$.H.t$.WH
kernel: [299116] ffff880308ed2010: 83 ec 30 48 8d 54 24 50 e8 03 38 c9 ff 85 c0 0f  ..0H.T$P..8.....
kernel: [299117] ffff880308ed2020: 88 97 00 00 00 48 8b 5c 24 50 48 8d 54 24 58 48  .....H.\$PH.T$XH
kernel: [299117] ffff880308ed2030: 8b cb e8 e1 25 d2 ff 85 c0 78 74 48 8b 7c 24 58  ....%....xtH.|$X
kernel: [299189] XFS (dm-2): metadata I/O error: block 0x13bfa0 ("xfs_trans_read_buf_map") error 117 numblks 16
kernel: [299195] XFS (dm-2): xfs_imap_to_bp: xfs_trans_read_buf() returned error -117.
kernel: [299196] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 3519 of file ../fs/xfs/xfs_inode.c.  Return address = 0xffffffffa02d4192
kernel: [299200] XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
kernel: [299200] XFS (dm-2): Please umount the filesystem and rectify the problem(s)
kernel: [372139] XFS (dm-2): xfs_log_force: error -5 returned

Reason

The cause is not certain. The server might have KVM installed, which may have damaged the filesystem (see Serverfault).

Fix

Check “journalctl -xb” to find out which LV is corrupted (dm-2 in the log above)
Find the LV that belongs to “dm-X”

dmsetup info /dev/dm-2
then have a look at your disks
df -h
and mounts
mount
The LV must not be mounted while you repair it.
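
The lookup above can also be scripted. This is a minimal sketch, assuming the kernel exposes the mapper name under /sys/block/dm-N/dm/name and that the mounts table lists the LV by its /dev/mapper path. The function names and the SYSFS override are hypothetical and only exist for this example.

```shell
# Resolve a kernel device-mapper name (e.g. dm-2) to its /dev/mapper path.
# SYSFS can be overridden so the function can be tried against a fake tree.
dm_to_mapper() {
    name_file="${SYSFS:-/sys}/block/$1/dm/name"
    [ -r "$name_file" ] || return 1
    echo "/dev/mapper/$(cat "$name_file")"
}

# Succeed if the device appears as a mount source in the mounts table.
# The second argument defaults to /proc/mounts.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Usage:
#   dev=$(dm_to_mapper dm-2) && ! is_mounted "$dev" && echo "$dev is not mounted"
```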

Try to repair the filesystem

xfs_repair /dev/mapper/VG02-LVdata
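If this fails because of a dirty journal log, first try to mount and unmount the LV so the log gets replayed. Only if that fails as well, reset the log. Note that -L discards the log, so metadata changes that were not yet written can be lost
xfs_repair -L /dev/mapper/VG02-LVdata
Once this has completed, run the first xfs_repair (without -L) again.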
If successful, try to mount the device
mount -a
and check your filesystem
df -h
ls -l /data
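
Before letting xfs_repair modify anything, a dry run with -n (no-modify mode) shows whether it would find damage at all. This is a minimal sketch of that order of steps; repair_lv is a hypothetical helper name, and the device path in the usage line is the one from this page.

```shell
# Dry-run check first, real repair only if damage is reported.
repair_lv() {
    dev=$1
    # xfs_repair needs the filesystem unmounted, so refuse a mounted device.
    if awk -v d="$dev" '$1 == d { f = 1 } END { exit !f }' /proc/mounts 2>/dev/null; then
        echo "$dev is mounted, unmount it first" >&2
        return 2
    fi
    # -n: no-modify mode, exits non-zero if corruption was detected.
    if xfs_repair -n "$dev"; then
        echo "no corruption reported on $dev"
        return 0
    fi
    # Damage was reported: run the real repair (deliberately without -L).
    xfs_repair "$dev"
}

# Usage (as root):
#   repair_lv /dev/mapper/VG02-LVdata
```

The sketch never uses -L on its own; resetting the log stays a manual decision.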

Finally restart your system and pray…



ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock'

Description

You can connect to your MySQL with 127.0.0.1 but not with localhost

mysql -h127.0.0.1 -uroot -p
#Welcome to the MySQL monitor...

mysql -hlocalhost -uroot -p
#ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock'

Reason

The mysql client connects through the Unix socket file if you tell it to connect to localhost. If you tell it to connect to 127.0.0.1, you force it to connect over TCP/IP instead.

Fix

Verify that the socket really is your problem.

mysql --print-defaults

mysql would have been started with the following arguments:
--port=3306 --socket=/var/run/mysqld/mysqld.sock
mysqld --print-defaults

mysqld would have been started with the following arguments:
--user=mysql --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --port=3306...
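
To compare the socket settings without reading both outputs by eye, the value can be pulled out of the --print-defaults output. This is a minimal sketch; extract_socket and compare_sockets are made-up helper names, and the splitting on spaces assumes the socket path itself contains no spaces.

```shell
# Print the --socket value from "--print-defaults" output read on stdin.
# If the option appears more than once, the last occurrence is printed.
extract_socket() {
    tr ' ' '\n' | sed -n 's/^--socket=//p' | tail -n 1
}

# Compare client and server defaults; needs mysql and mysqld in PATH.
compare_sockets() {
    client=$(mysql --print-defaults | extract_socket)
    server=$(mysqld --print-defaults | extract_socket)
    if [ "$client" = "$server" ]; then
        echo "same socket: $client"
    else
        echo "client uses '$client', server uses '$server'"
    fi
}

# Usage:
#   compare_sockets
```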

Compare the “socket” values; both should be the same. If your client uses a different socket, try connecting to your database with the socket that is specified for mysqld

mysql --socket=/var/run/mysqld/mysqld.sock -hlocalhost -uroot -p

If that worked, check whether a socket is specified for both the client and the daemon in your my.cnf

...
[client]
port            = 3306
socket          = /var/run/mysqld/mysqld.sock
...
[mysqld]
port            = 3306
socket          = /var/run/mysqld/mysqld.sock
...

If this is the case, check whether your my.cnf is at one of the following locations. The default options are read from these files in the given order:

/etc/mysql/my.cnf 
/etc/my.cnf 
~/.my.cnf       
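
A quick way to see which of these files actually exist is to loop over them. This is a minimal sketch; existing_cnf is a hypothetical helper name. On most installations “mysql --help” also prints the list of default option files it reads.

```shell
# Print each candidate option file that exists and is readable,
# in the order the candidates were given.
existing_cnf() {
    for f in "$@"; do
        [ -r "$f" ] && echo "$f"
    done
    return 0
}

# Usage:
#   existing_cnf /etc/mysql/my.cnf /etc/my.cnf "$HOME/.my.cnf"
```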

If your my.cnf is not at one of these locations, create a symlink at one of them that points to your config file

example:
ln -s /usr/local/mysql/etc/my.cnf /etc/my.cnf

The connection should work now.

Bug: soft lockup in messages

Description

You find multiple “BUG: soft lockup” entries in /var/log/messages or journalctl

May 25 07:23:59 XXXXXXX kernel: [13445315.881356] BUG: soft lockup - CPU#16 stuck for 23s! [yyyyyyy:81602]
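
To see how often this happens and on which CPUs, the entries can be counted per CPU. This is a minimal sketch that reads log lines on stdin; lockups_per_cpu is a made-up function name, and grep -o is assumed to be available (GNU and BSD grep both have it).

```shell
# Count "BUG: soft lockup" entries per CPU from log lines on stdin.
# Prints one line per CPU in the form "CPU#<n> <count>".
lockups_per_cpu() {
    grep 'BUG: soft lockup' | grep -o 'CPU#[0-9]*' | sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage:
#   journalctl -k | lockups_per_cpu
#   lockups_per_cpu < /var/log/messages
```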

Reason

A 'soft lockup' is defined as a bug that causes the kernel to loop in kernel mode for more than 20 seconds without giving other tasks a chance to run. The watchdog daemon will send an non-maskable interrupt (NMI) to all CPUs in the system who, in turn, print the stack traces of their currently running tasks.

-SUSE KB 7017652

Fix

Solution 1:

Restart your system and/or decrease your CPU load.

Solution 2:

Increase watchdog_thresh (default 10 seconds). Soft lockups are reported after twice this value, so the default threshold is 20 seconds.

echo 20 > /proc/sys/kernel/watchdog_thresh
or, to keep the setting across reboots
echo "kernel.watchdog_thresh=20" > /etc/sysctl.d/99-watchdog_thresh.conf

sysctl -p /etc/sysctl.d/99-watchdog_thresh.conf
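
To check the effective soft lockup threshold after the change, read watchdog_thresh and double it. This is a minimal sketch; softlockup_seconds is a hypothetical helper, and the optional file argument only exists so it can be tried without touching /proc.

```shell
# Print the soft lockup threshold in seconds, which is twice
# kernel.watchdog_thresh. Reads /proc/sys/kernel/watchdog_thresh
# unless another file is given as the first argument.
softlockup_seconds() {
    thresh=$(cat "${1:-/proc/sys/kernel/watchdog_thresh}") || return 1
    echo $((thresh * 2))
}

# Usage:
#   softlockup_seconds
```

With watchdog_thresh set to 20 as above, the function prints 40.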


systemctl runs into a timeout

Description

In this example, installing docker-ce with the following command does not work

curl -sSL https://get.docker.com | sh

# Executing docker install script, commit: f45d7c11389849ff46a6b4d94e0dd1ffebca32c1
+ sh -c apt-get update -qq >/dev/null
+ sh -c DEBIAN_FRONTEND=noninteractive apt-get install -y -qq apt-transport-https ca-certificates curl >/dev/null
+ sh -c curl -fsSL "https://download.docker.com/linux/ubuntu/gpg" | apt-key add -qq - >/dev/null
+ sh -c echo "deb [arch=amd64] https://download.docker.com/linux/ubuntu xenial stable" > /etc/apt/sources.list.d/docker.list
+ sh -c apt-get update -qq >/dev/null
+ [ -n  ]
+ sh -c apt-get install -y -qq --no-install-recommends docker-ce >/dev/null

Broadcast message from systemd-journald@lunetikk (Wed 2019-10-23 00:22:12 CEST):

systemd[1]: Caught <SEGV>, dumped core as pid 26368.


Broadcast message from systemd-journald@lunetikk (Wed 2019-10-23 00:22:12 CEST):

systemd[1]: Freezing execution.

E: Sub-process /usr/bin/dpkg returned an error code (1)

Rerunning “apt-get install docker-ce” shows the following

apt-get install docker-ce
Reading package lists... Done
Building dependency tree
Reading state information... Done
docker-ce is already the newest version (5:19.03.4~3-0~ubuntu-xenial).
After this operation, 0 B of additional disk space will be used.
Do you want to continue? [Y/n]
Setting up docker-ce (5:19.03.4~3-0~ubuntu-xenial) ...
Failed to execute operation: Connection timed out
Failed to execute operation: Connection timed out
Failed to retrieve unit state: Connection timed out
Failed to start docker.service: Connection timed out
See system logs and 'systemctl status docker.service' for details.
invoke-rc.d: initscript docker, action "start" failed.
Failed to get properties: Connection timed out
dpkg: error processing package docker-ce (--configure):
 subprocess installed post-installation script returned error exit status 1
Errors were encountered while processing:
 docker-ce
E: Sub-process /usr/bin/dpkg returned an error code (1)

You can't reconfigure the package either

dpkg-reconfigure docker-ce
/usr/sbin/dpkg-reconfigure: docker-ce is broken or not fully installed

Tab completion for “systemctl status” also runs into a timeout while listing the units

systemctl status docke<TAB>
Failed to list unit files: Connection timed out
Failed to list units: Connection timed out
Failed to list unit files: Connection timed out
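
Instead of waiting for systemctl to give up on its own, a short time limit shows quickly whether systemd still answers. This is a minimal sketch, assuming timeout from GNU coreutils is installed; responds_within is a made-up helper name.

```shell
# Run a command with a time limit. Succeeds only if the command finishes
# in time with exit status 0. timeout(1) from GNU coreutils returns
# status 124 when the limit is hit.
responds_within() {
    limit=$1
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
}

# Usage:
#   responds_within 10 systemctl list-units || echo "systemd is not answering"
```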

Reason


In my case, the filesystem on my disk was inconsistent. A reboot left me stuck in the BusyBox shell.


Fix


I was able to run “fsck.ext4 /dev/vda2”, which fixed the orphaned inodes.

A reboot after this got me back into my system and “systemctl” was working again.


linux/general/troubleshooting.txt · Last modified: 2020/12/03 15:12 by lunetikk