linux:general:troubleshooting [2018/07/05 12:49] lunetikk · linux:general:troubleshooting [2020/12/03 15:12] (current) – [Linux starts in emergency mode - faulty logical volume (xfs)] lunetikk
===== Troubleshooting =====

==== Removing old kernels leads to broken symlinks ====

=== Description ===

Running "apt-get autoremove" to remove old kernels leaves broken symlinks behind, and the output asks you to re-run the boot loader (GRUB):

<code>
apt-get autoremove
...
The link /vmlinuz.old is a damaged link
Removing symbolic link vmlinuz.old
 you may need to re-run your boot loader[grub]
The link /initrd.img.old is a damaged link
Removing symbolic link initrd.img.old
 you may need to re-run your boot loader[grub]
</code>

=== Reason ===

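Broken symlinks: removing the old kernel deleted the files that /vmlinuz.old and /initrd.img.old pointed to, so these links were left dangling and got removed. The boot loader configuration has to be regenerated afterwards.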
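To see which links in / (or /boot, depending on your distribution) are actually dangling, you can use a small helper like the following. It is a sketch, not part of the original fix: the function name list_broken_links is hypothetical, and it assumes GNU find, whose "-xtype l" test matches symlinks whose target is missing.

<code bash>
# Hypothetical helper: print dangling symlinks directly inside the given
# directories. With GNU find, "-xtype l" is true for symlinks whose target
# does not exist (or that form a loop).
list_broken_links() {
    find "$@" -maxdepth 1 -xtype l
}

# Example: check the usual locations of the kernel symlinks
# list_broken_links / /boot
</code>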

=== Fix ===

Regenerate the GRUB configuration by running "update-grub" as root:
<code>
update-grub
 Generating grub configuration file ...
 Found linux image: /boot/vmlinuz-3.13.0-157-generic
 Found initrd image: /boot/initrd.img-3.13.0-157-generic
 Found linux image: /boot/vmlinuz-3.13.0-153-generic
 Found initrd image: /boot/initrd.img-3.13.0-153-generic
 Found memtest86+ image: /boot/memtest86+.elf
 Found memtest86+ image: /boot/memtest86+.bin
 done
</code>
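To double-check that the regenerated menu only references kernels that still exist, you could compare the "linux" entries in grub.cfg against /boot. This is a sketch with a hypothetical function name; it assumes the default config path /boot/grub/grub.cfg and plain "linux" entries, so variants such as "linuxefi" or "linux16" are not checked.

<code bash>
# Hypothetical helper: check that every kernel referenced by a "linux" entry
# in grub.cfg exists in the boot directory. Returns 1 if one is missing.
# Only plain "linux" lines are considered ("linuxefi", "linux16" are ignored).
check_grub_kernels() {
    local cfg=${1:-/boot/grub/grub.cfg}
    local bootdir=${2:-/boot}
    local kernel missing=0
    for kernel in $(awk '$1 == "linux" { print $2 }' "$cfg" | sort -u); do
        if [ ! -e "$bootdir/$(basename "$kernel")" ]; then
            echo "missing: $kernel"
            missing=1
        fi
    done
    return "$missing"
}

# Example:
# check_grub_kernels && echo "all GRUB kernel entries exist"
</code>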

\\
\\

==== Linux starts in emergency mode - faulty logical volume (xfs) ====

=== Description ===

{{:linux:general:linuxemergencymode.png|}}
...
</code>

=== Reason ===
Possibly KVM, which was installed on the server, messed something up (unconfirmed, see this Serverfault thread):\\
[[https://serverfault.com/questions/897842/corruption-of-in-memory-data-detected-where-does-the-issue-lie|Serverfault]]
\\
  
=== Fix ===
Check "journalctl -xb" to find out which LV (shown as "dm-X") is corrupted.\\
Get the LV name for "dm-X" (here dm-2): <code>dmsetup info /dev/dm-2</code> Then have a look at your disks
<code>df -h</code> and mounts <code>mount</code> The LV must not be mounted.\\
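If you need this more often, the lookup and the mount check can be combined into one helper. This is a sketch and the name dm_check is hypothetical: instead of parsing "dmsetup info" it reads /sys/block/dm-X/dm/name, which holds the same device-mapper name, and then looks for that device in /proc/mounts. The optional second and third arguments only exist so the function can be tried against copies of those files.

<code bash>
# Hypothetical helper: map dm-X to its device-mapper name and warn if the
# device is mounted. Returns 1 if mounted, 2 if dm-X does not exist.
dm_check() {
    local dev=$1 sysfs=${2:-/sys} mounts=${3:-/proc/mounts}
    local name
    name=$(cat "$sysfs/block/$dev/dm/name") || return 2
    echo "$dev -> /dev/mapper/$name"
    # A mount can be listed as /dev/mapper/<name> or as /dev/dm-X
    if awk -v d="/dev/mapper/$name" -v k="/dev/$dev" \
        '$1 == d || $1 == k { found = 1 } END { exit !found }' "$mounts"; then
        echo "WARNING: /dev/mapper/$name is mounted, unmount it before repairing"
        return 1
    fi
}

# Example:
# dm_check dm-2
</code>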
  
Try to repair the filesystem:
...
Finally, restart your system and pray...
  
Have a look at this article for more xfs_repair related info:\\
[[http://fibrevillage.com/storage/666-how-to-repair-a-xfs-filesystem|fibrevillage.com - How to repair a xfs filesystem]]
\\
\\
==== ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' ====

=== Description ===

You can connect to your MySQL server via 127.0.0.1 but not via localhost:

<code>
mysql -h127.0.0.1 -uroot -p
#Welcome to the MySQL monitor...

mysql -hlocalhost -uroot -p
#ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock'
</code>

=== Reason ===

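If you connect to "localhost", the MySQL client uses the Unix socket file instead of TCP/IP. If you connect to 127.0.0.1, you force it to use a TCP connection to the network port (3306 by default). So if the client looks for the socket at a path where mysqld does not create it, only 127.0.0.1 works.

=== Fix ===

Verify that the socket is really your problem by comparing the socket path of the client and the server:
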
<code>
mysql --print-defaults

mysql would have been started with the following arguments:
--port=3306 --socket=/var/run/mysqld/mysqld.sock
</code>
<code>
mysqld --print-defaults

mysqld would have been started with the following arguments:
--user=mysql --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --port=3306...
</code>

Compare the "socket" values; both should be the same. If the client uses a different socket, try to connect with the socket specified for mysqld:
<code>mysql --socket=/var/run/mysqld/mysqld.sock -hlocalhost -uroot -p</code>
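The comparison can also be scripted. A minimal sketch with hypothetical function names, assuming the "--print-defaults" output format shown above (options separated by spaces). If an option appears more than once, MySQL uses the last occurrence, so the helper keeps the last "--socket" value.

<code bash>
# Hypothetical helper: extract the --socket value from "--print-defaults"
# output read on stdin (the last occurrence wins, as in MySQL itself).
socket_from_defaults() {
    tr ' ' '\n' | sed -n 's/^--socket=//p' | tail -n 1
}

# Hypothetical helper: compare client and server socket paths.
# Returns non-zero if they differ.
compare_sockets() {
    local client server
    client=$(mysql --print-defaults | socket_from_defaults)
    server=$(mysqld --print-defaults | socket_from_defaults)
    echo "client: ${client:-<not set>}"
    echo "server: ${server:-<not set>}"
    [ "$client" = "$server" ]
}

# Example:
# compare_sockets || echo "client and server use different sockets"
</code>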

If connecting with the socket of mysqld works, check whether your my.cnf specifies the same socket for both the client and the daemon:
<code>
...
[client]
port            = 3306
socket          = /var/run/mysqld/mysqld.sock
...
[mysqld]
port            = 3306
socket          = /var/run/mysqld/mysqld.sock
...
</code>

If it does, check whether your my.cnf is in one of the following locations. The default options are read from these files in the given order:
<code>
/etc/mysql/my.cnf
/etc/my.cnf
~/.my.cnf
</code>
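To see which of these files exist on your system, a quick sketch (hypothetical function name). It checks only the three locations listed above; depending on version and build, MySQL may read additional files.

<code bash>
# Hypothetical helper: show which of the default option files exist,
# in the order they are read. If an option is set in several files,
# the value from the file read last wins.
list_mycnf() {
    local f
    for f in /etc/mysql/my.cnf /etc/my.cnf "$HOME/.my.cnf"; do
        if [ -e "$f" ]; then
            echo "found:   $f"
        else
            echo "missing: $f"
        fi
    done
}

# Example:
# list_mycnf
</code>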

If your my.cnf is in none of these locations, create a symlink at one of them that points to your config file, for example:
<code>
ln -s /usr/local/mysql/etc/my.cnf /etc/my.cnf
</code>

The connection should work now.
\\
\\

==== BUG: soft lockup in messages ====

=== Description ===

You find multiple "BUG: soft lockup" entries in /var/log/messages or in the journal ("journalctl"):

<code>
May 25 07:23:59 XXXXXXX kernel: [13445315.881356] BUG: soft lockup - CPU#16 stuck for 23s! [yyyyyyy:81602]
</code>
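To get an overview of how often and on which CPUs this happens, you can count the messages per CPU. A sketch with a hypothetical function name, assuming the message format shown above ("soft lockup - CPU#N").

<code bash>
# Hypothetical helper: count "soft lockup" messages per CPU in a log read
# from stdin and print "CPU#N count", most frequent first.
count_soft_lockups() {
    grep -o 'soft lockup - CPU#[0-9]*' |
        sed 's/.*CPU#/CPU#/' |
        sort | uniq -c |
        awk '{ print $2, $1 }' |
        sort -k2,2nr
}

# Examples:
# count_soft_lockups < /var/log/messages
# journalctl -k | count_soft_lockups
</code>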

=== Reason ===

>A 'soft lockup' is defined as a bug that causes the kernel to loop in kernel mode for more than 20 seconds without giving other tasks a chance to run. The watchdog daemon will send an non-maskable interrupt (NMI) to all CPUs in the system who, in turn, print the stack traces of their currently running tasks.
-SUSE KB [[https://www.suse.com/support/kb/doc/?id=7017652|7017652]]

=== Fix ===

__Solution 1:__

Restart your system and/or reduce the CPU load.

__Solution 2:__

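Increase "kernel.watchdog_thresh" (default: 10 seconds). A soft lockup is reported after twice this value, which is where the 20 seconds in the quote above come from.

<code bash>echo 20 > /proc/sys/kernel/watchdog_thresh</code>
This only lasts until the next reboot. To make it persistent, put it into a sysctl config file and load it:
<code bash>
echo "kernel.watchdog_thresh=20" > /etc/sysctl.d/99-watchdog_thresh.conf

sysctl -p /etc/sysctl.d/99-watchdog_thresh.conf
</code>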
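To check the value currently in effect and the resulting soft lockup timeout, a small sketch (hypothetical function name; the optional argument only exists so it can be tried against a copy of the file).

<code bash>
# Hypothetical helper: print watchdog_thresh and the resulting soft lockup
# timeout, which the kernel sets to 2 * watchdog_thresh.
softlockup_timeout() {
    local file=${1:-/proc/sys/kernel/watchdog_thresh}
    local thresh
    thresh=$(cat "$file") || return 1
    echo "watchdog_thresh=${thresh}s soft_lockup_after=$((2 * thresh))s"
}

# Example:
# softlockup_timeout
</code>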
\\
\\


==== systemctl runs in timeout ====

=== Description ===

In this example, installing docker-ce with the following command fails:
<code>
curl -sSL https://get.docker.com | sh

# Executing docker install script, commit: f45d7c11389849ff46a6b4d94e0dd1ffebca32c1
+ sh -c apt-get update -qq >/dev/null
+ sh -c DEBIAN_FRONTEND=noninteractive apt-get install -y -qq apt-transport-https ca-certificates curl >/dev/null
+ sh -c curl -fsSL "https://download.docker.com/linux/ubuntu/gpg" | apt-key add -qq - >/dev/null
+ sh -c echo "deb [arch=amd64] https://download.docker.com/linux/ubuntu xenial stable" > /etc/apt/sources.list.d/docker.list
+ sh -c apt-get update -qq >/dev/null
+ [ -n  ]
+ sh -c apt-get install -y -qq --no-install-recommends docker-ce >/dev/null

Broadcast message from systemd-journald@lunetikk (Wed 2019-10-23 00:22:12 CEST):

systemd[1]: Caught <SEGV>, dumped core as pid 26368.


Broadcast message from systemd-journald@lunetikk (Wed 2019-10-23 00:22:12 CEST):

systemd[1]: Freezing execution.

E: Sub-process /usr/bin/dpkg returned an error code (1)
</code>

Re-running "apt-get install docker-ce" shows the following:
<code>
apt-get install docker-ce
Reading package lists... Done
Building dependency tree
Reading state information... Done
docker-ce is already the newest version (5:19.03.4~3-0~ubuntu-xenial).
After this operation, 0 B of additional disk space will be used.
Do you want to continue? [Y/n]
Setting up docker-ce (5:19.03.4~3-0~ubuntu-xenial) ...
Failed to execute operation: Connection timed out
Failed to execute operation: Connection timed out
Failed to retrieve unit state: Connection timed out
Failed to start docker.service: Connection timed out
See system logs and 'systemctl status docker.service' for details.
invoke-rc.d: initscript docker, action "start" failed.
Failed to get properties: Connection timed out
dpkg: error processing package docker-ce (--configure):
 subprocess installed post-installation script returned error exit status 1
Errors were encountered while processing:
 docker-ce
E: Sub-process /usr/bin/dpkg returned an error code (1)
</code>

You can't reconfigure the package either:
<code>
dpkg-reconfigure docker-ce
/usr/sbin/dpkg-reconfigure: docker-ce is broken or not fully installed
</code>

Even tab completion of unit names for "systemctl status" runs into a timeout:
<code>
systemctl status docke<TAB>
Failed to list unit files: Connection timed out
Failed to list units: Connection timed out
Failed to list unit files: Connection timed out
</code>

=== Reason ===
\\

In my case, the filesystem on my disk was inconsistent. Rebooting got me stuck in the BusyBox shell.\\

{{:linux:general:pasted:20191023-005512.png}}\\
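Before rebooting, you can check whether an ext4 filesystem has already been switched to read-only. This happens after errors are detected if it is mounted with "errors=remount-ro", which Ubuntu commonly sets for the root filesystem in /etc/fstab. A sketch with a hypothetical function name; it only looks at ext2/3/4 entries in /proc/mounts.

<code bash>
# Hypothetical helper: list ext2/3/4 filesystems that are mounted read-only.
# /proc/mounts fields: device, mount point, type, options, dump, pass.
list_ro_ext_mounts() {
    awk '$3 ~ /^ext[234]$/ && ("," $4 ",") ~ /,ro,/ { print $2, $1, $3 }' "${1:-/proc/mounts}"
}

# Example:
# list_ro_ext_mounts
</code>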

=== Fix ===
\\

I was able to run "fsck.ext4 /dev/vda2" to fix the orphaned inodes:

{{:linux:general:pasted:20191023-005629.png}}

{{:linux:general:pasted:20191023-005648.png}}

Rebooting afterwards got me back into my system, and "systemctl" worked again.

\\

linux/general/troubleshooting.1530787752.txt.gz · Last modified: 2018/07/05 12:49 by lunetikk