Sometimes, computers misbehave. And very often, you want to know why. Let’s look at some tools that help you do that!
But first, let’s make sure you’re able to do introspection. Often,
system introspection requires that you have certain privileges, like
being the member of a group (like
power for shutdown). The
is the ultimate privilege; they can do pretty much anything. You can run
a command as
root (but be careful!) using
If something goes wrong, the first place to start is to look at what happened around the time when things went wrong. For this, we need to look at logs.
Traditionally, logs were all stored in
/var/log, and many still are.
Usually there’s a file or folder per program. Use
find your way through them.
There’s also a kernel log that you can see using the
This used to be available as a plain-text file, but nowadays you often
have to go through
dmesg to get at it.
Finally, there is the “system log”, which is increasingly where all of
your log messages go. On most, though not all, Linux systems, that log
is managed by
systemd, the “system daemon”, which controls all the
services that run in the background (and much much more at this point).
That log is accessible through the somewhat inconvenient
tool if you are root, or part of the
journalctl, you should be aware of these flags in particular:
-u UNIT: show only messages related to the given systemd service
--full: don’t truncate long lines (the stupidest feature)
-b: only show messages from the latest boot (see also
-n100: only show last 100 entries
What is happening?
If something is wrong, or you just want to get a feel for what’s going on in your system, you have a number of tools at your disposal for inspecting the currently running system:
top, and the improved version
htop, which show you
various statistics for the currently running processes on the system.
CPU use, memory use, process trees, etc. There are lots of shortcuts,
t is particularly useful for enabling the tree view. You can also
see the process tree with
-p to include PIDs). If you want
to know what those programs are doing, you’ll often want to tail their
dmesg -w, and
tail -f are you friends
Sometimes, you want to know more about the resources being used overall
on your system.
excellent for that. It gives you real-time resource metrics for lots of
different subsystems like I/O, networking, CPU utilization, context
switches, and the like.
man dstat is the place to start.
If you’re running out of disk space, there are two primary utilities
you’ll want to know about:
du. The former shows you the
status of all the partitions on your system (try it with
the latter measures the size of all the folders you give it, including
their contents (see also
To figure out what network connections you have open,
ss is the way to
ss -t will show all open TCP connections.
ss -tl will show all
listening (i.e., server) ports on your system.
-p will also include
which process is using that connection, and
-n will give you the raw
There are many ways to configure your system, but we’ll got through
two very common ones: networking and services. Most applications on your
system tell you how to configure them in their manpage, and usually it
will involve editing files in
/etc; the system configuration
If you want to configure your network, the
ip command lets you do
that. Its arguments take on a slightly weird form, but
ip help command
will get you pretty far.
ip addr shows you information about your
network interfaces and how they’re configured (IP addresses and such),
ip route shows you how network traffic is routed to different
network hosts. Network problems can often be resolved purely through the
ip tool. There’s also
iw for managing wireless network interfaces.
ping is a handy tool for checking how deeply things are broken. Try
pinging a hostname (google.com), an external IP address (18.104.22.168), and
an internal IP address (192.168.1.1 or default gw). You may also want to
/etc/resolv.conf to check your DNS settings (how hostnames
are resolved to IP addresses).
To configure services, you pretty much have to interact with
these days, for better or for worse. Most services on your system will
have a systemd service file that defines a systemd unit. These files
define what command to run when that services is started, how to stop
it, where to log things, etc. They’re usually not too bad to read, and
you can find most of them in
/usr/lib/systemd/system/. You can also
define your own in
Once you have a systemd service in mind, you use the
to interact with it.
systemctl enable UNIT will set the service to
start on boot (
disable removes it again), and
restart will do what you expect. If something goes wrong, systemd will
let you know, and you can use
journalctl -u UNIT to see the
application’s log. You can also use
systemctl status to see how all
your system services are doing. If your boot feels slow, it’s probably
due to a couple of slow services, and you can use
blame) to figure out which ones.