Server Monitoring Explained: What to Track and Why It Matters

Understand server monitoring fundamentals: what metrics to track, the difference between server and website monitoring, and how to set up effective alerting.

Last updated: 2026-02-18

What Server Monitoring Is

Server monitoring is the practice of tracking a server's health, performance, and availability through metrics like CPU usage, memory consumption, disk space, and network throughput. It answers the question: is the infrastructure running correctly?

This is different from website monitoring, which answers a different question: is the user experience working correctly? Both are essential, and understanding the distinction helps you build a monitoring strategy without gaps.

Server Monitoring vs Website Monitoring

These two approaches monitor the same system from opposite perspectives.

| Aspect | Server Monitoring | Website Monitoring |
| --- | --- | --- |
| Perspective | Internal (from inside the server) | External (from the user's perspective) |
| What it checks | CPU, memory, disk, processes, logs | HTTP response, SSL, DNS, availability |
| Answers the question | "Is the server healthy?" | "Can users reach the site?" |
| Detects | Resource exhaustion, process crashes, disk full | Outages, slow responses, certificate errors, DNS changes |
| Blind spots | Network issues between server and user, DNS problems, CDN failures | Internal server metrics, why the server is struggling |
| Typical tools | Prometheus, Grafana, Datadog, New Relic | Site Watcher, Pingdom, UptimeRobot |
| Requires server access | Yes (agent installation) | No (checks from outside) |

A server can have healthy metrics (low CPU, plenty of memory, disk space available) while the website is completely unreachable due to a DNS misconfiguration, an expired SSL certificate, or a firewall rule blocking external traffic. Conversely, server monitoring can show a CPU at 100% and a database running out of connections while the website still responds to basic health checks.

You need both perspectives. Server monitoring tells you why something is wrong. Website monitoring tells you that something is wrong from the user's point of view.

The most dangerous outages are the ones where server metrics look fine but users cannot reach your site. These are caused by DNS, SSL, CDN, or network issues that only external monitoring can detect.

Key Server Metrics to Monitor

CPU Usage

CPU usage indicates how much processing capacity the server is consuming. Sustained high CPU (above 80-90%) means the server is under heavy load and may start dropping requests or responding slowly.

Watch for: sustained high CPU (not spikes — brief spikes during deployments or cron jobs are normal), a steady upward trend over days or weeks (indicates growing load), and CPU wait time (indicates the CPU is waiting for disk I/O, which points to a storage bottleneck rather than a compute bottleneck).
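
As a rough illustration, sustained load can be separated from brief spikes by alerting on the 15-minute load average rather than the instantaneous figure. A minimal Python sketch, assuming a POSIX host; the 0.8/0.9 thresholds are illustrative defaults, not prescriptions:

```python
import os

def classify_load(load15: float, cores: int,
                  warn: float = 0.8, crit: float = 0.9) -> str:
    """Classify sustained load: the 15-minute average relative to core count."""
    ratio = load15 / cores
    if ratio >= crit:
        return "critical"
    if ratio >= warn:
        return "warning"
    return "ok"

def current_cpu_status() -> str:
    # os.getloadavg() is POSIX-only; the 15-minute average filters out
    # brief spikes from deployments and cron jobs.
    _load1, _load5, load15 = os.getloadavg()
    return classify_load(load15, os.cpu_count() or 1)
```

Keying the alert off the 15-minute average is exactly the "sustained, not spikes" rule above: a one-minute burst during a backup never reaches the alerting figure.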

Memory and Swap

Memory usage shows how much RAM the server is consuming. When physical RAM is exhausted, the operating system starts using swap space (disk-based virtual memory), which is orders of magnitude slower.

Watch for: memory usage consistently above 85%, any swap usage on a production server (swap means RAM is insufficient), and the OOM (Out of Memory) killer activating, which forcefully terminates processes to free memory — often your web server or database.
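
One way to spot both conditions on Linux is to read /proc/meminfo. A sketch, with parsing factored out so it also works on captured text; the 85% threshold mirrors the guidance above:

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style 'Key:   12345 kB' lines into {key: kB}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info

def memory_alerts(info: dict) -> list:
    """Flag high RAM use and any swap use, per the thresholds above."""
    alerts = []
    used_pct = 100 * (1 - info["MemAvailable"] / info["MemTotal"])
    if used_pct > 85:
        alerts.append(f"memory at {used_pct:.0f}%")
    swap_used_kb = info["SwapTotal"] - info["SwapFree"]
    if swap_used_kb > 0:  # any swap on a production box means RAM is short
        alerts.append(f"{swap_used_kb} kB of swap in use")
    return alerts
```

On a live host you would feed it `open("/proc/meminfo").read()`; OOM-killer activity shows up separately, in the kernel log.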

Disk Space and I/O

Disk monitoring covers both capacity (how full the disk is) and performance (how fast it can read and write).

Watch for: disk usage above 80% (give yourself a buffer — log files and temporary files can consume space rapidly), high disk I/O wait times (indicates the disk is a bottleneck), and inode exhaustion (the disk can run out of inodes before running out of space, making it impossible to create new files even with available disk space).
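
Both capacity and inode exhaustion can be checked from the Python standard library on POSIX systems. A sketch using shutil.disk_usage and os.statvfs; the 80% thresholds mirror the guidance above:

```python
import os
import shutil

def disk_report(path: str = "/") -> dict:
    """Capacity and inode usage for the filesystem containing `path` (POSIX)."""
    usage = shutil.disk_usage(path)
    space_pct = 100 * usage.used / usage.total
    st = os.statvfs(path)  # f_files / f_ffree expose inode counts
    inode_pct = 100 * (1 - st.f_ffree / st.f_files) if st.f_files else 0.0
    return {
        "space_used_pct": space_pct,
        "inode_used_pct": inode_pct,
        # 80% leaves a buffer before logs or temp files fill the disk
        "space_alert": space_pct > 80,
        "inode_alert": inode_pct > 80,
    }
```

I/O wait and read/write latency are not visible from these calls; those need an agent or tools like iostat.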

Network

Network monitoring tracks bandwidth usage, connection counts, and packet loss.

Watch for: bandwidth approaching your server's or plan's limits, a high number of established connections (could indicate a connection leak or a DDoS attack), packet loss (indicates network instability between your server and the internet), and unusual outbound traffic (could indicate a compromised server).
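
Established connection counts can be read on Linux from /proc/net/tcp, where the fourth column holds the socket state in hex and 01 means ESTABLISHED. A sketch, with parsing factored out so it also works on captured text:

```python
def count_established(proc_net_tcp: str) -> int:
    """Count ESTABLISHED sockets in /proc/net/tcp-format text.

    The fourth whitespace-separated column is the socket state in hex;
    01 is ESTABLISHED."""
    count = 0
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) > 3 and fields[3] == "01":
            count += 1
    return count
```

On a live host you would sum the counts for /proc/net/tcp and /proc/net/tcp6 and alert when the total far exceeds your baseline, which is the connection-leak/DDoS signal described above.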

Process-Level Monitoring

Beyond system-level metrics, monitor the specific processes that serve your application.

Watch for: web server (nginx, Apache) process count and status; database (PostgreSQL, MySQL) connection pool usage and query latency; application processes (Node.js, Python, PHP-FPM) that should be running and responsive; and queue workers and background job processors that may be falling behind instead of consuming jobs.
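
A basic liveness check on Linux can scan /proc for processes by name. A sketch; note this only proves the process exists, and responsiveness still needs an application-level probe such as a health endpoint:

```python
import os

def running_pids(name: str) -> list:
    """Return PIDs whose process name (/proc/<pid>/comm) equals `name`.

    Linux-only; comm is truncated by the kernel to 15 characters."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited while we were scanning
    return pids
```

An empty list for "nginx" (or whatever serves your app) is an immediate critical alert; a count that keeps growing can indicate worker leakage.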

Add External Monitoring to Your Server Stack

Server monitoring tells you what is happening inside. Site Watcher tells you what users experience outside — uptime, SSL, DNS, domain expiry, and vendor health. Free for 3 targets.

Server Monitoring Approaches

Agent-Based Monitoring

Agent-based tools install a small program (agent) on your server that collects metrics and sends them to a central platform.

Examples: Prometheus with node_exporter, Datadog Agent, New Relic Infrastructure, Zabbix Agent.

Pros: Deep visibility into every metric — CPU, memory, disk, network, processes, custom application metrics. Can collect metrics at high frequency (every 10-15 seconds). Supports custom checks tailored to your specific application.

Cons: Requires server access to install and maintain the agent. The agent itself consumes resources (usually minimal). You are responsible for keeping the agent updated. If the server goes completely offline, the agent cannot report — you lose visibility at exactly the moment you need it most.

Agentless / External Monitoring

External monitoring checks your services from outside your infrastructure. No software is installed on your server. The monitoring service sends HTTP requests, DNS queries, or TCP connections from its own servers and reports what it sees.

Examples: Site Watcher, Pingdom, UptimeRobot, StatusCake.

Pros: No server access required. Works with any hosting setup (shared hosting, managed platforms, serverless). Reflects the actual user experience — if the monitoring tool cannot reach your site, neither can your users. Catches network, DNS, SSL, and CDN issues that agent-based monitoring misses.

Cons: Cannot see internal server metrics. Cannot tell you why the server is slow — only that it is slow (or down). Limited to what is observable from outside.

Using Both Together

The ideal monitoring setup combines both approaches.

Step 1: Set Up External Website Monitoring First

Start with external monitoring (uptime, SSL, DNS, domain) because it catches the issues that directly impact users. This requires no server access and takes minutes to configure.
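
At its core, a single external check is an HTTP request with a timeout, timed and classified. A minimal Python sketch; real monitoring services add multiple probe locations, retries, and dedicated SSL and DNS checks:

```python
import time
import urllib.error
import urllib.request

def is_up(status) -> bool:
    """Treat 2xx/3xx as up; 4xx/5xx or no response at all as down."""
    return status is not None and 200 <= status < 400

def check_url(url: str, timeout: float = 10.0) -> dict:
    """One external probe: status code, response time, and an up/down verdict."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        status = exc.code   # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        status = None       # DNS failure, refused connection, or timeout
    return {"status": status,
            "elapsed_s": round(time.monotonic() - start, 3),
            "up": is_up(status)}
```

Because the probe runs from outside, a DNS misconfiguration or expired certificate fails here even when every server metric is green, which is the whole point of starting with this layer.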

Step 2: Add Server-Level Metrics

Install an agent (or use your hosting platform's built-in metrics) to track CPU, memory, disk, and network. This gives you the diagnostic data to investigate issues detected by external monitoring.

Step 3: Configure Application-Level Checks

Add monitoring for your specific application stack: database connections, queue depth, cache hit rates, error rates. These are the early warning signals that predict outages before they happen.
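
These application checks can be aggregated behind one function, so a single health endpoint or cron job reports overall status. A generic sketch; the check names and thresholds in the comment are hypothetical, not from any particular stack:

```python
def run_checks(checks: dict) -> tuple:
    """Run named zero-argument checks; an exception or falsy return is a failure.

    Returns (per-check results, overall healthy flag)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False  # a crashing check is a failing check
    return results, all(results.values())

# Example wiring (names and thresholds are hypothetical):
# results, healthy = run_checks({
#     "db_connections": lambda: pool_in_use() < 90,
#     "queue_depth":    lambda: queue_depth() < 1000,
# })
```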

Step 4: Unify Alerting

Route all alerts through a single channel (Slack, PagerDuty, email) so your team has one place to look during an incident. External monitoring saying "site is down" combined with server monitoring saying "CPU at 100%" immediately tells you the story.

Common Server Monitoring Mistakes

Monitoring everything equally. Not all metrics are equally important. A CPU spike during a daily backup is normal. Disk space at 95% is an emergency. Set alert thresholds that match the actual risk — critical for disk space and process health, warning-only for temporary CPU spikes.

No baseline. Without knowing what "normal" looks like, you cannot identify abnormal. Observe metrics for 1-2 weeks before setting alert thresholds. A server that normally runs at 60% CPU should alert at 85%. A server that normally runs at 20% should alert at 50%.
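
Turning a baseline into a threshold can be as simple as taking a high percentile of the observed samples and adding headroom. A sketch of that idea; the 20-point headroom and 95% ceiling are illustrative choices, not rules:

```python
import statistics

def baseline_threshold(samples, headroom: float = 20.0,
                       ceiling: float = 95.0) -> float:
    """Alert threshold = ~95th percentile of observed normal load, plus headroom."""
    p95 = statistics.quantiles(samples, n=20)[-1]  # last cut point ~ p95
    return min(p95 + headroom, ceiling)
```

Feeding it one to two weeks of CPU samples yields a threshold tied to that server's normal, so the quiet 20%-baseline box and the busy 60%-baseline box get different alert lines.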

Alert fatigue. If your team receives 50 alerts a day, they stop paying attention. Every alert should be actionable. If an alert fires and the response is "ignore it," the threshold is wrong. Reduce noise ruthlessly.

Ignoring disk I/O. Teams monitor CPU and memory religiously but ignore disk I/O. A database running on a slow disk can cause response times to spike even when CPU and memory look healthy. Monitor disk read/write latency, not just disk space.

Not monitoring the monitor. If your monitoring server goes down, you lose all visibility. Self-hosted monitoring (Prometheus, Uptime Kuma) needs its own health checks. External monitoring services handle this for you because they run on redundant infrastructure.

The most common monitoring gap is having server monitoring but no external monitoring. Your Grafana dashboard shows green across the board while users see a DNS error because someone accidentally deleted an A record. Always monitor from the outside in.

Server monitoring shows you the engine. Website monitoring shows you the road. You need both to know where you are going and whether you will get there.

Complete Your Monitoring Stack

Site Watcher provides the external monitoring layer — uptime, SSL, DNS, domain expiry, and vendor dependencies — that complements your server-side tools. $39/mo unlimited. Free for 3 targets.