Operations · CTF

Blue Team Operations Guide

Monitoring Setup

Install Splunk Enterprise and forwarders on every Linux and Windows host before go-live.

Open Monitoring Guide

Detection Queries

Ready-to-paste SPL alerts and dashboards for Linux, Windows, and server services.

Open Dashboards & Alerts

Overview & Priorities

Your role is detection and response · not pentesting, not patching everything in sight. The red team will get in. Your job is to know when, where, and how, fast enough to contain it.

Priority order during prep

1. Asset inventory · you can't defend what you don't know exists.

2. Credential rotation · assume the red team has every default.

3. Baseline snapshots · you need a "known good" for diffing later.

4. Splunk ingestion · get logs flowing ASAP; you're blind without them.

5. Hardening · lock down services, disable junk, firewall rules.

6. Dashboards & alerts · build the cockpit you'll watch live.

7. Backups · snapshot everything before go-live.

Team roles (3-person)

Watcher

Eyes on Splunk dashboards, triages alerts, logs every event in the running journal.

Responder

Investigates the Watcher's leads · runs commands on hosts, pulls files, decides containment.

Reserve / Sleep

Off-shift, recovering. Rotates in for the next slot. Stay disciplined · don't burn out in hour 2.

Shift Schedule

Three people · 2.5 days · 8-hour shifts · always 2 active, 1 on reserve/sleep.

Slot	00–08	08–16	16–24
Day 1	P3 sleep	P1 + P2	P2 + P3
Day 2	P3 + P1	P1 + P2	P2 + P3
Day 3	P3 + P1	P1 + P2	·

Tip Shift overlaps by 30 min · outgoing person briefs incoming using the handoff template. Never end a shift mid-incident without a verbal walkthrough.

Golden Rules

Log every action you take · timestamp, host, command, outcome.
Never act on a host alone · buddy-check destructive commands before running.
Write down findings immediately · if it's not in the journal, it didn't happen.
Don't reboot servers without explicit team agreement · you may destroy live evidence.
Trust nothing, verify twice · false positives come fast and waste cycles.
If you panic, stand up. Walk 60 seconds. Then act.
The red team wants you tunnel-visioned · keep the dashboard rotation discipline.

Asset Inventory

Build a master list of every host, IP, role, OS, and service before anything else. Without it you are blind.

Network discovery

bash · quick scan

# Discover live hosts on the local /24
sudo nmap -sn 192.168.1.0/24 -oA hosts_alive

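# Extract the live IPs (-iL expects a plain host list, not .gnmap output)
awk '/Status: Up/ {print $2}' hosts_alive.gnmap > hosts_alive.txt

# Service/version scan on discovered hosts
sudo nmap -sV -sC -O -iL hosts_alive.txt -oA services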

# Quick top-1000 sweep
sudo nmap -T4 --top-ports 1000 192.168.1.0/24 -oA top1000

Inventory template

IP	Hostname	OS	Role	Owner
10.0.0.10	dc01	Win 2022	Domain Controller	P1
10.0.0.20	web01	Ubuntu 22	Apache + PHP	P2
10.0.0.30	db01	Debian 12	MySQL	P2
10.0.0.40	splunk	Ubuntu 22	SIEM	P3
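
The first pass of the inventory can be seeded from the ping sweep instead of typed by hand. A minimal sketch, assuming the grepable output written by nmap -sn ... -oA hosts_alive (lines of the form "Host: <ip> (<name>)	Status: Up"). The function name gnmap_to_inventory and the "?" placeholders are illustrative, not part of any tool.

```bash
# Turn nmap grepable output into tab-separated inventory rows.
# OS, Role and Owner are left as "?" to be filled in by hand.
gnmap_to_inventory() {
  printf 'IP\tHostname\tOS\tRole\tOwner\n'
  awk '/Status: Up/ {
    ip = $2
    name = $3
    gsub(/[()]/, "", name)        # "(dc01)" -> "dc01"
    if (name == "") name = "?"    # no reverse DNS entry
    printf "%s\t%s\t?\t?\t?\n", ip, name
  }' "$1"
}

# Usage: gnmap_to_inventory hosts_alive.gnmap > inventory.tsv
```

The remaining columns still need a human: nmap can guess an OS, but role and owner are team decisions.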

Baseline Snapshots

You need a "known good" of every host. Hash everything you can. When something feels off later, you diff against this.

bash · Linux baseline

# Hash all binaries in PATH
find /usr/bin /usr/sbin /bin /sbin -type f -exec sha256sum {} \; > /root/baseline-bins.txt

# Capture state of services, listeners, users, cron
systemctl list-units --type=service --state=running > /root/baseline-services.txt
ss -tlnp > /root/baseline-listeners.txt
cat /etc/passwd /etc/shadow /etc/group > /root/baseline-accounts.txt
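{ crontab -l; ls -la /etc/cron*; } > /root/baseline-cron.txt 2>&1

# SUID/SGID files (the IR checklist diffs against this)
find / -perm /6000 -type f 2>/dev/null > /root/baseline-suid.txt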

# Pull the baseline off-host immediately
scp /root/baseline-*.txt blue@splunk:/var/baselines/$(hostname)/
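
When something feels off later, the comparison itself can be scripted. A sketch, assuming the baseline was written by sha256sum as above and a fresh file in the same format has been generated on the suspect host. changed_bins is a hypothetical helper name, and an empty baseline file is not handled.

```bash
# Compare two sha256sum-format files ("<hash>  <path>") and report
# paths that are new, changed, or missing relative to the baseline.
changed_bins() {
  awk '
    { hash = $1; path = substr($0, 67) }     # 64 hex chars + 2 separator chars
    NR == FNR { base[path] = hash; next }    # first file: remember the baseline
    { seen[path] = 1 }
    !(path in base)    { print "NEW      " path; next }
    base[path] != hash { print "CHANGED  " path }
    END { for (p in base) if (!(p in seen)) print "MISSING  " p }
  ' "$1" "$2"
}

# Usage:
#   find /usr/bin /usr/sbin /bin /sbin -type f -exec sha256sum {} \; > /tmp/bins-now.txt
#   changed_bins /root/baseline-bins.txt /tmp/bins-now.txt
```

Run the comparison against the off-host copy of the baseline where possible, since a file left in /root on a compromised host may itself have been edited.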

PowerShell · Windows baseline

Get-Service | Where Status -eq Running | Export-Csv baseline-services.csv
Get-Process | Export-Csv baseline-procs.csv
Get-NetTCPConnection -State Listen | Export-Csv baseline-listeners.csv
Get-LocalUser | Export-Csv baseline-users.csv
Get-LocalGroupMember Administrators | Export-Csv baseline-admins.csv
Get-ScheduledTask | Export-Csv baseline-tasks.csv

Harden Linux

SSH

/etc/ssh/sshd_config

PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
PermitEmptyPasswords no
MaxAuthTries 3
ClientAliveInterval 300
ClientAliveCountMax 2
AllowUsers blue
LoginGraceTime 30

bash

sudo sshd -t  # validate config
sudo systemctl restart sshd

fail2ban (SSH protection)

bash

sudo apt install -y fail2ban
sudo cp /etc/fail2ban/jail.conf /etc/fail2ban/jail.local
# in jail.local: bantime = 1h, maxretry = 3, [sshd] enabled = true
sudo systemctl enable --now fail2ban

Quick wins

Disable unused services: systemctl disable --now <svc>
Update packages: apt update && apt upgrade -y
Enable unattended-upgrades for security patches
Lock all unused user accounts: passwd -l <user>
Set strong umask (027) in /etc/profile
Enable auditd and forward to Splunk
Install AIDE for file integrity monitoring: aide --init, install the generated aide.db.new as aide.db, then aide --check

Harden Windows

Local policy quick-wins

Rename and disable the local Administrator account; create a fresh admin
Enforce password policy: 14+ chars, complexity, lockout after 5 fails
Enable Windows Defender + tamper protection; ensure real-time scan is on
Disable SMBv1: Disable-WindowsOptionalFeature -Online -FeatureName SMB1Protocol
Disable LLMNR + NetBIOS broadcast (mitigate Responder/MITM)
Enable PowerShell logging: ScriptBlock + Module + Transcription
Enable Windows Event Forwarding or Splunk UF for Security/System/PowerShell logs

Disable LLMNR (GPO)

GPO Path

Computer Configuration → Administrative Templates
  → Network → DNS Client
    → Turn Off Multicast Name Resolution = Enabled

PowerShell logging

PowerShell (Admin)

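$base = "HKLM:\Software\Policies\Microsoft\Windows\PowerShell"
"ScriptBlockLogging","ModuleLogging" | ForEach-Object { if (-not (Test-Path "$base\$_")) { New-Item -Path "$base\$_" -Force | Out-Null } }  # create missing policy keys
Set-ItemProperty -Path "$base\ScriptBlockLogging" -Name EnableScriptBlockLogging -Value 1
Set-ItemProperty -Path "$base\ModuleLogging" -Name EnableModuleLogging -Value 1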

Note For deeper AD hardening · Mimikatz, Pass-the-Hash, krbtgt, Kerberoasting · see the Targeted Hardening page.

Firewall / Network Device Hardening

UFW (Ubuntu) baseline

bash

sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow from 10.0.0.0/24 to any port 22  # SSH from LAN only
sudo ufw allow 80,443/tcp                            # web
sudo ufw allow from 10.0.0.0/24 to any port 9997  # Splunk receive (indexer only; forwarders need no inbound rule)
sudo ufw enable
sudo ufw status verbose

Network device principles

Change default credentials on every router, switch, AP
Disable Telnet/HTTP · SSH/HTTPS only
ACLs: deny inbound by default, allow only what's needed
Egress filter from DC: no internet from domain controllers
Segment workstations from servers (separate VLANs if possible)
Log to Splunk via syslog

Credential Rotation

Assume every default password is already burned. Rotate everything in the first hour.

Linux · bulk password reset

bash (run as root)

# Generate strong random password
openssl rand -base64 24

# Set the new password, then force a change at next login
passwd <user>
chage -d 0 <user>

# Lock dormant accounts
usermod -L <user>
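
The commands above handle one account at a time. For an actual bulk reset, one possible sketch follows. It assumes interactive accounts are those with a UID of 1000 or higher and a real login shell (a common Debian/Ubuntu convention, not a guarantee), and it only prints user:password pairs; nothing changes until that output has been reviewed and fed to chpasswd. The function names are illustrative.

```bash
# List accounts that look like human logins in a passwd-format file.
login_users() {
  awk -F: '$3 >= 1000 && $3 < 65534 && $7 !~ /(nologin|false)$/ { print $1 }' \
    "${1:-/etc/passwd}"
}

# Emit "user:password" lines with a fresh random password per account.
new_credentials() {
  login_users "$1" | while IFS= read -r user; do
    # 18 random bytes encode to exactly 24 base64 characters
    pw=$(head -c 18 /dev/urandom | base64)
    printf '%s:%s\n' "$user" "$pw"
  done
}

# Review first, then apply as root and force a change at next login:
#   new_credentials > /root/creds.txt && chmod 600 /root/creds.txt
#   chpasswd < /root/creds.txt
#   cut -d: -f1 /root/creds.txt | xargs -n1 chage -d 0
```

Service accounts with a login shell would be caught by this filter too, so check the list from login_users against the asset inventory before applying anything.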

Windows / AD

PowerShell (Admin / DC)

# Force password reset for all enabled users
Get-ADUser -Filter {Enabled -eq $true} |
  Set-ADUser -ChangePasswordAtLogon $true

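# Reset specific account with random password (Windows PowerShell 5.1; System.Web must be loaded first)
Add-Type -AssemblyName System.Web
$plain = [System.Web.Security.Membership]::GeneratePassword(20,5)  # record this in the team vault
$pw = ConvertTo-SecureString $plain -AsPlainText -Force
Set-ADAccountPassword -Identity <user> -NewPassword $pw -Reset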

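Critical The krbtgt account password must be reset twice to invalidate any Golden Tickets the red team may have crafted. Leave about 10 hours between the two resets (the default maximum Kerberos ticket lifetime) so that valid tickets issued before the first reset have expired. See Golden Ticket hardening.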

Splunk Setup

The full Splunk Enterprise install + Universal Forwarder setup for Linux and Windows clients is on the Monitoring Guide page.

Quick start

Server install (single .deb), forwarders on every host, port 9997 for receive, port 8000 for the web UI.

Open Monitoring Guide

Forwarders

UF on Linux uses /opt/splunkforwarder; on Windows the MSI wizard with a receiving indexer pre-set.

Forwarder setup

Splunk Dashboards

Pre-baked SPL queries for alerts and dashboards across Linux, Windows, and server services live on the Dashboards & Alerts page.

Build the "Live Ops" dashboard first: failed logins timechart, top source IPs, host heartbeats
Then build alerts (Save As → Alert) for the HIGH severity queries
Set dashboard auto-refresh to 5 min · keeps it live
Pin a "Last 24h log volume per host" panel · silent hosts = forwarder problem

Backups

Snapshot every important host before go-live. If the red team trashes a box, you restore from the snapshot, not from a panicked Google search.

VM snapshots

bash · VirtualBox / VMware

# VirtualBox
VBoxManage snapshot <vmname> take "pre-ctf-baseline" --description "clean state"

# VMware (per-VM)
vmrun snapshot /path/to/vm.vmx "pre-ctf-baseline"

Application data

MySQL/PostgreSQL: nightly dump copied off-host (mysqldump --all-databases | gzip for MySQL, pg_dumpall | gzip for PostgreSQL)
Web roots: tar -czf /backup/web-$(date +%F).tgz /var/www
AD: wbadmin start systemstatebackup on the DC
Splunk indexes: stop Splunk → tar /opt/splunk/var/lib/splunk → restart
Pull all backups to a separate "blue" host, not on the production net
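
A backup is only useful if it can be verified after the copy. A small sketch, assuming GNU tar and sha256sum are available. backup_dir and the archive naming are illustrative choices, not a prescribed layout.

```bash
# Archive a directory and write a checksum file next to the archive.
# Prints the archive path on success.
# Usage: backup_dir /var/www /backup web
backup_dir() {
  local src="$1" dest="$2" label="$3" archive name
  archive="$dest/$label-$(date +%F).tgz"
  name=$(basename "$archive")
  mkdir -p "$dest" || return 1
  # -C keeps paths inside the archive relative to the parent of $src
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  ( cd "$dest" && sha256sum "$name" > "$name.sha256" ) || return 1
  printf '%s\n' "$archive"
}

# After copying both files to the backup host, verify there with:
#   sha256sum -c web-<date>.tgz.sha256
```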

Monitoring Playbook

What you do every hour as the Watcher. Discipline beats genius · keep the rotation.

Hourly rotation (all panels)

Failed logins by source IP · anything over 50 in 5 min = brute force.
Successful logins of Domain Admins · every one of these is investigated.
New processes per host · anything unusual (powershell.exe spawning cmd, wmic, vssadmin) gets a ticket.
Outbound network from servers · DC should have zero egress traffic.
Host heartbeats · a silent host means a dead UF or a wiped log channel.
HTTP 5xx / web errors · sudden spike = exploitation attempt.
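Privileged Windows events · 4720 (account created), 4732 (member added to a local group), 4740 (account locked out), 4624 type 10 (RDP logon), 4769 with RC4 (possible Kerberoasting).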
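
The brute-force threshold in the first item can also be checked directly on a host when Splunk is unavailable. A sketch, assuming OpenSSH's usual "Failed password for ... from <ip> port ..." log lines and input that has already been cut down to the window of interest. brute_force_sources is a hypothetical name.

```bash
# Read sshd log lines on stdin; print "count ip" for every source IP whose
# failed-password count exceeds the threshold (default 50), highest first.
brute_force_sources() {
  local threshold="${1:-50}"
  grep 'Failed password' \
    | sed -n 's/.* from \([^ ]*\) port .*/\1/p' \
    | sort | uniq -c \
    | awk -v t="$threshold" '$1 > t { print $1, $2 }' \
    | sort -rn
}

# Usage (assumes a systemd host where the unit is named ssh):
#   journalctl -u ssh --since=-5min | brute_force_sources
#   brute_force_sources 20 < /var/log/auth.log
```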

Journal entry template

journal.md

[14:23] FAILED-LOGIN-SPIKE
  src_ip: 10.0.0.66 (workstation05)
  target: db01:22 · 47 fails in 4 min
  action: blocked src_ip on db01 ufw, opened ticket
  status: contained, watching for further attempts from same /24
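
Typing the template by hand under pressure invites skipped fields. One possible helper that appends an entry in the same layout, stamped with the current time; the function name and argument order are assumptions made for this sketch.

```bash
# Append one entry to the journal in the template's layout.
# Usage: journal_entry TITLE SRC TARGET ACTION STATUS [FILE]
journal_entry() {
  local file="${6:-journal.md}"
  {
    printf '[%s] %s\n' "$(date +%H:%M)" "$1"
    printf '  src_ip: %s\n' "$2"
    printf '  target: %s\n' "$3"
    printf '  action: %s\n' "$4"
    printf '  status: %s\n\n' "$5"
  } >> "$file"
}

# Example:
#   journal_entry FAILED-LOGIN-SPIKE "10.0.0.66 (workstation05)" \
#     "db01:22 · 47 fails in 4 min" "blocked src_ip on db01 ufw" "contained"
```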

Triage & Escalation

Sev	Examples	Action	SLA
P1	DA login from unknown IP, krbtgt activity, ntds.dit access	Wake reserve, full team active, isolate DC	0 min
P2	Web shell, new admin, lateral SMB	Responder takes lead, contain host	5 min
P3	Brute force, recon, scan noise	Block source, log, monitor	15 min
P4	Single failed login, low-rate noise	Note in journal, ignore	·

Note Escalating early is free. Escalating late is a domain takeover. If unsure: assume P2 and verify down.

IR: Linux Compromise

You suspect a Linux host is owned. Run this checklist before rebooting.

bash · first-look IR

# Active connections + listening sockets
ss -tnp; ss -tlnp

# Suspicious processes
ps auxf
ps -ef --forest

# Recently modified files (last 24h)
find / -mtime -1 -type f -not -path "/proc/*" -not -path "/sys/*" 2>/dev/null

# Logged-in users + history
w; last -i | head -30
cat /home/*/.bash_history /root/.bash_history 2>/dev/null

# SUID/SGID newly added (compare against baseline)
find / -perm /6000 -type f 2>/dev/null > /tmp/suid-now.txt
diff /root/baseline-suid.txt /tmp/suid-now.txt
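
diff prints differences in both directions. To see only the entries that appeared since the baseline, comm on sorted copies is one option. A sketch; new_since_baseline is a hypothetical helper and works for any pair of one-path-per-line files.

```bash
# Print lines present in the current list but absent from the baseline.
new_since_baseline() {
  local base_sorted now_sorted
  base_sorted=$(mktemp) || return 1
  now_sorted=$(mktemp) || return 1
  sort "$1" > "$base_sorted"
  sort "$2" > "$now_sorted"
  comm -13 "$base_sorted" "$now_sorted"   # -13 hides baseline-only and common lines
  rm -f "$base_sorted" "$now_sorted"
}

# Usage: new_since_baseline /root/baseline-suid.txt /tmp/suid-now.txt
```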

Containment options (least destructive first)

Network isolation · drop firewall to deny all except Splunk + management host
Kill the suspicious process (preserve memory dump first if possible)
Disable the compromised account: usermod -L -e 1 <user> (passwd -l alone still allows SSH key logins)
Snapshot the host (if VM) before any further changes
Restore from baseline snapshot as last resort
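
For the network isolation step, printing the commands for a buddy-check before running them fits the golden rules. A sketch for a UFW host: the function name, the management IP in the usage line, and the choice of SSH (22/tcp) as the management channel are all assumptions. Note that ufw --force reset discards every existing rule, so keep console access ready, and that already-established connections may survive the rule change, so re-check ss -tnp afterwards.

```bash
# Print (not run) the UFW commands that cut a host off from everything
# except the management host (SSH in) and the Splunk indexer (9997 out).
isolation_plan() {
  local mgmt_ip="$1" splunk_ip="$2"
  cat <<EOF
ufw --force reset
ufw default deny incoming
ufw default deny outgoing
ufw allow from $mgmt_ip to any port 22 proto tcp
ufw allow out to $splunk_ip port 9997 proto tcp
ufw --force enable
EOF
}

# Review the plan with a teammate, then run it as root:
#   isolation_plan 10.0.0.5 10.0.0.40 | sh -x
```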

IR: Windows Compromise

PowerShell (Admin)

# Active connections
Get-NetTCPConnection | Where State -eq Established | Sort RemoteAddress

# Recent processes (with command line)
Get-CimInstance Win32_Process | Select Name, ProcessId, ParentProcessId, CommandLine | ft -auto

# Newly created services
Get-WinEvent -FilterHashtable @{LogName='System'; Id=7045} -MaxEvents 50

# Recent logins (Event 4624)
Get-WinEvent -FilterHashtable @{LogName='Security'; Id=4624; StartTime=(Get-Date).AddHours(-2)}

# Scheduled tasks registered recently (Date is a string and is empty for some tasks)
Get-ScheduledTask | Where-Object { $_.Date -and [datetime]$_.Date -gt (Get-Date).AddDays(-1) }

IR: Find Persistence Mechanisms

Common places attackers hide to survive reboots.

Linux

cron: crontab -l; ls -la /etc/cron.*; cat /etc/anacrontab
systemd: systemctl list-unit-files --state=enabled
SSH keys: check every user's ~/.ssh/authorized_keys
Shell init: ~/.bashrc, ~/.bash_profile, /etc/profile.d/
Loadable kernel modules: lsmod; cat /etc/modules-load.d/*
SUID binaries (diff against baseline)
rc.local, systemd timers, motd scripts
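
Checking every user's authorized_keys by hand is slow on a host with many accounts. A sketch that walks the given home directories and prints one line per key, assuming the standard OpenSSH file location. list_authorized_keys is a hypothetical name; lines that start with key options (such as command=...) are printed whole because they deserve a closer look.

```bash
# List every key in every authorized_keys file under the given directories.
# Output: "<file>: <key type> <comment>" (the key material itself is omitted).
list_authorized_keys() {
  find "$@" -type f -path '*/.ssh/authorized_keys*' 2>/dev/null |
  while IFS= read -r f; do
    # Drop blank lines and comments, then summarise each remaining key line
    grep -Ev '^[[:space:]]*(#|$)' "$f" |
    awk -v file="$f" '
      $1 ~ /^(ssh-|ecdsa-|sk-)/ {
        print file ": " $1 " " ($3 == "" ? "(no comment)" : $3)
        next
      }
      { print file ": " $0 }
    '
  done
}

# Usage: list_authorized_keys /home /root
```

Key comments are free text chosen by whoever added the key, so treat them as a hint and compare the full key against the baseline when in doubt.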

Windows

Run keys: HKLM\Software\Microsoft\Windows\CurrentVersion\Run (and HKCU)
Scheduled tasks: schtasks /query /v
Services: Get-Service | Where StartType -eq Automatic
WMI event subscriptions: Get-WMIObject -Namespace root\subscription -Class __FilterToConsumerBinding
Startup folder: shell:startup, shell:common startup
Image File Execution Options (debugger hijack)
BITS jobs: bitsadmin /list /allusers /verbose

IR: Lateral Movement

Spotting the attacker hopping host-to-host.

Indicators

SMB: Windows Event 5140 admin share access (\\host\C$, ADMIN$)
PSExec: service named PSEXESVC (Event 7045), or process PSEXESVC.exe
WMI remote: WmiPrvSE.exe spawning unusual children
RDP: Event 4624 with Logon_Type=10, especially from non-admin workstation
SSH lateral (Linux): unusual user logins between hosts in /var/log/auth.log
Pass-the-Hash: Event 4624 Logon_Type=3 with NTLM auth from unexpected source

Containment

Block source host at the firewall · kill its outbound to internal targets
Kill active sessions: logoff <sessionid> on Windows, kill SSH PIDs on Linux
Reset credentials of every account that touched the source host
Audit the destination hosts for new accounts, services, scheduled tasks

Common Attack Patterns

SSH brute force

High failed-login count from one IP. Block IP, ensure key-only auth, fail2ban active.

Web shell upload

Look for new .php, .aspx, .jsp files in web roots; outbound from www-data; POST with body containing cmd=.
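
The file check can be scripted as a quick sweep of the web root. A sketch, assuming GNU find (for -printf); recent_web_scripts is a hypothetical name. Modification times can be forged by an attacker, so this complements a file integrity check rather than replacing it.

```bash
# List .php/.aspx/.jsp files under a web root modified in the last N minutes
# (default 1440 = 24 h), newest first.
recent_web_scripts() {
  local root="$1" minutes="${2:-1440}"
  find "$root" -type f \
    \( -name '*.php' -o -name '*.aspx' -o -name '*.jsp' \) \
    -mmin "-$minutes" -printf '%TY-%Tm-%Td %TH:%TM  %p\n' 2>/dev/null |
  sort -r
}

# Usage: recent_web_scripts /var/www 120
```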

Credential dumping

Sysmon Event 10 ProcessAccess on lsass.exe. Mimikatz/Rubeus signatures in command line.

Kerberoasting

Single user requesting many service tickets (Event 4769 with ticket encryption type 0x17, i.e. RC4).

Pass-the-Hash

Event 4624 Logon_Type=3 + NTLM authentication from a non-DC source. Block the source host's traffic to other internal hosts.

Persistence via scheduled task

New schtask running unsigned binary or PowerShell -enc. Disable, investigate parent.

Splunk Query Cheatsheet

SPL · most common queries

# All events from one host in last hour
index=* host=db01 earliest=-1h

# Failed SSH logins by IP
index=main sourcetype=linux_secure "Failed password"
| stats count by src_ip, user
| sort -count

# Windows logon failures
index=wineventlog EventCode=4625
| stats count by Account_Name, src_ip

# Top sourcetypes / volume
index=* earliest=-1h
| stats count by host, sourcetype | sort -count

# Live process creation (requires Sysmon or 4688)
index=wineventlog EventCode=4688 New_Process_Name="*powershell*"
| table _time, ComputerName, Process_Command_Line

# Silent hosts (no logs in last 30 min)
| metadata type=hosts | eval mins_silent=round((now()-recentTime)/60,1)
| where mins_silent > 30
| sort -mins_silent

For more, see the dedicated Dashboards & Alerts page.

Critical Ports Reference

Port	Proto	Service	Notes
22	TCP	SSH	Restrict to mgmt subnet
53	UDP/TCP	DNS	Watch for tunneling
88	TCP/UDP	Kerberos	DC only
135	TCP	RPC endpoint mapper	Windows only, internal
389/636	TCP	LDAP / LDAPS	DC only
445	TCP	SMB	Block between workstations
3389	TCP	RDP	Bastion only, never internet-exposed
5985/5986	TCP	WinRM	Internal mgmt only
8000	TCP	Splunk Web UI	Internal access only
9997	TCP	Splunk receive	From forwarders to indexer

Log File Locations

Linux

Path	Content
/var/log/auth.log	SSH, sudo, login
/var/log/syslog	General system events
/var/log/kern.log	Kernel messages
/var/log/apache2/access.log	Apache requests
/var/log/nginx/access.log	Nginx requests
/var/log/mysql/error.log	MySQL errors
~/.bash_history	Per-user shell history
journalctl -u <svc>	systemd unit logs

Windows

Channel	Content
Security	Logons, account changes, audit (4624, 4625, 4720, 4732, 4740, 4769)
System	Service events (7045 = new service)
Application	App-level errors
Microsoft-Windows-PowerShell/Operational	4103, 4104 PowerShell logs
Microsoft-Windows-Sysmon/Operational	1/3/8/10/22 etc. (if Sysmon installed)

Useful Tools

Sysmon (Windows)

Endpoint visibility on steroids · process create, network connect, image load, remote thread, registry. Use SwiftOnSecurity config as the baseline.

SwiftOnSecurity config

auditd (Linux)

Kernel-level audit framework. Use auditctl to track file accesses, syscalls, and command execution.

chkrootkit / rkhunter

Rootkit detectors. Run during prep for a baseline; re-run during IR to spot kernel-level implants.

AIDE

File integrity checker. Compute baseline hashes, then aide --check later to find tampered files.

fail2ban

Watches log files and bans IPs that hit thresholds. Easy SSH/web brute force defense.

tcpdump / Wireshark

For packet capture during incidents. tcpdump -i any -w incident.pcap host <ip> for a quick capture.

Shift Handoff Template

Use this every shift change. Verbal walkthrough + journal entry.

handoff.md

SHIFT HANDOFF · <date> <time>
Outgoing: P1
Incoming: P2

ACTIVE INCIDENTS
  1. <short title> · sev P2 · host: db01 · owner: P1
     status: contained, monitoring egress
     next-action: re-image at 14:00 if no further activity

OPEN TICKETS
  - #007 SSH bruteforce 10.0.0.66 → blocked, watching
  - #008 unusual cron on web01 → escalate to P3 if reappears

ENVIRONMENT CHANGES THIS SHIFT
  - rotated credentials on dc01 (krbtgt 1st reset done, 2nd at 19:00)
  - added 2 new alerts in Splunk (failed PowerShell -enc, new service)

WATCHLIST FOR INCOMING SHIFT
  - krbtgt 2nd reset at 19:00 · must run before red team can reuse golden ticket
  - Splunk disk at 78% · rotate older indexes if needed

UNCERTAIN / TO-INVESTIGATE
  - one alert at 12:34 looked like recon from 10.0.0.99 · couldn't reach owner
  - need second pair of eyes on web01 access.log for SQLi patterns