Debugging and Profiling
The Diagnostic Mindset
Software will execute exactly as instructed, regardless of the programmer's intentions. Debugging bridges the gap between intended and actual behavior, and while this process can be time-intensive, there are effective techniques for identifying and resolving issues in buggy or resource-intensive code. Often times, debugging can be seen as a reactive process that slows down development, but implementing systematic debugging practices accelerates development cycles and reduces time spent tracking down issues.
A Systematic Approach (Observe, Hypothesize, Test)
Effective debugging follows a methodical process rather than random trial and error. The systematic approach consists of five key steps:
- Reproduce consistently - If you can't reliably reproduce the bug, you can't verify your fix. Document the exact steps, inputs, and environment that trigger the issue.
- Isolate the problem - Use binary search through your code. Comment out half the functionality, see if the bug persists, and narrow down the problematic section.
- Form a hypothesis - Based on symptoms and isolated code, develop a theory about what's wrong. "I think the null pointer occurs because the API returns empty data."
- Test the hypothesis - Add logging, use a debugger, or write a minimal test case to prove or disprove your theory. Make one change at a time.
- Fix and verify - Once confirmed, implement the fix and verify it resolves the issue without breaking other functionality. Run related tests.
Example: Suppose your web app crashes when processing user uploads. You reproduce it (step 1), isolate it to the file parsing function (step 2), hypothesize that large files cause memory issues (step 3), confirm by testing with various file sizes (step 4), and implement streaming processing instead of loading entire files into memory (step 5).
Debugging Fundamentals
Print Debugging & Logging
As Brian Kernighan noted in "Unix for Beginners" (1979), "The most effective debugging tool is still careful thought, coupled with judiciously placed print statements." The simplest approach to debugging involves adding print statements near suspected problem areas and iterating until sufficient information is gathered to identify the root cause. This method's simplicity and ease of implementation make it a preferred choice for many software engineers.
Print debugging can be enhanced by implementing proper logging instead of simple print statements. Logging systems offer several advantages over basic printing: they can output to multiple destinations including files, sockets, and remote servers, making log review more convenient than scanning terminal output. They also support severity levels (INFO, DEBUG, WARN, ERROR) for filtered output, and they establish a logging infrastructure that can serve both immediate debugging needs and long-term monitoring requirements.
If used correctly, logging can significantly increase development velocity. Below, are a few tips to help make logs more useful:
- Set log levels properly to allow you to filter out unnecessary messages (ex. diagnostic messages) to help you narrow down to the actual issue.
- Many libraries support structured logging, a method of organizing log data into a structured format, making it easier to analyze and interpret. Instead of recording raw text, structured logging uses key-value pairs which provide context and additional information about the logged event. The structured nature makes it easier to search, filter and analyze logs.
- You want to always be able to find the source code for any given log entry. This means using unique messages, prefixes, etc. which will help you trace a code path using log messages. This is more difficult if multiple places create the same log entry.
- Use a log viewer to make it easier to process and view logs
- It can be useful (especially in a web-development context) to use a correlation id to track a request throughout the entire transaction.
- Logging should not be competing with your software for resources. Use logging levels strategically as well as opting for accumulated metrics over textual logs as needed.
- Never log sensitive information.
- Only log what you need but finding out what you actually need is an iterative process.
Here's a practical example demonstrating different log levels:
import logging
# Configure logging
logging.basicConfig(
level=logging.DEBUG,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
def process_user_data(user_id, data):
logger.debug(f"Starting to process data for user {user_id}")
if not data:
logger.warning(f"User {user_id} submitted empty data")
return None
try:
# Simulate processing
result = {"user_id": user_id, "processed": len(data)}
logger.info(f"Successfully processed {len(data)} items for user {user_id}")
return result
except Exception as e:
logger.error(f"Failed to process data for user {user_id}: {e}")
raise
# Run with different scenarios
process_user_data(123, ["item1", "item2", "item3"])
process_user_data(456, [])
Run this script to see all log levels:
$ python logging_demo.py
You can change the logging level to logging.INFO to filter out DEBUG messages:
# Edit the script to change level=logging.DEBUG to level=logging.INFO
# Then run again to see only INFO, WARNING, and ERROR messages
$ python logging_demo.py
Or set to logging.WARNING to see only warnings and errors. This makes it easy to control verbosity in development vs. production environments.
External Logging
When working with external dependencies like web servers, databases, or containerized services, you'll often need to check their logs since client-side error messages may not provide enough detail. Most programs write logs to /var/log on UNIX systems, and modern containerized applications expose logs through commands like docker logs <container-name>.
Interactive Debuggers
When print debugging is not enough you should use a debugger. Debuggers are programs that let you interact with the execution of a program, allowing the following:
- Halt execution of the program when it reaches a certain line.
- Step through the program one instruction at a time.
- Inspect values of variables after the program crashed.
- Conditionally halt the execution when a given condition is met.
- And many more advanced features
Many programming languages come with some form of debugger.
Core Concepts (Stepping, State Inspection, Call Stack)
Debuggers allow you to pause program execution at breakpoints, then step through code line by line. Key operations include "Step Over" (execute the current line including any function calls), "Step Into" (enter a function to debug its internals), and "Step Out" (continue until the current function returns). While paused, you can inspect the call stack (the sequence of function calls that led to the current point), examine and modify variable values, and set watch expressions to monitor specific values as the program executes.
Advanced Breakpoints (Conditional, Hit Count, Logpoints)
Conditional breakpoints extend the basic breakpoint concept by adding programmable conditions. Instead of stopping every time a particular line is reached, the debugger only halts execution when the specified condition is true. For example, a conditional breakpoint might only trigger when a variable reaches a certain value or when a specific error condition occurs. This capability is especially valuable when debugging issues that only manifest under specific circumstances or when dealing with code that executes frequently but only occasionally exhibits problematic behavior.
Hit count breakpoints pause execution only after a line has been hit a certain number of times. This is useful when debugging loops - for example, breaking only on the 100th iteration when you suspect an issue occurs after many iterations.
Logpoints (also called tracepoints) allow you to log messages to the console without stopping execution and without modifying your source code. Instead of adding print statements and rerunning your program, you can inject logging at any point during a debugging session.
Hands-On: Using Python's pdb
Python's built-in debugger pdb provides an interactive debugging experience. Here's a buggy program and how to debug it:
# buggy_calculator.py
def calculate_average(numbers):
total = 0
for num in numbers:
total += num
return total / len(numbers)
def process_data(data):
results = []
for item in data:
avg = calculate_average(item['values'])
results.append({'name': item['name'], 'average': avg})
return results
if __name__ == '__main__':
data = [
{'name': 'dataset1', 'values': [1, 2, 3, 4, 5]},
{'name': 'dataset2', 'values': []}, # This will cause an error!
{'name': 'dataset3', 'values': [10, 20, 30]}
]
print(process_data(data))
To debug with pdb, you can either:
- Run with
python -m pdb buggy_calculator.py - Add
import pdb; pdb.set_trace()where you want to break
def calculate_average(numbers):
import pdb; pdb.set_trace() # Execution will pause here
total = 0
for num in numbers:
total += num
return total / len(numbers)
Common pdb commands:
l(list) - Show current code contextn(next) - Execute current line (step over)s(step) - Step into function callsc(continue) - Continue execution until next breakpointp variable_name- Print variable valuepp variable_name- Pretty-print variable valuew(where) - Show call stackb line_number- Set breakpoint at line numberb function_name- Set breakpoint at functioncondition bp_number condition- Make breakpoint conditional
Specialized Tools
System Call Tracers (strace, dtrace)
Even if what you are trying to debug is a black box binary there are tools that can help you with that. Whenever programs need to perform actions that only the kernel can, they use System Calls. There are commands that let you trace the syscalls your program makes. In Linux there's strace and macOS and BSD have dtrace. dtrace can be tricky to use because it uses its own D language, but there is a wrapper called dtruss that provides an interface more similar to strace (more details here).
Below are some examples of using strace or dtruss to show stat syscall traces for an execution of ls. For a deeper dive into strace, this article and this zine are good reads.
# On Linux
sudo strace -e lstat ls -l > /dev/null
# On macOS
sudo dtruss -t lstat64_extended ls -l > /dev/null
Here's a practical example debugging why a Python script is slow. First, create a simple script:
# slow_script.py
with open('/etc/hosts', 'r') as f:
for line in f:
pass
Now trace it with strace to see what system calls it makes:
$ strace -c python slow_script.py
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
34.21 0.000026 13 2 read
23.68 0.000018 9 2 openat
15.79 0.000012 12 1 write
...
The -c flag provides a summary. You can also see individual calls with -e trace=openat,read to focus on specific syscalls. This helps identify if your program is doing excessive file I/O, network calls, or other system operations.
Web Development Tools
Browser developer tools (Chrome DevTools, Firefox Developer Tools) are essential for web development debugging. Here's a practical walkthrough of common debugging scenarios:
Debugging JavaScript:
- Open DevTools (F12 or Ctrl+Shift+I / Cmd+Option+I)
- Go to the Sources tab
- Find your JavaScript file in the file tree
- Click on a line number to set a breakpoint
- Refresh the page - execution will pause at your breakpoint
- Use the controls to step through code, inspect variables in the Scope panel
- Use the Console to evaluate expressions in the current context
Debugging Network Requests:
- Open the Network tab
- Perform the action that triggers the request (e.g., submit a form, click a button)
- Click on the request to see:
- Headers (request/response headers)
- Preview (formatted response)
- Response (raw response)
- Timing (how long each phase took)
- Right-click a request and select "Copy as cURL" to replay it from the command line
- Use "Preserve log" to keep requests across page navigations
Common Use Cases:
- API debugging: Check if your frontend is sending the correct data, verify response structure
- Performance issues: Use the Network tab's timing column to find slow requests
- Cookie/Storage issues: Application tab shows cookies, localStorage, sessionStorage
- JavaScript errors: Console tab shows errors with stack traces; click to jump to the problematic line
- Live editing: Modify CSS in the Elements tab or JavaScript in Sources to test fixes without redeploying
Performance Profiling
Even if your code functionally behaves as you would expect, that might not be good enough if it takes all your CPU or memory in the process. Algorithms classes often teach big O notation but not how to find hot spots in your programs. Since premature optimization is the root of all evil, you should learn about profilers and monitoring tools. They will help you understand which parts of your program are taking most of the time and/or resources so you can focus on optimizing those parts.
Key Profiling Areas
CPU Profiling (incl. Real/User/Sys Time)
Timing
Similarly to the debugging case, in many scenarios it can be enough to just print the wall clock time it took your code between two points. However, wall clock time can be misleading since your computer might be running other processes at the same time or waiting for events to happen. It is common for tools to make a distinction between Real, User and Sys time. In general, User + Sys tells you how much time your process actually spent in the CPU (more detailed explanation here).
- Real - Wall clock elapsed time from start to finish of the program, including the time taken by other processes and time taken while blocked (e.g. waiting for I/O or network)
- User - Amount of time spent in the CPU running user code
- Sys - Amount of time spent in the CPU running kernel code
For example, try running a command that performs an HTTP request and prefixing it with time. Under a slow connection it can take over 2 seconds for the request to complete but the process itself will only take ~15ms of CPU user time and 12ms of kernel CPU time.
CPU Profilers
Most of the time when people refer to profilers they actually mean CPU profilers, which are the most common. There are two main types of CPU profilers: tracing and sampling profilers. Tracing profilers keep a record of every function call your program makes whereas sampling profilers probe your program periodically (commonly every millisecond) and record the program's stack. They use these records to present aggregate statistics of what your program spent the most time doing. Here is a good intro article if you want more detail on this topic.
In Python we can use the cProfile module to profile time per function call. Here is a simple example that implements a rudimentary grep in Python:
#!/usr/bin/env python
import sys, re
def grep(pattern, file):
with open(file, 'r') as f:
print(file)
for i, line in enumerate(f.readlines()):
pattern = re.compile(pattern)
match = pattern.search(line)
if match is not None:
print("{}: {}".format(i, line), end="")
if __name__ == '__main__':
times = int(sys.argv[1])
pattern = sys.argv[2]
for i in range(times):
for file in sys.argv[3:]:
grep(pattern, file)
We can profile this code using the following command. Analyzing the output we can see that IO is taking most of the time and that compiling the regex takes a fair amount of time as well.
$ python -m cProfile -s tottime grep.py 1000 '^(import|\s*def)[^,]*$' *.py
[omitted program output]
ncalls tottime percall cumtime percall filename:lineno(function)
8000 0.266 0.000 0.292 0.000 {built-in method io.open}
8000 0.153 0.000 0.894 0.000 grep.py:5(grep)
17000 0.101 0.000 0.101 0.000 {built-in method builtins.print}
8000 0.100 0.000 0.129 0.000 {method 'readlines' of '_io._IOBase' objects}
93000 0.097 0.000 0.111 0.000 re.py:286(_compile)
93000 0.069 0.000 0.069 0.000 {method 'search' of '_sre.SRE_Pattern' objects}
93000 0.030 0.000 0.141 0.000 re.py:231(compile)
17000 0.019 0.000 0.029 0.000 codecs.py:318(decode)
1 0.017 0.017 0.911 0.911 grep.py:3(<module>)
[omitted lines]
Notice that re.py:286(_compile) is called 93,000 times! The regex is being recompiled on every line. Let's fix this:
def grep(pattern, file):
regex = re.compile(pattern) # Compile once, outside the loop
with open(file, 'r') as f:
print(file)
for i, line in enumerate(f.readlines()):
match = regex.search(line) # Use the compiled regex
if match is not None:
print("{}: {}".format(i, line), end="")
After the optimization, profiling again shows:
ncalls tottime percall cumtime percall filename:lineno(function)
8000 0.234 0.000 0.259 0.000 {built-in method io.open}
8000 0.116 0.000 0.642 0.000 grep.py:5(grep)
17000 0.098 0.000 0.098 0.000 {built-in method builtins.print}
93000 0.067 0.000 0.067 0.000 {method 'search' of '_sre.SRE_Pattern' objects}
8000 0.055 0.000 0.070 0.000 {method 'readlines' of '_io._IOBase' objects}
8 0.026 0.003 0.029 0.004 {built-in method _sre.compile}
...
The _sre.compile calls dropped from 93,000 to 8 (once per file), and total time improved significantly. This demonstrates how profiling identifies bottlenecks and validates optimizations.
A caveat of Python's cProfile profiler (and many profilers for that matter) is that they display time per function call. That can become unintuitive really fast, especially if you are using third party libraries in your code since internal function calls are also accounted for.
A more intuitive way of displaying profiling information is to include the time taken per line of code, which is what line profilers do.
For instance, the following piece of Python code performs a request to the class website and parses the response to get all URLs in the page:
#!/usr/bin/env python
# urls.py
import requests
from bs4 import BeautifulSoup
# This is a decorator that tells line_profiler
# that we want to analyze this function
@profile
def get_urls():
response = requests.get('https://missing.csail.mit.edu')
s = BeautifulSoup(response.content, 'lxml')
urls = []
for url in s.find_all('a'):
urls.append(url['href'])
if __name__ == '__main__':
get_urls()
If we used Python's cProfile profiler we'd get over 2500 lines of output, and even with sorting it'd be hard to understand where the time is being spent. First install line_profiler (pip install line_profiler), then run it:
$ kernprof -l -v urls.py
Wrote profile results to urls.py.lprof
Timer unit: 1e-06 s
Total time: 0.636188 s
File: a.py
Function: get_urls at line 5
Line # Hits Time Per Hit % Time Line Contents
==============================================================
5 @profile
6 def get_urls():
7 1 613909.0 613909.0 96.5 response = requests.get('https://missing.csail.mit.edu')
8 1 21559.0 21559.0 3.4 s = BeautifulSoup(response.content, 'lxml')
9 1 2.0 2.0 0.0 urls = []
10 25 685.0 27.4 0.1 for url in s.find_all('a'):
11 24 33.0 1.4 0.0 urls.append(url['href'])
Memory Profiling
In languages like C or C++ memory leaks can cause your program to never release memory that it doesn't need anymore. To help in the process of memory debugging you can use tools like Valgrind that will help you identify memory leaks.
In garbage collected languages like Python it is still useful to use a memory profiler because as long as you have pointers to objects in memory they won't be garbage collected.
Here's an example program and its associated output when running it with memory-profiler (note the decorator like in line-profiler). First, install it: pip install memory-profiler
@profile
def my_func():
a = [1] * (10 ** 6)
b = [2] * (2 * 10 ** 7)
del b
return a
if __name__ == '__main__':
my_func()
$ python -m memory_profiler example.py
Line # Mem usage Increment Line Contents
==============================================
3 @profile
4 5.97 MB 0.00 MB def my_func():
5 13.61 MB 7.64 MB a = [1] * (10 ** 6)
6 166.20 MB 152.59 MB b = [2] * (2 * 10 ** 7)
7 13.61 MB -152.59 MB del b
8 13.61 MB 0.00 MB return a
Here's a more realistic example showing a memory leak in a web application:
# memory_leak.py
cache = [] # Global cache that grows unbounded
@profile
def process_requests():
for i in range(1000):
data = fetch_data(i) # Simulates fetching data
cache.append(data) # Memory leak: never cleaned up
result = analyze(data)
def fetch_data(id):
return [id] * 10000 # Simulate large data
def analyze(data):
return sum(data)
if __name__ == '__main__':
process_requests()
Running memory profiler reveals the leak:
$ python -m memory_profiler memory_leak.py
Line # Mem usage Increment Line Contents
==============================================
3 @profile
4 14.1 MB 0.0 MB def process_requests():
5 14.1 MB 0.0 MB for i in range(1000):
6 90.2 MB 76.1 MB data = fetch_data(i)
7 90.2 MB 0.0 MB cache.append(data) # Memory keeps growing!
8 90.2 MB 0.0 MB result = analyze(data)
The fix: implement a bounded cache or clear old entries:
from collections import deque
cache = deque(maxlen=100) # Only keep last 100 items
Event & I/O Profiling
As it was the case for strace for debugging, you might want to ignore the specifics of the code that you are running and treat it like a black box when profiling.
The perf command abstracts CPU differences away and does not report time or memory, but instead it reports system events related to your programs.
For example, perf can easily report poor cache locality, high amounts of page faults or livelocks. Here is an overview of the command:
perf list- List the events that can be traced with perfperf stat COMMAND ARG1 ARG2- Gets counts of different events related to a process or commandperf record COMMAND ARG1 ARG2- Records the run of a command and saves the statistical data into a file calledperf.dataperf report- Formats and prints the data collected inperf.data
Visualizing Performance: Flame Graphs
Flame graphs are a visualization technique for understanding where your program spends time. They show the call stack on the Y-axis and time/sample count on the X-axis, making it easy to identify hot paths in your code.
How to read a flame graph:
- Each box represents a function in the call stack
- Width of the box = how much time was spent (or how many samples)
- Height = call stack depth (functions calling other functions)
- Colors are usually random (for visual distinction)
- Look for wide boxes at the top - these are your bottlenecks
Generating flame graphs with py-spy:
# Install py-spy
pip install py-spy
# Profile a running Python program and generate flame graph
py-spy record -o profile.svg --duration 30 -- python your_script.py
# Or attach to a running process
py-spy record -o profile.svg --pid 12345
This creates an interactive SVG file you can open in your browser. Click on boxes to zoom in and see specific call paths.
Example interpretation:
If you see a wide box labeled json.loads, it means your program is spending significant time parsing JSON. You might optimize by:
- Parsing JSON once and caching the result
- Using a faster JSON library like
orjson - Reducing the amount of JSON data being parsed
Flame graphs work well with the perf command on Linux:
# Record performance data
perf record -F 99 -g python your_script.py
# Generate flame graph (requires flamegraph.pl from github.com/brendangregg/FlameGraph)
perf script | stackcollapse-perf.pl | flamegraph.pl > flame.svg
Resource Monitoring
Sometimes, the first step towards analyzing the performance of your program is to understand what its actual resource consumption is. Programs often run slowly when they are resource constrained, e.g. without enough memory or on a slow network connection.
Essential monitoring tools:
htop - Interactive process viewer showing CPU, memory, and process information in real-time
- Press
F6to sort by different columns (CPU%, MEM%, TIME) - Press
tto show process tree hierarchy - Press
hfor help with all keybindings - Use this when: Your program is slow and you want to see if it's using all CPU or running out of memory
# Example: Find which process is using the most CPU
htop # then press F6 and select CPU%
du - Disk usage analyzer
du -hshows human-readable sizesdu -sh *shows size of each item in current directory- Use this when: Your disk is full and you need to find large directories
# Find largest directories in your home folder
du -h ~ | sort -h | tail -20
lsof - List open files and which processes have them open
lsof /path/to/fileshows which process is using a filelsof -i :8080shows what process is using port 8080- Use this when: You get "file in use" errors or need to find what's using a port
# Find what's running on port 3000
lsof -i :3000
# Example output:
# COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
# node 12345 user 23u IPv4 123456 0t0 TCP *:3000 (LISTEN)