How I Built My Own SIEM Project for My Cybersecurity Portfolio
If you're trying to get a job in cybersecurity, you hear the same advice all the time: "build projects." It's good advice, but it can be hard to know where to start. I wanted to build something that was more than just a simple script, something that would really show off the skills I've been learning. That's why I decided to build my own Security Information and Event Management (SIEM) tool from scratch using Python.
I wanted to walk you through how I did it, the problems I ran into, and what I learned. You can check out all the code on my GitHub page if you want to follow along.
The Big Idea: What's a SIEM and Why Build One?
A SIEM is basically a central hub for security. It pulls in logs and data from all over a network, analyzes them for anything suspicious, and sends out alerts. It's a core tool for any security team. By building my own, I could get hands-on experience with network traffic, threat analysis, and even a bit of web development and AI.
The Core: Building the Web Dashboard with Flask
I needed a way to see all the data and alerts. A web dashboard was the obvious choice. I used Flask, which is a really simple web framework for Python. The most important part was making it update in real-time, so I didn't have to keep hitting refresh. For that, I used Flask-SocketIO.
Here’s the basic setup in my siem_unified.py file. This code starts the web server and sets up the real-time connection.
from flask import Flask, render_template
from flask_socketio import SocketIO

app = Flask(__name__)
app.config['SECRET_KEY'] = 'secret!'
socketio = SocketIO(app)

@app.route('/')
def index():
    return render_template('index.html')

if __name__ == '__main__':
    # ... (code to start other processes) ...
    socketio.run(app, debug=True)
This was my first time using SocketIO, and it was a game-changer. It creates a persistent connection between the server and the browser, so the server can push updates instantly. When my tool catches a new network packet, it shows up on the screen right away.
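To give a concrete picture of the push side, here is roughly what emitting a packet to the dashboard looks like with the socketio object from the snippet above. The event name 'new_packet' and the example addresses are placeholders, not necessarily what the real code uses.

# Server side: broadcast a processed packet to every connected browser.
# The browser-side JavaScript would listen for the same event name,
# e.g. socket.on('new_packet', ...), and append a row to the table.
socketio.emit('new_packet', {
    'source_ip': '192.168.1.10',
    'destination_ip': '8.8.8.8',
})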
The Eyes and Ears: Capturing Network Packets with Scapy
Next, I needed to actually capture network traffic. The best tool for this in Python is a library called Scapy. It's incredibly powerful and can sniff traffic right off the network card. I wrote a separate script, packet_capture.py, to handle this.
The main logic is pretty simple. I tell Scapy to run a function for every single packet it sees.
from scapy.all import sniff, IP, TCP, UDP

def packet_callback(packet):
    # Only handle packets that actually have an IP layer
    if IP in packet:
        ip_src = packet[IP].src
        ip_dst = packet[IP].dst
        packet_data = {
            'source_ip': ip_src,
            'destination_ip': ip_dst,
        }
        send_data_to_siem(packet_data)

# store=0 keeps Scapy from holding every packet in memory
sniff(prn=packet_callback, store=0)
One of the first problems I hit was how to get the packet_data from this script over to my main Flask app. I solved this by using a simple message queue: the packet capture script adds data to the queue, and the Flask app reads from it. This kept the two processes separate and made the whole thing more stable.
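Here is a minimal sketch of that hand-off, assuming the main app creates a multiprocessing.Queue and passes it to the capture process. The names forward_packets and start_packet_capture are illustrative, not my exact code, and socketio is the object from the Flask snippet earlier.

from multiprocessing import Process, Queue

def forward_packets(queue):
    # Flask side: block until the capture process sends something,
    # then push it straight to the dashboard over SocketIO
    while True:
        packet_data = queue.get()
        socketio.emit('new_packet', packet_data)

# Startup code, in the spirit of the "start other processes" comment above.
# start_packet_capture is a hypothetical entry point in packet_capture.py;
# inside it, send_data_to_siem(...) would simply call queue.put(...).
packet_queue = Queue()
Process(target=start_packet_capture, args=(packet_queue,), daemon=True).start()
socketio.start_background_task(forward_packets, packet_queue)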
The Brains: Using AI for Threat Analysis
This is the part I was most excited about. Instead of just flagging packets based on simple rules (like a specific port number), I wanted to use AI to analyze them. I set up a script, llm_analyzer.py, to connect to different AI models like OpenAI's GPT and Google's Gemini.
When my SIEM flags a packet as potentially suspicious, it sends the packet's data to the AI with a very specific prompt. Getting the prompt right was key. Here’s a simplified version of what I used:
def analyze_with_llm(data_to_analyze):
    prompt = f"""
    You are a Senior Cybersecurity Analyst.
    Analyze the following network data and determine if it is malicious.
    Provide a risk score from 0 to 100.
    Explain your reasoning in a short summary.

    Data:
    Source IP: {data_to_analyze['source_ip']}
    Destination IP: {data_to_analyze['destination_ip']}
    Port: {data_to_analyze['destination_port']}

    Return your answer ONLY in JSON format with keys "risk_score", "summary", and "is_malicious".
    """
    # ... (code to send prompt to the AI API and get the response) ...
    return response_json
This was a huge lesson in "prompt engineering." Just asking "is this bad?" gives you terrible results. By telling the AI to act like a senior analyst and to return the data in a specific format, I was able to get structured, reliable analysis that I could then display in my dashboard.
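For anyone curious how the elided API call and response handling might look, here is a rough sketch using the OpenAI Python client. The model name is illustrative, and the actual code in llm_analyzer.py may differ since it also supports Gemini.

import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def get_llm_verdict(prompt):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    raw = response.choices[0].message.content
    try:
        # Expecting {"risk_score": ..., "summary": ..., "is_malicious": ...}
        return json.loads(raw)
    except json.JSONDecodeError:
        # If the model wraps the JSON in extra text, fail safe instead of crashing
        return {"risk_score": 0, "summary": raw, "is_malicious": False}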
The Little Details: Geolocation and Logging
To make the data easier to understand, I wanted to show where in the world the IP addresses were coming from. I used a GeoIP database to look up each IP and display the country flag next to it. To avoid slowing things down, I created a cache so that if I saw the same IP address again, I didn't have to look it up twice.
Here is the core function from geoip_lookup.py:
import geoip2.database

# Simple in-memory cache so repeated IPs don't trigger repeated lookups
ip_cache = {}

def get_geoip_data(ip_address):
    if ip_address in ip_cache:
        return ip_cache[ip_address]
    try:
        with geoip2.database.Reader('path/to/your/GeoLite2-City.mmdb') as reader:
            response = reader.city(ip_address)
            country = response.country.iso_code
            ip_cache[ip_address] = country
            return country
    except Exception:
        # Private or unroutable addresses won't be in the database
        return None
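As a side note, the country flag itself is easy to render once you have the ISO code, because flag emoji are just pairs of "regional indicator" characters. A small helper along these lines would do the trick (the function name is mine, not from the repo):

def iso_code_to_flag(iso_code):
    # 'US' -> the regional indicator pair that browsers render as the US flag
    if not iso_code or len(iso_code) != 2:
        return ''
    return ''.join(chr(ord(letter) + 127397) for letter in iso_code.upper())

# Example: look up an IP, then turn the result into a flag for the dashboard
flag = iso_code_to_flag(get_geoip_data('8.8.8.8') or '')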
What I Learned From All This
Building this SIEM was a huge learning experience. It forced me to solve real problems, like making different scripts talk to each other and tuning AI prompts. I got way more comfortable with networking concepts and full-stack development.