Why "private" search engines aren't, what they're hiding, and how to build a search stack that trusts nobody — including itself.


Think about what a search query actually is. Not in the abstract, but in the specific — the thing you typed at 2am when the symptoms wouldn't resolve, the name you looked up before the meeting, the question you'd never ask out loud. The search box is the most intimate data collection instrument ever built, precisely because we've been trained to treat it like a private thought. We type into it the way people used to write in diaries: without audience, without filter, without the performance that attends every other form of communication.

That intimacy is the product.

Google understood this before almost anyone else, and built an empire on it. But Google at least never pretended otherwise. The surveillance was the point, the advertising revenue was the mechanism, and the bargain — your data for access to the sum of human knowledge — was always visible to anyone who cared to look. You could disagree with it. You could resent it. But it was honest in its way.

The more corrosive development has been the rise of what we might call privacy theater: search engines that market themselves as alternatives to Google's surveillance model while quietly maintaining business models that depend on knowing something about who's asking. DuckDuckGo is the most prominent example. For years it cultivated a reputation as the privacy-conscious user's search engine, the refuge from the Google panopticon. Then in 2022 it emerged that DuckDuckGo's search syndication agreement with Microsoft kept its browser from blocking certain Microsoft tracking scripts, an arrangement that had apparently existed for some time without disclosure. The privacy brand and the business reality were two different things.

Startpage presents a different but structurally similar problem. It returns Google results without passing your identity to Google, which sounds appealing until you learn that since 2019 it has been majority-owned, through the Privacy One Group subsidiary, by System1, an advertising technology company whose core business is audience acquisition and targeted advertising. The privacy promise is genuine in certain narrow technical senses. The ownership structure tells a different story about incentives.

The sharpest illustration of where these companies' loyalties actually lie came from a practical experiment. When we configured SearXNG — a self-hosted metasearch engine we'll build in this series — to route its upstream queries through Tor, making the user genuinely and technically anonymous to the search engine receiving the query, DuckDuckGo returned a 403. Brave Search rate-limited the exit node immediately. The "privacy engines," confronted with actual privacy tooling, slammed the door. Google, whose surveillance operation dwarfs both of them combined, returned results without complaint.

The reason is worth sitting with. DuckDuckGo and Brave need to be your chosen intermediary to monetize your presence. A Tor user who has opted out of their ecosystem entirely represents a direct threat to their traffic numbers and the data they do collect. Google doesn't have that problem — they have your Gmail, your Android device, your Chrome history, your logged-in search across every other moment of your day. One anonymous query from a Tor exit node is noise against that signal. For the "privacy" engines, that anonymous query is the whole relationship, and they'd rather not have it on those terms.

Privacy as a product is always conditional. Privacy as infrastructure — software you run yourself, on your own hardware, that routes through anonymization layers you control — is something else entirely. That's what this series is about.


What SearXNG Actually Is

SearXNG is not a search engine in the way Google or DuckDuckGo are search engines. It doesn't crawl the web, doesn't maintain an index, and doesn't have data centers full of servers building a picture of what the internet contains. What it does is sit between you and all of those engines simultaneously, query them on your behalf, aggregate the results, deduplicate them, and hand them back to you — without any of the upstream engines knowing it was you who asked.
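
As a rough illustration of the aggregate-and-deduplicate step described above, here is a toy sketch in shell. It is not SearXNG's actual code: the engine names, the URLs, and the tab-separated input format are all made up for the example. It merges result lists from several engines, collapses duplicate URLs, and ranks each URL by how many engines returned it.

```shell
# Toy illustration of metasearch aggregation; not SearXNG's real code.
# Each input line is "engine<TAB>url" (a hypothetical format for this sketch).

merge_results() {
    # Count how many distinct engines returned each URL, then print every
    # URL once, most-corroborated first.
    awk -F'\t' '
        !seen[$1 FS $2]++ { count[$2]++ }
        END { for (url in count) printf "%d\t%s\n", count[url], url }
    ' | sort -t "$(printf '\t')" -k1,1nr -k2,2
}

printf '%s\t%s\n' \
    engine_a https://example.org/a \
    engine_b https://example.org/a \
    engine_b https://example.org/b \
    engine_c https://example.org/a |
    merge_results
```

SearXNG's real scoring is more involved, but the shape is the same: many upstream answers in, one merged list out.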

This architecture is called a metasearch engine, and the concept is older than Google's dominance. What SearXNG brings to it is a combination of aggressive privacy defaults, deep configurability, and — most importantly — the ability to run the entire thing yourself, on your own hardware, under your own control. There is no SearXNG company. There is no SearXNG server you're trusting. There is no privacy policy to read skeptically or terms of service to accept. You install it, you run it, and the only party with access to your queries is you.

SearXNG is the actively maintained fork of the original SearX project, with a larger contributor base, more frequent updates, and better default engine coverage. Out of the box it queries dozens of sources (general web search, news, images, video, academic papers, code repositories, maps) and lets you weight, enable, or disable any of them. The interface is clean and functional. For everyday research use it is often better than any single engine, because it merges results from many of them at once.

What it protects you from is specific and worth being precise about. When your SearXNG instance queries Google, Google sees a request from your server's IP address with no cookies, no session history, no Google account, no persistent identity of any kind. If you've additionally routed those outbound queries through Tor — as we will in Part II — Google sees a request from a Tor exit node. It doesn't know your IP, your location, your ISP, or anything else. The query itself is all it has, stripped of every contextual signal that makes a query valuable for surveillance purposes.

What it does not automatically protect you from is what happens after you click. The search is private. The click is a separate transaction — one that goes from your browser directly to the target site, carrying your IP address, your browser fingerprint, and whatever cookies and trackers that site deploys. This is the click problem, and it's real, and Part II addresses it directly. It's worth naming here so the picture is complete from the start: SearXNG is a powerful and meaningful privacy tool, but it is one layer of a stack, not a complete solution on its own.

The complete solution involves understanding what each layer protects, what it doesn't, and assembling them deliberately. That's what we're building.


Prerequisites and Philosophy of the Stack

Before touching a command line, it's worth understanding what you're building and why each component earns its place. Privacy tooling assembled without a coherent threat model is just complexity — potentially a false sense of security dressed up as infrastructure. What follows is a genuinely layered defense where each component addresses a specific attack surface that the others leave exposed.

The stack has three pillars.

WireGuard with a killswitch is the foundation. Your Internet Service Provider sits in a privileged position: every packet you send passes through their infrastructure, and in many jurisdictions they are legally required to retain metadata about that traffic and provide it to authorities on request. In some jurisdictions they sell it commercially. WireGuard encrypts your traffic and routes it through a VPN exit, so your ISP sees an encrypted tunnel to a server and nothing else. The killswitch, implemented in iptables or nftables, ensures that if the VPN connection drops, your traffic stops entirely rather than falling back to your bare IP. No leak on disconnect, no grace period of exposure. Either the tunnel is up or nothing moves.

Tor is the second layer, and it operates on a different surface. Your VPN provider, whatever their privacy policy says, is a single point of trust. They know your real IP. They can see that you're making search queries even if they can't read the content. Tor solves this by routing your traffic through three independently operated relays — entry, middle, and exit — such that no single node has both your identity and your destination. When SearXNG routes its upstream queries through Tor, the search engine at the other end sees a Tor exit node IP. Your VPN provider sees Tor traffic. Your ISP sees the WireGuard tunnel. Nobody in that chain has the complete picture.

The order matters: VPN first, then Tor. This means your ISP cannot see that you're using Tor at all — a meaningful protection in environments where Tor usage itself attracts scrutiny. The Tor entry node sees your VPN IP rather than your real IP, adding another layer of separation.

SearXNG is the third layer and the one closest to the query itself. Even over Tor, querying Google directly from your browser would allow Google to correlate queries made from the same exit node within a session, building a partial profile. SearXNG distributes your queries across multiple engines simultaneously, presents no persistent session identity, carries no cookies, and runs entirely on your own hardware. It is the interface through which your intent touches the surveillance infrastructure of the commercial web — and it's been specifically engineered to minimize what that infrastructure learns in the exchange.

To follow this guide you'll need a Linux system running systemd, Python 3.10 or higher, Git, a functioning WireGuard VPN with a killswitch already configured and verified, and Tor installable from your distribution's package manager. The guide is written against a Kubuntu/Debian-based system but translates directly to any systemd-based distribution with minimal adjustment. Administrative access via sudo is assumed for the Tor installation step. Everything else runs in userspace.

What you do not need is a server, a domain name, a static IP, or any external infrastructure. This is a local installation — SearXNG binds to 127.0.0.1 and is accessible only from your own machine. That locality is a feature. There is no attack surface exposed to the internet, no authentication layer to harden, no public endpoint to discover and probe. The threat model this addresses is surveillance by platforms and data brokers, not targeted intrusion — and for that purpose, localhost is exactly the right place to run it.
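
Before going further, it can help to confirm the prerequisites in one pass. The following is a minimal preflight sketch, not an official check. It assumes GNU sort (for -V version ordering) and assumes your WireGuard setup ships the wg command from wireguard-tools; the helper names version_ge and check_tool are invented for this example.

```shell
# Preflight sketch: report which prerequisites are present.

# version_ge A B: succeeds when dotted version A >= B (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_tool NAME: report whether NAME is on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok       $1"
    else
        echo "MISSING  $1"
    fi
}

# wg is an assumption: some WireGuard setups do not install it.
for tool in git python3 systemctl wg; do
    check_tool "$tool"
done

if command -v python3 >/dev/null 2>&1; then
    py="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
    if version_ge "$py" 3.10; then
        echo "ok       python $py"
    else
        echo "TOO OLD  python $py (need 3.10 or higher)"
    fi
fi
```

A MISSING line for wg does not necessarily mean the tunnel is absent, only that this particular tool is not installed. The check says nothing about whether the killswitch works; that still needs to be verified separately.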


Installation

Getting the Code

Start by cloning the SearXNG repository into a permanent home. Downloads folders are for downloads — give this a proper location:

git clone https://github.com/searxng/searxng.git ~/searxng
cd ~/searxng

The Python Environment

SearXNG runs in a Python virtual environment — an isolated bubble that keeps its dependencies completely separate from your system Python and any other Python applications you run. This is non-negotiable good practice: it means SearXNG's dependency tree can't interfere with anything else, and uninstalling later is as clean as deleting the directory.

Create and activate the environment, then bring the tooling current:

python3 -m venv venv
source venv/bin/activate
python -m pip install -U pip setuptools wheel

Your prompt will change to show (venv) — that's your confirmation that you're working inside the isolated environment. Now install SearXNG's full dependency tree:

python -m pip install -r requirements.txt
python -m pip install -e . --no-build-isolation

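The -e flag installs SearXNG in editable mode, meaning Python resolves the searx package directly from the source directory rather than copying it into the venv's site-packages. This is how SearXNG is designed to be run from source, and it's what allows the import chain to resolve correctly. The --no-build-isolation flag tells pip to build with the packages already present in the venv instead of a temporary isolated environment, which is why requirements.txt is installed first.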
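
If you want more than the (venv) prompt as evidence, a quick check along these lines can confirm both the environment and the editable install. It is a sketch: the in_venv helper is a name invented here, and the expected path assumes the ~/searxng location used in this guide.

```shell
# in_venv INTERPRETER: succeeds when that interpreter runs inside a venv.
in_venv() {
    "$1" -c 'import sys; raise SystemExit(0 if sys.prefix != sys.base_prefix else 1)' 2>/dev/null
}

if in_venv python; then
    # With an editable install, this path should sit under ~/searxng/searx/.
    python -c 'import searx; print(searx.__file__)' ||
        echo "searx is not importable from this venv"
else
    echo "venv is not active: run 'source venv/bin/activate' first"
fi
```

A path inside the venv's site-packages instead of the source tree would suggest the package was installed without -e.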

Configuring settings.yml

SearXNG's entire configuration lives in a single YAML file. Back it up before touching it — a habit worth keeping for any config you're about to manipulate with sed:

cp searx/settings.yml searx/settings.yml.bak

Generate a secret key. This is the cryptographic secret SearXNG uses to sign sessions. It needs to be random and it needs to stay private:

KEY="$(python3 -c 'import secrets; print(secrets.token_hex(32))')"
echo "$KEY"

Save that output somewhere; a password manager is appropriate. Now write it into the config. Pay close attention to the regex here: the pattern ends in .* and not .&*. On the replacement side of a sed substitution, & means "the entire matched string", but inside the pattern it is a literal ampersand, so .&* matches only the single character after secret_key: and leaves the old default in place. That one-character difference produces a config file with two concatenated quoted values on the same line, which is invalid YAML and will prevent SearXNG from starting. The correct command:

sed -i 's/^\([[:space:]]*secret_key:\).*/\1 "'"$KEY"'"/' searx/settings.yml
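
To see the difference between the two patterns without risking the real config, you can run both against a scratch file. This is a sketch under stated assumptions: the key is a placeholder, and the two-line file only stands in for the relevant part of settings.yml.

```shell
# Scratch-file demo; the real settings.yml is not touched.
KEY="0123abcd"   # placeholder for the demo, not a real secret
tmp="$(mktemp)"
printf 'server:\n  secret_key: "ultrasecretkey"\n' > "$tmp"

# Correct pattern: .* consumes the rest of the line, old value included.
good="$(sed 's/^\([[:space:]]*secret_key:\).*/\1 "'"$KEY"'"/' "$tmp")"

# Mistyped pattern: .&* consumes a single character, so the old value survives.
bad="$(sed 's/^\([[:space:]]*secret_key:\).&*/\1 "'"$KEY"'"/' "$tmp")"

printf 'correct:\n%s\n\nmistyped:\n%s\n' "$good" "$bad"
rm -f "$tmp"
```

The mistyped run leaves the old quoted value glued directly onto the new one, which is the symptom to look for if SearXNG refuses to start after this step.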

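Bind to localhost and set your port. The address range confines each substitution to the server: block: it starts at the server: line and ends at the next top-level key, so a port: or bind_address: entry elsewhere in the file is left alone:

sed -i '/^server:$/,/^[^[:space:]#]/ s/^\([[:space:]]*bind_address:\).*/\1 "127.0.0.1"/' searx/settings.yml
sed -i '/^server:$/,/^[^[:space:]#]/ s/^\([[:space:]]*port:\).*/\1 8888/' searx/settings.yml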

Binding to 127.0.0.1 is important — it means SearXNG is only reachable from your own machine. Binding to 0.0.0.0 would expose it on all network interfaces, which on a personal workstation is unnecessary and inadvisable.

Verify all three settings landed correctly:

sed -n '/^server:$/,/^[^[:space:]#]/p' searx/settings.yml | grep -E '^[[:space:]]*(port|bind_address|secret_key):'

You should see exactly these three lines, indented as they are in the file:

  port: 8888
  bind_address: "127.0.0.1"
  secret_key: "your_generated_key"

No duplicate values, no leftover defaults appended to your key. If the secret_key line shows two quoted values concatenated — your original default still attached to your new key — restore from backup and redo that step with the corrected regex.
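
The same inspection can be scripted if you would rather not eyeball it. The function below is a minimal sketch, with check_settings being a name made up for this example. It searches the whole file rather than only the server: block, and it assumes the port and address chosen in this guide and that you run it from ~/searxng.

```shell
# check_settings FILE: succeeds only if the three edited values look sane.
check_settings() {
    file="$1"
    [ -r "$file" ] || return 1
    # Exactly one quoted value on the secret_key line, nothing trailing.
    grep -Eq '^[[:space:]]*secret_key:[[:space:]]*"[^"]+"[[:space:]]*$' "$file" || return 1
    grep -Eq '^[[:space:]]*bind_address:[[:space:]]*"127\.0\.0\.1"[[:space:]]*$' "$file" || return 1
    grep -Eq '^[[:space:]]*port:[[:space:]]*8888[[:space:]]*$' "$file" || return 1
}

if check_settings searx/settings.yml; then
    echo "settings look consistent"
else
    echo "settings need another look"
fi
```

A concatenated key fails the first test because a second quoted value follows the first, so the line no longer ends where the pattern expects.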

Once all configuration changes are complete, take a second backup of the final state:

cp searx/settings.yml searx/settings.yml.bak2

The first backup reflects the stock config. This one reflects your working configuration. Both are worth keeping.

Making the Settings Path Permanent

SearXNG needs to know where its config file lives via an environment variable. Add it permanently to your shell profile so it's always set:

echo 'export SEARXNG_SETTINGS_PATH="$HOME/searxng/searx/settings.yml"' >> ~/.bashrc
source ~/.bashrc

Using $HOME here rather than $PWD is deliberate — $PWD would evaluate to whatever directory you happen to be in when you open a terminal, which is almost never the right answer in a persistent environment variable.
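
A short check in a fresh terminal can confirm the variable survived and points at a real file. This is only a sketch; settings_path_ok is a helper name invented for the example.

```shell
# settings_path_ok: succeeds when the variable is set and names a readable file.
settings_path_ok() {
    [ -n "${SEARXNG_SETTINGS_PATH:-}" ] && [ -r "$SEARXNG_SETTINGS_PATH" ]
}

if settings_path_ok; then
    echo "SearXNG will read: $SEARXNG_SETTINGS_PATH"
else
    echo "SEARXNG_SETTINGS_PATH is unset or does not point at a readable file"
fi
```

If the second message appears in a new terminal, check that the export line actually reached ~/.bashrc and that the path in it matches where you cloned the repository.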

First Run

With the venv active and the environment variable set, start SearXNG manually to confirm it works before handing it off to systemd:

cd ~/searxng
source venv/bin/activate
python searx/webapp.py

You should see Flask announce itself:

* Serving Flask app 'webapp'
* Debug mode: off

Open http://127.0.0.1:8888 in your browser. If the SearXNG interface loads and returns results, the installation is clean. Hit Ctrl+C to stop it — Part II covers running it properly as a persistent background service.
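
If you prefer a check from the command line, the sketch below can be run in a second terminal while the server is still up. It assumes curl is installed, which is not otherwise a requirement of this guide, and the helper names http_status and is_up are made up for the example.

```shell
# http_status URL: print the HTTP status code (curl prints 000 if nothing answers).
http_status() {
    curl -s -o /dev/null -m 5 -w '%{http_code}' "$1" || true
}

# is_up CODE: succeeds for any 2xx status.
is_up() {
    case "$1" in
        2??) return 0 ;;
        *)   return 1 ;;
    esac
}

code="$(http_status http://127.0.0.1:8888/)"
if is_up "$code"; then
    echo "SearXNG answered with HTTP $code"
else
    echo "no healthy response on 127.0.0.1:8888 (status: ${code:-none})"
fi
```

This only shows that the web interface responds on localhost. Whether searches return results still depends on the upstream engines, so the browser test remains the real confirmation.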

A note on the warnings you'll likely see in the terminal output: the message "X-Forwarded-For nor X-Real-IP header is set" is expected when running without a reverse proxy in front of SearXNG. It's informational noise, not a failure. Errors from specific engines (rate limiting, 403s) are also normal. SearXNG handles engine failures gracefully and routes around them. If results are appearing in the browser, everything is working.

For now, you have a functioning private search instance running on your own hardware. No account. No company. No policy. It queries the engines you've always used, aggregates the results, and hands them back — without any of those engines knowing it was you.

That's the foundation. Part II builds the rest of the stack: Tor proxying for the outbound queries, a systemd service for automatic startup and clean shutdown, and Firefox configuration that closes the click problem so the full research workflow — search and result — stays behind the anonymization layer.


Part II: "No Vendor, No Policy, No Problem" — continues with Tor configuration, systemd service setup, browser hardening, and honest threat modeling.


Border Cyber Group.