Qvault’s detection engine now includes a small language model that runs entirely on your device. It catches sensitive information hidden in natural language that patterns alone can’t find — and it never transmits a single byte of your data.

The Problem: PII That Patterns Miss

Regex and heuristic analysis are excellent at catching structured PII: email addresses, phone numbers, SSNs, CNPJs, credit card numbers. They work because these entities follow predictable formats.

But legal documents often contain sensitive information expressed in natural language. Consider this sentence from a real Brazilian hosting contract:

"…representada por seu sócio João da Silva." ("…represented by its partner João da Silva.")

There’s no email, no ID number, no pattern to match — just a person’s name in context. A regex will never find it. A heuristic scanner might catch it if it recognizes “representada por” as a trigger, but what about less common phrasings?

This is where a language model excels. It understands context, not just patterns.

How the 3-Layer Pipeline Works

Each layer builds on the previous one. Entities found by earlier layers are excluded from later passes to prevent duplicates.

Document text (per page)
  → Layer 1: Regex    (emails, SSNs, phones, dates, amounts...)     95% confidence
  → Layer 2: Heuristic (names, companies, addresses by context)     75–92% confidence
  → Layer 3: LLM      (contextual PII regex/heuristic missed)      70% confidence
  → Combined, deduplicated, sorted by position
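The merge step at the bottom of the diagram can be sketched in a few lines of Rust. This is an illustrative stand-in, not Qvault's actual internals: the `Entity` struct and `merge_layers` function are hypothetical names, but they show the stated behavior — earlier layers take priority, overlapping later hits are dropped, and the result is sorted by position.

```rust
// Hypothetical sketch of the 3-layer merge step. Entities from later
// layers are dropped when they overlap a span an earlier layer already
// claimed; the combined list is then sorted by offset.

#[derive(Debug)]
struct Entity {
    start: usize, // byte offset in the page text
    end: usize,
    kind: &'static str,
    confidence: f32,
}

fn overlaps(a: &Entity, b: &Entity) -> bool {
    a.start < b.end && b.start < a.end
}

/// Merge layer outputs in priority order (Layer 1 first), skipping any
/// entity that overlaps one already accepted, then sort by position.
fn merge_layers(layers: Vec<Vec<Entity>>) -> Vec<Entity> {
    let mut accepted: Vec<Entity> = Vec::new();
    for layer in layers {
        for e in layer {
            if !accepted.iter().any(|a| overlaps(a, &e)) {
                accepted.push(e);
            }
        }
    }
    accepted.sort_by_key(|e| e.start);
    accepted
}

fn main() {
    let regex_hits = vec![Entity { start: 10, end: 25, kind: "email", confidence: 0.95 }];
    let llm_hits = vec![
        Entity { start: 12, end: 20, kind: "name", confidence: 0.70 }, // overlaps the email: dropped
        Entity { start: 40, end: 52, kind: "name", confidence: 0.70 },
    ];
    let merged = merge_layers(vec![regex_hits, llm_hits]);
    assert_eq!(merged.len(), 2);
    println!("{:?}", merged);
}
```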

Layer 1: Regex Patterns

High-precision pattern matching for structured data. Supports multiple jurisdictions: US (SSN, EIN), EU (IBAN), Brazil (CPF, CNPJ, CEP), Germany, and global patterns (email, phone, credit card, dates, monetary amounts).

  • Confidence: 95%
  • Deterministic, fastest layer
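To illustrate why this layer is deterministic and fast, here is a stdlib-only sketch of one of the patterns listed later in this post — the Brazilian CEP format (00000-000). The function name is hypothetical, and Qvault's real patterns use a regex engine and cover many more entity types, but the idea is the same: a fixed format either matches or it doesn't.

```rust
// Illustrative check for the Brazilian CEP postal-code format (00000-000):
// exactly five ASCII digits, a hyphen, then three ASCII digits.
fn is_cep(s: &str) -> bool {
    let b = s.as_bytes();
    b.len() == 9
        && b[..5].iter().all(u8::is_ascii_digit)
        && b[5] == b'-'
        && b[6..].iter().all(u8::is_ascii_digit)
}

fn main() {
    assert!(is_cep("01310-100"));
    assert!(!is_cep("01310100"));  // missing hyphen
    assert!(!is_cep("0131-0100")); // hyphen in the wrong place
    println!("CEP format checks passed");
}
```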

Layer 2: Heuristic Analysis

Context-aware detection using capitalization patterns, legal role keywords, and company suffixes. Supports EN, PT-BR, and ES legal terminology.

  • Confidence: 75–92%
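The "representada por" example from earlier in this post shows how a trigger-phrase heuristic can work. The sketch below is a simplified, hypothetical version of this layer — it skips lowercase role words after the trigger, then collects the run of capitalized words plus Portuguese connectives ("da", "de", …) as a candidate name. The real heuristics are more robust than this.

```rust
// Simplified trigger-phrase heuristic (hypothetical): after a legal
// trigger like "representada por", skip lowercase role words, then take
// the run of capitalized words and name connectives as a candidate name.
fn name_after_trigger(text: &str, trigger: &str) -> Option<String> {
    let idx = text.find(trigger)? + trigger.len();
    let mut words: Vec<&str> = Vec::new();
    for w in text[idx..].split_whitespace() {
        let w = w.trim_end_matches(|c: char| c.is_ascii_punctuation());
        let capitalized = w.chars().next().map_or(false, char::is_uppercase);
        let connective = matches!(w, "da" | "de" | "do" | "dos" | "das" | "e");
        if capitalized {
            words.push(w);
        } else if connective && !words.is_empty() {
            words.push(w); // keep "da" inside "João da Silva"
        } else if words.is_empty() {
            continue; // skip role words like "seu sócio" before the name
        } else {
            break; // name run ended
        }
    }
    if words.is_empty() { None } else { Some(words.join(" ")) }
}

fn main() {
    let s = "…representada por seu sócio João da Silva.";
    let name = name_after_trigger(s, "representada por");
    assert_eq!(name.as_deref(), Some("João da Silva"));
    println!("{:?}", name);
}
```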

Layer 3: Local AI Model

A quantized Qwen 2.5 0.5B model powered by Hugging Face Candle (pure Rust, no C++ dependencies). The model receives each page of text with a structured prompt and returns a JSON array of PII entities it found. Results are mapped back to exact character offsets and filtered against existing detections.

  • Confidence: 70%
  • GPU accelerated (CUDA / Metal)
  • ~400 MB download
  • Optional — Qvault works perfectly without it
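The "mapped back to exact character offsets" step is worth unpacking: a language model returns entity *strings*, not positions, so each string must be located in the page text and checked against spans earlier layers already claimed. The sketch below illustrates that step with hypothetical names; parsing the model's JSON reply is omitted.

```rust
// Sketch of the Layer 3 post-processing step (hypothetical names): locate
// each entity string the model returned in the page text, and drop any
// occurrence that overlaps a span an earlier layer already claimed.
fn map_to_offsets(page: &str, entity: &str, claimed: &[(usize, usize)]) -> Vec<(usize, usize)> {
    page.match_indices(entity)
        .map(|(start, m)| (start, start + m.len()))
        .filter(|&(s, e)| !claimed.iter().any(|&(cs, ce)| s < ce && cs < e))
        .collect()
}

fn main() {
    let page = "Contato: joao@example.com, representada por João da Silva.";
    let claimed = vec![(9, 25)]; // email span already found by Layer 1
    let hits = map_to_offsets(page, "João da Silva", &claimed);
    assert_eq!(hits.len(), 1);
    println!("{:?}", hits);
}
```

Using `match_indices` also handles the case where the same name appears several times on a page: each occurrence gets its own span.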

Why Candle Instead of llama.cpp?

We evaluated several options for local inference: llama.cpp, ONNX Runtime, Burn, and Candle. We chose Candle because:

  • Pure Rust — no cmake, no C++ compiler needed. Trivial cross-platform builds.
  • Simple Tauri integration — no external binaries. Just another Rust crate.
  • GPU acceleration — CUDA for NVIDIA, Metal for Apple Silicon. Falls back to CPU automatically.
  • Performance is sufficient — for a 0.5B model, the speed difference versus llama.cpp is negligible.

Privacy by Design

The AI model follows the same zero-transmission principle as everything else in Qvault:

  • The model file (~400 MB) is downloaded once from Hugging Face and cached locally.
  • All inference runs on your CPU or GPU — no API calls, no cloud services.
  • The feature is entirely optional.
  • The model auto-unloads from memory after 30 seconds of inactivity.
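The auto-unload behavior in the last bullet can be sketched as a timestamped slot that a periodic check empties once the idle window elapses. This is a minimal illustration with hypothetical names, not Qvault's actual implementation.

```rust
// Minimal sketch of idle auto-unload (hypothetical structure): record the
// last use, and let a periodic check drop the model after the idle window.
use std::time::{Duration, Instant};

struct ModelSlot {
    model: Option<Vec<u8>>, // stand-in for the loaded weights
    last_used: Instant,
    idle_limit: Duration,
}

impl ModelSlot {
    /// Called on every scan that uses the model.
    fn touch(&mut self) {
        self.last_used = Instant::now();
    }

    /// Called periodically; unloads the model once it has sat idle too long.
    fn maybe_unload(&mut self) {
        if self.model.is_some() && self.last_used.elapsed() >= self.idle_limit {
            self.model = None; // drops the weights, freeing memory
        }
    }
}

fn main() {
    let mut slot = ModelSlot {
        model: Some(vec![0u8; 4]),
        last_used: Instant::now(),
        idle_limit: Duration::from_millis(50), // 30 s in the real app
    };
    slot.touch();
    std::thread::sleep(Duration::from_millis(60));
    slot.maybe_unload();
    assert!(slot.model.is_none());
    println!("unloaded after idle window");
}
```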

Brazilian Contract Detection Improvements

Alongside Layer 3, we also improved Brazilian contract detection in Layers 1 and 2:

  • CEP (postal code) pattern: 00000-000
  • Company suffixes: LTDA, EIRELI, ME, EPP, Cia
  • “representada por” pattern for extracting representative names
  • Brazilian legal roles: Sócio, Diretor, Gerente, Administrador, Procurador
  • Address stop words: Rua, Alameda, Avenida, Vila, Bairro (prevents false name detections)

How to Enable Layer 3

  1. Open Qvault and go to Settings.
  2. In the AI Detection section, click Download Model (~400 MB, one-time).
  3. Once downloaded, click Load Model. The settings page will show your compute device (CPU, CUDA GPU, or Metal GPU).
  4. Enable the toggle. Layer 3 will now run automatically during scans.

The model auto-unloads after 30 seconds of inactivity to free memory. It reloads automatically on the next scan.


Qvault is free, open source (MIT), and runs on macOS, Windows, and Linux. Check out the website or the source on GitHub.