torchsight

On-premise security scanner powered by local LLMs.

OVERVIEW

TorchSight scans files for sensitive data, credentials, malicious payloads, and compliance violations using a locally-running LLM. No data leaves your machine.

It classifies documents into 7 categories (PII, credentials, financial, medical, confidential, malicious, and safe) and 51 subcategories.

CATEGORIES

Category (subcategory count): example subcategories

Malicious (14): injection, exploit, shell, phishing, prompt injection, supply chain, SSRF, SSTI, XXE
Confidential (9): classified, military, intelligence, weapons systems, nuclear, internal, M&A
Credentials (8): passwords, API keys, tokens, private keys, connection strings, cloud config
PII (6): identity, contact, government ID, biometric, behavioral, metadata
Safe (6): documentation, code, config, media, email, business
Financial (4): credit cards, bank accounts, transactions, tax records
Medical (4): diagnosis, prescription, lab results, insurance
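
As a quick sanity check on the numbers above, the per-category counts can be written down as a small lookup table and summed. This is only an illustrative sketch: the dictionary name and lowercase keys are assumptions made for the example, not TorchSight's internal schema or output format.

```python
# Hypothetical representation of the category list above; the names
# and layout are illustrative, not TorchSight's internal schema.
SUBCATEGORY_COUNTS = {
    "malicious": 14,
    "confidential": 9,
    "credentials": 8,
    "pii": 6,
    "safe": 6,
    "financial": 4,
    "medical": 4,
}


def summarize(counts):
    """Return (number of categories, total number of subcategories)."""
    return len(counts), sum(counts.values())


print(summarize(SUBCATEGORY_COUNTS))  # prints (7, 51)
```

The totals agree with the 7 categories and 51 subcategories stated in the overview.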

BEAM MODEL

Base model: Qwen 3.5 27B
Method: LoRA (r=128, α=256)
Training data: 78,358 samples
Sources: 18 (all permissive)
Category accuracy: 95.1%
vs Claude Opus 4: 79.9%
vs Gemini 2.5 Pro: 75.4%
Default quantization: q4_K_M (~17 GB)
License: Apache 2.0
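
Two of the figures above can be related with simple arithmetic. The sketch below assumes the standard LoRA scaling factor of α / r and an average of roughly 4.85 bits per weight for q4_K_M; neither value is stated in this README, and the real file size also depends on which tensors stay at higher precision, so treat the result as a rough estimate only.

```python
def lora_scale(alpha, rank):
    """Standard LoRA scaling factor applied to the low-rank update (alpha / r)."""
    return alpha / rank


def approx_size_gb(n_params, bits_per_weight):
    """Rough model size in decimal gigabytes for a given average bit width."""
    return n_params * bits_per_weight / 8 / 1e9


# Values from the table above: r=128, alpha=256, 27B parameters.
print(lora_scale(256, 128))  # prints 2.0

# ASSUMPTION: ~4.85 bits per weight on average for q4_K_M.
print(round(approx_size_gb(27e9, 4.85), 1))  # prints 16.4
```

Under these assumptions the estimate lands close to the ~17 GB listed for the default quantization.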

SUPPORTED FILES

Text — txt, csv, json, xml, yaml, toml, log, md, sql, env
Code — py, rs, go, java, js, ts, c, cpp, rb, php, sh
Documents — pdf, docx, doc, xlsx, xls, pptx
Images — png, jpg, gif, bmp, tiff, webp
Email — eml, msg, mbox, pst, ost
Secrets — pem, key, crt, pub, env
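
The extension list above can be pre-checked before handing files to the scanner. The following is a minimal sketch, not TorchSight's own detection logic: `SUPPORTED` and `file_groups` are hypothetical names, and matching purely on the file extension is an assumption made for the example.

```python
from pathlib import Path

# Extension groups copied from the list above. "env" appears under both
# Text and Secrets, so a lookup returns every matching group.
SUPPORTED = {
    "Text": {"txt", "csv", "json", "xml", "yaml", "toml", "log", "md", "sql", "env"},
    "Code": {"py", "rs", "go", "java", "js", "ts", "c", "cpp", "rb", "php", "sh"},
    "Documents": {"pdf", "docx", "doc", "xlsx", "xls", "pptx"},
    "Images": {"png", "jpg", "gif", "bmp", "tiff", "webp"},
    "Email": {"eml", "msg", "mbox", "pst", "ost"},
    "Secrets": {"pem", "key", "crt", "pub", "env"},
}


def file_groups(path):
    """Return the sorted list of groups whose extension set matches the path."""
    p = Path(path)
    ext = p.suffix.lstrip(".").lower()
    # Dotfiles such as ".env" have no suffix in pathlib; fall back to the name.
    if not ext and p.name.startswith("."):
        ext = p.name[1:].lower()
    return sorted(group for group, exts in SUPPORTED.items() if ext in exts)


print(file_groups("report.PDF"))   # prints ['Documents']
print(file_groups(".env"))         # prints ['Secrets', 'Text']
print(file_groups("archive.zip"))  # prints []
```

An empty list means the extension is not in the supported set shown above.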

CI/CD

# GitHub Actions
- run: torchsight /path --format sarif --fail-on high
- uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: torchsight_report.sarif

# Scan git diff
$ torchsight --diff HEAD~1 --fail-on medium
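
Because the GitHub Actions example writes a SARIF report, the same file can be summarized locally. The sketch below relies only on the generic SARIF 2.1.0 layout (`runs[].results[].level`, with `warning` as the default when `level` is absent). How TorchSight maps its own severities such as `high` or `medium` onto SARIF levels is not stated here, so the function names and the interpretation of the counts are assumptions.

```python
import json
from collections import Counter


def count_by_level(sarif):
    """Count results per SARIF level across all runs in a parsed report."""
    counts = Counter()
    for run in sarif.get("runs", []):
        # "results" may be missing or null in a valid SARIF run.
        for result in run.get("results") or []:
            # SARIF treats a missing "level" as "warning".
            counts[result.get("level", "warning")] += 1
    return counts


def load_and_count(path):
    """Read a SARIF file from disk and count its results per level."""
    with open(path, encoding="utf-8") as fh:
        return count_by_level(json.load(fh))


if __name__ == "__main__":
    # File name taken from the GitHub Actions example above.
    for level, n in sorted(load_and_count("torchsight_report.sarif").items()):
        print(f"{level}: {n}")
```

Run it from the directory that contains `torchsight_report.sarif` after a scan has completed.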