# Compliance — How We Crawl

**URL:** https://nosible.com/legal/compliance

What NOSIBLE crawls and what it doesn't, how we crawl ethically, where data is hosted, and how we handle privacy, logs, and security.

## What We Crawl

We crawl long-form written content from hundreds of thousands of sources, across more than 150 countries and 95 languages. This covers news, press releases, company sites, specialist content, government sources, and official records.

## What We Don't Crawl

- Paywalled content — we never crawl behind paywalls
- Social media or marketplaces
- Pornographic sites
- Personally identifiable information is not held in our index

## How We Crawl

- **robots.txt** — we respect robots.txt to the best of our ability
- **Caching** — conditional request headers reduce bandwidth and server load
- **Fair rate** — we crawl in diverse batches, designed never to overwhelm a site
- **Snippets only** — we retain a structured version of each page and index short passages; not a reading copy
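
As an illustration of the first three points, here is a minimal Python sketch using only the standard library. It is not NOSIBLE's crawler code: the user-agent token `ExampleBot` and the function names `allowed_by_robots`, `conditional_headers`, and `interleave_by_host` are assumptions made up for this example, and round-robin ordering across hosts is just one plausible reading of "diverse batches".

```python
from collections import defaultdict
from itertools import zip_longest
from typing import Dict, List, Optional
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

USER_AGENT = "ExampleBot"  # hypothetical token, not NOSIBLE's real user agent


def allowed_by_robots(robots_txt: str, url: str, agent: str = USER_AGENT) -> bool:
    """Check a URL against the body of an already-downloaded robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def conditional_headers(etag: Optional[str], last_modified: Optional[str]) -> Dict[str, str]:
    """Build validators so an unchanged page can be answered with 304 Not Modified."""
    headers: Dict[str, str] = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def interleave_by_host(urls: List[str]) -> List[str]:
    """Order URLs round-robin across hosts so no single site receives a burst."""
    by_host = defaultdict(list)
    for url in urls:
        by_host[urlsplit(url).netloc].append(url)
    rounds = zip_longest(*by_host.values())
    return [url for batch in rounds for url in batch if url is not None]
```

A crawler built this way would send the stored `ETag` and `Last-Modified` values from the previous fetch, skip re-processing on a `304` response, and pair the interleaved ordering with a per-host delay.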

## Date Verification

We do not trust a page's stated publish date. We test it against independent signals: metadata, corroboration across reputable sources, and verification against systems of record. The verification threshold can be tuned to your risk tolerance.
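
One way to picture a tunable threshold is as a weighted agreement score. The sketch below is an assumption for illustration only: the `Signal` class, the weights, the one-day tolerance, and the default threshold of `0.7` are all hypothetical and do not describe how NOSIBLE actually scores dates.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Signal:
    name: str       # e.g. "metadata", "corroboration", "system_of_record"
    observed: date  # the publish date this signal suggests
    weight: float   # hypothetical trust weight for this signal


def date_confidence(stated: date, signals: List[Signal], tolerance_days: int = 1) -> float:
    """Return the share of total signal weight that agrees with the stated date."""
    total = sum(s.weight for s in signals)
    if total == 0:
        return 0.0
    agreeing = sum(
        s.weight for s in signals
        if abs((s.observed - stated).days) <= tolerance_days
    )
    return agreeing / total


def is_verified(stated: date, signals: List[Signal], threshold: float = 0.7) -> bool:
    """Accept the stated date only if enough weighted signals agree with it."""
    return date_confidence(stated, signals) >= threshold
```

Raising `threshold` rejects more dates that are only partly corroborated, and lowering it accepts more of them. That is the trade-off a risk tolerance setting would control.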

## Hosting and Security

We host almost entirely on dedicated bare-metal servers in France, Germany, and the United Kingdom. Data is geographically replicated and backed up. Servers run a hardened operating system, are reachable only over a private network, and log all access. A SOC 2 audit is underway.

## Privacy and Logs

Usage is logged to enforce rate limits. Request logging can be turned off entirely, and with a private API key no logs are ever recorded. Any logs that are kept are encrypted at rest, using a separate encryption key for each API key.
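
A unique encryption key per API key is commonly obtained by deriving each key from a master secret. The sketch below shows that idea with HMAC-SHA256 from the Python standard library. It is an assumption about one possible design, not a description of NOSIBLE's system: `derive_log_key`, the `log-encryption:` label, and the master secret are hypothetical, and a production setup would more likely rely on a key management service or a standard KDF such as HKDF.

```python
import hashlib
import hmac


def derive_log_key(master_secret: bytes, api_key_id: str) -> bytes:
    """Derive a 32-byte key that is specific to one API key identifier."""
    # The fixed label separates these keys from any other use of the same secret.
    message = b"log-encryption:" + api_key_id.encode("utf-8")
    return hmac.new(master_secret, message, hashlib.sha256).digest()
```

Because the derivation is deterministic, the same API key identifier always maps to the same encryption key, while two different identifiers yield unrelated keys.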

## Right to Be Forgotten

A site we have indexed can ask to be removed. We remove its data and notify affected customers.

## Related

- [Copyright](https://nosible.com/legal/copyright) — our copyright position
- [Contact](https://nosible.com/contact) — reach us with questions
