Compliance
What we crawl, how we crawl it, and where it lives.
We are a search engine. We crawl, index, and cross-reference the open web. We hold ourselves to the standards of the search engines you already trust. This is a plain account of how.
We index long-form written content from hundreds of thousands of sources, across more than 150 countries and 95 languages. News, press releases, company sites, specialist content, government sites, and official records.
We do not crawl behind paywalls, and never will. We do not index social media, marketplaces, or pornographic sites. We screen every domain for security. We hold no personally identifiable information in our index.
We do not trust a page's stated publish date. We test it against independent signals. Metadata. Corroboration across reputable sources. Verification against systems of record. A sanity check against what was true at the time. The threshold can be tuned to your risk tolerance.
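As an illustration only, the check above can be pictured as a weighted vote among independent signals, compared against a tunable threshold. The sketch below is an assumption about the general shape of such a check, not a description of the production system: the names Signal, date_confidence, and accept_date, the weights, and the two-day tolerance are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Signal:
    """One independent estimate of when a page was published (hypothetical)."""
    name: str
    estimate: Optional[date]  # None when the signal has nothing to say
    weight: float             # illustrative weight, not a real tuning value


def date_confidence(stated: date, signals: List[Signal],
                    tolerance_days: int = 2) -> float:
    """Weighted share of available signals that agree with the stated date."""
    tolerance = timedelta(days=tolerance_days)
    available = [s for s in signals if s.estimate is not None]
    total = sum(s.weight for s in available)
    if total == 0:
        return 0.0  # no independent evidence, so no confidence
    agreeing = sum(s.weight for s in available
                   if abs(s.estimate - stated) <= tolerance)
    return agreeing / total


def accept_date(stated: date, signals: List[Signal],
                threshold: float = 0.75) -> bool:
    """Accept the stated date only if confidence reaches the chosen threshold."""
    return date_confidence(stated, signals) >= threshold


stated = date(2024, 3, 1)
signals = [
    Signal("metadata", date(2024, 3, 1), 1.0),
    Signal("corroboration", date(2024, 3, 2), 2.0),
    Signal("system_of_record", None, 3.0),          # unavailable, so ignored
    Signal("plausibility", date(2023, 1, 1), 1.0),  # disagrees
]
confidence = date_confidence(stated, signals)
```

In this sketch a stricter risk tolerance simply means a higher threshold: the same evidence that passes at 0.75 is rejected at 0.9.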
We run almost entirely on dedicated bare-metal servers in data centers in France, Germany, and the United Kingdom. The data is geographically replicated and backed up. Servers run a hardened operating system, are reachable only over a private network, and log all access. A SOC 2 (System and Organization Controls 2) audit is underway.
Usage is logged to enforce rate limits. Request logging can be turned off entirely. Ask for a private API key and no logs are ever recorded. Or make an existing key private, and its logs are destroyed within one business day. Logs are encrypted at rest, with a unique key per API key.
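A minimal sketch of the logging behavior described above, under stated assumptions. UsageLog and derive_log_key are hypothetical names, the store is an in-memory dictionary, and deriving a per-key encryption key with HMAC-SHA256 is just one possible way to obtain a unique key per API key. The sketch also purges logs immediately when a key is made private, whereas the commitment above is deletion within one business day. Encryption itself is left out.

```python
import hashlib
import hmac
from collections import defaultdict
from typing import Dict, List, Set


def derive_log_key(master_secret: bytes, api_key: str) -> bytes:
    """Derive a distinct 32-byte key per API key (assumed scheme, HMAC-SHA256)."""
    return hmac.new(master_secret, api_key.encode("utf-8"), hashlib.sha256).digest()


class UsageLog:
    """Hypothetical in-memory usage log that honors private API keys."""

    def __init__(self) -> None:
        self._private: Set[str] = set()
        self._entries: Dict[str, List[str]] = defaultdict(list)

    def record(self, api_key: str, request: str) -> bool:
        """Store a request line; return False if the key is private."""
        if api_key in self._private:
            return False  # private keys are never logged
        self._entries[api_key].append(request)
        return True

    def make_private(self, api_key: str) -> None:
        """Mark a key private and destroy any logs already held for it."""
        self._private.add(api_key)
        self._entries.pop(api_key, None)

    def entries(self, api_key: str) -> List[str]:
        """Return a copy of the stored request lines for a key."""
        return list(self._entries.get(api_key, []))


log = UsageLog()
log.record("key-a", "GET /search?q=example")
log.make_private("key-a")
recorded_after_private = log.record("key-a", "GET /search?q=again")
```

After make_private, the key has no stored entries and later requests are not recorded.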
A site we have indexed can ask to be forgotten. We remove its data and notify affected customers.