Research

Technical insights, product updates, and deep dives from the Nosible team

Can Faceted Search at Web-Scale Self Organize?
Faceted Search

Can faceted search at web scale self-organize? As it turns out, it can. In this post we outline our new and improved adaptive named entity tagging system.

Stuart Reid October 16, 2025
Introducing Cybernaut-1: Agentic Search using MCTS
cybernaut-1

Cybernaut-1 combines our powerful hybrid-3 search algorithm with LLM-guided Monte Carlo Tree Search to deliver world-class search results on difficult queries.

Stuart Reid August 26, 2025
The Road to Cybernaut-1: Rebuilding Search for AI
cybernaut-1 Technical

AI needs its own search engine. This is how we’re rebuilding search for AI, and the road to Cybernaut-1, the first high-trust agentic search engine.

Stuart Reid August 20, 2025
A Pattern for Scaling the Value Proposition of LLMs: Ensemble and Distil 🚀
Technical Sentiment Signals

We introduce the ensemble and distil data pattern and use it to fit an ordinary least squares linear regression that outperforms GPT-4 at financial news sentiment classification using sentence transformer embeddings as features.

Stuart Reid February 6, 2024
News Sentiment Showdown: Who Checks Vibes Best?
Technical Sentiment

A comparison of sentiment classifications made by TextBlob, VADER, Flair, SigmaFSA, FinBERT, FinBERT-Tone, Text-Bison, Text-Unicorn, Gemini-Pro, GPT-3.5, GPT-4, and GPT-4-Turbo. We look at accuracy, time, and cost and include a dataset of 10,368 labelled news stories (with code) for our followers.

Stuart Reid January 28, 2024
Using Vector Search to See Signals in Company News
Technical Vector Search Signals

How we use vector search to extract investment signals from a multi-terabyte company news dataset that currently contains over 55 million embeddings, 150+ million sentences, 4+ billion words, and 5+ billion GPT tokens.

Stuart Reid January 21, 2024