Research

Technical insights, product updates, and deep dives from the Nosible team

Matching GPT-5.1 at Financial Sentiment with Active Learning and Qwen3

Here's how we fine-tuned Qwen3 0.6B to beat FinBERT and match GPT-5.1 accuracy, complete with open-source models, datasets, and training scripts. Spoiler alert: active learning is all you need.

Simon van Dyk December 12, 2025

Faceted Search

Can Faceted Search at Web-Scale Self Organize?

As it turns out, yes it can! In this post we outline our new and improved adaptive named entity tagging system.

Stuart Reid October 16, 2025

cybernaut-1

Introducing Cybernaut-1: Agentic Search using MCTS

Cybernaut-1 combines our powerful hybrid-3 search algorithm with LLM-guided Monte Carlo Tree Search to deliver world-class search results on difficult queries.

Stuart Reid August 26, 2025

cybernaut-1 Technical

The Road to Cybernaut-1: Rebuilding Search for AI

AI needs its own search engine. This is how we’re rebuilding search for AI, and the road to Cybernaut-1, the first high-trust agentic search engine.

Stuart Reid August 20, 2025

Technical Sentiment Signals

A Pattern for Scaling the Value Proposition of LLMs: Ensemble and Distil 🚀

We introduce the ensemble and distil data pattern and use it to fit an ordinary least squares linear regression that outperforms GPT-4 at financial news sentiment classification using sentence transformer embeddings as features.

Stuart Reid February 6, 2024

Technical Sentiment

News Sentiment Showdown: Who Checks Vibes Best?

A comparison of sentiment classifications made by TextBlob, VADER, Flair, SigmaFSA, FinBERT, FinBERT-Tone, Text-Bison, Text-Unicorn, Gemini-Pro, GPT-3.5, GPT-4, and GPT-4-Turbo. We look at accuracy, time, and cost and include a dataset of 10,368 labelled news stories (with code) for our followers.

Stuart Reid January 28, 2024

Technical Vector Search Signals

Using Vector Search to See Signals in Company News

How we use vector search to extract investment signals from a multi-terabyte company news dataset that currently contains over 55 million embeddings, 150+ million sentences, 4+ billion words, and 5+ billion GPT tokens.

Stuart Reid January 21, 2024