Repository Radar - PR#16
Keeping an eye on the world of OSS - one scan at a time
Welcome to PR#16 of Repository Radar – your no-fluff scan of open-source software infrastructure. This week features TensorZero, Chatterbox, Firecrawl, Wren AI, LEANN, VibeVoice, and Claude Code PM. From LLM feedback loops to speech, data pipelines, BI, RAG, and project management, OSS is fast becoming the default stack for developers.
📡 ABOVE THE RADAR (aka the BFD)
In “above the radar” we look at the big-splash software infrastructure announcements and go hunting for similar OSS projects.
This week, congratulations to TensorZero, which announced a 7.3m USD seed round led by FirstMark with Bessemer, Bedrock, DRW, Coalition, and several strategic angels. We first featured them in PR#10 when they had just passed 6k stars. Now they are closing in on 10k, showing strong momentum.
TensorZero is building the open source feedback loop for production LLM applications. It combines inference, observability, optimization, and experimentation into one stack, turning raw inferences into a data flywheel that makes LLM apps smarter with every interaction.
The bigger picture: as the LLM ecosystem fragments across providers and fine-tunes, the real moat will come from feedback-driven workflows. OSS projects like TensorZero ensure these loops stay open, composable, and community-driven. Open source is not just where innovation starts, it is where durable infrastructure for AI is being forged.
🧩 TensorZero (GitHub) 9.9k ☆ – An Open Source Stack for Industrial Grade LLM Applications
TensorZero is an open source stack for industrial grade LLM applications. It unifies an LLM gateway, observability, optimization, evaluation, and experimentation. The result is a feedback loop that makes applications smarter over time using real world data and human input.
Why It's a Big Deal:
Brings structure to the fragmented LLM ecosystem by offering a single, coherent stack instead of piecemeal tools.
Turns production data and human feedback into a continuous learning cycle, making applications smarter over time.
Demonstrates the momentum of OSS in AI, growing rapidly in adoption and community contributions.
Under the Hood:
Rust core adding <1ms p99 latency overhead at scale, paired with a Python SDK for accessibility.
Modular design covering gateway, observability, optimization, evaluation, and experimentation, all usable incrementally.
Fully self-hosted, GitOps-friendly, and OpenAI-compatible, supporting every major LLM provider with routing, retries, and fallbacks.
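To make "routing, retries, and fallbacks" concrete, here is a minimal Python sketch of the general pattern. It is not TensorZero's API: `call_with_fallbacks`, `flaky_primary`, and `stable_backup` are hypothetical names, and the toy providers stand in for real LLM backends.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical provider type: takes a prompt, returns a completion or raises.
Provider = Callable[[str], str]


def call_with_fallbacks(
    prompt: str,
    providers: Sequence[Tuple[str, Provider]],
    retries: int = 2,
) -> Tuple[str, str]:
    """Try each provider in order, retrying each one up to `retries` times.

    Returns (provider_name, completion); raises RuntimeError if all fail.
    """
    errors: List[str] = []
    for name, provider in providers:
        for attempt in range(1, retries + 1):
            try:
                return name, provider(prompt)
            except Exception as exc:  # a real gateway would filter error types
                errors.append(f"{name} attempt {attempt}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Toy providers (assumptions for illustration only).
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary timed out")


def stable_backup(prompt: str) -> str:
    return f"echo: {prompt}"


winner, text = call_with_fallbacks(
    "hello", [("primary", flaky_primary), ("backup", stable_backup)]
)
print(winner, text)
```

The primary is tried twice, fails both times, and the request falls through to the backup. A production gateway would typically add backoff, timeouts, and per-error retry policies on top of this.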

TensorZero puts the "learning" back into machine learning, ensuring that as models scale and diversify, developers keep control of the feedback loops that matter.
🔭 ON THE RADAR
Stuff that’s hot and trending at over 10k stars.
🗣️ Chatterbox (GitHub) 11.2k ☆ – Open Source TTS with Emotion Control
The Scoop: Chatterbox, from Resemble AI, is a state-of-the-art open source TTS model that rivals commercial systems like ElevenLabs. It’s the first OSS TTS to support emotion exaggeration control for expressive, human-like voices.
Why It's a Big Deal
Democratizes access to high-quality, production-grade TTS without vendor lock-in.
Enables creative and interactive use cases with unique emotion control.
Benchmarked against closed-source leaders and consistently preferred in evaluations.
Under the Hood
0.5B-parameter Llama-based model trained on 0.5M hours of curated data; MIT-licensed.
Alignment-informed inference for stability, sub-200ms latency, and voice conversion scripts.
Built-in imperceptible watermarking (PerTh) ensures responsible AI audio generation.
Chatterbox makes expressive, open, and responsible speech synthesis available to developers and creators everywhere.
🔥 Firecrawl (GitHub) 54.1k ☆ – The Web Data API for AI
The Scoop: Firecrawl turns any website into clean, LLM-ready markdown or structured data. With crawling, scraping, mapping, and extraction APIs, it simplifies the hardest parts of web data ingestion for AI workflows.
Why It’s a Big Deal
Makes high-quality web data accessible for AI without brittle scraping setups.
Integrates seamlessly with LangChain, LlamaIndex, Flowise, and other AI dev tools.
Unlocks richer AI use cases by supporting structured data extraction, monitoring, and change tracking.
Under the Hood
Rust + Node + Python SDKs with async APIs; AGPL-3.0 license.
Features crawling, scraping, mapping, and AI-based extraction; supports custom headers, batching, and anti-bot handling.
Deploy via Firecrawl Cloud or self-host locally; works with any OpenAI-compatible LLM client.
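For a feel of what "website to LLM-ready markdown" involves, here is a toy converter built only on Python's standard library. It is a sketch of the general idea, not Firecrawl's implementation or API: `ToyMarkdownConverter` and `html_to_markdown` are hypothetical names, and it handles just a few tags.

```python
from html.parser import HTMLParser
from typing import List, Optional


class ToyMarkdownConverter(HTMLParser):
    """Converts a tiny subset of HTML (h1, h2, p, a) to markdown."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self.href: Optional[str] = None
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag == "h1":
            self.parts.append("# ")
        elif tag == "h2":
            self.parts.append("## ")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ("h1", "h2", "p"):
            self.parts.append("\n\n")  # blank line after each block element
        elif tag == "a":
            self.parts.append(f"]({self.href or ''})")
            self.href = None

    def handle_data(self, data):
        if self.skip_depth == 0:  # drop script and style contents
            self.parts.append(data)


def html_to_markdown(html: str) -> str:
    converter = ToyMarkdownConverter()
    converter.feed(html)
    converter.close()
    return "".join(converter.parts).strip()


sample = (
    "<h1>Docs</h1><script>track()</script>"
    '<p>See the <a href="https://example.com">guide</a>.</p>'
)
markdown = html_to_markdown(sample)
print(markdown)
```

Real pages add JavaScript rendering, navigation clutter, rate limits, and anti-bot measures, which is the messy part a dedicated crawling service takes off your hands.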
Firecrawl makes the messy web usable for AI developers, powering anything from RAG pipelines to automated monitoring systems.
📊 Wren AI (GitHub) 11.3k ☆ – Open Source GenBI Agent
The Scoop: Wren AI lets you query any database in natural language and instantly get SQL, charts, and AI-powered insights. It’s an open source Generative BI agent designed for modern data teams.
Why It's a Big Deal
Lowers the barrier for non-technical users to access and analyze complex datasets.
Produces accurate SQL and governed outputs via a semantic layer.
Embeds directly into apps or workflows, enabling SaaS features and custom agents.
Under the Hood
Supports major data sources (Postgres, BigQuery, Snowflake, Redshift, MySQL, ClickHouse, Oracle, MSSQL, Trino, DuckDB).
Integrates with multiple LLMs: OpenAI, Anthropic, Gemini, Groq, Ollama, Databricks, and more.
Modular architecture with API access, managed cloud service, and open source self-hosted deployment.
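The "governed outputs via a semantic layer" idea can be sketched in a few lines of Python. This is an illustration of the concept, not Wren AI's modeling format or API: `SEMANTIC_LAYER`, `build_sql`, and the `orders` schema are all hypothetical. In a real GenBI agent an LLM would choose the metric and dimension from the user's question; here they are passed in directly.

```python
# Hypothetical semantic layer: business terms mapped to vetted SQL fragments.
SEMANTIC_LAYER = {
    "table": "orders",
    "metrics": {
        "revenue": "SUM(amount)",
        "order_count": "COUNT(*)",
    },
    "dimensions": {
        "country": "country",
        "month": "DATE_TRUNC('month', created_at)",
    },
}


def build_sql(metric: str, dimension: str, layer: dict = SEMANTIC_LAYER) -> str:
    """Compose a query only from definitions that exist in the layer."""
    if metric not in layer["metrics"]:
        raise KeyError(f"unknown metric: {metric}")
    if dimension not in layer["dimensions"]:
        raise KeyError(f"unknown dimension: {dimension}")
    metric_sql = layer["metrics"][metric]
    dim_sql = layer["dimensions"][dimension]
    return (
        f"SELECT {dim_sql} AS {dimension}, {metric_sql} AS {metric} "
        f"FROM {layer['table']} GROUP BY {dim_sql}"
    )


sql = build_sql("revenue", "country")
print(sql)
```

Because the query is assembled only from pre-approved definitions, a request for an undefined metric fails loudly instead of producing plausible-looking but ungoverned SQL.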
Wren AI bridges the gap between data and decision-making, giving teams natural-language superpowers on top of their databases.
🔬 BELOW THE RADAR
Our hot picks for recent OSS projects to keep a close eye on for the future.
🛠️ Claude Code PM (GitHub) 3.4k ☆ – AI-Native Project Management System
The Scoop: Claude Code PM is an AI-native project management system that turns GitHub Issues and Git worktrees into a spec-driven development workflow. It preserves context from PRD to epic to issue, enables parallel agent execution without branch chaos, and keeps a full audit trail in GitHub. The result is faster shipping with less vibe coding and more transparent collaboration between humans and AI agents.
Get started: Install into your repo and initialize the workflow.
# Unix or macOS
cd path/to/your/project
curl -sSL https://raw.githubusercontent.com/automazeio/ccpm/main/ccpm.sh | bash
# Then, inside Claude Code, run the slash commands below (they are not shell commands)
/pm:init
# Create a PRD and convert to an epic
/pm:prd-new example-feature
/pm:prd-parse example-feature
/pm:epic-oneshot example-feature
📦 LEANN (GitHub) 9.3k ☆ – Lightweight Vector Index for Personal RAG
The Scoop: LEANN is a lightweight vector index for personal RAG. It achieves about 97% storage savings by storing a pruned graph and recomputing embeddings on demand, so you can index emails, browser history, documents, chats, and code on your laptop with strong privacy and no cloud costs. It integrates with Claude Code, Ollama, OpenAI, and Hugging Face.
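A quick back-of-the-envelope in Python shows why dropping stored embeddings saves so much space. The corpus size, embedding dimension, and neighbor count below are hypothetical values chosen for illustration, not LEANN's measured figures, and the sketch ignores the raw text and index metadata.

```python
def embedding_bytes(num_chunks: int, dim: int, bytes_per_value: int = 4) -> int:
    """Size of storing one float32 vector per chunk."""
    return num_chunks * dim * bytes_per_value


def graph_bytes(num_chunks: int, neighbors_per_node: int, bytes_per_id: int = 4) -> int:
    """Size of storing only neighbor IDs for a pruned proximity graph."""
    return num_chunks * neighbors_per_node * bytes_per_id


# Hypothetical corpus: 1M chunks, 768-dim embeddings, 16 neighbors per node.
full = embedding_bytes(1_000_000, 768)
graph = graph_bytes(1_000_000, 16)
saving = 1 - graph / full

print(f"embeddings: {full / 1e9:.2f} GB, graph only: {graph / 1e6:.0f} MB")
print(f"saving: {saving:.1%}")
```

The trade-off is compute: embeddings for the nodes visited during a search have to be recomputed at query time instead of being read from disk.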
Get started: Install with uv, build a tiny index, and search.
# Setup
git clone https://github.com/yichuan-w/LEANN.git leann
cd leann
uv venv && source .venv/bin/activate
uv pip install leann
# Build an index and query it
python - <<'PY'
from leann import LeannBuilder, LeannSearcher
from pathlib import Path
idx = str(Path("./demo.leann").resolve())
b = LeannBuilder(backend_name="hnsw")
b.add_text("LEANN saves 97% storage vs traditional vector DBs.")
b.build_index(idx)
s = LeannSearcher(idx)
print(s.search("storage savings", top_k=1))
PY
🎙️ VibeVoice (GitHub) 6k ☆ – Long-Form Conversational TTS
The Scoop: VibeVoice is a long-form, multi-speaker text-to-speech framework for natural conversations. It uses continuous speech tokenizers at 7.5 Hz and a next-token diffusion approach to generate expressive dialogue, handling up to 4 speakers and very long contexts for podcast-like audio, narration, and agent voices.
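The 7.5 Hz tokenizer rate is what keeps long audio tractable. A rough Python calculation, assuming one speech frame per tokenizer step (the helper `speech_frames` and the durations are illustrative, not part of VibeVoice):

```python
FRAME_RATE_HZ = 7.5  # tokenizer rate quoted above


def speech_frames(minutes: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Number of speech frames needed to cover `minutes` of audio."""
    return round(minutes * 60 * frame_rate_hz)


one_minute = speech_frames(1)
one_hour = speech_frames(60)
print(one_minute, one_hour)
```

At this rate a full hour of dialogue is a sequence of only a few tens of thousands of frames, which is why very long contexts are feasible.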
Get started: Install and launch the demo.
# Clone and install
git clone https://github.com/microsoft/VibeVoice.git
cd VibeVoice
pip install -e .
# Run the Gradio demo (choose a model)
python demo/gradio_demo.py --model_path microsoft/VibeVoice-1.5B --share
# or
python demo/gradio_demo.py --model_path microsoft/VibeVoice-Large --share
Repository Radar is brought to you by Alexander, a Partner at Picus Capital, and Claudius, an Investor there. In this Substack, we focus on software infrastructure and open-source innovation in AI and beyond, tracking major trends while uncovering the hidden gems shaping the future of technology.