Lab - Aarush Sharma

evalops (on PyPI)


If eval has friction, people skip it. When people skip eval, bad outputs reach users.

A Python library that scores LLM output quality in one function call. Works with any model provider. Zero mandatory dependencies. Ships with CI regression gates so a prompt change can't quietly break production accuracy.
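A minimal sketch of what a CI regression gate could look like, using only the standard library. This is not the library's actual API: `exact_match`, `mean_score`, `regression_gate`, and the `tolerance` parameter are hypothetical names chosen for illustration.

```python
from typing import Callable, Sequence


def exact_match(output: str, expected: str) -> float:
    """Score 1.0 when the normalized strings match, else 0.0."""
    return float(output.strip().lower() == expected.strip().lower())


def mean_score(
    outputs: Sequence[str],
    expected: Sequence[str],
    scorer: Callable[[str, str], float] = exact_match,
) -> float:
    """Average the per-example scores over a fixed evaluation set."""
    if len(outputs) != len(expected):
        raise ValueError("outputs and expected must have the same length")
    if not outputs:
        raise ValueError("evaluation set is empty")
    return sum(scorer(o, e) for o, e in zip(outputs, expected)) / len(outputs)


def regression_gate(current: float, baseline: float, tolerance: float = 0.02) -> bool:
    """Pass only if the score has not dropped more than `tolerance` below baseline."""
    return current >= baseline - tolerance
```

In a CI job, the script would exit with a non-zero status when `regression_gate` returns `False`, so a prompt change that lowers the score blocks the merge.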

pip install evalsystem

eval-driven · open source · python · CI

View on GitHub →

brand-llm-visibility-audit (active)


AI models are becoming the first stop for purchase decisions. No established playbook exists for optimizing brand presence in LLM responses.

When someone asks ChatGPT "best CRM?", most companies have no idea what comes back. This runs a 4-stage agentic pipeline across multiple AI models: audit current visibility, send a probe agent to dig into gaps, run cross-model diagnosis, then output a concrete GEO playbook with prioritized actions.
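The four stages can be pictured as plain functions chained together. This is a simplified sketch, not the project's code: the `Ask` callable, the stage function names, and the substring check for brand mentions are all assumptions, and a real probe stage would be an agent instead of a one-line filter.

```python
from typing import Callable, Dict, List

Ask = Callable[[str, str], str]  # hypothetical: (model_name, prompt) -> response text


def audit(ask: Ask, models: List[str], brand: str, query: str) -> Dict[str, bool]:
    """Stage 1: record, per model, whether the brand appears in the answer."""
    return {m: brand.lower() in ask(m, query).lower() for m in models}


def probe(visibility: Dict[str, bool]) -> List[str]:
    """Stage 2: list the models where the brand was missing (the gaps)."""
    return [m for m, seen in visibility.items() if not seen]


def diagnose(visibility: Dict[str, bool], gaps: List[str]) -> str:
    """Stage 3: compare across models to see how widespread the gap is."""
    if not gaps:
        return "visible everywhere"
    if len(gaps) == len(visibility):
        return "missing everywhere"
    return "missing on some models"


def playbook(diagnosis: str, gaps: List[str]) -> List[str]:
    """Stage 4: turn the diagnosis into an ordered list of actions."""
    if not gaps:
        return ["monitor for regressions"]
    actions = [f"investigate why {m} omits the brand" for m in sorted(gaps)]
    return actions + [f"overall: {diagnosis}"]


def run_pipeline(ask: Ask, models: List[str], brand: str, query: str) -> List[str]:
    """Run the four stages in order, feeding each stage's output to the next."""
    visibility = audit(ask, models, brand, query)
    gaps = probe(visibility)
    return playbook(diagnose(visibility, gaps), gaps)
```

Passing the model call in as a function keeps the stages testable with a stub and independent of any one provider.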

agentic · open source · 219 unit tests · multi-model

View on GitHub →

evaluation-harness (prototype)


Eval before you build. Know if the problem is solvable with your current approach before spending cycles iterating.

A prompt evaluation harness for VLM-based extraction on freight documents. Compares three strategies (naive, structured, few-shot) against ground truth and classifies every failure: is it layout ambiguity, format variance, scan quality, or something a better prompt can actually fix? The taxonomy tells teams what to iterate on and what to stop wasting time on.
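A rough sketch of the comparison loop described above, assuming each strategy is a callable that maps a document ID to an extracted value. The names `evaluate` and `classify_failure` and the shape of the report are illustrative, not taken from the repository.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (document_id, ground-truth value)


def evaluate(
    strategies: Dict[str, Callable[[str], str]],
    examples: List[Example],
    classify_failure: Callable[[str, str, str], str],
) -> Dict[str, dict]:
    """Return per-strategy accuracy plus a count of failures by category."""
    if not examples:
        raise ValueError("need at least one ground-truth example")
    report = {}
    for name, extract in strategies.items():
        correct = 0
        failures: Counter = Counter()
        for doc_id, truth in examples:
            predicted = extract(doc_id)
            if predicted == truth:
                correct += 1
            else:
                # Label every miss so the report shows why it failed, not only how often.
                failures[classify_failure(doc_id, predicted, truth)] += 1
        report[name] = {
            "accuracy": correct / len(examples),
            "failures": dict(failures),
        }
    return report
```

Reading the failure counts next to accuracy is what separates misses a better prompt might fix from misses caused by the documents themselves, such as poor scan quality.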

eval-driven · multimodal · open source · 3 prompt strategies

View on GitHub →

shower-thoughts (just for fun)


You learn agent patterns better when the stakes are low and the feedback loop is fast.

Type a shower thought. Four AI agents debate it: an Optimist argues why you're brilliant, a Cynic finds the fatal flaw, a Researcher searches the web for prior art, and a Judge delivers the verdict. Built to explore multi-agent orchestration, role-based personas, and tool use in a context where experimentation is the point.
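The debate pattern can be sketched without any framework. This is a simplified illustration rather than the project's CrewAI setup: the `LLM` callable, the `ROLES` table, and the `debate` function are hypothetical, and the Researcher's web search tool is left out.

```python
from typing import Callable, List, Tuple

LLM = Callable[[str, str], str]  # hypothetical: (system_prompt, user_message) -> reply

# Each persona is just a role name plus the instruction that shapes its reply.
ROLES = [
    ("Optimist", "Argue why this idea is brilliant."),
    ("Cynic", "Find the fatal flaw in this idea."),
    ("Researcher", "Summarize prior art for this idea."),
]


def debate(thought: str, llm: LLM) -> Tuple[List[Tuple[str, str]], str]:
    """Collect one reply per persona, then let a judge rule on the transcript."""
    transcript = [(role, llm(instruction, thought)) for role, instruction in ROLES]
    briefing = "\n".join(f"{role}: {reply}" for role, reply in transcript)
    verdict = llm(
        "Weigh the arguments and deliver a verdict.",
        f"{thought}\n\n{briefing}",
    )
    return transcript, verdict
```

The Judge sees every other agent's reply, so the order of calls matters: the three personas run first and the verdict comes last.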

multi-agent · crewai · streamlit · open source

View on GitHub →

Lab.

Shipped at Work

Let's build something.
