Assah Bismark

Software Engineering

Stabilize Claude Code for Open-Weight Models

A three-layer proxy stack to keep Claude Code stable when routing through LiteLLM to open-weight models.
Running Claude Code with open-weight models like DeepSeek or Qwen through a LiteLLM proxy works. Until it doesn't. The port changes every time LiteLLM restarts because it picks a random dynamic port. The request payload grows until backends reject it with 400 errors. Usage stats come back null and t...

More Agents Is Not All You Need

A Google Research paper tested 180 agent configurations and found that multi-agent systems averaged a 3.5% drop in performance. I tested it on code generation and got a different result.
Everyone building AI agent systems right now assumes the same thing: more agents, better results. Split the work across specialized agents, let them collaborate, and the output improves. A research paper from Google Research, DeepMind, and MIT tested that assumption across 180 controlled configurati...

Architecture Is the New Product

AI writes the code now. The architecture is what actually ships.
What do software teams actually produce? Most people say features. Ship features, hit deadlines, move the roadmap. For a long time that was close enough. But now AI can generate features. It can scaffold services, write CRUD endpoints, wire up frontends, produce tests. The code itself is approaching...

Taste, Judgment, and the Thing AI Cannot Do

AI handles the logic. But logic was never the hard part.
AI has been around since the very beginning of computers. This is not some new phenomenon. Computing has steadily evolved from binary code and assembly language to the high-level languages we use today, all of it trying to bridge the gap between what a human wants and what a machine can execute. Pre...

There Is No AI Thinking (And You Can't Outsource It)

The machines got faster. The real question is whether we got lazier.
We have officially entered the era of Inference-at-Scale. The marketing hype surrounding "Artificial Intelligence" has never been louder, bolstered by the deployment of NVIDIA's Rubin architecture and the rise of Agentic Ecosystems. These systems...

The Infinite Software Crisis

What Happens When AI Writes Faster Than We Can Think
There's been a lot of talk lately about what's being called the "Infinite Software Crisis". It's not a new observation - people have been warning about this for a while - but it's been circulating more in engineering ...

Streams, Parallelism, and the Quest for Speed

A pragmatic exploration of Java streams and parallelism.
I've always been fascinated by the relentless pursuit of performance in software. It's a game of optimization, a constant dance between elegance and efficiency. Lately, I've been diving deep into two concepts that feel like they're at the heart of this pursuit: streams and parallelism. They're not n...

Rust Concurrency with the Mandelbrot Set

Exploring concurrency in Rust programming.
Rust excels in concurrent programming by enforcing rules that prevent memory errors and data races. For example:
- Mutexes: Rust ensures you only access shared data when holding the lock and releases it automatically. In C/C++, this relationship is often left to comments.
- Read-Only Data: Rust prev...
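The mutex rule in this excerpt can be illustrated with a minimal sketch that uses only the standard library (`std::sync::{Arc, Mutex}` and `std::thread`). The function `sum_ids` is a hypothetical example written for illustration. It is not code from the post and not the Mandelbrot renderer. The shared value lives inside the `Mutex`, so the only way to reach it is through the guard returned by `lock()`. The lock is released when that guard goes out of scope.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical example: spawns `n_threads` workers that each add their id
/// to a shared total. The `u64` is owned by the Mutex, so code cannot touch
/// it without first acquiring the lock.
fn sum_ids(n_threads: u64) -> u64 {
    let total = Arc::new(Mutex::new(0u64));
    let mut handles = Vec::new();

    for id in 0..n_threads {
        // Each worker gets its own reference-counted handle to the same Mutex.
        let total = Arc::clone(&total);
        handles.push(thread::spawn(move || {
            // `lock()` returns a MutexGuard; dereferencing it yields `&mut u64`.
            let mut guard = total.lock().unwrap();
            *guard += id;
            // The guard is dropped at the end of this closure, which releases
            // the lock automatically. There is no separate unlock call to forget.
        }));
    }

    // Wait for every worker before reading the final value.
    for h in handles {
        h.join().unwrap();
    }

    // Bind the copied value to a local so the temporary guard is dropped
    // before `total` goes out of scope.
    let result = *total.lock().unwrap();
    result
}

fn main() {
    // 0 + 1 + ... + 7 = 28
    println!("{}", sum_ids(8));
}
```

Because `Mutex<T>` owns its data, the compiler enforces the "lock before access" rule. In C or C++, that rule would typically live only in a comment next to the variable.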