Just-In-Time Prompting: A Remedy for Context Collapse

· 5 min read
Morgan Moneywise
CEO at Morgan Moneywise, Inc.

A cute robot wizard reclining on the ground, exhausted with spiral eyes, reaching out to a hand from a blue portal delivering a glowing magic scroll

Watching an AI agent struggle with a LaTeX ampersand for the tenth time isn't just boring. It’s expensive. You’re sitting there watching your automation burn through your daily quota in real-time just because the LLM can’t remember a backslash.

I tried the usual prompt engineering voodoo. I even threw the "Pro" model at it, hoping the extra reasoning would bail me out. It did not 🥲

That’s when it clicked. Between the start of the session and the final compilation there are so many intermediate steps that the agent inevitably hits the "lost in the middle" problem. By the time it actually gets around to fixing the compile errors, it has forgotten the rules I painstakingly wrote into the system prompt!

I needed a way to inject the rules after the error happens but before the agent tries to fix it. I was about to do it manually—and honestly, at that point, I might as well have just written the LaTeX myself—but then I remembered the new Skills feature in gemini-cli. It was exactly the approach I was looking for.

And now I have a blueprint for building AI agents that can reliably troubleshoot and fix their own mistakes with surgical precision!
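
To make the idea concrete, here is a minimal sketch of what such a skill could look like. The SKILL.md layout (a named directory with YAML frontmatter carrying `name` and `description`) follows the convention the Skills feature appears to use, but treat the exact schema as my assumption and check the repo before copying it:

```markdown
---
name: latex-error-fixer
description: Rules for fixing LaTeX compile errors caused by unescaped
  special characters. Use after a pdflatex/latexmk build fails.
---

When a LaTeX build fails:

1. Read the log and find the first offending line.
2. In text mode, escape the special characters & % $ # _ as
   \& \% \$ \# \_.
3. Inside tabular environments, & is a column separator; only escape
   ampersands that are literal text in a cell.
4. Re-run the build and confirm the original error is gone.
```

Because the skill is only pulled in when the error actually surfaces, the rules arrive fresh instead of being buried hundreds of turns deep in the session.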

3 Design Patterns to Stop Polluting Your AI Agent's Context Window

· 7 min read
Morgan Moneywise
CEO at Morgan Moneywise, Inc.

A cute robot wizard in a blue robe sweeping up 'context junk' with a broom

I've been watching Gemini CLI development for a while, and I started noticing a pattern that felt... redundant. First, we got custom Slash Commands. Then Custom Sub-Agents. Now, we have Skills.

It started to feel like feature bloat. Why do we need three different ways to shove a prompt into the context window? Is this just marketing, or is there actual engineering logic here?

On the surface, it looks like a massive violation of the Single Responsibility Principle (SRP). If all three features just dump strings into the context window, why do we need three separate abstractions?

So off I went poking around the source code again. It turns out the real difference isn't in what they can do, but in how much work they make you do to keep the session (and your sanity) from collapsing.

Agent Augmentation vs. Delegation in Google's Gemini CLI Skills System

· 6 min read
Morgan Moneywise
CEO at Morgan Moneywise, Inc.

A cute robot wizard in a Matrix-style chair saying 'I know kung fu'

Documentation tells you what the developers intended to share; the main branch tells you what they are actually building.

After accidentally discovering the undocumented sub-agent feature, I found myself watching Gemini CLI's repository with a closer eye. While reviewing the git log around commit d3c206c, I noticed a few mentions of "skills."

At first, I thought it might just be a marketing rebrand of custom agents. But the more I looked, the more it felt like a totally different approach to architecting Agentic AI workflows. I’ve been thinking of it as Delegation vs. Augmentation.

By understanding this distinction, you can stop stuffing your system prompts with every tool imaginable and avoid the unnecessary complexity of managing a swarm of sub-agents. If you can just make your current agent a bit smarter on the fly, why bother with all the extra overhead?

How to Create Custom Sub‑Agents in Gemini CLI

· 6 min read
Morgan Moneywise
CEO at Morgan Moneywise, Inc.

A digital wizard summoning sub-agent spirits

I have been waiting for Google to add custom sub‑agent support to the Gemini CLI for what feels like forever. Claude Code already has it, and at this point I expected feature parity.

Tracking GitHub Issue #3132 has been a daily ritual, and while someone recently hinted that it is "already implemented," the official docs at geminicli.com/docs remain silent.

Naturally, this led to a deep dive into the source code. And guess what? It is there. It's just hidden.

This post is a quick technical breakdown of how to enable the experimental sub‑agent system, how it works under the hood, and how to define your own custom agents using the undocumented TOML-based configuration.
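
As a taste of what's ahead, here is an illustrative sketch of what such a definition could look like. The file path and every field name below are my placeholders, not the actual undocumented schema; the post itself walks through the real thing:

```toml
# Illustrative only: the path and field names are hypothetical placeholders.
# ~/.gemini/agents/code-reviewer.toml
name = "code-reviewer"
description = "Reviews diffs for bugs, style issues, and missing tests."
prompt = """
You are a meticulous code reviewer. Examine the provided diff and
report correctness bugs, style violations, and missing test coverage.
"""
tools = ["read_file", "grep"]
```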

How To Make LLMs Generate Time Series Forecasts Instead Of Text

· 13 min read
Morgan Moneywise
CEO at Morgan Moneywise, Inc.

picture of a parrot symbolizing an LLM and a metronome symbolizing the LLM morphing into a tool for time series forecasting

Introduction

Since ChatGPT hit the scene, the term 'Large Language Models (LLMs)' has become a buzzword, with everyone on social media sharing the latest papers on the next big advancement. At first, I was just as excited, but eventually, I started to lose interest as many of these so-called breakthroughs felt like incremental improvements. They didn’t offer that 'wow' factor that ChatGPT did. But then, I stumbled upon a post from Amazon Science that reignited my interest in LLMs. They were talking about using LLMs, not for the usual NLP tasks, but for something entirely different: time series forecasting!

This got me excited because imagine being able to harness the power of LLMs—models that have already shown amazing feats in Natural Language Processing (NLP)—and apply it to time series! Could we finally predict the future with perfect accuracy? Well, obviously not, but even reducing uncertainty would be incredibly valuable.

In this blog post, I’ll walk you through how the authors of the Chronos paper successfully repurposed any LLM for time series forecasting. And if you’re the hands-on type, you can follow along with all the code to reproduce the diagrams and results by checking out this GitHub repository.
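
To preview the core trick: Chronos mean-scales a series and quantizes it into a fixed vocabulary of bins, so the LLM can treat numeric values as ordinary tokens. The sketch below is my simplification of that step (the bin count and range only loosely follow the paper's defaults), not the reference implementation:

```python
import numpy as np

def tokenize(series: np.ndarray, n_bins: int = 4094, limit: float = 15.0):
    # Mean scaling: divide by the mean absolute value so series of any
    # magnitude land in the same numeric range.
    scale = float(np.mean(np.abs(series))) or 1.0
    scaled = series / scale
    # Uniform quantization: each bin id doubles as a token id.
    centers = np.linspace(-limit, limit, n_bins)
    edges = (centers[1:] + centers[:-1]) / 2
    return np.digitize(scaled, edges), scale

def detokenize(tokens: np.ndarray, scale: float,
               n_bins: int = 4094, limit: float = 15.0) -> np.ndarray:
    # Map token ids back to bin centers, then undo the scaling.
    centers = np.linspace(-limit, limit, n_bins)
    return centers[tokens] * scale

tokens, scale = tokenize(np.array([12.0, 15.0, 14.0, 18.0, 21.0]))
print(tokens, np.round(detokenize(tokens, scale), 2))
```

Once values are tokens, forecasting is just next-token generation, and sampling multiple continuations gives you a probabilistic forecast for free.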

How to Evaluate Probabilistic Forecasts with Weighted Quantile Loss

· 10 min read
Morgan Moneywise
CEO at Morgan Moneywise, Inc.

picture of a pop art hourglass with fractal branches breaking out, symbolizing multiple possibilities in a trippy, vibrant style

Introduction

So, there I was, reading this paper on time series models, and suddenly I hit a section that made me go, "WTF?" The model didn’t just spit out a single predicted value; instead, it produced a whole range of predictions. My brain instantly went into overdrive: how on earth are you supposed to evaluate that? I’m used to straightforward metrics like RMSE or MAD, where you basically compute the difference between the actual and predicted values. But now each prediction is a whole set of numbers, and let’s just say my head was ready to explode 😅

That’s when the author introduced Weighted Quantile Loss, and I knew I had to dive deep into it and put together this guide to help folks understand it just as clearly.
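
As a preview, here is the metric in miniature: for each quantile level q, the pinball loss penalizes under-prediction by q and over-prediction by (1 - q), and the total is normalized by the sum of absolute actuals. This sketch follows the GluonTS-style convention I cover in the post; the exact scaling constant varies between libraries:

```python
import numpy as np

def weighted_quantile_loss(y, forecasts, quantiles):
    """Mean weighted quantile loss over several quantile forecasts.

    y:         (T,) array of actual values
    forecasts: dict mapping quantile level q to a (T,) array of
               predicted q-quantiles
    """
    denom = np.sum(np.abs(y))
    losses = []
    for q in quantiles:
        diff = y - forecasts[q]
        # Pinball loss: under-prediction costs q, over-prediction (1 - q).
        pinball = np.sum(np.maximum(q * diff, (q - 1) * diff))
        losses.append(2 * pinball / denom)
    return float(np.mean(losses))

y = np.array([10.0, 12.0, 9.0])
forecasts = {
    0.1: np.array([8.0, 9.0, 7.0]),
    0.5: np.array([10.0, 11.5, 9.5]),
    0.9: np.array([13.0, 14.0, 12.0]),
}
print(weighted_quantile_loss(y, forecasts, [0.1, 0.5, 0.9]))
```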

How To Use RAG To Crowdsource Event Forecasts

· 31 min read
Morgan Moneywise
CEO at Morgan Moneywise, Inc.

picture of a robot in a room full of monitors

Introduction

As someone who works with vector databases daily, I've become accustomed to the conventional applications of Retrieval-Augmented Generation (RAG) in scenarios such as extracting information from dense user manuals, navigating complex code bases, or conducting in-depth legal research. These "talk to your documents" use cases, while impressive, often revolve around similar challenges across different datasets, which can become somewhat monotonous.

So, it was particularly refreshing when I came across the paper "Approaching Human-Level Forecasting with Language Models" by researchers Danny Halawi, Fred Zhang, Chen Yueh-Han, and Jacob Steinhardt from UC Berkeley. They propose a novel (at least to me) use of RAG: forecasting events!

How to Set Up Your Local SWE-Agent Dev Environment in 5 Minutes (or less!)

· 10 min read
Morgan Moneywise
CEO at Morgan Moneywise, Inc.

picture of a robot riding a blue whale

Introduction

Imagine a tool that can dive into real GitHub repositories to debug and fix issues automatically. That's SWE-agent for you, a project born at Princeton University, where language models like GPT-4 are turned into software engineering agents. These aren't just toys, either; they've been shown to resolve over 12% of the software issues tested in the SWE-bench dataset. While 12% might not initially seem high, it represents a significant leap from the previous best of just 3.79%, underscoring the growing potential of AI to transform software development and maintenance.

My journey into SWE-agent began with curiosity and a bit of a stumble. I wanted to set up a local dev environment to study the model's inference step, but the project doesn't document how to set one up! It's a familiar story in open-source projects, especially those with roots in academia. Reading through the setup instructions in the README, I felt a mix of excitement and frustration as I realized the commitment needed just to get started. And I wasn't the only one feeling this way; a community issue highlighted similar struggles.

Deciding to lean into the challenge, I saw an opportunity to simplify this for everyone. While the official setup process is being refined, I've put together an alternative guide to get you up and running with SWE-agent in a local dev environment using dev containers.

All you need is Docker and VS Code!
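
If you want a preview of the destination, the sketch below shows the shape of a minimal dev container definition. The image tag, feature, and install command are illustrative stand-ins; the guide walks through the actual configuration:

```jsonc
// .devcontainer/devcontainer.json -- illustrative sketch, not the
// guide's actual file.
{
  "name": "swe-agent-dev",
  "image": "mcr.microsoft.com/devcontainers/python:3.11",
  "features": {
    // SWE-agent runs target repos inside Docker, so the container
    // needs its own Docker daemon.
    "ghcr.io/devcontainers/features/docker-in-docker:2": {}
  },
  "postCreateCommand": "pip install -e ."
}
```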

How To Use RAG To Improve Your LLM's Reasoning Skills

· 12 min read
Morgan Moneywise
CEO at Morgan Moneywise, Inc.

picture of gears to represent integration tests

Introduction

Retrieval Augmented Generation (RAG) typically finds its place in enhancing document-based question answering (QnA), effectively leveraging extensive databases to provide contextually relevant information for Large Language Models (LLMs) to formulate precise answers. Traditionally, when looking to boost the reasoning capabilities of LLMs, the go-to strategy has been fine-tuning these models with additional data. However, fine-tuning is not only resource-intensive but also presents scalability challenges.

Interestingly, RAG could potentially offer a more efficient pathway to enhance LLMs' reasoning skills without the hefty costs of fine-tuning. This intriguing premise is explored in depth in Enhancing LLM Intelligence with ARM-RAG: Auxiliary Rationale Memory for Retrieval Augmented Generation by Eric Melz, which proposes a novel use of RAG beyond its conventional application, aiming to refine and expand the problem-solving prowess of LLMs efficiently.
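
The intuition is easy to sketch: cache the rationales that led to correct answers, then retrieve the closest ones as hints for new problems instead of fine-tuning anything. The toy below is my illustration of that idea, not the paper's code; the encoder choice and in-memory store are stand-ins:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")
memory: list[tuple[np.ndarray, str]] = []  # (question embedding, rationale)

def remember(question: str, rationale: str) -> None:
    # Store rationales that produced correct answers, keyed by question.
    memory.append((encoder.encode(question), rationale))

def recall(question: str, k: int = 2) -> list[str]:
    # Retrieve the k most similar past rationales to prepend to the prompt.
    q = encoder.encode(question)
    ranked = sorted(memory, key=lambda m: -float(np.dot(m[0], q)))
    return [rationale for _, rationale in ranked[:k]]

remember("What is 15% of 80?", "15% is 0.15, and 0.15 * 80 = 12.")
print(recall("What is 20% of 50?"))
```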

How to do RAG without Vector Databases

· 13 min read
Morgan Moneywise
CEO at Morgan Moneywise, Inc.

picture of gears to represent integration tests

Introduction

When it comes to bestowing Large Language Models (LLMs) with long-term memory, the prevalent approach involves a Retrieval Augmented Generation (RAG) solution, with vector databases acting as the storage mechanism for that memory. This raises the question: can we achieve the same results without vector databases?

Enter "RecallM: An Adaptable Memory Mechanism with Temporal Understanding for Large Language Models" by Brandon Kynoch, Hugo Latapie, and Dwane van der Sluis. This paper proposes the use of an automatically constructed knowledge graph as the backbone of long-term memory for LLMs.