Context in Engineering is Scarce
This is a journey many take.
You find out things aren't as easy as they seem. From a surface-level analysis, features often appear straightforward to implement, and existing product decisions may seem arbitrary or overly cautious. It's easy to criticize or propose “obvious” improvements without understanding the underlying complexities.
Those underlying complexities are context. The context of engineering decisions, whether at Uber, Google, or any tech company, is the dark matter that decides what gets built.
But context is scarce.
AI Abundance vs Context Scarcity
We live in an age of digital abundance, where vast amounts of information are readily available. AI has exacerbated this “problem.” The natural language interface makes AI vastly superior to search engines for information discovery. With a single prompt, you can find almost any information you want.
But that information has zero depth. If you want detail, nuance, context, then you’re in trouble. AI isn’t designed for depth. The predict-the-next-word construction can only ever stay on a single level. If you are an excellent prompt engineer, you can use AI to probe deeper, but then you are left with a secondary problem: Context is scarce.
Because it is scarce, LLMs will try to infer it from the prompt. Responses rely heavily on the construction of the question. If you are using an LLM to understand code and ask “Is this a good approach,” it will invariably answer in the affirmative. It doesn’t have the context for good vs bad code, so it gives an answer shaped by the question.
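A rough illustration of that framing problem, sketched with the OpenAI Python client (the code snippet and model name are just stand-ins, and whether this code is “good” genuinely depends on context the model doesn’t have):

```python
from openai import OpenAI

client = OpenAI()  # assumes an API key is already configured in the environment

snippet = '''
def load_config():
    # Re-read the config file on every call instead of caching it.
    with open("config.json") as f:
        return json.load(f)
'''

# A leading question invites agreement; without project context,
# the framing largely determines the answer.
leading = f"Here is some code:\n{snippet}\nIs this a good approach?"

# A neutral framing at least asks for trade-offs, but the model is still
# guessing -- it has no idea whether this function sits on a hot path.
neutral = f"Here is some code:\n{snippet}\nWhat trade-offs does this approach carry?"

for prompt in (leading, neutral):
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(reply.choices[0].message.content, "\n---")
```

The same code, asked about two ways, tends to produce two very different verdicts.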
It struggles even more with Why. If you ask why code is a certain way, an LLM will just give you the easiest explanation–it has no concept or context for Why.
Context doesn’t necessarily exist in the training set for an AI model. Consider the problem above. If we ask Claude why Google Search Console is lacking in features, it responds:
I apologize, but I don't have direct knowledge about Google Search Console's current feature set or any specific limitations it may have. As an AI assistant, I don't have real-time access to Google's products or their development plans.
It admits it doesn’t have context. We can push it for what it thinks, but this is that exact predict-the-next-word problem:
Based on general knowledge of SEO tools and common critiques in the industry, I can offer some thoughts on potential limitations that users might perceive:
It doesn't give context; it provides educated guesses or generalizations based on patterns in its training data. This surface-level insight lacks the depth and specificity of proper contextual understanding. While these AI responses might seem plausible or even insightful initially, they often miss the crucial nuances, historical decisions, and complex interdependencies that shape real-world product development.
In essence, AI provides a simulacrum of knowledge–an approximation that may be useful for general discussion or brainstorming but falls short when understanding the intricate realities of specific products or engineering challenges. In fact, as AI aggregates information from various sources, we experience context collapse, with the original context of information lost or distorted. Despite the abundance, we cannot make sense of all this information or know what's truly relevant and important.
The 4 Levels of Context
How can we formalize context? We need to understand where we are now, and where we need to be, then work towards bridging that gap.
Level 0: General knowledge
This is the stuff that is "generally known" and is the base state of LLMs. If you ask an LLM a question without any additional context (like the GSC example above), it will look for an answer entirely based on the general knowledge within its dataset.
The answer is then dependent on the context within the dataset. If the context exists, an LLM can give better insight and answers than if the context weren’t publicly available. If you press the model for more understanding, it will try to give you more insight, but that insight is (likely) confabulation (“hallucination” in AI parlance).
Level 1: Surface-level knowledge
This is where you have some basic knowledge about the topic but nothing deeper. This is the state Shapiro was in prior to PM-ing a similar product. He had surface-level knowledge of Google Search Console to fall back on, but no specific context.
With surface-level knowledge, your context is the output of a process rather than the inputs. If you're only looking at the final product or interface, you might make assumptions or criticisms without understanding the underlying reasons, constraints, or decisions that led to that output.
With LLMs, retrieval-augmented generation (RAG) systems sit at this level. While RAG models can access and present factual data, they still lack the deeper understanding of the context behind that information. They can provide accurate details about a product's features or specifications, but may struggle to explain the rationale behind design choices or the complex interplay of factors that influenced the development process.
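To make the distinction concrete, here is a deliberately toy sketch of Level 1 retrieval (the bag-of-words embedding is a throwaway stand-in, not anything a real system uses): the model only ever sees a few isolated chunks, the outputs of the engineering process, never the decisions behind them.

```python
import numpy as np

# Toy embedding: a bag-of-words hash, just enough to make the sketch runnable.
# In practice you would call a real embedding model here.
def embed(text: str, dim: int = 256) -> np.ndarray:
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    return vec / (np.linalg.norm(vec) or 1.0)

def retrieve(query: str, docs: list[str], k: int = 3) -> list[str]:
    """Level 1 retrieval: rank chunks by similarity and return the top k.

    The model only ever sees these isolated chunks -- the outputs of the
    engineering process -- never the decisions that produced them.
    """
    q = embed(query)
    return sorted(docs, key=lambda d: -float(q @ embed(d)))[:k]

# Hypothetical documentation chunks.
docs = [
    "The crawler retries failed fetches three times with exponential backoff.",
    "Search Console reports are aggregated daily, not in real time.",
    "URL inspection results are cached for 24 hours.",
]
print(retrieve("Why isn't reporting real time?", docs, k=1))
```

The retrieved chunk can answer “what,” but nothing in this pipeline can answer “why.”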
Level 2: Underlying structure
Here, an individual has knowledge of the implementation details. In the context of the Google Search Console, this might be a developer looking over the code. They can see the architecture, the data flow, and the specific algorithms used. This level of knowledge allows for a deeper understanding of why certain features work the way they do or why certain limitations exist.
For LLMs, this level might be achieved through more advanced retrieval methods that not only access surface-level information but also pull in related implementation details, architectural decisions, or system constraints. However, current LLMs don't operate at this level of understanding.
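As a thought experiment (not a description of any existing tool’s internals), Level 2 retrieval might look less like “return the matching chunk” and more like “return the chunk plus its structural neighbourhood”: the module it lives in, the code it calls, the design note that mentions it. The names below are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    """A retrievable unit plus links into the codebase's structure."""
    text: str
    module: str
    calls: list[str] = field(default_factory=list)
    design_notes: list[str] = field(default_factory=list)

def expand(chunk: Chunk, index: dict[str, Chunk]) -> list[Chunk]:
    """Level 2 retrieval: return a hit together with its structural
    neighbourhood, so the reader sees architecture and data flow,
    not just an isolated snippet."""
    return [chunk, *(index[name] for name in chunk.calls if name in index)]

# Hypothetical index keyed by symbol name.
index = {
    "validate_input": Chunk("Rejects malformed emails and URLs over 2048 chars.",
                            "signup/validation.py"),
    "create_user": Chunk("Validates input, hashes the password, writes the user row.",
                         "signup/service.py",
                         calls=["validate_input"],
                         design_notes=["ADR-012: argon2 chosen over bcrypt"]),
}

for c in expand(index["create_user"], index):
    print(f"{c.module}: {c.text}  notes={c.design_notes}")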
Level 3: Decision-level context
This is the deepest level of context and the most difficult to acquire. It encompasses not just how something is built, but why it was built that way. This includes the historical context, the decision-making processes, the trade-offs considered, and the long-term vision that shaped the product. It is the Why.
In the Google Search Console example, this would involve understanding the strategic decisions behind feature prioritization, the rationale for specific UX choices, and insight into future development plans. It's the kind of knowledge typically held by long-time employees or those deeply involved in the product's evolution. Shapiro is now attaining this level at Uber.
For LLMs, achieving this level of context would require not just access to factual information and implementation details, but also to the reasoning and decision-making processes behind them. This would involve training on or having access to all documentation, all code, all comments, all meeting notes–information that is typically not publicly available and often closely guarded by companies.
Enlightening Engineering Dark Matter
How would we start to "deepen" our level of context in codebases? We’d need to understand the invisible fabric that gives meaning and structure to the visible code. It's the underlying rationale, historical decisions, and interconnected knowledge that exists within and around a codebase.
This "dark matter" of engineering can include:
- Architectural decisions: The reasoning behind the overall structure and design patterns used in the codebase.
- Historical context: Why specific approaches were chosen or abandoned over time.
- Business logic: The connection between code and the real-world processes or rules they represent.
- Legacy constraints: Limitations imposed by older systems or ideas that the current code must work around.
But it can also include so much more—the whims of a previous manager, team lore, or lessons from undocumented experiments. All this context is often an unknown unknown for new developers. This is what Paul is up against above. From the outside, it seems many design decisions follow a straight path from obvious idea to finished feature. From the inside, you get the context and unknowns that make the path from idea to execution anything but straight.
This is the tacit and tribal knowledge of the team and its members: tacit knowledge gained by each individual over the years, and tribal knowledge spread between them. It covers the unwritten rules, conventions, and "gotchas" known to experienced team members but never documented.
This is all the dark matter, the context, that is entirely unknown to any new developer on a team. How do we make this explicit and give engineers the deeper meaning of a codebase?
“Education is teaching people context.”
True value lies not in information itself, but in the context that gives information meaning and relevance.
So, we need to cultivate deep context to get to Level 3. Let’s be clear–tooling is only part of the solution here. We’re not quite at the point where any AI tool can extract tacit knowledge from an engineer’s head. But here is what good tooling can do.
First, it can give you deeper layers of context from within the codebase. While most RAG systems operate at Level 1, providing surface-level knowledge, at Agentic Labs, we've built a Level 2 system with our documentation tool, Glide, and are actively working towards a Level 3 system. Glide uses AI to generate detailed breakdowns of code and codebases, providing a crucial step in uncovering the deeper layers of context within the codebase.
The key advantage here is that we make the whole codebase available for consumption. By "losslessly" converting the code into a natural-language description and adding intermediate, higher-level summaries, Glide lets the engineer navigate the context of the entire codebase far more quickly. This approach overcomes two common limitations: the tunnel vision that can occur with Level 1 RAG systems, and the cognitive constraints that limit how much code an individual can decipher and hold in their head at once.
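A stripped-down sketch of that general idea (not Glide's implementation), with a truncating stand-in where a real LLM summarizer would go: summarize every file, roll those up into per-directory summaries, then roll those up into a repo-level overview.

```python
from pathlib import Path

def summarize(text: str) -> str:
    """Stand-in for an LLM call that condenses text; here we just truncate
    so the sketch runs without an API key."""
    return text[:200]

def summarize_repo(root: str) -> dict[str, str]:
    """Bottom-up summaries: each file, then each directory, then the repo.

    The intermediate layers let an engineer move between one function and
    the shape of the whole codebase without reading every line."""
    summaries: dict[str, str] = {}
    files = [p for p in Path(root).rglob("*.py") if p.is_file()]
    for f in files:                          # layer 1: per-file summaries
        summaries[str(f)] = summarize(f.read_text(errors="ignore"))
    dirs = {f.parent for f in files}
    for d in dirs:                           # layer 2: per-directory summaries
        summaries[str(d)] = summarize(
            "\n".join(summaries[str(f)] for f in files if f.parent == d)
        )
    summaries[root] = summarize(             # layer 3: repo-level overview
        "\n".join(summaries[str(d)] for d in dirs)
    )
    return summaries
```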
By operating at Level 2 and pushing towards Level 3, Glide offers a more comprehensive understanding of the codebase's underlying structure and rationale, going beyond the surface-level information provided by conventional tools. This comprehensive understanding is crucial for navigating the complex web of decisions, dependencies, and historical context that shape a codebase–the very "dark matter" that is so often elusive to newcomers.
Glide offers a comprehensive view of the codebase structure and AI-generated descriptions of each component. In one example, it breaks down the process of signing up new users, detailing input validation, password hashing, and user authentication.
This AI analysis goes beyond simple code comments or essential documentation. Glide infers the purpose and functionality of code sections, providing new team members with immediate insights into the “why” behind implementations. By offering explanations of critical files and their roles, Glide helps bridge the gap between seeing the code and understanding its purpose within the larger system.
Second, Glide offers the opportunity to discuss these insights and the code with the AI. In this case, because the AI can access the entire codebase, it can offer the deeper understanding that general LLMs miss. Here, predict-the-next-word can draw on everything the model knows about the codebase.
This allows for a more dynamic and interactive exploration of the codebase's context. The AI doesn't just regurgitate general information; it provides relevant options tailored to the project's goals and existing infrastructure. Glide “cross-cuts” answers, connecting the dots across related sections of the codebase to provide context that illuminates how different parts interact and affect each other. This comprehensive insight helps engineers understand complex systems and make informed decisions about code or architectural changes.
Third, while this AI-assisted analysis is a powerful starting point, it's essential to recognize that it can't capture all aspects of the “dark matter.” To truly cultivate deep context, Glide acts as a collaboration tool, allowing experienced team members to review and enhance these AI insights, adding crucial information about:
- The rationale behind architectural decisions
- Historical context of why specific approaches were chosen
- Known limitations or quirks of the system
- Future plans or ongoing discussions about potential changes
This isn't just AI-generated context. It is AI-assisted context. The context comes from the code and the team; the Glide AI brings it to the surface for any team member, new or old.
Glide's potential extends beyond current codebase analysis. By leveraging git history, Glide can provide insights into the evolution of the code over time, offering a deeper understanding of why certain implementations ended up the way they did. This historical perspective adds another layer of context, illuminating the path from initial design to current implementation.
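Outside of any particular tool, the raw material for that history is already sitting in git. A small sketch of the idea: collect the commit messages that touched a file and hand them (along with the relevant diffs) to a summarizer. The path below is hypothetical.

```python
import subprocess

def commit_messages(path: str, limit: int = 50) -> list[str]:
    """Return the most recent commit messages that touched `path`.

    These messages, and the diffs behind them, are raw material for
    explaining why a piece of code ended up the way it did."""
    out = subprocess.run(
        ["git", "log", f"-{limit}", "--pretty=format:%h %ad %s",
         "--date=short", "--", path],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()

history = commit_messages("signup/service.py")  # hypothetical path
print("\n".join(history[:5]))
# A follow-up step would feed these messages, plus the relevant diffs,
# to an LLM and ask it to narrate how the implementation evolved.
```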
As teams integrate Glide more deeply into their design and development processes, the system's ability to provide context compounds. When used consistently for design creation and documentation, Glide accumulates a rich repository of context on design decisions. This accumulated knowledge builds Glide's ability to make informed suggestions about future improvements, effectively reaching Level 3 context–decision-level understanding.
Context remains scarce, but it always exists. If you are interested in using Glide to find and cultivate context in your code and organization, contact us for a demo.