Chapter 6: AI Brain (LLM Communication)#
Welcome back! In Chapter 5: Code Analysis (AI Understanding), we saw how our project uses AI to analyze the fetched code, identifying core concepts ("Abstractions") and how they relate to each other. We talked about the AI acting like a smart code analyst.
But how exactly does our project talk to this AI? How does it send the code and questions, and how does it get the answers back?
This is where the concept of the AI Brain (LLM Communication) comes in. It's the vital link between our project's code and the powerful capabilities of a Large Language Model (LLM), the "AI Brain" itself.
What is LLM Communication?#
Think of our project needing to consult an expert who lives far away. The project can't just think the question and get the answer. It needs a way to communicate.
LLM Communication is like:
- Writing a Letter: Preparing a clear and specific question or instruction for the AI (this is called creating a "Prompt").
- Sending the Letter: Sending that prompt over the internet to the LLM's service (via an API, or Application Programming Interface).
- Receiving the Reply: Getting the AI's answer back through the same connection.
This communication is the core utility used whenever the project needs the AI to "think", whether it's understanding code, deciding on chapter order, or generating tutorial content.
The Problem: Talking to an External Expert#
Talking to an external service like an LLM isn't as simple as calling a function within our own code. We face several challenges:
- External Service: The LLM runs on someone else's computers (like Google's or Anthropic's). We communicate over the network.
- Specific Format: LLMs expect questions in a particular text format (the prompt). Getting the prompt right is crucial for good answers.
- Reliability: Network connections can fail, and LLM services might sometimes be slow or return errors. We need to handle these issues.
- Authentication: We usually need a special key (an API Key) to prove we are allowed to use the service and to track our usage. This key must be kept secret.
- Cost: Using LLMs costs money, usually based on how much text you send and receive. We might want to log usage or avoid repeated calls for the same question.
- Flexibility: We might want the option to switch between different LLM providers (like Google Gemini, Anthropic Claude, etc.) without rewriting all the code that uses the AI.
The Solution: A Dedicated Communication Utility#
To handle these complexities in one place, our project has a dedicated utility function, `call_llm`. This function acts as our project's single point of contact for the AI Brain.
Its job is straightforward:
- Take a `prompt` (the question for the AI) as input.
- Handle all the messy details of sending the prompt to the LLM API.
- Receive the AI's raw text response.
- Return the response text.
This utility hides the complexity of the LLM API interaction from the rest of the code, making the project cleaner and easier to manage.
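For the rest of the codebase, using it looks roughly like this. This is a minimal sketch, not the project's actual calling code; the import path and prompt are assumptions for illustration:

```python
# Minimal usage sketch (illustrative; not the project's actual calling code).
from utils.call_llm import call_llm  # import path assumed from the file location below

code_snippet = "def add(a, b):\n    return a + b"
prompt = f"Explain in one sentence what this function does:\n\n{code_snippet}"

answer = call_llm(prompt)  # every API detail is hidden behind this single call
print(answer)              # the AI's raw text reply
```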
How `call_llm` Works (Under the Hood)#
The `call_llm` utility is located in the file `function_app/utils/call_llm.py`.
Let's look at the steps it takes when another part of the code (like one of the analysis nodes) calls it:
- Receive Prompt: It gets the `prompt` string from the calling node.
- Logging: It logs the outgoing prompt to a file. This is helpful for debugging and for seeing exactly what questions were asked of the AI.
- Caching (Optional): It checks whether this exact prompt has been sent before and whether the response is saved in a local cache file (`llm_cache.json`). If a cached response exists and caching is enabled, it immediately returns the saved response, skipping the actual API call. This saves time and cost (see the sketch after this list).
- Get API Key: It retrieves the necessary API key from the project's environment variables (like `GEMINI_API_KEY`). Keeping keys in environment variables is a standard security practice.
- Connect to LLM Service: It uses a specific software library (like `google-generativeai` for Gemini) to connect to the chosen LLM provider's service.
- Send Prompt: It sends the prepared prompt to the LLM API.
- Wait for Response: It waits for the LLM service to process the prompt and send back a response.
- Receive Response: It gets the raw text response from the AI.
- Logging: It logs the incoming response from the AI.
- Caching (Optional): If caching is enabled, it saves the prompt and the received response in the local cache file for future use.
- Return Response: It returns the raw text response back to the part of the code that called `call_llm`.
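To make the caching step concrete, here is a small sketch of calling the utility twice with the same prompt. It assumes a valid `GEMINI_API_KEY` is configured and that the import path matches the file location above; it isn't part of the project itself:

```python
# Illustrative caching check (assumptions: import path and a configured API key).
import time
from utils.call_llm import call_llm  # path assumed from function_app/utils/call_llm.py

prompt = "In one sentence, what is a Large Language Model?"

t0 = time.time()
first = call_llm(prompt)   # real API call: logged, then saved to llm_cache.json
t1 = time.time()
second = call_llm(prompt)  # identical prompt: served straight from the cache
t2 = time.time()

print(f"API call:    {t1 - t0:.2f}s")
print(f"Cached call: {t2 - t1:.2f}s")  # typically near-instant
assert first == second                 # the cached response is the same text
```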
```mermaid
sequenceDiagram
    participant CallingNode as e.g., IdentifyAbstractions Node
    participant LLM_Util as call_llm Utility
    participant LogFile as LLM Log File
    participant CacheFile as llm_cache.json
    participant LLM_API as LLM Provider API (e.g., Gemini API)

    CallingNode->>LLM_Util: call_llm(prompt_text)
    LLM_Util->>LogFile: Log PROMPT
    alt if use_cache is True
        LLM_Util->>CacheFile: Check for prompt
        alt if prompt is in cache
            CacheFile-->>LLM_Util: Cached response
            LLM_Util->>LogFile: Log RESPONSE (from cache)
            LLM_Util-->>CallingNode: Return cached response
            Note right of CallingNode: API call skipped!
        else if prompt is NOT in cache
            LLM_Util->>LLM_Util: Get API Key (from env)
            LLM_Util->>LLM_API: Send Prompt
            LLM_API-->>LLM_Util: Return Response
            LLM_Util->>LogFile: Log RESPONSE (from API)
            LLM_Util->>CacheFile: Save prompt & response
            LLM_Util-->>CallingNode: Return API response
        end
    else if use_cache is False
        LLM_Util->>LLM_Util: Get API Key (from env)
        LLM_Util->>LLM_API: Send Prompt
        LLM_API-->>LLM_Util: Return Response
        LLM_Util->>LogFile: Log RESPONSE (from API)
        LLM_Util-->>CallingNode: Return API response
    end
```
This diagram shows how the `call_llm` utility acts as the intermediary, handling logging, caching, and the actual API interaction between the calling code (like a Pocket Flow Node) and the external LLM service.
Looking at the Code (`utils/call_llm.py`)#
Let's examine the core structure of the `call_llm` utility function. The full file includes setup for logging and caching, but the core interaction with the LLM is quite focused.
```python
# function_app/utils/call_llm.py (Simplified)
from google import genai  # Library to interact with Google Gemini
import os
import logging  # For logging calls
import json     # For cache file
# ... other imports for logging/datetime ...

# ... logging and cache setup code ...

# By default, we use Google Gemini
def call_llm(prompt: str, use_cache: bool = True) -> str:
    logger.info(f"PROMPT: {prompt}")  # Log the incoming prompt

    # --- Cache Check (Simplified) ---
    if use_cache and os.path.exists(cache_file):
        try:
            with open(cache_file, 'r') as f:
                cache = json.load(f)
            if prompt in cache:
                logger.info("RESPONSE: (from cache)")
                return cache[prompt]  # Return from cache if found
        except Exception:
            pass  # Handle potential cache file errors

    # --- LLM API Call ---
    client = genai.Client(  # Create a client object
        api_key=os.getenv("GEMINI_API_KEY", "your-api-key"),  # Get API key securely
    )
    model = os.getenv("GEMINI_MODEL", "gemini-2.5-pro-exp-03-25")  # Get model name (configurable)
    try:
        response = client.models.generate_content(  # Send the prompt to the model
            model=model,
            contents=[prompt]  # The prompt goes here
        )
        response_text = response.text  # Get the text response back
    except Exception as e:
        logger.error(f"LLM API Error: {e}")
        # Re-raise the exception so the calling Node can handle retries (Pocket Flow feature)
        raise e

    # --- Cache Save (Simplified) ---
    if use_cache:
        try:
            cache = {}  # Reload cache to avoid overwriting concurrent writes
            if os.path.exists(cache_file):
                with open(cache_file, 'r') as f:
                    cache = json.load(f)
            cache[prompt] = response_text  # Add the new response
            with open(cache_file, 'w') as f:
                json.dump(cache, f)  # Save the updated cache
        except Exception as e:
            logger.error(f"Failed to save cache: {e}")

    logger.info(f"RESPONSE: {response_text}")  # Log the actual response
    return response_text  # Return the response

# --- Commented out examples for other LLMs ---
# def call_llm_anthropic(...): ...
# def call_llm_openai(...): ...
# etc.
```
This snippet shows:
- It uses the `google.genai` library to interact with Gemini.
- It retrieves the `GEMINI_API_KEY` and `GEMINI_MODEL` from environment variables using `os.getenv()`.
- `client.models.generate_content` is the core line that sends the prompt and gets the response.
- Basic caching logic loads from and saves to `llm_cache.json`.
- Logging records both prompts and responses.
- Crucially, it re-raises any exceptions from the API call. This allows the Pocket Flow Node that called `call_llm` (like `IdentifyAbstractions`) to use its built-in retry logic (`max_retries`, `wait`) to handle temporary network issues or API errors (see the sketch after this list).
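To see why re-raising matters, here is a hedged sketch of that retry pattern. It assumes Pocket Flow's `Node` base class (which re-runs `exec` when it raises), and the node shown is hypothetical, not one of the project's real nodes:

```python
# Hypothetical node, for illustration only (assumes the pocketflow package's Node class,
# which re-runs exec() when it raises, up to max_retries times with `wait` seconds between tries).
from pocketflow import Node
from utils.call_llm import call_llm  # import path assumed

class SummarizeCode(Node):
    def exec(self, prep_res):
        prompt = f"Summarize this code:\n\n{prep_res}"
        # If call_llm raises (network hiccup, API error), the exception propagates
        # to the framework, which retries exec() instead of crashing the whole flow.
        return call_llm(prompt)

node = SummarizeCode(max_retries=3, wait=10)  # retry settings live on the node, not in call_llm
```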
You'll notice commented-out functions for other LLM providers (Anthropic, OpenAI). While not active by default, their presence shows how this utility is designed to potentially support switching LLM backends in the future without affecting the nodes that call it.
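As a taste of that flexibility, a drop-in variant for another provider could look roughly like the sketch below. This is not the project's commented-out code; it just assumes the official `openai` package and an `OPENAI_API_KEY` environment variable:

```python
# Hypothetical OpenAI-backed variant with the same signature as call_llm
# (assumes the official `openai` package and an OPENAI_API_KEY environment variable).
import os
from openai import OpenAI

def call_llm_openai(prompt: str) -> str:
    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))    # key from the environment
    response = client.chat.completions.create(
        model=os.getenv("OPENAI_MODEL", "gpt-4o"),          # model name kept configurable
        messages=[{"role": "user", "content": prompt}],     # the prompt goes here
    )
    return response.choices[0].message.content              # raw text back to the caller
```

Because the signature matches `call_llm`, the nodes that call it would not need to change; only this utility file and its configuration would.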
How Nodes Use `call_llm`#
As we saw in Chapter 5: Code Analysis (AI Understanding), nodes like `IdentifyAbstractions` and `AnalyzeRelationships` don't worry about how to call the AI. They simply prepare the necessary information (the code context, the instructions) in their `prep` methods, formulate the complete `prompt` string within their `exec` methods, and then call `call_llm(prompt)`.
The `call_llm` function handles the communication, basic error handling, logging, and caching. The node then receives the raw text output from `call_llm` in its `exec` method and is responsible for parsing and validating that output (as we saw with the YAML parsing in Chapter 5).
This separation of concerns (nodes handle data preparation, prompting, and response processing/validation; `call_llm` handles the actual API communication) makes the system modular and easier to maintain, as the sketch below illustrates.
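A rough sketch of that division of labour, with a hypothetical helper standing in for a node's `exec` logic (the prompt and YAML format are illustrative, not the project's exact ones):

```python
# Sketch of the division of labour (hypothetical helper; illustrative prompt and format).
import yaml                          # assumed dependency: PyYAML, as used for parsing in Chapter 5
from utils.call_llm import call_llm  # import path assumed

def identify_concepts(code_context: str) -> list:
    # 1. The node builds the full prompt (the data was already gathered in its prep step).
    prompt = (
        "List the core concepts in this code as a YAML list of names.\n\n"
        f"{code_context}\n\nOutput only YAML."
    )
    # 2. call_llm handles all of the API communication, logging, and caching.
    raw_response = call_llm(prompt)
    # 3. The node parses and validates the raw text it gets back.
    concepts = yaml.safe_load(raw_response)
    if not isinstance(concepts, list):
        raise ValueError("Expected a YAML list of concept names")
    return concepts
```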
Benefits of a Dedicated LLM Communication Utility#
| Feature | Benefit | How `call_llm` achieves it |
|---|---|---|
| Abstraction | Hides the complexity of specific LLM APIs. | Provides a single, simple function interface. |
| Centralization | Manages API keys, logging, and caching in one place. | All handled within the `call_llm` function. |
| Flexibility | Allows swapping LLM providers more easily. | Code for different providers can live here; configuration determines which is used. |
| Observability | Provides a record of all AI interactions. | Logs prompts and responses. |
| Cost/Speed Optimization | Avoids repeated calls for the same input. | Simple cache implementation. |
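For context on the caching row: the cache file is just a flat JSON object mapping each exact prompt string to the response text it produced, so inspecting it is trivial (a small illustrative sketch; the example contents are made up):

```python
# Illustrative peek at llm_cache.json (the contents shown in the comment are made up).
import json

with open("llm_cache.json") as f:
    cache = json.load(f)            # a flat {prompt string: response text} mapping

print(f"{len(cache)} cached prompts")
first_prompt, first_response = next(iter(cache.items()))
print(first_prompt[:60], "->", first_response[:60])
# e.g. "Identify the core abstractions in ..." -> "- name: Flow ..."
```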
Conclusion#
The `call_llm` utility is the project's essential bridge to the AI Brain (the LLM). By encapsulating all the details of API communication, logging, and caching in one place, it allows the rest of the project, especially the Pocket Flow Nodes responsible for analysis and generation, to interact with the AI through a simple, reliable function call. This separation makes the system more robust, easier to develop, and potentially adaptable to different AI models in the future.
Now that we've covered how we talk to the AI and how it analyzes the code, we're ready for the next big step: using this analysis to generate the actual tutorial content.
Next Chapter: Tutorial Content Generation