
# Chapter 6: AI Brain (LLM Communication)

Welcome back! In Chapter 5: Code Analysis (AI Understanding), we saw how our project uses AI to analyze the fetched code, identifying core concepts ("Abstractions") and how they relate to each other. We talked about the AI acting like a smart code analyst.

But how exactly does our project talk to this AI? How does it send the code and questions, and how does it get the answers back?

This is where the concept of the AI Brain (LLM Communication) comes in. It's the vital link between our project's code and the powerful capabilities of a Large Language Model (LLM) – the "AI Brain" itself.

## What is LLM Communication?

Think of our project needing to consult an expert who lives far away. The project can't just think the question and get the answer. It needs a way to communicate.

LLM Communication is like:

  1. Writing a Letter: Preparing a clear and specific question or instruction for the AI (this is called creating a "Prompt").
  2. Sending the Letter: Sending that prompt over the internet to the LLM's service (via an API, or Application Programming Interface).
  3. Receiving the Reply: Getting the AI's answer back through the same connection.

This communication is the core utility used whenever the project needs the AI to "think" – whether it's understanding code, deciding on chapter order, or generating tutorial content.

## The Problem: Talking to an External Expert

Talking to an external service like an LLM isn't as simple as calling a function within our own code. We face several challenges:

  • External Service: The LLM runs on someone else's computers (like Google's or Anthropic's). We communicate over the network.
  • Specific Format: LLMs expect questions in a particular text format (the prompt). Getting the prompt right is crucial for good answers.
  • Reliability: Network connections can fail, and LLM services might sometimes be slow or return errors. We need to handle these issues.
  • Authentication: We usually need a special key (an API Key) to prove we are allowed to use the service and to track our usage. This key must be kept secret.
  • Cost: Using LLMs costs money, usually based on how much text you send and receive. We might want to log usage or avoid repeated calls for the same question.
  • Flexibility: We might want the option to switch between different LLM providers (like Google Gemini, Anthropic Claude, etc.) without rewriting all the code that uses the AI.

## The Solution: A Dedicated Communication Utility

To handle these complexities in one place, our project has a dedicated utility function, call_llm. This function acts as the project's single point of contact with the AI Brain.

Its job is straightforward:

  • Take a prompt (the question for the AI) as input.
  • Handle all the messy details of sending the prompt to the LLM API.
  • Receive the AI's raw text response.
  • Return the response text.

This utility hides the complexity of the LLM API interaction from the rest of the code, making the project cleaner and easier to manage.
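
In practice, any other part of the project can consult the AI with a single call. Here is a minimal usage sketch; the prompt text and the exact import path are illustrative assumptions:

```python
# Illustrative usage sketch -- prompt text and import path are assumptions.
from utils.call_llm import call_llm  # real path may be function_app.utils.call_llm

prompt = "In one sentence, what does a 'Node' do in Pocket Flow?"
answer = call_llm(prompt)  # handles the API key, logging, caching, and the network call
print(answer)              # raw text response from the LLM
```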

## How call_llm Works (Under the Hood)

The call_llm utility is located in the file function_app/utils/call_llm.py.

Let's look at the steps it takes when another part of the code (like one of the analysis nodes) calls it:

  1. Receive Prompt: It gets the prompt string from the calling node.
  2. Logging: It logs the outgoing prompt to a file. This is helpful for debugging and seeing exactly what questions were asked to the AI.
  3. Caching (Optional): It checks if this exact prompt has been sent before and if the response is saved in a local cache file (llm_cache.json). If a cached response exists and caching is enabled, it immediately returns the saved response, skipping the actual API call. This saves time and cost.
  4. Get API Key: It retrieves the necessary API key from the project's environment variables (like GEMINI_API_KEY). Keeping keys in environment variables is a standard security practice.
  5. Connect to LLM Service: It uses a specific software library (like google-generativeai for Gemini) to connect to the chosen LLM provider's service.
  6. Send Prompt: It sends the prepared prompt to the LLM API.
  7. Wait for Response: It waits for the LLM service to process the prompt and send back a response.
  8. Receive Response: It gets the raw text response from the AI.
  9. Logging: It logs the incoming response from the AI.
  10. Caching (Optional): If caching is enabled, it saves the prompt and the received response in the local cache file for future use.
  11. Return Response: It returns the raw text response back to the part of the code that called call_llm.

```mermaid
sequenceDiagram
    participant CallingNode as e.g., IdentifyAbstractions Node
    participant LLM_Util as call_llm Utility
    participant LogFile as LLM Log File
    participant CacheFile as llm_cache.json
    participant LLM_API as LLM Provider API (e.g., Gemini API)

    CallingNode->>LLM_Util: call_llm(prompt_text)
    LLM_Util->>LogFile: Log PROMPT
    alt if use_cache is True
        LLM_Util->>CacheFile: Check for prompt
        alt if prompt is in cache
            CacheFile-->>LLM_Util: Cached response
            LLM_Util->>LogFile: Log RESPONSE (from cache)
            LLM_Util-->>CallingNode: Return cached response
            Note right of CallingNode: API call skipped!
        else if prompt is NOT in cache
            LLM_Util->>LLM_Util: Get API Key (from env)
            LLM_Util->>LLM_API: Send Prompt
            LLM_API-->>LLM_Util: Return Response
            LLM_Util->>LogFile: Log RESPONSE (from API)
            LLM_Util->>CacheFile: Save prompt & response
            LLM_Util-->>CallingNode: Return API response
        end
    else if use_cache is False
        LLM_Util->>LLM_Util: Get API Key (from env)
        LLM_Util->>LLM_API: Send Prompt
        LLM_API-->>LLM_Util: Return Response
        LLM_Util->>LogFile: Log RESPONSE (from API)
        LLM_Util-->>CallingNode: Return API response
    end
```

This diagram shows how the call_llm utility acts as the intermediary, handling logging, caching, and the actual API interaction between the calling code (like a Pocket Flow Node) and the external LLM service.

## Looking at the Code (utils/call_llm.py)

Let's examine the core structure of the call_llm utility function. The full file includes setup for logging and caching, but the core interaction with the LLM is quite focused.

```python
# function_app/utils/call_llm.py (Simplified)
from google import genai # Library to interact with Google Gemini
import os
import logging # For logging calls
import json # For cache file
# ... other imports for logging/datetime ...

# ... logging and cache setup code ...

# By default, we use Google Gemini
def call_llm(prompt: str, use_cache: bool = True) -> str:
    logger.info(f"PROMPT: {prompt}") # Log the incoming prompt

    # --- Cache Check (Simplified) ---
    if use_cache and os.path.exists(cache_file):
        try:
            with open(cache_file, 'r') as f:
                cache = json.load(f)
                if prompt in cache:
                    logger.info(f"RESPONSE: (from cache)")
                    return cache[prompt] # Return from cache if found
        except:
            pass # Handle potential cache file errors

    # --- LLM API Call ---
    client = genai.Client( # Create a client object
        api_key=os.getenv("GEMINI_API_KEY", "your-api-key"), # Get API key securely
    )
    model = os.getenv("GEMINI_MODEL", "gemini-2.5-pro-exp-03-25") # Get model name (configurable)

    try:
        response = client.models.generate_content( # Send the prompt to the model
            model=model,
            contents=[prompt] # The prompt goes here
        )
        response_text = response.text # Get the text response back

    except Exception as e:
        logger.error(f"LLM API Error: {e}")
        # Re-raise the exception so the calling Node can handle retries (Pocket Flow feature)
        raise e

    # --- Cache Save (Simplified) ---
    if use_cache:
        try:
            cache = {} # Reload cache to avoid overwriting concurrent writes
            if os.path.exists(cache_file):
                with open(cache_file, 'r') as f:
                    cache = json.load(f)
            cache[prompt] = response_text # Add the new response
            with open(cache_file, 'w') as f:
                json.dump(cache, f) # Save the updated cache
        except Exception as e:
            logger.error(f"Failed to save cache: {e}")

    logger.info(f"RESPONSE: {response_text}") # Log the actual response
    return response_text # Return the response

# --- Commented out examples for other LLMs ---
# def call_llm_anthropic(...): ...
# def call_llm_openai(...): ...
# etc.
```

This snippet shows:

  • It uses the google.genai library to interact with Gemini.
  • It retrieves the GEMINI_API_KEY and GEMINI_MODEL from environment variables using os.getenv().
  • client.models.generate_content is the core line that sends the prompt and gets the response.
  • Basic caching logic loads from and saves to llm_cache.json.
  • Logging records both prompts and responses.
  • Crucially, it re-raises any exceptions from the API call. This allows the Pocket Flow Node that called call_llm (like IdentifyAbstractions) to use its built-in retry logic (max_retries, wait) to handle temporary network issues or API errors.
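
The snippet refers to a logger and a cache_file that are configured near the top of the file (the part elided as "... logging and cache setup code ..."). As a rough illustration only, such setup might look like the sketch below; the log directory, file name, and format are assumptions, not the project's exact code:

```python
# Hypothetical setup sketch -- log directory, file name, and format are assumptions.
import logging
import os

LOG_DIR = os.getenv("LOG_DIR", "logs")            # assumed location for LLM call logs
os.makedirs(LOG_DIR, exist_ok=True)

logger = logging.getLogger("llm_logger")
logger.setLevel(logging.INFO)
handler = logging.FileHandler(os.path.join(LOG_DIR, "llm_calls.log"))
handler.setFormatter(logging.Formatter("%(asctime)s - %(levelname)s - %(message)s"))
logger.addHandler(handler)

cache_file = "llm_cache.json"                     # cache file name used by the snippet above
```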

You'll notice commented-out functions for other LLM providers (Anthropic, OpenAI). While not active by default, their presence shows that this utility is designed so the LLM backend could be swapped in the future without affecting the nodes that call it.
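
For example, a Claude-backed variant might look roughly like the sketch below, which uses the anthropic Python SDK. The environment variable names, default model, and token limit are assumptions; the actual commented-out code in the repository may differ:

```python
# Hypothetical alternative backend -- not the project's actual code.
import os
import anthropic

def call_llm_anthropic(prompt: str) -> str:
    client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))  # assumed env var
    message = client.messages.create(
        model=os.getenv("ANTHROPIC_MODEL", "claude-3-5-sonnet-latest"),   # assumed default model
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text  # same contract as call_llm: prompt in, raw text out
```

Because every such function keeps the same "prompt in, text out" contract, the nodes calling call_llm never need to change when the backend does.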

## How Nodes Use call_llm

As we saw in Chapter 5: Code Analysis (AI Understanding), the nodes like IdentifyAbstractions and AnalyzeRelationships don't worry about how to call the AI. They simply prepare the necessary information (the code context, the instructions) in their prep methods, formulate the complete prompt string within their exec methods, and then call call_llm(prompt).

The call_llm function handles the communication, error handling (basic), logging, and caching. The node then receives the raw text output from call_llm in its exec method and is responsible for parsing and validating that output (as we saw with the YAML parsing in Chapter 5).
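
To make that division of labor concrete, here is a simplified sketch of the shape an exec method might take. The class name, prompt wording, and YAML handling are illustrative stand-ins, not the exact code from nodes.py:

```python
import yaml  # PyYAML, used to parse the structured response (illustrative)
from utils.call_llm import call_llm  # import path may differ in the real project

class IdentifyAbstractionsSketch:  # simplified stand-in for the real node
    def exec(self, prep_res):
        context = prep_res  # code context assembled earlier in prep()

        # 1. The node formulates the complete prompt (prompting lives here, not in call_llm).
        prompt = (
            "Analyze the following codebase and list 5-10 core abstractions.\n"
            "Return the result as a YAML list of names with short descriptions.\n\n"
            f"{context}"
        )

        # 2. All communication details are delegated to the utility.
        response = call_llm(prompt)

        # 3. The node parses and validates the raw text response.
        abstractions = yaml.safe_load(response)
        assert isinstance(abstractions, list), "Expected a YAML list of abstractions"
        return abstractions
```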

This separation of concerns (nodes handle data preparation, prompting, and response processing/validation; call_llm handles the actual API communication) makes the system modular and easier to maintain.

## Benefits of a Dedicated LLM Communication Utility

| Feature | Benefit | How call_llm achieves it |
| --- | --- | --- |
| Abstraction | Hides the complexity of specific LLM APIs. | Provides a single, simple function interface. |
| Centralization | Manages API keys, logging, and caching in one place. | All handled within the call_llm function. |
| Flexibility | Allows swapping LLM providers more easily. | Code for different providers can live here; configuration determines which is used. |
| Observability | Provides a record of all AI interactions. | Logging prompts and responses. |
| Cost/Speed Optimization | Avoids repeated calls for the same input. | Simple cache implementation. |
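
The caching row is easy to see in practice. An illustrative sketch, assuming caching is enabled (the default) and the prompt text is made up:

```python
# Illustrative only -- prompt text is an assumption.
from utils.call_llm import call_llm

prompt = "Explain what the IdentifyAbstractions node does."

first = call_llm(prompt)                   # calls the LLM API, logs, and writes llm_cache.json
second = call_llm(prompt)                  # same prompt: answered from llm_cache.json, no API cost
fresh = call_llm(prompt, use_cache=False)  # bypasses the cache when a fresh answer is needed
```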

## Conclusion

The call_llm utility is the project's essential bridge to the AI Brain (the LLM). By encapsulating all the details of API communication, logging, and caching in one place, it allows the rest of the project – especially the Pocket Flow Nodes responsible for analysis and generation – to interact with the AI using a simple, reliable function call. This separation makes the system more robust, easier to develop, and potentially adaptable to different AI models in the future.

Now that we've covered how we talk to the AI and how it analyzes the code, we're ready for the next big step: using this analysis to generate the actual tutorial content.

Next Chapter: Tutorial Content Generation

