No-Code LLM Agent and RAG Solution at Google Cloud Next 2024
Apr 13, 2024
(On the photo: friends we made at the Startups at the Skyfall pre-conference party including Anna Nerezova from GDG NYC)
In the previous Google Cloud Next 2024 post I mentioned that there were two hackathons at the conference.
For the AI Hackathon, it turned out that the teams would need to leverage the new Agent Builder offerings. For context, I need to go back at least to May 2023, to Google I/O Connect in Miami. I was looking around for LLM solutions for a domain-specific, knowledge-based (a.k.a. RAG) chat agent, and when I talked with Christopher Overholt he drew my attention to the DialogFlow ecosystem besides Vertex AI. You don't necessarily have to go to the Vertex AI Model Garden and fine-tune a model, or build a RAG on Search or other vector databases. At the time I exported the HTML-based hierarchical help knowledge base from ThruThink, but I experienced server errors with the document chunking / parsing / indexing.
DialogFlow has a much longer history. I have been working on DialogFlow apps for a long time now, helping GDG Fresno ex-secretary Mark Simonian port and enhance his applications. Mark is a pediatrician, and he was the original expert who developed an automated telephone robot protocol to help parents determine Motrin (or Tylenol or Advil) dosage for very young babies. The traditional DialogFlow operated with explicit Intents and Entities, and for applications like the dosage helper, function calling (called fulfillment in DialogFlow lingo) can be very helpful. Mark's original agent had a very deep tree structure handling every possible integer baby weight choice, whereas the fulfillment-converted version can handle any arbitrary (even fractional) weight, and the agent structure is much simpler. However, the fulfillment function requires programming knowledge; it is a special Cloud Function I wrote in JavaScript.
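To illustrate what such a fulfillment does, here is a simplified Python sketch (the real one is a JavaScript Cloud Function; the parameter name is hypothetical and the mg-per-kg constant is a placeholder, not medical advice):

```python
def dosage_fulfillment(request_json):
    """Minimal Dialogflow-style webhook handler sketch.

    Reads the weight from the intent parameters and computes a dose,
    so any fractional weight works without a branch per integer weight.
    """
    MG_PER_KG = 10.0  # placeholder constant only, NOT a clinical value
    params = request_json["queryResult"]["parameters"]
    weight_kg = float(params["weight_kg"])  # hypothetical parameter name
    dose_mg = round(weight_kg * MG_PER_KG, 1)
    # Dialogflow reads fulfillmentText back to the caller
    return {"fulfillmentText": f"The suggested dose is {dose_mg} mg."}
```

Because the dose is computed rather than enumerated, the deep per-weight tree of the original agent collapses into a single webhook call.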
The classic DialogFlow was also capable of employing NLP (Natural Language Processing) AI techniques: we only had to specify a few example conversations, and it was able to understand other possible variations of a question. This was before the GPT era. Since the transformer generative AI boom, DialogFlow has been enhanced in parallel with Vertex AI.
It had a set of offerings such as “Enterprise Search Engine” and “Generative AI App Builder”. These already refactored DialogFlow to utilize large language models and greatly simplified the configuration process to the no-code level. However, there's a newer generation of LLM and generative AI search-based offerings under the umbrella name of Agent Builder. It provides a no-code way to define:
- Very complex agents, by simply describing the agent behavior with so-called playbooks. The agents can be hierarchical, and the flow can invoke one agent playbook from another.
- The agent(s) can also have DataStores:
- Indexed knowledge bases for RAG
- Function calls
- API calls
- These DataStores can be referenced in the playbooks with a specific markup, but the whole process is very natural and completely no-code.
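For a rough idea of what a playbook looks like, here is a made-up fragment in the spirit of the playbook instruction markup (the tool and playbook names are invented, and the exact syntax may have changed between product versions):

```
Goal:
  Help users explore the UN Sustainable Development Goals.

Instructions:
  - Greet the user and ask which goal or topic they are interested in.
  - If the user asks a factual question, answer it using ${TOOL: sdg-reports}.
  - For a deep dive on a single goal, hand off to ${PLAYBOOK: goal-details}.
```

The `${TOOL: ...}` references are what tie the natural-language instructions back to the indexed DataStores and function calls.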
The engineer still has knobs to tune certain parameters, such as the chunk size during indexing, the RAG retrieval sensitivity, the LLM model used, and so on. After the hackathon was announced we were provided with some training videos (recorded mostly by Pak-Ming Wan), and I studied them at the end of the day at my hotel.
The domain
Since I was participating in the ongoing sustainability session series by GDG Tucson (Dan Stormont), and I had recently learned about the SDGs (Sustainable Development Goals), I decided to download the various yearly reports from the United Nations website and index them for Retrieval Augmented Generation. The resulting agent would help answer questions and chat about the SDGs. There are 17 SDGs, and even though I wish I had the time to go over every one of the yearly reports myself, anyone can try and see how much fun it is to chat with a domain-expert agent.
How I built it
I spent a lot of time curating indexable data for the Sustainable Development Goals. I downloaded yearly reports, extended reports, and gender snapshots, and massaged data sheets to help build the data store. I used the Agent Builder interface to develop a playbook that leverages the established data stores. I experimented with various indexing techniques, thresholds, and parameters for data indexing and retrieval. I also went through many versions of the instruction prompts to cover the intended use cases of the agent.
Challenges I ran into
- Data gathering and preparation are still extremely important and remain a foundational step even for no-code frameworks. Sometimes I needed to convert files to supported data formats.
- PDF documents vary widely and can be tricky to chunk and index, especially if the PDF contains figures and charts: OCR parsing might work better than the default or layout parsing, but OCR didn't perform well for me.
- I also had to disable grounding. Grounding has five confidence-level thresholds, and even when I allowed very-low-confidence retrievals, the retrieval engine sometimes still could not perform a RAG lookup for me, so I resorted to disabling that filtering completely.
- It’s a testament to the breakneck pace at which Google develops under the hood that in Pak’s tutorial videos the product had a different name than “Agent Builder”, and Playbooks also went by a different name. Talking with Pak (he was there in person) at the hackathon, I had a much better picture after a few minutes of showing off my work so far. I’m referring to another YouTube video here.
- I learned one of the most important lessons while interacting with Ferdinand Loesch - an engineer working on this feature - when I showed my examples to him. Originally I simply entered example questions and answers manually. However, that would have taught the agent to avoid using the RAG, since my manual entries wouldn’t contain DataStore lookup entries. The right way to create examples was to initiate a chat in the test sandbox and convert good conversations into examples. These contain the DataStore lookups (along with the retrieval phrase and the retrieved chunks), and I could still slightly tune the textual user questions and agent responses.
In the photo, you can see the wrappers of the rice cake snacks the organizers provided for us. They were yummy and a lifesaver, because I skipped lunch in favor of the hackathon, so my only other food source was some protein bars from the Certified Professional Lounge.
Accomplishments I’m proud of
Agent Builder provided a no-code way of developing an agent that was able to interpret vast amounts of information and synthesize it into unique and creative conversation pieces.
Agent Builder provides ready-to-use connectors for a wide range of chat platforms, such as Slack, Meta Messenger, Discord, Telegram, Viber, Google Chat, etc. I tried to integrate the agent into the Fresno State Sustainability Club Slack, but then realized I wasn’t the workspace admin.
In Summary
Google provides a no-code managed framework for developing complex agent scenarios. The details are still extremely important, and it’s crucial to explore all the options of the offered features to reach the best possible outcome.
After the Hackathon
After the hackathon submission I kept working on the agent and developed a stand-alone Streamlit front-end (source code), along with my own glue functions that the front-end calls to reach Agent Builder. I did my research and had to come up with a solution on my own.
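For reference, my glue layer boiled down to calling the Dialogflow CX detectIntent endpoint behind the agent. Here is a minimal stdlib-only sketch under my assumptions about the v3 REST API shape (the project, agent, and session identifiers are placeholders):

```python
import json
import urllib.request


def detect_intent_url(project, location, agent, session_id):
    """Build the Dialogflow CX v3 detectIntent REST URL (assumed endpoint shape)."""
    host = f"{location}-dialogflow.googleapis.com"
    return (f"https://{host}/v3/projects/{project}/locations/{location}"
            f"/agents/{agent}/sessions/{session_id}:detectIntent")


def ask_agent(url, token, text, language_code="en"):
    """POST a user utterance to the agent and collect its text replies (network call)."""
    body = json.dumps({
        "queryInput": {"text": {"text": text}, "languageCode": language_code}
    }).encode()
    req = urllib.request.Request(url, data=body, headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    })
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    # Each text response message carries a list of text fragments
    messages = result.get("queryResult", {}).get("responseMessages", [])
    return [" ".join(m["text"]["text"]) for m in messages if "text" in m]
```

A Streamlit front-end can then feed the user's chat input to `ask_agent` and render the returned replies; the bearer token can come from application-default credentials.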