How I Could Have Won My First Kaggle Competition at Google Cloud Next 2024 in Las Vegas

Apr 12, 2024

Google Cloud Next conferences are always amazing. They are at the level of Google I/O, but on top of that there are also a lot of after-parties and sometimes even morning events. Back on February 21st, I participated in a Gen AI Lab and Duet AI Roadshow at the Google Redwood City campus.

This was my first time at that campus, and it is one of the most beautiful ones. (My only complaint would be that there were no Level 2 chargers for visitors.)

During that event, I won a knowledge quiz game and was awarded a golden ticket to Cloud Next. Earlier, in January, it was announced that Cloud Next would be held in Las Vegas for the first time in its history (normally it was held either at the San Francisco Moscone Center or at a bigger Bay Area Google campus event center, such as MP6). Once I learned that, I immediately reserved a hotel room and a flight, even though I didn't yet have a ticket for the conference itself. Allegiant fortunately had a direct flight from Fresno, and the OYO Hotel had $11 rooms; with all the extra fees that ballooned to about $60 a night, but that is still way cheaper than any Bay Area accommodation. The airfare bumped up the total cost, but ~$200 + ~$250 is still decent if I could snatch a conference ticket.

So I was super ecstatic when I got that golden ticket. However, after my excitement cooled down, I realized that the GDSC (Google Developer Student Club) Fresno students I'm cooperating with needed that ticket way more than I did. Last year I carpooled Zheng Wei Ng and Ren Hao Wong to Google Cloud Next 2023 at Moscone - and I could tell a lot of stories about that - but both of them were about to graduate this summer, and Catherine Laserna (Google Generation and Smittcamp Scholar) would take over the baton for GDSC Fresno. I had only one golden ticket, and after a conversation with the students, we concluded that Catherine should get the ticket.

Fortunately, I was later able to grab a 100% coupon code (after first a $600 and then a $400 total-cost coupon - already a huge discount compared to the $2,000 full price), and also an entry for myself as a GDG Fresno organizer. So, in various ways, I was able to bring a four-person Fresno delegation to the conference.

At the conference, I learned about two hackathons:

  1. A Kaggle competition
  2. An AI hackathon

I focused the majority of my time and energy on these competitions. I like a challenge, and participating in a Kaggle competition was long overdue.

  • To prevent any cheating, the competition dataset was secret; we only got three sample inputs and outputs.
  • Teams needed to submit large language models that perform as well as possible on trivia questions, riddles, and poem writing.
  • The Kaggle competition environment was offline (no internet connection at all), I guess to prevent exfiltration of the competition dataset. Therefore competitors needed to package everything up very snugly, because any unexpected connection attempt by any part of the submission would fail.
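To make the "no internet" constraint concrete, here is a minimal sketch of what a self-contained submission notebook has to do: attach the model weights as a Kaggle Dataset/Model and load them strictly from local files, with the Hugging Face libraries forced offline. The model path and name below are hypothetical, not the exact ones from my submission.

```python
import os

# Force the Hugging Face libraries to stay offline; any accidental download
# attempt then fails immediately instead of hanging the scoring run.
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"

from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical path: the weights are attached to the notebook as a Kaggle
# Dataset/Model, so they already sit on local disk inside the offline sandbox.
MODEL_PATH = "/kaggle/input/gemma/transformers/1.1-2b-it/1"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    local_files_only=True,  # never reach out to the network
    device_map="auto",      # put the weights on the GPU if one is available
)
```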

Looking back, this was very tricky for a first competition. I not only needed to learn how Kaggle submissions work, but I had to make mine fully self-contained, without any online connection. Still, on the kick-off afternoon I made my first default submission to at least claim a reference slot on the leaderboard that I could improve later. My ideas for increasing my score were the following:

  1. Trying a larger LLM. The default submission used a 1.1b Gemma.
  2. I quickly learned that the submission evaluation environment has resource limitations, and there was also an execution time limit. I was able to get a 7b Gemma working (an fp16/int4, GPU-optimized build - see the loading sketch after this list), but the CPU-optimized version timed out. Still, the GPU model moved me up a little on the leaderboard.
  3. Then I considered a 13b Llama 2 or a Mistral, but I was not able to get the instance to both hold a 13b model and finish in time.
  4. I spent too much time trying to tackle the model size issue. It was clear I didn't have time for anything like fine-tuning or RAG with quiz and riddle datasets.
  5. So I focused on prompt engineering. First, I "flattered the model". LLMs can play a wide variety of roles, ranging from telling a lullaby to a child, to speaking like a pirate, to writing scientific publications. Because they are so versatile, it is very important to tell them what their role is; almost magically, they'll perform better. This technique gives less of a boost nowadays, because newer training improves the models' default behavior so the gap is not that big, but I'd still "flatter the model" by telling it what areas it is an expert in (see the prompt sketch after this list).
  6. I also used a CoT (Chain of Thought) prompt. The most classic form is the "think it through step by step" instruction. Normally this results in the model explicitly spelling out the detailed steps. This competition needed short, to-the-point answers, so on top of enabling CoT I had to tell the model to think things through quietly and not spell out the intermediate steps.
  7. I could also perform some string processing on the LLM response, and I told the model to ideally "Try to answer with no more than five words."
  8. With these efforts I finished 13th out of 27.
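For item 2, the GPU-optimized 7b attempt looked roughly like the sketch below. This is a hedged reconstruction using 4-bit quantization via bitsandbytes in Transformers, not my exact submission, and the local path is again hypothetical.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_PATH = "/kaggle/input/gemma/transformers/7b-it/1"  # hypothetical local path

# 4-bit weights with fp16 compute keep a 7b model within the GPU memory budget
# while staying fast enough for the competition's execution time limit.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    quantization_config=quant_config,
    device_map="auto",
    local_files_only=True,
)
```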
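And putting items 5-7 together, the prompt plus post-processing went roughly like this; the exact wording and helper names are illustrative, reconstructed from memory rather than copied from my notebook:

```python
ROLE_PREAMBLE = (
    "You are an expert in trivia, riddles, and poetry. "            # flatter the model
    "Think the question through step by step, but do so silently "  # quiet chain of thought
    "and do not show your intermediate reasoning. "
    "Try to answer with no more than five words."                   # keep answers short
)

def build_prompt(question: str) -> str:
    """Wrap a competition question with the role and quiet-CoT instructions."""
    return f"{ROLE_PREAMBLE}\n\nQuestion: {question}\nAnswer:"

def clean_answer(raw: str) -> str:
    """Light string processing on the model output: keep only the first line,
    drop a leftover 'Answer:' label, and cap the length as a safety net."""
    first_line = raw.strip().split("\n")[0]
    first_line = first_line.removeprefix("Answer:").strip()
    return " ".join(first_line.split()[:12])

# The text generated by the offline model would then go through clean_answer().
print(build_prompt("What metal is liquid at room temperature?"))
```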

There were a lot of teams, and many of them tried fine-tuning and RAG solutions. The most important trick, though, was a prompt engineering one: few-shot prompting. There was a guy named Chris who made some submissions, but he wasn't officially participating because he was affiliated with Kaggle. He also tried RAG and fine-tuning, but his winning solution was "simply" few-shot prompting. He didn't even flatter the model or use Chain-of-Thought: with enough good examples, few-shot prompting implicitly conveys all of that to the model. My jaw dropped! I could have won if I had not forgotten about few-shot examples!
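For comparison, a few-shot version of the same prompt could look like the sketch below. The question/answer pairs here are ones I made up for illustration; they are not the actual competition samples.

```python
# Hypothetical worked examples that show the model the expected style:
# short, direct answers with no visible reasoning.
FEW_SHOT_EXAMPLES = [
    ("What is the capital of Australia?", "Canberra"),
    ("I speak without a mouth and hear without ears. What am I?", "An echo"),
    ("Write a two-line poem about rain.",
     "Soft drops on the window pane,\nthe sky is singing rain."),
]

def build_few_shot_prompt(question: str) -> str:
    """Prepend a handful of Q/A examples so the model infers the task format
    (and, implicitly, its role) without explicit role or CoT instructions."""
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in FEW_SHOT_EXAMPLES]
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)

print(build_few_shot_prompt("Which planet has the most moons?"))
```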

After RAG Fusion's context resolution prompt, this competition provided another hard lesson on how important few-shot prompting is. As for prompt engineering in general, I've heard some engineers downplay it as pseudoscience, but it is just as important a part of a solution as any other building block (such as RAG or fine-tuning) - or, in this case, even more important, because I could have won by prompt engineering alone.

This blog post is getting too long, so I'll talk about the AI Hackathon in the next one.
