AI-Assisted Spreadsheet Autofill: Context is King

AI-Assisted Spreadsheet Autofill: Context is King

May 21, 2024

Filling out spreadsheets can be tedious and time-consuming. What if you could automate this process with the power of AI? This blog post explores how to leverage context within Large Language Models (LLMs) to supercharge your spreadsheet autofill capabilities.

Before we begin, let’s define some key terms:

  • LLM (Large Language Model): A type of artificial intelligence that excels at understanding and generating human language. LLMs are trained on massive datasets of text and code, enabling them to perform tasks like writing different kinds of creative content, translating languages, and answering your questions in an informative way.
  • Prompt: The input you provide to an LLM to guide its response. A well-crafted prompt is essential for getting the desired output.
  • Token: Tokens can be thought of as pieces of words. One token is roughly equivalent to 4 characters of text generation. As a point of reference, the collected works of Shakespeare are about 900,000 words or 1.2M tokens.
  • Tablesmith: The spreadsheet automation tool I created, empowered with AI capabilities (https://tablesmith.io/).

Let’s jump into a practical example now.

Recipe Autofill

Imagine you have a spreadsheet containing 200 recipe names, and you want to add columns for ingredients, steps, and tags. Manually searching for this information on Google and copying/pasting would be incredibly tedious. Let’s explore how AI can streamline this process.

The Naive Approach:

  1. Open ChatGPT
  2. Copy a recipe name from your spreadsheet and ask ChatGPT for the ingredients.
  3. Paste the generated ingredients back into your spreadsheet.
  4. Ask ChatGPT for the steps and fill them in the spreadsheet.
  5. Repeat 200 times.

Setting aside the repetitive nature of this method, let’s focus on a crucial aspect: context. When you ask ChatGPT for the steps after asking for the ingredients, it automatically understands that you want the steps to correspond to those specific ingredients. This implicit connection stems from the conversational context within ChatGPT – your previous prompts and answers are carried forward.

You can also manually replicate this context by explicitly including past information in your prompt:

Tell me the steps for this recipe: [Recipe Name]. I have the following ingredients: [Ingredients Generated Before].

However, this approach is cumbersome, requiring manual prompt updates and potentially overwhelming the LLM with lengthy text blobs.

Automating Context with Code:

If you have programming knowledge, automating this process is fairly straightforward. You’ll need an LLM API key to integrate it into your code. Then, you can loop through your spreadsheet rows, sending ingredient requests to the LLM API, receiving the answer, filling the cell, and proceeding to the steps. The key is to include previous content in each API call to maintain the context:

[
  {"role": "user", "content" : "Tell me the ingredients of Kung Pao Chicken"},
  {"role": "assistant", "content" : "<AI Generated Ingredients>"},
  {"role": "user", "content" : "Now tell me the steps"}
]

Remember that your token count will accumulate as you ask more questions, as all previous content contributes to the context. Sometimes, you might only need a partial context. For instance, if you only want tags based on ingredients, you wouldn’t need to include the steps info in your prompt.

Effortless Context Management with Tablesmith:

Tablesmith simplifies this process, requiring no coding experience. The crucial aspect is understanding how to leverage context, which is easily enabled within the tool.

Row Context:

When you create an “Add Column” stage in Tablesmith and choose “AI Text” as the input type, you’ll encounter the “Save row context” option:

Enabling this option instructs Tablesmith to remember the prompts and answers within this stage for each row. Subsequent AI autofill stages will automatically inherit this context. It’s as simple as enabling the option and crafting your prompts naturally, mimicking a conversation with ChatGPT.

Recipe Generator

Let’s shift gears to another use case, exploring column context. Imagine you want to generate 200 unique recipe names, starting with an empty spreadsheet.

One-by-One Generation with Context:

While ChatGPT could handle generating 200 names at once, a step-by-step approach is more robust for complex prompts. To avoid duplicate recipe names, we need context, but this time, the context applies within a single column.

Column Context in Tablesmith:

In Tablesmith, this is handled differently than row context. Since recipe name generation occurs within a single stage, sharing context with subsequent stages is irrelevant. We need column context – where each cell’s data generation is informed by previous prompts and answers within the same column.

To enable this, open the prompt editor and activate “Enable column context.” You’ll notice a new “Context key” option and a “First prompt/Following prompt” button:

Why Two Prompts?

Consider how you’d naturally request multiple recipe names one by one. You’d start with “Show me a recipe name,” and then continue with “next” to generate subsequent names. The “First Prompt” establishes the overall requirement, while the “Following Prompt” guides the generation of each subsequent item.

Managing Token Usage with Column Keys:

A crucial consideration with column context is the rapidly accumulating context size. With each new cell, the entire column’s history is included, potentially leading to excessive token consumption. If the first cell autofill uses n tokens and your column has m rows, the total token cost can be calculated as:

n * (1 + m) * m / 2

For example, if the first cell uses 65 tokens and you want 100 rows, the total cost is 328,250 tokens! That’s a lot of tokens, and it’s only 100 rows—what happens if we want to autofill 1,000 rows?

This is where the “Context key” comes into play. Instead of generating 200 recipes directly, let’s introduce a constraint: generate 20 recipes for each of 10 different countries. Now, you only need context within each country group, as recipes are unlikely to repeat across countries. This significantly reduces token usage by limiting the context scope. If the first cell autofill uses n tokens, your column has m rows, and you group them into k groups, the formula becomes:

(n * (1 + m/k) * m / (2*k)) * k

For example, if the first cell uses 65 tokens, you want 100 rows, and group them into 10 groups, the total cost is 35,750 tokens. You see the huge difference here!

Data Extraction – A Bonus Tip

While not directly related to context, extracting data from unstructured text is a common need. Consider a spreadsheet containing raw text recipes scraped from websites. You could manually send each text to ChatGPT and ask for specific information, such as the recipe name.

However, most LLM APIs offer a powerful feature called “function calling” or “tool use”. You define a function like this:

def SaveRecipe(name, ingredients, steps, tags)

Then, you provide both the unstructured text and the function definition to the LLM. The LLM won’t execute the function but will extract the relevant parameters (name, ingredients, steps, tags) from the text. This is a highly efficient way to extract structured data, as it bypasses the need for conversational context and minimizes token usage.

Tablesmith’s “Data Extraction” stage leverages this functionality. Since columns are already defined, you simply select them and add any necessary hints for the LLM.

Conclusion

Mastering the concept of context is crucial for maximizing the efficiency and accuracy of AI-assisted spreadsheet autofill. Tablesmith provides an intuitive interface for managing both row and column context, empowering you to automate complex tasks without writing a single line of code. By understanding the nuances of context and utilizing tools like Tablesmith, you can unlock the full potential of AI for spreadsheet automation.

Last updated on