Querying ultra-long contexts with summary trees

Jun 9, 2024 18:24 · 869 words · 5 minute read AI LLM

Views my own. There might well be more clever ways to do all of this that I don’t know of.

Suppose you have some long, ordered text data, like news stories going back a century or journal entries or server logs. You’d like to analyze them with a Large Language Model (LLM), but you can’t just concatenate them together (the context window isn’t that long!).

A fun solution to this problem is a summary tree. The idea is that we use the LLM to summarize pairs of consecutive documents, then summarize pairs of consecutive summaries, then summarize pairs of those, and so on, until there’s a tree where the root represents an overall summary of all of the documents and the leaves represent individual documents. Here’s an example of the kind of prompt we might use to produce such a tree:

You are an AI assistant named Claude. Your job is to summarize text documents.

Here is some prior context to help you understand the background of the documents:

    <prior_context>
    {prior_context}
    </prior_context>

Here are the specific documents to summarize at this time:

    <entries_to_summarize>
    {e}
    </entries_to_summarize>

Please carefully read through the documents and prior context. Then, write a summary of the new documents, following these guidelines:

    - Begin the summary with the date or date range of the documents in YYYY-MM-DD format
    - Discuss the most significant events or topics covered in the documents. Provide enough detail about each one so that someone could understand the key points without needing to read the original documents. 
    - Conclude with a few sentences of high-level reflection, extracting key insights or lessons learned.
    - Please avoid providing platitudes or other low-content statements.

    <scratchpad>
    Main points to cover in summary:
    - Date range
    - Key events & topics 
    - Reflective insights
    Refer back to prior context to understand background and tie documents together thematically where relevant.
    </scratchpad>

Note that the prior context is important: at every point we concatenate the highest-level summaries of all the documents so far and use that as prior context for the next summary. This allows the model to use earlier information in deciding how to summarize later documents.
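To make that concrete, here is a minimal Python sketch of the pairwise build. `llm_summarize` is a hypothetical helper that fills in the summarization prompt above and calls the model, and the way prior context is threaded through is just one reading of the description above.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    summary: str                        # summary text, or the raw document at a leaf
    children: list["Node"] = field(default_factory=list)

def build_summary_tree(documents: list[str], llm_summarize) -> Node:
    """Summarize consecutive pairs, level by level, until one root remains."""
    level = [Node(summary=doc) for doc in documents]
    while len(level) > 1:
        next_level: list[Node] = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            if len(pair) == 1:          # odd node carries over to the next level
                next_level.append(pair[0])
                continue
            # Prior context: the highest-level summaries covering everything to
            # the left of this pair (this dependency is why the build is sequential).
            prior_context = "\n\n".join(n.summary for n in next_level)
            summary = llm_summarize(
                prior_context=prior_context,
                entries="\n\n".join(n.summary for n in pair),
            )
            next_level.append(Node(summary=summary, children=list(pair)))
        level = next_level
    return level[0]
```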

We can interact with this tree by traversing it to produce a cut (i.e. a set of nodes, none of which is an ancestor or descendant of another, that collectively cover all leaves) with the right level of detail for a given question/analysis task, and then running the analysis over that cut rather than over individual documents.
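For illustration, a small helper (reusing the `Node` sketch above) that checks whether a set of nodes really is a cut, i.e. covers every leaf exactly once:

```python
def leaves_under(node: Node) -> list[Node]:
    # Leaves represent individual documents.
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves_under(child)]

def is_cut(nodes: list[Node], root: Node) -> bool:
    # Every leaf of the tree must be covered by exactly one node in the cut;
    # if any node were an ancestor of another, some leaves would be double-counted.
    covered = sorted(id(leaf) for n in nodes for leaf in leaves_under(n))
    all_leaves = sorted(id(leaf) for leaf in leaves_under(root))
    return covered == all_leaves
```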

To produce a cut, we might use a prompt like:

I am going to provide you with a numbered list of summaries of text documents.
Furthermore I will provide you with a prompt, which is a task that you'll be completing later.

<begin prompt>
{prompt}
<end prompt>

Full Context:
{full_context}

Given the list of summaries, consider if there are any questions you need to answer in order to respond well to the prompt above.
In doing this, please adopt the mindset of a seasoned therapist, judiciously deciding which aspects are most important to dig into.   
Please list these questions.

Then, reflect on which of these summaries would be most useful to dig into at a finer level of detail to answer your questions.
Please include the exact phrase "INSUFFICIENT DETAIL" followed by the number of the document to get more detail on.
Once more, your response MUST contain "INSUFFICIENT DETAIL <NUMBER>".
You will then be provided with a more detailed version of that numbered summary.

Some entries do not have any more detail to provide. These are marked "INELIGIBLE DOCUMENT".
Do not ask for more detail on these.
Once more, do not ask for more detail on INELIGIBLE DOCUMENTs.
As mentioned above, your response MUST contain "INSUFFICIENT DETAIL <NUMBER>" for some number.

Then we just look for the INSUFFICIENT DETAIL string, extract the number, and replace that node in the cut with its children.
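A sketch of that refinement loop, again using the `Node` sketch above; `llm_complete` is a hypothetical helper that fills the cut-producing prompt with a numbered list of the current cut's summaries:

```python
import re

def refine_cut(root: Node, prompt: str, llm_complete, max_refinements: int = 10) -> list[Node]:
    cut = [root]                                   # start from the overall summary
    for _ in range(max_refinements):
        response = llm_complete(prompt=prompt, summaries=[n.summary for n in cut])
        match = re.search(r"INSUFFICIENT DETAIL\s+(\d+)", response)
        if match is None:
            break                                  # the model didn't ask for more detail
        idx = int(match.group(1)) - 1              # assuming the list is numbered from 1
        node = cut[idx]
        if not node.children:                      # an INELIGIBLE DOCUMENT: nothing finer exists
            continue
        cut[idx:idx + 1] = node.children           # replace the node with its children, in order
    return cut
```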

Once we have a final cut for a given prompt, we feed that cut in as context and query the model with that plus the prompt.
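In sketch form, with `llm_answer` standing in for whatever prompt does the final analysis:

```python
def answer(root: Node, prompt: str, llm_complete, llm_answer) -> str:
    cut = refine_cut(root, prompt, llm_complete)
    context = "\n\n".join(n.summary for n in cut)   # the cut stands in for the full documents
    return llm_answer(context=context, prompt=prompt)
```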

A few fiddly details:

  1. Building the tree might be expensive/slow, especially because it can’t be parallelized (due to the prior context). If the set of documents is static that’s fine, but if it’s growing, it’s good to use a left-heavy binary tree so that new documents don’t cause old documents to get re-summarized. Specifically, we use a binary tree where the initial split puts the largest available power of 2 of documents in the left node (see the sketch after this list). [Thanks to Jesse Salomon for this idea.]
  2. Getting the cut-producing prompt right took some trial and error and is probably pretty domain-specific. It’s pretty easy either to under-emphasize the need for detail (so the model never asks for more) or to over-emphasize it (so the model tries to produce a cut made entirely of the original documents). In practice I found it sometimes helpful to put a cap on the number of refinements the model could request and just bias towards more refinement.
  3. Getting the summarization prompt right took some trial and error and is probably pretty domain-specific. The main difficulty is ensuring that the summaries don’t devolve into overly-general statements or platitudes.
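Here is a sketch of the left-heavy construction from point 1. The prior-context threading is simplified relative to the discussion above, and `llm_summarize` is the same hypothetical helper as before:

```python
def split_point(n: int) -> int:
    # Largest power of 2 strictly less than n (for n >= 2), so appending new
    # documents never forces older subtrees to be rebuilt.
    p = 1
    while p * 2 < n:
        p *= 2
    return p

def build_left_heavy(documents: list[str], llm_summarize, prior_context: str = "") -> Node:
    if len(documents) == 1:
        return Node(summary=documents[0])
    k = split_point(len(documents))
    left = build_left_heavy(documents[:k], llm_summarize, prior_context)
    # Everything to the left is now summarized by `left`, so it joins the prior context.
    right = build_left_heavy(documents[k:], llm_summarize,
                             (prior_context + "\n\n" + left.summary).strip())
    summary = llm_summarize(prior_context=prior_context,
                            entries=left.summary + "\n\n" + right.summary)
    return Node(summary=summary, children=[left, right])
```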