⚠️ Privacy & Security Reminder: Whenever you use structured output (or any LLM for that matter), always consider the privacy and sensitivity of the data you're working with. Make sure to follow HIPAA, FERPA, and all other relevant regulations if handling protected or confidential information. Use de-identified or synthetic data for demos, and be mindful that not all AI tools offer robust data privacy protections.
Exploring use cases for structured output in behavior analysis
Earlier this year I was lucky enough to present on one of my favorite features of large language models (LLMs), structured output 👈🔥.
Now, I am a sucker for exploring the latest and greatest that these tools provide, but instead of talking about agents, skills, or [insert thing that is going to drop in two minutes and change the face of AI again for the 100th time], I chose to spend my time talking about a feature of LLMs that has been around for a while, because it has SO MANY potential use cases for behavior analysis. That is what is exciting to me: the possibilities that this technology can unlock.
If you are new to structured output... no worries, we will get you up to speed. From there, we will explore some of the many use cases where it can be applied in behavior analysis, and close out with some benefits and drawbacks of using structured output in your clinical and operational workflows.
So... What is structured output?
Unless you've been living under a rock for the past few years (if you have, honestly, got room?), you've likely interacted with an LLM in some form. Maybe you've typed a question into ChatGPT or Claude, or scrolled past the "AI Overview" at the top of a Google search. The interaction likely goes something like this: you provide an input, sprinkle in some context, and hit send. Structured output works somewhat the same way. But instead of streaming back a wall of text, the model returns a response that conforms to a shape you define for it. Imagine telling the model: "give me back a list with exactly these fields, filled in, every time." That orderly, predictable output can then be used downstream far more cleanly than raw text ever could.
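To make that concrete, here is a minimal sketch of what a "shape you define" might look like. The field names are hypothetical, purely for illustration; the point is that the model fills in a fixed set of typed fields rather than returning free-form prose.

```typescript
// A hypothetical "shape" you might ask the model to fill in, every time.
// With structured output, the response is guaranteed to match this structure.
interface SessionSummary {
  clientInitials: string; // e.g., "J.D."
  sessionDurationMinutes: number;
  targetBehaviorsObserved: string[];
  caregiverPresent: boolean;
}

// Instead of a wall of text, you get back something you can use directly:
const example: SessionSummary = {
  clientInitials: "J.D.",
  sessionDurationMinutes: 90,
  targetBehaviorsObserved: ["elopement", "task refusal"],
  caregiverPresent: true,
};
```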
Now, I operated clinically for many years, and one thing is certain: clinicians operate in a sea of unstructured text. We are constantly interviewing clients, taking notes, and trying to aggregate information from a variety of sources. The kicker is, we typically add all that information to more unstructured documents. And when we exchange information with others, it is usually delivered as a PDF (which, in our world, typically means we are back to unstructured text).
Now, I am not dismissing the value that these unstructured documents provide. They are incredibly useful and insightful permanent products that can have an amazing impact on the services we deliver. They can contain a recipe for success in navigating the toughest situations, surface critical information for everyone involved, and even result in authorizations for life-altering services. But our permanent products have a shelf life; they were built fit-for-purpose, for a moment in time. And reviewing that information is often a manual, time-consuming process. That burden leaves us either allocating valuable resources to the review of our permanent products, spot-checking a sample of them, or forgoing review entirely. That is painful.
This is where structured output comes in. The information we capture in these documents can and should eventually be structured. And with the advent of LLMs, we are able to spin up and down systems that can help us extract that valuable information in an efficient and effective manner.
Evaluating Permanent Products with Structured Output
Say you want to evaluate the completeness of a functional behavior assessment written by one of your clinicians. Your organization has a 50-point quality checklist, and historically, someone had to sit down with the report and the checklist and work through it manually. Time-consuming, and prone to human error.
With structured output, you pass the assessment report into a model alongside your checklist criteria and define exactly what you want back. The model generates structured data from the document and returns something like this:
Example: defining the schema for the model to return
import { z } from "zod";
// Define the shape of the response you want the model to return.
// Each field is a boolean — true if the element is present in the document, false if it is missing.
// The .describe() call tells the model what to look for when evaluating each field.
const FBAChecklist = z.object({
  client_first_name_present: z
    .boolean()
    .describe("True if the client's first name appears in the document."),
  date_of_assessment_present: z
    .boolean()
    .describe("True if the date the assessment was conducted is documented."),
  reason_for_referral_present: z
    .boolean()
    .describe(
      "True if the reason the client was referred for services is stated."
    ),
  background_history_present: z
    .boolean()
    .describe(
      "True if relevant background and developmental history is included."
    ),
  informant_interviews_conducted: z
    .boolean()
    .describe(
      "True if at least one informant interview was conducted and documented."
    ),
  direct_observation_completed: z
    .boolean()
    .describe(
      "True if at least one direct observation of the target behavior is documented."
    ),
  hypothesized_function_stated: z
    .boolean()
    .describe(
      "True if a hypothesized function of the target behavior is clearly stated."
    ),
  clinician_signature_present: z
    .boolean()
    .describe(
      "True if the clinician's signature or credentials appear on the document."
    ),
});

Example: structured output returned by the model
{
"client_first_name_present": true,
"date_of_assessment_present": true,
"reason_for_referral_present": true,
"background_history_present": true,
"informant_interviews_conducted": true,
"direct_observation_completed": true,
"hypothesized_function_stated": true,
"clinician_signature_present": true
}

Now, if you are like me, you are probably thinking of all the different ways you can use structured output to improve your clinical and operational workflows. Here are a few ideas that come to mind:
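Once the model's response arrives in that fixed shape, downstream processing is straightforward. As a minimal sketch in plain TypeScript (the `scoreChecklist` helper is hypothetical, not part of any library), you could compute a completeness score and surface exactly which elements are missing:

```typescript
// The checklist result, using the same boolean fields the schema defines.
type ChecklistResult = Record<string, boolean>;

// Compute a completeness percentage and collect the missing elements.
function scoreChecklist(result: ChecklistResult): {
  percentComplete: number;
  missing: string[];
} {
  const entries = Object.entries(result);
  const missing = entries.filter(([, present]) => !present).map(([k]) => k);
  const percentComplete = Math.round(
    ((entries.length - missing.length) / entries.length) * 100
  );
  return { percentComplete, missing };
}

// Example: one element missing out of eight → 88% complete.
const report = scoreChecklist({
  client_first_name_present: true,
  date_of_assessment_present: true,
  reason_for_referral_present: true,
  background_history_present: false,
  informant_interviews_conducted: true,
  direct_observation_completed: true,
  hypothesized_function_stated: true,
  clinician_signature_present: true,
});
// report.missing → ["background_history_present"]
```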
| Theme | What It Covers | Example Use Cases |
|---|---|---|
| Clinical Documentation | Turning session notes, progress summaries, and intake forms into structured data | Monitor session notes for compliance, structure intake forms |
| Treatment & Intervention Design | Pulling structured information out of BIPs and treatment plans | Flag goals with missing mastery criteria, extract replacement behaviors, flag non-measurable goal language |
| Accreditation & Credentialing | Extracting information from provider documents to support accreditation reviews | Flag missing policies and procedures, extract supervision hours, identify documented competency areas |
| Billing & Authorization | Monitoring billing documents and authorization requests for completeness and compliance | Flag missing required components in authorization requests, identify billing codes that lack supporting documentation, extract service hours from session notes |
| Research Synthesis | Extracting structured data from research articles to accelerate literature review and evidence-based practice | Extract dependent and independent variables, pull participant characteristics, identify social validity findings |
I am sure there are many more use cases that could be explored, but these are a few that come to mind.
Advantages and Drawbacks
Like any tool, structured output comes with real strengths and real limitations worth understanding before you build with it.
Advantage #1: Speed and Efficiency
Manual review is slow. A supervisor or quality assurance reviewer can only get through so many documents in a day, which means most organizations end up reviewing a sample rather than the full population of documents. Structured output changes that. The model can process documents in seconds, meaning every assessment, every note, and every report can be reviewed rather than just a handful.
Advantage #2: Measurement Over Time
Because the output always comes back in the same format, you can track performance across time in a way that is simply not possible with manual review (unless someone manually compiles the results). Want to know if a clinician's documentation quality has improved or degraded since their last supervision cycle? The data is there. Want to see which checklist items are most frequently missed across your entire team? That pattern is now visible. Consistency in the output creates opportunities for meaningful analysis and, in turn, opportunities for improvement.
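That kind of team-wide pattern is a few lines of code once the results are structured. Here is a sketch (the `mostMissedItems` helper and the sample data are illustrative) that tallies how often each checklist item is missed across a set of reviews:

```typescript
// One review = the boolean checklist returned for one document.
type Review = Record<string, boolean>;

// Count how often each checklist item is missed across a set of reviews,
// sorted descending so the most common gaps surface first.
function mostMissedItems(reviews: Review[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const review of reviews) {
    for (const [item, present] of Object.entries(review)) {
      if (!present) counts.set(item, (counts.get(item) ?? 0) + 1);
    }
  }
  return new Map([...counts.entries()].sort((a, b) => b[1] - a[1]));
}

// Illustrative data: three reviewed documents.
const reviews: Review[] = [
  { hypothesized_function_stated: false, clinician_signature_present: true },
  { hypothesized_function_stated: false, clinician_signature_present: false },
  { hypothesized_function_stated: true, clinician_signature_present: true },
];
const misses = mostMissedItems(reviews);
// hypothesized_function_stated missed 2 times; clinician_signature_present missed 1 time
```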
Advantage #3: Standardization Across Teams
The same criteria are applied to every document, across every site, every time a review is conducted. A clinician in one location is held to the same documentation standards as a clinician in another. That type of structural consistency is difficult to achieve manually, but is easily achieved with structured output.
Drawback #1: Model Accuracy
Structured output is only as good as the model providing the output. If the model wasn't trained on enough relevant data to accurately identify the information requested, the output may look clean and well structured while still being wrong. Clinical documents in behavior analysis come from a specialized domain, and general-purpose models may not have been exposed to enough high-quality training data to perform reliably for this purpose. We should always be mindful of these limitations and take action to monitor and improve the accuracy of the system over time.
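One practical way to monitor accuracy is to keep a small human-scored sample and compare it against the model's output field by field. A sketch (the `fieldAgreement` helper and field names are hypothetical; agreement metrics like Cohen's kappa would be a natural next step):

```typescript
// A set of boolean checklist fields, scored either by the model or a human.
type Labels = Record<string, boolean>;

// Percentage of checklist fields where the model agrees with a human reviewer.
function fieldAgreement(model: Labels, human: Labels): number {
  const fields = Object.keys(human);
  const agree = fields.filter((f) => model[f] === human[f]).length;
  return Math.round((agree / fields.length) * 100);
}

// Illustrative comparison: the model disagrees on one of four fields.
const humanScores: Labels = { a_present: true, b_present: false, c_present: true, d_present: true };
const modelScores: Labels = { a_present: true, b_present: true, c_present: true, d_present: true };
const agreement = fieldAgreement(modelScores, humanScores);
// agreement → 75
```

Tracking this number over time tells you when a prompt, schema, or model change has quietly degraded performance.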
Drawback #2: Data Privacy
Clinical documents contain some of the most sensitive information that exists. We are talking about diagnoses, behavioral histories, family constellations, and more. These documents are protected under HIPAA and other applicable laws. Before passing any clinical documents into any third-party system, you need to understand where the data are going, how they will be stored, and whether the vendors you are working with have the appropriate Business Associate Agreement (BAA) in place to protect them. Not all models and platforms are built with healthcare compliance in mind. Get informed before you get started.
Drawback #3: Maintenance
The implementation of structured output may seem like it's a one-time setup, but it's not. You are introducing a new layer of technology into your clinical and operational workflow, and that comes with real ongoing costs beyond what you are paying to use the model. Someone has to own the system and monitor it for performance degradation over time. When your documentation changes, your prompts and schemas may need to change as well. When a staff member has questions about how a document was scored, someone has to be able to answer those questions. None of this is trivial and all of it is important.
Conclusion
Behavior analysts have always worked in a world of unstructured text. Notes, reports, assessments, and a whole host of other documents are the permanent products of our practice. They serve us well, allowing for flexibility and nuance in how we document our work. Structured output does not change what we document, or how we document it. Instead, it may change how quickly we can move from unstructured to structured data. And those structured data are exciting: they create an opportunity to ask new questions, and to answer them.
The use cases we explored here are not hypothetical. They are real problems that real organizations are spending real time and money on today. Structured output offers a path to doing that work faster, potentially more consistently, and at a scale that was not previously possible without custom small models, bespoke systems, or purpose-built data extraction tools.
That said, none of this comes for free. The accuracy of the model matters. The privacy of the data matters. The ongoing maintenance of the system matters. Anyone building in this space needs to take those responsibilities seriously because there is a lot at stake.
The story is still out on structured output in behavior analysis. We are still early. The tooling is improving, the models are improving, and the field is only beginning to explore what this can unlock. I hope this post gave you a useful starting point. If you have use cases you are exploring or questions about implementation, I would love to hear about them.