Demystifying AI for Behavior Analysts: Navigating Ethical Adoption and Algorithmic Bias (Citations)

Anthropic. (2026, April 7). Claude Mythos Preview system card. Anthropic. https://anthropic.com
Bai, Y., Jones, A., Ndousse, K., Askell, A., Chen, A., DasSarma, N., Drain, D., Fort, S., Ganguli, D., Henighan, T., Joseph, N., Kadavath, S., Kernion, J., Conerly, T., El-Showk, S., Elhage, N., Hatfield-Dodds, Z., Hernandez, D., Hume, T., . . . Kaplan, J. (2022). Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv. https://doi.org/10.48550/arXiv.2204.05862
Batista, R. M., & Griffiths, T. L. (2026). A rational analysis of the effects of sycophantic AI. arXiv. https://doi.org/10.48550/arXiv.2602.14270
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., . . . Amodei, D. (2020). Language models are few-shot learners. arXiv. https://doi.org/10.48550/arXiv.2005.14165
Chandra, K., Kleiman-Weiner, M., Ragan-Kelley, J., & Tenenbaum, J. B. (2026). Sycophantic chatbots cause delusional spiraling, even in ideal Bayesians. arXiv. https://doi.org/10.48550/arXiv.2602.19141
Christiano, P., Leike, J., Brown, T. B., Martic, M., Legg, S., & Amodei, D. (2023). Deep reinforcement learning from human preferences. arXiv. https://doi.org/10.48550/arXiv.1706.03741
Cox, D. J. (2025). Ethical behavior analysis in the age of artificial intelligence (AI): The importance of understanding model building while formal AI literacy curricula are developed. Perspectives on Behavior Science. Advance online publication. https://doi.org/10.1007/s40614-025-00459-z
Cox, D. J., & Jennings, A. M. (2024). The promises and possibilities of artificial intelligence in the delivery of behavior analytic services. Behavior Analysis in Practice, 17(1), 123–136. https://doi.org/10.1007/s40617-023-00864-3
Cox, D. J., & Sosine, J. (2025). A data-driven, algorithmic approach to recommending hours of ABA for individuals with ASD. Behavioral Interventions, 40, e70014. https://doi.org/10.1002/bin.70014
Cox, D. J., Weil, L., Sosine, J., Jennings, A. M., & Santos, C. (2025). Getting more from your IOA data: Alternative measures to total, occurrence, and non-occurrence agreement. Behavioral Interventions, e70031. https://doi.org/10.1002/bin.70031
Crossman, E. K. (1985). The kiss and the promise: A review of Hubert L. Dreyfus' What computers can't do: The limits of artificial intelligence. Journal of the Experimental Analysis of Behavior, 44(2), 271–277. https://doi.org/10.1901/jeab.1985.44-271
Dufour, M.-M., Lanovaz, M. J., & Cardinal, P. (2020). Artificial intelligence for the measurement of vocal stereotypy. Journal of the Experimental Analysis of Behavior, 114(3), 368–380. https://doi.org/10.1002/jeab.636
Guo, Y., Guo, M., Su, J., Yang, Z., Zhu, M., Li, H., Qiu, M., & Liu, S. S. (2024). Bias in large language models: Origin, evaluation, and mitigation. arXiv. https://doi.org/10.48550/arXiv.2411.10915
Jennings, A. M., & Cox, D. J. (2023). Starting the conversation around the ethical use of artificial intelligence in applied behavior analysis. Behavior Analysis in Practice, 17(1), 107–122. https://doi.org/10.1007/s40617-023-00868-z
Jošt, G., Taneski, V., & Karakatič, S. (2024). The impact of large language models on programming education and student learning outcomes. Applied Sciences, 14(10), 4115. https://doi.org/10.3390/app14104115
Kosmyna, N., Hauptmann, E., Yuan, Y. T., Situ, J., Liao, X.-H., Beresnitzky, A. V., Braunstein, I., & Maes, P. (2025). Your brain on ChatGPT: Accumulation of cognitive debt when using an AI assistant for essay writing task. arXiv. https://doi.org/10.48550/arXiv.2506.08872
Lanovaz, M. J. (2022). Some characteristics and arguments in favor of a science of machine behavior analysis. Perspectives on Behavior Science, 45(2), 399–419. https://doi.org/10.1007/s40614-022-00332-3
Liu, S., Wright, A. P., Patterson, B. L., Wanderer, J. P., Turer, R. W., Nelson, S. D., McCoy, A. B., Sittig, D. F., & Wright, A. (2023). Using AI-generated suggestions from ChatGPT to optimize clinical decision support. Journal of the American Medical Informatics Association, 30(7), 1237–1245. https://doi.org/10.1093/jamia/ocad072
Mahajan, A., Obermeyer, Z., Daneshjou, R., Lester, J., & Powell, D. (2025). Cognitive bias in clinical large language models. npj Digital Medicine, 8, 428. https://doi.org/10.1038/s41746-025-01790-0
Mohamed, A., Assi, M., & Guizani, M. (2026). The impact of LLM-assistants on software developer productivity: A systematic review and mapping study. arXiv. https://doi.org/10.48550/arXiv.2507.03156
Morris, C., Jones, S. H., & Oliveira, J. P. (2024). A practitioner's guide to measuring procedural fidelity. Behavior Analysis in Practice, 17(2), 643–655. https://doi.org/10.1007/s40617-024-00910-8
Mutanga, M. B., Msane, J., Mndaweni, T. N., Hlongwane, B. B., & Ngcobo, N. Z. (2025). Exploring the impact of LLM prompting on students' learning. Trends in Higher Education, 4(3), 31. https://doi.org/10.3390/higheredu4030031
O'Neil, C. (2016). Weapons of math destruction: How big data increases inequality and threatens democracy. Crown.
Perrigo, B. (2023, January 18). Exclusive: OpenAI used Kenyan workers on less than $2 per hour to make ChatGPT less toxic. TIME. https://time.com/6247678/openai-chatgpt-kenya-workers/
Poole-Dayan, E., Roy, D., & Kabbara, J. (2025). LLM targeted underperformance disproportionately impacts vulnerable users. arXiv. https://doi.org/10.48550/arXiv.2406.17737
Raj, M., Berg, J. M., & Seamans, R. (2026). The artificial intelligence disclosure penalty: Humans persistently devalue AI-generated creative writing. Journal of Experimental Psychology: General, 155(4), 896–915. https://doi.org/10.1037/xge0001889
Sinayev, A., & Courtney, C. (2025). Effectiveness of LLM-based AI assistance for small business productivity. Research Square. https://doi.org/10.21203/rs.3.rs-6481789/v1
Singhal, K., Azizi, S., Tu, T., Mahdavi, S. S., Wei, J., Chung, H. W., Scales, N., Tanwani, A., Cole-Lewis, H., Pfohl, S., Payne, P., Seneviratne, M., Gamble, P., Kelly, C., Babiker, A., Schärli, N., Chowdhery, A., Mansfield, P., Demner-Fushman, D., . . . Natarajan, V. (2023). Large language models encode clinical knowledge. Nature, 620(7972), 172–180. https://doi.org/10.1038/s41586-023-06291-2
Stephens, K. R., & Hutchison, W. R. (1992). Behavioral personal digital assistants: The seventh generation of computing. The Analysis of Verbal Behavior, 10, 149–156. https://doi.org/10.1007/BF03392881
Sun, C., McEwan, A., Boulton, K. A., Demetriou, E. A., Sadozai, A. K., Lampit, A., & Guastella, A. J. (2025). Artificial intelligence for tracking social behaviours and supporting an autism spectrum disorder diagnosis: Systematic review and meta-analysis. EBioMedicine, 120, 105931. https://doi.org/10.1016/j.ebiom.2025.105931
Tam, T. Y. C., Sivarajkumar, S., Kapoor, S., Stolyar, A. V., Polanska, K., McCarthy, K. R., Osterhoudt, H., Wu, X., Visweswaran, S., Fu, S., Mathur, P., Cacciamani, G. E., Sun, C., Peng, Y., & Wang, Y. (2024). A framework for human evaluation of large language models in healthcare derived from literature review. npj Digital Medicine, 7, 258. https://doi.org/10.1038/s41746-024-01258-7
Templin, T., Fort, S., Padmanabham, P., Seshadri, P., Rimal, R., Oliva, J., Hassmiller Lich, K., Sylvia, S., & Sinnott-Armstrong, N. (2025). Framework for bias evaluation in large language models in healthcare settings. npj Digital Medicine, 8, 414. https://doi.org/10.1038/s41746-025-01786-w
Turgeon, S., & Lanovaz, M. J. (2020). Tutorial: Applying machine learning in behavioral research. Perspectives on Behavior Science, 43(4), 697–723. https://doi.org/10.1007/s40614-020-00270-y
Wulf, J., & Meierhofer, J. (2025). The impact of large language models on task automation in manufacturing services. Procedia CIRP, 134, 1089–1094. https://doi.org/10.1016/j.procir.2025.03.071
Xu, Z., Jain, S., & Kankanhalli, M. (2025). Hallucination is inevitable: An innate limitation of large language models. arXiv. https://doi.org/10.48550/arXiv.2401.11817
Zack, T., Lehman, E., Suzgun, M., Rodriguez, J. A., Celi, L. A., Gichoya, J., Jurafsky, D., Szolovits, P., Bates, D. W., Abdulnour, R.-E. E., Butte, A. J., & Alsentzer, E. (2024). Assessing the potential of GPT-4 to perpetuate racial and gender biases in health care: A model evaluation study. The Lancet Digital Health, 6, e12–e22. https://doi.org/10.1016/S2589-7500(23)00225-X