Demystifying AI for Behavior Analysts: Navigating Ethical Adoption and Algorithmic Bias (Citations)

  1. Anthropic. (2026, April 7). Claude Mythos Preview system card. Anthropic. https://anthropic.com
  2. Bai, Y., Jones, A., Ndousse, K., Askell, A., Chen, A., DasSarma, N., Drain, D., Fort, S., Ganguli, D., Henighan, T., Joseph, N., Kadavath, S., Kernion, J., Conerly, T., El-Showk, S., Elhage, N., Hatfield-Dodds, Z., Hernandez, D., Hume, T., . . . Kaplan, J. (2022). Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv. https://doi.org/10.48550/arXiv.2204.05862
  3. Batista, R. M., & Griffiths, T. L. (2026). A rational analysis of the effects of sycophantic AI. arXiv. https://doi.org/10.48550/arXiv.2602.14270
  4. Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., . . . Amodei, D. (2020). Language models are few-shot learners. arXiv. https://doi.org/10.48550/arXiv.2005.14165
  5. Chandra, K., Kleiman-Weiner, M., Ragan-Kelley, J., & Tenenbaum, J. B. (2026). Sycophantic chatbots cause delusional spiraling, even in ideal Bayesians. arXiv. https://doi.org/10.48550/arXiv.2602.19141
  6. Christiano, P., Leike, J., Brown, T. B., Martic, M., Legg, S., & Amodei, D. (2023). Deep reinforcement learning from human preferences. arXiv. https://doi.org/10.48550/arXiv.1706.03741
  7. Cox, D. J. (2025). Ethical behavior analysis in the age of artificial intelligence (AI): The importance of understanding model building while formal AI literacy curricula are developed. Perspectives on Behavior Science. Advance online publication. https://doi.org/10.1007/s40614-025-00459-z
  8. Cox, D. J., & Jennings, A. M. (2024). The promises and possibilities of artificial intelligence in the delivery of behavior analytic services. Behavior Analysis in Practice, 17(1), 123–136. https://doi.org/10.1007/s40617-023-00864-3
  9. Cox, D. J., & Sosine, J. (2025). A data-driven, algorithmic approach to recommending hours of ABA for individuals with ASD. Behavioral Interventions, 40, e70014. https://doi.org/10.1002/bin.70014
  10. Cox, D. J., Weil, L., Sosine, J., Jennings, A. M., & Santos, C. (2025). Getting more from your IOA data: Alternative measures to total, occurrence, and non-occurrence agreement. Behavioral Interventions, e70031. https://doi.org/10.1002/bin.70031
  11. Crossman, E. K. (1985). The kiss and the promise: A review of Hubert L. Dreyfus' What computers can't do: The limits of artificial intelligence. Journal of the Experimental Analysis of Behavior, 44(2), 271–277. https://doi.org/10.1901/jeab.1985.44-271
  12. Dufour, M.-M., Lanovaz, M. J., & Cardinal, P. (2020). Artificial intelligence for the measurement of vocal stereotypy. Journal of the Experimental Analysis of Behavior, 114(3), 368–380. https://doi.org/10.1002/jeab.636
  13. Guo, Y., Guo, M., Su, J., Yang, Z., Zhu, M., Li, H., Qiu, M., & Liu, S. S. (2024). Bias in large language models: Origin, evaluation, and mitigation. arXiv. https://doi.org/10.48550/arXiv.2411.10915
  14. Jennings, A. M., & Cox, D. J. (2023). Starting the conversation around the ethical use of artificial intelligence in applied behavior analysis. Behavior Analysis in Practice, 17(1), 107–122. https://doi.org/10.1007/s40617-023-00868-z
  15. Jošt, G., Taneski, V., & Karakatič, S. (2024). The impact of large language models on programming education and student learning outcomes. Applied Sciences, 14(10), 4115. https://doi.org/10.3390/app14104115
  16. Kosmyna, N., Hauptmann, E., Yuan, Y. T., Situ, J., Liao, X.-H., Beresnitzky, A. V., Braunstein, I., & Maes, P. (2025). Your brain on ChatGPT: Accumulation of cognitive debt when using an AI assistant for essay writing task. arXiv. https://doi.org/10.48550/arXiv.2506.08872
  17. Lanovaz, M. J. (2022). Some characteristics and arguments in favor of a science of machine behavior analysis. Perspectives on Behavior Science, 45(2), 399–419. https://doi.org/10.1007/s40614-022-00332-3
  18. Liu, S., Wright, A. P., Patterson, B. L., Wanderer, J. P., Turer, R. W., Nelson, S. D., McCoy, A. B., Sittig, D. F., & Wright, A. (2023). Using AI-generated suggestions from ChatGPT to optimize clinical decision support. Journal of the American Medical Informatics Association, 30(7), 1237–1245. https://doi.org/10.1093/jamia/ocad072
  19. Mahajan, A., Obermeyer, Z., Daneshjou, R., Lester, J., & Powell, D. (2025). Cognitive bias in clinical large language models. npj Digital Medicine, 8, 428. https://doi.org/10.1038/s41746-025-01790-0
  20. Mohamed, A., Assi, M., & Guizani, M. (2026). The impact of LLM-assistants on software developer productivity: A systematic review and mapping study. arXiv. https://doi.org/10.48550/arXiv.2507.03156
  21. Morris, C., Jones, S. H., & Oliveira, J. P. (2024). A practitioner's guide to measuring procedural fidelity. Behavior Analysis in Practice, 17(2), 643–655. https://doi.org/10.1007/s40617-024-00910-8
  22. Mutanga, M. B., Msane, J., Mndaweni, T. N., Hlongwane, B. B., & Ngcobo, N. Z. (2025). Exploring the impact of LLM prompting on students' learning. Trends in Higher Education, 4(3), 31. https://doi.org/10.3390/higheredu4030031
  23. O'Neil, C. (2016). Weapons of math destruction: How big data increases inequality and threatens democracy. Crown.
  24. Perrigo, B. (2023, January 18). Exclusive: OpenAI used Kenyan workers on less than $2 per hour to make ChatGPT less toxic. TIME. https://time.com/6247678/openai-chatgpt-kenya-workers/
  25. Poole-Dayan, E., Roy, D., & Kabbara, J. (2025). LLM targeted underperformance disproportionately impacts vulnerable users. arXiv. https://doi.org/10.48550/arXiv.2406.17737
  26. Raj, M., Berg, J. M., & Seamans, R. (2026). The artificial intelligence disclosure penalty: Humans persistently devalue AI-generated creative writing. Journal of Experimental Psychology: General, 155(4), 896–915. https://doi.org/10.1037/xge0001889
  27. Sinayev, A., & Courtney, C. (2025). Effectiveness of LLM-based AI assistance for small business productivity. Research Square. https://doi.org/10.21203/rs.3.rs-6481789/v1
  28. Singhal, K., Azizi, S., Tu, T., Mahdavi, S. S., Wei, J., Chung, H. W., Scales, N., Tanwani, A., Cole-Lewis, H., Pfohl, S., Payne, P., Seneviratne, M., Gamble, P., Kelly, C., Babiker, A., Schärli, N., Chowdhery, A., Mansfield, P., Demner-Fushman, D., . . . Natarajan, V. (2023). Large language models encode clinical knowledge. Nature, 620(7972), 172–180. https://doi.org/10.1038/s41586-023-06291-2
  29. Stephens, K. R., & Hutchison, W. R. (1992). Behavioral personal digital assistants: The seventh generation of computing. The Analysis of Verbal Behavior, 10, 149–156. https://doi.org/10.1007/BF03392881
  30. Sun, C., McEwan, A., Boulton, K. A., Demetriou, E. A., Sadozai, A. K., Lampit, A., & Guastella, A. J. (2025). Artificial intelligence for tracking social behaviours and supporting an autism spectrum disorder diagnosis: Systematic review and meta-analysis. EBioMedicine, 120, 105931. https://doi.org/10.1016/j.ebiom.2025.105931
  31. Tam, T. Y. C., Sivarajkumar, S., Kapoor, S., Stolyar, A. V., Polanska, K., McCarthy, K. R., Osterhoudt, H., Wu, X., Visweswaran, S., Fu, S., Mathur, P., Cacciamani, G. E., Sun, C., Peng, Y., & Wang, Y. (2024). A framework for human evaluation of large language models in healthcare derived from literature review. npj Digital Medicine, 7, 258. https://doi.org/10.1038/s41746-024-01258-7
  32. Templin, T., Fort, S., Padmanabham, P., Seshadri, P., Rimal, R., Oliva, J., Hassmiller Lich, K., Sylvia, S., & Sinnott-Armstrong, N. (2025). Framework for bias evaluation in large language models in healthcare settings. npj Digital Medicine, 8(1), 414. https://doi.org/10.1038/s41746-025-01786-w
  33. Turgeon, S., & Lanovaz, M. J. (2020). Tutorial: Applying machine learning in behavioral research. Perspectives on Behavior Science, 43(4), 697–723. https://doi.org/10.1007/s40614-020-00270-y
  34. Wulf, J., & Meierhofer, J. (2025). The impact of large language models on task automation in manufacturing services. Procedia CIRP, 134, 1089–1094. https://doi.org/10.1016/j.procir.2025.03.071
  35. Xu, Z., Jain, S., & Kankanhalli, M. (2025). Hallucination is inevitable: An innate limitation of large language models. arXiv. https://doi.org/10.48550/arXiv.2401.11817
  36. Zack, T., Lehman, E., Suzgun, M., Rodriguez, J. A., Celi, L. A., Gichoya, J., Jurafsky, D., Szolovits, P., Bates, D. W., Abdulnour, R.-E. E., Butte, A. J., & Alsentzer, E. (2024). Assessing the potential of GPT-4 to perpetuate racial and gender biases in health care: A model evaluation study. The Lancet Digital Health, 6, e12–e22. https://doi.org/10.1016/S2589-7500(23)00225-X