
Guideline Development Process

This guideline was developed using a process intended to meet standards of the Institute of Medicine (2011) (now known as the National Academy of Medicine). The process is fully described in a document available on the American Psychiatric Association (APA) website at www.psychiatry.org/psychiatrists/practice/clinical-practice-guidelines/guideline-development-process.

Management of Potential Conflicts of Interest

Members of the Guideline Writing Group (GWG) are required to disclose all potential conflicts of interest before appointment, before and during guideline development, and on publication. If any potential conflicts are found or disclosed during the guideline development process, the member must recuse himself or herself from any related discussion and voting on a related recommendation. The members of both the GWG and the Systematic Review Group (SRG) reported no conflicts of interest. The “Disclosures” section includes more detailed disclosure information for each GWG and SRG member involved in the guideline’s development.

Guideline Writing Group Composition

The GWG was initially composed of eight psychiatrists with general research and clinical expertise and a psychiatric resident (A. D.). This non-topic-specific group was intended to provide diverse and balanced views on the guideline topic in order to minimize potential bias. One psychiatrist (P. B.) and one psychologist (M. F. L.) were added to provide subject matter expertise in schizophrenia. An additional member (A. S. Y.) provided input on quality measure considerations. The vice-chair of the GWG (L. J. F.) provided methodological expertise on such topics as appraising the strength of research evidence. The GWG was also diverse and balanced with respect to other characteristics, such as geographic location and demographic background. The National Alliance on Mental Illness, Mental Health America, and the Schizophrenia and Related Disorders Alliance of America reviewed the draft and provided perspective from patients, families, and other care partners.

Systematic Review Methodology

The Agency for Healthcare Research and Quality’s (AHRQ) systematic review, Treatments for Schizophrenia in Adults (McDonagh et al. 2017), served as the predominant source of information for this guideline. APA also conducted a search of additional systematic reviews and meta-analyses to include consideration of placebo-controlled trials that were not part of the AHRQ review. An additional search was conducted in MEDLINE (PubMed) and PsycINFO on treatments for neurological side effects of antipsychotic medications, including acute dystonia, parkinsonism, akathisia, and tardive syndromes. The search terms, limits used, and dates of these searches are available in Appendix B. Results were limited to English-language studies in human adults (ages 18 and older). These titles and abstracts were reviewed for relevance by one individual (L. J. F.). Available guidelines from other organizations were also reviewed (Addington et al. 2017a, 2017b; Barnes et al. 2011; Buchanan et al. 2010; Crockford and Addington 2017; Galletly et al. 2016; Hasan et al. 2012; National Institute for Health and Care Excellence 2014; Pringsheim et al. 2017; Scottish Intercollegiate Guidelines Network 2013).

Rating the Strength of Supporting Research Evidence

Strength of supporting research evidence describes the level of confidence that findings from scientific observation and testing of an effect of an intervention reflect the true effect. Confidence is enhanced by such factors as rigorous study design and minimal potential for study bias.

Ratings were determined, in accordance with the AHRQ’s “Methods Guide for Effectiveness and Comparative Effectiveness Reviews” (Agency for Healthcare Research and Quality 2014), by the methodologist (L. J. F.) and reviewed by members of the SRG and GWG. Available clinical trials were assessed across four primary domains: risk of bias, consistency of findings across studies, directness of the effect on a specific health outcome, and precision of the estimate of effect.

The ratings are defined as follows:

  • High (denoted by the letter A) = high confidence that the evidence reflects the true effect. Further research is very unlikely to change our confidence in the estimate of effect.

  • Moderate (denoted by the letter B) = moderate confidence that the evidence reflects the true effect. Further research may change our confidence in the estimate of effect and may change the estimate.

  • Low (denoted by the letter C) = low confidence that the evidence reflects the true effect. Further research is likely to change our confidence in the estimate of effect and is likely to change the estimate.

The AHRQ has an additional category of “insufficient” for evidence that is unavailable or does not permit estimation of an effect. APA instead assigns the low rating when evidence is insufficient, because in both cases there is low confidence in the conclusion and further research, if conducted, would likely change the estimate of effect or the confidence in that estimate.

Rating the Strength of Guideline Statements

Each guideline statement is separately rated to indicate strength of recommendation and strength of supporting research evidence. Strength of recommendation describes the level of confidence that potential benefits of an intervention outweigh potential harms. This level of confidence is a consensus judgment of the authors of the guideline and is informed by available evidence, which includes evidence from clinical trials as well as expert opinion and patient values and preferences.

There are two possible ratings: recommendation or suggestion. A recommendation (denoted by the numeral 1 after the guideline statement) indicates confidence that the benefits of the intervention clearly outweigh harms. A suggestion (denoted by the numeral 2 after the guideline statement) indicates greater uncertainty. Although the benefits of the intervention are still viewed as outweighing the harms, the balance of benefits and harms is more difficult to judge, or the benefits or the harms may be less clear. With a suggestion, patient values and preferences may be more variable, and this can influence the clinical decision that is ultimately made. These strengths of recommendation correspond to ratings of strong or weak (also termed conditional) as defined under the Grading of Recommendations Assessment, Development and Evaluation (GRADE) method for rating recommendations in clinical practice guidelines (described in publications such as Guyatt et al. 2008 and others available on the website of the GRADE working group at www.gradeworkinggroup.org).

When a negative statement is made, ratings of strength of recommendation should be understood as meaning the inverse of the above (e.g., recommendation indicates confidence that harms clearly outweigh benefits).

The GWG determined ratings of strength of recommendation by a modified Delphi method using blind, iterative voting and discussion. In order for the GWG members to be able to ask for clarifications about the evidence, the wording of statements, or the process, the vice-chair of the GWG served as a resource and did not vote on statements. All other formally appointed GWG members, including the chair, voted.

In weighing potential benefits and harms, GWG members considered the strength of supporting research evidence, their own clinical experiences and opinions, and patient preferences. For a statement to be rated as a recommendation, at least 10 of the 11 voting members must have voted to recommend the intervention or assessment within three rounds of voting; that is, no more than 1 member could vote other than “recommend.” On the basis of discussion among the GWG members, the wording of statements could be adjusted between voting rounds. If this level of consensus was not achieved, the GWG could instead agree to make a suggestion rather than a recommendation. No suggestion or statement could be made if 3 or more members voted “no statement.” Differences of opinion within the GWG about ratings of strength of recommendation, if any, are described in the subsection “Balancing of Potential Benefits and Harms in Rating the Strength of the Guideline Statement” for each statement.
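As an illustration only, the consensus thresholds described above can be sketched in code; this sketch is not part of the guideline or its methodology, and the function name and vote labels (“recommend,” “suggest,” “no statement”) are hypothetical conveniences:

```python
from collections import Counter

def classify_votes(votes):
    """Classify a round of GWG votes per the consensus thresholds described
    above. Assumes 11 voting members; labels and logic are illustrative only."""
    counts = Counter(votes)
    # A recommendation requires at least 10 of the 11 members voting "recommend".
    if counts["recommend"] >= 10:
        return "recommendation"
    # No suggestion or statement is made if 3 or more members vote "no statement".
    if counts["no statement"] >= 3:
        return "no statement"
    # Otherwise the group may agree on a suggestion rather than a recommendation.
    return "suggestion"
```

Note that in the actual process, wording could be adjusted and votes retaken across up to three rounds before a final classification was reached.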

Use of Guidelines to Enhance Quality of Care

Clinical practice guidelines can help enhance quality by synthesizing available research evidence and delineating recommendations for care on the basis of the available evidence. In some circumstances, practice guideline recommendations will be appropriate to use in developing quality measures. Guideline statements can also be used in other ways, such as educational activities or electronic clinical decision support, to enhance the quality of care that patients receive. Furthermore, when availability of services is a major barrier to implementing guideline recommendations, improved tracking of service availability and program development initiatives may need to be implemented by health organizations, health insurance plans, federal or state agencies, or other regulatory programs.

Typically, guideline recommendations that are chosen for development into quality measures will advance one or more aims of the Institute of Medicine’s report on Crossing the Quality Chasm (Institute of Medicine Committee on Quality of Health Care in America 2001) and the ongoing work guided by the multistakeholder-integrated, AHRQ-led National Quality Strategy by facilitating care that is safe, effective, patient centered, timely, efficient, and equitable. To achieve these aims, a broad range of quality measures (Watkins et al. 2015) is needed that spans the entire continuum of care (e.g., prevention, screening, assessment, treatment, continuing care), addresses the different levels of the health system hierarchy (e.g., system-wide, organization, program/department, individual clinicians), and includes measures of different types (e.g., process, outcome, patient-centered experience). Emphasis is also needed on factors that influence the dissemination and adoption of evidence-based practices (Drake et al. 2008; Greenhalgh et al. 2004; Horvitz-Lennon et al. 2009).

Measure development is complex and requires detailed development of specification and pilot testing (Center for Health Policy/Center for Primary Care and Outcomes Research and Battelle Memorial Institute 2011; Fernandes-Taylor and Harris 2012; Iyer et al. 2016; Pincus et al. 2016; Watkins et al. 2011). Generally, however, measure development should be guided by the available evidence and focused on measures that are broadly relevant and meaningful to patients, clinicians, and policy makers. Measure feasibility is another crucial aspect of measure development but is often decided on the basis of current data availability, which limits opportunities for development of novel measurement concepts. Furthermore, innovation in workflow and data collection systems can benefit from looking beyond practical limitations in the early development stages in order to foster development of meaningful measures.

Often, quality measures will focus on gaps in care or on care processes and outcomes that have significant variability across specialties, health care settings, geographic areas, or patients’ demographic characteristics. Administrative databases, registries, and data from electronic health records can help to identify gaps in care and key domains that would benefit from performance improvements (Acevedo et al. 2015; Patel et al. 2015; Watkins et al. 2016). Nevertheless, for some guideline statements, evidence of practice gaps or variability will be based on anecdotal observations if the typical practices of psychiatrists and other health professionals are unknown. Variability in the use of guideline-recommended approaches may reflect appropriate differences that are tailored to the patient’s preferences, treatment of co-occurring illnesses, or other clinical circumstances that may not have been studied in the available research. On the other hand, variability may indicate a need to strengthen clinician knowledge or address other barriers to adoption of best practices (Drake et al. 2008; Greenhalgh et al. 2004; Horvitz-Lennon et al. 2009). When performance is compared among organizations, variability may reflect a need for quality improvement initiatives to improve overall outcomes but could also reflect case-mix differences such as socioeconomic factors or the prevalence of co-occurring illnesses.

When a guideline recommendation is considered for development into a quality measure, it must be possible to define the applicable patient group (i.e., the denominator) and the clinical action or outcome of interest that is measured (i.e., the numerator) in validated, clear, and quantifiable terms. Furthermore, the health system’s or clinician’s performance on the measure must be readily ascertained from chart review, patient-reported outcome measures, registries, or administrative data. Documentation of quality measures can be challenging and, depending on the practice setting, can pose practical barriers to meaningful interpretation of quality measures based on guideline recommendations. For example, when recommendations relate to patient assessment or treatment selection, clinical judgment may need to be used to determine whether the clinician has addressed the factors that merit emphasis for an individual patient. In other circumstances, standardized instruments can facilitate quality measurement reporting, but it is difficult to assess the appropriateness of clinical judgment in a validated, standardized manner. Furthermore, utilization of standardized assessments remains low (Fortney et al. 2017), and clinical findings are not routinely documented in a standardized format. Many clinicians appropriately use free text prose to describe symptoms, response to treatment, discussions with family, plans of treatment, and other aspects of care and clinical decision-making. Reviewing these free text records for measurement purposes would be impractical, and it would be difficult to hold clinicians accountable to such measures without significant increases in electronic medical record use and advances in natural language processing technology.

Conceptually, quality measures can be developed for purposes of accountability, for internal or health system–based quality improvement, or both. Accountability measures require clinicians to report their rate of performance of a specified process, intermediate outcome, or outcome in a specified group of patients. Because these data are used to determine financial incentives or penalties based on performance, accountability measures must be scientifically validated, have a strong evidence base, and fill gaps in care. In contrast, internal or health system–based quality improvement measures are typically designed by and for individual providers, health systems, or payers. They typically focus on measurements that can suggest ways for clinicians or administrators to improve efficiency and delivery of services within a particular setting. Internal or health system–based quality improvement programs may or may not link performance with payment, and, in general, these measures are not subject to strict testing and validation requirements.

Quality improvement activities, including performance measures derived from these guidelines, should yield improvements in quality of care to justify any clinician burden (e.g., documentation burden) or related administrative costs (e.g., for manual extraction of data from charts, for modifications of electronic medical record systems to capture required data elements). Possible unintended consequences of any derived measures also need to be addressed in testing of a fully specified measure in a variety of practice settings. For example, highly specified measures may lead to overuse of standardized language that does not accurately reflect what has occurred in practice. If multiple discrete fields are used to capture information on a paper or electronic record form, data will be easily retrievable and reportable, but oversimplification is a possible unintended consequence of measurement. Just as guideline developers must balance the benefits and harms of a particular guideline recommendation, developers of performance measures must weigh the potential benefits, burdens, and unintended consequences in optimizing quality measure design and testing.

External Review

This guideline was made available for review in May–June 2019 by stakeholders, including the APA membership, scientific and clinical experts, allied organizations, and the public. In addition, a number of patient advocacy organizations were invited to provide input. A total of 98 individuals and 20 organizations submitted comments on the guideline (for a list of the names, see the section “Individuals and Organizations That Submitted Comments”). The chair and co-chair of the GWG reviewed and addressed all comments received; substantive issues were reviewed by the GWG.

Funding and Approval

This guideline development project was funded and supported by the APA without any involvement of industry or external funding. The guideline was submitted to the APA Assembly and APA Board of Trustees and approved on November 17, 2019, and December 15, 2019, respectively.