
How do you use AI to screen studies for a systematic review?


AI-assisted screening uses machine learning to rank the titles and abstracts a systematic review returns, so reviewers read the most relevant studies first. For Vietnamese and ESL researchers facing a first systematic review or meta-analysis, these tools can turn weeks of manual sifting into days, but only when they accelerate a transparent, standards-based process rather than replace human judgement.

This guide answers the seven questions Vietnamese researchers ask MAAS publishing mentors most often about using AI to screen studies for a systematic review aimed at a Scopus Q1/Q2 journal.

Author: MAAS Research Methods & Evidence Synthesis Desk · Reviewed by a Principal Publishing Advisor (PhD, Scopus Q1 author and reviewer in systematic reviews)
Last updated: 2026-06-21
Category: research-methods


What does AI-assisted screening actually do in a systematic review?

Direct answer: AI-assisted screening reads the titles and abstracts your search returns and predicts how likely each is to be relevant, so you review the most promising studies first and reach your included set faster. It supports the title-and-abstract stage; it does not design your protocol, judge study quality, or decide eligibility on its own.

Evidence: A systematic review of text-mining approaches found that automating study identification can meaningfully reduce screening workload while keeping recall high, framing these methods as decision support, not replacement (O'Mara-Eves et al., 2015). A later practical guide grouped the mature tools into two jobs reviewers delegate: prioritising relevant records and reducing how many must be read by hand (Marshall & Wallace, 2019).

Example: A Vietnamese master's student MAAS coached had a search returning over 4,000 records and assumed she would read every abstract by hand. Her mentor reframed it: an active-learning tool would surface likely-relevant studies early so she could stop once new hits dried up — cutting an estimated three weeks of reading to about five days.


Which AI screening tools do reviewers actually use?

Direct answer: Most evidence-synthesis teams use a small set of established tools. Rayyan offers collaborative title-and-abstract screening with relevance suggestions; ASReview applies active learning to reorder records as you label them; general-purpose large language models are an emerging, less validated option. Start with a tool that has a published methods paper and a transparent workflow.

Evidence: Rayyan is a free web and mobile app built to speed up initial screening of titles and abstracts through semi-automation (Ouzzani et al., 2016). ASReview is an open-source framework whose active learning re-ranks the unscreened pile each time a reviewer labels a record, which its developers show finds relevant studies far earlier than random reading (van de Schoot et al., 2021).

Example: A MAAS-coached pair of co-authors split the work: one screened in Rayyan for its blinding and conflict-resolution features, the other ran the same records through ASReview to confirm no relevant study sat late in the queue. The overlap gave them confidence before locking the included set.

Tool type | What it does at screening | Best first use
Rayyan | Collaborative title/abstract screening with relevance hints and blinding | Two-reviewer screening with conflict resolution
ASReview | Active learning that re-ranks unread records as you label | Prioritising a large search to find relevant studies early
General LLM (e.g. GPT-4) | Classifies a record as include/exclude from a prompt | Experimental second-pass check, not a sole screener
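
To make the re-ranking idea concrete, here is a minimal sketch of an active-learning loop. It is a toy, not ASReview's implementation: the word-count scorer, the function names (`tokens`, `score`, `screen`), the `oracle` callback standing in for the human reviewer, and the six sample titles are all assumptions made for illustration.

```python
import re
from collections import Counter

def tokens(text):
    """Lowercase word tokens from a title or abstract."""
    return re.findall(r"[a-z]+", text.lower())

def score(record, relevant_words, irrelevant_words):
    """Toy relevance score: words seen in included records count for,
    words seen in excluded records count against."""
    return sum(relevant_words[w] - irrelevant_words[w] for w in set(tokens(record)))

def screen(records, oracle, seed_index, budget):
    """Read up to `budget` records, always picking the highest-scoring unread one next."""
    relevant_words, irrelevant_words = Counter(), Counter()
    unread = set(range(len(records)))
    order = []
    next_index = seed_index
    for _ in range(budget):
        unread.discard(next_index)
        order.append(next_index)
        is_relevant = oracle(next_index)  # the human makes the decision
        target = relevant_words if is_relevant else irrelevant_words
        target.update(set(tokens(records[next_index])))
        if not unread:
            break
        # Re-rank the unread pile after every new label.
        next_index = max(sorted(unread),
                         key=lambda i: score(records[i], relevant_words, irrelevant_words))
    return order

# Hypothetical records; indices 0, 2 and 4 are the relevant ones.
records = [
    "Exercise therapy for chronic low back pain: randomised trial",
    "Survey of hospital parking fees",
    "Yoga exercise programme for low back pain in adults",
    "Soil erosion in river deltas",
    "Back pain outcomes after exercise therapy",
    "Parking policy and urban traffic",
]
relevant = {0, 2, 4}
order = screen(records, lambda i: i in relevant, seed_index=0, budget=len(records))
print(order)
```

Starting from one known relevant record, the loop reads the other two relevant titles before any irrelevant one. Real tools use trained classifiers rather than word counts, but the pattern is the same: the human labels, the model only reorders.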

How accurate is AI at title-and-abstract screening?

Direct answer: As a prioritisation aid, AI screening keeps recall high and cuts workload substantially, because you keep reading until relevant hits stop appearing. Used to auto-exclude records without human review, it is far less reliable, and its accuracy depends heavily on the topic and on how few of the records are truly relevant (dataset balance). The safe setup is to aim for high recall and have a human confirm every exclusion.

Evidence: The text-mining review reported large potential workload savings when automation prioritises rather than auto-excludes, while cautioning that performance varies by review and recall must be protected (O'Mara-Eves et al., 2015). Evaluated as a standalone screener, GPT-4's performance dropped across stages once adjusted for chance agreement and dataset imbalance, ranging from negligible to moderate depending on how many records were truly relevant (Khraisha et al., 2024).

Example: A Vietnamese doctoral candidate wanted an LLM to exclude records automatically. Her MAAS mentor showed her the imbalance problem — when only a handful of 3,000 records are relevant, a model that excludes aggressively looks accurate yet can drop the few studies that matter — so they used AI only to rank, confirming every exclusion by eye.
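
The imbalance problem in this example can be checked with simple arithmetic. The sketch below uses hypothetical counts: the article says only "a handful" of 3,000 records are relevant, so the figure of 12 relevant records and the model's hits and misses are assumptions chosen for illustration. The function name `screening_metrics` is likewise made up.

```python
def screening_metrics(true_pos, false_neg, false_pos, true_neg):
    """Return (accuracy, recall) for include/exclude decisions against human labels."""
    total = true_pos + false_neg + false_pos + true_neg
    accuracy = (true_pos + true_neg) / total
    relevant = true_pos + false_neg
    recall = true_pos / relevant if relevant else 0.0
    return accuracy, recall

# Hypothetical counts, for illustration only: 3,000 records, 12 truly relevant.
# An aggressive model keeps 4 relevant records, drops 8, and wrongly keeps 10 irrelevant ones.
accuracy, recall = screening_metrics(true_pos=4, false_neg=8, false_pos=10, true_neg=2978)
print(f"aggressive model: accuracy = {accuracy:.3f}, recall = {recall:.3f}")

# A model that excludes every record scores even higher on accuracy, with zero recall.
all_out_accuracy, all_out_recall = screening_metrics(
    true_pos=0, false_neg=12, false_pos=0, true_neg=2988
)
print(f"exclude everything: accuracy = {all_out_accuracy:.3f}, recall = {all_out_recall:.3f}")
```

Under these assumed counts the aggressive model is 99.4% accurate yet finds only a third of the relevant studies, and excluding everything reaches 99.6% accuracy with no relevant study found. This is why recall, not accuracy, is the number to protect.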


Can a large language model like GPT-4 screen studies on its own?

Direct answer: Not yet, not safely. Large language models can classify abstracts and sometimes match human reviewers on well-structured tasks, but their output is sensitive to prompt wording and dataset imbalance, and they can confidently exclude a relevant study. Treat an LLM as a second opinion, never your only screener.

Evidence: A pre-registered "human-out-of-the-loop" evaluation found GPT-4 highly sensitive to prompt design and data imbalance, with headline accuracy falling once chance agreement was accounted for (Khraisha et al., 2024). A false exclusion means a missing study that biases your synthesis, which is why reporting standards still expect human selection decisions (Page et al., 2021).

Example: A MAAS Publishing Advisory client tested GPT-4 on 200 abstracts she had already screened. It agreed on most but excluded three studies she had included, two central to her meta-analysis. The LLM was useful for flagging disagreements to re-check, not for the final call.
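
A test like this one can be summarised with a chance-corrected agreement statistic. The sketch below computes Cohen's kappa, one common choice; whether the cited evaluation used exactly this statistic is not stated here. Only the 200 abstracts and the three missed studies come from the example. Every other count is a hypothetical filler, and `cohens_kappa` is an illustrative name.

```python
def cohens_kappa(both_include, human_only, model_only, both_exclude):
    """Return (raw agreement, Cohen's kappa) for two screeners' include/exclude decisions."""
    n = both_include + human_only + model_only + both_exclude
    observed = (both_include + both_exclude) / n
    human_rate = (both_include + human_only) / n   # share the human included
    model_rate = (both_include + model_only) / n   # share the model included
    # Agreement expected if both screeners decided independently at these rates.
    expected = human_rate * model_rate + (1 - human_rate) * (1 - model_rate)
    if expected == 1:
        return observed, 1.0  # both screeners made one identical decision throughout
    return observed, (observed - expected) / (1 - expected)

# 200 abstracts; the model excluded 3 studies the human had included (from the example).
# Assumed for illustration: 7 joint includes, 5 model-only includes, 185 joint excludes.
observed, kappa = cohens_kappa(both_include=7, human_only=3, model_only=5, both_exclude=185)
print(f"raw agreement = {observed:.2f}, kappa = {kappa:.2f}")
```

With these assumed counts, raw agreement is 0.96 while kappa is about 0.62. Most of the headline agreement comes from the large pile of records that both screeners excluded.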


How do you use AI screening without breaking systematic review standards?

Direct answer: Keep two independent human reviewers at the title-and-abstract stage, use AI to prioritise and surface disagreements, resolve conflicts by discussion, and never let a model be the sole reason a study is excluded. The AI changes the order you read in, not the rigour of how you decide.

Evidence: Cochrane methods guidance recommends two reviewers independently screen records to reduce the risk of wrongly excluding eligible studies (Higgins et al., 2024). PRISMA 2020 asks authors to specify the selection process, including how many reviewers screened each record and whether automation tools were used — so an AI-assisted workflow must be reportable, not hidden (Page et al., 2021).

Example: A Vietnamese research team MAAS mentored kept their two-reviewer design but added ASReview to set reading order, stopping only after a fixed run of consecutive irrelevant records. Every excluded study still had a human reason recorded, matching what a Q1 methods reviewer expects.
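
The stopping rule in this example, a fixed run of consecutive irrelevant records, can be sketched in a few lines. The function name, the run length of three, and the label sequence are hypothetical; a real review would set the run length in its protocol before screening starts.

```python
def records_read_before_stopping(labels_in_reading_order, stop_after):
    """Count records read before `stop_after` consecutive irrelevant labels end screening.

    `labels_in_reading_order` holds the human decisions (True = relevant)
    in the order the tool presented the records.
    """
    run = 0
    for position, is_relevant in enumerate(labels_in_reading_order, start=1):
        run = 0 if is_relevant else run + 1
        if run >= stop_after:
            return position
    return len(labels_in_reading_order)  # the rule never triggered; everything was read

# Hypothetical decisions in prioritised order.
labels = [True, True, False, True, False, False, False, True]
print(records_read_before_stopping(labels, stop_after=3))
print(records_read_before_stopping(labels, stop_after=4))
```

With a run of three, screening stops at the seventh record and misses the relevant record in eighth place. With a run of four, all eight are read. The run length is a recall decision, so it belongs in the protocol and the methods section.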


How do you report your use of AI tools so reviewers trust your review?

Direct answer: State which tool you used, at which stage, for what purpose, and how humans stayed in control. Name the software and version, describe whether AI prioritised or classified records, confirm two reviewers made eligibility decisions, and explain how you protected recall. Transparent reporting turns "we used AI" from a red flag into a strength.

Evidence: PRISMA 2020 requires a clear description of the selection process and any automation tools used to identify and screen records (Page et al., 2021). The automation literature stresses that tools should be reported in enough detail for a reader to judge their effect on completeness, because undocumented automation undermines trust in the search (Marshall & Wallace, 2019).

Example: A MAAS-coached author wrote two plain sentences in her methods: the tool and version used to prioritise screening, and confirmation that two reviewers independently decided eligibility with conflicts resolved by a third. That short, honest paragraph pre-empted the exact question a reviewer would otherwise raise.
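
One way to build the reporting habit is to record the reportable facts in a fixed structure as you go. The sketch below is an assumption about how that might look, not a PRISMA template: the `ScreeningReport` class, its field names, and the wording of the generated sentences are illustrative, and the version string is a placeholder.

```python
from dataclasses import dataclass

@dataclass
class ScreeningReport:
    """Facts a methods section needs about an AI-assisted screening stage."""
    tool: str
    version: str
    role: str                 # e.g. "prioritise" or "classify"
    reviewers: int
    conflict_resolution: str
    stopping_rule: str

    def methods_text(self):
        """Render the facts as plain methods-section sentences."""
        return (
            f"{self.tool} (version {self.version}) was used to {self.role} "
            f"records at the title-and-abstract stage. "
            f"Eligibility was decided independently by {self.reviewers} reviewers, "
            f"with conflicts resolved by {self.conflict_resolution}. "
            f"Screening stopped {self.stopping_rule}."
        )

# Placeholder values; replace with what your review actually did.
report = ScreeningReport(
    tool="ASReview",
    version="x.y.z",
    role="prioritise",
    reviewers=2,
    conflict_resolution="a third reviewer",
    stopping_rule="after a pre-specified run of consecutive irrelevant records",
)
text = report.methods_text()
print(text)
```

Filling in the structure before screening starts makes gaps visible early, such as an unrecorded software version or an undefined stopping rule.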


How can Vietnamese and ESL researchers use AI screening to reach a first publication?

Direct answer: Use AI to remove the volume barrier that slows first-time ESL reviewers, not to do the thinking for you. Draft your eligibility criteria carefully first, let a tool prioritise your reading so your hours go to the studies that matter, and build the reporting habit from day one. The result is a defensible review you can submit with confidence.

Evidence: Active-learning tools find relevant studies early so reviewers stop sooner without sacrificing recall — the efficiency a solo or small ESL team needs (van de Schoot et al., 2021). Because PRISMA and Cochrane anchor credibility in a transparent, human-led selection process (Higgins et al., 2024; Page et al., 2021), an ESL researcher who documents an AI-assisted, human-controlled workflow competes on rigour, not first-language fluency.

Example: A MAAS mentor guided a Vietnamese undergraduate building a systematic review for a scholarship application through the Outline → Draft → Final model: an outline fixing eligibility criteria and search terms, a draft where ASReview ordered screening and two screeners decided eligibility, and a final PRISMA-compliant write-up. She stayed the author throughout; the mentor advised and did not write any part of the review.


Frequently asked questions

Does using AI to screen studies count as misconduct?
No, when it is disclosed and human-controlled. Using software to prioritise which abstracts you read is an accepted efficiency, and reporting guidelines ask you to describe any automation tools. It only becomes a problem if you hide it or let a model make eligibility decisions no human checked.

Can AI replace the second human screener?
No. Methods guidance still expects two independent reviewers, because a single screener — human or model — raises the risk of wrongly excluding eligible studies. AI can flag disagreements, but it does not satisfy the two-reviewer standard on its own.

Which tool should a first-time reviewer start with?
Most first-timers start with Rayyan for collaborative title-and-abstract screening, then add ASReview when the search is large and active learning can surface relevant studies early. Both have published methods papers, making them easy to cite and report.

Will journals reject a review that used AI screening?
Not for using AI itself, provided you report it properly and kept humans in control of eligibility. Reviewers react badly to undocumented automation or to AI auto-excluding without checking, so the deciding factor is transparency and recall, not the tool.

How much time does AI screening actually save?
It varies by topic and search size, but prioritisation tools can cut a large screening workload substantially by letting you stop once relevant hits stop appearing. The saving comes from reading fewer irrelevant abstracts, not skipping human decisions on borderline studies.

Can MAAS help me run an AI-assisted systematic review?
Yes. MAAS Publishing Advisory coaches Vietnamese researchers through eligibility criteria, tool selection, a recall-protecting screening workflow, and PRISMA-compliant reporting using the Outline → Draft → Final model. Book a consultation through our contact page.


Ready to run a rigorous, AI-assisted systematic review?

A systematic review is judged on whether a reviewer can trust that you found and selected studies fairly, and AI screening helps only inside a transparent, human-led process. MAAS Publishing Advisory pairs you with a PhD-level mentor — 23% of our experts hold doctorates — for a free 20-minute consultation, matches you to the right advisor within 48 hours, and backs every engagement with our three-tier Pass / Merit / Distinction guarantee and a 90-day post-submission warranty. We coach; you stay the author.

Book a Publishing Advisory consultation with MAAS Academic Mentoring →



References

  • Higgins, J. P. T., Thomas, J., Chandler, J., Cumpston, M., Li, T., Page, M. J., & Welch, V. A. (Eds.). (2024). Cochrane handbook for systematic reviews of interventions (Version 6.5). Cochrane. Retrieved June 21, 2026, from https://www.cochrane.org/handbook
  • Khraisha, Q., Put, S., Kappenberg, J., Warraitch, A., & Hadfield, K. (2024). Can large language models replace humans in systematic reviews? Evaluating GPT-4's efficacy in screening and extracting data from peer-reviewed and grey literature in multiple languages. Research Synthesis Methods, 15(4), 616–626. https://doi.org/10.1002/jrsm.1715
  • Marshall, I. J., & Wallace, B. C. (2019). Toward systematic review automation: A practical guide to using machine learning tools in research synthesis. Systematic Reviews, 8, Article 163. https://doi.org/10.1186/s13643-019-1074-9
  • O'Mara-Eves, A., Thomas, J., McNaught, J., Miwa, M., & Ananiadou, S. (2015). Using text mining for study identification in systematic reviews: A systematic review of current approaches. Systematic Reviews, 4, Article 5. https://doi.org/10.1186/2046-4053-4-5
  • Ouzzani, M., Hammady, H., Fedorowicz, Z., & Elmagarmid, A. (2016). Rayyan—a web and mobile app for systematic reviews. Systematic Reviews, 5, Article 210. https://doi.org/10.1186/s13643-016-0384-4
  • Page, M. J., McKenzie, J. E., Bossuyt, P. M., Boutron, I., Hoffmann, T. C., Mulrow, C. D., Shamseer, L., Tetzlaff, J. M., Akl, E. A., Brennan, S. E., Chou, R., Glanville, J., Grimshaw, J. M., Hróbjartsson, A., Lalu, M. M., Li, T., Loder, E. W., Mayo-Wilson, E., McDonald, S., ... Moher, D. (2021). The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ, 372, Article n71. https://doi.org/10.1136/bmj.n71
  • van de Schoot, R., de Bruin, J., Schram, R., Zahedi, P., de Boer, J., Weijdema, F., Kramer, B., Huijts, M., Hoogerwerf, M., Ferdinands, G., Harkema, A., Willemsen, J., Ma, Y., Fang, Q., Hindriks, S., Tummers, L., & Oberski, D. L. (2021). An open source machine learning framework for efficient and transparent systematic reviews. Nature Machine Intelligence, 3(2), 125–133. https://doi.org/10.1038/s42256-020-00287-7

This article is part of the MAAS Journal series for Vietnamese international postgraduate students and researchers. MAAS Publishing Advisory is an advisory partner — we coach authors through the Outline → Draft → Final delivery model with developmental feedback from PhD-level, Scopus-published mentors. We do not write, submit, or guarantee acceptance of work on an author's behalf.
