
Good review practice: a researcher guide to systematic review methodology in the sciences of food and health

The importance of the search


The literature search lays the foundation of evidence for a systematic review.  A weak search degrades and potentially invalidates a review’s value.  An information professional or librarian with systematic review experience should be a part of the systematic review team if possible.  If not possible, training and consultation with such a professional should be sought to ensure that the searching step of a review is competent. 

Three principles shape a systematic review’s search.   The search must be systematic, comprehensive, and reproducible.

A systematic search is deliberately built to take full advantage of each database's structure and syntax, so that it retrieves the relevant literature from that database as effectively as possible.  A systematic review search will have a single search string for each database.

A comprehensive search is built on two factors.  First is the robustness of the string, which must capture the variety of terms describing key concepts across the literature, combined with controlled vocabulary terms.  Second is running the string, appropriately modified, in multiple databases chosen for the likelihood that they index the needed literature.  Depending on the SR question, database searches may be supplemented by searching for grey literature and unpublished studies.

A reproducible search captures the exact search string as configured for each database and its hosting platform.  The strings are published initially in the protocol and then in the body of, or as a supplement to, the finished study.  If the exact string, database, or platform is missing from the reporting, the search is not actually reproducible.

Precision versus recall


Every search balances recall and precision.  Recall is the number of relevant results a search retrieves compared to all the relevant results indexed in a database.  Precision is the number of relevant results retrieved compared to the total number of results retrieved.  A more precise search returns a higher proportion of relevant results but is more likely to exclude relevant ones. Ideally a systematic review of quantitative evidence is truly comprehensive, but that ideal must be balanced against the authors' capacity to screen results.
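The two measures can be stated as simple ratios.  A minimal sketch, with invented numbers for illustration:

```python
def recall(relevant_retrieved: int, relevant_in_database: int) -> float:
    """Share of all relevant records indexed in the database that the search found."""
    return relevant_retrieved / relevant_in_database


def precision(relevant_retrieved: int, total_retrieved: int) -> float:
    """Share of all retrieved records that are actually relevant."""
    return relevant_retrieved / total_retrieved


# Hypothetical search: 80 of the 100 relevant indexed records found,
# 4,000 records retrieved in total.
print(recall(80, 100))      # 0.8  -> high recall
print(precision(80, 4000))  # 0.02 -> low precision: a heavy screening burden
```

The example shows the trade-off in miniature: a string broad enough to find 80% of the relevant records may bury them among thousands of irrelevant ones that still must be screened.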

The question's framing might introduce opportunities to limit the results that must be screened, by limiting to human studies, for instance, or to a particular type of study, or to a geographical region. 

Good practice point:  All of the results from the finalised search must be downloaded, deduplicated, and screened for a systematic review.
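Deduplication is normally handled by reference-management software, but the underlying logic is simple to sketch.  This hypothetical example (the records and matching rule are invented for illustration) keeps the first copy of each record, matching on DOI when present and otherwise on a normalised title:

```python
def dedupe(records):
    """Keep the first copy of each record.

    Match on DOI when present, otherwise on a whitespace- and
    case-normalised title. Real projects use reference-management
    software, but the logic is the same.
    """
    seen, unique = set(), []
    for rec in records:  # each rec is a dict with optional 'doi' and 'title' keys
        doi = (rec.get("doi") or "").lower().strip()
        key = doi if doi else " ".join(rec.get("title", "").lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


records = [
    {"doi": "10.1000/xyz1", "title": "Vitamin D and bone health"},
    {"doi": "10.1000/XYZ1", "title": "Vitamin D and Bone Health"},  # duplicate DOI
    {"title": "Fermentation  kinetics of kimchi"},
    {"title": "fermentation kinetics of kimchi"},                   # duplicate title
]
print(len(dedupe(records)))  # 2
```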


Search steps


Choosing databases

A systematic review cannot be based on evidence from a single database.  The search should be run in as many databases as are available to the review team and include relevant research.

Which databases are most appropriate for a systematic review will depend on the subject focus of the review question.  To judge if a database is potentially useful, look at the database's stated scope and content, which describes, in general terms, what will be found in the database.  Bear in mind that a database often includes a broader spread of literature than its name implies.  Other important strategies include running exploratory searches in any database that could be useful, and consulting with a librarian.

Databases to consider searching include—but are not limited to—FSTA, CAB Abstracts, Medline or PubMed, SciELO, Web of Science Core Collection, Agricola, Scopus, EMBASE, and Biosis Previews.  

Good practice point:  An academic search engine like Google Scholar is not a database.  While Google Scholar may be a useful tool for supplementing the evidence with grey literature, it sits outside the databases that will be the main sources for finding the evidence base for the systematic review.

Identify key articles


A group of key articles representing the evidence that the search strategy needs to find is equally important at the beginning of the search process and in the testing phase.  At the start, they are key sources to be mined for terms when building the search strategy.  At the end of the process, they are used to test the search string in each database included in the search.

Develop the search strategy


Developing the search strategy takes expertise and time.  Two components make up a search string: the terms used, and the Boolean operators connecting them.  The terms will include both free text search terms and controlled vocabulary terms.

The free text search terms are those that have been brainstormed and gathered to capture the widest appropriate terminology representing each concept in the question framework.

Controlled vocabulary terms are pulled from the thesauri that undergird subject-specific databases.  These vocabularies can have specific names, like MeSH for PubMed and Medline, or they may be called subject terms, keywords, descriptors, or subject headings, depending upon the platform hosting the database.  These terms are applied by indexers to each record in a database.  By adding regularized terms that capture the main concepts and topics addressed in the research a record represents, they make relevant research easier to find.  They can also be “exploded” so that a search includes the term itself plus any more specific terms subsumed under it, which improves the comprehensiveness of the search.  Controlled vocabulary terms also pull together many variations on a term.

A database’s thesaurus should be searched to identify appropriate terms.  The specifics of how to search a thesaurus and explode its terms vary according to the database and the platform it is being searched on.  The help section of any database will provide guidance, as will an experienced librarian.
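Exploding can be pictured as collecting a term plus every narrower term beneath it in the thesaurus hierarchy.  A minimal sketch using an invented thesaurus fragment (the terms and hierarchy are made up for illustration):

```python
# Invented thesaurus fragment: term -> list of immediately narrower terms.
THESAURUS = {
    "fermented foods": ["yogurt", "kimchi"],
    "kimchi": ["baechu kimchi"],
}


def explode(term, thesaurus):
    """Return the term plus all narrower terms, to the bottom of the hierarchy."""
    terms = [term]
    for narrower in thesaurus.get(term, []):
        terms.extend(explode(narrower, thesaurus))
    return terms


print(explode("fermented foods", THESAURUS))
# ['fermented foods', 'yogurt', 'kimchi', 'baechu kimchi']
```

Note that some platforms explode all levels down and others only one level; the chart later in this guide shows how this varies for a single database.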

Controlled vocabulary terms should not be confused with the author supplied keywords that are included in records in the Web of Science Core Collection and Scopus.   Those terms are idiosyncratic and do not collate variations of a term. 

Once appropriate terms have been gathered and tested for each concept, they need to be pulled together into a search string.  All the terms that represent a single concept are connected by the Boolean operator OR, and the controlled vocabulary terms are added to those terms with OR.  Finally, the section built for each concept is connected to the others with the Boolean operator AND.  Every result retrieved by the search will therefore contain at least one term from each concept block included in the search string.
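The OR-within-concepts, AND-between-concepts pattern can be sketched programmatically.  The concepts and terms below are invented for illustration:

```python
def concept_block(terms):
    """OR together all terms (free text and controlled vocabulary) for one concept."""
    return "(" + " OR ".join(terms) + ")"


def build_search(concepts):
    """AND together one parenthesised block per concept."""
    return " AND ".join(concept_block(terms) for terms in concepts)


# Invented example: a question about probiotics and gut health.
concepts = [
    ['probiotic*', '"lactic acid bacteria"', 'Lactobacillus'],
    ['"gut health"', 'microbiome*', '"intestinal flora"'],
]
print(build_search(concepts))
# (probiotic* OR "lactic acid bacteria" OR Lactobacillus) AND ("gut health" OR microbiome* OR "intestinal flora")
```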

Find additional information about building a search string, along with guidance on formatting terms (truncation), combining terms for sensitivity (adjacency searching), and the pros and cons of limits, filters, and hedges, in the supplementary section on building a search.

Good practice point: Sometimes including a concept in the search does not work well.  Outcomes, for example, tend not to be mentioned in a record's title or abstract, so if the outcome is included as a concept in the search, there is a risk that relevant studies will be missed.  Each concept added with AND further restricts the search.

Test the search and translate to additional databases


When all the terms for all concepts have been gathered and combined with correct syntax (the rules governing how terms need to be connected in each interface) for the first database, the search string must be reviewed and tested. The first level of testing should be run by the author of the string, looking for any internal errors, but also being sure that it is retrieving all of the key articles in the database being searched.  The string should also be reviewed by subject experts in the review team, to ensure that no free text search terms have been omitted.  Ideally the string should also be peer reviewed or checked by a librarian or information professional.[2]   

The search string as it has been developed in the first database will not run correctly if it is simply copied and pasted into others.  Each subject specific database will have its own controlled vocabulary built around its subject focus, and each database platform will have its own syntax.   Boolean AND and OR are used across platforms, but fields, proximity operators and the process of building a search string vary. 
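As one concrete example of this variation, proximity (adjacency) operators take different forms on different platforms — commonly documented as adjN on Ovid, Nn on EBSCOhost, and NEAR/n on Web of Science, though the exact forms should always be verified against each platform's help pages.  A small lookup-table sketch:

```python
# Proximity ("within n words") operator syntax on three common platforms.
# Verify against each platform's help documentation before relying on these.
PROXIMITY = {
    "Ovid": "ADJ{n}",              # e.g. food ADJ3 safety
    "EBSCOhost": "N{n}",           # e.g. food N3 safety
    "Web of Science": "NEAR/{n}",  # e.g. food NEAR/3 safety
}


def proximity(platform, left, right, n=3):
    """Render a proximity expression for the given platform."""
    operator = PROXIMITY[platform].format(n=n)
    return f"{left} {operator} {right}"


print(proximity("Web of Science", "food", "safety"))  # food NEAR/3 safety
```

A table like this is only a starting point: translating a full string also means swapping controlled vocabulary, field codes, and truncation symbols for each platform.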

When running the searches in additional databases, it is normal to get many duplicate results.  Each database will also have unique content, and it is because of this that the evidence base for a systematic review will not be comprehensive unless multiple databases are searched.

Complexity of translation


This chart gives a sense of some of the syntax and functionality variations across platforms for a single database, FSTA.   

Variations between different databases on different platforms will be greater.





Labels used for controlled vocabulary terms
  • Ovid: Subject headings (which are phrase searched) or heading words (which are word searched).
  • EBSCOhost: Descriptor, subject, or keyword, depending upon which screen you are on.
  • Web of Science: Descriptor as the field to search, but keyword in the record.

Combined title/abstract/heading words fields search
  • Ovid: Available in Advanced Search (when map to subject heading is not ticked).
  • EBSCOhost: Not available; must be built manually. The All Text Fields search searches all fields except Section and Subsection Codes, which helps cut down on false hits for some search terms.
  • Web of Science: Available as a Topic search.

Stemming and lemmatization
  • Ovid: None. Searches exactly what is typed, unless modified with wildcards.
  • EBSCOhost: Default setting includes stemming and lemmatization. To override, put the search term within quotation marks.
  • Web of Science: Default setting includes stemming and lemmatization. To block it, go to Advanced Search and toggle to exact search under More options, or put terms inside quotation marks. Wildcards in a search term will also turn off stemming and lemmatization for that term.

Indexing date available
  • Ovid: Yes. Use the Entry Date field.
  • EBSCOhost: Not available as a field, but can be entered manually. Type UC [three letters for month] [year] or UC [year], e.g. UC Feb 2021 or UC 2021. Find more information about field codes in EBSCO Help.
  • Web of Science: Yes. In basic search mode see Add date range, and use Index Date.

Explode functionality
  • Ovid: Explode includes all levels of narrower terms to the bottom of the thesaurus hierarchy.
  • EBSCOhost: Explode includes narrower terms one level below the term exploded.
  • Web of Science: Not available. Can be done manually by adding terms in the thesaurus view.




Searching beyond databases


Searching appropriate databases will be the main method for identifying a review's evidence.   But additional searching methods should supplement the database searches to minimize chances that relevant literature is missed.   Supplementary search methods include:

  • Handsearching:  so called because in the past it required scanning the tables of contents of key journals' issues by hand; in the electronic age, this can be done online on journal websites or with table-of-contents services.  Key journal titles for the topic are scanned for any relevant articles that, because of search or indexing anomalies, have been missed by the database search.
  • Reference list searching: the reference lists of the key articles should be scanned for relevant articles.
  • Citation searching:  Web of Science, Scopus, and Google Scholar enable viewing the articles that have cited an article, and so can lead to relevant records published after a key article was published.
  • Grey literature:  Depending on the review question, grey literature — literature that has not been formally published in an academic book or journal — might be a valuable source of evidence.  Grey literature can include conference proceedings, governmental or organisational reports, dissertations, patents, and more.  Some databases include grey literature, and some do not.  Even when grey literature is included within a database, if this source of evidence is appropriate for a review question, it should also be sought by going directly to appropriate organizations' websites.  A list of other resources for locating grey literature can be found in the appendix.
  • Unpublished studies: when the data from unpublished studies can be tracked down and included in a systematic review it can help guard against publication bias, which is the phenomenon where studies with null results are less likely to be accepted for publication than studies with statistically significant results.   The existence of unpublished studies can be discovered through conference proceedings, dissertations and through research networks, and researchers can be contacted directly.

Updating searches


Near the end of conducting a systematic review, all searches should be rerun to see if any relevant studies have been published since they were run towards the start of the project. 

The most efficient way to do this is to formulate the search to only include records that have been added to each database since the finalised searches were run at the review's start.  Be aware that the field indicating when records were added to a database has many different names, depending on both the database and the platform it is being searched on.   Check a database's help section to confirm which field to use.
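The bookkeeping can be sketched as combining the finalised string with a date-added limit.  The field names below are those listed for FSTA in the platform chart earlier in this guide; the combining syntax itself is illustrative only and must be checked against each platform's help section:

```python
# "Date added to database" field names, as listed for FSTA in the platform
# chart above. The combining syntax below is illustrative, not platform-exact.
DATE_ADDED_FIELD = {
    "Ovid": "Entry Date",
    "EBSCOhost": "UC",
    "Web of Science": "Index Date",
}


def update_search(original_string: str, date_limit: str) -> str:
    """Rerun the finalised string over records added since the original search."""
    return f"({original_string}) AND {date_limit}"


print(update_search('probiotic* AND "gut health"', "UC 2021"))
# (probiotic* AND "gut health") AND UC 2021
```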

Tip: The date records are added to a database is not the same as the publication date of an article.  They sometimes align, but often do not.  It is a peculiarity of publishing that articles are frequently released months before their stated publication date. When this happens, the articles are indexed (added to the database) earlier than their publication date.  Sometimes articles are added to a database later than their stated publication date. This can happen when a new journal title is added to a database and its back issues are indexed along with the most current articles. It also happens when a publisher releases material after the publication date.

Reporting the search


Finalised search strategies are usually bulky.  It is acceptable not to include them in the main body of the article, but they must be included in full in an appendix or in a repository such as the Open Science Framework, linked from the review. Journals provide guidance on their preferred formatting.

Documentation needs to be detailed enough that another researcher could reproduce the review's search strategy exactly, without having to guess about any of the elements that could change the search's results.  

PRISMA-S provides extensive guidance on the elements that must be reported for a search strategy to truly be reproducible.  Information is required both about the information sources searched, and about the search strategy as run in each information source. 

Good practice point:  Always copy and paste search strings into protocol documents and into manuscripts or supplementary materials rather than retyping them.  It is easy to introduce errors when retyping, which invalidates the reproducibility of the search.