Subject Guides: Generative AI for Legal Research: Assessing AI-generated Content

Assessing AI-generated Content

It can be difficult to assess AI-generated output without at least some understanding of how the system works. This will help you assess the strengths and limitations of the output.

One way to start thinking about a system's limitations is to use the 3 layers method.

Output is only the last layer of a system but it is the most obvious one—in reality, the input and analysis layers have major effects on the quality of the output.

For example, if the input layer is not focused on Canadian jurisdiction (e.g. includes American content, as is the case with most generic tools), the output may reflect a different jurisdiction or blend information from multiple jurisdictions.

The following questions can help you reflect on the 3 layers of a genAI tool.

1. Assess System Limitations

The input layer refers to the dataset (the information) that the system runs on.

The following questions can be used to assess the input layer of an AI tool:

What dataset underlies this tool?
What is the breadth of the dataset? (e.g. jurisdiction, types of resources)
How current is the data?
What types of human bias exist in the dataset? (e.g. inaccuracies, omissions, systemic bias)

How could these types of bias manifest in the output?

In the analysis layer, the machine interprets its input in light of the task set by the user.

The following questions can be used to assess the analysis layer of an AI tool:

What factors are being considered?
What kind of bias could exist in the algorithm(s)?
How transparent is the system?
Does the tool tell or show you how the input is analysed?
Do you have the ability to control how the input is analysed?

The output layer refers to the results generated by the tool in response to a task. With generative AI, this is typically generated text in response to your prompt.

The following questions can be used to assess the analysis layer of an AI tool:

How does the algorithm receive feedback on the relevance of the output?
Does the user have the ability to help “train” the tool?
What kind of bias could have been introduced by humans involved in “training” the model?
What kind of bias is reflected in the output?
What is missing from your results?

2. Verify Information

AI-generated text is not a substitute for reading the primary sources (case law and legislation) on your topic.

This is true for any secondary source as well, but is especially dangerous for AI-generated content because they can include factually inaccurate information (often called hallucinations).

Each AI system treats source citation differently—some may provide a link to a case, while others may only reference a style of cause. This can make it difficult to track down the source because of the lack of information needed to verify which case it is referencing. Never assume a source exists if you cannot find it.

For any sources referenced by a genAI system:

Check that the source actually exists by finding a copy (e.g. finding the full-text of a case in Westlaw, Lexis, or CanLII).
Read each source carefully to ensure that the genAI tool has accurately summarized it (e.g. not misrepresented the case or summarized a dissent).

Take all steps you would normally take when researching case law or legislation, such as noting up. Don't assume the system will do any traditional legal research steps for you. For instance, it may summarize a case that was later overturned by a higher court or received negative treatment.

3. Compare with Other Sources

You should never rely on only AI-generated content to answer a legal research question.

Even if you are able to identify and correct any hallucinations, the results may be incomplete. For example, a genAI tool might mislead you by:

citing a number of relevant cases but not the leading case on an issue or binding cases from your jurisdiction;
citing cases but neglecting to mention that there is applicable legislation; or
answering your question in a narrow sense without alerting you to additional factors that are relevant.

You should always consult additional, human-authored sources to confirm that you have correct information that represents a fullsome approach to your legal research question.

4. Update for Currency

In Step 1, you assessed the genAI tool's limitations, including the crucial question of how current or up-to-date the input layer's dataset is.

Some genAI tools have a specific knowledge cut off date, such as ChatGPT's various models. This means that any legal answer provided by the system will not necessarily reflect current law.

Other systems, like Lexis+ AI, are hooked up to a broader platform's dataset, which means that they run on the most recent content available on that platform. However, even these systems have currency limitations. For example, Lexis' case databases are updated on a daily basis, but their legislation can often be a week or two behind the state of the law. The result for the researcher is that any legislative summary generated by Lexis+ AI will need to be updated to ensure no amendments have passed since Lexis last updated that statute in their system.

If you know the currency limitations of your genAI tool's input layer, you can simply note up case law and legislation as you would normally in the course of your research. If currency information is not available, do not assume that it is up-to-date information and focus on conducting additional research.

5. Take Steps to Address Bias

Generative AI is necessarily biased in that it reproduces patterns in language that already exist in an underlying dataset. Pattern projection is useful in law because the field relies on precedent, but it can also perpetuate bias in several ways.

Bias has long been a known issue with all types of AI-driven tools. One infamous example is risk assessment software COMPAS, which was found to predict greater recidivism for Black defendants than White defendants.

Think carefully about the following types of bias that may be present in your output:

Generated text reflects the underlying dataset's biases, which can include historical and systemic bias; often lacks representation from diverse perspectives; and may not reflect the full reality of possible legal outcomes.

Examples:

GenAI may recommend a harsher sentence for a Black defendant compared to a White defendant with similar circumstances
Reliance on a dataset of one judge's opinions may disproportionately reflect that judge's legal philosophy in a response
A question about outcomes based on a case law dataset does not take into consideration outcomes from the large number of matters that settle out of court

It is usually unclear how GenAI tools determine which cases or other sources to reference, summarize, or otherwise bring into a generated response—this is the "black box" of AI.

Examples:

The system may be biased in favour of certain types of cases or legal arguments over others
The system may be biased in favour of older cases that are heavily cited versus more recent cases

These tools also typically do not account for contextual information, which means that their output reflects bias due to the narrow information it is drawing on.

Example:

A genAI tool that summarizes individual cases may not be able to identify whether that case has been overturned by a higher court because it lacks context

Many legal research tasks do not have one right answer; a series of possible answers or angles to the problem may exist. Even a relatively simple task like summarizing a case reflects a certain bias—both in terms of what aspects of the case to prioritize in the summary, and in the language used to summarize the facts.

Examples:

One genAI system might summarize a case based on its outcome, while another might focus on the legal test used
A genAI tool might use language to summarize the actions of an individual in a way that heightens or lessens the severity of those actions

The analysis layer may also be biased based on the language model itself, which may have learned certain assumptions about words and phrases based on its dataset that do not reflect their legal interpretation.

Example:

A system may predict that the word "disability" is used more frequently in relation to physical disabilities than mental disabilities, and its answer may therefore be skewed to reflect that interpretation of language

A researcher's personal bias (worldview, implicit bias, personal experiences, etc.) will be reflected in the prompt they provide, which will in turn affect what the system provides as output.

Example:

A researcher might frame their prompt in a way that confirms a preexisting belief that a client is guilty

Automation bias: A researcher may be more likely to trust the output of a system because it comes from a machine.

Example:

A researcher might be more likely to believe hallucinated content that matches what they expect to find and neglect to conduct further research

Bias can be incredibly challenging to counteract in research of any kind, but awareness and reflectivity will allow you to take steps to counteract any biases with additional sources, such as those representing diverse perspectives.

Source:
Lederman Law Library. (2025). Critically assessing AI generated content (GenAI), Canadian legal research manual. Queen’s University. https://guides.library.queensu.ca/legal-research-manual/critically-assessing-generative-artificial-intelligence