On the ethics of coauthored AI drafts

Why the real question is not whether AI was used, but whether the thinking still belongs to the author

A colleague submits a paper. In the acknowledgments, she notes that she used ChatGPT to tighten the prose of the discussion section. The paper is accepted. Six months later, at a conference, she mentions in a hallway conversation that she also used the model to draft the opening paragraph of the introduction and to rewrite several transitions. She did not mention this in the paper. Was that wrong?

Most researchers today have a soft, working answer to this question: yes, more or less, probably. The discomfort is not about the tool. Spell-checkers use statistical models too, and we have all used Grammarly for years. The discomfort is about disclosure. Something about the fact that she did not mention it feels like an omission. But when we try to articulate why, the reasoning falls apart quickly. Spell-checkers are not disclosed. Grammarly is not disclosed. Nobody discloses the use of a thesaurus.

The usual answer is that AI-generated text is categorically different from these earlier tools. This is true at the level of capacity. A language model can produce a plausible paragraph from a one-sentence prompt, and no thesaurus can do that. But “categorically different” is doing a lot of work in that sentence, and the work is rarely spelled out. The difference that matters, ethically, is not the capacity of the tool. It is the question of whose thinking the text represents.

Let me sharpen the question. When a reader encounters a sentence in your paper, they are entitled to assume it represents your thinking. Not your prose exactly, because readers have always understood that prose is cleaned, edited, and sometimes written with help. But the thought behind the prose is understood to be yours. When you write “the evidence suggests a moderate effect,” the reader takes you to be asserting, on your own authority, that the evidence suggests this. This is the basic compact of academic writing. Everything else, including the footnotes, the methods, and the peer review, rests on it.

The question with AI-drafted prose is whether this compact still holds. The answer turns out to depend on a distinction nobody in the current discourse is making clearly enough: the distinction between prose help and thought help.

Prose help is when the model refines language you have already committed to. You wrote the sentence; the model shortened it. You wrote the paragraph; the model suggested a better transition. You had the thought; the model helped you express it. This is continuous with editing, and editors have been part of academic writing for as long as academic writing has existed. The compact holds. The thought is yours.

Thought help is different. When you ask a model to draft a paragraph you have not yet thought through, when the model supplies the argument’s shape and not just its expression, the compact breaks. A reader who assumes the thought is yours is being misled. The thought came from the model. You endorsed it after the fact, maybe, but you did not produce it. That is a different thing, and it matters.

The trouble is that the distinction is not always easy to maintain in practice. A researcher sits down to write a paragraph, has a vague sense of what they want to say, and asks the model for a draft. The draft comes back. They read it, edit it, change a few sentences, and keep the overall structure. The paragraph is now half theirs and half the model’s. Did the model give prose help or thought help? There is no clean answer. The model’s draft probably shaped how they thought about what they wanted to say. The thought is entangled.

This is not a new problem. It is the problem of the research assistant, the coauthor, or the graduate student whose draft chapter becomes the advisor’s paper. What is new is the scale and invisibility of the entanglement. A research assistant is named. A coauthor is named. A graduate student is usually named. The language model is not named, and the contribution is harder to pinpoint.

The right answer, I think, is that we need to apply the same standard to language models that we apply to human collaborators, adjusted for what is specific to the tool. The standards we already have for human contributions are not perfect, but they are more developed than most discussions admit. A research assistant who ran your regressions is credited. A colleague who read your draft and suggested a rewrite of section three is named in the acknowledgments. A student whose idea became the seed of your argument is made a coauthor. These distinctions are familiar. They can be extended.

Here is the extension I would propose. The question to ask, for any piece of AI-generated text in your paper, is this: where does this sit on the spectrum from prose help to thought help? Then ask a second question: what would I do if a human had given me this same contribution?

If the model tightened a sentence you had already committed to, no disclosure is needed. This is equivalent to copy editing, and we do not credit copy editors in academic papers. We should, often, but that is a separate fight. If the model suggested a framing you had been circling but had not yet settled on, a brief acknowledgment is appropriate: “The authors used a large language model to help clarify the discussion’s framing.” This is equivalent to acknowledging a helpful colleague.

If the model drafted substantive portions of the argument, including paragraphs whose content, not just expression, came from the model, then we are in coauthorship territory, and the paper needs a serious conversation about how to represent this. Most journals will not currently accept a language model as a named coauthor. That is probably correct for now, given that models cannot be held responsible for their claims. But it means that in such cases, you need to rewrite the paragraphs yourself before submission, not because the model’s version was wrong, but because you would otherwise be claiming authorship over material you did not produce.

This is more demanding than the current norms. But it is not a higher standard than we already apply to human work. It is the same standard, applied to a new kind of contribution.

There is a further question, which is whether the use of models is appropriate at all in certain contexts. Thesis writing, for example. Early-career training. Qualifying examinations. The case for restricting model use in these contexts is not that the model’s output is wrong. It often is not. The case is that the purpose of the exercise is to develop the student’s own writing and thinking, and letting a model do it defeats the purpose. A student who submits a thesis partly drafted by a model has not demonstrated the capacity the thesis is meant to demonstrate. This is not a research ethics question. It is a pedagogy question. The answer for theses is different from the answer for papers because the purposes are different.

Journals are beginning to articulate policies. Most now require disclosure of substantive AI use. Few articulate the distinction I have just drawn, between prose help and thought help. As a result, their policies are either too strict, requiring disclosure of any use, including spell-checking, or too loose, requiring only a blanket declaration that no AI-generated text appears in the paper, which almost nobody can truthfully sign. The policies will improve. In the meantime, the burden is on the writer to know, for themselves, where their particular use sits on the spectrum, and to disclose accordingly.

One honest note on my own practice, and then I will close. The distinction between prose help and thought help is easier to draw in theory than to hold to in practice. A model that is any good at writing will, inevitably, shape your thinking as it shapes your prose. You cannot fully separate the help. The cleanest response to this is not to try. It is to restrict your use to cases where you were going to write the paragraph yourself anyway, and to treat the model as a sharper version of the dictionary or the thesaurus. If you would not have written the paragraph without the model, the model wrote it. If you would have written it and the model improved it, you wrote it. The line is fuzzy, but most researchers, asked honestly, can tell which side of it they were on.

The compact between writer and reader holds if the writer can honestly say: these are my thoughts, even if they are not entirely my words. It breaks if the writer cannot.

Published April 25, 2026 · The Almanac