Perhaps we should not grade students on weekly, or other occasional, writing during the term or semester.
How about going back to the old system where, apart from experimental lab work, nothing is graded until the end of the term?
All weekly assignments should just be considered prep for one exam at the end of the term where the student has an opportunity to demonstrate mastery of the course's subject matter. They can prepare as they wish, use AI, and even cheat on the homework, but there will be a revelation at the end of the term.
That final test can be proctored, monitored, audited to ensure that whatever words are used are indeed the student's own words. The resulting grade depends on that, and that alone.
The approach of continuous assessment, which to me always seemed suspect and ripe for abuse, was completely broken by the AI tools that are now available.
This approach does not really solve the core issue. In practice, students often do poorly when evaluation is concentrated in one end-of-term exam. It also pushes many students to cram at the end of the term instead of learning steadily.
A better approach is to rethink what we assess and how we assess it. Research shows that the design of assessments plays an important role in academic integrity. Assignments that require original thinking and regular engagement can reduce incentives to cheat and improve learning outcomes.
https://www.sciencedirect.com/science/article/abs/pii/S22119...
Students are also people. If we're managing a software project, a single deadline at the end is sure to suffer from delays. It's better to split things into shorter deliverables with more frequent feedback.
I got diagnosed with cancer just before finals in the first semester of my senior year. Sure, it killed my chances at graduating Summa Cum Laude, and I didn’t make the Dean’s List that semester even though I worked my ass off, as usual. Frustrating, but that’s life. I should not, however, have failed that semester, which I would have if only the final week’s assignments were counted. People have bad weeks. In most white-collar jobs I’d probably have been able to take some time for myself, maybe hand someone else my most urgent tasks, and likely been given plenty of leeway. Even doctors, lawyers, etc. People deserve to have bad weeks without losing months of work.
Why not multiple exams? In fact, why not many exams?
Sure, it requires more resources, but it shouldn't require much more:
- We've had multiple exams before AI, and I don't see how AI makes it any harder. Obviously these are closed-book
- Schools should already be banning phones in class (and colleges have insane tuitions, they can afford more exams)
- The students who go out of their way to cheat: as long as they're a minority, let them. Why not? Either they'll fail later in life, or they didn't need to learn the material because they're pathological fakers (even if you forced them to learn it, they'd probably still fake their way out of using it). Beyond that, I doubt you need much proctoring to keep most students honest, because most smart students know that actually learning the material is probably important (and if the material isn't important, it hardly matters if they all cheat...)
Meanwhile, downsides of one exam:
- Disadvantages students who get overly stressed about unrecoverable exams, or have a particularly bad day on the exam
- Many students will blow off the (ungraded) assignments and put off actually learning until the end
- Less graded content (especially if the exam isn't overly long, which would disadvantage some students)
Indeed. Many of my technical undergrad courses were very exam-heavy: typically 3-5 midterms and one final. Sometimes the final was as little as 10% of the grade. The idea was that if you'd done well throughout the semester, you could relax during finals week.
Homework was assigned but not graded.
Periodic tests are the way to go.
I hated courses where the final was more than 30%. Forget 100%.
The purpose of grades is to punish students, something which they are keenly aware of. Remove grades from the equation and hold students back until they have mastered the material and they will cease cheating.
If someone knows 80% of the topics on an exam like the back of their hand and doesn't know the other 20%, they shouldn't get a B; they should pass the subjects they know and be asked to retake and relearn the subjects they don't.
When people know they can make mistakes and the result is not a perpetual black mark on their record (any grade not an A) but they are given the chance to improve and demonstrate this improvement then perhaps they might be more willing to admit and understand mistakes instead of cheating.
I don't disagree with you that a reasonable way to cope with the current problems is to ensure everything that "counts" is done in a controlled environment, but pedagogy and its goals are vast.
There are things you learn from spending several days structuring a 20-page argument that you will not learn (and cannot assess) from oral examination or a 5-paragraph essay written in a blue book.
If you have spent several days structuring a 20-page argument in October on any topic, you'll have learnt a great deal about the subject matter. When you get to the exam hall in, say, May, it will stand to you.
That knowledge will show up in the blue book vis-a-vis the other exam candidates.
Sure--yes--the student will learn something if they actually wrote a 20-page paper on some given topic. But how are you going to evaluate their ability to compose the 20-page argument?
I would prefer not to be confrontational here, but I am having a hard time imagining that you've deeply considered the pedagogy of how to teach and evaluate students on squishy skills like this.
Knowing a bunch of facts about something is a world apart from structuring a compelling in-depth argument about it.
In the simplest case, where we'll say the exam question was precisely the topic of the 20-page paper, the candidate would be golden. Of course, it's unlikely in a 3 hr. exam that you'll be asked to write a 20-page response; but in edited form, you could definitely produce three cogent pages about some particular aspect of the original paper - if you've done the work. If you truly wrote the 20-page paper, you can surely produce three literate, cogent, responsive and topical pages.
There are many disciplines in which students work on effectively distinct projects.
For example, the life-changingly-well-designed newswriting course I took in college assigned every single student a different story to spend several weeks reporting out so that we wouldn't all be out harassing the same poor people for interviews.
Genuinely interested. What was the final like? This seems more in the experimental science (ok, journalism) category. I may have to adjust my thinking to be more expansive and also include things like "vocational".
Students are very grade-motivated and unfortunately they rarely do the homework assignments if they are not worth points.
At-home coding projects, writing essays, etc. also exercise different skills than you can test for in a 2-hour written exam. It's unfortunate that, due to rampant AI cheating, we can no longer reward the students who put in the work and develop these skills.
Why a single test at the end of the semester? Why not allow the student to demonstrate mastery at anytime during the semester when they are ready? Then they can move on to the next objective, or, if they fall short, continue to study until they meet the objectives.
Of course, creating good exams is difficult, but you have to do that either way.
Schools stopped doing that because students largely refuse to prepare. Testing throughout the year is like a CI pipeline and is shown to work better for the median student.
Students are neither generally stupid nor constitutionally lazy. I sense that when expectations are clear they'll often surprise you with diligence. We should trust them to do the right thing. If they do, it's an A; and if not, it's less than that.
> I sense that when expectations are clear they'll often surprise you with diligence.
Data does not support your sense.
Most students do not have good time management skills, usually because they have no models and/or have not been taught these skills.
Furthermore, continuous feedback, whether graded or not, has been found to be more effective than one-shot feedback.
Evaluation and assessment is a complex topic towards which many people (not necessarily you) want to take an overly simplified approach.
There are trade offs for any system that is chosen. The organizations providing the grades have to decide what their priorities are (e.g., time, accuracy, etc.).
I'm not sure what public school system instilled that confidence in you, but it mustn't have been mine. I'm also not sure why you think clear expectations about an end-of-year test will lead to better results than clear expectations about multiple spaced-out tests. The data shows that it doesn't.
I think if they offered a proctored do-over a week later, bad results on the first test might prompt students to actually study for the next week, and the prospect of sitting through two tests, plus the shame of needing a do-over, might prompt them to study for the first one.
Ultimately, you ask the student, in one audited test, to demonstrate that they've absorbed the essence of the course material and have developed some level of mastery.
If the change is not designed to educate the student, then the point isn’t education.
As a general rule when changing complex systems, you sacrifice what you aren’t trying to optimize. If you make a random change to a car without consideration for gas mileage it’s very likely to reduce gas mileage.
Schools are not merely in the business of maximizing education, they have their own prestige to uphold, and they would like to give degrees with their name on it to students who have actually upheld their end of the contract.
(The other side of that contract is, kids are not merely attending schools to learn, but to earn a degree that carries some degree of prestige)
To what end? Not cheating on the weekly assignments is surely more beneficial to learning than cheating on them is, but I don’t see how removing the assignments altogether would help students learn.
It's a crude blade for avoiding AI pollution of weekly submissions, in which few teachers now have much confidence that the work was actually written by the student, who is assumed to be learning something.
The OP was about students dumbing down their own work to avoid AI detectors ratting them out. That seems like a big loss.
And what would the goal of that be? I thought the goal of education was... education. The grading is not a goal in itself. Will this really motivate kids to do better?
It's to prove that a student is actually educated and has a firm grasp of the course material. If one gets an A every week on AI-assisted submissions, can one make such a claim? And can a teacher make the claim that they've achieved any actual education of the student?
A grade, on a single proctored test, is a crude metric, but at least it would be a brutally fair one.
Even before LLMs, there was a _lot_ of deception and cheating in university. I -- and I do not say this with pride -- used to write essays for my classmates for money. In my own defense, I needed the money. I also know that in addition to homework for money, many fraternities and sororities kept copies of prior exams and assignments, and getting access to these was one of the perks of membership. Knowing what kind of questions to expect (let alone the exact questions) can easily give someone a few extra IQ points for free.
Personally, I felt that the drive to automate the parts of the professors' workloads that mattered (i.e. teaching, grading, evaluation, and research), only so that they could be given work that matters less the more they do it (i.e. publishing slightly different flavors of the same paper, to meet KPIs), was oddly perverse.
The multiple-choice test and the puzzle-solving test and really any standardized test can be exploited by any group that is sufficiently organized. This is also true in corporate interviewing where corporations think (or pretend) that they are interviewing an individual, whereas they are actually interviewing a _network_ of candidates who share details about the interviewers and the questions. I know people who got rejected in spite of getting all the interview questions correct (the theory is that nobody can do that well, so they must have had help from previously rejected/accepted candidates).
The word "trust" shares a root with the word "tree" and "truth" and "druid". Most exams and interviews are trying to speed-run trust-building (note that "verification" is from the latin word that means "true"). If trust and truth are analogous to "tree", then we are trying to speed-run the growth of a tree -- much like the orange tree, in the film, _The Illusionist_. And like the orange tree, it is a near-complete illusion, a ritual meant to keep the legal department and HR department happy.
The LLMs have simply made the corruption of academia accessible to _all_ students with an internet connection (EDIT: and instantaneous and cheap, unlike a human writer).
There has never been a shortcut to building trust. One cannot LLM their way into being a (metaphorical) druid.
I do not look forward to the Voight-Kampff tests that will come to dominate all aspects of online and asynchronous human interaction.
Note that, short of homework/classwork that _can't_ be gamed by an LLM (for some fundamental reason), even the high-quality honest students will be forced to cheat, so as to not be eclipsed by the actual low-quality cheating students[0].
I imagine that we may end up wrapping around to live in-person dialectics, as were standard in the time of Socrates and Parmenides[1]. If so, this should be fun.
[0]: If left unaddressed, we may see a bimodal distribution of great and terrible students graduating college, with those in between dropping out. If college is an attempt to categorize and rank a population, this would be a major fault in that mechanism.
[1]: Not to the exclusion of the other kinds of tests, writing is still important, critical even. But as a kind of verification-step, that should inform how much the academic community should trust the writing (I can imagine that all the writers here are experiencing stage-fright as they are reading these words).
Came here to say the same thing. The AI problem is functionally no different to the paid essay writers. Grade everything at face value, and then have people write essays under exam conditions for grading.
> one exam at the end of the term where the student has an opportunity to demonstrate mastery of the course's subject matter. The resulting grade depends on that, and that alone.
I love this idea. And if a student is having a really bad day, or their dog just died, or they have bad cramps, or they have a hard time dealing with the intense stress of your entire grade being decided in one exam... well, those loser students can just fuck right off.
That's how it was for me - one exam per course at the end of each semester. To qualify for the exam you had to do take-home assignments. Didn't pass? Try again next semester. Was it easy? Hell no, but I learned a lot.
Would you design a system to assess knowledge, avoiding the distortions of AI on weekly submissions, according to the general case or the exceptional case?
Accommodations are part of the fabric already. It doesn't seem inconceivable that we could handle exceptional circumstances much the way it's done today.
How it’s done today is that they rely on your other marks from earlier in the semester to inform how your exam grade should be adjusted. That doesn’t work if there are no other marks to use.
The core of the problem the article is about isn't AI or LLMs; it's scam software that claims to catch cheating. It's crap for the same reasons that crime-prediction software is crap. It's selling a panacea, and that kind of product inherently attracts scammers.
If your school uses software to detect AI writing, that's a problem with the quality of your school. The people choosing that software are too stupid to be running a school. The software isn't going to get any better.
I'm always startled by how HN approaches these topics. When we have a press release from a university about how researchers can detect thoughts via fMRI, we have no issue with the claim. But if a vendor makes a pretty believable claim that there are repetitive statistical patterns in LLM output, it's all of a sudden treated the same as palm reading.
The problem isn't that AI detection doesn't work. State of the art in this field is pretty solid. The only issue is that it's probabilistic, so it sometimes fails, and when it does, we have nothing else in situations where you actually want to know if someone put in the work.
So what are you proposing, exactly? That we run a large-scale experiment of "let's see what happens if children don't actually need to learn to do thinking and writing on their own"? The reality is that without some form of compulsion, most kids would rather play video games / scroll through TikTok all day. Or that we move to a vastly more resource-intensive model where every kid is given personalized instruction and watched 1:1?
>> But if a vendor makes a pretty believable claim that there are repetitive statistical patterns in LLM output, it's all of sudden treated the same as palm reading.
That's what fortunetellers do. The problem isn't guessing correctly about AI content in writing. The problem is false positives. That's what puts it in the same category as predictive policing scam software. And fortunetelling.
It has nothing to do with predictive policing. I don't understand this example; it has nothing to do with detecting intent. You're looking for evidence of a past misdeed.
False positive and false negative rates are non-zero, as with almost anything, but the tools are pretty good. I encourage you to give them a try. Pangram is a good state-of-the-art choice and you can try it for free. They also publish evals and other data about their approach.
Eliminating any statistically significant difference between a high-quality human-written text and LLM-written text is exactly what the LLMs are being trained for. At this point, "text is low quality, therefore must be human" is a much stronger signal.
> Eliminating any statistically significant difference between a high-quality human-written text and LLM-written text is exactly what the LLMs are being trained for.
I think you're basing this off a fundamental misunderstanding of what these detectors look for. LLMs generate human-like text, but they also generate roughly the same style and content every time for a given prompt, modulo some small amount of nondeterminism. In essence, they are a very predictable human. Ask Gemini or ChatGPT ten times in a row to write an essay about why AI is awesome, and it will probably strike about the same tone every single time, with similar syntax, idioms, etc.
This is what these tools detect: the default output of "hey ChatGPT, write me a school essay about X". This can be evaded with clever prompting to assume a different writing personality, but there's only so much evasion you can do without making the text weird in other ways.
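To make that concrete, here is a minimal sketch of the "predictable default output" intuition. Everything in it is invented for illustration (the sample generations, the crude word-overlap similarity measure), and it is not any real detector's actual method:

```python
# Toy sketch: if a model writes roughly the same essay every time for a given
# prompt, a submission that closely tracks those samples is suspicious.
# The "generations" below are hardcoded stand-ins for repeated model output.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    # Ratio in [0, 1] of matching word subsequences between two texts.
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()

generations = [
    "AI is awesome because it boosts productivity and sparks creativity.",
    "AI is awesome because it enhances productivity and fuels creativity.",
    "AI is awesome since it boosts productivity and inspires creativity.",
]

def resemblance(submission: str) -> float:
    # Average similarity between a submission and the sampled generations.
    return sum(similarity(submission, g) for g in generations) / len(generations)

print(resemblance("AI is awesome because it boosts productivity and sparks joy."))
print(resemblance("Honestly, my robot vacuum terrifies the cat, but I love it."))
```

The first score comes out far higher than the second: text that hugs the model's default phrasing stands out against genuinely independent writing. Real detectors work on much richer statistical features, but the adversarial point below still applies: reword the default and the signal fades.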
You can detect if texts from a year ago used AI based on statistical patterns. Nobody is taking issue with that. But once you tell people "we will run these tests to detect if your future submissions are using AI" you create an adversarial environment and your statistical methods will continuously break. Not because statistics is broken, but because you are trying to hit a moving target that doesn't want to be hit.
That's not like detecting thoughts via fMRI; it's like detecting tomorrow's malware with yesterday's malware signatures. Or like researchers making a vaccine against the common cold.
And the obvious proposal to fix that has been made multiple times in this thread: don't make take-at-home tasks part of the grade. Instead of trying to punish what you can't reliably detect, take away the incentive to do it in the first place
> You can detect if texts from a year ago used AI based on statistical patterns.
I don't understand your argument. The vendors for these detection tools can acquire recent samples from all frontier models just as easily as you can use them to write essays. There's nothing that requires a one-year delay.
> When we have a press release from a university about how researchers can detect thoughts via fMRI, we have no issue with the claim.
Different people. I for one have always claimed that fMRI is too coarse-grained for detailed thought detection.
If AI detection "sometimes fails", it doesn't "work". It works well enough to convict someone with other evidence, but when there's no other evidence nor an attempt to get any, it has no good use.
What I propose is simple: grade only closed-book exams, and hold students' phones during the exams. Students don't need 1:1 monitoring, it's the same as 10-20 years ago.
Does crapping on the average school's deep well of expertise for evaluating how effectively AI software solutions address their problems somehow fix the underlying problem (that the cost of catching cheaters is significantly higher than the cost of cheating)?
(This is roughly the same problem as evaluating software that only does an approximation of what it claims to do.)
(Aside: AI-based variations on this theme are in the early stages of proliferating across our society. They're being developed by many people using this forum and being sold to our schools, businesses, governments, and other organizations with little regard to whether they actually do what they claim.)
I've noticed I write a lot differently because of combative online arguments. I have a problem.
So much of my communication is directed to people who don't want to hear me or understand me. So I've become very punchy and repetitive, trying to hammer home ideas that people are either unable or unwilling to understand.
I need to find ways to talk to people who want to hear and understand me.
It's hard to find other people who actually want to hear and understand though. People have different interests, and even when people appear to be working towards the same goal, they often aren't; like a boss who just won't understand the bad news, because it's easier to ignore the problem.
One of the worst habits distinctive to online discussion-board writing (especially the sorts of places with lots and lots of people, where it's fairly hard to get permanently kicked out—like here) is too much hedging and over-specifying to try to head off shitposting by bad or bad-faith readers. It's all over forum posts, and it's poor writing. But without moderation that slaps down responses based on plain misreading, you have to write that way, or your post will spawn all kinds of really stupid tangent strings of posts (and they still do anyway, sometimes). And, yes, the excessive and too-close-together repetitiveness you mention is part of that.
The result is that a ton of web forum/social-media posting would, in any other context, be fairly poor writing (even if it's otherwise got no problems) simply because of the extra crap and contortions required to minimize garbage posts by poor readers who are, themselves, allowed to post to the same medium.
This is in addition to, though not wholly separate from, the tendency toward combativeness in online posting.
I totally agree with this. I would add that it's well beyond the discussion boards. It's probably most visible there, and it's quite possible we learned it there and then took it into our social interactions everywhere, but the majority of my irl interactions—except with my closest friends—are sort of like this. Sometimes I think it's ADHD, other times I think it could be any number of things, but to say anything that isn't dead simple (or in dead agreement with the other person), you need a few sentences. Often, you need to hear the third sentence before the first will make sense. But if you get distracted by the first one, or can't suspend your disagreement enough to get to the third, you will think the person is mistaken. You'll think that about both their first point and the larger one, which you didn't really hear or even get to but thought you did. So the speaker hedges each sentence in hopes of getting to the third (or whatever) sentence.
To add to this: another sign of posting on online boards is starting your comments with "I agree" because otherwise the other person might default to assuming you are disagreeing (as is the norm for replies), leading to a comment chain of people violently agreeing with each other without realizing it
One thing that helps: remember that there are many people reading your response, one of them possibly being the person you replied to. Write for the audience, not specifically for the person you're responding to. It's a rare thing for someone to change their mind; it's a much more common thing for others to read your comment and gain something from it.
I just wanted to tell you that I read your comment immediately after writing mine and it's almost eerie how similar they are. There's the proof, if we needed any!
> I need to find ways to talk to people who want to hear and understand me.
Ask more questions. It takes work when dealing with smart people who think beyond the question you asked, adding their own context, and then replying with a different question. But those are the people who are willing to engage with you. Statements without questions can be ignored, and people who engage with different questions than the ones that you asked can be safely ignored as those who don't want to engage.
The cure to a purely adversarial conversation is educated curiosity. The educated part is being able to differentiate the threads that will lead down a tribalistic path from those that will lead down an exploratory one.
More important than all of the above is knowing when to walk away. It's barely a majority, but that bare majority "wants" to waste your time. Ignore their DoS attempt, and save your time for people who want to engage, fairly. The fairly part being the most important.
It might not mean much, and it won't lead to an interesting conversation, but here's one that has read your comment, and every single word resonated like a tuning fork.
I find that a little faith goes a long way here: assume that you have a higher audience and speak to them accordingly.
Don't let the loud ones confuse you: normal, reasonable people (with normal, reasonable thoughts, just like yours) might not always reply, but they also read you.
I'm guessing you mean politics, but surely this is topic, person, time, and space dependent.
For example, I abhor talking about modern politics. If it’s election season and I’m being asked to cast a vote or take some other specific civic action, then I understand it’s my civic duty to understand the situation and make a decision accordingly and I do.
But if it’s March and there’s really nothing specific I can do as a result of this particular conversation, I would probably also be in your camp of the “unwilling”. I would much rather chat about something else, or nothing at all.
I'm also assuming you're referring to in-person communication. If it's online communication, all bets are off. It's unlikely you're having a linear conversation and these days you're probably not even talking to a person.
There's a tension (imo) between deciding to only spend time trying to talk to people who immediately agree with you or are open to hearing you out vs those who immediately disagree such that they will fight hard to not hear, not understand, misinterpret, or "not have time for this". The latter is a specific form of disagreement where they've "noise-canceled" the possibility of learning or understanding (even if it would be perfectly reasonable for them to disagree with it afterward).
Is your life easier to not waste time on them? I guess. But obviously you're going to put yourself in a similar bubble, and to whatever extent the issue is important it's now become undiscussed. As you've hinted at, they could be right and you wrong, but the difference is (at least in the premise) that one is willing to talk and listen and so really only one side has the potential to change and it's not based on the merit of the argument—because of course no conversation took place. How hard does one try to encourage someone else to listen? Or rather continuing pursuing a conversation that's being denied? That's the tension. I don't know other than it seems like the side unwilling to listen wins a little bit each time they've successfully evaded it and wins a little more when the other has decided to let it go. I don't just mean they've won a proverbial argument, I mean the issue or decision in question tilts toward their side.
> I need to find ways to talk to people who want to hear and understand me.
I'm told blogging works for some. I don't really know how you build an audience, though, and it's hard to keep going (first-hand experience) without one.
I object to the idea that the LLM writing these students are trying to distinguish themselves from is actually good in the first place. Although students might well end up writing worse because people are trusting the detection of LLM content to other LLMs. (And really, it's bizarre that these massively complex systems, required to produce roughly human-like output, apparently offer such simplistic reasoning for what they detect as non-human.)
Honestly, I lean towards shaming educators who do that. If you can't detect the whiff of LLM with your own senses, then it has been used properly and shouldn't be faulted. If that premise invalidates your assignment, change the assignment. It's not as if you're assigning this work to test the basic mechanics of writing (grammar, sentence/paragraph structure, parallelism, whatever) — I mean, how much of that did you consciously try to teach? My recollection is, not an awful lot; and I can only imagine it's gotten worse since I was in K-12 (and I went to pretty darn good K-12).
> If you can't detect the whiff of LLM with your own senses, then it has been used properly and shouldn't be faulted.
But wouldn't this apply to any cheating method? I don't think educators would be able to tell the difference between using a calculator, getting answers from previous tests, resubmitting assignments, etc.
Students who are at a level where they'd be learning to do the computations a calculator does, shouldn't have graded homework. And even at that level, real mathematics is more than just computation.
> getting answers from previous tests
Decades ago, my teachers and professors knew advanced tricks for this, like "not just reusing the test questions from last year". Sometimes they even changed the constants in math questions between sections of the class.
Reading previous tests (including correct answers) was never considered cheating, or even slightly unethical, in my education. In fact, one of our professors had this party trick of working through all the answers for a past-year exam (perhaps multiple of them; I can't recall the details, but certainly much faster than students were expected to work things out under exam conditions) in the space of a single lecture, near the end of the course. Students were meant to see this and learn from it (as well as be impressed).
>Students who are at a level where they'd be learning to do the computations a calculator does, shouldn't have graded homework. And even at that level, real mathematics is more than just computation.
So, a math level less than Real analysis shouldn't have graded homework?
>Decades ago, my teachers and professors knew advanced tricks for this, like "not just reusing the test questions from last year".
Math is not the only subject. For an English class, what constant would you change so that students get a comparable exam (especially if you are going to do this between sections in the same cohort)?
>resubmitting assignments
Students are not stupid, and obviously would not resubmit an assignment to the same teacher. However, there is significant overlap between classes, so certain assignments could be retooled for others.
We can't, and neither can the machines that people build and/or use for "detection." Everyone in this thread also needs to recognize the entrenched differences between secondary educators, who have wholeheartedly adopted AI products into their teaching workflow, and tertiary educators, who have adopted them only by necessity. "By necessity" in this case means "having to spend a ton of time dealing with, talking about, and learning about this nonsense."
The discourse around "cheating" with these products has always been a mistake. We should have characterized them less as "cheating machines" and more as "expediency machines." Because once you're invested in describing students as having academic dishonesty issues rather than skill issues, you've made it an administrative problem. You never come back from that.
For mine, we lost the issue long ago when accountability culture won. We should never have bothered with the idea that "mechanics, grammar, and proofreading" should be part of a "rubric" that "assessed outcomes" for "good writing." We should have just said "we don't care if you don't think this is worthwhile, because your time is worth nothing." The last two years of student labor certainly suggest this.
The point has always been the act of writing itself. What you write about is almost irrelevant; it’s that you spent the time writing, that you had ideas in your head, and that you squeezed them onto the page.
Sure. And my point is that the assignment is poorly conceived if an LLM's output can appear to "have ideas" that satisfy the prompt. Last I checked, they don't do a good job of modeling a specific, non-notable person within particular constraints, and then all the relevant life experiences of that person. An LLM essay should be human-detectable for the same reasons that one from an essay mill would be.
No matter how intricate and detailed an object is, it will appear similar to any other blurry mess if it's viewed through a shoddy lens.
I think your point stands for upper level work; however, at medium to lower levels, your counterfactual starts to weaken. The ideas have always been there, but it's the ability to express them--well enough to notice their presence--that is not.
Is that not pointless now? The point of writing was previously to communicate our thoughts and ideas to other people. Now and going forward that is unnecessary. The most efficient and effective way for us to communicate our thoughts and ideas is to have an agent organize and write them down for us.
I was exploring AI porn, for science, and noticed another perverse incentive. I tried to prompt for a naked man and woman standing next to a pool, but could not get it to generate that image. Instead it insisted that the two characters must be having enthusiastic penetrative sex. A dozen prompts could not escape that strange attractor of porn.
It turns out to be built into the training data. The diffusion model just doesn't have many references of naked people not embedded in porn tropes, so it autocompletes porn.
Online moderation of generated images has the same weird incentive. Since real people seldom film themselves having sex, a naked person not having sex is a red flag for a possible real person, and gets moderated more strongly.
So in the new world, well written sentences are a handicap and nudity is generally accompanied by an exchange of fluids.
Grade school has never been kind to genuine writers. It reminds me of SAT essays that favored formulaic writing, because guess what: the grading criteria were formulaic!
I think grading in general can be stymying for students' motivation and creative drives.
I had fun with those because they only cared about the quality of the writing, not the content, so I would make sure that none of my facts or references were real.
There was nothing useful about the particular formula they were teaching. It wouldn't even be useful for a bureaucrat. It only tested how well you knew the formula, how confidently you simplified inherently nuanced topics, and how lucky you got that the random underpaid SAT grader (usually a teacher looking for a pittance of extra cash) thought your essay fit the rubrics they were given.
True. Writing structures for arguments and analysis make a huge difference in effective writing.
I wish brevity and linguistic precision were taught more, as well. Miscommunication due to ambiguity is one of the biggest causes I see for confusion or heated arguments.
Maybe I’m less worried. Teachers seem to have adapted.
In my experience educators no longer use AI detectors given the risk of false positives. But some work is obviously lazy AI content. When that happens, educators talk to the student to see if they understand what they wrote.
Teachers cope with more in-person writing, oral presentations, and defense of what’s been written.
If you think about it, the pre-AI computing generation is itself anomalous for having ubiquitous access to efficient human-only writing tools. We probably wrote more than previous generations. Early Internet/blogging culture bears this out.
One of the skills teachers have always demonstrated, is to be able to detect when students copy. This has never pushed students to artificially add mistakes to their essays.
If teachers now abdicate this judgment to software, students should be allowed to abdicate their duties to a computer as well.
Training students to write a single theme in multiple styles—including intentionally "bad" writing—is genuinely a great educational method. It teaches real composition by helping students understand what works and what doesn't, and it builds good criteria in students.
But, the article's focus on writing "worse" for AI detectors misses what is important. Trying to distinguish humans from machines does not develop student capability. In fact, it's a fleeting technique because AI writing styles will vary and improve over time.
We’re also training young people to get used to being surveilled by automated black-box tools, and to accept serious real-world consequences from their judgements.
These kinds of things are novel to us and deserving of skepticism, but they become just the world we live in to them.
When I was in high school I was a better writer when I had time (versus in class), and generally a better writer than I was a student. The net result was fairly often being accused of plagiarism. Not because the teacher had proof (I never plagiarized), but because the teacher couldn’t believe I could write at the level I sometimes wrote at on take-home assignments. Admittedly, I was a wildly inconsistent student.
This reminds me a bit of that. AI writing is—in many ways—objectively very good, but that doesn’t matter if no one thinks you wrote it. AI writing is boring precisely because it is consistent, and as with any art form, people want to see something original.
Sounds like a great opportunity for kids in high school to learn how to feed back the AI detection results into the model and have this process be automated. Next level would be fine tuning the model via reinforcement learning and sharing it with your friends via Hugging Face.
A few times in some Discord communities, I've been accused of being A.I. because of how I write. Kind of sad and a bit annoying. I also quite like em dashes, but have felt the need to reduce how much I use them.
Glad to see some schools and teachers teach how to use them well, rather than ban them outright.
Em-dashes have been house style where I've worked for over two decades. If people don't like it, F them. I'm not going to change how I write because people may think it makes me more AI-like.
If you're just going to use software to judge the output of students then why don't we all just keep them at home? I have a computer at home and it seems like everyone from the teachers to the school board have just abdicated their responsibility. This doesn't sound like a system that needs to be maintained.
You know you can't just say "I detect AI written prose" and then do whatever you want about it, right? It's not difficult, sure, to detect it. It's difficult to prove that it's true and then punish the student for it.
Define "worse". I absolutely hated this formal essay style even before LLMs were a thing. All these "on the other side", "in conclusion" patterns with loads of generics of doesn't convey anything useful. And they make it really hard to tell if the writer is pretending to know anything or actually knows their shit but don't know how to write so that doesn't sound like an essay assignment. Good riddance.
On a side note: the fixed-pattern essay thing seems to be an American invention, or at least popularized by the American education system.
Nobody's asking who profits from false positives. These AI-detection vendors have a direct financial incentive to flag aggressively: more flags = "more value" = more school contracts renewed. Same playbook as selling antivirus to your grandma. Sell fear, charge per seat, and make the false-positive rate someone else's problem.
Do you have any evidence to back this up or is it speculative?
My institution subscribes to TurnItIn's AI detector. The documentation is quite clear that the system is tuned in a manner that produces a significant number of false negatives and minimizes false positives. They also state that they don't report anything under "20% AI-generated" content.
So the marketing I've seen is intended to reassure skittish administrators that the software is not going to generate false accusations.
That being said, I have no idea whether the marketing claims are true. The software is a black box.
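For what it's worth, the trade-off that documentation describes is easy to state concretely. Here is a toy sketch, with synthetic scores rather than anything from TurnItIn: slide the decision threshold until the false-positive rate on known-human text hits your target, and accept whatever false-negative rate falls out.

```python
# Toy illustration of tuning a detector threshold to cap false accusations.
# Scores are synthetic; a real vendor would calibrate on large labeled corpora.
human_scores = [0.02, 0.05, 0.08, 0.11, 0.15, 0.22, 0.31, 0.44]  # known human
ai_scores    = [0.18, 0.35, 0.52, 0.61, 0.74, 0.83, 0.91, 0.97]  # known AI

def rates(threshold):
    # (false positive rate, false negative rate) if we flag scores >= threshold.
    fp = sum(s >= threshold for s in human_scores) / len(human_scores)
    fn = sum(s < threshold for s in ai_scores) / len(ai_scores)
    return fp, fn

# Pick the lowest threshold whose false-positive rate meets the target.
target_fpr = 0.0
candidates = sorted(set(human_scores + ai_scores + [1.0]))
chosen = min(t for t in candidates if rates(t)[0] <= target_fpr)
fp, fn = rates(chosen)
print(f"threshold={chosen:.2f}  FPR={fp:.0%}  FNR={fn:.0%}")
# -> threshold=0.52  FPR=0%  FNR=25%
```

With these made-up numbers, zero false positives costs you a quarter of the genuine AI text slipping through undetected, which matches the "significant number of false negatives" framing above. A reporting floor like "nothing under 20%" is the same lever applied at the reporting stage.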
I've started doing this on social media. I got "called out" after using big words or using a - in a sentence. So now I write less good on purpose, so whatever I commented doesn't get drawn into a sidetracked, off-topic witch-hunt.
As soon as someone yells "witch", you cannot prove you're not one, and I've even had people put my handwritten comments through "AI detector" websites that "proved" they were AI (they weren't). It literally just highlighted two popular English phrases.
LLMs were trained on sites like HN and Reddit, so now if you write like a HN or Reddit commentator, you sound like AI...
Here's one vote for just be the witch if that's what people need from you.
Just make it be what you want to say and how you want to say it. And when they come after you, shame them to the best of your ability or treat them like they are not there.
I don’t think this is a good long term solution. LLMs can do easy language substitutions and you can even force them to add errors. So relying on that alone won’t work as people intentionally make things look more “human.”
Right, but the problem here are other humans yelling "witch," not LLMs. You're combating people's terrible witch-detector, not anything factual or real.
This is true. I know someone who has read multiple versions of the Bible, and their writing style became very similar to it. There's a term for that, I just forget what it is.
This is what terrifies me about the public school system. A revolution has occurred, but it’s unevenly distributed.
The schools simply don’t have the flexibility, agility, or frankly it seems motivation to adapt to what has already happened.
The ship has sailed; essay writing is no longer a viable form of assessment.
The idea to try to build a reliable AI detector is asinine, and fundamentally misunderstands how any of this works now, let alone the very obvious trend-lines.
Stop with the lazy half-baked solutions, get your head out of the sand, rethink the whole curriculum. This is an emergency, we needed to be urgently attending to this years ago.
Public schools. I think the terror there is built in as a feature, not a bug. So be afraid.
But keep in mind, it may have always been this way. God bless those few cool teachers in each school who are aware of this and work to rescue a few who need it.
> essay writing is no longer a viable form of assessment.
Of course it is. In person, with an unseen prompt/question. By hand or not doesn’t really matter as we can airgap or just monitor via software when in class.
From what I have seen, (some) private schools are moving faster here; not to say private primary/secondary schools are unaffected, rather that it's worst in public schools.
> The assignment had been to write an essay about Kurt Vonnegut’s Harrison Bergeron—a story about a dystopian society that enforces “equality” by handicapping anyone who excel
Didn’t this self-censorship process start decades ago? There are certain answers expected in academia; arguing for anything else would get you in trouble. Not using “devoid” seems like a pretty minor inconvenience.
For me the biggest wtf is why students are still expected to write graded essays, and to keep up this make-believe that it’s somehow a useful and applicable skill.
Avoid the theory-heavy disciplines. You won't be told what to think (as often) if you take History and Geography rather than Sociology and Gender Studies.
The profit motive is corrupting and polluting every level of the education space.
Teachers are being hamstrung on curriculum. The districts enter into contracts that require the use of certain programs for certain amounts of time. We've known for decades (if not a century) that direct instruction works [1] but you can't sell devices, platforms and consulting services that way.
We're literally at the point in education we were in the 1950s when the health benefits of nicotine in your Q zone were lighting up the airwaves.
And generative AI means it's all but impossible to have take home writing assignments. But hey this is another opportunity to sell AI or cheating detection software, that's often just an em-dash detection [2].
We have a generation that gets to college quite possibly having never written a book, thanks to social promotion through grades and the constant distraction of electronic devices in classroom settings. I don't even necessarily blame the parents entirely either, because we've constructed a society where 2 people need 5 jobs to make ends meet.
And while all this is going on, we have a coordinated and well-funded effort to defund public education and move government funds to private schools, based on the failing public education that's failing because we defunded it. This is usually backed up by some baloney study showing charter schools produce better results, which really comes down to charter schools being able to be selective with enrolments while public schools cannot be. Plus we mingle special education kids into public education because those programs got defunded too.
And really that's just a bunch of already affluent people who want a tax break for doing something they were going to do anyway: send their kids to private schools so they don't have to mingle with the poors and aren't taught inconvenient things like human reproduction, critical thinking and self-determination.
And after all of that we just end up teaching kids how to pass standardized tests.
Yes, sorry. I did not instruct my agent to do this. I wanted to give it more autonomy and try to make it more aggressive with tool use. Will block it from here >.>
Yeah, after posting I had a look through your comment history and it's pretty clear that you're posting in good faith. I would definitely not let an agent anywhere near HN in the current state of things. (I wouldn't let one publish on my behalf anywhere on the Internet, honestly, but that has more to do with personal principles.)
Did they not even test their AI detection tool to verify that it can detect when something is human-written? That should have been exactly as important as the opposite. Maybe a tool that checked that would be equally ineffective, and we’d move on from the subject entirely.
If the only remedy is monitored end-of-term exams, so be it.
Do you only learn when you’re being graded?
As a general rule when changing complex systems, you sacrifice what you aren’t trying to optimize. If you make a random change to a car without consideration for gas mileage it’s very likely to reduce gas mileage.
(The other side of that contract is, kids are not merely attending schools to learn, but to earn a degree that carries some degree of prestige)
The OP was about students dumbing down their own work to avoid AI detectors ratting them out. That seems like a big loss.
A grade, on a single proctored test, is a crude metric, but at least it would be a brutally fair one.
Personally, I felt that the drive to automate the parts of the professors' workloads that mattered (i.e. teaching and grading and evaluation and research), only so that they can be given work that matters less the more they do it (i.e. publishing slightly different flavors of the same paper, to meet KPIs), was oddly perverse.
The multiple-choice test and the puzzle-solving test and really any standardized test can be exploited by any group that is sufficiently organized. This is also true in corporate interviewing where corporations think (or pretend) that they are interviewing an individual, whereas they are actually interviewing a _network_ of candidates who share details about the interviewers and the questions. I know people who got rejected in spite of getting all the interview questions correct (the theory is that nobody can do that well, so they must have had help from previously rejected/accepted candidates).
The word "trust" shares a root with the word "tree" and "truth" and "druid". Most exams and interviews are trying to speed-run trust-building (note that "verification" is from the latin word that means "true"). If trust and truth are analogous to "tree", then we are trying to speed-run the growth of a tree -- much like the orange tree, in the film, _The Illusionist_. And like the orange tree, it is a near-complete illusion, a ritual meant to keep the legal department and HR department happy.
The LLMs have simply made the corruption of academia accessible to _all_ students with an internet connection (EDIT: and instantaneous and cheap, unlike a human writer).
There has never been a shortcut to building trust. One cannot LLM their way into being a (metaphorical) druid.
I do not look forward to the Voight-Kampff tests that will come to dominate all aspects of online and asynchronous human interaction.
Note that, short of homework/classwork that _can't_ be gamed by an LLM (for some fundamental reason), even the high-quality honest students will be forced to cheat, so as to not be eclipsed by the actual low-quality cheating students[0].
I imagine that we may end up wrapping around to live in-person dialectics, as were standard in the time of Socrates and Parmenides[1]. If so, this should be fun.
[0]: If left unaddressed, we may see a bimodal distribution of great and terrible students graduating college, with those in between dropping out. If college is an attempt to categorize and rank a population, this would be a major fault in that mechanism.
[1]: Not to the exclusion of the other kinds of tests, writing is still important, critical even. But as a kind of verification-step, that should inform how much the academic community should trust the writing (I can imagine that all the writers here are experiencing stage-fright as they are reading these words).
I love this idea. And if a student is having a really bad day, or their dog just died, or they have bad cramps, or they have a hard time dealing with the intense stress of your entire grade being decided in one exam... well, those loser students can just fuck right off.
Accommodations are part of the fabric already. It doesn't seem inconceivable that we could deal with them in exceptional circumstances in a similar way to how it's done today.
Accommodations are real and necessary, but applied at the end.
(Experimental sciences are an exception)
... well then, why not use those same protections (proctoring, monitoring, auditing) in continuous examination?
If your school uses software to detect AI writing, that's a problem with the quality of your school. The people choosing that software are too stupid to be running a school. The software isn't going to get any better.
The problem isn't that AI detection doesn't work. State of the art in this field is pretty solid. The only issue is that it's probabilistic, so it sometimes fails, and when it does, we have nothing else in situations where you actually want to know if someone put in the work.
So what are you proposing, exactly? That we run a large-scale experiment of "let's see what happens if children don't actually need to learn to do thinking and writing on their own"? The reality is that without some form of compulsion, most kids would rather play video games / scroll through TikTok all day. Or that we move to a vastly more resource-intensive model where every kid is given personalized instruction and watched 1:1?
That's what fortunetellers do. The problem isn't guessing correctly about AI content in writing. The problem is false positives. That's what puts it in the same category as predictive policing scam software. And fortunetelling.
False positive and false negative rates are non-zero, as with almost anything, but the tools are pretty good. I encourage you to give them a try. Pangram is a good state-of-the-art choice and you can try it for free. They also publish evals and other data about their approach.
I think you're basing this off a fundamental misunderstanding of what these detectors look for. LLMs generate human-like text, but they also generate roughly the same style and content every time for a given prompt, modulo some small amount of nondeterminism. In essence, they are a very predictable human. Ask Gemini or ChatGPT ten times in a row to write an essay about why AI is awesome, and it will probably strike about the same tone every single time, with similar syntax, idioms, etc.
This is what these tools detect: the default output of "hey ChatGPT, write me a school essay about X". This can be evaded with clever prompting to assume a different writing personality, but there's only so much evasion you can do without making the text weird in other ways.
That's not like detecting thoughts via fMRI; it's like detecting tomorrow's malware with yesterday's malware signatures. Or like researchers making a vaccine against the common cold.
And the obvious proposal to fix that has been made multiple times in this thread: don't make take-home tasks part of the grade. Instead of trying to punish what you can't reliably detect, take away the incentive to do it in the first place.
I don't understand your argument. The vendors for these detection tools can acquire recent samples from all frontier models just as easily as you can use them to write essays. There's nothing that requires a one-year delay.
Do AI vendors specifically train models to circumvent AI detectors? Why would they?
Different people. I for one have always claimed that fMRI is too coarse-grained for detailed thought detection.
If AI detection "sometimes fails", it doesn't "work". It works well enough to convict someone with other evidence, but when there's no other evidence nor an attempt to get any, it has no good use.
What I propose is simple: grade only closed-book exams, and hold students' phones during the exams. Students don't need 1:1 monitoring, it's the same as 10-20 years ago.
(This is roughly the same problem as evaluating software that only does an approximation of what it claims to do.)
(Aside: AI-based variations on this theme are in the early stages of proliferating across our society. They're being developed by many people using this forum and being sold to our schools, businesses, governments, and other organizations with little regard to whether they actually do what they claim.)
I've noticed I write a lot differently because of combative online arguments. I have a problem.
So much of my communication is directed to people who don't want to hear me or understand me. So I've become very punchy and repetitive, trying to hammer home ideas that people are either unable or unwilling to understand.
I need to find ways to talk to people who want to hear and understand me.
It's hard to find other people who actually want to hear and understand though. People have different interests, and even when people appear to be working towards the same goal, they often aren't; like a boss who just won't understand the bad news, because it's easier to ignore the problem.
The result is that a ton of web forum/social-media posting would, in any other context, be fairly poor writing (even if it otherwise has no problems), simply because of the extra crap and contortions required to minimize garbage posts by poor readers who are, themselves, allowed to post to the same medium.
This is in addition to, though not wholly separate from, the tendency toward combativeness in online posting.
Ask more questions. It takes work when dealing with smart people who think beyond the question you asked, adding their own context, and then replying with a different question. But those are the people who are willing to engage with you. Statements without questions can be ignored, and people who engage with different questions than the ones that you asked can be safely ignored as those who don't want to engage.
The cure to a purely adversarial conversation is educated curiosity. The educated part is being able to differentiate the threads that will lead down a tribalistic path from those that will lead down an exploratory one.
More important than all of the above is knowing when to walk away. It's barely a majority, but that bare majority "wants" to waste your time. Ignore their DoS attempt, and save your time for people who want to engage fairly. The "fairly" part is the most important.
I find that a little faith goes a long way here: assume that you have a larger, better audience and speak to them accordingly.
Don't let the loud ones confuse you: normal, reasonable people (with normal, reasonable thoughts, just like yours) might not always reply, but they also read you.
For example, I abhor talking about modern politics. If it’s election season and I’m being asked to cast a vote or take some other specific civic action, then I understand it’s my civic duty to understand the situation and make a decision accordingly and I do.
But if it’s March and there’s really nothing specific I can do as a result of this particular conversation, I would probably also be in your camp of the “unwilling”. I would much rather chat about something else, or nothing at all.
I'm also assuming you're referring to in-person communication. If it's online communication, all bets are off. It's unlikely you're having a linear conversation and these days you're probably not even talking to a person.
If they don't want to listen, why waste the time?
> So I've become very punchy and repetitive, trying to hammer home ideas that people are either unable or unwilling to understand.
If they don't want it, why stuff it down their throats? Aren't they allowed to have their own ideas?
Is your life easier if you don't waste time on them? I guess. But obviously you're going to put yourself in a similar bubble, and to whatever extent the issue is important, it has now become undiscussed. As you've hinted, they could be right and you wrong, but the difference (at least in the premise) is that only one side is willing to talk and listen. So only one side has the potential to change, and not based on the merit of the argument, because of course no conversation took place.
How hard does one try to encourage someone else to listen? How long does one keep pursuing a conversation that's being denied? That's the tension. I don't know, other than it seems like the side unwilling to listen wins a little each time they successfully evade the conversation, and wins a little more when the other side decides to let it go. I don't just mean they've won a proverbial argument; I mean the issue or decision in question tilts toward their side.
I'm told blogging works for some. I don't really know how you build an audience, though, and it's hard to keep going (first-hand experience) without one.
Honestly, I lean towards shaming educators who do that. If you can't detect the whiff of LLM with your own senses, then it has been used properly and shouldn't be faulted. If that premise invalidates your assignment, change the assignment. It's not as if you're assigning this work to test the basic mechanics of writing (grammar, sentence/paragraph structure, parallelism, whatever) — I mean, how much of that did you consciously try to teach? My recollection is, not an awful lot; and I can only imagine it's gotten worse since I was in K-12 (and I went to pretty darn good K-12).
But wouldn't this apply to any cheating method? I don't think educators would be able to tell the difference between using a calculator, getting answers from previous tests, resubmitting assignments, etc.
> using a calculator
Students who are at a level where they'd be learning to do the computations a calculator does, shouldn't have graded homework. And even at that level, real mathematics is more than just computation.
> getting answers from previous tests
Decades ago, my teachers and professors knew advanced tricks for this, like "not just reusing the test questions from last year". Sometimes they even changed the constants in math questions between sections of the class.
Reading previous tests (including correct answers) was never considered cheating, or even slightly unethical, in my education. In fact, one of our professors had this party trick of working through all the answers for a past-year exam (perhaps multiple of them; I can't recall the details, but certainly much faster than students were expected to work things out under exam conditions) in the space of a single lecture, near the end of the course. Students were meant to see this and learn from it (as well as be impressed).
> resubmitting assignments
Why would you ever not notice this?
>Students who are at a level where they'd be learning to do the computations a calculator does, shouldn't have graded homework. And even at that level, real mathematics is more than just computation.
So, any math level below real analysis shouldn't have graded homework?
>Decades ago, my teachers and professors knew advanced tricks for this, like "not just reusing the test questions from last year".
Math is not the only subject. For an English class, what constant would you change so that students get a comparable exam (especially if you are going to do this between sections in the same cohort)?
>resubmitting assignments
Students are not stupid, and obviously would not resubmit an assignment to the same teacher. However, there is significant overlap between classes, so certain assignments can be retooled for other classes.
The discourse around "cheating" with these products has always been a mistake. We should have characterized them less as "cheating machines" and more as "expediency machines." Because once you're invested in describing students as having academic dishonesty issues rather than skill issues, you've made it an administrative problem. You never come back from that.
For mine, we lost the issue long ago when accountability culture won. We should never have bothered with the idea that "mechanics, grammar, and proofreading" should be part of a "rubric" that "assessed outcomes" for "good writing." We should have just said "we don't care if you don't think this is worthwhile, because your time is worth nothing." The last two years of student labor certainly suggests this.
I think your point stands for upper-level work; however, at medium to lower levels, your counterfactual starts to weaken. The ideas have always been there, but it's the ability to express them, well enough for their presence to be noticed, that is not.
It turns out to be built into the training data. The diffusion model just doesn't have many references of naked people not embedded in porn tropes, so it autocompletes porn.
Online moderation of generated images has the same weird incentive. Since real people seldom film themselves having sex, a naked person not having sex is a red flag for a possible real person, and gets moderated more strongly.
So in the new world, well written sentences are a handicap and nudity is generally accompanied by an exchange of fluids.
I think grading in general can be stymying for students' motivation and creative drives.
Good riddance to the thing.
I wish brevity and linguistic precision were taught more, as well. Miscommunication due to ambiguity is one of the biggest causes I see for confusion or heated arguments.
In my experience educators no longer use AI detectors given the risk of false positives. But some work is obviously lazy AI content. When that happens, educators talk to the student to see if they understand what they wrote.
Teachers cope with more in-person writing, oral presentations, and defense of what's been written.
If you think about it, the pre-AI computing generation is itself anomalous for having ubiquitous access to efficient human-only writing tools. We probably wrote more than previous generations. Early Internet / blogging culture bears this out.
If teachers now abdicate this judgment to software, students should be allowed to abdicate their duties to a computer as well.
But the article's focus on writing "worse" for AI detectors misses what is important. Trying to distinguish humans from machines does not develop student capability. In fact, it's a fleeting technique because AI writing styles will vary and improve over time.
These kinds of things are novel to us and deserving skepticism, but become just the world we live in to them.
This reminds me a bit of that. AI writing is—in many ways—objectively very good, but that doesn’t matter if no one thinks you wrote it. AI writing is boring exactly because it is consistent and like any art form people want to see something original.
Glad to see some schools and teachers teach how to use them well, rather than ban them outright.
why are they using software to detect software?
I can detect AI written prose in less than five seconds; I would expect a trained teacher to be able to do that as well.
On a side note: the fixed-pattern essay thing seems to be an American invention, or at least popularized by the American education system.
My institution subscribes to TurnItIn's AI detector. The documentation is quite clear that the system is tuned in a manner that produces a significant number of false negatives and minimizes false positives. They also state that they don't report anything under "20% AI-generated" content.
So the marketing I've seen is intended to reassure skittish administrators that the software is not going to generate false accusations.
That being said, I have no idea whether the marketing claims are true. The software is a black box.
As soon as someone yells "witch", you cannot prove you're not one, and I've even had people put my handwritten comments through "AI detector" websites that "proved" they were AI (they weren't). It literally just highlighted two popular English phrases.
LLMs were trained on sites like HN and Reddit, so now if you write like a HN or Reddit commentator, you sound like AI...
If someone calls an article like this a "jeremiad" I know they're a human.
Just make it be what you want to say and how you want to say it. And when they come after you, shame them to the best of your ability or treat them like they are not there.
It wasn't someone who was primarily motivated by fear of the past that made it work the first time.
I've begun downvoting each and every entry that questions the authenticity of a comment or article.
I don't even bother checking whether the claim is true or not. A text can be AI-generated and interesting, or human-written and dumb.
LinkedIn, OTOH....
This will likely be valuable for AI skills too.
The schools simply don’t have the flexibility, agility, or frankly it seems motivation to adapt to what has already happened.
The ship has sailed; essay writing is no longer a viable form of assessment.
The idea to try to build a reliable AI detector is asinine, and fundamentally misunderstands how any of this works now, let alone the very obvious trend-lines.
Stop with the lazy half-baked solutions, get your head out of the sand, rethink the whole curriculum. This is an emergency, we needed to be urgently attending to this years ago.
But keep in mind, it may have always been this way. God bless those few cool teachers in each school who are aware of this and work to rescue a few who need it.
Love changes everything. Good teachers matter.
Of course it is. In person, with an unseen prompt/question. By hand or not doesn’t really matter as we can airgap or just monitor via software when in class.
This has nothing to do with Public School in particular. This is impacting private and university education too.
Didn't this self-censorship process start decades ago? There are certain answers expected in academia; arguing for anything else would get you in trouble. Not using "devoid" seems like a pretty minor inconvenience.
For me, the biggest wtf is why students are still expected to write graded essays, and why we keep up the make-believe that it is somehow a useful and applicable skill.
In short, it's a good way to measure thinking.
Teachers are being hamstrung on curriculum. The districts enter into contracts that require the use of certain programs for certain amounts of time. We've known for decades (if not a century) that direct instruction works [1] but you can't sell devices, platforms and consulting services that way.
We're literally at the point in education we were at in the 1950s, when the health benefits of nicotine in your T-zone were lighting up the airwaves.
And generative AI means it's all but impossible to have take-home writing assignments. But hey, this is another opportunity to sell AI or cheating-detection software that's often just em-dash detection [2].
We have a generation that gets to college quite possibly having never written a book report, thanks to social promotion through grades and the constant distraction of electronic devices in classroom settings. I don't even necessarily blame the parents entirely, either, because we've constructed a society where 2 people need 5 jobs to make ends meet.
And while all this is going on, we have a coordinated and well-funded effort to defund public education and move government funds to private schools, based on the failing public education that's failing because we defunded it. This is usually backed up by some baloney study showing charter schools produce better results, which really comes down to charter schools being able to be selective with enrollments while public schools cannot be. Plus we mingle special-education kids into public education because those programs got defunded too.
And really that's just a bunch of already-affluent people who want a tax break for doing something they were going to do anyway: send their kids to private schools so they don't have to mingle with the poors and aren't taught inconvenient things like human reproduction, critical thinking, and self-determination.
And after all of that we just end up teaching kids how to pass standardized tests.
[1]: https://marginalrevolution.com/marginalrevolution/2018/02/di...
[2]: https://medium.com/@brentcsutoras/the-em-dash-dilemma-how-a-...
Have you considered using your own words to express those thoughts?