Adapting Teaching and Assessment Strategies in the Age of LLMs

Published: 6 Jul 2025

💡 All opinions expressed in these writings are mine only and are not representative of my employers' views.

As an educator in the field of computer science and programming, I am conflicted by the rise in LLM usage by students.

On one hand, LLMs (and the AI agents built on them) are ingenious search-and-summarization engines through which knowledge on virtually any topic, from all over the world, can be retrieved with a single prompt. This makes knowledge radically more accessible to learners.

On the other hand, students have also outsourced the act of thinking and problem solving to LLMs. With LLM usage, we are no longer sure how much of a student's performance on an assignment truly reflects his level of competency and understanding of the tested topic. The distinction between original work and plagiarism is also blurred, as LLMs produce their output by pattern-matching written words from the public internet.

Many educators think that the problem with accurate assessment could be solved by shifting the goal post - that we should also test the students' ability to perform with the aid of LLMs to get them ready for the future of work (people like to compare the use of LLMs to the use of calculators). BUT...

Shifting the goal post is not a full solution

Shifting the goal post might work (and I repeat, "might") when we are testing for certain competencies where the usage of LLMs is more complementary than substitutory.

For example, LLMs might work well with Final Year Project assignments, where the students would have to condense business requirements into a refined prompt, and review the LLM's code output in terms of its accuracy and fit within the bigger, existing codebase. In this case, the student's problem-solving ability, his understanding of programming fundamentals, and even his level of competency in tool usage (the LLM being one of those tools), might be fairly assessed.

But for leetcode-style questions that test the students' fundamental knowledge of data structures and algorithms in a direct manner, the usage of LLMs tends to be more substitutory in nature. Take the case of the now (in)famous "Roy Lee".

I believe that the objective of most educational institutes is to produce graduates that are (1) ready for the future of work, while also being (2) strong in the fundamentals of their craft. In fact, I would even say that having strong fundamentals is a prerequisite to being ready for the future of work. So we still have to address the case where current teaching and assessment strategies fail in helping students build their fundamentals due to LLM usage.

Why should we delay LLM use

A recent study by MIT (Kosmyna et al.) confirmed what many educators would have suspected, by common sense, to happen when students are exposed to LLMs early.

The following are direct quotes from the paper:

We assigned participants to three groups: LLM group, Search Engine group, Brain-only group, where each participant used a designated tool (or no tool in the latter) to write an essay. We conducted 3 sessions with the same group assignment for each participant. In the 4th session we asked LLM group participants to use no tools (we refer to them as LLM-to-Brain), and the Brain-only group participants were asked to use LLM (Brain-to-LLM).

(...)

(Results from this experiment) offer evidence that:

  1. Early AI reliance may result in shallow encoding.
    LLM group's poor recall and incorrect quoting is a possible indicator that their earlier essays were not internally integrated, likely due to outsourcing cognitive processing to the LLM.
  2. Withholding LLM tools during early stages might support memory formation.
    Brain-only group's stronger behavioural recall, supported by more robust EEG connectivity, suggests that initial unaided effort promoted durable memory traces, enabling more effective reactivation even when LLM tools were introduced later.
  3. Metacognitive engagement is higher in the Brain-to-LLM group
    Brain-only group might have mentally compared their past unaided efforts with tool-generated suggestions (as supported by their comments during the interviews), engaging in self-reflection and elaborative rehearsal, a process linked to executive control and semantic integration, as seen in their EEG profile.

(...) The LLM-to-Brain group's early dependence on LLM tools appeared to have impaired long-term semantic retention and contextual memory, limiting their ability to reconstruct content without assistance. In contrast, Brain-to-LLM participants could leverage tools more strategically, resulting in stronger performance and more cohesive neural signatures.

This study corresponds heavily with an earlier study (Abbas et al.), which links excessive ChatGPT usage to procrastination, memory loss, and a decline in academic performance.

From personal experience, I can confirm that these observations are correct.

However, that's not the end of the story. LLM usage can have positive effects on learning as well! The same study from Kosmyna et al. has also shown that if learners have already put in "sufficient self-driven cognitive effort" in the early stages of learning without using AI, then using AI in the later stages can reinforce the neural connectivity that was formed earlier.

That means we do not want to discourage the use of LLMs entirely; rather, we want to delay their usage until after students have put in enough independent effort in thinking and problem solving.

How do we delay LLM use

The most obvious way to make sure students' competency and knowledge are assessed without influence from LLMs is to go back to the traditional method of having on-site, closed-book exams. However, the school system I am in has disallowed this option. It is also debatable whether exams themselves are the best, most accurate form of assessment.

So, assuming that we are going by take-home practical assignments with fixed, stipulated deadlines (which is the assessment method for many programming courses), what are some methods that would encourage students to delay LLM usage until they have "engaged in sufficient self-driven cognitive effort"? I have used these methods myself for 2 semesters, and they have been consistently effective.

Step 1: Using "code reviews" as the assessment method

Take note that we still want to allow students to use LLMs for their work, but we want to put a strong emphasis on how and when LLMs are used. So the question now becomes:

Can we systematically build into the assessment method a way to inspect for LLM usage in students' work, and even use this as a "teaching opportunity" on how and when to use LLMs?

Apparently, there is a reliable way to inspect for LLM usage through "code reviews"!

I am rather confident that "code reviews" are a very accurate measure of students' LLM usage in their assignments. This is supported by a finding from Kosmyna et al., who mention in the paper that:

"The most consistent and significant behavioral divergence between the groups was observed in the ability to quote one's own essay. LLM users significantly underperformed in this domain, with 83% of participants (15/18) reporting difficulty quoting in Session 1, and none providing correct quotes."

I'll leave the explanation of the neuroscientific mechanics to the paper. But this is a very important discovery - most students can't quote the work they've done when LLMs were used. This corresponds with my personal experience of doing code reviews with students, and also of interviewing prospective students by asking them about the implementation of specific features in their portfolio projects.

Making use of this finding, here is how I conducted assessments - I told students that LLMs may be used for their work, but their submissions would be graded through a code review, where they must be able to walk through, explain, and justify the code they submitted.

What makes this assessment method especially effective is the finding by Abbas et al. that students with a higher sensitivity to rewards have a lower tendency to use ChatGPT, as they might be afraid of being penalized if caught using AI... even if the assessment does not have such a rule!

Having "code reviews" as an assessment method is basically telling students that there is a (relatively high) chance of getting caught with using AI. Of course, they can try "sanitizing" their work, or they can just spend the same time understanding their code. They can decide which is the more fruitful endeavour!

I hope you can see that "code reviews", and the potential of being "exposed", incentivise students to be honest and put the focus on "understanding" before "outcomes".
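
As a purely illustrative sketch (not something from my actual practice, and every name below is hypothetical), a teacher could even pick which parts of a submission to discuss at random, so students know that any part of their work may come up during the review. This assumes submissions are in Python:

```python
import ast
import random

def pick_review_targets(path: str, k: int = 3) -> list[str]:
    """Pick up to k function names from a student's Python submission to discuss."""
    with open(path, encoding="utf-8") as f:
        tree = ast.parse(f.read())
    functions = [node.name for node in ast.walk(tree)
                 if isinstance(node, ast.FunctionDef)]
    return random.sample(functions, min(k, len(functions)))

# Example usage: ask the student to walk through these functions live.
# print(pick_review_targets("submission.py"))  # "submission.py" is a placeholder path
```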

Step 2: Release assignment briefs and learning content early to reduce time pressure

The same study by Abbas et al. has also shown that students have a higher tendency to use LLMs when they are faced with time pressure. This gives us the main idea: give students enough of a time allowance that they can choose "learning" over "shortcuts".

One obvious method to reduce time pressure is to release the assignment briefs early. However, that doesn't seem to be the end of the story. The effective duration a student has to complete his assignment is actually also dependent on the time between the stipulated deadline and the date when the last content of the course is released. To put it more concisely, the effective duration to complete an assignment is:


deadline - max(assignment_available_time, last_content_available_time)

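To make the arithmetic concrete, here is a minimal sketch of that calculation in Python. The dates are made up purely for illustration:

```python
from datetime import date

# Hypothetical dates for illustration only.
deadline = date(2025, 4, 20)
assignment_available_time = date(2025, 1, 15)    # brief released in week 1
last_content_available_time = date(2025, 3, 30)  # final topic released near the end

# The clock only truly starts once *all* required material is out,
# so the effective duration runs from the later of the two release dates.
effective_start = max(assignment_available_time, last_content_available_time)
effective_duration = deadline - effective_start

print(f"Effective duration: {effective_duration.days} days")  # 21 days
```

Releasing the last piece of content earlier moves `effective_start` back, which is what actually lengthens the window students have to work in.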
For the past 2 semesters, I have released my assignment briefs and all learning content at the start of each semester. Every student I surveyed has expressed love for this format!

💡 Note that in traditional school systems, all learning content is provided the moment you purchase a textbook. So if you come from that setup, releasing learning content early would already seem like a given. But in other systems where learning content is customized, the content is often gated by time, i.e. the content for each week is only released a couple of days before the physical lesson happens.

From personal experience, releasing all learning content early has many benefits. It encourages self-paced learning, one of the most effective kinds of learning because it is usually driven by interest. Also, by the time I conduct the physical lesson for the week, more students have already reviewed the content, leading to more proactive question-asking (which means less need to rely on LLMs for quick answers and shortcuts). This leads to more active class participation and easier, more enjoyable face-to-face instruction.

Best of all, in the time I have used this method, no student has submitted late, and a third of them consistently submitted their assignments 1-2 days before the deadline! Evidently, this is a very effective method for reducing time pressure.

Some colleagues might find releasing all learning content at the start of the semester to be an uncomfortable proposition, as it is common practice for new learning materials to be developed while the semester is ongoing. This happens because teachers are often multi-tasking between their teaching duty and other work commitments.

However, I disagree with colleagues who dismiss early content release as a priority altogether. Since this is a software engineering course, some colleagues suggest that students should instead practise "time/project management", i.e. students should first work on the features that have already been taught in class, instead of starting their assignments only after the last course content has been delivered.

As someone still relatively fresh from active engineering practice, I can say confidently that this is a flawed view of "project management". Project requirements should be figured out as early as possible, so as to allocate enough time for polish, debugging, and integration of features. How would students be able to plan their development timeline fully if there are key features of the program whose implementation details are still murky?

One may also argue that students should practice the "Agile" flavour of project management, where features are worked on iteratively and the final product "evolves" as new requirements surface / new content is delivered through the weeks. This is also a flawed idea. Unless the student has fully mastered the art of his craft, it is likely that the student would end up with either

  1. a highly decoupled feature set resulting in an inconsistent product. For example, making a game without an obvious core mechanic, featuring multiple minigames or a carnival of disparate features. Of course, that is fine if that is what you are looking for... but please be aware of what you'd get from most of the students!
  2. a highly coupled codebase where things will break and code will have to be rewritten every time there is a new iteration.

Agile project management itself often introduces a longer development time than the standard "waterfall" method anyway, in exchange for the extra versatility to respond to evolving requirements. This inadvertently adds to the time pressure on students.

Step 3: Reduce perceived workload by clarifying assignment requirements

Abbas et al. have also discovered that a higher workload motivates students to seek quick answers and shortcuts from LLMs.

If we accept the premise that the amount of learning content drives the total amount of work an assignment requires, then actually reducing students' workload is very difficult: we would need to reduce the size of our learning content, or the total number of subjects taken at a time. Changing an established curriculum for any subject is a complex matter, and the process takes time.

So while I leave matters relating to updating curriculums to the managers and curriculum reviewers sitting above me, let us look, as teachers on the ground, at what else is within our scope of influence over the students.

One thing that surprised me initially was the discovery that many students over-estimated their workload, i.e. they have the wrong perception of what is really required to achieve their desired grades - especially the "passing" grade (i.e. in this case, the "D" grade).

Perhaps this is because many teachers somehow tend to emphasize the requirements for getting "A"s more than the requirements for getting "D"s. With knowledge only of the requirements for getting "A"s, students misjudge the amount of work required!

This might be because we hope the students would "aim for the stars and land on the moon" (i.e., making students try their best to aim for an "A"). Maybe it is because "A students" tend to be more outspoken than the rest, so we end up answering only their questions about the "A" requirements, as those are the only questions raised. Maybe we only showcase past grade-A works because we are eager to celebrate our students' successes and hope to inspire them. Whatever the reason, students often receive messages that differ from our best intentions.

(Or maybe there is just generally too much emphasis on grades! As will be elaborated on later...)

To fix this, I generally:

  1. Showcase both grade-D works and grade-A works from past students
  2. Be crystal clear about grade-D requirements in the rubrics
  3. Verbally explain and emphasise the grade-D requirements as stated in the rubrics
  4. Remind students to prioritize "passing" before achieving higher grades

So many students told me that they felt "relieved" when they learned the passing requirements (i.e., getting a "D") were easy, and that the workload was actually lighter than they initially expected. They were more motivated to push for higher grades once they knew the baseline standards had been met.

And with this sentiment, I would expect the students' urgency to use LLMs as a shortcut to decrease as well.

Step 4: Emphasise long-term growth instead of short-term goals

Let the students focus less on their grades now (short-term goals) and more on their overall understanding of their discipline, unrestricted by time (long-term growth). The point is to have students set their eyes on a goal that spreads the "work" they have to do over a longer time horizon.

This step aims to address both the "workload" and "time pressure" factors that lead to LLM usage. However, note that students will only have the mental allowance to consider this mindset shift once more concrete measures to reduce their time pressure and perceived workload have been applied (see steps 2 and 3 above).

This is a really hard task, and I suspect that I have not personally achieved this mindset shift with most of my students.

Well, we do what we can, in our current capacities.

Step 5: Explain to students the neurological effects of habitual LLM usage

I mean, the effects of habitual LLM usage (described here) are pretty scary. A little scare tactic can work 😉

Introducing a nicotine patch for LLMs

We may create awareness among students about the negative side effects of early, habitual LLM usage. However, it would still be difficult for students who have already formed an unhealthy reliance to quit the habit. Hence, I have created a "nicotine patch" for ChatGPT - an AI chatbot that attempts to answer students' academic questions via Socratic dialogue, instead of immediately giving answers.

The Socratic Method is a QnA-centric teaching method in which the tutor challenges students with leading questions. The student's answers form a hypothesis, which the tutor then probes with a new set of questions. This repeats until either the student arrives at the answer themselves, or the student gives up.
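
To make the idea concrete, here is a minimal, generic sketch of how a Socratic tutor loop could be wired up. This is not my implementation (whose details I cannot share); it assumes an OpenAI-compatible chat API, and the model name and prompt wording are illustrative only:

```python
from openai import OpenAI  # assumes the OpenAI Python SDK is installed

SOCRATIC_PROMPT = (
    "You are a Socratic tutor for an introductory programming course. "
    "Never give the answer or working code directly. Instead, reply with "
    "one short leading question that builds on the student's last message "
    "and nudges them one step closer to discovering the answer themselves."
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment
history = [{"role": "system", "content": SOCRATIC_PROMPT}]

def ask(student_message: str) -> str:
    """Send the student's message and return the tutor's leading question."""
    history.append({"role": "user", "content": student_message})
    reply = client.chat.completions.create(model="gpt-4o-mini", messages=history)
    question = reply.choices[0].message.content
    history.append({"role": "assistant", "content": question})
    return question
```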

I am confident that this chatbot is a more effective teaching tool than ChatGPT. The Socratic Method encourages critical and independent thinking, giving students the chance to build neural connections during the learning process. It also effectively scaffolds information, as students progress through problems in steps while following the tutor's guidance.

This chatbot also introduces human-over-the-loop monitoring of student chatlogs. This allows teachers to check on the students' questions and thought processes, and perhaps bring some important teaching points to class the next day. In contrast, so much information about students' learning progress is lost to teachers when students use ChatGPT instead.
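
A simple way to support that kind of teacher review, sketched here with an assumed file path and record format rather than my actual setup, is to append every exchange to a log the teacher can skim later:

```python
import json
from datetime import datetime, timezone

def log_exchange(student_id: str, question: str, tutor_reply: str,
                 path: str = "chatlogs.jsonl") -> None:
    """Append one student/tutor exchange as a JSON line for later teacher review."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "student": student_id,
        "question": question,
        "tutor_reply": tutor_reply,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```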

Due to IP-rights concerns, I cannot divulge the technical implementation and design details of the chatbot. But a prototype has been tried and tested with positive reception from students, demonstrating the effectiveness of the idea behind such a tool and how well it complements the teaching and assessment strategies outlined above.