Teacher Vs. AI: The Ultimate Assessment Showdown

By: Catrina Mitchum

Posted On

June 23, 2023

Foreword: I was only able to experiment with this because of the following:

1.) PowerNotes has a commercial OpenAI license, which means 3 things:

- Anything put into the AI goes in under the PowerNotes name (so we have anonymity)

- That data is deleted after 30 days

- Nothing that goes through the AI is used to feed the LLM beast

2.) I asked my students if they were willing to let me put their work through PowerNotes AI (see my explanation below) and if they wanted to see the feedback.

The ask:

Screenshot of Google form with this text:
By submitting this form, you're agreeing to allow Dr. Catrina Mitchum to do the following:
1. Grade your work as an instructor using the rubric in the course.
2. Enter your work into the PowerNotes AI tool, which means your work WILL NOT be used to train the AI LLM and your work WILL be anonymous and deleted after 30 days and only reviewed for security purposes, to get a grade and feedback.
3. Compare the AI grading and feedback with my own and write about it on a public facing blog. YOUR work will not be shared anywhere on the blog; I'll just be comparing the AI response with my own.
I will not assign a grade or give the feedback that the AI provided to me unless you'd like it.
Caption: The form Catrina used for consent with her students.

I wasn’t sure if I’d get anyone to agree, but 6 students said yes and they all wanted to know what the AI had to say.

The second thing I need to tell you is that I’m “hacking” the tool because it was a safe space to try this out. It’s not really intended for this, but I think “breaking” tools tends to open up new possibilities for using them.

This is a “do it with me” piece where I’ll walk you through how I used an AI writing tool to generate feedback and a score for 4 assignments for some of my online students. I’m lucky enough to have access to the new AI writing tools in the PowerNotes platform, so that’s what I used because it created a safe space for students. If you’d like to try out PowerNotes’ AI features for yourself, reach out here.

TL;DR: If you’ve been playing with AI at all, the results won’t be surprising, and they are anecdotal. However, I think there’s a lot to be said for confirming that teachers are still needed - and a lot of potential for AI to replace certain parts of our processes. We might get to be readers and collaborators in our students’ writing instead of gatekeepers.

Here we go.

Getting Started:

The first thing I did was create topics to help keep myself organized. I created a topic for Instructions, Rubric, Student Work, and AI Feedback for two different activities. That gave me a total of 8 topics, which made it easy to go in and prompt the AI.

1. Instructions - Discussion 2
2. Rubric - Discussion 2
3. Student Work - Discussion 2
4. AI Feedback - Discussion 2
5. Instructions - Assignment 1
etc.
Caption: Topics that Catrina built within PowerNotes to keep things organized

After I created the topics, I went into the Instructions topics and added a freeform note with the instructions copied and pasted from the course. For Assignment 1, I added a second note that included a copy of the template as well.

Then, I went into the Rubrics topic and did the same thing; however, in this case, I needed to take a matrix and make it something the AI could understand, so I copied and pasted the matrix and then reformatted and clearly labeled each criterion, what to look for, and how many points it was worth.

The part I was excited about here was that the Discussion Board rubric was based largely on completion. There were two criteria assessed on completeness.

Criterion 1
Response to discussion topic is complete and timely. Thoroughly and thoughtfully answers prompts.
Meets Requirements: 25 points
Partially Meets Requirements: 19.75 points
Missing or Not Submitted: 0 points
Criterion 2
Responds to multiple classmates' initial discussion posts. Responses are thoughtful and contribute constructively to the discussion.
Meets Requirements: 25 points
Partially Meets Requirements: 19.75 points
Missing or Not Submitted: 0 points
Caption: One of the rubrics used, showing the criteria to be assessed
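For anyone who wants to try the same relabeling step outside of a notes tool, here is a minimal Python sketch of one way to hold the Discussion rubric as data and flatten it into clearly labeled plain text. The point values come straight from the rubric above; the field names and the helper functions (rubric_to_text, max_points) are hypothetical and only there for illustration. This is not how PowerNotes stores or reads anything.

```python
# Hypothetical representation of the Discussion rubric.
# Point values are copied from the rubric; the field names are assumptions.
DISCUSSION_RUBRIC = [
    {
        "label": "Criterion 1",
        "look_for": (
            "Response to discussion topic is complete and timely. "
            "Thoroughly and thoughtfully answers prompts."
        ),
        "levels": {
            "Meets Requirements": 25,
            "Partially Meets Requirements": 19.75,
            "Missing or Not Submitted": 0,
        },
    },
    {
        "label": "Criterion 2",
        "look_for": (
            "Responds to multiple classmates' initial discussion posts. "
            "Responses are thoughtful and contribute constructively to the discussion."
        ),
        "levels": {
            "Meets Requirements": 25,
            "Partially Meets Requirements": 19.75,
            "Missing or Not Submitted": 0,
        },
    },
]


def rubric_to_text(rubric):
    """Flatten the rubric into labeled plain text, one line per item."""
    lines = ["Rubric"]
    for criterion in rubric:
        lines.append(f"{criterion['label']}: {criterion['look_for']}")
        for level, points in criterion["levels"].items():
            lines.append(f"- {level}: {points} points")
    return "\n".join(lines)


def max_points(rubric):
    """Highest possible total: the top level of every criterion added up."""
    return sum(max(c["levels"].values()) for c in rubric)


print(rubric_to_text(DISCUSSION_RUBRIC))
print("Total possible:", max_points(DISCUSSION_RUBRIC))
```

Keeping the points as numbers rather than text also makes it easy to double-check any total a tool reports back.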

The Assignment rubric looked more like a traditional rubric in that each criterion was labeled A-F with a specific metric that needed to be reached. I’m not going to include the whole thing here because it’s pretty standard, but it did also mean that I had to be specific with the AI about what constituted each points level. I’ve included a snapshot of one criterion so you can see the differences. There were a total of 5 criteria.

Criterion 1
Steps 1 & 2: Introduce topic and complete table.
Topic is succinct and clear and explained well in sentences. Table is completed fully and correctly: 50 points
Topic is understandable as written. Table is mostly complete and most answers are correct: 44.5 points
Topic requires clarification. Table is mostly completed, but some of the answers need more work: 39.5 points
Topic is unclear, and several answers are missing: 34.5 points
Answer is not attempted: 0 points
Criterion 2

Finally, I went into the Student Work topic and added a freeform note for each student. Initially, I just copied and pasted, removing names. But when I tried to prompt the AI to respond by asking it to:

Use the Instructions and Rubric to grade and Give Feedback on the Student Work note.

The output was this:

Unfortunately, there is no student work note provided to grade and give feedback on. Please provide the necessary information for me to assist you.

Clearly, it didn’t understand. So I went back into the Student Work Freeform notes and added “Student Work” at the top. I also fed the instructions and rubric one at a time.

Asking AI for Feedback and Scores:

I asked the AI Assistant (Brainstorm™) in PowerNotes to:

Use the Instructions to provide feedback on the Student Work

I then selected the whole Instructions topic and just the one piece of student work I wanted feedback on. PowerNotes is calling this closed prompting, and it basically limits the AI’s view to what you put in front of it. In this case, I put some instructions and an example of responding to those instructions and asked it to compare.

Top - Brainstorm with AI
Under that - a search bar with the ability to cancel or send
Bottom - list of topics to expand or choose from that include Instructions - Discussion 2, Rubric - Discussion 2, and Student Work - Discussion 2. The Instructions and one box in the Student Work are selected.
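If you want to approximate this kind of closed prompt with a different tool, the idea can be sketched in a few lines of Python: put only the selected notes in front of the model, each under an explicit label, followed by the request. The function name build_closed_prompt and the placeholder note text are hypothetical, and this is an assumption about the general idea rather than a description of what PowerNotes does behind the scenes.

```python
def build_closed_prompt(instruction, notes):
    """Join only the selected notes, each under its own label, then the request.

    notes is a list of (label, text) pairs; nothing else is sent along.
    """
    sections = []
    for label, text in notes:
        # The label goes on its own line so the model can tell which part is which.
        sections.append(f"{label}\n{text.strip()}")
    sections.append(instruction)
    return "\n\n".join(sections)


prompt = build_closed_prompt(
    "Use the Instructions to provide feedback on the Student Work",
    [
        ("Instructions", "<instructions pasted from the course>"),
        ("Student Work", "<one student's submission, name removed>"),
    ],
)
print(prompt)
```

Labeling each note mirrors the fix described earlier, where adding “Student Work” at the top of the note was what let the AI find it.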

I did this across four activities (2 discussion boards and 2 assignments) for six students. Then, because all 6 students wanted to see the AI output, I put my feedback alongside the AI feedback into a table for the students.

I then asked the AI to:

Use the rubric to score the Student Work

I took some time to go through the AI feedback and left notes for students clarifying places where the AI was just wrong or it gave feedback that wasn’t relevant (so clearly hallucinating).

Hiccups

Interestingly, the AI wasn’t consistent: as I worked through getting the score and feedback on all 24 student assignments, it would periodically tell me it didn’t know what the student work or the rubric was.

Sometimes that meant I had to just wait and try again later. Other times, I could work around it by giving just the student work or specifying that it should address all 5 criteria (for the assignments) or both criteria (for the discussions).

I did try asking it to “Act as a Rhetoric and Composition Professor” and to “Act as an Information Literacy Professor.” These additions did not change the feedback structure or content. All they did was produce a rewrite with some word choice swaps.

How it Went:

Most things I found weren’t surprising based on what we already know about AI. I’m going to share them anyway just because I haven’t seen discussions of the impact this could have on students or what the possibilities might be.

AI “sees” and “doesn’t see” things without rhyme or reason.

The inconsistency is not new. We also know it hallucinates. In this context, I saw it:

Not seeing things that are there. The AI suggested “improvements” that were already in the students’ work. For example, one student was told they needed to include the research question, which they did. Several students were told that they didn’t address bias or credibility, which were required, but they did.

Seeing things that aren’t there. The AI gave a score and feedback on things that weren’t there. For example, one of the assignments asked for a screenshot of search results. It was part of the rubric as well. I didn’t include them when I ran the student work through, but the AI assumed that they were there.

Doing both intermittently. I didn’t include student responses to classmates in every instance, but I did for some. It was all over the place. Sometimes it assumed the responses were there and gave students full credit; sometimes it saw that they weren’t there and gave them zero credit. Fortunately, the times I did submit the responses, it “saw” them.

Could this change in the future? Maybe. Based on my knowledge of large language models, the output is predictive; if that prediction improves, it might get better, but it would also need to get better at processing the input. I can’t change student work, so the machine will need to get better at processing what it’s given without any “engineering.”

AI makes it up when it doesn’t know.

We already knew this, and honestly, this has struck me as the most humanlike thing it does.

Things it made up:

Mostly APA formatting suggestions in this case. For example, the AI told one student “For reference citation in APA 7 format, make sure you capitalize the first letter of all major words in the article title, not just the first word.” The student used a journal article, and APA 7 puts article titles in sentence case. Eeesh. I left a Catrina Note on that one for sure! It wasn’t the only one either. I had to write quite a few “well, this isn’t right...” notes.

It also gave unsound writing advice. The second assignment requires 3 paragraphs: the first introducing the topic, the second explaining how the article they’re analyzing helped them begin to answer their research question, and the third reflecting on what they still want to know and what next steps might be for them. The AI often suggested more details in student work that had the right balance of article details and what it taught them. In places where more was needed, it didn’t offer suggestions. For example, instead of providing feedback to help the student give more details about why they picked a particular article, which was going to help them on the next assignment, the AI focused instead on giving feedback on things it didn’t see that were there.

My Feedback	AI Feedback
For the article justification, you needed a bit more detail - so why are the statistics and references important? As we move forward, you'll be evaluating this article in a deeper dive.	None.
You also needed a bit more detail in the usefulness of the article you selected. Why was it a good middle ground? How will getting an unbiased perspective help you answer your question?	None.
While it can be useful to have any sort of evidence that supports the connection, don't forget the value of NOT in your Boolean search. If your search is giving you things you don't want, you can add a NOT word or phrase to rein the search back into focus.	The student could benefit from using more specific keywords. For example, instead of using "mental health," they could use more specific terms such as "anxiety," "depression," or "stress."

AI can’t be specific and contextual because it has no personal experience.

This is, again, something we already know, but from a teaching and learning perspective, that’s a problem. Specific, contextualized feedback is important for students, and we know that sharing personal experience provides powerful examples.

Specific.

When students had incorrect APA citations, I could tell them exactly what the errors were and give an example by fixing it. See the table for a comparison:

My Feedback	AI Feedback
For your APA, the journal title and volume number should be italicized: "Journal of Communication Disorders, 65, 35-42..."	The selected article citation is formatted correctly in APA 7 format, but it would have been helpful if the student had provided a brief summary of the article and explained why it was chosen over other articles found in their search.

Contextualized.

The AI didn’t have previous context of the student. I fed it the context of the assignment, but learning is often about connecting experiences. This is something the AI can’t do. Will it be able to in the future? We would need to give it access to “read” an entire course and make the course content, student work, and interactions part of the AI’s database, not to mention the conferencing and in-person interactions. That feels like a level of FERPA issues we don’t want.

In this example, I was both responding to this student’s assignment and referencing back to their previous discussion board post. The AI clearly missed that the student did talk about engagement, and didn’t have the context of the previous discussion where the student had mentioned engagement as well.

My Feedback	AI Feedback
Online education is near and dear to me, so I'm looking forward to reading the article you picked. I agree that engagement is useful (it often determines successful achievement of learning outcomes) in this case. Something that wasn't quite clear, and the nuance might be useful is are you interested in how Covid impacted enrollment at the height of the pandemic or its ongoing impact? As we move forward, you'll be evaluating this article in a deeper dive, so think.	Additionally, the student may want to consider revising their preliminary topic statement. While it is a good starting point, it may be too broad to effectively guide their research. They could focus on a particular aspect of online higher education, such as the impact on student engagement or the effectiveness of online teaching methods.

Personal.

The above is a good example of this, but there were a lot of times that I could give helpful tips and tricks for working through more common research problems, like the fact that search terms change and adjust a lot. It can feel like you’re never going to master them. The AI gave general suggestions; I gave a specific one.

My Feedback	AI Feedback
I'm glad the shift you made in your search phrases was useful across both, as well. When I do this kind of work, I like to keep track of these shifts (in part so I don't shift back to something I already know doesn't work!).	To improve searching skills, I suggest exploring different databases and trying out different keywords and search statements. It's important to be patient and persistent in searching for relevant sources.

Basically, I can be a human reader and responder in ways students need, while the AI cannot.

AI doesn’t “math” well and struggles with context even in the same session.

In this situation, a session is a specific chat. When I asked the AI to score based on the rubric criteria, it sometimes graded things that weren’t there (like responses to classmates), and sometimes took points off for things that were there (even after “refeeding” the student work in).

For example, after taking 1 point off in one Criterion worth 50 points and 2 points off in another Criterion worth 50 points, it added up the score out of a total possible 250 points as:

Final Score: 247/250 or 47.75/50
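A quick way to check a score like this is to redo the arithmetic by hand or in a few lines of Python. The sketch below assumes that all 5 criteria are worth 50 points each (which matches the 250-point total) and that the second figure in the AI's output was meant to be the same score rescaled to a 50-point scale. Both are assumptions on my part, and the variable names are just for illustration.

```python
from fractions import Fraction

# Assumption: 5 criteria worth 50 points each, matching the 250-point total.
criterion_max = 50
num_criteria = 5
deductions = [1, 2]  # points taken off in two of the criteria

total_possible = criterion_max * num_criteria  # 250
earned = total_possible - sum(deductions)      # 250 - 3

# Assumption: the "/50" figure is the same score rescaled to 50 points.
scaled = Fraction(earned, total_possible) * 50

print(f"{earned}/{total_possible}")
print(f"{float(scaled)}/50")
```

Under those assumptions the first figure holds up (250 minus 3 is 247), but the rescaled score comes out to 49.4 out of 50, not 47.75, so the two halves of the AI's final score don't agree with each other.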

This was enough to make students leery. When I sent students the comparison table, I asked which they preferred (I had already turned in final grades), and inaccurate scoring and feedback were the reasons they gave me for preferring my grading to the AI’s.

What I Learned

Frankly, I was nervous about this experiment, but since I had the means it felt like I needed to do it because I could create that safe space for student work. I had nothing to worry about.

This confirmed to me that AI isn’t there yet. Even with improvements, this kind of thing will always need a human eye, and we would need an invasive level of AI in our classrooms (both LMS spaces and physical spaces) to be able to give the AI the kind of context it needs to provide feedback like a teacher.

Could it do the scoring? Maybe. Not now, but this seems the most likely place it could learn without needing access to all the things. It was most successful with the completion rubrics for the discussion board. These were a bit more objective. While it inconsistently assessed whether the responses were there, it wasn’t making up reasons to take points off for things it didn’t see (that weren’t actually even required) like it did with the more traditional A-F rubric. This, to me, is the most exciting prospect. Scoring and grades are the teaching and learning space where we need to remove bias. If AI can score for completion, which wouldn’t require an AI invasion of all the learning spaces, and teachers can give feedback on improvement, it would go a long way to overhauling higher education in inclusive and productive ways.

Hacking a tool is always time-consuming; this was more time-consuming than I was even anticipating (and I love this stuff). I’m also confident I used a lot of AI tokens (shh, don’t tell PowerNotes), but it was an interesting experience and I’m very grateful to my students who were just as curious as I was, and to PowerNotes for giving me space and autonomy to do this kind of work.