← Tilbake til chat

Flipped Exam PoC - Technical Specification

**Version:** 1.0
**Date:** February 10, 2026
**Status:** PoC / Experiment
**Author:** UiA Development Team

---

# 1. Executive Summary

This document describes the technical implementation of a Proof-of-Concept (PoC) for the "Flipped Exam" concept at UiA. The solution reuses the existing Azure AI Foundry infrastructure with Claude Sonnet 4.5 to create an AI-powered "student" that students, acting as examiners, can interview.

## 1.1 Important Disclaimer

> **This is a PoC/Experiment, NOT a production-ready exam service.**
>
> - AI responses are non-deterministic
> - Results may vary between sessions
> - Not suitable for official student assessments without further validation

---

# 2. Background & Requirements

## 2.1 Pedagogical Concept

In a "Flipped Exam," students act as **examiners** rather than examinees:
- Student interviews an "AI student" about course material
- Student is graded on:
  - Quality and depth of questions asked
  - Ability to identify gaps in AI student's knowledge
  - Analytical assessment of responses

## 2.2 Phase 1 Requirements (from specification)

| Requirement | Status |
|------------|--------|
| HTML5 web-based UI | ✅ Existing |
| Text-based dialog | ✅ Existing |
| Initial prompt configuration | 🆕 To implement |
| Simple, robust interface | ✅ Existing |
| Reuse existing AI infrastructure | ✅ Azure AI Foundry |
| Logging functionality | 🆕 To enhance |

## 2.3 Addressing Team Concerns

| Team Member | Concern | Solution |
|-------------|---------|----------|
| **Benjamin** | AI not reliable/deterministic | Low temperature (0.3), disclaimer, internal testing first |
| **Benjamin** | Need detailed training data | Structured system prompt with role-play instructions |
| **Arvid** | Logging is MUST-have | Full session logging with timestamps, export capability |
| **Arvid** | Error thresholds unclear | Configurable AI student level, deliberate error injection |
| **Arvid** | Who owns content? | Config stored per-session, clear examiner ownership |
| **Robert** | Treat as experiment | PoC banner, internal-only access |

---

# 3. Architecture Overview

## 3.1 System Components


```
┌─────────────────────────────────────────────────────────────────┐
│                        Client (Browser)                         │
│  ┌─────────────────┐  ┌─────────────────┐  ┌────────────────┐   │
│  │ Exam Setup      │  │ Chat Interface  │  │ Export/Review  │   │
│  │ Modal           │  │ (Streaming)     │  │ Panel          │   │
│  └────────┬────────┘  └────────┬────────┘  └───────┬────────┘   │
└───────────┼────────────────────┼───────────────────┼────────────┘
            │                    │                   │
            ▼                    ▼                   ▼
┌─────────────────────────────────────────────────────────────────┐
│                         FastAPI Backend                         │
│  ┌─────────────────┐  ┌─────────────────┐  ┌────────────────┐   │
│  │ POST /api/exam  │  │ POST /api/chat  │  │ GET /api/exam  │   │
│  │ /session/start  │  │ (streaming)     │  │ /export/{id}   │   │
│  └────────┬────────┘  └────────┬────────┘  └───────┬────────┘   │
│           │                    │                   │            │
│           ▼                    ▼                   ▼            │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                SQLite Database (Enhanced)                 │  │
│  │  • conversations   • messages   • exam_sessions           │  │
│  │  • audit_logs (NEW)                                       │  │
│  └───────────────────────────────────────────────────────────┘  │
└────────────────────────────────┬────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                        Azure AI Foundry                         │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │               Claude Sonnet 4.5 (Anthropic)               │  │
│  │              Endpoint: services.ai.azure.com              │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
```


## 3.2 Data Flow


```
1. Examiner configures exam session
   └─► Subject, AI student level, weaknesses, language

2. System generates AI student persona prompt
   └─► Stored in exam_sessions table

3. Student starts interview
   └─► Each message logged with timestamp

4. AI responds as "student"
   └─► Low temperature for consistency
   └─► Role-play instructions enforced

5. Session ends
   └─► Full transcript available for export
   └─► Examiner can review via JSON/PDF
```


---

# 4. Database Schema

## 4.1 Existing Tables (unchanged)

```sql
-- Conversations table
CREATE TABLE conversations (
    id TEXT PRIMARY KEY,
    created_at TEXT NOT NULL
);

-- Messages table
CREATE TABLE messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    conversation_id TEXT NOT NULL,
    role TEXT NOT NULL,
    content TEXT NOT NULL,
    created_at TEXT NOT NULL,
    FOREIGN KEY (conversation_id) REFERENCES conversations(id)
);
```


## 4.2 New Tables

```sql
-- Exam sessions configuration
CREATE TABLE exam_sessions (
    id TEXT PRIMARY KEY,
    conversation_id TEXT NOT NULL UNIQUE,
    examiner_id TEXT,                 -- Optional: who configured this
    subject TEXT NOT NULL,            -- Course/subject name
    student_level TEXT NOT NULL,      -- 'weak', 'medium', 'strong'
    language TEXT DEFAULT 'no',       -- 'no', 'en'
    weaknesses TEXT,                  -- JSON array of configured weaknesses
    temperature REAL DEFAULT 0.3,     -- Lower = more deterministic
    system_prompt TEXT NOT NULL,      -- Generated system prompt
    started_at TEXT NOT NULL,
    ended_at TEXT,
    status TEXT DEFAULT 'active',     -- 'active', 'completed', 'exported'
    FOREIGN KEY (conversation_id) REFERENCES conversations(id)
);

-- Audit log for all actions
CREATE TABLE audit_logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT,
    action TEXT NOT NULL,             -- 'session_start', 'message_sent', 'session_end', 'export'
    actor TEXT,                       -- 'student', 'system', 'examiner'
    details TEXT,                     -- JSON with action-specific data
    ip_address TEXT,
    user_agent TEXT,
    created_at TEXT NOT NULL,
    FOREIGN KEY (session_id) REFERENCES exam_sessions(id)
);
```
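
The schema above could be exercised roughly as follows with Python's built-in `sqlite3` module. This is a minimal sketch, not the project's actual `database.py`: the helper names `start_session`, `log_action` and `utc_now` are hypothetical, and the DDL is trimmed to the columns the example touches. Note that SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON` is set on the connection.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

# Trimmed copy of the schema above; only the columns used here are kept.
SCHEMA = """
CREATE TABLE conversations (id TEXT PRIMARY KEY, created_at TEXT NOT NULL);
CREATE TABLE exam_sessions (
    id TEXT PRIMARY KEY,
    conversation_id TEXT NOT NULL UNIQUE,
    subject TEXT NOT NULL,
    student_level TEXT NOT NULL,
    weaknesses TEXT,
    temperature REAL DEFAULT 0.3,
    system_prompt TEXT NOT NULL,
    started_at TEXT NOT NULL,
    status TEXT DEFAULT 'active',
    FOREIGN KEY (conversation_id) REFERENCES conversations(id)
);
CREATE TABLE audit_logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT,
    action TEXT NOT NULL,
    actor TEXT,
    details TEXT,
    created_at TEXT NOT NULL,
    FOREIGN KEY (session_id) REFERENCES exam_sessions(id)
);
"""


def utc_now() -> str:
    """Current UTC time as an ISO 8601 string (stored in the TEXT columns)."""
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


def log_action(db, session_id, action, actor, details=None):
    """Hypothetical helper: append one row to audit_logs."""
    db.execute(
        "INSERT INTO audit_logs (session_id, action, actor, details, created_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (session_id, action, actor, json.dumps(details or {}), utc_now()),
    )


def start_session(db, subject, level, weaknesses, system_prompt):
    """Hypothetical helper: create conversation, exam session and audit entry."""
    conversation_id = f"conv-{uuid.uuid4()}"
    session_id = f"exam-{uuid.uuid4()}"
    now = utc_now()
    with db:  # commits on success, rolls back if any statement fails
        db.execute(
            "INSERT INTO conversations (id, created_at) VALUES (?, ?)",
            (conversation_id, now),
        )
        db.execute(
            "INSERT INTO exam_sessions (id, conversation_id, subject, student_level, "
            "weaknesses, system_prompt, started_at) VALUES (?, ?, ?, ?, ?, ?, ?)",
            (session_id, conversation_id, subject, level,
             json.dumps(weaknesses), system_prompt, now),
        )
        log_action(db, session_id, "session_start", "examiner", {"subject": subject})
    return session_id


db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
db.executescript(SCHEMA)
session_id = start_session(
    db, "INF-101", "medium", ["kildehenvisning"], "Du er en middels god student..."
)
```

Writing the session row and its `session_start` audit entry in one transaction keeps the audit trail from drifting out of sync with the sessions table if an insert fails.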


---

# 5. API Endpoints

## 5.1 Existing Endpoints (enhanced)

| Method | Endpoint | Description | Changes |
|--------|----------|-------------|---------|
| POST | `/api/chat` | Send message, stream response | Add exam mode system prompt |
| GET | `/api/conversations` | List conversations | Filter by exam sessions |
| GET | `/api/history/{id}` | Get conversation history | No changes |

## 5.2 New Endpoints

### POST `/api/exam/session/start`

Start a new exam session with configuration.

**Request:**
```json
{
  "subject": "INF-101 Grunnleggende programmering",
  "student_level": "medium",
  "language": "no",
  "weaknesses": ["kildehenvisning", "feilhaandtering"],
  "examiner_id": "ola.nordmann@uia.no"
}
```


**Response:**
```json
{
  "session_id": "exam-uuid-here",
  "conversation_id": "conv-uuid-here",
  "status": "active",
  "system_prompt_preview": "Du er en middels god student...",
  "started_at": "2026-02-10T14:30:00Z"
}
```
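
Before a session is created, the request body would typically be checked against the allowed values. The following is a minimal, dependency-free sketch; the function name `validate_start_request` is hypothetical, and enforcing the "1-3 weaknesses" rule on the server is an assumption carried over from the setup modal in section 7.1. In the real module this would more likely be a Pydantic model in `app/exam/models.py`.

```python
ALLOWED_LEVELS = {"weak", "medium", "strong"}
ALLOWED_LANGUAGES = {"no", "en"}


def validate_start_request(payload: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []

    subject = payload.get("subject")
    if not isinstance(subject, str) or not subject.strip():
        errors.append("subject is required")

    if payload.get("student_level") not in ALLOWED_LEVELS:
        errors.append("student_level must be one of: medium, strong, weak")

    # language is optional and falls back to the schema default 'no'
    if payload.get("language", "no") not in ALLOWED_LANGUAGES:
        errors.append("language must be 'no' or 'en'")

    # Assumption: the 1-3 limit from the setup modal is also enforced server-side
    weaknesses = payload.get("weaknesses", [])
    if not isinstance(weaknesses, list) or not 1 <= len(weaknesses) <= 3:
        errors.append("weaknesses must be a list of 1-3 IDs")

    return errors


example_request = {
    "subject": "INF-101 Grunnleggende programmering",
    "student_level": "medium",
    "language": "no",
    "weaknesses": ["kildehenvisning"],
    "examiner_id": "ola.nordmann@uia.no",
}
problems = validate_start_request(example_request)
```

Returning every problem at once, rather than failing on the first, lets the setup modal highlight all invalid fields in a single round trip.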


### POST `/api/exam/session/{session_id}/end`

End an exam session.

**Response:**
```json
{
  "session_id": "exam-uuid-here",
  "status": "completed",
  "duration_minutes": 45,
  "message_count": 24,
  "ended_at": "2026-02-10T15:15:00Z"
}
```
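
The `duration_minutes` field can be derived from the stored `started_at` and `ended_at` timestamps. A minimal sketch, assuming both are ISO 8601 strings in UTC as in the examples; `parse_utc` and `summarize_session` are hypothetical names, and rounding down to whole minutes is an assumption.

```python
from datetime import datetime


def parse_utc(ts: str) -> datetime:
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def summarize_session(session_id, started_at, ended_at, message_count):
    """Build the response body for the end-session endpoint."""
    duration = parse_utc(ended_at) - parse_utc(started_at)
    return {
        "session_id": session_id,
        "status": "completed",
        # Assumption: partial minutes are rounded down
        "duration_minutes": int(duration.total_seconds() // 60),
        "message_count": message_count,
        "ended_at": ended_at,
    }


summary = summarize_session(
    "exam-uuid-here", "2026-02-10T14:30:00Z", "2026-02-10T15:15:00Z", 24
)
```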


### GET `/api/exam/export/{session_id}`

Export session for examiner review.

**Query params:**
- `format`: `json` (default) or `pdf`

**Response (JSON):**
```json
{
  "session_id": "exam-uuid-here",
  "metadata": {
    "subject": "INF-101",
    "student_level": "medium",
    "started_at": "2026-02-10T14:30:00Z",
    "ended_at": "2026-02-10T15:15:00Z",
    "duration_minutes": 45
  },
  "configuration": {
    "weaknesses": ["kildehenvisning"],
    "temperature": 0.3,
    "system_prompt": "..."
  },
  "transcript": [
    {
      "timestamp": "2026-02-10T14:30:15Z",
      "role": "user",
      "content": "Kan du forklare hva en for-løkke er?"
    },
    {
      "timestamp": "2026-02-10T14:30:45Z",
      "role": "assistant",
      "content": "En for-løkke er sånn... du bruker den for å gjenta ting..."
    }
  ],
  "audit_trail": [...]
}
```


### GET `/api/exam/sessions`

List all exam sessions (for examiners).

**Response:**
```json
{
  "sessions": [
    {
      "session_id": "...",
      "subject": "INF-101",
      "status": "completed",
      "started_at": "...",
      "message_count": 24
    }
  ]
}
```


---

# 6. AI Student System Prompt

## 6.1 Prompt Template (Norwegian)


```
Du er en {level_description} student i faget "{subject}".

VIKTIGE INSTRUKSJONER:
1. Du skal ALDRI opptre som lærer eller ekspert
2. Du svarer som en student som prøver å forklare det du har lært
3. Du kan være usikker, nøle, og gjøre feil
4. Du skal IKKE gi perfekte, læreboksvar

DITT NIVÅ: {level_description}
- Svak student: Gjør mange feil, husker dårlig, blander begreper
- Middels student: Forstår grunnleggende, men sliter med detaljer
- Sterk student: God forståelse, men kan fortsatt ha hull

SPESIFIKKE SVAKHETER DU HAR:
{weaknesses_list}

SPRÅK: Svar alltid på {language}.

OPPFØRSEL:
- Bruk uformelt språk som en student ville gjort
- Si "ehh", "jeg tror...", "var det ikke sånn at..."
- Hvis du blir presset på noe du ikke kan, innrøm at du er usikker
- Ikke gi kilder med mindre du blir spurt, og da kan du finne på feil kilder

EKSEMPEL på hvordan du svarer som {level} student:
Spørsmål: "Hva er en variabel?"
Svak: "Det er sånn... en boks? Der du legger ting i. Tror jeg."
Middels: "En variabel er en navngitt plass i minnet der du kan lagre verdier. Som en boks med en lapp på."
Sterk: "En variabel er en symbolsk referanse til en minnelokasjon som holder en verdi. Den har en type, et navn og en verdi."
```

## 6.2 Level Configurations

| Level | Temperature | Error Rate | Characteristics |
|-------|-------------|------------|-----------------|
| `weak` | 0.5 | High | Confused, wrong facts, poor structure |
| `medium` | 0.3 | Moderate | Basic understanding, gaps in details |
| `strong` | 0.2 | Low | Good grasp, occasional minor errors |

## 6.3 Configurable Weaknesses

| Weakness ID | Norwegian | Description |
|-------------|-----------|-------------|
| `kildehenvisning` | Dårlig kildebruk | Makes up sources, cites incorrectly |
| `feilhaandtering` | Svak feilhåndtering | Doesn't consider edge cases |
| `syntaks` | Syntaksfeil | Makes syntax errors in code |
| `begreper` | Blander begreper | Confuses related concepts |
| `struktur` | Dårlig struktur | Unorganized explanations |
| `dybde` | Overfladisk forståelse | Lacks deep understanding |
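
Taken together, the template, the level table and the weakness table suggest a small prompt generator. The sketch below is one possible shape for it, not the contents of `prompts.py`: `build_persona` and `PersonaConfig` are hypothetical names, the template is abbreviated to three of its sections, and the Norwegian level descriptions and language names are assumptions (only "middels god" appears in the response preview in section 5.2).

```python
from dataclasses import dataclass

# Temperatures from the table in 6.2. The descriptions are assumptions.
LEVELS = {
    "weak": {"temperature": 0.5, "description": "svak"},
    "medium": {"temperature": 0.3, "description": "middels god"},
    "strong": {"temperature": 0.2, "description": "sterk"},
}

# Weakness IDs and Norwegian labels from the table in 6.3.
WEAKNESSES = {
    "kildehenvisning": "Dårlig kildebruk",
    "feilhaandtering": "Svak feilhåndtering",
    "syntaks": "Syntaksfeil",
    "begreper": "Blander begreper",
    "struktur": "Dårlig struktur",
    "dybde": "Overfladisk forståelse",
}

# Assumption: language codes are spelled out in Norwegian inside the prompt.
LANGUAGES = {"no": "norsk", "en": "engelsk"}

# Abbreviated version of the template in 6.1 (same placeholders).
TEMPLATE = (
    'Du er en {level_description} student i faget "{subject}".\n\n'
    "SPESIFIKKE SVAKHETER DU HAR:\n{weaknesses_list}\n\n"
    "SPRÅK: Svar alltid på {language}."
)


@dataclass(frozen=True)
class PersonaConfig:
    system_prompt: str
    temperature: float


def build_persona(subject, level, weaknesses, language="no") -> PersonaConfig:
    """Fill the template and pick the temperature for the given level."""
    if level not in LEVELS:
        raise ValueError(f"unknown student level: {level!r}")
    if language not in LANGUAGES:
        raise ValueError(f"unknown language: {language!r}")
    unknown = [w for w in weaknesses if w not in WEAKNESSES]
    if unknown:
        raise ValueError(f"unknown weakness IDs: {unknown}")

    weaknesses_list = "\n".join(f"- {WEAKNESSES[w]}" for w in weaknesses)
    prompt = TEMPLATE.format(
        level_description=LEVELS[level]["description"],
        subject=subject,
        weaknesses_list=weaknesses_list,
        language=LANGUAGES[language],
    )
    return PersonaConfig(system_prompt=prompt, temperature=LEVELS[level]["temperature"])


persona = build_persona("INF-101", "medium", ["kildehenvisning", "begreper"])
```

Rejecting unknown weakness IDs up front means a typo in the configuration surfaces as an error at session start instead of silently producing a persona without that weakness.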

---

# 7. UI Components

## 7.1 Exam Setup Modal


```
┌───────────────────────────────────────────────────────┐
│  ⚠️ Flipped Exam - PoC Mode                           │
│                                                       │
│  Fag/Emne:                                            │
│  ┌─────────────────────────────────────────────────┐  │
│  │ INF-101 Grunnleggende programmering             │  │
│  └─────────────────────────────────────────────────┘  │
│                                                       │
│  KI-student nivå:                                     │
│  ○ Svak student (mange feil, usikker)                 │
│  ● Middels student (grunnleggende forståelse)         │
│  ○ Sterk student (god forståelse)                     │
│                                                       │
│  Svakheter (velg 1-3):                                │
│  ☑ Dårlig kildebruk                                   │
│  ☐ Svak feilhåndtering                                │
│  ☑ Blander begreper                                   │
│  ☐ Syntaksfeil i kode                                 │
│                                                       │
│  Språk:                                               │
│  ● Norsk   ○ English                                  │
│                                                       │
│  ┌─────────────┐  ┌──────────────────┐                │
│  │   Avbryt    │  │  Start eksamen   │                │
│  └─────────────┘  └──────────────────┘                │
└───────────────────────────────────────────────────────┘
```


## 7.2 Active Exam Session UI


```
┌───────────────────────────────────────────────────────┐
│  ⚠️ FLIPPED EXAM PoC - Ikke for offisiell bruk        │
│  Fag: INF-101 | Nivå: Middels | Tid: 23:45            │
├───────────────────────────────────────────────────────┤
│                                                       │
│  [Chat messages appear here as normal]                │
│                                                       │
│  ┌─────────────────────────────────────────────────┐  │
│  │ Du:                                             │  │
│  │ "Kan du forklare hva arv betyr i OOP?"          │  │
│  └─────────────────────────────────────────────────┘  │
│                                                       │
│  ┌─────────────────────────────────────────────────┐  │
│  │ KI-Student:                                     │  │
│  │ "Arv er sånn... når en klasse får egenskapene   │  │
│  │  fra en annen klasse. Som at en Bil-klasse kan  │  │
│  │  arve fra Kjøretøy eller noe sånt."             │  │
│  └─────────────────────────────────────────────────┘  │
│                                                       │
├───────────────────────────────────────────────────────┤
│  ┌───────────────────────────────────┐  ┌────────────┐│
│  │ Still et spørsmål...              │  │    Send    ││
│  └───────────────────────────────────┘  └────────────┘│
│                                                       │
│  ┌────────────────┐  ┌────────────────────┐           │
│  │  Avslutt økt   │  │  Eksporter (JSON)  │           │
│  └────────────────┘  └────────────────────┘           │
└───────────────────────────────────────────────────────┘
```


---

# 8. Security & Privacy Considerations

## 8.1 Data Handling

| Aspect | Implementation |
|--------|----------------|
| Data storage | SQLite on local server (PoC only) |
| Data retention | Manual deletion, no auto-purge in PoC |
| Access control | None in PoC (internal testing only) |
| Encryption | Not implemented in PoC |

## 8.2 PoC Limitations (to address before production)

- [ ] No authentication/authorization
- [ ] No HTTPS enforcement
- [ ] No session timeouts
- [ ] No rate limiting
- [ ] Single-tenant architecture
- [ ] No backup/recovery

---

# 9. Testing Strategy

## 9.1 Internal Testing Phases

| Phase | Duration | Participants | Goal |
|-------|----------|--------------|------|
| 1 | 1 week | Dev team | Validate basic functionality |
| 2 | 2 weeks | Extended team | Test AI student behaviors |
| 3 | 2 weeks | Select faculty | Pedagogical feedback |

## 9.2 Metrics to Measure

| Metric | Target | Measurement Method |
|--------|--------|-------------------|
| Response consistency | <20% variance | Same prompts, multiple runs |
| Latency | <3s first token | Timestamp logging |
| Error rate (system) | <1% | Exception logging |
| AI "in-character" rate | >90% | Manual review of transcripts |

## 9.3 Stop/Go Criteria

**Stop conditions:**
- AI frequently breaks character (>10% of responses)
- System errors affect >5% of sessions
- Faculty feedback indicates unusable for learning

**Go conditions:**
- All Phase 3 testers recommend continuation
- Consistency metrics met
- Clear value demonstrated for learning (even if not for exams)
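
The two quantitative stop conditions can be checked mechanically once review data is available. This is a minimal sketch with hypothetical names (`PilotResults`, `stop_reasons`); the faculty-feedback condition and the go conditions are qualitative and remain a manual judgment.

```python
from dataclasses import dataclass


@dataclass
class PilotResults:
    responses_reviewed: int    # AI responses manually reviewed
    character_breaks: int      # of those, how many broke character
    sessions_total: int
    sessions_with_errors: int  # sessions affected by a system error


def stop_reasons(results: PilotResults) -> list[str]:
    """Return the quantitative stop conditions that are currently triggered."""
    reasons = []
    if (results.responses_reviewed
            and results.character_breaks / results.responses_reviewed > 0.10):
        reasons.append("AI breaks character in more than 10% of responses")
    if (results.sessions_total
            and results.sessions_with_errors / results.sessions_total > 0.05):
        reasons.append("system errors affect more than 5% of sessions")
    return reasons


# 25 of 200 responses (12.5%) broke character; 1 of 40 sessions (2.5%) had errors
reasons = stop_reasons(PilotResults(200, 25, 40, 1))
```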

---

# 10. Implementation Checklist

## Phase 1: Core Implementation (This PoC)

- [ ] Enhanced database schema with exam_sessions table
- [ ] Exam setup modal in UI
- [ ] AI student system prompt generator
- [ ] `/api/exam/session/start` endpoint
- [ ] `/api/exam/session/{id}/end` endpoint
- [ ] `/api/exam/export/{id}` endpoint
- [ ] PoC disclaimer banner
- [ ] Basic audit logging

## Phase 2: Enhancements (Future)

- [ ] PDF export with formatting
- [ ] Session replay functionality
- [ ] Multi-examiner support
- [ ] Performance dashboards
- [ ] Consistency analysis tools

## Phase 3: Production Readiness (If approved)

- [ ] Authentication integration (Feide)
- [ ] LMS integration (Canvas)
- [ ] Proper security audit
- [ ] GDPR compliance review
- [ ] Load testing
- [ ] Deployment to production Azure

---

# 11. File Structure


```
azure-ai-foundry-claudesonnet-45/
├── app/
│   ├── __init__.py
│   ├── main.py              # FastAPI app (enhanced)
│   ├── exam/                # NEW: Exam module
│   │   ├── __init__.py
│   │   ├── models.py        # Pydantic models
│   │   ├── prompts.py       # System prompt templates
│   │   └── routes.py        # Exam-specific endpoints
│   └── database.py          # NEW: DB utilities
├── docs/
│   └── FLIPPED_EXAM_POC_TECHNICAL_SPEC.md   # This document
├── tests/                   # NEW: Test suite
│   ├── test_exam_session.py
│   └── test_ai_student.py
├── requirements.txt
├── .env.example
└── README.md
```


---

# 12. Deployment

## 12.1 Local Development

```bash
# Clone and setup
cd azure-ai-foundry-claudesonnet-45
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Configure
cp .env.example .env
# Edit .env with Azure credentials

# Run
uvicorn app.main:app --reload --port 8000
```


## 12.2 Environment Variables

```env
# Azure AI Foundry
AZURE_FOUNDRY_ENDPOINT=https://your-resource.services.ai.azure.com
AZURE_API_KEY=your-api-key
CLAUDE_DEPLOYMENT_NAME=claude-sonnet-4-5

# Database
CHAT_DB_PATH=chat.db

# Exam PoC Settings (NEW)
EXAM_MODE_ENABLED=true
EXAM_DEFAULT_TEMPERATURE=0.3
EXAM_MAX_DURATION_MINUTES=60
```
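
On the backend, the exam settings could be read once at startup into a typed object. A minimal sketch using only the standard library; `ExamSettings` and `load_exam_settings` are hypothetical names, and the fallback defaults are assumptions that mirror the example values above (except that exam mode is assumed to be off when the variable is unset).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ExamSettings:
    enabled: bool
    default_temperature: float
    max_duration_minutes: int
    db_path: str


def load_exam_settings(env=os.environ) -> ExamSettings:
    """Parse the exam-related variables; `env` is injectable for testing."""
    enabled_raw = env.get("EXAM_MODE_ENABLED", "false")
    return ExamSettings(
        enabled=enabled_raw.strip().lower() in {"1", "true", "yes"},
        default_temperature=float(env.get("EXAM_DEFAULT_TEMPERATURE", "0.3")),
        max_duration_minutes=int(env.get("EXAM_MAX_DURATION_MINUTES", "60")),
        db_path=env.get("CHAT_DB_PATH", "chat.db"),
    )


# A plain dict stands in for the process environment here
settings = load_exam_settings(
    {"EXAM_MODE_ENABLED": "true", "EXAM_DEFAULT_TEMPERATURE": "0.3"}
)
```

Parsing at startup means a malformed value such as `EXAM_MAX_DURATION_MINUTES=sixty` raises a `ValueError` immediately rather than in the middle of a session.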


---

# 13. Monitoring and Operations

## 13.1 AI Monitor Verification

The AI Monitor runs automatically in the background after each AI response during an exam session. To verify it's working:

### 13.1.1 Check Pod Logs (Kubernetes)

View recent monitor activity:

```bash
# Check for monitor logs (last 50 lines)
kubectl logs -n flipped-exam -l app=flipped-exam-chat --tail=50 | grep -i "MONITOR"

# Watch logs in real-time during testing
kubectl logs -n flipped-exam -l app=flipped-exam-chat -f | grep "MONITOR"

# Check logs for a specific session. Replace <session-id> with the real ID;
# square brackets would be read by grep -E as a character class.
kubectl logs -n flipped-exam deployment/flipped-exam-chat --tail=200 | grep -E "(MONITOR|exam-<session-id>)"
```


Expected log output:

```
[MONITOR] Analyzing response for session exam-6ca810dd-..., message 24
[MONITOR] Found 3 issues in response 24
  - level_mismatch (medium): Too sophisticated strategic analysis...
  - character_break (low): Sounds like a teacher organizing a lesson...
[MONITOR] Response 28 passed quality check (score: 0.85)
```


### 13.1.2 Monitor Log Patterns

**Success patterns:**
- `[MONITOR] Analyzing response for session <session_id>, message <n>` - Monitor started
- `[MONITOR] Response <n> passed quality check (score: X.XX)` - No issues found
- `[MONITOR] Found N issues in response <n>` - Issues detected

**Quality flag types:**
- `hallucination` - AI invented facts or sources
- `character_break` - AI stepped out of student role (e.g., acting as teacher)
- `level_mismatch` - Response too sophisticated or too weak for configured level
- `source_fabrication` - Cited non-existent sources
- `inappropriate` - Inappropriate content

**Severity levels:**
- `low` - Minor issue, acceptable variation
- `medium` - Noticeable problem, affects quality
- `high` - Significant issue, undermines session
- `critical` - Severe problem, session should be reviewed
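
For quick triage, the flag lines can be pulled out of captured log text with a regular expression. The sketch below assumes the line format shown in the expected output in 13.1.1 (`- <type> (<severity>): <detail>`); the real monitor's format may differ, and `parse_flags` and `worst_severity` are hypothetical helper names.

```python
import re

FLAG_TYPES = {
    "hallucination", "character_break", "level_mismatch",
    "source_fabrication", "inappropriate",
}
# Ordered from least to most severe
SEVERITIES = ["low", "medium", "high", "critical"]

# Matches lines such as "- level_mismatch (medium): Too sophisticated..."
FLAG_LINE = re.compile(
    r"^\s*-\s*(?P<type>[a-z_]+)\s+\((?P<severity>[a-z]+)\):\s*(?P<detail>.*)$"
)


def parse_flags(log_text: str) -> list[dict]:
    """Extract quality flags with a known type and severity from log text."""
    flags = []
    for line in log_text.splitlines():
        match = FLAG_LINE.match(line)
        if (match and match["type"] in FLAG_TYPES
                and match["severity"] in SEVERITIES):
            flags.append(match.groupdict())
    return flags


def worst_severity(flags: list[dict]):
    """Return the highest severity among the flags, or None if there are none."""
    if not flags:
        return None
    return max((f["severity"] for f in flags), key=SEVERITIES.index)


# Sample text modelled on the expected output above
sample_log = """\
[MONITOR] Analyzing response for session exam-6ca810dd-..., message 24
[MONITOR] Found 2 issues in response 24
  - level_mismatch (medium): Too sophisticated strategic analysis...
  - character_break (low): Sounds like a teacher organizing a lesson...
[MONITOR] Response 28 passed quality check (score: 0.85)
"""
flags = parse_flags(sample_log)
```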

### 13.1.3 Understanding Monitor Results

Example from production logs:


```
[MONITOR] Found 3 issues in response 26
  - level_mismatch (medium): Providing specific number '28 ships of the line'
    is quite detailed knowledge that borders on expert-level recall
  - character_break (low): Phrase 'This changed things in a few key ways. First...'
    sounds slightly more like a teacher organizing a lesson
  - level_mismatch (low): Very polished conclusion showing almost no gaps
    in understanding
```


**Interpreting results:**
- **Multiple medium flags** - AI may not be maintaining character consistently
- **Repeated character_break** - System prompt may need adjustment
- **Level_mismatch flags** - Temperature or prompt configuration may need tuning
- **No flags (score 0.7-0.9)** - Healthy student-like response

### 13.1.4 Testing Monitor Locally

Run the test script to verify monitoring logic:

```bash
# Run from the repository root
cd azure-ai-foundry-claudesonnet-45
python test_monitor.py
```


This tests with intentionally good and bad responses to ensure the monitor can detect issues.

## 13.2 Database Inspection

The monitor saves quality assessments to the database. To inspect:

**Via application export:**
1. Complete an exam session
2. Click "📊 Export Session" button
3. Review the exported JSON for quality assessments

**Direct database access (local development):**
```bash
sqlite3 chat.db "SELECT * FROM quality_assessments LIMIT 5;"
```


## 13.3 Production Monitoring

**Key metrics to track:**
- Monitor execution rate (should match message count during exams)
- Average quality scores per session
- Flag distribution by type and severity
- Monitor errors or failures

**Health indicators:**
- `[MONITOR] Analyzing response` logs appear for every AI response
- Background tasks complete without errors
- Quality reports accessible via export
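
The first three metrics can be computed per session from the stored assessments. A minimal sketch: the shape of an assessment record (a `score` plus a list of `flags` with `type` and `severity`) is an assumption, since the `quality_assessments` schema is not specified in this document, and `session_metrics` is a hypothetical name.

```python
from collections import Counter
from statistics import mean


def session_metrics(ai_response_count: int, assessments: list[dict]) -> dict:
    """Summarise monitor coverage, scores and flags for one session."""
    scores = [a["score"] for a in assessments]
    flag_counts = Counter(
        (flag["type"], flag["severity"])
        for a in assessments
        for flag in a["flags"]
    )
    return {
        # Should be 1.0 when the monitor ran for every AI response
        "execution_rate": (
            len(assessments) / ai_response_count if ai_response_count else 0.0
        ),
        "average_score": mean(scores) if scores else None,
        "flag_distribution": dict(flag_counts),
    }


# Hypothetical data: 4 AI responses, but only 3 were assessed by the monitor
assessments = [
    {"score": 0.85, "flags": []},
    {"score": 0.55, "flags": [
        {"type": "level_mismatch", "severity": "medium"},
        {"type": "character_break", "severity": "low"},
    ]},
    {"score": 0.70, "flags": [{"type": "level_mismatch", "severity": "low"}]},
]
metrics = session_metrics(4, assessments)
```

An execution rate below 1.0, as in this example, would indicate that the monitor's background task failed or was skipped for at least one response.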

---

# 14. Appendix

## A. Example Exam Session Transcript


```
[Session Start: 2026-02-10T14:30:00Z]
[Config: Subject=INF-101, Level=medium, Weaknesses=[kildehenvisning]]

Student: Hei! Kan du forklare hva objektorientert programmering er?

KI-Student: Hei! Eh, objektorientert programmering er en måte å
programmere på der man bruker objekter. Et objekt er liksom... en ting
som har egenskaper og ting den kan gjøre. Som en bil har farge og kan
kjøre. Tror det var noe med klasser også, at en klasse er som en
oppskrift på objekter.

Student: Bra start! Hvor lærte du dette?

KI-Student: Det står i boka vi brukte... tror det var "Java for
nybegynnere" eller noe sånt? Kapittel 3 tror jeg. [MERK: Fiktiv kilde]

Student: Kan du gi et kodeeksempel?

KI-Student: Ja, ehh...

    class Bil {
        String farge;
        void kjør() {
            System.out.println("Brum brum");
        }
    }

Sånn omtrent? Jeg husker ikke helt syntaksen, men det var noe sånt.

[Session End: 2026-02-10T15:15:00Z]
[Duration: 45 minutes, Messages: 24]
```


## B. Risk Assessment

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| AI breaks character | Medium | High | Strict prompts, low temperature |
| System downtime | Low | High | Local SQLite, minimal deps |
| Student games the AI | Medium | Medium | Not for actual grading in PoC |
| Data loss | Low | Medium | Regular manual backups |
| Inconsistent responses | High | Medium | Accept as PoC limitation |

---

**Document End**

*For questions, contact the UiA Development Team.*