# Flipped Exam PoC - Technical Specification

**Version:** 1.0
**Date:** February 10, 2026
**Status:** PoC / Experiment
**Author:** UiA Development Team

---

# 1. Executive Summary

This document describes the technical implementation of a Proof-of-Concept (PoC) for the "Flipped Exam" concept at UiA. The solution leverages existing Azure AI Foundry infrastructure with Claude Sonnet 4.5 to create an AI-powered "student" that examinees can interview.

## 1.1 Important Disclaimer

> **This is a PoC/Experiment, NOT a production-ready exam service.**
>
> - AI responses are non-deterministic
> - Results may vary between sessions
> - Not suitable for official student assessments without further validation

---

# 2. Background & Requirements

## 2.1 Pedagogical Concept

In a "Flipped Exam," students act as examiners rather than examinees:
- The student interviews an "AI student" about course material
- The student is graded on:
  - Quality and depth of questions asked
  - Ability to identify gaps in the AI student's knowledge
  - Analytical assessment of responses

## 2.2 Phase 1 Requirements (from specification)

| Requirement | Status |
|------------|--------|
| HTML5 web-based UI | ✅ Existing |
| Text-based dialog | ✅ Existing |
| Initial prompt configuration | 🆕 To implement |
| Simple, robust interface | ✅ Existing |
| Reuse existing AI infrastructure | ✅ Azure AI Foundry |
| Logging functionality | 🆕 To enhance |

## 2.3 Addressing Team Concerns

| Team Member | Concern | Solution |
|-------------|---------|----------|
| Benjamin | AI not reliable/deterministic | Low temperature (0.3), disclaimer, internal testing first |
| Benjamin | Need detailed training data | Structured system prompt with role-play instructions |
| Arvid | Logging is MUST-have | Full session logging with timestamps, export capability |
| Arvid | Error thresholds unclear | Configurable AI student level, deliberate error injection |
| Arvid | Who owns content? | Config stored per-session, clear examiner ownership |
| Robert | Treat as experiment | PoC banner, internal-only access |

---

# 3. Architecture Overview

## 3.1 System Components


┌─────────────────────────────────────────────────────────────────┐
│                         Client (Browser)                        │
│  ┌─────────────────┐  ┌─────────────────┐  ┌────────────────┐  │
│  │  Exam Setup     │  │  Chat Interface │  │  Export/Review │  │
│  │  Modal          │  │  (Streaming)    │  │  Panel         │  │
│  └────────┬────────┘  └────────┬────────┘  └───────┬────────┘  │
└───────────┼─────────────────────┼──────────────────┼────────────┘
            │                     │                  │
            ▼                     ▼                  ▼
┌─────────────────────────────────────────────────────────────────┐
│                      FastAPI Backend                            │
│  ┌─────────────────┐  ┌─────────────────┐  ┌────────────────┐  │
│  │ POST /api/exam  │  │ POST /api/chat  │  │ GET /api/export│  │
│  │ /session/start  │  │ (streaming)     │  │ /{session_id}  │  │
│  └────────┬────────┘  └────────┬────────┘  └───────┬────────┘  │
│           │                    │                   │            │
│           ▼                    ▼                   ▼            │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │              SQLite Database (Enhanced)                   │  │
│  │  • conversations    • messages    • exam_sessions         │  │
│  │  • audit_logs (NEW)                                       │  │
│  └──────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────┬────────────────────────┘
                                         │
                                         ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Azure AI Foundry                             │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │              Claude Sonnet 4.5 (Anthropic)               │   │
│  │              Endpoint: services.ai.azure.com             │   │
│  └─────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘


## 3.2 Data Flow


1. Examiner configures exam session
   └─► Subject, AI student level, weaknesses, language
   
2. System generates AI student persona prompt
   └─► Stored in exam_sessions table
   
3. Student starts interview
   └─► Each message logged with timestamp
   
4. AI responds as "student"
   └─► Low temperature for consistency
   └─► Role-play instructions enforced
   
5. Session ends
   └─► Full transcript available for export
   └─► Examiner can review via JSON/PDF


---

# 4. Database Schema

## 4.1 Existing Tables (unchanged)

```sql
-- Conversations table
CREATE TABLE conversations (
    id TEXT PRIMARY KEY,
    created_at TEXT NOT NULL
);

-- Messages table
CREATE TABLE messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    conversation_id TEXT NOT NULL,
    role TEXT NOT NULL,
    content TEXT NOT NULL,
    created_at TEXT NOT NULL,
    FOREIGN KEY (conversation_id) REFERENCES conversations(id)
);
```


## 4.2 New Tables

```sql
-- Exam sessions configuration
CREATE TABLE exam_sessions (
    id TEXT PRIMARY KEY,
    conversation_id TEXT NOT NULL UNIQUE,
    examiner_id TEXT,                    -- Optional: who configured this
    subject TEXT NOT NULL,               -- Course/subject name
    student_level TEXT NOT NULL,         -- 'weak', 'medium', 'strong'
    language TEXT DEFAULT 'no',          -- 'no', 'en'
    weaknesses TEXT,                     -- JSON array of configured weaknesses
    temperature REAL DEFAULT 0.3,        -- Lower = more deterministic
    system_prompt TEXT NOT NULL,         -- Generated system prompt
    started_at TEXT NOT NULL,
    ended_at TEXT,
    status TEXT DEFAULT 'active',        -- 'active', 'completed', 'exported'
    FOREIGN KEY (conversation_id) REFERENCES conversations(id)
);

-- Audit log for all actions
CREATE TABLE audit_logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT,
    action TEXT NOT NULL,                -- 'session_start', 'message_sent', 'session_end', 'export'
    actor TEXT,                          -- 'student', 'system', 'examiner'
    details TEXT,                        -- JSON with action-specific data
    ip_address TEXT,
    user_agent TEXT,
    created_at TEXT NOT NULL,
    FOREIGN KEY (session_id) REFERENCES exam_sessions(id)
);
```
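
A minimal sketch of how a session row and its first audit entry might be written in one transaction, assuming Python's built-in `sqlite3` module. The helper names `open_db` and `start_exam_session`, the reduced column set, and the in-memory database are illustrative assumptions, not the contents of the project's `database.py`.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

# Reduced copy of the schema above; only the columns this sketch writes.
SCHEMA = """
CREATE TABLE conversations (id TEXT PRIMARY KEY, created_at TEXT NOT NULL);
CREATE TABLE exam_sessions (
    id TEXT PRIMARY KEY,
    conversation_id TEXT NOT NULL UNIQUE REFERENCES conversations(id),
    subject TEXT NOT NULL,
    student_level TEXT NOT NULL,
    weaknesses TEXT,
    system_prompt TEXT NOT NULL,
    started_at TEXT NOT NULL,
    status TEXT DEFAULT 'active'
);
CREATE TABLE audit_logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT REFERENCES exam_sessions(id),
    action TEXT NOT NULL,
    actor TEXT,
    details TEXT,
    created_at TEXT NOT NULL
);
"""


def open_db(path=":memory:"):
    """Open a database and create the (reduced) schema. Hypothetical helper."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def start_exam_session(conn, subject, student_level, weaknesses, system_prompt):
    """Insert conversation, session and audit rows; return the new session id."""
    session_id = f"exam-{uuid.uuid4()}"
    conversation_id = str(uuid.uuid4())
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    with conn:  # commits all three inserts together, or rolls back on error
        conn.execute(
            "INSERT INTO conversations (id, created_at) VALUES (?, ?)",
            (conversation_id, now),
        )
        conn.execute(
            "INSERT INTO exam_sessions (id, conversation_id, subject, student_level,"
            " weaknesses, system_prompt, started_at) VALUES (?, ?, ?, ?, ?, ?, ?)",
            (session_id, conversation_id, subject, student_level,
             json.dumps(weaknesses), system_prompt, now),
        )
        conn.execute(
            "INSERT INTO audit_logs (session_id, action, actor, details, created_at)"
            " VALUES (?, ?, ?, ?, ?)",
            (session_id, "session_start", "examiner",
             json.dumps({"subject": subject}), now),
        )
    return session_id
```

Writing the audit row in the same transaction as the session row means a session cannot exist without its `session_start` entry, which may matter given that logging is listed as a must-have.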


---

# 5. API Endpoints

## 5.1 Existing Endpoints (enhanced)

| Method | Endpoint | Description | Changes |
|--------|----------|-------------|---------|
| POST | `/api/chat` | Send message, stream response | Add exam mode system prompt |
| GET | `/api/conversations` | List conversations | Filter by exam sessions |
| GET | `/api/history/{id}` | Get conversation history | No changes |

## 5.2 New Endpoints

### POST `/api/exam/session/start`

Start a new exam session with configuration.

**Request:**
```json
{
    "subject": "INF-101 Grunnleggende programmering",
    "student_level": "medium",
    "language": "no",
    "weaknesses": ["kildehenvisning", "feilhaandtering"],
    "examiner_id": "ola.nordmann@uia.no"
}
```


**Response:**
```json
{
    "session_id": "exam-uuid-here",
    "conversation_id": "conv-uuid-here",
    "status": "active",
    "system_prompt_preview": "Du er en middels god student...",
    "started_at": "2026-02-10T14:30:00Z"
}
```
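
The request body above would need validating before a session is created. The following is a sketch using a plain dataclass from the standard library rather than the project's Pydantic models, whose contents are not shown here; `StartSessionRequest` and `parse_start_request` are hypothetical names, and the allowed values are taken from the schema comments in section 4.2.

```python
from dataclasses import dataclass, field
from typing import List, Optional

ALLOWED_LEVELS = {"weak", "medium", "strong"}
ALLOWED_LANGUAGES = {"no", "en"}


@dataclass
class StartSessionRequest:
    subject: str
    student_level: str
    language: str = "no"
    weaknesses: List[str] = field(default_factory=list)
    examiner_id: Optional[str] = None


def parse_start_request(payload: dict) -> StartSessionRequest:
    """Validate a decoded JSON body and return a typed request object."""
    subject = str(payload.get("subject", "")).strip()
    if not subject:
        raise ValueError("subject is required")

    level = payload.get("student_level")
    if level not in ALLOWED_LEVELS:
        raise ValueError(f"student_level must be one of {sorted(ALLOWED_LEVELS)}")

    language = payload.get("language", "no")
    if language not in ALLOWED_LANGUAGES:
        raise ValueError(f"language must be one of {sorted(ALLOWED_LANGUAGES)}")

    weaknesses = payload.get("weaknesses") or []
    if not isinstance(weaknesses, list) or not all(isinstance(w, str) for w in weaknesses):
        raise ValueError("weaknesses must be a list of strings")

    return StartSessionRequest(
        subject=subject,
        student_level=level,
        language=language,
        weaknesses=list(weaknesses),
        examiner_id=payload.get("examiner_id"),
    )
```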


### POST `/api/exam/session/{session_id}/end`

End an exam session.

**Response:**
```json
{
    "session_id": "exam-uuid-here",
    "status": "completed",
    "duration_minutes": 45,
    "message_count": 24,
    "ended_at": "2026-02-10T15:15:00Z"
}
```
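
The `duration_minutes` field can be derived from the stored `started_at` and `ended_at` strings. This is a small sketch under the assumption that both timestamps use the ISO 8601 UTC form shown in the examples; the function names are illustrative.

```python
from datetime import datetime


def parse_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' into an aware datetime."""
    # Replacing 'Z' keeps this working on Python versions older than 3.11.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def duration_minutes(started_at: str, ended_at: str) -> int:
    """Return the whole minutes elapsed between two timestamps."""
    delta = parse_utc(ended_at) - parse_utc(started_at)
    if delta.total_seconds() < 0:
        raise ValueError("ended_at is earlier than started_at")
    return int(delta.total_seconds() // 60)


print(duration_minutes("2026-02-10T14:30:00Z", "2026-02-10T15:15:00Z"))  # 45
```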


### GET `/api/exam/export/{session_id}`

Export session for examiner review.

**Query params:**
- `format`: `json` (default) or `pdf`

**Response (JSON):**
```json
{
    "session_id": "exam-uuid-here",
    "metadata": {
        "subject": "INF-101",
        "student_level": "medium",
        "started_at": "2026-02-10T14:30:00Z",
        "ended_at": "2026-02-10T15:15:00Z",
        "duration_minutes": 45
    },
    "configuration": {
        "weaknesses": ["kildehenvisning"],
        "temperature": 0.3,
        "system_prompt": "..."
    },
    "transcript": [
        {
            "timestamp": "2026-02-10T14:30:15Z",
            "role": "user",
            "content": "Kan du forklare hva en for-løkke er?"
        },
        {
            "timestamp": "2026-02-10T14:30:45Z",
            "role": "assistant",
            "content": "En for-løkke er sånn... du bruker den for å gjenta ting..."
        }
    ],
    "audit_trail": [...]
}
```
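
One way the export document might be assembled from already-fetched rows is sketched below. The function name `build_export` and the dict-shaped inputs are assumptions for illustration; only the output keys follow the response shown above (the computed `duration_minutes` field is left out for brevity).

```python
import json


def build_export(session: dict, messages: list, audit_trail: list) -> str:
    """Serialize one exam session into the export JSON layout."""
    export = {
        "session_id": session["id"],
        "metadata": {
            "subject": session["subject"],
            "student_level": session["student_level"],
            "started_at": session["started_at"],
            "ended_at": session.get("ended_at"),
        },
        "configuration": {
            # weaknesses is stored as a JSON array in a TEXT column
            "weaknesses": json.loads(session.get("weaknesses") or "[]"),
            "temperature": session["temperature"],
            "system_prompt": session["system_prompt"],
        },
        "transcript": [
            {"timestamp": m["created_at"], "role": m["role"], "content": m["content"]}
            for m in messages
        ],
        "audit_trail": audit_trail,
    }
    # ensure_ascii=False keeps Norwegian characters (æ, ø, å) readable in the file
    return json.dumps(export, ensure_ascii=False, indent=4)
```

Passing `ensure_ascii=False` is a deliberate choice here: without it, transcript text such as "for-løkke" would be written as `\u00f8` escapes, which is valid JSON but harder for an examiner to read.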


### GET `/api/exam/sessions`

List all exam sessions (for examiners).

**Response:**
```json
{
    "sessions": [
        {
            "session_id": "...",
            "subject": "INF-101",
            "status": "completed",
            "started_at": "...",
            "message_count": 24
        }
    ]
}
```


---

# 6. AI Student System Prompt

## 6.1 Prompt Template (Norwegian)


Du er en {level_description} student i faget "{subject}".

VIKTIGE INSTRUKSJONER:
1. Du skal ALDRI opptre som lærer eller ekspert
2. Du svarer som en student som prøver å forklare det du har lært
3. Du kan være usikker, nøle, og gjøre feil
4. Du skal IKKE gi perfekte, læreboksvar

DITT NIVÅ: {level_description}
- Svak student: Gjør mange feil, husker dårlig, blander begreper
- Middels student: Forstår grunnleggende, men sliter med detaljer
- Sterk student: God forståelse, men kan fortsatt ha hull

SPESIFIKKE SVAKHETER DU HAR:
{weaknesses_list}

SPRÅK: Svar alltid på {language}.

OPPFØRSEL:
- Bruk uformelt språk som en student ville gjort
- Si "ehh", "jeg tror...", "var det ikke sånn at..."
- Hvis du blir presset på noe du ikke kan, innrøm at du er usikker
- Ikke gi kilder med mindre du blir spurt, og da kan du finne på feil kilder

EKSEMPEL på hvordan du svarer som {level} student:
Spørsmål: "Hva er en variabel?"
Svak: "Det er sånn... en boks? Der du legger ting i. Tror jeg."
Middels: "En variabel er en navngitt plass i minnet der du kan lagre verdier. Som en boks med en lapp på."
Sterk: "En variabel er en symbolsk referanse til en minnelokasjon som holder en verdi. Den har en type, et navn og en verdi."


## 6.2 Level Configurations

| Level | Temperature | Error Rate | Characteristics |
|-------|-------------|------------|-----------------|
| `weak` | 0.5 | High | Confused, wrong facts, poor structure |
| `medium` | 0.3 | Moderate | Basic understanding, gaps in details |
| `strong` | 0.2 | Low | Good grasp, occasional minor errors |

## 6.3 Configurable Weaknesses

| Weakness ID | Norwegian | Description |
|-------------|-----------|-------------|
| `kildehenvisning` | Dårlig kildebruk | Makes up sources, cites incorrectly |
| `feilhaandtering` | Svak feilhåndtering | Doesn't consider edge cases |
| `syntaks` | Syntaksfeil | Makes syntax errors in code |
| `begreper` | Blander begreper | Confuses related concepts |
| `struktur` | Dårlig struktur | Unorganized explanations |
| `dybde` | Overfladisk forståelse | Lacks deep understanding |
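
The level and weakness tables can be combined into a prompt generator along the following lines. This is a sketch only: the template is shortened to three of the placeholders from section 6.1, the Norwegian level descriptions and language names are assumptions, and `build_system_prompt` is a hypothetical name rather than the function in `prompts.py`.

```python
# Temperatures from the level table; the Norwegian descriptions are assumed wording.
LEVELS = {
    "weak": {"description": "svak", "temperature": 0.5},
    "medium": {"description": "middels god", "temperature": 0.3},
    "strong": {"description": "sterk", "temperature": 0.2},
}

# Weakness IDs and Norwegian labels from the weakness table.
WEAKNESSES = {
    "kildehenvisning": "Dårlig kildebruk",
    "feilhaandtering": "Svak feilhåndtering",
    "syntaks": "Syntaksfeil",
    "begreper": "Blander begreper",
    "struktur": "Dårlig struktur",
    "dybde": "Overfladisk forståelse",
}

LANGUAGES = {"no": "norsk", "en": "engelsk"}  # assumed display names

# Shortened template; the full version has more instruction blocks.
TEMPLATE = (
    'Du er en {level_description} student i faget "{subject}".\n\n'
    "SPESIFIKKE SVAKHETER DU HAR:\n{weaknesses_list}\n\n"
    "SPRÅK: Svar alltid på {language}."
)


def build_system_prompt(subject, level, weaknesses, language="no"):
    """Return (system_prompt, temperature) for one exam session."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    if language not in LANGUAGES:
        raise ValueError(f"unknown language: {language!r}")
    unknown = [w for w in weaknesses if w not in WEAKNESSES]
    if unknown:
        raise ValueError(f"unknown weakness IDs: {unknown}")

    weaknesses_list = "\n".join(f"- {WEAKNESSES[w]}" for w in weaknesses) or "- (ingen)"
    prompt = TEMPLATE.format(
        level_description=LEVELS[level]["description"],
        subject=subject,
        weaknesses_list=weaknesses_list,
        language=LANGUAGES[language],
    )
    return prompt, LEVELS[level]["temperature"]
```

Rejecting unknown weakness IDs up front means a misspelled ID surfaces as an error instead of silently producing a prompt without that weakness.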

---

# 7. UI Components

## 7.1 Exam Setup Modal


┌─────────────────────────────────────────────────────────┐
│  ⚠️ Flipped Exam - PoC Mode                            │
│                                                         │
│  Fag/Emne:                                             │
│  ┌─────────────────────────────────────────────────┐   │
│  │ INF-101 Grunnleggende programmering             │   │
│  └─────────────────────────────────────────────────┘   │
│                                                         │
│  KI-student nivå:                                      │
│  ○ Svak student (mange feil, usikker)                  │
│  ● Middels student (grunnleggende forståelse)          │
│  ○ Sterk student (god forståelse)                      │
│                                                         │
│  Svakheter (velg 1-3):                                 │
│  ☑ Dårlig kildebruk                                    │
│  ☐ Svak feilhåndtering                                 │
│  ☑ Blander begreper                                    │
│  ☐ Syntaksfeil i kode                                  │
│                                                         │
│  Språk:                                                │
│  ● Norsk  ○ English                                    │
│                                                         │
│  ┌─────────────┐  ┌──────────────────┐                │
│  │   Avbryt    │  │  Start eksamen   │                │
│  └─────────────┘  └──────────────────┘                │
└─────────────────────────────────────────────────────────┘


## 7.2 Active Exam Session UI


┌─────────────────────────────────────────────────────────┐
│  ⚠️ FLIPPED EXAM PoC - Ikke for offisiell bruk         │
│  Fag: INF-101 | Nivå: Middels | Tid: 23:45             │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  [Chat messages appear here as normal]                  │
│                                                         │
│  ┌─────────────────────────────────────────────────┐   │
│  │ Du:                                              │   │
│  │ "Kan du forklare hva arv betyr i OOP?"          │   │
│  └─────────────────────────────────────────────────┘   │
│                                                         │
│  ┌─────────────────────────────────────────────────┐   │
│  │ KI-Student:                                      │   │
│  │ "Arv er sånn... når en klasse får egenskapene   │   │
│  │ fra en annen klasse. Som at en Bil-klasse kan   │   │
│  │ arve fra Kjøretøy eller noe sånt."              │   │
│  └─────────────────────────────────────────────────┘   │
│                                                         │
├─────────────────────────────────────────────────────────┤
│  ┌───────────────────────────────────┐ ┌────────────┐  │
│  │ Still et spørsmål...              │ │   Send     │  │
│  └───────────────────────────────────┘ └────────────┘  │
│                                                         │
│  ┌────────────────┐  ┌────────────────────┐            │
│  │  Avslutt økt   │  │  Eksporter (JSON)  │            │
│  └────────────────┘  └────────────────────┘            │
└─────────────────────────────────────────────────────────┘


---

# 8. Security & Privacy Considerations

## 8.1 Data Handling

| Aspect | Implementation |
|--------|----------------|
| Data storage | SQLite on local server (PoC only) |
| Data retention | Manual deletion, no auto-purge in PoC |
| Access control | None in PoC (internal testing only) |
| Encryption | Not implemented in PoC |

## 8.2 PoC Limitations (to address before production)

- [ ] No authentication/authorization
- [ ] No HTTPS enforcement
- [ ] No session timeouts
- [ ] No rate limiting
- [ ] Single-tenant architecture
- [ ] No backup/recovery

---

# 9. Testing Strategy

## 9.1 Internal Testing Phases

| Phase | Duration | Participants | Goal |
|-------|----------|--------------|------|
| 1 | 1 week | Dev team | Validate basic functionality |
| 2 | 2 weeks | Extended team | Test AI student behaviors |
| 3 | 2 weeks | Select faculty | Pedagogical feedback |

## 9.2 Metrics to Measure

| Metric | Target | Measurement Method |
|--------|--------|-------------------|
| Response consistency | <20% variance | Same prompts, multiple runs |
| Latency | <3s first token | Timestamp logging |
| Error rate (system) | <1% | Exception logging |
| AI "in-character" rate | >90% | Manual review of transcripts |

## 9.3 Stop/Go Criteria

**Stop conditions:**
- AI frequently breaks character (>10% of responses)
- System errors affect >5% of sessions
- Faculty feedback indicates the tool is unusable for learning

**Go conditions:**
- All Phase 3 testers recommend continuation
- Consistency metrics met
- Clear value demonstrated for learning (even if not for exams)
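
The two quantitative stop conditions can be checked mechanically once the counts are collected. This is a sketch under the assumption that those counts are available from transcript review and exception logs; `PocMetrics` and `stop_reasons` are hypothetical names, and the qualitative faculty feedback criterion is deliberately left out.

```python
from dataclasses import dataclass
from typing import List

CHARACTER_BREAK_LIMIT = 0.10  # stop if more than 10% of responses break character
SESSION_ERROR_LIMIT = 0.05    # stop if more than 5% of sessions hit system errors


@dataclass
class PocMetrics:
    total_responses: int
    character_breaks: int
    total_sessions: int
    sessions_with_errors: int


def stop_reasons(metrics: PocMetrics) -> List[str]:
    """Return the names of the quantitative stop conditions that are triggered."""
    reasons = []
    if metrics.total_responses:
        break_rate = metrics.character_breaks / metrics.total_responses
        if break_rate > CHARACTER_BREAK_LIMIT:
            reasons.append("character_break_rate")
    if metrics.total_sessions:
        error_rate = metrics.sessions_with_errors / metrics.total_sessions
        if error_rate > SESSION_ERROR_LIMIT:
            reasons.append("session_error_rate")
    return reasons
```

An empty list only means that no numeric stop condition fired; the go decision still depends on tester recommendations and the consistency metrics.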

---

# 10. Implementation Checklist

## Phase 1: Core Implementation (This PoC)

- [ ] Enhanced database schema with exam_sessions table
- [ ] Exam setup modal in UI
- [ ] AI student system prompt generator
- [ ] `/api/exam/session/start` endpoint
- [ ] `/api/exam/session/{id}/end` endpoint
- [ ] `/api/exam/export/{id}` endpoint
- [ ] PoC disclaimer banner
- [ ] Basic audit logging

## Phase 2: Enhancements (Future)

- [ ] PDF export with formatting
- [ ] Session replay functionality
- [ ] Multi-examiner support
- [ ] Performance dashboards
- [ ] Consistency analysis tools

## Phase 3: Production Readiness (If approved)

- [ ] Authentication integration (Feide)
- [ ] LMS integration (Canvas)
- [ ] Proper security audit
- [ ] GDPR compliance review
- [ ] Load testing
- [ ] Deployment to production Azure

---

# 11. File Structure

```
azure-ai-foundry-claudesonnet-45/
├── app/
│   ├── __init__.py
│   ├── main.py                 # FastAPI app (enhanced)
│   ├── exam/                   # NEW: Exam module
│   │   ├── __init__.py
│   │   ├── models.py           # Pydantic models
│   │   ├── prompts.py          # System prompt templates
│   │   └── routes.py           # Exam-specific endpoints
│   └── database.py             # NEW: DB utilities
├── docs/
│   └── FLIPPED_EXAM_POC_TECHNICAL_SPEC.md  # This document
├── tests/                      # NEW: Test suite
│   ├── test_exam_session.py
│   └── test_ai_student.py
├── requirements.txt
├── .env.example
└── README.md
```


---

# 12. Deployment

## 12.1 Local Development

```bash
# Clone and setup
cd azure-ai-foundry-claudesonnet-45
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Configure
cp .env.example .env
# Edit .env with Azure credentials

# Run
uvicorn app.main:app --reload --port 8000
```


## 12.2 Environment Variables

```env
# Azure AI Foundry
AZURE_FOUNDRY_ENDPOINT=https://your-resource.services.ai.azure.com
AZURE_API_KEY=your-api-key
CLAUDE_DEPLOYMENT_NAME=claude-sonnet-4-5

# Database
CHAT_DB_PATH=chat.db

# Exam PoC Settings (NEW)
EXAM_MODE_ENABLED=true
EXAM_DEFAULT_TEMPERATURE=0.3
EXAM_MAX_DURATION_MINUTES=60
```
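
The three exam settings could be read and validated at startup roughly as follows. This is a sketch using only the standard library; `ExamSettings` and `load_exam_settings` are hypothetical names, and both the fallback defaults and the 0.0 to 1.0 temperature range are assumptions rather than documented behavior of the application.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ExamSettings:
    enabled: bool
    default_temperature: float
    max_duration_minutes: int


def load_exam_settings(env: Optional[Mapping[str, str]] = None) -> ExamSettings:
    """Read the exam PoC settings from the environment (or a supplied mapping)."""
    env = os.environ if env is None else env

    # Assumed default: exam mode stays off unless explicitly enabled.
    enabled = env.get("EXAM_MODE_ENABLED", "false").strip().lower() in {"1", "true", "yes"}

    temperature = float(env.get("EXAM_DEFAULT_TEMPERATURE", "0.3"))
    if not 0.0 <= temperature <= 1.0:  # assumed valid range
        raise ValueError("EXAM_DEFAULT_TEMPERATURE must be between 0.0 and 1.0")

    max_minutes = int(env.get("EXAM_MAX_DURATION_MINUTES", "60"))
    if max_minutes <= 0:
        raise ValueError("EXAM_MAX_DURATION_MINUTES must be positive")

    return ExamSettings(enabled, temperature, max_minutes)
```

Accepting an optional mapping instead of always reading `os.environ` keeps the function easy to exercise in tests without modifying the process environment.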


---

# 13. Monitoring and Operations

## 13.1 AI Monitor Verification

The AI Monitor runs automatically in the background after each AI response during an exam session. To verify that it is working:

### 13.1.1 Check Pod Logs (Kubernetes)

View recent monitor activity:

```bash
# Check for monitor logs (last 50 lines)
kubectl logs -n flipped-exam -l app=flipped-exam-chat --tail=50 | grep -i "MONITOR"

# Watch logs in real-time during testing
kubectl logs -n flipped-exam -l app=flipped-exam-chat -f | grep "MONITOR"

# Check specific session logs (replace <session-id> with the actual ID)
kubectl logs -n flipped-exam deployment/flipped-exam-chat --tail=200 | grep -E "(MONITOR|exam-<session-id>)"
```


Expected log output:

```
[MONITOR] Analyzing response for session exam-6ca810dd-..., message 24
[MONITOR] Found 3 issues in response 24
  - level_mismatch (medium): Too sophisticated strategic analysis...
  - character_break (low): Sounds like a teacher organizing a lesson...
[MONITOR] Response 28 passed quality check (score: 0.85)
```


### 13.1.2 Monitor Log Patterns

**Success patterns:**
- `[MONITOR] Analyzing response for session {session_id}, message {n}` - Monitor started
- `[MONITOR] Response {n} passed quality check (score: X.XX)` - No issues found
- `[MONITOR] Found N issues in response {n}` - Issues detected

**Quality flag types:**
- `hallucination` - AI invented facts or sources
- `character_break` - AI stepped out of student role (e.g., acting as teacher)
- `level_mismatch` - Response too sophisticated or too weak for configured level
- `source_fabrication` - Cited non-existent sources
- `inappropriate` - Inappropriate content

**Severity levels:**
- `low` - Minor issue, acceptable variation
- `medium` - Noticeable problem, affects quality
- `high` - Significant issue, undermines session
- `critical` - Severe problem, session should be reviewed
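
For review outside the cluster, the flag lines can be pulled out of captured log text with two regular expressions. This is a sketch that assumes the lines look exactly like the examples in this section, with no timestamp or pod-name prefix on the flag lines; `parse_monitor_flags` and `Flag` are illustrative names.

```python
import re
from typing import Iterable, List, NamedTuple

FOUND_LINE = re.compile(r"\[MONITOR\] Found (?P<count>\d+) issues? in response (?P<response>\d+)")
FLAG_LINE = re.compile(
    r"^\s*-\s+(?P<type>[a-z_]+)\s+\((?P<severity>low|medium|high|critical)\):\s*(?P<detail>.*)$"
)


class Flag(NamedTuple):
    response: int
    flag_type: str
    severity: str
    detail: str


def parse_monitor_flags(lines: Iterable[str]) -> List[Flag]:
    """Collect the flag lines that follow each '[MONITOR] Found N issues' line."""
    flags = []
    current_response = None
    for line in lines:
        found = FOUND_LINE.search(line)
        if found:
            current_response = int(found.group("response"))
            continue
        match = FLAG_LINE.match(line)
        if match and current_response is not None:
            flags.append(
                Flag(current_response, match["type"], match["severity"], match["detail"])
            )
        elif "[MONITOR]" in line:
            # Any other monitor line ends the current list of flags.
            current_response = None
    return flags
```

Wrapped continuation lines of a long flag description are ignored by this sketch, so `detail` holds only the first line of each flag.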

### 13.1.3 Understanding Monitor Results

Example from production logs:

```
[MONITOR] Found 3 issues in response 26
  - level_mismatch (medium): Providing specific number '28 ships of the line' 
    is quite detailed knowledge that borders on expert-level recall
  - character_break (low): Phrase 'This changed things in a few key ways. First...' 
    sounds slightly more like a teacher organizing a lesson
  - level_mismatch (low): Very polished conclusion showing almost no gaps 
    in understanding
```


**Interpreting results:**
- **Multiple medium flags** - AI may not be maintaining character consistently
- **Repeated character_break** - System prompt may need adjustment
- **Level_mismatch flags** - Temperature or prompt configuration may need tuning
- **No flags (score 0.7-0.9)** - Healthy student-like response

### 13.1.4 Testing Monitor Locally

Run the test script from the repository root to verify the monitoring logic:

```bash
cd azure-ai-foundry-claudesonnet-45
python test_monitor.py
```


This tests with intentionally good and bad responses to ensure the monitor can detect issues.

## 13.2 Database Inspection

The monitor saves quality assessments to the database. To inspect:

**Via application export:**
1. Complete an exam session
2. Click "📊 Export Session" button
3. Review the exported JSON for quality assessments

**Direct database access (local development):**
```bash
sqlite3 chat.db "SELECT * FROM quality_assessments LIMIT 5;"
```


## 13.3 Production Monitoring

**Key metrics to track:**
- Monitor execution rate (should match message count during exams)
- Average quality scores per session
- Flag distribution by type and severity
- Monitor errors or failures
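
The metrics listed above could be computed per session along these lines. This is a sketch only: the layout of the `quality_assessments` data is not specified in this document, so the input shape used here (a dict with a `score` and a list of `(flag_type, severity)` pairs per assessed response) is an assumption, as is the function name.

```python
from collections import Counter
from statistics import mean


def summarize_session(assessments, ai_response_count):
    """Summarize monitor output for one session.

    assessments: list of dicts like
        {"score": 0.85, "flags": [("level_mismatch", "medium")]}
    ai_response_count: number of AI responses sent during the session
    """
    flag_counts = Counter(
        (flag_type, severity)
        for assessment in assessments
        for flag_type, severity in assessment["flags"]
    )
    scores = [assessment["score"] for assessment in assessments]
    return {
        # 1.0 means every AI response was analyzed by the monitor
        "execution_rate": len(assessments) / ai_response_count if ai_response_count else 0.0,
        "average_score": mean(scores) if scores else None,
        "flag_counts": dict(flag_counts),
    }
```

An `execution_rate` below 1.0 would indicate that some responses were never analyzed, which corresponds to the "monitor errors or failures" item.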

**Health indicators:**
- `[MONITOR] Analyzing response` logs appear for every AI response
- Background tasks complete without errors
- Quality reports accessible via export

---

# 14. Appendix

## A. Example Exam Session Transcript


[Session Start: 2026-02-10T14:30:00Z]
[Config: Subject=INF-101, Level=medium, Weaknesses=[kildehenvisning]]

Student: Hei! Kan du forklare hva objektorientert programmering er?

KI-Student: Hei! Eh, objektorientert programmering er en måte å 
programmere på der man bruker objekter. Et objekt er liksom... en ting 
som har egenskaper og ting den kan gjøre. Som en bil har farge og kan 
kjøre. Tror det var noe med klasser også, at en klasse er som en 
oppskrift på objekter.

Student: Bra start! Hvor lærte du dette?

KI-Student: Det står i boka vi brukte... tror det var "Java for 
nybegynnere" eller noe sånt? Kapittel 3 tror jeg. [MERK: Fiktiv kilde]

Student: Kan du gi et kodeeksempel?

KI-Student: Ja, ehh... 
java
class Bil {
    String farge;
    void kjør() {
        System.out.println("Brum brum");
    }
}

Sånn omtrent? Jeg husker ikke helt syntaksen, men det var noe sånt.

[Session End: 2026-02-10T15:15:00Z]
[Duration: 45 minutes, Messages: 24]


## B. Risk Assessment

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| AI breaks character | Medium | High | Strict prompts, low temperature |
| System downtime | Low | High | Local SQLite, minimal deps |
| Student games the AI | Medium | Medium | Not for actual grading in PoC |
| Data loss | Low | Medium | Regular manual backups |
| Inconsistent responses | High | Medium | Accept as PoC limitation |

---

**Document End**

*For questions, contact the UiA Development Team.*
