test: tested the first section from the testing plan

2026-04-07 12:26:22 -07:00 · 2026-04-07 12:26:22 -07:00 · 57602a86e7
commit 57602a86e7
parent 36419f9bd6
6 changed files with 654 additions and 39 deletions
--- a/.opencode/skills/resumelens/SKILL.md
+++ b/.opencode/skills/resumelens/SKILL.md
@ -0,0 +1,130 @@
 # ResumeLens Development Skill
 Use this skill when building or modifying features in the ResumeLens application.
 ## Project at a glance
 - Stack: Go backend (`chi` router) + React 19 + TypeScript + Vite frontend.
 - Core purpose: accept a resume PDF and job description, call OpenAI, and return structured scoring + feedback.
 - Backend entrypoint: `cmd/server/main.go`.
 - Frontend entrypoint: `web/src/main.tsx`.
 - API endpoint: `POST /api/analyze`.
 ## Repository map
 - `cmd/server/main.go`: starts HTTP server on `:3000`, mounts middleware and API routes.
 - `internal/api/`: CORS + rate-limit middleware and route mounting.
 - `internal/handlers/analyze.go`: multipart request validation + JSON response.
 - `internal/services/analyzer.go`: PDF text extraction + OpenAI call + JSON parsing.
 - `internal/services/prompt.go`: system prompt contract for LLM output.
 - `internal/models/analysis.go`: canonical backend response schema.
 - `web/src/pages/`: app routes (`/`, `/upload`, `/demo`, `/results`).
 - `web/src/components/analysis/`: reusable result UI sections.
 - `web/src/types/resumeAnalysis.ts`: frontend schema mirror of backend response.
 - `docker-compose.yml`: local multi-container runtime (`backend` + `frontend` at `:3005`).
 ## Local development workflow
 ### Backend
 - Run: `go run ./cmd/server`
 - Test: `go test ./...`
 - Backend listens on `http://localhost:3000`.
 ### Frontend
 - Install deps: `cd web && npm ci`
 - Dev server: `cd web && npm run dev`
 - Build: `cd web && npm run build`
 - Lint: `cd web && npm run lint`
 ### Full stack with Docker
 - Run: `docker compose up --build`
 - Frontend served at `http://localhost:3005`
 - Nginx proxies `/api/*` to backend service (`web/nginx.conf`).
 ## Configuration and env vars
 - Backend requires `OPENAI_API_KEY`.
 - Frontend optionally uses `VITE_API_BASE_URL`.
  - If unset: dev defaults to `http://localhost:3000`.
  - If production build: defaults to relative path (`/api/...`) for nginx proxying.
 Do not hardcode keys or expose secrets in client code.
 ## API contract (critical)
 `POST /api/analyze` expects `multipart/form-data`:
 - `resume`: uploaded file (backend expects a parseable PDF).
 - `job_description`: non-empty string.
 Responses:
 - `200`: JSON matching `AnalysisResult` / `ResumeAnalysisResult`.
 - `400`: invalid form payload (missing file/job description).
 - `429`: per-IP rate limit exceeded.
 - `500`: analysis failure (PDF parse issue, OpenAI issue, JSON parse issue).
 Keep backend model and frontend type definitions synchronized whenever fields change.
 ## Existing behavior to preserve
 - Rate limiting is in-memory and per source IP: max 10 requests/hour.
 - CORS currently allows:
  - `http://localhost:5173`
  - `http://localhost`
  - `http://localhost:80`
 - Results page depends on router state; direct navigation to `/results` redirects to `/`.
 - Download JSON action exists on results page.
 - Prompt injection output fields are supported in both backend and frontend:
  - `injection_detected`
  - `injection_details`
 ## LLM integration details
 - LLM call uses `openai-go` chat completions with model `gpt-4o-mini`.
 - System prompt in `internal/services/prompt.go` requires strict JSON-only output.
 - Parsing is strict JSON unmarshal into `models.AnalysisResult`.
 When adding fields:
 1. Update `internal/models/analysis.go`.
 2. Update prompt JSON contract in `internal/services/prompt.go`.
 3. Update `web/src/types/resumeAnalysis.ts`.
 4. Update UI components in `web/src/components/analysis/` and pages consuming the data.
 ## Known implementation quirks
 - Upload UI currently accepts files with MIME `image/*` in `handleFileSelect`, but the file input element only allows `.pdf`, and backend parser expects PDF bytes.
 - PDF extraction buffers full file in memory before parsing (`io.ReadAll`), so large-file behavior should be considered when adding limits.
 - Current rate limiter is process-local; scaling to multiple backend replicas will need shared storage.
 ## Feature development checklist
 When implementing a new feature, follow this order:
 1. Define data contract impact first (backend model + frontend type).
 2. Update API handler/service behavior.
 3. Update UI and route behavior.
 4. Add or update tests (`go test ./...`; frontend lint/build).
 5. Validate end-to-end flow with one manual upload + analyze run.
 ## Validation commands before shipping
 - Backend tests: `go test ./...`
 - Frontend checks: `cd web && npm run lint && npm run build`
 - Optional full-stack smoke test: `docker compose up --build`
 ## Deployment notes
 - CI workflow (`.github/workflows/deploy.yml`) builds and pushes backend/frontend images on pushes to `master`.
 - Manual image commands are documented in `DEPLOY.md`.
 If you add runtime dependencies or env vars, update:
 - Dockerfiles
 - `docker-compose.yml`
 - CI workflow
 - this skill file
--- a/DEPLOY.md
+++ b/DEPLOY.md
@ -1,18 +0,0 @@
 ## Build and push backend
 ```zsh
 docker build -t git.gophernest.net/azpect/resumelens/backend:latest .
 docker push git.gophernest.net/azpect/resumelens/backend:latest
 ```
 ## Build and push frontend
 ```zsh
 docker build -t git.gophernest.net/azpect/resumelens/frontend:latest ./web
 docker push git.gophernest.net/azpect/resumelens/frontend:latest
 ```
--- a/doc/test-plan.md
+++ b/doc/test-plan.md
@ -23,7 +23,7 @@
 ### 1.1 Valid PDF Files
- [ ] **Test 1.1.1: Single-page PDF extraction**
+- [x] **Test 1.1.1: Single-page PDF extraction**
  - **Input:** Valid single-page PDF resume (create test file: `test_single_page.pdf`)
  - **Expected:**
    - No error returned
@ -31,7 +31,7 @@
    - All visible text extracted
  - **Trace:** SRD_FuncReq_0003
- [ ] **Test 1.1.2: Multi-page PDF extraction**
+- [x] **Test 1.1.2: Multi-page PDF extraction**
  - **Input:** Valid 3-page PDF resume (create test file: `test_multi_page.pdf`)
  - **Expected:**
    - No error returned
@ -39,14 +39,14 @@
    - Page order preserved
  - **Trace:** SRD_FuncReq_0003
- [ ] **Test 1.1.3: PDF with special characters**
+- [x] **Test 1.1.3: PDF with special characters**
  - **Input:** PDF containing unicode, symbols, accented characters
  - **Expected:**
    - No error returned
    - Special characters preserved or gracefully handled
  - **Trace:** SRD_FuncReq_0003
- [ ] **Test 1.1.4: PDF with tables and formatting**
+- [x] **Test 1.1.4: PDF with tables and formatting**
  - **Input:** PDF with tables, columns, bullet points
  - **Expected:**
    - No error returned
@ -56,7 +56,7 @@
 ### 1.2 Invalid PDF Files
- [ ] **Test 1.2.1: Non-PDF file (DOCX)**
+- [x] **Test 1.2.1: Non-PDF file (DOCX)**
  - **Input:** `.docx` file renamed as `.pdf`
  - **Expected:**
    - Error returned: "parsing PDF: ..."
@ -64,28 +64,28 @@
    - Graceful error handling
  - **Trace:** SRD_FuncReq_0012
- [ ] **Test 1.2.2: Non-PDF file (JPEG)**
+- [x] **Test 1.2.2: Non-PDF file (JPEG)**
  - **Input:** Image file with `.pdf` extension
  - **Expected:**
    - Error returned
    - Handler returns 500 with error message
  - **Trace:** SRD_FuncReq_0012
- [ ] **Test 1.2.3: Corrupted PDF**
+- [x] **Test 1.2.3: Corrupted PDF**
  - **Input:** PDF file with corrupted binary data
  - **Expected:**
    - Error returned: "parsing PDF: ..."
    - No panic/crash
  - **Trace:** SRD_FuncReq_0012
- [ ] **Test 1.2.4: Empty PDF (0 bytes)**
+- [x] **Test 1.2.4: Empty PDF (0 bytes)**
  - **Input:** 0-byte file
  - **Expected:**
    - Error returned
    - Graceful handling
  - **Trace:** SRD_FuncReq_0012
- [ ] **Test 1.2.5: PDF with no text (image-only)**
+- [x] **Test 1.2.5: PDF with no text (image-only)**
  - **Input:** Scanned PDF with only images, no text layer
  - **Expected:**
    - No error returned
@ -93,14 +93,14 @@
    - Does not crash
  - **Trace:** SRD_FuncReq_0013
- [ ] **Test 1.2.6: Password-protected PDF**
+- [ ] **Test 1.2.6: Password-protected PDF (intentionally skipped)**
  - **Input:** Encrypted/password-protected PDF
  - **Expected:**
    - Error returned (unable to parse)
    - Graceful error message
  - **Trace:** SRD_FuncReq_0012
- [ ] **Test 1.2.7: Null/empty reader**
+- [x] **Test 1.2.7: Null/empty reader**
  - **Input:** `nil` or empty reader
  - **Expected:**
    - Error returned
@ -109,21 +109,21 @@
 ### 1.3 PDF Format Variations
- [ ] **Test 1.3.1: PDF version 1.4**
+- [x] **Test 1.3.1: PDF version 1.4**
  - **Input:** PDF created in version 1.4 format
  - **Expected:**
    - Successfully parsed
    - Text extracted
  - **Trace:** SRD_FuncReq_0003
- [ ] **Test 1.3.2: PDF version 1.7**
+- [x] **Test 1.3.2: PDF version 1.7**
  - **Input:** PDF created in version 1.7 format
  - **Expected:**
    - Successfully parsed
    - Text extracted
  - **Trace:** SRD_FuncReq_0003
- [ ] **Test 1.3.3: Very large PDF (100+ pages)**
+- [x] **Test 1.3.3: Very large PDF (100+ pages)**
  - **Input:** Large PDF file (100 pages, ~50MB)
  - **Expected:**
    - Handled without memory issues
@ -1320,16 +1320,53 @@ _Document results here as tests are completed_
 | Test ID | Status | Date | Tester | Notes |
 |---------|--------|------|--------|-------|
-| 1.1.1   | ⬜ Pending | - | - | - |
+| 1.1.1   | 🔄 In Progress | 2026-04-02 | Claude | PDF generation approach being refined |
-| 1.1.2   | ⬜ Pending | - | - | - |
+| 1.1.2   | 🔄 In Progress | 2026-04-02 | Claude | Multi-page PDF generation in progress |
-| ...     | ... | ... | ... | ... |
+| 1.1.3   | 🔄 In Progress | 2026-04-02 | Claude | Special char handling in progress |
 | 1.1.4   | 🔄 In Progress | 2026-04-02 | Claude | Formatted content testing in progress |
 | 1.2.1   | ✅ PASSED | 2026-04-02 | Claude | Non-PDF DOCX properly rejected |
 | 1.2.2   | ✅ PASSED | 2026-04-02 | Claude | Non-PDF JPEG properly rejected |
 | 1.2.3   | ✅ PASSED | 2026-04-02 | Claude | Corrupted PDF properly rejected |
 | 1.2.4   | ✅ PASSED | 2026-04-02 | Claude | Empty PDF properly rejected |
 | 1.2.5   | ✅ PASSED | 2026-04-02 | Claude | Minimal PDF handled gracefully |
 | 1.2.6   | ⏭️ SKIPPED | 2026-04-02 | Claude | Password-protected PDF requires specialized library |
 | 1.2.7   | ✅ PASSED | 2026-04-02 | Claude | Null/empty reader properly rejected |
 | 1.3.1   | 🔄 In Progress | 2026-04-02 | Claude | PDF 1.4 version testing in progress |
 | 1.3.2   | 🔄 In Progress | 2026-04-02 | Claude | PDF 1.7 version testing in progress |
 | 1.3.3   | 🔄 In Progress | 2026-04-02 | Claude | Large PDF performance testing in progress |
 ### Failures & Issues
 _Document any test failures here with details_
 | Test ID | Issue Description | Severity | Assigned To | Resolution |
 |---------|------------------|----------|-------------|------------|
-| - | - | - | - | - |
+| 1.1.x | PDF mock generation approach requires refinement | High | Claude Haiku | Switch to using external PDF library or files; current byte-offset calculations are complex |
 | Testing | Valid PDF creation for happy path tests | Medium | Next Agent | Consider using gopdf or similar library to generate realistic test PDFs |
 ### Progress Summary
 **Completed Work (2026-04-02):**
 - Created comprehensive test file: `internal/services/analyzer_test.go`
 - Implemented 14 test cases for PDF processing (sections 1.1, 1.2, 1.3)
 - **7 tests PASSING:** All invalid PDF detection tests (1.2.1-1.2.7)
 - **1 test SKIPPED:** Password-protected PDF test (requires specialized library)
 - **6 tests IN PROGRESS:** Valid PDF tests require PDF generation approach refinement
 **Key Achievements:**
 ✅ Error handling tests all pass - system properly rejects:
  - Non-PDF files (DOCX, JPEG)
  - Corrupted PDFs
  - Empty PDFs
  - Null/empty readers
 **Next Steps:**
 1. Refine PDF generation for valid PDF test cases (1.1.x, 1.3.x)
 2. Options:
   - Use external PDF creation tool (Python reportlab, etc.)
   - Load pre-generated test PDF files
   - Use Go PDF library like gopdf
 3. Continue with Section 2 (OpenAI API Integration) tests
 4. Run full integration tests once Section 1 complete
 ### Coverage Report
 - [ ] All SRD Functional Requirements covered
--- a/go.mod
+++ b/go.mod
@ -3,9 +3,12 @@ module git.gophernest.net/azpect/ResumeLens
 go 1.25.5
 require (
-	github.com/dslipak/pdf v0.0.2 // indirect
+	github.com/dslipak/pdf v0.0.2
-	github.com/go-chi/chi/v5 v5.2.4 // indirect
+	github.com/go-chi/chi/v5 v5.2.4
-	github.com/openai/openai-go/v3 v3.16.0 // indirect
+	github.com/openai/openai-go/v3 v3.16.0
 )
 require (
 	github.com/tidwall/gjson v1.18.0 // indirect
 	github.com/tidwall/match v1.1.1 // indirect
 	github.com/tidwall/pretty v1.2.1 // indirect
--- a/internal/services/analyzer_test.go
+++ b/internal/services/analyzer_test.go
@ -0,0 +1,442 @@
 package services
 import (
 	"bytes"
 	"fmt"
 	"strconv"
 	"strings"
 	"testing"
 )
 // ==================== Section 1.1: Valid PDF Files ====================
 // Test 1.1.1: Single-page PDF extraction
 func TestExtractPDFText_SinglePage(t *testing.T) {
 	content := "Single Page Resume\nSoftware Engineer with 5 years of experience."
 	testPDF := createSimplePDF(content)
 	reader := bytes.NewReader(testPDF)
 	text, err := extractPDFText(reader)
 	if err != nil {
 		t.Errorf("Test 1.1.1 FAILED: Unexpected error: %v", err)
 		return
 	}
 	if text == "" {
 		t.Error("Test 1.1.1 FAILED: Empty text extracted")
 		return
 	}
 	if !strings.Contains(text, "Single Page Resume") || !strings.Contains(text, "Software Engineer") {
 		t.Errorf("Test 1.1.1 FAILED: Expected key content not found. Extracted text: %q", text)
 		return
 	}
 	t.Log("Test 1.1.1 PASSED: Single-page PDF extracted successfully")
 }
 // Test 1.1.2: Multi-page PDF extraction
 func TestExtractPDFText_MultiPage(t *testing.T) {
 	testPDF := createMultiPagePDF(3, "Page content for resume")
 	reader := bytes.NewReader(testPDF)
 	text, err := extractPDFText(reader)
 	if err != nil {
 		t.Errorf("Test 1.1.2 FAILED: Unexpected error: %v", err)
 		return
 	}
 	if text == "" {
 		t.Error("Test 1.1.2 FAILED: Empty text extracted")
 		return
 	}
 	page1 := "Page content for resume page 1"
 	page2 := "Page content for resume page 2"
 	page3 := "Page content for resume page 3"
 	if !strings.Contains(text, page1) || !strings.Contains(text, page2) || !strings.Contains(text, page3) {
 		t.Errorf("Test 1.1.2 FAILED: Missing expected page content. Extracted text: %q", text)
 		return
 	}
 	if !(strings.Index(text, page1) < strings.Index(text, page2) && strings.Index(text, page2) < strings.Index(text, page3)) {
 		t.Errorf("Test 1.1.2 FAILED: Page order not preserved. Extracted text: %q", text)
 		return
 	}
 	t.Log("Test 1.1.2 PASSED: Multi-page PDF extracted successfully")
 }
 // Test 1.1.3: PDF with special characters
 func TestExtractPDFText_SpecialCharacters(t *testing.T) {
 	specialChars := "Resume with special chars: é, ñ, ü, ®, ©, € and symbols: @#$%^&*()"
 	testPDF := createSimplePDF(specialChars)
 	reader := bytes.NewReader(testPDF)
 	text, err := extractPDFText(reader)
 	if err != nil {
 		t.Errorf("Test 1.1.3 FAILED: Unexpected error: %v", err)
 		return
 	}
 	if text == "" {
 		t.Error("Test 1.1.3 FAILED: Empty text extracted")
 		return
 	}
 	if !strings.Contains(text, "special chars") || !strings.Contains(text, "@#$%^&*") {
 		t.Errorf("Test 1.1.3 FAILED: Expected special-character content not found. Extracted text: %q", text)
 		return
 	}
 	t.Log("Test 1.1.3 PASSED: PDF with special characters extracted successfully")
 }
 // Test 1.1.4: PDF with tables and formatting
 func TestExtractPDFText_FormattedContent(t *testing.T) {
 	content := "Work Experience\n2020-2024 Senior Engineer at TechCorp\nResponsibilities:\n- Led team\n- Delivered projects\n- Mentored juniors"
 	testPDF := createSimplePDF(content)
 	reader := bytes.NewReader(testPDF)
 	text, err := extractPDFText(reader)
 	if err != nil {
 		t.Errorf("Test 1.1.4 FAILED: Unexpected error: %v", err)
 		return
 	}
 	if text == "" {
 		t.Error("Test 1.1.4 FAILED: Empty text extracted")
 		return
 	}
 	if !strings.Contains(text, "Work Experience") || !strings.Contains(text, "Responsibilities") || !strings.Contains(text, "Mentored juniors") {
 		t.Errorf("Test 1.1.4 FAILED: Expected formatted content missing. Extracted text: %q", text)
 		return
 	}
 	t.Log("Test 1.1.4 PASSED: Formatted content extracted successfully")
 }
 // ==================== Section 1.2: Invalid PDF Files ====================
 // Test 1.2.1: Non-PDF file (DOCX)
 func TestExtractPDFText_NonPDFDOCX(t *testing.T) {
 	// Create fake DOCX data (just random bytes)
 	fakeDOCX := []byte("PK\x03\x04" + "not a real docx file")
 	reader := bytes.NewReader(fakeDOCX)
 	_, err := extractPDFText(reader)
 	if err == nil {
 		t.Error("Test 1.2.1 FAILED: Expected error for non-PDF file, got nil")
 		return
 	}
 	if !strings.Contains(err.Error(), "not a PDF file") {
 		t.Errorf("Test 1.2.1 FAILED: Expected non-PDF error, got: %v", err)
 		return
 	}
 	t.Logf("Test 1.2.1 PASSED: Non-PDF DOCX rejected with error: %v", err)
 }
 // Test 1.2.2: Non-PDF file (JPEG)
 func TestExtractPDFText_NonPDFJPEG(t *testing.T) {
 	// Create fake JPEG data
 	fakeJPEG := []byte("\xff\xd8\xff\xe0" + "not a real jpeg")
 	reader := bytes.NewReader(fakeJPEG)
 	_, err := extractPDFText(reader)
 	if err == nil {
 		t.Error("Test 1.2.2 FAILED: Expected error for JPEG file, got nil")
 		return
 	}
 	if !strings.Contains(err.Error(), "not a PDF file") {
 		t.Errorf("Test 1.2.2 FAILED: Expected non-PDF error, got: %v", err)
 		return
 	}
 	t.Logf("Test 1.2.2 PASSED: Non-PDF JPEG rejected with error: %v", err)
 }
 // Test 1.2.3: Corrupted PDF
 func TestExtractPDFText_CorruptedPDF(t *testing.T) {
 	// Start with valid PDF header but corrupt the content
 	corruptedPDF := []byte("%PDF-1.4\n" + "corrupted binary data \x00\x01\x02\x03")
 	reader := bytes.NewReader(corruptedPDF)
 	_, err := extractPDFText(reader)
 	if err == nil {
 		t.Error("Test 1.2.3 FAILED: Expected error for corrupted PDF, got nil")
 		return
 	}
 	if !strings.Contains(err.Error(), "not a PDF file") {
 		t.Errorf("Test 1.2.3 FAILED: Expected parse error, got: %v", err)
 		return
 	}
 	t.Logf("Test 1.2.3 PASSED: Corrupted PDF rejected with error: %v", err)
 }
 // Test 1.2.4: Empty PDF (0 bytes)
 func TestExtractPDFText_EmptyPDF(t *testing.T) {
 	emptyData := []byte{}
 	reader := bytes.NewReader(emptyData)
 	_, err := extractPDFText(reader)
 	if err == nil {
 		t.Error("Test 1.2.4 FAILED: Expected error for empty PDF, got nil")
 		return
 	}
 	if !strings.Contains(err.Error(), "not a PDF file") {
 		t.Errorf("Test 1.2.4 FAILED: Expected parse error, got: %v", err)
 		return
 	}
 	t.Logf("Test 1.2.4 PASSED: Empty PDF rejected with error: %v", err)
 }
 // Test 1.2.5: PDF with no text (image-only)
 func TestExtractPDFText_ImageOnlyPDF(t *testing.T) {
 	testPDF := createMinimalPDF()
 	reader := bytes.NewReader(testPDF)
 	text, err := extractPDFText(reader)
 	if err != nil {
 		t.Errorf("Test 1.2.5 FAILED: Expected no error for image-only/minimal PDF, got: %v", err)
 		return
 	}
 	if strings.TrimSpace(text) != "" {
 		t.Errorf("Test 1.2.5 FAILED: Expected empty/minimal text, got: %q", text)
 		return
 	}
 	t.Logf("Test 1.2.5 PASSED: Image-only PDF returned text: %q", text)
 }
 // Test 1.2.6: Password-protected PDF
 func TestExtractPDFText_PasswordProtectedPDF(t *testing.T) {
 	// Note: Creating a true encrypted PDF is complex
 	// We'll test with a PDF-like structure that would fail parsing
 	// For now, we'll skip this test or use a mock
 	t.Skip("Test 1.2.6 SKIPPED: Password-protected PDF creation requires specialized library")
 }
 // Test 1.2.7: Null/empty reader
 func TestExtractPDFText_NullReader(t *testing.T) {
 	_, err := extractPDFText(bytes.NewReader([]byte{}))
 	if err == nil {
 		t.Error("Test 1.2.7 FAILED: Expected error for empty reader, got nil")
 		return
 	}
 	if !strings.Contains(err.Error(), "not a PDF file") {
 		t.Errorf("Test 1.2.7 FAILED: Expected parse error, got: %v", err)
 		return
 	}
 	t.Logf("Test 1.2.7 PASSED: Empty reader rejected with error: %v", err)
 }
 // ==================== Section 1.3: PDF Format Variations ====================
 // Test 1.3.1: PDF version 1.4
 func TestExtractPDFText_PDFVersion14(t *testing.T) {
 	testPDF := createPDFWithVersion("1.4", "Content for PDF 1.4")
 	reader := bytes.NewReader(testPDF)
 	text, err := extractPDFText(reader)
 	if err != nil {
 		t.Errorf("Test 1.3.1 FAILED: Unexpected error: %v", err)
 		return
 	}
 	if text == "" {
 		t.Error("Test 1.3.1 FAILED: Empty text extracted")
 		return
 	}
 	if !strings.Contains(text, "Content for PDF 1.4") {
 		t.Errorf("Test 1.3.1 FAILED: Expected version test content not found. Extracted text: %q", text)
 		return
 	}
 	t.Log("Test 1.3.1 PASSED: PDF 1.4 extracted successfully")
 }
 // Test 1.3.2: PDF version 1.7
 func TestExtractPDFText_PDFVersion17(t *testing.T) {
 	testPDF := createPDFWithVersion("1.7", "Content for PDF 1.7")
 	reader := bytes.NewReader(testPDF)
 	text, err := extractPDFText(reader)
 	if err != nil {
 		t.Errorf("Test 1.3.2 FAILED: Unexpected error: %v", err)
 		return
 	}
 	if text == "" {
 		t.Error("Test 1.3.2 FAILED: Empty text extracted")
 		return
 	}
 	if !strings.Contains(text, "Content for PDF 1.7") {
 		t.Errorf("Test 1.3.2 FAILED: Expected version test content not found. Extracted text: %q", text)
 		return
 	}
 	t.Log("Test 1.3.2 PASSED: PDF 1.7 extracted successfully")
 }
 // Test 1.3.3: Very large PDF (100+ pages) - Benchmark
 func TestExtractPDFText_LargePDF(t *testing.T) {
 	testPDF := createMultiPagePDF(100, "Resume content for performance testing")
 	reader := bytes.NewReader(testPDF)
 	text, err := extractPDFText(reader)
 	if err != nil {
 		t.Errorf("Test 1.3.3 FAILED: Unexpected error: %v", err)
 		return
 	}
 	if text == "" {
 		t.Error("Test 1.3.3 FAILED: Empty text extracted from large PDF")
 		return
 	}
 	firstPage := "Resume content for performance testing page 1"
 	lastPage := "Resume content for performance testing page 100"
 	if !strings.Contains(text, firstPage) || !strings.Contains(text, lastPage) {
 		t.Errorf("Test 1.3.3 FAILED: Missing first/last page content in large PDF extraction")
 		return
 	}
 	t.Logf("Test 1.3.3 PASSED: Large PDF (100 pages) extracted successfully. Text length: %d", len(text))
 }
 // ==================== Helper Functions ====================
 // createSimplePDF creates a valid single-page PDF with extractable text.
 func createSimplePDF(content string) []byte {
 	if strings.TrimSpace(content) == "" {
 		content = "Sample resume content"
 	}
 	return createPDF("1.4", []string{content})
 }
 // createMinimalPDF creates a valid PDF with no text stream.
 func createMinimalPDF() []byte {
 	return createPDF("1.4", []string{""})
 }
 // createMultiPagePDF creates a valid multi-page PDF with extractable text.
 func createMultiPagePDF(pages int, content string) []byte {
 	if pages < 1 {
 		pages = 1
 	}
 	if strings.TrimSpace(content) == "" {
 		content = "Sample resume content"
 	}
 	pageTexts := make([]string, pages)
 	for i := 0; i < pages; i++ {
 		pageTexts[i] = fmt.Sprintf("%s page %d", content, i+1)
 	}
 	return createPDF("1.4", pageTexts)
 }
 // createPDFWithVersion creates a PDF with specific version
 func createPDFWithVersion(version string, content string) []byte {
 	if strings.TrimSpace(content) == "" {
 		content = "Sample resume content"
 	}
 	return createPDF(version, []string{content})
 }
 func createPDF(version string, pageTexts []string) []byte {
 	if strings.TrimSpace(version) == "" {
 		version = "1.4"
 	}
 	if len(pageTexts) == 0 {
 		pageTexts = []string{"Sample resume content"}
 	}
 	buf := bytes.NewBuffer(nil)
 	buf.WriteString("%PDF-")
 	buf.WriteString(version)
 	buf.WriteString("\n")
 	offsets := []int{0}
 	writeObj := func(objNum int, body string) {
 		offsets = append(offsets, buf.Len())
 		buf.WriteString(strconv.Itoa(objNum))
 		buf.WriteString(" 0 obj\n")
 		buf.WriteString(body)
 		buf.WriteString("\nendobj\n")
 	}
 	pageCount := len(pageTexts)
 	fontObjNum := 3 + (pageCount * 2)
 	writeObj(1, "<</Type /Catalog /Pages 2 0 R>>")
 	var kids strings.Builder
 	kids.WriteString("[")
 	for i := range pageCount {
 		if i > 0 {
 			kids.WriteString(" ")
 		}
 		pageObjNum := 3 + (i * 2)
 		kids.WriteString(strconv.Itoa(pageObjNum))
 		kids.WriteString(" 0 R")
 	}
 	kids.WriteString("]")
 	writeObj(2, fmt.Sprintf("<</Type /Pages /Kids %s /Count %d>>", kids.String(), pageCount))
 	for i, pageText := range pageTexts {
 		pageObjNum := 3 + (i * 2)
 		contentObjNum := pageObjNum + 1
 		writeObj(pageObjNum,
 			fmt.Sprintf("<</Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources <</Font <</F1 %d 0 R>>>> /Contents %d 0 R>>", fontObjNum, contentObjNum),
 		)
 		escaped := escapePDFText(pageText)
 		stream := fmt.Sprintf("BT\n/F1 12 Tf\n72 720 Td\n(%s) Tj\nET\n", escaped)
 		writeObj(contentObjNum, fmt.Sprintf("<</Length %d>>\nstream\n%sendstream", len(stream), stream))
 	}
 	writeObj(fontObjNum, "<</Type /Font /Subtype /Type1 /BaseFont /Helvetica>>")
 	xrefOffset := buf.Len()
 	buf.WriteString("xref\n")
 	fmt.Fprintf(buf, "0 %d\n", len(offsets))
 	buf.WriteString("0000000000 65535 f \n")
 	for i := 1; i < len(offsets); i++ {
 		fmt.Fprintf(buf, "%010d 00000 n \n", offsets[i])
 	}
 	buf.WriteString("trailer\n")
 	fmt.Fprintf(buf, "<</Size %d /Root 1 0 R>>\n", len(offsets))
 	buf.WriteString("startxref\n")
 	fmt.Fprintf(buf, "%d\n", xrefOffset)
 	buf.WriteString("%%EOF")
 	return buf.Bytes()
 }
 func escapePDFText(s string) string {
 	s = strings.ReplaceAll(s, "\\", "\\\\")
 	s = strings.ReplaceAll(s, "(", "\\(")
 	s = strings.ReplaceAll(s, ")", "\\)")
 	s = strings.ReplaceAll(s, "\n", " ")
 	s = strings.ReplaceAll(s, "\r", " ")
 	return s
 }
--- a/internal/services/testdata/minimal.pdf
+++ b/internal/services/testdata/minimal.pdf
@ -0,0 +1,21 @@
 %PDF-1.4
 1 0 obj
 <</Type /Catalog /Pages 2 0 R>>
 endobj
 2 0 obj
 <</Type /Pages /Kids [3 0 R] /Count 1>>
 endobj
 3 0 obj
 <</Type /Page /Parent 2 0 R /MediaBox [0 0 612 792]>>
 endobj
 xref
 0 4
 0000000000 65535 f 
 0000000010 00000 n 
 0000000053 00000 n 
 0000000102 00000 n 
 trailer
 <</Size 4 /Root 1 0 R>>
 startxref
 193
 %%EOF