# Cache Cleanup System Documentation

## Overview

The Aumentum Document System now includes automatic cache cleanup for temporary PDF files, which keeps cached files from accumulating on disk over time.

## Cache Location

**Default Directory:** `/tmp/aumentum_pdfs/`

**Configurable via environment variable:**
```bash
export TEMP_PDF_DIR="/custom/cache/path"
```
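
On the Python side, the override could be read along these lines. This is a minimal sketch: `resolve_cache_dir` is a hypothetical helper, not a function confirmed to exist in `aumentum_api.py`. Only the `TEMP_PDF_DIR` variable and the default path come from this document.

```python
import os
from pathlib import Path

# Default taken from this document; the helper itself is illustrative.
DEFAULT_CACHE_DIR = "/tmp/aumentum_pdfs"


def resolve_cache_dir(env=None) -> Path:
    """Return the cache directory, honoring TEMP_PDF_DIR when it is set."""
    env = os.environ if env is None else env
    # An empty value is treated as unset so the default still applies
    # (an assumption about the desired behavior).
    return Path(env.get("TEMP_PDF_DIR") or DEFAULT_CACHE_DIR)
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise without touching the real environment.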

## Cleanup Mechanisms

### 1. **Automatic Cleanup on Logout**

When a user logs out, the system automatically cleans cache files older than 1 hour.

**Backend:**
- Endpoint: `POST /auth/logout`
- Triggers: `cleanup_old_cache_files(max_age_hours=1)`
- Returns: Number of deleted files in response

**Frontend:**
- Clears all localStorage and sessionStorage
- Logs cleanup results to the browser console
- Resets application state
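
The age-based deletion behind `cleanup_old_cache_files` might look roughly like the sketch below. The function name comes from this document; the body, the extra `cache_dir` and `now` parameters, and the use of modification time to judge age are assumptions made so the example is self-contained.

```python
import time
from pathlib import Path


def cleanup_old_cache_files(cache_dir, max_age_hours=1.0, now=None):
    """Delete *.pdf files in cache_dir older than max_age_hours.

    Returns the number of files deleted. cache_dir and now are
    illustrative extras, not part of the documented signature.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    deleted = 0
    for path in Path(cache_dir).glob("*.pdf"):
        try:
            # Age is judged by last-modified time (an assumption).
            if path.is_file() and path.stat().st_mtime < cutoff:
                path.unlink()
                deleted += 1
        except OSError:
            # Skip files that vanish or cannot be removed; keep going.
            continue
    return deleted
```

The returned count is what a logout handler could place in its response body.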

### 2. **Periodic Background Cleanup**

The server runs automatic cleanup every 6 hours.

**How it works:**
- Background thread starts on app startup
- Runs every 6 hours automatically
- Deletes files older than 24 hours
- Logs cleanup activity
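
One way to structure such a background loop is sketched below. It swaps a plain sleep for `threading.Event.wait` so the loop can be stopped cleanly; that substitution, the `start_periodic_cleanup` name, and the logger name are assumptions for illustration, not the server's actual code.

```python
import logging
import threading

logger = logging.getLogger("cache_cleanup")


def start_periodic_cleanup(cleanup, interval_seconds=6 * 3600):
    """Run cleanup() every interval_seconds on a daemon thread.

    Returns (stop_event, thread); call stop_event.set() to end the loop.
    """
    stop_event = threading.Event()

    def loop():
        # wait() returns False on timeout and True once stop_event is set,
        # so the loop sleeps one interval, runs cleanup, and repeats.
        while not stop_event.wait(interval_seconds):
            try:
                deleted = cleanup()
                logger.info("Periodic cache cleanup deleted %s files", deleted)
            except Exception:
                # One failed pass should not kill the background thread.
                logger.exception("Periodic cache cleanup failed")

    thread = threading.Thread(target=loop, name="cache-cleanup", daemon=True)
    thread.start()
    return stop_event, thread
```

A daemon thread does not block interpreter shutdown, which suits a best-effort housekeeping task.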

### 3. **Startup Cleanup**

When the server starts, it cleans old cache files immediately.

**Configuration:**
- Runs on: Application startup
- Max age: 24 hours (configurable)
- Logged in startup output

### 4. **Manual Cleanup Endpoints**

#### **POST /cache/cleanup**

Clean cache files manually.

**Parameters:**
- `force` (optional, default `false`): If `true`, deletes ALL cache files regardless of age. If `false`, deletes only files older than the configured max age.

**Authentication:** Required (logged-in user)

**Example:**
```bash
# Clean old files only
curl -X POST "http://localhost:8001/cache/cleanup" \
  -H "Authorization: Bearer $TOKEN"

# Force clean ALL files
curl -X POST "http://localhost:8001/cache/cleanup?force=true" \
  -H "Authorization: Bearer $TOKEN"
```

**Response:**
```json
{
  "message": "Cache cleanup completed",
  "deleted_files": 15,
  "freed_space_bytes": 125829384,
  "freed_space_mb": 120.0,
  "force_cleanup": false,
  "cache_directory": "/tmp/aumentum_pdfs"
}
```
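
To make the `force` semantics and the size fields concrete, here is a hedged sketch of how a handler could assemble such a response. `run_manual_cleanup` is a hypothetical name, and the megabyte figure assumes 1 MB = 1024 × 1024 bytes; the real endpoint may compute these differently.

```python
import time
from pathlib import Path


def run_manual_cleanup(cache_dir, force=False, max_age_hours=24.0):
    """Delete cached PDFs and return a dict shaped like the response above."""
    cutoff = time.time() - max_age_hours * 3600
    deleted, freed = 0, 0
    for path in Path(cache_dir).glob("*.pdf"):
        try:
            info = path.stat()
            # force=True ignores age; otherwise only old files qualify.
            if force or info.st_mtime < cutoff:
                path.unlink()
                deleted += 1
                freed += info.st_size
        except OSError:
            continue  # keep going if one file cannot be removed
    return {
        "message": "Cache cleanup completed",
        "deleted_files": deleted,
        "freed_space_bytes": freed,
        # Assumes 1 MB = 1024 * 1024 bytes.
        "freed_space_mb": round(freed / (1024 * 1024), 2),
        "force_cleanup": force,
        "cache_directory": str(cache_dir),
    }
```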

#### **GET /cache/stats**

Get cache directory statistics.

**Authentication:** Required (logged-in user)

**Example:**
```bash
curl -X GET "http://localhost:8001/cache/stats" \
  -H "Authorization: Bearer $TOKEN"
```

**Response:**
```json
{
  "total_files": 42,
  "total_size_bytes": 456789123,
  "total_size_mb": 435.63,
  "cache_directory": "/tmp/aumentum_pdfs",
  "oldest_file_age_hours": 23.5,
  "newest_file_age_hours": 0.2
}
```
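
The fields in this response could be derived from a single directory scan. The sketch below is illustrative only: `get_cache_stats` is a hypothetical name, ages are based on modification time, and returning `None` for an empty cache is an assumption.

```python
import time
from pathlib import Path


def get_cache_stats(cache_dir, now=None):
    """Summarize the *.pdf files in cache_dir as a stats dict."""
    now = time.time() if now is None else now
    sizes, mtimes = [], []
    for path in Path(cache_dir).glob("*.pdf"):
        try:
            info = path.stat()
        except OSError:
            continue  # file disappeared between listing and stat
        sizes.append(info.st_size)
        mtimes.append(info.st_mtime)
    total = sum(sizes)

    def age_hours(mtime):
        return round((now - mtime) / 3600, 1)

    return {
        "total_files": len(sizes),
        "total_size_bytes": total,
        "total_size_mb": round(total / (1024 * 1024), 2),
        "cache_directory": str(cache_dir),
        # The oldest file has the smallest mtime; None when the cache is empty.
        "oldest_file_age_hours": age_hours(min(mtimes)) if mtimes else None,
        "newest_file_age_hours": age_hours(max(mtimes)) if mtimes else None,
    }
```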

## Configuration

### Cache Max Age

Edit `aumentum_api.py`:

```python
CACHE_MAX_AGE_HOURS = 24  # Change to desired hours
```

### Periodic Cleanup Frequency

Edit the `startup_event` function:

```python
time.sleep(6 * 3600)  # Change 6 to desired hours
```

## What Gets Cleaned

### Cleaned Files:
- ✅ Cached PDF files (`.pdf` in `TEMP_PDF_DIR`)
- ✅ Files older than configured max age
- ✅ Orphaned temporary files from interrupted conversions

### NOT Cleaned:
- ❌ Original `.bin` files in contentstore (read-only, never touched)
- ❌ Database records
- ❌ Active session data

## Cleanup Triggers

| Trigger | Max Age | Frequency | Description |
|---------|---------|-----------|-------------|
| **App Startup** | 24 hours | Once | Cleans old files when server starts |
| **Periodic** | 24 hours | Every 6 hours | Background thread cleanup |
| **User Logout** | 1 hour | Per logout | Cleans files older than 1 hour on session end |
| **Manual Force** | All files | On demand | Manual cleanup endpoint with `force=true` |

## Client-Side Cleanup

### On Logout:

**localStorage cleared:**
- `access_token`
- `user_info`
- All other stored data

**sessionStorage cleared:**
- All temporary session data

**Application state reset:**
- `currentDocNumber`
- `currentTransactions`
- `authToken`
- `currentUser`

### On Page Refresh:

**Preserved:**
- `access_token` (user stays logged in)
- `user_info` (username and roles preserved)

**Cleared:**
- Nothing (allows seamless page refresh)

## Monitoring Cache

### Check Cache Size

```bash
# Via API
curl "http://localhost:8001/cache/stats" -H "Authorization: Bearer $TOKEN"

# Via filesystem: total size, then number of cached PDFs
du -sh /tmp/aumentum_pdfs
find /tmp/aumentum_pdfs -type f -name "*.pdf" | wc -l
```
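
For scripted monitoring, the same stats call can be made from Python's standard library. This is a sketch under assumptions: the helper names and the size threshold are hypothetical, and only the endpoint path, the bearer header, and the `total_size_mb` field come from this document.

```python
import json
import urllib.request


def build_stats_request(base_url, token):
    """Build an authenticated GET request for /cache/stats."""
    url = base_url.rstrip("/") + "/cache/stats"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}, method="GET"
    )


def cache_over_limit(stats, limit_mb):
    """Return True when the reported cache size exceeds limit_mb."""
    return stats["total_size_mb"] > limit_mb


def fetch_cache_stats(base_url, token, timeout=10):
    """Call the endpoint and decode the JSON body."""
    request = build_stats_request(base_url, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A cron job could call `fetch_cache_stats("http://localhost:8001", token)` and raise an alert when `cache_over_limit` returns `True`.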

### View Server Logs

```bash
# Real-time monitoring
tail -f server.log | grep -E "🧹|🗑️|cache"

# Check cleanup history
grep "cache cleanup" server.log
```

## Benefits

- ✅ **Automatic Management** - No manual intervention needed
- ✅ **Space Efficient** - Old files automatically deleted
- ✅ **Performance** - Keeps cache directory manageable
- ✅ **Logout Cleanup** - Session data properly cleared
- ✅ **Configurable** - Adjust timing and age limits
- ✅ **Logged** - All cleanup actions logged for audit

## Troubleshooting

### Cache Not Cleaning

**Check if cleanup is running:**
```bash
grep "Running.*cache cleanup" server.log
```

**Check permissions:**
```bash
ls -ld /tmp/aumentum_pdfs
# Should be writable by server user
```
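
The same check can be run from Python as the server user. A minimal sketch, with `cache_dir_is_writable` as a hypothetical helper name:

```python
import os


def cache_dir_is_writable(cache_dir):
    """Return True if the current process can create and delete files there."""
    # Deleting a file needs write and execute (search) permission on the
    # directory that contains it, not on the file itself.
    return os.path.isdir(cache_dir) and os.access(cache_dir, os.W_OK | os.X_OK)
```

Note that `os.access` reports on the user running the script, so the result is only meaningful when run under the same account as the server.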

### Files Not Deleting

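**Check file ages (files older than 24 hours):**
```bash
# -mmin +1440 matches files modified more than 24 hours ago;
# -mtime +1 would only match files at least 48 hours old
find /tmp/aumentum_pdfs -name "*.pdf" -mmin +1440 -ls
```

**Manual cleanup:**
```bash
find /tmp/aumentum_pdfs -name "*.pdf" -mmin +1440 -delete
```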

### Disk Space Issues

**Emergency cleanup (delete all cache):**
```bash
curl -X POST "http://localhost:8001/cache/cleanup?force=true" \
  -H "Authorization: Bearer $TOKEN"
```

**Or via filesystem:**
```bash
rm -f /tmp/aumentum_pdfs/*.pdf
```

## Best Practices

1. **Regular Monitoring**
   - Check cache stats weekly
   - Monitor disk space usage
   - Review cleanup logs

2. **Adjust Settings**
   - Increase `CACHE_MAX_AGE_HOURS` for slow networks
   - Decrease `CACHE_MAX_AGE_HOURS` for limited disk space
   - Adjust periodic frequency based on usage

3. **Production Deployment**
   - Use a persistent cache directory (set `TEMP_PDF_DIR` to a path outside `/tmp`)
   - Configure max_age based on usage patterns
   - Monitor cleanup logs
   - Set up disk space alerts
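
A disk space alert can be as small as the sketch below, built on the standard library's `shutil.disk_usage`. The function names, the 90% threshold, and the message text are assumptions; wire the result into whatever alerting the deployment already uses.

```python
import shutil


def percent_used(total_bytes, used_bytes):
    """Return used space as a percentage of total, rounded to one decimal."""
    return round(100 * used_bytes / total_bytes, 1)


def disk_alert(path, threshold_percent=90.0):
    """Return a warning string when the filesystem holding path is too full."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    used = percent_used(usage.total, usage.used)
    if used >= threshold_percent:
        return f"Disk {used}% full at {path}; consider a forced cache cleanup"
    return None
```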

## Security

- ✅ **Authentication Required** - All cache endpoints require login
- ✅ **User-Specific Cleanup** - Tracks who triggered cleanup
- ✅ **Safe Deletion** - Only deletes from designated cache directory
- ✅ **Error Handling** - Continues on individual file errors
- ✅ **Logging** - All cleanup actions logged
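
The safe-deletion guarantee can be enforced with a containment check before any file is removed. The following is a sketch of one common approach, not the system's confirmed implementation; `is_inside_cache` is a hypothetical helper.

```python
from pathlib import Path


def is_inside_cache(candidate, cache_dir):
    """Return True only if candidate resolves to a path under cache_dir."""
    # resolve() collapses ".." segments and follows symlinks, so a path
    # that escapes the cache directory is caught after normalization.
    root = Path(cache_dir).resolve()
    target = Path(candidate).resolve()
    try:
        # relative_to raises ValueError when target is not under root.
        target.relative_to(root)
    except ValueError:
        return False
    return target != root  # the directory itself is never a deletion target
```

Calling this before each `unlink` means a crafted name such as `../other/file.pdf` is rejected rather than deleted.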

## Performance Impact

- **Startup:** < 1 second for typical cache sizes
- **Periodic:** < 1 second every 6 hours
- **Logout:** < 0.5 seconds additional logout time
- **Manual:** Depends on cache size, typically < 2 seconds

## Future Enhancements

1. **Per-User Cache Quotas**
   - Limit cache size per user
   - Track user-specific cache usage

2. **Smart Caching**
   - Keep frequently accessed documents
   - LRU (Least Recently Used) eviction

3. **Cache Warming**
   - Pre-generate popular documents
   - Scheduled cache refresh

4. **Distributed Caching**
   - Redis or Memcached integration
   - Multi-server cache sharing

