# 🔍 Finding Correct File URLs for All Documents

## 🎯 **Goal**

Create a complete mapping of which store:// URLs actually contain which document's content.

---

## 📊 **What We Know So Far**

From manual PDF verification:

| File URL (store://) | Labeled As (DB) | Actually Contains |
|---------------------|----------------|-------------------|
| `store://2015/3/26/15/8/3eee6f3f-0b98-41b9-a6cb-2c4488152fed.bin` | PL11089 | **PL689** content ✅ Verified |
| `store://2015/3/17/10/10/879dcd53-f552-4e82-858f-7e868e60a275.bin` | PL689 | **BP102** content ✅ Verified |
| ??? | BP102 | **PL6204** content ✅ Verified (need URL) |
| ??? | PL6204 | **PL12321** content ✅ Verified (need URL) |
| ??? | ??? | **PL11089** content ❓ NOT FOUND |

---

## 🔧 **Step-by-Step Process to Complete the Mapping**

### **Step 1: Get URLs for BP102 and PL6204 from Diagnostic**

From earlier diagnostic output (lines 456-558 of terminal selection), we have:

```
PL11089 (node 823587):
  URL: store://2015/3/26/15/8/3eee6f3f-0b98-41b9-a6cb-2c4488152fed.bin
  
PL689 (node 729874):
  URL: store://2015/3/17/10/10/879dcd53-f552-4e82-858f-7e868e60a275.bin
```

**But we need URLs for BP102 and PL6204!**

### **Command to Get Missing URLs:**

```bash
# Create API endpoint to show raw URLs
curl "http://localhost:8001/documents/by-document-number?document_number=BP102" | jq
curl "http://localhost:8001/documents/by-document-number?document_number=PL6204" | jq
```

But this won't show the URLs directly. We need to query the database or use a debug endpoint.

---

### **Step 2: Create Debug Endpoint to Show URLs**

Add this to `aumentum_api.py`:

```python
@app.get("/debug/show-urls")
async def show_urls_for_document(
    document_number: str = Query(..., description="Document number")
):
    """Show actual store URLs for a document from database"""
    try:
        conn = service._get_db_connection()
        cursor = conn.cursor()
        
        cursor.execute("""
            SELECT DISTINCT
                np.string_value,
                n.id AS node_id,
                cu.content_url,
                cu.content_size
            FROM LRSAdmin.alf_node_properties np
            JOIN LRSAdmin.alf_qname q ON q.id = np.qname_id
            JOIN LRSAdmin.alf_node n ON n.id = np.node_id AND n.node_deleted = 0
            LEFT JOIN LRSAdmin.alf_content_data cd ON cd.id = n.id
            LEFT JOIN LRSAdmin.alf_content_url cu ON cu.id = cd.content_url_id
            WHERE RTRIM(LTRIM(np.string_value)) COLLATE Latin1_General_BIN = ? COLLATE Latin1_General_BIN
            AND q.local_name IN ('targetRids','sourceRids')
            ORDER BY n.id
        """, (document_number,))
        
        cols = [d[0] for d in cursor.description]
        rows = cursor.fetchall()
        
        results = [dict(zip(cols, row)) for row in rows]
        
        cursor.close()
        conn.close()
        
        return {
            "document_number": document_number,
            "urls": results
        }
        
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
```

---

### **Step 3: Get URLs for All Problem Documents**

Once the debug endpoint is added, run:

```bash
for doc in PL11089 PL689 BP102 PL6204 PL12321; do
    echo "=== $doc ==="
    curl "http://localhost:8001/debug/show-urls?document_number=$doc" | jq
    echo ""
done
```

This will show the actual store:// URLs from the database.

---

### **Step 4: Build Complete Mapping Table**

| Document | DB Says Use This URL | URL Actually Contains | Correct URL for Document |
|----------|---------------------|----------------------|-------------------------|
| PL11089 | `store://2015/3/26/.../3eee6f3f...fed.bin` | PL689 | **UNKNOWN** |
| PL689 | `store://2015/3/17/.../879dcd53...275.bin` | BP102 | `store://2015/3/26/.../3eee6f3f...fed.bin` |
| BP102 | ??? | PL6204 | `store://2015/3/17/.../879dcd53...275.bin` |
| PL6204 | ??? | PL12321 | ??? |
| PL12321 | NO URL (0 images) | N/A | ??? |

---

## 🚀 **Quick Solution: Add Debug Endpoint**

I'll add the debug endpoint to the API right now:

