# 🔍 Investigation: Is This a Sorting/Fetching Problem?

## 🎯 **User's Excellent Insight**

Instead of assuming the database has wrong associations, what if OUR CODE is fetching/sorting files incorrectly?

---

## 🚨 **Potential Issues Found**

### **Issue 1: ORDER BY content_url (Line 867)**

```sql
ORDER BY cu.content_url
```

**Problem**: Ordering by URL string might cause alphabetical sorting that shifts everything!

**Example**:
```
If we have:
- store://2015/3/17/... (should be for PL689)
- store://2015/3/26/... (should be for PL11089)

Alphabetically:
- store://2015/3/17/... comes FIRST (month 3, day 17)
- store://2015/3/26/... comes SECOND (month 3, day 26)

But if PL11089 is queried first, it might get the FIRST url (which belongs to PL689)!
```

---

### **Issue 2: No Document ID Filtering**

```python
# Line 852-868: Query gets ALL images for document_number
# BUT doesn't filter by document_id!

cursor.execute("""
    SELECT DISTINCT ...
    FROM LRSAdmin.alf_node_properties np
    ...
    WHERE ... np.string_value = ?  # Only filters by document_number
    AND q.local_name IN ('targetRids','sourceRids')
    ORDER BY cu.content_url  # Then sorts alphabetically!
""", (document_number,))
```

**Problem**: When PL11089 has 3 document IDs, this query returns images for ALL 3, then sorts them. Each document ID might be getting the wrong images from this sorted list!

---

### **Issue 3: Duplicate Entries from targetRids AND sourceRids**

```sql
q.local_name IN ('targetRids','sourceRids')
```

**Problem**: Same file might appear TWICE (once as targetRids, once as sourceRids). The `SELECT DISTINCT` should handle this, but combined with ORDER BY, the order might be unpredictable.

---

### **Issue 4: All Document IDs Share Same Images**

```python
# Line 894-895: For EACH document ID
for doc_id, doc_type, page_count, issued_by, create_date in docs:
    actual_images = db_images.copy()  # They all get the SAME images!
```

**Problem**: If PL11089 has 3 document IDs (History Card, Property File, Land Form), they all get assigned the SAME images from `db_images`. But these images aren't filtered by which document_id they actually belong to!

---

## 🧪 **Test This Theory**

### **Hypothesis:**
The alphabetical sorting of URLs is causing a +1 shift:
```
Documents (in order): PL11089, PL689, BP102, PL6204, PL12321
URLs (alphabetically): URL_A, URL_B, URL_C, URL_D, URL_E

Current (WRONG):
  PL11089 gets URL_B (which actually belongs to PL689)
  PL689 gets URL_C (which actually belongs to BP102)
  BP102 gets URL_D (which actually belongs to PL6204)
  etc.

Should be:
  PL11089 gets URL_A
  PL689 gets URL_B
  BP102 gets URL_C
  etc.
```

### **How to Test:**

1. **Check the actual database without any sorting**:
```sql
-- For PL11089, what content_url is in the database?
SELECT 
    np.string_value as doc_number,
    cu.content_url,
    n.audit_created,
    cd.content_url_id
FROM LRSAdmin.alf_node_properties np
JOIN LRSAdmin.alf_qname q ON q.id = np.qname_id
JOIN LRSAdmin.alf_node n ON n.id = np.node_id
JOIN LRSAdmin.alf_content_data cd ON cd.id = n.id
JOIN LRSAdmin.alf_content_url cu ON cu.id = cd.content_url_id
WHERE np.string_value IN ('PL11089', 'PL689', 'BP102', 'PL6204', 'PL12321')
AND q.local_name IN ('targetRids','sourceRids')
ORDER BY np.string_value, cu.content_url;  -- See how they're ordered
```

2. **Check if removal of ORDER BY fixes it**:
Remove `ORDER BY cu.content_url` and see if documents show correct content.

3. **Check creation timestamps**:
```sql
-- Are the documents created in the same order as the URLs?
SELECT 
    np.string_value,
    n.audit_created,
    cu.content_url
FROM ...
ORDER BY n.audit_created;  -- Order by creation time instead
```

---

## 🔧 **Potential Fixes**

### **Fix 1: Remove ORDER BY or Change It**

```python
# Instead of:
ORDER BY cu.content_url

# Try:
ORDER BY n.audit_created  # Order by creation time
# or
ORDER BY n.id  # Order by node ID
# or
# Remove ORDER BY entirely (use natural database order)
```

### **Fix 2: Add Document ID Filtering**

The query should JOIN to lr_source_document and filter by document_id:

```sql
SELECT DISTINCT
    n.id AS node_id,
    n.uuid,
    cu.content_url,
    cu.content_size,
    mt.mimetype_str,
    sd.id AS source_document_id  -- Add this
FROM LRSAdmin.alf_node_properties np
JOIN LRSAdmin.alf_qname q ON q.id = np.qname_id
JOIN LRSAdmin.alf_node n ON n.id = np.node_id AND n.node_deleted = 0
JOIN LRSAdmin.alf_content_data cd ON cd.id = n.id
JOIN LRSAdmin.alf_content_url cu ON cu.id = cd.content_url_id
LEFT JOIN LRSAdmin.alf_mimetype mt ON mt.id = cd.content_mimetype_id
-- ADD: Join to source document to link images to specific document IDs
LEFT JOIN LRSAdmin.lr_source_document sd ON sd.document_number = np.string_value
WHERE RTRIM(LTRIM(np.string_value)) COLLATE Latin1_General_BIN = ?
AND q.local_name IN ('targetRids','sourceRids')
ORDER BY sd.id, cu.content_url;  -- Order by document ID first
```

### **Fix 3: Don't Use DISTINCT with Multiple Property Types**

Instead of:
```sql
WHERE q.local_name IN ('targetRids','sourceRids')
```

Query them separately or use just one:
```sql
WHERE q.local_name = 'sourceRids'  -- Or targetRids, pick one
```

---

## 🧪 **Quick Test**

Let's create a test query to see the actual database ordering:

