# Aumentum Storage Structure - Visual Guide

## 🎯 The Simple Truth

**When you upload multiple files for the same document number at different times, they go into different folders based on WHEN they were uploaded, not WHAT document they belong to.**

---

## 📁 Real Example: Your PL21825

### What You Did

```
09:18 AM - Uploaded Type 103 (50 pages)
09:25 AM - Uploaded Type 127 (2 pages)
09:29 AM - Uploaded Type 126 (2 pages)
```

### Where They Went

```
/contentstore/
    └── 2025/
        └── 11/
            └── 4/
                ├── 9/              ← Hour 9 (9:00 AM)
                │   └── 15/         ← Minute 15 (9:15 AM)
                │       └── [9 .bin files for Type 103]
                │
                └── 10/             ← Hour 10 (10:00 AM)
                    ├── 1/          ← Minute 1 (10:01 AM)
                    │   └── [39 .bin files for Type 127]
                    │
                    └── 4/          ← Minute 4 (10:04 AM)
                        └── [8 .bin files for Type 126]
```

**Total: 56 files across 3 different directories for ONE document number!**

---

## 🔄 How The System Processes This

### Step 1: Upload

```
User Action: "Upload PL21825, Type 103"
    ↓
System: "What time is it? 09:15"
    ↓
System: "Create folder 2025/11/4/9/15/"
    ↓
System: "Save 50 pages as UUID.bin files in that folder"
```

### Step 2: Database Entry

```
Create record in lr_source_document:
    document_number: PL21825
    document_type: 103
    page_count: 50
    create_date: 2025-11-04 09:18:30
```

### Step 3: Linking (Should happen, but delayed for your case)

```
Create Alfresco node:
    node_id: 2443208
    uuid: 46974fd7-af5d-4e1d-9719-3b63d0a2542b

Link to document:
    property: targetRids
    value: "PL21825"

Link to files:
    content_url: store://2025/11/4/9/15/file1.bin
    content_url: store://2025/11/4/9/15/file2.bin
    ... (50 files total)
```

---

## 🔍 How Searching Works

### When Someone Searches "PL21825"

```
Step 1: Database Query
    "Find all nodes where targetRids = 'PL21825'"
    
    Result: Node 2443208

Step 2: Get File Locations
    "What files does Node 2443208 point to?"
    
    Result:
        store://2025/11/4/9/15/uuid1.bin
        store://2025/11/4/9/15/uuid2.bin
        ...
        store://2025/11/4/10/1/uuid50.bin
        ...
        store://2025/11/4/10/4/uuid54.bin

Step 3: Convert to Filesystem Paths
    store://2025/11/4/9/15/uuid1.bin
        ↓
    /contentstore/2025/11/4/9/15/uuid1.bin

Step 4: Convert Each .bin to PDF
    Read JPEG from .bin file
    Convert to PDF
    Combine all PDFs into one document
    
Step 5: Display to User
    "Here's your PL21825 document (54 pages)"
```

**The user never knows the files are in different folders!**

---

## 💡 Key Concepts

### 1. Directory = Time, Not Document

```
❌ WRONG: Document PL21825 goes in folder "PL21825/"
✅ RIGHT: Document PL21825 uploaded at 9:15 goes in folder "2025/11/4/9/15/"
```

### 2. One Document Number → Many Directories

```
Document Number: PL21825
    ↓
Multiple Uploads (at different times)
    ↓
Multiple Directories (one per upload time)
    ↓
System combines them when you search
```

### 3. Directory Structure is Always: YYYY/MM/DD/HH/MM

```
Year     = 2025
Month    = 11
Day      = 4
Hour     = 9 (or 10)
Minute   = 15 (or 1, or 4)

Path = 2025/11/4/9/15/
```

### 4. The Database Links Everything Together

```
Document Number "PL21825"
    ↓ (stored in alf_node_properties)
Node ID 2443208
    ↓ (stored in alf_content_url)
Store URLs: store://YYYY/MM/DD/HH/MM/UUID.bin
    ↓ (map to filesystem)
Physical Files: /contentstore/YYYY/MM/DD/HH/MM/UUID.bin
```

---

## 📊 Comparison: Multiple Document Numbers vs Same Document Number

### Scenario A: Different Documents Uploaded at Same Time

```
9:15 AM - Upload PL21825 (50 pages)
9:15 AM - Upload PL21826 (30 pages)
9:15 AM - Upload PL21827 (20 pages)

Storage:
    2025/11/4/9/15/
        ├── file1.bin (PL21825)
        ├── file2.bin (PL21825)
        ├── ...
        ├── file50.bin (PL21825)
        ├── file51.bin (PL21826)
        ├── ...
        ├── file80.bin (PL21826)
        ├── file81.bin (PL21827)
        └── file100.bin (PL21827)

ALL 100 FILES IN SAME DIRECTORY!
```

### Scenario B: Same Document Uploaded at Different Times (YOUR CASE)

```
9:15 AM - Upload PL21825 Type 103 (50 pages)
10:01 AM - Upload PL21825 Type 127 (2 pages)
10:04 AM - Upload PL21825 Type 126 (2 pages)

Storage:
    2025/11/4/9/15/
        ├── file1.bin (PL21825 Type 103)
        ├── ...
        └── file50.bin (PL21825 Type 103)
    
    2025/11/4/10/1/
        ├── file51.bin (PL21825 Type 127)
        └── file52.bin (PL21825 Type 127)
    
    2025/11/4/10/4/
        ├── file53.bin (PL21825 Type 126)
        └── file54.bin (PL21825 Type 126)

SAME DOCUMENT, 3 DIFFERENT DIRECTORIES!
```

---

## 🎯 Why This Design?

### Advantages

1. **Temporal Organization**
   - Easy to find files by date/time
   - Helps with backup and archival
   - Performance: prevents huge directories

2. **Scalability**
   - No single directory gets too large
   - Distributed storage is possible
   - Easy to partition by time

3. **Debugging**
   - Can trace uploads by time
   - Upload problems isolated by timeframe
   - Audit trail built into structure

### Trade-offs

1. **One document can span multiple directories**
   - System must query database to find all pieces
   - Can't simply look in one folder

2. **Requires robust database linking**
   - If database link breaks, files become orphaned
   - Must maintain database integrity

---

## 🚨 Current Situation with PL21825

### What's Working

```
✅ Files uploaded to filesystem (56 files in 3 directories)
✅ lr_source_document records created (3 records)
✅ alf_node created (Node 2443208)
✅ alf_node_properties created (PL21825 → Node 2443208)
```

### What's Missing

```
❌ alf_content_data not populated
❌ alf_content_url entries not created
❌ Node 2443208 doesn't point to any files yet
```

### Result

```
Files exist: ✅
Database knows about PL21825: ✅
Database knows which files belong to PL21825: ❌

Status: INDEXING INCOMPLETE
```

**This is why you can't access the document yet in WebAccess!**

---

## 📋 Summary

### Your Theory: ✅ CONFIRMED

```
"Multiple files for same document go to different directories 
 based on upload time"
    
    → 100% CORRECT!
```

### How It Works

1. **Directory = Upload Time** (YYYY/MM/DD/HH/MM)
2. **Same document at different times = Different directories**
3. **Database links them all together via document_number**
4. **Search queries database, not filesystem**
5. **System transparently combines files from all directories**

### Your PL21825 Specifically

- ✅ Correctly stored in 3 time-based directories
- ❌ Database linking not complete yet
- ⏳ Waiting for indexing/linking process to finish

---

## 🎓 Teaching Example

If you were explaining this to someone new:

```
"Think of the contentstore as a library organized by TIME, 
 not by BOOK TITLE.
 
 Books arrive throughout the day and get shelved based on 
 when they arrived.
 
 To find all chapters of a book, you don't look in one place.
 You look in the card catalog (database), which tells you:
 
 'Chapter 1-50: Shelf 9:15'
 'Chapter 51-52: Shelf 10:01'
 'Chapter 53-54: Shelf 10:04'
 
 Then you collect all chapters and bind them together."
```

That's exactly how Aumentum works! 📚