S3 Storage Strategy

This document describes the S3 storage architecture for the Tellus EHS platform, following industry best practices for multi-tenant SaaS applications.

Overview

The platform uses a single S3 bucket per environment, with key prefixes providing organization and tenant isolation. This is a widely used pattern for multi-tenant applications.

Why Single Bucket with Prefixes?

Factor         | Multiple Buckets                                        | Single Bucket + Prefixes
AWS Limits     | Per-account bucket quota (historically 100 by default)  | No limit on the number of prefixes
IAM Complexity | Separate policies per bucket                            | Single policy with prefix conditions
Cost           | Same (per-GB pricing)                                   | Same
Performance    | No difference                                           | Request rates scale per prefix
Operations     | Complex backup/replication                              | Simple single-bucket operations
Monitoring     | Multiple metrics streams                                | Unified metrics with prefix filtering

Bucket Naming

Environment-specific buckets ensure complete isolation between environments:

tellus-ehs-documents-dev      # Development
tellus-ehs-documents-staging  # Staging
tellus-ehs-documents-prod     # Production

Directory Structure

tellus-ehs-documents-{env}/
│
├── chemiq/
│   │
│   ├── product-images/              # User-uploaded label photos
│   │   └── {company_id}/
│   │       └── {product_id}/
│   │           └── {image_id}.jpg
│   │
│   ├── sds/
│   │   ├── global/                  # Shared SDS repository (verified)
│   │   │   └── {sds_id}/
│   │   │       └── {revision_date}_{checksum}.pdf
│   │   │
│   │   └── company/                 # Company-uploaded (unverified)
│   │       └── {company_id}/
│   │           └── {sds_id}_{checksum}.pdf
│   │
│   └── epa-labels/
│       ├── global/                  # EPA label repository (verified)
│       │   └── {label_id}/
│       │       └── {version}_{checksum}.pdf
│       │
│       └── company/                 # Company-uploaded
│           └── {company_id}/
│               └── {label_id}_{checksum}.pdf
│
└── temp/                            # Short-lived processing files
    └── uploads/
        └── {company_id}/
            └── {upload_id}/
                └── {original_filename}

Key Patterns

Product Images

chemiq/product-images/{company_id}/{product_id}/{image_id}.jpg
  • Use case: Label photos uploaded during inventory entry
  • Isolation: By company_id prefix
  • Deduplication: Via file_checksum in database (not filename)

SDS Documents (Global)

chemiq/sds/global/{sds_id}/{revision_date}_{checksum}.pdf
  • Use case: Verified SDS documents shared across all companies
  • Versioning: revision_date in path supports multiple versions
  • Deduplication: Checksum in filename prevents duplicates
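
The document does not say how the {checksum} component is computed. As a rough sketch only, assuming a SHA-256 digest truncated to 12 hex characters (chosen to match the length of the a1b2c3d4e5f6 example; the algorithm, the length, and the name file_checksum are all assumptions), it could be derived like this:

```python
import hashlib
from typing import BinaryIO


def file_checksum(stream: BinaryIO, length: int = 12,
                  chunk_size: int = 1024 * 1024) -> str:
    """Return the first `length` hex characters of the stream's SHA-256 digest.

    Hypothetical helper: the real checksum scheme is not specified in this
    document. The stream is read in chunks so large PDFs are not loaded
    into memory at once.
    """
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()[:length]
```

Identical bytes always map to the same key, which is what makes the filename usable for deduplication. Truncating the digest raises the collision probability, so a production scheme may prefer the full digest.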

SDS Documents (Company)

chemiq/sds/company/{company_id}/{sds_id}_{checksum}.pdf
  • Use case: Company-uploaded SDSs before verification
  • Isolation: Full company_id isolation
  • Workflow: Can be promoted to global after verification

EPA Labels (Global)

chemiq/epa-labels/global/{label_id}/{version}_{checksum}.pdf
  • Use case: EPA label repository (from EPA PPLS API or verified uploads)
  • Versioning: Label version in path

EPA Labels (Company)

chemiq/epa-labels/company/{company_id}/{label_id}_{checksum}.pdf
  • Use case: Company-uploaded labels before verification

Temporary Files

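temp/uploads/{company_id}/{upload_id}/{original_filename}
  • Use case: Processing uploads before final storage
  • Lifecycle: Expired by an S3 lifecycle rule one day after creation. S3 rounds the expiry up to the next midnight UTC and removes objects asynchronously, so files can persist somewhat longer than 24 hours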

Configuration

Python Configuration

The S3 configuration is centralized in app/core/s3_config.py:

from app.core.s3_config import (
    S3_BUCKET,
    S3Prefixes,
    S3KeyGenerator,
    S3PresignedConfig,
)

# Generate a product image key
key = S3KeyGenerator.product_image(
    company_id="abc123",
    product_id="prod456",
    image_id="img789",
    extension="jpg",
)
# Result: chemiq/product-images/abc123/prod456/img789.jpg

# Generate an SDS key
key = S3KeyGenerator.sds_global(
    sds_id="sds123",
    revision_date="2024-01-15",
    checksum="a1b2c3d4e5f6",
)
# Result: chemiq/sds/global/sds123/2024-01-15_a1b2c3d4e5f6.pdf
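
The contents of app/core/s3_config.py are not shown here. To illustrate how a generator with the two calls above might behave, here is a minimal stand-alone sketch. It is not the actual module: the class bodies and the _segment validation helper are assumptions, and only the method names, arguments, and resulting keys come from the examples above.

```python
class S3Prefixes:
    """Prefixes copied from the directory structure in this document."""
    PRODUCT_IMAGES = "chemiq/product-images"
    SDS_GLOBAL = "chemiq/sds/global"


def _segment(value: str) -> str:
    """Hypothetical guard: reject values that could escape their prefix."""
    if not value or "/" in value or value in {".", ".."}:
        raise ValueError(f"invalid key segment: {value!r}")
    return value


class S3KeyGenerator:
    """Illustrative re-implementation, not the project's real class."""

    @staticmethod
    def product_image(company_id: str, product_id: str,
                      image_id: str, extension: str = "jpg") -> str:
        return (f"{S3Prefixes.PRODUCT_IMAGES}/{_segment(company_id)}/"
                f"{_segment(product_id)}/{_segment(image_id)}.{_segment(extension)}")

    @staticmethod
    def sds_global(sds_id: str, revision_date: str, checksum: str) -> str:
        return (f"{S3Prefixes.SDS_GLOBAL}/{_segment(sds_id)}/"
                f"{_segment(revision_date)}_{_segment(checksum)}.pdf")
```

Validating each segment keeps an ID such as "../other-company" from producing a key outside the intended tenant prefix.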

Environment Variables

# Optional: Override the bucket name
S3_BUCKET=custom-bucket-name

# AWS credentials: required unless the service runs under an IAM role
AWS_ACCESS_KEY_ID=xxx
AWS_SECRET_ACCESS_KEY=xxx
AWS_REGION=us-east-1

Security

IAM Policy Example

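Company-scoped access (for direct S3 access if needed). {company_id} is a placeholder and must be replaced with the actual company ID before the policy is attached: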

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::tellus-ehs-documents-prod/chemiq/*/company/{company_id}/*",
        "arn:aws:s3:::tellus-ehs-documents-prod/chemiq/product-images/{company_id}/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": [
        "arn:aws:s3:::tellus-ehs-documents-prod/chemiq/sds/global/*",
        "arn:aws:s3:::tellus-ehs-documents-prod/chemiq/epa-labels/global/*"
      ]
    }
  ]
}
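
Because the {company_id} placeholder has to be filled in per tenant, the policy is most likely generated rather than written by hand. The sketch below shows one way to do that. The function name company_policy and the allowed-character rule are assumptions, not part of the platform. The character check matters because * and ? are wildcards in IAM resource ARNs, so an unchecked ID could widen the grant.

```python
import re


def company_policy(company_id: str,
                   bucket: str = "tellus-ehs-documents-prod") -> dict:
    """Build the company-scoped policy shown above for one tenant.

    Hypothetical helper. Only letters, digits, '_' and '-' are accepted so
    that IAM wildcards and path separators cannot appear in the ARN.
    """
    if not re.fullmatch(r"[A-Za-z0-9_-]+", company_id):
        raise ValueError(f"unsafe company_id: {company_id!r}")
    base = f"arn:aws:s3:::{bucket}/chemiq"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [
                    f"{base}/*/company/{company_id}/*",
                    f"{base}/product-images/{company_id}/*",
                ],
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [
                    f"{base}/sds/global/*",
                    f"{base}/epa-labels/global/*",
                ],
            },
        ],
    }
```

The returned dict can be serialized with json.dumps and passed to whatever mechanism attaches the policy.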

Access Patterns

Document Type        | Write Access     | Read Access
Product Images       | Own company only | Own company only
SDS (Global)         | Admin/System     | All companies
SDS (Company)        | Own company only | Own company only
EPA Labels (Global)  | Admin/System     | All companies
EPA Labels (Company) | Own company only | Own company only
Temp Files           | Own company only | Own company only
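
Since the frontend never talks to S3 directly, the backend has to apply these rules before issuing any URL. The following sketch encodes the table as a check on the key itself. The function names key_owner and is_allowed are hypothetical, and the parsing assumes keys follow the patterns listed under Key Patterns.

```python
from typing import Optional


def key_owner(key: str) -> Optional[str]:
    """Return the owning company_id, or None for keys under a global prefix."""
    parts = key.split("/")
    if parts[:2] == ["chemiq", "product-images"] and len(parts) >= 4:
        return parts[2]
    if parts[0] == "chemiq" and len(parts) >= 5 and parts[1] in ("sds", "epa-labels"):
        if parts[2] == "global":
            return None
        if parts[2] == "company":
            return parts[3]
    if parts[:2] == ["temp", "uploads"] and len(parts) >= 4:
        return parts[2]
    raise ValueError(f"key does not match a known pattern: {key!r}")


def is_allowed(key: str, company_id: str, write: bool = False,
               is_admin: bool = False) -> bool:
    """Apply the access table: global keys are world-readable and
    admin-writable; every other key belongs to exactly one company."""
    owner = key_owner(key)
    if owner is None:
        return is_admin if write else True
    return owner == company_id
```

Keys that match no known pattern raise instead of returning False, so a malformed key surfaces as an error rather than a silent denial.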

Presigned URLs

All frontend access uses presigned URLs generated by the backend:

from app.core.s3_config import S3PresignedConfig

# Default expiration times
S3PresignedConfig.PRODUCT_IMAGE_EXPIRY = 3600 # 1 hour
S3PresignedConfig.SDS_DOCUMENT_EXPIRY = 3600 # 1 hour
S3PresignedConfig.UPLOAD_URL_EXPIRY = 900 # 15 minutes
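
How the backend produces these URLs is not shown. A minimal sketch, assuming the backend holds a boto3 S3 client (passed in as client below), could wrap the client's generate_presigned_url method as follows. The wrapper names are hypothetical, and the expiry defaults mirror the values above.

```python
from typing import Any

PRODUCT_IMAGE_EXPIRY = 3600  # 1 hour, as configured above
UPLOAD_URL_EXPIRY = 900      # 15 minutes, as configured above


def presigned_download_url(client: Any, bucket: str, key: str,
                           expires_in: int = PRODUCT_IMAGE_EXPIRY) -> str:
    """Return a time-limited GET URL for one object.

    `client` is assumed to be a boto3 S3 client; it is injected rather than
    created here so the sketch has no third-party import.
    """
    return client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )


def presigned_upload_url(client: Any, bucket: str, key: str, content_type: str,
                         expires_in: int = UPLOAD_URL_EXPIRY) -> str:
    """Return a time-limited PUT URL; the uploader must send the same
    Content-Type header that was signed here."""
    return client.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key, "ContentType": content_type},
        ExpiresIn=expires_in,
    )
```

The access check for the requesting company belongs before either call, since anyone holding the URL can use it until it expires.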

S3 Bucket Configuration

  1. Versioning: Enabled (for audit trail and accidental deletion recovery)
  2. Encryption: SSE-S3 or SSE-KMS (at-rest encryption)
  3. Public Access: Blocked (all access via presigned URLs)
  4. CORS: Configured for frontend domain

Lifecycle Rules

{
  "Rules": [
    {
      "ID": "DeleteTempFiles",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "temp/"
      },
      "Expiration": {
        "Days": 1
      }
    },
    {
      "ID": "TransitionOldVersions",
      "Status": "Enabled",
      "Filter": {
        "Prefix": ""
      },
      "NoncurrentVersionExpiration": {
        "NoncurrentDays": 90
      }
    }
  ]
}
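
A "Days": 1 expiration does not remove an object exactly 24 hours after upload. S3 adds the configured days to the creation time and rounds the result up to the next midnight UTC. The sketch below reproduces that calculation so the real retention window of temp files can be estimated. The function name lifecycle_expiry is hypothetical, and treating a time that already falls on midnight as needing no rounding is an assumption.

```python
from datetime import datetime, timedelta, timezone


def lifecycle_expiry(created: datetime, days: int) -> datetime:
    """Estimate when an object becomes eligible for lifecycle expiration.

    Adds `days` to the creation time, then rounds up to the next midnight
    UTC. `created` must be timezone-aware.
    """
    if created.tzinfo is None:
        raise ValueError("created must be timezone-aware")
    due = created.astimezone(timezone.utc) + timedelta(days=days)
    midnight = due.replace(hour=0, minute=0, second=0, microsecond=0)
    # Assumption: a time exactly at midnight is not pushed a further day.
    return midnight if due == midnight else midnight + timedelta(days=1)
```

A temp file uploaded shortly after midnight UTC can therefore live for close to 48 hours. On a versioned bucket, expiration adds a delete marker, and the earlier version remains as a noncurrent version until a NoncurrentVersionExpiration rule removes it.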

CORS Configuration

[
  {
    "AllowedHeaders": ["*"],
    "AllowedMethods": ["GET", "PUT", "POST"],
    "AllowedOrigins": [
      "https://app.tellus-ehs.com",
      "https://staging.tellus-ehs.com",
      "http://localhost:5174"
    ],
    "ExposeHeaders": ["ETag"],
    "MaxAgeSeconds": 3600
  }
]

Cost Optimization

  1. Storage Classes:
    • Standard: Active documents (< 30 days)
    • Intelligent-Tiering: SDSs and EPA labels (automatically optimized)

  2. Cost Allocation Tags:
    • company_id: For per-customer cost tracking
    • document_type: For cost analysis by document type

  3. Request Optimization:
    • Use multipart uploads for files > 5 MB (every part except the last must be at least 5 MiB; AWS suggests considering multipart from around 100 MB)
    • Cache downloaded objects client-side for no longer than the presigned URL expiry
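
Multipart uploads are bounded by two S3 limits: each part except the last must be at least 5 MiB, and an upload can have at most 10,000 parts. As an illustration of how an uploader might pick a part size within those limits, here is a small sketch. The function name plan_parts and the 8 MiB preferred size are assumptions.

```python
import math

MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB  # S3 minimum for every part except the last
MAX_PARTS = 10_000       # S3 maximum number of parts per upload


def plan_parts(size_bytes: int, preferred_part: int = 8 * MIB) -> tuple:
    """Return (part_size, part_count) for an object of `size_bytes`.

    Starts from the preferred part size, raises it to the S3 minimum if
    needed, then grows it until the object fits in MAX_PARTS parts.
    """
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    part_size = max(preferred_part, MIN_PART_SIZE)
    part_size = max(part_size, math.ceil(size_bytes / MAX_PARTS))
    part_count = math.ceil(size_bytes / part_size)
    return part_size, part_count
```

A 20 MiB file, for example, is split into three parts of up to 8 MiB each, while very large objects get a proportionally larger part size.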

Monitoring

CloudWatch Metrics to Track

  • BucketSizeBytes - Total storage
  • NumberOfObjects - Object count
  • 4xxErrors - Client errors (access denied, etc.); available only when S3 request metrics are enabled for the bucket
  • 5xxErrors - Server errors; also a request metric

Alerts

  1. Storage threshold: Alert when bucket size exceeds threshold
  2. Error rate: Alert on elevated 4xx/5xx error rates
  3. Unusual access patterns: Alert on access from unexpected regions

Migration Notes

If migrating from a different structure:

  1. Create new keys following the pattern
  2. Copy objects to new locations
  3. Update database references (s3_key columns)
  4. Delete old objects after verification
  5. Update lifecycle rules
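
The copy, verify, update, and delete steps above can be sketched for a single object as follows. This is an outline under stated assumptions, not a migration tool: client is expected to be a boto3 S3 client, update_reference is a hypothetical callback that rewrites the s3_key column, and comparing ContentLength is only a basic check.

```python
from typing import Any, Callable


def migrate_object(client: Any, bucket: str, old_key: str, new_key: str,
                   update_reference: Callable[[str, str], None]) -> None:
    """Move one object to its new key without losing the old copy early."""
    # Step 2: copy within the same bucket (a single copy_object call is
    # limited to objects of up to 5 GB).
    client.copy_object(
        Bucket=bucket,
        Key=new_key,
        CopySource={"Bucket": bucket, "Key": old_key},
    )
    # Verify the copy before touching the database or the original.
    old_size = client.head_object(Bucket=bucket, Key=old_key)["ContentLength"]
    new_size = client.head_object(Bucket=bucket, Key=new_key)["ContentLength"]
    if old_size != new_size:
        raise RuntimeError(f"size mismatch copying {old_key} to {new_key}")
    # Step 3: point the database at the new key (hypothetical callback).
    update_reference(old_key, new_key)
    # Step 4: remove the original only after everything above succeeded.
    client.delete_object(Bucket=bucket, Key=old_key)
```

If any step raises, the original object is still in place, so the function can be retried for that key.
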

Related Resources

  • Configuration: tellus-ehs-hazcom-service/app/core/s3_config.py
  • Product Image Service: tellus-ehs-hazcom-service/app/services/chemiq/product_image_service.py
  • Database Schema: ChemIQ Schema (Table 4: chemiq_product_images)