S3-Compatible Object Storage
for AI and Machine Learning
Rabata is an S3 alternative built for storage-heavy AI workloads. Swap your endpoint URL, and your PyTorch data loaders, TensorFlow pipelines, and Hugging Face datasets keep working — no SDK changes, no vendor lock-in.
What AI Teams Store on Rabata
Training Datasets
Terabytes of labeled images, text corpora, audio — the usual. S3-compatible object storage means your data loading scripts stay unchanged. boto3, s3fs, and smart_open work out of the box.
Model Training Pipelines
Feed training jobs directly from AI storage. High-throughput reads keep GPUs busy during distributed training, and checkpointing writes back without becoming a bottleneck.
Embeddings and Feature Data
Vector embeddings, feature stores, preprocessed batches. Zero API request fees mean high-frequency reads during inference stay affordable no matter how often you hit the bucket.
Model Artifacts and Checkpoints
Version and archive trained models, ONNX exports, experiment logs. MLflow and DVC plug in through the same S3-compatible endpoint — one bucket for everything.
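As one sketch of that plug-in path, a DVC remote can point at the endpoint through DVC's standard config file. The bucket name, path, and remote name below are placeholders, not Rabata-specific settings:

```ini
# .dvc/config — minimal sketch; "my-ml-bucket" and "rabata" are placeholders
['remote "rabata"']
    url = s3://my-ml-bucket/artifacts
    endpointurl = https://s3.rabata.io
[core]
    remote = rabata
```

With this in place, `dvc push` and `dvc pull` move artifacts through the same S3-compatible endpoint your other tools use.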
Hyperscaler object storage works for machine learning — technically. But per-request fees, throttling on small objects, and billing complexity make AI data storage harder to predict and more expensive than it needs to be. Three pain points keep coming up:
Massive Datasets, Massive Bills
AI training data grows fast. When every GET, PUT, and LIST request carries a fee, costs scale unpredictably — especially during multi-epoch training runs that re-read the same data thousands of times.
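The multi-epoch math is easy to sketch. A minimal back-of-envelope in Python, using an assumed rate of $0.0004 per 1,000 GET requests — illustrative only, not any provider's published price:

```python
# Back-of-envelope: per-request GET fees across a multi-epoch training run.
# All rates and counts below are illustrative assumptions.
objects_per_epoch = 1_000_000   # one GET per training file per epoch
epochs = 50
get_fee_per_1k = 0.0004         # assumed hyperscaler rate, USD per 1,000 GETs

total_gets = objects_per_epoch * epochs
request_cost = total_gets / 1_000 * get_fee_per_1k
print(f"{total_gets:,} GETs -> ${request_cost:,.2f} in request fees alone")
# With zero per-request fees, the same re-reads add $0.
```

Re-reading the same million files for fifty epochs is routine in training, which is why request fees compound faster than storage fees.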
Pipeline Bottlenecks
GPUs sit idle when storage can't deliver data fast enough. Mixed workloads — simultaneous reads, writes, and list operations — expose performance gaps that headline bandwidth numbers hide.
Infrastructure Complexity
Managing IAM policies, cross-region replication, and tiered pricing across multiple services adds operational overhead that pulls ML engineers away from model development.
We ran MinIO warp v1.0.7 on identical hardware to compare Rabata against AWS S3 across the operations that matter for AI workloads. Mixed workload throughput — the metric closest to real training and inference patterns — is where the gap shows up most.
| Workload | Rabata | AWS S3 | Advantage |
|---|---|---|---|
| Upload throughput | 1,462 MB/s | 1,444 MB/s | 1.01× |
| Download throughput | 1,107 MB/s | 1,816 MB/s | AWS leads |
| Mixed operations | 346 MB/s | 151 MB/s | 2.3× |
| Small objects (ops/s) | 696 | 319 | 2.2× |
Mixed operations and small-object handling reflect what actually happens during training: concurrent reads of batches, checkpoint writes, and metadata lookups running in parallel. Read the full benchmark methodology or compare all providers.
Rabata speaks the S3 API. Any tool that writes to S3 writes to Rabata — no custom SDKs, no proprietary clients. Change the endpoint URL and everything else stays the same.
Connect in Two Lines
```python
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.rabata.io",
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

# Upload a training dataset
s3.upload_file("train.parquet", "my-ml-bucket", "datasets/v2/train.parquet")

# Stream directly into a PyTorch DataLoader:
# use s3fs or smart_open — no code changes needed.
```
Two tiers. No hidden fees. No per-request charges.
Hot Storage
- Active training data and models
- Low-latency reads for inference
- Zero API request fees
- Unlimited GET, PUT, LIST, DELETE
Backup Storage
- Cold model archives and snapshots
- Historical dataset versions
- Zero API request fees
- Same S3 API, same tooling
Zero API Request Fees — Really
Every GET, PUT, LIST, and DELETE is included. No per-request billing, no surprise line items. When your training job reads the same dataset 10,000 times across epochs, you pay only for the storage, not the access.
IAM Access Controls
Fine-grained permissions per bucket and per key. Grant read-only access to training jobs while restricting write access to your data pipeline.
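As a sketch, a read-only grant for training jobs in AWS-style policy JSON might look like the following. The bucket name is a placeholder, and the exact policy grammar Rabata accepts should be confirmed against its IAM documentation:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-ml-bucket",
        "arn:aws:s3:::my-ml-bucket/*"
      ]
    }
  ]
}
```

A separate policy adding `s3:PutObject` would then be scoped to the data-pipeline identity only.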
Private by Default
Every bucket is private from creation. No accidental public exposure of proprietary training data or model weights.
SigV4 + HTTPS Encryption
All requests are authenticated with AWS Signature Version 4 and encrypted in transit over HTTPS. Data at rest is encrypted on our infrastructure.
EU and US Data Placement
Choose where your data lives. EU or US regional placement helps you meet GDPR requirements and data residency policies without third-party workarounds.
AI Storage FAQ
Is Rabata a drop-in S3 alternative for AI workloads?
How does Rabata handle large training datasets?
What does “zero API request fees” mean?
Can I use Rabata with PyTorch, TensorFlow, or Hugging Face?
How do I migrate existing AI data from AWS S3?
Use rclone sync, aws s3 sync, or any S3-compatible transfer tool. Point the destination at your Rabata endpoint and bucket. For large migrations, our team can help plan the transfer. See the migration guide for step-by-step instructions.
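A minimal rclone remote definition for a Rabata endpoint might look like this; the remote name and credentials are placeholders:

```ini
# ~/.config/rclone/rclone.conf — sketch; "rabata" is an arbitrary remote name
[rabata]
type = s3
provider = Other
endpoint = https://s3.rabata.io
access_key_id = YOUR_ACCESS_KEY
secret_access_key = YOUR_SECRET_KEY
```

Assuming an existing remote named `aws` for the source, `rclone sync aws:source-bucket rabata:source-bucket --progress` then copies the data across.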
Is Rabata suitable for storing sensitive or proprietary AI models?
What about the $100,000 Gen-AI startup grant?
How does Rabata compare to AWS S3 for AI workloads?
Try Rabata Object Storage for AI — Free for 30 Days
No credit card, no sales call. Sign up, create a bucket, and point your ML pipeline at it.