S3-Compatible Object Storage
for AI and Machine Learning

Rabata is an S3 alternative built for storage-heavy AI workloads. Swap your endpoint URL and your PyTorch data loaders, TensorFlow pipelines, and Hugging Face datasets keep working — no SDK changes, no vendor lock-in.

1,462 MB/s Upload throughput
2.3× Faster than AWS (mixed)
$0 API request fees
30 days Free trial, no card

What AI Teams Store on Rabata

Training Datasets

Terabytes of labeled images, text corpora, audio — the usual. S3-compatible object storage means your data loading scripts stay unchanged. boto3, s3fs, and smart_open work out of the box.
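
The endpoint swap can live in one place. Below is a minimal stdlib-only sketch of shared connection settings, with the boto3 and s3fs usage shown in comments; the access keys are placeholders and the bucket path is illustrative:

```python
# One set of connection settings drives boto3, s3fs, and pandas alike.
# The endpoint URL comes from this page; the keys are placeholders.
RABATA = {
    "endpoint_url": "https://s3.rabata.io",
    "key": "YOUR_ACCESS_KEY",
    "secret": "YOUR_SECRET_KEY",
}

def boto3_kwargs(cfg):
    """Map the shared settings onto boto3.client("s3", ...) arguments."""
    return {
        "endpoint_url": cfg["endpoint_url"],
        "aws_access_key_id": cfg["key"],
        "aws_secret_access_key": cfg["secret"],
    }

def s3fs_kwargs(cfg):
    """Map the same settings onto s3fs.S3FileSystem(...) arguments."""
    return {
        "key": cfg["key"],
        "secret": cfg["secret"],
        "client_kwargs": {"endpoint_url": cfg["endpoint_url"]},
    }

# Usage (requires boto3 / s3fs / pandas installed):
#   s3 = boto3.client("s3", **boto3_kwargs(RABATA))
#   fs = s3fs.S3FileSystem(**s3fs_kwargs(RABATA))
#   df = pandas.read_parquet("s3://my-ml-bucket/datasets/v2/train.parquet",
#                            storage_options=s3fs_kwargs(RABATA))
```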

Model Training Pipelines

Feed training jobs directly from AI storage. High-throughput reads keep GPUs busy during distributed training, and checkpointing writes back without becoming a bottleneck.

Embeddings and Feature Data

Vector embeddings, feature stores, preprocessed batches. Zero API request fees mean high-frequency reads during inference stay affordable no matter how often you hit the bucket.

Model Artifacts and Checkpoints

Version and archive trained models, ONNX exports, experiment logs. MLflow and DVC plug in through the same S3-compatible endpoint — one bucket for everything.
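
For MLflow specifically, the S3 artifact store reads its endpoint from an environment variable, so no code changes are needed. A minimal sketch, with bucket, experiment, and remote names as illustrative placeholders:

```python
import os

# MLflow's S3 artifact store honors this environment variable,
# so experiment artifacts land on Rabata instead of AWS.
os.environ["MLFLOW_S3_ENDPOINT_URL"] = "https://s3.rabata.io"
os.environ["AWS_ACCESS_KEY_ID"] = "YOUR_ACCESS_KEY"
os.environ["AWS_SECRET_ACCESS_KEY"] = "YOUR_SECRET_KEY"

# With the variables set, point MLflow at an s3:// artifact root as usual:
#   mlflow.create_experiment("resnet-runs",
#                            artifact_location="s3://my-ml-bucket/mlflow")
# For DVC, the equivalent lives in the remote config:
#   dvc remote modify myremote endpointurl https://s3.rabata.io
```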

Hyperscaler object storage works for machine learning — technically. But per-request fees, throttling on small objects, and billing complexity make AI data storage harder to predict and more expensive than it needs to be. Three pain points keep coming up:

Massive Datasets, Massive Bills

AI training data grows fast. When every GET, PUT, and LIST request carries a fee, costs scale unpredictably — especially during multi-epoch training runs that re-read the same data thousands of times.

Pipeline Bottlenecks

GPUs sit idle when storage can't deliver data fast enough. Mixed workloads — simultaneous reads, writes, and list operations — expose performance gaps that headline bandwidth numbers hide.

Infrastructure Complexity

Managing IAM policies, cross-region replication, and tiered pricing across multiple services adds operational overhead that pulls ML engineers away from model development.

We ran MinIO warp v1.0.7 on identical hardware to compare Rabata against AWS S3 across the operations that matter for AI workloads. Mixed workload throughput — the metric closest to real training and inference patterns — is where the gap shows up most.

Workload Rabata AWS S3 Advantage
Upload throughput 1,462 MB/s 1,444 MB/s 1.01×
Download throughput 1,107 MB/s 1,816 MB/s AWS leads
Mixed operations 346 MB/s 151 MB/s 2.3×
Small objects (ops/s) 696 319 2.2×

Mixed operations and small-object handling reflect what actually happens during training: concurrent reads of batches, checkpoint writes, and metadata lookups running in parallel. Read the full benchmark methodology or compare all providers.

Rabata speaks the S3 API. Any tool that writes to S3 writes to Rabata — no custom SDKs, no proprietary clients. Change the endpoint URL and everything else stays the same.

PyTorch
TensorFlow
Hugging Face
MLflow
Apache Spark

Connect in Two Lines

import boto3

s3 = boto3.client("s3",
    endpoint_url="https://s3.rabata.io",
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

# Upload a training dataset
s3.upload_file("train.parquet", "my-ml-bucket", "datasets/v2/train.parquet")

# Stream directly into PyTorch DataLoader
# Use s3fs or smart_open — no code changes needed

Two tiers. No hidden fees. No per-request charges.

Backup Storage

$49
flat / 10 TB / month
  • Cold model archives and snapshots
  • Historical dataset versions
  • Zero API request fees
  • Same S3 API, same tooling

Zero API Request Fees — Really

Every GET, PUT, LIST, and DELETE is included. No per-request billing, no surprise line items. When your training job reads the same dataset 10,000 times across epochs, you pay only for the storage, not the access.

IAM Access Controls

Fine-grained permissions per bucket and per key. Grant read-only access to training jobs while restricting write access to your data pipeline.
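
As an illustration, a read-only policy for training jobs might look like the following. It uses AWS IAM's JSON grammar; whether Rabata accepts this exact schema, and these ARNs, is an assumption based on its S3 compatibility, and the bucket name is a placeholder:

```python
import json

# Read-only access for training jobs, in AWS-IAM-style JSON.
# The exact policy schema Rabata expects is an assumption here.
READ_ONLY_TRAINING = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::my-ml-bucket",
                "arn:aws:s3:::my-ml-bucket/datasets/*",
            ],
        }
    ],
}

policy_json = json.dumps(READ_ONLY_TRAINING, indent=2)
```

Note the policy grants GET and LIST but no s3:PutObject, so a job holding these credentials can read batches but never overwrite the dataset.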

Private by Default

Every bucket is private from creation. No accidental public exposure of proprietary training data or model weights.

SigV4 + HTTPS Encryption

All requests are authenticated with AWS Signature Version 4 and encrypted in transit over HTTPS. Data at rest is encrypted on our infrastructure.

EU and US Data Placement

Choose where your data lives. EU or US regional placement helps you meet GDPR requirements and data residency policies without third-party workarounds.

$100,000 Free Credits for Gen-AI Startups

Get $100,000 in storage credits for free to accelerate your generative AI startup.

AI Storage FAQ

Is Rabata a drop-in S3 alternative for AI workloads?
Yes. Rabata implements the S3 API — boto3, s3fs, AWS CLI, and any S3-compatible SDK work without changes. PyTorch DataLoaders, TensorFlow tf.data, Hugging Face datasets, and MLflow artifact stores all connect by swapping the endpoint URL.
How does Rabata handle large training datasets?
Multipart uploads handle objects of any size, and our MinIO warp benchmarks measured 1,462 MB/s upload throughput. No object count limits per bucket, no per-request fees — you can store and iterate over millions of training samples without the bill surprising you.
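
Under the hood, a multipart upload splits an object into parts, and the S3 protocol caps an upload at 10,000 parts with a 5 MiB minimum part size. A sketch of how a chunk size might be chosen for very large objects (the 64 MiB default here is an arbitrary choice, not Rabata's):

```python
import math

MAX_PARTS = 10_000       # S3 protocol limit on parts per multipart upload
MIN_PART = 5 * 1024**2   # 5 MiB minimum part size (except the last part)

def part_size_for(object_size, preferred=64 * 1024**2):
    """Pick a chunk size that keeps the upload under the 10,000-part cap."""
    needed = math.ceil(object_size / MAX_PARTS)
    return max(MIN_PART, preferred, needed)

def part_count(object_size, preferred=64 * 1024**2):
    """How many parts the upload will use at that chunk size."""
    return math.ceil(object_size / part_size_for(object_size, preferred))

# boto3 applies the same idea via TransferConfig (requires boto3):
#   from boto3.s3.transfer import TransferConfig
#   cfg = TransferConfig(multipart_chunksize=part_size_for(size))
#   s3.upload_file("train.parquet", "my-ml-bucket", "datasets/train.parquet",
#                  Config=cfg)
```
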
What does “zero API request fees” mean?
Every S3 API operation — GET, PUT, LIST, DELETE, HEAD, and multipart uploads — is included in your storage price. AWS S3, for example, charges $0.0004 per 1,000 GET requests and $0.005 per 1,000 PUT or LIST requests. A training job that reads 10 million objects per epoch across 100 epochs issues a billion GETs, roughly $400 in request fees alone, before LIST calls and data transfer charges. On Rabata, that same workload costs $0 in request fees.
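
The arithmetic is easy to check. The rates below are AWS's published S3 Standard request prices at the time of writing (GET at $0.0004 per 1,000 requests; PUT and LIST at $0.005 per 1,000) and may change:

```python
def request_fees(requests, price_per_1k):
    """Request-fee cost in dollars for a given number of S3 requests."""
    return requests / 1000 * price_per_1k

# 10 million objects per epoch, re-read across 100 epochs = 1 billion GETs.
reads = 10_000_000 * 100
aws_get_cost = request_fees(reads, 0.0004)  # GET requests alone, ~$400
rabata_cost = request_fees(reads, 0.0)      # zero request fees
```
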
Can I use Rabata with PyTorch, TensorFlow, or Hugging Face?
Absolutely. Any framework that reads from S3 works with Rabata. PyTorch via s3fs or boto3, TensorFlow via tf.io.gfile with S3 endpoints, and Hugging Face datasets with the fsspec S3 backend. Just set the endpoint URL and credentials — no library patches or custom code.
How do I migrate existing AI data from AWS S3?
Use rclone sync, aws s3 sync, or any S3-compatible transfer tool. Point the destination at your Rabata endpoint and bucket. For large migrations, our team can help plan the transfer. See the migration guide for step-by-step instructions.
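
As a sketch, an rclone remote for Rabata could be configured like this ("rabata" is an arbitrary remote name, the keys are placeholders, and `provider = Other` is rclone's generic setting for S3-compatible services):

```ini
# ~/.config/rclone/rclone.conf — hypothetical remote named "rabata"
[rabata]
type = s3
provider = Other
access_key_id = YOUR_ACCESS_KEY
secret_access_key = YOUR_SECRET_KEY
endpoint = https://s3.rabata.io
```

With an existing remote for your AWS account (here assumed to be named `aws`), the transfer is then `rclone sync aws:my-bucket rabata:my-bucket --progress`.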
Is Rabata suitable for storing sensitive or proprietary AI models?
Yes. All buckets are private by default. Access is controlled through IAM with per-bucket and per-key permissions. Data is encrypted in transit (HTTPS + SigV4) and at rest. You can choose EU or US data placement to meet compliance requirements.
What about the $100,000 Gen-AI startup grant?
Rabata offers up to $100,000 in storage credits for qualifying generative AI startups. The credits apply to both Hot and Backup storage tiers. Apply for the grant to get started.
How does Rabata compare to AWS S3 for AI workloads?
In MinIO warp benchmarks on identical hardware, Rabata delivers 2.3× faster mixed operations (346 MB/s vs 151 MB/s) and 2.2× faster small-object handling (696 ops/s vs 319 ops/s). AWS leads in raw download speed with extreme concurrency. For AI workloads that involve concurrent reads, writes, and metadata operations, Rabata provides better balanced performance at roughly 70% lower cost with zero request fees. See the full comparison.

Try Rabata Object Storage for AI — Free for 30 Days

No credit card, no sales call. Sign up, create a bucket, and point your ML pipeline at it.