Eliminating the Storage Tax on Genomics, HPC, and AI Workloads Gary Planthaber, CTO

Every genomics workflow, HPC job, and AI agent pays a tax for the gap between the storage it has and the performance it needs. flexFS eliminates it — full filesystem performance, object storage economics, zero code changes.

flexFS is an object-native parallel filesystem that gives genomics, HPC, and AI workloads the performance of a high-end filesystem while running entirely over standard cloud object storage.

The tax shows up two ways. Teams running provisioned cloud filesystems pay in cost and scale limits: over-provisioned capacity, idle spend, and infrastructure that can’t keep up with demanding workloads. Teams whose data lives in object storage pay in performance: high-latency metadata that forces pipelines to wait on storage instead of compute. Either way, the gap between the storage layer and what workloads actually need is expensive.

flexFS eliminates the tax. A drop-in replacement for any filesystem — NFS, EFS, Lustre, Weka, and others — flexFS runs directly over object storage and delivers full POSIX semantics, sub-millisecond metadata access, and parallel byte-range I/O with no application code changes. Teams keep the economics of object storage and gain performance that provisioned cloud filesystems can’t match, at 50–80% lower cost.

The Problems We Solve

Cloud bioinformatics pipelines and HPC scratch space: Nextflow, Snakemake, and WDL pipelines require true POSIX semantics — atomic operations, low-latency metadata, and elastic scratch space scaling with the job. flexFS delivers all of this directly over object storage, without the cost and complexity of managing a parallel filesystem cluster. Genomics workflows including alignment, variant calling, and single-cell analysis have cut time-to-results and increased analytical throughput — enabling teams to scale within the time windows their business demands.
Large imaging repositories: CryoEM, digital pathology, DICOM radiology, and other large-scale imaging workflows generate massive repositories of files overwhelming conventional cloud filesystems in both scale and throughput. flexFS handles hundreds of millions of files and folders at petabyte scale, delivering the high aggregate throughput image analysis pipelines require — without reprovisioning, without downtime, and without budget overruns.
Research Data Commons and shared scientific computing: Life sciences organizations managing petabyte-scale repositories of clinical and research data need a single, unified namespace accessible by both HPC clusters and interactive computing environments. flexFS provides that — replacing self-managed NFS and over-provisioned cloud filesystems with a high-performance shared fabric that eliminates reprovisioning downtime and dramatically reduces administrative overhead.
Data lakehouse accelerator: Spark, Presto, and other analytics engines need direct access to lakehouse datasets without traditional cloud object I/O bottlenecks. Using flexFS as a transparent, high-performance filesystem layer between Open Table Formats such as Apache Iceberg and underlying object storage eliminates those bottlenecks without requiring changes to the execution engine or data formats — achieving up to 7x faster query computation on benchmarked TPC-H workloads versus Spark-direct-to-S3.
AI and ML model training: Life sciences AI workloads — drug discovery models, protein structure prediction, multi-modal clinical datasets — demand storage that can keep GPU clusters fed without stalling. flexFS delivers 2x the throughput of native S3 without a cache layer, using object-native parallelism to saturate H100 and B200 clusters. Checkpoint saves that previously took minutes complete in seconds.
Agentic AI workspace and persistent memory: Autonomous agents running multi-step genomics and drug discovery workflows need more than a bucket — they need a real filesystem. flexFS gives agents a POSIX-compliant scratchpad backed by object storage: fast enough for multi-hop reasoning (sub-millisecond metadata, NVMe-speed warm data), large enough for full multi-modal datasets, and persistent across sessions. Agents share file paths instead of copying data, read only the byte ranges they need, and save intermediate results directly to those shared paths — without the latency or cost of loading context into an LLM.

Built for Life Sciences

flexFS is purpose-built for the security and compliance requirements of life sciences. Sensitive genomic and clinical datasets are protected with continuous data protection, point-in-time recovery, immutable data retention, and end-to-end encryption — keys never leave the host. At 50–80% lower cost than provisioned cloud filesystems, teams pay only for what they use.

“We built flexFS because we were paying the storage tax ourselves. Every team we talk to at top-10 pharmas and leading biotechs is paying it too — in GPU idle time, stalled workflows, and over-provisioned infrastructure. Now they don’t have to.”

— Gary Planthaber, CTO, Paradigm4

About Paradigm4

Paradigm4 makes flexFS, an object-native parallel filesystem that lets genomics, HPC, and AI workloads run directly over cloud object storage at full filesystem performance. Customers in life sciences, earth observation, and enterprise AI use flexFS to eliminate I/O bottlenecks without changing their infrastructure or application code. Life sciences customers include multiple top-10 pharmas and leading biotechs.

Start a free trial: www.flexfs.io | lifesciences@paradigm4.com