Cloud Rewind at Cloud Field Day 23: Resilience as Code, Without the Ritual

At Cloud Field Day 23, I sat in on Commvault’s presentation of their Cloud Rewind solution—formerly Appranix—and I’ll admit, I came in skeptical. I’ve been doing ops and architecture work for nearly three decades, and I’ve seen a lot of “reinvented” DR solutions that promise to reduce downtime, complexity, and cost. But most of them just swap one kind of management overhead for another.

Cloud Rewind felt different. Not just because of the marketing (though there was plenty of that), but because the architecture—and the intent behind it—actually addressed some very real problems I’ve experienced firsthand.

Rethinking DR for the Cloud-Native World

The central problem Commvault tackled here is that most cloud environments today are too dynamic, too distributed, and frankly, too chaotic for traditional DR approaches to keep up. The Commvault team walked through a production AWS deployment spanning multiple availability zones, with a massive sprawl of RDS instances, load balancers, and ephemeral services—exactly the kind of complexity that keeps cloud architects up at night.

In environments like that, idle recovery setups—the standard disaster recovery standby—aren’t just wasteful. They drift. They rot. And in a ransomware event, they’re often just as compromised as production. The Commvault team flat-out said it: “Idle recovery environments drift away from production.” That alone should make any ops lead pause.

Cloud Rewind’s answer is to automate recovery from the inside out. Think of it as recovery-as-code, with snapshots of not just data, but also configurations, infrastructure dependencies, and policies. It’s not Terraform (though they integrate with it), and it’s not just backup—it’s a “cloud time machine” that builds you a clean room environment from known-good states and then lets you decide how, when, and where to cut back over to production.

They call this approach “Recovery Escort”—an odd name, honestly, but a great idea. Instead of juggling team-specific runbooks in the middle of a crisis, Cloud Rewind creates a single, orchestrated, infrastructure-as-code-based recovery plan. One workflow. One click. Done. And it’s not a copy-paste of yesterday’s environment—it uses continuous discovery to track configuration and application drift so you’re always recovering to something real. That’s what impressed me most: they’re not assuming your documentation is up to date. They know it isn’t, and they’re building around that.
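To make the continuous-discovery idea concrete, here is a minimal sketch of the pattern in plain boto3: inventory configuration and dependencies alongside the data, then diff the latest pass against a last-known-good baseline to catch drift. This is my own illustration of the concept, not Cloud Rewind's implementation or API, and the baseline file is hypothetical.

```python
import json
import boto3

ec2 = boto3.client("ec2")
rds = boto3.client("rds")

def capture_recovery_point() -> dict:
    """Record configuration and dependencies, not just data."""
    reservations = ec2.describe_instances()["Reservations"]
    databases = rds.describe_db_instances()["DBInstances"]
    return {
        "instances": {
            i["InstanceId"]: {
                "type": i["InstanceType"],
                "subnet": i.get("SubnetId"),
                "security_groups": sorted(g["GroupId"] for g in i["SecurityGroups"]),
            }
            for r in reservations
            for i in r["Instances"]
        },
        "databases": {
            d["DBInstanceIdentifier"]: {
                "engine": d["Engine"],
                "class": d["DBInstanceClass"],
                "multi_az": d["MultiAZ"],
            }
            for d in databases
        },
    }

def detect_drift(baseline: dict, current: dict) -> list[str]:
    """Compare the latest discovery pass against the last known-good state."""
    drift = []
    for section in ("instances", "databases"):
        for name, config in current[section].items():
            if baseline[section].get(name) != config:
                drift.append(f"{section[:-1]} {name} drifted from baseline")
        for name in baseline[section].keys() - current[section].keys():
            drift.append(f"{section[:-1]} {name} missing from current environment")
    return drift

if __name__ == "__main__":
    baseline = json.load(open("last_known_good.json"))  # hypothetical baseline file
    print("\n".join(detect_drift(baseline, capture_recovery_point())) or "no drift")
```

A real platform tracks far more than instance types and security groups, but the shape of the problem is the same: recovery is only trustworthy if discovery never stops.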

[Image: Cloud Rewind at CFD23]

Security and Simplicity in Tandem

One feature that stood out—especially with ransomware scenarios top of mind—was their support for what they call Cleanroom Recovery. You can spin up an isolated clone of your environment, run scans, validate app behavior, and confirm you’re not just recovering the malware along with the data. That level of forensic flexibility isn’t just a nice-to-have; it’s a practical necessity. Because the minute you cut back over, you want confidence that what you’ve recovered is actually usable—and uncompromised.
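To show what "isolated clone" means in practice, here is a rough sketch of the ingredients, using generic AWS calls rather than anything Commvault showed: a network with no internet gateway and no peering back to production, plus a volume restored from a known-good snapshot for a scanner to pick over.

```python
import boto3

ec2 = boto3.client("ec2")

def build_cleanroom(snapshot_id: str, az: str = "us-east-1a") -> dict:
    """Stand up an isolated clone for validation: no internet gateway,
    no peering back to production, data restored from a known-good snapshot."""
    vpc = ec2.create_vpc(CidrBlock="10.99.0.0/16")["Vpc"]  # isolated network
    subnet = ec2.create_subnet(
        VpcId=vpc["VpcId"], CidrBlock="10.99.1.0/24", AvailabilityZone=az
    )["Subnet"]
    # Deliberately no create_internet_gateway / attach_internet_gateway and no
    # peering connection: nothing in the cleanroom can reach out or phone home.
    volume = ec2.create_volume(SnapshotId=snapshot_id, AvailabilityZone=az)
    # Next step (omitted): run_instances into the subnet, attach the volume,
    # and let your scanners and application checks loose on the clone.
    return {"vpc": vpc["VpcId"], "subnet": subnet["SubnetId"], "volume": volume["VolumeId"]}
```

Only once the clone comes back clean do you promote it, which is exactly the decision point the presentation kept circling back to.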

And the broader idea here is that DR shouldn't be an awkward ritual. Most tooling assumes recovery is rare, complex, and terrifying—something you test once a year (maybe) and dread every time. But Cloud Rewind flips that: what if recovery were fast enough to test weekly? What if it were just part of your CI/CD pipeline? One customer story shared during the session claimed that recovery tests which used to take three days and dozens of people now complete in 32 minutes. If true, that's awesome. That's the kind of muscle memory every cloud org needs—and the only way to get there is through automation.
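Making drills routine doesn't require anything exotic, either. As a sketch of the cadence idea (mine, built on generic AWS services rather than Cloud Rewind's API, with a placeholder function ARN), an EventBridge rule can fire a weekly drill that kicks off whatever restore-and-validate workflow you already have:

```python
import boto3

events = boto3.client("events")
awslambda = boto3.client("lambda")

# Placeholder ARN for whatever function drives your restore-and-validate workflow.
DRILL_LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:weekly-dr-drill"

def schedule_weekly_drill() -> None:
    """Fire a recovery drill every week instead of once a year."""
    rule_arn = events.put_rule(
        Name="weekly-dr-drill",
        ScheduleExpression="rate(7 days)",
        State="ENABLED",
    )["RuleArn"]

    # Allow EventBridge to invoke the drill function.
    awslambda.add_permission(
        FunctionName=DRILL_LAMBDA_ARN,
        StatementId="allow-eventbridge-weekly-dr-drill",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )

    events.put_targets(
        Rule="weekly-dr-drill",
        Targets=[{"Id": "dr-drill-lambda", "Arn": DRILL_LAMBDA_ARN}],
    )
```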

Final Thoughts

I’ve spent much of my career trying to protect environments that I could barely map out on a whiteboard. Cloud Rewind feels like a tool built by people who’ve lived that pain. Is it perfect? No. Does it still feel like a premium play? Sure. But if you care about recovery time, reproducibility, or even just reducing the number of sleepless nights when your phone buzzes at 2am, this is worth a serious look.

There’s a lot more under the hood than I’ve captured here—cross-region replication, policy-based orchestration, integration with AWS and Azure backup tools—but the big takeaway is this: Cloud Rewind shifts DR from a fire drill to a workflow. And that’s exactly the kind of evolution cloud resilience needs.

My one regret about this session? The delegates were so engaged, digging into details, that we ran out the clock before the live demo could be run. Tim Zonca from Commvault did offer to arrange a demo for those interested at another time. I might just take him up on that.

To watch the video of the #CFD23 presentation by Commvault on Cloud Rewind, go to the Tech Field Day YouTube channel.

Clumio at Cloud Field Day 23: Backups That Belong in the Cloud – But Not Too Close

I’ve long believed that backup isn’t just a checkbox—it’s a strategy. So when Clumio took the virtual stage at Cloud Field Day 23, I was curious to see how a backup-as-a-service vendor could stand out in a space where “cloud-native” is often more branding than architecture. What I saw was a team leaning hard into the AWS ecosystem—not just running on AWS, but with it, using native services like EventBridge and Lambda to build a scalable, serverless platform designed for automation.

Clumio, acquired by Commvault in 2024, now serves as the cloud-native backbone of Commvault’s broader data protection strategy. It’s their answer for teams that live inside AWS and don’t want to manage backup infrastructure—but still need performance, isolation, and scale.

Clumio isn’t trying to reinvent backup. It’s not asking you to forklift your data to some third-party silo or wrap agents around every workload you touch. Instead, it’s offering a clean and efficient way to protect AWS-native data, from S3 and EBS to DynamoDB and RDS, without managing infrastructure, hardware, or even backup windows.

What makes Clumio interesting isn’t just that it backs up your cloud—it’s how it separates your backups from your primary environment without leaving the cloud. That “logical air gap” model, paired with strong performance claims and aggressive pricing, adds up to a platform that doesn’t try to be everything—but does a few critical things very well.

Air Gaps, Automation, and Architecture That Makes Sense

Clumio’s “logical air gap” is a key part of its security and resilience story. Your backups live in a completely separate AWS account, managed by Clumio, dedicated to your organization, and inaccessible without explicit authorization. It’s not a physical gap—this isn’t the old-school offline tape model—but it is real isolation. A compromised IAM role in your production account doesn’t give attackers access to your backup data.

What’s especially compelling is how Clumio achieves this without requiring agents, proxies, or any backup infrastructure in your account. After an initial CloudFormation or Terraform deployment, everything else runs on Clumio’s side—fully serverless, driven by EventBridge, and orchestrated with Lambda. It’s a clean, automated pipeline that’s responsive to changes and designed to scale without babysitting.
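For anyone who hasn't built on that pattern, the event-driven shape looks roughly like the sketch below. This is a generic EventBridge-to-Lambda illustration, not Clumio's code; the event fields and the snapshot step are assumptions I made for the example.

```python
import boto3

ec2 = boto3.client("ec2")

def handler(event, context):
    """Generic EventBridge-triggered protection step: snapshot whatever
    volume the incoming event points at, tagging it for lifecycle tracking.

    Assumes a custom event with a 'volume-id' field in `detail`; a real
    platform would be driven by its own change-detection events."""
    volume_id = event["detail"]["volume-id"]
    snapshot = ec2.create_snapshot(
        VolumeId=volume_id,
        Description=f"event-driven backup of {volume_id}",
        TagSpecifications=[{
            "ResourceType": "snapshot",
            "Tags": [{"Key": "triggered-by", "Value": "eventbridge"}],
        }],
    )
    return {"snapshot_id": snapshot["SnapshotId"]}
```

The appeal of the pattern is exactly what Clumio pitched: nothing to patch, nothing to scale, and nothing sitting idle in your account between backups.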

This isn’t just about elegance. It’s about reducing operational overhead for teams already stretched thin. And Clumio backs that up with a support model that resolves 95–98% of tickets proactively, before most users even know there’s a problem. The platform is engineered for invisibility in the best way—hands-off, API-driven, and infrastructure-free.

[Image: CFD23, Clumio vs. native AWS tooling]

Granularity, Performance, and Real-World Outcomes

Clumio’s flexibility comes into sharper focus when you look at how it handles S3 and DynamoDB. For S3, protection groups allow you to define exactly what gets backed up—by tag, prefix, region, or any combination thereof. Instead of duplicating entire multi-purpose buckets, you can target the slice that matters, which helps with both cost and compliance.
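The selection logic behind a protection group is easy to picture if you've ever scoped a bucket yourself. A rough sketch of the idea follows; it's mine, not Clumio's, and far less efficient than a real implementation that tracks tags centrally.

```python
import boto3

s3 = boto3.client("s3")

def select_objects(bucket: str, prefix: str, required_tag: tuple[str, str]):
    """Yield only the slice of a bucket matching a prefix and tag, instead of
    treating a whole multi-purpose bucket as one backup unit.

    Note: calling get_object_tagging per object is slow and costs a request
    per key; it is here purely to show the selection criteria."""
    tag_key, tag_value = required_tag
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            tags = s3.get_object_tagging(Bucket=bucket, Key=obj["Key"])["TagSet"]
            if any(t["Key"] == tag_key and t["Value"] == tag_value for t in tags):
                yield obj["Key"]

# Example: protect only the regulated slice of a shared bucket.
# for key in select_objects("shared-data", "medical-imaging/", ("classification", "phi")):
#     print(key)
```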

DynamoDB protection is table-level, and leverages streams for incremental capture. That’s a big win over full-table snapshots—especially for large, high-throughput environments—because it means lower cost, shorter RPOs, and faster restores.
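If DynamoDB Streams are new to you, the incremental-capture mechanism boils down to reading change records from the table's stream instead of re-exporting the whole table. Here is a bare-bones sketch of that mechanism, not Clumio's pipeline, and it assumes streams are already enabled on the table.

```python
import boto3

dynamodb = boto3.client("dynamodb")
streams = boto3.client("dynamodbstreams")

def read_changes(table_name: str):
    """Walk the table's stream shards and yield individual change records,
    the raw material for incremental backup instead of full-table snapshots.

    Assumes DynamoDB Streams is enabled on the table."""
    stream_arn = dynamodb.describe_table(TableName=table_name)["Table"]["LatestStreamArn"]
    shards = streams.describe_stream(StreamArn=stream_arn)["StreamDescription"]["Shards"]
    for shard in shards:
        iterator = streams.get_shard_iterator(
            StreamArn=stream_arn,
            ShardId=shard["ShardId"],
            ShardIteratorType="TRIM_HORIZON",  # start from the oldest retained change
        )["ShardIterator"]
        while iterator:
            response = streams.get_records(ShardIterator=iterator, Limit=100)
            for record in response["Records"]:
                yield record  # INSERT / MODIFY / REMOVE, with old and new images
            if not response["Records"]:
                break  # caught up on this shard
            iterator = response.get("NextShardIterator")
```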

And the numbers? They’re hard to ignore:

  • 100M object restore in 9 hours, compared to 3 days with AWS-native tools.
  • 50+ billion object support per bucket, well above AWS’s 7.5B limit.
  • Sub-hour RPOs thanks to continuous change tracking.

Case studies bring those claims into focus. Atlassian dropped their RTO from 248 days to under 2, and cut costs by 70%. Duolingo saved over $1M annually on DynamoDB backup while increasing retention from 7 to 30 daily backups. These aren't marginal gains. They're systemic improvements—both in speed and economics.

[Image: CFD23 Clumio Duolingo case study]

Pricing and the Big Picture

Clumio’s pricing is refreshingly straightforward: $0.025 per GB per month (front-end backup size), plus $1.50 per million objects managed. Restore and cross-region transfer costs apply only during recovery. Air-gapped storage comes standard—no extra fees, no extra knobs to turn.
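The math is easy to sanity-check. Using those published rates, a hypothetical 50 TB, 200-million-object estate pencils out like this (my arithmetic, not a quote):

```python
def monthly_cost(backup_tb: float, objects_millions: float) -> float:
    """Clumio's published rates: $0.025 per GB per month of front-end
    backup size, plus $1.50 per million objects managed."""
    storage = backup_tb * 1024 * 0.025   # TB -> GB, then $ per GB
    objects = objects_millions * 1.50
    return storage + objects

# Hypothetical estate: 50 TB of front-end data, 200 million objects.
print(monthly_cost(50, 200))   # 1280.0 + 300.0 = 1580.0 USD/month
```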

The takeaway? Clumio isn’t trying to do everything. It’s built for AWS customers who want native integration, automation without the overhead, and real control over their backup data. You won’t find support for Azure or GCP (yet), and it doesn’t pretend to be a legacy replacement for every use case. What it does do is give you a scalable, secure, efficient way to back up and restore critical AWS services without dragging along the complexity of traditional platforms.

That said, no matter how invisible or “set and forget” the platform seems, you still need to care about your backup strategy. Automation is only as good as your observability and validation practices. But if you’re all-in on AWS and still leaning on snapshots and lifecycle policies, Clumio is worth a serious look.

To watch the video of the #CFD23 presentation by Commvault on Clumio, go to the Tech Field Day YouTube channel.

Qumulo at Cloud Field Day 23: DR Without the Drama

I’ve been in this industry long enough to have seen disaster recovery run the full spectrum—from fire drills that barely worked to full-on game changers. And at Cloud Field Day 23, Qumulo laid out a vision for business continuity that might just live up to the hype. Not because it’s flashy, but because it’s practical. DR that works without being the most expensive line item in your budget? That got my attention.

Let’s break it down.

No More Stretching the Truth About Stretch Clusters

Most of us have been sold the dream of active-active systems with full failover and zero downtime. And then reality sets in. You either pay double to stand up a hot standby that sits idle most of the year—or you roll the dice and hope your last backup isn’t too stale when it counts.

Qumulo’s take? Don’t replicate everything twice. Instead, park your data in cold cloud storage like Glacier Instant Retrieval or Azure Blob Cold, where the cost per terabyte actually makes CFOs smile. Then, scale up compute on demand only when you need it. We’re talking cold to hot in under five minutes—no data rehydration, no DNS voodoo, no long restore windows.

[Image: Qumulo Instant Hot DR]

And here’s where it gets really interesting: Qumulo isn’t just minimizing costs—they’re actively engineering out inefficiencies. One customer faced $800,000 in projected API charges moving medical images into a vendor-neutral archive. Qumulo helped them reduce that to just $180. The magic wasn’t in waving away the cost—it came from bin-packing files to minimize object write operations and optimizing how data interacted with the storage backend.
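Bin-packing is a classic technique for exactly this problem: instead of paying for one PUT per small file, you pack many files into a few large objects before writing. Here is a first-fit-decreasing sketch of the general idea, purely as an illustration of the technique rather than Qumulo's actual implementation.

```python
def pack_files(file_sizes: list[int], object_limit: int) -> list[list[int]]:
    """First-fit decreasing: pack small files into as few cloud objects as
    possible so each object write covers many files instead of one.
    (Assumes no single file is larger than object_limit.)"""
    objects: list[list[int]] = []   # each inner list is one packed object
    remaining: list[int] = []       # free space left in each packed object
    for size in sorted(file_sizes, reverse=True):
        for i, free in enumerate(remaining):
            if size <= free:
                objects[i].append(size)
                remaining[i] -= size
                break
        else:
            objects.append([size])                  # start a new object
            remaining.append(object_limit - size)
    return objects

# 10,000 files averaging ~1 MiB packed into 64 MiB objects: roughly
# 10,000 PUT requests collapse into a few hundred.
```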

On top of that, Qumulo's neural cache plays a significant role in controlling read-heavy workloads. By maintaining 92–98% read cache hit rates and adapting caching strategies based on usage patterns, file types, and directory behavior, they slash repeat API calls that would otherwise nickel-and-dime you into oblivion. Their global fleet averages less than 1% of monthly cost from API charges, compared with the 15–20% that's typical for cloud object storage.
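The cache-hit arithmetic is worth spelling out, because it explains how a high hit rate turns into a small API line item: at 95%, only one read in twenty ever becomes a billable GET. A quick illustration with a placeholder per-request price, not any provider's actual rate:

```python
def monthly_get_cost(reads_per_month: float, hit_rate: float,
                     price_per_1k_requests: float = 0.0004) -> float:
    """Only cache misses reach object storage and incur a GET charge.
    The per-request price here is a placeholder, not a quoted rate."""
    misses = reads_per_month * (1 - hit_rate)
    return misses / 1000 * price_per_1k_requests

# 10 billion reads per month: uncached vs. a 95% cache hit rate.
print(monthly_get_cost(10e9, 0.0))    # 4000.0 USD of GET charges
print(monthly_get_cost(10e9, 0.95))   #  200.0 USD of GET charges
```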

Now, are those numbers a best-case scenario? Almost certainly. And I’m always skeptical of dramatic cost-saving claims until I can get hands-on and validate them in a real environment. But what’s undeniable is that Qumulo is hyper-optimizing their platform not just for performance, but to respect the economics of running file workloads in the cloud. That’s more than most vendors even try to do.

Real-Time DR You Can Actually Test (And Should)

One line from the presentation stuck with me: “We’ve never tested our DR because if I take down production to do it, that’s a resume-generating event.” If you’ve worked in ops, you’ve heard this. Or maybe you’ve said it.

Qumulo’s approach flips the model. Their system continuously replicates data block-by-block between your on-prem caching layer and a cloud-native backend. The cache stays local for performance, but the cloud holds the authoritative copy. That means you can spin down your on-prem environment, move employees to another site—or just hand them a fully capable Asus NUC—and keep working like nothing happened.

During the demo, they showed a failover that was basically “stop using this SMB share, start using this one.” Same data, same structure, same IP. Even the mount point didn’t change. No rehydration, no scrambling to resync state. It just worked.

Data Fabric That’s More Than a Buzzword

Every vendor talks about their data fabric. Qumulo actually showed one.

They connected an on-prem environment to a cloud-native Qumulo instance using a “data portal”—think block-level streaming replication, not dumb file copies. Clients didn’t have to know the cloud existed. Then they moved data, edited a file, failed over, failed back, and showed full consistency end to end. And if you’re thinking, “What about edge or third-party access?”—yep, that’s built in too. They demoed extending read-write access to an external partner with full revocation and audit.

Even better, this wasn’t just DR—it was a multi-cloud-ready setup with file, object, and cloud compute playing nice together. AWS, Azure, GCP—pick your flavor. The system doesn’t care.

Where the Data Lives (And Who Uses It)

In the CEO’s opening segment, we got a better sense of just who’s using Qumulo. Turns out, it’s everyone from entertainment studios rendering animated characters your kids know by name, to healthcare researchers storing medical imaging and genomic datasets. The claim that “we’re storing the cure for cancer on a Qumulo system” might sound dramatic—but they meant it literally, with NIH and NSF grant recipients relying on their storage to keep research data accessible and verifiable.

Want to know how real-time image processing supports public safety? Qumulo powers storage for real-time crime centers in major U.S. cities. They even shared a story about a high-profile presidential visit requiring instantaneous video ingest, analysis, and secure access for multiple agencies, including those that couldn’t legally use facial recognition software—while others could. Same data, different access paths, strict consistency guaranteed. 

They’re also serving defense agencies working with UAS/UAV video data and edge AI, and municipal governments managing CAD/GIS datasets for public works. One customer runs a single microscope generating 750 terabytes per week, streaming it to an AI cluster in Texas for medical research. That’s the scale we’re talking about.

Final Take

This wasn’t a pitch about replacing your on-prem hardware with another box. It was a strategy shift: DR that’s built into your primary system, not bolted on. Qumulo’s demo didn’t just show high availability—it showed recoverability that feels like high availability. And it did it with less infrastructure, less manual effort, and fewer “please don’t crash” prayers.

It’s business continuity without the drama. And frankly, it’s pretty cool.

[Image: Qumulo presenters at CFD23]

Here are links to the videos of the Qumulo presentation at Cloud Field Day 23 on the Tech Field Day YouTube channel:

Reimagining Data Management in a Hybrid-Cloud World with Qumulo

Seamless Business Continuity and Disaster Avoidance with Qumulo

Seamless Business Continuity and Disaster Avoidance: Multi-Cloud Demonstration Workflow with Qumulo

Cloud ERP on Your Terms: SAP, HPE GreenLake, and the Private Cloud Middle Ground

I participated this week as a delegate at Cloud Field Day 23, and one of the most candid sessions so far came from HPE GreenLake and SAP. The focus? SAP Cloud ERP – formerly known as RISE – and their joint approach to helping legacy SAP ERP customers make the leap to their private cloud platform.

An early slide (highlights mine) hit me with a stat that landed harder than I expected: as of the end of 2024, only 39% of legacy SAP ERP customers had actually purchased S/4HANA licenses. That's not migration complete—that's just licenses purchased. And that's for a product that reaches end of support in 2027. For a platform as mission-critical and sprawling as SAP ERP, it's not hard to see why inertia reigns.

SAP and HPE's proposed answer for hesitant customers? A hybrid approach called Customer Data Center (CDC) private cloud ERP. Think of it as SaaS, but running in your data center, on HPE hardware, maintained by both SAP and HPE. Customers get cloud operations and SAP support continuity, while keeping their workloads and their data close to home. It's designed to help customers avoid falling off the end-of-support cliff while buying time to transition on their terms.

The session also included a customer perspective from Energy Transfer, a US firm with 130,000 miles of pipeline in 44 states and one of the early adopters of this CDC model. They were refreshingly transparent. Yes, there were “sticks and carrots” involved in the decision, but the biggest carrot for them was the promise of access to Joule – SAP’s agentic AI platform. Joule is only available in SAP’s public SaaS offering or this private CDC model, making it a compelling draw. Energy Transfer’s non-negotiable condition? The transition had to be cost-neutral.

SAP also described how they structure their engagement model to support projects of this magnitude. Given how many ERP projects fail or flounder due to continuity issues, I asked a question during the session about team depth. Specifically, how do they manage institutional knowledge when key personnel inevitably move on? SAP's response was pragmatic: their named project teams are regional, and roles are built with intentional overlap. Each team member is flanked by colleagues one level above and one level below who are kept in the loop, smoothing the transition if (when) someone leaves. As someone who has had to step into that gap when colleagues took other opportunities, and who now manages a team, that struck me as both smart and necessary.

HPE and SAP didn’t shy away from the business reality underpinning all of this. The perpetual license model is dying, and subscription-based models are now the norm. While some customers still pine for the days of CapEx and perpetuals, HPE and SAP are incentivizing the move to recurring revenue models in a way that’s clearly designed to align better with how modern IT is financed and measured.

Bottom line? Public Cloud ERP isn’t one-size-fits-all, and by SAP’s admission isn’t ready for many of their complex and customized customer environments. This hybrid CDC approach acknowledges that reality. Not every enterprise is ready to go all-in on SaaS, and some may never be. SAP and HPE GreenLake seem to understand that, and the CDC model looks like a pragmatic (and carrot-laced) middle path.