PREFACE: We like to provide a bit of background on our blog posts, but if you want, you can skip ahead to the CloudFront v. Cloudflare response times.
💡 Also, important to note: For the purposes of this post, we’re only talking about our API, which is entirely non-cacheable.
As a “mature” SaaS that went live in 2007, we’ve been around long enough to have used a variety of services over the past decade and a half. Over the years, we’ve gone from a single manually-configured bare metal server to a nice and tidy, Terraform-managed AWS infrastructure with all the fixin’s. We’ve added additional programming languages, serverless systems, micro-services, and more. We’ve seen SSL certs go from costly luxury items to automatically and freely provisioned.
Oh, and remember IE6? Because we do! (Or at least those of us that have at least a few gray hairs, some of which surely were caused by IE6.)
So much progress, both for us at Foxy, and for the internet at large. Truly amazing stuff.
CloudFront & Illusions of Perfection
But with all that change, it’s sometimes easy to think we’ve arrived at the ultimate solution. In our case, we’ve felt that way about Route53 (AWS’s DNS) and CloudFront (AWS’s CDN). Route53 works well and is nice and easy to automate via Terraform and CloudFormation. CloudFront gets us a fantastically powerful and fast CDN to serve traffic globally. Yes, other CDNs exist, but why bother? Surely nobody could really do that much better than AWS, with its near-unlimited resources, right?
Cracks in the Façade
A few months ago we started working with a company operating in Singapore and Hong Kong that’s using Foxy to expand their online services in the grocery and restaurant industries. Foxy was a perfect fit, but they were finding our API far too slow to meet their needs.
Which was weird, because when we moved to CloudFront a number of years ago, we tested response times from locations around the world, and our testing showed CloudFront absolutely destroying the other options in Asia-Pacific. So we figured CloudFront -> load balancers in us-east-1 was as good as things could get. Surely traffic hitting a local CloudFront PoP (Point of Presence) should have the fastest possible route to servers in AWS us-east-1. Yes?
Apparently not. Our user was indeed seeing really slow API responses from Singapore. Well over a second, at the fastest. Definitely too slow.
Since we’ve recently fallen in love with Terraform, and since Cloudflare could be added to our Terraform setup, we figured testing out Cloudflare was worth a shot. Our previous concerns with mixing cloud providers primarily had to do with difficulties keeping things fully automated, but Terraform is a game changer in that regard.
So an hour or two later, we had Cloudflare added to a dev environment provisioned by Terraform, duplicating our Route53 DNS and sending traffic to our load balancer in us-east-1.
First Impressions: Cloudflare is REALLY Impressive
First we’ll outline how we tested. We used Zoho’s Site24x7 to set up some basic monitors that made GET requests to our API, which has no caching, and responds with JSON. Good enough for our purposes.
What we’re calling “CloudFront” is CloudFront -> ALB (Application Load Balancer) in us-east-1 -> ECS Fargate. What we’re calling “Cloudflare” is Cloudflare -> ALB -> ECS Fargate.
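For anybody who wants to try a rough version of this at home, here is a minimal sketch of the kind of probe a monitor runs: time a full GET, repeat, and average. This is not what Site24x7 does internally, and the function names and example URLs are our own placeholders. It also only measures from wherever you run it, whereas our monitors ran from many locations.

```python
import time
import urllib.request
from statistics import mean


def timed_get_ms(url, opener=urllib.request.urlopen, timeout=10.0):
    """Return wall-clock milliseconds for one full GET (headers plus body)."""
    start = time.perf_counter()
    with opener(url, timeout=timeout) as response:
        response.read()  # include the body transfer in the measurement
    return (time.perf_counter() - start) * 1000.0


def average_response_ms(urls, samples=3, opener=urllib.request.urlopen):
    """Time each URL `samples` times and return {url: mean milliseconds}."""
    return {
        url: mean(timed_get_ms(url, opener=opener) for _ in range(samples))
        for url in urls
    }


# Hypothetical usage (these hostnames are placeholders, not real endpoints):
# results = average_response_ms([
#     "https://api-cloudfront.example.com/status",
#     "https://api-cloudflare.example.com/status",
# ])
# for url, ms in results.items():
#     print(f"{url}: {ms:.0f} ms")
```

Each call opens a fresh connection, so DNS, TCP, and TLS setup are all part of the number, which matches what an infrequent monitor would see.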
For our initial testing, we enabled Cloudflare’s Argo Smart Routing, thinking that’d be a fair fight.
It was not.
So what did our tests show? In short, Cloudflare somehow manages to beat CloudFront (sometimes by silly margins), but only with their Argo Smart Routing enabled. Here are some comparisons for various locations in APAC.
Let’s break that down by location, with average response times (over a 7 day period, tested from all locations every 15 minutes) in milliseconds:
Those results are impressive, to put it mildly. Almost 40% faster, on average, with Singapore showing a 50%+ reduction in response times. We know Cloudflare likes to talk about how great they are, but everybody claims they’re the best and the fastest, so we really didn’t expect this.
That said, this test was with Cloudflare’s Argo Smart Routing enabled. It sounds cool, but it’s an additional cost. Is it really doing anything, or is it just marketing fluff?
Easy answer: Argo is absolutely doing something. We don’t need to annotate this graph to show where it was disabled:
With Argo disabled, Cloudflare was immediately slower than CloudFront, by ~10%-20%. (We’ll get into those numbers later.)
So at this point, it looked like CloudFront was just not good, to put it mildly.
But wait… CloudFront’s Origin Shield to the Rescue?
We were actually all ready to switch one of our domains over to Cloudflare, but we were still in disbelief that CloudFront was so bad, so we reached out to our friendly AWS rep. He got us a meeting with a CloudFront expert on their end, who mentioned a few things:
- Cloudflare’s Argo is more comparable to AWS’s Global Accelerator. (This won’t actually work for us for a few reasons, mostly because of how we do so many different domains + SSL, so this was a non-starter. For our purposes, Cloudflare to CloudFront is a more reasonable comparison.)
- Increasing the CloudFront origin keep-alive might help, especially when there’s enough traffic to keep connections from CloudFront to the origin open. Good to note.
- CloudFront’s Origin Shield might help.
That last one was a surprise. According to AWS’s description of Origin Shield… it doesn’t make sense that it’d improve performance for a non-cacheable API:
Amazon CloudFront announces Origin Shield, a centralized caching layer that helps increase your cache hit ratio to reduce the load on your origin.
For an API that’s never cacheable, why on earth would this help? Because nestled at the bottom of the Origin Shield Developer Guide is this nugget:
Better network performance
When you enable Origin Shield […] you can get better network performance. For origins in an AWS Region, CloudFront network traffic remains on the high throughput CloudFront network all the way to your origin.
Still though… how much could it help, right? Surely if AWS was sitting on some magic functionality that made things 20-50% faster, they wouldn’t bury that lede.
We weren’t expecting much, but turns out it brings CloudFront’s performance about even with Cloudflare with Argo enabled. Because of course it does. And because why not, I guess. It’s the year 2021 and nothing makes sense anymore.
TL;DR: Turn on Argo or Origin Shield!
So, in summary:
- Turn on Origin Shield if you use CloudFront and care about response times, even for dynamic / uncacheable content.
- Turn on Argo if you use Cloudflare and care about response times.
- Talk to your AWS rep if you’re seeing another service just absolutely destroy the comparable AWS service. Maybe there’s a magic switch that you’d never assume actually solves the problem 🙂
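For the CloudFront side, the two knobs mentioned above (Origin Shield and the origin keep-alive) live on each origin in the distribution config. We manage ours through Terraform, so treat the following as a sketch of the equivalent change against the CloudFront API's config structure, not our actual setup. The function name, the 60-second keep-alive, and the sample config are illustrative assumptions.

```python
import copy


def enable_origin_shield(distribution_config, shield_region="us-east-1",
                         keepalive_seconds=60):
    """Return a copy of a CloudFront DistributionConfig dict with Origin Shield
    enabled on every origin and a longer keep-alive on custom origins."""
    config = copy.deepcopy(distribution_config)  # never mutate the caller's dict
    for origin in config["Origins"]["Items"]:
        origin["OriginShield"] = {
            "Enabled": True,
            # Assumption: pick the Origin Shield region closest to your origin.
            "OriginShieldRegion": shield_region,
        }
        custom = origin.get("CustomOriginConfig")
        if custom is not None:
            # Illustrative value; check the limits for your account.
            custom["OriginKeepaliveTimeout"] = keepalive_seconds
    return config


# Hypothetical, heavily trimmed config for illustration only.
sample = {
    "Origins": {
        "Quantity": 1,
        "Items": [
            {
                "Id": "api-alb",
                "DomainName": "alb.example.com",
                "CustomOriginConfig": {"OriginKeepaliveTimeout": 5},
            }
        ],
    }
}
updated = enable_origin_shield(sample)
print(updated["Origins"]["Items"][0]["OriginShield"])
```

With boto3, the dict returned by `get_distribution_config` could be passed through a helper like this and then sent back with `update_distribution`, along with the `ETag` as `IfMatch`.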
Average Response Times Globally
The following table shows average response times in milliseconds (again, over a 7 day window, tested every 15 minutes). Lower numbers are better.
| Location | Cloudflare | CloudFront | Cloudflare + Argo | CloudFront + Origin Shield |
And that data graphed…
Response times from Virginia are a bit weird, as are those from Mexico City, but on average there’s a clear and significant reduction in response times.
Data From Our Production API
The results were so encouraging that we turned on Origin Shield for our production API’s CloudFront distro. (All the other data we’ve shown is for a staging environment.) Using AWS Athena, we were able to average the hourly time_taken values for all traffic, globally. We don’t need to annotate where on the graph Origin Shield was enabled:
The average went from 0.1419s to 0.0932s. That’s a ~34% reduction in request time for our production API, on average, for all requests globally (as seen by CloudFront, so excluding DNS and connection times). All from just flipping on Origin Shield.
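We ran that aggregation in Athena, but the same idea is easy to sketch locally. The snippet below is a stand-in for the query, not the query itself: it averages `time-taken` per hour from CloudFront standard log lines (the raw field uses a hyphen; Athena tables typically expose it as `time_taken`), locating columns from the `#Fields:` header. It then checks the ~34% figure from the two averages above. The sample log lines you would feed it are up to you; nothing here is our real data.

```python
from collections import defaultdict


def hourly_average_time_taken(log_lines):
    """Average the time-taken column per hour from CloudFront standard logs.

    Column positions come from the '#Fields:' header, so nothing is hard-coded.
    """
    fields = None
    totals = defaultdict(lambda: [0.0, 0])  # hour -> [sum of seconds, count]
    for line in log_lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or fields is None:
            continue  # skip blanks, other comments, and rows before the header
        row = dict(zip(fields, line.split("\t")))
        hour = f"{row['date']} {row['time'][:2]}:00"
        bucket = totals[hour]
        bucket[0] += float(row["time-taken"])
        bucket[1] += 1
    return {hour: total / count for hour, (total, count) in sorted(totals.items())}


def percent_reduction(before, after):
    """Percentage drop going from `before` to `after`."""
    return (before - after) / before * 100.0


# The production averages quoted above, in seconds.
print(round(percent_reduction(0.1419, 0.0932), 1))  # 34.3
```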
Is this truly an 🍎 to 🍏 comparison?
Sorta. Sorta not:
- We’re only testing our API here. Cloudflare has a lot of really slick things they can do for more “normal” website traffic (like image optimization, js optimization, etc.). Our assumption is that, for a public-facing marketing site, Cloudflare is probably much better.
- The WAF (Web Application Firewall). Cloudflare’s WAF is almost undeniably “better” by most of the metrics most people care about (like “set it and forget it”). AWS’s WAF is sorta “build what you want”, and we have some neat Lambda-powered automations for our specific use cases. But even for us, is that really what we want?
- Pricing is very different.
  - Cloudflare’s bandwidth is free, but Argo is 10¢/GB. CloudFront’s is ~8¢/GB, but Origin Shield is 0.75¢ per 10k requests (for US origins). Let’s call this about even, assuming you care about the fastest response times.
  - Cloudflare’s WAF is free, but CloudFront’s WAF definitely isn’t. AWS WAF costs can add up, especially if you get hit by bots constantly like we do. (We pay hundreds of dollars a month just for traffic we rate-limit.) Note that if the AWS WAF blocks a request, you don’t pay for Origin Shield, which makes sense. (We’re not sure about Cloudflare’s WAF as it relates to Argo pricing.)
  - Cloudflare Workers and Lambda@Edge pricing scales a bit differently, but these costs tend to be negligible compared to the other costs (in our experience, at least).
- SSL for multiple domains is very different. Though Cloudflare just recently made their “SSL for SaaS” publicly available (instead of enterprise-only), it’s $2/mo/domain. CloudFront + ACM is free. If you’ve just got one domain, great, but if you’re a SaaS that allows custom domains like we do, this pushes Cloudflare from “we could probably save some money!” to “wow… even if we save all of what we’re paying for bandwidth and WAF, it’d still cost us a lot :(“.
- Cloudflare is supposedly faster for Enterprise Plan users. We don’t know if the “network prioritization” would make a significant difference. And though we understand withholding features based on tiers, we prefer AWS’s a la carte pricing. Certainly feels more like we can get the best they offer without arbitrary pricing tiers.
- Testing more frequent requests may have changed things. Because of the way CloudFront and Cloudflare connect to origins, and keep connections open, if we tested multiple requests per minute (or second), we may have seen different values.
- Terraform (mostly) works great with both. If we did want to go with Cloudflare, Terraform makes going multi-cloud just beautifully manageable. So that’s nice 🙂
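To make the pricing points above a bit more concrete, here’s a back-of-the-envelope sketch using only the per-unit prices quoted in this post. It assumes every request is uncacheable and passes through Origin Shield, and it deliberately ignores WAF charges, Cloudflare plan fees, and Workers / Lambda@Edge. The traffic numbers in the example are made up for illustration and are not ours.

```python
def cloudflare_monthly_usd(gb, custom_domains, argo_per_gb=0.10,
                           ssl_per_domain=2.00):
    """Argo traffic at 10 cents/GB plus SSL for SaaS at $2/month per domain."""
    return gb * argo_per_gb + custom_domains * ssl_per_domain


def cloudfront_monthly_usd(gb, requests, bandwidth_per_gb=0.08,
                           shield_per_10k=0.0075):
    """Bandwidth at ~8 cents/GB plus Origin Shield at 0.75 cents per 10k requests."""
    return gb * bandwidth_per_gb + (requests / 10_000) * shield_per_10k


# Hypothetical month: 1 TB out, 100M requests, 500 customer domains.
gb, requests, domains = 1_000, 100_000_000, 500
print(f"Cloudflare + Argo:          ${cloudflare_monthly_usd(gb, domains):,.2f}")
print(f"CloudFront + Origin Shield: ${cloudfront_monthly_usd(gb, requests):,.2f}")
```

With zero custom domains the two land close together in this example, which is the “about even” point; the per-domain SSL fee is what tips it for a SaaS with lots of custom domains.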
A Final Note
If you’ve made it this far, we hope this was helpful. If you need a platform for custom ecommerce, let us know. We’re here to help 🙂
We also hope AWS provides data like this in the future. Would have been pretty great if AWS promoted Origin Shield as “Hey, turn this on and see a ~35%+ reduction in response times!” Would have saved us a fair number of hours. 🙂