Background and goals
A product team that runs a news and resources site needed to cut origin load and delivery energy without harming content freshness. The team wanted a clear rollout plan that engineers could test and revert, simple metrics to prove impact, and configuration rules that stayed robust across localization and A/B tests.
Constraints and risk profile
Correctness rules
Some pages must be fresh on every request. Some pages tolerate seconds or minutes of delay. Editorial pages update rarely but sometimes get rapid corrections. Commerce-related endpoints require strict cache control because their responses are personalized. Any CDN change had to avoid showing stale or incorrect content to authenticated users and to the search engine crawlers the site depends on.
Operational limits
The origin has burst limits on compute and bandwidth that translate directly into cost and resilience risk. The team had a single origin region and wanted to reduce cross-region egress while keeping recovery (RTO) and rollback simple.
Decision framework
Primary objective
Reduce origin requests and total bytes pulled from origin in ways that are measurable and reversible, while keeping user-visible freshness within defined tolerance windows.
How trade-offs were evaluated
Three levers were considered. Time-to-live (TTL) values control how long an object is served from cache. Cache key composition controls how many distinct variants the CDN stores and whether requests hit cache. Origin shielding routes origin fetches through a single intermediate POP to consolidate cache fills and reduce duplicate origin connections. Each lever affects correctness, cache hit ratio, and origin work in different ways.
Plan and rollout
Step 1. Inventory and classification
The team mapped pages and API endpoints to five freshness classes. Each class had a tolerance rule and an owner responsible for approving changes.
- Immutable assets that never change after deploy
- Cacheable content that can be delayed for minutes to hours
- Soft real-time pages where a small delay is acceptable
- Personalized items served to signed-in users
- Non cacheable endpoints such as carts and checkout
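A minimal Python sketch of the classification step, assuming requests can be classified by URL path prefix. The prefixes, the `classify` function, and the rule that signed-in users are treated as personalized for cacheable and soft real-time pages are hypothetical illustrations, not the team's actual inventory. Unknown paths fail safe to the non-cacheable class until an owner classifies them.

```python
from enum import Enum


class FreshnessClass(Enum):
    IMMUTABLE = "immutable"
    CACHEABLE = "cacheable"
    SOFT_REAL_TIME = "soft_real_time"
    PERSONALIZED = "personalized"
    NON_CACHEABLE = "non_cacheable"


# Hypothetical prefix rules; the first matching rule wins.
PREFIX_RULES = [
    ("/checkout", FreshnessClass.NON_CACHEABLE),
    ("/cart", FreshnessClass.NON_CACHEABLE),
    ("/account", FreshnessClass.PERSONALIZED),
    ("/static", FreshnessClass.IMMUTABLE),
    ("/live", FreshnessClass.SOFT_REAL_TIME),
    ("/articles", FreshnessClass.CACHEABLE),
]

_SHARED = (FreshnessClass.CACHEABLE, FreshnessClass.SOFT_REAL_TIME)


def classify(path: str, signed_in: bool) -> FreshnessClass:
    """Map a request path to a freshness class, failing safe to NON_CACHEABLE."""
    for prefix, cls in PREFIX_RULES:
        # Match whole path segments so "/cartography" does not match "/cart".
        if path == prefix or path.startswith(prefix + "/"):
            # Assumption: signed-in users never receive shared copies of
            # cacheable or soft real-time pages.
            if signed_in and cls in _SHARED:
                return FreshnessClass.PERSONALIZED
            return cls
    # Unclassified paths are not cached until an owner classifies them.
    return FreshnessClass.NON_CACHEABLE
```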
Step 2. Baseline measurement
Before changing CDN rules, the team captured a baseline over a representative week. They measured origin requests per minute, bytes served from origin, cache hit ratio by content type and by POP, and latency percentiles for cache-miss paths. They also recorded error and rollback windows for previous deploys to estimate safe change sizes.
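The baseline signals can be computed from exported CDN logs. This sketch assumes a hypothetical `LogRecord` shape with a `HIT`/`MISS` cache status and per-request origin bytes; real log fields and status values differ by provider. Latency percentiles are omitted for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LogRecord:
    # Hypothetical fields; map them to your CDN's log export format.
    pop: str
    content_type: str
    cache_status: str  # assumed to be "HIT" or "MISS"
    origin_bytes: int  # bytes pulled from origin for this request (0 on a hit)


def hit_ratio_by(records, attr):
    """Cache hit ratio grouped by a record attribute such as "pop" or "content_type"."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        group = getattr(r, attr)
        totals[group] += 1
        if r.cache_status == "HIT":
            hits[group] += 1
    return {g: hits[g] / totals[g] for g in totals}


def origin_load(records, window_minutes):
    """Origin requests per minute and total origin bytes over the window."""
    misses = [r for r in records if r.cache_status != "HIT"]
    return {
        "origin_requests_per_minute": len(misses) / window_minutes,
        "origin_bytes": sum(r.origin_bytes for r in misses),
    }
```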
Step 3. Conservative TTL defaults
The team set conservative TTL defaults by class, meaning no default exceeded the class's freshness tolerance. Immutable assets received very long TTLs. Cacheable content used medium TTLs. Soft real-time pages used shorter TTLs. The guiding rule was to default to the longest TTL that stays within the class tolerance and to use shorter values only where owners explicitly requested them.
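The guiding rule can be written as a small function. Values are in seconds; the `CLASS_TOLERANCE_S` numbers are placeholders rather than the team's actual tolerances, and `choose_ttl` is a hypothetical helper.

```python
# Placeholder tolerances in seconds; real values come from each class owner.
# Personalized and non-cacheable classes have no shared-cache TTL, so they are absent.
CLASS_TOLERANCE_S = {
    "immutable": 365 * 24 * 3600,
    "cacheable": 3600,
    "soft_real_time": 30,
}


def choose_ttl(tolerance_s, owner_requested_s=None):
    """Default to the longest TTL within tolerance; owners may only shorten it."""
    if tolerance_s < 0:
        raise ValueError("tolerance must be non-negative")
    if owner_requested_s is None:
        return tolerance_s
    # Clamp: a request longer than the tolerance is capped, negatives become 0.
    return max(0, min(owner_requested_s, tolerance_s))
```

Clamping rather than rejecting a longer owner request keeps the tolerance as a hard ceiling, which matches the conservative intent of the defaults.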
Step 4. Simplify cache keys
Cache keys were audited to remove unnecessary variance. Common mistakes included User-Agent fragments, tracking parameters, and ephemeral tokens in the cache key. The team adopted a whitelist approach: only the path, explicitly accepted query parameters, and specific headers such as Accept (for image format negotiation) were included.
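A minimal sketch of whitelist-based key construction using only the standard library. The parameter whitelist, the `cache_key` and `image_format_bucket` helpers, and the assumption that the origin serves AVIF, WebP, or JPEG are all hypothetical. Collapsing the raw Accept header into a few format buckets matters because browsers send many distinct Accept strings for the same capability.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical whitelist: only parameters that change the rendered output.
ALLOWED_QUERY_PARAMS = {"page", "lang"}


def image_format_bucket(accept: str) -> str:
    """Collapse a raw Accept header into the formats the origin is assumed to serve.

    Simplification: q-values such as "image/webp;q=0" are ignored.
    """
    accept = accept.lower()
    if "image/avif" in accept:
        return "avif"
    if "image/webp" in accept:
        return "webp"
    return "jpeg"


def cache_key(url: str, headers: dict, vary_on_image_format: bool = False) -> str:
    """Build a key from the path, whitelisted query parameters and, for images, format.

    Tracking parameters, ephemeral tokens and User-Agent never reach the key.
    """
    parts = urlsplit(url)
    params = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k in ALLOWED_QUERY_PARAMS
    )
    key = parts.path
    if params:
        key += "?" + urlencode(params)  # sorted, so parameter order does not matter
    if vary_on_image_format:
        accept = next((v for k, v in headers.items() if k.lower() == "accept"), "")
        key += "|fmt=" + image_format_bucket(accept)
    return key
```

For example, `/news?page=2&utm_source=mail&token=abc` and `/news?token=zzz&page=2` both map to the key `/news?page=2`, so they share one cached object.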
Step 5. Apply origin shielding selectively
Origin shielding was enabled for cacheable content classes where multiple POPs would otherwise cause repeated origin fetches for the same object. Shielding was not enabled for personalized endpoints or non-cacheable items. The shield POP was chosen close to the origin region to minimize additional network hops and to consolidate TLS sessions to the origin.
Step 6. Staged rollout and validation
Changes were rolled out by content class. Each stage lasted long enough to observe cache fill patterns across geographically distributed POPs. Validation compared pre-change and post-change KPIs and also included manual freshness checks on a sample of pages. If a stage failed any safety check, the configuration was reverted immediately.
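A stage gate can be expressed as a comparison of before and after KPIs. This is a sketch under assumptions: the `Kpis` fields, the `stage_failures` helper, and every threshold value are hypothetical and would come from the documented plan for each stage. An empty result means the stage passes; anything else triggers the revert.

```python
from dataclasses import dataclass


@dataclass
class Kpis:
    hit_ratio: float            # 0.0 to 1.0
    origin_rps: float           # origin requests per second
    error_rate: float           # fraction of 5xx responses
    stale_sample_failures: int  # manual freshness spot checks that failed


# Hypothetical thresholds; set real values per stage before rollout.
MAX_HIT_RATIO_DROP = 0.02
MAX_ORIGIN_RPS_INCREASE = 0.10  # relative increase
MAX_ERROR_RATE_INCREASE = 0.001


def stage_failures(before: Kpis, after: Kpis) -> list:
    """Return the safety checks a stage failed; an empty list means it passed."""
    failures = []
    if after.hit_ratio < before.hit_ratio - MAX_HIT_RATIO_DROP:
        failures.append("hit ratio dropped")
    if after.origin_rps > before.origin_rps * (1 + MAX_ORIGIN_RPS_INCREASE):
        failures.append("origin load increased")
    if after.error_rate > before.error_rate + MAX_ERROR_RATE_INCREASE:
        failures.append("error rate increased")
    if after.stale_sample_failures > 0:
        failures.append("freshness spot check failed")
    return failures
```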
Practical rules and configuration examples
Choosing TTLs
Pick a TTL by weighing update frequency, user tolerance, and cost. If editorial updates are rare but corrections happen, a TTL of a few minutes plus a stale-while-revalidate window can be appropriate. For truly immutable assets, long TTLs are cost effective. For pages that must never be cached, set Cache-Control to no-store and confirm that the CDN honors the header.
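One way to keep origin headers consistent with the classes is to generate Cache-Control from the class name. The directives used (`max-age`, `immutable`, `stale-while-revalidate`, `private`, `no-cache`, `no-store`) are standard HTTP; the `cache_control` helper and the choice of `private, no-cache` for personalized responses are assumptions for illustration.

```python
def cache_control(freshness_class: str, ttl_s: int = 0, swr_s: int = 0) -> str:
    """Return a Cache-Control value for a freshness class (hypothetical mapping)."""
    if freshness_class == "immutable":
        # Safe only when the URL changes on every deploy, e.g. content-hashed filenames.
        return f"public, max-age={ttl_s}, immutable"
    if freshness_class in ("cacheable", "soft_real_time"):
        value = f"public, max-age={ttl_s}"
        if swr_s:
            # Allow serving stale for swr_s seconds while revalidating in the background.
            value += f", stale-while-revalidate={swr_s}"
        return value
    if freshness_class == "personalized":
        # "private" forbids shared caches such as the CDN from storing the response.
        return "private, no-cache"
    if freshness_class == "non_cacheable":
        return "no-store"
    raise ValueError(f"unknown freshness class: {freshness_class}")
```

Some CDNs can override origin headers with their own rules, so it is worth checking that the configured behavior matches what the origin sends.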
Cache key design
Include only the components that affect the rendered output. For sites that vary image format by the Accept header, include that header in the key. For localization, use the path or an explicit language parameter. Do not include analytics or tracking parameters. For query parameters, adopt a whitelist so that unimportant parameters do not fragment the cache.
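A sketch of deriving the language component of a key from the URL alone, assuming locales appear either as a first path segment (`/fr/...`) or as a `lang` query parameter. `SUPPORTED_LANGS`, `DEFAULT_LANG`, and `lang_key_component` are hypothetical. Unsupported values collapse to the default so arbitrary input cannot create new cache variants.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical locale set and default.
SUPPORTED_LANGS = {"en", "fr", "de"}
DEFAULT_LANG = "en"


def lang_key_component(url: str) -> str:
    """Derive the cache-key language from the path prefix, then an explicit ?lang=.

    Accept-Language and cookies are deliberately ignored: they vary per user
    and would fragment the cache without any visible URL difference.
    """
    parts = urlsplit(url)
    first_segment = parts.path.lstrip("/").split("/", 1)[0]
    if first_segment in SUPPORTED_LANGS:
        return first_segment
    lang = parse_qs(parts.query).get("lang", [DEFAULT_LANG])[0]
    return lang if lang in SUPPORTED_LANGS else DEFAULT_LANG
```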
When to use origin shielding
Use shielding when many POPs can independently request the same uncached object within a short window. Shielding is most valuable for objects that are expensive to generate or for large files with high bandwidth cost. Do not enable shielding for personalized content. If your CDN supports connection reuse between the shield and the origin, choose a shield close to the origin to reduce fill latency and cross-region transfer.
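A toy model makes the shielding effect concrete. It assumes no eviction and no coalescing of concurrent misses, and the POP names and object IDs are hypothetical; real shields typically also collapse simultaneous requests, which this sketch does not model.

```python
def origin_fetches(requests, use_shield):
    """Count origin fetches for a sequence of (pop, object_id) requests.

    Simplified model: caches never evict within the window, and each miss
    is filled before the next request arrives.
    """
    pop_cache = set()     # (pop, object) pairs cached at edge POPs
    shield_cache = set()  # objects cached at the shield POP
    fetches = 0
    for pop, obj in requests:
        if (pop, obj) in pop_cache:
            continue  # edge hit
        pop_cache.add((pop, obj))
        if use_shield:
            if obj in shield_cache:
                continue  # edge miss served by the shield
            shield_cache.add(obj)
        fetches += 1  # request reaches the origin
    return fetches
```

With one article requested first from four POPs and then again from one of them, the unshielded setup sends four fetches to origin, while the shielded setup sends one.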
Stale serve and revalidation patterns
Serving stale content while revalidating in the background reduces user-facing latency and avoids origin spikes from concurrent cache fills. Use the CDN's or the origin's validation mechanisms to allow a short stale period while revalidation happens asynchronously. Ensure that revalidation uses conditional requests and that ETag or Last-Modified headers are reliable.
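The two halves of this pattern can be sketched separately: how a cache classifies an object's age under `max-age` plus `stale-while-revalidate` (RFC 5861), and how an origin might answer an `If-None-Match` revalidation. Both functions are hypothetical simplifications, not a specific CDN's logic.

```python
def freshness_state(age_s: float, max_age_s: float, swr_s: float) -> str:
    """Classify a cached object under max-age plus stale-while-revalidate."""
    if age_s < max_age_s:
        return "fresh"             # serve from cache, no origin contact
    if age_s < max_age_s + swr_s:
        return "stale-revalidate"  # serve stale now, revalidate in the background
    return "expired"               # must revalidate before serving


def _opaque(tag: str) -> str:
    # If-None-Match uses weak comparison, so the W/ prefix is ignored.
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def conditional_status(if_none_match, current_etag: str) -> int:
    """Origin-side If-None-Match handling: 304 if a listed ETag still matches.

    Simplification: splits on commas, so ETags containing commas are not supported.
    """
    if if_none_match is None:
        return 200
    tags = [_opaque(t) for t in if_none_match.split(",")]
    if "*" in tags or _opaque(current_etag) in tags:
        return 304
    return 200
```

A 304 lets the CDN refresh its copy without the origin resending the body, which is why reliable validators matter for origin bytes as well as freshness.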
Measurement and KPIs
Track cache hit ratio overall and by content class, origin request rate, origin bytes served, and latency for cache misses. Instrument cache fills to count the number of distinct objects requested from origin per time window. For sustainability reporting, convert bytes transferred and CPU time into energy estimates using figures from your provider or a published conversion factor when available. Keep the same baseline period and control for traffic volume so that seasonal traffic changes are not attributed to configuration changes.
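Normalizing by traffic is what makes before and after periods comparable. This sketch uses hypothetical helpers; the energy conversion factors are deliberately left as inputs because no specific factor is assumed here, and the numbers in the usage note are illustrative only.

```python
def origin_requests_per_1k(origin_requests: int, client_requests: int) -> float:
    """Origin requests per 1,000 client requests, comparable across traffic levels."""
    return 1000 * origin_requests / client_requests


def energy_kwh(bytes_transferred: float, kwh_per_gb: float,
               cpu_seconds: float = 0.0, kwh_per_cpu_second: float = 0.0) -> float:
    """Energy estimate from transfer volume (decimal GB) and CPU time.

    Both factors are inputs: take them from your provider or a published source.
    """
    return bytes_transferred / 1e9 * kwh_per_gb + cpu_seconds * kwh_per_cpu_second
```

For illustration, a week with 100,000 client requests and 2,000 origin requests scores 20.0, while a busier week with 150,000 client requests and 1,500 origin requests scores 10.0: origin work per request halved even though traffic grew.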
Debugging and safety checks
If cache hit ratio drops after a change, first check whether the cache key has become over-fragmented. Examine sample request logs to see which headers or query parameters vary. If freshness complaints appear, verify the TTL and any stale-while-revalidate settings, and inspect the conditional validation headers exchanged between the CDN and the origin. For origin spikes, confirm that shielding is functioning and that the shield POP is close enough to the origin to avoid introducing new cross-region fetches.
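A quick way to spot fragmentation in sampled logs is to count distinct values per query parameter for one path. This is a hypothetical helper; parameters with many distinct values that do not change the page (session IDs, tracking tokens) are the usual suspects.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlsplit


def varying_params(urls, path):
    """Return (parameter, distinct value count) pairs for one path, most varied first."""
    values = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        if parts.path != path:
            continue
        for k, v in parse_qsl(parts.query, keep_blank_values=True):
            values[k].add(v)
    # Sort by count descending, then by name for a stable order.
    return sorted(((k, len(v)) for k, v in values.items()), key=lambda kv: (-kv[1], kv[0]))
```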
Illustrative example
Consider a sample page type that previously caused many origin fetches because its cache key included an ephemeral query parameter. After whitelisting only the path and a single pagination parameter, the cache hit ratio for that page type rose significantly and origin requests for that endpoint fell. As an example calculation for planning purposes, imagine an endpoint that previously triggered 1,000 origin fetches per hour with an average response size of 400 kilobytes, or 400 megabytes of origin transfer per hour. If changes raise the cache hit ratio so that only 200 requests per hour reach the origin, hourly origin bytes drop to 80 megabytes, an 80 percent reduction. Use the same arithmetic to estimate bandwidth savings and to prioritize changes that yield the largest reduction per engineering hour invested.
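The planning arithmetic above, written out in Python with decimal units (1 MB = 1,000 KB). The `reduction_per_eng_hour` helper and the 4 engineering hours in the usage note are hypothetical, included only to show how candidates might be ranked.

```python
def hourly_origin_mb(origin_requests_per_hour, avg_response_kb):
    """Origin megabytes per hour, in decimal units (1 MB = 1,000 KB)."""
    return origin_requests_per_hour * avg_response_kb / 1000


def reduction_per_eng_hour(saved_mb_per_hour, engineering_hours):
    """Hypothetical prioritization metric: hourly savings per engineering hour invested."""
    return saved_mb_per_hour / engineering_hours


before = hourly_origin_mb(1000, 400)  # 400.0 MB per hour
after = hourly_origin_mb(200, 400)    # 80.0 MB per hour
saved = before - after                # 320.0 MB per hour
reduction = saved / before            # 0.8, an 80% drop in origin bytes
```

If that change took a hypothetical 4 engineering hours, `reduction_per_eng_hour(saved, 4)` gives 80.0 MB per hour saved per engineering hour, a figure that can be compared across candidate changes.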
Operational checklist before any change
Ensure owners for each content class have signed off. Export recent request logs for comparison. Prepare rollback steps and automated alerts for sharp increases in origin load or error rates. Choose a staging domain or a subset of POPs for the initial rollout. Document the expected effect on cache hit ratio and the KPI thresholds that will trigger an automatic rollback.
Lessons learned
Conservative defaults and a content-class-driven approach make changes safer. Simplifying cache keys often yields the largest wins with the least risk. Origin shielding is powerful but should be applied selectively. Measurement is non-negotiable: changes that cannot be validated with clear before and after figures should be paused until proper instrumentation exists.
The approach used here gives teams a repeatable, low friction way to reduce origin work and network transfer while protecting correctness and user experience.