Excessive Bandwidth Usage

Hi,

So an update today.

I’ve paid for Netlify Analytics, which for the most part is completely useless to me. It didn’t give me any useful insight into my particular problems.

I did make a substantial change to the site, which seems to be decreasing my bandwidth. Instead of bundling and serving assets like jQuery, Bootstrap, etc., directly on my Netlify site, I’ve switched to using an external CDN for them. I did this last night and instead of waking up to 6GB of bandwidth consumption, I woke up to less than 1GB.

So I have questions from this:

  1. Why were these assets not correctly caching? Were they not caching because I had chosen to create a bundle from them?
  2. If bundling all JS and CSS was the reason, how come my bundles were not caching? My unique visitors were much fewer than my page views, so shouldn’t most of those people have had the assets cached from a prior visit?
  3. Should I be using an external CDN, or is the preferred approach to get all my assets on Netlify? My gut says get everything on Netlify to serve them faster with my overall site.

Please let me know.

Thanks,

cc @fool

Hmm, no, I think you misread my post that you linked to - that is an example of ADDING a header to a PROXY that we make, not doing a redirect based on a header.

Hi again,

Lots to work through here and I’m a bit time limited today so will do my best:

  1. Our caching works well when your assets don’t change names. I guess I didn’t anticipate that your JS and CSS would be changing that often, generating new names and requiring new downloads, but perhaps you change them frequently? If not, our default caching should prevent excessive transmission, since it would return a 304 without resending the content.
  2. Since I don’t understand the evidence behind your assertion, it’s hard to say. If you hadn’t changed caching settings I could speak to it more easily.
  3. It is not our intention that you’d need another service to use our service. I guess if you want to pay them instead of us for bandwidth, that is an option that is available to you.

Hi,

After I offloaded my JS and CSS assets to another CDN, I removed the caching rules that I had added yesterday. To be clear, the other CDNs are not charging me anything, these are the CDNs that Bootstrap, jQuery, etc., recommend using if not downloading.

As of right now I am using the vanilla caching rules, but Netlify is now serving my specific JS and CSS, not third-party libraries. Making this change has reduced my bandwidth by quite a bit, which again takes me back to the question of why these assets weren’t being cached.

A detail that I didn’t add because I didn’t think it mattered was that I have Zapier running my builds every day on a schedule. In theory if nothing has changed, the caches still work, but maybe I’m wrong and new builds give new etag values. The builds say no files changed most of the time.

Do you have any tools or documentation you recommend to test that caching is working properly? I’m trying to troubleshoot all this, but am at a loss since I cannot see what’s happening. I was hoping that if I paid for Analytics, it would tell me how much cached content was served, etc., but instead I got the same data that Google Analytics was giving me.

I’d also be willing to pay for an hour of support if it meant that by the end of it, I could rest easy that everything was resolved.

Please let me know @fool.

Thanks,

Hi @fool,

Been doing some more experimenting because I don’t have anything to lose yet.

I’ve gotten rid of asset bundling and I’ve moved all libraries such as Bootstrap, Font Awesome, jQuery, etc., back onto Netlify. I’ve removed all custom cache rules from the netlify.toml file. With the exception of my service worker, all behavior should be very vanilla.

With that in mind, I execute the following command:

$ curl -I https://www.thepolyglotdeveloper.com/css/bootstrap/bootstrap.min.css

HTTP/2 200 
accept-ranges: bytes
cache-control: public, max-age=0, must-revalidate
content-length: 121457
content-type: text/css; charset=UTF-8
date: Fri, 30 Aug 2019 20:48:56 GMT
etag: "1be33a500fe9965531b55b00eb157868-ssl"
strict-transport-security: max-age=31536000
age: 0
server: Netlify
x-nf-request-id: e9648d51-c3f5-40de-b217-f552987d4c70-2783111

It seems fine, although I don’t know enough about this kind of thing to say whether it isn’t.

So now, I take information from the previous request and execute the following:

$ curl -I -H 'If-None-Match: "1be33a500fe9965531b55b00eb157868-ssl"' https://www.thepolyglotdeveloper.com/css/bootstrap/bootstrap.min.css

HTTP/2 304 
cache-control: public, max-age=0, must-revalidate
date: Fri, 30 Aug 2019 21:06:07 GMT
etag: "1be33a500fe9965531b55b00eb157868-ssl"
strict-transport-security: max-age=31536000
age: 0
server: Netlify
x-nf-request-id: eaa63499-41f5-44e6-838e-9fffcaf493fd-3156240

As you can see, I’ve taken the etag value from the first request and tested in the second request to see if I get a 304 response, which I did.

I’ve re-read your articles at least 25 times now and as far as I can tell, when a request is sent with the etag, it returns a 304 if the hash hasn’t changed which signals to the requestor to use what exists in cache. If anything other than a 304 comes back, use the fresh copy and store the new etag value.
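
The flow described above can be sketched in a few lines of Python. This is only a minimal illustration of conditional revalidation from the client’s side; `revalidate` and its parameters are hypothetical names, not part of any browser or Netlify API.

```python
from typing import Optional, Tuple


def revalidate(
    cached_etag: Optional[str],
    cached_body: Optional[bytes],
    status: int,
    response_etag: Optional[str],
    response_body: Optional[bytes],
) -> Tuple[Optional[str], Optional[bytes]]:
    """Return the (etag, body) pair a cache should hold after a response.

    Hypothetical helper for illustration only.
    """
    if status == 304 and cached_body is not None:
        # Not Modified: the server sent no body, so keep the cached copy.
        return cached_etag, cached_body
    # Anything else: use the fresh copy and remember its etag.
    return response_etag, response_body
```

With the headers shown earlier, a 304 keeps the cached bootstrap.min.css in place, while a 200 replaces both the body and the stored etag.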

So here is where it gets strange.

When I open my browser (tried Chrome and Firefox) and look at the etag value, it doesn’t quite match. Instead, I have the following:

1be33a500fe9965531b55b00eb157868-ssl-df

Notice that it is essentially the same as the etag that came back in the cURL request, except this one has a -df appended to the end.

So I take the new etag and run the request in cURL:

$ curl -I -H 'If-None-Match: "1be33a500fe9965531b55b00eb157868-ssl-df"' https://www.thepolyglotdeveloper.com/css/bootstrap/bootstrap.min.css

HTTP/2 200 
accept-ranges: bytes
cache-control: public, max-age=0, must-revalidate
content-length: 121457
content-type: text/css; charset=UTF-8
date: Fri, 30 Aug 2019 21:10:44 GMT
etag: "1be33a500fe9965531b55b00eb157868-ssl"
strict-transport-security: max-age=31536000
age: 0
server: Netlify
x-nf-request-id: eaa63499-41f5-44e6-838e-9fffcaf493fd-3236395

The -df suffix appears to be invalid: the entire resource is sent again because the response was a 200 rather than a 304.

So is there something wrong with the etag feature of Netlify? My browsers appear to be requesting a new resource every time because the etag value doesn’t match cURL. If this is true, it would probably explain my heavy bandwidth usage.

Can you please validate my research and tell me if I’m correct or incorrect? If I’m incorrect, can you please explain why I’m seeing what I’m seeing as well as how to correctly do these tests?

Thanks,

Hi, @nraboy. I had the same question when I ran into this issue while first learning how our caching works. There is both a reason and a solution for this behavior!

The -df on the end of the etag (1be33a500fe9965531b55b00eb157868-ssl-df) indicates that this is the etag for the compressed version of the URL (df = deflate).

There was a 200 status response because curl normally requests the uncompressed version of a URL so the etag isn’t actually a match. However, you can add an accept-encoding request header and the 304 will occur like so:

$ curl -I -H 'If-None-Match: "1be33a500fe9965531b55b00eb157868-ssl-df"' -H 'accept-encoding: gzip, deflate, br' https://www.thepolyglotdeveloper.com/css/bootstrap/bootstrap.min.css
HTTP/2 304
date: Tue, 03 Sep 2019 02:15:08 GMT
etag: "1be33a500fe9965531b55b00eb157868-ssl-df"
cache-control: public, max-age=0, must-revalidate
server: Netlify
vary: Accept-Encoding
x-nf-request-id: 56251d0c-bbc9-4172-aa60-6a5e55b7c899-3247103

Adding this additional header (using -H 'accept-encoding: gzip, deflate, br') means curl is now requesting a compressed version of the file, which causes the -df etag to match, resulting in a 304.

Most web browsers always include the accept-encoding: gzip, deflate, br header by default. On the other hand, curl only sends that header when it is specifically told to do so.
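
As a rough mental model of the three curl results in this thread, the sketch below treats the compressed and uncompressed responses as two variants, each with its own etag. The `respond` function is hypothetical, and the selection rule (any Accept-Encoding header picks the compressed variant) is an assumption made for illustration; it is not Netlify’s actual implementation.

```python
from typing import Optional

# Etag values copied from the curl output in this thread.
BASE_ETAG = '"1be33a500fe9965531b55b00eb157868-ssl"'
COMPRESSED_ETAG = '"1be33a500fe9965531b55b00eb157868-ssl-df"'


def respond(if_none_match: str, accept_encoding: Optional[str] = None) -> int:
    """Toy model: status code a server with per-encoding etags might return."""
    # Assumption: any Accept-Encoding header selects the compressed variant.
    current = COMPRESSED_ETAG if accept_encoding else BASE_ETAG
    return 304 if if_none_match == current else 200


print(respond(BASE_ETAG))                             # 304
print(respond(COMPRESSED_ETAG))                       # 200
print(respond(COMPRESSED_ETAG, "gzip, deflate, br"))  # 304
```

The three calls mirror the three curl commands: the plain etag without an accept-encoding header, the -df etag without one, and the -df etag with one.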

You can also use --compressed in place of that header:

$ curl --compressed -I -H 'If-None-Match: "1be33a500fe9965531b55b00eb157868-ssl-df"' https://www.thepolyglotdeveloper.com/css/bootstrap/bootstrap.min.css

Additional questions are always welcome. :+1:

Hi @luke,

Thanks for clarity on that particular issue.

Back to my initial problem: I am still burning through massive amounts of bandwidth. I’ve removed custom caching rules, stopped asset bundling and hashing with Hugo, and have left the site alone for the past few days, yet I’m burning ~3GB of transfer a day for ~400,000 page views per month.

I’m still failing to understand why I was at ~13GB a month of bandwidth while using Cloudflare with DigitalOcean, but so much more with Netlify. I’ve been asking around externally and everyone seems to think the amount of bandwidth I’m using on Netlify is out of control.

Please let me know.

Thanks,

I ran a test (just using the browser with caching disabled) to load the site landing page. I saw about 531 KB of data transferred, about 197 KB of which is coming from the site domain (in other words, from Netlify).

Some cocktail napkin math shows that, with 400k page views at roughly 197 KB each from Netlify, the total bandwidth used (imagining all unique visitors with no caching) would be around 75 GB (using GiB, so 2^30 rather than 10^9 bytes per GB). Now, recent caching changes are mentioned above, and I do see that recent daily bandwidth use has dropped, but it is still much higher than the stated expectation of around 13 GB a month.
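
The napkin math can be reproduced with a short Python sketch. It assumes the 197 KB figure is in units of 1024 bytes and that every page view transfers the full uncached amount; `monthly_gib` is a hypothetical helper name.

```python
PAGE_VIEWS = 400_000        # monthly page views quoted in the thread
NETLIFY_KIB_PER_VIEW = 197  # uncached transfer from the site domain per view


def monthly_gib(page_views: int, kib_per_view: float) -> float:
    """Worst-case monthly transfer in GiB, assuming no caching at all."""
    # Assumption: 1 KB in the measurement means 1024 bytes.
    return page_views * kib_per_view * 1024 / 2**30


estimate = monthly_gib(PAGE_VIEWS, NETLIFY_KIB_PER_VIEW)
print(f"{estimate:.1f} GiB")  # 75.1 GiB
```

Any caching on the visitor side lowers the real figure, so this is an upper bound rather than a prediction.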

Then I noticed the following on August 30th:

After I offloaded my JS and CSS assets to another CDN, I removed the caching rules that I had added yesterday. To be clear, the other CDNs are not charging me anything, these are the CDNs that Bootstrap, jQuery, etc., recommend using if not downloading.

I am seeing site behavior which doesn’t match this. I think that many of these assets are still hosted at your Netlify hosted domain. This, I believe, is the source of the extra bandwidth at Netlify.

These assets appear to be the vast majority of the bandwidth used and, if they were hosted elsewhere, the bandwidth used would be very close to your stated estimate of 13 GB monthly - and not orders of magnitude higher.

Getting to the details, here are some example asset URLs which currently point to the domain hosted at Netlify, not a third-party CDN:

https://www.thepolyglotdeveloper.com/css/bootstrap/bootstrap.min.css - 19.4 KB
https://www.thepolyglotdeveloper.com/js/bootstrap/bootstrap.min.js - 10.8 KB
https://www.thepolyglotdeveloper.com/webfonts/fa-solid-900.woff2 - 73.8 KB

(Those are the gzipped sizes “on the wire”, so they reflect the actual bandwidth used, not the uncompressed file sizes.)

These are assets loaded when I visit the URL https://www.thepolyglotdeveloper.com/. They are not loaded using a third-party CDN; they come from Netlify.

Of the 197 KB for the initial page load coming from Netlify, these three files above are 104 KB of that total - over half. These are just three examples. There are other assets which you might be factoring as being hosted elsewhere.
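
For anyone checking the arithmetic, here is a small Python sketch that adds up the three transfer sizes listed above and compares them to the 197 KB measured from Netlify. The dictionary and variable names are only for illustration.

```python
# Gzipped transfer sizes (KB) for the three example assets listed above.
sizes_kb = {
    "bootstrap.min.css": 19.4,
    "bootstrap.min.js": 10.8,
    "fa-solid-900.woff2": 73.8,
}
NETLIFY_KB_PER_PAGE_LOAD = 197  # measured on the landing page, cache disabled

total_kb = sum(sizes_kb.values())
share = total_kb / NETLIFY_KB_PER_PAGE_LOAD
print(f"{total_kb:.1f} KB is {share:.0%} of {NETLIFY_KB_PER_PAGE_LOAD} KB")
```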

For example, two font file URLs below are the largest sources of bandwidth in this billing cycle:

https://www.thepolyglotdeveloper.com/webfonts/fa-solid-900.woff2 4,994,579,936 bytes
https://www.thepolyglotdeveloper.com/webfonts/fa-brands-400.woff2 4,859,231,552 bytes

That is 9.2 GiB of bandwidth. There are also variations of URLs like these with GET parameters, for example (in a code block to preserve the string):

https://www.thepolyglotdeveloper.com/webfonts/fa-solid-900.eot?__WB_REVISION__=a547e21eceadf53602caf057be9ad9fd

That URL used 3,833,266,226 bytes this cycle. All told, I see 40,149,099,122 bytes used for fonts under /webfonts/ at this domain in the current billing cycle.
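
The byte counts above convert to GiB as follows. This is just a Python sketch of the unit conversion using the numbers quoted in this post; the variable names are illustrative.

```python
GIB = 2**30  # bytes per GiB

# Byte counts quoted above for the current billing cycle.
font_bytes = {
    "fa-solid-900.woff2": 4_994_579_936,
    "fa-brands-400.woff2": 4_859_231_552,
}

two_fonts_gib = sum(font_bytes.values()) / GIB
eot_variant_gib = 3_833_266_226 / GIB    # the .eot URL with __WB_REVISION__
all_webfonts_gib = 40_149_099_122 / GIB  # everything under /webfonts/

print(f"{two_fonts_gib:.1f} GiB")     # 9.2 GiB
print(f"{eot_variant_gib:.1f} GiB")   # 3.6 GiB
print(f"{all_webfonts_gib:.1f} GiB")  # 37.4 GiB
```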

When you subtract that from your total bandwidth, we are getting closer to the expected value. And that is just the fonts. If CSS and JavaScript assets were also factored in, I bet the new bandwidth would be very close to the expected value.

Does this explain the bandwidth discrepancies?

I don’t think your team/account will reach the 100 GB limit this month with the current trends I’m seeing so there shouldn’t be any extra bandwidth charge. If you move these assets elsewhere, I do believe your new monthly bandwidth use will be back to your previous levels.

If there are other questions please let us know.

Hey @luke,

Thanks for getting back to me!

Regarding offloading the static libraries: I had brought them back in prior to your response.

I’m wondering if I’ve been spoiled by Cloudflare and some kind of magic that they do behind the scenes. Maybe they do some kind of shared cache where common resources like my fonts don’t take bandwidth?

Regardless, it does look like bandwidth is mellowing out. Not as low as what I once had, but not the 8GB per day I saw at the beginning.

Some feedback for Netlify:

If I’m paying for Analytics, which I am, I want to be able to access the same information that you see in regards to bandwidth usage. The Analytics package doesn’t give me insight into what the bandwidth is for my font files, etc. Knowing this information would be incredibly helpful so that I don’t need support staff to get involved.

I’m going to sit on the site for a while and monitor what happens.

Thanks,

Totally valid feedback, thank you! I’ve brought it to our team’s attention. The intention of the product is NOT to overwhelm you with data, so I doubt we’ll be exposing all traffic in the way that our logging captures it, but it’s worth asking for :slight_smile:

Hi @fool,

I get that you don’t want to overwhelm people, but currently users are not receiving much benefit with the Netlify Analytics service versus what Google already provides for free. Sure, you get a definitive number on your page views rather than hoping that someone isn’t blocking the Google Analytics JavaScript, but that kind of stuff isn’t very insightful.

For $9+ a month, I want information I couldn’t get in Google, or I couldn’t easily get in Google. Bandwidth usage for assets is a good example of this.

Best,

@nraboy - totally get what you are saying. Analytics is an early product, so feedback is coming at a great time. We’ll pass it on.

Hello,
I have a question.

Let’s say my account consumes 120 GB in a month: 100 GB is free, and I have purchased an extra bandwidth pack ($20 per 100 GB).

Will the remaining 80 GB carry forward to the next month or not?

thanks.

It will not carry over. Bandwidth allocations are all per-billing-cycle, so the one you bought is for this billing cycle only.
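
To illustrate the per-cycle behavior, here is a minimal Python sketch. The 100 GB allowance and the $20 per 100 GB pack are the figures from the question above, not confirmed pricing, and `overage_cost` is a hypothetical helper.

```python
import math


def overage_cost(used_gb: float, included_gb: int = 100,
                 pack_gb: int = 100, pack_price_usd: int = 20) -> int:
    """Cost of extra bandwidth packs for ONE billing cycle.

    Defaults are taken from the question in this thread (assumed, not
    verified). Unused pack capacity is simply dropped at the end of the
    cycle, which models the "no carry over" answer.
    """
    over_gb = max(0, used_gb - included_gb)
    packs = math.ceil(over_gb / pack_gb)
    return packs * pack_price_usd


print(overage_cost(120))  # 20
```

In the 120 GB example, one pack is bought, 20 GB of it is used, and the other 80 GB does not roll into the next cycle.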

Thanks @fool for the info.

I have another question…

Are you planning to reduce the free 100 GB of bandwidth in the near future (say, 6 months)?

I have started moving most of my static sites to Netlify, so if you are planning to change the free bandwidth limit in the near future, I need to rethink moving my sites.

thanks.

There is no immediate plan to reduce the amount of free bandwidth for free accounts. But, of course, that could change at any time.

I am not sure if this applies to you or not, but we do increase the free bandwidth to 400GB for open source projects. You can apply for an open source team here: https://opensource-form.netlify.com/

My guess at what is happening: Cloudflare caches the static assets (JS, CSS, webfonts), so new users download these assets directly from Cloudflare and never hit your DigitalOcean server.

This means each resource is only requested a few times from your origin server and the rest is handled by Cloudflare, NOT counting toward your total bandwidth.

In the Netlify case, by contrast, the asset is also cached on the Netlify CDN, but the bandwidth IS counted against your total bandwidth.

You could get the same behavior on Netlify by additionally putting Cloudflare in front of it:
→ User > Cloudflare > Netlify

In this case, the assets cached on Cloudflare would again not be counted against your total.
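
A toy Python model of that reasoning is below. It is a sketch of the guess above, not a measurement: the request count, asset size, and hit ratio are made-up numbers, and `origin_bytes` is a hypothetical helper.

```python
def origin_bytes(requests: int, asset_bytes: int, edge_hit_ratio: float) -> float:
    """Bytes served by the origin when an edge cache answers some requests."""
    return requests * asset_bytes * (1.0 - edge_hit_ratio)


# Hypothetical numbers: 100,000 requests for a 75,000-byte font file.
no_edge_cache = origin_bytes(100_000, 75_000, 0.0)    # every request counted
with_edge_cache = origin_bytes(100_000, 75_000, 0.9)  # assumed 90% edge hits

print(f"{no_edge_cache / 1e9:.2f} GB")    # 7.50 GB
print(f"{with_edge_cache / 1e9:.2f} GB")  # 0.75 GB
```

Only the bytes that reach the origin would count against that origin’s bandwidth; the higher the edge hit ratio, the smaller that share.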

Thanks for spelling this out @MasterCassim. If that is true, this is a significant departure from the traditional CDN setup with a VPS. I had assumed Netlify’s CDN traffic would not count towards my bandwidth.

@nraboy is your site back down to expected bandwidth levels, or did you just manage to remain under acceptable limits? I see your site serves its bundles with hashed filenames and a long-term cache policy, and serves libraries like Bootstrap via Netlify’s must-revalidate cache strategy.

I’ve been using Netlify for over a year, and I just started using hashed filenames with long-term caching. My site feels snappier when loading assets from memory vs. Netlify 304’s. I got an alert a few days later about high bandwidth usage. I don’t see why my bandwidth shot up because in my case everything is bundled together. Prior to this change, any build would update the .js file which is the lion’s share of data. I guess I need to make sure these get cached in a CDN that doesn’t have bandwidth limits.

Hey @robert,

It appears that the bandwidth on my site has stabilized, but I’m not worry free about the whole thing.

The more content I publish, the higher the bandwidth. The concerning part is that more content could mean ~4 extra blog posts per month leading to more than 80% bandwidth usage. I suspect it is because rebuilding my site means each HTML page changes, but I don’t know.

In short though, after about a month or so on Netlify, my usage has dropped significantly.

Best,

I have exactly the same problem: 1 GB/day on Cloudflare turned into 25 GB/day on Netlify, for a static site that I update twice a month. Your “magical cache” article, which is referenced everywhere, doesn’t help much, and neither does Analytics, which doesn’t show bandwidth usage. I will try changing the _headers file to introduce long-term caching for everything and see how it goes, but for now it looks to me like the magical self-invalidating caching bicycle is not working as well for me as expected.