Excessive Bandwidth Usage

Hi,

My main site has been on Netlify a little less than a week now and after checking the bandwidth transfer, the number is a little concerning.

In the ~5 or so days that I’ve been on Netlify, my bandwidth is reporting at 27GB. Comparing against my total bandwidth for the month of July at DigitalOcean, my July report was 13GB for the entire month.

How did I manage to accrue so much more bandwidth usage on Netlify than I did on my previous host? My site hasn’t changed between the two hosts. It is a PWA with a generous cache policy on resources like fonts, scripts, images, etc…, so returning visitors in theory shouldn’t be downloading much.

I’m not currently paying for Netlify Analytics, but Google Analytics generally reports my page views at ~210K per month, which makes the GB of transfer seem even more odd.

Any ideas?

My site is https://www.thepolyglotdeveloper.com.

Thanks,

Which files eat the most bandwidth?

Hi,

I don’t have Netlify analytics, so my analysis may not be the most accurate. Inside the Chrome Inspector, I viewed the resource size for everything coming back in a request. Aside from images, the JavaScript, CSS, etc., all have a size of 80KB or less.

I’m not quite sure how to limit the analysis to Netlify hosted resources as I have a lot in the Inspector.

I checked my bandwidth today, which is one week from when I started and it is at ~50GB. I’m burning through way too much bandwidth on Netlify and am looking for suggestions.
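
For reference, a rough linear projection of the two readings quoted so far. This is only a sketch: it assumes traffic stays flat and uses a 30-day month, and the function name is made up for illustration.

```python
def project_monthly_gb(observed_gb: float, observed_days: float,
                       days_in_month: int = 30) -> float:
    """Linearly extrapolate observed transfer to a full month."""
    return observed_gb / observed_days * days_in_month


# First reading in the thread: ~27 GB after ~5 days.
print(round(project_monthly_gb(27, 5)))
# Second reading: ~50 GB after 7 days.
print(round(project_monthly_gb(50, 7)))
```

Both projections land far above the 13GB reported for all of July on the previous host.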

Thanks,

In an effort to try to prevent myself from burning through hundreds of GB of bandwidth every month and ultimately switch back to my prior Cloudflare and DigitalOcean setup, I’ve started experimenting with custom cache headers in my netlify.toml file:

[[headers]]
    for = "/*.css"
    [headers.values]
        Cache-Control = "public, s-maxage=2592000"
[[headers]]
    for = "/*.js"
    [headers.values]
        Cache-Control = "public, s-maxage=2592000"
[[headers]]
    for = "/*.svg"
    [headers.values]
        Cache-Control = "public, s-maxage=2592000"
[[headers]]
    for = "/*.png"
    [headers.values]
        Cache-Control = "public, s-maxage=2592000"
[[headers]]
    for = "/*.jpg"
    [headers.values]
        Cache-Control = "public, s-maxage=2592000"
[[headers]]
    for = "/*.gif"
    [headers.values]
        Cache-Control = "public, s-maxage=2592000"
[[headers]]
    for = "/*.ttf"
    [headers.values]
        Cache-Control = "public, s-maxage=2592000"
[[headers]]
    for = "/*.otf"
    [headers.values]
        Cache-Control = "public, s-maxage=2592000"
[[headers]]
    for = "/*.woff"
    [headers.values]
        Cache-Control = "public, s-maxage=2592000"
[[headers]]
    for = "/*.woff2"
    [headers.values]
        Cache-Control = "public, s-maxage=2592000"

Beyond HTML and XML, my site assets don’t change very frequently so rather than invalidating the cache after every request, I’d rather just serve from the cache. My CSS and JavaScript have bundle identifiers, so in the event I need to roll my website back, the HTML should point at the old bundles.

I’m going to let this sit for a few days and see if the bandwidth usage slows down. In the meantime, can @fool or @laura please chime in? At the current rate I will be paying significantly more than I was previously, and I’d like to come up with solutions to prevent this.

Thanks,

Hiya!

Good intuition on the cache-control header overriding, though I hope you’ve read this article to be aware that you will basically be unable to update any of your non-html content for repeat visitors, within a month: https://www.netlify.com/blog/2017/02/23/better-living-through-caching/

I took a quick look at your account’s bandwidth usage and what I found is that as I’m guessing you expected, it is all on the polyglotdeveloper site and production URL.

I also searched over the past hour to understand your highest-traffic URLs (these are the 10 most requested URLs by count + path), and I can see some problems in them:

Specifically it seems like you are using hashes in your assets which will make your cache-control headers pointless. These articles have more details on avoiding that pattern:


and

The next 10 or so top loads are all font files. While I don’t know what changed from DO, this does all seem like legitimate traffic, though you get about 2400 reqs a day from this service: http://serpstatbot.com/ many of which 404 and serve your 14126 byte 404 page, so you might try lightening up that HTML, but it isn’t the majority cause of your traffic or anything.
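
A quick upper bound on what that bot traffic could cost, as a sketch only. It assumes the worst case where every one of the ~2400 daily requests returns the full 14126-byte 404 page uncompressed, over a 30-day month; the function name is illustrative.

```python
REQUESTS_PER_DAY = 2400        # serpstatbot requests per day, from the post
NOT_FOUND_PAGE_BYTES = 14126   # size of the 404 page, from the post


def worst_case_monthly_gib(requests_per_day: int, bytes_per_response: int,
                           days: int = 30) -> float:
    """Upper bound on monthly transfer if every request gets the full page."""
    return requests_per_day * bytes_per_response * days / 2**30


print(f"{worst_case_monthly_gib(REQUESTS_PER_DAY, NOT_FOUND_PAGE_BYTES):.2f} GiB")
```

Even under these pessimistic assumptions the result stays under 1 GiB a month, which fits the point that the bot is not the main cause.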

Sorry I don’t have more insights but hopefully that helps you diagnose and improve!

Hi @fool,

I’ve read those articles, and I don’t believe they apply to me.

Hugo is creating hashes on my assets only if those particular assets change. If those assets don’t change, the hash will be the same between builds. If the asset does change, then the HTML is updated to point at the new asset which is then cached.

I’m failing to understand how my cache-control overrides are pointless when there are hashes in my assets, given the hashes don’t frequently change.

In regards to the bot. Is there a way to block bots in Netlify? When I was on DigitalOcean, I was blocking a lot of user agents in my .htaccess file to prevent them from hammering my server or scraping my content.

If you go to http://serpstatbot.com, they list various options for controlling the crawler, but to block it completely you have to add this text to your robots.txt file:

User-agent: serpstatbot
Disallow: /
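
One way to sanity-check a rule like this before deploying it is Python's standard-library robots.txt parser. The sketch below is offline and only shows how a well-behaved crawler would read the rule; the helper name is made up, and it says nothing about bots that ignore robots.txt.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: serpstatbot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler honoring robots_txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


site = "https://www.thepolyglotdeveloper.com/"
print(is_allowed(ROBOTS_TXT, "serpstatbot", site))  # the blocked crawler
print(is_allowed(ROBOTS_TXT, "Googlebot", site))    # any other crawler
```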

In addition to this, I think you should try to shave as much as you can from your 404 page.

I think there were other instances where this happened. You can read more about it on the Remysharp.com website. I think activating the analytics for at least one month would give you much more insight, given that in the long run you will be able to stop paying DO.

In general though, is there a way to block bots that don’t respect the robots file?

Based on other conversations here on the forum, I don’t think you can block or write redirect rules by user-agent. Please check my updated answer above.

Ok, so I stand corrected. Theoretically you can create a redirect rule that matches on a header value, and the user-agent is one of the headers in the HTTP request. You can find an example here in the forum.
But with this you would need to redirect somewhere, so that HTML file has to be very lightweight.

I’d rather redirect those bots straight to hell.

I’m going to take your suggestion and pay for a month of analytics. I’m going to learn from the data, and hopefully resolve any problems related to bandwidth so I can drink the same kool-aid that everyone else is in regards to Netlify.

I’d really like to stay on the platform, but going from 13GB a month to an estimated 200GB a month of bandwidth is incredibly concerning.

@fool still looking for insights into this when you have a chance.

Thanks,

Hi,

So an update today.

I’ve paid for Netlify Analytics, which for the most part is completely useless to me. It didn’t give me any insights of value per my particular problems.

I did make a substantial change to the site, which seems to be decreasing my bandwidth. Instead of bundling and serving assets like jQuery, Bootstrap, etc., directly on my Netlify site, I’ve switched to using an external CDN for them. I did this last night and instead of waking up to 6GB of bandwidth consumption, I woke up to less than 1GB.

So I have questions from this:

  1. Why were these assets not correctly caching? Were they not caching because I had chosen to create a bundle from them?
  2. If bundling all JS and CSS was the reason, how come my bundles were not caching? My unique visitors were much fewer than my page views, so shouldn’t most of those visitors have had the assets cached from a prior visit?
  3. Should I be using an external CDN, or is the preferred approach to get all my assets on Netlify? My gut says get everything on Netlify to serve them faster with my overall site.

Please let me know.

Thanks,

cc @fool

Hmm, no, I think you misread my post that you linked to - that is an example of ADDING a header to a PROXY that we make, not doing a redirect based on a header.

Hi again,

Lots to work through here and I’m a bit time limited today so will do my best:

  1. Our caching works well when your assets don’t change names. I guess I didn’t anticipate that your JS and CSS would be changing often enough to generate new names and require new downloads, but perhaps you change them frequently? If not, our default caching should prevent excessive transmission since it would return a 304 without resending the content.
  2. Since I don’t understand the cause behind the evidence you are using to make your assertion, it’s hard to say. If you hadn’t changed caching settings I could speak to it more easily.
  3. It is not our intention that you’d need another service to use our service. I guess if you want to pay them instead of us for bandwidth, that is an option that is available to you.

Hi,

After I offloaded my JS and CSS assets to another CDN, I removed the caching rules that I had added yesterday. To be clear, the other CDNs are not charging me anything, these are the CDNs that Bootstrap, jQuery, etc., recommend using if not downloading.

As of right now I am using the vanilla caching rules, but Netlify is now serving my specific JS and CSS, not third-party libraries. Having made this change has reduced my bandwidth by quite a bit, which again takes me back to the question of why weren’t these assets being cached?

A detail that I didn’t add because I didn’t think it mattered was that I have Zapier running my builds every day on a schedule. In theory if nothing has changed, the caches still work, but maybe I’m wrong and new builds give new etag values. The builds say no files changed most of the time.

Do you have any tools or documentation you recommend to test that caching is working properly? I’m trying to troubleshoot all this, but am at a loss since I cannot see what’s happening. I was hoping that if I paid for Analytics, it would tell me how much cached content was served, etc., but instead I got the same data that Google Analytics was giving me.

I’d also be willing to pay for an hour of support if it meant that by the end of it, I could rest easy that everything was resolved.

Please let me know @fool.

Thanks,

Hi @fool,

Been doing some more experimenting because I don’t have anything to lose yet.

I’ve gotten rid of asset bundling and I’ve moved all libraries such as Bootstrap, Font Awesome, jQuery, etc., back onto Netlify. I’ve removed all custom cache rules from the netlify.toml file. With the exception of my service worker, all behavior should be very vanilla.

With that in mind, I execute the following command:

$ curl -I https://www.thepolyglotdeveloper.com/css/bootstrap/bootstrap.min.css

HTTP/2 200 
accept-ranges: bytes
cache-control: public, max-age=0, must-revalidate
content-length: 121457
content-type: text/css; charset=UTF-8
date: Fri, 30 Aug 2019 20:48:56 GMT
etag: "1be33a500fe9965531b55b00eb157868-ssl"
strict-transport-security: max-age=31536000
age: 0
server: Netlify
x-nf-request-id: e9648d51-c3f5-40de-b217-f552987d4c70-2783111

It seems fine, although, I don’t have the knowledge on this kind of stuff to say that it isn’t fine.
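
For anyone reading the cache-control line in that output: max-age=0 together with must-revalidate marks the response as stale immediately, so a browser is expected to re-check with the server on every use, and bandwidth is only saved when that re-check comes back as a 304. The sketch below is a deliberately naive parser (it splits on commas and would mishandle quoted values containing commas); both function names are made up for illustration.

```python
def parse_cache_control(header: str) -> dict:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') or None
    return directives


def revalidates_on_every_use(header: str) -> bool:
    """True if a cache must check with the origin before each reuse."""
    directives = parse_cache_control(header)
    return directives.get("max-age") == "0" or "no-cache" in directives


header = "public, max-age=0, must-revalidate"  # value from the curl output
print(parse_cache_control(header))
print(revalidates_on_every_use(header))
```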

So now, I take information from the previous request and execute the following:

$ curl -I -H 'If-None-Match: "1be33a500fe9965531b55b00eb157868-ssl"' https://www.thepolyglotdeveloper.com/css/bootstrap/bootstrap.min.css

HTTP/2 304 
cache-control: public, max-age=0, must-revalidate
date: Fri, 30 Aug 2019 21:06:07 GMT
etag: "1be33a500fe9965531b55b00eb157868-ssl"
strict-transport-security: max-age=31536000
age: 0
server: Netlify
x-nf-request-id: eaa63499-41f5-44e6-838e-9fffcaf493fd-3156240

As you can see, I’ve taken the etag value from the first request and tested in the second request to see if I get a 304 response, which I did.

I’ve re-read your articles at least 25 times now and as far as I can tell, when a request is sent with the etag, it returns a 304 if the hash hasn’t changed which signals to the requestor to use what exists in cache. If anything other than a 304 comes back, use the fresh copy and store the new etag value.

So here is where it gets strange.

When I open my browser (tried Chrome and Firefox) and look at the etag value, it doesn’t quite match. Instead, I have the following:

1be33a500fe9965531b55b00eb157868-ssl-df

Notice that it is essentially the same as the etag that came back in the cURL request, except this one has a -df appended to the end.

So I take the new etag and run the request in cURL:

$ curl -I -H 'If-None-Match: "1be33a500fe9965531b55b00eb157868-ssl-df"' https://www.thepolyglotdeveloper.com/css/bootstrap/bootstrap.min.css

HTTP/2 200 
accept-ranges: bytes
cache-control: public, max-age=0, must-revalidate
content-length: 121457
content-type: text/css; charset=UTF-8
date: Fri, 30 Aug 2019 21:10:44 GMT
etag: "1be33a500fe9965531b55b00eb157868-ssl"
strict-transport-security: max-age=31536000
age: 0
server: Netlify
x-nf-request-id: eaa63499-41f5-44e6-838e-9fffcaf493fd-3236395

The -df at the end is invalid and the entire resource is obtained again because a 304 response code was not used.

So is there something wrong with the etag feature of Netlify? My browsers appear to be requesting a new resource every time because the etag value doesn’t match cURL. If this is true, it would probably explain my heavy bandwidth usage.

Can you please validate my research and tell me if I’m correct or incorrect? If I’m incorrect, can you please explain why I’m seeing what I’m seeing as well as how to correctly do these tests?

Thanks,

Hi, @nraboy. I had the same question when I was first learning how our caching works and ran into the same issue. There is both a reason and a solution for this behavior!

The -df on the end of the etag (1be33a500fe9965531b55b00eb157868-ssl-df) indicates that this is the etag for the compressed version of the URL (df = deflate).

There was a 200 status response because curl normally requests the uncompressed version of a URL so the etag isn’t actually a match. However, you can add an accept-encoding request header and the 304 will occur like so:

$ curl -I -H 'If-None-Match: "1be33a500fe9965531b55b00eb157868-ssl-df"' -H 'accept-encoding: gzip, deflate, br' https://www.thepolyglotdeveloper.com/css/bootstrap/bootstrap.min.css
HTTP/2 304
date: Tue, 03 Sep 2019 02:15:08 GMT
etag: "1be33a500fe9965531b55b00eb157868-ssl-df"
cache-control: public, max-age=0, must-revalidate
server: Netlify
vary: Accept-Encoding
x-nf-request-id: 56251d0c-bbc9-4172-aa60-6a5e55b7c899-3247103

Adding this additional header (using -H 'accept-encoding: gzip, deflate, br' ) means curl is now requesting the gzipped version of the page and this will cause the -df etag header to match - resulting in 304s.

Most web browsers include the accept-encoding: gzip, deflate, br header by default. On the other hand, curl only sends that header when it is specifically told to do so.

You can also use --compressed in place of that header:

curl --compressed -I -H 'If-None-Match: "1be33a500fe9965531b55b00eb157868-ssl-df"' https://www.thepolyglotdeveloper.com/css/bootstrap/bootstrap.min.css
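
The three curl results in this thread can be reproduced with a toy model of per-encoding ETags. This is only a sketch of the observed behavior, not Netlify's implementation: it assumes any non-empty accept-encoding selects the compressed variant, and the function names are invented for illustration.

```python
from typing import Optional

# ETag of the uncompressed file, copied from the first curl response.
BASE_ETAG = '"1be33a500fe9965531b55b00eb157868-ssl"'


def representation_etag(base_etag: str, accept_encoding: Optional[str]) -> str:
    """ETag of the variant chosen for this request (toy model)."""
    if accept_encoding:
        # Assumption: the compressed variant carries a "-df" suffix.
        return base_etag[:-1] + '-df"'
    return base_etag


def status_for(if_none_match: Optional[str],
               accept_encoding: Optional[str] = None) -> int:
    """304 when the client's validator matches the chosen variant, else 200."""
    current = representation_etag(BASE_ETAG, accept_encoding)
    return 304 if if_none_match == current else 200


plain = '"1be33a500fe9965531b55b00eb157868-ssl"'
compressed = '"1be33a500fe9965531b55b00eb157868-ssl-df"'
print(status_for(plain))                            # curl, plain ETag
print(status_for(compressed))                       # curl, -df ETag, no header
print(status_for(compressed, "gzip, deflate, br"))  # curl with accept-encoding
```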

Additional questions are always welcome. :+1:

Hi @luke,

Thanks for clarity on that particular issue.

Back to my initial question. I am still burning through massive amounts of bandwidth. I’ve removed custom caching rules, stopped asset bundling and hashing with Hugo, and have left the site alone for the past few days, yet I’m burning ~3GB of transfer a day for ~400,000 page views per month.

I’m still failing to understand why I was at ~13GB a month of bandwidth while using Cloudflare with DigitalOcean, but so much more with Netlify. I’ve been asking around externally and everyone seems to think the amount of bandwidth I’m using on Netlify is out of control.

Please let me know.

Thanks,

I ran a test (just using the browser with caching disabled) to load the site landing page. I saw about 531 KB of data transferred, about 197 KB of which is coming from the site domain (in other words, from Netlify).

Some cocktail napkin math shows that, with 400k page views, the total bandwidth used (imagining all unique visitors with no caching) would be around 75 GB (using GiB so 2^30 not 10^9 bytes for GBs). Now, there is mention above about recent changes around caching and I do show recent bandwidth daily use has dropped, but is still much higher than the stated expectation of around 13 GB a month.
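
The napkin math can be written out as a short sketch. It assumes 1 KB = 1024 bytes and that every page view downloads the full 197 KB from Netlify with nothing cached; the function name is illustrative.

```python
PAGE_VIEWS = 400_000   # monthly page views quoted in the thread
KIB_PER_VIEW = 197     # KB served from the site domain per uncached load


def total_gib(page_views: int, kib_per_view: float) -> float:
    """Total transfer in GiB if every view is a full, uncached load."""
    return page_views * kib_per_view * 1024 / 2**30


print(f"{total_gib(PAGE_VIEWS, KIB_PER_VIEW):.1f} GiB")
```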

Then I noticed the following on August 30th:

After I offloaded my JS and CSS assets to another CDN, I removed the caching rules that I had added yesterday. To be clear, the other CDNs are not charging me anything, these are the CDNs that Bootstrap, jQuery, etc., recommend using if not downloading.

I am seeing site behavior which doesn’t match this. I think that many of these assets are still hosted at your Netlify hosted domain. This, I believe, is the source of the extra bandwidth at Netlify.

These assets appear to be the vast majority of the bandwidth used and, if they were hosted elsewhere, the bandwidth used would be very close to your stated estimate of 13 GB monthly - and not orders of magnitude higher.

Getting to the details, here are some example asset URLs which currently point to the domain hosted at Netlify, not a third-party CDN:

https://www.thepolyglotdeveloper.com/css/bootstrap/bootstrap.min.css - 19.4 KB
https://www.thepolyglotdeveloper.com/js/bootstrap/bootstrap.min.js - 10.8 KB
https://www.thepolyglotdeveloper.com/webfonts/fa-solid-900.woff2 - 73.8 KB

(Those are the gzipped sizes “on the wire” so the actual bandwidth not the uncompressed size.)

These are assets loaded when I visit the URL https://www.thepolyglotdeveloper.com/. They are not loaded using a third-party CDN; they come from Netlify.

Of the 197 KB for the initial page load coming from Netlify, these three files above are 104 KB of that total - over half. These are just three examples. There are other assets which you might be factoring as being hosted elsewhere.

For example, two font file URLs below are the largest sources of bandwidth in this billing cycle:

https://www.thepolyglotdeveloper.com/webfonts/fa-solid-900.woff2 4,994,579,936 bytes
https://www.thepolyglotdeveloper.com/webfonts/fa-brands-400.woff2 4,859,231,552 bytes

That is 9.2 GiB of bandwidth. There are also variations of URLs like these with GET parameters, example (in a code block to preserve the string):

https://www.thepolyglotdeveloper.com/webfonts/fa-solid-900.eot?__WB_REVISION__=a547e21eceadf53602caf057be9ad9fd

That URL used 3,833,266,226 bytes this cycle. All told, I see 40,149,099,122 bytes used for fonts under /webfonts/ at this domain in the current billing cycle.
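
The byte counts above convert to GiB as follows. This is a small sketch that uses only the figures quoted in this post; the helper name is made up.

```python
FA_SOLID_WOFF2_BYTES = 4_994_579_936
FA_BRANDS_WOFF2_BYTES = 4_859_231_552
ALL_WEBFONTS_BYTES = 40_149_099_122   # everything under /webfonts/


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / 2**30


two_largest = to_gib(FA_SOLID_WOFF2_BYTES + FA_BRANDS_WOFF2_BYTES)
all_fonts = to_gib(ALL_WEBFONTS_BYTES)
print(f"two largest font files: {two_largest:.1f} GiB")
print(f"all of /webfonts/:      {all_fonts:.1f} GiB")
```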

When you subtract that from your total bandwidth, we are getting closer to the expected value. And that is just the fonts. If CSS and javascript assets were also factored in, I bet the new bandwidth would be very close to the expected value.

Does this explain the bandwidth discrepancies?

I don’t think your team/account will reach the 100 GB limit this month with the current trends I’m seeing so there shouldn’t be any extra bandwidth charge. If you move these assets elsewhere, I do believe your new monthly bandwidth use will be back to your previous levels.

If there are other questions please let us know.

Hey @luke,

Thanks for getting back to me!

In regards to offloading the static libraries. I had brought them back in prior to your response.

I’m wondering if I’ve been spoiled by Cloudflare and some kind of magic that they do behind the scenes. Maybe they do some kind of shared cache where common resources like my fonts don’t take bandwidth?

Regardless, it does look like bandwidth is mellowing out. Not as low as what I once had, but not the 8GB per day I saw at the beginning.

Some feedback for Netlify:

If I’m paying for Analytics, which I am, I want to be able to access the same information that you see in regards to bandwidth usage. The Analytics package doesn’t give me insight into what the bandwidth is for my font files, etc. Knowing this information would be incredibly helpful so that I don’t need support staff to get involved.

I’m going to sit on the site for a while and monitor what happens.

Thanks,
