Deploy is hanging - PostCSS problem

I recently updated one of my Jekyll plugins, and now deploys seem to hang.

The plugin is Ruby code that runs a Node script, which starts a TCP server to process code with PostCSS. The server prints “PostCSS Server listening on port 8124…” when it starts up, but I’m not seeing that.

The only log line I’m seeing is “Configuration file: /opt/build/repo/_config.yml”, which tells me that Jekyll has started building but hasn’t been able to start the server.

My ideas for the problem are

  • Path issue, the plugin can’t find the script
  • Sandbox issue, Netlify doesn’t allow certain Node.js APIs

I have low observability into both of these, so I’d appreciate any help from Netlify employees who have more domain knowledge of the build infrastructure.
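Both ideas could be checked from inside the build with a few lines of Ruby. The following is only a minimal sketch: the helper names and the script path are hypothetical and not taken from the plugin. It checks whether a file exists, and whether the process can bind and connect to a TCP port on localhost.

```ruby
require "socket"

# Returns true if the given path is an existing, readable file.
def script_found?(path)
  File.file?(path) && File.readable?(path)
end

# Binds a TCP server on localhost using an ephemeral port (port 0) and
# connects to it once. Returns true only if both steps succeed.
def can_listen_on_localhost?
  server = TCPServer.new("127.0.0.1", 0)
  port = server.addr[1]
  client = TCPSocket.new("127.0.0.1", port)
  client.close
  true
rescue SystemCallError => e
  warn "TCP check failed: #{e.class}: #{e.message}"
  false
ensure
  server&.close
end

# Hypothetical script location, for illustration only.
script = File.join(Dir.pwd, "bin", "postcss-server.js")
puts "script found: #{script_found?(script)}"
puts "can listen on localhost: #{can_listen_on_localhost?}"
```

If both lines print `true` in the build log, neither idea explains the hang on its own.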

Deploy log: https://app.netlify.com/sites/mitchell-hanberg/deploys/5ec0221cd148e6000625f9c3
Repo: https://github.com/mhanberg/blog
Plugin: https://github.com/mhanberg/jekyll-postcss

Thanks!

I went ahead and assumed that you can’t start a TCP server, and changed the gem to only start the server in development, and just shell out the command in all other environments.

My problem has been solved, but I’d still like to know from a Netlify employee if that is an actual limitation.

Thanks!

Hey @mhanberg,
Very glad to hear that you found a workaround! I was digging into logs on our side to try to understand what was happening but wasn’t finding anything particularly useful: basically just build timeouts after 15-20 minutes of trying, which you could already see in the logs you shared.

I think you’re probably right about a TCP server not working during the build, but I didn’t want to say anything conclusive without checking with colleagues… who will be back in the “office” on Monday.

I will leave this tagged “unanswered” until then and hopefully you’ll get more of an answer then!

If you really want, one thing that might give you a little more insight would be downloading our build image and building your site in there. There are instructions here under “What next?”:

Or here’s the link right to the repo:

you can start a TCP server - it’s just not super useful or easy to do in a good way:

  • the server would not be reachable directly over the internet, so only connections from localhost would be accepted
  • you’d have a hard time finding a way to stop it after it runs, unless you used some magic around running things in the background and setting timers to kill them. Our build won’t exit until ALL your processes exit, servers and all, and if they haven’t exited before time runs out you’ve paid for the CPU time but gotten no deploy published, since “didn’t fully exit all processes” == “unsuccessful” by our standards.
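The “timers to kill them” idea could look roughly like this in Ruby. It is a sketch under assumptions, not something the build system provides: `run_with_deadline` is a hypothetical helper, and the child process is a placeholder Ruby one-liner standing in for a long-running server.

```ruby
require "rbconfig"
require "timeout"

# Spawns `command` as a child process and waits up to `limit` seconds.
# If the child is still running after that, sends TERM and reaps it so
# no process is left behind. Returns :exited or :killed.
def run_with_deadline(command, limit)
  pid = Process.spawn(*command)
  begin
    Timeout.timeout(limit) { Process.wait(pid) }
    :exited
  rescue Timeout::Error
    Process.kill("TERM", pid)
    Process.wait(pid)
    :killed
  end
end

# Placeholder children: one that would outlive the deadline, one that exits at once.
long_running = [RbConfig.ruby, "-e", "sleep 30"]
short_lived  = [RbConfig.ruby, "-e", "exit 0"]

puts run_with_deadline(long_running, 1)
puts run_with_deadline(short_lived, 10)
```

The point of reaping with `Process.wait` after the kill is that the parent only moves on once the child is really gone, which is what the “ALL your processes exit” condition requires.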

I also have this problem on gallant-wright-da5879

I’m deploying a new site using the same configurations as other sites I’ve deployed in the past, but with a couple of packages upgraded. The package postcss-cli had a vulnerability that was shown in an earlier deploy. I thought that might be the problem.

After committing, it gets to the build command and times out on /opt/build/repo/_config.yml with no output, as follows:

5:17:39 PM: Executing user command: jekyll build
5:17:40 PM: Configuration file: /opt/build/repo/_config.yml
5:43:55 PM: Build exceeded maximum allowed runtime

Example build: https://app.netlify.com/sites/gallant-wright-da5879/deploys/5ec7fa4046115b0006d3a03f

How long does it take to build locally? That sounds like a pretty normal “long build” that our system protects against…

This build was a fork off the same template: https://app.netlify.com/sites/silly-rosalind-46cc1c/deploys/5e4450cecfa8940008a67ad0

That took 1m2.073214756s

There is a difference between the bundle and the jekyll build though:

silly-rosalind:

7:24:25 PM: Gem bundle installed
7:24:25 PM: Started restoring cached node modules
7:24:25 PM: Finished restoring cached node modules
7:24:25 PM: Started restoring cached go cache
7:24:25 PM: Finished restoring cached go cache
7:24:25 PM: unset GOOS;
7:24:25 PM: unset GOARCH;
7:24:25 PM: export GOROOT='/opt/buildhome/.gimme/versions/go1.12.linux.amd64';
7:24:25 PM: export PATH="/opt/buildhome/.gimme/versions/go1.12.linux.amd64/bin:${PATH}";
7:24:25 PM: go version >&2;
7:24:25 PM: export GIMME_ENV='/opt/buildhome/.gimme/env/go1.12.linux.amd64.env';
7:24:25 PM: go version go1.12 linux/amd64
7:24:25 PM: Installing missing commands
7:24:25 PM: Verify run directory
7:24:25 PM: Executing user command: jekyll build
7:24:26 PM: Configuration file: /opt/build/repo/_config.yml

gallant-wright:

4:31:26 PM: Gem bundle installed
4:31:26 PM: 5.2 is already installed.
4:31:26 PM: Using Swift version 5.2
4:31:26 PM: Started restoring cached node modules
4:31:26 PM: Finished restoring cached node modules
4:31:26 PM: Installing NPM modules using NPM version 6.14.4
4:31:31 PM: npm WARN repo No repository field.
4:31:31 PM: npm WARN repo No license field.
4:31:31 PM: npm WARN optional SKIPPING OPTIONAL DEPENDENCY: fsevents@2.1.3 (node_modules/fsevents):
4:31:31 PM: npm WARN notsup SKIPPING OPTIONAL DEPENDENCY: Unsupported platform for fsevents@2.1.3: wanted {"os":"darwin","arch":"any"} (current: {"os":"linux","arch":"x64"})
4:31:31 PM: added 195 packages from 135 contributors and audited 196 packages in 3.385s
4:31:31 PM: 12 packages are looking for funding
4:31:31 PM:   run `npm fund` for details
4:31:31 PM: found 0 vulnerabilities
4:31:31 PM: NPM modules installed
4:31:31 PM: Started restoring cached go cache
4:31:31 PM: Finished restoring cached go cache
4:31:31 PM: go version go1.12 linux/amd64
4:31:31 PM: go version go1.12 linux/amd64
4:31:31 PM: Installing missing commands
4:31:31 PM: Verify run directory
4:31:31 PM: Executing user command: jekyll build
4:31:32 PM: Configuration file: /opt/build/repo/_config.yml

Sorry, I see you said locally.

Locally:

$ jekyll build
Configuration file: path/to/project/_config.yml
            Source: path/to/project/
       Destination: path/to/project/_site
 Incremental build: disabled. Enable with --incremental
      Generating... 
PostCSS Server listening on port 8124...
       Jekyll Feed: Generating feed for posts
                    done in 2.295 seconds.
 Auto-regeneration: disabled. Use --watch to enable.

Hi, @JalisoCSP, I’m able to reproduce this. It works locally but not in our build system.

Something different is happening with this build at Netlify. I think the issue is likely due to something related to how the PostCSS Server process is forked.

As a test to confirm this (which would narrow the search area for the root cause), I would like to disable this server. Is it possible to build the site with PostCSS disabled? If so, how is that done?

Hey @luke

You’d need to remove jekyll-postcss from the Gemfile and _config.yml (and bundle again)

I’ve done that and successfully redeployed here: https://app.netlify.com/sites/gallant-wright-da5879/deploys/5ed0dd7c9cf3f30006031b2d

… but now I don’t have postcss :thinking:

Is this a Netlify issue or is there something I can do?

Hey team,

I’ve done some more digging and had a successful deploy downgrading jekyll-postcss to v0.2.2

There is a breaking change in the next version (0.3.0), as listed here: https://github.com/mhanberg/jekyll-postcss/blob/master/CHANGELOG.md#030

[Breaking?]: Uses postcss instead of postcss-cli. I think that it will continue to work without changing your dependencies since postcss-cli uses postcss as a dependency.

I’ve raised an issue on their Github here: https://github.com/mhanberg/jekyll-postcss/issues/15

Thanks

Hello all, sorry I didn’t see this sooner, it seems I don’t have email notifications set up correctly.

@fool

the server would not be reachable directly over the internet, so only connections from localhost would be accepted.

The server is a development convenience, and is not meant to be accessed over the internet

you’d have a hard time finding a way to stop it after it runs, unless you used some magic around running things in the background and setting timers to kill them - our build won’t exit until ALL your processes exit - servers and all - and if they haven’t exited before time runs out you’ve paid for the CPU time, but gotten no deploy published since “didn’t fully exit all processes” == “unsuccessful” by our standards.

I use this shell script to start the Node.js process, which ensures that it is killed when the calling process (Jekyll, in this case) dies.

@luke

As a test to confirm this (which would narrow the search area for the root cause), I would like to disable this server. Is it possible to build the site with PostCSS disabled? If so, how is that done?

My workaround for this was to only start the node.js TCP server when JEKYLL_ENV=development or is not set. So you can disable it by setting JEKYLL_ENV=production. This is available in v0.3.1 and higher.
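The rule described above can be sketched as a small Ruby predicate. This is an illustration only, not the gem’s actual code; `use_dev_server?` is a hypothetical name, and the environment is passed in as a hash so the rule can be tried without changing the real `ENV`.

```ruby
# Returns true when the long-lived TCP server should be started:
# JEKYLL_ENV is "development" or is not set at all. Any other value
# (for example "production") means the server is skipped.
def use_dev_server?(env = ENV)
  value = env["JEKYLL_ENV"]
  value.nil? || value == "development"
end

puts use_dev_server?({})                                 # JEKYLL_ENV not set
puts use_dev_server?({ "JEKYLL_ENV" => "development" })
puts use_dev_server?({ "JEKYLL_ENV" => "production" })
```

With a rule like this, setting JEKYLL_ENV=production in the build environment is enough to keep the server from starting during a deploy.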

@JalisoCSP

I’ve done some more digging and had a successful deploy downgrading jekyll-postcss to v0.2.2

There is a breaking change in the next version (0.3.0), as listed here: https://github.com/mhanberg/jekyll-postcss/blob/master/CHANGELOG.md#030

That breaking change is not related to this Netlify issue.

Hi, @mhanberg, if one version of Jekyll PostCSS works on our build system and another version does not, this would imply that a change in Jekyll PostCSS is causing this issue. It doesn’t prove it, but it seems likely to me that some change in the dependency itself is either a) the root cause of the issue or b) triggering some issue in the build system not seen with an earlier version of the dependency.

In situations like these it is often helpful to add more verbose logging to identify where in the code the build process is getting stuck.

Have you tried adding logging before starting and after stopping PostCSS in the build, or does such logging already exist? Do all lines get logged during the deploy? (If so, would you send us a link to those deploy logs and let us know what logging to look for?) Also, is it possible to get PostCSS itself to log when it exits? If so, would you test that as well and let us know if this logging occurs or not?
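As a rough idea of what such logging could look like in the plugin’s Ruby code, here is a minimal sketch. `log_step` is a hypothetical helper, not part of Jekyll or jekyll-postcss, and the `sleep` call is a placeholder for starting or stopping PostCSS. Each line is flushed immediately so it reaches the deploy log even if a later step hangs.

```ruby
# Wraps a step with timestamped "start" and "finish" lines (or "failed"
# if the block raises). Output is flushed after every line so it is not
# lost in a buffer when the process later hangs or is killed.
def log_step(name, out: $stdout)
  stamp = -> { Time.now.utc.strftime("%H:%M:%S") }
  out.puts "[#{stamp.call}] start: #{name}"
  out.flush
  result = yield
  out.puts "[#{stamp.call}] finish: #{name}"
  out.flush
  result
rescue StandardError => e
  out.puts "[#{stamp.call}] failed: #{name} (#{e.class}: #{e.message})"
  out.flush
  raise
end

# Placeholder step; in the plugin this would be the code that starts the server.
log_step("start PostCSS server") { sleep 0.1 }
```

A “start” line with no matching “finish” or “failed” line in the deploy log would point at the step where the build is stuck.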
