AWS Cloudwatch agent config file removed after startup

Question:

Problem

I am simply trying to install Cloudwatch Agent on Amazon Linux 2 instances at startup, using AWS userdata. For some reason, after Cloud Init has finished running, all services get restarted and the configuration file I put in the cloudwatch folder is not there anymore.

I am using a custom AMI which is pre-built with Packer, my configuration file being put in /opt/aws/amazon-cloudwatch-agent/etc/custom/amazon-cloudwatch-agent.json from an Ansible template. This is the configuration file I want to use, holding all metrics and logs I want to send. I am then copying it to /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json at startup after the agent installation.

Here is my userdata script:

What is happening

After startup has finished, I can see the script ran correctly. If I run cat /opt/aws/amazon-cloudwatch-agent/log/amazon-cloudwatch-agent.log I can see that the following:

So as you can see, the initial command from userdata runs fine and custom metrics and logs are collected (see the ====> mark before the relevant lines).
However a few seconds later, after Cloud Init is over, the cloudwatch agent is restarted by systemd somehow and again, somehow, the file amazon-cloudwatch-agent.json is absent from the filesystem, so the agent runs with default parameters.

However if I rerun the command manually after startup everything works fine but of course I need it automated for when autoscaling fires up.

What I have tried

Launching amazon cloudwatch agent directly with systemd, trying to chown the config file to read-only, fetching config only and let the system start the agent itself, but the problem still persists.

Thank you for your help

Answer:

Workaround

The preinstalled ssm-agent conflicts with the Cloudwtach Agent. Uninstall ssm-agent during Packer build:

Explanation

I finally found out that the newly install cloudwatch agent conflicts with the SSM agent installed by default in the Amazon Linux 2 image.
Indeed, I first tried an ugly workaround which would be to replace the StartExec line of the amazon-cloudwatch-agent service using sed in the user data :

That way when the service gets restarted after instance startup it would use my custom configuration.
However I then found out that the service file got also replaced after Cloud Init ended.

Reviewing the system messages I noticed that ssm-agent was performing some configuration reloading after Cloud Init ended, and thus I assumed that it could possibly be the culprit.
I ended up uninstalling it in the packer build which is building my AMI so it would not be present at instance startup, and finally my configuration did not get overwritten anymore.

Note that I do not have a deep understanding of how ssm-agent works, and there is probably a proper way to instantiate Cloudwatch Agent using some SSM configuration.
Since we do not currently use SSM and I do not have enough time to study this option, I choosed this compromise.

If someone can come up with a cleaner solution, using ssm-agent through an automated method, this would be greatly appreciated.

Leave a Reply