Toger Blog

Cloud-Init With CentOS7, Chef, and SELinux

The CentOS 7 AMI in Amazon comes with Cloud-Init (cloud-init-0.7.5-10.el7.centos.1.x86_64). This is quite handy as it assists in automating several bootup tasks. One of these tasks is to install and bootstrap Chef. Unfortunately, when SELinux is installed the Chef handler will fail.

Sample error:

[CLOUDINIT] util.py[DEBUG]: Restoring selinux mode for /var/lib (recursive=True)
[CLOUDINIT] util.py[DEBUG]: Running chef (<module 'cloudinit.config.cc_chef' from '/usr/lib/python2.7/site-packages/cloudinit/config/cc_chef.py'>) failed
       Traceback (most recent call last):
         File "/usr/lib/python2.7/site-packages/cloudinit/stages.py", line 658, in _run_modules
           cc.run(run_name, mod.handle, func_args, freq=freq)
         File "/usr/lib/python2.7/site-packages/cloudinit/cloud.py", line 63, in run
           return self._runners.run(name, functor, args, freq, clear_on_fail)
         File "/usr/lib/python2.7/site-packages/cloudinit/helpers.py", line 197, in run
           results = functor(*args)
         File "/usr/lib/python2.7/site-packages/cloudinit/config/cc_chef.py", line 54, in handle
           util.ensure_dir(d)
         File "/usr/lib/python2.7/site-packages/cloudinit/util.py", line 1291, in ensure_dir
           os.makedirs(path)
         File "/usr/lib/python2.7/site-packages/cloudinit/util.py", line 167, in __exit__
           self.selinux.restorecon(path, recursive=self.recursive)
         File "/usr/lib64/python2.7/site-packages/selinux/__init__.py", line 95, in restorecon
           for fname in fnames]), None)
         File "/usr/lib64/python2.7/posixpath.py", line 246, in walk
           walk(name, func, arg)
         File "/usr/lib64/python2.7/posixpath.py", line 238, in walk
           func(arg, top, names)
         File "/usr/lib64/python2.7/site-packages/selinux/__init__.py", line 95, in <lambda>
           for fname in fnames]), None)
         File "/usr/lib64/python2.7/site-packages/selinux/__init__.py", line 85, in restorecon
           status, context = matchpathcon(path, mode)
       OSError: [Errno 2] No such file or directory

Extending AWS Instance Trust

All nodes in EC2 can fetch their Instance Identity Documents. This returns an AWS-signed block containing data about the instance that requested it. This data is only available to the instance that fetched it, so if it turns around and presents it to some other service you can be confident it originated from the machine in question.

This is useful for secrets-enrollment processes where you want a node to be able to attest to its own identity in a secure fashion. Your enrollment mechanism can ensure that the node is ‘young’ enough to be making the request, and that the enrollment has never occurred before. Your enrollment tool can also look up other data about the instance (autoscale group, CloudFormation stack, etc.) to determine what privileges it should be granted.

In this manner you can pivot on the security features AWS offers for identifying a particular node to another tool.
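The ‘young enough’ and ‘never enrolled before’ checks described above could be sketched as follows. This is only an illustration: it assumes the document's AWS signature has already been verified elsewhere, the function name may_enroll, the in-memory seen_instance_ids set, and the 10-minute window are hypothetical, and instanceId / pendingTime are the identity document fields it relies on.

```python
from datetime import datetime, timedelta, timezone


def may_enroll(doc, seen_instance_ids, now, max_age=timedelta(minutes=10)):
    """Decide whether an instance may enroll.

    `doc` is the parsed Instance Identity Document. Verifying its AWS
    signature is assumed to have happened before this function is called.
    """
    # pendingTime is when the instance launch was requested (UTC).
    launched = datetime.strptime(
        doc["pendingTime"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    age = now - launched

    # Reject nodes that are too old (or claim to be from the future).
    if age < timedelta(0) or age > max_age:
        return False

    # Reject a second enrollment attempt for the same instance.
    if doc["instanceId"] in seen_instance_ids:
        return False

    seen_instance_ids.add(doc["instanceId"])
    return True
```

A real enrollment service would keep the set of seen instance IDs in durable storage rather than in memory.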

Passing Host IP to ECS

I’ve been looking for a way to run Consul ‘standalone’ on a host and let multiple ECS containers connect to it. I was hoping to find a macro I could put in a container definition but such does not yet exist. Instead I realized that I can have my Docker instance query the instance metadata service and get this information. It is not quite as elegant docker-wise but it should work until something better comes along.
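A minimal sketch of that metadata lookup, as it might run inside a container at startup. The function name host_ip and the injectable fetch parameter are hypothetical (the latter exists only so the logic can be exercised without network access), and it assumes the container is allowed to reach the metadata service at 169.254.169.254.

```python
from urllib.request import urlopen

# Instance metadata path that returns the host's private IPv4 address.
METADATA_URL = "http://169.254.169.254/latest/meta-data/local-ipv4"


def host_ip(fetch=None, timeout=2):
    """Return the EC2 host's private IP as a string."""
    if fetch is None:
        # Default: do a real HTTP GET against the metadata service.
        def fetch(url):
            with urlopen(url, timeout=timeout) as resp:
                return resp.read().decode("ascii")
    return fetch(METADATA_URL).strip()
```

The returned address can then be handed to whatever needs to reach the Consul agent running on the host.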

Chef Too Big for the Kitchen

While pondering running a full ‘ride-the-wave’ auto-scaling solution in AWS, I looked closely at my Chef installation. The environment is very Chef-heavy, with a fairly generic AMI that runs a slew of Chef recipes to bring it up to the needs of that particular role. The nodes are in Autoscale groups and get their role designation from the Launch Configuration.

On average a node invoked approximately 55 recipes (as recorded by seen_recipes in the audit cookbook). Several of those recipes bring in resources from (authenticated) remote locations that have very good availability but are not in my direct control. Even ignoring the remote-based recipes, there is still a significant number of moving parts that can be disrupted unexpectedly, such as by other cookbook / role / KV store changes. This is acceptable-if-not-ideal when nodes are generally brought into being under the supervision of an operator who can resolve any issues, or when the odd 1-of-x00 pooled node dies and is automatically replaced. This risk is manageable when the environment is perpetually scaled for peak traffic.

However, when critical nodes are riding the wave of capacity, the chance that something will eventually break during scale-up and cause the ‘wave’ to swamp the application becomes 100%. That requires an operator to fix the problem under a significant time crunch as the application is overwhelmed by traffic, which is hardly a recipe for success. The more likely scenario is breakage at some odd hour of the morning as users wake up, and the application fails before an operator can intervene to keep it alive.
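To put rough numbers on ‘becomes 100%’, here is a back-of-the-envelope sketch. It assumes every recipe run succeeds or fails independently with the same probability, which is a simplification, and the 99.9% per-recipe success rate and 100 launches used in the example are made-up figures, not measurements from my environment.

```python
def p_any_failure(p_step_ok, steps_per_launch, launches):
    """Probability that at least one launch fails somewhere.

    p_step_ok        -- chance a single recipe converges cleanly
    steps_per_launch -- recipes run per node (about 55 in my case)
    launches         -- number of scale-up launches considered
    """
    # A launch succeeds only if every one of its recipes succeeds.
    p_launch_ok = p_step_ok ** steps_per_launch
    # At least one failure = 1 - (every launch succeeds).
    return 1.0 - p_launch_ok ** launches


# Hypothetical: 99.9% reliable recipes, 55 per node, 100 launches.
print(p_any_failure(0.999, 55, 100))
```

Even with very reliable individual recipes, the result gets close to 1 once enough launches have happened.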

I looked at my Chef construction and realized it was less Infrastructure As Code (IaC) and more like Compile In Production (CIP).

AWS ECS and Docker Exit (137)

I ran into this the other day: my ECS instances were dying off and docker ps showed Exited (137) About a minute ago. Looking at docker inspect I noticed:

"State": {
 "FinishedAt": "2015-09-20T21:38:58.188768082Z",
    "OOMKilled": true
  },

This tells me that ECS / Docker enforced the memory limit for the container, and the out-of-memory-killer killed off the contained processes. Raising the ECS memory limit for this process resolved the issue.
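For reference, exit code 137 is 128 + 9, i.e. the process was ended by SIGKILL, which is what the OOM killer sends. A small sketch of reading the State block from docker inspect along those lines; the helper name explain_exit is hypothetical, and it assumes the State dictionary carries the ExitCode and OOMKilled fields.

```python
def explain_exit(state):
    """Summarize why a container stopped, given its `State` dict."""
    code = state.get("ExitCode")
    # OOMKilled is the strongest hint: the memory limit was enforced.
    if state.get("OOMKilled"):
        return "killed by the OOM killer (memory limit reached)"
    # Exit codes above 128 conventionally mean 128 + signal number.
    if isinstance(code, int) and code > 128:
        return "terminated by signal %d" % (code - 128)
    return "exited with code %s" % code


print(explain_exit({"ExitCode": 137, "OOMKilled": True}))
```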

Carl’s Jr, SPF and AWS

Carl’s Jr has a nifty nutritional calculator / order planner at http://www.carlsjr.com/menu/nutritional_calculator. It lets you fully customize your meal, then lets you print or email your order to yourself with all the magic words to say to get your meal as planned (subbing cheese, extra / 2x / no onion, etc).

Tonight I used this to pre-assemble a highly customized meal for my family. I triggered it to send me an email (easier to read on my phone at the order window) and anxiously awaited.

No email was received.

I host my own email and use http://rollernet.us for my public incoming MX relays. They are nifty because they have a ton of highly configurable anti-spam features that I can apply ‘at the edge’, which lets my actual mailserver run much leaner since SpamAssassin et al. are resource intensive.

Rollernet logs stated:

Connection from 54.236.226.113 rejected by mail.rollernet.us
From: carlsjr@carlsjr.com
To: my_email  
Reason: SPF fail (Mechanism -all matched)

Oh ho! I temporarily disabled SPF checking (and greylisting) and sent another meal through, and the email header said:

Received-SPF: fail (carlsjr.com: Sender is not authorized by default to use 'carlsjr@carlsjr.com' in 'mfrom' identity (mechanism '-all' matched)) receiver=mail2.rollernet.us; identity=mailfrom; envelope-from="carlsjr@carlsjr.com";
        helo=ip-10-198-0-85.localdomain; client-ip=54.236.168.30

I fetched their SPF record with http://www.kitterman.com/spf/validate.html (though any DNS fetching tool would work) and received:

v=spf1 ip4:63.168.109.0/24 ip4:67.203.173.0/26 ip4:216.87.35.224/27 mx include:spf.protection.outlook.com -all

This tells me they use outlook.com for their internal email, and only allow a few subnets to originate mail from them, and the source IP I got was not one of them. 54.236.168.30 resolves to ec2-54-236-168-30.compute-1.amazonaws.com. www.carlsjr.com resolves to what appears to be a CloudFormation-based AWS environment:

$ host www.carlsjr.com
www.carlsjr.com is an alias for CKEMKTPRDLB-20130419-1810707626.us-east-1.elb.amazonaws.com.
CKEMKTPRDLB-20130419-1810707626.us-east-1.elb.amazonaws.com has address 54.236.231.179
CKEMKTPRDLB-20130419-1810707626.us-east-1.elb.amazonaws.com has address 52.1.95.39

I suspect Carl’s Jr’s AWS-based web farm is generating the outgoing mails directly, and they have not accounted for that in their SPF configuration. Using Amazon Simple Email Service (SES) would have accounted for this already (http://docs.aws.amazon.com/ses/latest/DeveloperGuide/authenticate-domain.html). This could also be handled by designating an outgoing mail host in their AWS environment with an Elastic IP attached and adding it to their SPF record.
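The ip4 part of that SPF evaluation is easy to reproduce. The sketch below only checks the literal ip4: mechanisms of the record quoted above against the client IP from the header; it does not resolve the mx or include: mechanisms and ignores qualifiers, so it is a partial check rather than a full SPF implementation. The helper names are hypothetical.

```python
import ipaddress

RECORD = (
    "v=spf1 ip4:63.168.109.0/24 ip4:67.203.173.0/26 "
    "ip4:216.87.35.224/27 mx include:spf.protection.outlook.com -all"
)


def ip4_networks(spf_record):
    """Extract the networks named by ip4: mechanisms in an SPF record."""
    nets = []
    for term in spf_record.split():
        if term.startswith("ip4:"):
            nets.append(ipaddress.ip_network(term[len("ip4:"):]))
    return nets


def matches_ip4(spf_record, client_ip):
    """True if client_ip falls inside any ip4: network of the record."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ip4_networks(spf_record))


# The sending address from the Received-SPF header.
print(matches_ip4(RECORD, "54.236.168.30"))
```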

I sent them an email detailing the issue at their corporate email address. I’ll update if I hear back, but I don’t expect it will ever reach anyone who knows what to do with it.

Update: I got an email back stating the issue was being routed ‘to the appropriate department’.

Varnish Cache and req.backend.healthy

An odd issue I ran into the other day: I had a Varnish 3 instance with logic hinging on req.backend.healthy to show a special error page if all the backends were down. That logic inexplicably triggered even though all my backends were up! After much head-scratching I identified the issue: an old VCL was still loaded whose version of the director no longer had any healthy backends (due to repeated autoscaling up / down), even though the current definition of that director did. Varnish has a habit of not letting go of old VCLs even if you run vcl.discard on them. So, req.backend.healthy will show the director as down if any prior definition of that director is down. Since the only way to definitively remove VCLs from memory is a restart (which flushes the in-memory cache), this makes req.backend.healthy fairly unreliable.

This is in v3 and may not apply to v4 anymore.

2FA and Minecraft Server

Two Factor Authentication (2FA) is an additional layer of protection you can add to your Minecraft server. You should already be relying on SSH keys to access your server, but those keys can be lost or leaked if you use them on untrusted machines. 2FA protects you from hacks resulting from someone gaining access to your password or SSH key.

DuoSecurity is a company that provides a 2FA service that is free for personal use. A Mobile Push notification is sent via their Android / iOS app whenever a login is attempted and requires an affirmative response before login can proceed. There are also SMS and computer-generated voice calls, but those consume credits that have a cost to refill. There is also a cost for going over 10 users, which should not be an issue for administration of a Minecraft server.

I performed a source install to get the latest version; there are also packages for RHEL/CentOS/Debian/Ubuntu on their website.

They offer an SSH-specific installation and a PAM installation that covers all auth on the machine. PAM is likely the more comprehensive solution but is a more involved process, plus it requires an SELinux policy update (and the server resulting from this series has SELinux enabled). Their website tells you how to install the relevant SELinux policy but the necessary objects are missing from their download. I’ve sent them a mail and will update if I get clarification. For the record, the error I got was:

make: Entering directory `/home/centos/duo_unix-1.9.14/pam_duo'
checkmodule -M -m -o authlogin_duo.mod authlogin_duo.te
checkmodule:  loading policy configuration from authlogin_duo.te
checkmodule:  unable to open authlogin_duo.te
make: [semodule] Error 1 (ignored)
semodule_package -o authlogin_duo.pp -m authlogin_duo.mod
semodule_package:  Could not open file No such file or directory:  authlogin_duo.mod
make: [semodule] Error 1 (ignored)
semodule -i authlogin_duo.pp
semodule:  Failed on authlogin_duo.pp!
make: [semodule] Error 1 (ignored)
make: Leaving directory `/home/centos/duo_unix-1.9.14/pam_duo'

SSH setup is straightforward. Install the DuoSecurity app on your mobile device. Create an account on their website. Navigate their portal and select Applications, then +New Application. Give it the name Minecraft SSH. This will result in a window showing an Integration key, a Secret key (requiring a click to show), and an API hostname. Collect those and store them to the side (securely!).

https://www.duosecurity.com/docs/duounix#instructions describes the compile process:

wget https://dl.duosecurity.com/duo_unix-latest.tar.gz
tar zxf duo_unix-latest.tar.gz
cd duo_unix-1.9.14
./configure --prefix=/usr && make && sudo make install

After installing the binaries, modify /etc/duo/login_duo.conf and fill in the ikey/skey/host values from above. Uncomment pushinfo as well so that more details about the login are sent to you.

https://www.duosecurity.com/docs/duounix#centos describes modifying SSH to work with the new system. Keep a separate session logged in as root open in another window while attempting this so you can back out if you make a mistake. Do not log out of the ‘backup’ window until you’ve thoroughly exercised the system. I will not be responsible if you lock yourself out.

Add ForceCommand /usr/sbin/login_duo to the end of /etc/ssh/sshd_config, then restart sshd (which does not kick out existing sessions) with systemctl restart sshd.service

Now attempt to log in to your account. You will be prompted to enroll in 2FA. Upon logging in again you will be prompted for your 2FA method, which will look like:

$  ssh minecraft.example.com -i ~/.ssh/minecraft.pem -l centos
Duo two-factor login for centos

Enter a passcode or select one of the following options:

 1. Duo Push to XXX-XXX-1234
 2. Phone call to XXX-XXX-1234
 3. SMS passcodes to XXX-XXX-1234

I always choose #1 as I do not want to expend credits, though if push failed I could resort to the other two. When I choose #1 I get a popup on my phone asking me to accept / deny the login, and after choosing Accept I am able to log in.

Now I do not have to worry someone will get ahold of my minecraft.pem and gain access to the server. Further, if I get a login-request on my phone and haven’t attempted to log in, I know my key has been compromised.

DSL Modem and Power Strips

I had cause to call CenturyLink support recently. As one of the troubleshooting steps they seriously claimed that a power strip is not capable of providing sufficient power to a DSL modem, and that it must be connected directly to the wall to receive sufficient power.

I suspect this was to force me to unplug and powercycle the modem, but I had done that several times already. The idea that a DSL modem draws more power than will flow through a power strip is ridiculous.

Minecraft and Datadog Monitoring

DataDog is a nifty monitoring / statistics gathering system. It is akin to a combination of Graphite / Grafana, but with a social aspect so that your team can attach discussions to a given point in time. They have a free tier that retains data for a day, which is handy for visualizing the state of the Minecraft server.

Java applications normally expose their statistics via JMX. I did not see anything Minecraft-specific in my stock instance, but Java itself exposes several counters that are informative.

I created my Datadog account, procured my API key, and installed the agent with:

DD_API_KEY=MyAPIKey  bash -c "$(curl -L https://raw.githubusercontent.com/DataDog/dd-agent/master/packaging/datadog-agent/source/install_agent.sh)"

JMX is not enabled by default for Java processes, so I updated my systemd unit file in /etc/systemd/system/minecraft.service to include the JMX configuration: