Toger Blog

DNS as a Service Discovery Load Balancer

DNS makes for a deceptively easy service-discovery platform. Various platforms (such as Docker Swarm and Amazon Auto-Naming) purport to provide a trivial-to-use service discovery mechanism. DNS is handled natively by most applications, so it has the allure of working with legacy applications with zero integration work. Contrast DNS with purpose-built service discovery mechanisms such as Consul, ZooKeeper, etcd, or Netflix Eureka: these tools require additional effort on the part of developers to integrate, but the result is much more robust. I tested several platforms below to see how they handle DNS round-robin load balancing.
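
The behavior under test is simply a service name resolving to multiple A records, leaving the resolver (and the client's resolver library) to pick among them. Something like the following, where the service name is hypothetical:

$ dig +short web.service.example.internal
10.0.1.12
10.0.2.47
10.0.3.9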

Home Automation Ideas

My house has a bevy of Z-Wave devices, plus a few LiFX bulbs. This has allowed for some fairly fun automations:

  • When the garage door opens, a tilt sensor attached to the door turns the external bulbs on to bright white and activates the garage's internal lights. The outside lights revert to normal ‘dim 2700K’ after 3 minutes, and the garage lights go off after about 10 minutes.

  • A chime sounds upstairs whenever an exterior door is opened.

  • The exterior lights come on at ‘civil twilight’ and extinguish at ‘civil dawn’.

  • The exterior doors lock whenever the garage door closes. No more ‘Did I lock the front door?’ doubts.

  • The exterior doors lock at a certain time each night.

  • The kids’ room and desk lamps turn off at midnight. Sometimes they get up after bedtime and ‘play’ then fall asleep.

  • The bathroom and downstairs hallway lights turn on concurrent with my morning alarm. It is much easier to get up and walk into a lit room than into pitch black.

  • Telling Alexa to ‘tell my house it’s bedtime’ locks all exterior doors, turns off all lights not in the kids’ rooms, turns on the stairway and master bath lights, then turns off the stairway light after 5 minutes.

  • Alexa can be instructed to turn the downstairs lights on or off, individually or collectively.

  • The Roku in the master bedroom, which inexplicably has an audible fan, turns off at midnight.

  • The outlet adjacent to the master bedroom turns off at 4am. This used to control a window fan with the goal of pulling in cool air at night but turning off before anyone gets up to avoid blasting them with frigid air. This rule has been rendered moot with the installation of an AirScape whole-house fan.

  • The pantry and coat-closet lights turn off after 5 minutes.

  • A notification is sent each day during the summer when the exterior temperature crosses over the interior temperature, to either open or close the windows and to turn on the house fan.

This is all accomplished with a combination of Z-Wave devices, a Z-Wave.Me RaZberry Pi, a few LiFX bulbs, an Amazon Echo, and an AirScape fan with 2nd-gen controls.

Home automation can be quite addictive; it’s impossible to install just one device!

Jenkinsfile and Github Hooks

Jenkins is the juggernaut of the build pipeline ecosystem. Its roots go back to bespoke local build pipelines and dedicated build engineers. These days the likes of CircleCI, CodeShip, and Drone.io have started nipping at Jenkins’ heels.

One of the significant differences is that, historically, Jenkins builds are configured inside Jenkins itself. The contemporary alternatives generally favor an SCM-managed configuration file (often in YAML format). This is useful as it removes the specialness of the Golden Build Host (ever had someone corrupt a Jenkins configuration? Getting back to par is a pain).

Jenkins has joined the in-repo build configuration trend with its Jenkinsfile. Unfortunately the documentation for this format is abysmal. Truly wretched. The other day I was trying to make use of the Multibranch Workflow (aka Pipelines) feature in Jenkins and attempted to get a GitHub-triggered build hook working. After hours of googling for what this should look like, I eventually stumbled across the sample below. Having found it, I went looking for any official documentation from which I could have discerned this as the correct answer, and came up empty. I truly do not know how anyone makes all but the most trivial Jenkinsfiles function.

Having ranted about that, the resulting sample is:

Jenkinsfile
// Register a GitHub push trigger on this Pipeline job
properties([
  pipelineTriggers([
    [$class: "GitHubPushTrigger"]
  ])
])

// Run on an agent labeled 'docker'
node('docker') {
  // Check out the same revision that triggered this build
  checkout scm

  stage('Create Docker Image') {
    sh 'docker build -t dockerhub.example.com/example:latest .'
  }
}
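
One gotcha worth noting: properties() are registered when the Pipeline actually executes, so the job needs at least one run (manual or via branch indexing) before GitHub pushes will start triggering builds.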

AWS Route53 Failover and ALB

I ran across this little gotcha recently.

Scenario: a service hosted in two regions. Each region is fronted by either an ALB or an ELB with an attached Autoscale group that uses the load balancer healthcheck to determine instance health. A Route53 configuration balances traffic between the two. Route53 ‘Evaluate Target Health’ is set to yes and no healthcheck is attached.

Under the ELB, if the backend application fails in a region, the failed healthchecks trigger termination of the application nodes. Route53 will consider the region unhealthy if all the backends are sick or unregistered, and will fail over to the remaining region.

Under the ALB, the same occurs, except that Route53 does not consider an empty ALB unhealthy and will continue to send traffic to a region with no registered backends.

This is understandable to a certain extent, as ALBs allow attaching multiple target groups and it’s not immediately obvious what to do when there is a mix of statuses. I suspect the common case is an ALB with exactly one target group attached, though; that group’s status could be used, or Route53 could be allowed to bind to a specific target group.

The current workaround is to use a Route53 Healthcheck (at an additional $1+ per month per check) to have Route53 perform an application healthcheck against each origin.
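
A sketch of creating such a check with the AWS CLI (the domain, path, and caller reference here are hypothetical):

$ aws route53 create-health-check \
    --caller-reference app-use1-check-1 \
    --health-check-config '{
        "Type": "HTTPS",
        "FullyQualifiedDomainName": "app-us-east-1.example.com",
        "ResourcePath": "/healthcheck",
        "RequestInterval": 30,
        "FailureThreshold": 3
      }'

The returned health check ID then gets attached to the Route53 record set for that region.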

Cloud-Init With CentOS7, Chef, and SELinux

The CentOS 7 AMI in Amazon comes with Cloud-Init (cloud-init-0.7.5-10.el7.centos.1.x86_64). This is quite handy, as it assists in automating several boot-up tasks. One of these tasks is to install and bootstrap Chef. Unfortunately, when SELinux is enabled the Chef handler will fail.

Sample error:

[CLOUDINIT] util.py[DEBUG]: Restoring selinux mode for /var/lib (recursive=True)
[CLOUDINIT] util.py[DEBUG]: Running chef (<module 'cloudinit.config.cc_chef' from '/usr/lib/python2.7/site-packages/cloudinit/config/cc_chef.py'>) failed
       Traceback (most recent call last):
         File "/usr/lib/python2.7/site-packages/cloudinit/stages.py", line 658, in _run_modules
           cc.run(run_name, mod.handle, func_args, freq=freq)
         File "/usr/lib/python2.7/site-packages/cloudinit/cloud.py", line 63, in run
           return self._runners.run(name, functor, args, freq, clear_on_fail)
         File "/usr/lib/python2.7/site-packages/cloudinit/helpers.py", line 197, in run
           results = functor(*args)
         File "/usr/lib/python2.7/site-packages/cloudinit/config/cc_chef.py", line 54, in handle
           util.ensure_dir(d)
         File "/usr/lib/python2.7/site-packages/cloudinit/util.py", line 1291, in ensure_dir
           os.makedirs(path)
         File "/usr/lib/python2.7/site-packages/cloudinit/util.py", line 167, in __exit__
           self.selinux.restorecon(path, recursive=self.recursive)
         File "/usr/lib64/python2.7/site-packages/selinux/__init__.py", line 95, in restorecon
           for fname in fnames]), None)
         File "/usr/lib64/python2.7/posixpath.py", line 246, in walk
           walk(name, func, arg)
         File "/usr/lib64/python2.7/posixpath.py", line 238, in walk
           func(arg, top, names)
         File "/usr/lib64/python2.7/site-packages/selinux/__init__.py", line 95, in <lambda>
           for fname in fnames]), None)
         File "/usr/lib64/python2.7/site-packages/selinux/__init__.py", line 85, in restorecon
           status, context = matchpathcon(path, mode)
       OSError: [Errno 2] No such file or directory
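
The failure is cloud-init creating Chef's directories and then asking SELinux to restorecon a path that does not exist yet. One possible workaround is to pre-create the directories before the chef module runs, e.g. via bootcmd in the cloud-config (the directory list is my assumption, based on what cc_chef.py ensures):

#cloud-config
bootcmd:
  # Pre-create the directories cc_chef.py will ensure_dir() so the
  # SELinux restorecon walk has an existing path to operate on.
  - mkdir -p /etc/chef /var/log/chef /var/lib/chef /var/cache/chef /var/backups/chef /var/run/chef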

Extending AWS Instance Trust

All nodes in EC2 can fetch their Instance Identity Documents. This returns an AWS-signed block containing data about the instance that requested it. This data is only available from within the instance itself, so if the instance turns around and presents it to some other service, you can be confident it originated from the machine in question.
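
For reference, both the document and an AWS signature over it come from the instance metadata service:

# JSON document describing this instance
$ curl -s http://169.254.169.254/latest/dynamic/instance-identity/document
# PKCS7 signature, verifiable against AWS's public certificate
$ curl -s http://169.254.169.254/latest/dynamic/instance-identity/pkcs7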

This is useful for secrets-enrollment processes where you want a node to be able to attest to its own identity in a secure fashion. Your enrollment mechanism can ensure that the node is ‘young’ enough to be making the request, and that the enrollment has never occurred before. Your enrollment tool can also look up other data about the instance (autoscale group, CloudFormation stack, etc.) to determine what privileges it should be granted.

In this manner you can pivot off the security features AWS offers for identifying a particular node and extend that trust to another tool.

Passing Host IP to ECS

I’ve been looking for a way to run Consul ‘standalone’ on a host and let multiple ECS containers connect to it. I was hoping to find a macro I could put in a container definition, but such does not yet exist. Instead I realized that I can have my Docker container query the instance metadata service to get this information. It is not quite as elegant Docker-wise, but it should work until something better comes along.
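
A minimal sketch of the idea as a container entrypoint (CONSUL_HTTP_ADDR is Consul's standard client setting; the wrapper script itself is a hypothetical example):

#!/bin/sh
# Ask the EC2 instance metadata service for the host's private IP,
# then point Consul clients at the agent listening on the host.
HOST_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
export CONSUL_HTTP_ADDR="http://${HOST_IP}:8500"
exec "$@"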

Chef Too Big for the Kitchen

While pondering running a full ‘ride-the-wave’ auto-scaling solution in AWS, I looked closely at my Chef installation. The environment is very Chef-heavy, with a fairly generic AMI that runs a slew of Chef recipes to bring it up to the needs of its particular role. The nodes are in Autoscale groups and get their role designation from the Launch Configuration.

On average a node invoked approximately 55 recipes (as recorded by seen_recipes in the audit cookbook). Several of those recipes pull in resources from (authenticated) remote locations that have very good availability but are not in my direct control. Even ignoring the remote-based recipes, there is still a significant number of moving parts that can be disrupted unexpectedly, such as by other cookbook / role / KV store changes. This is acceptable-if-not-ideal when nodes are generally brought into being under the supervision of an operator who can resolve any issues, or when the odd 1-of-x00 pooled node dies and is automatically replaced. This risk is manageable when the environment is perpetually scaled for peak traffic.

However, when critical nodes are riding the wave of capacity, the chance that something will eventually break during scale-up and cause the ‘wave’ to swamp the application approaches 100%. That requires an operator to fix the problem under a significant time crunch while the application is overwhelmed by traffic: hardly a recipe for success. The more likely scenario is breakage at some odd hour of the morning as users wake up, with the application failing before an operator can intervene to keep it alive.

I looked at my Chef construction and realized it was less Infrastructure As Code (IaC) and more like Compile In Production (CIP).

AWS ECS and Docker Exit (137)

I ran into this the other day: my ECS containers were dying off and docker ps showed Exited (137) About a minute ago. Looking at docker inspect I noticed:

"State": {
 "FinishedAt": "2015-09-20T21:38:58.188768082Z",
    "OOMKilled": true
  },

This tells me that ECS / Docker enforced the memory limit for the container, and the out-of-memory killer killed off the contained processes. Raising the ECS memory limit for this container resolved the issue.
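
A quick way to confirm this condition without reading the whole inspect output (standard docker inspect Go templating):

$ docker inspect -f '{{.State.OOMKilled}} {{.State.ExitCode}}' <container-id>
true 137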

Carl’s Jr, SPF and AWS

Carl’s Jr. has a nifty nutritional calculator / order planner at http://www.carlsjr.com/menu/nutritional_calculator. It lets you fully customize your meal, then print or email your order to yourself with all the magic words to say to get your meal as planned (subbing cheese; extra, 2x, or no onion; etc.).

Tonight I used this to pre-assemble a highly customized meal for my family. I triggered it to send me an email (easier to read on my phone at the order window) and anxiously awaited.

No email was received.

I host my own email and use http://rollernet.us for my public incoming MX relays; they are nifty as they have a ton of highly configurable anti-spam features that I can apply ‘at the edge’, which lets my actual mailserver run much leaner, since SpamAssassin et al. are resource-intensive.

Rollernet logs stated:

Connection from 54.236.226.113 rejected by mail.rollernet.us
From: carlsjr@carlsjr.com
To: my_email  
Reason: SPF fail (Mechanism -all matched)

Oh ho! I temporarily disabled SPF checking (and greylisting) and sent another meal through, and the email header said:

Received-SPF: fail (carlsjr.com: Sender is not authorized by default to use 'carlsjr@carlsjr.com' in 'mfrom' identity (mechanism '-all' matched)) receiver=mail2.rollernet.us; identity=mailfrom; envelope-from="carlsjr@carlsjr.com";
        helo=ip-10-198-0-85.localdomain; client-ip=54.236.168.30

I fetched their SPF record with http://www.kitterman.com/spf/validate.html (though any DNS fetching tool would work) and received:

v=spf1 ip4:63.168.109.0/24 ip4:67.203.173.0/26 ip4:216.87.35.224/27 mx include:spf.protection.outlook.com -all
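
Any DNS tool shows the same record directly:

$ dig +short TXT carlsjr.com
"v=spf1 ip4:63.168.109.0/24 ip4:67.203.173.0/26 ip4:216.87.35.224/27 mx include:spf.protection.outlook.com -all"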

This tells me they use outlook.com for their internal email, and only allow a few subnets to originate mail from their domain; the source IP I got was not one of them. 54.236.168.30 resolves to ec2-54-236-168-30.compute-1.amazonaws.com. www.carlsjr.com resolves to what appears to be a CloudFormation-based AWS environment:

$ host www.carlsjr.com
www.carlsjr.com is an alias for CKEMKTPRDLB-20130419-1810707626.us-east-1.elb.amazonaws.com.
CKEMKTPRDLB-20130419-1810707626.us-east-1.elb.amazonaws.com has address 54.236.231.179
CKEMKTPRDLB-20130419-1810707626.us-east-1.elb.amazonaws.com has address 52.1.95.39

I suspect their AWS-based web farm is generating the outgoing mails directly, and they have not accounted for that in their SPF configuration. Using Amazon Simple Email Service (SES) would have accounted for this already (http://docs.aws.amazon.com/ses/latest/DeveloperGuide/authenticate-domain.html). This could also be handled by designating an outgoing mail host in their AWS environment with an Elastic IP attached and adding it to their SPF record.
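
The Elastic IP fix would be a one-line record change, something like the following (the added address is a placeholder):

v=spf1 ip4:63.168.109.0/24 ip4:67.203.173.0/26 ip4:216.87.35.224/27 ip4:203.0.113.25 mx include:spf.protection.outlook.com -all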

I sent them an email detailing the issue at their corporate email address. I’ll update if I hear back, but I don’t expect it will ever reach anyone who knows what to do with it.

Update: I got an email back stating the issue was being routed ‘to the appropriate department’.