2019-10-10
aka use the cdk to build an account probe rig
When you create a new account, it is pristine and useless. Then you start to dirty it up with all kinds of goodies that do stuff and, every once in a while, you have to check up on it to make sure it still resembles wabi-sabi and not some nasty old handbag of tissues, loose change, lipstick and used chewing gum.
For us, Cloud Custodian goes a long way, but we needed to have some weird shit checked that doesn’t quite fit its profile.
So, in our never-ending story of creating a “safe” space for the kids to play with their cloudy lego blocks, we have found multiple ways to make accounts say “ahh”, have a look around, and optionally fix things.
I just had to give the new cloud vaping kid on the block, the CDK, a try, since I’ve jinja2’ed enough for two lifetimes.
The design is basically:
[design diagram]
and, whenever the need to spam the “people who care” gets too much:
This post describes the first CDK stack that happens in the org account.
Being a bit of a “keep that .ts away from me!” type, I started with a cdk init -l javascript.
Then I slapped some mini projects into the handlers folder:
▸ bin/
▸ docs/
▾ handlers/
  ▾ jobControl/
    ▸ node_modules/
    jobControl.js
    package-lock.json
    package.json
  ▾ prober/
    ▸ lib/
    ▾ probe/
      __init__.py
      budgets.py
      iam.py
      poster.py
      prober.py
    requirements.txt
▸ lib/
▸ node_modules/
buildspec.yml
cdk.json
package-lock.json
package.json
README.md
tsconfig.json
So it might seem a little weird to have some duo lingo going on in the same project, but we are moving towards js and wanted to build the actual probing in python for simplicity’s sake.
The CDK makes it super simple to have both and CodeBuild makes sure the CDK has something to deploy.
Anyhoos - the lib of the stack looks like this…
const cdk = require('@aws-cdk/core')
const lambda = require('@aws-cdk/aws-lambda')
const sqs = require('@aws-cdk/aws-sqs')
const sfn = require('@aws-cdk/aws-stepfunctions')
const tasks = require('@aws-cdk/aws-stepfunctions-tasks')
const iam = require('@aws-cdk/aws-iam')
const events = require('@aws-cdk/aws-events')
const targets = require('@aws-cdk/aws-events-targets')
class ProberStack extends cdk.Stack {
  constructor(scope, id, props) {
    super(scope, id, props);

    const probeBucketName = 'fooeep239012382193012'
    const lebuck = { bucketName: probeBucketName } //new s3.Bucket(this, 'ResultBucket');

    const leq = new sqs.Queue(this, 'proberQ', {
      fifo: true,
      contentBasedDeduplication: true,
    })

    const startBatchFunction = new lambda.Function(this, 'startBatchFunction', {
      runtime: lambda.Runtime.NODEJS_10_X,
      handler: 'jobControl.startBatch',
      code: lambda.Code.asset('./handlers/jobControl'),
      timeout: cdk.Duration.seconds(15),
      environment: {
        QUEUE_URL: leq.queueUrl,
      },
    })

    const checkBatchFunction = new lambda.Function(this, 'checkBatchFunction', {
      runtime: lambda.Runtime.NODEJS_10_X,
      handler: 'jobControl.checkBatch',
      code: lambda.Code.asset('./handlers/jobControl'),
      timeout: cdk.Duration.seconds(30),
      environment: {
        QUEUE_URL: leq.queueUrl,
      },
    })

    const probeFunction = new lambda.Function(this, 'probeFunction', {
      runtime: lambda.Runtime.PYTHON_3_7,
      handler: 'prober.handler',
      code: lambda.Code.asset('./handlers/prober'),
      timeout: cdk.Duration.seconds(10),
      environment: {
        QUEUE_URL: leq.queueUrl,
        S3_BUCKET: lebuck.bucketName
      },
      memorySize: 1024,
    })

    const polStatement = new iam.PolicyStatement({
      resources: ['*'],
      actions: [
        "codecommit:Get*",
        "codecommit:List*",
      ],
    })
    startBatchFunction.addToRolePolicy(polStatement)

    const polStatement1 = new iam.PolicyStatement({
      resources: ['*'],
      actions: [
        "sts:*"
      ],
    })
    probeFunction.addToRolePolicy(polStatement1)

    leq.grantPurge(startBatchFunction);
    leq.grantSendMessages(startBatchFunction);
    leq.grantConsumeMessages(checkBatchFunction);
    leq.grantConsumeMessages(probeFunction);

    const polStatement2 = new iam.PolicyStatement({
      resources: [
        'arn:aws:s3:::'+probeBucketName,
        'arn:aws:s3:::'+probeBucketName+'/*',
      ],
      actions: [
        "s3:PutObject*"
      ],
    })
    probeFunction.addToRolePolicy(polStatement2)

    const startJob = new sfn.Task(this, 'startBatch', {
      task: new tasks.InvokeFunction(startBatchFunction),
      resultPath: '$.guid',
    });
    const checkJob = new sfn.Task(this, 'checkBatch', {
      task: new tasks.InvokeFunction(checkBatchFunction),
    });

    const numTasks = 5;
    let probeTasks = []
    for (let i=0; i<numTasks; i++) {
      probeTasks.push(new sfn.Task(this, 'probe_'+i, {
        task: new tasks.InvokeFunction(probeFunction),
      }))
    }

    const parallel = new sfn.Parallel(this, 'paraProbes')
    const loop = new sfn.Pass(this, 'loop!')
    const done = new sfn.Pass(this, 'done!')
    probeTasks.forEach(x => {
      parallel.branch(x)
    })

    const sfndef = startJob
      .next(loop)
      .next(parallel)
      .next(checkJob)
      .next(new sfn.Choice(this, 'Job Complete?')
        .when(sfn.Condition.stringEquals('$.Attributes.ApproximateNumberOfMessages', '0'), done)
        .otherwise(loop))

    const leSfn = new sfn.StateMachine(this, 'ProberStateMachine', {
      definition: sfndef,
      timeout: cdk.Duration.seconds(300),
    })

    const leRule = new events.Rule(this, 'ProberScheduledRule', {
      schedule: events.Schedule.expression('cron(0 16 ? * * *)')
    });
    leRule.addTarget(new targets.SfnStateMachine(leSfn, {
      input: events.RuleTargetInput.fromObject({accMask: 'foo_'})
    }));
  }
}

module.exports = { ProberStack }
This builds a pretty little sfn number that looks like this…
[state machine graph]
Time is of the essence, they say, so this stack runs 5 probers in parallel (1 account per prober).
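Back-of-the-envelope, with one account consumed per prober per pass, the number of loop iterations is just a ceiling division (using our org's 65 accounts, mentioned further down, as the example):

```python
import math

def passes_needed(num_accounts, parallel_probers=5):
    """Each pass of the state machine runs `parallel_probers` probe lambdas,
    and each probe pulls one account message off the queue."""
    return math.ceil(num_accounts / parallel_probers)

print(passes_needed(65))  # 13 passes for a 65-account org
```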
The last bit kicks it off once a day via a cloudwatch event rule.
Both handlers are contained in the same js file…
const aws = require('aws-sdk');
const codecommit = new aws.CodeCommit();
const sqs = new aws.SQS();
const process = require('process');
const yaml = require('yaml')
exports.startBatch = async (event) => {
  console.log('starting....', event)
  let accre = event.accMask ? event.accMask : 'foo_'
  let pa = []
  let params = {
    filePath: 'group_vars/all/accounts.yml', /* required */
    repositoryName: 'fooeep', /* required */
  };
  const contents = await codecommit.getFile(params).promise();
  const accounts = yaml.parse(String(contents.fileContent))
  Object.keys(accounts.aws_accounts).filter(i => new RegExp(accre).test(i)).forEach(accName => {
    console.log(accName)
    let accNum = accounts.aws_accounts[accName]
    let params = {
      MessageBody: JSON.stringify({
        name: accName,
        number: String(accNum)
      }),
      QueueUrl: process.env.QUEUE_URL,
      MessageGroupId: 'proberGroup'
    }
    pa.push(sqs.sendMessage(params).promise()
      .then((response) => {
        console.log(accNum, response)
        return response;
      })
    )
  })
  return Promise.all(pa)
}

exports.checkBatch = (event) => {
  console.log('checking', event)
  let params = {
    QueueUrl: process.env.QUEUE_URL,
    AttributeNames: ['ApproximateNumberOfMessages'],
  }
  return sqs.getQueueAttributes(params).promise()
    .then((response) => {
      console.log(response)
      return response;
    })
}
The startBatch handler goes and grabs our latest list of accounts from codecommit in a yaml file that looks like this
---
aws_accounts:
  foo_whan: "111111111111"
  foo_too: "222222222222"
  foo_tea: "333333333333"
and dumps each { name: accName, number: String(accNum) } json onto the sqs queue.
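The accMask filtering boils down to an unanchored regex test against the account names. A minimal python sketch of the same idea, using the sample accounts file above (the `bar_one` entry is a hypothetical non-matching account added for illustration):

```python
import re

# sample accounts file from above, already parsed from yaml
aws_accounts = {
    'foo_whan': '111111111111',
    'foo_too': '222222222222',
    'foo_tea': '333333333333',
    'bar_one': '444444444444',  # hypothetical entry that the foo_ mask skips
}

def matching_accounts(acc_mask='foo_'):
    # new RegExp(accre).test(i) in the js is an unanchored search, hence re.search
    return {name: num for name, num in aws_accounts.items()
            if re.search(acc_mask, name)}

print(sorted(matching_accounts('foo_')))  # ['foo_tea', 'foo_too', 'foo_whan']
```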
The checkBatch handler literally just fetches the queue depth and returns it so sfn can figure out if it needs to go loopy again.
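That queue-depth check feeds the Choice state's stringEquals condition. In python terms, the decision the state machine makes is roughly this (note SQS hands the depth back as a string, which is why the Choice uses stringEquals('0') rather than a numeric comparison):

```python
def job_complete(check_batch_response):
    # mirrors sfn.Condition.stringEquals('$.Attributes.ApproximateNumberOfMessages', '0')
    return check_batch_response['Attributes']['ApproximateNumberOfMessages'] == '0'

print(job_complete({'Attributes': {'ApproximateNumberOfMessages': '0'}}))   # True -> done
print(job_complete({'Attributes': {'ApproximateNumberOfMessages': '13'}}))  # False -> loop again
```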
The python lambda that slithers into all the accounts and tickles some apis also has some multi-threaded goodness (since we can’t multiprocess in lambda :[).
This allows the probes themselves to run at the same time, kicking relevant api tyres and taking numbers for securityhub findings.
The prober itself then has a cup of tea while the probes do their thing and, when they are all back, dumps their responses to a bucket in our security account and stashes any findings in the account’s securityhub (which then makes its way to the security account too).
import sys
sys.path.insert(0,'lib')
import yaml
import boto3
import os
import json
import time
import datetime
import dateutil
import uuid
import threading
import poster
import probe.budgets
import probe.iam
sqs = boto3.client('sqs')
s3 = boto3.client('s3')
iam = boto3.client('iam')
results = {}
class ProbeThread(threading.Thread):
    def __init__(self, account, creds, func):
        threading.Thread.__init__(self)
        self.account = account
        self.creds = creds
        self.func = func
        self.name = func.__module__

    def run(self):
        print(self.name, "working")
        session = boto3.session.Session(aws_access_key_id=self.creds['AccessKeyId'],
                                        aws_secret_access_key=self.creds['SecretAccessKey'],
                                        aws_session_token=self.creds['SessionToken'])
        results[self.name] = self.func(self.account, session)

def dateConverter(o):
    if isinstance(o, datetime.datetime):
        return o.__str__()
def handler(event, context):
    result = 'nan'
    results.clear()  # the module-level dict survives warm invocations, so reset it
    response = sqs.receive_message(QueueUrl=os.environ['QUEUE_URL'],
                                   MaxNumberOfMessages=1, WaitTimeSeconds=5)
    print('receive', response)
    if 'Messages' in response:
        print(response['Messages'][0]['Body'])
        account = json.loads(response['Messages'][0]['Body'])
        sts_client = boto3.client('sts')
        sts_response = sts_client.assume_role(
            RoleArn='arn:aws:iam::'+account['number']+':role/OrganizationAccountAccessRole',
            RoleSessionName='prober_session')
        creds = sts_response["Credentials"]
        probeThreads = []
        probeThreads.append(ProbeThread(account, creds, probe.budgets.getInfo))
        probeThreads.append(ProbeThread(account, creds, probe.iam.getInfo))
        results['msg'] = response
        results['account'] = account
        for probeThread in probeThreads:
            probeThread.start()
        for probeThread in probeThreads:
            probeThread.join()
        s3.put_object(
            ACL='bucket-owner-full-control',
            Bucket=os.environ['S3_BUCKET'],
            Body=json.dumps(results, default=dateConverter),
            Key=account['name']+'/'+time.strftime('%Y/%m/%d/%H/')+str(uuid.uuid4())+'.json')
        result = json.dumps(response['Messages'][0]['Body']) + ' processed'
        poster.post_finding(account, creds, results)
        response = sqs.delete_message(QueueUrl=os.environ['QUEUE_URL'],
                                      ReceiptHandle=response['Messages'][0]['ReceiptHandle'])
        print('delete', response)
    else:
        print('no message found')
    return result
poster.post_finding is just a function that posts the findings into securityhub, like this…
import sys
sys.path.insert(0,'lib')
import boto3
import time
import datetime
import dateutil.tz  # importing dateutil alone does not pull in the tz submodule
def post_finding(account, creds, results):
    session = boto3.session.Session(aws_access_key_id=creds['AccessKeyId'],
                                    aws_secret_access_key=creds['SecretAccessKey'],
                                    aws_session_token=creds['SessionToken'])
    sechub = session.client('securityhub')
    now = datetime.datetime.utcnow().replace(tzinfo=dateutil.tz.tzutc()).isoformat()
    for key in results:
        if 'findings' in results[key]:
            for finding in results[key]['findings']:
                print('found finding', finding)
                # build the same Id we import below, so the existing-finding lookup matches
                finding_id = 'probe-finding-'+key+'-'+account['number']+'-'+account['name']+'-'+finding['Id']
                prev_findings = sechub.get_findings(
                    Filters={
                        'AwsAccountId': [{
                            'Value': account['number'],
                            'Comparison': 'EQUALS'
                        }],
                        'Id': [{
                            'Value': finding_id,
                            'Comparison': 'EQUALS'
                        }],
                        'GeneratorId': [{
                            'Value': key,
                            'Comparison': 'EQUALS'
                        }]
                    },
                    MaxResults=1)
                if len(prev_findings['Findings']) > 0:
                    print('existing finding', finding_id)
                    ctime = prev_findings['Findings'][0]['CreatedAt']
                else:
                    print('new finding', finding_id)
                    ctime = now
                sechub.batch_import_findings(
                    Findings=[{
                        'SchemaVersion': '2018-10-08',
                        'Id': finding_id,
                        'ProductArn': 'arn:aws:securityhub:eu-west-1:'+account['number']+':product/'+account['number']+'/default',
                        'GeneratorId': key,
                        'AwsAccountId': account['number'],
                        'Types': ['Software and Configuration Checks/Prober '+finding['Pillar']+'/'+key],
                        'CreatedAt': ctime,
                        'UpdatedAt': now,
                        'Severity': {
                            'Normalized': finding['Severity']
                        },
                        'Title': finding['Title'],
                        'Resources': [{
                            'Type': 'Other',
                            'Id': finding['Resource']
                        }],
                        'Description': finding['Description'],
                    }])
                print('posted finding', finding)
        else:
            print('no findings for ', key)
The first of the two probes hooked up is a “no iam users!” probe
import sys
sys.path.append('lib')
def getInfo(account, session):
    iam = session.client('iam')
    users = iam.list_users()
    findings = []
    for user in users['Users']:
        try:
            profile = iam.get_login_profile(UserName=user['UserName'])
            findings.append({
                'Id': user['UserName'],
                'Description': 'User has console access - '+user['UserName'],
                'Title': 'Access Control violation',
                'Severity': 75,
                'Resource': account['name'],
                'Pillar': 'Security'
            })
            print(profile)
        except iam.exceptions.NoSuchEntityException:
            # no login profile means no console access - nothing to flag
            print('no login for', user['UserName'])
    return {
        'result': users,
        'findings': findings
    }
and the second a “your budget is blown!” probe
import sys
sys.path.append('lib')
def getInfo(account, session):
    budgets = session.client('budgets')
    blist = budgets.describe_budgets(AccountId=account['number'])
    findings = []
    for b in blist['Budgets']:
        if float(b['BudgetLimit']['Amount']) < float(b['CalculatedSpend']['ActualSpend']['Amount']):
            findings.append({
                'Id': b['BudgetName'],
                'Description': 'Budget '+b['BudgetName']+' exceeded - {Budget:'+b['BudgetLimit']['Amount']+',ActualSpend:'+b['CalculatedSpend']['ActualSpend']['Amount']+'}',
                'Title': 'Cost Management - Budget Exceeded',
                'Severity': 75,
                'Resource': account['name'],
                'Pillar': 'Cost Optimization'
            })
    return {
        'result': blist,
        'findings': findings
    }
The findings basically contain an Id, Description, Title, normalized Severity, Resource and a Well-Architected Pillar.
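For reference, the securityhub Id that each probe finding ends up with (per the batch_import_findings call in the poster above) is just a concatenation of the probe module, account and finding Id. A small sketch, with a hypothetical budget named "monthly":

```python
def sechub_finding_id(key, account, finding):
    # mirrors the Id string built in poster.post_finding;
    # key is the probe's module name (func.__module__), e.g. 'probe.budgets'
    return ('probe-finding-' + key + '-' + account['number'] + '-' +
            account['name'] + '-' + finding['Id'])

account = {'name': 'foo_whan', 'number': '111111111111'}
finding = {'Id': 'monthly'}  # hypothetical budget name used as the finding Id
print(sechub_finding_id('probe.budgets', account, finding))
# probe-finding-probe.budgets-111111111111-foo_whan-monthly
```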
The CodeBuild buildspec is a two-runtime, multi-install affair…
---
version: 0.2
phases:
  install:
    runtime-versions:
      python: 3.7
      nodejs: 10
    commands:
      - npm -g i aws-cdk
      - cd handlers/jobControl
      - npm ci
      - cd ../../handlers/prober
      - pip install -r requirements.txt -t lib
      - cd ../..
      - npm ci
  build:
    commands:
      - cdk deploy
Whoa, that is a monster post and it looks overwhelming, but it really ain’t.
With our 65-account gig it runs in about 2m30s, and as we add more complex probes that will start expanding.
So on cdk, sfn and sechub…
Pretty sleek (barring the close-to-daily updates :]) and it provides the flexibility we had with jinja2 and the composability of ansible roles, just in a more code-friendly way.
Definitely my new favourite cloudformation puffing tool.
The mega bonus for me is the magic iam dust it sprinkles in one line.
A more efficient option would have been to CodeBuild a js/py app that negates the need for a queue, but the CDK makes queues and sfn so damn easy, and sfn provides oodles of visibility and control.
I really like the idea of hooking lambdas together to make something that can work around the limitations (not time, now that the time limit is 15 minutes max, but memory or vcpus).
So far it seems a bit tweaked towards ‘security vendors’ but, thanks to c7n showing us all how it’s done with their sechub resource, all in all it’s pretty easy to hook up and use its stash (surely dynamo at the back) and reporting.
The alternative would have been to point Athena’s Gluey HaDoopy Warriors @ s3 and See Quilled GROUP BYs till the cows came home.
This rig greases the wheels for our reporting bit - maybe in a later post…