Transit Gateway

2019-09-01

Keeping your landing zone up to date with AWS’s service evolution is quite a gig, and the network wiring is no different.

In November last year (my, how time flies) they GA’ed Transit Gateways (TGW), and we decided to adopt in January. It took us about a month’s worth of testing, 2 weeks of hacking and 2 days of rewiring our 35+ VPCs. For the last 6 months we’ve not had any issues with it, and it actually saves us some moolah, since we no longer pay for a FortiGate to do a router’s job.

Our transit design differs a little from the next-next-finish design AWS hooks up for you directly, insomuch as we don’t want to route traffic between our VPCs if we don’t actually have a frigging good reason to.

It’s part of limiting our OMG!Monero!Virus#23231!On!Windows! blast radius and keeping everyone in their respective little playpens.

Given the number of VPCs we are hooking up to the mothership, we had to ask for a limit increase on the number of routing tables one can wire up… The default is 20, and service limit bumping got us to 50, but soon that ain’t going to be enough either… :[

This post then is how we hooked up the automation with dual CodeBuilds, since this involves us jumping into 2 accounts concurrently. Most of this was from a talk I did at an AWS Loft pow wow earlier this year.


So let’s start with the basic design…

design

When we created the TGW, we specifically didn’t want the “auto accept shared attachments” setting enabled, you know, because control and what not.

After sharing the TGW with the other accounts in our Organization via RAM (Resource Access Manager), we added 2 new VPN connections (4 IPsec tunnels) to our FortiGates, and hooked them up to a transit gateway instance.

We then attached them to the default mothership transit gateway routing table.

For each Business Unit VPC we had to automate the following sequence:

  • creating the attachment on the shared TGW in a business unit account
  • accepting the attachment in the transit account
  • adding a new TGW routing table in the transit account
  • attaching the VPC attachment to it
  • undoing the default wiring in the transit gateway (yeah - don’t click it when you create the damn thing)
  • and associating the VPC’s attachment to our mothership routing table

In the end this is what we end up with in the transit account’s TGW

routing

Here, solid lines denote attachments and dotted lines denote associations.
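To make the playpen idea a bit more concrete, here is a toy model of the routing tables in Python. It is only a sketch: the attachment names are made up, and it assumes the VPC attachments propagate into the mothership table so that return traffic from the VPN side finds its way back, which is not spelled out above.

```python
from dataclasses import dataclass, field


@dataclass
class RouteTable:
    """Toy TGW route table: who uses it, and which attachments it has routes to."""
    name: str
    associated: set = field(default_factory=set)  # attachments that look up routes here
    propagated: set = field(default_factory=set)  # attachments whose routes land here


def can_reach(tables, src, dst):
    """True if traffic arriving on `src` finds a route towards `dst`."""
    return any(src in t.associated and dst in t.propagated for t in tables)


# Hypothetical attachments: one VPN back to on-prem and two business unit VPCs.
mothership = RouteTable("mothership", associated={"vpn"}, propagated={"vpc-a", "vpc-b"})
rt_a = RouteTable("vpc-a", associated={"vpc-a"}, propagated={"vpn"})
rt_b = RouteTable("vpc-b", associated={"vpc-b"}, propagated={"vpn"})
tables = [mothership, rt_a, rt_b]

print("vpc-a -> vpn:  ", can_reach(tables, "vpc-a", "vpn"))
print("vpn   -> vpc-b:", can_reach(tables, "vpn", "vpc-b"))
print("vpc-a -> vpc-b:", can_reach(tables, "vpc-a", "vpc-b"))
```

Each VPC only ever sees the VPN in its own table, so there is simply no route from one playpen to the next unless somebody adds one on purpose.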

The “CodeBuild with a CodeBuild up his brother’s nose” automation works pretty much like this…

foo

1 VPC CodeBuild

Once the VPC CodeBuild is launched, the first thing it does is kick off the TGW CodeBuild.

the buildspec…

version: 0.2

phases:
  install:
    commands:
      - echo Entered the install phase...
      - apt-get update
      - apt-get install -y curl apt-transport-https git jq
      - pip install --upgrade pip awscli ansible boto boto3
      - which python
      - python --version
      - ansible --version
  build:
    commands:
      - echo Build started on `date`
      - mkdir -p generated/vpc
      - aws codebuild start-build --project-name TransitGateway --environment-variables-override='[{"name":"ACCOUNT","value":"'$ACCOUNT'"},{"name":"STACK_NAME","value":"'$STACK_NAME'"}]'
      - ansible-playbook vpc.yaml -e acc="$ACCOUNT" -e stack_name="$STACK_NAME"
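
The quote juggling around `$ACCOUNT` and `$STACK_NAME` in the `start-build` line works, but it gets brittle as soon as a value contains a space or a quote. As a rough alternative, the same override could be built in Python, since boto3 is already installed in the install phase. The helper names and sample values below are made up for illustration; `start_build` with `projectName` and `environmentVariablesOverride` is the boto3 CodeBuild call.

```python
import json


def env_override(account, stack_name):
    """Build the same two overrides the buildspec passes to the TGW CodeBuild."""
    return [
        {"name": "ACCOUNT", "value": account},
        {"name": "STACK_NAME", "value": stack_name},
    ]


def start_tgw_build(codebuild, account, stack_name, project="TransitGateway"):
    """Start the TGW build; `codebuild` is expected to be boto3.client("codebuild")."""
    return codebuild.start_build(
        projectName=project,
        environmentVariablesOverride=env_override(account, stack_name),
    )


# The payload as JSON, in case you still want to hand it to the CLI flag.
# "bu-sales" and "sales-vpc" are hypothetical values.
print(json.dumps(env_override("bu-sales", "sales-vpc")))
```

In the buildspec this would be called with `os.environ["ACCOUNT"]` and `os.environ["STACK_NAME"]`, letting `json.dumps` or boto3 do the escaping instead of the shell.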
2 VPC CodeBuild

Hack the VPC CF template, which will pause the process until the TGW CodeBuild accepts the attachment in (3)…

the CF Jinja2 snippets

    {{vpc.name}}TransitGatewayAttachment:
      Type: "AWS::EC2::TransitGatewayAttachment"
      Properties:
        SubnetIds: 
          {% for subnet in vpc.subnets %}
          {% if subnet.route_table == "RtPriv" and subnet.name.startswith("Private")  %}
          - !Ref {{vpc.name}}{{subnet.name}}Subnet
          {% endif %}
          {% endfor %}
        Tags: 
          - { Key: Name, Value: {{vpc.name}} }
        TransitGatewayId: {{TransitGatewayId}}
        VpcId: !Ref {{vpc.name}}VPC

and


    {% for rt in vpc.route_tables %}
    {% for route in rt.routes %}
    {{vpc.name}}{{rt.name}}{{route.name}}Route:
      Type: AWS::EC2::Route
      DependsOn: {{vpc.name}}TransitGatewayAttachment
      Properties:
        RouteTableId: !Ref {{vpc.name}}{{rt.name}}RouteTable
        DestinationCidrBlock: {{route.cidr}}
        {% if route.natgw is defined %}
        NatGatewayId: !Ref {{vpc.name}}{{route.natgw}}NatGateway
        {% elif route.igw is defined %}
        GatewayId: !Ref {{vpc.name}}InternetGateway
        {% elif route.tstgw is defined %}
        TransitGatewayId: {{TransitGatewayId}}
        {% endif %}
    {% endfor %}
    {% endfor %}
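
If the Jinja2 is hard to read at a glance, the two decisions the snippets make can be written out as plain Python. This is only a restatement of the template logic under assumptions: the function names and the sample data are made up, and the returned strings are the logical IDs the template would `!Ref`.

```python
def attachment_subnets(vpc_name, subnets):
    """Logical IDs of the subnets the TGW attachment gets placed in."""
    return [
        f"{vpc_name}{s['name']}Subnet"
        for s in subnets
        if s["route_table"] == "RtPriv" and s["name"].startswith("Private")
    ]


def route_target(vpc_name, route, transit_gateway_id):
    """Mirror the if/elif chain: NAT gateway first, then IGW, then TGW."""
    if "natgw" in route:
        return {"NatGatewayId": f"{vpc_name}{route['natgw']}NatGateway"}
    if "igw" in route:
        return {"GatewayId": f"{vpc_name}InternetGateway"}
    if "tstgw" in route:
        return {"TransitGatewayId": transit_gateway_id}
    return {}  # no target key defined: the template emits no target property


# Hypothetical VPC definition, shaped like the variables the templates read.
subnets = [
    {"name": "PrivateA", "route_table": "RtPriv"},
    {"name": "PublicA", "route_table": "RtPub"},
    {"name": "DataA", "route_table": "RtPriv"},
]
print(attachment_subnets("Sales", subnets))
print(route_target("Sales", {"name": "OnPrem", "cidr": "10.0.0.0/8", "tstgw": True}, "tgw-example"))
```

Only private subnets on the `RtPriv` table end up in the attachment, and a route that defines more than one target key gets the first match in the chain.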
3 and 4 TGW CodeBuild

Find the ‘pendingAcceptance’ attachment and accept it in an Ansible role

from the transit_gateway role we created

- name: list transit gateways pending invitation (use first instance)
  shell: >
    AWS_ACCESS_KEY_ID="{{ assumed_role.sts_creds.access_key | default(omit) }}"
    AWS_SECRET_ACCESS_KEY="{{ assumed_role.sts_creds.secret_key | default(omit) }}"
    AWS_SESSION_TOKEN="{{ assumed_role.sts_creds.session_token | default(omit) }}"
    AWS_DEFAULT_REGION="{{ region }}"
    aws ec2 describe-transit-gateway-vpc-attachments | jq '.TransitGatewayVpcAttachments|map(select(.State=="pendingAcceptance" and .VpcOwnerId=="{{aws_accounts[acc]}}"))|map(.TransitGatewayAttachmentId)'
  register: tgapending

- block:
  - set_fact:
      tgaid: "{{(tgapending.stdout|from_json)[0]}}"

  - name: accept invitation
    shell: >
      AWS_ACCESS_KEY_ID="{{ assumed_role.sts_creds.access_key | default(omit) }}"
      AWS_SECRET_ACCESS_KEY="{{ assumed_role.sts_creds.secret_key | default(omit) }}"
      AWS_SESSION_TOKEN="{{ assumed_role.sts_creds.session_token | default(omit) }}"
      AWS_DEFAULT_REGION="{{ region }}"
      aws ec2 accept-transit-gateway-vpc-attachment --transit-gateway-attachment-id "{{tgaid}}"

  - name: create transit gateway route table
    shell: >
      AWS_ACCESS_KEY_ID="{{ assumed_role.sts_creds.access_key | default(omit) }}"
      AWS_SECRET_ACCESS_KEY="{{ assumed_role.sts_creds.secret_key | default(omit) }}"
      AWS_SESSION_TOKEN="{{ assumed_role.sts_creds.session_token | default(omit) }}"
      AWS_DEFAULT_REGION="{{ region }}"
      aws ec2 create-transit-gateway-route-table --transit-gateway-id {{TransitGatewayId}} --tag-specifications "ResourceType=transit-gateway-route-table,Tags=[{Key=Name,Value={{stack_name|default('unknown')}}}]"
    register: tgrt

  - set_fact:
      tgrtid: "{{(tgrt.stdout|from_json).TransitGatewayRouteTable.TransitGatewayRouteTableId}}"

  - pause:
      seconds: 90

  - name: configure name tag for transit gateway attachment
    shell: >
      AWS_ACCESS_KEY_ID="{{ assumed_role.sts_creds.access_key | default(omit) }}"
      AWS_SECRET_ACCESS_KEY="{{ assumed_role.sts_creds.secret_key | default(omit) }}"
      AWS_SESSION_TOKEN="{{ assumed_role.sts_creds.session_token | default(omit) }}"
      AWS_DEFAULT_REGION="{{ region }}"
      aws ec2 create-tags --resources {{tgaid}} --tags Key=Name,Value={{stack_name}}

  - name: remove association from default route table
    shell: >
      AWS_ACCESS_KEY_ID="{{ assumed_role.sts_creds.access_key | default(omit) }}"
      AWS_SECRET_ACCESS_KEY="{{ assumed_role.sts_creds.secret_key | default(omit) }}"
      AWS_SESSION_TOKEN="{{ assumed_role.sts_creds.session_token | default(omit) }}"
      AWS_DEFAULT_REGION="{{ region }}"
      aws ec2 disassociate-transit-gateway-route-table --transit-gateway-route-table-id "{{tgrtid}}" --transit-gateway-attachment-id {{tgaid}}

  - pause:
      seconds: 60

  - name: associate transit gateway attachment with newly created route table
    shell: >
      AWS_ACCESS_KEY_ID="{{ assumed_role.sts_creds.access_key | default(omit) }}"
      AWS_SECRET_ACCESS_KEY="{{ assumed_role.sts_creds.secret_key | default(omit) }}"
      AWS_SESSION_TOKEN="{{ assumed_role.sts_creds.session_token | default(omit) }}"
      AWS_DEFAULT_REGION="{{ region }}"
      aws ec2 associate-transit-gateway-route-table --transit-gateway-route-table-id {{tgrtid}} --transit-gateway-attachment-id {{tgaid}}

  - name: get vpn transit gateway attachments
    shell: >
      AWS_ACCESS_KEY_ID="{{ assumed_role.sts_creds.access_key | default(omit) }}"
      AWS_SECRET_ACCESS_KEY="{{ assumed_role.sts_creds.secret_key | default(omit) }}"
      AWS_SESSION_TOKEN="{{ assumed_role.sts_creds.session_token | default(omit) }}"
      AWS_DEFAULT_REGION="{{ region }}"
      aws ec2 describe-transit-gateway-attachments --query 'TransitGatewayAttachments[?ResourceType==`vpn`].TransitGatewayAttachmentId'
    register: tgvpn

  - set_fact:
      tgvpnai: "{{tgvpn.stdout|from_json}}"

  - name: propagate vpn transit gateway attachments to route table
    shell: >
      AWS_ACCESS_KEY_ID="{{ assumed_role.sts_creds.access_key | default(omit) }}"
      AWS_SECRET_ACCESS_KEY="{{ assumed_role.sts_creds.secret_key | default(omit) }}"
      AWS_SESSION_TOKEN="{{ assumed_role.sts_creds.session_token | default(omit) }}"
      AWS_DEFAULT_REGION="{{ region }}"
      aws ec2 enable-transit-gateway-route-table-propagation --transit-gateway-route-table-id {{tgrtid}} --transit-gateway-attachment-id {{item}}
    loop: "{{tgvpnai}}"
  when: tgapending.stdout != "[]"
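
The jq filter in the first task is doing the real selection work, so here is roughly the same thing in Python for anyone who does not speak jq. It assumes the dict shape returned by `describe-transit-gateway-vpc-attachments`; the function name, IDs and account numbers are invented for the example.

```python
def pending_attachment_ids(response, owner_id):
    """IDs of VPC attachments from `owner_id` that still wait for acceptance."""
    return [
        a["TransitGatewayAttachmentId"]
        for a in response.get("TransitGatewayVpcAttachments", [])
        if a["State"] == "pendingAcceptance" and a["VpcOwnerId"] == owner_id
    ]


# Hypothetical response, trimmed to the three fields the filter looks at.
response = {
    "TransitGatewayVpcAttachments": [
        {"TransitGatewayAttachmentId": "tgw-attach-aaa", "State": "available", "VpcOwnerId": "111111111111"},
        {"TransitGatewayAttachmentId": "tgw-attach-bbb", "State": "pendingAcceptance", "VpcOwnerId": "111111111111"},
        {"TransitGatewayAttachmentId": "tgw-attach-ccc", "State": "pendingAcceptance", "VpcOwnerId": "222222222222"},
    ]
}

pending = pending_attachment_ids(response, "111111111111")
print(pending[0] if pending else "nothing to accept")
```

Like the role, this only cares about the first match, so two attachments pending from the same account at the same time would need a bit more thought.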
  • PS-1: The hacker’s rule: when CF ain’t doing your ‘touch-pause-engage’ API/CLI orchestration, pause for a minute or so; it keeps those ‘not there yet’ errors at bay. We use 90 and 60 seconds, just enough time to go press buttons on your coffee machine.

  • PS-2: Because our CodeBuild project has quite an install phase, the TGW CodeBuild needs a minute or two more than the CloudFormation in (2) takes to reach the “waiting for the TGW acceptance bus” state (which times out after a while), so we keep the sequence by exploiting the setup time apt and pip need to get their collective butts in gear.
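
If the fixed pauses ever stop being long enough, polling for the state you are waiting on is the usual next step. To be clear, this is not what our role does; it is a minimal sketch of the alternative, with a made-up helper name and a fake state source so it runs without touching AWS.

```python
import time


def wait_for_state(fetch_state, wanted, timeout=300, interval=10, sleep=time.sleep):
    """Poll fetch_state() until it returns `wanted`; return the seconds waited.

    Raises TimeoutError if `wanted` has not shown up after `timeout` seconds.
    """
    waited = 0
    while True:
        state = fetch_state()
        if state == wanted:
            return waited
        if waited >= timeout:
            raise TimeoutError(f"still {state!r} after {waited}s, wanted {wanted!r}")
        sleep(interval)
        waited += interval


# Fake state source standing in for an API call; no real sleeping in the demo.
fake_states = iter(["pending", "pending", "available"])
seconds = wait_for_state(lambda: next(fake_states), "available", sleep=lambda _: None)
print(f"attachment available after roughly {seconds}s")
```

In real use `fetch_state` would wrap something like a boto3 `describe_transit_gateway_vpc_attachments` call filtered on the attachment ID and return its `State`; the exact state names to wait for are an assumption to verify against the API.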


Overall this ain’t the world’s most perfect networking IaC hack, but it does make things a tad more manageable.

We will probably give it a revamp when the local region starts pushing electrons around next year.