Big Ruby Conf 2014 Notes

These are the notes I took while attending Big Ruby 2014 in Grapevine, Texas.

Testing the untestable

Start by testing the code as though it were a black box
Heroku did this by actually running git push/deploy/build to test buildpacks
heroku run bash will put you in a terminal on your dyno
If heroku can deploy 6 times and have at least 1 passing test suite, they consider it successful
Heroku test speeds went from 5 minutes per test suite to 44 tests in 12 minutes.
- The speaker has worked at companies where Rails tests ran for 30 minutes per suite
parallel_rspec (from the parallel_tests gem) is worth looking into for speeding up tests
Testing allowed Heroku to be aggressive in refactoring.
- They used to practice Minimum Viable Patching because they wanted to add or modify as little code as possible per patch
Build out smaller black boxes and unit test those smaller components
Untestable scenarios
- Mock for determinism (webmock for small pages, VCR for larger network traffic)
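
A minimal sketch of mocking for determinism, using only the standard library. In a real suite WebMock or VCR would intercept the HTTP call itself; here a hypothetical PageTitle class (not from the talk) takes its fetcher as an argument so a test can hand it canned HTML instead of hitting the network.

```ruby
require "net/http"
require "uri"

# Hypothetical class under test: extracts the <title> of a page.
class PageTitle
  # `fetcher` is anything that responds to call(url) and returns HTML.
  # The default goes to the network; tests pass a canned replacement.
  def initialize(fetcher: ->(url) { Net::HTTP.get(URI(url)) })
    @fetcher = fetcher
  end

  def fetch_title(url)
    html = @fetcher.call(url)
    match = html.match(%r{<title>(.*?)</title>}im)
    match ? match[1].strip : nil
  end
end

# Deterministic stand-in for the network: same input, same output, every run.
canned = { "http://example.test/" => "<html><title> Hello </title></html>" }
fake_fetcher = ->(url) { canned.fetch(url) }

puts PageTitle.new(fetcher: fake_fetcher).fetch_title("http://example.test/")
```

The test never depends on a remote server being up, which is the point of the mock.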
codetriage.com
If things seem too big, start with integration tests
Test things that would hurt if they break.

Business intelligence

Supporting decision making
Data is generally an afterthought to devs; it’s just storage for objects
How do you use the data that’s stored to make business decisions?
The naive approach
- “We can run reports straight from the transactional db”
- The complex joins are slow
- affects production performance
Figure out the business needs instead of just feature requests
They might not know which data is valuable (e.g., the last login date tells us how loyal customers are)
Turn inferences into facts
De-normalize data for the facts. It’s ok for data to be stored in many places if the facts are only stored once.
Tables can communicate more than graphs
“A” recipe for success
- Use Ruby - TDD, deployment, “data munging”, many powerful ORMs
- MongoDB
  - Flexible schemas
  - Map/Reduce
  - New aggregation framework (check this out)
- Postgres for transactional
Minimize on-demand calculations
Store facts, not attributes (denormalized)
SQL is the recipe and the report is the cake. Store the cake too.
Not storing objects, storing events (events don’t change)
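
A rough sketch of "store events, store facts, store the cake", in plain Ruby with no database. The LoginEvent and LoyaltyFacts names are made up for illustration; the idea is that the fact (last login per customer) is updated when the event is written, so the report never has to scan or join the raw events.

```ruby
require "date"

# An event is recorded once and never changed.
LoginEvent = Struct.new(:customer_id, :at)

class LoyaltyFacts
  attr_reader :events, :facts

  def initialize
    @events = []   # append-only event log
    @facts = {}    # customer_id => { last_login:, login_count: }
  end

  # Record the event and update the denormalized fact at write time.
  def record(event)
    @events << event.freeze
    fact = (@facts[event.customer_id] ||= { last_login: event.at, login_count: 0 })
    fact[:last_login] = [fact[:last_login], event.at].max
    fact[:login_count] += 1
  end

  # The "cake": a report read straight from stored facts.
  def inactive_since(cutoff)
    @facts.select { |_id, fact| fact[:last_login] < cutoff }.keys
  end
end

facts = LoyaltyFacts.new
facts.record(LoginEvent.new(1, Date.new(2014, 1, 5)))
facts.record(LoginEvent.new(2, Date.new(2014, 2, 20)))
facts.record(LoginEvent.new(1, Date.new(2014, 2, 25)))
p facts.inactive_since(Date.new(2014, 2, 22)) # prints [2]
```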

Job processing

Queues and Processors
Queues
- Durability, Availability, Throughput (pick two)
  - Durability - Once a message is sent, how likely is it to be received
  - Availability - How fault tolerant
  - Throughput - How many jobs can we push through
  - Cost?
- RabbitMQ - Fast, durable, reasonably fault tolerant (depends entirely on how you configure it)
- Amazon SQS - Highly durable, highly available, reasonable throughput, not fast
- Redis - High throughput, fairly durable, availability is hard, really fast
- Not all jobs are the same, Tapjoy has roughly 2 different types of jobs
  - X00,000 / minute (events) - RabbitMQ
  - 1,000,000 / hour (general jobs) - SQS
Processors
- Definition
  - A way to retrieve jobs
  - A way to match that job to code
  - A way to run that code
- Concurrently with RabbitMQ and Fibers
  - Ruby - Forks, Fibers, Threads
  - Rabbit - Async, stateful connections (retrieve a job and mark it finished on the same connection), no batch processing
  - Fibers - Coroutines on the main stack. Fibers are great for async, with no cross-thread communication to worry about
    - Not worth it if the work is CPU bound. IO-bound work is fine.
    - Pain points
      - SystemStackError - a Fiber’s stack is a fixed size (though that size varies), so deep call stacks overflow (e.g., can’t save an AR object in a Fiber)
      - Needs Fiber-aware network libraries
      - Code needs to know it’s on a fiber
  - Fibers aren’t good for general-purpose jobs and RabbitMQ is complex. The speaker wishes he had looked harder at Celluloid.
- SQS Job Processor
  - SQS - HTTP API, Synchronous, allows for batch processing
  - Why not threads
    - Global Interpreter Lock
    - Thread Safety
    - Memory Bloat
  - Building the processor
    - Threads to fetch
    - Fork for each batch of messages
    - The child has a pipe to the parent to share stats
    - However, pipe processing is slow (Tapjoy gets 6 children before hitting the CPU upper bound)
    - Resolv is a DNS resolver written in Ruby rather than C, so lookups don’t block on the GIL; use it instead of the C resolver if you run more than 1 thread
  - Look into learning to use GDB (also useful is the LLVM equivalent, lldb)
  - Sometimes the bug is not your bug
  - Chore: Resque-compatible serialization, per-job config, jobs for each server, queue-agnostic and concurrency-agnostic
    - Not released yet :(
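
The three parts of a processor, plus the fork-per-batch idea with a pipe back to the parent, can be sketched roughly as below. This is not Tapjoy's code: the HANDLERS registry, the JSON message shape, and the in-memory array standing in for SQS are all assumptions made for illustration. It needs a platform that supports fork.

```ruby
require "json"

# A way to match a job to code: a plain registry keyed by job name.
HANDLERS = {
  "double" => ->(args) { args.fetch("n") * 2 },
  "greet"  => ->(args) { "hello #{args.fetch("name")}" }
}.freeze

# A way to retrieve jobs: an Array stands in for a queue that returns batches.
def fetch_batch(queue, size)
  queue.shift(size)
end

# A way to run that code: fork one child per batch, report stats over a pipe.
def process_batch(batch)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    stats = { "ok" => 0, "failed" => 0 }
    batch.each do |raw|
      job = JSON.parse(raw)
      begin
        HANDLERS.fetch(job["name"]).call(job["args"])
        stats["ok"] += 1
      rescue StandardError
        stats["failed"] += 1 # unknown job name or a handler error
      end
    end
    writer.write(JSON.generate(stats))
    writer.close
    exit!(0) # leave the child without running the parent's at_exit hooks
  end
  writer.close
  stats = JSON.parse(reader.read) # read returns once the child closes its end
  reader.close
  Process.wait(pid)
  stats
end

queue = [
  JSON.generate("name" => "double", "args" => { "n" => 21 }),
  JSON.generate("name" => "greet", "args" => { "name" => "Big Ruby" }),
  JSON.generate("name" => "unknown", "args" => {})
]
stats = process_batch(fetch_batch(queue, 10))
puts "ok=#{stats["ok"]} failed=#{stats["failed"]}"
```

Because each batch runs in its own process, a memory leak or crash in a job dies with the child rather than the long-lived parent.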

Shopify sharded Rails

Sharding - data over more than 1 db
Rails assumes 1 db
Auto-increment doesn’t work
You can normally get by on tuning and caching for years
Why Shard
- Smaller indexes
- Better locality, data stays warmer longer
No joins between shards. Normally there is a ‘stop and shard’ moment; Shopify doesn’t do this.
Denormalized everything to have a shop_id since everything is scoped through shop
noeqd, like Snowflake, for ids
Make sure ids are JavaScript-safe (integers no larger than 2^53 - 1)
Shopify ids are auto_incr + N * auto_incr
Rebalancing (moving a shop across shards): lock the shop, then move it
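
A hedged sketch of two of the id concerns above. The ShardSequence class shows one common way to keep auto-increment ids unique across databases (each shard starts at its own offset and steps by the shard count); whether this matches Shopify's exact scheme is an assumption. The js_safe? helper checks the JavaScript limit, since JavaScript numbers lose integer precision above 2^53 - 1.

```ruby
JS_MAX_SAFE_INTEGER = 2**53 - 1

# True when a JavaScript client can represent the id exactly.
def js_safe?(id)
  id.is_a?(Integer) && id.abs <= JS_MAX_SAFE_INTEGER
end

# Hypothetical per-shard sequence: offset, offset + step, offset + 2 * step...
class ShardSequence
  def initialize(offset:, step:)
    @offset = offset
    @step = step
    @n = 0
  end

  def next_id
    id = @offset + @n * @step
    @n += 1
    id
  end
end

shard_a = ShardSequence.new(offset: 1, step: 3)
shard_b = ShardSequence.new(offset: 2, step: 3)
p Array.new(3) { shard_a.next_id } # prints [1, 4, 7]
p Array.new(3) { shard_b.next_id } # prints [2, 5, 8]
p js_safe?(2**53)                  # prints false
```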

Living Social lightning talks

Check out Pry
text-table gem for viewing data in the console
Using thor to grep to view logs
“Telling a programmer there is already a library for that is like telling a song writer there is already a song about love”

Castle on a cloud

Look up ChatOps video
Common tasks are automated through Hubot commands
IAM is basically like LDAP. Can limit within a bucket too.
Look into Graphite for graphing. I think it’s a Python app
VPC as a VPN
GitHub is a MySQL shop and uses RDS even for Heroku-based apps

Working effectively on a distributed team

As Remote Worker
- Be visible
  - make sure people see your work
  - speak up in meetings
  - Use video, if you have to talk to someone, make it a video
  - Be visible with your calendar (we currently don’t really do this)
- Be yourself
  - LivingSocial has an ‘off topic’ room in Campfire that lets people show their personalities. Being silly and being fun
    - You need the water cooler time even online. Work can be / is social
    - People need to see you in addition to your work
- Be disciplined
  - Not just not being lazy, also be effective with the team (e.g., not working during odd hours)
  - Set up a dedicated workspace
- Be available
- Be flexible
- Be free (e.g., if you want to move around, move around. just do your work)
Use video - it’s well worth the additional effort, even if it’s 6fps
Screenhero for pair programming (screen sharing and audio)

Refactoring with science

GitHub used to say “never break the API”. Now, because of usage data, they can make a change and contact the affected users
GitHub deploys branches to production, rather than merging into master and then deploying
This allows for easier rollbacks by just deploying master again to revert
dat-science allows for basically A/B Testing of code
dat-analysis allows for visualizing the dat-science results
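
The idea behind this kind of experiment can be sketched in plain Ruby. The Experiment class below is hypothetical and is not the dat-science API: it runs the existing code (control) and the refactored code (candidate), records whether they agree, and always returns the control's value so callers are unaffected.

```ruby
class Experiment
  attr_reader :results

  def initialize(name)
    @name = name
    @results = []
  end

  def control(&block)
    @control = block
  end

  def candidate(&block)
    @candidate = block
  end

  # Runs both paths; a failing candidate is recorded, never raised.
  def run
    control_value = @control.call
    candidate_value =
      begin
        @candidate.call
      rescue StandardError => e
        e
      end
    @results << {
      name: @name,
      matched: control_value == candidate_value,
      control: control_value,
      candidate: candidate_value
    }
    control_value
  end
end

experiment = Experiment.new("order-total")
experiment.control   { [1, 2, 3].inject(0) { |sum, n| sum + n } }
experiment.candidate { [1, 2, 3].sum }
puts experiment.run                   # prints 6
puts experiment.results.last[:matched] # prints true
```

The recorded mismatches are the data a tool like dat-analysis would then let you inspect.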

Building a service

Before writing a service, they write a spec
Curl-ish, Description, URL params, Request Body, Response (201 created, 409 exist… etc)
Then they write the client. It allows them to understand error handling
Writing configuration
Rack::Test is good for testing API services
Dev would be best in Docker or Vagrant, but Union Metrics (Austin dev team) just ‘wings it’
Ops is important in SLA
Benefits
- Small deploys and experiments (e.g., with different languages), because a service only has to talk HTTP
- Experiments are double edged. Programmers are happy trying new things, but you might get boxed into something
- Clients and servers are built in parallel for faster development. Stubbing the server allows the client to be visualized quicker
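
As a rough illustration of the spec-first flow, here is a tiny Rack-style app (any object whose call(env) returns status, headers, body) that answers 201 on create and 409 when the resource exists. The /widgets endpoint and WidgetService name are invented for this sketch. It is exercised by calling the app directly; Rack::Test would wrap the same app with nicer helpers such as post and last_response.

```ruby
require "json"
require "stringio"

# Hypothetical service: POST /widgets creates a widget by name.
class WidgetService
  def initialize
    @widgets = {}
  end

  def call(env)
    unless env["REQUEST_METHOD"] == "POST" && env["PATH_INFO"] == "/widgets"
      return respond(404, { "error" => "not found" })
    end

    payload = JSON.parse(env["rack.input"].read)
    name = payload["name"]
    return respond(409, { "error" => "exists" }) if @widgets.key?(name)

    @widgets[name] = payload
    respond(201, payload)
  end

  private

  def respond(status, body)
    [status, { "content-type" => "application/json" }, [JSON.generate(body)]]
  end
end

# Builds the minimal env a POST needs; a real client or Rack::Test does this.
def post_env(path, body)
  {
    "REQUEST_METHOD" => "POST",
    "PATH_INFO" => path,
    "rack.input" => StringIO.new(JSON.generate(body))
  }
end

app = WidgetService.new
puts app.call(post_env("/widgets", { "name" => "gear" })).first # prints 201
puts app.call(post_env("/widgets", { "name" => "gear" })).first # prints 409
```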

Legacy codebase

It’s not a problem until it’s a problem
Think of working with legacy code like an archaeologist
- You have to dig, research, piece together… all to figure out what happened and why these choices were made
- Survey, Excavation, Analysis
Survey
- Take inventory with tests, code comments, new relic, benchmarks
- Ask dumb questions and flag myths. You’ll be able to figure out the culture, which leads to insights into the code
- Tricks -> Techniques -> Process -> Methodology -> Dogma (seems like you should watch out for this)
Excavation
- If you have a goal to fix something, fix something. If something else comes out of the excavation, make an issue and get to it after your goal
- “First, do no harm”
- Use warn when making changes while leaving current implementation in place
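
One way to read the warn advice, as a small sketch: keep the legacy method working exactly as before, add the new implementation beside it, and have the old path print a warning naming its caller so you can find what still depends on it. The Invoice class and method names are made up for illustration.

```ruby
class Invoice
  def initialize(cents)
    @cents = cents
  end

  # Current implementation, left in place and still returning the old value.
  def total_dollars
    warn "DEPRECATED: Invoice#total_dollars called from #{caller(1, 1).first}; use #total_cents"
    @cents / 100.0
  end

  # New implementation introduced alongside the old one.
  def total_cents
    @cents
  end
end

invoice = Invoice.new(1999)
puts invoice.total_dollars # prints 19.99 (and a warning on stderr)
puts invoice.total_cents   # prints 1999
```

Kernel#warn writes to standard error, so the warnings show up in logs without changing what callers receive.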
Analysis
- Keep up the documentation in blog posts, wiki articles, readme or even tests
- Document everything. Even the most trivial things like initializing from scratch. Also keep it up to date
- Make a map. Search for gems that generate UML diagrams for a Rails app
The fear of looking at and working with a legacy codebase is all in your head
Ask why is it the way it is
It’s not bad code in the sense that it’s not working (it probably is working); it just never got the refactor step after green

Active interaction

Skinny controllers, fat models make god models
Rails says to focus models on nouns
Consider making models with verbs
active_interaction
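
A plain-Ruby sketch of a "verb" model, to show the shape of the idea. This is not the active_interaction gem's API; SignUpUser, its inputs, and its Result struct are hypothetical. The object is named after what it does, validates its own inputs, and returns a result instead of growing another method on a noun model.

```ruby
class SignUpUser
  Result = Struct.new(:success, :value, :errors)

  def self.run(inputs)
    new(inputs).run
  end

  def initialize(inputs)
    @email = inputs[:email].to_s.strip
    @name = inputs[:name].to_s.strip
  end

  def run
    errors = []
    errors << "email is invalid" unless @email.include?("@")
    errors << "name is required" if @name.empty?
    return Result.new(false, nil, errors) unless errors.empty?

    # A real interaction would persist a record here.
    Result.new(true, { email: @email.downcase, name: @name }, [])
  end
end

outcome = SignUpUser.run(email: "Dev@Example.com", name: "Dev")
puts outcome.success       # prints true
puts outcome.value[:email] # prints dev@example.com
```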

Key models

RDB - Relations, transactions, schemas, ability to extend and ad-hoc queries
KVS - Schema-less, single-access reads, write-heavy (append only), easier to scale (restricted api), it’s just a hash
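
To make "it’s just a hash" concrete, here is a toy append-only key-value store. The AppendOnlyStore name and its three methods are invented for this sketch; writes only ever append to a log, and a hash index gives single-access reads of the latest value.

```ruby
class AppendOnlyStore
  def initialize
    @log = []   # every write, in order; entries are never modified
    @index = {} # key => position of the latest write in @log
  end

  def put(key, value)
    @log << [key, value].freeze
    @index[key] = @log.length - 1
    value
  end

  # One hash lookup, then one array access.
  def get(key)
    position = @index[key]
    position && @log[position][1]
  end

  # Older values are still in the log.
  def history(key)
    @log.select { |k, _| k == key }.map { |_, v| v }
  end
end

store = AppendOnlyStore.new
store.put("plan", "free")
store.put("plan", "pro")
puts store.get("plan")   # prints pro
p store.history("plan")  # prints ["free", "pro"]
```

The restricted API (put, get, no joins or ad-hoc queries) is what makes this kind of store easier to scale than a relational one.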

Managing fleets (could be Postgres)

Clint is from Missouri. Not St. Louis, not KC. Columbus.
In the beginning it was a Sinatra app talking to AWS using Sequel
The simplest thing that could possibly work, but no less
Now 5 apps in Sinatra, Fog to talk to AWS, and still Sequel
They use Sinatra to map to other Sinatra apps based on URL
50 worker types, 100s of workers all doing the heavy lifting
each machine only has OS, Postgres, and Wal-E
Monitoring is from the outside in. Using State Machines and stateless workers
This is kind of like in a game. Loop over: observe environment, do something based on env
Their queue loop is: while there is a resource, feel, do, requeue, over and over again
Feeler - collects data on the resource and appends it to an observation table. It also determines the resource’s state
Think - Evaluate the state the resource is in (e.g., if it’s uncertain, do x)
Stateless (background) workers talk to AWS, the Heroku API, and Postgres because those all require a network connection
Failure
- notify and push to another queue
- it either resolves or needs a human
- the notifications lead to the development of playbooks based on the type of incident notification.
- They have codified their playbooks so that different incidents can be resolved automatically (like restarting)
- “Have you tried turning it off and on again?” - can resolve a lot of AWS issues because images will pop up on other machines
- If the codified resolver can’t do it, it says “I need an adult” and sends it to a human
State Machines are great
Use Stateless workers
Expect things to break
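
The feel / think / do / requeue loop and the "I need an adult" escalation can be sketched as below. Everything here is hypothetical (the FleetMonitor class, the :up and :down readings, the restart limit); the real system uses background workers and a database-backed observation table rather than in-memory arrays.

```ruby
class FleetMonitor
  attr_reader :observations, :needs_human

  def initialize(probe:, max_restarts: 2)
    @probe = probe          # callable: resource name -> :up or :down
    @max_restarts = max_restarts
    @observations = []      # append-only observation table
    @needs_human = []
    @restarts = Hash.new(0)
  end

  # Feel: collect data on the resource and append it.
  def feel(resource)
    reading = @probe.call(resource)
    @observations << { resource: resource, reading: reading }
    reading
  end

  # Think: evaluate the state and pick an action.
  def think(resource, reading)
    return :ok if reading == :up

    @restarts[resource] < @max_restarts ? :restart : :escalate
  end

  # Do: apply the codified playbook, or hand off to a human.
  def act(resource, action)
    case action
    when :restart then @restarts[resource] += 1
    when :escalate then @needs_human << resource
    end
  end

  # One pass over a resource; returns true if it should be requeued.
  def tick(resource)
    action = think(resource, feel(resource))
    act(resource, action)
    action != :escalate
  end

  def run(queue, max_ticks:)
    max_ticks.times do
      resource = queue.shift
      break if resource.nil?

      queue.push(resource) if tick(resource)
    end
  end
end

probe = ->(resource) { resource == "db-2" ? :down : :up }
monitor = FleetMonitor.new(probe: probe)
queue = %w[db-1 db-2]
monitor.run(queue, max_ticks: 10)
p monitor.needs_human # prints ["db-2"]
```

The worker keeps no state between ticks beyond what it wrote down, which is what lets any worker pick up any resource.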

Understanding others

Leveling up people skills
- Everyone is different
Developers
- Grinders - product/feature-obsessed. Iterative. “Move fast, break things”
  - Explain how process X is better to communicate
  - Use TDD
- Tour Guides - Works across the system/stack. Helps others
  - Ask how is something the way it is.
  - 50/50 split between legacy and sharing knowledge. write it down and spread the knowledge
- Geniuses - Thinkers, worriers. Focused on quality and experimentation
  - Ask how can we iterate towards it being better
  - Make sure they don’t want to completely throw out the current system
- T-shaped people - Focused on one area of expertise.
  - they might be so-so at everything else except that one thing.
  - share that particular knowledge. teach their weaknesses
- Fun leader
  - People skills with some tech knowledge
  - They can’t fix every people problem. let them cool down.
Learn about yourself and know your own tendencies
- Know what you need to recharge yourself. Know how to humor people
- Know when your personality type is in conflict with the other
Empathy - Understand the other and their current situation and next move
- Can be difficult online.
- Why would someone say that?
  - Hanlon’s razor: stupidity over malice
  - Occam’s razor: simple over complex
There is another person on the other end of the text box
“An adult is someone who has been around longer than one hype cycle”
“Upstart is 1-3 years” and wants to rewrite the world
