Jesse Buchanan (“jbinto”)

I’m a software developer, a Torontonian, and a Leafs fan.

Using Ansible to set up a Rails deployment environment

I’ve been using Linux for quite a long time, but I still get a little anxious every time I have to run a server in production, listening on the internet.

It’s not so hard to get a server up and running. Even if you’re not a system administrator, a little bit of Linux experience and some Google-fu will get you there fairly quickly.

Now, upgrade everything a few times, remove a few packages, add ten more, change the hostname back and forth, break nginx trying to add SSL, install packages from source that overwrite system files. The Jenga tower is teetering, but somehow it still works.

You have a snowflake server.

You’re probably not this undisciplined with your production environment. But if you spend most of your time working on software and not infrastructure, and only have one or two servers to manage, you’re going to be rusty approximately 100% of the time. When it comes time to change hosts, or if your server gets compromised, are you ready to take action at a moment’s notice?

You have to do a lot of things to get a web server running, but none of them are particularly challenging. They are repetitive, detail-oriented, exact. The sort of things humans are lousy at and computers are great at.

Even surgeons can benefit from checklists. Why not create an executable one?

That’s exactly what I’ve done using Ansible.

I’ve created a playbook for provisioning a Rails environment. With one command, I can build – from scratch; not an image – a brand new virtual server on my laptop with the following:

  • Ubuntu 12.04
  • Basic server security (SSH keys, ufw, disable PasswordAuthentication and PermitRootLogin, unattended upgrades)
  • PostgreSQL - latest
  • rbenv/ruby 2.1.1
  • nginx with Phusion Passenger - latest
  • An environment for my Rails app: a deploy user, an nginx vhost, a Postgres database/user with a random password, and a database.yml pre-set with that password.
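
As a rough illustration of that last item, here is a Ruby sketch of generating a database.yml with a random password. The playbook does this in Ansible; the database_yml helper, the myapp name and the exact keys below are assumptions made up for this example.

```ruby
require "securerandom"
require "yaml"

# Hypothetical helper: builds the contents of a Rails database.yml for one
# environment. The password defaults to 32 random hex characters.
def database_yml(app_name, password: SecureRandom.hex(16), env: "production")
  config = {
    env => {
      "adapter"  => "postgresql",
      "encoding" => "unicode",
      "database" => "#{app_name}_#{env}",
      "username" => app_name,
      "password" => password,
      "host"     => "localhost"
    }
  }
  config.to_yaml
end

# Print a config for an app called "myapp" (an illustrative name).
puts database_yml("myapp")
```

The point is only that the password is generated once, at provisioning time, and written straight into the file the app will read, so it never has to be typed or remembered.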

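The Ansible DSL is easy to understand: it’s just a YAML file. It’s declarative, like SQL or CSS, not imperative like C or Ruby. It’s also designed to be idempotent, which means running a playbook many times should leave the server in the same state rather than pile up undesirable side effects. In fact, since several of the packages above are installed at their latest version, running it again on the same server will upgrade it.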
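
To make “idempotent” concrete, here is a minimal Ruby sketch of the idea. It is not how Ansible is implemented: ensure_line is a hypothetical helper that only touches a file when the desired line is missing, and reports whether it changed anything.

```ruby
require "tempfile"

# Hypothetical helper: make sure `line` is present in the file at `path`.
# Returns true if the file was changed, false if it was already as desired.
def ensure_line(path, line)
  content = File.exist?(path) ? File.read(path) : ""
  return false if content.lines.map(&:chomp).include?(line)

  File.open(path, "a") do |f|
    # Start a fresh line if the file does not end with a newline.
    f.puts unless content.empty? || content.end_with?("\n")
    f.puts(line)
  end
  true
end

Tempfile.create("sshd_config") do |f|
  f.close
  p ensure_line(f.path, "PasswordAuthentication no") # first run changes the file
  p ensure_line(f.path, "PasswordAuthentication no") # second run is a no-op
  puts File.read(f.path)
end
```

Running the helper twice leaves exactly one copy of the line in the file. That “describe the end state, change only what differs” shape is what makes re-running a playbook safe.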

The value of this cannot be overstated. Just like when you first discover the value in test-driven development or source control, there’s this feeling of invincibility. Every bit of progress you make is cemented in your playbook.

Having this virtual server “recipe” is great. It means I am always a few minutes away from a reliable Rails environment, even if I lose access to my primary computer. But this is only scratching the surface.

With the digital_ocean Ansible module, I’m able to connect to the DigitalOcean API and spin up cloud instances. I can edit one configuration file, and with one command:

  • Provision a virtual server on DigitalOcean with the right region/size/SSH keys
  • Create an A record in DigitalOcean DNS matching the IP I was just assigned

Now I can run the same playbook that worked on my laptop, on the pristine DigitalOcean droplet. Within 10 minutes, I have a freshly-baked Rails environment on the Internet that wasn’t there before. And it just works.

Once the environment is set up, you can deploy Rails apps “normally” using Capistrano.

Why not _____?

  • Heroku: I’m not Heroku’s target customer right now. The free tier is useful when you are certain you can stay within it, but cost creep is real. $35 sounds reasonable for a hobby app until it’s $70 (another dyno), then $90 (SSL), then $140 (better database)…

I’m trying to work on a PostGIS project, and the lowest tier of database Heroku offers with PostGIS costs $200/month. Not going to happen.

  • Docker/Dokku: Docker uses some new Linux features to provide real containers to applications: think ultra-lightweight VMs. Dokku is an incredible little project that allows Heroku buildpack-style deployments to your own server running Docker. It tries to mimic Heroku at every turn, and it’s really, really cool.

But it’s not really ready for production. At least, not for someone who isn’t ready to hack on it a bit. It can be difficult to get containers to talk to each other. The order in which containers start up can be problematic. It seems Dokku is a small piece of a greater puzzle: a full open source Heroku replacement, an upcoming project called Flynn. Unfortunately I need something now and I’m not willing to work on the bleeding edge.

Enough talk, show me the code!

Okay. I won’t be walking through any of it here, like I normally would. It only took me a few minutes to get the hang of Ansible; the rest of the time was spent trying to get the server configuration just right.

The playbook for building the server (either locally or remotely), and provisioning the DigitalOcean VM, is here:

https://github.com/jbinto/ansible-ubuntu-rails-server

I also took the sample application from Michael Hartl’s Rails Tutorial and added a Capistrano 3 configuration to test a real app deployment.

https://github.com/jbinto/rails4-sample-app-capistrano

If you have any questions or suggestions for improvement, I’d love to hear from you. Leave a comment or send me an email.

Loading Toronto Bikeways data into PostGIS

Working with shapefiles

I’ve been working with the City of Toronto’s open data catalogue, specifically the Bikeways route data. The route data is provided as an ESRI Shapefile, a format developed by ESRI whose specification is openly published.

If you go to the Toronto open data site, you’ll see there are two shapefiles provided: “MTM 3 Degree Zone 10, NAD27” and “WGS84 (Latitude/Longitude)”. After a short crash course on coordinate systems and projections, I found I’m in luck: WGS84 is what Google Maps uses, so the data is already in the correct format.

There are open source tools you can use to convert from shapefiles to something more usable, such as KML (if you’ve ever used Google Earth or the Maps API, you’re familiar with KML). There’s also a newer format called GeoJSON, which I’ll be using. In a stroke of luck, the Google Maps team just recently (March 2014) announced support for GeoJSON, through its new Data Layer functionality.

The first naive approach I took was to just dump the whole file into GeoJSON.

I followed these instructions and used the ogr2ogr program which is part of the GDAL suite of tools. On OS X:

brew install gdal
ogr2ogr -f GeoJSON -t_srs crs:84 bikeways.geojson CENTRELINE_BIKEWAY_OD_WGS84.shp

It didn’t take long, less than a second, and I was surprised not to be confronted with a wall of verbose text. I pushed it to GitHub. GitHub renders GeoJSON files in a map view in the browser, so lacking any other tools to verify the output, I figured it’d be a good smoke test. What I didn’t realize was that the file was 50MB, not the 5MB I thought at first glance. The browser spun and spun, but eventually timed out and couldn’t render the file.

Installing PostGIS

Clearly, it would take a bit more work. I got to reading about PostGIS, which is a spatial/geographic database extension for PostgreSQL.

I already had PostgreSQL installed via Homebrew, and I hadn’t made any notable changes to the configuration. PostGIS can also be installed via Homebrew:

brew install postgis
createdb postgis_junk
psql -d postgis_junk

CREATE EXTENSION postgis;

-- verify the extension is installed
SELECT postgis_full_version();

Importing the data

I then used the shp2pgsql tool to generate an SQL script to load the data into PostgreSQL.

# -s = SRID of the source data. WGS84 is also known as EPSG:4326.
# -I = create a spatial (GiST) index on the geometry column
# -c = create a new table and populate it
# -W = character encoding of the shapefile's attribute data
shp2pgsql -s 4326 -I -c -W UTF-8 CENTRELINE_BIKEWAY_OD_WGS84.shp shapes > shapes.sql

wc -l shapes.sql
60444

Roughly 60,000 INSERT statements! At this point, I had no idea what the data looked like. The Toronto cycling map has a lot going on, but there are certainly not 60,000 routes.

I loaded the data into PostgreSQL and started looking at it.

psql -d postgis_junk < shapes.sql
echo "\d shapes" | psql -d postgis_junk

                                       Table "public.shapes"
   Column   |              Type              |                      Modifiers                       
------------+--------------------------------+------------------------------------------------------
 gid        | integer                        | not null default nextval('shapes_gid_seq'::regclass)
 geo_id     | numeric(10,0)                  | 
 lfn_id     | numeric(10,0)                  | 
 lf_name    | character varying(110)         | 
 address_l  | character varying(20)          | 
 address_r  | character varying(20)          | 
 oe_flag_l  | character varying(2)           | 
 oe_flag_r  | character varying(2)           | 
 lonuml     | integer                        | 
 hinuml     | integer                        | 
 lonumr     | integer                        | 
 hinumr     | integer                        | 
 fnode      | numeric(10,0)                  | 
 tnode      | numeric(10,0)                  | 
 one_way_di | smallint                       | 
 dir_code_d | character varying(20)          | 
 fcode      | integer                        | 
 fcode_desc | character varying(100)         | 
 juris_code | character varying(20)          | 
 objectid   | numeric                        | 
 cp_type    | character varying(50)          | 
 rid        | double precision               | 
 geom       | geometry(MultiLineString,4326) | 
Indexes:
    "shapes_pkey" PRIMARY KEY, btree (gid)
    "shapes_geom_gist" gist (geom)

The data will need to be cleaned up significantly. It’s denormalized with a lot of repetition.

Immediately, I noticed some entries that were clearly not bike paths. 427/Gardiner, Brown’s Line, etc. It didn’t take long to figure out the common thread: cp_type.

select count(*) from shapes;
 60415

select count(*) from shapes where cp_type is not null;
  6573

select distinct cp_type from shapes order by 1;
 Bike Lanes
 Contra-Flow Bike Lanes
 Cycle Tracks
 Informal Dirt Footpath
 Major Multi-use Pathway
 Minor Multi-use Pathway
 Park Roads Cycling Connections
 Sharrows
 Signed Routes
 Suggested On-Street Connections
 Suggested On-Street Routes

Well, that’s a big difference: 60,000 entries to 6,500 entries. It seems the Toronto open dataset also includes all of the roads in the city. I suppose this makes sense, as in order to make a cycling map you would still need to include the roads that are not specifically designated for cycling.

I considered deleting the roads, but they may serve some purpose to me in the future. I created a view to keep the roads out of the way, to focus entirely on bikeways.

CREATE VIEW "public"."bikeways" AS SELECT shapes.gid,
    shapes.geo_id,
    shapes.lfn_id,
    shapes.lf_name,
    shapes.address_l,
    shapes.address_r,
    shapes.oe_flag_l,
    shapes.oe_flag_r,
    shapes.lonuml,
    shapes.hinuml,
    shapes.lonumr,
    shapes.hinumr,
    shapes.fnode,
    shapes.tnode,
    shapes.one_way_di,
    shapes.dir_code_d,
    shapes.fcode,
    shapes.fcode_desc,
    shapes.juris_code,
    shapes.objectid,
    shapes.cp_type,
    shapes.rid,
    shapes.geom
   FROM shapes
  WHERE (shapes.cp_type IS NOT NULL);
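
The same filter could in principle be applied to the GeoJSON export before pushing it anywhere. This is a sketch only: it assumes each feature carries the shapefile’s cp_type column as a CP_TYPE property (the exact property name and case depend on the export), bikeways_only is a hypothetical helper, and the two sample features use made-up coordinates.

```ruby
require "json"

# Hypothetical helper: keep only features whose (assumed) CP_TYPE property
# is set, mirroring the "cp_type IS NOT NULL" condition of the view.
def bikeways_only(feature_collection, property: "CP_TYPE")
  features = feature_collection.fetch("features").reject do |feature|
    value = (feature["properties"] || {})[property]
    value.nil? || value.to_s.strip.empty?
  end
  # merge returns a new hash, so the input collection is left untouched.
  feature_collection.merge("features" => features)
end

# Made-up sample data: one bikeway and one ordinary road.
sample = JSON.parse(<<~GEOJSON)
  {
    "type": "FeatureCollection",
    "features": [
      { "type": "Feature",
        "properties": { "LF_NAME": "Beltline Trl", "CP_TYPE": "Major Multi-use Pathway" },
        "geometry": { "type": "LineString", "coordinates": [[-79.41, 43.70], [-79.40, 43.70]] } },
      { "type": "Feature",
        "properties": { "LF_NAME": "Brown's Line", "CP_TYPE": null },
        "geometry": { "type": "LineString", "coordinates": [[-79.54, 43.60], [-79.54, 43.61]] } }
    ]
  }
GEOJSON

filtered = bikeways_only(sample)
puts filtered["features"].map { |f| f["properties"]["LF_NAME"] }
```

Given that only about a tenth of the rows have a cp_type, a filter like this would likely shrink the exported file considerably.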

Some fun queries

Now, for some simple analysis of the data.

Toronto bikeways, by name, sorted by length

select lf_name, sum(st_length(geom::geography)) from bikeways group by lf_name order by 2 desc;

Humber River Recreational Trl 32281.7240352214
Beltline Trl  18177.9734050018
West Highland Creek Trl 17814.6034264817
Martin Goodman Trl  17091.7664842414
East Don River Trl  14531.8038715227
Finch Corridor Trl  14517.1523456954
...

Toronto bikeways, by type, sorted by length

select cp_type, sum(st_length(geom::geography))/1000 as km from bikeways group by cp_type order by 2 desc;

Major Multi-use Pathway 195.97070786872
Signed Routes 132.051380133788
Minor Multi-use Pathway 127.11080747771
Suggested On-Street Routes  125.16581425177
Bike Lanes  109.282966609882
Suggested On-Street Connections 78.4045134911026
Park Roads Cycling Connections  27.6211172379201
Sharrows  9.19191499632497
Informal Dirt Footpath  3.93281102346339
Cycle Tracks  2.65004176773446
Contra-Flow Bike Lanes  2.47592665715739

I still don’t fully understand projections

Disclaimer: I’ve been using PostGIS for less than a week.

PostGIS can store both geometry and geography. Geometry is as we learned in grade school: 2-dimensional Cartesian coordinates, simple to manipulate, familiar formulas for length, area, etc. Geography treats latitude and longitude as angular coordinates on a spheroid. Asking a geography for its length returns an answer in metres, whereas a geometry answers in the units of its coordinate system, which for SRID 4326 means degrees. That is why the queries above cast geom to geography before calling st_length. Simply put, the two don’t really mix.
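
To make the difference in units concrete, here is a small Ruby sketch comparing a naive planar length in degrees with a great-circle length in metres for the same pair of longitude/latitude points. It uses the haversine formula on a sphere of assumed radius 6,371 km, which only approximates the spheroid calculation PostGIS performs for geography; the function names are made up for this example.

```ruby
# Convert degrees to radians.
def to_rad(degrees)
  degrees * Math::PI / 180.0
end

# Length in degrees, treating [lon, lat] pairs as flat x/y coordinates.
def planar_length_degrees(a, b)
  Math.hypot(b[0] - a[0], b[1] - a[1])
end

# Great-circle distance in metres between two [lon, lat] pairs (haversine).
# The radius is an assumed mean Earth radius; a sphere, not a spheroid.
def haversine_metres(a, b, radius: 6_371_000.0)
  lon1, lat1 = a.map { |d| to_rad(d) }
  lon2, lat2 = b.map { |d| to_rad(d) }
  h = Math.sin((lat2 - lat1) / 2)**2 +
      Math.cos(lat1) * Math.cos(lat2) * Math.sin((lon2 - lon1) / 2)**2
  2 * radius * Math.asin(Math.sqrt(h))
end

north = [[-79.4, 43.0], [-79.4, 44.0]] # one degree of latitude
east  = [[-80.0, 43.7], [-79.0, 43.7]] # one degree of longitude

[north, east].each do |a, b|
  puts "#{planar_length_degrees(a, b)} degrees, #{haversine_metres(a, b).round} metres"
end
```

Both segments are exactly one “degree” long in planar terms, but one degree of latitude is about 111 km while one degree of longitude at Toronto’s latitude is noticeably shorter. A length in degrees is therefore not something you can meaningfully sum across segments.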

Some things I’ve learned:

  • A geography is the “only” way to store latitude/longitudes natively without projections.
  • A geometry can only be positioned on the globe when combined with an SRID.
  • Geometry has more features than geography and is much faster.
  • Geography is more appropriate for doing “global” calculations, e.g. trans-oceanic flight paths.
  • Geometry is more appropriate for city, province/county level calculations.

Here’s where it gets messy. One thing that worries me is the following quote:

You might be tempted to store latitude and longitude in a geometry type column. That is, to set up your PostGIS column with a geometry type, but use SRID=4326 (which is the EPSG number for WGS 84 latitude and longitude).

Don’t do this.

I’m pretty sure that’s exactly what I’m doing. It’ll take me a bit longer to understand exactly what this means and why it’s a problem, because for now, everything is working fine.

UPDATE: This GIS StackExchange question helps clear things up a bit. More in a future post.

Next step: Creating a Rails app, with migrations to automate the shapefile import and associated data normalization.

Using semantic classes with Twitter Bootstrap

Twitter Bootstrap is a CSS framework used to provide some commonly used design elements for web projects. It’s great to get your project up and running quickly.

I’m not much of a designer, and (most of the time) Bootstrap helps me easily achieve things I’d be hard-pressed to do on my own: well-designed margins & padding, carefully selected colours, and fluid grid layouts that work cross-browser.

One thing that irks me about Bootstrap, however, is the fact that it litters your markup with its non-semantic class names. If you’re not familiar with semantic markup, the idea is that your HTML markup should only indicate meaning, not presentation. For example:

Semantic markup compared to non-semantic markup
  <!-- Not semantic: we're saying "this is bold" in the markup. -->
  This is some <b>important</b> text.

  <!-- Not semantic: there's a tag to indicate headers. -->
  <div class="header">Welcome to the site</div>

  <!-- Better: screen-readers can understand the meaning of the strong tag. -->
  This is some <strong>important</strong> text.

  <!-- Better: a web scraper like the Googlebot will understand this is your header. -->
  <header>Welcome to the site</header>

However, when you look at Bootstrap’s example code, the markup is filled with non-semantic class names:

A typical Bootstrap example
<div class="row">
  <div class="span9">
    Level 1 column
    <div class="row">
      <div class="span6">Level 2</div>
      <div class="span3">Level 2</div>
    </div>
  </div>
</div>

I’m working on an NHL hockey prediction game in Rails, and in it, I want to use some Bootstrap elements:

  • Buttons to select the home team by 1, or the away team by 2
  • Badges to show the user’s score

For instance, if I want to quickly use the green and turquoise buttons, I could just stick the btn btn-success classes on some buttons (say, picking the home team) and btn btn-info on others (say, picking the away team). But now I’ve lost the ability to style things by meaning, and I’m not being DRY. If I ever decide the colours should change, I have to search-and-replace on btn-success, and that may affect other places in my markup.

Luckily, I’m using the bootstrap-sass gem in my project. This allows me to use SCSS, even though Bootstrap itself is written in LESS.

This means I can make use of the @extend directive, and keep my class names semantic. So, rather than giving my buttons class names like btn-success, I can do the following:

HAML markup
  - css_class = (team == game.home) ? "home" : "away"
  - css_class += (spread_wager == 1) ? "by1" : "by2"
  = f.submit "#{team.code} +#{spread_wager}", class: css_class

SCSS styles
input.homeby1, input.homeby2 {
  @extend .btn;
  @extend .btn-success;
}

input.awayby1, input.awayby2 {
  @extend .btn;
  @extend .btn-info;
}

Now, my markup has no Bootstrap classes in it. It looks like:

<input class="awayby2" name="commit" type="submit" value="PIT +2" />

Cool! I did the same thing with badges elsewhere in the project. To show the impact of a user’s pick, I apply either the win or loss class, and @extend the Bootstrap badge classes in my SCSS.

Some people believe this isn’t worth it, and when it comes to structural/grid layout classes, maybe they’re right. I still think it’s a good idea to separate my “home team by 1” buttons from my “create a new user” buttons, even if they both have the same btn-success class in the end. It will make things easier to change in the future, and won’t require me to apply a second CSS class if I ever need to do more.

For my next project I’m going to take a closer look at Foundation, a competing CSS framework. It seems to have native support for Sass and a first-party Rails gem, and its documentation treats mixins as first-class citizens.

Multiple forms in Rails

Recently I’ve been going through a few Railscasts. These videos are great: focused only on a single task, and easy to follow. Last night I was going through Railscast #136 - jQuery & Ajax, and I decided as an exercise in Rails I’d try to implement the non-AJAX form rather than copy-pasting it. It looks something like this:

Now for a little background: my previous web development experience has been in ASP.NET (aka “web forms”). This is an attempt to bring a Visual Basic style event driven programming model to the web. It’s great for making quick demo apps, but it’s a leaky abstraction that falls apart quickly when trying to do anything that doesn’t fit their model exactly.

One of the biggest flaws of ASP.NET is that its postback model forces you to have only one form per page. After working this way for a few years, I won’t say it became natural (because it most certainly isn’t), but I learned to live with it.

As soon as I saw how the task page was laid out (with multiple submit buttons), I knew I wouldn’t be creating a single form and walking through it looking for changes; rather, one form per task. (n.b. I realize this isn’t the best user experience, of course, and that’s what the Railscast is all about. I just wanted to make it work exactly as in the example.)

<% @incomplete.each do |task| %>
  <li>
    <%= form_for task do |f| %>
      <%= f.check_box :completed %>
      <%= f.label :completed %>

      <%= f.submit 'Update' %>
      <%= task.task %>
    <% end %>
  </li>
<% end %>

This works just fine in terms of submitting the form. Each checkbox/update button pair gets its own form. However, there’s something wrong: the <label> tag is broken. Clicking on any label always checks the first box!

Looking into the rendered HTML, I see this:

<input id="task_completed" name="task[completed]" type="checkbox" value="1" />
<label for="task_completed">Completed</label>
...
<input id="task_completed" name="task[completed]" type="checkbox" value="1" />
<label for="task_completed">Completed</label>

There’s definitely an issue with this. You can’t have multiple HTML elements with the same ID.

The solution is to add a namespace to the form_for helper:

<%= form_for task, namespace: task.id do |f| %>

Now the rendered HTML looks like this:

<input id="5_task_completed" name="task[completed]" type="checkbox" value="1" />
<label for="5_task_completed">Completed</label>
...
<input checked="checked" id="3_task_completed" name="task[completed]" type="checkbox" value="1" />
<label for="3_task_completed">Completed</label>

Much better! Every ID on the page is unique and the <label> tags work as expected.
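
The effect of the namespace can be sketched in a few lines of plain Ruby. This is a hypothetical stand-in, not Rails’ actual implementation: field_id just joins whichever parts are present with underscores, which is enough to reproduce the IDs in the rendered HTML above.

```ruby
# Hypothetical stand-in for how a form builder might derive a field id.
# A nil namespace is simply left out of the id.
def field_id(object_name, attribute, namespace: nil)
  [namespace, object_name, attribute].compact.join("_")
end

task_ids = [5, 3]

without_namespace = task_ids.map { |_id| field_id("task", "completed") }
with_namespace    = task_ids.map { |id| field_id("task", "completed", namespace: id) }

p without_namespace # every form produces the same id
p with_namespace    # one distinct id per task
```

Because a label finds its checkbox by ID, duplicate IDs send every label to the first match in the document. Prefixing each ID with the task’s ID makes each label/checkbox pair unambiguous.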

POST-then-redirect and Chrome bugs

For the past few days I’ve been working on my first Rails app.

A common pattern in web applications is Post/Redirect/Get. The idea is that if you return content directly from a POST, the browser’s reload, back and bookmark features stop behaving sensibly. Consider the following:

  • User sends a POST request to /pictures
  • /pictures returns some result, like “Thanks for submitting.”
  • If the user reloads the page, they’ll get an ugly “You are resubmitting data” dialog. If they click OK, duplicate data will be submitted.
  • If the user bookmarks the page, when they come back, it will be a GET request. This may or may not be what we want.

You see this all the time in older, clunky web applications (think government). If you press the BACK button, it breaks. If you reload, it breaks. And don’t even think about bookmarking anything.

Fortunately, web developers have a pattern to fix this. After completing a POST, they redirect the user to a “safe” page, which the browser then fetches with a GET. Common Rails practice is to always redirect from a POST action, like so:

if @picture.save
  redirect_to pictures_path
end
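
Stripped of Rails, the whole pattern fits in a small Rack-style lambda. This is a sketch under assumptions: build_app, the in-memory store and the picture.title env key are invented for the example, and it answers the POST with 303 See Other, whereas Rails’ redirect_to sends a 302 by default.

```ruby
# Builds a Rack-style app: a lambda that takes an env hash and returns
# [status, headers, body]. `store` stands in for a database table.
def build_app(store)
  lambda do |env|
    case [env["REQUEST_METHOD"], env["PATH_INFO"]]
    when ["POST", "/pictures"]
      store << env.fetch("picture.title", "untitled") # hypothetical env key
      # No content is returned from the POST, only a redirect.
      [303, { "location" => "/pictures" }, []]
    when ["GET", "/pictures"]
      [200, { "content-type" => "text/plain" }, [store.join("\n")]]
    else
      [404, { "content-type" => "text/plain" }, ["Not found"]]
    end
  end
end

store = []
app = build_app(store)

# The POST changes state and answers with a redirect...
status, headers, _body = app.call({ "REQUEST_METHOD" => "POST", "PATH_INFO" => "/pictures", "picture.title" => "sunset" })
puts "#{status} -> #{headers["location"]}"

# ...and the page the user ends up on comes from a plain GET.
status, _headers, body = app.call({ "REQUEST_METHOD" => "GET", "PATH_INFO" => headers["location"] })
puts "#{status}: #{body.join}"
```

Since the last request in the browser’s history is the GET, reloading should only repeat the GET and leave the store unchanged.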

But when I pressed reload on the /pictures page, I saw that ugly resubmit dialog! It didn’t make any sense. I opened the inspector in Chrome, and it showed exactly what I expected: a POST to /pictures, a redirect, and a GET to /pictures.

OK, I must be misunderstanding something in Rails. I pressed reload, blindly clicked OK, and saw that the last POST was duplicated! This was strange…

Somehow I got the inclination to look up chrome reload after 302 (302 is the HTTP status code Rails uses for redirects by default). And I found this Chrome bug report. It confirmed that the behavior I was seeing was in fact a bug in Chrome 25.

It turns out, sometimes the compiler is broken.