
How to upload big files to Amazon S3 while using Heroku

Posted by Ali Ismayilov on 21/02/2014

On the last MVP we had an issue with storing big photos. We solved it with filepicker.io, but filepicker.io is fairly expensive and the client isn't always happy with that.

The project Cliqx was using Paperclip to manage photo uploads and aws-sdk to store them on S3. When a user uploaded a file, it first went to the Heroku web server, was stored in /tmp/cache, and was then pushed to Amazon S3. The problem with uploading through Heroku is its hard 30-second request timeout: if your upload doesn't finish within 30 seconds, you're out of luck.

The solution is to upload a file directly to S3 and attach it to the model. Happily, there's a gem for that:

S3DirectUpload

README says:

Easily generate a form that allows you to upload directly to Amazon S3. Multi file uploading supported by jquery-fileupload.

I've already created a project on GitHub to make use of it: https://github.com/aliismayilov/bigphotoblog

I'll walk through the key parts of the project.
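
Before diving in, here's a minimal Gemfile sketch of the gems this setup relies on (version constraints are omitted here; pin them as your project requires):

# Gemfile (sketch: only the gems relevant to this post)
gem 'paperclip'          # attachment management on the model
gem 'aws-sdk'            # S3 storage backend for Paperclip
gem 's3_direct_upload'   # browser-to-S3 uploads via jquery-fileupload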

S3 Bucket configuration

First of all, you'll need to add a CORS configuration to your S3 bucket. It looks like this:

<CORSConfiguration>
  <CORSRule>
    <AllowedOrigin>http://0.0.0.0:3000</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
    <AllowedMethod>POST</AllowedMethod>
    <AllowedMethod>PUT</AllowedMethod>
    <MaxAgeSeconds>3000</MaxAgeSeconds>
    <AllowedHeader>*</AllowedHeader>
  </CORSRule>
</CORSConfiguration>

NOTE: Don't forget to change AllowedOrigin to your Heroku app URL.

Paperclip initializer

The Paperclip initializer looks as usual; it's configured for S3 storage.

# config/initializers/paperclip.rb
Paperclip::Attachment.default_options[:storage] = :s3
Paperclip::Attachment.default_options[:bucket] = ENV['S3_BUCKET']
Paperclip::Attachment.default_options[:s3_permissions] = :public_read
Paperclip::Attachment.default_options[:s3_credentials] = {
  access_key_id: ENV['AWS_ID'],
  secret_access_key: ENV['AWS_KEY']
}

S3DirectUpload initializer

It also needs to know the AWS credentials to connect directly to the bucket.

# config/initializers/s3_direct_upload.rb
S3DirectUpload.config do |c|
  c.access_key_id = ENV['AWS_ID']
  c.secret_access_key = ENV['AWS_KEY']
  c.bucket = ENV['S3_BUCKET']
  c.region = nil
  c.url = nil
end

NOTE: Don't forget to put the ENV variables in your .env file and/or set them as Heroku config vars.
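
For local development, a .env file with those variables would look something like this (the values below are placeholders, not real credentials):

# .env (placeholder values - substitute your own credentials and bucket name)
AWS_ID=AKIAXXXXXXXXXXXXXXXX
AWS_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
S3_BUCKET=my-bigphotoblog-bucket

On Heroku, the same variables can be set with heroku config:set AWS_ID=... AWS_KEY=... S3_BUCKET=...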

Asset files

# app/assets/javascripts/application.coffee
#= require jquery
#= require jquery_ujs
#= require turbolinks
#= require s3_direct_upload
#= require_tree .

ready = ->
  $("#s3-uploader").S3Uploader()

$(document).ready(ready)
$(document).on('page:load', ready)

// app/assets/stylesheets/application.scss
/*
*= require_self
*= require s3_direct_upload_progress_bars
*= require_tree .
*/

In the aforementioned project I used bootstrap-sass and its progress bar stylings.

Form view

This snippet will render the s3_direct_upload form with its progress bars:

= s3_uploader_form callback_url: model_url, callback_param: "model[image_url]", id: "s3-uploader"
  = file_field_tag :file, multiple: true

erb:
  <script id="template-upload" type="text/x-tmpl">
    <div id="file-{%=o.unique_id%}" class="upload">
      {%=o.name%}
      <div class="progress"><div class="bar" style="width: 0%"></div></div>
    </div>
  </script>

ActiveRecord creation

As soon as the file is saved on S3, there will be a callback to the URL specified as callback_url. By default it will be a POST request, so by the Rails REST convention it will trigger the matching controller's create action, in our case PhotosController#create. Here's our snippet from the controller:

# app/controllers/photos_controller.rb
# Photo is an ActiveRecord model; Photo#upload is a Paperclip attachment.
def create
  current_user.photos.create upload: URI.parse(URI.unescape(params['url']))
end
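
For completeness, here's a sketch of what the Photo model and the route behind that controller might look like. The thumbnail style, the content-type validation and the belongs_to association are assumptions based on the setup described above, not code copied from the project:

# app/models/photo.rb (sketch; the :thumb style is an assumed example)
class Photo < ActiveRecord::Base
  belongs_to :user

  has_attached_file :upload, styles: { thumb: '200x200>' }
  validates_attachment_content_type :upload, content_type: /\Aimage\/.*\z/
end

# config/routes.rb (sketch)
resources :photos, only: [:create]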

Yes, all we need to do is parse the 'url' param that comes from S3. Our Heroku app will quickly download that file, do all the processing (e.g. thumbnail creation) and store it back on S3. As Heroku apps are hosted on Amazon servers, download latency and speed will be very good. The longest delay can happen during post-processing of the file.

If the post-processing of the file takes too long, consider using delayed jobs. But that is a topic for another blogpost 😉

Happy photo

Here's how it was working for me. Happy coding!

[Screenshot of the upload in action]

