Amazon Web Services: Integrating S3 and EC2 in a few simple steps

Hi there! This is my first post on emfluence emsights – I joined emfluence recently as a software engineer – and I’m excited to be here.

By way of an introduction, the title of this post is decidedly nerdy. It’s a bunch of acronyms that sound like Star Wars robots. If the idea of tinkering with code gives you the heebie-jeebies, you might consider skipping to another post. But if you’re interested in setting up highly scalable cloud-based server architecture, then read on!

As just about everyone in web development knows these days, Amazon Web Services (AWS) has become one of the preferred cloud hosting services in the US. Why? Because it’s relatively easy and cost effective to set up robust yet flexible web sites and applications. And because it lets web devs large or small use the same kind of hosting as the biggest companies on the market.

When you start getting into scalability, you want to be able to use your web server (EC2) instances with your virtually limitless storage bucket (S3). That way, you can boot up as many EC2 instances as your traffic demands and they can all access the same files.

But S3 is an object storage service that you talk to through API requests, not a drive. I can hear you saying: How does that mesh with my website, which deals in files and folders?

In order to use S3 as a drive, you need to FUSE a bucket to a particular folder in your VM. Here, “FUSE” is shorthand for mounting the bucket with s3fs, a tool built on FUSE (Filesystem in Userspace). Once that is done, the folder behaves just like any other folder from the standpoint of file management, but the storage occurs in the S3 bucket.

ABOUT SHARING: While S3 is made to be attached to many servers, it’s probably best if those servers share both codebase and database so that conflicts do not occur in file management. Setting up a shared database is very easy with AWS, but is a topic for another day.

ABOUT TIMING: Most of this can occur on a live site, without interrupting anything. I wouldn’t recommend it, though. You’re installing software and messing with the file system. There’s always the chance for conflicts to occur. So it’s better to do this before the site launches, when you can afford to turn off or interrupt the web server. At a minimum, you could spin up a dev copy of the EC2 instance and experiment there before doing it on the live server. Then you could take the minimum required time on the live server, during a time of day when there isn’t much traffic, and reduce your stress levels a bit.

SETTING UP THE BUCKET

Create a new S3 bucket. Name it something specific to the project and use. For instance, “example-upload-folder”. Make sure the name is all in lower case and contains no spaces, underscores or other special characters (hyphens are fine). FUSE requires the name to be in lower case. Don’t worry about creating permissions on the bucket.
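If you want to check a name before creating the bucket, here is a minimal sketch. The function name is my own invention, and it enforces a deliberately strict subset of the S3 naming rules: lower-case letters, digits and hyphens only, 3 to 63 characters, no hyphen at either end. Note that S3 itself rejects underscores in new bucket names.

```shell
# Hypothetical helper (not part of AWS or s3fs): succeeds only for names
# made of lower-case letters, digits and hyphens, 3 to 63 characters long.
is_valid_bucket_name() {
  name=$1
  len=${#name}
  # Length check first.
  if [ "$len" -lt 3 ] || [ "$len" -gt 63 ]; then
    return 1
  fi
  case $name in
    # Any character outside a-z, 0-9 and '-' is rejected.
    *[!abcdefghijklmnopqrstuvwxyz0-9-]*) return 1 ;;
    # Names must not start or end with a hyphen.
    -*|*-) return 1 ;;
  esac
  return 0
}
```

For example, `is_valid_bucket_name "example-upload-folder" && echo ok` prints `ok`, while a name with capitals, spaces or underscores fails the check.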

Each bucket should have its own user created for programmatic access via FUSE. Create a user in AWS IAM. As with the bucket, name it something specific to the project and use. For instance, “example_upload_usr”. Copy the access key ID and secret access key for later.

Create two custom permission policies on the user, one for each of the following JSON documents (change the specifics to your bucket’s details):

{ "Statement": [ { 
  "Action": "s3:*", 
  "Effect": "Allow", 
  "Resource": [ 
    "arn:aws:s3:::BUCKET_NAME", 
    "arn:aws:s3:::BUCKET_NAME/*" 
  ] 
} ] }

And

{ "Statement": [ { 
  "Action": "s3:ListAllMyBuckets", 
  "Effect": "Allow", 
  "Resource": "arn:aws:s3:::*" 
} ] }
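To avoid hand-editing BUCKET_NAME, the first policy can be generated from the shell. This is only a sketch: the function name is hypothetical, and the output is the same policy text as above with the bucket name filled in.

```shell
# Hypothetical helper: prints the bucket-access policy shown above,
# with BUCKET_NAME replaced by the first argument.
bucket_policy_json() {
  bucket=$1
  cat <<EOF
{ "Statement": [ {
  "Action": "s3:*",
  "Effect": "Allow",
  "Resource": [
    "arn:aws:s3:::${bucket}",
    "arn:aws:s3:::${bucket}/*"
  ]
} ] }
EOF
}
```

If you have the AWS command line tools installed and configured (an assumption, since this post uses the web console), the result could be attached with something like `aws iam put-user-policy --user-name example_upload_usr --policy-name bucket-access --policy-document "$(bucket_policy_json example-upload-folder)"`. The policy name there is made up for illustration.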

SETTING UP THE VM

There are four steps to be taken.

Install FUSE;
Create a security file;
Add an fstab entry to automatically mount the FUSE when the VM starts;
Mount / activate the FUSE.

Installing FUSE

https://code.google.com/p/s3fs/wiki/FuseOverAmazon

Run through the FUSE installation instructions. I used the SVN checkout method to download the application. I installed on Ubuntu (like most of our VMs), but because I was using YUM as my installer, I actually used the Fedora / CentOS instructions for installing dependencies. Make sure you install all of the dependencies before trying to build FUSE. I didn’t encounter any problems with the installed packages, though there’s always the chance that a newly installed package damages your VM.

Creating a Security File

Create a file as:

PATH_TO_SITE/security/BUCKET_NAME

PATH_TO_SITE might start with /var/www/… But if you set up your site on the server, you’ll know where your site is located. Anyway – inside that file, enter a single line of text in the following format:

bucketName:accessKeyId:secretAccessKey

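Save and exit. Keep the file permissions tight: the fstab mount is performed by root, and s3fs refuses a password file that other users can read, so owner-only access (mode 600) is a safe choice.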
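Here is a minimal sketch of creating that file from the shell. The function name and the key values in the example are placeholders of mine. It sets the file to mode 600, on the assumption that the mount is performed by root and that s3fs rejects password files readable by other users.

```shell
# Hypothetical helper: writes bucketName:accessKeyId:secretAccessKey
# to the given file and restricts it to its owner.
write_s3fs_credentials() {
  file=$1
  bucket=$2
  key_id=$3
  secret=$4
  # The subshell keeps the umask change from leaking into the caller.
  (
    umask 077
    printf '%s:%s:%s\n' "$bucket" "$key_id" "$secret" > "$file"
  )
  # Also tighten permissions if the file already existed.
  chmod 600 "$file"
}
```

Usage would look like `write_s3fs_credentials PATH_TO_SITE/security/BUCKET_NAME BUCKET_NAME YOUR_ACCESS_KEY_ID YOUR_SECRET_ACCESS_KEY`, with your real values substituted.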

Adding an fstab entry

Grab the UID and GID of the web server (apache or nginx, whatever you’re using on this VM). You can discover these by using:

cat /etc/passwd

These will allow the FUSE to be mounted with the correct ownership and permissions.
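If you’d rather not scan /etc/passwd by eye, the standard `id` command can look up both numbers for a named user. A small sketch, with a function name of my own:

```shell
# Hypothetical helper: prints the numeric UID and GID of a user
# in the uid=...,gid=... form used by the fstab options.
web_server_ids() {
  user=$1
  printf 'uid=%s,gid=%s\n' "$(id -u "$user")" "$(id -g "$user")"
}
```

On many Ubuntu setups the web server runs as `www-data`, so `web_server_ids www-data` is a reasonable first guess. Check which user your own server actually runs as.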

Edit the file at:

/etc/fstab

Add a line at the end, in the following format:

s3fs#BUCKET_NAME PATH_TO_SITE/httpdocs/DIR_TO_FUSE fuse defaults,uid=WEB_SERVER_UID,gid=WEB_SERVER_GID,allow_other,default_acl=public-read,use_cache=/tmp/s3-cache,passwd_file=PATH_TO_SITE/security/BUCKET_NAME 0 0
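That line is long and easy to mistype, so here is a sketch that assembles it from its parts. The function name is hypothetical, and it emits a single use_cache option pointing at /tmp/s3-cache. It only prints the line. Adding it to /etc/fstab is still up to you.

```shell
# Hypothetical helper: prints an fstab entry in the format shown above.
# Arguments: bucket, mount directory, UID, GID, password file.
s3fs_fstab_line() {
  bucket=$1
  mount_dir=$2
  uid=$3
  gid=$4
  passwd_file=$5
  printf 's3fs#%s %s fuse defaults,uid=%s,gid=%s,allow_other,default_acl=public-read,use_cache=/tmp/s3-cache,passwd_file=%s 0 0\n' \
    "$bucket" "$mount_dir" "$uid" "$gid" "$passwd_file"
}
```

Review the printed line before pasting it in, and keep a backup copy of /etc/fstab, since a broken entry can cause trouble at boot.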

ALMOST DONE: TIME TO ACTIVATE

Now, when the VM restarts, the FUSE will be automatically mounted. When you’re ready, you can restart the VM and your FUSE will be available. Up until now, everything could be done without interrupting the hosted site(s). Before you activate, try a fake test mount with the following command:

mount -afv

If you don’t get any errors, you’re good to go.

You have two options. You can activate the mount in a straightforward, forceful way. Or you can be a fancy pro and mount completely seamlessly. If your site isn’t live yet, I’d recommend the first option.

The Straightforward Way

It’s probably a good idea to stop the web server before continuing (apache or nginx, etc). That way, your web app isn’t fighting against you as dependent folders are changed.

You’ll want to make sure that the target folder does not already exist in your file system. If it does, rename it to something else temporarily. After the FUSE gets mounted, you can copy those files back into their proper location.

Restart your VM. This can take a few minutes. The FUSE should now be mounted. Your VM will probably show you a list of active mounts when you reconnect to it. You can test the mount by dropping a file into it and logging into the S3 bucket via the AWS console.

The Fancy Way

You can mount the FUSE without restarting, but if the attached folder is being used on your site then you’ve got a conflict. I haven’t tried this method, but here’s the idea:

Move your target folder and symlink to it from the target folder’s original location.
Instead of FUSEing directly to the target folder, FUSE to a folder nearby.
Mount the FUSE. The FUSE project page lists this as the command:
/usr/bin/s3fs mybucket /mnt
Test the FUSEd mount at your leisure.
Copy the (moved) target folder’s contents into the FUSE folder.
Alter the symlink to point to the FUSE folder.
Profit.
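The folder-shuffling parts of that list can be sketched in shell. This is untested on a live site, just like the idea itself. The function names are made up, and the mount step is left out, so `mount_dir` is assumed to be a folder where the bucket is already mounted and tested.

```shell
# Step 1: move the live folder aside and leave a symlink in its place,
# so the site keeps finding its files at the original path.
move_and_link() {
  target=$1   # original location used by the site
  moved=$2    # where the real folder goes
  mv "$target" "$moved"
  ln -s "$moved" "$target"
}

# Last steps: copy the moved folder's contents into the mounted folder,
# then repoint the symlink at the mounted folder.
switch_to_mount() {
  target=$1
  moved=$2
  mount_dir=$3
  cp -R "$moved/." "$mount_dir/"
  # -n replaces the symlink itself instead of following it.
  ln -sfn "$mount_dir" "$target"
}
```

Files uploaded between the copy and the symlink switch would be left behind in the moved folder, so it’s worth checking that folder for stragglers afterwards.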

If you do it this way, you’ll want to be confident that your fstab works as expected. The best way to gain this confidence is to spin up a replica EC2 instance and see if the FUSE automatically mounts as expected.

Troubleshooting

If the FUSE mount fails for some reason, you’ll receive the message ‘Transport endpoint is not connected’. To correct this, unmount the folder and mount it again. Here are the simple commands:

sudo umount /mnt/aws_s3
sudo mount -va

If that fails, try:

sudo fusermount -u /mnt/aws_s3
sudo mount -va
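If you want to script that check, here is a rough sketch. The function names are hypothetical, and it only acts when listing the folder produces the exact error message quoted above.

```shell
# Hypothetical helper: succeeds if listing the folder reports the
# 'Transport endpoint is not connected' error.
is_stale_mount() {
  # LC_ALL=C keeps the error message in English so grep can match it.
  LC_ALL=C ls "$1" 2>&1 | grep -q 'Transport endpoint is not connected'
}

# Unmounts and remounts only when the folder looks stale.
remount_s3fs() {
  mount_dir=$1
  if is_stale_mount "$mount_dir"; then
    sudo fusermount -u "$mount_dir"
    sudo mount -va
  else
    echo "$mount_dir looks healthy; nothing to do"
  fi
}
```

You would call it with your own mount folder, for example `remount_s3fs /mnt/aws_s3`.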

Did these instructions help you build a better website? Were you able to use them easily? Do you think that AWS is the best or that there’s something better to use? Let me know in the comments!