I’ve always been fascinated by the amount of complexity that can be beautifully hidden in software.
Take, for example, the Site Validator project. Seen from the outside, it’s a simple app: you enter a URL and it scrapes the site and gives you back a report with the HTML and CSS validation errors. But internally, it involves the collaboration of (at least) fifteen servers.
At its core, Site Validator is a monolithic rails application. Everything could run on a single server instance, but to be prepared to scale, a better approach is to split it across different servers so you can manage every part of the system independently.
Rails / PostgreSQL / Redis
The rails application itself runs on a 1 GB server instance on Digital Ocean. Inside it, we’ve got the web server and a sidekiq process with 5 concurrent workers for the background queue. For now, that’s all we need to handle our current load, but we could easily scale by adding more server instances and putting a load balancer in front of them.
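To make the idea of "5 concurrent workers" concrete, here is a minimal sketch of a fixed-size worker pool. It is not Sidekiq and not our actual code: it uses only Ruby's standard library, and the names `WORKER_COUNT` and `run_in_pool` are made up for the illustration.

```ruby
# Stand-in for a background queue with a fixed number of workers, using only
# Ruby's core classes (Sidekiq itself needs Redis and the sidekiq gem).
WORKER_COUNT = 5 # same concurrency as mentioned above

# Runs every job (any object responding to #call) on a fixed pool of threads
# and returns the results in the order the jobs were given.
def run_in_pool(jobs, worker_count: WORKER_COUNT)
  queue = Queue.new
  jobs.each_with_index { |job, index| queue << [job, index] }
  queue.close # once the queue is empty, pop returns nil and workers stop

  results = Array.new(jobs.size)
  workers = Array.new(worker_count) do
    Thread.new do
      while (item = queue.pop)
        job, index = item
        results[index] = job.call
      end
    end
  end
  workers.each(&:join)
  results
end
```

With this in place, something like `run_in_pool(pages.map { |page| -> { page.length } })` would process at most five pages at a time, which is roughly what the concurrency setting of a background queue controls.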
For the PostgreSQL database, we’ve set up a 512 MB server instance, also on DO. We also needed a Redis store for our sidekiq background processing, so there goes another 512 MB server instance just for that. Nothing out of the ordinary.
Orchestrating deployments with Cloud 66
We’re using Digital Ocean to host all those server instances, but we didn’t set up any of them. Instead, we let Cloud 66 set them up for us. You just need to give them access to your git repository (ours is at Bitbucket) and permission to manage your cloud servers (Digital Ocean in our case), and they set up all the needed servers, with security, metrics, backups and scaling. You can also set up a web hook so you can deploy just by doing a
W3C validation software
While the site scraping is done by the rails application, the validation of the pages is done using the same open source software that the W3C uses. So, we have a server instance that is just in charge of doing HTML validation, and another server instance for the CSS validation software. The rails app will query them using their APIs, get the results of the validations and process them.
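As a rough idea of what such a query can look like, here is a small sketch. It assumes the HTML validator is the W3C's Nu HTML Checker, which accepts `doc` and `out=json` query parameters and answers with a `messages` array. The endpoint address and the method names are hypothetical, and this is not the code Site Validator actually runs.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical address of a private HTML validator instance.
VALIDATOR_ENDPOINT = "http://validator.example.internal/"

# Builds the request URI, asking the checker to fetch `page_url` itself and
# answer in JSON (assumes the Nu HTML Checker's `doc` and `out` parameters).
def validation_uri(page_url, endpoint: VALIDATOR_ENDPOINT)
  uri = URI(endpoint)
  uri.query = URI.encode_www_form(doc: page_url, out: "json")
  uri
end

# Keeps only the messages of type "error" from a JSON response body.
def validation_errors(json_body)
  JSON.parse(json_body).fetch("messages", [])
      .select { |message| message["type"] == "error" }
      .map { |message| { line: message["lastLine"], text: message["message"] } }
end

# Performs the HTTP call and returns the list of errors for one page.
def validate_page(page_url)
  response = Net::HTTP.get_response(validation_uri(page_url))
  raise "validator returned #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  validation_errors(response.body)
end
```

Keeping the parsing in its own method means the slow network call can run in a background job while the result handling stays easy to test with canned responses.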
As page validation is a slow process (the validation software needs to get the page to be validated and its stylesheets, process them and return the results), we have several server instances running the same validation software. We’re using 5 server instances for the HTML validation, and 5 server instances for CSS validation. So that’s 10 server instances; they’re easy to set up because Digital Ocean lets you clone servers from their snapshots.
In front of those 10 server instances we’ve put a load balancer server, so our rails application only needs to talk to a single IP, and the load balancer transparently manages all the traffic. The load balancer was surprisingly easy to set up using nginx.
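For readers who haven’t used one, the job of a load balancer can be sketched in a few lines. The example below illustrates round-robin selection, which is nginx’s default way of spreading requests across an upstream group. It is only an illustration in Ruby, not our nginx configuration, and the class name, addresses and port are invented.

```ruby
# Hands out backends one after another, wrapping around at the end of the list.
class RoundRobin
  def initialize(backends)
    raise ArgumentError, "need at least one backend" if backends.empty?

    @backends = backends
    @next_index = 0
    @lock = Mutex.new # keeps the counter consistent across threads
  end

  # Returns the backend that should receive the next request.
  def next_backend
    @lock.synchronize do
      backend = @backends[@next_index]
      @next_index = (@next_index + 1) % @backends.size
      backend
    end
  end
end

# Hypothetical private addresses for the 5 HTML validator instances.
html_validators = RoundRobin.new((1..5).map { |n| "10.0.0.#{n}:8888" })
first_target = html_validators.next_backend
```

Each call to `next_backend` moves on to the following instance, so five slow validations in a row land on five different servers instead of queueing up on one.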
And finally, we’ve got a 2 GB server instance to host the forum, based on discourse. This server instance holds the whole stack: rails app, postgresql, redis, memcached… We went with the basic discourse setup, which ships everything in a single docker image, and it works great out of the box.
So currently we’re using 15 servers for our application, but life would be much harder without the external help we get from:
- Bitbucket to host the git repository.
- RubyGems to host the ruby gems.
- Source Viewer to show the source of the validated pages.
- New Relic to measure the application performance.
- Pingdom for uptime monitoring.
- Sentry to be notified about exceptions.
- Olark for customer support.
- AddThis for social sharing.
- Google Analytics for traffic stats.
- Amazon S3 to store images for the posts and backups.
- Mandrill to send emails.
- Mailchimp for the newsletter.
Thank you everyone!