Tuesday, 28 July 2009

Moving to geekin.gs

I'm moving my blog to a new domain: http://geekin.gs
If you've subscribed to the RSS feed, please subscribe to the new site's feed.

Thanks

Paolo

Sunday, 7 June 2009

Install couchdb 0.9.0 and Erlang R13B from sources on ubuntu Jaunty 9.04

I wanted to play with CouchDB 0.9.0 on Ubuntu Jaunty, and to do this I had to compile it from source. To stay in sync with CouchDBX, I also compiled Erlang R13B.
This is the transcript of what I did.

Erlang


credits: ciarang.com

Dependencies


install some dev packages

sudo apt-get install build-essential m4 libssl-dev libncurses-dev

note: if you need support for odbc, jinterface or wx you need to install their dependencies

Compiling



wget http://erlang.org/download/otp_src_R13B.tar.gz
tar zxf otp_src_R13B.tar.gz
cd otp_src_R13B
./configure
make
sudo make install

CouchDB


Credits: blog.james-carr.org

Dependencies



sudo apt-get install libcurl4-openssl-dev libicu-dev libmozjs-dev

Compiling



wget http://apache.mirroring.de/couchdb/0.9.0/apache-couchdb-0.9.0.tar.gz
tar zxf apache-couchdb-0.9.0.tar.gz
cd apache-couchdb-0.9.0
./configure --sharedstatedir=/var/local --localstatedir=/var/local
make
sudo make install

System configuration


You need to create a user for couchdb

sudo useradd -d /var/local/lib/couchdb couchdb
sudo chown couchdb. /var/local/lib/couchdb
sudo chmod 750 /var/local/lib/couchdb

I prefer to have the logs in /var/log/couchdb and the pid in /var/run/couchdb.pid, so let's edit the configuration file
in /usr/local/etc/couchdb/local.ini
Edit the file so the log section looks as follows

[log]
file = /var/log/couchdb/couch.log

To change the pid location we need to edit the file /usr/local/etc/default/couchdb
This is what my file looks like

COUCHDB_USER=couchdb
COUCHDB_STDOUT_FILE=/dev/null
COUCHDB_STDERR_FILE=/dev/null
COUCHDB_PID_FILE=/var/run/couchdb.pid
COUCHDB_RESPAWN_TIMEOUT=5
COUCHDB_OPTIONS=

It's time to setup the init.d script

sudo cp -v /usr/local/etc/init.d/couchdb /etc/init.d/couchdb

We need to amend /etc/init.d/couchdb to set the ownership of the pid file when the daemon starts

you need to find the section

if test -n "$COUCHDB_PID_FILE"; then
command="$command -p $COUCHDB_PID_FILE"
fi

and change it to

if test -n "$COUCHDB_PID_FILE"; then
touch $COUCHDB_PID_FILE
chown $COUCHDB_USER. $COUCHDB_PID_FILE
command="$command -p $COUCHDB_PID_FILE"
fi

We need to edit /etc/init.d/couchdb once more, because the stop command won't work as it is.
Find this line in the "stop_couchdb" function

command="$COUCHDB -d"

and change it to

command="$COUCHDB -d"
if test -n "$COUCHDB_PID_FILE"; then
command="$command -p $COUCHDB_PID_FILE"
fi


As a last step we want to create the directory for the logs

sudo mkdir /var/log/couchdb
sudo chown couchdb. /var/log/couchdb

And consequently set the log rotation in /etc/logrotate.d/couchdb

/var/log/couchdb/*.log {
daily
missingok
rotate 52
compress
delaycompress
notifempty
create 640 couchdb adm
sharedscripts
postrotate
[ ! -f /var/run/couchdb.pid ] || kill -USR1 `cat /var/run/couchdb.pid`
endscript
}

And now it's time to have some fun

sudo /etc/init.d/couchdb start
curl 127.0.0.1:5984
curl 127.0.0.1:5984/test
curl -X PUT 127.0.0.1:5984/test
curl 127.0.0.1:5984/test
curl 127.0.0.1:5984/test/baz
curl -X PUT -d '{"foo":"bar"}' 127.0.0.1:5984/test/baz
curl 127.0.0.1:5984/test/baz
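
If you'd rather poke CouchDB from ruby than from curl, the same requests can be sketched with Net::HTTP from the standard library. This is only a sketch under a few assumptions: the helper names (create_database_request, put_document_request, couch) are made up for this example, and the host and port are the defaults used by the curl calls above.

```ruby
require 'net/http'
require 'json'

# Assumed host and port: the defaults used by the curl commands above.
COUCH_HOST = '127.0.0.1'
COUCH_PORT = 5984

# Build (but do not send) the request that creates a database.
def create_database_request(db)
  Net::HTTP::Put.new("/#{db}")
end

# Build (but do not send) the request that stores a document under an id.
def put_document_request(db, id, doc)
  request = Net::HTTP::Put.new("/#{db}/#{id}")
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(doc)
  request
end

# Send any request to the local CouchDB and return the response body.
def couch(request)
  Net::HTTP.start(COUCH_HOST, COUCH_PORT) { |http| http.request(request).body }
end
```

With the daemon running, couch(create_database_request('test')), then couch(put_document_request('test', 'baz', 'foo' => 'bar')) and finally couch(Net::HTTP::Get.new('/test/baz')) should mirror the last curl calls.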


Note: I just finished checking this installation process with the release candidate of couchdb 0.9.1 and it works fine

Thursday, 14 May 2009

Amazon AWS SimpleDB a la carte with ruby and aws_sdb_bare

Among the Amazon Web Services (or AWS, if you prefer acronyms), SimpleDB is one of the less popular.

I've been using this service for a while in some personal projects through aws-sdb, so far the only choice for dealing with SimpleDB at a low level.

Since I wanted to overcome some of the limits of aws-sdb I wrote a new gem, aws_sdb_bare.
The main features of the gem are:

* Complete implementation of SimpleDB api
* XML parsing uses Hpricot or Nokogiri

And you're free to choose your HTTP connection library and strategy!

You can use Net::HTTP or curb, or go concurrent using eventmachine or typhoeus to manage your connections with sdb.

So just install it


sudo gem install hungryblank-aws_sdb_bare -s http://gems.github.com


And use the code in this gist to get started; rdoc is available.

For a more complete overview of the SimpleDB ruby libraries currently available you should read this article

Monday, 11 May 2009

RailsConf 2009 the "system" stream

RailsConf 2009 finished a few days ago, and after a long trip back and some time to chill out I went through the conference material, trying to connect some dots among the different topics that were covered.

To give some background, I started my career as a sysadmin (or linux geek, if you prefer), and because of that I like it when custom software, whether it's a rails app or just a small script, is seen in the context of an entire stack or as glue among heterogeneous systems.

Well, this year's RailsConf made me happy by including quite a few talks with this point of view. I'll list them here for my own and others' reference.

The Gilt Effect: Handling 1000 Shopping Cart Updates per second in Rails
This presentation was brilliant. The guys at Gilt have a very interesting business model, which translates into a really interesting technological challenge.
They use a solution that combines ruby, rails, java and postgres to manage extreme peaks of traffic. As a note, they measure their peak traffic in fractions of the amazon.com ecommerce website traffic!
Since they run time-limited sales they make very good use of the EC2 pay-per-use scheme, reducing their cluster to the minimum size in quiet periods and expanding it when a sale is on and customers really fight to get hold of the items Gilt offers.
It was the first time I've seen the cloud architecture leveraged in a completely spike-driven business.
It's a shame that the slides of this presentation aren't available yet.

PWN Your Infrastructure: Behind Call of Duty: World at War
These guys run several communities dedicated to popular games. They've been successful and had to scale their apps: they deploy a single app on multiple servers and they tried several deployment and monitoring tools. What they found out is... the simpler tools are the best for the job. They use shell scripting when shell scripting is enough, chef to manage their deployment process and monit for monitoring. They developed an internal app to keep all the configuration together and aggregate monitoring info.
I really enjoyed the "use the simpler tool for the job" philosophy they embraced, very hard core linux geek stuff. By the way, the only piece of architecture they're looking to change is nfs, used to keep their ruby/rubygems stack in sync across machines.
Slides are on the railsconf website

Building a Mini-Google: High-Performance Computing in Ruby
Interesting talk, very CS oriented. The speaker went through the publicly available docs for the google page rank algorithm, and the presentation shows the theory behind it and the ruby code to actually implement it. Very high quality content for a 45 minute talk.
Slides are on the railsconf website

Confessions of a PackRat
I missed this one and can't find the slides online.

Rube Goldberg Contraptions, Building Scalable Decoupled Web Apps and Infrastructure with Ruby
Really interesting talk, presenting a bunch of interesting technologies and solutions at once: deployment with chef, and computer cloud coordination and administration with nanite, demoed live controlling all the notebooks in the room. The presentation is on slideshare, but go and check out nanite on github now!

Art of the Ruby Proxy for Scale, Performance, and Monitoring
Missed this one, but slides are available on the railsconf website and some excerpts can be found on the speaker's blog. Learn how to put small ruby proxies in front of your services to achieve better performance, adding load balancing, filtering and other really neat tricks using just ruby, EventMachine and other bits.

It's Not Always Sunny In the Clouds: Lessons Learned
Interesting talk about the problems of deploying your app on AWS infrastructure, with really practical tips and tricks to run your app in the cloud. Slides are on the railsconf website

%w(map reduce).first - A Tale About Rabbits, Latency, and Slim Crontabs
Ok, this was my talk so I'll skip any comment. Slides are on slideshare.

This is the end of this post-RailsConf wrap-up. I hope it's been interesting, and please leave a comment if you find the missing slides somewhere.

Saturday, 14 March 2009

consume web services like a ninja with eventmachine and event_utils

Lately I've spent some time using eventmachine. My main interest so far has been in deferrables and in writing clients that use them to consume web services.

In the process I ended up writing event_utils, a gem that makes (or at least should make) it more intuitive to write clients based on eventmachine deferrables.

Now let's work through an example. First things first: we need a web service to consume, and our choice for this tutorial will be a dummy service named slowrand.

If you hit the url http://slowrand.elastastic.com/?delay=1 you'll be served a random number between 0 and 9, the random number will be served after a delay of at least one second.

So now that we have a web service to use, let's write a class that wraps the service.
Save the following as slow_rand.rb


require 'rubygems'
require 'eventmachine'

class SlowRand

  attr_accessor :value

  include EM::Deferrable

  def initialize(delay = 1)
    # ask slowrand for a number, to be served after delay seconds
    client = EM::Protocols::HttpClient.request(
      :host => "slowrand.elastastic.com",
      :query_string => "delay=#{delay}")
    @value = nil
    # on success store the number and mark this object as succeeded
    client.callback do |response|
      self.value = response[:content].to_i
      puts "fetched value #{value} at #{Time.now}"
      self.succeed
    end
    # on failure mark this object as failed
    client.errback { self.fail }
  end

  def +(other)
    self.value + other.value
  end

  def to_s
    value.to_s
  end

end

Let's take a look at the code above

In the initialize method we

  1. use the EM HttpClient to generate a request to slowrand

  2. we bind a callback to the request, in the callback we set the instance variable @value, we print some info on output and we set the deferred status of the slowrand object to succeeded

  3. we bind an errback to the request to have some feedback in case the request fails


The SlowRand class also defines a + method to add one slowrand to another and, for convenience, a to_s method.

Ok, now let's write a client that fetches 2 slowrands and calculates their sum. The code will be


require 'rubygems'
require 'event_utils'
require 'slow_rand'
include EventUtils

in_deferred_loop do
  puts "started at #{Time.now}"

  a, b = SlowRand.new, SlowRand.new

  waiting_for(a, b) do
    sum = a + b
    puts "sum executed at #{Time.now}, #{a} + #{b} = #{sum}"
  end
end

Save the code above in client.rb
So, we initialize a deferred loop and print out the timestamp. After that we tell our client that it needs to wait for two slowrands to be fetched, and only then do we execute the sum.

The idea behind it is very simple: a and b are 2 deferrables and are returned instantly, but their values will be defined only after the web service replies. That's why we ask our deferred loop to wait; specifically, it will wait until all the deferrables listed in waiting_for have their deferred status set.
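
To make the "wait until all of them are done" idea concrete, here is a minimal sketch in plain ruby, with no eventmachine involved. Both TinyDeferrable and when_all are hypothetical names invented for this illustration; event_utils' waiting_for may well be implemented differently, and this version only handles success, not failure.

```ruby
# A tiny stand-in for a deferrable: it records callbacks and fires them
# once when succeed is called. (Hypothetical class, not EM::Deferrable.)
class TinyDeferrable
  def initialize
    @callbacks = []
    @done = false
  end

  # Register a block; run it right away if we already succeeded.
  def callback(&block)
    if @done
      block.call
    else
      @callbacks << block
    end
  end

  # Mark as succeeded and fire the registered callbacks exactly once.
  def succeed
    return if @done
    @done = true
    @callbacks.each(&:call)
    @callbacks.clear
  end
end

# Run the block once every deferrable has succeeded. (Hypothetical helper.)
def when_all(*deferrables, &block)
  pending = deferrables.size
  return block.call if pending.zero?
  deferrables.each do |deferrable|
    deferrable.callback do
      pending -= 1
      block.call if pending.zero?
    end
  end
end
```

The counter is the whole trick: each deferrable decrements it when it succeeds, and whichever one finishes last triggers the block, regardless of the order in which they complete.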

Before running the client you need to install the eventmachine and event_utils gems

sudo gem install eventmachine
sudo gem install hungryblank-event_utils -s http://gems.github.com

and finally you can run the client

ruby client.rb

and you should see an output that looks like

started at Mon Mar 16 21:53:13 +0000 2009
fetched value 1 at Mon Mar 16 21:53:14 +0000 2009
fetched value 6 at Mon Mar 16 21:53:14 +0000 2009
sum executed at Mon Mar 16 21:53:14 +0000 2009, 1 + 6 = 7

Ok that's very little satisfaction but it takes little effort to make the client code more interesting.

Save the following code in client_multi.rb

require 'rubygems'
require 'event_utils'
require 'slow_rand'
include EventUtils

in_deferred_loop do
  puts "started at #{Time.now}"

  a, b = SlowRand.new(3), SlowRand.new(3)
  c, d = SlowRand.new(2), SlowRand.new(2)
  e, f = SlowRand.new, SlowRand.new

  waiting_for(a, b) do
    sum = a + b
    puts "== sum with delay 3 =="
    puts "sum executed at #{Time.now}, #{a} + #{b} = #{sum}"
  end

  waiting_for(c, d) do
    puts "== sum with delay 2 =="
    sum = c + d
    puts "sum executed at #{Time.now}, #{c} + #{d} = #{sum}"
  end

  waiting_for(e, f) do
    puts "== sum with delay 1 =="
    sum = e + f
    puts "sum executed at #{Time.now}, #{e} + #{f} = #{sum}"
  end
end


So in this case we execute 3 sums, as in the client seen before but with a twist: in our code we first set up a sum of slowrands with a delay of 3 seconds, then one with a delay of 2 seconds and lastly one with a delay of 1 second.

After running our new client

ruby client_multi.rb

The output will look like

started at Mon Mar 16 21:57:48 +0000 2009
fetched value 0 at Mon Mar 16 21:57:50 +0000 2009
fetched value 2 at Mon Mar 16 21:57:50 +0000 2009
== sum with delay 1 ==
sum executed at Mon Mar 16 21:57:50 +0000 2009, 0 + 2 = 2
fetched value 8 at Mon Mar 16 21:57:51 +0000 2009
fetched value 6 at Mon Mar 16 21:57:51 +0000 2009
== sum with delay 2 ==
sum executed at Mon Mar 16 21:57:51 +0000 2009, 8 + 6 = 14
fetched value 8 at Mon Mar 16 21:57:52 +0000 2009
fetched value 8 at Mon Mar 16 21:57:52 +0000 2009
== sum with delay 3 ==
sum executed at Mon Mar 16 21:57:52 +0000 2009, 8 + 8 = 16

This is more interesting.
Because of the non-blocking nature of eventmachine, every sum has been performed as soon as possible.

The sum of the slowrands with a delay of 1 second, which was written last in our code, actually got executed first, without waiting for the code above it to be executed.

This example makes the advantage in overall timing even clearer. The client fetched 2 slowrands with a delay of 1 second, 2 with a delay of 2 seconds and 2 with a delay of 3 seconds. A client fetching the values sequentially would spend at least 12 seconds (2 x 1 + 2 x 2 + 2 x 3) to perform all the operations, while in this case we spend about 4 seconds.

In addition to a quicker overall processing, we published the first result after a bit more than one second, while a client performing the actions sequentially would have spent 6 seconds waiting before publishing anything.
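
The numbers above can be checked with a few lines of ruby. This is just the arithmetic, not a benchmark: the delays are the ones passed to SlowRand.new in client_multi.rb, and the variable names are made up for this sketch.

```ruby
# Delays (in seconds) requested for the six slowrands in client_multi.rb,
# in the order a, b, c, d, e, f.
delays = [3, 3, 2, 2, 1, 1]

# Fetching one after the other: every delay is paid in full.
sequential_total = delays.sum

# Fetching all at once: the slowest request sets the lower bound.
concurrent_lower_bound = delays.max

# Sequentially, the first sum in the source (a + b) needs both 3 second fetches.
sequential_first_result = delays[0] + delays[1]

puts "sequential total: at least #{sequential_total}s"
puts "concurrent total: at least #{concurrent_lower_bound}s"
puts "sequential first result: at least #{sequential_first_result}s"
```

The concurrent figure is only a lower bound. The run shown above took about 4 seconds rather than 3, presumably because of network and service overhead on top of the requested delays.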

This example gets more interesting if you think in terms of services with unpredictable delays: without estimating what will happen first, you just specify what is needed to execute some specific code and let the events drive your code.

Hoping to have provided some interesting ground to explore, I'll finish by saying thanks to all the people who contributed to eventmachine or wrote tutorials that helped me in the process of learning.

Monday, 26 January 2009

Start using Amazon SimpleDB with ruby in 10 minutes

The information in this post is outdated; if you want a more complete overview of the SimpleDB ruby libraries you should read this article

Out there you can already find more than one project that lets you access Amazon SimpleDB using ruby.

Since I wasn't really happy with any of them I started building my solution on top of aws-sdb.
Being really bad at naming projects I named my gem dead_simple_db.

To install the gem just type in your console


sudo gem install hungryblank-dead_simple_db -s http://gems.github.com


Here's how you can use it.


require 'rubygems'
require 'dead_simple_db'

# you need your Amazon AWS credentials defined in the environment

ENV['AMAZON_ACCESS_KEY_ID'] = 'your access key'
ENV['AMAZON_SECRET_ACCESS_KEY'] = 'your secret access key'

# Let's define a class that will use the SimpleDB backend to store its instances

class Client < DeadSimpleDb::Base

  # Let's define the SimpleDB domain where the class will be stored
  domain 'test_domain'

  # Add the definitions of the attributes we need to store
  attr_sdb :first_name, 'String'
  attr_sdb :last_name, 'String'
  attr_sdb :budget, 'Integer', :digits => 9
  attr_sdb :first_purchase, 'Time'

end

# and now it's time to create the first object

c = Client.new
c.first_name = "Henry"
c.last_name = "Chinaski"
c.budget = 1000
c.first_purchase = Time.now

# that's how you save your first record on Amazon SimpleDB

c.save

# and that's how you fetch it

henry = Client.find(:first, "['first_name' = 'Henry']")
puts henry.first_name


Now this might not be exciting, but it's been pretty easy. If you have time you can check that


henry.first_purchase


is actually a Time object and not a string: dead_simple_db does the type casting for you, so you don't have to worry that SimpleDB can store only strings.
For a similar reason the budget attribute has been defined with 9 digits: on the SimpleDB backend 1000 is stored as "000001000" to allow sorting as strings to work properly with numbers.
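
The zero padding trick is easy to try in plain ruby. The sketch below is not dead_simple_db's code; encode_integer and decode_integer are hypothetical helpers written only to show why fixed-width strings sort like numbers. It assumes non-negative integers that fit in the given number of digits.

```ruby
# Encode a non-negative integer as a fixed-width, zero-padded string.
# (Hypothetical helper, not part of dead_simple_db.)
def encode_integer(value, digits)
  raise ArgumentError, 'negative values need a different encoding' if value < 0
  encoded = value.to_s.rjust(digits, '0')
  raise ArgumentError, "#{value} does not fit in #{digits} digits" if encoded.length > digits
  encoded
end

# Turn the padded string back into an integer.
def decode_integer(string)
  string.to_i(10)
end

budgets = [1000, 25, 300_000]

# Plain strings sort character by character, so "1000" comes before "25".
puts budgets.map(&:to_s).sort.inspect

# Padded strings all have the same length, so string order is numeric order.
puts budgets.map { |b| encode_integer(b, 9) }.sort.inspect
```

The price of the trick is that the width has to be chosen up front: with 9 digits, any budget of one billion or more is rejected by this sketch instead of being stored with a broken sort order.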

Note that dead_simple_db always stores in each record a 'class_name' attribute with the name of the class the record refers to; this allows you to store more than one class in the same domain.

If you do store multiple classes in one domain, please remember to add the class_name to your queries to avoid weird results!
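
Until the filter is added automatically, a small helper can prepend it for you. This is a sketch under assumptions: scoped_query is a hypothetical helper, not part of dead_simple_db, and it assumes the bracketed predicate syntax used in the find call above combined with the intersection operator of the SimpleDB query language. The empty Client class is only a stand-in so the snippet runs on its own.

```ruby
# Hypothetical helper: prepend a class_name predicate to a SimpleDB query,
# so that only records stored by the given class are matched.
def scoped_query(klass, query = nil)
  class_filter = "['class_name' = '#{klass.name}']"
  return class_filter if query.nil? || query.strip.empty?
  "#{class_filter} intersection #{query}"
end

# Stand-in for the Client model defined earlier in the post.
class Client; end

puts scoped_query(Client, "['first_name' = 'Henry']")
```

The resulting string could then be passed where the hand-written query was used, for example Client.find(:first, scoped_query(Client, "['first_name' = 'Henry']")), assuming find hands the query to SimpleDB unchanged.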

So here you go, you have no more excuses not to take a look at SimpleDB: it's free, and now it's simple to access in a way that instantly makes sense.

Planned improvements in the short term are


  • Return query results with over 250 elements

  • Introduce support to QueryWithAttributes in find

  • Automatically add class_name filter to queries