Saturday, May 14, 2011

How I installed Cassandra and pycassa

I first went to the Debian site and downloaded an install image for squeeze. I chose the small ISO image; use whichever you prefer.

Debian by default does not enable non-free software, which you need if you want to use Sun/Oracle's Java. I wanted to use apt-get to install the latest version of Cassandra, so after installing Debian I first updated my apt sources list. I used git and easy_install to install pycassa.

You can obviously use the editor of your choice; I like vi. I am also lazy and get tired of typing sudo all the time, so I normally just su to root. All of the commands below are run as root.

vi /etc/apt/sources.list

Add non-free at the end of the existing deb lines so they look like this:

linux/debian/ squeeze main non-free
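
If you would rather script that edit than do it by hand, here is a minimal sketch. The function name add_nonfree is made up for illustration, and it assumes the lines you want to change start with deb or deb-src and mention squeeze. It keeps a .bak copy of the file and leaves commented-out lines and other repositories alone.

```shell
# Sketch only: append "non-free" to squeeze deb/deb-src lines that lack it.
# add_nonfree is a hypothetical helper, not part of apt.
add_nonfree() {
    file="$1"
    # Keep a backup, then rewrite the file from the backup
    cp "$file" "$file.bak"
    sed -e '/^deb.*squeeze/{' \
        -e '/non-free/!s/[[:space:]]*$/ non-free/' \
        -e '}' "$file.bak" > "$file"
}

# Example (as root):
# add_nonfree /etc/apt/sources.list
```

Lines that already contain non-free are not touched, so running it twice should be harmless.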

Add these lines to the end of the file. You can choose stable instead of unstable if you don't want the latest version (at the time of this writing 0.8.0~rc1 was the latest).

#Cassandra DB
deb http://www.apache.org/dist/cassandra/debian unstable main
deb-src http://www.apache.org/dist/cassandra/debian unstable main
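
The same lines can be appended from the shell. This is a rough sketch rather than anything official: add_cassandra_repo is a name I made up, and the second argument (the suite, unstable by default) is just a convenience. It checks whether the repository is already listed so the lines are not added twice.

```shell
# Sketch only: append the Cassandra apt repository lines once.
# add_cassandra_repo is a hypothetical helper name.
add_cassandra_repo() {
    file="$1"
    suite="${2:-unstable}"   # pass "stable" as the second argument if preferred
    repo="http://www.apache.org/dist/cassandra/debian"
    # Skip if a deb line for this repository is already there
    if grep -q "^deb $repo " "$file"; then
        echo "Cassandra repository already present in $file"
        return 0
    fi
    {
        echo "#Cassandra DB"
        echo "deb $repo $suite main"
        echo "deb-src $repo $suite main"
    } >> "$file"
}

# Example (as root):
# add_cassandra_repo /etc/apt/sources.list
```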

Once you are done editing the file, it's time to get the Apache keys:

wget http://www.apache.org/dist/cassandra/KEYS -O- | apt-key add -

Now it is time to get a lot of software. Use apt-get to do the following.

Update the cached database first:

apt-get update

then:
apt-get install sun-java6-jdk libmx4j-java python2.6-dev dpkg-dev python-setuptools git git-doc subversion

I chose to use the Sun/Oracle Java, so I update my default java. (This only matters if OpenJDK is also installed.)

update-alternatives --config java
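
To see which Java ended up as the default, you can look at the output of java -version. The sketch below wraps that in a small function; java_flavor is a made-up name, and the matching relies on the Sun/Oracle JVM printing "Java(TM)" and OpenJDK printing "OpenJDK" in its version banner.

```shell
# Sketch only: report which Java the "java" command points at.
# java_flavor is a hypothetical helper; pass banner text as $1 to check a saved output.
java_flavor() {
    # java -version writes its banner to stderr, hence 2>&1
    out="${1:-$(java -version 2>&1)}"
    case "$out" in
        *OpenJDK*)    echo "openjdk" ;;
        *"Java(TM)"*) echo "sun" ;;
        *)            echo "unknown" ;;
    esac
}

# Example:
# java_flavor
```

If it prints openjdk, run update-alternatives --config java again and pick the Sun entry.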

Now it is time to install Cassandra

apt-get install cassandra

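I don't know if this should be necessary, but on every install I have done I have had to make this link. (Plain ln makes a hard link, which only works if both paths are on the same filesystem; use ln -s if they are not.)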

ln /usr/share/cassandra/jamm-0.2.2.jar /lib/jamm-0.2.2.jar
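
A slightly more defensive variant is sketched below. Note that it swaps in a symbolic link (ln -s) instead of the hard link above, so it also works across filesystems. link_jar is a made-up helper name; it checks that the source jar exists and does nothing if the destination is already there.

```shell
# Sketch only: symlink a jar if the source exists and the destination does not.
# link_jar is a hypothetical helper name.
link_jar() {
    src="$1"
    dest="$2"
    if [ ! -e "$src" ]; then
        echo "missing: $src" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "already present: $dest"
        return 0
    fi
    ln -s "$src" "$dest"
}

# Example (as root):
# link_jar /usr/share/cassandra/jamm-0.2.2.jar /lib/jamm-0.2.2.jar
```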

To get the Cassandra extras running (jna.jar and the mx4j counters), do this:

Change to a temporary directory or your home directory:

cd

wget -O mx4j-3.0.2.tar.gz http://downloads.sourceforge.net/project/mx4j/MX4J%20Binary/3.0.2/mx4j-3.0.2.tar.gz

tar zxvf mx4j-3.0.2.tar.gz mx4j-3.0.2/lib/mx4j-tools.jar

cp mx4j-3.0.2/lib/mx4j-tools.jar /usr/share/cassandra


Now I install pycassa

easy_install pycassa
easy_install thrift05

Ok, at this point you should be good to go. I use:

service cassandra stop && service cassandra start

to stop and restart the service. /var/log/cassandra contains the logs so you can check to see if everything is running correctly.
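
For a quick look at the logs, something like the sketch below can help. It assumes the main log file is /var/log/cassandra/system.log (check the directory if yours is named differently) and simply counts lines containing ERROR or Exception. check_log is a made-up function name.

```shell
# Sketch only: count suspicious lines in a Cassandra log.
# check_log is a hypothetical helper; the default path is an assumption.
check_log() {
    log="${1:-/var/log/cassandra/system.log}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    # grep -c prints 0 and exits non-zero when nothing matches, hence || true
    count=$(grep -c -E 'ERROR|Exception' "$log" || true)
    echo "$count suspicious line(s) in $log"
    [ "$count" -eq 0 ]
}

# Example:
# check_log && echo "log looks clean"
```

A non-zero count does not always mean the node is broken, but it tells you where to start reading.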

After stopping and starting, and assuming everything is going OK, you can do a quick test:

nodetool -h localhost ring
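
Cassandra can take a few seconds to come up after a restart, so nodetool may fail to connect if you run it right away. A small retry loop works around that. This is a generic sketch: retry is a made-up helper, and the attempt count and delay in the example are arbitrary.

```shell
# Sketch only: run a command up to N times, sleeping between attempts.
# retry is a hypothetical helper name. Usage: retry ATTEMPTS DELAY_SECONDS command [args...]
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        # Only sleep if another attempt is coming
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example:
# retry 10 3 nodetool -h localhost ring
```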


I hope this helps.
For more information see the Cassandra documentation, DataStax, and the pycassa docs. If you need help getting stress.py running, see my other post.

Friday, May 13, 2011

How I got Cassandra stress.py to work

I am using Debian squeeze and already had Cassandra running from the Debian packages. If you are on a different distribution, your mileage may vary.

Get Cassandra source (current version 0.8.0~beta2; note that the command below checks out the 0.8.0-rc1 tag)
svn co http://svn.apache.org/repos/asf/cassandra/tags/cassandra-0.8.0-rc1/ cassandra

Get needed packages to build Thrift (all on one line)
apt-get -y install libboost-dev python-dev autoconf automake pkg-config make libtool flex bison build-essential ant subversion

Get Thrift (must be version 0.6.0 with 0.8.0~beta2 Cassandra)
svn co http://svn.apache.org/repos/asf/thrift/tags/thrift-0.6.0/ thrift

Now compile Thrift
cd thrift
./bootstrap.sh
./configure    # you may need to add --without-ruby or --without-csharp, or disable any other targets that are giving you trouble
make
make install   # must be run as root (su or sudo)
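
Since the version matters here, it is worth confirming what actually got installed. The sketch below compares the last word of the thrift -version output (which looks like "Thrift version 0.6.0") against the version you expect. thrift_version_ok is a made-up helper name.

```shell
# Sketch only: check the installed Thrift compiler version.
# thrift_version_ok is a hypothetical helper; pass the banner as $2 to check saved output.
thrift_version_ok() {
    wanted="$1"
    actual="${2:-$(thrift -version 2>/dev/null)}"
    # Keep only the last word, e.g. "Thrift version 0.6.0" -> "0.6.0"
    found="${actual##* }"
    if [ "$found" = "$wanted" ]; then
        echo "Thrift $found OK"
        return 0
    fi
    echo "expected Thrift $wanted, found '${found:-none}'" >&2
    return 1
}

# Example:
# thrift_version_ok 0.6.0
```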

Now make the python libraries
cd ~/thrift/lib/py
python setup.py install

Change directory to Cassandra source
cd ~/cassandra

Now generate the Thrift Python bindings for Cassandra (I think this is why the Thrift compiler had to be built above)
ant gen-thrift-py

Change directory to the python stress directory
cd ~/cassandra/tools/py_stress

Cross your fingers and run it!
python stress.py --num-keys 1000000 --threads 8 --keep-going