
First Steps Into Big Data With Apache Cassandra

I’ve got a monitoring application at work that I wrote and maintain which currently uses MySQL as a storage back end. With the amount of data it holds and the activity in the system, MySQL has gone from “probably not the optimal solution” to “really stupid”. The system comprises many storage servers, most of which are completely I/O bottlenecked because of MySQL’s write overhead. It’s a typical “big data” kind of problem, and I need a big data solution.

Over the past couple of weeks, I’ve been experimenting with Apache Cassandra. We recently started using it in another context, and it seems pretty damned slick. Based on what I’ve read, it’s a great fit for my needs: the data model is consistent with what I do in MySQL, and the built-in redundancy and replication are awesome.

Most of the stuff I’ve tried so far has Just Worked (TM). Setting up a basic test cluster was easy, and once I found a suitable PHP client library for accessing Cassandra, I was able to make my development setup store collected data in under 30 minutes. I started off using a data model that pretty much mirrored what I was doing in MySQL, but as I learned more, I was able to strip away a few of the MySQL-specific “optimizations” (read: hacks) in favor of a more streamlined setup.
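
For context, the store-and-read-back pattern itself is only a handful of lines of client code. The sketch below is written against a phpcassa-style API rather than the exact library I’m using, and the keyspace, column family, and key names are made up, so treat it as a rough illustration rather than my actual code:

<?php
// Illustrative only: store a row and read it back with a phpcassa-style
// client (not the library used here); all names are placeholders.
require_once('phpcassa/connection.php');
require_once('phpcassa/columnfamily.php');

$pool = new ConnectionPool('Monitoring', array('127.0.0.1:9160'));
$cf   = new ColumnFamily($pool, 'Samples');

// One row per monitored item, one column per field.
$cf->insert('host42:load', array('timestamp' => (string) time(), 'value' => '0.01'));
print_r($cf->get('host42:load'));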

However, there are a few things that just make me scratch my head. From what I can tell, updating rows in Cassandra is “strange”. In my testing so far, inserting new data works flawlessly: both adding new rows and adding columns onto an existing row work as expected. Updating pre-existing columns in pre-existing rows, though, gets weird. It seems as though Cassandra only updates a column’s value if the new value is “larger” than the previous one. See the following for an example.

# ./casstest
truncating column family for cleanliness...
========================================================
What we're storing...
Array
(
    [timestamp] => 1339529068
    [value] => 0.01
)
storing ...
sleeping a second for consistency...
What is retrieved from a get()...
Array
(
    [timestamp] => 1339529068
    [value] => 0.01
)
========================================================
What we're storing...
Array
(
    [timestamp] => 1339529071
    [value] => 1.01
)
storing ...
sleeping a second for consistency...
What is retrieved from a get()...
Array
(
    [timestamp] => 1339529071
    [value] => 1.01
)
========================================================
What we're storing...
Array
(
    [timestamp] => 1339529074
    [value] => 2.01
)
storing ...
sleeping a second for consistency...
What is retrieved from a get()...
Array
(
    [timestamp] => 1339529074
    [value] => 2.01
)
========================================================
What we're storing...
Array
(
    [timestamp] => 1339529077
    [value] => 1.01
)
storing ...
sleeping a second for consistency...
What is retrieved from a get()...
Array
(
    [timestamp] => 1339529077
    [value] => 2.01
)
========================================================
What we're storing...
Array
(
    [timestamp] => 1339529080
    [value] => 0.05
)
storing ...
sleeping a second for consistency...
What is retrieved from a get()...
Array
(
    [timestamp] => 1339529080
    [value] => 2.01
)
========================================================

In the example above, the timestamp column is just the result of a call to time(), so it always increases over time. The values for the value column are a few static entries pulled from a pre-populated array I used for testing; they increase over the first three writes, then decrease over the last two. I’m just building a simple array out of the two pieces of data and doing a set operation to write it into Cassandra. As you can see, the timestamp field shows the proper value each time the key is retrieved, but the value column only shows the proper value when the value being written is larger than the last one. WTF? I don’t know whether to blame Cassandra or the PHP client library I’m using (CPCL), but it’s really cramping my style at this point. I’ve gone as far as watching the TCP conversation between client and server with tcpdump/wireshark to verify that the client is making the same kind of set request for every value, and it seems to be. I’ve also tried changing the write consistency level, with no change in behavior.
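
For what it’s worth, my current understanding of how Cassandra resolves competing writes to the same column: every write carries a client-supplied timestamp, the version with the highest timestamp wins, and a tie on the timestamp is (as far as I can tell) broken by comparing the values themselves, with the larger value winning. That would produce exactly the “only larger values stick” behavior above if the client were sending the same write timestamp every time; note the suspiciously constant timestamp=172800 on the script-written columns in the cassandra-cli output below. A rough sketch of that reconciliation rule, as plain PHP logic rather than anything from a client library:

<?php
// Illustrative only: Cassandra-style per-column reconciliation (last write
// wins on the client-supplied timestamp, ties broken by comparing values).
function reconcile(array $existing, array $incoming) {
    if ($existing['writetime'] !== $incoming['writetime']) {
        // Higher write timestamp wins.
        return ($existing['writetime'] > $incoming['writetime']) ? $existing : $incoming;
    }
    // Equal timestamps: the "larger" value wins, deterministically.
    return (strcmp($existing['value'], $incoming['value']) >= 0) ? $existing : $incoming;
}

$stored = array('value' => '2.01', 'writetime' => 172800);
$update = array('value' => '1.01', 'writetime' => 172800);
print_r(reconcile($stored, $update)); // still 2.01 -- the same symptom as above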

It is also worth noting that when using the cassandra-cli utility to do sets and gets manually, things work as I would expect.

[default@keyspace] assume TestCF VALIDATOR as utf8; 
Assumption for column family 'TestCF' added successfully.
[default@keyspace] assume TestCF SUB_COMPARATOR as utf8; 
Assumption for column family 'TestCF' added successfully.
[default@keyspace] assume TestCF keys as utf8; 
Assumption for column family 'TestCF' added successfully.
[default@keyspace] assume TestCF COMPARATOR as utf8; 
Assumption for column family 'TestCF' added successfully.
[default@keyspace] get TestCF['TestKey'];
=> (column=timestamp, value=1339532764, timestamp=172800)
=> (column=value, value=2.01, timestamp=172800)
Returned 2 results.
Elapsed time: 2 msec(s).
[default@keyspace] set TestCF['TestKey']['value'] = utf8('0.0');
Value inserted.
Elapsed time: 1 msec(s).
[default@keyspace] get TestCF['TestKey'];
=> (column=timestamp, value=1339532764, timestamp=172800)
=> (column=value, value=0.0, timestamp=1339532783904000)
Returned 2 results.
Elapsed time: 2 msec(s).
[default@keyspace] set TestCF['TestKey']['value'] = utf8('2.0');
Value inserted.
Elapsed time: 2 msec(s).
[default@keyspace] get TestCF['TestKey'];
=> (column=timestamp, value=1339532764, timestamp=172800)
=> (column=value, value=2.0, timestamp=1339532783913000)
Returned 2 results.
Elapsed time: 2 msec(s).
[default@keyspace] set TestCF['TestKey']['value'] = utf8('1.5');
Value inserted.
Elapsed time: 1 msec(s).
[default@keyspace] get TestCF['TestKey'];
=> (column=timestamp, value=1339532764, timestamp=172800)
=> (column=value, value=1.5, timestamp=1339532783923000)
Returned 2 results.
Elapsed time: 2 msec(s).
[default@keyspace] set TestCF['TestKey']['value'] = utf8('0.2');
Value inserted.
Elapsed time: 0 msec(s).
[default@keyspace] get TestCF['TestKey'];
=> (column=timestamp, value=1339532764, timestamp=172800)
=> (column=value, value=0.2, timestamp=1339532783933000)
Returned 2 results.
Elapsed time: 2 msec(s).

Another thing that isn’t acting as I would expect is row deletions. In my testing, it seems that once a row has been deleted, subsequent attempts to write to that row will just silently fail. I suspect that it has to do with the fact that Cassandra’s distributed nature makes deletes a bit counter-intuitive, which is outlined here in the Cassandra documentation. It would be nice to know for sure, though.

EDIT: I was doing it wrong. Sigh. Deletes are still weird to me though.
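
For reference, my reading of that documentation boils down to this: a delete doesn’t remove anything immediately, it writes a tombstone carrying its own timestamp, and any subsequent write whose timestamp isn’t newer than the tombstone’s stays hidden until the tombstone is eventually purged (after gc_grace_seconds). So a client that reuses stale write timestamps would see re-inserts after a delete appear to silently fail. A tiny sketch of that shadowing rule, again as illustrative logic only:

<?php
// Illustrative only: whether a column written after a row delete is visible.
// The delete leaves a tombstone with its own timestamp; the re-insert needs
// a strictly newer timestamp to win (tombstones win ties, as I understand it).
function visible_after_delete($write_ts, $tombstone_ts) {
    return $write_ts > $tombstone_ts;
}

$tombstone_ts = 1339532800000000;                                // microseconds
var_dump(visible_after_delete(172800, $tombstone_ts));           // bool(false)
var_dump(visible_after_delete(1339532900000000, $tombstone_ts)); // bool(true)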

In Search of the Ultimate Desktop

When I first started playing with Xen, years and years ago, I had this idea of running a multi-user desktop: each user would get their own OS, with a keyboard, mouse, and monitor, all running from the same physical computer. I didn’t really have much use for it at the time, but I thought it would be a pretty cool demonstration of the technology. I discovered pretty quickly that it wasn’t possible: hardware-assisted virtualization (HVM) was around at that point, but Xen lacked the ability to pass a video device through to the child OS.

The times have changed, though, as a long-winded video demonstration of the feature shows.

As of version 4.0, Xen now has the ability to pass a VGA device through to a child instance. This makes my idea possible, and now I have to try it!

A fair amount of research on the subject left me a bit disheartened. Said research told me that in order to pass a VGA device through to a fully-virtualized child instance (the only way you can run Windows in Xen), you need special hardware support (an IOMMU, i.e. Intel VT-d or AMD-Vi) that’s only found in newer machines. None of the machines I have, even the whole pile of machines acquired in the hardware free-for-all at work, had the necessary support. My goal was out of reach.

I still wanted to try it though, just for shits and giggles. Something told me it was possible. My reading said that it was possible to pass VGA adapters through to paravirtualized child OSes, and Windows has paravirtualized driver support via the GPLPV driver bundle. They’re not exactly the same, but I figured it was worth a try. I set up a test box running Fedora 16/Xen 4.1.2 and grabbed a templated Windows XP install to experiment with.

I set the Windows instance up with the latest GPLPV drivers and configured the parent to pass a PCI Express slot containing an Nvidia GeForce 8400 GS through to the child. I tried configuring the device as both the primary graphics adapter and a secondary one, but neither worked completely. With the card configured as a secondary adapter, I could still interact with the system via the Xen-provided VNC console, and from there I could see that the OS detected the card but couldn’t activate it: “Unable to start device. Code 10.” Blah. So close! The parent’s logs showed repeated issues with IRQs, so I’m wondering if something isn’t lining up properly that I can fix.
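
For the record, wiring a PCI device into an HVM child boils down to hiding it from the parent with the pciback driver and then listing it in the child’s config file. The snippet below is roughly what that looks like; the 01:00.0 BDF is just an example, and gfx_passthru only applies when trying the card as the guest’s primary adapter:

# On a pvops dom0 kernel, hide the card from the parent at boot, e.g.:
#   xen-pciback.hide=(01:00.0)
# Then, in the HVM child's config (xm/xl syntax):
pci          = [ '01:00.0' ]
gfx_passthru = 1   # only when the card should be the guest's primary adapter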

At one point, I was able to (accidentally) make the Windows instance use the crappy onboard video card the parent was using, so I’m hopeful! Xen’s documentation says that passing the parent’s VGA card to a child seems to be stable, but that’s not really what I’m going for.

Some of Xen’s wiki documentation suggests that ATI graphics hardware might be better supported at this point, but that hasn’t made things work any better for me: I tried a Radeon card to no avail. Nerds! Perhaps I just need to buy some new hardware to make this go.

Free Stuff and Kickstart Hell

Recently, my employer had a free hardware giveaway that allowed employees to pick through pallets upon pallets of retired server hardware and claim things as their own. Some of it was known to be dead, but the state of most of it was unknown, so it was kind of a crap shoot. I grabbed myself a few (okay, thirteen) machines in the hopes that I’d pick up a few decent pieces of machinery. As luck would have it, ten of the thirteen boxes were in working order: five dual dual-core machines and five dual quad-core machines, all server grade. After picking up some RAM, I have a pile of working machines just waiting for something to do. Time for a new lab setup at home!

The first thing I’m working on is a kickstart setup to automate system installs of various types. I’ve got a few basic CentOS server setups configured already, and I’ve been working on some Fedora-based stuff as well. Fedora is much more bleeding-edge, so there’s a lot of change from what I’m used to with the “OMG SO OLD” CentOS.

One of those differences cropped up while I was writing a kickstart script for a Fedora 16-based desktop with encrypted storage on a software RAID1. Prior to Fedora 16, one could easily set up a kickstart that partitioned a disk with “growable” RAID partitions of no fixed size, i.e. partitions that grow to fill the available space on the disk. Fedora 16 forces you to create RAID partitions of fixed size, with no way to fill the remaining space. That makes the kickstart script inflexible, since every partition gets a hard-coded size, which is not at all desirable for the setup I was going for (the /boot partition being the one exception).
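
For reference, the pre-Fedora-16 style of layout I’m describing looked roughly like the following; disk names, sizes, and the passphrase are just placeholders, and under Fedora 16 the --grow on the RAID member partitions no longer fills the disk the way it used to:

# /boot gets a fixed size; the RAID members should grow to fill both disks
part /boot   --fstype=ext4 --size=500 --ondisk=sda
part raid.01 --size=1 --grow --ondisk=sda
part raid.02 --size=1 --grow --ondisk=sdb
raid / --fstype=ext4 --device=md0 --level=RAID1 --encrypted --passphrase=changeme raid.01 raid.02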

With a bit of elbow grease and a non-trivial amount of time spent waiting for test runs to complete, I was able to craft a work-around. I’ve documented it here.

OpenWRT Successes and Failures

I’ve been a big fan of the OpenWRT project for a while now. I love the idea of taking an inexpensive, single-purpose device with low power needs and turning it into a flexible platform that can fill multiple roles. I’ve been running it on a few Linksys WRT54GL routers with great results, even if the plain old “Wireless Access Point” function I’ve given them is fairly bland.

Sooner or later, that may change. A few months ago, I bought a couple of Asus WL-520gU units to act as a platform for more OpenWRT experimentation. They’re quite similar to the WRT54GLs in their specifications, but they have one added feature that I highly desire: a USB port. USB may be ubiquitous on modern computers, but it’s still largely absent from devices compatible with OpenWRT.

I have two primary use cases in mind for these units and their USB ports. The first is my home monitoring and automation setup, which uses 1-wire devices inside and outside of the house to monitor the environment and manipulate various things, such as electricity fed to power outlets or HVAC controls. These 1-wire devices are controlled by a computer, and the 1-wire network connects to that computer over USB. In this setup, the WL-520gU could connect back to the main IP network using its wireless in client mode, or it could connect over ethernet and function as an access point while also doing the 1-wire control work.

The second use case is to turn the device into a wireless router (gasp!) that connects to a cell phone via its USB tethering function and shares the phone’s 3G internet connection with the devices behind it, over either the wifi or the ethernet ports. This would serve as an easily transportable, totally mobile internet connection for travelling. It would also have the secondary benefit of providing emergency internet access to our overly-complicated home network, which uses routing protocols and separate virtual machines to handle routing duties for our two separate internet connections. With the ‘magic’ of the routing protocols, I could simply plug the tethered router into the appropriate VLAN on our network, its routes to the internet would be advertised to the rest of the network, and we’d have connectivity even if our cable and DSL services were both down.
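
Configuration-wise, the tethering half of this is simple: on the OpenWRT side it’s little more than treating the phone’s RNDIS interface as the wan. Something like the following in /etc/config/network, with interface names varying by setup:

# Use the tethered phone (usb0, via the rndis_host driver) as the wan
config interface 'wan'
        option ifname 'usb0'
        option proto  'dhcp'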

I’ve been largely successful with the first goal. OpenWRT’s Backfire branch has support for 1-wire USB host adapters and the OWFS interface layer that I use to access the 1-wire network. It took me a while to find the right version of OpenWRT to use, but I think I’ve got a stable setup now. I’m cooking up my own firmware images so I can customize what goes into the ROM image and avoid installing packages after the fact. I’ve had a “stripped down” version of OpenWRT running the owserver utility for a month or so, and it’s been stable so far. I only documented the setup process loosely, but I’ll likely reimage the device at some point and take better notes then.
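
For the curious, the 1-wire piece is minimal: owserver drives the USB adapter and exposes the bus over TCP, and the actual monitoring happens elsewhere on the network. The invocation is roughly the following, with 4304 being the usual OWFS port and the router address obviously being an example:

# On the router: serve the USB 1-wire adapter over TCP
owserver -u -p 4304

# From another machine: browse the bus through the router
owdir -s 192.168.1.1:4304 /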

The wireless tethering goal has been more elusive. Getting it to function has been pretty easy; making it stable has not. There seems to be a bug in the kernel affecting the USB RNDIS functionality in my situation, and something as simple as a steady stream of pings can trigger a kernel oops or panic. This is obviously not a good thing. I tried a new build last night, and for a while I thought I had achieved a breakthrough: the tether functioned for a good 20 minutes without issue, with pings going the whole time. After a reboot of the WL-520, things did not go as well, and the kernel problems started showing up:

[  180.832000] ------------[ cut here ]------------
[  180.836000] WARNING: at net/sched/sch_generic.c:255 0x801da888()
[  180.840000] NETDEV WATCHDOG: usb0 (rndis_host): transmit queue 0 timed out
[  180.844000] Modules linked in: rndis_host cdc_ether usbnet ohci_hcd nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp ipt_MASQUERADE iptable_nat nf_nat xt_conntrack xt_CT xt_NOTRACK iptable_raw xt_state nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack pppoe pppox ipt_REJECT xt_TCPMSS ipt_LOG xt_comment xt_multiport xt_mac xt_limit iptable_mangle iptable_filter ip_tables xt_tcpudp x_tables ppp_async ppp_generic slhc b43legacy(O) b43(O) mac80211(O) usbcore usb_common nls_base crc_ccitt cfg80211(O) compat(O) ssb_hcd bcma_hcd arc4 aes_generic crypto_algapi switch_robo(O) switch_core(O) diag(O)
[  180.892000] Call Trace:[] 0x802676b0
[  180.896000] [] 0x802676b0
[  180.900000] [] 0x8001bc18
[  180.904000] [] 0x801da888
[  180.904000] [] 0x8001bccc
[  180.908000] [] 0x80015ecc
[  180.912000] [] 0x801da888
[  180.916000] [] 0x801da6a4
[  180.920000] [] 0x80028484
[  180.920000] [] 0x80051660
[  180.924000] [] 0x801bfc04
[  180.928000] [] 0x80022940
[  180.932000] [] 0x80054e18
[  180.932000] [] 0x80022ba8
[  180.936000] [] 0x800022b0
[  180.940000] [] 0x80005984
[  180.944000] [] 0x80005ba0
[  180.948000] [] 0x80016c58
[  180.948000] [] 0x800075fc
[  180.952000] [] 0x80005bc0
[  180.956000] [] 0x802cd908
[  180.960000] [] 0x802cd0dc
[  180.960000]
[  180.964000] ---[ end trace 7e575b276bcf3f69 ]---

I’m using the bleeding-edge branch of OpenWRT (currently at r31439) in the hopes that the kernel fixes will miraculously show up sometime soon. They haven’t at this point, so I’ll have to submit a bug report.

Convert a CentOS 5 HVM domU to PV

I just added a guide on how to convert a fully-virtualized (HVM) Xen instance running CentOS 5 to a paravirtualized (PV) instance.

Convert a CentOS 5 HVM domU to PV with pygrub

Happy converting!