Arduino? What’s that?

A few weekends ago, I bit the bullet and did something I’ve been thinking about for a while. I went to Radio Shack, and bought a couple Arduino units and some accessory shields. What’s an Arduino, you ask? I’ll borrow the description from their website…

Arduino is an open-source electronics prototyping platform based on flexible, easy-to-use hardware and software. It’s intended for artists, designers, hobbyists, and anyone interested in creating interactive objects or environments.

To say it differently, it’s a platform that lets suitably motivated folks build little electronic gizmos that interact with the world in some meaningful way. What that interaction entails is entirely up to you. In my searches of the interwebs for project ideas, I’ve come across folks doing a ton of cool things with Arduinos at their core.

The Arduino hardware is all open-source, meaning that schematics for all of their hardware are available to all, and anyone with the proper knowledge and components can build their own Arduino (or compatible) board. The guts of it are tiny and run with really low power requirements, so they can be placed in really tight environments (or even worn in clothing), running on batteries for extended periods of time. Being so tiny and cheap, they’re definitely not meant to replace your computer or anything. What they’re good for is providing the bridge between the environment and someone looking to interact with it in some way.

Like any good versatile tool, one needs to do some learning to be able to use it properly. So, accordingly, the first mile in my Arduino journey is to figure out how the hell to make one go. The Arduino SDK is available for pretty much everything, so step one was to get it installed. Fortunately, that’s a single command away for my Fedora 16 box.

The SDK is pretty light, but provides some good example programs (“sketches” in the Arduino lingo). I loaded a few of them up, and got the feel for flashing sketches to the Arduino unit. It’s as easy as pressing one button, provided that your code compiles properly (easier said than done for someone whose C/C++ is rusty at best). I loaded up a simple sketch designed to make a single LED blink at a specified interval. While strangely mesmerizing, it was basic. I needed to take it further. Since I learn best by doing, I’ve come up with some simple milestones.
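For the curious, assuming the Fedora package of that era is still named the way I remember, that single command looks like this:

sudo yum install arduino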

LEARNING PROJECT #1: Make three LEDs blink at separate intervals

Arduino sketches basically just iterate through a single loop function in perpetuity, so the pattern in the video above would have lasted until the end of time had I chosen to film it that long. The Arduino doesn’t come with any “wall clock” hardware (meaning that it doesn’t know what time it is), but it does come with a clock that lets it know how much time has passed. This sketch just divides time into intervals of 10ms, and for each interval, it adds to a time counter variable corresponding to each LED. If the time counter passes the threshold to change the LED’s state, the state is changed and the counter reset. If not, it just adds 10ms to the counter and loops through again. Really simple stuff, but it was still neat to see things blinking away. 🙂
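Here’s a minimal sketch of that counter-per-LED approach. The pin numbers and intervals are made up for illustration, not pulled from my actual sketch:

// Three LEDs, each toggled on its own interval using a shared 10ms time slice.
// Pin numbers and intervals below are illustrative.
const int ledPins[3]   = {11, 12, 13};
const int intervals[3] = {250, 500, 1000};  // ms between state changes
int counters[3] = {0, 0, 0};
int states[3]   = {LOW, LOW, LOW};

void setup() {
  for (int i = 0; i < 3; i++) {
    pinMode(ledPins[i], OUTPUT);
  }
}

void loop() {
  delay(10);                          // carve time into 10ms slices
  for (int i = 0; i < 3; i++) {
    counters[i] += 10;                // add the slice to this LED's counter
    if (counters[i] >= intervals[i]) {
      states[i] = (states[i] == LOW) ? HIGH : LOW;  // time to toggle
      digitalWrite(ledPins[i], states[i]);
      counters[i] = 0;                // reset and start counting again
    }
  }
}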

LEARNING PROJECT #2: Make the Knight Rider/Cylon oscillating light bar

This one resonates with me. Back when I was in 6th or 7th grade, I had this idea to build a hand-held “scanner” that resembled the little oscillating light bar that served as KITT’s eyes. I (ok, my dad) designed an analog circuit that accomplished the feat using a simple 555 timer IC to generate a waveform and an LED driver chip to convert that waveform into the visual representation – LEDs oscillating back and forth. I built it, though, and stuffed it into a computer mouse shell. It was pretty slick, but it would quite often go “out of tune” (the moving pattern in the LEDs would ‘linger’ at one end of the light bar or the other) when the battery dipped below optimal levels.

This digital version has no such issue. The clock on the Arduino is rock solid, so the stable oscillating pattern was easy to achieve. The code just increments a counter value corresponding to which LED is lit until it reaches the upper boundary, then starts decrementing, then incrementing… etc. It’s another really simple pattern, but I’ll admit to giggling like a 6th grader again when the thing lit up for the first time. 😀 I modified things a bit after the video was recorded to use a potentiometer, measured via one of the analog inputs, to control the timing of the oscillation.
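The pattern boils down to something like the sketch below, with the potentiometer tweak included. The pin assignments and the delay range are placeholders:

// Bounce a single lit LED back and forth across the bar, with the sweep
// speed set by a potentiometer on an analog input. Pins are illustrative.
const int numLeds = 8;
const int ledPins[numLeds] = {2, 3, 4, 5, 6, 7, 8, 9};
const int potPin = A0;

int position  = 0;
int direction = 1;   // +1 while incrementing, -1 while decrementing

void setup() {
  for (int i = 0; i < numLeds; i++) {
    pinMode(ledPins[i], OUTPUT);
  }
}

void loop() {
  // Light only the current LED.
  for (int i = 0; i < numLeds; i++) {
    digitalWrite(ledPins[i], i == position ? HIGH : LOW);
  }

  // Increment until the upper boundary, then decrement, then increment...
  position += direction;
  if (position == 0 || position == numLeds - 1) {
    direction = -direction;
  }

  // Map the pot reading (0-1023) onto a per-step delay to control the timing.
  int stepDelay = map(analogRead(potPin), 0, 1023, 20, 200);
  delay(stepDelay);
}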

LEARNING PROJECT #3: Read temperature values from a DS1820 1-Wire temperature sensor

I’ve got a number of DS1820 1-Wire temperature sensors lying around from my various attempts at home sensing and automation, and I can see a lot of room for Arduinos to become part of those attempts, so I figured I’d take a crack at monitoring temperatures directly instead of monitoring on a computer via a USB 1-Wire dongle. With the help of some good examples and the OneWire Arduino code library, this was really easy. The example pretty much did everything for me, but I thought it would be cool to extend it to multiple sensors. Making this work taxed my limited memory of C++, but I got it working after a while. Coding in loosely typed languages like PHP, bash, and perl for so long has made me fairly lazy when it comes to programming. “What? I can’t just display that byte as if it were a character? It contains a character!” I digress. Soon enough, I had output like this streaming out of the Arduino’s serial console every second.

10.8519E6180E8: 79.70 F
10.5516E618043: 79.70 F
10.FF10E61804C: 82.40 F

10.8519E6180E8: 79.70 F
10.5516E618043: 79.70 F
10.FF10E61804C: 82.40 F

10.8519E6180E8: 79.70 F
10.5516E618043: 79.70 F
10.FF10E61804C: 81.50 F

It turns out that the 1-Wire sensor addresses that I was displaying there weren’t formatted properly, but I fixed that later.
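For reference, the multi-sensor loop ended up looking roughly like this (simplified here, and the 1-Wire bus pin is just an assumption). The naive HEX printing at the bottom is also the likely culprit behind those badly formatted addresses – Serial.print() drops leading zeros on each byte:

#include <OneWire.h>

OneWire ds(10);   // 1-Wire bus on digital pin 10 (assumed)

void setup() {
  Serial.begin(9600);
}

void loop() {
  byte addr[8];
  byte data[9];

  // Walk the bus and read every DS1820-style sensor we find.
  while (ds.search(addr)) {
    if (OneWire::crc8(addr, 7) != addr[7]) continue;   // bad address CRC

    ds.reset();
    ds.select(addr);
    ds.write(0x44);        // start a temperature conversion
    delay(1000);           // give the sensor time to finish

    ds.reset();
    ds.select(addr);
    ds.write(0xBE);        // read the scratchpad
    for (int i = 0; i < 9; i++) {
      data[i] = ds.read();
    }

    // The DS1820/DS18S20 reports temperature in 0.5 C steps.
    int raw = (data[1] << 8) | data[0];
    float fahrenheit = (raw / 2.0) * 1.8 + 32.0;

    // Print "family.serial: temp F" -- printing bytes with HEX drops
    // leading zeros, which is the formatting quirk mentioned above.
    Serial.print(addr[0], HEX);
    Serial.print(".");
    for (int i = 1; i < 7; i++) {
      Serial.print(addr[i], HEX);
    }
    Serial.print(": ");
    Serial.print(fahrenheit);
    Serial.println(" F");
  }

  ds.reset_search();
  Serial.println();
  delay(1000);
}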

LEARNING PROJECT #4: Use the Ethernet Shield to send temperature data to another host 

When I bought my two Arduinos, I bought a pair of Ethernet Shields for them to couple with. I figured some or all of my projects would have a network component, so I got those purchases out of the way early. My previous success with temperature monitoring left me thinking that I could fairly easily replicate the autonomous heating system I put together for our chicken coop. It senses temperature and fires a relay to activate or deactivate a heat lamp in the coop based on the reading. Easily done with what I’ve learned so far. But since we like pretty graphs, we need to know what the system is seeing and what it’s doing, like so.

A system completely controlled by an Arduino wouldn’t act quite the same as the one I’ve currently put together, but I’d require that the new system send equivalent types of data somewhere so that it could be logged in a similar fashion.

Thankfully, there’s some good networking example code in the Arduino SDK, so making the Arduino send UDP packets wasn’t too hard either. The bulk of my time on this project was spent trying to get the packet data formatted the way I wanted it. I definitely felt my shortcomings with C++ here. But after a while, I had my packets flowing out onto the network, and I could see them hitting my workstation with tcpdump. As icing on the cake, I wrote a short receiver script in PHP to receive and print out the data read from the UDP packets. Regular expressions parse the packet contents, looking for properly formatted sensor/temperature pairs, which are then displayed in the terminal.
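The Arduino side of that looks something like the sketch below (the PHP receiver isn’t shown). The MAC/IP addresses, destination port, and packet format are placeholders, and readTemperatureF() just stands in for the 1-Wire code from the last project:

#include <SPI.h>
#include <Ethernet.h>
#include <EthernetUdp.h>

// All addresses below are placeholders, not my actual setup.
byte mac[] = { 0xDE, 0xAD, 0xBE, 0xEF, 0xFE, 0xED };
IPAddress localIp(192, 168, 1, 50);
IPAddress receiverIp(192, 168, 1, 10);
const unsigned int receiverPort = 9999;

EthernetUDP Udp;
char packet[64];
char tempStr[10];

// Stand-in for the 1-Wire reading code from the previous project.
float readTemperatureF() {
  return 79.70;
}

void setup() {
  Ethernet.begin(mac, localIp);
  Udp.begin(8888);    // local UDP port
}

void loop() {
  float tempF = readTemperatureF();

  // AVR's sprintf can't format floats, so convert the float separately,
  // then build a "sensorid:temperature" style payload.
  dtostrf(tempF, 1, 2, tempStr);
  snprintf(packet, sizeof(packet), "sensor01:%s", tempStr);

  Udp.beginPacket(receiverIp, receiverPort);
  Udp.write(packet);
  Udp.endPacket();

  delay(1000);
}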

For extra fun, I connected the Arduino to an old Asus router I long ago converted into a wireless bridge using OpenWRT, and sensed temperature in various places. For those who may be wondering, the freezer in our garage gets really cold. Cold as in -10°F.

So there it is. My first steps into Arduino land have met with success. There are so many possibilities, and so little time. I’ve got a ton of ideas that I’d love to explore when I get more up to speed with the Arduino and designing circuits.

  • Expanding my home automation setup. So many options here. 
  • Fully autonomous chicken coop controller with automatic door. Automatic watering (during summer months at least) for chickens. 
  • Car data logger and control system. Log data via the OBD2 port (and perhaps also a GPS sensor), display it on a pretty tablet or LCD panel interface, store it for nerdy number crunching. Allow for control of HVAC, windows, or anything else really. Computer/Phone-controlled remote starter perhaps?
  • OK, and the 8-year-old in me also really wants a full-sized Knight Rider light bar on the front of his car. I mean, how badass would that be? I could time it using something based on the current engine RPMs – higher RPMs = more oscillations per second. Combine that with a full RGB LED pixel array like this, and well, the 8-year-old in me would be giddy.
  • Two words – Quad Copters. The Arducopter project turns an Arduino into a control and autopilot system for a quad copter, essentially making it into an autonomous UAV. So cool. I want one.
I’ve been saving my project code, so if anyone actually wants to see it, I can post it somewhere. I’m guessing nobody will want to see it, so I’m saving the effort for other things, like finding new places from which I can monitor temperature data.

A 500 Character Limit On A Customer Service Form Is Terrible.

Dearest NewEgg,
Since I can’t place more than 500 characters in your customer service form, here it is.

My recent purchase seems to be lost somewhere, apparently between Newegg and UPS. Instead of an immediate refund or replacement, I’m forced to wait for 3-5 business days for a claim before I can get either a refund or a replacement order. With the replacement order, there’s still the shipping time on top of the 3-5 days to process the claim. I was expecting to have the item in hand today.

In my view, proper Customer Service on NewEgg’s part would have been to immediately ship a replacement with the highest shipping priority possible to make things right, without me having to request it. Denying this even upon request just adds insult to injury.

This transaction has left me with a fairly bad taste in my mouth, and I expect that I’ll be taking my business elsewhere in the future unless corrective action is taken on NewEgg’s part.

Context: I purchased an item on Newegg on 11/7. Newegg order status shows the item shipped that day. UPS tracking info shows nothing past “billing information received.” I inquired with NewEgg regarding the status of the order today (11/12), and they (almost immediately) replied that I need to file a claim for a replacement order or a refund. Both the replacement order and the refund come with a 3-5 business day processing time for the claim, then whatever additional time is required to refund the purchase or ship the replacement order.

NewEgg refused requests for immediate reshipment, and they also refused requests to increase the shipping priority of the replacement order.

So in other words, my money is in their hands, they lost the item in shipping, and instead of doing their best to right the situation, they pushed it into a pile of red tape. Customer Service fail.

Cassandra – Successes, Failures, Headaches, Lessons Learned

Victory!

My long project of converting my monitoring application at work to use Apache Cassandra is finally in production, and for the most part, running smoothly. All data storage and retrieval is using Cassandra, although I’ve yet to take the training wheels off, so to speak. Everything is still logging into the old MySQL databases until I’m confident that everything is working properly. The Cassandra storage setup has been the primary storage method for a bit over a week now, and while everything is functioning properly on the surface, I’d still like to do some additional sanity checks to make sure the data going in is completely consistent in both storage methods. I’ve had enough weirdness during the process to warrant caution.

For the most part, the process of adapting my code base to use Cassandra was straightforward, but getting to know Cassandra has been more complicated. A fully redundant and replicated system leaves a lot more room for strangeness than your typical single-host software package, and the fact that it’s a Java app meant that I was stepping into largely unfamiliar territory, since I typically run far, far away from Java apps (except you, Minecraft… we love you!). What follows are descriptions of some of the bumps in my road to victory, so that others may benefit from the lessons learned from my ignorance.

Picking the right interface library is key

Since the bulk of my monitoring application is written in PHP, and I didn’t really feel like doing a complete rewrite, I needed to find an interface library that allowed PHP to talk to a Cassandra cluster. The first library I decided to try was the Cassandra PHP Client Library (CPCL). It was pretty easy to learn and get working, and at first glance, seemed to do exactly what I wanted with a very small amount of pain. When I really got deep into the code conversion, I started having some issues. Things were behaving strangely, and I got fairly (ok, seriously) frustrated. Then I found out it was my fault, and that I was doing things wrong. There’s always a “man, do I feel stupid” moment when things happen like that, but I quickly got past it and started moving on.

Then I came across some issues that really weren’t my fault (I think). Once the basics of converting the code were done, I started adding some new pieces to take advantage of the fact that I’d be able to store many times more data than I was able to previously. One of my previous posts on the topic mentioned my problems with drawing graphs when a large number of data points were involved. My solution to that was to incorporate rrdtool-style averaging of data (henceforth known as “crunching”), which takes the raw data and averages it over fixed time intervals, typically 5 minutes, 30 minutes, 2 hours, 1 day, etc. rrdtool only keeps the raw data for a short period of time, but my system keeps it for the lifespan that Cassandra is told to keep it (currently two years). Combining the averaging over the various time intervals and the raw data itself, I can quickly graph data series over many different time ranges while keeping the PHP memory footprint and data access time at sane levels.

My solution for crunching the data was fairly simple to implement, but works in a fundamentally different way from rrdtool. rrdtool does the averaging calculations as data is submitted, and it does it very quickly because it enforces a fixed number of data points in its database, with fixed intervals in between each collection. The scheduling of data collections in my system is completely arbitrary and can be done on any regular or irregular interval. It uses agents that report the monitored data to the storage cluster via HTTP POSTs, so doing the averaging in-line would be very costly in terms of web server processing and latency for the monitoring agent. A big goal is to keep the monitoring agent idle and asleep as much as possible, so waiting for long(ish) HTTP POSTs isn’t very optimal. Therefore, the data processing needed to be done out of band from the data submission.

That’s where I came across the first big gotcha with the CPCL package. After I had my crunching code written and made the necessary changes to my graphing code to use the averaged data, I started seeing some really weird graphs. Certain ranges of the data were completely wrong, in ways that made no sense at all. For example, on an otherwise normal graph of a machine’s load average, there would be a day’s worth of data showing the machine’s load average at over 1,000,000,000,000 (which probably isn’t even possible without a machine catching on fire). After a lot of debugging, I finally came to the conclusion that even though the query to Cassandra was right, the wrong data was being returned. That huge number of over 1,000,000,000,000 was actually a counter belonging to a network interface on that machine. I tried everything I could think of to find or fix the problem, but nothing I did eliminated the problem completely.

The other big problem manifested itself at the same time as I was seeing the first. For no reason I could discern, large-ish write operations to Cassandra would just hang indefinitely. Not so good for a long-running process that processes data in a serial fashion. I tried debugging this problem extensively as well, and came up with nothing but more questions. What had me completely confused was that I could unfreeze the write operation using a few non-standard methods – either by attaching a strace process to the cruncher process, or by stopping the cruncher process using a Ctrl-Z, then putting it back into the foreground. Both of these methods reliably unfroze the hung cruncher process, but it would typically freeze up sometime later.

These two issues had my wits completely frayed. In desperation, I put up a plea for help on the CPCL github bug tracker page. I couldn’t find a mailing list or forum, so I had to post it on their bug tracker, even though that’s not the optimal place for such a request. I waited a few weeks, but I never received any kind of response.

Instead of kicking the dead horse further, I decided to try another library – phpcassa. The classes were laid out in a similar way to CPCL, so converting the CPCL code over to phpcassa wasn’t too hard. And guess what! Everything worked properly! No strange hangups, no incorrect data. I’m not sure there was any way I could have known to try phpcassa first, since both libraries looked feature complete and fairly well documented, but man, I wish I had tried phpcassa first. It would have saved me a lot of headache.

Think hard about your hardware configuration, then test the crap out of it

Once I had all of my code changes ready, I was pretty anxious to get them pushed out to my production setup. Before I could do so, I needed to set up a production Cassandra cluster. My development clusters were done in a virtualized manner with ten or so Cassandra VMs spread across a couple hardware nodes. This worked perfectly well in my development setup, but my production setup monitors a few orders of magnitude more hosts, so I knew there was going to be some trial and error involved with finding the right production setup.

After talking things over with my boss and another co-worker familiar with Cassandra, I came up with a template of what I was shooting for. I aimed for some fairly capable machines that had been retired from our normal product offerings – servers that would otherwise just sit around occupying space. I ended up setting up eight Harpertown Xeon (E5420) machines, each with 8GB of RAM, a 500GB disk for the OS and Cassandra commit logs, and three 3TB drives in a RAID0 for the Cassandra data. I figured that the RAID0 would make things nice and fast, and that the large amount of data each node could store would get me to my utilization goals.

I set up the nodes, and quickly ran into a few issues in my pre-deployment testing. The boxes kept losing disks at a rapid rate, even under fairly paltry load conditions. It turns out that the motherboard model that was present in those machines had a fairly high rate of goofy onboard SATA controllers. It was known to our server setup team and our systems restoration folks, but I was unaware of it. Lesson? Research your hardware. Make sure it isn’t crap.

I was able to bypass the issue in the affected servers by using add-on SATA controllers, so I proceeded undeterred. After much preparation, I was ready to put things live in production. I activated the Cassandra code in a write-only state, and everything worked pretty well. Anxious to see how it functioned under heavier load, I started doing some data imports from the old MySQL database into Cassandra, and things got a lot less awesome. Doing any more than one or two concurrent data imports caused a lot of congestion in the Cassandra cluster and slowed the whole system to a crawl. I figured I could survive if I could only import data in serial fashion, so I let it run for a while. I really wanted to test the data crunching code, so after a while I stopped the imports and started up a couple cruncher processes. They dragged Cassandra down a bit, but things were keeping up. I decided to see how far I could push things, so I started up a few more cruncher processes, and they proceeded to drive Cassandra straight into the ground. The entire system ground to a halt. Crap. Frustrated and completely annoyed, I disabled the Cassandra code and went back to the drawing board.

After taking a few days away from the project to clear my head, I realized that I had forgotten a fairly important piece of the process of building my Cassandra cluster. I never really tested it under load. Sure, I pointed my dev environment at it to make sure it worked, but I never tried in any way to simulate the huge difference in load between what my dev environment generates and the total onslaught generated by the 20,000+ servers being monitored in the production environment. Lesson? Load testing is good. Beat the crap out of your setup, then when you think you’re satisfied it can handle the load, double it. Beat it up more.

I decided that my next round of servers would be newer and less flaky. My MySQL setup uses a good number of AMD X6 1055T servers as cheap and fairly capable storage nodes, so I went in that direction. I started with 8GB of RAM in each box, and did a lot of testing to figure out whether I should use single data disks, or multiple disks in a RAID configuration. My first test was with a single disk compared with a two-disk RAID0, and the RAID0 performed much better. The next iteration compared the same two-disk RAID0 configuration with a four-disk RAID10. I expected the RAID10 to perform better, but the numbers really weren’t that different, so I decided on the two-disk RAID0 as my setup to save on equipment costs.

One thing that I could see very clearly in the benchmarking is Cassandra’s huge optimization of write performance at the expense of read performance. I could easily throw a few hundred thousand write operations per second at the test cluster, but it struggled to get more than a few hundred reads per second. I did a lot of reading to find out what I was doing wrong, and came up with a few pointers (some of which will be discussed later), but the big one was simply the scale of the cluster. If you want better read performance out of Cassandra, throw more hardware at it.

So I did. I added more RAM to each box (for a total of 12GB per box currently), and added many additional nodes. My current cluster comprises 22 machines, each with 12GB of RAM and at least 3TB of storage. Go big or go home, right? The big lesson learned here is that Cassandra does better with more hardware nodes of moderate capability, rather than fewer large nodes. Give the cluster as many I/O paths to your data as possible, or risk trapping it behind I/O bottlenecks.

Node death can be a pain even when you have many other active nodes

During the course of my first attempt at pushing things live in production, a very painful problem emerged as a result of the frequent drive issues the Harpertown boxes suffered. If I had my web nodes configured to connect to a node that was offline (as in, the machine is not responding on the network), the whole system would grind to a standstill as web nodes tried, in vain, to connect to the dead node. Even with eleven out of twelve nodes active, a large enough percentage of the web processes tried to connect to the dead node that the whole system would eventually jam up waiting for the connection to time out before trying the next node in the list. My long-running processes, like the cruncher script, weathered this gracefully, since connection latency was not a factor for them, and they would remember that a particular node was down. Short-running tasks, namely the HTTP interactions, suffered greatly when a node died, since the system essentially has to relearn the state of each Cassandra node each time the process starts. Since the HTTP stack relies on moving an enormous number of short-lived HTTP connections through the queue quickly, wasting a bunch of time figuring out that a node is down is very costly – the stalled HTTP processes pile up and quickly prevent real work from occurring.

My solution to this was to add a pair of load balancers into the mix. I have a good amount of experience using the Linux kernel’s load balancing (managed with ipvsadm) with ldirectord as a monitoring agent, but the network setup required to properly implement it wasn’t really conducive to my needs. I wanted to be able to arbitrarily forward the Cassandra TCP connections to anywhere, not just a local subnet directly connected to the load balancers. So I looked around a bit and found haproxy. It’s a lightweight process that will do load balancing and service checking for either HTTP or bare TCP connections, which fit my needs to a T. I did some quick testing and found it to be exactly what I needed.

In my final production setup, I grabbed two of my previously discarded Harpertown boxes (making sure they weren’t the ones with SATA issues) and configured them in an active-active setup using haproxy and heartbeat. Each load balancer has a VIP that is active at any given time, but can also fail over to the other box, which avoids the downed node problem. I configured haproxy to run in a multi-process fashion, one process for each of the eight CPU cores in the Harpertown machines. The end result is a redundant service-checking load balancer setup that ignores downed Cassandra nodes that quite easily passed over 100MByte/sec of Cassandra traffic in my testing.
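The relevant pieces of the haproxy.cfg look something like this – the node addresses, node count, and timeouts are placeholders, but this is the general shape of a TCP-mode, service-checked listener in front of Cassandra’s Thrift port:

global
    nbproc 8                        # one haproxy process per CPU core

defaults
    mode    tcp
    timeout connect 5s
    timeout client  30s
    timeout server  30s

listen cassandra
    bind 0.0.0.0:9160               # Thrift port the clients connect to
    balance roundrobin
    server cass01 10.0.0.101:9160 check
    server cass02 10.0.0.102:9160 check
    server cass03 10.0.0.103:9160 check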

Make sure your Cassandra JVM is properly tuned

Before Cassandra, my typical interaction with Java applications was to see them, acknowledge their presence, then run in the opposite direction.  I had some pretty bad experiences trying to deal with Tomcat in my early days as a SysAdmin, and they really turned me off to Java as a whole. Once Cassandra came into the picture, I didn’t have much choice but to learn about it.

I read plenty of articles where people talked about tuning the Java heap, but really didn’t have a concept of what all the variables in the equation meant. My first attempts at changing the heap size came after a Cassandra process crashed, and wouldn’t start up again. I’m not sure what prompted me to think about changing the heap size as a solution, but it worked, temporarily at least. I increased the heap size to 6GB I think, out of 8GB of total system RAM. Cassandra started, worked fine for a short while, then promptly crashed and burned. Attempt number one – resounding failure!

I read up some more on what each heap-related configuration setting controlled and what it affected. Learning the relationship between the new and old generations of heap memory, and how they were used by the garbage collection processes, was key. In my first attempt, not knowing what I was changing, I had set the new generation heap memory to the same value as the total heap memory. Pretty much the definition of Doing It Wrong (TM). After some more reading and tweaking, I found that with the 12GB memory footprint on my finalized hardware setup, a 6GB heap size provided Cassandra with the memory it needed to stay stable, while leaving the OS with a fair amount of room for its filesystem caches. I later refined those heap settings to set the new generation heap size to 1GB, leaving 5GB for older data objects.
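In the cassandra-env.sh of that era, those two knobs boil down to a couple of variables (the values shown match the 6GB/1GB split described above):

# cassandra-env.sh
MAX_HEAP_SIZE="6G"      # total JVM heap
HEAP_NEWSIZE="1G"       # new generation, leaving ~5G for the old generation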

One tool that I found very helpful was jconsole. It’s included in my OS’s Java Development Kit, and was very useful in getting a good idea of how Cassandra was using memory. It gives a real-time view into the innards of a running Java process, which, among many other things, let me watch how memory was being utilized in the various heap regions and made how Java uses memory much clearer and easier to understand. Old Java pros probably know all about this, but it’s still pretty new and novel to me.

Tune your column family’s read repair chance

After I had observed the fairly awful read performance in my benchmarks, I started doing some research. In my readings, I came across the description of a column family’s read repair chance, and realized that I was, once again, Doing It Wrong. Basically, the read repair chance is the likelihood that a particular read operation will initiate a read repair operation between the various Cassandra servers that contain replicas of the data object you’re trying to retrieve. If a read repair is initiated, the read operation will go to all replicas, and the results are then compared to make sure all replicas are in sync. If you have your cluster configured to have four replicas of each piece of data, a read operation triggering a read repair will cause all four replicas to do read operations instead of just one (or however many you asked for). For a system that is already under heavy read load, this can make things much slower.

I looked at the column families that the benchmarking utility was creating, and sure enough, the read repair chance was set to 1, or 100%. I tried setting it much lower, to 0.1, and my read rates improved dramatically. Setting it even lower, to 0.01, pushed them higher still. I was using a replication factor of four in my benchmarking tests, and the closer I got to a read repair chance of zero, the closer I got to a factor-of-four increase over the “base” read rate at a read repair chance of 100%. This brought my read rates out of “crap” territory and into “probably acceptable” territory. My production schemas were also using a read repair chance of 100%, so I adjusted them down to far more conservative values.
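In cassandra-cli terms, the tweak looks something like this (the column family name here is just a placeholder):

[default@keyspace] update column family MetricsCF with read_repair_chance = 0.1;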

It’s worth noting that read repairs aren’t a bad thing, per se – they’re just really good at adding load to a cluster. They’re actually a very good sanity check for data integrity, but they come with a cost.

Tune key/row caches

Another thing I found that can improve read performance is the key and row caches. I was well aware of the row cache concept, and mostly discarded it in my case, since my data model didn’t really work with it. The key cache, however, was definitely useful. Basically, the key cache maps a row key to the disk location where the data for that key is stored. This makes disk access a lot less painful, since Cassandra doesn’t have to search through its data files to find the piece of data it’s looking for.

I initially tried setting my key caches large enough to account for all keys currently in each column family. My data model has a fairly fixed number of row keys that scales mostly linearly with the number of hosts being monitored by the system, so I was able to figure out pretty easily how many keys I needed to cache. I set the number comfortably above that value, and watched as read performance improved as the caches filled in. And then, after coming back to work after a weekend had passed, I noticed that my key caches seemed to have shrunk far below the maximums I had set previously. Some quick searches in the Cassandra system logs showed that Cassandra had lowered those limits because of memory pressure inside the JVM. The whole cluster’s stability and proper operation seemed to be negatively impacted by this state (all nodes were showing the ‘memory pressure’ messages in their logs), so I experimented with some lower key cache settings until I found a good balance between performance and stability.
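Depending on the Cassandra version, the key cache is a per-column-family setting adjustable from cassandra-cli along these lines (newer releases moved it to a global key cache size in cassandra.yaml). The column family name and the count below are placeholders:

[default@keyspace] update column family MetricsCF with keys_cached = 200000;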

 

The Problem With Dealing With More Data Than You Can Deal With

Over the past few weeks, as I’ve mentioned in previous posts, I’ve been working on converting a server monitoring application to use Apache Cassandra as its storage engine. Now that I’ve gotten past the initial hurdles of learning the system (and of my own stupidity while making code modifications), the code is successfully converted and all of my collected data is dumping into Cassandra. Now what?

For the life of the application, I’ve stored collected data in two ways. First is a simple snapshot of the latest value collected, along with its time stamp, which is used for simple numeric threshold checks, i.e. “is the server’s memory usage currently too high” or “is free disk space currently too low”. Each piece of snapshot data is overwritten with the newest value when it’s collected. The other method is a historical record of all values collected. Numeric data gets stored each time it’s collected, and text-based data (software versions, currently loaded kernel modules, etc) is logged when it changes. This allows the application to draw (somewhat) pretty graphs of numerical data or provide a nice change log of text-based data.

An Example Graph

My current quandary is how to deal with the vast amounts of data I’ll be able to store. Previously I had to constantly prune the amount of data stored so that MySQL wouldn’t melt down under the weight of indexing and storing millions of data points. I set up scripts that would execute nightly and trim away data that was older than a certain point in time, and then optimize the data tables to keep things running quickly. Cassandra shouldn’t have that problem.

Even though I’ve only been storing data in Cassandra for a few weeks, I’m already running into issues with having more data than I can handle. My graphing scripts are currently set up to get all data that will be graphed in a single request, and then iterate through it to determine the Y-axis minimums and maximums, and then build the graph. It then grabs another set of data via a single request to draw the colored bar at the bottom of the graph, which displays whether data collection attempts were successful or if they failed. With that approach, I’m a slave to the amount of memory PHP can allocate, since the arrays I’m building with the data from Cassandra can only get so large before PHP shuts things down. I’m already hitting that ceiling with test servers in my development environment.

Some of the possible solutions to this problem are tricky. Some of them are easy, but won’t work forever. Some of them require out-of-band processing of data that makes graphing easier. None of the potential solutions I’ve come up with is a no-brainer. Since some of the graphed data is customer-facing, performance is a concern.

  1. Increase the PHP memory limit. This one is easy, but will only work for so long. I’m already letting the graph scripts allocate 128MB of RAM, which is on the high side in my book.
  2. Pull smaller chunks of my data set in the graphing code, and iterate through it to create graphs. This is probably the most sane approach, all told, but it seems fairly inefficient with how things are currently structured. I’d have to do two passes through the graph data in order to draw the graph (the first to grab the data set boundaries, and the second to actually draw the data points within the graph), and a single pass through the data detailing whether collections were successful or not. For a larger number of data points, this could mean a fair number of Cassandra get operations, which would cause slow graphing performance. 
  3. Take an approach similar to how MRTG does things, where data is averaged over certain time frames, with the higher resolution data being kept for shorter periods, with larger-length averages stored longer. This is something I’ve wanted to do for a while, but I’m not sure how much out-of-band processing this would require in the production cluster. One possible advantage to this is that if I did some basic analysis, I could store things like maximum and minimum values for particular time ranges ahead of time, and use those in my graphs instead of calculating them on the fly. 

I’m sure there are brilliant folks out there who have come up with elegant solutions to this type of problem, but at this point, I’m kind of stuck.

Herp-a-derp

And then there was that one time, where I was having issues with Cassandra, that I was dumb and Did It Wrong (TM).

$_CASSANDRA->cf('TestCF')->set("TestKey", $tostore, Cassandra::CONSISTENCY_ALL, (2 * 86400) );

Should have been…

$_CASSANDRA->cf('TestCF')->set("TestKey", $tostore, Cassandra::CONSISTENCY_ALL, $null, (2 * 86400) );

Don’t forget parameters in your function calls, folks – they’ll mess you up. (Assuming I’m reading CPCL’s argument order right, the fourth argument is the write timestamp and the fifth is the TTL, so that two-day TTL was quietly going in as the timestamp on every write.)

First Steps Into Big Data With Apache Cassandra

I’ve got a monitoring application at work that I wrote and maintain which currently uses MySQL as a storage back end. With the amount of data it holds and the activity in the system, MySQL has gone from “probably not the optimal solution” to “really stupid”. The system comprises many storage servers, most of which are completely I/O bottlenecked because of MySQL’s write overhead. It’s a typical “big data” kind of problem, and I need a big data solution.

Over the past couple of weeks, I’ve been experimenting with Apache Cassandra. We recently started using it in another context, and it seems pretty damned slick. Based on what I read, it seems like a great fit for my needs. The data model is consistent with what I do in MySQL, and the built-in redundancy and replication is awesome.

Most of the stuff I’ve tried so far has Just Worked (TM). Setting up a basic test cluster was easy, and once I found a suitable PHP client library for accessing Cassandra, I was able to make my development setup store collected data in under 30 minutes. I started off using a data model that pretty much mirrored what I was doing in MySQL, but as I learned more, I was able to strip away a few of the MySQL-specific “optimizations” (read: hacks) in favor of a more streamlined setup.

However, there are a few things that just make me scratch my head. From what I can tell, updating rows in Cassandra is “strange”. In my testing so far, inserting new data works flawlessly. Both adding new rows and adding columns onto an existing row work as expected. However, I notice lots of weirdness when updating pre-existing columns in pre-existing rows. It seems as though Cassandra is only updating the values associated with columns if the value is “larger” than the previous. See the following for an example.

# ./casstest
truncating column family for cleanliness...
========================================================
What we're storing...
Array
(
    [timestamp] => 1339529068
    [value] => 0.01
)
storing ...
sleeping a second for consistency...
What is retrieved from a get()...
Array
(
    [timestamp] => 1339529068
    [value] => 0.01
)
========================================================
What we're storing...
Array
(
    [timestamp] => 1339529071
    [value] => 1.01
)
storing ...
sleeping a second for consistency...
What is retrieved from a get()...
Array
(
    [timestamp] => 1339529071
    [value] => 1.01
)
========================================================
What we're storing...
Array
(
    [timestamp] => 1339529074
    [value] => 2.01
)
storing ...
sleeping a second for consistency...
What is retrieved from a get()...
Array
(
    [timestamp] => 1339529074
    [value] => 2.01
)
========================================================
What we're storing...
Array
(
    [timestamp] => 1339529077
    [value] => 1.01
)
storing ...
sleeping a second for consistency...
What is retrieved from a get()...
Array
(
    [timestamp] => 1339529077
    [value] => 2.01
)
========================================================
What we're storing...
Array
(
    [timestamp] => 1339529080
    [value] => 0.05
)
storing ...
sleeping a second for consistency...
What is retrieved from a get()...
Array
(
    [timestamp] => 1339529080
    [value] => 2.01
)
========================================================

In the example above, the timestamp column is just the result of a call to time(), so it will always increment over time. The values for the value column are just a few static entries pulled from a pre-populated array I used for testing. They increment three times, then decrement twice. I’m just making a simple array out of the two pieces of data, and then doing a set operation to write the data into Cassandra. As you can see, the timestamp fields show the proper values each time the key is retrieved, but the value column only shows the proper values when the value being written is larger than the last. WTF? I don’t know whether to blame Cassandra or the PHP client library I’m using (CPCL), but it’s really cramping my style at this point. I’ve gone as far as watching the contents of the TCP connections between client and server with tcpdump/wireshark to see if the client is making the same set requests for all values, and it seems to be. I’ve also tried changing the write consistency level, with no change.

It is also worth noting that when using the cassandra-cli utility to do sets/gets manually, things work as I would expect.

[default@keyspace] assume TestCF VALIDATOR as utf8; 
Assumption for column family 'TestCF' added successfully.
[default@keyspace] assume TestCF SUB_COMPARATOR as utf8; 
Assumption for column family 'TestCF' added successfully.
[default@keyspace] assume TestCF keys as utf8; 
Assumption for column family 'TestCF' added successfully.
[default@keyspace] assume TestCF COMPARATOR as utf8; 
Assumption for column family 'TestCF' added successfully.
[default@keyspace] get TestCF['TestKey'];
=> (column=timestamp, value=1339532764, timestamp=172800)
=> (column=value, value=2.01, timestamp=172800)
Returned 2 results.
Elapsed time: 2 msec(s).
[default@keyspace] set TestCF['TestKey']['value'] = utf8('0.0');
Value inserted.
Elapsed time: 1 msec(s).
[default@keyspace] get TestCF['TestKey'];
=> (column=timestamp, value=1339532764, timestamp=172800)
=> (column=value, value=0.0, timestamp=1339532783904000)
Returned 2 results.
Elapsed time: 2 msec(s).
[default@keyspace] set TestCF['TestKey']['value'] = utf8('2.0');
Value inserted.
Elapsed time: 2 msec(s).
[default@keyspace] get TestCF['TestKey'];
=> (column=timestamp, value=1339532764, timestamp=172800)
=> (column=value, value=2.0, timestamp=1339532783913000)
Returned 2 results.
Elapsed time: 2 msec(s).
[default@keyspace] set TestCF['TestKey']['value'] = utf8('1.5');
Value inserted.
Elapsed time: 1 msec(s).
[default@keyspace] get TestCF['TestKey'];
=> (column=timestamp, value=1339532764, timestamp=172800)
=> (column=value, value=1.5, timestamp=1339532783923000)
Returned 2 results.
Elapsed time: 2 msec(s).
[default@keyspace] set TestCF['TestKey']['value'] = utf8('0.2');
Value inserted.
Elapsed time: 0 msec(s).
[default@keyspace] get TestCF['TestKey'];
=> (column=timestamp, value=1339532764, timestamp=172800)
=> (column=value, value=0.2, timestamp=1339532783933000)
Returned 2 results.
Elapsed time: 2 msec(s).

Another thing that isn’t acting as I would expect is row deletions. In my testing, it seems that once a row has been deleted, subsequent attempts to write to that row will just silently fail. I suspect that it has to do with the fact that Cassandra’s distributed nature makes deletes a bit counter-intuitive, which is outlined here in the Cassandra documentation. It would be nice to know for sure, though.

EDIT: I was doing it wrong. Sigh. Deletes are still weird to me though.

In Search of the Ultimate Desktop

Back in the day, when I first started playing with Xen years and years ago, I had this idea of being able to run a multi-user desktop, with each user getting their own OS with a keyboard, mouse, and monitor, all running from the same physical computer. I didn’t really have much use for it at the time, but I thought it would be a pretty cool demonstration of the technology. I discovered pretty quickly that it wasn’t possible. Hardware assisted virtualization (HVM) was around at that point, but Xen lacked the ability to pass a video device into the child OS.

The times have changed, as the following long-winded video demonstrates:

As of version 4.0, Xen now has the ability to pass a VGA device through to a child instance. This makes my idea possible, and now I have to try it!

A fair amount of research on the subject left me a bit disheartened. Said research told me that in order to pass a VGA device through to a fully-virtualized child instance (the only way you can run Windows in Xen), you need special hardware support that’s only found in newer machines. None of the machines I have, even the whole pile of machines acquired in the hardware free-for-all at work, had the necessary support. My goal was out of reach.

I still wanted to try it though, just for shits and giggles. Something told me it was possible. My reading said that it was possible to pass VGA adapters through to paravirtualized child OSs, and Windows has paravirtualized driver support via the GPLPV driver bundle. They’re not exactly the same, but I figured it was worth a try to see if things worked. I set up a test box running Fedora 16/Xen 4.1.2, and grabbed a templated Windows XP install to see if I could make things work.

I set up the Windows instance with the latest GPLPV drivers and configured the parent to pass through a PCI Express slot containing an Nvidia GeForce 8400 GS to the child. I tried configuring the device as both the primary graphics adapter and a secondary one, but neither worked completely. With the card configured as a secondary adapter, I could still interact with the system via the Xen-provided VNC console. From there I could see that the OS detected the card, but it wasn’t able to activate it. “Unable to start device. Code 10.” Blah. So close! The parent’s logs showed repeated issues with IRQs, so I’m wondering if something isn’t lining up properly that I can fix.

At one point, I was able to (accidentally) make the Windows instance use the crappy onboard video card the parent was using, so I’m hopeful! Xen’s documentation says that passing the parent’s VGA card to a child seems to be stable, but that’s not really what I’m going for.

In reading some of Xen’s wiki documentation, it seems that ATI graphics hardware might be better supported at this point. That fact hasn’t made things work any better. I tried a Radeon card to no avail. Nerds! Perhaps I just need to buy some new hardware to make this go.

Free Stuff and Kickstart Hell

Recently, my employer had a free hardware giveaway that allowed employees to pick through pallets upon pallets of retired server hardware and claim things as their own. Some of it was known to be dead, but the state of most of it was unknown, so it was kind of a crap shoot. I grabbed myself a few (okay, thirteen) machines in the hopes that I’d pick up a few decent pieces of machinery. As luck would have it, ten of the thirteen boxes were in working order. Five of them were dual dual-core machines, and the other five were dual quad-core machines, all server grade. After picking up some RAM, I have a pile of working machines just waiting for something to do. Time for a new lab setup at home!

The first thing I’m working on is a kickstart setup to automate system installs of various types. I’ve got a few basic CentOS server setups configured already, and I’ve been working on some Fedora-based stuff as well. Fedora is much more bleeding-edge, so there’s a lot of change from what I’m used to with the “OMG SO OLD” CentOS.

One of those differences cropped up while I was trying to write a kickstart script for a Fedora 16-based desktop with an encrypted storage setup on a software RAID1. Prior to Fedora 16, one could easily set up a kickstart that would partition a disk with “growable” RAID partitions of non-fixed size – partitions that grow to fill the available space on the disk. Fedora 16 forces you to create RAID partitions of fixed size, without the ability to fill up the available space. This makes the kickstart script inflexible, since you have hard-coded size values for each partition – not at all desirable in the setup I was going for, except for the /boot partition.
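For reference, the kind of “growable” RAID member layout that used to work looks roughly like this (the disk names, sizes, and LVM-on-RAID arrangement are illustrative, not my actual config):

# Pre-Fedora-16 style: RAID members that grow to fill the disk
part /boot --size=500 --ondisk=sda --asprimary
part raid.01 --size=1 --grow --ondisk=sda
part raid.02 --size=1 --grow --ondisk=sdb
raid pv.01 --level=RAID1 --device=md0 --encrypted raid.01 raid.02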

With a bit of elbow grease and a non-trivial amount of time spent waiting for test runs to complete, I was able to craft a work-around. I’ve documented it here.

Overkill

One awesome thing about my job is I get to play with things like this.

[root@box ~]# cat /proc/cpuinfo | grep "model name"
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
model name : AMD Opteron(TM) Processor 6272
[root@box ~]# cat /proc/cpuinfo | grep "model name" | wc -l
64
[root@box ~]# free -g
             total       used       free     shared    buffers     cached
Mem:           504          5        499          0          0          0
-/+ buffers/cache:          5        499
Swap:          179          0        179

OpenWRT Successes and Failures

I’ve been a big fan of the OpenWRT project for a while now. I love the idea of taking an inexpensive single-purpose device with low power needs and turning it into a flexible platform that can fill multiple roles. I’ve been using it on a few Linksys WRT54GL routers for a while now with great results, even if the plain old “Wireless Access Point” function I’ve given them is fairly bland.

Sooner or later, that may change. A few months ago, I bought a couple Asus WL-520gU units to act as a platform for more OpenWRT experimentation. They’re quite similar to the WRT54GLs in their specifications, but they have one added feature that I highly desire – a USB port. The now-ubiquitous feature of modern computers is still fairly absent from devices compatible with OpenWRT.

I have two primary use cases in mind for these units and their USB ports. The first is for use in my home monitoring and automation setup, which uses 1-wire devices inside and outside of the house to monitor the environment and manipulate various things, such as electricity fed to power outlets or HVAC controls. These 1-wire devices are controlled by a computer, and the 1-wire network connects to the computer using USB. In this setup, the WL-520gU could connect back into the main IP network using its wireless in client mode, or it could connect using ethernet and function as an access point while also doing 1-wire control things.

The second use case is to make the device into a wireless router (gasp!) that connects to a cell phone using its USB tethering function and provides the 3G internet to the devices connected through it, either via the wifi or ethernet ports. This would serve as an easily transportable and totally mobile internet connection that could be used while travelling. This would also have the secondary benefit of providing emergency internet access to our overly-complicated home network setup, which uses routing protocols and separate virtual machines to handle routing duties for separate internet connections (of which we have two). With the ‘magic’ of the routing protocols, I could simply plug the tethered router into the appropriate VLAN on our network and its routes to the internet would be exposed to the rest of our network, providing access if our cable and DSL services are down.

I’ve been largely successful with the first goal. OpenWRT’s backfire branch has support for 1-wire USB host devices and the OWFS interface layer that I use to access the 1-wire network. It took me a while to find the right version of OpenWRT to use, but I think I’ve got a stable setup. I’m cooking up my own firmware images so I can customize what goes into the ROM image and avoid having to install things later. I’ve had a “stripped down” version of OpenWRT running the owserver utility for a month or so, and it’s been stable so far. I only documented its setup process loosely, but I’ll likely reimage the device at some point, and I’ll take better notes then.

The wireless tethering goal has been more elusive. While getting it to function has been pretty easy, making it stable has not. There seems to be a bug in the kernel that’s affecting the USB RNDIS functionality in my situation, and it causes simple things like a stream of pings to trigger a kernel oops or panic. This is obviously not a good thing. I tried a new build last night, and for a while, I thought I had achieved a breakthrough. The tether functioned for a good 20 minutes without issue, with pings going the whole time. After a reboot of the WL-520, things did not go as well. The kernel problems started showing up:

[  180.832000] ------------[ cut here ]------------
[  180.836000] WARNING: at net/sched/sch_generic.c:255 0x801da888()
[  180.840000] NETDEV WATCHDOG: usb0 (rndis_host): transmit queue 0 timed out
[  180.844000] Modules linked in: rndis_host cdc_ether usbnet ohci_hcd nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp ipt_MASQUERADE iptable_nat nf_nat xt_conntrack xt_CT xt_NOTRACK iptable_raw xt_state nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack pppoe pppox ipt_REJECT xt_TCPMSS ipt_LOG xt_comment xt_multiport xt_mac xt_limit iptable_mangle iptable_filter ip_tables xt_tcpudp x_tables ppp_async ppp_generic slhc b43legacy(O) b43(O) mac80211(O) usbcore usb_common nls_base crc_ccitt cfg80211(O) compat(O) ssb_hcd bcma_hcd arc4 aes_generic crypto_algapi switch_robo(O) switch_core(O) diag(O)
[  180.892000] Call Trace:[] 0x802676b0
[  180.896000] [] 0x802676b0
[  180.900000] [] 0x8001bc18
[  180.904000] [] 0x801da888
[  180.904000] [] 0x8001bccc
[  180.908000] [] 0x80015ecc
[  180.912000] [] 0x801da888
[  180.916000] [] 0x801da6a4
[  180.920000] [] 0x80028484
[  180.920000] [] 0x80051660
[  180.924000] [] 0x801bfc04
[  180.928000] [] 0x80022940
[  180.932000] [] 0x80054e18
[  180.932000] [] 0x80022ba8
[  180.936000] [] 0x800022b0
[  180.940000] [] 0x80005984
[  180.944000] [] 0x80005ba0
[  180.948000] [] 0x80016c58
[  180.948000] [] 0x800075fc
[  180.952000] [] 0x80005bc0
[  180.956000] [] 0x802cd908
[  180.960000] [] 0x802cd0dc
[  180.960000]
[  180.964000] ---[ end trace 7e575b276bcf3f69 ]---

I’m using the bleeding-edge branch of OpenWRT (currently at r31439) in the hopes that the kernel fixes will miraculously show up sometime soon. They haven’t at this point, so I’ll have to submit a bug report.