Optimizing Caching: Twemproxy and Memcached at Flickr
... a post I wrote about some work I did at Flickr: code.flickr.net. TL;DR: I delivered a couple of key infrastructure pieces that made caching more resilient, propagated cache clears across data centers, and reduced user latency across the board.
I recently attended an awesome talk by Steve Pousty on Vert.x, and I found the technology so intriguing that I had to dig deeper myself. This post summarizes why I think Vert.x is so promising and looks at my own exploration of the technology.
More About Vert.x
I think Vert.x has a really cool combination of features for creating application servers. I will focus on the two aspects I find most powerful. First are the APIs that "enable you to write non-blocking network enabled applications." Vert.x is a productive abstraction over Netty (netty.io), with an API that makes it easy to set up servers. This feature lets one build servers in JS (or another supported language), much as one would with Node.js. There are even projects for running Node.js and Rack applications on Vert.x, but that is a topic for another day.
The second feature of Vert.x that I find so compelling is the built-in, distributed event bus, combined with an actor-like processing model. This allows developers to write clean back-end logic in whatever language makes sense and to capitalize on concurrent and parallel processing while being shielded from the gory details. A flexible deployment model lets you distribute the pieces however you see fit.
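To make the actor-like model concrete, here is a minimal sketch in plain Ruby. It is not the Vert.x API: `ToyEventBus`, `consumer`, `send_message`, `request`, and the `ratings.count` address are hypothetical names. Each consumer gets its own mailbox drained by a single thread, so it handles one message at a time and its state needs no locks.

```ruby
require "thread"

# Hypothetical toy event bus, NOT the Vert.x API. Each consumer acts like
# an actor: it owns a mailbox (a Queue) drained by a single thread, so
# messages to one address are processed one at a time, in order.
class ToyEventBus
  def initialize
    @mailboxes = {}
    @workers = []
  end

  # Register a handler for an address; its return value is the reply.
  def consumer(address, &handler)
    mailbox = Queue.new
    @mailboxes[address] = mailbox
    @workers << Thread.new do
      loop do
        message, reply_to = mailbox.pop
        break if message == :stop
        result = handler.call(message)
        reply_to << result if reply_to
      end
    end
  end

  # Fire-and-forget: enqueue the message and return immediately.
  def send_message(address, message)
    @mailboxes.fetch(address) << [message, nil]
  end

  # Request/reply: enqueue the message, then block until the handler answers.
  def request(address, message)
    reply = Queue.new
    @mailboxes.fetch(address) << [message, reply]
    reply.pop
  end

  # Ask every worker to stop and wait for them to finish.
  def shutdown
    @mailboxes.each_value { |mailbox| mailbox << [:stop, nil] }
    @workers.each(&:join)
  end
end

bus = ToyEventBus.new
total = 0
# Only this consumer's thread updates `total`.
bus.consumer("ratings.count") { |n| total += n }
10.times { bus.send_message("ratings.count", 1) }
answer = bus.request("ratings.count", 0) # queued behind the 10 sends
bus.shutdown
puts answer
```

Because the mailbox is FIFO and has a single reader, the request is answered only after the ten earlier messages, so no explicit synchronization around `total` is needed.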
If the potential of these features is still not grabbing you, let's look at the demo, and then I will try to drive it home at the end.
For my simple demo, I set up a dead-simple server that handles a few routes and responds with ugly plain text. There is not much to it, just a tiny sketch of how a restaurant site with ratings might work. The whole demo is on GitHub.
In the previous gist, line 2, we add a route to a routeMatcher object and then pull the params out of the request in the handler. Later on, we set up a server that handles all the routes.
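The route-matching idea itself is simple. As a rough sketch in plain Ruby (the `ToyRouteMatcher` class, its `add`/`call` methods, and the restaurant routes are all hypothetical, not Vert.x's RouteMatcher API), a pattern with `:name` segments becomes a regular expression, and the captured segments become the params handed to the handler.

```ruby
# Hypothetical route matcher sketch; it mirrors the idea of routes with
# :named segments but is NOT the Vert.x RouteMatcher API.
class ToyRouteMatcher
  Route = Struct.new(:verb, :regex, :names, :handler)

  def initialize
    @routes = []
  end

  # Register a handler for a verb and a pattern like "/restaurants/:id".
  def add(verb, pattern, &handler)
    names = []
    source = pattern.split("/").map do |segment|
      if segment.start_with?(":")
        names << segment[1..-1] # remember the param name
        "([^/]+)"               # capture one path segment
      else
        Regexp.escape(segment)
      end
    end.join("/")
    @routes << Route.new(verb, Regexp.new("\\A#{source}\\z"), names, handler)
    self
  end

  # Dispatch to the first matching route, passing the extracted params.
  def call(verb, path)
    @routes.each do |route|
      next unless route.verb == verb
      match = route.regex.match(path)
      next unless match
      params = route.names.zip(match.captures).to_h
      return route.handler.call(params)
    end
    "404 Not Found"
  end
end

routes = ToyRouteMatcher.new
routes.add("GET", "/restaurants/:id") { |params| "Restaurant #{params['id']}" }
routes.add("POST", "/restaurants/:id/ratings/:stars") do |params|
  "Rated restaurant #{params['id']} with #{params['stars']} stars"
end

puts routes.call("GET", "/restaurants/42")
puts routes.call("POST", "/restaurants/42/ratings/5")
puts routes.call("GET", "/menus")
```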
Adding a Rails Backend
Back in the route handler in server.js, here is how we interact with the business logic:
This code handles manage-restaurant events, mutating the db for these business events as necessary.
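A plain-Ruby sketch of that interaction, under stated assumptions: `SyncBus` is a hypothetical synchronous stand-in for the event bus, `RestaurantService` stands in for the Rails backend with an in-memory Hash as its "db", and the message fields (`action`, `restaurant`, `stars`) are invented for illustration. The route handler never touches the data directly; it only sends manage-restaurant messages and formats the reply.

```ruby
# Hypothetical synchronous stand-in for the event bus (NOT the Vert.x API).
class SyncBus
  def initialize
    @handlers = {}
  end

  def consumer(address, &handler)
    @handlers[address] = handler
  end

  # Deliver the message; pass the consumer's reply to the block, if given.
  def request(address, message)
    reply = @handlers.fetch(address).call(message)
    block_given? ? yield(reply) : reply
  end
end

# Stand-in for the backend: the only code that mutates the "db".
class RestaurantService
  def initialize
    @db = Hash.new { |hash, name| hash[name] = [] } # name => star ratings
  end

  def handle(message)
    case message["action"]
    when "rate"
      @db[message["restaurant"]] << Integer(message["stars"])
      { "status" => "ok" }
    when "average"
      ratings = @db[message["restaurant"]]
      average = ratings.empty? ? nil : ratings.sum.fdiv(ratings.size)
      { "status" => "ok", "average" => average }
    else
      { "status" => "error", "message" => "unknown action" }
    end
  end
end

bus = SyncBus.new
service = RestaurantService.new
bus.consumer("manage-restaurant") { |message| service.handle(message) }

# What a route handler might do with params it pulled out of the URL.
def rate_restaurant(bus, params)
  message = { "action" => "rate", "restaurant" => params["name"], "stars" => params["stars"] }
  bus.request("manage-restaurant", message) { |reply| "Rating saved: #{reply['status']}" }
end

puts rate_restaurant(bus, { "name" => "taco-shack", "stars" => "5" })
puts rate_restaurant(bus, { "name" => "taco-shack", "stars" => "3" })
average = bus.request("manage-restaurant", { "action" => "average", "restaurant" => "taco-shack" })["average"]
puts average
```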
What's the Big Deal?
... and Future Considerations
In addition to the single-process simplicity Vert.x affords, one can also deploy Verticles as separate nodes, and the distributed event bus will handle the interprocess communication, effectively turning a concurrent application into a parallel one. Changing deployment models should entail little or no change to core logic. I think this is awesome because one can develop quickly as a monolithic-style app and then deploy as more of a microservices architecture: the scalability options are there from the beginning.
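One way to see why a deployment change need not touch core logic: keep the logic in a plain object and treat the transport as a swappable detail. In this hypothetical Ruby sketch, `LocalBus` hands messages over as live objects, while `SerializingBus` round-trips every message and reply through JSON to imitate crossing a process boundary (no real networking happens). `RatingCounter` runs unchanged on both.

```ruby
require "json"

# Core logic: a plain object that knows nothing about transports.
class RatingCounter
  def initialize
    @counts = Hash.new(0)
  end

  def handle(message)
    name = message["restaurant"]
    @counts[name] += 1
    { "restaurant" => name, "count" => @counts[name] }
  end
end

# Single-process deployment: messages are passed as live Ruby objects.
class LocalBus
  def initialize
    @handlers = {}
  end

  def consumer(address, &handler)
    @handlers[address] = handler
  end

  def request(address, message)
    @handlers.fetch(address).call(message)
  end
end

# Stand-in for a distributed bus: message and reply are serialized to JSON
# and parsed back, as they would be on the wire. No networking happens.
class SerializingBus < LocalBus
  def request(address, message)
    delivered = JSON.parse(JSON.generate(message))
    reply = @handlers.fetch(address).call(delivered)
    JSON.parse(JSON.generate(reply))
  end
end

# Wiring is identical for both transports; only the bus object differs.
def count_twice(bus)
  counter = RatingCounter.new
  bus.consumer("rating-counter") { |message| counter.handle(message) }
  bus.request("rating-counter", { "restaurant" => "taco-shack" })
  bus.request("rating-counter", { "restaurant" => "taco-shack" })["count"]
end

puts count_twice(LocalBus.new)
puts count_twice(SerializingBus.new)
```

Keeping message keys as strings matters here: JSON turns symbol keys into strings, so logic written against string keys behaves the same on both transports.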
In the future, I want to dig deeper into building application architectures on top of Vert.x (with suitable abstraction of the framework, of course). I envision a JS presentation layer where some POJO logic is shared between the client and the server side, with lightweight TCP communication between them. For the backend logic, I want to explore how a clean or hexagonal application layer would look combined with a message bus API. Putting it all together, I have a foggy notion of a new stack that might ultimately have some frameworks or tooling around it. In my head I've been calling it "mystack." Stay tuned for further developments if you like where this is headed.
*Note: I have not experimented with the scalability of the distributed event bus, but I give Vert.x the benefit of the doubt for now.
As I think I have said before, the purpose of software development is to produce good enough software as quickly and efficiently as possible. Everything I do at codecraft is driven by that principle. But why just "good enough," and what exactly does it mean?
Good Enough Software
The software created by a project should only be good enough, not more. That sounds crazy, I know. But if we are trying to get to market quickly and make strategic use of our investments, any bit of extra quality, premature scalability, or speculative functionality costs money, money we could spend elsewhere. We have to be serious about how we spend time and money, or the competition will clobber us.
Good enough, then, means finding the balance between present and future costs and benefits. Good enough means investing in what you need most right now, while being conscious of future concerns. Good enough means getting something imperfect in front of users quickly, and worrying about scale and perfection once the system is viable.
So how do I go about delivering Good Enough, quickly and efficiently? It is all about providing rich, continuous feedback.
What I Do
Every day, the plan will change to respond to where we are and what the most strategic use of time is. I usually like to keep a prioritized "feature inventory" in a shared document or tracking system. The client picks priority features, with some elaboration and discussion as necessary, and defines "Stories," pieces of a feature that make reasonable units of work. Stories move from a Ready state to In-Process, with further discussion as necessary, and following Kanban principles, the amount of in-process work at any time is limited to keep productivity high. Completed code for a story is committed to the shared repository when finished.
Using a tool like Campfire or Flowdock, the entire team can share and recall group communications. Developers will be available in this environment continuously during agreed-upon hours.
All business logic modules will be supported by automated unit tests and shared with the client as desired. The user interface will be supported by a reasonable suite of integration tests. The full test suite should be run before every commit.
All completed source code is kept in a shared repository like GitHub and is available to the client at any time. I would suggest we use a cloud host or PaaS like Heroku or DigitalOcean to host a couple of environments: "staging" and the live site. As soon as possible, the running application will be available in staging. Every change committed to the shared repository will immediately be deployed to the staging environment for review, so code is often delivered several times a day.
What I Don't Do
I don't tell you what you want to hear. I don't pretend to see the future and act like I know what the optimal solution for the client will look like, months from now, much less how much effort it will take. After all, Estimation is Evil.
Rather than basing the relationship on a contract that gives clients a false sense of security about getting what they need, I prefer to set up a relationship based on trust. Trust at a personal level, but more importantly a trust in the practices that we will set up together to ensure the best possible use of time and resources.
UPDATE: If I did this again, I would consider packaging the code as an ActiveSupport::Concern.
I recently wanted to DRY up some repetitive Rails association code: similar associations used in multiple models. This gave me the opportunity to figure out a few tricks, like creating a module class method and using metaprogramming to add dynamically named accessors.
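Here is a minimal plain-Ruby sketch of those two tricks, without ActiveRecord. The `Rateable` module, its `rateable_by` macro, and the generated method names are hypothetical; in a Rails model, the macro would typically also declare the real associations (for example with `has_many`). `self.included` extends the including class with `ClassMethods`, and `define_method` builds the dynamically named accessors.

```ruby
# Hypothetical mixin (plain Ruby, no ActiveRecord) whose class-level macro
# defines dynamically named accessors on the including class.
module Rateable
  # Runs when the module is included; adds the class-level macro.
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # rateable_by :critics defines #critic_ratings, #add_critic_rating
    # and #critic_average on the including class.
    def rateable_by(*sources)
      sources.each do |source|
        # Naive singularization; in Rails, String#singularize would be usual.
        singular = source.to_s.chomp("s")
        ivar = "@#{singular}_ratings"

        define_method("#{singular}_ratings") do
          instance_variable_get(ivar) || instance_variable_set(ivar, [])
        end

        define_method("add_#{singular}_rating") do |stars|
          public_send("#{singular}_ratings") << stars
          self # allow chaining
        end

        define_method("#{singular}_average") do
          ratings = public_send("#{singular}_ratings")
          ratings.empty? ? nil : ratings.sum.fdiv(ratings.size)
        end
      end
    end
  end
end

class Restaurant
  include Rateable
  rateable_by :critics, :diners
end

class Chef
  include Rateable
  rateable_by :critics
end

restaurant = Restaurant.new
restaurant.add_critic_rating(4).add_critic_rating(5).add_diner_rating(3)
puts restaurant.critic_average
puts restaurant.diner_average
```

Per the update above, `ActiveSupport::Concern` would replace the `self.included`/`extend` boilerplate with a `class_methods do ... end` block.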
Often developers need to model a multi-step process. Also common is the need to reuse pieces of the process: maybe the steps are the same but their execution varies across contexts, or maybe the step flow changes slightly in some cases while the logic of the steps is largely reused.
One common way developers address this situation is to implement the Template Method pattern. For example, an insurance system might have several similar flows for processing feeds from different sources (partner EDI, the consumer web site, provider applications) where many of the steps involved are largely reusable.
A Template Method uses inheritance to share common logic, varying the implementation of particular steps by overriding the methods that represent them. However, as the GoF posited and leading developers believe today, one should "favor composition over inheritance." Inheritance abuse is a common and natural side effect of trying too hard to reuse logic the OO way, but we can do better.
The essential problem with using inheritance for the Template Method is that the class tree can get pretty ugly if one needs to vary both the implementation of the steps AND the flow of steps. Also, more often than not, the is-a relationship between common and specialized processors is restrictive and faulty. For example, a processor implementation might also need to extend something else, like ActiveRecord, and a Ruby class can have only one superclass.
What if, rather than using a template method, one could compose flows of steps where the steps are pluggable members? Another way to build processes that can be reused and extended is the Strategy pattern, where the algorithm (or its steps) is pluggable. With this approach, the developer creates the structure of the flow in one class but delegates the execution of steps to collaborators.
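A hypothetical Ruby sketch of the Template Method applied to the insurance example: `FeedProcessor#process` fixes the order of steps, and each subclass overrides the steps that differ. The class names and the toy feed formats are invented for illustration. Note that each subclass has already spent its single superclass on `FeedProcessor`.

```ruby
# Hypothetical Template Method: the base class owns the flow,
# subclasses override individual steps.
class FeedProcessor
  # The template method: the order of steps is fixed here.
  def process(raw)
    parse(raw)
      .select { |record| valid?(record) }
      .map { |record| transform(record) }
  end

  # Default steps, shared by every feed.
  def valid?(record)
    !record[:member_id].to_s.empty?
  end

  def transform(record)
    record.merge(source: source_name)
  end

  # Steps each feed must supply.
  def parse(_raw)
    raise NotImplementedError, "#{self.class} must implement parse"
  end

  def source_name
    raise NotImplementedError, "#{self.class} must implement source_name"
  end
end

# Toy EDI: segments separated by "~", fields by "*".
class EdiFeedProcessor < FeedProcessor
  def parse(raw)
    raw.split("~").map { |segment| { member_id: segment.split("*")[1] } }
  end

  def source_name
    "partner_edi"
  end
end

# Toy web feed: comma-separated member ids.
class WebFeedProcessor < FeedProcessor
  def parse(raw)
    raw.split(",").map { |id| { member_id: id.strip } }
  end

  def source_name
    "consumer_web"
  end
end

edi_records = EdiFeedProcessor.new.process("INS*123~INS*~INS*456")
web_records = WebFeedProcessor.new.process("7, 8")
p edi_records
p web_records
```

If one feed also needed a different flow (say, an extra enrichment step), the usual move is another level of subclasses, which is where the tree starts to get ugly.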
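The same insurance example reworked with the Strategy pattern, again as a hypothetical Ruby sketch: `Flow` holds the structure, and the parser, validator, and transformer are collaborators passed in at construction. Anything that responds to `call` (a lambda, a module, or an object that already inherits from something else) can fill a step.

```ruby
# Hypothetical Strategy-based flow: the structure lives here, the steps
# are pluggable collaborators that respond to #call.
class Flow
  def initialize(parser:, validator:, transformer:)
    @parser = parser
    @validator = validator
    @transformer = transformer
  end

  def run(raw)
    @parser.call(raw)
           .select { |record| @validator.call(record) }
           .map { |record| @transformer.call(record) }
  end
end

# Interchangeable steps; lambdas here, but any object with #call works.
edi_parser = ->(raw) { raw.split("~").map { |segment| { member_id: segment.split("*")[1] } } }
web_parser = ->(raw) { raw.split(",").map { |id| { member_id: id.strip } } }
has_member = ->(record) { !record[:member_id].to_s.empty? }
tag_source = ->(source) { ->(record) { record.merge(source: source) } }

edi_flow = Flow.new(parser: edi_parser, validator: has_member,
                    transformer: tag_source.call("partner_edi"))
web_flow = Flow.new(parser: web_parser, validator: has_member,
                    transformer: tag_source.call("consumer_web"))

p edi_flow.run("INS*123~INS*~INS*456")
p web_flow.run("7, ,8")
```

Varying a feed now means passing different steps (or writing a different flow class) rather than growing the inheritance tree.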
Now, what if there were a fluent interface for building flows of pluggable steps, along with machinery and conventions for coordinating between steps? Well, that is the idea behind Flo: https://github.com/jtroxel/flo
Some of the ideas from Spring Batch and Apache Camel have also inspired my thinking here, but Flo is much simpler and less specialized. I was also motivated by the Unix programming model, little tools that do one thing well, strung together with pipes.
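To give a feel for the idea, here is a hypothetical fluent builder in Ruby. This is NOT Flo's actual API (see the repository for that); `FlowBuilder`, `start`, `step`, and `run` are invented names. Each step receives a shared context hash and returns an updated one, loosely like a tool reading stdin and writing stdout, and returning `nil` halts the flow, which is one possible convention for coordinating between steps.

```ruby
# Hypothetical fluent flow builder; NOT Flo's API. Steps share a context
# hash, loosely like Unix tools passing data through pipes.
class FlowBuilder
  Step = Struct.new(:name, :callable)

  def self.start
    new
  end

  def initialize
    @steps = []
  end

  # Append a step (a callable or a block) and return self for chaining.
  def step(name, callable = nil, &block)
    @steps << Step.new(name, callable || block)
    self
  end

  # Thread the context through each step; a nil result halts the flow.
  def run(context = {})
    @steps.reduce(context) do |ctx, current|
      result = current.callable.call(ctx)
      return ctx.merge(halted_at: current.name) if result.nil?
      result
    end
  end
end

flow = FlowBuilder.start
  .step(:parse)    { |ctx| ctx.merge(ids: ctx[:raw].split(",").map(&:strip)) }
  .step(:validate) { |ctx| ctx[:ids].all? { |id| id.match?(/\A\d+\z/) } ? ctx : nil }
  .step(:count)    { |ctx| ctx.merge(count: ctx[:ids].size) }

ok = flow.run(raw: "7, 8, 9")
bad = flow.run(raw: "7, x")
puts ok[:count]
puts bad[:halted_at]
```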
Note that Flo is a work in progress, awaiting her first real application.