TL;DR: Introduction to the Orchestrate.io DBaaS
In the good ol’ Enterprise days, when you wanted to POC a new application or service, you would fill out the paperwork and, if you were a favorite son, have a new server to play with in just 5 months.
Then private virtualization came to the Enterprise and you could get a new VM in about a month.
Then, your company committed to valuing Business Agility and paid for a place in the public cloud, providing accounts in AWS, Google, Azure, Rackspace, CenturyLink Cloud, etc. Now, you can have an idea in the morning, provision some VMs in a few minutes, and start playing with that idea immediately.
I’m greedy. What’s next? How can I get my POC in front of people even faster?
Data stores are the cornerstone of every application/service … What if I could just skip database deployment and configuration, have a simplified admin, and just start using it?
That’s the promise of a DBaaS like Orchestrate.io, which provides much of the common NoSQL functionality over a ReST API. Most importantly, it provides a free pricing tier - perfect for this experiment.
I want to see what one of these services can do, how easy it is to work with, and how it can fit into my developer toolbox. In this article, I am going to scour the docs, pick out what I find interesting, and try to produce a concise yet meaningful overview. Future articles will explore the API through some common uses.
Disclaimer: I am a CenturyLink Cloud employee and CenturyLink acquired Orchestrate.io on 20 APR 2015.
Orchestrate organizes JSON objects and search in common use cases
All interactions with data are through the ReST API; minimal management is available through a dashboard.
Behind the scenes, Orchestrate is a multi-tenant database built on a collection of database technologies.
A complete list of technologies they use is here
Data consistency is determined by the database technology your API is using and is either strongly consistent:
or eventually consistent:
due to indexing delays required for this type of data. For example, if you write 1500 documents in a bulk write, it will take a few hundred milliseconds for that data to be available via search.
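In practice, that indexing delay means code that writes and then immediately searches may need to wait briefly. Here is a minimal sketch of a polling helper, assuming you have some function that runs your search and returns a list of results. The `search` callable, the timeout, and the interval are all hypothetical names and values chosen for illustration; none of them come from the Orchestrate docs or client libraries.

```python
import time
from typing import Callable, Sequence


def wait_for_search(
    search: Callable[[], Sequence[dict]],
    expected_count: int,
    timeout_s: float = 2.0,
    interval_s: float = 0.05,
) -> Sequence[dict]:
    """Poll `search` until it returns at least `expected_count` results.

    `search` is a hypothetical zero-argument callable that wraps whatever
    search request your client makes. It is an assumption for this sketch,
    not part of any Orchestrate library.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        results = search()
        if len(results) >= expected_count:
            return results
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"only {len(results)} of {expected_count} documents searchable"
            )
        # Give the search index a moment to catch up before asking again.
        time.sleep(interval_s)
```

A helper like this is mostly useful in tests and import scripts. In an interactive application it is usually simpler to read items back by key, since only search is affected by the delay.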
Orchestrate was built on AWS, will continue to support existing customers on AWS, but is expanding through the CenturyLink data centers. Current locations are:
This status page shows uptime for each data center for day/week/month and allows you to subscribe to updates.
When you register with Orchestrate, you are automatically enrolled in the Free Tier which requires no credit card and allows
If you exceed your monthly API call limit, your account is suspended until you upgrade or the next monthly billing cycle begins.
Pricing is presented as following application evolution. The free tier allows experimentation and development. When you are ready to go to production and your volume exceeds 50K operations/month, or you want to separate your production and development data, you can upgrade to the Developer plan at $49/month. When your application usage or operational needs demand it, you can upgrade to the Professional tier ($500/month). There is an additional Enterprise level for full customization, including options like private cloud deployments, dedicated clusters, warranty and indemnification.
During normal operation:
Data is replicated 3x across secure, distributed HBase storage engines for resilience and high availability. Daily backups to S3 or your preferred storage account are automatic, or available on demand.
Note that replication is not across data centers - it is currently 1 DC per app. Multiple DC replication is an outstanding feature request.
There are several ways to import data:
There is also a bash script that uses the bulk write API. If you have more than 1 million documents, Orchestrate suggests opening a support ticket for help with the import. There is an outstanding feature request to provide import from an S3 bucket or the dashboard via file upload.
You can migrate data between data centers with a support ticket.
Basic data exports can be done from the dashboard. Once the button is pressed, the data is gzipped and emailed to you. Obvious file size limits apply.
Once you create your Application (database), you will rarely use the dashboard; all data interaction is done via ReST API. This includes creating new collections by simply PUTting data in them.
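To make "just PUT some data" concrete, here is a minimal sketch that builds (but does not send) such a request with the Python standard library. It assumes the API root is `https://api.orchestrate.io/v0`, that items live at `/<collection>/<key>`, and that the API key is sent as the username in HTTP Basic auth with an empty password. Treat all three as assumptions to verify against the Orchestrate docs. `build_put` is a hypothetical helper name.

```python
import base64
import json
import urllib.request
from urllib.parse import quote

# Assumption: base URL of the ReST API; confirm against the official docs.
API_ROOT = "https://api.orchestrate.io/v0"


def build_put(api_key: str, collection: str, key: str, doc: dict) -> urllib.request.Request:
    """Build a PUT request that stores `doc` under `collection`/`key`.

    Assumption: the API key is the Basic auth username with an empty password.
    """
    url = f"{API_ROOT}/{quote(collection, safe='')}/{quote(key, safe='')}"
    token = base64.b64encode(f"{api_key}:".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


req = build_put("my-api-key", "users", "alice", {"name": "Alice", "plan": "free"})
# To actually send it: urllib.request.urlopen(req)
```

If the collection `users` does not exist yet, the first successful PUT is what creates it. There is no separate "create collection" step in this sketch.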
Hard-core shell people can use cURL to communicate with Orchestrate - it’s just ReST.
There is a Node.js command line tool, orcli, that integrates very well with the shell and simplifies everything. orcli is really handy for interactive and batch db maintenance chores.
TIP: see jq for reshaping JSON documents on the command line.
There are several client libraries to simplify ReST use:
There are also data adapters on npm for
All Key/Value items in Orchestrate have a version history, a list of all the changes to the item over time.
Data in Orchestrate is immutable and the majority of operations are non-destructive. When updating a Key/Value item, Orchestrate creates a new version (instead of overwriting the original), and adds it to the version history.
This non-destructive behavior enables you to track changes, retrieve previous values of a Key/Value item, restore deleted values, or manage state when multiple actors are changing values in parallel.
All version history is available via API, down to the millisecond. This provides some pretty interesting opportunities, like preventing SQL-style “write-unders” by validating that the user’s version of the document matches the current version, and reverting to the user when their edited version is out of date.
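The version check described above is a form of optimistic concurrency. The following is a minimal in-memory sketch of the idea, not Orchestrate's implementation or API: `VersionedStore`, `StaleVersionError`, and the `if_match` parameter are hypothetical names (the last one borrowed from the HTTP `If-Match` header convention), and the way refs are generated here is purely illustrative.

```python
import hashlib
import json
from typing import Dict, List, Optional, Tuple


class StaleVersionError(Exception):
    """Raised when a writer's version no longer matches the current one."""


class VersionedStore:
    """Toy key/value store that appends versions instead of overwriting."""

    def __init__(self) -> None:
        # key -> list of (ref, value), oldest first
        self._history: Dict[str, List[Tuple[str, dict]]] = {}

    def current_ref(self, key: str) -> Optional[str]:
        versions = self._history.get(key)
        return versions[-1][0] if versions else None

    def get(self, key: str, ref: Optional[str] = None) -> dict:
        """Return the latest value, or the value stored under a specific ref."""
        versions = self._history[key]
        if ref is None:
            return dict(versions[-1][1])
        for stored_ref, value in versions:
            if stored_ref == ref:
                return dict(value)
        raise KeyError(ref)

    def put(self, key: str, value: dict, if_match: Optional[str] = None) -> str:
        """Append a new version; reject the write if `if_match` is stale."""
        if if_match is not None and if_match != self.current_ref(key):
            raise StaleVersionError(f"{key!r} changed since ref {if_match}")
        versions = self._history.setdefault(key, [])
        payload = f"{key}:{len(versions)}:{json.dumps(value, sort_keys=True)}"
        ref = hashlib.sha1(payload.encode("utf-8")).hexdigest()[:16]
        versions.append((ref, dict(value)))
        return ref


store = VersionedStore()
ref1 = store.put("doc", {"title": "draft"})
ref2 = store.put("doc", {"title": "final"}, if_match=ref1)   # accepted
try:
    store.put("doc", {"title": "oops"}, if_match=ref1)       # stale, rejected
except StaleVersionError:
    pass  # hand the edit back to the user to reconcile
```

Because old versions are never overwritten, the rejected writer can still fetch the version they started from (`store.get("doc", ref=ref1)`) and compare it against the current one before retrying.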
Some obvious limits stand out.
If you need presence in more than one data center, you are out of luck. Data does not replicate between data centers, so something like MongoDB regional clusters is not possible.
You could get something like regional sharding by standing up an application in each region, backed by a dedicated database, but any common data would have to make the long haul to the common database.
It would seem that pricing would become a concern when you have multiple databases associated with one application, but Evident.io claims that “the cost of running MongoDB in EC2 was nearly 50% higher than using Orchestrate for our projects.”
There is very little to the Admin dashboard. Much of the need for one is eliminated by design: for example, all document keys are indexed, so you don’t have to worry about query index coverage.
All of your application monitoring will have to be black box, from the application/service side, instead of using the performance analyzers available for more general-purpose NoSQL databases like MongoDB.
That’s it. That’s all. Post to a collection using your API key and you have created your first collection.