Software as a service SaaS and SOA updates
Whew, the first part of this year has really gone quickly. We want to take some time out to reflect on the stuff we’ve discussed over the last few weeks and to talk about where the Expert Q&A is going from here.
Recap
The first Q&A session we did was on Virtualization. This is a huge topic, but we focused on Virtual Machine Timing because that has such a big impact on communications applications. Lots of great stuff in this Q&A featuring our friends from Voxeo and PSSC Labs.
The second Q&A session we did was on Faxing. Again, another huge topic, but we decided to go after Faxing because it impacts almost all of our clients. What was really interesting to me about this was the idea that the delimiter in faxing is silence, as opposed to most IP protocols where signaling comes from packets. It’s very clever, and I encourage you to dive into this presentation; it’s jam-packed!
The third Q&A session we did was on Provisioning. We think the industry can do better when it comes to provisioning. There isn’t one provisioner that handles all handsets, but there should be, and that’s what we’re working on at 2600hz. This Q&A presentation explains why this is such a hard thing to do and how our team is solving this problem. This presentation features Andrew Nagy of Provisioner.net fame!
The fourth Q&A session we did was on Database. Could we have picked a more massive topic? Maybe, but we decided to focus critically on Layer 1 Failures and Network Partitions, which are some of the biggest issues operationally for distributed systems. Featuring Sam Bisbee from Cloudant, we dive headfirst into a TON of content on Databases. This one is not to be missed; it’s a great deck with a ton of awesome commentary. (The image at the top of this email is about Zombie, Flapping Datacenters, which have to be removed from the cluster).
First up, we’ve got two more awesome Q&A sessions scheduled. The one next Friday, the 14th of June, is on Session Border Controllers! If you can’t tell the difference between an SBC and a Firewall, this is the presentation for you.
Two weeks after that we’re covering DTMF! Ever wonder where those tones on your phone came from? We’re going to give you the answers.
If you want to resell 2600hz, we have two great trainings coming up. The first one is this Friday, the 7th of June and the second one is two weeks later on Friday, the 21st of June.
Last but not least, our sales team just got a new shiny 800 #. Ring them at:
Thanks, and we look forward to bringing you tons of new content. Be on the lookout for an announcement of our API Trainings which should be starting later this month or early next!
How Squirrels break Datacenters and other Database Conjectures
This Q&A presentation was influenced by Kyle Kingsbury’s work on Jepsen, an exploration of modern databases. If you haven’t seen his work and you like this stuff, you should go check it out. It’s awesome.
We just did our Epic Database Expert Q&A featuring Sam Bisbee of Cloudant and Darren Schreiber of 2600hz. We covered a range of topics but focused on these three kinds of failures:
- Network Partitions
- Layer 1 Disasters
- Flapping Internet (Special Class of Network Partitions)
All public networks are unreliable; such is the reality of modern distributed database management (and let’s face it, because of AWS we’re all managing distributed databases, whether we like it or not). Sometimes, when these unreliable networks break down, a partition can form. These partitions, depending on your database configuration, can wreak havoc across a wide gamut of scenarios.
Arguably, most of what a database admin does is prepare for network partitions and how to resolve them.
-Joshua Goldbard, 2600hz
Yes, modern databases run fairly well when they’re not in a failure state, but, frankly, the only thing that matters is the failure state. During a partition, it’s important to understand your database’s behavior, which can vary wildly. At 2600hz, we leverage BigCouch, which uses a Master-Master replication strategy with a Dynamo-style quorum. What that means in plain English is that every node is a master node, and the cluster uses consistent hashing to redistribute the load in the event of a partition.
The best advice we can give here is to know the failure modes and behavior of your database and understand the partition realities of the software.
-Darren Schreiber, 2600hz
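To make the consistent hashing idea a bit more concrete, here is a minimal Python sketch. It is a toy, not BigCouch’s implementation: the `HashRing` class, the node names (`dc-a`, `dc-b`, `dc-c`) and the `doc-N` keys are all made up for illustration. What it shows is the property that matters during a failure: when one node drops out, only the keys that lived on that node get reassigned.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (a large integer)."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node is placed on the ring many times to spread the load.
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        # Dropping a node just removes its points; nothing else moves.
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            del self._owner[point]
            self._points.remove(point)

    def lookup(self, key):
        """Return the node owning the first ring point at or after the key."""
        idx = bisect.bisect_left(self._points, _hash(key))
        if idx == len(self._points):
            idx = 0  # wrap around the ring
        return self._owner[self._points[idx]]


# Hypothetical cluster of three nodes and 1000 documents.
ring = HashRing(["dc-a", "dc-b", "dc-c"])
keys = [f"doc-{n}" for n in range(1000)]
before = {k: ring.lookup(k) for k in keys}

ring.remove("dc-b")  # simulate losing one node
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all of them from dc-b")
```

Every key that was on `dc-a` or `dc-c` stays exactly where it was, which is why a cluster built this way can shed a node without reshuffling everything.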
Layer 1 Disasters
Hurricanes, Earthquakes or Squirrels? Squirrels eating glass. Squirrels caught in HVAC units. Squirrels tampering with power lines. All of these are examples of Layer 1 Disasters, but we only think about the really massive outages, not the unexpected ones that affect critical infrastructure.
Darren, the 2600hz CEO, has a lot of experience managing Datacenters. Here’s a quick story from back-in-the-day about managing racks in a DC:
Once upon a time a Datacenter vendor decided to give my company a couple of months notice that they were going to 10x our rates. They assumed we couldn’t migrate out of that Datacenter easily, and they were right. Because we were cheap, we did everything ourselves, which meant loading the racks into a pickup truck by hand that we drove in the rain to another Datacenter. Not my definition of Fun.
-Darren Schreiber, 2600hz
Contrast that with our experience during Sandy, when we were using BigCouch:
On the day before the storm, we just turned off the Datacenter. That was it.
We can evade storms, earthquakes and Squirrels because of Cloudant.
-Darren Schreiber, 2600hz
If a Datacenter gets into a Layer 1 issue, we just kill it and move on. When the disaster is mitigated we bring the service back up, but losing an entire DC (or even multiple DCs) is not an issue because of our database choice.
Protip: If you can’t predict disasters, have a plan to avoid them.
Is it up or is it down? Flapping internet is a special case of the Network Partition. Basically, a flapping connection is one that goes down, then comes up, then goes down again; this is actually worse than a server going hard down because of the reconciliation process that happens every time the networks reconnect. We’ve got one answer for this and one only: Zombie Servers get Double Tapped.
Basically, if a Datacenter is flapping, it’s better to just disconnect that Datacenter manually until it can be confirmed as restored. There’s no easy way to say this: flapping is one of those scenarios that requires manual intervention. If the DC is flapping you have to take it out or you may never get back online.
Protip: If it flaps, Double Tap.
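For the curious, here is a rough Python sketch of how you might spot a flapping link so a human can go pull the plug. Everything in it is an assumption for illustration (the `FlapDetector` name, the threshold of 3 state changes, the 5-minute window); it is not something from the presentation. It only raises the flag. Taking the Datacenter out, and deciding when it is safe to bring it back, stays a manual call.

```python
from collections import deque


class FlapDetector:
    """Flags a link as flapping when it changes state too often in a window."""

    def __init__(self, max_transitions=3, window_seconds=300.0):
        self.max_transitions = max_transitions
        self.window_seconds = window_seconds
        self._last_state = None
        self._transitions = deque()  # timestamps of up/down changes

    def observe(self, is_up, now):
        """Record one health check; return True if the link counts as flapping."""
        if self._last_state is not None and is_up != self._last_state:
            self._transitions.append(now)
        self._last_state = is_up
        # Forget transitions that fell out of the sliding window.
        while self._transitions and now - self._transitions[0] > self.window_seconds:
            self._transitions.popleft()
        return len(self._transitions) >= self.max_transitions


# A link that bounces up and down once a minute (timestamps in seconds).
flappy = FlapDetector()
flappy_checks = [(0, True), (60, False), (120, True), (180, False), (240, True)]
flappy_results = [flappy.observe(up, t) for t, up in flappy_checks]

# A link that goes hard down and stays down: one transition, never flagged.
dead = FlapDetector()
dead_checks = [(0, True), (60, False), (120, False), (180, False)]
dead_results = [dead.observe(up, t) for t, up in dead_checks]

print(flappy_results)  # [False, False, False, True, True]
print(dead_results)    # [False, False, False, False]
```

Note the difference between the two cases: the hard-down link is a plain outage, while the bouncing one is the zombie that needs the Double Tap.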
Darren chose to use the last few minutes to pontificate about how ridiculous life was before BigCouch. There was a point very early on where we simply could not get BigCouch to work and we thought we might have to fold the company. Thanks to incredible support from the Cloudant team, we got everything working and the rest is history.
It’s night and day. We just don’t spend any time on the database anymore… We just don’t have problems with the Software.
-Darren Schreiber, 2600hz
Sam chose to talk about write reliability, specifically the way in which other systems buffer writes and how they respond to concerns about write availability.
There are a lot of other databases out there that will reply “Write confirmed” when you buffer the write, NOT when it actually commits to disk. The practical effect of this is that if the disk dies before the write moves out of the buffer, you’re missing writes, which is death to a database.
Durable databases write to disk and confirm, they don’t just buffer. Databases that buffer can be very dangerous depending on the workload.
-Sam Bisbee, Cloudant
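Here is a small Python sketch of the difference Sam is describing. It is not how Cloudant or any particular database works internally, and the function names and the `writes.log` file are made up. It just contrasts saying “write confirmed” once the data has been handed to the operating system with saying it only after asking the OS to push the data to the storage device with `os.fsync`.

```python
import os
import tempfile


def buffered_write(path, record):
    """Confirms once the data is handed to the OS; it may still be in memory."""
    with open(path, "ab") as f:
        f.write(record)
    # The file is closed, but the OS may not have written it to disk yet.
    return "write confirmed"


def durable_write(path, record):
    """Confirms only after asking the OS to flush the data to the device."""
    with open(path, "ab") as f:
        f.write(record)
        f.flush()             # move data from Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to flush its cache to disk
    return "write confirmed"


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "writes.log")
    buffered_write(log_path, b"call-record-1\n")
    durable_write(log_path, b"call-record-2\n")
    with open(log_path, "rb") as f:
        contents = f.read()
```

On a healthy machine both functions look identical from the outside, which is exactly the problem: the difference only shows up when the power or the disk dies at the wrong moment.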
We had a blast doing this presentation with Cloudant and we can’t wait for the next Q&A Session on Session Border Controllers in two weeks. Click here to join us!
Two weeks after that, we’re going to discuss DTMF and how all of that nonsense works in VoIP. Register for free here:
Lastly, if you’d like to talk to our friends at Cloudant, check them out at Cloudant.com or in IRC on Freenode in channel #Cloudant.
Thanks so much for checking out our Q&A. If this all sounds like it’s too much work, you should call our Sales team at 8554642600 or email@example.com. We power some of the biggest infrastructures on the planet and we’d love to talk about how we can help your business eliminate the pains of operating communications infrastructure :).
Database Expert Q&A from 2600hz and Cloudant
When Darren isn’t busy working on stuff in the guts of the world’s biggest Telecom Infrastructures, he’s helping to write books about FreeSWITCH with the epic FreeSWITCH team. Their latest work is available now!
You can buy the book here: http://www.packtpub.com/freeswitch-1-2/book
Learn more about FreeSWITCH by checking out their site: http://freeswitch.org
Thanks for letting us be a part of such an awesome open-source project!!
Every now and again we let our engineers out of the office to wreak havoc on the conference circuit. James gave an awesome presentation at Kamailio World earlier this year and we’ve had a bunch of requests for the deck.
SO, without further ado, I’d like to present “Kazoo” by James Aimonetti, presented earlier this year at Kamailio World (THANKS FOR THE INVITE DANIEL!!!)
2600hz Kazoo Kamailio Integration Deck from Kamailio World by James Aimonetti
Provisioning Sucks (And here’s what 2600hz is doing about it)
This is the recap of our Provisioning Q&A session featuring Andrew Nagy of Provisioner.net and Francis Genet, lead engineer on the 2600hz Provisioning project. If you dig this kind of stuff you should check out our next Q&A on Database Management here.
If you have ever dealt with phones, chances are you hate provisioning. If you do it manually, it is exceedingly tedious. If you do it automatically, it can be disastrous. Many organizations opt for a homogeneous equipment mix, supporting only one manufacturer with a proprietary provisioning solution because it works and that’s good enough.
Here at 2600hz, that wasn’t an option for two reasons. First, almost all of our clients run heterogeneous infrastructures, which means we have to handle many different manufacturers, so we couldn’t use a proprietary solution. Second, we work with a lot of handsets and we realized pretty early on that manual provisioning wouldn’t work for us. So we did what any self-respecting group of telecom engineers would do: we built our own provisioner! And, since we’re awesome open-source citizens, we’ve made the code publicly available HERE too! Let’s take a look at the work we’ve done and why we’re doing it:
On the shoulders of Giants
It’s worth mentioning that we are hardly the only organization to wrestle with the realities of provisioning. Our work is based on Andrew Nagy’s Provisioner.net and, to quote Isaac Newton or Linus Torvalds, depending on who you ask, “If we have seen further, it is because we stand on the shoulders of Giants”. Before we start diving into what we consider the state-of-the-art, we’d like to acknowledge the great work our predecessors have done in bringing us to the point where what we’d like to achieve is possible. Alright, let’s dive in!
Why is this hard?
(Quick note: Cisco Handsets take up to 2.1 hours per phone to provision. That’s why having auto-provisioning is so important. Source)
It’s actually not that hard to provision a single phone, or even 100 phones. Hell, it’s not that hard to provision 1000 phones if they’re at the same site and from the same manufacturer. See, routers have this awesome option called DHCP Option 66, which lets you point phones en masse towards a provisioning server. All of the devices that connect to the router will receive a server address or URL in the DHCP response that points the phone towards the config files. That covers the local network, but it’s worth diving into how this works over the WAN in a little more detail. Let’s lay out the process for setting up a handset over a Wide Area Network:
1. Phone arrives brand new from the factory
2. Phone has the provisioning URL added through the on-device GUI (this is the job DHCP Option 66 does on a local network)
3. Provisioning server creates a provisioning profile for the handset containing all of the configuration files (MAC address used for identification)
4. The phone is attached to the corporate network and attempts to connect to the provisioning URL entered in the GUI
5. The provisioning server recognizes the MAC address of the handset and, after authenticating the phone, sends the corresponding configuration files
6. The phone receives the firmware and, if this is a secure environment, performs a checksum on the configuration files to make sure they match
7. If everything is kosher, the phone will begin the update process. Once complete, it will enter service.
8. Every few minutes (or days), or when the phone powers on, it will repeat this process starting at step 4
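The server side of the steps above can be sketched in a few lines of Python. This is a simplified illustration, not our provisioner’s code: the in-memory `PROFILES` store, the function names, the example MAC address and the `line.1.label` setting are all hypothetical, the authentication step is left out entirely, and SHA-256 is just one reasonable choice of checksum.

```python
import hashlib

# Hypothetical in-memory store: normalized MAC address -> config file body.
PROFILES = {}


def normalize_mac(mac):
    """Reduce 'AA:BB:CC:00:11:22' or 'aa-bb-cc-00-11-22' to 'aabbcc001122'."""
    cleaned = mac.replace(":", "").replace("-", "").replace(".", "").lower()
    if len(cleaned) != 12 or any(c not in "0123456789abcdef" for c in cleaned):
        raise ValueError(f"not a MAC address: {mac!r}")
    return cleaned


def create_profile(mac, config_text):
    """The server stores a profile keyed by the handset's MAC address."""
    PROFILES[normalize_mac(mac)] = config_text.encode("utf-8")


def serve_config(mac):
    """Look up the handset by MAC; return its config plus a checksum."""
    body = PROFILES.get(normalize_mac(mac))
    if body is None:
        return None  # unknown handset: nothing to send
    return body, hashlib.sha256(body).hexdigest()


def phone_accepts(body, checksum):
    """The phone verifies the checksum before applying the files."""
    return hashlib.sha256(body).hexdigest() == checksum


# The profile is created once, then fetched each time the phone checks in.
create_profile("00:11:22:AA:BB:CC", "line.1.label=Front Desk\n")
body, checksum = serve_config("001122aabbcc")
print(phone_accepts(body, checksum))            # True
print(phone_accepts(body + b"oops", checksum))  # False
print(serve_config("00:11:22:00:00:01"))        # None
```

Normalizing the MAC address up front matters more than it looks, because the same handset may identify itself with colons, dashes or no separators at all depending on who built it.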
This process has to work every time for every handset. Now, one would think that after the 150 years of telecom that we’ve had, there might be some standardization between vendors, but that’s certainly not the case with respect to provisioning. Every manufacturer has a different way of crafting their provisioning files, even down to the number of files required to boot a phone or the names of those files. It’s enough to drive a developer batty, but this is what we have to work with in Telecom. Seriously, go look at the Polycom firmware grid; it’s like a forest of incompatible firmwares.
The Polycom Nightmare Grid
If you want to present your users with simple-to-consume services, you must first conceal complexity. That’s a recurring theme in all the work we do at 2600hz, but it’s perhaps nowhere more true than in what we’re doing with respect to provisioning.
What are we doing about it?
At 2600hz, we believe in presenting simple interfaces for complex services. When we think of provisioning, we want our clients to experience a service that “just works”. We don’t allow folks to see firmware file names because we know what works with our servers. Power users can get this functionality back with trivial difficulty, but for the majority of use cases, the default settings are perfect. Here’s what our provisioning interface looks like in our GUI:
You’ll notice that we ask the user to select the make and model of their phone, a name and a MAC address. The only piece of really specialized information is the MAC address; everything else is immediately obvious to the user. But provisioning the handset doesn’t govern how the handset might interact with the network. That’s why we’ve included some extra tools to take the experience just a bit further. Take segregating Voice and Data traffic without physically separate ports. That’s hard to do without VLAN tags, but who wants to manually go into each phone to program a VLAN? That’s complicated, and remember, 2600hz is all about hiding complexity:
Here you’ll see a place to enter a VLAN tag. It really is that simple to push VLANs to all of your clients’ equipment.
How do we hide all this complexity?
When you check new boxes in the management interface for provisioning your handset, we make on-the-fly changes to the provisioning file for that phone. If you want to have a Yealink T-22 change from 1 line to 2 lines, you can execute that change NOT with a site visit to your client, but with the click of a mouse. This dramatically reduces labor and wasted time in client site visits by eliminating unnecessary troubleshooting.
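To give a feel for what an on-the-fly change looks like under the hood, here is a minimal Python sketch. It assumes a made-up flat `key=value` file format; the `render_config` function and the `line.N.enabled`, `vlan.enabled` and `vlan.id` keys are invented for illustration and do not match any real vendor’s parameters. The point is that the GUI only edits a small settings record, and the provisioning file is regenerated from it.

```python
def render_config(settings):
    """Render a flat key=value provisioning file from a settings dict.

    The key names below are made up for illustration; every real vendor
    uses its own file format and parameter names.
    """
    lines = []
    for n in range(1, settings["line_count"] + 1):
        lines.append(f"line.{n}.enabled=1")
    vlan = settings.get("vlan_id")
    if vlan is not None:
        # 802.1Q VLAN IDs 0 and 4095 are reserved.
        if not 1 <= vlan <= 4094:
            raise ValueError("VLAN ID must be between 1 and 4094")
        lines.append("vlan.enabled=1")
        lines.append(f"vlan.id={vlan}")
    return "\n".join(lines) + "\n"


# What the phone would fetch before the change...
phone = {"line_count": 1, "vlan_id": None}
one_line = render_config(phone)

# ...and after someone ticks two boxes in the GUI.
phone["line_count"] = 2
phone["vlan_id"] = 20
two_lines = render_config(phone)
print(two_lines)
```

The next time the handset checks in with the provisioning server it simply picks up the regenerated file, so nobody has to touch the phone.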
2600hz has built an awesome suite of provisioning tools for our clients to use in managing their systems. Provisioning is hard because hardware manufacturers make it hard, but that’s why there’s an opportunity for us to innovate in the first place. By concealing complexity from our clients, we make things run smoother and in a much more controllable fashion.
See our Powerpoint here:
Do provisioning servers make you feel weak in the knees? Does the prospect of reading SIP Packets for a living intrigue you? You might have a future working with 2600hz. If this is interesting, shoot us an email at firstname.lastname@example.org and we’ll chat :D.