ABOUT FOUR YEARS ago, Cavium introduced its high-end Octeon network processor family. The company has about a two-year cadence on new parts, so it is about time for an update, and this one is called Octeon II.
The first Octeon was a multicore MIPS CPU with a ton of extras for network, storage and security processing. The idea was to push bits around at high speeds with low latency, and twiddle them in all sorts of unseemly ways at line speed. Given the success of the company, it looks like it worked.
2005's Octeon was followed by Octeon Plus in 2007 and now Octeon II in 2009, with minor updates in even-numbered years. All are software compatible, and do the usual Moore's Law dance of more, better, faster, cheaper and cooler. What does the II after the Octeon bring you? A lot.
A block diagram of the Octeon II
The heart of the new chip is the cores, now called cnMIPS v2, a new dual-issue architecture replacing the older one. It is faster and has more throughput: Cavium claims 45+GHz of compute power. Doing the maths, the company says there will be variants that clock to 1.5GHz, so that would mean 32 cores max.
It is a core with a nine-stage pipeline, so that isn't a bad frequency to hit, especially given the claim of sub-1W per core per GHz. The first six-core (max) variant tops out at 17W TDP, and the 32-core has a 60W TDP, but that includes everything, not just the cores.
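For anyone who wants to check the sums, the figures above fit in a few lines of Python. This is a back-of-the-envelope sketch, not a Cavium formula: the function names are made up for illustration, and it assumes every core runs at the top 1.5GHz clock. The TDP numbers cover the whole chip, so the watts per core-GHz it prints overstate what the cores alone draw.

```python
# Back-of-the-envelope check of the Octeon II figures quoted above.
# Helper names are illustrative only; they do not come from Cavium.

def aggregate_ghz(cores: int, clock_ghz: float) -> float:
    """Sum of all core clocks, the way 'GHz of compute power' is counted."""
    return cores * clock_ghz

def watts_per_core_ghz(tdp_watts: float, cores: int, clock_ghz: float) -> float:
    """Whole-chip TDP divided by aggregate GHz (an upper bound for the cores)."""
    return tdp_watts / aggregate_ghz(cores, clock_ghz)

if __name__ == "__main__":
    top_clock = 1.5  # GHz, ASSUMED for every core on both parts
    for cores, tdp in ((6, 17.0), (32, 60.0)):
        total = aggregate_ghz(cores, top_clock)
        ratio = watts_per_core_ghz(tdp, cores, top_clock)
        print(f"{cores} cores: {total:.0f}GHz aggregate, "
              f"{ratio:.2f}W per core-GHz (whole chip)")
```

Thirty-two cores at 1.5GHz come to 48GHz, which squares with the 45+GHz claim. The whole-chip figure works out to 1.25W per core-GHz for the 32-core part, so a sub-1W number for the cores alone is at least plausible once the I/O and accelerators are subtracted.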
Not every Octeon II will have 32 cores. Most things don't need anywhere near that much CPU power; what they need is fast interconnects. Because of this, the Octeon II has two tiers of crossbars to connect it all up, Hyperconnect in Cavium parlance. With luck, bits won't be blocked moving between units.
The cores share 6MB of cache, split into 2MB L1 and 4MB L2 for a fully-populated chip. On the diagram above, it is listed as on-chip memory.
The Application Acceleration box is where a good chunk of the magic happens. Cavium lists the functions that it speeds up as wireless networking, storage and control processing. This includes TCP/IP acceleration, Deep Packet Inspection (boo, hiss), specialised encryption, protocol processing, compression and packet processing.
Toss in the more generic crypto in the Security Engines box, and you have all you really need. If that isn't enough, the architecture is meant to be customisable, and Cavium will do that for you if you ask nicely and agree to buy lots of them. What you end up with is a chip that can handle TCP/IP routing, wireless base stations, RAID and data de-duplication, all at gigabit speeds.
Having the units to do all of this is all fine and dandy, but Moore's Law tends to conflict with Amdahl's Law, and even if it doesn't, when you start throwing multiple units at a problem, things get ugly. One of the ways to get around this is to control what goes where very carefully: schedule, schedule again, and schedule more. That is what the Application Acceleration Manager does.
It is a hardware instruction scheduler that minimises or eliminates software-created spinlocks. With any luck, by carefully parcelling out data to the units, you can use each app engine to capacity. Given the die size of the manager, it is pretty clear that Cavium is serious about minimising contention. It is expensive but worth it.
The last major block is the I/O, and that starts with memory. You can put on up to four DDR3 controllers supporting up to 128GB of DDR3-1600. If that isn't enough work space, buy two Octeon IIs.
Given that the purpose of the chips is moving data in and out quickly, you need a lot of I/O, and Cavium is claiming up to 100Gbps for a 32-core part. This is accomplished with PCIe2, Serial RIO and an inter-chip connection for chaining multiple Octeon IIs.
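To put the I/O claim next to the memory spec, here is a rough comparison in Python. It is a sketch under a loud assumption: each DDR3 controller is taken to drive a 64-bit (8-byte) data bus, which Cavium's material as described here does not state. The function names are invented for the example, and the results are theoretical peaks, not measured throughput.

```python
# Rough comparison of the quoted I/O rate with peak DDR3 bandwidth.
# ASSUMPTION: each controller has a 64-bit (8-byte) data bus; the article
# does not give the bus width. Peak numbers only, nothing measured.

def gbps_to_gbytes_per_s(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second."""
    return gbps / 8.0

def ddr_peak_gbytes_per_s(mega_transfers: int, bus_bytes: int, controllers: int) -> float:
    """Peak bandwidth: transfers per second times bytes per transfer."""
    return mega_transfers * bus_bytes * controllers / 1000.0

if __name__ == "__main__":
    io_rate = gbps_to_gbytes_per_s(100)          # the 100Gbps claim
    mem_rate = ddr_peak_gbytes_per_s(1600, 8, 4)  # four DDR3-1600 controllers
    print(f"I/O: {io_rate:.1f}GB/s, memory peak: {mem_rate:.1f}GB/s, "
          f"ratio: {mem_rate / io_rate:.1f}x")
```

Under that assumption, 100Gbps is 12.5GB/s against a 51.2GB/s memory peak, so every byte could in theory cross memory about four times before DRAM becomes the limit.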
There are also USB2 ports, and generic low speed IOs for bootROMs and serial ports.
If that is a little too vague for you, here is a much more detailed view of the block diagram.
A more technical block diagram
The first part out of the gate, announced this week, with an SDK soon and shipping samples in Q4, is the CN63xx. It is a family that ranges from two to six cores running at 800MHz to 1.5GHz, with 2MB of L2 cache and various accelerators enabled.
That will be followed by a high-end CN68xx in 2H/09, with a lower-end one- to four-core version after that. Prices for the CN63xx start at $59 and go to $199, or at least they will when you can buy the parts.
Now that you know what the Octeon II is, and how much it costs, you are probably wondering why you need it. Cavium has coined a term it calls Hyper Networks, and by that it means having the ability to get your data on any device, anywhere, any time. It doesn't take much to realise that we aren't there yet.
The chip itself can do everything you need, from pulling bits off the HDs on a SAN to LTE base stations, so if Hyper Networks come to pass, Cavium is well positioned. Disparate networks are becoming a mass of converged bits, so it isn't much of a stretch to wonder when this will be available to end users instead of if it will.
A task at least as daunting as the hardware design is the software that runs on these chips, and Cavium has a lot of tools to give customers. In addition to a fairly large ecosystem of OSes (Green Hills, Wind River, MontaVista and ENEA are listed), there are tons of partners supplying middleware and related apps.
You can take the Cavium-supplied applications as is, customise them heavily, buy some from vendors, or if you are really masochistic, write your own from scratch. Since the chips are MIPS-based, there is a base of knowledge out there to flatten the learning curve.
In about a year, you will not see Octeon II CPUs popping up in things from home routers to telecom base stations, but they will be there. That is the problem with embedded parts: no one knows they are there, or who is powering that blue and black box with blinking lights under your desk.
If you get out a screwdriver, you just might find a Cavium chip in your next one. µ