Matty: I looked at one of these servers at Supercomputing '08 earlier this week. While the architecture probably would support a Tesla or two, the physical arrangement of the HC-1 almost certainly doesn't leave room for it. It's essentially a sandwich of two 1U servers, with the bottom one a relatively standard x86 server layout, and the top one completely given over to the FPGA-based coprocessor board.

HGJ: What may not have been entirely clear in the article is that this is something entirely different from the C-to-FPGA-image compilers that you refer to. What Convey calls a "personality" is a fixed FPGA image (well, set of images; there are 14 FPGAs on the board!) that turns the FPGA board into a programmable coprocessor with a very specialized instruction set. Then, the compiler targets this virtual processor much as it would any other coprocessor.

(Thus, this is really much more akin to CPU-on-an-FPGA designs such as the very successful ARM Cortex-M1 than to the sorts of projects you're thinking of. And the compiler technology is little different from "normal" compiler technology.)

Needless to say, the FPGA images that make up the "personality" are almost certainly generated using specialized languages and carefully optimized. And you can make your own, if you think theirs aren't fast enough.
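
To make the distinction concrete, here is a toy sketch in Python. It is purely illustrative: the `Personality` class, the opcode names and the `TOY_FP` instance are made up for this example and are not Convey's actual instruction set or API. The point is only that the "hardware" (the opcode table) is fixed ahead of time, and a program is just a sequence of instructions targeting it; nothing is synthesized per application.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# One instruction: an opcode name plus its numeric operands.
Instruction = Tuple[str, Tuple[float, ...]]


@dataclass(frozen=True)
class Personality:
    """Hypothetical stand-in for a preloaded set of FPGA images.

    The opcode table is fixed when the personality is built, the way a
    fixed FPGA image defines a coprocessor's instruction set.
    """
    name: str
    ops: Dict[str, Callable[..., float]]

    def run(self, program: Sequence[Instruction]) -> List[float]:
        """Execute each instruction and collect the results in order."""
        results = []
        for opcode, operands in program:
            if opcode not in self.ops:
                # The program asked for something this image cannot do.
                raise ValueError(f"{self.name} has no opcode {opcode!r}")
            results.append(self.ops[opcode](*operands))
        return results


# A made-up floating-point personality with two opcodes.
TOY_FP = Personality(
    name="toy-fp",
    ops={
        "ADD": lambda a, b: a + b,
        "FMA": lambda a, b, c: a * b + c,  # fused multiply-add
    },
)

# What a compiler back end would emit: instructions, not hardware.
program = [("FMA", (2.0, 3.0, 1.0)), ("ADD", (1.0, 2.0))]
results = TOY_FP.run(program)
```

Swapping in a different personality changes which opcodes exist, but the compiler's job stays the same: emit instructions for whatever instruction set is currently loaded.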
You will get fairly poor performance if you take legacy code written in C, C++ or Fortran and translate that into something for the FPGA. This has been done before (see for example the Impulse C Compiler). You would be better off writing your specialized code in VHDL or Verilog, synthesizing it, loading the result into the FPGA and letting the CPU do the rest.

The Xilinx Virtex-5 already comes with up to two PowerPC 440 cores. Combining CPU and FPGA on a motherboard is nothing new, and as you just learned, Xilinx has already taken the next step by merging the CPU and FPGA on one chip.

I am a bit surprised that AMD turned this project down. But I bet they would love it now. As I wrote in a prior posting, the merger of ATI and AMD only makes sense when you see the future of general-purpose CPUs combined with DSP. And FPGAs are certainly an interesting part of that picture.

Let me lay out the next developments:
- FPGAs become integrated components of CPUs,
- fully customizable microcode, per application (see how the Virtex-5 FPGA is being accessed from the PPC440 cores, a first step),
- CPUs based on photonic transistors,
- CPUs based on organic material,
- synthesizable organic CPUs,
- self-organizing organic CPUs,
- synthetic brains.

In case you were wondering why India sent a probe to the moon a few days ago, you should take a look at their FPGA development plans. Some people realize that a scientific program can be a great vehicle to drive industrial development. Just wondering what has happened to basic research in the USA...
"...one could literally take legacy code, pass it through the Convey compiler and get a stunningly fast-running executable out the other end..."

That is a very strong claim. Did the company back it up with any demonstrations?
Hmmmmm, now I'm thinking if this is standard they can chuck some Tesla cards in there too.....
So, Convex was followed up with Convey. Will Convez be next?