Conclusions: Altera's Offerings and Competitive Landscape

FPGA vendors have long preached about the efficiency of reconfigurable hardware over general purpose processors. However, FPGAs have often been rejected as an option by many due to the programming challenges associated with them.  CPUs, and even GPUs, typically offered a much faster time to market and a larger talent pool of programmers. With OpenCL, FPGA vendors can play on an equal footing as far as programming is concerned.   Earlier, the decision to use an FPGA over another accelerator required significant resource commitment.  OpenCL allows FPGAs to be used as just another option lowering the risk and is potentially a game changer.

On a personal note, I am hoping for cheaper OpenCL capable FPGAs to hit the market. Currently, OpenCL capable FPGAs run into thousands of dollars. This is likely not an issue for the enterprise market typically targeted by FPGA vendors. However, OpenCL on FPGAs has not attracted as much mindshare as GPUs. GPU vendors have a huge advantage that anyone with a cheap laptop can start experimenting with and learning about GPUs. The easy and cheap access to GPUs enabled GPU computing to take off.  Whenever computing technology has become cheaper and/or easier to program, it has enabled many creative products around it in fields not thought of by the original technology makers. FPGAs have not yet reached that stage. While there is a community of FPGA enthusiasts, enabling OpenCL on cheaper FPGAs can increase this community many-fold.

Altera's OpenCL offering effectively promises customized hardware for your OpenCL kernels and the claim is that FPGAs will be more efficient than CPUs or GPUs at many tasks.  Applications that are not necessarily floating-point heavy, for example applications relying on custom integer datatypes, heavy bit-manipulation or fixed point calculations, are an area where FPGAs can shine because CPU and GPU hardware is not really tailored for such applications. The high-speed I/O connections available on an FPGA with external bandwidth far outstripping other accelerators is another advantage. I think streaming/filtering type of applications are an obvious niche that FPGAs can fulfill.  On the other hand, accelerators such as Nvidia Tesla and Xeon Phi will likely continue to do well in many double-precision floating-point applications because these accelerators are heavily optimized for such use cases. Applications such as image processing or data visualization that can make use of dedicated graphics related hardware on GPUs are also best done on a GPU. 

Finally, I would say I am cautiously optimistic at the prospect of using OpenCL on FPGAs.  I am impressed by the theoretical potential for OpenCL on FPGAs. However, I would  like to see third party studies comparing OpenCL SDKs for FPGAs and general purpose processors on various tasks to get a better understanding of performance and power consumption of various accelerator options. If you are evaluating GPUs or Xeon Phi for your application, you should definitely also consider evaluating OpenCL on FPGAs and compare their performance against other options for your application. OpenCL on FPGAs looks to be gaining steam and this will be an interesting space to watch in the near future and may very well be a turning point for wider adoption of FPGAs in various high-performance application segments.

Altera's OpenCL Implementation Details
Comments Locked

56 Comments

View All Comments

  • MrSpadge - Wednesday, October 9, 2013 - link

    BTW 2: David, you might want to contact Slicker, the admin of Collatz@Home. His project is fairly simple (and not that useful.. but people like it nevertheless) and has regularly been at the forefront of new technology (CUDA, ATI Stream, OpenCL, Intel GPUs..). Usually he's also very responsive. I could imagine a deal like: you give him access to your hardware, and if he succeeds you could get loads of publicity (attracting buyers and further developers) and quite a few sales.
  • viv32 - Wednesday, October 9, 2013 - link

    Application driven reconfigurable hardware is an exciting idea. I am not sure how dense the fpga should be to support the complexity of today's GPUs (If they want FPGAs to replace GPU ASICs). We design network processors and our fpga emulation boards need atleast 4 Stratixs for complete emulation. If the FPGA gate count can match the GPU then can they still be cost effective? My2c .. please correct me if I'm wrong (I'm no FPGA jockey).
  • rahulgarg - Wednesday, October 9, 2013 - link

    Well it depends. The objective in this case isn't to emulate the GPU at all. If the GPU is actually already a very good fit for your application, then going to FGPAs won't gain you much. But let us say in an application that does not use GPU's texture units, you don't really want to generate texture units on an FPGA. The idea isn't to emulate GPU's units or its pipeline, rather it is to generate a *different* pipeline that is more suitable for your application.
  • wyx087 - Wednesday, October 9, 2013 - link

    Benefit of using hardware description languages such as VHDL is just that, it describes the hardware, forcing you to think in terms of the cells gets placed down. OpenCL is a compute language, its programmers won't take into account something as simple as multipliers are very expensive in hardware unless done in powers of 2.

    Also, vast number of university courses do VHDL/Verilog/SystemVerilog as standard. Electronics is the course title. I have no doubt the number of HDL experts is much more than OpenCL experts on this planet.

    The way around "slow compile" is simulation. I see no mention of simulation tools for designing OpenCL on FPGA. Without simulation tools, it is impossible for this to take off. Simulation is the way we verify our design on a functional level.

    The "compile" (it is known as synthesis and implementation or map, place and route) time is indeed in hours for large designs. Remember you are not just generating a binary for a processor, you are generating a binary file that describes the actual hardware. Put it simply, you are generating THE processor.

    - Professional VHDL programmer
  • loki1725 - Sunday, October 13, 2013 - link

    This is actually what I was going to say. When I was an undergrad in EE (1997-2001) our embedded electronics course used VHDL. I taught in the EE department of a different university from 2009 to 2013 and we offered several courses that used VHDL. While there may not be more VHDL courses then OpenCL, the numbers are probably comparable.

    Still, really cool article, and anything that helps drive the adoption of FPGAs is a good step forward.
  • toyotabedzrock - Wednesday, October 9, 2013 - link

    So the compile time happens beforehand but how long does it take for the fpga to configure itself when you run a program.
  • rahulgarg - Wednesday, October 9, 2013 - link

    Well once you have done the compilation, my understanding is that flashing the binary is actually very fast so that is not an issue.
  • John32 - Wednesday, October 9, 2013 - link

    Are you saying Altera doesn't provide a simulation stage to testing functionality? That's done before generating the binary file for all designs. Generating the binary file is the last thing you do after verifying everything works functionally.

    You say Altera generates Verilog code in then I assume that goes through their standard synthesis, place and route tools. I don't see why you can't do a software and "hardware" (ie. the Verilog code) co-simulation. That's what is normally done during verification. I have C/C++ code that talks to the Verilog code. The C/C++ code is compiled to a binary file and the Verilog code is compiled within an HDL simulator software. Then the entire thing is simulated together. Once that checks out, I generate the binary file and load into the FPGA. I use the same C/C++ code but now with the actual FPGA.
  • John32 - Wednesday, October 9, 2013 - link

    Also, the whole "will it fit into the FPGA" issue is probably going to be a big problem for the likely target audience for this. You have no idea how the OpenCL code is being translated into hardware (ie. gates, LUTs, flip-flops, etc.). That all depends on your code and Altera's software to hardware algorithm.

    This reminds me of Xilinx's System Generator for MATLAB. It's a nice and easy way to get scientists to test their algorithms in hardware to see a ballpark figure of how fast it can be but it's definitely not the way to go for a final product.
  • John32 - Wednesday, October 9, 2013 - link

    I guess there's also the "will it meet timing" problem. What clock speeds does Altera use? Do they just use whatever clock speed they can achieve (ie. one design clocks at 400 MHz while another can only go 100 MHz)?

Log in

Don't have an account? Sign up now