AMD’s Manju Hegde is one of the rare folks I get to interact with who has an extensive background working at both AMD and NVIDIA. He was one of the co-founders and CEO of Ageia, a company that originally tried to bring higher quality physics simulation to desktop PCs in the mid-2000s. In 2008, NVIDIA acquired Ageia and Manju went along, becoming NVIDIA’s VP of CUDA Technical Marketing. The CUDA fit was a natural one for Manju, as he had spent the previous three years working on non-graphics workloads for highly parallel processors. Two years later, Manju made his way to AMD to continue his vision for heterogeneous compute work on GPUs. His current role is Corporate VP of Heterogeneous Applications and Developer Solutions at AMD.

Given what we know about the new AMD and its goal of building a Heterogeneous Systems Architecture (HSA), Manju’s position is quite important. For those of you who don’t remember back to AMD’s 2012 Financial Analyst Day, the formalized AMD strategy is to exploit its GPU advantages on the APU front in as many markets as possible. AMD has a significant GPU performance advantage compared to Intel, but in order to capitalize on that it needs developer support for heterogeneous compute. A major struggle everyone in the GPGPU space faced was enabling applications that took advantage of the incredible horsepower these processors offered. With AMD’s strategy closely married to doing more (but not all, hence the heterogeneous prefix) compute on the GPU, it needs to succeed where others have failed.

The hardware strategy is clear: don’t just build discrete CPUs and GPUs, but instead transition to APUs. This is nothing new, as both AMD and Intel have been headed in this direction for years. Where AMD sets itself apart is that it is willing to dedicate more transistors to the GPU than Intel is. The CPU and GPU are treated almost as equal-class citizens on AMD APUs, at least when it comes to die area.

The software strategy is what AMD is working on now. AMD’s Fusion12 Developer Summit (AFDS), in its second year, is where developers can go to learn more about AMD’s heterogeneous compute platform and strategy. Why would a developer attend? AMD argues that the speedups offered by heterogeneous compute can be substantial enough that they could enable new features, usage models or experiences that wouldn’t otherwise be possible. In other words, taking advantage of heterogeneous compute can enable differentiation for a developer.

That brings us to today. In advance of this year’s AFDS, Manju has agreed to directly answer your questions about heterogeneous compute, where the industry is headed and anything else AMD will be covering at AFDS. Manju has a BS in Electrical Engineering (IIT, Bombay) and a PhD in Computer Information and Control Engineering (UMich, Ann Arbor) so make the questions as tough as you can. He'll be answering them on May 21st so keep the submissions coming.

101 Comments

  • PrezWeezy - Tuesday, May 15, 2012 - link

    It seems like with the advent of using the GPU for some tasks and the CPU for others, the biggest technical hurdle is programming something to make use of the best-suited processor for the job. What are the possibilities of adding a chip, or core, on the hardware side to branch off different tasks to the processor that can complete them fastest? That way no software would need to be changed in order to make use of the GPGPU. Would that even be feasible? It seems like it might require a rather drastic change in the way x86 works, but I see many more possibilities if a hardware-level branch happened instead of a software-level one.
  • BenchPress - Tuesday, May 15, 2012 - link

    No, you can't just "branch off" a CPU workload and make it run on the GPU, let alone faster.

    That said, AVX2 enables compilers to do auto-vectorization very effectively. And such compilers are nearly ready today, a year before Haswell arrives. So it will take very little effort from application developers to take advantage of AVX2. And there will also be many middleware frameworks and libraries which make use of it, so you merely have to install an update to see each application that makes use of it gain performance.

    So you can get the benefits of GPGPU right at the core of the CPU, with minimal effort.
  • kyuu - Tuesday, May 15, 2012 - link

    Goddamn, what are you, some Intel/nVidia shill/shareholder? Or just a troll?

    We get it. You think AVX2 rules and GPGPU drools. Cool, thanks. This is supposed to be for people who want to ask Mr. Hegde questions about GPGPU, not for people to come in and troll everyone else's questions with what comes down to, "AVX2 > GPGPU, lol idiot".

    If you really want to contribute, how about posting an actual question. Y'know, like maybe one about what role, if any, GPGPU will have in the future assuming the widespread adoption of AVX2. That would be a legitimate question and much better than you trolling everyone with your AVX2 propaganda.
  • mrdude - Tuesday, May 15, 2012 - link

    indeed. Can a mod just delete his posts please?
  • SleepyFE - Wednesday, May 16, 2012 - link

    BenchPress must be an Intel employee or something, because I can read all his posts between the lines like so: "No no, don't use GPGPU. Our GPUs suck too much to be used for that. We will add such functions to the CPU, where we have a monopoly, so everyone will be forced to use it."
  • BenchPress - Wednesday, May 16, 2012 - link

    I'm not an Intel employee. Not even close. So please don't try to make this personal when you're out of technical arguments why a homogeneous CPU with throughput computing technology can't be superior to a heterogeneous solution.

    Have you seen the OpenCL Accelerated Handbrake review? That's Trinity against Intel CPUs without AVX2. Trinity still loses against the CPU codec. So Intel won't need my help selling AVX2. The technology stands for itself. And I would be hailing AMD if they implemented it first.

    AVX2 will have *four* times the floating-point throughput of SSE4, and also adds support for gather which is critical for throughput computing (using OpenCL or any other language/framework of your choice). This isn't some kneejerk reaction to GPGPU. This is a carefully planned paradigm shift and these instructions will outlive any GPGPU attempt.

    It's a scientific fact that computing density increases quadratically while bandwidth increases linearly. And while this can be mitigated to some extent using caches, it demands centralizing data and computation. Hence heterogeneous general purpose computing is doomed to fail sooner or later.
  • maximumGPU - Wednesday, May 16, 2012 - link

    Well said!
  • BenchPress - Wednesday, May 16, 2012 - link

    No, I'm not employed by any of these companies, nor am I a shareholder, nor am I troll. I'm a high performance compiler engineer and I just want to help people avoid wasting their time with technology that I strongly believe has no future in the consumer market.

    I'm sorry, but I won't ask Manju any direct questions. He's here only to help sell HSA, so he won't admit that AVX2+ is a very strong competing technology which will eventually prevail. AMD has invested a lot of money into HSA and will be late to the homogeneous computing scene. They want to capitalize on this investment while it lasts.

    If you do think HSA is the future, then please tell me which bit of GPU technology could never be merged into the CPU. The GPU already lost the unique advantage of multi-core, SMT, FMA, wide vectors, gather, etc. And AVX-1024 would bring the power consumption on par. Anything still missing to make the CPU superior at general purpose throughput computing?

    Don't blame me for AMD making the wrong technology choices.
  • SleepyFE - Wednesday, May 16, 2012 - link

    You have stated that you strongly believe that HSA is a failure and that AVX is superior. Nothing wrong with speaking your mind.
    Doing so over and over and over (times the number of your posts) makes you a troll.
  • BenchPress - Wednesday, May 16, 2012 - link

    Each of my posts highlights different aspects of homogeneous versus heterogeneous throughput computing, backed by verifiable facts. So I'm doing a lot more than just sharing some uninformed opinion and repeating it. I'm trying to help each individual with their specific questions. I can't help it that the conclusion is always the same though. We have AMD to blame for that.

    Manju will answer questions about individual HSA aspects, which is fine and dandy except that the *whole* concept of HSA is problematic compared to the future of AVX.

    No one has yet come up with a strong technical argument why keeping general purpose computing split between the CPU and the GPU will always be superior to merging both into one. Physics is telling us otherwise, and I didn't make the rules.

    If nobody else has the guts to point that out, I will. It doesn't make me a troll.
