And also, making those copies of the data will take up some time and memory bandwidth as well. If you want your AI thread, for example, to run as a separate process and it needs access to certain structures in order to do its work, instead of sending a pointer or reference to those structures, you'll need to send a complete copy instead. Besides using extra memory as I mentioned, this is also going to necessarily be slower. If you have multiple processors it will be worth it no doubt, but it's easy to see why they wouldn't code their games this way right now given how few multiprocessor setups are out there.
I am not a game programmer yet either - actually going to school to become one - but as other people have mentioned, the problem with multiple threads, and the reason why 2 processors is not twice as fast as 1 is because much of the data used for one part of the program will need to be accessed by another part.
This is easy enough to implement in C++ with critical sections that permit only one thread to access the data at a time. However, that also means that if multiple threads are trying to access it at the same time, the rest will have to wait while one modifies the data.
So in order to allow multiple threads to work at the same time, each will need to store its own local copy of the data to minimize the amount of time it controls the critical section. In other words, multithreaded games will need to store the same set of data multiple times, increasing the memory footprint. And if you code it that way, all systems will pay the price whether the threads are running on one CPU or several.
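To make the trade-off concrete, here is a minimal sketch of the "copy under the lock, work on the copy" idea. It uses the standard `std::mutex` in place of a Windows critical section, and `WorldState`, `SharedWorld` and `sum_positions` are made-up names for illustration, not anything from a real engine.

```cpp
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical shared game state; the names are illustrative only.
struct WorldState {
    std::vector<float> positions;
};

class SharedWorld {
public:
    // Writer: holds the lock only while swapping in the new data.
    void publish(WorldState next) {
        std::lock_guard<std::mutex> lock(mutex_);
        state_ = std::move(next);
    }

    // Reader: copies under the lock, then works on the copy without it.
    WorldState snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return state_;
    }

private:
    mutable std::mutex mutex_;
    WorldState state_;
};

// Example consumer (say, an AI update) working on its private copy.
float sum_positions(const SharedWorld& world) {
    WorldState local = world.snapshot();  // lock is held only for the copy
    float total = 0.0f;
    for (float p : local.positions) total += p;
    return total;
}
```

The copy in `snapshot()` is exactly the extra memory and bandwidth cost described above: the lock is held briefly, but every reader pays for its own duplicate of the data.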
Nice comments #19. Programming for multiple threads is kinda different, and IMHO, a lot of effort is spent just synchronizing the threads. Then there are the overheads of variable/memory protection and the really fun stuff like starvation and deadlocks. Of course, my favorite is probably still the good ol' fork bomb :P
#27 - 939 will eventually support dual-core processors, but it will definitely appear first for socket 940, so you're in luck there. Of course, the big question that remains is clock speed. Supposing that x52 is the dual-core variant (rumors also have that being a 2.6 GHz Opteron single-core, essentially the Opteron equivalent of the FX-55), if it "only" runs at 1.8 or 2.0 GHz initially, it won't outperform an FX-51 in most applications - at least not initially. (See conversation about SMP programming that took place in this thread.)
However, let me make it clear that I have not seen any material stating the initial clock speeds of dual-core chips - i.e. I haven't read something that's under NDA - so the clock speed is just a stab in the dark. If AMD can launch dual-core at 2.4 GHz, on the other hand (and not charge an arm and a leg for it), I think a lot of people would snatch it up.
#16 I share the same sentiment, I am running a socket 940 FX-51 and can't see shelling out the dough for a marginal upgrade to the FX-53. However... Us crazy Socket 940'ers should be in the highest levels of heaven once dual core Opterons launch. You see, our boards will be compatible with the first dual core Opteron CPUs to hit the streets (with perhaps some BIOS updating). (I'm not certain if 939 will be initially supported or not.) I'm hoping to drop a dual core Opteron x52 into my boxx once it comes out.
Fascinating roadmap. Luckily I looked at it. I wasn't aware of Socket A's demise this year until now. I must have been out of the loop - badly. Time to look at the MoBo features with a magnifying glass again...
Perhaps it is vague what exactly a module is, but if different modules share quite a lot of data then obviously the design can't be called modular anymore.
Any reasonably big program ought to be designed with a lot of thought. Good programming practices have been well known for decades, but unfortunately are not adhered to often enough in practice. And that is perhaps the key difference between us: I know the theory, while you have been more confronted with the raw reality in the field, which isn't as pretty as it could have been. Alas, there is a lot of bad code around, and it appears that the spaghetti paradigm still reigns. ;-)
A good design is never a trade off, as most other aspects of the program benefit from it. The art of programming is to keep things simple, even if that may require a lot of thought. If one makes a mess, then any change to the code will be hard.
You ask me whether I played a bug free game recently? At a friend's place, I exploited the only serious bug I found in HL2 within a minute after I was finished harassing the police by throwing garbage at their heads. :-)
I think a lot of non-programmers don't realize how much data is shared between different modules. Not necessarily directly, of course, but as passed parameters. You typically end up with the output of one module (procedure/function) being the input to another module, so you have heavy dependencies between the two. Global variables, of course, really help to break modularity. Unfortunately, global variables are often still used as a performance optimization.
It *IS* possible to get things running independently, but it is also a lot more tricky than you seem to think. I'll give a simple example from one of my former jobs.
We were working on a word processor, more or less. It was done in Java (a long time ago - like six or seven years back), and so a lot of routines had to be written by us. We ended up writing the whole document layout engine from scratch because there wasn't much available in Java's primitives. Basically, we had a big, blank window and we wrote all the routines to handle mouse input, text selection, the blinking of the cursor, etc. Can you guess which item used a thread?
It might not be immediately obvious, but at the time we used a separate thread for the blinking of the cursor. That way it could toggle the cursor state between on and off once every .5 seconds. Simple enough, right? It worked well in theory too... except that there were conflicting calls to certain functions. What ended up happening is that the paint function could be triggered in numerous ways - sometimes for no apparent reason, just the OS doing its thing - and you had this thread blinking the cursor all the time. Getting the cursor to write to the screen properly (i.e. not showing it as "on" when it was really "off") took weeks of work. Yes, WEEKS! And that was a very simple thread, for all intents and purposes.
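For anyone who wants to see the shape of that problem, here is a rough sketch in C++ (the original was Java, and this is not the code we wrote back then). `BlinkingCursor` and `stress_cursor` are invented names. The assumption is that the blink thread and the paint path both touch the same two flags, so both go through one mutex.

```cpp
#include <mutex>
#include <thread>

// Hypothetical cursor model: `visible_` is the logical blink state and
// `drawn_` is what was last painted. Names are illustrative only.
class BlinkingCursor {
public:
    // Blink thread: flip the state and repaint as one indivisible step.
    void toggle() {
        std::lock_guard<std::mutex> lock(mutex_);
        visible_ = !visible_;
        drawn_ = visible_;
    }

    // Paint path: may be triggered at any moment by the windowing system.
    void repaint() {
        std::lock_guard<std::mutex> lock(mutex_);
        drawn_ = visible_;
    }

    // True when the screen matches the logical state.
    bool in_sync() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return drawn_ == visible_;
    }

private:
    mutable std::mutex mutex_;
    bool visible_ = false;
    bool drawn_ = false;
};

// Hammers the cursor from a blink thread and a paint thread at once.
bool stress_cursor(int iterations) {
    BlinkingCursor cursor;
    std::thread blinker([&] { for (int i = 0; i < iterations; ++i) cursor.toggle(); });
    std::thread painter([&] { for (int i = 0; i < iterations; ++i) cursor.repaint(); });
    blinker.join();
    painter.join();
    return cursor.in_sync();
}
```

Without the mutex, a repaint can land between the flip and the redraw, which is the "on when it was really off" symptom. In a real UI toolkit the usual fix is to hand the toggle to the UI thread rather than lock around drawing.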
The real problem is that when you have multiple threads accessing data at the same time, synchronization isn't something you just worry about once per frame. Well, I suppose it *could* be provided you designed very carefully with this in mind, but it is that "paradigm shift" I was talking about earlier.
Now, let's all just assume for the sake of argument that making games threaded isn't especially complex. Fair enough. Let me ask one question: how many bug free games have you played in recent years? Oh, some are close enough to bug free, but the vast majority ship with quite a few major bugs that need to be addressed with a patch. If the game developers can't manage to rid themselves of most critical bugs with single-threaded models, I shudder to think how difficult it will be for them to eliminate bugs in a multi-threaded world. Threads make debugging *much* more difficult - just trust me on this one!
What you failed to clarify is that when you have to "lock" some data to prevent concurrent access, what happens to the second thread that tries to access the data? It sits and waits and does nothing, usually. If it could find something else to do, that would be great, but it would also add another level of complexity and bug hunting.
We have to become more threaded in our programming approach, but it's a lot more than being modular in design. Everything in the design process is a series of trade offs: more optimized code at the cost of more development time, better graphics at the cost of more time, new features at the cost of more bugs and time, more threads at the cost of more bugs and time... It's a delicate balancing act, and to be honest I would just as soon have slower bug-free code than highly optimized but buggy code (provided the slower code isn't more than 20% slower).
Synchronisation is done by one process saying to another: "Don't touch this data until I'm finished!" The other will have to wait if it wanted to use that data too. That is simple, and it stays simple if the data is almost never used at the same time by both processes and if not much data is shared. Once per frame is not often. It can easily be made much more complex, and you made a good start. ;)
I didn't even mention multithreading CPU heavy tasks within one module. How hard that would be depends much more on the actual implementation, and it might be more work, which is why I didn't mention it.
What you seem to overlook is the fact that game engines are frame based. Every freaking frame they calculate and render everything again. At the start of the frame they handle some user input and internet data, then they do the heavy stuff: physics, AI and rendering. Let's say those are three different modules. Even if one depends on the output data of another, they can be easily threaded if they don't process the same frame at the same time. I believe that is called pipelining in hardware. ;)
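A rough sketch of that pipelining idea, assuming two buffers that swap roles every frame: the "physics" thread fills one while the "renderer" reads the frame produced one iteration earlier. `FrameData` and `run_pipeline` are made-up names, and the squared frame number is only a stand-in for real simulation output.

```cpp
#include <array>
#include <thread>
#include <vector>

// Hypothetical per-frame output of the physics stage.
struct FrameData {
    int frame = -1;
    int value = 0;
};

// Simulates `frames` frames; returns the values the renderer saw, in order.
std::vector<int> run_pipeline(int frames) {
    std::array<FrameData, 2> buffers;
    std::vector<int> rendered;
    for (int n = 0; n <= frames; ++n) {
        FrameData& write = buffers[n % 2];
        const FrameData& read = buffers[(n + 1) % 2];

        // Stage 1 (own thread): simulate frame n into the write buffer.
        std::thread physics([&write, n, frames] {
            if (n < frames) {
                write.frame = n;
                write.value = n * n;
            }
        });

        // Stage 2 (this thread): render frame n-1 from the other buffer.
        if (n > 0) rendered.push_back(read.value);

        physics.join();  // the single per-frame synchronization point
    }
    return rendered;
}
```

The two stages never touch the same buffer in the same iteration, so the only synchronization is the `join()` at the end of each frame. The price is one frame of latency between simulation and display.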
In fact, games already run on SMP: rendering happens on the GPU, while the rest happens on the CPU. Graphics can't be multithreaded, and OpenGL at least, being a stateful API, is NOT thread safe! However, advantage is almost automatically taken of this parallelism. I think Carmack tried to run the graphics part on a separate thread, not the whole game on an actual dual processor system. I could be wrong though, but it would explain why it didn't make any difference. ;)
The reason why no one bothers with SMP for desktop programs is that desktop PCs have all been single-processor, single-core machines. Also, the reason why SMP doesn't improve performance that dramatically for most programs is that most programs are IO-bound. Games can be video-bound, and dual core processors wouldn't help much in that case either.
"The algorithms that divide up the work between threads need to also be efficient enough that they don't end up wiping out any gains."
Between CPUs, you mean? That is what CPU schedulers are for, which reside in the OS. A game doesn't need any algorithm; it simply runs a module as a separate thread, a no-brainer. At the end of the frame the main loop of the game simply waits for all modules to have finished before continuing. Still, I can't understand why you guys think that the overhead might kill the advantage of an additional 2-3 GHz processing power. :p
My main point is that _if_ a game is designed modular, it should be easy to multithread it. If the game design is messy, then obviously it could be a nightmare to implement.
Perfect example why multithreading can be tricky, and locking the data used is necessary: we both used the same comment to reply to, and we're out of sync. ;-)
Damn it, I was just happy I finished my long reply. My reply to you is coming Jarred. Don't post! :p
PrinceXizor, with all due respect, you have no clue about multithreading it seems. :p
Neither am I a programmer, but I know about programming. ;)
1. If there are different processes, you are already multithreading. But for the rest you are right. However, the whole point of modules is that they don't share a lot of data and that is why it should be easy.
Accounting for the overlap comes down to, apart from creating and killing threads, using a multithreading API like POSIX Threads.
2. This mysterious entity is the kernel of the operating system. Multithreading is one of the purposes of an OS, and you don't have to worry about the extra latency in your case. Especially not if different threads can run on different processors. If you are running different programs simultaneously, the OS is giving the CPU continually to a different program for a short time. Haven't seen you fret over that. ;)
3. The compilation of source code into binary is not relevant for multithreading as everything which even comes close to hardware communication is far outside of your program. Programs are oblivious to the number of processors or cores, until they ask the OS. Compilation is not affected by multithreading.
4. The reason why Hyperthreading hardly makes a difference is due to hardware, not software. A Hyperthreading processor is only virtually a dual processor system, and not dual core. Running two processes simultaneously is only partially supported by the hardware, and more a matter of utilizing unused transistors by trying to run an extra process.
It depends on the situation whether you get any significant gain at all, and was never supposed to be a 100% increase. With dual core you really get twice the processing power, but not the memory bandwidth. (Of course, games can be video bound. ;)
Judging by your point 3, I think you are also a bit confused with SIMD (Single Instruction, Multiple Data) and SSE (Streaming SIMD Extensions), which are newer Intel CPU instructions.
Multithreading overhead can be neglected in general, but it depends on how much data needs to be synchronized between different threads and how often. In a game it could be done mostly just once per frame, and if it has a modular design (which it should, really) hardly any data needs to be protected. If output of one module is used as the input of another module (from physics to graphics engine e.g.), one frame delay makes it possible to put those modules on different cores.
Of course, there is no guarantee that a game developer has sensible programmers, but I stand by my former comment.
17 & 18: Actually, 18 is relatively accurate from my experience. The term for what you need to have in order to get multiple threads to interact properly is called a "semaphore", I believe. Basically, you need a gateway so that certain segments of code can *only* be accessed by *one* thread at a time; otherwise, you get out of synch.
Imagine the physics engine, which updates the locations of objects in the world. Let's say you put that in a thread. Then we have another thread handling player movement, one for input, maybe one for network communications, graphics rendering of course, artificial intelligence... there's a ton of things which sound like they *could* be moved into different threads, right?
Consider this, though: in order for the AI to react properly to a given situation, it has to know the current state of the world. You can't have the AI thread examining the world and trying to figure out what to do while the physics thread is in the process of updating the location of objects in the world. The physics and graphics threads will also overlap: you can't have the physics thread moving objects around while the graphics are in the process of rendering to the screen.
The type of application that usually benefits most from highly parallel designs is something where you have chunks of data to be processed that are *entirely* separate from each other - no overlap in shared state. With games, you could look at the graphics pipeline for lots of parallelism, but the graphics cards are already handling most of that now anyway. I don't think the physics calculations take nearly as much time as a lot of people assume (although that could be wrong).
Anyway, I believe the basic process of most 3D games these days goes something like this:
1) Analyze inputs and adjust variables as appropriate (i.e. a gun begins to fire, player begins to slow down/turn, etc.)
2) Run AI routines to determine how the AI characters are going to behave (similar to player inputs) and adjust variables as appropriate.
3) Run physics routines to update the state of all the objects in the world - position, angle, health, etc.
4) Render the current state of the world in the graphics engine.
A multi-threaded approach might do something like the following:
1) Synchronize AI and player input threads and have both update the global variables appropriately.
a) Certain variables are going to be accessed frequently by both threads, so put semaphore logic in place to keep them from writing/reading incorrect values.
2) Run physics threads that update the state of the world. Each thread can handle a portion of the objects so that the physics calculations are done faster. Ideally, you would be able to vary the number of physics threads from 1 to n, where n is the number of processor cores that the system has available.
a) Add additional logic to double-check areas where there is overlap - some objects are going to need to affect both threads.
3) Render the current state of the world to the graphics card, using multiple threads. Again, the ability to have 1 to n rendering threads would be ideal.
a) Hopefully, the graphics card drivers are capable of handling multi-threaded input efficiently!
b) You would also want some sort of optimization in the way the objects are sent to the graphics card.
That's a *VERY* rough description as to how game logic might be coded. You would need to analyze the game code thoroughly to make sure you focus on optimizing the right areas - if the physics and AI only takes 10% of the total CPU time, it's probably not worth wasting effort on this area!
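To give step 2 a bit more shape, here is a minimal sketch of splitting a physics update across 1 to n threads. It assumes the objects are completely independent (no collisions between chunks), which is exactly the overlap problem from 2a that it ignores. `Body` and `step_physics` are made-up names.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical one-dimensional body; real objects carry far more state.
struct Body {
    float position;
    float velocity;
};

// Advances all bodies by dt, giving each worker one contiguous chunk.
void step_physics(std::vector<Body>& bodies, float dt, unsigned threads) {
    threads = std::max(1u, threads);  // hardware_concurrency() may report 0
    const std::size_t chunk = (bodies.size() + threads - 1) / threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        const std::size_t begin = std::min(bodies.size(), t * chunk);
        const std::size_t end = std::min(bodies.size(), begin + chunk);
        if (begin == end) break;  // nothing left to hand out
        workers.emplace_back([&bodies, begin, end, dt] {
            for (std::size_t i = begin; i < end; ++i) {
                bodies[i].position += bodies[i].velocity * dt;
            }
        });
    }
    for (auto& worker : workers) worker.join();
}
```

A caller would typically pass `std::thread::hardware_concurrency()` as the thread count. Each worker writes only to its own slice, so no locking is needed here; the moment objects in different slices interact, that stops being true.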
The algorithms that divide up the work between threads need to also be efficient enough that they don't end up wiping out any gains. That's a real key point. Imagine it takes one thread 10 milliseconds to render all of the current world state, and that if we can divide up the work into two threads each thread can get everything done in 5 milliseconds. If the task that divides the work (and the synchronization issues) takes 4 milliseconds on its own, then the frame still takes 9 milliseconds and you've just spent a lot of effort for almost no performance gain.
I believe that John Carmack encountered some of these issues on Quake 3. It had some alpha/beta level SMP support, but I never did hear of an instance where the SMP-enabled version ran substantially faster than the non-SMP version. I think he ended up halting work on the feature because it just didn't seem to be worthwhile.
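The arithmetic is simple enough to write down. This is a toy model only: it assumes the work splits perfectly evenly and that the overhead is a fixed cost added on top. `threaded_time_ms` is a made-up helper.

```cpp
// Frame time when `serial_ms` of work is split evenly over `threads`
// workers, plus a fixed cost for dividing the work and synchronizing.
double threaded_time_ms(double serial_ms, int threads, double overhead_ms) {
    return serial_ms / threads + overhead_ms;
}
```

With 10 ms of work on two threads, the split only pays off while the overhead stays below 5 ms: at 4 ms of overhead the frame takes 9 ms, and at 5 ms it is back to the single-threaded 10 ms.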
Now, my disclaimer: I *am* a programmer, but I haven't done any serious game programming work. I've also been doing less programming in the past two years as I've moved on to different work. My thoughts on some of this might be wrong, but logic seems to be on my side. If writing multi-threaded software is so "easy" (as some people seem to claim), then why is it that the vast majority of software is single-threaded? Even the best multi-threaded applications often end up with a 25 to 50% speed improvement over a single CPU, and finding truly independent tasks in a lot of applications - particularly games - can be rather difficult. Maybe it's just that the programmers haven't been trained/taught to look for such opportunities? We can only hope that's the case....
I'll preface this by stating that I am not a programmer.
However, I would think that multi-threaded applications are not nearly so simple, because of a few reasons.
1. Even modular components are not totally modular. Data, variables, processes are shared to varying degrees so this overlap has to be accounted for.
2. Something has to know what is going on. This knowledgeable entity most likely introduces latencies into the system. These latencies could overshadow the initial performance gain from "ported" single threaded code.
3. Translation. Please see the initial disclaimer especially for this item. It seems to me that we need to keep in mind that the code that is written is not what is run. Compiled code obviously differs from written code and the compilers would have to take into account a multi-thread environment as well. We currently rely on compilers to heavily optimize compiled code (just think of the arguments over which compiler to use for an "apples-to-apples" comparison of Linux application speed to Windows application speed), so it certainly makes a difference. And, if the compilers are doing a poor job of optimizing for multi-threaded applications, then spending the extra time to program in written code for it would seem a waste of time to many companies.
4. Hyperthreading. While not exactly dual core, the simple fact that even now, hyperthreading does not accelerate a vast majority of computer tasks seems to indicate that the programming intricacies are not so simple. Only the most highly parallel operations see great performance gains from hyperthreading, which seems (to me anyway) to lend credence to point number three.
Of course, this is all the opinion of a non-programmer :)
Jarred, while I agree with you for arbitrary programs, we're talking about games here, which have large orthogonal, independent components. Every programmer can dig himself into a hole too deep to easily climb out of by bad design and code, but assuming that a game is somehow modular it should be relatively trivial, as I said, to kick one CPU intensive part of the engine onto a separate thread. There is no paradigm shift required, unless the programmer hasn't made the switch yet from spaghetti code to modular design.
Game developers are spending insane efforts in optimizing their engine for all kind of different hardware to squeeze out every drop of extra performance. Now, don't tell me that if AMD released a dual core processor next week Valve and Id wouldn't be flogging their programmers to make their game run 100% faster.
14 - The whole problem with getting games to utilize more than one processor is the same as getting *any* application to make use of more than one processor: you need to rework some fundamental aspects of the program. While it sounds trivial at a high level, getting multiple threads to work together effectively and efficiently is actually a rather tricky problem. It's not impossible, but it is more difficult than writing single-threaded code.
What's needed is a paradigm shift (ugh - I hate terms like that, even when they're correctly used); the programmers need to step back and modify some of their core development processes. That's often a painful experience, and I think most software developers right now are looking at the problem and saying that there's no real benefit to changing the code yet. Once dual-core and multi-core setups become common, then they will *have* to change, but right now fewer than 1% of gamers (or computer users) have SMP setups, and only something like 40% have HyperThreading.
I don't understand why AMD thinks that games wouldn't benefit from dual core. It should be relatively trivial to implement threading in a game with its independent CPU intensive components: physics engine, AI and graphics, as there are still games which utilize the CPU a lot for rendering.
Old games will run fine even on one core, while new games will be patched. Action and reaction, stimulus and response. The question is, who does the pushing?
FWIW, The Inq was reporting a mid-February release of the Opteron x52, causing a downward shift in the pricing structure (i.e., x52 will take x50 price slot, x50 will fall to x48 prices, and so on). Furthermore, XbitLabs was reporting that 90nm Opterons x48 are shipping. These tidbits are interesting to me because I hope to build a system based on the Iwill zMAXdp in the very near future. :)
We have received some conflicting reports on the x52 model Opterons. Some places are claiming as early as a January launch, others are claiming February. We have received word from some server manufacturers, however, stating that they don't think the x52 will even be launched at all and that AMD will wait for dual-core Opterons instead. (For all the conflicting reports, the x52 could even *be* dual-core when it launches.)
While we certainly don't mind doing a little speculation, we're not willing to stake our reputation on a launch date for which we have received no *official* word. If we do see x52 Opterons, Q1'05 would be the best time frame. It could also just be an "interesting" test of 90 nm SOI with strained silicon before they finalize that process for the dual-core parts. Who knows for sure? Regardless, the dual-cores will be coming a few months later, so anyone actually looking for added processing power on an SMP server would be well advised to wait until then.
I noticed the new Opterons x52 (2.6GHz) are missing... they're set for a January 14th release, I think. Also, a s754 3700+ part would surprise me, and I also think AMD will put out a 2.8GHz CPU in the first half of 2005.
#2 - Fark. Right you are, Peter. Someone corrected me on that last time, and somehow I still managed to get the wrong data into my spreadsheet. Strange, considering that I copied and pasted the data from the last roadmap I thought... must have used a wrong spreadsheet or something.
Anyway, as far as the roadmap being "boring", blame that on the market. Intel's roadmap is really just as bad. Possibly it's a matter of them waiting until 2005 before they really start making any big announcements, but things have been very quiet the last few months.
True, the multipliers are low, but these things should cost around 50-60 bucks and should be able to go to 2-2.2GHz without getting the chipset too out of whack. Would be good for people like me who upgrade every 6 months or so to the hot CHEAP overclocking chip. I'm currently on a 2.4GHz 2400 mobile and would welcome an upgrade to an A64-based chip that is close to the clock I'm at now.
HKEPC reported that the socket 754 Semprons 2800+ and 2600+ are clocked at 1.6 and 1.4GHz.
I'm not very excited about overclocking them, with multipliers as low as 8 and 7. The Celeron D is actually much easier to overclock (change FSB 133MHz -> 200MHz and you're set).
What a boring roadmap. Hopefully the new stepping of the 90 nm will be good enough. Then again, if it were, wouldn't AMD release higher clocked CPUs? Rumors say up to 25% more headroom. If that is true it doesn't really matter though if they release higher speed grades, as long as you can overclock them yourself.
timw - Saturday, January 15, 2005 - link
And also, making those copies of the data will take up some time and memory bandwidth as well. If you want your AI thread, for example, to run as a separate process and it needs access to certain structures in order to do its work, instead of sending a pointer or reference to those structures, you'll need to send a complete copy instead. Besides using extra memory as I mentioned, this is also going to necessarily be slower. If you have multiple processors it will be worth it no doubt, but it's easy to see why they wouldn't code their games this way right now given how few multiprocessor setups are out there.
timw - Saturday, January 15, 2005 - link
I am not a game programmer yet either - actually going to school to become one - but as other people have mentioned, the problem with multiple threads, and the reason why 2 processors is not twice as fast as 1 is because much of the data used for one part of the program will need to be accessed by another part.
This is easy enough to implement in C++ with critical sections that permit only one thread to access the data at a time. However, that also means that if multiple threads are trying to access it at the same time, the rest will have to wait while one modifies the data.
So in order to allow multiple threads to work at the same time, each will need to store its own local copy of the data to minimize the amount of time it controls the critical section. In other words, multithreaded games will need to store the same set of data multiple times, increasing the memory footprint. And if you code it that way, all systems will pay the price whether the threads are running on one CPU or several.
WhoBeDaPlaya - Friday, January 14, 2005 - link
Nice comments #19. Programming for multiple threads is kinda different, and IMHO, a lot of effort is spent just synchronizing the threads. Then there are the overheads of variable/memory protection and the really fun stuff like starvation and deadlocks. Of course, my favorite is probably still the good ol' fork bomb :P
JarredWalton - Friday, January 7, 2005 - link
#27 - 939 will eventually support dual-core processors, but it will definitely appear first for socket 940, so you're in luck there. Of course, the big question that remains is clock speed. Supposing that x52 is the dual-core variant (rumors also have that being a 2.6 GHz Opteron single-core, essentially the Opteron equivalent of the FX-55), if it "only" runs at 1.8 or 2.0 GHz initially, it won't outperform an FX-51 in most applications - at least not initially. (See conversation about SMP programming that took place in this thread.)
However, let me make it clear that I have not seen any material stating the initial clock speeds of dual-core chips - i.e. I haven't read something that's under NDA - so the clock speed is just a stab in the dark. If AMD can launch dual-core at 2.4 GHz, on the other hand (and not charge an arm and a leg for it), I think a lot of people would snatch it up.
phaxmohdem - Friday, January 7, 2005 - link
#16 I share the same sentiment, I am running a socket 940 FX-51 and can't see shelling out the dough for a marginal upgrade to the FX-53. However... Us crazy Socket 940'ers should be in the highest levels of heaven once dual core Opterons launch. You see, our boards will be compatible with the first dual core Opteron CPUs to hit the streets (with perhaps some BIOS updating). (I'm not certain if 939 will be initially supported or not.) I'm hoping to drop a dual core Opteron x52 into my boxx once it comes out.
*Drooooling already.
Jii - Tuesday, January 4, 2005 - link
Fascinating roadmap. Luckily I looked at it. I wasn't aware of Socket A's demise this year until now. I must have been out of the loop - badly. Time to look at the MoBo features with a magnifying glass again...
Googer - Saturday, January 1, 2005 - link
First Comment of The NEW YEAR!
Pannenkoek - Friday, December 31, 2004 - link
Perhaps it is vague what exactly a module is, but if different modules share quite a lot of data then obviously the design can't be called modular anymore.
Any reasonably big program ought to be designed with a lot of thought. Good programming practices have been well known for decades, but unfortunately are not adhered to often enough in practice. And that is perhaps the key difference between us: I know the theory, while you have been more confronted with the raw reality in the field, which isn't as pretty as it could have been. Alas, there is a lot of bad code around, and it appears that the spaghetti paradigm still reigns. ;-)
A good design is never a trade off, as most other aspects of the program benefit from it. The art of programming is to keep things simple, even if that may require a lot of thought. If one makes a mess, then any change to the code will be hard.
You ask me whether I played a bug free game recently? At a friend's place, I exploited the only serious bug I found in HL2 within a minute after I was finished harassing the police by throwing garbage at their heads. :-)
JarredWalton - Friday, December 31, 2004 - link
Modular != Threaded
I think a lot of non-programmers don't realize how much data is shared between different modules. Not necessarily directly, of course, but as passed parameters. You typically end up with the output of one module (procedure/function) being the input to another module, so you have heavy dependencies between the two. Global variables, of course, really help to break modularity. Unfortunately, global variables are often still used as a performance optimization.
It *IS* possible to get things running independently, but it is also a lot more tricky than you seem to think. I'll give a simple example from one of my former jobs.
We were working on a word processor, more or less. It was done in Java (a long time ago - like six or seven years back), and so a lot of routines had to be written by us. We ended up writing the whole document layout engine from scratch because there wasn't much available in Java's primitives. Basically, we had a big, blank window and we wrote all the routines to handle mouse input, text selection, the blinking of the cursor, etc. Can you guess which item used a thread?
It might not be immediately obvious, but at the time we used a separate thread for the blinking of the cursor. That way it could toggle the cursor state between on and off once every .5 seconds. Simple enough, right? It worked well in theory too... except that there were conflicting calls to certain functions. What ended up happening is that the paint function could be triggered in numerous ways - sometimes for no apparent reason, just the OS doing its thing - and you had this thread blinking the cursor all the time. Getting the cursor to write to the screen properly (i.e. not think it was "on" when it was really "off") took weeks of work. Yes, WEEKS! And that was a very simple thread, for all intents and purposes.
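To make the cursor problem concrete, here is a minimal Python sketch (not the original Java code, which isn't shown in this thread). The `Cursor` class, its fields, and `run_demo` are hypothetical names invented for illustration. The idea is that both the blink thread and the paint path take the same lock, so what is drawn can never disagree with the stored on/off state.

```python
import threading
import time


class Cursor:
    """Hypothetical cursor whose on/off state is shared by two threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._visible = False  # what the blink logic believes
        self._drawn = False    # what is actually on screen

    def toggle(self):
        # Called by the blink thread: flip the state and redraw as one step.
        with self._lock:
            self._visible = not self._visible
            self._drawn = self._visible

    def paint(self):
        # Called whenever the window repaints: redraw from the current state.
        with self._lock:
            self._drawn = self._visible

    def consistent(self):
        # True when the screen matches the stored state.
        with self._lock:
            return self._drawn == self._visible


def blink(cursor, stop, interval):
    # Toggle until asked to stop; Event.wait doubles as an interruptible sleep.
    while not stop.wait(interval):
        cursor.toggle()


def run_demo(duration=0.05, interval=0.005):
    cursor = Cursor()
    stop = threading.Event()
    blinker = threading.Thread(target=blink, args=(cursor, stop, interval))
    blinker.start()
    deadline = time.monotonic() + duration
    ok = True
    while time.monotonic() < deadline:
        cursor.paint()  # simulated repaint requests from the windowing system
        ok = ok and cursor.consistent()
    stop.set()
    blinker.join()
    return ok
```

Without the lock, `paint` could read the state in the middle of `toggle` and leave the screen out of step with it, which is the kind of bug described above.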
The real problem is that when you have multiple threads accessing data at the same time, synchronization isn't something you just worry about once per frame. Well, I suppose it *could* be provided you designed very carefully with this in mind, but it is that "paradigm shift" I was talking about earlier.
Now, let's all just assume for the sake of argument that making games threaded isn't especially complex. Fair enough. Let me ask one question: how many bug free games have you played in recent years? Oh, some are close enough to bug free, but the vast majority ship with quite a few major bugs that need to be addressed with a patch. If the game developers can't manage to rid themselves of most critical bugs with single-threaded models, I shudder to think how difficult it will be for them to eliminate bugs in a multi-threaded world. Threads make debugging *much* more difficult - just trust me on this one!
What you failed to clarify is that when you have to "lock" some data to prevent concurrent access, what happens to the second thread that tries to access the data? It sits and waits and does nothing, usually. If it could find something else to do, that would be great, but it would also add another level of complexity and bug hunting.
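The "find something else to do" option can be sketched with a non-blocking lock attempt. This is only an illustration in Python; `world_lock`, `try_update`, and the fallback "other work" are made-up names, and a real game would need to decide what that other work actually is.

```python
import threading

world_lock = threading.Lock()


def try_update(shared, fallback_log):
    # Non-blocking attempt: returns at once if another thread holds the lock.
    if world_lock.acquire(blocking=False):
        try:
            shared["updates"] += 1
            return True
        finally:
            world_lock.release()
    # The lock was busy: do some other (hypothetical) work instead of waiting.
    fallback_log.append("did other work")
    return False


def demo():
    shared = {"updates": 0}
    log = []
    first = try_update(shared, log)       # lock is free, update succeeds
    with world_lock:                      # stand-in for a second thread holding the lock
        second = try_update(shared, log)  # lock is busy, falls back
    return first, second, shared["updates"], log
```

The extra branch is exactly the added complexity mentioned above: every caller now has two paths to test instead of one.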
We have to become more threaded in our programming approach, but it's a lot more than being modular in design. Everything in the design process is a series of trade offs: more optimized code at the cost of more development time, better graphics at the cost of more time, new features at the cost of more bugs and time, more threads at the cost of more bugs and time... It's a delicate balancing act, and to be honest I would just as soon have slower bug-free code than highly optimized but buggy code (provided the slower code isn't more than 20% slower).
Pannenkoek - Thursday, December 30, 2004 - link
Synchronisation is done by saying: "Don't touch this data until I'm finished!" by one process to another, which will have to wait if it wanted to use that data too. That is simple, and it stays simple if the data is almost never used at the same time by both processes and if not much data is shared. Once per frame is not often. It can easily be made much more complex, and you made a good start. ;)
I didn't even mention multithreading CPU heavy tasks within one module. How hard that would be depends much more on the actual implementation, and it might be more work. That is the reason why I didn't mention it.
What you seem to overlook is the fact that game engines are frame based. Every freaking frame they calculate and render everything again. At the start of the frame they handle some user input and internet data, then they do the heavy stuff: physics, AI and rendering. Let's say those are three different modules. Even if one depends on the output data of another, they can be easily threaded if they don't process the same frame at the same time. I believe that is called pipelining in hardware. ;)
In fact, games already run on SMP: rendering happens on the GPU, while the rest happens on the CPU. Graphics can't be multithreaded, and OpenGL at least, being a stateful API, is NOT thread safe! However, advantage is almost automatically taken of this parallelism. I think Carmack tried to run the graphics part on a separate thread, not the whole game on an actual dual processor system. I could be wrong though, but it would explain why it didn't make any difference. ;)
The reason why no one bothers with SMP for desktop programs is that desktop PCs have all been single processor, single core. Also, the reason why SMP doesn't improve performance that dramatically for most programs is that most programs are IO-bound. Games can be video-bound, and dual core processors wouldn't help much in such a case either.
"The algorithms that divide up the work between threads need to also be efficient enough that they don't end up wiping out any gains."
Between CPUs you mean? That is what CPU schedulers are for, which reside in the OS. A game doesn't need any algorithm, it simply runs a module as a separate thread, a no-brainer. At the end of the frame the main loop of the game simply waits for all modules to have finished before continuing. Still, I can't understand why you guys think that the overhead might kill the advantage of an additional 2-3 GHz processing power. :p
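The "run each module as a thread, then wait at the end of the frame" loop could look roughly like this in Python. The `physics` and `ai` functions are placeholders, not taken from any real engine. Note also that CPython threads do not execute Python bytecode on several cores at once, so this shows only the structure of the loop, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def physics(frame):
    # Placeholder for the physics module's per-frame work.
    return ("physics", frame)


def ai(frame):
    # Placeholder for the AI module's per-frame work.
    return ("ai", frame)


def run_frames(n_frames):
    modules = [physics, ai]
    results = []
    with ThreadPoolExecutor(max_workers=len(modules)) as pool:
        for frame in range(n_frames):
            # Hand each module to the pool; the OS scheduler picks the CPUs.
            futures = [pool.submit(module, frame) for module in modules]
            # The main loop blocks here until every module has finished.
            wait(futures)
            results.append([f.result() for f in futures])
    return results
```

This only stays simple while the modules do not touch each other's data during the frame; anything they share still needs the locking discussed elsewhere in the thread.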
My main point is that _if_ a game is designed modular, it should be easy to multithread it. If the game design is messy, then obviously it could be a nightmare to implement.
Pannenkoek - Thursday, December 30, 2004 - link
Perfect example why multithreading can be tricky, and locking the data used is necessary: we both used the same comment to reply to, and we're out of sync. ;-)
Damn it, I was just happy I finished my long reply. My reply to you is coming Jarred. Don't post! :p
Pannenkoek - Thursday, December 30, 2004 - link
PrinceXizor, with all due respect, but you have no clue about multithreading it seems. :p
Neither am I a programmer, but I know about programming. ;)
1. If there are different processes, you are already multithreading. But for the rest you are right. However, the whole point of modules is that they don't share a lot of data and that is why it should be easy.
Accounting for the overlap comes down to, apart from creating and killing threads, using a multithreading API such as POSIX Threads.
2. This mysterious entity is the kernel of the operating system. Multithreading is one of the purposes of an OS, and you don't have to worry about the extra latency in your case. Especially not if different threads can run on different processors. If you are running different programs simultaneously, the OS is giving the CPU continually to a different program for a short time. Haven't seen you fret over that. ;)
3. The compilation of source code into binary is not relevant for multithreading as everything which even comes close to hardware communication is far outside of your program. Programs are oblivious to the number of processors or cores, until they ask the OS. Compilation is not affected by multithreading.
4. The reason why Hyperthreading hardly makes a difference is due to hardware, not software. A Hyperthreading processor is only virtually a dual processor system, and not dual core. Running two processes simultaneously is only partially supported by the hardware, and more a matter of utilizing unused transistors by trying to run an extra process.
It depends on the situation whether you get any significant gain at all, and was never supposed to be a 100% increase. With dual core you really get twice the processing power, but not the memory bandwidth. (Of course, games can be video bound. ;)
I think you are also confusing this a bit with SIMD (Single Instruction, Multiple Data) and SSE (Streaming SIMD Extensions), which are newer Intel CPU instructions, judging by your point 3.
Multithreading overhead can be neglected in general, but it depends on how much data needs to be synchronized between different threads and how often. In a game it could be done mostly just once per frame, and if it has a modular design (which it should, really) hardly any data needs to be protected. If output of one module is used as the input of another module (from physics to graphics engine e.g.), one frame delay makes it possible to put those modules on different cores.
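The one-frame-delay idea can be sketched as a two-stage pipeline. This Python example is an illustration only: the stage functions and the fake world state are invented, and a bounded queue of size one stands in for whatever hand-off buffer a real engine would use.

```python
import queue
import threading


def physics_stage(n_frames, out_q):
    for frame in range(n_frames):
        state = {"frame": frame, "x": frame * 2}  # made-up world state
        out_q.put(state)  # hand off a finished frame, then start the next one
    out_q.put(None)       # sentinel: no more frames


def render_stage(in_q, rendered):
    while True:
        state = in_q.get()
        if state is None:
            break
        # Rendering works on a completed frame while physics computes the next.
        rendered.append(f"frame {state['frame']}: x={state['x']}")


def run_pipeline(n_frames):
    handoff = queue.Queue(maxsize=1)  # at most one finished frame waiting
    rendered = []
    producer = threading.Thread(target=physics_stage, args=(n_frames, handoff))
    consumer = threading.Thread(target=render_stage, args=(handoff, rendered))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return rendered
```

The only shared object is the queue, so the stages synchronize once per frame. The cost is that what appears on screen is always one frame behind the simulation.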
Of course, there is no guarantee that a game developer has sensible programmers, but I stand by my former comment.
JarredWalton - Thursday, December 30, 2004 - link
17 & 18: Actually, 18 is relatively accurate from my experience. The term for what you need to have in order to get multiple threads to interact properly is called a "semaphore", I believe. Basically, you need a gateway so that certain segments of code can *only* be accessed by *one* thread at a time; otherwise, you get out of synch.
Imagine the physics engine, which updates the locations of objects in the world. Let's say you put that in a thread. Then we have another thread handling player movement, one for input, maybe one for network communications, graphics rendering of course, artificial intelligence... there's a ton of things which sound like they *could* be moved into different threads, right?
Consider this, though: in order for the AI to react properly to a given situation, it has to know the current state of the world. You can't have the AI thread examining the world and trying to figure out what to do while the physics thread is in the process of updating the location of objects in the world. The physics and graphics threads will also overlap: you can't have the physics thread moving objects around while the graphics are in the process of rendering to the screen.
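As a rough illustration of the "gateway" idea, the Python sketch below guards a tiny world state with a semaphore that admits one thread at a time. The `World` class and its two coordinates are assumptions made up for the example; the point is only that the AI thread can never observe a half-finished physics update.

```python
import threading


class World:
    """Hypothetical world state guarded by a one-at-a-time gateway."""

    def __init__(self):
        self._gate = threading.Semaphore(1)  # binary semaphore: one thread inside
        self.x = 0
        self.y = 0

    def physics_step(self):
        with self._gate:
            self.x += 1
            self.y += 1  # x and y must always change together

    def snapshot(self):
        with self._gate:
            return (self.x, self.y)


def run(steps=1000):
    world = World()
    torn = []  # any half-updated states the AI thread manages to see

    def physics():
        for _ in range(steps):
            world.physics_step()

    def ai():
        for _ in range(steps):
            x, y = world.snapshot()
            if x != y:
                torn.append((x, y))

    threads = [threading.Thread(target=physics), threading.Thread(target=ai)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return world.snapshot(), torn
```

Whenever one thread is inside the gate the other simply waits, which is why heavy sharing between modules eats into the gain from a second processor.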
The type of application that usually benefits most from highly parallel designs is something where you have chunks of data to be processed that are *entirely* separate from each other - no overlap in shared state. With games, you could look at the graphics pipeline for lots of parallelism, but the graphics cards are already handling most of that now anyway. I don't think the physics calculations take nearly as much time as a lot of people assume (although that could be wrong).
Anyway, I believe the basic process of most 3D games these days goes something like this:
1) Analyze inputs and adjust variables as appropriate (i.e. a gun begins to fire, player begins to slow down/turn, etc.)
2) Run AI routines to determine how the AI characters are going to behave (similar to player inputs) and adjust variables as appropriate.
3) Run physics routines to update the state of all the objects in the world - position, angle, health, etc.
4) Render the current state of the world in the graphics engine.
A multi-threaded approach might do something like the following:
1) Synchronize AI and player input threads and have both update the global variables appropriately.
a) Certain variables are going to be accessed frequently by both threads, so put semaphore logic in place to keep them from writing/reading incorrect values.
2) Run physics threads that update the state of the world. Each thread can handle a portion of the objects so that the physics calculations are done faster. Ideally, you would be able to vary the number of physics threads from 1 to n, where n is the number of processor cores that the system has available.
a) Add additional logic to double-check areas where there is overlap - some objects are going to need to affect both threads.
3) Render the current state of the world to the graphics card, using multiple threads. Again, the ability to have 1 to n rendering threads would be ideal.
a) Hopefully, the graphics card drivers are capable of handling multi-threaded input efficiently!
b) You would also want some sort of optimization in the way the objects are sent to the graphics card.
That's a *VERY* rough description as to how game logic might be coded. You would need to analyze the game code thoroughly to make sure you focus on optimizing the right areas - if the physics and AI only takes 10% of the total CPU time, it's probably not worth wasting effort on this area!
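Step 2 of the multi-threaded outline (1 to n physics threads, each handling a portion of the objects) might be sketched like this in Python. Everything here is assumed for illustration: objects are simple (position, velocity) pairs, they never interact, and `split`, `step_chunk`, and `physics_step` are invented names. The overlap handling from step 2a is deliberately left out.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def step_chunk(chunk, dt):
    # Advance each (position, velocity) pair independently of the others.
    return [(pos + vel * dt, vel) for pos, vel in chunk]


def split(items, n):
    # Cut the list into n contiguous, nearly equal slices.
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def physics_step(objects, dt, workers=None):
    # Default to one worker per core the OS reports, but never more
    # workers than there are objects and never fewer than one.
    workers = workers or os.cpu_count() or 1
    workers = max(1, min(workers, len(objects) or 1))
    chunks = split(objects, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(step_chunk, chunks, [dt] * len(chunks)))
    # Stitch the per-thread results back together in the original order.
    return [obj for part in parts for obj in part]
```

For example, `physics_step([(0.0, 1.0), (1.0, 2.0), (2.0, -1.0)], 0.5, workers=2)` returns `[(0.5, 1.0), (2.0, 2.0), (1.5, -1.0)]`. In CPython this shows only the partitioning; the threads would not actually run the arithmetic on several cores at once.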
The algorithms that divide up the work between threads need to also be efficient enough that they don't end up wiping out any gains. That's a real key point. Imagine it takes one thread 10 milliseconds to render all of the current world state, and that if we can divide up the work into two threads each thread can get everything done in 5 milliseconds. If the task that divides the work (and the synchronization issues) takes 5 milliseconds on its own, then you break even and you've just spent a lot of effort for no performance gain.
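The arithmetic behind that example can be written down as a tiny model. It assumes the work splits perfectly evenly and that dividing and synchronizing adds one fixed cost per frame, which is a simplification; the function names are made up.

```python
def parallel_time(single_ms, threads, overhead_ms):
    # Ideal even split plus a fixed cost for dividing work and synchronizing.
    return single_ms / threads + overhead_ms


def break_even_overhead(single_ms, threads):
    # Overhead at which the threaded version is no faster than one thread.
    return single_ms - single_ms / threads
```

With a 10 ms single-threaded frame and two threads, `break_even_overhead(10, 2)` is 5.0 ms. An overhead of 4 ms gives `parallel_time(10, 2, 4)` of 9.0 ms, a gain of only 1 ms per frame.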
I believe that John Carmack encountered some of these issues on Quake 3. It had some alpha/beta level SMP support, but I never did hear of an instance where the SMP-enabled version ran substantially faster than the non-SMP version. I think he ended up halting work on the feature because it just didn't seem to be worthwhile.
Now, my disclaimer: I *am* a programmer, but I haven't done any serious game programming work. I've also been doing less programming in the past two years as I've moved on to different work. My thoughts on some of this might be wrong, but logic seems to be on my side. If writing multi-threaded software is so "easy" (as some people seem to claim), then why is it that the vast majority of software is single-threaded? Even the best multi-threaded applications often end up with a 25 to 50% speed improvement over a single CPU, and finding truly independent tasks in a lot of applications - particularly games - can be rather difficult. Maybe it's just that the programmers haven't been trained/taught to look for such opportunities? We can only hope that's the case....
PrinceXizor - Thursday, December 30, 2004 - link
I'll preface this by stating that I am not a programmer.
However, I would think that multi-threaded applications are not nearly so simple, because of a few reasons.
1. Even modular components are not totally modular. Data, variables, processes are shared to varying degrees so this overlap has to be accounted for.
2. Something has to know what is going on. This knowledgeable entity most likely introduces latencies into the system. These latencies could overshadow the initial performance gain from "ported" single threaded code.
3. Translation. Please see the initial disclaimer especially for this item. It seems to me that we need to keep in mind that the code that is written is not what is run. Compiled code obviously differs from written code and the compilers would have to take into account a multi-thread environment as well. We currently rely on compilers to heavily optimize compiled code (just think of the arguments over which compiler to use for an "apples-to-apples" comparison of Linux application speed to Windows application speed), so it certainly makes a difference. And, if the compilers are doing a poor job of optimizing for multi-threaded applications, then spending the extra time to program in written code for it would seem a waste of time to many companies.
4. Hyperthreading. While not exactly dual core, the simple fact that even now, hyperthreading does not accelerate a vast majority of computer tasks seems to indicate that the programming intricacies are not so simple. Only the most highly parallel operations see great performance gains from hyperthreading, which seems (to me anyway) to lend credence to point number three.
Of course, this is all the opinion of a non-programmer :)
You may fire at will!
P-X
Pannenkoek - Thursday, December 30, 2004 - link
Jarred, while I agree with you for arbitrary programs, we're talking about games here, which have large orthogonal, independent components. Every programmer can dig himself into a hole too deep to easily climb out of by bad design and code, but assuming that a game is somehow modular it should be relatively trivial, as I said, to kick one CPU intensive part of the engine onto a separate thread. There is no paradigm shift required, unless the programmer hasn't made the switch yet from spaghetti code to modular design.
Game developers are spending insane efforts in optimizing their engine for all kinds of different hardware to squeeze out every drop of extra performance. Now, don't tell me that if AMD released a dual core processor next week Valve and Id wouldn't be flogging their programmers to make their game run 100% faster.
BARK - Wednesday, December 29, 2004 - link
Is there any hope for socket 940 FX users? I thought AMD would support us more than this!
2.2 to 2.4, that's not much of a bump.
JarredWalton - Wednesday, December 29, 2004 - link
14 - The whole problem with getting games to utilize more than one processor is the same as getting *any* application to make use of more than one processor: you need to rework some fundamental aspects of the program. While it sounds trivial at a high level, getting multiple threads to work together effectively and efficiently is actually a rather tricky problem. It's not impossible, but it is more difficult than writing single-threaded code.
What's needed is a paradigm shift (ugh - I hate terms like that, even when they're correctly used); the programmers need to step back and modify some of their core development processes. That's often a painful experience, and I think most software developers right now are looking at the problem and saying that there's no real benefit to changing the code yet. Once dual-core and multi-core setups become common, then they will *have* to change, but right now fewer than 1% of gamers (or computer users) have SMP setups, and only something like 40% have HyperThreading.
Pannenkoek - Tuesday, December 28, 2004 - link
I don't understand why AMD thinks that games wouldn't benefit from dual core. It should be relatively trivial to implement threading into a game with its independent CPU intensive components: physics engine, AI and graphics, as there are still games which utilize the CPU a lot for rendering.
Old games will run fine even on one core, while new games will be patched. Action and reaction, stimulus and response. The question is, who does the pushing?
miketheidiot - Tuesday, December 21, 2004 - link
#11 Moore's law is about the # of transistors, not performance. With dual core it should not be too much of a difficulty to keep up with that.
coldpower27 - Tuesday, December 21, 2004 - link
I have a theory on what the Semprons are likely to be.
Sempron 3400+ Palermo 2.0GHZ/DC/256KB S939
Sempron 3200+ Palermo 1.8GHZ/DC/256KB S939
Sempron 3000+ Palermo 1.6GHZ/DC/256KB S939
Sempron 3400+ Palermo 2.2GHZ/SC/128KB S754
Sempron 3300+ Palermo 2.0GHZ/SC/256KB S754
Sempron 3200+ Palermo 2.0GHZ/SC/128KB S754
Sempron 3100+ Palermo 1.8GHZ/SC/256KB S754
Sempron 3000+ Palermo 1.8GHZ/SC/128KB S754
Sempron 2800+ Palermo 1.6GHZ/SC/256KB S754
Sempron 2600+ Palermo 1.6GHZ/SC/128KB S754
I don't believe in having any K8 processors below 1.6GHZ for the Sempron; performance would just be quite low.
The S754 data has been extrapolated from the fact that we already have the Sempron K8 Mobile 2600+, 2800+ and 3000+.
KHysiek - Tuesday, December 21, 2004 - link
I see Moore's law is pretty much dead (this year, and next year is even worse).
SUOrangeman - Monday, December 20, 2004 - link
FWIW, The Inq was reporting a mid-February release of the Opteron x52, causing a downward shift in the pricing structure (i.e., x52 will take x50 price slot, x50 will fall to x48 prices, and so on). Furthermore, XbitLabs was reporting that 90nm Opterons x48 are shipping. These tidbits are interesting to me because I hope to build a system based on the Iwill zMAXdp in the very near future. :)
-SUO
JarredWalton - Monday, December 20, 2004 - link
We have received some conflicting reports on the x52 model Opterons. Some places are claiming as early as a January launch, others are claiming February. We have received word from some server manufacturers, however, stating that they don't think the x52 will even be launched at all and that AMD will wait for dual-core Opterons instead. (For all the conflicting reports, the x52 could even *be* dual-core when it launches.)
While we certainly don't mind doing a little speculation, we're not willing to stake our reputation on a launch date for which we have received no *official* word. If we do see x52 Opterons, Q1'05 would be the best time frame. It could also just be an "interesting" test of 90 nm SOI with strained silicon before they finalize that process for the dual-core parts. Who knows for sure? Regardless, the dual-cores will be coming a few months later, so anyone actually looking for added processing power on an SMP server would be well advised to wait until then.
Da DvD - Monday, December 20, 2004 - link
I noticed the new Opterons x52 (2.6GHz) are missing... they're set for a January 14th release, I think. Also, an s754 3700+ part would surprise me... and I also think AMD will put out a 2.8GHz CPU in the first half of 2005.
JarredWalton - Monday, December 20, 2004 - link
#2 - Fark. Right you are, Peter. Someone corrected me on that last time, and somehow I still managed to get the wrong data into my spreadsheet. Strange, considering that I copied and pasted the data from the last roadmap I thought... must have used a wrong spreadsheet or something.
Anyway, as far as the roadmap being "boring", blame that on the market. Intel's roadmap is really just as bad. Possibly it's a matter of them waiting until 2005 before they really start making any big announcements, but things have been very quiet the last few months.
ChineseDemocracyGNR - Sunday, December 19, 2004 - link
I'm running a Mobile too, and I don't think it's worth upgrading to anything less than an A64 with a 10x multiplier. But that's just me. :)
The Sempron 2800+ is $109, the 2600+ is $85.
Falloutboy - Sunday, December 19, 2004 - link
True, the multipliers are low, but these things should cost around 50-60 bucks and should be able to go to 2-2.2GHz without getting the chipset too out of whack. Would be good for people like me who upgrade every 6 months or so to the hot CHEAP overclocking chip. I'm currently on a 2.4GHz 2400 mobile and would welcome an upgrade to an A64-based chip that is close to the clock I'm at now.
ChineseDemocracyGNR - Sunday, December 19, 2004 - link
HKEPC reported that the socket 754 Semprons 2800+ and 2600+ are clocked at 1.6 and 1.4GHz.
I'm not very excited about overclocking them, with multipliers as low as 8 and 7. The Celeron D is actually much easier to overclock (change FSB 133MHz -> 200MHz and you're set).
Falloutboy - Sunday, December 19, 2004 - link
I'm looking forward to some lower end Semprons for 754; they should make for a good cheap chip for overclocking.
Peter - Sunday, December 19, 2004 - link
Socket-A Sempron-3000 is a 2.0 GHz 512K L2 cache "Barton" part. See also: AMD's datasheet on the matter.
Live - Sunday, December 19, 2004 - link
What a boring roadmap. Hopefully the new stepping of the 90 nm will be good enough. Then again if it were, wouldn't AMD release higher clocked CPUs? Rumors say up to 25% more headroom. If that is true it doesn't really matter though if they release higher speed grades, as long as you can overclock them yourself.