QUOTE(Whadda I Do Whadda I Do @ May 15 2007, 10:23 AM)
mmoy: What you are saying is: Intel took 3 years to produce a core (with shrinks and extra-large caches) that is 20% faster than AMD's, and now they have to follow AMD's lead again and produce a chip with on-die memory controllers just to stay current and get rid of that 20-watt front-side bus. SSE is a software tweak to make up for a lack of power, and you have to write for it if Intel tells you to. AMD's enhancements are actually easier to implement.
Intel relies on NVidia for decent graphics chips, and I wonder why Intel decided to make drop-in replacement CPUs.
It's hard to explain AMD's share price movement, but we will find out soon enough. Intel's shares are being rewarded for process, but they really haven't shown anything else, seeing as they can't break out.
I suspect that Nehalem will put them far ahead of where they are now.
Intel grew fat and lazy, which happens at big companies. IMCs (integrated memory controllers) have their downsides too: you get improved performance at the expense of flexibility.
Your comment about SSE indicates that you don't know what it is. I do SSE programming and perhaps have a better perspective. SSE is a pretty complex topic, but I'll just state this: AMD's improvements in FP processing are only realized through SSE.
Improving IPC with scalar instructions is hard. That's why Core 2 Duo's microarchitectural improvements were very, very impressive. I expect AMD copied some of those microarchitectural improvements from Core 2 Duo. Something like macro-op fusion is such a clever idea that it's surprising nobody came up with it sooner. SSE implements vector operations (by the way, there's a reason my website is called vector64.com), where you can operate on one, two, four, eight, or sixteen objects at the same time (for example, two doubles, four floats, or sixteen bytes packed into one 128-bit register). This is where all the new x86 instructions are headed, because x86 is such a confused jungle of instructions that adding to it is difficult. Multimedia and other compute-intensive algorithms can generally be written as parallel operations, and if you can execute those efficiently with parallel or vector instructions, you will improve throughput and IPC.
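To make the "operate on four objects at once" idea concrete, here is a minimal sketch in C. It assumes GCC or Clang, whose vector extension (`__attribute__((vector_size(16)))`) lets you treat four floats as one 128-bit value; on x86-64 the packed add typically compiles to a single SSE `addps`. The names `add_scalar` and `add_packed` are made up for illustration, and the packed version assumes the length is a multiple of 4 to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Four packed single-precision floats = one 128-bit SSE register's worth.
   GCC/Clang vector extension (assumption: not standard C). */
typedef float v4sf __attribute__((vector_size(16)));

/* Scalar version: one addition per loop iteration. */
void add_scalar(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Packed version: four additions per loop iteration.
   Hypothetical helper; assumes n is a multiple of 4. */
void add_packed(const float *a, const float *b, float *out, size_t n) {
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        v4sf va, vb, vr;
        memcpy(&va, a + i, sizeof va);   /* load without assuming 16-byte alignment */
        memcpy(&vb, b + i, sizeof vb);
        vr = va + vb;                    /* one packed add across all four lanes */
        memcpy(out + i, &vr, sizeof vr);
    }
}
```

Both functions produce the same results; the packed one just does a quarter as many add operations to get there.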
AMD's original K8 architecture processed each 128-bit vector instruction as two 64-bit halves, which was how they provided SSE compatibility with Intel chips. This was a satisfactory answer because Netburst was a relatively high-frequency, high-latency machine. The K8 latencies are better than Netburst's, but that's not really something to brag about.
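A rough way to picture the two-halves approach: the same 128-bit add done once in a single pass and once as two 64-bit passes (low pair, then high pair). This is only a conceptual model of the data flow, not of K8's actual micro-ops or timings. The results are identical; on real hardware the difference is that a 64-bit-wide unit needs two passes to produce them. It again assumes GCC/Clang vector extensions, and `add128_full`/`add128_split` are hypothetical names.

```c
#include <string.h>

typedef float v2sf __attribute__((vector_size(8)));   /* 64 bits: two floats   */
typedef float v4sf __attribute__((vector_size(16)));  /* 128 bits: four floats */

/* One 128-bit packed add: all four lanes in a single operation. */
void add128_full(const float a[4], const float b[4], float out[4]) {
    v4sf va, vb, vr;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    vr = va + vb;
    memcpy(out, &vr, sizeof vr);
}

/* The same 128-bit add issued as two 64-bit halves.
   Conceptual model only: same answer, but two operations instead of one. */
void add128_split(const float a[4], const float b[4], float out[4]) {
    for (int half = 0; half < 2; half++) {
        v2sf va, vb, vr;
        memcpy(&va, a + 2 * half, sizeof va);
        memcpy(&vb, b + 2 * half, sizeof vb);
        vr = va + vb;
        memcpy(out + 2 * half, &vr, sizeof vr);
    }
}
```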
Intel made major improvements to SSE processing in Core 2 Duo. They generally reduced latencies on SIMD instructions while adding several new instructions. These SIMD improvements are why Core 2 Duo generally clobbers AMD's K8 architecture in heavy multimedia tasks. AMD implements the instruction set, but on an older, outdated architecture. Intel also improved FP processing in Core 2 Duo, but the improvement only shows up if you use SSE instructions. AMD followed suit.
Architecturally, it was the only economical way to get such a big increase in FP performance. If you had to do it with general-purpose FP registers, you'd have to deal with the combinatorial explosion of possibilities across all of the registers. With SIMD, the FP objects are packed together, so you know where they are and how you have to deal with them.
Apple, interestingly enough, has made the most use of SIMD instructions. They used PowerPC chips, which had good FP performance but lousy integer performance compared to Intel and AMD, so they were always looking for an edge. The PowerPC line has a SIMD instruction set called AltiVec, which was a richer instruction set than the SSE extensions found on x86 chips. Intel's latest SSE4 instructions mostly look like implementations of things that have been in AltiVec for quite some time. They are useful instructions for improving IPC, but the programmer or compiler has to take advantage of them. Microsoft's compiler is autovectorizing at SSE2, and they are starting to use SSE code in their object libraries. On their 64-bit x86 platforms, they assume SSE2 and it's part of the OS. The coding model is quite different: the x87 FP registers are largely dropped in favor of the SIMD registers, which use a flat register model instead of x87's stack model.
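As for what "the programmer or compiler has to take advantage of them" means in practice, here is a hedged C sketch of the kind of loop an autovectorizing compiler can turn into packed SSE code, next to one it generally can't vectorize in the straightforward way. Whether vectorization actually happens depends on the compiler, its version, and the flags used; the function names are illustrative only.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i]  ("saxpy").
   Every iteration is independent, so a compiler may process several
   elements per instruction using packed SIMD registers. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Running sum: each element depends on the previous result, so the loop
   can't simply be cut into independent 4-wide chunks. */
void prefix_sum(size_t n, float *v) {
    for (size_t i = 1; i < n; i++)
        v[i] += v[i - 1];
}
```

The `restrict` qualifiers (C99) promise the compiler that `x` and `y` don't overlap, which removes one common obstacle to vectorizing the first loop.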
Apple is generally better with multithreaded software too. When you never had that much to work with, you work harder. Apple has always provided a better development environment and better tools for SIMD work than what you see on the Windows and Linux side. That their tools are free is a bonus.
Intel's compilers are generally more up to date and take better advantage of the new vector features in the latest processors.