---
created_at: '2015-11-16T08:46:47.000Z'
title: On FPGAs as PC Coprocessors (1996)
url: http://fpgacpu.org/usenet/fpgas_as_pc_coprocessors.html
author: luu
points: 73
story_text:
comment_text:
num_comments: 32
story_id:
story_title:
story_url:
parent_id:
created_at_i: 1447663607
_tags:
- story
- author_luu
- story_10573313
objectID: '10573313'
---

[Source](http://fpgacpu.org/usenet/fpgas_as_pc_coprocessors.html "Permalink to fpgacpu.org - On FPGAs as PC Coprocessors")

# fpgacpu.org - On FPGAs as PC Coprocessors

Newsgroups: comp.arch.fpga
Subject: On FPGAs as PC coprocessors
Date: 6 May 1996 22:12:11 GMT

One of the "way out speculation" questions asked at FCCM 96 (IEEE Symposium on FPGAs for Custom Computing Machines) was "when, if ever, will an FPGA coprocessor ship on every PC motherboard?"

Ignoring the daunting language and interface standards issues, and just looking at current hardware approaches to FPGA "coprocessors", the answer must be "not any time soon".

Consider that today's power user's CPU (such as a 200 MHz Pentium Pro or a 400 MHz Alpha 21164A) *running out of L1 cache* issues a peak of three or four instructions per 2.5-5 ns clock. Also consider that the current "low latency high bandwidth" approach to FPGA coprocessor integration is to hang the FPGA on the PCI bus.

In this scenario, the kinds of quasi-general purpose computing problems that an FPGA coprocessor can usefully assist with are quite limited. Issuing a write and then a read-back operation to the FPGA could easily take 10 PCI bus cycles (300 ns), assuming no PCI bus contention. In that time, assuming hand-crafted code (less effort than an FCCM), a Pentium Pro could issue as many as 180 instructions; the Alpha, 480 64-bit instructions.

Future versions of such processors will soon be doing VIS- or MMX-like limited bytewise 8+-way SIMD parallelism, and eventually superscalar versions of same. Such designs might issue between 500 (*) and 4000 (**) hand-coded packed byte operations while that single 300 ns FPGA write/read is still in progress.

((At peak speeds like 400e6 clock/s * 4 issue/clock * 8 byte ops/issue, i.e. roughly
12e9 byte ops/s, you have to agree that superscalar micros with bytewise SIMD are going to displace many FPGA applications on PC and workstation platforms.))

So as long as FPGAs are attached to glacially slow I/O buses -- including 32-bit 33 MHz PCI -- it seems unlikely they will be of much use in general purpose PC processor acceleration.

Sure, for applications such as cryptography, image and signal processing, they might be a win (***), given a semi-autonomous problem which either fits in the FPGA and local storage, or which can employ DMA to stream data into or through the FPGA without much CPU intervention or management.

Of course, the PCI ASIC crowd has the same latency problems, but they don't share FCCM aspirations of accelerating general purpose computing; rather, they focus on the same aforementioned special purpose applications.

Five times better latency and four times better bandwidth could be achieved if FPGA vendors invent a way to directly connect their parts to the Pentium Pro external bus, as a peer of the memory/bus controller. A custom, dedicated Pentium Pro interface would probably be required, since FPGA configurable logic would be too slow and electrically incompatible. This could be a good volume business, and not quite the moving target it might appear -- I expect the PPro external bus to be just as ubiquitous and as long-lived as the 486 and Pentium buses have been. Someone could make a plug-in card which sits in the PPro ZIF socket and which hosts a PPro and its FPGA(s).

Alternately, the FPGA coprocessor could be attached on the new advanced graphics memory port, or whatever it is to be called, that will be available in future Intel memory/PCI controller chipsets.

One might argue that Xilinx made a big mistake in not offering a version of the XC6200 with a dedicated 66 MHz Pentium external bus interface -- after all, it is by far the most popular and most supported processor interface for the most lucrative general computing market.
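
The issue-count estimates quoted earlier in this post follow from simple cycle arithmetic. A minimal Python sketch, using only the post's own 1996 figures (the function names and dictionary labels are illustrative and not part of the original posting):

```python
# Back-of-the-envelope check of the post's numbers. The constants are the
# post's estimates; the names below are hypothetical, chosen for illustration.

PCI_CYCLE_NS = 30        # one 33 MHz PCI clock, rounded to 30 ns as in the post
ROUND_TRIP_CYCLES = 10   # write plus read back, assuming no bus contention
LATENCY_NS = PCI_CYCLE_NS * ROUND_TRIP_CYCLES  # 300 ns


def cpu_clocks(cpu_mhz, latency_ns=LATENCY_NS):
    """CPU clock cycles that elapse during one FPGA round trip."""
    return cpu_mhz * latency_ns // 1000


def ops_during_round_trip(cpu_mhz, insns_per_clock, ops_per_insn=1,
                          latency_ns=LATENCY_NS):
    """Peak operations a CPU could issue while the round trip is in flight."""
    return cpu_clocks(cpu_mhz, latency_ns) * insns_per_clock * ops_per_insn


estimates = {
    "200 MHz Pentium Pro, 3 insns/clock": ops_during_round_trip(200, 3),
    "400 MHz Alpha 21164A, 4 insns/clock": ops_during_round_trip(400, 4),
    "200 MHz PPro with MMX, 1 x 8-byte insn/clock": ops_during_round_trip(200, 1, 8),
    "400 MHz Alpha, 4 x 8-byte SIMD insns/clock": ops_during_round_trip(400, 4, 8),
}

for label, ops in estimates.items():
    print(f"{label}: {ops} ops in {LATENCY_NS} ns")
```

The two scalar cases reproduce the 180 and 480 instruction figures. The two SIMD cases come out at 480 and 3840 packed byte operations, which the post rounds to 500 and 4000.
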

If any vendor does pursue this idea, I would appreciate a couple of sample parts. :-)

--

(*) 500 op in 300 ns: forthcoming 200 MHz PPro with MMX: 60 clocks x 1 8-byte MMX insn/clock

(**) 4000 op in 300 ns: hypothetical 400 MHz Alpha with each integer unit enhanced for bytewise SIMD: 120 clocks x 4 8-byte insns/clock

(***) "win": much cheaper/faster than simply adding a second processor

--

Acknowledgements: this posting is a spin-off of a discussion with Mark Shand, and the "way out speculative" question was suggested by Mike Butts.

Jan Gray

Copyright © 2000, Gray Research LLC. All rights reserved. Last updated: Feb 03 2001