
---
created_at: '2015-11-16T08:46:47.000Z'
title: On FPGAs as PC Coprocessors (1996)
url: http://fpgacpu.org/usenet/fpgas_as_pc_coprocessors.html
author: luu
points: 73
story_text: 
comment_text: 
num_comments: 32
story_id: 
story_title: 
story_url: 
parent_id: 
created_at_i: 1447663607
_tags:
- story
- author_luu
- story_10573313
objectID: '10573313'

---
[Source](http://fpgacpu.org/usenet/fpgas_as_pc_coprocessors.html "Permalink to fpgacpu.org - On FPGAs as PC Coprocessors")

# fpgacpu.org - On FPGAs as PC Coprocessors


    
    Newsgroups: comp.arch.fpga
    Subject: On FPGAs as PC coprocessors
    Date: 6 May 1996 22:12:11 GMT
    
    One of the "way out speculation" questions asked at FCCM 96 (IEEE 
    Symposium on FPGAs for Custom Computing Machines) was "when, if ever, 
    will an FPGA coprocessor ship on every PC motherboard?"
    
    Ignoring the daunting language and interface standards issues, and just 
    looking at current hardware approaches to FPGA "coprocessors", the 
    answer must be "not any time soon".
    
    Consider that today's power user's CPU (such as a 200 MHz Pentium Pro 
    or a 400 MHz Alpha 21164A) *running out of L1 cache* issues, at peak, 
    three or four instructions per 2.5-5 ns clock.  Also consider that the 
    current "low latency high bandwidth" approach to FPGA coprocessor 
    integration is to hang the FPGA on the PCI bus.
    
    In this scenario, the kinds of quasi-general purpose computing problems 
    that an FPGA coprocessor can usefully assist with are quite limited.  
    Issuing a write and then a read-back operation to the FPGA could easily 
    take 10 PCI bus cycles (300 ns), assuming no PCI bus contention.  In 
    that time, assuming hand-crafted code (still less effort than an FCCM 
    design), a Pentium Pro could issue as many as 180 instructions and the 
    Alpha as many as 480 64-bit instructions.  Future versions of such 
    processors will soon be doing VIS- or MMX-like limited bytewise 8+-way 
    SIMD parallelism, and eventually superscalar versions of the same.  
    Such designs might issue between 500 (*) and 4000 (**) hand-coded 
    packed byte operations while that single 300 ns FPGA write/read is 
    still in progress.
    
    
    ((At peak speeds like 400e6 clock/s * 4 issue/clock * 8 byte ops/issue, 
    i.e. 12.8e9 byte ops/s, you have to agree that superscalar micros 
    with bytewise SIMD are going to displace many FPGA applications on PC 
    and workstation platforms.))
    
    
    So as long as FPGAs are attached to relatively glacially slow I/O buses 
    -- including 32-bit 33 MHz PCI -- it seems unlikely they will be of 
    much use in general purpose PC processor acceleration.  Sure, for 
    applications such as cryptography, image and signal processing, they 
    might be a win (***), given a semi-autonomous problem which either fits 
    in the FPGA and local storage, or which can employ DMA to stream data 
    into or through the FPGA without much CPU intervention or management.
    
    Of course, the PCI ASIC crowd has the same latency problems, but they 
    don't share the FCCM aspiration of accelerating general purpose 
    computing; rather, they focus on the special purpose applications 
    mentioned above.
    
    
    Five times better latency and four times better bandwidth could be 
    achieved if FPGA vendors were to invent a way to connect their parts 
    directly to the Pentium Pro external bus, as a peer of the memory/bus 
    controller.  A custom, dedicated Pentium Pro interface would probably 
    be required, since FPGA configurable logic would be too slow and 
    electrically incompatible.
    
    This could be a good volume business, and not quite the moving target 
    it might appear -- I expect the PPro external bus to be just as 
    ubiquitous and as long-lived as the 486 and Pentium buses have been.  
    Someone could make a plug-in card which sits in the PPro ZIF socket and 
    hosts a PPro and its FPGA(s).
    
    Alternatively, the FPGA coprocessor could be attached to the new 
    advanced graphics memory port, or whatever it is to be called, that 
    will be available in future Intel memory/PCI controller chipsets.
    
    One might argue that Xilinx made a big mistake in not offering a 
    version of the XC6200 with a dedicated 66 MHz Pentium external bus 
    interface -- after all, it is by far the most popular and most supported 
    processor interface for the most lucrative general computing market.
    
    If any vendor does pursue this idea, I would appreciate a couple of 
    sample parts. :-)
    
    --
    (*) 500 ops in 300 ns: forthcoming 200 MHz PPro with MMX: 60 clocks x 1 
    8-byte MMX insn/clock
    
    (**) 4000 ops in 300 ns: hypothetical 400 MHz Alpha with each integer 
    unit enhanced for bytewise SIMD: 120 clocks x 4 8-byte insns/clock
    
    (***) "win": much cheaper/faster than simply adding a second processor
    --
    
    Acknowledgements: this posting is a spin-off of a discussion with Mark 
    Shand, and the "way out speculative" question was suggested by Mike 
    Butts.
    
    Jan Gray
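
The posting's instruction counts all come from one back-of-envelope product: CPU clock rate × PCI round-trip time × instructions issued per clock, multiplied by 8 for the packed-byte SIMD cases. The Python sketch below is an illustration and is not part of the original post. The helper names are hypothetical. It recomputes those figures, taking the post's rounded 300 ns for a 10-cycle PCI write/read-back and counting each byte lane of a SIMD instruction as one operation.

```python
"""Back-of-envelope check of the issue-slot figures in the posting.

Assumption: the PCI write + read-back round trip is taken as 300 ns, the
post's rounded figure for 10 cycles of a 33 MHz PCI bus.
"""

PCI_ROUND_TRIP_NS = 300  # 10 PCI cycles at ~33 MHz, rounded as in the post


def clocks_in(ns: float, clock_mhz: float) -> int:
    """Number of CPU clocks that elapse during `ns` nanoseconds."""
    return round(ns * clock_mhz / 1000)


def ops_during_round_trip(clock_mhz: float, issue_per_clock: int,
                          bytes_per_insn: int = 1) -> int:
    """Peak operations issued while one PCI round trip is outstanding.

    bytes_per_insn > 1 models packed-byte SIMD (e.g. 8 bytes per MMX insn),
    counting each byte lane as one operation.
    """
    clocks = clocks_in(PCI_ROUND_TRIP_NS, clock_mhz)
    return clocks * issue_per_clock * bytes_per_insn


# (description, clock in MHz, instructions issued per clock, byte lanes per insn)
CASES = [
    ("200 MHz Pentium Pro, 3-issue", 200, 3, 1),
    ("400 MHz Alpha 21164A, 4-issue", 400, 4, 1),
    ("(*) 200 MHz PPro + MMX, 1 MMX insn/clock", 200, 1, 8),
    ("(**) hypothetical 400 MHz Alpha, 4 SIMD insns/clock", 400, 4, 8),
]

if __name__ == "__main__":
    for name, mhz, issue, lanes in CASES:
        print(f"{name}: {ops_during_round_trip(mhz, issue, lanes)}")

    # Sustained peak rate from the post's parenthetical aside:
    # 400e6 clocks/s * 4 issues/clock * 8 byte ops/issue
    print(f"peak byte ops/s: {400e6 * 4 * 8:.3g}")
```

When run as a script, it gives 180 and 480 for the scalar Pentium Pro and Alpha cases, matching the post. The two SIMD cases come out at 480 and 3840, which the post rounds to 500 (*) and 4000 (**). The last line gives 1.28e+10 byte ops/s for the 400 MHz, 4-issue, 8-lane peak.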
    
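
The post's "four times better bandwidth" figure for a direct Pentium Pro bus connection can be sanity-checked with a back-of-envelope calculation like the post's own. The sketch below is hypothetical and makes two assumptions that the post does not state: a 32-bit PCI bus at 33.33 MHz and a 64-bit Pentium Pro external data bus at 66.67 MHz. It compares theoretical peak burst bandwidth only, ignoring address phases, arbitration and wait states. It does not check the claimed fivefold latency improvement, which depends on bus protocol details.

```python
"""Peak-bandwidth comparison behind the 'four times better bandwidth' remark.

Assumptions (not from the posting): 32-bit/33.33 MHz PCI vs. a 64-bit/66.67 MHz
Pentium Pro external data bus, both counted as theoretical peak burst
bandwidth, i.e. one data transfer per clock with no address or turnaround cycles.
"""

from dataclasses import dataclass


@dataclass(frozen=True)
class Bus:
    name: str
    width_bits: int
    clock_mhz: float

    def peak_mb_per_s(self) -> float:
        """Peak bandwidth in MB/s (10**6 bytes/s), one transfer per clock."""
        return self.width_bits / 8 * self.clock_mhz


PCI = Bus("PCI 32-bit/33 MHz", 32, 100 / 3)
PPRO = Bus("Pentium Pro external bus 64-bit/66 MHz", 64, 200 / 3)

if __name__ == "__main__":
    for bus in (PCI, PPRO):
        print(f"{bus.name}: {bus.peak_mb_per_s():.0f} MB/s")
    print(f"ratio: {PPRO.peak_mb_per_s() / PCI.peak_mb_per_s():.1f}x")
```

Under these assumptions, it reports about 133 MB/s for PCI and 533 MB/s for the Pentium Pro bus, a ratio of 4.0. This is consistent with the post's bandwidth claim.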

Copyright © 2000, Gray Research LLC. All rights reserved.  
Last updated: Feb 03 2001  

[1]: http://fpgacpu.org/index.html
[2]: http://fpgacpu.org/re.html
[3]: http://fpgacpu.org/emulating_fpgas.html
[4]: http://fpgacpu.org/bydate.html
[5]: http://fpgacpu.org/why.html
[6]: http://fpgacpu.org/homebrew.html
[7]: http://fpgacpu.org/a-x-announce.html
[8]: http://fpgacpu.org/soft.html
[9]: http://fpgacpu.org/lcc.html
[10]: http://fpgacpu.org/32bits.html
[11]: http://fpgacpu.org/superscalar.html
[12]: http://fpgacpu.org/javaproc.html
[13]: http://fpgacpu.org/forth.html
[14]: http://fpgacpu.org/alto.html
[15]: http://fpgacpu.org/transputer.html
[16]: http://fpgacpu.org/speed.html
[17]: http://fpgacpu.org/synth_cpu.html
[18]: http://fpgacpu.org/regfile.html
[19]: http://fpgacpu.org/regfile2.html
[20]: http://fpgacpu.org/fp.html
[21]: http://fpgacpu.org/bb.html
[22]: http://fpgacpu.org/altera_cpus.html
[23]: http://fpgacpu.org/altera_cpus_dual_port_EABs.html
[24]: http://fpgacpu.org/multiprocessors.html
[25]: http://fpgacpu.org/multis_and_fast_fpgacpus.html
[26]: http://fpgacpu.org/innerloop.html
[27]: http://fpgacpu.org/array.html
[28]: http://fpgacpu.org/buses.html
[29]: http://fpgacpu.org/memory.html
[30]: http://fpgacpu.org/vga.html
[31]: http://fpgacpu.org/12mm2.html
[32]: http://fpgacpu.org/cnets.html
[33]: http://fpgacpu.org/cnets_datapath.html
[34]: http://fpgacpu.org/generators.html
[35]: http://fpgacpu.org/cpus_vs_fpgas.html
[36]: http://fpgacpu.org/life.html
[37]: http://fpgacpu.org/max.html
[38]: http://fpgacpu.org/floorplanning.html
[39]: http://fpgacpu.org/rope_pushing.html
[40]: http://fpgacpu.org/virtex_spec.html
[41]: http://fpgacpu.org/rambus.html
[42]: http://fpgacpu.org/render.html
[43]: http://fpgacpu.org/lfsrs.html
[44]: http://www.google.com