hn-classics/_stories/1996/10573313.md

---
created_at: '2015-11-16T08:46:47.000Z'
title: On FPGAs as PC Coprocessors (1996)
url: http://fpgacpu.org/usenet/fpgas_as_pc_coprocessors.html
author: luu
points: 73
story_text:
comment_text:
num_comments: 32
story_id:
story_title:
story_url:
parent_id:
created_at_i: 1447663607
_tags:
- story
- author_luu
- story_10573313
objectID: '10573313'

---
[Source](http://fpgacpu.org/usenet/fpgas_as_pc_coprocessors.html "Permalink to fpgacpu.org - On FPGAs as PC Coprocessors")

# fpgacpu.org - On FPGAs as PC Coprocessors



    Newsgroups: comp.arch.fpga
    Subject: On FPGAs as PC coprocessors
    Date: 6 May 1996 22:12:11 GMT

    One of the "way out speculation" questions asked at FCCM 96 (IEEE
    Symposium on FPGAs for Custom Computing Machines) was "when, if ever,
    will an FPGA coprocessor ship on every PC motherboard?"

    Ignoring the daunting language and interface standards issues, and just
    looking at current hardware approaches to FPGA "coprocessors", the
    answer must be "not any time soon".

    Consider that today's power user's CPU (such as a 200 MHz Pentium Pro
    or a 400 MHz Alpha 21164A), *running out of L1 cache*, issues a peak of
    three or four instructions per 5 ns or 2.5 ns clock, respectively.
    Also consider that the current "low latency high bandwidth" approach
    to FPGA coprocessor integration is to hang the FPGA on the PCI bus.

    In this scenario, the kinds of quasi-general purpose computing problems
    that an FPGA coprocessor can usefully assist with are quite limited.
    Issuing a write and then a read back operation to the FPGA could easily
    take 10 PCI bus cycles (300 ns), assuming no PCI bus contention.  In
    that time, assuming hand-crafted code (still less effort than an FCCM
    design), a Pentium Pro could issue as many as 180 instructions; the
    Alpha, 480 64-bit instructions.  Future versions of such processors
    will soon offer VIS- or MMX-like bytewise SIMD parallelism, 8-way or
    wider, and eventually superscalar versions of the same.  Such designs
    might issue between 500 (*) and 4000 (**) hand-coded packed byte
    operations while that single 300 ns FPGA write/read is still in
    progress.
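
The arithmetic behind those figures can be reproduced with a short sketch. It assumes only the numbers quoted in the posting (a 30 ns PCI cycle, a 10-cycle write/read round trip, and peak issue rates with no stalls); the function and variable names are illustrative and not from the original.

```python
# Back-of-the-envelope check of the posting's figures.
# All names here are hypothetical; the inputs are the posting's own numbers.
PCI_CYCLE_NS = 30        # 33 MHz PCI, rounded to 30 ns per cycle as in the posting
ROUND_TRIP_CYCLES = 10   # one write plus one read back, no bus contention


def ops_during_round_trip(clock_mhz, issues_per_clock, ops_per_issue=1):
    """Peak operations a CPU could issue while one PCI write/read completes."""
    window_ns = PCI_CYCLE_NS * ROUND_TRIP_CYCLES
    cpu_clocks = window_ns * clock_mhz // 1000  # ns * MHz / 1000 = clocks
    return cpu_clocks * issues_per_clock * ops_per_issue


cases = {
    "200 MHz Pentium Pro, 3-issue": ops_during_round_trip(200, 3),
    "400 MHz Alpha, 4-issue": ops_during_round_trip(400, 4),
    "200 MHz PPro with MMX, 1 x 8-byte op/clock": ops_during_round_trip(200, 1, 8),
    "400 MHz Alpha, 4 x 8-byte ops/clock": ops_during_round_trip(400, 4, 8),
}

for name, ops in cases.items():
    print(f"{name}: {ops} ops in {PCI_CYCLE_NS * ROUND_TRIP_CYCLES} ns")
```

The last two cases come out at 480 and 3840, which the posting rounds to 500 and 4000. These are peak issue rates, so real code would fall short of them.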


    (At peak speeds like 400e6 clock/s * 4 issue/clock * 8 byte ops/issue,
    i.e. roughly 12.8e9 byte ops/s, you have to agree that superscalar
    micros with bytewise SIMD are going to displace many FPGA applications
    on PC and workstation platforms.)


    So as long as FPGAs are attached to glacially slow I/O buses
    -- including 32-bit 33 MHz PCI -- it seems unlikely they will be of
    much use in general purpose PC processor acceleration.  Sure, for
    applications such as cryptography, image and signal processing, they
    might be a win (***), given a semi-autonomous problem which either fits
    in the FPGA and local storage, or which can employ DMA to stream data
    into or through the FPGA without much CPU intervention or management.

    Of course, the PCI ASIC crowd has the same latency problems, but they
    don't share FCCM aspirations of accelerating general purpose computing;
    rather, they focus on the special purpose applications mentioned
    above.


    Five times better latency and four times better bandwidth could be
    achieved if FPGA vendors invent a way to directly connect their parts
    to the Pentium Pro external bus, as a peer of the memory/bus
    controller.  A custom, dedicated Pentium Pro interface would probably
    be required, since FPGA configurable logic would be too slow and
    electrically incompatible.
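
A rough check of the "four times better bandwidth" figure, as a sketch under stated assumptions: the PCI side uses the 32-bit 33 MHz bus named in the posting, while the Pentium Pro side assumes a 64-bit data path at 66 MHz (the width is not stated in the posting). The latency line simply applies the posting's own factor of five to its 300 ns round trip; it is not a measured value. All names are illustrative.

```python
def peak_mb_per_s(width_bits, clock_mhz):
    """Peak transfer rate in MB/s for one transfer per clock (no overhead)."""
    return width_bits // 8 * clock_mhz


pci_mb_s = peak_mb_per_s(32, 33)    # 32-bit 33 MHz PCI, from the posting
ppro_mb_s = peak_mb_per_s(64, 66)   # assumed 64-bit, 66 MHz external bus

bandwidth_ratio = ppro_mb_s / pci_mb_s

pci_round_trip_ns = 300             # the posting's write/read estimate
implied_round_trip_ns = pci_round_trip_ns / 5   # "five times better latency"

print(f"PCI peak:       {pci_mb_s} MB/s")
print(f"PPro bus peak:  {ppro_mb_s} MB/s")
print(f"Bandwidth gain: {bandwidth_ratio:.1f}x")
print(f"Implied round trip: {implied_round_trip_ns:.0f} ns")
```

Under these assumptions the bandwidth ratio matches the posting's factor of four, and the implied round trip is about four clocks of a 66 MHz bus.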

    This could be a good volume business, and not quite the moving target
    it might appear -- I expect the PPro external bus to be just as
    ubiquitous and as long-lived as the 486 and Pentium buses have been.
    Someone could make a plug-in card which sits in the PPro ZIF socket and
    which hosts a PPro and its FPGA(s).

    Alternatively, the FPGA coprocessor could be attached to the new advanced
    graphics memory port, or whatever it is to be called, that will be
    available in future Intel memory/PCI controller chipsets.

    One might argue that Xilinx made a big mistake in not offering a
    version of the XC6200 with a dedicated 66 MHz Pentium external bus
    interface -- after all it is by far the most popular and most supported
    processor interface for the most lucrative general computing market.

    If any vendor does pursue this idea, I would appreciate a couple of
    sample parts. :-)

    --
    (*) 500 op in 300 ns: forthcoming 200 MHz PPro with MMX: 60 clocks x 1
    8-byte MMX insn/clock

    (**) 4000 op in 300 ns: hypothetical 400 MHz Alpha with each integer
    unit enhanced for bytewise SIMD: 120 clocks x 4 8-byte insns/clock

    (***) "win": much cheaper/faster than simply adding a second processor
    --

    Acknowledgements: this posting is a spin-off of a discussion with Mark
    Shand, and the "way out speculative" question was suggested by Mike
    Butts.

    Jan Gray


Copyright © 2000, Gray Research LLC. All rights reserved.
Last updated: Feb 03 2001
