PETonCHIP: Architecture of an On-Chip High-Resolution, Fully Digital Positron Emission Tomography Scanner for Small Animal Imaging

Pedro Guerra, IEEE Graduate Member, Giancarlo Sportelli, Juan Ortuño, Maria J. Ledesma-Carbayo, IEEE Member, Juan Jose Vaquero, IEEE Senior Member, Manuel Desco and Andres Santos, IEEE Senior Member

Abstract—Small animal PET systems based on a reduced number of rotating planar detectors offer advantages in terms of cost/performance for high-resolution imaging. The huge integration capability of current semiconductor technologies has favoured miniaturization and popularised the concept of system-on-chip. Following this trend, this work attempts to integrate into a single chip the acquisition electronics and coincidence unit of a fully digital four-head PET scanner. The proof of concept is based on commercially available industrial modules, including an FPGA-based high-speed acquisition board combined with a rackable computer, both in 6U cPCI form factor. The four acquisition digital blocks, coincidence logic and communication interface take up close to 30% of the available FPGA resources, providing margin for improvement in the system under development.

I. INTRODUCTION

Small animal PET systems based on rotating planar detectors constitute a cost-effective alternative research tool for molecular imaging and offer interesting advantages for high-sensitivity, high-resolution imaging. These scanners usually consist of a reduced number of detector heads in coincidence [1-5], a configuration that has proven to be a reasonable trade-off between cost and performance.

In the last decade, semiconductor technology has improved to the point where complete systems that previously required assembling several circuits on a printed circuit board (PCB) can now be integrated in a single chip, a solution commonly known as system-on-chip (SoC). It is now technically feasible to integrate the complete acquisition electronics for the four heads, including the coincidence unit, in a single SoC. This approach simplifies clock distribution to the acquisition modules and communication with the coincidence unit, resulting in a compact module that helps keep the system simple and small.

The advent of new, highly integrated programmable devices such as field-programmable gate arrays (FPGAs) has enabled the realization of such SoCs in programmable logic. Moreover, FPGA architectures have been further optimized for advanced digital signal processing, a fact that can be exploited to explore innovative solutions in the field of high-resolution PET imaging [6].

This work addresses the design of the front-end electronics for a four-head PET scanner with a rotating gantry. In particular, it describes the hardware under development to implement such on-chip electronics, with real-time scintillation signal characterization and coincidence resolution.

II. ARCHITECTURE DESCRIPTION

The proposed digital architecture for real-time acquisition of the scintillation signals is an alternative to the more generic approach described in [7], which presents a flexible digital front-end intended for a distributed acquisition system. The current solution is specifically targeted at a preclinical PET system based on a reduced number of rotating detectors. With a limited number of heads, all the acquisition and real-time processing electronics, as well as the coincidence sorting modules, can be integrated in a single FPGA.

As shown in Fig. 1, the integrated modules interface with four gamma detectors providing Anger-like outputs \((x^+, x^-, y^+, y^-)\).

These signals are sampled at a fixed rate by free-running ADCs, and the digital outputs are processed by four on-chip acquisition units that include:

- Baseline restoration (BLR),
- Pulse detection (Trigger),
- Time stamp, 40 bits with 78 ps bins (Timing),
- Energy and position estimation (\(\Sigma\)),
- Delayed-window energy, for pile-up detection or depth-of-interaction (DOI) correction in phoswich detectors.

Fig. 1: Diagram of the on-chip architecture, including four single processors, one coincidence sorter and an on-chip (SoC) block for slow control.

Manuscript received November 23, 2007. This work has been partially funded by the Spanish Ministry of Education and Science through the CDTEAM consortium as part of the CENIT Spanish research plan.

P. Guerra, J. E. Ortuño, G. Sportelli, M. J. Ledesma-Carbayo and A. Santos are with the Electronic Engineering Department, ETSIT, Universidad Politécnica de Madrid, Spain E-28040, (e-mail: pguerra, juanen, gsportelli, mledesma, andres@die.upm.es).

J. J. Vaquero and M. Desco are with the Hospital General Universitario “Gregorio Marañón”, Madrid, Spain E-28007 (e-mail: juanjo, desco@mce.hggm.es).

Patent pending, application number P2007-02836.
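The acquisition-unit steps listed above can be illustrated in software. The following Python sketch is not the authors' VHDL firmware; it assumes that the baseline is the mean of the first few samples of each channel, that the trigger is a fixed leading-edge threshold on the summed signal, that energy and position follow textbook Anger logic, \(x = (x^+ - x^-)/(x^+ + x^-)\), and that the delayed-window energy is the same integral taken one window later. The fine 78 ps time stamp is not modeled, and `process_pulse` and `Event` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Event:
    trigger_index: int     # first sample where the summed signal crosses threshold
    energy: float          # integral of the summed, baseline-corrected signal
    x: float               # Anger-logic position estimate in [-1, 1]
    y: float
    delayed_energy: float  # same integral over the window right after the main one


def process_pulse(xp: Sequence[float], xm: Sequence[float],
                  yp: Sequence[float], ym: Sequence[float],
                  threshold: float, n_baseline: int = 4,
                  window: int = 8) -> Optional[Event]:
    """Process one record of the four Anger signals; return None if no trigger."""
    # Baseline restoration (assumed): subtract each channel's mean over the
    # first n_baseline samples, which are taken to precede the pulse.
    restored = []
    for ch in (xp, xm, yp, ym):
        base = sum(ch[:n_baseline]) / n_baseline
        restored.append([s - base for s in ch])
    total = [sum(s) for s in zip(*restored)]

    # Pulse detection (assumed): leading-edge threshold on the summed signal.
    trig = next((i for i, v in enumerate(total) if v >= threshold), None)
    if trig is None:
        return None

    # Energy and position: integrate each channel over the window.
    a_xp, a_xm, a_yp, a_ym = (sum(ch[trig:trig + window]) for ch in restored)
    energy = a_xp + a_xm + a_yp + a_ym
    x = (a_xp - a_xm) / (a_xp + a_xm) if a_xp + a_xm else 0.0
    y = (a_yp - a_ym) / (a_yp + a_ym) if a_yp + a_ym else 0.0
    # Delayed-window energy: the summed integral one window later.
    delayed = sum(total[trig + window:trig + 2 * window])
    return Event(trig, energy, x, y, delayed)
```

For example, a pulse whose amplitude is split 3:1 between \(x^+\) and \(x^-\) and equally between \(y^+\) and \(y^-\) yields \(x = 0.5\) and \(y = 0\). In hardware, the same steps would be pipelined sample by sample rather than run on a stored record.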

Each detected event generates a 16-byte packet that is internally queued for coincidence recognition in the following stage.
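The text does not give the layout of the 16-byte event packet, so the field order and widths below are purely hypothetical; they only show that a 40-bit time stamp plus energy, position and delayed-window energy fit in 16 bytes. The sketch also computes the wrap-around period of a 40-bit counter with 78 ps bins, \(2^{40} \times 78\,\text{ps} \approx 85.8\) s, which any consumer of the event stream has to handle (assuming the time stamp is a plain binary count of bins).

```python
import struct

TS_BITS = 40     # time-stamp width given in the text
TS_BIN_PS = 78   # time-stamp bin width given in the text

# Hypothetical layout (not from the paper), little-endian, no padding:
#   5 B time stamp | 1 B head | 2 B energy | 2 B x | 2 B y
#   | 2 B delayed-window energy | 2 B flags
_TAIL = struct.Struct("<BHhhHH")   # 11 bytes after the time stamp
PACKET_SIZE = 5 + _TAIL.size       # 16 bytes

# A 40-bit count of 78 ps bins wraps after 2**40 * 78 ps (about 85.8 s).
WRAP_SECONDS = (1 << TS_BITS) * TS_BIN_PS * 1e-12


def pack_event(ts, head, energy, x, y, e_delayed, flags=0):
    """Serialize one single event into a 16-byte packet."""
    if not 0 <= ts < (1 << TS_BITS):
        raise ValueError("time stamp does not fit in 40 bits")
    return ts.to_bytes(5, "little") + _TAIL.pack(head, energy, x, y,
                                                 e_delayed, flags)


def unpack_event(buf):
    """Inverse of pack_event: (ts, head, energy, x, y, e_delayed, flags)."""
    ts = int.from_bytes(buf[:5], "little")
    return (ts,) + _TAIL.unpack(buf[5:PACKET_SIZE])
```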

Fig. 2: Diagram of the coincidence unit architecture

Coincidences are resolved in three steps. First, four head-control blocks retrieve detected events from the single-detector output queues; second, the fetched events are simultaneously compared against the oldest time stamp; and finally, coincident events are dispatched for transmission. This configuration allows coincidence recognition without explicit event sorting, with a complexity as low as \(O(\log_2 n)\), where \(n\) is the number of heads. This property becomes even more important for future extensions to scanners with more than four heads.
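The three-step scheme can be sketched as an offline Python model. It rests on assumptions not stated in the text: each head's queue is already in time order, the oldest front event is always consumed, every other front event within `window` of it is paired with it (multiples are emitted as one group), and scanner geometry (which head pairs are valid) is ignored. `oldest_head` mimics a comparator tree: each level halves the number of candidates, so \(\lceil \log_2 n \rceil\) comparison levels find the oldest of \(n\) heads without sorting. Both function names are hypothetical.

```python
from collections import deque
from typing import List, Optional, Sequence, Tuple


def oldest_head(fronts: Sequence[Optional[int]]) -> Optional[int]:
    """Index of the smallest time stamp, found with a pairwise comparator tree."""
    cand = [i for i, t in enumerate(fronts) if t is not None]
    while len(cand) > 1:          # one loop iteration = one tree level
        nxt = [a if fronts[a] <= fronts[b] else b
               for a, b in zip(cand[0::2], cand[1::2])]
        if len(cand) % 2:         # an odd candidate passes straight through
            nxt.append(cand[-1])
        cand = nxt
    return cand[0] if cand else None


def sort_coincidences(queues: List[deque], window: int) -> List[Tuple]:
    """Group events whose time stamps lie within `window` of the oldest one."""
    out = []
    while True:
        # Step 1: head-control blocks fetch the front event of each queue.
        fronts = [q[0] if q else None for q in queues]
        k = oldest_head(fronts)
        if k is None:
            return out
        t0 = queues[k].popleft()
        # Step 2: compare every other front event against the oldest one.
        partners = [h for h, t in enumerate(fronts)
                    if h != k and t is not None and t - t0 <= window]
        # Step 3: dispatch coincident events; a lone oldest event is dropped.
        if partners:
            out.append(((k, t0),) + tuple((h, queues[h].popleft())
                                          for h in partners))
```

A real-time implementation must also wait until every queue holds an event (or a timeout expires) before deciding, since an empty queue might later receive an event older than the current candidate.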

The current prototype is capable of resolving a coincidence every 24 clock cycles (8 for coincidence recognition + 16 for data transmission), that is 460 ns for a sampling clock of 52.5 MHz.
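The cycle budget translates directly into a time per coincidence and a maximum coincidence rate. The short calculation below assumes the 24-cycle budget stays fixed when the clock changes: at 52.5 MHz it gives about 457 ns per coincidence (the roughly 460 ns quoted above) and a ceiling of about 2.19 Mcps, in line with the 2.1 Mcps coincidence-processor rate reported in Table 2; at the 110 MHz mentioned in the results it would give about 218 ns and 4.58 Mcps.

```python
CYCLES_RECOGNITION = 8    # from the text
CYCLES_TRANSMISSION = 16  # from the text


def coincidence_budget(f_clk_hz: float,
                       cycles: int = CYCLES_RECOGNITION + CYCLES_TRANSMISSION):
    """Return (ns per coincidence, maximum coincidence rate in Mcps)."""
    period_ns = cycles / f_clk_hz * 1e9
    rate_mcps = f_clk_hz / cycles / 1e6
    return period_ns, rate_mcps


for f in (52.5e6, 110e6):
    t_ns, rate = coincidence_budget(f)
    print(f"{f / 1e6:.1f} MHz: {t_ns:.0f} ns/coincidence, {rate:.2f} Mcps")
```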

In the transmission stage the detected coincidences are queued as packets of eight 32-bit words and streamed to the single board computer (SBC) through the cPCI interface.
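On the host side, the streamed coincidences must be split back into packets of eight 32-bit words. DMA transfers need not end on a packet boundary, so a small deframer that carries leftover bytes over to the next buffer is useful. This is a hypothetical sketch (`CoincidenceDeframer` is not part of Lyrtech's BSDK), and it assumes little-endian words with no extra framing.

```python
import struct

WORDS_PER_PACKET = 8
PACKET_BYTES = 4 * WORDS_PER_PACKET   # 32 bytes per coincidence
_PACKET = struct.Struct("<8I")        # assumption: little-endian 32-bit words


class CoincidenceDeframer:
    """Split DMA buffers into 8-word coincidence packets."""

    def __init__(self):
        self._pending = b""           # bytes of an incomplete trailing packet

    def feed(self, buf):
        """Return the complete packets in buf (plus any carried-over bytes)."""
        data = self._pending + bytes(buf)
        n_full = len(data) // PACKET_BYTES * PACKET_BYTES
        self._pending = data[n_full:]
        return list(_PACKET.iter_unpack(data[:n_full]))
```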

A parallel control thread sends the SBC information on single-event activity, lost events and externally time-stamped events.

Finally, the on-chip hardware includes signalling to external units, aimed at synchronizing off-the-shelf acquisition boards that sample biological signals, such as cardiac or respiratory signals, with the acquisition clock. This timing information may be used for off-line gating of the coincidence list output. Alternatively, coincidences may be tagged with the value of external control signals for hard-coded data gating.
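Off-line gating of the coincidence list can be sketched as follows, assuming that the external time-stamped events mark the start of each physiological cycle (for instance, one trigger per cardiac R wave) and that both streams share the acquisition-clock time base. Each coincidence is assigned a phase bin within its cycle; events before the first trigger or after the last one are left ungated. `gate_phase_bins` is a hypothetical helper.

```python
from bisect import bisect_right


def gate_phase_bins(coinc_ts, trigger_ts, n_bins):
    """Assign each coincidence a phase bin within its physiological cycle.

    coinc_ts, trigger_ts: sorted time stamps on the acquisition clock.
    A coincidence at t with trigger_ts[k] <= t < trigger_ts[k + 1] gets
    phase (t - T[k]) / (T[k + 1] - T[k]); events outside a complete
    cycle get None.
    """
    bins = []
    for t in coinc_ts:
        k = bisect_right(trigger_ts, t) - 1   # last trigger at or before t
        if k < 0 or k + 1 >= len(trigger_ts):
            bins.append(None)
            continue
        start, end = trigger_ts[k], trigger_ts[k + 1]
        phase = (t - start) / (end - start)   # in [0, 1)
        # min() guards against floating-point rounding up to n_bins.
        bins.append(min(int(phase * n_bins), n_bins - 1))
    return bins
```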

III. MATERIAL AND METHODS

A. Hardware

A prototype has been developed to validate the proposed concept. This prototype is based on the VHS-ADC V4 board (Lyrtech Signal Processing, Quebec, Canada), which provides up to 16 DC-coupled channels at 105 MHz, built around the high-speed monolithic 14-bit AD6645 ADC (Analog Devices, Norwood, MA, USA), and a high-performance Virtex-4 LX160 FPGA (Xilinx, San Jose, CA, USA) for high-speed processing. The board comes in a 6U form factor with a 33 MHz/32-bit CompactPCI (cPCI) interface. For high-speed inter-board communications, RapidCHANNEL (1 GB/s) or front panel data port (FPDP) (400 MB/s) digital ports are available.

Fig. 3: Photo of the actual acquisition board and chassis, with the SBC in place.

The board, which is also being used for the OPET prototype [8], offers very good clock synchronization thanks to low-skew clock routing to all the ADCs, optimized for tightly synchronous acquisition applications. It is also extensively shielded, which provides excellent inter-channel and external noise isolation (up to 102 dB inter-channel crosstalk isolation).

The acquisition board is plugged into a cPCI chassis, where coincidences are reported through the bus to a 2 GHz Core Duo-based master SBC, which encapsulates the data and sends them to the host PC via a Gigabit Ethernet (GigE) connection.

The developed code is integrated with several communication modules supplied by the board manufacturer. These modules provide clock management and register control, with the aid of a MicroBlaze (µB)-based IP core from Lyrtech, as well as data streaming with DMA support through the cPCI interface.

B. Software

The FPGA firmware is written in VHDL-93 (VHSIC Hardware Description Language) and has been synthesized with ISE 9.1 (Xilinx, San Jose, CA, USA).

Software drivers are based on Lyrtech’s board software development kit (BSDK) v4.3, which includes a host API that provides low-level routines as well as high-level blocks for integration within Simulink (The MathWorks, Natick, MA, USA) blocksets for rapid FPGA implementation.

The interface software was implemented with Visual Studio 2005 v8.0 (Microsoft Corp., Redmond, WA, USA).

IV. RESULTS

So far, the minimum system for four heads in coincidence has been implemented, and the current firmware takes 30% of the FPGA resources. As shown in Table 1, each of the four single processors occupies only around 5% of the device, which leaves margin for integrating more heads on the chip. The firmware for a full virtual oscilloscope acquisition mode is still under development. This mode, as described in [9], provides access to the raw ADC outputs, enabling in-depth analysis for debugging and calibration purposes.

Table 1: FPGA occupation and clock frequency of each firmware module

<table>
<thead>
<tr>
<th>Module</th>
<th>Occupied Area</th>
<th>Clock Frequency</th>
</tr>
</thead>
<tbody>
<tr>
<td>4 x Single Processor</td>
<td>20%</td>
<td>52.5 MHz</td>
</tr>
<tr>
<td>Coincidence Processor</td>
<td>3%</td>
<td>52.5 MHz</td>
</tr>
<tr>
<td>Committed IP core</td>
<td>7%</td>
<td>33 MHz</td>
</tr>
<tr>
<td>Oscilloscope</td>
<td>N/A</td>
<td>125 MHz</td>
</tr>
<tr>
<td>Total</td>
<td>30%</td>
<td></td>
</tr>
</tbody>
</table>

Table 2: Sustainable rates at the selected working frequencies

<table>
<thead>
<tr>
<th></th>
<th>Single Processor</th>
<th>Coincidence Processor</th>
<th>PCI Interface</th>
</tr>
</thead>
<tbody>
<tr>
<td>Sustainable rate</td>
<td>3.75 Mcps</td>
<td>2.1 Mcps</td>
<td>&gt;600 kcps</td>
</tr>
</tbody>
</table>

The system runs with three clocks: one at the sampling frequency of 52.5 MHz, one at the PCI frequency and one at the operating frequency of the SDRAM. It should be pointed out that the sampling frequency was selected based on the availability of a low-jitter clock at 125 MHz or 52.5 MHz on the prototyping board. However, thanks to the deep pipelining of the processing blocks, the same design may be synthesized for sampling clock frequencies as high as 110 MHz.

The system has been evaluated in terms of sustainable rates at the selected working frequencies, yielding the results summarized in Table 2. The coincidence data rate through the PCI interface has been computed for a 32-byte coincidence packet and corresponds to a recording rate slightly above 19 MB/s. This packet size is useful for development and testing, although a more compact packet will be developed for actual data acquisition. The actual bottleneck of the data recording process has been found to be the 7200 rpm ST-3120022A hard disk (Seagate Technology, Scotts Valley, CA, USA) of the SBC, which cannot record sustained data streams faster than 25 MB/s. The processor, however, has been measured to retrieve data from the FPGA at rates higher than 100 MB/s.
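The recording-rate figures follow from simple arithmetic on packet size and event rate. The Python below, taking 1 MB as \(10^6\) bytes, reproduces the 19.2 MB/s figure for 600 kcps of 32-byte packets and shows that the 25 MB/s disk limit caps 32-byte coincidences at about 781 kcps; a hypothetical 16-byte packet would double that ceiling to about 1.56 Mcps. The 132 MB/s value is the theoretical burst rate of a 33 MHz, 32-bit PCI bus, not a measured figure.

```python
PACKET_BYTES = 32            # development coincidence packet: eight 32-bit words
PCI_SUSTAINED_CPS = 600e3    # sustained coincidence rate over cPCI (Table 2)
DISK_LIMIT_BPS = 25e6        # sustained write limit of the SBC hard disk
PCI_PEAK_BPS = 33e6 * 4      # theoretical burst rate of a 33 MHz, 32-bit PCI bus


def recording_rate(cps: float, packet_bytes: int = PACKET_BYTES) -> float:
    """Bytes per second written for a given coincidence rate."""
    return cps * packet_bytes


def max_rate(limit_bps: float, packet_bytes: int = PACKET_BYTES) -> float:
    """Highest coincidence rate (cps) a link or disk of limit_bps can absorb."""
    return limit_bps / packet_bytes


print(recording_rate(PCI_SUSTAINED_CPS) / 1e6, "MB/s to disk")
print(max_rate(DISK_LIMIT_BPS) / 1e3, "kcps disk ceiling (32-byte packets)")
print(max_rate(DISK_LIMIT_BPS, 16) / 1e6, "Mcps with a hypothetical 16-byte packet")
```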

The relatively low device occupation suggests that the concept could be extended to a full-ring tomograph with 8 or 16 detectors. However, since each detector requires at least four ADCs of at least 10 bits, the pin count of the FPGA would easily become the limiting factor. A feasible and efficient solution would be to replace the parallel ADCs with multichannel converters with a serial, differential output interface, such as the ADS6425 (Texas Instruments, Dallas, TX, USA). Moreover, the use of integrated multichannel converters would help minimize clock skew between the channels of a single detector.
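A rough pin budget makes the argument concrete. Under the assumptions labeled in the code (one pin per bit for parallel outputs; for serial outputs, one quad converter per head with one differential data pair per channel plus a bit-clock and a frame-clock pair), 16 heads with four 10-bit channels need 640 data pins in parallel form but only 192 pins with serial LVDS outputs. Control, clock-input and power pins are ignored, and the exact lane configuration of a given converter should be checked in its datasheet.

```python
def parallel_data_pins(n_heads: int, channels: int = 4, bits: int = 10) -> int:
    """One single-ended FPGA pin per ADC output bit (clock/overrange ignored)."""
    return n_heads * channels * bits


def serial_lvds_pins(n_heads: int, channels: int = 4,
                     data_pairs_per_channel: int = 1,
                     clock_pairs_per_head: int = 2) -> int:
    """Assumes one quad converter per head; every lane is a differential pair
    (two pins): data lanes plus a bit-clock and a frame-clock pair."""
    pairs = channels * data_pairs_per_channel + clock_pairs_per_head
    return n_heads * 2 * pairs


for heads in (4, 8, 16):
    print(heads, parallel_data_pins(heads), serial_lvds_pins(heads))
```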

V. CONCLUSIONS

To our knowledge, this work describes the first attempt to integrate in a single chip the acquisition electronics and coincidence unit of a fully digital four-head PET scanner. The on-chip approach is expected to greatly simplify clock distribution and synchronization, as well as event communication with the coincidence unit, resulting in a compact module that helps keep the system simple and small.

The FPGA occupation validates the feasibility of the approach, which could even be scaled to more complex tomographs with a higher number of heads. In that case, the most critical limitation would be the FPGA pinout, which would require the use of high-speed ADCs with serial outputs.

REFERENCES