Main results of the 4th International PIV Challenge

In the last decade, worldwide PIV development efforts have resulted in significant improvements in terms of accuracy, resolution, dynamic range and extension to higher dimensions. To assess the achievements and to guide future development efforts, an International PIV Challenge was performed in Lisbon (Portugal) on July 5, 2014. Twenty leading participants, including the major system providers, i.e., Dantec (Denmark), LaVision (Germany), MicroVec (China), PIVTEC (Germany), TSI (USA), have analyzed 5 cases. The cases and analysis explore challenges specific to 2D microscopic PIV (case A), 2D time-resolved PIV (case B), 3D tomographic PIV (cases C and D) and stereoscopic PIV (case E). During the event, 2D macroscopic PIV images (case F) were provided to all 80 attendees of the workshop in Lisbon, with the aim to assess the impact of the user’s experience on the evaluation result. This paper describes the cases and specific algorithms and evaluation parameters applied by the participants and reviews the main results. For future analysis and comparison, the full image database will be accessible at http://www.pivChallenge.org.


Introduction
The 4th International PIV Challenge continues a successful series of PIV Challenges performed since 2001. The 1st International PIV Challenge took place in Göttingen (Germany) in September 2001 prior to the 4th International Symposium on Particle Image Velocimetry (PIV01). Thirteen different institutions analyzed successfully 9 cases, and about 50 participants were present at the workshop to discuss the outcome. The main results are published in Stanislas et al. (2003), and the images of the cases are available on the PIV Challenge Web site. Due to the great success of the competition, a 2nd International PIV Challenge was organized in Busan (Korea) in September 2003. This time the event was linked to the 5th International Symposium on Particle Image Velocimetry (PIV03) and 16 institutions contributed to the analysis of 3 cases and about 60 participants visited the workshop. The main results and conclusions are published in Stanislas et al. (2005). Because of the continuing interest, a 3rd International PIV Challenge was performed in Pasadena (USA), linked to the 6th International Symposium on Particle Image Velocimetry (PIV05). This time 29 teams registered for the challenge, 20 institutions delivered results for in total 5 test cases, and 67 scientists followed the discussion at the workshop. The main results are published in Stanislas et al. (2008).
In 2012, motivated by the PIV research effort within the EU project AFDAR (Advanced Flow Diagnostics for Aeronautical Research), it was decided to initiate another International PIV Challenge to assess the global development Abstract In the last decade, worldwide PIV development efforts have resulted in significant improvements in terms of accuracy, resolution, dynamic range and extension to higher dimensions. To assess the achievements and to guide future development efforts, an International PIV Challenge was performed in Lisbon (Portugal) on July 5, 2014. Twenty leading participants, including the major system providers, i.e., Dantec (Denmark), LaVision (Germany), MicroVec (China), PIVTEC (Germany), TSI (USA), have analyzed 5 cases. The cases and analysis explore challenges specific to 2D microscopic PIV (case A), 2D timeresolved PIV (case B), 3D tomographic PIV (cases C and D) and stereoscopic PIV (case E). During the event, 2D macroscopic PIV images (case F) were provided to all 80 attendees of the workshop in Lisbon, with the aim to assess the impact of the user's experience on the evaluation result. This paper describes the cases and specific algorithms and evaluation parameters applied by the participants and efforts that have strongly enhanced the accuracy, resolution and dynamic range of planar PIV but, for the first time, also to assess the capabilities of volumetric PIV techniques, as their significance and popularity has strongly increased over the last years.
The primary aim of the International PIV Challenge is to assess the current state of the art of PIV evaluation techniques and to guide future development efforts. This requires a broad range of cases each addressing specific problems of general interest. Case A addresses challenges specific to 2D microscopic PIV (µPIV) experiments, such as preprocessing, depth of correlation and enormous flow gradient effects. Case B consists of a real 2D time-resolved image sequence, to access the potential in terms of measurement accuracy, resulting from the strong developments in the field of high repetition rate lasers, high-speed cameras and multi-frame evaluation techniques. Case C consists of a synthetic 3D data set to study the accuracy and resolution of 3D-PIV reconstruction and evaluation methods and the effect of particle concentration. Case D is a synthetic timeresolved 3D image sequence of a turbulent flow to investigate the benefits of using the time history of particles for 3D-PIV. Case E is again an experimental data set measured with a stereoscopic PIV setup to assess the errors due to calibration, self-calibration and loss of pairs. Cases A-E were provided to all participants prior to the PIV Challenge. Case F, which consists of experimental images of particles whose displacement field is precisely predetermined, was made available to all the other 80 attendees of the workshop with the target of evaluating the images within 2 h. The primary aim was to assess the impact of the user experience on the evaluation result by comparing the evaluation result obtained by PIV users and PIV developers. An overview of the test cases is given in Table 1.

Organization
The 4th International PIV Challenge was led by the international scientific steering committee given in Table 2. Beside the general organization of the PIV Challenge, the committee was in charge of defining and preparing the cases, selecting the most qualified institutions out of the ones who registered, providing the images, analyzing the data and presenting the main results at the workshop in Lisbon on July 5, 2014. To ensure an objective and fair competition, members of the scientific committee were excluded from the competition. The scientific committee was supported by the advisory committee given in Table 3. Furthermore, the coauthors of the paper were involved in the preparation and analysis of the data to manage the large workload. The local organization in Lisbon was strongly supported by Professor Antonio L. Moreira (Instituto Superior Tecnico, University of Lisbon, Portugal) who is warmly acknowledged here.

Review
Putting the 4th International PIV Challenge into perspective, the total number of participating institutions is displayed in Fig. 1. Institutions from 15 different countries and 4 continents have contributed to the past four Challenges, most of them from Europe and the USA which reflects the long-term status of the technique in these regions. Figure 2 shows a more detailed picture. Here the number of contributions per year is displayed for the different countries. For some countries, the participations per Challenge are quite similar as the same teams participated multiple times. LaVision GmbH and Technical University Delft are the only participants who joined all four Challenges. It is also visible that the PIV Challenge activity raised in some countries over the years while it decayed in others. Most striking is the first participation of 3 participants from China in 2014. But also participant from Australia and Ireland joined the PIV Challenge for the first time in 2014. On the other hand, the decay of the contributions from Japan from three in 2001 to none in 2014 is surprising, because the image analysis community in Table 1 Overview of cases provided within the 4th International PIV Challenge a . . . in fact different single exposed double-frame images were provided with seeding densities varying between 0.01 and 0.15 ppp. Japan is quite strong. However, Figs. 1 and 2 illustrate nicely the continuing international interest in the PIV Challenge. This is not surprising as PIV has evolved to one of the most important velocity measurement technique in fluid mechanics since its invention in 1977 (Dudderar and Simpkins 1977).
Thanks to the continuing developments in laser, camera and computer power but also evaluation algorithms the capability of PIV will further advance. Thanks to the increasing user-friendliness of commercial software packages and hardware components, provided by the leading PIV system providers, the popularity and spreading of the technique will also continue to raise. However, it will be seen in this paper that the evaluation results strongly depend on the processing parameters selected by the user. This shows the strong relevance of the user's knowledge, experience, persistence and attitude. Therefore, continuing effort is essential to improve the understanding and evaluation procedures of the technique and to establish procedures to quantify the uncertainty of the measurements. We hope that the 4th International PIV Challenge and this paper will provide a valuable service to the community to further enhance the PIV technique. Table 4 shows the list of teams of the present PIV Challenge. In total, 20 institutions from 10 different countries and 4 continents participated in this Challenge. Because of the broad spectrum of cases, the teams were free to choose the ones they wanted to analyze as indicated in Table 4. Some departments shared the workload between different people-cases A, B, E for instance were evaluated by DLR Cologne (Christian Willert), while cases C and D were evaluated by DLR Göttingen (Daniel Schanz). However, as the departments belong to a single institution, it is shown only once in Table 4.

Case description and measurements
PIV is a well-established technique even for the analysis of flows in systems whose effective dimensions are below 1 mm. However, the analysis of flows in microsystems differs from macroscopic flow investigations in many ways (Meinhart et al. 1999). Due to the volume illumination, the thickness of the measurement plane is mainly determined by the aperture of the imaging lens and therefore out-offocus particle images are an inherent problem in microscopic PIV investigations. Consequently, digital image preprocessing algorithms are essential to enhance the spatial resolution by filtering out the unwanted signal caused by out-of-focus particles. Furthermore, the low particle image density achievable in microfluidic systems at large magnification and large particle image displacements as well as strong spatial gradients require sophisticated evaluation procedures, which are less common in macroscopic PIV applications. To bring the micro-and macro-PIV community together, a demanding microscopic PIV case, measured in a microdispersion systems, was considered in this PIV Challenge.
Microdispersion systems are very important in the field of process engineering, in particular food processing technology. These systems produce uniform droplets in the sub-micrometer range if driven with pressure around 100 bar (Kelemen et al. 2015a, b). The basic geometry consists of a straight microchannel (80 µm in depth) with a sharp decrease in cross section and a sharp expansion thereafter as illustrated in Fig. 3. Due to this sudden change in width (from 500 to 80 µm), the velocity changes significantly by about 230 m/s within the field of view. The strong convective acceleration stretches large primary droplets into filaments. Due to the normal and shear forces in the expansion zone, the filaments break up into tiny droplets which persist in the continuous phase of the emulsion (fat in homogenized milk for instance). The flow includes cavitation at the constriction due to the large acceleration, strong in-plane and out-of-plane gradients  in the boundary layers of the channel and shear layers of the jet flow and, in case of two-phase fluids, droplet-wall and droplet-droplet interaction. Thus, the flow characteristics are difficult to resolve due to the large range of spatial and velocity scales.
To characterize the single-phase flow, a µPIV experiment was performed at the Bundeswehr University Munich within the DFG (German Research Foundation) research group 856 (Microsystems for Particulate Life-Science-Products). The channel was driven by a pressure of 200 bar. Fluorescent polystyrene particles with a diameter of 1 µm were added to the flow and imaged by a Zeiss Axio Observer microscope with a 20× magnification lens. To capture the flow, a sCMOS camera with an interframing time of 120 ns was used. The particles were illuminated using a Litron Nano-PIV double-pulse Nd:YAG laser with 4 ns pulse length. 600 Double-frame images were recorded. The field of view covers approximately 1400 × 600 µm 2 on 2560 × 1230 pixels. The main challenges of this experiment are: • Extremely large velocity gradients (Keane and Adrian 1990;Westerweel 2008) • High dynamic velocity range (Adrian 1997) • Depth of correlation • Optical aberrations due to the thick window • Low signal-to-noise ratio (SNR) and cross-talk • Cavitation The time-averaged flow field is quite uniform and of small velocity prior to the inlet of the channel, see (1) in Fig. 3. Toward the inlet, the flow is highly accelerated (2) and a fast channel flow develops with very strong velocity gradients close to the walls (3). Cavitation occurs at different places, especially at the inlet of the small channel in regions of very high velocities (2). This can already be seen in the images. Taking an average image over images 1-250, cavitation bubbles can be seen at both sides of the inlet as shown in Fig. 4 (left). Later, these cavitation bubbles diminish as can be seen by averaging images 300-600, see Fig. 4 (right). Due to this effect, strong velocity changes are expected in this area as the flow state alternates between two modes, namely one with and without cavitation bubbles. It is evident that the velocity change will result in strong local Reynolds stresses on average, although both flow states are fully laminar. The Reynolds stresses are simply an effect of the change in flow state, which results in a mean flow measurement that does not exist at all in the experiment.
At the outlet of the small channel, a free jet with a very thin shear layer develops, indicated by (4). This jet tends to bend toward one wall, because of the 2D geometry. The preferred position might change for subsequent experiments. However, the jet remains attached to one wall for the rest of the experiment and a recirculation region forms (5). The flow field therefore features a high dynamic spatial range (DSR) and a very high dynamic velocity range (DVR) between the fast channel flow and the almost stagnant and reverse flow regions. Since the aim of the current experiment is to qualify the whole flow field at once, the magnification was chosen such that by using the smallest interframing time of 120 ns results in displacements that can still be processed. Increasing the magnification further was not possible, because the minimum interframing time of 120 ns would than have been to large and the velocity evaluation would not have been possible.

Image quality for µPIV
In contrast to macroscopic PIV, the image quality for microscopic imaging is often poor due to low light intensities, small particles and optical aberrations. As typically for cameras with a CMOS architecture, the gain for different pixels might also be different, which results in so-called hot and cold pixels . Hot pixels show a very large intensity, which results in a bright spot in the image, whereas cold pixels have a low or even zero intensity, which results in a dark spot.
These hot and cold pixels must be taken into account during the evaluation process to avoid bias errors. If not removed, they cause strong correlation peaks at zero velocity due to their high intensity. Usually, this effect can be corrected by subtracting a dark image already during the recording. However, sometimes this does not work sufficiently well, especially when cameras are new on the market. Therefore, it was decided to keep these hot pixels (with constantly high intensity values) and cold pixels (with constantly low gain and low or zero intensity value) in the images and leave it up to the participant to correct for them, instead of using the internal camera correction. Furthermore, the signal-to-noise ratio (SNR) is very low, due to the use of small fluorescent particles as well as the large magnifications. In the current case, the ratio of the mean signal to background intensity was below two. In some images also blobs of high intensity can be seen that result from particle agglomerations.
An additional problem was the time delay between successive frames which was set to the minimal interframing time of 120 ns. Due to jitter in the timing, crosstalk between the frames can be observed in some images, i.e., particle images of both exposures appear in one frame.
The pairwise occurrence of particle images with the same distance in a single frame indicates a double exposure with almost the same intensity. These images can be identified by the average intensity of the different frames, which is, in the case of double exposure, significantly higher than in average. In total, only 6 out of the 600 images show significant cross-talk. Strategies to cope with this problem might include image preprocessing or simply the identification and exclusion of these images. However, if not taken into account, high correlation values at zero velocity bias the measurements.
In comparison with macroscopic PIV applications, the particle image density is due to the large magnification typically lower. In order to enhance the quality of the correlation peak and the resolution, ensemble correlation approaches are typically applied (Westerweel et al. 2004). Another inherent problem for microfluidic investigations is the so-called depth of correlation (DOC) as already mentioned at the beginning of the section. In standard PIV, a thin laser light sheet is generated to illuminate a plane in the flow which is usually observed from large distances. The depth of focus of the camera is typically much larger than the thickness of the laser light sheet, and therefore, the depth of the measurement plane is determined by the thickness of the light sheet at a certain intensity threshold which depends also on the size of the particles, the camera sensitivity and the lens settings. In microfluidic applications, due to the small dimensions of the microchannels, the creation and application of a thin laser light sheet is not possible. Volume illumination is used instead. As a consequence of this, the thickness of the measurement plane is defined by the depth of field of the microscope objective lens. In µPIV experiments, the DOC is commonly defined as twice the distance from the object plane to the nearest plane in which a particle becomes sufficiently defocused so that it no longer contributes significantly to the cross-correlation analysis (Meinhart et al. 2000). In the case of vanishing out-of-plane gradients within the DOC, the measurements are not biased, but the correlation peak can be deformed due to non-homogeneous particle image sizes, which can reduce the precision of the peak detection. Unfortunately in many cases, the DOC is on the order of the channel dimension and thus large out-of-plane gradients are present. Under certain assumptions, a theoretical value can be derived for the DOC based on knowledge of the magnification, particle diameter, numerical aperture, refractive index of the medium and wavelength of the light. The DOC can also be determined by an experiment, and very often the experimental values are larger since aberrations appear. In the current case, the theoretical value is DOC theory = 17.6 µm, whereas the experimentally obtained value gives DOC exp = 31.5 µm (Rossi et al. 2012). Since the tiny channel had a squared cross section of 80 × 80 µm 2 , this covers more than one-third of the channel and significant bias errors can be expected. However, adequate image preprocessing or non-normalized correlations can minimize this effect (Rossi et al. 2012). Alternatively, 3D3C methods can be applied to fully avoid bias errors due to the volume illumination .
In general, image preprocessing is very important for microscopic PIV to increase the signal-to-noise ratio and avoid large systematic errors due to the depth of correlation (Lindken et al. 2009). Consequently, most of the teams applied image preprocessing to enhance the quality of the images.

Participants and methods
In total, 23 institutions requested the experimental images. Since the images were very challenging, not all of them succeeded in submitting results in time. Results were submitted by 13 institutions. The acronyms used in the paper and details about the data submitted are listed in Table 5. For the data sets named evaluation 1 (eval1), the participants that applied spatial correlation such as conventional window cross-correlation or sum of correlation by means of window correlation were forced to use a final interrogation window size of 32 × 32 pixels. Multipass, window weighting, image deformation, etc. were allowed. The fixed interrogation windows size allows for a comparison of the different algorithms without major bias due to spatial resolution effects (Keane and Adrian 1992;Kähler et al. 2012a, b). However, the experience and attitude of the user has a very pronounced effect on the evaluation result, and different window sizes may have been favored by the different users to alter the smoothness of the results. Therefore, the teams were free to choose the interrogation window size in case of evaluation 2 (eval2). The evaluation parameters are summarized in Table 5. Some special treatments are described in the following if they differ significantly from the standard routines.
Dantec used close to the walls a special shaped wall window so that the contribution from particles further away can be excluded and thus the systematic velocity overestimation can be minimized. In addition, a N-sigma validation (σ = 2.5) was used to detect and exclude outliers that may appear in groups due to the large overlap.
A feature tracking algorithm was applied by INSEAN (Miozzi 2004), which solves the optical flow equation in a local framework. The algorithm defines the best correlation measure as the minimum of the sum of squared differences (SSD) of pixel intensity corresponding to the same interrogation windows in two subsequent frames. After a linearization, the SSD minimization problem is iteratively solved in a least square style, by adopting two different models of a pure translational window motion and in the second step an affine window deformation. The deformation parameters are given directly by the algorithm solution (Miozzi 2005). The velocity for the individual images was only considered where the solutions of the linear system corresponding to the minimization problem exist. Subpixel image interpolation using a fifth-order B-Spline basis (Unser et al. 1993;Astarita and Cardone 2005) was performed. In-plane loss of pairs was avoided by adopting a pyramidal image representation and subpixel image interpolation using a bicubic scheme. The peak detection scheme of IOT was based on the maximum width of the normalized correlation function at a value of 0.5. The assessment of the mean velocity and the velocity fluctuations started from the center of mass histogram method, and then it was used as an initial guess for an elliptic 2D Gaussian fitting with a nonlinear Levenberg-Marquardt optimization by the upper cap (0.5-1) of the correlation peak points. The displacement fluctuations were obtained by the shape of the ensemble correlation function following the approach of Scharnowski et al. (2012). A global displacement validation (−20 ≤ DX ≤ 50 px and −30 ≤ DY ≤ 30 px and 53 px for vector length) was applied. Outliers after each iteration were replaced by 3 × 3 moving average.
For peak finding, TCD used a 3 × 3 2D Gaussian estimator (Nobach and Honkanen 2005) that ranks peaks according to their volume instead of their height. For intermediate steps, a 3 × 3 spatial median filter was used to determine outliers . They were replaced by lower-ranked correlation peaks or local median of the neighbors. Two further median smoothing operations were applied.
An iterative multi-grid continuous window deformation interrogation algorithm was applied by TU Delft to individual images with decreasing window size starting with 128 pixels (Scarano and Riethmuller 2000) with bilinear interpolation for the velocity and 8 × 8 sinc interpolation for pixel intensities (Astarita and Cardone 2005). Close to the walls, a weighting function was applied to the non-masked region.
The peak finder used by UniG separates in the first step the most significant peaks of the correlation function and evaluates their particular characteristics. The peak quality is rated based on several individually weighted criteria which are related to, e.g., peak shape and peak height relative to the local image contrast. In the next step, a second peak-rating run is performed with an additional criterion that takes into account information from the peaks in neighboring interrogation windows. This method is superior to a simple interpolation-based error filter, because its second peak rating replaces any ordinary outlier substitution (Schewe 2014).

Mask generation
Among all submissions, the mask generation (either algorithmic or manual) is very different. The smallest distance between masked points for eval1 (i.e., minimum channel width) or points where the velocity reaches exactly zero (if no mask provided) range between 160 ...194 pixels, which is more than 20 % of the channel width. The values for the different participants are given in Table 6. No trend can be seen concerning automated or manual mask  , smoothing 2D wavelet (Ogden 1997) If not explicitly stated, groups performed image pair correlation instead of ensemble correlation, VS vector spacing, W interrogation window generation, large as well as small channel width can be found for both algorithmic and manual mask generation. This large scatter already illustrates the strong effect of the participant's attitude on the evaluation domain and thus on the results.

Evaluation 1
Most of the participants used their own masks or excluded data outside of the channel by setting the flag to zero. However, the SNR at the left an right border of the images decreases due to inhomogeneous illumination. Therefore, values for x ≤ 120 px and x ≥ 2400 px were also excluded from the analysis. The mean flow field component in x-direction is shown in Figs. 5 and 6 in the first column. In many experiments, the real or ground truth is not known; however, all physical phenomena due to the flow, tracer particles, illumination, imaging, registration, discretization and quantization of this experiment are realistic, which is not achievable using synthetic images. Furthermore, the ground truth is only helpful if only small deviations exist between the results, which is not the case here. Therefore, a statistical approach is well suited which identifies results that are highly inconsistent with the results of the other teams or which are unphysical taking fluid mechanical considerations into account or which are not explainable based on technical grounds. In principle, all the teams could resolve the average characteristics of the flow. The fluid is accelerated in the first part of the chamber, later a channel flow forms in the contraction and a free jet, leaning to the lower part within the images, develops behind the channel's outlet. For the mean displacement, the magnitude is in coarse agreement for all participants. Obvious differences can be observed in terms of data smoothness (MicroVec for instance), the acceleration of the flow toward the inlet (compare Dantec, Lavision and UniNa), the symmetry of the flow in the channel with respect to its center axis (TCD and LANL), the contraction of the streamlines at the outlet of the channel (compare TSI and UniG with others). The major differences can be found close to the walls, especially in the inlet region where the cavitation bubble forms in the first 250 images. Here the data are in particular more noisy for INSEAN, LANL and TCD.
Dantec used a special wall treatment in the algorithm to estimate the near-wall flow field. For other teams (Micro-Vec, TCD, TSI, TUD, UniG and UniMe), it seems that the velocity was forced to decay to zero at the wall, which results in very strong differences for the gradients in the small channel. These effects can be seen in the displacement profiles for evaluation 2 in Fig. 10. Although it seems reasonable to take the near-wall flow physics into account, this procedure is very sensitive to the definition of the wall location, which is strongly user dependent as shown above. Integrating the displacement in x-direction shows differences of 28 % (of the mean for all participants) between the smallest and larges values for the volume flow rate in the small channel, which is remarkable. It is also obvious that image preprocessing is crucial to avoid wrong displacement estimates due to the hot pixels and cross-talk between frames, which results in an underestimation of the displacement. Image preprocessing was not performed by TCD and regions of zero velocity (see dark spots close to the inlet at the upper part) can be seen. Also in the case of MicroVec, where only smoothing was applied, artifacts from the hot pixels can be clearly seen. All the other participants used a background removal by subtraction of the mean, median or minimum image and additional smoothing which works reasonably fine. Dantec, DLR, IOT, LANL, LaVision, TU Delft and UniNa used special image treatments to reduce the effect of the depth of correlation. The underestimation of the center line displacement can be minimized by this treatment as shown by the slightly higher mean displacements in x-direction (indicated in the histograms in the third column in Figs. 5 and 6) in comparison with the teams which used only image smoothing.
The mean fluctuation flow fields for the x-direction are displayed in the second column of Figs. 5 and 6. As indicated in Table 5, some teams (LANL, UniG, UniNa) did not provide fluctuation fields. Compared with the mean displacement fields, much larger differences can be observed in the fields. The differences clearly illustrate the sensitivity of the velocity measurement on the evaluation approach  Mean displacement and mean rms displacement in x-direction (first and second columns) and histograms of the displacement (third column) for evaluation 1 and the need for reliable measures for the uncertainty quantification. In general, the fluctuations in x-and y-direction are in the same order of magnitude. Turbulence in microchannels is usually difficult to achieve, even if the Reynolds number based on the width of the small channel and the velocity of O(200) m/s is Re > 16,000 and thus above the critical Reynolds number. Therefore, it has to be kept in mind that this is not a fully developed channel flow. However, due to the temporal cavitation at the inlet, which also has an effect on the mean velocity in this area because of the blockage, fluctuations are expected in the inlet region of the channel. During the time when cavitation occurs, the velocity in that region is almost zero (although no tracer particles enter the cavitation), whereas it is about 40-50 pixels in cases where no cavitation is present. Also close to the channel walls, strong fluctuations are expected because of the finite size of the particle images. This is obvious because at a certain wall distance particles from below and above, which travel with a strongly different velocity, contribute to the signal. Consequently, virtual fluctuations become visible, caused by seeding inhomogeneities. The same effects appear in the thin shear layers behind the outlet of the channel. This is confirmed in the results where in the small channel and in the side regions of the jet strong Mean displacement and mean rms displacement in x-direction (first and second columns) and histograms of the displacement (third column) for evaluation 1 fluctuations can be found. If the mean rms displacement value for the different measurements is divided by the mean displacement, fluctuations of 20-30 % are reached. The majority of the teams found the largest rms values at the inlet region where the cavitation bubble forms and in the free jet region where the shear layers develop. The mean rms levels for DX rms are between 2.06 pixels (INSEAN) and 3.26 pixels (IOT). The values for TCD (5.05 pixels) are much higher than the values for the other teams and should therefore be taken with care.
Probably, artifacts due to the data processing close to the wall are leading to the high rms levels in the rounded corners shown by IOT, TCD, TSI. TUD, UniMe and the unphysical high rms levels at the walls of the small channel, which were not measured by the other participants. In addition, the data of TCD show very large rms values close to all walls, which is caused by the wall treatment. Moreover, the mean displacement profiles of TCD show outliers and the large rms values away from the walls are caused by these outliers.
LaVision applied a global histogram filter, only allowing fluctuation levels that do not exceed ±10 pixels from the reference values, the rms values therefore showed a mean value of only 2.16 pixels for the x-direction. IOT obtained the fluctuations by evaluating the shape of the ensemble correlation function following the approach of Scharnowski et al. (2012), which shows often a bit higher values than vector processing since no additional smoothing is applied. The same trend as just described holds also true for the rms displacement in y-direction.
The last column of Figs. 5 and 6 reveals the probability density functions (pdfs) for the displacement in x-and y-direction. The velocity pdfs are in good agreement among the different teams as small deviations cannot be resolved using pdf distributions. For DX a very broad peak around ≈ −2 pixels results from the large recirculation region. A second broad peak at ≈8 pixels stems from the mean channel flow. The very high velocities in the small channel do not result in a single peak, but show an almost constant contribution in the range of large displacements up to 40 pixels. Very large peaks of a single preferred velocity can be seen in the pdfs of INSEAN, MicroVec, UniNa and UniMe at zero velocity which result from cross-talk or hot pixels. Additional strong peaks for distinct displacement are visible for LANL and UniG in the low velocity region. The data of DLR also show a small broader peak at ≈35 pixels, which was not found by the other teams. The pdfs for the other teams are quite smooth and show the biggest differences in the gap between the first and the second broad peak.
The pdf of DY is centered around slightly negative values for the displacement, and again the agreement between the teams is very good. However, MicroVec and TSI show significantly larger values at zero displacement. The pdf for TCD is centered around zero.

Evaluation 2
For evaluation 2, the participants were free to chose the appropriate window size. All teams have used the same preprocessing (if applied) as in evaluation 1. About one quarter of the teams (Dantec, MicroVec, TUD, UniG) have chosen 32 × 32 pixel windows and the same processing parameters as in evaluation 1; therefore, the same data will be used for comparison.
Some teams only changed the window size. Namely, INSEAN, LaVision and UniMe lowered the final window size to 16 × 16 pixels. UniMe did not use the Savitzki-Golay filter and UniNa set the final window size to 11 × 11 pixels. For TCD the final window size was 128 × 32 pixels on a 32 × 16 pixel grid. The evaluation parameters, as far as they differ from evaluation 1, are described in the following and listed in Table 5.
The DLR team used a predictor field, computed using the pyramid approach of evaluation 1 stopping at a sampling size of 32 × 32 pixels on a grid with 8 × 8 pixel spacing. Each image pair was then processed individually by first subjecting it to full image deformation based on the predictor field. Then a pyramid scheme was used, starting at an initial window size of 64 × 64 pixels, which limits the maximum displacement variations to about ±20 pixels. The final window size was 24 × 24 pixels at a grid distance of 8 × 8 pixel spacing which was subsequently up-sampled to the requested finer grid of 2 × 2 pixels.
IOT employed an ensemble correlation with 2 × 2 pixel windows. The search area was 128 × 64 pixels. The correlation function was multiplied by four neighbors and then the peaks were preprocessed and a Gaussian fit was applied to determine the peak position for the mean displacement and rms displacements from the shape of the correlation peaks (Scharnowski et al. 2012).
LANL used for evaluation 2 a PTV method using a multi-parametric tracking (Cardwell et al. 2010). Particle images were identified using a dynamic thresholding method, which allows for the detection of dimmer particles in close proximity to brighter ones. Their position was determined using a least squares Gaussian estimation with subpixel accuracy (Brady et al. 2009). The multiparametric matching allows for the particles to be matched using not only their position but also a weighted contribution of their size and intensity between image pairs. To further increase the match probability, the particle search locations were preconditioned by the velocity field determined using an ensemble PIV correlation. The PTV data were then interpolated (search windows of 32 × 32 pixels) with inverse distance weighting to a 2 × 2 pixel grid. When three or less vectors are present in the corresponding window, the window size was increased by 50 %.
LaVision subdivided the images with respect to the modes with and without the cavitation bubble present at the entrance. By this procedure, the large fluctuations due to the velocity change at the wall should be completely avoided but also the effect of the cavitation on the mean flow field due to the contraction. For comparison to the other teams, both data sets were averaged, weighted by the number of images that belong to them. TSI used correlation averaging instead of individual correlation planes.
The mean flow fields for the x-component are shown in Figs. 7 and 8 in the first column. For TCD the amount of outliers was reduced, because of the enlargement of the window in x-direction. The mean displacement field for evaluation 2 of TCD looks much smoother now, although artifacts due to hot pixels can still be seen. The difference for TSI's results is that correlation averaging was used instead of individual correlations, which also tends to lower the velocity estimate by smoothing if normalized. IOT used ensemble correlation of 2 × 2 pixel windows from the beginning, instead of an iterative averaged correlation as did for evaluation 1. This results in a more noisy pattern. In general, the differences to evaluation 1 are not very pronounced in the mean fields. Regarding the velocity profiles, the differences among the participants are larger for evaluation 2 instead for evaluation 1 where processing parameters were fixed. This indicates that in principle all algorithms provide reasonable results, but a great difference is made by the choice of the parameters which are dependent on the users experience and intention.
The rms flow fields are shown in Figs. 7 and 8 in the second column. Please note that TSI and UniNa did not provide fluctuation fields for evaluation 2. Here larger differences are visible from evaluation 1 and evaluation 2 results as expected. For the DLR data, the levels of the rms values are smaller and focus on regions close to the channel walls and in the shear layers of the evolving jet flow. For INSEAN, the mean rms levels are larger, and also stronger signatures of regions of high rms values can be seen. For the data of IOT, very high and very low levels of rms values are visible. The patterns occur to be very noisy, which is probably a result of the smaller window size and thus a noisy correlation peak. Since this correlation peak is analyzed with a Gaussian fit function, the results of this function might show large fluctuations. The levels of the rms value for TCD's results are on the same order as found by the other teams, they are also much smoother than for evaluation 1, which is attributed to the larger window size. For UniMe, the rms values are above the other teams levels and even larger than for evaluation 1. Part of this behavior is probably due to different filtering schemes. This detrimental effect can also be seen in the subpixel rms displacement (not shown).
However, especially for INSEAN, the large value at zero subpixel displacement from evaluation 1 was significantly decreased.
To have a closer look to the results, the displacement along the centerline at y = 620 px is shown for evaluation 2 in Fig. 9. Lines "A" and "B" correspond to the horizontal profiles shown in Figs. 10 and 11, respectively. The gray shaded region indicates the small channel length. The agreement in the acceleration region is quite good for all participants. On the center line, the differences in terms of the standard deviation for DX are 0.81 px for x = 900 and 0.77 px for x = 1300 px, excluding the data of MicroVec. However, close to the small channel and in the inlet region, deviations occur due to the cavitation. On the center line, the data agree again fairly well within the small channel and in the free jet region. In this representation, differences can only be seen in the declaration region, where even the direction of the flow among the participants differs. Interestingly, the data of LANL for evaluation 2 appear smoother, although particle tracking was applied instead of cross-correlation. However, the mean velocities are very similar to slightly higher values for the PTV results, which might be attributed to the reason that usually only brighter in-focus particles are taken into consideration for the tracking (Cierpka et al. 2013).
In Fig. 10, the displacement in x-and y-directions for evaluation 2 is shown for a line with x = 900 px, which is in the channel but far away from the cavitation region. Significant differences can be seen near the walls, which is due to the different wall treatments and the ability of the algorithms to resolve the gradients close to the wall. As discussed above, the width of the channel changes due to the different approaches in the mask generation. If the volume flow rate is determined by an integration of the displacement along the wall-normal direction, differences of up to 28 % between the participants occur. Although the differences on the center line in x-direction are small in this representation, for the y-direction large differences can be seen on the right side of Fig. 10. Even the sign changes for different teams and the magnitude of the error can reach up to 7 px in some places. Figure 11 displays the ability to resolving large gradients the x-displacement after the small channel in the region of the jet. The high-velocity region in the jet was resolved by all participants. Again, in the middle of the channel, the results match very well. With increasing distance from the center line, the gradients show large differences not only in spatial location but also in magnitude. It seems that for example TSI and MicroVec used a considerable higher amount of smoothing which results in a smaller region of high momentum fluid in comparison with the other participants.

Conclusion
The analysis of case A shows that even after more than three decades of evaluation algorithm development the interrogation of 2D2C µPIV measurements is still challenging when strong spatial gradients and large dynamic velocity ranges or solid boundaries are present, and it is evident that a lot of knowledge and experience is necessary to produce reliable and correct results.
-In µPIV usually fluorescent tracer particles are used to separate the signal from the background using optical filters. Thus one may assume that preprocessing is not needed to enhance the signal-to-noise ratio. However, the analysis of case A shows that image preprocessing is important to avoid incorrect displacement estimations caused by typical camera artefacts such as cold and hot pixel or cross-talk between camera frames. The latter appears if the laser pulse separation is comparable to the interframing time of the digital cameras. The influence of gain variations for different pixels is usually lowered using image intensity correction implemented in most camera software. If it can not fully be balanced, the analysis shows that the different background

Fig. 7
Mean displacement and mean rms displacement in x-direction (first and second columns) and histograms of the displacement (third column) for evaluation 2 removal methods applied by the teams work similarly well. However, it is evident from the results provided by MicroVec and TCD that smoothing does not work effectively to eliminate fixed pattern noise.
-It is well known that the precise masking of solid boundaries before evaluating the measurements is a very important task. The masking affects the size of the evaluation domain and thus the region where flow information can be measured. Thus a precise masking is desirable to maximize the flow information. However, the present case shows that this is associated with large uncertainties. The estimated width of the small straight channel varied by more than 20 % among all submissions. This unexpected result illustrates the strong influence of the user. To avoid this user-dependent uncertainty, universal digital masking techniques are desirable to address this serious problem in future measurements.
-Another interesting effect could be observed by comparing the strongly varying velocity profiles across the small channel. Due to the spatial correlation analysis, the flow velocity close to solid boundaries is usually overestimated. However, as the spatial resolution effect was almost identical for all teams in the case of evaluation 1 the variation can be attributed to the special treatment of near-wall flow. While most teams did not use special techniques, variations are mainly caused by the masking procedure. In contrast, UniMe seems to resolve the boundary layer effect much better than the other teams. This was achieved by implementing the no-slip condition in the evaluation approach. Although this model-based approach shows the expected trend of the velocity close to the wall, it is evident that the results are strongly biased by defining the location of the boundaries. Unfortunately, the definition of the boundary location is associated with large uncertain- ties, as already discussed, and therefore, this modelbased approach is associated with large uncertainties. Therefore, it would be more reliable to make use of evaluation techniques which enhance the spatial resolution without making any model assumption such as single-pixel PIV or PTV evaluation techniques (Kähler et al. 2012b). However, as the uncertainty of these evaluation techniques is not always better than spatial correlation approaches, it is recommended to evaluate the measurements in a zonal-like fashion as done in numerical flow simulations, where RANS simulations are coupled with LES or even DNS simulations to resolve the flow unsteadiness at relevant locations. The zonallike evaluation approach minimizes the global uncertainty of the flow field if the near-wall region is evaluated using PTV (to avoid uncertainties due to spatial filtering or zero velocity assumptions at the wall) while at larger wall distances spatial correlation approaches are used (because the noise can be effectively suppressed using statistical methods).
-Furthermore, the precise estimation of the mean velocity is very important for the calculation of higher-order moments. The strong variation of the rms values illustrates that the uncertainty quantification is very important because the differences, visible in the results, deviate much more than expected from general PIV uncertainty assumptions. Here future work is required to quantify the reliability of a PIV measurement. -Since in microscopic devices the flow fields show large in-plane and out-of-plane gradients and are inherently three-dimensional, it is beneficial to use techniques that allow a reconstruction of all three components of the velocity vector in a volume .
6 Case B

Case description
Case B of the 4th International PIV Challenge deals with a time-resolved image sequence. The data were measured in a water tunnel at the Technical University Munich within the EU project AFDAR (Advanced Flow Diagnostics for Aeronautical Research), see also Schröder et al. (2015) and Kähler et al. (2016) for more details.
The test section of this facility is adopted to ERCOFTAC test case 81 "Flow over periodic hills" (see Fig. 12). For more details see ERCOFTAC QNET-CFD forum under case study UFR 3-30. and Rapp and Manhart (2011). The test section has 10 hills, and the measurements were performed downstream of the seventh hill summit as the flow is almost periodic at this location, e.g., the averaged flow quantities coincide half way between the neighboring hills. The seeding particles (glass hollow spheres with a mean diameter of 10 µm) were illuminated by a Spectra Physics Millennia Nd:YAG cw laser with a power of 5 W @ 532 nm . A Phantom v12 CMOS camera with a pixel size of 20 × 20 µm 2 at a recording rate of 2000 Hz was used for image acquisition. In total, a sequence of 1044 single-frame single-pulse images with a resolution of 1280 × 800 px each was provided to the teams. An example image with the corresponding histogram (100 images have been considered) of the gray value distribution is shown in Fig. 13. As typically observed for this kind of high-speed image sequences, the SNR is rather low and the particle image diameter is small due to the large pixel size, see . At the bottom and top wall, laser light reflections occur which complicate the evaluation near the wall. Furthermore, it can be seen in this figure that the particle image intensities depend on the x-location which is influenced by the laser light intensity distribution.
With the experimental setup described, the question arises, which scales can be resolved. With a magnification factor of 0.209 mm/px, an interrogation window of 32 × 32 px has a size of 6.7 × 6.7 mm 2 in physical space. An estimation of the Kolmogorov microscales assuming u = 0.5 m/s, L = 0.15 m and ν = 1 · 10 −6 m 2 /s gives the following estimations. The rate of energy dissipation ǫ per mass unit is This leads to a Kolmogorov length scale of and to a Kolmogorov timescale τ η of ( The Taylor scale g is obtained by The Kolmogorov length scale cannot be resolved using the experimental setup. This is typical for most of the PIV measurements since the field of view is usually relatively large in order to observe the large-scale flow structures. However, the spatial resolution is in the order of the Taylor scale which means that most of the vortices which occur in the turbulent spectrum can be resolved.
The specific challenges of case B are as follows: • Low signal strength and signal-to-noise ratio • Small particle image size due to the large pixel size of the camera • Large dynamic velocity range (DVR) • Large turbulent intensities • Strong out-of-plane motion due to 3D effects (turbulence, separation) • Laser light reflections at the walls.

Evaluation of case B
The participants had to perform two different evaluations. Evaluation 1 is a standard PIV evaluation, and evaluation 2 is more advanced.

Evaluation 1
For evaluation 1, the participants were forced to use the images B[i − 1].tif and B[i + 1].tif for computing the displacement fields at time steps [ i ], with i = 11 . . . 1034 . The displacements obtained from this evaluation are divided by 2 in order to obtain the displacement from one time step to the next time step. The final interrogation window size was 32 × 32 px with 93.75 % overlap (2 px vector spacing). Techniques such as multi-pass, window weighting, image deformation and masking were allowed.

Evaluation 2
For evaluation 2, the teams were free to choose any images to obtain the displacement fields at time steps [ i ], with i = 11 . . . 1034. Techniques such as multi-pass, window weighting, image deformation and masking were allowed, and the teams had to select an optimal interrogation window size based on their experience and attitude. However, the vector grid was specified by a spacing of 2 px to allow for a comparison with evaluation 1. The participants were also encouraged to apply advanced multi-frame evaluation techniques, as first proposed in . Within the third International PIV Challenge (Stanislas et al. 2008), it was already shown that multi-frame approaches are able to reduce the uncertainty strongly.

Participants and methods
The participants listed in Tables 7 and 8 contributed to case B. A short description of the various methods applied is also given in the tables. As can be seen, most of the

Results
In order to compare the evaluations of the teams, different plots are shown in the following. At first, instantaneous flow fields are presented. Afterward, PDFs of the displacements are shown in order to investigate the peak-locking effect and to see the differences between evaluation 1 and 2. Line plots of mean velocities and rms are shown Dantec, eval2 The estimated displacements at the upper wall show some differences. The high gradient does not seem to be captured in a meaningful physical sense. Depending on the mask and boundary condition, the displacements drop down to 0 at a location near to the wall. More details concerning the near-wall gradients are displayed in later line plots.
In comparison with the displacement component dx of evaluation 1, the displacement component dx of evaluation 2 is smoother in most of the evaluations, see Fig. 15. This is a result of the noise reduction due to the advanced evaluation techniques which also will be visible in the results shown in the following.
Histograms of the displacements dx are shown in Figs. 16 and 17.
It has to keep in mind that for, e.g., evaluation 1 the displacements obtained from the correlation of two images with a temporal distance of 2 · t has been divided by 2 in order to get the virtual displacement from one time step to the next one. Therefore, peak locking is observed in these figures for some teams at locations x = i · 0.5 px, with i ∈ Z. Concerning the peak locking, there is a strong difference between the participants. While some curves show nearly no peak locking, some other have a severe peak locking for evaluation 1 as well as evaluation 2. However, for many teams, the peak locking is strongly reduced or even vanishes by using the advanced evaluation technique. This results in an increased accuracy. The optical flow evaluation of INSEAN agrees well with the cross-correlation evaluations of, e.g., Dantec and DLR.
To evaluate statistical properties, line plots of the displacements in x-direction, dx, are shown. The plots for the displacement dy in y-direction do not differ much so that they are not presented here.
In order to see whether there are differences between the participants regarding the statistical values, the mean displacements of dx along the line x = 100 px are shown in Figs. 18 and 19 for evaluation 1 and evaluation 2, respectively. The dashed lines in these figures visualize the location of the walls. The hill is located at y = 271 px and the upper wall at y = 764 px. The figures indicate that the mean values agree well between the participating teams, in contrast to case A. Only slight differences are observed near the wall. One important reason for these differences are the masks which have been applied. The comparison between evaluation 1 and evaluation 2 also shows only small differences. Due to the averaging process, stochastic errors do not play a role and no systematic errors occur. This result is quite encouraging as it means that all methods shown here are suited to obtain mean displacement fields with comparable accuracy. The rms of the displacements dx along the line x = 100 px is shown in Figs. 20 and 21. Compared with the mean displacement fields, the scatter of the data is higher. However, many teams provided quite similar results. Looking at subplot 1 in Figs. 20 and 21, it can be seen that the scatter between the teams for evaluation 2 is smaller than the scatter for evaluation 1 (not considering the IOT-PTV evaluation). The conclusion can be drawn that a proper evaluation by means of an advanced evaluation technique reduces the random error. However, not every advanced technique is able to do this, as can bee seen when comparing the subplots 3 in Figs. 20 and 21. In these cases, the scatter between the teams for evaluation 2 is larger than the scatter for evaluation 1, showing that the effect of the user is larger than the differences due to the specific evaluation techniques applied here.
A temporal Fourier transform has been performed in order to determine the frequency content in the evaluated signals. At every vector location 1024 time steps are available which have been acquired with an frequency of 2000 Hz . Such a temporal signal of, e.g., the displacement dx was used to perform a FFT. An average of the resulting amplitude spectrum was computed over the region x = 300 . . . 700 px; y = 500 . . . 700 px (20301 vectors/FFTs in total). The resulting mean FFTs are shown in Figs. 22 and 23 for evaluation 1 and evaluation 2, respectively.
For evaluation 1, the curves show a similar behavior, except for IOT. In the postprocessing, the teams performed a temporal filtering which leads to the suppression of high-frequency fluctuations. Comparing the results of the other participants shows that even at high frequencies the amplitude of dx does not significantly decrease. This is expected as these high-frequency fluctuations are caused by the noise. However, from the physical point of view such high frequency should not occur, cf. the discussion on the Kolmogorov scales in Sect. 6.1 and considering the averaging effect due to the final interrogation window size. The curves of evaluation 2 show a decreasing amplitude with increasing frequency for some teams who apply sophisticated evaluation techniques, which means that the measurement accuracy is increased. Typically the high-frequency oscillations are damped by locally increasing the particle image displacement and thus reducing the relative measurement error.
In order to determine the spatial frequency content in the evaluated signals, a spatial Fourier transform was performed at location x = 572 px to calculate the amplitude spectrum. An average over the available 1024 time steps was computed in order to obtain a smooth spectrum. The resulting plots are shown in Figs. 24 and 25 for evaluation 1 and evaluation 2, respectively.
Using top-hat quadratic interrogation windows, the locations of the roots in these plots are theoretically determined from the sinc function which is the Fourier transform of the rectangular function. This leads to root locations at frequencies f = n IW with n ∈ {1, 2, 3, . . .} and IW being the size of the interrogation window in pixels. For many teams, strongly decreased amplitudes are observed at these frequencies. However, this is not the case for all teams for different reasons. One of these reasons is, using window weighting which decreases the mentioned effect. A second reason is the method of postprocessing which was applied by the participants. Depending on the kind of postprocessing, a filtering/ smoothing effect is obtained in this step which leads to a modification of the spatial frequency spectrum. A third reason may be that some participants did not perform the evaluation on a grid with size 2 × 2 px. Rather they performed the evaluation on a coarser grid and interpolated the data on the fine grid. However, this effect should be negligible since as least the low spatial frequencies are not affected by this method.

Conclusions
The results of case B lead to the following conclusions: -The discussion shows that results from evaluation 1 are quite consistent, but the results from evaluation 2 show significant differences, in particular when the rms fields are considered. This implies that the results depend significantly on the user and his/her experience and intention. This holds in particular true for the parameters of the preprocessing, the evaluation as well as the postprocessing. Thus it is dangerous to use PIV as black box because the application of the technique still requires substantial expert knowledge and experience. -The analysis also shows that the uncertainty of the technique can be greatly reduced by making use of multiframe evaluation techniques because they evaluate the temporal history of particle image trajectories. However, in order to benefit from the temporal analysis of the particle pattern, different aspects in an sophisticated evaluation method have to be considered. In the case of strongly oversampled image sequences, as provided here, a damping of the amplitudes at higher frequencies is acceptable from the physical point of view. However, caution is advised not to suppress physically relevant frequencies. A smooth field in space and time does not necessarily mean a high evaluation accuracy. -Another strong advantage of multi-frame evaluation techniques is the reduction in bias errors due to the peak-locking effect. How a displacement at a certain location is obtained by a sophisticated evaluation method determines the level of the peak-locking reduction. Choosing an optimal temporal separation between correlated images at a certain location may lead to an increased relative measurement accuracy and will Fig. 25 Mean spatial amplitude spectrum of dx at x = 572 px for evaluation 2 reduce peak locking. However, if a displacement at a certain location is determined not only from a single correlation but from a kind of (weighted) average from many correlations with different temporal separations, the peak-locking effect can be averaged out in principle. Finally, the multi-frame analysis makes it possible to compensate also acceleration and curvature effects, which is important for highly accurate measurements in strongly unsteady flows.

Case description
In the 3D scenario, the spatial resolution is notoriously difficult to be predicted, as it depends on several parameters. The particles concentration and the reliability of the algorithm for the reconstruction of the 3D particles distribution play a key role. Generally, a compromise is required between dense particles distributions and sufficiently accurate reconstruction. To date, the trade-off is left to the personal experience of the experimenter. The purpose of the Test case C is to assess the performances of the reconstruction and processing algorithms in terms of final spatial resolution and accuracy. The test case is performed on synthetic images of a complex flow field. The optimization of the particle image density in terms of maximum spatial resolution with minimum loss of accuracy is the main challenge of this test case.
The particles are randomly located in a 48 × 64 × 16 mm 3 region. Particles are also generated outside of the measurement region (in a volume extended by 24 and 32 mm on both sides on the x and y direction, respectively) in order to reproduce the experimental conditions of a laser slab of 16 mm thickness illuminating the measurement region. A small displacement is imposed to the outer particles in order to avoid trivial suppression by images subtraction. Projected images of 4 cameras (1280 × 1600 pixels at 16-bit discretization; pixel pitch 6.5 µm) are provided with image density ranging between 0.01 ppp (particles per pixel) and 0.15 ppp. Sample images for low, medium and large density are reported in Fig. 26. The cameras are angled at ≈±35 • in both directions (deviations from these values are due to accommodation of the Scheimpflug angles while maintaining the imaged volume centered in the images). The magnification in the origin of the reference system is about 0.15 for all the cameras and images are generated with f # = 11, thus resulting in a particle image diameter of about 2.5 pixels.
The projection images in this set are not contaminated by noise. Perfect calibration images (as well as the exact correspondence between space and image coordinates of the calibration markers in form of a lookup table) are provided. Since the calibration can be achieved with very small residual error, the effects of calibration uncertainty (vibrations or thermal expansion in the mechanical system for target displacement, loose connections, etc.) are not included in the analysis. Removing the influence of preprocessing on image quality restoring and of the effectiveness of the self-calibration (Wieneke 2008), in correcting calibration uncertainties within the camera set, might appear as an oversimplification with respect to the real experimental environment. This line has been a necessary choice given the early stage of volumetric PIV and the novelty of fully three-dimensional three-component test cases in the PIV Challenge. The interpretation of the results greatly benefits from this simplification. Effects of image corruption (distortion, background noise, reflections, etc.), calibration uncertainties and more realistic experimental conditions might be the object of test cases in future challenges.
The flow field is composed of a combination of 3D vortices with different wavelengths and amplitudes, a periodic pseudo-jet and step changes on velocity components. With this compact benchmark, it is possible to evaluate simultaneously the impulsive response of different processing algorithms, the modulation effects of ghost particles on the spatial resolution and the random errors induced by poor reconstruction or PIV processing. The details of the pieces composing the measurement volume are reported below: 1. 3D vortices are obtained as a combination of sines and cosines in the three directions, with wavelengths varying between 1.6 and 16 mm, corresponding to 32 and 320 voxels at 20 vox/mm. The amplitude is about 1 voxel and varies with the wavelength and the seeding density. 2. A periodic pseudo-jet with sinusoidal profile is obtained with a combination of shortwaves, varying between 8 and 1 mm (160 and 20 voxels at 20 vox/mm) in y and z, and longwaves (along x) cosines. The maximum amplitude is about 2 voxels and varies with the wavelength and the seeding density. 3.
Step variations of the velocity fields are a consolidated test to obtain the impulsive response of the processing algorithm. Here it is expected that the PIV processing influences mainly the slope of the filtered stepwise variation, while the quality of the reconstruction determines the upper and lower values of the step. A poor reconstruction will be more affected by ghost particles, thus resulting in smoother velocity gradients. 4. A marker is included to identify the image density chosen by the participant. The marker consists in a region of constant displacement, which has been easily caught by all the participants.

Algorithms
A total of 13 contributors participated to test case C. The algorithms can be decomposed in 4 steps, i.e., optical calibration, reconstruction of the particles distribution, displacement field estimate and postprocessing. In the two central sections of the processing chain, the algorithms are quite scattered (Tomographic PIV, PTV, surface segmentation, etc.) and in some cases there is cross-talk between the steps (Motion Tracking Enhancement MTE, Shakethe-box StB, etc.). In Table 9, the algorithms used by the participants are briefly outlined, with very little pretense of being exhaustive. For further information, the reader should refer to the references listed at the end of the paper. In the attempt to isolate the effects of the displacement estimate from those of the reconstruction process, an additional participant (Hacker) has been included. The Hacker team analyzed the exact particle distributions with a traditional iterative volume deformation algorithm, as outlined below. The adopted process is of simple interpretation, thus leading to quite straightforward extraction of the effects of the PIV analysis and of the reconstruction from the processing chain. The submission of University of Melbourne was considered withdrawn for both cases C and D since the data processing was affected by an error in handling the calibration files. The team TSI requested to not disclose their results for both cases C and D after the final results presentation in Lisbon.

Reconstruction
In order to have a fair comparison of the reconstruction algorithms, the participants have been asked to submit a reconstructed volume at a reference image density of 0.075 ppp. From this point on, unless otherwise stated, the reference resolution will be 20 vox/mm. The reconstruction has been performed by the participants on a larger volume; the depth of the volume is increased from 16 to 24 mm (320-480 voxels at 20 vox/mm). Particles positions in the reconstructed volumes are determined by identification of local maxima on 3 × 3 × 3 voxel kernels. A sub-voxel accuracy position estimate is obtained by Gaussian interpolation on a 3-point stencil along the three spatial directions. A particle is identified as "true particle" if located at a distance smaller than 2 voxels (i.e., 0.1 mm at 20 vox/mm) from a particle of the exact distributions.  The quality of the reconstruction is quantified with the following metrics: -Number of true and ghost particles (N T , N G ); -Quality factor, as defined in Elsinga et al. (2006b), i.e., the correlation factor between the true and the reconstructed intensity distributions, as defined in Eq. 5: where I ijk and I e ijk are the intensity values of the reconstructed and the exact distributions, respectively. Since each participant used its own grid (different voxel origin, resolution, etc.), in order to evaluate Q all the reconstructed volumes are interpolated on a common grid by using third-order spline interpolation; -Mean intensity of true and ghost particles ( I T , I G ); -Weighted levels WL and power ratio PR, as defined in Eqs. 6 and 7: The number of true and ghost particles is reported in Fig. 27 for the full volume and for the central part, obtained by cutting 300 voxels in the x and y directions from the borders. The participants are sorted by increasing percentage of ghost particles in the full volume. For all the participants except LaVision and IoT, the particles are directly identified on the provided volumes. Since the volumes submitted by LaVision and IoT were saturated on the top intensity level, it was not possible to perform the same operation on the original volumes. For this reason, the analysis is conducted on the interpolated volumes used also for the Q factor calculation. The values are expressed in percentage of the total number of true particles in the exact distribution.
In some of the provided reconstructions, there is a significant percentage of true particles which is lost in the process (see for instance ASU, LANL). This might be addressed by different reasons. ASU uses a direct method which tends to lose edge in case of overlapping particles. LANL uses the same direct method, but in addition particles are removed to match the image density. This procedure could be dangerous since the intensity distributions of true and ghost particles might not be statistically separated, and thus the weakest true particles might be erroneously removed. Additionally, note that the highest values of Q are achieved by the participants using a Gaussian smoothing during the reconstruction process, as proposed by Discetti et al. (2013). While this smoothing is not expected to change significantly the number of ghost and true particles, it regularizes the solution and redistributes the intensity from the reconstruction artifacts to the true particles, thus improving the quality of the reconstruction. In almost all the cases, the number of ghost particles is larger than that of the true particles, with the exceptions of DLR and LaVision. There is little dependence on the used algorithm. For instance, ASU, which is using a direct method for the reconstruction, has nearly the same number of ghost particles of TUD, which is instead adopting an advanced algorithm with MTE-MART, and of BUAA, which is performing the reconstruction with a single-frame implementation of a MART-based method. It is remarkable, on the other hand, the difference between LaVision and IoT, which are using very similar processes for the reconstruction, the main differences being the adopted resolution (40 and 20 vox/mm, respectively) and the smoothing applied by LaVision between the iterations. However, since LaVision and IoT are both using MTE, there is an influence Fig. 27 Percentage of true (left) and ghost particles (right) for the full volume and for the central part obtained cutting the borders of the quality of the PIV interrogation on the reconstruction itself, which cannot be easily isolated from the provided volumes.
The differences between the algorithms are more evident if the intensity of true and ghost particles is taken into account, as in the weighed levels or in the power ratio (see Fig. 28). These parameters are extremely relevant, as they indicate the relative importance of real and ghost particles in determining the estimated velocity field. In particular, the power ratio expresses in a synthetic form the "crosscorrelation power" of the particles within the reconstructed volumes. Consequently, a high power ratio can be obtained with a small number of ghost particles, or with weak intensity ghost particles (when compared with the true ones), or both. If only the central portion of the volume is considered (i.e., the red columns in the histograms of Fig. 28), algebraic methods are superior in this sense with respect to direct methods. On the other hand, algebraic methods are more prone to border effects, as testified by the drop in weighed intensity levels and power ratio for BUAA, IPP, LaVision, TUD when considering the full volume. Very good results are also obtained by the StB method (DLR), which is an hybrid between PTV and Tomographic PIV. While traditional triangulation methods are severely challenged for image density larger than 0.005 ppp (Maas et al. 1993), the combination of 3D particles matching with iterative intensity-based correction can easily handle large image densities.
The quality factor evaluated according to Eq. 5 is reported in Fig. 29. Participants using PTV-based methods (DLR) cannot be included. With only the exceptions of LaVision and ONERA, all the participants provided reconstructions with Q < 0.75, which is commonly considered as the minimum value for an acceptable reconstruction. It is worth to stress that Q is evaluated on the volumes at 0.075 ppp, while some of the participants decided to use a lower image density to extract the velocity fields (ASU, IPP); thus, the quality factor might be larger. Again, algebraic methods suffer due to border effects while direct methods are relatively insensitive to this issue.
The comparison of the results in Figs. 27, 28, 29 poses a shade of doubt on the effectiveness of the quality factor in describing the accuracy of the reconstruction. For instance, Dantec has larger quality factor than TUD, but at the same time higher number of ghost particles and lower power ratio. Similarly, LANL has lower number of ghost particles than ASU, similar power ratio but lower Q; this might be related to true particles suppression in the LANL reconstruction algorithm. For this reason, when performing numerical assessment of reconstruction methods, the Q factor might be appealing for its intrinsic simplicity, but caution should be taken at the time of drawing conclusions.
In Fig. 30, more insightful information on the reconstruction is reported for the teams ASU and TUD. The Quality factor (as defined in Eq. 5) for the full volume and for the central part obtained cutting the borders comparison is particularly meaningful, since ASU used a direct method for the reconstruction, while TUD used an algebraic method with the motion tracking enhancement. On the left column, a 120 × 120 voxels slice (obtained by summing 20 adjacent planes in the center of the volume) is reported, with the reconstructed particles in gray levels and the true particles superimposed with green circles. The particles reconstructed by TUD appear sharper and better centered; the smoother particles shape in ASU's results is due to the geometrical limitations of direct MLOS-like methods. The profiles obtained by summing the intensity along x − y-planes in the reconstructed volumes (normalized with the relative maximum value of the profile itself) are reported in the central column. A significant difference between direct and algebraic methods is that in the second case the intensity profile tends to drop in correspondence of the limits of the illuminated volume, while it increases again toward the boundaries. This peculiar feature might be addressed by the update mechanism of the iterative MART, which tends to accumulate ghost intensity in regions that are observed only by a subset of the cameras, or where the viewing angle becomes larger, thus leading to larger probability of particles overlapping and higher local image density (e.g., the volume corners). Finally, in the right column the histograms of ghost, true and exact particles distributions are reported (normalized with the mode of the intensity of the exact particles). While the number of ghost particles is similar (see Fig. 27), the intensity distribution of the true particles reconstructed in TUD is skewed toward larger intensity values, thus leading to the conclusions that iterative algebraic methods act significantly on the particles intensity but very weakly on the ghost particles number.

Displacement field
The displacement fields are interrogated in order to extract the following information: -3D modulation transfer function (MTF) as a function of discrete wavelengths vort in the regions of the 3D vortices; -2D MTF as a function of the wavelength jet in the region of the sinusoidal jet; -Cutoff wavelengths where the MTFs drop below 0.8; -Total, random and systematic errors (respectively indicated with δ, σ, β) -Wavelengths at which the total error becomes larger than 0.25 voxels at 20 vox/mm; -Profile of the response to the step function: step amplitude and equivalent wavelength corresponding to the slope of the detected step variation.
The total error is illustrated in Fig. 31 on x − z-slices, cutting 4 significant regions of the displacement field: sinusoidal jet; small-scale 3D vortices and marker; large-scale 3D vortices; step functions. Most of the participants using algebraic methods (IPP, LaVision, ONERA, TUD) and PTV-based approaches (DLR, ONERA-PTV) are able to capture the large wavelengths region of the sinusoidal jet (on the right side of the relative slices). Almost all the participants are able to capture the largest 3D vortex in the center of the third slice, while errors become larger as the scales get smaller due to spatial resolution limitations. This effect seems to be less severe for participants enjoying higher quality of the reconstruction (see for instance DLR, IPP, LaVision, ONERA, TUD).

3D vortices
The response function is evaluated in terms of MTF, i.e., as the ratio between the measured amplitude and the true one (it is supposed that along each direction the MTF is the same and therefore an average is reported). The results are shown in Fig. 32 as a function of the non-dimensional spatial frequency ω = W / , where W is the reference interrogation volume side of 32 voxels used by the Hacker team. The horizontal black line (cutoff line) indicates the limit at which the MTF drops below 0.8; the cutoff wavelength is extracted from the intersection point between the cutoff line and the MTF curve (obtained with cubic spline interpolation of the discrete evaluation data). For participants whose MTF is always below 0.8, a cutoff wavelength of 320 voxels is set by default. The MTF of Hacker, which is operating on the exact particles distributions, follows with quite high fidelity a sinc function shape, as expected considering that the PIV process is performed without applying weighting windows. For all the participants, the MTF drops down monotonically toward 0 at ω = 1, with the exception of the MTF of BUAA which is characterized by quite large oscillations. The best performances are achieved by DLR, IPP, LaVision, TUD, whose MTF is close to unity for a quite extended set of wavelengths. A similar analysis can be performed on the total and the systematic error, as in Fig. 33 where the errors are plotted as a function of the dimensional wavelength. In this case, the limit to estimate the cutoff wavelength is fixed at 0.25 voxels. Similarly to the case of the MTF, for participants whose errors are always above the threshold a cutoff wavelength of 320 voxels is set by default. The systematic error is directly linked to the MTF, while the total error includes the effects of random errors, and thus the difference between them is linked to the random errors. At large wavelengths, the systematic error should go to zero, and this is true for most of the participants. Only two teams (DLR and LaVision) have a smaller total error at relatively large wavelengths ( > 100) than the Hacker team, and this effect is connected to the small level of ghost particles in their reconstructions.
The cutoff wavelengths obtained by analyzing the MTF, the total and the systematic error are reported in Fig. 34. It is interesting to compare the obtained wavelengths with the interrogation volumes size used by the participants (see Table 10). The effective interrogation volume dimension is not directly linked to the interrogation volume size: For instance, the best results are obtained by DLR, IPP, LaVision, ONERA and TUD which used interrogation volumes size ranging between 24 and 47 voxels. Results from participants with smaller interrogation volume size are characterized by significantly larger cutoff wavelengths (thus lower spatial resolution). Another interpretative view of the histograms in Fig. 34 resides in observing the difference between the cutoff wavelengths corresponding to the systematic and total errors, which is related to the random error value. While IPP and ONERA reach levels of spatial resolution similar to the ones of DLR, LaVision and TUD, the cutoff wavelength computed on the total error is substantially larger, thus highlighting a more intense contamination of the data with random errors. This is particularly meaningful for IPP, which is using the largest interrogation volume size of the set of participants; a larger interrogation volume does not lead necessarily to lower levels of random error (other factors might play a role, such as the quality of the reconstruction and the shape of the reconstructed particles, the intermediate predictor/corrector filtering in the PIV interrogation process (Astarita 2007), etc.). It is worth noticing that, even if analyzing the exact volumes, the Hacker team does not provide the best data of the set. However, as expected, the cutoff wavelengths of the total and systematic errors are similar, thus showing that the dominant part of the error is due to the chosen approach that results in a relatively limited spatial resolution. Concluding, in order to achieve the best performances an accurate reconstruction is important, but at the same time quite some edge can be obtained with careful PIV processing.

Jet
In the left region of Fig. 31, the displacement field is that of a sinusoidal jet with wavelength and amplitude varying along the streamwise direction (x in the reference system of the calibration). Similarly to the 3D vortices case, the ratio of the measured and exact amplitude can be interpreted as a measure of the MTF. The results are reported in Fig. 35 as a function of the spatial frequency W / (where W is again the reference volume side of 32 voxels used by the Hacker). When compared with Fig. 32, the results are significantly noisier. The reason is to be sought in the algorithm for the MTF calculations, which is obtained by optimization over the entire block (320 × 320 × 320 voxels) of periodic 3D vortices in one case, and over y − z-slices of 320 × 320 voxels in the jet case. For almost all the participants, the MTF is consistently lower than the test case of the 3D vortices; this might be also due to the position of the block containing the jet region, which is located on the top boundary of the volume. Consequently, the algorithm suffers due to border effects of the tomographic reconstruction. Hybrid PTV-algebraic methods seem to be less affected by this issue (see, for instance, DLR and ONERA-PTV, whose MTF is substantially unchanged with respect to the previous case). The teams that obtained a cutoff wavelength smaller than the Hacker for the 3D vortices (DLR, IPP, LaVision, ONERA, TUD) confirm their good performance.
In Fig. 36, the cutoff wavelengths obtained from the analysis of the MTF and the total and the systematic error (with cutoff at 0.25 voxels) are reported. In contrast to the 3D vortices case, in which for most participants the systematic error was dominating only at the smallest wavelengths, β is the overwhelming component of the error over the entire set. The most significant exceptions are provided by DLR, IPP, ONERA and TUD, in which the effect of the random error on the cutoff wavelength shift is relevant. The best performances are obtained by teams using the MTE in the reconstruction step (LaVision, TUD) and the hybrid Tomo-PTV approach of DLR, which in a sense enjoys both the exposures to improve the quality of the reconstruction. However, the traditional single-frame reconstruction approach still obtains good results when a careful PIV processing is performed (see IPP, ONERA). Surprisingly, ONERA-PTV is able to reach a satisfactory spatial resolution despite of using a lower image density (0.03 ppp). Indeed, even if analyzing volumes with roughly one half of the particles, the mean distance between the particles increases only of 26%. Evidently, while the community is pushing toward larger image density, moving to lower image density is still advantageous in terms of reconstruction quality gain.

Step function
Step variations of the u and w component are imposed, with an amplitude of the order of 1 voxel and slightly depending on the image density. This test enables to evaluate the combination of the bias effect of the ghost particles (which is expected to smear out the step difference) and of the PIV processing on the spatial resolution. The procedure to extract an equivalent wavelength is illustrated in Fig. 37. First, the minimum and maximum values of the step are calculated by taking the average being careful to remove boundary effects. Then, the profile of the step is interpolated with a cubic spline, and the points at a distance of 10% the step height from the minimum and the maximum are identified (Fig. 37, left). Finally, a line passing through these points is drawn to extract the equivalent wavelength (Fig. 37, center). Some of the participants pushed the limit of the spatial resolution, thus obtaining step profiles contaminated by the Gibbs phenomenon. In cases of overshooting larger than 5% the step height, to evaluate the equivalent wavelength the distance between the two extreme points is used (Fig. 37, right).
The step profiles on the u and w component are reported in Fig. 38. The profiles are normalized, so that the minimum is 0 and the maximum is 1. In both cases, the Hacker reaches the correct extreme values, since the processed volumes are not contaminated by reconstruction artifacts. A similar performance is achieved by DLR and LaVision, thus testifying the high quality of their reconstructions. On the other hand, the profiles are affected by significant overshooting. Good results in terms of step height are also obtained by IPP, ONERA, ONERA-PTV and TUD. For the case of ONERA-PTV, the step on the w component is affected on the right side by undetected outliers, which might be due to local lower particles concentration. The cases of TUD and IPP are peculiar, since on the u or w component the overshooting is significant, while on the other component it is smaller. The teams ASU, BUAA, Dantec, IoT and LANL results are characterized by step amplitude smaller than 90% of the exact one, which can be mainly attributed to a lower quality of the reconstruction.
The equivalent wavelength (extracted with the procedure of Fig. 37) and the random error σ around the maximum and minimum values of the step are presented in the form of scatter plot in Fig. 39. Both quantities are presented in voxels at the reference resolution of 20 vox/mm. The best performances are achieved by teams placed in the bottomleft corner of the scatter plots (DLR, LaVision, ONERA, ONERA-PTV). ONERA-PTV does not appear on the right scatter plot due to the large error induced by the undetected outliers. Some participant's results are characterized by very different corresponding wavelengths on the two steps (for instance IPP and TUD); the reason is to be sought on the effect of overshooting, which is present only on one of the two components. For almost all the other participants, the equivalent wavelength obtained from the two steps is consistent.
The level of random error is reported, together with the total and the systematic errors, in the histograms of Fig. 40. The participants are sorted by the total error level on the step for the u component. In this test, the effect of the ghost particles level is fundamental in offsetting the maximum and the minimum of the steps, and therefore the quality of the reconstruction is extremely relevant. It comes with no surprise that the best performances are achieved by the teams using both the exposure to enhance the reconstruction and by the ONERA-PTV team which is using a lower image density. Furthermore, the margin on PIV processing seems to be less important, since Hacker is providing very good results even without using advanced high-resolution methods.

Conclusions of test case C
In summary, the analysis of the results of case C raises interesting points on both the particles distributions reconstruction and the estimation of the velocity fields.
-Artificial additional operations (apart from triangulation/algebraic reconstruction) are beneficial when using both the exposures, as in the MTE-based reconstruction algorithms or in the shake-the-box method. Methods based on removing particles via thresholding appear less effective due to the superposition between the intensity statistical distribution of true and ghost particles. -As a general guideline, since the quality of the reconstruction might have dramatic effects on the final outcome of the process, it is recommended to avoid pushing toward high particle image densities. Reducing slightly the particle image density below the typically used value of 0.05 ppp, at the present state, might be very rewarding in terms of reconstruction quality while turning down only weakly the spatial resolution. Nevertheless, careful analysis of the reconstructed distributions can still provide some edge, since some participants were able to perform better than Hacker even if not working on the same exact volumes.  The measurement of the underlying cascade turbulence mechanism at sufficiently high Reynolds number is a challenge for both computational and experimental fluid mechanics in terms of required spatial and dynamic velocity range. The establishment of accurate three-dimensional three-component anemometry techniques is of fundamental importance to both characterize the evolution and organization of the turbulent coherent structures and to provide a benchmark for numerical codes. In the test case D, a 3D PIV experiment is simulated imposing the velocity field of a direct numerical simulation (DNS) of an isotropic incompressible turbulent flow. The velocity fields and particle trajectories are directly extracted from the turbulence database of the group of Prof. Meneveau at John Hopkins University (http://turbulence.pha.jhu.edu/); details on the database and how to use it are provided by Perlman et al. (2007) and Li et al. (2008). The main challenge of the test case D is to evaluate correctly the velocity gradient components in the presence of many different length scales within the flow field. The originating data set consists in a periodic grid of 1024 3 points, spanning a domain of 2π × 2π × 2π. The particles are generated in a region corresponding to 2π × 0.25π × 0.17π , discretized with 4096 × 512 × 352 voxels at 20 vox/mm (thus leading to 204.8 × 25.6 × 17.6 mm 3 ). This discretization level, which is fully arbitrary, corresponds to 4 voxels for each point of the DNS grid. Details on the relevant turbulent scales, expressed in the original DNS domain, in the simulated physical domain and in the voxels space, are reported in Table 11. As in the previous case, spurious particles are generated outside of the measurement region in order to reproduce the experimental conditions of a laser sheet of 17.6 mm thickness illuminating the measurement region. In this case, no coherent motion is imposed, so that spurious particles randomly appear within the exposures. Projection images of 4 cameras (4512 × 800 pixels  at 16-bit discretization; pixel pitch 6.5 µm) are provided with fixed image density (0.10 ppp). The participants have access to projection images corresponding to 50 snapshots, but the result (velocity and vorticity vectors) is required only for one image each three of them (snapshots 2, 5, 8, ..., 47). The imaging systems are set to f # = 11 and magnification in the origin of the reference system of 0.15, as in the previous case C. Cameras are arranged to cover an angle of ≈ ±35 • with respect to the x = 0 plane and ≈ ±18 • with respect to the y = 0 plane. Similar to test case C, the projection images are not contaminated by noise and perfect calibration images (as well as the correspondence between space and image coordinates of the calibration markers in form of list) are provided.

Algorithms
A total of 8 teams (with the additional Hacker team and after withdrawing the submission of University of Melbourne and of TSI) participated in test case D. The algorithms used by the participants for this test case are summarized in Table 12.

Reconstruction
The participants have been requested to provide the reconstructed volume corresponding to the snapshot 23. As in test case C, the reconstruction has been performed by participants on a larger volume; the depth of the volume is increased from 17.6 to 24 mm (352-480 voxels at 20 vox/ mm, which has been used by all the participants and will be considered the reference resolution from this point on). The procedure for the analysis of the volumes (extraction of intensity profile, estimate of the number of true and ghost particles, etc.) and the adopted metrics are the same outlined in Sect. 7.4. The number of true and ghost particles (normalized on the number of particles in the exact distributions) is reported in Fig. 41. In this case, the central portion of the volume is obtained by cutting 300 voxels on the left and the right side, and 100 voxels on each border on the y and z directions. For some of the participants (IPP, LANL), the number of identified true particles is less than 70 %. For LANL, it is reasonable to suspect similar issues as to those of test case C. For IPP, the percentage of true particles is substantially lower than that of case C. This might be attributed to the used reconstruction algorithm (BIMART), which might be more prone to true particles cancelation at larger image density. The best performances in terms of ghost particles suppression are achieved by DLR and LaVision. This result is achieved by intense exploitation of the temporal information: the shake-the-box method used by DLR tracks particles over the several exposures and optimizes particles positions and trajectories with an intense cross-talk between trajectory predictions from previous steps and comparison with the camera projections; LaVision applies the MTE on sets of 7 exposures. With the exceptions of DLR and LaVision, in all cases the ghost particles outnumber the true ones. The impact of ghost particles can be measured by their relative intensity with respect to the reconstructed ones. In Fig. 42, the weighed intensity levels and the power ratio (according to Eqs. 6 and 7) are reported. Even though BUAA, LANL and TUD have a similar number of ghost particles, the effect on the power ratio is significantly different. The lower power ratio achieved by LANL is due to the low percentage of reconstructed true particles, while the gap between BUAA and TUD is mainly related to the intensity distributions of true and ghost particles.
Some interesting considerations can be drawn observing the quality factor (Fig. 43). The best performance according to this metric is obtained by TUD, even though the reconstruction by LaVision is characterized by a remarkably Fig. 37 Procedure to estimate the equivalent wavelength. Left selection of minimum and maximum limit. Center calculation of with a line passing through the points of maximum and minimum. Right penalization procedure in case of overshooting of more than 5% lower ghost particles number and a substantially higher power ratio. By inspecting the reconstructions, it can be inferred that the quality factor of the reconstructions by LaVision is severely affected by the particles size, which is larger than the exact solution. IoT achieves a good quality factor despite of the extremely high number of ghost particles and the low power ratio; furthermore, ASU, LANL and IPP reach a similar quality factor but with relevant differences in the estimated number of true and ghost particles, as well as their intensity distribution. As outlined in test case C, the quality factor is appealing, but its interpretation is not always as straightforward as it might appear at a first glance.

Turbulent flow features
The exact displacement field corresponding to the snapshot 23 of the sequence (obtained by correlating the exposures 22 and 24) is represented in Fig. 44. A contour of the Fig. 38 Step on the u (top) and w (bottom) displacement components. The step height is normalized to set the minimum at 0 and the maximum at 1. The z coordinate is expressed in voxels u component of the displacement field is reported on the slice z = −5.2 mm. The vortical structures are visualized with the Q criterion, where Q is the second invariant of the velocity gradient tensor, defined as:

Fig. 39
Scatter plot of equivalent wavelengths and random error (in voxels) of the participants for the step on the u (left) and w (right) displacement component  where ∇ · V is the divergence of the velocity field and tr (∇V ) 2 is the trace of the square of the velocity gradient tensor. In order to obtain a fair comparison, Q is normalized with Q w , i.e., the average value of the second invariant of the antisymmetric part of the velocity gradient tensor Ω: where the angular brackets indicate the operation of spatial averaging over the entire domain. The results obtained by the teams are illustrated in Fig. 45. From a qualitative inspection, all the participants are able to capture with good approximation the features of the u component of the velocity field (even if with different degree of smoothing). Large differences arise on the side of the vortical features.
The results obtained with the shake-the-box method by DLR seem to be the closest one to the exact displacement field illustrated in Fig. 44. A slightly lower level of detail is achieved by LaVision and TUD, which share extremely similar results even though the features of the reconstructed field are quite different and the PIV processing is performed with the Fluid Trajectory Correlation (Lynch and Scarano 2013) on different sequence lengths (11 steps and 7 steps for LaVision and TUD respectively). Most of the relevant vortical features are extracted also by IoT and IPP, even though for IoT the data are corrupted by strong edge effects; the other teams obtain more filtered vortical structures but are still able to capture the main features of the velocity distributions.

Temporal history
When analyzing time-resolved data, achieving temporal coherence can improve the spatial resolution, reduce the measurement uncertainty and allow for the calculation of temporal derivatives of relevant physical quantities. For instance, the computation of the Lagrangian acceleration is crucial when performing pressure field measurements with Tomo-PIV data. The point with coordinates [−49.2, 2.4, 0] mm is considered as a reference to observe the temporal coherence and resolution of the data submitted by the participants. The choice of the point is fully arbitrary, and it is based only on the observation of a structure characterized by high vorticity magnitude passing through the point in the center of the time sequence. The reference point is indicated in Fig. 46 on the contour of the vorticity magnitude on the slice z = 0 mm. The vorticity magnitude is expressed in mm/mm to underline that it is actually calculated on displacement, not velocities. Thus, the quantity indicated with the term "vorticity" is non-dimensional. Postprocessing LaVision Pinhole camera model Tsai (1987) 14 SMART iterations after initialization with MLOS followed by 5 MTE iterations each composed of 14 SMART (Novara et al. 2010). MTE performed on 7 objects at 20 vox/mm. Gaussian smoothing The temporal profiles of the three components of the displacement field and of the vorticity magnitude in the reference point are plotted in Figs. 47 and 48. DLR follows with quite high fidelity the temporal variations of velocity and is the only team recovering the maximum value of intensity of vorticity, even though the descent after the vorticity peak is slightly anticipated. Teams using temporal information in the cross-correlation process (IPP, LaVision, TUD) or in postprocessing (IoT) obtain, in general, good spatial resolution but may partly sacrifice the temporal one. The temporal coherence is improved using multi-frame methods, but with quite different results (see DLR, IPP, LaVision, TUD). Moreover, the shake-the-box method seems to be superior to the traditional tomographic PIV approach in both spatial and temporal resolution. Figure 49 shows the comparison of the spectra between the different teams and the reference one. The spectra are calculated using the snapshot 23. The spectrum is presented in a compensated form, obtained by multiplying it by the square of the wavenumber 2π/ . This form has the advantage of magnifying the spectral behavior at the medium and small scales. With the exception of DLR, the spectra of  Quality factor (as defined in Eq. 5) for the full volume and for the central part obtained cutting the borders all the teams are characterized by a windowing effect that is typical of a sinc function. This effect is more stressed on the spectrum of the Hacker team due to the relatively large interrogation volume size (1.6 × 1.6 × 1.6 mm 3 , i.e., 32 × 32 × 32 voxels at 20 vox/mm). Since the algorithm used by the Hacker is a simple top hat moving average approach, the impulsive response is a sinc with the first lobe closing at 2π/ ≈ 3.9 mm −1 and the second one at 2π/ ≈ 7.8 mm −1 . Some of the teams are able to follow only the first few points of the spectrum. The significant underestimation of the spectral energy over the entire spectral range is most likely attributed to the poor reconstruction quality (thus, consequently, larger smoothing effect due to the ghost particles coherent motion). A similar effect is observed to less extent in the spectrum of IoT. In this case, there is a recovery of spectral energy at the largest frequency; however, this is most likely attributed to random noise due to the irregular true particles shape or due to incoherent ghost particles, especially considering that the spectrum for large wavenumbers is decaying with a smaller slope than the exact one well before the "sinc" lobe. The teams DLR, IPP, LaVision and TUD obtained a spectrum which is close the reference one up to 2π/ ≈ 1 mm −1 , but the behavior at the small scales is quite different. The spectra of LaVision and TUD are characterized by a steep decay at the large scales, which might be due to the smoothing applied within the iterations or in postprocessing. In order to quantify the spectral energy underestimation at large and medium scales (and possibly correlate it to the quality of the reconstruction), the spectral energy fraction (SEF) is defined as the ratio between the integral of the compensated measured spectrum and the reference one:

Turbulent spectra and spatial resolution
where k = 2π/ is the wavenumber and k is a reference wavenumber up to which the compensated spectrum is (11) SEF = k 0 k 2 E 11 dk k 0 k 2 E 11,exact dk integrated. In order to reduce the effect of the random noise distortion on the spectrum, the integration is carried out up to k = 1 (corresponding to = 6.28 mm = 125.6 voxels). The results are reported in Fig. 50 in form of scatter plots to correlate the SEF with the quality of the reconstruction or the power ratio. The best results in terms of SEF are obtained exploiting the temporal information both in the reconstruction (shake-the-box, MTE) and in the displacement estimation (FTC, multi-frame tracking). The comparison between the performances of IoT and IPP is very interesting, IPP exploits the temporal information only in the PIV processing with the FTEE (Jeon et al. 2014), while IoT uses two exposures in the reconstruction (MTE) and exploits the temporal information only in the postprocessing step with a weighted average of consecutive fields. According to the SEF, using a multi-frame principle in the PIV processing pays off more than applying a similar principle in the reconstruction. However, this conclusion cannot be generalized, as it is related to the implementation details of single participants. The scatter plots of Fig. 50 highlight that the correlation between spatial resolution and quality of the reconstruction stands quite weakly, according to the chosen metrics. Starting from the top-right corner (i.e., high quality or power ratio and high SEF), 3 groups can be identified: DLR, LaVision and TUD, exploiting in full the temporal information in both reconstruction and displacement estimate; BUAA, IoT and IPP, using a traditional tomographic PIV approach, with some use of temporal information either in the reconstruction or the PIV or none of them; ASU, LANL using direct methods. However, drawing any conclusion on the achievable accuracy in the spectrum estimate by simple observation of the quality of the reconstruction is risky. For instance, BUAA obtains a lower quality factor than ASU and LANL, but still reaches a much larger SEF and spectral resolution, similar to that of the Hacker (see Fig. 49). An estimate of the quality of medium and small-scale measurement can be provided by observing the distribution of the second invariant of the velocity gradient tensor Q. Quantitative information is inferred by computing the correlation factor F Q between the distribution on the data submitted by the contributor Q c and that of the exact data Q e : The results are reported in Fig. 51 in form of scatter plots to correlate F Q with the quality of the reconstruction or the power ratio. In contrast to the SEF case, which was calculated mainly on large and medium scales, there is a significant difference between the performances of the shake-the-box method (DLR) and the multi-frame reconstruction-correlation with MTE and FTC (LaVision and TUD). Furthermore, the gap between BUAA, IoT and IPP is reduced; this is possibly an indication that, unless the full time information is used in the entire process, the differences on small-scale measurement when using algebraic methods are not relevant. From the scatter plots of Fig. 51, the same groups of Fig. 50 can be highlighted. Similarly to the case of the SEF, there is a correlation with the quality of the reconstruction, but it is quite weak. BUAA, IoT and IPP provide very similar results in terms of F Q with substantially different performances in the reconstruction. In order to achieve a perceivable improvement of the performances, a significant reconstruction quality/power ratio jump is required.

Measurement accuracy assessment
In the attempt to estimate the uncertainty of 3D PIV in turbulent flows measurement, in Fig. 52 the total error on velocity is presented in the form of scatter plots with the quality of the reconstruction and the power intensity ratio. The differences between results of the teams are quite large. Again, multi-frame reconstruction and displacement estimation provide a significant advantage, with a measurement total error of about 50-70 % of that achieved with a standard two-frame Tomo-PIV processing. In the scatter plots of Fig. 52, the same three groups highlighted in the previous section can be identified. It is therefore confirmed that direct methods are not suitable to achieve reasonable measurement accuracy at this seeding density. The scatter between participants using similar processes in terms of reconstruction features is reduced when considering the total error (see the cases of LaVision and TUD, or the cases of BUAA, IoT and IPP). However, some questions still remain on whether these techniques are standardized or not. Finally, the shake-the-box method confirms its superiority with respect to the processing algorithms based on algebraic techniques and iterative volume deformation.

Fig. 45
Displacement field (snapshot 23) for the set of participants. Contour of the u component (see Fig. 44 for the colorbar) of the displacement field on the slice z = −5.2 mm; vortices visualization with the Q criterion (Q/Q w = 1) Another important feature of 3D data consists in using physical principles to assess the measurement uncertainty, or possibly use physical constraint to reduce the error. In this test case, the zero divergence principle can be invoked since the flow regime is incompressible. In Fig. 53, the pdf of the divergence of the data provided by the participants is reported. Generally, the divergence is normalized with a reference shear rate identified in the velocity field (for instance, the shear rate in the shear layer of a jet or a wake). In the case of isotropic turbulence the reference shear selection is less trivial. The divergence has been locally normalized with the velocity gradient tensor norm (∇V : ∇V ) 0.5 . Among the participants, DLR and IoT provided data with the lowest level of divergence. However, in both cases a divergence minimization procedure has been implemented (DLR imposed a penalization on residual divergence when interpolating data from particles tracking to the required grid; IoT postprocessed the data with a gradient correction technique). LaVision and TUD are the teams with the best performances in terms of residual divergence if restricting the analysis to teams who did not perform optimization of the divergence field in postprocessing.
The scatter plot of Fig. 53 demonstrates that a correlation between the normalized divergence and the measurement error exists, but with some words of caution. While teams that used standard two-frame or multi-frame implementation of tomographic PIV based on iterative algebraic methods (BUAA, IPP, LaVision, TUD) are quite well aligned in the scatter plot (with the exception of IoT, which modified the divergence in postprocessing), the teams working with direct methods obtain results with low level of divergence when compared with the total error. Furthermore, ASU and LANL used similar processing algorithms and resulted in very similar performances according to the several adopted metrics, while in this case the standard deviation of the divergence is remarkably different. Therefore, the divergence criterion might have some utility in assessing the accuracy performance for algebraic methods in 3D PIV (and, of course, under the condition of not being tricked). However, its use on data obtained with direct method is less advisable.

Conclusions
The introduction of the time coherence and the use of a test case based on a simulated turbulent field have opened new interesting scenarios for discussion. The main results are summarized below: -In general, it is difficult to find a correlation between power ratio, quality factor and the different metrics introduced for the error (spectral energy fraction, factor of correlation of the second invariant of the velocity gradient tensor, error, etc.). However, it can be stated that there is a broad correlation according to how refined the full algorithm analysis is. -The divergence test is interesting, unless it is not "tricked" with algorithms that artificially reduce the divergence in the measured fields. The relationship between measurement error and standard deviation of divergence is approximately linear within a range of relatively small total error (up to about 0.5 voxels). However, optimizing the measured velocity fields using physical criteria seems beneficial in some cases. -The poor reconstruction performance of direct methods affects the resolution at the medium-large frequencies, but the main flow field features are still captured even if the provided images were characterized by a relatively large particle image density. This is due to the limited velocity gradients along the depth direction. However, in such situations direct methods can be very appealing for providing a predictor or a very fast preview of the reconstructed flow field due to their intrinsic simplicity. -The exploitation of the time coherence both in the reconstruction and in the displacement estimation considerably improves the spatial resolution. As predicted, enforcing the time coherence in both reconstruction and displacement estimation provides the best results. Additionally, an intense cross-talk between the two steps of the process is very beneficial (e.g., in the shake-the-box method). Furthermore, PTV-based methods with oriented particle search and refinement seem to overcome classical algebraic reconstruction + cross-correlationbased methods, at least on synthetic images, without noise.
9 Case E

Case description and measurements
Case E aimed to evaluate the state-of-the-art for planar stereoscopic PIV measurements using real experimental data. The experiment that generated the case E data was performed at Virginia Tech's Advanced Experimental Thermofluids Research Laboratory (AEThER Lab) and was based on time-resolved measurements of an impulsively started axisymmetric vortex ring flow. The experimental images acquired were purposefully subject to real-world sources Right scatter plot of the F Q and the power intensity ratio of error that are representative of stereo experiments and that are difficult to model in simulated images. These error sources included focusing effects, diffraction and aberration artifacts, forward and backward scattering intensity variation, sensor noise, and calibration error. Data were acquired simultaneously from five cameras in such a way that any combination of two out of the five cameras could yield a stereoscopic measurement data set. Hence, the experiment delivered a range of data sets with different levels of accuracy including a case for which an optimum stereoscopic configuration was approximated. In addition, the data set was processed as a three-dimensional tomographic PIV measurement that was then used as a reference for evaluating the accuracy of the stereo-PIV measurements. This comparison was based on the notion that a tomographic PIV system measures the three-dimensional velocity field within a laser sheet more accurately than traditional stereo-PIV measurements (Michaelis and Wieneke 2008;Wieneke and Taylor 2006;Elsinga et al. 2006a).
The case E flow field consisted of a laminar vortex ring with a Reynolds number based on diameter and piston velocity, of approximately Re = 2300. The vortex ring was generated using a hydraulic actuation to smoothly translate a piston-cylinder arrangement with a predetermined stroke distance and piston velocity program. The fluid was then ejected through a circular orifice. A schematic of the experimental arrangement is shown in Fig. 54a. Polystyrene fluorescent particles with a mean diameter of 27 µ m (Duke Scientific Corporation catalog number 36-5B) suspended in water were used as flow tracers. The field was illuminated  Left and center Pdf of the divergence (normalized with the velocity gradient tensor norm). Right scatter plot of the standard deviation of the normalized divergence and the total error using a dual head 10 mJpulse527 nm Nd:YLF laser that was synchronized with all cameras. The particle images were acquired as sequential single-pulse photographs recorded at 1000 Hz. The camera array consisted of three Photron FASTCAM APX-RS cameras aligned vertically and two Photron FASTCAM SA4 cameras placed on either side of the three APX-RS cameras, as is shown in Fig. 54b. Cameras 2 and 4 used Scheimpflug adapters to account for the off normal angle between the cameras and the laser sheet. Since Cameras 3 and 5 were also not normal to the laser sheet and did not have Scheimpflug adapters, the lens aperture was decreased to ensure that the particles remained in focus. The optical axis of Camera 1 was normal to the laser sheet, and thus all the particles were in focus with the lens aperture fully open.
A high resolution, low error estimate of the velocity field had to be created from the tomographic data to compare with the participant stereo-PIV velocity field submissions. This "ground truth" solution was used to complete the error analysis; however, since the data were collected from an experiment, the true velocity field within the physical fluid was unknown. Thus while the ground truth solution represents a more accurate measurement of the true field compared to what can be provided by stereo-PIV, it is still subject to measurement errors.
Calculating the ground truth solution from the five camera PIV data required several steps: First, a high accuracy tomographic intensity field reconstruction is calculated; second, a three-dimensional velocity field is calculated from the intensity field; third, the intensity field is used to identify the coordinates of the laser sheet; and 4th, the three-dimensional velocity field is interpolated onto the two-dimensional laser sheet coordinates. The diagram in The three-dimensional intensity field was calculated using LaVision DaVis 8.1.6. The images were preprocessed using background subtraction and particle intensity normalization. A third-order polynomial camera calibration (Soloff et al. 1997) was fit to the calibration grid images and then refined using volumetric self-calibration (Wieneke 2008). The three-dimensional intensity field was then reconstructed using 10 MART iterations (Elsinga et al. 2006b). Following this, the intensity field was used to calculate the three-dimensional velocity field and the position of the laser sheet.
The three-dimensional velocity field was calculated by performing volumetric pyramid robust phase correlations (Sciacchitano et al. 2012;Eckstein et al. 2008;) with iterative window deformation using code implemented in MATLAB. The pyramid correlation technique requires determining an optimum number of frames over which to take the correlations. The optimal number of frames is calculated by determining the largest frame separation that can be used to calculate correlations before the loss-of-correlation effects become too significant. To accomplish this, four loss-of-correlation sources were investigated: the material acceleration of the particles, the shear rate within the velocity field, the out-of-plane motion and the maximum particle image displacement within the correlation window (Sciacchitano et al. 2012). Graphs giving the optimal number of frames for the 50th image pair over the entire velocity field are shown in Fig. 56. The optimal frame number was then determined by finding the minimum number of frames over the full flow field for all four loss-of-correlation sources. Using these estimates, the optimal frame number was determined to be equal to 4 frames, with the material acceleration dominating the other three sources.
Once the optimal frame number was determined, the velocity field was calculated using three passes of the pyramid robust phase correlation method. The PIV processing parameters used for calculating the ground truth velocity field are given in Table 13. The next step in calculating the ground truth velocity field was to determine the precise location of the laser sheet within the three-dimensional reconstruction. This is important for two reasons. First, the three-dimensional velocity field must be interpolated onto the two-dimensional coordinate grid to yield data comparable to stereo data. Second, stereo-PIV measurement error is decreased by performing self-calibration; however, the self-calibration process maps the velocity field coordinate system from the calibration grid coordinate system onto a different coordinate system based upon the laser sheet position. Thus the coordinate system used for measuring the velocity field in stereo-PIV and the coordinate system used for measuring the velocity field in tomographic PIV are different. Physically this difference is due to the fact that the coordinate plane of the calibration grid and the plane of the laser sheet may not be parallel and may not intersect along a line including the calibration grid origin. The transformation between these two coordinate systems corresponds to a rotation and a translation. This transformation changes the velocity components, but does not change the velocity magnitude Spatial contours depicting the optimal number of frames for each point in the flow field to use in calculating the pyramid correlation for the case E data set. Based upon the flow field and processing properties, four loss-of-correlation sources were investigated: a material acceleration, b shear rate, c window size and d out-of-plane motion. The optimal frame number is determined as the minimum of the optimal number of frames over the full flow field for all four lossof-correlation sources nor any physical quantities derived from the velocity field such as forces or stresses. The effect of this transformation will typically be quite small, and in this case, applying the transformation caused the velocity vectors to change by a smaller magnitude than the expected PIV uncertainty. Thus the transformation negligibly changed the ground truth solution, but it was included for completeness. The location of the laser sheet was calculated by summing the tomographically reconstructed intensity fields over all frames to produce a time-averaged intensity field and then fitting a plane to the peak of the average intensity field. Once the precise location of the laser sheet within the reconstruction was determined, the velocity field was transformed onto the laser sheet coordinate system and interpolated onto the specified coordinates given to the participants using a cubic interpolation algorithm to yield the ground truth solution. The velocity components of the ground truth solution are denoted by U * true , V * true , and W * true .

Evaluation of case E
A set of 100 images from Camera 1 and Camera 3 was provided to the participants for evaluation. The images were saved in an uncompressed 16-bit TIFF format with a resolution of 1024×1024 pixels. Furthermore, calibration images were provided showing a grid translated to seven different Z-axis positions with a spacing of Z = 1 mm. The grid consisted of two levels spaced 3 mm apart containing a rectilinear grid of 3.2 mm diameter dots. The dots on the grid were spaced 15 mm apart both horizontally and vertically. The participants were required to submit a 2D3C (twodimensional/planar, three-component) velocity field measured at 121 × 121 vector locations spaced X = Y = 0.45 mm apart. Vector fields from the central 50 image pairs were requested allowing the participants to perform multi-frame processing methods; however, the participants were free to choose any processing method to produce the velocity vectors. In addition, the participants were asked to provide the vorticity field evaluated at the same coordinate grid as the velocity field as well as estimates of the vortex circulation calculated at the 50 time steps. The vorticity and circulation results showed negligible variation between the participants and provided little additional insight. Therefore, for the sake of brevity, a discussion of these results is omitted here.

Results
Five participating teams listed in Table 14 responded to the case E PIV challenge. In addition to these teams, a set of four cases was provided for comparison. These "hacker" cases were processed internally using a priori knowledge about the optimum processing. The data used to create these cases were divided into two subcategories: (i) the image set from Cameras 1 and 3 that matched the data provided to the participants and (ii) the image set from Cameras 2 and 4 that corresponded to a near optimum stereo optical arrangement. For each subcategory, two stereo reconstruction approaches were considered using (a) a geometric (Willert 1997) and (b) a generalized (or calibration) reconstruction (Soloff et al. 1997). These four cases, listed in Table 15, serve as an additional reference for comparison. Finally, Table 16 summarizes the processing parameters chosen and reported by each team as well as the "hacker" cases. Some similarities and differences between the processing choices should be mentioned. First, all participating teams applied a stereo self-calibration although the exact details of each approach were not reported. All participants except Dantec employed a form of multi-frame processing for dynamic range enhancement. All teams that performed multi-frame processing, used a form of the pyramid correlation except LANL that employed the fluid trajectory correlation (Jeon et al. 2014). The velocity component fields U * , V * and W * submitted by the participants were then used to calculate the errors with respect to the ground truth solution.
Using the ground truth volumetric reconstruction velocity estimation, the absolute mean error for each of the measured velocity vector components was calculated as U * = |U * − U * true |, V * = |V * − V * true | and W * = |W * − W * true |. The results are shown in Fig. 57a using bars to represent the mean absolute error and whiskers to show the corresponding standard deviation of the error. For all cases, the in-plane velocity errors were below 0.05 pixels-with DLR and IOT showing marginally lower errors compared to the other teams and approximately  The out-of-plane W component results exhibit more pronounced differences with errors varying from approximately 0.05 pixels for the HC24 and HG24 cases to approximately 0.12 pixels for Dantec and LANL cases. DLR and IOT produced the lowest errors approximately equal to 0.07 pixels while LaVision produced data with errors around 0.1 pixels. The amplification of the error of the out-of-plane component is anticipated and is captured by the calculation of the error ratios e r (U * , W * ) = µ(�W * )/µ(�U * ) , and e r (V * , W * ) = µ(�W * )/µ(�V * ). The error ratios of the participant data are shown in Fig. 57b and compared against the theoretically expected ratios based upon the geometric configuration of the cameras. The theoretical error ratio for Cameras 2 and 4 equals 1.2 and for Cameras 1 and 3 equals 2.9. Considering the processing parameters employed by each participant (Table 16) and the results of Fig. 57, it appears that the in-plane accuracy was only affected by the choice of the multi-frame processing algorithm. The results also show that choice of the camera calibration model had relatively little impact on the error since the data from DLR and IOT produced nearly identical errors, although these teams used pinhole and polynomial calibration models, respectively. Moreover, the decision to use either the geometric or the generalized reconstruction algorithms did not have a noticeable impact on the results since neither method produced significant differences in the measured errors in either the participant or the hacker data. The out-of-plane velocity is a function of the in-plane velocity, the stereo reconstruction method and the camera calibration function. The in-plane velocity error was found to be similar for all teams and the choice of the stereo reconstruction algorithm and camera calibration model appeared to not affect the out-of-plane velocity error. However, differences in the out-of-plane error and error ratios are still apparent among the teams. Thus it is possible that these differences can be attributed to the choice of the stereo self-calibration procedures employed by the teams.
A more thorough characterization of the underlying statistics of the error magnitude is shown in Fig. 58 which shows plots of the error distributions and the statistical median, mode, and quartile values of the errors. The error distributions qualitatively follow a log-normal behavior with the IOT and DLR distributions showing a narrower spread and almost converging on the hacker cases. The spread in the error distribution, which scales with the RMS error, is wider for LaVision, Dantec and LANL, in increasing order. The mode of the error is the highest for Dantec. While in-plane errors contribute to the total error shown in Fig. 58, the differences in the error distributions are primarily driven by the increased error in the out-of-plane velocity component. The near-optimal stereo configuration given by the hacker cases HC24 and HG24 yielded the tightest error distributions with mode values approximately equal to 0.05 pixels and with 50% of the error population residing between 0.05 and 0.10 pixels. The error distributions produced by the DLR and IOT data resulted in similar modes and lower quartiles; however, the upper quartiles of the DLR and IOT data were increased to approximately 0.15 pixels. This increase in the error can possibly be attributed to the elevated errors observed within the vortex ring viscous core region where the high shear and particle rotation significantly impact the accuracy of both the in-plane and out-of-plane velocity estimations. This effect is shown in Fig. 59 which compares representative velocity error maps for both the in-plane and the out-of-plane velocity components between the DLR and the HG24 cases.

Conclusions
The results of the error analysis of the participant submissions and the hacker cases show several important considerations with respect to stereo-PIV experiments. The most pronounced observation that emerged from the analysis is that collecting and processing high-quality planar 2C2D velocity field data will yield high-quality 3C2D data. This process includes designing the experimental setups to collect PIV data in as optimal conditions as possible-for example, placing the cameras in an optimal stereo configuration was shown by the hacker cases HC24 and HG24 to produce lower out-of-plane errors. In addition, using the most advanced PIV algorithms available to process the two-dimensional data-in particular the multi-frame methods-yielded lower in-plane and out-of-plane errors. The error analysis also showed that the choice of camera calibration model and the choice of the stereo reconstruction methods have relatively little effect on the quality of the data, as long as the in-plane velocity error was already low. There was relatively little optical distortion in the data collected for the case E experiment; however, in more complicated experiments, using a higher-order camera calibration model such as the third-order polynomial model may still be beneficial. Furthermore, while camera self-calibration could not be directly addressed by case E, the results show that performing an accurate self-calibration is vital to producing high-quality stereo reconstructions and thus it is important to ensure that collected calibration data are accurate. The result analyzed herein suggests that the self-calibration may be the most critical element in the stereo-PIV processing based on the current state of the method. 10 Case F

Introduction
Images of particles with "known" displacement are important in estimating the error associated with PIV image analysis. While synthetic images are often used for this purpose (Kähler et al. 2012b), images of real particles are physically more accurate if the particle displacement field is precisely predetermined. For Case F images are obtained by recording the particles in a liquid column rotating at a steady uniform angular velocity. In this flow field, any velocity component in a Cartesian coordinate is proportional to the linear distance, and vorticity is uniform everywhere.

Method
A plexiglass cylinder 10 mm deep × 65 mm in diameter was filled with water/glycerin solution containing silvercoated hollow-glass particles (S-HGS-10, 10 µm, Dantec). The cylinder was then covered with a glass plate (#62-606, BK7, 1/4" Edmund Scientific) to allow optical access through the glass without distortion of the image due to deformation of the liquid surface. The cylinder was placed on the turntable of an audio record player that spun at a constant speed of ω = 33 + 1/3 rpm (=10π/9 rad/s). After a few minutes, the flow inside the cylinder reached steady solid-body rotation. A laser beam produced by an Nd-YAG laser (EverGreen 70, 2 × 70 mJ, 15 Hz, Quantel) was expanded by a cylindrical lens to form a laser light sheet approximately 4 mm thick. The light sheet illuminated a horizontal cross section of the fluid in the cylinder, and particles were imaged on a CMOS camera (Fastcam SA3, 1024 × 1024 pixels, Photron) through a lens (AF Nikkor 28-105 mm, Nikon). The focal length and F-number were set at 180 mm and 22, respectively, and thus the dimensions of the field of view were 38 × 38 mm. Trigger signals to determine the timing of camera exposure and laser pulse were sent from a timing generator (Type 9618, 8 channels, Quantum Composers). While exposure time of the camera was set at 8 ms, a frame-straddling scheme allowed to set time intervals between the two laser pulses fired during each frame period at t = 4 ms. A total of 1000 image pairs were acquired in 100 s. While flow inside the cylinder was expected to be stationary relative to the frame rotating around the center of the turntable, a negligible amount of convection remained. As a result of this convection, there was uncertainty in the displacement field involved in the particle image. It was identified as follows. The double-pulse laser was fired at t = 0 and t = t rev , where �t rev = 2π/ω was identical to the time period of one revolution of the cylinder, and particle images were exposed on two independent image frames. Particle displacement between the images would be zero everywhere if the convection were absent; actual displacement, however, was δ rev = 0.53 mean ±0.32 s.t.d. pixels. Thus, the uncertainty of displacement in the particle images of t = 4 ms interval used for this study was δ = δ rev �t/�t rev = 1.2 · 10 −3 mean ±0.7 · 10 −3 s.t.d. pixels.

Evaluation of the image
A single pair of particle images was stored on a secure web server, and participants in the lecture room at the 4th International PIV Challenge could download them beginning around noon that day. One of the particle images evaluated is shown in Fig. 60.
Here, the rectangular image coordinates (x, y) are applied. The particle image diameter was typically 2 or 3 pixels. Participants were allowed to evaluate the images using any commercial software or their own codes running on a laptop or remote computer. They were, however, asked to perform the evaluation completely by themselves, without interacting with others, to prevent outcome bias. Participants were also asked to compute particle displacements on grid-point spans in a rectangular region of (32, 32)-(992, 992) in pixel units with 8 pixel intervals in both directions, for 121 × 121 grid points in total. A total of 29 contributors submitted their evaluation results before the 2:30 PM deadline that day. Contributors and their evaluation parameters are listed in Table 17. Identifiers designating individual contributors begin with 'F' followed by a two-digit number to ensure contributor anonymity. Figure 61 shows typical displacement vectors of the particle image pair for the evaluation.

Results and discussion
Particle displacement is theoretically expressed as Here, the coordinate (x c , y c ) represents a location at the center of the rotation, which was determined based on the evaluated results. Deviation of the evaluated x-component particle displacement from its theoretical one, x − x theory , was averaged along x and is presented in Fig. 62. This represents the bias error of the measurement. Based on the patterns in this figure, we divided contributors into four different groups, called Groups A-D. The first 13 contributors (i.e., F01-F13), designated Group A, had deviations with relatively small amplitude, while the next 6 contributors (up to F19), termed Group B, showed a pattern of sinusoidal oscillation with much larger amplitude. The period of the oscillation is approximately = 146 pixels, in which the particle displacement reduces to |�y | = ω�t = 2 pixels, referring to Eq. 13. Astarita and Cardone (2005) showed that the periodic behavior of bias (13) �x theory = − ω · �t(y − y c ) �y theory =ω · �t(x − x c )   error with a period of 2 pixels appeared as if a symmetric iterative window offset had been applied. This is consistent with the results shown in Group B, in which all contributors used a symmetric offset. Astarita and Cardone (2005) also showed that the bicubic and bilinear interpolation schemes create a bias error of 0.022 and 0.053 pixels, respectively, under use of the iterative image deformation method. This bias error reduces to 0.0025 pixels in the case of sinc with 8 × 8 pixels and to 0.012 pixels in the case of the thirdorder B-spline. Such a reduction tendency of the bias error was found in the current results; i.e., the Group A interpolation scheme based on sinc or B-spline was used by 9 contributors with 2 unknown and 1 no-interpolation-applied, while the bilinear or bicubic interpolation scheme was used by 4 contributors with 1 B-spline and 1 no-interpolationapplied in Group B. The amplitude in both groups was still much larger than the bias error predicted by Astarita and Cardone (2005), who relied on ideal synthetic images without noise or loss of particle. Group C, consisting of contributors F20-F24, presented a similar wavy pattern but with half the period, and the corresponding reduction in displacement was y = 1 pixel. The wavy pattern found in Group C obviously reflects the peak-locking phenomenon (Raffel et al. 2007;Westerweel 1997), where particle displacement tends to be biased toward a discrete integer value. As mentioned above, a 1and 2-pixel period in the variation of bias error appeared under the use of asymmetric and symmetric window offset. This is consistent with the F20 and F24 in Group C contributors, who used an asymmetric window offset. F22 applied a symmetric window offset, but evaluated a particle image that was doubled in both width and height by bilinear interpolation prior to interrogation, and then halved after the interrogation. Thus, the 2-pixel period created right after the interrogation might be halved. Another contributor, F21, who showed a 1-pixel period with an antiphase to others also used symmetric window offset, but no explanation can be given. F23, who did not give information about window offset, showed a relatively large error. While most others used a Gaussian function for correlation peak fitting, F24 used a parabolic function, which is known to produce a larger bias error Adrian and Westerweel (2011). The results of the last five contributors (F25-F29), identified as Group D, exhibited a much smoother pattern over the domain. They used spatial low-pass filtering for data postprocessing, which the contributors in the other groups did not apply. Interrogation with the symmetric offset is equivalent to a velocity calculation using a central difference scheme, which has second-order accuracy and is advantageous to the asymmetric offset; this in turn is equivalent to a forward or backward difference scheme with first-order accuracy Wereley and Meinhart (2001). Such a difference in accuracy is evident in Fig. 63, which shows the deviation x − x theory averaged along y. F08, F20 and F29, who used an asymmetric offset of the interrogation window, showed a linear slope along x, while other contributors had no tendency toward an x-wise development of the value except for F23, who gave no explicit declaration of window offset scheme. Referring to Wereley and Meinhart (2001), the error in the radial velocity component associated with forward or backward difference scheme can be expressed as where v θ is an azimuthal velocity component. This error is presented in Fig. 63 as a black dotted line, which is consistent with the results of above three contributors. Figure 64 shows the standard deviation of x − x theory computed along x, representing the random error variation. While the bias error of the value of Group A was (14) δv r = − 1 2 v θ ω�t Fig. 62 Deviation from theoretical particle displacement, x − x theory , averaged along x much smaller than that of Group C, as shown in Fig. 62, the range of random errors was comparable for both groups. Although peak locking was sufficiently suppressed in Group A, it did not necessarily follow that the random error was also reduced. Large standard deviation values were observed in F19 and F23, which also had large mean bias, as shown in Fig. 62. The standard deviation of Group 4 was suppressed, since they performed smoothing during postprocessing.

Conclusion
The 4th International PIV Challenge has gained strong attention expressed by the high number of teams that were willing to evaluate the data and the large audience present at the workshop in Lisbon in 2014. Thanks to significant improvements, extensions and innovations since the last PIV Challenge, µPIV, time-resolved PIV, stereoscopic PIV and tomographic PIV are standard techniques for velocity measurements in transparent fluids. Today, uncertainties for the displacement as low as 0.05 pixel can be reached with a variety of evaluation techniques, provided the image quality and particle image density are appropriate and problems due to flow gradients and solid boundaries can be neglected. This implies that the quality of the equipment (laser, camera, lenses) is still an important factor for accurate PIV measurements. Surprisingly, the largest uncertainties are introduced by the user who selects the evaluation parameters based on knowledge, experience and intuition. A solid understanding of the underlying principles of the technique and practical experience with the components involved, their alignment but also preliminary knowledge of the flow under investigation is essential to reach the low uncertainties mentioned above. Another general conclusion is that sophisticated PTV algorithms can outperform stateof-the-art PIV algorithms in terms of uncertainty. However, it is to early to predict whether this finding will initiate a transition from PIV toward PTV evaluation approaches on a long term or whether future evaluation strategies will combine both methods to minimize the uncertainty as much as possible. Case A (µPIV) illustrates that image preprocessing is very important and its application can affect the results quite strongly. Therefore, care must be taken to achieve the desired quality of the results. Another remarkable finding is that the handling of boundaries is a predominant source of errors. The results show that the estimated width of the small channel varied by more than 20 % among all teams. Therefore, robust digital masking approaches are needed to reduce the strong uncertainty induced by the human factor. Another important conclusion that can be drawn from the results is that care must be taken by integrating modelbased approaches in the evaluation procedure. The results clearly show that making use of the no-slip condition at boundaries leads to the expected velocity decrease in the boundary layers even when the near-wall flow cannot be resolved. However, since the exact wall location is required to apply this condition large uncertainties arise if the aforementioned uncertainties in determining the location of the boundaries have to be considered. For this reason, objective measures for the quantification of the PIV uncertainty need to be further developed. Instead of using model assumptions, it is also possible to reduce the uncertainty by using evaluation approaches with higher spatial resolution such as single-pixel ensemble correlation or even particle tracking approaches.
Case B shows again that the different evaluation strategies lead to quite similar results if the basic design and operation rules of PIV are taken into account. However, when the evaluation parameter can be freely selected, as done in case of evaluation 2 the human factor becomes essential and the results differ strongly. For instance, if a PIV user prefers smooth velocity fields, he or she will select other parameters from those selected by users who prefer high spatial resolutions. The use of multi-frame evaluation techniques can greatly reduce the uncertainty, and with PTV algorithms, the influence of the user decreases due to the fact that spatial resolution effects can be ignored. However, when multi-frame evaluation techniques are used, other aspects need to be considered with care. In the case of strongly oversampled image sequences, as provided here, a damping of the amplitudes at higher frequencies is acceptable from the physical point of view. However, caution is advised not to suppress physically relevant frequencies. A smooth field in space and time does not necessarily mean a high evaluation accuracy.
Case C indicates that the optimization of the particle image density in terms of maximum spatial resolution with minimum loss of accuracy is the main challenge of 3D tomographic PIV. The results showed that the choice of an appropriate metric for the quality of the reconstruction is controversial, even in the case of synthetic data. The commonly used criterion Q > 0.75 is simple and straightforward, but it should be used with caution considering its high sensitivity to the particle images' positioning and, more importantly, diameter. The use of a particle/ intensity-based approach is proposed for synthetic data, which seems to be more efficient in capturing differences between the participants. In particular, the power intensity ratio provides a measure of the 'cross-correlation power' of the particles, thus better highlighting the effects of ghosts on the PIV analysis. This metric has the additional advantage of being simple and easily extended to PTV methods. Artificial additional operations (apart from triangulation/ algebraic reconstruction) are beneficial when using both exposures, as in the MTE-based reconstruction algorithms or in the shake-the-box method. Methods based on removing particles via thresholding appear less effective due to the superposition between the intensity statistical distribution of true and ghost particles. As a general guideline, it is recommended to avoid pushing toward high particle image densities.
Case D is a synthetic image sequence of a 3D tomographic PIV configuration portraying the velocity field of a direct numerical simulation of an isotropic incompressible turbulent flow. The main challenge of this test case is to correctly evaluate the velocity gradient components in the presence of many different length scales within the flow field. The results showed that it is difficult to find a correlation between power ratio, quality factor and the different metrics introduced for the error (spectral energy fraction, factor of correlation of the second invariant of the velocity gradient tensor, error, etc.). However, it can be stated that there is a broad correlation according to how refined the full algorithm analysis is. The exploitation of the time coherence both in the reconstruction and in the displacement estimation considerably improves the spatial resolution as observed in case B. As predicted, enforcing the time coherence in both the reconstruction and displacement estimation provides the best results. Additionally, an intense cross-talk between the two steps of the process is very beneficial (e.g., in the shake-the-box method). Furthermore, PTV-based methods with oriented particle search and refinement seem to overcome classical algebraic reconstruction and cross-correlation-based methods, at least on synthetic images, without noise.
Case E aimed to evaluate the state-of-the-art for planar stereoscopic PIV measurements using real experimental data. The experimentally acquired images were purposefully subject to real-world sources of error that are representative of stereo experiments and that are difficult to model in simulated images. These error sources included focusing effects, diffraction and aberration artifacts, forward and backward scattering intensity variation, sensor noise and calibration errors. The most pronounced observation that emerged from the analysis is that collecting and processing high-quality planar 2C2D velocity field data will yield high-quality 3C2D data. The error analysis also showed that the choice of camera calibration model and the choice of the stereo reconstruction methods had relatively little effect on the quality of the data, as long as the in-plane velocity error was already low. Furthermore, the results show that performing an accurate self-calibration is vital to producing high-quality stereo reconstructions and thus it is important to ensure that collected calibration data are accurate. The results analyzed herein suggest that the self-calibration may be the most critical element in the stereo-PIV processing based on the current state of the method. However, it is important to bear in mind that the self-calibration reduces the uncertainty of the velocity measurements at the cost of increasing the uncertainty of the vector location. Consequently, this approach may fail in the case of flows with strong gradients.
The results of case F showed apparent difference of both bias and random errors between the contributors. Interpolation scheme such as SINC or B-spline has a tendency of reducing the bias error, while the bilinear or bicubic scheme has a larger bias error. Advantage of the use of symmetric offset scheme with second-order accuracy to the asymmetric offset is evident in the mean velocity field. The random error was distributed within a range of 0.05-0.1 pixels in standard deviation for most of contributors who did not apply smoothing during postprocessing.
Acknowledgments The authors like to thank Springer Science+Business Media (www.springer.com), PCO AG (www.pco. de) and InnoLas Laser GmbH (www.innolas-laser.com) for their financial support to make the workshop possible in Lisbon on the July 5, 2014. Without the participants of the 4th International PIV Challenge, this paper could not be written, and we would like to thank all of them for their effort and the fruitful discussions.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.