Single bit errors in output of fsk_filerw on Ettus N310

Munro, Robert M.
Thu, Oct 10, 2019 4:20 PM

The fsk_filerw application is being run on the Ettus N310 as a functional verification of the integrated software and hardware, using the OpenCPI framework runtime for loading and execution.  The application runs without reporting any errors, and the output file is the same size on every run.  The expected output file was extracted from the input file, opencpi/projects/assets/applications/FSK/idata/Os.jpeg, by removing the 242-byte header and the 2-byte tail.  Binary comparison across multiple runs shows randomly placed single-bit errors at multiple locations in the output file.
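
A comparison along these lines can be sketched in Python. This is only an illustration, not the method used for the attached files: the function names are hypothetical, and the only values taken from the description above are the 242-byte header and the 2-byte tail.

```python
HEADER_BYTES = 242  # header length stated in the post
TAIL_BYTES = 2      # tail length stated in the post


def expected_payload(raw: bytes) -> bytes:
    """Strip the header and tail from the input file to get the expected output."""
    return raw[HEADER_BYTES:len(raw) - TAIL_BYTES]


def bit_errors(expected: bytes, actual: bytes) -> list:
    """Return (byte_offset, bit_index) for every bit that differs.

    bit_index 0 is the least significant bit of the byte.
    """
    if len(expected) != len(actual):
        raise ValueError("files differ in size; compare sizes first")
    errors = []
    for offset, (e, a) in enumerate(zip(expected, actual)):
        diff = e ^ a  # set bits mark the positions that disagree
        for bit in range(8):
            if (diff >> bit) & 1:
                errors.append((offset, bit))
    return errors
```

Reading Os.jpeg and a run's output with `open(path, "rb").read()` and passing them through `expected_payload` and `bit_errors` gives a list whose length is the bit-error count for that run, which makes it easy to see whether the error locations repeat between runs.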

The framework and application were built for the Matchstiq_Z1 using the OCPI 1.4 baseline, and on that platform the output matched the expected result over multiple runs.

Attached are a golden expected output and several outputs generated by running fsk_filerw on the N310, containing single-bit errors that vary in number and location.  The application was run with the command 'ocpirun app_fsk_filerw.xml'.  Some runs immediately followed a power cycle or reboot and others were run back to back, to see whether that affected the results.  The files were generated in the following order:

  • Power N310
  • Generate run7
  • Generate run8
  • Reboot
  • Generate run9
  • Reboot
  • Generate run10
  • Reboot
  • Generate run11
  • Generate run12
  • Reboot
  • Generate run13

Has this kind of issue been encountered before with the framework?  If so, what was the cause and how was it resolved?  What steps would you recommend for debugging this reference application?

Thanks,
Rob

James Kulp
Thu, Oct 10, 2019 7:31 PM

Thanks Robert.

For those who don't know, this FSK mod/demod app is a digital loopback
without any transceiver or converter involved.

So if the matchstiq_z1 appears solid and the N310 does not, the
difference is really the Zynq part (7020 vs. 7100), the clocking setup,
and of course the actual place and route of this bitstream.

The next thing I would check/eliminate is the clocking setup.

First, check what clock rates are produced by the PS/PL clock generator
(as programmed by the boot PROM), for example using the ocpizynq tool.
On a zedboard, "ocpizynq clocks" says:

FCLK 0: source: IO PLL, divisor0:  5, divisor1:  2, throttling not reset, throttling not started, mode: debug/static, frequency: 100.00 MHz
FCLK 1: source: IO PLL, divisor0: 24, divisor1:  1, throttling not reset, throttling not started, last count: 1, mode: stopped/normal, frequency: 41.67 MHz
FCLK 2: source: IO PLL, divisor0: 24, divisor1:  1, throttling not reset, throttling not started, last count: 1, mode: stopped/normal, frequency: 41.67 MHz
FCLK 3: source: IO PLL, divisor0: 24, divisor1:  1, throttling not reset, throttling not started, last count: 1, mode: stopped/normal, frequency: 41.67 MHz
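
To compare these figures between platforms without reading them by eye, the listing can be parsed with a short script. The following Python sketch assumes the output has the same field layout as the zedboard listing above; `parse_fclks` and `implied_source_mhz` are hypothetical helper names, and the relation frequency = source / (divisor0 * divisor1) is an assumption that happens to be consistent with the numbers shown.

```python
import re

# Matches one FCLK entry, even if the entry is wrapped across lines.
FCLK_RE = re.compile(
    r"FCLK\s+(\d+):.*?divisor0:\s*(\d+),\s*divisor1:\s*(\d+)"
    r".*?frequency:\s*([\d.]+)\s*MHz",
    re.DOTALL,
)


def parse_fclks(text: str) -> dict:
    """Map FCLK index -> {'divisor0', 'divisor1', 'mhz'} from 'ocpizynq clocks' text."""
    clocks = {}
    for m in FCLK_RE.finditer(text):
        index, d0, d1, mhz = m.groups()
        clocks[int(index)] = {
            "divisor0": int(d0),
            "divisor1": int(d1),
            "mhz": float(mhz),
        }
    return clocks


def implied_source_mhz(clock: dict) -> float:
    """Source (PLL) frequency implied by the reported frequency and divisors."""
    return clock["mhz"] * clock["divisor0"] * clock["divisor1"]
```

For the listing above, FCLK 0 implies a 1000 MHz source (100.00 x 5 x 2), and FCLK 1 to 3 imply roughly the same value (41.67 x 24), so the figures are self-consistent.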

Second, look at which of these clocks the platform worker is actually using.
In the zed.vhd file we see:

    clkbuf   : BUFG   port map(I => fclk(0),

I.e., the zed platform code uses FCLK 0, which is set up by the boot PROM
to be 100 MHz.

Third, look at the constraints file to see what clock the FPGA tools
assumed when they built the design.
For zed using Vivado, it is zed.xdc:

   # 10 ns period = 100000 KHz
   create_clock -name clk_fpga_0 -period 10.000 [get_pins {ftop/pfconfig_i/zed_i/worker/ps/ps/PS7_i/FCLKCLK[0]}]
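
The point of this step is that the constrained period must correspond to a frequency at least as high as the clock the boot PROM actually delivers; otherwise the tools close timing for a slower clock than the hardware runs at. A minimal Python sketch of that check follows. It assumes a single create_clock line with -period given in nanoseconds, as in the zed.xdc excerpt above, and both function names are hypothetical.

```python
import re

# Captures the -period value (in ns) from a create_clock line.
PERIOD_RE = re.compile(r"create_clock\b.*?-period\s+([\d.]+)")


def constrained_mhz(xdc_text: str) -> float:
    """Frequency in MHz implied by the first create_clock -period (ns)."""
    m = PERIOD_RE.search(xdc_text)
    if m is None:
        raise ValueError("no create_clock -period found")
    return 1000.0 / float(m.group(1))


def constraint_covers(xdc_text: str, actual_mhz: float,
                      tolerance_mhz: float = 0.01) -> bool:
    """True if the design was constrained at least as fast as the real clock."""
    return constrained_mhz(xdc_text) + tolerance_mhz >= actual_mhz
```

With the zed values, a 10.000 ns period gives 100 MHz, which covers the 100.00 MHz reported for FCLK 0. A platform whose boot PROM sets a faster FCLK than its XDC file declares would fail this check.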

Finally, look at the timing report of the assembly/bitstream used for
this application, namely

projects/assets/hdl/assemblies/fsk_filerw/container-fsk_filerw_zed_base/target-zynq/timing.out

to see if there are any timing violations.
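
A timing report can also be scanned with a script instead of read by hand. The Python sketch below assumes the report contains per-path lines in the form Vivado's report_timing normally prints, such as "Slack (VIOLATED) :  -0.321ns"; whether timing.out contains such lines depends on how the report was generated, so treat the pattern as an assumption. `negative_slacks` is a hypothetical helper name.

```python
import re

# Assumed line format: "Slack (MET) :   1.234ns" or "Slack (VIOLATED) :  -0.321ns"
SLACK_RE = re.compile(r"Slack\s*\((MET|VIOLATED)\)\s*:\s*(-?\d+(?:\.\d+)?)\s*ns")


def negative_slacks(report_text: str) -> list:
    """Return every negative slack value (in ns) found in the report text."""
    return [
        float(value)
        for _status, value in SLACK_RE.findall(report_text)
        if float(value) < 0
    ]
```

An empty list only means that no violated path was listed against the constraints the tools were given. If the clock constraint itself is slower than the real clock, the report can be clean while the hardware still fails.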

If that all looks good, then there is a different issue, which could
call for a typical divide-and-conquer approach.

We have not seen exactly this behavior before.

Jim


Munro, Robert M.
Fri, Oct 11, 2019 9:43 PM

Jim,

Thanks for your insight.

The clocking setup and configuration were examined, and the XDC clock constraint for FCLK0 needed to be adjusted so that the HDL build was constrained properly.  After the adjustment, the application generates the expected output.

Thanks,
Rob

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.opencpi.org/pipermail/discuss_lists.opencpi.org/attachments/20191010/d7154deb/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fsk_filerw_output.zip
Type: application/x-zip-compressed
Size: 72734 bytes
Desc: fsk_filerw_output.zip
URL: <http://lists.opencpi.org/pipermail/discuss_lists.opencpi.org/attachments/20191010/d7154deb/attachment.zip>


