Limit on buffer size due to ocpi_buffer_size_* property

dwp@md1tech.co.uk
Fri, Jun 16, 2023 9:00 AM

Hello,

Is there a particular reason why the ocpi_buffer_size_* properties are restricted to 16 bit?
This results in quite a small limit on the amount of data that can be sent over a single buffer.

I am trying to send the output of an FFT (as floats), and this limits the size of the FFT I can do to 4096 (when sticking to powers of 2).

I would have expected that, even with the limit in place, I would be able to send 8192 floats in a single buffer, but when I set the sequenceLength property in the protocol to 8192 I get this error:

Exiting for exception: Value for property "ocpi_buffer_size_out" of instance "fft" of component "local.uhd.uhd.fft" is invalid for its type: for property ocpi_buffer_size_out: Expression value (6.5536e4) is out of range for UShort type properties (0 to 6.5535e4)

I know that I could serialise the data and send it over multiple buffers; however, this would introduce additional time and space complexity from copying memory in and out of extra buffers.

Thanks in advance,
Dan

Dominic Walters
Fri, Jun 16, 2023 12:02 PM

Hi,

The following is what I believe to be true, but shouldn't be taken as
authoritative.

Firstly, this will depend on what protocol you are using.
You mention FFT (and it sounds like a non-complex one), so I'm going to
assume you are doing streams of float in and streams of float out.
A suitable protocol therefore might be float_timed_sample (specs/float_timed_sample-prot.xml in the OpenCPI SDR component library):
https://gitlab.com/opencpi/comp/ocpi.comp.sdr/-/blob/develop/specs/float_timed_sample-prot.xml

This protocol limits its sequences to a length of 4096.
4096 * 4 bytes per float = 16384 bytes total.
You will find that all the timed_sample protocols in ocpi.comp.sdr obey
this same limit.

I had to go searching for why this is, although I knew this was a hard
limit.
It's related to the structure of the headers sent over the Scalable Data
Plane (SDP).
Section 5.4.10 in the Platform Development Guide talks about this:
https://opencpi.gitlab.io/releases/v2.5.0-beta.1/docs/OpenCPI_Platform_Development_Guide.pdf
Table 8 on page 64 shows a record element called count which is the
number of bytes transferred.
Pg 67 then says "The maximum count allows for 16KB".
This limit can be found in the ocpi.core.sdp primitive library, in projects/core/hdl/primitives/sdp/sdp_pkg.vhd:
https://gitlab.com/opencpi/opencpi/-/blob/develop/projects/core/hdl/primitives/sdp/sdp_pkg.vhd#L28

This limit was chosen so that the SDP could accommodate transmission of at
least a full Ethernet Jumbo Frame (9K).
All component implementations in OpenCPI are asked to obey the same rules
regarding their protocols.
As such, RCC is bound by the same limitations as HDL.

With regard to getting around this limitation, it sounds like you need to
make use of the take method on your input port with send on your output.
take is defined in Section 4.4.9 of the RCC Development Guide:
https://opencpi.gitlab.io/releases/v2.5.0-beta.1/docs/OpenCPI_RCC_Development_Guide.pdf
Basically, the current buffer on the port is handed to the worker as a
pointer. The worker then owns this buffer.
This does not copy the data, and the documentation calls out that this is
how "sliding window algorithms" should be implemented.
send is defined in 4.4.6 and talks about how it should be used to "effect
a zero copy transfer".

There are various combinations of take, send, release, advance, and
request that could get to your desired outcome, although I'm not
convinced that zero copy is achievable (unless there's some way to merge
buffer data pointers).
The simplest solution (although not the most efficient) would be to take
as many buffers as you need, do your FFT (however this is achieved), then
copy the results back into the taken buffers and send them all.
This results in two copies (one into the FFT function, one out of it).
If you choose not to use take and send and go for a full advance
approach, I think you still need two copies (one into the FFT function,
one out of it).
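
The bookkeeping behind this multi-buffer approach can be modelled outside OpenCPI. The sketch below is plain Python, not the RCC API: `split_into_buffers` and `reassemble` are hypothetical helpers, and the 4096-sample buffer capacity is taken from the protocol limit mentioned above. It only illustrates splitting one FFT frame across several buffers and joining it again, which is where the copies come from.

```python
# Plain-Python model of the idea; none of these names are OpenCPI identifiers.

def split_into_buffers(samples, max_per_buffer=4096):
    """Slice one frame into consecutive chunks no longer than max_per_buffer."""
    return [samples[i:i + max_per_buffer]
            for i in range(0, len(samples), max_per_buffer)]

def reassemble(buffers):
    """Concatenate the chunks back into one frame (this step is a copy)."""
    frame = []
    for chunk in buffers:
        frame.extend(chunk)
    return frame

frame = [float(n) for n in range(8192)]   # stand-in for one FFT frame
buffers = split_into_buffers(frame)
print(len(buffers), len(buffers[0]))      # 2 4096
print(reassemble(buffers) == frame)       # True
```

In a real worker the chunks would be port buffers obtained and released through the calls named above, so the ownership rules in the RCC Development Guide apply rather than Python list semantics.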

I strongly advise reading the documentation of take and send as it is
very easy to trip up using them.
You will also need to look at RunConditions if you go down that route.

Kind Regards,
D. Walters

discuss mailing list -- discuss@lists.opencpi.org
To unsubscribe send an email to discuss-leave@lists.opencpi.org

dwp@md1tech.co.uk
Fri, Jun 16, 2023 1:16 PM

Thank you Dom, I will look into using the take method.

James Kulp
Fri, Jun 16, 2023 9:29 PM

Hi Dom/Dan,

A few additional comments here.

The description of the SDP limitations is correct (big enough for jumbo
frames), but when the SDP sends message buffers, it "segments" the
buffers into potentially smaller SDP packets.
So the SDP header is used for segments, not the higher level messages.
The source code for SDP uses the term "messages" sometimes when it
should use "segments" or "packets", which is confusing.
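
As a rough illustration of segmentation, the sketch below splits a message length into per-segment payload lengths. It is not taken from the SDP source: `segment_lengths` is a hypothetical helper, the 16 KiB segment maximum is an assumption based on the "16KB" maximum count quoted earlier in the thread, and headers and alignment are ignored.

```python
# Hypothetical model of message segmentation; not the SDP implementation.
SEGMENT_MAX_BYTES = 16 * 1024  # assumed maximum payload of one segment

def segment_lengths(message_bytes, segment_max=SEGMENT_MAX_BYTES):
    """Split one message length into per-segment payload lengths."""
    full, rest = divmod(message_bytes, segment_max)
    return [segment_max] * full + ([rest] if rest else [])

print(segment_lengths(32768))        # [16384, 16384]
print(len(segment_lengths(65535)))   # 4
```

The point is only that the per-segment limit bounds each packet, not the message as a whole.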

The current HDL infrastructure code actually has a message size limit of
2^21-1 (the actual length-in-bytes fields are 21 bits).

But there is indeed a separate limitation of the ocpi_buffer_size
property's data type (ushort) for HDL workers.

8K floats should work since that implies a required buffer size of 32K,
which should fit in a ushort.
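
The limits mentioned so far can be compared with a little arithmetic. The sketch below is illustrative only: the constant and function names are made up here, and the 4-byte sample size assumes 32-bit floats.

```python
# Illustrative arithmetic; names are hypothetical, not OpenCPI identifiers.
USHORT_MAX = 2**16 - 1        # largest value a ushort property can hold
HDL_MESSAGE_MAX = 2**21 - 1   # 21-bit length-in-bytes field
FLOAT_BYTES = 4               # assumed: 32-bit float samples

def buffer_bytes(sequence_length, element_bytes):
    """Bytes needed to carry one full sequence in a single buffer."""
    return sequence_length * element_bytes

real_8k = buffer_bytes(8192, FLOAT_BYTES)
print(real_8k, real_8k <= USHORT_MAX)   # 32768 True
print(USHORT_MAX, HDL_MESSAGE_MAX)      # 65535 2097151
```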

The HDL buffer size limit of 64K-1 is primarily due to tradeoffs in the
use of FPGA BRAM resources, which is another discussion.

I do not believe this UShort buffer size limitation applies to RCC workers.

Cheers,
Jim

dwp@md1tech.co.uk
Mon, Jun 19, 2023 8:18 AM

Hi Jim,

The output of my FFT is complex so the storage requirements are doubled.
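
This accounts for the value in the error message. A quick check, assuming each complex sample is two 32-bit floats (8 bytes); `largest_pow2_length` is a hypothetical helper written for this illustration:

```python
# Illustrative check; the helper name and sample size are assumptions.
USHORT_MAX = 65535        # upper bound quoted in the error message
COMPLEX_FLOAT_BYTES = 8   # assumed: real + imaginary parts, 4 bytes each

def largest_pow2_length(element_bytes, limit=USHORT_MAX):
    """Largest power-of-two sequence length whose byte size fits the limit."""
    length = 1
    while (length * 2) * element_bytes <= limit:
        length *= 2
    return length

print(8192 * COMPLEX_FLOAT_BYTES)                 # 65536, one more than USHORT_MAX
print(largest_pow2_length(COMPLEX_FLOAT_BYTES))   # 4096
print(largest_pow2_length(4))                     # 8192 for real floats
```

With 8-byte samples, 8192 elements need exactly 65536 bytes, which is the 6.5536e4 reported in the exception.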

Also, all of the workers I have are RCC and the limit on ocpi_buffer_size still applies.
I can submit this as a GitLab issue if you believe it is not the intended behaviour.

I have been playing around with using take and send to achieve this. Note that there seems to be an issue with the take method being overloaded by a seemingly unrelated function. I did submit this as an issue, and the workaround is to use in.RCCUserPort::take so that the correct method is called.

Thanks,
Dan
