
docs: Add initial proposal for V2 recording & playback API. (WIP) #791

Draft
wants to merge 7 commits into v2-docs from docs-recording

Conversation

microbit-carlos (Collaborator) commented May 15, 2023

Docs preview:

This initial proposal has been discussed in:

But we have some open questions that will likely result in a rework of some of this.

Initial proposal

The initial proposal in this PR was to create a new AudioBuffer class to contain the audio data and sampling rate.
The AudioBuffer.rate property could then be used by microphone.record() and audio.play() to configure recording and playback rates.
This was done to avoid introducing a new parameter to audio.play() to configure the sampling rate, since such a parameter could only work with a single type of sound input (it might not be possible to change the rate of SoundExpressions or AudioFrames).

Disadvantages

However, changing the rate in a buffer type to change the playback rate in real-time is a bit awkward:

my_recording = audio.AudioBuffer(duration=5000, rate=5500)
microphone.record_into(my_recording)
audio.play(my_recording, wait=False)
while audio.is_playing():
    x = accelerometer.get_x()
    my_recording.rate = scale(x, (-1000, 1000), (2250, 11000))
    sleep(50)

An alternative we considered was to have the playback sampling rate modified via the audio module itself:

audio.play(my_recording, wait=False)
while audio.is_playing():
    x = accelerometer.get_x()
    audio.set_rate(scale(x, (-1000, 1000), (2250, 11000)))
    sleep(50)

However, this would have to set the same rate for everything played via the audio module, and Sound Expressions have a different default rate (44K) than recordings (11K). So audio.set_rate(22000) would slow down Sound Expressions and speed up recordings.

Alternatively, if we wanted to change the playback rate via the audio module, we could set a ratio instead: something equivalent to an audio.set_speed() function taking a percentage. But a disadvantage would be that it removes some of the maths/physics learning opportunity of directly relating the sampling rate value to its effect on playback speed.
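For illustration, the ratio-based control might look something like this (audio.set_speed() is hypothetical here, with 100 meaning normal speed):

audio.play(my_recording, wait=False)
while audio.is_playing():
    x = accelerometer.get_x()
    # Map tilt to 50%-200% of normal playback speed (hypothetical API)
    audio.set_speed(scale(x, (-1000, 1000), (50, 200)))
    sleep(50)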

Alternative proposal: bytearray as the buffer type

In this case a bytearray would be returned by microphone.record() and used with microphone.record_into().

As this data type does not include info about the rate, we depend on audio.play() adding an extra rate argument, which might not work with other sound types like Sound Expressions and AudioFrames.

However, we still have the issue of updating the playback rate in real time during playback, which means we might have to use a similar approach to the previously mentioned audio.set_speed():

sound_in_byte_array = microphone.record(duration=3000, rate=5500)
audio.play(sound_in_byte_array, rate=5500, wait=False)
while audio.is_playing():
    x = accelerometer.get_x()
    audio.set_speed(scale(x, (-1000, 1000), (50, 200)))
    sleep(50)

Or, pre-allocating the buffer and recording into it:

DURATION_SECONDS = 3
SAMPLE_RATE = 5500
recording = bytearray(DURATION_SECONDS * SAMPLE_RATE)
microphone.record_into(recording, rate=SAMPLE_RATE)
audio.play(recording, rate=SAMPLE_RATE)

Alternative proposal: AudioFrames as the buffer type

This would be the same as the bytearray proposal, but using the existing AudioFrames instead.

We might need to tweak the AudioFrame class to let the user create larger buffers, as the default is 32 samples. As audio.play() can also consume an iterable of AudioFrames, we would need to figure out a good balance between AudioFrame size and the number of AudioFrames in a recording buffer, as sketched below.
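As a rough sizing sketch (assuming the current default AudioFrame of 32 samples, and that a recording is spread across a list of frames):

SAMPLE_RATE = 7812
DURATION_SECONDS = 3
FRAME_SIZE = 32  # could be larger if AudioFrame allowed bigger buffers
# 3 s at 7812 Hz is 23436 samples, i.e. ~733 frames of 32 samples each
num_frames = (DURATION_SECONDS * SAMPLE_RATE + FRAME_SIZE - 1) // FRAME_SIZE
recording_buffer = [audio.AudioFrame() for _ in range(num_frames)]
# audio.play() already accepts an iterable of AudioFrames
audio.play(recording_buffer)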

microbit-carlos force-pushed the docs-recording branch 3 times, most recently from 52f230c to e635d92 on May 15, 2023 10:53
microbit-carlos marked this pull request as ready for review May 19, 2023 11:27
microbit-carlos marked this pull request as draft May 19, 2023 11:27
microbit-carlos changed the title from "docs: Add initial proposal for recording & playback API. (WIP)" to "docs: Add initial proposal for V2 recording & playback API. (WIP)" May 19, 2023
microbit-carlos (Collaborator Author) commented Aug 3, 2023

Based on the latest discussion we have agreed that we'd prefer to avoid introducing a new data type for this feature.
In that case we have two options:

Use a bytearray and change the sampling rate via a function in the audio module

sound_in_byte_array = microphone.record(duration=3000, rate=5500)
audio.play(sound_in_byte_array, wait=False)
while audio.is_playing():
    x = accelerometer.get_x()
    playback_sampling_rate = scale(x, (-1000, 1000), (2_200, 11_000))
    audio.set_rate(playback_sampling_rate)
    sleep(50)

One previous suggestion was to create a function along the lines of audio.set_speed() that used a percentage value (0 to 100) as the input range. This could solve the issue of different pipelines in the sound mixer having different playback sampling rates; however, we believe it's important to use real sampling-rate values to be able to directly compare and understand how changing the sampling rate during recording vs playback affects the sound.

To be able to implement something like audio.set_rate() we would probably need to ensure everything passed to audio.play() uses the same default sampling rate. Right now these are the types of input audio.play() takes:

  • Sound effects via user-created audio.SoundEffect() instances
    • CODAL SoundExpressions, default sampling rate: 44_100
  • Built-in sounds via microbit.Sound pre-generated instances
    • CODAL SoundExpressions, default sampling rate: 44_100
  • audio.AudioFrame
    • MicroPython data type, default sampling rate: 8_000?
      • Not sure about this one, the docs say a single frame of 32 samples takes a bit over 4 ms.

We'd have to check how it'd affect sound quality, but we could consider decreasing the sampling rate for SoundExpressions to 11K. However, we would still have the issue that we can't increase the default AudioFrame sampling rate, as that would change how old programs sound.

While this would be the cleanest way to do this for the user API, I'm not seeing a way in which it can be achieved with the current requirements. Does anybody have any ideas to overcome these issues?

Expand AudioFrame to include sampling rate

This option is similar to the original proposals about creating a new AudioBuffer data type, but instead we can expand AudioFrames to be able to have different sizes and different sampling rates.

By default they should still behave as they do in micro:bit V1 (and older MicroPython versions for V2), which is 32 samples at 8K (?) sampling rate.

But this could be changed via constructor parameters to have any size buffer at any sample rate.

Unfortunately, that still leaves us with the awkward case of changing the playback sampling rate by changing an attribute on the samples object, instead of via a method in the audio module.

my_recording = audio.AudioFrames(length=11000, rate=5500)
my_recording = microphone.record_into(my_recording)
# Or
my_recording = microphone.record(duration=2000, rate=5500)

audio.play(my_recording, wait=False)
while audio.is_playing():
    x = accelerometer.get_x()
    my_recording.rate = scale(x, (-1000, 1000), (2250, 11000))
    sleep(50)

microbit-carlos (Collaborator Author)

@dpgeorge the docs have been updated, let me know if something doesn't match our previous conversation.

microbit-carlos (Collaborator Author)

@dpgeorge there are a couple of issues related to setting the sampling rate, but they shouldn't affect the MicroPython implementation and should be fixed (without changes to the API) in the next CODAL release: https://github.com/lancaster-university/codal-microbit-v2/issues?q=is%3Aopen+milestone%3Av0.2.60+label%3Ap0

docs/audio.rst Outdated
@@ -69,6 +72,13 @@ Functions

Stops all audio playback.

.. py:function:: set_rate(sample_rate)
Member

Perhaps this could be made into a static method of the AudioFrame class? That way its scope is clear: it only acts on these objects. And it would then be possible to write:

AudioFrame.set_rate(8000)
...
my_frame = AudioFrame(...)
my_frame.set_rate(8000)

Collaborator Author

So, as a static method that would affect all AudioFrame playback instead of individual instances?

I agree that having it as part of AudioFrame makes it a lot clearer that it only affects playback of this type, but it might be confusing when accessing the function via instances:

frame_one = AudioFrame(...)
frame_two = AudioFrame(...)
frame_one.set_rate(8000)
frame_two.set_rate(8000)
...
while audio.is_playing():
    frame_one.set_rate(scale(accelerometer values...))

# At this point, I'd expect playback for frame_two to still be 8000?
audio.play(frame_two) 

Would it be more intuitive if each instance held its playback rate?

Member

So, as a static method that would affect all AudioFrame playback instead of individual instances?

Yes, as a static method calling it once affects all AudioFrame instances.

If we went this way, probably best to document it as AudioFrame.set_rate(...) and never refer to it as frame.set_rate(...). Then it's clear it sets the global rate for all audio frames.

Would it be more intuitive if each instance held its playback rate?

I'm not sure... maybe? But then it would be an instance method and you would always call frame.set_rate(...) to set the playback rate of that instance.

I think that's possible to implement.

docs/audio.rst Outdated
AudioFrame
==========

.. py:class::
AudioFrame
AudioFrame(size=32)
Member

Perhaps this could be extended to take optional keyword arguments duration and rate (with a default), to make it easy to create an AudioFrame of a given duration.
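For illustration, the suggested constructor might be used like this (the duration and rate keyword arguments are the proposal, not an existing API):

# Hypothetical: size derived from duration (ms) and rate (Hz)
my_frame = AudioFrame(duration=3000, rate=7812)    # ~23436 samples
# Roughly equivalent to today's explicit sizing:
my_frame = AudioFrame(size=3000 * 7812 // 1000)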

Collaborator Author

I think in this case, similar to the previous comment about having an AudioFrame.set_rate(), if rate were added to the constructor then we'd need set_rate() to be an instance method changing each instance's value, instead of a static class function.

I do like the idea of making the AudioFrame constructor arguments simpler to work out when the user just wants a specific "size in time".

What would happen if the constructor is provided an argument for size and for duration? Would that throw an exception?

Member

Note that there's a difference between record and play rates. The rate here is related to the recording rate, although the actual recording rate is set when you call microphone.record_into(). And that's different again to the playback rate set by .set_rate().

What would happen if the constructor is provided an argument for size and for duration? Would that throw an exception?

Yes it could throw an exception.

docs/audio.rst Outdated
The ``audio.play()`` function can consume an instance or iterable
(sequence, like list or tuple, or generator) of ``AudioFrame`` instances.
Its default playback rate is 7812 Hz, and uses linear interpolation to output
a PWM signal at 32.5 kHz.
Member

I think we should get rid of this linear interpolation, and just output the 7812 Hz signal directly. I did some tests, and the audio quality is better without the interpolation.

Collaborator Author

Sounds good, in that case let's do that and remove this from the docs 👍

@@ -70,11 +119,61 @@ Functions
* **return**: a representation of the sound pressure level in the range 0 to
255.

.. py:function:: record(duration=3000, rate=7812, wait=True)
Member

Maybe we can remove this function altogether, and just have record_into(). That way memory management is explicit, it avoids situations where there's a lot of heap fragmentation, and also avoids any difficulties with this function returning an AudioFrame object that is growing over time.

Collaborator Author

Sorry, I forgot the context in which AudioFrame would be growing. I take it that was the case when wait=False, but if we have to have a known/default value for duration and rate, wouldn't the AudioFrame be allocated at the start before the recording occurs?

I still quite like the idea of being able to do audio.play(microphone.record(duration=3000)).
Would it be simpler if wait was removed from this version and only available in record_into()?

Member

I take it that was the case when wait=False, but if we have to have a known/default value for duration and rate, wouldn't the AudioFrame be allocated at the start before the recording occurs?

When wait=False the returned AudioFrame will be gradually filling up. If we go with the current implementation where the whole buffer is preallocated at the start of the recording, then I guess it's not too bad: the user just sees blank data that's gradually filling up.

So, we can keep this function. And even support wait=False. Under the hood it will simply allocate an AudioFrame of the desired size, then pass it to record_into().
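As a sketch of that layering (pseudocode for the proposed behaviour, not the actual implementation):

def record(duration=3000, rate=7812, wait=True):
    # Preallocate the whole buffer up front
    frame = AudioFrame(size=duration * rate // 1000)
    # Fills gradually in the background when wait=False
    record_into(frame, rate=rate, wait=wait)
    return frame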


Example
=======
.. py:function:: record_into(buffer, rate=7812, wait=True)
Member

Perhaps this function can set some new internal state in an AudioFrame object which indicates the length of the recording (as opposed to the total allocated length of the buffer). Then audio.play() would use this state to only play the amount that was recorded.

Could then add a method to get/set this length, eg AudioFrame.get_recording_length(), and AudioFrame.set_recording_length().
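For example, with the suggested methods (get_recording_length()/set_recording_length() don't exist yet; this is just the intended usage):

buf = AudioFrame(size=5 * 7812)          # 5-second buffer
microphone.record_into(buf, wait=False)
sleep(2000)
microphone.stop_recording()              # only ~2 s actually recorded
audio.play(buf)                          # would play just the recorded part
print(buf.get_recording_length())        # samples recorded so far
buf.set_recording_length(0)              # reset the marker before reusing the buffer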

Collaborator Author

Yes, I completely agree here, playing with the current branch and recording a couple of seconds into a 5 second "buffer" results in a few seconds of "silent playback".

Would record_into() then also use this marker to continue recording where it left off?

Member

Would record_into() then also use this marker to continue recording where it left off?

I think that would be confusing. It would require the user to explicitly reset the marker to the beginning to reuse the buffer. Maybe instead record_into() could have an additional argument which lets the user specify the starting location in the buffer.
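Something along these lines (the start argument name is purely illustrative):

buf = AudioFrame(size=4 * 7812)                  # room for ~4 s at 7812 Hz
microphone.record_into(buf, wait=False)
sleep(2000)
microphone.stop_recording()                      # first clip occupies ~2 s
# Hypothetical argument: record the next clip starting 2 s into the buffer
microphone.record_into(buf, start=2 * 7812, wait=False)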


.. py:function:: stop_recording()

Stops an a recording running in the background.
Contributor

Suggested change
Stops an a recording running in the background.
Stops a recording running in the background.

Fixing typo but that also got me wondering what happens if you record more than once with wait=False (before the first has finished).
Do they all work? If so does this function stop them all?

Member

Only one can be recorded at once, so a previously running recording is aborted at the point you call record() or record_into() again (as though you called stop_recording() first).

microbit-carlos force-pushed the docs-recording branch 2 times, most recently from 72212bc to d37bfa0 on July 21, 2024 19:16
microbit-carlos (Collaborator Author)

@dpgeorge the docs have been updated with the conclusion from microbit-foundation/micropython-microbit-v2#205 (comment).

One thing I've changed that we haven't discussed before was to remove the rate argument from microphone.record_into(), as the rate can be set in the input AudioRecording/AudioTrack, and without the argument there isn't any ambiguity as to what takes precedence.
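So the usage would look roughly like this (a sketch of the proposed API, with the rate living on the buffer rather than on record_into()):

my_recording = audio.AudioRecording(3000, rate=11000)
microphone.record_into(my_recording)   # records at the buffer's own rate
audio.play(my_recording)               # plays back at the same rate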

--------------

.. py:class::
AudioRecording(duration, rate=7812)
Member

Shall we try with a default rate of 11000?

Collaborator Author

Sounds good, will update the docs.


:return: The configured sample rate.

.. py:function:: copy()
Member

Do we need this method? The Python way would be to use the constructor:

new_recording = AudioRecording(old_recording)

(I can't remember if we discussed this or not.)

Collaborator Author

Yes, we looked at picking between a copy constructor and a copy() method; in the end we decided to go with the method because the micro:bit API hasn't used copy constructors before, and the copy() method was already a pattern used in classes like Image.

Member

OK, I've implemented AudioRecording.copy().

(There's actually also the existing SoundEffect.copy(), so that's nice and consistent.)


:returns: a copy of the ``AudioRecording``.

.. py:function:: track(start_ms=0, end_ms=-1)
Member

Are these keyword-only arguments, or positional?

Ie do we allow recording.track(123) or require recording.track(start_ms=123)?

Collaborator Author

I think it's fine to leave them as positional, users can still use keywords for clarity.

Contributor

I don't think they are positional at the moment:

>>> audio.AudioRecording(100).track(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: extra positional arguments given

Member

I've now made them positional.
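So both forms below would work (sketch):

clip = my_recording.track(500, 1500)                    # positional
clip = my_recording.track(start_ms=500, end_ms=1500)    # keywords still allowed for clarity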


:param buffer: The buffer containing the audio data.
:param rate: The sampling rate at which data will be stored
via the microphone, or played via the ``audio.play()`` function.
Member

If you pass in an AudioRecording as the buffer and don't specify the rate, should it take the rate of the AudioRecording?

Member

If the answer to that is "yes" then the signature would need to be:

AudioTrack(buffer, rate=None)

and None means:

  • if buffer is an AudioRecording (or AudioTrack?) then take the rate from that
  • otherwise rate is default 7812 (or 11000?)
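
A quick sketch of how that rate=None rule would read in user code (using the signature above):

recording = audio.AudioRecording(3000, rate=11000)
track_a = audio.AudioTrack(recording)                     # takes 11000 from the recording
track_b = audio.AudioTrack(bytearray(5500), rate=5500)    # plain buffer, rate given explicitly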

Collaborator Author

Right, originally I was thinking to force the default rate, as it's also a simpler implementation and explanation. But after some thought I think most users would expect the rate to be copied. I'll update the docs to the rate=None approach.

``AudioTrack``, ``AudioRecording`` or buffer-like object like
a ``bytes`` or ``bytearray`` instance.

:param other: Buffer-like instance from which to copy the data.
Member

If the other is smaller than self, should the remaining bytes be left untouched? Probably.

Collaborator Author

Yes, I'll add to the docs as well.

a ``bytes`` or ``bytearray`` instance.

:param other: Buffer-like instance from which to copy the data.

Member

Will need to document access via subscript and slicing, and len(audio_track) operator.

Collaborator Author

Is that commonly documented via the dunder methods?
That might be a bit hard to understand for some of our audience; maybe we should add an explanation before or after the class documentation, or in the class description?

Member

You can see an example of how CPython documents dict: https://docs.python.org/3/library/stdtypes.html#mapping-types-dict

Eg it uses:

  • len(d) for length
  • d[key] for subscript
  • d | other for binary ops
  • d |= other for inplace ops

Contributor

Would be great to document these somehow as otherwise it's trial and error/reading the C.

I think for AudioTrack we have:

    @overload
    def __getitem__(self, i: int) -> int: ...
    @overload
    def __getitem__(self, s: slice) -> AudioTrack: ...
    def __setitem__(self, i: int, x: int) -> None: ...

    def __add__(self, v: AudioTrack) -> AudioTrack: ...
    def __iadd__(self, v: AudioTrack) -> AudioTrack: ...
    def __sub__(self, v: AudioTrack) -> AudioTrack: ...
    def __isub__(self, v: AudioTrack) -> AudioTrack: ...
    def __mul__(self, v: float) -> AudioTrack: ...
    def __imul__(self, v: float) -> AudioTrack: ...

So slice access but not slice assignment.
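
For reference, the operations those stubs imply look like this in user code (values illustrative):

track = audio.AudioRecording(1000).track()
n = len(track)          # number of samples
sample = track[0]       # subscript read returns an int
track[0] = 128          # subscript assignment of an int
clip = track[100:200]   # slice read returns a new AudioTrack
quieter = track * 0.5   # scaling returns a new AudioTrack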
