Exercise idea: emoji nano blogging #1507

Closed
@ccouzens

Background: I'd like to make an exercise that introduces people to Unicode. The goal is to teach a little about Unicode, and to teach how to manipulate it in the language they're learning.

Originally (#1503) I was thinking of either extending the existing reverse string exercise or making a Unicode reverse string exercise.
Extending reverse string wasn't seen as a good idea because it would increase the complexity and exclude some languages.
I don't think a Unicode reverse string exercise fits with the idea of having storified exercises.

Instead, I'm suggesting the following exercise. The point of it is to limit strings to 5 characters. The challenge comes from Unicode characters usually not being indexable in the way people are used to.

Consider this JavaScript:

"hello world".slice(0, 5)
"hello"
"😛😛😛😛😛😛😛😛".slice(0, 5)
"😛😛\ud83d"

The naive way of getting a 5 character string returned only 2 and a half characters when used with emoji: each 😛 takes two UTF-16 code units, so slicing at 5 code units cuts the third emoji in half.
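For comparison, here is a minimal sketch of one way to truncate without splitting the emoji. It assumes that "character" means a Unicode code point; `truncateCodePoints` is a hypothetical name for illustration, not part of the proposed exercise.

```javascript
// Hypothetical helper: truncate by code points instead of UTF-16 code units.
function truncateCodePoints(text, limit = 5) {
  // Array.from iterates a string by code point, so surrogate pairs stay whole.
  return Array.from(text).slice(0, limit).join("");
}

console.log(truncateCodePoints("hello world")); // "hello"
console.log(truncateCodePoints("😛😛😛😛😛😛😛😛")); // "😛😛😛😛😛"
```

Other languages will need a different approach, which is roughly the point of the exercise.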

Suggested story: You have identified a gap in the social media market for very very short posts. Now that Twitter allows 280 character posts, people wanting quick social media updates aren't being served.

To make your product noteworthy, you make it extreme and only allow posts of 5 or fewer characters. Any posts of more than 5 characters should be truncated to 5.

To allow your users to express themselves fully, you allow Emoji and other Unicode.
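One wrinkle with allowing emoji: some of them, such as flags, are built from more than one code point, so even counting code points can cut an emoji apart. Whether the exercise should go that far is an open question. As a rough sketch, assuming a JavaScript runtime that provides `Intl.Segmenter`, truncating by user-perceived characters (grapheme clusters) could look like this; `truncateGraphemes` is a hypothetical name.

```javascript
// Hypothetical helper: truncate by grapheme clusters (user-perceived characters).
// Assumes the runtime provides Intl.Segmenter.
function truncateGraphemes(text, limit = 5) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  // Each item yielded by segment() has a `segment` property holding one cluster.
  const graphemes = Array.from(segmenter.segment(text), (s) => s.segment);
  return graphemes.slice(0, limit).join("");
}

const flags = "🇬🇧".repeat(6);
console.log(Array.from(flags).length); // 12 (each flag is two code points)
console.log(truncateGraphemes(flags) === "🇬🇧".repeat(5)); // true
```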

Additional Information for the students:
ASCII, UTF-8 and UTF-16 are 3 common character encodings you are likely to
encounter as a programmer. ASCII is a character encoding published in 1963
designed for representing English language text. UTF-8 and UTF-16 are both
Unicode character encodings. Unicode was first published in 1991 and can now
represent English language text, text in other languages, historic text and, importantly for our social network, emoji.

UTF-8 and UTF-16 both represent English language characters as a single code
unit. But many emoji characters require 2 or more code units.

You may find that your previous methods of working with strings have been working on code units instead of characters.
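To make the code unit point concrete, here is a small sketch that counts how many code units a string takes in each encoding. It assumes a runtime with a global `TextEncoder` (modern browsers and Node.js); `codeUnitCounts` is a hypothetical name.

```javascript
// Hypothetical helper: compare code unit counts with the code point count.
function codeUnitCounts(text) {
  return {
    utf16: text.length, // JavaScript strings are sequences of UTF-16 code units
    utf8: new TextEncoder().encode(text).length, // UTF-8 code units are bytes
    codePoints: Array.from(text).length,
  };
}

console.log(codeUnitCounts("a")); // utf16: 1, utf8: 1, codePoints: 1
console.log(codeUnitCounts("😛")); // utf16: 2, utf8: 4, codePoints: 1
```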


I'm happy to create this exercise if people think it has merit.
