This Go program demonstrates how to work with Unicode characters in a string.
It prints the length of a string in bytes, iterates over each byte to print its hexadecimal representation, counts the runes in the string, and iterates over each rune to print its Unicode code point and starting byte index.
It also decodes runes explicitly with the utf8.DecodeRuneInString function and passes each one to an example function, examineRune, which reports specific runes.
package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	const s = "สวัสดี"

	// Print the length of the string in bytes
	fmt.Println("Len:", len(s))

	// Iterate over each byte and print its hexadecimal representation
	for i := 0; i < len(s); i++ {
		fmt.Printf("%x ", s[i])
	}
	fmt.Println()

	// Print the count of runes in the string
	fmt.Println("Rune count:", utf8.RuneCountInString(s))

	// Iterate over each rune and print its Unicode code point and starting index
	for idx, runeValue := range s {
		fmt.Printf("%#U starts at %d\n", runeValue, idx)
	}

	fmt.Println("\nUsing DecodeRuneInString")
	// Iterate over each rune using DecodeRuneInString and print its Unicode code point and starting index
	for i, w := 0, 0; i < len(s); i += w {
		runeValue, width := utf8.DecodeRuneInString(s[i:])
		fmt.Printf("%#U starts at %d\n", runeValue, i)
		w = width
		// Examine the rune using the examineRune function
		examineRune(runeValue)
	}
}

// examineRune examines specific runes and prints custom messages
func examineRune(r rune) {
	if r == 't' {
		fmt.Println("found tee")
	} else if r == 'ส' {
		fmt.Println("found so sua")
	}
}
Output:
Len: 18
e0 b8 aa e0 b8 a7 e0 b8 b1 e0 b8 aa e0 b8 94 e0 b8 b5
Rune count: 6
U+0E2A 'ส' starts at 0
U+0E27 'ว' starts at 3
U+0E31 'ั' starts at 6
U+0E2A 'ส' starts at 9
U+0E14 'ด' starts at 12
U+0E35 'ี' starts at 15

Using DecodeRuneInString
U+0E2A 'ส' starts at 0
found so sua
U+0E27 'ว' starts at 3
U+0E31 'ั' starts at 6
U+0E2A 'ส' starts at 9
found so sua
U+0E14 'ด' starts at 12
U+0E35 'ี' starts at 15
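
The output reports 18 bytes but only 6 runes because each of these Thai code points falls in the range U+0800 to U+FFFF, which UTF-8 encodes in three bytes. The following is a minimal sketch that checks this with utf8.RuneLen from the standard library; the helper name byteWidths is hypothetical and introduced only for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteWidths is a hypothetical helper (not part of the original program).
// It returns the number of bytes UTF-8 needs for each rune in s, in order.
// It assumes s is valid UTF-8; for invalid bytes the widths would not add
// up to len(s).
func byteWidths(s string) []int {
	widths := make([]int, 0, utf8.RuneCountInString(s))
	for _, r := range s {
		widths = append(widths, utf8.RuneLen(r))
	}
	return widths
}

func main() {
	const s = "สวัสดี"

	total := 0
	for _, w := range byteWidths(s) {
		total += w
	}

	fmt.Println(byteWidths(s))   // [3 3 3 3 3 3]
	fmt.Println(total == len(s)) // true
}
```

An ASCII-only string would give a width of 1 for every rune, in which case the byte length and the rune count coincide.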
Explanation:
- The program initializes a constant string s with the value "สวัสดี", which means "hello" in Thai.
- The length of the string in bytes is printed using len(s).
- A loop iterates over each byte of the string, printing its hexadecimal representation.
- The utf8.RuneCountInString function is used to count the number of runes in the string.
- A loop using range iterates over each rune in the string, printing its Unicode code point and starting index.
- Another loop uses utf8.DecodeRuneInString to iterate over each rune and print its Unicode code point and starting index.
- The examineRune function is called to examine specific runes and print custom messages based on their values.
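
The DecodeRuneInString loop described above can also be written with the width added directly to the index, which avoids the separate w variable. The sketch below is one possible form, not the program's own code; runeStarts is a hypothetical helper name. It also shows what the function returns for a byte that is not valid UTF-8: the rune utf8.RuneError with a width of 1, so the loop still advances.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeStarts is a hypothetical helper (not part of the original program).
// It returns the byte offset at which each rune of s begins, decoding one
// rune at a time with utf8.DecodeRuneInString.
func runeStarts(s string) []int {
	var starts []int
	for i := 0; i < len(s); {
		_, width := utf8.DecodeRuneInString(s[i:])
		starts = append(starts, i)
		// width is at least 1 for non-empty input, so the loop terminates.
		i += width
	}
	return starts
}

func main() {
	fmt.Println(runeStarts("สวัสดี")) // [0 3 6 9 12 15]

	// An invalid byte decodes to utf8.RuneError with a width of 1.
	r, width := utf8.DecodeRuneInString("\xff")
	fmt.Println(r == utf8.RuneError, width) // true 1
}
```

The offsets match the "starts at" values printed by both loops in the program, since range over a string and DecodeRuneInString decode the same UTF-8 sequences.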