- Published on
Character by Character: I Built a Mixed Chinese-English Counter on HarmonyOS (Original)
- Authors

- Name
- aimode.news
- @aimode_news
Foreword
When I write articles for my public account, the editor always shows a small hint at the bottom: "Recommended: within 1,500 words." I was staring at a draft I had just finished, full of English words, numbers, and punctuation, and Word always counted more words than I expected. I started to take the question seriously: how exactly is a "word" counted? Is "hello" one word or five? Does Chinese punctuation count? These questions sound like nitpicking, but writers care.
That night I decided to write my own counter with my own rules: each Chinese character counts as one, and letters, digits, and punctuation are at least tallied separately so I can see them at a glance. I opened DevEco Studio 6.1.1 Beta1, started the Pura X Max emulator, and went from a TextArea to a character-by-character loop until a mixed-language counter gradually took shape. This article records that night, including some lesser-known Unicode facts, the limits of regular expressions, and the old-fashioned approach of hand-writing a character classifier. As usual, the full code is included, and you can reproduce every number in the emulator.
I. How exactly is "one word" counted?
Microsoft Word counts words mainly by splitting on spaces. For Chinese text, each Chinese character counts as one word, and each English word counts as one. Digit strings and punctuation are generally not counted as words in their own right; they mostly show up in the "Characters" figure instead. For text that mixes Chinese and English, this hybrid rule is often confusing.
For example, take a sentence like "Today I learned 3 words: hello, world, HarmonyOS.", written in Chinese with the three English words kept in English.
What does Word report? A single total, and it is hard to tell what went into it: is "3" merged with the Chinese measure word that follows it, or counted on its own? In practice Word splits Chinese character by character, splits English on spaces, and applies fairly complex rules to digits and punctuation. I don't need a whole Office suite. My needs are simple: counts by category, with transparent rules. So I set myself four rules:
- Chinese characters: characters within the CJK Unified Ideographs ranges; each one counts as one.
- Letters: A-Z and a-z; each letter counts as one.
- Digits: 0-9; each digit counts as one.
- Punctuation: common Chinese and English punctuation, including commas, periods, quotation marks, brackets, and so on.
Spaces, line breaks, emoji, and all other characters are ignored and not included in any count. The totals will differ from Word's, but the rules are transparent and you can see the count for each category. For a writer this is more practical: questions like "how many Chinese characters did I write?" or "how many letters did I use?" get clear answers.
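To see why a single total hides so many choices, here is a small TypeScript sketch (ArkTS is a TypeScript superset, so the logic carries over). The first function splits on whitespace only; the second is my own rough approximation of a Word-style rule, counting each CJK character as a word plus each remaining whitespace-separated run. Neither is Word's actual algorithm, and the function names and the sample sentence are hypothetical.

```typescript
// Naive "word count": split on runs of whitespace and drop empty pieces.
function whitespaceWordCount(text: string): number {
  return text.split(/\s+/).filter((piece) => piece.length > 0).length;
}

// Rough Word-style approximation (an assumption, not Word's real rules):
// every CJK character is one word, and every remaining whitespace-separated
// run of non-CJK characters (punctuation included) is one more word.
function approxMixedWordCount(text: string): number {
  const cjk = /[\u3400-\u4DBF\u4E00-\u9FFF]/g;
  const cjkCount = (text.match(cjk) ?? []).length;
  const rest = text.replace(cjk, " ");
  return cjkCount + whitespaceWordCount(rest);
}

// Illustrative input: "Today I learned 3 words: hello, world, HarmonyOS."
const sample = "今天学了3个单词：hello, world, HarmonyOS。";
console.log(whitespaceWordCount(sample));  // 3
console.log(approxMixedWordCount(sample)); // 11: 7 CJK + "3", "：hello,", "world,", "HarmonyOS。"
```

Two reasonable-looking rules give 3 and 11 for the same sentence, which is exactly the opacity that per-category counts avoid.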
II. Unicode
To implement this classification, the program needs to know the "identity" of every character, which means getting familiar with Unicode.
Unicode assigns a unique number, called a code point, to almost every character in the world. For example, the code point of the Chinese character "中" is U+4E2D, which is 20013 in decimal. In JavaScript (and ArkTS), charCodeAt(0) returns that number; strictly speaking it returns a UTF-16 code unit, which equals the code point for characters in the Basic Multilingual Plane. So all we need to do is check which range that number falls into, and the character is classified.
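A quick way to check these numbers yourself: the sketch below compares charCodeAt, which returns one UTF-16 code unit, with the standard codePointAt. For a BMP character such as 中 they agree; for an emoji they differ, because it is stored as a surrogate pair. The helper names are hypothetical.

```typescript
// The first UTF-16 code unit: what charCodeAt(0) returns.
function firstCodeUnit(s: string): number {
  return s.charCodeAt(0);
}

// The full Unicode code point, decoding a surrogate pair if present.
function firstCodePoint(s: string): number | undefined {
  return s.codePointAt(0);
}

// "中" is in the Basic Multilingual Plane, so both agree: 0x4E2D = 20013.
console.log(firstCodeUnit("中"), firstCodePoint("中")); // 20013 20013

// "😀" (U+1F600) is a surrogate pair: charCodeAt(0) sees only the
// high surrogate 0xD83D = 55357, while codePointAt(0) decodes 0x1F600.
console.log("😀".length, firstCodeUnit("😀"), firstCodePoint("😀")); // 2 55357 128512
```

Surrogate halves live in 0xD800 to 0xDFFF, outside every range used in this article, which is why a charCodeAt-based loop quietly ignores emoji.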
The main CJK Unified Ideographs block runs from U+4E00 to U+9FFF and contains most common and less common Chinese characters. Extension A, U+3400 to U+4DBF, adds some rare characters. We count both as Chinese. Extensions B, C, D, and beyond almost never appear in everyday text, so we leave them alone.
Letters: 65-90 (A-Z) and 97-122 (a-z) in the ASCII range. Crude, but it works.
Digits: 48-57 (0-9).
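These three range rules translate almost word for word into code. This is a minimal TypeScript sketch assuming the same ranges as above; classify and CharCategory are hypothetical names, and punctuation is deliberately left to a separate lookup. Because it uses charCodeAt, a character from Extension B or later (stored as a surrogate pair) falls into "other", consistent with the decision to leave those blocks alone.

```typescript
type CharCategory = "chinese" | "letter" | "digit" | "other";

// CJK Unified Ideographs (U+4E00 to U+9FFF) and Extension A (U+3400 to U+4DBF).
function isChinese(code: number): boolean {
  return (code >= 0x4e00 && code <= 0x9fff) || (code >= 0x3400 && code <= 0x4dbf);
}

// ASCII letters only: A-Z (0x41 to 0x5A) and a-z (0x61 to 0x7A).
function isLetter(code: number): boolean {
  return (code >= 0x41 && code <= 0x5a) || (code >= 0x61 && code <= 0x7a);
}

// ASCII digits 0-9 (0x30 to 0x39).
function isDigit(code: number): boolean {
  return code >= 0x30 && code <= 0x39;
}

// Classifies a single character by its first UTF-16 code unit.
function classify(ch: string): CharCategory {
  const code = ch.charCodeAt(0);
  if (isChinese(code)) return "chinese";
  if (isLetter(code)) return "letter";
  if (isDigit(code)) return "digit";
  return "other"; // punctuation is handled with a separate lookup string
}

console.log(["中", "A", "7", "，"].map(classify)); // [ 'chinese', 'letter', 'digit', 'other' ]
```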
Punctuation is more troublesome. ASCII punctuation is spread across 33-47, 58-64, 91-96, and 123-126 (32 is the space, which we ignore), covering commas, periods, quotation marks, and so on. Chinese punctuation is scattered across different Unicode blocks: the full-width comma "，" is U+FF0C, the ideographic full stop "。" is U+3002, the book-title mark "《" is U+300A, and so on. The most cost-effective way to handle them is either to write a regular expression or to collect them all in a single string.
I chose a punctuation string, PUNCT_CHARS, that lists all the common Chinese and English punctuation marks, and check membership with indexOf. This is easy to maintain: to support another mark, just append it to the string.
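The code looks roughly like this:
// Common Chinese and English punctuation (the space is deliberately excluded)
private readonly PUNCT_CHARS: string =
  '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~，。、；：？！“”‘’（）《》【】…·'

// Returns true if ch is one of the listed punctuation marks
isPunctuation(ch: string): boolean {
  return this.PUNCT_CHARS.indexOf(ch) !== -1;
}
Because the list contains single quotes, double quotes, and a backslash, watch the escaping when you write the literal: escape whichever quote you use as the delimiter, plus the backslash itself (and the backtick, if you write it as a template string).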
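For comparison, the regular-expression route could look like the sketch below, which uses Unicode property escapes (ES2018 and later, with the u flag); the patterns are built with the RegExp constructor. This is a swapped-in technique, not the article's approach, and isPunctuationRegex is a hypothetical name. One catch worth knowing: Unicode classifies ASCII characters such as + < = > ^ ` | ~ $ as symbols (\p{S}) rather than punctuation (\p{P}), and \p{S} also matches emoji, so an explicit string gives more predictable results for these particular rules.

```typescript
// \p{P}: Unicode punctuation. \p{S}: Unicode symbols (math, currency, emoji...).
const PUNCT_ONLY = new RegExp("^\\p{P}$", "u");
const PUNCT_OR_SYMBOL = new RegExp("^[\\p{P}\\p{S}]$", "u");

// True if `ch` is a single punctuation character; with includeSymbols,
// symbols such as + = < > and emoji also match.
function isPunctuationRegex(ch: string, includeSymbols = true): boolean {
  const pattern = includeSymbols ? PUNCT_OR_SYMBOL : PUNCT_ONLY;
  return pattern.test(ch);
}

console.log(isPunctuationRegex("，"));       // true  (Po: other punctuation)
console.log(isPunctuationRegex("+"));        // true  (Sm, matched via \p{S})
console.log(isPunctuationRegex("+", false)); // false (not in \p{P})
console.log(isPunctuationRegex("😀"));       // true  (So): emoji would be counted
```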
III. Real-time statistics
The starting point is HarmonyOS's TextArea component and its onChange callback, which fires on every text change. Putting the counting logic in this callback gives real-time updates: every character you type makes the statistics below jump, which feels great.
I used five state variables: inputText, chineseCount, englishCount, digitCount, and punctCount. Every time onChange fires, it calls countChars, which walks through the text and updates the four count variables. The code is very direct:
TextArea({ placeholder: 'Enter text...', text: this.inputText })
  .onChange((value: string) => {
    this.inputText = value;
    this.countChars(value);
  })
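If you want to test the counting logic without the UI, one option is to move it into a pure function that returns the four totals instead of writing to @State fields. The sketch below is my restructuring, not the article's code; CharCounts and the sample sentence are hypothetical, and PUNCT_CHARS is a stand-in for the article's punctuation list.

```typescript
// Hypothetical result type for a UI-free version of countChars.
interface CharCounts {
  chinese: number;
  letters: number;
  digits: number;
  punctuation: number;
}

// Stand-in list: ASCII punctuation (no space) plus common full-width Chinese marks.
const PUNCT_CHARS = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~，。、；：？！“”‘’（）《》【】…·";

// Same rules as the component: walk UTF-16 code units and bump one counter.
function countChars(text: string): CharCounts {
  const counts: CharCounts = { chinese: 0, letters: 0, digits: 0, punctuation: 0 };
  for (let i = 0; i < text.length; i++) {
    const c = text.charAt(i);
    const code = c.charCodeAt(0);
    if ((code >= 0x4e00 && code <= 0x9fff) || (code >= 0x3400 && code <= 0x4dbf)) {
      counts.chinese++;
    } else if ((code >= 0x41 && code <= 0x5a) || (code >= 0x61 && code <= 0x7a)) {
      counts.letters++;
    } else if (code >= 0x30 && code <= 0x39) {
      counts.digits++;
    } else if (PUNCT_CHARS.indexOf(c) !== -1) {
      counts.punctuation++;
    }
    // Anything else (spaces, line breaks, emoji halves) is ignored.
  }
  return counts;
}

// Illustrative input: "Today I learned 3 words: hello, world, HarmonyOS."
const sample = "今天学了3个单词：hello, world, HarmonyOS。";
console.log(countChars(sample));
// { chinese: 7, letters: 19, digits: 1, punctuation: 4 }
```

In the component, onChange could then copy the four fields from the returned object; keeping the loop free of UI state makes it straightforward to unit test.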
countChars checks the characters one by one in a loop and adds to the matching counter. One detail: in ArkTS, walking a string with an index loop, for (let i = 0; i < s.length; i++), together with s.charAt(i) is safe. for...of also works, but for compatibility and performance the index loop is the most dependable.
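The difference between the two loop styles shows up with characters outside the Basic Multilingual Plane. In this sketch (helper names hypothetical), the index loop yields UTF-16 code units, so an emoji arrives as two surrogate halves, while for...of uses the string iterator and yields whole code points. This assumes a compile target of ES2015 or later; when TypeScript down-compiles to ES5 without downlevelIteration, a for...of over a string becomes an index loop.

```typescript
// Index loop + charAt: yields UTF-16 code units.
function codeUnits(s: string): string[] {
  const out: string[] = [];
  for (let i = 0; i < s.length; i++) {
    out.push(s.charAt(i));
  }
  return out;
}

// for...of: the string iterator yields whole code points.
function codePoints(s: string): string[] {
  const out: string[] = [];
  for (const ch of s) {
    out.push(ch);
  }
  return out;
}

console.log(codeUnits("a😀").length);  // 3: "a" plus two surrogate halves
console.log(codePoints("a😀").length); // 2: "a" and "😀"
```

For this counter the index loop's behavior is convenient: the surrogate halves match none of the four categories, so emoji are ignored as the rules in section I require. If you later decide to count Extension B characters, for...of (or codePointAt) becomes the better choice.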
IV. What does the interface look like?
If the numbers were just a few plain Text components, they wouldn't stand out. I turned them into four small cards arranged in two rows of two. Each card shows a number (large and bold) above its category label (small and gray): Chinese and Letters in the first row, Digits and Punctuation in the second.
The colors are soft: a light blue background for the Chinese card, light green for letters, light orange for digits, and light purple for punctuation. You don't even need to read the labels; a glance at the color tells you which is which.
A "Clear" button empties the input box and resets all counts. The page as a whole is the classic top-to-bottom Column layout, so nothing feels crowded.
For extra polish, I considered giving each card a subtle "bounce" whenever its number changes, but that needs an animation in HarmonyOS and I didn't want to complicate the code, so I left it out. The focus is on functionality.
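Since the four cards differ only in their numbers, labels, and colors, one way to avoid copy-pasting the same Column block four times is to describe the cards as data and render them with ArkUI's ForEach. The sketch below shows only the data side in plain TypeScript; CardSpec, CARD_SPECS, and toRows are hypothetical names, the number colors are the ones used in the full code, and the light background hex values are my assumptions.

```typescript
// Hypothetical data model for the four statistic cards.
interface CardSpec {
  label: string;      // category label shown under the number
  valueColor: string; // font color of the big number
  background: string; // light card background (assumed values)
}

const CARD_SPECS: CardSpec[] = [
  { label: "Chinese", valueColor: "#1565C0", background: "#E3F2FD" },
  { label: "Letters", valueColor: "#2E7D32", background: "#E8F5E9" },
  { label: "Digits", valueColor: "#E65100", background: "#FFF3E0" },
  { label: "Punctuation", valueColor: "#6A1B9A", background: "#F3E5F5" },
];

// Splits a flat list into rows of `perRow` items; perRow = 2 gives the 2x2 grid.
function toRows<T>(items: T[], perRow: number): T[][] {
  if (perRow < 1) {
    throw new Error("perRow must be at least 1");
  }
  const rows: T[][] = [];
  for (let i = 0; i < items.length; i += perRow) {
    rows.push(items.slice(i, i + perRow));
  }
  return rows;
}

console.log(toRows(CARD_SPECS, 2).map((row) => row.map((c) => c.label)));
// [ [ 'Chinese', 'Letters' ], [ 'Digits', 'Punctuation' ] ]
```

Adding a fifth category would then mean adding one entry to the array rather than another block of UI code.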
Complete code
The code below targets DevEco Studio 6.1.1 Beta1 with SDK 22 syntax and was run on the Pura X Max emulator. Create a new Empty Ability project, open entry/src/main/ets/pages/Index.ets, and replace its contents. No permissions are required.
/**
 * A mixed Chinese-English character counter
 * Environment: DevEco Studio 6.1.1 Beta1, Pura X Max emulator, SDK 22
 */
@Entry
@Component
struct Index {
  @State inputText: string = ''
  @State chineseCount: number = 0
  @State englishCount: number = 0
  @State digitCount: number = 0
  @State punctCount: number = 0

  // Common Chinese and English punctuation (the space is deliberately excluded).
  // If a mark is missing, append it to this string.
  private readonly PUNCT_CHARS: string =
    '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~，。、；：？！“”‘’（）《》【】…·'

  // Returns true if ch is one of the listed punctuation marks
  isPunctuation(ch: string): boolean {
    return this.PUNCT_CHARS.indexOf(ch) !== -1;
  }

  // Walks the text one UTF-16 code unit at a time and updates the four counters
  countChars(text: string): void {
    let cn = 0, en = 0, dg = 0, pt = 0;
    for (let i = 0; i < text.length; i++) {
      const c = text.charAt(i);
      const code = c.charCodeAt(0);
      if ((code >= 0x4e00 && code <= 0x9fff) || (code >= 0x3400 && code <= 0x4dbf)) {
        // CJK Unified Ideographs and Extension A
        cn++;
      } else if ((code >= 0x41 && code <= 0x5a) || (code >= 0x61 && code <= 0x7a)) {
        // ASCII letters A-Z and a-z
        en++;
      } else if (code >= 0x30 && code <= 0x39) {
        // Digits 0-9
        dg++;
      } else if (this.isPunctuation(c)) {
        // Punctuation listed in PUNCT_CHARS
        pt++;
      }
      // Everything else (spaces, line breaks, emoji) is ignored
    }
    this.chineseCount = cn;
    this.englishCount = en;
    this.digitCount = dg;
    this.punctCount = pt;
  }

  // Clears the input and resets every counter
  clearAll(): void {
    this.inputText = '';
    this.chineseCount = 0;
    this.englishCount = 0;
    this.digitCount = 0;
    this.punctCount = 0;
  }

  build() {
    Column() {
      Text('Chinese-English Character Counter')
        .fontSize(26)
        .fontWeight(FontWeight.Bold)
        .margin({ top: 20, bottom: 10 })
      Text('Chinese · Letters · Digits · Punctuation')
        .fontSize(15)
        .fontColor('#888')
        .margin({ bottom: 15 })

      // Input area
      TextArea({ placeholder: 'Enter text...', text: this.inputText })
        .onChange((value: string) => {
          this.inputText = value;
          this.countChars(value);
        })
        .width('92%')
        .height(180)
        .fontSize(16)
        .backgroundColor('#FFFFFF')
        .borderRadius(8)
        .padding(10)

      // Statistic cards, first row
      Row() {
        // Chinese characters
        Column() {
          Text(this.chineseCount.toString())
            .fontSize(32)
            .fontWeight(FontWeight.Bold)
            .fontColor('#1565C0')
          Text('Chinese')
            .fontSize(14)
            .fontColor('#666')
        }
        .width('42%')
        .height(80)
        .backgroundColor('#E3F2FD')
        .borderRadius(10)
        .justifyContent(FlexAlign.Center)
        .margin(4)

        // Letters
        Column() {
          Text(this.englishCount.toString())
            .fontSize(32)
            .fontWeight(FontWeight.Bold)
            .fontColor('#2E7D32')
          Text('Letters')
            .fontSize(14)
            .fontColor('#666')
        }
        .width('42%')
        .height(80)
        .backgroundColor('#E8F5E9')
        .borderRadius(10)
        .justifyContent(FlexAlign.Center)
        .margin(4)
      }
      .width('92%')
      .justifyContent(FlexAlign.SpaceAround)
      .margin({ top: 15, bottom: 8 })

      // Statistic cards, second row
      Row() {
        // Digits
        Column() {
          Text(this.digitCount.toString())
            .fontSize(32)
            .fontWeight(FontWeight.Bold)
            .fontColor('#E65100')
          Text('Digits')
            .fontSize(14)
            .fontColor('#666')
        }
        .width('42%')
        .height(80)
        .backgroundColor('#FFF3E0')
        .borderRadius(10)
        .justifyContent(FlexAlign.Center)
        .margin(4)

        // Punctuation
        Column() {
          Text(this.punctCount.toString())
            .fontSize(32)
            .fontWeight(FontWeight.Bold)
            .fontColor('#6A1B9A')
          Text('Punctuation')
            .fontSize(14)
            .fontColor('#666')
        }
        .width('42%')
        .height(80)
        .backgroundColor('#F3E5F5')
        .borderRadius(10)
        .justifyContent(FlexAlign.Center)
        .margin(4)
      }
      .width('92%')
      .justifyContent(FlexAlign.SpaceAround)
      .margin({ bottom: 15 })

      Button('Clear')
        .type(ButtonType.Capsule)
        .backgroundColor('#E0E0E0')
        .fontColor('#333')
        .fontSize(16)
        .onClick(() => {
          this.clearAll();
        })

      Text('💡 Characters are classified by Unicode range: Chinese, letters, digits, punctuation')
        .fontSize(12)
        .fontColor('#999')
        .width('90%')
        .textAlign(TextAlign.Center)
        .margin({ top: 12 })
    }
    .width('100%')
    .height('100%')
    .backgroundColor('#F5F5F5')
  }
}
The code above classifies each character with hard-coded Unicode ranges, which is simple, effective, and compatible. PUNCT_CHARS covers nearly all everyday Chinese and English punctuation; if a mark is missing, just add it to the string.
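indexOf scans the whole string on every call, and it also matches substrings, so an empty string or a two-character string such as ",-" would be reported as punctuation. Neither case can happen inside the charAt loop, but if you reuse isPunctuation elsewhere, a Set built once from the same string is a stricter alternative. A minimal sketch, with hypothetical names and a stand-in punctuation list:

```typescript
// Stand-in list: ASCII punctuation (no space) plus common full-width Chinese marks.
const PUNCT_CHARS = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~，。、；：？！“”‘’（）《》【】…·";

// Built once. Every entry above is a single UTF-16 code unit, so split("") is safe here.
const PUNCT_SET: ReadonlySet<string> = new Set(PUNCT_CHARS.split(""));

// indexOf-based check, as in the article: also true for any substring.
function isPunctuationIndexOf(ch: string): boolean {
  return PUNCT_CHARS.indexOf(ch) !== -1;
}

// Set-based check: true only for exactly one listed character.
function isPunctuationSet(ch: string): boolean {
  return PUNCT_SET.has(ch);
}

console.log(isPunctuationIndexOf(",-"), isPunctuationSet(",-")); // true false
console.log(isPunctuationIndexOf(""), isPunctuationSet(""));     // true false
```

Maintenance stays the same: you still edit the string, and the Set is rebuilt from it.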
Running it
Paste the code into DevEco Studio and run it on the Pura X Max emulator. At the top of the screen is a white text box with a blinking cursor. Type the example sentence from section I ("Today I learned 3 words: hello, world, HarmonyOS.", in Chinese with the English words kept as-is), and the four cards below update immediately. Delete a few letters or type a few Chinese characters, and the numbers change in real time. Tap "Clear": the text box empties and every count drops to zero.
Summary
This counter looks like nothing more than a few jumping numbers, but it touches on a basic and often overlooked topic: how computers classify characters. Along the way you used ArkUI components and also picked up:
- The Unicode encoding system: how Chinese characters, letters, and digits are laid out in code-point space, and why the bulk of everyday Chinese characters can be recognized with a single range check.
- A hand-written character classifier: charCodeAt plus a few range checks makes a very simple but practical classification engine, which also fits log analysis, text cleanup, and format validation.
- Real-time interaction: TextArea.onChange together with @State variables gives instant feedback as you type, so the experience feels natural.
- UI visualization: cards plus colors distinguish the categories more intuitively than a plain list.
Next time you see the "1,500 words" hint, you'll probably have a clear picture in your head: how many Chinese characters, how many letters, how many digits and punctuation marks. That sense of control is the greatest satisfaction of building your own writing tools.