!!Con 2020 - Trapped in codepoints no more! I’m freeing Chinese characters by Gábor Ugray

Channel: Confreaks
Published: 2021-04-19
Video link: https://www.youtube.com/watch?v=lep9LJQ5Muc
Duration: 12:39

Unless you’re one of the 1.5 billion people who learned to write Chinese in school, you probably know the script only as that remarkable thing where you must master thousands of characters to read the daily paper. But those thousands of characters have a structure: they mix, re-mix, shuffle, and creatively glue together just a few hundred components.

Together, they form an intricate system in which components combine with one another, adding a piece of meaning here and hinting at the pronunciation there. The details are often more arcane than English spelling, but once you get the knack of the system, the script as a whole suddenly starts to make sense. The way computers encode Chinese characters erases all of this: there are no components, no shapes, and no system of interlocking parts. Everything is reduced to one code point per character.
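
A quick way to see the "one code point per character" problem, sketched in Python with only the standard library: a character such as 好 ("good") is a single code point, and Unicode defines no decomposition for it, so nothing in the encoding says it is built from 女 ("woman") and 子 ("child"). Unicode's Ideographic Description Characters (starting at U+2FF0) can spell out the layout as a separate string, but that string is a description kept alongside the character, not part of it. The helper `describe` and the hand-written IDS string below are illustrative assumptions.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character (illustrative helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# One character, one code point: nothing in the encoding reveals its parts.
hao = "好"
print(len(hao))                               # 1
print(describe(hao))                          # U+597D CJK UNIFIED IDEOGRAPH-597D
print(repr(unicodedata.decomposition(hao)))   # '' (no decomposition is defined)
print(unicodedata.normalize("NFD", hao) == hao)  # True: normalization doesn't split it

# The structure has to be written down separately, e.g. as an Ideographic
# Description Sequence (IDS). This IDS is hand-written for illustration;
# it is not derived from the code point above.
ids = "⿰女子"            # ⿰ means "left part, then right part"
layout, *parts = ids
print(describe(layout))  # U+2FF0 IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT
for part in parts:
    print(describe(part))
```

Even the IDS only records where the parts sit; it says nothing about which part carries meaning or sound, which is exactly the information the dataset described in this talk sets out to capture.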

I’ll show you how I’m building a dataset of Chinese characters and their parts, promoting SVG shapes to first-class citizens with meaning, sound, and historical background. The knowledge is out there in printed books and unstructured digital content, but it has never been collected in a well-designed, machine-readable format. It’s a long journey: in two years I’ve covered 20% of the 9,000 characters in common use today.
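
To make "components as first-class citizens" concrete, here is a minimal Python sketch of what such records might look like. The schema (`Component`, `Character`, their field names, and the `build_reverse_index` helper) is a hypothetical illustration, not the format of the author's dataset, and the SVG and history fields are left empty rather than invented. The two sample entries use common textbook analyses: 好 "good" as 女 "woman" + 子 "child", and 妈 "mother" as 女 for meaning plus 马 (mǎ) for sound. Once parts are records instead of being hidden inside a code point, a reverse index (every character that contains a given component) takes only a few lines; that is the kind of cross-link a character dictionary could offer.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical schema, for illustration only.
@dataclass
class Component:
    form: str           # the component as text, e.g. "女"
    meaning: str        # short English gloss
    pinyin: str         # pronunciation when written on its own
    svg_path: str = ""  # outline of the shape; left empty in this sketch
    history: str = ""   # notes on historical forms; left empty here

@dataclass
class Character:
    char: str
    pinyin: str
    meaning: str
    # (component, role) pairs; role is "meaning" or "sound" in this sketch
    parts: list[tuple[Component, str]] = field(default_factory=list)

nv = Component("女", "woman", "nǚ")
zi = Component("子", "child", "zǐ")
ma = Component("马", "horse", "mǎ")

chars = [
    Character("好", "hǎo", "good", [(nv, "meaning"), (zi, "meaning")]),
    Character("妈", "mā", "mother", [(nv, "meaning"), (ma, "sound")]),
]

def build_reverse_index(chars: list[Character]) -> dict[str, list[str]]:
    """Map each component form to the characters that contain it."""
    index: dict[str, list[str]] = defaultdict(list)
    for c in chars:
        for comp, _role in c.parts:
            index[comp.form].append(c.char)
    return dict(index)

index = build_reverse_index(chars)
print(index["女"])  # ['好', '妈']
print(index["马"])  # ['妈']
```

Keeping the role next to each component (rather than on the component itself) reflects that the same shape can hint at meaning in one character and at sound in another.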

I’ll conclude by showing the incredibly cool tools this dataset makes possible, from an interactive two-dimensional graph of every Chinese character to a unique cross-linked character dictionary app.

Gábor is co-founder of memoQ, the tool that brought real-time collaboration to translators before Google Docs was cool. He loves building whimsical language tools and has been known to train neural networks to translate. He blogs at jealousmarkup.xyz and tweets as @twilliability. He was last spotted zooming across Berlin on a sleek red racing bike.