ASCII and Unicode Related Functions in Python

ASCII Code and Related Functions in Python

ASCII (American Standard Code for Information Interchange) is a character encoding standard that assigns numerical values to letters, digits, punctuation marks, and other characters. Here’s an explanation of ASCII and Python functions that work with it.

ASCII Basics

  • ASCII uses 7 bits to represent characters (values 0-127)
  • Standard ASCII (0-127) includes:
    • Control characters (0-31 and 127)
    • Printable characters (32-126): letters, numbers, punctuation
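  • Values 128-255 are not part of ASCII; “extended ASCII” is a loose name for various 8-bit encodings (such as Latin-1 or Windows-1252) that assign these values differently depending on system/locale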
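
The ranges listed above can be sketched as a small classifier. This is only an illustration; `classify_ascii` is a hypothetical helper name, not a built-in function.

```python
def classify_ascii(code):
    """Return a coarse category for an integer character code (hypothetical helper)."""
    if not 0 <= code <= 127:
        return "not ASCII"
    if code < 32 or code == 127:
        return "control"      # 0-31 and 127
    return "printable"        # 32-126

print(classify_ascii(ord("\n")))      # control
print(classify_ascii(ord("A")))       # printable
print(classify_ascii(ord("\u00e9")))  # not ASCII (é is code point 233)
```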

Python Functions for ASCII

1. ord() – Get ASCII value of a character

python

char = 'A'
ascii_value = ord(char)
print(ascii_value)  # Output: 65

2. chr() – Get character from ASCII value

python

ascii_value = 97
character = chr(ascii_value)
print(character)  # Output: 'a'

3. String methods that work with ASCII concepts

isascii() – Check if all characters are ASCII (Python 3.7+)

python

text = "Hello"
print(text.isascii())  # Output: True

encode() – Convert string to bytes using ASCII encoding

python

text = "ABC"
bytes_data = text.encode('ascii')  # b'ABC'
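
Encoding to ASCII fails when the string contains a non-ASCII character. The sketch below shows the default behaviour and three of the standard `errors` handlers; the sample word is just an illustration.

```python
text = "caf\u00e9"  # 'café' with a precomposed é (U+00E9)

try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    # The default handler ('strict') raises on the first non-ASCII character
    print(f"Cannot encode position {exc.start}: {exc.reason}")

replaced = text.encode("ascii", errors="replace")           # b'caf?'
ignored = text.encode("ascii", errors="ignore")             # b'caf'
escaped = text.encode("ascii", errors="backslashreplace")   # b'caf\\xe9'
print(replaced, ignored, escaped)
```

Which handler is appropriate depends on whether losing or altering characters is acceptable for the data in question.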

4. ASCII-related checks

python

# Check if character is uppercase letter
def is_uppercase(c):
    return ord('A') <= ord(c) <= ord('Z')

# Check if character is lowercase letter
def is_lowercase(c):
    return ord('a') <= ord(c) <= ord('z')

# Check if character is digit
def is_digit(c):
    return ord('0') <= ord(c) <= ord('9')
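
These range checks are ASCII-only, whereas the built-in `str.isupper()` and `str.isdigit()` also accept non-ASCII characters. A rough comparison, using hypothetical helper names that mirror the functions above:

```python
def is_ascii_upper(c):
    """ASCII-only uppercase check (hypothetical helper)."""
    return len(c) == 1 and ord("A") <= ord(c) <= ord("Z")

def is_ascii_digit(c):
    """ASCII-only digit check (hypothetical helper)."""
    return len(c) == 1 and ord("0") <= ord(c) <= ord("9")

# A, É, 7, and the Arabic-Indic digit three
samples = ["A", "\u00c9", "7", "\u0663"]
for c in samples:
    print(f"U+{ord(c):04X} isupper={c.isupper()} ascii_upper={is_ascii_upper(c)} "
          f"isdigit={c.isdigit()} ascii_digit={is_ascii_digit(c)}")
```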

Practical Examples

Convert string to ASCII values

python

text = "Hello"
ascii_values = [ord(c) for c in text]
print(ascii_values)  # Output: [72, 101, 108, 108, 111]

Create string from ASCII values

python

values = [72, 101, 108, 108, 111]
text = ''.join(chr(v) for v in values)
print(text)  # Output: "Hello"
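
The same round trip can also be done through `bytes`, since iterating over a bytes object yields integers. A small sketch, assuming the text is pure ASCII:

```python
text = "Hello"
values = list(text.encode("ascii"))       # bytes iterate as integers
restored = bytes(values).decode("ascii")  # integers back to a string

print(values)    # [72, 101, 108, 108, 111]
print(restored)  # Hello
```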

ASCII Art Example

python

# Simple ASCII art
print("""
  /\\
 /  \\
/____\\
""")

Important Notes

  1. Python 3 uses Unicode by default, which extends beyond ASCII
  2. Not all Unicode characters have ASCII equivalents
  3. When working with files or network protocols, you may need to explicitly use ASCII encoding

Remember that while ASCII is fundamental, modern Python applications typically work with Unicode (UTF-8) to support international characters.
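
As a sketch of what explicitly using ASCII encoding looks like for files, the example below writes and reads a temporary file with `encoding="ascii"`. The file name is hypothetical and chosen only for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "ascii_demo.txt")  # hypothetical file name

    # Writing and reading with an explicit ASCII encoding
    with open(path, "w", encoding="ascii") as f:
        f.write("plain ASCII text\n")
    with open(path, encoding="ascii") as f:
        content = f.read()

    # Non-ASCII text is rejected instead of being silently altered
    try:
        with open(path, "w", encoding="ascii") as f:
            f.write("na\u00efve")
        non_ascii_rejected = False
    except UnicodeEncodeError:
        non_ascii_rejected = True

print(repr(content))
print(non_ascii_rejected)
```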

Unicode, Code Points, Planes, and Digraphs in Python

Unicode Overview

Unicode is a universal character encoding standard that aims to represent every character from every writing system in the world. Unlike ASCII (which covers only 128 characters), Unicode defined over 149,000 characters as of version 15.

Key Concepts

1. Code Points

  • A code point is a numerical value that represents a specific character in Unicode
  • Represented as U+ followed by hexadecimal digits (e.g., U+0041 for ‘A’)
  • In Python, code points can be accessed using ord():

python

print(ord('A'))     # 65 (U+0041)
print(ord('€'))     # 8364 (U+20AC)
print(ord('🐍'))    # 128013 (U+1F40D) - Python snake emoji

2. Planes

Unicode divides its code space into 17 planes (groups of 65,536 code points each):

  • Plane 0 (BMP – Basic Multilingual Plane): U+0000 to U+FFFF
    • Contains most commonly used characters
  • Plane 1 (SMP – Supplementary Multilingual Plane): U+10000 to U+1FFFF
    • Historic scripts, musical symbols, emoji
  • Plane 2 (SIP – Supplementary Ideographic Plane): U+20000 to U+2FFFF
    • Rare CJK characters
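  • Plane 3 (TIP – Tertiary Ideographic Plane): U+30000 to U+3FFFF
    • Additional rare and historic CJK characters
  • Planes 4-13: Unassigned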
  • Plane 14 (SSP – Supplementary Special-purpose Plane): U+E0000 to U+EFFFF
    • Special-purpose characters
  • Planes 15-16 (Supplementary Private Use Area-A and -B): U+F0000 to U+10FFFF
    • For private/custom character definitions
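
Because each plane spans 65,536 (2^16) code points, the plane number is simply the code point shifted right by 16 bits. A minimal sketch, with `plane_of` as a hypothetical helper name:

```python
def plane_of(char):
    """Return the Unicode plane number (0-16) of a single character."""
    return ord(char) >> 16

print(plane_of("A"))             # 0  (BMP)
print(plane_of("€"))             # 0  (BMP)
print(plane_of("\U0001F40D"))    # 1  (SMP, the snake emoji)
print(plane_of(chr(0x20000)))    # 2  (SIP)
print(plane_of(chr(0x10FFFF)))   # 16 (last code point)
```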

3. Digraphs and Combining Characters

  • Digraph: A pair of characters representing one sound (like ‘ch’ in Spanish)
  • Combining characters: Special code points that modify previous characters:
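    • Example: 'n' followed by the combining tilde U+0303 renders as 'ñ'; this two-code-point sequence is canonically equivalent to the precomposed U+00F1, but the two strings compare unequal until normalized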

python

# Combining character example
n = '\u006E'       # 'n'
tilde = '\u0303'    # Combining tilde
print(n + tilde)    # Output: 'ñ'
print(len(n + tilde))  # Length is 2 (but appears as one character)
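
A practical consequence is that the combined sequence and the precomposed character look the same but do not compare equal. The sketch below shows one way to compare them, using `unicodedata.normalize`:

```python
import unicodedata

decomposed = "n\u0303"   # 'n' + combining tilde (2 code points)
precomposed = "\u00f1"   # single code point U+00F1

print(decomposed == precomposed)  # False: different code point sequences
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True

# A non-zero combining class marks U+0303 as a combining character
print(unicodedata.combining("\u0303"))  # 230
```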

Python Unicode Functions

1. chr() – Create character from code point

python

print(chr(65))        # 'A'
print(chr(128013))    # '🐍'

2. str.encode() – Convert to bytes

python

text = "Python 🐍"
utf8_bytes = text.encode('utf-8')
print(utf8_bytes)  # b'Python \xf0\x9f\x90\x8d'
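
UTF-8 is a variable-length encoding, so the number of bytes per character grows with the code point. A short illustration with four sample characters:

```python
# A, é, €, and the snake emoji
chars = ["A", "\u00e9", "\u20ac", "\U0001F40D"]

utf8_lengths = {}
for char in chars:
    encoded = char.encode("utf-8")
    utf8_lengths[char] = len(encoded)
    print(f"U+{ord(char):04X} -> {len(encoded)} byte(s): {encoded.hex()}")
```

The four characters take 1, 2, 3, and 4 bytes respectively.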

3. bytes.decode() – Convert bytes to string

python

bytes_data = b'Python \xf0\x9f\x90\x8d'
text = bytes_data.decode('utf-8')
print(text)  # 'Python 🐍'
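
Decoding fails when the bytes are not valid in the chosen encoding. The sketch below uses a byte string that is valid Latin-1 but not valid UTF-8; the sample data is only an illustration.

```python
data = b"caf\xe9"  # 'café' encoded as Latin-1, which is not valid UTF-8

try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    print(f"Invalid UTF-8 at byte {exc.start}: {exc.reason}")

# errors='replace' substitutes U+FFFD (the replacement character)
lenient = data.decode("utf-8", errors="replace")
# Decoding with the encoding that was actually used recovers the text
correct = data.decode("latin-1")

print(repr(lenient))  # 'caf\ufffd'
print(correct)        # café
```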

4. Unicode Escape Sequences

python

# Using \u for BMP (4 hex digits)
print('\u0041')  # 'A'

# Using \U for non-BMP (8 hex digits)
print('\U0001F40D')  # '🐍'

# Using \N with name
print('\N{SNAKE}')   # '🐍'

5. Normalization (unicodedata module)

python

from unicodedata import normalize, name

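# Normalization forms: NFC, NFD, NFKC, NFKD
decomposed = 'n\u0303'    # 'n' + combining tilde (2 code points)
composed = normalize('NFC', decomposed)
print(len(composed))      # 1: composed into 'ñ' (U+00F1)
print(len(normalize('NFD', composed)))  # 2: decomposed to 'n' + U+0303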

# Get character name
print(name('A'))      # 'LATIN CAPITAL LETTER A'
print(name('🐍'))     # 'SNAKE'
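
The compatibility forms (NFKC, NFKD) go further than NFC and NFD: they also replace characters such as ligatures and circled digits with their plain equivalents. A brief sketch with two sample characters:

```python
from unicodedata import normalize

ligature = "\ufb01"   # 'ﬁ' (LATIN SMALL LIGATURE FI)
circled = "\u2460"    # '①' (CIRCLED DIGIT ONE)

# NFC leaves compatibility characters untouched
print(normalize("NFC", ligature) == ligature)  # True

# NFKC maps them to plain equivalents
print(normalize("NFKC", ligature))  # fi
print(normalize("NFKC", circled))   # 1
```

Since NFKC discards distinctions such as the ligature, it is better suited to searching and comparison than to storing the original text.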

6. String Methods

python

text = "Python3️⃣🐍"

# Check if all characters are alphanumeric
print(text.isalnum())        # False (the emoji and keycap modifiers are not alphanumeric)
print("Python3".isalnum())   # True

# Case folding (more aggressive than lower())
print('ß'.casefold())  # 'ss'
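
Case folding is mainly useful for caseless comparison. A minimal sketch, with `caseless_equal` as a hypothetical helper name:

```python
def caseless_equal(a, b):
    """Compare two strings ignoring case, using casefold() (hypothetical helper)."""
    return a.casefold() == b.casefold()

print("Straße".lower() == "STRASSE".lower())  # False: lower() keeps 'ß'
print(caseless_equal("Straße", "STRASSE"))    # True: 'ß' folds to 'ss'
```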

Working with Surrogate Pairs

Characters outside BMP (above U+FFFF) are represented using surrogate pairs in UTF-16:

python

# Emoji is outside BMP (U+1F40D)
snake = '🐍'

# Length in code points
print(len(snake))            # 1

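# Actual UTF-16 representation (a Python str stores code points, so encode first)
utf16 = snake.encode('utf-16-be')
print([hex(int.from_bytes(utf16[i:i+2], 'big')) for i in range(0, len(utf16), 2)])  # ['0xd83d', '0xdc0d'] (surrogate pair)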
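
The surrogate pair can also be computed by hand: subtract 0x10000 from the code point, then split the remaining 20 bits into a high and a low half. A sketch of that arithmetic, with `to_surrogate_pair` as a hypothetical helper name:

```python
def to_surrogate_pair(char):
    """Return the (high, low) UTF-16 surrogates for a non-BMP character."""
    code_point = ord(char)
    if code_point < 0x10000:
        raise ValueError("BMP characters do not need a surrogate pair")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low

high, low = to_surrogate_pair("\U0001F40D")  # snake emoji
print(hex(high), hex(low))  # 0xd83d 0xdc0d
```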

Practical Examples

1. Iterating over Unicode characters

python

for char in "Hello🐍":
    print(f"{char}: U+{ord(char):04X}")

2. Creating custom characters

python

# Using private use area
custom_char = chr(0xE000)
print(custom_char)  # Rendering depends on the font (private use character)

3. Checking character properties

python

import unicodedata

def char_info(c):
    print(f"Character: {c}")
    print(f"Code point: U+{ord(c):04X}")
    print(f"Name: {unicodedata.name(c)}")
    print(f"Category: {unicodedata.category(c)}")
    
char_info('A')
char_info('🐍')
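
Note that `unicodedata.name()` raises `ValueError` for characters that have no name, such as control characters and private use code points. One way to handle this is to pass a default value, as in this sketch (`safe_name` is a hypothetical helper name):

```python
import unicodedata

def safe_name(c):
    """Return the Unicode name, or a placeholder if the character has none."""
    return unicodedata.name(c, "<unnamed>")

for c in ["A", "\n", chr(0xE000)]:
    print(f"U+{ord(c):04X}: {safe_name(c)} ({unicodedata.category(c)})")
```

Here the newline has category Cc (control) and U+E000 has category Co (private use); both fall back to the placeholder.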

Important Notes

  1. Python 3 strings are Unicode by default
  2. Since Python 3.3, sys.maxunicode is always 1114111 (0x10FFFF); the distinction between “narrow” (UCS-2) and “wide” (UCS-4) builds applies only to older versions
  3. When working with files, always specify encoding (preferably UTF-8)
  4. Some emoji are actually sequences of multiple code points (emoji + modifiers)
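
To illustrate the last note, the sketch below builds a few emoji sequences from escapes and prints their code points. `len()` counts code points, not what the reader perceives as one character. The three sequences are only examples.

```python
sequences = {
    "thumbs up + skin tone": "\U0001F44D\U0001F3FD",
    "keycap 3": "3\uFE0F\u20E3",
    "family (joined with ZWJ)": "\U0001F468\u200D\U0001F469\u200D\U0001F467",
}

for label, seq in sequences.items():
    code_points = " ".join(f"U+{ord(c):04X}" for c in seq)
    print(f"{label}: len={len(seq)} -> {code_points}")
```

The standard library does not split strings into user-perceived characters (grapheme clusters), so doing that reliably would need a third-party package.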
