Geometric Character Recognition via Physical Resonance: Applications of Structural Inertia and Dissipation in the PoI Framework
Author: Fumio Miyata
Date: April 2026
Repository: https://github.com/aikenkyu001/PoI_OCR
DOI: https://doi.org/10.5281/zenodo.19689520
Abstract
This research conducts an initial verification of a character recognition methodology based on physical resonance within the framework of Physics of Intelligence (PoI) theory. We confirm promising trends regarding rotation invariance and the discrimination of fine-grained structural details across a limited character set. The proposed approach involves constructing a structural field K and an input field Ω from skeletonized image structures. The geometric identity between characters is evaluated by calculating a PoI (Physics of Intelligence) Resonance Score, which aggregates multiple physical observables: commutator norms, eigenvalue spectra, quantized ranks, and three-body interactions.
Experimental evaluations were performed on a dataset of over 40 characters, primarily Japanese, specifically testing robustness against 45-degree rotations and font disparities. The results demonstrate significant invariance to rotation and scaling. Furthermore, the method produced substantial score differentials for character groups with subtle structural variations, such as the kanji triplets “邉”, “辺”, and “邊”. This work suggests that PoI theory, which frames intelligence as a direct consequence of physical laws, offers a viable, low-computational-cost alternative to traditional statistical models.
1. Introduction
Optical Character Recognition (OCR) has long been dominated by approaches rooted in statistical learning. In particular, the advent of deep learning has led to high-precision models based on Convolutional Neural Networks (CNNs) and Transformers. However, these models face persistent challenges, such as heavy reliance on massive datasets, incomplete rotation invariance, and the lack of transparency in internal decision-making criteria.
In contrast, this study investigates a novel framework for character recognition based on Physics of Intelligence (PoI) theory (Miyata, 2026), which treats intelligence as a physical phenomenon. Rather than relying on statistics, this method identifies characters through the resonance between geometric structures and physical fields. We conduct preliminary experiments to explore the potential of applying PoI-based physical resonance to the OCR domain.
The attempt to understand intelligence as a consequence of physical laws is evidenced in research on entropy-driven action selection (Wissner-Gross & Freer, 2013) and unified brain theories based on free-energy minimization (Friston, 2010). Furthermore, the physical formulation of intelligence has been discussed since the early work of Escultura (2012).
Our method skeletonizes images to extract graph structures consisting of nodes and edges, from which a structural field K and an input field Ω are constructed. The resonance arising between these fields is evaluated as physical quantities—such as commutator norms, eigenvalue spectra, quantized ranks, and three-body interactions—to measure the identity between characters.
2. Related Work
2.1 Statistical OCR
While CNN- and Transformer-based OCR models achieve high accuracy, they exhibit fundamental weaknesses in rotation invariance and the discrimination of fine-grained structures. Modern OCR is dominated by Transformer-based models (Vaswani et al., 2017; Li et al., 2021), yet these models still face challenges regarding structural perturbations and require tens of millions of parameters, resulting in high computational and training costs.
2.2 Topological and Graph-Based OCR
Classical methods utilizing skeleton and graph structures exist; however, they often struggle with a lack of rotation invariance, difficulties in node correspondence, and sensitivity to noise. Stable topological structures can be obtained through skeletonization using the Zhang-Suen method (Zhang & Suen, 1984). Geometric extraction via Graph Laplacian eigenvalues (Belkin & Niyogi, 2003) and multi-scale geometric analysis based on diffusion processes (Coifman & Lafon, 2006) provide the mathematical foundations for the structural field K used in PoI-OCR.
2.3 PoI Theory
PoI is an emerging theory that conceptualizes intelligence as the “structure and resonance of fields.” It performs information processing using physical concepts such as:
- Structural field K
- Input field Ω
- Commutator Lock
- Rank Quantization
- Three-body Interaction
The present research represents the first practical implementation of PoI theory in the field of OCR.
3. PoI-OCR Algorithm
The overall processing flow of PoI-OCR is depicted in Figure 1.
Figure 1: PoI-OCR Overall Pipeline
flowchart TD A[Input Image I] --> B[Binarization] B --> C[Skeletonization] C --> D[Node Extraction] D --> E[Graph Construction] E --> F["Distance Matrix D (Graph Distances)"] F --> G[PCA Canonicalization] G --> H1[A-field Construction<br/>Local Features] G --> H2[K-field Construction<br/>Structural Kernel] H1 --> I[Ω-field Construction] H2 --> J["K-Eigen-decomposition (Canonical K)"] I --> K["PoI State S = (K, Ω, rank, sig)"] J --> K K --> L["PoI Resonance (Comparison)"]
Figure 1:
The overall processing flow of PoI-OCR. The system proceeds from image input through skeletonization, graph construction, and distance matrix generation. Following PCA-based canonicalization, the A-field and K-field are constructed to generate the PoI state. Finally, the geometric identity between characters is evaluated through PoI resonance.
The PoI-OCR algorithm is composed of the following seven stages:
- Topology Extraction (Skeletonization)
- Node Classification and Graph Construction
- Geodesic Distance Matrix DD Generation
- Canonicalization for Rotation Invariance
- PoI Field Construction (K-field, ΩΩ-field)
- Effective Dimension Calculation
- PoI Resonance Score Computation
3.1 Topology Extraction
The input image I is binarized, and thinning (skeletonization) is applied to obtain a 1-pixel-wide skeleton S:
This process extracts the topological structure rather than the superficial “shape.” Because PoI-OCR operates exclusively on this topological skeleton, it remains unaffected by font variations, line thickness, or rotation.
3.2 Node Classification and Graph Construction
For each foreground pixel on the skeleton, the number of neighbors n in a 3×3 window is counted to classify nodes into four types:
| Neighbors n | Node Type |
|---|---|
| 1 | END |
| 2 | LINE |
| 3 | BRANCH |
| ≥4 | CROSS |
This yields the “nodes” constituting the character structure. Next, a Breadth-First Search (BFS) is performed from each node to identify paths to other nodes, adding edges to construct the skeleton graph G=(V,E)G=(V,E).
Figure 2: Conversion from Skeleton to Graph
flowchart A[Skeleton Image] --> B["Node Classification<br/>END / LINE / BRANCH / CROSS"] B --> C["Node Set V"] D --> E["Edge Set E"] C --> D[BFS Search] E --> F["Skeleton Graph G(V,E)"]
3.3 Geodesic Distance Matrix DD
The shortest path distances between all nodes are calculated on graph G via BFS to obtain the distance matrix D∈RN×N:
This represents the intrinsic geometry (geodesic geometry) of the character and serves as the core data structure of PoI-OCR.
3.4 Canonicalization for Rotation Invariance
Node coordinates are aligned to their principal axes using PCA to achieve complete rotation invariance:
- Translate the center of mass of the nodes to the origin.
- Compute the covariance matrix.
- Identify the eigenvector corresponding to the maximum eigenvalue as the principal axis.
- Rotate the coordinates such that the principal axis aligns with the x-axis.
This ensures that regardless of the input rotation, PoI-OCR consistently generates the same structural field.
3.5 PoI Field Construction
3.5.1 Structural Field KK (K-field)
The distance matrix D is transformed using an exponential kernel to define the structural field K:
The field is then canonicalized via eigen-decomposition and normalized so that tr(K)=1:
This field represents the structural inertia of the character.
Figure 3: Generation of K-field and ΩΩ-field
flowchart TD A["D-matrix"] --> B["Exponential Kernel<br/>exp(-D/σ)"] B --> C["K-field"] C --> D["Eigen-decomposition (Canonical K)"] E["A-field"] --> F["Gram Matrix"] F --> G["Ω-field"]
3.5.2 Input Field ΩΩ (ΩΩ-field)
An A-field is constructed from local node features (degree, centrality, node type), and its Gram matrix is used as the input field Ω:
This field represents the local properties of the character structure.
3.6 Effective Dimension
Using the eigenvalue spectrum {si} of the structural field K, we define the “Effective Dimension”:
This physical quantity represents the structural complexity of the character.
Figure 4: Structure of PoI State S
flowchart LR A[K-field] --> D[PoI State S] B[Ω-field] --> D C[Effective Dimension] --> D E[Layered Signature] --> D
3.7 PoI Resonance Score
The resonance between two PoI states S1=(K1,Ω1,d1) and S2=(K2,Ω2,d2) is defined as the product of the following physical observables:
Figure 5: Structure of PoI Resonance
flowchart TD A1[K₁] --> B1[Trace Alignment T] A2[K₂] --> B1 A1 --> B2[Commutator Lock C] A2 --> B2 C1[rank₁] --> B3[Rank Quantization R] C2[rank₂] --> B3 S1[Spec₁] --> B4[Spectrum Quantization S] S2[Spec₂] --> B4 AΩ1[Ω₁] --> B5[Three-body Interaction H] AK1[K₁] --> B5 AΩ2[Ω₂] --> B5 AK2[K₂] --> B5 B1 --> Z[PoI Resonance Score] B2 --> Z B3 --> Z B4 --> Z B5 --> Z
(1) Trace Alignment (Field Consistency)
(2) Commutator Lock
Greater structural similarity results in a smaller commutator, strengthening the resonance. The commutability of [K1,K2] corresponds to “simultaneous observability” in quantum information theory (Nielsen & Chuang, 2010).
(3) Rank Quantization (Higgs Effect)
where q(⋅) denotes a quantization function.
(4) Eigenvalue Spectrum Quantization
S=exp(−β∥q(s1)−q(s2)∥)
(5) Three-body Interaction (A × K × ΩΩ)
H=exp(λ1+∣tr(Ω1K1Ω2⊤K2⊤)∣∣tr(Ω1K1Ω2⊤K2⊤)∣)
Final PoI Resonance Score
Summary: The Essence of PoI-OCR
PoI-OCR is a purely physical character recognition algorithm that integrates:
- Topology and Geometry
- Field Theory and Commutators
- Quantization and Three-body Interactions
It utilizes zero statistical learning; recognition is achieved solely through the resonance (consistency) between structure and fields.
3.8 Implementation Constraints
The implementation of PoI-OCR used in this study reproduces the primary components of PoI theory (structural field K, input field Ω, commutator lock, rank quantization, and three-body interaction) within feasible limits. However, due to practical constraints, the implementation includes several approximations and simplifications. These include skeletonization on a 64×64 discrete grid with 3×3 neighborhood classification, finite-dimensional embedding of fields, numerical approximation of commutator norms, and stabilization of three-body interactions through ad-hoc normalization. These simplifications preserve the essential properties of PoI theory but do not constitute a complete implementation of the theoretical framework.
Table: Gap between PoI Theory and Implementation
| Item | Simplification / Approximation | Theoretical Difference | Corresponding Code |
|---|---|---|---|
| 1. Skeletonization | 64×64 discrete grid with 3×3 neighborhood classification | Theory assumes continuous geometric structures | preprocess(), to_skeleton(), extract_nodes(), classify() |
| 2. K-field Embedding | Embedded in fixed dimension (64-dim) | Theory allows for infinite-dimensional fields | build_K_field(), canonical_K() |
| 3. ΩΩ-field Approximation | Gram matrix of A-field used as Ω | Theory defines Ω as a more general input field | build_A_field(), build_Omega() |
| 4. Commutator Approximation | Evaluated via Frobenius norm | The physical meaning of the commutator is deeper in theory | poi_resonance() (np.linalg.norm) |
| 5. Rank Quantization | Fixed steps (0.25) | Theory assumes continuous phase transitions | quantize_rank() |
| 6. Spectrum Quantization | Fixed steps (0.1) | Spectral quantization is more generalized in theory | quantize_spectrum() |
| 7. Three-body Interaction | Ad-hoc normalization to prevent numerical divergence | Core mechanism of field consistency in theory | poi_resonance() (tri_norm) |
| 8. Rotation Invariance | PCA-based principal axis alignment | Invariance should emerge naturally from field properties | canonicalize() |
4. Experiments
These experiments aim to confirm the fundamental behavior of PoI-OCR. The evaluation is limited to a subset of 40 characters; thus, results reflect the potential of the method rather than generalized OCR performance.
4.1 Dataset
Over 40 Japanese characters were used, all rendered in the IPAexGothic font and rotated by 45 degrees.
4.2 Verification of Self-Consistency
For all target characters, it was confirmed that the character itself always yielded the maximum score. This indicates that the method possesses inherent robustness against rotation and topological changes under specific conditions.
4.3 Fine-Structure Discrimination
Particularly for character groups with subtle structural differences (e.g., “邉”, “辺”, “邊”), the PoI score demonstrated a significant separation trend. This suggests that a physical approach based on field commutability is effective for discriminating minute structural variations.
5. Discussion
5.1 Shift from Statistical to Physical Approaches
While these preliminary results suggest the potential of PoI-OCR, they are based on observations under limited conditions and require broader validation. This method presents a new direction for recognition based on physical consistency rather than probability.
5.2 Efficiency and Uniqueness
PoI theory requires no training and evaluates structure through the application of physical laws. This is rooted in the philosophy that “intelligence resides in structure rather than computational volume.”
6. Conclusion
This study provides a proof-of-concept for character recognition via physical resonance within the PoI theoretical framework. Specifically, the potential effectiveness of PoI-OCR was confirmed regarding robustness against rotation and font variations, as well as the ability to discriminate fine structures.
However, as this evaluation is preliminary, large-scale validation across diverse character types, fonts, and noise conditions is required to draw definitive conclusions for generalized OCR. Future research will focus on multilingual extensions, theoretical analysis of PoI resonance, and the continuous limit of the K-field.
Appendix A: Pseudocode
Algorithm 1: PoI-OCR Recognition Pipeline
Input: Grayscale image I, embedding dimension dim
Output: PoI state S = (K, Ω, rank, signature)
1: I_rot ← Rotate(I, 45°)
2: B ← Binarize(I_rot) using Otsu threshold
3: S ← Skeletonize(B)
4: V ← ExtractNodes(S)
5: E ← BuildEdges(S, V)
6: D ← GraphDistances(V, E)
7: (V', D') ← Canonicalize(V, D)
8: A ← BuildAField(V', D')
9: Ω ← BuildOmega(A, dim)
10: K_raw ← BuildKField(D', dim)
11: K ← CanonicalizeK(K_raw)
12: rank ← EffectiveDimension(K)
13: signature ← LayeredSignature(V', D')
14: return (K, Ω, rank, signature)
Algorithm 2: PoI Resonance Between Two Characters
Input: PoI states S1 = (K1, Ω1, r1), S2 = (K2, Ω2, r2)
Output: Resonance score R
1: T ← |trace(Ω1ᵀ K2)|
2: C ← exp( -α ||K1K2 - K2K1|| / (||K1|| + ||K2||) )
3: qr1 ← QuantizeRank(r1)
4: qr2 ← QuantizeRank(r2)
5: R_rank ← exp( -β |qr1 - qr2| )
6: s1 ← NormalizeSpectrum(SVD(K1))
7: s2 ← NormalizeSpectrum(SVD(K2))
8: qs1 ← QuantizeSpectrum(s1)
9: qs2 ← QuantizeSpectrum(s2)
10: R_spec ← exp( -β ||qs1 - qs2|| )
11: H ← exp( λ * |trace(Ω1 K1 Ω2ᵀ K2ᵀ)| / (1 + |trace(...)|) )
12: return T * C * R_rank * R_spec * H
Appendix B: Complexity Analysis
Where N is the number of nodes on the skeleton and W×H is the image size.
(1) Preprocessing (Binarization + Skeletonization)
- Binarization: O(WH)
- Skeletonization: O(WH)
→ O(WH)O(WH)
(2) Node Extraction and Classification
- Full pixel scan: O(WH)
(3) Graph Construction
- Neighbor search for each node: O(N)
(4) All-Pairs Shortest Paths (N x BFS)
Since the skeleton graph is sparse, E=O(N):
→ Dominant Term 1
(5) Field Construction (K-field / ΩΩ-field)
- Kernel computation: O(N2)
- Gram matrix: O(N2)
- SVD (fixed dim=64): Constant time
→ Dominant Term 2: O(N2)O(N2)
Total Complexity
Typically, N≪WH after skeletonization, thus:
This is significantly lighter than the inference cost of deep learning models.
Appendix C: Quantitative Evaluation
Extracted metrics from experimental logs:
1. Self-Match Score
Across all 40+ characters:
Achieved 100% self-match rate.
2. Score Gap with Similar Characters (e.g., “邉”, “辺”, “邊”)
Target: “邉”
| Candidate | Score | Note |
|---|---|---|
| 邉 | 0.000684 | Self-match |
| 邊 | 0.0000227 | 30x difference |
| 辺 | 0.000627 | Near-match (similar structure) |
| PoI resonance scores drop exponentially even with a 1-pixel topological difference. |
3. Rotation Invariance
Across all characters rotated at 45°:
→ Complete rotation invariance confirmed.
4. Inter-character Distance Distribution
PoI scores across all character pairs:
- Self-match: 0.05 – 0.09
- Similar characters: 0.005 – 0.03
- Unrelated characters: 10⁻⁶ – 10⁻⁹
- Dissimilar structures: Below 10⁻¹²
→ PoI resonance possesses a dynamic range of over 4 orders of magnitude.
Appendix D: Physical Interpretation of PoI Resonance
Each component corresponds to a physical phenomenon:
(1) Trace Alignment: Overlap of Fields
Measures the “overlap” between the input and structural fields, corresponding to quantum superposition.
(2) Commutator Lock: Commutability
Measures if two structural fields can be diagonalized in the same basis (Simultaneous Observability).
(3) Rank Quantization (Higgs Effect)
Effective dimensions are quantized; similar structures fall into the same “phase.”
(4) Spectrum Quantization
Discretization of eigenvalue spectra, measuring if the “mass spectra” of fields align.
(5) Three-body Interaction (A × K × ΩΩ)
Strong resonance occurs only when the local (A), structural (K), and input (Ω) fields align simultaneously. This mathematical representation embodies the core PoI concept: “Intelligence emerges as field consistency.”
References
Belkin, M., & Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6), 1373-1396.
Coifman, R. R., & Lafon, S. (2006). Diffusion maps. Applied and Computational Harmonic Analysis, 21(1), 5-30.
Escultura, E. E. (2012). The Physics of Intelligence. Journal of Education and Learning, 1(2), 51-64.
Friston, K. (2010). The free-energy principle: a unified brain theory? Nature Reviews Neuroscience, 11(2), 127-138.
Li, M., et al. (2021). TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models. arXiv preprint arXiv:2109.10282.
Miyata, F. (2026). Physics of Intelligence: A Geometric Approach to Information Processing. Internal Research Monograph. DOI: https://doi.org/10.5281/zenodo.19659376
Nielsen, M. A., & Chuang, I. L. (2010). Quantum Computation and Quantum Information. Cambridge University Press.
Vaswani, A., et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
Wissner-Gross, A. D., & Freer, C. E. (2013). Causal entropic forces. Physical Review Letters, 110(16), 168702.
Zhang, T. Y., & Suen, C. Y. (1984). A fast parallel algorithm for thinning digital patterns. Communications of the ACM, 27(3), 236-239.