Solving Advanced Mathematics Using Optical Character Recognition
Introduction
Optical Character Recognition (OCR)
OCR is a technology that converts images of text into machine-readable text. It involves hardware and software to analyze and convert the content of physical documents into digital text for further processing. Our project leverages OCR for solving advanced mathematical equations quickly and accurately.
Impact of Using OCR in Mathematics
Efficiently solves complex math expressions that are difficult to input into calculators.
Helps students verify their solutions.
Our application integrates a camera to capture and solve mathematical equations, increasing accuracy and saving time.
Working
Our application has two main parts: searching by typed query, and scanning, which solves mathematical equations using OCR.
Searching
User Input: User types a query in the search bar.
Backend Processing: The Python backend determines the type of problem, computes the result, and sends it to the frontend.
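As a rough illustration of the "determines the type of problem" step, the sketch below classifies a typed query with a few regular expressions. The function name `classify_query`, the category labels, and the patterns are all assumptions made for this example; the categories the real backend uses are not documented here.

```python
import re

# Hypothetical categories and patterns, checked in order (assumed, not from the project).
PATTERNS = [
    ("integral", re.compile(r"\\int|\bintegra(l|te)\b", re.IGNORECASE)),
    ("derivative", re.compile(r"\bd/dx\b|\bderivative\b|\bdifferentiate\b", re.IGNORECASE)),
    ("equation", re.compile(r"=")),
]

def classify_query(query: str) -> str:
    """Return the first category whose pattern matches, else 'expression'."""
    for label, pattern in PATTERNS:
        if pattern.search(query):
            return label
    return "expression"

if __name__ == "__main__":
    for q in ["2x + 3 = 7", r"\int x^2 dx", "d/dx x^3", "2 + 2"]:
        print(q, "->", classify_query(q))
```

Checking the patterns in a fixed order keeps the rule simple: an integral that also contains an equals sign is still routed as an integral.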
Scanning
Camera Interface: Users can upload or scan images.
Image Processing: Image is stored in Firebase and processed by our OCR model.
Solution Generation: The OCR model converts the image to LaTeX format, which is then solved by the backend and displayed to the user.
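To make the "LaTeX in, solution out" step concrete, here is a minimal sketch that evaluates a small arithmetic subset of LaTeX exactly. It is a toy stand-in, not the project's solver: the helper names (`latex_to_python`, `evaluate`, `solve_latex`) are hypothetical, and only `\frac`, `\times`, `\cdot`, `\div`, `^`, and integer literals are handled. A real backend would hand the LaTeX to a full computer algebra system.

```python
import ast
import operator
import re
from fractions import Fraction

# Innermost \frac{a}{b}: neither argument contains braces.
FRAC = re.compile(r"\\frac\{([^{}]*)\}\{([^{}]*)\}")

OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}

def latex_to_python(latex: str) -> str:
    """Rewrite a small LaTeX arithmetic subset as a Python expression string."""
    expr = latex
    # Expand fractions from the inside out until none remain.
    while FRAC.search(expr):
        expr = FRAC.sub(r"((\1)/(\2))", expr)
    expr = expr.replace(r"\times", "*").replace(r"\cdot", "*").replace(r"\div", "/")
    expr = expr.replace("^", "**").replace("{", "(").replace("}", ")")
    return expr.strip()

def evaluate(node):
    """Evaluate a parsed expression tree exactly, allowing only arithmetic nodes."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return Fraction(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -evaluate(node.operand)
    raise ValueError("unsupported expression")

def solve_latex(latex: str) -> Fraction:
    """Convert the LaTeX string and evaluate it with exact rational arithmetic."""
    return evaluate(ast.parse(latex_to_python(latex), mode="eval"))

if __name__ == "__main__":
    print(solve_latex(r"\frac{3}{4} + 2 \times 5"))
```

Walking the syntax tree instead of calling `eval` means anything outside plain arithmetic is rejected, which matters when the input string comes from an OCR model rather than a trusted source.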
Image to Markup Model
Basic Architecture
Encoder: Standardizes the image size, extracts visual features using a Deep CNN, and enables modeling longer sequences.
Decoder: A Deep Recurrent Neural Network (DRNN) that generates the output word sequence based on the previously generated words and the image regions attended to.
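The "image regions attended to" part of the decoder can be sketched as a single attention step. The example below assumes plain dot-product scoring between the decoder state and each region's feature vector; the scoring function the actual model uses is not stated here, and the names `softmax` and `attend` are illustrative only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, region_features):
    """Weight each image region by how well it matches the decoder state."""
    # One score per region: dot product of decoder state and region features (assumed scoring).
    scores = [sum(h * f for h, f in zip(decoder_state, region)) for region in region_features]
    weights = softmax(scores)
    dim = len(region_features[0])
    # Context vector: attention-weighted average of the region features.
    context = [sum(w * region[i] for w, region in zip(weights, region_features)) for i in range(dim)]
    return weights, context

if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    weights, context = attend([2.0, 0.0], regions)
    print(weights, context)
```

At each output step the decoder would combine the context vector with the previous word to predict the next one, so regions with higher weights have more influence on that prediction.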
Proposed Method
Image Input
Equation Scanning
Gray Scaling
Binarization
Character Recognition: Using Tesseract OCR to detect text.
Blob Detection: Identifies connected components in the binary image.
Parsed Into Equation: Converts recognized characters into mathematical equations.
Computer Algebra System: Solves the equations.
Image Output: Displays the solved equations.
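The gray scaling, binarization, and blob detection steps above can be sketched in plain Python. This is an illustration under stated assumptions rather than the project's code: the image is a list of rows of (R, G, B) tuples, the fixed threshold of 128 is assumed (the thresholding method is not named above), and blobs are taken to be 4-connected regions of dark pixels.

```python
from collections import deque

def to_gray(rgb_image):
    """Convert rows of (R, G, B) pixels to luminance using the ITU-R BT.601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in rgb_image]

def binarize(gray_image, threshold=128):
    """Mark dark pixels (ink) as 1 and light pixels (paper) as 0."""
    return [[1 if value < threshold else 0 for value in row] for row in gray_image]

def find_blobs(binary_image):
    """Return bounding boxes (top, left, bottom, right) of 4-connected ink regions."""
    height, width = len(binary_image), len(binary_image[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if binary_image[y][x] == 0 or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and binary_image[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    # Sort left to right so boxes follow reading order on a single line.
    return sorted(boxes, key=lambda box: box[1])

if __name__ == "__main__":
    W, K = (255, 255, 255), (0, 0, 0)
    image = [[K, W, W, K, K],
             [K, W, W, W, K],
             [W, W, W, W, W]]
    print(find_blobs(binarize(to_gray(image))))
```

Each bounding box marks one candidate symbol region, which could then be cropped and passed to the character recognizer before the results are assembled into an equation.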
Performance Analysis
Our model achieves more than 75% accuracy with less than 20% training loss.
Conclusion
Our project demonstrates the potential of OCR technology in solving advanced mathematical equations efficiently and accurately.
Future Improvements
Expand to other mathematical domains.
Implement Artificial General Intelligence for broader applications.
P.S. Before making the repo public, I had to remove all the AWS server commits, but I couldn't find a way to scrub the credentials from the commit history. After a couple of hours, I decided to clone the branch and delete the main branch so that no trace of those commits remains.