Screen Capture & OCR¶

Screen Capture¶

ScreenCaptureManager captures a user-selected screen region using macOS's built-in screencapture tool, invoked via AppleScript.

Why AppleScript?¶

macOS ties TCC (Transparency, Consent, and Control) permissions — including Screen Recording — to the binary path. This creates a problem during development: every Xcode rebuild produces a new binary at a different path, invalidating the Screen Recording permission grant.

By running screencapture via AppleScript's do shell script, the capture executes as an independent process with its own TCC context:

do shell script "/usr/sbin/screencapture -i -s '/path/to/output.png'"

The -i flag enables interactive mode (like Cmd+Shift+4) and -s restricts to selection mode. The user draws a rectangle, and the screenshot is saved to the specified path.

Capture Flow¶

Generate a unique temp file path: {tempDir}/mathy_capture_{UUID}.png
Run the AppleScript via NSAppleScript on a background queue (DispatchQueue.global)
User draws a selection rectangle on screen
If the file exists after the script returns (user didn't cancel):
- Copy to persistent storage: ~/Library/Application Support/Mathy/images/capture_{timestamp}.png
- Delete the temp file
- Return the persistent URL
If no file exists (user pressed Escape): return nil

The entire operation uses withCheckedContinuation to bridge to async/await.

Image Storage¶

Captured images are stored persistently at:

~/Library/Application Support/Mathy/images/capture_{timestamp}.png

These files are referenced by ConversionRecord.imagePath and displayed in the preview popup. When history entries are deleted (individually or via "Clear All"), the associated image files are also removed from disk.

OCR Service¶

OCRService is the HTTP client that sends captured images to the local Python server for LaTeX recognition.

Request Format¶

POST http://127.0.0.1:8765/predict
Content-Type: multipart/form-data; boundary={uuid}
Timeout: 30 seconds

--{boundary}
Content-Disposition: form-data; name="file"; filename="capture.png"
Content-Type: image/png

{binary image data}
--{boundary}--

The multipart body is constructed manually (no third-party HTTP library). A UUID is used as the boundary string.

Response Handling¶

Success (200):

{"latex": "\\frac{a}{b}"}

The latex field is extracted and returned as a String.

Errors:

Code	Meaning	Handling
400	Invalid image	Throws `OCRError.serverError(detail)`
503	Model not loaded	Throws `OCRError.serverError(detail)`
500	Prediction failed	Throws `OCRError.serverError(detail)`
Network error	Server unreachable	Throws underlying `URLError`

Error responses are decoded from {"detail": "..."} when available, with a fallback to a generic HTTP status code message.

Error Types¶

enum OCRError: Error {
    case invalidResponse    // Non-HTTP response
    case httpError(Int)     // Non-200 status (fallback)
    case serverError(String) // Detailed error from server
}

Integration¶

OCRService is called from AppState.startCapture(). Errors are caught and printed to console but not shown to the user (the capture simply appears to fail silently). The service uses URLSession.shared.data(for:) with native async/await — no callbacks or Combine.