StructuredVision

A powerful toolkit for extracting structured JSON data from images using multiple AI-powered OCR and vision models. Specialized for game interfaces, documents, forms, and general text extraction with schema validation.


🚀 Overview

StructuredVision converts images into structured JSON data using state-of-the-art extraction methods:

  • Google Gemini Vision API for AI-powered structured data extraction with schema validation
  • Traditional OCR (Tesseract) for reliable text extraction with preprocessing
  • Vision Language Models (VLM) for advanced text understanding
  • EasyOCR for multi-language text recognition

Perfect for automating data extraction from screenshots, documents, forms, and game interfaces.

✨ Features

🎯 Multi-Method Extraction

  • Gemini AI: Schema-based structured data extraction with validation
  • OCR: Traditional text extraction with preprocessing options
  • VLM: Vision-language models for context-aware extraction
  • Auto-Selection: Intelligent method selection based on input
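The auto-selection idea in the list above can be sketched as a small rule. This is only an illustration under assumptions: the function name `choose_extractor` and the rule itself are hypothetical and are not taken from the library's code. It assumes schema-based extraction needs Gemini (and therefore an API key), while plain text extraction can fall back to OCR.

```python
import os
from typing import Optional


def choose_extractor(schema: Optional[dict] = None, api_key: Optional[str] = None) -> str:
    """Pick an extraction method name (hypothetical rule, not the library's logic)."""
    # Fall back to the environment when no key is passed explicitly.
    key = api_key if api_key is not None else os.environ.get("GOOGLE_API_KEY", "")
    # Schema-based extraction is assumed to need Gemini, which needs an API key.
    if schema is not None and key:
        return "gemini"
    # Without a schema, or without a key, plain OCR is the offline option.
    return "ocr"


print(choose_extractor({"type": "object"}, api_key="example-key"))  # gemini
print(choose_extractor(None, api_key="example-key"))  # ocr
```

A real selector would probably also look at the image itself (size, language, layout); this sketch only covers the two inputs named in this README.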

🎮 Gaming Specialization

  • Rainbow Six Siege: Extract lobby data, team compositions, match details
  • Custom Gaming Schemas: Extensible for other games
  • Real-time Analysis: Process game screenshots for data analysis

📄 Document Processing

  • Receipts: Extract merchant, items, totals, dates
  • Invoices: Parse billing information, line items, amounts
  • Business Cards: Extract contact information
  • Forms: Process filled form data

🔧 Advanced Features

  • JSON Schema Validation: Ensure data structure consistency
  • Batch Processing: Handle multiple images efficiently
  • Method Comparison: Compare different extraction approaches
  • Visualization: Generate annotated output images
  • CLI Interface: Command-line tool for automation
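To make the schema validation feature concrete, here is a rough, standard-library-only sketch of what checking extracted data against a schema involves. The project lists `jsonschema` as a dependency for this job; the helper `find_type_errors` below is a hypothetical, much simplified stand-in that only checks top-level property types and `required` keys.

```python
# Map JSON Schema type names to Python types (a small subset only).
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "array": list,
    "object": dict,
}


def find_type_errors(data: dict, schema: dict) -> list[str]:
    """Return messages for top-level type mismatches and missing required keys."""
    errors = []
    for name, rule in schema.get("properties", {}).items():
        if name not in data:
            continue  # absence only matters if the key is listed under "required"
        type_name = rule.get("type")
        expected = JSON_TYPES.get(type_name)
        if expected is None:
            continue  # unknown or unspecified type: nothing to check here
        value = data[name]
        matches = isinstance(value, expected)
        # In Python, bool is a subclass of int; JSON Schema treats them as distinct.
        if isinstance(value, bool) and type_name in ("number", "integer"):
            matches = False
        if not matches:
            errors.append(f"{name}: expected {type_name}")
    for name in schema.get("required", []):
        if name not in data:
            errors.append(f"{name}: missing required property")
    return errors


card_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "email": {"type": "string"}},
    "required": ["name"],
}
print(find_type_errors({"name": "Ada", "email": 5}, card_schema))
# ['email: expected string']
```

For anything beyond a quick sanity check (nested objects, formats, enums), a full validator such as `jsonschema` is the better choice.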

📦 Installation

Prerequisites

  • Python 3.10+ (required)
  • Git for cloning the repository

Method 1: Install from Source

# Clone the repository
git clone https://github.com/ammahmoudi/StructuredVision.git
cd StructuredVision

# Install with uv (recommended)
uv sync

# Or install with pip
pip install -e .

Method 2: Install with Dependencies

# Install with all optional dependencies
pip install -e ".[test,dev,ocr,docs]"

# Install only testing dependencies
pip install -e ".[test]"

# Install only OCR dependencies
pip install -e ".[ocr]"

Method 3: Package Installation

# Install as package
pip install .

# Or using setup.py (legacy; invoking setup.py directly is deprecated by setuptools)
python setup.py install

API Keys Setup

# For Gemini API (required for structured extraction)
export GOOGLE_API_KEY="your_google_api_key_here"

# Or create a .env file (note: > overwrites an existing .env; use >> to append)
echo "GOOGLE_API_KEY=your_api_key" > .env

System Dependencies

# Install Tesseract OCR (optional, for OCR functionality)
# Ubuntu/Debian:
sudo apt-get install tesseract-ocr

# macOS:
brew install tesseract

# Windows: download the installer from https://github.com/UB-Mannheim/tesseract/wiki

Verify Installation

# Run tests to verify installation
uv run pytest tests/schemas/ tests/extractors/test_base_extractor.py tests/utils/test_config.py

# Or run the full test suite
python run_tests.py

# Check basic functionality
python -c "from structured_vision import StructuredVision; print('Installation successful!')"

🎯 Quick Start

Basic Text Extraction

from structured_vision import StructuredVision

# Initialize
sv = StructuredVision()

# Extract text
result = sv.extract_text("image.png")
print(result["extracted_text"])

Structured Data with Schema

# Extract receipt data
result = sv.extract_document_data("receipt.jpg", "receipt")
print(f"Merchant: {result['merchant_name']}")
print(f"Total: {result['total_amount']}")

# Extract gaming data
result = sv.extract_gaming_data("r6_lobby.png", "r6")
print(f"Map: {result['match_details']['map_name']}")
print(f"Teams: {result['teams']}")

Custom Schema

# Define custom schema
schema = {
    "title": "Business Card",
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "title": {"type": "string"},
        "company": {"type": "string"},
        "email": {"type": "string"}
    }
}

# Extract with schema
result = sv.extract("business_card.jpg", schema=schema)
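
A schema defined as a Python dict can also be serialized to JSON so the same definition is usable as a file, for example with the `--schema` option of the CLI. The sketch below assumes the CLI accepts a plain JSON Schema file; the helper name `schema_to_json` is hypothetical.

```python
import json

business_card_schema = {
    "title": "Business Card",
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "title": {"type": "string"},
        "company": {"type": "string"},
        "email": {"type": "string"},
    },
}


def schema_to_json(schema: dict) -> str:
    """Serialize a schema dict to indented JSON text."""
    return json.dumps(schema, indent=2)


text = schema_to_json(business_card_schema)
# Round-trip check: parsing the text gives back the same dict.
print(json.loads(text) == business_card_schema)  # True
```

Writing `text` to a file such as `custom_schema.json` keeps the Python and command-line workflows on one schema definition.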

🖥️ Command Line Interface

# Extract text from image
sv-extract image.png --text

# Extract document data
sv-extract receipt.jpg --document receipt

# Extract gaming data
sv-extract r6_lobby.png --gaming r6

# Compare methods
sv-extract image.png --compare

# Batch processing
sv-extract *.png --batch --output results/

# Use custom schema
sv-extract image.png --schema custom_schema.json

# Full example with options
sv-extract image.png --extractor gemini --output results/ --verbose
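
For readers who want to see how flags like these fit together, here is a minimal `argparse` sketch that mirrors the options used in the commands above. It is not the project's actual parser: the function `build_parser`, the help strings, and the argument types are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags shown above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="sv-extract")
    parser.add_argument("images", nargs="+", help="one or more image paths")
    parser.add_argument("--text", action="store_true", help="plain text extraction")
    parser.add_argument("--document", metavar="TYPE", help="document type, e.g. receipt")
    parser.add_argument("--gaming", metavar="GAME", help="game name, e.g. r6")
    parser.add_argument("--compare", action="store_true", help="compare methods")
    parser.add_argument("--batch", action="store_true", help="process several images")
    parser.add_argument("--schema", metavar="FILE", help="path to a JSON schema file")
    parser.add_argument("--extractor", metavar="NAME", help="extraction method")
    parser.add_argument("--output", metavar="DIR", help="output directory")
    parser.add_argument("--verbose", action="store_true", help="print extra detail")
    return parser


args = build_parser().parse_args(
    ["image.png", "--extractor", "gemini", "--output", "results/", "--verbose"]
)
print(args.extractor, args.output, args.verbose)  # gemini results/ True
```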

📊 Output Examples

Text Extraction

{
    "extractor": "ocr",
    "image_path": "sample.png",
    "extracted_text": "PLAY OPERATORS SHOP...",
    "total_words": 15,
    "extracted_words": [
        {
            "text": "PLAY",
            "confidence": 96,
            "position": {"left": 380, "top": 55, "width": 61, "height": 42}
        }
    ]
}

Figure: Detected text regions using EasyOCR, useful for debugging region detection and OCR preprocessing.
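
The per-word `confidence` values in the text extraction output make it easy to drop likely misreads. The sketch below assumes a result shaped like the sample above; the extra words and the helper name `confident_words` are made up for illustration, and the default threshold of 70 simply mirrors the `SV_CONFIDENCE_THRESHOLD` example value used in this README.

```python
result = {
    "extracted_words": [
        {"text": "PLAY", "confidence": 96},
        {"text": "OPERATORS", "confidence": 88},  # made-up sample entry
        {"text": "SH0P", "confidence": 41},  # made-up low-confidence misread
    ]
}


def confident_words(result: dict, threshold: float = 70) -> list[str]:
    """Keep only words whose confidence is at or above the threshold."""
    return [
        word["text"]
        for word in result.get("extracted_words", [])
        if word["confidence"] >= threshold
    ]


print(confident_words(result))  # ['PLAY', 'OPERATORS']
```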

Gaming Data (R6 Lobby)

{
    "match_details": {
        "lobby_header": "CUSTOM GAME",
        "time_remaining": "4:00",
        "game_type": "BOMB",
        "map_name": "OREGON",
        "map_time_of_day": "DAY"
    },
    "teams": {
        "blue_team": ["Player1", "Player2", "Player3"],
        "orange_team": ["Player4", "Player5"]
    },
    "spectators": ["Spectator1"]
}

Figure: Example of the structured JSON output visualized alongside the original image, helpful for verifying schema fields and values.
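
Once the lobby data is a plain dict, summaries are a few lines of Python. The following sketch works on a dict shaped like the R6 sample above; the helper names `team_sizes` and `all_players` are hypothetical.

```python
lobby = {
    "teams": {
        "blue_team": ["Player1", "Player2", "Player3"],
        "orange_team": ["Player4", "Player5"],
    },
    "spectators": ["Spectator1"],
}


def team_sizes(lobby: dict) -> dict[str, int]:
    """Count the players on each team."""
    return {name: len(players) for name, players in lobby.get("teams", {}).items()}


def all_players(lobby: dict) -> list[str]:
    """Flatten every team roster into one list, excluding spectators."""
    return [player for players in lobby.get("teams", {}).values() for player in players]


print(team_sizes(lobby))  # {'blue_team': 3, 'orange_team': 2}
print(len(all_players(lobby)))  # 5
```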

Receipt Data

{
    "merchant_name": "SuperMarket",
    "date": "2024-01-15",
    "total_amount": "$45.67",
    "items": [
        {"name": "Bread", "price": "$2.99"},
        {"name": "Milk", "price": "$3.49"}
    ]
}
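
Because amounts in the receipt sample are strings such as "$45.67", they need parsing before any arithmetic. The sketch below uses `Decimal` to avoid floating-point rounding; it assumes US-style amounts with a leading dollar sign, and the helper name `parse_amount` is hypothetical.

```python
from decimal import Decimal


def parse_amount(text: str) -> Decimal:
    """Convert a string such as "$45.67" to a Decimal."""
    return Decimal(text.replace("$", "").replace(",", "").strip())


receipt = {
    "total_amount": "$45.67",
    "items": [
        {"name": "Bread", "price": "$2.99"},
        {"name": "Milk", "price": "$3.49"},
    ],
}

items_total = sum((parse_amount(item["price"]) for item in receipt["items"]), Decimal("0"))
print(items_total)  # 6.48
print(parse_amount(receipt["total_amount"]) - items_total)  # 39.19
```

A gap between the item sum and the total does not necessarily mean an extraction error: the sample lists only two items, and real receipts also carry tax and discounts.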

🔧 Configuration

Configuration File

from structured_vision import StructuredVision
from structured_vision.utils import create_default_config

# Create a default config file
config = create_default_config("config.json")

# Use the config file
sv = StructuredVision(config="config.json")

Environment Variables

# Extraction settings
export SV_EXTRACTOR_TYPE=gemini
export SV_PREPROCESSING=adaptive
export SV_CONFIDENCE_THRESHOLD=70

# API keys
export GOOGLE_API_KEY=your_key
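
To illustrate how `SV_*` variables like these could be turned into settings, here is a minimal sketch. It does not reproduce the library's own config loader: the function `settings_from_env`, the dictionary keys, and the default values (copied from the example exports above) are all assumptions.

```python
import os

# Assumed defaults, taken from the example values in this README.
DEFAULTS = {
    "extractor_type": "gemini",
    "preprocessing": "adaptive",
    "confidence_threshold": 70,
}


def settings_from_env(environ=None) -> dict:
    """Overlay SV_* environment variables on top of the assumed defaults."""
    env = os.environ if environ is None else environ
    settings = dict(DEFAULTS)
    if "SV_EXTRACTOR_TYPE" in env:
        settings["extractor_type"] = env["SV_EXTRACTOR_TYPE"]
    if "SV_PREPROCESSING" in env:
        settings["preprocessing"] = env["SV_PREPROCESSING"]
    if "SV_CONFIDENCE_THRESHOLD" in env:
        # Environment values are strings, so the threshold is converted explicitly.
        settings["confidence_threshold"] = int(env["SV_CONFIDENCE_THRESHOLD"])
    return settings


print(settings_from_env({"SV_EXTRACTOR_TYPE": "ocr", "SV_CONFIDENCE_THRESHOLD": "85"}))
# {'extractor_type': 'ocr', 'preprocessing': 'adaptive', 'confidence_threshold': 85}
```

Passing a plain dict instead of reading `os.environ` directly keeps the function easy to test.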

🎮 Gaming Examples

Rainbow Six Siege Lobby

# Specialized R6 extraction
result = sv.extract_gaming_data("r6_lobby.png", "r6")

# Access structured data
match_info = result["match_details"]
teams = result["teams"]
print(f"Playing {match_info['game_type']} on {match_info['map_name']}")

Custom Gaming Schema

# Define custom game schema
valorant_schema = {
    "title": "Valorant Match",
    "type": "object",
    "properties": {
        "map": {"type": "string"},
        "mode": {"type": "string"},
        "players": {"type": "array", "items": {"type": "string"}}
    }
}

result = sv.extract("valorant.png", schema=valorant_schema)

📄 Document Processing

Receipts

# Extract receipt data
result = sv.extract_document_data("receipt.jpg", "receipt")
print(f"Total: {result['total_amount']}")
for item in result["items"]:
    print(f"- {item['name']}: {item['price']}")

Invoices

# Extract invoice data
result = sv.extract_document_data("invoice.pdf", "invoice")
print(f"Invoice #{result['invoice_number']}")
print(f"Vendor: {result['vendor']['name']}")

🔍 Method Comparison

# Compare different extraction methods
comparison = sv.compare_methods("image.png")

for method, result in comparison["methods"].items():
    if result["success"]:
        print(f"{method}: {len(result['extracted_text'])} characters")
    else:
        print(f"{method}: Failed - {result['error']}")

📚 Advanced Usage

Batch Processing

# Process multiple images (receipt_schema is a JSON Schema dict you define, like the custom schema shown earlier)
results = sv.batch_extract(
    image_paths=["img1.png", "img2.png", "img3.png"],
    schema=receipt_schema,
    output_dir="results/"
)

print(f"Processed {results['successful']} images successfully")
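
Rather than listing files by hand, the `image_paths` argument can be built from a folder. This is a small standard-library sketch; the helper name `collect_images` and the set of accepted suffixes are assumptions, so adjust them to the formats you actually use.

```python
from pathlib import Path

# Assumed set of image suffixes; extend as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def collect_images(folder: str) -> list[str]:
    """Return sorted image paths directly inside `folder` (non-recursive)."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        str(path)
        for path in root.iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


if __name__ == "__main__":
    print(collect_images("."))
```

The returned list can be passed as `image_paths` in the batch call shown above.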

Custom Preprocessing

from structured_vision.utils import preprocess_image

# Custom image preprocessing
enhanced_image = preprocess_image("image.png", "sharpen")
result = sv.extract_text(enhanced_image)

🏗️ Project Structure

StructuredVision/
├── structured_vision/          # Main package
│   ├── extractors/            # Extraction engines
│   │   ├── gemini_extractor.py    # Gemini API extractor
│   │   ├── ocr_extractor.py       # OCR-based extractors
│   │   └── base_extractor.py      # Base class
│   ├── utils/                 # Utilities
│   │   ├── image_processing.py    # Image preprocessing
│   │   └── config.py              # Configuration management
│   ├── schemas/               # Predefined schemas
│   │   ├── gaming.py              # Gaming schemas (R6, etc.)
│   │   └── documents.py           # Document schemas
│   ├── examples/              # Usage examples
│   └── main.py               # Main API class
├── sv_extract.py             # CLI script
├── setup.py                  # Package setup
└── requirements.txt          # Dependencies

📋 Requirements

Core Dependencies

  • Python 3.10+ (required)
  • google-genai ≥1.36.0 - Gemini API for structured extraction
  • opencv-python ≥4.11.0 - Computer vision operations
  • pillow ≥11.3.0 - Image processing
  • numpy ≥2.2.6 - Numerical operations
  • matplotlib ≥3.10.6 - Visualizations
  • jsonschema ≥4.0.0 - Schema validation
  • python-dotenv ≥1.1.1 - Environment variable management

Optional Dependencies

OCR Support

  • pytesseract ≥0.3.0 - Traditional OCR (requires system Tesseract)
  • easyocr ≥1.7.0 - Multi-language OCR

AI/ML Support

  • torch ≥2.8.0 - PyTorch for VLM models
  • transformers ≥4.56.1 - Hugging Face transformers
  • sentencepiece ≥0.2.1 - Text tokenization

Development Tools

  • pytest ≥7.0.0 - Testing framework
  • pytest-cov ≥4.0.0 - Coverage reporting
  • black ≥23.0.0 - Code formatting
  • isort ≥5.12.0 - Import sorting
  • mypy ≥1.0.0 - Type checking

System Requirements

  • Tesseract OCR: For traditional OCR functionality
    • Ubuntu/Debian: sudo apt-get install tesseract-ocr
    • macOS: brew install tesseract
    • Windows: Download the installer from https://github.com/UB-Mannheim/tesseract/wiki
  • GPU Support: CUDA for accelerated VLM processing (optional)
  • Memory: 4GB+ RAM recommended for large image processing

Testing

StructuredVision includes a test suite covering core functionality:

Quick Test Run

# Run core working tests (75 tests)
uv run pytest tests/schemas/ tests/extractors/test_base_extractor.py tests/utils/test_config.py

# Run with coverage
python run_tests.py

# PowerShell (Windows)
.\run_tests.ps1

Test Categories

  • Schema Tests (40 tests) ✅ - Document and gaming schema validation
  • Base Extractor Tests (16 tests) ✅ - Core extraction functionality
  • Configuration Tests (19 tests) ✅ - Config loading and environment handling

Current Test Status

  • Total Tests: 149 (75 currently passing)
  • Test Coverage: 28% and growing
  • CI/CD: GitHub Actions integration ready

Running All Tests

# Full test suite (includes some failing tests under development)
uv run pytest tests/ --cov=structured_vision

# Generate HTML coverage report
uv run pytest tests/ --cov=structured_vision --cov-report=html

Test Development

Tests are organized by component:

tests/
├── schemas/           # Schema validation tests ✅
├── extractors/        # Extractor functionality tests
├── utils/            # Utility function tests ✅
├── test_integration.py # End-to-end integration tests
└── conftest.py       # Shared test fixtures

🤝 Contributing

We welcome contributions! Here's how to get started:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make changes and add tests
  4. Run tests: python run_tests.py or pytest tests/
  5. Submit a Pull Request

Development Setup

# Clone your fork
git clone https://github.com/your-username/StructuredVision.git
cd StructuredVision

# Install in development mode with all dependencies
uv sync

# Or with pip
pip install -e ".[test,dev,ocr,docs]"

# Run tests to verify setup
python run_tests.py

# Run linting and formatting
black structured_vision/ tests/
isort structured_vision/ tests/
mypy structured_vision/

Testing Guidelines

  • Write tests for new features in the appropriate tests/ subdirectory
  • Aim for high test coverage of new code
  • Use the existing test patterns and fixtures in conftest.py
  • Test both success and error cases

Code Style

  • Black for code formatting (line length: 100)
  • isort for import sorting
  • mypy for type checking
  • pytest for testing

Run the pre-commit hooks:

# Install pre-commit
pip install pre-commit
pre-commit install

# Run manually
pre-commit run --all-files

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Google Gemini - AI-powered structured extraction
  • Tesseract OCR - Open source OCR engine
  • OpenCV - Computer vision library
  • EasyOCR - Multi-language text recognition
  • Transformers - Hugging Face model library

📞 Support


StructuredVision - Transform images into structured data with AI precision 🎯