A distributed system for detecting pirated video content. MARINE integrates specialized microservices for video analysis, crawling, and discovery.
## Overview

MARINE is a distributed system designed to help content owners detect pirated video content across the web. It consists of several specialized microservices that work together to:
- Analyze videos using advanced fingerprinting techniques.
- Crawl websites to download and segment video content.
- Discover potential piracy sites via metadata search and Google dorking.
- Manage user uploads through a SaaS dashboard with a Golang backend (Next.js/TailwindCSS frontend).
## Features

- **Video Analysis:** Extract keyframes, compute perceptual hashes (pHashes), and (optionally) generate audio fingerprints from video files (see the sketch after this list).
- **Dual Pipeline Storage:**
  - **Uploaded Videos:** Videos uploaded by content owners are processed and stored in the `uploaded_videos` table.
  - **Crawled Videos:** Videos discovered via crawling are segmented into chunks, reassembled, and analyzed; the results are stored in the `crawled_videos` table.
- **Active Matching:** The system actively compares uploaded videos against crawled videos to flag potential piracy.
- **Distributed Architecture:** Combines multiple microservices:
  - **Analysis Microservice:** Built with Python and FastAPI.
  - **Golang Backend:** Processes user uploads via Kafka (SaaS dashboard built with Next.js/TailwindCSS).
  - **Crawler Microservice:** Crawls URLs and downloads video content using ffmpeg, yt-dlp, and other tools.
  - **Discovery Microservice:** Uses Google dorking and metadata search to discover sites that might host copyrighted content.
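To make the matching idea concrete, here is a minimal sketch of keyframe pHash extraction and comparison using OpenCV, Pillow, and the `imagehash` package. The function names, the one-frame-per-second sampling, and the Hamming-distance cutoff are illustrative assumptions, not the actual `marine-analysis` implementation:

```python
# Illustrative sketch only -- not the actual marine-analysis code.
# Requires: pip install opencv-python pillow imagehash
import cv2
import imagehash
from PIL import Image


def extract_phashes(video_path: str, every_n_seconds: float = 1.0) -> list:
    """Sample one frame per interval and compute a perceptual hash for each."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    step = max(1, int(fps * every_n_seconds))
    hashes, frame_idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % step == 0:
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            hashes.append(imagehash.phash(Image.fromarray(rgb)))
        frame_idx += 1
    cap.release()
    return hashes


def match_score(uploaded: list, crawled: list, max_distance: int = 10) -> float:
    """Fraction of uploaded-video hashes with a close crawled-video counterpart."""
    if not uploaded or not crawled:
        return 0.0
    hits = sum(1 for hu in uploaded if any(hu - hc <= max_distance for hc in crawled))
    return hits / len(uploaded)
```

A score above `SIMILARITY_THRESHOLD` (0.8 in the sample configuration below) would then mark the crawled video as flagged.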
## Tech Stack

- **Python, FastAPI** – For building asynchronous web services.
- **Uvicorn** – ASGI server for running FastAPI.
- **SQLAlchemy (with asyncpg)** – For asynchronous database operations.
- **aiokafka** – For Kafka integration.
- **Redis** – For caching fingerprints (optional; see the sketch below).
- **Loguru** – For logging.
- **ffmpeg, yt-dlp** – For video downloading, segmentation, and reassembly.
- **Golang, Next.js, TailwindCSS** – For the SaaS dashboard backend and frontend.
- **Google Dorking & Metadata Search** – Used in the Discovery microservice.
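As an illustration of the Redis fingerprint cache, the following sketch stores and retrieves reference pHashes under the `REFERENCE_REDIS_KEY` from the sample configuration. It uses the `redis-py` client; the hash-per-video layout and JSON encoding are assumptions, not necessarily what the `storage/` utilities do:

```python
# Illustrative fingerprint cache; the real storage/ utilities may differ.
# Requires: pip install redis
import json
import os

import redis

r = redis.Redis(
    host=os.getenv("REDIS_HOST", "localhost"),
    port=int(os.getenv("REDIS_PORT", "6379")),
    db=int(os.getenv("REDIS_DB", "0")),
)
REF_KEY = os.getenv("REFERENCE_REDIS_KEY", "ref_phashes")


def cache_reference_phashes(video_id: str, phashes: list) -> None:
    """Store hex-encoded pHashes in a Redis hash keyed by video id."""
    r.hset(REF_KEY, video_id, json.dumps(phashes))


def load_reference_phashes(video_id: str):
    raw = r.hget(REF_KEY, video_id)
    return json.loads(raw) if raw else None
```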
## Project Structure

The project is organized into five main components:
```
marine/
├── marine-analysis
│   ├── fingerprint/        # Contains modules for video and audio fingerprinting
│   ├── storage/            # Redis utilities for caching fingerprints
│   ├── config.py           # FastAPI configuration settings
│   ├── db.py               # Database schema & initialization (models for analysis)
│   ├── main.py             # AI microservice endpoints (video analysis, chunk processing)
│   └── requirements.txt    # Python dependencies for the analysis service
├── marine-backend
│   ├── config/             # Go configuration files
│   ├── controllers/        # API controllers (e.g., video, report)
│   ├── eventhandlers/      # Kafka event handler(s)
│   ├── models/             # Data models in Go
│   ├── services/           # Business logic and client services (e.g., AI service client)
│   ├── go.mod, go.sum      # Go module files
│   └── main.go             # Entry point for the Golang backend
├── marine-crawler
│   ├── app/
│   │   ├── storage/        # Redis utilities (Python)
│   │   ├── config.py       # Crawler configuration
│   │   ├── crawler.py      # Logic for crawling URLs and extracting video links
│   │   ├── downloader.py   # Downloads videos using yt-dlp and ffmpeg
│   │   ├── kafka_client.py # Kafka integration for sending download tasks
│   │   └── main.py         # Entry point for the crawler microservice
│   ├── requirements.txt    # Python dependencies for the crawler
│   └── run.py              # Script to start the crawler service
├── marine-dashboard
│   ├── app/                # Next.js app with API routes, components, and layout files
│   ├── lib/                # Utility functions (e.g., auth)
│   ├── config files        # Next.js/TailwindCSS configuration
│   ├── package.json        # Node.js dependencies for the dashboard
│   └── README.md           # Documentation for the dashboard
├── marine-discovery
│   ├── config.py           # Configuration for the discovery service
│   ├── dorking.py          # Module for performing Google dorking
│   ├── main.py             # Entry point for the discovery service
│   ├── requirements.txt    # Python dependencies for discovery
│   └── scraper.py          # Scraping logic for discovering potential piracy sites
└── .gitignore              # Global git ignore file
```
## Installation

### Analysis Microservice

- **Clone the repository:**

  ```bash
  git clone https://github.com/yourusername/Marine.git
  cd Marine/marine-analysis
  ```

- **Create and activate a virtual environment:**

  On Linux/macOS:

  ```bash
  python -m venv env
  source env/bin/activate
  ```

  On Windows:

  ```bash
  python -m venv env
  env\Scripts\activate
  ```

- **Install Python dependencies:**

  ```bash
  pip install -r requirements.txt
  ```

### Golang Backend

- **Navigate to the backend directory:**

  ```bash
  cd ../marine-backend
  ```

- **Build the Golang application:**

  ```bash
  go build -o marine-backend
  ```

### Crawler Microservice

- **Navigate to the crawler directory:**

  ```bash
  cd ../marine-crawler
  ```

- **Create and activate a virtual environment, then install dependencies:**

  ```bash
  python -m venv env
  source env/bin/activate
  pip install -r requirements.txt
  ```

### Dashboard

- **Navigate to the dashboard directory:**

  ```bash
  cd ../marine-dashboard
  ```

- **Install Node dependencies:**

  ```bash
  npm install
  ```

### Discovery Microservice

- **Navigate to the discovery directory:**

  ```bash
  cd ../marine-discovery
  ```

- **Create and activate a virtual environment (if using Python), then install dependencies:**

  ```bash
  python -m venv env
  source env/bin/activate
  pip install -r requirements.txt
  ```
## Configuration

Each microservice is configured via environment variables. Create a `.env` file in the project root with contents similar to the following:

```env
# Database Configuration
DATABASE_URL=postgresql+asyncpg://user:password@localhost/dbname

# Kafka Configuration
KAFKA_BOOTSTRAP_SERVERS=your.kafka.server:9092
KAFKA_CRAWL_TOPIC=crawl-tasks
KAFKA_VIDEO_DOWNLOAD_TOPIC=video-download-tasks
KAFKA_VIDEO_CHUNKS_TOPIC=video-chunks

# Redis Configuration
REDIS_HOST=your.redis.server
REDIS_PORT=6379
REDIS_DB=0

# Video Analysis Configuration
FRAMES_DIR=frames
SIMILARITY_THRESHOLD=0.8
REFERENCE_REDIS_KEY=ref_phashes

# AI Microservice URL (if used by other services)
AI_MICROSERVICE_URL=http://localhost:8000
```

Adjust the values as needed for your environment.
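A minimal sketch of how a Python service might read these variables, assuming `python-dotenv` (the services' actual `config.py` files may be structured differently):

```python
# Illustrative settings loader; the real config.py files may differ.
# Requires: pip install python-dotenv
import os

from dotenv import load_dotenv

load_dotenv()  # read .env from the working directory

DATABASE_URL = os.getenv("DATABASE_URL", "postgresql+asyncpg://user:password@localhost/dbname")
KAFKA_BOOTSTRAP_SERVERS = os.getenv("KAFKA_BOOTSTRAP_SERVERS", "localhost:9092")
KAFKA_VIDEO_CHUNKS_TOPIC = os.getenv("KAFKA_VIDEO_CHUNKS_TOPIC", "video-chunks")
REDIS_HOST = os.getenv("REDIS_HOST", "localhost")
REDIS_PORT = int(os.getenv("REDIS_PORT", "6379"))
SIMILARITY_THRESHOLD = float(os.getenv("SIMILARITY_THRESHOLD", "0.8"))
```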
## API Endpoints

### Analysis Microservice

- **Endpoint:** `/match-video`
- **Method:** POST
- **Description:** Content owners upload a full video. The service extracts keyframes, computes pHashes (and optionally audio fingerprints), and upserts the analysis result in the `uploaded_videos` table. The video is actively matched against crawled videos in the database.
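For example, a content owner could exercise this endpoint with a small client like the sketch below. The multipart field name (`file`) and the `video_id` form field are assumptions about the request shape, not the endpoint's documented contract:

```python
# Hypothetical client call; the actual field names may differ.
# Requires: pip install requests
import requests

with open("my_movie.mp4", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/match-video",
        files={"file": ("my_movie.mp4", f, "video/mp4")},
        data={"video_id": "my-movie-001"},  # assumed form field
        timeout=600,  # full-video analysis can be slow
    )
resp.raise_for_status()
print(resp.json())  # e.g., match results against crawled_videos
```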
- **Endpoint:** `/upload-video-chunk`
- **Method:** POST
- **Description:** The crawler uploads video chunks to this endpoint. Once all chunks are received, a background task automatically reassembles the video and triggers the analysis pipeline (see the sketch below).
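The chunk-upload flow can be pictured with a sketch like the following. The field names, the on-disk chunk layout, and the `total_chunks` counter are assumptions for illustration; only the endpoint path and the use of a FastAPI background task come from the description above:

```python
# Illustrative sketch of the chunk endpoint, not the service's actual code.
import os

from fastapi import BackgroundTasks, FastAPI, Form, UploadFile

app = FastAPI()
CHUNKS_DIR = "chunks"  # assumed layout: chunks/<video_id>/<index>.part


def reassemble_and_analyze(video_id: str) -> None:
    """Concatenate the chunks in index order, then run the analysis pipeline."""
    chunk_dir = os.path.join(CHUNKS_DIR, video_id)
    parts = sorted(os.listdir(chunk_dir), key=lambda p: int(p.split(".")[0]))
    out_path = os.path.join(CHUNKS_DIR, f"{video_id}.mp4")
    with open(out_path, "wb") as out:
        for part in parts:
            with open(os.path.join(chunk_dir, part), "rb") as f:
                out.write(f.read())
    # ... hand out_path to the fingerprinting pipeline ...


@app.post("/upload-video-chunk")
async def upload_video_chunk(
    background_tasks: BackgroundTasks,
    chunk: UploadFile,
    video_id: str = Form(...),
    chunk_index: int = Form(...),
    total_chunks: int = Form(...),
):
    chunk_dir = os.path.join(CHUNKS_DIR, video_id)
    os.makedirs(chunk_dir, exist_ok=True)
    with open(os.path.join(chunk_dir, f"{chunk_index}.part"), "wb") as f:
        f.write(await chunk.read())
    # Once the last chunk arrives, reassemble and analyze in the background.
    if len(os.listdir(chunk_dir)) == total_chunks:
        background_tasks.add_task(reassemble_and_analyze, video_id)
    return {"received": chunk_index, "of": total_chunks}
```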
- **Endpoint:** `/analyze`
- **Method:** POST
- **Description:** Manually triggers the reassembly and analysis of a crawled video. The analysis result is stored in the `crawled_videos` table.
### Golang Backend

- **Description:** The Golang backend, built with Kafka, handles user uploads from the SaaS dashboard (Next.js/TailwindCSS) and forwards upload events to the Analysis Microservice.
### Crawler Microservice

- **Description:** The crawler microservice crawls websites for video content using tools like ffmpeg and yt-dlp, downloads videos, segments them into chunks, and sends them for analysis (see the Kafka sketch below).
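Since the crawler's `kafka_client.py` sends download tasks over Kafka, the flow can be sketched with aiokafka as below. The message schema (`{"url": ...}`) is an assumption; the topic name comes from the sample `.env`:

```python
# Illustrative aiokafka producer for download tasks; the real kafka_client.py may differ.
# Requires: pip install aiokafka
import asyncio
import json
import os

from aiokafka import AIOKafkaProducer


async def send_download_task(video_url: str) -> None:
    producer = AIOKafkaProducer(
        bootstrap_servers=os.getenv("KAFKA_BOOTSTRAP_SERVERS", "localhost:9092"),
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    await producer.start()
    try:
        # Message schema is assumed for illustration.
        await producer.send_and_wait(
            os.getenv("KAFKA_VIDEO_DOWNLOAD_TOPIC", "video-download-tasks"),
            {"url": video_url},
        )
    finally:
        await producer.stop()


if __name__ == "__main__":
    asyncio.run(send_download_task("https://example.com/watch?v=123"))
```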
### Discovery Microservice

- **Description:** Uses Google dorking and metadata search techniques to discover websites that may host copyrighted content.
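Google dorking here simply means composing targeted search queries. A minimal sketch of query construction (the templates are illustrative, not the ones `dorking.py` actually uses):

```python
# Illustrative dork-query builder; dorking.py's real templates may differ.
DORK_TEMPLATES = [
    'intitle:"{title}" "watch online"',
    'inurl:stream "{title}"',
    '"{title}" filetype:mp4',
]


def build_dork_queries(title: str) -> list:
    """Expand a video title into a set of targeted search queries."""
    return [t.format(title=title) for t in DORK_TEMPLATES]


print(build_dork_queries("Example Movie 2024"))
```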
## Database Schema

The database schema is defined in `marine-analysis/db.py`. Two tables are used:

- **UploadedVideo:** Stores analysis results from videos uploaded by content owners.
- **CrawledVideo:** Stores analysis results from videos obtained via crawling.

Example schema:
```python
import os

from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession
from sqlalchemy.orm import declarative_base, sessionmaker
from sqlalchemy import Column, Integer, String, Float, JSON, DateTime, Boolean, func

DATABASE_URL = os.getenv("DATABASE_URL", "postgresql+asyncpg://user:password@localhost/dbname")

engine = create_async_engine(DATABASE_URL, echo=True)
Base = declarative_base()
async_session = sessionmaker(engine, expire_on_commit=False, class_=AsyncSession)


class UploadedVideo(Base):
    __tablename__ = "uploaded_videos"

    id = Column(Integer, primary_key=True, index=True)
    video_id = Column(String, unique=True, index=True)
    video_url = Column(String)
    match_score = Column(Float, nullable=True)
    uploaded_phashes = Column(JSON)
    audio_spectrum = Column(JSON, nullable=True)
    flagged = Column(Boolean, default=False)
    created_at = Column(DateTime, server_default=func.now())


class CrawledVideo(Base):
    __tablename__ = "crawled_videos"

    id = Column(Integer, primary_key=True, index=True)
    video_id = Column(String, unique=True, index=True)
    video_url = Column(String)
    match_score = Column(Float, nullable=True)
    uploaded_phashes = Column(JSON)
    audio_spectrum = Column(JSON, nullable=True)
    flagged = Column(Boolean, default=False)
    created_at = Column(DateTime, server_default=func.now())


async def init_db():
    async with engine.begin() as conn:
        await conn.run_sync(Base.metadata.create_all)
```
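Given this schema, creating the tables and inserting a record might look like the sketch below (assuming the definitions above are in scope; the values are placeholders):

```python
# Illustrative usage of the schema above; values are placeholders.
import asyncio


async def demo() -> None:
    await init_db()  # create uploaded_videos / crawled_videos if missing
    async with async_session() as session:
        session.add(UploadedVideo(
            video_id="my-movie-001",
            video_url="https://example.com/uploads/my-movie-001.mp4",
            uploaded_phashes=["c3a1f00d9e21", "9f02aa31bc44"],  # hex pHash strings
        ))
        await session.commit()


asyncio.run(demo())
```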
## Running the Services

### Analysis Microservice

From the `marine-analysis` directory, run:

```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000
```

Alternatively, using the `run.py` at the root:

```python
import uvicorn

if __name__ == "__main__":
    uvicorn.run("app.main:app", host="0.0.0.0", port=8000, reload=False)
```

Then execute:

```bash
python run.py
```

### Golang Backend

From the `marine-backend` directory, build and run:

```bash
./marine-backend
```

### Crawler Microservice

From the `marine-crawler` directory, run:

```bash
uvicorn app.main:app --host 0.0.0.0 --port 8001
```

### Discovery Microservice

Follow the instructions in the `marine-discovery` directory to build and run that service.
## Logging

Logging is managed with Loguru. Logs are written to the console and provide detailed information about application startup, processing, and errors.
## Contributing

Contributions are welcome! To contribute:

- Fork the repository.
- Create a feature branch:

  ```bash
  git checkout -b feature/my-feature
  ```

- Commit your changes.
- Push to your fork and open a pull request.

For major changes, please open an issue first to discuss your ideas.