GitHub

Connect your GitHub repositories to SurfSense

GitHub Integration Setup Guide

This guide walks you through connecting your GitHub repositories to SurfSense for code search and AI-powered insights.

How it works

The GitHub connector uses gitingest to fetch and process repository contents from GitHub.

  • On follow-up indexing runs, the connector retrieves the latest repository state and updates any changed files.
  • When periodic sync is enabled, changes appear in your search results after the next scheduled run, which can be within minutes at the shortest interval.
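This page does not say how the connector decides which files changed between runs. One common approach is to compare content hashes from the previous run with the freshly fetched files. The sketch below illustrates that idea only; `content_digest` and `diff_snapshots` are hypothetical names, not part of SurfSense or gitingest.

```python
import hashlib
from typing import Dict, List, Tuple


def content_digest(text: str) -> str:
    """Return a stable SHA-256 hex digest for a file's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_snapshots(
    previous: Dict[str, str], current_files: Dict[str, str]
) -> Tuple[List[str], List[str], Dict[str, str]]:
    """Compare digests saved by the last run with freshly fetched files.

    previous:      path -> digest stored after the last indexing run
    current_files: path -> file text from the latest fetch
    Returns (changed_or_new_paths, removed_paths, new_snapshot).
    """
    new_snapshot = {path: content_digest(text) for path, text in current_files.items()}
    # A path is "changed" if it is new or its digest differs from the stored one.
    changed = sorted(p for p, d in new_snapshot.items() if previous.get(p) != d)
    # A path is "removed" if it was indexed before but is absent now.
    removed = sorted(p for p in previous if p not in new_snapshot)
    return changed, removed, new_snapshot
```

Only the paths in the first list would need re-indexing, and the returned snapshot would be stored for the next run.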

What Gets Indexed

Content Type  | Examples
Code Files    | Python, JavaScript, TypeScript, Go, Rust, Java, etc.
Documentation | README files, Markdown documents, text files
Configuration | JSON, YAML, TOML, .env examples, Dockerfiles

Binary files and files larger than 5MB are automatically excluded.
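The exclusion rule above can be pictured as a simple filter. This is a minimal sketch under stated assumptions: "5MB" is read as 5 MiB, and a file is treated as binary if a sample of its bytes contains a NUL byte. The connector's actual checks may differ, and `should_index` is a hypothetical name.

```python
MAX_FILE_BYTES = 5 * 1024 * 1024  # assumption: "5MB" interpreted as 5 MiB


def looks_binary(sample: bytes) -> bool:
    """Heuristic: data containing a NUL byte is treated as binary."""
    return b"\x00" in sample


def should_index(size_bytes: int, sample: bytes) -> bool:
    """Return True if a file passes both the size and the binary check."""
    if size_bytes > MAX_FILE_BYTES:
        return False  # larger than the limit: excluded
    return not looks_binary(sample)
```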


Quick Start (Public Repos)

  1. Navigate to Connectors → Add Connector → GitHub
  2. Enter repository names: owner/repo (e.g., facebook/react, vercel/next.js)
  3. Click Connect GitHub

No authentication required for public repositories.


Private Repositories

For private repos, you need a GitHub Personal Access Token (PAT).

Generate a PAT

  1. Go to GitHub's token creation page (pre-filled with repo scope)
  2. Set an expiration
  3. Click Generate token and copy it

A classic token starts with ghp_ (fine-grained tokens start with github_pat_). Store it securely; GitHub shows the token only once.

Periodic Sync

Enable periodic sync to automatically re-index repositories when content changes. Available frequencies: every 5 minutes, every 15 minutes, hourly, every 6 hours, daily, or weekly.
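To make the frequency options concrete, the sketch below maps each one to an interval and computes when the next run would be due. The keys (`"5m"`, `"hourly"`, and so on) and the `next_sync` helper are hypothetical; SurfSense's internal identifiers and scheduler are not documented on this page.

```python
from datetime import datetime, timedelta

# Hypothetical keys for the six frequencies listed above.
SYNC_INTERVALS = {
    "5m": timedelta(minutes=5),
    "15m": timedelta(minutes=15),
    "hourly": timedelta(hours=1),
    "6h": timedelta(hours=6),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def next_sync(last_run: datetime, frequency: str) -> datetime:
    """Return the time at which the next indexing run becomes due."""
    if frequency not in SYNC_INTERVALS:
        raise ValueError(f"Unknown frequency: {frequency!r}")
    return last_run + SYNC_INTERVALS[frequency]
```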


Connector Configuration

Field                        | Description                                     | Required
Connector Name               | A friendly name to identify this connector      | Yes
GitHub Personal Access Token | Your PAT (only for private repos)               | No
Repository Names             | Comma-separated list: owner/repo1, owner/repo2  | Yes
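Before submitting the form, it can help to check that each entry in the Repository Names field has the owner/repo shape. The sketch below splits a comma-separated string and validates every entry. The regular expression is a loose approximation of GitHub's naming rules, and `parse_repositories` is a hypothetical helper, not a SurfSense API.

```python
import re
from typing import List

# Loose approximation: owner uses letters, digits, and hyphens;
# repository names may also contain dots and underscores.
REPO_PATTERN = re.compile(r"[A-Za-z0-9-]+/[A-Za-z0-9._-]+")


def parse_repositories(raw: str) -> List[str]:
    """Split a comma-separated list and validate each owner/repo entry."""
    repos: List[str] = []
    for item in raw.split(","):
        name = item.strip()
        if not name:
            continue  # ignore empty entries such as a trailing comma
        if not REPO_PATTERN.fullmatch(name):
            raise ValueError(f"Expected owner/repo, got: {name!r}")
        if name not in repos:
            repos.append(name)  # drop duplicates, keep first-seen order
    return repos
```

For example, `parse_repositories("facebook/react, vercel/next.js")` yields the two names as separate entries, while a bare `react` raises a `ValueError`.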

Troubleshooting

Repository not found

  • Verify that the name uses the owner/repo format
  • For private repos, ensure the PAT has access to the repository

Authentication failed

  • Check that the PAT is valid and has not expired
  • The token should start with ghp_ (classic) or github_pat_ (fine-grained)
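Both checks can be tried outside SurfSense. The sketch below classifies a token by its prefix and builds (without sending) a request to GitHub's `GET /user` endpoint, which requires valid credentials. The function names are hypothetical, and a prefix check alone does not prove the token is valid or unexpired.

```python
import urllib.request


def token_kind(token: str) -> str:
    """Classify a GitHub token by its prefix."""
    token = token.strip()
    if token.startswith("github_pat_"):
        return "fine-grained"
    if token.startswith("ghp_"):
        return "classic"
    return "unknown"


def build_auth_check(token: str) -> urllib.request.Request:
    """Build a request to GitHub's /user endpoint; the caller sends it."""
    return urllib.request.Request(
        "https://api.github.com/user",
        headers={
            "Authorization": f"Bearer {token.strip()}",
            "Accept": "application/vnd.github+json",
        },
    )
```

Sending the request with `urllib.request.urlopen(build_auth_check(token))` raises `urllib.error.HTTPError` with status 401 when GitHub rejects the token, for example because it has expired or been revoked.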

Rate limit exceeded

  • Use a PAT for higher limits (5,000 requests/hour authenticated vs. 60 requests/hour unauthenticated)
  • Reduce the sync frequency
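GitHub's REST API reports the remaining quota in the `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers, the latter as a UTC epoch timestamp. The sketch below shows how a client could turn those headers into a wait time. `seconds_until_reset` is a hypothetical helper, and whether SurfSense handles rate limits this way is not stated on this page.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to wait before retrying; 0.0 if quota remains."""
    # Header names are case-insensitive, so normalize them first.
    lowered = {key.lower(): value for key, value in headers.items()}
    remaining = int(lowered.get("x-ratelimit-remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = int(lowered.get("x-ratelimit-reset", "0"))
    current = time.time() if now is None else now
    # Never return a negative wait if the reset time has already passed.
    return max(0.0, float(reset_at - current))
```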
