Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Language Filters

Filter repositories based on the programming languages they use.

Overview

LanguageFilter analyzes repositories to determine:

  • Primary language - The dominant language by lines of code
  • Language mix - Percentage breakdown of all languages
  • Language presence - Whether a specific language is used at all

Basic Usage

#![allow(unused)]
fn main() {
use refactor::prelude::*;

// Find Rust projects
Codemod::from_github_org("company", token)
    .repositories(|r| r
        .primary_language("rust"))
    .apply(transform)
    .execute()?;
}

Primary Language

Filter by the dominant language:

#![allow(unused)]
fn main() {
// Primarily Rust
.primary_language("rust")

// Primarily TypeScript
.primary_language("typescript")

// Primarily Python
.primary_language("python")
}

The primary language is determined by lines of code (excluding comments and blanks).

Language Percentage

Filter by language percentage:

#![allow(unused)]
fn main() {
use refactor::prelude::*;

// At least 80% TypeScript
.language_percentage("typescript", ComparisonOp::GreaterThanOrEqual, 80.0)

// Less than 20% JavaScript (mostly migrated)
.language_percentage("javascript", ComparisonOp::LessThan, 20.0)

// Significant Go presence
.language_percentage("go", ComparisonOp::GreaterThan, 30.0)
}

Language Presence

Check if any amount of a language is present:

#![allow(unused)]
fn main() {
// Contains any TypeScript
.has_language("typescript")

// Contains any Rust
.has_language("rust")
}

Multiple Languages

All Required (AND)

#![allow(unused)]
fn main() {
// TypeScript project with some Rust
.repositories(|r| r
    .primary_language("typescript")
    .has_language("rust"))
}

Exclude Languages

#![allow(unused)]
fn main() {
// TypeScript without JavaScript
.repositories(|r| r
    .primary_language("typescript")
    .excludes_language("javascript"))
}

Language Mix

#![allow(unused)]
fn main() {
// Mixed TypeScript/JavaScript projects
.repositories(|r| r
    .language_percentage("typescript", ComparisonOp::GreaterThan, 40.0)
    .language_percentage("javascript", ComparisonOp::GreaterThan, 30.0))
}

Supported Languages

Languages are detected by file extension:

LanguageExtensions
Rust.rs
TypeScript.ts, .tsx
JavaScript.js, .jsx, .mjs
Python.py, .pyi
Go.go
Java.java
C#.cs
Ruby.rb
C.c, .h
C++.cpp, .cc, .hpp, .cxx
Swift.swift
Kotlin.kt, .kts
PHP.php
Scala.scala
Elixir.ex, .exs
Haskell.hs
Shell.sh, .bash
HTML.html, .htm
CSS.css
SCSS.scss, .sass
JSON.json
YAML.yaml, .yml
TOML.toml
Markdown.md

Direct Usage

#![allow(unused)]
fn main() {
use refactor::discovery::LanguageFilter;

let filter = LanguageFilter::primary("rust");

// Check a single repository
if filter.matches(Path::new("./my-project"))? {
    println!("Project is primarily Rust");
}

// Get language breakdown
let analysis = LanguageFilter::analyze(Path::new("./my-project"))?;

println!("Primary language: {}", analysis.primary);
println!("Language breakdown:");
for (lang, stats) in &analysis.languages {
    println!("  {}: {} lines ({:.1}%)",
        lang, stats.lines, stats.percentage);
}
}

Language Analysis

Get detailed language statistics:

#![allow(unused)]
fn main() {
use refactor::discovery::LanguageAnalysis;

let analysis = LanguageAnalysis::for_repo(Path::new("./project"))?;

println!("Repository: {}", analysis.repo_path.display());
println!("Primary: {} ({:.1}%)", analysis.primary, analysis.primary_percentage);
println!();
println!("All languages:");
for lang in analysis.ranked() {
    println!("  {}: {} files, {} lines ({:.1}%)",
        lang.name,
        lang.files,
        lang.lines,
        lang.percentage);
}
}

Configuration

Exclude Patterns

#![allow(unused)]
fn main() {
let filter = LanguageFilter::primary("rust")
    .exclude_patterns(&[
        "**/target/**",
        "**/vendor/**",
        "**/node_modules/**",
    ]);
}

Minimum Threshold

Ignore languages below a threshold:

#![allow(unused)]
fn main() {
let filter = LanguageFilter::primary("rust")
    .min_percentage(5.0);  // Ignore languages < 5%
}

Count by Files vs Lines

#![allow(unused)]
fn main() {
// Default: count by lines
.primary_language("rust")

// Alternative: count by file count
.primary_language_by_files("rust")
}

Use Cases

Find Migration Candidates

#![allow(unused)]
fn main() {
// JavaScript projects to migrate to TypeScript
Codemod::from_github_org("company", token)
    .repositories(|r| r
        .primary_language("javascript")
        .excludes_language("typescript"))
    .collect_repos()?;
}

Find Polyglot Projects

#![allow(unused)]
fn main() {
// Projects using both frontend and backend languages
Codemod::from_github_org("company", token)
    .repositories(|r| r
        .has_language("typescript")
        .has_language("go"))
    .collect_repos()?;
}

Language Inventory

#![allow(unused)]
fn main() {
// Count language usage across org
let repos = Codemod::from_github_org("company", token)
    .collect_all_repos()?;

let mut lang_counts: HashMap<String, usize> = HashMap::new();
let mut lang_lines: HashMap<String, usize> = HashMap::new();

for repo in &repos {
    let analysis = LanguageAnalysis::for_repo(&repo.path)?;
    for (lang, stats) in &analysis.languages {
        *lang_counts.entry(lang.clone()).or_insert(0) += 1;
        *lang_lines.entry(lang.clone()).or_insert(0) += stats.lines;
    }
}

println!("Language usage across organization:");
for (lang, count) in lang_counts.iter().sorted_by_key(|(_, c)| Reverse(*c)) {
    println!("  {}: {} repos, {} total lines",
        lang, count, lang_lines.get(lang).unwrap_or(&0));
}
}

Find Pure Language Projects

#![allow(unused)]
fn main() {
// Pure Rust projects (no other languages)
Codemod::from_github_org("company", token)
    .repositories(|r| r
        .language_percentage("rust", ComparisonOp::GreaterThanOrEqual, 95.0))
    .collect_repos()?;
}

Error Handling

#![allow(unused)]
fn main() {
use refactor::error::RefactorError;

let filter = LanguageFilter::primary("rust");

match filter.matches(Path::new("./project")) {
    Ok(true) => println!("Primarily Rust"),
    Ok(false) => println!("Not primarily Rust"),
    Err(RefactorError::IoError(e)) => {
        println!("Failed to analyze: {}", e);
    }
    Err(e) => return Err(e.into()),
}
}

Performance

Language analysis is fast but can be optimized:

  1. Exclude vendored code - Skip node_modules, vendor, etc.
  2. Cache results - Analysis is cached by default
  3. Sample large repos - For very large repos, sample directories
#![allow(unused)]
fn main() {
Codemod::from_github_org("company", token)
    .repositories(|r| r
        .primary_language("rust")
        .cache_duration(Duration::from_secs(3600)))
    .collect_repos()?;
}

See Also