Development agent skills

apm::packages

@orchestra-research/autogpt-agents

Autonomous AI agent platform for building and deploying continuous agents. Use when creating visual workflow agents, deploying persistent autonomous agents, or building complex multi-step AI automation systems.

★ 5,030·MIT

Orchestra-Research/development·2,233 tokens

@orchestra-research/deepspeed

skill

Expert guidance for distributed training with DeepSpeed - ZeRO optimization stages, pipeline parallelism, FP16/BF16/FP8, 1-bit Adam, sparse attention

★ 5,030·MIT

Orchestra-Research/development·33,311 tokens

@orchestra-research/fine-tuning-with-trl

skill

Fine-tune LLMs using reinforcement learning with TRL - SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF, want to align a model with preferences, or want to train from human feedback. Works with HuggingFace Transformers.

★ 5,030·MIT

Orchestra-Research/development·3,079 tokens

@orchestra-research/nemo-curator

skill

GPU-accelerated data curation for LLM training. Supports text/image/video/audio. Features fuzzy deduplication (16× faster), quality filtering (30+ heuristics), semantic deduplication, PII redaction, NSFW detection. Scales across GPUs with RAPIDS. Use for preparing high-quality training datasets, cleaning web data, or deduplicating large corpora.

★ 5,030·MIT

Orchestra-Research/development·2,513 tokens

@orchestra-research/modal-serverless-gpu

skill

Serverless GPU cloud platform for running ML workloads. Use when you need on-demand GPU access without infrastructure management, deploying ML models as APIs, or running batch jobs with automatic scaling.

★ 5,030·MIT

Orchestra-Research/development·2,149 tokens

@orchestra-research/sentencepiece

skill

Language-independent tokenizer treating text as raw Unicode. Supports BPE and Unigram algorithms. Fast (50k sentences/sec), lightweight (6MB memory), deterministic vocabulary. Used by T5, ALBERT, XLNet, mBART. Train on raw text without pre-tokenization. Use when you need multilingual support, CJK languages, or reproducible tokenization.

★ 5,030·MIT

Orchestra-Research/development·1,572 tokens

@orchestra-research/llama-cpp

skill

Runs LLM inference on CPU, Apple Silicon, and consumer GPUs without NVIDIA hardware. Use for edge deployment, M1/M2/M3 Macs, AMD/Intel GPUs, or when CUDA is unavailable. Supports GGUF quantization (1.5-8 bit) for reduced memory and 4-10× speedup vs PyTorch on CPU.

★ 5,030·MIT

Orchestra-Research/development·1,928 tokens

@orchestra-research/nemo-evaluator-sdk

skill

Evaluates LLMs across 100+ benchmarks from 18+ harnesses (MMLU, HumanEval, GSM8K, safety, VLM) with multi-backend execution. Use when you need scalable evaluation on local Docker, Slurm HPC, or cloud platforms. NVIDIA's enterprise-grade platform with container-first architecture for reproducible benchmarking.

★ 5,030·MIT

Orchestra-Research/development·3,310 tokens

@orchestra-research/llamaguard

skill

Meta's 7-8B specialized moderation model for LLM input/output filtering. 6 safety categories - violence/hate, sexual content, weapons, substances, self-harm, criminal planning. 94-95% accuracy. Deploy with vLLM, HuggingFace, SageMaker. Integrates with NeMo Guardrails.

★ 5,030·MIT

Orchestra-Research/development·2,491 tokens

@orchestra-research/openrlhf-training

skill

High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, DPO training of large models (7B-70B+). Built on Ray, vLLM, ZeRO-3. 2× faster than DeepSpeedChat with distributed architecture and GPU resource sharing.

★ 5,030·MIT

Orchestra-Research/development·2,537 tokens

@orchestra-research/weights-and-biases

skill

Track ML experiments with automatic logging, visualize training in real time, optimize hyperparameters with sweeps, and manage a model registry with W&B, a collaborative MLOps platform.

★ 5,030·MIT

Orchestra-Research/development·3,272 tokens

@orchestra-research/ray-data

skill

Scalable data processing for ML workloads. Streaming execution across CPU/GPU, supports Parquet/CSV/JSON/images. Integrates with Ray Train, PyTorch, TensorFlow. Scales from single machine to 100s of nodes. Use for batch inference, data preprocessing, multi-modal data loading, or distributed ETL pipelines.

★ 5,030·MIT

Orchestra-Research/development·1,917 tokens

@orchestra-research/sentence-transformers

skill

Framework for state-of-the-art sentence, text, and image embeddings. Provides 5000+ pre-trained models for semantic similarity, clustering, and retrieval. Supports multilingual, domain-specific, and multimodal models. Use for generating embeddings for RAG, semantic search, or similarity tasks. Best for production embedding generation.

★ 5,030·MIT

Orchestra-Research/development·1,574 tokens

@orchestra-research/torchforge-rl-training

skill

Provides guidance for PyTorch-native agentic RL using torchforge, Meta's library separating infra from algorithms. Use when you want clean RL abstractions, easy algorithm experimentation, or scalable training with Monarch and TorchTitan.

★ 5,030·MIT

Orchestra-Research/development·2,611 tokens

@orchestra-research/sglang

skill

Fast structured generation and serving for LLMs with RadixAttention prefix caching. Use for JSON/regex outputs, constrained decoding, agentic workflows with tool calls, or when you need 5× faster inference than vLLM with prefix sharing. Powers 300,000+ GPUs at xAI, AMD, NVIDIA, and LinkedIn.

★ 5,030·MIT

Orchestra-Research/development·3,178 tokens

@orchestra-research/mamba-architecture

skill

State-space model with O(n) complexity vs Transformers' O(n²). 5× faster inference, million-token sequences, no KV cache. Selective SSM with hardware-aware design. Mamba-1 (d_state=16) and Mamba-2 (d_state=128, multi-head). Models 130M-2.8B on HuggingFace.

★ 5,030·MIT

Orchestra-Research/development·2,097 tokens

@orchestra-research/crewai-multi-agent

skill

Multi-agent orchestration framework for autonomous AI collaboration. Use when building teams of specialized agents working together on complex tasks, when you need role-based agent collaboration with memory, or for production workflows requiring sequential/hierarchical execution. Built without LangChain dependencies for lean, fast execution.

★ 5,030·MIT

Orchestra-Research/development·3,184 tokens

@orchestra-research/awq-quantization

skill

Activation-aware weight quantization for 4-bit LLM compression with 3× speedup and minimal accuracy loss. Use when deploying large models (7B-70B) on limited GPU memory, when you need faster inference than GPTQ with better accuracy preservation, or for instruction-tuned and multimodal models. MLSys 2024 Best Paper Award winner.

★ 5,030·MIT

Orchestra-Research/development·2,482 tokens

@orchestra-research/rwkv-architecture

skill

RNN+Transformer hybrid with O(n) inference. Linear time, infinite context, no KV cache. Train like GPT (parallel), infer like RNN (sequential). Linux Foundation AI project. In production in Windows, Office, and NeMo. RWKV-7 (March 2025). Models up to 14B parameters.

★ 5,030·MIT

Orchestra-Research/development·1,990 tokens

@orchestra-research/langsmith-observability

skill

LLM observability platform for tracing, evaluation, and monitoring. Use when debugging LLM applications, evaluating model outputs against datasets, monitoring production systems, or building systematic testing pipelines for AI applications.

★ 5,030·MIT

Orchestra-Research/development·2,331 tokens
