RowSpeak Private Deployment: Performance Benchmarks

This document provides reference performance data for RowSpeak Private Deployment across different hardware configurations and usage scenarios. Use it to set expectations, plan infrastructure, and validate your deployment.


Summary

Metric                               Value
Inference latency (first token)      < 100 ms
Average full response time           3–8 seconds
Uptime SLA                           99.9%
Concurrent users (standard config)   50+
Data leaks                           0 (by architecture)

Test Environment Reference

All benchmarks below were run on the following standard configuration unless otherwise noted.

Component   Specification
CPU         16-core Intel Xeon
RAM         64 GB DDR4
GPU         NVIDIA A10 (24 GB VRAM)
Storage     1 TB NVMe SSD
OS          Ubuntu 22.04 LTS
Model       DeepSeek-V2 (local)
Network     1 Gbps internal

Response Time by Task Type

Average response time by task type (seconds):

Task Type              Avg Response
Simple Query           1.2s
Spreadsheet Analysis   3.5s
Chart Generation       4.8s
Report Summary         6.2s
Multi-sheet Workbook   7.8s

Chart legend: Standard tasks, Output generation, Complex workbooks.

Concurrency Benchmarks

The figures below show how RowSpeak's P95 response time changes as the number of simultaneous users increases.

P95 response time vs. concurrent users:

Concurrent Users   P95 Response
10                 3.2s
20                 4.1s
50                 6.8s
100                11.2s
200                18.5s

For 200+ users, the Enterprise cluster configuration is recommended.
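When validating your own deployment, the P95 figures above can be compared against latencies you measure yourself. The following is a minimal sketch, not part of RowSpeak: the function names, the nearest-rank percentile method, and the 20% tolerance are all assumptions chosen for illustration. Only the reference values come from this document.

```python
# Reference P95 response times (seconds) keyed by concurrent users,
# copied from the concurrency benchmark figures in this document.
REFERENCE_P95 = {10: 3.2, 20: 4.1, 50: 6.8, 100: 11.2, 200: 18.5}


def p95(samples):
    """Return the 95th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100  # 1-based rank
    return ordered[rank - 1]


def within_reference(samples, concurrent_users, tolerance=0.20):
    """Compare a measured P95 with the reference value.

    tolerance is a hypothetical allowance (20% by default), not a
    RowSpeak-defined threshold.
    Returns (ok, measured_p95, reference_p95).
    """
    reference = REFERENCE_P95[concurrent_users]
    measured = p95(samples)
    return measured <= reference * (1 + tolerance), measured, reference
```

To use it, collect response times in seconds from a load test at one of the listed user counts and pass them in, for example `within_reference(latencies, 50)`. Nearest-rank is one of several percentile definitions; other tools may interpolate and report slightly different values.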

Performance by Hardware Configuration

Configuration                            Concurrent Users   Avg Response   P95 Response   Recommended For
Minimum (8-core, 32GB, 16GB VRAM)        10–20              4.5s           9s             Small teams, pilot
Standard (16-core, 64GB, 24GB VRAM)      50                 3.5s           7s             Departments, 50–100 users
Enterprise (32-core, 128GB, 80GB VRAM)   100–200            2.8s           6s             Large orgs, high concurrency
Enterprise cluster (multi-node)          500+               2.5s           5s             Enterprise-wide rollout
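As a rough illustration of how the table above might be turned into a first-pass sizing rule, the sketch below picks the smallest configuration whose concurrent-user range covers an expected peak. The `Config` class, the `recommend_config` function, and the choice to treat the upper end of each range as a hard cutoff are assumptions made for this example. The table does not say which configuration covers 201 to 499 concurrent users, so the sketch assigns that range to the cluster configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str
    max_concurrent: float   # assumed cutoff: upper end of the table's range
    avg_response_s: float   # average response time from the table
    p95_response_s: float   # P95 response time from the table


# Values copied from the hardware configuration table, smallest first.
CONFIGS = [
    Config("Minimum", 20, 4.5, 9.0),
    Config("Standard", 50, 3.5, 7.0),
    Config("Enterprise", 200, 2.8, 6.0),
    Config("Enterprise cluster", float("inf"), 2.5, 5.0),
]


def recommend_config(concurrent_users: int) -> Config:
    """Return the smallest configuration covering the expected peak load."""
    if concurrent_users < 1:
        raise ValueError("concurrent_users must be at least 1")
    for config in CONFIGS:
        if concurrent_users <= config.max_concurrent:
            return config
    return CONFIGS[-1]  # unreachable given the infinite last cutoff


if __name__ == "__main__":
    for users in (15, 50, 120, 300):
        print(users, "->", recommend_config(users).name)
```

This only encodes the concurrency column. File sizes, model choice, and usage patterns also affect sizing, so treat the result as a starting point.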

Model Performance Comparison

Different models have different speed/quality tradeoffs. Here is how they compare on standard spreadsheet analysis tasks.

Model               Type                  Avg Response   Quality     Best For
DeepSeek-V2         Open-source           3.5s           High        General analysis, Chinese language
Qwen2.5-72B         Open-source           4.1s           High        Multilingual, structured data
GPT-4o              Closed-source (API)   2.8s           Very High   Complex reasoning, English
Claude 3.5 Sonnet   Closed-source (API)   3.2s           Very High   Long documents, nuanced output
Gemini 1.5 Pro      Closed-source (API)   3.0s           High        Mixed media, large context

Closed-source model response times depend on the provider's API latency and your network connection to their endpoints.


Stability and Uptime

RowSpeak Private Deployment is designed for continuous operation.

  • Target uptime: 99.9% (about 8.8 hours of downtime per year)
  • Graceful degradation: if the model layer is temporarily unavailable, the application layer continues to serve cached results
  • Restart recovery: full service recovery in under 60 seconds after a planned restart
  • Memory stability: no memory leaks observed in 30-day continuous operation tests
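The downtime allowance implied by an uptime target is simple arithmetic, and it can help to see it per year and per month. The sketch below is illustrative only: the function name and the 30-day month are assumptions, and a 365-day year is used.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_budget_hours(uptime_percent, period_hours=HOURS_PER_YEAR):
    """Return the hours of downtime allowed by an uptime target over a period."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_hours * (1 - uptime_percent / 100)


if __name__ == "__main__":
    yearly = downtime_budget_hours(99.9)            # about 8.76 hours
    monthly = downtime_budget_hours(99.9, 30 * 24)  # about 0.72 hours (roughly 43 minutes)
    print(f"99.9% uptime allows {yearly:.2f} h/year, {monthly * 60:.0f} min per 30-day month")
```

At the 99.9% target, this works out to just under 9 hours per year.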

File Processing Performance

File Type              File Size     Processing Time
Single-sheet CSV       < 1 MB        < 1s
Multi-sheet Excel      5 MB          2–4s
Large Excel workbook   50 MB         8–15s
PDF with tables        10 MB         5–10s
Batch (10 files)       50 MB total   20–40s

Planning Your Deployment

Use the hardware sizing table above as a starting point. For a more precise recommendation based on your team size, file types, and usage patterns, request the Deployment Pack, which includes a sizing worksheet.

For a live performance demonstration using your own file types, book a demo.