OpenMark helps you evaluate AI models with custom benchmarks. Here's the flow:
Stop Guessing. Know.
An AI model's ranking on popular leaderboards tells you little about how it'll handle your specific needs. The only way to know is to test your actual use case.
AI providers update models silently. A model that worked perfectly last month might behave differently today. Regular benchmarking catches drift before it becomes a production issue.
When a shiny new model drops, don't assume it's an upgrade for your task. New models often excel in some areas while regressing in others. Test before you switch.
Premium "reasoning" models shine on complex tasks, but simpler models often win on straightforward ones, at a fraction of the cost. Don't pay for capability you don't need.
If your prompt works consistently across multiple models, it's probably well-crafted. If it fails on most, you've found a fragility to fix.
Rate limits, downtime, API errors: they happen. Know your backup options before your primary goes down. Switch instantly instead of scrambling.
Unlike subjective "which response feels better" voting systems or LLM-as-judge evaluations, OpenMark uses deterministic scoring modes you define. Objective and reproducible.
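As a minimal illustration of what "deterministic scoring" means in practice, here is a sketch of an exact-match scoring mode. All names are illustrative, not OpenMark's actual API; the point is that identical inputs always produce the identical score, unlike judge-based evaluation.

```python
# Hypothetical sketch of a deterministic scoring mode: exact match after
# normalization. Identical inputs always yield the identical score.
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences don't matter."""
    return " ".join(text.lower().split())

def exact_match_score(response: str, expected: str, points: float = 1.0) -> float:
    """Return full points on a normalized exact match, else zero."""
    return points if normalize(response) == normalize(expected) else 0.0
```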
Task = a benchmark definition (appears in task rail). Test = an individual test case within a task. One task can contain multiple tests.
Create benchmark tasks using one of three modes:
Describe what you want to test in plain language. The AI agent generates test cases, expected answers, and scoring configuration for you. Validation runs automatically after generation.
Build tests using structured forms. Add prompts, expected answers, attachments, and scoring modes for each test. Validation runs automatically.
Write or paste YAML directly. Full control over every field. Use Validate & Preflight to check syntax and token counts before saving.
Add files to your tests via drag & drop or the attachment button. Supports images, PDFs, documents, and spreadsheets. See the attachment tooltip for current format and size limits.
Right-click any task in the task rail to Rename, edit Description, or Delete. Click a saved task to load it for editing.
The Preflight section below Task Preview shows token counts for each test.
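To convey what a preflight token count does, here is a rough sketch of client-side estimation. Real tokenizers differ by model, and the ~4 characters per token heuristic below is a common rule of thumb, not OpenMark's actual method.

```python
# Rough preflight sketch: estimate prompt token counts per test before running
# a benchmark. The 4-characters-per-token ratio is an assumption.
def estimate_tokens(text: str) -> int:
    """Crude token estimate: about one token per 4 characters, minimum 1 for non-empty text."""
    return max(1, len(text) // 4) if text else 0

def preflight(tests: list[dict]) -> list[tuple[str, int]]:
    """Return (test name, estimated prompt tokens) for each test."""
    return [(t["name"], estimate_tokens(t["prompt"])) for t in tests]
```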
Complete YAML schema and scoring mode docs: find them in the Editor tab.
Choose which models to include in your benchmark:
Filter by capability (Vision, Tools) or pricing tier (Very Low to Very High). Click provider icons to show/hide all models from that provider.
Right-click any model chip and select "View Details" to see pricing, context window, capabilities, and latency information.
Automatically selects ~8 models across providers and pricing tiers for a balanced comparison. Adapts to your task: if vision is needed, only vision-capable models are picked.
Quickly re-select the same models you've previously benchmarked for the active task.
Choose a Preset for quick setup, or adjust individual settings below.
Run each test multiple times to measure consistency. Higher values give more reliable stability scores. Prompt caching is enabled for supported providers.
Maximum response length. Set based on expected answer length to control costs.
Automatically test different temperature values to find the best setting for your task.
Low (more deterministic) vs High (more creative). Use Low for factual tasks, High for creative ones.
How long to wait for responses. "Snappy" for quick tasks, "Patient" for complex prompts or busy servers.
Stop processing a model if it hits repeated errors (4xx/5xx). Disable to diagnose failures or capture partial results.
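The stop-on-errors setting behaves like a simple circuit breaker. The sketch below is an illustration of the idea; the threshold and structure are assumptions, not OpenMark's internals.

```python
# Illustrative circuit breaker: after a run of consecutive 4xx/5xx responses,
# remaining requests for that model are skipped.
def run_with_circuit_breaker(statuses: list[int], max_consecutive_errors: int = 3) -> dict:
    """Process HTTP status codes in order; trip the breaker after too many consecutive errors."""
    consecutive = 0
    completed = 0
    for status in statuses:
        if status >= 400:          # 4xx/5xx counts as an error
            consecutive += 1
            if consecutive >= max_consecutive_errors:
                return {"completed": completed, "tripped": True}
        else:
            consecutive = 0        # any success resets the error streak
            completed += 1
    return {"completed": completed, "tripped": False}
```

Disabling the setting is equivalent to an infinite threshold: every request is attempted, which is what you want when diagnosing failures.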
Click Start Benchmark to begin. If a benchmark is already running, the button becomes Queue Benchmark. Jobs appear in the Job Queue and run in order (queue limits vary by plan).
A credit range is shown above the Start button. This is a conservative estimate; actual costs are typically much lower, since most models return concise answers well under the maximum token limit. You'll see the real cost in the completion notification.
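The gap between the estimate and the final bill comes down to output tokens: the estimate budgets for the full maximum, while actual cost uses the tokens the provider reports. A sketch, with illustrative prices (USD per 1M tokens), not actual provider rates:

```python
# Why the pre-run credit estimate is conservative: it assumes every run emits
# the maximum allowed output tokens, while real responses are usually shorter.
def cost_upper_bound(input_tokens: int, max_output_tokens: int,
                     in_price_per_m: float, out_price_per_m: float) -> float:
    """Worst-case cost: output billed at the full max-token budget."""
    return (input_tokens * in_price_per_m + max_output_tokens * out_price_per_m) / 1_000_000

def actual_cost(input_tokens: int, output_tokens: int,
                in_price_per_m: float, out_price_per_m: float) -> float:
    """Real cost from the token usage the provider reports after the run."""
    return (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1_000_000
```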
Note: Models are evaluated on their training knowledge only; no internet access is enabled. This ensures fair, reproducible comparisons. Models also run with each provider's default reasoning effort unless otherwise specified. The Acc/$ metric normalizes for cost differences, making it a useful proxy for compute efficiency even when reasoning effort varies across providers.
Each row shows one model's performance. Key columns:
Overall accuracy (points earned / total points). Higher is better.
Consistency across runs (±0.000 is perfectly stable). Lower variance is better.
Temperature used. Highlighted if discovered via temperature optimization.
Average cost per stability run. Asterisk (*) = calculated from actual token usage.
Median time per stability run. Useful for real-time use case decisions.
Efficiency metrics: accuracy per dollar (cost efficiency) and per minute (speed efficiency).
Average output tokens per run. For reasoning models, may include thinking tokens.
Percentage of tests completed without errors.
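The efficiency columns above can be derived from a model's raw numbers. The formulas below are assumptions inferred from the column descriptions, shown as a sketch:

```python
# Illustrative computation of the Score, Acc/$, and Acc/min columns from a
# model's raw benchmark results. Field names are hypothetical.
def summarize(points_earned: float, points_total: float,
              cost_usd: float, minutes: float) -> dict:
    """Derive the accuracy and efficiency metrics from raw benchmark numbers."""
    accuracy = points_earned / points_total          # Score column
    return {
        "accuracy": accuracy,
        "acc_per_dollar": accuracy / cost_usd,       # cost efficiency
        "acc_per_minute": accuracy / minutes,        # speed efficiency
    }
```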
Click any result row to open the details drawer showing:
Click individual tests to see the exact prompt sent and the model's response β invaluable for refining your prompts.
Use Export to download results. CSV and JSON include detailed per-test data; TXT provides a summary table. Use Share to generate a shareable image.
The default sort order prioritizes accuracy, then cost-efficiency (Acc/$), then speed (Acc/min). Click any column header to re-sort based on your priorities.
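The default ranking is a simple multi-key sort: accuracy first, then Acc/$, then Acc/min, all descending. A sketch with hypothetical field names:

```python
# Sketch of the default result ordering: negated keys give descending sort
# on each priority in turn.
def rank(results: list[dict]) -> list[dict]:
    """Sort results by accuracy, then cost efficiency, then speed efficiency."""
    return sorted(results, key=lambda r: (-r["accuracy"],
                                          -r["acc_per_dollar"],
                                          -r["acc_per_minute"]))
```

When two models tie on accuracy, the cheaper one (higher Acc/$) ranks first.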
Limits vary by subscription tier. Here are your current limits:
When you run stability runs (multiple iterations of the same test), OpenMark enables prompt caching when supported by providers.
Cached prompts cost significantly less (up to 75% savings on some providers)
Cached prompts process faster since the input is pre-tokenized
Caching is automatic; no configuration is needed. The cost shown in results reflects actual usage, including cache benefits.
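As a back-of-envelope sketch of what caching saves across stability runs: the first run pays the full input price, and repeat runs of the identical prompt pay a discounted cached rate. The 75% discount is the upper bound mentioned above, not a guarantee, and exact cache pricing varies by provider.

```python
# Hypothetical savings model: full input price on the first run, a cached
# (discounted) rate on every repeat run of the same prompt.
def input_cost_with_cache(runs: int, tokens: int, price_per_m: float,
                          cache_discount: float = 0.75) -> float:
    """Total input-token cost for `runs` identical prompts with caching after the first."""
    full = tokens * price_per_m / 1_000_000
    cached = full * (1 - cache_discount)
    return full + (runs - 1) * cached
```

At 5 stability runs with a 75% cache discount, input costs are half of what uncached runs would be.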
Have a question, feature request, or need billing assistance? We'd love to hear from you.
We're happy to help with:
Stop guessing which AI model to use.
Describe your task, test it against 100+ models, get scored results in minutes.
Used by developers and teams worldwide
No API keys. No code. No setup.
Effective Date: January 15, 2026
Welcome to OpenMark ("the Service"), an AI model benchmarking platform operated by:
OpenMark AI, Lda
NIF/VAT: 519 147 766
Lisbon, Portugal
Email: support@openmark.ai
By accessing or using the Service, you agree to be bound by these Terms of Service ("Terms").
OpenMark provides an AI model benchmarking platform that enables users to:
The Service acts as an intermediary, sending user-defined prompts to third-party AI model providers and returning their responses for evaluation.
3.1 Eligibility: You must be at least 18 years old or the age of legal majority in your jurisdiction to use this Service.
3.2 Account Creation: Accounts are created through the available authentication methods. You are responsible for maintaining the security of your account credentials.
3.3 Account Accuracy: You agree to provide accurate and complete information during registration.
3.4 One Account Per Person: Each user may maintain only one account. Creating multiple accounts to circumvent limits or abuse the Service is prohibited.
You agree NOT to use the Service to:
Violation of this Acceptable Use Policy may result in immediate account termination without refund.
5.1 Tiers: The Service offers multiple subscription tiers (Free, Pro, Expert) with varying limits on tasks, storage, and features.
5.2 Credits: Benchmarking operations consume credits. Credits come in two forms:
5.3 Pricing: Current pricing is displayed at the point of purchase. Prices may be subject to applicable taxes (VAT/sales tax) based on your location.
5.4 Price Changes: We reserve the right to modify pricing with 30 days' notice to active subscribers.
6.1 Payment Processing: All payments are processed securely through our payment providers. We do not store your full payment card information on our servers.
6.2 Subscription Billing: Subscriptions are billed in advance on a recurring basis (monthly or annually). Your subscription will automatically renew unless cancelled before the renewal date.
6.3 Failed Payments: If a payment fails, we will attempt to retry. If all retry attempts fail, your subscription may be cancelled.
6.4 Taxes: Depending on your location, VAT, GST, or sales tax may be added at checkout. You are responsible for any applicable taxes.
7.1 Digital Services: As a digital service, refunds are generally not provided once credits have been consumed or services have been used.
7.2 EU Consumer Rights: EU consumers have a 14-day withdrawal right for digital content. However, by using the Service immediately upon purchase, you acknowledge that you waive this right once credits are consumed or benchmark runs are executed.
7.3 Service Errors: If you experience a technical error that causes credit loss through no fault of your own, please contact us at support@openmark.ai. We will review and may issue a credit refund at our discretion.
7.4 Chargebacks: If you believe there has been an error, please contact us before initiating a chargeback. We are committed to resolving disputes fairly and quickly.
8.1 Active Accounts: Your tasks, results, and data are retained while your account remains active.
8.2 Inactive Accounts: After 60 days of inactivity (no logins or API activity), free tier accounts may have their data deleted. Users who have purchased credits receive extended retention of 365 days. Paid subscribers retain data indefinitely while subscribed.
8.3 Account Deletion: You may request account deletion at any time by contacting support@openmark.ai. See our Privacy Policy for details.
9.1 Your Content: You retain ownership of tasks, prompts, and content you create. By using the Service, you grant us a limited license to process your content solely to provide the Service.
9.2 Our Content: The Service, including its design, code, branding, and documentation, is owned by OpenMark AI, Lda and protected by intellectual property laws.
9.3 AI Provider Content: Responses from AI models are subject to the terms of their respective providers.
10.1 Service Provided "As Is": The Service is provided without warranties of any kind, express or implied, including but not limited to fitness for a particular purpose or non-infringement.
10.2 No Guarantee of AI Accuracy: We do not guarantee the accuracy, reliability, or appropriateness of AI model outputs. You are solely responsible for how you use benchmark results.
10.3 Liability Cap: To the maximum extent permitted by law, our total liability is limited to the amount you have paid us in the 1 month preceding the claim, or €50, whichever is greater.
10.4 Excluded Damages: We are not liable for indirect, incidental, consequential, or punitive damages, including lost profits or data loss.
You agree to indemnify and hold harmless OpenMark AI, Lda and its officers, directors, employees, and agents from any claims, damages, or expenses arising from your use of the Service or violation of these Terms.
12.1 Availability: We strive for high availability but do not guarantee uninterrupted access. Scheduled maintenance will be announced when possible.
12.2 Modifications: We may modify, suspend, or discontinue features at any time. Material changes affecting paid features will be communicated with reasonable notice.
12.3 Third-Party Dependencies: The Service relies on third-party AI providers. We are not responsible for their availability, pricing changes, or service modifications.
13.1 By You: You may cancel your subscription at any time via the Billing portal. Cancellation takes effect at the end of the current billing period.
13.2 By Us: We may suspend or terminate your account for violation of these Terms, suspected fraud, or abuse. Serious violations may result in immediate termination without refund.
13.3 Effect of Termination: Upon termination, your access to the Service ends. Data may be deleted according to our retention policy.
14.1 Informal Resolution: Before initiating formal proceedings, you agree to contact us at support@openmark.ai to attempt resolution.
14.2 EU Consumers: EU consumers may use the European Commission's Online Dispute Resolution platform at https://ec.europa.eu/odr.
14.3 Consumer Protection: These Terms do not limit any mandatory consumer protection rights under applicable law.
These Terms are governed by the laws of Portugal. Any disputes shall be submitted to the competent courts of Lisbon, Portugal, without prejudice to any mandatory consumer protection jurisdiction rights.
We may update these Terms from time to time. Significant changes will be notified via email or through the Service. Continued use after changes constitutes acceptance of the new Terms.
If any provision of these Terms is found unenforceable, the remaining provisions shall continue in effect.
For questions about these Terms, please contact:
OpenMark AI, Lda
Lisbon, Portugal
Email: support@openmark.ai
NIF/VAT: 519 147 766
Last updated: January 15, 2026
Effective Date: January 15, 2026
This Privacy Policy explains how OpenMark AI, Lda ("we", "us", "our") collects, uses, and protects your personal data when you use our AI model benchmarking platform ("the Service").
We are committed to protecting your privacy and complying with the General Data Protection Regulation (GDPR) and other applicable data protection laws.
The data controller responsible for your personal data is:
OpenMark AI, Lda
NIF/VAT: 519 147 766
Lisbon, Portugal
Email: support@openmark.ai
Note: Full payment card details are processed and stored by our payment providers, not by us. See Section 6.
| Purpose | Legal Basis (GDPR Art. 6) |
|---|---|
| Provide the benchmarking service | Contract performance |
| Process payments and subscriptions | Contract performance |
| Send transactional emails (receipts, account updates) | Contract performance |
| Prevent fraud and abuse | Legitimate interest |
| Comply with tax and legal obligations | Legal obligation |
| Improve the Service based on usage patterns | Legitimate interest |
| Send product updates and newsletters | Consent (opt-in) |
We use browser local storage (not cookies) for:
This data remains on your device and is not transmitted to our servers. You can clear it at any time via your browser settings.
Third-party cookies: Our payment providers may use cookies for fraud prevention and checkout functionality.
We share data with the following third parties to provide the Service:
When you run benchmarks, your prompts are sent to various third-party AI model providers. Each provider processes data according to their own privacy policies.
Some of our third-party providers are based outside the EU/EEA (e.g., United States). When data is transferred outside the EU/EEA, we ensure appropriate safeguards are in place, such as the European Commission's Standard Contractual Clauses.
| Data Type | Retention Period |
|---|---|
| Account data (active users) | Duration of account |
| Tasks and results (Free tier, inactive) | 60 days after last activity |
| Tasks and results (with purchase history) | 365 days after last activity |
| Tasks and results (active subscription) | Indefinite while subscribed |
| Transaction records | 7 years (legal requirement) |
| Security logs | 90 days |
Under the GDPR, you have rights including: access, rectification, erasure, restriction of processing, data portability, objection, and withdrawal of consent.
To exercise any of these rights, please contact us at support@openmark.ai. We will respond within 30 days.
We implement appropriate technical and organizational measures to protect your data, including:
The Service is not intended for users under 18 years of age. We do not knowingly collect personal data from children. If you believe a child has provided us with personal data, please contact us immediately.
We do not use your personal data for automated decision-making or profiling that produces legal or similarly significant effects.
We may update this Privacy Policy from time to time. Significant changes will be communicated via email or through the Service. The "Effective Date" at the top indicates the last update.
If you are unsatisfied with our handling of your data, you have the right to lodge a complaint with the Portuguese Data Protection Authority:
CNPD (Comissão Nacional de Proteção de Dados)
Website: www.cnpd.pt
EU residents may also contact their local data protection authority.
For any privacy-related questions or to exercise your rights:
OpenMark AI, Lda
Lisbon, Portugal
Email: support@openmark.ai
Last updated: January 15, 2026