For a long time, I've had a dream of creating my own novel programming language someday. I have a computer science degree and a grade of "A" in my third-year Compiler Construction course back in 2011. All that was missing was a Claude Code subscription.
When I asked Claude what it thought of creating an LLM-first programming language, I could picture its non-existent eyes widen at the possibilities. LLMs often need to infer meaning from imprecise descriptions, which are pervasive in what I will call "human programming languages": the languages that have existed for decades. By asking Claude what its ideal programming language would look like, I was able to distill several key ideas that take advantage of what LLMs are good at (and humans are bad at).
Human Programming Languages
"Human programming languages" are the languages that we all know and love, written by humans for humans, with decades of usage and examples. Human programming languages generally have only one syntax that encodes all of the following:
Semantic intent of the developer
The abstract syntax tree (AST) of program logic
A particular coding style
Hidden code (inheritance, destructors, etc.)
Error handling (implicit or explicit)
Variable and function names
Comments and documentation
Whitespace and indentation (sometimes has an effect on logic)
Many of these features are for human convenience in order to help humans describe to both the machine and other developers what a program should do. What if we distilled the essence of programming by removing the human conveniences?
The Ideal LLM-First Programming Language
In Claude's ideal LLM-first programming language, there is only one standard way to do something. No coding styles or standards, no formatters, no There's More Than One Way To Do It (TMTOWTDI), no syntactic sugar.
In this new language, there is not just one syntax, but several syntaxes that represent the same AST, depending on the task at hand. Authoring new code, inspecting or debugging code, compiling code, and storing code in source control could each have their own syntax that offers advantages for the requirements of each activity. The conceptual AST becomes the source of truth, instead of one particular representation.
The ideal LLM-first language would track effects of functions, so that understanding the effects of a function is a lookup, not a calculation. If one function in a call chain allocates memory, makes a recursive call, or performs disk I/O, that effect is tracked all the way up the call chain. When an LLM authors or debugs code, no inference of effects is necessary.
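To make the "lookup, not a calculation" idea concrete, here is a minimal Python sketch of propagating effects up a call chain. It is not Tacit's actual implementation: the call graph, the function names, and the "Rec" label are assumptions made up for illustration ("Alloc" and "IO" are borrowed from the effect names Tacit's metadata uses).

```python
# Hypothetical call graph: each function maps to the functions it calls.
CALLS = {
    "main": ["load_config", "countdown"],
    "load_config": ["read_file"],
    "read_file": [],
    "countdown": ["countdown"],  # calls itself
}

# Effects each function performs directly. "Alloc" and "IO" appear in Tacit's
# metadata; "Rec" is a made-up label for this sketch.
DIRECT_EFFECTS = {
    "load_config": {"Alloc"},
    "read_file": {"IO"},
}


def propagate_effects(calls, direct):
    """Return every function's full effect set, including its callees' effects."""
    effects = {name: set(direct.get(name, ())) for name in calls}
    for name, callees in calls.items():
        if name in callees:
            effects[name].add("Rec")
    changed = True
    while changed:  # repeat until no caller gains a new effect
        changed = False
        for caller, callees in calls.items():
            for callee in callees:
                missing = effects[callee] - effects[caller]
                if missing:
                    effects[caller] |= missing
                    changed = True
    return effects


EFFECTS = propagate_effects(CALLS, DIRECT_EFFECTS)
print(sorted(EFFECTS["main"]))  # ['Alloc', 'IO', 'Rec']
```

Once the table is computed, asking "what can `main` do?" is a dictionary lookup rather than a walk through every callee.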
Lastly, an LLM-first language would be token-efficient. Fewer tokens used means lower cost.
Before I could even ask Claude what to name this new language, it had already suggested "Tacit", which means "expressed or understood without words or speech". Who am I to deny Claude's suggestion?
Collaborative Design
Designing a new programming language with Claude consisted of many back-and-forth conversations. I leaned on Claude's vast expertise about programming language design, and leaned on my own basic understanding of the same topic to ask questions and clarify what Claude was proposing. By asking Claude to explain its proposal to me in simpler terms, I would often cause it to recognize and fix inconsistencies.
Claude suggested tracking the design using Architecture Decision Records (ADRs). An ADR is a numbered markdown file that documents an individual design decision along with context, alternatives considered, consequences, and related decisions. At the time of writing, Tacit has 90 ADRs.
Choosing a Syntax
With the stated goal of token efficiency in mind, the first major decision was to figure out the initial syntax of the language. In order to evaluate candidates, Claude generated two different ASTs (one small and one large). The plan was to represent these ASTs in five different candidate syntaxes, and then calculate how many tokens each syntax used to represent the ASTs.
The first candidate was based on S-expressions, similar to the Lisp programming language. The second candidate was "glyph-prefix" based on non-alphanumeric characters. The rest of the candidates were variations on Byte-Pair Encoding (BPE), which is what OpenAI's open source tokenizer tiktoken uses (and Anthropic's closed source tokenizer as well).
sexpr-with-int-ids:
(2 (0 0) (2 (7 @fst (1 0 #1) @snd (1 0 #2)) (4 (8 0 @fst) (8 0 @snd) #0)))
glyph-prefix:
= \ 0 = { @fst . 0 #1 @snd . 0 #2 } ? / 0 @fst / 0 @snd #0
bpe-optimized:
let id = lambda x . x in let pair = { fst : id 1 , snd : id 2 } in if pair . fst then pair . snd else 0
bpe-compact:
let id = lambda x. x in let pair = {fst: id 1, snd: id 2} in if pair.fst then pair.snd else 0
bpe-hybrid:
let = lambda 0 in let = { fst : 0 1 , snd : 0 2 } in if 0 . fst then 0 . snd else 0
Token count was evaluated by running each syntax through tiktoken, as well as through the Anthropic API. The Anthropic tokenizer is closed source, but their API responses do include token counts.
In the end, all three BPE-based syntaxes used significantly fewer tokens than s-expressions and glyph-prefix. This shouldn't be a surprise, given that the tokenizers evaluated are both BPE tokenizers. I learned that tokens can sometimes be entire words, if they appear frequently in the training data. Even though the BPE syntaxes have a longer character count, they have a smaller token count.
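A measurement like this can be sketched in a few lines of Python. The harness below is an assumption about how such a comparison could be set up, not the script actually used for Tacit. To stay runnable without third-party packages it defaults to a stand-in encoder that emits one token per UTF-8 byte (in effect, a BPE tokenizer with no merges), so its token counts are only an upper bound; real numbers require passing a real encoder.

```python
# Two of the candidate renderings quoted above.
CANDIDATES = {
    "glyph-prefix": r"= \ 0 = { @fst . 0 #1 @snd . 0 #2 } ? / 0 @fst / 0 @snd #0",
    "bpe-compact": (
        "let id = lambda x. x in let pair = {fst: id 1, snd: id 2} "
        "in if pair.fst then pair.snd else 0"
    ),
}


def byte_encode(text):
    """Stand-in encoder: one token per UTF-8 byte (BPE before any merges)."""
    return list(text.encode("utf-8"))


def measure(candidates, encode=byte_encode):
    """Return {name: (character_count, token_count)} for each candidate syntax."""
    return {name: (len(src), len(encode(src))) for name, src in candidates.items()}


if __name__ == "__main__":
    # For real numbers, pass a real encoder instead of the stand-in, e.g.:
    #   import tiktoken
    #   encode = tiktoken.get_encoding("cl100k_base").encode
    for name, (chars, tokens) in measure(CANDIDATES).items():
        print(f"{name}: {chars} chars, {tokens} tokens")
```

Keeping the encoder as a parameter means the same harness can be pointed at tiktoken or at any other function that returns a token list.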
Authoring View
The first "view" of the AST-of-Truth in Tacit is the authoring view. This is the syntax that an LLM will read and write. It follows that the major design factor is efficient use of tokens, so the authoring view uses the bpe-compact syntax from the evaluation. In the Tacit project, authoring view code can be stored in a file with the .taca extension, but this is only done for the sake of examples. Authoring view code is generated on the fly by the Tacit renderer and passed to an LLM model, or is received from an LLM model and passed into the Tacit canonicalizer.
Here is an example of Tacit authoring view code to calculate whether a number is prime:
let ibuf = @buf-alloc 16 in
let n = @read 0 ibuf 16 in
let nl = @scan-byte ibuf 0 n 10 in
let v = @parse-i64 ibuf 0 nl in
let p =
  if @lt v 2 then 0
  else if @lt v 4 then 1
  else if @eq (@mod v 2) 0 then 0
  else
    rec { trial = lambda pack.
      let nn = @div pack 50000 in
      let i = @mod pack 50000 in
      if @gt (@mul i i) nn then 1
      else if @eq (@mod nn i) 0 then 0
      else trial (@add (@mul nn 50000) (@add i 2))
    } in trial (@add (@mul v 50000) 3) in
let _ = if p then @write 1 "yes" 3 else @write 1 "no" 2 in
let _ = @write 1 "\n" 1 in
0
Canonical View
The next view is the canonical view. Canonical Tacit code is a byte-exact, deterministic, textual representation of the abstract syntax tree of a program.
When authoring view code is fed into the Tacit canonicalizer, variable names are sorted, stripped, and converted to De Bruijn indices. This ensures that variations in authoring view code that have no effect on program logic produce the same canonical code: if a variable is renamed, or its order relative to other variables is changed, the canonical output is identical. Newlines and indentation do not exist, because they serve no purpose. The canonical view code is hashed in order to uniquely identify the code and to allow de-duplication when two functions are 100% equivalent.
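The renaming part of this process can be illustrated with a short Python sketch. It uses a made-up tuple-based mini AST rather than Tacit's real data structures, covers only the conversion of names to De Bruijn indices (not the sorting step), and hashes with BLAKE2b from the standard library as a stand-in for the BLAKE3 hash that Tacit's metadata refers to.

```python
import hashlib

# Hypothetical named AST, as nested tuples:
#   ("int", n) | ("var", name) | ("lam", name, body)
#   ("app", fn, arg) | ("let", name, value, body)


def to_de_bruijn(term, scope=()):
    """Replace each variable name with the distance to its binder (0 = innermost)."""
    kind = term[0]
    if kind == "int":
        return term
    if kind == "var":
        return ("var", scope.index(term[1]))
    if kind == "lam":
        _, name, body = term
        return ("lam", to_de_bruijn(body, (name,) + scope))
    if kind == "app":
        _, fn, arg = term
        return ("app", to_de_bruijn(fn, scope), to_de_bruijn(arg, scope))
    if kind == "let":
        _, name, value, body = term
        return ("let", to_de_bruijn(value, scope), to_de_bruijn(body, (name,) + scope))
    raise ValueError(f"unknown node kind: {kind}")


def render(term):
    """Serialize a nameless term as a single-line S-expression."""
    parts = (render(p) if isinstance(p, tuple) else str(p) for p in term)
    return "(" + " ".join(parts) + ")"


def canonical_hash(term):
    """Hash the canonical text (BLAKE2b here; a stand-in for BLAKE3)."""
    text = render(to_de_bruijn(term))
    return hashlib.blake2b(text.encode("utf-8"), digest_size=32).hexdigest()


# "let id = lambda x. x in id 1" and the same program with different names.
A = ("let", "id", ("lam", "x", ("var", "x")), ("app", ("var", "id"), ("int", 1)))
B = ("let", "f", ("lam", "y", ("var", "y")), ("app", ("var", "f"), ("int", 1)))

print(render(to_de_bruijn(A)))  # (let (lam (var 0)) (app (var 0) (int 1)))
print(canonical_hash(A) == canonical_hash(B))  # True
```

Because both programs reduce to the same nameless text, they hash to the same value, which is what makes de-duplication by hash possible.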
The syntax of canonical view code optimizes for standardization, not tokens. This is the format that is stored under source control using files with the .tac extension:
(let (app (sym buf-alloc) (int 16)) (let (app (app (app (sym read) (int 0)) (var 0)) (int 16)) (let (app (app (app (app (sym scan-byte) (var 1)) (int 0)) (var 0)) (int 10)) (let (app (app (app (sym parse-i64) (var 2)) (int 0)) (var 0)) (let (if (app (app (sym lt) (var 0)) (int 2)) (int 0) (if (app (app (sym lt) (var 0)) (int 4)) (int 1) (if (app (app (sym eq) (app (app (sym mod) (var 0)) (int 2))) (int 0)) (int 0) (rec (lam (let (app (app (sym div) (var 0)) (int 50000)) (let (app (app (sym mod) (var 1)) (int 50000)) (if (app (app (sym gt) (app (app (sym mul) (var 0)) (var 0))) (var 1)) (int 1) (if (app (app (sym eq) (app (app (sym mod) (var 1)) (var 0))) (int 0)) (int 0) (app (var 3) (app (app (sym add) (app (app (sym mul) (var 1)) (int 50000))) (app (app (sym add) (var 0)) (int 2))))))))) (app (var 0) (app (app (sym add) (app (app (sym mul) (var 1)) (int 50000))) (int 3))))))) (let (if (var 0) (app (app (app (sym write) (int 1)) (str "yes")) (int 3)) (app (app (app (sym write) (int 1)) (str "no")) (int 2))) (let (app (app (app (sym write) (int 1)) (str "\n")) (int 1)) (int 0))))))))
Metadata View
The last important view is the metadata view. This format is simply a JSON file that tracks all the information that was stripped away by the canonicalizer. It stores variable names, comments, effects, and function hashes in a file with the .tacd extension, alongside its matching canonical view.
{
  "tacd_version": "1",
  "targets_hash_blake3": "c698e455d69d6fb9240d1d2c5e85bd83dfb55140e47b4d3162ac93fcf8b58cdf",
  "display": {
    "binder": "ibuf",
    "type_hint": "Int",
    "effect_hint": [
      "Alloc",
      "IO",
      "Mut"
    ]
  }
}
In order to generate authoring view code, the canonical view and the metadata view are fed into the Tacit renderer.
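As a rough illustration of what a renderer has to do, the Python sketch below re-attaches names to a nameless term. The tuple-based AST and the flat list of binder names are assumptions made for this example; the real .tacd layout and the real renderer are more involved.

```python
# Hypothetical nameless AST, as nested tuples:
#   ("int", n) | ("var", index) | ("lam", body)
#   ("app", fn, arg) | ("let", value, body)


def render_authoring(term, binder_names):
    """Rebuild named source text; binder_names lists names in binder order."""
    names = iter(binder_names)

    def wrap(text, sub, kinds):
        # Parenthesize sub-terms that would otherwise be ambiguous.
        return f"({text})" if sub[0] in kinds else text

    def go(node, scope):
        kind = node[0]
        if kind == "int":
            return str(node[1])
        if kind == "var":
            return scope[node[1]]  # index 0 is the innermost binder
        if kind == "lam":
            name = next(names)
            return f"lambda {name}. {go(node[1], (name,) + scope)}"
        if kind == "app":
            fn = wrap(go(node[1], scope), node[1], ("lam", "let"))
            arg = wrap(go(node[2], scope), node[2], ("app", "lam", "let"))
            return f"{fn} {arg}"
        if kind == "let":
            name = next(names)
            value = go(node[1], scope)
            body = go(node[2], (name,) + scope)
            return f"let {name} = {value} in {body}"
        raise ValueError(f"unknown node kind: {kind}")

    return go(term, ())


CANONICAL = ("let", ("lam", ("var", 0)), ("app", ("var", 0), ("int", 1)))
print(render_authoring(CANONICAL, ["id", "x"]))  # let id = lambda x. x in id 1
```

The same canonical term rendered with a different name list gives differently named but logically identical authoring code, which is why the names can live in a separate metadata file.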
Language Implementation
The Tacit compiler is written in Rust. I admit that I've never programmed in Rust and that the compiler is 100% vibe-coded. I've put a lot of trust into the Opus 4.7 and GPT-5.5 frontier models to write correct code, based on the design and ADR documents. Combined with tests, the compiler and toolchain appear to work well.
The compiler uses the inkwell Rust library to generate LLVM intermediate representation (IR). The IR can then be compiled to a native binary for any CPU architecture supported by LLVM, using the decades of optimization that LLVM provides. This allows Tacit to be cross-platform from the beginning.
The compiler supports the concept of "hole nodes" in an AST to represent invalid syntax, which allows the compiler to emit detailed errors and continue parsing the rest of the AST.
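The general idea behind hole nodes can be shown with a toy checker in Python. This is a sketch of the technique under my own assumptions (a three-form mini language and a made-up `("hole", message)` representation), not the Tacit compiler's design: an invalid form is replaced by a hole that carries its error message, and checking continues with the sibling forms.

```python
import re

ARITY = {"int": 1, "var": 1, "app": 2}  # hypothetical subset of node kinds


def read(tokens):
    """Read one S-expression from a token list (consumed from the front)."""
    token = tokens.pop(0)
    if token != "(":
        return token
    items = []
    while tokens and tokens[0] != ")":
        items.append(read(tokens))
    if tokens:
        tokens.pop(0)  # discard the closing ")"
    return items


def hole(message, errors):
    """Record an error and return a hole node standing in for the bad form."""
    errors.append(message)
    return ("hole", message)


def check(node, errors):
    """Validate a form; invalid forms become holes and checking continues."""
    if not (isinstance(node, list) and node and isinstance(node[0], str)
            and node[0] in ARITY):
        return hole(f"unknown form: {node!r}", errors)
    head, args = node[0], node[1:]
    if len(args) != ARITY[head]:
        return hole(f"{head} expects {ARITY[head]} argument(s), got {len(args)}", errors)
    if head in ("int", "var"):
        if not (isinstance(args[0], str) and args[0].isdigit()):
            return hole(f"{head} expects a number, got {args[0]!r}", errors)
        return (head, int(args[0]))
    return (head,) + tuple(check(arg, errors) for arg in args)


def parse(source):
    """Return (tree, errors); the tree may contain hole nodes."""
    tokens = re.findall(r"[()]|[^\s()]+", source)
    errors = []
    return check(read(tokens), errors), errors


TREE, ERRORS = parse("(app (int one) (app (var 0) (int 2 3)))")
print(len(ERRORS))  # 2
```

Both mistakes are reported in a single pass, and the valid `(var 0)` between them is still parsed normally.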
Tacit does not support a foreign function interface (FFI) to interoperate with other programming languages, nor does it support use of third-party libraries. To do so would invalidate the effect-tracking model that Tacit is built upon. If Tacit were to make function calls into unknown code, it would lose the ability to make guarantees about what those functions do and the types of undefined behaviour that they may cause.
In order to interact with third-party libraries, Tacit supports a well-defined application binary interface (ABI) that allows two-way communication between a Tacit library and a host program written in C or Rust. A host program can then load both third-party libraries and Tacit code, and act as an intermediary that passes data between the two. In this architecture, code that could have undefined behaviour stays isolated from Tacit and allows Tacit to remain internally consistent.
Teaching the LLM
Tacit is a brand-new experimental programming language, so LLMs do not know how to write Tacit code. During development of Tacit, frontier models with full repository context can write Tacit code, because they ingest enough of the design and examples to learn it. However, for a developer who wants to write Tacit code using only the compiler and toolchain, a method of teaching an LLM how to write Tacit code is required.
Claude came up with the concept of a Tacit "primer", which is a block of text that describes how to write Tacit code. It contains explanations, examples of correct code, examples of incorrect code to avoid, coding patterns, and compiler error details. When the primer is sent to an LLM model alongside the prompt for what the Tacit code should do, the model can successfully write valid Tacit code in the authoring view. The trade-off is that sending the primer costs additional tokens and context.
Primer Evaluation
In order to author and refine a primer, an evaluation harness was required. I used the Opus 4.7 model with full repository context to author 47 simple programming tasks in Tacit, as well as equivalent solutions in Rust and Python. This corpus of tasks acts as a baseline against which primer-authored Tacit code could be compared and evaluated.
The evaluation harness is configured with API keys for Anthropic and OpenRouter so that multiple different mid-tier models could be evaluated. During an evaluation run, the harness sends the current primer and a pre-defined prompt for each of the 47 programming tasks to the model under evaluation. The models evaluated were Sonnet 4.6, Haiku 4.5, GPT-5.4 and GPT-5.4-mini.
Initial attempts compared token counts of generated Tacit to Python, but I later realized that this was an unfair comparison. Python offers a lot of options for shorthand or syntactic sugar to accomplish complex operations with very concise syntax (for example, array slicing). Comparison to Rust is much more fair, because Tacit tries to make similar guarantees (for example, around memory ownership) and has approximately the same level of code verbosity as Rust.
One-Shot Results
The first round of evaluation used a very primitive version of Tacit that had no standard library. In order to achieve success, LLM models would implement primitives and algorithms, and unroll recursive solutions into enumerated cases up to some maximum. The primer went through multiple revisions to be made more precise and successful at teaching the LLM how to write correct Tacit code on its first try.
The token count of the primer started at approximately 10,000 tokens with a success rate of 6% of implemented tasks showing the correct output. Over successive evaluations, the primer eventually grew to ~13,000 tokens with a success rate of 61%. Further attempts to improve the primer actually hit a plateau and then started to regress. A primer of ~16,000 tokens had a success rate of 51%.
At this point, it made sense to plan out a standard library and start implementing the typical functions that most other programming languages provide. This put Tacit on a more equal footing and made tasks easier to achieve. Functions implemented included buffer-backed vectors, text indexing, counting helpers, stream helpers (stdin/stdout), UTF-8 encoding, and so on.
Once the standard library was implemented, the primer needed to be updated to teach the model how to use the new functions. The trade-off only pays off if the tokens saved in the generated Tacit code exceed the tokens added to the primer. This turned out to be the case until further primer updates once again hit a plateau.
Repair Loop
To address the success rate plateau, the evaluation harness was updated to allow a configurable number of turns for a repair loop. The model was given up to two additional turns to correct any errors, and was evaluated on the final success rate and the average number of turns required for success. The harness would feed specific compiler errors into the subsequent prompts so that the LLM could target its fixes.
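The shape of such a repair loop can be sketched in Python as follows. The function and parameter names are hypothetical, and the model and compiler are passed in as plain callables so that the sketch stays self-contained; the actual harness talks to the Anthropic and OpenRouter APIs and to the Tacit toolchain.

```python
from typing import Callable, Optional, Tuple


def run_with_repair(
    task_prompt: str,
    primer: str,
    generate: Callable[[str], str],
    check: Callable[[str], Optional[str]],
    max_repair_turns: int = 2,
) -> Tuple[bool, int]:
    """Return (succeeded, turns_used).

    generate: sends a prompt to the model and returns its code.
    check: compiles and runs the code; returns None on success,
           otherwise the error text to feed back to the model.
    """
    prompt = f"{primer}\n\n{task_prompt}"
    total_turns = 1 + max_repair_turns  # one initial attempt plus repairs
    for turn in range(1, total_turns + 1):
        code = generate(prompt)
        error = check(code)
        if error is None:
            return True, turn
        # Feed the specific error back so the next attempt can target it.
        prompt = (
            f"{prompt}\n\nPrevious attempt:\n{code}"
            f"\n\nCompiler error:\n{error}\n\nPlease fix the code."
        )
    return False, total_turns


if __name__ == "__main__":
    # Stub model that fails once, then succeeds (for demonstration only).
    replies = iter(["let x = in 0", "let x = 1 in 0"])
    result = run_with_repair(
        "Bind x to 1 and return 0.",
        "(primer text)",
        generate=lambda prompt: next(replies),
        check=lambda code: None if code == "let x = 1 in 0" else "parse error",
    )
    print(result)  # (True, 2)
```

Returning the number of turns alongside the success flag is what allows both the final success rate and the average turns to success to be reported.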
For Sonnet and GPT-5.4, the success rate for a two-turn repair loop was 98% and 92% respectively. For Haiku and GPT-5.4-mini, the success rate was 47% and 45% respectively.
Expanded Standard Library
A further improvement in token efficiency was attempted by expanding the standard library to include dataclasses, closures, and higher-order combinators like map, fold and for-each. The results improved marginally, but it became clear that further additions to the standard library (and therefore the primer as well) were not sufficiently reducing the overall token cost of the Tacit output.
Even with the primer token cost excluded, the token count of the generated Tacit code was still approximately 2-3 times that of the equivalent Rust code.
Current State
At this point in development of Tacit, the evaluation harness proved that a mid-tier model such as Sonnet can be successfully taught how to write Tacit code using the primer method. A success rate of 100% or near 100% is possible within two repair turns, depending on task complexity.
In addition to the standard library that was implemented for token efficiency gains, the following language features have been implemented:
Units (i.e. modules/packages)
Unit manifest, dependencies and dependency lockfile
Unit testing
Fixed-width integers
Bit manipulation
Standard library written in Tacit (rather than being implemented inside the compiler initially)
Host program ABI
Token Efficiency Conclusion
Excluding the primer token count, generated Tacit code is still ~2x the token count of equivalent Rust code. The primer itself is now over 26,000 tokens. In order to eliminate the primer, viable short-term options would be to fine-tune an existing model or implement a Mixture of Experts (MoE) plugin. Fine-tuning is impractical at this point, because thousands of examples of valid Tacit code in many different varieties would be required. I did not investigate the feasibility of the MoE option, but I assume it would also require a large body of Tacit code.
The evaluation results have led me to conclude that one cannot write a new programming language that is more token-efficient than existing programming languages, due to model training barriers.
If Tacit were to become widely popular and achieve a large body of open source code, and if frontier LLM model developers decided to train models on Tacit, only then could Tacit have a reasonable chance of matching or exceeding the token efficiency of existing languages.
If a frontier model company were to design a new LLM-first programming language, and a new tokenizer, and train an LLM model on that new language (all in tandem), then real token efficiency gains could be found. I predict that this will eventually happen at some point in the future.
Tacit Advantages
Despite failing the token efficiency goal, Tacit still has real advantages for LLMs. Evaluation results so far have shown that a mid-tier model can be successfully taught how to write accurate Tacit code using a primer, and a simple programming task can be achieved with near 100% success within two repair turns.
Tacit may still provide token efficiency gains by requiring fewer turns to achieve success, and eliminating the need for extra turns to run a formatting tool. The primer cost is paid once per session, and its cost can be amortized over the entire session. If the LLM model being used has a large context window, executing as many development tasks as possible within that same window that contains the primer will lower the overall token cost per task.
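The amortization argument is simple arithmetic, sketched here in Python. The 26,000-token primer size comes from the measurements above; the 2,000 tokens per task is a made-up figure for illustration, and the model assumes, as the paragraph does, that the primer is paid for once per session.

```python
def tokens_per_task(primer_tokens, task_tokens, tasks_in_session):
    """Average token cost per task when the primer is sent once per session."""
    return primer_tokens / tasks_in_session + task_tokens


PRIMER_TOKENS = 26_000  # primer size reported above
TASK_TOKENS = 2_000     # hypothetical cost of one task, for illustration only

for tasks in (1, 10, 50):
    print(tasks, tokens_per_task(PRIMER_TOKENS, TASK_TOKENS, tasks))
# 1 28000.0
# 10 4600.0
# 50 2520.0
```

Under these assumptions the primer dominates the cost of a single-task session, but contributes only about a fifth of the per-task cost once fifty tasks share it.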
Tacit takes advantage of using multiple views of a program AST, depending on the task. When authoring new code, the token-efficient authoring view is used. When debugging difficult problems, a future inspection view will be used. Further toolchain improvements will allow the LLM model to selectively choose how much information it requires in the inspection view or workflow tooling.
LLMs are good at understanding memory ownership and borrowing semantics, especially when the primitives to track those values are guaranteed up front by the toolchain, rather than having to be inferred and re-inferred by the model. The same applies to failure paths and edge cases. With effect tracking being enforced by the toolchain, Tacit code requires the LLM to be explicit and robust.
Future Improvements
I have a list of ideas for improvements to the language in the future. It was difficult to decide when to publish an initial version of Tacit because I wanted to make it as feature-complete as possible. If I waited to implement everything, the language wouldn't be published for a long time. The list below describes potential future improvements:
End-to-end workflow primer (how to use future tools like diff/merge/debugger)
Expanded debugging capabilities
Optimization and hardening
Capability system (similar to the effect tracking system, but for code permissions, think seccomp)
Provable code
Self-hosting (Tacit compiler written in Tacit)
Three-way merge for multi-agent collaborative development
Two-way transpilation to/from C/Rust/Ghidra P-Code
This would allow LLMs to reason about existing code more easily, especially when looking for bugs
Fun Facts
Tacit was designed and implemented with Opus 4.7 and GPT-5.5
I used a Claude Code Pro ($20/month) and a ChatGPT Plus ($25/month) subscription
The models did the heavy lifting, but required human guidance throughout
This project took 29 days (April 19, 2026 to May 17, 2026) in my free time. I was hampered by session limits most of the time, so someone with a higher-tier subscription probably could have done it faster.
Conclusion
In conclusion, I consider this experiment a success. I learned about programming language design, I published a new programming language that could be used as the basis for further experimentation, and I had fun.
Don't be afraid to try something ambitious if you have enough knowledge to get started and you're willing to learn!
Links
A Nintendo GameBoy emulator named tacboy to serve as a standalone example of a project written in Tacit