Agent | Democratizing Data

Embedding workflow templates in skills: shifting the LLM's role from "generation" to "rendering"

Sat, 28 Mar 2026 18:23:00 -0700

The dream of LLM-powered ML workflow generation

At my company, an ML feature provides capabilities like RFM analysis, recommendation, and contextual bandits. To run ML predictions at scale, the system calls an ML API that spins up parallel workers on AWS Batch behind the scenes. To make this parallelization work, input tables are aggregated per profile, a deliberate trade-off for scalability. These processes are orchestrated through digdag workflows (.dig files, executed on Treasure Workflow, a hosted digdag service) that contain SQL (Hive or Trino) and invoke the ML API via digdag’s http> operator.

Originally, the pre-processing and post-processing workflows were built by MLEs on a paid Professional Services team using their own templates, then deployed to customers who purchased PS engagements. But there was a desire to scale this beyond PS customers, and LLM-based workflow and SQL generation was seen as the path forward. Despite models getting better every day, generating stable workflows and SQL with LLMs proved difficult. For example, TD-specific UDFs in SQL don’t come naturally to the models. After several attempts, we had given up.

The rise of Claude Code and agent-friendly CLIs

Then our CEO called for a push toward becoming an AI-native organization, and Claude Code was rolled out company-wide. Adoption spread beyond software engineers to PdMs, Solution Architects, and even sales. Two developments were particularly impactful: tdx, a unified agent-friendly CLI that can call TD’s various microservice APIs, and a desktop application built on top of tdx. With Claude Code and tdx working together, agents could create marketing journeys, analyze tables in TD, and visualize results. The CEO personally created onboarding challenge tasks for employees to accelerate adoption, and as a result, automation has been spreading across both internal and customer environments.

Out of this momentum, a Skills marketplace was born, covering internal use as well. These skills improve reproducibility and accelerate automation of complex tasks.

What the agent-friendly CLI made possible

The biggest contribution of tdx was exposing Treasure Workflow’s endpoints through a CLI, which let Claude Code autonomously create workflows, push them, run them, inspect the results, and iterate. Workflows can take anywhere from a few minutes to over an hour to execute, which has always made automated testing painful.

Thanks to Claude Code + tdx, it became possible to generate a workflow, queue it for execution in the background, and verify the results. This was a game changer.
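
The generate, queue, and verify cycle can be sketched as a small polling loop. This is a minimal sketch, not how tdx works internally: submit, get_status, and verify are hypothetical callables standing in for whatever CLI calls the agent makes, and the status strings are assumptions.

```python
import time
from typing import Callable

# Assumed terminal states for this sketch; not documented tdx output.
TERMINAL_STATES = {"success", "error"}


def run_and_verify(
    submit: Callable[[], str],
    get_status: Callable[[str], str],
    verify: Callable[[str], bool],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 2 * 60 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Submit a workflow, poll until it finishes, then check its results."""
    attempt_id = submit()  # hypothetical: push and start the workflow
    waited = 0.0
    while True:
        status = get_status(attempt_id)
        if status in TERMINAL_STATES:
            break
        if waited >= timeout_seconds:
            raise TimeoutError(f"attempt {attempt_id} is still {status}")
        sleep(poll_seconds)
        waited += poll_seconds
    # Only inspect output tables when the run itself succeeded.
    return status == "success" and verify(attempt_id)
```

Passing sleep in as a parameter keeps the loop testable without waiting out a real hour-long run.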

An agent-friendly CLI that handles API interactions end-to-end is no longer a nice-to-have. It’s essential.

Turning ML workflow templates into skills

That said, as I mentioned at the top, no matter how smart Claude Code’s models get, it’s still hard for an LLM to generate workflows that are consistently reliable for anyone to run, especially for customers with no prior knowledge.

So I reframed the problem. If generating workflows and SQL from scratch is too hard, why not create templates that generate workflows and SQL from configuration parameters? By embedding templates in a skill, the LLM’s responsibility narrows from “generate workflows and SQL from scratch” to “choose the right parameters for the data and the problem.” The workflows themselves become deterministic since they’re pre-built as templates. Deterministic logic should live in scripts, not in the LLM’s probabilistic output.
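
To make the narrowed responsibility concrete, here is a minimal sketch of a deterministic check applied to the values an LLM picks before any template is rendered. The keys and allowed values are hypothetical; the real skill’s parameter space is not listed in this post.

```python
from dataclasses import dataclass

# Hypothetical parameter space for illustration only.
ALLOWED = {
    "engine": {"hive", "trino"},
    "algorithm": {"rfm", "recommendation", "contextual_bandit"},
}
REQUIRED_KEYS = {"engine", "algorithm", "input_table"}


@dataclass(frozen=True)
class WorkflowConfig:
    engine: str
    algorithm: str
    input_table: str


def validate_config(raw: dict) -> WorkflowConfig:
    """Turn LLM-chosen values into a config, rejecting anything unsupported."""
    keys = set(raw)
    if keys != REQUIRED_KEYS:
        missing = sorted(REQUIRED_KEYS - keys)
        unknown = sorted(keys - REQUIRED_KEYS)
        raise ValueError(f"missing keys: {missing}, unknown keys: {unknown}")
    for key, allowed in ALLOWED.items():
        if raw[key] not in allowed:
            raise ValueError(f"{key}={raw[key]!r} is not one of {sorted(allowed)}")
    return WorkflowConfig(**raw)
```

The LLM only proposes the dictionary; everything after that point is ordinary code that either accepts or rejects it.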

This idea was inspired by how cdp-api dynamically generates digdag workflows from database values.

(Japanese, see slide 29 for the code example)

Jinja2 templates for digdag workflows

Concretely, I templated the .dig files with Jinja2 and had the LLM focus on determining configuration values, with config.yml as the single source of truth for all modifiable parameters. Claude renders the templates directly from config.yml without any external tooling. Seeing the .dig.j2 extension for the first time gave me a small thrill.

Internally, I use two kinds of variables: render-time parameters with {{ }} and digdag runtime parameters with ${ }. The former handles things like branching on whether the SQL engine is Hive or Trino, or when algorithms and hyperparameter candidates can be fixed ahead of time. The latter is for cases like storing hyperparameter tuning results in a table, then dynamically assigning those values to a training task at runtime via SQL.
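
The two kinds of variables can be seen side by side in a small sketch. In the skill Claude renders the templates itself; this example uses the jinja2 library only to show that {{ }} is resolved at render time while ${ } passes through untouched for digdag. The template, file names, URL, and parameter names are all hypothetical.

```python
from jinja2 import Environment, StrictUndefined

# Hypothetical .dig.j2 content, not the real template from the skill.
TEMPLATE = """\
+prepare:
{% if engine == "trino" %}
  td>: queries/prepare_trino.sql
{% else %}
  td>: queries/prepare_hive.sql
  engine: hive
{% endif %}

+train:
  http>: {{ ml_api_url }}
  method: POST
  content:
    algorithm: {{ algorithm }}
    learning_rate: ${td.last_results.learning_rate}
"""


def render_dig(template_text: str, config: dict) -> str:
    """Resolve render-time {{ }} values; leave digdag's ${ } for runtime."""
    env = Environment(
        undefined=StrictUndefined,  # fail loudly on a missing config key
        trim_blocks=True,           # drop the newline after {% ... %} tags
        keep_trailing_newline=True,
    )
    return env.from_string(template_text).render(**config)
```

Because ${ is not a Jinja2 delimiter, no escaping is needed for the runtime variables, which keeps the template readable as a .dig file.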

OpenAPI as the contract with the agent

One tricky part of templating was that the ML API accepts complex parameters, and somehow the agent needs to understand all of them. Fortunately, our project managed ML endpoint parameters with OpenAPI, so we could hand the full spec to the agent.

Our project generates model.py from the OpenAPI spec for parameter validation at runtime. Giving the machine-readable openapi.yml to the agent and having it translated into a markdown document within the skill turned out to work great. Long live standard formats.
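
In the post the agent did this translation itself. As a rough illustration of what the translation amounts to, the sketch below turns one already-parsed schema object into a markdown parameter table. The schema fragment and its field names are hypothetical, not the real ML API.

```python
def schema_to_markdown(title: str, schema: dict) -> str:
    """Render one OpenAPI schema object as a markdown parameter table."""
    required = set(schema.get("required", []))
    lines = [
        f"## {title}",
        "",
        "| Parameter | Type | Required | Allowed values | Description |",
        "|---|---|---|---|---|",
    ]
    for name, spec in schema.get("properties", {}).items():
        allowed = ", ".join(str(v) for v in spec.get("enum", [])) or "-"
        lines.append(
            f"| {name} | {spec.get('type', '-')} | "
            f"{'yes' if name in required else 'no'} | {allowed} | "
            f"{spec.get('description', '-')} |"
        )
    return "\n".join(lines)


# Hypothetical fragment, shaped like an entry under components.schemas.
TRAIN_REQUEST = {
    "type": "object",
    "required": ["algorithm"],
    "properties": {
        "algorithm": {
            "type": "string",
            "enum": ["linear", "tree"],
            "description": "Model family",
        },
        "learning_rate": {"type": "number", "description": "Step size"},
    },
}
```

Calling schema_to_markdown("TrainRequest", TRAIN_REQUEST) yields one table row per property, which is the kind of compact reference a skill document can carry.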

Skill-creator agent vs. skill-user agent

While building the skills, I asked Claude how best to test them. Its suggestion: spin up a separate agent process to exercise the skills through trial and error. I tried it, and it was an excellent experience.

When you’re using the skill-user agent, you’re not reading the OpenAPI spec or skill documentation yourself. Instead, you start thinking in terms of what you want to try: “I want to run this algorithm with this parameter combination.” Normally, when doing manual sanity checks, the spec is already in your head, and you tend to skip the tedious, complex parameter combinations. But when an agent can do it for you, you get greedy.

The skill-user agent came back and told me: “I looked at the skill, but that parameter combination isn’t supported in the OpenAPI spec yet.” I had assumed our QA end-to-end tests would have caught this, but it was a close call. Because the OpenAPI spec was maintained manually, the Python code internally supported the combination, but the spec was missing the parameter, so requests couldn’t pass through.
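
This kind of drift between hand-maintained spec and code can also be checked mechanically. The sketch below is one possible approach, not what our project does: it compares the parameters a Python handler accepts with the properties a schema declares. The train function and SPEC_SCHEMA are hypothetical.

```python
import inspect
from typing import Callable


def train(algorithm: str, learning_rate: float = 0.1, exploration: float = 0.05) -> None:
    """Hypothetical handler: the code already supports `exploration`."""


# Hypothetical hand-maintained schema that forgot `exploration`.
SPEC_SCHEMA = {
    "type": "object",
    "properties": {
        "algorithm": {"type": "string"},
        "learning_rate": {"type": "number"},
    },
}


def find_spec_drift(handler: Callable, schema: dict) -> dict:
    """Report parameters present on only one side of the spec/code boundary."""
    code_params = set(inspect.signature(handler).parameters)
    spec_params = set(schema.get("properties", {}))
    return {
        "missing_from_spec": sorted(code_params - spec_params),
        "missing_from_code": sorted(spec_params - code_params),
    }
```

Here find_spec_drift(train, SPEC_SCHEMA) reports exploration as missing from the spec, the same shape of gap the skill-user agent ran into.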

I fixed the bug quickly, deployed to the development environment, and updated the skill. Claude then picked up the new parameter combination and used it as if it had always been there. Impressive.

Building the skill alongside the product taught me that having a live execution environment pays for itself many times over.

Wrapping up

By sharing these skills on the internal marketplace, the workflow creation step that used to require paid PS engagements was simplified, and even customers without PS contracts could benefit.

Treasure Studio also helped here: ML prediction results can now be visualized directly, making it easy to run analysis and model improvement cycles. Turning those analysis patterns into skills too seems like a natural next step, but that’s out of scope for this post.

When I was drafting this post, I bounced ideas off Claude, and it argued that “the LLM’s strengths are understanding problem structure and parameter inference.” Providing an end-to-end CLI to support that lets agents run the generate-execute-fix loop autonomously. And by templating the complex parts of that loop, you create a clean division of labor: domain experts build the templates, and agents (used by people without that domain knowledge) fill in the parameters.

Embedding workflow templates in a skill: how I changed the LLM’s role from “generation” to “rendering”

Sat, 28 Mar 2026 18:03:00 -0700

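The dream of automatically generating ML workflows with LLMs

An ML feature at my company handles processing such as RFM (which is not really ML), recommendation, and Contextual Bandit. To run ML prediction in particular in a scalable way, calling the ML API launches AWS Batch behind the scenes, which starts workers in parallel. To make this parallel execution possible, we accept a trade-off: input tables are aggregated per profile (an end user targeted by a marketing campaign), which secures scalability. This processing is managed as digdag workflows (.dig files, actually executed on a hosted digdag called Treasure Workflow) together with the SQL (Hive or Trino) written inside them, and the workflows call the ML API through digdag’s http> operator.

In the original setup when the project started, these pre-processing and post-processing workflows were created by MLEs in Professional Services, a paid post-sales engineering team, based on their own templates, and were rolled out to customers who had purchased PS. There was a desire to go beyond this arrangement and reach many more customers, so expectations were placed on LLM-based workflow and SQL generation. But even with models improving daily, generating workflows and SQL reliably with an LLM was hard (for example, TD-specific UDFs do not come to the model readily), and after several attempts we had given up.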

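The evolution of Claude Code and a CLI for agents

Meanwhile, under the CEO’s call to become an AI Native company, Claude Code was rolled out across the whole company. Its use expanded beyond software engineers to PdMs, Solution Architects, and even sales. Two efforts contributed in particular: tdx, a unified CLI for agents that can call the APIs of TD’s various microservices, and a desktop application that bundles tdx. With Claude Code and tdx working together, agents became able to create marketing journeys and visualize the results of analyzing tables in TD. The CEO personally created onboarding challenge tasks for employees to accelerate usage, and as a result various kinds of automation are progressing both inside the company and in customer environments.

Out of this, a Skills marketplace was born, covering internal use as well. Using these skills improves the reproducibility of work and accelerates the automation of complex processing.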

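What the CLI for agents achieved for workflow generation

The biggest achievement of tdx is that it made Treasure Workflow’s endpoints callable from a CLI, so Claude Code can autonomously run the cycle of creating a workflow, pushing it, running it, looking at the results, and improving it. Workflows take at least a few minutes to run, and some take more than an hour.

Thanks to Claude Code + tdx, a cycle became feasible in which a generated workflow is queued in the background for execution and its results are then verified. This is revolutionary.

A CLI for agents that covers everything end to end, including the interaction with APIs, has surely become indispensable.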

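Turning ML workflow templates into a skill

That said, as I wrote at the beginning, however smart Claude Code’s models have become, it is difficult to have an LLM generate workflows that are consistently satisfactory no matter who runs them, including customers with no prior knowledge.

So I changed how I looked at the problem. If generating workflows and SQL from scratch is hard, I can build templates that generate the workflows and SQL from the parameters in a configuration file. In other words, by embedding the templates in a skill, the LLM’s scope of responsibility narrows from “generating workflows and SQL from scratch” to “choosing parameters suited to the data and the problem.” The workflows we want are templated in advance, so that part is treated as deterministic processing. It is more or less an application of the idea that deterministic logic belongs in scripts rather than in the LLM’s probabilistic output.

This idea was inspired by how cdp-api dynamically generates digdag workflows based on values in a DB.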

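Jinja2 templates for digdag workflows

What I actually did in the skill was to template the dig files with Jinja2, place every modifiable parameter in config.yml as the single source of truth, and have the LLM focus on deciding the configuration values. Seeing the .dig.j2 extension, which you rarely come across, made my heart beat a little faster.

Internally, I distinguish between parameters decided at render time, written as {{ }}, and runtime parameters that can be written as digdag variables, written as ${ }. The former are used, for example, to branch on whether the SQL engine is Hive or Trino, or when the algorithm and the hyperparameter candidates can be decided in advance. The latter are used, for example, when hyperparameter tuning results are stored in a table and then assigned dynamically to a training task once SQL has fetched them.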

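OpenAPI = the contract with the agent

The tricky part of templating is that the parameters the ML API accepts are complex, and the question is how to teach them to the agent. Fortunately, in our project the parameters of the ML endpoints were managed with OpenAPI, so we could pass them comprehensively through that spec.

Originally, when running the ML solution, we validated parameters with model.py generated from the OpenAPI spec. By handing the machine-readable openapi.yml to the agent, we had it translated into markdown and built into the skill. Hooray for standard formats.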

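Skill-creating agent vs. skill-using agent

While building the skill, I asked Claude how best to test the skill I had made, and it told me that launching an agent in a separate process and going through trial and error would be good, so I put that into practice. It was a very good experience.

Specifically, when you use the skill-using agent, you do not read the OpenAPI spec or the skill documentation yourself, so desires like “I want to run this algorithm with this combination of parameters” come up. Normally, in a manual sanity check, the spec is in your head and you tend to skip tedious, complex parameter combinations, but once you can leave it to an agent, you get greedy because it is easy.

However, the answer that came back from the skill-using agent was: “I looked at the skill, but that combination is not yet implemented as far as OpenAPI is concerned.” I had assumed the QA end-to-end tests covered this, so it was a close call. In fact, because the OpenAPI spec was maintained by hand, the combination was implemented inside the Python code, but a parameter was missing from what the request would accept.

I fixed this bug promptly, deployed to the development environment, and updated the skill, after which Claude handled the new parameter combination as if it had always existed. As expected of it.

This was the moment I learned that, when you build the skill while developing the product, having an execution environment ready pays off considerably.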

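Summary

By sharing the resulting skills on the internal skills marketplace, we simplified the workflow creation step that paid PS used to handle, and made it available even to customers who have not purchased paid PS.

In addition, thanks to Treasure Studio, the results predicted by ML can now be visualized, which makes it easy to cycle through analysis and model improvement. Turning these analysis patterns into skills as well seems worthwhile, but I left that out of scope this time.

While writing this article I was bouncing ideas off Claude, and Claude argues that “the LLM’s strengths are understanding the structure of a problem and inferring parameters.” Providing an end-to-end CLI to support that lets the generate → execute → fix loop run autonomously. And by templating the complex tasks in that loop, I learned that the roles can be divided between humans with domain knowledge and agents (used by people without that knowledge).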