Creating content at scale with a full-time job is a logistics problem. I want to publish across 10+ platforms — long-form articles, short-form video scripts, LinkedIn posts, newsletter editions — on a consistent schedule. The bottleneck is not ideas; it is the mechanical work of transforming an idea into formatted, platform-appropriate content without spending 5-10 hours per week on tasks a machine should do. So I built a system. Here is what I built, what works, and what failed in ways I did not anticipate.

The Architecture: 3 Layers

The system has three layers that mirror how I think about reliable AI workflows generally. The first layer is directives: plain-language instruction documents that define what each workflow should do, what inputs it takes, what outputs it produces, and how to handle edge cases. Think of these as the SOPs a good employee would follow. The second layer is orchestration: an AI agent that reads the directives, makes decisions about which tools to call in which order, and handles errors. The third layer is execution: deterministic Python scripts that do the actual work — calling APIs, formatting outputs, writing files, posting to platforms. The AI handles the uncertain decisions; the scripts handle the certain computations.
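A minimal sketch of how the three layers fit together. Everything here is hypothetical and heavily simplified: the directive text, step names, and truncation logic are placeholders, and in the real system the planning step is an LLM call rather than string parsing.

```python
# Layer 1: a directive is just a plain-text SOP the agent reads.
DIRECTIVE = """
Task: repurpose a long-form post.
Steps: derive_linkedin, derive_thread
Edge case: if the draft is under 500 words, skip the thread.
"""

# Layer 3: deterministic execution scripts (placeholder logic).
def derive_linkedin(draft: str) -> str:
    return draft[:300]          # stand-in for real formatting work

def derive_thread(draft: str) -> str:
    return draft[:150]

STEPS = {"derive_linkedin": derive_linkedin, "derive_thread": derive_thread}

# Layer 2: orchestration. In the real system an LLM reads the directive
# and decides which steps to run; a string-parsing stub stands in here.
def plan_steps(directive: str, draft: str) -> list[str]:
    planned = [s.strip() for s in
               directive.split("Steps:")[1].splitlines()[0].split(",")]
    if len(draft.split()) < 500:          # the directive's edge case
        planned = [s for s in planned if s != "derive_thread"]
    return planned

def run(draft: str) -> dict[str, str]:
    return {name: STEPS[name](draft) for name in plan_steps(DIRECTIVE, draft)}
```

The point of the shape is that layer 2 only decides *which* deterministic steps run; it never does the formatting itself.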

Why this separation? Because LLMs are probabilistic. They make different decisions on different runs, hallucinate formatting details, and compound errors across multi-step tasks. If I let an LLM do everything end-to-end — ideate the topic, write the post, format it for LinkedIn, post it via API — I get a 60-70% success rate at best. By pushing the deterministic work into Python scripts (API calls, text formatting, file operations) and reserving the LLM for the genuinely uncertain decisions (what angle to take on this topic, how to adapt this paragraph for a different audience), the success rate improves dramatically.
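The compounding argument is easy to make concrete. A back-of-envelope sketch, with illustrative probabilities rather than measured ones:

```python
# If each of n pipeline steps succeeds independently with probability p,
# end-to-end reliability is p ** n. Numbers below are illustrative.
def pipeline_success(p_step: float, n_steps: int) -> float:
    return p_step ** n_steps

# Ten LLM-driven steps at 95% each compound to roughly 60%...
all_llm = pipeline_success(0.95, 10)   # ~0.60
# ...while two LLM steps plus eight deterministic ones (p = 1.0) stay ~90%.
hybrid = pipeline_success(0.95, 2)     # ~0.90
```

That is the whole case for the separation: deterministic steps contribute a factor of 1.0 each, so narrowing the LLM's scope is the only lever that matters.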

What Actually Works

The repurposing pipeline is the highest-value part of the system. I write one long-form post (like this one) and the pipeline automatically derives: a 1,200-word version for Substack, a 300-word version for LinkedIn, a 150-word thread for X, and a video script version structured for a talking-head recording. The tone adaptation — making the LinkedIn version more conversational, the Substack version more analytical — was surprisingly good out of the box with explicit instruction in the directive. I expected to need heavy manual editing. In practice, I edit about 20% of the output.
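The fan-out itself is deterministic plumbing. A rough sketch, with the caveat that `adapt` is a stand-in for the actual LLM call (real adaptation rewrites for tone; it does not truncate), and the video-script word budget is my assumption, not a figure from the system:

```python
# Hypothetical target specs for the repurposing fan-out.
TARGETS = {
    "substack":     {"words": 1200, "tone": "analytical"},
    "linkedin":     {"words": 300,  "tone": "conversational"},
    "x_thread":     {"words": 150,  "tone": "punchy"},
    "video_script": {"words": 600,  "tone": "spoken"},  # budget assumed
}

def adapt(draft: str, words: int, tone: str) -> str:
    # Stand-in for an LLM call; the tone argument is what the real
    # call would act on. Here we only enforce the word budget.
    return " ".join(draft.split()[:words])

def repurpose(draft: str) -> dict[str, str]:
    return {name: adapt(draft, spec["words"], spec["tone"])
            for name, spec in TARGETS.items()}
```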

The self-healing error correction was the architectural feature I underestimated. When a script fails — an API rate limit, a malformed response, a format it has not seen — the agent reads the error, diagnoses it, fixes the script, and retests. This turns a 30-minute debugging session into a 90-second automated fix. The key is that the agent updates the directive with what it learned: "this API requires a 1-second delay between requests." That learning persists across sessions so the next run does not make the same mistake.
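The loop can be sketched as follows. All names are hypothetical: the real agent diagnoses the traceback with an LLM and may rewrite the failing script, whereas here a lookup table of known remedies stands in, and lessons are appended to a notes list representing the directive file.

```python
# Symptom -> lesson pairs the agent has learned (illustrative entry).
KNOWN_FIXES = {
    "rate limit": "this API requires a 1-second delay between requests",
}

def run_with_healing(step, directive_notes: list[str], max_retries: int = 3):
    last_err = None
    for _ in range(max_retries):
        try:
            return step()
        except RuntimeError as err:
            last_err = err
            # "Update the directive with what it learned" so the lesson
            # persists across sessions.
            for symptom, lesson in KNOWN_FIXES.items():
                if symptom in str(err) and lesson not in directive_notes:
                    directive_notes.append(lesson)
            # Real code would also act on the lesson (e.g. insert the delay).
    raise RuntimeError(f"unrecoverable after {max_retries} attempts: {last_err}")
```

The important detail is the append to `directive_notes`: without persistence, the 90-second fix would have to be rediscovered on every run.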

What Failed (Honestly)

LLMs hallucinate formats. I cannot stress this enough. I have detailed formatting specifications in every directive — word counts, section headers, required callouts — and the agent still occasionally produces output with the wrong structure, an extra section that was not requested, or a word count 40% over spec. The fix was to add a validation script at the end of every pipeline: check the output length, check that required sections are present, and check that no section is duplicated. When validation fails, the pipeline re-runs with the error message as additional context. This added 15 minutes of development time per pipeline and eliminated 80% of formatting failures.
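The validator is deliberately dumb, which is the point: it is deterministic, so it never hallucinates. A minimal sketch, assuming a spec dict with `max_words` and `required_sections` fields (my naming, not the system's):

```python
# End-of-pipeline validator: returns violations for the re-run prompt.
def validate(output: str, spec: dict) -> list[str]:
    """Return a list of violations; an empty list means the output passed."""
    errors = []
    n_words = len(output.split())
    if n_words > spec["max_words"]:
        errors.append(f"{n_words} words exceeds max of {spec['max_words']}")
    for header in spec["required_sections"]:
        count = output.count(header)
        if count == 0:
            errors.append(f"missing section: {header}")
        elif count > 1:
            errors.append(f"duplicated section: {header}")
    return errors
```

When the returned list is non-empty, the pipeline feeds it back verbatim as additional context and re-runs the generation step.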

The other failure was multi-platform auth management. Each platform has a different auth flow — OAuth2 for some, API keys for others, browser automation for the ones with no API. Maintaining valid credentials for 10+ platforms turned into its own maintenance burden. I have not solved this elegantly. My current approach is to automate the platforms with stable APIs (Ghost, LinkedIn, Substack) and handle the rest manually. The 80/20 principle applies here: automating the three highest-volume platforms covered 80% of the publishing work. The remaining platforms were not worth the maintenance cost of automation.
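My current workaround amounts to a registry that encodes the 80/20 split directly. A hypothetical sketch — the auth kinds per platform are illustrative, and the unnamed no-API platform is a placeholder, not a specific service:

```python
# Platform registry: automate the stable APIs, flag the rest for manual
# posting. Auth kinds shown are illustrative examples.
PLATFORMS = {
    "ghost":           {"auth": "api_key", "automated": True},
    "linkedin":        {"auth": "oauth2",  "automated": True},
    "substack":        {"auth": "api_key", "automated": True},
    "no_api_platform": {"auth": "browser", "automated": False},  # placeholder
}

def publish_plan(platforms: dict) -> tuple[list[str], list[str]]:
    auto = [n for n, p in platforms.items() if p["automated"]]
    manual = [n for n, p in platforms.items() if not p["automated"]]
    return auto, manual
```

Anything in the manual list gets a reminder instead of an API call, which keeps the credential-maintenance surface limited to the three platforms that carry most of the volume.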

The Philosophy

The principle I keep coming back to is: AI should handle the 80% mechanical work so humans can focus on the 20% that requires genuine judgment. The mechanical work in content creation is adaptation, formatting, and distribution. The work that requires judgment is the original idea, the specific example that makes the abstract concrete, the decision about whether a claim is accurate. If I am spending my content time on formatting and posting, I am working below my skill level. If the AI is making the judgment calls, the content will be undifferentiated. The system works when it respects that division.