We talk a lot about what AI can do—but far less about what it can’t do without the right foundation.
📊 According to Gartner, by 2026, 60% of AI projects will be abandoned because they lack the data infrastructure needed to succeed.
Much of what we hear about AI today is cutting-edge and exciting, but none of it is possible without the right data foundation. In our work with CPA firms, my team focuses on establishing that foundation first, making sure systems are set up in a way that AI can build on reliably.
Because at the end of the day, the age-old principle still applies: garbage in, garbage out. Without the right foundation, even the most powerful AI tools won’t deliver meaningful value.
With that context, here are three key concepts I discuss regularly when helping firms get their data house in order for the AI era.
-
Data hygiene
ChatGPT tells me that the classic definition of hygiene is:
“Conditions or practices conducive to maintaining health and preventing disease, especially through cleanliness.”
This concept of hygiene applies directly to how we manage data in our firms.
Applied to your data, hygiene is about the ongoing standards, discipline, and compliance around how you take in, process, and cleanse data. It's not a one-time cleanup. It's a continuous operational practice that protects the integrity of your tech foundation.
Once you've invested in building structured, normalized, and beautifully modeled data processes, congratulations: you're further along than most (almost everyone, actually).
If you spend months (or years, which is not uncommon!) investing in the right tools and architecture for your AI future, the last thing you want is for poor compliance to erode the quality of that data. You'll end up needing to rebuild what you just finished.
💡 Tip: After any data or technology transformation, establish team-wide processes for structuring, normalizing, and labeling data—and reinforce them consistently to protect the long-term integrity of your AI efforts.
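If your firm has a technical resource on hand, even a small amount of automation can turn that discipline into a standing control. Below is a minimal sketch, assuming a hypothetical CSV export of client records; the file name, field names, and rules are all illustrative. The point is the ongoing, automated check on intake, not the specific rules.

```python
import csv
import re

# Hypothetical hygiene rules for an illustrative client-record export.
REQUIRED_FIELDS = ["client_name", "email", "engagement_type"]
ALLOWED_ENGAGEMENTS = {"tax", "audit", "advisory"}  # example label set
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_record(row: dict) -> list[str]:
    """Return a list of hygiene issues found in a single record."""
    issues = []
    for field in REQUIRED_FIELDS:
        if not (row.get(field) or "").strip():
            issues.append(f"missing {field}")
    email = (row.get("email") or "").strip()
    if email and not EMAIL_PATTERN.match(email):
        issues.append("malformed email")
    engagement = (row.get("engagement_type") or "").strip().lower()
    if engagement and engagement not in ALLOWED_ENGAGEMENTS:
        issues.append(f"unrecognized engagement label: {engagement!r}")
    return issues


def run_hygiene_report(path: str) -> None:
    """Print every record that fails a hygiene rule, with the reasons."""
    with open(path, newline="", encoding="utf-8") as f:
        # Start counting at 2 because row 1 of the file is the header.
        for line_no, row in enumerate(csv.DictReader(f), start=2):
            issues = check_record(row)
            if issues:
                print(f"Row {line_no}: {', '.join(issues)}")


if __name__ == "__main__":
    run_hygiene_report("new_clients.csv")  # hypothetical intake file
```

Run on a schedule or as part of intake, a check like this keeps hygiene continuous rather than something you revisit once a year.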
-
Data bankruptcy
For this one, I asked Perplexity for the definition of “bankruptcy.” It came back with:
"A legal process in which an individual or business is formally declared unable to pay their outstanding debts."
Now, imagine applying that concept to your firm’s data.
Instead of financial liabilities, think of an overwhelming burden of disorganized, outdated, or inaccessible data: what we often call "data debt." Financial bankruptcy happens when the cost and complexity of repaying debts exceed a company's ability to recover, so the financial and legal liabilities are "wiped out" and the business restructures. Data bankruptcy works the same way: when the effort required to clean and organize legacy data far outweighs its potential value, the firm is better off building a high-quality data system from the present day forward.
Maybe your firm has 10+ years of documents scattered across outdated folder structures, client notes buried in untitled PDFs, or each partner managing their own private filing system. In these situations, it's common to see firm leaders feeling paralyzed by the sheer weight of "data debt"—making it hard to move forward without also taking on the impossible task of fully integrating legacy content.
The answer isn’t always to fix everything at once. Often, the best move is to apply an 80/20 rule to historical data: get your current and future data in order first. Establish modern architecture, file structures, and protocols going forward, then selectively integrate older data as needed, piece by piece. This relieves the burden and allows forward momentum while still respecting your historical records.
💡 Tip: If your legacy data is holding your team back, consider declaring data bankruptcy on portions of your archive. Start fresh with AI-ready data processes and bring in the old only when it adds real value.
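For teams that want a concrete way to size the problem before deciding what to carry forward, a simple inventory can help. Here's a hypothetical sketch (the archive path is made up) that walks a legacy folder and counts files by last-modified year, so the 80/20 conversation can start from real numbers.

```python
from collections import Counter
from datetime import datetime
from pathlib import Path


def inventory_by_year(archive_root: str) -> Counter:
    """Count files in the archive by last-modified year."""
    counts = Counter()
    for path in Path(archive_root).rglob("*"):
        if path.is_file():
            year = datetime.fromtimestamp(path.stat().st_mtime).year
            counts[year] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical legacy archive location.
    counts = inventory_by_year("//fileserver/legacy_archive")
    for year in sorted(counts, reverse=True):
        print(f"{year}: {counts[year]} files")
```

A report like this often shows that most of the archive hasn't been touched in years, which makes it much easier to decide which slices deserve migration and which can stay behind.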
-
Data modularity
Microsoft Copilot tells me that “Modularity” is:
“The quality of consisting of separate parts that, when combined, form a complete whole.”
You don’t need to go all-in on one giant platform to be “AI ready.” But you do need to ensure your systems are built with modularity and interoperability at the core.
That’s because AI innovation is outpacing most firms’ current tech stacks. New AI-first vendors are popping up constantly. Existing productivity suites, like Microsoft 365, are rolling out embedded capabilities. And the capabilities we’ll want five years from now may not even exist yet.
Your system needs to be able to flex—not just around the tools of today, but around those of tomorrow. That means your data must be accessible, portable, and able to "play nice" with whatever’s coming next.
Think of it like a pit crew in a Formula 1 race: every component (tires, tools, crew members) is optimized to work quickly, independently, and together—so when change happens, you can adapt fast without losing momentum. Your systems should be modular enough to swap in new parts, tools, or vendors without reengineering the entire foundation.
🛠️ Tip: Design your stack with modularity in mind. Whether you’re using point solutions or multi-module platforms, ensure that every tool is loosely coupled but tightly aligned. Ask:
- Can each part function independently and still connect well with others?
- Does this architecture make it easy to plug into new AI tools?
- Will our systems scale and adapt as new capabilities emerge?
Modularity isn’t just a technical preference—it’s an operational strategy for thriving in an AI-driven future.
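For readers who work closer to the systems, "loosely coupled but tightly aligned" often takes the shape of a thin internal interface that your workflows depend on, with vendor-specific adapters behind it. The sketch below is a generic illustration, not any particular vendor's API; the names DocumentSummarizer, VendorASummarizer, and VendorBSummarizer are all hypothetical.

```python
from typing import Protocol


class DocumentSummarizer(Protocol):
    """The capability your workflows depend on, not a specific vendor."""

    def summarize(self, text: str) -> str: ...


class VendorASummarizer:
    """Adapter for today's tool; swap it out without touching workflows."""

    def summarize(self, text: str) -> str:
        # Vendor A's API call would go here; stubbed for illustration.
        return text[:200] + "..."


class VendorBSummarizer:
    """Adapter for tomorrow's tool; same interface, different plumbing."""

    def summarize(self, text: str) -> str:
        # Vendor B's API call would go here; stubbed for illustration.
        return "Summary: " + text[:100]


def build_engagement_brief(summarizer: DocumentSummarizer, memo: str) -> str:
    """Firm workflow code depends only on the interface, never the vendor."""
    return summarizer.summarize(memo)


if __name__ == "__main__":
    memo = "Long client memo text..."
    print(build_engagement_brief(VendorASummarizer(), memo))
    print(build_engagement_brief(VendorBSummarizer(), memo))
```

Swapping one adapter for another (or for a capability embedded in a suite like Microsoft 365) then touches a few lines of plumbing, not the entire foundation.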
The future belongs to the data-ready
Skipping the fundamentals now leads to painful consequences later: rework, stalled rollouts, and AI tools that never deliver real value.
We all want AI to solve our hardest problems. But AI is only as strong as the data foundation it stands on. That foundation starts with clean inputs (hygiene), pragmatic tradeoffs (bankruptcy), and flexible architecture (modularity).
If you want AI that works tomorrow, start building the right data habits today.