2. Are We Ready for AI Orchestration at Scale?
Inventory your container estate (Kubernetes/OpenShift), your continuous integration and continuous delivery pipelines, and your observability tooling. Validate multitenant isolation, GPU scheduling, secrets management, and software bill of materials and patch workflows. Align with DevSecOps baselines, Trusted Internet Connections 3.0 and zero-trust plans. If maturity is low, start with managed orchestration while you harden pipelines and standardize images.
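Validating multitenant isolation largely comes down to enforcing per-tenant resource quotas. A minimal sketch of that check in Python, mirroring what a Kubernetes ResourceQuota on the `requests.nvidia.com/gpu` resource would enforce; the tenant names and quota numbers are hypothetical:

```python
# Minimal sketch: flag tenants whose summed GPU requests exceed their
# namespace quota. All namespace names and numbers are illustrative.

def over_quota(requests_by_tenant, quota_by_tenant):
    """Return tenants whose total GPU requests exceed their quota,
    mapped to a (requested, allowed) tuple."""
    violations = {}
    for tenant, requests in requests_by_tenant.items():
        total = sum(requests)
        allowed = quota_by_tenant.get(tenant, 0)
        if total > allowed:
            violations[tenant] = (total, allowed)
    return violations

requests = {"research-lab": [2, 4, 2], "registrar": [1]}
quotas = {"research-lab": 6, "registrar": 2}
print(over_quota(requests, quotas))  # research-lab requests 8 of its 6 GPUs
```

In a real cluster the same policy would live in a ResourceQuota object per namespace rather than application code, so the scheduler rejects over-quota pods before they run.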
3. Can Our Facilities Support AI Workloads?
Confirm power density, cooling and floor space for GPUs; review UPS and generator capacity. Assess network throughput between on-premises systems and cloud exchanges. Check physical security requirements, supply chain lead times and maintenance windows. Where constraints exist, prioritize colocation or provider GPU capacity while modernizing core data center infrastructure.
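The power density check is simple arithmetic once you have server draw figures. A back-of-envelope sketch; the wattage, rack density and cooling-overhead numbers below are assumptions for illustration, not vendor specifications:

```python
# Back-of-envelope rack power estimate for GPU servers. Verify every
# number against actual datasheets and facility measurements.

SERVER_KW = 10.2        # assumed draw of one 8-GPU server under load (kW)
SERVERS_PER_RACK = 4    # assumed rack density
COOLING_OVERHEAD = 0.4  # assumed cooling load as a fraction of IT load

it_load_kw = SERVER_KW * SERVERS_PER_RACK
total_kw = it_load_kw * (1 + COOLING_OVERHEAD)
print(f"IT load: {it_load_kw:.1f} kW/rack; with cooling: {total_kw:.1f} kW/rack")
```

Numbers in this range are far beyond the 5 to 10 kW per rack many legacy campus data centers were designed for, which is exactly why the colocation fallback matters.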
4. What Governance Should We Apply to AI Workloads?
Adopt the National Institute of Standards and Technology’s AI Risk Management Framework with institutional policy for model risk scoring, human oversight and privacy. Integrate approvals into campus IT governance and security review processes; require model lineage, data set provenance and model cards. Monitor drift and bias, log prompts and outputs appropriately and establish rollback procedures. Enforce procurement and vendor contract clauses addressing intellectual property, security and incident response.
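Drift monitoring can start as a scheduled comparison of a model's current score distribution against its deployment baseline. A minimal sketch using the population stability index, a common drift statistic; the binned counts are hypothetical monitoring data:

```python
import math

# Minimal drift check via the population stability index (PSI) over
# binned model-score distributions. Bin counts below are hypothetical.

def psi(expected, actual):
    """PSI between two binned distributions given as lists of counts."""
    e_total, a_total = sum(expected), sum(actual)
    value = 0.0
    for e, a in zip(expected, actual):
        e_pct = max(e / e_total, 1e-6)  # floor to avoid log(0)
        a_pct = max(a / a_total, 1e-6)
        value += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return value

baseline = [200, 300, 300, 200]  # score histogram at deployment
current = [150, 250, 350, 250]   # score histogram observed this week
drift = psi(baseline, current)
print(f"PSI = {drift:.3f}")  # values above ~0.2 are often treated as drift
```

In practice the bins, thresholds and alerting would feed the rollback procedures the governance policy requires.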
5. How Do We Scale AI Without Overbuilding?
Pilot with a small, high-value use case. Rightsize GPUs/CPUs from real use, not peak estimates. Establish chargeback and total cost of ownership tracking for training versus inference. Expand iteratively across campus departments, reusing patterns and pipelines. Sunset underused resources and continually re-evaluate building versus buying as offerings mature.
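Chargeback can begin as simply as allocating the GPU pool's amortized monthly cost across metered GPU-hours, split by training versus inference. A sketch under assumed figures; the department names, rates and usage numbers are hypothetical placeholders:

```python
# Simple chargeback sketch: allocate monthly GPU-pool cost to workloads
# by metered GPU-hours. All costs and hours are illustrative.

MONTHLY_GPU_COST = 24_000.0  # assumed amortized monthly cost of the pool ($)
usage_hours = {              # metered GPU-hours per (department, mode)
    ("physics", "training"): 500,
    ("physics", "inference"): 120,
    ("admissions", "inference"): 180,
}

total_hours = sum(usage_hours.values())
rate = MONTHLY_GPU_COST / total_hours  # blended $/GPU-hour
for (dept, mode), hours in sorted(usage_hours.items()):
    print(f"{dept:<12} {mode:<10} {hours:>5} h  ${hours * rate:,.2f}")
```

Even this blended rate makes underused resources visible; a fuller model would price training and inference hardware separately, since inference often runs on cheaper GPUs.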
