6 Questions with Vishwa Raman, Head of Enterprise Solutions at Oasis Labs
As Head of Enterprise Solutions at Oasis Labs, Vishwa Raman leads multi-faceted efforts spanning engineering, product, and customer success. He earned a PhD in computer science from the University of California, Santa Cruz, and did postdoctoral research at Carnegie Mellon. An engineer by training with experience in formal methods, applied machine learning, and differential privacy, he has worked in the security and privacy industry for over ten years, including the last six in blockchain.
Let’s follow along with some of his research and dive deeper into decentralized AI with six questions!
Oasis: How would you personally define decentralized AI, and how has this evolved since the dawn of AI technology?
Vishwa Raman: Tackling the second question first, AI technologies have existed for far longer than the idea of decentralized AI. The latter is less an evolution since the dawn of AI than an attempted revolution, one that has gained momentum since the advent of Large Language Models (LLMs) and ChatGPT. It responds to an emerging concern that AI is too powerful to be controlled by a select few and needs to be democratized, with transparency around training, inference, and incentives.
Decentralized AI, in my view, is best captured by the following visual:
AI is a powerful technology, and the strides we have seen over the last few years are breathtaking. It is poised to become the most significant derivative data product in generations. Understanding what data is used to build models, knowing how useful those models are when applied to specific human-scale problems, and ensuring that value flows all the way back to the contributors of the data and models should be vested with humanity. Anyone, anywhere, should be able to build models that solve problems and improve outcomes for everyone. Transparency in training and inference, data provenance, data utility, incentive mechanisms, governance, fairness, and inclusivity are critical, and they are best enabled using trustless, confidential, and privacy-preserving blockchain technologies.
O: How did you start working with Decentralized AI and what are the benefits you have seen in your research?
VR: My earliest foray into deep learning was building a multi-modal driver-assistance system, where a driver could talk to an AI assistant the way they would talk to a friend, with context augmented by where the driver was looking. This was in 2012, using Theano, a framework that predates TensorFlow, to build a Multi-Layer Perceptron to classify driver head orientation. Needless to say, we have come a long way in the 12 years since that work.
My earliest foray into decentralized AI was building a simple logistic regression model for classification to run as a smart contract on our early Rust-based runtime, predating Emerald! This was an on-chain proof-of-concept and could handle minuscule datasets of a few thousand vectors and a handful of classes. It confirmed that large-scale machine learning on-chain, even if realizable, would be prohibitive in both time and cost. We therefore built Parcel, which supported off-chain confidential compute using Google Confidential VMs: the idea was to commit data access requests and grants on-chain, store data confidentially off-chain, and relegate compute-intensive operations, such as ML training, to off-chain workers. But Parcel may have been ahead of its time; AI training via federated learning remained at the prototype and discussion stage.
With the advances in transformer-based models such as LLMs, we are at a fascinating stage where realizing our vision is well within our grasp, given Sapphire and ROFL.
The benefits, which we have pursued since the company was founded, are captured in the graphic above.
O: Can you break down the architecture of Decentralized AI and ROFL, and the use cases for working alongside Oasis architecture such as Sapphire and other technologies?
VR: Data is non-rival: once it is shared in its raw form, it can be reproduced, reshared, and reused ad infinitum without the data owner being aware of how it is being used and monetized. Consider AI pipelines: the data used to train models oftentimes has a clearly defined owner. Individuals own their identity information, demographic data, financial data, wealth and buying patterns, thought-leadership pieces, works of fiction and non-fiction, talks, lectures, and so on. Corporations own vast amounts of data for which they hold data rights, and the products they build with that data provide services and improve outcomes (healthcare systems, for example).
Once this data is shared, it is difficult to know how much value is derived from it via derivative products such as AI models. Responsible and privacy-preserving use of data brings us to the following requirements (see the sketch after this list):
A. Data use requires data owner consent
B. Data use should be within confidential environments so that the data is used by an attested, verified algorithm and is not accessible to any individual or corporation
C. The infrastructure that performs computation should be verifiably confidential and not owned by any single entity to ensure trustlessness
D. The value derived from data, either in kind or otherwise, should feed back to the data owners
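To make this concrete, here is a minimal Python sketch of requirements A, B, and D. Everything in it (the GrantRegistry, the expected code measurement, the pay_owner callback) is a hypothetical stand-in for illustration, not the Oasis SDK or any production API:

```python
# Minimal sketch of requirements A, B, and D: data is used only under an
# explicit owner grant, only by an algorithm whose measurement matches an
# attested value, and part of the value flows back to the data owner.
# All names are hypothetical stand-ins, not the Oasis SDK.
import hashlib

EXPECTED_MEASUREMENT = hashlib.sha256(b"approved-training-code-v1").hexdigest()

class GrantRegistry:
    """Stand-in for on-chain consent records keyed by (dataset, consumer)."""
    def __init__(self):
        self._grants = set()

    def grant(self, dataset_id: str, consumer: str) -> None:
        self._grants.add((dataset_id, consumer))        # requirement A: owner consent

    def is_granted(self, dataset_id: str, consumer: str) -> bool:
        return (dataset_id, consumer) in self._grants

def use_data(registry, dataset_id, consumer, code_measurement, pay_owner):
    # Requirement A: reject any use without an explicit owner grant.
    if not registry.is_granted(dataset_id, consumer):
        raise PermissionError("no owner grant for this dataset/consumer pair")
    # Requirement B: only an attested, verified algorithm may touch the data.
    if code_measurement != EXPECTED_MEASUREMENT:
        raise PermissionError("algorithm measurement does not match attestation")
    result = {"model_id": f"model-over-{dataset_id}"}   # confidential compute happens here
    pay_owner(dataset_id, amount=1.0)                   # requirement D: value flows back
    return result

registry = GrantRegistry()
registry.grant("dataset-123", "trainer-7")
model = use_data(registry, "dataset-123", "trainer-7",
                 EXPECTED_MEASUREMENT, lambda ds, amount: None)
```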
Sapphire and the Oasis Network enable all of the above. For AI pipelines, we have the following additional requirements:
A. Training workloads should run with confidentiality for the training data, and the models that are generated should be stored confidentially, with defined ownership. The data should never be used for any purpose other than training
B. Derivative products such as AI models, which “carry” the intelligence in the data, and are the model providers’ intellectual property, should be used within confidential environments so that the inference results flow out, but the models remain private and confidential
These requirements cannot be satisfied in fully replicated verifiable computing environments such as blockchain runtimes like Sapphire. The compute costs would be prohibitive, and some operations simply cannot be performed at all: constrained blockchain runtime environments preclude network access, non-deterministic algorithms, and communication between compute nodes. The storage costs would also be infeasible, since AI models are oftentimes billions of bytes in size.
This brings us to ROFL, which, in simple terms, provides the same mechanisms for verifiable computing as Sapphire and the Oasis Network, but with the ability to run compute-intensive workloads, such as AI training and inference, off-chain to enable scale and reduce cost.
From an architectural standpoint, ROFL is an extension of Sapphire that enables anyone to train AI models and perform inference off-chain, with confidentiality guarantees similar to on-chain computation, at a fraction of the compute and storage costs. As an extension of Sapphire, ROFL also enables these AI workloads to seamlessly use Sapphire for verifiability, provenance tracking, usage tracking, and value determination and transfer, extending trustlessness and transparency to AI pipelines.
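As a rough illustration of that division of labor, the sketch below keeps the heavy computation off-chain and commits only compact digests for verifiability and provenance. The function names (run_off_chain_inference, commit_to_sapphire) and the toy model are illustrative assumptions, not the actual ROFL or Sapphire APIs:

```python
# Rough sketch of the ROFL-style split: the heavy AI workload runs off-chain,
# and only a compact, verifiable record (model/input/output digests) is
# committed on-chain for provenance and usage tracking.
# Function names are hypothetical, not the actual ROFL or Sapphire API.
import hashlib
import json

def digest(obj) -> str:
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()

def run_off_chain_inference(weights, features):
    """Stand-in for a compute-heavy job inside a confidential off-chain node."""
    score = sum(w * x for w, x in zip(weights, features))  # toy linear "model"
    return {"prediction": 1 if score > 0 else 0}

def commit_to_sapphire(record: dict) -> str:
    """Stand-in for posting a small attestation/usage record on-chain."""
    return digest(record)

weights = [0.4, -0.2, 0.1]
features = [1.0, 2.0, 3.0]
output = run_off_chain_inference(weights, features)   # off-chain: cheap to scale
receipt = commit_to_sapphire({                        # on-chain: tiny footprint
    "model_digest": digest(weights),
    "input_digest": digest(features),
    "output_digest": digest(output),
})
print(output, receipt)
```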
Anyone should be able to run ROFL nodes and support a decentralized network of confidential computing for use in AI training and inference. Validators can co-locate ROFL nodes with their Sapphire and consensus nodes and establish a confidential computing mesh for decentralized AI. That would be the vision.
This enables the realization of the objectives outlined in our graphic for decentralized AI!
O: With the integration of blockchain technology, what objectives does Decentralized AI look to accomplish in the blockchain industry and beyond?
VR: The following objectives can be tackled by Decentralized AI using a blockchain network, with or without confidentiality:
A. Provenance tracking for the data that goes into training decentralized AI models, so that consumers know exactly what data was used to build each model and can make an informed choice in the use of these models (see the sketch after this list)
B. Transparency in value flows, where payments for model use are in crypto, opening up the possibility of compensating data owners in a verifiable, non-repudiable manner given provenance
C. Establishment of Decentralized Autonomous Organizations (DAOs) that bring together a consortium of related industry entities to provide their data and/or models for use, with on-chain tracking of the value each organization brings, based on usage, to enable transparency in incentives and democratization of governance
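As a toy illustration of objectives A and B, the sketch below attaches a provenance record to a model and splits each usage fee among the data owners in proportion to their contribution. The pro-rata rule, the addresses, and the record counts are made up for illustration:

```python
# Toy illustration of objectives A and B: every model carries a provenance
# record of the datasets it was trained on, and each usage fee is split among
# the data owners in proportion to their contribution.
# The pro-rata rule, addresses, and record counts are illustrative assumptions.
provenance = {
    "model-42": [
        {"dataset": "hospital-a-notes", "owner": "0xA11CE", "records": 7000},
        {"dataset": "hospital-b-notes", "owner": "0xB0B",   "records": 3000},
    ]
}

def split_usage_fee(model_id: str, fee: float) -> dict:
    """Pay each data owner a fee share proportional to the records contributed."""
    sources = provenance[model_id]
    total = sum(s["records"] for s in sources)
    return {s["owner"]: fee * s["records"] / total for s in sources}

print(split_usage_fee("model-42", fee=10.0))
# {'0xA11CE': 7.0, '0xB0B': 3.0}
```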
Note that without confidentiality in use, as provided by the likes of ROFL, data and models will remain in place: sharing them requires trust, and that trust has not worked well in the past, leading to data silos. With confidential runtimes like Sapphire and verifiably confidential off-chain computation provided by ROFL, we can tackle the following additional objectives:
A. Enable the movement of data and models from the clouds that host them to any decentralized node that provides confidential computing
B. Enable joint training of AI models using data from different entities, given data confidentiality in use (see the sketch after this list)
C. Enable models to freely participate in marketplaces without explicit trust, as the infrastructure handles the required confidentiality and integrity
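Here is a toy sketch of objective B, where a trivial pooled average stands in for real model training and ConfidentialSession is a hypothetical stand-in for an attested enclave or confidential VM: each entity contributes data, the pooled raw data never leaves, and only the trained "model" is released:

```python
# Toy sketch of objective B: two entities contribute data to joint training
# inside a confidential environment; the pooled raw data never leaves, and only
# the trained "model" (here, just a pooled average) is released.
# ConfidentialSession is a hypothetical stand-in for an attested enclave or
# confidential VM, not a real Oasis primitive.
class ConfidentialSession:
    def __init__(self):
        self._pooled = []                 # visible only inside the session

    def contribute(self, rows):
        # In a real system, rows would be decrypted only inside the attested
        # environment; plain floats are used here purely for illustration.
        self._pooled.extend(rows)

    def train(self) -> float:
        # Only this aggregate (the "model") is released to the participants.
        return sum(self._pooled) / len(self._pooled)

session = ConfidentialSession()
session.contribute([1.0, 2.0, 3.0])       # entity A's data
session.contribute([10.0, 20.0])          # entity B's data
print(session.train())                    # 7.2: shared output, raw data stays inside
```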
In short, ROFL puts the decentralized in Decentralized AI.
O: Considering the explosive growth and evolution of AI technologies, how can you see Decentralized AI working with various industries in the future?
VR: There is no difference in how Decentralized AI works with industries compared to centralized models; the difference is trust, and the consequent loss of market efficiency. Decentralized AI, when combined with confidential computing, has the ability to truly make data fluid, with the underlying blockchain ensuring fair and responsible use. With data fluidity comes efficiency: anyone with sufficient background can build and deploy a model for open use, with an expectation of fair compensation. It simply makes data markets more efficient, trustless, and democratic.
O: Can you share some info about the current research you are working on?
VR: Most of my work, if not all of it, is in the area of privacy-preserving access to sensitive data. One aspect of what I do is enabling access to regulated data, such as HIPAA-regulated healthcare data, with minimal friction, using differential privacy. The other aspect, of which I am an ardent fan, is the democratization of healthcare using AI. I cannot wait for the day we realize a DAO of healthcare institutions that share and deploy AI models, so that every doctor has the same ability to help her patients as the best-regarded specialist for her patient’s condition. A doctor should be able to call a “friend” to discuss a patient’s condition and jointly determine the best treatment option available to her patient to ensure the best long-term outcomes.
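For a flavor of the differential privacy piece, here is a minimal Laplace-mechanism sketch for a counting query over patient records. The epsilon, the dataset, and the query are made-up illustrations, not the production system described above:

```python
# Minimal sketch of the Laplace mechanism for a differentially private count:
# noise scaled to sensitivity / epsilon hides any single patient's presence.
# The epsilon, dataset, and query are made-up illustrations, not the
# production system referenced above.
import random

def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return scale * (random.expovariate(1.0) - random.expovariate(1.0))

def dp_count(records, predicate, epsilon: float) -> float:
    """Differentially private count; a counting query has sensitivity 1."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(scale=1.0 / epsilon)

patients = [{"age": 34, "dx": "A"}, {"age": 61, "dx": "B"}, {"age": 47, "dx": "A"}]
print(dp_count(patients, lambda r: r["dx"] == "A", epsilon=0.5))
```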