Decentralized storage

anca

Who Actually Controls Your Data, and Why It Matters

Have you ever wondered what happens to your files after you upload them to Google Drive or Dropbox? Somewhere, in a data center you will never visit, a corporation now holds your data, your encryption keys, and the power to delete, censor, or hand over your information to a government at any time. Decentralized storage was built to change this. In this lesson, we will explore how it works, why privacy is still not automatic even in decentralized systems, and what it takes to build a storage stack that is truly private by design.

What Is Decentralized Storage?

Traditional cloud storage uses location addressing: you retrieve a file by pointing to a specific server URL like s3.amazonaws.com/bucket/photo.jpg. If that server goes offline, the company gets acquired, or a government issues a takedown order, your link breaks and your data may disappear. Three companies (AWS, Azure, and Google Cloud) now control roughly 67% of the world’s cloud infrastructure, creating enormous honeypots of concentrated data.

Decentralized storage replaces this model with content addressing. Files are identified by a cryptographic hash of their contents, a unique digital fingerprint generated from the data itself. The same photo stored on 100 different nodes always produces the same address. This eliminates link rot, mathematically guarantees data integrity, and removes dependence on any single server. Nobody can quietly swap out your file for something else without changing its address.
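The idea of content addressing can be sketched in a few lines. This is a minimal illustration that uses a raw SHA-256 hex digest as the address; real IPFS CIDs wrap the digest in additional encoding, and the names `content_address` and `verify` are hypothetical helpers, not part of any protocol's API.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Return a hex SHA-256 digest that serves as the content's address."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, address: str) -> bool:
    """Check that bytes received from an untrusted node match the address requested."""
    return content_address(data) == address


photo = b"example photo bytes"
tampered = b"example photo bytes!"

# Identical bytes give the identical address, no matter which node holds them.
address = content_address(photo)
print(verify(photo, address))      # the genuine file matches its address
print(verify(tampered, address))   # a swapped file does not
```

Because the address is derived from the bytes themselves, the client needs no trust in the node that served the file: it simply recomputes the hash.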

The leading protocols each solve the storage problem differently, and understanding those differences matters especially when it comes to privacy.

The Main Protocols and How They Work

IPFS (InterPlanetary File System) is the foundational peer-to-peer protocol. When you add a file, it is broken into blocks, each cryptographically hashed, and organized into a structure called a Merkle DAG. The file receives a unique Content Identifier (CID), and anyone on the network who has the file can serve it to you. IPFS maintains roughly 23,000 active peers in its public network. However, IPFS alone does not guarantee your data persists. If nobody commits to storing your file (a process called “pinning”), it can eventually be garbage-collected from nodes whose owners did not intentionally keep a copy.
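A rough sketch of the block-and-hash step described above, assuming a deliberately tiny block size and a single root node that links to every block. Real IPFS uses much larger chunks, multi-level DAGs, and its own CID encoding, so `split_blocks`, `block_id`, and `root_id` are illustrative names only.

```python
import hashlib

BLOCK_SIZE = 4  # tiny for illustration; real chunk sizes are far larger


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut the file into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)] or [b""]


def block_id(block: bytes) -> bytes:
    """Identifier of a single block: the hash of its bytes."""
    return hashlib.sha256(block).digest()


def root_id(data: bytes) -> str:
    """Hash each block, then hash the ordered list of block hashes into one root."""
    links = [block_id(b) for b in split_blocks(data)]
    return hashlib.sha256(b"".join(links)).hexdigest()


original = b"abcdefgh"
edited = b"abcdefgX"  # only the second block differs

print(root_id(original) == root_id(edited))  # roots differ
print(block_id(split_blocks(original)[0]) == block_id(split_blocks(edited)[0]))  # first block is shared
```

Changing one byte changes that block's hash and therefore the root, while untouched blocks keep their identifiers and can be reused between versions.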

Filecoin solves the persistence problem by building an economic incentive layer on top of IPFS. Storage providers lock up FIL tokens as collateral and earn rewards for storing client data, while two cryptographic proofs keep them honest. Proof-of-Replication verifies that a provider has created a unique physical copy of your data through a computationally intensive sealing process. Proof-of-Spacetime then continuously audits providers through random challenges, proving they still hold the data over time. As of late 2025, Filecoin stores roughly 1,110 PiB of actual client data, with clients including the Smithsonian Institution and the Internet Archive.
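The "random challenge" idea behind storage audits can be illustrated with a plain Merkle-proof challenge. This is a simplified sketch, not Filecoin's actual Proof-of-Spacetime, which operates on sealed sectors and compresses proofs with zk-SNARKs. The function names are hypothetical, and the client is assumed to keep only the Merkle root.

```python
import hashlib
import secrets


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(blocks: list[bytes]) -> list[list[bytes]]:
    """Return all tree levels, leaves first. Odd-length levels repeat their last node."""
    level = [h(b) for b in blocks]
    levels = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        levels.append(level)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    levels.append(level)  # the final level holds only the root
    return levels


def prove(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Provider side: collect sibling hashes on the path from a leaf to the root."""
    path = []
    for level in levels[:-1]:
        path.append((level[index ^ 1], index % 2 == 0))
        index //= 2
    return path


def verify(root: bytes, block: bytes, path: list[tuple[bytes, bool]]) -> bool:
    """Auditor side: recompute the root from the returned block and its proof."""
    node = h(block)
    for sibling, node_is_left in path:
        node = h(node + sibling) if node_is_left else h(sibling + node)
    return node == root


blocks = [f"block-{i}".encode() for i in range(5)]
levels = build_tree(blocks)
root = levels[-1][0]                         # the only value the auditor stores
challenge = secrets.randbelow(len(blocks))   # unpredictable block index
print(verify(root, blocks[challenge], prove(levels, challenge)))
```

Since the provider cannot predict which block will be challenged, answering repeated challenges over time requires actually holding the data.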

Arweave takes a radically different approach: pay once, store forever. A one-time payment in AR tokens funds a storage endowment that pays miners gradually over 200 or more years, based on the conservative assumption that storage costs decline at least 0.5% annually. Its “blockweave” structure links each new block to both the previous block and a randomly selected older block, incentivizing miners to store as much historical data as possible. The permaweb processes over 30 million transactions daily and serves 288 million data requests per day.
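The arithmetic behind a declining-cost endowment can be sketched as a geometric series. This is a simplified model under stated assumptions: a constant yearly decline rate, no interest earned by the endowment, and no token price changes. It is not Arweave's actual fee formula, and `endowment_cost` is a hypothetical helper.

```python
from typing import Optional


def endowment_cost(annual_cost: float, decline_rate: float,
                   years: Optional[int] = None) -> float:
    """Total cost of storage when the yearly price falls by decline_rate each year.

    Year t costs annual_cost * (1 - decline_rate) ** t, so the total is a
    geometric series. With years=None the series is summed to infinity.
    """
    if not 0 < decline_rate < 1:
        raise ValueError("decline_rate must be between 0 and 1")
    if years is None:
        return annual_cost / decline_rate
    return annual_cost * (1 - (1 - decline_rate) ** years) / decline_rate


# With a 0.5% yearly decline, storing forever costs the same as
# 1 / 0.005 = 200 years of storage at today's price.
print(endowment_cost(1.0, 0.005))
print(endowment_cost(1.0, 0.005, years=200))
```

Under this model the sum converges, which is why a finite one-time payment can in principle fund storage indefinitely. A faster real-world decline in storage costs makes the required payment smaller.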

Storj and Sia are the most privacy-forward by default. Storj encrypts all data client-side with AES-256-GCM before sharding it into 80 pieces distributed across independent nodes. You only need any 29 of those 80 pieces to reconstruct the original file. Sia uses a similar model with 30 encrypted shards (10 needed to rebuild), enforced through on-chain smart contracts where hosts lock up Siacoin collateral that gets slashed for poor performance.

Protocol	Model	Encryption Default	Persistence	Best For
IPFS	Peer-to-peer file system	None (content is public)	Only while pinned	Linking & content addressing
Filecoin	Economic storage market	None natively; user must encrypt	Contract-enforced with proofs	Large-scale verified storage
Arweave	Permanent blockweave	None (content is public & permanent)	200+ year endowment model	Permanent records, NFT metadata
Storj	Sharded cloud storage	AES-256-GCM client-side by default	Subscription-based	Private file storage, S3 replacement
Sia	Contract-based storage	Client-side by default	Collateral-backed smart contracts	Developer-controlled private storage

Table 1: Decentralized storage protocols compared by architecture and privacy defaults

Why Centralized Storage Is Structurally Broken for Privacy

The privacy failures of centralized cloud storage are not occasional bugs. They are features of the architecture. When billions of records sit in a single company’s data centers, that company becomes both an irresistible target for hackers and a convenient pipeline for government surveillance.

IBM’s 2024 Cost of a Data Breach Report found the global average breach cost hit $4.88 million, a 10% jump and the largest annual increase since the pandemic. Breaches in cloud environments averaged $5.17 million. Healthcare topped the list for the fourteenth consecutive year at $9.77 million per breach. The 2025 report showed a slight dip to $4.44 million globally, but U.S. breaches climbed to an all-time high of $10.22 million.

The 2024 Snowflake breach campaign illustrates the cascading danger of centralized data. A single threat actor used stolen credentials to access approximately 165 Snowflake customer accounts that lacked multi-factor authentication, exposing 109 million AT&T customer records, 560 million Ticketmaster records, and sensitive data from Santander Bank, Neiman Marcus, and dozens more. In December 2024, the PowerSchool breach compromised records of an estimated 62 million students and 10 million teachers across 6,000 schools, including Social Security numbers and medical records dating back decades.

Surveillance compounds the problem. Under FISA Section 702, reauthorized in April 2024 with an expanded definition of which companies must comply, the U.S. government can compel cloud providers to hand over user data, often under gag orders that prevent disclosure to affected users. The reauthorization actually broadened potential surveillance to companies that provide Wi-Fi, manage data centers, or maintain communications equipment. As Edward Snowden put it: “If you run on Google’s or Amazon’s technology, how do you know when it starts spying on you? You have no awareness because it happens at a hidden layer of the software.”

Censorship risk is equally real. AWS’s deplatforming of Parler in January 2021, combined with simultaneous action by Apple, Google, Okta, and Stripe, demonstrated that a handful of infrastructure providers can erase a company from the internet overnight. With AWS controlling approximately 31% of cloud market share, these decisions carry enormous power over what information can exist online.

The Five Layers of Privacy Protection in Decentralized Storage

Decentralized storage deploys multiple overlapping privacy mechanisms. Understood individually, each is simple. Combined, they create a defense-in-depth that centralized systems cannot match.

1. Client-Side Encryption

Your data is encrypted on your own device before it ever touches the network. Only you hold the decryption key, not the storage provider, not node operators, not anyone. Storj and Sia do this automatically. For IPFS, Filecoin, and Arweave, you must encrypt before uploading because they do not do it natively. The critical difference from centralized storage: when AWS encrypts your data “at rest,” AWS holds the keys. With client-side encryption, losing your key means losing your data permanently. There is no “forgot password” button, but there is also no backdoor.

2. Sharding

Sharding splits encrypted files into many small pieces distributed across different nodes in different locations. An individual node operator sees only one encrypted fragment, which is meaningless without the decryption key and enough of the other pieces to rebuild the file. Storj creates 80 shards across 80 nodes. Sia distributes 30 shards across 30 hosts. Even an attacker who compromises several nodes holds only ciphertext fragments and learns nothing useful without the key.

3. Erasure Coding

Erasure coding provides redundancy without storing complete copies. Using Reed-Solomon mathematics, extra “parity” pieces are generated so the original file can be reconstructed from any sufficiently large subset of pieces (any k of n). Storj’s 29-of-80 scheme means 51 nodes (about 64%) can go offline simultaneously and your data is still fully recoverable. This is far more resilient than traditional three-copy replication, which loses the file once the three nodes holding copies fail, while using similar storage overhead (80/29 ≈ 2.8× versus 3×).
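The trade-off between overhead and resilience can be worked through numerically. The sketch below uses the piece counts quoted in this lesson and assumes, purely for illustration, that each node fails independently with the same probability. Real node failures are not perfectly independent, and the helper names are hypothetical.

```python
from math import comb


def erasure_profile(k: int, n: int) -> dict:
    """Summarize a k-of-n erasure code: any k of the n pieces rebuild the file."""
    return {
        "expansion_factor": n / k,   # stored bytes per original byte
        "tolerated_losses": n - k,   # pieces that can vanish safely
    }


def loss_probability(k: int, n: int, p: float) -> float:
    """Probability that fewer than k pieces survive if each node fails with probability p."""
    return sum(comb(n, f) * p**f * (1 - p) ** (n - f) for f in range(n - k + 1, n + 1))


storj = erasure_profile(29, 80)
sia = erasure_profile(10, 30)
replication = erasure_profile(1, 3)  # three full copies behave like a 1-of-3 code

for name, profile in [("Storj 29/80", storj), ("Sia 10/30", sia), ("3 copies", replication)]:
    print(name, round(profile["expansion_factor"], 2), profile["tolerated_losses"])

# Assumed 10% independent failure rate per node.
print(loss_probability(1, 3, 0.1), loss_probability(29, 80, 0.1))
```

With comparable storage overhead, the 29-of-80 code survives the loss of 51 pieces, whereas three-copy replication survives the loss of only two.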

4. Encryption in Transit and at Rest

Encryption in transit (via TLS) protects data while it moves between your device and storage nodes, preventing eavesdropping. Encryption at rest ensures stored data remains unreadable even if physical drives are stolen. In decentralized systems that encrypt client-side, the combination of client-side encryption with both transit and rest encryption means data is never exposed in plaintext outside your device.

5. Zero-Knowledge Proofs

Zero-knowledge proofs allow storage providers to prove they are storing your data correctly without revealing the data itself. Filecoin uses zk-SNARKs to compress its Proof-of-Spacetime proofs for on-chain verification. The provider essentially says “I still have your data” and proves it mathematically, without transmitting or exposing any of the actual content.

The Privacy Gap: Even Decentralized Storage Has Limits

Important: IPFS, Filecoin, and Arweave are all public by default. Decentralized does not automatically mean private. This is the most important nuance in this entire lesson.

The official IPFS documentation is explicit on this point. IPFS itself does not protect knowledge about CIDs and the nodes that provide or retrieve them. Anyone who knows a CID can access the associated file. There is no built-in access control layer. IPFS uses transport encryption (data encrypted between nodes) but not content encryption, meaning data at rest is unencrypted on every node that stores it.

CIDs are deterministic: identical content always produces the same hash, enabling tracking. If you know what a file looks like, you can compute its CID and search for who hosts or requests it. The Distributed Hash Table (DHT) that IPFS uses to locate content is itself a privacy leak. All DHT queries happen in public, so third parties can monitor traffic to see what CIDs are being requested, when, and by whom. When you retrieve content, your node by default advertises that it now has that content, making your interest in specific files visible to the network.
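The tracking risk can be made concrete with a small simulation. The request log below is entirely hypothetical and the identifier is a plain SHA-256 digest rather than a real CID. The point is only that anyone holding a copy of a file can compute its identifier and match it against observed lookups.

```python
import hashlib


def cid_like(data: bytes) -> str:
    """Deterministic identifier: the same bytes always give the same value."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical lookups an observer collected: (requesting peer, requested identifier).
observed_requests = [
    ("peer-A", cid_like(b"public meme")),
    ("peer-B", cid_like(b"leaked report v1")),
    ("peer-C", cid_like(b"leaked report v1")),
]


def who_requested(known_file: bytes, log: list[tuple[str, str]]) -> list[str]:
    """Return every peer whose lookup matches the identifier of a file the observer holds."""
    target = cid_like(known_file)
    return [peer for peer, cid in log if cid == target]


print(who_requested(b"leaked report v1", observed_requests))
```

Encrypting content before upload changes the bytes and therefore the identifier, but it does not hide the fact that a given peer requested a given identifier.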

Filecoin inherits IPFS’s limitations and adds its own. All storage deals are recorded on-chain and publicly auditable. Even if you encrypt your data, the metadata (who stored how much data, with which provider, for how long) remains visible, enabling traffic analysis. Filecoin’s own documentation warns that uploading unencrypted files allows storage providers to read them and share copies with third parties.

Arweave’s permanence amplifies privacy risks. Data is not only public but permanent and immutable. There is no right to be forgotten. Unencrypted sensitive data uploaded to Arweave is exposed forever. Even encrypted permanent data carries risk: future cryptographic breakthroughs, including advances in quantum computing, could eventually break today’s encryption standards, and the data will still be there waiting.

Pinning services like Pinata and web3.storage, which many projects rely on for IPFS persistence, are centralized entities that can be legally compelled to remove content. This partially undermines the censorship resistance claims that decentralized storage is often marketed on.

How Privacy-Preserving Blockchains Close the Gap

The solution architecture emerging in Web3 combines decentralized storage (for cheap, persistent, censorship-resistant data hosting) with a privacy and access control layer that manages encryption keys and authorization logic in a confidential environment.

This is exactly the kind of problem that Oasis Network was designed for. Oasis’s privacy-first philosophy holds that confidentiality should be a default property of computation, not an optional add-on. Its Sapphire environment uses Trusted Execution Environments (TEEs), which are hardware-isolated secure enclaves where encrypted data enters, gets decrypted and processed inside the protected zone, then gets re-encrypted before leaving. Node operators cannot see transaction inputs, return values, or smart contract state. Applied to storage, a confidential smart contract can hold encryption keys and access policies in private state, verify a user’s authorization inside the enclave, and release decryption keys only to authorized parties, all without exposing the keys to anyone, including the validators running the network.

Other projects are building complementary infrastructure. Lit Protocol provides decentralized key management through threshold multi-party computation. Developers define access control conditions based on on-chain rules: for example, “only holders of NFT X can decrypt” or “only wallets with 100 or more tokens.” When a user requests decryption, network nodes verify the conditions and each contributes a key share that assembles into the decryption key. No single node ever holds the complete key. With over $422 million in assets under decentralized management, Lit has become a widely used access control layer for Web3 storage applications.
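The principle that no single node holds the complete key can be illustrated with Shamir's secret sharing, a classic threshold technique. This is a stand-in chosen for clarity, not a description of Lit Protocol's actual cryptography. The 3-of-5 parameters and function names are assumptions.

```python
import secrets

PRIME = 2**521 - 1  # a Mersenne prime, comfortably larger than a 256-bit key


def split_secret(secret: int, threshold: int, shares: int) -> list[tuple[int, int]]:
    """Split secret into points on a random polynomial of degree threshold - 1."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def f(x: int) -> int:
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME

    return [(x, f(x)) for x in range(1, shares + 1)]


def recover_secret(points: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * (-xm) % PRIME
                den = den * (xj - xm) % PRIME
        secret = (secret + yj * num * pow(den, -1, PRIME)) % PRIME
    return secret


key = secrets.randbits(256)                          # the decryption key to protect
node_shares = split_secret(key, threshold=3, shares=5)

print(recover_secret(node_shares[:3]) == key)        # any three shares rebuild the key
print(recover_secret(node_shares[2:]) == key)        # a different three work too
```

Fewer shares than the threshold reveal nothing about the key, so compromising one or two nodes in this 3-of-5 setup is not enough to decrypt anything.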

The full model, which layers storage, encryption, and access control, works in four steps in practice. First, encrypt data client-side using keys managed by a privacy-preserving layer. Second, upload the encrypted data to IPFS, Filecoin, or Arweave. Third, register the CID and key metadata in a confidential smart contract and define access policies. Fourth, when someone requests access, the privacy layer verifies authorization inside a TEE or multi-party computation network and releases the decryption key only to authorized wallets. This approach addresses each major gap described above: content is encrypted (not public), key and policy metadata is confidential (not exposed on-chain), access is controlled (not open to anyone with the CID), and the entire system remains decentralized (no single point of failure).
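The flow of these steps can be sketched with an in-memory stand-in for the confidential contract. Everything here is hypothetical: `AccessRegistry` is an ordinary Python class, not an Oasis or Lit API; the encryption step is represented by a placeholder ciphertext; and the wallet names are invented. The sketch shows only the registration and key-release logic.

```python
import hashlib
import secrets


class AccessRegistry:
    """Stand-in for a confidential contract whose state is assumed hidden from outsiders."""

    def __init__(self) -> None:
        self._records: dict[str, tuple[bytes, set[str]]] = {}

    def register(self, cid: str, key: bytes, allowed_wallets: set[str]) -> None:
        """Store the key and the access policy for one piece of content."""
        self._records[cid] = (key, set(allowed_wallets))

    def is_authorized(self, cid: str, wallet: str) -> bool:
        return cid in self._records and wallet in self._records[cid][1]

    def request_key(self, cid: str, wallet: str) -> bytes:
        """Release the key only if the policy allows this wallet."""
        if not self.is_authorized(cid, wallet):
            raise PermissionError(f"{wallet} may not decrypt {cid[:8]}")
        return self._records[cid][0]


# Step 1: the client generates a key and encrypts locally (encryption omitted here).
key = secrets.token_bytes(32)
ciphertext = b"<ciphertext produced on the client>"

# Step 2: the ciphertext is uploaded; its address is derived from the encrypted bytes.
cid = hashlib.sha256(ciphertext).hexdigest()

# Step 3: the address, key, and policy are registered in the access layer.
registry = AccessRegistry()
registry.register(cid, key, {"0xAlice"})

# Step 4: only an authorized wallet receives the key.
print(registry.request_key(cid, "0xAlice") == key)
print(registry.is_authorized(cid, "0xMallory"))
```

In a real deployment the registry's state would live inside a TEE or be split across a threshold network, so that even the operators running it could not read the stored keys.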

Real-World Use Cases Already Running Today

NFT metadata was the first mainstream use case. When NFT images live on centralized servers, they can disappear overnight. Over 12,000 NFTs were delisted in 2025 due to broken hosting. Solana’s Metaplex framework now defaults to Arweave for all metadata, with over 10,000 NFT projects launched this way. After discovering that Bored Ape Yacht Club metadata was on a centralized server, Yuga Labs scrambled to migrate to IPFS. This episode drove industry-wide adoption and became a cautionary tale that every NFT project now knows by heart.

AI and data infrastructure is the fastest-growing segment. Filecoin’s Onchain Cloud, launched on mainnet in January 2026, positions itself as a full decentralized alternative to AWS for AI workloads, with verifiable storage ensuring training data integrity. Arweave’s AO hyper-parallel computing layer, launched in February 2025, enables decentralized AI agents whose entire state history is permanently stored and recomputable. The DePIN (Decentralized Physical Infrastructure) sector grew from $5.2 billion to over $19 billion in market cap between 2024 and September 2025, driven largely by AI storage demand.

Healthcare represents the highest-stakes privacy application. Projects like BurstIQ (a HIPAA-compliant blockchain platform), Medicalchain, and Patientory store encrypted patient data on IPFS while using blockchain for access control and audit trails. Estonia’s national healthcare system uses blockchain-based infrastructure for record integrity. Research published in 2025 demonstrated an IPFS-blockchain integration using attribute-based encryption for fine-grained medical record access control, where a doctor can be granted access to one category of records without seeing the full patient file.

DeFi privacy is evolving beyond simple file storage toward confidential computation on financial data. One protocol encrypted its entire on-chain order book with just 47 modified lines of Solidity, demonstrating that privacy does not have to mean a complete architecture rebuild.

Centralized vs. Decentralized Storage at a Glance

Dimension	Centralized (AWS, Google, Azure)	Decentralized (IPFS, Filecoin, Storj, etc.)
Privacy	Provider holds keys; subject to subpoenas and internal scanning	Client-side encryption; user holds keys (Storj and Sia natively)
Censorship Resistance	Provider can delete content; governments compel removal	Data replicated across thousands of independent nodes globally
Availability	Designed for 99.99% (AWS S3 Standard); single-company dependency	99.97% (Storj); no single point of failure; IPFS fragile without pinning
Cost (1 TB/month)	~$23 (AWS S3 Standard) plus egress fees	$0.19 (Filecoin) to $10 (Storj); Arweave ~$5,000 one-time permanent
Performance	Millisecond retrieval; global CDN	Seconds to minutes; improving but not real-time ready
Data Permanence	Exists only while you pay; provider can delete	Arweave: endowment designed to fund 200+ years; Filecoin: contract-enforced proofs
User Control	Account-based; ToS can change; accounts can be frozen	Key-based; cryptographic ownership; no accounts to freeze
Regulatory Compliance	Mature (HIPAA, SOC 2, FedRAMP certifications available)	Developing; GDPR conflicts with immutability; jurisdictional complexity
Developer Experience	Excellent SDKs, extensive documentation, one-click deployment	Improving; Storj offers S3 compatibility; steeper learning curve overall

Table 2: Centralized vs. decentralized storage across key dimensions

Privacy Features by Protocol: A Practical Comparison

Feature	IPFS	Filecoin	Arweave	Storj	Sia
Client-side encryption by default	No	No	No	Yes	Yes
Public content by default	Yes	Yes	Yes (permanent)	No	No
Access control built in	No	No	No	Partial	Partial
On-chain deal metadata visible	N/A	Yes	Yes	No	Yes (host contracts)
Node operator can read data	Yes (if unencrypted)	Yes (if unencrypted)	Yes (if unencrypted)	No (sharded + encrypted)	No (sharded + encrypted)
Cryptographic storage proofs	No	Yes (zk-SNARKs)	Yes (Proof of Access)	Audited contracts	Yes (storage proofs)
Right to deletion	Partial (unpin)	Partial (let contract expire)	No (permanent)	Yes	Yes
Compatible with privacy layers (Oasis, Lit)	Yes	Yes	Yes	Yes	Yes

Table 3: Privacy feature comparison across the major decentralized storage protocols

Conclusion: Privacy Requires Intentional Architecture, Not Just Decentralization

The core insight from this lesson is that decentralization is necessary but not sufficient for privacy. IPFS, Filecoin, and Arweave provide censorship resistance, redundancy, and data integrity. But without additional encryption and access control layers, they are actually more public than a private Google Drive folder, because anyone who knows a CID can retrieve the content. The real privacy gains come from combining decentralized storage with client-side encryption (Storj and Sia do this natively), zero-knowledge proofs (Filecoin’s zk-SNARKs), and confidential access control (Oasis Network, Lit Protocol).

The cost advantage is striking: decentralized storage runs roughly 78% cheaper on average than centralized alternatives. But the deeper value proposition is architectural. In a world where stolen credentials on accounts lacking multi-factor authentication exposed 109 million AT&T customer records and government surveillance authority was just expanded to cover more companies, the question is not whether centralized storage will fail on privacy again. It is when. Decentralized storage, properly layered with privacy-preserving technology, removes the single points of failure, the data honeypots, and the corporate backdoors that make those failures so likely.

The technology is production-ready today. The missing piece is awareness, which is exactly what this lesson was designed to address.

Transparency Note: The video introduction to this lesson was generated using NotebookLM. We’ve included this AI-synthesized summary to offer a visual and conversational way to grasp the core concepts. However, for the specific technical details please rely on the written lesson above.
