CUSTOMIZED SERVER INFRASTRUCTURE
We started at the lowest level. We fabricated our own chassis, sourced hardware components tailored to our specific workload, and deployed the assembled servers in our own datacenter. We also designed and deployed a custom layout for our server racks in order to optimize cooling, ease of maintenance, networking speed, and power redundancy.
Each of our servers also runs our own flavor of Linux that’s customized to ensure data moves quickly throughout our system and reduce variability in performance.
RELIABILITY BY DESIGN
By building and integrating our entire technology stack, we gained a deep understanding of how the system behaves and how it can fail. This has allowed us to add redundancy and appropriate failover policies at each integration point, minimizing the impact of failures on our customers.
At the infrastructure level, we’ve designed our system for maximum availability so that our users can reliably access their content stored in Upthere. We are also constantly improving our storage redundancy mechanisms, such as replication and erasure coding, to prevent data loss and corruption. We have even built our own monitoring and analytics platform to provide real-time insights into our system’s performance and reliability.
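As a rough illustration of the erasure-coding idea mentioned above, the sketch below uses single XOR parity, the simplest possible erasure code. It is not a description of Upthere's actual scheme, which the text does not specify. The function names `encode_with_parity` and `recover_missing` are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_with_parity(blocks: List[bytes]) -> List[bytes]:
    """Return the data blocks plus one parity block (XOR of all data blocks)."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of the same length")
    return list(blocks) + [reduce(xor_blocks, blocks)]


def recover_missing(stored: List[Optional[bytes]]) -> List[Optional[bytes]]:
    """Rebuild at most one lost block (marked None) from the surviving blocks."""
    missing = [i for i, b in enumerate(stored) if b is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost block")
    repaired = list(stored)
    if missing:
        survivors = [b for b in stored if b is not None]
        # XOR of all survivors cancels everything except the lost block.
        repaired[missing[0]] = reduce(xor_blocks, survivors)
    return repaired
```

With three data blocks and one parity block, any single block can be lost and rebuilt at a storage overhead of one third, whereas plain replication would need a full extra copy. Production systems generally use codes that tolerate more than one loss.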
All the code running on our servers is written with the mandate that losing customer data is unacceptable, even though we know things can and will go wrong. Our data model is based on idempotent user operations, so if servers fail or otherwise hit an irrecoverable error, we can replay operations in the cloud later to recover state changes that would otherwise be lost.
Finally, our client applications operate under the same paradigm of idempotent user operations. This enables our clients to safely retry and replay operations in order to update local state more efficiently and ensure user changes always make it to the cloud.
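To make the replay idea concrete, here is a minimal sketch of idempotent operations. It assumes each operation carries a client-chosen unique ID and is a simple "set key to value" mutation. The `Operation`, `Store`, and `replay` names are hypothetical and not taken from Upthere's code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class Operation:
    op_id: str   # unique ID (assumed to be chosen by the client)
    key: str
    value: str   # "set key to value": same result however often it is applied


@dataclass
class Store:
    state: Dict[str, str] = field(default_factory=dict)
    applied: Set[str] = field(default_factory=set)

    def apply(self, op: Operation) -> bool:
        """Apply op at most once; return False if it was already applied."""
        if op.op_id in self.applied:
            return False
        self.state[op.key] = op.value
        self.applied.add(op.op_id)
        return True


def replay(store: Store, log: List[Operation]) -> int:
    """Replay a log in order; return how many operations were newly applied."""
    return sum(store.apply(op) for op in log)
```

Because already-applied operations are skipped, a server or client that is unsure how far it got before a failure can replay the whole log and still end up in the correct state.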
EFFICIENT AND FLEXIBLE STORAGE MODEL
We also focused on performance when designing the way we store files. When data enters the system, Upthere extracts and indexes metadata, storing it separately from the full payload. This allows us to optimize data retrieval for specific usage patterns. It also enables us to auto-organize content and provide a powerful search engine.
This design also benefits client devices, which can minimize data usage by only fetching and caching the specific metadata and previews needed for the task at hand.
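The separation of metadata from payload can be sketched as follows. This is an illustrative toy and not Upthere's storage layer: `SplitStore`, its in-memory dictionaries, and the whitespace-based indexing are all assumptions made for the example.

```python
from typing import Dict, List, Set


class SplitStore:
    """Keeps small, searchable metadata apart from large payloads."""

    def __init__(self) -> None:
        self.metadata: Dict[str, Dict[str, str]] = {}
        self.payloads: Dict[str, bytes] = {}
        self.index: Dict[str, Set[str]] = {}  # lowercased term -> item IDs
        self.payload_reads = 0                # counts full-payload fetches

    def put(self, item_id: str, payload: bytes, meta: Dict[str, str]) -> None:
        self.payloads[item_id] = payload
        self.metadata[item_id] = dict(meta)
        # Index every whitespace-separated term of every metadata value.
        for value in meta.values():
            for term in value.lower().split():
                self.index.setdefault(term, set()).add(item_id)

    def search(self, term: str) -> List[str]:
        """Answer a search from the index alone, without reading any payload."""
        return sorted(self.index.get(term.lower(), set()))

    def get_metadata(self, item_id: str) -> Dict[str, str]:
        return self.metadata[item_id]

    def get_payload(self, item_id: str) -> bytes:
        self.payload_reads += 1
        return self.payloads[item_id]
```

Searching and browsing touch only the small metadata structures. The large payload is read only when `get_payload` is called explicitly, which is the property that lets a client limit its data usage.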
SCALABLE PROCESSING PIPELINE
We decoupled our data-processing pipeline (which handles things like metadata extraction and indexing) from the actual data-ingestion pipeline. This allows us to manage the scale of our compute resources independently of our ingestion machines.
This decoupling provides a major operational benefit when scaling our backend infrastructure, and it also ensures that intensive data processing requirements will never inhibit our ability to reliably save and store new data coming into the system.
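One common way to get this kind of decoupling is to put a queue between ingestion and processing. The sketch below assumes that design. The text does not say how Upthere connects the two pipelines, and `run_pipeline` and its internals are hypothetical names.

```python
import queue
import threading
from typing import Dict, Optional


def run_pipeline(uploads: Dict[str, bytes], num_workers: int) -> Dict[str, dict]:
    """Ingest every upload, then let a separate worker pool process them."""
    if num_workers < 1:
        raise ValueError("need at least one processing worker")
    durable: Dict[str, bytes] = {}   # stand-in for durable storage
    extracted: Dict[str, int] = {}   # stand-in for extracted metadata
    work: "queue.Queue[Optional[str]]" = queue.Queue()
    lock = threading.Lock()

    def ingest(item_id: str, payload: bytes) -> None:
        durable[item_id] = payload   # saving never waits on processing
        work.put(item_id)            # processing is only scheduled here

    def worker() -> None:
        while True:
            item_id = work.get()
            if item_id is None:      # sentinel: shut this worker down
                work.task_done()
                return
            size = len(durable[item_id])  # placeholder for real extraction
            with lock:
                extracted[item_id] = size
            work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for item_id, payload in uploads.items():
        ingest(item_id, payload)
    work.join()                      # wait until all queued items are processed
    for _ in threads:
        work.put(None)
    for t in threads:
        t.join()
    return {"stored": durable, "metadata": extracted}
```

The `num_workers` argument changes processing capacity without touching the ingestion path, and a slow worker only lengthens the queue rather than delaying the save.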
LAYERED SECURITY APPROACH
As a custodian of humankind’s information, we consider data security a top priority. Our vertical integration gives us deep insight into potential vulnerabilities and the control to secure weaknesses. Each component of our stack is compartmentalized and designed with the assumption that any other could be vulnerable, a strategy known as “Defense in Depth.”
Upthere follows best practices like bcrypt-hashed passwords and TLS, but at the end of the day, the weakest link in data security tends to be human behavior. To address this, we enforce stringent password requirements and offer two-factor authentication; going forward, we’ll continue to reinforce our efforts in addressing human vulnerabilities.
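The text does not say which second factor Upthere uses. As one widely deployed possibility, the sketch below implements time-based one-time passwords (TOTP, RFC 6238, built on HOTP from RFC 4226) using only the Python standard library. Treat the choice of TOTP, and the `verify` helper, as assumptions made for illustration.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA-1."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over a time counter."""
    return hotp(secret, unix_time // step, digits)


def verify(secret: bytes, submitted: str, unix_time: int, step: int = 30) -> bool:
    """Check a submitted 6-digit code using a constant-time comparison."""
    return hmac.compare_digest(totp(secret, unix_time, step), submitted)
```

A real deployment would typically also accept codes from adjacent time steps to tolerate clock drift, and would rate-limit attempts. Both are omitted here for brevity.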
Traditional storage services treat consumer devices as independent storage locations, yielding a fragmented data experience. Instead of directly accessing the device- or OS-specific file system to store and cache data, client applications using the Upthere framework queue up mutation "tasks" to be performed at the optimal time. The framework acts as a write-behind cache, asynchronously dispatching tasks to the server when the device has optimal connectivity and providing results to the client when available.
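The queueing behavior described above might look roughly like the following. This is a sketch under stated assumptions: the `TaskQueue` class, the boolean connectivity flag, and the immediate update of a local view are all hypothetical, and the real framework's API is not shown in the text.

```python
from collections import deque
from typing import Callable, Deque, Dict, Tuple


class TaskQueue:
    """Queues mutation tasks locally and dispatches them when online."""

    def __init__(self, send: Callable[[str, str], None]) -> None:
        self.local: Dict[str, str] = {}               # local view (assumed)
        self.pending: Deque[Tuple[str, str]] = deque()
        self.send = send                              # uploads one task
        self.online = False

    def mutate(self, key: str, value: str) -> None:
        """Record a change locally and queue it for the server."""
        self.local[key] = value
        self.pending.append((key, value))
        self.flush()

    def set_online(self, online: bool) -> None:
        """Report a connectivity change; going online drains the queue."""
        self.online = online
        self.flush()

    def flush(self) -> None:
        while self.online and self.pending:
            key, value = self.pending[0]
            self.send(key, value)
            self.pending.popleft()  # dequeue only after send returned
```

Because a task is removed only after `send` returns, a failure mid-dispatch leaves it queued for a later retry, which is safe as long as the operations are idempotent.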
The Upthere framework makes it easy to fetch metadata, previews, and payloads independently, automatically taking care of network utilization and transparently caching data. Clients automatically benefit from speedy retrieval, efficient data usage, and optimized battery consumption with no additional work. This design combines the power of the cloud with the deeply integrated native APIs of modern operating systems to enable seamless, cloud-based storage experiences.
Upthere, on the other hand, treats a file as a piece of rich data that can be consumed in a variety of ways appropriate to the device and use case at hand. Our data-processing pipeline handles a wide array of file types to extract appropriately sized previews, transcoded video streams, and detailed, searchable metadata. The rich metadata extracted from each type of file allows for an extremely efficient consumption and sharing experience for our customers.
When a user wants to consume content, Upthere considers network speed and device form factor in order to optimize what resolutions are transferred to the local device. Client applications pull only what they need in order to provide the fastest, yet still comprehensive, user experience. This also enables customers to view almost any type of file from within one unified experience, Upthere Home, rather than having to find, download, and install specific applications on multiple devices.
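A selection rule of this kind could be sketched as below. The specific policy (pick the smallest rendition that covers the screen, then step down until it fits a download-time budget) is an assumption for illustration, as are the `Rendition` type, `choose_rendition`, and the 2-second default budget.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rendition:
    width: int        # pixels
    size_bytes: int


def choose_rendition(
    renditions: List[Rendition],
    screen_width: int,
    bandwidth_bps: float,
    max_seconds: float = 2.0,
) -> Rendition:
    """Pick the best rendition for this screen that downloads within budget."""
    if not renditions:
        raise ValueError("no renditions available")
    by_width = sorted(renditions, key=lambda r: r.width)
    # Smallest rendition that fills the screen; otherwise the largest we have.
    target = next((r for r in by_width if r.width >= screen_width), by_width[-1])
    budget_bytes = bandwidth_bps / 8 * max_seconds
    affordable = [
        r for r in by_width
        if r.width <= target.width and r.size_bytes <= budget_bytes
    ]
    # If nothing fits the budget, fall back to the smallest rendition.
    return affordable[-1] if affordable else by_width[0]
```

The same file therefore resolves to a small preview on a phone with a weak connection and to a full-resolution rendition on a desktop with a fast link.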
Sharing between devices and people is faster and more efficient with UpOS™ than with traditional sync models. Instead of always downloading or transferring entire payloads between devices, UpOS™ simply distributes available metadata and lightweight previews to recipients. The full payload is only downloaded when the recipient needs it.
Unlike traditional folder hierarchies that require manual maintenance of nested organizational structures, UpOS™ is entirely query-based, allowing for complex query combinations to surface content in whatever way our customers’ minds work. This is also true of our Loops infrastructure, used for organization and sharing; rather than copying or moving payloads between folders, Loops can dynamically represent the same piece of content in different contexts defined by different queries.
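The contrast with folders can be illustrated with a small sketch in which a collection is a saved query over metadata rather than a container of copies. The `by_field`, `all_of`, and `run_query` helpers and the sample metadata fields are hypothetical. The text does not describe how UpOS or Loops represent queries internally.

```python
from typing import Callable, Dict, List

Metadata = Dict[str, str]
Query = Callable[[Metadata], bool]


def by_field(name: str, value: str) -> Query:
    """Match items whose metadata field `name` equals `value`."""
    return lambda meta: meta.get(name) == value


def all_of(*queries: Query) -> Query:
    """Combine queries: an item matches only if every sub-query matches."""
    return lambda meta: all(q(meta) for q in queries)


def run_query(library: Dict[str, Metadata], query: Query) -> List[str]:
    """Return the sorted IDs of all items matching the query."""
    return sorted(item_id for item_id, meta in library.items() if query(meta))


library = {
    "img1": {"type": "photo", "place": "Lisbon", "year": "2016"},
    "img2": {"type": "photo", "place": "Oslo", "year": "2016"},
    "doc1": {"type": "document", "place": "Lisbon", "year": "2015"},
}

# Each collection is just a named query; no payload is copied or moved.
collections = {
    "Lisbon trip": by_field("place", "Lisbon"),
    "Photos 2016": all_of(by_field("type", "photo"), by_field("year", "2016")),
}
```

Here `img1` shows up in both collections while being stored once, and a newly added item appears in every collection whose query it matches with no manual filing.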