The Case for Cross-entity Delta Encoding in Web Compression (Extended)

Authors

  • Benjamin Wollmer 1) University of Hamburg, Germany, 2) Baqend, Hamburg, Germany
  • Wolfram Wingerath 1) University of Oldenburg, Germany, 2) Baqend, Hamburg, Germany
  • Sophie Ferrlein Baqend, Hamburg, Germany
  • Fabian Panse University of Hamburg, Germany
  • Felix Gessert 1) University of Hamburg, Germany, 2) Baqend, Hamburg, Germany
  • Norbert Ritter University of Hamburg, Germany

DOI:

https://doi.org/10.13052/jwe1540-9589.2217

Keywords:

Delta encoding, caching, dictionary compression

Abstract

Delta encoding and shared dictionary compression (SDC) for accelerating Web content have been studied extensively in research over the last two decades, but have found only limited adoption in industry so far: compression approaches that use a custom-tailored dictionary per website have all failed in practice because of missing browser support and high overall complexity. General-purpose SDC approaches such as Brotli reduce complexity by shipping the same dictionary for all use cases, while most delta encoding approaches consider only similarities between versions of the same entity (but not between different entities). In this study, we investigate how much of the potential benefit of SDC and delta encoding is left on the table by these two simplifications. As our first contribution, we describe the idea of cross-entity delta encoding, which uses cached assets from the immediate browser history for content encoding instead of a precompiled shared dictionary; this avoids the need to create a custom dictionary while still enabling highly customized and efficient compression. Second, we present an experimental evaluation of compression efficiency that compares cross-entity delta encoding against state-of-the-art Web compression algorithms. We deliberately include algorithms that are not yet available in browsers in order to understand their potential value before investing resources to build them. Our results indicate that cross-entity delta encoding is over 50% more efficient for text-based resources than the compression industry standards. We hope our findings motivate further research and development on this topic.

The extended version of our previously published paper [10] includes an additional section on the deltas of HTML files, a more detailed description of our approach (including a new visualization for the different dictionary strategies), a deeper discussion of compression efficiency, and details on additional future and ongoing work.
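
To illustrate the core idea from the abstract, the following minimal sketch compresses one page using a different, previously cached page as the dictionary. It is an assumption-laden illustration and not the implementation evaluated in the paper: it uses the preset-dictionary feature of Python's zlib module (Deflate) as a stand-in for the algorithms studied, and the template, the two pages, and all function names are hypothetical.

```python
import zlib

# Hypothetical template shared by two different entities (pages) of one site.
TEMPLATE = (
    b"<!doctype html><html><head><meta charset='utf-8'>"
    b"<link rel='stylesheet' href='/assets/site.css'>"
    b"<script src='/assets/app.js' defer></script><title>TITLE</title></head>"
    b"<body><nav><a href='/'>Home</a><a href='/products'>Products</a>"
    b"<a href='/blog'>Blog</a><a href='/contact'>Contact</a></nav>"
    b"<main>BODY</main><footer>Imprint | Privacy | Terms</footer></body></html>"
)

# page_a plays the role of an asset already in the browser cache;
# page_b is the new response that the server wants to deliver.
page_a = TEMPLATE.replace(b"TITLE", b"Running shoes").replace(
    b"BODY", b"<h1>Running shoes</h1><p>Light and fast.</p>")
page_b = TEMPLATE.replace(b"TITLE", b"Winter hats").replace(
    b"BODY", b"<h1>Winter hats</h1><p>Warm and soft.</p>")


def compress_plain(data: bytes) -> bytes:
    """Baseline: compress the response without any dictionary."""
    c = zlib.compressobj(level=9)
    return c.compress(data) + c.flush()


def compress_with_cached(data: bytes, cached: bytes) -> bytes:
    """Server side: use a cached asset of another entity as the dictionary."""
    c = zlib.compressobj(level=9, zdict=cached)
    return c.compress(data) + c.flush()


def decompress_with_cached(payload: bytes, cached: bytes) -> bytes:
    """Client side: decode with the same cached asset as the dictionary."""
    d = zlib.decompressobj(zdict=cached)
    return d.decompress(payload) + d.flush()


if __name__ == "__main__":
    plain = compress_plain(page_b)
    delta = compress_with_cached(page_b, page_a)
    assert decompress_with_cached(delta, page_a) == page_b
    print(len(page_b), len(plain), len(delta))
```

Both sides must agree on which cached asset serves as the dictionary; how that is negotiated is outside the scope of this sketch. Deflate can also only reference the last 32 KiB of a dictionary, so larger cached assets would need a format with a bigger window.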

Author Biographies

Benjamin Wollmer, 1) University of Hamburg, Germany, 2) Baqend, Hamburg, Germany

Benjamin Wollmer is a data engineer at Baqend as well as a PhD student in the databases and information systems group (DBIS) at the University of Hamburg. His PhD thesis is supervised by Norbert Ritter, and his research interests revolve around efficient data transmission and compression algorithms on the Web. As a data engineer at Baqend, Benjamin is also part of the team that develops and operates the real-user monitoring and analytics solution built into Speed Kit to generate performance-critical insights.

Wolfram Wingerath, 1) University of Oldenburg, Germany, 2) Baqend, Hamburg, Germany

Wolfram Wingerath is a professor of data science at the University of Oldenburg (UOL) and Data Science Advisor at Baqend. Before joining UOL in 2022, Wolle headed Baqend’s data engineering team and was responsible for developing and operating Baqend’s real-user monitoring pipeline for zero-latency analytics. His research interests revolve around data-intensive applications and high-performance web infrastructure, but he also has a passion for hands-free coding to increase productivity (https://handsfree-coding.gi.de). Wolle has published several books, articles, and tutorials on these topics together with his colleagues, and frequently talks about his research at scientific and developer conferences.

Sophie Ferrlein, Baqend, Hamburg, Germany

Sophie Ferrlein is a Data Scientist at Baqend where she turns web tracking data into actionable web performance insights and data products. Based on her BSc in Computer Science for Media Applications and her BA in Journalism, she focuses her professional efforts on communicating data effectively.

Fabian Panse, University of Hamburg, Germany

Fabian Panse currently heads the databases and information systems group (DBIS) at the University of Hamburg as a substitute professor, having previously been a postdoctoral researcher there. Fabian has been doing research in the fields of deduplication, uncertain data management, and data quality since 2009. During this time, he has written several papers that address the problems of measuring data quality, evaluating duplicate detection algorithms, and generating test data.

Felix Gessert, 1) University of Hamburg, Germany, 2) Baqend, Hamburg, Germany

Felix Gessert is the CEO and co-founder of the Backend-as-a-Service company Baqend. During his PhD studies at the University of Hamburg, he developed the core technology behind Baqend’s Web performance service. Felix is passionate about making the Web faster by turning research results into real-world applications. He frequently talks at conferences about exciting technology trends in data management and Web performance. As a Junior Fellow of the German Informatics Society (GI), he is working on new ideas to facilitate the research transfer of academic computer science innovation into practice.

Norbert Ritter, University of Hamburg, Germany

Norbert Ritter is the Dean of the Faculty for Mathematics, Informatics, and Natural Science of the University of Hamburg, but headed the databases and information systems group (DBIS) as a full professor until August 2022. Norbert received his PhD from the University of Kaiserslautern in 1997. His research interests include distributed and federated database systems, transaction processing, caching, cloud data management, information integration, and autonomous database systems. He has been teaching NoSQL topics in various database courses for several years. Seeing the many open challenges for NoSQL systems, he, Felix, Wolle, and Benjamin have been organizing the annual Scalable Cloud Data Management Workshop (https://scdm.cloud) to promote research in this area.

References

[1] Jyrki Alakuijala, Andrea Farruggia, Paolo Ferragina, Eugene Kliuchnikov, Robert Obryk, Zoltan Szabadka, and Lode Vandevenne. Brotli: A General-Purpose Data Compressor. ACM TOIS, 37(1).

[2] Mun Choon Chan and T.Y.C. Woo. Cache-Based Compaction: A New Technique for Optimizing Web Transfer. In IEEE INFOCOM ’99. Conference on Computer Communications.

[3] Dane Orion Knecht, John Graham-Cumming, and Matthew Browning Prince. Method and Apparatus for Reducing Network Resource Transmission Size Using Delta Compression.

[4] David G. Korn and Kiem-Phong Vo. Engineering a Differencing and Compression Data Format. In USENIX Annual Technical Conference, General Track, pages 219–228, 2002.

[5] Bryan McQuade, Kenneth Mixter, Wei-Hsin Lee, and Jon Butler. A Proposal for Shared Dictionary Compression over HTTP. 2016.

[6] Jeffrey C. Mogul, Fred Douglis, Anja Feldmann, and Balachander Krishnamurthy. Potential Benefits of Delta Encoding and Data Compression for HTTP. SIGCOMM CCR, 1997.

[7] Omer Shapira. SDCH at LinkedIn. 2015. Accessed: 2022-01-20.

[8] Wolfram Wingerath, Felix Gessert, Erik Witt, Hannes Kuhlmann, Florian Bücklers, Benjamin Wollmer, and Norbert Ritter. Speed Kit: A Polyglot & GDPR-Compliant Approach For Caching Personalized Content. In ICDE, Dallas, Texas, 2020.

[9] Benjamin Wollmer, Wolfram Wingerath, Sophie Ferrlein, Felix Gessert, and Norbert Ritter. Compaz: Exploring the Potentials of Shared Dictionary Compression on the Web. In 22nd International Conference on Web Engineering, ICWE, 2022.

[10] Benjamin Wollmer, Wolfram Wingerath, Sophie Ferrlein, Fabian Panse, Felix Gessert, and Norbert Ritter. The Case for Cross-Entity Delta Encoding in Web Compression. In Proceedings of the 22nd International Conference on Web Engineering (ICWE).

[11] Benjamin Wollmer, Wolfram Wingerath, and Norbert Ritter. Context-Aware Encoding & Delivery in the Web. In 20th International Conference on Web Engineering, ICWE, 2020.

Published

2023-04-20

How to Cite

Wollmer, B., Wingerath, W., Ferrlein, S., Panse, F., Gessert, F., & Ritter, N. (2023). The Case for Cross-entity Delta Encoding in Web Compression (Extended). Journal of Web Engineering, 22(01), 131–146. https://doi.org/10.13052/jwe1540-9589.2217

Section

ICWE2022