Scylla Summit 2018 has ended
Breakout Track 2
Tuesday, November 6
 

1:00pm PST

Joining Billions of Rows in Seconds with One Database Instead of Two: Replacing MongoDB and Hive with Scylla
Many organizations struggle to balance traditional big data infrastructure with NoSQL databases. Other organizations do the smart thing and consolidate the two. This presentation explores Numberly’s experience migrating an intensive and join-hungry production workload from MongoDB and Hive to Scylla. Using Scylla, we were able to accommodate a join of billions of rows in seconds, while also dramatically reducing operational and development complexity by using a single database for our hybrid analytical use case. As a bonus, we’ll cover benchmarks for Dask (a flexible parallel computing library for analytic computing) and Spark, highlighting their differences and the lessons we learned along the way.

Speakers

Alexys Jacob

CTO, Numberly
Numberly helps advertisers maximize the efficiency of their marketing strategy by combining digital expertise, strategic marketing vision, a hands-on approach and big data skills. Alexys is passionate about distributed computing patterns and architectures and has given talks on these...


Tuesday November 6, 2018 1:00pm - 1:30pm PST
Breakout 2 - Sequoia

1:30pm PST

Access-control in Scylla: What You Can Do, How It Works, and Why It's Worth the Trouble
The security of data managed by Scylla is crucial. There are many aspects of systems and information security, and Scylla includes features to address an important selection of them. In this talk, we'll discuss Scylla's support for managing identities and for defining schemes for limiting access to resources based on roles. We will discuss how these features tie in to principles of secure systems, briefly describe how the functionality is implemented, and finally demonstrate the user perspective.

Speakers

Jesse Haber-Kucharsky

Software Engineer, ScyllaDB
Jesse has a strong interest in systems programming and applying math to solve engineering problems. He has worked on the software platform for self-driving cars, NFC drivers for smartphones, and large-scale distributed storage systems. He has a BSc from the University of Waterloo...


Tuesday November 6, 2018 1:30pm - 2:00pm PST
Breakout 2 - Sequoia

2:00pm PST

Introducing ValuStor, A Memcached Alternative Made to Run with Scylla
In this presentation, we share approaches to replacing RAM-only caching infrastructure while achieving high performance against a persistent datastore. Memcached has proven very popular, but it also requires its users to sacrifice reliability, scalability, redundancy, availability, and security. To address these issues, Sensaphone implemented a memcached replacement called ValuStor, an easy-to-use key-value database client layer written in C++ that works well with Scylla. ValuStor includes features like client-side write queues, multi-threading support, automatic adaptive consistency, and support for multiple data types (including JSON).

Speakers

Derek Ramsey

Software Engineering Manager, Sensaphone
Derek is the Software Engineering Manager at Sensaphone in Aston, Pennsylvania. His team produces remote monitoring devices that protect critical infrastructure, alerting when things go wrong and logging time-series data. A longtime advocate for free and open content, Derek authored...


Tuesday November 6, 2018 2:00pm - 2:30pm PST
Breakout 2 - Sequoia

2:30pm PST

Kiwi.com Migration to Scylla: The Why, the How, the Fails and the Status
At Kiwi.com we never stop innovating our product and our architecture. Over the past couple of years, we saw a significant rise in technology requirements both globally and internally and had already tried several database solutions. The transformation went from small applications to complex microservices architectures. We first migrated to Cassandra from a big PostgreSQL cluster to get better performance and scalability, but our demands never stopped growing. That is why we decided to go with Scylla. In this talk, I will cover how our team approached testing Scylla, the migration plan, how the move affects our business, and how it influenced the high-level architecture of our application and infrastructure. It has had a significant impact on the disaster recovery and availability of our overall system.

Speakers

Martin Strýček

Engineering Manager, Kiwi.com
Passionate about his work helping companies grow by constant innovation in technology, Martin was employee number 3 at piano.io, a world leader for online content monetization. Steered technology growth at Exponea, the Fastest Growing SaaS in Europe. Now managing ScyllaDB and GCP...


Tuesday November 6, 2018 2:30pm - 3:00pm PST
Breakout 2 - Sequoia

3:20pm PST

From SAP to Scylla: Tracking the Fleet at GPS Insight
Originally using SAP Adaptive Server Enterprise (ASE), the GPS Insight team soon found that relational databases simply aren’t a match for high-volume machine data. To top it off, SAP ASE’s clustering technology proved cumbersome to manage and operate. In this presentation, you’ll learn about GPS Insight’s hybrid Scylla deployment, which runs both on-premises and on AWS. GPS Insight relies on Scylla to capture and analyze GPS data, offloading data from its RDBMS to Scylla for a hybrid analytics approach.

Speakers

Doug Stuns

Cassandra/Scylla Engineer, GPS Insight
Senior systems integrator specializing in enterprise deliverables focusing on Oracle, MySQL, PostgreSQL and NoSQL (Scylla, Cassandra, Couchbase/DB, Hadoop-based solutions) with development of e-commerce web applications across multiple industries. Doug has designed, developed, deployed...


Tuesday November 6, 2018 3:20pm - 3:45pm PST
Breakout 2 - Sequoia

3:45pm PST

Best Practices for Running Spark with Scylla
Spark and Scylla deployments are a common theme. Executing analytics workloads on transactional data provides insights to the business team. ETL workloads using Spark and Scylla are common too. We cover different workloads we have seen in practice and how we helped optimize both Spark and Scylla deployments to support a smooth and efficient workflow. Best practices we discuss include correctly sizing the Spark and Scylla nodes, tuning partition sizes, and setting connector concurrency and Spark retry policies. In addition, we will cover ways to use Spark and Scylla in migrations from different data models.

Speakers

Eyal Gutkind

VP of Solutions, ScyllaDB
Eyal Gutkind is a solution architect for Scylla. Prior to Scylla, Eyal held product management roles at Mirantis and DataStax. Prior to DataStax, Eyal spent 12 years with Mellanox Technologies in various engineering management and product marketing roles. Eyal holds a BSc. degree in...


Tuesday November 6, 2018 3:45pm - 4:10pm PST
Breakout 2 - Sequoia

4:10pm PST

Rebuilding the Ceph Distributed Storage Solution with Seastar
Red Hat built a distributed object storage solution named Ceph, which first debuted ten years ago. Now we are seeing rapid developments in the industry and we want to take advantage of them. In this talk, we will briefly introduce Ceph, revisit the problems we are seeing when profiling its I/O performance with flash devices, and explain why we want to embrace the future by switching to Seastar. We’ll share our experience of how and when we are porting our software to this framework.

Speakers

Kefu Chai

Senior Software Engineer, Red Hat
Kefu is a developer. Currently, he focuses on distributed storage systems.


Tuesday November 6, 2018 4:10pm - 4:35pm PST
Breakout 2 - Sequoia

4:35pm PST

User Briefs: Discord, Nauto
Two organizations will be presenting short talks on their use cases and implementations.

Discord: The Joy of Opinionated Systems

“Infinitely configurable” is just another way to say “so many ways to shoot yourself in the foot you’ll never get bored!” In this talk we briefly explore the pitfalls of common Open Source system design and see what happens when you take the less-travelled path like ScyllaDB has. Discord has saved time and been able to move quickly without incident by deploying Scylla and trusting in the well-formed opinions of others.

Nauto: An Online Method for Merging Time Ranges on Top of Scylla

Nauto devices are installed in fleets to help improve driving behavior and to help fleet managers know who is driving well and who is not. One of the fundamental components around which everything revolves is the notion of “trips” -- a trip being the time from when the vehicle started to when it came to a full stop and parked. Vehicle states such as moving and stopped are inferred from accelerometer data and sent to the cloud servers over LTE connections, sampled at very short intervals.

These states are then combined in an online algorithm that builds trip segments, extending or combining them as and when we get more state updates. Further, each trip segment records attributes such as the route and speed. In order to be able to do this at scale, we create, merge, and delete these trips as they grow in a time-series store on Scylla. The web servers directly serve these routes out of Scylla.

Speakers

Rohit Saboo

Machine Learning Engineering Lead, Nauto
Rohit is leading an ML Engineering team at Nauto, and has worked on various efforts such as finding trips and identifying drivers for Nauto-equipped vehicles and lossless sensor data compression. He was also a founding engineer for a startup working on search-related technologies...

Mark Smith

Director of Engineering, Discord
Mark Smith is currently helping bring the world together around gaming at Discord. Formerly he spent a handful of years in the infrastructure at Dropbox, once writing code that caused the data center team to get paged by the high temperature alarms. When he’s not managing or writing...


Tuesday November 6, 2018 4:35pm - 5:00pm PST
Breakout 2 - Sequoia
 
Wednesday, November 7
 

9:00am PST

Consensus in Eventually Consistent Databases
Eventually consistent databases choose to remain available under failure, allowing for conflicting data to be stored in different replicas (later repaired by background processes). Weakening the consistency guarantees improves not only availability, but also performance, as the number of replicas involved in a given operation can be minimized. There are, however, use-cases that require the opposite trade-off. Indeed, Apache Cassandra and Scylla provide Lightweight Transactions (LWT), which allow single-key linearizable updates. The mechanism underlying LWT is asynchronous consensus. In this talk, we'll describe the characteristics and requirements of Scylla's consensus implementation, and how it enables strongly consistent updates. We will also cover how consensus can be applied to other aspects of the system, such as schema changes, node membership, and range movements, in order to improve their reliability and safety. We will thus show that an eventually consistent database can leverage consensus without compromising either availability or performance.

Speakers

Duarte Nunes

Software Developer, ScyllaDB
Duarte Nunes is a Software Engineer working on ScyllaDB. He has a background in concurrent programming, distributed systems and low-latency software. Prior to ScyllaDB, he worked on MidoNet, an open source distributed network virtualization platform, making it fast and scalable.


Wednesday November 7, 2018 9:00am - 9:30am PST
Breakout 2 - Sequoia

9:30am PST

Make Scylla Fast Again! Find out how using Tools, Talent, and Tracing
Scylla strives to deliver high throughput at low, consistent latencies under any scenario. But in the field things can and do get slower than one would like. Some of those issues come from bad data modelling and anti-patterns. Others come from a lack of resources or bad system configuration, and in rare cases even from product malfunction.

But how do you tell them apart? And once you do, how do you work out how to fix your application or reconfigure your system? Scylla has a rich ecosystem of tools available to answer those questions, and in this talk we’ll discuss the proper use of some of them and how to take advantage of each tool’s strengths. We will discuss real examples using tools like CQL tracing, nodetool commands, the Scylla monitor and others.

Speakers

Avi Kivity

CTO, ScyllaDB
Avi Kivity, CTO of ScyllaDB, is known mostly for starting the Kernel-based Virtual Machine (KVM) project, the hypervisor underlying many production clouds. He worked for Qumranet and Red Hat as KVM maintainer until December 2012. Avi is now CTO of ScyllaDB, a company that seeks...


Wednesday November 7, 2018 9:30am - 10:00am PST
Breakout 2 - Sequoia

10:00am PST

Kiwi.com Takes Flight with Scylla
Kiwi.com provides a powerful flight, train and bus search engine driven by volatile data — entries expire in just a couple of days. The compute engine loads data every couple of hours from the cluster, running in blue-green deployment and conducting several simultaneous A/B tests. To keep full table scans predictable, Kiwi.com implemented a dedicated cache to store post-processed results from the database. Where Cassandra’s limitations forced the team to implement a custom scanning service to read newly created SSTables and stream updates to the cache, Scylla made it easy and safe to do performant full-table scans. We will cover our Cassandra-to-Scylla migration, benchmarking on GCP and bare-metal OVH, and our performance results, with a primary focus on full table scans, as well as the rest of our benchmarking results.

Speakers

Jan Plhak

Head of C++ Development, Kiwi.com
Jan studied abstract mathematics and has spent the last 5 years tackling a range of challenges in the travel industry, including routing algorithms, custom graph databases, search engines and more.


Wednesday November 7, 2018 10:00am - 10:30am PST
Breakout 2 - Sequoia

10:50am PST

Scaling your time series data with Newts
Today's datasets are growing at an exponential rate. Collection, storage, analysis, and reporting are becoming more challenging, and the results more valued. A decade ago, RRDTool's algorithms were well-suited to our requirements, but they fall short of scaling to current demands. A new direction is needed, one that prioritizes write-optimized storage, and that scales beyond a single host.

This presentation will provide an overview of Newts, a distributed time-series data store based on ScyllaDB, show how it compares to other solutions, and take a look at how it is integrated in OpenNMS.

Speakers

Jesse White

CTO, OpenNMS
Jesse is the CTO of The OpenNMS Group Inc., where he leads the development of OpenNMS, an enterprise grade network management platform. His technical experience ranges from writing Linux Kernel modules in C, continuous integration tooling in Java, and graphing libraries in JavaScript...


Wednesday November 7, 2018 10:50am - 11:20am PST
Breakout 2 - Sequoia

11:20am PST

Keeping Your Latency SLAs No Matter What!
For a real-time Big Data database, few things are more important than keeping latencies low and bounded. Scylla has been delivering great tail latencies since day one, but the job of making them better never ends and there is always more to do. In this talk we will explore some of the changes made to Scylla in the past few releases to help keep latencies down.

Speakers

Glauber Costa

VP Field Engineering, ScyllaDB
Glauber Costa is VP of Field Engineering at ScyllaDB. He shares his time between the engineering department working on upcoming Scylla features and helping customers succeed. Before ScyllaDB, Glauber worked with Virtualization in the Linux Kernel for 10 years, with contributions ranging...


Wednesday November 7, 2018 11:20am - 11:50am PST
Breakout 2 - Sequoia
 