Collaborative, Large-Scale Data Analytics and Visualization with Python

Published on 2016-07-26 ● Video Link: https://www.youtube.com/watch?v=6rlHo2XHfRM



Duration: 1:22:38


NumPy and, more recently, Pandas have made Python ubiquitous for scientific computing and data analytics. The Python technical stack works very well for a wide variety of problems that fit in a single address space (the RAM of a single computer). For problems that require larger data sets, current approaches use memory-mapped files, MPI, IPython parallel, and/or a standard map-reduce system such as Disco or Hadoop. These techniques typically complicate the software solution significantly compared with the simple array-oriented (or table-oriented) expressions that make NumPy (and Pandas) so powerful and popular. They can also cause significant data movement throughout the memory hierarchy, which is the common bottleneck in data-centric computing today.

Blaze is an array and table library for Python that can be used to manage and manipulate very large, disjoint data sets in an array-oriented fashion. It is built on a C++ library (dynd) that provides dynamic, multi-dimensional arrays with flexible data types. It also leverages Numba, an array-oriented Python compiler that translates a subset of Python syntax to LLVM IR and optimized machine code.

In this talk I will discuss the design and roadmap of Blaze and Numba. I will also provide an overview and example of Bokeh, which allows Python developers to easily produce interactive, web-based visualizations, leading into an overview of Wakari, which provides easy access to executable IPython notebooks in the cloud.
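As a rough illustration of the complication the abstract describes, the sketch below computes the same mean twice: once as a single in-memory NumPy expression, and once over a memory-mapped file processed in chunks. It is not taken from the talk and does not show how Blaze works. It assumes NumPy is installed, and the function name `chunked_mean`, the file name `values.bin`, and the chunk size are hypothetical choices made for the example.

```python
import os
import tempfile

import numpy as np


def chunked_mean(path, n_items, chunk_size=100_000, dtype=np.float64):
    """Mean of a flat binary file of `dtype` values, read chunk by chunk.

    Hypothetical helper for illustration only.
    """
    # mode="r" maps the file read-only; pages are loaded lazily on access.
    data = np.memmap(path, dtype=dtype, mode="r", shape=(n_items,))
    total = 0.0
    for start in range(0, n_items, chunk_size):
        # Each slice touches only one chunk of the file at a time.
        total += float(data[start:start + chunk_size].sum())
    del data  # drop the mapping so the temporary file can be removed
    return total / n_items


# Small stand-in data set so the sketch runs quickly; a real out-of-core
# workload would be larger than RAM.
n = 1_000_000
values = np.arange(n, dtype=np.float64)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "values.bin")  # assumed file name
    values.tofile(path)

    in_memory_mean = float(values.mean())     # the one-line array expression
    out_of_core_mean = chunked_mean(path, n)  # same result, more bookkeeping

print(in_memory_mean, out_of_core_mean)
```

The chunked version needs explicit shape, dtype, and loop bookkeeping that the array expression hides, which is the kind of overhead an array-oriented interface to out-of-core data aims to remove.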






