Applying Comparison Operators to DataFrame - p.12 Data Analysis with Python and Pandas Tutorial
Welcome to part 12 of the Data Analysis with Python and Pandas tutorial series. In this tutorial, we're going to talk briefly about handling erroneous/outlier data. Just because a data point is an outlier does not mean it is erroneous. A lot of times, an outlier data point can nullify a hypothesis, so the urge to just get rid of it can be high, but this isn't what we're talking about here.
What would an erroneous outlier be? An example I like to use is measuring fluctuations in something like, say, a bridge. As bridges carry weight, they can move a bit. In storms, they can wiggle about a bit; there is some natural movement. As time goes on and supports weaken, the bridge might move a bit too much and eventually need to be reinforced. Maybe we have a system in place that constantly measures fluctuations in the bridge's height.
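To make the bridge example concrete, here is a minimal sketch of how a comparison operator could be used to drop a reading that is clearly a sensor glitch rather than real movement. The readings, the `meters` column name, and the cutoff of 20 are all made up for illustration; they are assumptions, not values from the tutorial.

```python
import pandas as pd

# Hypothetical bridge-height readings in meters (assumed data).
# The 6212.42 value stands in for an obviously erroneous sensor reading.
readings = {"meters": [10.26, 10.31, 10.27, 10.22, 10.23, 6212.42, 10.28, 10.25, 10.31]}
df = pd.DataFrame(readings)

# Applying a comparison operator to a column returns a boolean Series,
# one True/False per row. The cutoff of 20 is an assumed sanity limit.
plausible = df["meters"] < 20

# Indexing the DataFrame with that boolean Series keeps only the True rows.
cleaned = df[plausible]

print(cleaned)
```

A fixed cutoff like this only works when you already know what a sane reading looks like. When you don't, one option is to compare each reading against something derived from the data itself, such as its standard deviation, using the same boolean-mask pattern.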
Text-based tutorial and sample code: http://pythonprogramming.net/comparison-operators-data-analysis-python-pandas-tutorial/
http://pythonprogramming.net
https://twitter.com/sentdex