Tech Talk: Obtaining and Validating Big Data

Video Link: https://www.youtube.com/watch?v=qhpM0r6toB0



Duration: 13:10


In this video, Konstantinos Pouliasis presents a toolkit that combines bash scripting, a refined Postgres database design, and Node.js libraries for fetching, validating, and storing big data that government sources make publicly available.

The talk covers basic Unix shell scripting constructs and specific commands. The curl command fetches the zipped CSV files, and unzip is combined with sed to extract the data and perform an initial cleanup. The integrity of the CSV data is then validated on the database side, and a database design based on Postgres partitioning is described as a good fit for large seasonal data sets. Finally, the pg-pool library, with its pooled concurrent connections, is presented as an ideal complement for automating data insertion with Node.js.
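
As a rough illustration of the fetch-and-cleanup step, here is a minimal shell sketch. It is not the script shown in the talk: the function names clean_csv and fetch_and_clean, the URL argument, and the specific cleanup rules (dropping carriage returns, trimming trailing whitespace, deleting empty lines) are all assumptions chosen for the example.

```shell
#!/bin/sh
set -eu

# Hypothetical cleanup rules (assumed, not taken from the talk):
# remove carriage returns, trim trailing whitespace, delete empty lines.
clean_csv() {
  tr -d '\r' | sed -e 's/[[:space:]]*$//' -e '/^$/d'
}

# Hypothetical helper: download a zipped CSV and stream its cleaned
# contents to stdout.
#   $1 = URL of the zip archive (placeholder supplied by the caller)
#   $2 = local path where the archive is saved
fetch_and_clean() {
  curl --fail --silent --show-error --location --output "$2" "$1"
  # unzip -p writes the archive's file contents to stdout instead of disk
  unzip -p "$2" | clean_csv
}

# Offline demonstration of the cleanup on inline sample data
printf 'id,amount\r\n1,10  \r\n\r\n2,20\r\n' | clean_csv
```

Streaming unzip -p into sed avoids writing the uncompressed CSV to disk, which can matter for large files; the cleaned output could then be redirected to a file or piped on to a database loader.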

----

Fullstack Academy was recently ranked the #1 coding bootcamp in the U.S. Learn more at https://www.fullstackacademy.com.