Big Data Deduplication and data matching using Python
Andrew Rowe will present the lessons learnt and techniques used to process very large amounts of data from the ABS Census. The Australian Bureau of Statistics (ABS) used Python to investigate data from the 2006 Australian Census. Python is an integral part of the ABS systems that identify duplicated entries and link people in the Census to other ABS collections.

You will learn about:

* Handling large data
* Dealing with confidentiality
* Multiprocessing techniques
* Performance tips and tricks
* The difference between if( 1 < 2 ) and if 1 < 2
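
The last point can be explored before the talk. The sketch below is not taken from the talk: it only checks whether current CPython parses the two spellings differently, using the standard ast module. The helper name parses_identically is made up for illustration, and the talk may have a different aspect of the comparison in mind (style or readability, for example).

```python
import ast


def parses_identically(src_a: str, src_b: str) -> bool:
    """Return True if two source snippets produce the same syntax tree."""
    # ast.dump leaves out line and column attributes by default,
    # so only the structure of the two trees is compared.
    return ast.dump(ast.parse(src_a)) == ast.dump(ast.parse(src_b))


# The two spellings named in the abstract, each with a trivial body.
with_parens = "if( 1 < 2 ):\n    pass\n"
without_parens = "if 1 < 2:\n    pass\n"

print(parses_identically(with_parens, without_parens))
```

In current CPython the parentheses here are plain grouping and leave no trace in the syntax tree, so the check prints True and any difference between the two forms lies outside what the interpreter executes.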