# 24019 CGA's Data Science/Big Data Projects

 



 **Project Title:** 24019 CGA's Data Science/Big Data Projects

 **Keywords:** Data Science, Big Data, High Performance Computing, Cloud Computing, Python

 **Mentor:** Devika Jain, Center for Geographic Analysis, Harvard University

 **Project Description:**

 Big geospatial data comprises datasets that are too large to be processed with traditional GIS tools. The objectives of the GIS Data Science/Geospatial Big Data workstream at CGA are to:

 - Apply data science, machine learning, and AI techniques to complex geospatial analysis

 - Design solutions for geospatial big data problems that cannot be handled by traditional GIS technologies

 - Scale geospatial applications on cluster (FASRC) and cloud (AWS/MOC) computing environments

 - Use geospatial databases (PostGIS, OmniSci) to perform large-scale, complex analysis on big data

 - Visualize large geospatial data at high speed using GPU-based databases and other tools

 Read more here: <https://gis.harvard.edu/gis-data-science-big-data-workstream-cga>

 **Tasks and Responsibilities:**

 The person in this position will work closely with the Data Science Project Manager on various projects to accomplish the following:

 - Applying data science, machine learning, and AI techniques to complex geospatial analysis

 - Designing software solutions (using Python) for geospatial big data problems

 - Scaling geospatial applications on high-performance computing and cloud computing environments

 - Using geospatial databases to perform large-scale, complex analysis on big data

 - Creating high-speed visualizations of large geospatial data using advanced solutions

 - Processing complex and varied geospatial datasets, mainly social media data (Twitter, Facebook, etc.)

 **Minimum Qualifications:**

 - B.S. in geography, computer science, engineering, or a GIS-related field.

 - 2+ years of experience working in a GIS operational environment.

 **Additional Qualifications:**

 - Must be proficient in the design and development of software solutions for geospatial applications.

 - Must be conversant in programming languages such as Python, C/C++, and Java.

 - Hands-on experience with Jupyter Notebook, geodatabases, and Linux is desirable.

 - Must have hands-on experience with high-performance/cluster computing and cloud computing (e.g., AWS).

 - Experience with ESRI big data tools and container applications (such as Docker) is a plus.

 - Superior technical skills, attention to detail, and the ability to function as a contributing team member are required.

 - Must have strong communication skills and experience providing technical support to a broad user community.

 **Terms of Project:** Ongoing