Introduction

About Me

Name: Syed Fahad Sultan

سید فہد سلطان

Pronunciation: sæjjɪd fah(aː)d solˈtˤɑːn

Just call me “Dr. Sultan” (Pronounced: Sool-tahn 🔈)

_images/name.png

Fig. 3 Names work differently in different parts of the world!

I am originally from Lahore, Pakistan (Fun fact: Population of Lahore > Population of New York City) and joined Furman University in Fall 2022 after earning my Ph.D. in Computer Science from State University of New York at Stony Brook.

_images/lahore.png

Fig. 4 Badshahi Mosque, Lahore, Pakistan.

Fresh out of college, I worked as a professional video game developer for a startup that was later acquired by the Japanese gaming giant DeNA. During this time, I was part of the team that built TapFish, which was the top-grossing game worldwide for two weeks in 2011 on both the App Store and Google Play.

I then went on to work at the Technology for People Initiative, an applied research lab where I mined social media and cell phone data for proxies of socio-economic indicators that allowed more inclusive policy-making for marginalized communities. During these years, I also dabbled in data journalism and helped organize a boot camp for journalists on using data, with the support of the International Center for Journalists (ICFJ) and the Knight Foundation.

In 2015, I moved to Mecca, Saudi Arabia to work for the GIS Innovation Center (now Wadi Makkah). There I worked on innovative urban sensing techniques for better crowd control during the annual pilgrimage to the city, the largest human gathering in the world every year.

During my Ph.D., I worked at the intersection of computational neuroscience, bioinformatics and machine learning. My work focused on identifying neurological and genetic biomarkers linking type-2 diabetes with cognitive disorders such as Alzheimer’s and other dementias.

1. Video games developer

2. City Planner

3. Neuroscientist

Teaching Assistant

_images/mister.jpeg

Fig. 5 If your code has bugs, Mister Cat will find them!

How to Reach Me

Office: Riley Hall 200-D

Email: fahad.sultan@furman.edu

Office hours:

Monday: 1:30 PM – 4:30 PM

Friday: 9:30 AM – 11:30 AM

For any other time, drop by my office.

I have an open-door policy whenever I am not in class or in a meeting.

_images/office.png

Fig. 6 Computer Science Department Suite in Riley Hall.

If you don’t like your odds of catching me in my office, schedule a 15-minute or 30-minute meeting in advance.

_images/courseweek.png

Fig. 7 Course week. I am in my office most weekdays from 9 AM to 5 PM and have an open-door policy.

About the Course

Course website: https://fahadsultan.com/csc272

The Syllabus is available on the course website. In particular, please make sure to read the Grading, Academic Integrity and Textbook sections carefully.

Last but not least, please go over the Course Project in detail. The project is a major component of the course and will be due at the end of the semester.

All of the course content will be posted on this website.

Important announcements will be made on both the course website homepage and on Moodle.

All assignments and exams should be submitted on Moodle. All grades will be uploaded on Moodle.

Exams

There will be four exams in the course, one roughly every month, including the final. Exams constitute 40% of your course grade.

All exams will be on the computer, with a large programming component. Questions will be posted on Moodle and you will have to submit your solutions on Moodle, just like assignments.

You will be evaluated on your ability to apply knowledge to new problems and not just on your ability to retain and recall information.

The exams (and the project), more than the assignments, are primarily going to determine your grade.

All exams are going to be cumulative, with a focus on the topics covered since the last exam.

You will be assigned an interim grade on Workday after every exam.

Diligent work on the homework and assignments will be rewarded here.

_images/exams.png

Fig. 8 Consistent effort and regular feedback.

Assignments

Approach assignments purely as opportunities to learn, to prepare for exams and to prepare for your career.

It is not worth cheating on assignments. Just come talk to me if you are struggling with an assignment. I will literally just tell you the answer.

On assignments, expect near maximal flexibility from me. For every assignment, the soft (recommended) deadline will be a week after the assignment is posted. The hard deadline will be before the next exam.

In other words, as long as you submit the assignment before the next exam, you will get full credit.

Written Assignments:

Written assignments are to help you build a deeper understanding of algorithms and math covered in class.

These could simply be math problems or involve tracing algorithms and dry-runs.

Both handwritten and typed submissions are acceptable. Submissions, as always, go on Moodle.

Programming Assignments:

Programming assignments are going to be posted at the start of the lab session each week and will be due before the start of the next lab session.

You should expect similar questions on the exams.

Class Participation

I may periodically give out class participation points during class for answering or asking a question.

Given the glut of information accessible online and otherwise in this day and age, meaningful interactions with your peers and teachers are essentially what you are paying college tuition for.

Please come to class, labs and office hours

Please ask questions during class

Please answer questions and participate in discussions during class

Giant Asterisk *

Everything is tentative and subject to change

https://raw.githubusercontent.com/fahadsultan/csc272/main/assets/complaints.jpeg

Fig. 9 Complaints Box on Moodle

This is my first time teaching this course. Any and all feedback is welcome!

I have created an anonymous feedback poll on Moodle. Please use this to anonymously share any feedback.

Share any changes you want me to make in the course, at any point in the semester. You can submit multiple times over the span of the semester.

Think of it as a Complaints Box for the course.

What is Data Mining?

“Data Mining” is a term from the 1990s, back when it was an exciting and popular new field. Around 2010, people instead started to speak of “big data”. Today, the popular term is “data science”. There are some who even regard data mining as synonymous with machine learning. There is no question that some data mining appropriately uses algorithms from machine learning. However, during all this time, the concept remained the same: use the most powerful hardware, the most powerful programming systems, and the most efficient algorithms to solve problems in science, commerce, healthcare, government, the humanities, and many other fields of human endeavor.

_images/venn.png

Fig. 10 Drew Conway’s Data Science Venn Diagram

From the Venn Diagram, the course content is going to cover ✅ Hacking Skills and ✅ Math & Statistics in detail but not ☐ Substantive Expertise. For that missing piece, I strongly encourage you to bring in knowledge from your GERs and other non-CS department courses into this class and the term project in particular. Nothing would make me happier than to see projects that combine CS with your other interests.

Expect lots of Programming and lots of Math in the Course!

“But wait, I am not a Math Person!” you say!

There is no such thing as a “Math Person”. I do recognize, however, that Math Anxiety is a real thing and is very common. It is a feeling of fear based on a belief that one is not good at math or that math is inherently difficult.

Please use this course as an opportunity to overcome your Math anxiety!

In this course, the code you write will be mostly math. Most modern “AI” is just that: math, in code.

This presents a unique opportunity for you to overcome your Math anxiety. You will be able to see the math in action, be able to visualize the results and have a conversation with it.

Trust me, there is a tremendous amount of beauty and joy to be found in mathematics. And if beauty and joy aren’t really your thing, then let me also assure you there is a lot of money to be made these days by being good at coding math. Either way, the rewards are well worth the effort!

Central Dogma of Data Science

Originally, “data mining” or “data dredging” was a derogatory term referring to attempts to extract information that was not supported by the data. Later, around the same time the iPhone was released and social networks started to take off, “data mining” took on a more positive meaning. Data was declared to be the “new oil” and the job of a Data Scientist was declared “The Sexiest Job of the 21st Century”.

Today, I argue that the term “data mining” has mixed connotations. At the heart of the ambivalence towards data mining is the central dogma of data science:

_images/dogma.png

Fig. 11 Commonly known as the Central Dogma of Statistics.

The central dogma of data science, simply put, is that general claims about a population can be made from a sample of data. This raises all sorts of questions and concerns about the sampling process, such as the representativeness of the sample, the size of the sample and sampling bias. These in turn raise concerns about the potential negative effects of claims made on the basis of questionable data.
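To make this concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption chosen for illustration and not real data: the population of made-up ages, the sample size of 500 and the under-30 cutoff used to simulate sampling bias. It estimates the population mean twice, once from a simple random sample and once from a biased sample.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 made-up ages, drawn uniformly from 18 to 90.
population = [random.randint(18, 90) for _ in range(100_000)]
population_mean = statistics.mean(population)

# A simple random sample: every member is equally likely to be picked.
random_sample = random.sample(population, 500)

# A biased sample: only people under 30 can be picked (for example,
# a survey run on a platform used mostly by young people).
young_people = [age for age in population if age < 30]
biased_sample = random.sample(young_people, 500)

# The claim about the population is made from each sample alone.
random_estimate = statistics.mean(random_sample)
biased_estimate = statistics.mean(biased_sample)

print(f"Population mean:        {population_mean:.1f}")
print(f"Random-sample estimate: {random_estimate:.1f}")
print(f"Biased-sample estimate: {biased_estimate:.1f}")
```

The random sample should land close to the true mean even though it looks at only 500 of the 100,000 values, while the biased sample is far off no matter how many values it contains. In practice we rarely get to see the whole population, so we cannot check the estimate this way, which is exactly why the sampling process deserves scrutiny.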

I’ll end here. I hope you are excited about the course and the term project.

In the next lecture, we will begin by introducing one of the most important tools in exploratory data analysis: pandas.

Assignment 0

  1. Go over course syllabus

    • Course website homepage

  2. Written Assignment 0: Pre-course Survey (On Moodle)

    • Please complete before next class