Respond to the self-introduction discussion.
We will do self-introductions this Friday via Zoom.
Course GitHub organization invitation https://github.com/tulane-math-7360-2021
Go to GitHub Education to get your student benefits (you need to use your tulane.edu email address).
Once you have your GitHub ID, please send it to me by email (xji4@tulane.edu). I will then invite your GitHub ID to the course GitHub organization.
Project and homework submission via GitHub
Our course has a GitHub page. Please take a look at https://tulane-math-7360-2021.github.io/ and its source code at https://github.com/tulane-math-7360-2021/tulane-math-7360-2021.github.io. You can browse the website's full development history through its commit history.
Lab sessions
There will be a few recordings in the future (not many).
You do need to submit your lab “work” by pushing it to your Git repository on the course GitHub organization.
For future lab sessions that include questions, “solutions” will be posted after the following Monday's lecture.
Using R is a course objective
try to use R as much as possible for lab sessions and homework assignments
feel free to use any language for the course project
Homework assignments start in week 2; the first assignment is due in week 4. Expected frequency: one assignment every 2-3 weeks.
Optional reading material will be provided on the course webpage.
The GitHub page contains the most up-to-date materials.
Find a dataset of interest to you.
Turn in a brief one-page description by the end of week 3. (points: 3/30)
Submit a mid-term report (2-4 pages, no more than 4 please) by the end of week 12. (points: 7/30)
Present your work to your peers in weeks 15 and 16 (December 3, 6, 8, and 10). (points: 10/30)
Submit a final report (4-8 pages, no more than 8 please) by the end of the semester, December 18 (early submissions are encouraged).
Submit your code to your own private GitHub repository on the course GitHub organization by December 18. (Report + Code, points: 10/30)
(Optional) make a GitHub page for your project.
(Optional) make an R shiny app to showcase your findings.
Amazon data http://jmcauley.ucsd.edu/data/amazon/, https://nijianmo.github.io/amazon/index.html, https://cseweb.ucsd.edu/~jmcauley/datasets.html
Sports/eSports prediction
Hurricane prediction!
1000 Genomes Project
Reproduce findings of a paper in your field (could be hard).
Google “data science projects” to get more ideas
Additions:
Include the brief description with modifications if needed
Give an abstract of your plan
Current progress and future plan
Introduce the dataset. Explain why you chose it. Explain what questions you want to ask and explore using the dataset.
Analysis. Explain the statistical methods that you use for analyzing the dataset. Explain what you have done to generate the results (make your analysis reproducible).
Results. Illustrate your results. Use figures and tables to improve readability.
Discussions. This is the place to share almost anything else: difficulties you met in the analysis, what you learned from it, and possible future directions.
Previous lecture notes from Dr. Michelle Lacey (Math Department @ Tulane)
Course material from Dr. Hua Zhou (Biostatistics Department @ UCLA)
Various online sources
Statistics, the science of data analysis, is the applied mathematics of the 21st century.
Data is increasing in volume, velocity, and variety.
My favorite definition of a data scientist:
A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.
@Huber94HugeData; @Huber96MassiveData
| Data Size | Bytes | Storage Mode |
|---|---|---|
| tiny | \(10^2\) | piece of paper |
| small | \(10^4\) | a few pieces of paper |
| medium | \(10^6\) (MB) | a floppy disk |
| large | \(10^8\) | hard disk |
| huge | \(10^9\) (GB) | hard disk(s) |
| massive | \(10^{12}\) (TB) | hard disk(s); RAID storage |
This course introduces some computing skills and software tools for handling data.
Read the syllabus and the About the course page for a tentative list of topics and course logistics.
Comments from 2020 course evaluations
From “Additional comments about your experience in this course”:
I learned more from going through the material in the labs than in the actual lectures. I think that maybe smaller homework assignments more frequently would have been more effective than three huge assignments.
I thought the course was well organized. The notes were easy to access for later reference which was really helpful. I think the only thing that could be improved upon is spending a little less time on learning R and spending more time discussing the theory behind Data Analysis. Some of the concepts addressed at the end felt rushed and they were the concepts with which I wanted to spend more time.
Awesome! Professor Ji is very nice. He answered all my confusions. He always explain things clear enough for us to comprehend.
Dr. Ji is very knowledgeable and goes out of his way to be available as a resource for students needing help with the course or outside R/research/data questions. He made the structure of the course very flexible without sacrificing rigor, which I think everyone really appreciated during this stressful semester. I had experience in statistics/R going in, but have improved so much and learned so many new packages and strategies thanks to this course. I especially benefitted from the homework assignments, which provided practice with skills that I have been able to immediately apply to my research.
From “Comment on the strongest aspects of this course”:
I liked that this class had a lab component. The labs were helpful in making sure that I was understanding the material.
I received a lot of help from Professor Xiang, who was always was ready to answer my questions in class or through email.
Dr. Ji was by far the best part of the course. He was extremely helpful and prompt with responses to questions. He wanted us to do well in the course.
Useful in learning R
This course started from the very basics and covered a lot of information quickly, but was extremely useful and approachable for students coming from different departments with different levels of previous experience. I really appreciated that the course was tailored towards providing a brief overview of many relevant topics and developing practical skills, rather than memorizing formulas or taking long comprehensive exams. I would highly recommend taking this course with Dr. Ji to anyone who uses R or wants to start learning.