
Cloudera DS200 Exam: CheatTest.com Free DS200 Sample Questions
Q: 1
From historical data, you know that 50% of students who take Cloudera's Introduction to Data Science: Building Recommender Systems training course pass this exam, while only 25% of students who did not take the training course pass this exam.
You also know that 50% of this exam's candidates take Cloudera's Introduction to Data Science: Building Recommender Systems training course.
What is the probability that any individual exam candidate will pass the data science exam?
A. 3/8
B. 1/4
C. 1/8
D. 1/2
Answer: A
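The result for Q: 1 follows from the law of total probability: P(pass) = P(course) x P(pass | course) + P(no course) x P(pass | no course). A minimal sketch of that arithmetic, using only the figures stated in the question (the variable names are illustrative):

```python
from fractions import Fraction

# Figures stated in the question.
p_course = Fraction(1, 2)             # P(candidate took the course)
p_pass_given_course = Fraction(1, 2)  # P(pass | took the course)
p_pass_given_none = Fraction(1, 4)    # P(pass | did not take the course)

# Law of total probability over the two groups of candidates.
p_pass = p_course * p_pass_given_course + (1 - p_course) * p_pass_given_none
print(p_pass)  # 3/8
```

The two terms are 1/4 and 1/8, which sum to 3/8.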
Q: 2
You are about to sample a 100-dimensional unit cube. To adequately sample any single given dimension, you need only capture 10 points. How many points do you need in order to sample the complete 100-dimensional unit cube adequately?
A. 100^10
B. 10^10
C. Log2(100)
D. 100
E. 1000
F. 10^100
Answer: F
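The count for Q: 2 can be checked directly, assuming "adequately sampled" means a full grid: every combination of the 10 per-dimension points is needed, so the number of points multiplies across dimensions rather than adding. A short sketch under that assumption:

```python
points_per_dimension = 10
dimensions = 100

# A full grid takes the Cartesian product of the per-dimension samples,
# so the total is points_per_dimension raised to the number of dimensions.
total_points = points_per_dimension ** dimensions

print(total_points == 10 ** 100)  # True
print(len(str(total_points)))     # 101 (a 1 followed by 100 zeros)
```

This exponential growth is the usual illustration of the curse of dimensionality: adding the per-dimension counts would give only 10 x 100 = 1000 points, which covers the axes but not the interior of the cube.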
Q: 3
A company has 20 software engineers working on a project. Over the past week, the team has fixed 100 bugs. Although the average number of bugs fixed per engineer is five, none of the engineers fixed exactly five bugs last week.
One engineer points out that some bugs are more difficult to fix than others. What metric should you use to estimate how hard a particular bug is to fix?
A. The tech lead's estimate of how many hours would be needed to fix the bug.
B. The priority of the bug according to the project manager
C. The number of years that the engineer who was assigned the bug has worked at the company
D. The number of bugs that had been found in each subcomponent of the project
Answer: D
Q: 4
You have a large m x n data matrix M. You decide you want to perform dimension reduction/clustering on your data and have decided to use the singular value decomposition (SVD; also called principal components analysis, PCA).
Refer to the passage above.
What represents the SVD of the matrix M given the following information:
U is m x m unitary
V is n x n unitary
S is m x n diagonal
Q is n x n invertible
D is n x n diagonal
L is m x m lower triangular
U is m x m upper triangular
A. M = U S V
B. M = U P
C. M = Q D Q^-1
D. M = L U
Answer: A
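To make the shape of the decomposition in Q: 4 concrete, here is a small pure-Python check on a hypothetical 2 x 2 example, M = [[3, 0], [4, 5]], whose factors were worked out by hand. Many texts write the last factor as V^T (or V* for complex matrices); since the transpose of a unitary matrix is also unitary, that is the same form as option A up to how V is named. The helper functions and the example matrix are illustrative assumptions, not part of the question.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

# Hand-computed factors for the example matrix M = [[3, 0], [4, 5]].
r10, r2, r5 = math.sqrt(10), math.sqrt(2), math.sqrt(5)
U = [[1 / r10, -3 / r10], [3 / r10, 1 / r10]]  # orthogonal (real unitary)
S = [[3 * r5, 0.0], [0.0, r5]]                 # diagonal, singular values descending
V = [[1 / r2, -1 / r2], [1 / r2, 1 / r2]]      # orthogonal (real unitary)

M = matmul(matmul(U, S), transpose(V))         # M = U S V^T
UtU = matmul(transpose(U), U)                  # should be close to the identity

for row in M:
    print([round(x, 9) for x in row])
```

Up to floating-point rounding, the product reproduces the rows [3, 0] and [4, 5], and U^T U is the identity, which is what "unitary" requires of U in the real case.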
Q: 5
You need to analyze 60,000,000 images stored in JPEG format, each of which is approximately 25 KB. Because your Hadoop cluster isn't optimized for storing and processing many small files, you decide to take the following actions:
1. Group the individual images into a set of larger files
2. Use the set of larger files as input for a MapReduce job that processes them directly with Python using Hadoop streaming
Which data serialization system gives you the flexibility to do this?
A. CSV
B. XML
C. HTML
D. Avro
E. Sequence Files
F. JSON
Answer: D
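The motivation for grouping the images in Q: 5 can be seen from a rough size estimate. The sketch below assumes 1 KB = 1024 bytes and a hypothetical 128 MB HDFS block size; neither figure comes from the question, so treat the output as an order-of-magnitude illustration only.

```python
num_images = 60_000_000
image_size_bytes = 25 * 1024          # ~25 KB each (assuming 1 KB = 1024 bytes)
block_size_bytes = 128 * 1024 * 1024  # hypothetical 128 MB HDFS block size

total_bytes = num_images * image_size_bytes
images_per_block = block_size_bytes // image_size_bytes

# Integer ceiling division: minimum number of block-sized container files by volume.
container_files = -(-total_bytes // block_size_bytes)

print(images_per_block)  # 5242
print(container_files)   # 11445
```

Under these assumptions, 60 million individual files shrink to roughly eleven thousand block-sized container files, each holding a few thousand images. The container format then has to store binary records and be readable from Python, which is the flexibility the question asks about.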
Q: 6
Given the following sample of numbers from a distribution:
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89
What are the five numbers that summarize this distribution (the five number summary of sample percentiles)?
A. 1, 3, 8, 34, 89
B. 1, 4, 13, 34, 89
C. 1, 1.5, 5, 24.5, 89
D. 1, 2.5, 8, 27.5, 89
Answer: D
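The five-number summary for Q: 6 (minimum, first quartile, median, third quartile, maximum) can be reproduced with Python's standard library. This sketch assumes the "inclusive" quartile convention, which interpolates linearly between data points; other quartile conventions give slightly different first and third quartiles for the same sample.

```python
import statistics

sample = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]

# With n=4, quantiles() returns the three quartile cut points.
q1, median, q3 = statistics.quantiles(sample, n=4, method="inclusive")

five_number_summary = (min(sample), q1, median, q3, max(sample))
print(five_number_summary)  # (1, 2.5, 8.0, 27.5, 89)
```

With 11 sorted values, the first quartile falls halfway between the third and fourth values (2 and 3), and the third quartile falls halfway between the eighth and ninth (21 and 34), giving 1, 2.5, 8, 27.5, 89.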
