Practice Exam

Q1 : Storage of JSON files with occasionally changing schema, for ANSI SQL queries.

This is correct because of the requirement to support JSON files with an occasionally changing schema and to run aggregate ANSI SQL queries: you need to use BigQuery, and it is quickest to use ‘Automatically detect’ for schema changes.
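
A quick sketch of that setup with the google-cloud-bigquery Python client; the bucket path and table name are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Schema auto-detection absorbs the occasionally changing JSON structure.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,  # the 'Automatically detect' schema option
)

load_job = client.load_table_from_uri(
    "gs://example-bucket/records/*.json",   # hypothetical path
    "my_project.my_dataset.json_records",   # hypothetical table
    job_config=job_config,
)
load_job.result()  # wait for the load to finish
```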

Q2 : Low-cost one-way one-time migration of two 100-TB file servers to GCP; data will only be accessed from Germany.

This is correct because you are performing a one-time (rather than ongoing) data transfer from on-premises to Google Cloud Platform for users in a single region (Germany). Using a Regional storage bucket will reduce cost and also conform to regulatory requirements.
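
For illustration, here is how the destination bucket could be created close to the users with the google-cloud-storage client; the bucket name is hypothetical, and europe-west3 is the Frankfurt region:

```python
from google.cloud import storage

client = storage.Client()

# A bucket located in Germany keeps the data close to its only consumers
# and costs less than multi-regional storage.
bucket = client.bucket("example-migrated-files")  # hypothetical name
bucket.storage_class = "STANDARD"
client.create_bucket(bucket, location="europe-west3")  # Frankfurt
```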

Q3 : Cost-effective backup to GCP of multi-TB databases from another cloud including monthly DR drills.

This is correct because you will need to access your backup data monthly to test your disaster recovery process, so you should use a Nearline bucket; also because you will be performing ongoing, regular data transfers, so you should use Storage Transfer Service.

Q4 : 250,000 devices produce a JSON device status every 10 seconds. How do you capture event data for outlier time series analysis?

This is correct because the data type, volume, and query pattern best fit Cloud Bigtable capabilities.

Q5 : Event data in CSV format to be queried for individual values over time windows. Which storage and schema to minimize query costs?

This is correct because it follows the recommended best practice: use Cloud Bigtable with this schema for the scenario. Cloud Storage would have cheaper storage costs than Cloud Bigtable, but the requirement is to minimize query costs.
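
To illustrate the kind of schema that keeps these queries cheap, a row key that leads with the series identifier and then the timestamp turns a time-window lookup for one series into a single contiguous row-range scan. A minimal sketch with the google-cloud-bigtable client; all names are hypothetical:

```python
from datetime import datetime, timezone

from google.cloud import bigtable

client = bigtable.Client(project="my-project")          # hypothetical project
table = client.instance("my-instance").table("events")  # hypothetical names

# Row key: series ID first, then timestamp, so "values for sensor-42 over a
# time window" is one contiguous scan rather than a full-table read.
ts = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
row = table.direct_row(f"sensor-42#{ts}".encode())
row.set_cell("measurements", "value", b"21.7")
row.commit()
```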

Q6 : Customer wants to maintain investment in existing Apache Spark code data pipeline.

This is correct because Cloud Dataproc is a managed Hadoop service and runs Apache Spark applications.

Q7 : Host a deep neural network machine learning model on GCP. Run and monitor jobs that could occasionally fail.

This is correct because of the requirement to host an ML DNN: Cloud ML Engine for TensorFlow can handle DNNs. Google recommends monitoring Jobs, not Operations.

Q8 : Cost-effective way to run non-critical Apache Spark jobs on Cloud Dataproc?

This is correct because Spark with high-memory machines needs only the standard (single-master) cluster mode. Also, use preemptible workers because you want to save money and these jobs are not mission-critical.
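
A sketch of such a cluster with the google-cloud-dataproc client; the project, cluster name, and machine sizes are hypothetical:

```python
from google.cloud import dataproc_v1

region = "us-central1"
client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

# Standard mode (a single master) plus preemptible secondary workers,
# which cost less but can be reclaimed -- fine for non-critical jobs.
cluster = {
    "project_id": "my-project",
    "cluster_name": "spark-batch",
    "config": {
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-highmem-8"},
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-highmem-8"},
        "secondary_worker_config": {
            "num_instances": 4,
            "preemptibility": "PREEMPTIBLE",
        },
    },
}

operation = client.create_cluster(
    request={"project_id": "my-project", "region": region, "cluster": cluster}
)
operation.result()  # block until the cluster is ready
```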

Q9 : Promote a Cloud Bigtable solution with a lot of data from development to production and optimize for performance.

This is correct because Cloud Bigtable allows you to ‘scale in place,’ which meets your requirements for this scenario.

Q10 : As part of your backup plan, you want to be able to restore snapshots of Compute Engine instances using the fewest steps.

This is correct because the scenario asks how to recreate instances. You can create an instance directly from a snapshot without restoring to disk first.
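
A sketch of that single-step restore with the google-cloud-compute client; project, zone, and resource names are hypothetical:

```python
from google.cloud import compute_v1

# Boot disk initialized straight from the snapshot -- no intermediate
# "restore snapshot to disk" step.
instance = compute_v1.Instance(
    name="restored-instance",
    machine_type="zones/us-central1-a/machineTypes/e2-medium",
    disks=[
        compute_v1.AttachedDisk(
            boot=True,
            auto_delete=True,
            initialize_params=compute_v1.AttachedDiskInitializeParams(
                source_snapshot="projects/my-project/global/snapshots/nightly-backup"
            ),
        )
    ],
    network_interfaces=[compute_v1.NetworkInterface(network="global/networks/default")],
)

operation = compute_v1.InstancesClient().insert(
    project="my-project", zone="us-central1-a", instance_resource=instance
)
operation.result()  # wait for the instance to be created
```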

Q11 : You want to minimize costs to run Google Data Studio reports on BigQuery queries by using prefetch caching.

This is correct because you must set Owner credentials to use the ‘enable cache’ option in BigQuery, and it is a Google best practice to use ‘enable cache’ when the business scenario calls for prefetch caching. 1) The report must use the Owner’s credentials. 2) You don’t need to tell the users not to use the report; you need to tell the system to use query and prefetch caching to cut down on BigQuery jobs.

Q12 : A Data Analyst is concerned that a BigQuery query could be too expensive.

This is correct. Selecting only the columns you need, rather than SELECT *, limits the input data scanned, and BigQuery charges by the bytes a query processes.
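
One practical way to check this before spending anything is a dry run, which reports the bytes a query would scan without executing it; the table name below is hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client()

# dry_run validates the query and prices it without running it.
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
query_job = client.query(
    "SELECT user_id, event_ts FROM `my_project.my_dataset.events`",  # no SELECT *
    job_config=job_config,
)
print(f"This query would process {query_job.total_bytes_processed:,} bytes.")
```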

Q13 : BigQuery data is stored in external CSV files in Cloud Storage; as the data has increased, the query performance has dropped.

This is correct. The performance issue arises because the data is stored in a non-optimal format (CSV) on an external storage medium; importing it into native BigQuery storage restores query performance.
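
A sketch of the one-time import into native storage with the google-cloud-bigquery client; paths and names are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Load the external CSV files into a native BigQuery table, which is stored
# in BigQuery's optimized columnar format.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # assumes a header row
    autodetect=True,
)
load_job = client.load_table_from_uri(
    "gs://example-bucket/events/*.csv",
    "my_project.my_dataset.events_native",
    job_config=job_config,
)
load_job.result()
```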

Q14 : Source data is streamed in bursts and must be transformed before use.

This is correct because the unpredictable, bursty data requires a buffer, which is what Cloud Pub/Sub provides ahead of the transformation step.
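
As a small illustration, producers simply publish into a topic and the bursts are absorbed until the pipeline drains them; the project and topic names are hypothetical:

```python
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "ingest")  # hypothetical names

# Pub/Sub buffers the bursts; subscribers consume at their own pace.
future = publisher.publish(topic_path, b'{"sensor": "sensor-42", "value": 21.7}')
print(future.result())  # message ID once Pub/Sub has accepted it
```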

Q15 : Calculate a running average on streaming data that can arrive late and out of order.

This is correct because, together, Cloud Pub/Sub and Cloud Dataflow provide a solution: Pub/Sub ingests the stream, and Dataflow’s windowing and triggers handle late and out-of-order data.
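
A sketch of the Dataflow side in Apache Beam: sliding windows produce the running average, and allowed lateness plus a late trigger update the result when stragglers arrive. The topic name and durations are hypothetical:

```python
import apache_beam as beam
from apache_beam import window
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.trigger import (
    AccumulationMode,
    AfterCount,
    AfterWatermark,
)

options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    (
        p
        | "Read" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/values")
        | "Parse" >> beam.Map(lambda msg: float(msg.decode()))
        | "Window" >> beam.WindowInto(
            window.SlidingWindows(size=60, period=10),   # 60s windows every 10s
            trigger=AfterWatermark(late=AfterCount(1)),  # re-fire on late data
            allowed_lateness=300,                        # accept data up to 5 min late
            accumulation_mode=AccumulationMode.ACCUMULATING,
        )
        | "Average" >> beam.CombineGlobally(
            beam.combiners.MeanCombineFn()
        ).without_defaults()
        | "Print" >> beam.Map(print)
    )
```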

Q16 : Testing a Machine Learning model with validation data returns 100% correct answers.

This is correct. The 100% accuracy is an indicator that the validation data may have somehow gotten mixed in with the training data. You will need new validation data to generate a meaningful error measurement.
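
The standard guard is a strict split before any training ever happens. A minimal scikit-learn sketch with stand-in data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 10)       # stand-in features
y = np.random.randint(0, 2, 1000)  # stand-in labels

# Split once, up front; the validation rows must never appear in training.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# If validation accuracy still looks perfect, check for rows duplicated
# across the two sets before trusting the model.
```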

Q17 : A client is using Cloud SQL database to serve infrequently changing lookup tables that host data used by applications. The applications will not modify the tables. As they expand into other geographic regions they want to ensure good performance. What do you recommend?

This is correct. A read replica will increase the availability of the service and can be located closer to the users in the new geographies.

Q18 : A client wants to store files from one location and retrieve them from another location. Security requirements are that no one should be able to access the contents of the file while it is hosted in the cloud. What is the best option?

This is incorrect. The specific requirement is that the file cannot be decrypted in the cloud; this feature merely makes decryption more private and secure, so it does not satisfy the business requirements stated in the question.
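
The requirement instead points to client-side encryption: encrypt before upload, so the cloud only ever holds ciphertext. A minimal sketch using the cryptography package and google-cloud-storage; the bucket and file names are hypothetical:

```python
from cryptography.fernet import Fernet
from google.cloud import storage

# Encrypt locally; only ciphertext ever leaves the premises.
key = Fernet.generate_key()  # keep this key outside the cloud
with open("report.pdf", "rb") as f:
    ciphertext = Fernet(key).encrypt(f.read())

bucket = storage.Client().bucket("example-secure-files")  # hypothetical
bucket.blob("report.pdf.enc").upload_from_string(ciphertext)

# The other location downloads and decrypts with the same key.
plaintext = Fernet(key).decrypt(bucket.blob("report.pdf.enc").download_as_bytes())
```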

Q19 : Three Google Cloud services commonly used together in data engineering solutions. (Described in this course).

Correct. Cloud Pub/Sub provides messaging, Cloud Dataflow is used for ETL and data transformation, and BigQuery is used for interactive queries.

Q20 : What is Avro used for?

This is correct. Avro is a serialization/deserialization standard.
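
A small example with the fastavro package; the schema and values are illustrative:

```python
from fastavro import parse_schema, reader, writer

# Avro stores the schema alongside the data, so a record serialized on one
# system can be deserialized on another.
schema = parse_schema({
    "type": "record",
    "name": "DeviceStatus",
    "fields": [
        {"name": "device_id", "type": "string"},
        {"name": "temperature", "type": "float"},
    ],
})

with open("status.avro", "wb") as out:
    writer(out, schema, [{"device_id": "device-4711", "temperature": 21.7}])

with open("status.avro", "rb") as inp:
    for record in reader(inp):
        print(record)
```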

Q21 : A company has a new IoT pipeline. Which services will make this design work? Select the services that should be used to replace the icons with the number “1” and number “2” in the diagram.

This is correct because device data captured by Cloud IoT Core gets published to Cloud Pub/Sub.

Q22 : A company wants to connect cloud applications to an Oracle database in its data center. Requirements are a maximum of 9 Gbps of data and a Service Level Agreement (SLA) of 99%.

This is correct. Partner Interconnect is useful for bandwidth up to 10 Gbps and is offered by service providers with SLAs.

Q23 : A client has been developing a pipeline based on PCollections using local programming techniques and is ready to scale up to production. What should they do?

This is correct. The PCollection indicates it is an Apache Beam (Cloud Dataflow) pipeline, and switching to the DataflowRunner will enable the pipeline to scale to production levels.
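
A sketch of the switch in Apache Beam’s Python SDK; the project, region, and bucket values are hypothetical:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# The pipeline code stays the same; only the runner and its cloud
# settings change.
options = PipelineOptions(
    runner="DataflowRunner",  # instead of the local DirectRunner
    project="my-project",
    region="us-central1",
    temp_location="gs://example-bucket/tmp",
)

with beam.Pipeline(options=options) as p:
    (
        p
        | beam.Create(["a", "b", "c"])  # the PCollection under development
        | beam.Map(str.upper)
    )
```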

Q24 : A company has migrated their Hadoop cluster to the cloud and is now using Cloud Dataproc with the same settings and same methods as in the data center. What would you advise them to do to make better use of the cloud environment?

This is correct. Storing persistent data off the cluster allows the cluster to be shut down when it is not processing data, and it allows separate clusters to be started per job or per kind of work, so tuning is less important.
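
For example, a PySpark job on Dataproc can read and write Cloud Storage directly through the preinstalled GCS connector instead of cluster-local HDFS; the paths below are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("gcs-example").getOrCreate()

# gs:// paths keep the data off the cluster, so the cluster itself can be
# deleted between jobs without losing anything.
df = spark.read.csv("gs://example-bucket/input/events.csv", header=True)
df.groupBy("event_type").count().write.parquet("gs://example-bucket/output/counts")
```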

Q25 : An application has the following data requirements. 1. It requires strongly consistent transactions. 2. Total data will be less than 500 GB. 3. The data does not need to be streaming or real time. Which data technology would fit these requirements?

This is correct. Cloud SQL supports strongly consistent transactions. And the size requirements will fit with a Cloud SQL instance.

Preparation resources

Storage and Database Documentation

Disks :

  • cloud.google.com/compute/docs/disks
  • cloud.google.com/bigtable/docs/choosing-ssd-hdd

Cloud Storage : World-wide storage and retrieval of any amount of data at any time

  • cloud.google.com/storage/docs/

Cloud Memorystore : Fully managed in-memory data store service

  • cloud.google.com/memorystore/docs/redis/

Cloud SQL : MySQL and PostgreSQL database service

  • cloud.google.com/sql/docs/

Cloud Datastore : NoSQL document database service

  • cloud.google.com/datastore/docs/

Cloud Firestore : Store mobile and web app data at global scale

  • cloud.google.com/firestore/docs/

Firebase Realtime Database : Store and sync data in real time

  • firebase.google.com/docs/database/

Cloud Bigtable : NoSQL wide-column database service.

  • cloud.google.com/bigtable/docs

Cloud Spanner : Mission-critical, scalable, relational database service

  • cloud.google.com/spanner/docs/

Data Analytics Documentation

BigQuery : A fully managed, highly scalable data warehouse with built-in ML

  • cloud.google.com/bigquery/docs/

Cloud Dataproc : Managed Spark and Hadoop service

  • cloud.google.com/dataproc/docs/

Cloud Dataflow : Real-time batch and stream data processing

  • cloud.google.com/dataflow/docs/

Cloud Datalab : Explore, analyze, and visualize large datasets.

  • cloud.google.com/datalab/docs

Cloud Dataprep : Cloud data service to explore, clean, and prepare data for analysis

  • cloud.google.com/dataprep/docs/

Cloud Pub/Sub : Ingest event streams from anywhere, at any scale.

  • cloud.google.com/pubsub/docs/

Google Data Studio : Tell great stories to support better business decisions.

  • marketingplatform.google.com/about/data-studio/

Cloud Composer : A fully managed workflow orchestration service built on Apache Airflow.

  • cloud.google.com/composer/docs

Machine Learning

Cloud Machine Learning Engine : Build superior models and deploy them into production

  • cloud.google.com/ml-engine/docs/

Cloud TPU : Train and run ML models faster than ever.

  • cloud.google.com/tpu/docs/

Cloud AutoML : Easily train high-quality, custom ML models

  • cloud.google.com/automl/docs/

Cloud Natural Language : Derive insights from unstructured text

Cloud Speech-to-Text : Speech-to-text conversion powered by ML

Cloud Translation : Dynamically translate between languages

Cloud Text-to-Speech : Text-to-speech conversion powered by ML

Dialogflow Enterprise Edition : Create conversational experiences across devices and platforms

Cloud Vision : Derive insight from images powered by ML

Cloud Video Intelligence : Extract metadata from videos

Infrastructure Documentation

Stackdriver : Monitoring and management for services, containers, applications and infrastructure.

  • Monitoring : Monitoring for applications on GCP and AWS
  • Logging : Logging for applications on GCP and AWS
  • Error Reporting : Identifies and helps you understand application errors
  • Trace : Find performance bottlenecks in production
  • Debugger : Investigate code behavior in production

Transparent Service Level Indicators : Monitor Google Cloud services and their effects on your workloads.

Cloud Deployment Manager : Manage cloud resources with simple templates.

Cloud Console : GCP integrated management console.

Cloud Shell : Command-line management from any browser.

Review of tips

  • TIP 1: Create your own custom preparation plan using the resources in this course.
  • TIP 2: Use the Exam Guide outline to help identify what to study.
  • TIP 3: Product and technology knowledge.
  • TIP 4: This course has touchstone concepts for self-evaluation, not technical training. Seek training if needed.
  • TIP 5: Problem solving is the key skill.
  • TIP 6: Practice evaluating your confidence in your answers.
  • TIP 7: Practice case evaluation and creating proposed solutions.
  • TIP 8: Use what you know and what you don’t know to identify correct and incorrect answers.
  • TIP 9: Review or rehearse labs to refresh your experience.
  • TIP 10: Prepare!

Practice exam : https://cloud.google.com/certification/practice-exam/data-engineer
