Generative Data Intelligence

AI-powered code suggestions and security scans in Amazon SageMaker notebooks using Amazon CodeWhisperer and Amazon CodeGuru | Amazon Web Services

Date:

Amazon SageMaker comes with two options to spin up fully managed notebooks for exploring data and building machine learning (ML) models. The first option is fast start, collaborative notebooks accessible within Amazon SageMaker Studio—a fully integrated development environment (IDE) for machine learning. You can quickly launch notebooks in Studio, easily dial up or down the underlying compute resources without interrupting your work, and even share your notebook as a link in few clicks. In addition to creating notebooks, you can perform all the ML development steps to build, train, debug, track, deploy, and monitor your models in a single pane of glass in Studio. The second option is Amazon SageMaker notebook instances—a single, fully managed ML compute instance running notebooks in the cloud, offering you more control on your notebook configurations.

Today, we are excited to announce the availability of Amazon CodeWhisperer and Amazon CodeGuru Security extensions in SageMaker notebooks. These AI-powered extensions help accelerate ML development by offering code suggestions as you type, and ensure that your code is secure and follows AWS best practices.

In this post, we show how you can get started with Amazon CodeGuru Security and CodeWhisperer in Studio and SageMaker notebook instances.

Solution overview

The CodeWhisperer extension is an AI coding companion that provides developers with real-time code suggestions in notebooks. Individual developers can use CodeWhisperer for free in Studio and SageMaker notebook instances. The coding companion generates real-time single-line or full function code suggestions. It understands semantics and context in your code and can recommend suggestions built on AWS and development best practices, improving developer efficiency, quality, and speed.

The CodeGuru Security extension offers security and code quality scans for Studio and SageMaker notebook instances. This assists notebook users in detecting security vulnerabilities such as injection flaws, data leaks, weak cryptography, or missing encryption within the notebook cells. You can also detect many common issues that affect the readability, reproducibility, and correctness of computational notebooks, such as misuse of ML library APIs, invalid run order, and nondeterminism. When vulnerabilities or quality issues are identified in the notebook, CodeGuru generates recommendations that enable you to remediate those issues based on AWS security best practices.

In the following sections, we show how to install each of the extensions and discuss the capabilities of each, demonstrating how these tools can improve overall developer productivity.

Prerequisites

If this is your first time working with Studio, you first need to create a SageMaker domain. Additionally, make sure you have appropriate access to both CodeWhisperer and CodeGuru using AWS Identity and Access Management (IAM).

You can use these extensions in any AWS Region, but requests to CodeWhisperer will be served through the us-east-1 Region. Requests will be served to CodeGuru in the Region of the Studio domain and if CodeGuru is supported in the Region. For all non-supported Regions, the requests will be served through us-east-1.

Set up CodeWhisperer with SageMaker notebooks

In this section, we demonstrate how to set up CodeWhisperer with SageMaker Studio.

Update IAM permissions to use the extension

You can use the CodeWhisperer extension in any Region, but all requests to CodeWhisperer will be served through the us-east-1 Region.

To use the CodeWhisperer extension, ensure that you have the necessary permissions. On the IAM console, add the following policy to the SageMaker user execution role:

{ "Version": "2012-10-17", "Statement": [ { "Sid": "CodeWhispererPermissions", "Effect": "Allow", "Action": ["codewhisperer:GenerateRecommendations"], "Resource": "*" } ]
}

Install the CodeWhisperer extension

You can install the CodeWhisperer extension through the command line. In this section, we look at the steps involved. To get started, complete the following steps:

  1. On the File menu, choose New and Terminal.
  2. Run the following commands to install the extension:
    conda activate studio
    pip install amazon-codewhisperer-jupyterlab-ext
    jupyter server extension enable amazon_codewhisperer_jupyterlab_ext
    conda deactivate
    restart-jupyter-server

Refresh your browser, and you will have successfully installed the CodeWhisperer extension.

Use CodeWhisperer in Studio

After we complete the installation steps, we can use CodeWhisperer by opening a new notebook or Python file. For our example we will open a sample Notebook.

You will see a toolbar at the bottom of your notebook called CodeWhisperer. This shows common shortcuts for CodeWhisperer along with the ability to pause code suggestions, open the code reference log, and get a link to the CodeWhisperer documentation.

The code reference log will flag or filter code suggestions that resemble open-source training data. Get the associated open-source project’s repository URL and license so that you can more easily review them and add attributions.

To get started, place your cursor in a code block in your notebook, and CodeWhisperer will begin to make suggestions .If you don’t see suggestions, press Alt+C in Windows or Option+C in Mac to manually invoke suggestions.

The following video shows how to use CodeWhisperer to read and perform descriptive statistics on a data file in Studio.

Use CodeWhisperer in SageMaker Notebook Instances

Complete the following steps to use CodeWhisperer in notebook instances:

  1. Navigate to your SageMaker notebook instance.
  2. Make sure you have attached the CodeWhisperer policy from earlier to the notebook instance IAM role.
  3. When the permissions are added, choose Open JupyterLab.
  4. Install the extension. by using a terminal, on the File menu, choose New and Terminal, and enter the following commands:
    pip install amazon-codewhisperer-jupyterlab-ext
    jupyter server extension enable amazon_codewhisperer_jupyterlab_ext

  5. Once the commands complete, on the File menu, choose Shut Down to restart our Jupyter Server.
  6. Refresh the browser window.

You will now see the CodeWhisperer extension installed and ready to use.

Let’s test it out in a Python file.

  1. On the File menu, choose New and Python File.

The following video shows how to create a function to convert a JSON file to a CSV.

Set up CodeGuru Security with SageMaker notebooks

In this section, we demonstrate how to set up CodeGuru Security with SageMaker Studio.

Update IAM permissions to use the extension

To use the CodeGuru Security extension, ensure that you have the necessary permissions. Complete the following steps to update permission policies with IAM:

  1. Preferred: On the IAM console, you can attach the AmazonCodeGuruSecurityScanAccess managed policy to your IAM identities. This policy grants permissions that allow a user to work with scans, including creating scans, viewing scan information, and viewing scan findings.
  2. For custom policies, enter the following permissions:
    { "Version": "2012-10-17", "Statement": [ { "Sid": "AmazonCodeGuruSecurityScanAccess", "Effect": "Allow", "Action": [ "codeguru-security:CreateScan", "codeguru-security:CreateUploadUrl", "codeguru-security:GetScan", "codeguru-security:GetFindings" ], "Resource": "arn:aws:codeguru-security:*:*:scans/*" } ] }

  3. Attach the policy to any user or role that will use the CodeGuru Security extension.

For more information, see Policies and permissions in IAM.

Install the CodeGuru Security extension

You can install the CodeGuru Security extension through the command line. To get started, complete the following steps:

  1. On the File menu, choose New and Terminal.
  2. Run the following commands to install the extension in the conda environment:
    conda activate studio
    pip install amazon-codeguru-jupyterlab-extension
    conda deactivate

Refresh your browser, and you will have successfully installed the CodeGuru extension.

Run a code scan

The following steps demonstrate running your first CodeGuru Security scan using an example file:

  1. Create a new notebook called example.ipynb with the following code for testing purposes:
    import torch
    # import tensorflow as tf def tensorflow_avoid_using_nondeterministic_api_noncompliant(): data = tf.ones((1, 1)) # Noncompliant: Determinism of tf.compat.v1.Session # can not be guaranteed in TF2. Ítf.config.experimental.enable_op_determinism() tf.compat.v1.Session( target='', graph=None, config=None ) layer = tf.keras.layers.Input(shape=[1]) model = tf.keras.models.Model(inputs=layer, outputs=layer) model.compile(loss="categorical_crossentropy", metrics="AUC") model.fit(x=data, y=data) def pytorch_sigmoid_before_bceloss_compliant(): # Compliant: `BCEWithLogitsLoss` function integrates a `Sigmoid` # layer and the `BCELoss` into one class # and is numerically robust. loss = nn.BCEWithLogitsLoss() input = torch.randn(3, requires_grad=True) target = torch.empty(3).random_(2) output = loss(input, target) output.backward()

The below code has intentionally incorporated common bad practices to showcase the capabilities of Amazon CodeGuru Security.

  1. Important: Please confirm that the CodeGuru-Security extension is installed and if the LSP server says Fully initialized as shown below when you open your notebook.

If you don’t see the extension fully initialized, return to the previous section to install the extension and complete the installation steps.

  1. Initiate the scan. You can initiate a scan in one of the following ways:
    • Choose any code cell in your file, then choose the lightbulb icon.
    • Choose (right-click) any code cell in your file, then choose Run CodeGuru scan.

When the scan is started, the scan status will show as CodeGuru: Scan in progress.

After a few seconds, when the scan is complete, the status will change to CodeGuru: Scan completed.

View and address findings

After the scan is finished, your code may have some underlined findings. Hover over the underlined code, and a pop-up window appears with a brief summary of the finding. To access additional details about the findings, right-click on any cell and choose Show diagnostics panel.

This will open a panel containing additional information and suggestions related to the findings, located at the bottom of the notebook file.

After making changes to your code based on the recommendations, you can rerun the scan to check if the issue has been resolved. It’s important to note that the scan findings will disappear after you modify your code, and you’ll need to rerun the scan to view them again.

Enable automatic code scans

Automatic scans are disabled by default. Optionally, you can enable automatic code scans and set the frequency and AWS Region for your scan runs. To enable automatic code scans, complete the following steps.

  1. In Studio, on the Settings menu, choose Advanced Settings Editor.
  2. For Auto scans, choose Enabled.
  3. Specify the scan frequency in seconds and the Region for your CodeGuru Security scan.

For our example, we configure CodeGuru to perform an automatic security scan every 240 seconds in the us-east-1 Region. You can modify this value for any region that CodeGuru Security is supported.

Conclusion

SageMaker Studio and SageMaker Notebook Instances now support AI-powered CodeWhisperer and CodeGuru extensions that help you write secure code faster. We encourage you to try out both extensions. To learn more about CodeGuru Security for SageMaker, refer to Get started with the Amazon CodeGuru Extension for JupyterLab and SageMaker Studio, and to learn more about CodeWhisperer for SageMaker, refer to Setting up CodeWhisperer with Amazon SageMaker Studio. Please share any feedback in the comments!


About the authors

Raj Pathak is a Senior Solutions Architect and Technologist specializing in Financial Services (Insurance, Banking, Capital Markets) and Machine Learning. He specializes in Natural Language Processing (NLP), Large Language Models (LLM) and Machine Learning infrastructure and operations projects (MLOps).

Gaurav Parekh is a Solutions Architect helping AWS customers build large scale modern architecture. His core area of expertise include Data Analytics, Networking and Technology strategy. Outside of work, Gaurav enjoys playing cricket, soccer and volleyball.

Arkaprava De is a Senior Software Engineer at AWS. He has been at Amazon for over 7 years and is currently working on improving the Amazon SageMaker Studio IDE experience. You can find him on LinkedIn.

Prashant Pawan Pisipati is a Principal Product Manager at Amazon Web Services (AWS). He has built various products across AWS and Alexa, and is currently focused on helping Machine Learning practitioners be more productive through AWS services.

spot_img

Latest Intelligence

spot_img

Chat with us

Hi there! How can I help you?