DataOps 3 - Databricks Code Promotion using DevOps CI/CD

  • Published 15 Jan 2025

COMMENTS • 97

  • @santanughosh615 4 years ago +2

    Thanks, this question was asked in an interview. I have implemented the same and am able to deploy the notebook with a DevOps pipeline. Great video. Thank you Dave.

  • @naveenkumarmurugan1962 3 years ago +1

    Delivered the purpose … you are rocking

  • @t.vishnuvardhanreddy956 3 years ago +2

    Could you please suggest how to sync libraries to an Azure repo, like notebooks?

    • @DaveDoesDemos 3 years ago

      Hi thanks for the comment. This demo describes how to deploy notebooks, but the process is very similar for libraries. In the case of a library your build process would usually use a Linux shell to build the library (wheel/egg etc.) and your release would then push these into the cluster.
      Please don't think of the repo as storage to sync with, the repo is your source for development purposes only, and the code will be taken from this and built into a deployment artefact by the DevOps process. It's the artefact that will be deployed to production, and production should never connect to your repository.

  • @valorking135 3 years ago +1

    Thank you so much, keep growing 😊

  • @EduardoHerrera 3 years ago +1

    Thanks for this video, just what I needed!

  • @akankshakothari2075 4 years ago +3

    I am trying to deploy all files from the folder but it's giving the error 'Exception calling "ReadAllBytes" with "1" argument(s): "Could not find a part of the path"'. I can deploy one file, though. Please help: how can I deploy all files from the folder to Databricks?

    • @DaveDoesDemos 4 years ago

      Hi thanks for the comment. A few people had this issue so I added the update at ua-cam.com/video/l35MBEJiUgk/v-deo.html which deals with path issues. I also updated the instructions on GitHub which are now accurate. Sorry for the confusion, I wasn't able to edit this video but did put the link in the description. Let me know if you still have problems and I'll try to help out

    • @JosefinaArayaTapia 4 years ago

      @@DaveDoesDemos Thanks :)

  • @rahulpathak868 3 years ago +2

    It was really a great demo. Can you please help with a demo of deploying a data factory that has a Databricks notebook activity running?

    • @DaveDoesDemos 3 years ago +3

      Hi Rahul, when doing this sort of thing you just need to create a build/test process for each deliverable. A Python library would have a build and test pipeline, as would the notebooks, and finally the ADF environment would have its own. Each build will produce an artifact in DevOps which you can tag as current, ready for deployment etc. and then you'd have a release pipeline which brings all three artifacts together and deploys to a test environment for integration testing before deploying to production and setting up triggers. Hope that makes sense? I'm currently working on a more full featured ADF demo to show more of the governance side of DevOps, hopefully that will also be useful as it highlights that scripted deployment is the easy part of DevOps!

    • @rahulpathak868 3 years ago +1

      @@DaveDoesDemos Thank you Dave, looking forward to seeing another session from you soon.

  • @sameerr8849 1 year ago

    A simple diagram or flow chart would really help to keep things in mind for longer, Dave, so I hope you can add one.

  • @swagatbiswal1779 4 years ago +2

    I am facing an issue during the execution of the deployment script. Need your help on this, Dave.

    • @swagatbiswal1779 4 years ago +1

      $Secrets = "Bearer " + "$(Databricks)"

    • @swagatbiswal1779 4 years ago +1

      It's showing: the term 'Databricks' is not recognised as the name of a cmdlet, function, script file or operable program

    • @DaveDoesDemos 4 years ago

      Hi there, did you set up the variable group in Azure DevOps pointing to your Key Vault with the Databricks secret in it?

    • @DaveDoesDemos 4 years ago

      Hopefully this video will help resolve the issue ua-cam.com/video/l35MBEJiUgk/v-deo.html

    • @DaveDoesDemos 4 years ago

      I've also updated the documents to correct the two issues we found (including variable groups). Thanks again for reaching out, it really helps to make sure I've not missed anything.
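
A note on the error in this thread: inside a double-quoted PowerShell string, $(...) is a subexpression. If Azure DevOps has not replaced the $(Databricks) macro before the script runs (for example because the variable group is not linked to the pipeline), the text reaches PowerShell unchanged and PowerShell tries to run Databricks as a command. The same guard can be sketched in Python, used here purely for illustration rather than as the demo's PowerShell. The environment variable name DATABRICKS_TOKEN and the function name are assumptions and do not come from the demo.

```python
import os


def build_auth_header(env=None, var_name="DATABRICKS_TOKEN"):
    """Return the Authorization header for Databricks REST calls.

    var_name is a hypothetical environment variable name. Secret pipeline
    variables are not exposed to scripts automatically, so the secret has
    to be mapped into the task's environment explicitly.
    """
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    # An undefined Azure DevOps macro is left in place literally, e.g. "$(Databricks)".
    if not token or token.startswith("$("):
        raise RuntimeError(
            f"{var_name} is not set; check that the variable group is linked "
            "to the pipeline and that the secret is mapped into this task"
        )
    return {"Authorization": f"Bearer {token}"}


if __name__ == "__main__":
    # Placeholder token for illustration only.
    print(build_auth_header({"DATABRICKS_TOKEN": "dapi-example"}))
```

Failing early with a message that names the variable group tends to be easier to diagnose than the cmdlet error quoted above.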

  • @ashishsangwan5925 1 year ago

    @dave - Can you send the code for the foreach loop, showing how to deploy multiple files from a folder? It would be a great help.

  • @Sangeethsasidharanak 4 years ago +1

    Nice demo. How can we add unit tests when deploying notebooks using Azure DevOps?

    • @DaveDoesDemos 4 years ago +1

      Thanks for the question. Azure DevOps can consume the output of most test suites, so you'd just run a test using something like PyTest and then import the output into your build or release. Take a look at github.com/davedoesdemos/DataDevOps/tree/master/PythonTesting for instructions for Python - this isn't in Databricks but it shows the technique you need.
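
As a minimal sketch of the technique described in this reply, the file below holds one small transformation and two PyTest-style tests for it. The function add_load_date and its behaviour are invented for illustration; they are not taken from the linked repository.

```python
def add_load_date(rows, load_date):
    """Return copies of the input rows with a load_date column added."""
    return [{**row, "load_date": load_date} for row in rows]


def test_adds_load_date_to_every_row():
    rows = [{"id": 1}, {"id": 2}]
    result = add_load_date(rows, "2021-01-01")
    assert [r["load_date"] for r in result] == ["2021-01-01", "2021-01-01"]


def test_does_not_mutate_input():
    rows = [{"id": 1}]
    add_load_date(rows, "2021-01-01")
    assert rows == [{"id": 1}]
```

Running pytest with the --junitxml=test-results.xml option writes the results in JUnit format, which the Publish Test Results task in Azure DevOps can then import into the build or release summary.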

  • @sudheershadows1032 2 years ago

    Could you please explain more about the binary contents in the PowerShell script?
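
The script itself is not quoted in this thread, but the ReadAllBytes error reported above suggests it reads the notebook file as raw bytes. The Workspace API import endpoint used in the demo (/api/2.0/workspace/import) expects the notebook content as a base64 string inside a JSON body, which would explain the binary step. A minimal Python sketch of that encoding follows; the function name and the example workspace path are assumptions.

```python
import base64
import json


def build_import_payload(source_bytes, workspace_path, overwrite=True):
    """Build the JSON body for POST /api/2.0/workspace/import."""
    return {
        "path": workspace_path,   # target notebook path in the workspace
        "format": "SOURCE",       # the file is plain source code
        "language": "PYTHON",
        # Raw bytes cannot travel inside JSON, so they are base64 encoded.
        "content": base64.b64encode(source_bytes).decode("ascii"),
        "overwrite": overwrite,   # replace an existing notebook at that path
    }


if __name__ == "__main__":
    # Hypothetical notebook content and target path.
    body = build_import_payload(b"print('hello')", "/Deployment/DemoNotebook")
    print(json.dumps(body, indent=2))
```

Reading the file as bytes rather than text avoids any encoding or line-ending changes between the build artifact and the deployed notebook.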

  • @muralijonna5238 1 year ago

    Thanks for sharing such a wonderful demo. Can you please create a demo on how to create a CI/CD pipeline for an Azure AD access token with a service principal?

  • @richardlanglois5183 3 years ago +1

    Great presentation!

  • @tejashavele 4 years ago +2

    Is Build pipeline really required? Can we include Publish Build Artifacts task as a part of Release pipeline and not have build pipeline at all ?

    • @DaveDoesDemos 4 years ago +1

      Thanks for the comment, and great question! If you are viewing Azure DevOps as a pure automation tool then no, technically there is nothing stopping you doing things that way and it will probably work for the immediate requirement. This is the same as the argument on Azure Data Factory where some people choose to use the Git repo directly. It's not good practice though since you're missing out on the whole purpose of the DevOps tooling, which is governance. To use the governance side of the tool properly then yes, you should ideally perform a build to create an artefact and then release to deploy and test it. I believe there are changes coming which will allow you to add "stages" in DevOps in a single pipeline, and although that appears to do what you're asking, ultimately you'd still have build and release with an immutable artefact in the object store. For data projects this is often a stumbling block because the initial requirement always looks like a simple automation one. I guarantee though if your project is large enough you'll eventually work out why you need the governance that goes with this, so I'd recommend using it from the start :)

    • @tejashavele 4 years ago +1

      @@DaveDoesDemos Thanks Dave for a very quick revert. Appreciate your in depth answer. It makes sense 👍
      Keep posting videos. They are really nice ones.

    • @DaveDoesDemos 4 years ago +1

      @@tejashavele Thanks for the feedback, more coming soon around IoT scenarios :)

  • @abhijeetmisra251 3 years ago +2

    Can you please let me know if you can have a Databricks notebook in ADF that can be moved from dev to QA?

    • @DaveDoesDemos 3 years ago

      Hi thanks for the question. The notebook would be separate to ADF. The Notebook and ADF pipelines would each use their own release mechanisms and you must create your processes to make sure these line up. If they are developed together then you would deploy both to your testing, QA, production together so that any structured integration testing can test them together. If they are not developed together then they would be deployed independently and your processes need to ensure you have some kind of "contract" in place to ensure the functionality of one is understood by the other.

  • @swagatbiswal1779 3 years ago +2

    Mate, why are you not making CI/CD pipeline using Snowflake?

    • @DaveDoesDemos 3 years ago

      Hi, thanks for the comment! I don't use Snowflake, so unfortunately can't help you there. I'm sure the process would be similar if they have a code first approach and either API access or some other way to push the artifacts into the environment.

  • @aaminasiddiqui8613 3 years ago

    Hi @Dave Does Demos
    Thanks for it.
    How can we find the PowerShell script?

    • @DaveDoesDemos 3 years ago

      Hi the powershell is in the instructions linked to in the description github.com/davedoesdemos/DataDevOps/blob/master/Databricks/DatabricksDevOps.md

    • @aaminasiddiqui8613 3 years ago +1

      @@DaveDoesDemos Thanks a lot

  • @rajkiranboggala9722 4 years ago +2

    Getting git sync error as there’s no master anymore in git, it’s now main. Is there any work around?

    • @DaveDoesDemos 4 years ago

      The "master" and "Main" are just labels so won't be the cause of your issue. Do you have it configured to use main? If you really think it's that you can rename main to master in Git.

    • @rajkiranboggala9722 4 years ago +1

      @@DaveDoesDemos thanks for your prompt reply. But, I do not think the sync works as I tried using azure DevOps git integration and it failed at git sync. When I tried the same with git integration with GitHub it did sync; could you try to connect a notebook via Azure DevOps git integration? probably you’d understand where I’m stuck. I would need to build the CI/CD and so I’d need to overcome this issue.

    • @DaveDoesDemos 4 years ago

      @@rajkiranboggala9722 Hi, I just tested this on Azure DevOps using both an existing and a new repo, and with both master and main naming. All of these scenarios sync fine on my setup. I suggest you first copy your file somewhere as a backup, then unlink in the Databricks interface by clicking the Git button at the top of the revision history. Click unlink and then save. Next, click the button again and click link. Make sure you set your URL correctly using dev.azure.com/yourorg/yourproject/_git/yourrepo and then choose the correct branch. If the branch doesn't show, save anyway, wait for it to fail, then try again and it should load. Also make sure you don't have a policy on your main branch in your org preventing commits. This would usually be the case in any well set up organisation, since you should not be committing to main/master but to a feature branch instead and then doing a pull request. If none of the above works, it may need a support ticket as there's something weird happening. Hope that helps!

    • @rajkiranboggala7085 4 years ago

      @@DaveDoesDemos I did follow everything as mentioned in the video, but, getting error: Error linking with Git: Response from Azure DevOps Services: status code: 500, reason phrase: ?{"$id":"1","innerException":null,"message":"Unable to complete authentication for user due to looping logins","typeName":"Microsoft.VisualStudio.Services.Identity.IdentityLoopingLoginException, Microsoft.VisualStudio.Services.WebApi","typeKey":"IdentityLoopingLoginException","errorCode":0,"eventId":4207}
      I do not know what to do!!! But, thanks for checking

    • @DaveDoesDemos 4 years ago

      @@rajkiranboggala7085 500 would be server error which I would guess is incorrect url for the repo, are you certain that's right?

  • @SrikanthSiddhani 4 years ago +2

    Hi Dave, how to configure release pipeline for other environment (eg.; production) which is under another subscription.

    • @DaveDoesDemos 4 years ago

      Hi thanks for the comment. Subscription wouldn't make much difference here, you'd just need to use variables for the connection to the cluster in each deployment pipeline and it will connect using the credentials you provide. Use a variable group for each deployment and store the variable values in Key Vault as in the demo. You could use several keyvaults in different subscriptions if necessary, but for simplicity you might want to choose a subscription for management and store the vault there. If you get stuck let me know and I'll see what I can do.

    • @SrikanthSiddhani 4 years ago

      @@DaveDoesDemos Thanks Dave, can you make a short video on how to configure pipeline to move notebook from one environment to another. That would be more helpful.

    • @DaveDoesDemos 4 years ago +1

      Unfortunately I'm pretty busy at the moment but will certainly try to make something specific to that in the future. The video will be almost exactly the same though. The only thing that you'd change to send the code to a different environment is the script, specifically the $(Databricks) variable with a different secret, and the $Uri, which might be for a different cluster. Both of these can be set up in variable groups such that you have one variable group for test and a different one for production. Hope that's helpful, I'll try to get a guide written as soon as I can.
      $Secret = "Bearer " + "$(Databricks)"
      # Set the URI of the workspace and the API endpoint
      $Uri = ".azuredatabricks.net/api/2.0/workspace/import"
      I do have another Databricks code promotion video planned which will use an upcoming feature for integration with Github. As soon as I get access to the new code and permission to post about it I'll get that up too :)
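
The idea in this reply, one script with the secret and URI supplied per environment, can be sketched in Python as follows. This is an illustration under assumptions, not the demo's PowerShell: the variable names DATABRICKS_HOST and DATABRICKS_TOKEN and both host names are hypothetical, and each stage's variable group would be expected to supply its own values under the same names.

```python
import json
import urllib.request


def build_import_request(env, payload):
    """Build (but do not send) the workspace import request for one environment.

    env is a mapping holding the hypothetical variables DATABRICKS_HOST and
    DATABRICKS_TOKEN, for example os.environ inside a pipeline task.
    """
    host = env["DATABRICKS_HOST"].strip().rstrip("/")
    if not host.startswith("https://"):
        host = "https://" + host
    return urllib.request.Request(
        url=f"{host}/api/2.0/workspace/import",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {env['DATABRICKS_TOKEN']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values standing in for two variable groups.
    test_env = {"DATABRICKS_HOST": "test-workspace.example.net", "DATABRICKS_TOKEN": "test-secret"}
    prod_env = {"DATABRICKS_HOST": "prod-workspace.example.net", "DATABRICKS_TOKEN": "prod-secret"}
    for env in (test_env, prod_env):
        print(build_import_request(env, {"path": "/Deployment/Demo"}).full_url)
```

Passing the resulting request to urllib.request.urlopen would send it. Because nothing environment-specific is hard-coded, the same artifact and script can be promoted unchanged from test to production.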

    • @SrikanthSiddhani 4 years ago

      @@DaveDoesDemos Thank you Dave

  • @vkincanada5781 2 years ago

    Can you please make a video on "Databricks Code Promotion using DevOps CI/CD" using Pipeline Artifact YAML method please..

    • @DaveDoesDemos 2 years ago

      Hi, the methods would be identical using YAML so in theory you should be able to Google for the code examples. I have a strong preference against YAML for data pipelines in any team that doesn't have a dedicated pipeline engineer. Data teams simply don't need the stress of learning yet another markup language just to achieve something there's a perfectly good GUI for. Data deployment pipelines don't change often enough to make YAML worthwhile in my opinion. The time is better spent doing data transformation and modelling work.

  • @JosefinaArayaTapia 4 years ago +1

    Hi, this tutorial was very helpful to me.
    I would like to know if you have any demo about deploying a cluster through DevOps?

    • @DaveDoesDemos 4 years ago

      Hi, no I don't have a demo for that yet, but you could use the same techniques of ARM templates and API to achieve this I think.

  • @marwanechahoud7993 4 years ago +1

    thank you Dave for this demo, it shows all the steps needed. I suggest something if you can add "CI-CD" in your video title to make it easy for others to find your video, for me I spent hours scrolling down on youtube to find yours, I guess yours should be on the top result :), thanks again David (y)

    • @DaveDoesDemos 4 years ago +1

      Thanks for the suggestion, I'll add that in :)

  • @guptaashok121 3 years ago +1

    If we give a file path up to a folder location instead of a file, would it deploy all files in that folder? If not, could you guide how we can make it recursive for all files, or even for child folders as well?

    • @DaveDoesDemos 3 years ago +1

      Hi Ashok, check out the newer version of Databricks which works in a much more normal way. There you can open a project (aka whole repo!) in the Databricks interface. While the theory is the same their implementation is much nicer. Unfortunately I've not had time to make a new video on this yet. The making artifacts and deployment will be pretty much the same, just that you now check in the whole repo at once.

    • @guptaashok121 3 years ago +1

      @@DaveDoesDemos thanks for quick reply. I could just use your code with foreach loop to deploy all files.

    • @DaveDoesDemos 3 years ago

      @@guptaashok121 yes that definitely works
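
The loop mentioned in this exchange is not shown in the thread. As a rough Python sketch of the same idea (the demo's script is PowerShell, where a foreach over the folder listing plays the same role), the code below walks the artifact folder recursively and derives each target notebook path from the file's relative path, so no name needs to be hard-coded. The function names, the /Deployment root and the deploy_one callback are assumptions.

```python
from pathlib import Path, PurePosixPath


def plan_notebook_imports(artifact_dir, workspace_root, suffix=".py"):
    """Pair every notebook file under artifact_dir with its workspace path."""
    artifact_dir = Path(artifact_dir)
    plan = []
    for local_file in sorted(artifact_dir.rglob(f"*{suffix}")):
        if not local_file.is_file():
            continue
        # Keep the sub-folder layout and drop the extension for the notebook name.
        relative = local_file.relative_to(artifact_dir).with_suffix("")
        target = PurePosixPath(workspace_root) / relative.as_posix()
        plan.append((local_file, str(target)))
    return plan


def deploy_folder(artifact_dir, workspace_root, deploy_one):
    """Call deploy_one(content_bytes, workspace_path) once per notebook file."""
    deployed = []
    for local_file, target in plan_notebook_imports(artifact_dir, workspace_root):
        deploy_one(local_file.read_bytes(), target)
        deployed.append(target)
    return deployed


if __name__ == "__main__":
    # Dry run against the current folder: print what would be deployed.
    deploy_folder(".", "/Deployment", lambda content, target: print(target, len(content)))
```

Here deploy_one stands in for whatever performs the single-file import call. One assumption worth checking against the Workspace API reference: the import call may require parent folders to exist already, in which case a call to the mkdirs endpoint for each sub-folder would be needed first.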

    • @trenkill2221 3 years ago

      @@guptaashok121 Can you tell me what component you use to make that loop and where it implements it? It is not very clear to me how to make CICD with all the files in a folder

    • @ketaraj 2 years ago

      @@DaveDoesDemos Hi, have you created a video on moving all the notebooks in a particular folder, please?

  • @vinodnoel 1 year ago

    thank you so much

  • @DaveDoesDemos 5 years ago

    Have you tried out this demo? Maybe you're already doing DataOps and have some advice for others. Leave a comment here to start the discussion!

    • @SrikanthSiddhani 4 years ago

      Hi Dave, how to configure release pipeline for other environment (eg.; production) which is under another subscription.

    • @jaxo116 4 years ago +1

      Hi Dave, I'm currently working on a CD pipeline that needs to install libraries in the Databricks cluster, which apparently needs more settings. I would also like to know if there is a DataOps 1 video, because on the playlist I'm only able to see videos 2, 3 and 4.
      Excellent channel!
      Regards from Mexico!

    • @DaveDoesDemos 4 years ago

      Hi @@jaxo116, thanks for the comment. Libraries should be the same process to upload, you can find details in the Databricks API reference on their site. I believe you have the choice to upload libraries to the workspace or to the cluster, which has some effect on the management of the libraries but works in basically the same way for either. If you have specifics let me know and I can look into it in more detail.
      Yes the naming of the series was bad, sorry! I had intended to upload an introduction to data ops video and never got around to it. There is a video about environments which serves as a bit of an intro. I will be putting out more videos soon explaining the wider DevOps stuff for data people. I've been running internal workshops on this so think I have a good handle on where data folk need the most pointers and explanation. There's a bunch of DevOps info out there, the problem is it's all aimed at coders so I've been trying to translate that for data people :)
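
For the cluster-level option mentioned in this reply, the Databricks Libraries API has an install endpoint (/api/2.0/libraries/install) that takes a cluster ID and a list of libraries. The Python sketch below only builds that request body; the cluster ID and wheel location are placeholders, the function name is an assumption, and the wheel is assumed to have been uploaded already to a location the cluster can read.

```python
def build_library_install_payload(cluster_id, wheel_paths):
    """Build the JSON body for POST /api/2.0/libraries/install."""
    libraries = []
    for path in wheel_paths:
        # Only wheels are handled in this sketch; other library types use other keys.
        if not path.endswith(".whl"):
            raise ValueError(f"not a wheel file: {path}")
        libraries.append({"whl": path})
    return {"cluster_id": cluster_id, "libraries": libraries}


if __name__ == "__main__":
    # Placeholder cluster ID and wheel location.
    print(build_library_install_payload(
        "0000-000000-example", ["dbfs:/libraries/mylib-0.1.0-py3-none-any.whl"]
    ))
```

The body would be sent with the same Bearer token header as the notebook import. The current API reference should be checked for the supported library sources before relying on this shape.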

  • @vickym3193 3 years ago +1

    In my workspace, when I go under User Settings - Git Integration and select Azure DevOps Services as Git Provider, it gives an error pop up that says
    Error: Uncaught TypeError: Cannot read property 'value' of undefined Reload the page and try again. If the error persists, contact support. Reference error code: d2f8c3b5119a49cb9e6e854e9b336725
    Any idea on how I can resolve this or what is this error related to?

    • @DaveDoesDemos 3 years ago +1

      That sounds like a support issue so I'd contact support and raise a ticket. Sorry I can't be more help

  • @HarryStyle93 4 years ago +1

    Thank you for the demo but in that way only the feature branch of the first notebook will be versioned and not the preprod or prod ones. What is the right way to promote code from one notebook to another keeping them versioned and merging branches step by step? if I overwrite notebook content by workspace importing API , that will be replacing the notebook and git versioning link will fall

    • @DaveDoesDemos 4 years ago +1

      Production is versioned within the object store of Azure DevOps. Humans won't have access to the production environment so there won't be any changes in that environment anyway, since all changes are made and submitted via the development environment. Hope that makes sense?

    • @HarryStyle93 4 years ago +1

      @@DaveDoesDemos thank you! yes but what if you use Github which is supported by Databricks and the use case where you can have some "middle-way" environments such as Int, QA, Staging, etc ? if you want to keep all these envs versioned you should link git notebook per notebook or is there an automation mechanism that allows it?

    • @DaveDoesDemos 4 years ago +3

      @@HarryStyle93 if you're using GitHub the process is identical. You write the code in your dev environment and then use DevOps to copy that into an artifact (the deliverable code). This code is then pushed into the various other environments, which you're free to configure however you like. In real life I'd recommend Dev (where you code and unit test), then Test (where you run integration testing against fake data with known answers), then QA/Preprod (where you run against real data but place the results somewhere else for checking), and finally production. These can be the same environment/cluster if you really want to work that way or need to save money, but ideally should be different.
      When doing this, the only environment linked to GitHub is the development environment, all of the others get notebooks and other stuff delivered by DevOps as artifacts and should not have human interaction.

    • @HarryStyle93 4 years ago +1

      @@DaveDoesDemos ok I see your point. Thank you

  • @xelaCO14 3 years ago

    thank you!

  • @misterliver 2 years ago

    Thanks for the video Dave. It has been very helpful for me. There isn't much out there about Databricks CI/CD. After adapting to your stream-of-consciousness style, it seems the presentation of ideas vs the actions in the video are totally out of sync from a scripting perspective. If viewers have some experience with the CI/CD process in Azure DevOps already, this probably is not a blocker, but it could be a little difficult if no experience (the target audience?) or if English is not your first language.

    • @DaveDoesDemos 2 years ago +1

      Hi Benjamin thanks for the comment and feedback. It's a difficult subject to cover well as most data people see CI/CD as scripted deployment, which is very easy. I wanted to cover it in the way it's intended which required a little more understanding of the collaborative nature of CI/CD and Git, and DevOps in general. I'm working on a bunch of new content in Microsoft UK around collaborative DevOps, testing and more agile data architectures with this stuff in mind and hopefully will translate these to some more up to date videos later in the year. This is borne out of seeing large mature customers hitting operational scale issues as data pros work in traditional ways. It's a long road though!

  • @yogeshjain5549 4 years ago +2

    Hi Dave,
    Thanks for this demo.
    I am actually trying to deploy only those notebooks which got checked in latest in Master branch.
    Is there a way to achieve this functionality.
    FYI: i have found one bash script which is giving latest committed files but without path using below link.
    levelup.gitconnected.com/continuous-integration-and-delivery-in-azure-databricks-1ba56da3db45
    Let me know if you have any idea about it.

    • @DaveDoesDemos 4 years ago

      Hi thanks for the question. You should be deploying as a whole asset rather than picking and choosing, that way your deployment asset is a complete solution. Don't think of this as just version control, it's much wider than that.
      Having said that, you can do this with some scripting if you really need to, but it would add complexity. Is there a reason you don't want to grab everything in the folder?
      Finally, Databricks have announced Workspace 2 which will change the way this works for the better. Your whole project in Databricks will be a repo so you won't need to check files in one by one. I have no info on when this is coming, but as soon as I get access I will make a demo of the new functionality.

    • @yogeshjain5549 4 years ago +1

      @@DaveDoesDemos
      Thanks for your detailed information.
      In this request, i actually just wanted to deploy to PROD only those notebooks which got changed rather than deploying all of the folders and notebooks again and again.
      I have found the script to do this, my next challenge is to overwrite existing file on PROD if present rather putting it on 'Deployment' folder and then moving it manually to its actual location.

    • @DaveDoesDemos 4 years ago +1

      @@yogeshjain5549 keep in mind that this introduces a risk that your code will not be in a known consistent state, and will be much harder to replicate the environment using your release pipelines. This is the reason we deploy as one artifact for the project, and it eventually enables you to use ephemeral environments for continuous integration and deployment. You may also want to encapsulate some of your code into libraries which will be deployed to the cluster separately, making the notebooks smaller. There are no hard rules though so as long as you know what you're doing it should work fine.

    • @GuyBehindTheScreen1 2 years ago +1

      @@DaveDoesDemos I feel like I have to be missing something. This appears to create a pipeline for a single file, but you're saying we should grab an entire folder (all notebooks), which is what I'm attempting to do. Do I need to wrap your PowerShell script in a for loop to copy all the artifacts that were created in the pipeline step?

    • @DaveDoesDemos 2 years ago

      @@GuyBehindTheScreen1 No you're not missing anything. When I made this video the API only supported single file copy so you'd have to copy them in a loop. The Databricks interface now supports whole repos so the method is slightly different but the concept is the same.

  • @goofydude02 2 years ago

    18K + views but 900 subscribers why? if you are watching the content, no harm to subscribe right?

  • @tarunacharya1337 2 years ago +1

    Awesome demo Dave, thanks a lot - I have replicated this and works ok with one notebook in the same environment - the file name is hardcoded - $fileName = "$(System.DefaultWorkingDirectory)/_Build Notebook Artifact/NotebooksArtifact/DemoNotebookSept.py", how can I generalise this for all the files and folders in the main branch and what happens to $newNotebookName in this case?

    • @DaveDoesDemos 2 years ago +2

      Hi glad you enjoyed the demo. I'd recommend looking at using the newer Databricks methods which I've not had a chance to demo yet. These allow you to open a whole project at a time. For my older method you'd want to list out the contents of the folder and iterate through an array of filenames. In theory since you'll want your deploy script to be explicit you could even list them in the script using copy and paste, although this may get frustrating in a busy environment.

  • @AlejoBohorquez960307 2 years ago

    Thanks for sharing such a valuable piece of information. Quick question, I'm wondering what if my workspace is not accessible over Public Network and my Azure DevOps is using a Microsoft Self Hosted Pipeline? Any thoughts?

    • @DaveDoesDemos 2 years ago +1

      In that case you'd need to set up private networking with vnets. The method would be the same, you just have a headache getting the network working. Usually there's no reason to do this though, I would recommend using cloud native networking, otherwise you're just adding operational cost for no benefit (unless you work for the NSA or a nuclear power facility...).

    • @AlejoBohorquez960307 2 years ago

      Yeah! We are facing that scenario (customer requirement). Basically, the Azure DevOps Microsoft-hosted agent (and because of that the release pipeline), wherever it gets deployed on demand, needs to be able to reach our private Databricks cluster URL through our Azure firewall. So far I haven't got any strategy working for this. I would appreciate it if you know of some documentation I could take a look at. Thanks for answering. New subscriber!

    • @DaveDoesDemos 2 years ago

      @@AlejoBohorquez960307 Sorry I missed the hosted agent part. Unfortunately I think you need to use a self hosted agent on your vnet to do this, or reconfigure the Databricks to use a public endpoint. It's very normal to use public endpoints on Databricks, we didn't even support private connections until last year and many large global businesses used it quite happily. I often argue that hooking it up to your corporate network poses more of a risk since attacks would then be targeted rather than random (assuming you didn't make your url identifiable, of course).