Resources

Workflow Demo Live Q&A - September 25th!

On September 25th, we hosted a Workflow Demo on data-level permissions using Posit Connect with Databricks, Snowflake, and OAuth: https://youtu.be/ivEoeyWJzVY?feature=shared

Links mentioned in the Q&A:

- Release blurb: https://docs.posit.co/connect/news/#posit-connect-2024.08.0
- Security: https://docs.posit.co/connect/admin/integrations/oauth-integrations/security.html
- Publishing Quarto: https://docs.posit.co/connect/how-to/basic/publish-databricks-quarto-notebook/
- sparklyr: https://github.com/sparklyr/sparklyr?tab=readme-ov-file#connecting-through-databricks-connect-v2
- odbc: https://github.com/r-dbi/odbc?tab=readme-ov-file#odbc-

Helpful resources for this workflow:

- Full examples to get you started: https://github.com/posit-dev/posit-sdk-py/tree/main/examples/connect
- Admins will likely be most interested in starting here: https://docs.posit.co/connect/admin/integrations/oauth-integrations/databricks/
- End users will be most interested here: https://docs.posit.co/connect/user/oauth-integrations/
- Databricks integrations with Python cookbook: https://docs.posit.co/connect/cookbook/content/integrations/databricks/python/
- Databricks integrations with R cookbook: https://docs.posit.co/connect/cookbook/content/integrations/databricks/r/
- Snowflake integrations with Python cookbook: https://docs.posit.co/connect/cookbook/content/integrations/snowflake/python/

Sep 26, 2024
29 min

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

Hey everybody, thanks for jumping over here to the Q&A. We're going to give people probably another minute or so to join us over in this room.

Well, thank you all so much for joining us today and thank you to Zach for such a great session. As a reminder, we host these workflow demos the last Wednesday of every month and they are all recorded. And so the recording is made available immediately after the demo. And so we now have, I believe, 19 different workflows shared there. I can't believe we've been doing it this long already. So anything from building a model annotation tool to pins workflows to building beautiful business reports.

I want to add, while I know many people joining may already be Posit customers or using our professional products: if you're new to Posit Team and would like to try it out, or if you want to chat more with our team, always feel free to message me on LinkedIn or email me. But I'm also going to put a link in the chat where you can schedule time to chat with our team. And sometimes that's really helpful if there's questions that might be better suited for a more one-on-one conversation where we can actually dive deeper into your own workflows.

Introductions

Well, to get us started here today, I would love to just quickly go around and do some brief introductions. And I didn't introduce myself when I kicked it off, but I'm Rachel Dempsey. I lead customer marketing here at Posit. And so I host a few different community events like this one. If you've never been to our data science hangout before, I host that every Thursday. We have a different leader from the community joining to answer questions from you all as well.

Hello, everyone. My name is David Kegley. I'm a software engineer on the Connect team. I've mostly focused on the integration stuff that we've been talking about today. But a lot of the Kubernetes work and office execution back end as well.

Hi, everyone. My name is Pablo Bianco Lascaray. I work with David and Zach on the Connect team. I'm a quality assurance engineer on the team. Let me just try to make sure things work before we release them.

Yep. My name is Zach Verham. I just talked for 20 minutes about this feature that we're really, really excited about and really happy to be on this workflow demo to share. And I work with David and Pablo. I'm a software engineer. Before working on this feature, I was doing a lot of work on sort of what the experience is for Connect admins as they interact with the product.

Q&A: authentication and SAML

Does this feature work when Connect is using the SAML authentication provider?

Yeah, I can take that one. The feature itself uses OAuth under the hood and does require that you're using OAuth with the external data sources. It does not require that Connect itself is using any particular authentication provider. So you might be using SAML or LDAP or PAM or username-and-password authentication. Any of those are fine, and any of those can be used with this new feature.

Q&A: Quarto and Markdown reports

Another question that came in on Slido is, can you deploy Quarto or Markdown reports from the RStudio IDE to Posit Connect using Snowflake or Databricks credentials?

Yeah, I can take that one as well. Currently this feature is limited to interactive content. It requires that a Connect user actually visits the content in order to inject the credential. So unfortunately, we can't support rendered content yet. R Markdown and static Quarto are not supported yet, although we do have plans on the roadmap to support those with a slightly different OAuth flow. So there should be news coming out about that soon.

I don't want to talk about deadlines specifically, but I think that's one of the next things we're going to work on. So unfortunately, static and rendered content still do require environment variables, but we'll be working on that soon.

Yeah, I can add a little bit of flavor here. I was just looking at this particular problem this morning and figuring out what the solution will be. So this is something that we are very much aware of and something that we want to support. And the question specifically says without setting environment variables, right? But we do have a guide specifically for deploying Quarto to Connect using environment variables. So if anybody is curious about that, we can also provide the links.
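As an illustrative sketch of that interim approach for rendered content, the names below (the function, and the `DATABRICKS_TOKEN` variable) are conventions chosen for this example, not anything Connect requires:

```python
import os

def databricks_token(viewer_token=None):
    """Pick a credential source for Databricks.

    Interactive content can pass the per-viewer OAuth access token that
    Connect injects; rendered or scheduled content falls back to an
    environment variable set on the content item in the Connect dashboard.
    """
    if viewer_token:
        # Interactive session: prefer the viewer's own credential.
        return viewer_token
    token = os.environ.get("DATABRICKS_TOKEN")
    if token is None:
        raise RuntimeError("No viewer token and DATABRICKS_TOKEN is unset")
    return token
```

The same content can then run both interactively (with per-viewer permissions) and on a schedule (with a service credential), at the cost of the scheduled render seeing whatever the service credential can see.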

Q&A: data masking visibility

Another question was, is it apparent to viewers of the data that masks have been applied? Or is logic exposed anywhere for viewers to understand why they may not see all the data?

Yeah, that's a great question. It is going to feel different for viewers who are used to interacting with Connect, because up until today, for any content you run on Connect, every viewer of that content is going to see the same view. There was no such thing as a dynamic application where one viewer sees one thing and another viewer sees a different thing. So it is not obvious right off the bat that what you're seeing is correct, even though the ACLs are applied at the data source. That's one of the pieces of feedback we've gotten a few times now, and I think it's something we'll be addressing, probably through the dashboard.

We can make it a little bit more apparent through the dashboard that these integrations are in use. Another option is for the application author to display a banner or something at the top, just to let folks know that what they see might differ from what someone else sees. It is an explicit choice by the content author, so it's not something that's going to happen without any effort. But it should be pretty easy for the content author to just display a banner and let folks know what to expect.
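A banner like that can be as simple as a helper the app calls when building its page header. This is a hypothetical sketch, not a Connect API; the wording and markup are entirely up to the author:

```python
def masking_banner(viewer_name):
    """Return an HTML snippet warning the viewer that row-level security
    applies to the data they are seeing."""
    return (
        f'<div class="alert alert-info">Data shown to {viewer_name} is '
        "filtered by your Databricks/Snowflake permissions; other viewers "
        "may see different rows.</div>"
    )
```

In a Shiny or Streamlit app the returned snippet would be rendered once at the top of the layout, keeping the notice visible without touching the data logic.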

Q&A: OAuth cookbook and open source support

Another one was thanks for the demo. Zach mentioned that the cookbook for the OAuth sample code may be released this week. Is it released yet? And if so, can you post the link?

It's not released yet, but it definitely will be. We're working on that. And when that's available, we'll make sure that the links are available to everybody.

Yeah, and we'll have links to the Python SDK, which has a subset of the examples that will be in the full cookbook. So if you want to look at things today, there are examples in there. There will just be a more robust set of examples that will go out later this week.

Another Slido question was, will this Databricks integration be available in open-source RStudio?

So the integrations that Connect is managing, unfortunately, are not going to be available in an open-source version. There is no open-source version of Connect; the closest thing we have is Connect Cloud, which is free to use and in public alpha, but we don't have support for these OAuth integrations there yet. If you're looking to use Databricks and Snowflake from R and Python, Posit helps maintain the R odbc package, which has some really helpful Snowflake and Databricks connectors. We can share a link to that. And then Posit also helps maintain the sparklyr package for interacting with Spark on Databricks.

Q&A: LDAP and OAuth providers

If we still use LDAP for login to Connect, can we also use this OAuth feature out to Databricks?

Yeah, for sure. Regardless of the provider that you're using to log in to Connect, you can still set up an integration with Databricks through OAuth.

The other question, I believe, was are there any OAuth API providers that are on the shortlist to add?

Yeah, it's hard to know exactly what the timelines are for certain things. Again, scoping out when things will land is hard, but in the immediate short term, GitHub is one that we're prioritizing. And then we are going to be looking at BigQuery and AWS and figuring out how to set up first-party integrations for those as well.

Yeah, I can add to that. Azure is supported today. That was one of the first ones that we implemented. So we have support currently for Azure Databricks and Snowflake. And then we're working on GitHub now. That should be happening pretty soon. And then after that, I think, as Zach said, it's going to be like Google and AWS probably next. But if there are other things that I didn't list, then we would love to hear about what our customers are using and what they want us to integrate with next.

And one additional caveat here is we showed this briefly at the end of the workflow demo. But there is a custom integration type that gives you full control over filling in all the information you need to hit any standard OAuth provider. So if you have something that is not supported in one of those dropdown templates that we have, that custom integration is a way to integrate with anything that we don't have in that list.

Yeah, I think that bears repeating, right? Even though we have a list right now of three supported providers, we do provide a way to add custom integrations there. So you might be able to add support for a lot of things today. In the future, there will be better, easier ways to do it, but you should be able to do it today too.

Q&A: Unity Catalog and Active Directory groups

Does the masking and filtering in Unity Catalog also work with Active Directory groups? In addition to providing a username like you showed in the demo?

That's not something we have tested. I think it would depend on how Active Directory is configured. I'm assuming Unity Catalog has some way to connect out to Active Directory, which would mean that Databricks would be aware of the user's groups. If that's the case, you should be able to write a Databricks filter that uses the group to perform some logic during the masking process. So I think that would mean yes, it would be possible to do that today. I'll try to find some associated documentation, and I'll send a link in the chat if I can find it.
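For reference, Databricks SQL does expose a group-membership predicate, `is_account_group_member()`, that a Unity Catalog row filter can call. All table, column, and group names below are made up for illustration:

```sql
-- Hypothetical names throughout. Members of data_admins see every row;
-- everyone else only sees rows where region = 'US'.
CREATE OR REPLACE FUNCTION us_filter(region STRING)
RETURN IF(is_account_group_member('data_admins'), TRUE, region = 'US');

ALTER TABLE sales SET ROW FILTER us_filter ON (region);
```

Because the filter runs inside Databricks, any app on Connect that queries the table with the viewer's delegated credential would inherit this logic automatically.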

Q&A: alternative approaches for user-specific data

What are some alternative approaches that you have seen customers use when retrieving user-specific data from an external data catalog?

I think the most common one we've seen, specifically within the context of Connect, is deploying multiple instances of the same app and setting static, service-account-esque credentials on each one, where the service account has the same permissions as the users interacting with that duplicated instance of the app. That is obviously very clunky, and that's why we wanted to build this feature.

There are two others that come to mind. There's the possibility of actually implementing OAuth inside of your content. It is technically possible to handle the OAuth flow from your content, but it would mean you would have to implement it for every piece of content, and that's a lot of work. It's also really difficult to maintain, and it gets really complicated, particularly with high-availability Connect. You never know which Python or R process you're going to connect to, and so that would be weird.

The other option is using some of the metadata provided by Connect when the content is running. Connect does expose information about the connecting viewer to the content itself. You can get that information out of one of Connect's headers. You could write application-specific logic for every piece of content that says, if this viewer then access the external resource with this other set of credentials, but that doesn't scale very well either.
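A minimal sketch of that header-based pattern follows. The header name and the credential map here are both hypothetical; check the Connect documentation for the header your version actually forwards, and load real credentials from a secret store rather than hard-coding them:

```python
def credentials_for_viewer(headers, cred_map,
                           user_header="X-Viewer-Username", default=None):
    """Map the viewer identified in a Connect-forwarded header to a
    credential.

    headers: request headers as a dict-like object.
    cred_map: viewer username -> credential (illustrative only).
    """
    user = headers.get(user_header)
    return cred_map.get(user, default)
```

This makes the scaling problem concrete: every piece of content carries its own copy of `cred_map`, which is exactly the access-control logic the OAuth integrations move back to the data source.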

So we think that what we've got here is a much better approach for managing that because it's going to scale a little bit better. Rather than maintaining that access control logic inside of your content, you can maintain the access control logic at the data source where they have really robust controls for writing those access control rules.

Q&A: row-level security and security considerations

Are today's row-level security features new for Databricks in RStudio?

The row-level security features are not new for Databricks, but the ability to delegate those row-level security features to Posit tools from Connect is new. Previously, Workbench already had integrations with Databricks, where a user's session inside Workbench could inherit those Databricks rules, but that ability did not exist in Connect before today. So now that Connect can inherit those row-level permissions, it gives you the same experience in Connect that you already had in Workbench.

Are there security considerations we need to keep in mind when building apps using this feature?

We have a doc in our admin guide that lays out a lot of those implications. The high-level summary I can give here is that publishers and app authors need to consider the lifecycle of their app and where the OAuth credentials live in memory, and avoid using, for example, global variables to store that information. The doc talks through those security considerations and the things to be aware of when writing your apps. The main thing is that publishers need to understand how the feature works and where those credentials are stored.
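As a rough illustration of the global-variable pitfall (all names here are hypothetical stand-ins, not Connect APIs):

```python
def fetch_viewer_credential(session):
    """Stand-in for the per-viewer OAuth lookup performed via Connect."""
    return f"token-for-{session['user']}"

# Anti-pattern: a module-level global resolved once at startup would leak
# the first viewer's token into every later request served by the same
# shared process:
#   CACHED_TOKEN = fetch_viewer_credential(first_session)  # don't do this

def handle_request(session):
    # Safe pattern: resolve the credential inside the request handler so
    # its lifetime is scoped to this one viewer's request.
    token = fetch_viewer_credential(session)
    return f"queried warehouse with {token}"
```

Because Connect shares content processes across viewers with this feature, anything stored at module scope outlives a single viewer's request, which is exactly what the admin-guide security doc warns about.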

Q&A: release status and version requirements

Somebody asked, is this feature released already or is it soon to be released?

It is available, yes. If you upgrade to the 2024.08 release of Connect, it'll have the feature in it.

Is there a minimum requirement for the R version or Python version to use these new OAuth features?

No, there are no minimum requirements for R or Python versions. Everything we've implemented for this feature uses really standard REST APIs and is built on top of OAuth. So you might be somewhat limited by what libraries you have available, but that doesn't mean it's not possible; you would just have to write a little bit more boilerplate to make it work.
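For example, the credential exchange is a standard RFC 8693 token-exchange POST. This sketch builds the form body; the `subject_token_type` URN matches what Posit's Python SDK examples use, but verify both it and the endpoint path against the Connect API reference for your version:

```python
def token_exchange_payload(user_session_token):
    """Build the form body for Connect's OAuth credential exchange.

    The grant type comes from RFC 8693 (OAuth 2.0 Token Exchange); the
    subject_token_type URN is taken from Posit's SDK examples. The result
    would be POSTed to Connect's OAuth credentials endpoint.
    """
    return {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token_type": "urn:posit:connect:user-session-token",
        "subject_token": user_session_token,
    }
```

In practice the Python SDK wraps this in a single call, something like `client.oauth.get_credentials(session_token)`, so hand-rolling the boilerplate mainly matters when an SDK isn't available for your language.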

Run as current user comparison

I thought for sure someone would ask about run as current user. I'm very hesitant to bring it up at all. But I do want to point out that this feature is similar to run as current user, but it is still a little bit different.

Run as current user allows the viewer of an application to execute the content as their own account on the server. Basically, it's a way to use Kerberos with Connect, but it does require PAM authentication, and this new feature can be used with any authentication provider. So that's one distinction. Another distinction is that with run as current user, every viewer of the application is executing the content in their own isolated process on the server. This OAuth integrations feature shares content processes under the hood, so every viewer of the application is not getting their own process. It does have some different security implications; we talk about that a little bit in the security doc that we've already shared in the chat. But this new feature is much more lightweight, and it's going to work with any authentication provider. Those are the main distinctions.

I don't have anything in particular, but I do appreciate all the questions that were asked. So thank you, everyone. Yeah, thank you so much, everybody, for taking the time to join us today. I just wanted to add that I said this in the beginning, but we do hold these monthly workflow demos the last Wednesday of every month at 11 a.m. Eastern Time. And so if you ever have suggestions or workflows that you'd like to see, please let us know. You can add that into the comments here below in YouTube.

But I did want to give you a little sneak preview of next month's demo. Ryan Johnson is going to be joining us to talk about PDFs and Typst, and to show you how to create beautiful PDFs with Posit. And so we'd love to have you join us at that one as well. The link that I shared here adds the recurring workflow demo series to your calendar. They are also all recorded, so you can see them all on the playlist there. Thank you so much, Pablo, David, and Zach. Really appreciate you taking the time to join us, and have a great rest of the day, everybody.