I’m almost more interested in somebody who can pass a technical test, but who has an undergraduate degree in sociology, in economics, in political science, in one of these soft-science fields. It shows that you’ve thought about how to think about things from a theoretical perspective or a human perspective, and yet you still have the tech skills. I like that combination of skills. It’s a very interesting one.

David Yakobovitch

This is HumAIn, a weekly podcast focused on bridging the gap between humans and machines in this age of acceleration. My name is David Yakobovitch, and on this podcast I interview experts in sociology and psychology, and artificial intelligence researchers working on consumer-facing products and companies, to help audiences better understand AI and its many capabilities. If you like the show, remember to subscribe and leave a review.

David Yakobovitch

From AI governance and your privacy to how you can level up your learning and awareness of AI systems, today’s guest shares his take on both public and private partnerships in the age of AI. Enabling the future of work requires continuous learning, a commitment to removing your biases, and the persistence to show up to opportunities. Find out what Jed Doughtery has to share on this episode of HumAIn.

Thanks for tuning back into the HumAIn podcast. I have with us today Jed Doughtery, who is a lead data scientist at Dataiku, talking all about things in machine learning, AI, governance, and how you can be involved in the AI race.

Jed Doughtery

One of the big pushes Dataiku has made since it started building its product in 2013 is this inclusive idea of artificial intelligence: as long as we’re building this big scary thing, and the whole world is now beginning to use it, it should be usable by as many people as possible, because limiting it to a few smart dudes in their ivory towers, or to a few very small tech companies, is also going to limit who gets power from it.

So the more we can make these tools relatively easy to use, the more evenly distributed the power that comes from them is going to be. And we’ve always fought to do that. It scares a lot of people in the data science community that we make some of this stuff easy, because they want to be the gatekeepers. But as long as we’re also educating people while we’re giving them the ability to do these additional things, then we’re doing the right thing, or at least trying.

David Yakobovitch

That’s such an interesting topic, so we should definitely discuss that during the podcast. Whether it’s Google Next, or whether it’s a splintering between code and drag-and-drop with products like yours and many other competitors out there, that’s interesting.

Jed Doughtery

Google and Amazon right now are both pushing for this model of: let us do the algorithms, and you just push your data to our API endpoints. We’ll do the magic and then return the results, which keeps the power in their hands, although it lets everybody build stuff on top of that power. It’s an interesting give-and-take. They’re not saying, “let us do everything.” They’re just saying, “let us maintain control of the actual algorithms.”

David Yakobovitch

When we think of enterprise machine learning pipelines, that’s something that’s been all the rage, and it’s, again, splintering. Do you learn how to code and build all the pipelines from scratch? If you’re a citizen data scientist, maybe that’s something you’re looking to upskill on so you can be on a data science team. But if you’re an enterprise executive, perhaps you like these automation tools that Google, Amazon, and even you guys at Dataiku are coming out with. Do you have any thoughts on what direction the industry is going?

Jed Doughtery

What I see a lot in the larger enterprise folks I work with is that they have a hard time. Okay, let’s just talk about the median income of a data scientist at Google. There’s a Wall Street Journal piece that came out pretty recently about this.

It wasn’t just about the data scientists at Google or the big four tech companies, but the median-income employee there. At Facebook it’s something like $245,000. No other big company in America, outside those big four tech companies, is going to hit that median income for their data scientists.

Which means they have a different pool of talent to pull from, period. These are huge companies: the largest banks, the largest manufacturers, the largest pharmaceutical companies in America are still not hiring from the same talent pool that Google, Amazon, Facebook, and Netflix are hiring from.

And so if you’re an executive, you know this. As much as you want all your people to be building everything from scratch, because that’s where the absolute edge comes from, you also know that’s just not who you’re hiring. And maybe it doesn’t need to be. There might be a few different pieces you can replace, or tools you can give these people, so they don’t need to be this full-spectrum super data scientist. That’s kind of what Google is trying to provide.

We think this is the hardest part, so we’re going to give you this, and then you can have people who do everything else. Google and Amazon have slightly different perspectives than Dataiku does on what the hardest part might be. That’s kind of what they’re all trying to help with.

David Yakobovitch

In April 2019, I had the opportunity to be part of the New York AI conference, a community conference where we talked about AI ethics, new AI tools, and the direction the industry is taking. There were a lot of amped-up audience members who were very concerned about workforce initiatives and how everyday people can transition into tech-relevant careers, and there are reports always coming out.

We also saw in April that Deloitte came out with its 2019 Human Capital report. That report said: “We’re moving to a future of super jobs.” We’re moving to a future of jobs where one person does what two or three people used to do, as a result of automation, code, and drag-and-drop tools. What’s your take on that?

Jed Doughtery

It’s feasible. A guy running for president of the United States right now just did an amazing podcast on FiveThirtyEight, and I know I’m referencing another guy’s podcast on your podcast, but Andrew Yang is a big proponent of universal basic income, and a big part of his argument for it is exactly along these lines.

And everybody uses the horse-and-buggy analogy: oh, we’re just going to find new jobs to replace these old jobs. But that’s maybe not realistic. Realistically, this time, the same set of middle-class white-collar jobs is not going to exist anymore in 10 or 20 years. It’s not just the bus drivers, the cab drivers, and the people working at the front of the supermarket who are going to be replaced or removed by AI.

It’s going to be a huge part of management, of work tracking, of HR, of data pipelines within companies. That’s the real danger: all of those jobs, the ones that make things slightly easier for everybody else, are very likely to be displaced.

I was talking to a friend of mine who’s in the film industry. Part of what the film industry does: somebody writes a script for a television show, that script gets sent to a set of lawyers, and those lawyers manually look at every name in the script, then look that name up on Google, to make sure it isn’t a real person’s name whose owner could sue over its use in a television script. This is a super-solved problem in NLP.
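That name-screening step can be approximated in a few lines. This is a minimal sketch: a crude capitalized-word heuristic stands in for a real named-entity model, and the script text and the “known real people” list are invented for illustration:

```python
import re

def candidate_names(script_text):
    """Pull out capitalized two-word sequences that look like character names."""
    return set(re.findall(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", script_text))

def flag_risky_names(script_text, known_real_people):
    """Return script names that collide with real people and may need review."""
    return candidate_names(script_text) & set(known_real_people)

script = "INT. OFFICE - DAY. Maria Hobbs hands the file to Dan Okafor."
real_people = {"Dan Okafor"}  # stand-in for a real-person lookup
print(flag_risky_names(script, real_people))  # {'Dan Okafor'}
```

A production version would use an actual named-entity recognizer and a real lookup service, but even this toy shows why the manual lawyer pass is automatable.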

Someday soon, somebody is going to automate this, and all those lawyer jobs are going to go away. And that’s true throughout every single industry; there are examples of this everywhere. I don’t think we have any idea how disruptive this is going to be. That was a silly little example, but it’s everywhere.

David Yakobovitch

That’s around the automation pillar: how do you take a repetitive process and simplify it through a dashboard, a report, or an insight that can replace someone who was doing it manually, and then create this super job that’s augmented by the human. Another example, to piggyback off that: Ginni, the CEO of IBM, was talking about a very interesting model IBM created recently for HR.

So I’ll use the use case she brought up on work tracking. She said in April 2019 that they’re now able to predict with 95% accuracy when you’re going to quit your job. It was out in the news, and everyone said: well, that’s interesting. From an HR perspective, you want to protect the company and serve its best interests, but how does it look from the employee’s perspective?

Jed Doughtery

You made an interesting statistical assumption there, which is that it’s 95% accuracy on when I’m going to quit my job. It’s probably not that. It’s probably: if I have a thousand people, I can predict with 95% accuracy which of the total batch will quit. That’s much better for the employer than it is for the employee. I only quit my job maybe one or two or three times in my life.

Being wrong one in 20 times, for one in 20 employees, is a very different experience for them than it is for me if I’m the wrong one. It’s a weird statistical point, and I don’t know if I’m making it perfectly well, but this type of 95% accuracy is almost always better for the party dealing with the widest batch of people, who is happy they got it right 19 out of 20 times, than for the one wrong person they picked. Not to say it’s not a super interesting technology, but high-accuracy systems that are still wrong every once in a while are always better for the people dealing with the batch than for the individual.
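The arithmetic behind that point, with the same made-up numbers: a 95%-accurate model applied to 1,000 employees still gets roughly 50 individuals wrong, and each of those people experiences a decision about them that was entirely wrong:

```python
n_employees = 1000
error_rate = 0.05  # the model is right 95% of the time

# Expected number of people the model is simply wrong about.
expected_errors = int(n_employees * error_rate)
print(expected_errors)  # 50

# The employer sees a 95% hit rate across the batch; each mislabeled
# employee sees a decision about them that was 100% wrong.
```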

David Yakobovitch

One of my colleagues in this space, Chris Butler, talks a lot about empathy mapping. He talks about these HR systems that have triggers, and yes, 95 out of a hundred times, or 19 out of 20 times, it’s accurate. But what about the other person, singled out where the mistake occurred? Is that a legal case waiting to happen, or is that an issue with the AI system?

Chris talks about empathy mapping: how can we design AI systems to be diverse, inclusive, and well educated and trained for multiple scenarios? That’s one of the big challenges a lot of companies are starting to see this year: yes, we could implement AI everywhere, AI for all, AI for every company. But how good is it? And how accurate is it?

Jed Doughtery

The diversity of teams is huge, and the explainability of models is huge. If I’m going to let somebody go before they quit, I’d better know exactly what feature set pushed them into the “probably going to quit” category.

And I would want to know: could I tweak one of these input features? Can I pay this person three grand more to change it? So far AI has been about prediction, not the explanation of those predictions.

That works well when all you’re trying to do is get the most people to click through an ad, or whatever. But now that it’s expanding to other industries, the explanation of why the choice was made is becoming more and more important, and most of the highest-performing, most predictive models are bad at that. This is a huge thing Dataiku is working on this year. It’s almost more important to stop making your models more accurate and start making them more explainable. That is the key as we start rolling these things out across huge industries.

David Yakobovitch

And then making them more explainable. You and I had the opportunity to chat before the podcast about the Google Next conference that you also checked out in April. We know Google has come out with their “What-If” tool to help with that explainability of models, along with a lot of new products. What were some things you saw out of that, the demos and new product lines, that you think could be helpful for the explainability of AI?

Jed Doughtery

Google is doing some interesting stuff. I don’t know if it’s much beyond a toy yet; they’re working hard to make it more than that. There’s also some interesting open-source software already out there that tries to start bridging this gap. The LIME package, which is part of the Python ecosystem, is quite a good start.

There are other packages out there working toward this local explainability, where, okay: if I’ve put this person in this category, what mild changes in this person’s input set would move them into the other category? Google’s “What-If” tool lets you slide along a single feature while holding everything else equal and see how that changes the prediction.
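A sketch of that single-feature sweep, holding everything else equal. The scoring function and feature names below are invented stand-ins for illustration, not any real attrition model or the What-If tool’s API:

```python
def attrition_score(tenure_years, salary_k, commute_min):
    """Toy linear 'quit risk' score; an invented stand-in for a real model."""
    return 0.5 - 0.03 * tenure_years - 0.002 * salary_k + 0.004 * commute_min

baseline = {"tenure_years": 4, "salary_k": 90, "commute_min": 45}

# Slide salary alone, holding tenure and commute fixed at the baseline,
# and watch the predicted risk move.
for salary in (80, 90, 100, 110):
    inputs = dict(baseline, salary_k=salary)
    print(salary, round(attrition_score(**inputs), 3))
```

This is the same question Jed raises above: would paying this person three grand more move them out of the “probably going to quit” category?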

I was talking to somebody at Minitab, which is an old statistical program; I was talking to a statistician there. One of the things they’re working on, to bring themselves back to the forefront of this type of stuff, is working extremely hard on combining different features into a single what-if slider: these are your three most important features, and as they adjust in your feature space, how does the prediction change? Google and Amazon are working hard on this, but a lot of smaller players in the industry are also working on it.

One thing I noticed with Amazon: they’re trying to build single-purpose endpoints. You have a single prediction point, an endpoint that provides some NLP process. Say this endpoint recognizes all the nouns in your block of text: you send it any block of text, and it spits back the nouns. But that endpoint doesn’t give you anything else. None of the “what if,” none of the error bounds, nothing. You put your data in, you put your trust in Amazon, and you get the output. That’s a little bit scary.

David Yakobovitch

It is scary. What if it were running on my emails and said each email is sensitive or not sensitive? That could be good for a company protecting information and trade secrets, but it’s still very “black box.” And how do you then empower the users whose decisions are driven by AI?

So instead of HR being flagged, “this email is sensitive, we should look at this employee, we should look to terminate them, let’s use that other AI algorithm to see if they’re the one out of 19, or the five out of a hundred, who should be removed,” how about empowering the employees instead? A pop-up appears in your email browser: “We think the email contents you’re sharing are sensitive. Please be mindful. Are you sure you want to send it?”

Jed Doughtery

So pushing the power back to the user immediately. I like that.

David Yakobovitch

It’s interesting that not a lot of companies are doing that yet. It is possible, and it’s empowering: human-augmented intelligence. When it comes to AI governance, thinking about ethical decision-making and regulations, humans should be at the forefront. And I don’t think it’s hard to do. It’s what you were saying earlier, Jed: how diverse is the audience building these systems?

Jed Doughtery

I have an interesting quick example about that. A couple of years ago I was working for the City of New York, building out an algorithm that looked at 311 calls, which are like a boring 911: potholes, “the kids below me are partying,” that type of thing.

We built out this algorithm around the idea of: which calls are worth following up on, and which calls are basically cranky people who will never be happy? That way New York City would know where to devote its resources.

We built out the algorithm, and it was working super well, with very high accuracy. But before we put it into production, of course, I mapped it: where, over the past 24 hours, were we getting realistic calls versus cranky calls? Location shouldn’t matter in this context, unless there’s a whole bunch of cranky people living in the same area, in which case you maybe have a whole different issue. But of course, location mattered a ton.

And I was only able to immediately recognize those location-based trends because I live here in New York, and I know, “Oh, this neighborhood in Brooklyn,” or “this neighborhood in the Bronx,” and the backgrounds of the people in those neighborhoods. The model had just learned to be racist.

Because systematically, historically in New York, people from communities with higher percentages of people of color were more ignored by the city, the model just learned to keep ignoring those people. So of course we could not put this model into production. But without some personal knowledge of the area we were putting it over, or without adding in census data or something like that where we’d be able to see these red flags being raised,

I never would have picked up on it. We would have put that model into production and started ignoring people, which is kind of concerning. If you don’t have people connected to the thing you’re trying to predict, it’s easy to miss a trend and to assume you have complete data when you do not.
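The kind of red flag Jed describes can be surfaced with a very simple check: group the model’s outputs by location and compare the rates. The neighborhoods and predictions below are fabricated for illustration:

```python
from collections import defaultdict

# (neighborhood, model_says_follow_up) pairs -- fabricated example data
predictions = [
    ("Park Slope", True), ("Park Slope", True), ("Park Slope", False),
    ("Brownsville", False), ("Brownsville", False), ("Brownsville", True),
]

def follow_up_rate_by_area(preds):
    """Share of calls the model marks 'worth following up', per neighborhood."""
    totals = defaultdict(lambda: [0, 0])  # area -> [follow_ups, total calls]
    for area, follow_up in preds:
        totals[area][0] += int(follow_up)
        totals[area][1] += 1
    return {area: f / n for area, (f, n) in totals.items()}

rates = follow_up_rate_by_area(predictions)
print(rates)
# A large gap between areas is the red flag: location shouldn't matter here.
```

Joining the rates against census demographics, as Jed suggests, is the same grouping step with an extra lookup per area.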

David Yakobovitch

At the New York AI conference in April, I also had the chance to be on a panel, and I sat down with Albert Christie, who is featured in episode two of the HumAIn podcast. He painted a post-apocalyptic scenario. He said, for those of us in the United States or Europe, who are very familiar with privacy rights being explored: data is very sensitive, and things cannot be used without your permission.

But in East Asian countries, as a lot of people know, China has implemented this social credit monitoring system, which is very interesting, to say the least. And he said: “Let’s think about the U.S. What if that system was put in place? What if your Facebook data, your medical searches on WebMD, your audio messages from your Facebook Portal and Alexa devices, all this information, was used to decide whether you should be treated for a diagnosis, whether you should get approved for a loan, with all your search habits accounted for?”

It’s interesting because it creates a bigger snapshot of who you are as a person, but it leads down that same rabbit hole: we’re building racist AI systems, not fair AI systems, and that’s a huge concern.

Jed Doughtery

As we start building more AI systems to make decisions about humans, and about how humans should be treated, how we’ve been treated in the past becomes the input for these systems. AI systems still learn through supervised learning, which means labels have been applied to a set of records. And if we’re making decisions about humans, the only ones who have made these decisions about humans in the past are humans.

So your labels are generated by humans, which means they carry all the failures and foibles of our current society. Pushing those into a model makes that model exactly as good as our current society, or worse, because it’s missing some points. And clearly, our current society has institutional racism and institutional sexism built into it.

David Yakobovitch

One extra scenario he shared: let’s imagine a future where autonomous vehicles are on the road, and a scenario appears where the autonomous vehicle needs to kill someone, because it’s moving too fast and at that point it’s unavoidable. If you’re in the vehicle, it probably won’t kill you; it’s going to target either another vehicle or the person on the road. And the insight was: if you’re a shareholder of that autonomous vehicle company, it probably won’t hit you either, because the data will be there saying, “Oh, this person owns 10,000 shares, let’s make sure we don’t kill them.” So the almost comical outlook was: buy into these companies now, and you’ll be safe when the algorithms go wrong.

Jed Doughtery

I like that angle. You’d originally think it just chooses the most utilitarian option, minimum deaths or something like that.

Then, of course, you’d never buy a car that’s going to kill you. And then they start layering those different considerations on top of that. I like that it gets to shareholders. That’s good.

David Yakobovitch

And it’s so interesting because there was big news in April that is still being unpacked. Of course, we know Lyft IPO’d, and Uber is in the process of setting that up as well. Uber just came out saying: look, last year we made $10 billion, but we also lost $3 billion. That just hit their financial briefing to go public.

And then the icing on the cake, which I found so fascinating: Ford, as a car company, came out saying, “We overestimated the capabilities of autonomous driving, and we’re scaling down our units and our investments. We do not think autonomous vehicles will be ready by 2025, as all the predictions were saying.” They just recently announced that.

Jed Doughtery

What they mean is we overestimated human enthusiasm for autonomous driving.

David Yakobovitch

Was it the hype for the AI? Was it the hype for the humans? Was it the hype for the systems?

Jed Doughtery

Maybe the North American car manufacturers, and the world in general, thought that people would be more willing to give up their driving rights than they actually are, and now we’re seeing that human pushback.

I do think the technology absolutely could be ready by 2025 from a technological perspective. Would you have far fewer deaths on the road in 2025 if everybody switched over to autonomous vehicles? Yes, a hundred percent. But will people give that up? I don’t know. I doubt it. I don’t think it would be an easy transition by any means.

David Yakobovitch

And if you’re in a city like New York, where both you and I are based, it’s a lot easier to give up having a vehicle.

Jed Doughtery

I’ve been without a car for 10 years.

David Yakobovitch

But here it’s easy. It does bring up the question of the grid system, though. Some major cyber-attacks hit New York City’s streetlight cameras and the police cameras that read license plates; they went down in April as well, and stayed down for over a week. A lot of news outlets reported on how New York City responded to its facial recognition software being down. And it happened in Albany, the state capital of New York, too.

The interesting thing is that it crippled Albany’s systems; Albany was completely offline, pulling out dot-matrix-era computers to figure out how to get back online. But New York City, being so centralized, almost futuristic if you will, kept going without those cameras, and lifestyle went on. I actually don’t think you and I would have known about it if it wasn’t so publicized.

Jed Doughtery

I had no idea, but I was also out of town. What exactly went down? Was it the parking garages and the public entry and exit points, things like that? Or a centralized computing system?

David Yakobovitch

A centralized computer system. Basically, all the police cars are hooked up to wifi networks, and they have body cams and cameras in the cars, and they can also use software to detect whether people are where they should be, or who people are. The wifi network got hacked, and a service request took it down.

And some people are saying, “Oh no, the system went down!”, but it was the wifi. The wifi was not secured and got compromised. And this was not the LinkNYC street wifi that everyone uses. Then, of course, the blame goes around.

Why did the system go down? Who took it down? But it’s less about isolating this incident and more about thinking ahead. In a future of Evil Corp, like in Mr. Robot, a society where everything lives digitally, how much are these systems protecting us, versus protecting the bureaucracies? What’s your take on that?

Jed Doughtery

It’s very connected to what you were talking about in China earlier: they managed to centralize where all of their data goes, which allows them to do this global score. At the same time, it maybe makes their system weaker, because if you knock out that central point, the whole thing falls into the garbage. At least in the United States, Amazon has 30% of your data, Facebook has 40%, Google has 20%, or whatever it may be, however you split your shopping, emailing, and social networking time.

And thank God they’re kind of separated. If they were all together, then somebody would have this complete picture of your life, and would be able to predict every moment of it and say how worthy an individual you were to society. Having a distributed network, that’s the whole idea behind the coin revolution: distributed networks are stronger than centralized networks from an attacker’s perspective.

The more we rely on a municipal, centralized source of wifi as people move to it, the more susceptible to a complete knockout attack we’re going to be. Centralization is weak, period. Like the Death Star: a single proton torpedo and you blow the whole thing up.

David Yakobovitch

You cannot have a geeky episode without a Star Wars reference, so that is phenomenal. And since we just had that segue into Star Wars, just briefly: what is your take on Disney?

Disney has been doing a lot in New York. I don’t think we’ve talked much about it on the podcast so far, but for those who don’t know, Disney recently completed their acquisition of Fox and the 21st Century Fox product suites, including movies and assets. Now they own a majority stake in Hulu, ESPN, ABC, all these organizations. And it was announced just a few months ago that Disney is building a 70-story skyscraper in Manhattan.

Jed Doughtery

The Avengers tower.

David Yakobovitch

The Avengers Tower. You can go there, take pictures with all your favorite action figures, and buy a Disney park pass. But beyond that, they’ve made a huge bet on New York and on New York City. We’ve seen companies consider making that bet or not making it, and Disney is doing it. You mentioned your work with the New York government on projects like the 311 project.

The New York government also made a big bet the other day. They rezoned some land in New York City that has been sitting empty as vacant lots and said: “We’re going to build a 30-story skyscraper for workforce initiatives, just for training up the future workforce of America.”

And I wanted to take it back in this direction because I’m so passionate about education, and you are as well, about democratizing access to machine learning and AI. This is a question that always comes up at panels and everywhere: how do we save America? How do we save the people? You were talking earlier about the truck drivers and middle America. Any hunches there?

Jed Doughtery

New York City is a place that is still, maybe, almost exclusively inclusive, is what I would say. It’s still a place where people from all walks of life run into each other, touch each other accidentally, have crazy random interactions daily. New York is still a place that is not isolated into a single industry, a single group of people, or a single way of living. That’s the biggest driver of why people keep coming here.

I just spent a bunch of time in San Francisco, and San Francisco has just been owned by tech. You can’t go anywhere without talking to people who’ve already read all the same blog posts you have, had the same conversations you’ve had, or have the same fears and worries about the future of the world. It’s frankly boring. You go to LA and everybody’s freaked out about the film industry; in Boston everybody’s worried about education; in Washington, DC, everybody is politics, politics, politics.

New York? Nobody owns New York. So when you come here, you meet different people from different walks of life, with different fears, different aspirations, different goals. I love that about here. That’s why people are making bets about here, and that’s why the center of machine learning could be here.

Because you have the people who are going to be affected by it, you have the people who know about all of these different worlds and have the business knowledge of them, and you have this growing tech base of folks who can implement this stuff. This city is a great place for machine learning right now.

David Yakobovitch

It’s a great place for machine learning, and, well, it’s a great place for jobs, right? ADP came out with the payroll report they publish every month and said that as of April 2019, the United States officially has the lowest unemployment since 1969. 1969: when did we land on the moon? And who failed to land on the moon in April 2019? Israel. So that also recently happened.

It’s so interesting how, over 50 years of technology advancements later, crashes still happen and bugs still happen. And with what we talked about in New York City: the bug I kept mentioning wasn’t a bug. It was a hack, a crash, a DDoS, whatever we’re going to call it. But New York is calling it a Y2K-like bug, which is what brought their systems down. And you say: what, Y2K in 2020? Oh my gosh, we must be running on COBOL mainframes.

Jed Doughtery

The banking sector still is. We have that going for us, kids. If you want a job that will never go away, just learn COBOL, because banks will never turn themselves off, and that’s what they ended up writing themselves in.

David Yakobovitch

It’s such an interesting tangent. For those listening to the episode, whether you’re looking to get into tech or you’re currently in tech and determining your next direction: COBOL is one of those legacy languages that is very solid, it’s just been around a long time, and some consultants travel the world making $250 an hour to keep COBOL systems online at banks, which is crazy. It’s so in demand.

But COBOL is not what you should learn unless you’re 60-plus right now. I gave the same remarks at Data Science Live, while working on an episode between Monterrey, Mexico, and New York on those topics.

But when I talk with the students I’m educating weekly, helping people get onto this pathway, I always get the same question, and it’s tough to answer. They say: okay, should I learn to code? If I’m going to learn to code, it’s going to be SQL and Python; with those two I’ll get a solid start on the whole package, got it. Or should I just pick up Google Cloud, AWS, or Azure, or products like Dataiku and these other platforms, get certified, and work with application interfaces? I wanted to know your thoughts about it, Jed.

Jed Doughtery

The former. You can pick that other stuff up as you work; wherever you end up going, they’ll have already chosen something, and you’ll learn it on the job. And I would say, even before those: the thing that’s most useful day to day in my job is the Linux command line. Yes, I write Python, I write SQL, but I’m constantly jumping onto different people’s production servers, and when you’re on somebody’s production server, you need to be able to navigate around the Linux command line.

And if I’m looking at a gigantic 20-gigabyte CSV file and I need to be able to open it, the first thing I need to know is how to manipulate it using awk and sed and things like that. These are old-school tools, but Linux is running something like 99% of the servers in the world right now; it’s not going away. Learn the Linux command line. It’s fun. It’s easy. It’s an incredibly powerful tool. That’s my only addition to that.
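To make that concrete, here is a minimal sketch of that kind of command-line triage. The file name and columns are made up, and a tiny stand-in file is created so the commands run anywhere; the same streaming commands work unchanged on a file too big to open in an editor:

```shell
# Create a tiny stand-in for a huge CSV export (hypothetical data)
printf 'id,name,amount\n1,alice,100\n2,bob,\n3,carol,300\n' > /tmp/sample.csv

# Peek at the first lines without loading the whole file into memory
head -n 2 /tmp/sample.csv

# Count data rows; wc streams line by line, so file size doesn't matter
tail -n +2 /tmp/sample.csv | wc -l

# awk: inspect one column and flag rows with a missing amount
awk -F, 'NR > 1 && $3 == "" {print "missing amount for id " $1}' /tmp/sample.csv

# sed: rewrite values as a stream instead of opening an editor
sed 's/alice/ALICE/' /tmp/sample.csv
```

Each of these tools reads the file as a stream, which is why they shrug off a 20 GB input that would choke a spreadsheet or a naive script.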

David Yakobovitch

So we’ll rephrase and say: don’t learn COBOL, but do consider learning Linux commands. And the great thing about Linux, if you’re a non-techie in the audience, you might say: “Oh my gosh, what is Linux?”

Jed Dougherty

Right. People think of bearded guys running Linux on their laptops, like, “Oh, I have Ubuntu running on my laptop.” No, I use a Mac laptop, or I use a PC laptop, but every time I go onto a server, anywhere, it’s Linux. That’s the backbone of every business in America.

David Yakobovitch

And the truth is you could pick up the basics of Bash and Linux in four to six hours. That’s the other important thing to know: when we talk about workforce initiatives, everyone fears, do I need to go and study four more years to become tech-relevant?

I don’t think that’s true. A lot of the boot camps went to nine-month models, then six-month models, and now three-month models; the right timeline always depends on the goals and outcomes to be achieved. But the part that’s still being determined is how quickly those shifts are going to be seen across the entire economy.

Jed Doughtery

It’s a quick train-up as opposed to four-year or eight-year degrees.

David Yakobovitch

Absolutely. I talked about super jobs, but it’s also about super degrees. I mentioned earlier, on education, that because students don’t know their direction, some are going into a double major, a triple major, a triple minor.

Jed Dougherty

You never know, cover all the bases. I’ve hired folks out of boot camps and I’ve hired Ph.D.s, and I’ve had boot camp people outclass Ph.D.s, a hundred percent. It’s because this is such a new industry that out of ten years at school, maybe 5% of what you learned applies to the job you’re working on right now in the industry.

The boot camps have more flexibility. However, just because you went to a boot camp doesn’t make you any good; there’s no guarantee there. From a hiring perspective, that’s how I see it. It’s not: this guy went to a boot camp, so he’s going to know what he’s doing, or she went to Harvard for her Ph.D., so she’s going to know what she’s doing. Hiring is hard in this profession right now. We haven’t figured out exactly how to do it. There are a billion blogs about that out there.

David Yakobovitch

Like, it’s an initial signal that says: “Oh, you went to a boot camp, so you might be quite competent; I do want to interview you because I’m curious to see your skills,” or “you did go to an ivory tower and I still want to interview you.” But it only gets you in the door. It gets you to a phone screen or technical screen, and then you have to perform. For any of those aspiring data scientists listening in today: are they ready to get to the next level? Any tips for them? Interviews have been talked about ad nauseam on the interwebs, but since you hire a lot and interview a lot of candidates, could you give them any recommendations?

Jed Dougherty

Hi everybody! I don’t want to say I’ll give you the exact questions in my interview, but there are certain things I expect during an interview. And this is a hard thing to train, a hard thing to teach, which is maybe why I look for it; I want to get lucky with the people I’m hiring, so that they already have it: a certain intuition about data.

As part of our interview process, we give people a big messy dataset, and there are certain things that somebody who has played with data a lot will notice as suspicious or iffy about it, maybe without even being able to put it into words: “Hmm, why is this strange?”

I want people to pick out these oddities and ask me questions like: “Why is this off here?” Instead of just: all right, I took a dataset, I know exactly what I’m supposed to do. Take all these categorical variables, turn them into numerics, rescale them, throw them into a random forest, get an output, and hand it in. I don’t want that. Just because you know how to do that mechanical process doesn’t qualify you if you didn’t look at the data, didn’t think about it. That looking, that thinking, that intuition is what I look for.
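As a toy illustration of that pre-modeling gut check, here is a short Python sketch. The dataset, column names, and thresholds are all made-up assumptions; the point is only the habit of hunting for oddities before any model runs:

```python
import csv
import io
import statistics

# Hypothetical messy dataset of the kind an interview might hand over
raw = """id,age,salary
1,34,50000
2,29,52000
2,29,52000
3,-1,48000
4,41,9999999
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Duplicate IDs: is each row really one observation?
ids = [r["id"] for r in rows]
dupes = sorted({i for i in ids if ids.count(i) > 1})

# Impossible values: a negative age is a data-entry problem, not a feature
bad_ages = [int(r["age"]) for r in rows if not 0 <= int(r["age"]) <= 120]

# Wild outliers: compare against the median, which one bad row can't drag
salaries = [int(r["salary"]) for r in rows]
outliers = [s for s in salaries if s > 10 * statistics.median(salaries)]

print("duplicate ids:", dupes)        # ['2']
print("impossible ages:", bad_ages)   # [-1]
print("salary outliers:", outliers)   # [9999999]
```

A candidate who surfaces these three questions before touching a random forest is showing exactly the intuition described above.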

David Yakobovitch

Getting the solution is great, but then what does getting the solution do? Perhaps it creates, again, that black-box issue: you got the solution, but could there have been issues with the data, or issues with the thought process, that with a closer look could have been caught and remedied earlier?

Jed Dougherty

With the advent of AutoML and the kind of tools we were talking about Amazon releasing earlier, I can throw this dirty dataset into their API right now and get back the exact solution you gave me with your 60 lines of Python. So I need you to go a step further and think.

David Yakobovitch

It’s so interesting, and that is one of the big takeaways on education: yes, feel free to play with AutoML. Feel free to play with SageMaker on AWS. Feel free to play with Dataiku products and get a sense of that automation.

But you also want to know what’s happening behind the scenes. You also want to be able to reason about what’s going on, because when you have to customize something, knowing code, knowing Linux, understanding SQL and Python is going to take you to that next level.

Jed Dougherty

What’s interesting to me, and I’m partially biased because I do have a degree in political science, is that I’m almost more interested in somebody who can pass a technical test but who has an undergraduate degree in sociology, in economics, in political science, in one of these soft-science-type fields. It shows you’ve thought about how to think about things from a theoretical perspective or a human perspective. And yet you still have the tech skills. I like that combination of skills. It’s a very interesting one.

David Yakobovitch

It’s what makes us human, and we talk about that every day on HumAIn. If you’re interested in hearing more about the work Jed has shared with us today, you can check out Dataiku and Jed Dougherty, and also go to the HumAIn podcast website: that’s humainpodcast.com. Thanks for being with us today, Jed; I look forward to talking with you soon. Thanks, bye.

Works Cited

¹Jed Dougherty

Companies Cited

²Dataiku