(00:15:22) Podcast Clip of
Making Wind Energy More Efficient With Data At Turbit Systems
ABOUT THIS EPISODE
Summary
Wind energy is an important component of an ecologically friendly power system, but there are a number of variables that can affect the overall efficiency of the turbines. Michael Tegtmeier founded Turbit Systems to help operators of wind farms identify and correct problems that contribute to suboptimal power outputs. In this episode he shares the story of how he got started working with wind energy, the system that he has built to collect data from the individual turbines, and how he is using machine learning to provide valuable insights to produce higher energy outputs. This was a great conversation about using data to improve the way the world works.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Today’s episode of the Data Engineering Podcast is sponsored by Datadog, a SaaS-based monitoring and analytics platform for cloud-scale infrastructure, applications, logs, and more. Datadog uses machine-learning based algorithms to detect errors and anomalies across your entire stack—which reduces the time it takes to detect and address outages and helps promote collaboration between Data Engineering, Operations, and the rest of the company. Go to dataengineeringpodcast.com/datadog today to start your free 14 day trial. If you start a trial and install Datadog’s agent, Datadog will send you a free T-shirt.
- You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data platforms. For more opportunities to stay up to date, gain new skills, and learn from your peers there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to dataengineeringpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today!
- Your host is Tobias Macey and today I’m interviewing Michael Tegtmeier about Turbit, a machine learning powered platform for performance monitoring of wind farms
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you start by describing what you are building at Turbit and your motivation for creating the business?
- What are the most problematic factors that contribute to low performance in power generation with wind turbines?
- What is the current state of the art for accessing and analyzing data for wind farms?
- What information are you able to gather from the SCADA systems in the turbine?
- How uniform is the availability and formatting of data from different manufacturers?
- How are you handling data collection for the individual turbines?
- How much information are you processing at the point of collection vs. sending to a centralized data store?
- Can you describe the system architecture of Turbit and the lifecycle of turbine data as it propagates from collection to analysis?
- How do you incorporate domain knowledge into the identification of useful data and how it is used in the resultant models?
- What are some of the most challenging aspects of building an analytics product for the wind energy sector?
- What have you found to be the most interesting, unexpected, or challenging aspects of building and growing Turbit?
- What do you have planned for the future of the technology and business?
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, Podcast.__init__ to learn about the Python language, its community, and the innovative ways it is being used.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
Links
- Turbit Systems
- LIDAR
- Pulse Shaping
- Wind Turbine
- SCADA
- Genetic Algorithm
- Bremen Germany
- Pitch
- Yaw
- Nacelle
- Anemometer
- Neural Network
- Swarm64
- Tensorflow
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
English
USA
SEE THE NOTES 🔗
00:00:13 Introduction and Call for Contributions
00:01:40 Interview with Michael Tegtmeier: Introduction and Background
00:10:16 Optimizing Wind Turbine Performance
00:18:37 Data Collection and Analysis Challenges
00:24:15 System Architecture and Data Pipeline
00:30:30 Challenges in Building Analytics Solutions
00:36:19 Future Plans and Trends in Energy Sector
TRANSCRIPT 🔗
WEBVTT
NOTE
Transcription provided by Podhome.fm
Created: 7/6/2024 1:33:45 PM
Duration: 2448.072
Channels: 1
1
00:00:13.955 --> 00:00:17.974
Hello, and welcome to the Data Engineering Podcast, the show about modern data management.
2
00:00:18.500 --> 00:00:22.440
What are the pieces of advice that you wish you had received early in your career of data engineering?
3
00:00:23.140 --> 00:00:33.114
If you hand a book to a new data engineer, what wisdom would you add to it? I'm working with O'Reilly on a project to collect the 97 things that every data engineer should know, and I need your help.
4
00:00:33.495 --> 00:00:35.114
Go to dataengineeringpodcast.com/97things
5
00:00:36.480 --> 00:00:55.200
to add your voice and share your hard-earned expertise. And when you're ready to build your next pipeline or want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends over at Linode. With their managed Kubernetes platform, it's now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar, Pachyderm, and Dagster.
6
00:00:55.760 --> 00:01:02.980
With simple pricing, fast networking, object storage, and worldwide data centers, you've got everything you need to run a bulletproof data platform.
7
00:01:03.295 --> 00:01:04.915
Go to dataengineeringpodcast.com/linode,
8
00:01:06.415 --> 00:01:08.034
that's L-I-N-
9
00:01:08.335 --> 00:01:15.950
O-D-E, today and get a $60 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show.
10
00:01:16.329 --> 00:01:19.789
You listen to this show to learn and stay up to date with what's happening in databases,
11
00:01:20.225 --> 00:01:24.564
streaming platforms, big data, and everything else you need to know about modern data management.
12
00:01:25.024 --> 00:01:34.979
For more opportunities to stay up to date, gain new skills, and learn from your peers, there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to dataengineeringpodcast.com/conferences
13
00:01:36.605 --> 00:01:40.385
to check out the upcoming events being offered by our partners and get registered today.
14
00:01:40.685 --> 00:01:52.915
Your host is Tobias Macey. And today, I'm interviewing Michael Tegtmeier about Turbit, a machine learning powered platform for performance monitoring of wind farms. So, Michael, can you start by introducing yourself? Hi. Yeah. Of course. So, yeah, I'm
15
00:01:53.455 --> 00:01:57.795
Michael, and I'm the founder and CEO of Turbit Systems.
16
00:01:58.095 --> 00:02:00.915
And we are basically a data analytics platform
17
00:02:01.579 --> 00:02:03.200
for wind turbines.
18
00:02:04.140 --> 00:02:04.640
And,
19
00:02:05.340 --> 00:02:06.320
we have built,
20
00:02:06.780 --> 00:02:10.959
some tools to make the maintenance and operation of wind farms
21
00:02:11.295 --> 00:02:12.034
more efficient.
22
00:02:12.334 --> 00:02:14.915
And, yeah, I'm looking forward to a great conversation.
23
00:02:15.614 --> 00:02:18.754
And do you remember how you first got involved in the area of data management?
24
00:02:19.295 --> 00:02:19.795
Yeah.
25
00:02:20.540 --> 00:02:21.280
Of course.
26
00:02:21.580 --> 00:02:22.080
So
27
00:02:22.620 --> 00:02:23.980
my education is
28
00:02:24.460 --> 00:02:28.960
as a physicist. So I somehow always was dealing with data
29
00:02:29.635 --> 00:02:36.855
in the university when we made some experiments. And then also you had to write some programs to analyze data, of course.
30
00:02:37.314 --> 00:02:37.814
So
31
00:02:38.490 --> 00:02:41.230
that's, like, one part where I got
32
00:02:41.530 --> 00:02:45.230
confronted with some data, let's say, and also maybe some super complicated,
33
00:02:45.770 --> 00:02:46.270
data.
34
00:02:47.055 --> 00:02:48.435
And in my studies,
35
00:02:48.975 --> 00:03:04.920
I wrote my bachelor's thesis about some measurements I did on wind turbines. So, as you should know, wind turbines need to be directed into the wind in order to generate power. And what I was doing in that bachelor's thesis was
36
00:03:05.315 --> 00:03:11.095
I was making measurements with a laser system. It's called LIDAR, so light detection and ranging.
37
00:03:11.635 --> 00:03:14.200
And, with that laser system, you could
38
00:03:14.599 --> 00:03:15.739
see the
39
00:03:17.239 --> 00:03:23.825
wind direction and the wind speed in front of the turbine. So that means that the turbine could or that
40
00:03:24.125 --> 00:03:26.465
was, at that time, that was somehow the,
41
00:03:27.085 --> 00:03:29.665
the vision to control the turbine
42
00:03:30.890 --> 00:03:36.430
before the wind is actually at the turbine. So see what's coming in before the turbine and then
43
00:03:36.810 --> 00:03:45.205
have some control algorithms that turn the turbine in the right direction and pitch the rotor blades to the right angles before the turbine actually,
44
00:03:45.605 --> 00:03:52.469
before the wind is actually at the turbine. So but the measurement itself was, like, of course, again, with a lot of data, and you had to
45
00:03:52.849 --> 00:03:56.069
match these data of this new LIDAR system
46
00:03:56.515 --> 00:04:02.135
with the data from the turbine, like when is the turbine running, what wind speeds were measured with
47
00:04:02.435 --> 00:04:07.420
the LIDAR system, what wind speeds were maybe measured with some other systems, like some
48
00:04:07.720 --> 00:04:09.580
anemometers on top of the turbine.
49
00:04:09.959 --> 00:04:10.459
And,
50
00:04:11.565 --> 00:04:12.305
yeah, and
51
00:04:12.685 --> 00:04:14.465
so to to make the story,
52
00:04:15.005 --> 00:04:17.425
maybe up until I get to Turbit,
53
00:04:18.205 --> 00:04:21.580
later on then, I was still doing some physics.
54
00:04:22.120 --> 00:04:31.565
I also did my master's in laser physics, but this time in pulse shaping, temporal and spatial pulse shaping.
55
00:04:32.025 --> 00:04:34.604
And we did this by splitting
56
00:04:34.985 --> 00:04:35.725
a laser
57
00:04:36.264 --> 00:04:36.764
beam
58
00:04:37.065 --> 00:04:42.030
into its parts, into its different frequencies. And then you could control each frequency,
59
00:04:42.409 --> 00:04:45.069
the polarization and and the amplitude of that
60
00:04:45.595 --> 00:04:52.495
frequency. And then you put the laser back together, and then you could have a laser pulse that's formed in the time domain.
61
00:04:52.960 --> 00:05:01.300
So, like, there's a lot of energy coming in the beginning of the laser pulse and then maybe later a little bit more. And what we were doing there is that we were
62
00:05:01.865 --> 00:05:05.565
trying to get electrons out of the atoms,
63
00:05:06.025 --> 00:05:13.430
and we didn't really know how to push the electron, to put it in some easier words, maybe.
64
00:05:14.130 --> 00:05:21.815
So the electron is moving in the atom, and at some point in time, you need to push the electron out. So we didn't really know how to
65
00:05:22.115 --> 00:05:24.695
form that pulse. And we did this
66
00:05:24.995 --> 00:05:28.855
then with trial and error and basically with a genetic algorithm,
67
00:05:29.569 --> 00:05:31.830
And that was the first time where I
68
00:05:32.210 --> 00:05:32.870
have seen
69
00:05:33.409 --> 00:05:34.370
the power of,
70
00:05:35.090 --> 00:05:36.229
yeah, such algorithms.
71
00:05:36.944 --> 00:05:40.564
And I got super interested in in in, yeah, the power of,
72
00:05:41.025 --> 00:05:44.965
what you could do with data analytics and and, let's say, the the first
73
00:05:45.680 --> 00:05:48.340
idea of machine learning. It's not really machine learning, but,
74
00:05:48.960 --> 00:06:01.245
first algorithm that that comes into searching how how a computer is finding something out that you don't know about. And then later you try as a physicist to understand, okay, what what was going on there? Why did it work? Why did we
75
00:06:01.625 --> 00:06:07.020
why were we able to put the electron out of the atom just with this form of pulse?
76
00:06:07.640 --> 00:06:08.140
And,
77
00:06:08.840 --> 00:06:13.815
yeah, that was quite interesting. And and then later with with my knowledge about,
78
00:06:14.755 --> 00:06:15.495
wind energy,
79
00:06:15.955 --> 00:06:16.615
I was,
80
00:06:17.555 --> 00:06:19.335
I was thinking, like, what to do,
81
00:06:20.660 --> 00:06:23.080
what to do in in life after after studying.
82
00:06:23.780 --> 00:06:30.755
And I was looking for something that maybe is also making, it sounds maybe stupid, but making the world a little bit better.
83
00:06:31.375 --> 00:06:32.595
And I came
84
00:06:33.134 --> 00:06:40.259
to renewable energies, and and I I found wind energy most interesting because parts are turning. You have a lot of data.
85
00:06:40.639 --> 00:06:45.860
You have it's it's international. You you can go around the world, and and
86
00:06:46.395 --> 00:06:47.375
it has everything
87
00:06:47.835 --> 00:06:48.655
that you need
88
00:06:49.354 --> 00:06:49.854
for
89
00:06:50.235 --> 00:06:53.435
for your brain to to have some interesting things to work on. And,
90
00:06:54.169 --> 00:06:55.850
so, yeah, that's how
91
00:06:56.650 --> 00:07:05.795
also, I decided to found Turbit Systems because actually, this is quite an interesting story, maybe. The first time I got on the turbine
92
00:07:06.415 --> 00:07:06.915
was
93
00:07:07.215 --> 00:07:15.740
with these LIDAR measurements, and I got quite dizzy, like some sort of seasick. Like, a turbine tower is, like, 100 meters high, and,
94
00:07:16.280 --> 00:07:20.300
when you're at the top and the turbine is switched off or is even running,
95
00:07:20.995 --> 00:07:25.575
there's a lot of movement of the tower. And if you cannot look outside
96
00:07:25.955 --> 00:07:29.920
because you are inside of the tower, you you get seasick. And I was thinking, okay,
97
00:07:30.540 --> 00:07:32.240
if there's so much vibration
98
00:07:32.620 --> 00:07:36.560
due to the wind, then, of course, you need to see some
99
00:07:37.104 --> 00:07:40.005
wind direction also in the wind movement.
100
00:07:40.705 --> 00:07:41.205
And,
101
00:07:41.665 --> 00:07:45.125
then together with some mates from the university, I I was
102
00:07:45.690 --> 00:07:47.389
looking to that problem
103
00:07:48.169 --> 00:07:52.990
more deeply. And I found out, yeah, there's there's a relation between the wind direction
104
00:07:53.574 --> 00:07:55.675
and the type of the movement of the tower.
105
00:07:56.134 --> 00:07:56.615
And,
106
00:07:56.935 --> 00:07:59.354
that also meant that you could maybe,
107
00:08:00.134 --> 00:08:02.794
see the wind direction more precisely. And
108
00:08:03.400 --> 00:08:06.380
so we did some measurements, and this is how I came
109
00:08:06.920 --> 00:08:13.875
to found Turbit, actually. And then later, we became more of a data science company.
110
00:08:14.255 --> 00:08:18.355
Yeah. It's definitely a very interesting problem domain because as you said, wind energy
111
00:08:18.735 --> 00:08:19.875
is ubiquitous
112
00:08:20.390 --> 00:08:28.650
in terms of its availability around the world because the air is always moving. So it's something that can provide a lot of benefit, particularly
113
00:08:28.974 --> 00:08:31.235
for countries who are just starting to
114
00:08:31.615 --> 00:08:42.960
build out their renewable infrastructure. I know that Germany has been using wind energy fairly heavily for a number of years at this point. So I'm sure that that also helped in terms of access to be able to
115
00:08:43.340 --> 00:08:50.235
build out your product while being able to sort of remain local and do things, within your home country. Yeah. Totally. Like,
116
00:08:50.695 --> 00:08:56.930
apart from Denmark I hope I'm not saying too much wrong here, but, apart from Denmark, Germany has been
117
00:08:57.490 --> 00:09:00.710
quite early in in wind energy. And, of course, Germany is
118
00:09:01.089 --> 00:09:02.149
always known as
119
00:09:02.610 --> 00:09:04.850
an engineering country, maybe. And,
120
00:09:05.825 --> 00:09:08.404
yeah, like, wind energy has been here
121
00:09:08.705 --> 00:09:09.845
at my home.
122
00:09:11.105 --> 00:09:16.050
I was born in Bremen, and we have a lot of wind turbines there and also around Berlin.
123
00:09:16.510 --> 00:09:20.370
In Brandenburg, there are there are many, many turbines. And I could see
124
00:09:20.795 --> 00:09:24.895
that they exist, let's say. And then, of course, Germany is a good country
125
00:09:25.355 --> 00:09:26.095
to build
126
00:09:26.395 --> 00:09:31.759
a new company for wind energy, I think, because of all the resources that you have here and the knowledge
127
00:09:32.459 --> 00:09:34.480
and and the connections that you can potentially,
128
00:09:35.019 --> 00:09:37.120
get. On the other hand, also, maybe Germany's,
129
00:09:38.220 --> 00:09:43.185
very special in in the wind energy domain because of its history. And
130
00:09:43.645 --> 00:09:45.025
that's actually a good thing.
131
00:09:45.405 --> 00:09:51.700
Wind turbines are owned in Germany by many, many people. There's, like, this this thing, it's which is called, so,
132
00:09:53.520 --> 00:10:04.175
energy for the for the people, let's say. And then small cities, they they invest with many people in in a wind turbine, and then they profit from it, financially. And this concept,
133
00:10:04.555 --> 00:10:08.175
maybe you don't see so much in other countries like the USA or
134
00:10:08.620 --> 00:10:12.400
China where where there's more, like, big manufacturers that own
135
00:10:12.780 --> 00:10:13.920
big wind farms
136
00:10:14.380 --> 00:10:14.780
and,
137
00:10:15.665 --> 00:10:21.605
yeah. And so for Turbit Systems in particular, I know that one of the main focuses
138
00:10:21.985 --> 00:10:41.095
of the product that you're building out is to help improve the overall operating efficiency of the turbines, both individually and in aggregate. So I'm wondering if you can just talk a bit more about some of the ways that you're helping to optimize the output and some of the most problematic factors that contribute to performance
139
00:10:41.790 --> 00:10:47.410
degradation in wind turbines, both individually and in aggregate? Mhmm. Yeah.
140
00:10:49.055 --> 00:10:52.014
So basically, a wind turbine is like a plane. So,
141
00:10:52.815 --> 00:10:55.315
it has wings, which we maybe call
142
00:10:55.694 --> 00:10:56.194
rotors,
143
00:10:56.610 --> 00:10:57.910
and they are directed
144
00:10:58.450 --> 00:11:06.895
they must be shaped in a very special way, and they must be directed into the wind while the turbine is turning,
149
00:11:15.160 --> 00:11:17.660
directed into the wind,
150
00:11:18.760 --> 00:11:33.220
so that, actually, the turbines are facing into the wind, which we call yaw. And both pitch and yaw need to be, yeah, optimal in order to get all of the energy out of the wind. And so
151
00:11:33.600 --> 00:11:38.660
when I was talking about the LIDAR system and my bachelor's thesis, the goal was to correct
152
00:11:39.805 --> 00:11:46.225
the way the turbine is turning into the wind. So the problem is that
153
00:11:46.560 --> 00:11:47.459
in big wind
154
00:11:47.839 --> 00:11:52.180
farms, you have other turbines in the wind park that are creating turbulence,
155
00:11:52.959 --> 00:11:54.180
and you have maybe
156
00:11:55.055 --> 00:12:03.795
a forest at some part of the wind park that is redirecting the wind in a weird way, let's say. And you want to make sure that
157
00:12:04.210 --> 00:12:06.710
the turbine algorithm or the turbine behavior
158
00:12:07.090 --> 00:12:09.430
is always in such a way that it gets the maximum
159
00:12:09.810 --> 00:12:20.965
possible power output. So that it's directed correctly into the wind, and also with the pitch systems. But if you have a measurement of the wind on top of the nacelle that's behind the rotor plane,
160
00:12:21.410 --> 00:12:28.470
then you always have some errors and you wanna be able to correct this. And in addition to that, it's like a very simple problem
161
00:12:28.935 --> 00:12:41.440
that sometimes the technicians that go up and put the anemometer that's measuring the wind direction on top of the nacelle, they do this with an error, and sometimes with, like, more than 5 to 10 degrees, and nobody's
162
00:12:41.899 --> 00:12:43.920
detecting that. And then you have a
163
00:12:45.095 --> 00:13:04.165
a bad performance of the turbine. So this is how we started, as I said, with the vibration measurements. But then later on, going more into the data analytics part, well, you get a lot of information from the turbine, or potentially get it. So the turbine is logging a lot of data like wind speed, wind direction,
164
00:13:04.944 --> 00:13:05.444
temperature
165
00:13:05.985 --> 00:13:09.365
of the outside air, then temperatures of the gearbox,
166
00:13:09.824 --> 00:13:14.600
temperature, like, a lot of data up to 500 different values. And
167
00:13:14.980 --> 00:13:15.800
up to now
168
00:13:16.660 --> 00:13:18.040
or maybe the past
169
00:13:18.415 --> 00:13:21.075
up to the past 2 years, nobody really
170
00:13:21.455 --> 00:13:23.635
analyzed this data, these
171
00:13:24.015 --> 00:13:24.915
huge datasets.
172
00:13:25.695 --> 00:13:29.140
So another thing that we found out is that
173
00:13:29.520 --> 00:13:31.380
sometimes the turbine is
174
00:13:32.720 --> 00:13:35.220
operating in a throttled mode
175
00:13:35.895 --> 00:13:37.915
that nobody knows about. So
176
00:13:38.855 --> 00:13:40.235
sometimes because of regulations,
177
00:13:40.775 --> 00:13:41.495
because of,
178
00:13:42.214 --> 00:13:44.475
noise regulations, the turbine should not
179
00:13:44.790 --> 00:13:46.089
produce much
180
00:13:46.390 --> 00:13:52.730
power or is producing less power than it actually could. And sometimes these turbines go into these
181
00:13:53.125 --> 00:13:53.865
noise modes,
182
00:13:54.645 --> 00:13:59.385
without anybody knowing it. And so we figured out, okay, let's let's do some
183
00:13:59.765 --> 00:14:12.485
general analysis of the normal behavior of a turbine, and let's look if there's something that we can find where the turbine is not behaving in a normal way. And that's, like, that's totally a data analytics
184
00:14:12.945 --> 00:14:13.445
problem.
185
00:14:14.065 --> 00:14:28.400
We don't really maybe have all of the domain knowledge of one particular turbine, how it should turn, and how it should behave. But we can look at the data and look for abnormalities. And with that
186
00:14:28.714 --> 00:14:30.255
example that I was talking about,
187
00:14:30.634 --> 00:14:35.454
you can understand, like, if if the turbine is producing half the energy that it could,
188
00:14:35.910 --> 00:14:38.329
then, of course, this is a huge factor,
189
00:14:38.870 --> 00:14:39.930
economic factor.
190
00:14:40.790 --> 00:15:03.205
And if you find these data points and these these turbines that are not producing enough energy, then then you clearly have a value that you can give to your customers. And so you mentioned that at least up until the last couple of years, that a lot of this data that was being collected with the systems that are embedded into the turbines is being ignored or not analyzed in any great detail.
191
00:15:03.585 --> 00:15:08.405
I'm wondering what the current state of the art is as far as being able to
192
00:15:08.980 --> 00:15:13.080
analyze the performance of the turbines and correct for errors
193
00:15:13.459 --> 00:15:16.760
and do any sort of preventive maintenance to reduce downtime?
194
00:15:17.300 --> 00:15:17.800
Yeah.
195
00:15:18.685 --> 00:15:19.185
So
196
00:15:19.565 --> 00:15:20.785
up to now,
197
00:15:21.165 --> 00:15:26.385
the standard, at least in Germany, is to have 10 minute average values of different
198
00:15:26.970 --> 00:15:29.630
measurements at the turbine, for instance, wind speed,
199
00:15:30.089 --> 00:15:39.264
power output of the turbine. And so that's the standard. And, basically, this data has been logged in the past just because of regulations.
200
00:15:39.885 --> 00:15:43.100
For instance, like, if if the turbine is shut down
201
00:15:43.480 --> 00:15:48.460
because of too much energy in the grid, then, yeah, in this case,
202
00:15:49.480 --> 00:15:56.525
you have the data to see, okay, there has been such an amount of wind before this event, and you
203
00:15:57.225 --> 00:16:05.089
would have generated so and so much energy without this grid shutdown. And that's why, basically, maybe people were
204
00:16:05.550 --> 00:16:07.410
logging data. But now,
205
00:16:07.870 --> 00:16:18.515
people are also starting to understand that you can do more with the data. So, also, more data is logged in the newer turbines, and there are more sensors, and the sensors
206
00:16:19.020 --> 00:16:25.440
potentially can not only log 10 minute average values, but also maybe second values or sub second values.
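As an illustration of how the 10 minute averages mentioned here relate to second-level readings, the following is a minimal sketch, not Turbit's actual pipeline. The function name `ten_minute_averages` and the `(unix_timestamp_s, value)` sample layout are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def ten_minute_averages(samples, window_s=600):
    """Collapse (unix_timestamp_s, value) samples into fixed-window averages.

    Hypothetical helper: returns {window_start_s: mean_value}, with one entry
    per 10 minute window that contains at least one sample.
    """
    windows = defaultdict(list)
    for timestamp_s, value in samples:
        # Align each sample to the start of the window it falls into.
        window_start = int(timestamp_s // window_s) * window_s
        windows[window_start].append(value)
    return {start: mean(values) for start, values in sorted(windows.items())}


# Example: wind speed readings in m/s spread over two windows.
readings = [(0, 4.0), (300, 6.0), (599, 8.0), (600, 10.0), (1199, 12.0)]
averages = ten_minute_averages(readings)
```

Windows without samples are simply missing from the result, so data gaps stay visible instead of being silently filled.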
207
00:16:26.334 --> 00:16:30.834
So, potentially, you you can get more data than you could get maybe in the past.
208
00:16:31.615 --> 00:16:35.154
And, yeah, it's like it's a physical system. The turbine is
209
00:16:35.650 --> 00:16:38.230
a machine, and you can
210
00:16:38.690 --> 00:16:40.950
grab a topic and then look into detail
211
00:16:41.330 --> 00:16:45.495
and look into the data and see if you can optimize something there.
212
00:16:46.375 --> 00:16:47.435
So, yeah, so
213
00:16:48.055 --> 00:16:52.475
just to give that example again, where you can reduce
214
00:16:52.790 --> 00:16:53.610
the power,
215
00:16:54.230 --> 00:17:03.834
where the power of the turbine is reduced because of some regulations or because nobody is noticing it. Yeah. Maybe I can explain a little bit more how we do it. So,
216
00:17:04.295 --> 00:17:07.035
we we basically try to find datasets
217
00:17:07.975 --> 00:17:15.460
where we definitely know that the turbine is behaving in a good way. So we filter out these datasets, and,
218
00:17:16.179 --> 00:17:18.495
we call them our training dataset.
219
00:17:19.355 --> 00:17:20.015
And then
220
00:17:20.315 --> 00:17:22.575
we train neural networks
221
00:17:22.955 --> 00:17:24.895
on this dataset. And
222
00:17:25.390 --> 00:17:29.570
we have to think about, okay, what physical system makes sense? Like, what is the input
223
00:17:30.030 --> 00:17:30.770
of that
224
00:17:31.230 --> 00:17:33.570
black box formula, and what's the output?
225
00:17:34.030 --> 00:17:35.090
And the input
226
00:17:35.695 --> 00:17:47.770
for the power output can be, of course, the wind speed, but the energy that is contained in the wind is also dependent on the density of the air, and the density
227
00:17:48.150 --> 00:17:51.690
is dependent on the temperature, for instance. So if you have a value
228
00:17:52.675 --> 00:17:55.095
a time series maybe of wind speed
229
00:17:55.395 --> 00:17:57.735
and temperatures of the outside air,
230
00:17:58.035 --> 00:18:01.255
then you can use these 2 values as an input
231
00:18:01.770 --> 00:18:02.510
to generate
232
00:18:02.810 --> 00:18:05.790
the power, to simulate the power output.
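The dependence of wind power on air density and temperature described here can be sketched with the textbook relations: the ideal gas law for dry air and P = 0.5 * rho * A * v^3 * Cp. This is only an illustration of the physics, not the model used in the episode. The default sea-level pressure, the power coefficient of 0.4, and the function names are assumptions chosen for the example.

```python
import math

R_DRY_AIR = 287.05  # J/(kg*K), specific gas constant of dry air


def air_density(temp_c, pressure_pa=101_325.0):
    """Dry-air density in kg/m^3 from the ideal gas law (pressure is assumed)."""
    return pressure_pa / (R_DRY_AIR * (temp_c + 273.15))


def wind_power_w(wind_speed_ms, temp_c, rotor_diameter_m, power_coefficient=0.4):
    """Idealized turbine power in watts: 0.5 * rho * A * v^3 * Cp.

    power_coefficient is an assumed constant here; on a real turbine it varies
    with operating conditions.
    """
    rotor_area = math.pi * (rotor_diameter_m / 2) ** 2
    return 0.5 * air_density(temp_c) * rotor_area * wind_speed_ms ** 3 * power_coefficient


# Same wind speed, different outside temperature: colder air is denser,
# so the same wind carries more power.
cold = wind_power_w(8.0, -10.0, rotor_diameter_m=100.0)
warm = wind_power_w(8.0, 30.0, rotor_diameter_m=100.0)
```

This is why wind speed alone is not enough as a model input: at a fixed wind speed, the available power still shifts with the outside temperature.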
233
00:18:06.330 --> 00:18:09.790
And if you have a dataset where you know, okay, the turbine is behaving correctly,
234
00:18:10.274 --> 00:18:11.894
then you can train a neural network
235
00:18:12.355 --> 00:18:14.855
on that behavior, and then you can simulate
236
00:18:15.475 --> 00:18:16.215
with new
237
00:18:16.835 --> 00:18:17.335
datasets
238
00:18:18.020 --> 00:18:24.200
how that turbine should have behaved in that scenario, in that physical scenario. And then you can make comparisons.
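The normal-behavior approach described here (learn from periods where the turbine is known to behave well, predict what it should have produced on new data, then compare) can be sketched as follows. The episode describes neural networks with wind speed and temperature as inputs; to stay short and dependency-free, this sketch swaps in a much simpler binned power curve over wind speed only. All names, the 1 m/s bin width, and the 20% tolerance are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def fit_power_curve(healthy_samples, bin_width=1.0):
    """Learn the mean power per wind-speed bin from known-good data.

    healthy_samples: iterable of (wind_speed_ms, power_kw) pairs.
    """
    bins = defaultdict(list)
    for speed, power in healthy_samples:
        bins[int(speed // bin_width)].append(power)
    return {bin_index: mean(powers) for bin_index, powers in bins.items()}


def expected_power(curve, speed, bin_width=1.0):
    """Expected power for a wind speed, or None if that bin was never observed."""
    return curve.get(int(speed // bin_width))


def flag_underperformance(curve, samples, tolerance=0.2, bin_width=1.0):
    """Return (speed, actual, expected) for samples well below the learned curve."""
    flagged = []
    for speed, power in samples:
        expected = expected_power(curve, speed, bin_width)
        if expected is None or expected <= 0:
            continue  # no reference behavior for this wind speed
        if power < (1 - tolerance) * expected:
            flagged.append((speed, power, expected))
    return flagged


healthy = [(5.2, 400.0), (5.8, 440.0), (8.1, 1500.0), (8.7, 1600.0)]
curve = fit_power_curve(healthy)
suspicious = flag_underperformance(curve, [(5.5, 410.0), (8.4, 780.0)])
```

A sample producing roughly half the expected power, like the throttled-mode case mentioned earlier in the conversation, is the kind of point this comparison is meant to surface.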
239
00:18:24.740 --> 00:18:32.975
You can add some more information like status logs and other European data and service data from the maintenance companies
240
00:18:33.355 --> 00:18:34.975
and mix everything together
241
00:18:36.090 --> 00:18:39.710
and create a value out of that. And then as far as the
242
00:18:40.090 --> 00:18:43.285
types of data that you're able to access from the sensors
243
00:18:43.665 --> 00:18:45.685
and the control systems in the turbine,
244
00:18:46.145 --> 00:18:50.670
what are some of the challenges that you're dealing with as far as just the data collection?
245
00:18:51.050 --> 00:18:52.750
And what is the
246
00:18:53.210 --> 00:18:54.190
level of variability
247
00:18:54.570 --> 00:18:55.070
between
248
00:18:55.610 --> 00:18:56.510
different turbines
249
00:18:56.810 --> 00:18:58.305
and different manufacturers? Yeah.
250
00:19:02.557 --> 00:19:03.057
For
251
00:19:07.550 --> 00:19:08.050
Yeah.
252
00:19:08.590 --> 00:19:11.230
For a good write, there have been
253
00:19:12.590 --> 00:19:15.025
companies on the market that have
254
00:19:15.425 --> 00:19:17.765
specialized exactly in that problem because
255
00:19:18.785 --> 00:19:19.765
every turbine
256
00:19:20.785 --> 00:19:22.245
somehow is a prototype
257
00:19:22.705 --> 00:19:23.205
because
258
00:19:23.950 --> 00:19:27.809
if maybe you let's say you you buy a turbine from manufacturer a,
259
00:19:28.270 --> 00:19:30.610
and you put it in your site,
260
00:19:31.635 --> 00:19:37.655
specific site, and then you have an additional contract for data management with another company.
261
00:19:38.370 --> 00:19:41.270
And so you can imagine how many potential
262
00:19:41.890 --> 00:19:42.390
variations
263
00:19:42.690 --> 00:19:45.670
of combinations of manufacturers and
264
00:19:47.495 --> 00:19:48.395
data collecting
265
00:19:49.335 --> 00:19:49.835
computers
266
00:19:50.135 --> 00:19:56.120
there are on the market. And that means that there's a huge variety in the datasets.
267
00:19:56.740 --> 00:19:57.240
So
268
00:19:57.860 --> 00:20:15.380
we also had to learn that in the beginning. And you cannot assume that if you have one turbine type, the data always looks the same, because you don't know if it's been generated by the same type of system. So the best way to deal with that problem is to
269
00:20:15.840 --> 00:20:20.100
look at each and every turbine as one system and
270
00:20:21.055 --> 00:20:22.515
not make cross correlations
271
00:20:22.895 --> 00:20:26.115
too early. Let's say, if you have one turbine
272
00:20:27.055 --> 00:20:30.960
type, you want to make cross correlations with many other turbine types
273
00:20:31.580 --> 00:20:32.460
of the same model
274
00:20:33.020 --> 00:20:34.080
sorry. So the same
275
00:20:34.460 --> 00:20:35.440
turbine type
276
00:20:35.885 --> 00:20:37.745
and make the cross correlations over that,
277
00:20:38.525 --> 00:20:40.865
you're better set
278
00:20:41.245 --> 00:20:43.505
if you have, like, for each and every turbine,
279
00:20:44.030 --> 00:20:47.970
a specific model. And that also means, again, that you have a lot
280
00:20:48.670 --> 00:20:55.145
of machine learning models, that you have a lot of data that you need to train on. There are a lot of scalability
281
00:20:55.685 --> 00:20:58.540
problems, let's say, that you have to look into.
282
00:21:00.540 --> 00:21:06.080
And, yeah. And then, of course, the standard data problems: you have data gaps. You have
283
00:21:06.505 --> 00:21:09.245
data points that are weird, like,
284
00:21:10.265 --> 00:21:12.365
outside temperature of 1,000 degrees.
285
00:21:13.279 --> 00:21:15.779
So you need to handle that. Or a
286
00:21:16.720 --> 00:21:17.220
constant,
287
00:21:17.840 --> 00:21:18.899
temperature of
288
00:21:19.440 --> 00:21:21.299
minus 10 during summer.
289
00:21:22.315 --> 00:21:23.355
Doesn't make sense also,
290
00:21:24.075 --> 00:21:30.335
degrees Celsius, of course. Yeah. And you need to clean your dataset. I think every data scientist
291
00:21:30.720 --> 00:21:31.620
knows how problematic
292
00:21:32.000 --> 00:21:32.980
that can be.
293
00:21:33.520 --> 00:21:34.020
And,
294
00:21:35.200 --> 00:21:38.580
yeah, so this has really been a challenge,
295
00:21:39.445 --> 00:21:42.425
to build some automated systems that clean
296
00:21:44.005 --> 00:21:44.505
these
297
00:21:44.805 --> 00:21:45.305
very
298
00:21:49.100 --> 00:21:49.760
these datasets
299
00:21:50.060 --> 00:21:50.460
that are,
300
00:21:51.340 --> 00:21:52.560
that have a great variety.
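NOTE
Editor's sketch of the "one model per turbine" idea described in the answer above. It is not Turbit's implementation: a binned median power curve stands in for the neural network, and the names fit_power_curve, fit_per_turbine, expected_power and the sample records are all hypothetical. The point is only that each turbine ID gets its own fitted model, so a quirk in one turbine's data stream never leaks into another turbine's expectations.

```python
from collections import defaultdict
from statistics import median


def fit_power_curve(samples, bin_width=1.0):
    """Fit a binned power curve: median power (kW) per wind-speed bin."""
    bins = defaultdict(list)
    for wind_speed, power in samples:
        bins[int(wind_speed // bin_width)].append(power)
    return {b: median(values) for b, values in bins.items()}


def fit_per_turbine(records, bin_width=1.0):
    """Fit one model per turbine.

    records: iterable of (turbine_id, wind_speed_ms, power_kw) tuples.
    Returns a dict mapping turbine_id to that turbine's own power curve.
    """
    by_turbine = defaultdict(list)
    for turbine_id, wind_speed, power in records:
        by_turbine[turbine_id].append((wind_speed, power))
    return {tid: fit_power_curve(s, bin_width) for tid, s in by_turbine.items()}


def expected_power(models, turbine_id, wind_speed, bin_width=1.0):
    """Look up expected power from the turbine's own model (None if unseen bin)."""
    return models[turbine_id].get(int(wind_speed // bin_width))


# Hypothetical measurements for two turbines of the same type.
records = [
    ("T1", 5.2, 400.0), ("T1", 5.7, 440.0), ("T1", 8.1, 1500.0),
    ("T2", 5.4, 300.0), ("T2", 5.5, 320.0), ("T2", 8.3, 1200.0),
]
models = fit_per_turbine(records)
```

Keeping the models in a dict keyed by turbine ID also makes the scalability cost visible: the number of models to train and store grows linearly with the fleet.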
301
00:21:53.260 --> 00:21:57.745
And then in terms of the actual collection of the data, how are you handling
302
00:21:58.205 --> 00:21:59.825
getting it from the turbines?
303
00:22:00.285 --> 00:22:01.985
And how much of the information
304
00:22:02.285 --> 00:22:04.145
are you processing or filtering
305
00:22:05.150 --> 00:22:17.585
on the collection point versus how much you're bringing back into your core service layer for being able to do more aggregate analysis across multiple turbines? Yeah. Yeah. I think in general, you can ask yourself as a data scientist
306
00:22:18.285 --> 00:22:19.745
data science company,
307
00:22:20.285 --> 00:22:22.304
if you delete data,
308
00:22:22.730 --> 00:22:29.870
you delete information. So if you say, okay, I don't trust this data point because it has 1,000 degrees Celsius
309
00:22:30.325 --> 00:22:32.985
outside air temperature. And you can ask yourself, okay,
310
00:22:33.765 --> 00:22:40.830
why is that so? Is it because of the real turbine control system, or maybe it's a sensor?
311
00:22:41.130 --> 00:22:42.029
Maybe it is
312
00:22:42.490 --> 00:22:49.789
a calculation error during data collection, and you wanna know that because maybe that's the problem that the turbine has.
313
00:22:50.115 --> 00:22:52.534
Maybe it's a sensor. Maybe the temperature sensor
314
00:22:52.835 --> 00:22:57.255
gives you weird values, and because of that, the turbine is shutting down. So
315
00:22:57.559 --> 00:23:00.679
with data cleaning, you need to be quite
316
00:23:01.640 --> 00:23:02.700
that's a big point.
317
00:23:03.240 --> 00:23:04.860
If you wanna throw away data,
318
00:23:06.385 --> 00:23:09.665
so what we basically do is we mark data as,
319
00:23:10.385 --> 00:23:13.525
not trustworthy, let's say, and then we can later
320
00:23:13.825 --> 00:23:14.325
reanalyze,
321
00:23:15.060 --> 00:23:17.800
whether maybe that's because there's a sensor
322
00:23:18.180 --> 00:23:18.680
error.
323
00:23:19.220 --> 00:23:25.634
And so yeah. So we basically get everything that we can get, and then later we flag data
324
00:23:25.934 --> 00:23:28.355
to be trustworthy or not. And,
325
00:23:29.215 --> 00:23:31.475
so to answer your question, I think most
326
00:23:31.799 --> 00:23:34.860
preparation of the data is done on the database
327
00:23:35.559 --> 00:23:36.460
that we have.
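NOTE
Editor's sketch of "flag, don't delete" as described in the answer above. The Reading class, the flag names, the plausibility limits (-50 to 60 degrees Celsius) and the flatline length are assumptions chosen for illustration, not values from the interview. Two checks are shown: a range check (catches the 1,000 degree reading) and a constant-value check (catches a sensor stuck at minus 10). Every reading stays in the dataset so it can be reanalyzed later, for example to find out that the sensor itself is the fault.

```python
from dataclasses import dataclass, field


@dataclass
class Reading:
    timestamp: int                 # illustrative sample index or epoch seconds
    temperature_c: float           # outside air temperature
    flags: list = field(default_factory=list)


def flag_readings(readings, low=-50.0, high=60.0, flatline_len=4):
    """Annotate suspicious readings in place; nothing is removed."""
    run_start = 0
    for i, r in enumerate(readings):
        # Range check: physically implausible values.
        if not (low <= r.temperature_c <= high):
            r.flags.append("out_of_range")
        # Flatline check: the same value repeated flatline_len times or more.
        if i > 0 and r.temperature_c != readings[i - 1].temperature_c:
            run_start = i
        run_len = i - run_start + 1
        if run_len == flatline_len:
            for earlier in readings[run_start:i + 1]:
                earlier.flags.append("flatline")
        elif run_len > flatline_len:
            r.flags.append("flatline")
    return readings


def trusted(readings):
    """View used for model training: only readings without any flag."""
    return [r for r in readings if not r.flags]


temps = [12.0, 1000.0, 13.0, -10.0, -10.0, -10.0, -10.0, 14.0]
readings = flag_readings([Reading(i, t) for i, t in enumerate(temps)])
```

Training jobs can read from trusted(readings), while a diagnostics job can query the flagged rows to ask why a sensor produced them.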
328
00:23:39.294 --> 00:23:42.755
Today's episode of the Data Engineering Podcast is sponsored by Datadog,
329
00:23:43.135 --> 00:23:46.835
a SaaS based monitoring and analytics platform for cloud scale infrastructure,
330
00:23:47.400 --> 00:23:49.180
applications, logs, and more.
331
00:23:49.800 --> 00:24:00.855
Datadog uses machine learning based algorithms to detect errors and anomalies across your entire stack, which reduces the time it takes to detect and address outages and helps promote collaboration between data engineering,
332
00:24:01.394 --> 00:24:03.174
operations, and the rest of the company.
333
00:24:03.559 --> 00:24:05.179
Go to dataengineeringpodcast.com/datadog
334
00:24:06.759 --> 00:24:13.305
today to start your free 14-day trial. And if you start a trial and install Datadog's agent, they'll send you a free T-shirt.
335
00:24:15.385 --> 00:24:19.885
And then as far as the overall system architecture of Turbit,
336
00:24:20.185 --> 00:24:28.500
how have you designed the overall pipeline of being able to go from collection of that remote data at each of the individual turbines
337
00:24:28.800 --> 00:24:30.020
into your central
338
00:24:33.035 --> 00:24:36.895
dashboarding and analysis for your customers and just the overall
339
00:24:37.275 --> 00:24:45.540
life cycle of data as it propagates from the control systems in the turbine through to the analysis that you're delivering to your customers?
340
00:24:46.855 --> 00:24:47.355
Mhmm.
341
00:24:49.095 --> 00:24:50.315
So, basically, we get
342
00:24:50.695 --> 00:24:51.355
the data,
343
00:24:51.975 --> 00:24:53.435
in different time periods,
344
00:24:54.009 --> 00:24:55.950
sometimes in real time, sometimes
345
00:24:57.289 --> 00:25:03.945
every hour, sometimes every day. It depends on the customer and whatever the customer has set up in his turbine.
346
00:25:05.365 --> 00:25:05.525
And,
347
00:25:06.405 --> 00:25:07.785
then this data is,
348
00:25:08.725 --> 00:25:11.545
is logged into the database or written into the database.
349
00:25:12.299 --> 00:25:17.760
And then we have different jobs running on the database, cleaning the data, flagging data,
350
00:25:18.380 --> 00:25:18.860
and,
351
00:25:19.419 --> 00:25:22.175
we have jobs that train the models
352
00:25:22.555 --> 00:25:24.175
that then yeah.
353
00:25:24.475 --> 00:25:25.995
Then jobs that
354
00:25:26.635 --> 00:25:28.415
generate simulation data.
355
00:25:28.960 --> 00:25:29.460
Then
356
00:25:30.800 --> 00:25:32.980
we compare the data,
357
00:25:33.760 --> 00:25:38.260
so the simulated data with the real measured data,
358
00:25:39.155 --> 00:25:43.095
then we have jobs that detect abnormalities
359
00:25:44.115 --> 00:25:45.015
in these datasets.
360
00:25:45.635 --> 00:25:47.095
And then finally,
361
00:25:48.160 --> 00:25:50.740
one has to ask oneself, okay.
362
00:25:51.360 --> 00:25:53.140
What is really the value to the customer?
363
00:25:54.160 --> 00:26:00.815
Is it detecting abnormalities, or is it detecting an error? And what does detecting an error mean? Like,
364
00:26:01.275 --> 00:26:04.015
in the best case, it is something like
365
00:26:04.520 --> 00:26:08.140
a real action point that you can give to your customer, for instance. Okay.
366
00:26:09.000 --> 00:26:09.980
Gearbox temperature
367
00:26:10.280 --> 00:26:12.860
has been too high for the past 2 months,
368
00:26:13.205 --> 00:26:14.105
So you better
369
00:26:14.965 --> 00:26:22.860
send out the service team to check why that is, or maybe you can even tell the customer why the temperature is so high.
370
00:26:23.820 --> 00:26:24.320
And
371
00:26:24.780 --> 00:26:26.880
this last part, I think, is the most
372
00:26:27.500 --> 00:26:28.480
important part
373
00:26:29.020 --> 00:26:30.400
because there you really
374
00:26:30.860 --> 00:26:31.760
need to understand
375
00:26:32.425 --> 00:26:32.925
the
376
00:26:33.385 --> 00:26:35.725
your customers. You really need to understand
377
00:26:36.185 --> 00:26:38.425
what's the problem that you're really solving. And,
378
00:26:38.905 --> 00:26:39.405
that's
379
00:26:40.140 --> 00:26:41.120
I think, also,
380
00:26:41.980 --> 00:26:44.320
as a data scientist, sometimes you need to
381
00:26:45.100 --> 00:26:47.355
maybe focus more on your customers
382
00:26:47.915 --> 00:26:52.575
than on what you generate as datasets. And,
383
00:26:53.674 --> 00:26:54.174
yeah,
384
00:26:55.150 --> 00:26:58.370
you really need to understand what you are delivering to your customer.
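NOTE
Editor's sketch of the compare-and-detect step described in the answer above: simulated (expected) values are compared with measured values, and a deviation only becomes an action point if it persists. The function names, the threshold of 5 degrees, the minimum run of 3 samples and the sample numbers are hypothetical and are not taken from the interview.

```python
def sustained_deviation(measured, simulated, threshold, min_run):
    """Find runs where measured exceeds simulated by more than threshold.

    Returns a list of (start, end) index pairs, end exclusive, for runs of
    at least min_run consecutive samples.
    """
    events = []
    start = None
    n = min(len(measured), len(simulated))
    for i in range(n):
        if measured[i] - simulated[i] > threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                events.append((start, i))
            start = None
    # Close a run that is still open at the end of the data.
    if start is not None and n - start >= min_run:
        events.append((start, n))
    return events


def action_point(signal_name, event):
    """Turn a detected event into a message an operator can act on."""
    start, end = event
    return (f"{signal_name} above expected for samples {start}..{end - 1}; "
            "consider sending the service team.")


# Hypothetical gearbox temperatures: measured versus the model's simulation.
measured = [60, 61, 70, 71, 72, 73, 61, 75]
simulated = [60] * 8
events = sustained_deviation(measured, simulated, threshold=5, min_run=3)
messages = [action_point("Gearbox temperature", e) for e in events]
```

Requiring a minimum run length is one simple way to separate a lasting problem from a single noisy sample; the last sample above (75) exceeds the threshold but is too short a run to raise an event.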
385
00:26:58.670 --> 00:27:00.130
And on that point too,
386
00:27:00.510 --> 00:27:01.650
how much of
387
00:27:02.005 --> 00:27:09.705
a feedback cycle are you able to build with the Turbit system as far as being able to determine some of these
388
00:27:10.210 --> 00:27:11.030
turbine misalignments,
389
00:27:11.330 --> 00:27:16.710
are you able to then feed that back into the turbine itself to be able to automate some of that correction?
390
00:27:17.144 --> 00:27:30.880
Or does it require generating a notification to your customer who's managing the turbine and the wind farms to then be able to do their own maintenance or operations as far as bringing the turbines into alignment and things like that?
391
00:27:31.660 --> 00:27:32.560
Yeah. So
392
00:27:33.100 --> 00:27:34.320
if you generate
393
00:27:35.245 --> 00:27:38.225
some action points for your customers, they basically
394
00:27:38.924 --> 00:27:42.865
get an email or a pop up message in our web tool or
395
00:27:43.740 --> 00:27:45.600
in the app, and then,
396
00:27:46.300 --> 00:27:53.440
they can understand, okay, I have something to solve here, then they can put it in their own schedule
397
00:27:54.525 --> 00:27:57.185
and, solve the problem. And after that,
398
00:27:57.565 --> 00:27:58.945
they can
399
00:27:59.325 --> 00:28:00.305
give us feedback.
400
00:28:00.925 --> 00:28:05.570
So there are some basic questions like how helpful has this been to you or
401
00:28:06.190 --> 00:28:09.409
how relevant has this been to you so that in the next iteration,
402
00:28:10.055 --> 00:28:10.875
we can then,
403
00:28:12.055 --> 00:28:15.115
flag these detected events and understand,
404
00:28:15.815 --> 00:28:16.315
okay,
405
00:28:16.775 --> 00:28:22.690
if we show this kind of error, how relevant was it to the customer or how good were we with the prediction
406
00:28:23.150 --> 00:28:25.010
so that we can then improve
407
00:28:25.715 --> 00:28:27.415
the way we do stuff
408
00:28:27.955 --> 00:28:33.175
or use that labeled data to retrain other neural networks to do some optimizations.
409
00:28:34.010 --> 00:28:39.230
And then the other question too, as far as being able to build useful notifications
410
00:28:39.690 --> 00:28:42.270
is having the necessary domain knowledge
411
00:28:42.855 --> 00:28:49.274
of how the turbines work and the atmospheric conditions that contribute to different performance outcomes.
412
00:28:49.870 --> 00:28:53.730
And I know that you mentioned that you have some background of doing
413
00:28:54.030 --> 00:29:05.434
research and working with turbines. But what are some of the other ways that you're incorporating some of that domain knowledge into your product to ensure that you're able to provide the most value to your customers?
414
00:29:06.050 --> 00:29:07.590
Yeah. I think there are some
415
00:29:08.370 --> 00:29:11.030
some things that you really need to know,
416
00:29:11.650 --> 00:29:12.790
that you have to learn
417
00:29:13.170 --> 00:29:14.630
also as a data scientist.
418
00:29:15.585 --> 00:29:16.085
Like,
419
00:29:16.465 --> 00:29:19.765
let's just give an example. Like, if you don't know that the turbine
420
00:29:20.705 --> 00:29:22.485
has a limit,
421
00:29:23.585 --> 00:29:28.230
limited power output. So if there's a lot of wind, the turbine will never produce
422
00:29:29.090 --> 00:29:31.429
more power than, let's say, 3 megawatts.
423
00:29:32.394 --> 00:29:41.455
And it depends on the manufacturer and turbine type. And if you don't know that, then you might think, oh, maybe the turbine is not generating enough
424
00:29:41.800 --> 00:29:42.300
power
425
00:29:42.720 --> 00:29:43.220
and
426
00:29:43.640 --> 00:29:47.580
this is like just an example of some domain knowledge that you
427
00:29:47.880 --> 00:29:52.075
need to know in order to train the networks correctly and to
428
00:29:52.375 --> 00:29:54.635
draw the right conclusions from your
429
00:29:55.015 --> 00:29:57.035
data and data analytics.
430
00:29:57.440 --> 00:30:00.500
And then sometimes there's also stuff that you
431
00:30:01.919 --> 00:30:04.580
or problems that you cannot really know
432
00:30:05.025 --> 00:30:08.485
if you don't have 20 years of experience as a turbine technician.
433
00:30:08.785 --> 00:30:13.685
And in these cases, we just have a network of other companies that we work with,
434
00:30:14.309 --> 00:30:14.809
and,
435
00:30:15.190 --> 00:30:16.170
we can then
436
00:30:16.550 --> 00:30:20.630
give them that problem, and then they can analyze it. And,
437
00:30:21.429 --> 00:30:22.730
together with our customers,
438
00:30:23.325 --> 00:30:24.305
they then can
439
00:30:24.845 --> 00:30:25.665
make the
440
00:30:26.285 --> 00:30:29.985
decisions about what to do next with that kind of very special problem.
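NOTE
Editor's sketch of the rated-power example from the answer above. Aerodynamic power grows with the cube of wind speed, but the turbine caps its output at a rated value (the interview mentions 3 megawatts as an example). The rotor diameter, power coefficient, air density and tolerance below are illustrative assumptions, and the function names are hypothetical. Without the cap, a naive check would report a healthy turbine as underperforming in strong wind.

```python
import math


def expected_power_kw(wind_speed_ms, rotor_diameter_m=100.0, cp=0.4,
                      air_density=1.225, rated_kw=3000.0):
    """Expected output in kW: cubic aerodynamic power, capped at rated power."""
    swept_area = math.pi * (rotor_diameter_m / 2) ** 2
    aerodynamic_kw = (0.5 * air_density * swept_area * cp
                      * wind_speed_ms ** 3) / 1000.0
    # Domain knowledge: the turbine never produces more than its rated power.
    return min(aerodynamic_kw, rated_kw)


def is_underperforming(measured_kw, wind_speed_ms, tolerance=0.9):
    """True if measured output is below tolerance times the expected output."""
    return measured_kw < tolerance * expected_power_kw(wind_speed_ms)
```

With these assumed parameters, a turbine delivering 3,000 kW at 20 m/s is judged healthy, because the expectation is capped at the rated value rather than following the cubic curve upward.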
441
00:30:30.340 --> 00:30:31.640
And then as far as
442
00:30:32.020 --> 00:30:41.155
the work that you're doing to build out this product, what are you finding to be some of the most challenging aspects of building an analytics solution for the wind energy sector?
443
00:30:41.455 --> 00:30:42.115
I think
444
00:30:42.495 --> 00:30:45.715
handling so many different data sources is the
445
00:30:47.549 --> 00:30:50.690
was and is the biggest problem.
446
00:30:51.549 --> 00:30:54.505
And the second is the quality
447
00:30:55.445 --> 00:30:56.265
of your data.
448
00:30:56.885 --> 00:30:57.284
And,
449
00:30:57.845 --> 00:31:00.325
you really make you really want to make your
450
00:31:01.830 --> 00:31:03.930
you want to build a data lake
451
00:31:04.310 --> 00:31:04.810
and
452
00:31:06.790 --> 00:31:08.490
not a data sump.
453
00:31:09.510 --> 00:31:11.465
I don't know if that's a correct word. But,
454
00:31:12.265 --> 00:31:14.924
yeah, you wanna have a good data pool,
455
00:31:15.225 --> 00:31:19.164
and that's really hard with so many different data sources
456
00:31:19.950 --> 00:31:24.370
that you cannot really trust. And yeah. And then maybe another thing is also making things scalable
457
00:31:25.070 --> 00:31:30.665
is a hard thing. You have different connections to many different turbines,
458
00:31:31.445 --> 00:31:35.465
and Internet connections are breaking down very often. And
459
00:31:36.485 --> 00:31:37.679
this is, like, really
460
00:31:37.980 --> 00:31:38.799
a huge problem.
461
00:31:39.260 --> 00:31:47.285
And are there any particular technologies that you've been able to lean on to help with some of that scalability problem in terms of being able to
462
00:31:47.825 --> 00:31:55.110
handle the data collection and ensure that you're able to get reliable throughput? Yeah. We're working together with
463
00:31:55.510 --> 00:31:57.370
a company called Swarm64.
464
00:31:58.070 --> 00:31:58.970
And they
465
00:31:59.350 --> 00:31:59.850
basically
466
00:32:00.815 --> 00:32:02.514
enable us to handle
467
00:32:03.054 --> 00:32:10.195
a lot of data in real time. And with real time, I really mean real time, like 1 second or sub-second values.
468
00:32:10.720 --> 00:32:11.220
And,
469
00:32:12.080 --> 00:32:13.460
they help us
470
00:32:13.920 --> 00:32:18.100
to solve that scalability problem if you get more data
471
00:32:18.404 --> 00:32:24.825
and even so much data that you cannot handle it with usual databases any longer.
472
00:32:25.205 --> 00:32:27.065
And what we also want to
473
00:32:27.730 --> 00:32:31.270
achieve, we want to give feedback to the turbine
474
00:32:31.570 --> 00:32:32.950
in real time. And
475
00:32:33.650 --> 00:32:35.030
for instance, that could be
476
00:32:35.524 --> 00:32:40.424
that you have one turbine standing at the front of the wind park, and it's getting a wind gust.
477
00:32:40.804 --> 00:32:41.205
And,
478
00:32:42.005 --> 00:32:55.025
that gust is moving through the wind park. And then the first turbine is telling the other turbines, okay, there's a wind gust coming, and you should better behave like this or like that. And this information is then sent back to Turbit.
479
00:32:55.965 --> 00:33:05.500
The algorithms give the best way to yaw and pitch the other turbines in the wind park, and all that is happening in real time. And,
480
00:33:05.900 --> 00:33:16.535
for that, you really need to handle a lot of data very fast. And for cases where you have maybe some sort of weather system coming through an area, are you then also able
481
00:33:17.350 --> 00:33:28.555
to feed that information to other installations of turbines that might be in the path of the weather event in terms of being able to improve their energy output or,
482
00:33:29.175 --> 00:33:29.675
maybe
483
00:33:30.135 --> 00:33:33.140
throttle them so that it prevents potential damage if there are,
484
00:33:33.679 --> 00:33:35.780
especially high wind gusts or things like that?
485
00:33:36.640 --> 00:33:37.860
Yeah. Of course. Like,
486
00:33:38.400 --> 00:33:40.340
if there's a very momentary
487
00:33:40.880 --> 00:33:44.745
wind gust coming to the wind park, you could potentially do that.
488
00:33:45.605 --> 00:33:48.745
If there's a huge weather system coming,
489
00:33:49.445 --> 00:33:50.585
that's mainly part,
490
00:33:51.470 --> 00:33:52.290
of the
491
00:33:52.910 --> 00:33:55.570
that's mainly the job of the grid operators,
492
00:33:56.750 --> 00:33:57.250
or
493
00:33:57.630 --> 00:33:59.570
yeah. Mainly that because
494
00:34:00.014 --> 00:34:10.540
they need to shut down some turbines in advance because they know, okay, we are gonna produce a lot of energy, and that's too much for the grid. So let's better shut down some of the turbines.
495
00:34:11.080 --> 00:34:13.820
And that's actually happening quite often in Germany,
496
00:34:14.335 --> 00:34:15.855
especially in the north and the
497
00:34:16.255 --> 00:34:17.235
at the seaside.
498
00:34:18.175 --> 00:34:22.835
There are some turbines that are shut off 50% of the time, and nobody's using
499
00:34:23.135 --> 00:34:27.299
the energy that the turbine could potentially generate during these times.
500
00:34:27.760 --> 00:34:30.819
That's another interesting aspect to this system is
501
00:34:31.200 --> 00:34:33.539
the energy storage and energy distribution
502
00:34:34.465 --> 00:34:35.685
capability. I'm wondering
503
00:34:36.225 --> 00:34:42.645
how that factors into some of the decision making that you provide to the turbine operators as far as
504
00:34:43.190 --> 00:34:45.210
ways to ensure that they aren't,
505
00:34:45.670 --> 00:34:49.770
generating excess energy that's going to just get dumped or generating
506
00:34:50.295 --> 00:35:03.569
excess energy that is going to potentially overload their grids or storage systems and ways that you're able to maybe bring that information into the overall equation or some of the other external data sources that you're able to rely on to feed into your models.
507
00:35:04.109 --> 00:35:04.690
I mean,
508
00:35:05.150 --> 00:35:05.650
yeah.
509
00:35:06.190 --> 00:35:09.535
You're right. It's quite interesting, and there's tons of
510
00:35:09.995 --> 00:35:13.055
topics and problems that you could potentially solve.
511
00:35:14.715 --> 00:35:18.810
This particular problem that you're mentioning right now, we're not solving at the moment.
512
00:35:20.310 --> 00:35:25.145
As far as I know, there are other companies around that do that, that have specialized
513
00:35:26.325 --> 00:35:26.985
on predicting
514
00:35:28.005 --> 00:35:34.025
weather in the future and predicting the power output for the grid operators, and then you can
515
00:35:34.569 --> 00:35:35.069
trade
516
00:35:35.450 --> 00:35:37.549
the day-ahead auctions for electricity.
517
00:35:38.569 --> 00:35:39.790
And that's
518
00:35:40.809 --> 00:35:43.470
totally another problem that you wanna solve there.
519
00:35:43.775 --> 00:35:48.115
And for us, it's more the operation of the turbine. And
520
00:35:48.575 --> 00:35:50.595
if you have the turbine running,
521
00:35:50.895 --> 00:35:53.710
let it run the best way it can. And,
522
00:35:54.570 --> 00:35:55.390
I mean, yeah,
523
00:35:55.849 --> 00:35:58.270
I see a lot of potential in analyzing
524
00:35:59.050 --> 00:36:02.015
also that kind of data. And we're also
525
00:36:02.555 --> 00:36:03.855
getting weather information,
526
00:36:05.115 --> 00:36:07.694
data from a third-party source.
527
00:36:08.474 --> 00:36:08.974
But
528
00:36:09.470 --> 00:36:18.610
this is more because we wanna understand the operation of the turbine better and and make the operation of the turbine and the service maintenance better of that turbine.
529
00:36:19.214 --> 00:36:21.954
And then as far as your overall experience
530
00:36:22.415 --> 00:36:32.440
of building out Turbit Systems, both from the technical and business aspects, what have you found to be some of the most interesting or unexpected or challenging lessons learned in the process?
531
00:36:33.380 --> 00:36:34.120
Yeah. I think
532
00:36:35.065 --> 00:36:37.005
talking about what I said earlier,
533
00:36:37.545 --> 00:36:40.605
like, in the last question, I think it's focus,
534
00:36:40.984 --> 00:36:43.165
especially when you're starting a company.
535
00:36:43.600 --> 00:36:46.420
It's quite hard; also, as a scientist,
536
00:36:47.120 --> 00:36:57.595
you have so many ideas and you know that so many things are potentially working out. But in order to bring something to the market, you really need to focus and you really need to understand
537
00:36:58.375 --> 00:37:05.720
what problem you're solving. And you need to concentrate on maybe one problem first and do that the best way you can.
538
00:37:06.020 --> 00:37:08.120
And then later on, you can add
539
00:37:08.465 --> 00:37:08.965
more
540
00:37:09.265 --> 00:37:10.645
problems that you solve.
541
00:37:11.025 --> 00:37:15.930
I think that was the biggest lesson of the past years. Yeah.
542
00:37:16.650 --> 00:37:41.690
And as you look toward the near to medium term of what you're building out both technically and in the business, what are some of the things that you have planned that you're most excited about or overall trends in the energy sector or technology capabilities that you're looking forward to try and incorporate or take advantage of? Yeah. I'm basically very excited about how many other problems there are in this data that you could potentially
543
00:37:42.525 --> 00:37:43.345
solve. And
544
00:37:43.724 --> 00:37:44.944
the more we are growing,
545
00:37:45.565 --> 00:37:49.905
and we are able to handle and to manage
546
00:37:50.270 --> 00:37:51.170
all these different
547
00:37:51.470 --> 00:37:51.970
problems,
548
00:37:52.830 --> 00:37:55.010
the more I'm looking forward because,
549
00:37:55.470 --> 00:37:56.070
yeah, this
550
00:37:56.510 --> 00:37:58.050
it's really fun.
551
00:37:58.885 --> 00:38:00.345
And basically, this
552
00:38:00.964 --> 00:38:02.425
is really the real time
553
00:38:03.045 --> 00:38:09.920
control algorithms for the turbine that fascinate me the most. And I think there's a great potential
554
00:38:10.380 --> 00:38:12.800
in the real-time operation of the turbines.
555
00:38:13.420 --> 00:38:14.560
But sometimes it's
556
00:38:14.955 --> 00:38:15.855
sometimes it's
557
00:38:16.875 --> 00:38:21.355
the basic things that give the most
558
00:38:21.755 --> 00:38:22.255
value.
559
00:38:23.060 --> 00:38:23.540
And,
560
00:38:24.020 --> 00:38:25.560
it's sometimes technically
561
00:38:25.860 --> 00:38:27.480
not so fancy, but,
562
00:38:28.020 --> 00:38:32.040
you're just solving a basic problem, and that has a great value for your customers.
563
00:38:32.835 --> 00:38:33.335
And,
564
00:38:34.115 --> 00:38:35.015
sometimes that's
565
00:38:35.555 --> 00:38:37.095
that's the better thing.
566
00:38:37.555 --> 00:38:55.015
Well, for anybody who wants to get in touch with you or follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And then as a final question, I would just like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today. I think the biggest gap is really the
567
00:38:55.474 --> 00:38:56.214
the handling
568
00:38:56.595 --> 00:38:59.880
of how to clean the data. So you have
569
00:39:00.420 --> 00:39:02.760
great packages like, let's say, TensorFlow
570
00:39:03.380 --> 00:39:07.815
with which you can train models easily and you can
571
00:39:08.595 --> 00:39:11.235
do almost everything with that. But there's not
572
00:39:12.195 --> 00:39:16.350
I don't know if it's possible, but, there's nothing like a general solution
573
00:39:16.650 --> 00:39:18.430
for cleaning datasets.
574
00:39:18.970 --> 00:39:23.470
I would wish that there's some sort of a solution for that.
575
00:39:24.635 --> 00:39:27.994
And maybe it doesn't exist because it's too complicated. But,
576
00:39:28.474 --> 00:39:31.695
I would be super happy if there's a package that does that for you.
577
00:39:33.250 --> 00:39:36.470
Yeah. I'm sure that plenty of people would be happy to see that as well.
578
00:39:36.770 --> 00:39:37.270
Yeah.
579
00:39:38.450 --> 00:39:55.895
Alright. Well, thank you very much for taking the time today to join me and discuss the work that you've been doing with Turbit Systems. It's definitely a very interesting problem domain and an interesting technical solution that you're building for it. So I appreciate all the time and energy you've put into that, and I hope you enjoy the rest of your day. Thank you very much. I enjoyed it.
580
00:40:01.315 --> 00:40:04.560
For listening. Don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com
581
00:40:07.100 --> 00:40:11.355
to learn about the Python language, its community, and the innovative ways it is being used.
582
00:40:11.755 --> 00:40:13.135
And visit the site at dataengineeringpodcast.com
583
00:40:14.475 --> 00:40:23.920
to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com
584
00:40:24.460 --> 00:40:29.760
with your story. And to help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
NOTE
Transcription provided by Podhome.fm
Created: 7/6/2024 1:33:45 PM
Duration: 2448.072
Channels: 1
1
00:00:13.955 --> 00:00:17.974
Hello, and welcome to the Data Engineering Podcast, the show about modern data management.
2
00:00:18.500 --> 00:00:22.440
What are the pieces of advice that you wish you had received early in your career of data engineering?
3
00:00:23.140 --> 00:00:33.114
If you hand a book to a new data engineer, what wisdom would you add to it? I'm working with O'Reilly on a project to collect the 97 things that every data engineer should know, and I need your help.
4
00:00:33.495 --> 00:00:35.114
Go to dataengineeringpodcast.com/97things
5
00:00:36.480 --> 00:00:55.200
to add your voice and share your hard-earned expertise. And when you're ready to build your next pipeline or want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends over at Linode. With their managed Kubernetes platform, it's now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar, Pachyderm, and Dagster.
6
00:00:55.760 --> 00:01:02.980
With simple pricing, fast networking, object storage, and worldwide data centers, you've got everything you need to run a bulletproof data platform.
7
00:01:03.295 --> 00:01:04.915
Go to dataengineeringpodcast.com/linode,
8
00:01:06.415 --> 00:01:08.034
that's L-I-N
9
00:01:08.335 --> 00:01:15.950
O-D-E, today and get a $60 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show.
10
00:01:16.329 --> 00:01:19.789
You listen to this show to learn and stay up to date with what's happening in databases,
11
00:01:20.225 --> 00:01:24.564
streaming platforms, big data, and everything else you need to know about modern data management.
12
00:01:25.024 --> 00:01:34.979
For more opportunities to stay up to date, gain new skills, and learn from your peers, there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to dataengineeringpodcast.com/conferences
13
00:01:36.605 --> 00:01:40.385
to check out the upcoming events being offered by our partners and get registered today.
14
00:01:40.685 --> 00:01:52.915
Your host is Tobias Macey. And today, I'm interviewing Michael Tegtmeier about Turbit, a machine learning powered platform for performance monitoring of wind farms. So, Michael, can you start by introducing yourself? Hi. Yeah. Of course. So, yeah, I'm
15
00:01:53.455 --> 00:01:57.795
Michael, and I'm the founder and CEO of Turbit Systems.
16
00:01:58.095 --> 00:02:00.915
And we are basically a data analytics platform
17
00:02:01.579 --> 00:02:03.200
for wind turbines.
18
00:02:04.140 --> 00:02:04.640
And,
19
00:02:05.340 --> 00:02:06.320
we have built,
20
00:02:06.780 --> 00:02:10.959
some tools to make the maintenance and operation of wind farms
21
00:02:11.295 --> 00:02:12.034
more efficient.
22
00:02:12.334 --> 00:02:14.915
And, yeah, I'm looking forward to a great conversation.
23
00:02:15.614 --> 00:02:18.754
And do you remember how you first got involved in the area of data management?
24
00:02:19.295 --> 00:02:19.795
Yeah.
25
00:02:20.540 --> 00:02:21.280
Of course.
26
00:02:21.580 --> 00:02:22.080
So
27
00:02:22.620 --> 00:02:23.980
my my education is,
28
00:02:24.460 --> 00:02:28.960
as a physicist. So I somehow was always dealing with data in
29
00:02:29.635 --> 00:02:36.855
at the university when we made some experiments. And then also you had to write some programs to analyze data, of course.
30
00:02:37.314 --> 00:02:37.814
So
31
00:02:38.490 --> 00:02:41.230
that's, like, one part where I got
32
00:02:41.530 --> 00:02:45.230
confronted with some data, let's say, and also maybe some super complicated,
33
00:02:45.770 --> 00:02:46.270
data.
34
00:02:47.055 --> 00:02:48.435
And in my studies,
35
00:02:48.975 --> 00:03:04.920
I wrote my bachelor's thesis about some measurements I did on wind turbines. So as you should know, wind turbines need to be directed into the wind in order to generate power. And what I was doing in that bachelor's thesis was
36
00:03:05.315 --> 00:03:11.095
I was, making measurements with a laser system. It's called LIDAR, so light detection and ranging.
37
00:03:11.635 --> 00:03:14.200
And, with that laser system, you could
38
00:03:14.599 --> 00:03:15.739
see the
39
00:03:17.239 --> 00:03:23.825
wind direction and the wind speed in front of the turbine. So that means that the turbine could or that
40
00:03:24.125 --> 00:03:26.465
was, at that time, that was somehow the,
41
00:03:27.085 --> 00:03:29.665
the vision to control the turbine
42
00:03:30.890 --> 00:03:36.430
before the wind is actually at the turbine. So see what's coming in before the turbine and then
43
00:03:36.810 --> 00:03:45.205
have some control algorithms that turn the turbine in the right direction and the pitch of the rotor blades to the right angles before the turbine actually,
44
00:03:45.605 --> 00:03:52.469
before the wind is actually at the turbine. So but the measurement itself was, like, of course, again, with a lot of data, and you had to
45
00:03:52.849 --> 00:03:56.069
match these data of this new LIDAR system
46
00:03:56.515 --> 00:04:02.135
with the data from the turbine, like when is the turbine running, what wind speeds were measured with
47
00:04:02.435 --> 00:04:07.420
the LIDAR system, what wind speeds were maybe measured with some other systems, like some
48
00:04:07.720 --> 00:04:09.580
anemometers on top of the turbine.
49
00:04:09.959 --> 00:04:10.459
And,
50
00:04:11.565 --> 00:04:12.305
yeah, and
51
00:04:12.685 --> 00:04:14.465
so to to make the story,
52
00:04:15.005 --> 00:04:17.425
maybe up until I get to Turbit,
53
00:04:18.205 --> 00:04:21.580
later on then, I was still doing some some physics.
54
00:04:22.120 --> 00:04:31.565
I also made my master's in laser physics, but this time in pulse shaping, temporal and spatial pulse shaping.
55
00:04:32.025 --> 00:04:34.604
And we did this by splitting
56
00:04:34.985 --> 00:04:35.725
a laser
57
00:04:36.264 --> 00:04:36.764
beam
58
00:04:37.065 --> 00:04:42.030
into its parts, into its different frequencies. And then you could control each frequency,
59
00:04:42.409 --> 00:04:45.069
the polarization and and the amplitude of that
60
00:04:45.595 --> 00:04:52.495
frequency. And then you put the laser back together, and then you could have a laser pulse that's formed in the time domain.
61
00:04:52.960 --> 00:05:01.300
So, like, there's a lot of energy coming in the beginning of the laser pulse and then maybe later a little bit more. And what we were doing there is that we were
62
00:05:01.865 --> 00:05:05.565
trying to get electrons out of the atoms,
63
00:05:06.025 --> 00:05:13.430
and we didn't really know how to push the electron, to put it in some easier words, maybe.
64
00:05:14.130 --> 00:05:21.815
So the electron is moving in the atom, and at some point in time, you need to push the electron out. So we didn't really know how to
65
00:05:22.115 --> 00:05:24.695
form that pulse. And we did this
66
00:05:24.995 --> 00:05:28.855
then with trial and error and basically with a genetic algorithm,
67
00:05:29.569 --> 00:05:31.830
And that was the first time where I
68
00:05:32.210 --> 00:05:32.870
have seen
69
00:05:33.409 --> 00:05:34.370
the power of,
70
00:05:35.090 --> 00:05:36.229
yeah, such algorithms.
71
00:05:36.944 --> 00:05:40.564
And I got super interested in in in, yeah, the power of,
72
00:05:41.025 --> 00:05:44.965
what you could do with data analytics and and, let's say, the the first
73
00:05:45.680 --> 00:05:48.340
idea of machine learning. It's not really machine learning, but,
74
00:05:48.960 --> 00:06:01.245
first algorithm that comes into searching, how a computer is finding something out that you don't know about. And then later you try as a physicist to understand, okay, what was going on there? Why did it work? Why did we
75
00:06:01.625 --> 00:06:07.020
why were we able to put the electron out of the atom just with this form of pulse?
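A minimal sketch of the kind of genetic algorithm described here, for readers who have not seen one. The experiment's real fitness signal, pulse parameterization, and settings are not given in the conversation, so `toy_yield`, `TARGET`, and every parameter below are hypothetical stand-ins: each candidate "pulse" is just a list of per-frequency amplitudes between 0 and 1, and the search keeps the better half of each generation, recombines it, and mutates the children.

```python
import random


def evolve(fitness, n_genes, pop_size=30, generations=60, mutation_scale=0.1, seed=0):
    """Maximize `fitness` over lists of `n_genes` floats in [0, 1]."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the better half unchanged as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            # Crossover: take each gene from one of the two parents.
            child = [rng.choice(pair) for pair in zip(mother, father)]
            # Mutation: small Gaussian nudge, clamped back into [0, 1].
            child = [min(1.0, max(0.0, g + rng.gauss(0.0, mutation_scale))) for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


# Hypothetical fitness: in the lab this would be a measured signal, not a formula.
TARGET = [0.9, 0.6, 0.3, 0.1]


def toy_yield(pulse):
    return -sum((p - t) ** 2 for p, t in zip(pulse, TARGET))


best_pulse = evolve(toy_yield, n_genes=len(TARGET))
print(best_pulse, toy_yield(best_pulse))
```

The algorithm never needs to know why a pulse shape works, only how well it scored, which matches the point made here: the search finds the answer first and the physicist explains it afterwards.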
76
00:06:07.640 --> 00:06:08.140
And,
77
00:06:08.840 --> 00:06:13.815
yeah, that was quite interesting. And and then later with with my knowledge about,
78
00:06:14.755 --> 00:06:15.495
wind energy,
79
00:06:15.955 --> 00:06:16.615
I was,
80
00:06:17.555 --> 00:06:19.335
I was thinking, like, what to do,
81
00:06:20.660 --> 00:06:23.080
what to do in in life after after studying.
82
00:06:23.780 --> 00:06:30.755
And I was looking for something that maybe is also making it sounds maybe stupid, but making the world a little little bit better.
83
00:06:31.375 --> 00:06:32.595
And I came
84
00:06:33.134 --> 00:06:40.259
to renewable energies, and and I I found wind energy most interesting because parts are turning. You have a lot of data.
85
00:06:40.639 --> 00:06:45.860
You have it's it's international. You you can go around the world, and and
86
00:06:46.395 --> 00:06:47.375
it has everything
87
00:06:47.835 --> 00:06:48.655
that you need
88
00:06:49.354 --> 00:06:49.854
for
89
00:06:50.235 --> 00:06:53.435
for your brain to to have some interesting things to work on. And,
90
00:06:54.169 --> 00:06:55.850
so, yeah, that's how
91
00:06:56.650 --> 00:07:05.795
also, I decided to found Turbit Systems because actually, this is quite an interesting story, maybe. The first time I got on the turbine
92
00:07:06.415 --> 00:07:06.915
was
93
00:07:07.215 --> 00:07:15.740
with these LIDAR measurements, and I got quite dizzy, like some sort of seasick. Like, a turbine tower is, like, 100 meters high, and,
94
00:07:16.280 --> 00:07:20.300
when you're at the top and the turbine is switched off or is even running,
95
00:07:20.995 --> 00:07:25.575
there's a lot of movement of the tower. And if you cannot look outside
96
00:07:25.955 --> 00:07:29.920
because you are inside of the tower, you you get seasick. And I was thinking, okay,
97
00:07:30.540 --> 00:07:32.240
if there's so much vibration
98
00:07:32.620 --> 00:07:36.560
due to the wind, then, of course, you need need to see some some
99
00:07:37.104 --> 00:07:40.005
some wind direction also in in in the wind movement.
100
00:07:40.705 --> 00:07:41.205
And,
101
00:07:41.665 --> 00:07:45.125
then together with some mates from the university, I I was
102
00:07:45.690 --> 00:07:47.389
looking to that problem
103
00:07:48.169 --> 00:07:52.990
more deeply. And I found out, yeah, there's there's a relation between the wind direction
104
00:07:53.574 --> 00:07:55.675
and the type of the movement of the tower.
105
00:07:56.134 --> 00:07:56.615
And,
106
00:07:56.935 --> 00:07:59.354
that also meant that you could maybe,
107
00:08:00.134 --> 00:08:02.794
see the wind direction more precisely. And
108
00:08:03.400 --> 00:08:06.380
so we did some measurements, and this is how I came
109
00:08:06.920 --> 00:08:13.875
to found Turbit, actually. And then later, we became more of a data science company.
110
00:08:14.255 --> 00:08:18.355
Yeah. It's definitely a very interesting problem domain because as you said, wind energy
111
00:08:18.735 --> 00:08:19.875
is ubiquitous
112
00:08:20.390 --> 00:08:28.650
in terms of its availability around the world because the air is always moving. So it's something that can provide a lot of benefit, particularly
113
00:08:28.974 --> 00:08:31.235
for countries who are just starting to
114
00:08:31.615 --> 00:08:42.960
build out their renewable infrastructure. I know that Germany has been using wind energy fairly heavily for a number of years at this point. So I'm sure that that also helped in terms of access to be able to
115
00:08:43.340 --> 00:08:50.235
build out your product while being able to sort of remain local and do things, within your home country. Yeah. Totally. Like,
116
00:08:50.695 --> 00:08:56.930
apart from Denmark I hope I'm not saying too much wrong here, but, apart from Denmark, Germany has been
117
00:08:57.490 --> 00:09:00.710
quite early in in wind energy. And, of course, Germany is
118
00:09:01.089 --> 00:09:02.149
always known as
119
00:09:02.610 --> 00:09:04.850
an engineering country, maybe. And,
120
00:09:05.825 --> 00:09:08.404
yeah, like, wind energy has been here
121
00:09:08.705 --> 00:09:09.845
at my home.
122
00:09:11.105 --> 00:09:16.050
I was born in Bremen, and we have a lot of wind turbines there and also around Berlin.
123
00:09:16.510 --> 00:09:20.370
In Brandenburg, there are there are many, many turbines. And I could see
124
00:09:20.795 --> 00:09:24.895
that they exist, let's say. And then, of course, Germany is a good country
125
00:09:25.355 --> 00:09:26.095
to build
126
00:09:26.395 --> 00:09:31.759
a new company for wind energy, I think, because of all the resources that you have here and the knowledge
127
00:09:32.459 --> 00:09:34.480
and and the connections that you can potentially,
128
00:09:35.019 --> 00:09:37.120
get. On the other hand, also, maybe Germany's,
129
00:09:38.220 --> 00:09:43.185
very special in in the wind energy domain because of its history. And
130
00:09:43.645 --> 00:09:45.025
that's actually a good thing.
131
00:09:45.405 --> 00:09:51.700
Wind turbines are owned in Germany by many, many people. There's, like, this this thing, it's which is called, so,
132
00:09:53.520 --> 00:10:04.175
energy for the for the people, let's say. And then small cities, they they invest with many people in in a wind turbine, and then they profit from it, financially. And this concept,
133
00:10:04.555 --> 00:10:08.175
maybe you don't see so much in other countries like the USA or
134
00:10:08.620 --> 00:10:12.400
China where where there's more, like, big manufacturers that own
135
00:10:12.780 --> 00:10:13.920
big wind farms
136
00:10:14.380 --> 00:10:14.780
and,
137
00:10:15.665 --> 00:10:21.605
yeah. And so for Turbit Systems in particular, I know that one of the main focuses
138
00:10:21.985 --> 00:10:41.095
of the product that you're building out is to help improve the overall operating efficiency of the turbines, both individually and in aggregate. So I'm wondering if you can just talk a bit more about some of the ways that you're helping to optimize the output and some of the most problematic factors that contribute to performance
139
00:10:41.790 --> 00:10:47.410
degradation in wind turbines, both individually and in aggregate? Mhmm. Yeah.
140
00:10:49.055 --> 00:10:52.014
So basically, a wind turbine is like a plane. So,
141
00:10:52.815 --> 00:10:55.315
it has wings, which we maybe call
142
00:10:55.694 --> 00:10:56.194
rotors,
143
00:10:56.610 --> 00:10:57.910
and they are directed
144
00:10:58.450 --> 00:11:06.895
they they must be shaped in a very special way, and they must be directed into the winds while the turbine is turning in a very directed
145
00:11:08.075 --> 00:11:08.575
into
146
00:11:09.755 --> 00:11:10.255
the
147
00:11:11.435 --> 00:11:11.935
the
148
00:11:13.115 --> 00:11:13.615
wind,
149
00:11:15.160 --> 00:11:17.660
directed into the the wind,
150
00:11:18.760 --> 00:11:33.220
so that, actually, the turbines are facing into the wind, which we call yaw. And both this pitch and yawing need to be, yeah, optimal in order to get all of the energy out of the wind. And so
151
00:11:33.600 --> 00:11:38.660
when I was talking about the LIDAR system and my bachelor's thesis, the goal was to correct
152
00:11:39.805 --> 00:11:46.225
the way the turbine is turning into the wind. So the problem is that in big wind farms, in big yeah.
153
00:11:46.560 --> 00:11:47.459
In big wind
154
00:11:47.839 --> 00:11:52.180
farms, you have other turbines in the wind park that are creating turbulences,
155
00:11:52.959 --> 00:11:54.180
and you have maybe
156
00:11:55.055 --> 00:12:03.795
sites, or a forest at some part of the wind park, that is redirecting the wind in a weird way, let's say. And you want to make sure that
157
00:12:04.210 --> 00:12:06.710
the turbine algorithm or the turbine behavior
158
00:12:07.090 --> 00:12:09.430
is always in such a way that it gets the maximum
159
00:12:09.810 --> 00:12:20.965
possible power output. So that it's directed correctly into the wind and also with the pitch systems. So but if you have a measurement of the wind on top of the nacelle that's behind the rotor plane,
160
00:12:21.410 --> 00:12:28.470
then you always have some errors and you wanna be able to correct this. And in addition to that, it's like a very simple problem
161
00:12:28.935 --> 00:12:41.440
that sometimes the technicians that go up and put the anemometer that's measuring the wind direction on top of the nacelle, they do this with an error, and sometimes with, like, more than 5 to 10 degrees, and nobody's
162
00:12:41.899 --> 00:12:43.920
detecting that. And then you have a
163
00:12:45.095 --> 00:13:04.165
a bad performance of the turbine. So this is how we started with, as I said, with the vibration measurements. But then later on, going more into the data analytics part, well, you get a lot of information from the turbine or potentially get it. So the turbine is logging a lot of data like wind speed, wind direction,
164
00:13:04.944 --> 00:13:05.444
temperature
165
00:13:05.985 --> 00:13:09.365
of the outside air, then temperatures of the gearbox,
166
00:13:09.824 --> 00:13:14.600
temperature, like, a lot of data up to 500 different values. And
167
00:13:14.980 --> 00:13:15.800
up to now
168
00:13:16.660 --> 00:13:18.040
or maybe the past
169
00:13:18.415 --> 00:13:21.075
up to the past 2 years, nobody really
170
00:13:21.455 --> 00:13:23.635
analyzed this data, these these
171
00:13:24.015 --> 00:13:24.915
huge datasets.
172
00:13:25.695 --> 00:13:29.140
So another thing that we found out is that
173
00:13:29.520 --> 00:13:31.380
sometimes the turbine is
174
00:13:32.720 --> 00:13:35.220
operating in a throttled mode
175
00:13:35.895 --> 00:13:37.915
that nobody knows about. So
176
00:13:38.855 --> 00:13:40.235
sometimes because of regulations,
177
00:13:40.775 --> 00:13:41.495
because of,
178
00:13:42.214 --> 00:13:44.475
noise regulations, the turbine should not
179
00:13:44.790 --> 00:13:46.089
produce much
180
00:13:46.390 --> 00:13:52.730
power or is producing less power than it actually could. And sometimes these turbines go into these
181
00:13:53.125 --> 00:13:53.865
noise modes,
182
00:13:54.645 --> 00:13:59.385
without anybody knowing it. And so we figured out, okay, let's let's do some
183
00:13:59.765 --> 00:14:12.485
general analysis of the normal behavior of a turbine, and let's look if there's something that we can find where the turbine is not behaving in a normal way. And that's, like, that's totally a data analytics
184
00:14:12.945 --> 00:14:13.445
problem.
185
00:14:14.065 --> 00:14:28.400
We don't really maybe have all of the domain knowledge of one particular turbine, how it should turn, and how it should behave. But we can look at the data and look for abnormalities. And with that
186
00:14:28.714 --> 00:14:30.255
example that I was talking about,
187
00:14:30.634 --> 00:14:35.454
you can understand, like, if if the turbine is producing half the energy that it could,
188
00:14:35.910 --> 00:14:38.329
then, of course, this is a huge factor,
189
00:14:38.870 --> 00:14:39.930
economic factor.
190
00:14:40.790 --> 00:15:03.205
And if you find these data points and these turbines that are not producing enough energy, then you clearly have a value that you can give to your customers. And so you mentioned that at least up until the last couple of years, a lot of this data that was being collected with the systems that are embedded into the turbines is being ignored or not analyzed in any great detail.
191
00:15:03.585 --> 00:15:08.405
I'm wondering what the current state of the art is as far as being able to
192
00:15:08.980 --> 00:15:13.080
analyze the performance of the turbines and correct for errors
193
00:15:13.459 --> 00:15:16.760
and do any sort of preventive maintenance to reduce downtime?
194
00:15:17.300 --> 00:15:17.800
Yeah.
195
00:15:18.685 --> 00:15:19.185
So
196
00:15:19.565 --> 00:15:20.785
up to now,
197
00:15:21.165 --> 00:15:26.385
the standard, at least in Germany, is to have 10 minute average values of different,
198
00:15:26.970 --> 00:15:29.630
measurements at the turbine, for instance, wind speed,
199
00:15:30.089 --> 00:15:39.264
power output of the turbine. And so that's the standard. And, basically, this data has been logged in the past just because of regulations.
200
00:15:39.885 --> 00:15:43.100
For instance, like, if if the turbine is shut down
201
00:15:43.480 --> 00:15:48.460
because of too much energy in the grid, then yeah. In this in the in in in this case,
202
00:15:49.480 --> 00:15:56.525
you have the data to see, okay, there has been such an amount of wind before this event, and you
203
00:15:57.225 --> 00:16:05.089
would have generated so and so much energy because of this grid shutdown. And that's why, basically, maybe people were
204
00:16:05.550 --> 00:16:07.410
logging data. But now,
205
00:16:07.870 --> 00:16:18.515
people are also starting to understand that you can do more with the data. So, also, more data is logged in the newer turbines, and there are more sensors, and the sensors
206
00:16:19.020 --> 00:16:25.440
potentially can not only log 10 minute average values, but also maybe second values or sub second values.
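To make the relationship between second-level values and the standard 10 minute averages concrete, here is a small sketch. The sample data and the function names `window_start` and `ten_minute_averages` are made up for illustration; real turbine logs will have their own formats and the averaging is normally done by the turbine's own logging system.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def window_start(ts):
    """Floor a timestamp to the start of its 10 minute window."""
    return ts.replace(minute=ts.minute - ts.minute % 10, second=0, microsecond=0)


def ten_minute_averages(samples):
    """samples: iterable of (datetime, value). Returns {window_start: mean value}."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[window_start(ts)].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Synthetic 1 Hz wind speed: 8 m/s for the first 10 minutes, then 10 m/s.
start = datetime(2020, 6, 1, 12, 0, 0)
samples = [(start + timedelta(seconds=i), 8.0 if i < 600 else 10.0) for i in range(1200)]

averages = ten_minute_averages(samples)
for window, value in averages.items():
    print(window.isoformat(), value)
```

Averaging throws away everything that happens inside a window, such as short gusts or brief stops, which is why higher-resolution logging opens up analyses that 10 minute values cannot support.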
207
00:16:26.334 --> 00:16:30.834
So, potentially, you you can get more data than you could get maybe in the past.
208
00:16:31.615 --> 00:16:35.154
And, yeah, it's like it's a physical system. The turbine is
209
00:16:35.650 --> 00:16:38.230
is is a machine, and you can you can
210
00:16:38.690 --> 00:16:40.950
grab a topic and then look into detail
211
00:16:41.330 --> 00:16:45.495
and look into the data and see if you can optimize something there.
212
00:16:46.375 --> 00:16:47.435
So, yeah, so
213
00:16:48.055 --> 00:16:52.475
just to give that example again to where where you can reduce the
214
00:16:52.790 --> 00:16:53.610
the power,
215
00:16:54.230 --> 00:17:03.834
where the power of the turbine is reduced because of some regulations or because nobody is noticing it. Yeah. Maybe I can explain a little bit more how we do it. So,
216
00:17:04.295 --> 00:17:07.035
we we basically try to find datasets
217
00:17:07.975 --> 00:17:15.460
where we definitely know that the turbine is behaving in a good way. So we filter out these datasets, and,
218
00:17:16.179 --> 00:17:18.495
we call them our training dataset.
219
00:17:19.355 --> 00:17:20.015
And then
220
00:17:20.315 --> 00:17:22.575
we train neural networks
221
00:17:22.955 --> 00:17:24.895
on this dataset. And
222
00:17:25.390 --> 00:17:29.570
we have to think about, okay, what physical system makes sense? Like, what is the input
223
00:17:30.030 --> 00:17:30.770
of that
224
00:17:31.230 --> 00:17:33.570
black box formula, and what's the output?
225
00:17:34.030 --> 00:17:35.090
And the input
226
00:17:35.695 --> 00:17:47.770
for the power output can be, of course, the wind speed, but the energy that is contained in the wind is also dependent on the density of the air, and the density
227
00:17:48.150 --> 00:17:51.690
is dependent on the temperature, for instance. So if you have a value
228
00:17:52.675 --> 00:17:55.095
a time series maybe of wind speed
229
00:17:55.395 --> 00:17:57.735
and and temperatures of the outside air,
230
00:17:58.035 --> 00:18:01.255
then you can use these 2 values as an input
231
00:18:01.770 --> 00:18:02.510
to generate
232
00:18:02.810 --> 00:18:05.790
the power, to simulate the power output.
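The physical reasoning behind using wind speed and outside temperature as inputs can be sketched with the textbook relations: the kinetic power carried by wind through the rotor area is 0.5 * density * area * speed^3, and dry air density follows the ideal gas law. This is not Turbit's model, only the standard physics it alludes to. The 100 m rotor diameter and sea-level pressure are assumptions chosen for illustration, and the result is the power available in the wind, not what a turbine actually delivers.

```python
import math

R_DRY_AIR = 287.05  # specific gas constant of dry air, J/(kg*K)


def air_density(temp_c, pressure_pa=101_325.0):
    """Dry air density in kg/m^3 from the ideal gas law."""
    return pressure_pa / (R_DRY_AIR * (temp_c + 273.15))


def wind_power_w(wind_speed_ms, temp_c, rotor_diameter_m):
    """Kinetic power in watts flowing through the rotor disc (before any turbine losses)."""
    area = math.pi * (rotor_diameter_m / 2) ** 2
    return 0.5 * air_density(temp_c) * area * wind_speed_ms ** 3


# Same wind speed, hypothetical 100 m rotor, cold winter day versus hot summer day.
cold = wind_power_w(8.0, -10.0, 100.0)
warm = wind_power_w(8.0, 30.0, 100.0)
print(f"-10 C: {cold / 1e6:.2f} MW, 30 C: {warm / 1e6:.2f} MW, ratio {cold / warm:.3f}")
```

With identical wind speed, the cold air carries roughly 15 percent more power than the warm air, so a model that only sees wind speed would misjudge what a healthy turbine should produce.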
233
00:18:06.330 --> 00:18:09.790
And if you have a dataset where you know, okay, the turbine is behaving correctly,
234
00:18:10.274 --> 00:18:11.894
then you can train a neural network
235
00:18:12.355 --> 00:18:14.855
on that behavior, and then you can simulate
236
00:18:15.475 --> 00:18:16.215
with new
237
00:18:16.835 --> 00:18:17.335
datasets
238
00:18:18.020 --> 00:18:24.200
how that turbine should have behaved in that scenario, in that physical scenario. And then you can make comparisons.
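The train-on-healthy-data, then simulate-and-compare idea can be sketched without a neural network. The example below deliberately swaps in a much simpler stand-in, a binned median power curve using wind speed only, so it is not the approach described in the conversation, which uses neural networks with wind speed and temperature as inputs. All numbers and function names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def fit_power_curve(trusted, bin_width=1.0):
    """trusted: (wind_speed_ms, power_kw) pairs from periods known to be healthy."""
    bins = defaultdict(list)
    for wind, power in trusted:
        bins[int(wind // bin_width)].append(power)
    return {b: median(powers) for b, powers in bins.items()}


def expected_power(curve, wind, bin_width=1.0):
    """Expected power for a wind speed, or None if that bin was never seen in training."""
    return curve.get(int(wind // bin_width))


def residuals(curve, new_data, bin_width=1.0):
    """Return (wind, measured, measured - expected) for every point the model can judge."""
    out = []
    for wind, power in new_data:
        expected = expected_power(curve, wind, bin_width)
        if expected is not None:
            out.append((wind, power, power - expected))
    return out


# Hypothetical training set from known-good operation.
trusted = [(5.2, 400.0), (5.7, 440.0), (5.5, 420.0), (8.1, 1500.0), (8.4, 1560.0), (8.8, 1620.0)]
curve = fit_power_curve(trusted)

# New measurements: the second point produces about half of what the model expects.
new_data = [(5.4, 415.0), (8.5, 790.0)]
result = residuals(curve, new_data)
for wind, measured, diff in result:
    print(f"{wind} m/s: measured {measured} kW, deviation {diff} kW")
```

The large negative deviation at 8.5 m/s is the kind of signature an unnoticed throttled or noise-reduced mode would leave.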
239
00:18:24.740 --> 00:18:32.975
You can add some more information like status logs and and other European data and service data from the from the maintenance companies
240
00:18:33.355 --> 00:18:34.975
and mix everything together
241
00:18:36.090 --> 00:18:39.710
and create a value out of that. And then as far as the
242
00:18:40.090 --> 00:18:43.285
types of data that you're able to access from the sensors
243
00:18:43.665 --> 00:18:45.685
and the control systems and the turbine,
244
00:18:46.145 --> 00:18:50.670
what are some of the challenges that you're dealing with as far as just the data collection?
245
00:18:51.050 --> 00:18:52.750
And what is the
246
00:18:53.210 --> 00:18:54.190
level of variability
247
00:18:54.570 --> 00:18:55.070
between
248
00:18:55.610 --> 00:18:56.510
different turbines
249
00:18:56.810 --> 00:18:58.305
and different manufacturers? Yeah.
250
00:19:02.557 --> 00:19:03.057
For
251
00:19:07.550 --> 00:19:08.050
Yeah.
252
00:19:08.590 --> 00:19:11.230
For a good write, there there have been,
253
00:19:12.590 --> 00:19:15.025
companies on the market that have had,
254
00:19:15.425 --> 00:19:17.765
specialized exactly for that problem because
255
00:19:18.785 --> 00:19:19.765
every turbine
256
00:19:20.785 --> 00:19:22.245
somehow is a prototype
257
00:19:22.705 --> 00:19:23.205
because
258
00:19:23.950 --> 00:19:27.809
if maybe you let's say you you buy a turbine from manufacturer a,
259
00:19:28.270 --> 00:19:30.610
and you put it in your site,
260
00:19:31.635 --> 00:19:37.655
specific site, and then you have an additional contract for data management with another company.
261
00:19:38.370 --> 00:19:41.270
And so you can imagine how many potential
262
00:19:41.890 --> 00:19:42.390
variations
263
00:19:42.690 --> 00:19:45.670
of of combinations of manufacturers and data
264
00:19:47.495 --> 00:19:48.395
data collecting
265
00:19:49.335 --> 00:19:49.835
computers
266
00:19:50.135 --> 00:19:56.120
there are on the market. And that means that that there's a huge variety of the of the datasets.
267
00:19:56.740 --> 00:19:57.240
So
268
00:19:57.860 --> 00:20:15.380
we also had to learn that in the beginning. And you cannot assume that that if you have 1 turbine type that the data is looking always the same because you don't know if it's been generated by the same type of system. So the best way to deal with that problem is to
269
00:20:15.840 --> 00:20:20.100
to look at each and every turbine as 1 system and and
270
00:20:21.055 --> 00:20:22.515
not make cross correlations
271
00:20:22.895 --> 00:20:26.115
too early with with let's say, if you have 1 turbine
272
00:20:27.055 --> 00:20:30.960
type you want to make cross correlations with with many other turbine types
273
00:20:31.580 --> 00:20:32.460
of the same model
274
00:20:33.020 --> 00:20:34.080
sorry. So the same
275
00:20:34.460 --> 00:20:35.440
turbine type
276
00:20:35.885 --> 00:20:37.745
and make the cross correlations over that,
277
00:20:38.525 --> 00:20:40.865
you're better set
278
00:20:41.245 --> 00:20:43.505
if you have, like, for each and every turbine,
279
00:20:44.030 --> 00:20:47.970
a specific model. And that also means, again, that you have a lot
280
00:20:48.670 --> 00:20:55.145
of machine learning models, that you have a lot of data that you need to train. There's a lot of scalability
281
00:20:55.685 --> 00:20:58.540
problems, let's say, that you have to look into.
282
00:21:00.540 --> 00:21:06.080
And, yeah. And then then, of course, the standard data problems, you have data gaps. You have
283
00:21:06.505 --> 00:21:09.245
data points that are weird, like,
284
00:21:10.265 --> 00:21:12.365
outside temperature of 1,000 degrees.
285
00:21:13.279 --> 00:21:15.779
So you need to handle that. Or constant
286
00:21:16.720 --> 00:21:17.220
constant,
287
00:21:17.840 --> 00:21:18.899
temperature of
288
00:21:19.440 --> 00:21:21.299
minus 10 during summer.
289
00:21:22.315 --> 00:21:23.355
Doesn't make sense also,
290
00:21:24.075 --> 00:21:30.335
degrees Celsius, of course. Yeah. And and and you need to clean your dataset. I I think every data scientist
291
00:21:30.720 --> 00:21:31.620
knows how problematic
292
00:21:32.000 --> 00:21:32.980
that can be.
293
00:21:33.520 --> 00:21:34.020
And,
294
00:21:35.200 --> 00:21:38.580
yeah, that so that that's this has really been a challenge,
295
00:21:39.445 --> 00:21:42.425
to build some some automated systems that clean
296
00:21:44.005 --> 00:21:44.505
these
297
00:21:44.805 --> 00:21:45.305
very
298
00:21:49.100 --> 00:21:49.760
these datasets
299
00:21:50.060 --> 00:21:50.460
that are,
300
00:21:51.340 --> 00:21:52.560
that have a great variety.
301
00:21:53.260 --> 00:21:57.745
And then in terms of the actual collection of the data, how are you handling
302
00:21:58.205 --> 00:21:59.825
getting it from the turbines?
303
00:22:00.285 --> 00:22:01.985
And how much of the information
304
00:22:02.285 --> 00:22:04.145
are you processing or filtering
305
00:22:05.150 --> 00:22:17.585
on the collection point versus how much you're bringing back into your core service layer for being able to do more aggregate analysis across multiple turbines? Yeah. Yeah. I think in general, you can ask yourself as a data scientist
306
00:22:18.285 --> 00:22:19.745
data science company,
307
00:22:20.285 --> 00:22:22.304
what if you if you delete data,
308
00:22:22.730 --> 00:22:29.870
you delete information. So if you say, okay, I don't trust this data point because it has 1,000 degrees Celsius
309
00:22:30.325 --> 00:22:32.985
outside air temperature. And you can ask yourself, okay,
310
00:22:33.765 --> 00:22:40.830
why is that so? Is it because of the because of the real turbine control system or maybe it's a sensor.
311
00:22:41.130 --> 00:22:42.029
Maybe it is
312
00:22:42.490 --> 00:22:49.789
a calculation error during data collection, and you wanna know that because maybe that's the problem that the turbine has.
313
00:22:50.115 --> 00:22:52.534
Maybe it's a sensor. Maybe the temperature sensor
314
00:22:52.835 --> 00:22:57.255
gives you weird values, and because of that, the turbine is shutting down. So
315
00:22:57.559 --> 00:23:00.679
you need to be with data cleaning, you need to be quite
316
00:23:01.640 --> 00:23:02.700
that's a big point.
317
00:23:03.240 --> 00:23:04.860
If you wanna throw away data,
318
00:23:06.385 --> 00:23:09.665
so what we basically do, we we mark data as as,
319
00:23:10.385 --> 00:23:13.525
not trustable, let's say, and then we can later
320
00:23:13.825 --> 00:23:14.325
reanalyze,
321
00:23:15.060 --> 00:23:17.800
how maybe maybe that's because there's a sensor
322
00:23:18.180 --> 00:23:18.680
error.
323
00:23:19.220 --> 00:23:25.634
And so yeah. So we basically get everything that we that we can get, and then later we we flag data
324
00:23:25.934 --> 00:23:28.355
to be trustworthy or not. And,
325
00:23:29.215 --> 00:23:31.475
so to answer your question, I think the most
326
00:23:31.799 --> 00:23:34.860
preparation of the data is done on the database
327
00:23:35.559 --> 00:23:36.460
that we have.
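The flag-instead-of-delete approach can be sketched as follows. The two checks mirror the examples from the conversation, an impossible outside temperature and a value that stays constant for too long. The thresholds, flag names, and the `Reading` type are assumptions made for illustration, not Turbit's actual rules.

```python
from dataclasses import dataclass, field


@dataclass
class Reading:
    outside_temp_c: float
    flags: list = field(default_factory=list)  # reasons this point is not trusted


def flag_readings(readings, min_c=-50.0, max_c=60.0, stuck_run=4):
    """Mark suspicious readings in place; nothing is ever removed."""
    run_start = 0
    for i, reading in enumerate(readings):
        # Physically implausible value, e.g. 1,000 degrees outside air temperature.
        if not (min_c <= reading.outside_temp_c <= max_c):
            reading.flags.append("out_of_range")
        # Track how long the exact same value has been repeating.
        if i > 0 and reading.outside_temp_c != readings[i - 1].outside_temp_c:
            run_start = i
        if i - run_start + 1 >= stuck_run:
            for stuck in readings[run_start : i + 1]:
                if "stuck_value" not in stuck.flags:
                    stuck.flags.append("stuck_value")
    return readings


raw = (18.2, 1000.0, 18.4, -10.0, -10.0, -10.0, -10.0, 19.0)
readings = flag_readings([Reading(t) for t in raw])
trusted = [r for r in readings if not r.flags]

for r in readings:
    print(r.outside_temp_c, r.flags or "ok")
```

Because the flagged points stay in the dataset, they can be reanalyzed later, for example to check whether a faulty temperature sensor is the reason a turbine keeps shutting down.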
328
00:23:39.294 --> 00:23:42.755
Today's episode of the Data Engineering Podcast is sponsored by Datadog,
329
00:23:43.135 --> 00:23:46.835
a SaaS based monitoring and analytics platform for cloud scale infrastructure,
330
00:23:47.400 --> 00:23:49.180
applications, logs, and more.
331
00:23:49.800 --> 00:24:00.855
Datadog uses machine learning based algorithms to detect errors and anomalies across your entire stack, which reduces the time it takes to detect and address outages and helps promote collaboration between data engineering,
332
00:24:01.394 --> 00:24:03.174
operations, and the rest of the company.
333
00:24:03.559 --> 00:24:05.179
Go to dataengineeringpodcast.com/datadog
334
00:24:06.759 --> 00:24:13.305
today to start your free 14 day trial. And if you start a trial and install Datadog's agent, they'll send you a free T-shirt.
335
00:24:15.385 --> 00:24:19.885
And then as far as the overall system architecture of Turbit,
336
00:24:20.185 --> 00:24:28.500
how have you designed the overall pipeline of being able to go from collection of that remote data at each of the individual turbines
337
00:24:28.800 --> 00:24:30.020
into your central
338
00:24:33.035 --> 00:24:36.895
dashboarding and analysis for your customers and just the overall
339
00:24:37.275 --> 00:24:45.540
life cycle of data as it propagates from the control systems in the turbine through to the analysis that you're delivering to your customers?
340
00:24:46.855 --> 00:24:47.355
Mhmm.
341
00:24:49.095 --> 00:24:50.315
So, basically, we get
342
00:24:50.695 --> 00:24:51.355
the data,
343
00:24:51.975 --> 00:24:53.435
in different time periods,
344
00:24:54.009 --> 00:24:55.950
sometimes in real time, sometimes
345
00:24:57.289 --> 00:25:03.945
every hour, sometimes every day. It depends on on on the customer and whatever the customer has set up in his turbine.
346
00:25:05.365 --> 00:25:05.525
And,
347
00:25:06.405 --> 00:25:07.785
then this data is,
348
00:25:08.725 --> 00:25:11.545
is logged into the database or written into the database.
349
00:25:12.299 --> 00:25:17.760
And then we have different jobs running on the database, cleaning the data, flagging data,
350
00:25:18.380 --> 00:25:18.860
and,
351
00:25:19.419 --> 00:25:22.175
we have jobs that that train the models
352
00:25:22.555 --> 00:25:24.175
that then yeah.
353
00:25:24.475 --> 00:25:25.995
Then jobs that that,
354
00:25:26.635 --> 00:25:28.415
generate simulation data.
355
00:25:28.960 --> 00:25:29.460
Then
356
00:25:30.800 --> 00:25:32.980
we we compare the data with,
357
00:25:33.760 --> 00:25:38.260
so so the simulated data with the real measured data,
358
00:25:39.155 --> 00:25:43.095
then we can detect, we have jobs that detect abnormalities
359
00:25:44.115 --> 00:25:45.015
in these datasets.
360
00:25:45.635 --> 00:25:47.095
And then finally,
361
00:25:48.160 --> 00:25:50.740
one has to ask oneself, okay.
362
00:25:51.360 --> 00:25:53.140
What is really the value to the customer?
363
00:25:54.160 --> 00:26:00.815
Is it detecting abnormalities, or is it detecting an error? And what does it mean detecting error? Like,
364
00:26:01.275 --> 00:26:04.015
in in the best case, it is something like
365
00:26:04.520 --> 00:26:08.140
a real action point that you can give to your customer, for instance. Okay.
366
00:26:09.000 --> 00:26:09.980
Gearbox temperature
367
00:26:10.280 --> 00:26:12.860
has been too high for the past 2 months,
368
00:26:13.205 --> 00:26:14.105
So you better
369
00:26:14.965 --> 00:26:22.860
send out the service team to check why that is, or maybe you can even tell the customer why, the temperature is so high.
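One way to turn raw abnormalities into an action point like the gearbox example is to alert only when a deviation persists. The sketch below assumes a daily series of gearbox temperature residuals (measured minus expected, in degrees Celsius). The 5 degree limit, the 30 day minimum, and the data are hypothetical.

```python
def sustained_exceedance(values, limit, min_run):
    """Return (start, end) index pairs where values stay above `limit` for at least `min_run` samples."""
    runs = []
    start = None
    for i, value in enumerate(values):
        if value > limit:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    # Close a run that is still open at the end of the series.
    if start is not None and len(values) - start >= min_run:
        runs.append((start, len(values) - 1))
    return runs


# One-day spike (ignored), then about two months of elevated temperature (reported).
daily_gearbox_residual_c = [0.5, 6.0, 0.2] + [5.5] * 60 + [0.3]
runs = sustained_exceedance(daily_gearbox_residual_c, limit=5.0, min_run=30)

for first_day, last_day in runs:
    days = last_day - first_day + 1
    print(f"Gearbox temperature above expected for {days} days: send the service team to check.")
```

Requiring persistence filters out single-day spikes, so the customer receives one concrete message instead of a stream of isolated anomalies.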
370
00:26:23.820 --> 00:26:24.320
And
371
00:26:24.780 --> 00:26:26.880
this last part, I think, is the most
372
00:26:27.500 --> 00:26:28.480
important part
373
00:26:29.020 --> 00:26:30.400
because there you really
374
00:26:30.860 --> 00:26:31.760
need to understand
375
00:26:32.425 --> 00:26:32.925
the
376
00:26:33.385 --> 00:26:35.725
your customers. You really need to understand
377
00:26:36.185 --> 00:26:38.425
what's the problem that you're really solving. And,
378
00:26:38.905 --> 00:26:39.405
that's
379
00:26:40.140 --> 00:26:41.120
I think, also,
380
00:26:41.980 --> 00:26:44.320
as a data scientist, sometimes you need to
381
00:26:45.100 --> 00:26:47.355
maybe focus more on your customers
382
00:26:47.915 --> 00:26:52.575
than on what you generate as datasets. And,
383
00:26:53.674 --> 00:26:54.174
yeah,
384
00:26:55.150 --> 00:26:58.370
you really need to understand what you are delivering to your customer.
385
00:26:58.670 --> 00:27:00.130
And on that point too,
386
00:27:00.510 --> 00:27:01.650
how much of
387
00:27:02.005 --> 00:27:09.705
a feedback cycle are you able to build with the Turbit system as far as being able to determine some of these
388
00:27:10.210 --> 00:27:11.030
turbine misalignments,
389
00:27:11.330 --> 00:27:16.710
are you able to then feed that back into the turbine itself to be able to automate some of that correction?
390
00:27:17.144 --> 00:27:30.880
Or does it require generating a notification to your customer who's managing the turbine and the wind farms to then be able to do their own maintenance or operations as far as bringing the turbines into alignment and things like that?
391
00:27:31.660 --> 00:27:32.560
Yeah. So
392
00:27:33.100 --> 00:27:34.320
if you generate
393
00:27:35.245 --> 00:27:38.225
some action points for your customers, they basically
394
00:27:38.924 --> 00:27:42.865
get an email or a pop up message in our web tool or
395
00:27:43.740 --> 00:27:45.600
in the app, and then,
396
00:27:46.300 --> 00:27:53.440
they can understand, okay, I have something to solve here. Then they can put it in their own schedule
397
00:27:54.525 --> 00:27:57.185
and solve the problem. And after that,
398
00:27:57.565 --> 00:27:58.945
they can
399
00:27:59.325 --> 00:28:00.305
give us feedback.
400
00:28:00.925 --> 00:28:05.570
So there are some basic questions like how helpful has this been to you or
401
00:28:06.190 --> 00:28:09.409
how relevant has this been to you so that in the next iteration,
402
00:28:10.055 --> 00:28:10.875
we can then,
403
00:28:12.055 --> 00:28:15.115
flag these detected events and understand,
404
00:28:15.815 --> 00:28:16.315
okay,
405
00:28:16.775 --> 00:28:22.690
if we show this kind of error, how relevant was it to the customer or how good were we with the prediction
406
00:28:23.150 --> 00:28:25.010
so that we can then improve
407
00:28:25.715 --> 00:28:27.415
the way we do stuff
408
00:28:27.955 --> 00:28:33.175
or use that labeled data to retrain other neural networks to do some optimizations.
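Editor's note: the feedback loop described here, where customers rate how relevant each notification was, could be summarized per event type roughly as follows. This is a sketch only. The function name, the event type names, and the pair-based input format are hypothetical.

```python
from collections import defaultdict


def relevance_by_type(feedback):
    """Compute the share of notifications rated relevant, per event type.

    `feedback` is an iterable of (event_type, was_relevant) pairs, a
    hypothetical shape for the customer answers described in the episode.
    """
    counts = defaultdict(lambda: [0, 0])  # event_type -> [relevant, total]
    for event_type, was_relevant in feedback:
        counts[event_type][1] += 1
        if was_relevant:
            counts[event_type][0] += 1
    return {t: relevant / total for t, (relevant, total) in counts.items()}


answers = [
    ("gearbox_temperature", True),
    ("gearbox_temperature", True),
    ("yaw_misalignment", False),
    ("yaw_misalignment", True),
]
scores = relevance_by_type(answers)
```

The same pairs could also serve as labels when retraining a model, which is the second use mentioned above.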
409
00:28:34.010 --> 00:28:39.230
And then the other question too, as far as being able to build useful notifications
410
00:28:39.690 --> 00:28:42.270
is having the necessary domain knowledge
411
00:28:42.855 --> 00:28:49.274
of how the turbines work and the atmospheric conditions that contribute to different performance outcomes.
412
00:28:49.870 --> 00:28:53.730
And I know that you mentioned that you have some background of doing
413
00:28:54.030 --> 00:29:05.434
research and working with turbines. But what are some of the other ways that you're incorporating some of that domain knowledge into your product to ensure that you're able to provide the most value to your customers?
414
00:29:06.050 --> 00:29:07.590
Yeah. I think there are some
415
00:29:08.370 --> 00:29:11.030
things that you really need to know,
416
00:29:11.650 --> 00:29:12.790
that you have to learn
417
00:29:13.170 --> 00:29:14.630
also as a data scientist.
418
00:29:15.585 --> 00:29:16.085
Like,
419
00:29:16.465 --> 00:29:19.765
let's just give an example. Like, if you don't know that the turbine
420
00:29:20.705 --> 00:29:22.485
has a limit,
421
00:29:23.585 --> 00:29:28.230
limited power output. So if there's a lot of wind, the turbine will never produce
422
00:29:29.090 --> 00:29:31.429
more power than, let's say, 3 megawatts.
423
00:29:32.394 --> 00:29:41.455
And it depends on the manufacturer and turbine type. And if you don't know that, then you might think, oh, maybe the turbine is not generating enough
424
00:29:41.800 --> 00:29:42.300
power
425
00:29:42.720 --> 00:29:43.220
and
426
00:29:43.640 --> 00:29:47.580
this is like just an example of some domain knowledge that you
427
00:29:47.880 --> 00:29:52.075
need to know in order to train the networks correctly and to
428
00:29:52.375 --> 00:29:54.635
draw the right conclusions from your
429
00:29:55.015 --> 00:29:57.035
data and data analytics.
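Editor's note: the rated-power example can be made concrete with a simplified power curve. The sketch below is an illustration and not a real turbine model. The 3 MW rating comes from the episode, while the cubic ramp, the cut-in, rated, and cut-out wind speeds, and the 10% tolerance are assumptions chosen for readability.

```python
def expected_power_kw(wind_speed, rated_kw=3000.0, cut_in=3.0,
                      rated_speed=12.0, cut_out=25.0):
    """Simplified expected output in kW for a wind speed in m/s.

    Assumptions: zero output below cut-in and above cut-out, a cubic
    ramp between cut-in and rated speed, and a flat cap at rated power.
    """
    if wind_speed < cut_in or wind_speed > cut_out:
        return 0.0
    if wind_speed >= rated_speed:
        return rated_kw  # more wind does not mean more power here
    fraction = (wind_speed**3 - cut_in**3) / (rated_speed**3 - cut_in**3)
    return rated_kw * fraction


def is_underperforming(wind_speed, measured_kw, tolerance=0.1):
    """Flag output more than `tolerance` below the capped expectation."""
    return measured_kw < (1.0 - tolerance) * expected_power_kw(wind_speed)
```

Without the cap, a naive cubic extrapolation at 20 m/s would expect far more than 3 MW and wrongly flag a healthy turbine as underperforming.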
430
00:29:57.440 --> 00:30:00.500
And then sometimes there's also stuff that you
431
00:30:01.919 --> 00:30:04.580
or problems that you cannot really know
432
00:30:05.025 --> 00:30:08.485
if you don't have 20 years of experience as a turbine technician.
433
00:30:08.785 --> 00:30:13.685
And in these cases, we just have a network of other companies that we work with,
434
00:30:14.309 --> 00:30:14.809
and,
435
00:30:15.190 --> 00:30:16.170
we can then
436
00:30:16.550 --> 00:30:20.630
give them that problem, and then they can analyze it. And,
437
00:30:21.429 --> 00:30:22.730
together with our customers,
438
00:30:23.325 --> 00:30:24.305
they then can
439
00:30:24.845 --> 00:30:25.665
make the
440
00:30:26.285 --> 00:30:29.985
decisions about what to do next with that kind of very special problem.
441
00:30:30.340 --> 00:30:31.640
And then as far as
442
00:30:32.020 --> 00:30:41.155
the work that you're doing to build out this product, what are you finding to be some of the most challenging aspects of building an analytics solution for the wind energy sector?
443
00:30:41.455 --> 00:30:42.115
I think
444
00:30:42.495 --> 00:30:45.715
handling so many different data sources is the
445
00:30:47.549 --> 00:30:50.690
was and is the biggest problem.
446
00:30:51.549 --> 00:30:54.505
And the second is the quality
447
00:30:55.445 --> 00:30:56.265
of your data.
448
00:30:56.885 --> 00:30:57.284
And,
449
00:30:57.845 --> 00:31:00.325
you really want to make your
450
00:31:01.830 --> 00:31:03.930
you want to build a data lake
451
00:31:04.310 --> 00:31:04.810
and
452
00:31:06.790 --> 00:31:08.490
not a data sump.
453
00:31:09.510 --> 00:31:11.465
I don't know if that's a correct word. But,
454
00:31:12.265 --> 00:31:14.924
yeah, you wanna have a good data pool,
455
00:31:15.225 --> 00:31:19.164
and that's really hard with so many different data sources
456
00:31:19.950 --> 00:31:24.370
that you cannot really trust. And then maybe another thing: making things scalable
457
00:31:25.070 --> 00:31:30.665
is a hard thing. You have different connections to many different turbines,
458
00:31:31.445 --> 00:31:35.465
and Internet connections are breaking down very often. And
459
00:31:36.485 --> 00:31:37.679
this is, like, really
460
00:31:37.980 --> 00:31:38.799
a huge problem.
461
00:31:39.260 --> 00:31:47.285
And are there any particular technologies that you've been able to lean on to help with some of that scalability problem in terms of being able to
462
00:31:47.825 --> 00:31:55.110
handle the data collection and ensure that you're able to get reliable throughput? Yeah. We're working together with a
463
00:31:55.510 --> 00:31:57.370
company called Swarm64.
464
00:31:58.070 --> 00:31:58.970
And they
465
00:31:59.350 --> 00:31:59.850
basically
466
00:32:00.815 --> 00:32:02.514
enabled us to handle
467
00:32:03.054 --> 00:32:10.195
a lot of data in real time. And with real time, I really mean real time, like 1-second or sub-second values.
468
00:32:10.720 --> 00:32:11.220
And,
469
00:32:12.080 --> 00:32:13.460
they help us
470
00:32:13.920 --> 00:32:18.100
to solve that scalability problem if you get more data
471
00:32:18.404 --> 00:32:24.825
and even so much data that you cannot handle it with usual databases any longer.
472
00:32:25.205 --> 00:32:27.065
And what we also want to
473
00:32:27.730 --> 00:32:31.270
achieve, we want to give feedback to the turbine
474
00:32:31.570 --> 00:32:32.950
in real time. And
475
00:32:33.650 --> 00:32:35.030
for instance, that could be
476
00:32:35.524 --> 00:32:40.424
that you have one turbine standing at the front of the wind park, and it's getting a wind gust.
477
00:32:40.804 --> 00:32:41.205
And,
478
00:32:42.005 --> 00:32:55.025
that gust is moving through the wind park. And then the first turbine is telling the other turbines, okay, there's a wind gust coming, and you had better behave like this or like that. And this information is then sent back to Turbit.
479
00:32:55.965 --> 00:33:05.500
The algorithms are giving the best way to yaw and pitch the other turbines in the wind park, and all that is happening in real time. And,
480
00:33:05.900 --> 00:33:16.535
for that, you really need to handle a lot of data very fast. And for cases where you have maybe some sort of weather system coming through an area, are you then also able
481
00:33:17.350 --> 00:33:28.555
to feed that information to other installations of turbines that might be in the path of the weather event in terms of being able to improve their energy output or,
482
00:33:29.175 --> 00:33:29.675
maybe
483
00:33:30.135 --> 00:33:33.140
throttle them to prevent potential damage if there are
484
00:33:33.679 --> 00:33:35.780
especially high wind gusts or things like that?
485
00:33:36.640 --> 00:33:37.860
Yeah. Of course. Like,
486
00:33:38.400 --> 00:33:40.340
if there's a very momentary
487
00:33:40.880 --> 00:33:44.745
wind gust coming to the wind park, you could potentially do that.
488
00:33:45.605 --> 00:33:48.745
If there's a huge weather system coming,
489
00:33:49.445 --> 00:33:50.585
that's mainly part,
490
00:33:51.470 --> 00:33:52.290
of the
491
00:33:52.910 --> 00:33:55.570
that's mainly the job of the grid operators,
492
00:33:56.750 --> 00:33:57.250
or
493
00:33:57.630 --> 00:33:59.570
yeah. Mainly that because
494
00:34:00.014 --> 00:34:10.540
they need to shut down some turbines in advance because they know, okay, we are gonna produce a lot of energy, and that's too much for the grid. So we'd better shut down some of the turbines.
495
00:34:11.080 --> 00:34:13.820
And that's actually happening quite often in Germany,
496
00:34:14.335 --> 00:34:15.855
especially in the north and
497
00:34:16.255 --> 00:34:17.235
at the seaside.
498
00:34:18.175 --> 00:34:22.835
There are some turbines that are shut off 50% of the time, and nobody's using
499
00:34:23.135 --> 00:34:27.299
the energy that the turbine could potentially generate during these times.
500
00:34:27.760 --> 00:34:30.819
That's another interesting aspect to this system is
501
00:34:31.200 --> 00:34:33.539
the energy storage and energy distribution
502
00:34:34.465 --> 00:34:35.685
capability. I'm wondering
503
00:34:36.225 --> 00:34:42.645
how that factors into some of the decision making that you provide to the turbine operators as far as
504
00:34:43.190 --> 00:34:45.210
ways to ensure that they aren't
505
00:34:45.670 --> 00:34:49.770
generating excess energy that's going to just get dumped or generating
506
00:34:50.295 --> 00:35:03.569
excess energy that is going to potentially overload their grids or storage systems and ways that you're able to maybe bring that information into the overall equation or some of the other external data sources that you're able to rely on to feed into your models.
507
00:35:04.109 --> 00:35:04.690
I mean,
508
00:35:05.150 --> 00:35:05.650
yeah.
509
00:35:06.190 --> 00:35:09.535
You're right. It's quite interesting, and there's tons of
510
00:35:09.995 --> 00:35:13.055
topics and problems that you could potentially solve.
511
00:35:14.715 --> 00:35:18.810
This particular problem that you're mentioning right now, we're not solving at the moment.
512
00:35:20.310 --> 00:35:25.145
As far as I know, there are other companies around that do that, that have specialized
513
00:35:26.325 --> 00:35:26.985
in predicting
514
00:35:28.005 --> 00:35:34.025
weather in the future and predicting the power output for the grid operators, and then you can
515
00:35:34.569 --> 00:35:35.069
trade
516
00:35:35.450 --> 00:35:37.549
in the day-ahead auctions for electricity.
517
00:35:38.569 --> 00:35:39.790
And that's
518
00:35:40.809 --> 00:35:43.470
totally another problem that you wanna solve there.
519
00:35:43.775 --> 00:35:48.115
And for us, it's more the operation of the turbine. And
520
00:35:48.575 --> 00:35:50.595
if you have the turbine running,
521
00:35:50.895 --> 00:35:53.710
let it run the best way it can. And,
522
00:35:54.570 --> 00:35:55.390
I mean, yeah,
523
00:35:55.849 --> 00:35:58.270
I see a lot of potential in analyzing
524
00:35:59.050 --> 00:36:02.015
also that kind of data. And we're also
525
00:36:02.555 --> 00:36:03.855
getting weather information,
526
00:36:05.115 --> 00:36:07.694
data from a third-party source.
527
00:36:08.474 --> 00:36:08.974
But
528
00:36:09.470 --> 00:36:18.610
this is more because we wanna understand the operation of the turbine better and make the operation and the service maintenance of that turbine better.
529
00:36:19.214 --> 00:36:21.954
And then as far as your overall experience
530
00:36:22.415 --> 00:36:32.440
of building out Turbit Systems, both from the technical and business aspects, what have you found to be some of the most interesting or unexpected or challenging lessons learned in the process?
531
00:36:33.380 --> 00:36:34.120
Yeah. I think
532
00:36:35.065 --> 00:36:37.005
talking about what I said earlier,
533
00:36:37.545 --> 00:36:40.605
like, in the last question, I think it's focus,
534
00:36:40.984 --> 00:36:43.165
especially when you're starting a company.
535
00:36:43.600 --> 00:36:46.420
It's quite hard to also as a scientist,
536
00:36:47.120 --> 00:36:57.595
you have so many ideas and you know that so many things could potentially work out. But in order to bring something to the market, you really need to focus and you really need to understand
537
00:36:58.375 --> 00:37:05.720
what problem you're solving. And you need to concentrate on maybe one problem first and do that the best way you can.
538
00:37:06.020 --> 00:37:08.120
And then later on, you can add
539
00:37:08.465 --> 00:37:08.965
more
540
00:37:09.265 --> 00:37:10.645
problems that you solve.
541
00:37:11.025 --> 00:37:15.930
I think that was the biggest lesson of the past years. Yeah.
542
00:37:16.650 --> 00:37:41.690
And as you look toward the near to medium term of what you're building out both technically and in the business, what are some of the things that you have planned that you're most excited about or overall trends in the energy sector or technology capabilities that you're looking forward to trying to incorporate or take advantage of? Yeah. I'm basically very excited about how many other problems there are in this data that you could potentially
543
00:37:42.525 --> 00:37:43.345
solve. And
544
00:37:43.724 --> 00:37:44.944
the more we are growing,
545
00:37:45.565 --> 00:37:49.905
and we are able to handle and to manage
546
00:37:50.270 --> 00:37:51.170
all these different
547
00:37:51.470 --> 00:37:51.970
problems,
548
00:37:52.830 --> 00:37:55.010
the more I'm looking forward to it because,
549
00:37:55.470 --> 00:37:56.070
yeah, this
550
00:37:56.510 --> 00:37:58.050
it's really fun.
551
00:37:58.885 --> 00:38:00.345
And basically, it
552
00:38:00.964 --> 00:38:02.425
is really the real time
553
00:38:03.045 --> 00:38:09.920
control algorithms for the turbine that fascinate me the most. And I think there's great potential
554
00:38:10.380 --> 00:38:12.800
in the real-time operation of the turbines.
555
00:38:13.420 --> 00:38:14.560
But sometimes it's
556
00:38:14.955 --> 00:38:15.855
sometimes it's
557
00:38:16.875 --> 00:38:21.355
the basic things that give the most
558
00:38:21.755 --> 00:38:22.255
value.
559
00:38:23.060 --> 00:38:23.540
And,
560
00:38:24.020 --> 00:38:25.560
it's sometimes technically
561
00:38:25.860 --> 00:38:27.480
not so fancy, but,
562
00:38:28.020 --> 00:38:32.040
you're just solving a basic problem, and that has a great value for your customers.
563
00:38:32.835 --> 00:38:33.335
And,
564
00:38:34.115 --> 00:38:35.015
sometimes that's
565
00:38:35.555 --> 00:38:37.095
the better thing.
566
00:38:37.555 --> 00:38:55.015
Well, for anybody who wants to get in touch with you or follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And then as a final question, I would just like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today. I think the biggest gap is really the
567
00:38:55.474 --> 00:38:56.214
handling
568
00:38:56.595 --> 00:38:59.880
of how to clean the data. So you have
569
00:39:00.420 --> 00:39:02.760
great packages like, let's say, TensorFlow
570
00:39:03.380 --> 00:39:07.815
with which you can train models easily and you can
571
00:39:08.595 --> 00:39:11.235
do almost everything with that. But there's not
572
00:39:12.195 --> 00:39:16.350
I don't know if it's possible, but there's nothing like a general solution
573
00:39:16.650 --> 00:39:18.430
for cleaning datasets.
574
00:39:18.970 --> 00:39:23.470
I wish there were some sort of solution for that.
575
00:39:24.635 --> 00:39:27.994
And maybe it doesn't exist because it's too complicated. But,
576
00:39:28.474 --> 00:39:31.695
I would be super happy if there's a package that does that for you.
577
00:39:33.250 --> 00:39:36.470
Yeah. I'm sure that plenty of people would be happy to see that as well.
578
00:39:36.770 --> 00:39:37.270
Yeah.
579
00:39:38.450 --> 00:39:55.895
Alright. Well, thank you very much for taking the time today to join me and discuss the work that you've been doing with Turbit Systems. It's definitely a very interesting problem domain and an interesting technical solution that you're building for it. So I appreciate all the time and energy you've put into that, and I hope you enjoy the rest of your day. Thank you very much. I enjoyed it.
580
00:40:01.315 --> 00:40:04.560
Thank you for listening. Don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com
581
00:40:07.100 --> 00:40:11.355
to learn about the Python language, its community, and the innovative ways it is being used.
582
00:40:11.755 --> 00:40:13.135
And visit the site at dataengineeringpodcast.com
583
00:40:14.475 --> 00:40:23.920
to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com
584
00:40:24.460 --> 00:40:29.760
with your story. And to help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Disclaimer: The podcast and artwork embedded on this page are from Tobias Macey, which is the property of its owner and not affiliated with or endorsed by Listen Notes, Inc.