Introduction

Many people are pursuing data science as a career (to become a data scientist) choice these days. With the recent data deluge, companies are voraciously headhunting people who can handle, understand, analyze, and model data.

Be it college graduates or experienced professionals, everyone is busy searching for the best courses or training material to become a data scientist. Some of them even manage to learn Python or R, but still can't land their first analytics job!

What most people fail to understand is that the data science/analytics industry isn't just limited to using Python or R. There are several other coding languages which companies use to run their businesses.

Among all, the most important and widely used language is SQL (Structured Query Language). You must learn it.

I've realized that, as a newbie, learning SQL is somewhat difficult at home. After all, setting up a server enabled database engine isn't everybody's cup of tea. Isn't it? Don't you worry.

In this article, we'll learn all about SQL and how to write its queries.

Note: This article is meant to help R users who wants to learn SQL from scratch. Even if you are new to R, you can still check out this tutorial as the ultimate motive is to learn SQL here.

Why learn SQL ?
What is SQL?
Getting Started with SQL
- Data Selection
- Data Manipulation
- Strings & Dates
Practising SQL in R

Machine learning challenge, ML challenge

Why learn SQL ?

Good question! When I started learning SQL, I asked this question too. Though, I had no one to answer me. So, I decided to find it out myself.

SQL is the de facto standard programming language used to handle relational databases.

Let's look at the dominance / popularity of SQL in worldwide analytics / data science industry. According to an online survey conducted by Oreilly Media in 2016, it was found that among all the programming languages, SQL was used by 70% of the respondents followed by R and Python. It was also discovered that people who know Excel (Spreadsheet) tend to get significant salary boost once they learn SQL.

Also, according to a survey done by datasciencecentral, it was inferred that R users tend to get a nice salary boost once they learn SQL. In a way, SQL as a language is meant to complement your current set of skills.

Since 1970, SQL has remained an integral part of popular databases such as Oracle, IBM DB2, Microsoft SQL Server, MySQL, etc. Not only learning SQL with R will increase your employability, but SQL itself can make way for you in database management roles.

What is SQL ?

SQL (Structured Query Language) is a special purpose programming language used to manage, extract, and aggregate data stored in large relational database management systems.

In simple words, think of a large machine (rectangular shape) consisting of many, many boxes (again rectangles). Each box comprises a table (dataset). This is a database. A database is an organized collection of data. Now, this database understands only one language, i.e, SQL. No English, Japanese, or Spanish. Just SQL. Therefore, SQL is a language which interacts with the databases to retrieve data.

Following are some important features of SQL:

It allows us to create, update, retrieve, and delete data from the database.
It works with popular database programs such as Oracle, DB2, SQL Server, etc.
As the databases store humongous amounts of data, SQL is widely known for it speed and efficiency.
It is very simple and easy to learn.
It is enabled with inbuilt string and date functions to execute data-time conversions.

Currently, businesses worldwide use both open source and proprietary relational database management systems (RDBMS) built around SQL.

Getting Started with SQL

Let's try to understand SQL commands now. Most of these commands are extremely easy to pick up as they are simple "English words." But make sure you get a proper understanding of their meanings and usage in SQL context. For your ease of understanding, I've categorized the SQL commands in three sections:

Data Selection - These are SQL's indigenous commands used to retrieve tables from databases supported by logical statements.
Data Manipulation - These commands would allow you to join and generate insights from data.
Strings and Dates - These special commands would allow you to work diligently with dates and string variables.

Before we start, you must know that SQL functions recognize majorly four data types. These are:

Integers - This datatype is assigned to variables storing whole numbers, no decimals. For example, 123,324,90,10,1, etc.
Boolean - This datatype is assigned to variables storing TRUE or FALSE data.
Numeric - This datatype is assigned to variables storing decimal numbers. Internally, it is stored as a double precision. It can store up to 15 -17 significant digits.
Date/Time - This datatype is assigned to variables storing data-time information. Internally, it is stored as a time stamp.

That's all! If SQL finds a variable whose type is anything other than these four, it will throw read errors. For example, if a variable has numbers with a comma (like 432,), you'll get errors. SQL as a language is very particular about the sequence of commands given. If the sequence is not followed, it starts to throw errors. Don't worry I've defined the sequence below. Let's learn the commands. In the following section, we'll learn to use them with a data set.

Data Selection

SELECT - It tells you which columns to select.
FROM - It tells you columns to be selected should be from which table (dataset).
LIMIT - By default, a command is executed on all rows in a table. This command limits the number of rows. Limiting the rows leads to faster execution of commands.
WHERE - This command specifies a filter condition; i.e., the data retrieval has to be done based on some variable filtering.
Comparison Operators - Everyone knows these operators as (=, !=, <, >, <=, >=). They are used in conjunction with the WHERE command.
Logical Operators - The famous logical operators (AND, OR, NOT) are also used to specify multiple filtering conditions. Other operators include:
- LIKE - It is used to extract similar values and not exact values.
- IN - It is used to specify the list of values to extract or leave out from a variable.
- BETWEEN - It activates a condition based on variable(s) in the table.
- IS NULL - It allows you to extract data without missing values from the specified column.
ORDER BY - It is used to order a variable in descending or ascending order.

Data Manipulation

Aggregate Functions - These functions are helpful in generating quick insights from data sets.
- COUNT - It counts the number of observations.
- SUM - It calculates the sum of observations.
- MIN/MAX - It calculates the min/max and the range of a numerical distribution.
- AVG - It calculates the average (mean).
GROUP BY - For categorical variables, it calculates the above stats based on their unique levels.
HAVING - Mostly used for strings to specify a particular string or combination while retrieving data.
DISTINCT - It returns the unique number of observations.
CASE - It is used to create rules using if/else conditions.
JOINS - Used to merge individual tables. It can implement:
- INNER JOIN - Returns the common rows from A and B based on joining criteria.
- OUTER JOIN - Returns the rows not common to A and B.
- LEFT JOIN - Returns the rows in A but not in B.
- RIGHT JOIN - Returns the rows in B but not in A.
- FULL OUTER JOIN - Returns all rows from both tables, often with NULLs.
ON - Used to specify a column for filtering while joining tables.
UNION - Similar to rbind() in R. Combines two tables with identical variable names.

You can write complex join commands using comparison operators, WHERE, or ON to specify conditions.

Strings and Dates

NOW - Returns current time.
LEFT - Returns a specified number of characters from the left in a string.
RIGHT - Returns a specified number of characters from the right in a string.
LENGTH - Returns the length of the string.
TRIM - Removes characters from the beginning and end of the string.
SUBSTR - Extracts part of a string with specified start and end positions.
CONCAT - Combines strings.
UPPER - Converts a string to uppercase.
LOWER - Converts a string to lowercase.
EXTRACT - Extracts date components such as day, month, year, etc.
DATE_TRUNC - Rounds dates to the nearest unit of measurement.
COALESCE - Imputes missing values.

These commands are not case sensitive, but consistency is important. SQL commands follow this standard sequence:

SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY
LIMIT

Practising SQL in R

For writing SQL queries, we'll use the sqldf package. It activates SQL in R using SQLite (default) and can be faster than base R for some manipulations. It also supports H2 Java database, PostgreSQL, and MySQL.

You can easily connect database servers using this package and query data. For more details, check the GitHub repo by its author.

When using SQL in R, think of R as the database machine. Load datasets using read.csv or read.csv.sql and start querying. Ready? Let’s begin! Code every line as you scroll. Practice builds confidence.

We'll use the babynames dataset. Install and load it with:

> install.packages("babynames")
> library(babynames)
> str(babynames)

This dataset contains 1.8 million observations and 5 variables. The prop variable is the proportion of a name given in a year. Now, load the sqldf package:

> install.packages("sqldf")
> library(sqldf)

Let’s check the number of rows in this data.

> sqldf("select count(*) from mydata")
#1825433

Ignore the warnings here. Next, let's look at the data — the first 10 rows:

> sqldf("select * from mydata limit 10")

* selects all columns. To select specific variables:

> sqldf("select year, sex, name from mydata limit 10")

To rename a column in the output using AS:

> sqldf("select year, sex as 'Gender' from mydata limit 10")

Filtering data with WHERE and logical conditions:

> sqldf("select year, name, sex as 'Gender' from mydata where sex == 'F' limit 20")
> sqldf("select * from mydata where prop > 0.05 limit 20")
> sqldf("select * from mydata where sex != 'F'")
> sqldf("select year, name, 4 * prop as 'final_prop' from mydata where prop <= 0.40 limit 10")

Ordering data:

> sqldf("select * from mydata order by year desc limit 20")
> sqldf("select * from mydata order by year desc, n desc limit 20")
> sqldf("select * from mydata order by name limit 20")

Filtering with string patterns:

> sqldf("select * from mydata where name like 'Ben%'")
> sqldf("select * from mydata where name like '%man' limit 30")
> sqldf("select * from mydata where name like '%man%'")
> sqldf("select * from mydata where name in ('Coleman','Benjamin','Bennie')")
> sqldf("select * from mydata where year between 2000 and 2014")

Multiple filters with logical operators:

> sqldf("select * from mydata where year >= 1980 and prop < 0.5")
> sqldf("select * from mydata where year >= 1980 and prop < 0.5 order by prop desc")
> sqldf("select * from mydata where name != '%man%' or year > 2000")
> sqldf("select * from mydata where prop > 0.07 and year not between 2000 and 2014")
> sqldf("select * from mydata where n > 10000 order by name desc")

Basic aggregation:

> sqldf("select sum(n) as 'Total_Count' from mydata")
> sqldf("select min(n), max(n) from mydata")
> sqldf("select year, avg(n) as 'Average' from mydata group by year order by Average desc")
> sqldf("select year, count(*) as count from mydata group by year limit 100")
> sqldf("select year, n, count(*) as 'my_count' from mydata where n > 10000 group by year order by my_count desc limit 100")

Using HAVING instead of WHERE for aggregations:

> sqldf("select year, name, sum(n) as 'my_sum' from mydata group by year having my_sum > 10000 order by my_sum desc limit 100")

Counting distinct names:

> sqldf("select count(distinct name) as 'count_names' from mydata")

Creating new columns using CASE (if/else logic):

> sqldf("select year, n, case when year = '2014' then 'Young' else 'Old' end as 'young_or_old' from mydata limit 10")
> sqldf("select *, case when name != '%man%' then 'Not_a_man' when name = 'Ban%' then 'Born_with_Ban' else 'Un_Ban_Man' end as 'Name_Fun' from mydata")

Joining data sets using a key:

> crash <- read.csv.sql("crashes.csv", sql = "select * from file")
> roads <- read.csv.sql("roads.csv", sql = "select * from file")
> sqldf("select * from crash join roads on crash.Road = roads.Road")
> sqldf("select crash.Year, crash.Volume, roads.* from crash left join roads on crash.Road = roads.Road")

Joining with aggregation and multiple keys:

> sqldf("select crash.Year, crash.Volume, roads.* from crash left join roads on crash.Road = roads.Road order by 1")
> sqldf("select crash.Year, crash.Volume, roads.* from crash left join roads on crash.Road = roads.Road where roads.Road != 'US-36' order by 1")
> sqldf("select Road, avg(roads.Length) as 'Avg_Length', avg(N_Crashes) as 'Avg_Crash' from roads join crash using (Road) group by Road")
> roads$Year <- crash$Year[1:5]
> sqldf("select crash.Year, crash.Volume, roads.* from crash left join roads on crash.Road = roads.Road and crash.Year = roads.Year order by 1")

String operations in sqldf with RSQLite extension:

> library(RSQLite)
> help("initExtension")

> sqldf("select name, leftstr(name, 3) as 'First_3' from mydata order by First_3 desc limit 100")
> sqldf("select name, reverse(name) as 'Rev_Name' from mydata limit 100")
> sqldf("select name, rightstr(name, 3) as 'Back_3' from mydata order by First_3 desc limit 100")

Summary

The aim of this article was to help you get started writing queries in SQL using a blend of practical and theoretical explanations. Beyond these queries, SQL also allows you to write subqueries aka nested queries to execute multiple commands in one go. We shall learn about those in future tutorials.

As I said above, learning SQL will not only give you a fatter paycheck but also allow you to seek job profiles other than that of a data scientist. As I always say, SQL is easy to learn but difficult to master. Do practice enough.

In this article, we learned the basics of SQL. We learned about data selection, aggregation, and string manipulation commands in SQL. In addition, we also looked at the industry trend of SQL language to infer if that's the programming language you will promise to learn in your new year resolution. So, will you?

If you get stuck with any query written above, do drop in your suggestions, questions, and feedback in comments below!

I Used AI to Build a "Simple Image Carousel" at VibeCodeArena. It Found 15+ Issues and Taught Me How to Fix Them.

‍

My Learning Journey

I wanted to understand what separates working code from good code. So I used VibeCodeArena.ai to pick a problem statement where different LLMs produce code for the same prompt. Upon landing on the main page of VibeCodeArena, I could see different challenges. Since I was interested in an Image carousal application, I picked the challenge with the prompt "Make a simple image carousel that lets users click 'next' and 'previous' buttons to cycle through images."

‍

Within seconds, I had code from multiple LLMs, including DeepSeek, Mistral, GPT, and Llama. Each code sample also had an objective evaluation score. I was pleasantly surprised to see so many solutions for the same problem. I picked gpt-oss-20b model from OpenAI. For this experiment, I wanted to focus on learning how to code better so either one of the LLMs could have worked. But VibeCodeArena can also be used to evaluate different LLMs to help make a decision about which model to use for what problem statement.

The model had produced a clean HTML, CSS, and JavaScript. The code looked professional. I could see the preview of the code by clicking on the render icon. It worked perfectly in my browser. The carousel was smooth, and the images loaded beautifully.

But was it actually good code?

I had no idea. That's when I decided to look at the evaluation metrics

What I Thought Was "Good Code"

A working image carousel with:

Clean, semantic HTML
Smooth CSS transitions
Keyboard navigation support
ARIA labels for accessibility
Error handling for failed images

It looked like something a senior developer would write. But I had questions:

Was it secure? Was it optimized? Would it scale? Were there better ways to structure it?

Without objective evaluation, I had no answers. So, I proceeded to look at the detailed evaluation metrics for this code

What VibeCodeArena's Evaluation Showed

The platform's objective evaluation revealed issues I never would have spotted:

Security Vulnerabilities (The Scary Ones)

No Content Security Policy (CSP): My carousel was wide open to XSS attacks. Anyone could inject malicious scripts through the image URLs or manipulate the DOM. VibeCodeArena flagged this immediately and recommended implementing CSP headers.

Missing Input Validation: The platform pointed out that while the code handles image errors, it doesn't validate or sanitize the image sources. A malicious actor could potentially exploit this.

Hardcoded Configuration: Image URLs and settings were hardcoded directly in the code. The platform recommended using environment variables instead - a best practice I completely overlooked.

SQL Injection Vulnerability Patterns: Even though this carousel doesn't use a database, the platform flagged coding patterns that could lead to SQL injection in similar contexts. This kind of forward-thinking analysis helps prevent copy-paste security disasters.

Performance Problems (The Silent Killers)

DOM Structure Depth (15 levels): VibeCodeArena measured my DOM at 15 levels deep. I had no idea. This creates unnecessary rendering overhead that would get worse as the carousel scales.

Expensive DOM Queries: The JavaScript was repeatedly querying the DOM without caching results. Under load, this would create performance bottlenecks I'd never notice in local testing.

Missing Performance Optimizations: The platform provided a checklist of optimizations I didn't even know existed:

No DNS-prefetch hints for external image domains
Missing width/height attributes causing layout shift
No preload directives for critical resources
Missing CSS containment properties
No will-change property for animated elements

Each of these seems minor, but together they compound into a poor user experience.

Code Quality Issues (The Technical Debt)

High Nesting Depth (4 levels): My JavaScript had logic nested 4 levels deep. VibeCodeArena flagged this as a maintainability concern and suggested flattening the logic.

Overly Specific CSS Selectors (depth: 9): My CSS had selectors 9 levels deep, making it brittle and hard to refactor. I thought I was being thorough; I was actually creating maintenance nightmares.

Code Duplication (7.9%): The platform detected nearly 8% code duplication across files. That's technical debt accumulating from day one.

Moderate Maintainability Index (67.5): While not terrible, the platform showed there's significant room for improvement in code maintainability.

Missing Best Practices (The Professional Touches)

The platform also flagged missing elements that separate hobby projects from professional code:

No 'use strict' directive in JavaScript
Missing package.json for dependency management
No test files
Missing README documentation
No .gitignore or version control setup
Could use functional array methods for cleaner code
Missing CSS animations for enhanced UX

The "Aha" Moment

Here's what hit me: I had no framework for evaluating code quality beyond "does it work?"

The carousel functioned. It was accessible. It had error handling. But I couldn't tell you if it was secure, optimized, or maintainable.

VibeCodeArena gave me that framework. It didn't just point out problems, it taught me what production-ready code looks like.

My New Workflow: The Learning Loop

This is when I discovered the real power of the platform. Here's my process now:

Step 1: Generate Code Using VibeCodeArena

I start with a prompt and let the AI generate the initial solution. This gives me a working baseline.

Step 2: Analyze Across Several Metrics

I can get comprehensive analysis across:

Security vulnerabilities
Performance/Efficiency issues
Performance optimization opportunities
Code Quality improvements

This is where I learn. Each issue includes explanation of why it matters and how to fix it.

Step 3: Click "Challenge" and Improve

Here's the game-changer: I click the "Challenge" button and start fixing the issues based on the suggestions. This turns passive reading into active learning.

Do I implement CSP headers correctly? Does flattening the nested logic actually improve readability? What happens when I add dns-prefetch hints?

I can even use AI to help improve my code. For this action, I can use from a list of several available models that don't need to be the same one that generated the code. This helps me to explore which models are good at what kind of tasks.

For my experiment, I decided to work on two suggestions provided by VibeCodeArena by preloading critical CSS/JS resources with <link rel="preload"> for faster rendering in index.html and by adding explicit width and height attributes to images to prevent layout shift in index.html. The code editor gave me change summary before I submitted by code for evaluation.

Step 4: Submit for Evaluation

After making improvements, I submit my code for evaluation. Now I see:

What actually improved (and by how much)
What new issues I might have introduced
Where I still have room to grow

Step 5: Hey, I Can Beat AI

My changes helped improve the performance metric of this simple code from 82% to 83% - Yay! But this was just one small change. I now believe that by acting upon multiple suggestions, I can easily improve the quality of the code that I write versus just relying on prompts.

Each improvement can move me up the leaderboard. I'm not just learning in isolation—I'm seeing how my solutions compare to other developers and AI models.

So, this is the loop: Generate → Analyze → Challenge → Improve → Measure → Repeat.

Every iteration makes me better at both evaluating AI code and writing better prompts.

‍

‍

What This Means for Learning to Code with AI

This experience taught me three critical lessons:

1. Working ≠ Good Code

AI models are incredible at generating code that functions. But "it works" tells you nothing about security, performance, or maintainability.

The gap between "functional" and "production-ready" is where real learning happens. VibeCodeArena makes that gap visible and teachable.

2. Improvement Requires Measurement

I used to iterate on code blindly: "This seems better... I think?"

Now I know exactly what improved. When I flatten nested logic, I see the maintainability index go up. When I add CSP headers, I see security scores improve. When I optimize selectors, I see performance gains.

Measurement transforms vague improvement into concrete progress.

3. Competition Accelerates Learning

The leaderboard changed everything for me. I'm not just trying to write "good enough" code—I'm trying to climb past other developers and even beat the AI models.

This competitive element keeps me pushing to learn one more optimization, fix one more issue, implement one more best practice.

How the Platform Helps Me Become A Better Programmer

VibeCodeArena isn't just an evaluation tool—it's a structured learning environment. Here's what makes it effective:

Immediate Feedback: I see issues the moment I submit code, not weeks later in code review.

Contextual Education: Each issue comes with explanation and guidance. I learn why something matters, not just that it's wrong.

Iterative Improvement: The "Challenge" button transforms evaluation into action. I learn by doing, not just reading.

Measurable Progress: I can track my improvement over time—both in code quality scores and leaderboard position.

Comparative Learning: Seeing how my solutions stack up against others shows me what's possible and motivates me to reach higher.

What I've Learned So Far

Through this iterative process, I've gained practical knowledge I never would have developed just reading documentation:

How to implement Content Security Policy correctly
Why DOM depth matters for rendering performance
What CSS containment does and when to use it
How to structure code for better maintainability
Which performance optimizations actually make a difference

Each "Challenge" cycle teaches me something new. And because I'm measuring the impact, I know what actually works.

The Bottom Line

AI coding tools are incredible for generating starting points. But they don't produce high quality code and can't teach you what good code looks like or how to improve it.

VibeCodeArena bridges that gap by providing:

✓ Objective analysis that shows you what's actually wrong
✓ Educational feedback that explains why it matters
✓ A "Challenge" system that turns learning into action
✓ Measurable improvement tracking so you know what works
✓ Competitive motivation through leaderboards

My "simple image carousel" taught me an important lesson: The real skill isn't generating code with AI. It's knowing how to evaluate it, improve it, and learn from the process.

The future of AI-assisted development isn't just about prompting better. It's about developing the judgment to make AI-generated code production-ready. That requires structured learning, objective feedback, and iterative improvement. And that's exactly what VibeCodeArena delivers.

Here is a link to the code for the image carousal I used for my learning journey

#AIcoding #WebDevelopment #CodeQuality #VibeCoding #SoftwareEngineering #LearningToCode

‍

Exclusive SQL Tutorial on Data Analysis in R

Explore this post with:

Introduction

Table of Contents

Why learn SQL ?

What is SQL ?

Getting Started with SQL

Data Selection

Data Manipulation

Strings and Dates

Practising SQL in R

Summary

Subscribe to The HackerEarth Blog

Thank you for subscribing!

Hire top tech talent with our recruitment platform

Discover more articles

How I used VibeCode Arena platform to build code using AI and leant how to improve it

I Used AI to Build a "Simple Image Carousel" at VibeCodeArena. It Found 15+ Issues and Taught Me How to Fix Them.

My Learning Journey

What I Thought Was "Good Code"

What VibeCodeArena's Evaluation Showed

Security Vulnerabilities (The Scary Ones)

Performance Problems (The Silent Killers)

Code Quality Issues (The Technical Debt)

Missing Best Practices (The Professional Touches)

The "Aha" Moment

My New Workflow: The Learning Loop

Step 1: Generate Code Using VibeCodeArena

Step 2: Analyze Across Several Metrics

Step 3: Click "Challenge" and Improve

Step 4: Submit for Evaluation

Step 5: Hey, I Can Beat AI

What This Means for Learning to Code with AI

1. Working ≠ Good Code

2. Improvement Requires Measurement

3. Competition Accelerates Learning

How the Platform Helps Me Become A Better Programmer

What I've Learned So Far

The Bottom Line

The Mobile Dev Hiring Landscape Just Changed

Revolutionizing Mobile Talent Hiring: The HackerEarth Advantage

Introducing a New Era in Mobile Assessment

Assess the Skills That Truly Matter

Streamlining Your Assessment Workflow

Quantifiable Impact on Hiring Success

A Better Experience for Everyone

Unlock a New Era of Mobile Talent Assessment

Vibe Coding: Shaping the Future of Software

A New Era of Code

From Machine Language to Natural Language

The Promise and the Pitfalls

The Economic Impact

Explore HackerEarth’s top products for Hiring & Innovation