For those unsure of how to use PDO (coming from the mysql_
functions), I made a very, very simple PDO wrapper that is a single file. It exists to show how easy it is to do all the common things applications need done. Works with PostgreSQL, MySQL, and SQLite.
Basically, read it while you read the manual to see how to put the PDO functions to use in real life to make it simple to store and retrieve values in the format you want.
I want a single column
$count = DB::column('SELECT COUNT(*) FROM `user`');
I want an array of key => value results (e.g. for building a select box)
$pairs = DB::pairs('SELECT `id`, `username` FROM `user`');
I want a single row result
$user = DB::row('SELECT * FROM `user` WHERE `id` = ?', array($user_id));
I want an array of results
$banned_users = DB::fetch('SELECT * FROM `user` WHERE `banned` = ?', array(TRUE));
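For reference, here is a minimal, hypothetical sketch of how helper methods like these might be built on top of PDO. The DB class and method names simply mirror the usage above; the actual wrapper from the answer may differ in detail.

// Hypothetical sketch only: a static PDO wrapper mirroring the calls above.
class DB
{
    private static $pdo;

    public static function connect($dsn, $user = '', $pass = '')
    {
        self::$pdo = new PDO($dsn, $user, $pass, array(
            PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
        ));
    }

    private static function run($sql, array $params = array())
    {
        $stmt = self::$pdo->prepare($sql);
        $stmt->execute($params);
        return $stmt;
    }

    // Single value from the first column of the first row (e.g. COUNT(*)).
    public static function column($sql, array $params = array())
    {
        return self::run($sql, $params)->fetchColumn();
    }

    // key => value pairs built from the first two selected columns.
    public static function pairs($sql, array $params = array())
    {
        return self::run($sql, $params)->fetchAll(PDO::FETCH_KEY_PAIR);
    }

    // A single row as an associative array.
    public static function row($sql, array $params = array())
    {
        return self::run($sql, $params)->fetch(PDO::FETCH_ASSOC);
    }

    // All matching rows.
    public static function fetch($sql, array $params = array())
    {
        return self::run($sql, $params)->fetchAll(PDO::FETCH_ASSOC);
    }
}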
NOTE: This answer was originally posted at StackOverflow.com by Xeoncross
- Mary answered 12 years ago
I use three different ways to prevent my web application from being vulnerable to SQL injection.
- Use of mysql_real_escape_string(), which is a predefined function in PHP that adds backslashes to the following characters: \x00, \n, \r, \, ', " and \x1a. Pass the input values as parameters to minimize the chance of SQL injection.
- Use of MySQLi.
- The most advanced way is to use PDO.
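As a rough, hypothetical sketch of the MySQLi and PDO approaches above (the connection details and the users table are made up for illustration):

// MySQLi with a prepared statement (made-up credentials and table).
$mysqli = new mysqli('localhost', 'dbuser', 'dbpass', 'mydb');
$stmt = $mysqli->prepare('SELECT id, username FROM users WHERE email = ?');
$stmt->bind_param('s', $_POST['email']); // 's' marks a string parameter
$stmt->execute();
$user = $stmt->get_result()->fetch_assoc(); // get_result() needs the mysqlnd driver

// The same query with PDO.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'dbuser', 'dbpass');
$stmt = $pdo->prepare('SELECT id, username FROM users WHERE email = :email');
$stmt->execute(array(':email' => $_POST['email']));
$user = $stmt->fetch(PDO::FETCH_ASSOC);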
I hope this will help you.
NOTE: This answer was originally posted at StackOverflow.com by Soumalya Banerjee
- Carolyn answered 12 years ago
- last active 12 years ago
There are many ways of preventing SQL injection and other SQL hacks, and you can easily find them on the Internet (a Google search will do). Of course, PDO is one of the good solutions, but I would also like to suggest some good links on preventing SQL injection:
What is SQL injection and how to prevent
Microsoft explanation of SQL injection and prevention in PHP
and some others, like Preventing SQL injection with MySQL and PHP.
Now, why do you need to protect your query from SQL injection? Here is a short example of why it matters.
Query for login authentication match:
$query="select * from users where email='".$_POST['email']."' and password='".$_POST['password']."' ";
Now, if someone (a hacker) puts
$_POST['email']= admin@emali.com' OR '1=1
and any password at all…
The query will effectively be evaluated only up to:
$query="select * from users where email='admin@emali.com' OR '1=1';
The other part of the query will be discarded. So what will happen? An unauthorized user (the hacker) will be able to log in as admin without knowing the password, and can then do anything the admin account can do. As you can see, it is very dangerous if SQL injection is not prevented.
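For comparison, a minimal sketch of the same login lookup with a parameterized query, assuming a PDO connection in $pdo and the users table from the example above:

// The user input is bound as data, so it can never change the SQL structure.
$stmt = $pdo->prepare('SELECT * FROM users WHERE email = ? AND password = ?');
$stmt->execute(array($_POST['email'], $_POST['password']));
$user = $stmt->fetch(PDO::FETCH_ASSOC); // FALSE when no row matches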
NOTE: This answer was originally posted at StackOverflow.com by Manish Shrivastava
- Jeff answered 12 years ago
- last active 12 years ago
Why are you all reinventing the wheel? A simple way to develop is to use a stable and reliable framework such as Joomla, Drupal or Zend Framework. The choice depends on your needs, and only you can select the appropriate one. If you are just starting out with PHP, it may be harder to choose because you need some basic knowledge of design patterns… but if you are an experienced developer, you should already know that!
NOTE: This answer was originally posted at StackOverflow.com by Yoong Kim
- Laurie answered 12 years ago
Using the PHP function mysql_escape_string() you can get decent protection quickly.
For example:
$query = "SELECT * FROM users WHERE name = '" . mysql_escape_string($name_from_html_form) . "'";
mysql_escape_string — Escapes a string for use in a mysql_query
For more protection you can add at the end:
WHERE 1=1 or LIMIT 1
Finally you get:
$query = "SELECT * FROM users WHERE name = '" . mysql_escape_string($name_from_html_form) . "' LIMIT 1";
NOTE: This answer was originally posted at StackOverflow.com by Nicolas Finelli
- Christine answered 12 years ago
- last active 12 years ago
Parameterized queries AND input validation are the way to go. There are many scenarios in which SQL injection can occur even though mysql_real_escape_string() has been used.
These examples are vulnerable to SQL injection:
$offset = isset($_GET['o']) ? $_GET['o'] : 0;
$offset = mysql_real_escape_string($offset);
RunQuery("SELECT userid, username FROM sql_injection_test LIMIT $offset, 10");
or
$order = isset($_GET['o']) ? $_GET['o'] : 'userid';
$order = mysql_real_escape_string($order);
RunQuery("SELECT userid, username FROM sql_injection_test ORDER BY `$order`");
In both cases you cannot use ' to protect the encapsulation.
Source: The Unexpected SQL Injection (When Escaping Is Not Enough)
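As a rough sketch of how these two cases are usually made safe (LIMIT and ORDER BY arguments cannot simply be quoted): cast the offset to an integer and whitelist the column name. RunQuery and the table name are placeholders carried over from the examples above.

// Offset: force an integer so only digits can ever reach the query.
$offset = isset($_GET['o']) ? (int) $_GET['o'] : 0;
RunQuery("SELECT userid, username FROM sql_injection_test LIMIT $offset, 10");

// Column: only accept values from a fixed whitelist.
$allowed = array('userid', 'username');
$order = (isset($_GET['o']) && in_array($_GET['o'], $allowed, true)) ? $_GET['o'] : 'userid';
RunQuery("SELECT userid, username FROM sql_injection_test ORDER BY `$order`");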
NOTE: This answer was originally posted at StackOverflow.com by Cedric
- Kathy answered 13 years ago
Injection Prevention – mysql_real_escape_string()
PHP has a specially-made function to prevent these attacks. All you need to do is use the mouthful of a function mysql_real_escape_string.
What mysql_real_escape_string does is take a string that is going to be used in a MySQL query and return the same string with all SQL Injection attempts safely escaped. Basically, it will replace those troublesome quotes(‘) a user might enter with a MySQL-safe substitute, an escaped quote \’.
// NOTE: you must be connected to the database to use this function!
// connect to MySQL
$name_bad = "' OR 1'";
$name_bad = mysql_real_escape_string($name_bad);
$query_bad = "SELECT * FROM customers WHERE username = '$name_bad'";
echo "Escaped Bad Injection: <br />" . $query_bad . "<br />";
$name_evil = "'; DELETE FROM customers WHERE 1 or username = '";
$name_evil = mysql_real_escape_string($name_evil);
$query_evil = "SELECT * FROM customers WHERE username = '$name_evil'";
echo "Escaped Evil Injection: <br />" . $query_evil;
You can find more detail here:
http://www.tizag.com/mysqlTutorial/mysql-php-sql-injection.php
NOTE: This answer was originally posted at StackOverflow.com by rahularyansharma
- Vincent answered 13 years ago
- last active 13 years ago
I always use DATETIME fields for anything other than row metadata (date created or modified).
As mentioned in the MySQL documentation:
The DATETIME type is used when you need values that contain both date and time information. MySQL retrieves and displays DATETIME values in ‘YYYY-MM-DD HH:MM:SS’ format. The supported range is ‘1000-01-01 00:00:00’ to ‘9999-12-31 23:59:59’.
…
The TIMESTAMP data type has a range of ‘1970-01-01 00:00:01’ UTC to ‘2038-01-09 03:14:07’ UTC. It has varying properties, depending on the MySQL version and the SQL mode the server is running in.
You're quite likely to hit the lower limit of TIMESTAMP in general use, for example when storing a birthdate.
NOTE: This answer was originally posted at StackOverflow.com by scronide
- Christine answered 16 years ago
- last active 12 years ago
From my experience, if you want a date field where the value is inserted only once and you never want to update it or perform any other action on it, go with DATETIME.
For example, a REGISTRATION DATE field in a user table.
In that same user table, if you want to know the last login time of a particular user, go with a field of TIMESTAMP type and let that field be updated by a trigger.
NOTE: This answer was originally posted at StackOverflow.com by Kannan Prasad
- Shirley answered 13 years ago
Not sure if this has been mentioned already, but it is worth noting that in MySQL you can use something along the lines of the following when creating your table columns:
ON UPDATE CURRENT_TIMESTAMP
This will update the time each time you modify a row, which is very helpful for storing last-edit info. However, this only works with TIMESTAMP, not DATETIME.
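A minimal sketch of such a column definition, assuming a PDO connection in $pdo; the table and column names are made up for illustration:

// `updated_at` is refreshed by MySQL automatically whenever the row changes.
$pdo->exec("
    CREATE TABLE article (
        id         INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        title      VARCHAR(255) NOT NULL,
        updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
                   ON UPDATE CURRENT_TIMESTAMP
    )
");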
NOTE: This answer was originally posted at StackOverflow.com by leejmurphy
- Rose answered 13 years ago
I always use a UNIX timestamp, simply to maintain sanity when dealing with a lot of datetime information, especially when adjusting for time zones, adding or subtracting dates, and the like. When comparing timestamps, this excludes the complicating factor of time zones and lets you spare resources in your server-side processing (whether application code or database queries), because you use lightweight integer arithmetic rather than heavier date-time add/subtract functions.
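A small sketch of the kind of lightweight arithmetic this refers to (plain PHP, no database involved; the event timestamp is a made-up value):

// Unix timestamps are plain integers, so comparisons and offsets are simple math.
$now       = time();            // seconds since 1970-01-01 00:00:00 UTC
$yesterday = $now - 86400;      // one day earlier
$inAWeek   = $now + 7 * 86400;  // seven days later

$someEventTimestamp = $now - 2 * 86400; // example value

if ($someEventTimestamp < $yesterday) {
    echo "more than a day old\n";
}

// Time zones and formatting only matter at display time.
echo date('Y-m-d H:i:s', $inAWeek), "\n";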
NOTE: This answer was originally posted at StackOverflow.com by Oliver Holmberg
- Vicki answered 13 years ago
The main difference is that DATETIME is constant while TIMESTAMP is affected by the time_zone setting.
So it only matters when you have – or may in the future have – synchronized clusters across time zones.
In simpler words: if I have a database in Australia, and take a dump of that database to synchronize/populate a database in America, then the TIMESTAMP would update to reflect the real time of the event in the new time zone, while DATETIME would still reflect the time of the event in the Australian time zone.
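A rough sketch of that behaviour, assuming a PDO connection in $pdo and a table events with a TIMESTAMP column created_at (both names are made up for illustration):

// The same stored TIMESTAMP is converted to the session time zone on retrieval.
$pdo->exec("SET time_zone = '+00:00'");
echo $pdo->query("SELECT created_at FROM events WHERE id = 1")->fetchColumn(), "\n";

$pdo->exec("SET time_zone = '+10:00'"); // e.g. Australian eastern time
echo $pdo->query("SELECT created_at FROM events WHERE id = 1")->fetchColumn(), "\n";

// A DATETIME column would print exactly the same string in both cases.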
A great example of DATETIME being used where TIMESTAMP should have been used is in Facebook, where their servers are never quite sure what time stuff happened across time zones.
Once I was having a conversation in which the time said I was replying to messages before the message was actually sent.
This of course could also have been caused by bad time zone translation in the messaging software if the times were being posted rather than synchronized.
NOTE: This answer was originally posted at StackOverflow.com by ekerner
- Rhonda answered 15 years ago
- last active 14 years ago
I make this decision on a semantic basis.
I use a TIMESTAMP when I need to record a (more or less) fixed point in time, for example when a record was inserted into the database or when some user action took place.
I use a DATETIME field when the date/time can be set and changed arbitrarily, for example when a user saves an appointment and can change it later.
NOTE: This answer was originally posted at StackOverflow.com by unbeknown
- Bonnie answered 16 years ago
- last active 14 years ago
I recommend PHP Simple HTML DOM Parser
It really has nice features, like:
foreach($html->find('img') as $element)
echo $element->src . '<br>';
NOTE: This answer was originally posted at StackOverflow.com by user1090298
- Brenda answered 12 years ago
Native XML Extensions
I prefer using one of the native XML extensions since they come bundled with PHP, are usually faster than all the 3rd party libs and give me all the control I need over the markup.
DOM
The DOM extension allows you to operate on XML documents through the DOM API with PHP 5. It is an implementation of the W3C’s Document Object Model Core Level 3, a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of documents.
DOM is capable of parsing and modifying real world (broken) HTML and it can do XPath queries. It is based on libxml.
It takes some time to get productive with DOM, but that time is well worth it, IMO. Since DOM is a language-agnostic interface, you'll find implementations in many languages, so if you need to change your programming language, chances are you will already know how to use that language's DOM API.
A basic usage example can be found in Grabbing the href attribute of an A element and a general conceptual overview can be found at Noob question about DOMDocument in php
How to use the DOM extension has been covered extensively on StackOverflow, so if you choose to use it, you can be sure most of the issues you run into can be solved by searching/browsing Stack Overflow.
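A minimal sketch of the extension in use; loadHTML() tolerates broken markup, and the sample HTML here is made up:

// Parse some (possibly broken) HTML and pull out links via XPath.
$dom = new DOMDocument();
libxml_use_internal_errors(true); // silence warnings about invalid markup
$dom->loadHTML('<div><a href="/a">one<a href="/b">two</div>');
libxml_clear_errors();

$xpath = new DOMXPath($dom);
foreach ($xpath->query('//a') as $a) {
    echo $a->getAttribute('href'), ': ', $a->textContent, "\n";
}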
XMLReader
The XMLReader extension is an XML Pull parser. The reader acts as a cursor going forward on the document stream and stopping at each node on the way.
XMLReader, like DOM, is based on libxml. I am not aware of how to trigger the HTML Parser Module, so chances are that using XMLReader for parsing broken HTML is less robust than using DOM, where you can explicitly tell it to use libxml's HTML Parser Module.
A basic usage example can be found at getting all values from h1 tags using php
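A minimal sketch of the pull-parser style on well-formed input (the sample document is made up):

// Stream through a document and print the text of every <h1> element.
$reader = new XMLReader();
$reader->XML('<root><h1>First</h1><p>skip me</p><h1>Second</h1></root>');

while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'h1') {
        echo $reader->readString(), "\n"; // text content of the current element
    }
}
$reader->close();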
SimpleXml
The SimpleXML extension provides a very simple and easily usable toolset to convert XML to an object that can be processed with normal property selectors and array iterators.
SimpleXML is an option when you know the HTML is valid XHTML. If you need to parse broken HTML, don’t even consider SimpleXml because it will choke.
A basic usage example can be found at A simple program to CRUD node and node values of xml file and there is lots of additional examples in the PHP Manual.
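A minimal sketch, only usable when the input is valid XML/XHTML (the snippet here is made up):

// Valid markup can be loaded directly and traversed via properties or XPath.
$xml = simplexml_load_string('<div><a href="/a">one</a><a href="/b">two</a></div>');

foreach ($xml->a as $link) {                // child <a> elements
    echo $link['href'], ': ', $link, "\n";  // attribute access and text content
}

foreach ($xml->xpath('//a') as $link) {     // the same via XPath
    echo $link['href'], "\n";
}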
3rd Party Libraries (libxml based)
If you prefer to use a 3rd party lib, I’d suggest to use a lib that actually uses DOM/libxml underneath instead of String Parsing.
phpQuery
phpQuery is a server-side, chainable, CSS3 selector driven Document Object Model (DOM) API based on jQuery JavaScript Library written in PHP5 and provides additional Command Line Interface (CLI).
Zend_Dom
Zend_Dom provides tools for working with DOM documents and structures. Currently, we offer Zend_Dom_Query, which provides a unified interface for querying DOM documents utilizing both XPath and CSS selectors.
QueryPath
QueryPath is a PHP library for manipulating XML and HTML. It is designed to work not only with local files, but also with web services and database resources. It implements much of the jQuery interface, but it is heavily tuned for server-side use.
FluentDom
FluentDOM is a jQuery-like fluent XML interface for the DOMDocument in PHP.
fDOMDocument
fDOMDocument extends the standard DOM to throw exceptions on all occasions of errors instead of PHP warnings or notices. It also adds various custom methods and shortcuts for convenience and to simplify the usage of DOM.
3rd Party (not libxml based)
The benefit of building upon DOM/libxml is that you get good performance out of the box because you are based on a native extension. However, not all 3rd party libs go down this route; some of them are listed below.
SimpleHtmlDom
- An HTML DOM parser written in PHP5+ that lets you manipulate HTML in a very easy way!
- Requires PHP 5+.
- Supports invalid HTML.
- Find tags on an HTML page with selectors just like jQuery.
- Extract contents from HTML in a single line.
I generally do not recommend this parser. The codebase is horrible and the parser itself is rather slow and memory hungry. Any of the libxml based libraries should outperform this easily.
Ganon
- A universal tokenizer and HTML/XML/RSS DOM Parser
- Ability to manipulate elements and their attributes
- Supports invalid HTML and UTF8
- Can perform advanced CSS3-like queries on elements (like jQuery — namespaces supported)
- A HTML beautifier (like HTML Tidy)
- Minify CSS and Javascript
- Sort attributes, change character case, correct indentation, etc.
- Extensible
- Parsing documents using callbacks based on current character/token
- Operations separated in smaller functions for easy overriding
- Fast and Easy
Never used it. Can’t tell if it’s any good.
HTML 5
You can use the above for parsing HTML5, but there can be quirks due to the markup HTML5 allows. So for HTML5 you want to consider using a dedicated parser, like
Python and PHP implementations of an HTML parser based on the WHATWG HTML5 specification, for maximum compatibility with major desktop web browsers.
We might see more dedicated parsers once HTML5 is finalized.
WebServices
If you don't feel like programming PHP, you can also utilize web services. In general, I found very little utility for these, but that's just me and my use cases.
YQL
The YQL Web Service enables applications to query, filter, and combine data from different sources across the Internet. YQL statements have a SQL-like syntax, familiar to any developer with database experience.
ScraperWiki.
ScraperWiki’s external interface allows you to extract data in the form you want for use on the web or in your own applications. You can also extract information about the state of any scraper.
Regular Expressions
Last and least recommended, you can extract data from HTML with Regular Expressions. In general using Regular Expressions on HTML is discouraged.
Most of the snippets you will find on the web to match markup are brittle. In most cases they only work for a very particular piece of HTML. Tiny markup changes, like adding a space somewhere, can make the regex fail when it's not properly written. You should know what you are doing before using regex on HTML.
HTML parsers already know the syntactical rules of HTML. Regular expressions have to be taught those rules with each new regex you write. Regex are fine in some cases, but it really depends on your use case.
You can write more reliable parsers, but writing a complete and reliable custom parser with Regular Expressions is a waste of time when the aforementioned libraries already exist and do a much better job on this.
Also see Parsing Html The Cthulhu Way
Books
If you want to spend some money, have a look at
I am not affiliated with PHP Architects or the authors.
NOTE: This answer was originally posted at StackOverflow.com by Gordon
- Pamela answered 14 years ago
- last active 12 years ago
phpQuery and QueryPath are extremely similar in replicating the fluent jQuery API. That's also why they're among the easiest approaches to properly parsing HTML in PHP.
Examples for QueryPath
Basically you first create a queryable DOM tree from a HTML string:
$qp = qp("<html><body><h1>title</h1>..."); // or give filename or URL
The resulting object contains a complete tree representation of the HTML document. It can be traversed using DOM methods. But the common approach is to use CSS selectors like in jQuery:
$qp->find("div.classname")->children()->...;
foreach ($qp->find("p img") as $img) {
print qp($img)->attr("src");
}
Mostly you want to use simple #id and .class or DIV tag selectors for ->find(). But you can also use XPath statements, which are sometimes faster. Also, typical jQuery methods like ->children() and ->text() and particularly ->attr() simplify extracting the right HTML snippets. (And they already have their SGML entities decoded.)
$qp->xpath("//div/p[1]"); // get first paragraph in a div
QueryPath also allows injecting new tags into the stream (->append), and later output and prettify an updated document (->writeHTML). It can not only parse malformed HTML, but also various XML dialects (with namespaces), and even extract data from HTML microformats (XFN, vCard).
$qp->find("a[target=_blank]")->toggleClass("usability-blunder");
phpQuery or QueryPath?
Generally QueryPath is better suited for manipulation of documents, while phpQuery also implements some pseudo-AJAX methods (just HTTP requests) to more closely resemble jQuery. phpQuery is said to be often faster than QueryPath (because it has fewer features overall).
For further information on the differences, see this comparison:
http://web.archive.org/web/20101230230134/http://www.tagbytag.org/articles/phpquery-vs-querypath (Original source went missing, so here’s an internet archive link. Yes, you can still locate missing pages, people.)
And here’s a comprehensive QueryPath introduction: http://www.ibm.com/developerworks/opensource/library/os-php-querypath/index.html?S_TACT=105AGX01&S_CMP=HP
Advantages
- Simplicity and Reliability
- Simple to use alternatives ->find(“a img, a object, div a”)
- Proper data unescaping (in comparison to regular expression grepping)
NOTE: This answer was originally posted at StackOverflow.com by mario
- Dorothy answered 14 years ago
- last active 12 years ago
There is also Goutte (a PHP web scraper), which is now available:
https://github.com/fabpot/Goutte/
NOTE: This answer was originally posted at StackOverflow.com by Shal
- Jeffery answered 12 years ago
Try the Simple HTML Dom Parser:
// Create DOM from URL or file
$html = file_get_html('http://www.example.com/');
// Find all images
foreach($html->find('img') as $element)
echo $element->src . '<br>';
// Find all links
foreach($html->find('a') as $element)
echo $element->href . '<br>';
NOTE: This answer was originally posted at StackOverflow.com by NAVEED
- Larry answered 14 years ago
- last active 13 years ago
Yes, you can use simple_html_dom for this purpose. However, I have worked quite a lot with simple_html_dom, particularly for web scraping, and have found it to be too fragile. It does the basic job, but I won't recommend it anyway.
I have never used cURL for this purpose, but what I have learned is that cURL can do the job much more efficiently and is much more solid.
Kindly check out this link: http://spyderwebtech.wordpress.com/2008/08/07/scraping-websites-with-curl/
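A rough sketch of that combination: cURL only fetches the page, and something still has to parse it, here DOMDocument (the URL is just an example):

// Fetch the page with cURL...
$ch = curl_init('http://www.example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$html = curl_exec($ch);
curl_close($ch);

// ...then hand the HTML to a real parser for the actual scraping.
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('img') as $img) {
    echo $img->getAttribute('src'), "\n";
}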
NOTE: This answer was originally posted at StackOverflow.com by Spoilt
- Margaret answered 13 years ago
We have created quite a few crawlers for our needs before. At the end of the day, it is usually simple regular expressions that do the job best. While the libraries listed above are good for what they were created for, if you know what you are looking for, regular expressions are a safer way to go, since you can also handle invalid HTML/XHTML structures that would fail to load in most of the parsers.
NOTE: This answer was originally posted at StackOverflow.com by jancha
- Renee answered 13 years ago